Import the healpyxel package and core utilities:

import healpyxel
from healpyxel import core
import pandas as pd
import numpy as np
print(f"healpyxel version: {healpyxel.__version__}")

healpyxel version: 0.1.0
Test the core utility functions:

# Validate nside parameter
try:
    core.validate_nside(64)  # Valid power of 2
    print("✓ nside=64 is valid")
except ValueError as e:
    print(f"✗ {e}")

try:
    core.validate_nside(100)  # Invalid
except ValueError as e:
    print(f"✓ Caught invalid nside: {e}")

✓ nside=64 is valid
✓ Caught invalid nside: nside must be a power of 2, got 100
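The power-of-2 validation can be sketched with the classic bit trick (a minimal sketch, not necessarily how `core.validate_nside` is implemented):

```python
def validate_nside_sketch(nside: int) -> None:
    """Raise ValueError unless nside is a positive power of 2."""
    # A positive power of 2 has exactly one bit set, so n & (n - 1) == 0.
    if nside <= 0 or nside & (nside - 1) != 0:
        raise ValueError(f"nside must be a power of 2, got {nside}")

validate_nside_sketch(64)     # passes silently
# validate_nside_sketch(100)  # would raise ValueError
```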
# Test MAD (Median Absolute Deviation)
data = np.array([1, 2, 3, 4, 5, 100])  # Outlier at 100
mad_value = core.mad(data)
print(f"MAD of {data}: {mad_value:.2f}")

MAD of [  1   2   3   4   5 100]: 1.50
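The printed value is consistent with the standard definition median(|x - median(x)|); a sketch, not `core.mad` itself:

```python
import numpy as np

def mad_sketch(x):
    """Median absolute deviation: median of |x - median(x)|."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

data = np.array([1, 2, 3, 4, 5, 100])
print(mad_sketch(data))  # 1.5: the outlier barely moves the estimate
```

The median of the data is 3.5, the absolute deviations are [2.5, 1.5, 0.5, 0.5, 1.5, 96.5], and their median is 1.5, matching the output above.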
# Test robust standard deviation
robust_std = core.robust_std(data)
normal_std = np.std(data)
print(f"Robust std: {robust_std:.2f}")
print(f"Normal std: {normal_std:.2f}")
print("Robust std is less affected by the outlier (100)")

Robust std: 2.22
Normal std: 36.17
Robust std is less affected by the outlier (100)
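The reported value (2.22 = 1.4826 * 1.5) matches the usual MAD-based estimator, which scales MAD by about 1.4826 so it agrees with the standard deviation of normally distributed data. A sketch, assuming `core.robust_std` follows this common convention:

```python
import numpy as np

def robust_std_sketch(x):
    """Estimate sigma as 1.4826 * MAD (consistent with std under normality)."""
    x = np.asarray(x, dtype=float)
    mad = np.median(np.abs(x - np.median(x)))
    return 1.4826 * mad

data = np.array([1, 2, 3, 4, 5, 100])
print(f"{robust_std_sketch(data):.2f}")  # 2.22, vs np.std(data) of about 36.17
```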
Check available test data in the package:

from pathlib import Path

# Look for test data
test_data_dir = Path('../test_data')
if test_data_dir.exists():
    print("Test data directory found!")
    print("\nContents:")
    for item in sorted(test_data_dir.iterdir()):
        if item.is_file():
            size_mb = item.stat().st_size / 1024 / 1024
            print(f"  {item.name}: {size_mb:.2f} MB")
        elif item.is_dir():
            n_files = len(list(item.glob('*')))
            print(f"  {item.name}/: {n_files} files")
else:
    print("Test data directory not found. Run create_test_data.sh to generate test data.")

Test data directory found!

Contents:
  README.md: 0.00 MB
  batches/: 10 files
  regions/: 0 files
  samples/: 3 files
  validation/: 2 files
The package provides four CLI commands:

healpix_sidecar - Create sidecar files with HEALPix assignments
healpix_aggregate - Aggregate data into HEALPix cells
healpix_accumulator - Stream large datasets with online statistics
healpix_finalize - Finalize accumulated state to statistical maps

Check if commands are available:
import subprocess

commands = ['healpix_sidecar', 'healpix_aggregate', 'healpix_accumulator', 'healpix_finalize']
for cmd in commands:
    result = subprocess.run(['which', cmd], capture_output=True, text=True)
    if result.returncode == 0:
        print(f"✓ {cmd}: {result.stdout.strip()}")
    else:
        print(f"✗ {cmd}: not found")

✓ healpix_sidecar: /home/kidpixo/miniconda3/envs/mertis/bin/healpix_sidecar
✓ healpix_aggregate: /home/kidpixo/miniconda3/envs/mertis/bin/healpix_aggregate
✓ healpix_accumulator: /home/kidpixo/miniconda3/envs/mertis/bin/healpix_accumulator
✓ healpix_finalize: /home/kidpixo/miniconda3/envs/mertis/bin/healpix_finalize
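A portable alternative that avoids spawning a `which` subprocess (and also works on Windows) is the standard library's `shutil.which`:

```python
import shutil

def locate_commands(commands):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {cmd: shutil.which(cmd) for cmd in commands}

# Same check as above, without a subprocess per command
for cmd, path in locate_commands(['healpix_sidecar', 'healpix_aggregate']).items():
    print(f"✓ {cmd}: {path}" if path else f"✗ {cmd}: not found")
```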
If test data is available, let's try a quick aggregation:

# Check for sample data
sample_file = test_data_dir / 'sample_001.parquet'
if sample_file.exists():
    print(f"Loading sample: {sample_file.name}")
    df = pd.read_parquet(sample_file)
    print(f"\nShape: {df.shape}")
    print(f"\nColumns: {list(df.columns)}")
    print("\nFirst few rows:")
    display(df.head())

    # Check for lat/lon columns
    if 'latitude' in df.columns and 'longitude' in df.columns:
        print("\n✓ Found latitude/longitude columns for HEALPix conversion")
        print(f"  Lat range: [{df['latitude'].min():.2f}, {df['latitude'].max():.2f}]")
        print(f"  Lon range: [{df['longitude'].min():.2f}, {df['longitude'].max():.2f}]")
else:
    print("Sample data not found. Generate test data first:")
    print("  cd .. && bash create_test_data.sh")

Sample data not found. Generate test data first:
  cd .. && bash create_test_data.sh
Below is a complete CLI regridding example that mirrors the notebook workflow using sample_50k.parquet, with nside=32. It writes outputs into a dedicated folder to keep the sidecar index stable.
Save the script as examples/cli_regrid_sample_50k.sh and run it from the repo root:
#!/usr/bin/env bash
set -euo pipefail

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
INPUT="$ROOT_DIR/test_data/samples/sample_50k.parquet"
OUT_DIR="$ROOT_DIR/test_data/derived/cli_quickstart"
NSIDE=32
MODE=fuzzy
LON_CONVENTION=0_360

mkdir -p "$OUT_DIR"

# 1) Create HEALPix sidecar
healpix_sidecar \
    --input "$INPUT" \
    --nside "$NSIDE" \
    --mode "$MODE" \
    --lon-convention "$LON_CONVENTION" \
    --output_dir "$OUT_DIR"

# 2) Aggregate (densified) regridded map
healpix_aggregate \
    --input "$INPUT" \
    --sidecar-dir "$OUT_DIR" \
    --sidecar-index 0 \
    --aggregate \
    --columns r1050 \
    --aggs mean median std mad robust_std \
    --min-count 1 \
    --densify \
    --output "$OUT_DIR/sample_50k_nside${NSIDE}_r1050_aggregate.parquet"

If you want to aggregate a different column, replace r1050 in the script.
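The `--lon-convention 0_360` setting implies longitudes in [0, 360); if your data uses [-180, 180] instead, the conversion is a single modulo. A sketch of the convention itself, not of the CLI's internal handling:

```python
import numpy as np

def to_0_360(lon):
    """Map longitudes from [-180, 180] (or any range) into [0, 360)."""
    return np.mod(lon, 360.0)

print(to_0_360(np.array([-180.0, -90.0, 0.0, 179.5])))  # -90 -> 270, -180 -> 180
```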
For more detailed examples, see:

01_sidecar.ipynb - HEALPix assignment
02_aggregate.ipynb - Statistical aggregation
03_accumulator.ipynb - Large dataset processing
04_finalize.ipynb - Final map products

Or run the CLI commands directly: