HEALPix Aggregate

Aggregate data by HEALPix cells (batch processing)

Usage Example

See the main() function for CLI usage, or import functions directly for programmatic use.

Function Reference

Core Aggregation Functions

aggregate_by_sidecar() - Main aggregation function that merges sidecar mappings with original data and computes statistics by HEALPix cell
densify_healpix_aggregates() - Fills sparse HEALPix grid to include all cells (empty cells filled with NaN)
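
The densification step can be illustrated with a minimal sketch. It assumes the aggregates are held as a plain mapping from cell index to value, and it relies on the fact that a full HEALPix map at resolution nside contains 12 * nside**2 cells. The names healpix_npix and densify and the dict-based representation are hypothetical; the module's own densify_healpix_aggregates() may work differently.

```python
import math


def healpix_npix(nside: int) -> int:
    """Number of cells in a full HEALPix map at the given nside."""
    return 12 * nside * nside


def densify(sparse: dict[int, float], nside: int) -> list[float]:
    """Return one value per cell; cells absent from `sparse` become NaN.

    Hypothetical helper: the dict-in, list-out shape is an assumption
    made for illustration only.
    """
    npix = healpix_npix(nside)
    dense = [math.nan] * npix
    for cell, value in sparse.items():
        if not 0 <= cell < npix:
            raise ValueError(f"cell {cell} is out of range for nside={nside}")
        dense[cell] = value
    return dense


# Two populated cells at nside=1 expand to a full grid of 12 cells.
dense = densify({0: 1.5, 7: 2.0}, nside=1)
```

Filling with NaN rather than zero keeps empty cells distinguishable from cells whose aggregate is genuinely zero.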

Sidecar Management

collect_sidecar_outputs() - Scans a directory for sidecar files matching the input file stem and parses metadata from their filenames
validate_sidecar_metadata() - Validates .meta.json files and checks source_file consistency
extract_nside_from_filename() - Extracts nside parameter from filename using regex (fallback method)
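
A regex-based fallback of this kind might look like the sketch below. The filename pattern (the token "nside" followed by an optional separator and digits) is an assumption, since the actual sidecar naming scheme is not spelled out here, and nside_from_name is a hypothetical stand-in for extract_nside_from_filename().

```python
import re
from typing import Optional

# Assumed convention: "nside64", "nside_64", "nside-64" or "nside=64"
# somewhere in the filename. The real pattern may differ.
_NSIDE_RE = re.compile(r"nside[_=-]?(\d+)", re.IGNORECASE)


def nside_from_name(filename: str) -> Optional[int]:
    """Return the nside encoded in `filename`, or None if none is found."""
    match = _NSIDE_RE.search(filename)
    return int(match.group(1)) if match else None
```

Returning None instead of raising lets the caller decide whether a missing nside is fatal, which suits a fallback that runs only when other metadata is unavailable.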

File Operations

generate_output_filename() - Creates output filename following naming convention: <stem>-aggregated.<sidecar_suffix>.parquet
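
The naming convention above can be sketched with pathlib. The function name output_name and the example suffix "healpix_nside64" are hypothetical; only the `<stem>-aggregated.<sidecar_suffix>.parquet` pattern comes from this document.

```python
from pathlib import Path


def output_name(input_path: str, sidecar_suffix: str) -> str:
    """Build '<stem>-aggregated.<sidecar_suffix>.parquet' from an input path."""
    stem = Path(input_path).stem  # drops the directory and the final extension
    return f"{stem}-aggregated.{sidecar_suffix}.parquet"


# The suffix value is illustrative only.
name = output_name("data/catalog.parquet", "healpix_nside64")
# name == "catalog-aggregated.healpix_nside64.parquet"
```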
print_parquet_schema() - Displays parquet file schema and metadata
print_sidecar_summary() - Shows formatted table of available sidecars with statistics
print_dry_run_summary() - Previews batch processing operations without executing them

Batch Processing

process_single_sidecar() - Processes one sidecar file with full aggregation workflow
parse_arguments() - CLI argument parser with validation for batch mode options
main() - Entry point supporting single/batch processing with comprehensive error handling

Aggregation Functions

Available statistical functions in AGG_LOOKUP:
- mean - Arithmetic mean (ignores NaN)
- median - Median value (ignores NaN)
- std - Standard deviation
- min / max - Minimum/maximum values
- mad - Median Absolute Deviation (robust statistic)
- robust_std - MAD × 1.4826 (approximates std for normal distributions)
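
To make the two robust statistics concrete, the sketch below computes them per cell using only the standard library. The helper names (mad, robust_std, aggregate) are hypothetical, and skipping NaN inputs in mad is an assumption; the functions registered in AGG_LOOKUP may be implemented differently.

```python
import math
from collections import defaultdict
from statistics import median


def mad(values):
    """Median Absolute Deviation: median of |x - median(x)|, skipping NaN."""
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return math.nan
    center = median(finite)
    return median([abs(v - center) for v in finite])


def robust_std(values):
    """MAD scaled by 1.4826, which approximates std for normal data."""
    return 1.4826 * mad(values)


def aggregate(cells, values, func):
    """Group `values` by cell index and apply `func` to each group."""
    groups = defaultdict(list)
    for cell, value in zip(cells, values):
        groups[cell].append(value)
    return {cell: func(group) for cell, group in groups.items()}


# Cell 3 contains one outlier (100.0); cell 5 holds a single value.
cells = [3, 3, 3, 3, 3, 5]
values = [1.0, 2.0, 3.0, 4.0, 100.0, 7.0]
result = aggregate(cells, values, robust_std)
```

In cell 3 the median is 3.0 and the absolute deviations are 2, 1, 0, 1 and 97, so the MAD is 1.0 and robust_std is 1.4826. The outlier barely moves the estimate, whereas an ordinary standard deviation over the same values would be dominated by it.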

Batch Processing Features

New in this version:
- --sidecar-index all - Process all sidecars in batch mode
- --sidecar-index 0 1 2 - Process specific sidecar indices
- --stop-on-error - Halt batch processing on first error (default: continue)
- --list-sidecars --stats - Show sidecar statistics (row counts, unique cells)
- --sidecar-schema INDEX - Display schema of specific sidecar
- --dry-run - Preview operations without writing files
- Comprehensive batch summary with success/error reporting
- Metadata validation with lenient/strict modes
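
One way an option that accepts either the word "all" or a list of integers could be parsed with argparse is sketched below. Only the flag names are taken from this document; the helper functions, the absence of a default, and the validation rules are assumptions and may not match the module's parse_arguments().

```python
import argparse


def sidecar_index(token: str):
    """Accept the literal 'all' or a non-negative integer."""
    if token == "all":
        return token
    try:
        value = int(token)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected 'all' or an integer, got {token!r}"
        )
    if value < 0:
        raise argparse.ArgumentTypeError("sidecar index must be non-negative")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a subset of the batch options."""
    parser = argparse.ArgumentParser(description="Aggregate data by HEALPix cells")
    parser.add_argument("--sidecar-index", nargs="+", type=sidecar_index)
    parser.add_argument("--stop-on-error", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    return parser


def resolve_indices(requested, n_sidecars: int):
    """Expand 'all' to every index and reject out-of-range indices."""
    if "all" in requested:
        if len(requested) > 1:
            raise ValueError("'all' cannot be combined with explicit indices")
        return list(range(n_sidecars))
    for index in requested:
        if index >= n_sidecars:
            raise IndexError(f"sidecar index {index} out of range")
    return list(requested)


args = build_parser().parse_args(["--sidecar-index", "0", "1", "2", "--dry-run"])
indices = resolve_indices(args.sidecar_index, n_sidecars=3)
```

Resolving "all" after parsing, once the number of discovered sidecars is known, keeps the parser independent of the filesystem.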