Healpyxel
  • Home
  • Quickstart
  • Source Code
  • Report a Bug
  1. Start
  • Start
  • Examples
    • Quickstart
    • Visualization
    • Visualization : Gaussian PSF - WIP!
    • Accumulation - WIP!
    • Streaming - WIP!
  • API Reference
    • Package Structure
    • HEALPix Sidecar
    • HEALPix Aggregate
    • HEALPix Accumulator
    • HEALPix Finalize
    • Generate HEALPix sidecar
    • Optional Dependencies
    • Geospatial

On this page

  • Features
  • Installation
    • Optional Dependencies
  • Quick Start
    • 1. Split: Map observations to HEALPix cells
    • 2. Apply: Aggregate values per HEALPix cell
    • 3. Combine: Attach HEALPix cell geometry
    • Optional: Cache geometries
    • Batch Processing
    • Streaming Processing - WORK IN PROGRESS
  • CLI Workflow
    • 1) Create HEALPix sidecar(s)
    • 2) Aggregate sparse regridded map(s)
    • 3) Aggregate densified regridded map(s)
    • 4) Convert aggregated maps to GeoParquet
  • Python API
  • Developed for MESSENGER/MASCS
    • Useful Healpix data for Moon Venus Mercury
  • License
  • Report an issue

Other Formats

  • CommonMark

Start

HEALPix-based spatial aggregation for planetary science data

Features

  • Batch Processing: Classical split-apply-combine for static datasets
  • Streaming Aggregation: Incremental statistics for growing datasets
  • Memory Efficient: Process datasets larger than RAM
  • Native Resolution Mosaics: Fine-grained HEALPix grids
  • Flexible Statistics: Mean, median, std, MAD, robust_std, percentiles

Installation

pip install healpyxel

Optional Dependencies

# For geospatial operations (sidecar generation)
pip install healpyxel[geospatial]

# For streaming/incremental statistics (accumulator)
pip install healpyxel[streaming]

# For visualization (maps, plots)
pip install healpyxel[viz]

# Development tools (nbdev, testing, linting)
pip install healpyxel[dev]

# All optional dependencies
pip install healpyxel[all]

Extras breakdown: - geospatial: geopandas, shapely, dask-geopandas, antimeridian (required for healpyxel_sidecar) - streaming: tdigest (percentile tracking in healpyxel_accumulator) - viz: matplotlib, scikit-image, skyproj (mapping workflows) - dev: All of the above + nbdev, pytest, black, ruff, mypy - all: Installs geospatial + streaming + viz (excludes dev tools)

Quick Start

The healpyxel workflow implements spatial aggregation using three core steps:

1. Split: Map observations to HEALPix cells

You start with observation data (GeoParquet): geometries + values per record. A sidecar file links each observation (source_id) to HEALPix cells at your target resolution (nside).

Data contract: - Input: observations.parquet → columns: source_id, value, geometry - Output: observations-sidecar.parquet → columns: source_id, healpix_id, weight (fuzzy mode only)

CLI: healpyxel_sidecar --input observations.parquet --nside 64 128 --mode fuzzy

✓ Saved: Sidecar.svg (74732 bytes)

Sidecar.svg
True

2. Apply: Aggregate values per HEALPix cell

Group all observations assigned to the same cell and compute statistics (median, mean, MAD, robust_std, etc.).

Data contract: - Input: observations.parquet + sidecar file - Output: observations-aggregated.parquet → columns: healpix_id, value_median, value_robust_std, …

CLI: healpyxel_aggregate --input observations.parquet --sidecar-dir output/ --columns value --aggs median robust_std

✓ Saved: Aggregate.svg (75340 bytes)

Aggregate.svg
True

3. Combine: Attach HEALPix cell geometry

Add polygon boundaries to aggregated cells (computed from healpix_id via healpy).

Data contract: - Input: observations-aggregated.parquet - Output: observations-aggregated.geo.parquet → adds column: geometry (HEALPix cell polygon)

CLI: healpyxel_to_geoparquet --aggregate-path observations-aggregated.parquet --output-dir output/

✓ Saved: Geometry.svg (69438 bytes)

Geometry.svg
True

Optional: Cache geometries

Pre-compute HEALPix cell boundaries for faster repeated use (especially for high nside).

CLI: healpyxel_cache --nside 64 128 --order nested --lon-convention 0_360

Batch Processing

see below

# 1. Generate HEALPix sidecar (SPLIT)
healpyxel_sidecar \
  --input observations.parquet \
  --nside 64 128 \
  --mode fuzzy \
  --output-dir output/

# 2. Aggregate by HEALPix cells (APPLY)
healpyxel_aggregate \
  --input observations.parquet \
  --sidecar-dir output/ \
  --sidecar-index 0 \
  --aggregate \
  --columns r750 r950 \
  --aggs median robust_std \
  --min-count 3

# 3. Convert to GeoParquet (for visualization)
healpyxel_to_geoparquet \
  --aggregate-path output/observations-aggregated.*.parquet \
  --output-dir output/ \
  --lon-convention -180_180

# 4. Cache HEALPix geometry (optional, speeds up visualization)
healpyxel_cache --nside 64 128 --order nested --lon-convention 0_360

Streaming Processing - WORK IN PROGRESS

# Day 1: Initialize accumulator
healpyxel_accumulate --input day001.parquet \
  --columns r750 r950 --state-output state_v001.parquet

# Day 2+: Incremental updates
healpyxel_accumulate --input day002.parquet \
  --columns r750 r950 \
  --state-input state_v001.parquet --state-output state_v002.parquet

# Finalize to statistics
healpyxel_finalize --state state_v030.parquet --output mosaic.parquet \
  --percentiles 25 50 75 --densify --nside 512

CLI Workflow

This section explan a full CLI workflow on a test sample 50k data, including the outputs produced at each stage.

The same workflow is done completely in python with healpyxel API in Examples>Visualization section.

All input/output are in this repsitory:

  • script is at examples/cli_regrid_sample_50k.sh
  • input are at test_data/samples/sample_50k.parquet
  • ouput are in test_data/derived/cli_quickstart
Original files excerpt (transposed for clarity):
Sidecar Metadata:
Unique sources: 49988
  Unique HEALPix cells: 10860
  Total assignments: 54931

  Sidecar Data:
lat_center lon_center surface width length ang_incidence ang_emission ang_phase azimuth geometry
0 5.186568 272.40450 1567133.4 1006.63727 1982.1799 43.049232 34.814793 77.85916 109.019295 POLYGON ((272.39758 5.16433, 272.41583 5.18307...
1 -60.939438 71.77686 13564574.0 4064.49850 4249.2210 64.178116 37.690910 101.84035 111.930336 POLYGON ((71.72596 -60.89612, 71.69186 -60.963...
2 5.613894 54.23045 1755143.5 1013.51886 2204.9104 53.815990 24.053764 77.86254 99.559425 POLYGON ((54.24406 5.63592, 54.22025 5.62014, ...
3 -41.672714 324.49740 23309360.0 6511.20950 4558.0470 52.841824 46.625698 99.40995 121.833626 POLYGON ((324.54932 -41.70964, 324.56927 -41.6...
source_id healpix_id weight
0 0 7943 1.0
1 1 8287 1.0
2 2 5819 1.0
3 3 11685 1.0
4 4 3618 1.0
5 5 3805 1.0
6 6 9522 1.0
7 7 10975 1.0
8 8 1820 1.0
9 9 3710 1.0

1) Create HEALPix sidecar(s)

Those files link each row in the input parquet file to the HEALPix cells at the requested nside resolution; see Useful Healpix data for Moon Venus Mercury for some cells data. Refer to healpyxel_sidecar --help for full options. The --mode flag is especially important: - fuzzy: assign each input record to every cell it touches - strict: assign only records fully contained within a cell

healpyxel_sidecar \
  --input "test_data/samples/sample_50k.parquet" \
  --nside 32 64 \
  --mode fuzzy \
  --lon-convention 0_360 \
  --output_dir "test_data/derived/cli_quickstart"

Outputs

  • sample_50k.cell-healpix_assignment-fuzzy_nside-32_order-nested.parquet
  • sample_50k.cell-healpix_assignment-fuzzy_nside-32_order-nested.meta.json
  • sample_50k.cell-healpix_assignment-fuzzy_nside-64_order-nested.parquet
  • sample_50k.cell-healpix_assignment-fuzzy_nside-64_order-nested.meta.json
Nside 32: 54931 assignments, 10860 unique cells
source_id healpix_id weight
0 0 7943 1.0
1 1 8287 1.0
2 2 5819 1.0
3 3 11685 1.0
4 4 3618 1.0
# original_df['geometry'] = original_df.geometry.apply(
#     lambda geom: geom.buffer(0) if geom and not geom.is_valid else geom
# )

2) Aggregate sparse regridded map(s)

Now we need to aggregate initial data on the cells, refer to healpyxel_aggregate --help for all the option. Some flag are particurarly useful:

  • --schema : show input parquet schema, useful to look which data are there to aggregate.
  • --list-sidecars : list available sidecar for an input files, they are addressed by index.
  • --sidecar-schema INDEX : show schema for specific sidecar file
  • --aggs mean : aggregation functions (choices: mean, median, std, min, max, mad, robust_std).

Example :

  • input file contains columns A (you can check it with healpyxel_aggregate -i input --schema)
  • --agg mean median std
  • this produce un output the columns A_mean, A_median and A_std created appling those function on all input files rows listd in the sidecar file for a single HEALPix cell
healpyxel_aggregate \
  --input "test_data/samples/sample_50k.parquet" \
  --sidecar-dir "test_data/derived/cli_quickstart" \
  --sidecar-index all \
  --aggregate \
  --columns r1050 \
  --aggs mean median std mad robust_std \

This produces sparse output : only cells with actual values are written in ouput.

Outputs - sample_50k-aggregated.cell-healpix_assignment-fuzzy_nside-32_order-nested.parquet - sample_50k-aggregated.cell-healpix_assignment-fuzzy_nside-32_order-nested.meta.json - sample_50k-aggregated.cell-healpix_assignment-fuzzy_nside-64_order-nested.parquet - sample_50k-aggregated.cell-healpix_assignment-fuzzy_nside-64_order-nested.meta.json

Nside 32: 10860 unique cells
r1050_mean r1050_median r1050_std r1050_mad r1050_robust_std n_sources
healpix_id
0 0.048616 0.047857 0.003759 0.002672 0.003962 4
1 0.051467 0.052283 0.002976 0.001888 0.002799 6
2 0.049697 0.049118 0.003637 0.002289 0.003394 6
3 0.059066 0.063241 0.007149 0.001711 0.002537 3
4 0.051262 0.051523 0.006552 0.002510 0.003721 9

3) Aggregate densified regridded map(s)

healpyxel_aggregate \
  --input "test_data/samples/sample_50k.parquet" \
  --sidecar-dir "test_data/derived/cli_quickstart" \
  --sidecar-index all \
  --aggregate \
  --columns r1050 \
  --aggs mean median std mad robust_std \
  --densify

This produces dense output : all HEALPix cells are writeen in ouput, empty one as filled with Nan.

Outputs - sample_50k-aggregated-densified.cell-healpix_assignment-fuzzy_nside-32_order-nested.parquet - sample_50k-aggregated-densified.cell-healpix_assignment-fuzzy_nside-32_order-nested.meta.json - sample_50k-aggregated-densified.cell-healpix_assignment-fuzzy_nside-64_order-nested.parquet - sample_50k-aggregated-densified.cell-healpix_assignment-fuzzy_nside-64_order-nested.meta.json

Nside 32: 12288 unique cells <- densified , 1428 additional empty cells filled in by densification
r1050_mean r1050_median r1050_std r1050_mad r1050_robust_std n_sources
healpix_id
29 0.046644 0.046644 0.000000 0.000000 0.000000 1.0
30 NaN NaN NaN NaN NaN NaN
31 0.040205 0.040986 0.009636 0.007137 0.010581 4.0
32 0.054966 0.054413 0.003162 0.002148 0.003184 8.0
33 0.054424 0.055591 0.004131 0.003358 0.004979 8.0
34 0.057463 0.057463 0.001704 0.001704 0.002526 2.0
35 0.050470 0.057635 0.017546 0.004688 0.006951 4.0
36 0.054052 0.053640 0.004915 0.002833 0.004200 6.0
37 0.056132 0.056019 0.002281 0.002128 0.003155 4.0
38 0.060452 0.060592 0.002127 0.001878 0.002784 4.0
39 0.060708 0.070030 0.014303 0.001562 0.002316 3.0
40 0.041480 0.041480 0.000000 0.000000 0.000000 1.0
41 0.028736 0.028736 0.000000 0.000000 0.000000 1.0
42 0.070738 0.070655 0.009835 0.011921 0.017674 3.0
43 0.062058 0.061658 0.009862 0.008409 0.012467 8.0
44 NaN NaN NaN NaN NaN NaN
45 0.053895 0.054106 0.001107 0.001026 0.001521 3.0

4) Convert aggregated maps to GeoParquet

This convert each aggregated file to a geoparquet.

for f in "test_data/derived/cli_quickstart"/*-aggregated*parquet; do
  healpyxel_to_geoparquet -a "$f" -d "test_data/derived/cli_quickstart" -l -180_180 -f
done

Outputs - sample_50k-aggregated-densified.cell-healpix_assignment-fuzzy_nside-32_order-nested.geo.parquet - sample_50k-aggregated-densified.cell-healpix_assignment-fuzzy_nside-64_order-nested.geo.parquet - sample_50k-aggregated.cell-healpix_assignment-fuzzy_nside-32_order-nested.geo.parquet - sample_50k-aggregated.cell-healpix_assignment-fuzzy_nside-64_order-nested.geo.parquet

Nside 32: 10860 unique cells
geometry r1050_mean r1050_median r1050_std r1050_mad r1050_robust_std n_sources
healpix_id
0 POLYGON ((45 2.38802, 43.59375 1.19375, 45 0, ... 0.048616 0.047857 0.003759 0.002672 0.003962 4
1 POLYGON ((46.40625 3.58332, 45 2.38802, 46.406... 0.051467 0.052283 0.002976 0.001888 0.002799 6
2 POLYGON ((43.59375 3.58332, 42.1875 2.38802, 4... 0.049697 0.049118 0.003637 0.002289 0.003394 6
3 POLYGON ((45 4.78019, 43.59375 3.58332, 45 2.3... 0.059066 0.063241 0.007149 0.001711 0.002537 3
4 POLYGON ((47.8125 4.78019, 46.40625 3.58332, 4... 0.051262 0.051523 0.006552 0.002510 0.003721 9

Each cell is linked to some initial observation via the sidecar file, we can see here the distribution of one value in all the cell

We can visualize each pixel with one of the aggregator function output available in healpyxel_aggregate :

  • mean: Arithmetic mean
  • median: Median (50th percentile)
  • std: Standard deviation
  • min: Minimum value
  • max: Maximum value
  • mad: Median Absolute Deviation (robust to outliers)
  • robust_std: MAD × 1.4826 (equivalent to standard deviation for normal distributions, robust to outliers)

Each function generates one output column per input value column, named <column>_<agg> (e.g., r1050_mean, r1050_median, r1050_mad). Robust statistics (mad, robust_std) are recommended for outlier-prone datasets.

Python API

Minimal end-to-end python API example, each level works on previous one output.

  • initial data →
  • sidecar : generate data <> healpix grid connections →
  • aggregate →
  • attach geometry →
  • accumulate →
  • finalize

minimal code, a more detailed explanation is in Examples>Visualization section.


from healpyxel import sidecar, aggregate, accumulator, finalize
from healpyxel.geospatial import healpix_to_geodataframe

# Minimal API sanity checks (nbdev-friendly)
assert hasattr(sidecar, "generate")
assert hasattr(aggregate, "by_sidecar")
assert hasattr(accumulator, "update_state")
assert hasattr(finalize, "from_state")
assert callable(healpix_to_geodataframe)

# 1) Sidecar (split)
sidecar_df = sidecar.generate(
    gdf,
    nside=64,
    mode="fuzzy",
    order="nested",
    lon_convention="0_360",
)

# 2) Aggregate (apply)
agg_df = aggregate.by_sidecar(
    original=df,
    sidecar=sidecar_df,
    value_columns=["r750", "r950"],
    aggs=["median", "robust_std"],
    min_count=3,
)

# 2b) Attach geometry to step-2 products (geospatial)
cells_gdf = healpix_to_geodataframe(
    nside=64,
    order="nested",
    lon_convention="0_360",
    pixels=agg_df["healpix_id"].to_numpy(),
    fix_antimeridian=True,
    cache_mode="use",
).reset_index(drop=False)

agg_geo_gdf = cells_gdf.merge(agg_df, on="healpix_id", how="left")

# 3) Accumulator (streaming apply)
state_df = accumulator.update_state(
    batch=df,
    sidecar=sidecar_df,
    value_columns=["r750", "r950"],
    state=None,
)

# 4) Finalize (combine)
final_df = finalize.from_state(
    state=state_df,
    aggs=["mean", "std", "median", "robust_std"],
)

Developed for MESSENGER/MASCS

This package was developed to process spectral observations from the MESSENGER/MASCS instrument studying Mercury’s surface. The workflow handles:

  • Millions of observations with complex footprint geometries
  • Multi-spectral reflectance data (VIS + NIR)
  • Streaming data from ongoing missions
  • Native resolution mosaics (sub-footprint sampling)

While designed for MASCS, healpyxel is general-purpose and works with any planetary science dataset in GeoParquet format.

Useful Healpix data for Moon Venus Mercury

Number of Cells Cell Angular Size (deg) Mercury Cell Size (km) Moon Cell Size (km) Venus Cell Size (km)
nside
1 12 58.632 2496.610 1777.928 6192.969
2 48 29.316 1248.305 888.964 3096.484
4 192 14.658 624.153 444.482 1548.242
8 768 7.329 312.076 222.241 774.121
16 3,072 3.665 156.038 111.120 387.061
32 12,288 1.832 78.019 55.560 193.530
64 49,152 0.916 39.010 27.780 96.765
128 196,608 0.458 19.505 13.890 48.383
256 786,432 0.229 9.752 6.945 24.191
512 3,145,728 0.115 4.876 3.473 12.096
1,024 12,582,912 0.057 2.438 1.736 6.048
2,048 50,331,648 0.029 1.219 0.868 3.024
4,096 201,326,592 0.014 0.610 0.434 1.512
8,192 805,306,368 0.007 0.305 0.217 0.756

License

Apache 2.0

  • Report an issue