Healpyxel
Geospatial

HEALPix → vector utilities: produce polygon geometries for HEALPix cells and save as GeoParquet

Core helpers

Caching & XDG Configuration

HEALPix grid boundaries are expensive to compute. This module caches them in parquet files, locating the cache via the XDG Base Directory standard and a persistent configuration file.

Precedence for directory resolution (highest to lowest):

1. CLI argument (e.g., --cache-dir /tmp)
2. Environment variable (e.g., HEALPYXEL_CACHE=/fast/disk)
3. Config file setting (cache_dir in settings.ini, unless set to auto)
4. XDG spec: $XDG_CACHE_HOME or $XDG_CONFIG_HOME
5. Fallback: ~/.cache/healpyxel/healpix_grids or ~/.config/healpyxel

Configuration file: $XDG_CONFIG_HOME/healpyxel/settings.ini (or ~/.config/healpyxel/settings.ini)

- Controls precomputed nsides, antimeridian handling, and the cache location override
- Auto-created on first use; can be edited manually


init_user_config

 init_user_config (config_dir:Optional[pathlib.Path]=None)

Create the default ~/.config/healpyxel/settings.ini if it doesn't exist.

Args:
  config_dir: optional override for the config directory

Returns:
  Path to the config file (whether newly created or already existing)

Example:
  >>> config_file = init_user_config()
  >>> config_file.exists()
  True
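A minimal sketch of the "create if missing" behaviour using the standard-library configparser. The function name init_user_config_sketch and the DEFAULTS dict are hypothetical; the default values are copied from the settings.ini example on this page, and the real init_user_config may resolve directories or write contents differently.

```python
import configparser
import os
from pathlib import Path
from typing import Optional

# Hypothetical defaults, mirroring the settings.ini example on this page.
DEFAULTS = {
    "cache": {"cache_dir": "auto", "precomputed_nsides": "32,64,128,256"},
    "general": {"fix_antimeridian": "true", "antimeridian_tolerance": "1.0"},
}


def init_user_config_sketch(config_dir: Optional[Path] = None) -> Path:
    """Create settings.ini with defaults if it is missing; return its path."""
    if config_dir is None:
        override = os.environ.get("HEALPYXEL_CONFIG")  # session override
        if override:
            config_dir = Path(override).expanduser()
        else:
            xdg = os.environ.get("XDG_CONFIG_HOME")
            config_dir = (Path(xdg) if xdg else Path.home() / ".config") / "healpyxel"
    config_file = Path(config_dir) / "settings.ini"
    if not config_file.exists():
        # Only write on first use; never clobber a user-edited file.
        config_file.parent.mkdir(parents=True, exist_ok=True)
        parser = configparser.ConfigParser()
        parser.read_dict(DEFAULTS)
        with config_file.open("w") as fh:
            parser.write(fh)
    return config_file
```

Calling it a second time returns the same path without touching the file, which is what makes "auto-created on first use; can be edited manually" safe.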

Cache Management Core Logic

Central dispatch for all cache operations: generate, list, clean, and view configuration. It is called by a thin CLI wrapper in 05_cli.ipynb and has no Click dependencies.


manage_healpix_cache

 manage_healpix_cache (action:str='list', nsides:Optional[List[int]]=None,
                       cache_dir:Optional[pathlib.Path]=None,
                       config_dir:Optional[pathlib.Path]=None,
                       force:bool=False)

Core cache management logic with precedence awareness.

No Click dependencies; called by the CLI wrapper in 05_cli.ipynb. Uses _get_cache_dir() and _get_config_dir() for proper precedence.

Args:
  action: ‘list’, ‘generate’, ‘verify’, ‘clean’, ‘info’, or ‘config’
  nsides: list of nside values for ‘generate’ or ‘verify’ actions
  cache_dir: explicit CLI override (highest precedence)
  config_dir: explicit CLI override (highest precedence)
  force: whether to overwrite existing cache files during ‘generate’

Returns:
  dict with keys:
    ‘action’: str, action performed
    ‘cache_dir’: str, resolved cache directory
    ‘config_dir’: str, resolved config directory
    ‘status’: ‘ok’ or ‘error’
    ‘count’/‘files’/‘deleted’/‘generated’/etc.: action-specific data

Raises:
  ValueError for an invalid action or missing required args
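To illustrate the dispatch-and-return-a-dict shape described above, here is a sketch covering only the filesystem-only actions (list and clean). The name manage_cache_sketch and its signature are hypothetical, it takes an already-resolved cache_dir instead of applying the precedence rules, and it leaves out config_dir; actions that need HEALPix computation are deliberately not implemented.

```python
from pathlib import Path
from typing import List, Optional

VALID_ACTIONS = {"list", "generate", "verify", "clean", "info", "config"}


def manage_cache_sketch(cache_dir: Path, action: str = "list",
                        nsides: Optional[List[int]] = None) -> dict:
    """Dispatch a cache action and return a result dict (illustrative subset)."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"Invalid action: {action!r}")
    if action in {"generate", "verify"} and not nsides:
        raise ValueError(f"Action {action!r} requires at least one nside")
    cache_dir = Path(cache_dir)
    result = {"action": action, "cache_dir": str(cache_dir), "status": "ok"}
    if action == "list":
        files = sorted(p.name for p in cache_dir.glob("*.parquet"))
        result.update(count=len(files), files=files)
    elif action == "clean":
        deleted = []
        for p in sorted(cache_dir.glob("*.parquet")):
            p.unlink()  # irreversible, like `healpyxel-cache --clean`
            deleted.append(p.name)
        result["deleted"] = deleted
    else:
        # generate/verify/info/config need HEALPix code; out of scope for this sketch.
        raise NotImplementedError(action)
    return result
```

Keeping the core logic free of CLI framework imports, as the docstring describes, is what lets a function like this be tested directly without invoking a command-line parser.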

Caching Tests

Verify XDG precedence logic and cache I/O roundtrip.


test_cache_verification_corrupt_nans

 test_cache_verification_corrupt_nans ()

Test cache verification with NaN values in coordinates.


test_cache_verification_incomplete

 test_cache_verification_incomplete ()

Test cache verification with incomplete cache (missing pixels).


test_cache_verification_missing

 test_cache_verification_missing ()

Test cache verification with missing cache file.


test_cache_verification_complete

 test_cache_verification_complete ()

Test cache verification with a complete valid cache.


test_cache_mode_require_missing_cache

 test_cache_mode_require_missing_cache ()

Verify that cache_mode=‘require’ raises ValueError when cache is missing.


test_spherical_conversion

 test_spherical_conversion ()

Verify spherical to lon/lat conversion.


test_cache_key_generation

 test_cache_key_generation ()

Verify cache key generation.
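The cache file names used elsewhere on this page follow the pattern nside_032_nest_spherical.parquet. A sketch of a key generator consistent with those names, assuming zero-padding to three digits and a "ring" token for ring ordering (both inferred from the examples, not confirmed by the source); cache_filename is a hypothetical name.

```python
def cache_filename(nside: int, order: str = "nested") -> str:
    """Build a cache file name such as 'nside_256_nest_spherical.parquet'."""
    if nside < 1 or nside & (nside - 1):
        raise ValueError(f"nside must be a positive power of 2, got {nside}")
    if order not in ("nested", "ring"):
        raise ValueError(f"order must be 'nested' or 'ring', got {order!r}")
    scheme = "nest" if order == "nested" else "ring"  # 'ring' token is an assumption
    # :03d pads small nsides (32 -> 032); larger values simply use more digits.
    return f"nside_{nside:03d}_{scheme}_spherical.parquet"
```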


test_xdg_precedence

 test_xdg_precedence ()

Verify XDG directory resolution with full precedence.

HEALPix Grid Caching System

This module provides a robust caching system for HEALPix cell geometries to accelerate repeated conversions.

Key Features

  1. XDG Base Directory Compliant — Follows freedesktop.org standards for cross-platform cache storage
  2. Explicit Precedence — Clear resolution order: CLI arg > env var > config file > XDG defaults
  3. Spherical Coordinate Storage — Caches (theta, phi) radians in parquet for format-agnostic reuse
  4. Smart Subsetting — For sparse aggregates, loads only the required pixels from cache
  5. Strict Cache Modes — Prevents accidental full-grid computation with explicit cache policies

Cache Modes

The cache_mode parameter provides strict control over caching behavior:

| Mode | Behavior | Use case | Safety guarantee |
|---|---|---|---|
| use | Opportunistic: load cache if available, compute missing pixels on demand | Development, interactive analysis | ⚠️ May trigger expensive computation if the cache is incomplete |
| require | Strict: fail immediately if the cache is missing or incomplete | CI/CD pipelines, production ETL | ✅ Never silently computes the full grid |
| off | Ignore the cache entirely; always compute from scratch | Testing, benchmarking, one-off analysis | ⚠️ Always pays the full computation cost |

Production Recommendation: Use cache_mode='require' in automated pipelines to prevent accidental multi-hour computations when cache is stale or missing.

Usage Examples:

# Development: opportunistic cache use (default)
healpyxel_to_geoparquet -a data.parquet

# Production: fail-fast if cache missing (recommended for CI/CD)
healpyxel_to_geoparquet -a data.parquet --cache-mode require

# Ignore cache entirely
healpyxel_to_geoparquet -a data.parquet --cache-mode off
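The three policies can be modelled as a small decision function. This is a sketch under stated assumptions: resolve_boundaries, the dict-based cache, and the compute callback are all hypothetical stand-ins for the parquet cache and the real boundary computation. On a cache miss under 'use', it computes only the requested pixels, as the healpix_to_geodataframe docstring describes.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical boundary record, e.g. (theta_0..theta_3, phi_0..phi_3) in radians.
Boundary = Tuple[float, ...]


def resolve_boundaries(
    pixels: Iterable[int],
    cache: Optional[Dict[int, Boundary]],
    mode: str,
    compute: Callable[[int], Boundary],
) -> Dict[int, Boundary]:
    """Return boundaries for `pixels` following a use/require/off cache policy."""
    if mode not in {"use", "require", "off"}:
        raise ValueError(f"cache_mode must be 'use', 'require' or 'off', got {mode!r}")
    pixels = list(pixels)
    if mode == "off" or (mode == "use" and cache is None):
        # No usable cache: compute every requested pixel.
        return {p: compute(p) for p in pixels}
    if cache is None:
        # mode == "require" and no cache file: fail fast, compute nothing.
        raise ValueError("Cache required but not found")
    missing = [p for p in pixels if p not in cache]
    if missing and mode == "require":
        raise ValueError(f"Cache incomplete: {len(missing)} requested pixels missing")
    result = {p: cache[p] for p in pixels if p in cache}
    # mode == "use": fill gaps on demand (the potentially expensive path).
    result.update({p: compute(p) for p in missing})
    return result
```

The key design point is that 'require' raises before calling compute at all, so a stale cache surfaces as an immediate error rather than as a long-running job.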

Production Pipeline Workflow

For reliable CI/CD and production ETL, follow this pattern:

# 1. Generate cache for required nsides (run once or in setup stage)
healpyxel-cache --generate 256 --generate 512 --generate 1024

# 2. Verify cache integrity (recommended in CI)
healpyxel-cache --verify 256 --verify 512 --verify 1024

# 3. Process data with strict cache policy (fail if cache missing)
healpyxel_to_geoparquet -a batch_001.parquet --cache-mode require -n 256
healpyxel_to_geoparquet -a batch_002.parquet --cache-mode require -n 512
healpyxel_to_geoparquet -a batch_003.parquet --cache-mode require -n 1024

# 4. Cache becomes stale? Regenerate with --force
healpyxel-cache --generate 256 --force

Why This Matters:

- cache_mode='use' (default) will silently compute 50M+ boundaries if the cache is missing at nside=2048
- A sparse aggregate with 10 pixels at nside=2048 plus a missing cache = catastrophic performance regression
- cache_mode='require' turns this into an explicit error instead of a silent 3-hour job
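The "50M+" figure follows from the HEALPix cell count, npix = 12 · nside². A quick check of the nsides used in the examples on this page:

```python
def npix(nside: int) -> int:
    """Number of HEALPix cells at a given nside: 12 * nside**2."""
    return 12 * nside * nside


for nside in (256, 512, 1024, 2048):
    print(f"nside={nside}: {npix(nside):,} cells")
# 256 -> 786,432; 512 -> 3,145,728; 1024 -> 12,582,912; 2048 -> 50,331,648
```

A 10-pixel sparse aggregate at nside=2048 needs roughly one cell in five million, which is why accidentally falling back to a full-grid computation is so costly.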

Directory Resolution Precedence

Both cache and config use XDG Base Directory Specification with explicit precedence:

| Rank | Method | Example | Scope |
|---|---|---|---|
| 1 | CLI argument | --cache-dir /tmp | This command only |
| 2 | Environment variable | HEALPYXEL_CACHE=/mnt/ssd | This shell session |
| 3 | Config file | ~/.config/healpyxel/settings.ini | Persistent (all sessions) |
| 4 | XDG env var | $XDG_CACHE_HOME or $XDG_CONFIG_HOME | System-wide (multi-user systems) |
| 5 | XDG defaults | ~/.cache or ~/.config | Fallback (POSIX standard) |

Effective paths:

# Default (nothing configured)
Cache:  $HOME/.cache/healpyxel/healpix_grids
Config: $HOME/.config/healpyxel

# With XDG_CACHE_HOME set
Cache:  $XDG_CACHE_HOME/healpyxel/healpix_grids

# With HEALPYXEL_CACHE env var (overrides XDG)
Cache:  $HEALPYXEL_CACHE

# CLI arg (overrides everything)
healpyxel-cache --cache-dir /custom/path --list
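The five-rank precedence for the cache directory can be expressed as a chain of early returns. A minimal sketch, assuming the config file's cache_dir value has already been read (with 'auto' meaning "defer to XDG"); resolve_cache_dir is a hypothetical name, and env/home are injectable so the logic can be tested without touching the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(cli_arg: Optional[str] = None,
                      env: Optional[Mapping[str, str]] = None,
                      config_value: str = "auto",
                      home: Optional[Path] = None) -> Path:
    """Resolve the cache directory: CLI > HEALPYXEL_CACHE > config > XDG > default."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    if cli_arg:                                    # 1. CLI argument
        return Path(cli_arg)
    if env.get("HEALPYXEL_CACHE"):                 # 2. package env var
        return Path(env["HEALPYXEL_CACHE"])
    if config_value and config_value != "auto":    # 3. settings.ini override
        return Path(config_value)
    xdg = env.get("XDG_CACHE_HOME")                # 4. XDG env var
    base = Path(xdg) if xdg else home / ".cache"   # 5. XDG default
    return base / "healpyxel" / "healpix_grids"
```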

Configuration File

Location: $XDG_CONFIG_HOME/healpyxel/settings.ini (or ~/.config/healpyxel/settings.ini)

Auto-created on first use. Edit manually to customize:

# ~/.config/healpyxel/settings.ini

[cache]
# Cache directory for HEALPix grids (parquet files with spherical coordinates)
# Special value 'auto' means use XDG resolution
cache_dir = auto

# Precomputed nsides (comma-separated) to generate/cache automatically
precomputed_nsides = 32,64,128,256

[general]
# Whether to fix antimeridian-crossing polygons during boundary computation
fix_antimeridian = true

# Tolerance in degrees for antimeridian detection (advanced)
antimeridian_tolerance = 1.0
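Because the file is standard INI, it can be read with configparser, which treats full-line # comments as comments by default. A sketch of parsing it into typed values; load_settings and its fallback values are hypothetical and only mirror the defaults shown above.

```python
import configparser

SETTINGS_INI = """\
[cache]
# Special value 'auto' means use XDG resolution
cache_dir = auto
precomputed_nsides = 32,64,128,256

[general]
fix_antimeridian = true
antimeridian_tolerance = 1.0
"""


def load_settings(text: str) -> dict:
    """Parse settings.ini text into typed values, falling back to defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw_nsides = parser.get("cache", "precomputed_nsides", fallback="")
    return {
        "cache_dir": parser.get("cache", "cache_dir", fallback="auto"),
        "precomputed_nsides": [int(x) for x in raw_nsides.split(",") if x.strip()],
        "fix_antimeridian": parser.getboolean("general", "fix_antimeridian", fallback=True),
        "antimeridian_tolerance": parser.getfloat("general", "antimeridian_tolerance", fallback=1.0),
    }
```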

Environment Variables

| Variable | Purpose | Example |
|---|---|---|
| HEALPYXEL_CACHE | Cache directory (session override) | export HEALPYXEL_CACHE=/fast/disk |
| HEALPYXEL_CONFIG | Config directory (session override) | export HEALPYXEL_CONFIG=~/.healpyxel_alt |
| XDG_CACHE_HOME | XDG cache root (system-wide) | Standard: leave unset (defaults to ~/.cache) |
| XDG_CONFIG_HOME | XDG config root (system-wide) | Standard: leave unset (defaults to ~/.config) |

Cache Management Commands

List cached grids:

healpyxel-cache --list
# Output:
# Cached grids (3):
#   nside_032_nest_spherical.parquet     12288 cells  (3.2 MB)
#   nside_256_nest_spherical.parquet    786432 cells  (25.6 MB)
#   nside_512_nest_spherical.parquet   3145728 cells  (102.4 MB)

Generate cache for specific nsides:

healpyxel-cache --generate 32 --generate 256 --generate 512
# Computes and caches all three at once

Verify cache integrity (recommended for CI):

healpyxel-cache --verify 256 --verify 512
# Checks:
#   ✓ All expected pixels present (no missing cells)
#   ✓ No NaN values in coordinate columns
#   ✓ Correct schema (theta_0...3, phi_0...3, healpix_id)
#   ✓ healpix_id values in valid range [0, npix)
# Returns non-zero exit code if any check fails
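The four checks can be sketched over rows represented as plain dicts. This assumes the schema is healpix_id plus theta_0 to theta_3 and phi_0 to phi_3 (read from the "Correct schema" line above). verify_rows is a hypothetical name, and the real command presumably works on a parquet/DataFrame instead of dicts.

```python
import math
from typing import Dict, List, Sequence

# Assumed column layout, read from the "Correct schema" check above.
REQUIRED_COLUMNS = ["healpix_id"] + [f"theta_{k}" for k in range(4)] + [f"phi_{k}" for k in range(4)]
COORD_COLUMNS = REQUIRED_COLUMNS[1:]


def verify_rows(rows: Sequence[Dict[str, float]], nside: int) -> List[str]:
    """Return a list of problems found; an empty list means the cache passed."""
    npix = 12 * nside * nside
    missing_cols = sorted({c for r in rows for c in REQUIRED_COLUMNS if c not in r})
    if missing_cols:
        return [f"Schema error: missing columns {missing_cols}"]
    errors: List[str] = []
    ids = [int(r["healpix_id"]) for r in rows]
    bad_ids = [i for i in ids if not 0 <= i < npix]
    if bad_ids:
        errors.append(f"{len(bad_ids)} healpix_id values outside [0, {npix})")
    found = len({i for i in ids if 0 <= i < npix})  # unique, in-range ids
    if found != npix:
        errors.append(f"Expected {npix} pixels, found {found} ({npix - found} missing)")
    nan_rows = sum(1 for r in rows if any(math.isnan(r[c]) for c in COORD_COLUMNS))
    if nan_rows:
        errors.append(f"{nan_rows} rows contain NaN coordinates")
    return errors
```

A CLI wrapper would then exit with a non-zero status whenever the returned list is non-empty.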

Show configuration and precedence:

healpyxel-cache --config
# Output:
# Config file: /home/user/.config/healpyxel/settings.ini
# Exists: true
#
# Current Settings:
#   cache_dir: auto (XDG)
#   precomputed_nsides: [32, 64, 128, 256]
#   fix_antimeridian: true
#   antimeridian_tolerance: 1.0
#
# Precedence Resolution:
#   cache_dir_resolved: /home/user/.cache/healpyxel/healpix_grids

Clean cache (remove all files):

healpyxel-cache --clean
# WARNING: Deletes all cached grids. Use with caution!

Troubleshooting

Cache not found:

ValueError: Cache required but not found: nside_256_nest_spherical.parquet

→ Solution: Generate cache first: healpyxel-cache --generate 256

Performance regression with sparse aggregates:

# Sparse aggregate with 10 pixels at nside=2048
# Takes 3 hours instead of 10 seconds

→ Root cause: Missing cache forces full 50M pixel computation
→ Solution: Use cache_mode='require' to fail fast, then generate cache

Cache verification failed:

healpyxel-cache --verify 256
# ERROR: Expected 786432 pixels, found 786000 (432 missing)

→ Solution: Regenerate: healpyxel-cache --generate 256 --force

Wrong cache directory:

healpyxel-cache --list
# Shows 0 files but you know cache exists

→ Diagnosis: Check precedence: healpyxel-cache --config
→ Solution: Set HEALPYXEL_CACHE env var or use --cache-dir explicitly

Polygon creation and antimeridian handling

Main API: build GeoDataFrame and save geoparquet


healpix_to_geodataframe

 healpix_to_geodataframe (nside:int, order:str='nested',
                          lon_convention:str='0_360',
                          pixels:Optional[Iterable[int]]=None,
                          fix_antimeridian:bool=True,
                          chunk_size:int=65536, cache_mode:str='use',
                          cache_dir:Optional[pathlib.Path]=None)

Create a GeoDataFrame of HEALPix cell polygons.

Args:
  nside: HEALPix nside
  order: ‘nested’ or ‘ring’
  lon_convention: ‘0_360’ or ‘-180_180’ (affects polygon coordinates)
  pixels: optional iterable of pixel indices; default = all pixels
  fix_antimeridian: whether to call antimeridian.fix_polygon on polygons crossing the antimeridian
  chunk_size: number of pixels to process per chunk, for memory control
  cache_mode: one of {‘use’, ‘require’, ‘off’}
    - ‘use’: load cache if available, otherwise compute requested pixels only
    - ‘require’: require cache; if missing, raise an error (no computation)
    - ‘off’: ignore cache entirely
  cache_dir: optional cache directory override

Returns:
  GeoDataFrame with columns ‘healpix_id’ and ‘geometry’ (EPSG:4326)
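Cached boundaries are stored as spherical (theta, phi) radians, so building polygons requires a conversion to degrees plus the chosen lon_convention. A sketch assuming healpy's convention that theta is colatitude (0 at the north pole). spherical_to_lonlat and wraps_longitude_seam are hypothetical helpers, and the seam test is a simple span heuristic, not the antimeridian library's algorithm.

```python
import math
from typing import Sequence, Tuple


def spherical_to_lonlat(theta: float, phi: float,
                        lon_convention: str = "0_360") -> Tuple[float, float]:
    """Convert HEALPix colatitude/longitude (radians) to (lon, lat) in degrees."""
    lat = 90.0 - math.degrees(theta)   # theta is colatitude: 0 at the north pole
    lon = math.degrees(phi) % 360.0    # normalise to [0, 360)
    if lon_convention == "-180_180":
        if lon >= 180.0:
            lon -= 360.0               # map [180, 360) to [-180, 0)
    elif lon_convention != "0_360":
        raise ValueError(f"Unknown lon_convention: {lon_convention!r}")
    return lon, lat


def wraps_longitude_seam(lons: Sequence[float]) -> bool:
    """Heuristic: a small cell whose corner longitudes span > 180 deg wraps the seam."""
    return max(lons) - min(lons) > 180.0
```

With the -180_180 convention, a cell straddling 180° has corners near both +180 and -180, which is exactly the case that fix_antimeridian exists to repair.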


save_healpix_to_geoparquet

 save_healpix_to_geoparquet (nside:int,
                             output_path:Union[str,pathlib.Path],
                             order:str='nested',
                             lon_convention:str='0_360',
                             fix_antimeridian:bool=True,
                             chunk_size:int=65536,
                             parquet_kwargs:Optional[dict]=None)

Build a HEALPix vector layer and save it as GeoParquet. The output file contains one polygon per HEALPix cell. For large nside values, consider increasing available memory or tuning chunk_size.

Args:
  nside: HEALPix nside
  output_path: path to the output GeoParquet file
  order: ‘nested’ or ‘ring’
  lon_convention: ‘0_360’ or ‘-180_180’
  fix_antimeridian: whether to fix antimeridian-wrapping polygons
  chunk_size: pixels per chunk when building geometries
  parquet_kwargs: forwarded to GeoDataFrame.to_parquet

Returns:
  Path to the written file
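The chunk_size parameter bounds how many pixels are processed at once. A sketch of one way to split the work, assuming chunks are contiguous index ranges for the full grid and slices of the explicit list for a sparse subset; iter_pixel_chunks is a hypothetical helper, not part of the documented API.

```python
from typing import Iterable, Iterator, Optional, Sequence


def iter_pixel_chunks(nside: int, chunk_size: int = 65536,
                      pixels: Optional[Iterable[int]] = None) -> Iterator[Sequence[int]]:
    """Yield pixel indices in chunks of at most `chunk_size`."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    if pixels is None:
        npix = 12 * nside * nside
        # Full grid: yield lazy ranges so no npix-sized list is materialised.
        for start in range(0, npix, chunk_size):
            yield range(start, min(start + chunk_size, npix))
    else:
        selected = list(pixels)  # sparse subset: slice the explicit list
        for start in range(0, len(selected), chunk_size):
            yield selected[start:start + chunk_size]
```

With the default chunk_size, nside=256 (786,432 cells) splits into exactly 12 chunks.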


export_healpix_to_geotiff

 export_healpix_to_geotiff (df:pandas.core.frame.DataFrame, column:str,
                            output_path:Union[str,pathlib.Path],
                            nside:int, order:str='nested',
                            crs:str='IAU:19900', width:int=1440,
                            height:int=720)

Export a HEALPix column to GeoTIFF (requires rasterio + healpy).

Args:
  df: DataFrame with a healpix_id index or healpix_id column
  column: data column to export
  output_path: GeoTIFF output path
  nside: HEALPix nside
  order: ‘nested’ or ‘ring’
  crs: CRS string for the GeoTIFF
  width: output raster width (pixels)
  height: output raster height (pixels)

Returns:
  Path to the written GeoTIFF

| | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | |
| column | str | | |
| output_path | Union[str, Path] | | |
| nside | int | | |
| order | str | nested | |
| crs | str | IAU:19900 | Mercury IAU CRS |
| width | int | 1440 | |
| height | int | 720 | |
| Returns | Path | | |
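With the default 1440 × 720 raster, each pixel covers 0.25° × 0.25° if the raster spans the whole globe. The sketch below assumes a global equirectangular grid with longitudes from -180 to 180; this extent is not confirmed by the source. raster_cell_centers is hypothetical.

```python
from typing import List, Tuple


def raster_cell_centers(width: int = 1440, height: int = 720) -> Tuple[List[float], List[float]]:
    """Cell-centre longitudes/latitudes of a global equirectangular grid."""
    dx = 360.0 / width   # degrees per column
    dy = 180.0 / height  # degrees per row
    lons = [-180.0 + (col + 0.5) * dx for col in range(width)]
    lats = [90.0 - (row + 0.5) * dy for row in range(height)]  # row 0 is the northern edge
    return lons, lats
```

Each (lon, lat) centre could then be mapped to a HEALPix index with healpy.ang2pix(nside, lon, lat, nest=(order == "nested"), lonlat=True) and filled from the matching row of df.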

Quick test

CLI with Metadata Auto-Detection

The CLI can infer conversion parameters from a metadata sidecar:

Metadata Sidecar Pattern:

- For the aggregate sample_50k_nside256_aggregate.parquet, place metadata at sample_50k_nside256_aggregate.meta.json
- The CLI automatically loads it and extracts nside, order, and lon_convention

Parameter Resolution Precedence:

1. CLI args (highest priority): explicit user override
2. Metadata: from the .meta.json sidecar (if present)
3. Defaults: fallback values or inference from the aggregate

lon_convention Behavior:

- --lon-convention auto (default) → searches metadata, falls back to 0_360
- --lon-convention 0_360 or -180_180 → explicit override
- Prevents confusion about which convention was used during aggregation
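The CLI > metadata > defaults rule, including the special 'auto' value, fits in a few lines. This is a sketch: resolve_param is a hypothetical name, the 'nested' default for order is borrowed from the Python API defaults (the CLI's own default is not stated here), and nside deliberately has no default because the source infers it from the aggregate instead.

```python
from typing import Any, Mapping, Optional, Tuple

# Assumed defaults; nside has none because it is inferred from the aggregate.
DEFAULTS = {"order": "nested", "lon_convention": "0_360"}


def resolve_param(name: str, cli_value: Optional[Any],
                  metadata: Mapping[str, Any]) -> Tuple[Any, str]:
    """Return (value, source) using CLI > metadata > defaults."""
    if cli_value is not None and cli_value != "auto":
        return cli_value, "cli"            # explicit user override wins
    if metadata.get(name) is not None:
        return metadata[name], "metadata"  # value recorded at aggregation time
    if name in DEFAULTS:
        return DEFAULTS[name], "default"
    raise ValueError(f"{name} was not given on the command line and is not in the metadata")
```

Returning the source alongside the value makes it easy to log messages like "Using lon_convention=0_360 from metadata".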

Usage Examples:

# Zero-config: metadata has all parameters
healpyxel_to_geoparquet -a sample_50k_nside256_aggregate.parquet

# Override metadata
healpyxel_to_geoparquet -a sample_50k_nside256_aggregate.parquet -l -180_180 -O ring

# Batch mode with metadata
healpyxel_to_geoparquet -a data.parquet -y  # Auto-confirm overwrites

main

 main ()

CLI entry point for healpyxel_to_geoparquet.

Converts aggregate parquet output to GeoParquet with HEALPix cell geometry. Automatically infers nside from the aggregate row count (dense mode) or filename (sparse mode). The output filename is constructed as {input_stem}{suffix}.parquet; the default suffix is ‘.geo’, so ‘sample_50k_nside256_aggregate.parquet’ → ‘sample_50k_nside256_aggregate.geo.parquet’.
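Dense-mode inference works because a full grid has exactly 12 · nside² rows, so nside can be recovered by inverting that formula. A sketch of both inference paths and the output naming rule. The three function names are hypothetical, and the filename pattern (nside followed by digits, e.g. nside256) is inferred from the example file name.

```python
import math
import re
from pathlib import Path
from typing import Optional, Union


def infer_nside_from_rows(n_rows: int) -> Optional[int]:
    """Dense mode: return nside if n_rows == 12 * nside**2 for a power-of-2 nside."""
    if n_rows <= 0 or n_rows % 12:
        return None
    nside = math.isqrt(n_rows // 12)
    if 12 * nside * nside == n_rows and nside & (nside - 1) == 0:
        return nside
    return None  # row count does not match a full grid: likely sparse


def infer_nside_from_filename(name: str) -> Optional[int]:
    """Sparse mode: parse e.g. 'nside256' out of the file name."""
    m = re.search(r"nside[_-]?(\d+)", name)
    return int(m.group(1)) if m else None


def output_path(input_path: Union[str, Path], suffix: str = ".geo") -> Path:
    """Build {input_stem}{suffix}.parquet next to the input."""
    p = Path(input_path)
    return p.with_name(f"{p.stem}{suffix}.parquet")
```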

Comparison: Old vs. New UX

| Scenario | Old | New |
|---|---|---|
| With metadata sidecar | healpyxel_to_geoparquet -a data.parquet -l 0_360 -O nested | healpyxel_to_geoparquet -a data.parquet ✓ Zero-config |
| Sparse aggregate | Must pass -n 256 explicitly | Can pass -n 256 OR use metadata |
| Different lon convention | Defaults to 0_360, must override | Auto-detects from metadata |
| Error on parameter mismatch | No validation (risk of wrong geometry) | Metadata enforces consistency |

Key Benefits:

- ✅ Reduced UX friction: one argument instead of 3–4
- ✅ Consistency: geometry respects the aggregation parameters recorded in metadata
- ✅ Backward compatible: all explicit args still work and override metadata
- ✅ Safe defaults: the -180_180 lon convention is used automatically if that is what the data was processed with

Implementation: Why This Approach Wins

Architecture Decision: Metadata Sidecar Pattern

Three approaches were considered; option 2 (metadata sidecar) was chosen:

| Approach | Trade-offs | Winner? |
|---|---|---|
| Option 1: Auto mode for lon_convention | Only solves one param; nside/order still require explicit args | ❌ Partial solution |
| Option 2: Pass metadata directly | Higher UX friction (need to know metadata path); metadata is parallel to aggregate | ✅ Best |
| Option 3: Flexible input (parquet OR metadata) | Complex parsing logic; confusing precedence | ❌ Overengineered |

Why We Chose Option 2 (Enhanced):

- Metadata .meta.json files are already generated alongside aggregates by the pipeline → zero user effort to provide them
- A single metadata file contains all context: nside, order, lon_convention, timestamps, processing params
- Sidecar metadata files are a common pattern in geospatial and scientific tooling (e.g., GDAL's .aux.xml files)
- Auto-discovery: the user only passes the aggregate path; the CLI looks for {aggregate_stem}.meta.json
- Backward compatible: explicit CLI args still override when needed (e.g., testing with different parameters)

Why This Beats Manual Overrides:

- Old way: healpyxel_to_geoparquet -a data.parquet -n 256 -O nested -l 0_360 (4 params to remember)
- New way: healpyxel_to_geoparquet -a data.parquet (metadata does the work)
- Problem solved: users can't accidentally build geometries with the wrong lon_convention → no more coordinate mismatches

Summary: Metadata Auto-Detection Workflow

Problem: lon_convention (like nside and order) is already recorded in the aggregate's metadata, so how should --lon-convention be handled?

Solution: Auto-detect the metadata sidecar and resolve each parameter by precedence.

What Changed

New Behavior:

1. The CLI automatically discovers {aggregate_stem}.meta.json in the same directory.
2. It extracts nside, order, and lon_convention from these metadata keys:
   - ["sidecar_metadata"]["healpix"]["nside"]
   - ["sidecar_metadata"]["healpix"]["order"]
   - ["sidecar_metadata"]["coordinates"]["lon_convention"]
3. The default for --lon-convention changed from '0_360' to 'auto':
   - 'auto' → search metadata, fall back to '0_360' if not found
   - '0_360' or '-180_180' → explicit override (ignores metadata)

Parameter Precedence (highest to lowest):

CLI args > metadata > defaults

Code Changes

Two new helper functions:

- _load_metadata_for_aggregate(agg_path) → loads the .meta.json sidecar (fails quietly if missing)
- _extract_healpix_params_from_metadata(metadata) → extracts nside, order, lon_convention
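A standalone sketch of what the two helpers do, based only on the sidecar naming rule and key paths listed above. The names load_sidecar and extract_healpix_params are hypothetical stand-ins for the private helpers, whose actual code may differ; "quiet fail" is modelled as returning None.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional, Union


def load_sidecar(agg_path: Union[str, Path]) -> Optional[Dict[str, Any]]:
    """Load {aggregate_stem}.meta.json next to the aggregate; None if absent or unreadable."""
    agg_path = Path(agg_path)
    meta_path = agg_path.parent / f"{agg_path.stem}.meta.json"
    if not meta_path.is_file():
        return None
    try:
        with meta_path.open(encoding="utf-8") as fh:
            return json.load(fh)
    except (OSError, json.JSONDecodeError):
        return None  # quiet failure: fall back to CLI args/defaults


def extract_healpix_params(metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Pull nside/order/lon_convention from the documented key paths (None if absent)."""
    sidecar = (metadata or {}).get("sidecar_metadata", {})
    healpix = sidecar.get("healpix", {})
    coords = sidecar.get("coordinates", {})
    return {
        "nside": healpix.get("nside"),
        "order": healpix.get("order"),
        "lon_convention": coords.get("lon_convention"),
    }
```

Returning None for absent keys, rather than raising, leaves the final decision to the CLI > metadata > defaults precedence step.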

Updated main() CLI:

- Option --lon-convention now accepts ['0_360', '-180_180', 'auto']
- Improved error message for sparse aggregates (mentions the metadata option)
- Logs which source was used: “Using lon_convention=0_360 from metadata” or “Using default…”

Usage

Zero-config (best case):

healpyxel_to_geoparquet -a sample_50k_nside256_aggregate.parquet
# Auto-detects: nside, order, lon_convention from metadata

Override metadata (for testing/validation):

healpyxel_to_geoparquet -a data.parquet -l -180_180 -n 256
# -l and -n override metadata; order still comes from metadata

Batch mode with metadata:

healpyxel_to_geoparquet -a data.parquet -y
# -y auto-confirms overwrites, metadata provides all params

Testing ✓

  • Metadata extraction logic verified
  • Precedence (CLI > metadata > defaults) tested
  • Helper functions properly exported for nbdev