Healpyxel

Development: opportunistic cache use (default)

Core helpers

Caching & XDG Configuration

HEALPix grids are expensive to compute. This module supports caching boundaries in parquet files using XDG Base Directory standards and a persistent configuration.

Precedence for directory resolution (highest to lowest):

  1. CLI argument (e.g., --cache-dir /tmp)
  2. Environment variable (e.g., HEALPYXEL_CACHE=/fast/disk)
  3. XDG spec: $XDG_CACHE_HOME or $XDG_CONFIG_HOME
  4. Fallback: ~/.cache/healpyxel/healpix_grids or ~/.config/healpyxel
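The precedence chain for the cache directory can be sketched as a small resolver. This is an illustration only: `resolve_cache_dir` is a hypothetical name (the module's own resolver is `_get_cache_dir()`), and the `healpyxel/healpix_grids` subpath under `$XDG_CACHE_HOME` is an assumption mirrored from the documented fallback path.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(cli_arg: Optional[str] = None,
                      env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical resolver: CLI > HEALPYXEL_CACHE > XDG_CACHE_HOME > ~/.cache."""
    env = os.environ if env is None else env
    if cli_arg:                                    # 1. explicit CLI argument
        return Path(cli_arg)
    if env.get("HEALPYXEL_CACHE"):                 # 2. environment variable
        return Path(env["HEALPYXEL_CACHE"])
    if env.get("XDG_CACHE_HOME"):                  # 3. XDG base directory (subpath assumed)
        return Path(env["XDG_CACHE_HOME"]) / "healpyxel" / "healpix_grids"
    # 4. documented fallback location
    return Path.home() / ".cache" / "healpyxel" / "healpix_grids"
```

Passing the environment as a mapping keeps the function easy to test without mutating `os.environ`.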

Configuration file: $XDG_CONFIG_HOME/healpyxel/settings.ini (or ~/.config/healpyxel/settings.ini)

  • Controls precomputed nsides, antimeridian handling, cache location override
  • Auto-created on first use; can be edited manually


is_geometry_valid

 is_geometry_valid (geom:shapely.geometry.base.BaseGeometry)

Check if a geometry is valid for use in a spherical projection.

This function ensures that the geometry is either associated with a CRS in degrees or has latitude/longitude coordinates within valid bounds. If no CRS is provided, it checks that the latitude is within [-90, 90] and longitude is within [-180, 360].

Args:
  • geom (BaseGeometry): A Shapely geometry object to validate.

Returns:
  • bool: True if the geometry is valid for spherical projection, False otherwise.

Example:

    >>> from shapely.geometry import Polygon
    >>> geom = Polygon([(-180, -90), (-180, 90), (180, 90), (180, -90), (-180, -90)])
    >>> is_geometry_valid(geom)  # Valid lat/lon bounds
    True
    >>> geom = Polygon([(1e32, 1e32), (180, 90), (180, -90), (1e32, 1e32)])
    >>> is_geometry_valid(geom)  # Invalid due to extreme coordinates
    False
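The bounds check described above can be illustrated without Shapely by working on a `(minx, miny, maxx, maxy)` tuple, which is the shape returned by a Shapely geometry's `.bounds`. `bounds_look_geographic` is a hypothetical helper written for this page, not the body of `is_geometry_valid`; the non-finite check is an added assumption.

```python
import math
from typing import Tuple


def bounds_look_geographic(bounds: Tuple[float, float, float, float]) -> bool:
    """Return True if (minx, miny, maxx, maxy) fits lon [-180, 360] and lat [-90, 90]."""
    minx, miny, maxx, maxy = bounds
    # Reject NaN or infinite coordinates before comparing ranges.
    if not all(math.isfinite(v) for v in bounds):
        return False
    lon_ok = -180.0 <= minx and maxx <= 360.0
    lat_ok = -90.0 <= miny and maxy <= 90.0
    return lon_ok and lat_ok


whole_globe = (-180.0, -90.0, 180.0, 90.0)   # bounds of the first docstring polygon
extreme = (180.0, -90.0, 1e32, 1e32)         # bounds of the second docstring polygon
print(bounds_look_geographic(whole_globe), bounds_look_geographic(extreme))
```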


init_user_config

 init_user_config (config_dir:Optional[pathlib.Path]=None)

Create default ~/.config/healpyxel/settings.ini if it doesn’t exist.

Args:
  • config_dir: optional override for config directory

Returns: Path to config file (whether newly created or already existed)

Example:

    >>> config_file = init_user_config()
    >>> config_file.exists()
    True
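A create-if-missing config file can be sketched with the standard library's `configparser`. `ensure_settings_file` is a hypothetical stand-in for `init_user_config`; the `[cache]` section and `precomputed_nsides` key are illustrative placeholders, since the real file layout is not shown on this page, and the sketch ignores `$XDG_CONFIG_HOME` for brevity.

```python
import configparser
import tempfile
from pathlib import Path
from typing import Optional


def ensure_settings_file(config_dir: Optional[Path] = None) -> Path:
    """Create settings.ini with placeholder defaults unless it already exists."""
    config_dir = config_dir or Path.home() / ".config" / "healpyxel"
    config_file = config_dir / "settings.ini"
    if not config_file.exists():
        config_dir.mkdir(parents=True, exist_ok=True)
        parser = configparser.ConfigParser()
        # Section and key names are illustrative, not the library's schema.
        parser["cache"] = {"precomputed_nsides": "64,128,256"}
        with config_file.open("w") as fh:
            parser.write(fh)
    return config_file


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_settings_file(Path(tmp) / "healpyxel")
    first.write_text("[cache]\nprecomputed_nsides = 8\n")   # simulate a manual edit
    second = ensure_settings_file(Path(tmp) / "healpyxel")  # must not overwrite
    preserved = "= 8" in second.read_text()
```

The second call returns the existing path untouched, which is what makes manual edits safe.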

Cache Management Core Logic

Central dispatch for all cache operations: generate, list, clean, view configuration. Called by thin CLI wrapper in 05_cli.ipynb with no Click dependencies.


manage_healpix_cache

 manage_healpix_cache (action:str='list', nsides:Optional[List[int]]=None,
                       cache_dir:Optional[pathlib.Path]=None,
                       config_dir:Optional[pathlib.Path]=None,
                       force:bool=False)

Core cache management logic with precedence awareness.

No Click dependencies; called by CLI wrapper in 05_cli.ipynb. Uses _get_cache_dir() and _get_config_dir() for proper precedence.

Args:
  • action: ‘list’, ‘generate’, ‘verify’, ‘clean’, ‘info’, or ‘config’
  • nsides: list of nside values for ‘generate’ or ‘verify’ actions
  • cache_dir: explicit CLI override (highest precedence)
  • config_dir: explicit CLI override (highest precedence)
  • force: whether to overwrite existing cache files during ‘generate’

Returns: dict with keys:
  • ‘action’: str, action performed
  • ‘cache_dir’: str, resolved cache directory
  • ‘config_dir’: str, resolved config directory
  • ‘status’: ‘ok’ or ‘error’
  • ‘count’/‘files’/‘deleted’/‘generated’/etc: action-specific data

Raises: ValueError for invalid action or missing required args
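The dispatch-and-report shape described above (an action string in, a status dict out) might look like the following reduced sketch. `manage_cache_sketch` is hypothetical and handles only ‘list’ and ‘clean’ over `*.parquet` files; the cache file name used in the demo is also made up.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict


def manage_cache_sketch(action: str, cache_dir: Path) -> Dict[str, Any]:
    """Illustrative stand-in: dispatch on `action` and return a result dict."""
    result: Dict[str, Any] = {"action": action, "cache_dir": str(cache_dir), "status": "ok"}
    files = sorted(cache_dir.glob("*.parquet")) if cache_dir.is_dir() else []
    if action == "list":
        result["files"] = [f.name for f in files]
        result["count"] = len(files)
    elif action == "clean":
        for f in files:
            f.unlink()                      # delete each cached grid file
        result["deleted"] = len(files)
    else:
        raise ValueError(f"unsupported action: {action!r}")
    return result


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    (cache / "nside_64.parquet").touch()    # hypothetical cache file name
    listed = manage_cache_sketch("list", cache)
    cleaned = manage_cache_sketch("clean", cache)
    relisted = manage_cache_sketch("list", cache)
```

Keeping the logic free of CLI framework calls, as the source notes, lets the same function be driven from tests, notebooks or a thin command-line wrapper.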

Caching Tests

Verify XDG precedence logic and cache I/O roundtrip.


test_cache_verification_corrupt_nans

 test_cache_verification_corrupt_nans ()

Test cache verification with NaN values in coordinates.


test_cache_verification_incomplete

 test_cache_verification_incomplete ()

Test cache verification with incomplete cache (missing pixels).


test_cache_verification_missing

 test_cache_verification_missing ()

Test cache verification with missing cache file.


test_cache_verification_complete

 test_cache_verification_complete ()

Test cache verification with a complete valid cache.
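The four verification cases tested above (missing, incomplete, corrupt NaNs, complete) can be pictured with a small classifier. This is a sketch under assumptions: `verify_cache_sketch` and its status strings are hypothetical, and the cache is modelled as a plain mapping from pixel id to a coordinate pair. The one fixed fact used is that a full HEALPix grid has 12 × nside² pixels.

```python
import math
from typing import Mapping, Optional, Tuple


def verify_cache_sketch(cells: Optional[Mapping[int, Tuple[float, float]]],
                        nside: int) -> str:
    """Classify a cached grid as 'missing', 'incomplete', 'corrupt' or 'complete'."""
    if cells is None:
        return "missing"                       # no cache file at all
    expected_ids = range(12 * nside * nside)   # full-sky pixel count
    if any(pix not in cells for pix in expected_ids):
        return "incomplete"                    # some pixels were never written
    if any(math.isnan(v) for pix in expected_ids for v in cells[pix]):
        return "corrupt"                       # NaN coordinates present
    return "complete"
```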


test_cache_mode_require_missing_cache

 test_cache_mode_require_missing_cache ()

Verify that cache_mode=‘require’ raises ValueError when cache is missing.


test_spherical_conversion

 test_spherical_conversion ()

Verify spherical to lon/lat conversion.
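For reference, a spherical-to-lon/lat conversion could be sketched as below, assuming the healpy-style convention in which theta is the colatitude and phi the longitude, both in radians. `spherical_to_lonlat` is a hypothetical name, and the wrapping rule for ‘-180_180’ (mapping [180, 360) to [-180, 0)) is an assumption.

```python
import math
from typing import Tuple


def spherical_to_lonlat(theta: float, phi: float,
                        lon_convention: str = "0_360") -> Tuple[float, float]:
    """Convert colatitude/longitude in radians to (lon, lat) in degrees."""
    lat = 90.0 - math.degrees(theta)          # colatitude 0 is the north pole
    lon = math.degrees(phi) % 360.0           # normalise into [0, 360)
    if lon_convention == "-180_180" and lon >= 180.0:
        lon -= 360.0                          # shift the western half to negatives
    return lon, lat
```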


test_cache_key_generation

 test_cache_key_generation ()

Verify cache key generation.


test_xdg_precedence

 test_xdg_precedence ()

Verify XDG directory resolution with full precedence.

Polygon creation and antimeridian handling

Main API: build GeoDataFrame and save geoparquet


healpix_to_geodataframe

 healpix_to_geodataframe (nside:int, order:str='nested',
                          lon_convention:str='0_360',
                          pixels:Optional[Iterable[int]]=None,
                          fix_antimeridian:bool=True,
                          chunk_size:int=65536, cache_mode:str='use',
                          cache_dir:Optional[pathlib.Path]=None)

Create a GeoDataFrame of HEALPix cell polygons.

Args:
  • nside: HEALPix nside
  • order: ‘nested’ or ‘ring’
  • lon_convention: ‘0_360’ or ‘-180_180’ (affects polygon coordinates)
  • pixels: optional iterable of pixel indices; default = all pixels
  • fix_antimeridian: whether to call antimeridian.fix_polygon on polygons crossing the antimeridian
  • chunk_size: number of pixels to process per chunk for memory control
  • cache_mode: one of {‘use’, ‘require’, ‘off’}
    • ‘use’: load cache if available, otherwise compute requested pixels only
    • ‘require’: require cache; if missing, raise error (no computation)
    • ‘off’: ignore cache entirely
  • cache_dir: optional cache directory override

Returns: GeoDataFrame with columns ‘healpix_id’ and ‘geometry’ (EPSG:4326)
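The three `cache_mode` values can be summarised in a few lines of control flow. The sketch below is not the library's implementation: `get_cells` is hypothetical, the cache is a plain dict keyed by nside, and the `compute` callback stands in for the expensive polygon construction. Whether ‘use’ also writes freshly computed results back to the cache is not stated here, so the sketch does not.

```python
from typing import Callable, Dict, List


def get_cells(nside: int, cache: Dict[int, List[int]],
              compute: Callable[[int], List[int]],
              cache_mode: str = "use") -> List[int]:
    """Opportunistic cache lookup: 'use', 'require' or 'off'."""
    if cache_mode not in {"use", "require", "off"}:
        raise ValueError(f"unknown cache_mode: {cache_mode!r}")
    if cache_mode != "off" and nside in cache:
        return cache[nside]                    # cache hit ('use' or 'require')
    if cache_mode == "require":
        # Mirrors the documented behaviour: no computation when the cache is required.
        raise ValueError(f"cache required but missing for nside={nside}")
    return compute(nside)                      # 'off', or 'use' with a cache miss


cache = {1: list(range(12))}
calls = []
def compute(nside: int) -> List[int]:
    calls.append(nside)                        # record that real work happened
    return list(range(12 * nside * nside))
```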


save_healpix_to_geoparquet

 save_healpix_to_geoparquet (nside:int,
                             output_path:Union[str,pathlib.Path],
                             order:str='nested',
                             lon_convention:str='0_360',
                             fix_antimeridian:bool=True,
                             chunk_size:int=65536,
                             parquet_kwargs:Optional[dict]=None,
                             overwrite:bool=False, interactive:bool=True)

Build HEALPix vector layer and save as GeoParquet.

Args:
  • nside: HEALPix nside
  • output_path: Path to output GeoParquet file
  • order: ‘nested’ or ‘ring’
  • lon_convention: ‘0_360’ or ‘-180_180’
  • fix_antimeridian: Whether to fix antimeridian-wrapping
  • chunk_size: Pixels per chunk when building geometries
  • parquet_kwargs: Forwarded to GeoDataFrame.to_parquet
  • overwrite: Whether to overwrite the file if it exists (default: False)
  • interactive: If True, prompt the user for confirmation before overwriting

Returns: Path to the written file

Raises: FileExistsError: If the file exists and overwrite is False
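The overwrite guard documented above can be illustrated as follows. `check_output_path` is a hypothetical helper; it covers only the non-interactive path and leaves out the confirmation prompt controlled by `interactive`.

```python
import tempfile
from pathlib import Path
from typing import Union


def check_output_path(output_path: Union[str, Path], overwrite: bool = False) -> Path:
    """Refuse to clobber an existing file unless overwrite=True."""
    path = Path(output_path)
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} exists; pass overwrite=True to replace it")
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "cells.geo.parquet"
    fresh = check_output_path(target)              # nothing there yet: accepted
    target.touch()
    try:
        check_output_path(target)
        refused = False
    except FileExistsError:
        refused = True                             # existing file, overwrite=False
    allowed = check_output_path(target, overwrite=True)
```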


export_healpix_to_geotiff

 export_healpix_to_geotiff (df:pandas.core.frame.DataFrame, column:str,
                            output_path:Union[str,pathlib.Path],
                            nside:int, order:str='nested',
                            crs:str='IAU:19900', width:int=1440,
                            height:int=720)

Export a HEALPix column to GeoTIFF (requires rasterio + healpy).

Args:
  • df: DataFrame with healpix_id index or healpix_id column
  • column: data column to export
  • output_path: GeoTIFF output path
  • nside: HEALPix nside
  • order: ‘nested’ or ‘ring’
  • crs: CRS string for GeoTIFF
  • width: output raster width (pixels)
  • height: output raster height (pixels)

Returns: Path to written GeoTIFF

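| Parameter | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | |
| column | str | | |
| output_path | Union | | |
| nside | int | | |
| order | str | nested | |
| crs | str | IAU:19900 | Mercury IAU CRS |
| width | int | 1440 | |
| height | int | 720 | |
| Returns | Path | | |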

Quick test

CLI with Metadata Auto-Detection

The CLI now supports intelligent parameter inference from metadata sidecars:

Metadata Sidecar Pattern:

  • For aggregate sample_50k_nside256_aggregate.parquet, place metadata at sample_50k_nside256_aggregate.meta.json
  • The CLI automatically loads and extracts: nside, order, lon_convention
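The naming rule above (replace the `.parquet` extension with `.meta.json`) can be sketched with `pathlib`. Both function names are hypothetical; the sketch returns `None` when no sidecar is found, which is one plausible reading of a quiet failure.

```python
import json
from pathlib import Path
from typing import Optional


def sidecar_path(agg_path: Path) -> Path:
    """Map '<stem>.parquet' to '<stem>.meta.json' in the same directory."""
    return agg_path.with_name(agg_path.stem + ".meta.json")


def load_sidecar(agg_path: Path) -> Optional[dict]:
    """Return the parsed sidecar, or None if the file is absent."""
    path = sidecar_path(agg_path)
    if not path.is_file():
        return None
    with path.open() as fh:
        return json.load(fh)


print(sidecar_path(Path("sample_50k_nside256_aggregate.parquet")).name)
```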

Parameter Resolution Precedence:

  1. CLI args (highest priority): explicit user override
  2. Metadata: from .meta.json sidecar (if present)
  3. Defaults: fallback values or inference from aggregate

lon_convention Behavior:

  • --lon-convention auto (default) → searches metadata, falls back to 0_360
  • --lon-convention 0_360 or -180_180 → explicit override
  • Prevents user confusion about which convention was used in aggregation
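The resolution order for `lon_convention` (explicit CLI value, then metadata, then the 0_360 default) fits in a few lines. `resolve_lon_convention` is a hypothetical helper; it also returns the source of the value, in the spirit of the CLI logging which source was used.

```python
from typing import Optional, Tuple


def resolve_lon_convention(cli_value: str = "auto",
                           metadata_value: Optional[str] = None,
                           default: str = "0_360") -> Tuple[str, str]:
    """Return (value, source) following CLI > metadata > default."""
    if cli_value != "auto":
        return cli_value, "cli"              # explicit override wins
    if metadata_value is not None:
        return metadata_value, "metadata"    # taken from the sidecar
    return default, "default"                # nothing else available


print(resolve_lon_convention("auto", "-180_180"))
```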

Usage Examples:

# Zero-config: metadata has all parameters
healpyxel_to_geoparquet -a sample_50k_nside256_aggregate.parquet

# Override metadata
healpyxel_to_geoparquet -a sample_50k_nside256_aggregate.parquet -l -180_180 -O ring

# Batch mode with metadata
healpyxel_to_geoparquet -a data.parquet -y  # Auto-confirm overwrites

main

 main ()

CLI entry point for healpyxel_to_geoparquet.

Converts aggregate parquet output with HEALPix geometry to GeoParquet. Automatically infers nside from aggregate row count (dense mode) or filename (sparse mode).

Output filename is constructed as: {input_stem}{suffix}.parquet. Default suffix is ‘.geo’, so ‘sample_50k_nside256_aggregate.parquet’ → ‘sample_50k_nside256_aggregate.geo.parquet’.
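The output naming and the two nside inference routes can be sketched as below. All three function names are hypothetical. The dense-mode rule relies on a full HEALPix map having 12 × nside² rows; the `nside(\d+)` filename pattern is an assumption based on the example file name, not a documented contract.

```python
import math
import re
from pathlib import Path
from typing import Optional


def output_name(input_path: str, suffix: str = ".geo") -> str:
    """Build '{input_stem}{suffix}.parquet'."""
    return f"{Path(input_path).stem}{suffix}.parquet"


def nside_from_row_count(n_rows: int) -> Optional[int]:
    """Dense mode: accept only row counts of the form 12 * nside**2."""
    nside = math.isqrt(n_rows // 12)
    return nside if n_rows > 0 and 12 * nside * nside == n_rows else None


def nside_from_filename(name: str) -> Optional[int]:
    """Sparse mode: look for an 'nside<digits>' token (pattern assumed)."""
    match = re.search(r"nside(\d+)", name)
    return int(match.group(1)) if match else None
```

A row count of 786432 equals 12 × 256², so dense mode yields 256; a 50 000-row sparse aggregate fails that check and has to fall back to the filename, the sidecar, or an explicit -n.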

Comparison: Old vs. New UX

| Scenario | Old | New |
|---|---|---|
| With metadata sidecar | healpyxel_to_geoparquet -a data.parquet -l 0_360 -O nested | healpyxel_to_geoparquet -a data.parquet ✓ Zero-config |
| Sparse aggregate | Must pass -n 256 explicitly | Can pass -n 256 OR use metadata |
| Different lon convention | Defaults to 0_360, must override | Auto-detects from metadata |
| Error on parameter mismatch | No validation (risk of wrong geometry) | Metadata enforces consistency |

Key Benefits:

  • ✅ Reduced UX friction: One argument instead of 3–4
  • ✅ Consistency: Geometry respects aggregation parameters from metadata
  • ✅ Backward compatible: All explicit args still work and override metadata
  • ✅ Safe defaults: the -180_180 lon convention is now used automatically if that is what the data was processed with

Implementation: Why This Approach Wins

Architecture Decision: Metadata Sidecar Pattern

Three approaches were considered; option 2 (metadata sidecar) was chosen for the reasons below:

| Approach | Trade-offs | Winner? |
|---|---|---|
| Option 1: Auto mode for lon_convention | Only solves one param; nside/order still require explicit args | ❌ Partial solution |
| Option 2: Pass metadata directly | Higher UX friction (need to know metadata path); metadata is parallel to aggregate | ✅ Best |
| Option 3: Flexible input (parquet OR metadata) | Complex parsing logic; confusing precedence | ❌ Overengineered |

Why We Chose Option 2 (Enhanced):

  • Metadata .meta.json files are already generated alongside aggregates by the pipeline → zero user effort to provide it
  • Single metadata file contains all context: nside, order, lon_convention, timestamps, processing params
  • Sidecar files are a widely used pattern for keeping metadata next to data (e.g., JSON metadata stored alongside assets in STAC)
  • Auto-discovery: User only needs to pass aggregate path; CLI looks for {aggregate_stem}.meta.json
  • Backward compatible: Explicit CLI args still override when needed (e.g., testing with different parameters)

Why This Beats Manual Overrides:

  • Old way: healpyxel_to_geoparquet -a data.parquet -n 256 -O nested -l 0_360 (remember 4 params)
  • New way: healpyxel_to_geoparquet -a data.parquet (metadata does the work)
  • Problem solved: User can’t accidentally build geometries with wrong lon_convention → no more coordinate mismatches

Summary: Metadata Auto-Detection Workflow

Question: How should the CLI handle --lon-convention, which is stored in metadata?

Answer: Implement metadata sidecar auto-detection with parameter precedence.

What Changed

New Behavior:

  1. CLI automatically discovers {aggregate_stem}.meta.json in the same directory
  2. Extracts nside, order, lon_convention from metadata keys:
    • ["sidecar_metadata"]["healpix"]["nside"]
    • ["sidecar_metadata"]["healpix"]["order"]
    • ["sidecar_metadata"]["coordinates"]["lon_convention"]
  3. Default for --lon-convention: Changed from '0_360' to 'auto'
    • 'auto' → search metadata, fallback to '0_360' if not found
    • '0_360' or '-180_180' → explicit override (ignores metadata)
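Reading those three keys out of a parsed sidecar could look like the sketch below. `extract_healpix_params` is a hypothetical stand-in for the extraction helper; only the key paths are taken from the list above, and returning `None` for absent keys is an assumption.

```python
from typing import Any, Dict, Optional


def extract_healpix_params(metadata: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Pull nside, order and lon_convention out of a sidecar dict."""
    sidecar = metadata.get("sidecar_metadata", {})
    healpix = sidecar.get("healpix", {})
    coordinates = sidecar.get("coordinates", {})
    return {
        "nside": healpix.get("nside"),
        "order": healpix.get("order"),
        "lon_convention": coordinates.get("lon_convention"),
    }


example = {
    "sidecar_metadata": {
        "healpix": {"nside": 256, "order": "nested"},
        "coordinates": {"lon_convention": "-180_180"},
    }
}
print(extract_healpix_params(example))
```

Using `.get` with empty-dict defaults means a partial sidecar yields `None` for the missing parameters instead of raising, so the caller can fall through to defaults.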

Parameter Precedence (highest to lowest):

CLI args > metadata > defaults

Code Changes

Two new helper functions:

  • _load_metadata_for_aggregate(agg_path) → loads .meta.json sidecar (quiet fail if missing)
  • _extract_healpix_params_from_metadata(metadata) → extracts nside, order, lon_convention

Updated main() CLI:

  • Option --lon-convention now accepts ['0_360', '-180_180', 'auto']
  • Error message improved for sparse aggregates (mentions metadata option)
  • Logs which source was used: “Using lon_convention=0_360 from metadata” or “Using default…”

Usage

Zero-config (best case):

healpyxel_to_geoparquet -a sample_50k_nside256_aggregate.parquet
# Auto-detects: nside, order, lon_convention from metadata

Override metadata (for testing/validation):

healpyxel_to_geoparquet -a data.parquet -l -180_180 -n 256
# -l and -n override metadata; order still comes from metadata

Batch mode with metadata:

healpyxel_to_geoparquet -a data.parquet -y
# -y auto-confirms overwrites, metadata provides all params

Testing ✓

  • Metadata extraction logic verified
  • Precedence (CLI > metadata > defaults) tested
  • Helper functions properly exported for nbdev