Development: opportunistic cache use (default)
Core helpers
Caching & XDG Configuration
HEALPix grids are expensive to compute. This module caches cell boundaries in Parquet files, resolves cache and config directories following the XDG Base Directory specification, and keeps settings in a persistent configuration file.
Precedence for directory resolution (highest to lowest):
1. CLI argument (e.g., --cache-dir /tmp)
2. Environment variable (e.g., HEALPYXEL_CACHE=/fast/disk)
3. XDG spec: $XDG_CACHE_HOME or $XDG_CONFIG_HOME
4. Fallback: ~/.cache/healpyxel/healpix_grids or ~/.config/healpyxel
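The precedence order for the cache directory can be sketched as a small resolver. This is only an illustration and not the module's `_get_cache_dir()`: the name `resolve_cache_dir_sketch`, its parameters, and the assumption that the XDG branch appends `healpyxel/healpix_grids` to `$XDG_CACHE_HOME` (mirroring the fallback path) are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir_sketch(
    cli_arg: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Hypothetical resolver: CLI > HEALPYXEL_CACHE > XDG_CACHE_HOME > fallback."""
    env = os.environ if env is None else env
    if cli_arg:  # 1. explicit CLI argument wins
        return Path(cli_arg).expanduser()
    if env.get("HEALPYXEL_CACHE"):  # 2. environment variable
        return Path(env["HEALPYXEL_CACHE"]).expanduser()
    if env.get("XDG_CACHE_HOME"):  # 3. XDG cache home (subpath is an assumption)
        return Path(env["XDG_CACHE_HOME"]) / "healpyxel" / "healpix_grids"
    # 4. fallback documented above
    return Path.home() / ".cache" / "healpyxel" / "healpix_grids"
```

Passing the environment as a mapping keeps the sketch easy to test without mutating `os.environ`.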
Configuration file: $XDG_CONFIG_HOME/healpyxel/settings.ini (or ~/.config/healpyxel/settings.ini)
- Controls precomputed nsides, antimeridian handling, cache location override
- Auto-created on first use; can be edited manually
is_geometry_valid
is_geometry_valid (geom:shapely.geometry.base.BaseGeometry)
*Check if a geometry is valid for use in a spherical projection.
This function ensures that the geometry is either associated with a CRS in degrees or has latitude/longitude coordinates within valid bounds. If no CRS is provided, it checks that the latitude is within [-90, 90] and longitude is within [-180, 360].
Args: geom (BaseGeometry): A Shapely geometry object to validate.
Returns: bool: True if the geometry is valid for spherical projection, False otherwise.
Example:
    >>> from shapely.geometry import Polygon
    >>> geom = Polygon([(-180, -90), (-180, 90), (180, 90), (180, -90), (-180, -90)])
    >>> is_geometry_valid(geom)  # Valid lat/lon bounds
    True
    >>> geom = Polygon([(1e32, 1e32), (180, 90), (180, -90), (1e32, 1e32)])
    >>> is_geometry_valid(geom)  # Invalid due to extreme coordinates
    False*
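The coordinate-bounds part of this check can be illustrated without Shapely. The sketch below works on a plain `(minx, miny, maxx, maxy)` tuple, which is the layout Shapely's `geom.bounds` returns; the function name `bounds_look_like_lonlat` is hypothetical, and the CRS branch of the real function is not covered.

```python
import math
from typing import Tuple


def bounds_look_like_lonlat(bounds: Tuple[float, float, float, float]) -> bool:
    """Hypothetical helper: True if bounds fit lon in [-180, 360] and lat in [-90, 90]."""
    minx, miny, maxx, maxy = bounds
    # Reject NaN or infinite coordinates outright.
    if not all(math.isfinite(v) for v in bounds):
        return False
    lon_ok = -180.0 <= minx and maxx <= 360.0
    lat_ok = -90.0 <= miny and maxy <= 90.0
    return lon_ok and lat_ok
```

With Shapely available, the call would look like `bounds_look_like_lonlat(geom.bounds)`.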
init_user_config
init_user_config (config_dir:Optional[pathlib.Path]=None)
*Create default ~/.config/healpyxel/settings.ini if it doesn’t exist.
Args: config_dir: optional override for config directory
Returns: Path to config file (whether newly created or already existed)
Example:
    >>> config_file = init_user_config()
    >>> config_file.exists()
    True*
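The "create only if missing" behaviour can be sketched with the standard library's `configparser`. This is an assumption-laden illustration: the function name `init_config_sketch`, the `[cache]` section, and the `precomputed_nsides` key are hypothetical, since the real layout of settings.ini is not shown here.

```python
import configparser
import tempfile
from pathlib import Path


def init_config_sketch(config_dir: Path) -> Path:
    """Hypothetical: write a default settings.ini unless one already exists."""
    config_file = config_dir / "settings.ini"
    if not config_file.exists():
        config_dir.mkdir(parents=True, exist_ok=True)
        parser = configparser.ConfigParser()
        # Section and key names are illustrative assumptions.
        parser["cache"] = {"precomputed_nsides": "64,128,256"}
        with config_file.open("w") as fh:
            parser.write(fh)
    return config_file


# Calling twice returns the same path; the second call leaves the file untouched.
with tempfile.TemporaryDirectory() as tmp:
    first = init_config_sketch(Path(tmp) / "healpyxel")
    second = init_config_sketch(Path(tmp) / "healpyxel")
    print(first == second, first.exists())
```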
Cache Management Core Logic
Central dispatch for all cache operations: generate, list, clean, view configuration. The logic has no Click dependencies and is called by a thin CLI wrapper in 05_cli.ipynb.
manage_healpix_cache
manage_healpix_cache (action:str='list', nsides:Optional[List[int]]=None, cache_dir:Optional[pathlib.Path]=None, config_dir:Optional[pathlib.Path]=None, force:bool=False)
*Core cache management logic with precedence awareness.
No Click dependencies; called by CLI wrapper in 05_cli.ipynb. Uses _get_cache_dir() and _get_config_dir() for proper precedence.
Args:
    action: ‘list’, ‘generate’, ‘verify’, ‘clean’, ‘info’, or ‘config’
    nsides: list of nside values for ‘generate’ or ‘verify’ actions
    cache_dir: explicit CLI override (highest precedence)
    config_dir: explicit CLI override (highest precedence)
    force: whether to overwrite existing cache files during ‘generate’
Returns:
    dict with keys:
        ‘action’: str, action performed
        ‘cache_dir’: str, resolved cache directory
        ‘config_dir’: str, resolved config directory
        ‘status’: ‘ok’ or ‘error’
        ‘count’/‘files’/‘deleted’/‘generated’/etc: action-specific data
Raises: ValueError for invalid action or missing required args*
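The dispatch pattern and the shape of the returned dict can be sketched as follows. This is a hypothetical stand-in, not `manage_healpix_cache` itself: only the ‘list’ and ‘clean’ actions are illustrated, and the assumption that cache files match `*.parquet` in the cache directory is mine.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict

VALID_ACTIONS = ("list", "generate", "verify", "clean", "info", "config")


def manage_cache_sketch(action: str, cache_dir: Path) -> Dict[str, Any]:
    """Hypothetical dispatcher covering only the 'list' and 'clean' actions."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"invalid action: {action!r}")
    result: Dict[str, Any] = {
        "action": action,
        "cache_dir": str(cache_dir),
        "status": "ok",
    }
    # Assumption: cache files are the *.parquet files directly in cache_dir.
    files = sorted(cache_dir.glob("*.parquet")) if cache_dir.is_dir() else []
    if action == "list":
        result["files"] = [p.name for p in files]
        result["count"] = len(files)
    elif action == "clean":
        for p in files:
            p.unlink()
        result["deleted"] = len(files)
    else:
        raise NotImplementedError(f"{action!r} is not part of this sketch")
    return result


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example.parquet").write_bytes(b"")  # placeholder file name
    print(manage_cache_sketch("list", Path(tmp)))
```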
Caching Tests
Verify XDG precedence logic and cache I/O roundtrip.
test_cache_verification_corrupt_nans
test_cache_verification_corrupt_nans ()
Test cache verification with NaN values in coordinates.
test_cache_verification_incomplete
test_cache_verification_incomplete ()
Test cache verification with incomplete cache (missing pixels).
test_cache_verification_missing
test_cache_verification_missing ()
Test cache verification with missing cache file.
test_cache_verification_complete
test_cache_verification_complete ()
Test cache verification with a complete valid cache.
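The four verification cases exercised by these tests (missing, incomplete, NaN-corrupted, complete) can be sketched in plain Python. The function name, the status strings, and the dict-of-coordinates representation are hypothetical; the only fact relied on is that a full HEALPix grid has 12 · nside² pixels.

```python
import math
from typing import Dict, Optional, Tuple


def verify_cache_sketch(
    nside: int, cells: Optional[Dict[int, Tuple[float, float]]]
) -> str:
    """Hypothetical check: classify a cache as missing/incomplete/corrupt/complete."""
    if cells is None:  # no cache file was found
        return "missing"
    expected_npix = 12 * nside * nside  # total pixel count for this nside
    if set(cells) != set(range(expected_npix)):
        return "incomplete"
    if any(math.isnan(lon) or math.isnan(lat) for lon, lat in cells.values()):
        return "corrupt"
    return "complete"
```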
test_cache_mode_require_missing_cache
test_cache_mode_require_missing_cache ()
Verify that cache_mode=‘require’ raises ValueError when cache is missing.
test_spherical_conversion
test_spherical_conversion ()
Verify spherical to lon/lat conversion.
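A conversion of this kind can be sketched as below, assuming the healpy convention in which `theta` is colatitude and `phi` is longitude, both in radians. The function name and the `lon_convention` handling are illustrative assumptions, not the module's code.

```python
import math
from typing import Tuple


def spherical_to_lonlat_sketch(
    theta: float, phi: float, lon_convention: str = "0_360"
) -> Tuple[float, float]:
    """Hypothetical: (colatitude, longitude) in radians -> (lon, lat) in degrees."""
    lat = 90.0 - math.degrees(theta)  # colatitude 0 is the north pole
    lon = math.degrees(phi) % 360.0   # normalise to [0, 360)
    if lon_convention == "-180_180":
        lon = (lon + 180.0) % 360.0 - 180.0  # shift to [-180, 180)
    elif lon_convention != "0_360":
        raise ValueError(f"unknown lon_convention: {lon_convention!r}")
    return lon, lat
```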
test_cache_key_generation
test_cache_key_generation ()
Verify cache key generation.
test_xdg_precedence
test_xdg_precedence ()
Verify XDG directory resolution with full precedence.
Polygon creation and antimeridian handling
Main API: build GeoDataFrame and save geoparquet
healpix_to_geodataframe
healpix_to_geodataframe (nside:int, order:str='nested', lon_convention:str='0_360', pixels:Optional[Iterable[int]]=None, fix_antimeridian:bool=True, chunk_size:int=65536, cache_mode:str='use', cache_dir:Optional[pathlib.Path]=None)
*Create a GeoDataFrame of HEALPix cell polygons.
Args:
    nside: HEALPix nside
    order: ‘nested’ or ‘ring’
    lon_convention: ‘0_360’ or ‘-180_180’ (affects polygon coordinates)
    pixels: optional iterable of pixel indices; default = all pixels
    fix_antimeridian: whether to call antimeridian.fix_polygon on polygons crossing the antimeridian
    chunk_size: number of pixels to process per chunk for memory control
    cache_mode: one of {‘use’, ‘require’, ‘off’}
        - ‘use’: load cache if available, otherwise compute requested pixels only
        - ‘require’: require cache; if missing, raise error (no computation)
        - ‘off’: ignore cache entirely
    cache_dir: optional cache directory override
Returns: GeoDataFrame with columns: ‘healpix_id’ and ‘geometry’ (EPSG:4326)*
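The three `cache_mode` values can be read as a small decision procedure. The sketch below is a hypothetical illustration of that logic only: `cached` stands for whatever was loaded from disk (or `None`), and `compute` for the expensive polygon builder; neither name comes from the module. Raising `ValueError` for a missing required cache follows the behaviour described for `test_cache_mode_require_missing_cache`.

```python
from typing import Callable, List, Optional


def load_or_compute_sketch(
    cache_mode: str,
    cached: Optional[List[str]],
    compute: Callable[[], List[str]],
) -> List[str]:
    """Hypothetical decision logic for cache_mode in {'use', 'require', 'off'}."""
    if cache_mode not in {"use", "require", "off"}:
        raise ValueError(f"unknown cache_mode: {cache_mode!r}")
    if cache_mode == "off":      # ignore any cache entirely
        return compute()
    if cached is not None:       # 'use' and 'require' both prefer the cache
        return cached
    if cache_mode == "require":  # cache demanded but absent: no computation
        raise ValueError("cache_mode='require' but no cache was found")
    return compute()             # 'use' falls back to computing
```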
save_healpix_to_geoparquet
save_healpix_to_geoparquet (nside:int, output_path:Union[str,pathlib.Path], order:str='nested', lon_convention:str='0_360', fix_antimeridian:bool=True, chunk_size:int=65536, parquet_kwargs:Optional[dict]=None, overwrite:bool=False, interactive:bool=True)
*Build HEALPix vector layer and save as GeoParquet.
Args:
    nside: HEALPix nside
    output_path: Path to output GeoParquet file
    order: ‘nested’ or ‘ring’
    lon_convention: ‘0_360’ or ‘-180_180’
    fix_antimeridian: Whether to fix antimeridian-wrapping
    chunk_size: Pixels per chunk when building geometries
    parquet_kwargs: Forwarded to GeoDataFrame.to_parquet
    overwrite: Whether to overwrite the file if it exists (default: False)
    interactive: If True, prompt the user for confirmation before overwriting
Returns: Path to the written file
Raises: FileExistsError: If the file exists and overwrite is False*
export_healpix_to_geotiff
export_healpix_to_geotiff (df:pandas.core.frame.DataFrame, column:str, output_path:Union[str,pathlib.Path], nside:int, order:str='nested', crs:str='IAU:19900', width:int=1440, height:int=720)
*Export a HEALPix column to GeoTIFF (requires rasterio + healpy).
Args:
    df: DataFrame with healpix_id index or healpix_id column
    column: data column to export
    output_path: GeoTIFF output path
    nside: HEALPix nside
    order: ‘nested’ or ‘ring’
    crs: CRS string for GeoTIFF
    width: output raster width (pixels)
    height: output raster height (pixels)
Returns: Path to written GeoTIFF*
| | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | |
| column | str | | |
| output_path | Union | | |
| nside | int | | |
| order | str | nested | |
| crs | str | IAU:19900 | Mercury IAU CRS |
| width | int | 1440 | |
| height | int | 720 | |
| Returns | Path | | |
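For orientation on the `width` and `height` defaults: if the raster is a full-globe equirectangular grid (an assumption; the projection is not stated here), 1440 × 720 pixels correspond to 0.25° per pixel in both directions. The helper names and the `lon_min` origin in this sketch are hypothetical.

```python
from typing import Tuple


def equirect_pixel_size(width: int = 1440, height: int = 720) -> Tuple[float, float]:
    """Degrees per pixel for a full-globe grid (360 deg wide, 180 deg tall)."""
    return 360.0 / width, 180.0 / height


def pixel_center(
    row: int, col: int, width: int = 1440, height: int = 720, lon_min: float = 0.0
) -> Tuple[float, float]:
    """Hypothetical: (lon, lat) of a pixel centre, row 0 at the north edge."""
    dlon, dlat = equirect_pixel_size(width, height)
    lon = lon_min + (col + 0.5) * dlon
    lat = 90.0 - (row + 0.5) * dlat
    return lon, lat
```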
Quick test
CLI with Metadata Auto-Detection
The CLI now infers parameters from metadata sidecars:
Metadata Sidecar Pattern:
- For aggregate sample_50k_nside256_aggregate.parquet, place metadata at sample_50k_nside256_aggregate.meta.json
- The CLI automatically loads the sidecar and extracts: nside, order, lon_convention

Parameter Resolution Precedence:
1. CLI args (highest priority): explicit user override
2. Metadata: from .meta.json sidecar (if present)
3. Defaults: fallback values or inference from aggregate

lon_convention Behavior:
- --lon-convention auto (default) → searches metadata, falls back to 0_360
- --lon-convention 0_360 or -180_180 → explicit override
- Prevents user confusion about which convention was used in aggregation
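The precedence rule and the `auto` behaviour can be sketched in a few lines. The function names and the returned source label are hypothetical; the sketch only mirrors the rules stated above (CLI value, then metadata, then default, with `auto` treated as "no explicit CLI value").

```python
from typing import Optional, Tuple, TypeVar

T = TypeVar("T")


def resolve_param_sketch(
    cli_value: Optional[T], metadata_value: Optional[T], default: T
) -> Tuple[T, str]:
    """Hypothetical: return (value, source) following CLI > metadata > default."""
    if cli_value is not None:
        return cli_value, "cli"
    if metadata_value is not None:
        return metadata_value, "metadata"
    return default, "default"


def resolve_lon_convention_sketch(
    cli_value: str = "auto", metadata_value: Optional[str] = None
) -> Tuple[str, str]:
    """'auto' defers to metadata, then to '0_360'; anything else is explicit."""
    explicit = None if cli_value == "auto" else cli_value
    return resolve_param_sketch(explicit, metadata_value, "0_360")
```

Returning the source alongside the value makes it straightforward to log which rule supplied each parameter.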
Usage Examples:
# Zero-config: metadata has all parameters
healpyxel_to_geoparquet -a sample_50k_nside256_aggregate.parquet
# Override metadata
healpyxel_to_geoparquet -a sample_50k_nside256_aggregate.parquet -l -180_180 -O ring
# Batch mode with metadata
healpyxel_to_geoparquet -a data.parquet -y # Auto-confirm overwrites
main
main ()
*CLI entry point for healpyxel_to_geoparquet.
Converts aggregate parquet output with HEALPix geometry to GeoParquet. Automatically infers nside from aggregate row count (dense mode) or filename (sparse mode).
Output filename is constructed as: {input_stem}{suffix}.parquet
Default suffix is ‘.geo’, so ‘sample_50k_nside256_aggregate.parquet’ → ‘sample_50k_nside256_aggregate.geo.parquet’*
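The inference and naming rules in this docstring can be sketched as follows. The function names are hypothetical, and the filename pattern `nside<digits>` is an assumption based on the example filename; the dense-mode rule relies only on a full grid having 12 · nside² rows.

```python
import math
import re
from pathlib import Path
from typing import Optional, Union


def nside_from_row_count(n_rows: int) -> Optional[int]:
    """Dense mode: a full grid has 12 * nside**2 rows; otherwise return None."""
    if n_rows <= 0 or n_rows % 12:
        return None
    nside = math.isqrt(n_rows // 12)
    return nside if 12 * nside * nside == n_rows else None


def nside_from_filename(name: str) -> Optional[int]:
    """Sparse mode (assumed pattern): pick up 'nside256' style tokens."""
    match = re.search(r"nside(\d+)", name)
    return int(match.group(1)) if match else None


def output_name_sketch(input_path: Union[str, Path], suffix: str = ".geo") -> Path:
    """Build {input_stem}{suffix}.parquet next to the input file."""
    path = Path(input_path)
    return path.with_name(f"{path.stem}{suffix}.parquet")
```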
Comparison: Old vs. New UX
| Scenario | Old | New |
|---|---|---|
| With metadata sidecar | healpyxel_to_geoparquet -a data.parquet -l 0_360 -O nested | healpyxel_to_geoparquet -a data.parquet ✓ Zero-config |
| Sparse aggregate | Must pass -n 256 explicitly | Can pass -n 256 OR use metadata |
| Different lon convention | Defaults to 0_360, must override | Auto-detects from metadata |
| Error on parameter mismatch | No validation (risk of wrong geometry) | Metadata enforces consistency |
Key Benefits:
- ✅ Reduced UX friction: One argument instead of 3–4
- ✅ Consistency: Geometry respects aggregation parameters from metadata
- ✅ Backward compatible: All explicit args still work and override metadata
- ✅ Safe defaults: -180_180 lon convention now automatically used if that’s what data was processed with
Implementation: Why This Approach Wins
Architecture Decision: Metadata Sidecar Pattern
Three approaches were considered; here’s why option 2 (metadata sidecar) is best:
| Approach | Trade-offs | Winner? |
|---|---|---|
| Option 1: Auto mode for lon_convention | Only solves one param; nside/order still require explicit args | ❌ Partial solution |
| Option 2: Pass metadata directly | Higher UX friction if the metadata path must be supplied by hand; removed by auto-discovering the sidecar stored alongside the aggregate | ✅ Best |
| Option 3: Flexible input (parquet OR metadata) | Complex parsing logic; confusing precedence | ❌ Overengineered |
Why We Chose Option 2 (Enhanced):
- Metadata .meta.json files are already generated alongside aggregates by the pipeline → zero user effort to provide it
- Single metadata file contains all context: nside, order, lon_convention, timestamps, processing params
- Sidecar pattern is industry-standard (e.g., .sidecar.json in STAC, .meta in scientific tools)
- Auto-discovery: User only needs to pass aggregate path; CLI looks for {aggregate_stem}.meta.json
- Backward compatible: Explicit CLI args still override when needed (e.g., testing with different parameters)

Why This Beats Manual Overrides:
- Old way: healpyxel_to_geoparquet -a data.parquet -n 256 -O nested -l 0_360 (remember 4 params)
- New way: healpyxel_to_geoparquet -a data.parquet (metadata does the work)
- Problem solved: User can’t accidentally build geometries with wrong lon_convention → no more coordinate mismatches
Summary: Metadata Auto-Detection Workflow
Question: How should --lon-convention be handled, given that its value is stored in metadata?
Answer: Implement metadata sidecar auto-detection with parameter precedence.
What Changed
New Behavior:
1. CLI automatically discovers {aggregate_stem}.meta.json in the same directory
2. Extracts nside, order, lon_convention from metadata keys:
   - ["sidecar_metadata"]["healpix"]["nside"]
   - ["sidecar_metadata"]["healpix"]["order"]
   - ["sidecar_metadata"]["coordinates"]["lon_convention"]
3. Default for --lon-convention: Changed from '0_360' to 'auto'
   - 'auto' → search metadata, fallback to '0_360' if not found
   - '0_360' or '-180_180' → explicit override (ignores metadata)
Parameter Precedence (highest to lowest):
CLI args > metadata > defaults
Code Changes
Two new helper functions:
- _load_metadata_for_aggregate(agg_path) → loads .meta.json sidecar (quiet fail if missing)
- _extract_healpix_params_from_metadata(metadata) → extracts nside, order, lon_convention
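A rough sketch of what two such helpers might look like is given below. These are not the module's implementations: the `_sketch` names are hypothetical, and returning `None` for a missing sidecar or missing keys is one reading of "quiet fail". The key paths follow the ones listed under New Behavior.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional, Union


def load_metadata_sketch(agg_path: Union[str, Path]) -> Optional[Dict[str, Any]]:
    """Hypothetical: load {aggregate_stem}.meta.json from the same directory."""
    agg_path = Path(agg_path)
    meta_path = agg_path.with_name(agg_path.stem + ".meta.json")
    try:
        return json.loads(meta_path.read_text())
    except FileNotFoundError:
        return None  # quiet fail: caller falls back to CLI args or defaults


def extract_params_sketch(metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical: pull nside, order, lon_convention; missing keys become None."""
    sidecar = (metadata or {}).get("sidecar_metadata", {})
    healpix = sidecar.get("healpix", {})
    coordinates = sidecar.get("coordinates", {})
    return {
        "nside": healpix.get("nside"),
        "order": healpix.get("order"),
        "lon_convention": coordinates.get("lon_convention"),
    }


with tempfile.TemporaryDirectory() as tmp:
    # No sidecar exists yet, so loading fails quietly and every param is None.
    print(extract_params_sketch(load_metadata_sketch(Path(tmp) / "data.parquet")))
```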
Updated main() CLI:
- Option --lon-convention now accepts ['0_360', '-180_180', 'auto']
- Error message improved for sparse aggregates (mentions metadata option)
- Logs which source was used: “Using lon_convention=0_360 from metadata” or “Using default…”
Usage
Zero-config (best case):
healpyxel_to_geoparquet -a sample_50k_nside256_aggregate.parquet
# Auto-detects: nside, order, lon_convention from metadata
Override metadata (for testing/validation):
healpyxel_to_geoparquet -a data.parquet -l -180_180 -n 256
# -l -180_180 and -n 256 override metadata; order still comes from metadata
Batch mode with metadata:
healpyxel_to_geoparquet -a data.parquet -y
# -y auto-confirms overwrites, metadata provides all params
Testing ✓
- Metadata extraction logic verified
- Precedence (CLI > metadata > defaults) tested
- Helper functions properly exported for nbdev