
HEALPix Sidecar

Generate HEALPix cell assignments for spatial data

HEALPix Sidecar: PSF Weighting Extensions

This section introduces support for data point spread functions (PSF) and cell spread functions (CSF) in the sidecar generation process.

  • Data PSF: Models the spatial response of each data geometry (e.g., a 2D Gaussian).
  • Cell PSF: Models the spatial response of each HEALPix cell (e.g., a 2D Gaussian centered on the cell).
  • Combination: The final weight for each (source, cell) assignment is computed by combining the two (default: multiplication).
  • Normalization: Weights are normalized per cell so that their sum is 1, preserving compatibility with unweighted aggregation.

The implementation is modular and ready for future extension to custom/user-provided PSFs.
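
As a minimal sketch of the weighting scheme (illustrative only: the toy offsets, sigmas, and column names here are assumptions, not the library's internals), a fuzzy assignment table gains a weight column by evaluating both PSFs, combining them, and normalizing per cell:

```python
import numpy as np
import pandas as pd

# Toy fuzzy-mode assignments: one source touching two cells, plus a second source
assignments = pd.DataFrame({
    "source_id":  [0, 0, 1],
    "healpix_id": [42, 43, 42],
    "offset_deg": [0.05, 0.12, 0.15],  # assumed source-to-cell-centre offsets
})

sigma_data, sigma_cell = 0.1, 0.1  # assumed PSF widths (degrees)

# Evaluate each PSF as a 2D Gaussian of the radial offset, then combine (default: multiply)
data_w = np.exp(-assignments["offset_deg"] ** 2 / (2 * sigma_data ** 2))
cell_w = np.exp(-assignments["offset_deg"] ** 2 / (2 * sigma_cell ** 2))
assignments["weight"] = data_w * cell_w

# Normalize so the weights in each cell sum to 1 (keeps unweighted aggregation valid)
assignments["weight"] /= assignments.groupby("healpix_id")["weight"].transform("sum")
print(assignments)
```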


compute_healpix_ids_from_lonlat

 compute_healpix_ids_from_lonlat (nside:int, lons:numpy.ndarray,
                                  lats:numpy.ndarray)

*Compute HEALPix indices for arrays of lon,lat in degrees.

Tries to use cdshealpix if available, otherwise falls back to healpy. Returns a 1D integer numpy array of the same length as the inputs.*
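
For reference, the healpy fallback path is equivalent to a call like this (a sketch; nested ordering is assumed here, matching the nest=True default of get_healpix_cell_geometry below):

```python
import healpy as hp
import numpy as np

lons = np.array([10.0, 120.5, 359.9])  # degrees
lats = np.array([-45.0, 0.0, 89.0])    # degrees

# lonlat=True lets healpy take lon/lat in degrees directly
ids = hp.ang2pix(64, lons, lats, lonlat=True, nest=True)
```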


format_geo_statistics

 format_geo_statistics (stats:dict)

*Format geo-statistics for display using rich tables.

Args: stats: Statistics dictionary from compute_geo_statistics

Returns: Formatted string representation*


compute_geo_statistics

 compute_geo_statistics (input_path:pathlib.Path, lon_col:str|None=None,
                         lat_col:str|None=None, sample_size:int=10000,
                         lon_convention:str|None=None)

*Compute geographical statistics for a GeoParquet file using DuckDB for efficiency.

This function can analyze raw data or apply filtering based on longitude convention.

Args:

- input_path: Path to input GeoParquet file
- lon_col: Name of longitude column (if None, will auto-detect or extract from geometry)
- lat_col: Name of latitude column (if None, will auto-detect or extract from geometry)
- sample_size: Number of rows to sample for geometry-based extraction (if needed)
- lon_convention: Optional longitude convention for filtering: ‘0_360’ for [0,360) × [-90,90]; ‘minus_plus180’ for [-180,180) × [-90,90]; None (default) for no filtering (raw data)

Returns: Dictionary with statistics:

    {
      ‘lon’: {‘min’, ‘max’, ‘mean’, ‘std’, ‘count’},
      ‘lat’: {‘min’, ‘max’, ‘mean’, ‘std’, ‘count’},
      ‘source’: ‘columns’ or ‘geometry’,
      ‘lon_col’: column name or None,
      ‘lat_col’: column name or None,
      ‘filtered’: bool (True if convention filtering was applied),
      ‘total_count’: int (total records before filtering, if filtered),
      ‘filtered_count’: int (records after filtering, if filtered)
    }*
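
An illustrative call (the import path is hypothetical; adjust it to the actual package layout):

```python
from pathlib import Path

from healpyxel.sidecar import compute_geo_statistics, format_geo_statistics  # hypothetical import path

# Statistics with the same filtering the HEALPix processing would apply
stats = compute_geo_statistics(Path("data.parquet"), lon_convention="0_360")
print(format_geo_statistics(stats))
print(stats["lon"]["min"], stats["lon"]["max"])  # longitude range after filtering
```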


detect_lonlat_columns

 detect_lonlat_columns (gdf_sample)

*Auto-detect longitude and latitude columns from a GeoDataFrame sample.

Returns: Tuple of (lon_column, lat_column) or (None, None) if not found*


process_partition

 process_partition (gdf, nside:int, mode:str, base_index:int|None=None,
                    lon_convention:str='None', data_psf=None,
                    cell_psf=None, combine_method='multiply')

*Process a single dask partition (GeoDataFrame) and return DataFrame of assignments.

The returned DataFrame has columns [‘source_id’, ‘healpix_id’] and one row per assignment (for strict mode: at most one row per source_id; for fuzzy mode: one row per touched HEALPix cell).

Args:

- gdf: GeoDataFrame partition
- nside: HEALPix nside parameter
- mode: ‘strict’ or ‘fuzzy’ assignment mode
- base_index: Base index for source_id generation
- lon_convention: Longitude convention - ‘0_360’ for [0,360) or ‘minus_plus180’ for [-180,180)*
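
A usage sketch (the file name and parameter values are placeholders):

```python
import geopandas as gpd

gdf = gpd.read_parquet("data.parquet")  # one partition's worth of data
assignments = process_partition(gdf, nside=64, mode="fuzzy", lon_convention="0_360")
# -> DataFrame with one row per (source_id, healpix_id) assignment
print(assignments.head())
```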


add_psf_weights_to_sidecar

 add_psf_weights_to_sidecar (sidecar_df, src_geoms, cell_geoms,
                             data_psf=None, cell_psf=None,
                             combine_method='multiply', normalize=True)

Add a ‘weight’ column to the sidecar DataFrame using PSF functions.

- src_geoms: sequence of source geometries (indexed by source_id)
- cell_geoms: dict or sequence mapping healpix_id to cell geometry
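
Continuing the process_partition sketch above (the sigma values are assumptions; cell geometries come from get_healpix_cell_geometry below):

```python
cell_geoms = {
    hid: get_healpix_cell_geometry(hid, nside=64)
    for hid in assignments["healpix_id"].unique()
}
weighted = add_psf_weights_to_sidecar(
    assignments,
    src_geoms=gdf.geometry,            # indexed by source_id
    cell_geoms=cell_geoms,
    data_psf=GaussianPSF(sigma=0.1),   # assumed width
    cell_psf=GaussianPSF(sigma=0.05),  # assumed width
    combine_method="multiply",
    normalize=True,
)
```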


normalize_weights_per_cell

 normalize_weights_per_cell (df, cell_col='healpix_id',
                             weight_col='weight')

Normalize weights so that sum of weights per cell is 1.0.
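
The normalization is equivalent to a pandas groupby-transform, roughly:

```python
# Divide each weight by the sum of weights within its cell
df[weight_col] = df[weight_col] / df.groupby(cell_col)[weight_col].transform("sum")
```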


compute_assignment_weight

 compute_assignment_weight (src_geom, cell_geom, data_psf=None,
                            cell_psf=None, combine_method='multiply',
                            data_psf_sigma=None, cell_psf_sigma=None)

Compute the assignment weight for a (source geometry, cell geometry) pair.

- data_psf: callable or None
- cell_psf: callable or None
- combine_method: ‘multiply’, ‘sum’, ‘min’, ‘max’
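
Conceptually, the two PSF responses are reduced to a single weight according to combine_method; a minimal sketch of that reduction (not the library's exact code):

```python
def combine(data_w: float, cell_w: float, method: str = "multiply") -> float:
    # Reduce the two PSF responses to one assignment weight
    ops = {
        "multiply": lambda a, b: a * b,
        "sum": lambda a, b: a + b,
        "min": min,
        "max": max,
    }
    return ops[method](data_w, cell_w)
```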


write_sidecar_metadata

 write_sidecar_metadata (output_path:pathlib.Path,
                         input_path:pathlib.Path, nside:int, mode:str,
                         lon_convention:str, ncores:int, args)

*Write sidecar processing metadata to JSON file.

Args:

- output_path: Path to the sidecar parquet file
- input_path: Path to the input file
- nside: HEALPix nside parameter
- mode: Assignment mode (‘strict’ or ‘fuzzy’)
- lon_convention: Longitude convention used
- ncores: Number of cores used
- args: Parsed command-line arguments

Returns: Path to the written metadata file*


build_output_path

 build_output_path (input_path:pathlib.Path, mode:str, nside:int)

Build output path for sidecar file based on input and parameters.


parse_arguments

 parse_arguments (argv=None)

Parse command line arguments.


validate_nside

 validate_nside (nside:int)

Validate that nside is a positive power of two.
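
A valid nside is a positive power of two (1, 2, 4, 8, …). A sketch of the check (the library's validate_nside may raise rather than return a bool):

```python
def is_valid_nside(nside: int) -> bool:
    # A power of two has exactly one bit set, so n & (n - 1) == 0
    return nside > 0 and (nside & (nside - 1)) == 0
```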


get_psf

 get_psf (psf_type, sigma=None)

GaussianPSF

 GaussianPSF (sigma=None)

2D Gaussian PSF centered at (0,0).
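
The response of a 2D Gaussian at offset (dx, dy) from its centre is exp(-(dx² + dy²) / (2σ²)). A minimal stand-in (hypothetical, not the library's exact class):

```python
import numpy as np

class GaussianPSF2D:
    """Hypothetical stand-in for GaussianPSF: unnormalized 2D Gaussian at (0, 0)."""

    def __init__(self, sigma: float = 1.0):
        self.sigma = sigma

    def __call__(self, dx: float, dy: float) -> float:
        return float(np.exp(-(dx ** 2 + dy ** 2) / (2 * self.sigma ** 2)))
```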


PSF

 PSF ()

Base class for Point Spread Functions (PSF).


write_partitioned_output

 write_partitioned_output (tasks, out_file:pathlib.Path, nparts:int)

*Write output as partitioned parquet files (one per partition).

Returns: Total number of rows written*


write_coalesced_output

 write_coalesced_output (tasks, out_file:pathlib.Path, nside:int,
                         mode:str, ncores:int, nparts:int)

*Write output as a single coalesced parquet file with incremental batching.

Returns: Total number of rows written*


main

 main (argv=None)

Main entry point for HEALPix sidecar generation.


get_healpix_cell_geometry

 get_healpix_cell_geometry (healpix_id, nside, nest=True)

Return a shapely Polygon for the given HEALPix cell. Uses healpy boundaries (in degrees, lon/lat).
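
One plausible implementation using healpy (a sketch; the actual function may differ in vertex count or ordering):

```python
import healpy as hp
from shapely.geometry import Polygon

def cell_polygon(healpix_id: int, nside: int, nest: bool = True) -> Polygon:
    # hp.boundaries returns 3 x (4*step) unit vectors along the cell edge
    vecs = hp.boundaries(nside, healpix_id, step=1, nest=nest)
    lon, lat = hp.vec2ang(vecs.T, lonlat=True)  # degrees
    return Polygon(list(zip(lon, lat)))
```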

Usage Example

See the main() function for CLI usage, or import functions directly for programmatic use.
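
Since main accepts an argv list, the CLI can also be driven from Python (the import path is hypothetical):

```python
from healpyxel.sidecar import main  # hypothetical import path

# Equivalent to: healpyxel-sidecar -i data.parquet --nside 64 --mode fuzzy --lon-convention 0_360
main(["-i", "data.parquet", "--nside", "64", "--mode", "fuzzy", "--lon-convention", "0_360"])
```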

Geographical Statistics Feature

The sidecar tool includes a standalone geo-statistics feature to inspect your data before processing.

Key Design

--geo-stats is a separate operation: When specified, the tool analyzes the raw data, displays statistics, and exits without performing sidecar calculations. This allows you to:

  1. Inspect data ranges and quality
  2. Get convention recommendations based on actual data
  3. Validate coordinates before heavy processing

Usage

```bash
# Step 1: Inspect your data (raw, no filtering - auto-detects lon/lat columns)
healpyxel-sidecar -i data.parquet --geo-stats

# Step 1b: Inspect with filtering to see what will be processed
healpyxel-sidecar -i data.parquet --geo-stats --lon-convention 0_360

# Step 2: Run actual processing with appropriate convention
healpyxel-sidecar -i data.parquet --nside 64 --lon-convention 0_360 --mode fuzzy

# Optional: Specify explicit lon/lat columns
healpyxel-sidecar -i data.parquet --geo-stats \
    --lon-col longitude --lat-col latitude

# Optional: Control sample size for geometry-based extraction
healpyxel-sidecar -i data.parquet --geo-stats \
    --stats-sample-size 50000

# Compare raw vs filtered statistics (run Step 1 and Step 1b side by side)
```

The geo-statistics feature provides:

  1. Flexible analysis modes:
    • Raw data (default): No filtering, shows actual data ranges
    • Filtered data (with --lon-convention): Apply same filtering as HEALPix processing
  2. Automatic column detection: Intelligently detects lon/lat columns using common naming patterns
  3. Geometry extraction: Falls back to extracting coordinates from geometry column (centroids for polygons)
  4. Efficient computation: Uses DuckDB SQL with WHERE clauses for fast filtered statistics
  5. Filtering impact: Shows total records, filtered records, and drop percentage
  6. Convention suggestion: Recommends appropriate --lon-convention based on data ranges
  7. Validation warnings: Checks if coordinates are within valid ranges
  8. Beautiful output: Uses Rich library for formatted tables (falls back to plain text if unavailable)
  9. JSON export: Saves statistics to .geo_stats.json file for reference
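
The exported JSON (item 9) mirrors the dictionary documented under compute_geo_statistics, so it can be reloaded later; a sketch (the exact output filename is an assumption):

```python
import json

# Assumed filename: statistics written alongside the input as <input>.geo_stats.json
with open("data.geo_stats.json") as f:
    stats = json.load(f)

print(f"lon range: [{stats['lon']['min']}, {stats['lon']['max']}]")
print(f"filtered: {stats.get('filtered', False)}")
```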

Statistics Computed

  • Count: Number of valid coordinates
  • Min/Max: Range of longitude and latitude values (as-is from file)
  • Mean: Average position
  • Std Dev: Standard deviation (spread)

Why This Matters

  • Workflow efficiency: Quick data inspection before heavy processing
  • Convention detection: Tool suggests the right --lon-convention for your data
  • Data validation: Catch coordinate system issues early
  • Quality control: Identify outliers or invalid coordinates
  • Performance: Statistics computed efficiently without loading the entire dataset into memory
