HEALPix Accumulator
StreamingStats
StreamingStats ()
*Container for streaming statistics using Welford’s algorithm.
Maintains running statistics (mean, std, min, max) without storing raw data.*
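For readers unfamiliar with Welford's algorithm, the following is a minimal sketch of the update rule for a single column. The class name `RunningStats` and its attributes are hypothetical and chosen for illustration; they are not the internals of `StreamingStats`. The choice of the sample standard deviation (n - 1 in the denominator) is also an assumption.

```python
import math


class RunningStats:
    """Hypothetical stand-in for illustration; not the library's StreamingStats."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.min = math.inf
        self.max = -math.inf

    def update(self, x):
        # Welford's update: adjust the mean first, then accumulate the
        # squared deviation using both the old and the new mean.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def std(self):
        # Sample standard deviation (assumed); NaN with fewer than two values.
        if self.n < 2:
            return math.nan
        return math.sqrt(self.m2 / (self.n - 1))


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
```

Only five numbers are kept per column regardless of how many values are seen, which is what makes the statistics streamable across batches.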
CellAccumulator
CellAccumulator (use_tdigest:bool=True)
*Accumulator for a single HEALPix cell.
Maintains streaming statistics for multiple columns plus optional T-Digest for approximate percentile computation.*
accumulate_batch
accumulate_batch (new_data:pandas.core.frame.DataFrame, sidecar:pandas.core.frame.DataFrame, value_columns:List[str], existing_state:Optional[Dict[int,__main__.CellAccumulator]]=None, use_tdigest:bool=True, filter_expr:Optional[str]=None)
*Process one batch of data and update accumulator state.
Args:
  new_data: DataFrame with observations
  sidecar: HEALPix mapping (source_id -> healpix_id)
  value_columns: Columns to accumulate
  existing_state: Previous accumulator state (None for first batch)
  use_tdigest: Enable T-Digest for approximate percentiles
  filter_expr: Optional pandas query expression to filter data
Returns: Updated state dictionary {healpix_id: CellAccumulator}*
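The pattern of passing the returned state back in as `existing_state` can be illustrated with a simplified, dependency-free sketch. Everything here is an assumption made for illustration: the helper `accumulate_rows` is hypothetical, plain dicts stand in for the DataFrames, each source is assumed to map to a single cell, and only count, sum, min, and max are tracked.

```python
def accumulate_rows(rows, sidecar, value_columns, existing_state=None):
    """Hypothetical helper: fold one batch into per-cell running totals."""
    state = {} if existing_state is None else existing_state
    for row in rows:
        cell = sidecar.get(row["source_id"])
        if cell is None:
            continue  # observation has no HEALPix assignment in the sidecar
        per_cell = state.setdefault(cell, {})
        for col in value_columns:
            value = row[col]
            acc = per_cell.setdefault(
                col, {"n": 0, "sum": 0.0, "min": value, "max": value}
            )
            acc["n"] += 1
            acc["sum"] += value
            acc["min"] = min(acc["min"], value)
            acc["max"] = max(acc["max"], value)
    return state


# source_id -> healpix_id (illustrative values)
sidecar = {1: 100, 2: 100, 3: 205}
batch_1 = [{"source_id": 1, "flux": 1.0}, {"source_id": 3, "flux": 4.0}]
batch_2 = [{"source_id": 2, "flux": 3.0}, {"source_id": 9, "flux": 7.0}]

state = accumulate_rows(batch_1, sidecar, ["flux"])  # first batch
state = accumulate_rows(batch_2, sidecar, ["flux"], existing_state=state)
```

After both calls, cell 100 has absorbed observations from two different batches, while the row with `source_id` 9 is skipped because the sidecar has no mapping for it.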
save_state
save_state (state:Dict[int,__main__.CellAccumulator], output_path:pathlib.Path, meta:healpyxel.metadata.HEALPyxelxMetadata, processing_metadata:Optional[Dict[str,Any]]=None)
*Save accumulator state to parquet with embedded HEALPix metadata.
The parquet file stores nested dictionaries efficiently with validated HEALPix metadata embedded in schema. A .meta.json sidecar provides human-readable processing metadata.
Args:
  state: Dictionary of {healpix_id: CellAccumulator}
  output_path: Path to output state parquet file
  meta: HEALPyxelxMetadata with validated nside, mode, order
  processing_metadata: Optional dict with processing parameters*
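As a rough illustration of the human-readable sidecar idea, the sketch below writes a JSON file next to a state path. The helper `write_meta_sidecar`, the naming rule (replacing the `.parquet` suffix with `.meta.json`), and the `batches` key are all assumptions; the actual file name and contents produced by `save_state` may differ.

```python
import json
import tempfile
from pathlib import Path


def write_meta_sidecar(output_path, processing_metadata):
    """Hypothetical helper: write processing metadata as JSON beside the state file."""
    # Assumed naming rule: state.parquet -> state.meta.json
    meta_path = output_path.with_suffix(".meta.json")
    meta_path.write_text(json.dumps(processing_metadata, indent=2, sort_keys=True))
    return meta_path


with tempfile.TemporaryDirectory() as tmp:
    meta_path = write_meta_sidecar(
        Path(tmp) / "state.parquet",
        {"nside": 64, "mode": "fuzzy", "batches": 3},  # illustrative values
    )
    written_name = meta_path.name
    round_trip = json.loads(meta_path.read_text())
```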
load_state
load_state (input_path:pathlib.Path, use_tdigest:bool=True)
*Load accumulator state and HEALPix metadata from parquet file.
Attempts to load embedded or companion HEALPyxelxMetadata to validate state file consistency.
Args:
  input_path: Path to state parquet file
  use_tdigest: Whether to restore T-Digest data
Returns: Tuple of (state dict, metadata) where metadata may be None if not found
Raises: FileNotFoundError: If state file does not exist*
validate_accumulator_sidecar_compatibility
validate_accumulator_sidecar_compatibility (state_meta:healpyxel.metadata.HEALPyxelxMetadata, sidecar_meta:healpyxel.metadata.HEALPyxelxMetadata)
*Validate that accumulator state is compatible with sidecar file.
Checks that nside, mode, and order match to prevent silent corruption from mixing incompatible files.
Args:
  state_meta: Metadata from loaded state file
  sidecar_meta: Metadata from sidecar file
Returns: dict with validation results
Raises: AssertionError: If critical parameters mismatch*
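The kind of check described above can be sketched as a field-by-field comparison. The `GridMeta` dataclass is a hypothetical stand-in carrying only the three fields named in the docstring, and `check_compatible` is illustrative; the structure of the real result dict is not documented here and is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridMeta:
    """Hypothetical stand-in for the metadata object; only three fields."""
    nside: int
    mode: str
    order: str


def check_compatible(state_meta, sidecar_meta):
    # Compare each critical parameter and record whether it matches.
    results = {
        name: getattr(state_meta, name) == getattr(sidecar_meta, name)
        for name in ("nside", "mode", "order")
    }
    mismatched = [name for name, ok in results.items() if not ok]
    # Fail loudly instead of letting incompatible grids be mixed silently.
    assert not mismatched, f"state/sidecar mismatch in: {', '.join(mismatched)}"
    return results


ok = check_compatible(
    GridMeta(64, "fuzzy", "nested"), GridMeta(64, "fuzzy", "nested")
)

try:
    check_compatible(
        GridMeta(64, "fuzzy", "nested"), GridMeta(128, "fuzzy", "nested")
    )
    error = None
except AssertionError as exc:
    error = str(exc)
```

Running such a check before resuming accumulation catches, for example, a state built at one nside being updated with a sidecar generated at another.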
find_sidecar
find_sidecar (input_path:pathlib.Path, nside:Optional[int]=None, mode:str='fuzzy')
*Attempt to find matching sidecar file for input data.
Args:
  input_path: Path to input parquet file
  nside: Desired nside (if None, finds any matching sidecar)
  mode: Assignment mode ('fuzzy' or 'strict')
Returns: Path to matching sidecar file, or None if not found*
main
main (argv=None)
Usage Example
See the main() function for CLI usage, or import functions directly for programmatic use.