HEALPix Accumulator

Streaming accumulation with incremental statistics

StreamingStats

 StreamingStats ()

*Container for streaming statistics using Welford’s algorithm.

Maintains running statistics (mean, std, min, max) without storing raw data.*
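To illustrate the idea behind the description above, here is a minimal sketch of Welford's update rule. The class name `WelfordSketch` and its attributes are hypothetical and are not the `StreamingStats` API; the choice of a sample standard deviation (dividing by n - 1) is also an assumption.

```python
import math


class WelfordSketch:
    """Hypothetical stand-in for a streaming-statistics container."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.min = math.inf
        self.max = -math.inf

    def update(self, x):
        # Welford's single-pass update: no raw values are retained.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def std(self):
        # Assumption: sample standard deviation; undefined for fewer than 2 values.
        if self.count < 2:
            return math.nan
        return math.sqrt(self.m2 / (self.count - 1))


stats = WelfordSketch()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)

print(stats.count, stats.mean, stats.min, stats.max)
```

Each update costs constant time and memory, which is what makes per-cell accumulation over many batches practical.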

CellAccumulator

 CellAccumulator (use_tdigest:bool=True)

*Accumulator for a single HEALPix cell.

Maintains streaming statistics for multiple columns plus optional T-Digest for approximate percentile computation.*

accumulate_batch

 accumulate_batch (new_data:pandas.core.frame.DataFrame,
                   sidecar:pandas.core.frame.DataFrame,
                   value_columns:List[str],
                   existing_state:Optional[Dict[int,__main__.CellAccumulator]]=None,
                   use_tdigest:bool=True, filter_expr:Optional[str]=None)

*Process one batch of data and update accumulator state.

Args:

- new_data: DataFrame with observations
- sidecar: HEALPix mapping (source_id -> healpix_id)
- value_columns: Columns to accumulate
- existing_state: Previous accumulator state (None for first batch)
- use_tdigest: Enable T-Digest for approximate percentiles
- filter_expr: Optional pandas query expression to filter data

Returns: Updated state dictionary {healpix_id: CellAccumulator}*
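The batch flow described above (filter, map observations to cells, update per-cell state) can be sketched with plain pandas. This is only an illustration under assumptions: the function name `accumulate_batch_sketch` is hypothetical, the per-cell state is a plain dict of counts and sums rather than `CellAccumulator` objects, and the join columns are assumed to be named `source_id` and `healpix_id`.

```python
import pandas as pd


def accumulate_batch_sketch(new_data, sidecar, value_columns,
                            existing_state=None, filter_expr=None):
    """Hypothetical sketch: update {healpix_id: {column: stats}} from one batch."""
    state = {} if existing_state is None else existing_state
    if filter_expr is not None:
        new_data = new_data.query(filter_expr)
    # Attach a cell id to every observation; rows without a mapping are dropped.
    joined = new_data.merge(sidecar, on="source_id", how="inner")
    for healpix_id, group in joined.groupby("healpix_id"):
        cell = state.setdefault(int(healpix_id), {})
        for col in value_columns:
            values = group[col].dropna()
            if values.empty:
                continue
            stats = cell.setdefault(
                col,
                {"count": 0, "sum": 0.0, "min": float("inf"), "max": float("-inf")},
            )
            stats["count"] += int(values.size)
            stats["sum"] += float(values.sum())
            stats["min"] = min(stats["min"], float(values.min()))
            stats["max"] = max(stats["max"], float(values.max()))
    return state


sidecar = pd.DataFrame({"source_id": [1, 2, 3], "healpix_id": [10, 10, 11]})
batch1 = pd.DataFrame({"source_id": [1, 2, 3], "flux": [1.0, 3.0, 5.0]})
batch2 = pd.DataFrame({"source_id": [1, 3], "flux": [2.0, -1.0]})

# First batch starts from an empty state; the second reuses it and applies a filter.
state = accumulate_batch_sketch(batch1, sidecar, ["flux"])
state = accumulate_batch_sketch(batch2, sidecar, ["flux"],
                                existing_state=state, filter_expr="flux > 0")
print(state)
```

Passing the returned dictionary back in as `existing_state` is what lets successive batches build on each other without rereading earlier data.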

save_state

 save_state (state:Dict[int,__main__.CellAccumulator],
             output_path:pathlib.Path,
             meta:healpyxel.metadata.HEALPyxelxMetadata,
             processing_metadata:Optional[Dict[str,Any]]=None)

*Save accumulator state to parquet with embedded HEALPix metadata.

The Parquet file stores the nested per-cell dictionaries, with validated HEALPix metadata embedded in its schema. A companion .meta.json sidecar holds human-readable processing metadata.

Args:

- state: Dictionary of {healpix_id: CellAccumulator}
- output_path: Path to output state parquet file
- meta: HEALPyxelxMetadata with validated nside, mode, order
- processing_metadata: Optional dict with processing parameters*

load_state

 load_state (input_path:pathlib.Path, use_tdigest:bool=True)

*Load accumulator state and HEALPix metadata from parquet file.

Attempts to load embedded or companion HEALPyxelxMetadata to validate state file consistency.

Args:

- input_path: Path to state parquet file
- use_tdigest: Whether to restore T-Digest data

Returns: Tuple of (state dict, metadata) where metadata may be None if not found

Raises: FileNotFoundError: If state file does not exist*
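The companion-metadata behaviour described for saving and loading (a JSON file next to the state file, `None` when it is absent, `FileNotFoundError` when the state file itself is missing) can be sketched with the standard library alone. The helper names and the `state.parquet` to `state.meta.json` naming rule are assumptions made for illustration; the library's actual file layout may differ, and no Parquet data is read or written here.

```python
import json
import tempfile
from pathlib import Path


def write_meta_sidecar(state_path, processing_metadata):
    # Assumed naming rule: state.parquet -> state.meta.json
    meta_path = state_path.with_suffix(".meta.json")
    meta_path.write_text(json.dumps(processing_metadata, indent=2, sort_keys=True))
    return meta_path


def read_meta_sidecar(state_path):
    if not state_path.exists():
        raise FileNotFoundError(f"State file not found: {state_path}")
    meta_path = state_path.with_suffix(".meta.json")
    if not meta_path.exists():
        return None  # metadata is optional; the caller decides how to proceed
    return json.loads(meta_path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "state.parquet"
    state_path.write_bytes(b"")  # placeholder standing in for a real state file
    missing = read_meta_sidecar(state_path)
    write_meta_sidecar(state_path, {"nside": 64, "mode": "fuzzy"})
    loaded = read_meta_sidecar(state_path)

print(missing, loaded)
```

Returning `None` rather than raising keeps older state files without metadata loadable, at the cost of skipping the consistency check.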

validate_accumulator_sidecar_compatibility

 validate_accumulator_sidecar_compatibility
                    (state_meta:healpyxel.metadata.HEALPyxelxMetadata,
                     sidecar_meta:healpyxel.metadata.HEALPyxelxMetadata)

*Validate that accumulator state is compatible with sidecar file.

Checks that nside, mode, and order match to prevent silent corruption from mixing incompatible files.

Args:

- state_meta: Metadata from loaded state file
- sidecar_meta: Metadata from sidecar file

Returns: dict with validation results

Raises: AssertionError: If critical parameters mismatch*
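The check described above can be sketched as a field-by-field comparison. `MetaSketch` is a hypothetical stand-in for the real metadata class, the attribute types are assumed, and the layout of the returned dictionary is invented for illustration; the source only states that a dict of validation results is returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaSketch:
    """Hypothetical stand-in carrying only the three compared fields."""
    nside: int
    mode: str
    order: str


def validate_compat_sketch(state_meta, sidecar_meta):
    results = {}
    for field in ("nside", "mode", "order"):
        a = getattr(state_meta, field)
        b = getattr(sidecar_meta, field)
        results[field] = {"state": a, "sidecar": b, "match": a == b}
    mismatched = [name for name, r in results.items() if not r["match"]]
    if mismatched:
        # Raised explicitly so the check survives `python -O`, which strips assert statements.
        raise AssertionError(f"Incompatible state and sidecar: {', '.join(mismatched)}")
    return results


state_meta = MetaSketch(nside=64, mode="fuzzy", order="nested")
good = validate_compat_sketch(state_meta, MetaSketch(64, "fuzzy", "nested"))

try:
    validate_compat_sketch(state_meta, MetaSketch(128, "fuzzy", "nested"))
    error_message = None
except AssertionError as exc:
    error_message = str(exc)

print(good["nside"]["match"], error_message)
```

Failing loudly on a mismatch matters because cell ids from different nside values or orderings are still valid integers, so mixing them would not raise any error on its own.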

find_sidecar

 find_sidecar (input_path:pathlib.Path, nside:Optional[int]=None,
               mode:str='fuzzy')

*Attempt to find matching sidecar file for input data.

Args:

- input_path: Path to input parquet file
- nside: Desired nside (if None, finds any matching sidecar)
- mode: Assignment mode ('fuzzy' or 'strict')

Returns: Path to matching sidecar file, or None if not found*

main

 main (argv=None)

Usage Example

See the main() function for CLI usage, or import functions directly for programmatic use.