median¶

median(arr, axis=None, skip_na=True)[source]¶

Compute median over the time axis of a 1D, 2D, 3D, or 4D array.

Parameters:

arr (numpy.ndarray) – Input array.
axis (int, optional) – Axis along which to compute the median. If None, the median is computed over the first axis. Currently, this is only supported for 4D arrays.
skip_na (bool, optional) – Whether to skip NaN values, by default True. If False, the median of any pixel containing a NaN will be NaN.

Overview¶

median computes the per‑pixel (or per element) median along the leading time axis of arrays with 1–4 dimensions:

Supported input layouts (time-first): - 1D: (time,) - 2D: (time, band|feature) - 3D: (time, y, x) - 4D: (time, band, y, x)

The output removes the leading time axis and preserves the remaining spatial / band axes: - 1D input → scalar (float64) - 2D input → (band,) - 3D input → (y, x) - 4D input → (band, y, x)

NaN Handling¶

By default (skip_na=True) NaNs are excluded from the median calculation for each pixel / band. If all values for a pixel along time are NaN (after skipping), the result is NaN. If skip_na=False, any NaN in the time series forces the output at that position to NaN.

Performance Characteristics¶

The function is implemented in Rust with specialized dispatch for each supported rank: - 1D & 2D: sequential loops (small overhead; no parallelism needed). - 3D & 4D: parallel iteration over spatial indices (rows/columns or band/rows/columns) using rayon. Each pixel’s time series is copied into a temporary vector, optionally filtered for NaNs, sorted, and the median selected (middle element or average of the two middle elements for even-length series).

Memory Notes: - Sorting per pixel allocates a vector of length = time dimension. - There are no large intermediate arrays created; processing is per pixel. - Internal working dtype is float64 (inputs are coerced once for stability).

Parameters¶

arrnumpy.ndarray: Time-first array of rank 1–4 with any numeric dtype (int/uint/float). Coerced to float64 internally.
skip_nabool, default True: Whether to ignore NaN values in each time series. If False, any NaN in the time series forces the output at that position to NaN.

Returns¶

numpy.ndarray: Array with time axis removed; dtype float64.

Raises¶

TypeError: If arr is not a 1D–4D numeric NumPy array after coercion.
ValueError: (Indirectly) for internal shape inconsistencies (rare; indicates malformed input).

Examples¶

1D (simple series):

>>> import numpy as np
>>> from eo_processor import median
>>> series = np.array([1.0, 5.0, 3.0, 2.0])
>>> median(series)
2.5  # average of (2.0,3.0) after sorting → [1,2,3,5]

2D (time, band):

>>> cube = np.array([[1, 10],
...                  [4,  2],
...                  [3,  8]], dtype=float)
>>> median(cube)
array([3., 8.])  # medians per band

3D (time, y, x) with NaNs:

>>> stack = np.array([
...     [[1.0,  np.nan],
...      [5.0,  4.0]],
...     [[2.0,  7.0],
...      [np.nan, 6.0]],
...     [[3.0,  8.0],
...      [2.0,  5.0]],
... ])
>>> median(stack)  # skip_na=True (default)
array([[2., 7.],
       [3., 5.]])
>>> median(stack, skip_na=False)
array([[2., nan],
       [nan, 5.]])

4D (time, band, y, x):

>>> arr4d = np.random.rand(6, 2, 4, 4)  # 6 time steps
>>> med = median(arr4d)
>>> med.shape
(2, 4, 4)

Comparison to NumPy¶

NumPy’s np.nanmedian does not directly support a “time-first multi-rank specialized parallel loop”. This implementation: - Avoids Python overhead for per-pixel extraction loops. - Selectively parallelizes only spatial axes (not the time axis) for better cache behavior. Use np.nanmedian(arr, axis=0) for correctness comparison when skip_na=True.

Integration With Compositing¶

composite(arr, method="median") is a thin wrapper around median. Future compositing methods (e.g., percentiles) will follow the same shape semantics.

Performance¶

Benchmark (single run; macOS ARM64, CPython 3.10, release build – time.perf_counter):

Median Benchmark (Single Run)¶
Time Series Shape	Rust (s)	NumPy (s)	Rust Throughput (M elems/s)	NumPy Throughput (M elems/s)	Speedup
(15, 1024, 1024)	0.336	0.507	46.81	31.01	1.51x
(15, 2000, 2000)	1.313	1.947	45.72	30.82	1.48x

Reproduction snippet: .. code-block:: python

import numpy as np, time from eo_processor import median

cube = np.random.rand(15, 2000, 2000) t0 = time.perf_counter(); rust_out = median(cube); rust_t = time.perf_counter() - t0 t0 = time.perf_counter(); np_out = np.nanmedian(cube, axis=0); np_t = time.perf_counter() - t0 print(f”Rust {rust_t:.3f}s vs NumPy {np_t:.3f}s speedup {np_t/rust_t:.2f}x”) assert np.allclose(rust_out, np_out, atol=1e-12)

Interpretation: - Gains come from per-pixel time series extraction, native sort, and parallel spatial iteration. - Very small time dimensions may show smaller or no gains; benchmark your workload. - Sorting cost scales with time axis length; consider temporal downsampling if performance is critical.

Performance Claim Template: .. code-block:: text

Benchmark: Shape: (15, 2000, 2000) NumPy nanmedian: 1.947s Rust median: 1.313s Speedup: 1.48x Methodology: single run, time.perf_counter(), float64, np.allclose(…, atol=1e-12)

Best Practices¶

Ensure the first axis truly represents time; transpose beforehand if needed.
For very large time dimensions, consider preprocessing (e.g., masking or downsampling) to reduce per-pixel sort cost.
Keep skip_na=True unless specific sentinel NaN propagation semantics are needed.

Edge Cases¶

All NaNs in a pixel (with skip_na=True) → output NaN.
Empty time dimension (e.g., shape (0, y, x)) is not supported (will raise).
Extremely large time axis sizes may increase sort overhead; a streaming median is not implemented yet.

Version Notes¶

Document reflects public API as of version 0.6.0. Any change to behavior or added parameters will require a minor version bump per project guidelines.

License¶

MIT (see repository root).

End of median reference.