median

median(arr, axis=None, skip_na=True)[source]

Compute median over the time axis of a 1D, 2D, 3D, or 4D array.

Parameters:
  • arr (numpy.ndarray) – Input array.

  • axis (int, optional) – Axis along which to compute the median. If None, the median is computed over the first axis. Currently, this is only supported for 4D arrays.

  • skip_na (bool, optional) – Whether to skip NaN values, by default True. If False, the median of any pixel containing a NaN will be NaN.

Overview

median computes the per‑pixel (or per element) median along the leading time axis of arrays with 1–4 dimensions:

Supported input layouts (time-first): - 1D: (time,) - 2D: (time, band|feature) - 3D: (time, y, x) - 4D: (time, band, y, x)

The output removes the leading time axis and preserves the remaining spatial / band axes: - 1D input → scalar (float64) - 2D input → (band,) - 3D input → (y, x) - 4D input → (band, y, x)

NaN Handling

By default (skip_na=True) NaNs are excluded from the median calculation for each pixel / band. If all values for a pixel along time are NaN (after skipping), the result is NaN. If skip_na=False, any NaN in the time series forces the output at that position to NaN.

Performance Characteristics

The function is implemented in Rust with specialized dispatch for each supported rank: - 1D & 2D: sequential loops (small overhead; no parallelism needed). - 3D & 4D: parallel iteration over spatial indices (rows/columns or band/rows/columns) using rayon. Each pixel’s time series is copied into a temporary vector, optionally filtered for NaNs, sorted, and the median selected (middle element or average of the two middle elements for even-length series).

Memory Notes: - Sorting per pixel allocates a vector of length = time dimension. - There are no large intermediate arrays created; processing is per pixel. - Internal working dtype is float64 (inputs are coerced once for stability).

Parameters

arrnumpy.ndarray

Time-first array of rank 1–4 with any numeric dtype (int/uint/float). Coerced to float64 internally.

skip_nabool, default True

Whether to ignore NaN values in each time series. If False, any NaN in the time series forces the output at that position to NaN.

Returns

numpy.ndarray

Array with time axis removed; dtype float64.

Raises

TypeError

If arr is not a 1D–4D numeric NumPy array after coercion.

ValueError

(Indirectly) for internal shape inconsistencies (rare; indicates malformed input).

Examples

1D (simple series):
>>> import numpy as np
>>> from eo_processor import median
>>> series = np.array([1.0, 5.0, 3.0, 2.0])
>>> median(series)
2.5  # average of (2.0,3.0) after sorting → [1,2,3,5]
2D (time, band):
>>> cube = np.array([[1, 10],
...                  [4,  2],
...                  [3,  8]], dtype=float)
>>> median(cube)
array([3., 8.])  # medians per band
3D (time, y, x) with NaNs:
>>> stack = np.array([
...     [[1.0,  np.nan],
...      [5.0,  4.0]],
...     [[2.0,  7.0],
...      [np.nan, 6.0]],
...     [[3.0,  8.0],
...      [2.0,  5.0]],
... ])
>>> median(stack)  # skip_na=True (default)
array([[2., 7.],
       [3., 5.]])
>>> median(stack, skip_na=False)
array([[2., nan],
       [nan, 5.]])
4D (time, band, y, x):
>>> arr4d = np.random.rand(6, 2, 4, 4)  # 6 time steps
>>> med = median(arr4d)
>>> med.shape
(2, 4, 4)

Comparison to NumPy

NumPy’s np.nanmedian does not directly support a “time-first multi-rank specialized parallel loop”. This implementation: - Avoids Python overhead for per-pixel extraction loops. - Selectively parallelizes only spatial axes (not the time axis) for better cache behavior. Use np.nanmedian(arr, axis=0) for correctness comparison when skip_na=True.

Integration With Compositing

composite(arr, method="median") is a thin wrapper around median. Future compositing methods (e.g., percentiles) will follow the same shape semantics.

Performance

Benchmark (single run; macOS ARM64, CPython 3.10, release build – time.perf_counter):

Median Benchmark (Single Run)

Time Series Shape

Rust (s)

NumPy (s)

Rust Throughput (M elems/s)

NumPy Throughput (M elems/s)

Speedup

(15, 1024, 1024)

0.336

0.507

46.81

31.01

1.51x

(15, 2000, 2000)

1.313

1.947

45.72

30.82

1.48x

Reproduction snippet: .. code-block:: python

import numpy as np, time from eo_processor import median

cube = np.random.rand(15, 2000, 2000) t0 = time.perf_counter(); rust_out = median(cube); rust_t = time.perf_counter() - t0 t0 = time.perf_counter(); np_out = np.nanmedian(cube, axis=0); np_t = time.perf_counter() - t0 print(f”Rust {rust_t:.3f}s vs NumPy {np_t:.3f}s speedup {np_t/rust_t:.2f}x”) assert np.allclose(rust_out, np_out, atol=1e-12)

Interpretation: - Gains come from per-pixel time series extraction, native sort, and parallel spatial iteration. - Very small time dimensions may show smaller or no gains; benchmark your workload. - Sorting cost scales with time axis length; consider temporal downsampling if performance is critical.

Performance Claim Template: .. code-block:: text

Benchmark: Shape: (15, 2000, 2000) NumPy nanmedian: 1.947s Rust median: 1.313s Speedup: 1.48x Methodology: single run, time.perf_counter(), float64, np.allclose(…, atol=1e-12)

Best Practices

  • Ensure the first axis truly represents time; transpose beforehand if needed.

  • For very large time dimensions, consider preprocessing (e.g., masking or downsampling) to reduce per-pixel sort cost.

  • Keep skip_na=True unless specific sentinel NaN propagation semantics are needed.

Edge Cases

  • All NaNs in a pixel (with skip_na=True) → output NaN.

  • Empty time dimension (e.g., shape (0, y, x)) is not supported (will raise).

  • Extremely large time axis sizes may increase sort overhead; a streaming median is not implemented yet.

Version Notes

Document reflects public API as of version 0.6.0. Any change to behavior or added parameters will require a minor version bump per project guidelines.

License

MIT (see repository root).

End of median reference.