median¶
- median(arr, axis=None, skip_na=True)[source]¶
Compute median over the time axis of a 1D, 2D, 3D, or 4D array.
- Parameters:
arr (
numpy.ndarray) – Input array.axis (
int, optional) – Axis along which to compute the median. If None, the median is computed over the first axis. Currently, this is only supported for 4D arrays.skip_na (
bool, optional) – Whether to skip NaN values, by default True. If False, the median of any pixel containing a NaN will be NaN.
Overview¶
median computes the per‑pixel (or per element) median along the leading time axis
of arrays with 1–4 dimensions:
Supported input layouts (time-first): - 1D: (time,) - 2D: (time, band|feature) - 3D: (time, y, x) - 4D: (time, band, y, x)
The output removes the leading time axis and preserves the remaining spatial / band axes: - 1D input → scalar (float64) - 2D input → (band,) - 3D input → (y, x) - 4D input → (band, y, x)
NaN Handling¶
By default (skip_na=True) NaNs are excluded from the median calculation for each pixel / band.
If all values for a pixel along time are NaN (after skipping), the result is NaN.
If skip_na=False, any NaN in the time series forces the output at that position to NaN.
Performance Characteristics¶
The function is implemented in Rust with specialized dispatch for each supported rank: - 1D & 2D: sequential loops (small overhead; no parallelism needed). - 3D & 4D: parallel iteration over spatial indices (rows/columns or band/rows/columns) using rayon. Each pixel’s time series is copied into a temporary vector, optionally filtered for NaNs, sorted, and the median selected (middle element or average of the two middle elements for even-length series).
Memory Notes: - Sorting per pixel allocates a vector of length = time dimension. - There are no large intermediate arrays created; processing is per pixel. - Internal working dtype is float64 (inputs are coerced once for stability).
Parameters¶
- arrnumpy.ndarray
Time-first array of rank 1–4 with any numeric dtype (int/uint/float). Coerced to float64 internally.
- skip_nabool, default True
Whether to ignore NaN values in each time series. If False, any NaN in the time series forces the output at that position to NaN.
Returns¶
- numpy.ndarray
Array with time axis removed; dtype float64.
Raises¶
- TypeError
If
arris not a 1D–4D numeric NumPy array after coercion.- ValueError
(Indirectly) for internal shape inconsistencies (rare; indicates malformed input).
Examples¶
- 1D (simple series):
>>> import numpy as np >>> from eo_processor import median >>> series = np.array([1.0, 5.0, 3.0, 2.0]) >>> median(series) 2.5 # average of (2.0,3.0) after sorting → [1,2,3,5]
- 2D (time, band):
>>> cube = np.array([[1, 10], ... [4, 2], ... [3, 8]], dtype=float) >>> median(cube) array([3., 8.]) # medians per band
- 3D (time, y, x) with NaNs:
>>> stack = np.array([ ... [[1.0, np.nan], ... [5.0, 4.0]], ... [[2.0, 7.0], ... [np.nan, 6.0]], ... [[3.0, 8.0], ... [2.0, 5.0]], ... ]) >>> median(stack) # skip_na=True (default) array([[2., 7.], [3., 5.]]) >>> median(stack, skip_na=False) array([[2., nan], [nan, 5.]])
- 4D (time, band, y, x):
>>> arr4d = np.random.rand(6, 2, 4, 4) # 6 time steps >>> med = median(arr4d) >>> med.shape (2, 4, 4)
Comparison to NumPy¶
NumPy’s np.nanmedian does not directly support a “time-first multi-rank specialized parallel loop”.
This implementation:
- Avoids Python overhead for per-pixel extraction loops.
- Selectively parallelizes only spatial axes (not the time axis) for better cache behavior.
Use np.nanmedian(arr, axis=0) for correctness comparison when skip_na=True.
Integration With Compositing¶
composite(arr, method="median") is a thin wrapper around median. Future compositing
methods (e.g., percentiles) will follow the same shape semantics.
Performance¶
Benchmark (single run; macOS ARM64, CPython 3.10, release build – time.perf_counter):
Time Series Shape |
Rust (s) |
NumPy (s) |
Rust Throughput (M elems/s) |
NumPy Throughput (M elems/s) |
Speedup |
|---|---|---|---|---|---|
(15, 1024, 1024) |
0.336 |
0.507 |
46.81 |
31.01 |
1.51x |
(15, 2000, 2000) |
1.313 |
1.947 |
45.72 |
30.82 |
1.48x |
Reproduction snippet: .. code-block:: python
import numpy as np, time from eo_processor import median
cube = np.random.rand(15, 2000, 2000) t0 = time.perf_counter(); rust_out = median(cube); rust_t = time.perf_counter() - t0 t0 = time.perf_counter(); np_out = np.nanmedian(cube, axis=0); np_t = time.perf_counter() - t0 print(f”Rust {rust_t:.3f}s vs NumPy {np_t:.3f}s speedup {np_t/rust_t:.2f}x”) assert np.allclose(rust_out, np_out, atol=1e-12)
Interpretation: - Gains come from per-pixel time series extraction, native sort, and parallel spatial iteration. - Very small time dimensions may show smaller or no gains; benchmark your workload. - Sorting cost scales with time axis length; consider temporal downsampling if performance is critical.
Performance Claim Template: .. code-block:: text
Benchmark: Shape: (15, 2000, 2000) NumPy nanmedian: 1.947s Rust median: 1.313s Speedup: 1.48x Methodology: single run, time.perf_counter(), float64, np.allclose(…, atol=1e-12)
Best Practices¶
Ensure the first axis truly represents time; transpose beforehand if needed.
For very large time dimensions, consider preprocessing (e.g., masking or downsampling) to reduce per-pixel sort cost.
Keep
skip_na=Trueunless specific sentinel NaN propagation semantics are needed.
Edge Cases¶
All NaNs in a pixel (with
skip_na=True) → output NaN.Empty time dimension (e.g., shape (0, y, x)) is not supported (will raise).
Extremely large time axis sizes may increase sort overhead; a streaming median is not implemented yet.
Version Notes¶
Document reflects public API as of version 0.6.0. Any change to behavior or added parameters will require a minor version bump per project guidelines.
License¶
MIT (see repository root).
End of median reference.