.. _median:

median
======

.. currentmodule:: eo_processor

.. autofunction:: median

Overview
--------
`median` computes the per‑pixel (or per element) median along the leading *time* axis
of arrays with 1–4 dimensions:

Supported input layouts (time-first):
- 1D: (time,)
- 2D: (time, band|feature)
- 3D: (time, y, x)
- 4D: (time, band, y, x)

The output removes the leading time axis and preserves the remaining spatial / band axes:
- 1D input → scalar (float64)
- 2D input → (band,)
- 3D input → (y, x)
- 4D input → (band, y, x)

NaN Handling
------------
By default (`skip_na=True`) NaNs are excluded from the median calculation for each pixel / band.
If all values for a pixel along time are NaN (after skipping), the result is NaN.
If `skip_na=False`, any NaN in the time series forces the output at that position to NaN.

Performance Characteristics
---------------------------
The function is implemented in Rust with specialized dispatch for each supported rank:
- 1D & 2D: sequential loops (small overhead; no parallelism needed).
- 3D & 4D: parallel iteration over spatial indices (rows/columns or band/rows/columns) using rayon.
Each pixel’s time series is copied into a temporary vector, optionally filtered for NaNs, sorted,
and the median selected (middle element or average of the two middle elements for even-length series).

Memory Notes:
- Sorting per pixel allocates a vector of length = time dimension.
- There are no large intermediate arrays created; processing is per pixel.
- Internal working dtype is float64 (inputs are coerced once for stability).

Parameters
----------
arr : numpy.ndarray
    Time-first array of rank 1–4 with any numeric dtype (int/uint/float). Coerced to float64 internally.
skip_na : bool, default True
    Whether to ignore NaN values in each time series. If False, any NaN in the time series forces the output at that position to NaN.

Returns
-------
numpy.ndarray
    Array with time axis removed; dtype float64.

Raises
------
TypeError
    If `arr` is not a 1D–4D numeric NumPy array after coercion.
ValueError
    (Indirectly) for internal shape inconsistencies (rare; indicates malformed input).

Examples
--------
1D (simple series):
    >>> import numpy as np
    >>> from eo_processor import median
    >>> series = np.array([1.0, 5.0, 3.0, 2.0])
    >>> median(series)
    2.5  # average of (2.0,3.0) after sorting → [1,2,3,5]

2D (time, band):
    >>> cube = np.array([[1, 10],
    ...                  [4,  2],
    ...                  [3,  8]], dtype=float)
    >>> median(cube)
    array([3., 8.])  # medians per band

3D (time, y, x) with NaNs:
    >>> stack = np.array([
    ...     [[1.0,  np.nan],
    ...      [5.0,  4.0]],
    ...     [[2.0,  7.0],
    ...      [np.nan, 6.0]],
    ...     [[3.0,  8.0],
    ...      [2.0,  5.0]],
    ... ])
    >>> median(stack)  # skip_na=True (default)
    array([[2., 7.],
           [3., 5.]])
    >>> median(stack, skip_na=False)
    array([[2., nan],
           [nan, 5.]])

4D (time, band, y, x):
    >>> arr4d = np.random.rand(6, 2, 4, 4)  # 6 time steps
    >>> med = median(arr4d)
    >>> med.shape
    (2, 4, 4)

Comparison to NumPy
-------------------
NumPy’s `np.nanmedian` does not directly support a “time-first multi-rank specialized parallel loop”.
This implementation:
- Avoids Python overhead for per-pixel extraction loops.
- Selectively parallelizes only spatial axes (not the time axis) for better cache behavior.
Use `np.nanmedian(arr, axis=0)` for correctness comparison when `skip_na=True`.

Integration With Compositing
----------------------------
`composite(arr, method="median")` is a thin wrapper around `median`. Future compositing
methods (e.g., percentiles) will follow the same shape semantics.

Performance
-----------
Benchmark (single run; macOS ARM64, CPython 3.10, release build – time.perf_counter):

.. list-table:: Median Benchmark (Single Run)
   :header-rows: 1

   * - Time Series Shape
     - Rust (s)
     - NumPy (s)
     - Rust Throughput (M elems/s)
     - NumPy Throughput (M elems/s)
     - Speedup
   * - (15, 1024, 1024)
     - 0.336
     - 0.507
     - 46.81
     - 31.01
     - 1.51x
   * - (15, 2000, 2000)
     - 1.313
     - 1.947
     - 45.72
     - 30.82
     - 1.48x

Reproduction snippet:
.. code-block:: python

    import numpy as np, time
    from eo_processor import median

    cube = np.random.rand(15, 2000, 2000)
    t0 = time.perf_counter(); rust_out = median(cube); rust_t = time.perf_counter() - t0
    t0 = time.perf_counter(); np_out = np.nanmedian(cube, axis=0); np_t = time.perf_counter() - t0
    print(f"Rust {rust_t:.3f}s vs NumPy {np_t:.3f}s speedup {np_t/rust_t:.2f}x")
    assert np.allclose(rust_out, np_out, atol=1e-12)

Interpretation:
- Gains come from per-pixel time series extraction, native sort, and parallel spatial iteration.
- Very small time dimensions may show smaller or no gains; benchmark your workload.
- Sorting cost scales with time axis length; consider temporal downsampling if performance is critical.

Performance Claim Template:
.. code-block:: text

    Benchmark:
    Shape: (15, 2000, 2000)
    NumPy nanmedian: 1.947s
    Rust median: 1.313s
    Speedup: 1.48x
    Methodology: single run, time.perf_counter(), float64, np.allclose(..., atol=1e-12)

Related Functions
-----------------
- :func:`temporal_mean` – Mean along time axis
- :func:`temporal_std` – Sample standard deviation (ddof=1)
- :func:`composite` – Convenience dispatcher (currently median only)

Best Practices
--------------
- Ensure the first axis truly represents time; transpose beforehand if needed.
- For very large time dimensions, consider preprocessing (e.g., masking or downsampling) to reduce per-pixel sort cost.
- Keep `skip_na=True` unless specific sentinel NaN propagation semantics are needed.

Edge Cases
----------
- All NaNs in a pixel (with `skip_na=True`) → output NaN.
- Empty time dimension (e.g., shape (0, y, x)) is not supported (will raise).
- Extremely large time axis sizes may increase sort overhead; a streaming median is not implemented yet.

Version Notes
-------------
Document reflects public API as of version 0.6.0. Any change to behavior or added parameters will require a minor version bump per project guidelines.

License
-------
MIT (see repository root).

End of median reference.