# eo-processor
[![PyPI Version](https://img.shields.io/pypi/v/eo-processor.svg?color=blue)](https://pypi.org/project/eo-processor/)
[![PyPI Downloads](https://static.pepy.tech/personalized-badge/eo-processor?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/eo-processor)
[![Coverage](./coverage-badge.svg)](#test-coverage)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

High-performance Rust (PyO3) UDFs for Earth Observation (EO) processing with Python bindings.
Fast spectral indices, temporal statistics, masking utilities, and spatial distance functions.

---

## Documentation 

Full documentation is hosted online: https://bnjam.dev/eo-processor/

The site contains:
- Quick start guides and worked examples (NDVI, EVI, compositing, masking)
- Complete API reference with function signatures, expected input shapes/dtypes, and return types
- CLI usage and examples for batch processing and PNG preview generation
- Tutorials for integrating with XArray / Dask and for building reproducible benchmarks
- Developer & contribution notes (how to add Rust UDFs, register Python bindings, test, and create type stubs)
- Guidance on building from source, running the benchmark harness, and regenerating coverage badges
- Release notes / changelog and citation information

--- 

## Overview

`eo-processor` accelerates common remote sensing computations using safe Rust (no `unsafe`) exposed via PyO3.
All public functions interoperate with NumPy and can be embedded in XArray / Dask pipelines.
Rust kernels release Python's GIL; multi-core parallelism (via Rayon) is leveraged for selected operations (larger temporal aggregations, pairwise distances).

Focus areas:
- Spectral & change-detection indices
- Temporal statistics & median compositing (1D–4D stacks)
- Masking & data quality filtering (value / range / SCL / invalid sentinels)
- Pairwise spatial distances (utility layer)
- Benchmark harness for reproducible performance measurements

---

## Key Features

- Rust-accelerated numerical kernels (float64 internal, stable results)
- Automatic dimensional dispatch (1D / 2D for spectral indices, 1D–4D for temporal/masking)
- Change detection support (ΔNDVI, ΔNBR)
- Flexible masking utilities (exact values, ranges, SCL codes)
- Median, mean, sample standard deviation over time axis
- Pairwise distance functions (Euclidean, Manhattan, Chebyshev, Minkowski)
- Type stubs (`__init__.pyi`) for IDE / mypy
- Benchmark script with optional NumPy baseline comparison
- Pure CPU, no external network or storage side-effects in core path

---

## Installation

### PyPI (standard)

```bash
pip install eo-processor
```

Optional extras for array ecosystem:

```bash
pip install eo-processor[dask]
```

### Using `uv`

```bash
uv venv
source .venv/bin/activate
uv pip install eo-processor
```

### From Source

Requirements:
- Python 3.8+
- Rust toolchain (`rustup` recommended)
- `maturin` for building the extension module

```bash
git clone https://github.com/BnJam/eo-processor.git
cd eo-processor

pip install maturin
maturin develop --release        # build & install in-place
# or wheel:
maturin build --release
pip install target/wheels/*.whl
```

---

## Quick Start

```python
import numpy as np
from eo_processor import ndvi, ndwi, evi, normalized_difference

nir   = np.array([0.8, 0.7, 0.6])
red   = np.array([0.2, 0.1, 0.3])
blue  = np.array([0.1, 0.05, 0.08])
green = np.array([0.35, 0.42, 0.55])

print(ndvi(nir, red))               # NDVI
print(ndwi(green, nir))             # NDWI
print(evi(nir, red, blue))          # EVI
print(normalized_difference(nir, red))
```

All inputs may be any numeric NumPy dtype (int/uint/float); internal coercion to `float64`.

---

## API Summary

| Function | Purpose |
|----------|---------|
| `normalized_difference(a, b)` | Generic normalized difference `(a - b) / (a + b)` with near-zero denominator safeguard |
| `ndvi(nir, red)` | Normalized Difference Vegetation Index |
| `ndwi(green, nir)` | Normalized Difference Water Index |
| `evi(nir, red, blue)` / `enhanced_vegetation_index(...)` | Enhanced Vegetation Index (G*(NIR - Red)/(NIR + C1*Red - C2*Blue + L)) |
| `savi(nir, red, L=0.5)` | Soil Adjusted Vegetation Index `(NIR - Red)/(NIR + Red + L) * (1 + L)` |
| `nbr(nir, swir2)` | Normalized Burn Ratio `(NIR - SWIR2)/(NIR + SWIR2)` |
| `ndmi(nir, swir1)` | Normalized Difference Moisture Index `(NIR - SWIR1)/(NIR + SWIR1)` |
| `nbr2(swir1, swir2)` | Normalized Burn Ratio 2 `(SWIR1 - SWIR2)/(SWIR1 + SWIR2)` |
| `gci(nir, green)` | Green Chlorophyll Index `(NIR / Green) - 1` (division guard) |
| `delta_ndvi(pre_nir, pre_red, post_nir, post_red)` | Change in NDVI `(NDVI_pre - NDVI_post)` |
| `delta_nbr(pre_nir, pre_swir2, post_nir, post_swir2)` | Change in NBR `(NBR_pre - NBR_post)` |
| `median(arr, skip_na=True)` | Temporal median (time axis) with NaN skipping |
| `composite(arr, method="median")` | Compositing convenience (currently median only) |
| `temporal_mean(arr, skip_na=True)` | Mean across time axis |
| `temporal_std(arr, skip_na=True)` | Sample standard deviation (n-1) across time |
| `moving_average_temporal(arr, window, skip_na=True, mode="same")` | Sliding window mean (same/valid edge modes, NaN skip/propagate) |
| `moving_average_temporal_stride(arr, window, stride, skip_na=True, mode="same")` | Strided moving average (downsampled temporal smoothing) |
| `pixelwise_transform(arr, scale=1.0, offset=0.0, clamp_min=None, clamp_max=None)` | Per-pixel linear transform with optional clamping |
| `euclidean_distance(points_a, points_b)` | Pairwise Euclidean distances |
| `manhattan_distance(points_a, points_b)` | Pairwise L1 distances |
| `chebyshev_distance(points_a, points_b)` | Pairwise L∞ distances |
| `minkowski_distance(points_a, points_b, p)` | Pairwise L^p distances (p ≥ 1) |
| `mask_vals(arr, values=None, fill_value=None, nan_to=None)` | Mask exact codes, optional fill & NaN normalization |
| `replace_nans(arr, value)` | Replace all NaNs with `value` |
| `mask_out_range(arr, min_val=None, max_val=None, fill_value=None)` | Mask values outside `[min, max]` |
| `mask_in_range(arr, min_val=None, max_val=None, fill_value=None)` | Mask values inside `[min, max]` |
| `mask_invalid(arr, invalid_values, fill_value=None)` | Mask list of sentinel values (e.g., `0, -9999`) |
| `mask_scl(scl, keep_codes=None, fill_value=None)` | Mask Sentinel‑2 SCL codes, keeping selected classes |

Temporal dimension expectations:
- 1D: `(time,)`
- 2D: `(time, band)`
- 3D: `(time, y, x)`
- 4D: `(time, band, y, x)`

Distance functions: input shape `(N, D)` and `(M, D)` → output `(N, M)` (O(N*M) memory/time).

---

## Spectral & Change Detection Indices

All indices auto-dispatch 1D vs 2D arrays (matching shapes required).

### NDVI
`(NIR - Red) / (NIR + Red)`
Interpretation (approximate):
- < 0: water / snow
- 0.0–0.2: bare soil / built surfaces
- 0.2–0.5: sparse to moderate vegetation
- > 0.5: healthy dense vegetation

### NDWI
`(Green - NIR) / (Green + NIR)`
- > 0.3: open water (often 0.4–0.6)
- 0.0–0.3: moist vegetation / wetlands
- < 0.0: dry vegetation / soil

### EVI
`G * (NIR - Red) / (NIR + C1*Red - C2*Blue + L)` (MODIS constants: G=2.5, C1=6.0, C2=7.5, L=1.0)
Improves sensitivity over high biomass & reduces soil/atmospheric noise vs NDVI.

### SAVI
`(NIR - Red) / (NIR + Red + L) * (1 + L)`
Typical `L=0.5`. Larger `L` for sparse vegetation (bright soil), smaller for dense vegetation.

### NBR
`(NIR - SWIR2) / (NIR + SWIR2)`
Used for burn severity. Compare pre/post via ΔNBR.

### NDMI
`(NIR - SWIR1) / (NIR + SWIR1)`
Moisture / canopy water content indicator.

### NBR2
`(SWIR1 - SWIR2) / (SWIR1 + SWIR2)`
Highlights moisture & thermal differences; complementary to NBR/NDMI.

### GCI
`(NIR / Green) - 1`
Chlorophyll proxy; division by near-zero guarded to avoid instability.

### Change Detection
`ΔNDVI = NDVI_pre - NDVI_post`
`ΔNBR  = NBR_pre  - NBR_post`
Positive ΔNDVI: vegetation loss. Positive ΔNBR: burn severity increase.

---

## Masking Utilities

Rust-accelerated preprocessing helpers for quality filtering.

| Function | Notes |
|----------|-------|
| `mask_vals` | Exact equality masking (codes → `fill_value` or NaN) + optional NaN normalization |
| `replace_nans` | Force all NaNs to a scalar |
| `mask_out_range` | Mask outside interval |
| `mask_in_range` | Mask inside interval |
| `mask_invalid` | Shorthand for common invalid sentinels |
| `mask_scl` | Keep only selected Sentinel‑2 SCL classes |

Example:

```python
import numpy as np
from eo_processor import mask_vals, replace_nans, mask_out_range, mask_scl

scl = np.array([4,5,6,8,9])  # vegetation, vegetation, water, cloud (med), cloud (high)
clear = mask_scl(scl, keep_codes=[4,5,6])   # -> [4., 5., 6., nan, nan]

ndvi = np.array([-0.3, 0.1, 0.8, 1.2])
valid = mask_out_range(ndvi, min_val=-0.2, max_val=1.0)  # -> [nan,0.1,0.8,nan]

arr = np.array([0, 100, -9999, 50])
clean = mask_vals(arr, values=[0, -9999])  # -> [nan,100.,nan,50.]
filled = replace_nans(clean, -9999.0)      # -> [-9999.,100.,-9999.,50.]
```

---

## Temporal Statistics & Compositing

Median, mean, and standard deviation across time axis (skip NaNs optional):

```python
import numpy as np
from eo_processor import temporal_mean, temporal_std, median

cube = np.random.rand(12, 256, 256)  # (time, y, x)
mean_img  = temporal_mean(cube)      # (256, 256)
std_img   = temporal_std(cube)       # (256, 256)
median_img = median(cube)
```

`composite(cube, method="median")` currently routes to `median`.

## Trend Analysis

`eo-processor` provides a simple trend analysis UDF to detect breaks in a time series. This implementation uses a recursive approach, which is more performant than the previous iterative version.

| Function | Purpose |
|----------|---------|
| `trend_analysis(y, threshold)` | Detects breaks in a time series by recursively fitting linear models. |

Example:

```python
import numpy as np
from eo_processor._core import trend_analysis

# Generate some sample time-series data with a break
y = np.concatenate([
    np.linspace(0, 10, 50),
    np.linspace(10, 0, 50)
]) + np.random.normal(0, 0.5, 100)

# Run the trend analysis
segments = trend_analysis(y.tolist(), threshold=5.0)

# Print the results
print("Trend Analysis Results:")
for segment in segments:
    print(
        f"  Start: {segment.start_index}, "
        f"End: {segment.end_index}, "
        f"Slope: {segment.slope:.4f}, "
        f"Intercept: {segment.intercept:.4f}"
    )
```

## Advanced Temporal & Pixelwise Processing

High-performance smoothing and per-pixel transforms for deep temporal stacks and large spatial tiles.

Formulas:
- Moving average: `MA_t = mean(x_{start..end})` where `[start, end]` is the window centered (same) or fixed (valid) around `t`.
- Strided moving average: sample `MA_{k*stride}` for integer `k` to downsample temporal resolution.
- Pixelwise transform: `y = clamp(scale * x + offset)` (clamping optional).

Example (moving average with edge handling and NaN skipping):
```python
from eo_processor import moving_average_temporal
import numpy as np

series = np.array([1.0, 2.0, 3.0, 4.0])
ma_same  = moving_average_temporal(series, window=3, mode="same")   # length preserved
ma_valid = moving_average_temporal(series, window=3, mode="valid")  # only full windows
```

3D temporal cube smoothing (deep stack):
```python
cube = np.random.rand(48, 1024, 1024)
smoothed = moving_average_temporal(cube, window=5, mode="same", skip_na=True)
```

Strided downsampling (reduce temporal resolution):
```python
from eo_processor import moving_average_temporal_stride
downsampled = moving_average_temporal_stride(cube, window=5, stride=4, mode="same")
print(downsampled.shape)  # (ceil(48/4), 1024, 1024)
```

Pixelwise transform (scale + offset + clamping):
```python
from eo_processor import pixelwise_transform
arr = np.random.rand(2048, 2048)
stretched = pixelwise_transform(arr, scale=1.2, offset=-0.1, clamp_min=0.0, clamp_max=1.0)
```

Chaining operations (temporal smoothing then per-pixel adjustment):
```python
ma = moving_average_temporal(cube, window=7)
enhanced = pixelwise_transform(ma, scale=1.1, offset=0.05, clamp_min=0.0, clamp_max=1.0)
```

Performance Notes:
- Prefix-sum approach makes moving average O(T) per pixel independent of window size.
- Parallelization occurs over spatial/band pixels for 3D/4D arrays.
- Strided variant reduces output size for downstream tasks (e.g., model inference, feature extraction).
- Pixelwise transforms are single-pass and can be fused with other operations in custom workflows.

Use Cases:
- Smoothing noisy temporal reflectance or index stacks prior to trend analysis.
- Reducing temporal dimension before ML model training (stride-based smoothing).
- Intensity scaling & clamping for visualization or input normalization.

---

## Spatial Distances

Pairwise distance matrices:

```python
import numpy as np
from eo_processor import euclidean_distance, manhattan_distance

A = np.random.rand(100, 8)  # (N, D)
B = np.random.rand(250, 8)  # (M, D)

dist_e = euclidean_distance(A, B)    # (100, 250)
dist_l1 = manhattan_distance(A, B)
```

For large N*M consider spatial indexing or chunking (not implemented).

---

## XArray / Dask Integration

```python
import dask.array as da
import xarray as xr
from eo_processor import ndvi

nir_dask  = da.random.random((5000, 5000), chunks=(500, 500))
red_dask  = da.random.random((5000, 5000), chunks=(500, 500))

nir_xr = xr.DataArray(nir_dask, dims=["y", "x"])
red_xr = xr.DataArray(red_dask, dims=["y", "x"])

ndvi_xr = xr.apply_ufunc(
    ndvi,
    nir_xr,
    red_xr,
    dask="parallelized",
    output_dtypes=[float],
)

result = ndvi_xr.compute()
```

---

## CLI Usage

Console script exposed as `eo-processor` (installed via PyPI):

```bash
# Single index
eo-processor --index ndvi --nir nir.npy --red red.npy --out ndvi.npy

# Multiple indices (provide necessary bands)
eo-processor --index ndvi savi ndmi nbr --nir nir.npy --red red.npy --swir1 swir1.npy --swir2 swir2.npy --out-dir outputs/

# Change detection (ΔNBR)
eo-processor --index delta_nbr \
  --pre-nir pre/nir.npy --pre-swir2 pre/swir2.npy \
  --post-nir post/nir.npy --post-swir2 post/swir2.npy \
  --out outputs/delta_nbr.npy

# List supported indices
eo-processor --list

# Apply cloud mask (0=cloud, 1=clear)
eo-processor --index ndvi --nir nir.npy --red red.npy --mask cloudmask.npy --out ndvi_masked.npy

# PNG preview (requires optional Pillow)
eo-processor --index ndvi --nir nir.npy --red red.npy --out ndvi.npy --png-preview ndvi.png
```

Selected flags:
- `--savi-l` soil brightness factor for SAVI.
- `--clamp MIN MAX` output range clamping.
- `--allow-missing` skip indices lacking required bands instead of error.

---

## Performance

Example benchmark (NDVI on a large array):

```python
import numpy as np, time
from eo_processor import ndvi

nir = np.random.rand(5000, 5000)
red = np.random.rand(5000, 5000)

t0 = time.time()
rust_out = ndvi(nir, red)
t_rust = time.time() - t0

t0 = time.time()
numpy_out = (nir - red) / (nir + red)
t_numpy = time.time() - t0

print(f"Rust: {t_rust:.3f}s  NumPy: {t_numpy:.3f}s  Speedup: {t_numpy/t_rust:.2f}x")
```

Speedups depend on array shape, memory bandwidth, and CPU cores.
Use the benchmark harness for systematic comparison.

---

## Benchmark Harness

`scripts/benchmark.py` provides grouped tests:

```bash
# Spectral functions (e.g., NDVI, NDWI, EVI, SAVI, NBR, NDMI, NBR2, GCI)
python scripts/benchmark.py --group spectral --height 2048 --width 2048

# Temporal (compare Rust vs NumPy)
python scripts/benchmark.py --group temporal --time 24 --height 1024 --width 1024 --compare-numpy

# Distances
python scripts/benchmark.py --group distances --points-a 2000 --points-b 2000 --point-dim 8

# All groups; write reports
python scripts/benchmark.py --group all --compare-numpy --json-out bench.json --md-out bench.md
```

Key options:
- `--functions <list>` override group selection.
- `--compare-numpy` baseline timings (speedup > 1.0 ⇒ Rust faster).
- `--minkowski-p <p>` set order (p ≥ 1).
- `--loops`, `--warmups` repetition control.
- `--json-out`, `--md-out` artifact outputs.

---

## Test Coverage

Regenerate badge after modifying logic/tests:

```bash
tox -e coverage
python scripts/generate_coverage_badge.py coverage.xml coverage-badge.svg
```

Ensure the badge is committed if coverage changes materially.

---

## Contributing

Follow repository guidelines (`AGENTS.md`, copilot instructions). Checklist before proposing a PR:

1. Implement Rust function(s) (no `unsafe`)
2. Register via `wrap_pyfunction!` in `src/lib.rs`
3. Export in `python/eo_processor/__init__.py`
4. Add type stubs in `python/eo_processor/__init__.pyi`
5. Add tests (`tests/test_<feature>.py`) including edge cases & NaN handling
6. Update README (API Summary, examples, formulas)
7. Run:
   - `cargo fmt`
   - `cargo clippy -- -D warnings`
   - `cargo test` (if Rust tests)
   - `pytest`
   - `tox -e coverage`
   - `ruff` and `mypy` (if configured)
8. Update version if public API added (minor bump)
9. Regenerate coverage badge if changed
10. Confirm no secrets / large binaries staged

Commit message pattern:

```
<type>(scope): concise summary

Optional rationale, benchmarks, references
```

Types: feat, fix, perf, docs, test, chore, build, ci

Example:
```
feat(indices): add Green Chlorophyll Index (GCI)

Implements 1D/2D dispatch, tests, docs, benchmark entry.
```

---

## Semantic Versioning

- Patch: Internal fixes, refactors, docs only
- Minor: New functions (backward-compatible)
- Major: Breaking changes (signature changes, removals)

---

## Roadmap (Indicative)

- Additional spectral indices (future: NBR derivatives, custom moisture composites)
- Sliding window / neighborhood statistics (mean, variance)
- Optional multithread strategies for very large temporal cubes
- Expanded masking (boolean predicate composition)
- Extended change metrics (ΔNDMI, fractional vegetation cover)

(Items requiring strategic design will request human review before implementation.)

---

## Scientific Citation

```bibtex
@software{eo_processor,
  title   = {eo-processor: High-performance Rust UDFs for Earth Observation},
  author  = {Ben Smith},
  year    = {2025},
  url     = {https://github.com/BnJam/eo-processor}
}
```

---

## License

MIT License – see `LICENSE`.

---

## Disclaimer

Core library focuses on computational primitives. It does NOT perform:
- Sensor-specific radiometric calibration
- Atmospheric correction
- CRS reprojection / spatial indexing
- Cloud/shadow detection algorithms beyond simple masking
- Data acquisition / I/O orchestration

Integrate with domain tools (rasterio, xarray, dask, geopandas) for full pipelines.

---

## Support

Open issues for bugs or enhancements. Provide:
- Reproducible snippet
- Input shapes / dtypes
- Expected vs actual output
- Benchmark data (if performance-related)

---


---

## Acknowledgements

Built with PyO3, NumPy, ndarray, and Rayon.
Thanks to the scientific EO community for standardized index formulations.

---

Enjoy fast, reproducible Earth Observation processing!