eo-processor

Badges: PyPI version, PyPI downloads, test coverage, MIT license.

High-performance Rust (PyO3) UDFs for Earth Observation (EO) processing with Python bindings. Fast spectral indices, temporal statistics, masking utilities, and spatial distance functions.


Documentation

Full documentation is hosted online: https://bnjam.dev/eo-processor/

The site contains:

  • Quick start guides and worked examples (NDVI, EVI, compositing, masking)

  • Complete API reference with function signatures, expected input shapes/dtypes, and return types

  • CLI usage and examples for batch processing and PNG preview generation

  • Tutorials for integrating with XArray / Dask and for building reproducible benchmarks

  • Developer & contribution notes (how to add Rust UDFs, register Python bindings, test, and create type stubs)

  • Guidance on building from source, running the benchmark harness, and regenerating coverage badges

  • Release notes / changelog and citation information


Overview

eo-processor accelerates common remote sensing computations using safe Rust (no unsafe) exposed via PyO3. All public functions interoperate with NumPy and can be embedded in XArray / Dask pipelines. Rust kernels release Python’s GIL; multi-core parallelism (via Rayon) is leveraged for selected operations (larger temporal aggregations, pairwise distances).

Focus areas:

  • Spectral & change-detection indices

  • Temporal statistics & median compositing (1D–4D stacks)

  • Masking & data quality filtering (value / range / SCL / invalid sentinels)

  • Pairwise spatial distances (utility layer)

  • Benchmark harness for reproducible performance measurements


Key Features

  • Rust-accelerated numerical kernels (float64 internal, stable results)

  • Automatic dimensional dispatch (1D / 2D for spectral indices, 1D–4D for temporal/masking)

  • Change detection support (ΔNDVI, ΔNBR)

  • Flexible masking utilities (exact values, ranges, SCL codes)

  • Median, mean, sample standard deviation over time axis

  • Pairwise distance functions (Euclidean, Manhattan, Chebyshev, Minkowski)

  • Type stubs (__init__.pyi) for IDE / mypy

  • Benchmark script with optional NumPy baseline comparison

  • Pure CPU, no external network or storage side-effects in core path


Installation

PyPI (standard)

pip install eo-processor

Optional extras for array ecosystem:

pip install eo-processor[dask]

Using uv

uv venv
source .venv/bin/activate
uv pip install eo-processor

From Source

Requirements:

  • Python 3.8+

  • Rust toolchain (rustup recommended)

  • maturin for building the extension module

git clone https://github.com/BnJam/eo-processor.git
cd eo-processor

pip install maturin
maturin develop --release        # build & install in-place
# or wheel:
maturin build --release
pip install target/wheels/*.whl

Quick Start

import numpy as np
from eo_processor import ndvi, ndwi, evi, normalized_difference

nir   = np.array([0.8, 0.7, 0.6])
red   = np.array([0.2, 0.1, 0.3])
blue  = np.array([0.1, 0.05, 0.08])
green = np.array([0.35, 0.42, 0.55])

print(ndvi(nir, red))               # NDVI
print(ndwi(green, nir))             # NDWI
print(evi(nir, red, blue))          # EVI
print(normalized_difference(nir, red))

All inputs may be any numeric NumPy dtype (int/uint/float); values are coerced to float64 internally.
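For instance, scaled integer reflectance bands can be passed directly and the result comes back as float64 (a minimal sketch; the DN values below are arbitrary):

import numpy as np
from eo_processor import ndvi

nir_dn = np.array([4000, 3500, 3000], dtype=np.uint16)  # arbitrary scaled DNs
red_dn = np.array([1000, 1200, 1500], dtype=np.uint16)

out = ndvi(nir_dn, red_dn)
print(out.dtype)  # float64 (inputs coerced internally)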


API Summary

  • normalized_difference(a, b): Generic normalized difference (a - b) / (a + b) with near-zero denominator safeguard

  • ndvi(nir, red): Normalized Difference Vegetation Index

  • ndwi(green, nir): Normalized Difference Water Index

  • evi(nir, red, blue) / enhanced_vegetation_index(...): Enhanced Vegetation Index G*(NIR - Red)/(NIR + C1*Red - C2*Blue + L)

  • savi(nir, red, L=0.5): Soil Adjusted Vegetation Index (NIR - Red)/(NIR + Red + L) * (1 + L)

  • nbr(nir, swir2): Normalized Burn Ratio (NIR - SWIR2)/(NIR + SWIR2)

  • ndmi(nir, swir1): Normalized Difference Moisture Index (NIR - SWIR1)/(NIR + SWIR1)

  • nbr2(swir1, swir2): Normalized Burn Ratio 2 (SWIR1 - SWIR2)/(SWIR1 + SWIR2)

  • gci(nir, green): Green Chlorophyll Index (NIR / Green) - 1 (division guard)

  • delta_ndvi(pre_nir, pre_red, post_nir, post_red): Change in NDVI (NDVI_pre - NDVI_post)

  • delta_nbr(pre_nir, pre_swir2, post_nir, post_swir2): Change in NBR (NBR_pre - NBR_post)

  • median(arr, skip_na=True): Temporal median (time axis) with NaN skipping

  • composite(arr, method="median"): Compositing convenience (currently median only)

  • temporal_mean(arr, skip_na=True): Mean across time axis

  • temporal_std(arr, skip_na=True): Sample standard deviation (n-1) across time

  • moving_average_temporal(arr, window, skip_na=True, mode="same"): Sliding-window mean (same/valid edge modes, NaN skip/propagate)

  • moving_average_temporal_stride(arr, window, stride, skip_na=True, mode="same"): Strided moving average (downsampled temporal smoothing)

  • pixelwise_transform(arr, scale=1.0, offset=0.0, clamp_min=None, clamp_max=None): Per-pixel linear transform with optional clamping

  • euclidean_distance(points_a, points_b): Pairwise Euclidean distances

  • manhattan_distance(points_a, points_b): Pairwise L1 distances

  • chebyshev_distance(points_a, points_b): Pairwise L∞ distances

  • minkowski_distance(points_a, points_b, p): Pairwise L^p distances (p ≥ 1)

  • mask_vals(arr, values=None, fill_value=None, nan_to=None): Mask exact codes, optional fill & NaN normalization

  • replace_nans(arr, value): Replace all NaNs with value

  • mask_out_range(arr, min_val=None, max_val=None, fill_value=None): Mask values outside [min, max]

  • mask_in_range(arr, min_val=None, max_val=None, fill_value=None): Mask values inside [min, max]

  • mask_invalid(arr, invalid_values, fill_value=None): Mask a list of sentinel values (e.g., 0, -9999)

  • mask_scl(scl, keep_codes=None, fill_value=None): Mask Sentinel‑2 SCL codes, keeping selected classes

Temporal dimension expectations:

  • 1D: (time,)

  • 2D: (time, band)

  • 3D: (time, y, x)

  • 4D: (time, band, y, x)

Distance functions: input shape (N, D) and (M, D) → output (N, M) (O(N*M) memory/time).
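As a quick sketch of the 4D convention, a (time, band, y, x) stack is reduced along the time axis (array sizes below are arbitrary):

import numpy as np
from eo_processor import median

stack = np.random.rand(12, 4, 256, 256)   # (time, band, y, x)
comp = median(stack)                      # time axis reduced
print(comp.shape)                         # (4, 256, 256)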


Spectral & Change Detection Indices

All indices auto-dispatch 1D vs 2D arrays (matching shapes required).

NDVI

(NIR - Red) / (NIR + Red)

Interpretation (approximate):

  • < 0: water / snow

  • 0.0–0.2: bare soil / built surfaces

  • 0.2–0.5: sparse to moderate vegetation

  • > 0.5: healthy dense vegetation
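These thresholds are approximate, but they make a convenient quick-look classification; the cut-offs in this sketch mirror the list above and are illustrative, not part of the library:

import numpy as np
from eo_processor import ndvi

nir = np.random.rand(512, 512)
red = np.random.rand(512, 512)
v = ndvi(nir, red)

water_or_snow      = v < 0.0
bare_or_built      = (v >= 0.0) & (v < 0.2)
sparse_to_moderate = (v >= 0.2) & (v <= 0.5)
dense_vegetation   = v > 0.5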

NDWI

(Green - NIR) / (Green + NIR)

  • > 0.3: open water (often 0.4–0.6)

  • 0.0–0.3: moist vegetation / wetlands

  • < 0.0: dry vegetation / soil

EVI

G * (NIR - Red) / (NIR + C1*Red - C2*Blue + L), with MODIS constants G=2.5, C1=6.0, C2=7.5, L=1.0.

Improves sensitivity in high-biomass regions and reduces soil background and atmospheric noise compared with NDVI.

SAVI

(NIR - Red) / (NIR + Red + L) * (1 + L)

Typical L=0.5. Larger L for sparse vegetation over bright soil, smaller L for dense vegetation.
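A short usage sketch (the L keyword follows the signature in the API Summary; the two L values are illustrative):

import numpy as np
from eo_processor import savi

nir = np.random.rand(256, 256)
red = np.random.rand(256, 256)

sparse_canopy = savi(nir, red, L=1.0)   # brighter soil background
dense_canopy  = savi(nir, red, L=0.25)  # mostly closed canopy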

NBR

(NIR - SWIR2) / (NIR + SWIR2)

Used for burn severity. Compare pre/post via ΔNBR.

NDMI

(NIR - SWIR1) / (NIR + SWIR1)

Moisture / canopy water content indicator.

NBR2

(SWIR1 - SWIR2) / (SWIR1 + SWIR2)

Highlights water sensitivity in vegetation; complementary to NBR/NDMI.

GCI

(NIR / Green) - 1

Chlorophyll proxy; division by near-zero values is guarded to avoid instability.

Change Detection

ΔNDVI = NDVI_pre - NDVI_post
ΔNBR  = NBR_pre  - NBR_post

Positive ΔNDVI indicates vegetation loss; positive ΔNBR indicates increased burn severity.
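In Python, the change-detection helpers take the pre/post bands directly (a minimal sketch using random data in place of real acquisitions):

import numpy as np
from eo_processor import delta_ndvi, delta_nbr

shape = (512, 512)
pre_nir,   post_nir   = np.random.rand(*shape), np.random.rand(*shape)
pre_red,   post_red   = np.random.rand(*shape), np.random.rand(*shape)
pre_swir2, post_swir2 = np.random.rand(*shape), np.random.rand(*shape)

dndvi = delta_ndvi(pre_nir, pre_red, post_nir, post_red)     # NDVI_pre - NDVI_post
dnbr  = delta_nbr(pre_nir, pre_swir2, post_nir, post_swir2)  # NBR_pre - NBR_post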


Masking Utilities

Rust-accelerated preprocessing helpers for quality filtering.

  • mask_vals: Exact equality masking (codes → fill_value or NaN) + optional NaN normalization

  • replace_nans: Force all NaNs to a scalar

  • mask_out_range: Mask outside interval

  • mask_in_range: Mask inside interval

  • mask_invalid: Shorthand for common invalid sentinels

  • mask_scl: Keep only selected Sentinel‑2 SCL classes

Example:

import numpy as np
from eo_processor import mask_vals, replace_nans, mask_out_range, mask_scl

scl = np.array([4,5,6,8,9])  # vegetation, not vegetated, water, cloud (med), cloud (high)
clear = mask_scl(scl, keep_codes=[4,5,6])   # -> [4., 5., 6., nan, nan]

ndvi = np.array([-0.3, 0.1, 0.8, 1.2])
valid = mask_out_range(ndvi, min_val=-0.2, max_val=1.0)  # -> [nan,0.1,0.8,nan]

arr = np.array([0, 100, -9999, 50])
clean = mask_vals(arr, values=[0, -9999])  # -> [nan,100.,nan,50.]
filled = replace_nans(clean, -9999.0)      # -> [-9999.,100.,-9999.,50.]

Temporal Statistics & Compositing

Median, mean, and standard deviation across time axis (skip NaNs optional):

import numpy as np
from eo_processor import temporal_mean, temporal_std, median

cube = np.random.rand(12, 256, 256)  # (time, y, x)
mean_img  = temporal_mean(cube)      # (256, 256)
std_img   = temporal_std(cube)       # (256, 256)
median_img = median(cube)

composite(cube, method="median") currently routes to median.

Trend Analysis

eo-processor provides a simple trend-analysis UDF for detecting breaks in a time series. The implementation recursively fits linear segments to the series; this recursive approach is faster than the earlier iterative version.

  • trend_analysis(y, threshold): Detects breaks in a time series by recursively fitting linear models.

Example:

import numpy as np
from eo_processor._core import trend_analysis

# Generate some sample time-series data with a break
y = np.concatenate([
    np.linspace(0, 10, 50),
    np.linspace(10, 0, 50)
]) + np.random.normal(0, 0.5, 100)

# Run the trend analysis
segments = trend_analysis(y.tolist(), threshold=5.0)

# Print the results
print("Trend Analysis Results:")
for segment in segments:
    print(
        f"  Start: {segment.start_index}, "
        f"End: {segment.end_index}, "
        f"Slope: {segment.slope:.4f}, "
        f"Intercept: {segment.intercept:.4f}"
    )

Advanced Temporal & Pixelwise Processing

High-performance smoothing and per-pixel transforms for deep temporal stacks and large spatial tiles.

Formulas:

  • Moving average: MA_t = mean(x_{start..end}), where [start, end] is the window around t; "same" mode centers the window and preserves the series length, while "valid" mode keeps only fully contained windows.

  • Strided moving average: sample MA_{k*stride} for integer k to downsample temporal resolution.

  • Pixelwise transform: y = clamp(scale * x + offset) (clamping optional).

Example (moving average with edge handling and NaN skipping):

from eo_processor import moving_average_temporal
import numpy as np

series = np.array([1.0, 2.0, 3.0, 4.0])
ma_same  = moving_average_temporal(series, window=3, mode="same")   # length preserved
ma_valid = moving_average_temporal(series, window=3, mode="valid")  # only full windows

3D temporal cube smoothing (deep stack):

cube = np.random.rand(48, 1024, 1024)
smoothed = moving_average_temporal(cube, window=5, mode="same", skip_na=True)

Strided downsampling (reduce temporal resolution):

from eo_processor import moving_average_temporal_stride
downsampled = moving_average_temporal_stride(cube, window=5, stride=4, mode="same")
print(downsampled.shape)  # (12, 1024, 1024), i.e. (ceil(48/4), 1024, 1024)

Pixelwise transform (scale + offset + clamping):

from eo_processor import pixelwise_transform
arr = np.random.rand(2048, 2048)
stretched = pixelwise_transform(arr, scale=1.2, offset=-0.1, clamp_min=0.0, clamp_max=1.0)

Chaining operations (temporal smoothing then per-pixel adjustment):

ma = moving_average_temporal(cube, window=7)
enhanced = pixelwise_transform(ma, scale=1.1, offset=0.05, clamp_min=0.0, clamp_max=1.0)

Performance Notes:

  • Prefix-sum approach makes the moving average O(T) per pixel, independent of window size (see the sketch after this list).

  • Parallelization occurs over spatial/band pixels for 3D/4D arrays.

  • Strided variant reduces output size for downstream tasks (e.g., model inference, feature extraction).

  • Pixelwise transforms are single-pass and can be fused with other operations in custom workflows.
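For intuition, the prefix-sum idea behind the O(T) moving average can be sketched in plain NumPy; this is illustrative only, and the Rust kernel's edge and NaN handling may differ:

import numpy as np

def moving_average_prefix_sum(x, window):
    # One cumulative sum, then each window mean is a difference of two entries: O(T), not O(T * window).
    c = np.concatenate(([0.0], np.cumsum(x, dtype=np.float64)))
    return (c[window:] - c[:-window]) / window

x = np.arange(10, dtype=np.float64)
print(moving_average_prefix_sum(x, 3))  # matches np.convolve(x, np.ones(3) / 3, mode="valid")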

Use Cases:

  • Smoothing noisy temporal reflectance or index stacks prior to trend analysis.

  • Reducing temporal dimension before ML model training (stride-based smoothing).

  • Intensity scaling & clamping for visualization or input normalization.


Spatial Distances

Pairwise distance matrices:

import numpy as np
from eo_processor import euclidean_distance, manhattan_distance

A = np.random.rand(100, 8)  # (N, D)
B = np.random.rand(250, 8)  # (M, D)

dist_e = euclidean_distance(A, B)    # (100, 250)
dist_l1 = manhattan_distance(A, B)
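The remaining metrics follow the same call pattern; a quick sketch reusing A and B (p is passed as the third argument, per the API Summary):

from eo_processor import chebyshev_distance, minkowski_distance

dist_linf = chebyshev_distance(A, B)       # (100, 250), max per-dimension difference
dist_p3   = minkowski_distance(A, B, 3.0)  # (100, 250), L^3 norm (p >= 1)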

For large N*M problems, consider chunking or spatial indexing upstream (not provided by this library).


XArray / Dask Integration

import dask.array as da
import xarray as xr
from eo_processor import ndvi

nir_dask  = da.random.random((5000, 5000), chunks=(500, 500))
red_dask  = da.random.random((5000, 5000), chunks=(500, 500))

nir_xr = xr.DataArray(nir_dask, dims=["y", "x"])
red_xr = xr.DataArray(red_dask, dims=["y", "x"])

ndvi_xr = xr.apply_ufunc(
    ndvi,
    nir_xr,
    red_xr,
    dask="parallelized",
    output_dtypes=[float],
)

result = ndvi_xr.compute()

CLI Usage

A console script named eo-processor is installed with the package:

# Single index
eo-processor --index ndvi --nir nir.npy --red red.npy --out ndvi.npy

# Multiple indices (provide necessary bands)
eo-processor --index ndvi savi ndmi nbr --nir nir.npy --red red.npy --swir1 swir1.npy --swir2 swir2.npy --out-dir outputs/

# Change detection (ΔNBR)
eo-processor --index delta_nbr \
  --pre-nir pre/nir.npy --pre-swir2 pre/swir2.npy \
  --post-nir post/nir.npy --post-swir2 post/swir2.npy \
  --out outputs/delta_nbr.npy

# List supported indices
eo-processor --list

# Apply cloud mask (0=cloud, 1=clear)
eo-processor --index ndvi --nir nir.npy --red red.npy --mask cloudmask.npy --out ndvi_masked.npy

# PNG preview (requires optional Pillow)
eo-processor --index ndvi --nir nir.npy --red red.npy --out ndvi.npy --png-preview ndvi.png

Selected flags:

  • --savi-l soil brightness factor for SAVI.

  • --clamp MIN MAX output range clamping.

  • --allow-missing skip indices lacking required bands instead of error.


Performance

Example benchmark (NDVI on a large array):

import numpy as np, time
from eo_processor import ndvi

nir = np.random.rand(5000, 5000)
red = np.random.rand(5000, 5000)

t0 = time.time()
rust_out = ndvi(nir, red)
t_rust = time.time() - t0

t0 = time.time()
numpy_out = (nir - red) / (nir + red)
t_numpy = time.time() - t0

print(f"Rust: {t_rust:.3f}s  NumPy: {t_numpy:.3f}s  Speedup: {t_numpy/t_rust:.2f}x")

Speedups depend on array shape, memory bandwidth, and CPU cores. Use the benchmark harness for systematic comparison.
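Because the kernels release the GIL (see Overview), plain Python threads can also process independent tiles concurrently; a minimal sketch, with tile size and worker count chosen arbitrarily:

import numpy as np
from concurrent.futures import ThreadPoolExecutor
from eo_processor import ndvi

nir = np.random.rand(4096, 4096)
red = np.random.rand(4096, 4096)

# Row-wise tiles; each ndvi call releases the GIL, so threads can overlap.
bounds = [(i, i + 512) for i in range(0, nir.shape[0], 512)]
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(lambda b: ndvi(nir[b[0]:b[1]], red[b[0]:b[1]]), bounds))
ndvi_full = np.vstack(parts)  # (4096, 4096)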


Benchmark Harness

scripts/benchmark.py provides grouped tests:

# Spectral functions (e.g., NDVI, NDWI, EVI, SAVI, NBR, NDMI, NBR2, GCI)
python scripts/benchmark.py --group spectral --height 2048 --width 2048

# Temporal (compare Rust vs NumPy)
python scripts/benchmark.py --group temporal --time 24 --height 1024 --width 1024 --compare-numpy

# Distances
python scripts/benchmark.py --group distances --points-a 2000 --points-b 2000 --point-dim 8

# All groups; write reports
python scripts/benchmark.py --group all --compare-numpy --json-out bench.json --md-out bench.md

Key options:

  • --functions <list> override group selection.

  • --compare-numpy baseline timings (speedup > 1.0 ⇒ Rust faster).

  • --minkowski-p <p> set order (p ≥ 1).

  • --loops, --warmups repetition control.

  • --json-out, --md-out artifact outputs.


Test Coverage

Regenerate badge after modifying logic/tests:

tox -e coverage
python scripts/generate_coverage_badge.py coverage.xml coverage-badge.svg

Ensure the badge is committed if coverage changes materially.


Contributing

Follow repository guidelines (AGENTS.md, copilot instructions). Checklist before proposing a PR:

  1. Implement Rust function(s) (no unsafe)

  2. Register via wrap_pyfunction! in src/lib.rs

  3. Export in python/eo_processor/__init__.py

  4. Add type stubs in python/eo_processor/__init__.pyi

  5. Add tests (tests/test_<feature>.py) including edge cases & NaN handling (see the sketch after this checklist)

  6. Update README (API Summary, examples, formulas)

  7. Run:

    • cargo fmt

    • cargo clippy -- -D warnings

    • cargo test (if Rust tests)

    • pytest

    • tox -e coverage

    • ruff and mypy (if configured)

  8. Update version if public API added (minor bump)

  9. Regenerate coverage badge if changed

  10. Confirm no secrets / large binaries staged
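As an illustration of step 5, a minimal test file might look like the following; the file name and the float64-output assertion are assumptions based on the Key Features notes:

# tests/test_ndvi.py (illustrative)
import numpy as np
from eo_processor import ndvi

def test_ndvi_matches_numpy_reference():
    nir = np.array([0.8, 0.6, 0.4])
    red = np.array([0.2, 0.3, 0.4])
    expected = (nir - red) / (nir + red)
    np.testing.assert_allclose(ndvi(nir, red), expected)

def test_ndvi_coerces_integer_inputs_to_float64():
    nir = np.array([4000, 3000], dtype=np.uint16)
    red = np.array([1000, 1500], dtype=np.uint16)
    assert ndvi(nir, red).dtype == np.float64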

Commit message pattern:

<type>(scope): concise summary

Optional rationale, benchmarks, references

Types: feat, fix, perf, docs, test, chore, build, ci

Example:

feat(indices): add Green Chlorophyll Index (GCI)

Implements 1D/2D dispatch, tests, docs, benchmark entry.

Semantic Versioning

  • Patch: Internal fixes, refactors, docs only

  • Minor: New functions (backward-compatible)

  • Major: Breaking changes (signature changes, removals)


Roadmap (Indicative)

  • Additional spectral indices (future: NBR derivatives, custom moisture composites)

  • Sliding window / neighborhood statistics (mean, variance)

  • Optional multithread strategies for very large temporal cubes

  • Expanded masking (boolean predicate composition)

  • Extended change metrics (ΔNDMI, fractional vegetation cover)

(Items requiring strategic design will request human review before implementation.)


Scientific Citation

@software{eo_processor,
  title   = {eo-processor: High-performance Rust UDFs for Earth Observation},
  author  = {Ben Smith},
  year    = {2025},
  url     = {https://github.com/BnJam/eo-processor}
}

License

MIT License – see LICENSE.


Disclaimer

Core library focuses on computational primitives. It does NOT perform:

  • Sensor-specific radiometric calibration

  • Atmospheric correction

  • CRS reprojection / spatial indexing

  • Cloud/shadow detection algorithms beyond simple masking

  • Data acquisition / I/O orchestration

Integrate with domain tools (rasterio, xarray, dask, geopandas) for full pipelines.


Support

Open issues for bugs or enhancements. Provide:

  • Reproducible snippet

  • Input shapes / dtypes

  • Expected vs actual output

  • Benchmark data (if performance-related)



Acknowledgements

Built with PyO3, NumPy, ndarray, and Rayon. Thanks to the scientific EO community for standardized index formulations.


Enjoy fast, reproducible Earth Observation processing!