eo-processor: Rust-Accelerated EO Processing from Python
Most Earth Observation workflows are “just array math”… until the arrays are 10k×10k, the time
axis is 60+ scenes, and your pipeline starts spending more time in overhead than in actual
science. eo-processor is my attempt to keep the ergonomics of Python while pushing
the hot loops into safe, parallel Rust.
What eo-processor is (and isn’t)
At its core, eo-processor is a collection of deterministic numerical kernels that
operate on NumPy arrays:
- Spectral and change-detection indices (NDVI, NDWI, NBR, ΔNDVI, …)
- Temporal stats and compositing over stacks (mean/median/std/sum, moving averages)
- Masking utilities (ranges, sentinel values, Sentinel‑2 SCL, NaN replacement)
- Spatial utilities like pairwise distances
It’s intentionally not a full geospatial pipeline engine: there’s no reprojection, no I/O orchestration, and no data acquisition. The goal is to be the fast “math layer” you can drop into rasterio / xarray / dask pipelines.
Why Rust here?
NumPy is great, but once you start composing many operations, some workloads run into limits that look like Python call overhead, temporary-array churn, and GIL contention. Rust lets me:
- Run tight loops with predictable memory behavior
- Release the GIL inside kernels so Dask/XArray can parallelize cleanly
- Use multi-core CPU parallelism (Rayon) where it actually helps
- Keep the core memory-safe (no unsafe blocks in public kernels)
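To make the second and third points concrete, here is a minimal sketch of what a kernel following this pattern can look like. It is illustrative, not the library's actual source: the name ndvi_kernel and the 1D-only signature are assumptions, and it targets the pyo3/rust-numpy GIL-ref API.

use numpy::{PyArray1, PyReadonlyArray1};
use pyo3::prelude::*;
use rayon::prelude::*;

// Hypothetical kernel sketch: take NumPy arrays, release the GIL,
// and let Rayon spread the per-pixel loop across cores.
#[pyfunction]
fn ndvi_kernel<'py>(
    py: Python<'py>,
    nir: PyReadonlyArray1<'py, f64>,
    red: PyReadonlyArray1<'py, f64>,
) -> PyResult<&'py PyArray1<f64>> {
    let nir = nir.as_slice()?;
    let red = red.as_slice()?;
    // Nothing inside allow_threads touches Python objects, so other
    // Python threads (e.g. Dask workers) keep running while we compute.
    let out: Vec<f64> = py.allow_threads(|| {
        nir.par_iter()
            .zip(red.par_iter())
            .map(|(&n, &r)| (n - r) / (n + r))
            .collect()
    });
    Ok(PyArray1::from_vec(py, out))
}

(The real kernels also guard against near-zero denominators; more on that below.)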
Hybrid architecture: PyO3 + maturin
The package is a standard Python wheel, but the implementation lives in a Rust
cdylib. PyO3 exposes Rust functions as CPython-callable entry points, and
maturin builds and packages everything.
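As a schematic (the module and function names here are made up, not eo-processor's actual ones), the binding layer amounts to a few lines per kernel:

use pyo3::prelude::*;

// Illustrative only: a Rust function marked #[pyfunction] becomes
// callable from CPython once registered in the #[pymodule].
#[pyfunction]
fn scale(values: Vec<f64>, factor: f64) -> Vec<f64> {
    values.into_iter().map(|v| v * factor).collect()
}

// maturin compiles this crate as a cdylib and places the resulting
// extension module inside the wheel.
#[pymodule]
fn _native(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(scale, m)?)?;
    Ok(())
}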
For users, it installs like any other wheel:

pip install eo-processor
When developing locally, the quickest loop is maturin develop, which builds the extension and installs it into the active virtual environment so you can import it from Python right away:
pip install maturin
maturin develop --release
API design: small, composable primitives
I biased toward small kernels that do one thing, take arrays, and return arrays. That keeps the
surface area easy to test and makes it straightforward to use with xarray.apply_ufunc.
import numpy as np
import xarray as xr
from eo_processor import ndvi
nir = xr.DataArray(np.random.rand(2048, 2048), dims=["y", "x"])
red = xr.DataArray(np.random.rand(2048, 2048), dims=["y", "x"])
ndvi_xr = xr.apply_ufunc(
    ndvi,
    nir,
    red,
    dask="parallelized",
    output_dtypes=[float],
)
Dimensional dispatch and numerical guardrails
EO data shows up as 1D time series, 2D images, or 3D/4D stacks. A lot of the library is about taking the same intent (e.g., “compute temporal median”) and doing the right thing across those shapes while staying explicit about expectations.
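One way to picture the shape-generic side (a sketch using ndarray's dynamic-dimension views; temporal_mean is a hypothetical stand-in, not the library's signature):

use ndarray::{ArrayD, ArrayViewD, Axis};

// Sketch: one "temporal mean" intent, generic over N-d stacks by
// always reducing the leading (time) axis, so a (t,) series and a
// (t, y, x) cube go through the same code path.
fn temporal_mean(stack: ArrayViewD<'_, f64>) -> ArrayD<f64> {
    stack
        .mean_axis(Axis(0))
        .expect("stack must have at least one scene along axis 0")
}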
For normalized-difference style indices, which have the form (a − b) / (a + b), the implementation includes a near-zero denominator safeguard so you don't get accidental infinities when a + b ≈ 0. Inputs accept any numeric NumPy dtype and are coerced to float64 in Rust for stable computation.
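The guard itself is conceptually tiny. Here is a sketch of the idea; the exact threshold and the NaN fallback are my assumptions, not documented behavior:

// Near-zero-denominator guard for normalized-difference indices
// (a - b) / (a + b): return NaN instead of +/-inf when a + b is
// effectively zero. EPS is an assumed threshold.
const EPS: f64 = 1e-10;

fn normalized_difference(a: f64, b: f64) -> f64 {
    let denom = a + b;
    if denom.abs() < EPS {
        f64::NAN
    } else {
        (a - b) / denom
    }
}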
A small CLI for batch computation
For quick experiments or glue in batch workflows, there’s a CLI that computes one or more
indices from .npy band arrays, with optional masking and a PNG quicklook.
# list supported indices
eo-processor --list
# compute NDVI
eo-processor --index ndvi --nir nir.npy --red red.npy --out ndvi.npy
# compute multiple indices into a directory
eo-processor --index ndvi ndmi nbr --nir nir.npy --red red.npy --swir1 swir1.npy --swir2 swir2.npy --out-dir outputs/
Tests, stubs, and benchmarks
The project is intentionally “boring” from a maintenance standpoint:
- Python tests (pytest) cover correctness and edge cases like NaNs and shape mismatches
- Type stubs (__init__.pyi) keep the public API pleasant in IDEs
- A benchmark harness can compare Rust kernels against NumPy baselines
The main discipline is keeping Rust and Python layers in sync: add the Rust function, register it in the module, export it in Python, update stubs, add tests, and update docs.
What’s next
I want to keep adding high-value primitives (more indices, better temporal tooling, and more masking helpers) without turning the library into a monolith. If you have a real-world EO workload that’s bottlenecked on a specific operation, that’s usually a good candidate for a new kernel.