Architecture & Performance Model

Overview

eo-processor accelerates Earth Observation (EO) numeric primitives by implementing user‑defined functions (UDFs) in Rust and exposing them to Python via PyO3. This page details the architectural choices that yield consistent speedups over pure Python / NumPy workflows while preserving safety, determinism, and API clarity.

Design Objectives

  1. Eliminate Python GIL contention for CPU-bound loops.

  2. Provide dimensional dispatch (1D–4D) without Python-level branching overhead.

  3. Preserve numerical stability (uniform float64 internal type + epsilon guards).

  4. Minimize intermediate allocations (single-pass or fused operations where feasible).

  5. Ensure compile-time memory safety (no unsafe blocks).

  6. Fit seamlessly into XArray / Dask pipelines as ufunc-like callables.

  7. Keep error messages explicit (shape mismatches, invalid parameter values).

  8. Offer predictable scaling thresholds for parallel execution (avoid micro-task overhead).

Rust Extension Boundary

The compiled module (eo_processor._core) is a native extension created by maturin. Key steps at build/import time:

  1. Rust crates (ndarray, numpy, rayon, pyo3) compile into a shared object.

  2. Functions annotated with #[pyfunction] are registered inside the #[pymodule] fn _core(...).

  3. Python entrypoints in python/eo_processor/__init__.py provide thin wrappers (and aliases like evi).

  4. Autodoc and type stub (__init__.pyi) expose signatures to editors and static type checkers.

Dimensional Dispatch

Each computational family (indices, temporal, spatial, masking) inspects the input array rank inside Rust:

  • Spectral indices: support 1D or 2D (and some 3D/4D variants for batch/time-first layouts).

  • Temporal statistics: 1D–4D; interpret leading axis as time.

  • Median compositing: specialized 1D–4D implementations to avoid generalized recursion.

  • Masking utilities: unified loops over arbitrary rank (explicit 1D–4D branches).

This avoids repeated Python if arr.ndim == ... checks and reduces interpreter overhead.

Numerical Stability & Coercion

All incoming numeric dtypes (int, uint, float32, float64) are coerced to f64 once. Benefits:

  • Single arithmetic path (no branching for dtype).

  • Consistent floating precision for downstream calculations.

  • Simplifies NaN handling (uses IEEE-754 semantics uniformly).

Small denominators in normalized difference style formulas are guarded with an EPSILON constant to prevent division blow-ups; near-zero denominators default to 0 or neutral outcomes as documented in docstrings.

Parallel Execution Strategy

Selective Rayon usage:

  • Temporal loops: parallel over spatial indices (pixels) after slicing time series.

  • Pairwise distances: parallel over outer dimension only when N * M exceeds a threshold.

  • Median/Std/Mean: parallel chunking for 3D / 4D arrays.

Why thresholds? Aggressive parallelism on small arrays causes overhead that erodes gains. Threshold tuning (e.g., 10_000 distance pair cutoff) can be adjusted with empirical benchmarking.

Memory Efficiency

Common Python/NumPy pattern:

(nir - red) / (nir + red)

Creates two temporaries unless optimized; Rust versions combine operations in a single loop. Similarly, ΔNDVI and ΔNBR compute pre+post indices in fused passes (one set of ephemeral values). Median computations limit series extraction to exact pixel slices, sorting compact vectors.

NaN Handling

Temporal and median functions optionally skip_na=True:

  • Filter NaNs before statistical aggregation.

  • If all samples are NaN, result is NaN (propagate emptiness semantics).

  • Avoids NumPy masked array overhead and branch-heavy Python loops.

Error Handling

Rust raises Python exceptions with explicit context:

  • Shape mismatch (e.g., spectral band arrays differ in shape).

  • Invalid parameter (e.g., Minkowski p < 1.0).

  • Unsupported dimensionality (non 1D–4D arrays trigger TypeError).

  • SCL masking receives invalid codes gracefully; missing code sets output to NaN.

Performance Comparative Summary

Category | Rust UDF Advantage | Fairness Caveat —————— | —————— | ————— Spectral Indices | Fused arithmetic + GIL release reduce temporaries. | NumPy already vectorized; gains typically modest (≤1.3×) unless very large tiles. Temporal Stats | Parallel spatial reduction (Rayon) amortizes time axis iteration. | Small grids may see overhead; benchmark your T,Y,X before adopting. Median Composite | Native per‑pixel nth‑selection (planned) + parallel iteration. | Current implementation still sorts entire series; optimization ongoing. Pairwise Distances | Streaming algorithm avoids allocating N×M×D broadcasts. | A “broadcast” NumPy baseline does extra work; compare both broadcast & streaming baselines. Masking Utilities | Single pass with minimal Python overhead. | If mask condition is simple, NumPy boolean indexing can match performance.

Important: A headline speedup >2× usually indicates an algorithmic difference (e.g., avoiding large broadcasts) rather than lower constant factors alone. Always verify with size sweeps.

Integration with XArray / Dask

Use xarray.apply_ufunc with dask="parallelized":

import xarray as xr
import dask.array as da
from eo_processor import ndvi

nir = xr.DataArray(da.random.random((6000, 6000), chunks=(750, 750)), dims=["y", "x"])
red = xr.DataArray(da.random.random((6000, 6000), chunks=(750, 750)), dims=["y", "x"])

ndvi_xr = xr.apply_ufunc(
    ndvi,
    nir,
    red,
    dask="parallelized",
    output_dtypes=[float],
)
out = ndvi_xr.compute()

The Rust function executes inside each Dask worker on chunk data without holding the GIL. For large temporal stacks (time-first), chunk on spatial dimensions to maximize per-task parallelism.

Extensibility Guidelines

When adding a new index or transform:

  1. Implement in a domain-specific module (e.g., indices.rs).

  2. Provide dimensional dispatch for relevant rank(s).

  3. Add input shape assertions early.

  4. Add parallelization only if O(N) or O(N*M) complexity is high enough.

  5. Update Python exports + stubs + README + Sphinx autosummary list.

  6. Add tests (shape mismatch, NaN scenario, typical value range).

  7. Benchmark if claiming performance improvement (>20% threshold preferred).

Testing & Validation

Rust unit tests validate core arithmetic and shape behavior. Python tests (in tests/) confirm wrapper semantics and dtype coercion. Coverage badge should be regenerated after logic additions.

Representative Benchmark Template & Empirical Results

Template

import numpy as np, time
from eo_processor import temporal_mean

cube = np.random.rand(24, 1024, 1024)  # (time, y, x)
t0 = time.perf_counter()
rust_mean = temporal_mean(cube)
rust_t = time.perf_counter() - t0

t0 = time.perf_counter()
numpy_mean = np.nanmean(cube, axis=0)
numpy_t = time.perf_counter() - t0

print(f"Rust {rust_t:.3f}s vs NumPy {numpy_t:.3f}s (speedup {numpy_t / rust_t:.2f}x)")
assert np.allclose(rust_mean, numpy_mean, atol=1e-12)

Methodology

  • Platform: macOS (ARM64), CPython 3.10, release build of the Rust extension.

  • Timings: single run wall-clock using time.perf_counter(), warm cache.

  • Dtypes: float64 inputs (internal coercion matches typical use).

  • Validation: np.allclose(…) with tight tolerances (1e-12–1e-9).

  • Disclaimer: Results vary with CPU topology, memory bandwidth, and array shapes. Always benchmark on your target workload before making pipeline decisions.

Representative Results

Large-array benchmarks (single run; no warm repeats):

Function

Input Shape

Rust (s)

NumPy (s)

Speedup

ndvi median (temporal) temporal_mean euclidean_distance

(5000, 5000) (15, 2000, 2000) (24, 2000, 2000) (3000, 64) × (3000, 64)

0.080 1.313 0.891 0.063

0.112 1.947 0.474 0.041

1.40x 1.48x 0.53x 0.65x

Interpretation

  • Fused Index Arithmetic: ndvi shows a clear gain from single-pass fused math + GIL release.

  • Temporal Median: Sorting per pixel in native code + parallel spatial iteration yields >1.4× improvement versus NumPy’s nanmedian for this shape.

  • Temporal Mean / Std: For medium-sized grids Rust may be slower if parallel thresholds trigger overhead. Larger time dimensions or spatial extents generally improve relative performance; tune chunking (e.g., via Dask) for best results.

  • Pairwise Distances: For moderately sized matrices NumPy’s optimized BLAS-backed matrix multiply strategy can outperform the current Rust implementation. Rust gains appear for larger N×M where outer-loop parallelization amortizes overhead.

Guidance

  • Use Rust spectral indices (ndvi, nbr, etc.) for production-scale tile processing.

  • Prefer the Rust median for large temporal stacks when robustness to outliers matters.

  • Benchmark temporal_mean / temporal_std on your exact sizes; consider increasing spatial dimensions or enabling chunked execution for better scaling.

  • For very large pairwise distance workloads (> millions of pairs) the Rust implementation can scale with additional cores; otherwise NumPy’s vectorized BLAS approach may suffice.

Performance Claim Template

When adding new optimized functions, follow the required documentation pattern:

Benchmark:
Array size: 5000 x 5000
Old: 1.42s
New: 1.05s
Speedup: 1.35x

Methodology:
Single run, warm cache discarded
Timing via time.perf_counter()
Validation: np.allclose(new, old, atol=1e-12)

Keep empirical claims conservative and reproducible. Attach code snippets or reference a test harness (noting environment) for transparency.

Key Differences from Pure NumPy

  • Fewer temporaries in multi-step formulas.

  • Parallelism not limited by GIL.

  • Consistent float64 internal representation.

  • Explicit shape checks produce clearer early errors than silent broadcasting misalignments.

  • Integrated NaN filtering in statistical functions (no separate boolean masks required).

Future Optimization Opportunities

(Subject to human review per repository guidelines)

  • Adaptive parallel thresholds informed by runtime profiling.

  • Optional streaming median (reducing per-pixel vector allocation for long time series).

  • SIMD acceleration (requires careful evaluation; staying within safe Rust & avoiding unsafe).

  • Advanced mask composition DSL (predicate fusion to reduce data passes).

Security & Safety Notes

  • No unsafe code blocks in kernel implementations.

  • No network or file I/O in the performance-critical path.

  • All external inputs constrained to NumPy array interfaces; memory safety enforced by borrow semantics.

Reference of Core Modules

Module | Purpose ————- | ——————————————– indices | Spectral & change detection indices (NDVI, NDWI, SAVI, EVI, NBR, NDMI, NBR2, GCI, ΔNDVI, ΔNBR) temporal | Time-axis statistics (mean, std) with NaN skipping spatial | Median compositing + pairwise distance calculations masking | Value/range/sentinel/SCL masking utilities lib.rs | PyO3 module registration (bridge to Python)

Rust Acceleration Note

The high-level rationale for Rust UDF acceleration (GIL release, dimensional dispatch, float64 coercion, parallel strategy) is also summarized on the main index page. For convenience:

Summary

  • GIL-free native loops enable true multi-core throughput.

  • Float64 coercion yields stable, predictable numerics.

  • Selective Rayon parallelism avoids overhead on small workloads.

  • Fused loop bodies minimize intermediate allocations.

  • Dimension-aware implementations remove Python branching overhead.

  • No unsafe; memory safety guaranteed by compiler.

Cross-References

License

MIT. See repository root for full text.

Benchmark Report (Generated)

The following benchmark summary (generated via scripts/benchmark.py) is included for up-to-date performance measurements across spectral indices and related functions:

Meta

Python: 3.10.18 Platform: macOS-15.6.1-arm64-arm-64bit Group: all Functions: ndvi, ndwi, evi, savi, nbr, ndmi, nbr2, gci, delta_ndvi, delta_nbr, normalized_difference, temporal_mean, temporal_std, median, euclidean_distance, manhattan_distance, chebyshev_distance, minkowski_distance Loops: 5 Warmups: 1 Seed: 42 Compare NumPy: True Height: 2000 Width: 2000 Time: 24 Points A: 3000 Points B: 3000 Point Dim: 64

Results

+=======================+===========+============+==========+==========+============+=============================+==============================+==================+======================+ | Function | Mean (ms) | StDev (ms) | Min (ms) | Max (ms) | Elements | Rust Throughput (M elems/s) | NumPy Throughput (M elems/s) | Speedup vs NumPy | Shape | +=======================+===========+============+==========+==========+============+=============================+==============================+==================+======================+ | ndvi | 6.60 | 1.82 | 3.95 | 8.87 | 4,000,000 | 606.29 | 500.76 | 1.21x | 2000x2000 | | ndwi | 3.86 | 0.32 | 3.29 | 4.21 | 4,000,000 | 1037.40 | 608.32 | 1.71x | 2000x2000 | | evi | 3.28 | 0.26 | 2.96 | 3.55 | 4,000,000 | 1218.22 | 293.03 | 4.16x | 2000x2000 | | savi | 3.03 | 0.06 | 2.97 | 3.15 | 4,000,000 | 1320.59 | 453.30 | 2.91x | 2000x2000 | | nbr | 5.11 | 0.33 | 4.47 | 5.45 | 4,000,000 | 782.84 | 561.07 | 1.40x | 2000x2000 | | ndmi | 4.46 | 0.17 | 4.27 | 4.77 | 4,000,000 | 897.38 | 578.29 | 1.55x | 2000x2000 | | nbr2 | 5.18 | 0.55 | 4.50 | 6.10 | 4,000,000 | 772.37 | 572.01 | 1.35x | 2000x2000 | | gci | 3.16 | 0.21 | 2.95 | 3.41 | 4,000,000 | 1267.30 | 1148.23 | 1.10x | 2000x2000 | | delta_ndvi | 11.39 | 0.32 | 10.80 | 11.68 | 4,000,000 | 351.18 | 288.93 | 1.22x | 2000x2000 | | delta_nbr | 11.06 | 0.85 | 9.81 | 12.21 | 4,000,000 | 361.75 | 276.35 | 1.31x | 2000x2000 | | normalized_difference | 4.60 | 0.36 | 4.19 | 5.19 | 4,000,000 | 869.27 | 603.90 | 1.44x | 2000x2000 | | temporal_mean | 705.91 | 10.48 | 692.74 | 719.92 | 96,000,000 | 135.99 | 3631.48 | 0.04x | 24x2000x2000 | | temporal_std | 853.01 | 22.19 | 836.60 | 896.36 | 96,000,000 | 112.54 | 535.49 | 0.21x | 24x2000x2000 | | median | 1178.52 | 25.05 | 1158.20 | 1226.58 | 96,000,000 | 81.46 | 78.36 | 1.04x | 24x2000x2000 | | euclidean_distance | 58.48 | 0.57 | 57.83 | 59.20 | 9,000,000 | 153.90 | 285.61 | 0.54x | N=3000, M=3000, D=64 | | manhattan_distance | 70.94 | 1.12 | 69.31 | 72.76 | 9,000,000 | 126.87 | 3.97 | 31.92x | N=3000, M=3000, D=64 | | chebyshev_distance | 81.83 | 3.33 | 78.45 | 87.87 | 9,000,000 | 109.98 | 4.33 | 25.38x | N=3000, M=3000, D=64 | | minkowski_distance | 714.12 | 10.32 | 705.25 | 732.82 | 9,000,000 | 12.60 | 0.71 | 17.78x | N=3000, M=3000, D=64 | +=======================+===========+============+==========+==========+============+=============================+==============================+==================+======================+

Speedup vs NumPy = (NumPy mean time / Rust mean time); values > 1 indicate Rust is faster.

Meta

Python: 3.10.18 Platform: macOS-15.6.1-arm64-arm-64bit Group: distances Functions: euclidean_distance, manhattan_distance, chebyshev_distance, minkowski_distance Loops: 2 Warmups: 1 Seed: 42 Compare NumPy: True Height: 2048 Width: 2048 Time: 12 Points A: 800 Points B: 600 Point Dim: 32

Results

+====================+===========+============+==========+==========+==========+=============================+==============================+==================+====================+ | Function | Mean (ms) | StDev (ms) | Min (ms) | Max (ms) | Elements | Rust Throughput (M elems/s) | NumPy Throughput (M elems/s) | Speedup vs NumPy | Shape | +====================+===========+============+==========+==========+==========+=============================+==============================+==================+====================+ | euclidean_distance | 1.85 | 0.00 | 1.85 | 1.85 | 480,000 | 259.41 | 325.24 | 0.80x | N=800, M=600, D=32 | | manhattan_distance | 2.57 | 0.14 | 2.43 | 2.71 | 480,000 | 186.61 | 13.93 | 13.40x | N=800, M=600, D=32 | | chebyshev_distance | 2.67 | 0.09 | 2.58 | 2.76 | 480,000 | 179.79 | 11.01 | 16.33x | N=800, M=600, D=32 | | minkowski_distance | 22.62 | 0.07 | 22.55 | 22.69 | 480,000 | 21.22 | 1.97 | 10.78x | N=800, M=600, D=32 | +====================+===========+============+==========+==========+==========+=============================+==============================+==================+====================+

Speedup vs NumPy = (NumPy mean time / Rust mean time); values > 1 indicate Rust is faster.

Python Version Benchmarks

The following per-interpreter benchmark reports are generated via version-specific tox benchmark environments (py38-bench … py313-bench). Each file captures synthetic performance snapshots for that Python version using identical data shapes and loop counts.

Meta

Python: 3.9.6 Platform: macOS-15.6.1-arm64-arm-64bit Group: all Functions: ndvi, ndwi, evi, savi, nbr, ndmi, nbr2, gci, delta_ndvi, delta_nbr, normalized_difference, temporal_mean, temporal_std, median, euclidean_distance, manhattan_distance, chebyshev_distance, minkowski_distance Loops: 3 Warmups: 1 Seed: 42 Compare NumPy: True Height: 2000 Width: 2000 Time: 24 Points A: 2000 Points B: 2000 Point Dim: 32

Results

+=======================+===========+============+==========+==========+============+=============================+==============================+==================+======================+ | Function | Mean (ms) | StDev (ms) | Min (ms) | Max (ms) | Elements | Rust Throughput (M elems/s) | NumPy Throughput (M elems/s) | Speedup vs NumPy | Shape | +=======================+===========+============+==========+==========+============+=============================+==============================+==================+======================+ | ndvi | 123.25 | 0.14 | 123.07 | 123.42 | 4,000,000 | 32.45 | 578.77 | 0.06x | 2000x2000 | | ndwi | 113.71 | 2.70 | 109.90 | 115.66 | 4,000,000 | 35.18 | 543.97 | 0.06x | 2000x2000 | | evi | 153.32 | 1.20 | 151.65 | 154.41 | 4,000,000 | 26.09 | 308.38 | 0.08x | 2000x2000 | | savi | 113.26 | 1.11 | 112.42 | 114.83 | 4,000,000 | 35.32 | 471.92 | 0.07x | 2000x2000 | | nbr | 111.81 | 0.61 | 110.96 | 112.36 | 4,000,000 | 35.77 | 542.18 | 0.07x | 2000x2000 | | ndmi | 111.78 | 0.31 | 111.39 | 112.15 | 4,000,000 | 35.78 | 571.47 | 0.06x | 2000x2000 | | nbr2 | 109.76 | 0.25 | 109.49 | 110.09 | 4,000,000 | 36.44 | 610.93 | 0.06x | 2000x2000 | | gci | 107.58 | 0.91 | 106.66 | 108.81 | 4,000,000 | 37.18 | 1198.72 | 0.03x | 2000x2000 | | delta_ndvi | 318.82 | 0.59 | 318.23 | 319.62 | 4,000,000 | 12.55 | 273.72 | 0.05x | 2000x2000 | | delta_nbr | 322.70 | 3.14 | 318.49 | 326.03 | 4,000,000 | 12.40 | 266.72 | 0.05x | 2000x2000 | | normalized_difference | 111.13 | 1.34 | 109.32 | 112.51 | 4,000,000 | 35.99 | 619.32 | 0.06x | 2000x2000 | | temporal_mean | 3226.06 | 11.28 | 3213.18 | 3240.64 | 96,000,000 | 29.76 | 3854.30 | 0.01x | 24x2000x2000 | | temporal_std | 3212.00 | 16.87 | 3188.55 | 3227.50 | 96,000,000 | 29.89 | 582.95 | 0.05x | 24x2000x2000 | | median | 3679.59 | 32.86 | 3634.57 | 3712.08 | 96,000,000 | 26.09 | 76.92 | 0.34x | 24x2000x2000 | | euclidean_distance | 1017.24 | 17.33 | 992.91 | 1031.93 | 4,000,000 | 3.93 | 307.06 | 0.01x | N=2000, M=2000, D=32 | | manhattan_distance | 939.43 | 18.27 | 917.30 | 962.05 | 4,000,000 | 4.26 | 10.79 | 0.39x | N=2000, M=2000, D=32 | | chebyshev_distance | 1022.55 | 27.52 | 998.33 | 1061.05 | 4,000,000 | 3.91 | 11.80 | 0.33x | N=2000, M=2000, D=32 | | minkowski_distance | 1433.37 | 19.46 | 1415.26 | 1460.37 | 4,000,000 | 2.79 | 1.80 | 1.55x | N=2000, M=2000, D=32 | +=======================+===========+============+==========+==========+============+=============================+==============================+==================+======================+

Speedup vs NumPy = (NumPy mean time / Rust mean time); values > 1 indicate Rust is faster.

Meta

Python: 3.10.18 Platform: macOS-15.6.1-arm64-arm-64bit Group: all Functions: ndvi, ndwi, evi, savi, nbr, ndmi, nbr2, gci, delta_ndvi, delta_nbr, normalized_difference, temporal_mean, temporal_std, median, euclidean_distance, manhattan_distance, chebyshev_distance, minkowski_distance Loops: 3 Warmups: 1 Seed: 42 Compare NumPy: True Height: 2000 Width: 2000 Time: 24 Points A: 2000 Points B: 2000 Point Dim: 32

Results

+=======================+===========+============+==========+==========+============+=============================+==============================+==================+======================+ | Function | Mean (ms) | StDev (ms) | Min (ms) | Max (ms) | Elements | Rust Throughput (M elems/s) | NumPy Throughput (M elems/s) | Speedup vs NumPy | Shape | +=======================+===========+============+==========+==========+============+=============================+==============================+==================+======================+ | ndvi | 112.67 | 0.80 | 111.73 | 113.69 | 4,000,000 | 35.50 | 543.78 | 0.07x | 2000x2000 | | ndwi | 111.14 | 0.27 | 110.81 | 111.47 | 4,000,000 | 35.99 | 608.27 | 0.06x | 2000x2000 | | evi | 155.20 | 0.23 | 154.87 | 155.37 | 4,000,000 | 25.77 | 311.76 | 0.08x | 2000x2000 | | savi | 122.93 | 0.85 | 122.18 | 124.12 | 4,000,000 | 32.54 | 487.15 | 0.07x | 2000x2000 | | nbr | 111.33 | 0.24 | 111.16 | 111.68 | 4,000,000 | 35.93 | 612.21 | 0.06x | 2000x2000 | | ndmi | 111.96 | 0.40 | 111.64 | 112.53 | 4,000,000 | 35.73 | 558.10 | 0.06x | 2000x2000 | | nbr2 | 111.18 | 1.43 | 109.49 | 112.98 | 4,000,000 | 35.98 | 624.97 | 0.06x | 2000x2000 | | gci | 108.13 | 0.23 | 107.82 | 108.39 | 4,000,000 | 36.99 | 1225.23 | 0.03x | 2000x2000 | | delta_ndvi | 321.86 | 2.18 | 319.75 | 324.86 | 4,000,000 | 12.43 | 282.07 | 0.04x | 2000x2000 | | delta_nbr | 321.11 | 2.92 | 318.92 | 325.24 | 4,000,000 | 12.46 | 269.04 | 0.05x | 2000x2000 | | normalized_difference | 112.57 | 0.80 | 111.95 | 113.71 | 4,000,000 | 35.53 | 547.53 | 0.06x | 2000x2000 | | temporal_mean | 3167.30 | 46.43 | 3125.99 | 3232.15 | 96,000,000 | 30.31 | 4104.33 | 0.01x | 24x2000x2000 | | temporal_std | 3156.65 | 6.46 | 3149.78 | 3165.31 | 96,000,000 | 30.41 | 608.00 | 0.05x | 24x2000x2000 | | median | 3743.51 | 71.44 | 3651.62 | 3825.82 | 96,000,000 | 25.64 | 75.68 | 0.34x | 24x2000x2000 | | euclidean_distance | 992.90 | 25.30 | 971.80 | 1028.47 | 4,000,000 | 4.03 | 285.65 | 0.01x | N=2000, M=2000, D=32 | | manhattan_distance | 938.50 | 3.91 | 933.12 | 942.29 | 4,000,000 | 4.26 | 12.01 | 0.35x | N=2000, M=2000, D=32 | | chebyshev_distance | 1041.33 | 32.36 | 1006.59 | 1084.50 | 4,000,000 | 3.84 | 10.71 | 0.36x | N=2000, M=2000, D=32 | | minkowski_distance | 1440.12 | 19.92 | 1418.23 | 1466.42 | 4,000,000 | 2.78 | 1.79 | 1.55x | N=2000, M=2000, D=32 | +=======================+===========+============+==========+==========+============+=============================+==============================+==================+======================+

Speedup vs NumPy = (NumPy mean time / Rust mean time); values > 1 indicate Rust is faster.

Meta

Python: 3.11.6 Platform: macOS-15.6.1-arm64-arm-64bit Group: all Functions: ndvi, ndwi, evi, savi, nbr, ndmi, nbr2, gci, delta_ndvi, delta_nbr, normalized_difference, temporal_mean, temporal_std, median, euclidean_distance, manhattan_distance, chebyshev_distance, minkowski_distance Loops: 3 Warmups: 1 Seed: 42 Compare NumPy: True Height: 2000 Width: 2000 Time: 24 Points A: 2000 Points B: 2000 Point Dim: 32

Results

+=======================+===========+============+==========+==========+============+=============================+==============================+==================+======================+ | Function | Mean (ms) | StDev (ms) | Min (ms) | Max (ms) | Elements | Rust Throughput (M elems/s) | NumPy Throughput (M elems/s) | Speedup vs NumPy | Shape | +=======================+===========+============+==========+==========+============+=============================+==============================+==================+======================+ | ndvi | 294.78 | 0.93 | 293.60 | 295.88 | 4,000,000 | 13.57 | 616.41 | 0.02x | 2000x2000 | | ndwi | 295.97 | 1.19 | 294.30 | 296.97 | 4,000,000 | 13.51 | 567.63 | 0.02x | 2000x2000 | | evi | 158.06 | 0.16 | 157.90 | 158.28 | 4,000,000 | 25.31 | 319.12 | 0.08x | 2000x2000 | | savi | 116.13 | 0.84 | 114.94 | 116.78 | 4,000,000 | 34.45 | 517.95 | 0.07x | 2000x2000 | | nbr | 295.21 | 1.43 | 293.27 | 296.68 | 4,000,000 | 13.55 | 643.17 | 0.02x | 2000x2000 | | ndmi | 290.18 | 0.71 | 289.52 | 291.18 | 4,000,000 | 13.78 | 611.97 | 0.02x | 2000x2000 | | nbr2 | 290.90 | 2.24 | 288.96 | 294.03 | 4,000,000 | 13.75 | 630.33 | 0.02x | 2000x2000 | | gci | 113.17 | 0.19 | 112.94 | 113.40 | 4,000,000 | 35.34 | 1229.00 | 0.03x | 2000x2000 | | delta_ndvi | 682.76 | 4.94 | 677.32 | 689.27 | 4,000,000 | 5.86 | 308.17 | 0.02x | 2000x2000 | | delta_nbr | 688.46 | 2.29 | 685.52 | 691.10 | 4,000,000 | 5.81 | 263.95 | 0.02x | 2000x2000 | | normalized_difference | 301.21 | 2.52 | 297.84 | 303.91 | 4,000,000 | 13.28 | 449.09 | 0.03x | 2000x2000 | | temporal_mean | 3271.24 | 3.20 | 3267.05 | 3274.84 | 96,000,000 | 29.35 | 3864.13 | 0.01x | 24x2000x2000 | | temporal_std | 3264.86 | 14.59 | 3250.30 | 3284.80 | 96,000,000 | 29.40 | 573.60 | 0.05x | 24x2000x2000 | | median | 3769.29 | 31.36 | 3726.56 | 3800.91 | 96,000,000 | 25.47 | 78.22 | 0.33x | 24x2000x2000 | | euclidean_distance | 966.39 | 11.53 | 952.93 | 981.08 | 4,000,000 | 4.14 | 295.05 | 0.01x | N=2000, M=2000, D=32 | | manhattan_distance | 917.14 | 21.01 | 896.59 | 945.99 | 4,000,000 | 4.36 | 12.80 | 0.34x | N=2000, M=2000, D=32 | | chebyshev_distance | 961.53 | 20.21 | 947.03 | 990.10 | 4,000,000 | 4.16 | 12.29 | 0.34x | N=2000, M=2000, D=32 | | minkowski_distance | 1400.93 | 14.43 | 1380.62 | 1412.78 | 4,000,000 | 2.86 | 1.90 | 1.50x | N=2000, M=2000, D=32 | +=======================+===========+============+==========+==========+============+=============================+==============================+==================+======================+

Speedup vs NumPy = (NumPy mean time / Rust mean time); values > 1 indicate Rust is faster.

Meta

Python: 3.12.3 Platform: Linux-6.8.0-x86_64-with-glibc2.39 Group: all Functions: ndvi, ndwi, evi, savi, nbr, ndmi, nbr2, gci, delta_ndvi, delta_nbr, normalized_difference, temporal_mean, temporal_std, median, trend_analysis, euclidean_distance, manhattan_distance, chebyshev_distance, minkowski_distance, moving_average_temporal, moving_average_temporal_stride, pixelwise_transform, zonal_stats, binary_dilation, binary_erosion, binary_opening, binary_closing, texture_entropy Distance Baseline: broadcast Stress Mode: False Loops: 3 Warmups: 1 Seed: 42 Compare NumPy: True Height: 1000 Width: 1000 Time: 24 Points A: 1000 Points B: 1000 Point Dim: 32 Size Sweep: None MA Window: 5 MA Stride: 4 MA Baseline: naive Zones Count: 100

Results

+================================+===========+============+==========+==========+============+=============================+==============================+==================+===============================+ | Function | Mean (ms) | StDev (ms) | Min (ms) | Max (ms) | Elements | Rust Throughput (M elems/s) | NumPy Throughput (M elems/s) | Speedup vs NumPy | Shape | +================================+===========+============+==========+==========+============+=============================+==============================+==================+===============================+ | ndvi | 93.61 | 0.60 | 92.89 | 94.36 | 1,000,000 | 10.68 | 111.15 | 0.10x | 1000x1000 | | ndwi | 87.87 | 0.41 | 87.30 | 88.29 | 1,000,000 | 11.38 | 189.66 | 0.06x | 1000x1000 | | evi | 108.42 | 0.57 | 107.62 | 108.91 | 1,000,000 | 9.22 | 69.04 | 0.13x | 1000x1000 | | savi | 88.00 | 0.96 | 87.06 | 89.31 | 1,000,000 | 11.36 | 156.33 | 0.07x | 1000x1000 | | nbr | 98.35 | 8.91 | 89.37 | 110.50 | 1,000,000 | 10.17 | 183.38 | 0.06x | 1000x1000 | | ndmi | 86.85 | 0.23 | 86.54 | 87.10 | 1,000,000 | 11.51 | 188.11 | 0.06x | 1000x1000 | | nbr2 | 113.07 | 36.67 | 87.04 | 164.93 | 1,000,000 | 8.84 | 180.74 | 0.05x | 1000x1000 | | gci | 86.24 | 2.03 | 84.76 | 89.10 | 1,000,000 | 11.60 | 395.62 | 0.03x | 1000x1000 | | delta_ndvi | 262.70 | 1.67 | 260.78 | 264.85 | 1,000,000 | 3.81 | 76.93 | 0.05x | 1000x1000 | | delta_nbr | 270.77 | 16.45 | 258.52 | 294.03 | 1,000,000 | 3.69 | 84.24 | 0.04x | 1000x1000 | | normalized_difference | 88.14 | 0.25 | 87.87 | 88.47 | 1,000,000 | 11.35 | 188.40 | 0.06x | 1000x1000 | | temporal_mean | 610.57 | 27.96 | 589.19 | 650.07 | 24,000,000 | 39.31 | 910.57 | 0.04x | 24x1000x1000 | | temporal_std | 1223.60 | 16.51 | 1200.47 | 1237.89 | 24,000,000 | 19.61 | 149.15 | 0.13x | 24x1000x1000 | | median | 2462.81 | 4.95 | 2458.15 | 2469.66 | 24,000,000 | 9.74 | 36.56 | 0.27x | 24x1000x1000 | | trend_analysis | 0.09 | 0.04 | 0.06 | 0.15 | 24 | 0.25 | - | - | T=24 | | euclidean_distance | 817.66 | 6.09 | 810.85 | 825.64 | 32,000,000 | 39.14 | 3347.13 | 0.01x | N=1000, M=1000, D=32 | | manhattan_distance | 792.98 | 4.42 | 786.77 | 796.71 | 32,000,000 | 40.35 | 120.13 | 0.34x | N=1000, M=1000, D=32 | | chebyshev_distance | 820.30 | 12.74 | 803.05 | 833.42 | 32,000,000 | 39.01 | 106.89 | 0.36x | N=1000, M=1000, D=32 | | minkowski_distance | 1113.61 | 8.38 | 1101.76 | 1119.67 | 32,000,000 | 28.74 | 23.52 | 1.22x | N=1000, M=1000, D=32 | | moving_average_temporal | 3157.49 | 464.50 | 2801.94 | 3813.63 | 24,000,000 | 7.60 | 35.68 | 0.21x | 24x1000x1000(win=5) | | moving_average_temporal_stride | 3209.63 | 150.25 | 3102.05 | 3422.10 | 24,000,000 | 7.48 | 40.31 | 0.19x | 24x1000x1000(win=5, stride=4) | | pixelwise_transform | 1052.10 | 1.60 | 1050.42 | 1054.26 | 24,000,000 | 22.81 | 150.83 | 0.15x | 24x1000x1000 | | zonal_stats | 58.60 | 0.35 | 58.30 | 59.10 | - | - | - | 3.57x | 1000x1000 (Zones=100) | | binary_dilation | 55.65 | 1.81 | 53.17 | 57.41 | - | - | - | 0.03x | 1000x1000 (Kernel=3) | | binary_erosion | 45.19 | 0.19 | 44.98 | 45.45 | - | - | - | 0.04x | 1000x1000 (Kernel=3) | | binary_opening | 229.28 | 1.33 | 228.16 | 231.14 | - | - | - | 0.01x | 1000x1000 (Kernel=3) | | binary_closing | 248.73 | 0.76 | 247.70 | 249.49 | - | - | - | 0.01x | 1000x1000 (Kernel=3) | | texture_entropy | 2618.82 | 38.66 | 2585.20 | 2672.96 | 1,000,000 | 0.38 | 0.03 | 12.05x | 1000x1000 (Window=3) | +================================+===========+============+==========+==========+============+=============================+==============================+==================+===============================+

Speedup vs NumPy = (NumPy mean time / Rust mean time); values > 1 indicate Rust is faster.

Benchmark Fairness & Methodology

To ensure transparent comparisons we distinguish two baseline styles:

  1. Broadcast Baseline (NumPy): - Uses broadcasting to form (N, M, D) intermediary (explicitly or implicitly via algebra). - High memory footprint; stresses allocator & cache.

  2. Streaming Baseline (NumPy): - Pure Python loop over outer dimension; each iteration performs a vectorized reduction over D. - Low memory; more Python overhead.

Rust distance functions currently implement a streaming pattern with parallel outer-loop execution (when size threshold exceeded). When reporting speedups: speedup_vs_numpy = baseline_mean / rust_mean Values > 1 mean Rust faster. A large speedup versus the broadcast baseline may disappear or shrink versus the streaming baseline.

Reporting Guidelines: - Always state which baseline was used (broadcast / streaming / both). - Include array or point set shape and element count. - Use ≥5 timing loops + ≥1 warmup; prefer median over mean if distribution skewed. - Disclose algorithmic differences (e.g., fewer temporaries) rather than attributing all gains to “Rust”. - For spectral indices, expect near bandwidth-limited behavior; huge (>1.5×) gains are unlikely on modern NumPy unless parallelism or algorithmic fusion changes.

Stress Benchmarks

Stress benchmarks highlight scalability for Earth Observation “big tile” or large point set workloads (e.g., 4096×4096 spatial grids, time depth ≥48, distance matrices with N,M ≥ 10,000). Enable via:

scripts/benchmark.py –group all –stress –compare-numpy –distance-baseline both –loops 7 –warmups 2 –size-sweep 2048x2048 4096x4096

Outputs should be included as generated reST fragments: .. include:: stress-benchmarks.rst

Interpreting Stress Results: - Spectral indices: near-linear scaling with element count until memory bandwidth saturates. - Temporal reducers: benefit from more spatial pixels (parallel work units) and longer time axis (amortized slice overhead). - Distances: Rust streaming avoids allocating (N,M,D) memory; broadcast NumPy baseline may degrade sharply with very large N,M,D.

Planned Improvements (Benchmarking): - Add nth‑selection median to reduce O(T log T) sorts to O(T). - Adaptive parallel threshold auto-tuned from calibration run. - Optional SIMD path (still safe Rust) after empirical validation.

Documentation Update Protocol: Whenever a new optimization lands: 1. Re-run size sweep (small, medium, large shapes). 2. Update fairness table (above) if algorithmic class changes. 3. Regenerate stress-benchmarks.rst. 4. Provide before/after code snippet and numerical equivalence check (np.allclose with tolerance). 5. Bump minor version if public API extended; patch if internal only.

End of Architecture Overview.