Design Notes: Moving Hot EO Kernels to Rust with a Python Surface
This note analyzes the core engineering tradeoffs when accelerating array-heavy Earth Observation (EO) primitives by implementing hot kernels in a memory-safe systems language while keeping a Python-friendly API. The focus is on algorithmic correctness, numerical robustness, and systems-level performance constraints rather than packaging or distribution.
Motivation
EO workloads often look like large, repeated array math. Performance bottlenecks arise from temporary allocations, crossing language boundaries, and scheduler/overhead costs when composing many small operations. Moving hot loops into Rust can reduce this overhead, make memory behavior more predictable, and enable safe parallelism when done carefully.
Core design considerations
- API minimalism: expose small, composable kernels that accept arrays and return arrays so the primitives are easy to test and compose.
- Numeric guardrails: coerce inputs to stable numeric types, defend against near-zero denominators, and make NaN/invalid handling explicit.
- Memory & ownership: avoid unnecessary copies, prefer in-place or streamed interfaces where safe, and document ownership semantics across the language boundary.
- Parallelism: release the Python GIL in kernels and use a controlled threading model (e.g., Rayon) while ensuring reproducible results where possible.
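The numeric-guardrail point is the easiest to pin down concretely. A minimal pure-Python sketch of the contract such a kernel would honor, using a hypothetical `normalized_difference` primitive (a common EO shape: `(a - b) / (a + b)`), with an epsilon guard on the denominator and explicit NaN propagation:

```python
import math

def normalized_difference(a, b, eps=1e-12):
    """Reference semantics for a hypothetical accelerated kernel.

    Computes (a - b) / (a + b) element-wise. Returns NaN where either
    input is NaN or the denominator is within eps of zero, so invalid
    values are explicit rather than silently becoming +/-inf.
    """
    out = []
    for x, y in zip(a, b):
        if math.isnan(x) or math.isnan(y):
            out.append(math.nan)  # propagate invalid inputs unchanged
            continue
        denom = x + y
        if abs(denom) < eps:
            out.append(math.nan)  # near-zero denominator guardrail
        else:
            out.append((x - y) / denom)
    return out
```

A Rust implementation would keep exactly these semantics; the pure-Python version doubles as the oracle for differential tests against the compiled kernel.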
Performance and measurement
Microbenchmarks can show large speedups for some kernels (especially algorithmically distinct distance metrics or simple per-element transforms) and more modest gains for others where memory bandwidth or BLAS-optimized paths dominate. A disciplined measure-first approach is essential: identify hot paths at end-to-end scale, then optimize targeted kernels rather than attempting wholesale rewrites.
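The measure-first discipline only needs a small harness. A minimal sketch (the `bench` helper and its defaults are illustrative, not part of any existing tool):

```python
import time

def bench(fn, *args, repeats=5):
    """Time fn(*args) several times and report the best (minimum) run.

    The minimum is preferred over the mean for microbenchmarks because
    system noise only ever adds time to a run, never subtracts it.
    """
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return min(times)
```

Comparing `bench(python_kernel, data)` against `bench(rust_kernel, data)` on representative input sizes is what separates genuine wins from kernels already dominated by memory bandwidth or BLAS.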
Correctness and reproducibility
Determinism, stable numeric behavior, and clear test coverage are paramount. Maintain a benchmark harness and representative inputs so that performance claims are reproducible and regressions are detectable. Small kernels are easier to verify with unit tests and property-based checks.
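Property-based checks pair well with small kernels because the invariants are simple to state. A sketch of one such property for a normalized-difference kernel, using a fixed seed so failures reproduce (the `check_bounded` helper is hypothetical):

```python
import random

def check_bounded(kernel, trials=1000, seed=42):
    """Property: for positive finite inputs with a safe denominator,
    a normalized difference must land in [-1, 1]."""
    rng = random.Random(seed)  # fixed seed -> reproducible failures
    for _ in range(trials):
        x = rng.uniform(0.0, 10.0)
        y = rng.uniform(0.0, 10.0)
        if abs(x + y) < 1e-9:
            continue  # guarded separately by the kernel's eps path
        v = kernel(x, y)
        assert -1.0 <= v <= 1.0, (x, y, v)
    return True
```

The same property can be run against both the pure-Python oracle and the accelerated kernel, turning the reference implementation into a regression detector.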
Integration patterns
Prefer thin, well-documented bindings that integrate with existing array ecosystems (e.g., xarray/dask) via composable primitives. This keeps the accelerated math layer focused and testable while delegating orchestration and I/O to higher-level frameworks.
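One thin-binding pattern is to validate and coerce at the Python boundary, then delegate the math, with a pure-Python fallback when the compiled extension is absent. A sketch, where `eo_kernels` is a hypothetical Rust extension module:

```python
def _reference_impl(a, b):
    # Pure-Python fallback with the same contract as the compiled kernel.
    return [(x - y) / (x + y) for x, y in zip(a, b)]

def normalized_difference(a, b):
    """Thin wrapper: validate and coerce at the boundary, delegate the math."""
    if len(a) != len(b):
        raise ValueError(f"input lengths differ: {len(a)} vs {len(b)}")
    a = [float(v) for v in a]  # coerce to a stable numeric type up front
    b = [float(v) for v in b]
    try:
        # `eo_kernels` is a hypothetical compiled (e.g. Rust) extension.
        from eo_kernels import normalized_difference as impl
    except ImportError:
        impl = _reference_impl
    return impl(a, b)
```

Keeping validation and dtype coercion on the Python side keeps the compiled kernel's contract narrow, which is what makes it easy to wrap in higher-level xarray/dask primitives.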
Open questions
- When should we favor in-place memory transformations versus functional (copying) APIs for safety and performance?
- How should we design benchmark suites that accurately reflect varied real-world EO workloads across spatial and temporal shapes?
- What are the minimal invariants consumers should expect from accelerated kernels (dtype behavior, NaN handling, edge cases)?
Conclusion
Accelerating EO primitives in a systems language offers clear gains when guided by careful measurement and a disciplined API design. The right approach balances small, verifiable kernels with explicit guardrails and integration into the broader data orchestration stack.