euclidean_distance
==================

.. currentmodule:: eo_processor

.. autofunction:: euclidean_distance

Synopsis
--------

`euclidean_distance(points_a, points_b)` computes the pairwise Euclidean (L2)
distance matrix between two 2D numeric arrays: `(N, D)` and `(M, D)` → `(N, M)`.
The implementation is Rust-accelerated and releases the Python GIL; for large
inputs it parallelizes the outer loop when beneficial.

Shape Requirements
------------------

- `points_a.ndim == 2`
- `points_b.ndim == 2`
- `points_a.shape[1] == points_b.shape[1]` (matching feature dimension)

Parameters
----------

points_a : numpy.ndarray
    Array of shape `(N, D)` containing N points with D features.
points_b : numpy.ndarray
    Array of shape `(M, D)` containing M points with D features.

Returns
-------

numpy.ndarray
    Distance matrix of shape `(N, M)` where element `(i, j)` is
    `sqrt(sum_k (points_a[i, k] - points_b[j, k])**2)`.

Notes
-----

- All numeric dtypes are coerced to `float64` internally for stable arithmetic.
- If either input is not 2D, or the feature dimensions do not match, a
  `ValueError` is raised.
- Memory complexity is `O(N*M)`; be cautious with very large N or M
  (e.g. > 50k) to avoid excessive allocation.
- For extreme sizes, consider block-wise or approximate methods (e.g. spatial
  indexing, locality-sensitive hashing); these are not implemented here.

Performance
-----------

Overview: the implementation uses an outer-loop strategy with optional
parallelization (Rayon) for large pair counts. All arithmetic is performed in
fused loops, avoiding Python-level broadcasting and reducing temporary
allocations. Inputs are coerced once to `float64`.

Representative Benchmarks (Single Run)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Platform: macOS (ARM64), CPython 3.10, release build, `time.perf_counter()`, warm cache.

Validation: `np.allclose(..., atol=1e-9)`.

.. list-table:: Euclidean Distance Benchmarks (Single Run)
   :header-rows: 1

   * - Workload (A shape × B shape)
     - Rust (s)
     - NumPy (s)
     - Rust Throughput (M pairs/s)
     - NumPy Throughput (M pairs/s)
     - Speedup
     - Notes
   * - (1200, 32) × (1200, 32)
     - 0.005
     - 0.008
     - 288.00
     - 180.00
     - 1.44x
     - Smaller matrix; fused loops
   * - (3000, 64) × (3000, 64)
     - 0.063
     - 0.041
     - 142.86
     - 219.51
     - 0.65x
     - NumPy BLAS matmul advantage

Interpretation:

- For moderate problem sizes with a relatively small feature dimension, the
  Rust kernel can outperform a pure NumPy formulation thanks to low Python
  overhead and early parallel thresholds.
- For larger point sets with feature dimensions amenable to efficient BLAS
  matmul, the vectorized NumPy approach (`A^2 + B^2 - 2 A B^T`) can be faster.
- The crossover point depends on `N * M`, `D`, and the available cores;
  benchmark on your own workload.

Reproduction Snippet:

.. code-block:: python

    import numpy as np, time
    from eo_processor import euclidean_distance

    A = np.random.rand(3000, 64)
    B = np.random.rand(3000, 64)

    # Rust kernel
    t0 = time.perf_counter()
    rust_dist = euclidean_distance(A, B)
    rust_t = time.perf_counter() - t0

    # Pure NumPy: ||a - b||^2 = |a|^2 + |b|^2 - 2 a.b; clip guards tiny negatives
    t0 = time.perf_counter()
    A_sq = (A**2).sum(axis=1)[:, None]
    B_sq = (B**2).sum(axis=1)[None, :]
    prod = A @ B.T
    numpy_dist = np.sqrt(np.clip(A_sq + B_sq - 2 * prod, 0, None))
    numpy_t = time.perf_counter() - t0

    print(f"Rust {rust_t:.3f}s vs NumPy {numpy_t:.3f}s speedup {numpy_t/rust_t:.2f}x")
    assert np.allclose(rust_dist, numpy_dist, atol=1e-9)

Guidance:

- Use the Rust implementation when memory for the full pair matrix is
  acceptable and `N * M` is large enough to benefit from parallelism, or when
  you want consistency with other Rust-accelerated kernels.
- For extremely large N and M (risking RAM exhaustion), consider block-wise
  strategies or approximate nearest-neighbor methods (not yet implemented
  here); a chunked sketch follows this list.
- Always validate correctness with `np.allclose` before adopting benchmark
  numbers.
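The block-wise strategy mentioned above can be layered on top of the documented
`euclidean_distance` call without any new API. The sketch below uses a
hypothetical helper (the name `nearest_neighbor_chunked` and the `chunk_rows`
default are illustrative, not part of `eo_processor`): it processes `points_a`
in row chunks and reduces each `(chunk, M)` slab immediately, so the full
`(N, M)` matrix is never materialized.

.. code-block:: python

    import numpy as np
    from eo_processor import euclidean_distance

    def nearest_neighbor_chunked(points_a, points_b, chunk_rows=2_000):
        """Hypothetical helper: index of the nearest points_b row for each
        points_a row, computed chunk by chunk.

        Peak extra memory is one (chunk_rows, M) slab rather than (N, M).
        """
        nearest = np.empty(points_a.shape[0], dtype=np.int64)
        for start in range(0, points_a.shape[0], chunk_rows):
            stop = start + chunk_rows
            # (chunk, M) slab from the documented kernel, reduced immediately.
            slab = euclidean_distance(points_a[start:stop], points_b)
            nearest[start:stop] = slab.argmin(axis=1)
        return nearest

    A = np.random.rand(50_000, 32)
    B = np.random.rand(8_000, 32)
    idx = nearest_neighbor_chunked(A, B)  # shape (50_000,); peak slab ~128 MB

The same pattern works for any per-row reduction (k-nearest indices,
thresholded counts); only a fully dense result forces the `O(N*M)` allocation.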
Performance Claim Template:

.. code-block:: text

    Benchmark:
      Shapes: (3000, 64) × (3000, 64)
      NumPy: 0.041s
      Rust: 0.063s
      Speedup: 0.65x (NumPy faster)
      Methodology: single run, time.perf_counter(), float64 arrays
      Validation: np.allclose(rust, numpy, atol=1e-9)

Claim Guidelines: Report both faster and slower cases transparently. Provide
shapes, methodology, and validation tolerance. Prefer multiple runs or median
timing for publication-quality claims.

Examples
--------

Basic usage with small arrays:

.. code-block:: python

    import numpy as np
    from eo_processor import euclidean_distance

    A = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 2.0]], dtype=np.float64)  # (3, 2)
    B = np.array([[0.0, 0.0], [2.0, 1.0]], dtype=np.float64)              # (2, 2)

    dist = euclidean_distance(A, B)
    # dist.shape == (3, 2)
    # Row i corresponds to A[i]; column j corresponds to B[j].

Large problem sketch (be cautious of memory):

.. code-block:: python

    import numpy as np
    from eo_processor import euclidean_distance

    # 10k x 64 and 8k x 64 -> 80 million distances (~640 MB as float64)
    A = np.random.rand(10_000, 64)
    B = np.random.rand(8_000, 64)
    dist = euclidean_distance(A, B)
    # Consider chunking if this is near memory limits.

See Also
--------

- :func:`manhattan_distance` (L1 norm)
- :func:`chebyshev_distance` (L∞ norm)
- :func:`minkowski_distance` (general L^p norm)
- :func:`normalized_difference` (band-wise index primitive; unrelated, but
  shares numeric-stability patterns)

Error Handling
--------------

Raises `ValueError` for:

- Non-2D inputs
- Feature dimension mismatch

Raises `TypeError` if the inputs cannot be interpreted as numeric arrays.
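As an illustration of the contract above, a minimal sketch that triggers the
feature-dimension check (the exact exception message is not part of the public
contract):

.. code-block:: python

    import numpy as np
    from eo_processor import euclidean_distance

    A = np.random.rand(100, 3)
    B = np.random.rand(50, 4)  # feature dimension 4 does not match A's 3

    try:
        euclidean_distance(A, B)
    except ValueError as exc:
        # Mismatched feature dimensions are rejected, as documented above.
        print(f"rejected: {exc}")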