Hey there! I recently began using xarray and would like to combine it with numba-compiled functions.

### Minimal Working Example (MWE)

Consider a matrix `A` and an array `b`:

```python
import numpy as np
import xarray as xr
import numba

A = np.random.rand(20, 5)  # 20 samples, 5 features
b = np.random.rand(20, 2)  # 20 samples, (fixed) extra dimension

@numba.njit(fastmath=True, parallel=True)
def foo_nb(A, b, n_out: int = 3):
    n_samples, n_features = A.shape
    res1 = np.empty((n_samples, n_features, n_out))
    res2 = np.empty((n_samples, n_out))
    res3 = np.empty((n_samples,))
    for i in range(n_samples):
        # numba-accelerated arithmetic happens here
        # note that in practice X will have a dimensionality of (n_samples_reduced, n_features),
        # with n_samples_reduced ~ 20% of n_samples
        X = A * np.sum(b[i] ** 2)
        # ...
        U, s, VT = np.linalg.svd(X)
        res1[i] = VT[:n_out].T
        res2[i] = s[:n_out]
        res3[i] = np.sum(s)
    return res1, res2, res3

foo_nb(A, b, n_out=3)
```

This works flawlessly! However, I am struggling to adapt this code to work with xarray:

```python
A = xr.DataArray(A, dims=["sample", "feature"])
b = xr.DataArray(b, dims=["sample", "extra_dim"])

# Attempt to parallelize over samples, so -> core dimensions
xr.apply_ufunc(
    foo_nb,
    A,
    b,
    input_core_dims=[["sample"], ["sample"]],
    output_core_dims=[["sample"], ["sample"], ["sample"]],
    # dask="parallelized",
)
```

### Questions
For context, in some real-world scenarios, I anticipate handling datasets ranging from thousands to hundreds of thousands of samples. Additionally, the …
Replies: 1 comment 1 reply
Look here: https://tutorial.xarray.dev/advanced/apply_ufunc/apply_ufunc.html and let us know how it goes. If you see opportunities to improve that material, PRs are very welcome!
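Following the tutorial's recipe, here is a minimal sketch of what the call could look like. Since `foo_nb` loops over samples internally, every dimension the function consumes becomes a core dimension. Note the assumptions: the dimension name `mode` for the new `n_out` axis is made up, and `foo` is a pure-NumPy stand-in for `foo_nb` (same signature and shapes) so the sketch runs without numba.

```python
import numpy as np
import xarray as xr

# Pure-NumPy stand-in for foo_nb (same signature and output shapes) so
# this sketch runs without numba; in practice keep the @numba.njit version.
def foo(A, b, n_out=3):
    n_samples, n_features = A.shape
    res1 = np.empty((n_samples, n_features, n_out))
    res2 = np.empty((n_samples, n_out))
    res3 = np.empty((n_samples,))
    for i in range(n_samples):
        X = A * np.sum(b[i] ** 2)
        U, s, VT = np.linalg.svd(X)
        res1[i] = VT[:n_out].T
        res2[i] = s[:n_out]
        res3[i] = np.sum(s)
    return res1, res2, res3

A = xr.DataArray(np.random.rand(20, 5), dims=["sample", "feature"])
b = xr.DataArray(np.random.rand(20, 2), dims=["sample", "extra_dim"])

# All input dimensions are core dims (the function handles the sample loop
# itself); "mode" is an assumed name for the new n_out dimension.
res1, res2, res3 = xr.apply_ufunc(
    foo,
    A,
    b,
    kwargs={"n_out": 3},
    input_core_dims=[["sample", "feature"], ["sample", "extra_dim"]],
    output_core_dims=[
        ["sample", "feature", "mode"],
        ["sample", "mode"],
        ["sample"],
    ],
    # With dask="parallelized" you would additionally chunk the inputs and
    # pass dask_gufunc_kwargs={"output_sizes": {"mode": 3}}.
)
```

The key point is that the output core dims of each return value must match that array's trailing shape, which is why `res1` gets three dims and `res3` only one.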