I have a dataset ds (dimensions mid_date, y, x) that holds a 3D numpy array in the variable ds.v.
I have a function that I want to apply to each (y, x) cell of the array; it returns a vector reduced in length compared to the input:
import numpy as np

# Load the input array into memory
array_input = ds.v.values

# Create a variable m <= array_input.shape[0]
m = 10

# Create a function that takes a vector of size array_input.shape[0]
# and returns a vector of size m
def func(pt_in, m):
    # reduce the dimensionality of the input
    pt_out = pt_in[:m]
    return pt_out

# Apply the function to every (y, x) cell of array_input,
# store the returns in an output array
array_output = np.zeros((m, array_input.shape[1], array_input.shape[2]))
for i in range(array_input.shape[1]):
    for j in range(array_input.shape[2]):
        array_output[:, i, j] = func(array_input[:, i, j], m)
I have two objectives:

- Parallelize this function so it can iterate through the y, x dimensions of my input array and store the results in an output array of size (m, y, x).
- Use Dask so I can apply this function on chunks rather than loading the entire dataset into memory.
I tried using dask.array.map_blocks, but the issue is that my real function has a few more inputs (6 in total), and just like in this example only the 1st input changes: the cell being reduced. I did not manage to pass the function to map_blocks because it takes 1 cell as an argument, not the whole chunk that map_blocks hands it.
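For reference, I imagine the chunk-level wrapper would have to look roughly like this (func_chunk, np.apply_along_axis and the chunk sizes are my guesses, not something I have running on the real cube):

import numpy as np

def func_chunk(chunk, m):
    # apply the per-cell function to every (y, x) column of one chunk
    return np.apply_along_axis(func, 0, chunk, m)

# ds.v.data is already a dask array when the dataset is opened with chunks;
# rechunk so mid_date lives in a single chunk per task
dv = ds.v.data.rechunk((-1, 100, 100))
array_output = dv.map_blocks(
    func_chunk, m,
    dtype=dv.dtype,
    chunks=((m,),) + dv.chunks[1:],  # the time axis of the output has length m
)

But that feels clumsy, and I would rather stay in xarray.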
How could I achieve that?
PS: the datacube I'm working with is hosted here: url = 'http://its-live-data.s3.amazonaws.com/datacubes/v02/N50W140/ITS_LIVE_vel_EPSG3413_G0120_X-3350000_Y350000.zarr' (beware, it's big)
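For what it's worth, I open it lazily along these lines (the chunk sizes are just what I have been experimenting with, and this assumes fsspec/aiohttp can read the bucket over HTTP):

import xarray as xr

url = 'http://its-live-data.s3.amazonaws.com/datacubes/v02/N50W140/ITS_LIVE_vel_EPSG3413_G0120_X-3350000_Y350000.zarr'
# keep mid_date in one chunk so each cell's full time series stays together
ds = xr.open_dataset(url, engine='zarr', chunks={'mid_date': -1, 'y': 100, 'x': 100})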
You need xarray.apply_ufunc. It's a complicated but extremely powerful function that can map practically any behavior you want over xarray objects. If you read through the apply_ufunc section of the xarray tutorial documentation, you should see examples that are similar in structure to your problem (i.e. one input core dimension and potentially one new dimension on the output).
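To sketch what that might look like for your example (the dimension name 'reduced', the chunk sizes, and the vectorize/dask settings are assumptions to adapt, not a tested recipe):

import xarray as xr

# the core dimension must not be split across chunks
ds = ds.chunk({'mid_date': -1, 'y': 100, 'x': 100})

result = xr.apply_ufunc(
    func,                             # your per-cell function
    ds.v,
    kwargs={'m': m},                  # the non-varying arguments go here
    input_core_dims=[['mid_date']],   # func consumes a cell's full time series...
    output_core_dims=[['reduced']],   # ...and returns a new dimension of length m
    vectorize=True,                   # loop func over every (y, x) cell
    dask='parallelized',              # run lazily, one task per chunk
    output_dtypes=[ds.v.dtype],
    dask_gufunc_kwargs={'output_sizes': {'reduced': m}},
)
array_output = result.transpose('reduced', 'y', 'x')  # shape (m, y, x)

The same kwargs mechanism covers your six-argument case: only the first argument varies per cell, so the other five stay fixed in kwargs (or via functools.partial).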