Efficiently access small chunk of grid on OPeNDAP server with python xarray when grid is saved in a sparse format

222 views Asked by At

I am trying to access gridrad data. This data is stored on an OPeNDAP server, with one URL/file for each timestamp (example). Normally xarray is great for accessing these OPeNDAP grids, because you can slice up the giant grid to get only the section you want, before calling a .compute() to actually download it (or at least, I think that's how it works).

For example, I can download a subset of elevation data like this:

elevation_url  = 'https://pae-paha.pacioos.hawaii.edu/thredds/dodsC/srtm30plus_v11_land'
elevation_data = xr.open_dataarray(elevation_url)
elevation_data = elevation_data.sel(lon = slice(-105, -100), lat = slice(35, 45)).compute()

However, the gridrad dataset is stored in a sparse format, as seen below

url = 'https://rda.ucar.edu/thredds/dodsC/files/g/ds841.1/202006/nexrad_3d_v4_2_20200601T000000Z.nc'
radar_data = xr.open_dataset(url)
print(radar_data)

<xarray.Dataset>
Dimensions:         (Longitude: 2832, Latitude: 1248, Altitude: 28,
                     Sweep: 1902, Index: 4288002, time: 1)
Coordinates:
  * Longitude       (Longitude) float64 235.0 235.0 235.1 ... 293.9 294.0 294.0
  * Latitude        (Latitude) float64 24.01 24.03 24.05 ... 49.95 49.97 49.99
  * Altitude        (Altitude) float64 1.0 1.5 2.0 2.5 ... 19.0 20.0 21.0 22.0
  * time            (time) datetime64[ns] 2020-06-01
Dimensions without coordinates: Sweep, Index
Data variables:
    sweeps_merged   (Sweep) |S64 ...
    index           (Index) int32 ...
    Reflectivity    (Index) float32 ...
    wReflectivity   (Index) float32 ...
    SpectrumWidth   (Index) float32 ...
    wSpectrumWidth  (Index) float32 ...
    datehour        (time) int32 ...
    Nradobs         (Altitude, Latitude, Longitude) int8 ...
    Nradecho        (Altitude, Latitude, Longitude) int8 ...
...

Where "Index" gives the flattened index location of each variable in a Altitude x Latitude x Longitude array.

I can extract the data like this:

refl   = np.zeros(radar_data['Nradobs'].size)*np.nan
refl[radar_data['index']]   = radar_data['Reflectivity']
refl  = refl.reshape(radar_data['Nradobs'].shape)

But this requires downloading the entire 'Reflectivity' grid, which is way more than I actually need.

I can come up with a general method to get just the indices I want, but I'm not sure where to go after this:

min_lon = 240
max_lon = 241
min_lat = 37
max_lat = 38
min_alt = 1
max_alt = 2

alt_inds, lat_inds, lon_inds = np.unravel_index(radar_data['index'], radar_data['Nradobs'].shape)

good_lon_inds = radar_data['Longitude'][lon_inds].to_series().between(min_lon, max_lon).to_numpy()
good_lat_inds = radar_data['Latitude'][lat_inds].to_series().between(min_lat, max_lat).to_numpy()
good_alt_inds = radar_data['Altitude'][alt_inds].to_series().between(min_alt, max_alt).to_numpy()

good_inds = good_lon_inds & good_lat_inds & good_alt_inds

Is there a way to access only a small geographic subset of the whole grid, even though it is saved in this sparse format? Bonus points if you can come up with a way to get a temporal range (by accessing multiple URLs) as well.

0

There are 0 answers