How to convert datashader canvas.points to GeoTiff?

194 views Asked by At

I am interested on generating GeoTiff images given a set of scattered points, the dataset is normally between 500K points up to 2M points. I have been using datashader and geoviews to generate static images of the data and that works fine. However, when I want to generate the geotiff, there are two options:

  • To generate an RGB geotiff with the coloring as provided by datashader Image color map
  • To use the values inside the geotiff to create a colormap on the geotiff server.

We are interested on going for the second solution but I do not understand the values which I extract from the shade function:

import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf
df = pd.DataFrame({"x": [0, 0.6, 1, 1.2, 2, 5], "y":[0, 0.6, 0.8, 1, 5, 6], "value":[1, 1, 1, 1, 0.5, 0.5]})
canvas = ds.Canvas(plot_width=100, plot_height=100, x_range=(0, 5), y_range=(0, 5), x_axis_type="linear", y_axis_type="linear")
shaded = tf.shade(canvas.points(df, "x", "y", ds.mean("value")))

As far as I know, I extract the aggregated values from shade as numpy and can use this array to save a GeoTiff with one band but the values into that array make no sense for me.

shaded.to_numpy().max()
4293318829

It is always huge numbers which have no relationship with the aggregation process.

So the questions are, what do those values actually mean? Is it possible to get the actual aggregated values ? Is it possible to extract the color mapping from an Image object so I can pass it to the server and they color the geotiff as I expect it to be colored?

2

There are 2 answers

0
Jonatan Aponte On

So datashader uses a couple of steps to produce the results we get. The more important, there is a step in which the values are mapped to RGB using the color_map input. After that the colourize function inside datashader stack the 4 RGBA uint8 arrays of size (100, 100) into one uint32 (100, 100) array using this:

From datashader:

/transfer_functions/__init__.py

values = module.dstack([r, g, b, a]).view(module.uint32).reshape(a.shape)

That is why the values are so strange. To get the RGBA, one can access the data like this:

shaded.to_numpy().base.shape
(310, 400, 4)
0
kangkang On

All your questions can be found in the datashader documentation. The actual aggregated values are stored in canvas.points(df, "x", "y", ds.mean("value")). This is a xarray object. You can write it to netcdf or geotiff easily. But in this way you will lose the geographic references/coordinate system information.

I suppose your dataset use epsg:3857 as your projection coordinate system. You can use the rioxarray package to add the geographic references/coordinate system information and write to geotiff (see https://corteva.github.io/rioxarray/latest/rioxarray.html#rioxarray.rioxarray.XRasterBase.write_crs). like this:

ag = canvas.points(df, "x", "y", ds.mean("value"))
ag.rio.write_crs("epsg:3857", inplace=True)
ag.rio.to_raster("../data/interim/test2.tif")

I think the color mapping should be the function of your server. To know what information can geotiff store, you can check here:https://www.bluemarblegeo.com/about-geotiff-format/.