How to do zonal statistics for large size rasters in Python?

My goal is to compute zonal statistics for hundreds of rasters using the same shapefile and save the results as a CSV file. I have written the code below, and it works fine for a small test shapefile but not for my real project.

Here is the code I am using.

##### This code works for the small test case but not for the large rasters

import rasterio
import geopandas as gpd
from rasterstats import zonal_stats
from rasterstats import gen_zonal_stats
import pandas as pd
import glob

# Open the shapefile
polygons = gpd.read_file('/test/LargeDomainBoundary.shp')

# List of rasters to process
# Get a list of all raster files in a directory
raster_list = glob.glob('/test1/*.tif')
print(raster_list)

# Creating an empty dataframe to store the results
results_df = pd.DataFrame()

# Loop through the rasters
for raster_path in raster_list:

    # Open the raster file
    with rasterio.open(raster_path) as src:
      raster = src.read(1)
      print(raster.shape)
      trans = src.transform  # use src.transform rather than the old src.affine
      print(trans)

    # Perform zonal statistics
    stats = zonal_stats(polygons, raster, stats=['mean', 'sum', 'min', 'count', 'max'], affine=trans, tolerance=0.1, nodata=-32768)  # alternative: nodata=-9999
#     stats = list(
#         gen_zonal_stats(polygons, raster, affine=trans, stats=['mean', 'sum', 'min', 'count', 'max'], nodata=-32768,
#                         block_size=(1000, 1000)))


    # Creating a dataframe from the stats
    df = pd.DataFrame(stats)
    # Adding the name of the raster processed as a column
    df["Raster"] = raster_path
    # Concatenating the result to the results dataframe
    results_df = pd.concat([results_df, df])

# Saving the results to a CSV file
results_df.to_csv('/globalhome/test/zonal_stats_results.csv', index=False)

I am using zonal_stats from rasterstats to do the job. I also tried gen_zonal_stats, which can take a block_size so the raster is processed in smaller chunks, but even a 100x100 block size did not solve the problem. My rasters have shape (13665, 30641), and I get the following error:

Traceback (most recent call last):
  File "C:\GDAL\zonalusingsameshapefile.py", line 32, in <module>
    stats = list(
  File "C:\Users\Anaconda3\envs\venvgdal\lib\site-packages\rasterstats\main.py", line 155, in gen_zonal_stats
    fsrc = rast.read(bounds=geom_bounds)
  File "C:\Users\Anaconda3\envs\venvgdal\lib\site-packages\rasterstats\io.py", line 298, in read
    new_array = boundless_array(
  File "C:\Users\Anaconda3\envs\venvgdal\lib\site-packages\rasterstats\io.py", line 179, in boundless_array
    out = np.ones(shape=window_shape) * nodata
  File "C:\Users\Anaconda3\envs\venvgdal\lib\site-packages\numpy\core\numeric.py", line 204, in ones
    a = empty(shape, dtype, order)
ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.

Process finished with exit code 1
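
One variant I am considering (a sketch only, so far tested just on the small test shapefile) is to give zonal_stats the raster file path instead of an array I read myself, so rasterstats opens the file and reads only the window covering each polygon. That should avoid holding the whole (13665, 30641) array in memory at once, as long as no single polygon spans the entire raster:

import glob

import geopandas as gpd
import pandas as pd
from rasterstats import zonal_stats

# Same inputs as in the script above
polygons = gpd.read_file('/test/LargeDomainBoundary.shp')
raster_list = glob.glob('/test1/*.tif')

frames = []
for raster_path in raster_list:
    # Pass the path rather than a pre-read array: rasterstats opens the
    # raster itself and reads only the window that covers each polygon.
    stats = zonal_stats(
        polygons,
        raster_path,
        stats=['mean', 'sum', 'min', 'count', 'max'],
        nodata=-32768,
    )
    df = pd.DataFrame(stats)
    df["Raster"] = raster_path
    frames.append(df)

# Collect per-raster results once at the end instead of concatenating inside the loop
results_df = pd.concat(frames, ignore_index=True)
results_df.to_csv('/globalhome/test/zonal_stats_results.csv', index=False)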
