How to debug a pandas psycopg3 sqlalchemy read_sql memory leak when geometry column is included in result

26 views Asked by At

Summary: I am noticing a memory leak when querying a postgresql/postgis database from pandas when one of the columns returned by the query is a geometry type. I know that geopandas exists for this purpose, but I am lazy sometimes to filter columns during the query.

Here's psuedo code that reproduces the issue

with sqlalchemyconn("dbame") as conn:
  df = pd.read_sql("select nongeocol from table limit 100000", conn)
del df
# Notice that allocated memory is mostly freed / reasonable  RSS increase: ~10s MB

with sqlalchemyconn("dbame") as conn:
  df = pd.read_sql("select geocol from table limit 100000", conn)
del df
# most allocated memory is not freed RSS ~1.4GB
gc.collect()
# no change to RSS

I'm aware of excellent explainers in this space like: https://pythonspeed.com/articles/pandas-sql-chunking/ , but this issue seems isolated to this particular data type coming from the database.

My questions

  1. How can I debug this and perhaps generate a coherent bug report for whatever library may be to blame that I am using?
  2. Anybody else seen this problem?

thanks!

0

There are 0 answers