Summary: I am noticing a memory leak when querying a postgresql/postgis database from pandas when one of the columns returned by the query is a geometry type. I know that geopandas exists for this purpose, but I am lazy sometimes to filter columns during the query.
Here's psuedo code that reproduces the issue
with sqlalchemyconn("dbame") as conn:
df = pd.read_sql("select nongeocol from table limit 100000", conn)
del df
# Notice that allocated memory is mostly freed / reasonable RSS increase: ~10s MB
with sqlalchemyconn("dbame") as conn:
df = pd.read_sql("select geocol from table limit 100000", conn)
del df
# most allocated memory is not freed RSS ~1.4GB
gc.collect()
# no change to RSS
I'm aware of excellent explainers in this space like: https://pythonspeed.com/articles/pandas-sql-chunking/ , but this issue seems isolated to this particular data type coming from the database.
My questions
- How can I debug this and perhaps generate a coherent bug report for whatever library may be to blame that I am using?
- Anybody else seen this problem?
thanks!