Create empty pandas dataframe from pandera DataFrameModel

Is there a way to create an empty pandas dataframe from a pandera schema?

Given the following schema, I would like to get an empty dataframe as shown below:

import pandas as pd
import pandera as pa
from pandera.typing import Series, DataFrame

class MySchema(pa.DataFrameModel):
    state: Series[str]
    city: Series[str]
    price: Series[int]

def get_empty_df_of_schema(schema: pa.DataFrameModel) -> pd.DataFrame:
    pass

wanted_result = pd.DataFrame(
    columns=['state', 'city', 'price']
).astype({'state': str, 'city': str, 'price': int})
wanted_result.info()

Desired result:

Index: 0 entries
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   state   0 non-null      object
 1   city    0 non-null      object
 2   price   0 non-null      int64 

Edit:

Found a working solution:

def get_empty_df_of_pandera_model(model: type[pa.DataFrameModel]) -> pd.DataFrame:
    schema = model.to_schema()
    column_names = list(schema.columns.keys())
    data_types = {column_name: column_type.dtype.type.name for column_name, column_type in schema.columns.items()}
    return pd.DataFrame(columns=column_names).astype(data_types)

There are 2 answers

Nimra Tahir On

Yes, it is possible to create an empty pandas dataframe from a pandera schema. Calling to_schema() on the model gives a DataFrameSchema whose dtypes property maps each column name to its data type.

Here is the updated version of the function get_empty_df_of_schema:

def get_empty_df_of_schema(schema: type[pa.DataFrameModel]) -> pd.DataFrame:
    # Map each column name to its dtype string, then cast an empty frame
    dtypes = {name: str(dtype) for name, dtype in schema.to_schema().dtypes.items()}
    return pd.DataFrame(columns=list(dtypes)).astype(dtypes)

Also, have a look at the pandera documentation on dataframe schemas.

camo On

The current pandera docs have a small section on pandas data types.

This suggests the following solution:

import pandera as pa
import pandas as pd

def empty_dataframe_from_model(Model: type[pa.DataFrameModel]) -> pd.DataFrame:
    schema = Model.to_schema()
    return pd.DataFrame(columns=schema.dtypes.keys()).astype(
        {col: str(dtype) for col, dtype in schema.dtypes.items()}
    )
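
The approaches above first create an untyped empty frame and then cast it with astype. As a sketch of an alternative, assuming you already have a column-to-dtype mapping (for instance the dictionary built from schema.dtypes), each empty column can be created with its dtype up front. The helper name empty_frame_from_dtypes and the hard-coded mapping are illustrative and not part of pandera.

```python
import pandas as pd


def empty_frame_from_dtypes(dtypes: dict[str, str]) -> pd.DataFrame:
    # One zero-length Series per column, each created with its target dtype
    return pd.DataFrame({name: pd.Series(dtype=dtype) for name, dtype in dtypes.items()})


# Hypothetical mapping standing in for the one derived from a pandera schema
example_dtypes = {"state": "object", "city": "object", "price": "int64"}
empty_df = empty_frame_from_dtypes(example_dtypes)
print(empty_df.dtypes)
```

This avoids the intermediate object-typed columns, at the cost of needing the mapping in a form that pd.Series accepts as a dtype.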