I am new to Pandera and am trying to dynamically create multiple checks on the same column of a dataframe. It looks like the checks are somehow overwriting each other.
Below is the rules list from which I am trying to dynamically build the schema.
rules = [
    {'column': 'series_value_date', 'validation_function': 'futureDateCheck', 'error_message': 'date cannot be in future'},
    {'column': 'series_value_date', 'validation_function': 'weekendDateCheck', 'error_message': 'date cannot be on a weekend'},
]
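Note that `validation_function` holds the function name as a string, so it has to be resolved to a callable before a check can call it. A minimal sketch of one way to do that, assuming a lookup dictionary: `VALIDATORS` and `resolve` are hypothetical names, and the bodies of `futureDateCheck` and `weekendDateCheck` are placeholders that assume `datetime.date` values, since the real implementations are not shown.

```python
from datetime import date
from typing import Any, Callable, Dict, Optional


# Placeholder implementations; the real functions are not shown in the question.
def futureDateCheck(value: date, param: Optional[Any] = None) -> bool:
    """Return True when the date is today or earlier."""
    return value <= date.today()


def weekendDateCheck(value: date, param: Optional[Any] = None) -> bool:
    """Return True when the date falls on Monday to Friday."""
    return value.weekday() < 5


# Hypothetical registry mapping the names used in `rules` to callables.
VALIDATORS: Dict[str, Callable[..., bool]] = {
    "futureDateCheck": futureDateCheck,
    "weekendDateCheck": weekendDateCheck,
}


def resolve(rule: Dict[str, Any]) -> Callable[..., bool]:
    """Look up the callable named by rule['validation_function']."""
    return VALIDATORS[rule["validation_function"]]
```

With a registry like this, an unknown name fails early with a `KeyError` while the schema is being built, not later during validation.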
Schema creation:
from typing import Any, Dict, List

import pandera as pa


def _create_schema(rules: List[Dict[str, Any]]) -> Dict[str, pa.Column]:
    dynamic_checks = {}
    for rule in rules:
        column_name = rule['column']
        additional_param = rule.get('additional_param', None)
        if column_name in dynamic_checks:
            # Column already has an entry: append another check to it
            dynamic_checks[column_name].checks.append(
                pa.Check(
                    lambda series, param=additional_param: rule['validation_function'](series, param),
                    element_wise=True,
                    error=rule['error_message']
                )
            )
        else:
            # First rule for this column: create the column with one check
            dynamic_checks[column_name] = pa.Column(
                str,
                checks=[pa.Check(
                    lambda series, param=additional_param: rule['validation_function'](series, param),
                    element_wise=True,
                    error=rule['error_message']
                )]
            )
    return dynamic_checks
With just one rule in the rules list, or with rules that apply to different columns, everything works as expected. How do I make multiple checks on a single column work dynamically?
Update
I see that passing an additional parameter to the custom validation functions is what causes the problem.
If I remove the lambda below and do not pass the additional parameter, everything works.
lambda series, param=additional_param: rule['validation_function'](series, param)
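A likely cause is Python's late binding of closures: `param` is frozen on each iteration through its default argument, but `rule` is only looked up when the lambda runs, which is after the loop has finished, so every check ends up calling the last rule's function. A minimal sketch in plain Python, without Pandera; `check_a`, `check_b`, and `make_check` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, List


def check_a(value: Any, param: Any) -> str:
    return "a"


def check_b(value: Any, param: Any) -> str:
    return "b"


rules: List[Dict[str, Any]] = [
    {"validation_function": check_a},
    {"validation_function": check_b},
]

# Late binding: `rule` is read when the lambda is called, after the loop ended.
late = []
for rule in rules:
    late.append(lambda value, param=None: rule["validation_function"](value, param))

# Early binding: the function is captured per iteration via a default argument.
early = []
for rule in rules:
    early.append(
        lambda value, param=None, fn=rule["validation_function"]: fn(value, param)
    )


def make_check(fn: Callable[[Any, Any], Any], param: Any) -> Callable[[Any], Any]:
    """Factory alternative: each call creates a fresh scope for fn and param."""
    def check(value: Any) -> Any:
        return fn(value, param)
    return check


factory = [make_check(r["validation_function"], r.get("additional_param")) for r in rules]

late_results = [f(0) for f in late]        # every lambda calls check_b
early_results = [f(0) for f in early]      # check_a, then check_b
factory_results = [f(0) for f in factory]  # check_a, then check_b
```

Applied to the schema code, the same idea would be to bind the validation function as a lambda default (`fn=...`) or to build each check's callable through a factory function, so that each check keeps its own function.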