I am trying to apply some data quality rules using the pandera library. I want to check the quality of a PV time series by applying two rules: there must be no negative values, and a particular threshold must not be exceeded. I tried the code below, but it seems that only the first rule is executed. Am I missing something?
import pandas as pd
import pandera as pa
df = pd.read_excel('Axxx PV data values only modified.xlsx')
# Convert 'timestamp' column to datetime objects
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%dT%H:%M:%S.%fZ')
#df = pd.DataFrame(data)
# Define the schema with element-wise Pandera rules
schema = pa.DataFrameSchema({
    "timestamp": pa.Column(pa.DateTime, required=True),
    "value": pa.Column(pa.Float, checks=[
        pa.Check(lambda s: (s >= 100), element_wise=True, error="Value exceeds threshold"),
        pa.Check(lambda s: (s >= 0), element_wise=True, error="Negative values not allowed")
    ])
})
Thanks in advance.