In Great Expectations, I want to create a custom expectation that validates whether there are multiple unique values of id_client for a given id_product key in a DataFrame.
I have set up my Great Expectations project, but I'm having trouble figuring out how to define and implement a custom expectation for this specific validation.
Here is a data sample:
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'id_product': [1, 1, 2, 2, 2, 3, 3],
    'id_client': [101, 102, 201, 202, 203, 301, 301]
})
This is the validation I can do in pandas but not in Great Expectations:
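def count_unique_rows(df, id_column, other_column):
    # One row per distinct (id_column, other_column) pair
    unique_rows = df.groupby([id_column, other_column]).size().reset_index()
    # Number of distinct other_column values for each id_column value
    count = unique_rows.groupby(id_column).size().reset_index(name='count')
    return count

# True if at least one id_product has more than one distinct id_client
assert any(count_unique_rows(df, 'id_product', 'id_client')['count'] > 1)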
Basically, I want to detect data inconsistencies by setting up a condition like this.
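The same condition can be written more compactly with `nunique`. The sketch below is only a plain pandas restatement of the check above, not a Great Expectations expectation, and the names `clients_per_product` and `flagged` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id_product': [1, 1, 2, 2, 2, 3, 3],
    'id_client': [101, 102, 201, 202, 203, 301, 301],
})

# Number of distinct id_client values for each id_product
clients_per_product = df.groupby('id_product')['id_client'].nunique()

# Products associated with more than one distinct client
flagged = clients_per_product[clients_per_product > 1]
print(flagged)
```

With the sample data, products 1 and 2 are flagged, while product 3 has a single distinct client.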
You could add a custom expectation like this one:
The expect_unique_pair method checks the given custom PandasDataset for uniqueness of the key [id_product, id_client]. It returns a boolean Series indicating, for each row, whether the pair is unique.
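The row-level result described above can be sketched in plain pandas. This is only an illustration of the boolean Series, assuming "unique" means that the (id_product, id_client) pair occurs exactly once in the data. The helper name `unique_pair_mask` is hypothetical and is not part of Great Expectations.

```python
import pandas as pd

def unique_pair_mask(df, columns):
    """Return True for rows whose combination of `columns` occurs exactly once."""
    # keep=False marks every member of a duplicated group, not only the repeats
    return ~df.duplicated(subset=columns, keep=False)

df = pd.DataFrame({
    'id_product': [1, 1, 2, 2, 2, 3, 3],
    'id_client': [101, 102, 201, 202, 203, 301, 301],
})

mask = unique_pair_mask(df, ['id_product', 'id_client'])
print(mask.tolist())
```

With the sample data, only the last two rows, which repeat the pair (3, 301), come back as False.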