Does Pentaho allow the "Filter rows" step to load a list of values?

1.3k views Asked by At

I need to filter the rows of a table according to a list DOM_LABEL that contains the correct values that should match the ones present in the column label.

DOM_LABEL = ['Microsoft', 'Apple', 'Lenovo'] this list is contained inside a csv file.

The problem is that some rows have incorrect values such as 'Samsung', 'HP', etc. I don't know the incorrect values present, so by filtering the rows by label values not contained in DOM_LABEL helps me analyze the problem better.

However, the Filter rows step does not allow me to load a list of values. And manually adding them into the step is gruesome because the DOM_LABEL contains more than 100 values. Is there a way to load those values into the Filter rows step?

2

There are 2 answers

2
Bert-Jan Stroop On

I think the best sollution in this case, if I'm understanding the issue correctly, is by setting up a field beforehand, which checks if the value is in the list and then filter rows on that field.

How to do this?

Load the data from the csv in a seperate step (csv input). Then in your mainflow BEFORE the filter row step, us a 'stream lookup' step to find the value from your main flow column in the csv. retrieve the column i which you lookup the value. Then in the filter rows, just check if the lookedup value is empty or not.

Layout of flow

0
Omegad On

According to this documentation, you can use the Filter step with a List of values.

You need to select 'IN LIST' in Functions and then put values as String seperated by a semicolon (;). This also works on numeric values like integers. The step return True if value is in the LIST.

e.g : Microsoft;Apple;Lenovo

Screenshot of step

Considering you're a using a CSV file Input, you must concat fields using the Group by step before using the Filter and make sure to seperate using semicolon (;).

You can add a Constant and group by the constant field to concatenate the label.