Sorting on similar values using talend

27 views Asked by At

I have a CSV file

I want to group together similar DeptID and sort the MID in ascending order and assign the value of lowest MID to SID who have similar DeptID using talend open studio for data integration. If there is one DeptID, then assign the same value of MID to SID.

Input CSV :

DeptID SID StudentName MID
111 Nancy C1
111 Nancy B1
111 Nancy A1
222 James Z1

I have used tFileInputDelimited to read the input file, I have used tSortRow to sort MID. And I have used tAggregateRow to group the values.

I am getting output as:

DeptID SID StudentName MID
111 Nancy [A1,B1,C1]
222 James [Z1]

The Output CSV should be as follows:

DeptID SID StudentName MID
111 A1 Nancy A1
111 A1 Nancy B1
111 A1 Nancy C1
222 Z1 James Z1
1

There are 1 answers

0
Corentin On

One simple solution would be to read twice your input file : one time as the main flow, to get detail of MID column , one time as the lookup to get MIN value of MID column as your SID column. Then join the 2 flows with a tMap, joining on deptID (with "all matches" join type). enter image description here

enter image description here

Another solution could be to use internal variables of tMap to get the work done with fewer components :

Once you have sorted your data with tSortRow, create 2 variables in a tMap following your tSort component : enter image description here

  • "sequence" creates an increment based on DeptId, starting at 1
  • "currentVal" checks if sequence equals 1 : if so you get the current MID as SID. Else SID don't change. enter image description here

The 2 solutions work to get the SID value .