Addiing rows from from one dataset to another based on certain conditions when all columns name don't match in kdb+

205 views Asked by At

I have two datasets, named data1 and data2. data1 look like:

 id1     id2  exc1  exc2  exc3  exc4
 "aa2"   "12ac"   45     54   53     65 
 "bb"     "23"   23     33   23     12

data2 looks like:

kid1   id2   sf1   sf2  sf3   sf4  exc1 exc2
"aa2" "ads2"  55    6    55   66    45   54

The kid1 column and id1 column have the same entries, just the number of rows in both dataset is different. There are some rows missing in data1 which I have to pick from data2. To do this I have to combine id1 and id2 in data1 and kid1 and id2 in data2 and create a new column called link. In excel I did this like "aa2 | 12ac" and similarly for data2. Further I have to search which entries of link are present in data2 but not in data1 and I have to add them in data1.

The condition for adding new rows in data1 from data2 is that: if columns name are same in data1 and data2, then use the data from data2 otherwise the column name which are not present in data2 but are present in data1, copy the data from row where kid1 = id1.

I have done the entire thing in excel and wanted to replicate the entire thing in kdb to make the process faster. It'd be great if someone can help me on this. Any leads on this is appreciated. Thanks

2

There are 2 answers

6
Ryan McCarron On BEST ANSWER

Here's one method to do what I think you're looking for. First, setting up tables:

t:([]id1:("aa2";"bb");id2:("12ac";"23");exc1:45 23; exc2:54 33;exc3:53 23;exc4:65 12)
q:([]kid:enlist "aa2";id2:enlist "ads2";sf1:(),55;sf2:(),6;sf3:(),55;sf4:(),66;exc1:(),45;exc2:(),54)

Then, using the sv keyword to join the id1/id2 and kid/id2 columns together into the link column, as you do above, and key the tables on this new column

rt:`link xkey update link:`$"|"sv/:flip(id1;id2),id1:`$id1,id2:`$id2 from t
rq:`link xkey update link:`$"|"sv/:flip(kid;id2),kid:`$kid,id2:`$id2 from q

Then use uj to join the tables together, which will automatically overwrite values in the first table with values from the second table if the records match, or keep the old values if not:

q)rt uj rq
link     | id1   id2    exc1 exc2 exc3 exc4 kid   sf1 sf2 sf3 sf4
---------| ------------------------------------------------------
aa2|12ac | "aa2" "12ac" 45   54   53   65   ""
bb|23    | "bb"  "23"   23   33   23   12   ""
aa2|ads2 | ""    "ads2" 45   54             "aa2" 55  6   55  66

Hope that helps.

3
Jonathon McMurray On

Are you looking for something like this?

q)(2!data1) uj `id1`id2 xkey update id1:kid1 from data2
id1   id2   | exc1 exc2 exc3 exc4 kid1  sf1 sf2 sf3 sf4
------------| -----------------------------------------
"aa2" "12ac"| 45   54   53   65   ""
"bb"  "23"  | 23   33   23   12   ""
"aa2" "ads2"| 45   54             "aa2" 55  6   55  66

The uj with two keyed tables will return a table with the union of the keys

In order to fall back to updating exc2 and exc3 from data1 if they're blank, using only id1 as a key, you can try something like this:

q)t:(2!data1) uj `id1`id2 xkey update id1:kid1 from data2   //same as before
q)(t lj 1!select id1,exc2,exc3 from data1)^t                //lj these fields on, use fill to only update null fields
id1   id2   | exc1 exc2 exc3 exc4 kid1  sf1 sf2 sf3 sf4
------------| -----------------------------------------
"aa2" "12ac"| 45   54   53   65   ""
"bb"  "23"  | 23   33   23   12   ""
"aa2" "ads2"| 45   54   53        "aa2" 55  6   55  66
q)cols[data1]#0!(t lj 1!select id1,exc2,exc3 from data1)^t  //use Ryan's suggestion for getting the cols you desire
id1   id2    exc1 exc2 exc3 exc4
--------------------------------
"aa2" "12ac" 45   54   53   65
"bb"  "23"   23   33   23   12
"aa2" "ads2" 45   54   53