Currently I use a function to run an aj join for a single column from a large table on to a smaller table which has its time column shifted t milliseconds ahead, joining on a sym column as well as time. I then compute and programatically name a new column based on this joined column, before deleting the original joined column from the small table. This returns the small table with a new column based on values joined from the larger table t milliseconds ahead.
I then use an Over loop / to repeat this over a list of different delays t, recursively adding one new column for each delay in the list, passing as an argument the table so columns are added recursively.
My issue is the query, join and processing are slow on a large table. I have many cores so I would like to parallelise this operation to take advantage of all available cores, as well as optimising the steps taken to add the new columns. The large table is partitioned on disk by date and sym.
[Edit:] Here is an example of what I have at the moment.
smallT: ([] sym: (20#`AAPL),(20#`MSFT); time: (asc 00:00:00+20?til 100), (asc 00:00:00+20?til 100));
bigT: ([] sym: (100#`AAPL),(100#`MSFT); time: (asc 00:00:00+til 100), (asc 00:00:00+til 100); price: (til 100),(til 100));
delays: 00:00:00 + (7 * til 5);
foo: ([bigTab; smallTab2; delays]
smallTab2: aj[ `sym`time; `sym`time xasc select from (update time:time+delays from smallTab2); `sym`time xasc select sym, time, future pricesprice from bigTabl;
smallTab2: ![smallTab2; (); 0b; enlist[$"colnametime_", string(`int$delays)] ! enlist(%;`future_price;100)];
delete future_price from smallTab2
}[bigT];
smallT:foo/[select from smallT; delays];
smallT
I am relatively new to q and kdb so verbose explanations of how and why a solution works with working code on a toy example would be very greatly appreciated.
Your function is repeatedly sorting the tables inside the loop which will slow you down.
Also as noted in documentation attributes should be applied on the tables which will greatly improve performance. https://code.kx.com/q/ref/aj/#performance
Rather than looping through the delay offsets you can instead create the full list and only
ajonce. https://community.kx.com/t5/New-kdb-q-users-question-forum/How-do-you-start-thinking-in-vectors/td-p/12722