I have this DataFrame:
| timestep | ID | x1 | x2 | y |
|---|---|---|---|---|
| 1 | 1 | 2 | 3 | 0 |
| 2 | 1 | 4 | 5 | 0 |
| ... | ... | ... | ... | ... |
| 24 | 1 | 5 | 5 | 0 |
| ... | ... | ... | ... | ... |
| 1 | 9 | 2 | 3 | 1 |
| 2 | 9 | 4 | 5 | 1 |
| ... | ... | ... | ... | ... |
| 24 | 9 | 2 | 2 | 1 |
I expect it converted to the numpy ndarray X_train with shape (9, 24, 2).
Where, 9 represent how many IDs. 24 represent n-rows aka timestep. 2 represents how many features (x1 and x2)
It same to the y_train with shape (9, 1) where 9 represents how many IDs and 1 represents scalar value since all y will be same for all 24 timesteps
The minimal reproducible example for this, when accessing
X_train[0,:,:] it will returns table with ID=1 with 24 rows (timesteps) like this:
| x1 | x2 |
|---|---|
| 2 | 3 |
| 4 | 5 |
| ... | ... |
| 5 | 5 |
(the last 24-th row)
First sort the rows by ID/timestep with
sort_values, then convertto_numpyandreshape:Or, for an arbitrary number of features/columns:
NB. assuming no combinations of ID/timestamps is missing/duplicated
For
y, since the IDs are already sorted: