I would like to to shuffle some columns from a table in Postgres database. I have 2 millions rows. I need to update all not null values by another.
I need to keep the same dataset. It's not possible to have the same value two times. It's not possible to swap data with next values because, if I do the same process with another column I will keep the same link. It's to anonymise my database. Just need to shuffle data and keep the dataset.
exemple (change firstname and lastname):
| id | firstname | lastname |
|---|---|---|
| 1 | albert | einsten |
| 2 | isaac | newton |
| 3 | curie | |
| 4 | alexandre | Graham Bell |
| 5 | thomas | Edison |
shuffle firstname column:
| id | firstname | lastname |
|---|---|---|
| 1 | isaac | Graham Bell |
| 2 | albert | Edison |
| 3 | einsten | |
| 4 | thomas | newton |
| 5 | alexandre | curie |
How to do this with a speedy process?
Given the updated requirement, you might be better off using a strategy like Erwin Brandstetter's solution that can easily be applied to an arbitrary number of columns, however I will leave my original answer with an update for doing the second column.
Original answer (requirement to shuffle one column):
Given how general your requirements are about the order of the shuffle, I'm not sure how much this will help practically, but I think it answers your question:
The idea here is that the inner query gets the values shifted by one (ignoring nulls). The
COALESCEis used since the first one doesn't have a proceeding entry, so it falls back on someLAST_VALUElogic to get the last value (i.e. it behaves as if the shift loops around).The surrounding
UPDATEstatement joins test to the subquery to actually update the data.You can see it working in this Fiddle.
Updated (requirement to also shuffle a second column):
Given the updated requirement to also shuffle a second field, you could apply different logic there:
This simply shuffles the
lastnamein the opposite direction, sofirstnameis grabbed from the previous non-null row (wrapping around) andlastnameis grabbed from the next non-null row (wrapping around). Both columns will be changed.Here is a fiddle of it working