R: generate JSON string from two key columns


New to R. I'm developing an entity resolution algorithm using the RecordLinkage package. I've had pretty good success so far - using dedup, I end up with a data frame, two columns of which are keys of matched records, as below:

df <- data.frame(
  key1 = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 6),
  key2 = c(3, 4, 5, 6, 4, 8, 9, 7, 10, 11)
)
df
     key1 key2
1     1    3
2     1    4
3     2    5
4     2    6
5     3    4
6     3    8
7     3    9
8     4    7
9     5   10
10    6   11

I'm trying to figure out how to end up with one row for each entity, with a JSON string containing all the keys for that entity. Such as:

               entity_keys
1 {"awkeys":"1,3,4,8,9,7"}
2 {"awkeys":"2,5,6,10,11"}

I'm using toJSON from rjson to generate the string - the tough part is how to compile the list of keys. I'm assuming transitive matching here (e.g. if 1 matches 3 and 3 matches 8, then 1 matches 8).

Am sure there's a snazzy R way to do this but don't know what that would be. Any help is appreciated.
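
One way to frame the problem: treat each matched pair as an edge in a graph, so that under the transitive-matching assumption each entity is a connected component. Below is a minimal base-R sketch of that idea using a small union-find. The helper names (entity_keys, find_root, to_json) are made up for illustration, and the "awkeys" field name is taken from the desired output above. It is a sketch under those assumptions, not a tuned implementation.

```r
# Sample pairs from the question.
df <- data.frame(
  key1 = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 6),
  key2 = c(3, 4, 5, 6, 4, 8, 9, 7, 10, 11)
)

# Hypothetical helper: wrap a key vector as {"awkeys":"k1,k2,..."}.
# Uses rjson::toJSON when rjson is installed, otherwise builds the string by hand.
to_json <- function(keys) {
  s <- paste(keys, collapse = ",")
  if (requireNamespace("rjson", quietly = TRUE)) {
    rjson::toJSON(list(awkeys = s))
  } else {
    sprintf('{"awkeys":"%s"}', s)
  }
}

# Hypothetical helper: one row per entity, assuming transitive matching.
entity_keys <- function(pairs) {
  # Distinct keys in order of first appearance, reading the pairs row by row.
  keys <- unique(as.vector(t(as.matrix(pairs[, c("key1", "key2")]))))

  # Union-find: parent[i] is the index of i's parent; a root is its own parent.
  parent <- seq_along(keys)
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # Merge the two groups of every matched pair; the smaller index becomes the root.
  for (r in seq_len(nrow(pairs))) {
    a <- find_root(match(pairs$key1[r], keys))
    b <- find_root(match(pairs$key2[r], keys))
    if (a != b) parent[max(a, b)] <- min(a, b)
  }

  # Group keys by root and turn each group into one JSON string.
  root <- vapply(seq_along(keys), find_root, integer(1))
  groups <- split(keys, root)
  data.frame(
    entity_keys = vapply(groups, to_json, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

result <- entity_keys(df)
print(result)
```

On the sample data this should give the two rows shown above, with keys listed in order of first appearance. The plain loop is fine for small inputs. For large pair lists, the igraph package (graph_from_data_frame followed by components) is probably a better fit, though that swaps in a different tool from the one used here.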
