I have a data frame with a column called listA, and a listB. I want to pull out only the rows of the data frame whose listA value matches an entry in listB, so I have:
newData <- mydata[mydata$listA %in% listB,]
However, some entries of listA are in the format "ABC /// DEF", where both ABC and DEF are possible entries in listB. I want to pull out the rows whose listA contains any word that matches an entry in listB. So if listB had "ABC" in it, that row would be included in newData. I found the strsplit function, but things like
strsplit(mydata$listA," ") %in% listB
always returns FALSE, presumably because it's checking if the whole list returned by strsplit is an entry in listB.
match(word_vector, target_vector) allows both arguments to be vectors, which is what you want (note: that's vectors, not lists). In fact, the %in% operator is defined in terms of match(), as its help page tells you: x %in% table is match(x, table, nomatch = 0) > 0.

The stringi package's stri_match_* functions may well do what you want directly. They are all vectorized and generally faster than either match() or strsplit():
stri_match_all, stri_match_all_regex, stri_match_first, stri_match_first_regex, stri_match_last, stri_match_last_regex

Also, you probably won't need an explicit split function, but if you must, use stringi::stri_split_*() and avoid base::strsplit().

Note on performance: avoid splitting strings in R whenever possible. It churns memory by allocating unnecessary cons cells, as gc() will show you. That's yet another reason why stringi is very efficient.
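For reference, here is a minimal sketch of the row filter the question asks for, applying %in% to each row's words and combining the results with any(). It uses base strsplit() only so that it runs without extra packages. The sample mydata and listB below are made-up stand-ins, and " /// " is assumed to be the only separator.

```r
# Hypothetical sample data standing in for the asker's mydata and listB
mydata <- data.frame(
  listA = c("ABC /// DEF", "GHI", "JKL /// MNO", "DEF"),
  value = 1:4,
  stringsAsFactors = FALSE
)
listB <- c("ABC", "DEF")

# Split each entry on the literal separator " /// " (fixed = TRUE disables regex)
words <- strsplit(mydata$listA, " /// ", fixed = TRUE)

# For each row, TRUE if any of its words appears in listB
keep <- vapply(words, function(w) any(w %in% listB), logical(1))

# Keep only the matching rows
newData <- mydata[keep, ]
```

If stringi is installed, stringi::stri_split_fixed(mydata$listA, " /// ") should be usable in place of the strsplit() call, since it also returns a list of character vectors, one per row.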