How to check if any words in a list of phrases are contained in a list in R?


Asked by AmandaK on 15 June 2015 at 21:39

I have a data frame with a column called listA, and a separate listB. I want to pull out only those rows of the data frame whose listA value matches an entry in listB, so I have:

newData <- mydata[mydata$listA %in% listB,]

However, some entries of listA are in the format "ABC /// DEF", where both ABC and DEF are possible entries in listB. I want to pull out the rows of the data frame where any of the words in listA matches an entry in listB. So if listB had "ABC" in it, that row would be included in newData. I found the strsplit function, but things like

strsplit(mydata$listA," ") %in% listB

always return FALSE, presumably because it's checking whether the whole list element returned by strsplit is an entry in listB.


**smci** · Answer 1 · 2015-06-15T22:00:53+00:00

match(word_vector, target_vector) accepts vectors for both arguments, which is what you want (note: that's vectors, not lists). In fact, the %in% operator is a thin wrapper around match(), as its help page tells you: x %in% table is defined as match(x, table, nomatch = 0) > 0.
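
As a minimal base R sketch of that point: strsplit() returns a list with one character vector per row, so %in% has to be applied to each vector in turn rather than to the list as a whole. The sample data below is hypothetical, and the " /// " separator is an assumption taken from the example in the question.

```r
# Hypothetical sample data standing in for the asker's objects.
mydata <- data.frame(
  listA = c("ABC /// DEF", "GHI", "XYZ", "JKL /// ABC"),
  value = 1:4,
  stringsAsFactors = FALSE
)
listB <- c("ABC", "GHI")

# strsplit() gives a list: one character vector of pieces per row.
pieces <- strsplit(mydata$listA, " /// ", fixed = TRUE)

# Test each row's pieces against listB; keep the row if any piece matches.
keep <- vapply(pieces, function(p) any(p %in% listB), logical(1))

newData <- mydata[keep, ]
```

Here keep is a logical vector with one entry per row, so it can be used directly as a row index.
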
But the stringi package's stri_match_* functions may well do what you want directly. They are all vectorized and are typically much faster than strsplit(): stri_match_all, stri_match_all_regex, stri_match_first, stri_match_first_regex, stri_match_last, stri_match_last_regex. (These return the matched text; if you only need TRUE/FALSE per row, the stri_detect_* family is the closer fit.)

Also, you probably won't need an explicit split function, but if you must split, use stringi::stri_split_*() rather than base::strsplit().

Note on performance: avoid splitting strings in R whenever possible. Splitting allocates many short-lived intermediate objects that the garbage collector then has to clean up, as gc() will show you. That's yet another reason why the vectorized stringi functions are efficient.
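
One way to avoid the split entirely is to build a single regular expression from listB and test each row against it. This is only a sketch: it uses stringi::stri_detect_regex() rather than the stri_match_* functions named above, the sample data is hypothetical, and it assumes the entries of listB are plain alphanumeric words with no regex metacharacters.

```r
library(stringi)

# Hypothetical sample data standing in for the asker's objects.
mydata <- data.frame(
  listA = c("ABC /// DEF", "GHI", "XYZ", "JKL /// ABC", "ABCD"),
  stringsAsFactors = FALSE
)
listB <- c("ABC", "GHI")

# Assumption: listB holds plain words, so they can be pasted into a regex
# unescaped. \b anchors each alternative to whole words, so "ABCD" does
# not count as a match for "ABC".
pattern <- paste0("\\b(?:", paste(listB, collapse = "|"), ")\\b")

# One vectorized call, no intermediate list of split pieces.
keep <- stri_detect_regex(mydata$listA, pattern)

newData <- mydata[keep, , drop = FALSE]
```

If listB can contain punctuation or other special characters, the entries would need escaping first, or the split-and-compare approach may be the safer choice.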
