Identify text pattern in R dataframe

I have identifiers in two columns of a dataframe, but they are structured differently. It looks like this:

  Description1                Description2
1  A0A2H1CVW1_FASHEprotein1   tr|A0A2H1CVW1|A0A2H1CVW1_FASHEprotein1 
2  A0A4E0RAA2_FASHEprotein2   tr|A0A2H1BSG1|A0A2H1BSG1_FASHEprotein3
3  A0A2H1CFJ4_FASHEprotein4   tr|A0A2H1CFJ4|A0A2H1CFJ4_FASHEprotein4

How can I identify the rows where the identifiers differ between the two columns, for example row 2?

Answer by Allan Cameron

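You could use str_detect from the stringr package to check, row by row, whether Description1 occurs within Description2. Rows that return FALSE are the ones where the identifiers differ. Note that str_detect treats the pattern as a regular expression by default; the identifiers here contain only letters, digits and underscores, so that is safe for this data.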

library(stringr)

str_detect(df$Description2, df$Description1)
#> [1]  TRUE FALSE  TRUE

Data in reproducible format

df <- structure(list(Description1 = c("A0A2H1CVW1_FASHEprotein1",  
                                      "A0A4E0RAA2_FASHEprotein2", 
                                      "A0A2H1CFJ4_FASHEprotein4"), 
                     Description2 = c("tr|A0A2H1CVW1|A0A2H1CVW1_FASHEprotein1", 
                                      "tr|A0A2H1BSG1|A0A2H1BSG1_FASHEprotein3",
                                      "tr|A0A2H1CFJ4|A0A2H1CFJ4_FASHEprotein4"
                )), class = "data.frame", row.names = c("1", "2", "3"))
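
If you would rather get the row numbers of the mismatches directly, here is a minimal base R sketch. It assumes the identifier in Description2 is always the part after the last "|", which holds for the sample data but may not hold for yours. The names id2 and mismatch are made up for illustration.

```r
# Same sample data as above, rebuilt so this block runs on its own
df <- data.frame(
  Description1 = c("A0A2H1CVW1_FASHEprotein1",
                   "A0A4E0RAA2_FASHEprotein2",
                   "A0A2H1CFJ4_FASHEprotein4"),
  Description2 = c("tr|A0A2H1CVW1|A0A2H1CVW1_FASHEprotein1",
                   "tr|A0A2H1BSG1|A0A2H1BSG1_FASHEprotein3",
                   "tr|A0A2H1CFJ4|A0A2H1CFJ4_FASHEprotein4"),
  stringsAsFactors = FALSE
)

# Assumption: the identifier is everything after the last "|" in Description2.
# The greedy ".*" consumes up to the final pipe, and sub() drops that prefix.
id2 <- sub("^.*\\|", "", df$Description2)

# Hypothetical helper column: TRUE where the two identifiers differ
df$mismatch <- df$Description1 != id2

# Row numbers where the identifiers differ
which(df$mismatch)
#> [1] 2
```

Unlike the str_detect approach, this compares the two identifiers for exact equality, so a Description1 that is only a substring of the Description2 identifier would also be flagged as a mismatch.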