I have an Excel .CSV file in which one column has the transcription of a conversation. Whenever the speaker uses Spanish, the Spanish is written within brackets.
One example sentence:
so [usualmente] maybe [me levanto como a las nueve y media] like I exercise and the I like either go to class online or in person like it depends on the day
Ideally, I'd like to extract the English and Spanish separately, so one file would contain all the Spanish words, and another would contain all the English words.
Any ideas on how to do this? Or which function/package to use?
Edited to add: there's about 100 cells that contain text in this Excel sheet. I guess where I'm confused is how do I treat this entire CSV as a "string"?
I don't want to copy and paste every cell as a "string" -- I was hoping I could somehow just upload the entire CSV.

To load the CSV into R, you could use readr::read_csv("YOUR_FILE.csv"). There are more options, some of which are available to you if you use the "File -- Import Dataset -- From Text (readr)" menu option in RStudio.
Supposing you have the data loaded, you will likely need to rely on some form of "regex" to parse the text into sections based on the brackets. There are some base R functions for this, but I find the functions in stringr (part of the tidyverse meta-package) to be useful for this. And tidyr::separate_rows is a nice way to split the text into more lines.
In the regex below, there are a few ingredients:
(?=...) is a lookahead: it means to split before the [ but to keep it.
\\[ is how we refer to [ because brackets have special meaning in regex, so we need to "escape" them to treat them as a literal character.
(?<=...) is a lookbehind: it means to split after the ] but keep it.
| means "or", so the text is split at either kind of position.
(Granted, I'm still a regex beginner, so I expect there are more concise ways to do this.)
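To see these ingredients working together, here is a minimal sketch using stringr::str_split on a shortened version of the example sentence. The pattern is one way to combine them, not necessarily the only one, and the variable names are made up for illustration.

```r
library(stringr)

# Shortened version of the example sentence from the question.
x <- "so [usualmente] maybe [me levanto] like I exercise"

# Split just before each "[" or just after each "]". Lookarounds match a
# position rather than a character, so the brackets stay in the pieces.
pieces <- str_split(x, "(?=\\[)|(?<=\\])")[[1]]

# A piece that starts with "[" is a Spanish chunk; everything else is English.
is_spanish <- str_detect(pieces, "^\\[")

print(pieces[is_spanish])
print(pieces[!is_spanish])
```

The English pieces keep their surrounding spaces at this stage; something like str_squish() can tidy them up afterwards.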
So we could do something like:
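A minimal sketch of that approach, assuming the transcript sits in a single column called transcript (a made-up name; substitute your own). The temporary CSV only stands in for YOUR_FILE.csv so the example runs on its own, and the output file names are hypothetical too.

```r
library(readr)
library(dplyr)
library(tidyr)
library(stringr)

# Stand-in for the real file: a one-column CSV written to a temporary path.
# The column name `transcript` is an assumption.
csv_path <- tempfile(fileext = ".csv")
write_csv(
  tibble(transcript = c(
    "so [usualmente] maybe [me levanto como a las nueve y media] like I exercise",
    "it depends on the day [creo]"
  )),
  csv_path
)

# The whole column is processed at once, so no cell is pasted in by hand.
conversation <- read_csv(csv_path, show_col_types = FALSE)

pieces <- conversation %>%
  # One row per chunk: split before each "[" or after each "]".
  separate_rows(transcript, sep = "(?=\\[)|(?<=\\])") %>%
  mutate(
    language = if_else(str_detect(transcript, "^\\["), "Spanish", "English"),
    # Drop the brackets and tidy up the whitespace.
    text = str_squish(str_remove_all(transcript, "[\\[\\]]"))
  ) %>%
  # Splitting can leave empty chunks (e.g. after a "]" that ends a cell).
  filter(text != "")

spanish <- pieces %>% filter(language == "Spanish") %>% pull(text)
english <- pieces %>% filter(language == "English") %>% pull(text)

# One output file per language (hypothetical file names).
write_lines(spanish, file.path(tempdir(), "spanish.txt"))
write_lines(english, file.path(tempdir(), "english.txt"))
```

Keeping the language column in the pieces table, rather than splitting straight into two vectors, also leaves the original order of the conversation intact if you need it later.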
Result