How can I extract text blocks separated by a specific keyword from big text documents in R?


I have big text documents of medical records with different diagnoses, where each record is separated by the keyword [report_complete]. I want to extract the entire patient record (everything between one "[report_complete]" and the next) if the patient has colon cancer. How can I do this?

Here is the data:

"[report_complete]"

Name: age: sex: Institution: Date of Operation: 8/2/2015 Date of Accession: 8/2/2015 Reported: 8/5/2015 16:10 Results FINAL DIAGNOSIS: *RIGHT FOOT TRANSMETATARSAL AMPUTATION:

"[report_complete]"*

Name: age: sex:

ANATOMIC PATHOLOGY Date of Operation: 7/11/2015 Date of Accession: 7/11/2015 Reported: 7/14/2015 FINAL PATHOLOGIC DIAGNOSIS: Colon cancer (biopsy done)

"[report_complete]"

I tried using stringr functions, but I get an error. How can I write a proper script for this?

Answer by Mark:

Assuming the data is stored in a single string like this:

example <- "\"[report_complete]\"

Name: age: sex: Institution: Date of Operation: 8/2/2015 Date of Accession: 8/2/2015 Reported: 8/5/2015 16:10 Results FINAL DIAGNOSIS: *RIGHT FOOT TRANSMETATARSAL AMPUTATION:

\"[report_complete]\"*

Name: age: sex:

ANATOMIC PATHOLOGY Date of Operation: 7/11/2015 Date of Accession: 7/11/2015 Reported: 7/14/2015 FINAL PATHOLOGIC DIAGNOSIS: Colon cancer (biopsy done)

\"[report_complete]\""
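In practice the records will usually sit in a file rather than in a string literal. A minimal base R sketch, assuming the reports are stored in a plain-text file (the path here is a hypothetical temporary file created only for the demo, and records_text is an illustrative name), reads every line and joins them back into one string so that a single pattern can span several lines:

```r
# Hypothetical demo: write a few report lines to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "\"[report_complete]\"",
  "FINAL DIAGNOSIS: RIGHT FOOT TRANSMETATARSAL AMPUTATION:",
  "\"[report_complete]\"",
  "FINAL PATHOLOGIC DIAGNOSIS: Colon cancer (biopsy done)",
  "\"[report_complete]\""
), path)

# Read every line and join them back into one string with newlines,
# so a single regular expression can match text spanning several lines.
records_text <- paste(readLines(path, warn = FALSE), collapse = "\n")
cat(records_text)
```

records_text can then be used wherever example appears in the pattern below. If the readr package is installed, readr::read_file(path) reads the whole file into one string in a single call.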

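You can then extract the text between each pair of delimiters with str_extract_all() and a lookbehind/lookahead pattern, where (?s) lets . also match newlines: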

stringr::str_extract_all(example, "(?s)(?<=\\[report_complete\\]\").*?(?=\"\\[report_complete\\]\")")
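The pattern above returns every report, while the question only asks for the colon cancer ones. One way to finish the job, sketched here in base R with regmatches() and gregexpr(perl = TRUE) instead of stringr, is to extract the blocks with the same lookaround pattern and then keep only those that mention "colon cancer", ignoring case. The sample is a shortened version of the question's data with the stray asterisks dropped; the names blocks and colon_blocks and the plain substring match are assumptions, and real records may need a stricter diagnosis match.

```r
# Shortened sample of the question's data (stray asterisks dropped).
example <- "\"[report_complete]\"
FINAL DIAGNOSIS: RIGHT FOOT TRANSMETATARSAL AMPUTATION:
\"[report_complete]\"
FINAL PATHOLOGIC DIAGNOSIS: Colon cancer (biopsy done)
\"[report_complete]\""

# (?s) lets . match newlines; the lookarounds keep the delimiters out of each match.
pattern <- "(?s)(?<=\\[report_complete\\]\").*?(?=\"\\[report_complete\\]\")"
blocks <- regmatches(example, gregexpr(pattern, example, perl = TRUE))[[1]]

# Keep only the reports that mention colon cancer (case-insensitive),
# trimming the surrounding newlines left over from the delimiters.
colon_blocks <- trimws(blocks[grepl("colon cancer", blocks, ignore.case = TRUE)])
print(colon_blocks)
```

With stringr, the equivalent filter is stringr::str_detect(blocks, stringr::regex("colon cancer", ignore_case = TRUE)).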