I am using the rvest and RSelenium packages in R to extract data from a website.
My current code is the following:
install.packages("RSelenium")
library(RSelenium)
library(rvest) # provides read_html(), html_nodes() and %>%
rD <- rsDriver(browser = "chrome", port = 4447L, geckover = NULL,
               chromever = "latest", iedrver = NULL,
               phantomver = NULL)
remDr <- rD[["client"]]
remDr$navigate("https://data.anbima.com.br/fundos?page=1&size=100&classe_anbima=A%C3%A7%C3%B5es&tipo_anbima=&benchmark=")
remDr$findElements("id", "item-title-1")[[1]]$clickElement()
# Extracting data from HTML
Sys.sleep(5) # give the page time to fully load
html <- remDr$getPageSource()[[1]]
primeiro_aporte <- read_html(html) %>% # parse HTML
  html_nodes(xpath = '//*[@id="output__container--primeiroAporte"]/div/span')
primeiro_aporte
The code above outputs the following:
> {xml_nodeset (1)}
[1] <span class="anbima-ui-output__value">27/11/2020</span>
However, what I actually need is just the text inside the node (in this case, 27/11/2020). Nothing I have tried so far has worked. I would appreciate any help! Thanks!
Not sure if you're still looking to solve this, but you can extract the text of an HTML element by piping the node set into rvest's html_text().
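A minimal sketch of this, assuming the function meant is rvest's `html_text()`. The inline HTML string is a stand-in for the page source returned by `remDr$getPageSource()[[1]]`, so the example runs without Selenium:

```r
library(rvest)

# Stand-in for the live page source (assumption: same structure as in the question)
html <- '<div id="output__container--primeiroAporte"><div><span class="anbima-ui-output__value">27/11/2020</span></div></div>'

primeiro_aporte <- read_html(html) %>%
  html_nodes(xpath = '//*[@id="output__container--primeiroAporte"]/div/span') %>%
  html_text() # keep only the text inside the matched node(s)

primeiro_aporte
# [1] "27/11/2020"
```

If you need a real date rather than a string, something like `as.Date(primeiro_aporte, format = "%d/%m/%Y")` should convert it.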
The same approach works for attributes: for example, html_attr("href") extracts the href attribute from every link on the page.
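A minimal sketch of the attribute case, assuming rvest's `html_attr()` is the function meant. The HTML string and its link targets are made up for illustration; with the live page you would pass the `html` object from the question instead:

```r
library(rvest)

# Hypothetical page fragment with two links (illustrative only)
html <- '<html><body><a href="/fundos/1">Fund 1</a><a href="/fundos/2">Fund 2</a></body></html>'

links <- read_html(html) %>%
  html_nodes("a") %>%   # select every <a> element
  html_attr("href")     # pull the href attribute from each one

links
# [1] "/fundos/1" "/fundos/2"
```

`html_attr()` returns `NA` for any matched element that lacks the requested attribute, so it may be worth filtering those out on a real page.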