Using IMPORTXML on Google Sheets to extract hyperlinks

969 views Asked by At

Trying to scrape coinmarketcap.com for the links listed under "Audits."

The =importXML function in Google Sheets seems to be the best method to extract the url for these audits since their API does not provide them. Can anyone show me how to obtain the XPath to these links so that I can create a list in Google Sheets? Or is there a better method?

Example of url I am looking for

1

There are 1 answers

0
Trey_H On

To get the first-ranked hyperlink then it would look like

="https://coinmarketcap.com"&IMPORTXML("https://coinmarketcap.com/","//*[@id='__next']/div/div[1]/div[2]/div/div/div[5]/table/tbody/tr[1]/td[3]/div/a/@href")

The second-ranked hyperlink you would make tr[1] into tr[2] like below

="https://coinmarketcap.com"&IMPORTXML("https://coinmarketcap.com/","//*[@id='__next']/div/div[1]/div[2]/div/div/div[5]/table/tbody/tr[2]/td[3]/div/a/@href")

The third-ranked hyperlink you would make tr[2] into tr[3] like below,

="https://coinmarketcap.com"&IMPORTXML("https://coinmarketcap.com/","//*[@id='__next']/div/div[1]/div[2]/div/div/div[5]/table/tbody/tr[2]/td[3]/div/a/@href")

As you may have noticed tr[1] gets the row of the chart and td[3] gets the column/section of that chart. So if you want the Price section hyperlink then change td[3] to td[4]. The @href gets the name of the href which is the website path after the home page. If you left the @href out, you would get the value of /a instead. This will only work with the columns/sections that have hyperlinks of course. I do not see the audit section shown in your image.

The formula below will give you all the hyperlinks in the first row by making td[1] into td[*]

=ArrayFormula("https://coinmarketcap.com"&IMPORTXML("https://coinmarketcap.com/","//*[@id='__next']/div/div[1]/div[2]/div/div/div[5]/table/tbody/tr[1]/td[*]/div/a/@href"))