Extract relevant information for a Chrome extension

Question

143 views Asked by Piyush At 28 March 2023 at 05:17

I am trying to build a Chrome extension that aggregates information from several other sites when the user visits a site A. Here is my code so far:


// Fetch the raw HTML of a page through a proxy.
// Note: proxyUrl is not defined in this snippet; it is assumed to be set elsewhere.
async function fetchHTML(url) {
  const response = await fetch(proxyUrl + url);
  const html = await response.text();
  console.log(html);
  return html;
}

// Function to extract the element - total violations from the HTML content
function extractTotalViolations(html) {
  const parser = new DOMParser();
  const doc = parser.parseFromString(html, "text/html");
  const totalViolations = doc.querySelector(".total-violations").textContent;
  return totalViolations;
}

// The URL of the page we want to scrape
const url = "https://whoownswhat.justfix.org/en/address/MANHATTAN/610/EAST%2020%20STREET";

// Fetch the HTML content of the page and extract the total violations
fetchHTML(url).then(html => {
  const totalViolations = extractTotalViolations(html);
  console.log(totalViolations);
});

When I print totalViolations, I get null. So I printed the HTML that was fetched and realized that I am getting JavaScript code that looks nothing like the HTML I see when I open the website directly. I suspect the website is using some kind of JavaScript masking, or maybe I am not fetching the HTML correctly. This is what comes back:

<script>
!function(e){function t(t){for(var n,l,i=t[0],f=t[1],a=t[2],p=0,s=[];p<i.length;p++)l=i[p],Object.prototype.hasOwnProperty.call(o,l)&&o[l]&&s.push(o[l][0]),o[l]=0;for(n in f)Object.prototype.hasOwnProperty.call(f,n)&&(e[n]=f[n]);for(c&&c(t);s.length;)s.shift()();return u.push.apply(u,a||[]),r()}function r(){for(var e,t=0;t<u.length;t++){for(var r=u[t],n=!0,i=1;i<r.length;i++){var f=r[i];0!==o[f]&&(n=!1)}n&&(u.splice(t--,1),e=l(l.s=r[0]))}return e}var n={},o={1:0},u=[];function l(t){if(n[t])return n[t].exports;var r=n[t]={i:t,l:!1,exports:{}};return e[t].call(r.exports,r,r.exports,l),r.l=!0,r.exports}l.m=e,l.c=n,l.d=function(e,t,r){l.o(e,t)||Object.defineProperty(e,t,{enumerable:!0,get:r})},l.r=function(e){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",
</script>

My question is: how can I fetch the HTML correctly so that I can parse the DOM and get all the information from this site that I want to show in the extension? Thanks.

There are 2 answers

**Lajos Arpad** · Answer 1 · 2023-03-30T08:19:22+00:00

The fact that you got JavaScript as a response shows that the request was correct and that you received a response. The problem is therefore not in how you fetch, but in what this URL returns.

Load the page with your browser's DevTools open and study the requests that are being sent. Based on your description, it's likely that the first request sent when you visit the page loads JavaScript code, which then runs and sends further requests to the server. Study those requests carefully: their URLs, request headers and payloads, as well as the responses.
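
A quick way to confirm this diagnosis from code is to check how much visible text the fetched document contains once its scripts are removed. The sketch below is a rough heuristic, not a reliable detector; the function name `looksLikeAppShell` and the 200-character threshold are assumptions chosen only for illustration.

```javascript
// Heuristic sketch: does this HTML look like an empty "app shell" that
// relies on JavaScript to render its content?
// The function name and the default threshold are illustrative assumptions.
function looksLikeAppShell(html, minVisibleChars = 200) {
  const visibleText = html
    // Drop script and style elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    // Drop the remaining tags, keeping only the text between them.
    .replace(/<[^>]+>/g, " ")
    // Collapse runs of whitespace.
    .replace(/\s+/g, " ")
    .trim();
  return visibleText.length < minVisibleChars;
}
```

If this returns true for the text you fetched, the data you want is most likely loaded by a later request rather than contained in the first response.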

You will need to replicate those requests and parse the response. If the response is HTML, you can parse it the way you already tried, changing only where and how the request or requests are sent. If the response is something else, such as JSON, study the HTML that ends up being displayed on the target site and write code that converts the raw server response into similar HTML.
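
If the Network tab shows that the page gets its data from a JSON endpoint, the extension can request that endpoint directly and read the value from the parsed object instead of running `DOMParser` on the first response. The following is a minimal sketch under that assumption: the endpoint URL, the `totalViolations` field name, and both function names are hypothetical placeholders, and the real URL and field have to be taken from the requests you observe.

```javascript
// Hypothetical endpoint: replace with the URL seen in the Network tab.
const EXAMPLE_API_URL = "https://api.example.com/address/summary";

// Read one field from the parsed payload; return null if it is missing
// instead of throwing. "totalViolations" is an assumed field name.
function readTotalViolations(payload) {
  const value = payload?.totalViolations;
  return value === undefined ? null : value;
}

// fetchImpl is injectable so the function can be exercised without a network.
async function fetchTotalViolations(apiUrl, fetchImpl = fetch) {
  const response = await fetchImpl(apiUrl, {
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return readTotalViolations(await response.json());
}
```

With the real endpoint in place, a call such as `fetchTotalViolations(EXAMPLE_API_URL).then(console.log)` would log the value, or `null` if the field is absent.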

**Chuck Terry** · Answer 2 · 2023-03-31T12:08:26+00:00

You will have to delve a bit deeper into how the page fetches its resources to get what you're looking for. The URL in question loads its content dynamically, likely to make scraping inconvenient, but nothing is perfect.

This URL is requested without any key or credentials and seems to contain the information you're looking for.

As others have said, open DevTools and use the Network tab to watch how the page loads its resources. That will get you a lot closer to the data you're looking for.
