In Python Splinter/Selenium, how to load all contents in a lazy-load web page


What I want to do - I want to crawl content (similar to stock prices of companies) on a website. The value of each element (i.e. a stock price) is updated every second. However, the page is lazy-loaded, so only 5 elements are visible at a time, while I need to collect data from ~200 elements.

What I tried - I use Python Splinter to read the data from each element's div, but only the 5-10 elements surrounding the current view appear in the HTML. If I scroll the browser down, I can get the next elements (the stock prices of the next companies), but the earlier elements are no longer available. This process (scrolling down and reading the new data) is too slow: by the time I finish collecting all 200 elements, the first element's value has already changed several times. My current approach is roughly the loop sketched below.
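A minimal sketch of what I'm doing now (the URL, the CSS selector, and the data attribute are placeholders for my actual page):

```python
import time
from splinter import Browser

browser = Browser("chrome")
browser.visit("https://example.com/stocks")  # placeholder URL

prices = {}
for _ in range(40):  # ~200 elements, ~5 visible per screen
    # read whatever the lazy-loader has currently rendered
    for el in browser.find_by_css("div.stock-price"):  # placeholder selector
        prices[el["data-symbol"]] = el.text  # placeholder attribute
    # scroll one viewport down so the next rows get rendered
    browser.execute_script("window.scrollBy(0, window.innerHeight)")
    time.sleep(0.5)  # wait for the new rows to appear
```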

So, can you suggest some approaches to handle this issue? Is there any way to force the browser to load all the content instead of lazy-loading it?


There is 1 answer

Answer from CampingCow:

There is no single right way; it depends on how the website works in the background. Normally there are two options for a lazy-loaded page:

  1. Selenium. It executes all the JavaScript and "merges" the background requests into a complete page, like a normal web browser (see the sketch after this list).

  2. Access the API. In this case you don't have to care about the UI and dynamically hidden elements. The API gives you access to all the data on the page, often more than is displayed.
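For option 1, here is a minimal sketch of scrolling with Selenium while accumulating results into a dict, so that rows the lazy-loader removes from the DOM are not lost (the URL, selector, and attribute names are placeholders you would replace for the real page):

```python
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/stocks")  # placeholder URL

prices = {}  # accumulate rows as they scroll into view
prev_count = -1
while len(prices) != prev_count:  # stop once a pass adds nothing new
    prev_count = len(prices)
    # read the rows that are currently rendered
    for el in driver.find_elements(By.CSS_SELECTOR, "div.stock-price"):  # placeholder selector
        prices[el.get_attribute("data-symbol")] = el.text  # placeholder attribute
    # scroll one viewport down to make the lazy-loader render the next rows
    driver.execute_script("window.scrollBy(0, window.innerHeight)")
    time.sleep(0.5)  # give the new rows time to render

print(f"collected {len(prices)} rows")
```

Note this is still a scroll-and-collect loop, so the values are not a single consistent snapshot; it only avoids losing rows that leave the DOM.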

In your case, an update every second sounds like a streaming connection (perhaps a WebSocket). So try to figure out how the website gets its data (the Network tab in your browser's developer tools will show the requests) and then scrape that API endpoint directly. What page is it?
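If it does turn out to be a WebSocket, a minimal sketch of subscribing to it directly with the websocket-client package, under the assumption that the endpoint sends one JSON quote per message (the URL and message layout are assumptions; substitute whatever you find in the Network tab):

```python
import json
import websocket  # pip install websocket-client

STREAM_URL = "wss://example.com/stream"  # hypothetical endpoint found via DevTools

def on_message(ws, message):
    quote = json.loads(message)  # assumed shape: {"symbol": ..., "price": ...}
    print(quote["symbol"], quote["price"])

ws = websocket.WebSocketApp(STREAM_URL, on_message=on_message)
ws.run_forever()  # blocks and invokes on_message for every update
```

This way you receive every update for all ~200 elements as it happens, instead of racing the page by scrolling.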