I wanted to scrape the Midjourney website. As usual, I reached for requests-html, which I had previously used on Digikala, a well-known dynamic website. The problem is that rendering fails and I can't select the images.
Using requests_html.HTMLSession:
from requests_html import HTMLSession
session = HTMLSession()
response = session.get('https://midjourney.com/showcase/top/')
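# Note: arender() is a coroutine, so without await it never actually runs;
# the synchronous counterpart on HTMLSession responses is render().
response.html.arender(timeout=60, sleep=5)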
print(response.html.xpath('//img')) # Output: []
And with requests_html.AsyncHTMLSession:
from requests_html import AsyncHTMLSession
asession = AsyncHTMLSession()
response = await asession.get('https://midjourney.com/showcase/top/')
await response.html.arender(timeout=60, sleep=5)
print(response.html.xpath('//img')) # Output: []
I tried several approaches, including the ones suggested in this issue:
https://github.com/psf/requests-html/issues/294
With Selenium, the same XPath does find the images:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Firefox() # Also tested on Chrome
driver.get('https://midjourney.com/showcase/top/')
print(driver.find_elements(By.XPATH, '//img'))
# Output:
# [<selenium.webdriver.remote.webelement.WebElement (session="b475751b-5ab3-4da8-b68b-523ceaa1ad5e", element="62660cb6-495b-434b-9bf0-80ff7a7df544")>,
# <selenium.webdriver.remote.webelement.WebElement (session="b475751b-5ab3-4da8-b68b-523ceaa1ad5e", element="8584c640-460c-422e-bd56-41327a745cee")>,
# <selenium.webdriver.remote.webelement.WebElement (session="b475751b-5ab3-4da8-b68b-523ceaa1ad5e", element="90f29838-1a88-4f2b-b4f7-b143af549a0b")>,
# <selenium.webdriver.remote.webelement.WebElement (session="b475751b-5ab3-4da8-b68b-523ceaa1ad5e", element="7eca7e52-d807-4a1c-9b4b-cc9d4c98d728")>,
# ...]
A solution to work with requests-html...
Here is a way to get that data with Requests (not requests-html, which is no longer actively maintained).
Result in terminal:
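The Requests-based idea mentioned above can be sketched roughly as follows. This is only a minimal sketch under assumptions, not a verified recipe for this site: it assumes the page ships its data as JSON inside a `<script id="__NEXT_DATA__">` tag (a common pattern on Next.js sites), that the image URLs appear somewhere in that JSON as plain strings, and that a browser-like `User-Agent` is enough to get the HTML. The names `extract_embedded_json`, `find_image_urls`, `fetch_html` and `main` are made up for illustration.

```python
import json
from html.parser import HTMLParser


class ScriptJSONExtractor(HTMLParser):
    """Collects the raw text of the <script> tag that has the given id."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_embedded_json(html, script_id="__NEXT_DATA__"):
    """Return the parsed JSON from the script tag, or None if it is absent.

    The default script id is an assumption about how the page is built.
    """
    parser = ScriptJSONExtractor(script_id)
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    return json.loads(raw) if raw else None


def find_image_urls(node):
    """Recursively yield string values that look like image URLs."""
    image_suffixes = (".png", ".jpg", ".jpeg", ".webp")
    if isinstance(node, dict):
        for value in node.values():
            yield from find_image_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_image_urls(item)
    elif isinstance(node, str) and node.startswith("http"):
        # Ignore any query string when checking the file extension.
        if node.lower().split("?")[0].endswith(image_suffixes):
            yield node


def fetch_html(url):
    """Download the page with Requests (third-party: pip install requests)."""
    import requests  # imported here so the parsing helpers work without it

    headers = {"User-Agent": "Mozilla/5.0"}  # assumed to be sufficient
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text


def main():
    html = fetch_html("https://midjourney.com/showcase/top/")
    data = extract_embedded_json(html)
    for url in find_image_urls(data):
        print(url)
```

Calling `main()` performs the actual request. If the site blocks plain HTTP clients or does not embed its data this way, `extract_embedded_json` returns `None` and nothing is printed, in which case the browser's network tab is the place to look for the JSON endpoint the page really uses.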