I can't extract the correct result using requests_html:
>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> r = session.get('https://www.amazon.com/dp/B07569DYGN')
>>> r.html.find("#productDetails_detailBullets_sections1")
[]
I can find the id 'productDetails_detailBullets_sections1' in the source content:
>>> """<table id="productDetails_detailBullets_sections1" class="a-keyvalue prodDetTable" role="presentation">""" in r.text
True
The same issue also exists in PyQuery.
Why can't requests_html find this element?
I was searching for #comparison_price_row, which still finds something. The next id in the source is comparison_shipping_info_row, but searching for #comparison_shipping_info_row returns an empty array. The two elements are on the same level (same parent). I examined all the source between the two but found no problem. At first.
Then I saw that there is a NUL byte somewhere between the two, which probably makes the library stumble.
After removing the NUL bytes from the input, the wanted element could be found:
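Something along these lines should work. This is a minimal sketch, not verified against the live page: the helper names strip_nul_bytes and find_after_cleaning are made up for illustration, and it assumes that re-parsing the cleaned text with requests_html's HTML class is an acceptable substitute for r.html.

```python
def strip_nul_bytes(markup: str) -> str:
    """Return markup with every NUL character (U+0000) removed."""
    return markup.replace("\x00", "")


def find_after_cleaning(url: str, selector: str):
    """Fetch url, drop NUL bytes, re-parse, and run the CSS selector.

    Hypothetical helper: it builds a fresh HTML object from the cleaned
    text instead of relying on r.html, which was parsed from the raw body.
    """
    # Imported here so strip_nul_bytes can be used without requests_html.
    from requests_html import HTML, HTMLSession

    session = HTMLSession()
    r = session.get(url)
    cleaned = strip_nul_bytes(r.text)
    html = HTML(html=cleaned, url=r.url)
    return html.find(selector)


if __name__ == "__main__":
    # Offline check on a made-up fragment with a NUL between two elements.
    sample = '<div id="a"></div>\x00<div id="b"></div>'
    print("NUL bytes before cleaning:", sample.count("\x00"))
    print("NUL bytes after cleaning:", strip_nul_bytes(sample).count("\x00"))
```

Calling find_after_cleaning('https://www.amazon.com/dp/B07569DYGN', '#productDetails_detailBullets_sections1') would then be the equivalent of the lookup in the question, provided the page still contains that table. Checking "\x00" in r.text first is a quick way to confirm that NUL bytes are actually the cause.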