I'm trying to scrape a website that doesn't use classes or IDs, and the structure is like this:
<div>
  <div>
    <div>
      Some content
    </div>
  </div>
  <div>
    Other content
  </div>
</div>
I'm trying something like doc.css('div div'), but it returns duplicate content, because every nested container matches that selector.
How do I select only the innermost divs, given that they are not all at the same depth?
To phrase it another way: is there a way to select a "div with no div children"? It may have other children, just not divs.
Edit:
To clarify: with the HTML above I can call:
doc.css('div div').map(&:text)
This gives me the text of the document, split into an array by div. The problem is that this returns "Some content" twice: although it appears only once in the HTML, two elements match 'div div' (the middle div and the innermost one), and both contain that text.
This code finds all the leaf elements and checks whether they're divs, which I assume is what you're trying to do.