I have an HTML file like this (with more than 100 records):
<div class="cell-62 pl-1 pt-0_5">
<h3 class="very-big-text light-text">John Smith</h3>
<span class="light-text">Center - VAR - Employee I</span>
</div>
<div class="cell-62 pl-1 pt-0_5">
<h3 class="very-big-text light-text">Jenna Smith</h3>
<span class="light-text">West - VAR - Employee I</span>
</div>
<div class="cell-62 pl-1 pt-0_5">
<h3 class="very-big-text light-text">Jordan Smith</h3>
<span class="light-text">East - VAR - Employee II</span>
</div>
I need to extract the names only if they are Employee I, which makes it challenging. How can I select the tags that are followed by a tag containing Employee I? Or should I use a different method? Is it even possible to use a condition in this case?
import re

with open("file.html", "r") as f:
    html = f.read()
print(re.search(r'\bEmployee I\b', html).group(0))
That is, how can I tell it to go back and read the previous tag?
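One way to approach this is with BeautifulSoup: filter the span tags on their text, then step back to the h3 that precedes each match. The following is a minimal sketch, not a definitive implementation. It assumes the bs4 package is installed (version 4.4 or later, where the text filter keyword is string; earlier releases call it text), it inlines the sample HTML instead of reading file.html, and the function name employee_i_names is purely illustrative.

```python
import re

from bs4 import BeautifulSoup

# Sample records copied from the question; a real run would read file.html.
SAMPLE_HTML = """
<div class="cell-62 pl-1 pt-0_5">
<h3 class="very-big-text light-text">John Smith</h3>
<span class="light-text">Center - VAR - Employee I</span>
</div>
<div class="cell-62 pl-1 pt-0_5">
<h3 class="very-big-text light-text">Jenna Smith</h3>
<span class="light-text">West - VAR - Employee I</span>
</div>
<div class="cell-62 pl-1 pt-0_5">
<h3 class="very-big-text light-text">Jordan Smith</h3>
<span class="light-text">East - VAR - Employee II</span>
</div>
"""


def employee_i_names(html):
    """Return the h3 text for every record whose span ends in 'Employee I'."""
    # "html.parser" is the parser bundled with Python, so no extra install.
    soup = BeautifulSoup(html, "html.parser")
    return [
        # Step back from the matching span to the h3 before it.
        span.find_previous_sibling("h3").string
        for span in soup.find_all(
            "span",
            class_="light-text",
            # The trailing $ keeps 'Employee II' from matching.
            string=re.compile(r"Employee I$"),
        )
    ]


names = employee_i_names(SAMPLE_HTML)
print(names)
```

To run it against the real file, pass the text read from file.html to employee_i_names in place of SAMPLE_HTML.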
I've formatted the list comprehension over several lines for clarity, so that it is easier to see what to adjust for other use cases. Of course, a normal for-loop that appends to a list also works fine; I just like list comprehensions.
The re.compile('Employee I$') is necessary to avoid matching on 'Employee II'. The class_ argument is an extra, and may not be needed. The rest is nearly self-explanatory, especially with the BeautifulSoup documentation next to it.
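To see why the trailing $ matters, here is a small standard-library sketch that compares an unanchored pattern with the anchored one on two of the span texts from the question. The variable names are illustrative, and it assumes the span text has no trailing whitespace after the job title.

```python
import re

# Two of the span texts from the sample HTML.
titles = ["Center - VAR - Employee I", "East - VAR - Employee II"]

unanchored = re.compile(r"Employee I")   # also found inside 'Employee II'
anchored = re.compile(r"Employee I$")    # must sit at the end of the text

unanchored_hits = [t for t in titles if unanchored.search(t)]
anchored_hits = [t for t in titles if anchored.search(t)]

print(unanchored_hits)
print(anchored_hits)
```

The unanchored pattern keeps both titles, while the anchored one keeps only the Employee I record.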
Note that the .string attribute used to be .text, in case you're using an older version of BeautifulSoup.