I am parsing HTML content with the PHP (Symfony) DomCrawler library, like this:
use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<!DOCTYPE html>
<html>
<body>
<div class="content">
<p class="message">Hello World!</p>
!!!This content is not processed by DomCrawler as Children!!!
<p>Hello Crawler!</p>
</div>
</body>
</html>
HTML;
$crawler = new Crawler($html);
$content = $crawler->filterXPath('descendant-or-self::body/div[@class="content"]');
foreach ($content->children() as $contentChild) {
// Only 2 iterations run; the middle text, which has no tag (nodeName), is skipped
}
The middle content "!!!This content is not processed by DomCrawler as Children!!!" is never visited in the loop; only content wrapped in a valid tag is returned. Maybe some minor configuration is needed to change this. Does anyone know how to fix this so that I also get a DOMNode for the text that has no HTML tag?
Looking forward to any hint or help. Thank you in advance.
Looking at the code for DomCrawler, children() appears to keep only element nodes, which goes against the idea of providing child nodes (maybe a bug in their implementation or documentation?). Text nodes are child nodes too, so skipping them is technically incorrect, but you can get around it by modifying your XPath expression to select all child nodes instead:
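A minimal sketch of that workaround, assuming symfony/dom-crawler is installed through Composer (the vendor/autoload.php path is an assumption) and that filterXPath() hands back text nodes as-is. Appending /node() to the original expression selects every child node of the div rather than only its elements:

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<!DOCTYPE html>
<html>
<body>
<div class="content">
<p class="message">Hello World!</p>
!!!This content is not processed by DomCrawler as Children!!!
<p>Hello Crawler!</p>
</div>
</body>
</html>
HTML;

$crawler = new Crawler($html);

// node() matches elements, text nodes and comments alike
$nodes = $crawler->filterXPath('descendant-or-self::body/div[@class="content"]/node()');

$texts = [];
foreach ($nodes as $node) {
    // The newlines between tags are whitespace-only text nodes; skip them
    if ($node instanceof \DOMText && trim($node->nodeValue) === '') {
        continue;
    }
    // nodeName is "#text" for text nodes and the tag name for elements
    $texts[] = $node->nodeName . ': ' . trim($node->textContent);
}

print_r($texts);
```

With the sample markup this should leave three entries: the two p elements and the bare text between them. Note that the loop iterates over the filtered crawler directly instead of calling children(), since children() is what drops the text node in the first place.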