Parsing HTML5 with Enlive/Tagsoup/JSoup


HTML5 allows <meta> tags to appear in the body (for example, when they carry an itemprop attribute for microdata), but Enlive does not seem to support this:

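(ns example.enlive-test ; namespace name is illustrative
  (:require [clojure.test :refer [deftest testing is]]
            [net.cgrand.enlive-html :as enlive]))

(deftest test-enlive
  (testing "enlive"
    ;; The <meta> tag sits inside the <div>, ahead of the <span> we want to select.
    (let [html-as-string "<!DOCTYPE html><html lang=\"en\"><body><div><meta foo=\"bar\"><span>the content</span></body></html>"
          parsed-html (enlive/html-resource (java.io.StringReader. html-as-string))
          span (enlive/select parsed-html [:div :span])
          content (first (map enlive/text span))]
      (is (= "the content" content)))))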

This test fails, but it passes if the meta tag is removed.

This old thread led me to realize that it was the meta tag that was causing a problem.

I realize that Enlive depends on Tagsoup for parsing, but when I swap it out for JSoup (which claims to support HTML5), I get the same result.
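For reference, here is a minimal sketch of one way to run the same selector through JSoup directly, bypassing whatever parser `html-resource` picks by default. It assumes Enlive 1.1.1 or later, where the `net.cgrand.jsoup` namespace provides a `parser` function that takes an input stream and returns Enlive nodes. The namespace and helper names (`example.jsoup-check`, `parse-with-jsoup`, `span-text`) are made up for illustration.

```clojure
(ns example.jsoup-check ; hypothetical namespace, for illustration only
  (:require [net.cgrand.enlive-html :as enlive]
            [net.cgrand.jsoup :as jsoup]))

(def html-as-string
  "<!DOCTYPE html><html lang=\"en\"><body><div><meta foo=\"bar\"><span>the content</span></body></html>")

(defn parse-with-jsoup
  "Parses an HTML string into Enlive nodes using the JSoup-backed parser.
  Assumption: jsoup/parser expects an InputStream, so the string is wrapped
  in a ByteArrayInputStream rather than a StringReader."
  [^String html]
  (jsoup/parser (java.io.ByteArrayInputStream. (.getBytes html "UTF-8"))))

(defn span-text
  "Returns the text of the first <span> nested under a <div>, or nil if none matches."
  [nodes]
  (first (map enlive/text (enlive/select nodes [:div :span]))))

(comment
  ;; Evaluate at a REPL to see what the JSoup-parsed tree yields for the selector.
  (span-text (parse-with-jsoup html-as-string)))
```

Calling the parser function directly makes it explicit which parser produced the nodes, which helps rule out the possibility that the default parser is still in use when comparing results.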

