
Clojure (Enlive) How to use html/but (negation)


Hi all, I'm trying to parse and extract HTML data with Clojure and Enlive (are there any better choices?).

I want to get all the ul > li tags that are *not* inside the <nav> tag. I think I should use Enlive's html/but function, but I can't get it to work.

;;test-envlive.clj

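;; Assumes Enlive is required under the alias used below
(require '[net.cgrand.enlive-html :as html])

;; Runs each selector in tag-list against dom; returns a vector of result vectors
(defn get-tags [dom tag-list]
  (mapv #(vec (html/select dom %)) tag-list))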

;;Gives NO tags
(get-tags test-dom [[[(html/but :nav) :ul :> :li]]])

;;Gives ALL the LI-tags
(get-tags test-dom [[:ul :> :li]])

<!-- test.html -->
<html>
<head><title>Test page</title>  </head>
<body>
    <div>
        <nav>
            <ul>
                <li>
                    skip these navs-li
                </li>
                
            </ul>
        </nav>
        <h1>Hello World<h1>                 
        <ul><li>get only these li's</li>                
        </ul>           
    </div>  
</body></html>

There are 3 answers

1
akond (best answer)

If you had valid XHTML, you could use XPath via the sigel library:

(require '[sigel.xpath.core :as xpath])
(let [data "<html><head><title>Test page</title></head>
                <body><div><nav><ul><li>skip these navs-li</li></ul></nav>
                <h1>Hello World</h1>
                <ul><li>get only these li's</li></ul>
                </div></body></html>"]
        (xpath/select data "//li[not(ancestor::nav)]"))
0
Alan Thompson

You could do this with the Tupelo Forest library. Watch the video and see the examples in the unit tests.

Here is one way to solve your problem:

(ns tst.tupelo.forest-examples
  (:use tupelo.core tupelo.forest tupelo.test)
  (:require ... ))

<snip>

(verify
  (let [html-data "<html>
                      <head><title>Test page</title>  </head>
                      <body>
                          <div>
                              <nav>
                                  <ul>
                                      <li>
                                          skip these navs-li
                                      </li>

                                  </ul>
                              </nav>
                              <h1>Hello World<h1>
                              <ul><li>get only these li's</li>
                              </ul>
                          </div>
                      </body>
                  </html> "]

The interesting part comes next:

    (hid-count-reset)
    (with-forest (new-forest)
      (let [root-hid   (add-tree-html html-data)
            out-hiccup (hid->hiccup root-hid)
            result-1   (find-paths root-hid [:html :body :div :ul :li])
            li-hid     (last (only result-1))
            li-hiccup  (hid->hiccup li-hid)]
        (is= out-hiccup [:html
                         [:head [:title "Test page"]]
                         [:body
                          [:div
                           [:nav
                            [:ul
                             [:li
                              "\n                                          skip these navs-li\n                                      "]]]
                           [:h1 "Hello World"]
                           [:ul [:li "get only these li's"]]]]])
        (is= result-1 [[1011 1010 1009 1008 1007]])
        (is= li-hid 1007)
        (is= li-hiccup [:li "get only these li's"])))))

The above code can be seen live in the examples.

0
Martin Půda

I was able to select the target li with Hickory, so if you don't mind switching libraries:

Dependency: [hickory "0.7.1"]

Require: [hickory.core :as h] [hickory.select :as s]

(s/select (s/and
            (s/descendant (s/tag :ul)
                          (s/tag :li))
            (s/not (s/descendant (s/tag :nav)
                                 (s/tag :li))))
          (h/as-hickory (h/parse (slurp "resources/site.html"))))

=> [{:type :element, :attrs nil, :tag :li, :content ["get only these li's"]}]
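
For anyone who would rather stay with Enlive, here is a rough sketch of one possible workaround; it is not taken from any of the answers above. As far as I understand it, html/but negates a single selector step rather than a whole ancestor chain, so [(html/but :nav) :ul :> :li] would still match an li whose ul sits under some non-nav element such as div or body. Note also that get-tags hands each element of tag-list to html/select, so the triple-nested vector in the question probably ends up as one conjunctive step instead of a descendant chain. The sketch sidesteps both issues by dropping every nav subtree first and then running the plain selector. The names page, dom, without-nav and lis are made up for illustration, and the inline HTML is the question's page with the h1 closed.

```clojure
(require '[net.cgrand.enlive-html :as html])
(import '(java.io StringReader))

;; The question's test page, inlined (with the h1 closed) so the sketch is self-contained.
(def page
  "<html><head><title>Test page</title></head>
   <body><div>
     <nav><ul><li>skip these navs-li</li></ul></nav>
     <h1>Hello World</h1>
     <ul><li>get only these li's</li></ul>
   </div></body></html>")

;; html-resource accepts a Reader and returns a seq of Enlive nodes.
(def dom (html/html-resource (StringReader. page)))

;; A nil transformation removes every node matched by the selector,
;; so this yields a copy of the tree with all nav subtrees dropped.
(def without-nav (html/at dom [:nav] nil))

;; With nav gone, the plain selector only sees the remaining li elements.
(def lis (html/select without-nav [:ul :> :li]))

;; Text content of each selected li.
(map html/text lis)
```

The trade-off is that the tree gets rewritten before selecting, which is fine for scraping but may be wasteful on very large documents.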