How do I write the following CSS selector in Clojure's Enlive?


I am trying to scrape a website using Clojure's Enlive library. The corresponding CSS selector is:

body > table:nth-child(2) > tbody > tr > td:nth-child(3) > table > tbody > tr > td > table > tbody > tr:nth-child(n+3)

I have tested the above selector using jQuery, and it works, but I don't know how to translate it into Enlive's selector syntax. I have tried something along these lines:

(ns vimindex.core
  (:gen-class)
  (:require [net.cgrand.enlive-html :as html]))

(def ^:dynamic *vim-org-url* "http://www.vim.org/scripts/script_search_results.php?order_by=creation_date&direction=descending")
(defn fetch-url [url]
  (html/html-resource (java.net.URL. url)))

(defn scrape-vimorg []
  (println "Scraping vimorg")
  (println
    (html/select (fetch-url *vim-org-url*)
                 [:body :> [:table (html/nth-child 2)] :> :tbody :> :tr :> [:td (html/nth-child 3)] :> :table :> :tbody :> :tr :> :td :> :table :> :tbody :> [:tr (html/nth-child 1 3)]])))
;                  body  >   table:nth-child(2)         >  tbody  >  tr  >   td:nth-child(3)         >  table  >  tbody  >  tr  >  td  >  table  >  tbody  >   tr:nth-child(n + 3)
; Above selector works with jquery

(defn -main
  [& args]
  (scrape-vimorg))

But I get an empty result. Could you please tell me how to translate the above CSS selector into Enlive's syntax?

Thanks a lot.

Edited: To include the full code.

There are 2 answers

jmargolisvt

The syntax you are missing is an additional set of brackets around elements that use pseudo-selectors. So you want something like this:

[:body :> [:table (html/nth-child 2)] :> :tbody :> :tr
 [:td (html/nth-child 3)] :> :table :> :tbody :> :tr :> :td :>
 :table :tbody :> [:tr (html/nth-child 1 3)]]
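
To illustrate what the inner brackets do, here is a minimal sketch on a made-up four-item list (the HTML and the names nodes, from-third and unbracketed are hypothetical, not from the question). As far as I understand Enlive's selector rules, the outer vector chains steps as descendants, while a nested vector means "all of these conditions on the same element".

```clojure
(require '[net.cgrand.enlive-html :as html])

;; Hypothetical four-item list, used only for illustration.
(def nodes
  (html/html-snippet "<ul><li>a</li><li>b</li><li>c</li><li>d</li></ul>"))

;; Nested vector = intersection: an li that is also the 3rd-or-later child,
;; i.e. the CSS selector  ul > li:nth-child(n+3)
(def from-third
  (map html/text
       (html/select nodes [:ul :> [:li (html/nth-child 1 3)]])))

;; Without the nested vector the two steps chain as descendants:
;; "some element matching :nth-child(n+3) inside an li".
;; The li elements here contain only text, so nothing matches.
(def unbracketed
  (map html/text
       (html/select nodes [:li (html/nth-child 1 3)])))

(println from-third)   ; the texts of the third and fourth items
(println unbracketed)  ; an empty sequence
```

Note that (html/nth-child 1 3) is Enlive's spelling of n+3: the first argument is the step and the second the offset.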

Anton Harald

It looks like browsers (at least my version of Firefox) add a tbody element to their DOM representation even if it is not in the actual source.

Enlive does not do so, so your code should work if you omit the tbody steps.
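
A rough sketch of that suggestion, assuming the explanation above holds for the page in question: the selector from the question with every :tbody step removed, plus a small check you can run on the parsed page to see whether any tbody elements are present at all. The names rows-selector, tbody-count and select-rows are made up for this example.

```clojure
(require '[net.cgrand.enlive-html :as html])

;; The question's selector with each :tbody step dropped.
(def rows-selector
  [:body :> [:table (html/nth-child 2)] :> :tr :>
   [:td (html/nth-child 3)] :> :table :> :tr :> :td :>
   :table :> [:tr (html/nth-child 1 3)]])

(defn tbody-count
  "Counts tbody elements in an already parsed page.
  A result of 0 would be consistent with the parser not inserting tbody."
  [page]
  (count (html/select page [:tbody])))

(defn select-rows
  "Applies the tbody-free selector to an already parsed page."
  [page]
  (html/select page rows-selector))
```

With the fetch-url function and *vim-org-url* from the question, this could be tried as (tbody-count (fetch-url *vim-org-url*)) followed by (select-rows (fetch-url *vim-org-url*)). If the count is not 0, the tbody steps are probably not the cause of the empty result.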