How to extract a Clojure string into an enumerable of strings?


Suppose I have a simple string that I want to parse into an array of strings:

"add (multiply (add 1 2) (add 3 4)) (add 5 6)"

How do I parse it into these 3 strings, based on the outer parentheses?

add
(multiply (add 1 2) (add 3 4))
(add 5 6)

Coming from an OOP mindset, I think I need a for loop with an index and if/else statements to do this.

I tried parsing it with string split, but I got:

command
(multiply
1
(add
3
2))
(add
3
4)

which is not what I expected.
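For comparison, splitting on whitespace cuts straight through the parenthesized groups, because clojure.string/split has no notion of nesting depth. A quick illustration using the string from the question:

```clojure
(require 'clojure.string)

;; Splitting on runs of whitespace ignores parenthesis depth entirely,
;; so each "(" and ")" ends up glued to a neighbouring token
(clojure.string/split "add (multiply (add 1 2) (add 3 4)) (add 5 6)" #"\s+")
;; => ["add" "(multiply" "(add" "1" "2)" "(add" "3" "4))" "(add" "5" "6)"]
```

Getting the grouping right requires tracking how deeply nested each character is, which is what the answers do either by hand or by delegating to the Clojure reader.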


There are 3 answers

Rulle (best answer)

You can either use the built-in LispReader:

(import '[clojure.lang LispReader LineNumberingPushbackReader])
(import '[java.io PushbackReader StringReader])

(defn could-read? [pr]
  (try
    (LispReader/read pr nil)
    true
    (catch RuntimeException e false)))

(defn paren-split2 [s]
  (let [sr (StringReader. s)
        pr (LineNumberingPushbackReader. sr)
        inds (loop [result [0]]
               (if (could-read? pr)
                 (recur (conj result (.getColumnNumber pr)))
                 result))
        len (count s)
        bounds (partition 2 1 inds)]
    (for [[l u] bounds
          :let [result (clojure.string/trim (subs s l (min len u)))] :when (seq result)]
      result)))

(paren-split2 "add (    multiply (   add      1 2) (add 3 4))   (add 5   6  )")
;; => ("add" "(    multiply (   add      1 2) (add 3 4))" "(add 5   6  )")

Or you can hand-code a parser:

(def conj-non-empty ((remove empty?) conj))

(defn acc-paren-split [{:keys [dst depth current] :as state} c]
  (case c
    \( (-> state
           (update :depth inc)
           (update :current str c))
    \) (if (= 1 depth)
         {:depth 0 :dst (conj-non-empty dst (str current c)) :current ""}
         (-> state
             (update :depth dec)
             (update :current str c)))
    \space (if (zero? depth)
             {:depth 0 :dst (conj-non-empty dst current) :current ""}
             (update state :current str c))
    (update state :current str c)))

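(defn paren-split [s]
  (let [{:keys [dst current]} (reduce acc-paren-split
                                      {:dst []
                                       :depth 0
                                       :current ""}
                                      s)]
    ;; flush a trailing top-level token (e.g. "y" in "(f x) y"), which
    ;; acc-paren-split never sees a terminating space or ")" for
    (conj-non-empty dst current)))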

(paren-split "add (    multiply (   add      1 2) (add 3 4))   (add 5   6  )")
;; => ["add" "(    multiply (   add      1 2) (add 3 4))" "(add 5   6  )"]

Note: both approaches preserve the original spacing inside each extracted string.
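If preserving the original spacing is not required, the try/catch around LispReader can be avoided by reading with clojure.edn/read and its :eof option, which returns a sentinel value at end of input instead of throwing. A minimal sketch (read-forms is a hypothetical name, not part of any library); each form is re-printed with pr-str, so whitespace inside a form is normalized:

```clojure
(require 'clojure.edn)
(import '[java.io PushbackReader StringReader])

(defn read-forms
  "Hypothetical helper: reads every top-level form in s with
  clojure.edn/read and returns their printed representations.
  Whitespace inside each form is normalized, not preserved."
  [s]
  (with-open [rdr (PushbackReader. (StringReader. s))]
    (loop [acc []]
      ;; :eof makes edn/read return ::eof at end of input instead of throwing
      (let [form (clojure.edn/read {:eof ::eof} rdr)]
        (if (= ::eof form)
          acc
          (recur (conj acc (pr-str form))))))))

(read-forms "add (    multiply (   add      1 2) (add 3 4))   (add 5   6  )")
;; => ["add" "(multiply (add 1 2) (add 3 4))" "(add 5 6)"]
```

Using the EDN reader also means the input is never evaluated, which matters if the string comes from an untrusted source.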

leetwinski

Since your data is already in well-formed Polish (prefix) notation, you can simply read it as EDN and operate on Clojure's data structures:

(require 'clojure.edn)

(def s "add (multiply (add 1 2) (add 3 4)) (add 5 6)")

(map str (clojure.edn/read-string (str "(" s ")")))

;;=> ("add" "(multiply (add 1 2) (add 3 4))" "(add 5 6)")

I'm still not sure what your end goal is, but this does what you asked. Note that each form is re-printed, so extra whitespace inside a form is not preserved.
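One detail of the map str step is worth knowing if the input can contain string literals: str returns the contents of a top-level string without its quotes, while pr-str prints every form readably. A small illustration (the input s2 is a made-up example, not from the question):

```clojure
(require 'clojure.edn)

(def s2 "say \"hello world\" (add 5 6)")

;; str drops the quotes around a top-level string literal...
(map str (clojure.edn/read-string (str "(" s2 ")")))
;; => ("say" "hello world" "(add 5 6)")

;; ...while pr-str keeps them, so each piece can be read back as-is
(map pr-str (clojure.edn/read-string (str "(" s2 ")")))
;; => ("say" "\"hello world\"" "(add 5 6)")
```

For the arithmetic expressions in the question the two give the same result.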

Gwang-Jin Kim

You could use read-string from clojure.core to take advantage of Clojure's built-in reader. Each iteration reads one form, turns it back into a string with str, removes that many characters from the front of the input, and trims the remainder with clojure.string/trim. The cycle repeats until only an empty string is left, and then the collected results are returned.

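(require 'clojure.string)

(defn pre-parse [s]
  (loop [s (clojure.string/trim s)   ; trim first so offsets start at the first form
         acc []]
    (if (empty? s)
      acc
      (let [chunk (read-string s)    ; read the first form
            s_ (str chunk)           ; print it back to text
            rest-s (clojure.string/trim (subs s (count s_)))] ; drop it from the input
        (recur rest-s (conj acc s_))))))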

recur jumps back to the enclosing loop and rebinds the loop's bindings to the given arguments, in the same order as they are declared in loop. We can test it with:

(def x "add (multiply (add 1 2) (add 3 4)) (add 5 6)")
(pre-parse x)
;; => ["add" "(multiply (add 1 2) (add 3 4))" "(add 5 6)"]
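Because pre-parse removes as many characters as the re-printed form has, it assumes the printed text is exactly as long as the source. With extra spaces, e.g. "(add  1 2) foo", the offset drifts and the next read-string sees ") foo" and fails. On Clojure 1.10 or later, clojure.core/read+string returns both the form and the (trimmed) source text it consumed, which removes that assumption. A sketch under that version assumption (pre-parse-exact is a hypothetical name):

```clojure
(import '[clojure.lang LineNumberingPushbackReader])
(import '[java.io StringReader])

(defn pre-parse-exact
  "Hypothetical helper: returns the source text of each top-level form
  in s, keeping the original spacing inside each form.
  Assumes Clojure 1.10+ for read+string."
  [s]
  ;; read+string requires a LineNumberingPushbackReader
  (let [rdr (LineNumberingPushbackReader. (StringReader. s))]
    (loop [acc []]
      ;; returns [form text]; with eof-error? false it yields ::eof at the end
      (let [[form text] (read+string rdr false ::eof)]
        (if (= ::eof form)
          acc
          (recur (conj acc text)))))))

(pre-parse-exact "add (    multiply (   add      1 2) (add 3 4))   (add 5   6  )")
;; => ["add" "(    multiply (   add      1 2) (add 3 4))" "(add 5   6  )"]
```

Like read-string, read+string uses the full Clojure reader rather than the EDN reader, so it should only be used on trusted input.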