Convert output of Berkeley Neural Parser to Chomsky Normal Form (binary branching tree)


I'm using the Berkeley Neural Parser but need the output to be a binary branching constituency tree (Chomsky Normal Form), meaning that every node has at most two daughter nodes. Here's a simple example:

import benepar  # importing benepar registers the 'benepar' pipe factory with spaCy
import spacy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe('benepar', config={'model': 'benepar_en3'})
doc = nlp('The hungry squirrel wants to eat some yummy nuts')

sent = list(doc.sents)[0]
tree = sent._.parse_string

print(tree)

Output:

(S (NP (DT The) (JJ hungry) (NN squirrel)) (VP (VBZ wants) (S (VP (TO to) (VP (VB eat) (NP (DT some) (JJ yummy) (NNS nuts)))))))

The subject NP 'the hungry squirrel' is parsed as consisting of three constituents, 'the,' 'hungry' and 'squirrel,' thereby violating binary branching.

Benepar doesn't seem to have a built-in mechanism to convert trees into CNF. NLTK has the nltk.tree.transforms module, but I don't see how to apply it directly to a benepar parse.

Has anyone had this issue before or has an idea how to deal with it? Let me know if there's any other info you'd need. Thanks!
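
One route that may work, as a minimal sketch rather than a confirmed benepar recipe: sent._.parse_string is a plain bracketed string, so it can be read into an nltk.tree.Tree with Tree.fromstring and then binarized in place with the tree's chomsky_normal_form method. The parse string below is copied from the output above so the sketch runs without spaCy or benepar. The names to_binary_tree and is_binary are hypothetical helpers introduced only for illustration.

```python
from nltk.tree import Tree

# Bracketed parse copied from the benepar output above. In the real pipeline
# this would be the value of sent._.parse_string (assumption: it is a plain
# Penn-Treebank-style bracketed string, as the printed output suggests).
parse_string = (
    "(S (NP (DT The) (JJ hungry) (NN squirrel)) "
    "(VP (VBZ wants) (S (VP (TO to) (VP (VB eat) "
    "(NP (DT some) (JJ yummy) (NNS nuts)))))))"
)


def to_binary_tree(bracketed):
    """Hypothetical helper: parse a bracketed string and binarize it."""
    tree = Tree.fromstring(bracketed)
    # chomsky_normal_form modifies the tree in place and returns None.
    # factor="right" makes the newly introduced nodes branch to the right.
    tree.chomsky_normal_form(factor="right")
    return tree


def is_binary(tree):
    """Hypothetical check: every node has at most two daughters."""
    return all(len(subtree) <= 2 for subtree in tree.subtrees())


binary_tree = to_binary_tree(parse_string)
print(binary_tree)
print(is_binary(binary_tree))
```

With the default settings, NLTK labels the nodes it introduces with the parent category plus the remaining children, so the subject NP should come out as something like (NP (DT The) (NP|<JJ-NN> (JJ hungry) (NN squirrel))). Unary chains such as S over VP are left as they are, which still satisfies "at most two daughters". If those need to go as well, Tree.collapse_unary is the NLTK method to look at.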
