Spacy Span as_doc() Loses Components from Original Doc (Benepar)

Question

Asked by Dan on 22 September 2022 at 15:09

I have a string containing a few sentences, and I want the constituency parse for each of them. I run nlp over the full string to get a spaCy Doc, then loop over doc.sents and convert each Span to a Doc with span.as_doc(). However, when I convert the Spans to Docs, not all of the original data is preserved. Specifically, the benepar constituency parse is no longer there.

import spacy
import benepar

nlp = spacy.load("en_core_sci_md", disable=["ner", "lemmatizer", "textcat"])
nlp.add_pipe('benepar', config={'model': BENEPAR_DIR})
nlp_test1 = nlp('The quick brown fox jumps over the lazy dog')
print(list(nlp_test1.sents)[0]._.parse_string) # Uses benepar (works)

nlp_test2 = list(nlp_test1.sents)[0].as_doc()
print(list(nlp_test2.sents)[0]._.parse_string) # No constituency parse found (no benepar)

nlp_test3 = list(nlp_test1.sents)[0].as_doc(array_head=nlp_test1._get_array_attrs())
print(list(nlp_test3.sents)[0]._.parse_string) # Doesn't work either

How do I convert a Span into a Doc while keeping the benepar constituency parse data? Or is this not possible, and does benepar only parse the first of the doc.sents?

Original Q&A

There are 2 answers

**JulienBr** · Answer 1 · 2022-09-23T12:45:39+00:00

It seems that as_doc() does not re-run the pipeline components added with nlp.add_pipe(), so benepar never processes the new Doc.

Instead of

nlp_test2 = list(nlp_test1.sents)[0].as_doc()

Do

nlp_test2 = nlp(list(nlp_test1.sents)[0].text)

This way nlp_test2 is produced by the full pipeline, including benepar.
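To apply the same idea to every sentence rather than just the first, the sentence texts can be collected and passed back through the pipeline in one batch. The following is a minimal sketch, not a definitive implementation: `reparse_sentences` is a hypothetical helper name, and it assumes `nlp` is a pipeline that already has benepar added, as in the question. Note that re-parsing from text may tokenize or segment a sentence differently from the original Doc.

```python
def reparse_sentences(nlp, doc):
    """Run each sentence of `doc` back through the full pipeline.

    `nlp` is assumed to be a spaCy Language object with benepar already
    added via nlp.add_pipe(); `doc` is a Doc with sentence boundaries set.
    Returns one new Doc per original sentence.
    """
    # Collect the raw text of every sentence in the original Doc.
    texts = [sent.text for sent in doc.sents]
    # nlp.pipe processes the texts as a batch and yields a new Doc for each,
    # so every pipeline component (including benepar) runs again.
    return list(nlp.pipe(texts))
```

With the names from the question, `reparse_sentences(nlp, nlp_test1)` would return a list of Docs, and the parse for the first one could then be read with `list(docs[0].sents)[0]._.parse_string`.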

**aab** · Answer 2 · 2022-09-26T07:58:01+00:00

This isn't possible because benepar stores the parse information for the whole doc in one internal object, which Span.as_doc isn't able to split up for just that span.
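If the goal is only to read the parse for each sentence, converting to a Doc may not be needed at all: benepar exposes its results through the `Span._` extension namespace, so each Span in doc.sents can be queried directly on the original Doc. A minimal sketch follows, with loud assumptions: `sentence_parse_strings` and `demo` are hypothetical helper names, and `demo` uses the public `en_core_web_md` and `benepar_en3` models rather than the question's `en_core_sci_md` and local `BENEPAR_DIR`.

```python
def sentence_parse_strings(doc):
    """Return the benepar parse string for every sentence in `doc`.

    Reads Span._.parse_string on the sentences of the original Doc,
    so no Span.as_doc() conversion is involved.
    """
    return [sent._.parse_string for sent in doc.sents]


def demo():
    # Imports are kept local so the helper above has no hard dependency.
    import spacy
    import benepar  # noqa: F401  (importing registers the "benepar" factory)

    # Assumption: these public models are installed; swap in your own.
    nlp = spacy.load("en_core_web_md")
    nlp.add_pipe("benepar", config={"model": "benepar_en3"})

    doc = nlp("The quick brown fox jumps over the lazy dog. The dog sleeps.")
    for parse in sentence_parse_strings(doc):
        print(parse)
```

Calling `demo()` should print one bracketed parse per sentence, which also shows that benepar parses every sentence in the Doc, not only the first.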
