CRF++/Wapiti: including the category of an entire sentence as a feature

How can I represent the category of a sentence, as predicted by a Naive Bayes classifier, as a feature in CRF++ or Wapiti?

For instance, if the sentence "Tumblr merges with Yahoo." is classified as Business, where in the CRF training file should I indicate the label Business as a feature? And how should the template then be written?

Should the training file look like this?

Tumblr    business    ORG
merges    business    O
with      business    O
Yahoo     business    ORG

Or should the category be included only on tokens with the ORG label? If so, how? And what should the template file look like?

Answer by user2238884:

Method 1: You can add business as a feature in the same way you have shown, or you can simply write 1 instead of business (and 0 for tokens from sentences of other categories). Similarly, for the category sports you can add another column whose value is 1 for words belonging to a sports sentence. You'll also have to add a corresponding line for each of these columns in the template file:

# business column
U42:%x[0,1]
# sports column
U43:%x[0,2]
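A minimal Python sketch of Method 1, assuming the sentences are already tokenized, the NE labels are known (training data), and the sentence category comes from the Naive Bayes step. The function names `to_crfpp_rows` and `make_template`, the category list, and the starting feature ID 42 are hypothetical. The layout follows the CRF++ training format: column 0 is the word, columns 1..k are the 0/1 category flags, the last column is the answer tag, and a blank line ends each sentence.

```python
from typing import List, Sequence


def to_crfpp_rows(tokens: Sequence[str], labels: Sequence[str],
                  sentence_category: str, categories: Sequence[str]) -> List[str]:
    """One tab-separated line per token: word, one 0/1 column per category, NE label."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    # Every token in the sentence gets the same flags: 1 for the predicted category.
    flags = ["1" if cat == sentence_category else "0" for cat in categories]
    return ["\t".join([tok, *flags, lab]) for tok, lab in zip(tokens, labels)]


def make_template(categories: Sequence[str], first_id: int = 42) -> List[str]:
    """One unigram template line per category column (column 0 is the word)."""
    lines: List[str] = []
    for offset, cat in enumerate(categories):
        lines.append(f"# {cat} column")  # lines not starting with U/B are comments
        lines.append(f"U{first_id + offset}:%x[0,{offset + 1}]")
    return lines


if __name__ == "__main__":
    cats = ["business", "sports"]
    rows = to_crfpp_rows(["Tumblr", "merges", "with", "Yahoo"],
                         ["ORG", "O", "O", "ORG"], "business", cats)
    print("\n".join(rows))
    print()  # blank line marks the end of the sentence
    print("\n".join(make_template(cats)))
```

These template lines would sit alongside whatever word-level templates (e.g. `U00:%x[0,0]`) you already use. A combined feature such as `U50:%x[0,0]/%x[0,1]` is another option if you want the weight to depend on the word and the category together.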

Method 2: Including the category only together with the ORG label might not be a good idea, because the same organization can appear in sentences of different categories.

Answer by eldams:

As far as I know, adding a column to the training file, as you did, is the only way to include sentence-level annotation, unless you are willing to adapt or implement a CRF that takes sentence-level features into account.

If you have enough training data and a limited number of categories, this method would probably assign a low weight to the sentence-category features: they would only help to distinguish named entities when those are ambiguous and the computed NE category probabilities are close to each other.

The best way to find out is indeed to train with and without this feature and check whether it improves NER. It should be an interesting experiment :)
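One way to run that comparison is a small scorer applied to the `crf_test` output of each model. This is a sketch under assumptions: it expects the default `crf_test` output (no `-v`), where each input line is echoed with the predicted tag appended as the last column, so the gold tag is the second-to-last column; the function names are hypothetical, and the score is a simple token-level precision/recall/F1 over non-`O` tags rather than a full entity-level evaluation. The toy lines below only illustrate the format and are not real results.

```python
from typing import Iterable, List, Sequence, Tuple


def read_crf_test_output(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Collect (gold, predicted) tag pairs from crf_test output lines."""
    pairs: List[Tuple[str, str]] = []
    for line in lines:
        cols = line.split()  # handles tabs or spaces
        if not cols:  # blank line = sentence boundary
            continue
        pairs.append((cols[-2], cols[-1]))
    return pairs


def token_prf(pairs: Sequence[Tuple[str, str]], outside: str = "O") -> Tuple[float, float, float]:
    """Token-level precision, recall and F1 over tags other than `outside`."""
    tp = sum(1 for gold, pred in pairs if pred != outside and pred == gold)
    n_pred = sum(1 for _, pred in pairs if pred != outside)
    n_gold = sum(1 for gold, _ in pairs if gold != outside)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy crf_test output: word, gold tag, predicted tag.
    toy = ["Tumblr\tORG\tORG", "merges\tO\tO", "with\tO\tO", "Yahoo\tORG\tO", ""]
    print(token_prf(read_crf_test_output(toy)))
```

In practice you would read the two output files (model with and without the category columns) with `open(path)` and compare the two F1 values on the same held-out data.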