Get all possible part-of-speech tags for a word in Python


Is there any way to make this code work on a DataFrame column that contains a single word per row? I just need all the POS tags that a single word can have. Below is an example for "pack", which can be NN or VB.

from collections import defaultdict

from nltk.corpus import brown

# Map each word to the list of distinct POS tags it carries in the Brown corpus.
x = defaultdict(list)
for word, pos in brown.tagged_words():
    if pos not in x[word]:
        x[word].append(pos)
print(x["pack"])

Output: ['NN', 'VB']
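One way to reuse the lookup above for a one-word-per-row column is to build the index once and then look up each word. The following is only a minimal sketch: `build_tag_index`, `tags_for`, and the tiny `sample` list are hypothetical names and hand-made data standing in for `brown.tagged_words()`, and lowercasing the words is an assumption that may or may not suit your data.

```python
from collections import defaultdict


def build_tag_index(tagged_words):
    """Map each lowercased word to the sorted list of tags seen with it."""
    index = defaultdict(set)
    for word, pos in tagged_words:
        index[word.lower()].add(pos)
    return {word: sorted(tags) for word, tags in index.items()}


# Hand-made stand-in for brown.tagged_words(); not real corpus data.
sample = [
    ("The", "AT"),
    ("pack", "NN"),
    ("They", "PPSS"),
    ("pack", "VB"),
    ("Pack", "VB"),
]
tag_index = build_tag_index(sample)


def tags_for(word):
    """Return every tag recorded for the word, or an empty list if unseen."""
    return tag_index.get(word.lower(), [])


words = ["pack", "the", "unknown"]
all_tags = [tags_for(w) for w in words]
print(all_tags)  # [['NN', 'VB'], ['AT'], []]
```

With a real DataFrame, the same function could be mapped over the column, for example `df["tags"] = df["word"].map(tags_for)`, after building `tag_index` from `brown.tagged_words()`.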

2

There are 2 answers

0
morteza On

There is a similar question and another one that provides a solution. I haven't tested them, but here is what I found.

By default, linguistic toolkits use disambiguation techniques to assign only one tag to a word, based on the context in which the word is used (according to this page, "Section 4: Disambiguation by Word", and this question).

I have also read this NLTK POS-tagging page: you can build several POS taggers, each with a different n-gram size, so you can obtain several tags for a word depending on its neighbors in a sentence or document. There are examples of context filtering on the official NLTK website.
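To illustrate the idea, here is a rough sketch (not tested against a real corpus) that chains NLTK's `UnigramTagger` and `BigramTagger` and collects the tags a word receives in different contexts. The training sentences are hand-made with Brown-style tags and are purely hypothetical; in practice you would train on something like `brown.tagged_sents()`.

```python
from nltk.tag import BigramTagger, UnigramTagger

# Hand-made training sentences (hypothetical, Brown-style tags), not corpus data.
train_sents = [
    [("I", "PPSS"), ("pack", "VB"), ("the", "AT"), ("bag", "NN")],
    [("the", "AT"), ("pack", "NN"), ("is", "BEZ"), ("heavy", "JJ")],
    [("we", "PPSS"), ("pack", "VB"), ("boxes", "NNS")],
]

# The unigram tagger picks the most frequent tag per word; the bigram tagger
# also looks at the previous tag and falls back to the unigram tagger.
unigram = UnigramTagger(train_sents)
bigram = BigramTagger(train_sents, backoff=unigram)

# Tag the same word in different contexts and collect every tag it receives.
sentences = [["we", "pack", "boxes"], ["the", "pack", "is", "heavy"]]
tags_for_pack = set()
for tokens in sentences:
    for word, tag in bigram.tag(tokens):
        if word == "pack":
            tags_for_pack.add(tag)
print(sorted(tags_for_pack))
```

Note that this only finds tags for contexts that actually occur in the text you tag, so it is not a substitute for a dictionary-style lookup of every possible tag.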

Finally, you can invite top Stack Overflow users in the #part-of-speech tag to answer this question.

0
Esteban Echandi On

This is the answer that I was looking for.

from lemminflect import getAllLemmas

getAllLemmas('pack')
# {'NOUN': ('pack',), 'VERB': ('pack',)}

For the DataFrame, 'word' is the name of the column that holds the words.

df['tag'] = df['word'].apply(getAllLemmas)