I am trying to run the code for N-Gram Language Modelling with NLTK, taken from https://www.geeksforgeeks.org/n-gram-language-modelling-with-nltk/, but it throws an error.
# generate frequency of n-grams
freq_bi = FreqDist(bigram)
freq_tri = FreqDist(trigram)
d = defaultdict(Counter)
for a, b, c in freq_tri:
    if(a != None and b!= None and c!= None):
        d[a, b] += freq_tri[a, b, c]
The error I got is shown below:
`AttributeError Traceback (most recent call last)
<ipython-input-12-ae7c0728f2d6> in <module>
3 print(freq_tri[a,b,c])
4 if(a != None and b!= None and c!= None):
----> 5 d[a, b] += freq_tri[a, b, c]
AttributeError: 'int' object has no attribute 'items' `
The entire code is available at the site linked above.
The code on geeksforgeeks is kinda outdated and lacks a full working example =(
Let's walk through the code step by step instead of just handing over a copy+paste answer!
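First, the error itself. `d` is a `defaultdict(Counter)`, so `d[a, b]` is a `Counter`, and `Counter += int` fails because in-place addition expects another mapping and calls `.items()` on the right-hand side. Below is a minimal sketch of what the update was probably meant to do: index the inner counter by the third word before adding. A plain `Counter` stands in for `FreqDist` (which is a `Counter` subclass), and the toy trigrams are made up for illustration.

```python
from collections import Counter, defaultdict

# Toy trigram counts standing in for freq_tri (made-up data, not from Reuters).
freq_tri = Counter({
    ("the", "cat", "sat"): 2,
    ("the", "cat", "ran"): 1,
    (None, "the", "cat"): 3,  # a padded trigram, skipped by the check below
})

d = defaultdict(Counter)
for a, b, c in freq_tri:
    if a is not None and b is not None and c is not None:
        # d[a, b] is a Counter, so index it by the next word c before adding.
        d[a, b][c] += freq_tri[a, b, c]

print(d["the", "cat"].most_common())
```

With this shape, `d[a, b]` maps each possible next word to how often it followed the pair `(a, b)`.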
Download the data/model dependencies
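A minimal sketch of the download step, assuming the Reuters corpus and the Punkt tokenizer data are what is needed here. The helper name `download_dependencies` is hypothetical, and it is wrapped in a function so that nothing is fetched until you call it.

```python
def download_dependencies():
    """Fetch the NLTK resources used in this walkthrough (needs network access)."""
    import nltk  # imported lazily so defining the helper has no side effects

    nltk.download("reuters")  # the Reuters corpus
    nltk.download("punkt")    # sentence tokenizer data
    # Assumption: newer NLTK releases may ask for "punkt_tab" instead of "punkt".
```

Call `download_dependencies()` once; NLTK caches the files locally afterwards.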
Import the modules necessary to pre-process the data from Reuters
Read the Reuters corpus and collect the n-grams
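The collection step can be sketched in plain Python as below. This is an illustration under assumptions, not the original code: the toy `sentences` list stands in for `reuters.sents()`, and the hypothetical helper `padded_trigrams` imitates what `nltk.ngrams(sentence, 3, pad_left=True, pad_right=True)` yields (trigrams padded with `None`).

```python
from collections import Counter, defaultdict

def padded_trigrams(tokens):
    """Yield trigrams over tokens, padded with None on both sides."""
    padded = [None, None, *tokens, None, None]
    return zip(padded, padded[1:], padded[2:])

# Toy sentences standing in for reuters.sents() (made-up data).
sentences = [
    ["the", "price", "rose"],
    ["the", "price", "fell"],
    ["the", "price", "rose", "again"],
]

# model[(w1, w2)][w3] counts how often w3 follows the pair (w1, w2).
model = defaultdict(Counter)
for sentence in sentences:
    for w1, w2, w3 in padded_trigrams(sentence):
        model[w1, w2][w3] += 1

print(model["the", "price"])
```

Keeping the `None` padding means the model also learns which words start and end a sentence.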
The geeksforgeeks code hardcoded the n-grams, but there is a cool `everygrams` feature: https://stackoverflow.com/a/54177775/610569
Picking words from the salad
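Picking the next word comes down to sampling from the counts for the current two-word context. A minimal sketch, assuming counts shaped like a `Counter` of next words; the numbers are made up and `pick_next_word` is a hypothetical helper name.

```python
import random
from collections import Counter

# Toy next-word counts for the context ("the", "price") (made-up numbers).
next_word_counts = Counter({"rose": 2, "fell": 1})

def pick_next_word(counts, rng=random):
    """Sample one word with probability proportional to its count."""
    words = list(counts)
    weights = [counts[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(pick_next_word(next_word_counts, rng))
```

To generate a whole sentence, one would repeat this with the last two words as the new context until the padding symbol is drawn.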
What about some probabilities?
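One simple way to get probabilities is to divide each count by the total for its context, i.e. P(w3 | w1, w2) = count(w1, w2, w3) / count(w1, w2). The sketch below assumes this unsmoothed (maximum-likelihood) estimate; the counts are made up and `to_probabilities` is a hypothetical helper name.

```python
from collections import Counter

def to_probabilities(counts):
    """Turn raw next-word counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Toy next-word counts for the context ("the", "price") (made-up numbers).
next_word_counts = Counter({"rose": 2, "fell": 1})
probs = to_probabilities(next_word_counts)
print(probs)
```

Unseen words get no probability at all under this estimate, which is why real language models usually add some form of smoothing.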
See https://www.kaggle.com/code/alvations/n-gram-language-model-with-nltk