checking string presence in list gives TypeError: argument of type 'float' is not iterable


I am running into a TypeError when checking for a string's presence in a list.

I checked similar issues, such as the one in this thread and this one, but I couldn't find a solution there.

I expect the code to calculate (one iteration of) the PageRank of all documents listed in "pagerank.txt" (contents shown below), but it runs into an error.

My full code:

def calc_pageranks(pagerank_file = "pagerank.txt", damping=0.9):
    pagerankScores = {}
    pagerankData = {} #dict of what files a file (the dict key) points to 
    with open(pagerank_file, "r") as f:
        for row in f:
            row = row.split()
            pagerankScores.update({row[0]:1}) #set starting pagerank for documents
            if len(row) == 1:
                pagerankData.update({row[0]:None})
            else:
                pagerankData.update({row[0]:row[1:]}) #pagerank graph
        for docName in pagerankData:
            pagerank = pagerankScores[docName]
            temp = 0
            for val in pagerankData.values():
                if val == None:
                    pass
                else:
                    if docName in val:
                        temp += pagerank / len(val)
                pagerank = (1 - damping) + damping * temp
                pagerankData.update({docName:pagerank})

    return pagerankScores, pagerankData

docName and val look like this: doc1.txt <class 'str'> ['doc2.txt', 'doc8.txt'] <class 'list'>

Full error message:

/Library/Frameworks/Python.framework/Versions/3.9/bin/python3 /Users/MacSuperior/Desktop/Coding/IR_system/search_engine.py
Traceback (most recent call last):
  File "/Users/MacSuperior/Desktop/Coding/IR_system/search_engine.py", line 99, in <module>
    print(calc_pageranks())
  File "/Users/MacSuperior/Desktop/Coding/IR_system/search_engine.py", line 92, in calc_pageranks
    if docName in val:
TypeError: argument of type 'float' is not iterable

Pagerank.txt:

doc1.txt doc2.txt doc8.txt
doc2.txt doc1.txt doc2.txt doc9.txt
doc3.txt doc4.txt
doc4.txt  doc1.txt doc10.txt
doc5.txt doc6.txt
doc6.txt
doc7.txt doc1.txt
doc8.txt doc9.txt doc10.txt
doc9.txt doc10.txt
doc10.txt doc9.txt

There is 1 answer

bn_ln

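The error comes from the last line of your inner loop: pagerankData.update({docName:pagerank}) overwrites a list in pagerankData with a float while you are still looping over pagerankData.values(), so a later "docName in val" check receives a float instead of a list. The result was presumably meant to go into pagerankScores. Below is a refactor with a couple of ideas that help eliminate typos like that.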
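For reference, here is a minimal sketch of how a float can end up on the right-hand side of "in". The function name first_membership_error and the two-document graph are made up for illustration; the loop only mimics the pattern of storing a score back into the dict that is being scanned.

```python
def first_membership_error(graph):
    """Repeat the store-back pattern on a copy of graph; return the TypeError text, or None."""
    graph = dict(graph)  # shallow copy so the caller's dict is left alone
    try:
        for name in graph:
            for val in graph.values():
                if name in val:  # fails as soon as val is a float
                    pass
            graph[name] = 0.1  # overwrites this document's link list with a score
    except TypeError as exc:
        return str(exc)
    return None


# Made-up two-document graph, only for illustration.
links = {"a.txt": ["b.txt"], "b.txt": ["a.txt"]}
message = first_membership_error(links)
print(message)
```

The first pass over the values works because every value is still a list; the failure only shows up on the next document, once a float has been written back. The refactored function that follows keeps the scores in a separate dict, so the link lists are never overwritten.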

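def calc_pageranks(pagerank_file="pagerank.txt", damping=0.9):
    pagerankScores = {}
    pagerankData = {}  # maps each document to the list of documents it links to

    with open(pagerank_file, "r") as f:
        for row in f:
            # first token is the document, the rest are its outgoing links
            key, *values = row.split()
            pagerankData[key] = values

    for docName in pagerankData:
        # every starting score is 1, so a linking page contributes 1/len(value);
        # "if value" skips documents with no outgoing links
        pagerank = (1 - damping) + damping * sum(
            (docName in value) / len(value) for value in pagerankData.values() if value
        )
        pagerankScores[docName] = pagerank  # scores go here, pagerankData stays untouched

    return pagerankScores, pagerankData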
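
As a quick sanity check, the same one-iteration update can be tried on an in-memory graph instead of the file. This is only a sketch under assumptions: one_iteration and the three-document graph are made-up names and data, and every document is assumed to start at a score of 1, as in the question.

```python
def one_iteration(graph, damping=0.9):
    """One PageRank-style update, assuming every starting score equals 1.

    graph maps a document name to the list of documents it links to.
    """
    scores = {}
    for doc in graph:
        # Each page that links to doc contributes 1 / (its number of outgoing links);
        # pages with no outgoing links are skipped to avoid dividing by zero.
        incoming = sum((doc in links) / len(links) for links in graph.values() if links)
        scores[doc] = (1 - damping) + damping * incoming
    return scores


# Made-up graph, only for illustration.
graph = {
    "doc1.txt": ["doc2.txt", "doc3.txt"],
    "doc2.txt": ["doc1.txt"],
    "doc3.txt": [],  # no outgoing links
}
scores = one_iteration(graph)
print(scores)
```

Here doc1.txt comes out at roughly 1.0 and the other two at roughly 0.55, and every value in graph is still a list afterwards, so the membership test never sees a float.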