I am trying to write a program that checks whether the words in a given list appear in a text file. I was thinking of using the intersection of two sets to accomplish this. Is there another efficient way of achieving this?
Using Python for text analytics
3.5k views, asked by Ebelechukwu Nwafor
There are 2 answers
Utsav T
Hashing can also be used for quick lookups:
Read the file and split the text into words.
Store each new (previously unseen) word in a hash table.
Finally, check whether each word in your lookup list is present in the hash table.
Dictionaries in Python are implemented as hash tables, so a dictionary is a good fit here. This could serve as starter code:
lookup_list = ["word1", "word2", "word3"]

# Count how often each whitespace-separated word occurs in the file.
dictionary = {}
with open("myfile.txt", "r") as f:
    file_data = f.read().split()

for word in file_data:
    if word not in dictionary:
        dictionary[word] = 1
    else:
        dictionary[word] += 1

# Keep only the lookup words that occur in the file.
# The with block closes the file, so no explicit f.close() is needed.
result = [i for i in lookup_list if i in dictionary]
print(result)
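For comparison, the set-intersection idea mentioned in the question could be sketched roughly as follows. This is only a minimal sketch: the function name `find_words` and the sample text are made up for illustration, and it assumes words are separated by whitespace with no punctuation attached.

```python
def find_words(lookup_list, text):
    """Return the lookup words that appear in text, using set intersection."""
    # Assumption: splitting on whitespace is good enough to isolate words.
    text_words = set(text.split())
    return set(lookup_list) & text_words


# Hypothetical sample text standing in for the contents of the file.
sample_text = "the quick brown fox jumps over the lazy dog"
found = find_words(["fox", "cat", "dog"], sample_text)
print(sorted(found))  # ['dog', 'fox']
```

Both approaches rely on hashing, so each lookup takes roughly constant time on average. The dictionary version additionally keeps the counts, which the set version discards.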
Quick & Easy Method
textblob is a library for text analysis. Its documentation describes how to obtain word and noun phrase frequencies.
Higher Performance, Slower Method
If performance is a major concern, try cleaning the file into a list of words with regex and then getting the frequencies using collections.Counter.
Higher Performance Method for a Single Query
Or, for even higher performance with a single, non-repeated query, count only the word you need (if you are going to query lots of words, store the counts as a Counter object instead).
If you want even higher performance, then use a high-performance programming language!
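The original code for the regex and Counter methods did not survive on this page, so the following is only a rough sketch of what they might look like. The function names, the word pattern, and the sample text are assumptions made for illustration. In practice the text would come from reading the file.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words with a regex."""
    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(text):
    """Count every word once; useful when many words will be queried."""
    return Counter(tokenize(text))


def count_single_word(text, word):
    """Count one word without building a table for all the others."""
    return tokenize(text).count(word.lower())


# Hypothetical sample text standing in for the contents of the file.
sample_text = "The cat sat. The dog sat near the cat!"
freqs = word_frequencies(sample_text)
print(freqs["the"])   # 3
print(freqs["bird"])  # 0 (a Counter returns 0 for missing keys)
print(count_single_word(sample_text, "sat"))  # 2
```

Building the Counter costs one pass over the text, after which each query is a hash lookup. The single-query version skips building the table, which only pays off if the text is queried once.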