I am using the open-source Apache ManifoldCF project to index documents from Google Drive into Solr. I have often found the indexing to be inconsistent, and even a small number of documents takes a long time to show up in Solr. Is ManifoldCF really a good option for indexing Google Drive?
Is ManifoldCF a good option for Google Drive indexing?
Asked by Saurabh Chaturvedi
There are 2 answers
Shashank Raj
ManifoldCF is good for crawling file systems. If you are interested in web crawling, you can use Apache Nutch instead.
Yes, ManifoldCF does take a long time to reflect even a small number of documents. It also has very little documentation. However, you can join the mailing list, where you can put questions to the lead developer, Karl. He is very helpful and usually answers within a few hours.
P.S.: I worked with ManifoldCF on a project for 10 months.
It is currently a bit on the slow side, due to response times and throttling constraints from Google Drive itself. This limit can probably be relieved if you buy additional bandwidth from Google. With the current setup, if you are looking to index a large set of documents in Google Drive, it may not be as quick as you expect.
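To get a rough feel for how much throttling alone can cost, here is a minimal back-of-the-envelope sketch. The function name, the number of requests per document, and the quota figure are all hypothetical placeholders, not real Google Drive or ManifoldCF values; plug in whatever limits apply to your own account.

```python
def min_crawl_seconds(num_documents: int,
                      requests_per_document: int,
                      quota_requests_per_second: float) -> float:
    """Lower bound on crawl time when the source API caps the request rate.

    All inputs are assumptions supplied by the caller; this ignores
    network latency, retries, and indexing time on the Solr side.
    """
    if quota_requests_per_second <= 0:
        raise ValueError("quota_requests_per_second must be positive")
    total_requests = num_documents * requests_per_document
    return total_requests / quota_requests_per_second


# Hypothetical numbers, for illustration only.
seconds = min_crawl_seconds(num_documents=100_000,
                            requests_per_document=2,
                            quota_requests_per_second=10.0)
print(f"at least {seconds / 3600:.1f} hours")
```

Because this is only a lower bound, a real crawl will take longer, but it shows why a higher quota is the main lever if throttling is the bottleneck.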