How often should you reindex an Elasticsearch cluster?

I have an OpenSearch domain with 40 data nodes and, currently, a single index in the whole cluster. Ours is a delete-heavy workload: we are constantly deleting HTML documents and adding new ones. We currently have about 200,000,000 searchable documents and 160,000,000 deleted documents. Would reindexing be a good idea? Also, are there tools you can use to estimate how long it would take to reindex a domain?
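For reference, counts like these can be read per index from the _cat/indices API. A minimal sketch, assuming the Python requests library and a placeholder domain endpoint (authentication omitted):

```python
import requests

# Placeholder endpoint; substitute your own OpenSearch domain endpoint and auth.
ENDPOINT = "https://my-domain.us-east-1.es.amazonaws.com"

# docs.count is the number of searchable documents and docs.deleted the number
# of deleted-but-not-yet-merged-away documents, reported per index.
resp = requests.get(
    f"{ENDPOINT}/_cat/indices",
    params={"v": "true", "h": "index,docs.count,docs.deleted,store.size"},
    timeout=30,
)
print(resp.text)
```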
1 Answer
Reindexing is not the only option. If you can pause document ingestion for a few hours (or maybe days), you can run either of the following:
- split on the index. If you split by a factor of 6, you will no longer have segments larger than 5 GB, and Elasticsearch will merge segments and, at the same time, free the disk space held by deleted documents. This option requires a lot of free disk space, though, so read the documentation carefully (a sketch of the calls follows below the list).
- forcemerge on the index. I think you will have to specify a value for max_num_segments and/or only_expunge_deletes (sketch below the list). Warning: a force merge operation cannot be cancelled and can take hours on shards this big.

Ideally, for the future, you should try to avoid having a single big index, because such indices are harder to operate. It's usually possible to distribute documents across multiple indices using some partitioning criterion (the first letters of the HTML domain name, for example).
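A minimal sketch of the split option, assuming the Python requests library, a placeholder domain endpoint, and hypothetical index names. The source index must be made read-only first, and the target shard count is an absolute number that must be a multiple of the source's primary shard count:

```python
import requests

ENDPOINT = "https://my-domain.us-east-1.es.amazonaws.com"  # placeholder endpoint
SOURCE = "html-docs"          # hypothetical source index
TARGET = "html-docs-split"    # hypothetical target index

# 1. A split requires a read-only source, so block writes on it first.
requests.put(
    f"{ENDPOINT}/{SOURCE}/_settings",
    json={"settings": {"index.blocks.write": True}},
    timeout=60,
)

# 2. Request the split. The shard count is absolute, not a factor: a 6x split
#    of an index with 5 primary shards means 30 target shards.
resp = requests.post(
    f"{ENDPOINT}/{SOURCE}/_split/{TARGET}",
    json={"settings": {"index.number_of_shards": 30}},
    timeout=300,
)
print(resp.json())
```

The split call returns once the target index is created; shard recovery (and the subsequent merging that actually reclaims space from deleted documents) continues in the background, so monitor cluster health rather than the HTTP response.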
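And a sketch of the force merge option under the same assumptions. only_expunge_deletes rewrites only segments whose deleted-document ratio exceeds a configurable threshold, while max_num_segments merges each shard down to a fixed number of segments:

```python
import requests

ENDPOINT = "https://my-domain.us-east-1.es.amazonaws.com"  # placeholder endpoint
SOURCE = "html-docs"                                       # hypothetical index

# Less aggressive: only rewrite segments that carry many deleted documents.
requests.post(
    f"{ENDPOINT}/{SOURCE}/_forcemerge",
    params={"only_expunge_deletes": "true"},
    timeout=None,  # no client-side timeout; the merge runs on the cluster regardless
)

# Heavier alternative: merge each shard down to a single segment.
# requests.post(
#     f"{ENDPOINT}/{SOURCE}/_forcemerge",
#     params={"max_num_segments": 1},
#     timeout=None,
# )
```

On shards this large the HTTP connection may well drop before the merge finishes; the merge keeps running on the cluster either way, so track it via the index stats rather than the response.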