How to approximate the xth percentile for a large, unknown quantity of numbers

I recently came across this question on how to find the xth percentile of a given stream of numbers. I have a basic understanding of how this could be done if the stream were relatively small (it could be stored in memory and sorted, and the xth value found), but I was wondering how the percentile could be approximated if the stream of numbers is fairly large and the count of numbers is unknown.
Asked by Bruce
I think you could use reservoir sampling to select k elements uniformly at random from the stream S, and then approximate the xth percentile of S with the xth percentile of these k numbers. The choice of k depends on how much memory you have and how precise the approximation needs to be.

EDIT
Here is a code example to test the solution:
The result is:
I got a pretty good approximation for every x I used, and currently I don't see why it wouldn't be suitable for your case.
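A minimal sketch of the reservoir-sampling approach, assuming Python and the classic "Algorithm R" variant: each item after the first k replaces a random slot with probability k / (i + 1), so every item seen so far is equally likely to be in the sample. The names `reservoir_sample` and `percentile`, the nearest-rank percentile definition, and the synthetic uniform test stream are illustrative choices, not the answerer's original test code.

```python
import math
import random


def reservoir_sample(stream, k, rng=None):
    """Algorithm R: keep a uniform random sample of k items from an
    iterable whose length is not known in advance. Uses O(k) memory."""
    rng = rng or random.Random()
    reservoir = []
    for i, value in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(value)
        else:
            # Item i survives with probability k / (i + 1),
            # evicting a uniformly chosen existing entry.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = value
    return reservoir


def percentile(values, x):
    """Nearest-rank xth percentile (0 < x <= 100) of a non-empty list.
    (Assumed definition; other interpolation rules exist.)"""
    ordered = sorted(values)
    rank = math.ceil(x * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    n, k = 200_000, 5_000

    def uniform_stream():
        # Hypothetical test stream: uniform floats in [0, 1) generated
        # lazily, so the full stream is never held in memory.
        data_rng = random.Random(7)
        for _ in range(n):
            yield data_rng.random()

    sample = reservoir_sample(uniform_stream(), k, random.Random(42))
    for x in (10, 50, 90, 99):
        # For uniform [0, 1) data the true xth percentile is about x / 100.
        print(f"x={x}: approx={percentile(sample, x):.4f}, expected~{x / 100:.2f}")
```

The stream is read once and only k values are ever stored, so memory is independent of the stream length. The sampling error of a sample quantile shrinks roughly in proportion to 1/sqrt(k), and extreme percentiles (such as the 99.9th) need a larger k because few sampled values fall in the tail.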