numpy.random.choice with percentages not working in practice

Question

Asked by Layla on 20 September 2021 at 23:14

I'm running Python code similar to the following:

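import numpy

def get_user_group(user, groups):
    # Assign a group only the first time this user is seen.
    if not user.group_id:
        user.group_id = assign(groups)
    return user.group_id

def assign(groups):
    # Collect the group ids and their weights; the weights must sum to 1.
    ids = []
    percentages = []
    for group in groups:
        ids.append(group.id)
        percentages.append(group.percentage)  # e.g. .33

    # Draw one id using the global (legacy) NumPy random state.
    assignment = numpy.random.choice(ids, p=percentages)
    return assignment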

We are running this in the wild against tens of thousands of users. I've noticed that the assignments do not respect the configured group percentages. For example, if our percentages are [.9, .1], we consistently see an hour-over-hour split of 80% and 20%. We've confirmed that the inputs to the choice function are correct and that they do not match the observed behavior.
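A quick way to check whether NumPy itself is at fault is to draw many assignments offline with the same inputs and measure the split. The sketch below assumes hypothetical group ids 1 and 2 and uses a seeded `numpy.random.default_rng` Generator only so the check is repeatable; it is a diagnostic, not the production code.

```python
import numpy as np

# Standalone check, outside Flask: same weights as in the question.
rng = np.random.default_rng(0)  # seeded only so the check is repeatable

ids = [1, 2]              # hypothetical group ids
percentages = [0.9, 0.1]

# 100,000 independent weighted draws in one vectorized call.
draws = rng.choice(ids, size=100_000, p=percentages)
share_group_1 = np.mean(draws == 1)

print(f"group 1: {share_group_1:.3f}, group 2: {1 - share_group_1:.3f}")
```

If this prints a split close to 90/10, the skew seen in production more likely comes from how and where the function is called than from `choice` itself.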

Does anyone have a clue why this could be happening? Is it because we are using the global NumPy random state? Some groups are split [.9, .1] while others are [.33, .34, .33], etc. Is it possible that different sets of groups are interfering with each other?
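To test the "global NumPy" hypothesis directly, one can interleave draws for two group configurations on the shared legacy state and then check each configuration's split separately. This is a sketch with made-up group ids. With a correctly working generator, each configuration should still come out near its own weights, because each draw only consumes the next numbers from the shared stream and does not depend on which configuration made the previous draw.

```python
from collections import Counter

import numpy as np

np.random.seed(0)  # global legacy state, as used by the question's code

# Two hypothetical group configurations sharing the global generator.
config_a = ([1, 2], [0.9, 0.1])
config_b = ([3, 4, 5], [0.33, 0.34, 0.33])

counts_a = Counter()
counts_b = Counter()

# Alternate between the configurations, one draw at a time, to mimic
# requests for different experiments hitting the same process.
for _ in range(20_000):
    ids, p = config_a
    counts_a[int(np.random.choice(ids, p=p))] += 1
    ids, p = config_b
    counts_b[int(np.random.choice(ids, p=p))] += 1

share_a1 = counts_a[1] / sum(counts_a.values())
print(f"config A, group 1 share: {share_a1:.3f}")  # expected to be close to 0.9
```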

We are running this code in a Python Flask web application on a number of nodes.

Any recommendations on how to get reliable "random" weighted choice?
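A minimal sketch of one common pattern, assuming NumPy 1.17 or later: create a single `numpy.random.Generator` per process with `numpy.random.default_rng()`, reuse it for every draw, and normalise the weights first. `assign_weighted` is a hypothetical name. The Generator should be created inside each worker process (for example, not in a module that a preloading server imports in the parent before forking), otherwise the workers start from identical state. If NumPy's speed is not needed, the standard library's `random.choices(ids, weights=percentages)[0]` would serve equally well.

```python
import numpy as np

# One Generator per process, created once. With no seed, default_rng()
# pulls fresh entropy from the operating system, so processes that each
# create their own Generator get independent streams.
rng = np.random.default_rng()


def assign_weighted(ids, weights):
    """Pick one id with probability proportional to its weight.

    Hypothetical helper: weights are normalised here, so values such as
    [90, 10] or [0.33, 0.34, 0.33] both work.
    """
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or len(w) != len(ids) or (w < 0).any() or w.sum() == 0:
        raise ValueError("need one non-negative weight per id, not all zero")
    # Sample an index, then return the original id object unchanged.
    return ids[rng.choice(len(ids), p=w / w.sum())]
```

Usage: `assign_weighted(["control", "treatment"], [0.9, 0.1])` returns `"control"` about 90% of the time.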

**DanielTuzes** · Answer 1 · 2021-10-23T18:44:06+00:00

This was too long for a comment, so I am posting it as an answer.

The fact that your team could not reproduce the problem and got correct results instead suggests that NumPy itself most likely meets your needs. NumPy's main benefit is efficiency, which does not seem to be your concern at the moment, but you can take advantage of it later when it is.

A more complete picture of the code and of the infrastructure setup on your nodes would be helpful, though. How often do you restart your Flask server? Where do you initialize the NumPy random generator? Consider the following code, which creates a page /random whose output size can be customized, e.g. localhost:5000/random?size=20:

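from flask import Flask, request
import numpy
import pandas

...  # your webapp (this must define `app`, e.g. app = Flask(__name__))

# Seed the global generator ONCE, when the module is loaded at start-up.
numpy.random.seed(0)

@app.route('/random', methods=['GET'])
def random():
    """Gives the desired number of random numbers
    with the state of the random number generator.
    """
    # DON'T PUT numpy.random.seed(0) HERE: re-seeding on every request
    # would return the same numbers (and the same choices) each time.
    size = request.args.get('size')

    # Default to a single number when ?size= is not given.
    if size is not None:
        size = int(size)
    else:
        size = 1

    # Capture the generator state before drawing, for inspection.
    state = numpy.random.get_state()
    data = numpy.random.random(size=size)

    table = pandas.DataFrame(data=data)

    # Render the numbers as an HTML table followed by the raw state.
    return table.to_html() + repr(state)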

In this example, the generator is seeded only once, when the Flask app starts. Each request to /random continues from the current state, so every request produces fresh random numbers.

If you put the seeding inside the function, it would surely cause unexpected distributions, because every call would restart from the same state and return the same random numbers (and the same choices).
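The effect of seeding inside the function can be demonstrated in a few lines. In this sketch (hypothetical ids 1 and 2, weights [.9, .1], hypothetical function names), re-seeding on every call makes every call return the same group, while seeding once gives the expected split.

```python
import numpy as np

ids = [1, 2]              # hypothetical group ids
percentages = [0.9, 0.1]


def assign_reseeding(ids, p):
    # Anti-pattern: resetting the seed on every call restarts the stream.
    np.random.seed(0)
    return int(np.random.choice(ids, p=p))


def assign_seeded_once(ids, p):
    # Correct pattern: rely on a generator that was seeded once at start-up.
    return int(np.random.choice(ids, p=p))


bad = [assign_reseeding(ids, percentages) for _ in range(1_000)]
print(set(bad))  # only one group ever appears

np.random.seed(0)  # seed once, before serving any requests
good = [assign_seeded_once(ids, percentages) for _ in range(1_000)]
print(sum(g == 1 for g in good) / len(good))  # roughly 0.9
```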

If you use multiple nodes and seed them all with the same value, the nodes will produce the same sequence of choices. In this case, use a unique node ID as the seed. If you restart the servers often, concatenate the restart ID or a timestamp to the unique node ID. It is also a good idea to log the timestamp so the seed can be recovered later.
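The seeding advice above can be sketched as follows. `make_node_rng` is a hypothetical helper, not part of NumPy or Flask: it uses a CRC32 of the host name as a stand-in node ID, the start-up time as the restart stamp, and (an addition beyond the answer) the process ID, since several worker processes on one node may start within the same second. `numpy.random.default_rng` accepts a list of non-negative integers as a seed. Calling `default_rng()` with no argument would already pull fresh OS entropy; an explicit, logged seed is mainly useful for reproducing a run.

```python
import logging
import os
import socket
import time
import zlib

import numpy as np

logging.basicConfig(level=logging.INFO)


def make_node_rng(node_id=None, start_ts=None, pid=None):
    """Return (Generator, seed) for this node, restart and process.

    Hypothetical helper. Defaults:
      node_id  -> CRC32 of the host name (stand-in for a real node id)
      start_ts -> current Unix time in whole seconds (the restart stamp)
      pid      -> os.getpid(), so worker processes on one node differ
    """
    if node_id is None:
        node_id = zlib.crc32(socket.gethostname().encode())
    if start_ts is None:
        start_ts = int(time.time())
    if pid is None:
        pid = os.getpid()

    # NumPy combines a list of non-negative ints into one seed (SeedSequence).
    seed = [node_id, start_ts, pid]
    logging.info("RNG seed for this process: %s", seed)
    return np.random.default_rng(seed), seed


# Create once per worker process at start-up, then reuse for every request.
rng, seed = make_node_rng()
```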

TechQA.
