Confusion regarding the inner workings of NumPy's SeedSequence

Question

Asked by Mel on 07 March 2024 at 06:08 (53 views)

In case it matters at all, I'm using Python 3.11.5 64-bit on a Windows 11 Pro desktop computer with NumPy 1.26.4.

To better understand what NumPy does behind the scenes when I request a np.random.Generator object from a given SeedSequence, I tried to reconstruct in pure Python what happens when a SeedSequence is initialized from a given entropy value.

I based this on the source code for SeedSequence found here and on my understanding of how uint32 overflow works. On my machine at least, np.dtype(np.uint32).itemsize is 4, so XSHIFT, defined as np.dtype(np.uint32).itemsize * 8 // 2, is 16. With that, I wrote the following code:

seed = int(input('Please enter a seed: '))
Entropy = seed
Spawn_key = ()
Pool_size = 8
N_children_spawned = 0
Pool = [0 for _ in range(Pool_size)]
Assembled_entropy = []
Ent = Entropy + 0
while Ent > 0:
    Assembled_entropy.append(Ent & 0xffffffff)
    Ent >>= 32
if not Assembled_entropy:
    Assembled_entropy = [0]

hash_const = 0x43b0d7e5
for i in range(Pool_size):
    if i < len(Assembled_entropy):
        Assembled_entropy[i] ^= hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        Assembled_entropy[i] *= hash_const
        Assembled_entropy[i] &= 0xffffffff
        Assembled_entropy[i] ^= Assembled_entropy[i] >> 16
        Pool[i] = Assembled_entropy[i]
    else:
        value = hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        value *= hash_const
        value &= 0xffffffff
        value ^= value >> 16
        Pool[i] = value
for i_src in range(Pool_size):
    for i_dst in range(Pool_size):
        if i_src != i_dst:
            Pool[i_src] ^= hash_const
            hash_const *= 0x931e8875
            hash_const &= 0xffffffff
            Pool[i_src] *= hash_const
            Pool[i_src] &= 0xffffffff
            Pool[i_src] ^= Pool[i_src] >> 16
            x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
            y = (0x4973f715 * Pool[i_src]) & 0xffffffff
            Pool[i_dst] = x - y
            Pool[i_dst] &= 0xffffffff
            Pool[i_dst] ^= Pool[i_dst] >> 16
for i_src in range(Pool_size, len(Assembled_entropy)):
    for i_dst in range(Pool_size):
        Assembled_entropy[i_src] ^= hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        Assembled_entropy[i_src] *= hash_const
        Assembled_entropy[i_src] &= 0xffffffff
        Assembled_entropy[i_src] ^= Assembled_entropy[i_src] >> 16
        x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
        y = (0x4973f715 * Assembled_entropy[i_src]) & 0xffffffff
        Pool[i_dst] = x - y
        Pool[i_dst] &= 0xffffffff
        Pool[i_dst] ^= Pool[i_dst] >> 16
print(Pool)

I have copied the shell outputs of some test runs below.

Please enter a seed: 0
[595626433, 3558985979, 200295889, 3864401631, 3155212474, 198111058, 4047350828, 373757291]

Please enter a seed: 1
[2396653877, 491222160, 2441066534, 3196981647, 1764919720, 3210735412, 1132315803, 1197535761]

Please enter a seed: 123456789
[2161290507, 266876805, 2694113549, 3306969538, 3218948428, 3543586554, 886289367, 3129292100]

Please enter a seed: 123456789123456789
[2628723507, 610487362, 209721652, 1960674985, 3519121735, 1259052354, 2097159984, 3934338599]

Please enter a seed: 123456789123456789123456789123456789
[2988668238, 798946769, 2484899198, 1005350017, 2633831484, 343737596, 1402961265, 3184558744]

Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789
[431881030, 3789410928, 218849910, 879851040, 1423068736, 85390627, 3721593143, 198649564]

Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789
[702225118, 2293461530, 514808704, 2115883586, 3179647446, 3197133803, 3807436730, 1822195906]

from numpy.random import SeedSequence
seed = int(input('Please enter a seed: '))
seedseq = SeedSequence(entropy=seed, spawn_key=[], pool_size=8, n_children_spawned=0)
print([int(value) for value in seedseq.pool])

However, providing those same seeds to the short program directly above, which calls NumPy's SeedSequence itself, gives very different results:

Please enter a seed: 0
[2043904064, 467759482, 3940449851, 2747621207, 4006820188, 4161973813, 800317807, 2622167125]

Please enter a seed: 1
[476219752, 3923368624, 2653737542, 2876255837, 1861759290, 3300511046, 3253139541, 2224879358]

Please enter a seed: 123456789
[480462800, 1421661229, 2686834002, 3365909768, 3295673516, 1830753151, 1249963727, 3680881655]

Please enter a seed: 123456789123456789
[3112345096, 1618497203, 2864025213, 3262672577, 379697145, 163816190, 1265228116, 2568065655]

Please enter a seed: 123456789123456789123456789123456789
[2197723902, 2868273012, 1547285866, 2772382071, 2016971656, 1130152919, 897020445, 135618137]

Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789
[3230290517, 251217303, 1180998335, 454107561, 4150025399, 1840013050, 1216833737, 89665521]

Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789
[902839167, 3446715647, 2106916613, 1578536987, 595141342, 3126308643, 400300642, 3659109886]

What is going on here?

UPDATE: Based on @OskarHofmann's answer, I have fixed my code. It is included here in case anybody is interested.

seed = int(input('Please enter a seed: '))
Entropy = seed
Spawn_key = ()
Pool_size = 8
N_children_spawned = 0
Pool = [0 for _ in range(Pool_size)]
Assembled_entropy = []
Ent = Entropy + 0
while Ent > 0:
    Assembled_entropy.append(Ent & 0xffffffff)
    Ent >>= 32
if not Assembled_entropy:
    Assembled_entropy = [0]

hash_const = 0x43b0d7e5
for i in range(Pool_size):
    if i < len(Assembled_entropy):
        temp = Assembled_entropy[i] ^ hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        temp *= hash_const
        temp &= 0xffffffff
        temp ^= temp >> 16
        Pool[i] = temp
    else:
        value = hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        value *= hash_const
        value &= 0xffffffff
        value ^= value >> 16
        Pool[i] = value
for i_src in range(Pool_size):
    for i_dst in range(Pool_size):
        if i_src != i_dst:
            temp = Pool[i_src] ^ hash_const
            hash_const *= 0x931e8875
            hash_const &= 0xffffffff
            temp *= hash_const
            temp &= 0xffffffff
            temp ^= temp >> 16
            x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
            y = (0x4973f715 * temp) & 0xffffffff
            Pool[i_dst] = x - y
            Pool[i_dst] &= 0xffffffff
            Pool[i_dst] ^= Pool[i_dst] >> 16
for i_src in range(Pool_size, len(Assembled_entropy)):
    for i_dst in range(Pool_size):
        temp = Assembled_entropy[i_src] ^ hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        temp *= hash_const
        temp &= 0xffffffff
        temp ^= temp >> 16
        x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
        y = (0x4973f715 * temp) & 0xffffffff
        Pool[i_dst] = x - y
        Pool[i_dst] &= 0xffffffff
        Pool[i_dst] ^= Pool[i_dst] >> 16
print(Pool)
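The `&= 0xffffffff` lines in the code above are what stand in for uint32 overflow, since Python integers never wrap on their own. Here is a minimal sketch of that idea in isolation. The helper names `mul_u32`, `sub_u32` and `xorshift_u32` are made up for illustration and are not part of NumPy.

```python
# Hypothetical helpers (not NumPy API) that emulate uint32 arithmetic
# with Python's arbitrary-precision integers.
MASK32 = 0xFFFFFFFF      # keeps only the low 32 bits
XSHIFT = 4 * 8 // 2      # uint32 itemsize (4 bytes) * 8 bits // 2


def mul_u32(a, b):
    """Multiply and wrap around as a uint32 would."""
    return (a * b) & MASK32


def sub_u32(a, b):
    """Subtract and wrap around as a uint32 would (never negative)."""
    return (a - b) & MASK32


def xorshift_u32(v):
    """Fold the high half of a 32-bit word into its low half."""
    return v ^ (v >> XSHIFT)


print(XSHIFT)                          # 16
print(hex(mul_u32(0xFFFFFFFF, 2)))     # 0xfffffffe
print(hex(sub_u32(0, 1)))              # 0xffffffff
```

Masking once after a multiplication or subtraction is enough, because the low 32 bits of the exact result are the same as the wrapped uint32 result.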

Original Q&A

There is 1 answer

**Oskar Hofmann** · Accepted Answer · 2024-03-07T12:18:35+00:00

The difference is in your second for-loop, where you have inlined the hashmix() function. To calculate the value for y, you modify your Pool list in place at position i_src. The NumPy implementation does not: it passes Pool[i_src] as an argument to hashmix(), so only a copy of the value is modified (and discarded afterwards), while the stored pool entry stays unchanged.

After changing that for-loop to:

for i_src in range(Pool_size):
    for i_dst in range(Pool_size):
        if i_src != i_dst:
            # work with new variable instead of modifying Pool[i_src]
            temp = Pool[i_src] ^ hash_const
            hash_const *= 0x931e8875
            hash_const &= 0xffffffff
            temp *= hash_const
            temp &= 0xffffffff
            temp ^= temp >> 16
            x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
            y = (0x4973f715 * temp) & 0xffffffff
            Pool[i_dst] = x - y
            Pool[i_dst] &= 0xffffffff
            Pool[i_dst] ^= Pool[i_dst] >> 16

I get the same results as the NumPy implementation.
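One way to make this kind of in-place slip harder to reintroduce is to write the hashing steps as functions that take and return plain integers, so the caller's list is never touched. The following is only a sketch under that assumption, not NumPy's actual code. `hashmix` is named after the function mentioned above. `mix`, `HashConst`, `entropy_words` and `build_pool` are illustrative names chosen here. The constants are the ones used in the question's code. The same value-copy approach is applied to the third loop, which only runs for seeds longer than `pool_size` 32-bit words.

```python
MASK32 = 0xFFFFFFFF
XSHIFT = 16
INIT_A = 0x43B0D7E5        # initial hash constant from the question's code
MULT_A = 0x931E8875        # multiplier applied to the hash constant
MIX_MULT_L = 0xCA01F9DD
MIX_MULT_R = 0x4973F715


class HashConst:
    """Illustrative mutable holder for the running hash constant."""
    def __init__(self):
        self.value = INIT_A


def hashmix(value, hc):
    # `value` is a plain int, so the caller's list entry is not modified.
    value ^= hc.value
    hc.value = (hc.value * MULT_A) & MASK32
    value = (value * hc.value) & MASK32
    return value ^ (value >> XSHIFT)


def mix(x, y):
    result = (MIX_MULT_L * x - MIX_MULT_R * y) & MASK32
    return result ^ (result >> XSHIFT)


def entropy_words(seed):
    """Split a non-negative int into 32-bit words, least significant first."""
    words = []
    while seed > 0:
        words.append(seed & MASK32)
        seed >>= 32
    return words or [0]


def build_pool(seed, pool_size=8):
    entropy = entropy_words(seed)
    hc = HashConst()
    # Fill the pool; hashing 0 is the same as starting from the hash constant.
    pool = [hashmix(entropy[i] if i < len(entropy) else 0, hc)
            for i in range(pool_size)]
    # Mix every pool word into every other pool word.
    for i_src in range(pool_size):
        for i_dst in range(pool_size):
            if i_src != i_dst:
                pool[i_dst] = mix(pool[i_dst], hashmix(pool[i_src], hc))
    # Mix in any entropy words that did not fit into the pool.
    for i_src in range(pool_size, len(entropy)):
        for i_dst in range(pool_size):
            pool[i_dst] = mix(pool[i_dst], hashmix(entropy[i_src], hc))
    return pool


print(build_pool(123456789))
```

To check the sketch, its output can be compared with `[int(v) for v in SeedSequence(seed).pool]`, as in the question. That comparison is left out here so the block runs without NumPy installed.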
