
bashtage / ng-numpy-randomstate

43 stars · 43 watchers · 14 forks · 18.18 MB

NumPy-compatible random number generator that supports multiple core pseudo-RNGs and explicitly parallel generation.

License: Other

Languages: C 45.60%, Python 53.44%, C++ 0.88%, Shell 0.08%

ng-numpy-randomstate's People

Contributors

bashtage, dapid, umireon


ng-numpy-randomstate's Issues

Use memory alignment so that SSE2 can be used with dSFMT

dSFMT fails when using generic memory allocation since there is no guarantee that memory allocated by Python is aligned on a 16-byte boundary. This means it will probably be necessary to allocate memory for the state directly so that it can be over-provisioned to guarantee alignment.

This will require a non-trivial change and will probably necessitate adding __dealloc__ to the RandomState class as well as PRNG-specific state initializers/destructors, e.g. _alloc_state(rng_state* rng) and _dealloc_state(rng_state* rng). This path might also require adding a variable or two to contain the original pointer so that it can be freed.
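
For illustration, here is the over-provisioning idea sketched in Python/NumPy terms (the real fix would live in the C state allocator, but the technique is the same: allocate extra bytes, then start at the next aligned address):

import numpy as np

def aligned_buffer(nbytes, alignment=16):
    # Over-provision by `alignment` extra bytes, then start the view at
    # the next aligned address; the slice keeps the base allocation alive.
    raw = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-raw.ctypes.data) % alignment
    buf = raw[offset:offset + nbytes]
    assert buf.ctypes.data % alignment == 0
    return buf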

PCG64

https://github.com/rkern/pcg64

Here is my promised C implementation of PCG64 that incorporates the necessary emulation of uint128_t arithmetic when it is not supported by the C compiler. I'm not entirely sure how useful this will be for you, but I think it's my preferred candidate for the best all-rounder default PRNG.

It "only" supports the settable-sequence variant, but I think that's the one we're actually interested in and dropping the others eliminates a lot of cruft.

1.13 fails to build

Fedora Linux 25 (gcc 6.3.1)

./randomstate/pcg64.c:3033:14: error: request for member ‘high’ in something not a structure or union
   __pyx_v_out.high = __pyx_t_2;
              ^

...
./randomstate/pcg64.c:3046:14: error: request for member ‘low’ in something not a structure or union
   __pyx_v_out.low = __pyx_t_2;

lognormal doc enhancement?

When specifying a lognormal distribution, you may want to parameterize it in terms of the mean and stddev of the output distribution (not of the underlying normal distribution). The following code finds the mean and stddev parameters to supply to lognormal to get the desired result:

from math import exp, log

def sigma_x2(mu_y, sigma_y):
    # Variance of the underlying normal: log(1 + (sigma_y / mu_y)**2).
    return log(1 + exp(2 * (log(sigma_y) - log(mu_y))))

def mu_x(mu_y, sigma_y):
    # Mean of the underlying normal given the target output mean/stddev.
    return log(mu_y) - sigma_x2(mu_y, sigma_y) / 2

The desired output mean and stddev are mu_y and sigma_y; the resulting mu_x and sigma_x2 are the mean and variance of the underlying normal to supply to lognormal (pass sqrt(sigma_x2) as sigma).
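
A quick sanity check of the transformation (an illustrative snippet using numpy's lognormal; any RandomState's lognormal behaves the same way):

import numpy as np

m, s = 10.0, 3.0
draws = np.random.lognormal(mean=mu_x(m, s), sigma=sigma_x2(m, s) ** 0.5, size=1000000)
print(draws.mean(), draws.std())  # both should land close to 10.0 and 3.0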

optimized uniform small integers?

I've been using my own wrapper to generate uniform random integers of small, specific bit widths. The idea is to cache the output of a block of random_uintegers and then demux it into a stream of small-width values. For example, you might have a stream of 1-bit (0/1) values. The output array can be any integer dtype.
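
Here is a minimal sketch of the demuxing step (my own illustration, assuming the width divides 64; the literal array stands in for cached random_uintegers output):

import numpy as np

def small_uints(raw64, width):
    # Demux an array of uint64 words into values `width` bits wide.
    # width=1 yields a 0/1 stream; assumes width divides 64.
    per_word = 64 // width
    shifts = np.arange(per_word, dtype=np.uint64) * np.uint64(width)
    mask = np.uint64((1 << width) - 1)
    return ((raw64[:, None] >> shifts) & mask).ravel()

words = np.array([0x0123456789ABCDEF], dtype=np.uint64)  # stand-in for random_uintegers output
bits = small_uints(words, 1)  # 64 zero/one values per word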

I just did some benchmarking comparing my approach for 1-bit values against random_int, and I see quite a large speedup.

Maybe something similar could be useful for randomstate?

Can you provide a function to generate random numbers from multiple streams

For my research, I need multiple streams (more than 1000 of them) to form a single input dataset. I also want the generated streams to be reusable in the future. However, if I generate the multiple streams as follows:
import numpy as np
import randomstate.prng.xorshift1024 as rnds

def func(N, m):
    rs = [rnds.RandomState(0) for _ in range(N)]
    a0 = np.zeros([N, m])
    for i in range(N):
        rs[i].jump(i)
        a0[i] = rs[i].normal(loc=0.0, scale=1.0, size=m)
    return a0
the total time of the above code for (N, m) = (8192, 256) is 160s. That is too slow. For (1024, 1024) it is about 3s, which is acceptable.
However, if mt19937 is used, the speed is unacceptable.

Can you provide a function to generate random numbers from multiple streams? It should work as follows:
import numpy as np
import randomstate.prng.mt19937 as rnds

# four streams, all seeded with 0
sd0 = np.zeros(4)
# ms_seed to generate four streams
rs = rnds.ms_seed(sd0)

# generate 4x100 random numbers from the four streams:
# a0[0] is from the first stream, a0[1] is from the second stream, and so on
a0 = rs.normal_ms(size=(4, 100))

It also seems that this functionality would map well onto CUDA.

Lift "No Compatibility Guarantee"

I just discovered your project, which is great, but then I saw the "No Compatibility Guarantee" in the docstrings of most PRNGs except MT19937, which is a deal breaker for me. The reason I want to use your project is to be able to have many independent streams without too much overhead.

We do mostly economic forecasts and need to evaluate several variants of the same model using the same seed. Currently I use a single MT stream and this is not good, because a change of modelling in one function disrupts the whole sequence, so results are often not comparable at all between variants (it's not just the component that changed that gets different random numbers).

Using different streams for the different parts would nicely solve my problem, but since the whole point is to make results more reproducible (so they can be compared), the lack of a compatibility guarantee between versions is a no-go for me. I understand you do not want to commit to it yet, but do you have any timeline for adding the guarantee? Wouldn't it be possible to change the default generator but keep old versions around, using an argument to specify which version you want?

Is there a generic randomstate class I can inherit from?

In this example, I'm using a specific RNG:

import randomstate.prng.xorshift1024

class shared_random_state(randomstate.prng.xorshift1024.RandomState):
    def __init__(self, rs):
        randomstate.prng.xorshift1024.RandomState.__init__(self, rs)

    def __deepcopy__(self, memo):
        return self

Is there a more generic way I can write this?
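
One generic alternative I can think of (a sketch of my own, not a library feature) is delegation instead of subclassing a specific PRNG's class:

class shared_random_state:
    # Wrap any RandomState instance and forward attribute access to it,
    # rather than inheriting from one specific PRNG's RandomState class.
    def __init__(self, rs):
        self._rs = rs

    def __getattr__(self, name):
        return getattr(self._rs, name)

    def __deepcopy__(self, memo):
        return self

import randomstate.prng.xorshift1024 as xs
srs = shared_random_state(xs.RandomState(12345))
srs.normal(size=3)  # delegated to the wrapped RandomState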

Is there a better forum (gitter?) where I can ask such questions?

I'd like to add a Cython API

Hi,

I need to repeatedly create a vector of around 10^6 random doubles as part of some neural network code I'm writing in Cython. I want to release the GIL around this function, so I need:

  1. A fast PRNG;
  2. With a permissive license;
  3. With a public C-level API.

It looks like you've got 1 and 2, so I'd like to see whether I can add 3 :). For my purposes, the ideal API would be something like this:

cdef void n_doubles_from_normal(double* result, int n, int seed) nogil:
   ...

I understand that the goal of this repository is to get integrated into numpy. But, would you accept a pull request with a Cython .pxd file, the nogil functions, and the appropriate changes to the setup.py?

help on RandomState functions shows numpy examples

I just did the following:

from randomstate.prng.xorshift128 import RandomState
r = RandomState()
help(r.normal)

and it finishes with

Examples
--------
Draw samples from the distribution:

>>> mu, sigma = 0, 0.1 # mean and standard deviation
>>> s = np.random.normal(mu, sigma, 1000)

Is this an inconsistency? Shouldn't it be something along the lines of:

from randomstate.prng.xorshift128 import RandomState
mu, sigma = 0, 0.1 # mean and standard deviation
prng = RandomState(seed)
s = prng.normal(mu, sigma, 1000)

Segfault when seeding some PRNGs with float64

When (wrongly) using floats as a seed value, most PRNGs will segfault (mrg32k3a, xorshift128, xoroshiro128plus, xorshift1024 and mlfg_1279_861), some (dsfmt and mt19937) will fail with a TypeError (Cannot cast array from dtype('float64') to dtype('int64') according to the rule 'safe'), and some (pcg32, pcg64) will actually work as if int(seed) had been given.
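
A minimal reproduction (illustrative; any of the affected PRNGs shows the same crash):

import randomstate.prng.xorshift128 as rnds

rnds.RandomState(1.5)  # crashes the interpreter instead of raising TypeError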

Maybe this behavior should be more uniform?
At the very least, the segfaults (which kill the interpreter) should never happen.

A bug in MultithreadedRNG

I think there is a bug in the following initialization step of MultithreadedRNG shown in the documentation:

self._random_states = [rs]
for _ in range(1, threads):
    _rs = randomstate.prng.xorshift1024.RandomState()
    rs.jump()
    _rs.set_state(rs.get_state())
    self._random_states.append(_rs)

Namely, the first two random states start from the same state. You can check it by:

>>> mrng = MultithreadedRNG(0, seed=1, threads=2)
>>> rs0, rs1 = mrng._random_states
>>> (rs0.get_state()['state'][0] == rs1.get_state()['state'][0]).all()
True

This can be fixed, for example, by calling _rs.jump() instead of rs.jump() like this:

self._random_states = [rs]
for _ in range(1, threads):
    _rs = randomstate.prng.xorshift1024.RandomState()
    _rs.set_state(rs.get_state())
    _rs.jump()  # <--- fixed
    self._random_states.append(_rs)
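
With the fix applied, the same check should now report distinct states:

>>> mrng = MultithreadedRNG(0, seed=1, threads=2)
>>> rs0, rs1 = mrng._random_states
>>> (rs0.get_state()['state'][0] == rs1.get_state()['state'][0]).all()
False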

Here is the whole code in a Jupyter notebook:
https://gist.github.com/tkf/20d298879ff9d4d52212b3350e2b7262

Thanks for the great package, BTW!

Jump is too slow for larger jump steps!

The time consumed by the jump(i) function seems to depend on the step number i.

Here is some sample code:

import timeit
import randomstate.prng.xoroshiro128plus as rnds

def func(N, offset):
    rs1 = [rnds.RandomState(0) for _ in range(N)]
    for i in range(N):
        rs1[i].jump(i + offset)

t0 = timeit.timeit('func(16, 0)', setup="from __main__ import func", number=1)
t1 = timeit.timeit('func(16, 4000000)', setup="from __main__ import func", number=1)

t0 is about 0.006s, but t1 is 1.06s, roughly 175 times slower than t0. Why?
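
If jump(i) applies the fixed jump transformation i times, the cost is linear in i, which would explain this scaling. Assuming that is the case, a workaround is to chain each new state off the previous one, so every stream needs only a single jump:

import randomstate.prng.xoroshiro128plus as rnds

def chained_streams(N):
    # Each new state is the previous state jumped once: O(N) jumps in
    # total, instead of O(N**2) when calling jump(i) on fresh states.
    rs = [rnds.RandomState(0)]
    for _ in range(N - 1):
        nxt = rnds.RandomState(0)
        nxt.set_state(rs[-1].get_state())
        nxt.jump()
        rs.append(nxt)
    return rs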

`advance` and `randint`

When using advance in combination with uniform, I understand the behavior. Running

import randomstate as rs
g = rs.prng.pcg64.RandomState(0,0)
x = g.uniform(size=10)
g = rs.prng.pcg64.RandomState(0,0)
g.advance(5)
y = g.uniform(size=10)

creates random numbers in x and y such that x[5:] == y[:-5] is True. When doing the same with randint,

import randomstate as rs
g = rs.prng.pcg64.RandomState(0,0)
x = g.randint(256, size=10)
g = rs.prng.pcg64.RandomState(0,0)
g.advance(5)
y = g.randint(256, size=10)

I expected the same behavior, but this is not the case.

What is the method used to generate random numbers with randint? I expected one uniform random number to be consumed per integer, but that does not seem to be the case.

It would be nice to have more documentation on this, or a better interface for the interaction between advance and the functions that create random values. I'm aware that this may only be possible for functions that consume a fixed number of "basic random values" per value generated.

Thanks,
Mathias

is this normal?

Built from the latest git master (on Linux):

import randomstate

In [3]: import randomstate.prng

In [4]: import randomstate.prng.dsfmt

In [10]: import randomstate.prng.sfmt
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-10-0fd7f600ff12> in <module>()
----> 1 import randomstate.prng.sfmt

ModuleNotFoundError: No module named 'randomstate.prng.sfmt'

did random_uintegers go away?

I'm trying to reuse some of my older code, and now I'm getting:

AttributeError: 'randomstate.xorshift1024.RandomState' object has no attribute 'random_uintegers'

Did something change in the API?

Release 1.11 Tasks

Tasks

  • Sync with Numpy 1.11
  • Add per RNG jump/advance docs
  • Add next_stream or stream method where appropriate
  • Rename shim to interface
  • Refactor entropy initialization so that the seed is visible for pickling (required for next_stream feature)
  • Add ability to seed by array for all RNGs (except PCG)
  • Add implementation details to class docstrings
  • Add multiple stream support for MLFG
  • Add multiple stream support for MRG32K3A
  • Add jump support to dSFMT
  • Refactor Cython DEFs to use a defaults.pxi so that only differences need to be defined
  • Refactor all DEFs and #defines to use the pattern RS_

Python 3.6 wheels

Could you please release wheels for Python 3.6 on PyPI? Installing from source takes a really long time. Thanks!

Cannot be used on Debian 9

My Python is 3.5 and NumPy is 1.12, both from the Debian repositories on Debian 9. When I use randomstate, the following error is raised:

>>> import randomstate.prng.mt19937 as rnd
Traceback (most recent call last):
  File "__init__.pxd", line 1011, in numpy.import_array
RuntimeError: module compiled against API version 0xb but this version of numpy is 0xa

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/randomstate/__init__.py", line 3, in <module>
    from randomstate.prng.mt19937 import *
  File "/usr/local/lib/python3.5/dist-packages/randomstate/prng/__init__.py", line 1, in <module>
    from .xorshift128 import xorshift128
  File "/usr/local/lib/python3.5/dist-packages/randomstate/prng/xorshift128/__init__.py", line 1, in <module>
    from .xorshift128 import *
  File "randomstate/xorshift128.pyx", line 28, in init randomstate.xorshift128
  File "__init__.pxd", line 1013, in numpy.import_array
ImportError: numpy.core.multiarray failed to import

Release 1.14 Tasks

Tasks

  • Sync with Numpy 1.14
  • Add per-PRNG advance documents
  • Add next_stream or stream method where appropriate
  • Add multiple stream support for MLFG
