libmir / mir-random
Advanced Random Number Generators
Home Page: http://mir-random.libmir.org/
Hello, I was attempting to use normalVar when I ran into issues with the code not compiling.
template mir.random.variable.NormalVariable!double.NormalVariable.opCall cannot deduce function from argument types !()(MersenneTwisterEngine!(uint, 32LU, 624LU, 397LU, 31LU, 2567483615u, 11LU, 4294967295u, 7LU, 2636928640u, 15LU, 4022730752u, 18LU, 1812433253u)*), candidates are:
../../../.dub/packages/mir-random-2.1.5/mir-random/source/mir/random/variable.d(912,7): mir.random.variable.NormalVariable!double.NormalVariable.opCall(G)(ref scope G gen) if (isSaturatedRandomEngine!G)
../../../.dub/packages/mir-random-2.1.5/mir-random/source/mir/random/variable.d(941,7): mir.random.variable.NormalVariable!double.NormalVariable.opCall(G)(scope G* gen) if (isSaturatedRandomEngine!G)
/usr/bin/ldc2 failed with exit code 1.
These issues came when I tried using the following code.
import std.random;
import std.stdio;
import mir.random.variable;

void main()
{
    Random gen = Random(unpredictableSeed);
    double x = normalVar!double()(&gen);
    writeln(x);
}
I referred to the documentation to try to get the code working, but I was unable to understand the examples using the undefined rne and threadLocalPtr. Furthermore, when I tried running the examples from the documentation, they seemed to fail. In short, I was wondering what the correct way to get a random double from the NormalVariable struct is.
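For what it's worth, here is a minimal sketch of what I believe is the intended usage; the assumption is that the engine must come from mir.random rather than std.random, since std.random's MersenneTwisterEngine does not satisfy isSaturatedRandomEngine (which is what the two opCall candidates in the error message require):

```d
import std.stdio;
import mir.random;          // mir's own Random engine and unpredictableSeed
import mir.random.variable; // NormalVariable

void main()
{
    // mir.random's Random satisfies isSaturatedRandomEngine,
    // so the variable's opCall can deduce its argument:
    auto gen = Random(unpredictableSeed);
    auto rv = NormalVariable!double(0, 1); // mean 0, stddev 1
    double x = rv(gen);
    writeln(x);
}
```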
@n8sh Could you please tag the next v0.2.8 release, list the changes since v0.2.5, and announce it in the forum?
Since Linux 3.17 there is a getrandom syscall. It's pretty cool because it can block until enough entropy is available; see e.g.
https://lwn.net/Articles/606141/
or:
http://man7.org/linux/man-pages/man2/getrandom.2.html
Small proof-of-concept for Linux:
/*
* Flags for getrandom(2)
*
* GRND_NONBLOCK Don't block and return EAGAIN instead
* GRND_RANDOM Use the /dev/random pool instead of /dev/urandom
*/
enum GRND_NONBLOCK = 0x0001;
enum GRND_RANDOM = 0x0002;
enum GETRANDOM = 318;
// Minimal x86_64 syscall wrapper: `ident` goes in RAX and the three
// arguments in RDI, RSI and RDX, per the System V AMD64 convention.
size_t syscall(size_t ident, size_t n, size_t arg1, size_t arg2)
{
size_t ret;
synchronized asm
{
mov RAX, ident;
mov RDI, n[RBP];
mov RSI, arg1[RBP];
mov RDX, arg2[RBP];
syscall;
mov ret, RAX;
}
return ret;
}
void main() {
long buf;
// getrandom(&buf, buf.sizeof, 0): fill 8 bytes from the kernel CSPRNG
syscall(GETRANDOM, cast(size_t) &buf, buf.sizeof, 0);
import std.stdio;
writeln(buf);
}
syscall is taken from syscall-d.
Also should we provide a way for the user to directly receive random data via OS APIs?
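For comparison, a portable userland fallback that needs no syscall numbers at all is to read the kernel CSPRNG through the device file; a small sketch:

```d
import std.stdio;

void main()
{
    // Read 8 random bytes from the kernel's CSPRNG via /dev/urandom.
    // Unlike getrandom(2), this requires a file descriptor and does not
    // block waiting for the entropy pool (the trade-off noted in LWN 606141).
    ubyte[8] buf;
    auto f = File("/dev/urandom", "rb");
    f.rawRead(buf[]);
    writeln(buf);
}
```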
One of the tests in transformations.d has code like this:
assert(iv.ltx.approxEqual(-8.75651e-27));
assert(iv.lt1x.approxEqual(-1.02451e-24));
assert(iv.lt2x.approxEqual(-1.18581e-22));
Note that all the expected values are negative and small.
However, when I actually print out these values immediately before the test, they are all positive (but otherwise the same):
iv.ltx 8.756510893e-27
iv.lt1x 1.024511801e-24
iv.lt2x 1.185806751e-22
How is it that the test is passing when the expected values have the wrong sign? Well, these values are all very small, and approxEqual with default arguments will accept any two values whose absolute difference is less than 1e-5, so any two small values will always compare equal regardless of their signs.
Perhaps something like feqrel should be used instead: the values agree to at least 19 mantissa bits based on that function (if the sign is fixed). Alternatively, use a purely "relative" test rather than the dual relative + absolute test of approxEqual.
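To illustrate the pitfall, assuming std.math's approxEqual defaults (maxRelDiff = 1e-2, maxAbsDiff = 1e-5):

```d
import std.math : approxEqual, feqrel;

void main()
{
    // The absolute difference of these tiny values (~1.75e-26) is far below
    // the default maxAbsDiff of 1e-5, so approxEqual passes despite the sign flip:
    assert(approxEqual(-8.75651e-27, 8.75651e-27));

    // feqrel compares mantissa bits and reports zero bits of agreement
    // for values of opposite sign, so it catches the discrepancy:
    assert(feqrel(-8.75651e-27, 8.75651e-27) == 0);
}
```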
Hi. I'm getting a statistical artifact in a simulation of a quantile estimator. This artifact appears when I use mir-random, but not when I use Phobos.
At first I thought it was just an artifact (more or less expected) of an imperfect pseudo-random number generator. So I expected that if I changed the generator, either the artifact would go away or it would be replaced by different artifacts. But I tried two different mir generators (pcg32 and SplitMixEngine) and the artifact remains. Since this doesn't happen with Phobos, it suggests it might be an issue with mir-random.
Here's my simulation procedure:
1. Generate a random sample of size n (n=30 in my examples), using some distribution (at first I was using an ExponentialVariable, but I changed it to a UniformVariable, so I could easily compare it with Phobos).
2. Generate a random quantile between 0 and 1.
3. Estimate the quantile value using the sample. To do that, we sort the sample (to obtain the order statistics) and select the ordered sample value that best estimates the true quantile value (basically, the kth order statistic estimates q=k/(n+1)).
4. Since we know the actual distribution, obtain the true quantile corresponding to the estimated quantile value.
5. In one file (quantile.txt) output the pair (requested quantile, true estimated quantile).
6. In another file (quantile-diff.txt) output the pair (requested quantile, true estimated quantile - requested quantile).
7. Plot both files, using gnuplot.
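For reference, one iteration of the procedure above can be sketched roughly like this using Phobos only (the index clamping for "best estimates" is my assumption; the full simulation is in the gist linked below):

```d
import std.algorithm : sort;
import std.random;
import std.stdio;

void main()
{
    enum n = 30;
    auto gen = Random(unpredictableSeed);

    // 1. Random sample from Uniform(0, 1), sorted to get the order statistics.
    double[n] sample;
    foreach (ref s; sample)
        s = uniform01(gen);
    sort(sample[]);

    // 2. Random requested quantile q in (0, 1).
    double q = uniform01(gen);

    // 3. The k-th order statistic estimates q = k/(n+1), so pick k ~ q*(n+1),
    //    clamped to a valid index.
    size_t k = cast(size_t)(q * (n + 1));
    if (k < 1) k = 1;
    if (k > n) k = n;
    double est = sample[k - 1];

    // 4. For Uniform(0,1) the true quantile of a value x is x itself,
    //    so the two output pairs are simply:
    writefln("%s %s", q, est);     // -> quantile.txt
    writefln("%s %s", q, est - q); // -> quantile-diff.txt
}
```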
Here's an example of the kind of artifact I mean, a visible line among the expected dot dispersion. You might need to run the simulation more than once for the artifact to show, or for it to show in a different region.
Requested vs estimated quantile:
Quantile diff (same simulation run):
As part of my efforts to isolate this issue, one of the things I tried was to use a fixed order statistic (instead of the one that best approximates each quantile). Here's some example data I obtained using the order statistic x_{3:30} (D array index 2).
One trial without the artifact:
Another trial with the artifact. Notice both a different artifact strength, and a different estimate density (compare side-by-side with the previous example, for it to be easier to see):
Here's the gnuplot code:
set terminal png;
set output "quantile.png"
plot "quantile.txt" with dots notitle
set output "quantile-diff.png"
plot "quantile-diff.txt" with dots notitle
Here's the D simulation code: https://gist.github.com/luismarques/053e8f98bd0aba9657305541c8eabca9
Any ideas on what might be causing this?
I am working on software for mixed modelling. Currently I am using BLAS' gemm routine and would like to switch to mir-glas. However, the gsl library doesn't go well with mir-glas.
Here is a sample program that can be used to reproduce the errors: gemm_example.
dub.json => link
If I remove gsl from dependencies and libs in dub.json, the program compiles without any errors.
dub build --compiler=ldmd2 --parallel --force -v results in
https://gist.github.com/prasunanand/2bfd4b12e5fe43360bf0a0a90369af56
Other info
prasun@devUbuntu:~/dev/temp/gemm$ nm ~/.dub/packages/mir-cpuid-0.4.2/mir-cpuid/libmir-cpuid.a | grep cpuid_init
0000000000000000 T cpuid_init
prasun@devUbuntu:~/dev/temp/gemm$ nm ~/.dub/packages/mir-cpuid-0.4.2/mir-cpuid/libmir-cpuid.a | grep cpuid_dCache
0000000000000000 T cpuid_dCache
prasun@devUbuntu:~/dev/temp/gemm$ nm ~/.dub/packages/mir-cpuid-0.4.2/mir-cpuid/libmir-cpuid.a | grep cpuid_uCache
0000000000000000 T cpuid_uCache
prasun@devUbuntu:~/dev/temp/gemm$ ldmd2 -v
LDC - the LLVM D compiler (1.1.0):
based on DMD v2.071.2 and LLVM 3.7.1
built with LDC - the LLVM D compiler (0.17.1)
Default target: x86_64-unknown-linux-gnu
Host CPU: haswell
http://dlang.org - http://wiki.dlang.org/LDC
...
Maybe I'm doing something wrong (help would be appreciated), but I'm getting nonsensical results when using an ExponentialVariable (EV).
Take an EV with lambda 0.5. We can confirm that it has a median of around 1.38629: http://www.wolframalpha.com/input/?i=median+of+exponential+distribution+with+lambda+0.5.
Therefore, I would expect that an ExponentialVariable!double(0.5) range would return approximately half of its values < 1.38629 and half > 1.38629. That doesn't seem to be the case.
import std.stdio;
import mir.random.algorithm;
import mir.random.variable;
void main()
{
enum lambda = 0.5;
enum median = 1.38629;
enum n = 1000;
auto ev = ExponentialVariable!double(lambda).range;
int smaller;
int equal;
int larger;
foreach(i; 0 .. n)
{
auto v = ev.front();
ev.popFront();
if(v < median)
++smaller;
else if(v == median)
++equal;
else
++larger;
}
auto nf = double(n);
writefln("%s %s %s", smaller/nf, equal/nf, larger/nf);
}
(I used an explicit loop, instead of ranges, just to be sure I wasn't doing something unexpected).
Running this (https://run.dlang.io/is/r1p4vE) I get about 95% of values are smaller than the median.
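For comparison, a Phobos-only cross-check using inverse-CDF sampling (my own sketch, not mir code) does give roughly 50% of values below the median:

```d
import std.math : log;
import std.random;
import std.stdio;

void main()
{
    enum lambda = 0.5;
    immutable median = log(2.0) / lambda; // ~1.38629, matching Wolfram Alpha
    enum n = 100_000;

    auto gen = Random(unpredictableSeed);
    int smaller;
    foreach (i; 0 .. n)
    {
        // Inverse CDF of Exp(lambda): x = -ln(1 - u) / lambda, u ~ Uniform[0,1)
        immutable x = -log(1.0 - uniform01(gen)) / lambda;
        if (x < median)
            ++smaller;
    }
    writefln("fraction below median: %s", double(smaller) / n); // ~0.5
}
```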
Hi Nathan,
Could you please send me a link to your public profile, if any, or your email address?
Kind regards,
Ilya
ping @n8sh
unpredictableSeed uses the time and process id to generate a random seed (see here).
Q: Why doesn't it let the OS do the work? (e.g. /dev/random or getrandom)
(comes from Dlang's issue tracker: https://issues.dlang.org/show_bug.cgi?id=16493)
There used to be Vose's O(1) discrete sampler in mir:
libmir/mir#259
https://github.com/libmir/mir/blob/da76cf406d06957e472b9ba90b4c90b917480cb9/source/mir/random/discrete.d
Paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.398.3339&rep=rep1&type=pdf
It was pretty cool because it allowed fast sampling once the initial O(n) setup cost was paid.
@9il: I assume that you didn't port it because of its usage of the experimental Allocator?
It could be changed to use malloc and either @disable this(this)
or allow copying with ref-counting.
Q: Is there interest in porting this to mir-random and if so how would you like the allocation of the working data to be done?
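For context, the alias method can be sketched like this (my own illustrative D using Phobos, following the standard Vose construction rather than the removed mir implementation):

```d
import std.random;
import std.stdio;

// Vose's alias method: O(n) setup, O(1) per draw.
struct AliasTable
{
    double[] prob;
    size_t[] aliasIdx;

    this(const double[] weights)
    {
        immutable n = weights.length;
        prob = new double[n];
        aliasIdx = new size_t[n];

        double total = 0;
        foreach (w; weights) total += w;

        // Scale weights so the average bin height is exactly 1.
        auto scaled = new double[n];
        foreach (i, w; weights) scaled[i] = w * n / total;

        size_t[] small, large;
        foreach (i; 0 .. n)
        {
            if (scaled[i] < 1) small ~= i;
            else large ~= i;
        }

        // Pair each under-full bin with an over-full donor bin.
        while (small.length && large.length)
        {
            immutable s = small[$ - 1]; small = small[0 .. $ - 1];
            immutable l = large[$ - 1]; large = large[0 .. $ - 1];
            prob[s] = scaled[s];
            aliasIdx[s] = l;
            scaled[l] += scaled[s] - 1; // donor loses what it donated
            if (scaled[l] < 1) small ~= l;
            else large ~= l;
        }
        // Leftovers are exactly 1 up to rounding.
        foreach (i; small) prob[i] = 1;
        foreach (i; large) prob[i] = 1;
    }

    size_t opCall(ref Random gen)
    {
        immutable i = uniform(size_t(0), prob.length, gen);
        return uniform01(gen) < prob[i] ? i : aliasIdx[i];
    }
}

void main()
{
    auto gen = Random(unpredictableSeed);
    auto table = AliasTable([0.1, 0.2, 0.7]);
    size_t[3] counts;
    foreach (i; 0 .. 10_000)
        ++counts[table(gen)];
    writeln(counts); // roughly proportional to the weights
}
```

Using malloc plus @disable this(this) as suggested would mainly change the two array allocations in the constructor; the table logic itself is allocation-agnostic.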
In ndvariable, assert is used to signal wrong input. But assert is not very convenient because it's too fatal. Couldn't exceptions be used here?
I believe there is a typo in the template below:
template transform(string f0, string f1, string f2, string c)
{
import std.array : replace;
enum raw = `_f0 *= _c;
_f0 = copysign(exp(_f0), _c);
_f2 = _c * _f1 * _f1 + _f2;
_f2 *= _c * _f0;
_f1 *= c;
_f1 *= _f0;`;
enum transform = raw.replace("_f0", f0).replace("_f1", f1).replace("_f2", f2).replace("_c", c);
}
Note the line _f1 *= c; I believe it should be _f1 *= _c;. It happens to work because everywhere the template is instantiated the input variable happens to be named 'c', so the replace("_c", c) isn't strictly necessary in those cases.
Any plans to include a multinomial random variable?
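In the meantime, a multinomial variate can be sketched by counting categorical draws (a Phobos-only illustration on my part; a real implementation would likely use sequential conditional binomials instead, which is O(k) per variate rather than O(trials)):

```d
import std.random;
import std.stdio;

// One multinomial variate: count how many of `trials` categorical draws
// land in each category, given probabilities p that sum to 1.
size_t[] multinomial(ref Random gen, const double[] p, size_t trials)
{
    auto counts = new size_t[p.length];
    foreach (t; 0 .. trials)
    {
        immutable u = uniform01(gen);
        double acc = 0;
        foreach (i, pi; p)
        {
            acc += pi;
            if (u < acc || i + 1 == p.length) // last bin absorbs rounding error
            {
                ++counts[i];
                break;
            }
        }
    }
    return counts;
}

void main()
{
    auto gen = Random(unpredictableSeed);
    writeln(multinomial(gen, [0.2, 0.3, 0.5], 1000)); // roughly [200, 300, 500]
}
```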
The MersenneTwisterEngine template distinguishes between the UIntType used to store generator state and the word size w: for example, when UIntType is ulong (64-bit), the word size may still be only 32 or (say) 48 bits. In this case the returned variates should consist of only the lowest w bits of the generated UIntType value (which is not currently enforced), and the seeding process should also take this into account (currently only partially done: the first entry in the state array takes the lowest w bits of the provided seed, but the population of the other state array entries does not take this into account).
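The masking part is straightforward; a sketch of what enforcing the lowest w bits might look like (the helper name is mine, not the engine's):

```d
// Keep only the lowest w bits of a generated value (or of the seed)
// when the word size w is smaller than the storage type's bit width.
ulong maskToWordSize(ulong x, uint w)
{
    return w >= 64 ? x : x & ((1UL << w) - 1);
}

unittest
{
    assert(maskToWordSize(0xFFFF_FFFF_FFFF_FFFFUL, 48) == 0xFFFF_FFFF_FFFFUL);
    assert(maskToWordSize(0x1_0000_0001UL, 32) == 1);
    assert(maskToWordSize(42, 64) == 42);
}
```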
A comparison of the output of mir.random's MersenneTwisterEngine and C++11's mersenne_twister_engine can be made using the following D unittest:
unittest
{
    import std.meta : AliasSeq;
    import std.stdio : writeln;

    alias MT(UIntType, uint w) = MersenneTwisterEngine!(UIntType, w, 624, 397, 31,
                                                        0x9908b0df, 11, 0xffffffff, 7,
                                                        0x9d2c5680, 15,
                                                        0xefc60000, 18, 1812433253);

    foreach (R; AliasSeq!(MT!(uint, 32), MT!(ulong, 32), MT!(ulong, 48), MT!(ulong, 64)))
    {
        static if (R.wordSize == 48) static assert(R.max == 0xFFFFFFFFFFFF);
        auto a = R(R.defaultSeed);
        writeln(a());
        writeln(a());
        writeln();
    }
}
... and the following C++11 program:
#include <cinttypes>
#include <iostream>
#include <random>
#include <type_traits>
template<class UIntType, size_t w>
void mt_result ()
{
std::mersenne_twister_engine<
UIntType, w, 624, 397, 31,
0x9908b0df, 11, 0xffffffff, 7,
0x9d2c5680, 15,
0xefc60000, 18, 1812433253> gen;
gen.seed(std::mt19937::default_seed);
std::cout << gen() << std::endl;
//for (int i = 0; i < 599; ++i)
// gen();
std::cout << gen() << std::endl;
std::cout << std::endl;
}
int main ()
{
mt_result<uint32_t, 32>();
mt_result<uint64_t, 32>();
mt_result<uint64_t, 48>();
mt_result<uint64_t, 64>();
}
xorshift1024* / xorshift1024*φ is faster and has better statistical properties than Mt19937_64, while occupying considerably less memory (136 bytes vs 2512 bytes).
I might suggest that the default mir.random.engine.Random
should be something much more lightweight, but using xorshift1024*
will not harm any existing code that assumes that Random
is suitable for massive simulations. Rather, such code will be improved.
I have made a branch with the proposed change:
n8sh@7dc9438
With Mir v0.7.0 released with extMul, I think it may be time for a new mir-random release. Agree?
Running the code below on Windows 10 with --compiler=ldc2
import mir.ndslice;
import mir.random : threadLocalPtr, Random;
import mir.random.variable : normalVar;
void main() {
auto sample = threadLocalPtr!Random.randomSlice(normalVar, 15);
}
throws the following exception:
Error: no property randomSlice for type mir.random.engine.mersenne_twister.MersenneTwisterEngine!(ulong, 64LU, 312LU, 156LU, 31LU, 13043109905998158313LU, 29LU, 6148914691236517205LU, 17LU, 8202884508482404352LU, 37LU, 18444473444759240704LU, 43LU, 6364136223846793005LU)*
dub.json
"dependencies": {
"mir-algorithm": "~>3.7.18",
"mir-random": "~>2.2.11"
},
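A guess on my part: randomSlice is declared in mir.random.algorithm, and the UFCS call can't find it without that module being imported. If so, this variant may compile (an unverified assumption, not a confirmed fix):

```d
import mir.random : threadLocalPtr, Random;
import mir.random.algorithm : randomSlice; // assumption: randomSlice lives here
import mir.random.variable : normalVar;

void main()
{
    // Same call as above; only the extra import differs.
    auto sample = threadLocalPtr!Random.randomSlice(normalVar, 15);
}
```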
mir.random currently uses the name Mt19937 to mean the standard 32-bit generator when building for 32-bit targets and the MT19937-64 generator when building for 64-bit targets.
However, this is problematic terminology. MT19937 is explicitly defined to be the version of the generator that has a 32-bit unsigned integer as its data type, a 32-bit word size, a 624-word state, etc. etc.: see the opening paragraphs of https://en.wikipedia.org/wiki/Mersenne_Twister
The recommended course of action here is to delete the Mt19937_32 symbol and to use Mt19937 as the symbol only for the 32-bit generator. If this is acceptable I'll submit patches to this effect.
Note that the changes proposed here do not affect the decision of what the default RNG Random should be.
The forum has a question about simulating multiple values from a multivariate normal distribution. This is possible with a for loop, but it seems not currently possible with mir.random.algorithm's range.
void main()
{
import mir.random : Random, unpredictableSeed;
import mir.random.ndvariable : MultivariateNormalVariable;
import mir.random.algorithm : range;
import mir.ndslice.slice : sliced;
import std.range : take;
auto mu = [10.0, 0.0].sliced;
auto sigma = [2.0, -1.5, -1.5, 2.0].sliced(2,2);
auto rng = Random(unpredictableSeed);
auto sample = range!rng
(MultivariateNormalVariable!double(mu, sigma))
.take(10);
}
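The for-loop workaround mentioned above might look like this (a sketch on my part; the buffer handling is my assumption):

```d
import mir.random : Random, unpredictableSeed;
import mir.random.ndvariable : MultivariateNormalVariable;
import mir.ndslice.slice : sliced;
import std.stdio : writeln;

void main()
{
    auto mu = [10.0, 0.0].sliced;
    auto sigma = [2.0, -1.5, -1.5, 2.0].sliced(2, 2);
    auto rng = Random(unpredictableSeed);
    auto rv = MultivariateNormalVariable!double(mu, sigma);

    // Draw 10 points, one opCall per 2-D point.
    double[2][] sample = new double[2][](10);
    foreach (ref row; sample)
        rv(rng, row[]); // fills one point in place
    writeln(sample);
}
```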
Hi there,
I'm developing a time-reaction propagator for physical simulations based on the Gillespie algorithm.
See https://code.dlang.org/packages/gillespied.
I would like to provide an optional dependency on mir.random.
I managed to do this; however, while testing I found that for a particular case the test results don't fit the expected results.
See https://bitbucket.org/Sandman8/gillespied/issues/1/bug-with-real-type
The tests I run include std.random from Phobos and mir.random, both for the types float, double, real and size_t. The test fails only for the combination of mir.random & real. This is the reason why I'm writing here. However, I'm not sure where the bug really is. Maybe you could have a look at this.
original libmir/mir#390
We have version(D_betterC) now. This allows declaring thread-local default RNGs like the ones in Phobos.
This is an enhancement request concerning the Tiny Mersenne Twister algorithm: could it be implemented / added to mir?
Thanks,
Ezneh.
The _z field of the MersenneTwisterEngine is uninitialized on struct creation: the constructor does not touch it directly, but calls popFront(), where in this first call _z is used to set the value of z without _z itself having been initialized first.
Presumably the intention/expectation is that _z should be initialized to data[index] in the constructor?
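If that reading is right, the fix pattern can be illustrated with a toy engine (entirely my own stand-in, with made-up seeding and tempering; only the initialization order mirrors the real issue):

```d
// Toy illustration of the bug pattern: a cached field read by the first
// popFront() must be seeded in the constructor, or it starts as garbage.
struct TinyEngine
{
    uint[4] data;
    size_t index;
    uint _z;

    this(uint seed)
    {
        foreach (i, ref d; data)
            d = seed + cast(uint) i; // stand-in for the real seeding recurrence
        index = data.length - 1;
        _z = data[index]; // the proposed fix: without this line, _z is undefined
        popFront();
    }

    @property uint front() const { return _z; }

    void popFront()
    {
        auto z = _z;                 // the very first call reads _z here
        index = (index + 1) % data.length;
        _z = data[index] ^ (z >> 1); // stand-in for the real tempering
    }
}

unittest
{
    // Deterministic: two engines with the same seed agree from the start.
    auto a = TinyEngine(123);
    auto b = TinyEngine(123);
    assert(a.front == b.front);
}
```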
I struggle to find how to do a simple random integer generation.
import mir.random;
import mir.random.engine.mersenne_twister;
void main() {
auto rng = Mt19937(123);
auto n = rng.rand!int(10);
}
does not work.
So whenever I want to generate a random integer I'd have to cast. Is there a Mir alternative to np.random.randint(1, 10)?
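If I'm reading the API right, randIndex covers this; a hedged sketch of an np.random.randint(1, 10) equivalent, assuming randIndex!T(m) yields a uniform integer in [0, m):

```d
import std.stdio;
import mir.random;
import mir.random.engine.mersenne_twister;

void main()
{
    auto rng = Mt19937(123);
    // randIndex!uint(9) is uniform over [0, 9); shifting by 1 gives [1, 10),
    // matching np.random.randint(1, 10)'s half-open interval.
    auto n = 1 + rng.randIndex!uint(9);
    writeln(n);
}
```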
isRandomEngine makes use of the RandomEngine UDA in order to determine whether a given functor is a random engine. Aside from terminology (technically a random engine is a pseudo-random generator, and this check does not account for pseudo-random vs. 'true' random), there are a number of concerns with this approach:
Forced dependencies. The isUniformRNG check in Phobos' std.random, which looks for a compile-time boolean flag isUniformRandom, means that 3rd-party code can implement compatible generators without having any direct dependency on std.random. For example, the code implemented at https://github.com/WebDrake/dxorshift does not import std.random except in unittests, where it is used to verify compatibility. By contrast, any code that wishes to implement mir.random-compatible RNGs is forced to import the RandomEngine enum.
Forwarding of information via generic wrappers. The UDA will not be available if a generator is wrapped by functionality like (say) RefCounted or Unique. Currently enum constant fields will also not be forwarded (see: https://issues.dlang.org/show_bug.cgi?id=14830), but that at least seems possible in principle; it seems more dubious that UDAs would be reasonable to forward via a generic wrapper.
With this in mind it might be worth considering if a UDA is the right way to mark functors as random number generators.
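For comparison, the Phobos convention marks engines with a compile-time flag that needs no imports at all; a simplified sketch (MyEngine and the toy LCG step are mine, illustrative only):

```d
// A generator advertises itself purely structurally: any type with
// `enum bool isUniformRandom = true` plus the range API qualifies,
// so third-party engines need no import of the checking module.
struct MyEngine
{
    enum bool isUniformRandom = true;
    enum uint min = uint.min;
    enum uint max = uint.max;

    uint state = 1;
    @property uint front() const { return state; }
    void popFront() { state = state * 747796405u + 2891336453u; } // toy LCG step
    void seed(uint s) { state = s; }
}

// Duck-typed check (simplified version of std.random.isUniformRNG):
template isUniformRNG(R)
{
    static if (is(typeof(R.isUniformRandom) : bool))
        enum isUniformRNG = R.isUniformRandom;
    else
        enum isUniformRNG = false;
}

static assert(isUniformRNG!MyEngine);
static assert(!isUniformRNG!int);
```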
I can't find pcg.d inside zip for release 0.2.5...
// Compile script below with (set randomRange to RandomRange):
// $ dub run --single random.d
#!/usr/bin/env dub
/+ dub.sdl:
name "random"
dependency "mir" version="0.22.0"
dependency "mir-random" version="~>0.0.1"
dependency "mir-math" version="~>0.0.1"
+/
import std.range, std.stdio;
import mir.ndslice;
import mir.random;
import mir.random.variable: NormalVariable;
import mir.random.algorithm: RandomRange;
void main(){
auto rng = Random(unpredictableSeed); // Engines are allocated on stack or global
auto sample = rng // Engines are passed by reference to algorithms
.range(NormalVariable!double(0, 1))// Random variables are passed by value
.take(1000) // Fix sample length to 1000 elements (Input Range API)
.array; // Allocates memory and performs computation
writeln(sample);
}
//Got the error:
Performing "debug" build using dmd for x86_64.
mir-internal 0.0.1: target for configuration "library" is up to date.
mir-math 0.0.1: target for configuration "library" is up to date.
mir-random 0.0.1: target for configuration "library" is up to date.
random ~master: building configuration "application"...
random.d(20,6): Error: no property 'range' for type 'MersenneTwisterEngine!(ulong, 64LU, 312LU, 156LU, 31LU, 13043109905998158313LU, 29u, 6148914691236517205LU, 17u, 8202884508482404352LU, 37u, 18444473444759240704LU, 43u)'
dmd failed with exit code 1.
See #51 (comment)
Linux kernel headers define __NR_getrandom (and other syscall numbers). I have created an empty https://github.com/libmir/mir-linux-kernel for Linux headers. https://github.com/torvalds/linux can be used as reference.
We need the arch/<arch-name>/include/uapi/asm/unistd.h files and a few small others that are imported by some archs. The D file name structure can look like mir/linux/arch/<arch-name>/uapi/asm/unistd.d, plus a unified file mir/linux/asm/unistd.d that selects the arch version from predefined versions. D does not support all architectures, but many of them are supported by LDC and the D specification; see predefined-versions for details.
I am trying to generate a two-dimensional normally distributed point cloud using the covariance matrix:
[ 4 0
  0 4 ]
Then I calculate the variance for each dimension; for the second dimension the variance is around 1.0 instead of 4.0.
This source code is runnable on run.dlang.org:
/+dub.sdl:
dependency "mir-algorithm" version="~>3.7.8"
dependency "mir-random" version="~>2.2.8"
+/
import std;
void main()
{
import mir.ndslice.slice: sliced;
import mir.random;
import mir.random.ndvariable : multivariateNormalVar;
// given settings
enum DataSize = 10_000;
scope Random* gen = threadLocalPtr!Random;
auto sigma = [4.0f, -0.0f, -0.0f, 4.0f].sliced(2,2);
auto rv = multivariateNormalVar!float(sigma);
// generate samples
float[2][] samples;
samples.length = DataSize;
samples.each!((ref v) { rv(gen, v[]); });
//writeln(samples);
// calculate mean and variance
const mean_y = samples.map!"a[1]".mean;
writeln(mean_y);
float var_y = 0;
samples.each!(v=>var_y += (v[1]-mean_y)^^2);
var_y /= DataSize-1;
writeln(var_y); // <====== variance is expected to be around 4.0, but it always is around 1.0
}