[LON-CAPA-users] random_permutation, choosing the $seed
Stefan Dröschler
st.droeschler at ostfalia.de
Mon Sep 1 14:41:47 EDT 2014
Hi Peter,
> The long answer will follow.
here it is:
LON-CAPA uses Math::Random for random number functionality. Many of the (documented) functions for authors - which are essentially wrappers around Math::Random functions - require the author to pass in a seed. This seed is actually used as a phrase passed to Math::Random::set_seed_from_phrase().
set_seed_from_phrase() then generates the real seed from this phrase which is used to initialise the random number generator (RNG). The real seed consists of two integer components and represents the initial state of a linear congruential generator.
The way random number functionality is offered to authors in LON-CAPA requires to pass in a seed every time one of the functions is used. This is different to common usage of RNGs: the RNG is initialised only once and then generates numbers according to the promised distribution. So in LON-CAPA we effectively use the second state of a RNG every time we request RNG functions. Because of the way LCGs work, the distribution of “second states” depends on the distribution of initial states - garbage in, garbage out applies again.
With this in mind, let’s look at the data. The following graphs show density histograms of the two components of the seeds that are generated by set_seed_from_phrase(). Blue means low frequency, yellowish means high frequency of the respective combination.
The first histogram depicts the seed frequency when using your first approach ($seed = random(1,1E10,1)) for 10^6 samples.
[removed: it seems that the mailing list doesn’t support attached images. see http://public.ostfalia.de/~droescst/seedhist1.png]
One can clearly see that there’s an imbalance or bias. The desired/ideal result would be an area of equal color, corresponding to a uniform distribution in both components. This explains your observations.
If you look at phrtsd in randlib.c (the underlying library of Math::Random, phrtsd being the function phrase -> seed), you’ll find that the function expects the phrase to be a string of @SEEDCHARS (see previous mail). If you don’t use all of the characters (e.g. only decimals), you’ll end up with the situation depicted in the first histogram.
This problem is easy to solve: create your phrase from a random sampling of all the available chars. The second histogram depicts this approach (10^6 samples), which corresponds to the code I suggested earlier. One can still see a slight imbalance at the “borders”, which I can not explain at the moment. But the overall situation is much better.
[removed: it seems that the mailing list doesn’t support attached images. see http://public.ostfalia.de/~droescst/seedhist2.png]
Personally, I would not use random_permutation if a uniform distribution of permutations is a requirement (which in most use cases is not), because of the issues with “seed management”. Instead I’d use something along the lines of your shuffle or this:
sub permute2{
my @list = @_;
my @ret;
while(scalar @list){
push @ret, splice(@list, random(0,scalar @list,1), 1);
}
return @ret;
}
which should be a little more efficient.
Stefan
On 01 Sep 2014, at 12:45, Stefan Dröschler <st.droeschler at ostfalia.de> wrote:
> Hi Peter,
>
>> The latter shows a remarkable imbalance.
>> This seems to be a bias as it remains stable if choosing other randomisation.
> The short answer is: the RNG is not well-fed the way you generate the seeds.
> The bias results from the distribution of the “effective seeds”.
>
> A better albeit not perfect (!!) way would be to add:
>
> my @SEEDCHARS = split "", q{abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$%^&*()_+[];:'\"<>?,./};
> sub gen_seed {
> my $ret = "";
> $ret .= $SEEDCHARS[random(0,$#SEEDCHARS, 1)] for (1..20);
> return $ret;
> }
>
> and replace
> my $seed = random( 1, 1E10, 1 );
> with
> my $seed = gen_seed();
>
> You’ll notice that the resulting permutations are much more uniformly distributed. (And you also might run into “code ran too long”).
> Note: this approach is using knowledge about the implementation of the underlying libraries.
>
> The long answer will follow.
>
> Stefan
>
>
> On 31 Aug 2014, at 17:52, Peter Dencker <dencker at math.uni-luebeck.de> wrote:
>
>>
>> Hi.
>>
>> The problem given below compares two procedures to permute lists: sub
>> 'shuffle' is a Fisher-Yates-shuffle procedure and sub 'permute' directly
>> uses the script function 'random_permutation'. The latter shows a
>> remarkable imbalance. This seems to be a bias as it remains stable if
>> choosing other randomization.
>>
>> All my tries to choose the $seed give a similar behavior.
>> What is the right method?
>>
>> - Peter
>>
>> <problem>
>>
>> <script type="loncapa/perl">
>>
>> our $N = 10000;
>>
>> sub shuffle {
>> my @values = @_;
>> return
>> if !@values;
>>
>> for my $i ( 0 .. $#values - 1 ) {
>> my $j = random( $i, $#values, 1 );
>> @values[ $i, $j ] = @values[ $j, $i ];
>> }
>> return @values;
>> }
>>
>> sub permute {
>> my @values = @_;
>> return
>> if !@values;
>>
>> my $seed = random( 1, 1E10, 1 );
>> return random_permutation( $seed, @values );
>> }
>>
>> our $tables;
>> for ( 'B' .. 'E' ) {
>> my @items = 'A' .. $_;
>> $tables .= '$N random permutations of '
>> . ( join q{}, @items ) . ' counted: <br />';
>> my %number;
>> for ( 1 .. $N ) {
>> $number{'permute'}{ join q{}, permute @items }++;
>> $number{'shuffle'}{ join q{}, shuffle @items }++;
>> }
>> $tables
>> .= '<table border=1><tr><td>permutation</td>'
>> . '<td>shuffle</td>'
>> . '<td>permute</td></tr>'
>> . '<tr><td align="center">'
>> . join '</td></tr><tr><td align="center">', map {
>> join '</td><td align="right">',
>> $_, $number{'shuffle'}{$_},
>> $number{'permute'}{$_};
>> } sort { $a cmp $b }
>> keys %{ $number{'shuffle'} };
>> $tables .= '</td></tr></table>';
>> $tables .= '<br />' x 2;
>> }
>>
>> </script>
>>
>> $tables
>>
>> <customresponse id="0r0">
>> <answer type="loncapa/perl">
>> return 'SUBMITTED';
>> </answer>
>> </customresponse>
>>
>> </problem>
>>
>> --
>> Dr. Peter Dencker
>> wissenschaftl. Mitarbeiter
>>
>> UNIVERSITÄT ZU LÜBECK
>> INSTITUT FÜR MATHEMATIK
>>
>> Ratzeburger Allee 160
>> 23562 Lübeck
>>
>> Tel +49 451 500 4254
>> Fax +49 451 500 3373
>> dencker at math.uni-luebeck.de
>>
>> www.math.uni-luebeck.de
>> _______________________________________________
>> LON-CAPA-users mailing list
>> LON-CAPA-users at mail.lon-capa.org
>> http://mail.lon-capa.org/mailman/listinfo/lon-capa-users
>
> _______________________________________________
> LON-CAPA-users mailing list
> LON-CAPA-users at mail.lon-capa.org
> http://mail.lon-capa.org/mailman/listinfo/lon-capa-users
More information about the LON-CAPA-users
mailing list