No biological random number sensor box. Sensors of random and pseudo-random numbers. Gpsc in cryptography

There are three fundamentally different ways of obtaining numbers used as random: physical, tabular and algorithmic.

It is believed that the first attempt to create a physical generator of random numbers dates back to 3500 BC. and is associated with the board game senet, an ancient Egyptian secular entertainment. According to modern reconstructions of the rules of the game, four flat sticks were used to determine the number of points scored by each player and the sequence of moves in this game, one side of which was white, the other - black. The sticks were thrown at the same time and, depending on the combination of colors that fell, they determined additional opportunities for the players. At the beginning of the XX century. sequences of random numbers were simulated manually - by throwing a coin or a dice, laying out playing cards, using a roulette wheel, removing balls from an urn, etc. Modern physical (hardware) sensors are special devices that generate random numbers based on the transformation of random noises of natural or artificial origin (thermal noise, shot effect in vacuum tubes, radioactive decay, etc.). For example, a car ERNIE 4 (electronic random number indicator equipment),

1 Sometimes, although rarely, the distribution given by the table is referred to as standard 0 1 ... 8 9
0.1 0.1 ... 0.1 0.1 / with the help of which the winning numbers in the monthly British Lottery are determined, uses the thermal noise of transistors as a source of random variables. The physical method of obtaining a sequence of random numbers has features that are disadvantages for a simulation model. These include, first of all, the need for special measures to ensure the stability of the signal source, converted into random numbers, and the impossibility of reproducing the resulting sequence of random numbers.

Tables of random numbers are free from these disadvantages. Let us explain what is meant by a table of random numbers. Suppose we have implemented N independent experiments, which resulted in random numbers a, a 2, osdr. Writing these numbers down (in order of appearance and in the form of a rectangular table) will give what is called a random number table. It is used as follows. In the course of calculations, we may need either a random number or a random number. If a random number is required, then we can take any number from this table. The same applies to the case of a random integer - any digit can be selected for each digit. If we need a random number 0 k of successive digits cc, a 2, ao /, and assume that 8 = (Hoco ^ .- can be in a row, you can use any selection algorithm that does not depend on the values of the table digits, start from anywhere in the table, read in any direction.

The first tables of random numbers were obtained using roulettes. Such tables have been published several times in the form of books. One of the most famous tables, published in 1927, contained over 40,000 random numbers "randomly taken from census reports."

History reference

Leonard Tippett (Leonard Henry Caleb Tippett, 1902-1985) - English statistician, student of K. Pearson and R. Fisher. In 1965-1966. - President of the Royal Statistical Society. Some important results in the theory of extreme values are associated with his name, for example, the Fisher - Tippett distribution and the Fisher - Tippett - Gnedenko theorem.

Later, special devices (machines) were designed that mechanically generate random numbers. The first such machine was used in 1939 by M.J. Kendall and B. Babington-Smith to create tables containing 100,000 random digits. In 1955 the company RAND Corporation published well-known tables with a million random digits obtained by another machine of this type. The practical application of tables of random numbers is currently limited, as a rule, to problems that use random selection methods.

samples, for example, in sociological research or during statistical acceptance control of the quality of piece products for various purposes.

It is interesting

In Russia, GOST 18321-73 (ST SEV 1934-79) is in force, which establishes the rules for selecting units of products in a sample during statistical acceptance quality control, statistical methods of analysis and regulation of technological processes for all types of piece products for industrial purposes and consumer goods. In it, in particular, it is indicated that when selecting units of production in the sample “tables of random numbers according to ST SEV 546-77 are used”.

reapply; all numbers are easily reproducible; and the supply of numbers in such a sequence is limited. However, a sequence of pseudo-random numbers has an obvious advantage over a table: there are simple formulas for calculating a pseudo-random number, while getting each number takes only 3-5 commands, and the calculation program takes up only a few cells in the drive.

There are many algorithms for obtaining sequences of pseudo-random numbers; implementations of such algorithms, called sensors (generators) of pseudo-random numbers, are described in some detail in special literature. Here are some of the most well-known algorithms.

Tippett L. Random sampling numbers. London: Cambridge University Press, 1927.
See: D.E. Knut, The Art of Programming. 3rd ed. M.: Williams, 2000. T. 2. Ch. 3. Random numbers.

Deterministic PRNG

No deterministic algorithm can generate completely random numbers, it can only approximate some properties of random numbers. As John von Neumann said, “ anyone who has a weakness for arithmetic methods of obtaining random numbers is without a doubt a sin».

Any PRNG with limited resources sooner or later gets stuck in a loop - it starts repeating the same sequence of numbers. The length of the PRNG cycles depends on the generator itself and is on average about 2 n / 2, where n is the size of the internal state in bits, although linear congruent and LFSR generators have maximum cycles of the order of 2 n. If a PRNG can converge to too short cycles, the PRNG becomes predictable and unusable.

Most simple arithmetic generators, although fast, suffer from many serious disadvantages:

Period / periods too short.
Consecutive values are not independent.
Some bits are "less random" than others.
Uneven one-dimensional distribution.
Reversibility.

In particular, the mainframe algorithm turned out to be very poor, which raised doubts about the reliability of the results of many studies using this algorithm.

PRNG with entropy source or RNG

Along with the existing need to generate easily reproducible sequences of random numbers, there is also a need to generate completely unpredictable or simply completely random numbers. Such generators are called random number generators(RNG - eng. random number generator, RNG). Since such generators are most often used to generate unique symmetric and asymmetric keys for encryption, they are most often built from a combination of a strong PRNG and an external source of entropy (and it is this combination that is now commonly understood as an RNG).

Almost all major microchip manufacturers supply hardware RNGs with different entropy sources, using different methods to cleanse them of the inevitable predictability. However, at the moment the speed of collecting random numbers by all existing microchips (several thousand bits per second) does not match the speed of modern processors.

In personal computers, software RNG authors use much faster sources of entropy, such as sound card noise or processor clock counts. Before it became possible to read the values of the clock counter, the collection of entropy was the most vulnerable point of the RNG. This problem is still not fully resolved in many devices (eg smart cards) that remain vulnerable in this way. Many RNGs still use traditional (outdated) methods of collecting entropy, such as measuring the user's response (mouse movement, etc.), as, for example, in, or interactions between threads, as, for example, in Java secure random.

Examples of RNG and entropy sources

A few examples of RNGs with their entropy sources and generators:

	Entropy source	PRNG	Dignity	Flaws
/ dev / random on Linux	Processor cycle counter, but only collected during hardware interrupts	LFSR, with output hashing through	It “heats up” for a very long time, it can “get stuck” for a long time, or it works as a PRNG ( / dev / urandom)
Yarrow by Bruce Schneier	Traditional (deprecated) methods	AES -256 and	Flexible cryptographic design	“Heats up” for a long time, very small internal state, depends too much on the cryptographic strength of the selected algorithms, slow, applicable exclusively for key generation
Generator Leonid Yuriev	Sound card noise	?	Most likely a good and fast source of entropy	No independent, known to be cryptographically strong PRNG, available exclusively as Windows
Microsoft		Built into Windows, does not "get stuck"	Small inner state, easily predictable
	Interaction between threads		Java has no other choice yet, big internal state	Slow entropy collection
Chaos by Ruptor	Processor cycle counter, collected continuously	Hashing a 4096-bit internal state based on a nonlinear variant of the Marsaglia generator	Until the fastest of all, great inner state, does not "get stuck"
RRAND by Ruptor	Processor cycle counter	Encrypting internal state with stream cipher	Very fast, internal state of arbitrary size by choice, does not "get stuck"

PRNG in cryptography

A kind of PRNG are PRBG - generators of pseudo-random bits, as well as various stream ciphers. PRNG, like stream ciphers, consist of an internal state (usually from 16 bits to several megabytes in size), a function for initializing the internal state with a key, or seed(eng. seed), internal state update functions, and output functions. PRNGs are subdivided into simple arithmetic, broken cryptographic and cryptographic ones. Their common purpose is to generate sequences of numbers that cannot be distinguished from random ones by computational methods.

Although many cryptographically strong PRNGs or stream ciphers offer much more "random" numbers, such generators are much slower than conventional arithmetic generators and may not be suitable for any kind of research that requires the processor to be free for more useful computations.

For military purposes and in the field, only classified synchronous cryptographically strong PRNGs (stream ciphers) are used, block ciphers are not used. Examples of well-known cryptographically strong PRNGs are ISAAC, SEAL, Snow, the very slow theoretical algorithm of Blum, Blum and Shub, as well as counters with cryptographic hash functions or cryptographically strong block ciphers instead of an output function.

Hardware PRNG

Apart from the outdated, well-known LFSR generators that were widely used as hardware PRNGs in the 20th century, unfortunately, very little is known about modern hardware PRNGs (stream ciphers), since most of them are designed for military purposes and are kept secret. Almost all existing commercial hardware PRNGs are patented and also kept secret. Hardware PRNGs are limited by strict requirements for the consumed memory (most often the use of memory is prohibited), speed (1-2 clock cycles) and area (several hundred FPGAs - or

Due to the lack of good hardware PRNGs, manufacturers are forced to use the much slower but widely known block ciphers available at hand (Computer Review # 29 (2003)

Yuri Lifshits. Course "Modern problems of cryptography" Lecture 9: Pseudo-random generators

L. Barash. AKS algorithm for checking numbers for simplicity and finding constants for pseudo-random number generators

Zhelnikov Vladimir. Pseudo-random sequences of numbers // Cryptography from papyrus to computer M .: ABF, 1996.

random.org (eng.) - online service for generating random numbers

Cryptographic Random Numbers

Theory and Practice of Random Number Generation (eng.)

Zvi Gutterman, Benny Pinkas, Tzachy Reinman. Analysis of the Linux Random Number Generator

A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications NIST SP 800-22

The first algorithm for obtaining pseudo-random numbers was proposed by J. Neumann. It is called the mid-squares method.

Let a 4-digit number R 0 = 0.9876. Let's square it. We get an 8-digit number R 0 2 = 0.97535376. Let's choose 4 middle digits from this number and put R 1 = 0.5353. Then we square it again and extract the 4 middle numbers from it. We get R 2 etc. This algorithm has not proven itself. It turned out more than necessary small values of R i .

However, it is of interest to investigate the quality of this generator with a right-shifted digit selection group for R i 2 :

where a is the maximum value of the fraction for a given computer (for example, a = 8).

b is the number of decimal places in the number R i(for example 5).

INT (A) - integer part of a number.

For a = 8, b = 5, R 0 = 0.51111111 on PKZX-Spectrum, about 1200 non-repeating numbers are obtained.

Exercise: The study should be carried out with varying a, b, R 0 ... Find at what values a, b, R 0 the largest length L of a sequence of non-repeating numbers is obtained with “good” stochastic parameters. Establish whether the value of R affects 0 on the quality of the sensor. If it does, then determine the range of “acceptable” values of the parameter R 0 ... Present the results of testing the optimal variant of the values a, b, R 0 .

Multiplicative algorithms. Sensor # 2: Lehmer Linear Congruential Generator 1951.

where U i, M, C and p are integers.

AmodB- the remainder of the division completely AnaB,

A mod B = A-B * INT (A / B)

The generated sequence has a repeating cycle not exceeding p numbers.

The maximum period is obtained for C0, but such a generator gives poor stochastic results.

When C = 0, generators are called multiplicative. They have the best stochastic parameters. The formulas for their use are also called the method of deductions.

The most popular method for obtaining pseudo-random numbers is the deduction method according to the following formula:

where U i, M, p-integers, 0 i <1, 1U ip-1.

If you choose U 0 and M such that for R 0 = U 0 / p obtained an irreducible fraction and take p and M mutually prime, then all R i will be irreducible fractions of the form: R i= U i/ p.

We get the largest (but not more than p) length of a non-repeating sequence of numbers. The values of U 0 , p and M can be conveniently selected from primes.

Exercise: Investigate for what U 0 , p and M the length of a sequence of non-repeating numbers will be at least 10000 for "good" stochastic parameters. Determine if the value of R affects 0 for Mip = const on the statistical characteristics of the sensor. If it does, then determine the range of admissible values U 0 ... Present generator test results for optimal values of p, M and U 0 .

Sensor # 3: Modification of Korobov.

where p is a large prime, for example 2027, 5087, ...

M is an integer that meets the conditions:

n is an integer. Those. choose M close to p / 2 from the set of numbers M = p– 3 n.

For example, for p = 5087 we take n = 7. Because 3 7 = 2187, and 3 8 = 6561 will be greater than p. So: M = 5087-2187 = 2900.

We get numbers U i in the interval = and the number R i in the range (0,1).

Exercise: Choose the world at which the best statistical parameters of the sensor and the greatest length L. are obtained. Find out whether the value of R affects 0 on the stochastic characteristics of the sensor and, if it does, then determine the range of permissible values R 0 ... Present the sensor test results for optimal values of M, pi and R 0 .

Note that, ideally, the density curve of the distribution of random numbers would look like the one shown in Fig. 22.3. That is, in the ideal case, each interval contains the same number of points: N i = N/k , where N- the total number of points, k- the number of intervals, i= 1,…, k .

Rice. 22.3. Frequency diagram of random numbers,
generated by an ideal generator theoretically

It should be remembered that generating an arbitrary random number consists of two stages:

generating a normalized random number (that is, evenly distributed from 0 to 1);
converting normalized random numbers r i to random numbers x i, which are distributed according to the required user (arbitrary) distribution law or in the required interval.

Random number generators are divided into:

physical;
tabular;
algorithmic.

Physical RNG

An example of a physical RNG is: a coin (heads - 1, tails - 0); dice; a drum with an arrow divided into sectors with numbers; hardware noise generator (HS), which is used as a noisy thermal device, for example, a transistor (Fig. 22.4-22.5).

Rice. 22.4. Scheme of the hardware method for generating random numbers
Rice. 22.5. Diagram of obtaining random numbers by hardware method

The task "Generating random numbers using a coin"

Use a coin to generate a random 3-digit number uniformly distributed from 0 to 1. The precision is three decimal places.

The first way to solve the problem
Flip a coin 9 times, and if the coin comes up tails, then write down "0", if heads, then "1". So, let's say that as a result of the experiment, we got a random sequence 100110100.

Draw an interval from 0 to 1. Reading the numbers in sequence from left to right, split the interval in half and select each time one of the parts of the next interval (if it dropped out 0, then the left one, if it dropped 1, then the right one). Thus, you can get to any point in the interval, as accurately as you like.

So, 1 : the interval is halved - and, - the right half is selected, the interval is narrowed:. Next number, 0 : the interval is halved - and, - the left half is selected, the interval is narrowed:. Next number, 0 : the interval is halved - and, - the left half is selected, the interval is narrowed:. Next number, 1 : the interval is halved - and, - the right half is selected, the interval is narrowed:.

By the condition of the accuracy of the problem, the solution has been found: it is any number from the interval, for example, 0.625.

In principle, if you approach strictly, then the division of the intervals must be continued until the left and right boundaries of the found interval COINCIDE each other with an accuracy of the third decimal place. That is, from the point of view of accuracy, the generated number will no longer be distinguishable from any number from the interval in which it is located.

The second way to solve the problem
Let's split the resulting binary sequence 100110100 into triads: 100, 110, 100. After converting these binary numbers to decimal we get: 4, 6, 4. Substituting "0." in front, we get: 0.464. This method can only get numbers from 0.000 to 0.777 (since the maximum that can be "squeezed" out of three binary digits is 111 2 = 7 8) - that is, in fact, these numbers are represented in octal number system. For translate octal numbers in decimal we will execute the representation:
0.464 8 = 4 · 8 –1 + 6 · 8 –2 + 4 · 8 –3 = 0.6015625 10 = 0.602 10.
So, the required number is equal to: 0.602.

Tabular RNG

Tabular RNGs use specially compiled tables containing verified uncorrelated, that is, independent of each other, numbers as a source of random numbers. Table 22.1 shows a small fragment of such a table. By traversing the table from left to right from top to bottom, you can get random numbers uniformly distributed from 0 to 1 with the required number of decimal places (in our example, we use three decimal places for each number). Since the numbers in the table do not depend on each other, the table can be traversed in different ways, for example, from top to bottom, or from right to left, or, say, you can select numbers that are in even positions.

Table 22.1.
Random numbers. Evenly
distributed from 0 to 1 random numbers

Random numbers								Evenly distributed from 0 to 1 random numbers
9	2	9	2	0	4	2	6	0.929
9	5	7	3	4	9	0	3	0.204
5	9	1	6	6	5	7	6	0.269

The advantage of this method is that it gives truly random numbers, since the table contains verified uncorrelated numbers. Disadvantages of the method: it takes a lot of memory to store a large number of digits; great difficulties in generating and checking such tables, repetitions when using a table no longer guarantee the randomness of the numerical sequence, and therefore the reliability of the result.

There is a table containing 500 absolutely random verified numbers (taken from the book by I. G. Venetsky, V. I. Venetskaya "Basic mathematical and statistical concepts and formulas in economic analysis").

Algorithmic RNG

The numbers generated using these RNGs are always pseudo-random (or quasi-random), that is, each subsequent generated number depends on the previous one:

r i + 1 = f(r i) .

Sequences made up of such numbers form loops, that is, there is necessarily a cycle that repeats an infinite number of times. The repeating cycles are called periods.

The advantage of RNG data is speed; generators practically do not require memory resources, they are compact. Disadvantages: the numbers cannot be fully called random, since there is a dependence between them, as well as the presence of periods in the sequence of quasi-random numbers.

Consider several algorithmic methods for obtaining an RNG:

middle squares method;
method of middle products;
mixing method;
linear congruent method.

Mean square method

There is some four-digit number R 0. This number is squared and entered into R one . Further from R 1 is taken the middle (four middle digits) - a new random number - and written in R 0. Then the procedure is repeated (see fig. 22.6). Note that in fact, it is not necessary to take as a random number ghij, a 0.ghij- with a zero and a decimal point assigned to the left. This fact is reflected as in Fig. 22.6 and in subsequent similar figures.

Rice. 22.6. Mean squares scheme

Disadvantages of the method: 1) if at some iteration the number R 0 becomes equal to zero, then the generator degenerates, so the correct choice of the initial value is important R 0; 2) the generator will repeat the sequence through M n steps (at best), where n- digit capacity R 0 , M- the base of the number system.

For example, in Fig. 22.6: if number R 0 will be represented in binary notation, the sequence of pseudo-random numbers will be repeated in 2 4 = 16 steps. Note that repetition of the sequence can occur earlier if the initial number is not chosen well.

The method described above was proposed by John von Neumann and dates back to 1946. Since this method proved to be unreliable, it was quickly abandoned.

Method of middle products

Number R 0 is multiplied by R 1, from the obtained result R 2 extract the middle R 2 * (this is another random number) and multiplied by R one . All subsequent random numbers are calculated using this scheme (see Fig. 22.7).

Rice. 22.7. Method of middle products

Stirring method

The shuffle method uses operations to cyclically shift the contents of a cell to the left and right. The idea of the method is as follows. Let the cell store the seed R 0. Cyclically shifting the contents of the cell to the left by 1/4 of the cell length, we get a new number R 0 *. Likewise, cyclically shifting the contents of a cell R 0 to the right by 1/4 of the cell length, we get the second number R 0 **. Sum of numbers R 0 * and R 0 ** gives a new random number R one . Further R 1 is entered in R 0, and the entire sequence of operations is repeated (see Figure 22.8).

Rice. 22.8. Mixing method diagram

Please note that the number resulting from the summation R 0 * and R 0 **, may not fit completely in the cell R one . In this case, extra digits should be discarded from the received number. Let us explain this for Fig. 22.8, where all cells are represented by eight binary digits. Let R 0 * = 10010001 2 = 145 10 , R 0 ** = 10100001 2 = 161 10 , then R 0 * + R 0 ** = 100110010 2 = 306 10 ... As you can see, the number 306 occupies 9 digits (in the binary number system), and the cell R 1 (like R 0) can hold a maximum of 8 bits. Therefore, before entering the value into R 1 it is necessary to remove one "extra", the leftmost bit from the number 306, as a result of which in R 1 will go not 306, but 00110010 2 = 50 10. Also note that in languages such as Pascal, the "truncation" of extra bits when a cell is overflowed is done automatically in accordance with the specified type of variable.

Linear congruent method

The linear congruent method is one of the simplest and most commonly used procedures for simulating random numbers. This method uses the mod ( x, y), which returns the remainder of the first argument divided by the second. Each subsequent random number is calculated based on the previous random number using the following formula:

r i+ 1 = mod ( k · r i + b, M) .

A sequence of random numbers obtained using this formula is called linear congruent sequence... Many authors call a linear congruent sequence for b = 0 multiplicative congruent method, and at b ≠ 0 mixed congruent method.

For a high-quality generator, you need to choose suitable coefficients. It is necessary that the number M was quite large, since the period cannot have more M elements. On the other hand, the division used in this method is a rather slow operation, so it would be logical for a binary computing machine to choose M = 2 N, since in this case finding the remainder of the division is reduced inside the computer to a binary logical operation "AND". It is also common to choose the largest prime number M less than 2 N: in the special literature it is proved that in this case the least significant bits of the resulting random number r i+ 1 behave as randomly as the older ones, which has a positive effect on the entire sequence of random numbers as a whole. An example is one of Mersenne numbers equal to 2 31 - 1, and thus M= 2 31 - 1.

One of the requirements for linear congruent sequences is the largest possible period length. The length of the period depends on the values M , k and b... The theorem that we present below allows us to determine whether it is possible to achieve a period of maximum length for specific values M , k and b .

Theorem... Linear congruent sequence defined by numbers M , k , b and r 0, has a period of length M if and only if:

the numbers b and M mutually simple;
k- 1 multiple p for every simple p which is a divisor M ;
k- 1 multiple of 4 if M multiple of 4.

Finally, let's conclude with a couple of examples of using the linear congruential method to generate random numbers.

It was found that a series of pseudo-random numbers generated from the data from example 1 will be repeated every M/ 4 numbers. Number q is set arbitrarily before starting the calculations, but it should be borne in mind that the series gives the impression of being random for large k(which means that q). The result can be slightly improved if b odd and k= 1 + 4 q - in this case, the row will be repeated every M numbers. After a long search k the researchers settled on the values 69069 and 71365.

A random number generator using the data from Example 2 will produce random non-repeating numbers with a period of 7 million.

The multiplicative method for generating pseudo-random numbers was proposed by D. H. Lehmer in 1949.

Checking the quality of the generator

The quality of the entire system and the accuracy of the results depend on the quality of the RNG. Therefore, the random sequence generated by the RNG must satisfy a number of criteria.

The checks performed are of two types:

checks for uniformity of distribution;
checks for statistical independence.

Distribution uniformity checks

1) RNG should produce close to the following values of statistical parameters characteristic of a uniform random law:

2) Frequency test

The frequency test allows you to find out how many numbers fall into the interval (m r σ r ; m r + σ r) , that is (0.5 - 0.2887; 0.5 + 0.2887) or, ultimately, (0.2113; 0.7887). Since 0.7887 - 0.2113 = 0.5774, we conclude that in a good RNG, about 57.7% of all dropped random numbers should fall into this interval (see Fig. 22.9).

Rice. 22.9. Frequency diagram of an ideal RNG
in case of checking it for a frequency test

It is also necessary to take into account that the number of numbers that fall into the interval (0; 0.5) should be approximately equal to the number of numbers that fall into the interval (0.5; 1).

3) Chi-square test

Chi-square test (χ 2 test) is one of the most famous statistical tests; it is the main method used in combination with other criteria. The chi-square test was proposed in 1900 by Karl Pearson. His remarkable work is regarded as the foundation of modern mathematical statistics.

For our case, the chi-square test will allow us to find out how much the real The RNG is close to the RNG standard, that is, whether it meets the requirement of uniform distribution or not.

Frequency diagram reference The RNG is shown in Fig. 22.10. Since the distribution law of the reference RNG is uniform, the (theoretical) probability p i hitting numbers in i th interval (all these intervals k) is equal to p i = 1/k ... And thus, in each of k intervals will fall smooth on p i · N numbers ( N Is the total number of generated numbers).

Rice. 22.10. Frequency diagram of the reference RNG

A real RNG will produce numbers distributed (and not necessarily evenly!) k intervals and each interval will include n i numbers (in the sum n 1 + n 2 + ... + n k = N ). How can we determine how good the tested RNG is and how close to the reference one? It is quite logical to consider the squares of the differences between the received number of numbers. n i and "reference" p i · N ... Let's add them, and as a result we get:

χ 2 exp. = ( n 1 - p one · N) 2 + (n 2 - p 2 N) 2 + ... + ( n k p k · N) 2 .

It follows from this formula that the smaller the difference in each of the terms (and hence the smaller the value of χ 2 exp.), The stronger the distribution law of random numbers generated by the real RNG tends to be uniform.

In the previous expression, each of the terms is assigned the same weight (equal to 1), which in fact may not correspond to reality; therefore, for the chi-square statistic, it is necessary to normalize each i-th term by dividing it by p i · N :

Finally, we write the resulting expression more compactly and simplify it:

We have obtained the value of the chi-square test for experimental data.

Table 22.2 are given theoretical chi-square values (χ 2 theory), where ν = N- 1 is the number of degrees of freedom, p Is the user-defined confidence level that indicates how much the RNG should satisfy the uniform distribution requirements, or p this is the probability that the experimental value of χ 2 exp. will be less than the tabulated (theoretical) χ 2 theory. or equal to him.

Table 22.2.
Some percentage points of the χ 2 distribution

	p = 1%	p = 5%	p = 25%	p = 50%	p = 75%	p = 95%	p = 99%
ν = 1	0.00016	0.00393	0.1015	0.4549	1.323	3.841	6.635
ν = 2	0.02010	0.1026	0.5754	1.386	2.773	5.991	9.210
ν = 3	0.1148	0.3518	1.213	2.366	4.108	7.815	11.34
ν = 4	0.2971	0.7107	1.923	3.357	5.385	9.488	13.28
ν = 5	0.5543	1.1455	2.675	4.351	6.626	11.07	15.09
ν = 6	0.8721	1.635	3.455	5.348	7.841	12.59	16.81
ν = 7	1.239	2.167	4.255	6.346	9.037	14.07	18.48
ν = 8	1.646	2.733	5.071	7.344	10.22	15.51	20.09
ν = 9	2.088	3.325	5.899	8.343	11.39	16.92	21.67
ν = 10	2.558	3.940	6.737	9.342	12.55	18.31	23.21
ν = 11	3.053	4.575	7.584	10.34	13.70	19.68	24.72
ν = 12	3.571	5.226	8.438	11.34	14.85	21.03	26.22
ν = 15	5.229	7.261	11.04	14.34	18.25	25.00	30.58
ν = 20	8.260	10.85	15.45	19.34	23.83	31.41	37.57
ν = 30	14.95	18.49	24.48	29.34	34.80	43.77	50.89
ν = 50	29.71	34.76	42.94	49.33	56.33	67.50	76.15
ν > 30	ν + sqrt (2 ν ) · x p+ 2/3 x 2 p- 2/3 + O(1 / sqrt ( ν ))
x p =	–2.33	–1.64	–0.674	0.00	0.674	1.64	2.33

Considered acceptable p from 10% to 90%.

If χ 2 exp. much more than χ 2 theory. (that is p- great), then the generator does not satisfy uniform distribution requirement, since the observed values n i go too far from theoretical p i · N and cannot be considered random. In other words, the confidence interval is so large that the constraints on the numbers become very loose, the requirements on the numbers are weak. In this case, a very large absolute error will be observed.

Even D. Knut in his book "The Art of Programming" noted that having χ 2 exp. small, too, in general, is not good, although it seems, at first glance, wonderful from the point of view of uniformity. Indeed, take a series of numbers 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, ... - they are ideal from the point of view of uniformity, and χ 2 exp. will be practically zero, but you are unlikely to recognize them as random.

If χ 2 exp. much less than χ 2 theory. (that is p- little), then the generator does not satisfy the requirement of a random uniform distribution, since the observed values n i too close to theoretical p i · N and cannot be considered random.

But if χ 2 exp. lies in a certain range, between two values of χ 2 theor. which correspond, for example, p= 25% and p= 50%, then we can assume that the values of random numbers generated by the sensor are completely random.

In addition, it should be borne in mind that all values p i · N must be large enough, for example, more than 5 (found out empirically). Only then (with a sufficiently large statistical sample) can the experimental conditions be considered satisfactory.

So, the verification procedure is as follows.

Statistical independence tests

1) Check for the frequency of occurrence of a digit in a sequence

Let's look at an example. The random number 0.2463389991 consists of the digits 2463389991, and the number 0.5467766618 consists of the digits 5467766618. Combining the sequences of digits, we have: 24633899915467766618.

It is clear that the theoretical probability p i fallouts i-th digit (from 0 to 9) is 0.1.

2) Checking the appearance of series of identical numbers

Let us denote by n L number of series of consecutive digits of length L... Everything needs to be checked L from 1 to m, where m Is a user-specified number: the maximum occurring number of identical digits in a series.

In the example "24633899915467766618" 2 series of length 2 (33 and 77) were found, that is n 2 = 2 and 2 series 3 long (999 and 666), that is n 3 = 2 .

The probability of occurrence of a series with a length of L is equal to: p L= 9 10 - L (theoretical). That is, the probability of a series of one character in length is: p 1 = 0.9 (theoretical). The probability of a series of two characters in length is: p 2 = 0.09 (theoretical). The probability of a streak of three characters in length is: p 3 = 0.009 (theoretical).

For example, the probability of a series of one character in length is p L= 0.9, since there can be only one character out of 10, and there are 9 characters in total (zero does not count). And the probability that two identical symbols "XX" will occur in a row is 0.1 · 0.1 · 9, that is, the probability of 0.1 that the symbol "X" will appear in the first position is multiplied by the probability 0.1 that the same symbol will appear in the second position "X" and multiplied by the number of such combinations 9.

The frequency of the appearance of the series is calculated according to the previously analyzed "chi-square" formula using the values p L .

Note: the generator can be checked many times, however the checks are not complete and do not guarantee that the generator produces random numbers. For example, a generator issuing the sequence 12345678912345 ... will be considered ideal during checks, which is obviously not entirely true.

In conclusion, we note that the third chapter of Donald E. Knuth's book "The Art of Programming" (volume 2) is completely devoted to the study of random numbers. It explores various methods for generating random numbers, statistical criteria for randomness, and the conversion of uniformly distributed random numbers to other types of random variables. More than two hundred pages have been devoted to the presentation of this material.

Getting and converting random numbers.

There are two main ways to get random numbers:

1) Random numbers are generated by a special electronic attachment (random number generator) installed on a computer. The implementation of this method requires almost no additional operations, except for accessing the random number generator.

2) Algorithmic method - based on the formation of random numbers in the machine itself by means of a special program. The disadvantage of this method is the additional consumption of computer time, since in this case the machine performs the operations of the electronic device itself.

The program for generating random numbers with a given distribution law can be cumbersome. Therefore, random numbers with a given distribution law are usually obtained not directly, but by transforming random numbers that have some kind of standard distribution. Often such a standard distribution is a uniform distribution (ease of obtaining and convenience of transformation into other laws).

It is most profitable to obtain random numbers with a uniform law with the help of an electronic attachment that frees the computer from additional costs of computer time. Obtaining a purely uniform distribution on a computer is impossible due to the limited discharge grid. Therefore, instead of a continuous collection of numbers on the interval (0, 1), a discrete collection of 2 n numbers, where n- bit width of the machine word.

The distribution law of such a population is called quasi-uniform ... At n³20, the differences between uniform and quasi-uniform laws become insignificant.

To obtain quasi-uniform random numbers, 2 methods are used:

1) generating random numbers using an electronic device by simulating some random processes;

2) obtaining pseudo-random numbers using special algorithms.

For getting n-digit binary random number according to the first method, a sequence of independent random variables is modeled z i, taking the value 0 or 1. the resulting sequence of 0 and 1, if we consider it as a fractional number, and is a random value of a quasi-uniform distribution on the interval (0, 1). The hardware methods for getting these numbers differ only in the way of getting the implementation z i.

One of the methods is based on counting the number of radioactive particles over a certain period of time. Dt if the number of particles for Dt even then z i=1 , and if odd, then z i=0 .

Another method uses the noise effect of a vacuum tube. Fixing the value of the noise voltage at certain points in time t i, we obtain the values of independent random variables U (t i), i.e. voltage (Volts).

The magnitude z i determined by law:

where a- some value of the threshold voltage.

The magnitude a usually selected from the condition:

The disadvantage of the hardware method is that it does not allow using the double run method to control the operation of the algorithm for solving a problem, since the same random numbers cannot be obtained during a repeated run.

Pseudo-random are called numbers formed on a computer using special programs in a recurrent way: each random number is obtained from the previous one using special transformations.

The simplest of these transformations is as follows. Let there be some n- bit binary number from the interval nÎ (0, 1). Let's square it, and we will get already 2 n bit number. Let's highlight the average n discharges. Obtained in this way n Is a bit number and will be the new value of the random number. We square it again, etc. This sequence is pseudo-random, since from a theoretical point of view, it is not accidental.

The disadvantage of recurrent algorithms is that sequences of random numbers can degenerate (for example, we will receive only the zero sequence or a sequence of ones, or periodicity may appear).