Some results on new statistical randomness tests based on length of runs

Các dãy và các số ngẫu nhiên đóng
một vai trò rất quan trọng trong mật mã. Trong
các nguyên thuỷ mật mã đối xứng, khoá bí mật
chính là thành phần quan trọng nhất nhằm đảm
bảo tính an toàn của chúng. Trong khi đó, các
giao thức mật mã hay lược đồ chữ ký số cũng phụ
thuộc nhiều vào các giá trị ngẫu nhiên. Ngoài ra,
một trong các tiêu chí để đánh giá tính an toàn
cho các nguyên thuỷ mật mã như mã khối, hàm
băm là đánh giá tính ngẫu nhiên đầu ra. Do đó,
việc đánh giá tính ngẫu nhiên theo các kiểm tra
thống kê thực sự rất quan trọng đối với việc đánh
giá tính an toàn của các thuật toán mật mã.
Trong bài báo này, chúng tôi trình bày một số kết
quả nghiên cứu về các tiêu chuẩn kiểm tra loạt
dựa trên độ dài đã được đề xuất bởi A.
Doğanaksoy cùng đồng sự năm 2015. Đầu tiên,
chúng tôi chỉ ra rằng một số giá trị xác suất cho
các loạt độ dài 1 và 2 là chưa chính xác và đề xuất
chỉnh sửa. Sau đó, chúng tôi đã đưa ra và chứng
minh cho trường hợp tổng quát các loạt có độ dài
k bất kỳ. Cuối cùng, chúng tôi đã xây dựng một
công cụ kiểm tra tính ngẫu nhiên dựa trên độ dài
các loạt và áp dụng đánh giá cho các nguồn ngẫu
nhiên thực sự.
Download
Trang 1
Trang 2
Trang 3
Trang 4
Trang 5
Trang 6
Trang 7
Trang 8
Trang 9
9 trang minhkhanh 2620
Download
Bạn đang xem tài liệu "Some results on new statistical randomness tests based on length of runs", để tải tài liệu gốc về máy hãy click vào nút Download ở trên
Tóm tắt nội dung tài liệu: Some results on new statistical randomness tests based on length of runs

 Số 2.CS (08) 2018 10 
Linh Hoang Dinh
Abstract— Random Sequences and random 
numbers play a very important role in 
cryptography. In symmetric cryptography 
primitives, a secret key is the most important 
component to ensure their security. While 
cryptographic protocols or digital signature 
schemes are also strongly dependent on random 
values. In addition, one of the criteria for 
evaluating security for cryptographic primitives 
such as block cipher, hash function... is to 
evaluate the output randomness. Therefore, the 
assessment of randomness according to statistical 
tests is really important for measuring the 
security of cryptographic algorithms. In this 
paper, we present some research results on 
randomness tests based on the length of runs 
proposed by A. Doğanaksoy et al in 2015. First, 
we show that some probability values for tests 
based on lengths 1 and 2 are inaccurate and 
suggest editing. Secondly, we have given and 
demonstrated for the general case the runs of 
any length k. Finally, we built a randomness 
testing tool and applied evaluations to true 
random sources. 
Tóm tắt— Các dãy và các số ngẫu nhiên đóng 
một vai trò rất quan trọng trong mật mã. Trong 
các nguyên thuỷ mật mã đối xứng, khoá bí mật 
chính là thành phần quan trọng nhất nhằm đảm 
bảo tính an toàn của chúng. Trong khi đó, các 
giao thức mật mã hay lược đồ chữ ký số cũng phụ 
thuộc nhiều vào các giá trị ngẫu nhiên. Ngoài ra, 
một trong các tiêu chí để đánh giá tính an toàn 
cho các nguyên thuỷ mật mã như mã khối, hàm 
băm là đánh giá tính ngẫu nhiên đầu ra. Do đó, 
việc đánh giá tính ngẫu nhiên theo các kiểm tra 
thống kê thực sự rất quan trọng đối với việc đánh 
giá tính an toàn của các thuật toán mật mã. 
Trong bài báo này, chúng tôi trình bày một số kết 
quả nghiên cứu về các tiêu chuẩn kiểm tra loạt 
dựa trên độ dài đã được đề xuất bởi A. 
Doğanaksoy cùng đồng sự năm 2015. Đầu tiên, 
chúng tôi chỉ ra rằng một số giá trị xác suất cho 
các loạt độ dài 1 và 2 là chưa chính xác và đề xuất 
chỉnh sửa. Sau đó, chúng tôi đã đưa ra và chứng 
This manuscript is received on December 1, 2018. It is 
commented on December 6, 2018 and is accepted on 
December 12, 2018 by the first reviewer. It is commented on 
December 16, 2018 and is accepted on December 22, 2018 
by the second reviewer. 
minh cho trường hợp tổng quát các loạt có độ dài 
k bất kỳ. Cuối cùng, chúng tôi đã xây dựng một 
công cụ kiểm tra tính ngẫu nhiên dựa trên độ dài 
các loạt và áp dụng đánh giá cho các nguồn ngẫu 
nhiên thực sự. 
Keywords— Randomness testing; Block 
cipher; Hash function; Runs. 
Từ khóa— Kiểm tra tính ngẫu nhiên; Mã 
khối; Hàm băm; Loạt. 
I. INTRODUCTION 
Statistical randomness tests play an 
important role in assessing the security of 
cryptographic algorithms. There have been 
many independently statistical randomness tests 
in the literature. Knuth [1] presented a number 
of statistical tests including frequency check, 
serial test, poker test, series test (run)... Another 
test suite is the DIEHARD tests [2] including 18 
statistical tests. In addition, there is a Crypt-XS 
test suite [3] proposed by the Information 
Security Research Center of Queensland 
University of Technology. Finally, the currently 
widely used test suite is the SP 800-22 
statistical test suite [4] originally developed by 
NIST with 16 tests but then shortened to 15 
tests (omitted Lempel-Ziv complexity test). 
In addition, there are a number of 
randomness testing standards that are not 
presented in test suites or independently used. 
In 1992 Maurer proposed a universal statistical 
test for random bit generators. In 2004, 
Hernandez et al. proposed a new test called the 
Strict Avalanche Criterion (SAC)... And most 
recently, Doğanaksoy et al. [5] proposed three 
new randomness tests based on the length of 
runs in 2015. 
Our Contributions. In this paper, we 
present some new results on randomness tests 
based on the length of runs. Specifically, we 
have given and demonstrated in detail the 
probability calculation formula for the general 
case of a binary sequence of length n having 
exactly lk runs of length k, with 1 k n . 
Furthermore, we have shown that some 
Some results on new statistical randomness 
tests based on length of runs
Nghiên cứu Khoa học và Công nghệ trong lĩnh vực An toàn thông tin 
 Số 2.CS (08) 2018 11 
probability values given in [5] are inaccurate, 
and this may lead to a mistake in assessing the 
randomness of the input sequences, thereby 
giving to incorrect assertions about the security 
of cryptographic algorithms. Finally, we built a 
tool that quickly, accurately and efficiently 
checks a data file that is random or not using 
three new randomness tests based on length 
of runs. 
Construction. The rest of the paper 
includes: Part II presents three postulates on 
randomness given by Golomb [6] as well as 
some tests related to runs. The new results of 
research on tests based on length of run are 
presented in section III. In section IV, we 
present a randomness assessment tool using 3 
new tests. Finally, the conclusions and future 
research directions are presented in Section V. 
II. PRELIMINARIES 
In this section, we present the three 
propositions of randomness given by Golomb in 
[6]. This is one of the bases for assessing the 
randomness of a sequence. Then, we outlined 
some of the testing standards related to the runs 
as well as the reasons for studying new 
standards based on length of runs. 
A. Golomb’s Randomness Postulates 
Let 
0 1 1
, , , ,
n
S s s s
  
be an infnite binary 
sequence periodic with n or a finite sequence of 
length n. A run is defined as an uninterrupted 
maximal sequence of identical bits. Runs of 0’s 
are called gap; runs of 1’s are called block. R1, 
R2, and R3 are Golomb’s randomness 
postulates which are given as follows: 
 (R1) In a p ... ty values for runs of length 
one test for 128-bit blocks 
 Interval Probability 
Box 1 0-27 0.2191945278 
Box 2 28-31 0.2304573984 
Box 3 32-34 0.1843489091 
Box 4 35-38 0.1945435197 
Box 5 39-128 0.1714556450 
Total 1 
Remark 1: In the Table 1, we have use the 
intervals given in [5], however the calculated 
probabilities of Box 4 and Box 5 are not match 
with the probabilities given in [5]. After 
retesting, we find that the authors in [5] give 
correct intervals but wrong probabilities. The 
correct probabilities are as in Table 1. 
Moreover, the probabilities given in [5] are 
belong to the intervals 35-40, and 41-128, 
that can not belong to the intervals 35-38 
and 39-128. 
Similarly, we can calculate probabilitiy 
intervals for sequences with different lengths. 
The subinterval probabilities for runs of length 
1 can be seen in Table 2. 
Table 2. Interval and probability values for runs of length 
one test for 64, 128, 256, and 512-bit blocks 
 n = 64 n = 128 
Interva
l 
Probability Interval Probability 
Box 
1 
0-12 
0.19008234
44 
0-27 
0.21919452
78 
Box 
2 
13-15 
0.23887746
37 
28-31 
0.23045739
84 
Box 
3 
16-17 
0.17456037
41 
32-34 
0.18434890
91 
Box 
4 
18-20 
0.21147040
14 
35-38 
0.19454351
97 
Box 
5 
21-64 
0.18500941
64 
39-128 
0.17145564
50 
Total 1 1 
 n = 256 n = 512 
Interva
l 
Probability Interval Probability 
Box 
1 
0-56 
0.18725584
09 
0-117 
0.19356638
36 
Box 
2 
57-61 
0.18928091
85 
118-125 
0.21863011
42 
Box 
3 
62-66 
0.21985945
18 
126-132 
0.21707667
90 
Box 
4 
67-72 
0.21877592
27 
133-140 
0.19951554
92 
Box 
5 
73-256 
0.18482786
61 
141-512 
0.17121127
40 
Total 1 1 
Nghiên cứu Khoa học và Công nghệ trong lĩnh vực An toàn thông tin 
 Số 2.CS (08) 2018 15 
Remark 2: In [5], the authors give the 
intervals and the corresponding probabilities for 
runs of length 1. However, we found that some 
value in [5] are incorrect! For 64n , if we use 
the intervals in [5], then the correct probabilities 
should be: 
Table 2.1. Interval and probability values for runs of 
length 1 test for 64-bit blocks 
 n = 64 
 Interval Probability 
Box 1 0-13 0.2613425337 
Box 2 14-16 0.2561417553 
Box 3 17-18 0.1659176815 
Box 4 19-21 0.1812433426 
Box 5 22-64 0.1353546869 
Total 1 
These probabilities are not match with the 
probabilities given in [5]. Moreover, these 
intervals are not equivalent. Therefore, we have 
re-divide into new intervals and recalculate 
probabilities in new intervals. Interestingly, 
these probability values approximate the values 
given in [5] but belong to other intervals. 
Similar to the case 128n , intervals and the 
probability values given in [5] is not match. 
Specifically, if we take the given intervals in 
[5], we have recalculated the probabilities 
exactly as shown in Table 1 and Table 2. If we 
use the intervals as follows, the probability 
values coincide with the values given in [5]. 
Table 2.2. Interval and probability values for runs of 
length 1 test for 128-bit blocks 
 n = 128 
 Interval Probability 
Box 1 0-26 0.1731718548 
Box 2 27-30 0.2142651725 
Box 3 31-33 0.1869770204 
Box 4 34-37 0.2133929800 
Box 5 38-128 0.2121929722 
Total 0.9999999999 
In case 2k , we have calculated all 
probability values and divide into subintervals 
as follow: 
Table 3. Interval and probability values for runs of length 
two test for 64, 128, 256, and 512-bit blocks 
 n = 64 n = 128 
 Interval Probability Interval Probability 
Box 
1 
0-5 
0.16134454
44 
0-12 
0.16707580
01 
Box 
2 
6-7 
0.26096400
39 
13-14 
0.17407560
80 
Box 
3 
8 
0.14909396
94 
15-16 
0.20979407
61 
Box 
4 
9-10 
0.24528775
67 
17-19 
0.26659050
46 
Box 
5 
11-32 
0.18330972
55 
20-64 
0.18246401
11 
Total 
0.99999999
99 
0.99999999
99 
 n = 256 n = 512 
 Interval Probability Interval Probability 
Box 
1 
0-27 
0.19257931
49 
0-57 
0.18893841
82 
Box 
2 
28-30 
0.19405196
89 
58-61 
0.17879497
30 
Box 
3 
31-33 
0.22292350
33 
62-65 
0.21049627
69 
Box 
4 
34-36 
0.18785330
79 
66-70 
0.22561551
70 
Box 
5 
37-128 
0.20259190
51 
71-256 
0.19615481
50 
Total 
1.00000000
01 
1.00000000
01 
In case 3k , we have calculated all 
probability values and divide into subintervals 
as follow: 
Table 4. Interval and probability values for runs of length 
three test for 64, 128, 256, and 512-bit blocks 
 n = 64 n = 128 
 Interval Probability Interval Probability 
Box 
1 
0-2 
0.20782508
99 
0-5 
0.16320900
84 
Box 
2 
3 
0.20431981
09 
6-7 
0.27450079
90 
Box 
3 
4 
0.21673204
08 
8 
0.15485472
29 
Box 
4 
5-6 
0.28324547
60 
9-10 
0.24505901
18 
Box 
5 
7-21 
0.08787758
25 
11-42 
0.16237645
79 
Total 
1.00000000
01 
1.00000000
00 
 n = 256 n = 512 
Interv
al 
Probability 
Interva
l 
Probability 
Box 
1 
0-13 
0.248734758
4 
0-27 
0.189852605
4 
Box 
2 
14-15 
0.207164715
8 
28-30 
0.201497107
3 
Box 
3 
16-17 
0.213743768
1 
31-33 
0.231181647
3 
Box 
4 
18-20 
0.222144506
9 
34-36 
0.190006612
6 
Box 
5 
20-85 
0.108212250
8 
37-170 
0.187462027
4 
Total 
1.000000000
0 
1.000000000
0 
Journal of Science and Technology on Information Security 
16 Số 2.CS (08) 2018 
Remark 3: In the case of 512n we used 
Magma software to divide the intervals and 
calculate the probability values because it takes 
quite a long time to run in C ++ language. The 
calculation time on Magma for this case is 
about 5000 seconds. 
C. Tests Descriptions 
After calculating the probabilities, we begin 
to build a new test based on the number of runs 
of length k. Specifically, to test a sequence of 
N n m bits, where n is the block size we 
choose. Or we consider m outputs of a 
cryptographic primitive (a block cipher or a 
hash function) that have output block size is n-
bit. First, we'll count the number of runs of 
length k of each sequence in m blocks, and 
increases the count value of the corresponding 
sub-interval to 1. After calculating, we record 
the counting values of each sub-interval, 
denoted by 
1 2 5
, , ,F F F respectively. We use the 
approach as in [8], using χ2 test to evaluate the 
randomness of the sequence. 
Consider: 
2
5
2
1
Pr
Pr
i i
i i
F m
m

  , 
Lastly p-value is calculated according to the 
given values: 
25 1
,
2 2
p value igamc
 
. 
By comparing the produced p-value with the 
level of significance , we can conclude about 
the randomness of the input sequence. 
Note that for 2 test, we require Pr 5
i
m 
therefore for Pr 0.2
i
 we need 25m . Thus, 
the new tests can be applied for short sequences 
of length N n m bit for 25m . 
In addition, counting the total runs numbers and 
runs numbers with length k of a sequence by 
definition is difficult. Therefore, we use the 
concept of “derivative” of a sequence. 
Definition 1 (Remark 11, [5]) (derivative of 
a sequence) Let 
0 1 1
, , ,
n
S s s s
  be a binary 
sequence of length n, the derivative of S, 
denoted by 
0 1 1
, , ,
n
S s s s
  is defined as 
follows 
1
 if 0,1, , 2,
1 if 1.
i i
i
s s i n
s
i n
  

Also we use a variation of S , denoted by 
S of length 1n by adding 1’s at the 
beginning the sequence S . The variation of 
derivative is an important part of new defned 
run tests, since the number of runs of diﬀerent 
length is determined by this sequence. 
Let 
0 1 1
, , ,
n
S s s s
  be a binary sequence 
and derivative of S is 
0 1 1
, , ,
n
S s s s
  . 
 Then, 
0 1
, , ,
n
S s s s  is defined as 
follows 
1
 if 1,2, , ,
1 if 0.
i
i
s i n
s
i

It is easy to prove that the total runs number 
of a sequence is the weight of the derivative of 
that sequence, and the runs number with the 
length k in a given sequence S is the number of 
samples 
1
10 01
k 
 overlaps in S . 
Algorithm 2 presents the pseudocode of the 
test based on the number of runs of length k: 
Algorithm 2: Test based on the number of 
runs of length k 1 21 2, , , , , , , mm k k k kS S S L l l l   
,0 ,1 ,
, , ,
i i i i n
S s s s  
0, 0i
k
i l  
while j n k do 
1 2 0
, , 1 , 2 ,
2 2 2 2k k k
i j i j i j i j k
temp s s s s 
 
 if 2 1ktemp then 
 1i i
k k
l l 
 end if 
 1i i 
end while 
Applying χ2 test for 
k
L 
return p-value. 
IV. SOME EXPERIMENTAL RESULTS 
 We have developed a randomness test 
program using tests based on runs of lengths 1, 
2 and 3. The program interface is shown in 
Figure 1. Specifically, we have tested 4 files 
true random: sample1.rng, sample2.rng, 
sample3.rng, sample4.rng (these true random 
files are actually 32 KB in size, that is, bits of 
32 1024 8N length, downloaded at 
 with 
the following cases n = 64, 128, 256 and 512. 
The results of all 4 files have passed 3 new runs 
Nghiên cứu Khoa học và Công nghệ trong lĩnh vực An toàn thông tin 
 Số 2.CS (08) 2018 17 
tests with n = 64, 128, 256 and 512. 
Specifically, for the case n = 64, select the 
significance level 0.01 , the file to be 
checked is sample1.rng, we get the result as 
shown in Figure 1: 
Fig 1. The program interface of 3 new runs tests for 
n = 64, 0.01 for file sample1.rng 
Similarly, we perform tests for the cases 
128,256, 512n and for files sample2.rng, 
sample3.rng, sample4.rng. The results are 
summarized in the following Table 5: 
Table 5. Results of 3 new runs tests for 
true random files 
Case n = 64 
sample
1.rng 
sample2.r
ng 
sample3.r
ng 
sample4.r
ng 
Runs of 
length 
1 test 
0.2654
71 
0.177239 0.249560 0.857602 
Runs of 
length 
2 test 
0.5320
56 
0.054239 0.509319 0.219101 
Runs of 
length 
3 test 
0.5785
69 
0.832500 0.445590 0.941098 
Case n = 128 
sample
1.rng 
sample2.r
ng 
sample3.r
ng 
sample4.r
ng 
Runs of 
length 
1 test 
0.0897
78 
0.601941 0.251491 0.941470 
Runs of 
length 
2 test 
0.1695
05 
0.435659 0.645554 0.416198 
Runs of 
length 
3 test 
0.2641
85 
0.893517 0.393173 0.978088 
Case n = 256 
sample
1.rng 
sample2.r
ng 
sample3.r
ng 
sample4.r
ng 
Runs of 
length 
1 test 
0.6409
39 
0.308548 0.272620 0.422990 
Runs of 
length 
2 test 
0.8992
93 
0.231489 0.571770 0.779767 
Runs of 
length 
3 test 
0.5063
32 
0.814081 0.011770 0.591287 
Case n = 512 
sample
1.rng 
sample2.r
ng 
sample3.r
ng 
sample4.r
ng 
Runs of 
length 
1 test 
0.1205
85 
0.292832 0.480338 0.861397 
Runs of 
length 
2 test 
0.2522
12 
0.821268 0.105730 0.726579 
Runs of 
length 
3 test 
0.8111
72 
0.471682 0.110607 0.834620 
V. CONCLUSION 
In this paper, we present some results on new 
randomness tests based on length of run 
proposed by A. Doğanaksoy et al. [5]. First, we 
have given and demonstrated in detail the 
probability calculation formula for runs of 
length k, with 1 k n . Second, we show that 
some probability values for runs lengths 1 and 2 
are inaccurate and suggest corrections. Third, 
we have built a randomness testing algorithm 
based on the length of runs. Finally, we 
programmed to build an accurate and efficient 
tool to test randomness based on the length of 
runs and apply evaluations to true random 
sources. 
Further research directions: Note that the 
criteria presented in this paper can only be used 
to evaluate sequences of lengths greater than 
512 bits, so it is not applicable to assess 
randomness output for cryptographic primitives 
such as block ciphers or hash functions. To be 
able to evaluate for sequences of length less 
than or equal to 512 bits, we need to recalculate 
the probability distribution for blocks of smaller 
lengths and divide the probability interval 
accordingly. This is an open problem that needs 
further research in the future. In addition, the 
evaluation of probability values for series with a 
length greater than or equal to 4 and the 
correlation between these tests also need further 
consideration and research. 
Journal of Science and Technology on Information Security 
18 Số 2.CS (08) 2018 
REFERENCES 
[1]. M. D. MacLaren, “The art of computer 
programming. Volume 2: Seminumerical 
algorithms (Donald E. Knuth)”, SIAM Review 
12, pp. 306-308, 1970. 
[2]. G. Marsaglia, “The marsaglia random number 
cdrom including the diehard battery of tests of 
randomness, 1995”. URL  stat. fsu. 
edu/pub/diehard, 2008. 
[3]. W. Caelli, “Crypt x package documentation”. 
Information Security Research Centre and 
School of Mathematics, Queensland University 
of Technology, 1992. 
[4]. A. Rukhin, J. Soto, J. Nechvatal, E. Barker, S. 
Leigh, M. Levenson, D. Banks, A. Heckert, J. 
Dray, S. Vo, “Statistical test suite for random 
and pseudorandom number generators for 
cryptographic applications, NIST special 
publication”, 2010. 
[5]. A. Doğanaksoy, F. Sulak, M. Uğuz, O. Şeker, Z. 
Akcengiz, “New statistical randomness tests 
based on length of runs”. Mathematical 
Problems in Engineering, 2015. 
[6]. S. W. Golomb, “Shift register sequences”. 
Aegean Park Press, 1982. 
[7]. P. L'Ecuyer, R. Simard, “TestU01: AC library 
for empirical testing of random number 
generators”. ACM Transactions on 
Mathematical Software (TOMS) 33, 22, 2007. 
[8]. F. Sulak, A. Doğanaksoy, B. Ege, O. Koçak, 
“Evaluation of randomness test results for short 
sequences”, in International Conference on 
Sequences and Their Applications. Springer, pp. 
309-319, 2010. 
ABOUT THE AUTHOR 
B.S. Linh Hoang Dinh 
Workplace: Institute of 
Cryptography Science and 
Technology. 
Email: linhhd@bcy.gov.vn 
The education process: has received 
a mathematical bachelor degree in 
Ha Noi University of Science, in 
2014. 
Research today: symmetric cryptography algorithm.
File đính kèm:
some_results_on_new_statistical_randomness_tests_based_on_le.pdf