Some results on new statistical randomness tests based on length of runs
Các dãy và các số ngẫu nhiên đóng
một vai trò rất quan trọng trong mật mã. Trong
các nguyên thuỷ mật mã đối xứng, khoá bí mật
chính là thành phần quan trọng nhất nhằm đảm
bảo tính an toàn của chúng. Trong khi đó, các
giao thức mật mã hay lược đồ chữ ký số cũng phụ
thuộc nhiều vào các giá trị ngẫu nhiên. Ngoài ra,
một trong các tiêu chí để đánh giá tính an toàn
cho các nguyên thuỷ mật mã như mã khối, hàm
băm là đánh giá tính ngẫu nhiên đầu ra. Do đó,
việc đánh giá tính ngẫu nhiên theo các kiểm tra
thống kê thực sự rất quan trọng đối với việc đánh
giá tính an toàn của các thuật toán mật mã.
Trong bài báo này, chúng tôi trình bày một số kết
quả nghiên cứu về các tiêu chuẩn kiểm tra loạt
dựa trên độ dài đã được đề xuất bởi A.
Doğanaksoy cùng đồng sự năm 2015. Đầu tiên,
chúng tôi chỉ ra rằng một số giá trị xác suất cho
các loạt độ dài 1 và 2 là chưa chính xác và đề xuất
chỉnh sửa. Sau đó, chúng tôi đã đưa ra và chứng
minh cho trường hợp tổng quát các loạt có độ dài
k bất kỳ. Cuối cùng, chúng tôi đã xây dựng một
công cụ kiểm tra tính ngẫu nhiên dựa trên độ dài
các loạt và áp dụng đánh giá cho các nguồn ngẫu
nhiên thực sự.
Trang 1
Trang 2
Trang 3
Trang 4
Trang 5
Trang 6
Trang 7
Trang 8
Trang 9
Tóm tắt nội dung tài liệu: Some results on new statistical randomness tests based on length of runs
Số 2.CS (08) 2018 10 Linh Hoang Dinh Abstract— Random Sequences and random numbers play a very important role in cryptography. In symmetric cryptography primitives, a secret key is the most important component to ensure their security. While cryptographic protocols or digital signature schemes are also strongly dependent on random values. In addition, one of the criteria for evaluating security for cryptographic primitives such as block cipher, hash function... is to evaluate the output randomness. Therefore, the assessment of randomness according to statistical tests is really important for measuring the security of cryptographic algorithms. In this paper, we present some research results on randomness tests based on the length of runs proposed by A. Doğanaksoy et al in 2015. First, we show that some probability values for tests based on lengths 1 and 2 are inaccurate and suggest editing. Secondly, we have given and demonstrated for the general case the runs of any length k. Finally, we built a randomness testing tool and applied evaluations to true random sources. Tóm tắt— Các dãy và các số ngẫu nhiên đóng một vai trò rất quan trọng trong mật mã. Trong các nguyên thuỷ mật mã đối xứng, khoá bí mật chính là thành phần quan trọng nhất nhằm đảm bảo tính an toàn của chúng. Trong khi đó, các giao thức mật mã hay lược đồ chữ ký số cũng phụ thuộc nhiều vào các giá trị ngẫu nhiên. Ngoài ra, một trong các tiêu chí để đánh giá tính an toàn cho các nguyên thuỷ mật mã như mã khối, hàm băm là đánh giá tính ngẫu nhiên đầu ra. Do đó, việc đánh giá tính ngẫu nhiên theo các kiểm tra thống kê thực sự rất quan trọng đối với việc đánh giá tính an toàn của các thuật toán mật mã. Trong bài báo này, chúng tôi trình bày một số kết quả nghiên cứu về các tiêu chuẩn kiểm tra loạt dựa trên độ dài đã được đề xuất bởi A. Doğanaksoy cùng đồng sự năm 2015. Đầu tiên, chúng tôi chỉ ra rằng một số giá trị xác suất cho các loạt độ dài 1 và 2 là chưa chính xác và đề xuất chỉnh sửa. Sau đó, chúng tôi đã đưa ra và chứng This manuscript is received on December 1, 2018. It is commented on December 6, 2018 and is accepted on December 12, 2018 by the first reviewer. It is commented on December 16, 2018 and is accepted on December 22, 2018 by the second reviewer. minh cho trường hợp tổng quát các loạt có độ dài k bất kỳ. Cuối cùng, chúng tôi đã xây dựng một công cụ kiểm tra tính ngẫu nhiên dựa trên độ dài các loạt và áp dụng đánh giá cho các nguồn ngẫu nhiên thực sự. Keywords— Randomness testing; Block cipher; Hash function; Runs. Từ khóa— Kiểm tra tính ngẫu nhiên; Mã khối; Hàm băm; Loạt. I. INTRODUCTION Statistical randomness tests play an important role in assessing the security of cryptographic algorithms. There have been many independently statistical randomness tests in the literature. Knuth [1] presented a number of statistical tests including frequency check, serial test, poker test, series test (run)... Another test suite is the DIEHARD tests [2] including 18 statistical tests. In addition, there is a Crypt-XS test suite [3] proposed by the Information Security Research Center of Queensland University of Technology. Finally, the currently widely used test suite is the SP 800-22 statistical test suite [4] originally developed by NIST with 16 tests but then shortened to 15 tests (omitted Lempel-Ziv complexity test). In addition, there are a number of randomness testing standards that are not presented in test suites or independently used. In 1992 Maurer proposed a universal statistical test for random bit generators. In 2004, Hernandez et al. proposed a new test called the Strict Avalanche Criterion (SAC)... And most recently, Doğanaksoy et al. [5] proposed three new randomness tests based on the length of runs in 2015. Our Contributions. In this paper, we present some new results on randomness tests based on the length of runs. Specifically, we have given and demonstrated in detail the probability calculation formula for the general case of a binary sequence of length n having exactly lk runs of length k, with 1 k n . Furthermore, we have shown that some Some results on new statistical randomness tests based on length of runs Nghiên cứu Khoa học và Công nghệ trong lĩnh vực An toàn thông tin Số 2.CS (08) 2018 11 probability values given in [5] are inaccurate, and this may lead to a mistake in assessing the randomness of the input sequences, thereby giving to incorrect assertions about the security of cryptographic algorithms. Finally, we built a tool that quickly, accurately and efficiently checks a data file that is random or not using three new randomness tests based on length of runs. Construction. The rest of the paper includes: Part II presents three postulates on randomness given by Golomb [6] as well as some tests related to runs. The new results of research on tests based on length of run are presented in section III. In section IV, we present a randomness assessment tool using 3 new tests. Finally, the conclusions and future research directions are presented in Section V. II. PRELIMINARIES In this section, we present the three propositions of randomness given by Golomb in [6]. This is one of the bases for assessing the randomness of a sequence. Then, we outlined some of the testing standards related to the runs as well as the reasons for studying new standards based on length of runs. A. Golomb’s Randomness Postulates Let 0 1 1 , , , , n S s s s be an infnite binary sequence periodic with n or a finite sequence of length n. A run is defined as an uninterrupted maximal sequence of identical bits. Runs of 0’s are called gap; runs of 1’s are called block. R1, R2, and R3 are Golomb’s randomness postulates which are given as follows: (R1) In a p ... ty values for runs of length one test for 128-bit blocks Interval Probability Box 1 0-27 0.2191945278 Box 2 28-31 0.2304573984 Box 3 32-34 0.1843489091 Box 4 35-38 0.1945435197 Box 5 39-128 0.1714556450 Total 1 Remark 1: In the Table 1, we have use the intervals given in [5], however the calculated probabilities of Box 4 and Box 5 are not match with the probabilities given in [5]. After retesting, we find that the authors in [5] give correct intervals but wrong probabilities. The correct probabilities are as in Table 1. Moreover, the probabilities given in [5] are belong to the intervals 35-40, and 41-128, that can not belong to the intervals 35-38 and 39-128. Similarly, we can calculate probabilitiy intervals for sequences with different lengths. The subinterval probabilities for runs of length 1 can be seen in Table 2. Table 2. Interval and probability values for runs of length one test for 64, 128, 256, and 512-bit blocks n = 64 n = 128 Interva l Probability Interval Probability Box 1 0-12 0.19008234 44 0-27 0.21919452 78 Box 2 13-15 0.23887746 37 28-31 0.23045739 84 Box 3 16-17 0.17456037 41 32-34 0.18434890 91 Box 4 18-20 0.21147040 14 35-38 0.19454351 97 Box 5 21-64 0.18500941 64 39-128 0.17145564 50 Total 1 1 n = 256 n = 512 Interva l Probability Interval Probability Box 1 0-56 0.18725584 09 0-117 0.19356638 36 Box 2 57-61 0.18928091 85 118-125 0.21863011 42 Box 3 62-66 0.21985945 18 126-132 0.21707667 90 Box 4 67-72 0.21877592 27 133-140 0.19951554 92 Box 5 73-256 0.18482786 61 141-512 0.17121127 40 Total 1 1 Nghiên cứu Khoa học và Công nghệ trong lĩnh vực An toàn thông tin Số 2.CS (08) 2018 15 Remark 2: In [5], the authors give the intervals and the corresponding probabilities for runs of length 1. However, we found that some value in [5] are incorrect! For 64n , if we use the intervals in [5], then the correct probabilities should be: Table 2.1. Interval and probability values for runs of length 1 test for 64-bit blocks n = 64 Interval Probability Box 1 0-13 0.2613425337 Box 2 14-16 0.2561417553 Box 3 17-18 0.1659176815 Box 4 19-21 0.1812433426 Box 5 22-64 0.1353546869 Total 1 These probabilities are not match with the probabilities given in [5]. Moreover, these intervals are not equivalent. Therefore, we have re-divide into new intervals and recalculate probabilities in new intervals. Interestingly, these probability values approximate the values given in [5] but belong to other intervals. Similar to the case 128n , intervals and the probability values given in [5] is not match. Specifically, if we take the given intervals in [5], we have recalculated the probabilities exactly as shown in Table 1 and Table 2. If we use the intervals as follows, the probability values coincide with the values given in [5]. Table 2.2. Interval and probability values for runs of length 1 test for 128-bit blocks n = 128 Interval Probability Box 1 0-26 0.1731718548 Box 2 27-30 0.2142651725 Box 3 31-33 0.1869770204 Box 4 34-37 0.2133929800 Box 5 38-128 0.2121929722 Total 0.9999999999 In case 2k , we have calculated all probability values and divide into subintervals as follow: Table 3. Interval and probability values for runs of length two test for 64, 128, 256, and 512-bit blocks n = 64 n = 128 Interval Probability Interval Probability Box 1 0-5 0.16134454 44 0-12 0.16707580 01 Box 2 6-7 0.26096400 39 13-14 0.17407560 80 Box 3 8 0.14909396 94 15-16 0.20979407 61 Box 4 9-10 0.24528775 67 17-19 0.26659050 46 Box 5 11-32 0.18330972 55 20-64 0.18246401 11 Total 0.99999999 99 0.99999999 99 n = 256 n = 512 Interval Probability Interval Probability Box 1 0-27 0.19257931 49 0-57 0.18893841 82 Box 2 28-30 0.19405196 89 58-61 0.17879497 30 Box 3 31-33 0.22292350 33 62-65 0.21049627 69 Box 4 34-36 0.18785330 79 66-70 0.22561551 70 Box 5 37-128 0.20259190 51 71-256 0.19615481 50 Total 1.00000000 01 1.00000000 01 In case 3k , we have calculated all probability values and divide into subintervals as follow: Table 4. Interval and probability values for runs of length three test for 64, 128, 256, and 512-bit blocks n = 64 n = 128 Interval Probability Interval Probability Box 1 0-2 0.20782508 99 0-5 0.16320900 84 Box 2 3 0.20431981 09 6-7 0.27450079 90 Box 3 4 0.21673204 08 8 0.15485472 29 Box 4 5-6 0.28324547 60 9-10 0.24505901 18 Box 5 7-21 0.08787758 25 11-42 0.16237645 79 Total 1.00000000 01 1.00000000 00 n = 256 n = 512 Interv al Probability Interva l Probability Box 1 0-13 0.248734758 4 0-27 0.189852605 4 Box 2 14-15 0.207164715 8 28-30 0.201497107 3 Box 3 16-17 0.213743768 1 31-33 0.231181647 3 Box 4 18-20 0.222144506 9 34-36 0.190006612 6 Box 5 20-85 0.108212250 8 37-170 0.187462027 4 Total 1.000000000 0 1.000000000 0 Journal of Science and Technology on Information Security 16 Số 2.CS (08) 2018 Remark 3: In the case of 512n we used Magma software to divide the intervals and calculate the probability values because it takes quite a long time to run in C ++ language. The calculation time on Magma for this case is about 5000 seconds. C. Tests Descriptions After calculating the probabilities, we begin to build a new test based on the number of runs of length k. Specifically, to test a sequence of N n m bits, where n is the block size we choose. Or we consider m outputs of a cryptographic primitive (a block cipher or a hash function) that have output block size is n- bit. First, we'll count the number of runs of length k of each sequence in m blocks, and increases the count value of the corresponding sub-interval to 1. After calculating, we record the counting values of each sub-interval, denoted by 1 2 5 , , ,F F F respectively. We use the approach as in [8], using χ2 test to evaluate the randomness of the sequence. Consider: 2 5 2 1 Pr Pr i i i i F m m , Lastly p-value is calculated according to the given values: 25 1 , 2 2 p value igamc . By comparing the produced p-value with the level of significance , we can conclude about the randomness of the input sequence. Note that for 2 test, we require Pr 5 i m therefore for Pr 0.2 i we need 25m . Thus, the new tests can be applied for short sequences of length N n m bit for 25m . In addition, counting the total runs numbers and runs numbers with length k of a sequence by definition is difficult. Therefore, we use the concept of “derivative” of a sequence. Definition 1 (Remark 11, [5]) (derivative of a sequence) Let 0 1 1 , , , n S s s s be a binary sequence of length n, the derivative of S, denoted by 0 1 1 , , , n S s s s is defined as follows 1 if 0,1, , 2, 1 if 1. i i i s s i n s i n Also we use a variation of S , denoted by S of length 1n by adding 1’s at the beginning the sequence S . The variation of derivative is an important part of new defned run tests, since the number of runs of different length is determined by this sequence. Let 0 1 1 , , , n S s s s be a binary sequence and derivative of S is 0 1 1 , , , n S s s s . Then, 0 1 , , , n S s s s is defined as follows 1 if 1,2, , , 1 if 0. i i s i n s i It is easy to prove that the total runs number of a sequence is the weight of the derivative of that sequence, and the runs number with the length k in a given sequence S is the number of samples 1 10 01 k overlaps in S . Algorithm 2 presents the pseudocode of the test based on the number of runs of length k: Algorithm 2: Test based on the number of runs of length k 1 21 2, , , , , , , mm k k k kS S S L l l l ,0 ,1 , , , , i i i i n S s s s 0, 0i k i l while j n k do 1 2 0 , , 1 , 2 , 2 2 2 2k k k i j i j i j i j k temp s s s s if 2 1ktemp then 1i i k k l l end if 1i i end while Applying χ2 test for k L return p-value. IV. SOME EXPERIMENTAL RESULTS We have developed a randomness test program using tests based on runs of lengths 1, 2 and 3. The program interface is shown in Figure 1. Specifically, we have tested 4 files true random: sample1.rng, sample2.rng, sample3.rng, sample4.rng (these true random files are actually 32 KB in size, that is, bits of 32 1024 8N length, downloaded at with the following cases n = 64, 128, 256 and 512. The results of all 4 files have passed 3 new runs Nghiên cứu Khoa học và Công nghệ trong lĩnh vực An toàn thông tin Số 2.CS (08) 2018 17 tests with n = 64, 128, 256 and 512. Specifically, for the case n = 64, select the significance level 0.01 , the file to be checked is sample1.rng, we get the result as shown in Figure 1: Fig 1. The program interface of 3 new runs tests for n = 64, 0.01 for file sample1.rng Similarly, we perform tests for the cases 128,256, 512n and for files sample2.rng, sample3.rng, sample4.rng. The results are summarized in the following Table 5: Table 5. Results of 3 new runs tests for true random files Case n = 64 sample 1.rng sample2.r ng sample3.r ng sample4.r ng Runs of length 1 test 0.2654 71 0.177239 0.249560 0.857602 Runs of length 2 test 0.5320 56 0.054239 0.509319 0.219101 Runs of length 3 test 0.5785 69 0.832500 0.445590 0.941098 Case n = 128 sample 1.rng sample2.r ng sample3.r ng sample4.r ng Runs of length 1 test 0.0897 78 0.601941 0.251491 0.941470 Runs of length 2 test 0.1695 05 0.435659 0.645554 0.416198 Runs of length 3 test 0.2641 85 0.893517 0.393173 0.978088 Case n = 256 sample 1.rng sample2.r ng sample3.r ng sample4.r ng Runs of length 1 test 0.6409 39 0.308548 0.272620 0.422990 Runs of length 2 test 0.8992 93 0.231489 0.571770 0.779767 Runs of length 3 test 0.5063 32 0.814081 0.011770 0.591287 Case n = 512 sample 1.rng sample2.r ng sample3.r ng sample4.r ng Runs of length 1 test 0.1205 85 0.292832 0.480338 0.861397 Runs of length 2 test 0.2522 12 0.821268 0.105730 0.726579 Runs of length 3 test 0.8111 72 0.471682 0.110607 0.834620 V. CONCLUSION In this paper, we present some results on new randomness tests based on length of run proposed by A. Doğanaksoy et al. [5]. First, we have given and demonstrated in detail the probability calculation formula for runs of length k, with 1 k n . Second, we show that some probability values for runs lengths 1 and 2 are inaccurate and suggest corrections. Third, we have built a randomness testing algorithm based on the length of runs. Finally, we programmed to build an accurate and efficient tool to test randomness based on the length of runs and apply evaluations to true random sources. Further research directions: Note that the criteria presented in this paper can only be used to evaluate sequences of lengths greater than 512 bits, so it is not applicable to assess randomness output for cryptographic primitives such as block ciphers or hash functions. To be able to evaluate for sequences of length less than or equal to 512 bits, we need to recalculate the probability distribution for blocks of smaller lengths and divide the probability interval accordingly. This is an open problem that needs further research in the future. In addition, the evaluation of probability values for series with a length greater than or equal to 4 and the correlation between these tests also need further consideration and research. Journal of Science and Technology on Information Security 18 Số 2.CS (08) 2018 REFERENCES [1]. M. D. MacLaren, “The art of computer programming. Volume 2: Seminumerical algorithms (Donald E. Knuth)”, SIAM Review 12, pp. 306-308, 1970. [2]. G. Marsaglia, “The marsaglia random number cdrom including the diehard battery of tests of randomness, 1995”. URL stat. fsu. edu/pub/diehard, 2008. [3]. W. Caelli, “Crypt x package documentation”. Information Security Research Centre and School of Mathematics, Queensland University of Technology, 1992. [4]. A. Rukhin, J. Soto, J. Nechvatal, E. Barker, S. Leigh, M. Levenson, D. Banks, A. Heckert, J. Dray, S. Vo, “Statistical test suite for random and pseudorandom number generators for cryptographic applications, NIST special publication”, 2010. [5]. A. Doğanaksoy, F. Sulak, M. Uğuz, O. Şeker, Z. Akcengiz, “New statistical randomness tests based on length of runs”. Mathematical Problems in Engineering, 2015. [6]. S. W. Golomb, “Shift register sequences”. Aegean Park Press, 1982. [7]. P. L'Ecuyer, R. Simard, “TestU01: AC library for empirical testing of random number generators”. ACM Transactions on Mathematical Software (TOMS) 33, 22, 2007. [8]. F. Sulak, A. Doğanaksoy, B. Ege, O. Koçak, “Evaluation of randomness test results for short sequences”, in International Conference on Sequences and Their Applications. Springer, pp. 309-319, 2010. ABOUT THE AUTHOR B.S. Linh Hoang Dinh Workplace: Institute of Cryptography Science and Technology. Email: linhhd@bcy.gov.vn The education process: has received a mathematical bachelor degree in Ha Noi University of Science, in 2014. Research today: symmetric cryptography algorithm.
File đính kèm:
- some_results_on_new_statistical_randomness_tests_based_on_le.pdf