I. Definition
Benford's Law, also known as the first-digit law, states that in a collection of numbers drawn from real-life data, numbers whose first digit is 1 make up about thirty percent of the total, nearly three times the 1/9 that a uniform distribution would give. More generally, the larger a digit is, the lower the probability that it appears as the leading digit. The law can be used to check many kinds of data for signs of fraud. [1]
II. Mathematics
Benford's Law states that in base b, the probability that a number begins with the digit n is:
P(n) = log_b(n + 1) - log_b(n) = log_b(1 + 1/n)
Benford's Law applies not only to a single leading digit; it can also be used for leading strings of more than one digit.
Probability of each first digit in base 10 (in percent, to one decimal place):
Leading digit d | Probability p |
1 | 30.1% |
2 | 17.6% |
3 | 12.5% |
4 | 9.7% |
5 | 7.9% |
6 | 6.7% |
7 | 5.8% |
8 | 5.1% |
9 | 4.6% |
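The table can be reproduced directly from the formula P(n) = log_b(1 + 1/n). The following is a minimal sketch; the function name benfordProbability is made up for illustration and is not part of any library.

```php
<?php
// Probability that a number written in base $base starts with the digit
// string $n, using P(n) = log_b(1 + 1/n). Hypothetical helper name.
function benfordProbability(int $n, int $base = 10): float
{
    return log(1 + 1 / $n, $base);
}

// Reproduce the table for the first decimal digit.
for ($d = 1; $d <= 9; $d++) {
    printf("%d | %.1f%% |\n", $d, 100 * benfordProbability($d));
}

// The same formula covers longer prefixes, e.g. numbers starting with "10".
printf("10 | %.1f%% |\n", 100 * benfordProbability(10));
```

Passing a two-digit value such as 10 treats it as a leading string rather than a single digit, which is how the law extends beyond the first digit.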
III. Proof
In fact, Benford's Law has so far no generally accepted proof.
Most data sets satisfy it, but some do not, for example data that is uniformly distributed.
1. For many quantities, growth is proportional to the existing stock (as with bank deposits: the more you deposit, the more interest you earn), which gives the formula:
ΔN / (N * Δt) = const (constant)
where ΔN is the increment, Δt is the unit of time, and N is the stock.
2. Such growth is exponential, i.e. over equal time intervals the stock is multiplied by the same factor:
N = N0 * e^(ct)
where t is the time required for the stock to grow from N0 to N, and c is a constant.
It follows that the time needed to grow from N1 to N2 is:
t = c'lg(N2/N1)
where lg is the base-10 logarithm and c' = ln(10) / c is another constant.
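3. Computing the time during which each leading digit is in effect. The time for the leading digit to move from n to n + 1 is:
t1 = c'lg(2)
t2 = c'lg(3/2)
...
tn = c'lg((n + 1) / n)
The time needed for the leading digit to run from 1 through 9 is:
t = t1 + t2 + ... + t9 = c'lg(10) = c'
If the stock is observed at a random moment, the probability of each leading digit is its share of that time:
P1 = t1 / t = c'lg(2) / c' = lg(2) = lg((1 + 1) / 1) ≈ 30.1%
Pn = tn / t = lg((n + 1) / n)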
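The argument above can be checked numerically by sampling an exponentially growing stock at equally spaced times and counting how often each leading digit is seen. This is only a sketch: the helper name leadingDigitShares, the growth constant 0.01 and the sample count 100000 are arbitrary choices for illustration, and N0 is taken to be 1.

```php
<?php
// Sample N(t) = e^(c t) at t = 0, 1, 2, ... and tally the leading digit.
// Hypothetical helper; $c and $steps are illustrative parameters.
function leadingDigitShares(float $c, int $steps): array
{
    $counts = array_fill(1, 9, 0);
    for ($t = 0; $t < $steps; $t++) {
        // lg N = c * t / ln(10); only its fractional part decides the
        // leading digit, which also avoids overflow for large t.
        $lg = $c * $t / M_LN10;
        $frac = $lg - floor($lg);
        $digit = min(9, (int) floor(pow(10, $frac)));
        $counts[$digit]++;
    }
    // Convert tallies into shares of the total observation time.
    return array_map(fn($n) => $n / $steps, $counts);
}

$shares = leadingDigitShares(0.01, 100000);
foreach ($shares as $d => $p) {
    printf("P%d = %.3f, lg((%d + 1) / %d) = %.3f\n", $d, $p, $d, $d, log10(($d + 1) / $d));
}
```

Working with the fractional part of lg N rather than N itself is a design choice of this sketch: multiplying N by a power of 10 never changes its leading digit, so the whole number of decades can be discarded.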
This brings to mind an old saying of our ancestors: the beginning of anything is the hardest part. Perhaps this is what it means. In fact, the law still has no recognized proof; it is simply that a great deal of data conforms to Benford's Law.
IV. Verification
Verifying Benford's Law places certain requirements on the numbers: they must be data that has not been artificially arranged, such as national populations, GDP, etc.
The verification below uses the Fibonacci sequence and random numbers.
1. Fibonacci sequence verification
PHP:
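<?php
$size = 1000;

// Build the first $size Fibonacci numbers. Values beyond PHP_INT_MAX
// become floats, whose string form still starts with the leading digit.
$arr = array(1, 2);
for ($i = 2; $i < $size; $i++) {
    $arr[] = $arr[$i - 1] + $arr[$i - 2];
}

// $sum[d] counts how many numbers start with the digit d.
$sum = array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
for ($k = 0; $k < count($arr); $k++) {
    $index = (int) substr((string) $arr[$k], 0, 1);
    $sum[$index]++;
}

print_r($sum);
for ($n = 1; $n < count($sum); $n++) {
    echo "Leading digit {$n}, ratio " . round($sum[$n] / $size, 2) . "\n";
}
?>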
Output:
Array
(
    [0] => 0
    [1] => 300
    [2] => 177
    [3] => 125
    [4] => 96
    [5] => 80
    [6] => 67
    [7] => 57
    [8] => 53
    [9] => 45
)
Leading digit 1, ratio 0.3
Leading digit 2, ratio 0.18
Leading digit 3, ratio 0.13
Leading digit 4, ratio 0.1
Leading digit 5, ratio 0.08
Leading digit 6, ratio 0.07
Leading digit 7, ratio 0.06
Leading digit 8, ratio 0.05
Leading digit 9, ratio 0.05
2. Random numbers. Note that the random numbers here are pseudo-random, and a random growth rate is applied at each step. Also note that the value would overflow if it were allowed to grow too long, so a loop is added to keep it below 10^15 (ten to the fifteenth power).
PHP:
<?php
$size = 1000;
$grow = 80000; // divisor that sets the spread of the growth rate

// rand() is assumed to return 0..32767 (hence the 16384 midpoint below),
// so that range is requested explicitly.
$a = mt_rand(1, 32767);
$sum = array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
for ($i = 0; $i < $size; $i++) {
    // Simulate a natural growth rate; the 80000 (8w) can be changed.
    $k = (mt_rand(0, 32767) - 16384) / $grow + 1;
    $a = $a + $k * $a;
    while ($a >= 1e15) {
        // Lowering the magnitude has no effect on the leading digit.
        $a /= 10;
    }
    $index = (int) substr((string) $a, 0, 1);
    $sum[$index]++;
}

print_r($sum);
for ($n = 1; $n < count($sum); $n++) {
    echo "Leading digit {$n}, ratio " . round($sum[$n] / $size, 2) . "\n";
}
?>
Output:
Array
(
    [0] => 0
    [1] => 303
    [2] => 176
    [3] => 121
    [4] => 111
    [5] => 89
    [6] => 65
    [7] => 54
    [8] => 36
    [9] => 45
)
Leading digit 1, ratio 0.3
Leading digit 2, ratio 0.18
Leading digit 3, ratio 0.12
Leading digit 4, ratio 0.11
Leading digit 5, ratio 0.09
Leading digit 6, ratio 0.07
Leading digit 7, ratio 0.05
Leading digit 8, ratio 0.04
Leading digit 9, ratio 0.05
V. Conclusion
For both the Fibonacci sequence and the random numbers, the results come out fairly close to Benford's Law, which is why in most cases Benford's Law can be used to detect falsified data.
References:
[1] Benford's Law
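To put a number on "fairly close", one simple option is to take the largest gap between an observed ratio and the value Benford's Law predicts. The sketch below does this for the counts printed in the two outputs of the verification section; the helper name maxBenfordDeviation is hypothetical, and the maximum gap is just one possible measure, chosen here for simplicity.

```php
<?php
// Largest gap between observed leading-digit shares and Benford's Law.
// Hypothetical helper: $counts maps digit (1..9) => number of occurrences.
function maxBenfordDeviation(array $counts): float
{
    $total = array_sum($counts);
    $worst = 0.0;
    for ($d = 1; $d <= 9; $d++) {
        $expected = log10(1 + 1 / $d);
        $observed = ($counts[$d] ?? 0) / $total;
        $worst = max($worst, abs($observed - $expected));
    }
    return $worst;
}

// Counts copied from the two outputs in the verification section.
$fibonacci = [1 => 300, 177, 125, 96, 80, 67, 57, 53, 45];
$random    = [1 => 303, 176, 121, 111, 89, 65, 54, 36, 45];

printf("Fibonacci: %.3f\n", maxBenfordDeviation($fibonacci));
printf("Random:    %.3f\n", maxBenfordDeviation($random));
```

For these counts the largest gap is roughly 0.002 for the Fibonacci sequence and roughly 0.015 for the random numbers, so the Fibonacci run tracks the law more tightly.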