Going all out: using Python to predict double-color ball numbers

1. Requirements introduction

A while ago I came across a netizen's request to analyze double-color ball lottery data. It looked interesting, so I gave it a try. Below is a page from a site that publishes double-color ball results. Each draw produces red and blue numbers: 6 red balls are chosen from 33 and 1 blue ball from 16, so 7 balls are drawn from 49 in total. If you want to bet on a particular ball, say red 1, you should first analyze how that ball has come up in past draws.
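To put the rules above in numbers, the short sketch below counts how many distinct tickets exist under the 6-of-33 plus 1-of-16 format. It only relies on the standard library's `math.comb`; the variable names are illustrative and not part of the original code.

```python
import math

# Ways to choose 6 red balls out of 33 (order does not matter)
red_combos = math.comb(33, 6)

# Each red combination can be paired with any of the 16 blue balls
total_tickets = red_combos * 16

print(red_combos)     # 1107568
print(total_tickets)  # 17721088
```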
Using history as a guide, a ball can be analyzed from the following three angles to judge whether it is worth picking in the next draw:

1. Omissions

Count how often the ball hits after being missed for n draws. In the figure above, there are 2 hits after a 1-draw omission, 1 hit after a 2-draw omission, 2 hits after a 5-draw omission, and 1 hit after a 7-draw omission.

2. Consecutive repeated numbers

As shown in the figure, a streak of 3 consecutive draws with the same number occurs once.

3. Omissions before repeated numbers

As shown in the figure, there is 1 case in which the ball was missed for 2 draws before a repeated number.

2. Source data

Below is an excerpt of the source data, which is stored as an HTML table. In each <td> tag, class="yl01" marks a miss (the cell text is the number of draws the ball has gone without a hit), class="chartBall01" marks a red-ball hit, and class="chartBall02" marks a blue-ball hit. The final sample covers the most recent 100 draws.

<tr>
    <td class="c_fbf5e3 bd_rt_a">2021090</td>
    <td class="yl01" style="font-size:xx-small">1</td>
    <td class="yl01" style="font-size:xx-small">1</td>
    <td class="yl01" style="font-size:xx-small">8</td>
    <td class="yl01" style="font-size:xx-small">2</td>
    <td class="chartBall01">05</td>
    <td class="chartBall01">06</td>
    <td class="yl01" style="font-size:xx-small">1</td>
    <td class="yl01" style="font-size:xx-small">4</td>
    <td class="yl01" style="font-size:xx-small">6</td>
    <td class="yl01" style="font-size:xx-small">4</td>
    <td class="yl01" style="font-size:xx-small">5</td>
    <td class="chartBall01">12</td>
    <td class="yl01" style="font-size:xx-small">1</td>
    <td class="chartBall01">14</td>
    <td class="yl01" style="font-size:xx-small">2</td>
    <td class="yl01" style="font-size:xx-small">2</td>
    <td class="yl01" style="font-size:xx-small">13</td>
    <td class="yl01" style="font-size:xx-small">9</td>
    <td class="yl01" style="font-size:xx-small">2</td>
    <td class="yl01" style="font-size:xx-small">5</td>
    <td class="yl01" style="font-size:xx-small">6</td>
    <td class="yl01" style="font-size:xx-small">9</td>
    <td class="yl01" style="font-size:xx-small">1</td>
    <td class="yl01" style="font-size:xx-small">1</td>
    <td class="yl01" style="font-size:xx-small">7</td>
    <td class="yl01" style="font-size:xx-small">2</td>
    <td class="chartBall01">27</td>
    <td class="chartBall01">28</td>
    <td class="yl01" style="font-size:xx-small">4</td>
    <td class="yl01" style="font-size:xx-small">13</td>
    <td class="yl01" style="font-size:xx-small">12</td>
    <td class="yl01" style="font-size:xx-small">8</td>
    <td class="yl01" style="font-size:xx-small">7</td>
    <td class="v_line"></td>
    <td class="yl01" style="font-size:xx-small">4</td>
    <td class="yl01" style="font-size:xx-small">5</td>
    <td class="yl01" style="font-size:xx-small">42</td>
    <td class="yl01" style="font-size:xx-small">3</td>
    <td class="yl01" style="font-size:xx-small">8</td>
    <td class="yl01" style="font-size:xx-small">2</td>
    <td class="yl01" style="font-size:xx-small">16</td>
    <td class="chartBall02">08</td>
    <td class="yl01" style="font-size:xx-small">10</td>
    <td class="yl01" style="font-size:xx-small">13</td>
    <td class="yl01" style="font-size:xx-small">54</td>
    <td class="yl01" style="font-size:xx-small">1</td>
    <td class="yl01" style="font-size:xx-small">11</td>
    <td class="yl01" style="font-size:xx-small">31</td>
    <td class="yl01" style="font-size:xx-small">18</td>
    <td class="yl01" style="font-size:xx-small">25</td>
</tr>
<tr>
    <td class="c_fbf5e3 bd_rt_a">2021091</td>
    <td class="yl01" style="font-size:xx-small">2</td>
    <td class="yl01" style="font-size:xx-small">2</td>
    <td class="yl01" style="font-size:xx-small">9</td>
    <td class="chartBall01">04</td>
    <td class="yl01" style="font-size:xx-small">1</td>
    <td class="chartBall01">06</td>
    <td class="yl01" style="font-size:xx-small">2</td>
    <td class="yl01" style="font-size:xx-small">5</td>
    <td class="yl01" style="font-size:xx-small">7</td>
    <td class="yl01" style="font-size:xx-small">5</td>
    <td class="yl01" style="font-size:xx-small">6</td>
    <td class="yl01" style="font-size:xx-small">1</td>
    <td class="yl01" style="font-size:xx-small">2</td>
    <td class="yl01" style="font-size:xx-small">1</td>
    <td class="yl01" style="font-size:xx-small">3</td>
    <td class="chartBall01">16</td>
    <td class="yl01" style="font-size:xx-small">14</td>
    <td class="yl01" style="font-size:xx-small">10</td>
    <td class="yl01" style="font-size:xx-small">3</td>
    <td class="yl01" style="font-size:xx-small">6</td>
    <td class="yl01" style="font-size:xx-small">7</td>
    <td class="yl01" style="font-size:xx-small">10</td>
    <td class="yl01" style="font-size:xx-small">2</td>
    <td class="chartBall01">24</td>
    <td class="yl01" style="font-size:xx-small">8</td>
    <td class="chartBall01">26</td>
    <td class="yl01" style="font-size:xx-small">1</td>
    <td class="yl01" style="font-size:xx-small">1</td>
    <td class="yl01" style="font-size:xx-small">5</td>
    <td class="yl01" style="font-size:xx-small">14</td>
    <td class="yl01" style="font-size:xx-small">13</td>
    <td class="yl01" style="font-size:xx-small">9</td>
    <td class="chartBall01">33</td>
    <td class="v_line"></td>
    <td class="yl01" style="font-size:xx-small">5</td>
    <td class="yl01" style="font-size:xx-small">6</td>
    <td class="yl01" style="font-size:xx-small">43</td>
    <td class="yl01" style="font-size:xx-small">4</td>
    <td class="yl01" style="font-size:xx-small">9</td>
    <td class="yl01" style="font-size:xx-small">3</td>
    <td class="yl01" style="font-size:xx-small">17</td>
    <td class="yl01" style="font-size:xx-small">1</td>
    <td class="yl01" style="font-size:xx-small">11</td>
    <td class="yl01" style="font-size:xx-small">14</td>
    <td class="yl01" style="font-size:xx-small">55</td>
    <td class="yl01" style="font-size:xx-small">2</td>
    <td class="yl01" style="font-size:xx-small">12</td>
    <td class="yl01" style="font-size:xx-small">32</td>
    <td class="yl01" style="font-size:xx-small">19</td>
    <td class="chartBall02">16</td>
</tr>

3. Analysis and implementation

1. Data structure

To make it easy to represent whether a ball was hit, its color, and its value, each cell is mapped to a string of the form class-number:

yl01-n: the ball was missed and has now gone n draws without a hit
chartBall01-nn: red ball nn was hit
chartBall02-nn: blue ball nn was hit

First, use BeautifulSoup to pull out the needed content: for every <td> tag, take the class name and the number it contains, and join them according to the rules above.

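from bs4 import BeautifulSoup

# res_table: HTML string of the trend-chart table fetched from the page
soup = BeautifulSoup(res_table, "html.parser")
item_lst = []
for td in soup.find_all('td'):
    cls = (td.get('class') or [''])[0]  # first CSS class; '' if the cell has none
    num = td.string
    if cls in ['yl01', 'chartBall01', 'chartBall02']:
        item_lst.append('-'.join([cls, num]))  # e.g. 'yl01-3' or 'chartBall01-05'
print(item_lst[:10])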

The first 10 items look like this:

['yl01-3', 'yl01-5', 'chartBall01-03', 'yl01-5', 'yl01-1', 'yl01-5', 'chartBall01-07', 'yl01-9', 'yl01-6', 'yl01-6']
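For readers who want to try the extraction step without installing anything, here is a minimal sketch of the same mapping using the standard library's `html.parser` instead of BeautifulSoup. This is a swapped-in technique, not the approach used in this article; the class name `CellCollector` and the shortened `sample` row are made up for illustration.

```python
from html.parser import HTMLParser

BALL_CLASSES = {"yl01", "chartBall01", "chartBall02"}

class CellCollector(HTMLParser):
    """Collect 'class-number' strings from the <td> cells of interest."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current_cls = None  # class of the <td> being read, if relevant

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            classes = (dict(attrs).get("class") or "").split()
            first = classes[0] if classes else None
            self._current_cls = first if first in BALL_CLASSES else None

    def handle_data(self, data):
        text = data.strip()
        if self._current_cls and text:
            self.items.append(f"{self._current_cls}-{text}")

    def handle_endtag(self, tag):
        if tag == "td":
            self._current_cls = None

# Shortened, hypothetical row in the same shape as the source data
sample = """
<tr>
  <td class="c_fbf5e3 bd_rt_a">2021090</td>
  <td class="yl01" style="font-size:xx-small">1</td>
  <td class="chartBall01">05</td>
  <td class="v_line"></td>
  <td class="chartBall02">08</td>
</tr>
"""

parser = CellCollector()
parser.feed(sample)
print(parser.items)  # ['yl01-1', 'chartBall01-05', 'chartBall02-08']
```

The period cell and the v_line separator are skipped because their first class is not one of the three ball classes.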

But this is just one very long list, and it is hard to work with as it stands. At a minimum, it needs to be turned into a matrix with the same structure as the table on the web page. This is where NumPy comes in: first use np.array to convert the plain list into a NumPy array, then use reshape to turn the one-dimensional array into a two-dimensional one, that is, a matrix of 100 draws × 49 balls.

import numpy as np

array = np.array(item_lst).reshape(100, 49)  # 100 draws (rows) x 49 balls (columns)

As shown in the figure, each row of the NumPy matrix corresponds to one row of the original page.
However, we want to analyze each ball vertically, comparing it from one draw to the next, so the matrix needs to be transposed.

array_T = array.T  # transpose the matrix so that each row holds one ball's history
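The reshape and transpose steps can be checked on a toy example. The sketch below uses 2 made-up draws of 3 balls each rather than the real 100 × 49 data, and passes `-1` to `reshape` so that NumPy infers the number of draws; the names `flat`, `grid`, and `balls_per_draw` are illustrative assumptions.

```python
import numpy as np

# Toy data: 2 draws x 3 balls, in the same 'class-number' format
flat = ["yl01-1", "chartBall01-02", "yl01-4",
        "chartBall01-01", "yl01-1", "yl01-5"]

balls_per_draw = 3
# -1 lets NumPy infer the number of draws; reshape raises ValueError
# if the list length is not a multiple of balls_per_draw
grid = np.array(flat).reshape(-1, balls_per_draw)
grid_T = grid.T  # one row per ball, one column per draw

print(grid.shape)          # (2, 3)
print(grid_T.shape)        # (3, 2)
print(grid_T[0].tolist())  # ['yl01-1', 'chartBall01-01']
```

After the transpose, `grid_T[0]` is the full history of the first ball, which is the layout the statistics functions expect.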

At this point, each row of array_T holds the last 100 draws of one ball. Red ball 1 is array_T[0], red ball 2 is array_T[1], and blue ball 1 is array_T[33] (there are 33 red balls in total, so the first blue ball sits at index 33). To make it easier to pick a ball, we can write a mapping function:

def trans_col(txt):
    # Translate a ball label into a row index of the transposed matrix,
    # e.g. 红1 (red 1) = 0, 红33 = 32, 蓝1 (blue 1) = 33, 蓝16 = 48
    if "红" in txt:
        col = int(txt.replace('红', '')) - 1
    else:
        col = int(txt.replace('蓝', '')) + 32
    return col
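If you would rather pass the color and number separately and have out-of-range input rejected, a variant could look like the sketch below. `ball_index` is a hypothetical helper, not part of the original code; it assumes the same layout of 33 red rows followed by 16 blue rows.

```python
def ball_index(color, number):
    """Return the row index in the transposed matrix (hypothetical helper).

    Rows 0-32 are red balls 1-33; rows 33-48 are blue balls 1-16.
    """
    limits = {"red": (33, 0), "blue": (16, 33)}  # (max number, row offset)
    if color not in limits:
        raise ValueError(f"unknown color: {color!r}")
    max_number, offset = limits[color]
    if not 1 <= number <= max_number:
        raise ValueError(f"{color} ball must be between 1 and {max_number}")
    return offset + number - 1

print(ball_index("red", 1))    # 0
print(ball_index("red", 33))   # 32
print(ball_index("blue", 1))   # 33
print(ball_index("blue", 16))  # 48
```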

With that in place, we can start implementing the statistics!

2. Omission statistics

Pass in the transposed matrix and the index of the chosen ball, and take that ball's row. An omission of 1 draw needs at least one miss followed by one hit, so the check starts from the second item. If the current item is a hit and differs from the previous item, the previous item must be a miss, so its value (the number of draws missed) is recorded. Finally, the yl01 prefix that marks a miss is stripped from the recorded items so that only the number of draws remains.

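from collections import Counter

def fun_miss(array, col):
    # Count how often the ball in this row hits after n missed draws
    line = array[col].tolist()
    order_grp = []
    for i, v in enumerate(line):
        if i > 0:
            # a hit whose previous cell differs must follow a miss
            if v != line[i - 1] and 'chart' in v:
                order_grp.append(line[i - 1])
    order_grp = [i.replace('yl01-', '') for i in order_grp]  # keep only the miss length
    c = dict(Counter(order_grp))
    result = sorted(c.items(), key=lambda x: int(x[0]))
    for i in result:
        print(f"Hits after an omission of {i[0]} draw(s): {i[1]}")

fun_miss(array_T, trans_col("红1"))  # run for red ball 1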

The sorted output is as follows:

Hits after an omission of 1 draw(s): 4
Hits after an omission of 3 draw(s): 3
Hits after an omission of 5 draw(s): 2
Hits after an omission of 7 draw(s): 2
Hits after an omission of 8 draw(s): 1
Hits after an omission of 9 draw(s): 2
Hits after an omission of 13 draw(s): 1
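To see the omission logic in isolation, here is a minimal sketch that works on a plain Python list and returns a dictionary instead of printing. The function name `miss_counts` and the toy column are assumptions for illustration; the column is made-up data, listed oldest draw first.

```python
from collections import Counter

def miss_counts(column):
    """Map each omission length to how often a hit followed it."""
    gaps = []
    for prev, cur in zip(column, column[1:]):
        # A hit that follows a miss cell: record the omission length
        if cur.startswith("chart") and prev.startswith("yl01-"):
            gaps.append(int(prev.split("-")[1]))
    return dict(sorted(Counter(gaps).items()))

# Toy history for one ball, oldest draw first (made-up data)
column = ["yl01-1", "chartBall01-01", "yl01-1", "yl01-2",
          "chartBall01-01", "chartBall01-01", "yl01-1", "chartBall01-01"]
print(miss_counts(column))  # {1: 2, 2: 1}
```

Returning a dictionary keeps the counting separate from the printing, which makes the result easier to reuse when comparing several balls.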

3. Statistics of consecutive repeated numbers

As with the omission statistics, two arguments are passed in and the chosen ball's row is taken first. When the current item equals the previous item and is a hit, the number has repeated. While the repeats continue, the counter is incremented by 1. When the condition is no longer met, the streak is broken, the count is recorded, and counting starts again at the next streak.

def fun_repeat(array, col):
    # Count the streaks of consecutive repeats in this row
    line = array[col].tolist()
    count_grp = []
    count = 0
    for i, v in enumerate(line):
        if i > 0:
            if v == line[i - 1] and 'chart' in v:
                count += 1
            elif count > 0:
                count_grp.append(count)
                count = 0
    if count > 0:  # record a streak that is still running at the last draw
        count_grp.append(count)
    c = dict(Counter(count_grp))
    result = sorted(c.items(), key=lambda x: x[0])
    for i in result:
        print(f"Streaks of {i[0]} consecutive repeat(s): {i[1]}")

fun_repeat(array_T, trans_col("红2"))  # run for red ball 2

The sorted output is as follows:

Streaks of 1 consecutive repeat(s): 1
Streaks of 2 consecutive repeat(s): 1
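The streak counting can be tried on a small list as well. The sketch below is a standalone variant under the same convention (two hits in a row count as a streak of 1, three as a streak of 2); `repeat_runs` and the toy column are hypothetical. It also records a streak that is still running at the newest draw, which is easy to overlook.

```python
from collections import Counter

def repeat_runs(column):
    """Map each repeat-streak length to how often it occurred."""
    runs, count = [], 0
    for prev, cur in zip(column, column[1:]):
        if cur.startswith("chart") and prev.startswith("chart"):
            count += 1  # the ball was hit in two consecutive draws
        elif count > 0:
            runs.append(count)  # streak broken: record it and reset
            count = 0
    if count > 0:  # flush a streak that reaches the newest draw
        runs.append(count)
    return dict(sorted(Counter(runs).items()))

# Toy history for one ball, oldest draw first (made-up data)
column = ["chartBall01-02", "chartBall01-02", "yl01-1",
          "chartBall01-02", "chartBall01-02", "chartBall01-02"]
print(repeat_runs(column))  # {1: 1, 2: 1}
```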

4. Omission statistics before repeated numbers

A case is only counted when an omission is followed by a repeated number (miss, hit, hit), so the check must start from the third item at the earliest.

def fun_return(array, col):
    # Count the omission lengths that come right before a repeat in this row
    line = array[col].tolist()
    order_grp = []
    for i, v in enumerate(line):
        if i > 1:
            # pattern: miss, hit, hit
            if v == line[i - 1] and v != line[i - 2] and 'chart' in v:
                order_grp.append(line[i - 2])
    order_grp = [i.replace('yl01-', '') for i in order_grp]  # keep only the miss length
    c = dict(Counter(order_grp))
    result = sorted(c.items(), key=lambda x: int(x[0]))
    for i in result:
        print(f"Repeats preceded by an omission of {i[0]} draw(s): {i[1]}")

fun_return(array_T, trans_col("红1"))  # run for red ball 1

The sorted output is as follows:

Repeats preceded by an omission of 2 draw(s): 1
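The miss, hit, hit pattern can also be checked on a toy column by sliding a window of three draws over the history. This is a sketch under the same assumptions as before: `gaps_before_repeats` is a hypothetical name and the data is made up, oldest draw first.

```python
from collections import Counter

def gaps_before_repeats(column):
    """Map each omission length to how often a repeat followed it."""
    gaps = []
    # Slide a window of three consecutive draws over the history
    for older, prev, cur in zip(column, column[1:], column[2:]):
        if (cur.startswith("chart") and prev.startswith("chart")
                and older.startswith("yl01-")):
            gaps.append(int(older.split("-")[1]))
    return dict(sorted(Counter(gaps).items()))

# Toy history for one ball, oldest draw first (made-up data)
column = ["yl01-1", "yl01-2", "chartBall01-01", "chartBall01-01",
          "chartBall01-01", "yl01-1", "chartBall01-01"]
print(gaps_before_repeats(column))  # {2: 1}
```

Only the start of a streak is counted: the third hit in a row is preceded by a hit rather than a miss, so it does not add a second entry.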

4. Summary

Hmm... seen this way, analyzing a single ball on its own is not enough; it needs to be compared in depth with the other balls. Interested readers can try it themselves, haha!

Reminder: gambling is risky. This article only discusses data-processing techniques and does not constitute investment advice! (Don't blame me for the clickbait title.)

Origin blog.csdn.net/m0_59162248/article/details/129746397