[2020 Coronavirus Pneumonia - Stay Strong, Wuhan] Predicting the Number of Confirmed Cases with a Logistic Growth Model

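A while ago, on impulse, I used exponential regression and polynomial regression to predict the number of 2019-nCov infections. I later realized that this cannot work: the number of infections cannot keep rising forever, and the growth has to level off at some point. Prompted by several readers, and after consulting 邢翔瑞's blog post, I went back and tried a logistic growth model instead. The result is shown in the figure below. (The figure uses the fitted value of r directly, which is 0.303. After some simple testing, this model's predictions appear to be on the low side; if anyone more experienced can offer some pointers, I would appreciate it.)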

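As we learned in high-school biology, population growth curves come in a "J" shape and an "S" shape. The "J" curve is the idealized case; in reality a population cannot keep growing like an exponential function, and it follows an S-shaped curve instead, as shown in the figure below.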

This curve closely resembles the Sigmoid function $f(x)=\frac{1}{1+e^{-x}}$ from machine learning, and the formulas look alike too. The logistic growth function is:

$P(t) = \frac{KP_0e^{rt}}{K+P_0(e^{rt} - 1)}$

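Here K is the carrying capacity of the environment, P0 is the initial population, and r is the growth rate; the larger r is, the faster the growth (that is, the faster the curve approaches its upper limit).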
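
To make the roles of K, P0 and r concrete, here is a minimal sketch that evaluates the formula directly. The function name `logistic_growth` and all sample values are my own choices for illustration, not fitted case data. It also checks numerically that with K = 1, P0 = 0.5 and r = 1 the curve reduces to the Sigmoid function mentioned earlier.

```python
import numpy as np

def logistic_growth(t, K, P0, r):
    """P(t) = K * P0 * e^(r t) / (K + P0 * (e^(r t) - 1))."""
    exp = np.exp(r * t)
    return K * P0 * exp / (K + P0 * (exp - 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative values only (hypothetical, not taken from the case data)
t = np.linspace(0, 60, 61)
slow = logistic_growth(t, K=1000.0, P0=10.0, r=0.2)
fast = logistic_growth(t, K=1000.0, P0=10.0, r=0.4)

# Both curves start at P0 and level off just below K;
# the curve with the larger r gets there sooner.
print(slow[0], fast[0])
print(slow[20], fast[20])
print(slow[-1], fast[-1])

# Special case: K = 1, P0 = 0.5, r = 1 gives the Sigmoid function
matches_sigmoid = np.allclose(logistic_growth(t - 30, 1.0, 0.5, 1.0), sigmoid(t - 30))
print(matches_sigmoid)
```

Changing only r shifts how quickly the plateau is reached, while K alone decides where the plateau sits.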

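The differential form of this model is $\frac{dP}{dt}=rP\left(1-\frac{P}{K}\right)$. Writing $x = P/K$ for the fraction of the capacity already in use gives the normalized form $\frac{dx}{dt}=rx(1-x)$.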
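
As a sanity check that the closed-form curve and the differential form describe the same model, the sketch below compares a numerical derivative of P(t) with r·P·(1 − P/K), which is the differential form written in terms of P rather than the normalized x = P/K. The function name and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def logistic_growth(t, K, P0, r):
    exp = np.exp(r * t)
    return K * P0 * exp / (K + P0 * (exp - 1))

# Illustrative values only (hypothetical, not fitted)
K, P0, r = 1000.0, 10.0, 0.3
t = np.linspace(0, 40, 4001)
P = logistic_growth(t, K, P0, r)

# Left-hand side: dP/dt estimated with finite differences
dP_dt = np.gradient(P, t)
# Right-hand side: r * P * (1 - P / K)
rhs = r * P * (1 - P / K)

# Compare on interior points, where np.gradient uses central differences
max_rel_err = np.max(np.abs(dP_dt[1:-1] - rhs[1:-1])) / np.max(rhs)
print(max_rel_err)

# Growth is fastest where the population is about half of K
P_at_peak = P[np.argmax(rhs)]
print(P_at_peak)
```

The second printout illustrates a handy property of the S-curve: the steepest daily increase occurs when the curve passes roughly K/2.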

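If the above does not make sense, just copy, paste and run the code~ (The approach is really the same as before: fit the data by least squares, only with a different function being fitted.)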

from scipy.optimize import curve_fit
import urllib.request
import json
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

def date_encode(date):
    # '1/24' -> 1 * 100 + 24 = 124
    d = date.split('/')
    month, day = int(d[0]), int(d[1])
    return 100 * month + day


def date_decode(date):
    # 124 -> '1.24'
    return '{}.{}'.format(str(date // 100), str(date % 100))
    
    
def sequence_analyse(data):
    date_list, confirm_list, dead_list, heal_list, suspect_list = [], [], [], [], []
    data.sort(key = lambda x: date_encode(x['date']))
    for day in data:
        date_list.append(day['date'])
        confirm_list.append(int(day['confirm']))
        dead_list.append(int(day['dead']))
        heal_list.append(int(day['heal']))
        suspect_list.append(int(day['suspect']))
    return pd.DataFrame({
        'date': date_list, 
        'confirm': confirm_list, 
        'dead': dead_list,
        'heal': heal_list,
        'suspect': suspect_list
    })


def get_date_list(target_month = 3):
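    """
    Return every date from January 13 through the last day of target_month,
    formatted like '1/13'.
    """
    # Days per month; February has 29 days because 2020 is a leap year
    month_day = [0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]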
    cur_month, cur_day = 1, 13
    ans = []
    while cur_month <= target_month:
        while cur_day <= month_day[cur_month]:
            d = "0" + str(cur_day) if cur_day < 10 else str(cur_day)
            ans += [str(cur_month) + "/" + d]
            cur_day += 1
        cur_day = 1
        cur_month += 1
    return ans


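def logistic_function(t, K, P0, r):
    # K: carrying capacity, P0: initial value, r: growth rate.
    # t0 is a module-level variable that must be defined before this function is called.
    exp = np.exp(r * (t - t0))
    return (K * exp * P0) / (K + (exp - 1) * P0)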

# Fetch the real-time data
url = 'https://view.inews.qq.com/g2/getOnsInfo?name=wuwei_ww_cn_day_counts'
response = urllib.request.urlopen(url)
json_data = response.read().decode('utf-8').replace('\n','')
data = json.loads(json_data)
data = json.loads(data['data'])

df = sequence_analyse(data)
date, confirm = df['date'].values, df['confirm'].values
# Day index 0, 1, 2, ... (one entry per reported day)
x = np.arange(len(confirm))


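# Logistic growth parameters
# Note: this r is not used by the fit below; curve_fit estimates r itself (popt[2])
r = 0.64
t0 = 0

# Number of future days to predict
predict_days = 20

# Estimate K, P0 and r by least squares
popt, pcov = curve_fit(logistic_function, x, confirm)
print(popt)

# Near-term prediction
predict_x = list(x) + [x[-1] + i for i in range(1, 1 + predict_days)]
predict_x = np.array(predict_x)
predict_y = logistic_function(predict_x, popt[0], popt[1], popt[2])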

# Plot
plt.figure(figsize=(15, 8))
plt.plot(x, confirm, 's',label="confirmed infected number")
plt.plot(predict_x, predict_y, 's',label="predicted infected number")
date_labels = get_date_list(2)
# One label per tick: the label list must be as long as predict_x
plt.xticks(predict_x, date_labels[:len(predict_x)], rotation=90)
plt.yticks(rotation=90)

plt.title("Logistic Fitting Curve for 2019-nCov infected numbers(Max = {},  r={:.2})".format(int(popt[0]), popt[2]), fontsize=16, fontweight="bold")
plt.xlabel('date', fontsize=14)
plt.ylabel('infected number', fontsize=14)
plt.legend()
plt.show()
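
The script calls `curve_fit` without an initial guess, in which case SciPy starts every parameter at 1. A possible refinement, sketched below on synthetic data rather than the real case counts, is to pass a data-driven starting point `p0` and loose `bounds`, and to read rough parameter uncertainties off `pcov`. The "true" parameters, noise level, bounds and helper names are all assumptions for illustration; I have not tested whether this changes the forecast on the real data.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic_growth(t, K, P0, r):
    exp = np.exp(r * t)
    return K * P0 * exp / (K + P0 * (exp - 1))

# Synthetic "observations": a known curve plus 2% noise (not real case data)
rng = np.random.default_rng(0)
true_K, true_P0, true_r = 50000.0, 40.0, 0.3
t = np.arange(40)
y_noisy = logistic_growth(t, true_K, true_P0, true_r) * (1 + 0.02 * rng.standard_normal(t.size))

# Data-driven initial guess:
#   K  -> twice the largest observation so far
#   P0 -> the first observation
#   r  -> slope of log(y) over the first few days, when growth is nearly exponential
r0 = np.log(y_noisy[5] / y_noisy[0]) / 5
p0 = [2 * y_noisy.max(), y_noisy[0], r0]

# Loose bounds that only rule out impossible values (assumed, adjust as needed)
lower = [y_noisy.max(), 1.0, 0.0]
upper = [1e7, y_noisy.max(), 2.0]

popt, pcov = curve_fit(logistic_growth, t, y_noisy, p0=p0, bounds=(lower, upper))
perr = np.sqrt(np.diag(pcov))  # one-standard-deviation estimates

for name, value, err in zip(["K", "P0", "r"], popt, perr):
    print("{} = {:.4g} +/- {:.2g}".format(name, value, err))
```

On real data, a large uncertainty on K is a hint that the curve has not yet bent enough for the final size to be pinned down by the fit.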

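References:

Three approaches in Python for fitting arbitrary data to a curve and obtaining the function expression

Fitting the number of confirmed 2019-nCov cases with a logistic growth model in Python (updated February 1)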

千瞱
