China COVID-19 Big Data Dashboard


Project Overview
China COVID-19 Big Data Dashboard
The project took seven days to complete. The page implements six features: on the left, from top to bottom, are the regional risk-level ratings and today's Baidu hot searches; in the middle, from top to bottom, are the overall statistics and a real-time data map; on the right, from top to bottom, are the cumulative confirmed-case ranking by province and the statistics for provinces reporting cases today. All of the data comes from crawlers that scrape three websites. Below I walk through how to crawl the data behind these six features, with source code you can copy into your own project to study.

Python Crawlers

Crawling NetEase News epidemic data with a Python crawler

The NetEase News epidemic data is crawled from the following URL: https://wp.m.163.com/163/page/news/virus_report/index.html?nw=1&anw=1 . The underlying request returns JSON, which you can see in the Network tab of the browser's developer tools (F12), and JSON is easy to process into a CSV file. The code is as follows:
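Before writing the full script, it can help to peek at the structure of the returned JSON in code rather than only in the F12 Network tab. A minimal sketch; the main script below only relies on the 'data' key and data['areaTree'], anything else it prints is just whatever the API happens to return:

import requests

url = 'https://c.m.163.com/ug/api/wuhan/app/data/list-total?'
headers = {'user-agent': 'Mozilla/5.0'}
res = requests.get(url, headers=headers, timeout=30)
data_json = res.json()

# Print the top-level keys of the response
print(data_json.keys())
# The script below uses data_json['data'] and data['areaTree']
print(data_json['data'].keys())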

Crawl the NetEase News epidemic data and save it to a CSV file:

import json
import time

import pandas as pd
import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'
}
url = 'https://c.m.163.com/ug/api/wuhan/app/data/list-total?'
res = requests.get(url, headers=headers)
data_json = json.loads(res.text)
data = data_json['data']
# areaTree[2] is the node for China; its children are the provinces
data_province = data['areaTree'][2]['children']
print(data_province)
# Basic fields for every province
free_data = pd.DataFrame(data_province)[['id', 'lastUpdateTime', 'name']]
# Today's figures and cumulative figures, one row per province
today_data = pd.DataFrame([province['today'] for province in data_province])
total_data = pd.DataFrame([province['total'] for province in data_province])
today_data.columns = ['today_' + i for i in today_data.columns]
total_data.columns = ['total_' + i for i in total_data.columns]
China_data = pd.concat([free_data, today_data, total_data], axis=1)
# file_name = 'china_provinces_today' + '_' + time.strftime('%Y_%m_%d', time.localtime(time.time())) + '.csv'
file_name = 'C:/Users/DELL/Desktop/machine_learning/COVID-19_crawler/china_today_data' + '.csv'
China_data.to_csv(file_name, index=False, encoding='utf_8_sig')
print("Saved today's per-province epidemic data for China!")

The idea of the code: fetch the JSON returned by the page, then use pandas DataFrames to pull out the province-level fields we want (basic info, today's figures, and cumulative totals) and concatenate them into one table.
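One caveat: data['areaTree'][2] hard-codes China as the third country node. A slightly more robust lookup, as a sketch; it assumes each areaTree node carries a 'name' field, the same way the province children do:

def find_country(area_tree, country_name='中国'):
    # Search areaTree for the node whose name matches, instead of relying on its position
    for node in area_tree:
        if node.get('name') == country_name:
            return node
    raise KeyError(country_name + ' not found in areaTree')

china = find_country(data['areaTree'])
data_province = china['children']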

Next comes importing the CSV data into MySQL:

import pymysql
import csv
import codecs

def get_conn():
    conn = pymysql.connect(host='localhost', port=3307, user='root', passwd='123456', db='china_covid-19', charset='utf8')
    return conn

def insert(cur, sql, args):
    cur.execute(sql, args)

def read_csv_to_mysql(filename):
    with codecs.open(filename=filename, mode='r', encoding='utf-8') as f:
        reader = csv.reader(f)
        head = next(reader)  # skip the CSV header row
        conn = get_conn()
        cur = conn.cursor()
        # 18 placeholders, one per column of the china_today_data table
        sql = 'insert into china_today_data values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)'
        # Clear out the old rows before inserting today's snapshot
        cur.execute("Delete from china_today_data where 1=1")
        conn.commit()
        for item in reader:
            args = tuple(item)
            insert(cur, sql=sql, args=args)

        conn.commit()
        cur.close()
        conn.close()

if __name__ == '__main__':
    read_csv_to_mysql('C:/Users/DELL/Desktop/machine_learning/COVID-19_crawler/china_today_data.csv')
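Inserting row by row works, but for larger CSV files executemany is usually faster. A minimal variant, as a sketch, reusing the same table and get_conn settings as above:

def read_csv_to_mysql_bulk(filename):
    with codecs.open(filename=filename, mode='r', encoding='utf-8') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        rows = [tuple(item) for item in reader]
    conn = get_conn()
    cur = conn.cursor()
    sql = 'insert into china_today_data values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)'
    cur.execute("Delete from china_today_data where 1=1")
    # One batched call instead of one round trip per row
    cur.executemany(sql, rows)
    conn.commit()
    cur.close()
    conn.close()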

My MySQL table structure for the NetEase News epidemic data:

China daily epidemic data table
Structure of the China daily epidemic data table
The china_today_data table has 18 fields, each of type varchar, because transferring data from a CSV file into MySQL treats every column as varchar by default.
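If you have not created the table yet, one way to keep it in step with the CSV is to generate the DDL from the CSV header, with every column as varchar. A minimal sketch, assuming the table and column names should simply mirror the CSV header (adjust lengths and types as needed):

import codecs
import csv

import pymysql

def create_table_from_csv(filename, table_name):
    with codecs.open(filename, mode='r', encoding='utf-8') as f:
        header = next(csv.reader(f))
    # One varchar(255) column per CSV column, named after the header
    cols = ', '.join('`{}` varchar(255)'.format(col) for col in header)
    ddl = 'create table if not exists `{}` ({})'.format(table_name, cols)
    conn = pymysql.connect(host='localhost', port=3307, user='root', passwd='123456',
                           db='china_covid-19', charset='utf8')
    cur = conn.cursor()
    cur.execute(ddl)
    conn.commit()
    cur.close()
    conn.close()

create_table_from_csv('C:/Users/DELL/Desktop/machine_learning/COVID-19_crawler/china_today_data.csv',
                      'china_today_data')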

Crawling regional risk-level data with a Python crawler

The regional risk-level ratings are crawled from the following URL: http://www.gd.gov.cn/gdywdt/zwzt/yqfk/content/post_3021711.html . This page's address stays stable, so we can crawl its data reliably. The code is as follows:

Crawl the regional risk-level data:

# Python code to crawl the list of medium- and high-risk areas
import requests
from bs4 import BeautifulSoup
import pandas as pd

def getHTML(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""


def getContent(url):
    html = getHTML(url)
    soup = BeautifulSoup(html, 'html.parser')
    # Grab the article title plus every paragraph on the page
    paras = soup.select('.zw-title') + soup.select('p')
    return paras


def saveFile(text):
    # datetimes = time.strftime("%Y-%m-%d", time.localtime(time.time()))
    fname = "C:/Users/DELL/Desktop/machine_learning/COVID-19_crawler/" + r"Webdata.txt"
    f = open(fname, 'w', encoding='utf-8')
    for t in text:
        if len(t) > 0:
            f.writelines(t.get_text() + "\n\n")
    f.close()

def saveCSV(name, rows):
    test = pd.DataFrame(columns=name, data=rows)
    # datetimes = time.strftime("%Y-%m-%d", time.localtime(time.time()))
    fname = "C:/Users/DELL/Desktop/machine_learning/COVID-19_crawler/" + r"list_of_grade_risk_areas.csv"
    test.to_csv(fname, encoding='utf-8')

def main():
    url = 'http://www.gd.gov.cn/gdywdt/zwzt/yqfk/content/mpost_3021711.html'
    text = getContent(url)
    saveFile(text)

    # Read the text file back in
    fp = open('C:/Users/DELL/Desktop/machine_learning/COVID-19_crawler/Webdata.txt', 'r', encoding='utf-8')
    lines = fp.readlines()

    ss2 = []
    for line in lines:
        # Strip the newline and full-width spaces; otherwise they end up in the output
        line = line.strip('\n')
        line = line.strip('\u3000')
        ss = line.split('\n')

        if ss != ['']:
            ss2.append(ss)
    # Drop any leftover empty entries and the trailing line
    while [''] in ss2:
        ss2.remove([''])
    ss2.pop()
    name = ['title', 'high_risk_areas', 'high_areas', 'low_risk_areas', 'low_areas']
    new_ss2 = []
    rows = []
    for i in range(0, len(ss2)):
        new_ss2.append(ss2[i][0])
    rows.append(new_ss2)
    saveCSV(name, rows)
    fp.close()


if __name__ == '__main__':
    main()

The idea of this crawler: fetch all of the HTML content shown on the page, filter out the fields we want, write them to a txt file, and then read that txt file back and transfer the data into a CSV file.
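The intermediate txt file mainly makes the filtering easy to inspect; you can also go straight from the parsed paragraphs to the CSV. A minimal sketch under that assumption, reusing getContent and saveCSV from above and assuming the page still yields one line of text per field:

def paragraphs_to_row(paras):
    # Keep the text of each non-empty paragraph, stripped of newlines and full-width spaces
    row = []
    for p in paras:
        text = p.get_text().strip().strip('\u3000')
        if text:
            row.append(text)
    return row

paras = getContent('http://www.gd.gov.cn/gdywdt/zwzt/yqfk/content/mpost_3021711.html')
name = ['title', 'high_risk_areas', 'high_areas', 'low_risk_areas', 'low_areas']
saveCSV(name, [paragraphs_to_row(paras)])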

Next comes importing the CSV data into MySQL:

import pymysql
import csv
import codecs

def get_conn():
    conn = pymysql.connect(host='localhost', port=3307, user='root', passwd='123456', db='china_covid-19', charset='utf8')
    return conn

def insert(cur, sql, args):
    cur.execute(sql, args)

def read_csv_to_mysql(filename):
    with codecs.open(filename=filename, mode='r', encoding='utf-8') as f:
        reader = csv.reader(f)
        head = next(reader)
        conn = get_conn()
        cur = conn.cursor()
        sql = 'insert into list_of_grade_risk_areas values(%s,%s,%s,%s,%s,%s)'
        cur.execute("Delete from list_of_grade_risk_areas where 1=1")
        conn.commit()
        for item in reader:
            args = tuple(item)
            insert(cur, sql=sql, args=args)

        conn.commit()
        cur.close()
        conn.close()

if __name__ == '__main__':
    read_csv_to_mysql('C:/Users/DELL/Desktop/machine_learning/COVID-19_crawler/list_of_grade_risk_areas.csv')

My MySQL table structure for the regional risk-level data:

MySQL table for the regional risk-level ratings
Structure of the regional risk-level MySQL table
That is the whole process of crawling the regional risk-level ratings: the crawler fetches the data, converts it into a CSV file, and stores it in MySQL. I hope it helps a little; let's keep improving together!

Crawling the Baidu hot search list with a Python crawler

The Baidu hot search list is crawled from the following URL: http://top.baidu.com/buzz?b=1&fr=20811 . This page's address stays stable, so once the program is written it can keep crawling the data without changes. The code is as follows:

Crawl the Baidu hot search data and save it to a CSV file:

# Crawl the Baidu hot search data and save it to a CSV file
# Import the required libraries
from bs4 import BeautifulSoup
import pandas as pd
import requests
import time


def get_html(url, headers):
    r = requests.get(url, headers=headers)
    r.encoding = r.apparent_encoding
    return r.text


def get_pages(html):
    soup = BeautifulSoup(html, 'html.parser')
    # Every topic sits in a table row; skip the header row
    all_topics = soup.find_all('tr')[1:]
    data = []
    for each_topic in all_topics:
        topic_times = each_topic.find('td', class_='last')    # search index (heat)
        topic_rank = each_topic.find('td', class_='first')    # rank
        topic_name = each_topic.find('td', class_='keyword')  # title
        if topic_rank is not None and topic_name is not None and topic_times is not None:
            topic_rank = topic_rank.get_text().replace(' ', '').replace('\n', '')
            topic_name = topic_name.get_text().replace(' ', '').replace('\n', '')
            topic_times = topic_times.get_text().replace(' ', '').replace('\n', '')
            # print('rank:{}, title:{}, heat:{}'.format(topic_rank, topic_name, topic_times))
            # chr(12288) is the full-width space, used so the Chinese titles line up
            tplt = "排名:{0:^4} 标题:{1:{3}^15} 热度:{2:^8}"
            data.append(tplt.format(topic_rank, topic_name, topic_times, chr(12288)))
    # A single 'news' column holding one formatted line per topic
    test = pd.DataFrame(columns=['news'], data=data)
    # datetimes = time.strftime("%Y-%m-%d", time.localtime(time.time()))
    # fname = "./" + datetimes + r"baidu_hot_search.csv"
    fname = "C:/Users/DELL/Desktop/machine_learning/COVID-19_crawler/" + r"baidu_hot_search.csv"
    test.to_csv(fname, encoding='utf-8')

def main():
    # URL of the Baidu hot search list
    url = 'http://top.baidu.com/buzz?b=1&fr=20811'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'}
    html = get_html(url, headers)
    get_pages(html)


if __name__ == '__main__':
    main()

The idea of the code: fetch the page's HTML, parse it with BeautifulSoup to pull out the rank, title, and heat of each topic, and save the results to a CSV file; the CSV can then be imported into MySQL so the data lands directly in the database.
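Here each topic is stored as one pre-formatted string in a single news column. If you would rather keep rank, title, and heat as separate columns, which is easier to query in MySQL, here is a small variation on get_pages as a sketch; the column names and output path are my own choices, and the MySQL table would need to be adjusted to match:

def get_pages_columns(html):
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for each_topic in soup.find_all('tr')[1:]:
        rank = each_topic.find('td', class_='first')
        name = each_topic.find('td', class_='keyword')
        times = each_topic.find('td', class_='last')
        if rank is not None and name is not None and times is not None:
            clean = lambda td: td.get_text().replace(' ', '').replace('\n', '')
            rows.append([clean(rank), clean(name), clean(times)])
    # Three columns instead of one formatted string
    df = pd.DataFrame(rows, columns=['rank', 'title', 'heat'])
    df.to_csv("C:/Users/DELL/Desktop/machine_learning/COVID-19_crawler/baidu_hot_search_columns.csv",
              encoding='utf-8')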

Next comes importing the CSV data into MySQL:

# Import the CSV file into MySQL
import pymysql
import csv
import codecs


def get_conn():
    conn = pymysql.connect(host='localhost', port=3307, user='root', passwd='123456', db='china_covid-19', charset='utf8')
    return conn


def insert(cur, sql, args):
    cur.execute(sql, args)


def read_csv_to_mysql(filename):
    with codecs.open(filename=filename, mode='r', encoding='utf-8') as f:
        reader = csv.reader(f)
        head = next(reader)
        conn = get_conn()
        cur = conn.cursor()
        sql = 'insert into baidu_hot_search values(%s,%s)'
        cur.execute("Delete from baidu_hot_search where 1=1")
        conn.commit()
        for item in reader:
            args = tuple(item)
            insert(cur, sql=sql, args=args)

        conn.commit()
        cur.close()
        conn.close()


if __name__ == '__main__':
    read_csv_to_mysql('C:/Users/DELL/Desktop/machine_learning/COVID-19_crawler/baidu_hot_search.csv')

My MySQL table structure for the Baidu hot search list:

Baidu hot search MySQL table structure (1)
Baidu hot search MySQL table structure (2)
That covers writing a crawler for the Baidu hot search list, saving the data to a CSV file, and storing the CSV data in MySQL. Your MySQL table structure has to match mine for the import to work as-is; otherwise you will need to adjust the code. I hope this cleared things up, and thanks for reading!
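For reference, a table that matches the CSV written above would have two varchar columns: the row index that pandas writes and the formatted news text. A minimal sketch, with column names of my own choosing rather than taken from the original table:

import pymysql

ddl = """
create table if not exists baidu_hot_search (
    id   varchar(255),  -- the row index written by pandas to_csv
    news varchar(255)   -- the formatted 'rank / title / heat' string
)
"""

conn = pymysql.connect(host='localhost', port=3307, user='root', passwd='123456',
                       db='china_covid-19', charset='utf8')
cur = conn.cursor()
cur.execute(ddl)
conn.commit()
cur.close()
conn.close()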

Summary:

This article covered how to crawl web data with Python crawlers and store it in a MySQL database. For the China COVID-19 big data dashboard, my backend is written in PHP, the frontend uses jQuery + ECharts for the big-screen display, and the frontend and backend communicate via AJAX. The code above is enough to get the data you need, and for the backend you could just as well choose Java or something similar. Data is the oil; with the oil in hand, building the car is up to you! I hope this article helps a little. I'm 黑马Jack; let's keep cutting through the thorns on this programming road together!
- -黑马Jack


Reprinted from blog.csdn.net/m0_46991388/article/details/114827023