One-Stop Guide: Generating a Word Cloud from Bilibili Danmaku

I recently learned about word clouds and found them a lot of fun, so let's make a few. With the COVID-19 outbreak going on, let's see what Bilibili danmaku (bullet comments) are saying about it.

The Bilibili Danmaku API

I dug through the network requests of a Bilibili video page for quite a while without finding the danmaku endpoint. A search online turned up the danmaku API:

'https://api.bilibili.com/x/v1/dm/list.so?oid=' + cid

Once you have the video's cid, you can fetch the danmaku directly. Getting the cid is easy: open the developer tools, switch to the Network tab, press Ctrl + F, search for "cid", and the long string of digits is what you want.

Take a video about Wuhan during the epidemic as an example. Its cid is 146308826, so the danmaku URL is 'https://api.bilibili.com/x/v1/dm/list.so?oid=146308826'. Opening it in a browser confirms that it returns the danmaku.
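As a quick sanity check before writing the full crawler: the endpoint returns XML in which each danmaku sits inside a <d> element. The sketch below runs the same regex used later against a hand-written sample that mimics that shape (the p attribute values are made up for illustration):

```python
import re

# Hand-written sample imitating the endpoint's XML response
sample_xml = (
    '<d p="12.3,1,25,16777215,1581000000,0,abc,123">武汉加油</d>'
    '<d p="15.0,1,25,16777215,1581000001,0,def,456">热干面挺住</d>'
)

# Non-greedy match: skip the <d ...> opening tag, capture the text inside
pattern = re.compile(r'<d.*?>(.*?)</d>')
danmaku = pattern.findall(sample_xml)
print(danmaku)  # → ['武汉加油', '热干面挺住']
```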

The code below crawls the danmaku and generates the word cloud:

import requests
import chardet
import re
import pandas as pd
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from imageio import imread
import warnings

# 1. Request the danmaku API and extract the danmaku text
cid = 146308826
final_url = "https://api.bilibili.com/x/v1/dm/list.so?oid=" + str(cid)
final_res = requests.get(final_url)
final_res.encoding = chardet.detect(final_res.content)['encoding']
final_res = final_res.text
pattern = re.compile('<d.*?>(.*?)</d>')  # each danmaku sits inside a <d> element
data = pattern.findall(final_res)

# 2. Persist the danmaku to a txt file
with open("弹幕.txt", mode="w", encoding="utf-8") as f:
    for i in data:
        f.write(i)
        f.write("\n")
warnings.filterwarnings("ignore")

# 3. Read the text file back and tokenize each line with jieba.lcut()
with open("弹幕.txt", encoding="utf-8") as f:
    txt = f.read()
txt = txt.split()
data_cut = [jieba.lcut(x) for x in txt]

# 4. Load the stop words (the list bundled with the wordcloud package)
with open(r"C:\soft\Anaconda\Lib\site-packages\wordcloud\stopwords", encoding="utf-8") as f:
    stop = f.read()
stop = stop.split()
stop = [" ", "道", "说道", "说"] + stop  # prepend a few Chinese filler words

# 5. Filter the stop words out of every tokenized danmaku
s_data_cut = pd.Series(data_cut)
all_words_after = s_data_cut.apply(lambda x: [i for i in x if i not in stop])

# 6. Count word frequencies
all_words = []
for i in all_words_after:
    all_words.extend(i)
word_count = pd.Series(all_words).value_counts()

# 7. Draw the word cloud
# Load the background (mask) image
back_picture = imread(r"C:\Users\yuanwanli\Desktop\dasi.jpg")
# Configure the word cloud; font_path must point to a font that covers Chinese
wc = WordCloud(font_path=r"C:\soft\Anaconda\Lib\site-packages\wordcloud\STXINGKA.ttf",
               background_color="white",
               max_words=1000,
               mask=back_picture,
               max_font_size=200,
               random_state=42
              )
wc2 = wc.fit_words(word_count)
# Render and save the word cloud
plt.figure(figsize=(20, 8))
plt.imshow(wc2)
plt.axis("off")
plt.show()
wc.to_file("ciyun.png")
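To see steps 5 and 6 in isolation, here is a tiny offline walk-through on hand-made token lists (standing in for jieba's output on real danmaku); the three-word stop list is likewise made up for illustration:

```python
import pandas as pd

# Toy tokenized danmaku (what jieba.lcut() might produce)
data_cut = [["武汉", "加油", "啊"], ["加油", "武汉"], ["热干面", "加油"]]
stop = [" ", "啊", "的"]  # made-up stop-word sample

# Step 5: drop stop words from each token list
s_data_cut = pd.Series(data_cut)
all_words_after = s_data_cut.apply(lambda x: [i for i in x if i not in stop])

# Step 6: flatten and count
all_words = []
for tokens in all_words_after:
    all_words.extend(tokens)
word_count = pd.Series(all_words).value_counts()
print(word_count['加油'])  # → 3
```

The resulting word_count Series (word → frequency, sorted descending) is exactly what fit_words() consumes: fit_words() is an alias for generate_from_frequencies() and expects a word-to-frequency mapping, which a pandas Series satisfies because it iterates like a dict via .items().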

As you can see, the most common danmaku are messages cheering Wuhan on, along with well-wishes for hot dry noodles (Wuhan's signature dish) sent on behalf of foods from all over the country.

Reposted from blog.csdn.net/weixin_43705953/article/details/106973829