Article Directory
Retrieve the data
library(jsonlite)
# The Tencent news API returns a JSON envelope whose `data` field is
# itself a JSON string, so fromJSON() is applied twice.
url <- 'https://view.inews.qq.com/g2/getOnsInfo?name=wuwei_ww_time_line'
pre <- fromJSON(url)
data <- fromJSON(pre$data)
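The endpoint above may no longer be live, so here is a minimal offline sketch of the same two-step parse, using a hard-coded sample string that mimics the response shape (the field values are illustrative, not real API data):

```r
library(jsonlite)

# Hypothetical sample mimicking the endpoint's envelope: the `data` field
# is a JSON string embedded inside the outer JSON object.
raw <- '{"ret":0,"data":"[{\\"date\\":\\"1.24\\",\\"desc\\":\\"新增确诊病例\\"}]"}'
pre  <- fromJSON(raw)        # first parse: the envelope
data <- fromJSON(pre$data)   # second parse: the embedded JSON string
str(data)                    # a data frame with columns date and desc
```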
Stopword lexicon
link: https://pan.baidu.com/s/1m5lC6Ld-Fu5_YZtLzqQNGw
extraction code: 2e3i
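If the Baidu link is unavailable, note that a stopword lexicon is just a plain UTF-8 text file with one word per line. A minimal sketch (the five words below are illustrative; a real lexicon contains hundreds of entries):

```r
# A stopword file is plain text, one word per line.
stops <- c("的", "了", "在", "和", "是")
path  <- file.path(tempdir(), "stop.txt")
writeLines(stops, path)
length(readLines(path))
```

The resulting `path` can then be passed to `worker(stop_word = path)` in place of the downloaded file.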
Word frequencies
library(jiebaR)
library(ggplot2)
library(ggthemes)
# Remove digits, English letters, and <U+XXXX> encoding residue
data$desc <- gsub('<U\\+[0-9A-F]+>|[0-9a-zA-Z]', '', data$desc)
# Load the stopword lexicon
wk <- worker(stop_word = 'c:/Users/wisonmon/Desktop/stop.txt')
# Tokenize
seg <- segment(data$desc,wk)
# Count word frequencies
count <- freq(seg)
# Top 20 words by frequency
kw <- count[order(-count$freq),][1:20,]
kw
char freq
815 病例 395
1189 确诊 264
766 新型 258
782 冠状病毒 252
821 感染 225
883 肺炎 222
904 新增 195
1897 出院 138
684 累计 105
374 报告 103
1331 患者 101
1094 治愈 64
992 疫情 62
1209 武汉 58
1959 医院 58
1476 死亡 49
736 重症 43
933 治疗 32
1061 湖北省 29
1161 隔离 29
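The ranking step above can be illustrated with a toy data frame: `freq()` returns columns `char` and `freq`, and indexing with `order(-count$freq)` sorts descending before slicing off the top rows (the three words here are placeholders, not the article's data):

```r
# Toy illustration of the "sort descending, take top n" idiom.
count <- data.frame(char = c("甲", "乙", "丙"),
                    freq = c(2, 9, 5),
                    stringsAsFactors = FALSE)
kw <- count[order(-count$freq), ][1:2, ]
kw$char   # "乙" "丙"
```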
# Plot the top keywords
ggplot(kw) +
aes(x = reorder(char,freq), weight = freq) +
geom_bar(fill = "#0c4c8a") +
labs(x = "keywords", y = "count", title = "武汉疫情关键词", caption = " ") +
coord_flip() +
theme_minimal()
Word cloud
A word cloud scales each word by its frequency, so the dominant terms stand out at a glance.
We can extract the critical information from a word cloud and see which vocabulary dominates. It can also surface secondary information that a purely subjective reading would over-focus on or overlook. Combining these keywords with the related details builds a more complete overall picture of the event.
A word cloud can also use a custom picture as its background; readers who are interested can try this.
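As a sketch of that idea: wordcloud2 exposes a `shape` argument for built-in outlines and a `figPath` argument for a custom image mask. The example below uses `demoFreq`, a frequency table shipped with the package; the mask path is illustrative and would need to point at a real PNG on your machine:

```r
library(wordcloud2)

# demoFreq ships with the package; shape selects a built-in outline.
wc <- wordcloud2(demoFreq, shape = "star")

# For a picture background, point figPath at a local image (illustrative path):
# wc <- wordcloud2(demoFreq, figPath = "mask.png")

class(wc)   # an htmlwidget: view in RStudio or save with htmlwidgets::saveWidget()
```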
# Generate a word cloud (the layout is randomized on each run)
library(wordcloud2)
wordcloud2(count,minSize = 3)