Wuhan epidemic pneumonia word cloud


Retrieve data

library(jsonlite)
# Tencent's epidemic timeline API; the payload's data field is itself a JSON string
url <- 'https://view.inews.qq.com/g2/getOnsInfo?name=wuwei_ww_time_line'
pre <- fromJSON(url)
data <- fromJSON(pre$data)
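
A quick sanity check, assuming the API still responds and the parsed payload is a data frame with a text column named desc, as the code below expects:

# confirm the parsed payload has a text column `desc`
str(data)
head(data$desc, 3)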

Stop lexicon
link: https://pan.baidu.com/s/1m5lC6Ld-Fu5_YZtLzqQNGw
extraction code: 2e3i
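
If the network-disk link is unavailable, any plain-text file with one stop word per line should work for jiebaR; a minimal stand-in (the words here are only examples):

# write a tiny stand-in stop list, one word per line, to the path used below
writeLines(c('的', '了', '和', '在'), 'c:/Users/wisonmon/Desktop/stop.txt')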

Frequencies

library(jiebaR)
library(ggplot2)
library(ggthemes)
# remove digits and English letters
data$desc <- gsub('[0-9a-zA-Z]', '', data$desc)
# load the stop-word lexicon
wk <- worker(stop_word = 'c:/Users/wisonmon/Desktop/stop.txt')
# segment the text
seg <- segment(data$desc, wk)
# word frequencies
count <- freq(seg)
# top 20 words by frequency
kw <- count[order(-count$freq), ][1:20, ]
kw
         char freq
815      病例  395
1189     确诊  264
766      新型  258
782  冠状病毒  252
821      感染  225
883      肺炎  222
904      新增  195
1897     出院  138
684      累计  105
374      报告  103
1331     患者  101
1094     治愈   64
992      疫情   62
1209     武汉   58
1959     医院   58
1476     死亡   49
736      重症   43
933      治疗   32
1061   湖北省   29
1161     隔离   29
# plot the top 20 keywords as a horizontal bar chart
ggplot(kw) +
   aes(x = reorder(char,freq), weight = freq) +
   geom_bar(fill = "#0c4c8a") +
   labs(x = "keywords", y = "count", title = "武汉疫情关键词", caption = " ") +
   coord_flip() +
   theme_minimal()
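
To export the chart to a file like the one shown below, ggplot2's ggsave() writes the most recently drawn plot; the file name and dimensions here are just examples:

# save the bar chart as a PNG
ggsave('wuhan_keywords.png', width = 8, height = 6, dpi = 300)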

[Figure: word frequency bar chart]

Word cloud

The text here is long enough and rich enough in vocabulary to meet the requirements for a word cloud.

A word cloud lets us pick out the critical information at a glance and see which words are the focus. It can also surface secondary keywords that an overly subjective reading would overlook. Starting from these words and exploring the related information, we can form a more complete overall picture of the event.

A word cloud can also use a picture as its background mask; readers who are interested can try it (a sketch follows the example below).

# generate the word cloud (layout is randomized)
library(wordcloud2)
wordcloud2(count,minSize = 3)

[Figure: word cloud]
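
A sketch of the picture-background variant mentioned above: wordcloud2's figPath argument shapes the cloud by the dark pixels of a local image (the mask path here is hypothetical):

# draw the cloud inside the silhouette of a local image (hypothetical path)
wordcloud2(count, figPath = 'c:/Users/wisonmon/Desktop/mask.png', size = 1)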




Origin blog.csdn.net/renewallee/article/details/104213153