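This project visualizes word clouds and covers four cases: English text, Chinese text, Chinese text with stop words removed (using the Harbin Institute of Technology stop word list), and word clouds drawn in a custom shape.
Tools: Python 3.7 + Jupyter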
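1. English word cloud
Result image:
Code:
import matplotlib.pyplot as plt
from wordcloud import WordCloud
# Read the English text; forward slashes avoid backslash-escape problems in paths
with open('text/en-demo.txt', encoding='utf-8') as f:
    mytext = f.read()
# Build the word cloud with default settings and display it
wordcloud = WordCloud().generate(mytext)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()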
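2. Chinese word cloud, stop words not removed
Result image:
Code:
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import jieba  # Chinese word segmentation
with open('text/ch-demo.txt', encoding='utf-8') as f:
    mytext = f.read()
# Segment the text and join the tokens with spaces so WordCloud can split them
mytext = " ".join(jieba.cut(mytext))
# A font that contains Chinese glyphs is needed, otherwise the words render as empty boxes
wordcloud = WordCloud(font_path="text/simsun.ttf").generate(mytext)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()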
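Why the jieba tokens are joined with spaces: WordCloud splits its input with a regular expression, not with a Chinese-aware tokenizer. The sketch below imitates that step with the standard `re` module. It assumes the default pattern is `r"\w[\w']+"`, which may differ between wordcloud versions; the helper name `split_like_wordcloud` and the sample sentence are made up for illustration.

```python
import re

# Assumption: this mirrors WordCloud's default token pattern; check your installed version.
DEFAULT_PATTERN = r"\w[\w']+"

def split_like_wordcloud(text):
    """Return the tokens a regex-based splitter would find in `text`."""
    return re.findall(DEFAULT_PATTERN, text)

raw = "我爱自然语言处理"            # unsegmented sentence (illustrative)
segmented = "我 爱 自然语言 处理"   # roughly what " ".join(jieba.cut(raw)) might give

print(split_like_wordcloud(raw))        # the whole sentence comes back as a single token
print(split_like_wordcloud(segmented))  # separate words; one-character tokens are dropped
```

Under this pattern, an unsegmented sentence counts as one giant "word", which is why the segmentation step comes before `generate`.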
3. Chinese word cloud, stop words removed
Result image:
Code:

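from wordcloud import WordCloud
import jieba
# Read the text
with open('text/ch-demo.txt', encoding='utf-8') as f:
    mytext = f.read()
# Segment the text (stop words are still present at this point)
mytext = " ".join(jieba.cut(mytext))
# Load the stop words; ch_stopwords.txt is the Harbin Institute of Technology stop word list
with open('text/ch_stopwords.txt', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f if line.strip()}
# WordCloud drops every token found in the stopwords set
w = WordCloud(width=500,
              height=400,
              background_color='black',
              font_path='msyh.ttc',
              stopwords=stopwords).generate(mytext)
w.to_file('output/ch_output.png')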
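To see what stop word removal does to the word counts, the filtering can be reproduced by hand. This is a minimal sketch using only the standard library, not the code WordCloud runs internally; `load_stopwords`, `count_words`, and the sample data are hypothetical names and values chosen for illustration.

```python
from collections import Counter

def load_stopwords(lines):
    """Build a stop word set from an iterable of lines, skipping blank ones."""
    return {line.strip() for line in lines if line.strip()}

def count_words(tokens, stopwords):
    """Count tokens that are non-empty and not in the stop word set."""
    return Counter(t for t in tokens if t.strip() and t not in stopwords)

# Illustrative data; in practice the lines would come from the opened stop word file
stopwords = load_stopwords(["的\n", "了\n", "\n", "是\n"])
tokens = "今天 的 天气 是 晴天 天气 真 好".split()

freqs = count_words(tokens, stopwords)
print(freqs.most_common(3))
```

A frequency table like `freqs` can also be handed to `WordCloud.generate_from_frequencies` if you prefer to control the filtering yourself.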
4. Word cloud in an arbitrary shape
Result image:
Code:
from wordcloud import WordCloud
import numpy as np
import jieba
import re
from PIL import Image
# Load the shape image and convert it to an array to use as the mask
mk = np.array(Image.open('text/horse.png'))
# Read the text
with open('text/ch-demo.txt', encoding='utf-8') as f:
    mytext = f.read()
# Punctuation marks to strip from the text
punctuation = ',。?:、'
def removePunctuation(text):
    text = re.sub(r'[{}]+'.format(punctuation), '', text)
    return text.strip().lower()
# Segment the text, then remove punctuation and line breaks
mytext = " ".join(jieba.cut(mytext))
mytext = removePunctuation(mytext)
mytext = mytext.replace('\n', '')
# Load the stop words (Harbin Institute of Technology stop word list)
with open('text/ch_stopwords.txt', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f if line.strip()}
# When a mask is given, the canvas takes the mask's shape and width/height are ignored
w = WordCloud(width=500,
              height=400,
              background_color='black',
              font_path='msyh.ttc',
              stopwords=stopwords,
              mask=mk).generate(mytext)
w.to_file('output/shape_output.png')
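The shape comes from the mask: according to the wordcloud documentation, pure white pixels are treated as masked out and words are drawn only on the remaining pixels. The sketch below illustrates that convention with plain Python lists instead of an image file. `circle_mask` is a hypothetical helper, and the grid would still need to be converted (for example with `np.array(..., dtype=np.uint8)`) before being passed as `mask=`.

```python
def circle_mask(size):
    """Return a size x size grid: 0 inside a centered circle, 255 (white) outside."""
    center = (size - 1) / 2
    radius = size / 2
    return [
        [0 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 255
         for x in range(size)]
        for y in range(size)
    ]

mask_rows = circle_mask(9)
for row in mask_rows:
    # '#' marks cells words may occupy, '.' marks masked-out (white) cells
    print("".join("#" if v == 0 else "." for v in row))
```

An image such as horse.png plays the same role: the white background is excluded and the darker silhouette is filled with words.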