Chinese word segmentation and installing the jieba module

1. Installing jieba

Installation is the same as for any other module. At the command prompt (the system shell, not the Python interpreter), enter:

pip install jieba

That is all it takes. My environment is Anaconda3, so I ran the command in the Anaconda Prompt.

The output was as follows:

(base) C:\Users\DELL>pip install jieba
Collecting jieba
  Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x000002456D9C4B00>, 'Connection to files.pythonhosted.org timed out. (connect timeout=15)')': /packages/71/46/c6f9179f73b818d5827202ad1c4a94e371a29473b7f043b736b4dab6b8cd/jieba-0.39.zip
  Downloading https://files.pythonhosted.org/packages/71/46/c6f9179f73b818d5827202ad1c4a94e371a29473b7f043b736b4dab6b8cd/jieba-0.39.zip (7.3MB)
    100% |████████████████████████████████| 7.3MB 14kB/s
Building wheels for collected packages: jieba
  Running setup.py bdist_wheel for jieba ... done
  Stored in directory: C:\Users\DELL\AppData\Local\pip\Cache\wheels\c9\c7\63\a9ec0322ccc7c365fd51e475942a82395807186e94f0522243
Successfully built jieba
Installing collected packages: jieba
Successfully installed jieba-0.39
You are using pip version 9.0.1, however version 10.0.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
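
To confirm that the install succeeded without reading through the whole pip log, one option is to ask Python whether the module can be found. The following is a minimal sketch using only the standard library; the helper name `is_installed` is made up for this illustration and is not part of jieba or pip.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    # find_spec returns None when the module cannot be found on the import path
    return importlib.util.find_spec(module_name) is not None


# Report whether jieba is importable in the current environment
print("jieba installed:", is_installed("jieba"))
```

If this prints `False` inside Anaconda, the likely cause is that pip installed the package into a different environment from the one running the script.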

2. Implementing segmentation

This section implements a simple segmentation of Weibo text. The text is stored in weibo.xlsx, and the code is as follows:

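# -*- coding: utf-8 -*-
import xlrd   # note: xlrd 2.0+ reads only .xls files; .xlsx needs xlrd<2.0
import jieba

# Open the workbook and take its first sheet
data = xlrd.open_workbook("weibo.xlsx")
table = data.sheets()[0]
nrows = table.nrows
for i in range(nrows):
    # Segment the cell in column index 5 (the sixth column) of each row
    cut = jieba.cut(table.row_values(i)[5])
    # jieba.cut returns a generator of tokens; join them with commas
    print(','.join(cut))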

Part of the output:

#,微,直播,南航,运行,指挥系统,技能,大赛,决赛,#, ,第三,环节,各队,激辩,,,哥,激情,洋溢,!, ,​,​,​,​
#,微,直播,南航,运行,指挥系统,技能,大赛,决赛,#, ,第二,环节,精彩,辩论,:,雷达,不是,万能,的,,,但是,没有,雷达,则,万万不能,!, ,​,​,​,​
#,微,直播,南航,运行,指挥系统,技能,大赛,决赛,#, ,21,世纪,什么,最,贵,?, ,​,​,​,​
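
The output above contains tokens that are only a space, and the trailing tokens on each line appear to be zero-width spaces (U+200B), which are common at the end of Weibo text. If these are unwanted, they can be filtered out after segmentation. The following is a minimal sketch, not part of the original script: the helper `clean_tokens` and the `sample` list are hypothetical, and the sample is typed in by hand rather than produced by jieba.

```python
# Characters that str.strip() does not treat as whitespace but that render as nothing
ZERO_WIDTH = {"\u200b", "\ufeff"}


def clean_tokens(tokens):
    """Drop tokens that are empty, whitespace-only, or a lone zero-width character."""
    return [t for t in tokens if t.strip() and t not in ZERO_WIDTH]


# Hand-written sample shaped like the third output line above
sample = ["#", "微", "直播", " ", "21", "世纪", "?", " ", "\u200b", "\u200b"]
print(",".join(clean_tokens(sample)))
```

In the script above, the same filter could be applied by passing the result of `jieba.cut(...)` to `clean_tokens` before joining, since the helper accepts any iterable of strings.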


Reposted from blog.csdn.net/qq_35014850/article/details/80512378