Hands-Natural-language-processing-python 1: NLTK

Category: Other | Posted: 2018-12-23 20:16:32

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/QFire/article/details/84862942

Basic usage:

>>> from nltk.tokenize import word_tokenize as wtoken
>>> # samples_tw is assumed to be a list of text samples loaded earlier (not shown here)
>>> wtoken(samples_tw[20])
>>> from nltk.stem import PorterStemmer
>>> stemming = PorterStemmer()
>>> stemming.stem('enjoying')
'enjoy'
>>> stemming.stem('enjoys')
'enjoy'
>>> stemming.stem('enjoyable')
'enjoy'
>>> from nltk.corpus import stopwords
>>> sw_l = stopwords.words('english')
>>> sw_l[20:40]
['himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this']
>>> example_text = "This is an example sentence to test stopwords"
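>>> # The stopword list is all lowercase, so compare lowercased words; otherwise 'This' slips through
>>> example_text_without_stopwords = [word for word in example_text.split() if word.lower() not in sw_l]
>>> example_text_without_stopwords
['example', 'sentence', 'test', 'stopwords']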
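
The stopword filter above splits on whitespace, so punctuation stays attached to words in real text. The following is a minimal sketch of a reusable filter that also strips punctuation and ignores case. It assumes a small hand-written stopword set (`SAMPLE_STOPWORDS`, a hypothetical stand-in for `stopwords.words('english')`) so that it runs without downloading any NLTK corpus data, and it uses a simple regular expression rather than `word_tokenize`.

```python
import re

# Hypothetical, hand-written stand-in for stopwords.words('english');
# the real NLTK list is much longer.
SAMPLE_STOPWORDS = {"this", "is", "an", "to", "the", "a", "of"}

def remove_stopwords(text, stopwords=SAMPLE_STOPWORDS):
    """Return the words of `text` that are not stopwords, ignoring case and punctuation."""
    # \w+ keeps runs of letters, digits and underscores, and drops punctuation.
    tokens = re.findall(r"\w+", text)
    return [tok for tok in tokens if tok.lower() not in stopwords]

print(remove_stopwords("This is an example sentence, to test stopwords."))
# ['example', 'sentence', 'test', 'stopwords']
```

Passing `set(stopwords.words('english'))` as the second argument should give the same behaviour with the full NLTK list; a set also makes each membership test faster than scanning a list.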

>>> from nltk.corpus import webtext
>>> webtext_sentences = webtext.sents('firefox.txt')
>>> webtext_words = webtext.words('firefox.txt')
>>> len(webtext_sentences)
1142
>>> len(webtext_words)
102457
>>> vocabulary = set(webtext_words)
>>> len(vocabulary)
8296
>>> import nltk
>>> frequency_dist = nltk.FreqDist(webtext_words)
>>> sorted(frequency_dist, key=frequency_dist.__getitem__, reverse=True)[0:30]
['.', 'in', 'to', '"', 'the', "'", 'not', '-', 'when', 'on', 'a', 'is', 't', 'and', 'of', '(', 'page', 'for', 'with', ')', 'window', 'Firefox', 'does', 'from', 'open', ':', 'menu', 'should', 'bar', 'tab']
>>> large_words = dict([(k,v) for k,v in frequency_dist.items() if len(k)>3])
>>> frequency_dist = nltk.FreqDist(large_words)
>>> frequency_dist.plot(50, cumulative=False)
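
In NLTK 3, `FreqDist` is a subclass of `collections.Counter`, so the counting and length filtering in the session above can be tried out with the standard library alone. This is a minimal sketch under that assumption; `sample_tokens` is a made-up token list standing in for `webtext.words('firefox.txt')`.

```python
from collections import Counter

# Hypothetical token list standing in for webtext.words('firefox.txt').
sample_tokens = ["window", "tab", "open", "window", ".", "in", "Firefox",
                 "window", "menu", "tab", "Firefox", "page"]

# Keep only tokens longer than three characters, as in the session above.
long_tokens = [tok for tok in sample_tokens if len(tok) > 3]
counts = Counter(long_tokens)

# most_common(n) returns (token, count) pairs sorted by descending count.
top = counts.most_common(2)
print(top)
# [('window', 3), ('Firefox', 2)]
```

Because `FreqDist` inherits from `Counter`, `frequency_dist.most_common(30)` should be usable in place of the `sorted(..., key=frequency_dist.__getitem__, reverse=True)[0:30]` call, with the counts returned alongside the tokens.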

>>> from wordcloud import WordCloud  # third-party package: pip install wordcloud
>>> import matplotlib.pyplot as plt
>>> wcloud = WordCloud().generate_from_frequencies(frequency_dist)
>>> plt.imshow(wcloud, interpolation='bilinear')
<matplotlib.image.AxesImage object at 0x000000000DED65F8>
>>> plt.axis('off')
(-0.5, 399.5, 199.5, -0.5)
>>> plt.show()
