1. Introduction to nltk
NLTK (Natural Language Toolkit) is a Python library for natural language processing and text analysis.
NLTK supports many natural language processing tasks, such as text classification, syntax analysis, part-of-speech tagging, text corpus processing, and more.
2. nltk installation
pip install nltk
3. nltk_data installation
wget https://gitcode.net/mirrors/nltk/nltk_data/-/archive/gh-pages/nltk_data-gh-pages.zip
unzip nltk_data-gh-pages.zip
4. View the data search path
Create a new .py file:
import nltk
nltk.data.find('.')
Run the program. The resulting error message lists the directories that NLTK searches for data.
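The search directories can also be read from the nltk.data.path list rather than from an error message. The following is a minimal sketch of that approach. The helper name nltk_search_paths is made up for this example, and it returns an empty list when nltk is not installed.

```python
import importlib.util


def nltk_search_paths():
    """Return the directories NLTK searches for data (empty if nltk is missing)."""
    # Hypothetical helper: avoid an ImportError when nltk is not installed.
    if importlib.util.find_spec("nltk") is None:
        return []
    import nltk
    # nltk.data.path is a plain list of directory names.
    return list(nltk.data.path)


for path in nltk_search_paths():
    print(path)
```

Any directory printed here is a valid target for the data files in the next step.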
5. Put the data packages in the search path
Copy the contents of the packages directory into any of the paths listed in the error message from the program above.
cp -R nltk_data-gh-pages/packages/* /root/nltk_data/
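If a shell is not available, the same copy can be done from Python. This is only a sketch with the standard library: copy_packages is a hypothetical helper name, it assumes Python 3.8 or later (for dirs_exist_ok), and unlike the cp command it creates the target directory when it is missing.

```python
import shutil
from pathlib import Path


def copy_packages(packages_dir, target_dir):
    """Copy every entry under packages_dir into target_dir."""
    packages = Path(packages_dir)
    target = Path(target_dir)
    # Create the target directory if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    for entry in packages.iterdir():
        destination = target / entry.name
        if entry.is_dir():
            # Merge into an existing directory instead of failing.
            shutil.copytree(entry, destination, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, destination)
    return target


# Example call with the paths used in this post:
# copy_packages("nltk_data-gh-pages/packages", "/root/nltk_data")
```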
Note: the next step is very important!
Find the directory where punkt.zip is located in nltk_data:
Extract the punkt.zip archive there, and then delete the zip file.
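The extract-then-delete step can be scripted with the standard zipfile module. This is a sketch, not the only way to do it: extract_and_delete is a hypothetical helper, and the tokenizers/ location in the example call is an assumption about where punkt.zip ends up after the copy in step 5.

```python
import zipfile
from pathlib import Path


def extract_and_delete(zip_path):
    """Unpack zip_path into the directory it sits in, then remove the archive."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(zip_path.parent)
    # Delete the archive only after extraction succeeded.
    zip_path.unlink()
    return zip_path.parent


# Example call; the tokenizers/ location is an assumption:
# extract_and_delete("/root/nltk_data/tokenizers/punkt.zip")
```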
6. nltk library test
Python sample code:
import nltk
# Download the part-of-speech tagger (uncomment if it is not available locally)
#nltk.download('averaged_perceptron_tagger')
text = "I love natural language processing"
tokens = nltk.word_tokenize(text)
tags = nltk.pos_tag(tokens)
# Print each token with its part-of-speech tag
for word, pos in tags:
    print(word, pos)