What is the python-jieba library and how to use it

One: Summary

Jieba is an excellent third-party library for Chinese word segmentation. Because it is a third-party library, it must be installed separately.

Chinese text is written without spaces between words, so individual words have to be obtained through word segmentation.

The jieba library provides three word segmentation modes, and the simplest usage requires learning only one function.

The jieba library relies on a Chinese word dictionary to identify words.
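To give a feel for what dictionary-based segmentation means, here is a minimal sketch of forward maximum matching, a much simpler technique than the one jieba itself uses. The function name forward_max_match and the toy dictionary are made up for illustration and are not part of jieba.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy segmentation: at each position, take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary (an assumption for this sketch, not jieba's real dictionary).
toy_dictionary = {"中文", "文本", "分词", "词语"}
print(forward_max_match("中文文本分词", toy_dictionary))
```

With this toy dictionary the sentence is split into 中文, 文本 and 分词. A real segmenter also has to resolve ambiguous splits, which is why a library such as jieba is normally used instead of hand-written matching.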

The installation command is as follows:

Press Windows+R, type cmd, and press Enter to open the command prompt. Then run pip install jieba to install the library.
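After installing, it can be useful to confirm that Python can actually find the package. The following is a small sketch using only the standard library; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the current interpreter."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once "pip install jieba" has succeeded for this interpreter.
print("jieba installed:", is_installed("jieba"))
```

If this prints False even though pip reported success, pip may have installed into a different Python environment than the one running the script.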

Two: Instructions for using the jieba library

(1) The three segmentation modes of jieba, plus adding new words

Precise mode, full mode, search engine mode, and adding new words

① jieba.lcut(s) precise mode: splits the text accurately, with no redundant words, and returns a list (jieba.cut(s) performs the same segmentation but returns a generator).

② jieba.lcut(s, cut_all=True) full mode: scans out all possible words in the text, so the result contains redundancy.

③ jieba.lcut_for_search(s) search engine mode: on the basis of precise mode, splits long words again.

④ jieba.add_word(w): adds a new word w to the word segmentation dictionary.

The code example is as follows:

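import jieba

# Register the phrase as a dictionary entry; add_word returns None, so nothing is stored
jieba.add_word("奇才队控球后卫约翰沃尔是NBA超级巨星")
# Segment the sentence in precise mode and print the resulting list
b = jieba.lcut("奇才队控球后卫约翰沃尔是NBA超级巨星")
print(b)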

Key point: jieba.lcut(s) performs precise-mode word segmentation on the string s and returns a list.

————————————————

Reference article link:

https://cloud.tencent.com/developer/article/2154756

https://blog.csdn.net/weixin_61631131/article/details/124274495

Origin blog.csdn.net/weixin_43934631/article/details/129163373