Installing pycorrector on CentOS 7 with Python 3.9.2

Contents

Installing pycorrector

Installing gcc

Installing kenlm

Testing text correction


Installing pycorrector

pip3 install pycorrector

Installing gcc

yum asks for confirmation during the installation; enter y to proceed.

yum install gcc-c++
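
Before moving on to kenlm, it can help to confirm that a C++ compiler is actually on the PATH. The snippet below is a minimal sketch of such a check; the helper name find_cxx_compiler and the candidate names "g++" and "c++" are assumptions made for illustration, not something pycorrector or kenlm provides.

```python
import shutil


def find_cxx_compiler(candidates=("g++", "c++")):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


compiler = find_cxx_compiler()
if compiler:
    print("C++ compiler found at", compiler)
else:
    print("No C++ compiler found; run: yum install gcc-c++")
```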

Installing kenlm

pip3 install https://github.com/kpu/kenlm/archive/master.zip
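
After both pip3 commands have finished, a quick way to see whether the packages landed in the current interpreter is to ask Python whether it can locate them. This is only a sketch: the helper name installed is hypothetical, and it checks that the modules can be found without importing them, so no model is loaded.

```python
import importlib.util


def installed(modules=("pycorrector", "kenlm")):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


for name, found in installed().items():
    print(name, "OK" if found else "missing")
```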

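Problem:

  1. On first run, the program automatically downloads the language model to ~/.pycorrector/datasets/zh_giga.no_cna_cmn.prune01244.klm
  2. The download is very slow and eventually fails with ConnectionResetError: [Errno 104] Connection reset by peer
  3. By default, the rule-based method loads the kenlm language model file from ~/.pycorrector/datasets/zh_giga.no_cna_cmn.prune01244.klm. If the file is not found, the program downloads it automatically. You can also download the model file (2.8 GB) manually and place it at that path.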

Solution:

  1. kenlm model files are stored in the ~/.pycorrector/datasets directory by default
  2. Manually copy zh_giga.no_cna_cmn.prune01244.klm into that directory
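
The manual copy can also be scripted. The sketch below assumes the model file has already been obtained some other way; the helper name place_model and the source path in the usage note are hypothetical, while the file name and target directory are the ones given above.

```python
import shutil
from pathlib import Path

MODEL_NAME = "zh_giga.no_cna_cmn.prune01244.klm"
DEFAULT_DIR = Path.home() / ".pycorrector" / "datasets"


def place_model(src, dest_dir=DEFAULT_DIR):
    """Copy a manually downloaded model into dest_dir unless it is already there."""
    dest_dir = Path(dest_dir)
    target = dest_dir / MODEL_NAME
    if target.is_file():
        # The model is already in place, so there is nothing to copy.
        return target
    src = Path(src)
    if not src.is_file():
        raise FileNotFoundError(f"model file not found: {src}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, target)
    return target
```

For example, place_model("/tmp/zh_giga.no_cna_cmn.prune01244.klm") would copy the file from /tmp (an assumed download location) into ~/.pycorrector/datasets.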

Testing text correction

[root@cbz01 local]# python
Python 3.9.2 (default, Mar 18 2021, 09:43:37) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycorrector
>>> corrected_sent, detail = pycorrector.correct('少先队员因该为老人让坐')
[  DEBUG 20210318 02:27:48 detector:  88] Loaded language model: /root/.pycorrector/datasets/zh_giga.no_cna_cmn.prune01244.klm, spend: 0.504 s.
[  DEBUG 20210318 02:27:58 detector: 107] Loaded dict file, spend: 9.650 s.
>>> print(corrected_sent, detail)
少先队员应该为老人让座 [['因该', '应该', 4, 6], ['坐', '座', 10, 11]]
>>> 
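
Judging from the output above, each entry in detail appears to be [wrong, right, begin, end], where begin and end are character offsets into the original sentence (end exclusive). That reading is inferred from this one sample, not taken from the pycorrector documentation. The sketch below re-applies the edits by hand to show how the offsets line up; apply_details is a hypothetical helper, not part of pycorrector.

```python
def apply_details(sentence, details):
    """Apply [wrong, right, begin, end] edits to sentence and return the result."""
    result = sentence
    # Work from right to left so earlier offsets stay valid after each replacement.
    for wrong, right, begin, end in sorted(details, key=lambda d: d[2], reverse=True):
        if result[begin:end] != wrong:
            raise ValueError(f"expected {wrong!r} at [{begin}:{end}]")
        result = result[:begin] + right + result[end:]
    return result


original = "少先队员因该为老人让坐"
details = [["因该", "应该", 4, 6], ["坐", "座", 10, 11]]
print(apply_details(original, details))  # 少先队员应该为老人让座
```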

Reposted from blog.csdn.net/gnwu1111/article/details/114978306