Prepare the file total_word_feature_extractor_zh.dat and place it in the designated directory.
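Since the later training step depends on this file, it can help to confirm it is actually in place before going any further. The following is only a minimal sketch: the helper name and the path in the usage line are hypothetical placeholders, because the text does not say which directory is meant.

```python
import os


def model_file_ready(path):
    """Return True if path points to an existing, non-empty file."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


if __name__ == "__main__":
    # Hypothetical location; replace it with the directory you actually use.
    candidate = os.path.join("data", "total_word_feature_extractor_zh.dat")
    print(candidate, "ready:", model_file_ready(candidate))
```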
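The previous article in this series already covered the problems with installing via python setup.py install; installing the packages one by one feels more controllable. The package that needs to be installed here is mitie.
pip install mitie fails with the following error: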
Building wheel for mitie (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: 'd:\programs\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Administrator\\AppData\\Local\\Temp\\pip-install-2kau9q3r\\mitie_ce10b05bf4a748c7b8d0ddce9b0d68b0\\setup.py'"'"'; __file__='"'"'C:\\Users\\Administrator\\AppData\\Local\\Temp\\pip-install-2kau9q3r\\mitie_ce10b05bf4a748c7b8d0ddce9b0d68b0\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\Administrator\AppData\Local\Temp\pip-wheel-1fhfv6eq'
cwd: C:\Users\Administrator\AppData\Local\Temp\pip-install-2kau9q3r\mitie_ce10b05bf4a748c7b8d0ddce9b0d68b0\
Complete output (40 lines):
running bdist_wheel
running build
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Temp\pip-install-2kau9q3r\mitie_ce10b05bf4a748c7b8d0ddce9b0d68b0\setup.py", line 44, in get_cmake_version
out = subprocess.check_output(['cmake', '--version'])
File "d:\programs\python\python38\lib\subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "d:\programs\python\python38\lib\subprocess.py", line 489, in run
with Popen(*popenargs, **kwargs) as process:
File "d:\programs\python\python38\lib\subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "d:\programs\python\python38\lib\subprocess.py", line 1307, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] 系统找不到指定的文件。
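Installing with the command pip install git+https://github.com/mit-nlp/MITIE.git seemed more reliable. However, the installation is very slow and hangs for long stretches with no response: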
C:\Users\Administrator>pip install git+https://github.com/mit-nlp/MITIE.git
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting git+https://github.com/mit-nlp/MITIE.git
Cloning https://github.com/mit-nlp/MITIE.git to c:\users\administrator\appdata\local\temp\pip-req-build-bgrgbbiq
Running command git clone -q https://github.com/mit-nlp/MITIE.git 'C:\Users\Administrator\AppData\Local\Temp\pip-req-build-bgrgbbiq'
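Although the console shows no download progress, looking at the file size in the Properties dialog of the target directory confirms that the download is running. The first attempt failed after 12 minutes, and resuming an interrupted download is not supported. Several further attempts also failed. So I downloaded the source from GitHub instead, https://github.com/mit-nlp/MITIE, extracted it, opened cmd in that directory, and ran python setup.py install. On Windows this approach also fails: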
running install
running build
Traceback (most recent call last):
File "setup.py", line 44, in get_cmake_version
out = subprocess.check_output(['cmake', '--version'])
File "D:\Programs\Python\Python38\lib\subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "D:\Programs\Python\Python38\lib\subprocess.py", line 489, in run
with Popen(*popenargs, **kwargs) as process:
File "D:\Programs\Python\Python38\lib\subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "D:\Programs\Python\Python38\lib\subprocess.py", line 1307, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] 系统找不到指定的文件。
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "setup.py", line 51, in <module>
setup(
File "D:\Programs\Python\Python38\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "D:\Programs\Python\Python38\lib\distutils\dist.py", line 966, in run_commands
self.run_command(cmd)
File "D:\Programs\Python\Python38\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "D:\Programs\Python\Python38\lib\distutils\command\install.py", line 545, in run
self.run_command('build')
File "D:\Programs\Python\Python38\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "D:\Programs\Python\Python38\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "setup.py", line 16, in run
if LooseVersion(self.get_cmake_version()) < '3.1.0':
File "setup.py", line 47, in get_cmake_version
", ".join(e.name for e in self.extensions))
File "D:\Programs\Python\Python38\lib\distutils\cmd.py", line 103, in __getattr__
raise AttributeError(attr)
AttributeError: extensions
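Both tracebacks fail at the same point: setup.py runs cmake --version, and Windows cannot find the cmake executable ([WinError 2], "the system cannot find the file specified"). The AttributeError: extensions at the end is a secondary error raised while that first failure was being handled. So before installing MITIE on Windows, the environment has to be set up first, including cl from VS, boost, and cmake.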
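The step that fails in the traceback is a call to cmake --version followed by a comparison against 3.1.0. A minimal sketch of that kind of check, written so that a missing executable yields None instead of an exception, might look like the following. The function names and the regular expression for the version line are my own assumptions and are not taken from MITIE's setup.py.

```python
import re
import subprocess

MINIMUM_CMAKE = (3, 1, 0)  # threshold taken from the '3.1.0' in the traceback


def parse_cmake_version(output):
    """Extract the version tuple from `cmake --version` text, or return None."""
    match = re.search(r"cmake version (\d+(?:\.\d+)*)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def get_cmake_version():
    """Return the installed cmake version tuple, or None if cmake cannot be run."""
    try:
        out = subprocess.check_output(["cmake", "--version"])
    except (OSError, subprocess.CalledProcessError):
        # FileNotFoundError is a subclass of OSError; it is what the
        # traceback shows when cmake is not on PATH.
        return None
    return parse_cmake_version(out.decode(errors="replace"))


if __name__ == "__main__":
    version = get_cmake_version()
    if version is None:
        print("cmake was not found or could not be run")
    elif version < MINIMUM_CMAKE:
        print("cmake is older than 3.1.0:", version)
    else:
        print("cmake version is sufficient:", version)
```

Running this before the install gives a readable message in the situation where the build would otherwise stop with the traceback above.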
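Before installing boost, verify the VS environment variable: add the VS installation path E:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\bin\Hostx64\x64 to the environment variables, open cmd in that directory, and type cl. If no error is reported, the configuration works.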
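Typing cl from inside its own directory succeeds whether or not the environment variable took effect, because cmd also searches the current directory. A complementary check is to ask whether the tools can be found through PATH from anywhere. The sketch below uses the standard library's shutil.which for this; the tool list is an assumption based on the requirements named in this article.

```python
import shutil

# Assumed tool names: cl is the Visual Studio compiler, cmake is needed by
# the MITIE build.
REQUIRED_TOOLS = ("cl", "cmake")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tools that cannot be located through PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required build tools were found on PATH.")
```

Note that a cmd window opened before the environment variable was changed keeps the old PATH, so run the check from a new window.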
Installing boost: cd to the directory E:\develop-environment\boost_1_67_0\tools\build and run bootstrap.bat. This still failed, with the message: Failed to bootstrap the build engine Please consult bootstrap.log for furter diagnostics.
So I went to the boost website and found a newer version: boost_1_75_0. After cd to the directory E:\develop-environment\boost_1_75_0\tools\build, running bootstrap.bat succeeded.
Next, cd to E:\项目备份\Rasa_NLU_Chi\MITIE-master\examples and run python setup.py install; this time it goes through.
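To confirm that the installation really landed in the interpreter you plan to train with, a quick check is whether the module can be located. This is a minimal sketch; the helper name is my own, and it assumes the Python bindings are importable under the name mitie.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the MITIE Python bindings install a module named "mitie".
    print("mitie importable:", is_installed("mitie"))
```

If this prints False in PyCharm but True in cmd, the two are most likely using different Python interpreters.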
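When running train.py in PyCharm, the run parameters must be set correctly: -c ../sample_configs/config_jieba_mitie_sklearn.yml --data ../data/examples/rasa/demo-rasa_zh.json --path models --project Rasa_NLU_Chi. The models directory is at the same level as train.py, and ../ refers to the parent directory of train.py. Because paths on Windows differ from those on Linux, the paths in the relevant places of the configuration file also need to be changed, as shown in the figure below: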
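Relative arguments such as ../sample_configs/... only resolve correctly when the working directory is the folder that contains train.py. One way to take the working directory out of the picture is to expand the same arguments into normalized paths first. The sketch below does this with os.path; the function name and the example directory in the usage line are hypothetical.

```python
import os


def build_train_args(script_dir):
    """Return the train.py arguments with paths resolved against script_dir."""

    def from_script(*parts):
        # Join onto the directory of train.py and collapse any ".." parts.
        return os.path.normpath(os.path.join(script_dir, *parts))

    return [
        "-c", from_script("..", "sample_configs", "config_jieba_mitie_sklearn.yml"),
        "--data", from_script("..", "data", "examples", "rasa", "demo-rasa_zh.json"),
        "--path", from_script("models"),
        "--project", "Rasa_NLU_Chi",
    ]


if __name__ == "__main__":
    # Hypothetical location of train.py; replace it with the real directory.
    for arg in build_train_args(os.path.join("Rasa_NLU_Chi", "rasa_nlu")):
        print(arg)
```

The printed values can be pasted into the Parameters field of the PyCharm run configuration. os.path.normpath also switches to the separator of the current platform, which sidesteps part of the Windows and Linux path difference.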
Once the changes are made, the wonderful training journey can begin. The output is as follows:
Training to recognize 2 labels: 'food', 'disease'
Part I: train segmenter
words in dictionary: 200000
num features: 271
now do training
C: 20
epsilon: 0.01
num threads: 1
cache size: 5
max iterations: 2000
loss per missed segment: 3
C: 20 loss: 3 0.444444
C: 35 loss: 3 0.444444
C: 20 loss: 4.5 0.555556
C: 5 loss: 3 0.444444
C: 20 loss: 1.5 0.444444
C: 20 loss: 6 0.555556
C: 20 loss: 5.25 0.555556
C: 21.5 loss: 4.65 0.555556
C: 16.9684 loss: 4.72073 0.555556
C: 18.2577 loss: 4.43072 0.555556
C: 18.2131 loss: 4.55681 0.555556
C: 20 loss: 4.4 0.555556
C: 20.9694 loss: 4.47547 0.555556
best C: 20
best loss: 4.5
num feats in chunker model: 4095
train: precision, recall, f1-score: 1 1 1
Part I: elapsed time: 3 seconds.
Part II: train segment classifier
now do training
num training samples: 9
C: 200 f-score: 1
C: 400 f-score: 1
C: 300 f-score: 1
C: 100 f-score: 1
C: 0.01 f-score: 1
C: 50.005 f-score: 1
C: 25.0075 f-score: 1
C: 12.5088 f-score: 1
C: 6.25938 f-score: 1
C: 3.13469 f-score: 1
C: 1.57234 f-score: 1
C: 0.791172 f-score: 1
C: 0.400586 f-score: 1
best C: 0.791172
test on train:
3 0
0 6
overall accuracy: 1
Part II: elapsed time: 3 seconds.
df.number_of_classes(): 2
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Fitting 2 folds for each of 6 candidates, totalling 12 fits
[Parallel(n_jobs=1)]: Done 12 out of 12 | elapsed: 0.0s finished
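In Part II of the log, the two rows printed under "test on train:" (3 0 and 0 6) can be read as a confusion matrix over the two labels. Assuming that reading, with correctly classified samples on the diagonal, the accuracy is the diagonal sum divided by the total, (3 + 6) / 9 = 1, which agrees with the reported "num training samples: 9" and "overall accuracy: 1". A small sketch of the arithmetic:

```python
def accuracy(confusion):
    """Return the fraction of samples on the diagonal of a square matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


if __name__ == "__main__":
    # Values copied from the "test on train:" block of the log.
    matrix = [[3, 0], [0, 6]]
    print(accuracy(matrix))  # prints 1.0
```

A perfect score here is measured on the training data itself, with only 9 samples, so it says little about how the model behaves on unseen input.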