Calling the Java ansj_seg word segmenter from Python

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/zjm750617105/article/details/82634571

Solution link: https://github.com/NLPchina/ansj_seg/issues/681

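For anyone who is not comfortable with Java, does not want to use jieba, and cannot let go of ansj_seg, here is a way to call it from Python.
Environment: python2.7, jdk1.8.0_161, tree_split-1.5.jar, nlp-lang-1.7.7.jar, and ansj_seg-5.1.6.jar.
About the environment: the documentation says jdk1.6, but that probably refers to an earlier release. The manifest file inside the latest jar lists a newer JDK version.
I ran into a lot of pitfalls. At first the class could not be found at all, and I spent most of a day trying different JDK versions.
First, pay attention to the JDK version.
Second, put every dependency jar on the classpath. Reading the source shows that ansj_seg references classes from nlp-lang.
Third, call the function by the method name defined on the parent class Analysis (parseStr in the script below).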
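
The manifest check mentioned above can be scripted instead of unpacking the jar by hand. The following is a minimal sketch, assuming the jar contains a META-INF/MANIFEST.MF entry. `read_manifest` is a hypothetical helper name, and whether attributes such as Build-Jdk or Created-By are present depends on how the jar was built, so treat those keys as assumptions.

```python
import zipfile


def read_manifest(jar_path):
    """Return the main-section attributes of META-INF/MANIFEST.MF as a dict.

    Hypothetical helper for illustration; it is not part of ansj_seg or JPype.
    """
    with zipfile.ZipFile(jar_path) as jar:
        text = jar.read('META-INF/MANIFEST.MF').decode('utf-8')

    attrs = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the main section of the manifest
        if line.startswith(' ') and key is not None:
            attrs[key] += line[1:]  # continuation of the previous value
        elif ':' in line:
            key, value = line.split(':', 1)
            attrs[key] = value.lstrip()
    return attrs

# Example (path taken from the script in this post):
# info = read_manifest('/mnt/data/pretrained_models/word2vec_models/jars4ansj/ansj_seg-5.1.6.jar')
# print(info.get('Build-Jdk'), info.get('Created-By'))  # keys may be absent
```

If neither key is present, the manifest may still contain other hints about the build environment, so printing the whole dict is a reasonable first step.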

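# -*- coding: utf-8 -*-
import os

import jpype

# Path to the JVM shared library of the JDK listed in the environment.
jvmPath = '/usr/lib/java/jdk1.8.0_161/jre/lib/amd64/server/libjvm.so'
print(jvmPath)

# ansj_seg needs nlp-lang and tree_split on the classpath as well.
jars_dir = '/mnt/data/pretrained_models/word2vec_models/jars4ansj'
jar_names = ['ansj_seg-5.1.6.jar', 'nlp-lang-1.7.7.jar', 'tree_split-1.5.jar']
jars = [os.path.join(jars_dir, name) for name in jar_names]
# os.pathsep is ':' on Linux and ';' on Windows, matching Java's classpath separator.
jvm_cp = "-Djava.class.path={}".format(os.pathsep.join(jars))
jpype.startJVM(jvmPath, "-ea", jvm_cp)

SegModel = jpype.JClass('org.ansj.splitWord.analysis.ToAnalysis')
jd = SegModel()
# parseStr is the method name defined on the parent class Analysis.
print(jd.parseStr("怎么这么麻烦"))

jpype.shutdownJVM()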
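
Since a missing dependency jar surfaces only as a class-not-found error once the JVM is running, it may help to verify the jar files before starting it. The helper below is a sketch under that assumption. `build_classpath_option` is a hypothetical name and uses only the Python standard library, not JPype.

```python
import os


def build_classpath_option(jars_dir, jar_names):
    """Build a -Djava.class.path option, failing early if a jar is missing.

    Hypothetical helper for illustration only.
    """
    paths = [os.path.join(jars_dir, name) for name in jar_names]
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise IOError('missing jar(s): ' + ', '.join(missing))
    # os.pathsep matches the classpath separator Java expects on this platform.
    return '-Djava.class.path=' + os.pathsep.join(paths)
```

The returned string would be passed to jpype.startJVM in place of jvm_cp.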

Output of the JPype script:
/usr/lib/java/jdk1.8.0_161/jre/lib/amd64/server/libjvm.so
Sep 11, 2018 11:12:25 PM org.ansj.util.MyStaticValue warn
WARNING: not find library.properties in classpath use it by default !
Sep 11, 2018 11:12:25 PM org.ansj.dic.impl.File2Stream info
INFO: path to stream library/ambiguity.dic
Sep 11, 2018 11:12:25 PM org.ansj.library.AmbiguityLibrary error
SEVERE: Init ambiguity library error :org.ansj.exception.LibraryException: path :library/ambiguity.dic file:/home/jinmming/git_manager/paraphrase/bimpm/test_units/library/ambiguity.dic not found or can not to read, path: library/ambiguity.dic
Sep 11, 2018 11:12:25 PM org.ansj.dic.impl.File2Stream info
INFO: path to stream library/default.dic
Sep 11, 2018 11:12:25 PM org.ansj.library.DicLibrary error
SEVERE: Init dic library error :org.ansj.exception.LibraryException: path :library/default.dic file:/home/jinmming/git_manager/paraphrase/bimpm/test_units/library/default.dic not found or can not to read, path: library/default.dic
Sep 11, 2018 11:12:25 PM org.ansj.library.DATDictionary info
INFO: init core library ok use time : 572
Sep 11, 2018 11:12:25 PM org.ansj.library.NgramLibrary info
INFO: init ngram ok use time :287
怎么/r,这么/r,麻烦/an
JVM has been shutdown
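
The segmentation result is printed as comma-separated word/tag items, as in the line 怎么/r,这么/r,麻烦/an above. To work with it on the Python side, that text can be split into (word, tag) pairs. This is a minimal sketch written for Python 3, assuming the result has first been converted to a Python string (for example with str(...)) and that the text itself contains no commas. `parse_terms` is a hypothetical helper, not an ansj_seg or JPype function.

```python
# -*- coding: utf-8 -*-


def parse_terms(result_text):
    """Split 'word/tag,word/tag,...' into a list of (word, tag) tuples.

    Hypothetical helper; assumes ',' only appears as the item separator.
    """
    pairs = []
    for item in result_text.split(','):
        if not item:
            continue  # skip empty pieces, e.g. from a trailing comma
        word, sep, tag = item.rpartition('/')
        if not sep:
            word, tag = item, ''  # item carried no part-of-speech tag
        pairs.append((word, tag))
    return pairs
```

Splitting on the last slash keeps words that themselves contain a slash intact. Input that may contain commas would need a different approach, such as iterating over the terms of the Java result object rather than its string form.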
