IK analyzer
1. IK analyzer
The IK Analysis plugin integrates the Lucene IK analyzer (http://code.google.com/p/ik-analyzer/) into Elasticsearch and supports custom dictionaries.
Analyzers: ik_smart, ik_max_word. Tokenizers: ik_smart, ik_max_word.
Documentation: https://github.com/medcl/elasticsearch-analysis-ik
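To illustrate how the two analyzers are commonly paired, here is a minimal sketch of an index body that indexes a field with ik_max_word and searches it with ik_smart, the combination shown in the plugin's README. The helper name `build_ik_mapping` and the field name `content` are assumptions made for illustration; on a cluster with the plugin installed, the resulting dict could be passed as the body of `es.indices.create`.

```python
def build_ik_mapping(field="content"):
    """Return an index body that applies the IK analyzers to one text field.

    `field` is a hypothetical field name chosen for this example.
    """
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    # Fine-grained segmentation when documents are indexed
                    "analyzer": "ik_max_word",
                    # Coarser segmentation when queries are analyzed
                    "search_analyzer": "ik_smart",
                }
            }
        }
    }


body = build_ik_mapping()
print(body["mappings"]["properties"]["content"])
```

Indexing with the finer-grained analyzer and querying with the coarser one is a design choice rather than a requirement; both settings accept either analyzer name.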
1.1. Download and install configuration
Releases page: https://github.com/medcl/elasticsearch-analysis-ik/releases
Find the release that matches your Elasticsearch version (7.3.1 here) and download it.
cd your-es-root/plugins/ && mkdir ik    # create the ik directory
unzip plugin to folder your-es-root/plugins/ik    # extract into ik
Installation
Extract the archive into the ik directory, then restart Elasticsearch so that it loads the plugin.
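The manual steps above (create plugins/ik, then unzip the archive into it) can also be scripted. The following is only a sketch using the Python standard library; the function name `install_ik` is made up for this example, and the archive path and Elasticsearch root are whatever applies on your machine.

```python
import zipfile
from pathlib import Path


def install_ik(zip_path, es_root):
    """Extract a downloaded IK plugin archive into <es_root>/plugins/ik.

    Both arguments are supplied by the caller; nothing is downloaded here.
    """
    target = Path(es_root) / "plugins" / "ik"
    # Equivalent to `mkdir ik` inside the plugins directory
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Equivalent to unzipping the plugin into plugins/ik
        archive.extractall(target)
    return target
```

A call would look like `install_ik("path/to/downloaded-plugin.zip", "your-es-root")`, where both paths are placeholders. Elasticsearch still has to be restarted afterwards.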
Test
rv = es.cat.plugins(v=True)  # es is an existing Elasticsearch client
print(rv)
name component   version
**   analysis-ik 7.3.1
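Rather than reading the plugin table by eye, the text returned by `es.cat.plugins(v=True)` can be checked programmatically. This is a rough sketch that assumes the output has a header row followed by whitespace-separated `name component version` columns, as in the listing above; `has_plugin` and the node name `node-1` are hypothetical.

```python
def has_plugin(cat_output, component="analysis-ik"):
    """Return True if `component` is listed in `_cat/plugins?v` text output."""
    lines = cat_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        columns = line.split()
        # Columns are assumed to be: name, component, version
        if len(columns) >= 2 and columns[1] == component:
            return True
    return False


# Sample text shaped like the listing above; "node-1" is a made-up node name.
sample = "name   component   version\nnode-1 analysis-ik 7.3.1\n"
print(has_plugin(sample))
```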
2. Test segmentation effect
Code
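# Word segmentation
def test1():
    # Compare the IK analyzers with the standard analyzer.
    # es is an existing Elasticsearch client.
    # Sample sentence, shown in English translation; the analyzers are meant
    # to run on the Chinese text. The body key must be lowercase "text".
    d3 = {
        "text": "The world can be known, and knowing is a dialectical process of development.",
        "analyzer": "standard",
    }
    # Analyzers to compare
    ana = ["standard", "ik_smart", "ik_max_word"]
    for name in ana:
        d3["analyzer"] = name
        rv = es.indices.analyze(body=d3, format="text")
        print(name + " segmentation result:", [x["token"] for x in rv["tokens"]])


test1()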
Result (tokens are shown as English glosses of the Chinese tokens):
standard segmentation result: ['World', 'sector', 'a', 'available', 'with', 'by', 'recognize', 'know', 'a', 'recognize', 'know', 'a', 'a', 'a', 'debate', 'card', 'send', 'development', 'a', 'over', 'away']
ik_smart segmentation result: ['world', 'yes', 'can', 'is', 'understanding', 'of', 'understanding', 'yes', 'a', 'dialectical', 'development', 'the process of']
ik_max_word segmentation result: ['world', 'yes', 'can', 'is', 'understanding', 'of', 'understanding', 'yes', 'a', 'a', 'a', 'dialectical', 'development', 'a', 'process']
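To compare the analyzers side by side instead of only printing them, the token lists can be collected into a dict keyed by analyzer name. The sketch below assumes a client whose `indices.analyze(body=...)` returns a dict with a `tokens` list, as in the test above. `tokens_by_analyzer` and `StubClient` are hypothetical names, and the stub only splits on whitespace so the example runs without a cluster; it does not reproduce IK or standard analyzer behavior.

```python
def tokens_by_analyzer(client, text, analyzers):
    """Run `text` through each analyzer and return {analyzer: [tokens]}."""
    result = {}
    for name in analyzers:
        response = client.indices.analyze(body={"analyzer": name, "text": text})
        result[name] = [item["token"] for item in response["tokens"]]
    return result


class _StubIndices:
    """Offline stand-in for `es.indices`; NOT a real analyzer."""

    def analyze(self, body):
        # Whitespace splitting is a placeholder for real segmentation.
        return {"tokens": [{"token": word} for word in body["text"].split()]}


class StubClient:
    """Hypothetical client used only to make the sketch runnable."""

    indices = _StubIndices()


counts = {
    name: len(tokens)
    for name, tokens in tokens_by_analyzer(
        StubClient(), "a b c", ["standard", "ik_smart", "ik_max_word"]
    ).items()
}
print(counts)
```

With a real client in place of `StubClient()`, the per-analyzer token counts give a quick view of how much finer ik_max_word segments than ik_smart.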