Solr: integrate carrot2 with solr-5.1.0

I had already successfully integrated Carrot2 with Solr 4.x using my customized Chinese tokenizer.

However, I ran into some errors when following my series of blog posts (http://ylzhj02.iteye.com/blog/2152348) to adapt Carrot2 to Solr 5.1.0.

The error is:

org.carrot2.util.factory.FallbackFactory; Tokenizer for Chinese Simplified (zh_cn) is not available. This may degrade clustering quality of Chinese Simplified content. Cause: java.lang.NoSuchMethodError: org.apache.lucene.analysis.Tokenizer.<init>(Ljava/io/Reader;)V

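The reason is that Solr 5.1.0 is built on Lucene 5.1.0, whereas Carrot2 3.10.0 was built against Lucene 4.6.0, so the JARs are incompatible. Lucene 5 removed the Tokenizer(Reader) constructor (the input is now supplied through setReader()), and that constructor is exactly the method the NoSuchMethodError reports as missing.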
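
To confirm which Tokenizer API is actually on the classpath, a small reflection probe can look for the old Tokenizer(Reader) constructor before Carrot2 trips over it at run time. This is a hedged diagnostic sketch: ApiProbe is a hypothetical helper, not part of Lucene, Solr, or Carrot2.

```java
import java.io.Reader;

// Hypothetical diagnostic: reports whether a class visible on the classpath
// declares a constructor with the given parameter types.
public class ApiProbe {

    public static boolean hasConstructor(String className, Class<?>... paramTypes) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> cls = Class.forName(className, false, ApiProbe.class.getClassLoader());
            // getDeclaredConstructor also finds non-public (e.g. protected) constructors.
            cls.getDeclaredConstructor(paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Expected: true with Lucene 4.x on the classpath, false with Lucene 5.x (or no Lucene).
        boolean oldApi = hasConstructor("org.apache.lucene.analysis.Tokenizer", Reader.class);
        System.out.println("Tokenizer(Reader) constructor present: " + oldApi);
    }
}
```

Run it with the same lucene-core JAR that Solr uses on the classpath. A result of false means a tokenizer adapter compiled against Lucene 4.x that calls this constructor will fail exactly as shown above.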

So the solution is to download the latest version of Carrot2 (3.11.0), which is now built on Lucene 5.1.0:

    # git clone git://github.com/carrot2/carrot2.git
    # cd carrot2

Step 1: edit DefaultTokenizerFactory.java:

    # vi core/carrot2-util-text/src/org/carrot2/text/linguistic/DefaultTokenizerFactory.java

Add the import

    import org.carrot2.text.linguistic.lucene.InokChineseTokenizerAdapter;

and change

    map.put(LanguageCode.CHINESE_SIMPLIFIED,
        new NewClassInstanceFactory<ITokenizer>(ChineseTokenizerAdapter.class));

to

    map.put(LanguageCode.CHINESE_SIMPLIFIED,
        new NewClassInstanceFactory<ITokenizer>(InokChineseTokenizerAdapter.class));
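
The swap in step 1 works because the tokenizer factory keeps a map from language code to a factory that, as the name NewClassInstanceFactory suggests, creates a new instance of the registered class whenever a tokenizer is needed. The sketch below models that idea in plain Java so it runs without Carrot2 on the classpath. It is a simplification under stated assumptions: SimpleTokenizer, ToyChineseAdapter, newInstanceFactory and REGISTRY are hypothetical stand-ins for ITokenizer, InokChineseTokenizerAdapter, NewClassInstanceFactory and the factory's map, and the toy adapter emits one token per Han character instead of calling jcseg.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Simplified model of a language -> tokenizer-factory registry.
// None of these types are Carrot2 classes; they only mirror the idea behind step 1.
public class TokenizerRegistryDemo {

    public enum Language { ENGLISH, CHINESE_SIMPLIFIED }

    // Hypothetical stand-in for a pull-style tokenizer: reset with a Reader, then pull tokens.
    public interface SimpleTokenizer {
        void reset(Reader input) throws IOException;
        String nextToken(); // returns null at the end of the input
    }

    // Toy adapter: one token per Han character, Latin letters/digits grouped into words.
    // A real adapter would delegate segmentation to jcseg instead.
    public static class ToyChineseAdapter implements SimpleTokenizer {
        private String text = "";
        private int pos;

        @Override
        public void reset(Reader input) throws IOException {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            for (int n; (n = input.read(buf)) != -1; ) {
                sb.append(buf, 0, n);
            }
            text = sb.toString();
            pos = 0;
        }

        @Override
        public String nextToken() {
            while (pos < text.length() && !Character.isLetterOrDigit(text.charAt(pos))) {
                pos++; // skip whitespace and punctuation
            }
            if (pos >= text.length()) {
                return null;
            }
            if (isHan(text.charAt(pos))) {
                return String.valueOf(text.charAt(pos++));
            }
            int start = pos;
            while (pos < text.length() && Character.isLetterOrDigit(text.charAt(pos))
                    && !isHan(text.charAt(pos))) {
                pos++;
            }
            return text.substring(start, pos);
        }

        private static boolean isHan(char c) {
            return Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN;
        }
    }

    // Like a "new class instance" factory: every call creates a fresh tokenizer.
    static <T> Supplier<T> newInstanceFactory(Class<? extends T> cls) {
        return () -> {
            try {
                return cls.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
            }
        };
    }

    static final Map<Language, Supplier<SimpleTokenizer>> REGISTRY = new EnumMap<>(Language.class);

    static {
        // The equivalent of step 1 (only Simplified Chinese is registered in this sketch).
        REGISTRY.put(Language.CHINESE_SIMPLIFIED, newInstanceFactory(ToyChineseAdapter.class));
    }

    public static List<String> tokenize(Language lang, String text) {
        SimpleTokenizer tokenizer = REGISTRY.get(lang).get();
        List<String> tokens = new ArrayList<>();
        try {
            tokenizer.reset(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String t; (t = tokenizer.nextToken()) != null; ) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(Language.CHINESE_SIMPLIFIED, "Solr \u4e2d\u6587 test"));
    }
}
```

For the input "Solr 中文 test", tokenize returns [Solr, 中, 文, test]. Changing the single put() call is enough to change which class every caller gets, which is why step 1 only touches one line of DefaultTokenizerFactory.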

Step 2: write InokChineseTokenizerAdapter.java and copy it into the Carrot2 source tree:

    # vi InokChineseTokenizerAdapter.java
    # cp chineseTokenizer/InokChineseTokenizerAdapter.java ./core/carrot2-util-text/src/org/carrot2/text/linguistic/lucene/

Step 3: create the directory lib/org.lionsoul.jcseg:

    # mkdir lib/org.lionsoul.jcseg

It should contain:

    lib/org.lionsoul.jcseg
    ├── build.properties
    ├── jcseg-core-1.9.6.jar
    ├── jcseg.LICENSE
    └── META-INF
        └── MANIFEST.MF

The files are as follows.

build.properties:

    bin.includes = META-INF/,\
                   jcseg-core-1.9.6.jar,\
                   jcseg.LICENSE

META-INF/MANIFEST.MF:

    Manifest-Version: 1.0
    Bundle-ManifestVersion: 2
    Bundle-Name: Jcseg Tokenizer
    Bundle-SymbolicName: org.lionsoul.jcseg
    Bundle-Version: 1.9.6
    Bundle-ClassPath: jcseg-core-1.9.6.jar
    Bundle-Vendor: INokNok Inc.
    Bundle-RequiredExecutionEnvironment: JavaSE-1.6
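
Both files use standard Java formats: build.properties is a java.util.Properties file (the trailing backslashes continue the value onto the next line), and MANIFEST.MF follows the JAR manifest format. A quick sanity check can parse them with the JDK's own classes and confirm that the JAR named in Bundle-ClassPath is also listed in bin.includes, which seems a reasonable consistency check for this bundle. The class and method names below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.jar.Manifest;

// Hypothetical sanity check for the bundle metadata created in step 3.
public class BundleMetadataCheck {

    // Returns the comma-separated entries of bin.includes from build.properties text.
    public static List<String> binIncludes(String buildProperties) {
        Properties props = new Properties();
        try {
            // Properties.load joins lines that end with a backslash into one value.
            props.load(new StringReader(buildProperties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.asList(props.getProperty("bin.includes", "").trim().split("\\s*,\\s*"));
    }

    // Reads one main-section attribute (e.g. Bundle-ClassPath) from MANIFEST.MF text.
    public static String manifestAttribute(String manifestText, String name) {
        try {
            Manifest mf = new Manifest(new ByteArrayInputStream(
                    manifestText.getBytes(StandardCharsets.UTF_8)));
            return mf.getMainAttributes().getValue(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String buildProperties = "bin.includes = META-INF/,\\\n"
                + "               jcseg-core-1.9.6.jar,\\\n"
                + "               jcseg.LICENSE\n";
        // Terminate the last line with a newline, as the JAR manifest format requires.
        String manifest = "Manifest-Version: 1.0\n"
                + "Bundle-SymbolicName: org.lionsoul.jcseg\n"
                + "Bundle-ClassPath: jcseg-core-1.9.6.jar\n";

        List<String> includes = binIncludes(buildProperties);
        String classPath = manifestAttribute(manifest, "Bundle-ClassPath");
        // The JAR named in Bundle-ClassPath should also be packaged via bin.includes.
        System.out.println("bin.includes = " + includes);
        System.out.println(classPath + " listed in bin.includes: " + includes.contains(classPath));
    }
}
```

In practice the two strings would be read from the real files with Files.readAllBytes; they are inlined here so the sketch runs on its own.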

Step 4: modify build.xml, adding the jcseg entries shown below to these pattern sets and to the core target:

    <patternset id="lib.test">
      <include name="core/**/*.jar" />
      <include name="lib/**/*.jar" />
      <include name="lib/org.lionsoul.jcseg/*.jar" />
      <exclude name="lib/org.slf4j/slf4j-nop*" />
      <include name="applications/carrot2-dcs/**/*.jar" />
      <include name="applications/carrot2-webapp/lib/*.jar" />
      <include name="applications/carrot2-benchmarks/lib/*.jar" />
    </patternset>

    <patternset id="lib.core">
      <include name="lib/**/*.jar" />
      <include name="lib/org.lionsoul.jcseg/*.jar" />
      <include name="core/carrot2-util-matrix/lib/*.jar" />
      <patternset refid="lib.core.excludes" />
    </patternset>

    <patternset id="lib.core.mini">
      <include name="lib/**/mahout-*.jar" />
      <include name="lib/**/jcseg*.jar" />
      <include name="lib/**/mahout.LICENSE" />
      <include name="lib/**/colt.LICENSE" />
      <include name="lib/**/commons-lang*" />
      <include name="lib/**/guava*" />
      <include name="lib/**/jackson*" />
      <include name="lib/**/lucene-snowball*" />
      <include name="lib/**/lucene.LICENSE" />
      <include name="lib/**/hppc-*.jar" />
      <include name="lib/**/hppc*.LICENSE" />

      <include name="lib/**/slf4j-api*.jar" />
      <include name="lib/**/slf4j-nop*.jar" />
      <include name="lib/**/slf4j.LICENSE" />

      <include name="lib/**/attributes-binder-*.jar" />
    </patternset>

    <target name="core" depends="jar, jar.src, lib-no-jar.flattened" description="Builds Carrot2 Java API JAR with dependencies">
      <delete dir="${api.dir}" failonerror="false" />
      <mkdir dir="${api.dir}" />
      <mkdir dir="${api.dir}/lib" />
      <mkdir dir="${api.dir}/examples" />
      <mkdir dir="${api.dir}/resources" />

      <patternset id="carrot2.required">
        <include name="**/jcseg*" />
        <include name="**/commons-lang*" />
        ...
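
In Ant, lib/**/*.jar already matches every JAR at any depth under lib/, so the dedicated lib/org.lionsoul.jcseg/*.jar lines in lib.test and lib.core appear to be redundant but harmless; the entries added to lib.core.mini and carrot2.required matter more, because those sets only list specific files. The check below approximates this with the JDK's glob PathMatcher rather than Ant itself. PatternCheck is a hypothetical name, and java.nio globs differ from Ant patterns: here lib/**/*.jar does not match a JAR sitting directly inside lib/.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper that approximates the Ant include patterns from step 4
// with java.nio glob matching. Caveat: unlike Ant, a glob such as lib/**/*.jar
// needs at least one directory between lib/ and the file name.
public class PatternCheck {

    // True if the relative path matches the glob (java.nio semantics, not Ant's).
    public static boolean matches(String glob, String relativePath) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(relativePath));
    }

    public static void main(String[] args) {
        String jar = "lib/org.lionsoul.jcseg/jcseg-core-1.9.6.jar";
        String[] patterns = {
            "lib/**/*.jar",                 // generic include in lib.test and lib.core
            "lib/org.lionsoul.jcseg/*.jar", // dedicated include added in step 4
            "lib/**/jcseg*.jar",            // include added to lib.core.mini
            "**/jcseg*"                     // include added to carrot2.required
        };
        for (String p : patterns) {
            System.out.println(p + " matches " + jar + ": " + matches(p, jar));
        }
    }
}
```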

Step 6: build the JAR and copy it into Solr's clustering contrib library directory:

    # ant jar
    # scp tmp/jar/carrot2-core-3.11.0-SNAPSHOT.jar [email protected]:/opt/solr/contrib/clustering/lib
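
Before copying, it may be worth confirming that `ant jar` really compiled the new adapter into the JAR. A minimal sketch with java.util.jar.JarFile follows; JarContentCheck is a hypothetical name, and the entry name is simply the package from step 1 turned into a path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists required entries that are missing from a JAR.
public class JarContentCheck {

    // Reads all entry names (e.g. "org/carrot2/.../Foo.class") from a JAR file.
    public static Set<String> entryNames(Path jar) {
        try (JarFile jf = new JarFile(jar.toFile())) {
            Set<String> names = new HashSet<>();
            for (JarEntry e : Collections.list(jf.entries())) {
                names.add(e.getName());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the required entries that are absent, preserving their order.
    public static List<String> missingEntries(Collection<String> entries, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String r : required) {
            if (!entries.contains(r)) {
                missing.add(r);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Path jar = Paths.get(args.length > 0 ? args[0] : "tmp/jar/carrot2-core-3.11.0-SNAPSHOT.jar");
        List<String> required = Collections.singletonList(
                "org/carrot2/text/linguistic/lucene/InokChineseTokenizerAdapter.class");
        List<String> missing = missingEntries(entryNames(jar), required);
        System.out.println(missing.isEmpty() ? "OK: adapter found in " + jar : "Missing: " + missing);
    }
}
```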

Then restart the Solr server to test clustering.

-----------------------------

An error happens:
org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: com/carrotsearch/hppc/ObjectHashSet

Solution: copy the newer hppc JAR that ships with Carrot2 3.11.0 into the same directory and remove the old one:

    # scp lib/com.carrotsearch.hppc/hppc-0.7.1.jar [email protected]:/opt/solr/contrib/clustering/lib/
    # rm -f /opt/solr/contrib/clustering/lib/hppc-0.5.2.jar

------

Another error is:
java.lang.RuntimeException: java.lang.IllegalAccessError: class com.carrotsearch.hppc.ObjectHashSet cannot access its superclass com.carrotsearch.hppc.AbstractObjectCollection

The reason is that there is still an old hppc-0.5.2.jar inside /opt/solr/server/webapps/solr.war. So the solution is to replace it in the extracted webapp, repack the WAR, and move it back:

    # cd /opt/solr/server/solr-webapp/webapp
    # rm -f WEB-INF/lib/hppc-0.5.2.jar
    # cp hppc-0.7.1.jar WEB-INF/lib
    # jar cf solr.war ./
    # mv solr.war /opt/solr/server/webapps

After restarting Solr, the error disappears.
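
Both hppc errors come down to old and new copies of the same library being visible to Solr at the same time, first in contrib/clustering/lib and then inside the webapp's WEB-INF/lib; an IllegalAccessError on a superclass is a typical symptom of classes from two versions getting mixed. A quick scan for JARs that appear in more than one version across those directories can catch this before restarting Solr. The sketch below is a hypothetical helper (DuplicateJarCheck is not a Solr tool) and assumes JAR names follow the usual artifact-version.jar convention.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper: finds libraries present in more than one distinct version.
public class DuplicateJarCheck {

    // "artifact-1.2.3[-qualifier].jar": group 1 = artifact, group 2 = version.
    private static final Pattern JAR_NAME = Pattern.compile("(.+?)-(\\d+(?:\\.\\d+)*.*)\\.jar");

    // Maps each artifact that occurs in two or more distinct versions to those versions.
    public static Map<String, Set<String>> conflicts(List<String> jarNames) {
        Map<String, Set<String>> byArtifact = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = JAR_NAME.matcher(name);
            if (m.matches()) {
                byArtifact.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(m.group(2));
            }
        }
        // Identical copies of one version are not reported; only differing versions are.
        byArtifact.values().removeIf(versions -> versions.size() < 2);
        return byArtifact;
    }

    public static void main(String[] args) throws IOException {
        // Pass the directories Solr loads JARs from, e.g.
        // /opt/solr/contrib/clustering/lib /opt/solr/server/solr-webapp/webapp/WEB-INF/lib
        List<String> names = new ArrayList<>();
        for (String dir : args) {
            try (Stream<Path> files = Files.list(Paths.get(dir))) {
                files.map(p -> p.getFileName().toString())
                     .filter(n -> n.endsWith(".jar"))
                     .forEach(names::add);
            }
        }
        conflicts(names).forEach((artifact, versions) ->
                System.out.println("Multiple versions of " + artifact + ": " + versions));
    }
}
```

Run against the two directories before the fix, it would report hppc with versions [0.5.2, 0.7.1].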

Reposted from ylzhj02.iteye.com/blog/2223353