Summary of problems with Lucene grouping queries


A recent requirement called for Lucene's grouping query, which the existing API exposes through GroupingSearch. The code is as follows:
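import java.io.File;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.grouping.GroupDocs;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.search.similarities.ClassicSimilarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.lucene.util.BytesRef;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class GroupSearchTest2 {

    public static void main(String[] args) throws Exception {
        // Group hits by compId; groups are ordered by relevance score
        GroupingSearch groupingSearch = new GroupingSearch("compId");
        groupingSearch.setGroupSort(new Sort(SortField.FIELD_SCORE));
        groupingSearch.setFillSortFields(true);
        // groupingSearch.setCachingInMB(8.0, true);
        groupingSearch.setAllGroups(true); // also compute the total number of groups
        // groupingSearch.setAllGroupHeads(true);
        groupingSearch.setGroupDocsLimit(10); // return at most 10 docs per group

        IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
        config.setOpenMode(OpenMode.CREATE_OR_APPEND);
        LogByteSizeMergePolicy mergePolicy = new LogByteSizeMergePolicy();
        mergePolicy.setMergeFactor(5);
        config.setMergePolicy(mergePolicy);
        config.setSimilarity(new ClassicSimilarity());

        // new NIOFSDirectory(...) rather than NIOFSDirectory.open(...): open() is the static
        // FSDirectory.open(), which may pick a different Directory implementation
        try (Directory directory = new NIOFSDirectory(
                     new File("/Users/lvyanglin/searchdata/search/offlineindex").toPath());
             IndexWriter indexWriter = new IndexWriter(directory, config);
             IndexReader reader = DirectoryReader.open(indexWriter)) {
            IndexSearcher isearcher = new IndexSearcher(reader);
            Query query = new TermQuery(new Term("id", "20755185"));

            // The exception below is thrown from this call
            TopGroups<BytesRef> result = groupingSearch.search(isearcher, query, 0, 1000);

            System.out.println("Search hit count: " + result.totalHitCount);
            System.out.println("Number of search result groups: " + result.groups.length);

            for (GroupDocs<BytesRef> groupDocs : result.groups) {
                BytesRef groupValue = groupDocs.groupValue; // null for docs without compId
                System.out.println("Group: " + (groupValue == null ? null : groupValue.utf8ToString()));
                System.out.println("Records in group: " + groupDocs.totalHits);
                for (ScoreDoc scoreDoc : groupDocs.scoreDocs) {
                    System.out.println("compId=" + isearcher.doc(scoreDoc.doc).get("compId"));
                }
            }
        }
    }
}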


However, no matter how I debugged it, the following exception was always thrown:

Exception in thread "main" java.lang.IllegalStateException: unexpected docvalues type NONE for field 'compId' (expected=SORTED). Re-index with correct docvalues type.
	at org.apache.lucene.index.DocValues.checkField(DocValues.java:212)
	at org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)
	at org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector.doSetNextReader(TermFirstPassGroupingCollector.java:91)
	at org.apache.lucene.search.SimpleCollector.getLeafCollector(SimpleCollector.java:33)
	at org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:121)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:660)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:473)
	at org.apache.lucene.search.grouping.GroupingSearch.groupByFieldOrFunction(GroupingSearch.java:193)
	at org.apache.lucene.search.grouping.GroupingSearch.search(GroupingSearch.java:129)
	at com.shunteng.service.test.v3.GroupSearchTest2.main(GroupSearchTest2.java:79)



Grouping requires the group field to be indexed with doc values, and the expected type is SortedDocValuesField. Because compId is a long field, I indexed it with SortedNumericDocValuesField instead, but the error above still occurred after running. Searching Baidu turned up essentially nothing, the official website offers no explanation, and the official wiki has no example. The only option left was to debug the code and trace where the exception is raised. The method that throws it directly is checkField in DocValues.java:

private static void checkField(LeafReader in, String field, DocValuesType... expected) {
    FieldInfo fi = in.getFieldInfos().fieldInfo(field);
    if (fi != null) {
      DocValuesType actual = fi.getDocValuesType();
      throw new IllegalStateException("unexpected docvalues type " + actual +
                                        " for field '" + field + "' " +
                                        (expected.length == 1
                                        ? "(expected=" + expected[0]
                                        : "(expected one of " + Arrays.toString(expected)) + "). " +
                                        "Re-index with correct docvalues type.");
    }
  }
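As checkField shows, the exception is raised whenever the field exists in the segment with a doc values type other than the expected one, so switching compId to SortedNumericDocValuesField cannot help. The following in-memory repro is a minimal sketch, assuming Lucene 6.x with lucene-core, lucene-grouping and lucene-analyzers-common on the classpath (RAMDirectory no longer exists in Lucene 9). The class name SortedNumericGroupingRepro and the sample compId values are hypothetical.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Hypothetical repro class; assumes Lucene 6.x APIs.
public class SortedNumericGroupingRepro {

    /** Returns the IllegalStateException message, or null if grouping unexpectedly succeeds. */
    public static String groupOnSortedNumeric() throws Exception {
        try (Directory dir = new RAMDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (long compId : new long[] {1001L, 1001L, 1002L}) {
                Document doc = new Document();
                // Numeric doc values: the field's DocValuesType becomes SORTED_NUMERIC, not SORTED
                doc.add(new SortedNumericDocValuesField("compId", compId));
                writer.addDocument(doc);
            }
            writer.commit();
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                GroupingSearch groupingSearch = new GroupingSearch("compId");
                try {
                    groupingSearch.search(new IndexSearcher(reader), new MatchAllDocsQuery(), 0, 10);
                    return null;
                } catch (IllegalStateException e) {
                    // Raised by DocValues.checkField, reached through DocValues.getSorted
                    return e.getMessage();
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(groupOnSortedNumeric());
    }
}
```

The message should name SORTED_NUMERIC as the actual type and SORTED as the expected one, where the original error reported NONE.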

checkField is called from getSorted, shown below. Once execution reaches checkField and the field exists in the segment, an exception is guaranteed, so the real question is why getSorted calls it:

 public static SortedDocValues getSorted(LeafReader reader, String field) throws IOException {
    SortedDocValues dv = reader.getSortedDocValues(field);
    if (dv == null) {
      checkField(reader, field, DocValuesType.SORTED);
      return emptySorted();
    } else {
      return dv;
    }
  }
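getSorted only falls through to checkField when no SORTED doc values are available, and checkField then reports the type recorded in the segment's FieldInfo. That same FieldInfo can be read directly to check, segment by segment, how a field was actually indexed before running any grouping query. A minimal sketch, assuming Lucene 6.x with lucene-core and lucene-analyzers-common on the classpath; the class and method names are hypothetical. The demo indexes compId only as a StringField, which carries no doc values and therefore reports NONE, the type named in the original exception.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.DocValuesType;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Hypothetical diagnostic helper; assumes Lucene 6.x APIs.
public class DocValuesTypeCheck {

    /** Per-segment DocValuesType of a field; a null entry means the segment lacks the field. */
    public static List<DocValuesType> docValuesTypes(IndexReader reader, String field) {
        List<DocValuesType> types = new ArrayList<>();
        for (LeafReaderContext ctx : reader.leaves()) {
            // The same lookup DocValues.checkField performs before throwing
            FieldInfo fi = ctx.reader().getFieldInfos().fieldInfo(field);
            types.add(fi == null ? null : fi.getDocValuesType());
        }
        return types;
    }

    /** Indexes compId as a plain StringField (no doc values) and inspects it. */
    public static List<DocValuesType> demo() throws Exception {
        try (Directory dir = new RAMDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("compId", "1001", Field.Store.YES));
            writer.addDocument(doc);
            writer.commit(); // one document, one commit: a single segment
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return docValuesTypes(reader, "compId");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints [NONE]
    }
}
```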



The deciding condition in getSorted is the return value of dv = reader.getSortedDocValues(field): checkField is reached only when it returns null. That method looks like this:


   @Override
  public final SortedDocValues getSortedDocValues(String field) throws IOException {
    ensureOpen();
    Map<String,Object> dvFields = docValuesLocal.get();
    
    Object previous = dvFields.get(field);
    if (previous != null && previous instanceof SortedDocValues) {
      return (SortedDocValues) previous;
    } else {
      FieldInfo fi = getDVField(field, DocValuesType.SORTED);
      if (fi == null) {
        return null;
      }
      SortedDocValues dv = getDocValuesReader().getSorted(fi);
      dvFields.put(field, dv);
      return dv;
    }
  }


getSortedDocValues returns a non-null value only if a SortedDocValues instance for the field is already cached (the previous instanceof SortedDocValues branch), or if getDVField finds the field with doc values type SORTED, which is what SortedDocValuesField produces. In every other case it returns null, and getSorted then calls checkField. So the grouping field must be indexed as a SortedDocValuesField; a field indexed with SortedNumericDocValuesField will not work, which is one of Lucene's quirks.
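A minimal sketch of an index that groups correctly, assuming Lucene 6.x with lucene-core, lucene-grouping and lucene-analyzers-common on the classpath. compId is written twice under the same name: as a StringField, so that TermQuery and doc.get("compId") keep working, and as a SortedDocValuesField holding the decimal string of the long, which is what DocValues.getSorted expects. The class name, sample values and the decimal-string encoding are illustrative choices, not taken from the article.

```java
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.grouping.GroupDocs;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;

// Hypothetical example class; assumes Lucene 6.x APIs (GroupDocs.totalHits is an int there).
public class SortedGroupingFix {

    /** Groups three sample docs by compId and returns group value -> hit count. */
    public static Map<String, Integer> groupCounts() throws Exception {
        try (Directory dir = new RAMDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (long compId : new long[] {1001L, 1001L, 1002L}) {
                String value = Long.toString(compId);
                Document doc = new Document();
                // Indexed and stored term, for TermQuery and doc.get("compId")
                doc.add(new StringField("compId", value, Field.Store.YES));
                // SORTED doc values: the type DocValues.getSorted accepts for grouping
                doc.add(new SortedDocValuesField("compId", new BytesRef(value)));
                writer.addDocument(doc);
            }
            writer.commit();
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                GroupingSearch groupingSearch = new GroupingSearch("compId");
                groupingSearch.setGroupDocsLimit(10);
                TopGroups<BytesRef> result =
                        groupingSearch.search(new IndexSearcher(reader), new MatchAllDocsQuery(), 0, 1000);
                Map<String, Integer> counts = new TreeMap<>();
                for (GroupDocs<BytesRef> group : result.groups) {
                    counts.put(group.groupValue.utf8ToString(), group.totalHits);
                }
                return counts;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(groupCounts()); // prints {1001=2, 1002=1}
    }
}
```

Because the group values are now compared as bytes, sorting groups by compId itself would be lexicographic ("10" before "9"); zero-padding to a fixed width is one way to keep numeric order. The article's code sorts groups by score, so this does not affect it.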
I wrote this article to summarize my recent experience tracking down problems. When using Alibaba Dubbo I also ran into a problem for which no information could be found anywhere. In such cases, debugging step by step down to the root cause is more reliable, and often faster, than searching Baidu.

