Lucene Study Notes 4: A Getting-Started Lucene Example

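Building on Lucene Study Notes 3: Full-Text Search Explained, this post puts together an introductory Lucene example. It covers two parts, building an index and searching it, and tokenization (analysis) plays a role in both.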

Building a Lucene index

Information source

Before we can collect anything, we need an information source. Here, all the files in one folder on the hard disk serve as that source.

File f = new File("E:/lucene/example");
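There are two edge cases to watch for. First, `File.listFiles()` returns `null` when the path does not denote a directory (or an I/O error occurs), so a loop over `f.listFiles()` would throw a NullPointerException. Second, `new FileReader(file)` throws FileNotFoundException when `file` is a sub-directory.

Below is a minimal, standard-library-only sketch of a safer way to gather the source files. It assumes we only want regular files directly inside the folder, with no recursion into sub-folders. The class name `SourceFiles` and the method `listRegularFiles` are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects the regular files directly inside a folder.
public class SourceFiles {

    public static List<File> listRegularFiles(File folder) {
        List<File> result = new ArrayList<File>();
        // listFiles() returns null if the folder is missing or not a directory
        File[] entries = folder.listFiles();
        if (entries == null) {
            return result;
        }
        for (File entry : entries) {
            // Skip sub-directories: FileReader cannot open them
            if (entry.isFile()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (File file : listRegularFiles(new File("E:/lucene/example"))) {
            System.out.println(file.getName());
        }
    }
}
```

The indexing loop could then iterate over `listRegularFiles(f)` instead of `f.listFiles()`.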

Processing

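The collected information has to be stored in the index in the form Lucene expects, so we create a corresponding Document object for each item. We then decide what goes into that document so that it is complete but free of junk. For a web page, for example, we might store the title, the content and the URL, but not the ads. Each of these items is stored in a Field.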

Note: a Document holds a list of (named) Fields:

List<Fieldable> fields = new ArrayList<Fieldable>();
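To make this concrete, here is a toy, standard-library-only model of a document as an ordered list of named fields. Each field carries two flags, "stored" and "analyzed", which are similar in spirit to `Field.Store` and `Field.Index`.

This is not Lucene's `Document` class. `ToyDocument`, `ToyField` and `storedOnly` are hypothetical names. The sketch illustrates one point that matters later: only stored fields come back with a document retrieved at search time. That is why the search code can print `fileName` and `path` (both `Store.YES`) but not `content`, which is indexed from a Reader and not stored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT Lucene's Document:
// a document is just an ordered list of named fields.
public class ToyDocument {

    public static final class ToyField {
        final String name;
        final String value;
        final boolean stored;   // keep the original value in the index?
        final boolean analyzed; // run the value through an analyzer?

        public ToyField(String name, String value, boolean stored, boolean analyzed) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.analyzed = analyzed;
        }
    }

    private final List<ToyField> fields = new ArrayList<ToyField>();

    public void add(ToyField field) {
        fields.add(field);
    }

    // Value of the first field with this name, or null if there is none.
    public String get(String name) {
        for (ToyField f : fields) {
            if (f.name.equals(name)) {
                return f.value;
            }
        }
        return null;
    }

    // What a search hands back in this toy model: only the stored fields.
    public ToyDocument storedOnly() {
        ToyDocument copy = new ToyDocument();
        for (ToyField f : fields) {
            if (f.stored) {
                copy.add(f);
            }
        }
        return copy;
    }
}
```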

Tokenization (analysis)

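Once a document has been prepared, its text is tokenized. Which analyzer should we use? English and Chinese text may well need different analyzers; that will be covered in later posts. Here we use the standard analyzer that Lucene provides (StandardAnalyzer).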
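Here is a rough idea of what an analyzer does: it turns text into a list of normalized terms. The following standard-library sketch splits on anything that is not a letter or digit, lowercases, and drops a few English stop words.

It is a much-simplified, hypothetical stand-in (`ToyAnalyzer` and its stop-word list are assumptions), not StandardAnalyzer itself. Lucene's StandardTokenizer follows Unicode word-break rules, and for Chinese it emits each Han character as a separate token. This sketch would instead keep an unbroken run of Han characters as one token. Differences like this are one reason dedicated Chinese analyzers exist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, much-simplified analyzer: split, lowercase, drop stop words.
public class ToyAnalyzer {

    // A small sample of English stop words (assumed list, not Lucene's exact one).
    private static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        // Any run of characters that are not letters or digits acts as a separator
        for (String word : text.split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) {
                continue; // a leading separator yields an empty first element
            }
            String token = word.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Java Index, and Lucene!")); // [java, index, lucene]
    }
}
```

The same analysis must be applied at index time and at query time, which is why both the indexing and the search code below pass a StandardAnalyzer.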

Index

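To write documents into an index, tokenize them with the analyzer and build the index, we first need a place to keep the index. In Lucene this is a Directory, which can live either in memory or on disk.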

//Directory directory = new RAMDirectory(); // in memory

Directory directory = FSDirectory.open(new File("E:/lucene/index01")); // on disk

Complete code for building the index:

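import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

public void index() {
    IndexWriter writer = null;
    try {
        // 1. Create the Directory: where the index is saved
        // Directory directory = new RAMDirectory(); // in memory
        Directory directory = FSDirectory.open(new File("E:/lucene/index01"));
        // 2. Create the IndexWriter, used to modify the index (add, delete, update)
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_36,
                new StandardAnalyzer(Version.LUCENE_36));
        writer = new IndexWriter(directory, iwc);
        // 3. Create the Document object
        Document doc = null;
        // 4. Add Fields to the Document
        File f = new File("E:/lucene/example");
        for (File file : f.listFiles()) {
            doc = new Document();
            // File contents read through a Reader: tokenized and indexed, not stored
            doc.add(new Field("content", new FileReader(file)));
            // File name and path: stored, indexed as a single untokenized term
            doc.add(new Field("fileName", file.getName(),
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("path", file.getAbsolutePath(),
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            // 5. Add the document to the index through the IndexWriter
            writer.addDocument(doc);
        }
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (LockObtainFailedException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (writer != null)
            try {
                writer.close();
            } catch (CorruptIndexException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
    }
}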
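Conceptually, what IndexWriter builds for an analyzed field such as `content` is an inverted index: a map from each term to the documents that contain it. The stored `fileName` values sit alongside it, keyed by an internal document number.

The standard-library toy below shows that idea only. The names are hypothetical, and the tokenizer is deliberately naive: it lowercases and splits on non-letters/digits, with no stop words. Lucene's real index is far more elaborate, with term dictionaries, postings that carry frequencies and positions, and multiple segments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy inverted index: term -> sorted set of document ids.
public class ToyInvertedIndex {

    private final Map<String, TreeSet<Integer>> postings = new TreeMap<String, TreeSet<Integer>>();
    // A "stored field": file names, looked up by document id
    private final List<String> fileNames = new ArrayList<String>();

    // Adds one document and returns its id (0, 1, 2, ...)
    public int addDocument(String fileName, String content) {
        int docId = fileNames.size();
        fileNames.add(fileName);
        for (String word : content.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) {
                continue;
            }
            TreeSet<Integer> ids = postings.get(word);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(word, ids);
            }
            ids.add(docId);
        }
        return docId;
    }

    // Ids of the documents containing the term (a copy; empty if none)
    public TreeSet<Integer> docsContaining(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new TreeSet<Integer>() : new TreeSet<Integer>(ids);
    }

    public String fileName(int docId) {
        return fileNames.get(docId);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("a.txt", "Java and Lucene");
        index.addDocument("b.txt", "Lucene in Action");
        System.out.println(index.docsContaining("lucene")); // [0, 1]
    }
}
```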

Searching the index

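Once the index is built, the next step is searching. The search takes three inputs: a keyword (in the example below, "java"), the field to search (content), and an analyzer (StandardAnalyzer). An IndexReader reads the index, and the search runs over the documents it contains. We then iterate over the matching documents and print each one's file name and path to the console.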
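In the code below, `searcher.search(query, 2)` puts at most the 2 best-scoring hits into `scoreDocs`. However, `totalHits` still counts every matching document, so the printed total can be larger than the number of result lines that follow.

Here is a standard-library sketch of that "top N plus total count" behavior. It uses a toy scoring rule, counting how often the term occurs in the document, instead of Lucene's real TF-IDF-based scoring. `ToyTopDocs`, `search` and `score` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of "top N hits + total hit count".
public class ToyTopDocs {

    public final int totalHits;         // number of matching documents
    public final List<Integer> topDocs; // ids of at most n best matches, best first

    private ToyTopDocs(int totalHits, List<Integer> topDocs) {
        this.totalHits = totalHits;
        this.topDocs = topDocs;
    }

    // Toy score: how many times the (lowercase) term occurs in the document
    static int score(String content, String term) {
        int count = 0;
        for (String word : content.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (word.equals(term)) {
                count++;
            }
        }
        return count;
    }

    public static ToyTopDocs search(List<String> docs, String term, int n) {
        String t = term.toLowerCase(Locale.ROOT);
        List<int[]> hits = new ArrayList<int[]>(); // each entry: {docId, score}
        for (int id = 0; id < docs.size(); id++) {
            int s = score(docs.get(id), t);
            if (s > 0) {
                hits.add(new int[] {id, s});
            }
        }
        // Higher score first; ties broken by lower document id
        hits.sort((a, b) -> a[1] != b[1] ? Integer.compare(b[1], a[1]) : Integer.compare(a[0], b[0]));
        List<Integer> top = new ArrayList<Integer>();
        for (int i = 0; i < Math.min(n, hits.size()); i++) {
            top.add(hits.get(i)[0]);
        }
        return new ToyTopDocs(hits.size(), top);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("java java", "lucene", "java", "learning java");
        ToyTopDocs tds = search(docs, "java", 2);
        System.out.println("Total: [" + tds.totalHits + "] matching results, showing " + tds.topDocs);
    }
}
```

With these four documents, three match "java", but only the two best are returned. This mirrors why the Lucene example can report more matches than it prints.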

Complete search code:

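import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public void searcher() {
    IndexReader reader = null;
    try {
        // 1. Create the Directory: where to search
        Directory directory = FSDirectory.open(new File("E:/lucene/index01"));
        // 2. Create the IndexReader
        reader = IndexReader.open(directory);
        // 3. Create the IndexSearcher from the IndexReader
        IndexSearcher searcher = new IndexSearcher(reader);
        // 4. Create the search Query
        // The parser interprets the query text; the second argument is the field to search
        QueryParser parser = new QueryParser(Version.LUCENE_36, "content",
                new StandardAnalyzer(Version.LUCENE_36));
        // Query for documents whose content field contains "java"
        Query query = parser.parse("java");
        // 5. Search with the searcher; TopDocs keeps at most the top 2 hits
        TopDocs tds = searcher.search(query, 2);
        System.out.println("Total: [" + tds.totalHits + "] matching results");
        // 6. Get the ScoreDoc array from TopDocs
        ScoreDoc[] sds = tds.scoreDocs;
        for (ScoreDoc sd : sds) {
            // 7. Get the Document from the searcher and the ScoreDoc
            Document d = searcher.doc(sd.doc); // sd.doc: internal document number
            // 8. Read the needed (stored) values from the Document
            System.out.println(d.get("fileName") + "[" + d.get("path") + "]");
        }
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ParseException e) {
        e.printStackTrace();
    } finally {
        // 9. Close the reader, also when an exception was thrown
        if (reader != null)
            try {
                reader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
    }
}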

Original article: Lucene Study Notes 4: A Getting-Started Lucene Example. Written by huangyineng; please credit the source when reposting.
