Search tool: the IndexSearcher class

The IndexSearcher class extends the Searcher base class and is the most important search class in Lucene.

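When initializing an IndexSearcher, the most important thing is to tell it where the index is stored; only then can the search tool locate the index and carry out the lookup. The following are all of the IndexSearcher constructors: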

public IndexSearcher(String path) throws IOException {
    this(IndexReader.open(path), true);
}

public IndexSearcher(Directory directory) throws IOException {
    this(IndexReader.open(directory), true);
}

public IndexSearcher(IndexReader r) {
    this(r, false);
}

private IndexSearcher(IndexReader r, boolean closeReader) {
    reader = r;
    this.closeReader = closeReader;
}

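As shown, IndexSearcher has four constructors in total for initializing an IndexSearcher object; three are public and the fourth is private.

The first is the simplest: it takes the path where the index is stored as its argument.

The second builds the IndexSearcher from an object of type Directory.

The third initializes the IndexSearcher directly from an IndexReader.

The fourth extends the third with a boolean switch that determines whether closing the IndexSearcher should also close the IndexReader it holds.

In other words, whatever the type of argument passed in, IndexSearcher ultimately uses an IndexReader as the actual reader of the index directory.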

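The first two constructors build an IndexReader from their argument, while the third receives one directly; all three then call the fourth constructor to finish initializing the IndexSearcher.

Once initialization is complete, a Query object must be built before searching. Building Query objects is covered in detail in later chapters; this section only introduces the search functionality of IndexSearcher.

To search with IndexSearcher, first initialize an IndexSearcher object. Listing 11.1 serves as the example for a detailed walkthrough.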

Listing 11.1 Searching with IndexSearcher

package ch11;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class IndexSearcherTest1 {

    public static void main(String[] args) throws Exception {
        // Build 5 different documents, each with two fields:
        // the title field holds the name of the document,
        // the name field holds the text to be indexed.

        // Create doc1 and add its "name" and "title" fields
        Document doc1 = new Document();
        doc1.add(Field.Text("name", "word1 word2 word3"));
        doc1.add(Field.Keyword("title", "doc1"));

        // Create doc2 and add its "name" and "title" fields
        Document doc2 = new Document();
        doc2.add(Field.Text("name", "word4 word5 word6"));
        doc2.add(Field.Keyword("title", "doc2"));

        // Create doc3 and add its "name" and "title" fields
        Document doc3 = new Document();
        doc3.add(Field.Text("name", "word1 word4"));
        doc3.add(Field.Keyword("title", "doc3"));

        // Create doc4 and add its "name" and "title" fields
        Document doc4 = new Document();
        doc4.add(Field.Text("name", "word2 word5"));
        doc4.add(Field.Keyword("title", "doc4"));

        // Create doc5 and add its "name" and "title" fields
        Document doc5 = new Document();
        doc5.add(Field.Text("name", "word3 word6"));
        doc5.add(Field.Keyword("title", "doc5"));

        // Create an index writer
        IndexWriter writer = new IndexWriter("c:\\index",
                new StandardAnalyzer(), true);

        // Add the documents created above to the index, in order
        writer.addDocument(doc1);
        writer.addDocument(doc2);
        writer.addDocument(doc3);
        writer.addDocument(doc4);
        writer.addDocument(doc5);
        writer.close();

        // The query object
        Query query = null;
        // The Hits object that holds the results returned by a search
        Hits hits = null;

        // Define the six search keywords
        String key1 = "word1";
        String key2 = "word2";
        String key3 = "word3";
        String key4 = "word4";
        String key5 = "word5";
        String key6 = "word6";

        // Initialize the IndexSearcher
        IndexSearcher searcher = new IndexSearcher("c:\\index");

        // First search: parse the query, run it, print the results
        query = QueryParser.parse(key1, "name", new StandardAnalyzer());
        hits = searcher.search(query);
        printResult(hits, key1);

        // Second search
        query = QueryParser.parse(key2, "name", new StandardAnalyzer());
        hits = searcher.search(query);
        printResult(hits, key2);

        // Third search
        query = QueryParser.parse(key3, "name", new StandardAnalyzer());
        hits = searcher.search(query);
        printResult(hits, key3);

        // Fourth search
        query = QueryParser.parse(key4, "name", new StandardAnalyzer());
        hits = searcher.search(query);
        printResult(hits, key4);

        // Fifth search
        query = QueryParser.parse(key5, "name", new StandardAnalyzer());
        hits = searcher.search(query);
        printResult(hits, key5);

        // Sixth search
        query = QueryParser.parse(key6, "name", new StandardAnalyzer());
        hits = searcher.search(query);
        printResult(hits, key6);

        // Release the searcher once all searches are done
        searcher.close();
    }

    // Print the results
    public static void printResult(Hits hits, String key) throws Exception {
        System.out.println("Search \"" + key + "\":");
        if (hits != null) {
            if (hits.length() == 0) {
                System.out.println("No results found");
            } else {
                System.out.print("Found ");
                for (int i = 0; i < hits.length(); i++) {
                    // Get the i-th document from the search results
                    Document d = hits.doc(i);
                    // Read the content of the document's "title" field
                    String dname = d.get("title");
                    // Print it
                    System.out.print(dname + "  ");
                }
                System.out.println();
            }
        }
    }
}

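The code above first creates five different Document objects. Each contains two fields: title holds the name of the Document, and name is the field actually used for searching; its content is tokenized and then stored in the index. After the documents are written to the index, the code initializes an IndexSearcher and calls its search(Query) method once per keyword, which implements the simplest form of search.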
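To make the expected matches easier to follow, here is a minimal sketch of a word-to-document lookup over the same five documents. It is a toy model in plain Java, not Lucene: the class ToyInvertedIndex and its methods are hypothetical names, it splits on whitespace instead of using StandardAnalyzer, and it does no scoring, so the order in which Lucene returns hits may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a tokenized, indexed field. It maps each word
// to the titles of the documents whose "name" text contains that word.
class ToyInvertedIndex {
    private final Map<String, List<String>> postings = new LinkedHashMap<>();

    // Split the text on whitespace and record the title under every token.
    void add(String title, String nameText) {
        for (String token : nameText.trim().toLowerCase().split("\\s+")) {
            List<String> titles = postings.computeIfAbsent(token, k -> new ArrayList<>());
            if (!titles.contains(title)) {
                titles.add(title);
            }
        }
    }

    // Return the titles that contain the word, in insertion order
    // (an empty list when no document contains it).
    List<String> search(String word) {
        return postings.getOrDefault(word.toLowerCase(), new ArrayList<>());
    }

    // The five documents from Listing 11.1.
    static ToyInvertedIndex sample() {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("doc1", "word1 word2 word3");
        index.add("doc2", "word4 word5 word6");
        index.add("doc3", "word1 word4");
        index.add("doc4", "word2 word5");
        index.add("doc5", "word3 word6");
        return index;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = sample();
        for (int i = 1; i <= 6; i++) {
            String key = "word" + i;
            System.out.println(key + " -> " + index.search(key));
        }
    }
}
```

Each keyword appears in exactly two of the five documents, so every search in Listing 11.1 should report two titles.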

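Note: the IndexSearcher class also has a close method. It does not actually close the Searcher object itself; it closes the IndexReader held inside the Searcher, subject to the boolean switch of the fourth constructor.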
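The ownership rule implied by the closeReader switch can be sketched with a small plain-Java model. ToyReader and ToySearcher are hypothetical stand-ins, not Lucene classes; the sketch assumes only what the constructors shown earlier suggest, namely that a searcher which opened its own reader closes it, while a searcher that was handed a reader leaves it open.

```java
class OwnershipDemo {
    // Hypothetical stand-in for IndexReader: it only tracks open/closed state.
    static class ToyReader {
        private boolean open = true;
        boolean isOpen() { return open; }
        void close() { open = false; }
    }

    // Hypothetical stand-in for IndexSearcher, modelling the closeReader flag.
    static class ToySearcher {
        private final ToyReader reader;
        private final boolean closeReader;

        // Mirrors the path/Directory constructors: the searcher opens the
        // reader itself, so it is responsible for closing it.
        ToySearcher() {
            this(new ToyReader(), true);
        }

        // Mirrors IndexSearcher(IndexReader): the caller keeps ownership.
        ToySearcher(ToyReader reader) {
            this(reader, false);
        }

        private ToySearcher(ToyReader reader, boolean closeReader) {
            this.reader = reader;
            this.closeReader = closeReader;
        }

        ToyReader reader() { return reader; }

        // Closing the searcher closes the reader only when the searcher owns it.
        void close() {
            if (closeReader) {
                reader.close();
            }
        }
    }

    public static void main(String[] args) {
        ToySearcher owning = new ToySearcher();
        owning.close();
        System.out.println("owned reader open: " + owning.reader().isOpen());

        ToyReader shared = new ToyReader();
        ToySearcher borrowing = new ToySearcher(shared);
        borrowing.close();
        System.out.println("shared reader open: " + shared.isOpen());
    }
}
```

The practical consequence is that code which passes its own IndexReader to a searcher should also close that reader itself.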

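Beyond the example above, the search method of IndexSearcher has several overloaded forms to meet different needs. The specific overloads are shown in Figure 11-2.

As Figure 11-2 shows, the simplest search method in IndexSearcher is search(Query), the one used in Listing 11.1. The other overloads take different parameters, mainly to support sorting and filtering of the search results. Other aspects of IndexSearcher are covered in the next chapter on advanced search techniques.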

11.2.2 Search results: Hits
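Once a search completes, the results must be returned and shown to the user; only then is the search task really finished. In Lucene, the set of search results is represented by an instance of the Hits class. Readers who look closely at Figure 11-2 will notice that the search methods there return an object of type Hits. Readers will also have seen Hits instances used many times in the code of earlier chapters.
The Hits object has the following commonly used methods.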
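- length(): returns the total number of results found.
- doc(int n): returns the n-th document.
- id(int n): returns the internal ID of the n-th document.
- score(int n): returns the score of the n-th document.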
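Used together, length() and doc(int n) let you iterate over every document in the result set. One point deserves attention, though: if a result set contained 100000 records and the Hits object returned all of them at once, that Hits object would become very expensive to build and to hold in memory.
Lucene takes this problem into careful account. It does not return all results at once; instead it loads results lazily: when the user is about to access a given document, the Hits object internally runs another search against the Lucene index and only then returns the up-to-date result to the user. Interested readers can study the Hits source code to learn more about how search results are cached.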
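The idea of loading on access can be illustrated with a small plain-Java sketch. This is not Lucene's Hits implementation: LazyHits, its loader function, and the loads() counter are hypothetical names introduced only to show a result set that knows its size up front but fetches each document the first time it is requested and caches it afterwards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

class LazyHitsDemo {
    // Hypothetical model of a lazily loaded result set (not Lucene's Hits).
    static class LazyHits {
        private final int[] docIds;                 // ids of the matching documents
        private final IntFunction<String> loader;   // fetches one document by id
        private final Map<Integer, String> cache = new HashMap<>();
        private int loads = 0;                      // number of real fetches so far

        LazyHits(int[] docIds, IntFunction<String> loader) {
            this.docIds = docIds;
            this.loader = loader;
        }

        // The total is known without loading any document.
        int length() {
            return docIds.length;
        }

        // Fetch the n-th document on first access, then serve it from the cache.
        String doc(int n) {
            int id = docIds[n];
            String cached = cache.get(id);
            if (cached == null) {
                cached = loader.apply(id);
                cache.put(id, cached);
                loads++;
            }
            return cached;
        }

        int loads() {
            return loads;
        }
    }

    public static void main(String[] args) {
        int[] ids = new int[100000];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        LazyHits hits = new LazyHits(ids, id -> "document-" + id);
        System.out.println("results: " + hits.length());
        System.out.println("doc 42: " + hits.doc(42));
        hits.doc(42); // second access is served from the cache
        System.out.println("fetches: " + hits.loads());
    }
}
```

With 100000 results and a single document accessed twice, only one fetch takes place, which is the saving the lazy approach is after.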
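Many examples of using the Hits object were given in earlier chapters; a more advanced example follows.
In Listing 11.2, index building is extracted into a separate method. That method creates 12 different Documents in total, each with two fields, "contents" and "path", holding the document's content and path respectively. The content of every document contains the word "word".
Listing 11.2 Using the Hits object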
package ch11;

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class HitsTest
{

  public static void main(String[] args) throws Exception
  {
    // Build the index
    buildIndex();
    // Open the index directory that now exists
    Searcher searcher = new IndexSearcher("c:\\index");
    // Use the standard analyzer
    Analyzer aStandardAnalyzer = new StandardAnalyzer();
    // Read query strings from standard input
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    while (true)
    {
      System.out.println("-------------------------------------------------");
      System.out.print("Query: ");
      // Read one line from the command line, terminated by Enter
      String line = in.readLine();
      // Stop on end of input or when Enter is pressed on an empty line
      if (line == null || line.length() == 0)
        break;
      // Build the Query object
      Query query = QueryParser.parse(line, "contents", aStandardAnalyzer);
      // Print what is being searched for
      System.out.println("Searching for: " + query.toString("contents"));
      // Search with the Searcher's search method, which returns a Hits object
      Hits hits = searcher.search(query);
      // Use the length() method of Hits to print the number of documents found
      System.out.println("Found " + hits.length() + " documents in total");
      // Number of results to display per page
      final int HITS_PER_PAGE = 10;
      // Print page by page
      for (int start = 0; start < hits.length(); start += HITS_PER_PAGE)
      {
        // Compute the end position of this page
        int end = Math.min(hits.length(), start + HITS_PER_PAGE);
        for (int i = start; i < end; i++)
        {
          // Get one document object from the search results
          Document doc = hits.doc(i);
          // Print the document's internal ID
          System.out.println("Internal document ID: " + hits.id(i));
          // Print the document's score
          System.out.println("Document score: " + hits.score(i));
          // Print the path where the document is stored
          String path = doc.get("path");
          if (path != null)
          {
            System.out.println("Path: " + path);
          }
        }
        // Check whether there are results still to be shown
        if (hits.length() > end)
        {
          System.out.print("more (y/n) ? ");
          line = in.readLine();
          if (line == null || line.length() == 0 || line.charAt(0) == 'n')
            break;
        }
      }
    }
    searcher.close();
  }

  // Build the index
  public static void buildIndex() throws Exception
  {
    // The steps are similar to those in Listing 11.1 and are not explained again
    IndexWriter writer = new IndexWriter("c:\\index", new StandardAnalyzer(), true);

    // Create 12 documents: contents "wordN word", path "path\documentN.txt"
    for (int i = 1; i <= 12; i++)
    {
      Document doc = new Document();
      doc.add(Field.Text("contents", "word" + i + " word"));
      doc.add(Field.Keyword("path", "path\\document" + i + ".txt"));
      writer.addDocument(doc);
    }

    writer.close();
  }
}
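Once the index is built, the code initializes an IndexSearcher to perform the search. For the results, it uses most of the methods that the Hits object provides, such as getting a document, its ID, and its score. At run time the program first asks for a query string and then searches on the input; the output is shown in Figure 11-3.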
Figure 11-3 Test result 1
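The test run of Listing 11.2 used the keywords word1, word4, and word1 word3 word10. As can be seen, IndexSearcher completed the lookups well, and the document IDs, scores, and other information were obtained through the Hits object.
Figure 11-4 shows the result when the user searches for the keyword "word". Because there are more than 10 results (all 12 documents contain "word") and the program shows at most 10 records at a time, it asks the user whether to continue displaying the remaining records. If the user enters "y", the output is as shown in Figure 11-5.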
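The paging arithmetic used by Listing 11.2 can be isolated in a short plain-Java sketch. PagingDemo and its pages method are hypothetical names; the loop is the same start/end computation as in the listing, applied to 12 results with 10 per page.

```java
import java.util.ArrayList;
import java.util.List;

class PagingDemo {
    // Returns one {start, end} pair per page, where end is exclusive.
    static List<int[]> pages(int total, int perPage) {
        if (perPage <= 0) {
            throw new IllegalArgumentException("perPage must be positive");
        }
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start < total; start += perPage) {
            // The last page may be shorter than perPage
            int end = Math.min(total, start + perPage);
            result.add(new int[] {start, end});
        }
        return result;
    }

    public static void main(String[] args) {
        // 12 results, 10 per page: prints "0..9" and then "10..11"
        for (int[] page : pages(12, 10)) {
            System.out.println(page[0] + ".." + (page[1] - 1));
        }
    }
}
```

The first page covers results 0 to 9 and the second covers the remaining two, which matches the prompt for more results after the tenth record.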
Figure 11-4 Test result 2
Figure 11-5 Test result 3
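When developing web applications, a simple approach is to store the returned Hits object in the user's session once a search completes, and then serve follow-up requests from it as the user needs. Readers should note, however, that because the Hits object lives in the session, the index is not a good place to store large amounts of text. Otherwise the browser may respond extremely slowly for the user, and on the server side memory may be consumed by a large number of Hits objects, eventually crashing the server.
A better approach is to combine Lucene with a database: store only key ID fields, path fields, or short text in the index, and fetch the real data from the database. This plays to Lucene's strengths while reducing the load on the server.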
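A minimal sketch of that split, under stated assumptions: the search step is assumed to yield only record IDs (as if read from a stored ID field), and a Map stands in for the database. IndexPlusDatabaseDemo, DATABASE, and loadRecords are hypothetical names, and no real index or database is involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IndexPlusDatabaseDemo {
    // Hypothetical stand-in for a database table keyed by record ID.
    static final Map<String, String> DATABASE = new LinkedHashMap<>();
    static {
        DATABASE.put("1", "full text of record 1");
        DATABASE.put("2", "full text of record 2");
        DATABASE.put("3", "full text of record 3");
    }

    // Given the IDs taken from the search results, load the full records
    // from the database. IDs with no matching row are skipped.
    static List<String> loadRecords(List<String> idsFromIndex) {
        List<String> records = new ArrayList<>();
        for (String id : idsFromIndex) {
            String record = DATABASE.get(id);
            if (record != null) {
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Pretend the index returned these IDs for the current page
        List<String> ids = new ArrayList<>();
        ids.add("2");
        ids.add("1");
        for (String record : loadRecords(ids)) {
            System.out.println(record);
        }
    }
}
```

Only the small ID list needs to be kept per user, while the large text stays in the database and is read for the page being displayed.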
