

Published: 2012-09-24 13:49:41  Author: rapoo

Handling Chinese with Lucene and IKAnalyzer: an indexing and search example
Versions: Lucene 3.0.2, IKAnalyzer 3.2.0

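The indexing program (Indexer.java) walks a given folder recursively, depth first, and indexes every .txt file it finds.
The IndexWriter is instantiated with new IKAnalyzer(false) passed as its second argument.
In indexFile(), each index field and its input are added to the Document by constructing a new Field(...). This usage is a change in Lucene 3.* that is worth keeping in mind.
One point needs special care because the content is Chinese. The second argument of the Field constructor for the file content is a Reader. If the IDE environment is set to UTF-8, a reader that relies on the default encoding returns garbled text when the I/O stream reads these Chinese files, so an InputStreamReader with an explicitly named charset (GBK here) is used instead.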

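import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class Indexer {
    private File baseDir = new File("E:\\");          // root folder to scan for .txt files
    private File indexDir = new File("F:\\indexDir"); // folder the index is written to

    public Indexer() {
        // Note: returning here does not stop a later createIndex() call;
        // indexDirectory() checks baseDir again before doing any work.
        if (!this.baseDir.exists() || !this.indexDir.exists()) {
            return;
        }
    }

    public void createIndex() {
        try {
            // The analyzer is the second argument; true means create a new index.
            IndexWriter writer = new IndexWriter(FSDirectory.open(indexDir),
                    new IKAnalyzer(false), true,
                    IndexWriter.MaxFieldLength.LIMITED);
            indexDirectory(writer, baseDir);
            writer.optimize(); // merge segments
            writer.close();
            System.out.println("Indexing complete");
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Recursively visits dir and indexes every file found below it.
    private void indexDirectory(IndexWriter writer, File dir) {
        if (!dir.exists() || !dir.isDirectory()) {
            return;
        }
        File[] files = dir.listFiles();
        if (files == null) { // listFiles() returns null if the folder cannot be read
            return;
        }
        for (File file : files) {
            if (file.isDirectory()) {
                indexDirectory(writer, file);
            } else {
                indexFile(writer, file);
            }
        }
    }

    // Adds one .txt file to the index; other files are skipped.
    private void indexFile(IndexWriter writer, File file) {
        if (file.isHidden() || !file.exists() || !file.canRead()) {
            return;
        }
        try {
            if (file.getCanonicalPath().endsWith(".txt")) {
                System.out.println("Indexing: " + file.getCanonicalPath());
                Document doc = new Document();
                // Index the file content; the bytes are decoded explicitly as GBK.
                doc.add(new Field("text", new InputStreamReader(
                        new FileInputStream(file), "GBK")));
                // Index the file name, and store it so it can be retrieved later.
                doc.add(new Field("filename", file.getCanonicalPath(),
                        Field.Store.YES, Field.Index.ANALYZED));
                // addDocument() makes Lucene build the index entries for doc.
                writer.addDocument(doc);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        Indexer lucene = new Indexer();
        lucene.createIndex();
    }
}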
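The encoding problem mentioned earlier can be reproduced on its own, without Lucene. What follows is only a minimal sketch and not part of the original program: the class name CharsetDemo and its two helper methods are hypothetical, and it assumes the JDK in use ships the GBK charset. It writes a short Chinese string to a temporary file as GBK bytes, then reads the file back twice through an InputStreamReader, once naming GBK and once naming UTF-8.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical demo class, not part of the original Indexer program.
public class CharsetDemo {

    // Reads the whole file into a String, decoding its bytes with the named charset.
    public static String readAll(File file, String charsetName) throws IOException {
        Reader reader = new InputStreamReader(new FileInputStream(file), charsetName);
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            reader.close();
        }
    }

    // Writes text to a temporary file as GBK bytes (assumes GBK is available in the JDK).
    public static File writeGbkTempFile(String text) throws IOException {
        File file = File.createTempFile("gbk-sample", ".txt");
        file.deleteOnExit();
        FileOutputStream out = new FileOutputStream(file);
        try {
            out.write(text.getBytes("GBK"));
        } finally {
            out.close();
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes for four Chinese characters, so this source file
        // compiles the same way regardless of the IDE's own encoding.
        String original = "\u4e2d\u6587\u5206\u8bcd";
        File file = writeGbkTempFile(original);

        // Decoding with the charset the file was written in recovers the text.
        System.out.println("GBK matches:   " + original.equals(readAll(file, "GBK")));
        // Decoding the same bytes as UTF-8 yields replacement characters instead.
        System.out.println("UTF-8 matches: " + original.equals(readAll(file, "UTF-8")));
    }
}
```

The charset passed to InputStreamReader has to match the encoding the files were actually saved in. "GBK" is right for the files in this article; for files saved as UTF-8 the name would need to change accordingly.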
