Lucene: Creating an Index

2012-08-10 15:36:31 | Category: lucene | Views: 1620

I read the following in Lucene in Action. I originally wanted to translate it, but a Chinese translation would lose the flavor of the original.

What is indexing, and why is it important? 

Suppose you needed to search a large number of files, and you wanted to be able to find files that contained a certain word or a phrase. How would you go about writing a program to do this? A naïve approach would be to sequentially scan each file for the given word or phrase. This approach has a number of flaws, the most obvious of which is that it doesn’t scale to larger file sets or cases where files are very large. This is where indexing comes in: To search large amounts of text quickly, you must first index that text and convert it into a format that will let you search it rapidly, eliminating the slow sequential scanning process. This conversion process is called indexing, and its output is called an index.

You can think of an index as a data structure that allows fast random access to words stored inside it. The concept behind it is analogous to an index at the end of a book, which lets you quickly locate pages that discuss certain topics. In the case of Lucene, an index is a specially designed data structure, typically stored on the file system as a set of index files. We cover the structure of index files in detail in appendix B, but for now just think of a Lucene index as a tool that allows quick word lookup.
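To make the idea of "quick word lookup" concrete, here is a minimal sketch of an inverted index in plain Java. It is only a toy under simple assumptions (the class name `ToyInvertedIndex`, the sample sentences, and the split-on-non-alphanumerics tokenizing are all made up for illustration) and says nothing about how Lucene's index files are actually organized. The point is that the text is split into words once, at indexing time, so that a search becomes a map lookup instead of a scan over every document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only; a real Lucene index is far more elaborate.
class ToyInvertedIndex {
    // word -> sorted ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();

    // Indexing: split the text into lowercase words once, up front.
    void add(int docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty first element if the text starts with punctuation
            }
            Set<Integer> ids = postings.get(word);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(word, ids);
            }
            ids.add(docId);
        }
    }

    // Searching: one map lookup, no rescanning of the documents.
    List<Integer> search(String word) {
        Set<Integer> ids = postings.get(word.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<Integer>() : new ArrayList<Integer>(ids);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(0, "Lucene is a search library");
        index.add(1, "Solr is built on Lucene");
        index.add(2, "An index makes search fast");
        System.out.println(index.search("lucene")); // [0, 1]
        System.out.println(index.search("search")); // [0, 2]
        System.out.println(index.search("hadoop")); // []
    }
}
```

The cost of splitting the text is paid once per document when it is added; each query afterwards only touches the entry for the word being looked up.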

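Steps for creating an index:

1. Create a Directory: decide whether the index is stored on disk or in memory.
2. Create an IndexWriter.
3. Create a Document object: decide how the item being indexed (name, path, size, modification time, content) is represented.
4. Add Fields to the Document.
5. Add the Document to the index through the IndexWriter.
6. Close the IndexWriter.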

Sample code:

package com.java.lucene.index;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

public class MyIndex {
	private String[] ids = {"1","2","3","4","5","6"};
	private String[] names = {"tian","bao","xing","zhen","kun","xing"};
	private String[] emails = {"aa@qq.com","bb@qq.com","cc@qq.com",
			"dd@qq.com","ee@qq.com","ff@qq.com"};
	private String[] contents = {
			"Lucene Core, our flagship sub-project, provides Java-based indexing and search technology",
			"Solr is a high performance search server built using Lucene Core, with XML/HTTP and ",
			"Open Relevance Project is a subproject ",
			"PyLucene is a Python port of the Core project.",
			"22 July 2012 - Apache Lucene 3.6.1 and Apache Solr 3.6.1 available",
			"Lucene 3.6.1 Release Highlights"
	};
	
	private Directory directory = null;
	
	public MyIndex(){
		try {
			// 1. Create the Directory: the index can be stored on disk or in memory
//			Directory directory = new RAMDirectory(); // index held in memory
			directory = FSDirectory.open(new File("d:/tools/lucene/index02"));
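		} catch (IOException e) {
			// Fail fast: a null directory would cause a NullPointerException in index()
			throw new IllegalStateException("Cannot open index directory", e);
		}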
	}
	
	public void index() {
		IndexWriter writer = null;
		try {
			// 2. Create the IndexWriter
			writer = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
			Document doc = null;
			for(int i=0;i<ids.length;i++) {
				// 3. Create the Document: how the item being indexed (name, path, size, modification time, content) is represented
				doc = new Document();
				// 4. Add Fields to the Document
				doc.add(new Field("id",ids[i],Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS));
				doc.add(new Field("email",emails[i],Field.Store.YES,Field.Index.NOT_ANALYZED));
				doc.add(new Field("content",contents[i],Field.Store.NO,Field.Index.ANALYZED));
				doc.add(new Field("name",names[i],Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS));
				// 5. Add the Document to the index through the IndexWriter
				writer.addDocument(doc);
			}
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (LockObtainFailedException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			try {
				if(writer!=null){
					// 6. Close the writer
					writer.close();
				}
			} catch (CorruptIndexException e) {
				e.printStackTrace();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
	
	
}
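The sample uses two kinds of Field.Index setting: NOT_ANALYZED (and NOT_ANALYZED_NO_NORMS, which is the same but with norms disabled) for id, email and name, and ANALYZED for content. The rough difference is that a not-analyzed value is indexed as one single term, while an analyzed value is passed through the analyzer and broken into separate terms. The sketch below imitates that difference in plain Java. It is not Lucene code: the class `FieldIndexingSketch` and its methods are made up for illustration, and the lowercase-and-split step is only a crude stand-in for StandardAnalyzer, which has its own tokenization rules and also drops stop words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java stand-in, not Lucene code: shows which terms would end up searchable.
class FieldIndexingSketch {
    // Like Field.Index.NOT_ANALYZED: the whole value becomes one term.
    static List<String> notAnalyzed(String value) {
        List<String> terms = new ArrayList<String>();
        terms.add(value);
        return terms;
    }

    // Roughly like Field.Index.ANALYZED: the value is broken into lowercase word terms.
    // Assumption: splitting on non-alphanumerics is a simplification of a real analyzer.
    static List<String> analyzed(String value) {
        List<String> terms = new ArrayList<String>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(notAnalyzed("aa@qq.com"));
        // [aa@qq.com]
        System.out.println(analyzed("Lucene 3.6.1 Release Highlights"));
        // [lucene, 3, 6, 1, release, highlights]
    }
}
```

This is why exact-match values such as an id or an email address are usually left not analyzed, while free text such as content is analyzed so that individual words can be searched.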

 

 

 


 
