
Lucene: Creating an Index

Posted 2012-08-10 15:36:31 | Category: lucene


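The following passage is from Lucene in Action. I had meant to translate it, but a Chinese translation loses the flavor of the original, so I quote it as written.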

What is indexing, and why is it important?

Suppose you needed to search a large number of files, and you wanted to be able to find files that contained a certain word or a phrase. How would you go about writing a program to do this? A naïve approach would be to sequentially scan each file for the given word or phrase. This approach has a number of flaws, the most obvious of which is that it doesn’t scale to larger file sets or cases where files are very large. This is where indexing comes in: To search large amounts of text quickly, you must first index that text and convert it into a format that will let you search it rapidly, eliminating the slow sequential scanning process. This conversion process is called indexing, and its output is called an index.
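To make the flaw concrete, here is a minimal plain-Java sketch of the naive approach (no Lucene involved). The class name `SequentialScan` is hypothetical, and an in-memory map of file name to text stands in for files on disk. Every call reads the full text of every file, so the cost of each query grows with the total size of the collection, which is exactly what an index avoids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the naive approach: no index, just a scan per query.
class SequentialScan {

    // Returns the names of all "files" whose text contains the word (case-insensitive).
    // Every call walks through every character of every file.
    static List<String> search(Map<String, String> files, String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            if (file.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(file.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // A TreeMap keeps file names sorted so the output order is predictable.
        Map<String, String> files = new TreeMap<>();
        files.put("a.txt", "Lucene Core provides indexing and search");
        files.put("b.txt", "Solr is a search server built using Lucene");
        files.put("c.txt", "PyLucene is a Python port");
        System.out.println(search(files, "search")); // [a.txt, b.txt]
    }
}
```

Note that `contains` is plain substring matching, so a search for `lucene` would also hit `PyLucene`; with an index, what counts as a word is fixed once, when the text is tokenized at indexing time.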

You can think of an index as a data structure that allows fast random access to words stored inside it. The concept behind it is analogous to an index at the end of a book, which lets you quickly locate pages that discuss certain topics. In the case of Lucene, an index is a specially designed data structure, typically stored on the file system as a set of index files. We cover the structure of index files in detail in appendix B, but for now just think of a Lucene index as a tool that allows quick word lookup.
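The book-index analogy corresponds to what is usually called an inverted index: a table from each word to the documents that contain it. The sketch below is a simplified plain-Java model, not Lucene's actual file format; the class name `ToyInvertedIndex` and its tokenization (lowercase, split on anything that is not a letter or digit) are assumptions for illustration. Building the table takes one pass over the text; afterwards each lookup is a single hash-map access instead of a scan.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy inverted index: word -> sorted set of document ids ("page numbers").
class ToyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Indexing: split the text into lowercase words once and record where each occurs.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() yields a leading empty string if the text starts with a delimiter
            }
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Lookup: one map access, independent of how much text was indexed.
    SortedSet<Integer> lookup(String word) {
        SortedSet<Integer> ids = postings.get(word.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.emptySortedSet() : Collections.unmodifiableSortedSet(ids);
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Lucene Core provides indexing and search technology",
                "Solr is a search server built using Lucene Core",
                "PyLucene is a Python port of the Core project.");
        ToyInvertedIndex index = new ToyInvertedIndex();
        for (int i = 0; i < docs.size(); i++) {
            index.add(i, docs.get(i));
        }
        System.out.println(index.lookup("lucene")); // [0, 1]
        System.out.println(index.lookup("core"));   // [0, 1, 2]
    }
}
```

Unlike the substring scan, `lucene` here does not match document 2, because `PyLucene` was indexed as the single word `pylucene`.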

The steps for creating an index:

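1. Create a Directory: this decides whether the index is stored on disk or in memory.

2. Create an IndexWriter.

3. Create a Document object, which determines how each indexed item (for a file: its name, path, size, modification time, and content) is represented.

4. Add Fields to the Document.

5. Add the Document to the index through the IndexWriter.

6. Close the IndexWriter.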
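The six steps can be mirrored with two tiny in-memory stand-ins, shown here as a plain-Java sketch. `ToyDirectory` and `ToyIndexWriter` are hypothetical classes, not Lucene's API. They model two behaviors of the real `IndexWriter` in Lucene 3.x that explain why step 6 matters: added documents do not become visible to a newly opened reader until the writer commits (which `close()` does), and only one writer may hold a directory's write lock at a time (in Lucene a second writer fails with `LockObtainFailedException`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a Directory; NOT a Lucene class.
class ToyDirectory {
    final List<Map<String, String>> committedDocs = new ArrayList<>();
    boolean writeLocked = false;
}

// Hypothetical stand-in for IndexWriter; NOT a Lucene class.
class ToyIndexWriter implements AutoCloseable {
    private final ToyDirectory dir;
    private final List<Map<String, String>> buffer = new ArrayList<>();

    // Step 2: opening a writer takes the write lock (Lucene would throw LockObtainFailedException).
    ToyIndexWriter(ToyDirectory dir) {
        if (dir.writeLocked) {
            throw new IllegalStateException("directory already locked by another writer");
        }
        dir.writeLocked = true;
        this.dir = dir;
    }

    // Step 5: documents are buffered and not yet visible to readers of the directory.
    void addDocument(Map<String, String> doc) {
        buffer.add(doc);
    }

    // Step 6: closing commits the buffered documents and releases the lock.
    @Override
    public void close() {
        dir.committedDocs.addAll(buffer);
        buffer.clear();
        dir.writeLocked = false;
    }

    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();                  // step 1
        try (ToyIndexWriter writer = new ToyIndexWriter(dir)) { // step 2
            for (String id : List.of("1", "2", "3")) {
                Map<String, String> doc = Map.of("id", id);     // steps 3 and 4
                writer.addDocument(doc);                        // step 5
            }
            System.out.println(dir.committedDocs.size()); // 0: nothing committed yet
        }                                                       // step 6: close()
        System.out.println(dir.committedDocs.size()); // 3
    }
}
```

Using try-with-resources (or a `finally` block) guarantees step 6 runs even when indexing fails part-way, so the lock is never left behind.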

Sample code:

package com.java.lucene.index;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

public class MyIndex {
	private String[] ids = {"1","2","3","4","5","6"};
	private String[] names = {"tian","bao","xing","zhen","kun","xing"};
	private String[] emails = {"aa@qq.com","bb@qq.com","cc@qq.com",
			"dd@qq.com","ee@qq.com","ff@qq.com"};
	private String[] contents = {
			"Lucene Core, our flagship sub-project, provides Java-based indexing and search technology",
			"Solr is a high performance search server built using Lucene Core, with XML/HTTP and ",
			"Open Relevance Project is a subproject ",
			"PyLucene is a Python port of the Core project.",
			"22 July 2012 - Apache Lucene 3.6.1 and Apache Solr 3.6.1 available",
			"Lucene 3.6.1 Release Highlights"
	};
	
	private Directory directory = null;
	
	public MyIndex(){
		try {
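			// 1. Create the Directory: decides whether the index is stored on disk or in memory
//			directory = new RAMDirectory(); // in-memory index instead (also needs import org.apache.lucene.store.RAMDirectory)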
			directory = FSDirectory.open(new File("d:/tools/lucene/index02"));
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
	
	public void index() {
		IndexWriter writer = null;
		try {
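			// 2. Create the IndexWriter. The default OpenMode is CREATE_OR_APPEND, so calling
			// index() again on the same directory adds a second copy of every document.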
			writer = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
			Document doc = null;
			for(int i=0;i<ids.length;i++) {
				// 3. Create a Document: defines how each indexed item is represented
				doc = new Document();
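				// 4. Add Fields to the Document.
				// Store.YES keeps the original value so it can be returned with a search hit;
				// NOT_ANALYZED indexes the whole value as a single term (exact match only);
				// the _NO_NORMS variant also skips length normalization and index-time boosts.
				doc.add(new Field("id",ids[i],Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS));
				doc.add(new Field("email",emails[i],Field.Store.YES,Field.Index.NOT_ANALYZED));
				// ANALYZED runs the text through StandardAnalyzer; Store.NO makes "content"
				// searchable but not retrievable from the index.
				doc.add(new Field("content",contents[i],Field.Store.NO,Field.Index.ANALYZED));
				doc.add(new Field("name",names[i],Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS));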
				// 5. Add the Document to the index through the IndexWriter
				writer.addDocument(doc);
			}
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (LockObtainFailedException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			try {
				if(writer!=null){
					// 6. Close the writer: commits the added documents and releases the write lock
					writer.close();
				}
			} catch (CorruptIndexException e) {
				e.printStackTrace();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
	
	
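	// Entry point added so the example can be run directly.
	public static void main(String[] args) {
		new MyIndex().index();
	}
}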
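The four fields in the sample differ in how they are indexed. `Field.Index.ANALYZED` passes the value through the analyzer (`StandardAnalyzer` here), which splits `content` into lowercase terms; `NOT_ANALYZED` indexes the whole value as one term, so `email` matches only the exact address. The plain-Java sketch below models just that distinction. `ToyFieldIndexing` and its methods are hypothetical, and its split-on-non-alphanumerics tokenizer is a simplified stand-in for `StandardAnalyzer`, which does more (for example, it also removes common English stop words).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of Field.Index.ANALYZED vs NOT_ANALYZED; NOT the Lucene API.
class ToyFieldIndexing {

    // Terms written to the index for one field value.
    static List<String> terms(String value, boolean analyzed) {
        List<String> out = new ArrayList<>();
        if (!analyzed) {
            out.add(value); // NOT_ANALYZED: the whole value becomes a single term
            return out;
        }
        // ANALYZED (simplified stand-in for StandardAnalyzer): lowercase, split on non-alphanumerics
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // A single-term query hits the field only if that exact term was indexed.
    static boolean termQueryMatches(String value, boolean analyzed, String queryTerm) {
        return terms(value, analyzed).contains(queryTerm);
    }

    public static void main(String[] args) {
        String content = "Lucene 3.6.1 Release Highlights";
        System.out.println(terms(content, true)); // [lucene, 3, 6, 1, release, highlights]
        System.out.println(termQueryMatches(content, true, "release"));        // true
        System.out.println(termQueryMatches("aa@qq.com", false, "aa@qq.com")); // true
        System.out.println(termQueryMatches("aa@qq.com", false, "qq"));        // false
    }
}
```

Because a term must match exactly, `termQueryMatches(content, true, "Release")` returns false in this toy: the indexed term is lowercase. This is why, in Lucene, query text for an analyzed field is normally run through the same analyzer, as `QueryParser` does.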
