[W3CHINA.ORG Computer Science Forum → 『 Web挖掘技术 』 (Web Mining) → Delve inside the Lucene indexing mechanism]
    Posted by admin (W3China站长):

    Delve inside the Lucene indexing mechanism

    Index your documents with Lucene, an IR library written in Java

    Level: Intermediate

    [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#author]Deng Peng Zhou[/URL] ([URL=mailto:zhoudengpeng@yahoo.com.cn?subject=Delve inside the Lucene indexing mechanism&cc=htc@us.ibm.com]zhoudengpeng@yahoo.com.cn[/URL]), Software Engineer, Shanghai Jiaotong University


    27 Jun 2006

    Discover Lucene, a full-text information retrieval (IR) library written in the Java™ language. You can embed Lucene easily into your applications and implement indexing and searching functionality. Now it's an open source project in the popular Apache Jakarta Project family. Learn about Lucene's indexing mechanism, as well as its index file structure.
    This article introduces you to the indexing mechanism of Lucene, a popular full-text IR library written in the Java language. First, I'll demonstrate how to index your documents with Lucene, then I'll discuss how to improve the indexing performance. Finally, I'll analyze Lucene's index file structure. Keep in mind that Lucene is not a ready-to-use application, but rather an IR Library that lets you add searching and indexing functionality to your application.

    Architecture overview

    [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#figure1]Figure 1[/URL] shows the indexing architecture of Lucene. Lucene uses different parsers for different document types. For HTML documents, for example, an HTML parser does some preprocessing, such as stripping the HTML tags. The parser outputs the plain text content, from which the Lucene Analyzer extracts tokens and related information, such as token frequency. The Analyzer then writes the tokens and related information into Lucene's index files.


    Figure 1. The Lucene indexing architecture
    [figure image omitted]

    Indexing your documents with Lucene

    I'll show you step by step how to create an index for your documents with Lucene. Lucene can index any data that you can convert into textual format. For example, if you want to index HTML or PDF documents, first you should extract the textual information from the documents and then send the information to Lucene for indexing. The example in this article uses Lucene to index text files with a .txt extension.

    1. Prepare the text files

    Put some text files with a .txt extension into a directory -- for example, C:\\files_to_index on the Microsoft® Windows® platform.

    2. Create the index

    [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#Listing1]Listing 1[/URL] shows you how to index the text files you prepared in the first step.


    Listing 1. Indexing your documents with Lucene
    package lucene.index;

    import java.io.File;
    import java.io.FileReader;
    import java.io.Reader;
    import java.util.Date;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    /**
    * This class demonstrates the process of creating an index with Lucene
    * for text files in a directory.
    */
    public class TextFileIndexer {
    public static void main(String[] args) throws Exception{
       //fileDir is the directory that contains the text files to be indexed
       File   fileDir  = new File("C:\\files_to_index");

       //indexDir is the directory that hosts Lucene's index files
       File   indexDir = new File("C:\\luceneIndex");
       Analyzer luceneAnalyzer = new StandardAnalyzer();
       IndexWriter indexWriter = new IndexWriter(indexDir,luceneAnalyzer,true);
       File[] textFiles  = fileDir.listFiles();
       long startTime = new Date().getTime();

       //Add documents to the index
       for(int i = 0; i < textFiles.length; i++){
         if(textFiles[i].isFile() && textFiles[i].getName().endsWith(".txt")){
           System.out.println("File " + textFiles[i].getCanonicalPath()
                  + " is being indexed");
           Reader textReader = new FileReader(textFiles[i]);
           Document document = new Document();
           document.add(Field.Text("content",textReader));
           document.add(Field.Text("path",textFiles[i].getPath()));
           indexWriter.addDocument(document);
         }
       }

       indexWriter.optimize();
       indexWriter.close();
       long endTime = new Date().getTime();

       System.out.println("It took " + (endTime - startTime)
                  + " milliseconds to create an index for the files in the directory "
                  + fileDir.getPath());
      }
    }
          


    As [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#Listing1]Listing 1[/URL] demonstrates, you can index your text files easily with Lucene. Let's interpret the key statements in [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#Listing1]Listing 1[/URL], beginning with this one:

    Analyzer luceneAnalyzer = new StandardAnalyzer();


    This statement creates an instance of the StandardAnalyzer class, which is in charge of extracting tokens out of text to be indexed. StandardAnalyzer is just one implementation of the abstract class Analyzer; other implementations, such as SimpleAnalyzer, exist.

    Now, take a look at this statement:

    IndexWriter indexWriter = new IndexWriter(indexDir,luceneAnalyzer,true);


    This statement creates an instance of the IndexWriter class, which is a key component in the indexing process. This class can create a new index or open an existing index and add documents to it. You might notice that its constructor accepts three parameters. The first parameter specifies the directory that stores the index files; the second parameter specifies the analyzer that will be used in the indexing process; the last parameter is a Boolean variable. If true, the class creates a new index; if false, it opens an existing index.

    The following code snippet shows the process of adding one document to the index:

    Document document = new Document();
    document.add(Field.Text("content",textReader));
    document.add(Field.Text("path",textFiles[i].getPath()));
    indexWriter.addDocument(document);


    The first line creates an instance of the Document class, which consists of a collection of fields. You can think of this class as a virtual document, such as an HTML page, a PDF file, or a text file. The fields of a document are typically its attributes; for an HTML page, the fields can include title, contents, URL, and so on. Different Field types control whether a field's content is indexed, stored with the index, or both; for more information about Field, refer to Lucene's Javadoc. The second and third lines add two fields to the document, each consisting of a field name and its content. This example adds fields named "content" and "path", which store the content and the path of the text file, respectively. The last line adds the prepared document to the index.

    After you add the documents to the index, don't forget to close the index by calling this method, which guarantees that the index changes are written to the disk:

    indexWriter.close();


    Using the code in [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#Listing1]Listing 1[/URL], you can add the text documents to the index successfully. Now, let's look at another operation on the index.

    3. Remove documents from the index

    The IndexReader class in Lucene is responsible for removing documents from the existing index, as demonstrated in [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#Listing2]Listing 2[/URL].


    Listing 2. Removing documents from the index
    File   indexDir = new File("C:\\luceneIndex");
    IndexReader ir = IndexReader.open(indexDir);
    ir.delete(1);
    ir.delete(new Term("path","C:\\files_to_index\\lucene.txt"));
    ir.close();


    In [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#Listing2]Listing 2[/URL], the second line initializes an instance of the IndexReader class using the static method IndexReader.open(indexDir); the parameter specifies the directory that stores the Lucene index files. IndexReader provides two methods to remove documents, as shown in the third and fourth lines. The third line deletes a document by document ID. Every document has a unique ID in the Lucene index, but because the system generates the ID, it's not convenient to use for deletion. The fourth line deletes the documents whose "path" field contains the string "C:\\files_to_index\\lucene.txt", which lets you easily specify a document to delete by its file path. Keep in mind that these operations don't physically remove the documents from the index; they just mark them as deleted by creating a file with a .del extension, so the documents are no longer searchable.

    You can easily recover the documents that have been marked as deleted, as shown in [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#Listing3]Listing 3[/URL]. First, open the index, then call the ir.undeleteAll() method to complete the recovery process.


    Listing 3. Recovering deleted documents
    File   indexDir = new File("C:\\luceneIndex");
    IndexReader ir = IndexReader.open(indexDir);
    ir.undeleteAll();
    ir.close();


    You might want to know how to remove the documents from the index physically. [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#Listing4]Listing 4[/URL] shows the process.


    Listing 4. Removing documents from the index physically
    File   indexDir = new File("C:\\luceneIndex");
    Analyzer luceneAnalyzer = new StandardAnalyzer();
    IndexWriter indexWriter = new IndexWriter(indexDir,luceneAnalyzer,false);
    indexWriter.optimize();
    indexWriter.close();


    The third line in [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#Listing4]Listing 4[/URL] initializes an instance of the IndexWriter class and opens the existing index specified by the first parameter. The fourth line cleans up the index. IndexWriter physically deletes from the disk the documents that have been marked as deleted.

    Lucene doesn't provide a method to update a document in the index directly. To update a document, first remove it from the index, then add the updated version to the index.
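    Following the style of the earlier listings, the remove-then-add pattern might be sketched as shown below. This is an illustrative sketch against the Lucene 1.x-era API used in this article, not a tested listing; the paths and field names are carried over from the earlier examples, and the path is stored with Field.Keyword (as in Listing 5) so that deletion by Term can match it as a single untokenized term.

```java
//Step 1: mark the old version of the document as deleted,
//using the "path" field as the key
File indexDir = new File("C:\\luceneIndex");
IndexReader ir = IndexReader.open(indexDir);
ir.delete(new Term("path","C:\\files_to_index\\lucene.txt"));
ir.close();

//Step 2: reopen the index with IndexWriter (false = open the existing index)
//and add the updated version of the document
Analyzer luceneAnalyzer = new StandardAnalyzer();
IndexWriter indexWriter = new IndexWriter(indexDir,luceneAnalyzer,false);
Reader textReader = new FileReader("C:\\files_to_index\\lucene.txt");
Document document = new Document();
document.add(Field.Text("content",textReader));
document.add(Field.Keyword("path","C:\\files_to_index\\lucene.txt"));
indexWriter.addDocument(document);
indexWriter.optimize();
indexWriter.close();
```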


    [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#main]Back to top[/URL]

    Improving the indexing performance

    You can make full use of your hardware resources to improve indexing performance with Lucene. When you index a large number of documents, you'll notice that the bottleneck is the process of writing the documents into the index files on disk. To ease this problem, Lucene buffers documents in RAM. But how can you control that buffer? Fortunately, Lucene's IndexWriter class exposes three parameters that let you adjust the size of the buffer and the frequency of disk writes.

    mergeFactor

    This parameter determines how many documents are stored in each initial segment index and how often the segment indexes on disk are merged. For example, if mergeFactor is 10, a new segment is written to disk once 10 documents have accumulated in memory; and once the number of segments on disk reaches 10, they are merged into a single segment. The default value is 10, which isn't suitable when you have a large number of documents; larger values work better for batch index creation.

    minMergeDocs

    This parameter also affects the indexing performance. It determines the minimum number of documents that have to be buffered in the RAM before IndexWriter writes them to disk. The default value of this parameter is 10. If you have enough RAM, set the value of this parameter as large as possible to decrease the indexing time dramatically.

    maxMergeDocs

    This parameter determines the maximum number of documents per segment index. The default value is Integer.MAX_VALUE. Large values are better for batched indexing and speedier searches.

    [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#Listing5]Listing 5[/URL] shows the usage of these parameters. [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#Listing5]Listing 5[/URL] is similar to [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#Listing1]Listing 1[/URL] but adds the statements to set the parameters described previously.


    Listing 5. Improving indexing performance
            
    /**
    * This class demonstrates how to improve the indexing performance
    * by adjusting the parameters provided by IndexWriter.
    */
    public class AdvancedTextFileIndexer  {
      public static void main(String[] args) throws Exception{
        //fileDir is the directory that contains the text files to be indexed
        File   fileDir  = new File("C:\\files_to_index");

        //indexDir is the directory that hosts Lucene's index files
        File   indexDir = new File("C:\\luceneIndex");
        Analyzer luceneAnalyzer = new StandardAnalyzer();
        File[] textFiles  = fileDir.listFiles();
        long startTime = new Date().getTime();

        int mergeFactor = 10;
        int minMergeDocs = 10;
        int maxMergeDocs = Integer.MAX_VALUE;
        IndexWriter indexWriter = new IndexWriter(indexDir,luceneAnalyzer,true);        
        indexWriter.mergeFactor = mergeFactor;
        indexWriter.minMergeDocs = minMergeDocs;
        indexWriter.maxMergeDocs = maxMergeDocs;

        //Add documents to the index
        for(int i = 0; i < textFiles.length; i++){
          if(textFiles[i].isFile() && textFiles[i].getName().endsWith(".txt")){
            Reader textReader = new FileReader(textFiles[i]);
            Document document = new Document();
            document.add(Field.Text("content",textReader));
            document.add(Field.Keyword("path",textFiles[i].getPath()));
            indexWriter.addDocument(document);
          }
        }

        indexWriter.optimize();
        indexWriter.close();
        long endTime = new Date().getTime();

        System.out.println("MergeFactor: " + indexWriter.mergeFactor);
        System.out.println("MinMergeDocs: " + indexWriter.minMergeDocs);
        System.out.println("MaxMergeDocs: " + indexWriter.maxMergeDocs);
        System.out.println("Document number: " + textFiles.length);
        System.out.println("Time consumed: " + (endTime - startTime) + " milliseconds");
      }
    }



    Notice that Lucene gives you enough flexibility to control the size of the buffer pool and the frequency of disk writes. Now, take a look at the key statements in this example. The following statements first create an instance of IndexWriter and then assign the defined values to the parameters of IndexWriter.

    int mergeFactor = 10;
    int minMergeDocs = 10;
    int maxMergeDocs = Integer.MAX_VALUE;
    IndexWriter indexWriter = new IndexWriter(indexDir,luceneAnalyzer,true);        
    indexWriter.mergeFactor = mergeFactor;
    indexWriter.minMergeDocs = minMergeDocs;
    indexWriter.maxMergeDocs = maxMergeDocs;


    Let's examine these parameters' influence on the indexing time. I prepared 10,000 documents for this test; [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#table1]Table 1[/URL] shows the test results for several parameter combinations.


    Table 1. Testing results
    MergeFactor | MinMergeDocs | MaxMergeDocs      | Document number | Time consumed (seconds)
    10          | 10           | Integer.MAX_VALUE | 10,000          | 423
    100         | 10           | Integer.MAX_VALUE | 10,000          | 270
    100         | 100          | Integer.MAX_VALUE | 10,000          | 213
    100         | 100          | 100               | 10,000          | 220
    1000        | 1000         | Integer.MAX_VALUE | 10,000          | 194


    From [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#table1]Table 1[/URL], you can easily see the influence these three parameters have on the indexing time. In practice, you'll most often change mergeFactor and minMergeDocs to improve indexing performance. As long as you have enough RAM, you can assign large values to mergeFactor and minMergeDocs to decrease the indexing time dramatically.


    [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#main]Back to top[/URL]

    Lucene's index file structure analysis

    Before analyzing Lucene's index file structure, you should understand the inverted index concept. An inverted index is an inside-out arrangement of documents in which terms take center stage: each term points to the list of documents that contain it. By contrast, in a forward index, documents take center stage, and each document refers to the list of terms it contains. An inverted index lets you easily find which documents contain a given term, and it's the index structure Lucene uses.
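    The inverted index idea can be sketched in a few lines of plain Java, with no Lucene involved. This toy (class and method names are my own, not Lucene's) maps each term to the sorted set of document IDs that contain it:

```java
import java.util.*;

//A minimal sketch of an inverted index: each term maps to the
//sorted set of document IDs that contain it.
public class InvertedIndexDemo {
    //Build term -> sorted set of document IDs from an array of documents.
    static Map<String, SortedSet<Integer>> buildIndex(String[] docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            //Crude tokenization: lowercase, split on non-word characters
            for (String token : docs[docId].toLowerCase().split("\\W+")) {
                if (token.isEmpty()) continue;
                index.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] docs = {
            "Lucene is a Java library",
            "Java powers the Lucene index",
            "An inverted index maps terms to documents"
        };
        Map<String, SortedSet<Integer>> index = buildIndex(docs);
        System.out.println("lucene -> " + index.get("lucene"));  //prints [0, 1]
        System.out.println("index  -> " + index.get("index"));   //prints [1, 2]
    }
}
```

    A forward index would invert this map: document ID to list of terms. Answering "which documents contain 'lucene'?" is a single lookup here, whereas a forward index would require scanning every document's term list.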

    Logical view of index files

    A Lucene index is divided into segments, each of which contains some of the indexed documents and can be searched independently. Now look at Lucene's logical view of index files in [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#figure2]Figure 2[/URL]. The number of segments is determined by the number of documents to be indexed and the maximum number of documents one segment can contain.


    Figure 2. Logical view of index files
    [figure image omitted]


    Key index files in Lucene

    The following sections describe the main index files in Lucene. The tables omit some columns, but that won't affect your understanding of the index files.

    Segments file

    Each index has a single segments file that records information about the active segments: it lists the segments by name and contains the size of each segment. [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#table2]Table 2[/URL] describes the structure of this file.


    Table 2. Structure of the segments file
    Column name | Data type | Description
    Version     | UInt64    | The version information of the index files.
    SegCount    | UInt32    | The number of segments in the index.
    NameCounter | UInt32    | Used to generate names for new segment files.
    SegName     | String    | The name of one segment. If the index contains more than one segment, this column appears more than once.
    SegSize     | UInt32    | The size of one segment. If the index contains more than one segment, this column appears more than once.


    Fields information file

    As you know, documents in the index are composed of fields, and this file contains the fields information in the segment. [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#table3]Table 3[/URL] shows the structure of this file.


    Table 3. Structure of the fields information file
    Column name | Data type | Description
    FieldsCount | VInt      | The number of fields.
    FieldName   | String    | The name of one field.
    FieldBits   | Byte      | Various flags. For example, if the lowest bit is 1, this is an indexed field; if 0, a nonindexed field.


    Term information file

    This core index file stores all of the terms and related information in the index, sorted by term. [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#table4]Table 4[/URL] shows the structure of this file.


    Table 4. Structure of the term information file
    Column name | Data type | Description
    TIVersion   | UInt32    | The version of this file's format.
    TermCount   | UInt64    | The number of terms in this segment.
    Term        | Structure | Composed of three subcolumns: PrefixLength, Suffix, and FieldNum. Represents the contents of the term.
    DocFreq     | VInt      | The number of documents that contain the term.
    FreqDelta   | VInt      | Points into the frequency file.
    ProxDelta   | VInt      | Points into the position file.


    Frequency file

    This file contains, for each term, the list of documents that contain it, along with the term frequency in each document. If Lucene finds a term in the term information file that matches a search word, it visits the list in the frequency file to find which documents contain the term. [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#table5]Table 5[/URL] shows a brief structure of this file; it doesn't include all of the fields, but it can help you understand the file's usage.


    Table 5. Structure of the frequency file
    Column name | Data type | Description
    DocDelta    | VInt      | Determines both the document number and the term frequency. If the value is odd, the term frequency is 1; otherwise, the Freq column determines the term frequency.
    Freq        | VInt      | If the value of DocDelta is even, this column determines the term frequency.
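    The odd/even trick in Table 5 can be made concrete with a small decoder. The sketch below assumes one common encoding consistent with the table's description (the low bit of DocDelta flags a frequency of 1, and the remaining bits are a gap to the previous document number); the class and method names are my own, and the real on-disk format has additional details:

```java
import java.util.*;

//Decodes a toy version of the frequency file's DocDelta/Freq scheme:
//low bit of DocDelta set  -> term frequency is 1,
//low bit clear            -> the next value is the frequency.
//The high bits of DocDelta are a gap added to the previous doc number.
public class FreqDecoder {
    //Decode a stream of ints into (docId, freq) pairs.
    static List<int[]> decode(int[] stream) {
        List<int[]> postings = new ArrayList<>();
        int doc = 0;
        for (int i = 0; i < stream.length; i++) {
            int docDelta = stream[i];
            doc += docDelta >>> 1;           //high bits: gap to this document
            int freq;
            if ((docDelta & 1) == 1) {       //odd value: frequency is 1
                freq = 1;
            } else {                         //even value: next int is the frequency
                freq = stream[++i];
            }
            postings.add(new int[]{doc, freq});
        }
        return postings;
    }

    public static void main(String[] args) {
        //Encoded: doc 5 with freq 1 (5<<1 | 1 = 11),
        //then doc 8 (gap 3, 3<<1 = 6, even) with freq 4
        int[] stream = {11, 6, 4};
        for (int[] p : decode(stream))
            System.out.println("doc " + p[0] + " freq " + p[1]);
        //prints: doc 5 freq 1
        //        doc 8 freq 4
    }
}
```

    Storing gaps rather than absolute document numbers keeps the VInt values small, which is the point of the delta encoding.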

    Position file

    This file contains the list of positions at which the term occurs within each document. You can use this information to rank the search results. [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#table6]Table 6[/URL] shows the structure of this file.


    Table 6. Structure of the position file
    Column name   | Data type | Description
    PositionDelta | VInt      | The delta-encoded position at which the term occurs within a document.


    I've introduced you to the main index files in Lucene, hopefully allowing you to understand the physical storage structure of Lucene.

    [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#main]Back to top[/URL]

    In conclusion

    A number of large, well-known organizations are using Lucene. For example, Lucene provides searching capabilities for the Eclipse help system, MIT's OpenCourseWare, and so on. Upon reading this article, I hope you've gained an understanding of Lucene's indexing system and will find it easy to create an index using Lucene's API.


    [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#main]Back to top[/URL]

    Resources

    Learn

    [URL=http://www.ibm.com/developerworks/web/library/j-lucene/]Parsing, indexing, and searching XML with Digester and Lucene[/URL] by Otis Gospodnetic (developerWorks, June 2003): Manipulate XML in Lucene and cut your development time.


    [URL=http://www.ibm.com/developerworks/db2/library/techarticle/dm-0601chitiveli/]IBM Search and Index APIs (SIAPI) for WebSphere Information Integrator OmniFind Edition[/URL] by Srinivas Varma Chitiveli (developerWorks, January 2005): Build your own search solutions based on OmniFind technology, IBM's information retrieval library.


    [URL=http://lucene.apache.org/]Lucene's official Web site[/URL]: Explore numerous study materials for Lucene, including Javadoc and Lucene's latest release.


    [URL=http://lucene.sourceforge.net/talks/pisa/]A lecture on Lucene[/URL], presented by Doug Cutting at the University of Pisa on November 24, 2004: Explore this brief introduction to Lucene.


    [URL=http://www.amazon.com/gp/product/020139829X/104-7111632-8247925?v=glance&n=283155]Modern Information Retrieval[/URL] by Ricardo Baeza-Yates and Berthier Ribeiro-Neto: Read about changes in modern information retrieval and how to provide relevant information in this book about IR technology.


    developerWorks [URL=http://www.ibm.com/developerworks/web]Web Architecture zone[/URL]: Expand your site development skills with articles and tutorials that specialize in Web technologies.


    [URL=http://www.ibm.com/developerworks/offers/techbriefings/?S_TACT=105AGX08&S_CMP=art]developerWorks technical events and webcasts[/URL]: Stay current with jam-packed technical sessions that shorten your learning curve, and improve the quality and results of your most difficult software projects.

    Get products and technologies

    [URL=http://www.apache.org/dyn/closer.cgi/lucene/java/]Lucene[/URL]: Download the latest version.


    [URL=http://www.ibm.com/developerworks/downloads/?S_TACT=105AGX08&S_CMP=art]Free downloads and learning resources[/URL]: Improve your work with software downloads from developerWorks.

    Discuss

    [URL=http://lucene.apache.org/java/docs/mailinglists.html]Lucene mailing lists[/URL]: Ask questions, share knowledge, and discuss issues.


    [URL=http://www.ibm.com/developerworks/community/]developerWorks discussion forums[/URL]: Join and participate in the developerWorks community.


    [URL=http://www.ibm.com/developerworks/blogs/]developerWorks blogs[/URL]: Get involved in the developerWorks community.

    [URL=http://www-128.ibm.com/developerworks/library/wa-lucene/#main]Back to top[/URL]

    About the author


      Deng Peng Zhou is a graduate student from Shanghai Jiaotong University. He is interested in Java technology and modern information retrieval. You can contact him at [URL=mailto:zhoudengpeng@yahoo.com.cn?cc=htc@us.ibm.com]zhoudengpeng@yahoo.com.cn[/URL].


    [Posted 2006/7/1 20:51]
    Reply from Ambrosia (2006/7/2 9:34):

    I don't quite follow. It seems Lucene is a middle layer written in Java, a full-text retrieval engine; some articles I've read call this a traditional search engine. Is this piece just a brief introduction? Is it original, or borrowed from elsewhere?
     
    Reply from whale (2006/7/3 10:31):

    Lucene is very useful for building domain-oriented applications. Personally, I think combining Lucene with domain-specific semantic search is very meaningful and practical; we are currently researching this.
     