Lucene Full Text Search - A Very Basic Tutorial

Han Bo Sun

September 26, 2019

MIT

A simple tutorial on full text search with Apache Lucene.

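Introduction

This is my fourth tutorial of the year. In this one, I want to dig into one of my favorite subjects: full text search engines. A commonly used one is Apache Lucene. I have used it before and wrote a tutorial about it and Hibernate. This time, I want to explore it on its own, without mixing in other technologies. I will have a simple Java console application that performs three different functions:

Indexing some documents
Full text search to find the target document
Getting a document by its unique identifier

The program also performs a few miscellaneous functions, such as deleting all documents from the index, or deleting only certain documents from it.

The program uses a file directory as the index repository. The Apache Lucene version used is 8.2.0. The lucene-core library is all we need to make this work.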

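Background

Working with Lucene may seem complicated, but compared with a relational database it is fairly simple. Let me explain some terminology. First, there is the concept of an index. An index is like a database table: it can store a large number of documents. A document is like a row in that table. As we all know, a table has one or more columns; likewise, a document has one or more fields, and those fields play the role of the columns.

So adding a document is like adding a row to a table, and finding a document in the index is like querying the table for the rows that match the query criteria. When querying a table in a relational database, the criteria are specified against columns. Finding a document in an index works the same way: you specify search terms against the document's fields.

For this sample application, I use the file system to store the document index. After documents are added to the index, you will see that the directory looks like this: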

[Screenshot 1: contents of the index directory after documents have been added]

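Let's start with how to index a document.

Indexing a Document

To perform full text search, the first thing you need to do is add some documents to the index. The Apache Lucene library provides two types for this: Document and IndexableField. A document contains multiple indexable fields. Once a document has been created with its indexable fields, it can be added to the full text search index. IndexableField is an interface; the concrete field classes that implement it include TextField, StringField, IntPoint, FloatPoint, IntRange, FloatRange, and many others. In this tutorial, I use TextField and StringField, plus one StoredField for the document time.

There are so many field types because different kinds of values are analyzed in different ways, yet they can all be added to the same searchable index as parts of a single document. The difference between TextField and StringField is that the value of a TextField is broken into words (tokens). In English, a sentence consists of words separated by spaces and punctuation. If a sentence is stored in a TextField, all of its words are extracted and each word becomes a searchable token; Lucene associates the document with all of these words. If the sentence is stored in a StringField, the whole sentence is treated as a single token. Values in numeric fields are treated as numbers, and those fields can be queried with exact-match or range-based comparisons (greater than, less than, and so on) to locate documents.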
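To make the difference concrete, here is a minimal plain-Java sketch that involves no Lucene code at all. The names FieldTokens, asTextField and asStringField are hypothetical, and splitting on non-alphanumeric characters is only a rough stand-in for what StandardAnalyzer does; the real analyzer follows Unicode word-break rules.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy illustration only: mimics how a tokenized field and an untokenized
// field turn the same value into searchable tokens. This is not Lucene code.
final class FieldTokens
{
   // TextField-like: break the value into lowercase word tokens.
   static List<String> asTextField(String value)
   {
      List<String> tokens = new ArrayList<String>();
      for (String part : value.split("[^\\p{L}\\p{N}]+"))
      {
         if (!part.isEmpty())
         {
            tokens.add(part.toLowerCase(Locale.ROOT));
         }
      }
      return tokens;
   }

   // StringField-like: the whole value is one token, kept exactly as given.
   static List<String> asStringField(String value)
   {
      return Collections.singletonList(value);
   }

   public static void main(String[] args)
   {
      String value = "Index File Sample";
      System.out.println(asTextField(value));   // [index, file, sample]
      System.out.println(asStringField(value)); // [Index File Sample]
   }
}
```

A search for "file" can match the first form, while the second form only matches a search for the complete, identically cased value.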

Let's see how to create an indexable document. Here is the code:

   public Document createIndexDocument(IndexableDocument docToAdd)
   {
      Document retVal = new Document();
      
      IndexableField docIdField = new StringField("DOCID",
         docToAdd.getDocumentId(),
         Field.Store.YES);
      IndexableField titleField = new TextField("TITLE",
         docToAdd.getTitle(),
         Field.Store.YES);
      IndexableField contentField = new TextField("CONTENT",
         docToAdd.getContent(),
         Field.Store.NO);
      IndexableField keywordsField = new TextField("KEYWORDS",
         docToAdd.getKeywords(),
         Field.Store.YES);
      IndexableField categoryField = new StringField("CATEGORY",
         docToAdd.getCategory(),
         Field.Store.YES);
      IndexableField authorNameField = new TextField("AUTHOR",
         docToAdd.getAuthorName(),
         Field.Store.YES);
      long createTime = docToAdd.getDocumentDate().getTime();
      IndexableField documentTimeField = new StoredField("DOCTIME", createTime);
      IndexableField emailField = new StringField("AUTHOREMAIL",
         docToAdd.getAuthorEmail(),
         Field.Store.YES);
      
      retVal.add(docIdField);
      retVal.add(titleField);
      retVal.add(contentField);
      retVal.add(keywordsField);
      retVal.add(categoryField);
      retVal.add(authorNameField);
      retVal.add(documentTimeField);
      retVal.add(emailField);
      
      return retVal;
   }

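The code above is not hard to understand. It creates a Document object, then creates a number of IndexableField objects and adds all of them to the Document. The method then returns the Document object. This method can be found in the file "FileBasedDocumentIndexer.java".

In the code above, I use TextField and StringField. The constructors of both types take three parameters. The first is the field name. The second is the field value. The last is an enum value that indicates whether to store the value in the index, or only index it and discard the original value. The difference is that when a value is stored, you can get it back when you retrieve the document; a field that is not stored can still be searched, but its value cannot be read back. The DOCTIME field uses StoredField, which does the opposite: the value is stored but not indexed for search.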
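The effect of Field.Store.YES versus Field.Store.NO can be checked with a small round trip. The following is a sketch, not part of the sample project: the class name StoredVersusIndexedDemo and the field values are made up, and it assumes lucene-core 8.x on the classpath. It uses the in-memory ByteBuffersDirectory instead of a file directory only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

final class StoredVersusIndexedDemo
{
   // Indexes one document in memory, finds it through the unstored CONTENT
   // field, and returns { value read for TITLE, value read for CONTENT }.
   static String[] roundTrip()
   {
      try (Directory dir = new ByteBuffersDirectory())
      {
         try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig()))
         {
            Document doc = new Document();
            doc.add(new TextField("TITLE", "Medical gloves", Field.Store.YES));
            doc.add(new TextField("CONTENT", "Surgical gloves are sterile", Field.Store.NO));
            writer.addDocument(doc);
         } // closing the writer commits the pending document

         try (DirectoryReader reader = DirectoryReader.open(dir))
         {
            IndexSearcher searcher = new IndexSearcher(reader);
            // The analyzer lowercased the tokens, so the term is "sterile".
            TopDocs hits = searcher.search(new TermQuery(new Term("CONTENT", "sterile")), 1);
            Document found = searcher.doc(hits.scoreDocs[0].doc);
            return new String[] { found.get("TITLE"), found.get("CONTENT") };
         }
      }
      catch (IOException ex)
      {
         throw new UncheckedIOException(ex);
      }
   }

   public static void main(String[] args)
   {
      String[] values = roundTrip();
      System.out.println("TITLE: " + values[0]);   // the stored title
      System.out.println("CONTENT: " + values[1]); // null, it was not stored
   }
}
```

The document is found through CONTENT even though that field's text cannot be read back, which is the trade-off made for the CONTENT field in createIndexDocument().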

The next step is to actually index the document. Here is the code:

   String indexDirectory;
   ...

   public void indexDocument(Document docToAdd) throws Exception
   {
      IndexWriter writer = null;
      try
      {
         Directory indexWriteToDir = 
               FSDirectory.open(Paths.get(indexDirectory));
         
         writer = new IndexWriter(indexWriteToDir, new IndexWriterConfig());
         writer.addDocument(docToAdd);
         writer.flush();
         writer.commit();
      }
      finally
      {
         if (writer != null)
         {
            writer.close();
         }
      }
   }

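The code above uses the static method open() of the FSDirectory class to get a reference to the index directory, an object of type Directory (a type in Apache Lucene). Next, I instantiate an IndexWriter object. The constructor takes two parameters: the first is the Directory object; the second is a configuration object of type IndexWriterConfig. I use the default configuration, which uses StandardAnalyzer. By default, the standard analyzer handles English sentences. There are many other analyzers, which you can use by adding extra jars, and you can implement your own if you like. For simplicity, this tutorial uses only StandardAnalyzer.

Next, I use the addDocument() method of the IndexWriter object to add the document to the full text search index. This is where the document indexing happens. Then flush() and commit() are called to make sure the changes are fully committed to the index. Strictly speaking, commit() alone would be enough, since it flushes pending changes itself.

I wrap the whole operation in a try-finally block with no catch block, so that any exception can be handled by the caller. The finally block closes the writer to release its resources. If you run the sample project and get through this method, you will see something like the screenshot above.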
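On Java 7 and later, the same guarantee can also be written with try-with-resources. The sketch below uses a hypothetical TrackingWriter stand-in rather than a real IndexWriter, so that it runs without Lucene; since IndexWriter implements Closeable, the same shape should apply to it.

```java
// Hypothetical stand-in for a writer that must always be closed.
final class TrackingWriter implements AutoCloseable
{
   boolean closed = false;

   void addDocument(String doc)
   {
      if (doc == null)
      {
         throw new IllegalArgumentException("document must not be null");
      }
   }

   @Override
   public void close()
   {
      closed = true;
   }
}

final class TryWithResourcesDemo
{
   // Same contract as the try-finally version: there is no catch block, so
   // the exception reaches the caller, and the writer is closed on the way out.
   static void index(TrackingWriter writer, String doc)
   {
      try (TrackingWriter w = writer)
      {
         w.addDocument(doc);
      }
   }

   public static void main(String[] args)
   {
      TrackingWriter writer = new TrackingWriter();
      try
      {
         index(writer, null);
      }
      catch (IllegalArgumentException ex)
      {
         System.out.println("Caller saw: " + ex.getMessage());
      }
      System.out.println("Writer closed: " + writer.closed);
   }
}
```

Either form works; try-with-resources simply removes the null check and the explicit close() call.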

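Now that we know how to index a document, let's look at how to search for it.

Full Text Search to Locate a Document

Before showing the code, I want to explain my design intention. I want to search across all fields of the documents in the index. If any field matches the full text search criteria, the document is considered found.

Here is the code: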

   public List<FoundDocument> searchForDocument(String searchVal)
   {
      List<FoundDocument> retVal = new ArrayList<FoundDocument>();
      
      try
      {
         Directory dirOfIndexes = 
               FSDirectory.open(Paths.get(indexDirectory));
         
         IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dirOfIndexes));
         
         QueryBuilder bldr = new QueryBuilder(new StandardAnalyzer());
         Query q1 = bldr.createPhraseQuery("TITLE", searchVal);
         Query q2 = bldr.createPhraseQuery("KEYWORDS", searchVal);
         Query q3 = bldr.createPhraseQuery("CONTENT", searchVal);
         Query q4 = new TermQuery(new Term("CATEGORY", searchVal));
         Query q5 = bldr.createPhraseQuery("AUTHOR", searchVal);
         Query q6 = new TermQuery(new Term("AUTHOREMAIL", searchVal));
         
         BooleanQuery.Builder chainQryBldr = new BooleanQuery.Builder();
         chainQryBldr.add(q1, Occur.SHOULD);
         chainQryBldr.add(q2, Occur.SHOULD);
         chainQryBldr.add(q3, Occur.SHOULD);
         chainQryBldr.add(q4, Occur.SHOULD);
         chainQryBldr.add(q5, Occur.SHOULD);
         chainQryBldr.add(q6, Occur.SHOULD);         
         
         BooleanQuery finalQry = chainQryBldr.build();
         
         TopDocs allFound = searcher.search(finalQry, 100);
         if (allFound.scoreDocs != null)
         {
            for (ScoreDoc doc : allFound.scoreDocs)
            {
               System.out.println("Score: " + doc.score);
               
               int docidx = doc.doc;
               Document docRetrieved = searcher.doc(docidx);
               if (docRetrieved != null)
               {
                  FoundDocument docToAdd = new FoundDocument();

                  IndexableField field = docRetrieved.getField("TITLE");
                  if (field != null)
                  {
                     docToAdd.setTitle(field.stringValue());
                  }
                  
                  field = docRetrieved.getField("DOCID");
                  if (field != null)
                  {
                     docToAdd.setDocumentId(field.stringValue());
                  }
                  
                  field = docRetrieved.getField("KEYWORDS");
                  if (field != null)
                  {
                     docToAdd.setKeywords(field.stringValue());
                  }
                  
                  field = docRetrieved.getField("CATEGORY");
                  if (field != null)
                  {
                     docToAdd.setCategory(field.stringValue());
                  }
                  
                  if (docToAdd.validate())
                  {
                     retVal.add(docToAdd);
                  }
               }
            }
         }
      }
      catch (Exception ex)
      {
         ex.printStackTrace();
      }
      
      return retVal;
   }

This code can be split into several parts. The first part opens the directory of the full text search index:

Directory dirOfIndexes = 
   FSDirectory.open(Paths.get(indexDirectory));

IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dirOfIndexes));

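This part is similar to how the directory is opened when adding a document to the index, but instead of an IndexWriter I use an IndexSearcher object. The analyzer is again StandardAnalyzer. The indexing code never mentions it because the default IndexWriterConfig supplies it implicitly; on the search side, I pass a StandardAnalyzer to the QueryBuilder explicitly so that the search text is analyzed the same way the documents were.

Next, I need to create a query. The query basically says that if any field contains the phrase (passed in as the parameter of my method), the document is considered found. Since I have multiple fields, I have to create the query like this: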

   QueryBuilder bldr = new QueryBuilder(new StandardAnalyzer());
   Query q1 = bldr.createPhraseQuery("TITLE", searchVal);
   Query q2 = bldr.createPhraseQuery("KEYWORDS", searchVal);
   Query q3 = bldr.createPhraseQuery("CONTENT", searchVal);
   Query q4 = new TermQuery(new Term("CATEGORY", searchVal));
   Query q5 = bldr.createPhraseQuery("AUTHOR", searchVal);
   Query q6= new TermQuery(new Term("AUTHOREMAIL", searchVal));

   BooleanQuery.Builder chainQryBldr = new BooleanQuery.Builder();
   chainQryBldr.add(q1, Occur.SHOULD);
   chainQryBldr.add(q2, Occur.SHOULD);
   chainQryBldr.add(q3, Occur.SHOULD);
   chainQryBldr.add(q4, Occur.SHOULD);
   chainQryBldr.add(q5, Occur.SHOULD);
   chainQryBldr.add(q6, Occur.SHOULD);         

   BooleanQuery finalQry = chainQryBldr.build();

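The code above first creates six Query objects, one for each searchable field of the document. As you can see, I use two kinds of query. One is a phrase query, which tries to match the input string as a sequence of words within the field value. The other is a term query, which matches a single exact token. I need both because a phrase query does not work for StringField fields: QueryBuilder runs the input through the analyzer and breaks it into tokens, whereas a StringField is indexed as one unanalyzed token. So I use a term query to match the input search value against the StringField fields; it only matches when the input equals the whole field value exactly, including case. This simple approach is enough to make the query work.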
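The mismatch between an analyzed query and a StringField can be seen in a small experiment. This is a sketch under assumptions, not code from the sample project: the class name StringFieldQueryDemo is made up, lucene-core 8.x is assumed, and an in-memory ByteBuffersDirectory replaces the file directory. It uses createBooleanQuery() for the analyzed side so that only the tokenization difference is being tested.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.QueryBuilder;

final class StringFieldQueryDemo
{
   // Returns { hits for the analyzed query, hits for the exact TermQuery }.
   static int[] countHits()
   {
      try (Directory dir = new ByteBuffersDirectory())
      {
         try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig()))
         {
            Document doc = new Document();
            doc.add(new StringField("CATEGORY", "Index File Sample", Field.Store.YES));
            writer.addDocument(doc);
         }

         try (DirectoryReader reader = DirectoryReader.open(dir))
         {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Analyzed: the input becomes the separate tokens index, file, sample.
            Query analyzed = new QueryBuilder(new StandardAnalyzer())
               .createBooleanQuery("CATEGORY", "Index File Sample");
            // Exact: a single term holding the untouched field value.
            Query exact = new TermQuery(new Term("CATEGORY", "Index File Sample"));
            return new int[] {
               searcher.search(analyzed, 10).scoreDocs.length,
               searcher.search(exact, 10).scoreDocs.length };
         }
      }
      catch (IOException ex)
      {
         throw new UncheckedIOException(ex);
      }
   }

   public static void main(String[] args)
   {
      int[] hits = countHits();
      System.out.println("Analyzed query hits: " + hits[0]);
      System.out.println("Term query hits: " + hits[1]);
   }
}
```

None of the analyzed tokens equals the single indexed token "Index File Sample", so only the term query is expected to find the document.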

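Once the individual queries are built, I need a master query that joins all six. The logic should be: if field #1 matches the input value, or field #2 matches, or field #3 matches, ..., or field #6 matches, then the document should be retrieved. We can use the builder of BooleanQuery to create such a master query. After creating the builder, I add the six queries to it one by one, each with the Occur value Occur.SHOULD, which is equivalent to the logical operator "OR". If you want the equivalent of the logical operator "AND", use Occur.MUST. In my scenario, Occur.SHOULD is what I need.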
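One way to picture the OR and AND behaviour is with plain sets of document numbers: each clause yields the set of documents it matches, SHOULD clauses are unioned, and MUST clauses are intersected. The sketch below is plain Java with hypothetical names (ClauseLogic, anyOf, allOf). It deliberately ignores scoring, where a real BooleanQuery also ranks documents that match more clauses higher.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of clause combination; not Lucene code.
final class ClauseLogic
{
   // SHOULD-only query: a document qualifies if any clause matched it.
   static Set<Integer> anyOf(List<Set<Integer>> clauseMatches)
   {
      Set<Integer> result = new TreeSet<Integer>();
      for (Set<Integer> matches : clauseMatches)
      {
         result.addAll(matches);
      }
      return result;
   }

   // MUST-only query: a document qualifies only if every clause matched it.
   static Set<Integer> allOf(List<Set<Integer>> clauseMatches)
   {
      if (clauseMatches.isEmpty())
      {
         return new TreeSet<Integer>();
      }
      Set<Integer> result = new TreeSet<Integer>(clauseMatches.get(0));
      for (Set<Integer> matches : clauseMatches)
      {
         result.retainAll(matches);
      }
      return result;
   }

   public static void main(String[] args)
   {
      Set<Integer> titleMatches = new TreeSet<Integer>(Arrays.asList(0, 2));
      Set<Integer> contentMatches = new TreeSet<Integer>(Arrays.asList(2, 3));
      List<Set<Integer>> clauses = Arrays.asList(titleMatches, contentMatches);

      System.out.println(anyOf(clauses)); // [0, 2, 3]
      System.out.println(allOf(clauses)); // [2]
   }
}
```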

The last line builds the final master query. Then I need to run the query against the index. Here is how:

TopDocs allFound = searcher.search(finalQry, 100);

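I use the search() method of the IndexSearcher object to find the most relevant documents. The method takes two parameters: the first is the master query; the second is the maximum number of most relevant documents to return. TopDocs holds the documents that the query found most relevant. Each entry in its scoreDocs array is a ScoreDoc, which carries an integer document number and a score. The score indicates how relevant the document is to the search criteria.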
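What "the top n most relevant documents" means can be sketched in plain Java: order the hits by descending score and keep at most n of them. The Hit class below is a hypothetical stand-in that mirrors the doc and score fields of ScoreDoc; Lucene collects its top hits more efficiently than a full sort, so this only models the result, not the mechanism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for ScoreDoc: a document number plus a score.
final class Hit
{
   final int doc;
   final float score;

   Hit(int doc, float score)
   {
      this.doc = doc;
      this.score = score;
   }
}

final class TopHits
{
   // Returns at most n hits, highest score first.
   static List<Hit> topN(List<Hit> allHits, int n)
   {
      List<Hit> sorted = new ArrayList<Hit>(allHits);
      Collections.sort(sorted, new Comparator<Hit>()
      {
         @Override
         public int compare(Hit a, Hit b)
         {
            return Float.compare(b.score, a.score);
         }
      });
      return new ArrayList<Hit>(sorted.subList(0, Math.min(n, sorted.size())));
   }

   public static void main(String[] args)
   {
      List<Hit> hits = Arrays.asList(new Hit(7, 0.2f), new Hit(3, 0.9f), new Hit(5, 0.5f));
      for (Hit hit : topN(hits, 2))
      {
         System.out.println("doc " + hit.doc + " score " + hit.score);
      }
   }
}
```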

Now that I have the collection of hits, I go through them and pull out the document information I need. Here is the complete code:

if (allFound.scoreDocs != null)
{
   for (ScoreDoc doc : allFound.scoreDocs)
   {
      System.out.println("Score: " + doc.score);
      
      int docidx = doc.doc;
      Document docRetrieved = searcher.doc(docidx);
      if (docRetrieved != null)
      {
         FoundDocument docToAdd = new FoundDocument();

         IndexableField field = docRetrieved.getField("TITLE");
         if (field != null)
         {
            docToAdd.setTitle(field.stringValue());
         }
         
         field = docRetrieved.getField("DOCID");
         if (field != null)
         {
            docToAdd.setDocumentId(field.stringValue());
         }
         
         field = docRetrieved.getField("KEYWORDS");
         if (field != null)
         {
            docToAdd.setKeywords(field.stringValue());
         }
         
         field = docRetrieved.getField("CATEGORY");
         if (field != null)
         {
            docToAdd.setCategory(field.stringValue());
         }
         
         if (docToAdd.validate())
         {
            retVal.add(docToAdd);
         }
      }
   }
}

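The code above loops through the found documents. For each one, I first print the relevance score. Next, I get the document's integer number from doc.doc. Finally, I use the searcher object to retrieve the document by that number. Here is the code: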

   System.out.println("Score: " + doc.score);

   int docidx = doc.doc;
   Document docRetrieved = searcher.doc(docidx);

Once the document is retrieved, I extract the field values and store them in my own document object. Here is the code that does it:

if (docRetrieved != null)
{
   FoundDocument docToAdd = new FoundDocument();

   IndexableField field = docRetrieved.getField("TITLE");
   if (field != null)
   {
      docToAdd.setTitle(field.stringValue());
   }
   
   field = docRetrieved.getField("DOCID");
   if (field != null)
   {
      docToAdd.setDocumentId(field.stringValue());
   }
   
   field = docRetrieved.getField("KEYWORDS");
   if (field != null)
   {
      docToAdd.setKeywords(field.stringValue());
   }
   
   field = docRetrieved.getField("CATEGORY");
   if (field != null)
   {
      docToAdd.setCategory(field.stringValue());
   }
   
   if (docToAdd.validate())
   {
      retVal.add(docToAdd);
   }
}

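That is the end of the most exciting part of this tutorial. Next, I will go over some miscellaneous operations.

Other Interesting Stuff

We all know that in a relational database, every row in a table has a unique identifier. I do the same with Lucene: I use one field of the document to store a unique identifier. I create a GUID value with Java's UUID class, take its string representation, and remove the dash characters. Here is an example of a GUID value I use: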

77bbd895bb6f4c16bb637a44d8ea6f1e
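A small helper for producing such identifiers might look like the sketch below. The class and method names (DocumentIds, newDocumentId) are hypothetical; the sample project does the same thing inline in its main entry.

```java
import java.util.UUID;

final class DocumentIds
{
   // Random UUID rendered as 32 lowercase hex characters, dashes removed.
   static String newDocumentId()
   {
      return UUID.randomUUID().toString().replace("-", "");
   }

   public static void main(String[] args)
   {
      System.out.println(newDocumentId());
   }
}
```

Because the result contains only lowercase letters and digits, it should pass through StandardAnalyzer as a single unchanged token.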

Assume the field that stores the unique identifier is named "DOCID". To find the document with a given ID, do this:

   public Document getDocumentById(String docId)
   {
      Document retVal = null;
      try
      {
         Directory dirOfIndexes = 
               FSDirectory.open(Paths.get(indexDirectory));
         
         StandardAnalyzer analyzer = new StandardAnalyzer();
         IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dirOfIndexes));
         QueryBuilder quryBldr = new QueryBuilder(analyzer);
         
         Query idQury = quryBldr.createPhraseQuery("DOCID", docId);
         TopDocs foundDocs = searcher.search(idQury, 1);
         if (foundDocs != null)
         {
            if (foundDocs.scoreDocs != null && foundDocs.scoreDocs.length > 0)
            {
               System.out.println("Score: " + foundDocs.scoreDocs[0].score);
               retVal = searcher.doc(foundDocs.scoreDocs[0].doc);
            }
         }
      }
      catch (Exception ex)
      {
         ex.printStackTrace();
      }
            
      return retVal;
   }

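This is very similar to the search method from the previous section. First, I open the Directory that contains the full text index and create an IndexSearcher object. Then I use a QueryBuilder to create a phrase query that searches the single field "DOCID" for the input GUID value. Because the ID is one lowercase alphanumeric token, the analyzer leaves it unchanged and it matches the StringField value exactly. I tell the search to return at most one document, so it is a "one result or none" query; whatever is found is the document I expect. Once it is found, I retrieve the Lucene Document by its integer document number and return it.

Next, I want to discuss two useful cleanup methods. One removes a single document from the full text index; the other removes all documents from it. Both are easy. First, let's see how to delete a single document from the index. Here is the code: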

   public void deleteDocument(String docId) throws Exception
   {
      IndexWriter writer = null;
      try
      {
         Directory indexWriteToDir = 
               FSDirectory.open(Paths.get(indexDirectory));
         
         writer = new IndexWriter(indexWriteToDir, new IndexWriterConfig());
         writer.deleteDocuments(new Term("DOCID", docId));
         writer.flush();
         writer.commit();
      }
      finally
      {
         if (writer != null)
         {
            writer.close();
         }
      }
   }

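This code shows why it is important to give each document a unique identifier. It calls deleteDocuments() of IndexWriter. This method takes a Term object, finds all documents that match the term, and deletes them. In the code above, I again use the field "DOCID" to find the document with the matching unique identifier. The method is not limited to deleting one document: you can pass several Term objects, and it will delete every document that matches any of them.

Similarly, deleting all documents in the index only takes a call to deleteAll(). Here is how it is done: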

   public void deleteAllIndexes() throws Exception
   {
      IndexWriter writer = null;
      try
      {
         Directory indexWriteToDir = 
               FSDirectory.open(Paths.get(indexDirectory));
         
         writer = new IndexWriter(indexWriteToDir, new IndexWriterConfig());
         writer.deleteAll();
         writer.flush();
         writer.commit();
      }
      finally
      {
         if (writer != null)
         {
            writer.close();
         }
      }
   }

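In both methods, I open the directory and create an IndexWriter object with the directory object and the default configuration. Then I call the delete method of the IndexWriter object. Finally, I flush the IndexWriter object and commit the changes.

That's it! These are all the little tricks needed for basic document indexing and full text search. There aren't many of them, but they work.

Test Run

Now the big question is how to test all of this. In my sample application, I have a class called IndexingMain. It contains a main entry and a bunch of helper methods. Let me start with the method that creates a document. Here it is: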

   public static IndexableDocument prepareDocForTesting(String docId)
   {
      IndexableDocument doc = new IndexableDocument();

      Calendar cal = Calendar.getInstance();
      cal.set(2018, 8, 21, 13, 13, 13);
      
      doc.setDocumentId(docId);
      doc.setAuthorEmail("testuser@lucenetest.com");
      doc.setAuthorName("Lucene Test User");
      doc.setCategory("Index File Sample");
      doc.setContent("There are two main types of medical gloves: "
         + "examination and surgical. Surgical gloves have more "
         + "precise sizing with a better precision and sensitivity "
         + "and are made to a higher standard. Examination gloves "
         + "are available as either sterile or non-sterile, while "
         + "surgical gloves are generally sterile.");
      doc.setDocumentDate(cal.getTime());
      doc.setKeywords("Joseph, Brian, Clancy, Connery, Reynolds, Lindsay");
      doc.setTitle("Quick brown fox and the lazy dog");
      
      return doc;
   }

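IndexableDocument is a document type I created. I have to convert an object of my document type into an Apache Lucene Document object. That is done with this code, which you can find in the main entry: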

FileBasedDocumentIndexer indexer = new FileBasedDocumentIndexer("c:/DevJunk/Lucene/indexes");
...
Document lucDoc1 = indexer.createIndexDocument(doc1);
indexer.indexDocument(lucDoc1);

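The last line of the code above indexes the document into the Lucene file index. Now that we have successfully indexed a document, it's time to see how search works. Here it is: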

public static void testFindDocument(String searchTerm)
{
   LuceneDocumentLocator locator = new LuceneDocumentLocator("c:/DevJunk/Lucene/indexes");
   List<FoundDocument> foundDocs = locator.searchForDocument(searchTerm);
   
   if (foundDocs != null)
   {
      for (FoundDocument doc : foundDocs)
      {
         System.out.println("------------------------------");
         System.out.println("Found document...");
         System.out.println("Document Id: " + doc.getDocumentId());
         System.out.println("Title: " + doc.getTitle());
         System.out.println("Category: " + doc.getCategory());
         System.out.println("Keywords: " + doc.getKeywords());
         System.out.println("------------------------------");
      }
   }
}

Here is how this helper method is used in the main entry:

...        
System.out.println("********************************");
System.out.println("Search first document");
testFindDocument("available as either");
System.out.println("********************************");
...

Lastly, I created a helper method that finds a document by "DOCID". Here it is:

   public static Document testGetDocumentById(String docId)
   {
      LuceneDocumentLocator locator = new LuceneDocumentLocator("c:/DevJunk/Lucene/indexes");
      Document retVal = locator.getDocumentById(docId);
      
      if (retVal != null)
      {
         System.out.println("Get Document by Id [" +  docId + "] found.");
      }
      else
      {
         System.out.println("Get Document by Id [" +  docId + "] **not** found.");
      }
      
      return retVal;
   }

To test with this helper method, do this:

...
testGetDocumentById(id1);
...

Here is the main entry of the sample application:

   public static void main(String[] args)
   {
      UUID x = UUID.randomUUID();
      String id1 = x.toString();
      id1 = id1.replace("-", "");
      System.out.println("Document #1 with id [" + id1 + "] has been created.");
      
      x = UUID.randomUUID();
      String id2 = x.toString();
      id2 = id2.replace("-", "");
      System.out.println("Document #2 with id [" + id2 + "] has been created.");

      IndexableDocument doc1 = prepareDocForTesting(id1);
      IndexableDocument doc2 = prepare2ndTestDocument(id2);
      
      FileBasedDocumentIndexer indexer = 
               new FileBasedDocumentIndexer("c:/DevJunk/Lucene/indexes");
      try
      {
         indexer.deleteAllIndexes();

         Document lucDoc1 = indexer.createIndexDocument(doc1);
         indexer.indexDocument(lucDoc1);
         
         System.out.println("********************************");
         System.out.println("Search first document");
         testFindDocument("available as either");
         System.out.println("********************************");
         
         Document lucDoc2 = indexer.createIndexDocument(doc2);
         indexer.indexDocument(lucDoc2);
         
         testGetDocumentById(id1);
         
         System.out.println("********************************");
         System.out.println("Search second document");
         testFindDocument("coocoobird@moomootease.com");
         System.out.println("********************************");

         testGetDocumentById(id2);

         indexer.deleteAllIndexes();
      }
      catch (Exception ex)
      {
         ex.printStackTrace();
         return;
      }
   }

The sample application includes an Apache Maven pom.xml. To build the application, just run:

mvn clean install

If you like, you can create an Eclipse project from this Maven pom.xml file and then import the project into Eclipse. To create the Eclipse project files, run:

mvn eclipse:eclipse

When you run the application, you will see this:

Document #1 with id [ae1541e5051743e5af310bcfb50a19e8] has been created.
Document #2 with id [c1f20e79043d4b40aa2b9f3ac74e287b] has been created.
********************************
Search first document
Score: 0.39229375
------------------------------
Found document...
Document Id: ae1541e5051743e5af310bcfb50a19e8
Title: Quick brown fox and the lazy dog
Category: Index File Sample
Keywords: Joseph, Brian, Clancy, Connery, Reynolds, Lindsay
------------------------------
********************************
Score: 0.3150669
Get Document by Id [ae1541e5051743e5af310bcfb50a19e8] found.
********************************
Search second document
Score: 0.3150669
------------------------------
Found document...
Document Id: c1f20e79043d4b40aa2b9f3ac74e287b
Title: The centre of kingfisher diversity is the Australasian region
Category: Once upon a Time
Keywords: Liddy, Yellow, Fisher, King, Stevie, Nickolas, Feng Feng
------------------------------
********************************
Score: 0.3150669
Get Document by Id [c1f20e79043d4b40aa2b9f3ac74e287b] found.

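It's a bit messy, but it shows that all the methods I created above work as expected. There might be some bugs in there; I hope you can spot a few. At the very least, feel free to change the search string in the calls to testFindDocument(...), like this: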

...
testFindDocument("<Your test search string here>");
...

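Summary

Finally, it is time to write the summary for this tutorial. This is not the fun kind of tutorial I usually write. My knowledge of the subject is not deep, and I struggled. In the end, the result looks OK, and I am reasonably happy with it.

In this tutorial, I discussed the following topics:

How to open a directory as a document index
How to index a document into the document index
How to search for a document in the document index. It is simple, but it works reasonably well.
How to locate a document by its unique identifier
How to delete a document by its unique identifier, and how to delete all documents

I still had fun writing this tutorial. I hope you enjoy it too.

History

September 24, 2019 - Initial draft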