Change "contents" field to stored, tokenized, indexed for highlighting
This is how I get the details from LucenePDFDocument:
doc = LucenePDFDocument.getDocument(file);
System.out.println("field list: \n" + doc.getFields());
This is the output:
field list:
[<stored<path:D:\Kuliah\rancangan document indexing\dir-pdf\dua.pdf>,
stored<url:D:/Kuliah/rancangan document indexing/dir-pdf/dua.pdf>,
stored,indexed,omitNorms,indexOptions=DOCS<modified:20170307220729>,
indexed,tokenized<uid:D Kuliah rancangan document indexing dir-pdf dua.pdf 20170307220729>,
indexed,tokenized<contents:java.io.StringReader@4206a205>,
stored,indexed,tokenized<Author:Acer-2577>,
stored,indexed,tokenized<CreationDate:20150222074338>,
stored,indexed,tokenized<Creator:PDF24 Creator>,
stored,indexed,tokenized<ModificationDate:20150222074338>,
stored,indexed,tokenized<Producer:GPL Ghostscript 9.10>,
stored,indexed,tokenized<Title:Microsoft Word - Vol 10.1 bag ke 2a fix.doc>,
stored<summary:Jurnal Teknologi Informasi, Volume 10 Nomor 1, April ...>]
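I want to highlight the retrieved terms in the "contents" field. The highlighter needs a stored field, but the "contents" field is only indexed and tokenized. I get an error like this: "contents field is not stored".
What should I do to make the "contents" field stored, tokenized, and indexed?
Should I edit LucenePDFDocument.java? Which part?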
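Yes, the contents field is indexed but not stored. That means it can be searched but is not returned with the search results, so it won't work with the highlighter.
You need to modify the LucenePDFDocument class so that it stores the field. To do that, just pass a String instead of a Reader to the addTextField call: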
String contents = writer.getBuffer().toString();
addTextField(document, "contents", contents);
You should probably also remove the "summary" field, since you don't need it if you are storing the full contents.
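To illustrate why the String matters, here is a minimal, standard-library-only sketch (no Lucene or PDFBox involved). A Reader can be consumed only once, so nothing is left to keep after it has been read, while a String materialized from the writer's buffer can be read again whenever it is needed. The class name `ContentsAsStringDemo` and the helpers `drain` and `contentsOf` are hypothetical names made up for this example, and the sample text is a stand-in for whatever the PDF text extraction actually writes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

class ContentsAsStringDemo {

    // Hypothetical helper: reads everything the Reader has left.
    // Afterwards the Reader is exhausted.
    static String drain(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Hypothetical helper mirroring the line from the answer:
    // turn the writer's buffer into a String.
    static String contentsOf(StringWriter writer) {
        return writer.getBuffer().toString();
    }

    public static void main(String[] args) {
        StringWriter writer = new StringWriter();
        writer.write("text extracted from the PDF"); // stand-in for the extracted text

        // A Reader yields its text once; a second pass finds nothing.
        Reader reader = new StringReader(contentsOf(writer));
        System.out.println("first read : '" + drain(reader) + "'");
        System.out.println("second read: '" + drain(reader) + "'");

        // A String can be handed out as often as needed.
        String contents = contentsOf(writer);
        System.out.println("string     : '" + contents + "'");
    }
}
```

On the Lucene side, `TextField` has one constructor that takes a `Reader` (indexed and tokenized, never stored) and one that takes a `String` together with `Field.Store.YES` or `Field.Store.NO`. Only the String form can keep the original text for the highlighter.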