Reading internals of Solr index file in Java

Question

I am trying to read a Solr index file. The file was created by the example from the Solr download page, version 6.4.
I am using this code:

    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class TestIndex {
        public static void main(String[] args) throws IOException {


            // Backslashes in a Java string literal must be escaped
            Directory dirIndex = FSDirectory.open(new File("D:\\data\\data\\index"));
            IndexReader indexReader = IndexReader.open(dirIndex);
            Document doc = null;   

            for(int i = 0; i < indexReader.numDocs(); i++) {
                doc = indexReader.document(i);
            }

            System.out.println(doc.toString());

            indexReader.close();
            dirIndex.close();
        }
    }

Solr jar: solr-solrj-6.5.1.jar
Lucene jar: lucene-core-r1211247.jar

Exception:

Exception in thread "main" 
org.apache.lucene.index.IndexFormatTooOldException: Format version is not 
supported (resource: 
ChecksumIndexInput(MMapIndexInput(path="D:\data\data\index\segments_2"))): 
1071082519 (needs to be between -9 and -12). This version of Lucene only 
supports indexes created with release 3.0 and later.

Updated code using Lucene 6.5.1:

    // Backslashes in a Java string literal must be escaped
    Path path = FileSystems.getDefault().getPath("D:\\data\\data\\index");
    Directory dirIndex = FSDirectory.open(path);
    DirectoryReader dr = DirectoryReader.open(dirIndex);
    Document doc = null;

    for (int i = 0; i < dr.numDocs(); i++) {
        doc = dr.document(i);
    }

    System.out.println(doc.toString());

    dr.close();
    dirIndex.close();

Exception:

java.lang.UnsupportedClassVersionError: org/apache/lucene/store/Directory : Unsupported major.minor version 52.0.

Can you please help me get this code running?

Thanks
Virendra Agarwal

Answer 1

That Lucene jar appears to be from 2012, so it is more than five years old. Use lucene-core-6.5.1 to read index files generated by Solr 6.5.1.
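
The number in the first exception points the same way: 1071082519 is 0x3FD76C17 in hex, which is the magic value (CodecUtil.CODEC_MAGIC) that Lucene 4.0 and later write at the start of index files. The old jar reads that magic as a format version and rejects it. Here is a minimal sketch, using only the JDK, that checks whether a segments_N file starts with that magic. The class name, method names, and default path are illustrative and not part of Lucene.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SegmentsHeaderCheck {
    // Value of Lucene's CodecUtil.CODEC_MAGIC; 0x3FD76C17 == 1071082519.
    static final int CODEC_MAGIC = 0x3FD76C17;

    // Reads the first four bytes of a file as a big-endian int.
    static int readFirstInt(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file);
             DataInputStream data = new DataInputStream(in)) {
            return data.readInt();
        }
    }

    // True if the file begins with the codec header magic.
    static boolean startsWithCodecMagic(Path file) throws IOException {
        return readFirstInt(file) == CODEC_MAGIC;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default taken from the question; pass the real segments_N path as the first argument.
        Path segments = Paths.get(args.length > 0 ? args[0] : "D:\\data\\data\\index\\segments_2");
        System.out.println("first int = " + readFirstInt(segments)
                + ", starts with codec magic = " + startsWithCodecMagic(segments));
    }
}
```

If the check reports the codec magic, the index was written by Lucene 4.0 or later and a 3.x-era jar cannot open it.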

If the build is picking up an arbitrarily named file by mistake, you can pin the dependency in your build file.
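
The second exception in the question, "Unsupported major.minor version 52.0", is a separate issue. Class-file major version 52 corresponds to Java 8, so the Lucene 6.5.1 classes were most likely loaded by an older JVM. Lucene 6.x needs Java 8 or later. As a rough sketch, assuming you extract a .class file from the jar and pass its path, the following JDK-only code prints the running Java version and the version that class was compiled for. The class and method names are made up for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClassVersionCheck {
    // A class file starts with the 4-byte magic 0xCAFEBABE, followed by a
    // 2-byte minor version and a 2-byte major version.
    static int readMajorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort();        // minor version, ignored here
        return data.readUnsignedShort(); // major version
    }

    // Major version 49 is Java 5 and each later release adds one, so 52 maps to Java 8.
    static int javaReleaseFor(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("running on Java " + System.getProperty("java.version"));
        if (args.length > 0) {
            // args[0]: path to a .class file extracted from the jar in question
            try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                int major = readMajorVersion(in);
                System.out.println("class file major version " + major
                        + " (needs Java " + javaReleaseFor(major) + " or later)");
            }
        }
    }
}
```

If the running version is lower than the version the class needs, running the program on a newer JVM should clear this error.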

Answer 2

I suggest using Luke:

https://github.com/DmitryKey/luke

Luke is the GUI tool for introspecting your Lucene / Solr / Elasticsearch index. It allows:

Viewing your documents and analyzing their field contents (for stored fields)

Searching in the index

Performing index maintenance: index health checking, index optimization (take a backup before running this!)

Reading index from hdfs

Exporting the index or portion of it into an xml format

Testing your custom Lucene analyzers

Creating your own plugins!


