How to create a custom analyzer in PyLucene 8.6.1?

I have looked at this, this and this, but I am not sure why they do not work for me.

I would normally use an analyzer like the one below.

import lucene
from org.apache.lucene.analysis.core import WhitespaceAnalyzer
from org.apache.lucene.index import IndexWriterConfig, IndexWriter
from org.apache.lucene.store import SimpleFSDirectory
from java.nio.file import Paths
from org.apache.lucene.document import Document, Field, TextField

index_path = "./index"

lucene.initVM()

analyzer = WhitespaceAnalyzer()
config = IndexWriterConfig(analyzer)
store = SimpleFSDirectory(Paths.get(index_path))
writer = IndexWriter(store, config)

doc = Document()
doc.add(Field("title", "The quick brown fox.", TextField.TYPE_STORED))
writer.addDocument(doc)

writer.close()
store.close()

Instead of WhitespaceAnalyzer() I want to use MyAnalyzer(), which should have a WhitespaceTokenizer followed by a LowerCaseFilter.

from org.apache.lucene.analysis.core import LowerCaseFilter, WhitespaceTokenizer
from org.apache.pylucene.analysis import PythonAnalyzer

class MyAnalyzer(PythonAnalyzer):
    def __init__(self):
        PythonAnalyzer.__init__(self)

    def createComponents(self, fieldName):
        # What do I write here?

Can you help me write and use MyAnalyzer()?

I found here and here that the approach below works.

from org.apache.lucene.analysis.core import LowerCaseFilter, WhitespaceTokenizer
from org.apache.pylucene.analysis import PythonAnalyzer
from org.apache.lucene.analysis import Analyzer

class MyAnalyzer(PythonAnalyzer):
    def __init__(self):
        PythonAnalyzer.__init__(self)

    def createComponents(self, fieldName):
        source = WhitespaceTokenizer()    # split the input on whitespace
        result = LowerCaseFilter(source)  # then lowercase each token
        return Analyzer.TokenStreamComponents(source, result)
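
If that is correct, using MyAnalyzer() should just be a matter of passing an instance to IndexWriterConfig, exactly as with WhitespaceAnalyzer() in the first snippet. Below is a minimal sketch of that (reusing the imports and the initVM() call from the first snippet), which also prints the tokens the analyzer produces via the standard tokenStream()/CharTermAttribute pattern as a sanity check; I have not verified this exact code against 8.6.1, so treat it as a starting point.

from org.apache.lucene.analysis.tokenattributes import CharTermAttribute

analyzer = MyAnalyzer()

# Sanity check: print the tokens MyAnalyzer produces for a sample string.
stream = analyzer.tokenStream("title", "The quick brown Fox.")
term = stream.addAttribute(CharTermAttribute.class_)
stream.reset()
while stream.incrementToken():
    print(term.toString())  # expected: the, quick, brown, fox.
stream.end()
stream.close()

# Index with MyAnalyzer in place of WhitespaceAnalyzer, same as before.
config = IndexWriterConfig(analyzer)
store = SimpleFSDirectory(Paths.get(index_path))
writer = IndexWriter(store, config)

doc = Document()
doc.add(Field("title", "The quick brown fox.", TextField.TYPE_STORED))
writer.addDocument(doc)

writer.close()
store.close()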

It would be great if someone could point me in the right direction so that I could find these answers properly on my own.