Build a Custom Tokenizer for elasticsearch
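I'm building a custom tokenizer in response to this: Performance of doc_values field vs analysed field
None of this API seems to be documented (?), so I'm going off code examples from other plugins/tokenizers. But when I restart Elastic with my tokenizer deployed, I keep getting this error in the logs: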
[2017-09-20 08:45:37,412][WARN ][indices.cluster ] [Samuel Silke] [[storm-crawler-2017-09-11][3]] marking and sending shard failed due to [failed to create index]
[storm-crawler-2017-09-11] IndexCreationException[failed to create index]; nested: CreationException[Guice creation errors:
1) Could not find a suitable constructor in com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
at com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory.class(Unknown Source)
at org.elasticsearch.index.analysis.TokenizerFactoryFactory.create(Unknown Source)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown Source)
at _unknown_
1 error];
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:360)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:294)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:163)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.common.inject.CreationException: Guice creation errors:
1) Could not find a suitable constructor in com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
at com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory.class(Unknown Source)
at org.elasticsearch.index.analysis.TokenizerFactoryFactory.create(Unknown Source)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown Source)
at _unknown_
1 error
at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:360)
at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:172)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:157)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358)
... 9 more
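The Guice message above spells out the rule the injector applies when it picks a constructor. A minimal JDK-only reflection sketch of that rule can help when reasoning about the error. `ConstructorRule`, `pickConstructor` and the nested `Inject`/`OtherInject` annotations are hypothetical stand-ins, not Elasticsearch's API; the real check lives inside Elasticsearch's repackaged Guice (`org.elasticsearch.common.inject`). One thing the sketch illustrates is that a constructor carrying an annotation the injector does not recognise (for example an `@Inject` imported from a different package) counts as unannotated. Whether that is what happened here is an assumption; the log does not confirm it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

public class ConstructorRule {

    // Hypothetical stand-in for the injector's own @Inject annotation.
    // RUNTIME retention is needed for reflection to see it at all.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.CONSTRUCTOR)
    @interface Inject {}

    // A different annotation type; the rule below does not recognise it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.CONSTRUCTOR)
    @interface OtherInject {}

    /** Applies the rule quoted in the log: one @Inject constructor, else a non-private no-arg one. */
    static Constructor<?> pickConstructor(Class<?> type) {
        Constructor<?> annotated = null;
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (c.isAnnotationPresent(Inject.class)) {
                if (annotated != null) {
                    throw new IllegalStateException("More than one @Inject constructor in " + type.getName());
                }
                annotated = c;
            }
        }
        if (annotated != null) {
            return annotated;
        }
        try {
            Constructor<?> noArg = type.getDeclaredConstructor();
            if (!Modifier.isPrivate(noArg.getModifiers())) {
                return noArg;
            }
        } catch (NoSuchMethodException ignored) {
            // no zero-argument constructor; fall through to the error
        }
        throw new IllegalStateException("Could not find a suitable constructor in " + type.getName());
    }

    static class Good {
        @Inject Good(String name) {}
    }

    static class WrongAnnotation {
        @OtherInject WrongAnnotation(String name) {}
    }

    static class NoArg {
        NoArg() {}
    }

    public static void main(String[] args) {
        System.out.println(pickConstructor(Good.class).getParameterCount());  // 1
        System.out.println(pickConstructor(NoArg.class).getParameterCount()); // 0
        try {
            pickConstructor(WrongAnnotation.class);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

`WrongAnnotation` fails exactly like the factory in the log: it has a constructor that looks annotated, but not with the annotation the rule checks for, and it has no zero-argument fallback.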
My tokenizer is built for v2.3.4, and the TokenizerFactory looks like this:
public class UrlTokenizerFactory extends AbstractTokenizerFactory {

    @Inject
    public UrlTokenizerFactory(Index index, IndexSettingsService indexSettings, @Assisted String name, @Assisted Settings settings) {
        super(index, indexSettings.getSettings(), name, settings);
    }

    @Override
    public Tokenizer create() {
        return new UrlTokenizer();
    }
}
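In this design the factory is built once by the injector, while `create()` hands out a fresh tokenizer on every call. Lucene tokenizers keep per-stream state and are not meant to be shared between threads, which is why `create()` returns `new UrlTokenizer()` rather than a cached instance. The JDK-only sketch below mimics that shape. `TokenizerFactorySketch`, `SimpleUrlTokenizer` and the split-on-non-alphanumeric rule are all hypothetical; the question does not show how the author's `UrlTokenizer` actually splits URLs.

```java
import java.util.ArrayList;
import java.util.List;

public class FactorySketch {

    // Hypothetical stand-in for a stateful tokenizer: it keeps a cursor
    // into the current input, so one instance must not be shared.
    static class SimpleUrlTokenizer {
        private String input = "";
        private int pos = 0;

        void setInput(String text) {
            this.input = text;
            this.pos = 0;
        }

        /** Returns the next run of letters/digits, or null when the input is exhausted. */
        String next() {
            // skip separators such as ':', '/', '.', '?', '='
            while (pos < input.length() && !Character.isLetterOrDigit(input.charAt(pos))) {
                pos++;
            }
            if (pos >= input.length()) {
                return null;
            }
            int start = pos;
            while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) {
                pos++;
            }
            return input.substring(start, pos);
        }
    }

    // Mirrors the create() contract of a tokenizer factory.
    interface TokenizerFactorySketch {
        SimpleUrlTokenizer create();
    }

    static class SimpleTokenizerFactory implements TokenizerFactorySketch {
        @Override
        public SimpleUrlTokenizer create() {
            return new SimpleUrlTokenizer(); // fresh instance per call
        }
    }

    static List<String> tokenize(TokenizerFactorySketch factory, String text) {
        SimpleUrlTokenizer tokenizer = factory.create();
        tokenizer.setInput(text);
        List<String> tokens = new ArrayList<>();
        for (String t = tokenizer.next(); t != null; t = tokenizer.next()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        SimpleTokenizerFactory factory = new SimpleTokenizerFactory();
        System.out.println(tokenize(factory, "http://example.com/a/b?q=1"));
        // prints [http, example, com, a, b, q, 1]
    }
}
```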
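I really have no idea what I'm doing wrong. Am I deploying it incorrectly? According to the logs, it does seem to be using my class...
I've only deployed it to one of my ES nodes (a 4-node cluster). The /_cat/plugins?v
endpoint gives this: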
name component version type url
Samuel Silke urltokenizer 2.3.4.0 j
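Elasticsearch's plugin documentation generally states that a plugin must be installed on every node in the cluster, with each node restarted afterwards; here it is on one node out of four. The JDK-only sketch below checks `_cat/plugins?v` text for nodes that do not list a given component. It assumes the columns are padded so each value starts under its header (as `_cat` output normally is) and that the full node list is obtained separately, for instance from `_cat/nodes?h=name`. `PluginCoverage`, `nodesMissing` and the node names other than Samuel Silke are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PluginCoverage {

    /**
     * Returns the nodes in allNodes that do not list the given component in
     * `_cat/plugins?v` output. Assumes the first line is the header and each
     * value starts at the same column offset as its header.
     */
    static List<String> nodesMissing(String catOutput, List<String> allNodes, String component) {
        String[] lines = catOutput.split("\n");
        String header = lines[0];
        int componentStart = header.indexOf("component");
        int versionStart = header.indexOf("version");
        if (componentStart < 0 || versionStart < componentStart) {
            throw new IllegalArgumentException("unexpected header: " + header);
        }
        Set<String> nodesWithPlugin = new LinkedHashSet<>();
        for (int i = 1; i < lines.length; i++) {
            String row = lines[i];
            if (row.length() <= componentStart) {
                continue; // blank or truncated row
            }
            // Node names may contain spaces, so slice by column offset instead of splitting.
            String node = row.substring(0, componentStart).trim();
            String comp = row.substring(componentStart, Math.min(versionStart, row.length())).trim();
            if (comp.equals(component)) {
                nodesWithPlugin.add(node);
            }
        }
        List<String> missing = new ArrayList<>();
        for (String n : allNodes) {
            if (!nodesWithPlugin.contains(n)) {
                missing.add(n);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String cat =
            "name         component    version  type url\n" +
            "Samuel Silke urltokenizer 2.3.4.0  j\n";
        // Hypothetical node names apart from Samuel Silke.
        List<String> nodes = Arrays.asList("Samuel Silke", "node-2", "node-3", "node-4");
        System.out.println(nodesMissing(cat, nodes, "urltokenizer"));
        // prints [node-2, node-3, node-4]
    }
}
```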
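Since there's little or no documentation on this process, I got this far by copying the structure other people have used in their plugins.
The error I'm seeing doesn't make sense. For this version of Elastic, my TokenizerFactory looks just like everyone else's. What am I doing wrong, or what am I not doing that I should be?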
Turns out I was missing an Environment
constructor parameter. It should look like this:
public UrlTokenizerFactory(Index index, IndexSettingsService indexSettings, Environment env, @Assisted String name, @Assisted Settings settings){
...