StormCrawler / Elasticsearch: Apache Tika for parsing PDFs. Getting an error when running the topology
I get the following error when I run the es-crawler.flux topology. I'm not sure what I'm doing wrong. I don't think there is a YAML error?
**I added the Apache Tika module as a dependency in the pom.xml file.**
<!-- Add tika dependency -->
<dependency>
<groupId>com.digitalpebble.stormcrawler</groupId>
<artifactId>storm-crawler-tika</artifactId>
<version>${stormcrawler.version}</version>
</dependency>
I updated the es-crawler.flux file as referenced here: https://gist.github.com/jnioche/3f09c2e3f7da845181b733253bc806f1
I ran the topology.
**Got the following results.**
```
Exception in thread "main" Cannot create property=streams for JavaBean=org.apache.storm.flux.model.TopologyDef@65e98b1c
 in 'string', line 1, column 1:
    name: "devcrawler"
    ^
Cannot create property=grouping for JavaBean=org.apache.storm.flux.model.StreamDef@1ff4931d
 in 'string', line 94, column 5:
      - from: "shunt"
        ^
 in 'string', line 97, column 7:
          type: LOCAL_OR_SHUFFLE
          ^
Unable to find property 'streamid' on class: org.apache.storm.flux.model.GroupingDef
 in 'string', line 98, column 17:
          streamid: "tika"
Unable to find property 'streamid' on class: org.apache.storm.flux.model.GroupingDef
 in 'string', line 98, column 17:
          streamid: "tika"
                    ^
 in 'string', line 97, column 7:
          type: LOCAL_OR_SHUFFLE
          ^
 in 'string', line 63, column 3:
    - from: "spout"
      ^
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:292)
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:171)
    at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:331)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObjectNoCheck(BaseConstructor.java:230)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:219)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:173)
    at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:157)
    at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:472)
    at org.yaml.snakeyaml.Yaml.load(Yaml.java:398)
    at org.apache.storm.flux.parser.FluxParser.loadYaml(FluxParser.java:168)
    at org.apache.storm.flux.parser.FluxParser.parseInputStream(FluxParser.java:114)
    at org.apache.storm.flux.parser.FluxParser.parseFile(FluxParser.java:68)
    at org.apache.storm.flux.Flux.runCli(Flux.java:167)
    at org.apache.storm.flux.Flux.main(Flux.java:119)
Caused by: Cannot create property=grouping for JavaBean=org.apache.storm.flux.model.StreamDef@1ff4931d
 in 'string', line 94, column 5:
      - from: "shunt"
        ^
Cannot create property=streamid for JavaBean=org.apache.storm.flux.model.GroupingDef@710f4dc7
 in 'string', line 97, column 7:
          type: LOCAL_OR_SHUFFLE
          ^
Unable to find property 'streamid' on class: org.apache.storm.flux.model.GroupingDef
 in 'string', line 98, column 17:
          streamid: "tika"
 in 'string', line 97, column 7:
          type: LOCAL_OR_SHUFFLE
```
I copied the Flux file from the Gist above and it ran without problems. Maybe the lines in your file are not aligned correctly (e.g. a missing space)?
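
It may also be worth checking the spelling of the key itself: the error says that Flux cannot find a property named 'streamid' (all lowercase) on GroupingDef, and as far as I know that class exposes the property as `streamId`. For comparison, here is a rough sketch of what the stream definition around line 94 could look like. The `to: "tika"` target is an assumption on my part; use whatever bolt id your file declares for the Tika parser.

```yaml
streams:
  # Stream from the shunt bolt to the Tika parser bolt.
  # The "to" value is hypothetical; it must match a bolt id defined in your file.
  - from: "shunt"
    to: "tika"
    grouping:
      type: LOCAL_OR_SHUFFLE
      # camelCase key; a lowercase "streamid" would not map to the GroupingDef property
      streamId: "tika"
```

Because YAML is indentation-sensitive, `type` and `streamId` need to sit at the same level under `grouping`.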