HDFS IO error (Hadoop 2.8) with Flume
I get the following error when trying to stream data into Hadoop through Flume.
In flume/lib I created links pointing to the .jar files under hadoop/share/hadoop/ (a sketch of this setup follows below).
I have double-checked the URLs and believe they are all correct. Posting here in hopes of more eyes and feedback.
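For reference, the links were created along these lines (a sketch only; the install paths below are assumptions, not taken from the original post):

$ ln -s /opt/hadoop/share/hadoop/common/*.jar /opt/flume/lib/
$ ln -s /opt/hadoop/share/hadoop/hdfs/*.jar /opt/flume/lib/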
2017-07-20 10:53:18,959 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:455)] HDFS IO error
java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2798)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2809)
at org.apache.hadoop.fs.FileSystem.access0(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.flume.sink.hdfs.BucketWriter.call(BucketWriter.java:243)
at org.apache.flume.sink.hdfs.BucketWriter.call(BucketWriter.java:235)
at org.apache.flume.sink.hdfs.BucketWriter.run(BucketWriter.java:679)
at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
at org.apache.flume.sink.hdfs.BucketWriter.call(BucketWriter.java:676)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
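"No FileSystem for scheme: hdfs" generally means the class that backs the hdfs:// scheme, org.apache.hadoop.hdfs.DistributedFileSystem, is not on Flume's classpath. One quick check (a sketch; the path is a placeholder matching the copy command in the answers below):

$ ls my_flume_path/lib/ | grep hadoop-hdfs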
Here is the Flume sink configuration:
agent1.sinks.PurePathSink.type = hdfs
agent1.sinks.PurePathSink.hdfs.path = hdfs://127.0.0.1:9000/User/bts/pp
agent1.sinks.PurePathSink.hdfs.fileType = DataStream
agent1.sinks.PurePathSink.hdfs.filePrefix = export
agent1.sinks.PurePathSink.hdfs.fileSuffix = .txt
agent1.sinks.PurePathSink.hdfs.rollInterval = 120
agent1.sinks.PurePathSink.hdfs.rollSize = 131072
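(For context: hdfs.rollInterval is in seconds and hdfs.rollSize is in bytes, so this sink rolls files every 120 s or every 128 KB, whichever comes first.) As a sanity check, the target directory can be created up front; a sketch, assuming the NameNode address from the config:

$ hdfs dfs -mkdir -p hdfs://127.0.0.1:9000/User/bts/pp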
core-site.xml - Hadoop 2.8
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home1/tmp</value>
<description>A base for other temporary directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
<property>
<name>fs.file.impl</name>
<value>org.apache.hadoop.fs.LocalFileSystem</value>
<description>The FileSystem for file: uris.</description>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
<description>The FileSystem for hdfs: uris.</description>
</property>
</configuration>
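Side note: on Hadoop 2.x, FileSystem implementations are normally discovered via ServiceLoader from META-INF/services/org.apache.hadoop.fs.FileSystem inside the Hadoop jars, so the explicit fs.hdfs.impl / fs.file.impl entries above are a workaround for a jar that is missing from, or shaded out of, the classpath. (fs.default.name is also deprecated in Hadoop 2.x in favor of fs.defaultFS, though both still work.) One way to inspect the service file; a sketch, where the jar name and version are assumptions and in some 2.8 layouts the client classes live in hadoop-hdfs-client-*.jar instead:

$ unzip -p my_hadoop_path/share/hadoop/hdfs/hadoop-hdfs-2.8.0.jar META-INF/services/org.apache.hadoop.fs.FileSystem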
Looking at your Flume sink, it appears you are not running this on a cluster but on localhost.
Check whether the HDFS path is accessible:
agent1.sinks.PurePathSink.hdfs.path = hdfs://127.0.0.1:9000/User/bts/pp
The port number is usually 8020 (if you are using the Cloudera distribution). A quick accessibility check is sketched below.
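For example, run this from a node with the Hadoop client configured:

$ hdfs dfs -ls hdfs://127.0.0.1:9000/User/bts/pp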
Also check the following link for a reproduction of the error and its solution:
[Cloudera - Solved: FLUME + IO error issue]
In my case, I found that declaring the path explicitly solved the problem. It has to do with which jars get picked up.
Thanks @V.Bravo for the reply. I am not using a distribution; I stood up my own cluster
- Moses
In my case, copying the hdfs jar files from hadoop/hdfs to flume/lib solved the problem.
$ cp my_hadoop_path/share/hadoop/hdfs/*.jar my_flume_path/lib/
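After copying the jars, restart the Flume agent so they are picked up. For example (the agent name matches the config above, but the conf directory and file name are assumptions):

$ flume-ng agent --conf conf --conf-file flume.conf --name agent1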