HDFS IO error (Hadoop 2.8) with Flume

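When I try to ingest streaming data into Hadoop through Flume, I get the error below.

I created links in flume/lib that point to the .jar files in hadoop/share/hadoop/.

I double-checked the URLs and I believe they are all correct. I am posting here to get more eyes on this and some feedback.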

      2017-07-20 10:53:18,959 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:455)] HDFS IO error
      java.io.IOException: No FileSystem for scheme: hdfs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2798)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2809)
        at org.apache.hadoop.fs.FileSystem.access0(FileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
        at org.apache.flume.sink.hdfs.BucketWriter.call(BucketWriter.java:243)
        at org.apache.flume.sink.hdfs.BucketWriter.call(BucketWriter.java:235)
        at org.apache.flume.sink.hdfs.BucketWriter.run(BucketWriter.java:679)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter.call(BucketWriter.java:676)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)

Here is the Flume sink configuration:

agent1.sinks.PurePathSink.type = hdfs
agent1.sinks.PurePathSink.hdfs.path = hdfs://127.0.0.1:9000/User/bts/pp 
agent1.sinks.PurePathSink.hdfs.fileType = DataStream
agent1.sinks.PurePathSink.hdfs.filePrefix = export
agent1.sinks.PurePathSink.hdfs.fileSuffix = .txt
agent1.sinks.PurePathSink.hdfs.rollInterval = 120
agent1.sinks.PurePathSink.hdfs.rollSize = 131072

core-site.xml - Hadoop 2.8

<configuration>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home1/tmp</value>
        <description>A base for other temporary directories</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://127.0.0.1:9000</value>
    </property>

    <property>
        <name>fs.file.impl</name>
        <value>org.apache.hadoop.fs.LocalFileSystem</value>
        <description>The FileSystem for file: uris.</description>
    </property>

    <property>
        <name>fs.hdfs.impl</name>
        <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
        <description>The FileSystem for hdfs: uris.</description>
    </property>

</configuration>

Looking at your Flume sink, it seems you are running this on localhost rather than on a cluster.

Check whether the HDFS path is accessible:

agent1.sinks.PurePathSink.hdfs.path = hdfs://127.0.0.1:9000/User/bts/pp

The port number is usually 8020 (if you are using the Cloudera distribution).
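One way to check the path is `hdfs dfs -ls hdfs://127.0.0.1:9000/User/bts/pp` from the machine that runs the Flume agent. If the hdfs client is not handy there, a rough sketch like the one below can at least tell you whether the NameNode port in the URI is open. The function names are made up for illustration, and it assumes a netcat build that supports `-z` and `-w`; an open port only shows that something is listening, not that the path exists.

```shell
# Print "host port" for an hdfs:// URI.
# Returns 1 if the scheme is not hdfs or the URI has no explicit port.
hdfs_uri_hostport() {
    uri=$1
    case $uri in
        hdfs://*) ;;
        *) return 1 ;;
    esac
    authority=${uri#hdfs://}      # drop the scheme
    authority=${authority%%/*}    # drop the path, keep host:port
    case $authority in
        *:*) ;;
        *) return 1 ;;
    esac
    printf '%s %s\n' "${authority%%:*}" "${authority##*:}"
}

# Probe the port from the URI (assumption: nc with -z and -w is installed).
namenode_reachable() {
    hostport=$(hdfs_uri_hostport "$1") || return 1
    nc -z -w 3 "${hostport% *}" "${hostport#* }"
}

# Example (not run here):
#   namenode_reachable hdfs://127.0.0.1:9000/User/bts/pp && echo "port is open"
```

If the probe fails, compare the port in hdfs.path with the one in fs.default.name in core-site.xml before looking anywhere else.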

Please also check the following link, which reproduces the error and gives a solution: [Cloudera solved: Flume + IO error issue]

https://community.cloudera.com/t5/Storage-Random-Access-HDFS/Flume-HDFS-IO-error-ConnectException/td-p/28157

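In my case, I found that declaring the path explicitly solved the problem. It has to do with which jars get picked up.

Thanks @V.Bravo for the reply. I am not using a distribution; I set up my own cluster.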

  • Moses

In my case, copying the hdfs jar files from hadoop/hdfs to flume/lib solved the problem.

$ cp my_hadoop_path/share/hadoop/hdfs/*.jar my_flume_path/lib/
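If you link the jars instead of copying them, it may be worth confirming that they actually resolve inside flume/lib. Keep in mind that a Java wildcard classpath entry does not search subdirectories, so a link to a whole directory would not expose the jars under it. The sketch below uses hypothetical helper names, and matching on the `hadoop-hdfs` file name prefix is only a heuristic: it does not prove that org.apache.hadoop.hdfs.DistributedFileSystem is inside any of the jars it finds.

```shell
# List jars in a lib directory whose names start with "hadoop-hdfs".
# Dangling symlinks are skipped because [ -e ] follows the link.
list_hdfs_jars() {
    for jar in "$1"/hadoop-hdfs*.jar; do
        if [ -e "$jar" ]; then
            printf '%s\n' "$jar"
        fi
    done
}

# List symlinks directly inside the directory whose targets do not exist.
list_broken_links() {
    for entry in "$1"/*; do
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
}

# Example (not run here):
#   list_hdfs_jars my_flume_path/lib
#   list_broken_links my_flume_path/lib
```

An empty result from the first function, or any output from the second, would be consistent with the "No FileSystem for scheme: hdfs" error above. Restart the Flume agent after changing the contents of lib so the new classpath takes effect.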