What's wrong with LoadIncrementalHFiles?
I'm a newbie with Hadoop & HBase. I want to import a .csv file into HFiles.
I have a csv file "testcsv.csv" in HDFS:
ty,12,1
tes,13,1
tt,14,1
yu,15,1
ui,16,1
qq,17,1
I ran this command on the master node:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' -Dimporttsv.columns=HBASE_ROW_KEY,basic:G1,basic:G2, testTSV /user/hadoop/csvtest.csv
I verified the HBase table:
hbase(main):002:0> scan 'testTSV'
ROW COLUMN+CELL
qq column=basic:G1, timestamp=1435682234304, value=17
qq column=basic:G2, timestamp=1435682234304, value=1
tes column=basic:G1, timestamp=1435682234304, value=13
tes column=basic:G2, timestamp=1435682234304, value=1
tt column=basic:G1, timestamp=1435682234304, value=14
tt column=basic:G2, timestamp=1435682234304, value=1
ty column=basic:G1, timestamp=1435682234304, value=12
ty column=basic:G2, timestamp=1435682234304, value=1
ui column=basic:G1, timestamp=1435682234304, value=16
ui column=basic:G2, timestamp=1435682234304, value=1
yu column=basic:G1, timestamp=1435682234304, value=15
yu column=basic:G2, timestamp=1435682234304, value=1
6 row(s) in 1.6180 seconds
After that, I used the completebulkload method to load the data from the StoreFiles into the table, with this command:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hadoop/outputfile testTSV
...
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/app/lib
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:os.version=2.6.32-431.el6.x86_64
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:user.name=hadoop
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop/Hbase
2015-07-01 00:53:10,131 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=Datanode01:2181,Masternode01:2181 sessionTimeout=90000 watcher=hconnection-0x526b00740x0, quorum=Datanode01:2181,Masternode01:2181, baseZNode=/hbase
2015-07-01 00:53:10,300 INFO [main-SendThread(Datanode01:2181)] zookeeper.ClientCnxn: Opening socket connection to server Datanode01/192.168.23.152:2181. Will not attempt to authenticate using SASL (unknown error)
2015-07-01 00:53:10,333 INFO [main-SendThread(Datanode01:2181)] zookeeper.ClientCnxn: Socket connection established to Datanode01/192.168.23.152:2181, initiating session
2015-07-01 00:53:10,358 INFO [main-SendThread(Datanode01:2181)] zookeeper.ClientCnxn: Session establishment complete on server Datanode01/192.168.23.152:2181, sessionid = 0x14e35637b2c000d, negotiated timeout = 90000
2015-07-01 00:53:12,901 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x7d83bb5e connecting to ZooKeeper ensemble=Datanode01:2181,Masternode01:2181
2015-07-01 00:53:12,901 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=Datanode01:2181,Masternode01:2181 sessionTimeout=90000 watcher=hconnection-0x7d83bb5e0x0, quorum=Datanode01:2181,Masternode01:2181, baseZNode=/hbase
2015-07-01 00:53:12,905 INFO [main-SendThread(Datanode01:2181)] zookeeper.ClientCnxn: Opening socket connection to server Datanode01/192.168.23.152:2181. Will not attempt to authenticate using SASL (unknown error)
2015-07-01 00:53:12,906 INFO [main-SendThread(Datanode01:2181)] zookeeper.ClientCnxn: Socket connection established to Datanode01/192.168.23.152:2181, initiating session
2015-07-01 00:53:12,922 INFO [main-SendThread(Datanode01:2181)] zookeeper.ClientCnxn: Session establishment complete on server Datanode01/192.168.23.152:2181, sessionid = 0x14e35637b2c000e, negotiated timeout = 90000
2015-07-01 00:53:13,036 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x14e35637b2c000e
2015-07-01 00:53:13,054 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-07-01 00:53:13,054 INFO [main] zookeeper.ZooKeeper: Session: 0x14e35637b2c000e closed
Exception in thread "main" java.io.FileNotFoundException: Bulkload dir /user/hadoop/outputfile not found
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.visitBulkHFiles(LoadIncrementalHFiles.java:176)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.discoverLoadQueue(LoadIncrementalHFiles.java:260)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:314)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:960)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:967)
What am I missing?
The message below states it clearly: the directory is missing.
FileNotFoundException: Bulkload dir /user/hadoop/outputfile not found
There is probably no directory named outputfile. That is the path where the HFiles should live, and it should have been specified in your first command with ImportTsv. Please verify the directory.
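By default, ImportTsv writes rows straight into the table with Puts and produces no HFiles at all, which is why your scan succeeded even though /user/hadoop/outputfile was never created. To generate HFiles that LoadIncrementalHFiles can pick up, the first command would need the importtsv.bulk.output option. A sketch of the corrected command (the output path is the one you later pass to LoadIncrementalHFiles):

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' \
    -Dimporttsv.columns=HBASE_ROW_KEY,basic:G1,basic:G2 \
    -Dimporttsv.bulk.output=/user/hadoop/outputfile \
    testTSV /user/hadoop/csvtest.csv

With importtsv.bulk.output set, ImportTsv only writes HFiles to that directory instead of loading the table, so the completebulkload step afterwards is what actually populates the table.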
When you run the ImportTsv command, it takes the csv file as input and loads it into HBase, but LoadIncrementalHFiles looks for HFiles that already exist in HDFS, and I believe that is not the case here.
Please verify that /user/hadoop/outputfile in the HDFS filesystem actually contains your HFiles.
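A quick way to check, assuming a standard Hadoop client is available on the node:

hdfs dfs -ls /user/hadoop/outputfile
hdfs dfs -ls /user/hadoop/outputfile/basic

When ImportTsv has written bulk output, the HFiles sit under one subdirectory per column family (basic, in this table), so the second listing should show the actual HFiles if the generation step worked.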
This looks like a permissions problem when using the MapReduce tools.
If you add the parameter -Dfs.permissions.umask-mode=000 when running the MapReduce command, for example with:
org.apache.hadoop.hbase.mapreduce.ImportTsv
or
org.apache.phoenix.mapreduce.CsvBulkLoadTool
it enables writing the temporary files and the job will finish successfully.
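As a sketch, the umask flag is passed like any other -D option; whether it is actually needed depends on your cluster's HDFS permission settings:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dfs.permissions.umask-mode=000 '-Dimporttsv.separator=,' -Dimporttsv.columns=HBASE_ROW_KEY,basic:G1,basic:G2 -Dimporttsv.bulk.output=/user/hadoop/outputfile testTSV /user/hadoop/csvtest.csv

Setting the umask to 000 means no permission bits are masked off the files the job creates, so other processes (such as the region servers performing the bulk load) can read and move them.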