无法使用 importtsv 将数据从 Hdfs 导入到 Hbase
Unable to import data from Hdfs to Hbase using importtsv
我将制表符分隔的文件移动到 hdfs,现在正在尝试将其移动到 hbase。
下面是我的 importtsv 命令
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:ok,cf:ek,cf:rk,cf:rsk,cf:pdk,cf:pmk,cf:omk,cf:sok,cf:sdk,cf:cdk,cf:q,cf:uc,cf:up,cf:usp,cf:gm,cf:st,cf:gp -Dimporttsv.skip.bad.lines=false 'sales_fact' hdfs://localhost:54310/my/file.txt
它正在尝试从不存在的位置读取 jar。
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/home/elijah/Downloads/hbase/lib/htrace-core-3.1.0-incubating.jar
at org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1072)
at org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.run(ImportTsv.java:738)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.main(ImportTsv.java:747)
我不明白为什么它将 hdfs 和本地目录路径混为一谈。
hdfs://localhost:54310/home/elijah/Downloads/hbase/lib/htrace-core-3.1.0-incubating.jar
运行 导入作业的用户可以完全访问本地目录中的 hbase 库。
我可以看到缺少 -libjars
选项....您可以使用 -libjars
选项,下面是示例用法:
hadoop jar \
hbase-server-0.98.6-cdh5.2.1.jar \
importtsv \
-libjars /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/high-scale-lib-1.1.1.jar \
-Dimporttsv.separator=, -Dimporttsv.bulk.output=output \
-Dimporttsv.columns=HBASE_ROW_KEY,f:count wordcount \
word_count.csv
你也可以这样做:-
# export HADOOP_CLASSPATH=`./hbase classpath`
其中一个缺少的 jar,即 hbase/lib/htrace-core-3.1.0-incubating.jar 将是 hbase 类路径。在这种情况下应该有效。
我将制表符分隔的文件移动到 hdfs,现在正在尝试将其移动到 hbase。 下面是我的 importtsv 命令
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:ok,cf:ek,cf:rk,cf:rsk,cf:pdk,cf:pmk,cf:omk,cf:sok,cf:sdk,cf:cdk,cf:q,cf:uc,cf:up,cf:usp,cf:gm,cf:st,cf:gp -Dimporttsv.skip.bad.lines=false 'sales_fact' hdfs://localhost:54310/my/file.txt
它正在尝试从不存在的位置读取 jar。
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/home/elijah/Downloads/hbase/lib/htrace-core-3.1.0-incubating.jar
at org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1072)
at org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.run(ImportTsv.java:738)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.main(ImportTsv.java:747)
我不明白为什么它将 hdfs 和本地目录路径混为一谈。 hdfs://localhost:54310/home/elijah/Downloads/hbase/lib/htrace-core-3.1.0-incubating.jar
运行 导入作业的用户可以完全访问本地目录中的 hbase 库。
我可以看到缺少 -libjars
选项....您可以使用 -libjars
选项,下面是示例用法:
hadoop jar \
hbase-server-0.98.6-cdh5.2.1.jar \
importtsv \
-libjars /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/high-scale-lib-1.1.1.jar \
-Dimporttsv.separator=, -Dimporttsv.bulk.output=output \
-Dimporttsv.columns=HBASE_ROW_KEY,f:count wordcount \
word_count.csv
你也可以这样做:-
# export HADOOP_CLASSPATH=`./hbase classpath`