How to load data from a CSV into an external table in Impala
I am following this solution for loading an external table into Impala, because I get the same error when I load the data by referencing the file.
So, if I run:
[quickstart.cloudera:21000] > create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
> fields terminated by ','
> STORED as TEXTFILE
> location '/user/cloudera/rdpdata/rpd_data_all.csv' ;
I get:
Query: create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
fields terminated by ','
STORED as TEXTFILE
location '/user/cloudera/rdpdata/rpd_data_all.csv'
ERROR: ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore:
CAUSED BY: MetaException: hdfs://quickstart.cloudera:8020/user/cloudera/rdpdata/rpd_data_all.csv is not a directory or unable to create one
If I run the following instead, nothing is imported:
[quickstart.cloudera:21000] > create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
> fields terminated by ','
> location '/user/cloudera/rdpdata' ;
Query: create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
fields terminated by ','
location '/user/cloudera/rdpdata'
Fetched 0 row(s) in 1.01s
Here are the contents of the folder:
[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/rdpdata
Found 1 items
-rwxrwxrwx 1 cloudera cloudera 75115191 2020-09-02 19:36 /user/cloudera/rdpdata/rpd_data_all.csv
And the contents of the file:
[cloudera@quickstart ~]$ hadoop fs -cat /user/cloudera/rdpdata/rpd_data_all.csv
1,EMSP,RP,RC, 03/21/2013,095454,000000,000000,101659,CANC
And a screenshot of the Cloudera QuickStart VM.
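The LOCATION option in Impala's CREATE TABLE statement specifies the hdfs_path, that is, the HDFS directory where the data files are stored. Try giving the directory location instead of the file name; that way the table uses the data already in that directory.
For reference: https://impala.apache.org/docs/build/html/topics/impala_tables.html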
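As a minimal sketch of that suggestion, reusing the table definition and HDFS path from the question: the statement below points LOCATION at the directory, then queries the table. It rests on one assumption worth checking in your environment: the "Fetched 0 row(s)" line printed after CREATE TABLE is just the result of the DDL statement, not a sign that the file is empty or unread. An external table does not import anything; Impala reads the files in the directory when you query the table.

```sql
-- LOCATION names the HDFS directory that holds the CSV, not the file itself.
CREATE EXTERNAL TABLE Police2 (
  Priority       STRING,
  Call_Type      STRING,
  Jurisdiction   STRING,
  Dispatch_Area  STRING,
  Received_Date  STRING,
  Received_Time  INT,
  Dispatch_Time  INT,
  Arrival_Time   INT,
  Cleared_Time   INT,
  Disposition    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/cloudera/rdpdata';

-- The CREATE statement returns no rows; query the table to read the file.
SELECT COUNT(*) FROM Police2;
SELECT * FROM Police2 LIMIT 5;

-- If files are added to the directory later, make Impala pick them up.
REFRESH Police2;
```

If the SELECT statements return rows, the table is working as intended. Every file in /user/cloudera/rdpdata is treated as table data, so the directory should contain only CSV files with this layout.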