Export Hive query output to HDFS in SequenceFile format
I am trying to run a Hive query and export its output to HDFS in SEQUENCE FILE format.
beeline> show create table test_table;
+--------------------------------------------------------------------------------------+
| createtab_stmt |
+--------------------------------------------------------------------------------------+
| CREATE TABLE `test_table`( |
| `XXXXXXXXXXXXXX` bigint, |
| `XXXXXXXXXXXxx` int, |
| `XXXXXXXXX` int, |
| `XXXXXX` int) |
| PARTITIONED BY ( |
| `XXXXXXXX` string, |
| `XXXX` string, |
| `XXXXXXXX` string) |
| ROW FORMAT DELIMITED |
| FIELDS TERMINATED BY '\u00001' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.SequenceFileInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat' |
| LOCATION |
| 'hdfs://localhost:8020/user/hive/warehouse/local_hive_report.db/test_table' |
| TBLPROPERTIES ( |
| 'transient_lastDdlTime'='1437569941') |
+--------------------------------------------------------------------------------------+
Here is the query I tried to export the data with:
beeline> INSERT OVERWRITE DIRECTORY '/user/nages/load/date'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE
SELECT * FROM test_table WHERE column=value;
Here is the error:
Error: Error while compiling statement: FAILED: ParseException line 1:61
cannot recognize input near 'ROW' 'FORMAT' 'DELIMITED' in statement (state=42000,code=40000)
Am I missing something?
Software versions:
Cloudera Hadoop CDH 5.3.3,
Apache Hive version 0.13.1.
Edit:
Updated my temporary workaround below.
This is because Hive queries use ^A as the delimiter by default.
You can try the same operation by exporting to the local file system instead; that should be supported.
beeline> INSERT OVERWRITE LOCAL DIRECTORY '/user/~local directoryname'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE
SELECT * FROM test_table WHERE column=value;
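If the output must stay on HDFS and the default delimiter is acceptable, a plain directory export without the ROW FORMAT / STORED AS clauses should also compile on this version (a sketch only; it writes ^A-delimited text files rather than sequence files):
beeline> INSERT OVERWRITE DIRECTORY '/user/nages/load/date'
SELECT * FROM test_table WHERE column=value;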
As a temporary fix, I created a Hive table stored as SequenceFile and inserted the selected records into it:
CREATE TABLE temp_table
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE
AS
SELECT * FROM test_table WHERE column=value;
This will create the sequence files in HDFS under the following location:
/<HIVE_DATABASE_ROOT>/temp_table.db/
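If the files need to end up under a specific HDFS path rather than the warehouse directory, a variation on the same idea is to make the staging table EXTERNAL with an explicit LOCATION (a sketch only; the column names below are placeholders, since the real schema is masked above, and the target path is taken from the failing query):
CREATE EXTERNAL TABLE temp_table_ext (
  col1 bigint, col2 int, col3 int, col4 int,
  part1 string, part2 string, part3 string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE
LOCATION '/user/nages/load/date';
INSERT OVERWRITE TABLE temp_table_ext
SELECT * FROM test_table WHERE column=value;
With SELECT * on a partitioned table the four data columns come first and the three partition columns last, which is why the placeholder table lists them in that order.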
This script works for me:
CREATE EXTERNAL TABLE dept_seq (department_id int, department_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'hdfs:///user/cloudera/departments_seq';
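A hypothetical usage example for the table above (departments is assumed to be an existing source table with the same two columns; it is not part of the original answer):
INSERT OVERWRITE TABLE dept_seq
SELECT department_id, department_name FROM departments;
SELECT * FROM dept_seq LIMIT 10;
The resulting sequence files are then available under the external location hdfs:///user/cloudera/departments_seq.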