Spark SQL issue with columns specified
We are trying to replicate an Oracle database into Hive. We get the queries from Oracle and run them in Hive.
So we receive them in this format:
INSERT INTO schema.table(col1,col2) VALUES ('val','val');
Although this query works directly in Hive, when I use spark.sql I get the following error:
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'emp_id' expecting {'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 20)
== SQL ==
insert into ss.tab(emp_id,firstname,lastname) values ('1','demo','demo')
--------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)
at com.datastream.SparkReplicator.insertIntoHive(SparkReplicator.java:20)
at com.datastream.App.main(App.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This error occurs because Spark SQL (at least in the 2.x versions used here) does not support a column list in the INSERT statement, so exclude the column list from the INSERT statement.
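Since the Oracle dump arrives with column lists, one workaround is to strip the list before handing each statement to spark.sql. Below is a minimal pure-Python sketch, assuming simple single-row statements like the one above (the function name and regex are illustrative, not part of any Spark API):

```python
import re

def strip_column_list(insert_sql: str) -> str:
    """Remove the parenthesized column list from a simple
    'INSERT INTO table(col1, col2) VALUES (...)' statement."""
    # Match 'INSERT INTO <table>' followed by a '(col, ...)' list before VALUES
    pattern = re.compile(
        r"^(\s*insert\s+into\s+[\w.]+)\s*\([^)]*\)\s*(values\b)",
        re.IGNORECASE,
    )
    return pattern.sub(r"\1 \2", insert_sql)

sql = "insert into ss.tab(emp_id,firstname,lastname) values ('1','demo','demo')"
print(strip_column_list(sql))
# insert into ss.tab values ('1','demo','demo')
```

Note that this is only safe when the values already appear in the same order as the Hive table's columns.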
Below is my Hive table:
select * from UDB.emp_details_table;
+---------+-----------+-----------+-------------------+--+
| emp_id | emp_name | emp_dept | emp_joining_date |
+---------+-----------+-----------+-------------------+--+
| 1 | AAA | HR | 2018-12-06 |
| 1 | BBB | HR | 2017-10-26 |
| 2 | XXX | ADMIN | 2018-10-22 |
| 2 | YYY | ADMIN | 2015-10-19 |
| 2 | ZZZ | IT | 2018-05-14 |
| 3 | GGG | HR | 2018-06-30 |
+---------+-----------+-----------+-------------------+--+
Here I am inserting a record using Spark SQL via pyspark:
df = spark.sql("""insert into UDB.emp_details_table values ('6','VVV','IT','2018-12-18')""");
Below you can see that the given record has been inserted into my existing Hive table.
+---------+-----------+-----------+-------------------+--+
| emp_id | emp_name | emp_dept | emp_joining_date |
+---------+-----------+-----------+-------------------+--+
| 1 | AAA | HR | 2018-12-06 |
| 1 | BBB | HR | 2017-10-26 |
| 2 | XXX | ADMIN | 2018-10-22 |
| 2 | YYY | ADMIN | 2015-10-19 |
| 2 | ZZZ | IT | 2018-05-14 |
| 3 | GGG | HR | 2018-06-30 |
| 6 | VVV | IT | 2018-12-18 |
+---------+-----------+-----------+-------------------+--+
Change your Spark SQL query to:
spark.sql("""insert into ss.tab values ('1','demo','demo')""");
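One caveat with dropping the column list: if the Oracle statements list columns in a different order than the Hive table, the values end up in the wrong columns. A hedged pure-Python sketch (the helper name is my own; it assumes simple single-row statements with no commas inside the literals) that reorders the values to the table's column order:

```python
import re

def reorder_values(insert_sql, table_columns):
    """Rewrite 'INSERT INTO t(cols) VALUES (vals)' so the values follow
    table_columns order; columns not supplied become NULL.
    A sketch, not a full SQL parser: literals must not contain commas."""
    m = re.match(
        r"\s*insert\s+into\s+([\w.]+)\s*\(([^)]*)\)\s*values\s*\((.*)\)\s*;?\s*$",
        insert_sql,
        re.IGNORECASE | re.DOTALL,
    )
    if not m:
        return insert_sql  # no column list: nothing to rewrite
    table, cols, vals = m.groups()
    given = dict(zip((c.strip() for c in cols.split(",")),
                     (v.strip() for v in vals.split(","))))
    ordered = [given.get(c, "NULL") for c in table_columns]
    return "insert into {} values ({})".format(table, ", ".join(ordered))

sql = "insert into ss.tab(lastname,emp_id,firstname) values ('demo','1','x')"
print(reorder_values(sql, ["emp_id", "firstname", "lastname"]))
# insert into ss.tab values ('1', 'x', 'demo')
```

The rewritten statement can then be passed to spark.sql as usual.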
Note: I am using Spark 2.3; you will need to use HiveContext if you are using Spark 1.6.
Let me know if it works.