Why does Zeppelin fail with "mismatched input ';' expecting <EOF>" in %spark.sql paragraph?

Question

I have built a parquet file from a csv.

In Zeppelin, I created a SQL statement like this:

%spark.sql
DROP TABLE IF EXISTS df;
CREATE TABLE df (
    date_time STRING
  , site_name STRING
  , posa_continent STRING
  , user_location_country STRING
  , user_location_region STRING
  , user_location_city STRING
  , orig_destination_distance DOUBLE
  , user_id STRING
  , is_mobile STRING
  , is_package STRING
  , channel STRING
  , srch_ci STRING
  , srch_co STRING
  , srch_adults_cnt INT 
  , srch_children_cnt INT
  , srch_rm_cnt INT
  , srch_destination_id STRING
  , srch_destination_type_id STRING
  , is_booking STRING
  , cnt INT
  , hotel_continent STRING
  , hotel_country STRING
  , hotel_market STRING
  , hotel_cluster STRING)
USING parquet
OPTIONS (path "s3://hansprojekt/training_17000000pq")

As a result, I get an error:

mismatched input ';' expecting <EOF>(line 1, pos 23)
== SQL ==
DROP TABLE IF EXISTS df;
-----------------------^^^
CREATE TABLE df (
    date_time STRING
  , site_name STRING
  , posa_continent STRING
  , user_location_country STRING
  , user_location_region STRING
  , user_location_city STRING
  , orig_destination_distance DOUBLE
  , user_id STRING
  , is_mobile STRING
  , is_package STRING
  , channel STRING
  , srch_ci STRING
  , srch_co STRING
  , srch_adults_cnt INT 
  , srch_children_cnt INT
  , srch_rm_cnt INT
  , srch_destination_id STRING
  , srch_destination_type_id STRING
  , is_booking STRING
  , cnt INT
  , hotel_continent STRING
  , hotel_country STRING
  , hotel_market STRING
  , hotel_cluster STRING)
USING parquet
OPTIONS (path "s3://hansprojekt/training_17000000pq")
set zeppelin.spark.sql.stacktrace = true to see full stacktrace

I do not understand the problem. The csv is separated with ','.

Can anyone help me?

Answer 1

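A %spark.sql paragraph (aka code section) in Zeppelin accepts a single SQL statement. The parser therefore rejects the ';' that follows the first statement, which is the character the error marks at line 1, pos 23.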

So put this line in one paragraph:

DROP TABLE IF EXISTS df

and this statement in another %spark.sql paragraph:

CREATE TABLE df (
    date_time STRING
  , site_name STRING
  , posa_continent STRING
  , user_location_country STRING
  , user_location_region STRING
  , user_location_city STRING
  , orig_destination_distance DOUBLE
  , user_id STRING
  , is_mobile STRING
  , is_package STRING
  , channel STRING
  , srch_ci STRING
  , srch_co STRING
  , srch_adults_cnt INT 
  , srch_children_cnt INT
  , srch_rm_cnt INT
  , srch_destination_id STRING
  , srch_destination_type_id STRING
  , is_booking STRING
  , cnt INT
  , hotel_continent STRING
  , hotel_country STRING
  , hotel_market STRING
  , hotel_cluster STRING)
USING parquet
OPTIONS (path "s3://hansprojekt/training_17000000pq")

%spark.sql provides a SQL environment using Spark SQL (via SparkSQLInterpreter).

If I'm not mistaken, SparkSQLInterpreter simply executes SQLContext.sql when asked for a result:

  // method signature of sqlc.sql() is changed
  // from  def sql(sqlText: String): SchemaRDD (1.2 and prior)
  // to    def sql(sqlText: String): DataFrame (1.3 and later).
  // Therefore need to use reflection to keep binary compatibility for all spark versions.
  Method sqlMethod = sqlc.getClass().getMethod("sql", String.class);
  rdd = sqlMethod.invoke(sqlc, st);

That points to SQLContext.sql as the "execution environment":

sql(sqlText: String): DataFrame Executes a SQL query using Spark, returning the result as a DataFrame.

And sql expects a single SQL statement.
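
If splitting the statements across several paragraphs is inconvenient, one possible workaround is to run them from a %spark.pyspark paragraph and call sql once per statement. The following is only a minimal sketch under a few assumptions: split_sql_statements and run_script are hypothetical helpers (they are not part of Zeppelin or Spark), the splitter is naive (it respects quoted strings but not SQL comments or escaped quotes), and a `spark` session (or `sqlContext` on older versions) is assumed to be available in the paragraph.

```python
from typing import Callable, List


def split_sql_statements(script: str) -> List[str]:
    """Split a SQL script on ';' characters that are outside quotes.

    Hypothetical helper: does not handle SQL comments or escaped quotes.
    """
    statements = []
    current = []
    quote = None  # the quote character we are currently inside, if any
    for ch in script:
        if quote is not None:
            current.append(ch)
            if ch == quote:
                quote = None  # closing quote reached
        elif ch in ("'", '"', "`"):
            quote = ch
            current.append(ch)
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def run_script(sql_runner: Callable[[str], object], script: str) -> list:
    """Call sql_runner once per statement, in order, and collect the results."""
    return [sql_runner(stmt) for stmt in split_sql_statements(script)]
```

In a PySpark paragraph this would be called as run_script(spark.sql, script), so each call to sql receives exactly one statement with no trailing ';'. Passing the runner as a parameter keeps the sketch usable without a cluster.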

apache-spark

parquet

apache-spark-sql

apache-zeppelin