Why does Zeppelin fail with "mismatched input ';' expecting <EOF>" in %spark.sql paragraph?
I have built a parquet file from a csv.
In Zeppelin I created a SQL statement like this:
%spark.sql
DROP TABLE IF EXISTS df;
CREATE TABLE df (
date_time STRING
, site_name STRING
, posa_continent STRING
, user_location_country STRING
, user_location_region STRING
, user_location_city STRING
, orig_destination_distance DOUBLE
, user_id STRING
, is_mobile STRING
, is_package STRING
, channel STRING
, srch_ci STRING
, srch_co STRING
, srch_adults_cnt INT
, srch_children_cnt INT
, srch_rm_cnt INT
, srch_destination_id STRING
, srch_destination_type_id STRING
, is_booking STRING
, cnt INT
, hotel_continentm STRING
, hotel_country STRING
, hotel_market STRING
, hotel_cluster STRING)
USING parquet
OPTIONS (path "s3://hansprojekt/training_17000000pq")
As a result I get an error:
mismatched input ';' expecting <EOF>(line 1, pos 23)
== SQL ==
DROP TABLE IF EXISTS df;
-----------------------^^^
CREATE TABLE df (
date_time STRING
, site_name STRING
, posa_continent STRING
, user_location_country STRING
, user_location_region STRING
, user_location_city STRING
, orig_destination_distance DOUBLE
, user_id STRING
, is_mobile STRING
, is_package STRING
, channel STRING
, srch_ci STRING
, srch_co STRING
, srch_adults_cnt INT
, srch_children_cnt INT
, srch_rm_cnt INT
, srch_destination_id STRING
, srch_destination_type_id STRING
, is_booking STRING
, cnt INT
, hotel_continent STRING
, hotel_country STRING
, hotel_market STRING
, hotel_cluster STRING)
USING parquet
OPTIONS (path "s3://hansprojekt/training_17000000pq")
set zeppelin.spark.sql.stacktrace = true to see full stacktrace
I don't understand the problem. The csv is separated with ','.
Can anybody help me?
A paragraph (aka code section) in Zeppelin that uses %spark.sql takes a single SQL statement.
So put this line in one paragraph:
DROP TABLE IF EXISTS df;
and this statement in another %spark.sql paragraph:
CREATE TABLE df (
date_time STRING
, site_name STRING
, posa_continent STRING
, user_location_country STRING
, user_location_region STRING
, user_location_city STRING
, orig_destination_distance DOUBLE
, user_id STRING
, is_mobile STRING
, is_package STRING
, channel STRING
, srch_ci STRING
, srch_co STRING
, srch_adults_cnt INT
, srch_children_cnt INT
, srch_rm_cnt INT
, srch_destination_id STRING
, srch_destination_type_id STRING
, is_booking STRING
, cnt INT
, hotel_continentm STRING
, hotel_country STRING
, hotel_market STRING
, hotel_cluster STRING)
USING parquet
OPTIONS (path "s3://hansprojekt/training_17000000pq")
%spark.sql provides a SQL environment using Spark SQL (via SparkSQLInterpreter).
If I'm not mistaken, SparkSQLInterpreter simply executes SQLContext.sql when asked for a result:
// method signature of sqlc.sql() is changed
// from def sql(sqlText: String): SchemaRDD (1.2 and prior)
// to def sql(sqlText: String): DataFrame (1.3 and later).
// Therefore need to use reflection to keep binary compatibility for all spark versions.
Method sqlMethod = sqlc.getClass().getMethod("sql", String.class);
rdd = sqlMethod.invoke(sqlc, st);
That points to SQLContext.sql as the "execution environment". Its documentation reads:
sql(sqlText: String): DataFrame Executes a SQL query using Spark, returning the result as a DataFrame.
And sql expects a single SQL statement.
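If keeping both statements in one notebook cell matters, one possible workaround is to split the script yourself and hand the statements to Spark one at a time. The sketch below is only an illustration under assumptions: split_sql_statements is a hypothetical helper (not part of Zeppelin or Spark), the column list is shortened, and the commented-out loop assumes a %pyspark paragraph where a SparkSession named spark is available. Since the error points at the ';' itself (pos 23), the helper also drops the terminating semicolons.

```python
from typing import List


def split_sql_statements(script: str) -> List[str]:
    """Split a SQL script on ';' characters that sit outside quoted text.

    Deliberately naive: it understands '...', "..." and `...` quoting,
    but not SQL comments or escaped quote characters.
    """
    statements = []
    current = []
    quote = None  # the quote character we are currently inside, if any
    for ch in script:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None  # closing quote reached
        elif ch in ("'", '"', "`"):
            quote = ch
            current.append(ch)
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


# Shortened version of the script from the question (illustrative columns only).
script = """
DROP TABLE IF EXISTS df;
CREATE TABLE df (date_time STRING, cnt INT)
USING parquet
OPTIONS (path "s3://hansprojekt/training_17000000pq")
"""

statements = split_sql_statements(script)

# Assumption: inside a %pyspark paragraph with a SparkSession called `spark`,
# each statement would then be executed on its own:
# for stmt in statements:
#     spark.sql(stmt)
```

Each element of statements is then a single statement without a trailing ';', which is the form sql is described as expecting above.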