Spark SQL query fails

Using Spark 2 / Java / Cassandra 2.2, I am trying to run a simple Spark SQL query and it fails. I have tried variants such as 'LAX' (with single quotes) and = instead of ==, with the same result:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`LAX`' given input columns: [transdate, origin]; line 1 pos 42;
'Project ['origin]
+- 'Filter (origin#1 = 'LAX)
   +- SubqueryAlias origins
      +- LogicalRDD [transdate#0, origin#1]

JavaRDD<TransByDate> originDateRDD = javaFunctions(sc)
    .cassandraTable("trans", "trans_by_date", CassandraJavaUtil.mapRowTo(TransByDate.class))
    .select(CassandraJavaUtil.column("origin"),
            CassandraJavaUtil.column("trans_date").as("transdate"));

long cnt1 = originDateRDD.count();
System.out.println("sqlLike originDateRDD.count: " + cnt1); // prints 406000
Dataset<Row> originDF = sparks.createDataFrame(originDateRDD, TransByDate.class);
originDF.createOrReplaceTempView("origins");
Dataset<Row> originlike = sparks.sql("SELECT origin FROM origins WHERE origin =="+ "LAX");

I have Hive support enabled, in case that helps. Thanks.

Hive is not the problem; this is the line that points to your issue:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`LAX`' given input columns: [transdate, origin]; line 1 pos 42;

It is saying that none of the input columns is named LAX: because LAX is not quoted in your query, Spark parses it as a column reference rather than a string literal. The Scala DSL requires === when comparing a column against a value, so something along those lines may be preferable, e.g. origins.filter($"origin" === "LAX").
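
Since the question's code is Java, here is a minimal sketch of the same filter with the Java DataFrame API, assuming the originDF from the question; equalTo() takes its argument as a literal value, so no quoting problem can arise:

import static org.apache.spark.sql.functions.col;

// Filter with the DataFrame API instead of SQL text:
// equalTo("LAX") treats "LAX" as a string literal, not a column name.
Dataset<Row> laxOnly = originDF.filter(col("origin").equalTo("LAX"));
long laxCount = laxOnly.count();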

Put the column value inside single quotes. Your query should look like this:

Dataset<Row> originlike = spark.sql("SELECT origin FROM origins WHERE origin == "+"'LAX'");
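
The string concatenation is unnecessary here; the literal can be written directly in the query string, and a single = works as well, since Spark SQL accepts both = and ==:

Dataset<Row> originlike = spark.sql("SELECT origin FROM origins WHERE origin = 'LAX'");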

For details, refer to Querying Cassandra data using Spark SQL in Java.

A LIKE query should look like this:

Dataset<Row> originlike = spark.sql("SELECT origin FROM origins WHERE origin like 'LA%'");
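
The same pattern match can also be expressed with the DataFrame API; a minimal sketch, again assuming the originDF from the question:

// like() applies the same SQL wildcard pattern ('%' matches any sequence).
Dataset<Row> laPrefix = originDF.filter(col("origin").like("LA%"));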