Spark 2.0.0 - parquet read empty table
I just updated to Spark 2.0.0, and I want to read my parquet file in SparkR:
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "local[*]", sparkConfig = list(spark.driver.memory = "2g"), sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")
df1 <- read.parquet("my.parquet")
But the SparkDataFrame it returns is empty. When I collect it, I get my variables/columns but no rows. This same code works, however, on a parquet file I generated with Spark 1.6.2.
The parquet file was generated in a separate file.scala, also with Spark 2.0.0:
myDf.write.format("parquet").mode("overwrite")
.option("header", "true")
.option("parquet.enable.summary-metadata","true").save("my.parquet")
Per the Release Notes: "When writing Parquet files, the summary files are not written by default. To re-enable it, users must set "parquet.enable.summary-metadata" to true." So I set that option.
myDf is not empty, since I can print it with show(), and the write creates the files as usual:
./_common_metadata
./_metadata
./_SUCCESS
./part-r-00000-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00001-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00002-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00003-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00004-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00005-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00006-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00007-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00008-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00009-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00010-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00011-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00012-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00013-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00014-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00015-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00016-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00017-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00018-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00019-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00020-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00021-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00022-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00023-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00024-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00025-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00026-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00027-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00028-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
./part-r-00029-6235ae25-fb7b-472b-9f0e-139907759393.snappy.parquet
Either myDf was written correctly but is being loaded incorrectly, or it was written incorrectly. Any insight into what might be happening?
The parquet was indeed written incorrectly.
I was running my job from the command line with --packages "com.databricks:spark-csv_2.10:1.2.0". However, spark-csv is now bundled into Spark 2.0.0, so I was pulling in the wrong version. Removing the --packages option fixed it.
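Concretely, the fix amounts to dropping the external spark-csv package from the submit command. A minimal sketch (the jar name and class are placeholders, not from the original job):

```shell
# Before (Spark 1.6.x style): pulls in the external spark-csv package,
# which shadows the CSV/data-source support built into Spark 2.0.0
spark-submit --packages "com.databricks:spark-csv_2.10:1.2.0" \
  --class MyJob my-job.jar

# After (Spark 2.0.0): no --packages needed, CSV support is built in
spark-submit --class MyJob my-job.jar
```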