Error in mySparkDF.show(): could not find function "mySparkDF.show"

Error in mySparkDF.show() : could not find function "mySparkDF.show"

I am trying to get started with SparkR by following a tutorial, but I run into the following error:

library(SparkR)
Sys.setenv(SPARK_HOME="/Users/myuserhome/dev/spark-2.2.0-bin-hadoop2.7")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

spark <- sparkR.session(appName = "mysparkr", Sys.getenv("SPARK_HOME"), master = "local[*]")

csvPath <- "file:///Users/myuserhome/dev/spark-data/donation"
mySparkDF <- read.df(csvPath, "csv", header = "true", inferSchema = "true", na.strings = "?")
mySparkDF.show()

But I get:

Error in mySparkDF.show() : could not find function "mySparkDF.show"

I'm not sure what I'm doing wrong. In addition, I don't get code completion for Spark functions like read.df(...).

Also, if I try

show(describe(mySparkDF))

show(summary(mySparkDF))

what I get back is the metadata rather than the expected output of describe:

SparkDataFrame[summary:string, id_1:string, id_2:string, cmp_fname_c1:string, cmp_fname_c2:string, cmp_lname_c1:string, cmp_lname_c2:string, cmp_sex:string, cmp_bd:string, cmp_bm:string, cmp_by:string, cmp_plz:string]

Am I doing something wrong?

show is not used this way in SparkR, and it does not serve the same purpose as the same-named command in PySpark. That is also why mySparkDF.show() fails: in R the dot is just part of an identifier, so R looks for a single function literally named "mySparkDF.show"; SparkR only offers plain function-call syntax, not method syntax. To display data, you should use head or showDF instead.
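Applied to your code, that means calling these as functions on the DataFrame. A minimal sketch, reusing your mySparkDF from above:

showDF(mySparkDF)  # pretty-prints the first 20 rows, like df.show() in PySpark
head(mySparkDF)    # returns the first 6 rows as a local R data.frame

Here is how show, head, and showDF each behave, demonstrated on the built-in faithful dataset: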

df <- as.DataFrame(faithful)

show(df)
# result:
SparkDataFrame[eruptions:double, waiting:double]

head(df)
# result:
   eruptions waiting
 1     3.600      79
 2     1.800      54
 3     3.333      74
 4     2.283      62
 5     4.533      85
 6     2.883      55

showDF(df)
# result:
+---------+-------+
|eruptions|waiting|
+---------+-------+
|      3.6|   79.0|
|      1.8|   54.0|
|    3.333|   74.0|
|    2.283|   62.0|
|    4.533|   85.0|
|    2.883|   55.0|
|      4.7|   88.0|
|      3.6|   85.0|
|     1.95|   51.0|
|     4.35|   85.0|
|    1.833|   54.0|
|    3.917|   84.0|
|      4.2|   78.0|
|     1.75|   47.0|
|      4.7|   83.0|
|    2.167|   52.0|
|     1.75|   62.0|
|      4.8|   84.0|
|      1.6|   52.0|
|     4.25|   79.0|
+---------+-------+
only showing top 20 rows
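
As for describe and summary: neither prints anything itself; each returns a new SparkDataFrame of statistics, which is why show(describe(mySparkDF)) gives you only that result's schema (the metadata you saw). Display the returned statistics like any other SparkDataFrame. A minimal sketch; the exact statistics columns depend on your Spark version:

# describe()/summary() return a SparkDataFrame of statistics; display it explicitly
showDF(describe(mySparkDF))           # count/mean/stddev/min/max as a printed table
stats <- collect(summary(mySparkDF))  # or pull the statistics into a local R data.frame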