Unable to subset the data using SparkR, using piping convention to execute the commands

I am working with some data that looks like this: dataFrame

The command I am running is:

library(magrittr)

#subsetting the data for MAC-OS & sorting by event-timestamp.
macDF <- eventsDF %>% 
  SparkR::select("device", "event_timestamp") %>%
  SparkR::filter("device = macOS") %>%
  SparkR::arrange("event_timestamp")

display(macDF)

The error I get is:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'arrange': unable to find an inherited method for function ‘filter’ for signature ‘"character", "missing"’

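Any help would be greatly appreciated, thanks!

I could not reproduce your error exactly, but I created a sample eventsDF data frame in R, converted it to a Spark DataFrame, and updated some of your code.

Here is an example in the style you started with. Note the call to SparkR::expr, which lets you give Spark a SQL expression to put into the where clause it is building. Because this example uses expr() to build a SQL where clause, macOS has to be quoted as a string literal: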

library(magrittr)

eventsDF = data.frame(device=c("macOS","redhat","macOS"),event_timestamp=strptime(c('2022-01-13 12:19','2021-11-14 08:02','2021-12-01 21:33'),format="%Y-%m-%d %H:%M")) %>%
            SparkR::as.DataFrame()

macDF <- eventsDF %>% 
  SparkR::select(eventsDF$device, eventsDF$event_timestamp) %>%
  SparkR::filter(SparkR::expr("device='macOS'")) %>%
  SparkR::arrange('event_timestamp') %>%
  display()
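
To illustrate the quoting point, here is a minimal sketch (not taken from the question) comparing two ways of writing the same filter: a SQL string condition with the literal in single quotes, and a Column comparison. It assumes SparkR is installed and a local Spark session can be started with sparkR.session(); on Databricks a session already exists. The names bySql and byCol are made up for this example. As far as I know, an unquoted macOS inside the SQL string would be parsed as a column name rather than a string value, which is why the quotes matter.

```r
library(SparkR)

# Assumption: a local Spark installation is available.
# On Databricks the session already exists and this call just returns it.
sparkR.session()

# Same sample data as above, built with an explicit timestamp format.
eventsDF <- as.DataFrame(data.frame(
  device = c("macOS", "redhat", "macOS"),
  event_timestamp = as.POSIXct(
    c("2022-01-13 12:19", "2021-11-14 08:02", "2021-12-01 21:33"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  )
))

# Option 1: SQL string condition; the literal must be in single quotes.
bySql <- filter(eventsDF, "device = 'macOS'")

# Option 2: Column expression; ordinary R string quoting is enough here.
byCol <- filter(eventsDF, eventsDF$device == "macOS")

# Both conditions should keep the same rows.
print(count(bySql))
print(count(byCol))

# Sort by timestamp and bring the result back as a local data.frame.
print(head(arrange(byCol, "event_timestamp")))
```

Either form can be dropped into the pipe above in place of the expr() call; the Column form avoids nested quotes, while the SQL string is handy for longer conditions.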

How I would do it:

library(dplyr)
library(SparkR)

eventsDF = data.frame(device=c("macOS","redhat","macOS"),event_timestamp=strptime(c('2022-01-13 12:19','2021-11-14 08:02','2021-12-01 21:33'),format="%Y-%m-%d %H:%M")) %>%
            as.DataFrame()

macDF <- eventsDF %>% 
  select(c('device','event_timestamp')) %>%
  filter(eventsDF$device=='macOS') %>%
  arrange('event_timestamp') %>%
  display()

Result: