Unable to subset the data using SparkR, using piping convention to execute the commands
I am working with some data that looks like this:
dataFrame
The command I am running is:
library(magrittr)
#subsetting the data for MAC-OS & sorting by event-timestamp.
macDF <- eventsDF %>%
SparkR::select("device", "event_timestamp") %>%
SparkR::filter("device = macOS") %>%
SparkR::arrange("event_timestamp")
display(macDF)
The error I get is:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'arrange': unable to find an inherited method for function ‘filter’ for signature ‘"character", "missing"’
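Any help would be greatly appreciated, thanks!
I could not reproduce your error exactly, but I created a sample eventsDF data frame in R, converted it to a Spark data frame, and updated some of your code.
Here is an example in the style you started with. Note the call to SparkR::expr, which lets you give Spark a SQL expression to put into the where clause it is building. Because this example uses expr() to build a SQL where clause, the string literal macOS needs to be quoted: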
library(magrittr)
eventsDF = data.frame(
  device = c("macOS", "redhat", "macOS"),
  event_timestamp = strptime(
    c('2022-01-13 12:19', '2021-11-14 08:02', '2021-12-01 21:33'),
    format = "%Y-%m-%d %H:%M"
  )
) %>%
  SparkR::as.DataFrame()
macDF <- eventsDF %>%
  SparkR::select(eventsDF$device, eventsDF$event_timestamp) %>%
  SparkR::filter(SparkR::expr("device='macOS'")) %>%
  SparkR::arrange('event_timestamp') %>%
  display()
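For comparison, SparkR::filter is documented to accept the condition either as a Column expression or as a string containing a SQL statement, so the pipeline from the question can also be written without expr(), as long as the literal is quoted inside the string. The following is only a sketch: it assumes an active Spark session and the eventsDF SparkDataFrame built above, and I have not checked it against the environment that produced the original error.

```r
library(magrittr)

# Assumption: a Spark session is already running (for example in a
# Databricks notebook) and eventsDF is the SparkDataFrame created above.
macDF <- eventsDF %>%
  SparkR::select("device", "event_timestamp") %>%
  # The condition is passed as a SQL string; 'macOS' is quoted so that
  # Spark reads it as a string literal rather than as a column name.
  SparkR::filter("device = 'macOS'") %>%
  SparkR::arrange("event_timestamp")

# Bring the first rows back to the R session to inspect them.
SparkR::head(macDF)
```

Without the inner quotes, Spark SQL would try to resolve macOS as a column, which is a separate problem from the method-dispatch error shown in the question.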
Here is how I would do it:
library(dplyr)
library(SparkR)
eventsDF = data.frame(
  device = c("macOS", "redhat", "macOS"),
  event_timestamp = strptime(
    c('2022-01-13 12:19', '2021-11-14 08:02', '2021-12-01 21:33'),
    format = "%Y-%m-%d %H:%M"
  )
) %>%
  as.DataFrame()
macDF <- eventsDF %>%
  select(c('device', 'event_timestamp')) %>%
  filter(eventsDF$device == 'macOS') %>%
  arrange('event_timestamp') %>%
  display()
Result: