dplyr does not group data by date
I am trying to calculate how often people ride bikes, using a dataset provided by Leada.
Here is the code:
library(dplyr)
# Let read.csv() parse "%m/%d/%y %H:%M" strings into POSIXlt when colClasses says "POSIXlt"
setAs("character", "POSIXlt", function(from) strptime(from, format = "%m/%d/%y %H:%M"))
d <- read.csv("http://mandrillapp.com/track/click/30315607/s3-us-west-1.amazonaws.com?p=eyJzIjoiemxlVjNUREczQ2l5UFVPeEFCalNUdmlDYTgwIiwidiI6MSwicCI6IntcInVcIjozMDMxNTYwNyxcInZcIjoxLFwidXJsXCI6XCJodHRwczpcXFwvXFxcL3MzLXVzLXdlc3QtMS5hbWF6b25hd3MuY29tXFxcL2RhdGF5ZWFyXFxcL2Jpa2VfdHJpcF9kYXRhLmNzdlwiLFwiaWRcIjpcImEyODNiNjMzOWJkOTQxMGM5ZjlkYzE0MmQ0NDQ5YmU4XCIsXCJ1cmxfaWRzXCI6W1wiMTVlYzMzNWM1NDRlMTM1ZDI0YjAwODE4ZjI5YTdkMmFkZjU2NWQ2MVwiXX0ifQ",
colClasses = c("numeric", "numeric", "POSIXlt", "factor", "numeric", "POSIXlt", "factor", "numeric", "numeric", "factor", "character"),
stringsAsFactors = TRUE)
names(d)[9] <- "BikeNo"
d <- tbl_df(d)
d <- d %>% mutate(Weekday = factor(weekdays(Start.Date)))
# The pipe must end a line, not start one, or R ends the expression early
d %>% group_by(Weekday) %>%
  summarise(Total = n()) %>%
  select(Weekday, Total)
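Strangely, dplyr refuses to group the data by weekday, saying:
Error: column 'Start.Date' has unsupported type
Why does it care about the Start.Date column when I am grouping by a factor?
You can reproduce the error by running the code locally: it downloads the data automatically.
P.S. I am using dplyr version dplyr_0.3.0.2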
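The error most likely comes from the POSIXlt columns themselves, not from the Weekday column: a POSIXlt vector is stored internally as a list of date-time fields, and dplyr 0.3 does not accept that as a column type, so group_by() rejects the whole table. The base-R sketch below, using two made-up timestamps in the question's format, shows the difference between POSIXlt and POSIXct storage; converting with as.POSIXct() before the data reaches dplyr is one likely fix.

```r
# Sketch: compare how POSIXlt and POSIXct store the same timestamps.
# The two timestamps are made-up values in the question's "%m/%d/%y %H:%M" format.
fmt <- "%m/%d/%y %H:%M"
lt <- strptime(c("8/29/13 14:13", "8/30/13 9:05"), format = fmt, tz = "UTC")

class(lt)               # "POSIXlt" "POSIXt"
is.list(unclass(lt))    # TRUE: one list element per field
names(unclass(lt))[1:3] # "sec" "min" "hour"
lt$wday                 # 4 5 (0 = Sunday), i.e. Thursday and Friday

# POSIXct is a plain numeric vector of seconds since 1970-01-01 UTC,
# an ordinary atomic column that data-frame tools can handle.
ct <- as.POSIXct(lt)
class(ct)               # "POSIXct" "POSIXt"
is.numeric(unclass(ct)) # TRUE
```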
The lubridate package is useful when working with dates.
Below is code that parses Start.Date and End.Date, extracts the weekday, and then groups by weekday:
# Read the dates as character vectors
library(dplyr)
library(lubridate)
# For some reason, loading the csv directly from the URL didn't work for me,
# so I saved the csv to a temporary directory instead.
d <- read.csv("/tmp/bike_trip_data.csv",
              colClasses = c("numeric", "numeric", "character", "factor", "numeric",
                             "character", "factor", "numeric", "numeric", "factor",
                             "character"),
              stringsAsFactors = TRUE)
names(d)[9] <- "BikeNo"
d <- tbl_df(d)
# Convert the start and end dates with lubridate
d <- d %>%
mutate(
Start.Date = parse_date_time(Start.Date,"%m/%d/%y %H:%M"),
End.Date = parse_date_time(End.Date,"%m/%d/%y %H:%M"),
Weekday = wday(Start.Date, label=TRUE, abbr=FALSE))
# Number of rows per weekday
d %>%
group_by(Weekday) %>%
summarise(Total = n())
# Weekday Total
# 1 Sunday 10587
# 2 Monday 23138
# 3 Tuesday 24678
# 4 Wednesday 23651
# 5 Thursday 25265
# 6 Friday 24283
# 7 Saturday 12413
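The lubridate route works because parse_date_time() returns POSIXct rather than POSIXlt. If you would rather not add lubridate, a similar result can probably be had with base strptime() as in the question, as long as the result is wrapped in as.POSIXct(). This is a swapped-in alternative, not the answer's method; the sample rows are hypothetical, and the weekday is taken from the POSIXlt wday field (0 = Sunday) so the labels do not depend on the system locale, unlike weekdays().

```r
# Sketch: the question's strptime() parsing, converted to POSIXct before grouping.
# The sample rows below are made up for illustration.
library(dplyr)

raw <- data.frame(
  Trip.ID    = 1:5,
  Start.Date = c("8/29/13 14:13", "8/29/13 14:42", "8/30/13 9:05",
                 "8/31/13 17:20", "9/1/13 11:00"),
  stringsAsFactors = FALSE
)

fmt <- "%m/%d/%y %H:%M"
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

trips <- raw %>%
  mutate(
    # as.POSIXct() turns the list-based POSIXlt into a numeric time column
    Start.Date = as.POSIXct(strptime(Start.Date, format = fmt, tz = "UTC")),
    # wday: 0 = Sunday ... 6 = Saturday, mapped to fixed English labels
    Weekday = factor(as.POSIXlt(Start.Date)$wday, levels = 0:6, labels = day_names)
  )

per_day <- trips %>%
  group_by(Weekday) %>%
  summarise(Total = n())

print(per_day)
```

With these rows, Thursday should get a Total of 2 and Friday, Saturday and Sunday 1 each; weekdays with no trips are dropped from the result.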
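Sorry to revive a long-forgotten question, but I have been using plyr::arrange from the plyr package instead, because it does not seem to have any problem with the POSIXlt format. Since I am not usually the one who finds the simplest solution to a problem in R, I am starting to think something is wrong with it. Does it not behave the same as the dplyr version?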
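On the sorting question: once the timestamps are POSIXct, dplyr::arrange() should sort them directly, without falling back to plyr. A minimal sketch, assuming hypothetical rows in the same "%m/%d/%y %H:%M" format:

```r
# Sketch: sort trips by start time with dplyr::arrange() after converting
# the timestamps to POSIXct. The rows are made up for illustration.
library(dplyr)

trips <- data.frame(
  Trip.ID    = c(1, 2, 3),
  Start.Date = c("8/30/13 9:05", "8/29/13 14:42", "8/29/13 14:13"),
  stringsAsFactors = FALSE
)

sorted <- trips %>%
  mutate(Start.Date = as.POSIXct(strptime(Start.Date, "%m/%d/%y %H:%M", tz = "UTC"))) %>%
  arrange(Start.Date)   # earliest trip first

sorted$Trip.ID          # 3 2 1
```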