Dates from factor to another format so I can find start and end dates per group
I'm new to R and can't work out what is going wrong. I have already tried countless approaches.
My dataset looks like this:
Glimpse of dataset
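I'd like to find the first and last date for each event and put them in a nice table. There are 26 events.
However, the dates are stored as a factor, which keeps me from finding the start and end dates. When I try to convert the column to numeric I get NA for every value, and when I try to convert it to a date it stays a factor.
Can anyone help me?
As suggested, I looked for a way to share my dataset using dput. I think this should give a 2x8 sample of my dataset.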
df <- structure(list(`Release Date` = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Names = c("",
"", "", "", "",
"", "", ""), .Label = c("3/17/2020", "Release Date", "6/16/2020", "9/15/2020",
"12/16/2020", "12/17/2015", "6/17/2013", "9/17/2012", "6/14/2012",
"3/15/2012", "6/20/2011", "3/16/2011", "12/16/2010", "9/14/2010",
"6/16/2010", "3/17/2010", "12/15/2009", "9/15/2009", "6/16/2009",
"3/13/2009", "12/12/2008", "9/15/2008", "6/13/2008", "3/14/2008",
"12/13/2007", "9/12/2007", "6/14/2007", "3/15/2007", "12/14/2006",
"9/14/2006", "6/16/2006", "3/17/2006", "12/15/2005", "10/18/2005",
"9/21/2005", "7/15/2005", "6/21/2005", "4/15/2005", "3/15/2005",
"1/18/2005", "12/15/2004", "10/27/2004", "9/15/2004", "7/28/2004"
), class = "factor"), Event = structure(c(2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L), .Names = c("", "", "", "", "", "", "", ""), .Label = c("Event", "Labour Costs YoY",
"Unemployment Change (000's)", "Unemployment Rate", "Jobseekers Net Change"
), class = "factor")), row.names = c("X.1", "X.11", "X.12", "X.13", "X.14", "X.15", "X.16", "X.17"), class = "data.frame")
Once the dates are converted to Date objects, you can use min and max to get the first and last date for each event.
library(dplyr)

df %>%
  # parse month/day/four-digit-year strings into Date objects
  mutate(`Release Date` = as.Date(`Release Date`, '%m/%d/%Y')) %>%
  group_by(Event) %>%
  summarise(first_date = min(`Release Date`),
            last_date = max(`Release Date`))
# Event first_date last_date
# <fct> <date> <date>
#1 Labour Costs YoY 2020-03-17 2020-12-16
#2 Unemployment Change (000's) 2012-06-14 2015-12-17
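For what it's worth, here is a minimal base R sketch of the same idea, in case you'd rather not load dplyr. The small data frame is a stand-in built from the values in your dput sample, and the names by_event and result are just illustrative. It also shows one likely reason for the NAs: strings such as "3/17/2020" are not numbers, so converting them to numeric cannot work. As for the column staying a factor, my guess (an assumption, since the code you ran isn't shown) is that the converted value was never assigned back to the column.

```r
# Stand-in for the question's data (values copied from the dput sample)
df <- data.frame(
  `Release Date` = factor(c("3/17/2020", "6/16/2020", "9/15/2020", "12/16/2020",
                            "12/17/2015", "6/17/2013", "9/17/2012", "6/14/2012")),
  Event = factor(rep(c("Labour Costs YoY", "Unemployment Change (000's)"),
                     each = 4)),
  check.names = FALSE  # keep the space in the column name
)

# "3/17/2020" is not a number, so this gives NA for every value
as_num <- suppressWarnings(as.numeric(as.character(df$`Release Date`)))

# Go through character, give the format explicitly, and assign the result back
df$`Release Date` <- as.Date(as.character(df$`Release Date`),
                             format = "%m/%d/%Y")

# Split the dates by event; droplevels() avoids empty groups from unused levels
by_event <- split(df$`Release Date`, droplevels(df$Event))

# do.call(c, ...) keeps the Date class when combining the per-group results
result <- data.frame(
  Event      = names(by_event),
  first_date = do.call(c, lapply(by_event, min)),
  last_date  = do.call(c, lapply(by_event, max)),
  row.names  = NULL
)

print(result)
```

The droplevels() call matters on your real data because the Event factor in the dput output carries levels (for example "Event") that no row uses, and taking min of an empty group would produce a warning instead of a date.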