Trying to group_by and then summarize max and min - running into error for unambiguous format

Question

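I have duplicate address information for Kingwood and Humble addresses. I am trying to combine these entries, keeping the earliest first-reported date and the latest last-reported date, using this code: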

df <- df %>% group_by(id, street) %>% 
  summarise(firstReportedDate = min(as.Date(firstReportedDate))) %>% 
  summarise(lastReportedDate = max(as.Date(lastReportedDate)))

However, for some reason, id == 1000 gives me this error:

Error: Problem with `summarise()` column `firstReportedDate`.
i `firstReportedDate = min(as.Date(firstReportedDate))`.
x character string is not in a standard unambiguous format
i The error occurred in group 3: id = "1000", street = "Po Box 203"

Can anyone help me understand this error? Sample data below:

dput(df)
structure(list(street = c("2200 Lake Village Dr", "1040 Marina Dr", 
"2200 Lake Village Dr", "1040 Marina Dr", "22302 Rustic Bridge Ln", 
"22302 Rustic Bridge Ln", "1060 Marina Dr", "3211 Laurel Point Ct", 
"Po Box 203", "19703 Highway 59 N", "6714 Dorylee Ln", "3511 Forest Row Dr", 
"3511 Forest Row Dr", "Acorn Ln"), city = c("Kingwood", "Humble", 
"Kingwood", "Kingwood", "Kingwood", "Humble", "Humble", "Kingwood", 
"Humble", "Humble", "Humble", "Kingwood", "Humble", "Humble"), 
    state = c("TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX", 
    "TX", "TX", "TX", "TX", "TX", "TX"), zip = c("77339", "77339", 
    "77339", "77339", "77339", "77339", "77339", "77339", "77347", 
    "77338", "77396", "77345", "77345", "77345"), firstReportedDate = c("5/25/2019", 
    "1/1/2015", "9/30/2017", "11/30/2015", "10/18/2017", "6/15/2017", 
    "9/30/2009", "10/12/2002", "9/22/2017", "1/1/2009", "3/5/2004", 
    "4/8/2012", "9/30/2009", "1/1/2009"), lastReportedDate = c("4/1/2022", 
    "1/1/2021", "9/30/2017", "11/30/2015", "4/1/2022", "6/15/2018", 
    "9/30/2009", "3/3/2004", "4/1/2022", "1/1/2011", "3/5/2004", 
    "4/1/2022", "9/30/2009", "1/1/2013"), id = c("357", "357", 
    "357", "357", "359", "359", "359", "359", "1000", "1000", 
    "1000", "1431", "1431", "1431")), row.names = c(NA, -14L), class = c("tbl_df", 
"tbl", "data.frame"))

Answer 1

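Put everything in a single summarise() call. After the first summarise(), only the grouping columns and firstReportedDate remain, so a second summarise() has no lastReportedDate column to work with. Also, when your dates are not in an international date format (such as YYYY-MM-DD), you should specify the date format in the format argument of as.Date.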

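library(dplyr)

df %>% 
  # parse both date columns as month/day/year before summarising
  mutate(across(ends_with("Date"), ~ as.Date(.x, format = "%m/%d/%Y"))) %>% 
  group_by(id, street) %>% 
  # one summarise() call computes both columns per id/street group
  summarise(firstReportedDate = min(firstReportedDate),
            lastReportedDate = max(lastReportedDate))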

Output

# A tibble: 10 × 4
# Groups:   id [4]
   id    street                 firstReportedDate lastReportedDate
   <chr> <chr>                  <date>            <date>          
 1 1000  19703 Highway 59 N     2009-01-01        2011-01-01      
 2 1000  6714 Dorylee Ln        2004-03-05        2004-03-05      
 3 1000  Po Box 203             2017-09-22        2022-04-01      
 4 1431  3511 Forest Row Dr     2009-09-30        2022-04-01      
 5 1431  Acorn Ln               2009-01-01        2013-01-01      
 6 357   1040 Marina Dr         2015-01-01        2021-01-01      
 7 357   2200 Lake Village Dr   2017-09-30        2022-04-01      
 8 359   1060 Marina Dr         2009-09-30        2009-09-30      
 9 359   22302 Rustic Bridge Ln 2017-06-15        2022-04-01      
10 359   3211 Laurel Point Ct   2002-10-12        2004-03-03
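
To see where the error comes from, here is a minimal base R sketch using three of the date strings from the question. It assumes nothing beyond base R; the variable names (x, res, silent, d) are only for illustration. Without format, as.Date() tries the year-first patterns "%Y-%m-%d" and "%Y/%m/%d" on the first non-missing element, which is why a string such as "9/22/2017" stops with the error, while a string such as "1/1/2009" can slip through without an error but is not read as month/day/year.

```r
# Dates as they appear in the question: month/day/year strings.
x <- c("9/22/2017", "1/1/2009", "3/5/2004")

# Without `format`, the first element cannot be read as year/month/day
# (there is no month 22), so as.Date() stops with the error from the question.
res <- tryCatch(as.Date(x), error = function(e) conditionMessage(e))
print(res)

# A string that happens to fit a year-first pattern does not raise an error,
# but it is NOT interpreted as 1 January 2009.
silent <- as.Date("1/1/2009")
print(silent)

# With an explicit format, every element is parsed as month/day/year,
# and min()/max() then compare real dates.
d <- as.Date(x, format = "%m/%d/%Y")
print(d)       # "2017-09-22" "2009-01-01" "2004-03-05"
print(min(d))  # "2004-03-05"
print(max(d))  # "2017-09-22"
```

This is also a reason to convert the columns once, up front, rather than calling as.Date() inside summarise(): a group whose first value happens to parse under a year-first pattern would otherwise produce a wrong date silently instead of failing.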

Tags: r, date, data-manipulation, dataframe, dplyr
