Trying to group_by and then summarise max and min: running into "not in a standard unambiguous format" error
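My data has duplicate information for addresses listed under both Kingwood and Humble. I am trying to merge these entries, keeping the earliest first reported date and the latest last reported date, using this code: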
df <- df %>% group_by(id, street) %>%
summarise(firstReportedDate = min(as.Date(firstReportedDate))) %>%
summarise(lastReportedDate = max(as.Date(lastReportedDate)))
However, for some reason, id == 1000 gives me this error:
Error: Problem with `summarise()` column `firstReportedDate`.
i `firstReportedDate = min(as.Date(firstReportedDate))`.
x character string is not in a standard unambiguous format
i The error occurred in group 3: id = "1000", street = "Po Box 203"
Can anyone help me understand this error? Sample data below:
dput(df)
structure(list(street = c("2200 Lake Village Dr", "1040 Marina Dr",
"2200 Lake Village Dr", "1040 Marina Dr", "22302 Rustic Bridge Ln",
"22302 Rustic Bridge Ln", "1060 Marina Dr", "3211 Laurel Point Ct",
"Po Box 203", "19703 Highway 59 N", "6714 Dorylee Ln", "3511 Forest Row Dr",
"3511 Forest Row Dr", "Acorn Ln"), city = c("Kingwood", "Humble",
"Kingwood", "Kingwood", "Kingwood", "Humble", "Humble", "Kingwood",
"Humble", "Humble", "Humble", "Kingwood", "Humble", "Humble"),
state = c("TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX",
"TX", "TX", "TX", "TX", "TX", "TX"), zip = c("77339", "77339",
"77339", "77339", "77339", "77339", "77339", "77339", "77347",
"77338", "77396", "77345", "77345", "77345"), firstReportedDate = c("5/25/2019",
"1/1/2015", "9/30/2017", "11/30/2015", "10/18/2017", "6/15/2017",
"9/30/2009", "10/12/2002", "9/22/2017", "1/1/2009", "3/5/2004",
"4/8/2012", "9/30/2009", "1/1/2009"), lastReportedDate = c("4/1/2022",
"1/1/2021", "9/30/2017", "11/30/2015", "4/1/2022", "6/15/2018",
"9/30/2009", "3/3/2004", "4/1/2022", "1/1/2011", "3/5/2004",
"4/1/2022", "9/30/2009", "1/1/2013"), id = c("357", "357",
"357", "357", "359", "359", "359", "359", "1000", "1000",
"1000", "1431", "1431", "1431")), row.names = c(NA, -14L), class = c("tbl_df",
"tbl", "data.frame"))
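Put everything in the same summarise call: the first summarise keeps only the grouping columns and firstReportedDate, so a second, chained summarise no longer has a lastReportedDate column to work with. Also, when your data is not in an international date format (YYYY-MM-DD), you should specify the date format in the format argument of as.Date.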
library(dplyr)

df %>%
  # parse the m/d/Y strings explicitly instead of letting as.Date guess
  mutate(across(ends_with("Date"), ~ as.Date(.x, format = "%m/%d/%Y"))) %>%
  group_by(id, street) %>%
  summarise(firstReportedDate = min(firstReportedDate),
            lastReportedDate = max(lastReportedDate))
Output
# A tibble: 10 × 4
# Groups: id [4]
id street firstReportedDate lastReportedDate
<chr> <chr> <date> <date>
1 1000 19703 Highway 59 N 2009-01-01 2011-01-01
2 1000 6714 Dorylee Ln 2004-03-05 2004-03-05
3 1000 Po Box 203 2017-09-22 2022-04-01
4 1431 3511 Forest Row Dr 2009-09-30 2022-04-01
5 1431 Acorn Ln 2009-01-01 2013-01-01
6 357 1040 Marina Dr 2015-01-01 2021-01-01
7 357 2200 Lake Village Dr 2017-09-30 2022-04-01
8 359 1060 Marina Dr 2009-09-30 2009-09-30
9 359 22302 Rustic Bridge Ln 2017-06-15 2022-04-01
10 359 3211 Laurel Point Ct 2002-10-12 2004-03-03
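To see why the format argument matters, here is a minimal base R sketch using the three firstReportedDate strings that belong to id 1000 in the sample data. The variable names (dates, no_format, parsed) are only illustrative. As far as I can tell, when no format is given as.Date only tries "%Y-%m-%d" and "%Y/%m/%d" against the first non-missing element, so "9/22/2017" is rejected outright, while a string such as "1/1/2009" can be accepted by that guess and silently parsed into a wrong date. That would explain why only some groups raise the error.

```r
# Base R only: compare as.Date() with and without an explicit format.
# These are id 1000's firstReportedDate values from the sample data.
dates <- c("9/22/2017", "1/1/2009", "3/5/2004")

# Without a format, as.Date() has to guess, and "9/22/2017" matches
# neither of the layouts it tries, so it throws an error.
# tryCatch() captures the error message instead of stopping the script.
no_format <- tryCatch(
  as.Date(dates[1]),
  error = function(e) conditionMessage(e)
)
print(no_format)

# With the format spelled out, every string parses as month/day/year.
parsed <- as.Date(dates, format = "%m/%d/%Y")
print(parsed)

# min() and max() now compare real dates rather than strings.
print(min(parsed))
print(max(parsed))
```

Converting the columns once, before grouping, also avoids repeating the as.Date call inside every summary expression.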