Using purrr and map to extract information from grouped data
I have a dataset and need to collect per-group summaries, such as the earliest time, the latest time, and so on.
> data
# A tibble: 9 x 3
DateTime Location Temperature
<dttm> <chr> <dbl>
1 2022-01-30 18:00:00 A 122
2 2022-01-30 18:00:00 B 123
3 2022-01-30 18:00:20 C 112
4 2022-01-30 18:01:00 A 123
5 2022-01-30 18:01:00 B 124
6 2022-01-30 18:01:20 C 114
7 2022-01-30 18:02:00 A 122.
8 2022-01-30 18:02:00 B 123
9 2022-01-30 18:02:20 C 115
I would like a summary like this:
Location Min Max
A 2022-01-30 18:00:00 2022-01-30 18:02:00
B 2022-01-30 18:00:00 2022-01-30 18:02:00
C 2022-01-30 18:00:20 2022-01-30 18:02:20
I was able to split it into one tibble per group using:
> data_grouped <- data %>%
+ split(.$Location)
> data_grouped
$A
# A tibble: 3 x 3
DateTime Location Temperature
<dttm> <chr> <dbl>
1 2022-01-30 18:00:00 A 122
2 2022-01-30 18:01:00 A 123
3 2022-01-30 18:02:00 A 122.
$B
# A tibble: 3 x 3
DateTime Location Temperature
<dttm> <chr> <dbl>
1 2022-01-30 18:00:00 B 123
2 2022-01-30 18:01:00 B 124
3 2022-01-30 18:02:00 B 123
$C
# A tibble: 3 x 3
DateTime Location Temperature
<dttm> <chr> <dbl>
1 2022-01-30 18:00:20 C 112
2 2022-01-30 18:01:20 C 114
3 2022-01-30 18:02:20 C 115
But I can't get any further from there. Could someone give me some advice? A working copy of the data is below.
library(tidyverse)
library(lubridate)
library(purrr)
data <- tibble(
DateTime = ymd_hms("2022-01-30 18:00:00",
"2022-01-30 18:00:00",
"2022-01-30 18:00:20",
"2022-01-30 18:01:00",
"2022-01-30 18:01:00",
"2022-01-30 18:01:20",
"2022-01-30 18:02:00",
"2022-01-30 18:02:00",
"2022-01-30 18:02:20"),
Location = rep(c("A","B","C"),3),
Temperature = c(122,123,112,123,124,114,122.5,123,115)
)
Thanks!
肖恩·韦
This can be done with min/max and group_by/summarise:
library(dplyr)
data %>%
group_by(Location) %>%
summarise(Min = min(DateTime), Max = max(DateTime))
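Since the question asks for the minimum, the maximum "and so on", the same summarise call can carry further per-group statistics. The following is only a sketch under assumptions: the MeanTemp column and the summary_tbl name are illustrative and do not appear in the original post, and .groups = "drop" is added simply to return an ungrouped tibble.

```r
library(dplyr)
library(lubridate)

# Sample data from the question
data <- tibble(
  DateTime = ymd_hms("2022-01-30 18:00:00", "2022-01-30 18:00:00",
                     "2022-01-30 18:00:20", "2022-01-30 18:01:00",
                     "2022-01-30 18:01:00", "2022-01-30 18:01:20",
                     "2022-01-30 18:02:00", "2022-01-30 18:02:00",
                     "2022-01-30 18:02:20"),
  Location = rep(c("A", "B", "C"), 3),
  Temperature = c(122, 123, 112, 123, 124, 114, 122.5, 123, 115)
)

# One row per Location; MeanTemp is a hypothetical extra statistic
summary_tbl <- data %>%
  group_by(Location) %>%
  summarise(
    Min = min(DateTime),
    Max = max(DateTime),
    MeanTemp = mean(Temperature),
    .groups = "drop"
  )

summary_tbl
```

Any other summary function that returns a single value per group can be added in the same way.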
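Splitting into a list and then looping over it is not really needed. If the goal is just to understand how map is used: loop over the split list with map, apply summarise to each element to return the minimum and maximum as columns, and let the _dfr suffix bind the resulting list elements by row (rbind) into a single data frame.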
library(purrr)
map_dfr(data_grouped, ~ .x %>%
summarise(Location = first(Location),
Min = min(DateTime), Max = max(DateTime)))
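In purrr 1.0.0 and later, map_dfr is marked as superseded in favour of map followed by list_rbind. As a sketch of that variant, assuming purrr >= 1.0.0 is installed, the list names produced by split can be turned back into the Location column through the names_to argument, so first(Location) is no longer needed. The result name is illustrative only.

```r
library(dplyr)
library(purrr)
library(lubridate)

# Sample data from the question
data <- tibble(
  DateTime = ymd_hms("2022-01-30 18:00:00", "2022-01-30 18:00:00",
                     "2022-01-30 18:00:20", "2022-01-30 18:01:00",
                     "2022-01-30 18:01:00", "2022-01-30 18:01:20",
                     "2022-01-30 18:02:00", "2022-01-30 18:02:00",
                     "2022-01-30 18:02:20"),
  Location = rep(c("A", "B", "C"), 3),
  Temperature = c(122, 123, 112, 123, 124, 114, 122.5, 123, 115)
)

# Named list with one tibble per Location, as in the question
data_grouped <- split(data, data$Location)

# Summarise each list element, then row-bind the results;
# names_to stores the list names ("A", "B", "C") in a Location column
result <- data_grouped %>%
  map(~ summarise(.x, Min = min(DateTime), Max = max(DateTime))) %>%
  list_rbind(names_to = "Location")

result
```

On older purrr versions, the map_dfr call above remains the way to get the same table.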