Use map with nested lists
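I am struggling to use map() from the purrr package correctly. I would like to compute the weighted mean of my sample by nesting the common observations in a list column and then using map(). (I know this would also work with group_by.)
MWE: Suppose I have observed 3 different subjects (indicated by 'id'), and I have their sample weights ('weights') and the corresponding observations ('obs').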
df <- tibble(id = c(1, 1, 2, 2, 3,3), weights = c(0.3,0.7,0.25,0.75,0.14,0.86), obs = 6:1)
df
# A tibble: 6 x 3
     id weights   obs
  <dbl>   <dbl> <int>
1     1    0.3      6
2     1    0.7      5
3     2    0.25     4
4     2    0.75     3
5     3    0.14     2
6     3    0.86     1
I want to compute the weighted mean for each subject. Therefore, I nest the weights and the observations.
df %>% nest(data = c(weights, obs))
# A tibble: 3 x 2
     id data
  <dbl> <list>
1     1 <tibble [2 x 2]>
2     2 <tibble [2 x 2]>
3     3 <tibble [2 x 2]>
Now I would like to use map to apply a function to each element of data. More precisely, I tried to solve it the following way:
df %>% nest(data = c(weights, obs)) %>% map(data, ~ (.x$weights*.x$obs)/sum(.x$weights))
Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
data set ‘~(.x$weights * .x$obs)/sum(.x$weights)’ not found
Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
data set ‘~(.x$weights * .x$obs)/sum(.x$weights)’ not found
As you can see, this results in a lot of warning messages. To understand map better, I tried to multiply the weight vector of each id by 2.
df %>% nest(data = c(weights, obs)) %>% map(data, ~ .x$weights*2)
$id
[1] ".x[[i]]" "~.x$weights * 2"
$data
[1] ".x[[i]]" "~.x$weights * 2"
Warning messages:
1: In .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
2: In .f(.x[[i]], ...) : data set ‘~.x$weights * 2’ not found
3: In .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
4: In .f(.x[[i]], ...) : data set ‘~.x$weights * 2’ not found
and
df %>% nest(data = c(weights, obs)) %>% map(data, function(x) x$weights*2)
Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
data set ‘function(x) x$weights * 2’ not found
Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
data set ‘function(x) x$weights * 2’ not found
$id
[1] ".x[[i]]" "function(x) x$weights * 2"
$data
[1] ".x[[i]]" "function(x) x$weights * 2"
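So I get the same messages here as well. Even after reading the documentation of map, I am quite lost and cannot see my mistake. I would be glad for any insights!
Thanks a lot!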
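We can call map inside mutate, because the data column is not accessible outside the data frame unless we use .$data. In df %>% ... %>% map(data, ...), the piped tibble becomes the first argument of map, and data is taken as the function to apply (it resolves to utils::data()), which is why the warnings say "data set ... not found". Note also that the numerator needs sum() to give a single weighted mean per id.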
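library(dplyr)
library(purrr)
library(tidyr)  # nest() comes from tidyr
df %>%
  nest(data = c(weights, obs)) %>%
  # map_dbl() iterates over the list column and returns one number per id
  mutate(wtd_mean = map_dbl(data, ~ sum(.x$weights * .x$obs) / sum(.x$weights)))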
-output
# A tibble: 3 × 3
     id data             wtd_mean
  <dbl> <list>              <dbl>
1     1 <tibble [2 × 2]>     5.3
2     2 <tibble [2 × 2]>     3.25
3     3 <tibble [2 × 2]>     1.14
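To illustrate the remark about .$data, here is a minimal sketch that pulls the list column out explicitly and maps over it outside of mutate. The variable names nested and wtd_mean are only for illustration; the data are the ones from the question.

```r
library(tibble)
library(tidyr)
library(purrr)

df <- tibble(id = c(1, 1, 2, 2, 3, 3),
             weights = c(0.3, 0.7, 0.25, 0.75, 0.14, 0.86),
             obs = 6:1)

# one row per id, with weights and obs packed into a list column
nested <- nest(df, data = c(weights, obs))

# map over the list column itself (nested$data), not over the whole tibble
wtd_mean <- map_dbl(nested$data, ~ sum(.x$weights * .x$obs) / sum(.x$weights))
wtd_mean
```

The same call should also work at the end of a pipe if it is wrapped in braces, e.g. %>% { map_dbl(.$data, ...) }, so that the piped tibble is not inserted as the first argument of map_dbl.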
There is also the weighted.mean function from stats (base R):
df %>%
  nest(data = c(weights, obs)) %>%
  mutate(wtd_mean = map_dbl(data, ~ weighted.mean(.x$obs, .x$weights)))
# A tibble: 3 × 3
     id data             wtd_mean
  <dbl> <list>              <dbl>
1     1 <tibble [2 × 2]>     5.3
2     2 <tibble [2 × 2]>     3.25
3     3 <tibble [2 × 2]>     1.14
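For comparison, the group_by route mentioned in the question can be sketched as follows. This is only an illustration of that alternative, not part of the nested approach above; the name res is hypothetical.

```r
library(tibble)
library(dplyr)

df <- tibble(id = c(1, 1, 2, 2, 3, 3),
             weights = c(0.3, 0.7, 0.25, 0.75, 0.14, 0.86),
             obs = 6:1)

# one weighted mean per id, without creating a list column
res <- df %>%
  group_by(id) %>%
  summarise(wtd_mean = weighted.mean(obs, weights))
res
```

This gives one row per id with the same wtd_mean values, but drops the nested data column, so the nest/map version is preferable when the per-id tibbles are needed later.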
You can use split followed by map_dbl to get a named vector as the return value, or map_df if you want a tibble:
tibble(id = c(1, 1, 2, 2, 3, 3), weights = c(0.3, 0.7, 0.25, 0.75, 0.14, 0.86), obs = 6:1) %>%
  split.data.frame(.$id) %>%
  map_dbl(
    ~ sum(.x$weights * .x$obs) / sum(.x$weights)
  )
   1    2    3
5.30 3.25 1.14
tibble(id = c(1, 1, 2, 2, 3, 3), weights = c(0.3, 0.7, 0.25, 0.75, 0.14, 0.86), obs = 6:1) %>%
  split.data.frame(.$id) %>%
  map_df(
    ~ sum(.x$weights * .x$obs) / sum(.x$weights)
  )
# A tibble: 1 x 3
    `1`   `2`   `3`
  <dbl> <dbl> <dbl>
1   5.3  3.25  1.14