Using map with nested lists

Question

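I am struggling to use map from the purrr library correctly. I would like to calculate the weighted mean of my sample by nesting the common observations in a list and then using map(). (I know this also works with group_by.)

MWE: Suppose I have observed 3 different subjects (indicated by 'id'), and I have their sample weights ('weights') and the corresponding observations ('obs').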

library(tidyverse)  # provides tibble(), nest(), map() and %>%
df <- tibble(id = c(1, 1, 2, 2, 3, 3), weights = c(0.3, 0.7, 0.25, 0.75, 0.14, 0.86), obs = 6:1)
df
# A tibble: 6 x 3
     id weights   obs
  <dbl>   <dbl> <int>
1     1    0.3      6
2     1    0.7      5
3     2    0.25     4
4     2    0.75     3
5     3    0.14     2
6     3    0.86     1

I would like to calculate the weighted mean for each subject. Therefore, I nest the weights and the observations.

df %>% nest(data = c(weights, obs))
# A tibble: 3 x 2
     id data            
  <dbl> <list>          
1     1 <tibble [2 x 2]>
2     2 <tibble [2 x 2]>
3     3 <tibble [2 x 2]>

Now I would like to use map to apply a function to each element of data. More precisely, I tried to solve it in the following way:

df %>% nest(data = c(weights, obs)) %>% map(data, ~ (.x$weights*.x$obs)/sum(.x$weights))

Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
  data set ‘~(.x$weights * .x$obs)/sum(.x$weights)’ not found
Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
  data set ‘~(.x$weights * .x$obs)/sum(.x$weights)’ not found

As you can see, this results in a lot of warnings. To get a better understanding of map, I tried to multiply the weight vector of each id by 2.

df %>% nest(data = c(weights, obs)) %>% map(data, ~ .x$weights*2)
$id
[1] ".x[[i]]"         "~.x$weights * 2"

$data
[1] ".x[[i]]"         "~.x$weights * 2"

Warning messages:
1: In .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
2: In .f(.x[[i]], ...) : data set ‘~.x$weights * 2’ not found
3: In .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
4: In .f(.x[[i]], ...) : data set ‘~.x$weights * 2’ not found

and

df %>% nest(data = c(weights, obs)) %>% map(data, function(x) x$weights*2)
Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
  data set ‘function(x) x$weights * 2’ not found
Warning in .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
Warning in .f(.x[[i]], ...) :
  data set ‘function(x) x$weights * 2’ not found
$id
[1] ".x[[i]]"                   "function(x) x$weights * 2"

$data
[1] ".x[[i]]"                   "function(x) x$weights * 2"

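So I get the same warnings here as well. Even after reading the documentation of map, I am quite lost. I do not see my mistake. I would be grateful for any insights!

Thank you very much!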

Answer 1

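We can call map inside mutate. In the original pipe, the whole nested tibble is passed to map() as its first argument (.x), so `data` lands in the .f position and resolves to the function utils::data(), which is what produces the "data set ... not found" warnings. The data column is not accessible outside the data frame unless we extract it, e.g. with .$data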

library(dplyr)
library(tidyr)   # nest()
library(purrr)
df %>%
   nest(data = c(weights, obs)) %>%
   mutate(wtd_mean = map_dbl(data, ~ sum(.x$weights*.x$obs)/sum(.x$weights)))

-output

# A tibble: 3 × 3
     id data             wtd_mean
  <dbl> <list>              <dbl>
1     1 <tibble [2 × 2]>     5.3 
2     2 <tibble [2 × 2]>     3.25
3     3 <tibble [2 × 2]>     1.14

There is also the weighted.mean function from stats (base R):

df %>% 
   nest(data = c(weights, obs)) %>% 
   mutate(wtd_mean = map_dbl(data, ~ weighted.mean(.x$obs, .x$weights)))
# A tibble: 3 × 3
     id data             wtd_mean
  <dbl> <list>              <dbl>
1     1 <tibble [2 × 2]>     5.3 
2     2 <tibble [2 × 2]>     3.25
3     3 <tibble [2 × 2]>     1.14
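
The point about .$data can be illustrated with a minimal sketch that extracts the list-column before mapping over it. The names nested, wtd and wtd_dot are only illustrative and do not appear in the question; pull() is the dplyr way of getting a column out of a data frame, and the braces in the second variant stop the pipe from also inserting the tibble as the first argument.

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- tibble(id = c(1, 1, 2, 2, 3, 3),
             weights = c(0.3, 0.7, 0.25, 0.75, 0.14, 0.86),
             obs = 6:1)

# Nest weights and obs into one tibble per id (illustrative name)
nested <- df %>% nest(data = c(weights, obs))

# Variant 1: pull the list-column out, then map over its elements
wtd <- nested %>%
  pull(data) %>%
  map_dbl(~ weighted.mean(.x$obs, .x$weights))

# Variant 2: refer to the column as .$data; the braces prevent the pipe
# from passing the whole tibble as the first argument of map_dbl()
wtd_dot <- nested %>%
  { map_dbl(.$data, ~ weighted.mean(.x$obs, .x$weights)) }

wtd
# values: 5.30 3.25 1.14
```

Both variants return a plain numeric vector and drop the id column, so the mutate() approach is usually more convenient when the result should stay next to its id.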

Answer 2

You can split and use map_dbl to return a named vector, or map_df if you want a tibble:

tibble(id = c(1, 1, 2, 2, 3,3), weights = c(0.3,0.7,0.25,0.75,0.14,0.86), obs = 6:1) %>% 
    split.data.frame(.$id) %>% 
    map_dbl(
        ~sum(.x$weights * .x$obs)/sum(.x$weights)
    )
   1    2    3 
5.30 3.25 1.14 
tibble(id = c(1, 1, 2, 2, 3,3), weights = c(0.3,0.7,0.25,0.75,0.14,0.86), obs = 6:1) %>% 
    split.data.frame(.$id) %>% 
    map_df(
        ~sum(.x$weights * .x$obs)/sum(.x$weights)
    )
# A tibble: 1 x 3
    `1`   `2`   `3`
  <dbl> <dbl> <dbl>
1   5.3  3.25  1.14
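
For comparison, the question mentions that the same result can be obtained with group_by. A minimal sketch of that route, assuming the same df as in the question (by_group is an illustrative name), could look like this:

```r
library(dplyr)

df <- tibble(id = c(1, 1, 2, 2, 3, 3),
             weights = c(0.3, 0.7, 0.25, 0.75, 0.14, 0.86),
             obs = 6:1)

# One row per id, with the weighted mean computed within each group
by_group <- df %>%
  group_by(id) %>%
  summarise(wtd_mean = weighted.mean(obs, weights))

by_group$wtd_mean
# values: 5.30 3.25 1.14
```

This avoids the list-column entirely, so it is probably the simpler choice when the nested tibbles are not needed for anything else.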
