How do I specify pivot_wider for an entire dataframe?

I can pivot_wider a specific column with the following:

new_df <- pivot_wider(old_df, names_from = col10, values_from = value_col, values_fn = list)

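I want to pivot_wider every column in the data frame (except the id column). What is the best way to do this? Should I use a loop, or is there a way to have the function take the entire data frame?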

To clarify, with the example data frames below, I can get from old_df to new_df using the pivot_wider call listed above. I now want to get from old_df2 to new_df2.

old_df <- structure(list(id = c("1", "1", "2"), col10 = c("yellow", 
"green", "green"), value_col = c("1", "1", "1")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

old_df2 <- structure(list(id = c("1", "1", "2"), col10 = c("yellow", 
"green", "green"), col11 = c("dog", 
"cat", "dog"), value_col = c("1", "1", "1")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

new_df <- pivot_wider(old_df, names_from = col10, values_from = value_col, values_fn = list)

new_df2 <- structure(list(id = c("1", "2"), yellow = c("1", "NULL"), green = c("1", "1"), dog = c("1", "1"), cat = c("1", "NULL")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))

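If you want a separate column for each value found in those two columns (or in any number of columns), first use pivot_longer to stack those columns into a single name/value pair of columns, then use pivot_wider to spread the values out: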

library(tidyr)
library(dplyr)  # select(), group_by(), summarise(), across() and %>% come from dplyr

old_df2 %>%
  pivot_longer(!c(id, value_col), names_to = "Cols", values_to = "vals") %>%
  pivot_wider(names_from = vals, values_from = value_col) %>%
  select(-Cols) %>%
  group_by(id) %>%
  summarise(across(everything(), ~ sum(as.numeric(.x), na.rm = TRUE)))

# A tibble: 2 x 5
  id    yellow   dog green   cat
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 1          1     1     1     1
2 2          0     1     1     0
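
As a possible variation on the pipeline above, the aggregation can be folded into pivot_wider itself. This is only a sketch: it assumes a tidyr version in which values_fn accepts a single function (the question's own `values_fn = list` call suggests that is the case), and the names `long` and `wide` are made up for illustration.

```r
library(tidyr)
library(dplyr)

# Same example data as old_df2 in the question
old_df2 <- tibble(
  id = c("1", "1", "2"),
  col10 = c("yellow", "green", "green"),
  col11 = c("dog", "cat", "dog"),
  value_col = c("1", "1", "1")
)

# Stack every column except id and value_col, then drop the column-name
# helper so that it does not act as an extra id column in pivot_wider()
long <- old_df2 %>%
  pivot_longer(!c(id, value_col), names_to = "Cols", values_to = "vals") %>%
  select(-Cols) %>%
  mutate(value_col = as.numeric(value_col))

# Sum the values within each id/value cell and fill absent cells with 0
wide <- long %>%
  pivot_wider(
    names_from = vals,
    values_from = value_col,
    values_fn = sum,
    values_fill = 0
  )

wide
```

Dropping `Cols` before widening is what keeps the result at one row per id, so the separate group_by/summarise step is no longer needed.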

Update 1

Following your update, here is a data.table option:

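library(data.table)

dcast(
  # setDT() converts old_df2 to a data.table by reference
  melt(setDT(old_df2),
    id.var = "id",
    # the backslash must be doubled inside an R string
    measure.vars = patterns("^col\\d+")
  ),
  id ~ value,
  fun.aggregate = length,
  fill = NA
)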

This gives

   id cat dog green yellow
1:  1   1   1     1      1
2:  2  NA   1     1     NA
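
If zeros are preferred over NA for the missing combinations, one possible tweak is to leave out `fill`, in which case dcast should fall back to the aggregate of an empty group (here `length` of nothing, i.e. 0). The sketch below is an assumption-laden variation rather than the answer's own code: it builds a fresh data.table named `dt` (a made-up name) instead of converting the tibble in place, and it lists the measure columns by name.

```r
library(data.table)

# Same example data as old_df2 in the question, built as a data.table
dt <- data.table(
  id = c("1", "1", "2"),
  col10 = c("yellow", "green", "green"),
  col11 = c("dog", "cat", "dog"),
  value_col = c("1", "1", "1")
)

# Stack the two category columns into a single "value" column
long <- melt(dt, id.vars = "id", measure.vars = c("col10", "col11"))

# Count occurrences per id and value; with no fill argument, empty cells
# take the result of length() on a zero-length vector
wide <- dcast(long, id ~ value, fun.aggregate = length, value.var = "value")

wide
```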

Are you looking for something like the following?

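reshape(
  transform(
    # reshape() is written for plain data frames, so drop the tibble class first
    as.data.frame(old_df2),
    # q numbers the rows within each id (1, 2, ...)
    q = ave(id, id, FUN = seq_along)
  ),
  direction = "wide",
  idvar = "id",
  timevar = "q"
)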

The output is

  id col10.1 col11.1 value_col.1 col10.2 col11.2 value_col.2
1  1  yellow     dog           1   green     cat           1
3  2   green     dog           1    <NA>    <NA>        <NA>
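
If the goal is instead the indicator layout of new_df2 from the question, a base R cross-tabulation is one way to get there. This is a minimal sketch under assumptions: the data is rebuilt as a plain data frame called `old` (a made-up name), and the two category columns are stacked by hand rather than discovered automatically.

```r
# Same example data as old_df2 in the question, as a plain data frame
old <- data.frame(
  id = c("1", "1", "2"),
  col10 = c("yellow", "green", "green"),
  col11 = c("dog", "cat", "dog"),
  value_col = c("1", "1", "1"),
  stringsAsFactors = FALSE
)

# Stack the category columns, repeating id once per stacked column
long <- data.frame(
  id = rep(old$id, times = 2),
  value = c(old$col10, old$col11)
)

# Cross-tabulate id against value; absent combinations are counted as 0
counts <- table(long$id, long$value)

# Convert the contingency table to a wide data frame (ids become row names)
wide <- as.data.frame.matrix(counts)

wide
```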

You can combine the columns, unnest them, and then use pivot_wider:

library(tidyr)
library(dplyr)

old_df2 <- structure(list(
  id = c("1", "1", "2"),
  col10 = c("yellow", "green", "green"),
  col11 = c("dog", "cat", "dog"),
  value_col = c("1", "1", "1")
), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

old_df2 %>% 
  mutate(new_col = strsplit(paste(col10, col11, sep = "_"), "_"), .keep = "unused") %>% 
  unnest(new_col) %>% 
  pivot_wider(names_from = new_col, values_from = value_col)

#> # A tibble: 2 x 5
#>   id    yellow dog   green cat  
#>   <chr> <chr>  <chr> <chr> <chr>
#> 1 1     1      1     1     1    
#> 2 2     <NA>   1     1     <NA>

Created on 2021-08-25 by the reprex package (v2.0.1)