How do I specify pivot_wider for an entire dataframe?
I can pivot_wider a specific column with the following:
new_df <- pivot_wider(old_df, names_from = col10, values_from = value_col, values_fn = list)
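I want to pivot_wider every column in the data frame (except the id column). What is the best way to do this? Should I use a loop, or is there a way to make the function take the whole data frame?
To clarify: with the example data frames below, I can go from old_df to new_df using the pivot_wider call above. I now want to go from old_df2 to new_df2.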
old_df <- structure(list(id = c("1", "1", "2"), col10 = c("yellow",
"green", "green"), value_col = c("1", "1", "1")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
old_df2 <- structure(list(id = c("1", "1", "2"), col10 = c("yellow",
"green", "green"), col11 = c("dog",
"cat", "dog"), value_col = c("1", "1", "1")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
new_df <- pivot_wider(old_df, names_from = col10, values_from = value_col, values_fn = list)
new_df2 <- structure(list(id = c("1", "2"), yellow = c("1", "NULL"), green = c("1", "1"), dog = c("1", "1"), cat = c("1", "NULL")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))
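If you want a separate column for every distinct value found in those two columns (or in any number of columns), first use pivot_longer to stack the values of all those columns into a single column, then use pivot_wider to spread them out again: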
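library(tidyr)
library(dplyr)  # needed for select(), group_by(), summarise(), across()
old_df2 %>%
  # stack every column except id and value_col into name/value pairs
  pivot_longer(!c(id, value_col), names_to = "Cols", values_to = "vals") %>%
  # one column per distinct value; Cols still acts as an id column here
  pivot_wider(names_from = vals, values_from = value_col) %>%
  select(-Cols) %>%
  # collapse the per-Cols rows into one row per id
  group_by(id) %>%
  summarise(across(everything(), ~ sum(as.numeric(.x), na.rm = TRUE)))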
# A tibble: 2 x 5
  id    yellow   dog green   cat
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 1          1     1     1     1
2 2          0     1     1     0
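If missing combinations should stay NA rather than become 0, one possible variation is to drop the helper column before widening, which makes the summarise step unnecessary. This is only a sketch under an assumption: each value occurs at most once per id, as it does in old_df2. With repeated values pivot_wider would warn and return list-columns. The names source_col, level and wide are made up for illustration.

```r
library(tidyr)
library(dplyr)

# Same sample data as old_df2 in the question
old_df2 <- tibble(
  id = c("1", "1", "2"),
  col10 = c("yellow", "green", "green"),
  col11 = c("dog", "cat", "dog"),
  value_col = c("1", "1", "1")
)

wide <- old_df2 %>%
  # stack col10, col11, ... into a single "level" column
  pivot_longer(!c(id, value_col), names_to = "source_col", values_to = "level") %>%
  # drop the originating column name so that id is the only id column
  select(-source_col) %>%
  # one column per distinct level; absent combinations are left as NA
  pivot_wider(names_from = level, values_from = value_col)

wide
```

The values stay character because value_col is character in the sample data. Convert them afterwards if numbers are needed.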
Update 1
Following your update, here is a data.table option:
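library(data.table)
dcast(
  # setDT() converts old_df2 to a data.table by reference
  melt(setDT(old_df2),
    id.vars = "id",
    # the backslash must be doubled inside an R string
    measure.vars = patterns("^col\\d+")
  ),
  id ~ value,
  fun.aggregate = length,
  fill = NA
)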
This gives
   id cat dog green yellow
1:  1   1   1     1      1
2:  2  NA   1     1     NA
Are you looking for something like the following?
reshape(
  transform(
    old_df2,
    # running index of each row within its id
    q = ave(id, id, FUN = seq_along)
  ),
  direction = "wide",
  idvar = "id",
  timevar = "q"
)
The output is
  id col10.1 col11.1 value_col.1 col10.2 col11.2 value_col.2
1  1  yellow     dog           1   green     cat           1
3  2   green     dog           1    <NA>    <NA>        <NA>
You can combine the columns into a list-column and unnest it, followed by pivot_wider:
library(tidyr)
library(dplyr)
old_df2 <- structure(list(id = c("1", "1", "2"), col10 = c("yellow",
"green", "green"), col11 = c("dog",
"cat", "dog"), value_col = c("1", "1", "1")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
old_df2 %>%
  mutate(new_col = strsplit(paste(col10, col11, sep = "_"), "_"), .keep = "unused") %>%
  unnest(new_col) %>%
  pivot_wider(names_from = new_col, values_from = value_col)
#> # A tibble: 2 x 5
#>   id    yellow dog   green cat
#>   <chr> <chr>  <chr> <chr> <chr>
#> 1 1     1      1     1     1
#> 2 2     <NA>   1     1     <NA>
Created on 2021-08-25 by the reprex package (v2.0.1)