Group by, summarize, spread in R not working
I have a data frame that looks like this:
ID Code Desc
1 0A Red
1 NA Red
2 1A Blue
3 2B Green
First, I want to create a new column that concatenates the values of the Code column for rows that share the same ID. So:
ID Combined_Code Desc
1 0A | NA Red
2 1A Blue
3 2B Green
Then I want to take the original Code column and spread it. In this case, the values would be the number of times each code appears for a given ID. So:
ID Combined_Code 0A NA 1A 2B Desc
1 0A | NA 1 1 0 0 Red
2 1A 0 0 1 0 Blue
3 2B 0 0 0 1 Green
I tried:
sample_data %>%
group_by(ID) %>%
summarise(Combined_Code = paste(unique(Combined_Code), collapse ='|'))
This works for creating the concatenation. However, I can't get it to work together with spread:
sample_data %>%
group_by(ID) %>%
summarise(Combined_Code = paste(unique(Combined_Code), collapse ='|'))
sample_data <- spread(count(sample_data, ID, Combined_Code, Desc., Code), Code, n, fill = 0)
Doing this spreads the codes, but drops the concatenation. I also tried this with filter instead of summarise, and it gave the same result:
ID Combined_Code 0A NA 1A 2B Desc
1 0A 1 0 0 0 Red
1 NA 0 1 0 0 Red
2 1A 0 0 1 0 Blue
3 2B 0 0 0 1 Green
Finally, I tried piping the summarise result into spread:
sample_data %>%
group_by(ID) %>%
summarise(Combined_Code = paste(unique(Combined_Code), collapse ='|')) %>%
spread(count(sample_data, ID, Combined_Code, Desc., Code), Code, n, fill = 0)
This results in an error:
Error: `var` must evaluate to a single number or a column name, not a list
Run `rlang::last_error()` to see where the error occurred.
What can I do to fix these issues?
We can do a grouped paste:
library(dplyr)
library(stringr)
df1 %>%
group_by(ID, Desc) %>%
summarise(Combined_Code = str_c(Code, collapse="|"))
# A tibble: 3 x 3
# Groups: ID [3]
# ID Desc Combined_Code
# <int> <chr> <chr>
#1 1 Red 0A|0B
#2 2 Blue 1A
#3 3 Green 2B
For the second case, create a 'val' column of 1s, paste the 'Code' elements after grouping by 'ID' and 'Desc', and then use pivot_wider from tidyr to reshape from 'long' to 'wide' format.
library(tidyr)
df1 %>%
mutate(val = 1) %>%
group_by(ID, Desc) %>%
mutate(Combined_Code = str_c(Code, collapse="|")) %>%
pivot_wider(names_from = Code, values_from = val, values_fill = list(val = 0))
# A tibble: 3 x 7
# Groups: ID, Desc [3]
# ID Desc Combined_Code `0A` `0B` `1A` `2B`
# <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 1 Red 0A|0B 1 1 0 0
#2 2 Blue 1A 0 0 1 0
#3 3 Green 2B 0 0 0 1
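As for the error in the question's last attempt: when the summarised tibble is piped into spread(), it becomes the first (data) argument, so the count(...) call appears to land in the key position, which would explain why `var` is reported as a list. A minimal sketch of a count/spread pipeline that keeps the concatenation, assuming the same df1 as in the Data section (spread() is superseded in current tidyr but still available):

```r
library(dplyr)
library(tidyr)

# Same data as df1 in the Data section, repeated here so the sketch is self-contained
df1 <- data.frame(
  ID   = c(1L, 1L, 2L, 3L),
  Code = c("0A", "0B", "1A", "2B"),
  Desc = c("Red", "Red", "Blue", "Green")
)

res <- df1 %>%
  group_by(ID, Desc) %>%
  # build the concatenated column first, keeping one row per original row
  mutate(Combined_Code = paste(Code, collapse = "|")) %>%
  ungroup() %>%
  # count each code per ID while carrying Combined_Code along
  count(ID, Desc, Combined_Code, Code) %>%
  # key and value are bare column names; the data comes from the pipe
  spread(Code, n, fill = 0)

res
```

The key point is that the counting step sits inside the pipeline, so spread() only receives column names for its key and value arguments.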
The OP's expected output is
ID Combined_Code 0A 0B 1A 2B Desc
1 0A | 0B 1 1 0 0 Red
2 1A 0 0 1 0 Blue
3 2B 0 0 0 1 Green
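The question asks for the number of times each code appears per ID. The val = 1 approach gives that when each ID/Code pair occurs only once. If a code can repeat within an ID, one option is to let pivot_wider sum the 1s. This is only a sketch: df_dup is a hypothetical data set made up for illustration, and passing a bare function to values_fn assumes tidyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: code "0A" appears twice for ID 1
df_dup <- data.frame(
  ID   = c(1L, 1L, 1L, 2L),
  Code = c("0A", "0A", "0B", "1A"),
  Desc = c("Red", "Red", "Red", "Blue")
)

res_counts <- df_dup %>%
  group_by(ID, Desc) %>%
  # unique() keeps each code once in the concatenated label
  mutate(Combined_Code = paste(unique(Code), collapse = "|"), val = 1) %>%
  ungroup() %>%
  # values_fn = sum adds up the 1s for repeated ID/Code pairs
  pivot_wider(names_from = Code, values_from = val,
              values_fn = sum, values_fill = 0)

res_counts
```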
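Update
For the updated dataset, 'Code' contains NA elements. By default, str_c returns NA if any of the elements is NA, whereas paste converts the NA to the string "NA" and keeps it alongside the other elements. Here, we replace str_c with paste: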
df2 %>%
mutate(val = 1) %>%
group_by(ID, Desc) %>%
mutate(Combined_Code = paste(Code, collapse="|")) %>%
pivot_wider(names_from = Code, values_from = val, values_fill = list(val = 0))
# A tibble: 3 x 7
# Groups: ID, Desc [3]
# ID Desc Combined_Code `0A` `NA` `1A` `2B`
# <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 1 Red 0A|NA 1 1 0 0
#2 2 Blue 1A 0 0 1 0
#3 3 Green 2B 0 0 0 1
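The difference between str_c and paste on NA input can be checked directly on a small vector. The variable names below are illustrative only; str_replace_na is shown as one possible way to keep using str_c.

```r
library(stringr)

codes <- c("0A", NA)

# str_c propagates the missing value, so the collapsed result is NA
na_result <- str_c(codes, collapse = "|")

# paste coerces NA to the string "NA" before collapsing
paste_result <- paste(codes, collapse = "|")

# str_replace_na turns NA into "NA" first, so str_c then behaves like paste
stringr_result <- str_c(str_replace_na(codes), collapse = "|")

na_result
paste_result
stringr_result
```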
Data
df1 <- structure(list(ID = c(1L, 1L, 2L, 3L),
                      Code = c("0A", "0B", "1A", "2B"),
                      Desc = c("Red", "Red", "Blue", "Green")),
                 class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(ID = c(1L, 1L, 2L, 3L),
                      Code = c("0A", NA, "1A", "2B"),
                      Desc = c("Red", "Red", "Blue", "Green")),
                 class = "data.frame", row.names = c(NA, -4L))