Is there an R function to add columns by multiplying the value of another column?
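I want to run an association analysis, but I need to convert my data frame into the right format, one that shows only transactions.
1) How can I multiply my "Sub-Category" column by the number in the "Quantity" column?
2) How can I group the transactions by Order ID?
I have this df: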
`Order ID` `Sub-Category` `Quantity`
<chr> <chr> <dbl>
1 CA-2017-152156 Bookcases 2
2 CA-2017-152156 Chairs 3
3 CA-2017-138688 Labels 2
1) I want this:
`Order ID` `Sub-Category` `Sub-Category2` `Sub-Category3`
<chr> <chr> <chr> <chr>
1 CA-2017-152156 Bookcases Bookcases NULL
2 CA-2017-152156 Chairs Chairs Chairs
3 CA-2017-138688 Labels Labels NULL
(Afterwards I want to merge the rows that share an Order ID, e.g. rows 1 and 2. Do you have a tip for that?)
Thanks!
The following answers point 1).
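Max <- max(df1$Quantity)
# for each row, repeat the Sub-Category Quantity times and pad with NA up to Max
res <- lapply(seq_len(nrow(df1)), function(i){
  c(rep(as.character(df1[i, 2]), df1[i, 3]), rep(NA, Max - df1[i, 3]))
})
# bind the rows into a matrix and attach the Order ID column
res <- cbind(df1[1], do.call(rbind, res))
# prefix the new column names with "Sub-Category"
names(res)[-1] <- paste0(names(df1)[2], names(res)[-1])
res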
# Order ID Sub-Category1 Sub-Category2 Sub-Category3
#1 CA-2017-152156 Bookcases Bookcases <NA>
#2 CA-2017-152156 Chairs Chairs Chairs
#3 CA-2017-138688 Labels Labels <NA>
Data in dput format.
df1 <-
structure(list(`Order ID` = structure(c(2L, 2L, 1L),
.Label = c("CA-2017-138688", "CA-2017-152156"),
class = "factor"), `Sub-Category` = structure(1:3,
.Label = c("Bookcases", "Chairs", "Labels"), class =
"factor"), Quantity = c(2L, 3L, 2L)), class = "data.frame",
row.names = c("1", "2", "3"))
To answer question 1) with tidyverse, one way is to create a new column that repeats every Sub-Category Quantity times and stores the result as a comma-separated string, then separate it into n columns.
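library(tidyverse)
# df is the data frame from the question
n <- max(df$Quantity)
df1 <- df %>%
  # repeat each Sub-Category Quantity times and join the copies with commas
  mutate(new = map2_chr(`Sub-Category`, Quantity, ~paste(rep(.x, .y), collapse = ","))) %>%
  # split on the comma only; rows with fewer than n copies are padded with NA
  separate(new, paste("Sub-Category", seq_len(n)), sep = ",", fill = "right") %>%
  select(-`Sub-Category`)
df1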
# Order ID Quantity Sub-Category 1 Sub-Category 2 Sub-Category 3
#1 CA-2017-152156 2 Bookcases Bookcases <NA>
#2 CA-2017-152156 3 Chairs Chairs Chairs
#3 CA-2017-138688 2 Labels Labels <NA>
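To make the intermediate comma-separated string visible, here is a minimal base R sketch of the same repeat, paste, split idea. It is not the answer's code: the vectors `sub_category` and `quantity` are illustrative stand-ins for the two columns in the question, and padding is done by indexing past the end of each piece.

```r
# Illustrative stand-ins for the Sub-Category and Quantity columns.
sub_category <- c("Bookcases", "Chairs", "Labels")
quantity <- c(2L, 3L, 2L)
n <- max(quantity)

# Step 1: repeat each label Quantity times and collapse into one string.
new <- mapply(function(x, y) paste(rep(x, y), collapse = ","),
              sub_category, quantity, USE.NAMES = FALSE)
new
# "Bookcases,Bookcases" "Chairs,Chairs,Chairs" "Labels,Labels"

# Step 2: split on the comma; indexing 1..n pads short pieces with NA.
pieces <- strsplit(new, ",", fixed = TRUE)
wide <- t(vapply(pieces, function(p) p[seq_len(n)], character(n)))
colnames(wide) <- paste("Sub-Category", seq_len(n))
wide
```

The result is a character matrix with one row per input row and n columns, which can be bound to the Order ID column with cbind.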
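As for question 2), I'm not 100% clear on what you are looking for (since there is no expected output), but I think you want to group_by Order ID and collapse each group's categories into one row?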
df1 %>%
group_by(`Order ID`) %>%
summarise_at(vars(starts_with("Sub")), list(~paste(na.omit(.), collapse = ",")))
# A tibble: 2 x 4
# `Order ID` `Sub-Category 1` `Sub-Category 2` `Sub-Category 3`
# <fct> <chr> <chr> <chr>
#1 CA-2017-138688 Labels Labels ""
#2 CA-2017-152156 Bookcases,Chairs Bookcases,Chairs Chairs
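Since the stated goal is a transaction format, another way to read question 2) is one list element per order. The sketch below is an assumption about what is wanted, not something the answers above propose. It rebuilds the sample data with character columns and uses only base R; the names `items`, `orders` and `transactions` are illustrative.

```r
# Sample data from the question, rebuilt here with character columns.
df1 <- data.frame(
  `Order ID` = c("CA-2017-152156", "CA-2017-152156", "CA-2017-138688"),
  `Sub-Category` = c("Bookcases", "Chairs", "Labels"),
  Quantity = c(2L, 3L, 2L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Repeat each item Quantity times, keeping the matching Order ID alongside.
items  <- rep(df1[["Sub-Category"]], times = df1$Quantity)
orders <- rep(df1[["Order ID"]], times = df1$Quantity)

# One list element per order, holding every item in that order.
transactions <- split(items, orders)
transactions
```

A list shaped like this is, to my knowledge, the kind of input the arules package can coerce with as(transactions, "transactions"); that step depends on your workflow and is not shown here.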