Reshaping columns of the same dataframe into one
I have a df that looks like this:
Department  ID   Category.Department  Category.ID
NA          NA   NA                   NA
Sales       101  2                    4
Sales       101  2                    4
NA          NA   NA                   NA
Sales       101  2                    4
Sales       101  2                    4
NA          NA   NA                   NA
Sales       101  2                    4
Sales       101  2                    4
df <- data.frame(Department = rep(c(NA, 'Sales', 'Sales'), times = 3),
                 ID = rep(c(NA, 101, 101), times = 3),
                 Category.Department = rep(c(NA, 2, 2), times = 3),
                 Category.ID = rep(c(NA, 4, 4), times = 3), stringsAsFactors = FALSE)
I want output like this, where a single column holds both Department and ID, and another column holds Category. The grouping by NA rows in each column is important.
New.Col Category
NA NA
Sales 2
101 4
NA NA
Sales 2
101 4
NA NA
Sales 2
101 4
So far I have tried using transpose, sapply, and function, but it did not work as I expected. Are there any suggestions in base R?
I can't accept an answer that doesn't produce the actual expected output.
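# Label rows within each block of three: 1 = NA separator, 2 and 3 = duplicate data rows
df$group <- rep(1:3, times = 3)
# Drop the duplicate row (group 3), then stack columns 1-2 into New.col and columns 3-4 into Category
df2 <- reshape(df[df$group != 3, ], direction = "long", varying = list(New.col = c(1, 2), Category = c(3, 4)),
               idvar = "id", v.names = c("New.col", "Category"))
# Sort by id so the two stacked rows of each original row sit together
df3 <- df2[order(df2$id), ]
# Every NA row was stacked twice; keep only one copy of it
df3[!(df3$time == 1 & df3$group == 1), c("New.col", "Category")]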
New.col Category
1.2 <NA> NA
2.1 Sales 2
2.2 101 4
3.2 <NA> NA
4.1 Sales 2
4.2 101 4
5.2 <NA> NA
6.1 Sales 2
6.2 101 4
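For comparison, here is a minimal base R sketch that builds the same output without reshape, by splitting the data at the all-NA rows. The helper name stack_block and the variables all_na and block are made up for this illustration. It assumes every block starts with an all-NA row and that the data rows inside a block are duplicates, as in the example data.

```r
df <- data.frame(
  Department          = rep(c(NA, "Sales", "Sales"), times = 3),
  ID                  = rep(c(NA, 101, 101), times = 3),
  Category.Department = rep(c(NA, 2, 2), times = 3),
  Category.ID         = rep(c(NA, 4, 4), times = 3),
  stringsAsFactors    = FALSE
)

# A new block starts at every row where all columns are NA.
all_na <- rowSums(is.na(df)) == ncol(df)
block  <- cumsum(all_na)

# Hypothetical helper: for one block, emit an NA separator row followed by
# the Department and ID values taken from the first non-NA row of the block.
stack_block <- function(b) {
  keep  <- rowSums(is.na(b)) != ncol(b)
  first <- b[keep, , drop = FALSE][1, ]
  data.frame(
    New.Col  = c(NA, first$Department, as.character(first$ID)),
    Category = c(NA, first$Category.Department, first$Category.ID),
    stringsAsFactors = FALSE
  )
}

result <- do.call(rbind, lapply(split(df, block), stack_block))
rownames(result) <- NULL
result
```

Because Department is character and ID is numeric, ID is converted with as.character before the two are combined, so New.Col ends up as a character column.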
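Here is an approach that differs from converting to long format; it relies on coalesce. In addition, I created a group variable and dropped the NA rows, since they are of no use in your analysis, i.e.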
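library(tidyverse)
df %>%
  # Start a new group at every all-NA row (assumes df has only its four original columns)
  group_by(grp = cumsum(rowSums(is.na(.)) == ncol(.))) %>%
  # Within each group, shift the ID columns down and the Department columns up
  mutate(across(contains('ID'), lag)) %>%
  mutate(across(contains('Department'), lead)) %>%
  # Take Department where present, otherwise ID (converted to character)
  mutate(new.col = coalesce(Department, as.character(ID)),
         category = coalesce(Category.Department, Category.ID)) %>%
  select(grp, new.col, category) %>%
  distinct()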
This gives:
# A tibble: 6 x 3
# Groups: grp [3]
grp new.col category
<int> <chr> <dbl>
1 1 Sales 2
2 1 101 4
3 2 Sales 2
4 2 101 4
5 3 Sales 2
6 3 101 4
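Since the question says the NA grouping matters, the separator rows can be added back if needed. This is only a sketch: res is rebuilt by hand from the printed result above, and the names res and with_na are made up for the illustration. It assumes a dplyr version that provides group_modify.

```r
library(dplyr)

# The result shown above, rebuilt by hand for a self-contained example.
res <- tibble(
  grp      = rep(1:3, each = 2),
  new.col  = rep(c("Sales", "101"), times = 3),
  category = rep(c(2, 4), times = 3)
)

# Put one all-NA row in front of each group, then drop the helper column.
with_na <- res %>%
  group_by(grp) %>%
  group_modify(~ bind_rows(tibble(new.col = NA_character_, category = NA_real_), .x)) %>%
  ungroup() %>%
  select(new.col, category)

with_na
```

The typed NA values (NA_character_ and NA_real_) keep the column types unchanged when the separator row is bound to each group.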