Reshaping columns of the same dataframe into one

Question

I have a df that looks like this:

Department   ID    Category.Department   Category.ID
    NA       NA            NA                NA
   Sales     101           2                 4
   Sales     101           2                 4
    NA       NA            NA                NA
   Sales     101           2                 4
   Sales     101           2                 4
    NA       NA            NA                NA
   Sales     101           2                 4
   Sales     101           2                 4

df = data.frame(Department = rep(c(NA, 'Sales', 'Sales'), times = 3),
                ID = rep(c(NA, 101, 101), times = 3),
                Category.Department = rep(c(NA, 2, 2), times = 3),
                Category.ID = rep(c(NA, 4, 4), times = 3), stringsAsFactors = FALSE)

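I want output like this, where one column holds both Department and ID and another column holds Category. The NA rows that separate the groups in each column are important and must be kept.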

New.Col   Category
  NA         NA
 Sales       2
  101        4
  NA         NA
 Sales       2
  101        4
  NA         NA
 Sales       2
  101        4

So far I have tried using transpose, sapply and a function, but it did not work as I expected. Any suggestions in base R?

Answer 1

I couldn't take the accept without matching the true expected output, so here is a version that does:


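# Label each row by its position within a block of three:
# 1 = NA separator row, 2 and 3 = the two duplicated data rows
df$group <- rep(1:3, times = 3)

# Drop the duplicated row (group 3), then stack Department/ID into New.col
# and Category.Department/Category.ID into Category
df2 <- reshape(df[df$group != 3,], direction = "long", varying = list(New.col = c(1,2), Category = c(3,4)),
               idvar = "id", v.names = c("New.col", "Category"))

# Sort so the two stacked rows of each original row sit next to each other
df3 <- df2[order(df2$id),]

# Each NA separator was stacked twice; drop one copy and keep New.col and Category
df3[!(df3$time == 1 & df3$group == 1), c(3,4)]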

    New.col Category
1.2    <NA>       NA
2.1   Sales        2
2.2     101        4
3.2    <NA>       NA
4.1   Sales        2
4.2     101        4
5.2    <NA>       NA
6.1   Sales        2
6.2     101        4
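
For comparison, the same output can also be built in base R without `reshape`. This is only a sketch, and it relies on the same assumption as the `rep(1:3, times = 3)` line above: every block is one all-NA row followed by two identical data rows. The names `pos` and `res` are made up for this illustration.

```r
# Rebuild the example data from the question.
df <- data.frame(Department = rep(c(NA, "Sales", "Sales"), times = 3),
                 ID = rep(c(NA, 101, 101), times = 3),
                 Category.Department = rep(c(NA, 2, 2), times = 3),
                 Category.ID = rep(c(NA, 4, 4), times = 3),
                 stringsAsFactors = FALSE)

# Assumption: rows come in blocks of three (NA row, data row, duplicate data row).
pos <- rep(1:3, times = nrow(df) / 3)

# Positions 1 and 2 take the Department pair (position 1 is the NA separator),
# position 3 takes the ID pair instead.
res <- data.frame(
  New.Col  = ifelse(pos == 3, as.character(df$ID), df$Department),
  Category = ifelse(pos == 3, df$Category.ID, df$Category.Department),
  stringsAsFactors = FALSE
)
res
```

If the real data does not follow the three-row pattern, `pos` would have to be derived differently, for example from the positions of the all-NA rows.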

Answer 2

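Here is a different approach from converting to long format, one that relies on coalesce. In addition, I created a group variable and removed the NA rows, since they are of no use in your analysis, i.e.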

library(tidyverse)

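df %>% 
 # Start a new group at every row in which all columns are NA
 group_by(grp = cumsum(rowSums(is.na(.)) == ncol(.))) %>% 
 # Shift the ID columns down and the Department columns up within each group,
 # so that each row has a value in only one of the two sets.
 # (lag and lead are passed directly; funs() is deprecated in current dplyr.)
 mutate_at(vars(contains('ID')), lag) %>% 
 mutate_at(vars(contains('Department')), lead) %>% 
 # Take the first non-missing value from each pair of columns
 mutate(new.col = coalesce(Department, as.character(ID)), 
        category = coalesce(Category.Department, Category.ID)) %>% 
 select(grp, new.col, category) %>% 
 distinct()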

which gives,

# A tibble: 6 x 3
# Groups:   grp [3]
    grp new.col category
  <int> <chr>      <dbl>
1     1 Sales          2
2     1 101            4
3     2 Sales          2
4     2 101            4
5     3 Sales          2
6     3 101            4
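
To see why the columns are shifted before calling coalesce, it may help to run that step on a single group by hand. The snippet below is only an illustration of the idea, assuming dplyr is installed; the vector names `dept`, `id` and `new_col` are made up for this example.

```r
library(dplyr)

# One group from the example: the NA separator row followed by two data rows.
dept <- c(NA, "Sales", "Sales")
id   <- c(NA, 101, 101)

# lead() shifts values up by one position, lag() shifts them down by one,
# so after shifting each position has a value in only one of the two vectors.
shifted_dept <- lead(dept)
shifted_id   <- lag(id)

# coalesce() takes, element by element, the first non-missing value.
new_col <- coalesce(shifted_dept, as.character(shifted_id))
new_col
```

The result holds "Sales" twice followed by "101", which is why the pipeline ends with distinct() to drop the repeated row.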
