How to Reshape DF with categorical variables from long to wide in R?

Question

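I am new to reshaping data frames. I have a df that I would like to make wider so I can use it for analyses such as clustering and NMDS. I have found several questions (and answers) about reshaping data that is mostly quantitative (using aggregation functions), but in my case all of the variables are categorical.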

Since my df has a thousand rows and dozens of columns, I created a toy df as an example. It looks like this:

df <- data.frame(
  id      = c("a", "c", "a", "b", "d", "c", "e", "d", "c", "a", "a", "e", "a", "b", "d"),
  color   = c("red", "blue", "gray", "yellow", "green", "green", "blue", "purple",
              "black", "green", "yellow", "blue", "red", "yellow", "gray"),
  fruit   = c("apple", "orange", "avocado", "strawberry", "banana", "apple",
              "orange", "avocado", "strawberry", "banana", "banana", "strawberry",
              "watermelon", "lemon", "lemon"),
  country = c("Italy", "Spain", "Brazil", "Brazil", "Australia", "Italy",
              "Japan", "India", "USA", "Mexico", "USA", "Mexico", "Spain",
              "France", "France"),
  animal  = c("alligator", "camel", "alligator", "bat", "dolphin", "camel",
              "elephant", "dolphin", "camel", "alligator", "alligator",
              "elephant", "alligator", "bat", "dolphin"))

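I would like the "id" column to be the first column of the reshaped data frame and "animal" the second, followed by the levels of "color", "fruit", and "country". The main point is that I want those levels to be separate columns.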

The code below shows some of the attempts I have made:

library(reshape2)  # provides dcast()

df <- dplyr::mutate_if(df, is.character, as.factor)
attach(df)

dcast(df, id ~ color, value.var = "id") #The output is exactly what I wanted!

dcast(df, id + animal ~ color, value.var = "id") #Exactly what I wanted!

dcast(df, id + animal ~ fruit, value.var = "id") #Exactly what I wanted!

dcast(df, id ~ country, value.var = "id") #Not the output I wanted. Only "works well" if I specify "fun.aggregate=length". Why?

dcast(df, id ~ color + country, value.var = "id") #Not the output I wanted.

dcast(df, id + animal ~ color + country, value.var = "id") #Not the output I wanted.

dcast(df, id + animal ~ color + country + fruit, value.var = "id") #Not the output I wanted.

The reshaped df I expect should look like this:

[Image: expected reshaped data frame]

To get there, I tried all of the commands below, but none of them worked well:

dcast(df, id + animal ~ color + country + fruit, fun.aggregate=length)

dcast(df, id + animal ~ c(color, country, fruit), fun.aggregate=length)

dcast(df, id + animal ~ c("color", "country", "fruit"), fun.aggregate=length)

dcast(df, id + animal ~ color:fruit, fun.aggregate=length)

I also tried to do this with tidyr::pivot_wider, without success.

Is there a way to reach my goal with reshape2::dcast, tidyr::pivot_wider, or any other function in R? I would appreciate any help. Thanks in advance.

Answer 1

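First, you need pivot_longer to stack the columns whose values will become the new column names. I then arranged the rows by those future column names, so the words are grouped as in your image, and used pivot_wider. That step drops the animal column, so I joined it back and then arranged by id so the observations are in the same order as in your image.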

library(dplyr)
library(tidyr)

pivot_longer(df, cols = color:country, names_to = "variable",
             values_to = "value") %>%                       # column names to rows
  arrange(variable, value) %>%                              # organize future column names
  pivot_wider(id_cols = id,                                 # name id_cols explicitly; id is the row key
              names_from = value, values_from = animal,
              values_fn = list(animal = length), values_fill = 0) %>%
  left_join(distinct(df[, c("id", "animal")]), by = "id") %>%  # add animals back
  select(id, animal, everything()) %>%                      # rearrange columns
  arrange(id)                                               # reorder observations
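
To see what the first step produces, it may help to look at the intermediate long table on a smaller example. The sketch below uses a hypothetical three-row data frame (toy, which is not part of the question) and only assumes that tidyr is installed. Each original row becomes one row per stacked column, with the column name in variable and the category in value.

```r
library(tidyr)

# Hypothetical miniature of the question's data (an assumption, for illustration only)
toy <- data.frame(
  id     = c("a", "a", "b"),
  color  = c("red", "gray", "yellow"),
  fruit  = c("apple", "avocado", "lemon"),
  animal = c("alligator", "alligator", "bat"),
  stringsAsFactors = FALSE
)

# Stack the categorical columns: one row per (original row, stacked column) pair
long <- pivot_longer(toy, cols = color:fruit,
                     names_to = "variable", values_to = "value")

print(long)
```

Every entry of value is a future column name, which is why the rows are arranged by it before pivot_wider is called.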

Update based on your comment - ordered by color, fruit, country

I added a mutate and modified the first arrange and the pivot_wider:

library(dplyr)
library(tidyr)

pivot_longer(df, cols = color:country, names_to = "variable",
             values_to = "value") %>%                # future col names to rows
  mutate(ordering = ifelse(variable == "color", 1,   # create organizer variable
                           ifelse(variable == "fruit", 2, 3))) %>%
  arrange(ordering, value) %>%                       # organize future column order
  pivot_wider(id_cols = id,                          # make it wide; id is the row key
              names_from = value,
              values_from = animal,
              values_fn = list(animal = length),
              values_fill = 0) %>%
  left_join(distinct(df[, c("id", "animal")]), by = "id") %>%  # add the animals back
  select(id, animal, everything()) %>%               # move animals to 2nd position
  arrange(id)                                        # reorder observations
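
Since the question also asks about reshape2::dcast, here is a hedged sketch of the same idea with reshape2: melt the three categorical columns into long form first, then cast with variable + value on the right-hand side of the formula. This is an alternative I am assuming fits the goal, not something taken from the original attempts. The resulting column names carry a prefix such as color_red, and they should come out grouped by variable in the order given to measure.vars.

```r
library(reshape2)

# Toy data from the question, kept as character vectors
df <- data.frame(
  id      = c("a", "c", "a", "b", "d", "c", "e", "d", "c", "a", "a", "e", "a", "b", "d"),
  color   = c("red", "blue", "gray", "yellow", "green", "green", "blue", "purple",
              "black", "green", "yellow", "blue", "red", "yellow", "gray"),
  fruit   = c("apple", "orange", "avocado", "strawberry", "banana", "apple",
              "orange", "avocado", "strawberry", "banana", "banana", "strawberry",
              "watermelon", "lemon", "lemon"),
  country = c("Italy", "Spain", "Brazil", "Brazil", "Australia", "Italy",
              "Japan", "India", "USA", "Mexico", "USA", "Mexico", "Spain",
              "France", "France"),
  animal  = c("alligator", "camel", "alligator", "bat", "dolphin", "camel",
              "elephant", "dolphin", "camel", "alligator", "alligator",
              "elephant", "alligator", "bat", "dolphin"),
  stringsAsFactors = FALSE
)

# Long form: one row per (original row, categorical column)
long <- melt(df, id.vars = c("id", "animal"),
             measure.vars = c("color", "fruit", "country"))

# Wide form: count how often each variable/value pair occurs per id
wide <- dcast(long, id + animal ~ variable + value,
              value.var = "value", fun.aggregate = length)

print(dim(wide))
```

Passing fun.aggregate = length explicitly is likely what makes the difference in the original attempts. As far as I know, dcast only falls back to counting (with a message) when some cell holds more than one row, as happens for id ~ color, where a/red occurs twice. For id ~ country every pair is unique, so the cells are filled with the value.var contents and NA instead of counts.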
