In R, melted data works with ggplot. Why does an identical manually constructed data set fail?

I have a data frame that looks like this; let's call it t1 for now:

       D1                D3                D5        
 Min.   :-0.2692   Min.   :-0.4129   Min.   : 2.509  
 1st Qu.: 2.4232   1st Qu.: 2.9288   1st Qu.: 4.731  
 Median : 3.3372   Median : 4.0337   Median : 5.657  
 Mean   : 3.5321   Mean   : 4.1214   Mean   : 5.943  
 3rd Qu.: 4.4551   3rd Qu.: 5.0950   3rd Qu.: 6.935  
 Max.   : 9.2710   Max.   : 9.5757   Max.   :10.604 

I can melt that data frame, and it then looks like this:

   variable    value
1        D1 5.121777
2        D1 7.129591
3        D1 6.568010
4        D1 9.271042
5        D1 6.246738
...      ...   
909      D5 6.323069
910      D5 6.397816
911      D5 6.293596
912      D5 5.167107
913      D5 4.118420
914      D5 5.733515
...      ....

I then add a third column to the melted data based on some grouping, so the final data frame looks like this:

   variable    value   groupBy
1        D1 5.121777  group1
2        D1 7.129591  group1
3        D1 6.568010  group1
4        D1 9.271042  group1
5        D1 6.246738  group2
...      ...   
909      D5 6.323069  group4
910      D5 6.397816  group4
911      D5 6.293596  group4
912      D5 5.167107  group5
913      D5 4.118420  group5
914      D5 5.733515  group5
...      ....

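My goal is to plot something with D1, D5, etc. (the "variable" column in this data frame) on the x-axis, value on the y-axis, and colour split by group. This actually works fine: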

ggplot(final_melt, aes(x = as.numeric(variable), y = value, colour = groupVar)) + geom_smooth(aes(x = as.numeric(variable), y = value), method = 'glm')

Now I want to make a variation on this, so I am creating my own version of the melted data to plot.

  #This is in a loop and just creates "pseudo-melted" data.
  nameSet  <- colnames(result_dfs[[i]])
  meanSet  <- as.numeric(lapply(result_dfs[[i]], mean))
  groupVar <- rep((paste("group", i, sep="")), length(nameSet))
  cBound   <- cbind(nameSet,as.numeric(meanSet),groupVar)
  mean_dat <- rbind(mean_dat, cBound)

  #After the loop, make everything look just like the standard melted dataset.
  colnames(mean_dat) <- c("variable","value","groupVar")
  mean_dat <- data.frame(mean_dat)

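So the manually constructed pseudo-melted data looks like this. I just want the x-axis to show the "variable" categories, with a line running from one condition to the next according to value, and each line coloured by groupVar.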

   variable              value groupVar
1  Ebola_D1   2.08831695477086   group1
2  Ebola_D3   2.54949105549377   group1
3  Ebola_D5   4.15035141230915   group1
4  Ebola_D1 -0.390323691887409   group2
5  Ebola_D3  -1.83541896004176   group2
6  Ebola_D5  -1.12565386663147   group2
7  Ebola_D1  -0.83608582623162   group3
8  Ebola_D3  -7.55858863601214   group3
9  Ebola_D5  -2.52864397283096   group3
10 Ebola_D1  0.457247980555584   group4
11 Ebola_D3  0.957424853791735   group4
12 Ebola_D5   1.17865891001209   group4

First, let's try exactly the same thing:

> ggplot(series_dat, aes(x = as.numeric(variable), y = value, colour 
= groupVar)) + geom_smooth(aes(x = as.numeric(variable), y = value), 
method = 'glm')
    Don't know how to automatically pick scale for object of type 
        list. Defaulting to continuous.
    Don't know how to automatically pick scale for object of type 
        list. Defaulting to continuous.
    Error: stat_smooth requires the following missing aesthetics: y
    In addition: There were 24 warnings (use warnings() to see them)

> warnings()
Warning messages:
1: In fun(x, ...) : NAs introduced by coercion
  .. . . . . 

OK, that doesn't work, so I tried something simpler, just a line plot.

> ggplot(series_dat, aes(x=variable, y=value, group = groupVar)) + 
geom_line(color ="blue")
  Don't know how to automatically pick scale for object of type list. 
      Defaulting to continuous.
  Don't know how to automatically pick scale for object of type list. 
      Defaulting to continuous.
  Don't know how to automatically pick scale for object of type list. 
      Defaulting to continuous.
  Error in order(data$PANEL, data$group, data$x) : 
    argument 3 is not a vector

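I have tried a lot of variations, but I don't understand why this manually created data doesn't work the way the melted data does. I suspect a type problem, but I checked the types of both and everything looks the same. I'd appreciate any insight anyone can offer. Thanks!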

@joran suggested checking str(), so here it is.

This is for the melted data:

'data.frame':   918 obs. of  2 variables:
 $ variable: Factor w/ 3 levels "D1","D3","D5": 1 1 1 1 1 1 1 1 1 1 ...
 $ value   : num  5.12 7.13 6.57 9.27 6.25 ...

This is for the unmelted (manually constructed) data:

'data.frame':   12 obs. of  3 variables:
$ variable:List of 12
 ..$ : chr "Ebola_D1"
 ..$ : chr "Ebola_D3"
 ..$ : chr "Ebola_D5"
 ..$ : chr "Ebola_D1"
 ..$ : chr "Ebola_D3"
 ..$ : chr "Ebola_D5"
 ..$ : chr "Ebola_D1"
 ..$ : chr "Ebola_D3"
 ..$ : chr "Ebola_D5"
 ..$ : chr "Ebola_D1"
 ..$ : chr "Ebola_D3"
 ..$ : chr "Ebola_D5"
$ value   :List of 12
 ..$ : chr "2.08831695477086"
 ..$ : chr "2.54949105549377"
 ..$ : chr "4.15035141230915"
 ..$ : chr "-0.390323691887409"
 ..$ : chr "-1.83541896004176"
 ..$ : chr "-1.12565386663147"
 ..$ : chr "-0.83608582623162"
 ..$ : chr "-7.55858863601214"
 ..$ : chr "-2.52864397283096"
 ..$ : chr "0.457247980555584"
 ..$ : chr "0.957424853791735"
 ..$ : chr "1.17865891001209"
$ groupVar:List of 12
 ..$ : chr "group1"
 ..$ : chr "group1"
 ..$ : chr "group1"
 ..$ : chr "group2"
 ..$ : chr "group2"
 ..$ : chr "group2"
 ..$ : chr "group3"
 ..$ : chr "group3"
 ..$ : chr "group3"
 ..$ : chr "group4"
 ..$ : chr "group4"
 ..$ : chr "group4"

So this is useful, but I'm still not quite sure what to do about it.

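If you expect (or want) the result to be a data frame, be careful with cbind. Except in very specific circumstances, cbind() tends to produce a matrix, and so converts everything to a single type.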
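To make the coercion concrete, here is a minimal sketch. The vectors name_set, mean_set and group_var are hypothetical stand-ins for the nameSet, meanSet and groupVar built in the question's loop, with made-up values.

```r
# Hypothetical stand-ins for the vectors built inside the question's loop.
name_set  <- c("Ebola_D1", "Ebola_D3", "Ebola_D5")
mean_set  <- c(2.09, 2.55, 4.15)
group_var <- rep("group1", length(name_set))

# cbind() on plain vectors returns a matrix, and a matrix holds one type only.
bound <- cbind(name_set, as.numeric(mean_set), group_var)

is.matrix(bound)  # TRUE
typeof(bound)     # "character"
bound[, 2]        # the means are now strings, despite the as.numeric() call
```

Because one of the inputs is a character vector, the numeric means are converted to strings too, so the as.numeric() inside the cbind() call has no lasting effect.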

The safest way to create a data frame from individual vectors is simply to use data.frame().
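As a rough sketch of how the loop from the question could be rewritten along those lines: the result_dfs list below is simulated (an assumption, since the real data isn't shown), and the pieces helper name is made up for illustration. Each group gets its own small data frame via data.frame(), and the pieces are row-bound at the end.

```r
# Simulated stand-in for result_dfs (assumption: a list of all-numeric data frames).
set.seed(1)
result_dfs <- lapply(1:4, function(i) {
  data.frame(Ebola_D1 = rnorm(10), Ebola_D3 = rnorm(10), Ebola_D5 = rnorm(10))
})

# Build one data frame per group; data.frame() keeps each column's own type.
pieces <- lapply(seq_along(result_dfs), function(i) {
  data.frame(
    variable = colnames(result_dfs[[i]]),
    value    = unname(vapply(result_dfs[[i]], mean, numeric(1))),
    groupVar = paste0("group", i),
    stringsAsFactors = FALSE
  )
})

# rbind() on data frames keeps them as a data frame (no matrix coercion).
mean_dat <- do.call(rbind, pieces)
rownames(mean_dat) <- NULL
mean_dat$variable <- factor(mean_dat$variable)

str(mean_dat)  # variable: Factor, value: num, groupVar: chr

# Optional plot, only attempted if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    mean_dat,
    ggplot2::aes(x = variable, y = value, colour = groupVar, group = groupVar)
  ) + ggplot2::geom_line()
}
```

With value stored as a real numeric column and variable as a factor, str(mean_dat) should resemble the str() output of the melted data, which is what ggplot needs to pick its scales.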