Reshaping issues in R: my reshaped dataframe changes 3 variables into 1

Question

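I am new to R and am trying to reshape my data from wide to long format, but I am running into problems. I suspect the cause may be that I built my data.frame from another data.frame I created in R, by taking the means of the large data.frame and putting them into a new one.

What I did was create an empty data.frame (ndf):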

ndf <- data.frame(matrix(ncol = 0, nrow = 3))

Then I used lapply to get the means from the large data.frame (ldf) into separate columns of the new data.frame, using the years from the large data.frame:

ndf$Year <- names(ldf)
ndf$col1 <- lapply(ldf, function(i) {mean(i$col1)})
ndf$col2 <- lapply(ldf, function(i) {mean(i$col2)})
etc.

The melt function from reshape2 apparently does not work, because there are non-atomic 'measure' columns.

To use the base reshape function instead, I used this code:
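The "non-atomic" complaint fits the way ndf was built: lapply always returns a list, so each column assigned from it is a list column rather than a numeric vector. A minimal sketch of building atomic columns with vapply instead; the small ldf below is a hypothetical stand-in (a named list of per-year data frames), not the asker's data:

```r
# Hypothetical stand-in for ldf: a named list with one data frame per year.
ldf <- list(
  "2014" = data.frame(col1 = c(10, 13.5), col2 = c(30, 40)),
  "2015" = data.frame(col1 = c(15, 16.5), col2 = c(25, 26.5)),
  "2016" = data.frame(col1 = c(22, 23.5), col2 = c(4, 6))
)

# lapply() returns a list, which becomes a list column in a data.frame.
list_col <- lapply(ldf, function(i) mean(i$col1))
is.list(list_col)  # TRUE

# vapply() returns a plain numeric vector with one mean per year.
ndf <- data.frame(
  Year = names(ldf),
  col1 = vapply(ldf, function(i) mean(i$col1), numeric(1)),
  col2 = vapply(ldf, function(i) mean(i$col2), numeric(1)),
  row.names = NULL
)
str(ndf)
```

With atomic columns like these, melt-style functions have ordinary numeric vectors to work with.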

reshape.ndf <- reshape(ndf, 
                    varying = list(names(ndf)[2:7]), 
                    v.names = "cover",
                    timevar = "species",
                    times = names(ndf[2:7]),
                    new.row.names = 1:1000,
                    direction = "long")

My output basically just uses the first row of each variable. My wide data.frame looks like this (sorry for the odd names):

Year Cladonia.portentosa Erica.tetralix Eriophorum.vaginatum  
1 2014               11.75             35                   55     
2 2015               15.75          25.75                   70      
3 2016               22.75              5                 37.5

and the long data.frame looks like this:

Year             species cover id
1 2014 Cladonia.portentosa 11.75  1
2 2015 Cladonia.portentosa 11.75  2
3 2016 Cladonia.portentosa 11.75  3
4 2014      Erica.tetralix 35.00  1
5 2015      Erica.tetralix 35.00  2
6 2016      Erica.tetralix 35.00  3

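The "cover" column should instead hold each year's value in the row for the corresponding year.

Please could someone tell me where I've gone wrong!?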

Answer 1

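Here is an example of 'melting' with tidyr.

You need tidyr, but I also like dplyr and include it here to encourage using it together with the rest of the tidyverse. You will find endless great tutorials on the web...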

library(dplyr)
library(tidyr)

Let's take iris as an example. I want a long table in which Species, variable and value are the columns.

data(iris)

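This is gather(). We specify that variable and value are the column names of the new 'melted' columns. We also specify that we do not want to melt the Species column, which should stay as its own column.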

iris_long <- iris %>%
  gather(variable, value, -Species)

Inspect the iris_long object to make sure it worked.
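As a sketch of what that check could look like, the snippet below repeats the reshape with pivot_longer(), which tidyr now recommends over the superseded gather(), and then looks at the shape of the result. It assumes tidyr 1.0.0 or later is installed; the object name iris_long_pivot is only illustrative.

```r
library(tidyr)

# Same reshape as gather(variable, value, -Species), written with pivot_longer().
iris_long_pivot <- pivot_longer(
  iris,
  cols = -Species,          # melt everything except Species
  names_to = "variable",    # former column names go here
  values_to = "value"       # measurements go here
)

# iris has 150 rows and 4 measurement columns, so expect 600 rows, 3 columns.
dim(iris_long_pivot)
head(iris_long_pivot)
```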

Answer 2

In addition to roman's answer, I thought I would share what I did with my own dataset.

My initial "wide" data.frame ndf looked like this:

Year Cladonia.portentosa Erica.tetralix Eriophorum.vaginatum  
1 2014               11.75             35                   55     
2 2015               15.75          25.75                   70      
3 2016               22.75              5                 37.5

I installed tidyr

install.packages("tidyr")

then loaded the package

library(tidyr)

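Then I used the gather() function from the tidyr package to gather the species columns Cladonia.portentosa, Erica.tetralix and Eriophorum.vaginatum into a single species column, with a cover column holding the values in the new "long" data.frame.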

long.ndf <- ndf %>% gather(species, cover, Cladonia.portentosa:Eriophorum.vaginatum)

Easy peasy! Thanks again, roman, for the suggestion!

Answer 3

I am answering your question in case it helps someone who uses the reshape function.

Please could someone tell me where I've gone wrong!?

You did not specify the idvar argument, so reshape created one for you, named id. To avoid that, just add the line idvar = "Year" to your code:

ndf <- read.table(text = 
  "Year Cladonia.portentosa Erica.tetralix Eriophorum.vaginatum
    1 2014               11.75             35                   55     
    2 2015               15.75          25.75                   70      
    3 2016               22.75              5                 37.5", 
  header = TRUE, stringsAsFactors = FALSE)

reshape.ndf <- reshape(ndf, 
  varying = list(names(ndf)[2:4]), 
  v.names = "cover",
  idvar = "Year",
  timevar = "species",
  times = names(ndf[2:4]),
  new.row.names = 1:9,
  direction = "long")

The result is what you expected:

reshape.ndf
  Year              species cover
1 2014  Cladonia.portentosa 11.75
2 2015  Cladonia.portentosa 15.75
3 2016  Cladonia.portentosa 22.75
4 2014       Erica.tetralix 35.00
5 2015       Erica.tetralix 25.75
6 2016       Erica.tetralix  5.00
7 2014 Eriophorum.vaginatum 55.00
8 2015 Eriophorum.vaginatum 70.00
9 2016 Eriophorum.vaginatum 37.50
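One caveat, offered as an assumption rather than something this answer tested: the ndf in the question was filled with lapply, so its measure columns may be list columns, unlike the numeric columns read.table produces here. A minimal sketch of flattening such columns with unlist before calling reshape with idvar; the two-species ndf below is a hypothetical reconstruction, not the asker's actual object:

```r
# Hypothetical reconstruction of an ndf whose measure columns are lists.
ndf <- data.frame(Year = c(2014, 2015, 2016))
ndf$Cladonia.portentosa <- list(11.75, 15.75, 22.75)
ndf$Erica.tetralix <- list(35, 25.75, 5)

# Flatten every measure column into an ordinary numeric vector.
ndf[-1] <- lapply(ndf[-1], unlist)
sapply(ndf, is.atomic)

# Same reshape call as above, with idvar set and two species columns.
long <- reshape(ndf,
  varying = list(names(ndf)[2:3]),
  v.names = "cover",
  idvar = "Year",
  timevar = "species",
  times = names(ndf)[2:3],
  new.row.names = 1:6,
  direction = "long")
long
```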
