Create new column with multiple rows

Question

I have a data frame like this:

dat <- read.table(text=
"ID | Year | Month | Variable | Value1 | Value2 | Value3
  1 | 1950 |   1   |   PRCP   |  0     |   1    |   0
  1 | 1950 |   1   |   TMAX   |  52    |   51   |   52
  1 | 1950 |   1   |   TMIN   |  41    |   41   |   39
  1 | 1950 |   2   |   PRCP   |  1     |   0    |   1
  1 | 1950 |   2   |   TMAX   |  55    |   57   |   58",
  header=TRUE, sep="|", strip.white=TRUE)  # strip.white trims the padding around "|"

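There are 50 station IDs, the years span 1950-2005, the months run 1-12, and there are 3 weather variables (PRCP, TMAX, and TMIN). The columns Value1-Value31 hold the measurement of the weather variable for each day of the month.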

I would like to create a data frame like this:

ID | Date       | PRCP
1  | 1950-01-01 |  0
1  | 1950-01-02 |  1
1  | 1950-01-03 |  0

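So far I have been able to create 3 separate datasets, one for each weather variable, but I don't know how to create the new Date column and expand the rows accordingly (each month needs 31 new rows, one per day). I am new to R and would greatly appreciate any help. Thanks!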

Answer 1

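We can use melt/dcast from data.table. Convert the 'data.frame' to a 'data.table' (setDT(dat)), reshape from 'wide' to 'long' format with melt, and create a sequence column ('ind') grouped by 'ID', 'Year', 'Month', and 'Variable'. Build the 'Date' column by pasting together 'Year', 'Month', and 'ind', then reshape back to 'wide' format with dcast. This keeps all the information in a single dataset instead of creating three separate ones.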

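library(data.table) # v1.9.6+
# Wide to long: stack the Value* columns into a single value column
dM <- melt(setDT(dat), measure = patterns('^Value'))
# Day index within each ID/Year/Month/Variable group (':=' updates dM by reference)
dM1 <- dM[, ind := seq_len(.N), by = .(ID, Year, Month, Variable)]
dM1[, Date := as.Date(sprintf('%04d-%02d-%02d', Year, Month, ind))]
# Long to wide: one column per weather variable. With a single pattern, current
# data.table names the melted column 'value' (older releases may use 'value1').
dcast(dM1, ID + Date ~ Variable, value.var = 'value')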
#   ID       Date PRCP TMAX TMIN
#1:  1 1950-01-01    0   52   41
#2:  1 1950-01-02    1   51   41
#3:  1 1950-01-03    0   52   39
#4:  1 1950-02-01    1   55   NA
#5:  1 1950-02-02    0   57   NA
#6:  1 1950-02-03    1   58   NA
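
If only the ID | Date | PRCP table from the question is needed, one option is to filter the long table before widening. The following is a minimal sketch, not part of the original answer. It rebuilds the sample data inline, and the names `long`, `col`, `day`, and `prcp` are illustrative assumptions. It also reads the day number from the column name rather than from the row position.

```r
library(data.table)

# Sample data, matching the Data section of this answer.
dat <- data.frame(
  ID = c(1, 1, 1, 1, 1), Year = 1950, Month = c(1, 1, 1, 2, 2),
  Variable = c("PRCP", "TMAX", "TMIN", "PRCP", "TMAX"),
  Value1 = c(0, 52, 41, 1, 55), Value2 = c(1, 51, 41, 0, 57),
  Value3 = c(0, 52, 39, 1, 58)
)

# Wide to long; 'col' holds the source column name (Value1, Value2, ...).
long <- melt(as.data.table(dat),
             id.vars = c("ID", "Year", "Month", "Variable"),
             variable.name = "col", value.name = "value")

# Take the day of month from the column name (assumed pattern: Value<day>).
long[, day := as.integer(sub("^Value", "", col))]
long[, Date := as.Date(sprintf("%04d-%02d-%02d",
                               as.integer(Year), as.integer(Month), day),
                       format = "%Y-%m-%d")]

# Keep only precipitation and rename the value column.
prcp <- long[Variable == "PRCP", .(ID, Date, PRCP = value)][order(ID, Date)]
print(prcp)
```

Swapping "PRCP" for "TMAX" or "TMIN" in the filter gives the other two per-variable tables.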

NOTE: In the example data, the OP provided only 3 value columns. I assume the original dataset has 31 of them.
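
With all 31 value columns present, shorter months produce day numbers that do not exist on the calendar (for example February 30). The following is a hedged sketch of one way to drop them, not part of the original answer. It assumes those cells are NA in the source, uses a hypothetical toy table `wide` with only the Value28-Value31 columns to stay short, and passes an explicit `format` so that `as.Date` returns NA for impossible dates.

```r
library(data.table)

# Hypothetical toy input: one station, January and February 1950,
# showing only the last four day columns.
wide <- data.table(
  ID = c(1, 1), Year = c(1950, 1950), Month = c(1, 2),
  Variable = c("PRCP", "PRCP"),
  Value28 = c(0, 1), Value29 = c(1, NA), Value30 = c(0, NA), Value31 = c(2, NA)
)

long <- melt(wide,
             id.vars = c("ID", "Year", "Month", "Variable"),
             measure.vars = grep("^Value", names(wide), value = TRUE),
             variable.name = "col", value.name = "value")

# Day of month comes from the column name, so it stays correct
# even if the row order changes.
long[, day := as.integer(sub("^Value", "", col))]

# Impossible dates such as 1950-02-30 become NA with an explicit format.
long[, Date := as.Date(sprintf("%04d-%02d-%02d",
                               as.integer(Year), as.integer(Month), day),
                       format = "%Y-%m-%d")]

# Drop the non-existent days, then widen as before.
long <- long[!is.na(Date)]
result <- dcast(long, ID + Date ~ Variable, value.var = "value")
print(result)
```

In this toy case the four January days and February 28 remain, while February 29-31 are removed because 1950 is not a leap year.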

Data

dat <-  structure(list(ID = c(1, 1, 1, 1, 1), Year = c(1950, 1950, 1950, 
1950, 1950), Month = c(1, 1, 1, 2, 2), Variable = c("PRCP", "TMAX", 
"TMIN", "PRCP", "TMAX"), Value1 = c(0, 52, 41, 1, 55), Value2 = c(1, 
51, 41, 0, 57), Value3 = c(0, 52, 39, 1, 58)), .Names = c("ID", 
"Year", "Month", "Variable", "Value1", "Value2", "Value3"),
row.names = c(NA, -5L), class = "data.frame")
