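In R, convert data frame diagonals to rows
I'm developing a model that predicts completed fertility for an age cohort. I currently have a data frame like this, where the rows are ages and the columns are years. The value in each cell is the age-specific fertility rate for that year: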
> df1
iso3 sex age fert1953 fert1954 fert1955
14 AUS female 13 0.000 0.00000 0.00000
15 AUS female 14 0.000 0.00000 0.00000
16 AUS female 15 13.108 13.42733 13.74667
17 AUS female 16 26.216 26.85467 27.49333
18 AUS female 17 39.324 40.28200 41.24000
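However, what I want is for each row to be a cohort. Because the rows and columns both represent single years, the cohort data can be obtained by taking the diagonals. I'm looking for a result like this: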
> df2
iso3 sex ageIn1953 fert1953 fert1954 fert1955
14 AUS female 13 0.000 0.00000 13.74667
15 AUS female 14 0.000 13.42733 27.49333
16 AUS female 15 13.108 26.85467 41.24000
17 AUS female 16 26.216 40.28200 [data..]
18 AUS female 17 39.324 [data..] [data..]
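To make the diagonal idea concrete, here is a minimal sketch that reads one cohort off an age-by-year matrix. The matrix name `rates` and the index `idx` are illustrative only; the values are copied from the df1 printout above. It assumes the cohort aged 15 in 1953 is aged 16 in 1954 and 17 in 1955.

```r
# Toy age-by-year matrix; values copied from the df1 printout (names are illustrative).
rates <- matrix(c(0, 0, 13.108, 26.216, 39.324,
                  0, 0, 13.4273333333333, 26.8546666666667, 40.282,
                  0, 0, 13.7466666666667, 27.4933333333333, 41.24),
                nrow = 5,
                dimnames = list(age = 13:17, year = 1953:1955))

# Each row of idx is one (age, year) pair along the diagonal for the
# cohort aged 15 in 1953.
idx <- cbind(as.character(15:17), as.character(1953:1955))

# Indexing with a two-column character matrix picks one cell per row of idx.
cohort15 <- rates[idx]
cohort15
```

The three values returned are the ones I expect in the row with ageIn1953 = 15.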
Here is the df1 data frame:
df1 <- structure(list(iso3 = c("AUS", "AUS", "AUS", "AUS", "AUS"), sex = c("female",
"female", "female", "female", "female"), age = c(13, 14, 15,
16, 17), fert1953 = c(0, 0, 13.108, 26.216, 39.324), fert1954 = c(0,
0, 13.4273333333333, 26.8546666666667, 40.282), fert1955 = c(0,
0, 13.7466666666667, 27.4933333333333, 41.24)), .Names = c("iso3",
"sex", "age", "fert1953", "fert1954", "fert1955"), class = "data.frame", row.names = 14:18)
Edit:
Here is the solution I ended up using. It is based on David's answer, but I needed to do this for each level of iso3.
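library(dplyr)  # lead() below is dplyr::lead
# f3 (not shown here) is the data frame covering all iso3 values
df.ls <- lapply(split(f3, f = f3$iso3), FUN = function(df1) {
  n <- ncol(df1) - 4  # the number of years - 1
  temp <- mapply(function(x, y) lead(x, n = y), df1[, -seq_len(4)], seq_len(n))
  return(cbind(df1[seq_len(4)], temp))
})
f4 <- do.call("rbind", df.ls)  # stack the per-country results back together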
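I haven't tested the speed, but data.table v1.9.5 recently implemented a new lead/lag function (written in C) called shift. So for the columns you want to shift, you can combine it with mapply, for example: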
library(data.table)
n <- ncol(df1) - 4 # the number of years - 1
temp <- mapply(function(x, y) shift(x, n = y, type = "lead"), df1[, -seq_len(4)], seq_len(n))
cbind(df1[seq_len(4)], temp) # combining back with the unchanged columns
# iso3 sex age fert1953 fert1954 fert1955
# 14 AUS female 13 0.000 0.00000 13.74667
# 15 AUS female 14 0.000 13.42733 27.49333
# 16 AUS female 15 13.108 26.85467 41.24000
# 17 AUS female 16 26.216 40.28200 NA
# 18 AUS female 17 39.324 NA NA
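To spell out what a lead shift does to a single column, here is a minimal base R sketch. `lead_by` is a hypothetical helper written for illustration, not part of data.table or dplyr; it drops the first `n` values and pads the tail with NA, which is the behaviour the output above relies on.

```r
# Hypothetical helper (not from data.table or dplyr): move values up by n
# positions and pad the end with NA, as a "lead" shift does.
lead_by <- function(x, n) {
  stopifnot(n >= 0, n <= length(x))
  c(x[seq_along(x) > n], rep(NA, n))
}

# The fert1955 column is led by 2, so the age-15 value lands on the age-13 row.
fert1955 <- c(0, 0, 13.7466666666667, 27.4933333333333, 41.24)
shifted <- lead_by(fert1955, 2)
shifted
```

Each year column gets a different `n` (1 for fert1954, 2 for fert1955), which is why mapply pairs the columns with seq_len(n).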
Edit: You can easily install the development version of data.table from GitHub using
library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)
Either way, if you prefer dplyr, here it is:
library(dplyr)
n <- ncol(df1) - 4 # the number of years - 1
temp <- mapply(function(x, y) lead(x, n = y), df1[, -seq_len(4)], seq_len(n))
cbind(df1[seq_len(4)], temp)
# iso3 sex age fert1953 fert1954 fert1955
# 14 AUS female 13 0.000 0.00000 13.74667
# 15 AUS female 14 0.000 13.42733 27.49333
# 16 AUS female 15 13.108 26.85467 41.24000
# 17 AUS female 16 26.216 40.28200 NA
# 18 AUS female 17 39.324 NA NA
Here is a base R approach:
df1[, 5:ncol(df1)] <- mapply(function(x, y) {
  vec.list <- df1[-1:-y, x]      # drop the first y rows of column x
  length(vec.list) <- nrow(df1)  # pad back to full length with NA
  vec.list
}, x = 5:ncol(df1), y = 1:(ncol(df1) - 4))
df1
# iso3 sex age fert1953 fert1954 fert1955
#14 AUS female 13 0.000 0.00000 13.74667
#15 AUS female 14 0.000 13.42733 27.49333
#16 AUS female 15 13.108 26.85467 41.24000
#17 AUS female 16 26.216 40.28200 NA
#18 AUS female 17 39.324 NA NA
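The padding step in the base R version relies on `length<-` filling new positions with NA. A minimal sketch of just that step, using a made-up vector `v`:

```r
# Made-up vector for illustration: what is left of a column after dropping rows.
v <- c(13.108, 26.216, 39.324)

# Extending the length of a vector fills the new positions with NA.
length(v) <- 5
v
```

That is why the shortened column can be assigned straight back into df1 without building the NA padding by hand.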