How to reshape a data frame in R, conditioned on a maximum value?
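I'm having some difficulty reshaping my data frame in R. I have 5 individuals: A, B, C, D, and E. Some individuals have 1 observation and some have 2. For each observation I measured 3 variables: X, Y, and Z. I'd like to convert my data frame from long to wide format, producing one row per individual and two sets of columns labeled X, Y, and Z. However, I'd like to condition on the value of X, so that the set of observations with the greatest value of X appears first. So, for a given observation, the values of X, Y, and Z must stay grouped together, but whether the values from observation 1 or 2 appear first depends on which has the greatest value of X.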
df = data.frame(
  indiv = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = c(rnorm(8, mean = 10, sd = 6)),
  Y = c(rnorm(8, mean = 0, sd = 2)),
  Z = c(rnorm(8, mean = 4, sd = 4))
)
indiv observ X Y Z
1 A 1 9.959043 1.785043 10.134511
2 A 2 14.122006 -2.257666 5.799366
3 B 1 11.562801 -1.394951 4.988923
4 C 1 12.955644 -4.330272 8.870165
5 C 2 13.582154 -1.727224 -7.5617
6 D 1 4.053437 1.815233 1.789157
7 D 2 12.990071 -1.989307 3.67696
8 E 1 2.820895 -3.754263 3.001725
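Below is what I'd like the wide data frame to look like. For individual A, X was greater in observation 2, so that set of values (X, Y, Z) appears first. In contrast, for individuals C and D, X was greater in observation 1, so that set appears first. I think it should be some variation of the reshape function, but I'm not sure how to condition on the maximum value of X. Thanks in advance!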
indiv observ X Y Z observ X Y Z
1 A 2 18.797087 0.3247862 4.774446 1 8.547868 0.3203667 6.729975
2 B 1 1.646638 0.7986036 6.938825 NA NA NA NA
3 C 1 17.354905 -2.399272 8.357045 2 6.856093 0.6493722 2.420827
4 D 1 16.058101 -1.2370024 4.045489 2 7.641576 3.0820116 4.232615
5 E 1 13.625998 -0.1953445 -5.627932 NA NA NA NA
I would order before casting. The following uses data.table because the dcast function is also in that package, but you could use a plain data.frame and reshape as well.
library(data.table)
set.seed(1)
df = data.frame(
  indiv = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = c(rnorm(8, mean = 10, sd = 6)),
  Y = c(rnorm(8, mean = 0, sd = 2)),
  Z = c(rnorm(8, mean = 4, sd = 4))
)
setDT(df) # convert to a data.table by reference
df <- df[order(indiv, -X)] # sort by individual, largest X first
df
indiv observ X Y Z
1: A 2 11.101860 -0.61077677 7.775345
2: A 1 6.241277 1.15156270 3.935239
3: B 1 4.986228 3.02356234 7.284885
4: C 1 19.571685 0.77968647 6.375605
5: C 2 11.977047 -1.24248116 7.675909
6: D 2 12.924574 2.24986184 4.298260
7: D 1 5.077190 -4.42939977 7.128545
8: E 1 14.429948 -0.08986722 -3.957407
df[, observ := as.numeric(seq_len(.N)), by = indiv] # renumber observ to match the new order
df
indiv observ X Y Z
1: A 1 11.101860 -0.61077677 7.775345
2: A 2 6.241277 1.15156270 3.935239
3: B 1 4.986228 3.02356234 7.284885
4: C 1 19.571685 0.77968647 6.375605
5: C 2 11.977047 -1.24248116 7.675909
6: D 1 12.924574 2.24986184 4.298260
7: D 2 5.077190 -4.42939977 7.128545
8: E 1 14.429948 -0.08986722 -3.957407
Now cast as usual:
dcast(df, indiv ~ observ, value.var = c("X","Y","Z"))
indiv X_1 X_2 Y_1 Y_2 Z_1 Z_2
1: A 11.101860 6.241277 -0.61077677 1.151563 7.775345 3.935239
2: B 4.986228 NA 3.02356234 NA 7.284885 NA
3: C 19.571685 11.977047 0.77968647 -1.242481 6.375605 7.675909
4: D 12.924574 5.077190 2.24986184 -4.429400 4.298260 7.128545
5: E 14.429948 NA -0.08986722 NA -3.957407 NA
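The renumbering step above overwrites observ, so the wide table no longer records which original observation came first. If that information matters (the desired output in the question shows an observ column in each set), one possible variation is to leave observ alone and cast on a separate position column. This is only a sketch: the names dt, pos, and wide are made up for illustration and are not part of the code above.

```r
library(data.table)

# Same simulated data as above, seeded so the run is reproducible
set.seed(1)
dt <- data.frame(
  indiv  = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = rnorm(8, mean = 10, sd = 6),
  Y = rnorm(8, mean = 0, sd = 2),
  Z = rnorm(8, mean = 4, sd = 4)
)
setDT(dt)

# Sort in place: by individual, then by decreasing X
setorder(dt, indiv, -X)

# `pos` (hypothetical name) is the row position within each individual
# after sorting, so pos == 1 is the observation with the largest X.
# The original `observ` values are left untouched.
dt[, pos := seq_len(.N), by = indiv]

# Cast on `pos` and carry `observ` along as one of the value columns
wide <- dcast(dt, indiv ~ pos, value.var = c("observ", "X", "Y", "Z"))
wide
```

Here observ_1 and observ_2 hold the original observation numbers, while the _1 and _2 suffixes refer to the rank by X.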
To get the column order you want, I think you need to melt first and then cast:
dcast(melt(df, id.vars = c("indiv","observ")), indiv ~ observ + variable)
indiv 1_X 1_Y 1_Z 2_X 2_Y 2_Z
1: A 11.101860 -0.61077677 7.775345 6.241277 1.151563 3.935239
2: B 4.986228 3.02356234 7.284885 NA NA NA
3: C 19.571685 0.77968647 6.375605 11.977047 -1.242481 7.675909
4: D 12.924574 2.24986184 4.298260 5.077190 -4.429400 7.128545
5: E 14.429948 -0.08986722 -3.957407 NA NA NA
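As mentioned at the start of this answer, a plain data.frame and reshape should also work. The following is a rough base R sketch of the same idea (sort, number the rows within each individual, then go wide); the pos column and the wide variable are names introduced here for illustration only.

```r
# Same simulated data as above, seeded so the run is reproducible
set.seed(1)
df <- data.frame(
  indiv  = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = rnorm(8, mean = 10, sd = 6),
  Y = rnorm(8, mean = 0, sd = 2),
  Z = rnorm(8, mean = 4, sd = 4)
)

# Sort by individual, then by decreasing X
df <- df[order(df$indiv, -df$X), ]

# `pos` (hypothetical name): position within each individual after sorting,
# so pos == 1 is the observation with the largest X
df$pos <- ave(seq_along(df$indiv), df$indiv, FUN = seq_along)

# Long to wide: one row per individual, one set of columns per position.
# All remaining columns (observ, X, Y, Z) are spread, grouped by position.
wide <- reshape(df, idvar = "indiv", timevar = "pos",
                direction = "wide", sep = "_")
wide
```

With reshape the columns come out grouped by position (observ_1, X_1, Y_1, Z_1, then observ_2, X_2, Y_2, Z_2), so no melt step is needed for the column order, and the original observ numbers are kept.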