How to reshape a data frame in R, conditioned on a maximum value?

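I'm having some trouble reshaping my data frame in R. I have 5 individuals: A, B, C, D, and E. Some individuals have 1 observation and some have 2. For each observation I measured 3 variables: X, Y, and Z. I'd like to convert my data frame from long to wide format, producing one row per individual and two sets of columns labeled X, Y, and Z. However, I'd like to condition on the value of X, so that the observation with the larger X appears first. In other words, the values of X, Y, and Z for a given observation must stay grouped together, but whether observation 1 or 2 comes first depends on which one has the larger X.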

df = data.frame(
  indiv = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = c(rnorm(8, mean = 10, sd = 6)),
  Y = c(rnorm(8, mean = 0, sd = 2)),
  Z = c(rnorm(8, mean = 4, sd = 4))
)

  indiv observ         X         Y         Z
1     A      1  9.959043  1.785043 10.134511
2     A      2 14.122006 -2.257666  5.799366
3     B      1 11.562801 -1.394951  4.988923
4     C      1 12.955644 -4.330272  8.870165
5     C      2 13.582154 -1.727224 -7.5617
6     D      1  4.053437  1.815233  1.789157
7     D      2 12.990071 -1.989307  3.67696
8     E      1  2.820895 -3.754263  3.001725

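Below is what I'd like the wide data frame to look like. For individual A, X is larger in observation 2, so that set of values (X, Y, Z) comes first. By contrast, for individuals C and D, X is larger in observation 1, so that set comes first. I think it should be some variation of the reshape function, but I'm not sure how to condition on the maximum value of X. Thanks in advance!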

  indiv observ         X          Y         Z observ        X         Y        Z
1     A      2 18.797087  0.3247862  4.774446      1 8.547868 0.3203667 6.729975
2     B      1  1.646638  0.7986036  6.938825     NA       NA        NA       NA
3     C      1 17.354905 -2.399272   8.357045      2 6.856093 0.6493722 2.420827
4     D      1 16.058101 -1.2370024  4.045489      2 7.641576 3.0820116 4.232615
5     E      1 13.625998 -0.1953445 -5.627932     NA       NA        NA       NA

I would sort before casting. The following uses data.table because the dcast function is also in that package; it could be done with a plain data.frame and reshape as well.

library(data.table)
set.seed(1)
df = data.frame(
  indiv = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = c(rnorm(8, mean = 10, sd = 6)),
  Y = c(rnorm(8, mean = 0, sd = 2)),
  Z = c(rnorm(8, mean = 4, sd = 4))
)
setDT(df)                   # convert to a data.table by reference
df <- df[order(indiv, -X)]  # sort by individual, then by descending X
df
   indiv observ         X           Y         Z
1:     A      2 11.101860 -0.61077677  7.775345
2:     A      1  6.241277  1.15156270  3.935239
3:     B      1  4.986228  3.02356234  7.284885
4:     C      1 19.571685  0.77968647  6.375605
5:     C      2 11.977047 -1.24248116  7.675909
6:     D      2 12.924574  2.24986184  4.298260
7:     D      1  5.077190 -4.42939977  7.128545
8:     E      1 14.429948 -0.08986722 -3.957407

df[, observ := as.numeric(1:.N), by = indiv]  # renumber observ to match the new order

df
   indiv observ         X           Y         Z
1:     A      1 11.101860 -0.61077677  7.775345
2:     A      2  6.241277  1.15156270  3.935239
3:     B      1  4.986228  3.02356234  7.284885
4:     C      1 19.571685  0.77968647  6.375605
5:     C      2 11.977047 -1.24248116  7.675909
6:     D      1 12.924574  2.24986184  4.298260
7:     D      2  5.077190 -4.42939977  7.128545
8:     E      1 14.429948 -0.08986722 -3.957407
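
Note that renumbering observ discards the original observation numbers, which the desired output in the question keeps. A possible variation, sketched here under the assumption that those numbers matter, is to store the within-individual rank of X in a separate column. The column name `pos` and the object name `dt` are hypothetical and not part of the original answer; `frank` is data.table's ranking function.

```r
library(data.table)

set.seed(1)
dt <- data.table(
  indiv  = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = rnorm(8, mean = 10, sd = 6),
  Y = rnorm(8, mean = 0, sd = 2),
  Z = rnorm(8, mean = 4, sd = 4)
)

# `pos` (hypothetical helper column): 1 for the largest X within an
# individual, 2 for the next largest. Ranking -X gives descending order.
dt[, pos := frank(-X, ties.method = "first"), by = indiv]

# `observ` is left untouched, so each row still records its original number.
dt[order(indiv, pos)]
```

With this variation, the later casting step would presumably use `pos` in place of `observ` on the right-hand side of the formula, and `observ` could be added to `value.var` to carry it into the wide table.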

Now cast as usual:

dcast(df, indiv ~ observ, value.var = c("X","Y","Z"))

   indiv       X_1       X_2         Y_1       Y_2       Z_1      Z_2
1:     A 11.101860  6.241277 -0.61077677  1.151563  7.775345 3.935239
2:     B  4.986228        NA  3.02356234        NA  7.284885       NA
3:     C 19.571685 11.977047  0.77968647 -1.242481  6.375605 7.675909
4:     D 12.924574  5.077190  2.24986184 -4.429400  4.298260 7.128545
5:     E 14.429948        NA -0.08986722        NA -3.957407       NA

To get the column order you want, I think you need to melt first and then cast:

dcast(melt(df, id.vars = c("indiv","observ")), indiv ~ observ + variable)
   indiv       1_X         1_Y       1_Z       2_X       2_Y      2_Z
1:     A 11.101860 -0.61077677  7.775345  6.241277  1.151563 3.935239
2:     B  4.986228  3.02356234  7.284885        NA        NA       NA
3:     C 19.571685  0.77968647  6.375605 11.977047 -1.242481 7.675909
4:     D 12.924574  2.24986184  4.298260  5.077190 -4.429400 7.128545
5:     E 14.429948 -0.08986722 -3.957407        NA        NA       NA
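
The answer mentions that the same result is possible with a plain data.frame and reshape. A minimal sketch of that route follows, assuming the same simulated data as above. The helper column `pos` and the result name `wide` are hypothetical names introduced for illustration. Because the rank goes into `pos`, the original `observ` values survive and can be carried into the wide table.

```r
set.seed(1)
df <- data.frame(
  indiv  = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = rnorm(8, mean = 10, sd = 6),
  Y = rnorm(8, mean = 0, sd = 2),
  Z = rnorm(8, mean = 4, sd = 4)
)

# Sort by individual, then by descending X, so the largest X comes first.
df <- df[order(df$indiv, -df$X), ]

# `pos` (hypothetical helper column): position of each row within its
# individual after sorting, i.e. 1 for the largest X, 2 for the next.
df$pos <- ave(df$X, df$indiv, FUN = seq_along)

# Spread to wide format, one block of (observ, X, Y, Z) per position.
wide <- reshape(
  df,
  idvar     = "indiv",
  timevar   = "pos",
  v.names   = c("observ", "X", "Y", "Z"),
  direction = "wide",
  sep       = "_"
)
rownames(wide) <- NULL
wide
```

reshape groups the output columns by position rather than by variable, so the X, Y, and Z values of one observation stay next to each other without a separate melt step.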