Averaging two columns with partial matching in R (large data)
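I have a matrix with 562,709 rows and 803 columns. The columns consist of 7 metadata columns and 796 replicate columns, which should be combined in pairs and averaged.
I googled the same problem without much success, especially because my matrix is so large, so I was wondering if anyone could help me with this.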
[1] "seqnames"
[2] "start"
[3] "end"
[4] "name"
[5] "score"
[6] "annotation"
[7] "GC"
[8] "ACCx_025FE5F8_885E_433D_9018_7AE322A92285_X034_S09_L133_B1_T1_PMRG"
[9] "ACCx_025FE5F8_885E_433D_9018_7AE322A92285_X034_S09_L134_B1_T2_PMRG"
[10] "ACCx_2A5AE757_20D5_49B6_95FF_CAE08E8197A0_X012_S05_L033_B1_T1_P024"
[11] "ACCx_2A5AE757_20D5_49B6_95FF_CAE08E8197A0_X012_S05_L034_B1_T2_P025"
[12] "ACCx_3D0CD3BD_3960_46FB_92C3_777F11CCD0FC_X011_S06_L011_B1_T1_P024"
[13] "ACCx_3D0CD3BD_3960_46FB_92C3_777F11CCD0FC_X011_S06_L012_B1_T2_P026"
[14] "ACCx_4D0D43F5_D8F0_4735_92D5_F40E321C7A05_X010_S09_L065_B1_T1_P019"
[15] "ACCx_4D0D43F5_D8F0_4735_92D5_F40E321C7A05_X010_S09_L066_B1_T2_P020"
[16] "ACCx_81A262BD_3078_4BDB_8EB1_30DD6D7948C3_X027_S03_L081_B1_T1_P063"
[17] "ACCx_81A262BD_3078_4BDB_8EB1_30DD6D7948C3_X027_S03_L082_B1_T2_P067"
...
[800] "UCEC_C335297F_2D63_4973_9182_FA18C28E001E_X037_S04_L055_B1_T1_P088"
[801] "UCEC_C335297F_2D63_4973_9182_FA18C28E001E_X037_S04_L056_B1_T2_P089"
[802] "UCEC_D820B024_6B3B_4B5B_866E_F9A8139C270B_X039_S09_L113_B1_T1_P099"
[803] "UCEC_D820B024_6B3B_4B5B_866E_F9A8139C270B_X039_S09_L114_B1_T2_P098"
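As shown above, the first 7 columns should be left unchanged, but from column 8 onward each pair of columns should be merged into its average (e.g., columns 8 and 9 should be merged, then 10 and 11, ...).
Take alternating columns, add them, then divide by 2: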
# example data: 5 rows, 11 columns
x <- mtcars[1:5, ]
cbind(
  # keep the first 7 columns as is
  x[1:7],
  # then take alternating columns (odd and even positions), add, and divide by 2
  (x[8:11][, c(TRUE, FALSE)] + x[8:11][, c(FALSE, TRUE)]) / 2
)
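The positional approach above relies on each pair sitting in adjacent columns. If that cannot be guaranteed, the pairs can instead be matched by the shared part of their names. The following is a minimal sketch, not a tested solution for the real data: it assumes the table is a data frame, that the two replicates of a pair differ only in the trailing `_L..._B..._T..._...` part of the name, and it uses a small made-up table (`df`, with shortened hypothetical column names and 2 metadata columns instead of 7) as a stand-in.

```r
# Toy stand-in for the real table (hypothetical, shortened column names):
# 2 metadata columns followed by 2 replicate pairs.
df <- data.frame(
  seqnames = c("chr1", "chr1", "chr2"),
  start    = c(1L, 101L, 201L),
  ACCx_AAAA_X034_S09_L133_B1_T1_PMRG = c(1, 2, 3),
  ACCx_AAAA_X034_S09_L134_B1_T2_PMRG = c(3, 4, 5),
  UCEC_BBBB_X037_S04_L055_B1_T1_P088 = c(10, 20, 30),
  UCEC_BBBB_X037_S04_L056_B1_T2_P089 = c(20, 40, 60)
)
n_meta <- 2  # would be 7 for the data described in the question

# Names of the replicate columns only
rep_cols <- names(df)[-seq_len(n_meta)]

# Assumption: dropping the trailing "_L<n>_B<n>_T<n>_<suffix>" leaves a key
# that is identical for both members of a pair.
key <- sub("_L[0-9]+_B[0-9]+_T[0-9]+_[^_]+$", "", rep_cols)

# Fail early if some key does not have exactly two columns
stopifnot(all(table(key) == 2))

# Group column names by key, keeping the original column order
pairs <- split(rep_cols, factor(key, levels = unique(key)))

# Average one pair at a time, so only two columns are converted to a
# matrix at once rather than the whole table
avg <- lapply(pairs, function(p) rowMeans(df[p]))

# Metadata columns unchanged, followed by one averaged column per pair
result <- cbind(df[seq_len(n_meta)], data.frame(avg, check.names = FALSE))
```

Working pair by pair keeps the intermediate copies small, which may matter at 562,709 rows. Note that `rowMeans()` returns `NA` for a row if either replicate is `NA`, unless `na.rm = TRUE` is passed.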