Assigning multiple matrix values to single column of data frame

Question

I have the following data frame, which contains information on 163 monkeys:

> head(vervetdf)
    ucla_id                  country                             species Gender pi
1   A8516_M_2                 Barbados                 Chlorocebus sabaeus      M NA
2   AG23_F_10                 Tanzania Chlorocebus pygerythrus pygerythrus      F NA
3  AG5417_F_10                 Tanzania Chlorocebus pygerythrus pygerythrus      F NA
4  AGM126_F_1 Central African Republic                Chlorocebus tantalus      F NA
5  AGM127_F_1 Central African Republic                Chlorocebus tantalus      F NA
6  AGM129_F_1 Central African Republic                Chlorocebus tantalus      F NA

> str(vervetdf)
'data.frame':   163 obs. of  5 variables:
 $ ucla_id: Factor w/ 163 levels "A8516_M_2","AG23_F_10",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ country: Factor w/ 12 levels "Barbados","Botswana",..: 1 11 11 3 3 3 3 3 3 3 ...
 $ species: Factor w/ 5 levels "Chlorocebus aethiops aethiops",..: 4 3 3 5 5 5 5 5 5 5 ...
 $ Gender : Factor w/ 2 levels "F","M": 2 1 1 1 1 1 1 2 1 2 ...
 $ pi     : logi  NA NA NA NA NA NA ...

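I need to add pi values for each monkey for analysis and plotting, so I created the new column pi. Pi is the same for all monkeys of the same species (I have 5 species), but it is calculated in windows, so there are 1300 pi values per monkey. I have a matrix that contains the pi values for each species, one column per species and one row per window: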

> head(corrected_pi)
          pi1         pi2         pi3         pi4         pi5
w1.ce 0.001918322 0.002408772 0.002306475 0.002086117 0.002501300
w2.ce 0.002125624 0.002779025 0.002620691 0.002599817 0.002847614
w3.ce 0.001512895 0.001886345 0.001867847 0.001658217 0.001875594
w4.ce 0.002340536 0.002637327 0.002736944 0.002252872 0.002848985
w5.ce 0.001329015 0.001553925 0.001654385 0.001654023 0.001806535
w6.ce 0.001326739 0.001595000 0.001487649 0.001417510 0.001581388

> dim(corrected_pi)
[1] 1300    5

So, is there a way to assign all the pi values to the corresponding species in a single column of the data frame?

Answer 1

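You can use nest from the tidyr package to collect all pi values for one species into a single column. Then use merge to join the new pi table with vervetdf. Here we assume you have not already created the NA column vervetdf$pi, because merge will create the pi column for you (if it already exists, merge returns pi.x and pi.y instead):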

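library(tidyr)
# One row per species: the species factor plus that species' pi values (columns of t(corrected_pi))
# Note: this is the pre-1.0 tidyr interface; tidyr >= 1.0 documents the call as nest(<data>, pi = -species)
new.pi <- nest(data.frame(species=factor(levels(vervetdf$species), levels=levels(vervetdf$species)), t(corrected_pi)), -species, .key=pi)
# Attach the nested pi values to every monkey of the matching species
result <- merge(vervetdf, new.pi, by="species", sort=FALSE)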

Given the limited data you posted (only 6 rows of corrected_pi):

print(result)
##                              species     ucla_id                  country Gender                                                                           pi
##1                 Chlorocebus sabaeus   A8516_M_2                 Barbados      M 0.002306475, 0.002620691, 0.001867847, 0.002736944, 0.001654385, 0.001487649
##2 Chlorocebus pygerythrus pygerythrus   AG23_F_10                 Tanzania      F 0.002408772, 0.002779025, 0.001886345, 0.002637327, 0.001553925, 0.001595000
##3 Chlorocebus pygerythrus pygerythrus AG5417_F_10                 Tanzania      F 0.002408772, 0.002779025, 0.001886345, 0.002637327, 0.001553925, 0.001595000
##4                Chlorocebus tantalus  AGM126_F_1 Central African Republic      F 0.002086117, 0.002599817, 0.001658217, 0.002252872, 0.001654023, 0.001417510
##5                Chlorocebus tantalus  AGM127_F_1 Central African Republic      F 0.002086117, 0.002599817, 0.001658217, 0.002252872, 0.001654023, 0.001417510
##6                Chlorocebus tantalus  AGM129_F_1 Central African Republic      F 0.002086117, 0.002599817, 0.001658217, 0.002252872, 0.001654023, 0.001417510
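
Once pi is a list column, each cell holds the full set of window values for that monkey. Below is a minimal base R sketch of how such a column might be used for analysis and plotting. It runs on a small made-up stand-in for result; the toy values and the names first_pi, mean_pi, and long are illustrative assumptions, and plain numeric vectors stand in for the nested entries (with the nested result above, unlist(result$pi[[1]]) should give a comparable vector).

```r
# Toy stand-in (hypothetical) for `result`: two monkeys, three windows each
result <- data.frame(ucla_id = c("m1", "m2"), species = c("sp_a", "sp_b"))
result$pi <- list(c(0.1, 0.2, 0.3), c(0.4, 0.5, 0.6))

# All window values for the first monkey
first_pi <- result$pi[[1]]

# One summary number per monkey, e.g. the mean across windows
result$mean_pi <- sapply(result$pi, mean)

# Long format (one row per monkey and window), often convenient for plotting
long <- data.frame(
  ucla_id = rep(result$ucla_id, lengths(result$pi)),
  window  = sequence(lengths(result$pi)),
  pi      = unlist(result$pi)
)
```

The long table repeats each monkey's id once per window, so it grows to (number of monkeys) x (number of windows) rows.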

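Notes:

new.pi is a data frame with 5 rows, one for each of your species.
new.pi has two columns:
- species: a factor created from the levels of the vervetdf$species column. This lets us join the two tables later.
- pi: created by nest. Note that nest creates a new column, named by the .key argument, that is a list of the values in the nested columns. The first argument to nest is the data frame whose columns are to be nested. Here we build a temporary data frame made of the species column plus all the rows of corrected_pi turned into columns (i.e., t(corrected_pi)). We then select every column except species to be nested (i.e., -species).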
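
The same idea can also be sketched without tidyr by building the list column directly. This is an alternative to the nest/merge approach, not the method above, and it rests on the same assumption: that column i of corrected_pi belongs to the i-th level of vervetdf$species. The toy objects and the helper name pi_by_species are made up for illustration.

```r
# Toy stand-ins (hypothetical) for the objects in the question
vervetdf <- data.frame(
  ucla_id = c("m1", "m2", "m3"),
  species = factor(c("sp_b", "sp_a", "sp_b"))
)
corrected_pi <- matrix(
  c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 3,
  dimnames = list(c("w1", "w2", "w3"), c("pi1", "pi2"))
)

# Assumption: column i of corrected_pi belongs to the i-th species level
pi_by_species <- lapply(seq_len(ncol(corrected_pi)), function(i) corrected_pi[, i])
names(pi_by_species) <- levels(vervetdf$species)

# Look up each monkey's species and store its window values as one list element
vervetdf$pi <- unname(pi_by_species[as.character(vervetdf$species)])
```

If the matrix columns are not in level order, it is safer to name them after the species explicitly before the lookup than to rely on position.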
