Add column using equation in R

Question

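I am trying to use R to simplify the code I use to generate a new column in a data frame. My data is organized as follows (columns 1-4), and I want to generate column 5 (unlabeled):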

col1__col2__col3__col4______
 t1    f1    A    20     0
 t1    f2    A    19     0
 t1    f3    A    21     0
 t1    f1    B    25     5
 t1    f2    B    25     6
 t1    f3    B    26     5
 t2    f1    A    18     0
 t2    f2    A    19     0
 t2    f3    A    18     0
 t2    f1    B    20     2
 t2    f2    B    20     1
 t2    f3    B    20     2

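Edit: Column 5 is computed like this (the equation). It takes the col4 value for t1, f1 and subtracts from it the col4 value for t1, f1 where col3 = "A". So in row 1 it takes 20 and subtracts that very same 20. For row 4 it takes 25 and subtracts the 20 found in row 1, because both rows refer to sample f1 in treatment t1, but I am measuring values for two different things (A and B). So the column 5 shown above is calculated like this: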

col5
(20-20)
(19-19)
(21-21)
(25-20)
(25-19)
(26-21)
etc...

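Adding a column is nice and easy, but I am having a hard time finding a good way to build it under all of these conditions. If anyone has suggestions on how to code this, and/or how to better organize my data to make things easier, I would be very grateful! So far I have just been generating the column 5 values by hand in MS Excel :\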

Cheers

Edit 2: Answered. Many thanks to everyone who replied!

Answer 1

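So if I understand correctly: if col3 == "B", you take the matching row where col3 == "A" and subtract its col4 value? Then you need something like this (assuming your data frame is called df):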

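df$col5 <- 0  # rows with col3 == "A" keep 0 (their value minus itself)
for (i in seq_len(nrow(df))) {
  if (df[i, 3] == "B") {
    # find the "A" row that shares col1 and col2 with row i
    a_row <- which(df[, 1] == df[i, 1] & df[, 2] == df[i, 2] & df[, 3] == "A")
    df[i, 5] <- df[i, 4] - df[a_row, 4]
  }
}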

Fixed a typo in the original post.
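The same lookup can probably be done without an explicit loop. The following is only a sketch, assuming the data frame is called df and the columns are named col1 to col4 as in the question; the helper names a and idx are made up for illustration. It uses base R's match() to find, for every row, the "A" row with the same col1/col2 pair.

```r
# Sample data from the question (the name df and the column names are assumed)
df <- data.frame(
  col1 = rep(c("t1", "t2"), each = 6),
  col2 = rep(c("f1", "f2", "f3"), times = 4),
  col3 = rep(rep(c("A", "B"), each = 3), times = 2),
  col4 = c(20, 19, 21, 25, 25, 26, 18, 19, 18, 20, 20, 20)
)

# Reference rows: the col3 == "A" measurements
a <- df[df$col3 == "A", ]

# For every row, position of the "A" row with the same col1/col2 pair
idx <- match(paste(df$col1, df$col2), paste(a$col1, a$col2))

# Subtract the matched "A" value; "A" rows match themselves and give 0
df$col5 <- df$col4 - a$col4[idx]
```

On the sample data this gives 0 0 0 5 6 5 0 0 0 2 1 2, the same values as column 5 in the question. It does not depend on the row order, but it does assume that each col1/col2 pair has exactly one "A" row.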

Answer 2

df = df[order(df$col1,df$col3,df$col2),]          ## make sure you have it ordered right
flength = length(unique(df$col2))            ## get the length of unique col2
alength = length(unique(df$col3))            ## get the length of unique col3
Avector = df[df$col3=="A","col4"]             ## get the elements of col 4 with col3="A"
sapplyVec = seq_along(unique(df$col1)) - 1   ## one block per col1 level, not per col3 level

## take the elements in Avector in sections of size flength and repeat those
## section alength times.
Avector = c(sapply(sapplyVec ,function(x) rep(Avector[c(1:flength)+(x*flength)],alength)))

This takes the vector built from col4 where col3 = "A". It then repeats chunks of size flength (3 in your case) alength times (2 in your case). From here you can add the new column as col4 - Avector:

df$col5 = df$col4 - Avector
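To make the chunk repetition easier to follow, here is a small sketch on the question's numbers only. It assumes the "A" values are already ordered by col1 and then col2, and that the data is balanced (every col1 level has every col2/col3 combination). The name tlength, for the number of col1 levels, is hypothetical and not part of the answer above.

```r
# "A" values from the question, ordered by col1 and then col2
Avector <- c(20, 19, 21, 18, 19, 18)
flength <- 3  # number of distinct col2 values (f1, f2, f3)
alength <- 2  # number of distinct col3 values (A, B)
tlength <- 2  # number of distinct col1 values (t1, t2); hypothetical name

expanded <- c(sapply(seq_len(tlength) - 1, function(x) {
  block <- Avector[seq_len(flength) + x * flength]  # the "A" values of one col1 level
  rep(block, alength)                               # repeated once per col3 level
}))

# expanded: 20 19 21 20 19 21 18 19 18 18 19 18
```

The result has one entry per row of a data frame sorted by col1, col3, col2, so it can be subtracted from col4 element by element. If the data is not balanced, the fixed chunk size no longer lines up with the rows.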

Answer 3

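Although user2864849's approach works for this example data frame, when I tried to apply it to my real data it ended up producing twice as many values as column 5 should have. I don't understand why, but it has to do with how it handles the sapply function. Revisiting the problem, I realized there is a very simple, though longer, solution that works, using user286's code as a reminder to generate new vectors from the sorted data.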

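I generated a vector of the column 4 values for each subset of column 3. Then I sorted the data frame so that its rows appear in the same order as the vectors I generated. Then I created a new vector that combines these individual vectors to produce column 5. Finally, I added column 5 to the sorted data frame.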

#Define variables - optional
col1<-as.factor(df$col1)
col2<-as.factor(df$col2)
col3<-as.factor(df$col3)
col4<-df$col4

## Create vectors of Cq values for each gene
col3Avec = df[col3=="A","col4"]  
col3Bvec = df[col3=="B","col4"]

#Create vectors of dCq values of each gene
col5A<-col3Avec-col3Avec
col5B<-col3Bvec-col3Avec

#Sort dataframe so its order matches the order of the dCq vectors
dfsort <- df[order(col3,col1,col2),]

#Add dCq vectors in correct order as new column to sorted dataframe
dfsort$col5<-c(col5A,col5B)

#Total = 6 lines of codes not including variable definitions

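Regardless of length, or with unequal sample sizes, I think this method will work. It looks like a lot of code, but as long as the variables are named consistently in the data you apply it to, it needs very little recoding.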
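For data where some samples are missing a measurement, subtracting two vectors of different lengths may not line up. One possible way to handle that case, offered only as a sketch, is to join the "A" values back onto the data with base R's merge(). The small unbalanced data frame below is invented for illustration, and the names ref and out are hypothetical.

```r
# Invented unbalanced example: (t1, f2) has a "B" row but no "A" row
df <- data.frame(
  col1 = c("t1", "t1", "t1", "t2"),
  col2 = c("f1", "f1", "f2", "f1"),
  col3 = c("A", "B", "B", "A"),
  col4 = c(20, 25, 25, 18)
)

# Reference table: one "A" value per col1/col2 pair
ref <- df[df$col3 == "A", c("col1", "col2", "col4")]
names(ref)[3] <- "ref"

# Left join, so rows without a matching "A" row are kept with ref = NA
out <- merge(df, ref, by = c("col1", "col2"), all.x = TRUE)
out$col5 <- out$col4 - out$ref

# Put the rows in a predictable order
out <- out[order(out$col1, out$col2, out$col3), ]
```

Here col5 is 0 and 5 for the two (t1, f1) rows, NA for the (t1, f2) row that has no "A" reference, and 0 for the (t2, f1) row. The NA makes a missing reference visible instead of silently shifting values.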

r

calculated-columns