Need to take advantage of R's recycling for once
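I want to index into the cells of columns df$candidate1 through df$candidate5, using the value in the first column (df$old) as the index.
In some cases candidates are empty, and then I want to cycle back to the first candidate. For example, I might have df$old = 4, df$candidate1 = 0.47, df$candidate2 = -0.14, and candidates 3/4/5 all blank. In that case I want to retrieve candidate2, because the lookup should search the recycled vector c(0.47, -0.14, 0.47, -0.14) and return its fourth element.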
old candidate1 candidate2 candidate3 candidate4 candidate5 new
1 4 0.47 -0.14 NA NA NA -0.14
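The lookup wanted for this single row can be sketched in a few lines. This is only an illustration of the wrap-around described above, assuming the empty candidates are stored as NA; the variable names `candidates`, `recycled`, and `picked` are made up for the example.

```r
# One row from the example: candidates 3 to 5 are empty (NA).
old <- 4
candidates <- c(0.47, -0.14, NA, NA, NA)

# Drop the NAs, then let rep() recycle what is left to the full width.
recycled <- rep(candidates[!is.na(candidates)], length.out = length(candidates))

# Index the recycled vector with `old`; position 4 wraps around to candidate2.
picked <- recycled[old]
print(picked)
```

Here `recycled` is c(0.47, -0.14, 0.47, -0.14, 0.47), so `picked` is -0.14, matching the `new` value in the row above.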
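Below is a reproducible example with dummy data, plus a for loop that does not recycle (but shows the basics of the process).
Q: How do I get the recycling?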
set.seed(123)
size <- 10
df <- data.frame(old = sample(1:5, size, replace = TRUE),
                 candidate1 = rnorm(size),
                 candidate2 = rnorm(size),
                 candidate3 = rnorm(size),
                 candidate4 = rnorm(size),
                 candidate5 = rnorm(size))
df$candidate5 <- ifelse(runif(size, 0, 1) < 0.5, NA, df$candidate5) # sometimes this (and other) columns are empty, I want to recycle over at candidate1
# this for loop works but it doesn't recycle
new <- vector(mode = "numeric", length = size)
for (i in 1:size) {
  new[i] <- df[i, 1 + df$old[i]]
}
df$new <- new
df$new[6] <- df$candidate1[6] # filling in the missed cell because my for loop doesn't recycle, value = 0.11068272
# conceptually, below is what I tried first, but this pulls a full vector for each row (and will overload R, so test with caution!)
df$new <- df[,2:6][df$old]
Output of the reproducible example above:
# df[6,7] (0.11068272) was filled in manually to show the desired output
old candidate1 candidate2 candidate3 candidate4 candidate5 new
1 3 -0.6868529 0.7013559 -1.13813694 -0.3059627 0.77996512 -1.13813694
2 3 -0.4456620 -0.4727914 1.25381492 -0.3804710 -0.08336907 1.25381492
3 2 1.2240818 -1.0678237 0.42646422 -0.6947070 0.25331851 -1.06782371
4 2 0.3598138 -0.2179749 -0.29507148 -0.2079173 NA -0.21797491
5 3 0.4007715 -1.0260044 0.89512566 -1.2653964 -0.04287046 0.89512566
6 5 0.1106827 -0.7288912 0.87813349 2.1689560 NA 0.11068272
7 4 -0.5558411 -0.6250393 0.82158108 1.2079620 NA 1.20796200
8 1 1.7869131 -1.6866933 0.68864025 -1.1231086 NA 1.78691314
9 2 0.4978505 0.8377870 0.55391765 -0.4028848 NA 0.83778704
10 3 -1.9666172 0.1533731 -0.06191171 -0.4666554 0.58461375 -0.06191171
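I think this works. I extract the candidate columns into a matrix, recycle the non-NA values within each row, and then use that recycled object to create the new column: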
m_recycle = as.matrix(df[, 2:6])
m_recycle = t(apply(m_recycle, 1, function(x) rep(x[!is.na(x)], length.out = 5)))
df$new = m_recycle[cbind(1:nrow(m_recycle), df$old)]
df
# old candidate1 candidate2 candidate3 candidate4 candidate5 new
# 1 2 1.7150650 1.7869131 -1.6866933 0.68864025 -1.12310858 1.78691314
# 2 4 0.4609162 0.4978505 0.8377870 0.55391765 NA 0.55391765
# 3 3 -1.2650612 -1.9666172 0.1533731 -0.06191171 NA 0.15337312
# 4 5 -0.6868529 0.7013559 -1.1381369 -0.30596266 0.77996512 0.77996512
# 5 5 -0.4456620 -0.4727914 1.2538149 -0.38047100 -0.08336907 -0.08336907
# 6 1 1.2240818 -1.0678237 0.4264642 -0.69470698 NA 1.22408180
# 7 3 0.3598138 -0.2179749 -0.2950715 -0.20791728 -0.02854676 -0.29507148
# 8 5 0.4007715 -1.0260044 0.8951257 -1.26539635 -0.04287046 -0.04287046
# 9 3 0.1106827 -0.7288912 0.8781335 2.16895597 1.36860228 0.87813349
# 10 3 -0.5558411 -0.6250393 0.8215811 1.20796200 NA 0.82158108
My data doesn't match yours, though. Maybe you didn't run set.seed?
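Since the random data differs between the two posts, a small hand-made table makes the behavior easier to check. The sketch below is not from the thread: `pick_recycled` is a hypothetical helper that reaches the same result with modular arithmetic instead of building the recycled matrix, and the `toy` data frame is invented for illustration. It assumes empty candidates are NA and that `old` holds positive whole numbers.

```r
# Hypothetical helper (not from the original thread): for each row, drop the
# NA candidates and pick element `idx`, wrapping around with modular arithmetic.
pick_recycled <- function(candidates, idx) {
  m <- as.matrix(candidates)
  vapply(seq_len(nrow(m)), function(i) {
    vals <- m[i, !is.na(m[i, ])]
    if (length(vals) == 0) return(NA_real_)  # row with no candidates at all
    vals[[(idx[i] - 1) %% length(vals) + 1]]
  }, numeric(1))
}

# Invented toy data: row 1 is the example from the question,
# row 2 wraps from position 5 back to candidate1, row 3 needs no recycling.
toy <- data.frame(old = c(4, 5, 2),
                  candidate1 = c(0.47, 1.0, 9.0),
                  candidate2 = c(-0.14, 2.0, 8.0),
                  candidate3 = c(NA, 3.0, 7.0),
                  candidate4 = c(NA, 4.0, 6.0),
                  candidate5 = c(NA, NA, 5.0))
toy$new <- pick_recycled(toy[, 2:6], toy$old)
print(toy)
```

On this toy data the `new` column comes out as -0.14, 1, and 8. Like the rep() version above, this skips NAs wherever they occur in a row, so a gap in the middle of the candidates shifts the later ones forward; whether that is the intended behavior depends on the real data.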