What is the fastest way to add a new column based on data frame entries in specific columns?
So I have this data frame:
# Name Comp1 Con2 Vis3 Tra4 Pred5 Adap6
# 1 A1 x <NA> <NA> <NA> <NA> <NA>
# 2 A2 <NA> x <NA> <NA> <NA> <NA>
# 3 B1 <NA> <NA> x <NA> <NA> <NA>
# 4 B2 <NA> <NA> <NA> <NA> x <NA>
# 5 B3 <NA> <NA> <NA> x <NA> <NA>
# 6 D2 <NA> <NA> <NA> <NA> <NA> x
# 7 F6 <NA> <NA> <NA> <NA> x <NA>
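I want to add a column to databackend that takes a value from 1 to 6, depending on which column of databackend holds the "x". With the extra column, the data frame would look like this: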
# Name Comp1 Con2 Vis3 Tra4 Pred5 Adap6 stage
# 1 A1 x <NA> <NA> <NA> <NA> <NA> 1
# 2 A2 <NA> x <NA> <NA> <NA> <NA> 2
# 3 B1 <NA> <NA> x <NA> <NA> <NA> 3
# 4 B2 <NA> <NA> <NA> <NA> x <NA> 5
# 5 B3 <NA> <NA> <NA> x <NA> <NA> 4
# 6 D2 <NA> <NA> <NA> <NA> <NA> x 6
# 7 F6 <NA> <NA> <NA> <NA> x <NA> 5
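Because the data frame in my original script is very large, I am looking for the fastest (automated) way to do this. I tried a for loop, but it takes too long.
Data: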
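databackend <- structure(list(Name = c("A1", "A2", "B1", "B2", "B3", "D2", "F6"
), Comp1 = c("x", NA, NA, NA, NA, NA, NA), Con2 = c(NA, "x",
NA, NA, NA, NA, NA), Vis3 = c(NA, NA, "x", NA, NA, NA, NA), Tra4 = c(NA,
NA, NA, NA, "x", NA, NA), Pred5 = c(NA, NA, NA, "x", NA, NA,
"x"), Adap6 = c(NA, NA, NA, NA, NA, "x", NA), stage = c(1, 2,
3, 5, 4, 6, 5)), row.names = c(NA, -7L), class = "data.frame")
# stage above is the desired result. Drop it before trying the answers, because
# databackend[-1] would otherwise include it (it is never NA, which breaks the is.na-based approaches).
databackend$stage <- NULL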
Fairly simple:
> tmp <- which(databackend[,-1] == "x", arr.ind = TRUE)
> tmp[order(tmp[,"row"]),"col"]
[1] 1 2 3 5 4 6 5
You can do this (assuming that, as in your example, every row contains an "x"):
max.col(!is.na(databackend[-1]))
[1] 1 2 3 5 4 6 5
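To write the result back as a column, and to guard against rows that contain no "x" at all, something along these lines might work. This is only a sketch: it builds a small stand-in for the example data, and the `marker_cols` name and the extra all-NA row "Z9" are assumptions added for illustration, not part of the original question.

```r
# Small stand-in for the example data; row "Z9" has no "x" (added assumption)
df <- data.frame(
  Name  = c("A1", "B2", "B3", "Z9"),
  Comp1 = c("x", NA, NA, NA),
  Con2  = NA_character_,
  Vis3  = NA_character_,
  Tra4  = c(NA, NA, "x", NA),
  Pred5 = c(NA, "x", NA, NA),
  Adap6 = NA_character_,
  stringsAsFactors = FALSE
)

# Select the marker columns by name so that no other column takes part
marker_cols <- c("Comp1", "Con2", "Vis3", "Tra4", "Pred5", "Adap6")
has_x <- !is.na(df[marker_cols])   # logical matrix, TRUE where an "x" is present

# ties.method = "first" keeps the result deterministic (the default is "random")
stage <- max.col(has_x, ties.method = "first")

# A row without any "x" is all FALSE and would still get 1, so mark it NA instead
stage[rowSums(has_x) == 0] <- NA
df$stage <- stage
df$stage
# [1]  1  5  4 NA
```

Selecting the columns by name rather than with `df[-1]` also means that re-running the code after `stage` has been added does not change the result.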
Using which and apply:
apply(databackend[-1], 1, \(x) which(x == "x"))
#[1] 1 2 3 5 4 6 5
A benchmark on the example data; max.col is the fastest:
microbenchmark::microbenchmark(
apply = apply(databackend[-1], 1, \(x) which(x == "x")),
which = {tmp <- which(databackend[,-1] == "x", arr.ind = TRUE)
tmp[order(tmp[,"row"]),"col"]},
max.col = max.col(!is.na(databackend[-1]))
)
Unit: microseconds
expr min lq mean median uq max neval
apply 149.4 165.95 232.308 196.20 216.95 2882.4 100
which 118.9 144.35 184.684 158.10 190.45 907.0 100
max.col 51.5 73.00 88.302 79.45 94.40 326.1 100
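The timings above come from a seven-row data frame, so the ranking may not carry over to a large one. A rough way to check on bigger input, using base R only, could look like the sketch below. The row count `n`, the synthetic data frame `big`, and the vector `pos` are all made up for illustration, and no timings are shown because they depend on the machine.

```r
set.seed(1)                                  # reproducible synthetic data
n   <- 1e5                                   # hypothetical row count
pos <- sample(6, n, replace = TRUE)          # which column gets the "x" in each row

# Character matrix of NAs with exactly one "x" per row
m <- matrix(NA_character_, nrow = n, ncol = 6,
            dimnames = list(NULL, c("Comp1", "Con2", "Vis3", "Tra4", "Pred5", "Adap6")))
m[cbind(seq_len(n), pos)] <- "x"
big <- data.frame(Name = paste0("id", seq_len(n)), m, stringsAsFactors = FALSE)

# Time the max.col approach
t_maxcol <- system.time(
  r1 <- max.col(!is.na(big[-1]), ties.method = "first")
)

# Time the which(arr.ind = TRUE) approach
t_which <- system.time({
  tmp <- which(big[-1] == "x", arr.ind = TRUE)
  r2  <- unname(tmp[order(tmp[, "row"]), "col"])
})

# Both approaches should recover the positions that were planted
stopifnot(all(r1 == pos), all(r2 == pos))

# Elapsed seconds for each approach
print(c(max.col = t_maxcol[["elapsed"]], which = t_which[["elapsed"]]))
```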
We can try:
> rowSums(col(databackend[-1])*(!is.na(databackend[-1])))
[1] 1 2 3 5 4 6 5
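To see why this works, it may help to look at the intermediate matrices on a tiny input. In the sketch below, `has_x` is a hand-written stand-in for `!is.na(databackend[-1])` and is not taken from the original data.

```r
# Hand-written logical mask: TRUE marks the cell that holds the "x"
has_x <- rbind(c(TRUE,  FALSE, FALSE),
               c(FALSE, FALSE, TRUE),
               c(FALSE, TRUE,  FALSE))

col(has_x)           # each cell holds its own column number
col(has_x) * has_x   # column number where TRUE, 0 everywhere else

# With exactly one TRUE per row, the row sum is that cell's column number
res <- rowSums(col(has_x) * has_x)
res
# [1] 1 3 2
```

Note that a row with no "x" would give 0 and a row with several would give the sum of their column numbers, so this approach also relies on exactly one "x" per row.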