Delete rows satisfying a condition for each column in R
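I have a data frame (df) of numeric values. I want to write a for loop that iterates over the columns. For each column, it should count the rows whose value is greater than some number (say 3), and then delete those rows entirely before moving on to the next column.
Here is what I have tried so far: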
output <- vector("double", ncol(df))
for (i in 1:ncol(df)){
output[[i]] <- length(which(df[i] >= 3))
df <- df[!df[,i] >= 3, ]
}
But I get the following error:
Error in matrix(if (is.null(value)) logical() else value, nrow = nr,
dimnames = list(rn, : length of 'dimnames' [2] not equal to array
extent
dput(head(df))
#output:
structure(list(col1 = numeric(0), col2 = numeric(0), (etc.)
NA. = integer(0)), row.names = integer(0), class = "data.frame")
col1 col2 col3 col4 col5
1 2.09 1.10 0 21.03 0.88
3 0.00 0.00 0 11.71 0.00
4 1.50 1.10 0 1.67 1.76
5 5.10 0.00 0 0.83 17.94
6 0.00 6.34 0 2.10 0.00
In the example above, the final output I am interested in is a vector containing the number of rows removed for each column: (1,1,0,2,0).
Here's a way with a for loop:
dummy_df <- df # dummy_df in case you don't want to alter original df
output <- rep(0, ncol(df)) # initialize output
for(i in seq_len(ncol(df))) {
if(nrow(dummy_df) == 0) break # loop breaks if all rows are removed
if(!any(dummy_df >= 3)) break # loop breaks if no values >= 3 remain
output[i] <- sum(dummy_df[[i]] >= 3) # rows removed because of column i
dummy_df <- dummy_df[dummy_df[[i]] < 3, , drop = FALSE] # keep the rest
}
output
[1] 3 0 1
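The same loop can be sketched as a small function and run on the data printed in the question. This is only an illustration: the names `count_and_drop` and `threshold` are hypothetical, the data frame is retyped from the question's printed table (without its original row names), and the `!is.na()` guard is an added assumption about how missing values should be treated.

```r
# Hypothetical helper: for each column in turn, count the rows whose value
# is >= threshold, then drop those rows before looking at the next column.
count_and_drop <- function(df, threshold = 3) {
  counts <- setNames(integer(ncol(df)), names(df))
  for (i in seq_along(df)) {
    # Assumption: NA values never trigger a removal
    hit <- !is.na(df[[i]]) & df[[i]] >= threshold
    counts[i] <- sum(hit)
    df <- df[!hit, , drop = FALSE]  # drop = FALSE keeps a data frame
  }
  list(counts = counts, remaining = df)
}

# Data retyped from the table in the question
df <- data.frame(
  col1 = c(2.09, 0.00, 1.50, 5.10, 0.00),
  col2 = c(1.10, 0.00, 1.10, 0.00, 6.34),
  col3 = c(0, 0, 0, 0, 0),
  col4 = c(21.03, 11.71, 1.67, 0.83, 2.10),
  col5 = c(0.88, 0.00, 1.76, 17.94, 0.00)
)

res <- count_and_drop(df)
res$counts     # per-column counts: 1 1 0 2 0
res$remaining  # the rows that survive every column
```

Indexing with `df[[i]]` yields a plain vector, so the row filter is an ordinary logical vector and the loop also runs cleanly once no rows are left.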
Another way, using apply, may be faster than the loop above:
# output excludes columns with 0 rows but can be added later if needed
table(apply(df, 1, function(x) match(TRUE, x >= 3)))
1 3
3 1
Data (thanks to @Sada93):
a b c
1 1 1 1
2 2 2 5
3 3 3 2
4 4 10 1
5 5 2 1
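If the columns with zero removed rows should also appear in the `apply` result, one possible approach is to turn the first-hit column index into a factor whose levels cover every column. The sketch below rebuilds the data above; `first_hit` and `counts` are names chosen here for illustration.

```r
# Data from the table above
df <- data.frame(a = 1:5, b = c(1, 2, 3, 10, 2), c = c(1, 5, 2, 1, 1))

# For each row, the index of the first column whose value is >= 3
# (NA when no column qualifies). A row is removed at exactly that column,
# so tabulating these indices reproduces the loop's per-column counts.
first_hit <- apply(df, 1, function(x) match(TRUE, x >= 3))

# Using a factor with one level per column keeps the zero counts;
# table() ignores the NA entries by default.
counts <- table(factor(first_hit, levels = seq_along(df), labels = names(df)))
counts
as.vector(counts)  # 3 0 1
```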
You can do it like this:
Data:
df <- data.frame(x=c(1:5,2),y=c(1,1,1,4,5,2), z= c(2,1,1,2,5,2))
Code:
removed.df <- NULL
for (i in 1:ncol(df)){
for(j in 1:nrow(df)){
if(df[j,i] > 3){
tmp.df <- df[j,]
tmp.df$index <- j
removed.df <- rbind(removed.df, tmp.df)
}
}
}
# removed.df is the rows you have deleted. Index column shows original rows deleted
removed.df <- removed.df[!duplicated(removed.df$index),]
# now you just remove the rows (index of removed.df) from df.
df[-removed.df$index,]
> df[-removed.df$index,]
x y z
1 1 1 2
2 2 1 1
3 3 1 1
6 2 2 2
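When only the remaining rows are needed and not the per-column counts, the nested loop can probably be replaced by a vectorised filter. This is a sketch under the same `> 3` condition and the same example data; `keep` and `result` are names introduced here, and it assumes the data contains no NA values.

```r
df <- data.frame(x = c(1:5, 2), y = c(1, 1, 1, 4, 5, 2), z = c(2, 1, 1, 2, 5, 2))

# df > 3 is a logical matrix; a row is kept only if none of its values exceed 3
keep <- rowSums(df > 3) == 0
result <- df[keep, , drop = FALSE]
result  # rows 1, 2, 3 and 6 remain
```

This avoids growing `removed.df` with `rbind` inside a loop, which tends to get slow on larger data frames.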