Extract certain columns from data frame R

Question

My data frame looks like this:

   x s1   s2 s3  s4
1 x1  1 1954  1 yes
2 x2  2 1955  1  no
3 x3  1 1976  2 yes
4 x4  2 1954  2 yes
5 x5  3 1943  1  no

Sample data:

df <- data.frame(x=c('x1','x2','x3','x4','x5'),
                    s1=c(1,2,1,2,3),
                    s2=c(1954,1955,1976,1954,1943), 
                    s3=c(1,1,2,2,1),
                    s4=c('yes','no','yes','yes','no'))

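Is it possible to extract the columns of the data frame that contain the integers 1 to 3? For example, the new data frame would look like:

Is it also possible to change the s1 and s3 columns to 0 or 1, depending on whether the value in the column is 1? The changed data frame would then look like: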

Answer 1

I think this is what you are expecting:

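library(dplyr)    # %>%, group_by(), mutate()
library(stringr)  # str_extract_all()

my_df <- data.frame(x=c('x1','x2','x3','x4','x5'),
             s1=c(1,2,1,2,3),
             s2=c(1954,1955,1976,1954,1943), 
             s3=c(1,1,2,2,1),
             s4=c('yes','no','yes','yes','no'))

# Collapse each column into one string. Storing the result as a new column only
# works here because the data frame has as many rows (5) as columns (5).
my_df$end <- apply(my_df, 2, function(x) paste(x, collapse = " "))
# Keep only the digits 1, 2 and 3; a column made up solely of those values is unchanged.
my_df <- my_df %>% group_by(x) %>% mutate(end2 = paste(str_extract_all(string = end, pattern = "1|2|3", simplify = TRUE), collapse = " "))
# Positions of the columns whose collapsed string survived unchanged.
my_var <- which(my_df$end == my_df$end2)
# Recode those columns: 1 stays 1, everything else becomes 0.
my_df[, my_var] <- t(apply(my_df[, my_var], 1, function(x) ifelse(test = x == 1, yes = 1, no = 0)))
my_df <- my_df[, c(1, my_var)]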

Answer 2

Base R

newdf <- df[, unique(c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))), drop = FALSE]
newdf
#    x s1 s3
# 1 x1  1  1
# 2 x2  2  1
# 3 x3  1  2
# 4 x4  2  2
# 5 x5  3  1

newdf[-1] <- lapply(newdf[-1], function(z) +(z == 1))
newdf
#    x s1 s3
# 1 x1  1  1
# 2 x2  0  1
# 3 x3  1  0
# 4 x4  0  0
# 5 x5  0  1

Walkthrough:

First, we determine which columns are numeric and contain the number 1 or 3:
```
sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z))
#     x    s1    s2    s3    s4 
# FALSE  TRUE FALSE  TRUE FALSE 
```
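This excludes any non-numeric columns, which means that a character column containing the literal "1" or "3" will not be retained. This is entirely my own inference; if you want to accept the string versions as well, remove the is.numeric(z) component.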
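To make the effect of the is.numeric(z) guard concrete, here is a minimal sketch. The data frame `df2` and its character column `s5` are hypothetical and are not part of the question's data.

```r
# Hypothetical data: s5 is a character column holding the strings "1" and "3".
df2 <- data.frame(x  = c("x1", "x2", "x3"),
                  s1 = c(1, 2, 3),
                  s5 = c("1", "3", "4"),
                  stringsAsFactors = FALSE)

# With the guard, only the numeric column s1 is flagged.
with_guard <- sapply(df2, function(z) is.numeric(z) & any(c(1, 3) %in% z))

# Without the guard, %in% matches after coercing to a common type,
# so the character column s5 is flagged as well.
without_guard <- sapply(df2, function(z) any(c(1, 3) %in% z))

print(with_guard)
print(without_guard)
```

Whether the string versions should count depends on what the data actually represents, so treat the choice of guard as a judgment call.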

Second, we extract the names of the columns that are TRUE, and prepend "x":

c("x", names(which(sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z)))))
# [1] "x"  "s1" "s3"

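Third, in case "x" is itself numeric and contains 1 or 3, we wrap the names in unique(.) so that it is not listed twice (this step is purely defensive; you may not strictly need it).
Fourth, we select those columns, defensively adding drop=FALSE so that if only a single column ends up selected, the result is still a full data.frame.
Finally, we replace only those columns (excluding the first column, "x") with 0 or 1; z == 1 returns a logical vector, and wrapping it in +(..) converts the logical values to 0 (false) or 1 (true).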
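The steps above can be sketched one at a time, using the question's sample data. The intermediate names `flags`, `keep`, `one_col_vec` and `one_col_df` are introduced here only for illustration.

```r
df <- data.frame(x  = c("x1", "x2", "x3", "x4", "x5"),
                 s1 = c(1, 2, 1, 2, 3),
                 s2 = c(1954, 1955, 1976, 1954, 1943),
                 s3 = c(1, 1, 2, 2, 1),
                 s4 = c("yes", "no", "yes", "yes", "no"))

# Step 1: one logical flag per column.
flags <- sapply(df, function(z) is.numeric(z) & any(c(1, 3) %in% z))

# Step 2: names of the flagged columns, with "x" prepended.
keep <- c("x", names(which(flags)))

# Step 3: unique() changes nothing here, but would drop a repeated "x".
keep <- unique(keep)

# Step 4: drop = FALSE keeps a data.frame even when one column is selected.
one_col_vec <- df[, "s1"]                # plain numeric vector
one_col_df  <- df[, "s1", drop = FALSE]  # still a data.frame
newdf <- df[, keep, drop = FALSE]

# Step 5: z == 1 is logical; unary plus turns TRUE/FALSE into 1/0.
newdf[-1] <- lapply(newdf[-1], function(z) +(z == 1))
print(newdf)
```

Writing as.integer(z == 1) instead of +(z == 1) gives the same values and may read more clearly to some.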

dplyr

library(dplyr)
df %>%
  select(x, where(~ is.numeric(.) & any(c(1, 3) %in% .))) %>%
  mutate(across(-x, ~ +(. == 1)))
#    x s1 s3
# 1 x1  1  1
# 2 x2  0  1
# 3 x3  1  0
# 4 x4  0  0
# 5 x5  0  1
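Both versions above keep a numeric column if it contains a 1 or a 3. If the question is instead read as "keep columns whose values all lie in 1 to 3", the predicate would need all() rather than any(). The base R sketch below contrasts the two readings; the data frame `df3` and its column `s6` are hypothetical, and this stricter reading is an assumption rather than something the question confirms.

```r
# Hypothetical data: s6 contains a 1 but also values outside 1..3.
df3 <- data.frame(x  = c("x1", "x2", "x3"),
                  s1 = c(1, 2, 3),
                  s6 = c(1, 5, 7))

# Reading used in the answer: the column contains a 1 or a 3 somewhere.
contains_1_or_3 <- sapply(df3, function(z) is.numeric(z) && any(c(1, 3) %in% z))

# Stricter reading (an assumption): every value is one of 1, 2, 3.
only_1_to_3 <- sapply(df3, function(z) is.numeric(z) && all(z %in% 1:3))

print(names(which(contains_1_or_3)))
print(names(which(only_1_to_3)))
```

On the question's sample data both predicates select s1 and s3, so the difference only shows up on columns like the hypothetical s6.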
