How to use a short script to eliminate all but one duplicate column variable based on the prefix of the column name

Question

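I would like to know how to use a short script to eliminate all duplicate column variables based on the prefix of the column name, without typing out each variable I want to remove by hand.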

For example, I created several duplicates of the mtcars$am variable in a data frame named mtcars_example_2, and then removed the original am variable from mtcars_example_2.

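I can remove every variable with the "am" prefix except am1 and store the result in a new data frame named mtcars_example_3 using the code below, but it requires typing each variable to drop by hand: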

## long way of removing all variables with the am prefix except am1
mtcars_example_3 <- 
  mtcars_example_2 %>% 
  select(
    -c(
      "am2", "am3", "am4"
    )
  )

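This seems like a long-winded process, though. Is there a quicker way that does not require me to type the name of every variable I want to drop from the data?

Is this possible? If so, how can it be done?

Thanks in advance.

The example code is as follows: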

# example data

## loads packages
library(tidyverse)

## creates mtcars_example data
mtcars_example_1 <- data.frame(mtcars)
mtcars_example_2 <- data.frame(mtcars_example_1)

## creates duplicate variables, based on am variable
mtcars_example_2$am1 <- mtcars_example_1$am
mtcars_example_2$am2 <- mtcars_example_1$am
mtcars_example_2$am3 <- mtcars_example_1$am
mtcars_example_2$am4 <- mtcars_example_1$am

## removes original variable
mtcars_example_2 <- 
  mtcars_example_2 %>% 
  select(
    -c(
      "am"
    )
  )

## long way of removing all variables with the am prefix except am1
mtcars_example_3 <- 
  mtcars_example_2 %>% 
  select(
    -c(
      "am2", "am3", "am4"
    )
  )

Answer 1

You can drop all variables that start with am and then add am1 back:

library(dplyr)

mtcars_example_2 %>% select(-starts_with('am'), am1) %>% head()

#                   mpg cyl disp  hp drat    wt  qsec vs gear carb am1
#Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0    4    4   1
#Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0    4    4   1
#Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1    4    1   1
#Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1    3    1   0
#Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0    3    2   0
#Valiant           18.1   6  225 105 2.76 3.460 20.22  1    3    1   0
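To keep the result rather than only print it, the same selection can be assigned to mtcars_example_3, as the question intended. The following is a minimal sketch, assuming dplyr is installed; the data setup repeats the question's example in condensed form, and the names short_way and long_way are used only for illustration:

```r
library(dplyr)

# Rebuild the question's example data (condensed form of the original setup)
mtcars_example_2 <- mtcars
for (nm in paste0("am", 1:4)) mtcars_example_2[[nm]] <- mtcars$am
mtcars_example_2$am <- NULL

# Short way: drop every column starting with "am", then add am1 back
short_way <- mtcars_example_2 %>% select(-starts_with("am"), am1)

# Long way from the question, kept here only for comparison
long_way <- mtcars_example_2 %>% select(-c("am2", "am3", "am4"))

# Check that both approaches leave the same columns in the same order
identical(names(short_way), names(long_way))

# Store the result under the name used in the question
mtcars_example_3 <- short_way
```

Because am1 is re-added after the exclusion, it ends up as the last column, which happens to match its position in the example data.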

Depending on your actual data, you can also use a regular expression to drop the columns.

mtcars_example_2 %>% select(-matches('am[2-4]')) %>% head()
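One thing to watch with the regular expression approach is that an unanchored pattern matches anywhere in the column name, so it may drop more than intended on wider data. The sketch below illustrates this under an assumption: the extra column team2 is hypothetical and is added only to show the difference between the unanchored and the anchored pattern.

```r
library(dplyr)

# Rebuild the question's example data (condensed form of the original setup)
mtcars_example_2 <- mtcars
for (nm in paste0("am", 1:4)) mtcars_example_2[[nm]] <- mtcars$am
mtcars_example_2$am <- NULL

# Hypothetical extra column whose name merely contains "am2"
mtcars_example_2$team2 <- 0

# Unanchored pattern: also removes team2, because "am2" occurs inside it
unanchored <- mtcars_example_2 %>% select(-matches("am[2-4]"))

# Anchored pattern: only removes columns named exactly am2, am3 or am4
anchored <- mtcars_example_2 %>% select(-matches("^am[2-4]$"))

"team2" %in% names(unanchored)
"team2" %in% names(anchored)
```

Anchoring with ^ and $ is not needed for the mtcars example itself, where no other column name contains "am", but it is a cheap safeguard when the column names are not fully known in advance.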

Answer 2

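We can also use contains(). Note that contains() matches the string anywhere in the column name, not only at the start, so it is equivalent to the prefix approach only when no other column name contains "am":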

library(dplyr)
mtcars_example_2 %>%
     select(-contains('am'), am1)
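If the column to keep should not be hard-coded either, the idea in the answers above can be wrapped in a small helper that keeps the first column with a given prefix and drops the rest. This is only a sketch in base R: the function name keep_first_with_prefix is hypothetical, and it assumes that "first" means first in column order.

```r
# Hypothetical helper: keep only the first column whose name starts with
# `prefix`, and drop every later column with the same prefix.
keep_first_with_prefix <- function(df, prefix) {
  hits <- which(startsWith(names(df), prefix))
  if (length(hits) <= 1) {
    return(df)  # nothing to drop
  }
  df[, -hits[-1], drop = FALSE]
}

# Rebuild the question's example data (condensed form of the original setup)
mtcars_example_2 <- mtcars
for (nm in paste0("am", 1:4)) mtcars_example_2[[nm]] <- mtcars$am
mtcars_example_2$am <- NULL

mtcars_example_3 <- keep_first_with_prefix(mtcars_example_2, "am")
names(mtcars_example_3)
```

The helper uses startsWith(), so it is a true prefix match and does not treat the prefix as a regular expression. It does not check that the dropped columns actually hold the same values as the one that is kept.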


automation

r

duplicates