Remove or filter values based on row

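I am trying to remove certain rows from my dataset based on their values. My dataset looks like this:

I want to filter the rows and remove the ones whose value is ".", and also split the rows that contain several rsids so that each one goes on its own row. I tried to do this with the filter function, but it gives me an error.

The command I used: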

filter(rsid_en_vcf, X1 != ".")

Error in filter(rsid_en_vcf, X1 != ".") : object 'X1' not found

My dataset is:

dput(rsid_en_vcf[1:48, 1])
c("rs782629217", "rs782403204", "rs199529001", ".", "rs147880041", 
".", ".", "rs141826009", "rs199826048", "rs200558688", "rs782114919", 
"rs41304577", ".", "rs200311430", "rs147114528", "rs200635479", 
"rs41288741", "rs782167952", "rs6560827", "rs200242637", "rs144539776", 
"rs41305669", "rs41288743", "rs41288743", "rs369736529", "rs148025238", 
"rs41298226", "rs782272071", "rs9329304", "rs9329305", "rs137895574", 
"rs142619172", "rs144154384", "rs782777737", "rs782796368", "rs782443786", 
"rs782246853", "rs150779790", "rs782304204", "rs9329306", "rs144740103", 
"rs4431953", "rs189892388;rs75953774", "rs61839057", "rs61839058", 
"rs145405488", "rs782307404", "rs782307404")

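In a regular expression, . matches any single character. An unescaped . in the filter pattern therefore matches every non-empty value, not just the rows that contain a period. To search for a literal period, either put it in a character class, [.], or escape it with a backslash, which has to be doubled inside an R string (\\.):

library(tidyverse)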

df %>% 
  filter(!str_detect(codes, "[.]"))

Or you can escape the period with a backslash (written as \\ in an R string):

df %>% 
  filter(!str_detect(codes, "\\."))

Or in base R (drop = FALSE keeps the one-column result as a data frame):

df[!grepl("\\.", df$codes), , drop = FALSE]

Or set fixed = TRUE so the pattern is treated as plain text:

df[!grepl(".", df$codes, fixed = TRUE), , drop = FALSE]

Output

                    codes
1             rs782629217
2             rs782403204
3             rs199529001
4             rs147880041
5             rs141826009
6             rs199826048
7             rs200558688
8             rs782114919
9              rs41304577
10            rs200311430
11            rs147114528
12            rs200635479
13             rs41288741
14            rs782167952
15              rs6560827
16            rs200242637
17            rs144539776
18             rs41305669
19             rs41288743
20             rs41288743
21            rs369736529
22            rs148025238
23             rs41298226
24            rs782272071
25              rs9329304
26              rs9329305
27            rs137895574
28            rs142619172
29            rs144154384
30            rs782777737
31            rs782796368
32            rs782443786
33            rs782246853
34            rs150779790
35            rs782304204
36              rs9329306
37            rs144740103
38              rs4431953
39 rs189892388;rs75953774
40             rs61839057
41             rs61839058
42            rs145405488
43            rs782307404
44            rs782307404
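
To see why the escaping matters, here is a small base R sketch that compares the patterns side by side. The vector x is a hypothetical three-value sample rather than the full dataset, and the last line shows a plain equality test as one more option that is not part of the approaches above.

```r
# Hypothetical sample values, not the asker's full data
x <- c("rs782629217", ".", "rs189892388;rs75953774")

unescaped  <- grepl(".", x)                 # regex: . matches any character
char_class <- grepl("[.]", x)               # literal period via a character class
escaped    <- grepl("\\.", x)               # literal period via a backslash escape
fixed_str  <- grepl(".", x, fixed = TRUE)   # pattern treated as plain text
exact      <- x == "."                      # whole value is exactly "."

# One row per approach, one column per element of x
print(rbind(unescaped, char_class, escaped, fixed_str, exact))
```

Only the unescaped pattern flags all three values; the other four flag just the second one. If the unwanted rows always consist of a single period, as they appear to here, the equality test is the strictest choice because it would not remove a value that merely contains a period.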

Data

df <- structure(list(codes = c("rs782629217", "rs782403204", "rs199529001", 
".", "rs147880041", ".", ".", "rs141826009", "rs199826048", "rs200558688", 
"rs782114919", "rs41304577", ".", "rs200311430", "rs147114528", 
"rs200635479", "rs41288741", "rs782167952", "rs6560827", "rs200242637", 
"rs144539776", "rs41305669", "rs41288743", "rs41288743", "rs369736529", 
"rs148025238", "rs41298226", "rs782272071", "rs9329304", "rs9329305", 
"rs137895574", "rs142619172", "rs144154384", "rs782777737", "rs782796368", 
"rs782443786", "rs782246853", "rs150779790", "rs782304204", "rs9329306", 
"rs144740103", "rs4431953", "rs189892388;rs75953774", "rs61839057", 
"rs61839058", "rs145405488", "rs782307404", "rs782307404")), class = "data.frame", row.names = c(NA, 
-48L))
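
The question also asks about putting multiple rsids on separate rows, which filtering alone does not do: the value rs189892388;rs75953774 survives as a single row. A minimal base R sketch of one way to handle it, assuming the IDs are always separated by a semicolon; sample_df, kept and split_df are hypothetical names used only for illustration.

```r
# Hypothetical three-row sample in the same shape as df (one column, codes)
sample_df <- data.frame(
  codes = c("rs4431953", ".", "rs189892388;rs75953774"),
  stringsAsFactors = FALSE
)

# Step 1: drop the rows whose value is exactly "."
kept <- sample_df[sample_df$codes != ".", , drop = FALSE]

# Step 2: split each remaining value on ";" and stack the pieces,
# so every rsid ends up on its own row
split_df <- data.frame(
  codes = unlist(strsplit(kept$codes, ";", fixed = TRUE)),
  stringsAsFactors = FALSE
)

print(split_df)
```

With tidyr loaded, separate_rows(df, codes, sep = ";") should give the same split inside a pipeline. Either way, other columns in the real dataset would need to be repeated for each new row, which separate_rows does automatically and the base R sketch above does not.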