Remove or filter values based on row
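I am trying to remove certain rows from my dataset based on their values.
My dataset is shown in the dput() output below.
I want to filter the rows and remove those whose value is ".", and also take the rows that hold several rsids, split them, and put each rsid on its own row.
I tried to do this with the filter function, but it gives me an error.
The command I used: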
filter(rsid_en_vcf, X1 != ".")
Error in filter(rsid_en_vcf, X1 != ".") : object 'X1' not found
My dataset is:
dput(rsid_en_vcf[1:48, 1])
c("rs782629217", "rs782403204", "rs199529001", ".", "rs147880041",
".", ".", "rs141826009", "rs199826048", "rs200558688", "rs782114919",
"rs41304577", ".", "rs200311430", "rs147114528", "rs200635479",
"rs41288741", "rs782167952", "rs6560827", "rs200242637", "rs144539776",
"rs41305669", "rs41288743", "rs41288743", "rs369736529", "rs148025238",
"rs41298226", "rs782272071", "rs9329304", "rs9329305", "rs137895574",
"rs142619172", "rs144154384", "rs782777737", "rs782796368", "rs782443786",
"rs782246853", "rs150779790", "rs782304204", "rs9329306", "rs144740103",
"rs4431953", "rs189892388;rs75953774", "rs61839057", "rs61839058",
"rs145405488", "rs782307404", "rs782307404")
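In a regular expression, . matches any single character. So a bare . in the filter statement matches every non-empty value instead of singling out the literal periods. To search explicitly for a literal ., either put it in a character class (i.e. [.]) or escape it with \\. :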
library(tidyverse)
df %>%
filter(!str_detect(codes, "[.]"))
Or you can escape it with \\ (the backslash itself must be doubled inside an R string):
df %>%
filter(!str_detect(codes, "\\."))
Or in base R:
df[!grepl("\\.", df$codes), ]
Or set fixed = TRUE, which treats the pattern as a plain string rather than a regex:
df[!grepl(".", df$codes, fixed = TRUE), ]
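To see the difference between these patterns, it can help to run them on a tiny vector first. This is a minimal sketch in base R; the vector x is made up for illustration and is not part of the original data.

```r
# Hypothetical toy vector: two rsids and one "." placeholder
x <- c("rs782629217", ".", "rs41304577")

# Unescaped dot: a regex wildcard, so every non-empty string matches
any_char <- grepl(".", x)

# Three equivalent ways to match a literal period
char_class <- grepl("[.]", x)
escaped    <- grepl("\\.", x)
fixed_str  <- grepl(".", x, fixed = TRUE)

print(any_char)    # TRUE TRUE TRUE
print(char_class)  # FALSE TRUE FALSE

# Keep only the entries without a literal period
kept <- x[!fixed_str]
print(kept)
```

All three literal-period variants flag only the second element, so which one to use is mostly a matter of taste.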
Output
codes
1 rs782629217
2 rs782403204
3 rs199529001
4 rs147880041
5 rs141826009
6 rs199826048
7 rs200558688
8 rs782114919
9 rs41304577
10 rs200311430
11 rs147114528
12 rs200635479
13 rs41288741
14 rs782167952
15 rs6560827
16 rs200242637
17 rs144539776
18 rs41305669
19 rs41288743
20 rs41288743
21 rs369736529
22 rs148025238
23 rs41298226
24 rs782272071
25 rs9329304
26 rs9329305
27 rs137895574
28 rs142619172
29 rs144154384
30 rs782777737
31 rs782796368
32 rs782443786
33 rs782246853
34 rs150779790
35 rs782304204
36 rs9329306
37 rs144740103
38 rs4431953
39 rs189892388;rs75953774
40 rs61839057
41 rs61839058
42 rs145405488
43 rs782307404
44 rs782307404
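The filtered output above still holds two rsids in row 39 (rs189892388;rs75953774), which the question also asked to split into separate rows. One possible way, assuming the tidyr package is installed and that ";" is the only separator in the data, is tidyr::separate_rows(). The small data frame toy below is a hypothetical stand-in for df.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for df: one "." placeholder and one multi-rsid entry
toy <- data.frame(
  codes = c("rs4431953", ".", "rs189892388;rs75953774", "rs61839057"),
  stringsAsFactors = FALSE
)

result <- toy %>%
  filter(codes != ".") %>%          # drop rows whose value is exactly "."
  separate_rows(codes, sep = ";")   # split on ";" so each rsid gets its own row

print(result)
```

Because every unwanted value here is exactly ".", a plain != comparison is enough and no regex is needed. In recent tidyr releases separate_longer_delim() is the suggested successor to separate_rows(); either should work for this case.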
Data
df <- structure(list(codes = c("rs782629217", "rs782403204", "rs199529001",
".", "rs147880041", ".", ".", "rs141826009", "rs199826048", "rs200558688",
"rs782114919", "rs41304577", ".", "rs200311430", "rs147114528",
"rs200635479", "rs41288741", "rs782167952", "rs6560827", "rs200242637",
"rs144539776", "rs41305669", "rs41288743", "rs41288743", "rs369736529",
"rs148025238", "rs41298226", "rs782272071", "rs9329304", "rs9329305",
"rs137895574", "rs142619172", "rs144154384", "rs782777737", "rs782796368",
"rs782443786", "rs782246853", "rs150779790", "rs782304204", "rs9329306",
"rs144740103", "rs4431953", "rs189892388;rs75953774", "rs61839057",
"rs61839058", "rs145405488", "rs782307404", "rs782307404")), class = "data.frame", row.names = c(NA,
-48L))
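As for the original error (object 'X1' not found), it is not caused by the regex. Two possible causes, neither confirmed by the question, are that the column is not actually named X1, or that a filter() other than dplyr's was called (for example stats::filter() when dplyr is not attached). The sketch below shows one way to check both; rsid_tbl and its column name ID are hypothetical stand-ins for rsid_en_vcf.

```r
library(dplyr)

# Hypothetical stand-in for rsid_en_vcf; the real column name is unknown
rsid_tbl <- data.frame(
  ID = c("rs782629217", ".", "rs199529001"),
  stringsAsFactors = FALSE
)

# Step 1: look up the actual column names instead of assuming X1
print(names(rsid_tbl))

# Step 2: call dplyr's filter() explicitly so stats::filter() cannot be picked up,
# and refer to the first column by its real name
first_col <- names(rsid_tbl)[1]
cleaned <- dplyr::filter(rsid_tbl, .data[[first_col]] != ".")

print(cleaned)
```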