Which files contain lines starting with NM or GE in R?
I have a list containing the lines of several files. A sample is shown below.
lines <- list(c("\"ID\",\"SIGNALINTENSITY\",\"SNR\"", "\"NM_012429\",\"7.19739265676517\",\"0.738130599770152\"",
"\"NM_003980\",\"12.4036181424743\",\"13.753593768862\"", "\"AY044449\",\"8.74973537284918\",\"1.77200602833912\"",
"\"NM_005015\",\"11.3735054810744\",\"6.76079815107347\""), c("\"ID\",\"SIGNALINTENSITY\",\"SNR\"",
"\"NM_012429\",\"7.07699512126353\",\"0.987579612646805\"", "\"NM_003980\",\"11.3172936656653\",\"8.38227473088534\"",
"\"AY044449\",\"9.2865464417786\",\"2.61149606120517\"", "\"NM_005015\",\"10.1228142794354\",\"3.98707517627092\""
), c("ID,SIGNALINTENSITY,SNR", "1,NM_012429,6.44764696592035,0.84120306786724",
"2,NM_003980,9.52604513443066,3.02404186191898", "3,AY044449,9.11930818670925,2.24361163736047",
"4,NM_005015,10.5672879852575,5.29334273442728"))
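I want to check for matches while reading the lines. I tried to find out which files have content starting with NM or GE using the following code: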
which(lapply(lines, function(x) any(grepl(paste(c("^NM_","^GE"),collapse = "|"), x, ignore.case = TRUE))) == T)
This should give the indices of all three files, but it returns integer(0). I'm not sure what I'm missing.
Try this:
lyst <- list(c("\"ID\",\"SIGNALINTENSITY\",\"SNR\"", "\"NM_012429\",\"7.19739265676517\",\"0.738130599770152\"",
"\"NM_003980\",\"12.4036181424743\",\"13.753593768862\"", "\"AY044449\",\"8.74973537284918\",\"1.77200602833912\"",
"\"NM_005015\",\"11.3735054810744\",\"6.76079815107347\""), c("\"ID\",\"SIGNALINTENSITY\",\"SNR\"",
"\"NM_012429\",\"7.07699512126353\",\"0.987579612646805\"", "\"NM_003980\",\"11.3172936656653\",\"8.38227473088534\"",
"\"AY044449\",\"9.2865464417786\",\"2.61149606120517\"", "\"NM_005015\",\"10.1228142794354\",\"3.98707517627092\""
), c("ID,SIGNALINTENSITY,SNR", "1,NM_012429,6.44764696592035,0.84120306786724",
"2,NM_003980,9.52604513443066,3.02404186191898", "3,AY044449,9.11930818670925,2.24361163736047",
"4,NM_005015,10.5672879852575,5.29334273442728"))
Assuming lyst holds the strings given in your question, you can do:
lapply(lyst, function(x) grepl("^NM|^GE", gsub('"', "", x)))
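Logic: first use gsub to remove every double quote ("), then use grepl with the ^ anchor to check whether each line starts with NM or GE. In the raw lines of the first two files, the first character is a double quote. In the third file, the first character is a row number. A pattern anchored as ^NM can therefore never match any of them, which is why your code returned integer(0).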
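You can see the failure directly by testing one raw line from the question before and after stripping the quotes. This is a minimal sketch. The variable name line is only for illustration:

```r
# One line from the first file, exactly as it appears in the question's list.
line <- "\"NM_012429\",\"7.19739265676517\",\"0.738130599770152\""

# The raw line starts with a double quote, so the anchored pattern cannot match.
grepl("^NM|^GE", line)
#> [1] FALSE

# After removing the quotes, the line starts with NM and matches.
grepl("^NM|^GE", gsub('"', "", line))
#> [1] TRUE
```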
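However, if you also want to match lines that start with a number followed by a comma (as in the third file), you can use this regular expression instead: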
lapply(lyst, function(x) grepl("^(NM|GE)|^\\d+,(NM|GE)", gsub('"', "", x)))
Output:
> lapply(lyst, function(x) grepl("^(NM|GE)|^\\d+,(NM|GE)", gsub('"', "", x)))
[[1]]
[1] FALSE TRUE TRUE FALSE TRUE
[[2]]
[1] FALSE TRUE TRUE FALSE TRUE
[[3]]
[1] FALSE TRUE TRUE FALSE TRUE
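The calls above return one logical vector per file, with one element per line. To get the file indices the question asks for, collapse the per-line results with any() and pass them to which(). This is a minimal sketch. It assumes an abbreviated copy of the question's data, keeping only the header and first data line of each file, stored in a hypothetical variable named files:

```r
# Abbreviated sample: the header and first data line of each file from the question.
files <- list(
  c("\"ID\",\"SIGNALINTENSITY\",\"SNR\"",
    "\"NM_012429\",\"7.19739265676517\",\"0.738130599770152\""),
  c("\"ID\",\"SIGNALINTENSITY\",\"SNR\"",
    "\"NM_012429\",\"7.07699512126353\",\"0.987579612646805\""),
  c("ID,SIGNALINTENSITY,SNR",
    "1,NM_012429,6.44764696592035,0.84120306786724")
)

# Starts with NM/GE, or with a row number and a comma followed by NM/GE.
pattern <- "^(NM|GE)|^\\d+,(NM|GE)"

# TRUE for a file if any of its quote-stripped lines matches the pattern.
has_match <- vapply(files, function(x) any(grepl(pattern, gsub('"', "", x))), logical(1))

which(has_match)
#> [1] 1 2 3
```

Using vapply() with logical(1) guarantees a plain logical vector, so which() never has to deal with a list.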
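Another approach is to read each file into a data frame with read.csv first and then test the ID column:
dat <- lapply(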
lines,
function(x) read.csv(text = x)
)
# [[1]]
# ID SIGNALINTENSITY SNR
# 1 NM_012429 7.197393 0.7381306
# 2 NM_003980 12.403618 13.7535938
# 3 AY044449 8.749735 1.7720060
# 4 NM_005015 11.373505 6.7607982
#
# [[2]]
# ID SIGNALINTENSITY SNR
# 1 NM_012429 7.076995 0.9875796
# 2 NM_003980 11.317294 8.3822747
# 3 AY044449 9.286546 2.6114961
# 4 NM_005015 10.122814 3.9870752
#
# [[3]]
# ID SIGNALINTENSITY SNR
# 1 NM_012429 6.447647 0.8412031
# 2 NM_003980 9.526045 3.0240419
# 3 AY044449 9.119308 2.2436116
# 4 NM_005015 10.567288 5.2933427
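The third file parses cleanly even though its data rows have one more field than its header. When the header line has one fewer field than the data rows, read.csv (through read.table) uses the first column as row names. Here is a minimal check on the first data line of that file. The name df3 is hypothetical:

```r
# Header has 3 fields, the data row has 4: the leading "1" becomes a row name.
df3 <- read.csv(text = c("ID,SIGNALINTENSITY,SNR",
                         "1,NM_012429,6.44764696592035,0.84120306786724"))

names(df3)     # the three header names: ID, SIGNALINTENSITY, SNR
rownames(df3)  # "1", taken from the extra leading field
df3$ID         # NM_012429 (character in R >= 4.0, a factor in older versions)
```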
Filter the rows:
lapply(
dat,
function(df) df[grepl("^NM_|^GE", df$ID, ignore.case = TRUE), ]
)
# [[1]]
# ID SIGNALINTENSITY SNR
# 1 NM_012429 7.197393 0.7381306
# 2 NM_003980 12.403618 13.7535938
# 4 NM_005015 11.373505 6.7607982
#
# [[2]]
# ID SIGNALINTENSITY SNR
# 1 NM_012429 7.076995 0.9875796
# 2 NM_003980 11.317294 8.3822747
# 4 NM_005015 10.122814 3.9870752
#
# [[3]]
# ID SIGNALINTENSITY SNR
# 1 NM_012429 6.447647 0.8412031
# 2 NM_003980 9.526045 3.0240419
# 4 NM_005015 10.567288 5.2933427
Or, if you only need a logical vector marking the matching rows:
lapply(
dat,
function(df) grepl("^NM_|^GE", df$ID, ignore.case = TRUE)
)
# [[1]]
# [1] TRUE TRUE FALSE TRUE
#
# [[2]]
# [1] TRUE TRUE FALSE TRUE
#
# [[3]]
# [1] TRUE TRUE FALSE TRUE
Or use grep instead of grepl to get the row indices directly:
lapply(
dat,
function(df) grep("^NM_|^GE", df$ID, ignore.case = TRUE)
)
# [[1]]
# [1] 1 2 4
#
# [[2]]
# [1] 1 2 4
#
# [[3]]
# [1] 1 2 4
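If what you ultimately need is the set of files that contain at least one matching row, you can reduce the grep() results with lengths(). This is a minimal sketch on an abbreviated subset of the question's data. The names files, dat, and idx are illustrative:

```r
# Abbreviated sample: header plus two data lines from two of the question's files.
files <- list(
  c("\"ID\",\"SIGNALINTENSITY\",\"SNR\"",
    "\"NM_012429\",\"7.19739265676517\",\"0.738130599770152\"",
    "\"AY044449\",\"8.74973537284918\",\"1.77200602833912\""),
  c("ID,SIGNALINTENSITY,SNR",
    "1,NM_012429,6.44764696592035,0.84120306786724",
    "3,AY044449,9.11930818670925,2.24361163736047")
)
dat <- lapply(files, function(x) read.csv(text = x))

# Row indices of matching IDs in each file (integer(0) when nothing matches).
idx <- lapply(dat, function(df) grep("^NM_|^GE", df$ID, ignore.case = TRUE))

# Files with at least one matching row.
which(lengths(idx) > 0)
#> [1] 1 2
```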