How do you extract data frame columns whose names contain certain text?
I have this data frame:
dput(df)
structure(list(Time = structure(1:4, .Label = c("1/29/2015 2:00",
"1/29/2015 2:10", "1/29/2015 2:20", "1/29/2015 2:30"), class = "factor"),
WTAD..SNMP..AppTier.BIGIP.SNMP.CPU.5min.avg.on.Web01.Content.Match = structure(c(1L,
1L, 1L, 1L), .Label = "n/a", class = "factor"), WTAD..SNMP..AppTier.BIGIP.SNMP.CPU.5min.avg.on.Web01.Status = structure(c(1L,
1L, 1L, 1L), .Label = "n/a", class = "factor"), WTAD..SNMP..AppTier.BIGIP.SNMP.CPU.5min.avg.on.Web01.Value = c(12L,
12L, 12L, 12L), WTAD..SNMP..AppTier.BIGIP.SNMP.Memory.on.Web01.Content.Match = structure(c(1L,
1L, 1L, 1L), .Label = "n/a", class = "factor")), .Names = c("Time",
"WTAD..SNMP..AppTier.BIGIP.SNMP.CPU.5min.avg.on.Web01.Content.Match",
"WTAD..SNMP..AppTier.BIGIP.SNMP.CPU.5min.avg.on.Web01.Status",
"WTAD..SNMP..AppTier.BIGIP.SNMP.CPU.5min.avg.on.Web01.Value",
"WTAD..SNMP..AppTier.BIGIP.SNMP.Memory.on.Web01.Content.Match"
), class = "data.frame", row.names = c(NA, -4L))
I am trying to select the columns whose names contain: CPU.5min.avg.on.*.Value
library(dplyr)
df<-select(df, Time, contains("CPU.5min.avg.on.*.Value"))
This works in R on Windows but not on Linux. Any idea what I am doing wrong?
Base R solution:
df[, c("Time", grep("CPU.5min.avg.on.*.Value", colnames(df), value = TRUE))]
dplyr solution:
select(df, Time, matches('CPU.5min.avg.on.*.Value'))
Actually, I am puzzled that your solution works on Windows. The ?select documentation says:
contains(x, ignore.case = TRUE): selects all variables whose name
contains x
matches(x, ignore.case = TRUE): selects all variables whose name
matches the regular expression x
You are passing a regular expression in your code, so it should not work with contains() on any OS.
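To see the difference without dplyr, here is a minimal base R sketch. It is only an analogy: I am assuming that grepl() with fixed = TRUE is a fair stand-in for the literal matching that contains() does, and plain grepl() for the regex matching that matches() does (case handling is ignored here). The variable names and the stricter pattern at the end are mine, not from the question.

```r
# Column names copied from the data frame in the question.
cols <- c(
  "Time",
  "WTAD..SNMP..AppTier.BIGIP.SNMP.CPU.5min.avg.on.Web01.Content.Match",
  "WTAD..SNMP..AppTier.BIGIP.SNMP.CPU.5min.avg.on.Web01.Status",
  "WTAD..SNMP..AppTier.BIGIP.SNMP.CPU.5min.avg.on.Web01.Value",
  "WTAD..SNMP..AppTier.BIGIP.SNMP.Memory.on.Web01.Content.Match"
)

pattern <- "CPU.5min.avg.on.*.Value"

# Literal substring search (analogous to contains()):
# no column name holds the characters "*." verbatim, so nothing is found.
literal_hits <- cols[grepl(pattern, cols, fixed = TRUE)]

# Regular-expression search (analogous to matches()):
# ".*" absorbs the host part "Web01", so the Value column is found.
regex_hits <- cols[grepl(pattern, cols)]

# Hypothetical stricter pattern: escape the dots so they match only a
# literal ".", and anchor at the end of the name.
strict_pattern <- "CPU\\.5min\\.avg\\.on\\..*\\.Value$"
strict_hits <- cols[grepl(strict_pattern, cols)]

print(literal_hits)
print(regex_hits)
print(strict_hits)
```

With the unescaped pattern every "." matches any character, which happens to be harmless for these column names but could pick up unintended columns in a wider data frame.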