How to extract values from string lists in R?
I want to extract the X-squared value and the p-value (numbers only) from three character vectors.
smr.txt1
[1] ""
[2] "\tPearson's Chi-squared test with Yates' continuity correction"
[3] ""
[4] "data: data$parasite and data$T1"
[5] "X-squared = 0.017361, df = 1, p-value = 0.8952"
[6] ""
smr.txt2
[1] ""
[2] "\tPearson's Chi-squared test with Yates' continuity correction"
[3] ""
[4] "data: data$parasite and data$T2"
[5] "X-squared = 2.5679e-32, df = 1, p-value = 1"
[6] ""
smr.txt3
[1] ""
[2] "\tPearson's Chi-squared test with Yates' continuity correction"
[3] ""
[4] "data: data$parasite and data$T3"
[5] "X-squared = 0.17857, df = 1, p-value = 0.6726"
[6] ""
I can easily extract these values from the first character vector using character positions:
> c1 <- as.numeric(str_sub(smr.txt1[5], 13, 20))
> c1
[1] 0.017361
> p1 <- as.numeric(str_sub(smr.txt1[5], -6))
> p1
[1] 0.8952
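But I can't really do the same for the second character vector, because the value is printed in scientific notation. I could handle the third vector the same way as the first, but is there a better approach, for example a loop that extracts only these values and puts them in a single data frame? Thanks in advance!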
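Instead of str_sub (which is position-based and will not work when the start/end positions are not constant, as in example 2), we can use str_extract with a regex lookbehind: match the label (by default "p-value = ") and extract the digits and decimal point that follow it, together with an optional negative exponent.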
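library(stringr)
# Extract the number that follows "<categ> = ".
# glue interpolates {categ} into the lookbehind; "=" needs no escaping
# (an R string containing "\=" is a parse error).
f1 <- function(x, categ = "p-value") {
  as.numeric(str_extract(x,
      glue::glue("(?<={categ} = )[0-9.]+(e-[0-9]*)?")))
}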
Testing:
> f1("X-squared = 0.017361, df = 1, p-value = 0.8952")
[1] 0.8952
> f1("X-squared = 0.017361, df = 1, p-value = 0.8952", "X-squared")
[1] 0.017361
> f1("X-squared = 2.5679e-32, df = 1, p-value = 1")
[1] 1
> f1("X-squared = 2.5679e-32, df = 1, p-value = 1", "X-squared")
[1] 2.5679e-32
> f1("X-squared = 0.17857, df = 1, p-value = 0.6726")
[1] 0.6726
> f1("X-squared = 0.17857, df = 1, p-value = 0.6726", "X-squared")
[1] 0.17857
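To get all three results into one data frame, as the question asks, the same idea can be applied to a vector of result lines. The following is a minimal base-R sketch, not the answerer's code: `extract_stat` and the `lines` vector are illustrative names, and `lines` is assumed to hold element 5 of each captured summary. It also accepts positive exponents such as `e+05`, which the pattern above does not.

```r
# Assumed input: line 5 of each captured chisq.test summary
lines <- c(
  T1 = "X-squared = 0.017361, df = 1, p-value = 0.8952",
  T2 = "X-squared = 2.5679e-32, df = 1, p-value = 1",
  T3 = "X-squared = 0.17857, df = 1, p-value = 0.6726"
)

# Illustrative helper: capture the number that follows "<label> = "
extract_stat <- function(x, label) {
  pattern <- paste0(label, " = ([0-9.]+(?:[eE][-+]?[0-9]+)?)")
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each hit is c(full match, captured number); no match yields NA
  as.numeric(vapply(
    hits,
    function(h) if (length(h) >= 2) h[2] else NA_character_,
    character(1)
  ))
}

res <- data.frame(
  test      = names(lines),
  X.squared = extract_stat(lines, "X-squared"),
  p.value   = extract_stat(lines, "p-value"),
  row.names = NULL
)
res
```

Note that when R prints a very small p-value as `p-value < 2.2e-16`, there is no `=` to match, so this sketch returns `NA` for that line.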
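Another option is to convert the string into a data.frame with the columns 'X-squared', 'df' and 'p-value', and then extract the value of the column you need: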
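f2 <- function(x, categ = "p-value") {
  # Rewrite "key = value, key = value" as one "key:value" per line (DCF format);
  # backslashes must be doubled inside R strings
  x1 <- gsub(",\\s*", "\n", gsub("\\s*=\\s*", ":", x))
  type.convert(as.data.frame(read.dcf(textConnection(x1))),
      as.is = TRUE)[[categ]]
}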
Testing:
> f2("X-squared = 0.17857, df = 1, p-value = 0.6726", "X-squared")
[1] 0.17857
> f2("X-squared = 0.017361, df = 1, p-value = 0.8952")
[1] 0.8952
> f2("X-squared = 0.017361, df = 1, p-value = 0.8952", "X-squared")
[1] 0.017361
> f2("X-squared = 2.5679e-32, df = 1, p-value = 1")
[1] 1
> f2("X-squared = 2.5679e-32, df = 1, p-value = 1", "X-squared")
[1] 2.5679e-32
> f2("X-squared = 0.17857, df = 1, p-value = 0.6726")
[1] 0.6726
> f2("X-squared = 0.17857, df = 1, p-value = 0.6726", "X-squared")
[1] 0.17857
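If only the key/value pairs are needed, the same split can be sketched without read.dcf. This is an illustrative alternative, not the answerer's method; `parse_pairs` is a made-up name, and it assumes a single string of the form "key = value, key = value" where every value is numeric.

```r
# Illustrative helper: turn "a = 1, b = 2" into a named numeric vector
parse_pairs <- function(x) {
  # Split on commas first, then split each piece on "="
  parts <- strsplit(strsplit(x, ",\\s*")[[1]], "\\s*=\\s*")
  vals <- as.numeric(vapply(parts, `[`, character(1), 2))
  names(vals) <- vapply(parts, `[`, character(1), 1)
  vals
}

v <- parse_pairs("X-squared = 2.5679e-32, df = 1, p-value = 1")
v[["X-squared"]]
v[["p-value"]]
```

Returning all fields at once avoids re-parsing the string for each value.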
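It is not clear why the list returned by chisq.test needs to be converted to strings before extracting values. The values are easier to extract directly from the chisq.test output with $ or [[: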
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("F", "M"),
party = c("Democrat","Independent", "Republican"))
Xsq <- chisq.test(M)
Xsq$p.value
#[1] 2.953589e-07
Xsq$statistic[["X-squared"]]
#[1] 30.07015
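Although it is not what you asked for, it looks like you used capture.output(.) to capture these strings. Rather than extracting text from the captured output, I suggest taking the actual numbers from the object itself.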
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("F", "M"),
party = c("Democrat","Independent", "Republican"))
Xsq <- chisq.test(M)
names(Xsq)
# [1] "statistic" "parameter" "p.value" "method" "data.name" "observed" "expected" "residuals" "stdres"
Xsq[c("statistic","p.value")]
# $statistic
# X-squared
# 30.07015
# $p.value
# [1] 2.953589e-07
Since you mention having a list, that is easy to work with as well. For example, if you have a list of test results such as
Xsq2 <- lapply(list(M, M), chisq.test)
Xsq2
# [[1]]
# Pearson's Chi-squared test
# data: X[[i]]
# X-squared = 30.07, df = 2, p-value = 2.954e-07
# [[2]]
# Pearson's Chi-squared test
# data: X[[i]]
# X-squared = 30.07, df = 2, p-value = 2.954e-07
lapply(Xsq2, `[`, c("statistic", "p.value"))
# [[1]]
# [[1]]$statistic
# X-squared
# 30.07015
# [[1]]$p.value
# [1] 2.953589e-07
# [[2]]
# [[2]]$statistic
# X-squared
# 30.07015
# [[2]]$p.value
# [1] 2.953589e-07
this can easily be converted into a data.frame with:
do.call(rbind.data.frame, lapply(Xsq2, `[`, c("statistic", "p.value")))
# statistic p.value
# 1 30.07015 2.953589e-07
# 2 30.07015 2.953589e-07
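Applied to the situation in the question, the tests can be run in a loop and the numbers collected directly, with no string parsing. This is only a sketch: the question does not show the data, so the `data` frame below is simulated, and the column names `parasite`, `T1`, `T2`, `T3` are taken from the `data:` lines of the printed output.

```r
set.seed(1)
# Simulated stand-in for the asker's data (assumption: binary factors)
data <- data.frame(
  parasite = sample(c("yes", "no"), 100, replace = TRUE),
  T1 = sample(c("a", "b"), 100, replace = TRUE),
  T2 = sample(c("a", "b"), 100, replace = TRUE),
  T3 = sample(c("a", "b"), 100, replace = TRUE)
)

vars <- c("T1", "T2", "T3")

# One chisq.test per column, each against `parasite`
tests <- lapply(vars, function(v) {
  chisq.test(table(data$parasite, data[[v]]))
})

# Pull the numeric components out of each htest object
res <- data.frame(
  variable  = vars,
  X.squared = vapply(tests, function(t) unname(t$statistic), numeric(1)),
  df        = vapply(tests, function(t) unname(t$parameter), numeric(1)),
  p.value   = vapply(tests, function(t) t$p.value, numeric(1))
)
res
```

Using vapply with numeric(1) makes the result type explicit, and the values keep full precision instead of the rounded figures shown in the printed summary.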