Convert superscript numbers in strings to scientific notation (Unicode, UTF-8)

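I imported a vector of p-values from an Excel table. The numbers are stored as Unicode strings that contain superscripts. After several hours of trying, I still can't convert them into numbers.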

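See the example below. A plain conversion with as.numeric() does not work. I also tried to capture the superscript digits with a regex, but it turns out that each superscript digit has its own Unicode code point, and there is no direct mapping to the ordinary digits.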

test <- c("0.0126", "0.000289", "4.26x10⁻¹⁴", "6.36x10⁻⁴⁸", 
          "4.35x10⁻⁹", "0.115", "0.0982", "0.000187", "0.0484", "0.000223")

as.numeric(test)
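As the question notes, the superscript digits do not form one contiguous range. ¹, ² and ³ are U+00B9, U+00B2 and U+00B3 (Latin-1 Supplement). ⁰ and ⁴ to ⁹ are U+2070 and U+2074 to U+2079, and the superscript minus is U+207B. A character-range regex therefore misses some of them.

Base R's chartr() can still map them one by one to ASCII. The sketch below is an alternative approach and is not taken from the answer. It assumes that the mantissa and exponent are always joined by a literal "x10", as in the question's data. The helper name superscript_to_numeric is hypothetical.

```r
# Superscript digits 0-9 followed by the superscript minus, written as \u
# escapes so the script does not depend on the source file's encoding.
# Note that 1, 2 and 3 (U+00B9, U+00B2, U+00B3) sit outside the U+2070 block.
sup_chars   <- "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u207b"
ascii_chars <- "0123456789-"

# Hypothetical helper: assumes "x10" (ASCII letter x) separates mantissa
# and exponent; other separators such as the multiplication sign are not handled.
superscript_to_numeric <- function(x) {
  x <- chartr(sup_chars, ascii_chars, x)  # superscripts -> ASCII, char by char
  x <- sub("x10", "e", x, fixed = TRUE)   # "4.26x10-14" -> "4.26e-14"
  as.numeric(x)
}

# The question's vector, with the superscripts spelled as \u escapes
test <- c("0.0126", "0.000289", "4.26x10\u207b\u00b9\u2074",
          "6.36x10\u207b\u2074\u2078", "4.35x10\u207b\u2079", "0.115",
          "0.0982", "0.000187", "0.0484", "0.000223")

p_vals <- superscript_to_numeric(test)
p_vals
```

Because chartr() works character by character, positive exponents such as "x10¹²" would also become "e12" without any extra code.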

Does anyone know of an R package that handles this conversion out of the box, or do I have to map the code points to digits one by one?

This format is definitely not very portable... but here is one possible solution, as an exercise...

test <- c("0.0126", "0.000289", "4.26x10⁻¹⁴", "6.36x10⁻⁴⁸",
          "4.35x10⁻⁹", "0.115", "0.0982", "0.000187", "0.0484",
          "0.000223")

library(utf8)
library(stringr)

# Normalize with compatibility mapping (NFKC): superscript digits become
# ordinary digits, i.e. everything is turned into "normal text"
testnorm <- utf8_normalize(test, map_case = TRUE, map_compat = TRUE)

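# Replace the exponent part.
# NFKC maps the superscript minus (U+207B) to U+2212 MINUS SIGN, not to the
# plain ASCII "-", so it is matched by its Unicode name \N{MINUS SIGN}
# (see [ICU regex](http://userguide.icu-project.org/strings/regexp)).
# The backslash is doubled so that R passes \N{...} through to ICU;
# a single "\N" is an invalid escape in an R string literal.
testnorm <- str_replace_all(testnorm, "x10\\N{MINUS SIGN}", "e-")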

# The strings are now in standard notation ("4.26e-14"), so as.numeric()
# parses them directly; eval(parse()) is not needed
p_vals <- as.numeric(testnorm)

# format() applies one common layout to the whole vector, so every element
# is printed with the many decimal places that the e-48 value needs...
format(p_vals, digits = 2, scientific = FALSE)
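The padding comes from format() choosing a single layout for the whole vector. To get a readable string per p-value, one option (not part of the original answer) is base R's formatC(). It formats each element on its own, and format = "g" follows C's %g rule: fixed notation for moderate magnitudes and scientific notation for very small or very large ones. The short vector below is illustrative and is not the full data.

```r
p_vals <- c(0.0126, 0.000289, 4.26e-14, 6.36e-48)

# One common layout for all elements: each string is padded to the
# precision needed by 6.36e-48
common <- format(p_vals, digits = 2, scientific = FALSE)

# Each element formatted separately with 2 significant digits
per_element <- formatC(p_vals, format = "g", digits = 2)
per_element
```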