Convert all % to decimals in R

Question

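I have a large data frame in which percentages are written as 10% rather than .1. Not every column is a percentage, but quite a few are.

Is there an elegant way to convert every % value to a decimal? I am particularly concerned that some percentages may exceed 100%, and I would like a rule that applies to the whole tibble so I don't have to work out which columns to target.

An example, in case that isn't clear. Convert this: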

tibble(cola = c("hello", "good bye", "hi there"), colb = c("10%", "20%", "100%"), colc = c(53, 67, 89),cold = c("10%", "200%", "50%") )

into this:

tibble(cola = c("hello", "good bye", "hi there"), colb = c(.10, .20, 1.0), colc = c(53, 67, 89),cold = c(.10, 2.0, .5) )

Thanks.

Answer 1

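Here is an option using mutate with across: select the columns that are of class character and (&&) contain a % in any value, then mutate across those columns, extracting the numeric part with parse_number and dividing by 100.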

library(dplyr) # 1.0.0
library(stringr)
df1 %>% 
    mutate(across(where(~ is.character(.) &&
         any(str_detect(., "%"))), ~ readr::parse_number(.)/100))
# A tibble: 3 x 4
#  cola      colb  colc  cold
#  <chr>    <dbl> <dbl> <dbl>
#1 hello      0.1    53   0.1
#2 good bye   0.2    67   2  
#3 hi there   1      89   0.5
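
The answer does not show how df1 was created, and real data often contains missing values. The following is a minimal sketch, assuming df1 is the question's tibble with a couple of NA entries added for illustration. The helper name has_percent is hypothetical. Passing na.rm = TRUE to any() keeps the predicate from returning NA for a character column that has missing values and no "%".

```r
library(dplyr)   # across() and where() need dplyr >= 1.0.0; tibble() is re-exported
library(stringr)

# Assumed input: the question's example, with two missing values added
df1 <- tibble(
  cola = c("hello", "good bye", NA),
  colb = c("10%", "20%", "100%"),
  colc = c(53, 67, 89),
  cold = c("10%", NA, "50%")
)

# Hypothetical helper: TRUE for character columns with at least one "%"
has_percent <- function(x) {
  is.character(x) && any(str_detect(x, fixed("%")), na.rm = TRUE)
}

# Convert only the columns flagged by the predicate; NA entries stay NA
out <- df1 %>%
  mutate(across(where(has_percent), ~ readr::parse_number(.x) / 100))

out
```

Because parse_number() keeps the full numeric part, values above 100% and values with decimals (for example "12.5%") are handled the same way.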

Answer 2

Write a helper function, then use mutate_if to convert columns based on their values.

is.percentage <- function(x) any(grepl("%$", x))

df1 %>%
  mutate_if(is.percentage, ~as.numeric(sub("%", "", .))/100)
## A tibble: 3 x 4
#  cola      colb  colc  cold
#  <chr>    <dbl> <dbl> <dbl>
#1 hello      0.1    53   0.1
#2 good bye   0.2    67   2  
#3 hi there   1      89   0.5
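
To see why the predicate picks the right columns, it can help to call it on plain vectors. This is only an illustrative sketch; the input vectors are made up to mirror the question's columns.

```r
# Predicate from the answer: TRUE if any element ends in "%"
is.percentage <- function(x) any(grepl("%$", x))

pct_col  <- is.percentage(c("10%", "200%"))   # TRUE: percentage strings
text_col <- is.percentage(c("hello", "hi"))   # FALSE: plain text
num_col  <- is.percentage(c(53, 67, 89))      # FALSE: grepl() coerces numbers to character

# The conversion step on its own; as.numeric() keeps decimals and values above 100%
converted <- as.numeric(sub("%", "", c("10%", "200%", "12.5%"))) / 100
converted
```

In dplyr 1.0.0 and later, mutate_if is superseded by across, so the same predicate can also be used as mutate(across(where(is.percentage), ...)).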

Answer 3

Using base R, we can find the names of the columns in which every entry ends with "%", replace the trailing "%" with "", and divide by 100.

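# Flag character columns in which every entry ends with "%"
idx <- rapply(dat, f = function(x) all(endsWith(x, "%")), classes = "character")
dat[names(idx)[idx]] <- lapply(dat[names(idx)[idx]], function(x) {
  # as.numeric() rather than as.integer(), so values like "12.5%" keep their decimals
  as.numeric(sub("%$", "", x)) / 100
})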

Result

dat
#      cola colb colc cold
#1    hello  0.1   53  0.1
#2 good bye  0.2   67  2.0
#3 hi there  1.0   89  0.5

Data

dat <-
  data.frame(
    cola = c("hello", "good bye", "hi there"),
    colb = c("10%", "20%", "100%"),
    colc = c(53, 67, 89),
    cold = c("10%", "200%", "50%")
  )
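
The base R steps can also be wrapped in a reusable function. The sketch below is one possible arrangement, not the answer's own code: the name percent_to_decimal is hypothetical, it uses vapply instead of rapply, it ignores NA entries when checking for "%", and one input value is changed to "12.5%" to show that decimals survive.

```r
# Hypothetical helper wrapping the base R approach
percent_to_decimal <- function(df) {
  # Character columns with at least one value, where every non-missing entry ends in "%"
  is_pct <- vapply(
    df,
    function(x) {
      is.character(x) && any(!is.na(x)) && all(endsWith(x[!is.na(x)], "%"))
    },
    logical(1)
  )
  if (any(is_pct)) {
    # Strip the trailing "%" and rescale to a proportion
    df[is_pct] <- lapply(df[is_pct], function(x) as.numeric(sub("%$", "", x)) / 100)
  }
  df
}

dat <- data.frame(
  cola = c("hello", "good bye", "hi there"),
  colb = c("10%", "20%", "100%"),
  colc = c(53, 67, 89),
  cold = c("12.5%", "200%", "50%"),
  stringsAsFactors = FALSE  # before R 4.0.0 strings would otherwise become factors
)

res <- percent_to_decimal(dat)
str(res)
```

Requiring every entry to end in "%" (as this answer does) is stricter than the any() test used in the other answers, so a free-text column that happens to contain one "%" is left untouched.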
