R: convert numbers saved as char in a tibble to double without introducing NAs

I want to convert the relevant columns in the following tibble to numeric (double):

# A tibble: 6 x 6
  Date       Open      High       Low      Close   Shares
  <chr>      <chr>     <chr>     <chr>     <chr>   <chr> 
1 16.04.2021 53,64     54,12     53,64     54,12   50    
2 15.04.2021 53,19     53,19     53,19     53,19   -     
3 14.04.2021 53,29     53,29     53,29     53,29   -     
4 13.04.2021 52,86     52,86     52,86     52,86   -     
5 12.04.2021 53,17     53,17     53,17     53,17   -     
6 09.04.2021 53,18     53,18     53,18     53,18   -     

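However, if I apply as.numeric to the relevant columns, NAs are introduced.

What is the most efficient way to convert the entries in the relevant columns to double without generating NAs?

Reproducible sample data: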

library(tibble)

df <- tribble(
  ~Date,       ~Open,      ~High,       ~Low,      ~Close,   ~Shares,
 "16.04.2021",  "53,64",     "54,12",     "53,64",     "54,12",   50,    
 "15.04.2021",  "53,19",     "53,19",     "53,19",     "53,19",   NA,     
 "14.04.2021",  "53,29",     "53,29",     "53,29",     "53,29",   NA,     
 "13.04.2021",  "52,86",     "52,86",     "52,86",     "52,86",   NA,     
 "12.04.2021",  "53,17",     "53,17",     "53,17",     "53,17",   NA,     
 "09.04.2021",  "53,18",     "53,18",     "53,18",     "53,18",   NA 
)

You can replace the comma with a dot and then convert to numeric. Use lapply to apply the function to multiple columns.

df[2:5] <- lapply(df[2:5], function(x) as.numeric(sub(',', '.', x)))
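
As a quick check of why the direct conversion fails and what the substitution changes, here is a minimal base R sketch on a standalone vector. The vector `x` is made up for illustration and is not taken from the data frame; `fixed = TRUE` is an optional addition that makes `sub` treat the pattern as a literal string.

```r
# Hypothetical example vector using a decimal comma
x <- c("53,64", "54,12")

# Direct conversion: "53,64" is not a valid R number, so the result is NA
# (the "NAs introduced by coercion" warning is suppressed here)
direct <- suppressWarnings(as.numeric(x))

# Replacing the comma with a dot first gives strings R can parse;
# fixed = TRUE matches "," literally instead of as a regular expression
converted <- as.numeric(sub(",", ".", x, fixed = TRUE))

print(direct)
print(converted)
```

Note that sub only replaces the first match in each string, which is enough here because each value contains a single comma.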

Using dplyr and readr:

library(dplyr)
library(readr)

df %>%
  mutate(across(Open:Close, ~parse_number(., locale = locale(decimal_mark = ","))))
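
The printed tibble in the question shows "-" in the Shares column, while the sample data already uses NA. Assuming the real data contains the "-" placeholder, a possible sketch with readr's parse_double and its `na` argument could look like this. The vectors `close` and `shares` are hypothetical stand-ins for the columns.

```r
library(readr)

# Hypothetical vectors mirroring two rows of the printed tibble
close  <- c("54,12", "53,19")
shares <- c("50", "-")

# Locale that uses a comma as the decimal mark
comma_locale <- locale(decimal_mark = ",")

# parse_double is stricter than parse_number: it does not skip stray characters
close_num <- parse_double(close, locale = comma_locale)

# Declare "-" as a missing-value marker so it becomes NA
shares_num <- parse_double(shares, na = c("", "NA", "-"))

print(close_num)
print(shares_num)
```

Also note that the mutate pipeline above only prints its result; assign it back (for example `df <- df %>% ...`) to keep the converted columns.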

The reason they cannot be converted to numeric is that , is used as the decimal separator instead of . So you can use the following code:

library(dplyr)
library(stringr)

df %>%
  mutate(across(Open:Close, ~ str_replace(., ",", ".")),
         across(Open:Close, as.numeric))

# A tibble: 6 x 6
  Date        Open  High   Low Close Shares
  <chr>      <dbl> <dbl> <dbl> <dbl>  <dbl>
1 16.04.2021  53.6  54.1  53.6  54.1     50
2 15.04.2021  53.2  53.2  53.2  53.2     NA
3 14.04.2021  53.3  53.3  53.3  53.3     NA
4 13.04.2021  52.9  52.9  52.9  52.9     NA
5 12.04.2021  53.2  53.2  53.2  53.2     NA
6 09.04.2021  53.2  53.2  53.2  53.2     NA

First, escape the "." in your regular expression.

Second, replace the comma with "." before converting to numeric:

df  %>% 
  mutate(across(2:5, ~as.numeric(gsub(",", ".", gsub("\\.", "", .)))))

Output:

  Date        Open  High   Low Close Shares
  <chr>      <dbl> <dbl> <dbl> <dbl> <chr> 
1 16.04.2021  53.6  54.1  53.6  54.1 50    
2 15.04.2021  53.2  53.2  53.2  53.2 -     
3 14.04.2021  53.3  53.3  53.3  53.3 -     
4 13.04.2021  52.9  52.9  52.9  52.9 -     
5 12.04.2021  53.2  53.2  53.2  53.2 -     
6 09.04.2021  53.2  53.2  53.2  53.2 -
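
The inner gsub in this answer only matters when the values also use a dot as a thousands separator, which the sample data does not. A small sketch with made-up values (the vector `x` is hypothetical) shows the two steps separately:

```r
# Hypothetical values: "." groups thousands, "," is the decimal mark
x <- c("1.234,56", "53,64")

# Step 1: remove every literal dot; the doubled backslash escapes the
# regex metacharacter "." so it no longer matches any character
no_groups <- gsub("\\.", "", x)

# Step 2: turn the decimal comma into a dot, then convert to double
as_num <- as.numeric(gsub(",", ".", no_groups))

print(as_num)
```

Without the escape, `gsub(".", "", x)` would delete every character and leave empty strings, which as.numeric turns into NA.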