R: converting numbers stored as character in a tibble to double introduces NAs
I would like to convert the relevant columns of the following tibble to numeric (double):
# A tibble: 6 x 6
Date Open High Low Close Shares
<chr> <chr> <chr> <chr> <chr> <chr>
1 16.04.2021 53,64 54,12 53,64 54,12 50
2 15.04.2021 53,19 53,19 53,19 53,19 -
3 14.04.2021 53,29 53,29 53,29 53,29 -
4 13.04.2021 52,86 52,86 52,86 52,86 -
5 12.04.2021 53,17 53,17 53,17 53,17 -
6 09.04.2021 53,18 53,18 53,18 53,18 -
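However, if I apply as.numeric to the relevant columns, NAs are introduced.
What is the most efficient way to convert the entries in the relevant columns to double without generating NAs?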
Reproducible sample data:
library(tibble)  # provides tribble()
df <- tribble(
~Date, ~Open, ~High, ~Low, ~Close, ~Shares,
"16.04.2021", "53,64", "54,12", "53,64", "54,12", 50,
"15.04.2021", "53,19", "53,19", "53,19", "53,19", NA,
"14.04.2021", "53,29", "53,29", "53,29", "53,29", NA,
"13.04.2021", "52,86", "52,86", "52,86", "52,86", NA,
"12.04.2021", "53,17", "53,17", "53,17", "53,17", NA,
"09.04.2021", "53,18", "53,18", "53,18", "53,18", NA
)
You can replace the comma with a dot and then convert to numeric. Use lapply to apply the function to multiple columns.
df[2:5] <- lapply(df[2:5], function(x) as.numeric(sub(',', '.', x)))
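To make the replace-then-convert step concrete, here is a minimal base R sketch on a small vector. The vector x and the result names are made up for illustration; fixed = TRUE is an optional addition so that "," is matched as a literal string rather than as a regular expression.

```r
# A small character vector in the same format as the Open column (illustrative values).
x <- c("53,64", "53,19")

# as.numeric() expects "." as the decimal mark, so these strings become NA
# (normally with a coercion warning, suppressed here).
raw <- suppressWarnings(as.numeric(x))

# Swapping the comma for a dot first lets the conversion succeed.
converted <- as.numeric(sub(",", ".", x, fixed = TRUE))
```

sub() replaces only the first match in each string, which is enough when each value contains a single decimal comma.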
Using dplyr (together with parse_number from readr):
library(dplyr)
library(readr)
df %>%
mutate(across(Open:Close, ~parse_number(., locale = locale(decimal_mark = ","))))
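As a small sketch of what the locale argument does, the snippet below parses a plain vector with readr. The input values are assumptions chosen for illustration, including one with a "." grouping mark that does not occur in the sample data.

```r
library(readr)

# Assumed example values: one plain decimal and one with a "." grouping mark.
x <- c("53,64", "1.234,56")

# With decimal_mark = ",", locale() defaults the grouping mark to ".",
# and parse_number() drops grouping marks while parsing.
de <- locale(decimal_mark = ",")
parsed <- parse_number(x, locale = de)
```

Declaring the number format through a locale avoids hand-written string replacement, which may be preferable when the same format applies to many columns.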
The reason they cannot be converted to numeric is that "," is used as the decimal separator instead of ".". So you can use the following code:
library(dplyr)
library(stringr)
df %>%
  # "\." is not a valid escape in an R string; the replacement only needs a plain "."
  mutate(across(Open:Close, ~ str_replace(., ",", ".")),
         across(Open:Close, as.numeric))
# A tibble: 6 x 6
Date Open High Low Close Shares
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 16.04.2021 53.6 54.1 53.6 54.1 50
2 15.04.2021 53.2 53.2 53.2 53.2 NA
3 14.04.2021 53.3 53.3 53.3 53.3 NA
4 13.04.2021 52.9 52.9 52.9 52.9 NA
5 12.04.2021 53.2 53.2 53.2 53.2 NA
6 09.04.2021 53.2 53.2 53.2 53.2 NA
First, escape the "." in your regular expression.
Second, replace the comma with "." before converting to numeric.
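library(dplyr)
df %>%
  # remove any literal "." first (e.g. a thousands separator), then turn the decimal comma into "."
  mutate(across(2:5, ~as.numeric(gsub(",", ".", gsub("\\.", "", .)))))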
Output:
Date Open High Low Close Shares
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 16.04.2021 53.6 54.1 53.6 54.1 50
2 15.04.2021 53.2 53.2 53.2 53.2 -
3 14.04.2021 53.3 53.3 53.3 53.3 -
4 13.04.2021 52.9 52.9 52.9 52.9 -
5 12.04.2021 53.2 53.2 53.2 53.2 -
6 09.04.2021 53.2 53.2 53.2 53.2 -
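In the output above, Shares is still a character column in which "-" marks a missing value. A possible way to convert it as well is sketched below, assuming "-" always means missing. The tibble df_chr is a hypothetical two-row miniature of the printed data, not the sample data from the question.

```r
library(dplyr)
library(tibble)

# Hypothetical miniature of the printed tibble: Shares is character, "-" means missing.
df_chr <- tibble(
  Open   = c("53,64", "53,19"),
  Shares = c("50", "-")
)

res <- df_chr %>%
  mutate(
    # Decimal comma to dot, then convert.
    Open   = as.numeric(sub(",", ".", Open, fixed = TRUE)),
    # na_if() turns "-" into NA before conversion, so as.numeric() has nothing to coerce.
    Shares = as.numeric(na_if(Shares, "-"))
  )
```

Converting the placeholder to NA explicitly keeps genuine parsing problems visible, because any remaining coercion warning then points to an unexpected value.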