Convert Mixed Unit Measurements
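I have a file with a large number of non-standardized measurements, in a mix of imperial and metric units, that I would like to standardize and republish.
A sample of the range looks like this: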
df <- data.frame(Measurements =c("1.25m", "2 Feet", "3 Inches", "5.5 cm"))
|Measurements|
|1.25m |
|2 Feet |
|3 Inches |
|5.5 cm |
I would like it to look like this:
|Measurements|MM_Conversion|
|1.25m       |1250mm       |
|2 Feet      |609.6mm      |
|3 Inches    |76.2mm       |
|5.5 cm      |55mm         |
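I can't use measurements::conv_unit or units::set_unit directly, because both seem to require numeric input values. Is there a straightforward way to parse the value and the unit string, and convert accordingly?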
Edit 1: I ran into the problem that conv_unit cannot convert NA values. How would you work around it if the initial vector were instead: df <- data.frame(Measurements = c(NA, "1.25m", "2 Feet", "3 Inches", "5.5 cm"))?
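This can be done (easily), but you first have to determine the units in your measurements, because measurements::conv_unit only accepts the following length units: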
# accepted units
# $length
# [1] "angstrom" "nm" "um" "mm" "cm" "dm" "m" "km" "inch" "ft" "yd" "fathom" "mi" "naut_mi"
# [15] "au" "light_yr" "parsec" "point"
So "Inches" has to become "inch", and "Feet" should become "ft" (perform some regex magic ;-) ).. but then...
library(tidyverse)
library(measurements)  # provides conv_unit()
df <- data.frame( Measurements = c( "1.25m", "2 ft", "3 inch", "5.5 cm" ) )
df %>%
  # extract the numeric and the unit parts from the string
  mutate( num_part = as.numeric( stringr::str_extract( Measurements, "\\d+\\.*\\d*" ) ),
          unit_part = stringr::str_extract( Measurements, "[a-zA-Z]+" ) ) %>%
  # perform a rowwise operation
  rowwise() %>%
  # convert the units to mm, row by row
  mutate( in_mm = conv_unit( num_part, unit_part, "mm" ) )
# Source: local data frame [4 x 4]
# Groups: <by row>
# # A tibble: 4 x 4
# Measurements num_part unit_part in_mm
# <fct> <dbl> <chr> <dbl>
# 1 1.25m 1.25 m 1250
# 2 2 ft 2 ft 610.
# 3 3 inch 3 inch 76.2
# 4 5.5 cm 5.5 cm 55
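The "regex magic" for unit names is left open above. One way to do it, as a minimal base-R sketch, is a plain lookup table rather than a regex. The helper name normalize_unit and the alias list are assumptions for illustration and are not part of the measurements package; extend the list to cover whatever spellings occur in the real file.

```r
# Hypothetical helper: map free-text unit names onto the abbreviations
# that measurements::conv_unit accepts for lengths.
normalize_unit <- function(unit) {
  # ignore case and surrounding whitespace, so "Feet" and " feet " match
  unit <- tolower(trimws(unit))
  # spelled-out or variant names -> accepted abbreviations (assumed list)
  aliases <- c(feet = "ft", foot = "ft", ft = "ft",
               inches = "inch", inch = "inch", `in` = "inch",
               m = "m", cm = "cm", mm = "mm")
  # unknown or missing units come back as NA instead of raising an error
  unname(aliases[unit])
}

normalize_unit(c("Feet", "Inches", "cm", NA))
```

Returning NA for unrecognized units makes it easy to list the rows that still need a new alias before any conversion is attempted.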
We can use extract from tidyr to separate the values and units, then use map2 to feed them into conv_unit:
df <- data.frame(Measurements = c(NA, "1.25m", "2 Feet", "3 Inches", "5.5 cm"))
library(tidyverse)
library(stringr)
library(measurements)
df %>%
  # split each string into a numeric value and a unit name
  extract(Measurements, c("value", "unit"),
          regex = "^([\\d.]+)\\s*([[:alpha:]]+)$",
          remove = FALSE, convert = TRUE) %>%
  # abbreviate the units for conv_unit, then convert; NA rows are skipped
  mutate(unit = str_replace_all(unit, c(Feet="ft", Inches="inch")),
         MM_Conversion = paste0(map2(value, unit, ~if(!is.na(.x)) conv_unit(.x, .y, "mm") else NA), "mm"))
Result:
Measurements value unit MM_Conversion
1 <NA> NA <NA> NAmm
2 1.25m 1.25 m 1250mm
3 2 Feet 2.00 ft 609.6mm
4 3 Inches 3.00 inch 76.2mm
5 5.5 cm 5.50 cm 55mm
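Note that the first row in this result is the string "NAmm", not a missing value, because paste0 turns NA into text. If a true NA is preferred, the suffix can be added conditionally. The sketch below is base R only; format_mm is a hypothetical helper name, not a function from any of the packages used here.

```r
# Hypothetical helper: append "mm" but keep missing values missing
format_mm <- function(x) {
  ifelse(is.na(x), NA_character_, paste0(x, "mm"))
}

format_mm(c(NA, 1250, 609.6, 76.2, 55))
```

With this, is.na() still identifies the unconverted rows after formatting.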
Or use filter if NAs should not appear in the final output:
df %>%
  extract(Measurements, c("value", "unit"),
          regex = "^([\\d.]+)\\s*([[:alpha:]]+)$",
          remove = FALSE, convert = TRUE) %>%
  # drop the missing measurements before converting
  filter(!is.na(Measurements)) %>%
  mutate(unit = str_replace_all(unit, c(Feet="ft", Inches="inch")),
         MM_Conversion = paste0(map2(value, unit, ~conv_unit(.x, .y, "mm")), "mm"))
Result:
Measurements value unit MM_Conversion
1 1.25m 1.25 m 1250mm
2 2 Feet 2.00 ft 609.6mm
3 3 Inches 3.00 inch 76.2mm
4 5.5 cm 5.50 cm 55mm
Note how I manually abbreviated the original units so that conv_unit would work. If the original units are already abbreviated, that step can be skipped.
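For comparison, here is a package-free alternative that swaps conv_unit for a small table of conversion factors. It is vectorized, so it needs no rowwise() or map2, and NA inputs simply yield NA. This is only a sketch under stated assumptions: the names mm_per_unit and to_mm are hypothetical, and only the units that appear in the question are covered.

```r
# Millimetres per unit (1 inch = 25.4 mm, 1 ft = 12 inch = 304.8 mm).
# Hypothetical lookup table; add further units as needed.
mm_per_unit <- c(mm = 1, cm = 10, m = 1000,
                 inch = 25.4, inches = 25.4,
                 ft = 304.8, foot = 304.8, feet = 304.8)

# Hypothetical helper: parse strings such as "2 Feet" and convert to mm.
to_mm <- function(x) {
  x <- as.character(x)  # also copes with factor columns
  pattern <- "^\\s*([0-9]*\\.?[0-9]+)\\s*([A-Za-z]+)\\s*$"
  ok <- !is.na(x) & grepl(pattern, x)
  value <- rep(NA_real_, length(x))
  unit <- rep(NA_character_, length(x))
  value[ok] <- as.numeric(sub(pattern, "\\1", x[ok]))
  unit[ok] <- tolower(sub(pattern, "\\2", x[ok]))
  # unknown units and unparseable strings give NA rather than an error
  unname(value * mm_per_unit[unit])
}

df <- data.frame(Measurements = c(NA, "1.25m", "2 Feet", "3 Inches", "5.5 cm"))
df$MM_Conversion <- to_mm(df$Measurements)
df
```

The trade-off is that the factor table has to be maintained by hand, whereas conv_unit already knows every unit in its accepted list.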