Mutate factor to new variable in case_when() statement
Data setup
I have a dataset that looks somewhat like the simple data frame below:
library(dplyr)  # provides tibble(), mutate(), case_when() and %>%

CAD_EXCHANGE <- 1.34
EUR_EXCHANGE <- 0.88
df <- tibble(
  shipment = c("A", "B", "C", "D", "E"),
  invoice = c(rep(500, 5)),
  currency = factor(c("USD", "EUR", "CAD", NA, "SDD"))
)
df
# A tibble: 5 x 3
  shipment invoice currency
  <chr>      <dbl> <fct>
1 A            500 USD
2 B            500 EUR
3 C            500 CAD
4 D            500 NA
5 E            500 SDD
levels(df$currency)
[1] "CAD" "EUR" "SDD" "USD"
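End goal
I'm trying to convert invoices in some common foreign currencies (EUR and CAD) to USD, but not all of them, and not where the currency is missing (i.e. SDD and NA should be left alone). My final data frame should look like this: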
# A tibble: 5 x 5
  shipment invoice currency invoice_converted currency_converted
  <chr>      <dbl> <fct>                <dbl> <fct>
1 A            500 USD                    500 USD
2 B            500 EUR                    568 USD
3 C            500 CAD                    373 USD
4 D            500 NA                     500 NA
5 E            500 SDD                    500 SDD
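The converted amounts in the target table follow from dividing each invoice by its exchange rate and rounding to the nearest whole dollar. A quick check of that arithmetic, using the rates from the setup (the names eur_in_usd and cad_in_usd are only for illustration):

```r
CAD_EXCHANGE <- 1.34
EUR_EXCHANGE <- 0.88

# 500 / 0.88 is about 568.18, which rounds to 568
eur_in_usd <- round(500 / EUR_EXCHANGE)

# 500 / 1.34 is about 373.13, which rounds to 373
cad_in_usd <- round(500 / CAD_EXCHANGE)
```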
Trial 1 -- doesn't work
I may need to convert more than these few currencies in the future, so I used a case_when() statement. This was my first attempt:
df_USD1 <- df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = case_when(
      currency == "EUR" ~ "USD",
      currency == "CAD" ~ "USD",
      TRUE ~ currency
    )
  )
Error: Problem with `mutate()` column `currency_converted`.
i `currency_converted = case_when(...)`.
x must be a character vector, not a `factor` object.
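From the above, I understand that I'm mixing character and factor values when assigning to currency_converted, because my default case is TRUE ~ currency (and currency is a factor). So I tried assigning with factors only...
Trial 2 -- works, but not reliable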
df_USD2 <- df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = case_when(
      currency == "EUR" ~ currency[1],
      currency == "CAD" ~ currency[1],
      TRUE ~ currency
    )
  )
It works, but only because USD happens to be the first element in the setup for this question, and I can't rely on that.
> df$currency
[1] USD EUR CAD <NA> SDD
Levels: CAD EUR SDD USD
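One way to remove Trial 2's dependence on row order, sketched here as an assumption rather than something from the original post, is to build the replacement as a length-one factor that shares the levels of currency instead of picking currency[1]. The name usd is hypothetical, and the sketch assumes "USD" is already one of the levels of currency.

```r
library(dplyr)

df <- tibble(
  shipment = c("A", "B", "C", "D", "E"),
  invoice = rep(500, 5),
  currency = factor(c("USD", "EUR", "CAD", NA, "SDD"))
)

# Hypothetical helper value: a single "USD" with the same levels as
# df$currency, so every branch of case_when() returns the same type.
# Assumes "USD" is already a level of df$currency.
usd <- factor("USD", levels = levels(df$currency))

df_usd <- df %>%
  mutate(
    currency_converted = case_when(
      currency == "EUR" ~ usd,
      currency == "CAD" ~ usd,
      TRUE ~ currency  # everything else, including NA, is kept as-is
    )
  )
```

Because usd no longer comes from a particular row, reordering the rows of df should not change the result.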
Trial 3 -- doesn't work
I thought I could get the USD value by subsetting the factor some other way, but this doesn't work:
df_USD3 <- df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = case_when(
      currency == "EUR" ~ df$currency[df$currency == "USD"],
      currency == "CAD" ~ df$currency[df$currency == "USD"],
      TRUE ~ currency
    )
  )
Error: Problem with `mutate()` column `currency_converted`.
i `currency_converted = factor(...)`.
x `currency == "EUR" ~ df$currency[df$currency == "USD"]`, `currency == "CAD" ~ df$currency[df$currency == "USD"]` must be length 5 or one, not 2.
Run `rlang::last_error()` to see where the error occurred.
It seems to be because an NA is returned as well...
> df$currency[df$currency == "USD"]
[1] USD <NA>
Levels: CAD EUR SDD USD
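...because if I go back to the original df and replace the NA with some other currency, it works. But obviously I need to be able to keep the NA where it belongs.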
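The extra element comes from how R handles NA in logical subsetting: NA == "USD" evaluates to NA, and indexing with NA yields an NA element. A minimal base R sketch of that behaviour (cur, with_na and no_na are illustrative names, not from the post); which() and %in% are two standard ways to drop the NA from the index:

```r
cur <- factor(c("USD", "EUR", "CAD", NA, "SDD"))

# The comparison is NA wherever cur is NA
cur == "USD"
# TRUE FALSE FALSE NA FALSE

# Indexing with an NA keeps an NA element, so the result has length 2
with_na <- cur[cur == "USD"]

# which() returns only the positions that are TRUE, so the NA is dropped
no_na <- cur[which(cur == "USD")]

# %in% never returns NA, so it has the same effect here
also_no_na <- cur[cur %in% "USD"]
```

With a length-one result, the right-hand side in Trial 3 would satisfy the "length 5 or one" requirement from the error message, although it would still depend on exactly one row being USD.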
I feel like there's some really nice way to do this, but despite reading up on factors and trying different approaches, I'm missing it. Help?
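case_when does not do automatic type conversion, i.e. currency is a factor while the values returned by the other conditions in case_when are plain character. Therefore, we can coerce currency to character so that all return values have the same class, and it should work: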
library(dplyr)
df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = case_when(
      currency == "EUR" ~ "USD",
      currency == "CAD" ~ "USD",
      TRUE ~ as.character(currency)
    )
  )
-output
# A tibble: 5 × 5
  shipment invoice currency invoice_converted currency_converted
  <chr>      <dbl> <fct>                <dbl> <chr>
1 A            500 USD                    500 USD
2 B            500 EUR                    568 USD
3 C            500 CAD                    373 USD
4 D            500 <NA>                   500 <NA>
5 E            500 SDD                    500 SDD
If we want to keep it as a factor, wrap the result of case_when in factor(), or use fct_recode directly instead of case_when:
library(forcats)
df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = fct_recode(currency, USD = "EUR", USD = "CAD")
  )
-output
# A tibble: 5 × 5
  shipment invoice currency invoice_converted currency_converted
  <chr>      <dbl> <fct>                <dbl> <fct>
1 A            500 USD                    500 USD
2 B            500 EUR                    568 USD
3 C            500 CAD                    373 USD
4 D            500 <NA>                   500 <NA>
5 E            500 SDD                    500 SDD
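The other option mentioned above, wrapping the case_when result in factor(), is not spelled out in the answer. A minimal sketch of what it might look like, using the same df as the question (the name df_fct is only for illustration):

```r
library(dplyr)

df <- tibble(
  shipment = c("A", "B", "C", "D", "E"),
  invoice = rep(500, 5),
  currency = factor(c("USD", "EUR", "CAD", NA, "SDD"))
)

df_fct <- df %>%
  mutate(
    # case_when() works on character values; factor() converts the
    # combined result back to a factor afterwards
    currency_converted = factor(
      case_when(
        currency == "EUR" ~ "USD",
        currency == "CAD" ~ "USD",
        TRUE ~ as.character(currency)
      )
    )
  )

levels(df_fct$currency_converted)
```

Note that factor() builds its levels from the values that actually occur, so the new column only has the levels "SDD" and "USD" rather than the four levels of the original currency column. Passing an explicit levels argument to factor() is one way to control that.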