Mutate factor to new variable in case_when() statement

Data setup

I have a dataset that looks a bit like this simple data frame:

library(dplyr)  # provides tibble(), mutate(), case_when() and %>%

CAD_EXCHANGE <- 1.34
EUR_EXCHANGE <- 0.88
 
df <- tibble(
  shipment = c("A", "B", "C", "D", "E"),
  invoice = c(rep(500, 5)),
  currency = factor(c("USD", "EUR", "CAD", NA, "SDD"))
)
 
df
# A tibble: 5 x 3
  shipment invoice currency
  <chr>      <dbl> <fct>   
1 A            500 USD     
2 B            500 EUR     
3 C            500 CAD     
4 D            500 NA      
5 E            500 SDD     

levels(df$currency)
[1] "CAD" "EUR" "SDD" "USD"

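End goal

I'm trying to convert invoices in a few common foreign currencies (EUR and CAD) to USD, but not every currency, and not rows where the currency is missing (i.e. SDD and NA should be left alone). My final data frame should look like this: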

# A tibble: 5 x 5
  shipment invoice currency invoice_converted currency_converted
  <chr>      <dbl> <fct>                <dbl> <fct>             
1 A            500 USD                    500 USD               
2 B            500 EUR                    568 USD               
3 C            500 CAD                    373 USD               
4 D            500 NA                     500 NA                
5 E            500 SDD                    500 SDD               
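
As a quick check of the target values, the two converted amounts can be reproduced by hand from the rates in the setup. This is only the arithmetic behind the expected table; the variable names `eur_to_usd` and `cad_to_usd` are made up for illustration.

```r
CAD_EXCHANGE <- 1.34
EUR_EXCHANGE <- 0.88
invoice <- 500

# 500 / 0.88 = 568.18..., which rounds to 568
eur_to_usd <- round(invoice / EUR_EXCHANGE)

# 500 / 1.34 = 373.13..., which rounds to 373
cad_to_usd <- round(invoice / CAD_EXCHANGE)

print(c(EUR = eur_to_usd, CAD = cad_to_usd))
```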

Trial 1 -- doesn't work

I may need to convert more currencies than these in the future, so I used a case_when() statement. This was my first attempt:

df_USD1 <- df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = case_when(currency == "EUR" ~ "USD",
                                   currency == "CAD" ~ "USD",
                                   TRUE ~ currency)
  )

Error: Problem with `mutate()` column `currency_converted`.
i `currency_converted = case_when(...)`.
x must be a character vector, not a `factor` object.

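From the above, I understand that I'm mixing characters and factors when assigning to currency_converted, because my default is TRUE ~ currency (and currency is a factor). So I tried assigning using only factors...

Trial 2 -- works, but not reliable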

df_USD2 <- df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = case_when(
      currency == "EUR" ~ currency[1],
      currency == "CAD" ~ currency[1],
      TRUE ~ currency)
  )

It works, but only because in the setup for this question USD happens to be in the first position, and I can't rely on that.

> df$currency
[1] USD  EUR  CAD  <NA> SDD 
Levels: CAD EUR SDD USD
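
For reference, one way to stay with factors without depending on where USD sits in the column is to build a one-element USD factor that shares the column's levels. This is a minimal sketch, not something from the original attempts; `df_sketch` is a hypothetical name, and it assumes "USD" is already one of the levels of `currency`.

```r
library(dplyr)

df <- tibble(
  shipment = c("A", "B", "C", "D", "E"),
  invoice = rep(500, 5),
  currency = factor(c("USD", "EUR", "CAD", NA, "SDD"))
)

df_sketch <- df %>%
  mutate(
    currency_converted = case_when(
      # A length-one factor with the same levels as the column,
      # so every branch returns the same type
      currency %in% c("EUR", "CAD") ~ factor("USD", levels = levels(currency)),
      # Everything else, including NA, keeps its original value
      TRUE ~ currency
    )
  )

df_sketch
```

If "USD" were not among the existing levels, `factor("USD", levels = ...)` would yield NA, so that assumption matters.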

Trial 3 -- doesn't work

I thought I could get the USD factor value by subsetting instead, but that doesn't work:

df_USD3 <- df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = case_when(
      currency == "EUR" ~ df$currency[df$currency == "USD"],
      currency == "CAD" ~ df$currency[df$currency == "USD"],
      TRUE ~ currency
    )
  )

Error: Problem with `mutate()` column `currency_converted`.
i `currency_converted = factor(...)`.
x `currency == "EUR" ~ df$currency[df$currency == "USD"]`, `currency == "CAD" ~ df$currency[df$currency == "USD"]` must be length 5 or one, not 2.
Run `rlang::last_error()` to see where the error occurred.

It seems to be because an NA is returned along with USD...

> df$currency[df$currency == "USD"]
[1] USD  <NA>
Levels: CAD EUR SDD USD

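...because if I go back to the original df and replace the NA with some other currency, it works. But obviously I need to be able to keep the NA where it belongs.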
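
The extra NA comes from how logical subsetting treats missing values: an NA in the index produces an NA in the result. A small sketch of that behaviour, using a standalone `currency` vector rather than the data frame column, with two common ways to drop the NA:

```r
currency <- factor(c("USD", "EUR", "CAD", NA, "SDD"))

# The comparison is NA for the missing entry, and an NA index yields an NA element
with_na <- currency[currency == "USD"]

# which() keeps only the positions where the comparison is TRUE
via_which <- currency[which(currency == "USD")]

# %in% returns FALSE (never NA) for missing values
via_in <- currency[currency %in% "USD"]

print(length(with_na))
print(length(via_which))
print(length(via_in))
```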

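I feel like there is some really neat way to do this, but despite reading up on factors and trying different approaches, I'm missing it. Help?

case_when doesn't do automatic type conversion, i.e. currency is a factor while the returns of the other conditions in case_when are just character. Hence, we can coerce currency to character so that all the returns have the same class, and it should work.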

library(dplyr)
df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ), currency_converted = case_when(currency == "EUR" ~ "USD",
                                   currency == "CAD" ~ "USD",
                                   TRUE ~ as.character(currency)))

-output

# A tibble: 5 × 5
  shipment invoice currency invoice_converted currency_converted
  <chr>      <dbl> <fct>                <dbl> <chr>             
1 A            500 USD                    500 USD               
2 B            500 EUR                    568 USD               
3 C            500 CAD                    373 USD               
4 D            500 <NA>                   500 <NA>              
5 E            500 SDD                    500 SDD             
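
To see the mismatch that the coercion resolves, the classes of the two kinds of return value can be inspected directly. This is a small illustrative check on a standalone `currency` vector, not part of the solution itself.

```r
currency <- factor(c("USD", "EUR", "CAD", NA, "SDD"))

# The literal on the right-hand side of the first two conditions
class_literal <- class("USD")

# The column used in the fallback condition
class_column <- class(currency)

# as.character() returns the labels (not the integer codes) and keeps NA as NA
labels_only <- as.character(currency)

print(class_literal)
print(class_column)
print(labels_only)
```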

If we want to keep it as a factor, wrap the case_when in factor, or use fct_recode directly instead of case_when

library(forcats)
df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ), currency_converted = fct_recode(currency, USD = "EUR", USD = "CAD"))

-output

# A tibble: 5 × 5
  shipment invoice currency invoice_converted currency_converted
  <chr>      <dbl> <fct>                <dbl> <fct>             
1 A            500 USD                    500 USD               
2 B            500 EUR                    568 USD               
3 C            500 CAD                    373 USD               
4 D            500 <NA>                   500 <NA>              
5 E            500 SDD                    500 SDD
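
The other option mentioned above, wrapping the case_when in factor, is not shown in code. A minimal sketch of what it could look like follows; `df_wrapped` is a hypothetical name, and the two conditions are collapsed into one `%in%` test for brevity.

```r
library(dplyr)

df <- tibble(
  shipment = c("A", "B", "C", "D", "E"),
  invoice = rep(500, 5),
  currency = factor(c("USD", "EUR", "CAD", NA, "SDD"))
)

df_wrapped <- df %>%
  mutate(
    # Build a character result first, then convert the whole column to a factor
    currency_converted = factor(
      case_when(
        currency %in% c("EUR", "CAD") ~ "USD",
        TRUE ~ as.character(currency)
      )
    )
  )

levels(df_wrapped$currency_converted)
```

Because factor() derives its levels from the values actually present, the new column only has the levels SDD and USD; pass an explicit `levels` argument if a specific level set or order is needed.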