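How to replace NA in a dataframe with a specific value using the results of another column and taking into account conditions of another column?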

Question

I have a dataframe with 9 columns and more than 4000 observations. For this question I will use a simpler dataframe (I use the tidyverse library).

Suppose I have the following dataframe:

library(tidyverse)
df <- tibble(Product = c("Bread","Oranges","Eggs","Bananas","Whole Bread" ),
             Weight = c(NA, 1, NA, NA, NA),
             Units = c(2,6,1,2,1),
             Price = c(1,3.5,0.5,0.75,1.5))
df

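I want to replace the NA values in the Weight column with a number computed from Units, depending on the word that appears in the Product column. Basically, the rules are: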

Replace NA in Weight with 2.5 * Units if Product contains the word "Bread". Replace it with 1 if Product contains the word "Eggs".

The problem is that I don't know how to write this in R. I tried the following code, which a kind user gave me for a similar question:

df <- df %>%
mutate(Weight = case_when(Product == "bread" & is.na(Weight) ~ 0.25*Units))

But it doesn't work: it doesn't account for the fact that the rule must also apply when my dataframe contains "Whole Bread".
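A quick check of why the attempt above misses rows (a small illustrative sketch, rebuilding only the relevant vectors): == compares whole strings exactly and case-sensitively, so "bread" matches neither "Bread" nor "Whole Bread", whereas a pattern match finds "Bread" anywhere in the string. Also, case_when() returns NA for any row that matches no condition, so without a TRUE ~ Weight fallback the existing weight of Oranges is lost.

```r
library(dplyr)
library(stringr)

products <- c("Bread", "Oranges", "Eggs", "Bananas", "Whole Bread")
weight   <- c(NA, 1, NA, NA, NA)

# == compares whole strings, case-sensitively: no row equals "bread"
exact <- products == "bread"

# str_detect() looks for the pattern anywhere in the string, ignoring case here
partial <- str_detect(products, regex("bread", ignore_case = TRUE))

print(exact)
print(partial)

# case_when() with no TRUE ~ ... fallback gives NA to every unmatched row,
# including Oranges, which already had a weight of 1
no_default <- case_when(is.na(weight) & partial ~ 2.5)
print(no_default)
```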

Does anyone know how to do this?

Answer 1

Some product names are not exact matches (e.g. "Whole Bread"), so use str_detect:

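library(dplyr)
library(stringr)
df %>%
  mutate(Weight = case_when(
    # Any product containing "Bread" in any case, e.g. "Whole Bread"
    is.na(Weight) & str_detect(Product, regex("Bread", ignore_case = TRUE)) ~ 2.5 * Units,
    # Exact match for "Eggs"
    is.na(Weight) & Product == "Eggs" ~ Units,
    # Keep every other value (including remaining NAs) unchanged
    TRUE ~ Weight
  ))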

Output:

# A tibble: 5 × 4
  Product     Weight Units Price
  <chr>        <dbl> <dbl> <dbl>
1 Bread          5       2  1   
2 Oranges        1       6  3.5 
3 Eggs           1       1  0.5 
4 Bananas       NA       2  0.75
5 Whole Bread    2.5     1  1.5
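For comparison, a minimal base-R sketch of the same rules, not part of the original answer and assuming the example data rebuilt as a plain data.frame: grepl(..., ignore.case = TRUE) plays the role of str_detect(), and each assignment only touches rows whose Weight is still NA. Like the answer, it fills Eggs with Units (which is 1 in this data).

```r
df <- data.frame(Product = c("Bread", "Oranges", "Eggs", "Bananas", "Whole Bread"),
                 Weight  = c(NA, 1, NA, NA, NA),
                 Units   = c(2, 6, 1, 2, 1),
                 Price   = c(1, 3.5, 0.5, 0.75, 1.5))

# Missing Weight and Product contains "bread" in any case
is_bread <- is.na(df$Weight) & grepl("bread", df$Product, ignore.case = TRUE)
df$Weight[is_bread] <- 2.5 * df$Units[is_bread]

# Missing Weight and Product is exactly "Eggs"
is_eggs <- is.na(df$Weight) & df$Product == "Eggs"
df$Weight[is_eggs] <- df$Units[is_eggs]

print(df)
```

Rows matching neither rule (Bananas here) keep their NA, just as with the TRUE ~ Weight fallback in the dplyr version.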
