How to replace NA in a dataframe column with a specific value, using the values of another column and taking into account conditions on a third column?
I have a dataframe made up of 9 columns with more than 4000 observations. For this question I will present a simpler dataframe (I use the tidyverse library).
Say I have the following dataframe:
library(tidyverse)
df <- tibble(Product = c("Bread", "Oranges", "Eggs", "Bananas", "Whole Bread"),
             Weight = c(NA, 1, NA, NA, NA),
             Units = c(2, 6, 1, 2, 1),
             Price = c(1, 3.5, 0.5, 0.75, 1.5))
df
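I want to replace the NA values of the Weight column with a number multiplied by Units, depending on the word that appears in the Product column. Basically, the rules are as follows:
Replace NA in Weight with 2.5 * number of units if Product contains the word "Bread". Replace it with 1 if Product contains the word "Eggs".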
The problem is that I don't know how to code that in R. I tried the following code, which a kind user gave me for a similar question:
df <- df %>%
  mutate(Weight = case_when(Product == "bread" & is.na(Weight) ~ 0.25*Units))
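But it doesn't work, and it doesn't take into account that if my dataframe contains "Whole Bread", the rule has to apply to that row as well.
Does anyone have an idea?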
Some of the values are not exact matches, so use str_detect:
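library(dplyr)
library(stringr)
df %>%
  mutate(Weight = case_when(
    # missing Weight and Product contains "Bread" (any case): 2.5 per unit
    is.na(Weight) & str_detect(Product, regex("Bread", ignore_case = TRUE)) ~ 2.5 * Units,
    # missing Weight and Product is exactly "Eggs": 1 per unit
    is.na(Weight) & Product == "Eggs" ~ Units,
    # every other row keeps its current Weight (NA stays NA)
    TRUE ~ Weight))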
-output
# A tibble: 5 × 4
  Product     Weight Units Price
  <chr>        <dbl> <dbl> <dbl>
1 Bread          5       2  1
2 Oranges        1       6  3.5
3 Eggs           1       1  0.5
4 Bananas       NA       2  0.75
5 Whole Bread    2.5     1  1.5
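For reference, the attempt in the question most likely fails for three reasons. `Product == "bread"` is a case-sensitive exact comparison, so it matches neither "Bread" nor "Whole Bread". `case_when` has no fallback branch, so every unmatched row becomes NA. And the factor is 0.25 rather than the 2.5 stated in the rule. The following is a minimal sketch of a variant that uses partial matching for both rules and reads the "Eggs" rule literally as a flat 1. The helper `has_word` and the name `result` are hypothetical and are not part of the original post. The literal reading of the "Eggs" rule is an assumption, so use `Units` there if the weight should scale with the number of units.

```r
library(dplyr)
library(stringr)
library(tibble)

# Same example data as in the question
df <- tibble(
  Product = c("Bread", "Oranges", "Eggs", "Bananas", "Whole Bread"),
  Weight  = c(NA, 1, NA, NA, NA),
  Units   = c(2, 6, 1, 2, 1),
  Price   = c(1, 3.5, 0.5, 0.75, 1.5)
)

# Hypothetical helper (not from the original post):
# case-insensitive substring match on a character vector
has_word <- function(x, word) {
  str_detect(x, regex(word, ignore_case = TRUE))
}

result <- df %>%
  mutate(Weight = case_when(
    !is.na(Weight)             ~ Weight,       # keep weights that are already present
    has_word(Product, "bread") ~ 2.5 * Units,  # "Bread", "Whole Bread", "bread", ...
    has_word(Product, "eggs")  ~ 1,            # assumption: flat 1, not 1 * Units
    TRUE                       ~ NA_real_      # no rule applies, leave as NA
  ))

result
```

Because `case_when` uses the first branch that matches, putting `!is.na(Weight)` first means the later branches only ever see rows with a missing Weight, so `is.na(Weight)` does not have to be repeated in each condition. On the example data this gives the same Weight column as the output above (5, 1, 1, NA, 2.5).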