add new column based on two other columns with several conditions, character
I want to add a new column to my data frame based on two other columns. The data looks like this:
df
job honorary
yes yes
yes no
no yes
yes yes
yes NA
NA no
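Now I want a third column that contains "both" if job and honorary are both "yes", "honorary" if only the honorary column contains "yes", "job" if only the job column contains "yes", and NA if both columns contain NA or one column contains NA and the other "no". The third column should look like this: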
result
both
job
honorary
both
job
NA
I have tried writing code with if and mutate, but I am very new to R and my code does not work at all.
If I assign the values one at a time like this:
data_nature_fewmissing$urbandnat[data_nature_fewmissing$nature =="yes" & data_nature_fewmissing$urbangreen =="yes"] <- "yes"
it does not work, because I overwrite the previous results at each step.
Thanks for your help!
For these kinds of complex conditions, I like case_when from dplyr.
df<-tibble::tribble(
~job, ~honorary,
"yes", "yes",
"yes", "no",
"no", "yes",
"yes", "yes",
"yes", NA,
NA, "no"
)
library(dplyr)
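df_new <- df %>%
  mutate(result = case_when(
    # conditions are checked top to bottom; the first match wins
    job == "yes" & honorary == "yes" ~ "both",
    honorary == "yes" ~ "honorary",
    job == "yes" ~ "job",
    # remaining combinations that involve NA stay NA
    is.na(honorary) & is.na(job) ~ NA_character_,
    is.na(honorary) & job == "no" ~ NA_character_,
    is.na(job) & honorary == "no" ~ NA_character_,
    # anything else, e.g. "no" in both columns
    TRUE ~ "other"
  ))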
df_new
#> # A tibble: 6 × 3
#>   job   honorary result
#>   <chr> <chr>    <chr>
#> 1 yes   yes      both
#> 2 yes   no       job
#> 3 no    yes      honorary
#> 4 yes   yes      both
#> 5 yes   <NA>     job
#> 6 <NA>  no       <NA>
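The order of the conditions matters because case_when takes the first condition that is TRUE for each row, and a condition that evaluates to NA counts as no match. A minimal sketch of that behaviour, using made-up toy vectors rather than the data from the question:

```r
library(dplyr)

# Hypothetical toy vectors, chosen only to show how NA behaves
job      <- c("yes", "yes", NA,   "no")
honorary <- c("yes", NA,    "no", "no")

# Comparing with NA gives NA; TRUE & NA is NA, but NA & FALSE is FALSE
job == "yes" & honorary == "yes"

# First TRUE condition wins; rows with no TRUE condition get NA
result <- case_when(
  job == "yes" & honorary == "yes" ~ "both",
  job == "yes"                     ~ "job",
  honorary == "yes"                ~ "honorary"
)
result
```

The second element falls through the "both" condition (it is NA there) and is picked up by the plain job == "yes" check, which is why row 5 of the output above ends up as "job".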
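Or in base R:
df_new <- df
df_new <- within(df_new, {
  result <- NA
  # later assignments overwrite earlier ones, so the most specific rule goes last
  result[honorary == "yes"] <- "honorary"
  result[job == "yes"] <- "job"
  result[job == "yes" & honorary == "yes"] <- "both"
})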
Created on 2022-01-16 by the reprex package (v2.0.1)
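The overwriting that the question describes can be used deliberately: start from NA, apply the general rules first and the most specific rule last. Here is a sketch of the same idea without within(), assuming a plain data.frame with the six rows from the question; which() is my addition to drop rows where the condition is NA.

```r
# Same six rows as in the question, as a plain data.frame
df <- data.frame(
  job      = c("yes", "yes", "no", "yes", "yes", NA),
  honorary = c("yes", "no", "yes", "yes", NA, "no")
)

# Start with NA everywhere, then overwrite from general to specific
result <- rep(NA_character_, nrow(df))
result[which(df$honorary == "yes")] <- "honorary"
result[which(df$job == "yes")] <- "job"
result[which(df$job == "yes" & df$honorary == "yes")] <- "both"

df$result <- result
df
```

Rows where neither rule applies, such as the last one, simply keep their initial NA.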
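The syntax df[rows, columns] applies when you index a data frame. data_nature_fewmissing$urbandnat is a single column, that is a vector, so it takes only one index and a trailing comma after the condition fails there. To select rows of the data frame and assign into one column, put the condition in the row position and name the column (which() drops the rows where the condition is NA):
data_nature_fewmissing[which(data_nature_fewmissing$nature == "yes" & data_nature_fewmissing$urbangreen == "yes"), "urbandnat"] <- "yes"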
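To make the difference between indexing a column and indexing the data frame concrete, here is a small sketch. The data frame and its values are invented for illustration; only the column names nature, urbangreen and urbandnat come from the question.

```r
# Hypothetical stand-in for data_nature_fewmissing
df <- data.frame(
  nature     = c("yes", "no", NA, "yes"),
  urbangreen = c("yes", "yes", "yes", "no")
)

# Logical condition; it is NA where nature is NA and urbangreen is "yes"
keep <- df$nature == "yes" & df$urbangreen == "yes"

# A column is a vector, so it takes a single index
df$nature[which(keep)]

# The data frame takes rows and columns; create the column first, then fill it
df$urbandnat <- NA_character_
df[which(keep), "urbandnat"] <- "yes"
df
```

Creating the column explicitly before assigning keeps its length equal to the number of rows regardless of which rows match.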
However, an easier way is to use the tidyverse. We will use mutate to create the new column and case_when to handle the multiple if-else conditions.
library(tidyverse)
df = data_nature_fewmissing
df %>% mutate(result = case_when(
  job == 'yes' & honorary == 'yes' ~ 'both',
  job == 'yes' & (honorary == 'no' | is.na(honorary)) ~ 'job',
  honorary == 'yes' & (job == 'no' | is.na(job)) ~ 'honorary'
))
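As a quick check, the same conditions can be run against the six rows from the question. This is a sketch that assumes those rows as a plain data.frame and loads only dplyr; the name out is mine.

```r
library(dplyr)

# The six rows from the question
df <- data.frame(
  job      = c("yes", "yes", "no", "yes", "yes", NA),
  honorary = c("yes", "no", "yes", "yes", NA, "no")
)

out <- df %>% mutate(result = case_when(
  job == "yes" & honorary == "yes" ~ "both",
  job == "yes" & (honorary == "no" | is.na(honorary)) ~ "job",
  honorary == "yes" & (job == "no" | is.na(job)) ~ "honorary"
))
out
```

Because each condition spells out the NA case with is.na(), the three rules do not overlap and their order no longer matters; rows that match none of them, such as the last one, are left as NA.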