In R, how to replace values in a column with values from a column of another data set based on a condition?
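I have two data sets; samples are given below. I need to replace the project names in target_df$project_name whenever they appear in registry_df$to_change, using the corresponding value from registry_df$replacement. However, the code I tried clearly does not produce any result. How should it be fixed, or is there another way to achieve this?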
Data sets:
target_df <- tibble::tribble(
~project_name, ~sum,
"Mark", "4307",
"Boat", "9567",
"Delorean", "5344",
"Parix", "1043",
)
registry_df <- tibble::tribble(
~to_change, ~replacement,
"Mark", "Duck",
"Boat", "Tank",
"Toloune", "Bordeaux",
"Hunge", "Juron",
)
Desired output for target_df:
project_name sum
"Duck" "4307"
"Tank" "9567"
"Delorean" "5344"
"Parix" "1043"
Code I tried:
library(data.table)
target_df <- transform(target_df,
project_name = ifelse(target_df$project_name %in% registry_df$to_change),
registry_df$replacement,
project_name
)
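Two things seem to go wrong in the attempt above. The closing parenthesis after registry_df$to_change ends the ifelse() call after its first argument, so ifelse() never receives its yes and no values; they become stray arguments to transform() instead. And even with that parenthesis moved, ifelse() would take registry_df$replacement by row position rather than by name. It only looks right for this sample because Mark and Boat happen to sit in rows 1 and 2 of both tables. The minimal sketch below uses base R data frames instead of tibbles and a reordered registry (registry_shuffled is a hypothetical name introduced only for this illustration) to show the difference between positional and name-based lookup.

```r
# Sample data from the question, as base R data frames (tibble not required)
target_df <- data.frame(
  project_name = c("Mark", "Boat", "Delorean", "Parix"),
  sum = c("4307", "9567", "5344", "1043"),
  stringsAsFactors = FALSE
)

# Hypothetical reordering of registry_df: Mark and Boat are no longer in rows 1-2
registry_shuffled <- data.frame(
  to_change = c("Toloune", "Boat", "Hunge", "Mark"),
  replacement = c("Bordeaux", "Tank", "Juron", "Duck"),
  stringsAsFactors = FALSE
)

# The attempt with its parenthesis fixed: replacement values are taken by row position
positional <- ifelse(target_df$project_name %in% registry_shuffled$to_change,
                     registry_shuffled$replacement,
                     target_df$project_name)
# Mark wrongly becomes "Bordeaux" because row 1 of the registry is Toloune

# Looking each name up with match() works regardless of row order
idx <- match(target_df$project_name, registry_shuffled$to_change)
by_name <- ifelse(is.na(idx),
                  target_df$project_name,
                  registry_shuffled$replacement[idx])
```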
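A base R solution: you can use the match function to match the columns. Since not all values of target_df$project_name are present in registry_df$to_change, your matching variable will contain NAs. That is why I included the ifelse function, which keeps the original value wherever there is an NA.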
matching <- registry_df$replacement[match(target_df$project_name, registry_df$to_change)]
target_df$project_name <- ifelse(is.na(matching),
target_df$project_name,
matching)
target_df
This gives the expected output:
project_name sum
<chr> <chr>
1 Duck 4307
2 Tank 9567
3 Delorean 5344
4 Parix 1043
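A closely related base R idiom, offered here as an alternative sketch rather than as part of the answer above, is to turn the registry into a named character vector and index it by name. Character indexing with [ uses exact name matching and returns NA for names that are not present, so the same ifelse() fallback applies. The variable names lookup and new_names are illustrative only.

```r
# Sample data from the question, as base R data frames
target_df <- data.frame(
  project_name = c("Mark", "Boat", "Delorean", "Parix"),
  sum = c("4307", "9567", "5344", "1043"),
  stringsAsFactors = FALSE
)
registry_df <- data.frame(
  to_change = c("Mark", "Boat", "Toloune", "Hunge"),
  replacement = c("Duck", "Tank", "Bordeaux", "Juron"),
  stringsAsFactors = FALSE
)

# Named vector: names are the old project names, values are the new ones
lookup <- setNames(registry_df$replacement, registry_df$to_change)

# Index by name; unmatched names yield NA. unname() drops the names [ carries over.
new_names <- unname(lookup[target_df$project_name])

# Keep the original name wherever no replacement was found
target_df$project_name <- ifelse(is.na(new_names), target_df$project_name, new_names)
```

As with match(), if a name occurred more than once in to_change, only its first replacement would be used.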
A dplyr solution. There may be a more elegant way with fewer steps.
library(dplyr)
target_df <- target_df %>%
left_join(registry_df,
by = c("project_name" = "to_change")) %>%
mutate(replacement = ifelse(is.na(replacement), project_name, replacement)) %>%
select(project_name = replacement, sum)
Result:
# A tibble: 4 × 2
project_name sum
<chr> <chr>
1 Duck 4307
2 Tank 9567
3 Delorean 5344
4 Parix 1043
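One way to shorten the dplyr pipeline, sketched here assuming dplyr is installed, is to replace the ifelse(is.na(...)) step with dplyr::coalesce(), which returns the first non-missing value element by element. After the join, that is the replacement when one was found and the original project name otherwise.

```r
library(dplyr)

# Sample data from the question, as base R data frames
target_df <- data.frame(
  project_name = c("Mark", "Boat", "Delorean", "Parix"),
  sum = c("4307", "9567", "5344", "1043"),
  stringsAsFactors = FALSE
)
registry_df <- data.frame(
  to_change = c("Mark", "Boat", "Toloune", "Hunge"),
  replacement = c("Duck", "Tank", "Bordeaux", "Juron"),
  stringsAsFactors = FALSE
)

result <- target_df %>%
  # Bring in the replacement column; unmatched rows get NA
  left_join(registry_df, by = c("project_name" = "to_change")) %>%
  # Prefer the replacement, fall back to the original name when it is NA
  mutate(project_name = coalesce(replacement, project_name)) %>%
  # Drop the helper column, leaving project_name and sum
  select(-replacement)
```

Unlike match(), a join would duplicate rows of target_df if a name appeared more than once in to_change, so the registry is assumed to have unique keys.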