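How can I create a new column by taking a value from an existing column and adding a specific value to it based on conditions from a different column?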

Question

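How can I create a new column by adding values to an existing column based on conditions from a different column? I'm not sure how to explain this clearly, so here is an example...

Here is a sample of my data: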

structure(list(id = c(1002L, 650L, 644L, 608L, 718L, 623L, 721L, 
715L, 820L, 616L, 1001L, 634L, 1005L, 821L, 816L, 1021L, 606L, 
824L, 626L, 815L), depth = c(10L, 15L, 10L, 15L, 15L, 10L, 10L, 
20L, 5L, 20L, 5L, 20L, 20L, 10L, 20L, 10L, 20L, 5L, 20L, 20L), 
    duration = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 5L, 
    10L, 5L, 10L, 10L, 10L, 10L, 10L, 10L, 5L, 10L, 10L), catch = c(183L, 
    1216L, 159L, 232L, 1260L, 35L, 149L, 1047L, 907L, 1123L, 
    75L, 800L, 517L, 7L, 958L, 178L, 1127L, 16512L, 504L, 7890L
    ), CPH = c(1098L, 7296L, 954L, 1392L, 7560L, 210L, 894L, 
    6282L, 10884L, 6738L, 900L, 4800L, 3102L, 42L, 5748L, 1068L, 
    6762L, 198144L, 3024L, 47340L), distance = c(771.6666667, 
    771.6666667, 771.6666667, 771.6666667, 771.6666667, 771.6666667, 
    771.6666667, 771.6666667, 385.8333333, 771.6666667, 385.8333333, 
    771.6666667, 771.6666667, 771.6666667, 771.6666667, 771.6666667, 
    771.6666667, 385.8333333, 771.6666667, 771.6666667)), class = "data.frame", row.names = c(NA, 
-20L))

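I need to create a new column 'totaldist' based on the following conditions from other existing columns:

If depth == 5, then 'totaldist' = distance + 111

If depth == 10, then 'totaldist' = distance + 130

If depth == 15, then 'totaldist' = distance + 185

If depth == 20, then 'totaldist' = distance + 259

For example, for ID 606, which has a depth of 20 and a distance of 771.6667, the new column ('totaldist') should show the value 1030.667 (distance + 259).

I'm not sure whether I should use if_else, or whether to combine it with else if, but the examples I've found using these functions haven't worked for me. Should I try this with lapply? I don't know how to go about it. I hope this is clear. Thanks!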

Answer 1

Since you have multiple conditions, case_when may be a better fit than ifelse.

library(dplyr)

df %>% mutate(totaldist = case_when(depth == 5 ~ distance + 111,
                                    depth == 10 ~ distance + 130,
                                    depth == 15 ~ distance + 185,
                                    depth == 20 ~ distance + 259,
                                    TRUE ~ NA_real_))

     id depth duration catch    CPH distance totaldist
1  1002    10       10   183   1098 771.6667  901.6667
2   650    15       10  1216   7296 771.6667  956.6667
3   644    10       10   159    954 771.6667  901.6667
4   608    15       10   232   1392 771.6667  956.6667
5   718    15       10  1260   7560 771.6667  956.6667
6   623    10       10    35    210 771.6667  901.6667
7   721    10       10   149    894 771.6667  901.6667
8   715    20       10  1047   6282 771.6667 1030.6667
9   820     5        5   907  10884 385.8333  496.8333
10  616    20       10  1123   6738 771.6667 1030.6667
11 1001     5        5    75    900 385.8333  496.8333
12  634    20       10   800   4800 771.6667 1030.6667
13 1005    20       10   517   3102 771.6667 1030.6667
14  821    10       10     7     42 771.6667  901.6667
15  816    20       10   958   5748 771.6667 1030.6667
16 1021    10       10   178   1068 771.6667  901.6667
17  606    20       10  1127   6762 771.6667 1030.6667
18  824     5        5 16512 198144 385.8333  496.8333
19  626    20       10   504   3024 771.6667 1030.6667
20  815    20       10  7890  47340 771.6667 1030.6667
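
For comparison, here is a rough sketch of the same rule written with nested base R ifelse calls, which is roughly what the question was reaching for. The four-row df below is a hypothetical cut-down sample of the question's data (ids 606, 824, 650 and 1002), not the full data frame, and the final NA_real_ is an assumed fallback for any depth that is not listed.

```r
# Hypothetical four-row sample taken from the question's data
df <- data.frame(
  id = c(606L, 824L, 650L, 1002L),
  depth = c(20L, 5L, 15L, 10L),
  distance = c(771.6666667, 385.8333333, 771.6666667, 771.6666667)
)

# Each ifelse() handles one depth and hands the rest to the next call;
# depths that match none of the conditions end up as NA
df$totaldist <- ifelse(df$depth == 5,  df$distance + 111,
                ifelse(df$depth == 10, df$distance + 130,
                ifelse(df$depth == 15, df$distance + 185,
                ifelse(df$depth == 20, df$distance + 259,
                       NA_real_))))

df
```

This should give the same totaldist values as the case_when version, but each additional condition adds another level of nesting, which is why case_when tends to read better once there are more than two or three cases.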

Answer 2

Another way is to join against a lookup table:

library(dplyr)

lookup <- data.frame(
  depth = c(5, 10, 15, 20),
  added_dist = c(111, 130, 185, 259)
)

my_df %>% 
  left_join(lookup, by = "depth") %>% 
  mutate(totaldist = distance + added_dist)

This separates the logic from the hard-coded values, which can be nice.
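
If dplyr is not available, the same lookup idea can be sketched in base R with match(). This is only an illustration under assumptions: my_df here is a hypothetical three-row sample of the question's data, and idx is a helper name introduced for the example.

```r
# Hypothetical three-row sample taken from the question's data
my_df <- data.frame(
  id = c(606L, 824L, 650L),
  depth = c(20L, 5L, 15L),
  distance = c(771.6666667, 385.8333333, 771.6666667)
)

lookup <- data.frame(
  depth = c(5, 10, 15, 20),
  added_dist = c(111, 130, 185, 259)
)

# For each row, find the position of its depth in the lookup table
# (NA when the depth is not listed)
idx <- match(my_df$depth, lookup$depth)

# Add the matching offset to distance; unmatched depths give NA
my_df$totaldist <- my_df$distance + lookup$added_dist[idx]

my_df
```

As with the join, changing an offset or adding a new depth only means editing the lookup table, not the code that computes totaldist.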

if-statement

r

lapply

dplyr