Ifelse for Multiple Columns in DataFrame
I have a dataset like the following:
ID | Winter | Spring | Summer | Fall |
---|---|---|---|---|
1 | high | NA | high | low |
2 | low | high | NA | low |
3 | low | NA | NA | low |
4 | low | high | NA | low |
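I would like to add a calculated column so that if any of the Winter, Spring, Summer, or Fall columns contains "high", the row gets a 1, as shown below. Otherwise it should contain 0.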
ID | Winter | Spring | Summer | Fall | calculated_column |
---|---|---|---|---|---|
1 | high | NA | high | low | 1 |
2 | low | high | NA | low | 1 |
3 | low | NA | NA | low | 0 |
4 | low | high | NA | low | 1 |
So far I have something like this, which I know is incorrect. I'm not sure how to specify multiple columns instead of just one:
df$calculated_column <- ifelse(c(2:5)=="High",1,0)
We can use if_any:
library(dplyr)
df1 <- df1 %>%
  mutate(calculated_column = +(if_any(-ID, ~ . %in% 'high')))
-output
df1
ID Winter Spring Summer Fall calculated_column
1 1 high <NA> high low 1
2 2 low high <NA> low 1
3 3 low <NA> <NA> low 0
4 4 low high <NA> low 1
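A note on why `%in%` is probably used here instead of `==`: the columns contain NA, and `==` propagates NA while `%in%` returns FALSE for a missing value. Below is a minimal base R sketch of that difference. The vectors `winter` and `spring` are just illustrative copies of two columns from the example data, and the row-wise `|` is only a stand-in for what "any column matches" means, not dplyr's actual implementation.

```r
# Illustrative copies of two columns from the example data.
winter <- c("high", "low", "low", "low")
spring <- c(NA, "high", NA, "high")

# `==` propagates NA: a missing value gives NA, not FALSE.
eq_result <- spring == "high"
# `%in%` never returns NA: a missing value simply does not match.
in_result <- spring %in% "high"

print(eq_result)
print(in_result)

# Combining columns row-wise with `|`: FALSE | NA is NA,
# so the third row ends up NA when `==` is used.
any_eq <- (winter == "high") | (spring == "high")
any_in <- (winter %in% "high") | (spring %in% "high")

print(any_eq)
print(any_in)
```

With `%in%`, a row that has only "low" and NA comes out as FALSE (and so 0), which is what the question asks for.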
Or, if we want to use base R, create the logical condition with rowSums on a logical matrix:
df1$calculated_column <- +(rowSums(df1[-1] == "high", na.rm = TRUE) > 0)
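To make the one-liner easier to follow, here is the same idea broken into steps. This is only a sketch: the intermediate names `is_high`, `n_high`, `flag_plus`, and `flag_ifelse` are made up for illustration, and `df1` is rebuilt inline so the snippet runs on its own. The last line shows that `ifelse()`, as attempted in the question, also works once it is given a per-row condition.

```r
# Rebuild the example data so the snippet is self-contained.
df1 <- data.frame(
  ID = 1:4,
  Winter = c("high", "low", "low", "low"),
  Spring = c(NA, "high", NA, "high"),
  Summer = c("high", NA, NA, NA),
  Fall = c("low", "low", "low", "low")
)

# Step 1: compare every non-ID column with "high".
# The result is a logical matrix that still contains NA.
is_high <- df1[-1] == "high"

# Step 2: count the TRUE values in each row, skipping NA.
n_high <- rowSums(is_high, na.rm = TRUE)

# Step 3: "at least one high" as 1/0.
# Unary + turns the logical vector into integers.
flag_plus <- +(n_high > 0)
# ifelse() gives the same values, here as doubles.
flag_ifelse <- ifelse(n_high > 0, 1, 0)

print(flag_plus)
print(flag_ifelse)
```

Note that the comparison is case-sensitive, so "High" (as written in the question's attempt) would not match the lowercase "high" in the data.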
data
df1 <- structure(list(ID = 1:4, Winter = c("high", "low", "low", "low"
), Spring = c(NA, "high", NA, "high"), Summer = c("high", NA,
NA, NA), Fall = c("low", "low", "low", "low")),
class = "data.frame", row.names = c(NA,
-4L))
You can also do this:
df1$calculated_column = +grepl('high', do.call(paste, df1))
df1
ID Winter Spring Summer Fall calculated_column
1 1 high <NA> high low 1
2 2 low high <NA> low 1
3 3 low <NA> <NA> low 0
4 4 low high <NA> low 1
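One thing to keep in mind with the `grepl`/`paste` approach: it pastes every column (including ID) into one string per row and then looks for "high" as a substring. That works for the data in the question, but it would also flag a value that merely contains "high". A small sketch, assuming a hypothetical label "highest" that is not in the original data:

```r
# Hypothetical data: "highest" is an invented label, not from the question.
df2 <- data.frame(
  ID = 1:2,
  Winter = c("highest", "low"),
  Spring = c("low", "high")
)

# Substring matching on the pasted row also flags "highest".
substring_flag <- +grepl("high", do.call(paste, df2))

# Exact per-cell matching, restricted to the non-ID columns.
exact_flag <- +(rowSums(df2[-1] == "high", na.rm = TRUE) > 0)

print(substring_flag)
print(exact_flag)
```

If the columns only ever hold "high", "low", or NA, both approaches give the same answer.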
Here is a base R solution:
calculated_column = (apply(df1,1,function(x) sum(grepl("high",x)))>0)*1
cbind(df1, calculated_column)
ID Winter Spring Summer Fall calculated_column
1 1 high <NA> high low 1
2 2 low high <NA> low 1
3 3 low <NA> <NA> low 0
4 4 low high <NA> low 1