Finding if the values in one column are within the range of several other columns
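I'm looking for a simple way to determine whether the value in one column is within the range of the values in several other columns.
My input looks like this: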
ID "Q1 Comm - 01 Scope Thesis" "Q1 Comm - 02 Scope Project" "Q1 Comm - 03 Learn Intern" "Q1 Comm - 04 Biography" "Q1 Comm - Overall Plan"
10 NA NA 4 NA 4
31 2 NA NA NA 2
225 0 NA NA NA 1
243 NA 2 NA 1 0
310 NA 2 NA 1 NA
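For each unique ID, I'm interested in determining when the column Q1 Comm - Overall Plan is:
1 - Below the min() of all the other columns, or
2 - Above the max() of all the other columns, or
3 - Within the range of all the other columns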
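The three rules above can be expressed for a single row as a small base-R function. This is only a sketch: `classify_overall` is a hypothetical name, and returning NA when the overall value is missing (or when every other column is NA) is an assumption that happens to match the last row of the desired output.

```r
# Classify one row's overall value against that row's other values.
# Hypothetical helper: NA when the overall value is missing or nothing else is recorded.
classify_overall <- function(overall, others) {
  others <- others[!is.na(others)]
  if (is.na(overall) || length(others) == 0) return(NA_character_)
  if (overall < min(others)) {
    "below"
  } else if (overall > max(others)) {
    "above"
  } else {
    "within"
  }
}

# Rows from the example input: the four item columns, then the overall value
others  <- list(c(NA, NA, 4, NA), c(2, NA, NA, NA), c(0, NA, NA, NA),
                c(NA, 2, NA, 1), c(NA, 2, NA, 1))
overall <- c(4, 2, 1, 0, NA)
checks  <- unname(mapply(classify_overall, overall, others))
```

Applied to the five example rows, this reproduces the Q1_check column of the desired output.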
The complete list of columns (including the overall column) is as follows:
"Q1 Comm - 01 Scope Thesis"
"Q1 Comm - 02 Scope Project"
"Q1 Comm - 03 Learn Intern"
"Q1 Comm - 04 Biography"
"Q1 Comm - 05 Exhibit"
"Q1 Comm - 06 Social Act"
"Q1 Comm - 07 Post Project"
"Q1 Comm - 08 Learn Plant"
"Q1 Comm - 09 Study Narrate"
"Q1 Comm - 10 Learn Participate"
"Q1 Comm - 11 Write 1"
"Q1 Comm - 12 Read 2"
"Q1 Comm - Overall Plan"
The output I need looks like this:
ID "Q1 Comm - 01 Scope Thesis" "Q1 Comm - 02 Scope Project" "Q1 Comm - 03 Learn Intern" "Q1 Comm - 04 Biography" "Q1 Comm - Overall Plan" "Q1_check"
10 NA NA 4 NA 4 "within"
31 2 NA NA NA 2 "within"
225 0 NA NA NA 1 "above"
243 NA 2 NA 1 0 "below"
310 NA 2 NA 1 NA NA
The dput() of my data frame df is below.
dput(df)
structure(list(ID = c(10L, 31L, 225L, 243L), Q1.Comm...01.Scope.Thesis = c(NA,
2L, 0L, NA), Q1.Comm...02.Scope.Project = c(NA, NA, NA, 2L),
Q1.Comm...03.Learn.Intern = c(4L, NA, NA, NA), Q1.Comm...04.Biography = c(NA,
NA, NA, 1L), Q1.Comm...Overall.Plan = c(4L, 1L, 2L,
NA), X = c(NA, NA, NA, NA), X.1 = c(NA, NA, NA, NA), X.2 = c(NA,
NA, NA, NA)), class = "data.frame", row.names = c(NA, -4L
))
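Note:
I asked this question earlier here: Finding if a value is within the range of other columns, but the example was too simple and none of the solutions worked for me.
That question was getting too long, so for clarity I am posting this as a new question.
Thank you for taking the time to help with this post.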
You could try something like this with rowwise and c_across:
library(dplyr)
df %>%
rowwise %>%
summarise(ID = ID,
Max = `Q1.Comm...Overall.Plan` > max(c_across(-c(ID,`Q1.Comm...Overall.Plan`)),na.rm = TRUE),
Min = `Q1.Comm...Overall.Plan` < min(c_across(-c(ID,`Q1.Comm...Overall.Plan`)),na.rm = TRUE),
Range = `Q1.Comm...Overall.Plan` >= range(c_across(-c(ID,`Q1.Comm...Overall.Plan`)),na.rm = TRUE)[1] &
`Q1.Comm...Overall.Plan` <= range(c_across(-c(ID,`Q1.Comm...Overall.Plan`)),na.rm = TRUE)[2]) %>%
mutate(Result = case_when(Max ~ "above",
Min ~ "below",
Range ~ "within",
TRUE ~ NA_character_))
# A tibble: 4 x 5
ID Max Min Range Result
<int> <lgl> <lgl> <lgl> <chr>
1 10 FALSE FALSE TRUE within
2 31 FALSE FALSE TRUE within
3 225 TRUE FALSE FALSE above
4 243 NA NA NA NA
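One edge case worth checking with the min()/max() approach above: if every comparison column is NA for a row, min(..., na.rm = TRUE) returns Inf and max(..., na.rm = TRUE) returns -Inf (each with a warning), so a non-missing overall value passes both the "below" and the "above" test and the label depends only on rule order. A minimal base-R illustration with made-up values; the guard at the end is one possible fix, not part of the answer.

```r
# Made-up row: every comparison column is missing, the overall value is not
others  <- c(NA_integer_, NA_integer_, NA_integer_)
overall <- 3L

# With nothing left after dropping NAs, min() gives Inf and max() gives -Inf
lo <- suppressWarnings(min(others, na.rm = TRUE))
hi <- suppressWarnings(max(others, na.rm = TRUE))
flags <- c(below = overall < lo, above = overall > hi)  # both TRUE

# One possible guard: return NA when there is nothing to compare against
guarded <- if (all(is.na(others))) {
  NA_character_
} else if (overall < lo) {
  "below"
} else if (overall > hi) {
  "above"
} else {
  "within"
}
```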
You can change summarise to mutate to keep the original columns, and/or add a select to drop the ones you don't need.
See the dplyr rowwise tutorial for more details.
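Following the suggestion to use mutate instead of summarise, here is a sketch that keeps the original columns and adds a single Q1_check column. It assumes dplyr 1.0 or later (for c_across()) and that the comparison columns are exactly those whose names start with Q1.Comm other than the overall one; lo and hi are hypothetical helper columns dropped at the end.

```r
library(dplyr)

# The question's data, rebuilt from its dput() (the all-NA X columns omitted)
df <- data.frame(
  ID = c(10L, 31L, 225L, 243L),
  Q1.Comm...01.Scope.Thesis  = c(NA, 2L, 0L, NA),
  Q1.Comm...02.Scope.Project = c(NA, NA, NA, 2L),
  Q1.Comm...03.Learn.Intern  = c(4L, NA, NA, NA),
  Q1.Comm...04.Biography     = c(NA, NA, NA, 1L),
  Q1.Comm...Overall.Plan     = c(4L, 1L, 2L, NA)
)

res <- df %>%
  rowwise() %>%
  mutate(
    # Per-row min/max over the item columns, selected by name rather than position
    lo = min(c_across(starts_with("Q1.Comm") & !ends_with("Overall.Plan")), na.rm = TRUE),
    hi = max(c_across(starts_with("Q1.Comm") & !ends_with("Overall.Plan")), na.rm = TRUE),
    Q1_check = case_when(
      is.na(Q1.Comm...Overall.Plan) ~ NA_character_,
      Q1.Comm...Overall.Plan < lo ~ "below",
      Q1.Comm...Overall.Plan > hi ~ "above",
      TRUE ~ "within"
    )
  ) %>%
  ungroup() %>%
  select(-lo, -hi)
```

Because mutate is used, all original columns remain in res next to Q1_check.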
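library(purrr)
library(data.table)
# Comparison columns: everything except the ID and the overall column
needed_cols <- setdiff(names(df), c("ID", "Q1.Comm...Overall.Plan"))
# range() per row over the comparison columns; data.table::transpose (named explicitly,
# since purrr also exports a transpose) turns the list of c(min, max) pairs into two columns
setDT(df)[, c("min", "max") := data.table::transpose(pmap(.SD, range, na.rm = TRUE)), .SDcols = needed_cols]
df[, Q1_check := fcase(
is.na(`Q1.Comm...Overall.Plan`), NA_character_,
`Q1.Comm...Overall.Plan` < min, "below",
`Q1.Comm...Overall.Plan` > max, "above",
default = "within"
)
]
# Drop the helper columns
df[, c("max", "min") := NULL]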
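As a swapped-in alternative to pmap(.SD, range, ...), base R's pmin() and pmax() compute row-wise extremes element by element across column vectors, without iterating over rows. A minimal sketch, assuming the comparison columns can be identified by the Q1.Comm prefix; selecting by name also skips the empty X columns from the dput.

```r
# The question's data, rebuilt from its dput()
df <- data.frame(
  ID = c(10L, 31L, 225L, 243L),
  Q1.Comm...01.Scope.Thesis  = c(NA, 2L, 0L, NA),
  Q1.Comm...02.Scope.Project = c(NA, NA, NA, 2L),
  Q1.Comm...03.Learn.Intern  = c(4L, NA, NA, NA),
  Q1.Comm...04.Biography     = c(NA, NA, NA, 1L),
  Q1.Comm...Overall.Plan     = c(4L, 1L, 2L, NA),
  X = NA, X.1 = NA, X.2 = NA
)

overall_col <- "Q1.Comm...Overall.Plan"
# Comparison columns chosen by name, which leaves out ID and the X columns
other_cols <- setdiff(grep("^Q1\\.Comm", names(df), value = TRUE), overall_col)

# Row-wise extremes: pmin()/pmax() work element-wise across the column vectors;
# with na.rm = TRUE a row whose comparison values are all NA gives NA rather than Inf
cols <- unname(as.list(df[other_cols]))
lo <- do.call(pmin, c(cols, na.rm = TRUE))
hi <- do.call(pmax, c(cols, na.rm = TRUE))

ov <- df[[overall_col]]
df$Q1_check <- ifelse(is.na(ov) | is.na(lo), NA_character_,
                 ifelse(ov < lo, "below",
                   ifelse(ov > hi, "above", "within")))
```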
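I have modified your data to match the requirements you discussed in the linked question; I think this will help. I used janitor::clean_names(), and I suggest you run it before proceeding so that your column names are cleaned up.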
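For names like these, the effect of cleaning can be approximated in base R by lower-casing and collapsing every run of non-alphanumeric characters into a single underscore. `clean_names_approx` is a hypothetical helper, not janitor's implementation; the real clean_names() handles many more cases, so prefer it when it is available.

```r
# Rough, hypothetical stand-in for janitor::clean_names() on names like these
clean_names_approx <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)  # runs of dots, spaces, dashes -> one "_"
  gsub("^_+|_+$", "", x)           # trim leading/trailing underscores
}

cleaned <- clean_names_approx(c("ID", "Q1.Comm...01.Scope.Thesis", "Q1 Comm - Overall Plan"))
```

Both the dput-style names and the original spreadsheet headers map to the snake_case names used in the modified data.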
So the modified dput is:
df <- structure(list(id = c(10L, 31L, 225L, 243L), q1_comm_01_scope_thesis = c(NA,
2L, 0L, NA), q1_comm_02_scope_project = c(NA, NA, NA, 2L), q1_comm_03_learn_intern = c(4L,
NA, NA, NA), q1_comm_04_biography = c(NA, NA, NA, 1L), q1_comm_overall_plan = c(4L,
1L, 2L, NA), q2_comm_01_scope_thesis = c(NA, 4, 0, NA), q2_comm_02_scope_project = c(NA,
NA, NA, 4), q2_comm_03_learn_intern = c(8, NA, NA, NA), q2_comm_04_biography = c(NA,
NA, NA, 2), q2_comm_overall_plan = c(8, 2, 4, NA)), row.names = c(NA,
-4L), class = "data.frame")
df
id q1_comm_01_scope_thesis q1_comm_02_scope_project q1_comm_03_learn_intern q1_comm_04_biography q1_comm_overall_plan q2_comm_01_scope_thesis
1 10 NA NA 4 NA 4 NA
2 31 2 NA NA NA 1 4
3 225 0 NA NA NA 2 0
4 243 NA 2 NA 1 NA NA
q2_comm_02_scope_project q2_comm_03_learn_intern q2_comm_04_biography q2_comm_overall_plan
1 NA 8 NA 8
2 NA NA NA 2
3 NA NA NA 4
4 4 NA 2 NA
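Now proceed as follows. You will have to change the [-5] applied to cur_data() to suit your data: it depends on the relative position of the overall column (I think it would be 9 in your case).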
library(tidyverse)
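# Split the non-ID columns by their question prefix (q1, q2, ...); backslashes must be doubled in R strings
split.default(df[-1], gsub('(q\\d*)(.*)', '\\1', names(df[-1]), perl = T)) %>%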
map(., ~ .x %>% bind_cols('id' = df$id) %>%
group_by(id) %>%
mutate(across(ends_with('_overall_plan'), ~ case_when(. < min(cur_data()[-5], na.rm = T) ~ 'below',
. > max(cur_data()[-5], na.rm = T) ~ 'above',
is.na(.) ~ NA_character_,
TRUE ~ 'within'),
.names = '{str_remove(.col,"_comm_overall_plan")}_check'))
) %>%
reduce(left_join, by = 'id')
# A tibble: 4 x 13
# Groups: id [4]
q1_comm_01_scop~ q1_comm_02_scop~ q1_comm_03_lear~ q1_comm_04_biog~ q1_comm_overall~ id q1_check q2_comm_01_scop~ q2_comm_02_scop~ q2_comm_03_lear~ q2_comm_04_biog~
<int> <int> <int> <int> <int> <int> <chr> <dbl> <dbl> <dbl> <dbl>
1 NA NA 4 NA 4 10 within NA NA 8 NA
2 2 NA NA NA 1 31 below 4 NA NA NA
3 0 NA NA NA 2 225 above 0 NA NA NA
4 NA 2 NA 1 NA 243 NA NA 4 NA 2
# ... with 2 more variables: q2_comm_overall_plan <dbl>, q2_check <chr>
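Because the [-5] index depends on column position, a variant that picks the comparison columns by name may be less fragile when the real data has twelve items per question. A base-R sketch, assuming the cleaned naming pattern `<prefix>_comm_<number>_...` for item columns and `<prefix>_comm_overall_plan` for the overall column, applied to the modified data above:

```r
# Modified data from the answer above (cleaned snake_case names)
df <- data.frame(
  id = c(10L, 31L, 225L, 243L),
  q1_comm_01_scope_thesis  = c(NA, 2L, 0L, NA),
  q1_comm_02_scope_project = c(NA, NA, NA, 2L),
  q1_comm_03_learn_intern  = c(4L, NA, NA, NA),
  q1_comm_04_biography     = c(NA, NA, NA, 1L),
  q1_comm_overall_plan     = c(4L, 1L, 2L, NA),
  q2_comm_01_scope_thesis  = c(NA, 4, 0, NA),
  q2_comm_02_scope_project = c(NA, NA, NA, 4),
  q2_comm_03_learn_intern  = c(8, NA, NA, NA),
  q2_comm_04_biography     = c(NA, NA, NA, 2),
  q2_comm_overall_plan     = c(8, 2, 4, NA)
)

# Question prefixes ("q1", "q2", ...) taken from the overall-plan column names
overall_names <- grep("_comm_overall_plan$", names(df), value = TRUE)
prefixes <- sub("_comm_overall_plan$", "", overall_names)

for (p in prefixes) {
  overall <- df[[paste0(p, "_comm_overall_plan")]]
  # Item columns for this prefix, matched by name (e.g. q1_comm_01_...), not by position
  items <- unname(as.list(df[grep(paste0("^", p, "_comm_[0-9]+_"), names(df))]))
  lo <- do.call(pmin, c(items, na.rm = TRUE))
  hi <- do.call(pmax, c(items, na.rm = TRUE))
  df[[paste0(p, "_check")]] <- ifelse(is.na(overall) | is.na(lo), NA_character_,
                                 ifelse(overall < lo, "below",
                                   ifelse(overall > hi, "above", "within")))
}
```

For q1 this gives the same labels as the q1_check column in the output above.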