Return a string by reference from tbl_df of numerics in R
I have a tbl_df (tibble) named 'control.scores' with a column named "Overall" whose values range from 1.00 to 4.00.
# A tibble: 2 x 8
group GOV CORC TMSC AUDIT PPS TRAIN Overall
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Group4 0.82 0.2 0.525 0.2833333 0.2 0.2333333 2.261667
2 Group5 0.82 0.0 0.525 0.2833333 0.2 0.2333333 2.061667
I also have another tbl_df, called 'control.rating.tbl' (created as control.ref.tbl in the code below), constructed like this:
# create a reference table of control ratings and numeric ranges
library(tibble)  # provides tribble()
control.ref.tbl <- tribble(
  ~RATING,                ~MIN,  ~MAX,
  "Ineffective",          3.500, 4.000,
  "Marginally Effective", 2.500, 3.499,
  "Generally Effective",  1.500, 2.499,
  "Highly Effective",     1.000, 1.499
)
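How can I append a column to 'control.scores' that takes each value in Overall, checks which MIN to MAX range in 'control.rating.tbl' it falls into, and returns the corresponding string?
For example, Group4's Overall == 2.261667, which corresponds to 'Generally Effective' in 'control.rating.tbl'. The result would look like this: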
# A tibble: 2 x 9
group GOV CORC TMSC AUDIT PPS TRAIN Overall Rating
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Group4 0.82 0.2 0.525 0.2833333 0.2 0.2333333 2.261667 Generally Effective
2 Group5 0.82 0.0 0.525 0.2833333 0.2 0.2333333 2.061667 Generally Effective
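We can consider using case_when from dplyr. Note that I changed your classification ranges slightly, because your original ranges have gaps between one MAX and the next MIN. For example, under your original classification, 3.4995 lies between 3.499 and 3.500 and would not be assigned to any class. dt2 is the final output.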
library(dplyr)
# Each class covers [lower, upper), so adjacent ranges leave no gaps;
# only the top class also includes its upper bound (4.0).
dt2 <- dt %>%
  mutate(Rating = case_when(
    Overall >= 3.5 & Overall <= 4.0 ~ "Ineffective",
    Overall >= 2.5 & Overall <  3.5 ~ "Marginally Effective",
    Overall >= 1.5 & Overall <  2.5 ~ "Generally Effective",
    Overall >= 1.0 & Overall <  1.5 ~ "Highly Effective"
  ))
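To make the gap mentioned above concrete, here is a minimal base R sketch that looks a value up against the ranges exactly as they were given in the question. The helper name rating_for and the data frame name ref are hypothetical, chosen only for this illustration.

```r
# Reference ranges copied from the question, as a plain data.frame
ref <- data.frame(
  RATING = c("Ineffective", "Marginally Effective",
             "Generally Effective", "Highly Effective"),
  MIN = c(3.500, 2.500, 1.500, 1.000),
  MAX = c(4.000, 3.499, 2.499, 1.499),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the rating whose closed range [MIN, MAX]
# contains x, or NA if no range contains it
rating_for <- function(x) {
  hit <- ref$RATING[x >= ref$MIN & x <= ref$MAX]
  if (length(hit) == 0) NA_character_ else hit
}

rating_for(2.261667)  # inside [1.500, 2.499]: "Generally Effective"
rating_for(3.4995)    # between 3.499 and 3.500: NA, no range matches
```

Any value that falls strictly between one MAX and the next MIN comes back as NA, which is why the conditions in case_when use half-open ranges instead.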
Data:
dt <- read.table(text = "group GOV CORC TMSC AUDIT PPS TRAIN Overall
1 Group4 0.82 0.2 0.525 0.2833333 0.2 0.2333333 2.261667
2 Group5 0.82 0.0 0.525 0.2833333 0.2 0.2333333 2.061667",
header = TRUE, stringsAsFactors = FALSE)
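Since the question asks for the rating to come from the reference table itself, the hard-coded conditions can also be replaced by a lookup. The following is only a sketch under one assumption that is not in the original post: each class is treated as running from its MIN up to (but not including) the next class's MIN, which closes the gaps, and values outside 1 to 4 get NA. The names ref, breaks and idx are made up for the example, and only the group and Overall columns are reproduced.

```r
library(dplyr)
library(tibble)

# Reference table from the question
control.ref.tbl <- tribble(
  ~RATING,                ~MIN,  ~MAX,
  "Ineffective",          3.500, 4.000,
  "Marginally Effective", 2.500, 3.499,
  "Generally Effective",  1.500, 2.499,
  "Highly Effective",     1.000, 1.499
)

# Reduced copy of the sample data (only the columns needed here)
dt <- tibble(group   = c("Group4", "Group5"),
             Overall = c(2.261667, 2.061667))

# Sort ascending so the MIN values can serve as interval breaks
ref    <- control.ref.tbl %>% arrange(MIN)
breaks <- c(ref$MIN, max(ref$MAX))   # 1.0, 1.5, 2.5, 3.5, 4.0

dt2 <- dt %>%
  mutate(
    # index of the interval [breaks[i], breaks[i + 1]) containing Overall;
    # rightmost.closed = TRUE lets exactly 4.0 fall in the last interval
    idx = findInterval(Overall, breaks, rightmost.closed = TRUE),
    # 0 (below range) or nrow(ref) + 1 (above range) become NA
    Rating = ref$RATING[ifelse(idx >= 1 & idx <= nrow(ref), idx, NA_integer_)]
  ) %>%
  select(-idx)

dt2$Rating
```

With this approach, changing a threshold only requires editing control.ref.tbl, at the cost of ignoring the MAX column except for the top class.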