How do you compute magnitudes (percentages) that are over and under a specific number in another column?

Question

I have this dataset:

study_ID title                  experiment question_ID participant_ID estimate_level estimate correct_answer question                      type   category   age gender
      <dbl> <chr>                       <dbl> <chr>                <int> <chr>             <dbl>          <dbl> <chr>                         <chr>  <chr>    <int> <chr> 
 1       11 Dallacker_Parents'_co…          1 1                        1 individual          3             10   How many sugar cubes does or… unlim… nutriti…    32 Female
 2       11 Dallacker_Parents'_co…          1 2                        1 individual         10             11.5 How many sugar cubes does a … unlim… nutriti…    32 Female
 3       11 Dallacker_Parents'_co…          1 3                        1 individual          7              6.5 How many sugar cubes does a … unlim… nutriti…    32 Female
 4       11 Dallacker_Parents'_co…          1 4                        1 individual          1             16.5 How many sugar cubes does a … unlim… nutriti…    32 Female
 5       11 Dallacker_Parents'_co…          1 5                        1 individual          7             11   How many sugar cubes does a … unlim… nutriti…    32 Female
 6       11 Dallacker_Parents'_co…          1 6                        1 individual          5              2.5 How many sugar cubes does a … unlim… nutriti…    32 Female
 7       11 Dallacker_Parents'_co…          1 1                        2 individual          2             10   How many sugar cubes does or… unlim… nutriti…    29 Female
 8       11 Dallacker_Parents'_co…          1 2                        2 individual         10             11.5 How many sugar cubes does a … unlim… nutriti…    29 Female
 9       11 Dallacker_Parents'_co…          1 3                        2 individual          1.5            6.5 How many sugar cubes does a … unlim… nutriti…    29 Female
10       11 Dallacker_Parents'_co…          1 4                        2 individual          2             16.5 How many sugar cubes does a … unlim… nutriti…    29 Female

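There are 6 questions in this dataset, and each question has a correct_answer column and an estimate column. I'm trying to compute the magnitude for each question, so that I get the percentage of people who underestimated, overestimated, or estimated correctly.

For example, for each of the 6 questions it would return something like: 80% underestimated, 10% overestimated, 10% answered correctly.

How can I do this? I'm stumped. Thanks in advance!

Here is the dput() output: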

dput(head(DF, 10))
structure(list(study_ID = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5), title = c("5_Jayles_Debiasing_The_Crowd", 
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd", 
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd", 
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd", 
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd", 
"5_Jayles_Debiasing_The_Crowd"), experiment = c(1, 1, 1, 1, 1, 
1, 1, 1, 1, 1), question_ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), 
    participant_ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), estimate_level = c("individual", 
    "individual", "individual", "individual", "individual", "individual", 
    "individual", "individual", "individual", "individual"), 
    estimate = c(2e+07, 4500000, 21075541, 2e+07, 1e+06, 1.1e+07, 
    2.5e+07, 8e+06, 1.6e+07, 9800000), correct = c(3.8e+07, 3.8e+07, 
    3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 
    3.8e+07), question = c("What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?"), 
    type = c("unlimited", "unlimited", "unlimited", "unlimited", 
    "unlimited", "unlimited", "unlimited", "unlimited", "unlimited", 
    "unlimited"), category = c("demographics", "demographics", 
    "demographics", "demographics", "demographics", "demographics", 
    "demographics", "demographics", "demographics", "demographics"
    ), age = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 
    "NA", "NA"), gender = c("NA", "NA", "NA", "NA", "NA", "NA", 
    "NA", "NA", "NA", "NA")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

Answer 1

Here is a dplyr approach:

library(dplyr)
# The mean of a logical vector is the share of TRUE values within each question
DF %>%
  group_by(question_ID) %>%
  summarize(prop_over = mean(estimate > correct),
            prop_under = mean(estimate < correct),
            prop_correct = mean(estimate == correct)
  )
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 1 x 4
#   question_ID prop_over prop_under prop_correct
#         <dbl>     <dbl>      <dbl>        <dbl>
# 1           1         0          1            0
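For readers without dplyr, the same per-question calculation can be sketched in base R with tapply(). This is a minimal sketch, assuming the column names question_ID, estimate and correct from the dput() output; the helper pct_by_question is a hypothetical name introduced here. It reports percentages instead of proportions and passes na.rm = TRUE so that participants with a missing estimate are skipped.

```r
# Minimal reproduction of the dput() data: only the columns the calculation needs
DF <- data.frame(
  question_ID = rep(1, 10),
  estimate    = c(2e+07, 4500000, 21075541, 2e+07, 1e+06, 1.1e+07,
                  2.5e+07, 8e+06, 1.6e+07, 9800000),
  correct     = rep(3.8e+07, 10)
)

# Hypothetical helper: mean of a logical vector per question = proportion of TRUE,
# multiplied by 100 to get a percentage; na.rm = TRUE ignores missing estimates
pct_by_question <- function(cond) {
  100 * tapply(cond, DF$question_ID, mean, na.rm = TRUE)
}

# tapply() orders groups by sorted question_ID, so sort the IDs the same way
result <- data.frame(
  question_ID = sort(unique(DF$question_ID)),
  pct_over    = as.vector(pct_by_question(DF$estimate > DF$correct)),
  pct_under   = as.vector(pct_by_question(DF$estimate < DF$correct)),
  pct_correct = as.vector(pct_by_question(DF$estimate == DF$correct))
)
result
```

With the ten rows from the question, every estimate is below 3.8e+07, so this gives 0% over, 100% under and 0% correct, matching both answers.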

Answer 2

# Split DF by question and compute the three percentages for each subset
list1 <- lapply(split(DF, DF$question_ID), function(x) {
  overestimated <- 100 * length(which(x$estimate > x$correct)) / length(x$estimate)
  underestimated <- 100 * length(which(x$estimate < x$correct)) / length(x$estimate)
  correct <- 100 * length(which(x$estimate == x$correct)) / length(x$estimate)
  data.frame(overestimated, underestimated, correct)
})
# Attach each list element's name (the question ID) as a column
list2 <- mapply(function(x, y) {
  x$question_ID <- y
  return(x)
}, x = list1, y = names(list1), SIMPLIFY = FALSE)
# Stack the per-question rows and move question_ID to the first column
Percent_Data <- do.call("rbind", list2)
Percent_Data <- Percent_Data[, c(which(colnames(Percent_Data) == "question_ID"), which(colnames(Percent_Data) != "question_ID"))]
Percent_Data
#   question_ID overestimated underestimated correct
# 1           1             0            100       0
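A more compact base-R variant classifies every estimate once with sign() and lets table() do the counting. This is a minimal sketch, assuming the same column names (question_ID, estimate, correct); the toy data frame is hypothetical, not from the question, and exists only so that all three outcomes show up.

```r
# Hypothetical toy data (not from the question): two questions, four participants each
DF <- data.frame(
  question_ID = c(1, 1, 1, 1, 2, 2, 2, 2),
  estimate    = c(5, 12, 10, 3, 20, 25, 30, 20),
  correct     = c(10, 10, 10, 10, 20, 20, 20, 20)
)

# sign() gives -1 (under), 0 (exact match) or 1 (over) for each participant
DF$outcome <- factor(sign(DF$estimate - DF$correct),
                     levels = c(-1, 0, 1),
                     labels = c("under", "correct", "over"))

# Count outcomes per question, then turn each row into percentages
counts  <- table(DF$question_ID, DF$outcome)
percent <- 100 * prop.table(counts, margin = 1)
percent
```

Here question 1 comes out as 50% under, 25% correct and 25% over, and question 2 as 0%, 50% and 50%. A missing estimate becomes NA in outcome and is left out of the table by default, so it does not count in the denominator. Like both answers, this treats an estimate as correct only on an exact match.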
