How to obtain conditioned results from an R dataframe

Question

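This is my first post here. I am trying to solve an R exercise from an edX R course, and I am stuck. It would be great if somebody could help me solve it. Here are the dataframe and the question: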

> students
   height shoesize gender population
1     181       44   male     kuopio
2     160       38 female     kuopio
3     174       42 female     kuopio
4     170       43   male     kuopio
5     172       43   male     kuopio
6     165       39 female     kuopio
7     161       38 female     kuopio
8     167       38 female    tampere
9     164       39 female    tampere
10    166       38 female    tampere
11    162       37 female    tampere
12    158       36 female    tampere
13    175       42   male    tampere
14    181       44   male    tampere
15    180       43   male    tampere
16    177       43   male    tampere
17    173       41   male    tampere

Given the dataframe above, create two subsets with students whose height is equal to or below the median height (call it students.short) and students whose height is strictly above the median height (call it students.tall). What is the mean shoesize for each of the above 2 subsets by population?

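I have been able to create the two subsets students.tall and students.short (both come back as TRUE/FALSE answers), but I don't know how to obtain the mean by population. The result should be displayed like this: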

                    kuopio     tampere
students.short      xxxx       xxxx
students.tall       xxxx       xxxx

Thanks a lot if you can help me!

Answer 1

We can split the data frame on a logical vector derived from the median height.

# median height
medHeight <- median(students$height, na.rm = TRUE)

# split the data into a list of data.frames using 'medHeight'
# (element "FALSE": height <= median, element "TRUE": height > median)
lst1 <- with(students, split(students, height > medHeight))
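To see what split() returns here, the following is a minimal sketch that rebuilds the data frame from the question (the data.frame() call is my own transcription of the printed table, so treat it as an assumption) and inspects the resulting list.

```r
# Rebuild the data frame from the question so the sketch runs on its own
students <- data.frame(
  height = c(181, 160, 174, 170, 172, 165, 161,
             167, 164, 166, 162, 158, 175, 181, 180, 177, 173),
  shoesize = c(44, 38, 42, 43, 43, 39, 38,
               38, 39, 38, 37, 36, 42, 44, 43, 43, 41),
  gender = c("male", "female", "female", "male", "male", "female", "female",
             "female", "female", "female", "female", "female",
             "male", "male", "male", "male", "male"),
  population = rep(c("kuopio", "tampere"), times = c(7, 10))
)

medHeight <- median(students$height, na.rm = TRUE)
lst1 <- split(students, students$height > medHeight)

medHeight           # 170
names(lst1)         # "FALSE" "TRUE"
sapply(lst1, nrow)  # FALSE: 9 rows, TRUE: 8 rows
```

The element named "FALSE" plays the role of students.short and the element named "TRUE" the role of students.tall.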

Then loop over the list with lapply and apply aggregate from base R to each element:

lapply(lst1, function(dat) aggregate(shoesize ~ population, 
        data = dat, FUN = mean, na.rm = TRUE))

However, we don't need to create two separate datasets or a list. It can be done by grouping on 'population' together with a 'grp' column created from the logical vector:

library(dplyr)
students %>%
     group_by(grp = height > medHeight, population) %>%
     summarise(shoesize = mean(shoesize))
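If the two-by-two layout from the question is wanted without extra packages, one possible base R sketch is a two-way tapply(). The data.frame() call is my transcription of the table in the question, and the names grp and res are just illustrative.

```r
# Rebuild the data frame from the question so the sketch runs on its own
students <- data.frame(
  height = c(181, 160, 174, 170, 172, 165, 161,
             167, 164, 166, 162, 158, 175, 181, 180, 177, 173),
  shoesize = c(44, 38, 42, 43, 43, 39, 38,
               38, 39, 38, 37, 36, 42, 44, 43, 43, 41),
  population = rep(c("kuopio", "tampere"), times = c(7, 10))
)

# Label each row by comparing its height with the overall median
grp <- ifelse(students$height > median(students$height),
              "students.tall", "students.short")

# Mean shoe size for every combination of group and population
res <- tapply(students$shoesize, list(grp, students$population), mean)
res
#                kuopio tampere
# students.short   39.5    37.6
# students.tall    43.0    42.6
```

tapply() returns a matrix here, so a single cell can be read with res["students.tall", "kuopio"].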

Answer 2

You can try this:

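# Subsets: at or below the median height, and strictly above it
students.short <- students[students$height <= median(students$height), ]
students.tall <- students[students$height > median(students$height), ]
# Mean shoe size of each subset (overall, not split by population)
mean(students.short$shoesize)
mean(students.tall$shoesize)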

Output:

[1] 38.44444
[1] 42.75
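These two numbers are the overall means of each subset. To break them down by population as the exercise asks, one option is to run tapply() on each subset and stack the results. This is only a sketch: the data.frame() call is my transcription of the table in the question, and by.pop is a name made up for illustration.

```r
# Rebuild the data frame from the question so the sketch runs on its own
students <- data.frame(
  height = c(181, 160, 174, 170, 172, 165, 161,
             167, 164, 166, 162, 158, 175, 181, 180, 177, 173),
  shoesize = c(44, 38, 42, 43, 43, 39, 38,
               38, 39, 38, 37, 36, 42, 44, 43, 43, 41),
  population = rep(c("kuopio", "tampere"), times = c(7, 10))
)

students.short <- students[students$height <= median(students$height), ]
students.tall  <- students[students$height >  median(students$height), ]

# Mean shoe size per population within each subset, stacked row-wise
by.pop <- rbind(
  students.short = tapply(students.short$shoesize, students.short$population, mean),
  students.tall  = tapply(students.tall$shoesize,  students.tall$population,  mean)
)
by.pop
```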

Answer 3

You can use pivot_wider() from tidyr and set the values_fn argument to mean.

library(dplyr)
library(tidyr)

students %>%
  mutate(grp = if_else(height > median(height), "students.tall", "students.short")) %>%
  pivot_wider(id_cols = grp, names_from = population, values_from = shoesize, values_fn = mean)

# # A tibble: 2 x 3
#   grp            kuopio tampere
#   <chr>           <dbl>   <dbl>
# 1 students.tall    43      42.6
# 2 students.short   39.5    37.6

In base R, you can try xtabs(), which returns a table object.

xtabs(shoesize ~ grp + population,
      aggregate(shoesize ~ grp + population, FUN = mean,
                transform(students, grp = ifelse(height > median(height), "students.tall", "students.short"))))

#                 population
# grp              kuopio tampere
#   students.short   39.5    37.6
#   students.tall    43.0    42.6

Note: to convert a table object to a data.frame, you can use as.data.frame.matrix().
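A minimal sketch of that conversion, assuming the table holds mean shoe sizes and using my own transcription of the data from the question (means, tab and wide are illustrative names):

```r
# Rebuild the data frame from the question so the sketch runs on its own
students <- data.frame(
  height = c(181, 160, 174, 170, 172, 165, 161,
             167, 164, 166, 162, 158, 175, 181, 180, 177, 173),
  shoesize = c(44, 38, 42, 43, 43, 39, 38,
               38, 39, 38, 37, 36, 42, 44, 43, 43, 41),
  population = rep(c("kuopio", "tampere"), times = c(7, 10))
)
students$grp <- ifelse(students$height > median(students$height),
                       "students.tall", "students.short")

# Long table of means, then a two-way table object
means <- aggregate(shoesize ~ grp + population, data = students, FUN = mean)
tab <- xtabs(shoesize ~ grp + population, data = means)

# Keeps the two-way layout: one row per group, one column per population
wide <- as.data.frame.matrix(tab)
class(wide)  # "data.frame"
wide
```

By contrast, plain as.data.frame(tab) dispatches to the table method and returns the long format, with one row per cell and the values in a Freq column.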
