How to obtain conditioned results from an R dataframe
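This is my first post here. I'm trying to solve an exercise from an edX R course, but I'm stuck. It would be great if someone could help me solve it. Here are the data frame and the question: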
> students
height shoesize gender population
1 181 44 male kuopio
2 160 38 female kuopio
3 174 42 female kuopio
4 170 43 male kuopio
5 172 43 male kuopio
6 165 39 female kuopio
7 161 38 female kuopio
8 167 38 female tampere
9 164 39 female tampere
10 166 38 female tampere
11 162 37 female tampere
12 158 36 female tampere
13 175 42 male tampere
14 181 44 male tampere
15 180 43 male tampere
16 177 43 male tampere
17 173 41 male tampere
Given the dataframe above, create two subsets with students whose height is equal to or below the median height (call it students.short) and students whose height is strictly above the median height (call it students.tall). What is the mean shoesize for each of the above 2 subsets by population?
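I have managed to create the two subsets students.tall and students.short (both show TRUE/FALSE answers), but I don't know how to get the mean by population. The result should look like this: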
               kuopio tampere
students.short   xxxx    xxxx
students.tall    xxxx    xxxx
Many thanks in advance for any help!
We can split the data on a logical vector based on the median height:
# median height
medHeight <- median(students$height, na.rm = TRUE)
# split the data into a list of data.frames using 'medHeight'
lst1 <- with(students, split(students, height > medHeight))
Then loop over the list and use aggregate from base R:
lapply(lst1, function(dat) aggregate(shoesize ~ population,
data = dat, FUN = mean, na.rm = TRUE))
However, we don't need to create two separate datasets or a list. The same can be done by grouping on 'population' together with a 'grp' column created from the logical vector:
library(dplyr)
students %>%
group_by(grp = height > medHeight, population) %>%
summarise(shoesize = mean(shoesize))
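For the exact layout requested in the question (subsets as rows, populations as columns), a base R sketch with tapply() is one option. It assumes the students data frame rebuilt below from the table in the question; grp and res are names introduced only for this illustration.

```r
# Rebuild the data frame from the table in the question (gender omitted, not needed here)
students <- data.frame(
  height = c(181, 160, 174, 170, 172, 165, 161, 167, 164,
             166, 162, 158, 175, 181, 180, 177, 173),
  shoesize = c(44, 38, 42, 43, 43, 39, 38, 38, 39,
               38, 37, 36, 42, 44, 43, 43, 41),
  population = rep(c("kuopio", "tampere"), times = c(7, 10))
)

# Label each student relative to the median height (hypothetical helper vector)
grp <- ifelse(students$height > median(students$height),
              "students.tall", "students.short")

# Mean shoe size for every (group, population) combination, returned as a matrix
res <- tapply(students$shoesize, list(grp, students$population), mean)
res
```

With this data, the students.short row holds 39.5 (kuopio) and 37.6 (tampere), and the students.tall row holds 43 and 42.6.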
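You can try this:
# Subset by median height (short: at or below the median; tall: strictly above)
students.short <- students[students$height <= median(students$height), ]
students.tall <- students[students$height > median(students$height), ]
# Overall mean shoe size of each subset (not yet split by population)
mean(students.short$shoesize)
mean(students.tall$shoesize)
Output: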
[1] 38.44444
[1] 42.75
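The two means above cover each subset as a whole. To break them down by population, as the exercise asks, one possible extension is to split each subset's shoesize by population and average the pieces. This is only a sketch: the data frame is rebuilt from the question, and mean_by_pop and result are hypothetical names.

```r
# Rebuild the data frame from the table in the question (gender omitted, not needed here)
students <- data.frame(
  height = c(181, 160, 174, 170, 172, 165, 161, 167, 164,
             166, 162, 158, 175, 181, 180, 177, 173),
  shoesize = c(44, 38, 42, 43, 43, 39, 38, 38, 39,
               38, 37, 36, 42, 44, 43, 43, 41),
  population = rep(c("kuopio", "tampere"), times = c(7, 10))
)

# The two subsets, exactly as defined in the exercise
students.short <- students[students$height <= median(students$height), ]
students.tall  <- students[students$height >  median(students$height), ]

# Hypothetical helper: mean shoe size per population within one subset
mean_by_pop <- function(dat) {
  sapply(split(dat$shoesize, dat$population), mean)
}

# Stack the two named vectors into the requested 2 x 2 layout
result <- rbind(students.short = mean_by_pop(students.short),
                students.tall  = mean_by_pop(students.tall))
result
```

Both subsets contain students from both populations here, so the two vectors have the same names and rbind() lines the columns up correctly.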
You can use pivot_wider() from tidyr and set the values_fn argument to mean.
library(dplyr)
library(tidyr)
# 'students' is the data frame from the question; average shoesize, as the exercise asks
students %>%
  mutate(grp = if_else(height > median(height), "students.tall", "students.short")) %>%
  pivot_wider(id_cols = grp, names_from = population, values_from = shoesize, values_fn = mean)
# # A tibble: 2 x 3
#   grp            kuopio tampere
#   <chr>           <dbl>   <dbl>
# 1 students.tall    43      42.6
# 2 students.short   39.5    37.6
In base R, you can try xtabs(), which returns a table object.
# 'students' is the data frame from the question; average shoesize, as the exercise asks
xtabs(shoesize ~ grp + population,
      aggregate(shoesize ~ grp + population, FUN = mean,
                data = transform(students, grp = ifelse(height > median(height), "students.tall", "students.short"))))
#                 population
# grp              kuopio tampere
#   students.short   39.5    37.6
#   students.tall    43.0    42.6
Note: to convert the table object to a data.frame, you can use as.data.frame.matrix().
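To illustrate that last conversion, here is a small sketch that builds the xtabs() table of mean shoesize and converts it both ways. The data frame is rebuilt from the question; agg, tab, wide and long are names made up for this example.

```r
# Rebuild the data frame from the table in the question (gender omitted, not needed here)
students <- data.frame(
  height = c(181, 160, 174, 170, 172, 165, 161, 167, 164,
             166, 162, 158, 175, 181, 180, 177, 173),
  shoesize = c(44, 38, 42, 43, 43, 39, 38, 38, 39,
               38, 37, 36, 42, 44, 43, 43, 41),
  population = rep(c("kuopio", "tampere"), times = c(7, 10))
)
students$grp <- ifelse(students$height > median(students$height),
                       "students.tall", "students.short")

# One row per (grp, population) pair holding the mean shoe size
agg <- aggregate(shoesize ~ grp + population, data = students, FUN = mean)

# Cross-tabulate the means; each cell sums exactly one value, so it equals the mean
tab <- xtabs(shoesize ~ grp + population, data = agg)

# Keeps the 2 x 2 shape: row names are the groups, columns are the populations
wide <- as.data.frame.matrix(tab)

# By contrast, plain as.data.frame() on a table gives long format with a Freq column
long <- as.data.frame(tab)
```

The wide form matches the layout the question asks for, while the long form is usually more convenient for further grouping or plotting.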