Find the average values by grouping and sub-grouping variables, and count the number of times a value occurs within these groups in R
I have a data set with four columns of data.
I want to group the rows by two variables and the columns by one variable.
Here is a sample of my data:
df <- data.frame(
Price = rnorm(24),
Grouping = rep(c("CD", "NW", "SMK", "ghd"),6),
Sub_grouping = c("CDapple", "NWapple", "SMKapple", "ghdapple",
"CDPear", "NWpear", "SMKpear", "ghdpear",
"CDgrape", "NWgrape", "SMKgrape", "ghdgrape",
"CDapple", "NWapple", "SMKapple", "ghdapple",
"CDPear", "NWpear", "SMKpear", "ghdpear",
"CDgrape", "NWgrape", "SMKgrape", "ghdgrape"),
SP = rep(c("SP", "OffSP"),12))
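To get the mean of the Price variable for each sub-group, I can run the following:
library(reshape2)  # provides melt() and dcast()
df <- melt(df)
df_mean <- dcast(df, Grouping + Sub_grouping ~ SP, value.var = "value", fun.aggregate = mean)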
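I would also like the mean Price for each Grouping variable. Is that possible?
I would also like to count the number of Price values that went into each mean. So, for each group, the number of prices by SP and OffSP; and for each sub-group, the number of prices by SP and OffSP.
Does anyone know how to do this?
I have looked at questions such as How can I count the number of instances a value occurs within a subgroup in R?
but their contingency table is 2x2, and I need a table with groups and sub-groups as rows and SP / OffSP as columns.
Thanks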
We don't need to reshape the data to 'long' format to get the mean values. Starting from the original df (before melt):
library(dplyr)
df %>%
  group_by(Grouping) %>% # first grouping
  # create the mean column and the count by 'Grouping'
  mutate(AvgPrice = mean(Price), n1 = n()) %>%
  # second grouping; '.add' replaces the deprecated 'add' argument in dplyr >= 1.0.0
  group_by(Sub_grouping, .add = TRUE) %>%
  # summarise to get the mean within Sub_grouping and count the values with n()
  summarise(AvgPrice = first(AvgPrice), n1 = first(n1), AvgPrice2 = mean(Price), n2 = n())
Note: if we also need to group by 'SP', change the first group_by statement to
df %>%
  group_by(Grouping, SP) %>%
  ...
  ...
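For reference, here is a minimal sketch of what the full pipeline might look like with 'SP' added to the first grouping. The data frame is rebuilt in a compact form (with a fixed seed and all-lowercase fruit labels, both of which are my assumptions), and `.groups = "drop"` is added only to return an ungrouped result; neither is part of the steps above.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
groups <- c("CD", "NW", "SMK", "ghd")
fruits <- c("apple", "pear", "grape")
df <- data.frame(
  Price        = rnorm(24),
  Grouping     = rep(groups, 6),
  # same layout as the sample data: 12 sub-groups, each repeated twice
  Sub_grouping = rep(paste0(rep(groups, 3), rep(fruits, each = 4)), 2),
  SP           = rep(c("SP", "OffSP"), 12)
)

result <- df %>%
  group_by(Grouping, SP) %>%                     # group-level mean and count per SP
  mutate(AvgPrice = mean(Price), n1 = n()) %>%
  group_by(Sub_grouping, .add = TRUE) %>%        # add the sub-group level
  summarise(AvgPrice  = first(AvgPrice),         # carry the group-level values through
            n1        = first(n1),
            AvgPrice2 = mean(Price),             # sub-group mean
            n2        = n(),                     # sub-group count
            .groups   = "drop")

print(result)
```

With this particular sample layout each Sub_grouping only ever appears under one 'SP' level, so the result has one row per sub-group.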
If we want the mean and length for each 'SP' as separate columns, a compact option is dcast from data.table, which can take multiple functions and multiple value.var columns.
library(data.table)
dcast(setDT(df), Grouping + Sub_grouping ~ SP, value.var = "Price", c(mean, length))
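If it helps to sanity-check the wide table, the counts and means per sub-group and 'SP' can also be computed with base R alone. This is only a rough cross-check, not a replacement for the dcast call; the compact data construction, the seed, and the names `counts` and `means` are my own assumptions.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
groups <- c("CD", "NW", "SMK", "ghd")
fruits <- c("apple", "pear", "grape")
df <- data.frame(
  Price        = rnorm(24),
  Grouping     = rep(groups, 6),
  Sub_grouping = rep(paste0(rep(groups, 3), rep(fruits, each = 4)), 2),
  SP           = rep(c("SP", "OffSP"), 12)
)

# Contingency table: sub-groups as rows, SP / OffSP as columns
counts <- xtabs(~ Sub_grouping + SP, data = df)

# Matrix of mean prices with the same layout; empty cells come back as NA
means <- tapply(df$Price, list(df$Sub_grouping, df$SP), mean)

print(counts)
print(round(means, 3))
```

Cells with a count of 0 in `counts` line up with the NA cells in `means`, which makes it easy to see which sub-group / 'SP' combinations have no data.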