R: aggregate a data frame on two variables and apply a function
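I have a data frame that I want to aggregate by two variables, applying the function mean to each measurement. Here is the head of the data frame: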
Subject Activity meassureA meassureB meassureC meassureD
1 1 running 0.2820216 -0.037696218 -0.13489730 -0.3282802
2 1 running 0.2558408 -0.064550029 -0.09518634 -0.2292069
3 1 walking 0.2548672 0.003814723 -0.12365809 -0.2751579
4 2 running 0.3433705 -0.014446221 -0.16737697 -0.2299235
Now I would like to get something like this:
Subject Activity meassureA meassureB meassureC meassureD
1 1 running mean(S1,A1) mean(S1,A1) mean(S1,A1) mean(S1,A1)
2 1 walking mean(S1,A2) mean(S1,A2) mean(S1,A2) mean(S1,A2)
3 2 running mean(S2,A1) mean(S2,A1) mean(S2,A1) mean(S2,A1)
4 2 walking mean(S2,A2) mean(S2,A2) mean(S2,A2) mean(S2,A2)
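where meassureA in the first row is the mean of all values for subject 1 (S1) performing the activity running (A1).
I was thinking of using aggregate(), but I have not been able to apply what I have learned so far to my problem. Any help is much appreciated.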
As David mentioned in the comments, you can do this with:
aggregate(. ~ Subject + Activity, df, mean)
Or using data.table:
data.table::setDT(df)[, lapply(.SD, mean), by = .(Subject, Activity)]
Or using dplyr:
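library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr; across() replaces them
df %>% group_by(Subject, Activity) %>% summarise(across(everything(), mean), .groups = "drop")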
which gives:
# Subject Activity meassureA meassureB meassureC meassureD
#1 1 running 0.2689312 -0.051123123 -0.1150418 -0.2787436
#2 1 walking 0.2548672 0.003814723 -0.1236581 -0.2751579
#3 2 running 0.3433705 -0.014446221 -0.1673770 -0.2299235
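To try the aggregate() call on its own, here is a minimal reproducible sketch. It assumes the data consists only of the four rows shown in the question (the real data frame is presumably larger), and the names df and res are just placeholders. Because the sample has no walking rows for subject 2, the result has three groups rather than four.

```r
# Rebuild the four rows shown in the question (values copied from the post).
df <- data.frame(
  Subject   = c(1, 1, 1, 2),
  Activity  = c("running", "running", "walking", "running"),
  meassureA = c(0.2820216, 0.2558408, 0.2548672, 0.3433705),
  meassureB = c(-0.037696218, -0.064550029, 0.003814723, -0.014446221),
  meassureC = c(-0.13489730, -0.09518634, -0.12365809, -0.16737697),
  meassureD = c(-0.3282802, -0.2292069, -0.2751579, -0.2299235)
)

# "." on the left-hand side stands for every column not named on the right,
# so all four measurement columns are averaged within each group.
res <- aggregate(. ~ Subject + Activity, data = df, FUN = mean)

# aggregate() returns rows with the first grouping variable varying fastest;
# reorder by Subject, then Activity, to match the layout asked for.
res <- res[order(res$Subject, res$Activity), ]
print(res)
```

The first row of res should hold the means of the two running rows for subject 1, for example (0.2820216 + 0.2558408) / 2 for meassureA.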