Mean of subset in nested dataframe (R)
I have the following toy data frame in R, and I am trying to take the mean of the true/false values within each combination of Name and Condition.
Name Condition Values
1 A True
1 B False
1 A True
2 B True
2 B False
3 A False
4 A True
4 B True
... ... ...
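Does anyone have suggestions for handling this kind of nested structure? I am new to R and not sure whether I need group_by, aggregate, or something else. Thanks a lot!
Desired output: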
Name Condition Values(mean)
1 A 1
1 B 0
2 A 0
2 B 0.5
3 A 0
3 B 0
4 A 1
4 B 1
... ... ...
We can group by 'Name' and 'Condition' and take the mean of the logical vector (Values == 'True') to create the 'Values' column
library(dplyr)
df1 %>%
group_by(Name, Condition) %>%
mutate(Values = mean(Values == 'True'))
# A tibble: 8 x 3
# Groups: Name, Condition [6]
# Name Condition Values
# <int> <chr> <dbl>
#1 1 A 1
#2 1 B 0
#3 1 A 1
#4 2 B 0.5
#5 2 B 0.5
#6 3 A 0
#7 4 A 1
#8 4 B 1
Data
df1 <- structure(list(Name = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L), Condition = c("A",
"B", "A", "B", "B", "A", "A", "B"), Values = c("True", "False",
"True", "True", "False", "False", "True", "True")),
class = "data.frame", row.names = c(NA,
-8L))
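Note that mutate keeps all 8 original rows and repeats the group mean on each of them. If one row per Name/Condition pair is wanted, as in the desired output, a minimal sketch swaps in summarise. The name group_means is only illustrative, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Same toy data as above, rebuilt so the sketch runs on its own
df1 <- data.frame(
  Name      = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L),
  Condition = c("A", "B", "A", "B", "B", "A", "A", "B"),
  Values    = c("True", "False", "True", "True", "False", "False", "True", "True")
)

# group_means is a hypothetical name; summarise collapses each group to one row
group_means <- df1 %>%
  group_by(Name, Condition) %>%
  summarise(Values = mean(Values == "True"), .groups = "drop")

group_means
```

This returns 6 rows, one for each Name/Condition pair that actually occurs in the data.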
Try this:
#Data
df1 <- structure(list(Name = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L), Condition = c("A",
"B", "A", "B", "B", "A", "A", "B"), Values = c("True", "False",
"True", "True", "False", "False", "True", "True")), class = "data.frame", row.names = c(NA,
-8L))
#Code
library(dplyr)
# Recode Values as 1/0, then average within each Name/Condition group
df1 %>%
  mutate(Index = ifelse(Values == 'True', 1, 0)) %>%
  group_by(Name, Condition) %>%
  summarise(Avg = mean(Index, na.rm = TRUE))
# A tibble: 6 x 3
# Groups: Name [4]
Name Condition Avg
<int> <chr> <dbl>
1 1 A 1
2 1 B 0
3 2 B 0.5
4 3 A 0
5 4 A 1
6 4 B 1
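The desired output in the question also lists the pairs 2 A and 3 B, which have no rows in the data, with a value of 0. One possible way to get those rows is tidyr::complete after summarising. This is only a sketch: filling with 0 is an assumption taken from the desired output (the mean of an empty group is otherwise undefined), and full_means is a hypothetical name.

```r
library(dplyr)
library(tidyr)

# Same toy data as above, rebuilt so the sketch runs on its own
df1 <- data.frame(
  Name      = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L),
  Condition = c("A", "B", "A", "B", "B", "A", "A", "B"),
  Values    = c("True", "False", "True", "True", "False", "False", "True", "True")
)

full_means <- df1 %>%
  group_by(Name, Condition) %>%
  summarise(Avg = mean(Values == "True"), .groups = "drop") %>%
  # Add every Name x Condition pair; pairs with no data get Avg = 0 (assumption)
  complete(Name, Condition, fill = list(Avg = 0))

full_means
```

The result has 8 rows (4 names times 2 conditions). If a missing pair should stay missing rather than count as 0, dropping the fill argument leaves NA there instead.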
You can convert the Values column to a logical column and take the mean for each Name and Condition. Using base R aggregate:
# df1 is the data frame defined in the answers above
df1$Values <- as.logical(df1$Values)
aggregate(Values ~ Name + Condition, df1, mean)
# Name Condition Values
#1 1 A 1.0
#2 3 A 0.0
#3 4 A 1.0
#4 1 B 0.0
#5 2 B 0.5
#6 4 B 1.0
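One thing to keep in mind with as.logical: it recognises only a few spellings ("TRUE", "true", "True", "T" and their false counterparts), and any other string becomes NA. A small sketch of that behaviour, plus a variant that leaves the original column untouched by comparing against "True" instead. The column name IsTrue and the object res are illustrative, not from the answer.

```r
# Same toy data as above, rebuilt so the sketch runs on its own
df1 <- data.frame(
  Name      = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L),
  Condition = c("A", "B", "A", "B", "B", "A", "A", "B"),
  Values    = c("True", "False", "True", "True", "False", "False", "True", "True")
)

# Unrecognised strings such as "yes" are converted to NA
converted <- as.logical(c("True", "False", "yes"))

# Hypothetical helper column, so Values itself is not overwritten
df1$IsTrue <- df1$Values == "True"
res <- aggregate(IsTrue ~ Name + Condition, data = df1, FUN = mean)
res
```

With the formula interface, aggregate drops rows containing NA by default (na.action = na.omit), so unexpected spellings would silently fall out of the means if as.logical were used on them.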