Mean of subset in nested dataframe (R)

I have the following toy data frame in R, and I am trying to take the mean of the true/false Values within each combination of Name and Condition.

Name  Condition  Values
1     A          True
1     B          False
1     A          True
2     B          True
2     B          False
3     A          False
4     A          True
4     B          True
...   ...        ...

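Does anyone have advice on handling this kind of nested structure? I'm new to R and not sure whether I need group_by, aggregate, or something else. Thanks very much!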

Desired output:

Name  Condition  Values(mean)
1     A          1
1     B          0
2     A          0
2     B          0.5
3     A          0
3     B          0
4     A          1
4     B          1
...   ...        ...

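We can group by 'Name' and 'Condition', then take the mean of the logical vector (Values == 'True') to create the 'Values' column. Note that mutate keeps every original row, so each group's mean is repeated on all of its rows.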

library(dplyr)
df1 %>%
     group_by(Name, Condition) %>%
     mutate(Values = mean(Values == 'True'))  
# A tibble: 8 x 3
# Groups:   Name, Condition [6]
#   Name Condition Values
#  <int> <chr>      <dbl>
#1     1 A            1  
#2     1 B            0  
#3     1 A            1  
#4     2 B            0.5
#5     2 B            0.5
#6     3 A            0  
#7     4 A            1  
#8     4 B            1  

Data

df1 <- structure(list(Name = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L), Condition = c("A", 
"B", "A", "B", "B", "A", "A", "B"), Values = c("True", "False", 
"True", "True", "False", "False", "True", "True")), 
class = "data.frame", row.names = c(NA, 
-8L))       
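
If one row per Name/Condition pair is wanted, as in the desired output, a possible variation is to use summarise instead of mutate. The desired output also lists pairs that never occur in the data (2/A and 3/B) with a mean of 0. The sketch below assumes those should be filled with 0 via tidyr::complete; whether 0 or NA is the right value for an empty group is a judgment call, and the name `result` is just for illustration.

```r
library(dplyr)
library(tidyr)

# Same toy data as above, rebuilt with data.frame()
df1 <- data.frame(
  Name = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L),
  Condition = c("A", "B", "A", "B", "B", "A", "A", "B"),
  Values = c("True", "False", "True", "True", "False", "False", "True", "True")
)

result <- df1 %>%
  group_by(Name, Condition) %>%
  # One row per group: proportion of 'True' values
  summarise(Values = mean(Values == "True"), .groups = "drop") %>%
  # Add the Name/Condition pairs absent from the data;
  # filling with 0 is an assumption taken from the desired output
  complete(Name, Condition, fill = list(Values = 0))

result
```

Dropping the fill argument leaves the missing pairs as NA, which makes it explicit that there was no data for them.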

Try this:

#Data
df1 <- structure(list(Name = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L), Condition = c("A", 
"B", "A", "B", "B", "A", "A", "B"), Values = c("True", "False", 
"True", "True", "False", "False", "True", "True")), class = "data.frame", row.names = c(NA, 
-8L))
#Code
library(dplyr)
# Recode Values as 1/0, then average within each Name/Condition group
df1 %>%
  mutate(Index = ifelse(Values == 'True', 1, 0)) %>%
  group_by(Name, Condition) %>%
  summarise(Avg = mean(Index, na.rm = TRUE))

# A tibble: 6 x 3
# Groups:   Name [4]
   Name Condition   Avg
  <int> <chr>     <dbl>
1     1 A           1  
2     1 B           0  
3     2 B           0.5
4     3 A           0  
5     4 A           1  
6     4 B           1  

You can convert the Values column to logical and take the mean for each Name and Condition. Using base R aggregate:

# df1 is the data frame from the answers above
df1$Values <- as.logical(df1$Values)
aggregate(Values ~ Name + Condition, df1, mean)

#  Name Condition Values
#1    1         A    1.0
#2    3         A    0.0
#3    4         A    1.0
#4    1         B    0.0
#5    2         B    0.5
#6    4         B    1.0
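
Like the dplyr answers, aggregate only returns the Name/Condition pairs that actually occur in the data. If it helps to see the empty pairs as well, one base R option is tapply, which returns a Name by Condition matrix with NA in the cells that have no rows. This is a sketch only; the name `means` is just for illustration.

```r
# Same toy data as above, rebuilt with data.frame()
df1 <- data.frame(
  Name = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L),
  Condition = c("A", "B", "A", "B", "B", "A", "A", "B"),
  Values = c("True", "False", "True", "True", "False", "False", "True", "True")
)

# Cross-tabulated means: rows are Name, columns are Condition.
# Cells with no observations (e.g. Name 2, Condition A) come back as NA.
means <- with(df1, tapply(Values == "True",
                          list(Name = Name, Condition = Condition),
                          mean))
means
```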