Count occurrences of a value on melted data using dplyr

I'm trying to build a simple table after melting my data, but using dplyr.

My data looks like this:

   cluster   21:30   21:45
4        c   alone   alone
6        b       %       %
12       e partner partner
14       b partner partner
20       b   alone   alone
22       c partner partner

With table I can simply do

table(dta$cluster)
   a b c d e 
   2 8 5 1 4 

How can I get the same result using melt and summarise?

 library(dplyr)
 library(reshape2)

 dta %>% 
 melt(id.vars = 'cluster')  %>% 
 group_by(cluster) %>% 
 summarise( n() ) 

What I really need is the table of cluster after the data has been melted.

So the goal is to get the correct count on this data.frame:

 dta %>% 
 melt(id.vars = 'cluster')

The expected output is this:

      cluster variable   value n_cluster
1        a    21:30       .         2
2        a    21:30 nuclear         2
3        a    21:45       .         2
4        a    21:45 nuclear         2
5        b    21:30       %         8
6        b    21:30 partner         8
7        b    21:30   alone         8
8        b    21:30 partner         8
9        b    21:30 partner         8
10       b    21:30 nuclear         8
11       b    21:30 partner         8
12       b    21:30 partner         8
13       b    21:45       %         8
14       b    21:45 partner         8
15       b    21:45   alone         8
16       b    21:45 partner         8
17       b    21:45 partner         8
18       b    21:45 nuclear         8
19       b    21:45 partner         8
20       b    21:45 partner         8
21       c    21:30   alone         5
22       c    21:30 partner         5
23       c    21:30       %         5
24       c    21:30 partner         5
25       c    21:30 partner         5
26       c    21:45   alone         5
27       c    21:45 partner         5
28       c    21:45       %         5
29       c    21:45 partner         5
30       c    21:45 partner         5
31       d    21:30 partner         1
32       d    21:45   alone         1
33       e    21:30 partner         4
34       e    21:30 nuclear         4
35       e    21:30 nuclear         4
36       e    21:30 nuclear         4
37       e    21:45 partner         4
38       e    21:45 nuclear         4
39       e    21:45 nuclear         4
40       e    21:45 nuclear         4

Any ideas?

dta = structure(list(cluster = structure(c(3L, 2L, 5L, 2L, 2L, 3L, 
5L, 3L, 1L, 3L, 1L, 2L, 5L, 3L, 2L, 2L, 2L, 2L, 4L, 5L), .Label = c("a", 
"b", "c", "d", "e"), class = "factor"), `21:30` = structure(c(2L, 
7L, 5L, 5L, 2L, 5L, 4L, 7L, 1L, 5L, 4L, 5L, 4L, 5L, 5L, 4L, 5L, 
5L, 5L, 4L), .Label = c(".", "alone", "children", "nuclear", 
"partner", "*", "%"), class = "factor"), `21:45` = structure(c(2L, 
7L, 5L, 5L, 2L, 5L, 4L, 7L, 1L, 5L, 4L, 5L, 4L, 5L, 5L, 4L, 5L, 
5L, 2L, 4L), .Label = c(".", "alone", "children", "nuclear", 
"partner", "*", "%"), class = "factor")), .Names = c("cluster", 
"21:30", "21:45"), row.names = c("4", "6", "12", "14", "20", 
"22", "23", "28", "30", "32", "36", "38", "40", "42", "44", "48", 
"50", "56", "57", "60"), class = "data.frame")

I can't seem to find a good dupe, but a simple dplyr idiom is to just use count:

count(dta, cluster)
# Source: local data frame [5 x 2]
# 
#   cluster n
# 1       a 2
# 2       b 8
# 3       c 5
# 4       d 1
# 5       e 4

Given your new desired output, you can join this result to your melted data set:

dta %>% 
  melt(id.vars = 'cluster')  %>% 
  left_join(count(dta, cluster), by = "cluster") %>%
  arrange(cluster)
#    cluster variable   value n
# 1        a    21:30       . 2
# 2        a    21:30 nuclear 2
# 3        a    21:45       . 2
# 4        a    21:45 nuclear 2
# 5        b    21:30       % 8
# 6        b    21:30 partner 8
# 7        b    21:30   alone 8
#...
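
To get a column named n_cluster as in the expected output, one option is to name the count column up front and join on cluster. This is only a minimal sketch: it assumes a dplyr version in which count() accepts the name argument, and dta_small is a hypothetical toy copy of the six rows printed in the question, not the full dta.

```r
library(dplyr)
library(reshape2)

# Hypothetical toy subset: the six rows shown at the top of the question
dta_small <- data.frame(
  cluster = c("c", "b", "e", "b", "b", "c"),
  `21:30` = c("alone", "%", "partner", "partner", "alone", "partner"),
  `21:45` = c("alone", "%", "partner", "partner", "alone", "partner"),
  check.names = FALSE,       # keep the "21:30" / "21:45" column names as-is
  stringsAsFactors = FALSE
)

# Count rows per cluster BEFORE melting, so each original row counts once
cluster_counts <- count(dta_small, cluster, name = "n_cluster")

# Melt to long format, then attach the per-cluster count to every melted row
joined <- dta_small %>%
  melt(id.vars = "cluster") %>%
  left_join(cluster_counts, by = "cluster") %>%
  arrange(cluster)
```

Counting before the melt is what keeps the result in line with table(dta$cluster): after melting, each original row appears once per time column.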

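When computing the distribution of a variable that is observed repeatedly, the number of observations per row should be taken into account.

In this example there are two time columns (21:30 and 21:45), so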

n_episode = 2 

The code then becomes simple:

dta %>% 
  melt(id.vars = 'cluster')  %>% 
  group_by(cluster) %>% 
  mutate( n_cluster = n() / n_episode) %>% 
  arrange(cluster)

The same divisor (n_episode) can also be used to compute means for groups of different sizes.
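
If hardcoding n_episode feels brittle, the divisor can probably be derived from the melted data itself with n_distinct(variable). The sketch below assumes every original row contributes exactly one melted row per time column (nothing dropped, for example via na.rm = TRUE); dta_small is again a hypothetical toy copy of the six rows printed in the question.

```r
library(dplyr)
library(reshape2)

# Hypothetical toy subset: the six rows shown at the top of the question
dta_small <- data.frame(
  cluster = c("c", "b", "e", "b", "b", "c"),
  `21:30` = c("alone", "%", "partner", "partner", "alone", "partner"),
  `21:45` = c("alone", "%", "partner", "partner", "alone", "partner"),
  check.names = FALSE,       # keep the "21:30" / "21:45" column names as-is
  stringsAsFactors = FALSE
)

with_counts <- dta_small %>%
  melt(id.vars = "cluster") %>%
  group_by(cluster) %>%
  # n() counts melted rows in the cluster; dividing by the number of
  # distinct time columns recovers the number of original rows
  mutate(n_cluster = n() / n_distinct(variable)) %>%
  ungroup() %>%
  arrange(cluster)
```

This avoids keeping a separate constant in sync with the number of time columns, at the cost of relying on the melted data being complete.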