Count occurrences of a value on melted data using dplyr
I am trying to do a simple table after I melt my data, but using dplyr.
My data looks like this:
cluster 21:30 21:45
4 c alone alone
6 b % %
12 e partner partner
14 b partner partner
20 b alone alone
22 c partner partner
Using table, I can simply do:
table(dta$cluster)
a b c d e
2 8 5 1 4
How can I get the same result using melt and summarise?
library(dplyr)
library(reshape2)
dta %>%
melt(id.vars = 'cluster') %>%
group_by(cluster) %>%
summarise( n() )
What I really need is the table of cluster after the data has been melted.
So the goal is to get the correct counts on this data.frame:
dta %>%
melt(id.vars = 'cluster')
The expected output is this:
cluster variable value n_cluster
1 a 21:30 . 2
2 a 21:30 nuclear 2
3 a 21:45 . 2
4 a 21:45 nuclear 2
5 b 21:30 % 8
6 b 21:30 partner 8
7 b 21:30 alone 8
8 b 21:30 partner 8
9 b 21:30 partner 8
10 b 21:30 nuclear 8
11 b 21:30 partner 8
12 b 21:30 partner 8
13 b 21:45 % 8
14 b 21:45 partner 8
15 b 21:45 alone 8
16 b 21:45 partner 8
17 b 21:45 partner 8
18 b 21:45 nuclear 8
19 b 21:45 partner 8
20 b 21:45 partner 8
21 c 21:30 alone 5
22 c 21:30 partner 5
23 c 21:30 % 5
24 c 21:30 partner 5
25 c 21:30 partner 5
26 c 21:45 alone 5
27 c 21:45 partner 5
28 c 21:45 % 5
29 c 21:45 partner 5
30 c 21:45 partner 5
31 d 21:30 partner 1
32 d 21:45 alone 1
33 e 21:30 partner 4
34 e 21:30 nuclear 4
35 e 21:30 nuclear 4
36 e 21:30 nuclear 4
37 e 21:45 partner 4
38 e 21:45 nuclear 4
39 e 21:45 nuclear 4
40 e 21:45 nuclear 4
Any ideas?
dta = structure(list(cluster = structure(c(3L, 2L, 5L, 2L, 2L, 3L,
5L, 3L, 1L, 3L, 1L, 2L, 5L, 3L, 2L, 2L, 2L, 2L, 4L, 5L), .Label = c("a",
"b", "c", "d", "e"), class = "factor"), `21:30` = structure(c(2L,
7L, 5L, 5L, 2L, 5L, 4L, 7L, 1L, 5L, 4L, 5L, 4L, 5L, 5L, 4L, 5L,
5L, 5L, 4L), .Label = c(".", "alone", "children", "nuclear",
"partner", "*", "%"), class = "factor"), `21:45` = structure(c(2L,
7L, 5L, 5L, 2L, 5L, 4L, 7L, 1L, 5L, 4L, 5L, 4L, 5L, 5L, 4L, 5L,
5L, 2L, 4L), .Label = c(".", "alone", "children", "nuclear",
"partner", "*", "%"), class = "factor")), .Names = c("cluster",
"21:30", "21:45"), row.names = c("4", "6", "12", "14", "20",
"22", "23", "28", "30", "32", "36", "38", "40", "42", "44", "48",
"50", "56", "57", "60"), class = "data.frame")
I can't seem to find a good duplicate, but a simple dplyr idiom is to just use count:
count(dta, cluster)
# Source: local data frame [5 x 2]
#
# cluster n
# 1 a 2
# 2 b 8
# 3 c 5
# 4 d 1
# 5 e 4
Given your new desired output, you can join this result onto your melted data set:
dta %>%
melt(id.vars = 'cluster') %>%
left_join(count(dta, cluster), by = "cluster") %>%
arrange(cluster)
# cluster variable value n
# 1 a 21:30 . 2
# 2 a 21:30 nuclear 2
# 3 a 21:45 . 2
# 4 a 21:45 nuclear 2
# 5 b 21:30 % 8
# 6 b 21:30 partner 8
# 7 b 21:30 alone 8
#...
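For reference, here is a minimal, self-contained sketch of the same join on a small made-up data frame. The names toy and counts and the toy values are assumptions for illustration only; the name argument of count (available in dplyr 0.8.0 and later) is used so the column matches the n_cluster name from the question.

```r
library(dplyr)
library(reshape2)

# Hypothetical toy data, not the asker's data set:
# one row per observation, one column per time point.
toy <- data.frame(
  cluster = c("a", "b", "b", "c"),
  `21:30` = c("alone", "partner", "%", "alone"),
  `21:45` = c("alone", "partner", "%", "partner"),
  check.names = FALSE,        # keep "21:30" / "21:45" as column names
  stringsAsFactors = FALSE
)

# Count observations per cluster on the wide data, before melting,
# so the counts are not inflated by the number of time columns.
counts <- count(toy, cluster, name = "n_cluster")

# Melt to long format, then attach the per-cluster count to every row.
result <- toy %>%
  melt(id.vars = "cluster") %>%
  left_join(counts, by = "cluster") %>%
  arrange(cluster)

print(result)
```

Counting before the melt is the key design choice here: the wide data has exactly one row per observation, so the count is the same number that table(dta$cluster) reports.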
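When computing the distribution of a repeatedly observed variable, the number of repeated observations should be taken into account: melting stacks every original row once per time column, so the raw row count per cluster is inflated by that factor.
In this example there are two time columns (21:30 and 21:45), so: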
n_episode = 2
Then the code becomes simple:
dta %>%
melt(id.vars = 'cluster') %>%
group_by(cluster) %>%
mutate( n_cluster = n() / n_episode) %>%
arrange(cluster)
This result (n_episode) can then be used to compute means for groups of different sizes.
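As a variation on the approach above, the divisor does not have to be hard-coded. The sketch below derives it from the melted data with n_distinct(variable). It assumes every observation has an entry for every time column (which holds when melt is called without na.rm = TRUE); the toy data frame is made up for illustration.

```r
library(dplyr)
library(reshape2)

# Hypothetical toy data, not the asker's data set.
toy <- data.frame(
  cluster = c("a", "b", "b", "c"),
  `21:30` = c("alone", "partner", "%", "alone"),
  `21:45` = c("alone", "partner", "%", "partner"),
  check.names = FALSE,        # keep "21:30" / "21:45" as column names
  stringsAsFactors = FALSE
)

result <- toy %>%
  melt(id.vars = "cluster") %>%
  group_by(cluster) %>%
  # n() counts melted rows in the cluster; dividing by the number of
  # distinct time columns recovers the number of original observations.
  mutate(n_cluster = n() / n_distinct(variable)) %>%
  ungroup() %>%
  arrange(cluster)

print(result)
```

If some observations were missing a time point, the division would no longer give whole numbers, and counting on the wide data before melting would be the safer route.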