Count specific categories in multiple columns and multiply them by a sum column
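I want to count how often each category occurs in my data frame.
To do this, I need to count the categories in each row and multiply that count by the sum in column 5.
(Column c4 is not needed for my analysis.)
The preferred output is:
Analytics = 131
Ads = 253
Identification = ..
My data looks like this: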
tracker_category <- data.frame = c("Tracker1", "Tracker2", "Tracker3", "Tracker4","Tracker5","Tracker6"),
c1 = c("Analytics", "Crash", "Location", "Identification", "Analytics", "Ads"),
c2 = c("Ads", "Analytics", "Location", "Analytics", "Identification", "Ads"),
c3 = c("Identification", "Analytics", "Ads", "Ads", "Analytics", "Location"),
c4 = c("url1.com","ur2.com","url3.com","url4.com","url5.com","url6.com"),
sum_tracker = c(1,20,100,0,5,76))
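The following should produce what you want.
You can convert the data frame to a "long" format and then sum up the occurrences (your 5th column).
Data
Note: for reproducibility, I corrected your data frame definition.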
tracker_category <- data.frame(
id = c("Tracker1", "Tracker2", "Tracker3", "Tracker4","Tracker5","Tracker6"),
c1 = c("Analytics", "Crash", "Location", "Identification", "Analytics", "Ads"),
c2 = c("Ads", "Analytics", "Location", "Analytics", "Identification", "Ads"),
c3 = c("Identification", "Analytics", "Ads", "Ads", "Analytics", "Location"),
c4 = c("url1.com","ur2.com","url3.com","url4.com","url5.com","url6.com"),
sum_tracker = c(1,20,100,0,5,76)
)
Converting to long format
{tidyr} provides the pivot_longer() function for this.
library(dplyr)
library(tidyr)

tracker_category %>%
  select(-c4) %>%                 # remove c4
  pivot_longer(
    cols = c(c1:c3),              # which columns to pivot
    names_to = "action",          # where to store the former column names
    values_to = "categories"      # where to store the values
  )
This yields:
# A tibble: 18 x 4
id sum_tracker action categories
<chr> <dbl> <chr> <chr>
1 Tracker1 1 c1 Analytics
2 Tracker1 1 c2 Ads
3 Tracker1 1 c3 Identification
4 Tracker2 20 c1 Crash
5 Tracker2 20 c2 Analytics
6 Tracker2 20 c3 Analytics
7 Tracker3 100 c1 Location
8 Tracker3 100 c2 Location
9 Tracker3 100 c3 Ads
10 Tracker4 0 c1 Identification
11 Tracker4 0 c2 Analytics
12 Tracker4 0 c3 Ads
13 Tracker5 5 c1 Analytics
14 Tracker5 5 c2 Identification
15 Tracker5 5 c3 Analytics
16 Tracker6 76 c1 Ads
17 Tracker6 76 c2 Ads
18 Tracker6 76 c3 Location
In this format, you can use {dplyr} to run summarise() on your groups.
tracker_category %>%
  select(-c4) %>%
  pivot_longer(cols = c(c1:c3), names_to = "action", values_to = "categories") %>%
  #------------- group by your categories
  group_by(categories) %>%
  #------------- sum the tracked results; in long format each occurrence is its
  #------------- own row, so a plain sum replaces the count-times-value multiplication
  summarise(total = sum(sum_tracker))
This yields:
# A tibble: 5 x 2
categories total
<chr> <dbl>
1 Ads 253
2 Analytics 51
3 Crash 20
4 Identification 6
5 Location 276
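As a possible shortcut, the group_by() plus summarise() step can be collapsed into a weighted count(). This is only a sketch of an alternative, not part of the original answer. It assumes {dplyr} and {tidyr} are installed, and the object name totals is introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)

# Same corrected data frame as above, repeated so the sketch runs on its own
tracker_category <- data.frame(
  id = c("Tracker1", "Tracker2", "Tracker3", "Tracker4", "Tracker5", "Tracker6"),
  c1 = c("Analytics", "Crash", "Location", "Identification", "Analytics", "Ads"),
  c2 = c("Ads", "Analytics", "Location", "Analytics", "Identification", "Ads"),
  c3 = c("Identification", "Analytics", "Ads", "Ads", "Analytics", "Location"),
  c4 = c("url1.com", "ur2.com", "url3.com", "url4.com", "url5.com", "url6.com"),
  sum_tracker = c(1, 20, 100, 0, 5, 76)
)

# `totals` is a hypothetical name; count() with `wt` sums the weight per group
totals <- tracker_category %>%
  select(-c4) %>%
  pivot_longer(cols = c1:c3, names_to = "action", values_to = "categories") %>%
  count(categories, wt = sum_tracker, name = "total")

print(totals)
```

With wt supplied, count() adds up sum_tracker within each category instead of counting rows, so it should give the same totals as the summarise() version.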
Please check whether your example value of 131 for Analytics is really correct...
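To cross-check the figures, the rule from the question (count a category per row, multiply by sum_tracker, then add up the rows) can also be applied literally in base R. This is a minimal sketch under that reading of the question. The names cat_matrix, categories and totals are introduced here for illustration only.

```r
# Same corrected data frame as above, repeated so the sketch runs on its own
tracker_category <- data.frame(
  id = c("Tracker1", "Tracker2", "Tracker3", "Tracker4", "Tracker5", "Tracker6"),
  c1 = c("Analytics", "Crash", "Location", "Identification", "Analytics", "Ads"),
  c2 = c("Ads", "Analytics", "Location", "Analytics", "Identification", "Ads"),
  c3 = c("Identification", "Analytics", "Ads", "Ads", "Analytics", "Location"),
  c4 = c("url1.com", "ur2.com", "url3.com", "url4.com", "url5.com", "url6.com"),
  sum_tracker = c(1, 20, 100, 0, 5, 76),
  stringsAsFactors = FALSE
)

# Character matrix holding only the category columns (c4 is ignored)
cat_matrix <- as.matrix(tracker_category[c("c1", "c2", "c3")])
categories <- sort(unique(as.vector(cat_matrix)))

# For each category: occurrences per row, times that row's sum_tracker, summed
totals <- sapply(categories, function(category) {
  per_row_count <- rowSums(cat_matrix == category)
  sum(per_row_count * tracker_category$sum_tracker)
})

print(totals)
```

Under this reading, Analytics comes to 1*1 + 2*20 + 1*0 + 2*5 = 51 and Ads to 253, which matches the long-format result.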