RStudio: count, sum and percentage by specific brand

Question

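I'm struggling with three problems in RStudio. I can work out a very simple approach, but it isn't clean (it calls the filter function many times). An example dataset is linked as "The dataset".

The real dataset has over 1 million rows, so I'd like to know an efficient way to do the calculation. Also, if possible, I'd like to avoid creating new datasets.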

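What I want to do with this dataset is:

How do I summarize "the count of household by brand A"?
How do I summarize "the sum of sales by brand A"?
How do I summarize "brand A percentage of all household" (I mean "penetration")?

Thank you for your help. Much appreciated.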

Answer 1

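Here is a solution using data.table. There are many resources that explain how to summarize variables, and there are several different ways to do it.

Reading about data.table here will also answer your question.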

library(data.table)
library(magrittr)
library(reprex)

x <- data.table(code = 1:5,
                sales = 2000,
                household = c(12345, 3598, 456, 45698, 4875),
                brand = c("A", "B", "A", "C", "A"))

# Count rows per brand, then keep only brand A
one <- x[, .N, by = .(brand)] %>%
  .[brand == "A"]

# Sum sales per brand, then keep only brand A
two <- x[, sum(sales), by = .(brand)] %>%
  .[brand == "A"]

# Add a column (by reference) with each row's share of the household total
x[, percent := household / sum(household)]

# Sum those shares within each brand
three <- x[, sum(percent), by = .(brand)]

Created on 2019-02-10 by the reprex package (v0.2.0).
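The steps above can also be folded into one grouped pass, which may help with the concern about repeated filtering on a large table. This is only a sketch under a different assumption from the percent calculation above: it treats each row as one household, so penetration is a share of rows. The names by_brand, n_households, total_sales, penetration and brand_a are made up for illustration.

```r
library(data.table)

# Same toy data as above (values are illustrative only)
x <- data.table(code = 1:5,
                sales = 2000,
                household = c(12345, 3598, 456, 45698, 4875),
                brand = c("A", "B", "A", "C", "A"))

# One grouped pass: row count and sales total per brand
by_brand <- x[, .(n_households = .N, total_sales = sum(sales)), by = brand]

# Share of all rows per brand (assumption: one row per household)
by_brand[, penetration := n_households / sum(n_households)]

# Keep only brand A
brand_a <- by_brand[brand == "A"]
print(brand_a)
```

The intermediate by_brand table has only one row per brand, so it should stay small even when x has millions of rows.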

Answer 2

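This is a perfect example of where to use the collection of packages called the tidyverse (https://www.tidyverse.org). dplyr is a package in the tidyverse that provides a very simple, transparent and readable way to do this. You don't have to create a new data frame.

In your dataset, I assume the column household is the household's ID number. (If it is actually the number of households, the code can easily be adjusted to answer your question.)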

Number of households for brand A:

library(dplyr)
countbrandA <- dataset %>% filter(brand == "A") %>% summarize(N = n())

Sum of sales for brand A:

totalsalesbrandA <- dataset %>% filter(brand == "A") %>% summarize(salestotal = sum(sales))

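Brand A as a percentage of all households. First get the number of households for brand A, then divide it by the total number of households.

# Total number of rows (households), kept as a plain number
grandtotal <- nrow(dataset)
brandpercentageA <- countbrandA$N / grandtotal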
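Since the question asks about avoiding repeated filter calls, the three brand A figures could also be computed in a single summarize. This is a rough sketch, not the only way to do it: the tibble below is a hypothetical stand-in for dataset (values borrowed from Answer 1), and the output names are invented. It keeps the assumption that each row is one household.

```r
library(dplyr)

# Hypothetical stand-in for `dataset`; values are illustrative only
dataset <- tibble(
  code = 1:5,
  sales = 2000,
  household = c(12345, 3598, 456, 45698, 4875),
  brand = c("A", "B", "A", "C", "A")
)

brand_a_summary <- dataset %>%
  summarize(
    n_households = sum(brand == "A"),         # rows where brand is A
    total_sales  = sum(sales[brand == "A"]),  # sales on those rows only
    penetration  = mean(brand == "A")         # share of all rows that are brand A
  )

print(brand_a_summary)
```

The logical vector brand == "A" is summed and averaged directly, so no filtered copy of the data is stored.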

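Or, to get the proportion for every brand in a single dplyr pipeline, you can do this:

brandpercentage <- dataset %>%
    group_by(brand) %>%
    summarize(N = n()) %>%
    ungroup() %>%
    # divide each brand's count by the total across all brands
    mutate(percent = N / sum(N))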
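If household turns out to be a number of households rather than an ID, the adjustment mentioned earlier might look like the sketch below: sum the household column instead of counting rows. The tibble is again a hypothetical stand-in for dataset, and household_share and households are illustrative names.

```r
library(dplyr)

# Hypothetical stand-in for `dataset`; values are illustrative only
dataset <- tibble(
  code = 1:5,
  sales = 2000,
  household = c(12345, 3598, 456, 45698, 4875),
  brand = c("A", "B", "A", "C", "A")
)

household_share <- dataset %>%
  group_by(brand) %>%
  # assumption: `household` holds a count of households per row
  summarize(households = sum(household)) %>%
  # summarize() drops the single grouping level, so sum() spans all brands
  mutate(percent = households / sum(households))

print(household_share)
```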
