How to subset on multiple column conditions in R?

Question

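All,

My dataset is shown below. I am trying to answer the following question.

Question:

Based only on the drawing paper data, does any store sell more units (the units.sold column) of one paper subtype (paper.type) than the other stores?

To answer the question above, I used the tapply function, which gives me the totals for both kinds of paper. Now I am not sure how to go further and get only the drawing paper data. Thanks for your help!

My code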

tapply(df$units.sold,list(df$paper,df$paper.type,df$store),sum)

Dataset

            date year     rep     store      paper paper.type unit.price units.sold total.sale
9991  12/30/2015 2015     Ran    Dublin watercolor      sheet       0.77          5       3.85
9992  12/30/2015 2015     Ran    Dublin    drawing       pads      10.26          1      10.26
9993  12/30/2015 2015  Arijit  Syracuse watercolor        pad      12.15          2      24.30
9994  12/30/2015 2015  Thomas Davenport    drawing       roll      20.99          1      20.99
9995  12/31/2015 2015   Ruisi    Dublin watercolor      sheet       0.77          7       5.39
9996  12/31/2015 2015   Mohit Davenport    drawing       roll      20.99          1      20.99
9997  12/31/2015 2015    Aman  Portland    drawing       pads      10.26          1      10.26
9998  12/31/2015 2015 Barakat  Portland watercolor      block      19.34          1      19.34
9999  12/31/2015 2015  Yunzhu  Syracuse    drawing    journal      24.94          1      24.94
10000 12/31/2015 2015    Aman  Portland watercolor      block      19.34          1      19.34

Note: I am new to R. Please provide an explanation along with your code.

Answer 1

You can aggregate the units.sold column by store and paper.type:

aggregate(units.sold~store+paper.type, df[df$paper == "drawing", ], sum)

#      store paper.type units.sold
#1  Syracuse    journal          1
#2    Dublin       pads          1
#3  Portland       pads          1
#4 Davenport       roll          2

Here we keep only the rows where paper is "drawing" before aggregating. From this output we can compare units.sold for each combination of store and paper.type.
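Since the question started from tapply, the same row filter can be applied there too. The following is only a minimal sketch: it rebuilds just the four columns it needs from the sample data, and the names drawing and totals are made up for illustration.

```r
# Rebuild only the columns needed, using the values from the sample data
df <- data.frame(
  store = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
            "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
            "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Keep only the drawing rows first (hypothetical helper name)
drawing <- df[df$paper == "drawing", ]

# Sum units.sold for every store (rows) by paper.type (columns)
totals <- tapply(drawing$units.sold,
                 list(drawing$store, drawing$paper.type),
                 sum)
totals
```

This returns a store by paper.type matrix, with NA in cells where a store sold none of that subtype, so each column can be read down to compare stores.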

Answer 2

Start with dplyr from the tidyverse and its filter function. You can chain functions together with the %>% pipe operator.

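library(dplyr)

df2 <- df %>%
  filter(paper == "drawing") %>%            # keep only the drawing rows
  group_by(store, paper.type) %>%           # one group per store and subtype
  summarise(units.sold = sum(units.sold))   # total units per group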

  store     paper.type units.sold
  <chr>     <chr>           <dbl>
1 Davenport roll                2
2 Dublin    pads                1
3 Portland  pads                1
4 Syracuse  journal             1
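To see directly which store sold the most of a subtype, the summary can also be sorted. This is just a sketch under a couple of assumptions: it uses a cut-down copy of the sample data, the name ranked is made up for illustration, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Cut-down copy of the sample data with only the columns used here
df <- data.frame(
  store = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
            "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
            "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

ranked <- df %>%
  filter(paper == "drawing") %>%                   # drawing rows only
  group_by(store, paper.type) %>%
  summarise(units.sold = sum(units.sold),
            .groups = "drop") %>%                  # return an ungrouped tibble
  arrange(desc(units.sold))                        # largest totals first

ranked
```

With the ten sample rows, the Davenport/roll combination comes out on top with 2 units; the other three combinations tie at 1.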

Answer 3

We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT, put the expression paper == 'drawing' in i to subset the rows, group by 'store' and 'paper.type', and summarise 'units.sold' by taking its sum.

library(data.table)
setDT(df)[paper == "drawing", .(units.sold = sum(units.sold)), .(store, paper.type)]
#       store paper.type units.sold
#1:    Dublin       pads          1
#2: Davenport       roll          2
#3:  Portland       pads          1
#4:  Syracuse    journal          1

Data

df <-  structure(list(date = c("12/30/2015", "12/30/2015", "12/30/2015", 
"12/30/2015", "12/31/2015", "12/31/2015", "12/31/2015", "12/31/2015", 
"12/31/2015", "12/31/2015"), year = c(2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L), rep = c("Ran", "Ran", 
"Arijit", "Thomas", "Ruisi", "Mohit", "Aman", "Barakat", "Yunzhu", 
"Aman"), store = c("Dublin", "Dublin", "Syracuse", "Davenport", 
"Dublin", "Davenport", "Portland", "Portland", "Syracuse", "Portland"
), paper = c("watercolor", "drawing", "watercolor", "drawing", 
"watercolor", "drawing", "drawing", "watercolor", "drawing", 
"watercolor"), paper.type = c("sheet", "pads", "pad", "roll", 
"sheet", "roll", "pads", "block", "journal", "block"), unit.price = c(0.77, 
10.26, 12.15, 20.99, 0.77, 20.99, 10.26, 19.34, 24.94, 19.34), 
    units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L), total.sale = c(3.85, 
    10.26, 24.3, 20.99, 5.39, 20.99, 10.26, 19.34, 24.94, 19.34
    )), class = "data.frame", row.names = c("9991", "9992", "9993", 
"9994", "9995", "9996", "9997", "9998", "9999", "10000"))
