How to subset on multiple column conditions in R?
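All,
My dataset looks like the one below. I am trying to answer the following question.
Question:
Based on the drawing paper data only, does a store sell more units (units.sold column) of one paper subtype (paper.type) than of the others?
To answer this, I used the tapply function, which lets me summarise the data for both kinds of paper. Now I am not sure how to go further and get the data for drawing paper only. Any help is appreciated!
My code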
tapply(df$units.sold,list(df$paper,df$paper.type,df$store),sum)
Dataset
date year rep store paper paper.type unit.price units.sold total.sale
9991 12/30/2015 2015 Ran Dublin watercolor sheet 0.77 5 3.85
9992 12/30/2015 2015 Ran Dublin drawing pads 10.26 1 10.26
9993 12/30/2015 2015 Arijit Syracuse watercolor pad 12.15 2 24.30
9994 12/30/2015 2015 Thomas Davenport drawing roll 20.99 1 20.99
9995 12/31/2015 2015 Ruisi Dublin watercolor sheet 0.77 7 5.39
9996 12/31/2015 2015 Mohit Davenport drawing roll 20.99 1 20.99
9997 12/31/2015 2015 Aman Portland drawing pads 10.26 1 10.26
9998 12/31/2015 2015 Barakat Portland watercolor block 19.34 1 19.34
9999 12/31/2015 2015 Yunzhu Syracuse drawing journal 24.94 1 24.94
10000 12/31/2015 2015 Aman Portland watercolor block 19.34 1 19.34
Note: I am new to R. Please provide an explanation along with your code.
You can aggregate the units.sold column by store and paper.type:
aggregate(units.sold~store+paper.type, df[df$paper == "drawing", ], sum)
# store paper.type units.sold
#1 Syracuse journal 1
#2 Dublin pads 1
#3 Portland pads 1
#4 Davenport roll 2
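Here we keep only the rows where paper is "drawing". From this output we can compare units.sold for each combination of store and paper.type. In this sample, Davenport sold 2 units of roll, while each of the other stores sold 1 unit of a single subtype.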
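The tapply call from the question can also be taken one step further. The following is a minimal sketch, assuming a condensed copy of the sample data that keeps only the four columns needed here; the object names res and drawing_tab are made up for illustration. It indexes the three-way array by name on its first dimension, which leaves a paper.type by store matrix for drawing paper only.

```r
# Condensed copy of the sample data (only the columns used below)
df <- data.frame(
  store = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
            "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
            "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L)
)

# Same three-way summary as in the question: paper x paper.type x store
res <- tapply(df$units.sold, list(df$paper, df$paper.type, df$store), sum)

# Index the first dimension by name to keep only the drawing slice;
# cells with no sales are NA
drawing_tab <- res["drawing", , ]

# Drop the subtypes that have no drawing sales in any store
drawing_tab <- drawing_tab[rowSums(!is.na(drawing_tab)) > 0, , drop = FALSE]
drawing_tab
```

Reading across a row compares stores for one subtype; reading down a column compares subtypes within one store.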
Start with dplyr from the tidyverse and its filter function. You can chain the functions together with the %>% pipe operator.
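library(dplyr)

df2 <- df %>%
  filter(paper == "drawing") %>%                # keep drawing paper rows only
  group_by(store, paper.type) %>%               # one group per store and subtype
  summarise(units.sold = sum(units.sold))       # total units per group
df2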
store paper.type units.sold
<chr> <chr> <dbl>
1 Davenport roll 2
2 Dublin pads 1
3 Portland pads 1
4 Syracuse journal 1
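To make the comparison easier to read, the summary can be sorted so the largest totals come first. This is a minimal sketch, assuming dplyr 1.0.0 or later (for the `.groups` argument) and a condensed copy of the sample data; the name drawing_sales is made up for illustration.

```r
library(dplyr)

# Condensed copy of the sample data (only the columns used below)
df <- data.frame(
  store = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
            "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
            "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L)
)

drawing_sales <- df %>%
  filter(paper == "drawing") %>%
  group_by(store, paper.type) %>%
  # .groups = "drop" returns an ungrouped result, so arrange() sorts all rows
  summarise(units.sold = sum(units.sold), .groups = "drop") %>%
  arrange(desc(units.sold))
drawing_sales
```

Dropping the grouping is a design choice: it keeps later steps such as arrange() from operating on leftover groups.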
We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT, specify the i expression (paper == 'drawing') to subset the rows, group by 'store' and 'paper.type', and summarise 'units.sold' by taking the sum.
library(data.table)
setDT(df)[paper == "drawing", .(units.sold = sum(units.sold)), .(store, paper.type)]
# store paper.type units.sold
#1: Dublin pads 1
#2: Davenport roll 2
#3: Portland pads 1
#4: Syracuse journal 1
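Note that setDT converts df in place (by reference). If the original data.frame should stay untouched, as.data.table makes a copy instead. The following is a minimal sketch under that assumption, using a condensed copy of the sample data; the names dt and drawing_dt are made up for illustration.

```r
library(data.table)

# Condensed copy of the sample data (only the columns used below)
df <- data.frame(
  store = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
            "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
            "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L)
)

# as.data.table() copies, so df itself remains a plain data.frame
dt <- as.data.table(df)

# i: filter rows, j: summarise, by: grouping; then sort by descending total
drawing_dt <- dt[paper == "drawing",
                 .(units.sold = sum(units.sold)),
                 by = .(store, paper.type)][order(-units.sold)]
drawing_dt
```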
Data
df <- structure(list(date = c("12/30/2015", "12/30/2015", "12/30/2015",
"12/30/2015", "12/31/2015", "12/31/2015", "12/31/2015", "12/31/2015",
"12/31/2015", "12/31/2015"), year = c(2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L), rep = c("Ran", "Ran",
"Arijit", "Thomas", "Ruisi", "Mohit", "Aman", "Barakat", "Yunzhu",
"Aman"), store = c("Dublin", "Dublin", "Syracuse", "Davenport",
"Dublin", "Davenport", "Portland", "Portland", "Syracuse", "Portland"
), paper = c("watercolor", "drawing", "watercolor", "drawing",
"watercolor", "drawing", "drawing", "watercolor", "drawing",
"watercolor"), paper.type = c("sheet", "pads", "pad", "roll",
"sheet", "roll", "pads", "block", "journal", "block"), unit.price = c(0.77,
10.26, 12.15, 20.99, 0.77, 20.99, 10.26, 19.34, 24.94, 19.34),
units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L), total.sale = c(3.85,
10.26, 24.3, 20.99, 5.39, 20.99, 10.26, 19.34, 24.94, 19.34
)), class = "data.frame", row.names = c("9991", "9992", "9993",
"9994", "9995", "9996", "9997", "9998", "9999", "10000"))