Modify breaks in the cut2 function in the Hmisc package
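This is a follow-up to an earlier question.
The answer provided there uses Hmisc::cut2, which works great! I would like to change the breaks so that the bins are $0.50 wide instead of $1 wide.
Below is the code from that answer: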
library(Hmisc)
library(dplyr)
df$cut_Price <- cut2(df$Price, cuts = 4:13)
df %>% group_by(cut_Price, Size, Type) %>%
  summarise_at(c("Opps", "NumberofSales", "Revenue"), "sum") %>%
  arrange(Size, cut_Price) %>% ungroup() %>%
  mutate(cut_Price = gsub("(.*, \\d\\.)00", "\\199", cut_Price))
# A tibble: 16 × 6
cut_Price Size Type Opps NumberofSales Revenue
<chr> <fctr> <fctr> <dbl> <dbl> <dbl>
1 [ 5.00, 6.99) LARGE desktop 477870 342455 2037.67
2 [ 6.00, 7.99) LARGE desktop 842882 523309 3292.29
3 [ 7.00, 8.99) LARGE desktop 283107 149878 1189.56
4 [10.00,11.00) LARGE desktop 5506835 1179544 12674.17
5 [11.00,12.00) LARGE desktop 3542187 1521347 17342.81
6 [ 3.63, 4.99) MEDIUM desktop 6038044 5129937 18617.94
7 [ 5.00, 6.99) MEDIUM desktop 2558997 478423 2548.95
8 [ 7.00, 8.99) MEDIUM desktop 1071631 352294 2483.10
9 [ 9.00,10.00) MEDIUM desktop 2510873 861183 8428.70
10 [10.00,11.00) MEDIUM desktop 441354 215643 2322.70
11 [11.00,12.00) MEDIUM desktop 5144351 1954720 22138.16
12 [ 3.63, 4.99) SMALL desktop 801038 587541 2145.76
13 [ 4.00, 5.99) SMALL desktop 939806 303515 1214.60
14 [ 5.00, 6.99) SMALL desktop 8303927 2143565 11902.14
15 [10.00,11.00) SMALL desktop 920975 321515 3284.54
16 [11.00,12.00) SMALL desktop 181471 236643 2811.50
Any help would be greatly appreciated, thanks!
You need to pass cut2 a vector of the breaks you want, which you can create with seq:
library(tidyverse)
df %>% group_by(Size,
                cut_Price = Hmisc::cut2(Price, cuts = seq(4, 13, .5)),
                Type) %>%
  summarise_at(c("Opps", "NumberofSales", "Revenue"), sum)
## Source: local data frame [18 x 6]
## Groups: Size, cut_Price [?]
##
## Size cut_Price Type Opps NumberofSales Revenue
## <fctr> <fctr> <fctr> <dbl> <dbl> <dbl>
## 1 LARGE [ 5.50, 6.00) desktop 477870 342455 2037.67
## 2 LARGE [ 6.00, 6.50) desktop 842882 523309 3292.29
## 3 LARGE [ 7.50, 8.00) desktop 283107 149878 1189.56
## 4 LARGE [10.00,10.50) desktop 928563 209218 2138.41
## 5 LARGE [10.50,11.00) desktop 4578272 970326 10535.76
## 6 LARGE [11.00,11.50) desktop 3542187 1521347 17342.81
## 7 MEDIUM [ 3.63, 4.00) desktop 6038044 5129937 18617.94
## 8 MEDIUM [ 5.00, 5.50) desktop 2558997 478423 2548.95
## 9 MEDIUM [ 7.00, 7.50) desktop 1071631 352294 2483.10
## 10 MEDIUM [ 9.50,10.00) desktop 2510873 861183 8428.70
## 11 MEDIUM [10.50,11.00) desktop 441354 215643 2322.70
## 12 MEDIUM [11.00,11.50) desktop 5144351 1954720 22138.16
## 13 SMALL [ 3.63, 4.00) desktop 801038 587541 2145.76
## 14 SMALL [ 4.00, 4.50) desktop 939806 303515 1214.60
## 15 SMALL [ 5.00, 5.50) desktop 849537 340580 1837.93
## 16 SMALL [ 5.50, 6.00) desktop 7454390 1802985 10064.21
## 17 SMALL [10.00,10.50) desktop 920975 321515 3284.54
## 18 SMALL [11.50,12.00) desktop 181471 236643 2811.50
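As a quick check of what seq hands to the cuts argument, here is a minimal sketch on made-up prices (the prices vector is hypothetical, not taken from the question's data). It uses base R's cut with right = FALSE as a stand-in for cut2, so the labels will look different from the cut2 output above, but the left-closed $0.50 bins follow the same idea:

```r
# Whole-dollar breaks from the original answer vs. half-dollar breaks
dollar_breaks <- 4:13
half_breaks <- seq(4, 13, by = 0.5)

length(dollar_breaks)  # 10 break points
length(half_breaks)    # 19 break points

# Hypothetical prices, for illustration only
prices <- c(4.25, 5.5, 10.75)

# Base R stand-in for cut2: left-closed, right-open intervals
bins <- cut(prices, breaks = half_breaks, right = FALSE)
as.integer(bins)  # index of the bin each price falls into
```

Any increasing numeric vector should work the same way, so seq(4, 13, by = 0.25) would give quarter-dollar bins.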
If you want a row for every value, you can use tidyr::complete. Empty values will be NA unless you specify otherwise in the fill argument of complete.
df %>% group_by(Size,
                cut_Price = Hmisc::cut2(Price, cuts = seq(4, 13, .5), oneval = FALSE),
                Type) %>%
  summarise_at(c("Opps", "NumberofSales", "Revenue"), sum) %>%
  ungroup() %>%
  complete(Size, cut_Price, Type)
## # A tibble: 57 × 6
## Size cut_Price Type Opps NumberofSales Revenue
## <fctr> <fctr> <fctr> <dbl> <dbl> <dbl>
## 1 LARGE [ 3.63, 4.00) desktop NA NA NA
## 2 LARGE [ 4.00, 4.50) desktop NA NA NA
## 3 LARGE [ 4.50, 5.00) desktop NA NA NA
## 4 LARGE [ 5.00, 5.50) desktop NA NA NA
## 5 LARGE [ 5.50, 6.00) desktop 477870 342455 2037.67
## 6 LARGE [ 6.00, 6.50) desktop 842882 523309 3292.29
## 7 LARGE [ 6.50, 7.00) desktop NA NA NA
## 8 LARGE [ 7.00, 7.50) desktop NA NA NA
## 9 LARGE [ 7.50, 8.00) desktop 283107 149878 1189.56
## 10 LARGE [ 8.00, 8.50) desktop NA NA NA
## # ... with 47 more rows
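If you would rather see zeros than NA in the empty bins, the fill argument takes a named list with one replacement value per column. A minimal sketch on a toy table (the toy data frame and its bin column are made up for illustration, not the question's data):

```r
library(tidyr)

# Hypothetical summary table with two missing Size/bin combinations
toy <- data.frame(
  Size = factor(c("LARGE", "SMALL")),
  bin  = factor(c("[4.0,4.5)", "[4.5,5.0)")),
  Opps = c(10, 20)
)

# fill takes a named list: one replacement value per column
filled <- complete(toy, Size, bin, fill = list(Opps = 0))
nrow(filled)  # 4 rows: every Size paired with every bin
```

For the pipeline above, that would presumably mean something like fill = list(Opps = 0, NumberofSales = 0, Revenue = 0), though whether zero is the right placeholder depends on how the table is used afterwards.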