Summing Contingency Tables in R with first column as character
My sales dataset has 3 columns: country, sales type/method, and total quarterly revenue. Here are the first few rows to give a better idea:
Retailer.country Order.method.type Qtr.Rev
<fctr> <fctr> <dbl>
1 Australia E-mail 171407.28
2 Australia Sales visit 2013909.18
3 Australia Special 158795.34
4 Australia Telephone 2289201.87
5 Australia Web 1738303.59
6 Austria Sales visit 66926.18
7 Austria Telephone 1671887.40
8 Austria Web 7050164.50
9 Belgium Sales visit 1655507.05
10 Belgium Web 6222440.26
etc.........
Here is the input for this data:
structure(list(Retailer.country = structure(c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 3L, 3L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L,
7L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L,
11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 14L,
14L, 14L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 17L, 17L, 17L,
17L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L, 20L,
21L, 21L, 21L, 21L, 21L, 21L), .Label = c("Australia", "Austria",
"Belgium", "Brazil", "Canada", "China", "Denmark", "Finland",
"France", "Germany", "Italy", "Japan", "Korea", "Mexico",
"Netherlands",
"Singapore", "Spain", "Sweden", "Switzerland", "United Kingdom",
"United States"), class = "factor"), Order.method.type =
structure(c(1L,
4L, 5L, 6L, 7L, 4L, 6L, 7L, 4L, 7L, 7L, 1L, 2L, 4L, 7L, 2L, 4L,
6L, 7L, 4L, 7L, 4L, 7L, 2L, 4L, 6L, 7L, 1L, 3L, 4L, 7L, 1L, 2L,
4L, 5L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 4L, 6L, 7L, 4L, 5L, 7L,
2L, 3L, 6L, 7L, 2L, 5L, 6L, 7L, 2L, 3L, 6L, 7L, 1L, 7L, 2L, 4L,
5L, 6L, 7L, 1L, 2L, 4L, 6L, 7L, 2L, 3L, 4L, 5L, 6L, 7L), .Label =
c("E-mail",
"Fax", "Mail", "Sales visit", "Special", "Telephone", "Web"), class =
"factor"),
Qtr.Rev = c(171407.28, 2013909.18, 158795.34, 2289201.87,
1738303.59, 66926.18, 1671887.4, 7050164.5, 1655507.05,
6222440.26,
7746789.52, 6864270.12, 195549.5, 450628.79, 12376528.53,
415128.31, 1453194.14, 2735416.3, 15777880.11, 413978.16,
3776833.13, 308638.6, 12328172.97, 709194.65, 1304167.86,
5897377.14, 11048160.97, 1546079.43, 1247170.05, 2373591.15,
12102240.99, 2461322.51, 165800.42, 1397604.56, 198705.05,
7413833.64, 2662351.94, 289704.5, 680467.87, 87186.72, 343708.86,
1802166.73, 16990817.52, 2821127.32, 431860.34, 10144353.75,
5063353.42, 1725508.54, 3571760.87, 593828.88, 1074860.66,
2981026.86, 5254137.56, 469627.61, 908725.05, 1625096.56,
9677070.09, 88788.41, 337710.73, 254360.21, 7835117.44,
1292812.39,
4818848.86, 217936.39, 792168.42, 790344.28, 109161.04,
4565896.64,
697619.35, 264500.2, 189218.02, 2022968.96, 13756025.4,
1357389.56,
2352483.29, 2842600.85, 685752.21, 13437403.28, 29573813.7
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-79L), .Names = c("Retailer.country", "Order.method.type", "Qtr.Rev"
))
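I am creating a contingency table in R that shows the quarterly revenue generated by each sales method for each country. The final output should look similar to this: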
Retailer.country E-mail Fax Mail Sales visit Special Telephone Web TOTAL.cn
1 Australia 171407.3 0.00 0.0 2013909.18 158795.3 2289201.9 1738304 6371617
2 Austria 0.0 0.00 0.0 66926.18 0.0 1671887.4 7050164 8788978
3 Belgium 0.0 0.00 0.0 1655507.05 0.0 0.0 6222440 7877947
4 Brazil 0.0 0.00 0.0 0.00 0.0 0.0 7746790 7746790
5 Canada 6864270.1 195549.50 0.0 450628.79 0.0 0.0 12376529 19886977
6 China 0.0 415128.31 0.0 1453194.14 0.0 2735416.3 15777880 20381619
7 Denmark 0.0 0.00 0.0 413978.16
...
20 United Kingdom 697619.3 264500.20 0.0 189218.02 0.0 2022969.0 13756025 16930332
21 United States 0.0 1357389.56 2352483.3 2842600.85 685752.2 13437403.3 29573814 50249443
22 TOTAL.type 15695863.0 4767448.43 5692692.6 23233800.42 4811539.3 35257926.7 203769190 293228461
The cast() function from the reshape library does most of the work, leaving only the summary column and summary row of totals to be calculated.
cast(sales.by.country, Retailer.country ~ Order.method.type,
fill=0) -> sales.by.country
Summing the rows into a new column named "TOTAL.cn" is simple:
sales.by.country$TOTAL.cn <- rowSums(sales.by.country[,c(2:8)])
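But summing the columns became a real headache, because the first element of the last row has to be a factor or a character. I converted the first column, "Retailer.country", to character type, since it is really just a visual label.
After fiddling with several functions, this is the best code I was able to come up with to produce the desired row of totals: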
# Sum the numeric columns, which is everything *except* column 1
total.by.ordertype <- (colSums(sales.by.country[,-1]))
# Create the Total by Order row
total.by.ordertype.row <- list("TOTAL.type", total.by.ordertype[1],
total.by.ordertype[2], total.by.ordertype[3], total.by.ordertype[4],
total.by.ordertype[5], total.by.ordertype[6], total.by.ordertype[7],
total.by.ordertype[8])
# Add the Total by Order row to the bottom of the table
sales.by.country[22, ] <- total.by.ordertype.row
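It works and keeps the correct data type in every column... but I think there must be a more efficient way, perhaps using the apply family of functions, something from dplyr, etc. Or maybe the only way is to write a function myself?
For example, a future dataset might have 50+ different sales methods. When creating the list for the "Total by Order" row (above), I have to call out every element of the vector, separated by commas, to successfully add it to my existing table. My other attempts converted all the other columns to character, which messed everything up.
I don't mind copying/pasting "total.by.ordertype" 8 times. But what happens when I am dealing with 50-100 order types? Is there a more concise way to reproduce this?
Thanks!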
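A solution using functions from dplyr and tidyr. dt4 is the final output. Note the use of summarise_if. It is useful when we only want to apply a function to the columns that meet a given condition. In this case, we apply the sum function only to the numeric columns.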
# Create example data frame
library(dplyr)
library(tidyr)
# sales.by.country is created by OP's dput dataset
dt2 <- sales.by.country %>%
mutate(Retailer.country = as.character(Retailer.country)) %>%
# Spread the data frame
spread(Order.method.type, Qtr.Rev, fill = 0) %>%
# Calculate TOTAL.cn by rowSums
mutate(TOTAL.cn = rowSums(.[, 2:ncol(.)]))
# Calculate the sum of each column if it is numeric
dt3 <- dt2 %>% summarise_if(is.numeric, sum)
# Combine dt3 (the summary) to dt2
dt4 <- dt2 %>%
bind_rows(dt3) %>%
# Replace the na in Retailer.country to be "TOTAL.type"
replace_na(list(Retailer.country = "TOTAL.type"))
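In more recent releases of tidyr and dplyr, spread() and summarise_if() are superseded by pivot_wider() and across(). The following is only a sketch of the same idea with those functions, run on a small made-up data frame (the object names sales, wide, totals and result are illustrative and not part of the answer above):

```r
library(dplyr)
library(tidyr)

# Made-up long-format data standing in for the question's data set
sales <- tibble(
  Retailer.country  = c("Australia", "Australia", "Austria"),
  Order.method.type = c("E-mail", "Web", "Web"),
  Qtr.Rev           = c(100, 200, 50)
)

# Reshape to wide format and add the per-country total
wide <- sales %>%
  pivot_wider(names_from = Order.method.type,
              values_from = Qtr.Rev,
              values_fill = 0) %>%
  mutate(TOTAL.cn = rowSums(across(where(is.numeric))))

# Sum every numeric column, then label the resulting one-row summary
totals <- wide %>%
  summarise(across(where(is.numeric), sum)) %>%
  mutate(Retailer.country = "TOTAL.type")

# bind_rows() matches columns by name, so the column order of totals does not matter
result <- bind_rows(wide, totals)
```

Because the numeric columns are selected by type rather than by position, the same code should work unchanged with 50 or more order types.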
The cast() function from the reshape library can do the whole job. With the argument margins = TRUE, the totals for all rows and columns are computed:
reshape::cast(sales.by.country, Retailer.country ~ Order.method.type, fun.aggregate = sum,
fill = 0, margins = TRUE)
Retailer.country E-mail Fax Mail Sales visit Special Telephone Web (all)
1 Australia 171407.3 0.00 0.0 2013909.18 158795.3 2289201.9 1738304 6371617
2 Austria 0.0 0.00 0.0 66926.18 0.0 1671887.4 7050164 8788978
3 Belgium 0.0 0.00 0.0 1655507.05 0.0 0.0 6222440 7877947
4 Brazil 0.0 0.00 0.0 0.00 0.0 0.0 7746790 7746790
5 Canada 6864270.1 195549.50 0.0 450628.79 0.0 0.0 12376529 19886977
6 China 0.0 415128.31 0.0 1453194.14 0.0 2735416.3 15777880 20381619
7 Denmark 0.0 0.00 0.0 413978.16 0.0 0.0 3776833 4190811
8 Finland 0.0 0.00 0.0 308638.60 0.0 0.0 12328173 12636812
9 France 0.0 709194.65 0.0 1304167.86 0.0 5897377.1 11048161 18958901
10 Germany 1546079.4 0.00 1247170.1 2373591.15 0.0 0.0 12102241 17269082
11 Italy 2461322.5 165800.42 0.0 1397604.56 198705.0 0.0 7413834 11637266
12 Japan 2662351.9 289704.50 680467.9 87186.72 343708.9 1802166.7 16990818 22856404
13 Korea 0.0 0.00 0.0 2821127.32 0.0 431860.3 10144354 13397341
14 Mexico 0.0 0.00 0.0 5063353.42 1725508.5 0.0 3571761 10360623
15 Netherlands 0.0 593828.88 1074860.7 0.00 0.0 2981026.9 5254138 9903854
16 Singapore 0.0 469627.61 0.0 0.00 908725.1 1625096.6 9677070 12680519
17 Spain 0.0 88788.41 337710.7 0.00 0.0 254360.2 7835117 8515977
18 Sweden 1292812.4 0.00 0.0 0.00 0.0 0.0 4818849 6111661
19 Switzerland 0.0 217936.39 0.0 792168.42 790344.3 109161.0 4565897 6475507
20 United Kingdom 697619.3 264500.20 0.0 189218.02 0.0 2022969.0 13756025 16930332
21 United States 0.0 1357389.56 2352483.3 2842600.85 685752.2 13437403.3 29573814 50249443
22 (all) 15695863.0 4767448.43 5692692.6 23233800.42 4811539.3 35257926.7 203769190 293228461
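Of course, fun.aggregate also has to be specified.
The reshape2 package (the successor to reshape) offers the same functionality, but is roughly 4 times faster for this small sample size.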
reshape2::dcast(sales.by.country, Retailer.country ~ Order.method.type, fun.aggregate = sum,
fill = 0, margins = TRUE)
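The margins produced this way are labelled "(all)" rather than "TOTAL.cn" and "TOTAL.type" as in the question. A minimal sketch of relabelling them afterwards, on made-up data (the names sales and wide are illustrative), could look like this:

```r
library(reshape2)

# Made-up long-format data standing in for the question's data set
sales <- data.frame(
  Retailer.country  = c("Australia", "Australia", "Austria"),
  Order.method.type = c("E-mail", "Web", "Web"),
  Qtr.Rev           = c(100, 200, 50),
  stringsAsFactors  = TRUE
)

wide <- reshape2::dcast(sales, Retailer.country ~ Order.method.type,
                        value.var = "Qtr.Rev", fun.aggregate = sum,
                        fill = 0, margins = TRUE)

# Rename the margin column
names(wide)[names(wide) == "(all)"] <- "TOTAL.cn"

# Rename the margin row; converting to character first avoids factor-level issues
wide$Retailer.country <- as.character(wide$Retailer.country)
wide$Retailer.country[wide$Retailer.country == "(all)"] <- "TOTAL.type"
```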
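dcast() is also available from the data.table package, which claims to be faster than reshape2::dcast(). Unfortunately, the margins parameter is not yet implemented (current CRAN version 1.10.4). So the margins have to be computed separately and combined with the original data: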
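library(data.table)
# Assumption: DT is the long-format data from the question's dput, as a data.table
DT <- as.data.table(sales.by.country)
DT2 <- rbind(
  DT,
  DT[, .(Qtr.Rev = sum(Qtr.Rev)), by = Retailer.country],
  DT[, .(Qtr.Rev = sum(Qtr.Rev)), by = Order.method.type],
  DT[, .(Qtr.Rev = sum(Qtr.Rev))],
  fill = TRUE
)
dcast(DT2, Retailer.country ~ Order.method.type, fill = 0)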
Retailer.country E-mail Fax Mail Sales visit Special Telephone Web NA
1: Australia 171407.3 0.00 0.0 2013909.18 158795.3 2289201.9 1738304 6371617
2: Austria 0.0 0.00 0.0 66926.18 0.0 1671887.4 7050164 8788978
3: Belgium 0.0 0.00 0.0 1655507.05 0.0 0.0 6222440 7877947
4: Brazil 0.0 0.00 0.0 0.00 0.0 0.0 7746790 7746790
5: Canada 6864270.1 195549.50 0.0 450628.79 0.0 0.0 12376529 19886977
6: China 0.0 415128.31 0.0 1453194.14 0.0 2735416.3 15777880 20381619
7: Denmark 0.0 0.00 0.0 413978.16 0.0 0.0 3776833 4190811
8: Finland 0.0 0.00 0.0 308638.60 0.0 0.0 12328173 12636812
9: France 0.0 709194.65 0.0 1304167.86 0.0 5897377.1 11048161 18958901
10: Germany 1546079.4 0.00 1247170.1 2373591.15 0.0 0.0 12102241 17269082
11: Italy 2461322.5 165800.42 0.0 1397604.56 198705.0 0.0 7413834 11637266
12: Japan 2662351.9 289704.50 680467.9 87186.72 343708.9 1802166.7 16990818 22856404
13: Korea 0.0 0.00 0.0 2821127.32 0.0 431860.3 10144354 13397341
14: Mexico 0.0 0.00 0.0 5063353.42 1725508.5 0.0 3571761 10360623
15: Netherlands 0.0 593828.88 1074860.7 0.00 0.0 2981026.9 5254138 9903854
16: Singapore 0.0 469627.61 0.0 0.00 908725.1 1625096.6 9677070 12680519
17: Spain 0.0 88788.41 337710.7 0.00 0.0 254360.2 7835117 8515977
18: Sweden 1292812.4 0.00 0.0 0.00 0.0 0.0 4818849 6111661
19: Switzerland 0.0 217936.39 0.0 792168.42 790344.3 109161.0 4565897 6475507
20: United Kingdom 697619.3 264500.20 0.0 189218.02 0.0 2022969.0 13756025 16930332
21: United States 0.0 1357389.56 2352483.3 2842600.85 685752.2 13437403.3 29573814 50249443
22: NA 15695863.0 4767448.43 5692692.6 23233800.42 4811539.3 35257926.7 203769190 293228461
Retailer.country E-mail Fax Mail Sales visit Special Telephone Web NA
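In the output above the margins are labelled NA. One possible way to get named margins instead is to give each summary its label before binding. This is only a sketch on made-up data (the labels and object names are chosen to match the question, not taken from this answer):

```r
library(data.table)

# Made-up long-format data standing in for the question's data set
DT <- data.table(
  Retailer.country  = c("Australia", "Australia", "Austria"),
  Order.method.type = c("E-mail", "Web", "Web"),
  Qtr.Rev           = c(100, 200, 50)
)

# Each margin carries an explicit label instead of relying on fill = TRUE
DT2 <- rbind(
  DT,
  DT[, .(Order.method.type = "TOTAL.cn", Qtr.Rev = sum(Qtr.Rev)),
     by = Retailer.country],
  DT[, .(Retailer.country = "TOTAL.type", Qtr.Rev = sum(Qtr.Rev)),
     by = Order.method.type],
  DT[, .(Retailer.country = "TOTAL.type", Order.method.type = "TOTAL.cn",
         Qtr.Rev = sum(Qtr.Rev))],
  use.names = TRUE
)

wide <- dcast(DT2, Retailer.country ~ Order.method.type,
              value.var = "Qtr.Rev", fill = 0)

# dcast() sorts rows and columns, so move the totals to the end explicitly
setcolorder(wide, c(setdiff(names(wide), "TOTAL.cn"), "TOTAL.cn"))
wide <- wide[order(Retailer.country == "TOTAL.type")]
```

The labels are plain character values here; with factor columns, as in the question's data, the columns would need to be converted to character first.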
Spread with tidyr and add the totals column and row with janitor:
library(janitor)
library(tidyr)
sales.by.country %>%
spread(Order.method.type, Qtr.Rev, fill = 0) %>%
adorn_totals(c("row", "col"))
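If adding a package is not an option, base R can produce the same kind of table with xtabs() and addmargins(). This is a sketch on made-up data, not something taken from the answers above; note that the result is a table (converted to a data frame at the end) with the countries as row names rather than as a first column:

```r
# Made-up long-format data standing in for the question's data set
sales <- data.frame(
  Retailer.country  = c("Australia", "Australia", "Austria"),
  Order.method.type = c("E-mail", "Web", "Web"),
  Qtr.Rev           = c(100, 200, 50)
)

# Cross-tabulate the revenue; missing combinations become 0
tab <- xtabs(Qtr.Rev ~ Retailer.country + Order.method.type, data = sales)

# Append a "Sum" row and a "Sum" column
with_totals <- addmargins(tab)

# Relabel the margins to match the names used in the question
rownames(with_totals)[nrow(with_totals)] <- "TOTAL.type"
colnames(with_totals)[ncol(with_totals)] <- "TOTAL.cn"

# Optional: convert to a data frame, with the countries as row names
out <- as.data.frame.matrix(with_totals)
```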