Reshaping a table with dplyr in R
I'd welcome some advice on the proper use of dplyr in R.
We have the following data:
City Amount Category
1 Los Angeles 100 Film
2 Los Angeles 200 Film
3 Los Angeles 400 Music
4 Seattle 300 Coffee
5 Boston 600 Books
...
The final result should look like this:
Film Coffee Books ...
City
Los Angeles, CA Sum Sum Sum Sum
Seattle, WA Sum Sum Sum Sum
Boston, MA Sum Sum Sum Sum
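I'd like a pivot table that sums "Amount" for each category within each city, with the cities down the left as a column and all the categories across the top as a row.
I tried: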
data %>%
  group_by(City, Category) %>%
  summarise(Amount = sum(Amount))
which gives something more like:
City Amount Category
1 Los Angeles 300 Film
3 Los Angeles 400 Music
4 Seattle 300 Coffee
5 Boston 600 Books
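The calculation is correct, but as described above, we need the cities and categories laid out as a matrix, with each cell holding the sum of the amounts for that city and category.
Thanks for your help!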
What you are looking for is tidyr::spread, which reshapes your data.frame from long format to wide format:
library(tidyverse)
# recreate the data
data <- tribble(
  ~City, ~Amount, ~Category,
  "Los Angeles", 100, "Film",
  "Los Angeles", 200, "Film",
  "Los Angeles", 400, "Music",
  "Seattle", 300, "Coffee",
  "Boston", 600, "Books"
)
# using your code to get the data in the long-format
data_long <- data %>%
  group_by(City, Category) %>%
  summarise(Amount = sum(Amount))
data_long
#> # A tibble: 4 x 3
#> # Groups:   City [?]
#>   City        Category Amount
#>   <chr>       <chr>     <dbl>
#> 1 Boston      Books       600
#> 2 Los Angeles Film        300
#> 3 Los Angeles Music       400
#> 4 Seattle     Coffee      300
# spread to wide using the tidyr-package (in tidyverse)
data_wide <- spread(data_long, key = "Category", value = "Amount", fill = 0)
data_wide
#> # A tibble: 3 x 5
#> # Groups:   City [3]
#>   City        Books Coffee  Film Music
#> * <chr>       <dbl>  <dbl> <dbl> <dbl>
#> 1 Boston        600      0     0     0
#> 2 Los Angeles     0      0   300   400
#> 3 Seattle         0    300     0     0
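As a side note, tidyr 1.0.0 and later mark spread() as superseded by pivot_wider(). The following is a minimal sketch of the same reshape with the newer function, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (for the .groups argument); the name data_wide2 is just an illustrative choice and is not part of the answer above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# recreate the example data from the question
data <- tribble(
  ~City, ~Amount, ~Category,
  "Los Angeles", 100, "Film",
  "Los Angeles", 200, "Film",
  "Los Angeles", 400, "Music",
  "Seattle", 300, "Coffee",
  "Boston", 600, "Books"
)

# sum Amount per City/Category, dropping the grouping afterwards
data_long <- data %>%
  group_by(City, Category) %>%
  summarise(Amount = sum(Amount), .groups = "drop")

# one column per Category; City/Category pairs with no rows become 0
data_wide2 <- pivot_wider(
  data_long,
  names_from = Category,
  values_from = Amount,
  values_fill = 0
)
data_wide2
```

Because the grouping is dropped in summarise(), no separate ungroup() step is needed before converting the result further.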
Going on to a matrix:
mat <- as.matrix(data_wide %>% ungroup %>% select(-City))
rownames(mat) <- data_wide$City
mat
#>             Books Coffee Film Music
#> Boston        600      0    0     0
#> Los Angeles     0      0  300   400
#> Seattle         0    300    0     0
str(mat)
#> num [1:3, 1:4] 600 0 0 0 0 300 0 300 0 0 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:3] "Boston" "Los Angeles" "Seattle"
#> ..$ : chr [1:4] "Books" "Coffee" "Film" "Music"
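If the matrix is all you need, base R can probably get there without the intermediate wide data frame. This is only a sketch of an alternative, not the approach used above: it assumes R >= 3.4.0 (for the default argument of tapply()), and the names df and mat2 are made up for illustration.

```r
# recreate the example data as a plain data.frame
df <- data.frame(
  City = c("Los Angeles", "Los Angeles", "Los Angeles", "Seattle", "Boston"),
  Amount = c(100, 200, 400, 300, 600),
  Category = c("Film", "Film", "Music", "Coffee", "Books")
)

# sum Amount for every City x Category combination;
# combinations with no rows are filled with 0 instead of NA
mat2 <- tapply(df$Amount, list(df$City, df$Category), sum, default = 0)
mat2
```

The result is a numeric matrix with the cities as row names and the categories as column names, so it can be indexed like mat2["Los Angeles", "Film"].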