How to use pivot_longer() in R to separate columns into multiple rows by category?

Here is some made-up data:

library(tibble)

# Stored as df so the code further down can refer to it
df <- tibble(
  fruit = rep(c("apple", "pear", "orange"), each = 3),
  size = rep(c("big", "medium", "small"), times = 3),
  # summer stock
  shopA_summer_wk1 = abs(round(rnorm(9, 10, 5), 0)),
  shopA_summer_wk2 = abs(round(rnorm(9, 10, 5), 0)),
  shopB_summer_wk1 = abs(round(rnorm(9, 10, 5), 0)),
  shopB_summer_wk2 = abs(round(rnorm(9, 10, 5), 0)),
  shopC_summer_wk1 = abs(round(rnorm(9, 10, 5), 0)),
  shopC_summer_wk2 = abs(round(rnorm(9, 10, 5), 0)),
  # winter stock
  shopA_winter_wk1 = abs(round(rnorm(9, 8, 4), 0)),
  shopA_winter_wk2 = abs(round(rnorm(9, 8, 4), 0)),
  shopA_winter_wk3 = abs(round(rnorm(9, 8, 4), 0)),
  shopB_winter_wk1 = abs(round(rnorm(9, 8, 4), 0)),
  shopB_winter_wk2 = abs(round(rnorm(9, 8, 4), 0)),
  shopB_winter_wk3 = abs(round(rnorm(9, 8, 4), 0)),
  shopC_winter_wk1 = abs(round(rnorm(9, 8, 4), 0)),
  shopC_winter_wk2 = abs(round(rnorm(9, 8, 4), 0)),
  shopC_winter_wk3 = abs(round(rnorm(9, 8, 4), 0))
)

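The data were collected from 3 shops (A, B, C) over 2 weeks in summer and 3 weeks in winter. Each value is the number of fruits (apple, pear, orange) of a given size (big, medium, small) that a shop had in stock in a particular week.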

Here are the first 6 rows of the dataset (only the first 11 of the 17 columns are shown):

#   fruit  size   shopA_summer_wk1 shopA_summer_wk2 shopB_summer_wk1 shopB_summer_wk2 shopC_summer_wk1 shopC_summer_wk2 shopA_winter_wk1 shopA_winter_wk2 shopA_winter_wk3
#   <chr>  <chr>             <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
# 1 apple  big                   9               12               12               16               15                5               14                4                0
# 2 apple  medium               21               16               16                1               12               11                8                8                9
# 3 apple  small                10                6               18               18               22               12                4                2                0
# 4 pear   big                  13                7                4               12               13                6               10                6                2
# 5 pear   medium               13               12                8                0                8                5               11                7                3
# 6 pear   small                16               18                4                3               13                8                7                5                0

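I would like to restructure this dataset with the pivot_longer() function in R. Because there are quite a few grouping categories encoded in the column names, I am struggling to write the code for it.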

I would like it to look something like the following:

Any input would be greatly appreciated :)

Using the names_pattern argument of pivot_longer(), we can do:

pivot_longer(df, c(-fruit, -size), names_pattern = '(^.*)_wk(.*$)',
              names_to = c('Shop_season', 'week'))
#> # A tibble: 135 x 5
#>    fruit size  Shop_season  week  value
#>    <chr> <chr> <chr>        <chr> <dbl>
#>  1 apple big   shopA_summer 1        11
#>  2 apple big   shopA_summer 2         8
#>  3 apple big   shopB_summer 1         4
#>  4 apple big   shopB_summer 2        24
#>  5 apple big   shopC_summer 1         9
#>  6 apple big   shopC_summer 2        10
#>  7 apple big   shopA_winter 1         9
#>  8 apple big   shopA_winter 2        12
#>  9 apple big   shopA_winter 3         5
#> 10 apple big   shopB_winter 1         5
#> # ... with 125 more rows
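
To see what the two capture groups in that pattern pick out, it can help to run the regular expression on a few column names outside of pivot_longer(). This is only a sketch using base R's regexec() and regmatches(); the vector cols is a hypothetical sample of the column names, and perl = TRUE is my choice here, not something pivot_longer() requires.

```r
# A few column names following the question's naming scheme (hypothetical sample)
cols <- c("shopA_summer_wk1", "shopB_summer_wk2", "shopC_winter_wk3")

# Same pattern as above: group 1 is everything before "_wk",
# group 2 is everything after it
pattern <- "(^.*)_wk(.*$)"

# regexec() finds the full match plus the capture groups;
# regmatches() extracts them as character vectors
m <- regmatches(cols, regexec(pattern, cols, perl = TRUE))

# Drop the full match and keep only the two groups, one row per column name
groups <- do.call(rbind, lapply(m, function(x) x[-1]))
colnames(groups) <- c("Shop_season", "week")
groups
```

Each row of groups corresponds to one original column name, and the two columns line up with the two entries passed to names_to.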

You might also want to separate() shop and season, since they are really two distinct variables:

pivot_longer(df, c(-fruit, -size), names_pattern = '(^.*)_wk(.*$)',
              names_to = c('Shop_season', 'week')) %>%
   separate(Shop_season, into = c('shop', 'season'))
#> # A tibble: 135 x 6
#>    fruit size  shop  season week  value
#>    <chr> <chr> <chr> <chr>  <chr> <dbl>
#>  1 apple big   shopA summer 1        11
#>  2 apple big   shopA summer 2         8
#>  3 apple big   shopB summer 1         4
#>  4 apple big   shopB summer 2        24
#>  5 apple big   shopC summer 1         9
#>  6 apple big   shopC summer 2        10
#>  7 apple big   shopA winter 1         9
#>  8 apple big   shopA winter 2        12
#>  9 apple big   shopA winter 3         5
#> 10 apple big   shopB winter 1         5
#> # ... with 125 more rows
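
The two steps above could probably also be folded into a single pivot_longer() call by giving names_pattern three capture groups. The sketch below assumes every column name has exactly the form shop_season_wkN; the small df here is a hypothetical stand-in with made-up values, and names_transform (available in tidyr 1.1.0 and later) is used to store week as an integer.

```r
library(tidyr)
library(tibble)

# Small stand-in for the question's data (hypothetical values, same naming scheme)
df <- tibble(
  fruit = c("apple", "pear"),
  size = c("big", "small"),
  shopA_summer_wk1 = c(9, 13),
  shopA_summer_wk2 = c(12, 7),
  shopB_winter_wk3 = c(0, 2)
)

long <- pivot_longer(
  df,
  cols = -c(fruit, size),
  # One output column per capture group in names_pattern
  names_to = c("shop", "season", "week"),
  names_pattern = "(.*)_(.*)_wk(.*)",
  # Convert the captured week number from character to integer
  names_transform = list(week = as.integer)
)
long
```

Whether this is preferable to the separate() version is mostly a matter of taste; the single call keeps everything in one place, while the two-step version is easier to debug if some column names do not follow the pattern.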

If the data is stored in dt, then:

pivot_longer(
  data = dt,
  cols = -c(fruit:size),
  names_to = c("shop_season", "week"),
  names_pattern = "(.*)_(.*)"
)

Output:

# A tibble: 135 x 5
   fruit size  shop_season  week  value
   <chr> <chr> <chr>        <chr> <dbl>
 1 apple big   shopA_summer wk1      13
 2 apple big   shopA_summer wk2      12
 3 apple big   shopB_summer wk1       9
 4 apple big   shopB_summer wk2       9
 5 apple big   shopC_summer wk1       7
 6 apple big   shopC_summer wk2      17
 7 apple big   shopA_winter wk1      10
 8 apple big   shopA_winter wk2      17
 9 apple big   shopA_winter wk3      12
10 apple big   shopB_winter wk1       8
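
In this result the week values still carry the "wk" prefix and shop and season share one column, because the greedy first group in "(.*)_(.*)" takes everything up to the last underscore. If separate columns and a numeric week are wanted, one possible follow-up is sketched below. The tibble long is a hypothetical stand-in for the output above, and the name tidy is mine.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for a few rows of the long result above
long <- tibble(
  fruit = c("apple", "apple", "apple"),
  size = c("big", "big", "big"),
  shop_season = c("shopA_summer", "shopA_summer", "shopA_winter"),
  week = c("wk1", "wk2", "wk3"),
  value = c(13, 12, 12)
)

# Split shop_season on the underscore into two columns
tidy <- separate(long, shop_season, into = c("shop", "season"), sep = "_")

# Strip the "wk" prefix and store the week as an integer
tidy$week <- as.integer(sub("^wk", "", tidy$week))
tidy
```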