Stacking/melting multiple columns into multiple columns in R

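I am trying to melt/stack/gather several specific columns of a data frame into 2 columns while keeping all other columns. I have tried many, many answers on Stack Overflow without success (some of them are below). My situation is basically the one in this post: Reshaping multiple sets of measurement columns (wide format) into single columns (long format), only with more columns to keep and combine. It is worth mentioning that my year columns are factors, and that I have many more columns than in the example below, so I want to refer to columns by name rather than by position.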

>df
ID Code Country     year.x   value.x  year.y value.y year.x.x value.x.x              
1  A    USA         2000     34.33422 2001 35.35241  2002   42.30042 
1  A    Spain       2000     34.71842 2001 39.82727  2002   43.22209 
3  B    USA         2000     35.98180 2001 37.70768  2002   44.40232 
3  B    Peru        2000     33.00000 2001 37.66468  2002   41.30232 
4  C    Argentina   2000     37.78005 2001 39.25627  2002   45.72927 
4  C    Peru        2000     40.52575 2001 40.55918  2002   46.62914

Following the post above, I tried pivot_longer from tidyr, which looked like a close fit, but it produced various errors depending on what I did:

pivot_longer(df, 
             cols = -c(ID, Code, Country), 
             names_to = c(".value", "group"),
             names_sep = ".")

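I also experimented with melt from reshape2 in various ways, but those attempts melted either only the value columns or only the year columns. For example: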

new.df <- reshape2:::melt(df, id.var = c("ID", "Code", "Country"), measure.vars=c("value.x", "value.y", "value.x.x", "value.y.y", "value.x.x.x", "value.y.y.y"), value.name = "value", variable.vars=c('year.x','year.y', "year.x.x", "year.y.y", "year.x.x.x", "year.y.y.y", "value.x", variable.name = "year")

I also tried gather (from tidyr) based on other posts, but I find the help pages and the posts hard to follow. To be clear, this is what I want to achieve:

ID Code Country   year value
1  A    USA       2000 34.33422
1  A    Spain     2000 34.71842
3  B    USA       2000 35.98180
3  B    Peru      2000 33.00000
4  C    Argentina 2000 37.78005
4  C    Peru      2000 40.52575
1  A    USA       2001 35.35241
1  A    Spain     2001 39.82727
3  B    USA       2001 37.70768
3  B    Peru      2001 37.66468
4  C    Argentina 2001 39.25627
4  C    Peru      2001 40.55918
1  A    USA       2002 42.30042
etc.

Any help here is much appreciated.

We can specify names_pattern:

library(tidyr)
library(dplyr)
df %>%
   pivot_longer(cols = -c(ID, Code, Country),
       # in an R string the regex escape \. must be written as "\\."
       names_to = c(".value", "group"), names_pattern = "(.*)\\.(.*)")

Or, following ?pivot_longer, use names_sep with an escaped dot. The documentation says:

names_sep - names_sep takes the same specification as separate(), and can either be a numeric vector (specifying positions to break on), or a single string (specifying a regular expression to split on).

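This means the separator is interpreted as a regular expression by default, and in a regex the . matches any character rather than a literal dot. To match a literal dot, either escape it or place it inside square brackets: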

pivot_longer(df, 
         cols = -c(ID, Code, Country), 
          names_to = c(".value", "group"),
          names_sep = "\\.")
# A tibble: 18 x 6
#      ID Code  Country   group  year value
#   <int> <chr> <chr>     <chr> <int> <dbl>
# 1     1 A     USA       x      2000  34.3
# 2     1 A     USA       y      2001  35.4
# 3     1 A     USA       z      2002  42.3
# 4     1 A     Spain     x      2000  34.7
# 5     1 A     Spain     y      2001  39.8
# 6     1 A     Spain     z      2002  43.2
# 7     3 B     USA       x      2000  36.0
# 8     3 B     USA       y      2001  37.7
# 9     3 B     USA       z      2002  44.4
#10     3 B     Peru      x      2000  33  
#11     3 B     Peru      y      2001  37.7
#12     3 B     Peru      z      2002  41.3
#13     4 C     Argentina x      2000  37.8
#14     4 C     Argentina y      2001  39.3
#15     4 C     Argentina z      2002  45.7
#16     4 C     Peru      x      2000  40.5
#17     4 C     Peru      y      2001  40.6
#18     4 C     Peru      z      2002  46.6
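
The difference between an unescaped and an escaped dot can be checked in isolation with base R. The following is a minimal sketch that uses strsplit() as a stand-in for the splitting that names_sep performs; the variable names are made up for illustration, and fixed = TRUE is an argument of strsplit(), not of pivot_longer().

```r
nm <- "year.x"

# Unescaped "." is a regex wildcard, so every character counts as a separator
wild <- strsplit(nm, ".")[[1]]

# Three ways to split on a literal dot instead
escaped   <- strsplit(nm, "\\.")[[1]]             # backslash escape
bracketed <- strsplit(nm, "[.]")[[1]]             # character class
literal   <- strsplit(nm, ".", fixed = TRUE)[[1]] # no regex at all

print(wild)     # only empty strings
print(escaped)  # "year" "x"
```

Both the escaped and the bracketed form give the two pieces that names_to = c(".value", "group") expects.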

Update

For the updated dataset (df2, where some column names contain a second dot, e.g. year.x.x):

library(stringr)
df2 %>% 
   # collapse the second dot so that e.g. year.x.x becomes year.xx
   rename_at(vars(matches("year|value")), ~ 
     str_replace(., "^([^.]+\\.[^.]+)\\.([^.]+)$", "\\1\\2")) %>% 
   pivot_longer(cols = -c(ID, Code, Country),
      names_to = c(".value", "group"), names_pattern = "(.*)\\.(.*)")
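
To see why the rename step is needed, it may help to apply the two regular expressions to the column names alone. This is a small base R sketch, with sub() standing in for str_replace(); the vector nms is a hypothetical subset of the column names in df2.

```r
nms <- c("year.x", "value.y", "year.x.x", "value.x.x")

# The greedy first group of "(.*)\\.(.*)" swallows everything up to the LAST dot,
# so for year.x.x the .value part would come out as "year.x" instead of "year"
greedy <- sub("(.*)\\.(.*)", "\\1", "year.x.x")
print(greedy)   # "year.x"

# The rename regex only matches names with exactly two dots and removes the second
renamed <- sub("^([^.]+\\.[^.]+)\\.([^.]+)$", "\\1\\2", nms)
print(renamed)  # "year.x" "value.y" "year.xx" "value.xx"
```

After the rename every name contains a single dot, so the simple pattern splits it into the measure (year or value) and the group suffix.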

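Or, without renaming, use a regex lookbehind so that only the dot immediately after year or value is treated as the separator: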

df2 %>%
   pivot_longer(cols = -c(ID, Code, Country), 
       names_to = c(".value", "group"),
       names_sep = "(?<=year|value)\\.")
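
To arrive at the layout requested in the question (no group column, rows stacked year by year), the helper column can be dropped and the rows sorted afterwards. The following is a minimal sketch, assuming tidyr (1.0.0 or later) and dplyr are installed; the two-row data frame toy is a made-up miniature of the question's data, not the original.

```r
library(tidyr)
library(dplyr)

# Hypothetical miniature of the question's data (two rows only)
toy <- data.frame(
  ID = c(1L, 3L),
  Code = c("A", "B"),
  Country = c("USA", "Peru"),
  year.x = c(2000L, 2000L),   value.x = c(34.33422, 33),
  year.y = c(2001L, 2001L),   value.y = c(35.35241, 37.66468),
  year.x.x = c(2002L, 2002L), value.x.x = c(42.30042, 41.30232)
)

long <- toy %>%
  pivot_longer(cols = -c(ID, Code, Country),
               names_to = c(".value", "group"),
               names_sep = "(?<=year|value)\\.") %>%
  select(-group) %>%  # the suffix was only needed to pair year with value
  arrange(year)       # order the rows by year, as in the desired output

long
```

If the year columns are factors, as mentioned in the question, year will probably come out as a factor too; as.integer(as.character(year)) inside mutate() is one way to convert it.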

Data

df <- structure(list(ID = c(1L, 1L, 3L, 3L, 4L, 4L), Code = c("A", 
"A", "B", "B", "C", "C"), Country = c("USA", "Spain", "USA", 
"Peru", "Argentina", "Peru"), year.x = c(2000L, 2000L, 2000L, 
2000L, 2000L, 2000L), value.x = c(34.33422, 34.71842, 35.9818, 
33, 37.78005, 40.52575), year.y = c(2001L, 2001L, 2001L, 2001L, 
2001L, 2001L), value.y = c(35.35241, 39.82727, 37.70768, 37.66468, 
39.25627, 40.55918), year.z = c(2002L, 2002L, 2002L, 2002L, 2002L, 
2002L), value.z = c(42.30042, 43.22209, 44.40232, 41.30232, 45.72927, 
46.62914)), class = "data.frame", row.names = c(NA, -6L))



df2 <- structure(list(ID = c(1L, 1L, 3L, 3L, 4L, 4L), Code = c("A", 
"A", "B", "B", "C", "C"), Country = c("USA", "Spain", "USA", 
"Peru", "Argentina", "Peru"), year.x = c(2000L, 2000L, 2000L, 
2000L, 2000L, 2000L), value.x = c(34.33422, 34.71842, 35.9818, 
33, 37.78005, 40.52575), year.y = c(2001L, 2001L, 2001L, 2001L, 
2001L, 2001L), value.y = c(35.35241, 39.82727, 37.70768, 37.66468, 
39.25627, 40.55918), year.x.x = c(2002L, 2002L, 2002L, 2002L, 
2002L, 2002L), value.x.x = c(42.30042, 43.22209, 44.40232, 41.30232, 
45.72927, 46.62914)), class = "data.frame", row.names = c(NA, 
-6L))