pivot_longer two sets of variables into two columns
I want to pivot_longer two sets of variables into two columns.
For example:
df <- data.frame(year = rep(c(2010,2012,2017), 4),
party = rep(c("A", "A", "A", "B", "B", "B"), 2),
pp1 = rep(c(3,4,5,1,2,6), 2),
pp2 = rep(c(1,2,3,4,5,6), 2),
pp3 = rep(c(6,2,3,1,5,4), 2),
l_pp1 = rep(c(1,2,6,3,4,5), 2),
l_pp2 = rep(c(4,5,6,1,2,3), 2),
l_pp3 = rep(c(1,5,4,6,2,3), 2))
Data:
year party pp1 pp2 pp3 l_pp1 l_pp2 l_pp3
1 2010 A 3 1 6 1 4 1
2 2012 A 4 2 2 2 5 5
3 2017 A 5 3 3 6 6 4
4 2010 B 1 4 1 3 1 6
5 2012 B 2 5 5 4 2 2
6 2017 B 6 6 4 5 3 3
7 2010 A 3 1 6 1 4 1
8 2012 A 4 2 2 2 5 5
9 2017 A 5 3 3 6 6 4
10 2010 B 1 4 1 3 1 6
11 2012 B 2 5 5 4 2 2
12 2017 B 6 6 4 5 3 3
What I need is:
year party area pp l_pp
1 2010 A 1 3 1
2 2012 A 1 4 2
3 2017 A 1 5 6
4 2010 B 1 1 3
5 2012 B 1 2 4
etc.
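Here pp and l_pp belong to the same area (pp1 and l_pp1 become pp and l_pp for area 1).
I would have thought something like this would work, but values_to accepts only a single column name: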
df <- df %>%
pivot_longer(!c("party", "year"), names_to = "area", values_to = c("pp", "l_pp"))
This gets me somewhat close, but it is not what I am looking for:
df <- df %>%
pivot_longer(!c("party", "year"), names_to = "area", values_to = c("pp"))
year party area pp
1 2010 A pp1 3
2 2010 A pp2 1
3 2010 A pp3 6
4 2010 A l_pp1 1
5 2010 A l_pp2 4
6 2010 A l_pp3 1
EDIT: Making use of the .value sentinel, this can be achieved with a single pivot_longer, like so:
library(tidyr)
df %>%
  pivot_longer(-c(year, party), names_to = c(".value", "area"), names_pattern = "^(.*?)(\\d+)$")
#> # A tibble: 36 × 5
#> year party area pp l_pp
#> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 2010 A 1 3 1
#> 2 2010 A 2 1 4
#> 3 2010 A 3 6 1
#> 4 2012 A 1 4 2
#> 5 2012 A 2 2 5
#> 6 2012 A 3 2 5
#> 7 2017 A 1 5 6
#> 8 2017 A 2 3 6
#> 9 2017 A 3 3 4
#> 10 2010 B 1 1 3
#> # … with 26 more rows
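To see why this pattern works, it may help to look at how the two capture groups split each column name. The following is only a base R sketch of the regex (it does not call tidyr, whose internal matching may differ in detail); the variable names cols, value_part and area_part are made up for illustration:

```r
# Column names that are being reshaped (taken from the example data)
cols <- c("pp1", "pp2", "pp3", "l_pp1", "l_pp2", "l_pp3")

# Same pattern as in the pivot_longer call:
#   group 1 = everything before the trailing digits (lazy match)
#   group 2 = the trailing digits
pattern <- "^(.*?)(\\d+)$"

# Extract each capture group; perl = TRUE is used for the lazy quantifier
value_part <- sub(pattern, "\\1", cols, perl = TRUE)  # feeds ".value"
area_part  <- sub(pattern, "\\2", cols, perl = TRUE)  # feeds "area"

data.frame(column = cols, value = value_part, area = area_part)
```

The first group becomes the name of an output column (pp or l_pp) because it is mapped to ".value", while the second group ends up as the content of the area column.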
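As a second option, the same result can be achieved with an additional pivot_wider, like so. As an intermediate step, an id column has to be added to uniquely identify the rows, because the example data contains repeated year/party combinations: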
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(!c(year, party), names_to = c("var", "area"), names_pattern = "(.*)(\\d)") %>%
group_by(year, party, area, var) %>%
mutate(id = row_number()) %>%
ungroup() %>%
pivot_wider(names_from = var, values_from = value)
#> # A tibble: 36 x 6
#> year party area id pp l_pp
#> <dbl> <chr> <chr> <int> <dbl> <dbl>
#> 1 2010 A 1 1 3 1
#> 2 2010 A 2 1 1 4
#> 3 2010 A 3 1 6 1
#> 4 2012 A 1 1 4 2
#> 5 2012 A 2 1 2 5
#> 6 2012 A 3 1 2 5
#> 7 2017 A 1 1 5 6
#> 8 2017 A 2 1 3 6
#> 9 2017 A 3 1 3 4
#> 10 2010 B 1 1 1 3
#> # … with 26 more rows
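Both outputs above return area as a character column and keep the three areas of each original row together, whereas the requested layout lists all of area 1 first. One possible way to get closer to it, sketched here on the assumption that tidyr 1.1.0 or later is installed (for the names_transform argument), is to convert area to integer during the pivot and then sort by it; the name long is just illustrative:

```r
library(tidyr)

# Example data from the question
df <- data.frame(year = rep(c(2010, 2012, 2017), 4),
                 party = rep(c("A", "A", "A", "B", "B", "B"), 2),
                 pp1 = rep(c(3, 4, 5, 1, 2, 6), 2),
                 pp2 = rep(c(1, 2, 3, 4, 5, 6), 2),
                 pp3 = rep(c(6, 2, 3, 1, 5, 4), 2),
                 l_pp1 = rep(c(1, 2, 6, 3, 4, 5), 2),
                 l_pp2 = rep(c(4, 5, 6, 1, 2, 3), 2),
                 l_pp3 = rep(c(1, 5, 4, 6, 2, 3), 2))

long <- pivot_longer(
  df,
  cols = -c(year, party),
  names_to = c(".value", "area"),
  names_pattern = "^(.*?)(\\d+)$",
  names_transform = list(area = as.integer)  # area becomes integer, not character
)

# Sort by area; ties keep their original row order
long <- long[order(long$area), ]
head(long, 5)
```

Sorting with base R's order() keeps the original row order within each area, so the first rows follow the year/party sequence of the input data.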