Long to wide format using variable names
I have a wide dataset that looks like this:
dataset <- data.frame(id = c(1, 2, 3, 4, 5),
basketball.time1 = c(2, 5, 4, 3, 3),
basketball.time2 = c(3, 4, 5, 3, 2),
basketball.time3 = c(1, 8, 4, 3, 1),
volleyball.time1 = c(2, 3, 4, 0, 1),
volleyball.time2 = c(3, 4, 3, 1, 3),
volleyball.time3 = c(1, 8, 12, 2, 3))
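What I want is a dataset in long format, with id, time, basketball, and volleyball as separate variables. I would like to create the time column, a factor with three levels (time1, time2, and time3), from the string that follows the "." at the end of the basketball and volleyball column names.
Thanks a lot!
Edit: fixed a typo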
A possible solution:
library(tidyverse)
dataset <- data.frame(id = c(1, 2, 3, 4, 5),
                      basketball.time1 = c(2, 5, 4, 3, 3),
                      basketball.time2 = c(3, 4, 5, 3, 2),
                      basketball.time3 = c(1, 8, 4, 3, 1),
                      volleyball.time1 = c(2, 3, 4, 0, 1),
                      volleyball.time2 = c(3, 4, 3, 1, 3),
                      volleyball.time3 = c(1, 8, 12, 2, 3))
dataset %>%
  # stack every column except id into name/value pairs
  pivot_longer(cols = -id) %>%
  # split "basketball.time1" into name = "basketball", time = "time1"
  separate(name, into = c("name", "time")) %>%
  # spread the sports back out, one row per id and time
  pivot_wider(id_cols = c(id, time))
#> # A tibble: 15 × 4
#>       id time  basketball volleyball
#>    <dbl> <chr>      <dbl>      <dbl>
#>  1     1 time1          2          2
#>  2     1 time2          3          3
#>  3     1 time3          1          1
#>  4     2 time1          5          3
#>  5     2 time2          4          4
#>  6     2 time3          8          8
#>  7     3 time1          4          4
#>  8     3 time2          5          3
#>  9     3 time3          4         12
#> 10     4 time1          3          0
#> 11     4 time2          3          1
#> 12     4 time3          3          2
#> 13     5 time1          3          1
#> 14     5 time2          2          3
#> 15     5 time3          1          3
Use pivot_longer, then separate the name column into sport and time columns, then pivot_wider on the sport column.
library(dplyr)
library(tidyr)
dataset %>%
pivot_longer(
-id
) %>%
separate(name, c("sport", "time")) %>%
pivot_wider(
names_from = sport
)
      id time  basketball volleyball
   <dbl> <chr>      <dbl>      <dbl>
 1     1 time1          2          2
 2     1 time2          3          3
 3     1 time3          1          1
 4     2 time1          5          3
 5     2 time2          4          4
 6     2 time3          8          8
 7     3 time1          4          4
 8     3 time2          5          3
 9     3 time3          4         12
10     4 time1          3          0
11     4 time2          3          1
12     4 time3          3          2
13     5 time1          3          1
14     5 time2          2          3
15     5 time3          1          3
We can use pivot_longer %>% pivot_wider. If we pass the appropriate arguments to pivot_longer, separate is not needed.
library(tidyr)
dataset %>%
  # backslashes must be doubled inside R string literals
  pivot_longer(cols = matches('time\\d+$'), names_to = c('sport', 'time'), names_pattern = '(.*)\\.(.*)') %>%
  pivot_wider(names_from = sport, values_from = value)
# A tibble: 15 × 4
      id time  basketball volleyball
   <dbl> <chr>      <dbl>      <dbl>
 1     1 time1          2          2
 2     1 time2          3          3
 3     1 time3          1          1
 4     2 time1          5          3
 5     2 time2          4          4
 6     2 time3          8          8
 7     3 time1          4          4
 8     3 time2          5          3
 9     3 time3          4         12
10     4 time1          3          0
11     4 time2          3          1
12     4 time3          3          2
13     5 time1          3          1
14     5 time2          2          3
15     5 time3          1          3
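The question asks for time as a factor, while the approaches above return it as a character column. Here is a minimal sketch of one way to get there in a single pivot_longer call. It assumes tidyr 1.1.0 or later, which is needed for the names_transform argument. The special ".value" entry in names_to tells pivot_longer to use the first part of each column name (basketball, volleyball) as an output column, so no pivot_wider step is needed. The object name long is only for illustration.

```r
library(tidyr)

# Data as given in the question
dataset <- data.frame(id = c(1, 2, 3, 4, 5),
                      basketball.time1 = c(2, 5, 4, 3, 3),
                      basketball.time2 = c(3, 4, 5, 3, 2),
                      basketball.time3 = c(1, 8, 4, 3, 1),
                      volleyball.time1 = c(2, 3, 4, 0, 1),
                      volleyball.time2 = c(3, 4, 3, 1, 3),
                      volleyball.time3 = c(1, 8, 12, 2, 3))

long <- pivot_longer(
  dataset,
  cols = -id,
  # the part before "." becomes a column name, the part after goes to time
  names_to = c(".value", "time"),
  # names_sep is a regular expression, so the dot has to be escaped
  names_sep = "\\.",
  # convert the new time column to a factor while reshaping
  names_transform = list(time = as.factor)
)

str(long$time)
```

If a particular level order is required, calling factor(time, levels = ...) on the result afterwards would make that order explicit instead of relying on the default alphabetical sort.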