Correctly getting data from wide to long using pivot_longer
I'm struggling to correctly reshape my data from wide to long using pivot_longer.
My data currently looks like this:
# A tibble: 1 x 11
Player completetion.rank~ completion.rank.n~ ypc.rank.2020 ypc.rank.not2020 ypc.td.2020 ypc.td.not2020 ypc.int.2020 ypc.int.not2020
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Tom Brady 1 0.375 1 0.375 1 0.312 0.25 0.375
# ... with 2 more variables: ypc.sack.2020 <dbl>, ypc.sack.not2020 <dbl>
Ultimately, I'd like the data to be organized like this:
Finally, here is a reproducible example of the data:
df <- structure(list(Player = "Tom Brady", completetion.rank.2020 = 1,
completion.rank.not2020 = 0.375, ypc.rank.2020 = 1, ypc.rank.not2020 = 0.375,
ypc.td.2020 = 1, ypc.td.not2020 = 0.3125, ypc.int.2020 = 0.25,
ypc.int.not2020 = 0.375, ypc.sack.2020 = 0, ypc.sack.not2020 = 0.625), row.names = c(NA,
-1L), groups = structure(list(Player = "Tom Brady", .rows = structure(list(
1L), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr",
"list"))), row.names = c(NA, -1L), class = c("tbl_df", "tbl",
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"))
Thanks in advance. This whole pivot_longer / pivot_wider thing trips me up every time I try to figure it out.
You can do this with a regular expression:
tidyr::pivot_longer(df,
cols = -Player,
names_to = c('name', '.value'),
names_pattern = '(.*)\\.(.*)')
# Player name `2020` not2020
# <chr> <chr> <dbl> <dbl>
#1 Tom Brady completetion.rank 1 NA
#2 Tom Brady completion.rank NA 0.375
#3 Tom Brady ypc.rank 1 0.375
#4 Tom Brady ypc.td 1 0.312
#5 Tom Brady ypc.int 0.25 0.375
#6 Tom Brady ypc.sack 0 0.625
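Basically, because .* is greedy, everything up to the last . is captured into the name column, and the part after it (2020 or not2020) is used, via '.value', to create the new columns. completetion.rank and completion.rank end up on separate rows (with NAs) only because those two column names are spelled differently in the data.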
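To see what names_pattern does to each column name, you can apply the same regex with base R's sub(). This is only a sketch to illustrate the split; the vector nms is a hand-picked subset of the question's column names, and name_part / value_part are illustrative names, not anything pivot_longer creates.

```r
# A subset of the column names from the question's data
nms <- c("completetion.rank.2020", "completion.rank.not2020",
         "ypc.rank.2020", "ypc.rank.not2020")

# Same pattern as in the answer (note the doubled backslash in an R string)
pattern <- "(.*)\\.(.*)"

# Group 1: the greedy (.*) runs up to the LAST dot -> ends up in the `name` column
name_part <- sub(pattern, "\\1", nms)

# Group 2: whatever follows the last dot -> becomes a new column via '.value'
value_part <- sub(pattern, "\\2", nms)

print(data.frame(nms, name_part, value_part))
```

The misspelled completetion.rank yields a different group-1 value from completion.rank, which is why pivot_longer puts them on separate rows.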