R: specifying the appended suffix for bad variable names while using ".value" in tidyr's pivot functions?

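I have data stored in a "wide" structure, where multiple observations of a set of variables are stored in multiple columns of a single row. I am trying to use tidyr::pivot_longer() to convert my data to a long structure. However, I get the error "Failed to create output due to bad names." because one of the columns in the data frame I pass to the pivot function has the same name as a column that pivot_longer() wants to create when ".value" is passed to the names_to argument.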

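While this error guards against bad names, and one option is to change the names in the data I pass to pivot_longer(), I am trying to find a way to avoid the problem using the function itself. The names_repair argument can be used to add numeric suffixes to the end of names to avoid duplicate column names, but I want to append my own string as the suffix instead.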

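Specifically, I'm wondering whether there is a way to use the names_to argument to create column names that avoid the bad-names error while still using ".value". The motivation is to avoid passing a vector of column names to names_to. Alternatively, this might be a case where pivot_longer_spec would be appropriate, but I'm unsure how to use that function in conjunction with ".value".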

A minimal working example follows:

library(tidyr)
library(dplyr)

# Create example data
dat <- data.frame(
  foo_1a = 1:3,
  foo_1b = 1:3,
  foo_2a = 1:3,
  foo_2b = 1:3,
  bar_1a = 1:3,
  bar_1b = 1:3,
  bar_2a = 1:3,
  bar_2b = 1:3,
  cat = c("a","b","c"),
  dog = c("d","e","f")
)

# No error
dat %>% tidyr::pivot_longer(
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(".value", "profile"),
  names_sep = "_"
)

# Add another variable that causes duplicate names
# when pivoted due to column name prefix
dat_fail <- dat %>% mutate(foo = 4:6)

# "Error: Failed to create output due to bad names"
# because the function tries to create foo when it's
# already in the data.
dat_fail %>% tidyr::pivot_longer(
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(".value", "profile"),
  names_sep = "_"
)

# Attempt to fix #1: doesn't produce error
# but fails because it does not create columns
# foo and bar and instead places foo and bar
# in the .valuefiller column.
dat_fail %>% tidyr::pivot_longer(
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(paste0(".value", "filler"), "profile"),
  names_sep = "_"
)

# Attempt to fix #2: try passing "unique" to
# repair argument, but doesn't work. Even so,
# this would append numeric suffixes when
# I want to be able to specify the suffix myself.
# Not sure if this is a bug.
dat_fail %>% tidyr::pivot_longer_spec(
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(".value", "profile"),
  names_sep = "_",
  names_repair = "unique"
)

# Error in tidyr::pivot_longer_spec(., cols = ends_with(c("1a", "1b", "2a",  :
# unused arguments (  cols = ends_with(c("1a", "1b", "2a", "2b")),
# names_to = c(".value", "profile"), names_sep = "_")

# Desired output

# Create example data
dat <- data.frame(
  cat = c("a","a","a","a","b","b","b","b","c","c","c","c"),
  dog = c("d","d","d","d","e","e","e","e","f","f","f","f"),
  foo = c(1,1,1,1,2,2,2,2,3,3,3,3),
  profile = rep(c("1a","1b","2a","2b"), 3),
  foo_suffix = c(4,4,4,4,5,5,5,5,6,6,6,6)
)

names_repair can accept a function that takes the column names as input.
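As a quick illustration of what such a function receives and returns, here is a minimal base R sketch run on a plain character vector. The helper name suffix_first_dup and the "orig" suffix are hypothetical (not part of tidyr), and the input vector assumes the names arrive in the same order as the output columns: the untouched columns first, then the newly created ones.

```r
# Hypothetical helper (not part of tidyr): append a custom suffix to the
# earlier occurrence(s) of any duplicated name.
suffix_first_dup <- function(nms, suffix = "suffix") {
  # fromLast = TRUE flags every occurrence of a name except its last one,
  # so the pre-existing column is renamed and the new column keeps its name.
  is_dup <- duplicated(nms, fromLast = TRUE)
  nms[is_dup] <- paste(nms[is_dup], suffix, sep = "_")
  nms
}

# Assumed name order: id columns, then the columns created by the pivot
nms <- c("cat", "dog", "foo", "profile", "foo", "bar")
repaired <- suffix_first_dup(nms, suffix = "orig")
print(repaired)
```

Note that this sketch only handles a name that appears twice. If a name occurred three or more times, the earlier copies would all receive the same suffix and still clash.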

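We can use that to build the result you want. The following is just one example, and probably not the best one, but you can use or write a function better suited to your use case: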

library(tidyr)
library(dplyr)

# Create example data
dat <- data.frame(
  foo_1a = 1:3,
  foo_1b = 1:3,
  foo_2a = 1:3,
  foo_2b = 1:3,
  bar_1a = 1:3,
  bar_1b = 1:3,
  bar_2a = 1:3,
  bar_2b = 1:3,
  cat = c("a","b","c"),
  dog = c("d","e","f")
)

dat_fail <- dat %>% mutate(foo = 4:6)

dat_fail %>% 
  pivot_longer(
    cols = ends_with(c("1a", "1b", "2a", "2b")),
    names_sep = '_',
    names_to = c(".value", "profile"),
    names_repair = ~ {
      .x[duplicated(.x, fromLast = TRUE)] <- paste(.x[duplicated(.x, fromLast = TRUE)], 'suffix', sep = '_')
      .x
      }
  )
#> New names:
#> * foo -> foo_suffix
#> # A tibble: 12 x 6
#>    cat   dog   foo_suffix profile   foo   bar
#>    <fct> <fct>      <int> <chr>   <int> <int>
#>  1 a     d              4 1a          1     1
#>  2 a     d              4 1b          1     1
#>  3 a     d              4 2a          1     1
#>  4 a     d              4 2b          1     1
#>  5 b     e              5 1a          2     2
#>  6 b     e              5 1b          2     2
#>  7 b     e              5 2a          2     2
#>  8 b     e              5 2b          2     2
#>  9 c     f              6 1a          3     3
#> 10 c     f              6 1b          3     3
#> 11 c     f              6 2a          3     3
#> 12 c     f              6 2b          3     3

Created on 2020-06-16 by the reprex package (v0.3.0)
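On the pivot_longer_spec() part of the question: the "unused arguments" error appears because pivot_longer_spec() expects a spec data frame rather than cols, names_to, and names_sep. Those arguments go to build_longer_spec() instead. The following is a sketch of how the two steps could fit together with ".value", assuming a tidyr version that exports both functions (1.0.0 or later). The repair function is the same idea as above, written as a plain function.

```r
library(tidyr)

# Same data as dat_fail in the question
dat_fail <- data.frame(
  foo_1a = 1:3, foo_1b = 1:3, foo_2a = 1:3, foo_2b = 1:3,
  bar_1a = 1:3, bar_1b = 1:3, bar_2a = 1:3, bar_2b = 1:3,
  cat = c("a", "b", "c"),
  dog = c("d", "e", "f"),
  foo = 4:6
)

# Step 1: build the spec; cols, names_to and names_sep belong here
spec <- build_longer_spec(
  dat_fail,
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(".value", "profile"),
  names_sep = "_"
)

# Step 2: pivot using the spec, repairing clashing names with a custom suffix
res <- pivot_longer_spec(
  dat_fail,
  spec,
  names_repair = function(nms) {
    # Rename the earlier (pre-existing) copy of any duplicated name
    is_dup <- duplicated(nms, fromLast = TRUE)
    nms[is_dup] <- paste(nms[is_dup], "suffix", sep = "_")
    nms
  }
)

print(names(res))
```

Inspecting spec before pivoting shows one row per input column, with the .value column holding "foo" or "bar" and the profile column holding the part after the separator. That makes it easier to see where the name clash comes from.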