R: specifying an appended suffix for bad variable names while using ".value" in tidyr's pivot functions?
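I have data stored in a "wide" structure, such that multiple observations of a set of variables are stored across multiple columns in a single row. I am attempting to use tidyr::pivot_longer() to get my data into a long structure. However, I am receiving the error "Failed to create output due to bad names." because one of the columns in the data frame I pass to the pivot function has the same name as one of the columns that pivot_longer() wants to create based on passing ".value" to the names_to argument.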
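While this error guards against bad names, and one option is to rename the column in the data before passing it to pivot_longer(), I am trying to find a way to avoid the problem using the function itself. The names_repair argument can be used to append numeric suffixes to names to avoid duplicate column names, but I want to add my own string rather than a numeric suffix.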
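Specifically, I would like to know whether there is a way to use the names_to argument to create column names that avoid the name error while still using ".value". The motivation is to avoid passing a vector of column names to names_to. Alternatively, this may be a case where pivot_longer_spec is appropriate, but I am unsure how to use that function in conjunction with ".value".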
A minimal working example is below:
library(tidyr)
library(dplyr)

# Create example data
dat <- data.frame(
  foo_1a = 1:3,
  foo_1b = 1:3,
  foo_2a = 1:3,
  foo_2b = 1:3,
  bar_1a = 1:3,
  bar_1b = 1:3,
  bar_2a = 1:3,
  bar_2b = 1:3,
  cat = c("a","b","c"),
  dog = c("d","e","f")
)

# No error
dat %>% tidyr::pivot_longer(
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(".value", "profile"),
  names_sep = "_"
)

# Add another variable that causes duplicate names
# when pivoted due to column name prefix
dat_fail <- dat %>% mutate(foo = 4:6)

# "Error: Failed to Create output due to bad names"
# because the function tries to create foo when it's
# already in the data.
dat_fail %>% tidyr::pivot_longer(
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(".value", "profile"),
  names_sep = "_"
)

# Attempt to fix #1: doesn't produce error
# but fails because it does not create columns
# foo and bar and instead places foo and bar
# in the .valuefiller column.
dat_fail %>% tidyr::pivot_longer(
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(paste0(".value", "filler"), "profile"),
  names_sep = "_"
)

# Attempt to fix #2: try passing "unique" to
# repair argument, but doesn't work. Even so,
# this would append numeric suffixes when
# I want to be able to specify the suffix myself.
# Not sure if this is a bug.
dat_fail %>% tidyr::pivot_longer_spec(
  cols = ends_with(c("1a", "1b", "2a", "2b")),
  names_to = c(".value", "profile"),
  names_sep = "_",
  names_repair = "unique"
)
# Error in tidyr::pivot_longer_spec(., cols = ends_with(c("1a", "1b", "2a", :
# unused arguments ( cols = ends_with(c("1a", "1b", "2a", "2b")),
# names_to = c(".value", "profile"), names_sep = "_")

# Desired output
# Create example data
dat <- data.frame(
  cat = c("a","a","a","a","b","b","b","b","c","c","c","c"),
  dog = c("d","d","d","d","e","e","e","e","f","f","f","f"),
  foo = c(1,1,1,1,2,2,2,2,3,3,3,3),
  profile = rep(c("1a","1b","2a","2b"), 3),
  foo_suffix = c(4,4,4,4,5,5,5,5,6,6,6,6)
)
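names_repair can accept a function that takes the column names as input. We can use that to build the result you want. The following is just one example, and perhaps not a great one, but you can use or write a function that better suits your use case: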
library(tidyr)
library(dplyr)

# Create example data
dat <- data.frame(
  foo_1a = 1:3,
  foo_1b = 1:3,
  foo_2a = 1:3,
  foo_2b = 1:3,
  bar_1a = 1:3,
  bar_1b = 1:3,
  bar_2a = 1:3,
  bar_2b = 1:3,
  cat = c("a","b","c"),
  dog = c("d","e","f")
)
dat_fail <- dat %>% mutate(foo = 4:6)

dat_fail %>%
  pivot_longer(
    cols = ends_with(c("1a", "1b", "2a", "2b")),
    names_sep = '_',
    names_to = c(".value", "profile"),
    # Append "_suffix" to every name that occurs again later in the
    # vector of output names (here: the pre-existing foo column).
    names_repair = ~ {
      .x[duplicated(.x, fromLast = TRUE)] <- paste(.x[duplicated(.x, fromLast = TRUE)], 'suffix', sep = '_')
      .x
    }
  )
#> New names:
#> * foo -> foo_suffix
#> # A tibble: 12 x 6
#>    cat   dog   foo_suffix profile   foo   bar
#>    <fct> <fct>      <int> <chr>   <int> <int>
#>  1 a     d              4 1a          1     1
#>  2 a     d              4 1b          1     1
#>  3 a     d              4 2a          1     1
#>  4 a     d              4 2b          1     1
#>  5 b     e              5 1a          2     2
#>  6 b     e              5 1b          2     2
#>  7 b     e              5 2a          2     2
#>  8 b     e              5 2b          2     2
#>  9 c     f              6 1a          3     3
#> 10 c     f              6 1b          3     3
#> 11 c     f              6 2a          3     3
#> 12 c     f              6 2b          3     3
Created on 2020-06-16 by the reprex package (v0.3.0)
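
If you need a different suffix in several places, the same idea can be wrapped in a small function factory. This is only a sketch: suffix_earlier_duplicates() is a hypothetical helper name, not part of tidyr, and it assumes (as in the output above) that the pre-existing column comes before the newly created ".value" column in the vector of names passed to names_repair.

```r
library(tidyr)
library(dplyr)

# Hypothetical helper (not part of tidyr): returns a function suitable for
# names_repair that appends `suffix` to every name that occurs again later
# in the vector of names.
suffix_earlier_duplicates <- function(suffix, sep = "_") {
  function(nms) {
    dup <- duplicated(nms, fromLast = TRUE)
    nms[dup] <- paste(nms[dup], suffix, sep = sep)
    nms
  }
}

# Same data as dat_fail above, built in one step
dat_fail <- data.frame(
  foo_1a = 1:3, foo_1b = 1:3, foo_2a = 1:3, foo_2b = 1:3,
  bar_1a = 1:3, bar_1b = 1:3, bar_2a = 1:3, bar_2b = 1:3,
  cat = c("a", "b", "c"),
  dog = c("d", "e", "f"),
  foo = 4:6
)

long <- dat_fail %>%
  pivot_longer(
    cols = ends_with(c("1a", "1b", "2a", "2b")),
    names_to = c(".value", "profile"),
    names_sep = "_",
    names_repair = suffix_earlier_duplicates("suffix")
  )

names(long)
```

Note that duplicated(..., fromLast = TRUE) flags every occurrence except the last one, so a name that appears three or more times would still end up duplicated after the suffix is added. For that case you would need a repair function that numbers or otherwise distinguishes the repeats.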