pivot_longer: Rotating multiple columns of data with the same data types
I am trying to pivot multiple columns of data into single columns with consistent data types.
I have created a minimal example below.
library(tibble)
library(dplyr)
library(tidyr)  # provides pivot_longer() and pivot_wider()
# I have data like this
df <- tibble(contact_1_prefix=c('Mr.','Mrs.','Dr.'),
contact_2_prefix=c('Dr.','Mr.','Mrs.'),
contact_1 = c('Bob Johnson','Robert Johnson','Bobby Johnson'),
contact_2 = c('Tommy Two Tones','Tommy Three Tones','Tommy No Tones'),
contact_1_loc = c('Earth','New York','Los Angeles'),
contact_2_loc = c('London','Geneva','Paris'))
# My attempt at a solution:
df %>% rename(contact_1_name=contact_1,
contact_2_name=contact_2) %>%
pivot_longer(cols=c(matches('_[12]_')),
names_to=c('.value','dat'),
names_pattern = "(.*)_[1-2]_(.*)") %>%
pivot_wider(names_from='dat',values_from='contact')
#What I want is to widen that data to achieve a tibble with these two example lines
df_desired <- tibble(name=c('Bob Johnson','Tommy Two Tones'),
loc =c('Earth','London'),
prefix=c('Mr.','Dr.'))
I want all the names under name, all the locations under loc, and all the prefixes under prefix.
If I run only this fragment from the middle of the statement:
df %>% rename(contact_1_name=contact_1,
contact_2_name=contact_2) %>%
pivot_longer(cols=c(matches('_[12]_')),
names_to=c('.value','dat'),
names_pattern = "(.*)_[1-2]_(.*)")
The dput of the output is:
structure(list(dat = c("prefix", "prefix", "name", "name", "loc",
"loc", "prefix", "prefix", "name", "name", "loc", "loc", "prefix",
"prefix", "name", "name", "loc", "loc"), contact = c("Mr.", "Dr.",
"Bob Johnson", "Tommy Two Tones", "Earth", "London", "Mrs.",
"Mr.", "Robert Johnson", "Tommy Three Tones", "New York", "Geneva",
"Dr.", "Mrs.", "Bobby Johnson", "Tommy No Tones", "Los Angeles",
"Paris")), row.names = c(NA, -18L), class = c("tbl_df", "tbl",
"data.frame"))
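From this, I figured pivot_wider would surely be the solution, but there are name collisions.
I assume a single pivot_longer statement can do the job. I studied a related example closely but did not fully follow it. I have to admit I do not really understand what the names_to = c(".value", "group") phrase does.
In any case, any help would be greatly appreciated.
Thanks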
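You were on the right track. The rename is needed because the name columns are the only ones without a suffix to identify them. .value marks the part of the original column names that should become the new column names. If you drop everything up to and including the last underscore, what remains are the new column names, and you can capture that part with a regular expression in names_pattern.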
library(dplyr)
library(tidyr)
df %>%
rename(contact_1_name=contact_1,
contact_2_name=contact_2) %>%
pivot_longer(cols = everything(),
names_to = '.value',
names_pattern = '.*_(\\w+)')  # the backslash must be doubled inside an R string
# prefix name loc
# <chr> <chr> <chr>
#1 Mr. Bob Johnson Earth
#2 Dr. Tommy Two Tones London
#3 Mrs. Robert Johnson New York
#4 Mr. Tommy Three Tones Geneva
#5 Dr. Bobby Johnson Los Angeles
#6 Mrs. Tommy No Tones Paris
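On the question of what names_to = c(".value", "group") does: with two capture groups in names_pattern, each group is paired with one entry of names_to. The group paired with ".value" supplies the output column names, and the other group is stored as an ordinary column. A minimal sketch of that variant, assuming you want to keep track of which contact each row came from (contact_id and long are names made up for this illustration):

```r
library(tibble)
library(dplyr)
library(tidyr)

df <- tibble(contact_1_prefix = c('Mr.', 'Mrs.', 'Dr.'),
             contact_2_prefix = c('Dr.', 'Mr.', 'Mrs.'),
             contact_1 = c('Bob Johnson', 'Robert Johnson', 'Bobby Johnson'),
             contact_2 = c('Tommy Two Tones', 'Tommy Three Tones', 'Tommy No Tones'),
             contact_1_loc = c('Earth', 'New York', 'Los Angeles'),
             contact_2_loc = c('London', 'Geneva', 'Paris'))

long <- df %>%
  rename(contact_1_name = contact_1,
         contact_2_name = contact_2) %>%
  pivot_longer(cols = everything(),
               # first capture group  -> stored in the contact_id column
               # second capture group -> becomes the output column names
               names_to = c('contact_id', '.value'),
               names_pattern = 'contact_(\\d+)_(\\w+)')

print(long)
```

The result should have the same six rows as above, plus a contact_id column holding "1" or "2" as character values.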
Here is a solution using split.default
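# Split the columns by the contact number in their names, rename each piece, then stack the pieces
data.table::rbindlist(
lapply( split.default( df, gsub( "[^0-9]+", "", names(df) ) ),
data.table::setnames,
c("prefix", "name", "loc") ) )  # passed without old/new so all columns are renamed in order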
# prefix name loc
# 1: Mr. Bob Johnson Earth
# 2: Mrs. Robert Johnson New York
# 3: Dr. Bobby Johnson Los Angeles
# 4: Dr. Tommy Two Tones London
# 5: Mr. Tommy Three Tones Geneva
# 6: Mrs. Tommy No Tones Paris
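To see why the split works, it may help to look at the intermediate pieces on their own. The sketch below uses a plain data.frame with only the first row of the example so that it needs no packages; key and parts are illustrative names.

```r
# First row of the example data, as a base data.frame
df <- data.frame(contact_1_prefix = 'Mr.',
                 contact_2_prefix = 'Dr.',
                 contact_1 = 'Bob Johnson',
                 contact_2 = 'Tommy Two Tones',
                 contact_1_loc = 'Earth',
                 contact_2_loc = 'London')

# Strip every non-digit from the column names, leaving the contact number
key <- gsub("[^0-9]+", "", names(df))
print(key)

# split.default() treats the data frame as a list of columns,
# so it groups columns (not rows) by the key
parts <- split.default(df, key)
print(lapply(parts, names))
```

Each piece holds the prefix, name and location columns of one contact in their original order, which is why renaming by position and then stacking gives the desired layout.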