Spreading a dataframe with two grouping columns
I have a dataset of teachers as follows:
df <- data.frame(
  teacher = c("A", "A", "A", "A", "B", "B", "C", "C"),
  seg = c("1", "1", "2", "2", "1", "2", "1", "2"),
  claim = c(
    "beth",
    "john",
    "john",
    "beth",
    "summer",
    "summer",
    "hannah",
    "hannah"
  )
)
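Ideally, I would like to spread my dataset like this:
[Image: desired output]
Any ideas on how to achieve this with spread or pivot_wider? The problem is that there are two grouping variables (teacher and seg). Some teachers have the same segment more than once, while others do not.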
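To make the difficulty concrete, here is a minimal sketch, assuming the tidyr package is installed. The object name `naive` and the `stringsAsFactors = FALSE` argument are additions for illustration only. Calling pivot_wider directly leaves teacher A with two claims per segment, so tidyr falls back to list-columns (with a warning) rather than plain values.

```r
library(tidyr)

# Same data as in the question, with strings kept as character
df <- data.frame(
  teacher = c("A", "A", "A", "A", "B", "B", "C", "C"),
  seg     = c("1", "1", "2", "2", "1", "2", "1", "2"),
  claim   = c("beth", "john", "john", "beth",
              "summer", "summer", "hannah", "hannah"),
  stringsAsFactors = FALSE
)

# teacher + seg does not uniquely identify a row for teacher A,
# so the value columns come back as list-columns
naive <- suppressWarnings(
  pivot_wider(df, names_from = seg, values_from = claim)
)

is.list(naive$`1`)   # TRUE when the cells hold more than one value
lengths(naive$`1`)   # number of claims stored per teacher for seg 1
```

This is why the approaches below first give each repeated teacher/seg pair a distinct key.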
One option is to create a sequence column grouped by 'teacher' and 'seg', and then use pivot_wider
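library(dplyr)
library(tidyr)
library(stringr)

df %>%
  # number repeated (teacher, seg) pairs: the first gets "", the second "double"
  group_by(teacher, seg) %>%
  mutate(segN = c("", "double")[row_number()]) %>%
  ungroup() %>%
  # build the new column names, e.g. "seg1", "seg1double"
  mutate(seg = str_c("seg", seg, segN)) %>%
  select(-segN) %>%
  pivot_wider(names_from = seg, values_from = claim)
# A tibble: 3 x 5
#   teacher seg1   seg1double seg2   seg2double
#   <fct>   <fct>  <fct>      <fct>  <fct>
#1 A       beth   john       john   beth
#2 B       summer <NA>       summer <NA>
#3 C       hannah <NA>       hannah <NA>
# (on R >= 4.0 the columns print as <chr>, since data.frame() no longer creates factors)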
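The lookup `c("", "double")[row_number()]` only covers up to two occurrences of a teacher/seg pair. As a hedged variation, assuming a numeric suffix is acceptable in place of "double", the occurrence index can go straight into the column names. The names `idx` and `wide` are illustrative and not from the original answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  teacher = c("A", "A", "A", "A", "B", "B", "C", "C"),
  seg     = c("1", "1", "2", "2", "1", "2", "1", "2"),
  claim   = c("beth", "john", "john", "beth",
              "summer", "summer", "hannah", "hannah"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(teacher, seg) %>%
  mutate(idx = row_number()) %>%   # 1, 2, 3, ... within each teacher/seg pair
  ungroup() %>%
  # combine seg and idx into names such as "seg1_1", "seg1_2"
  pivot_wider(names_from = c(seg, idx),
              values_from = claim,
              names_prefix = "seg")

wide
```

Teachers with only one claim per segment get NA in the `_2` columns, just as in the "double" columns above.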
This can be simplified with rowid from data.table
library(data.table)
# dplyr, tidyr and stringr are still loaded from the previous snippet

df %>%
  # rowid() numbers repeated (teacher, seg) pairs without an explicit group_by
  mutate(seg = str_c('seg', c('', '_double')[rowid(teacher, seg)], seg)) %>%
  pivot_wider(names_from = seg, values_from = claim)
# or use spread
#  spread(seg, claim)
#   teacher seg1   seg_double1 seg2   seg_double2
#1 A       beth   john        john   beth
#2 B       summer <NA>        summer <NA>
#3 C       hannah <NA>        hannah <NA>
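If data.table is loaded anyway, the whole reshape can also stay inside data.table. The following is a sketch under that assumption rather than part of the original answer. It uses `rowid()` for the occurrence index and `dcast()` for the pivot, and the names `dt`, `idx` and `wide` are illustrative.

```r
library(data.table)

dt <- data.table(
  teacher = c("A", "A", "A", "A", "B", "B", "C", "C"),
  seg     = c("1", "1", "2", "2", "1", "2", "1", "2"),
  claim   = c("beth", "john", "john", "beth",
              "summer", "summer", "hannah", "hannah")
)

# occurrence number within each teacher/seg pair
dt[, idx := rowid(teacher, seg)]

# one row per teacher, one column per seg/idx combination ("1_1", "1_2", ...)
wide <- dcast(dt, teacher ~ seg + idx, value.var = "claim")

# prefix the value columns with "seg"
setnames(wide, names(wide)[-1], paste0("seg", names(wide)[-1]))

wide
```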
You can also use a base R approach, with the powerful reshape function and a little data preparation
# seg must be character so new values can be assigned (it is a factor in R < 4.0)
df$seg <- as.character(df$seg)
# find repeated teacher/seg combinations
dups <- duplicated(df[, 1:2])
# append "double" to the seg value of each repeat
df[dups, 2] <- paste0(df[dups, 2], "double")
# reshape() builds the wide column names from v.names and timevar
wide <- reshape(df, v.names = "claim", idvar = "teacher",
                timevar = "seg", direction = "wide", sep = "")
# rename claim1, claim1double, ... to seg1, seg1double, ...
names(wide) <- gsub("claim", "seg", names(wide))
wide
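Note that the snippet above overwrites `df$seg` in place, and `duplicated()` only separates the first occurrence from later ones. A hedged alternative in base R, assuming a numeric suffix is acceptable, works on a copy and numbers every occurrence with `ave()`. The names `tmp`, `idx` and `key` are illustrative.

```r
df <- data.frame(
  teacher = c("A", "A", "A", "A", "B", "B", "C", "C"),
  seg     = c("1", "1", "2", "2", "1", "2", "1", "2"),
  claim   = c("beth", "john", "john", "beth",
              "summer", "summer", "hannah", "hannah"),
  stringsAsFactors = FALSE
)

tmp <- df  # work on a copy so df itself is left unchanged

# occurrence number within each teacher/seg pair
tmp$idx <- ave(seq_len(nrow(tmp)), tmp$teacher, tmp$seg, FUN = seq_along)

# unique key per teacher, e.g. "1_1", "1_2"
tmp$key <- paste0(tmp$seg, "_", tmp$idx)

wide <- reshape(tmp[, c("teacher", "key", "claim")],
                v.names = "claim", idvar = "teacher",
                timevar = "key", direction = "wide", sep = "")

# rename claim1_1, claim1_2, ... to seg1_1, seg1_2, ...
names(wide) <- sub("^claim", "seg", names(wide))
wide
```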