pivot_wider without removing duplicates
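I want to use pivot_wider so that the number of resulting columns equals the number of rows being pivoted, keeping duplicate values as separate columns.
My example dataset: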
data <- data.frame(Person = c("Peter", "Peter", "Peter", "Peter", "Peter", "Peter",
                              "Carol", "Carol", "Carol", "Carol", "Carol", "Carol"),
                   GroupID = c(1, 1, 2, 2, 3, 3, 1, 1, 4, 4, 5, 5),
                   GroupTheme = c(1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2),
                   Committee = c("Transport", "State", "Transport", "State", "Transport", "State",
                                 "Technology", "Nature", "Technology", "Nature", "Technology", "Nature"))
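I want one row per person. To get there, I need to widen the dataset by GroupID and GroupTheme.
Note that a person's Committee observations are repeated for each group. This is by design for every person in the original dataset.
The code I have used so far: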
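library(dplyr)
library(tidyr)
library(purrr)

# Widen one column: keep its distinct values per person, number them, then pivot
widened = function(col, pre){
  data %>%
    select(Person, {{col}}) %>%
    distinct() %>%
    with_groups(Person, ~mutate(.x, n = row_number())) %>%
    pivot_wider(names_from = n, values_from = {{col}}, names_prefix = pre)
}
# Join the three widened tables back together by person
data <- reduce(list(widened(GroupID, "GroupID_"),
                    widened(GroupTheme, "GroupTheme_"),
                    widened(Committee, "Committee_")),
               left_join, by = "Person")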
This produces the following dataset:
Person GroupID_1 GroupID_2 GroupID_3 GroupTheme_1 GroupTheme_2 Committee_1 Committee_2
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Peter 1 2 3 1 2 Transport State
2 Carol 1 4 5 1 2 Technology Nature
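As you can see, there are 3 GroupID_ columns but only 2 GroupTheme_ columns. This is because distinct() keeps only unique values, and no person has more than 2 unique GroupTheme values.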
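The mismatch can be checked directly. The following is a minimal sketch, assuming dplyr is installed; the names `per_column` and `per_pair` are illustrative and not part of the original code. It compares the number of distinct values of each column taken on its own with the number of distinct (GroupID, GroupTheme) pairs per person.

```r
library(dplyr)

data <- data.frame(Person = c("Peter", "Peter", "Peter", "Peter", "Peter", "Peter",
                              "Carol", "Carol", "Carol", "Carol", "Carol", "Carol"),
                   GroupID = c(1, 1, 2, 2, 3, 3, 1, 1, 4, 4, 5, 5),
                   GroupTheme = c(1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2),
                   Committee = c("Transport", "State", "Transport", "State", "Transport", "State",
                                 "Technology", "Nature", "Technology", "Nature", "Technology", "Nature"))

# Distinct values per person when each column is deduplicated on its own
# (this is what calling distinct() on a single column effectively does)
per_column <- data %>%
  group_by(Person) %>%
  summarise(n_ids = n_distinct(GroupID),
            n_themes = n_distinct(GroupTheme))

# Distinct (GroupID, GroupTheme) pairs per person: deduplicating the pair
# keeps a theme for every group, even when two groups share a theme
per_pair <- data %>%
  distinct(Person, GroupID, GroupTheme) %>%
  count(Person)

per_column
per_pair
```

Each person has three distinct GroupID values but only two distinct GroupTheme values, while the pairs come to three per person, which is the number of GroupTheme_ columns wanted.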
However, I want to be able to match each GroupID_ with its corresponding GroupTheme_. So GroupTheme_1 should correspond to GroupID_1, and so on.
The dataset should look like this:
Person GroupID_1 GroupID_2 GroupID_3 GroupTheme_1 GroupTheme_2 GroupTheme_3 Committee_1
1 Peter 1 2 3 1 1 2 Transport
2 Carol 1 4 5 1 2 2 Technology
Committee_2
1 State
2 Nature
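As I see it, this comes down to not removing duplicate values across the GroupTheme_ columns. That would let me match each GroupID_ with its GroupTheme_ by number, just as in the original long dataset.
I have tried the options of pivot_wider but could not find a way to do it.
If you have another (possibly more direct) way to keep each ID matched with its theme after pivoting wider, that would also be much appreciated.
Thanks in advance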
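library(dplyr)
library(tidyr)

data %>%
  group_by(Person) %>%
  # Number each person's committees in order of first appearance
  mutate(name = as.integer(factor(Committee, unique(Committee)))) %>%
  # First pivot: one row per person and group, committees spread into columns
  # (id_cols is named explicitly; newer tidyr releases warn if it is passed by position)
  pivot_wider(id_cols = c(Person, GroupID, GroupTheme), values_from = Committee,
              names_prefix = 'Committee_') %>%
  # Number the groups within each person, then spread GroupID and GroupTheme together
  mutate(name = row_number()) %>%
  pivot_wider(id_cols = c(Person, starts_with('Committee')),
              values_from = c(GroupID, GroupTheme))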
# A tibble: 2 x 9
# Groups: Person [2]
Person Committee_1 Committee_2 GroupID_1 GroupID_2 GroupID_3 GroupTheme_1 GroupTheme_2 GroupTheme_3
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Peter Transport State 1 2 3 1 1 2
2 Carol Technology Nature 1 4 5 1 2 2
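A variant of the same idea, given here only as a sketch: deduplicate on the (GroupID, GroupTheme) pair rather than on each column separately, number the pairs within each person, and pivot both value columns in one call. It assumes dplyr and tidyr are installed and that each GroupID has exactly one GroupTheme per person, as in the example data. The names `groups_wide`, `committees_wide` and `wide` are illustrative.

```r
library(dplyr)
library(tidyr)

data <- data.frame(Person = c("Peter", "Peter", "Peter", "Peter", "Peter", "Peter",
                              "Carol", "Carol", "Carol", "Carol", "Carol", "Carol"),
                   GroupID = c(1, 1, 2, 2, 3, 3, 1, 1, 4, 4, 5, 5),
                   GroupTheme = c(1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2),
                   Committee = c("Transport", "State", "Transport", "State", "Transport", "State",
                                 "Technology", "Nature", "Technology", "Nature", "Technology", "Nature"))

# One row per (Person, GroupID, GroupTheme) pair, numbered within each person.
# Pivoting both value columns at once keeps GroupID_n aligned with GroupTheme_n.
groups_wide <- data %>%
  distinct(Person, GroupID, GroupTheme) %>%
  group_by(Person) %>%
  mutate(n = row_number()) %>%
  ungroup() %>%
  pivot_wider(id_cols = Person, names_from = n,
              values_from = c(GroupID, GroupTheme))

# Committees are widened separately, since they repeat for every group
committees_wide <- data %>%
  distinct(Person, Committee) %>%
  group_by(Person) %>%
  mutate(n = row_number()) %>%
  ungroup() %>%
  pivot_wider(id_cols = Person, names_from = n,
              values_from = Committee, names_prefix = "Committee_")

wide <- left_join(groups_wide, committees_wide, by = "Person")
wide
```

Because GroupID and GroupTheme share the same row number before pivoting, a repeated theme is never dropped on its own, so GroupTheme_2 for Peter stays 1 next to GroupID_2.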