How to create multiple columns from one column, maybe using dcast or tidyverse
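I am learning R and trying to figure out how to split a column. I want to spread the data from a single column into wide format. I was told to use dcast, but I haven't found the best way to do it and was going to try piping through the tidyverse instead.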
# sample data
data <- data.frame(trimesterPeriod = c("first", "second", "third", "PP", "third", "second", "PP", "first"))
# dataframe
trimesterPeriod
1 first
2 second
3 third
4 PP
5 third
6 second
7 PP
8 first
and I would like it to look like this:
#dataframe
ID first second third PP
1 1 0 0 0
2 0 1 0 0
3 0 0 1 0
4 0 0 0 1
5 0 0 1 0
6 0 1 0 0
7 0 0 0 1
8 1 0 0 0
I know I will have to change the trimesterPeriod data to character, but from there I am not sure where to go. What I was thinking of doing is:
data.frame %>%
mutate(rn = row_number(first, second, third, PP)) %>%
spread(trimesterPeriod) %>%
select(-rn)
But I am not sure. Any advice is much appreciated!
We can use table from base R
table(seq_len(nrow(data)), data$trimesterPeriod)
-output
first PP second third
1 1 0 0 0
2 0 0 1 0
3 0 0 0 1
4 0 1 0 0
5 0 0 0 1
6 0 0 1 0
7 0 1 0 0
8 1 0 0 0
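The table output above orders the columns alphabetically and is a table object rather than a data frame. A minimal sketch of one way to get the layout asked for in the question, assuming the desired column order is first, second, third, PP (the names lv, tab and wide are only illustrative):

```r
# Rebuild the sample data so the sketch is self-contained
data <- data.frame(trimesterPeriod = c("first", "second", "third", "PP",
                                       "third", "second", "PP", "first"))

# Assumed column order, taken from the desired output in the question
lv <- c("first", "second", "third", "PP")

# Converting to a factor with explicit levels fixes the column order of table()
tab <- table(seq_len(nrow(data)), factor(data$trimesterPeriod, levels = lv))

# Turn the 2-D table into a regular data frame and prepend an ID column
wide <- cbind(ID = seq_len(nrow(data)), as.data.frame.matrix(tab))
wide
```

Setting the factor levels explicitly also keeps a column for a level that happens not to occur in the data, which may or may not be wanted.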
Or use tidyverse
library(dplyr)
library(tidyr)
data %>%
  mutate(ID = row_number()) %>%
  pivot_wider(names_from = trimesterPeriod,
              values_from = trimesterPeriod, values_fn = length,
              values_fill = 0)
-output
# A tibble: 8 × 5
ID first second third PP
<int> <int> <int> <int> <int>
1 1 1 0 0 0
2 2 0 1 0 0
3 3 0 0 1 0
4 4 0 0 0 1
5 5 0 0 1 0
6 6 0 1 0 0
7 7 0 0 0 1
8 8 1 0 0 0
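A possible variation on the pivot_wider call above, shown only as a sketch: instead of counting with values_fn = length, add a constant indicator column and spread that. The column name present is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the sample data so the sketch is self-contained
data <- data.frame(trimesterPeriod = c("first", "second", "third", "PP",
                                       "third", "second", "PP", "first"))

wide <- data %>%
  # ID identifies each original row; present is a hypothetical helper column
  mutate(ID = row_number(), present = 1L) %>%
  pivot_wider(id_cols = ID,
              names_from = trimesterPeriod,
              values_from = present,
              values_fill = 0L)  # cells with no match become 0
wide
```

Because every ID occurs exactly once, no aggregation function is needed here; the new columns appear in the order the values first occur in the data.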
data
data <- structure(list(trimesterPeriod = c("first", "second", "third",
    "PP", "third", "second", "PP", "first")),
    class = "data.frame", row.names = c("1",
    "2", "3", "4", "5", "6", "7", "8"))
Using dcast from data.table
library(data.table)
dcast(setDT(data), seq_len(nrow(data)) ~ trimesterPeriod,
      value.var = 'trimesterPeriod', fun.aggregate = length)
# data PP first second third
#1: 1 0 1 0 0
#2: 2 0 0 1 0
#3: 3 0 0 0 1
#4: 4 1 0 0 0
#5: 5 0 0 0 1
#6: 6 0 0 1 0
#7: 7 1 0 0 0
#8: 8 0 1 0 0
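In the dcast output above the row identifier column is named data and the columns are sorted. A sketch of one way to get an explicit ID column and the column order from the question, assuming that order is first, second, third, PP (dt and wide are illustrative names). It also works on a copy, since setDT() converts data by reference:

```r
library(data.table)

# Rebuild the sample data so the sketch is self-contained
data <- data.frame(trimesterPeriod = c("first", "second", "third", "PP",
                                       "third", "second", "PP", "first"))

# as.data.table() copies, so the original data frame is left unchanged
dt <- as.data.table(data)

# Explicit row identifier
dt[, ID := .I]

# Factor levels control the order of the cast columns (assumed order)
dt[, trimesterPeriod := factor(trimesterPeriod,
                               levels = c("first", "second", "third", "PP"))]

wide <- dcast(dt, ID ~ trimesterPeriod,
              value.var = "trimesterPeriod", fun.aggregate = length)
wide
```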