Split a column value into multiple columns based on an ID in R
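I have a dataset with different timepoints for each ID. I want one record per ID, with the timepoints split into separate columns. I don't want to use spread, because I want the actual values in the columns. Some IDs have 14 records, and I want those 14 records split into 14 columns. How can I achieve this in R?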
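Example data
ID | Timepoint | Value |
---|---|---|
A | 1 | yes |
A | 2 | yes |
A | 3 | yes |
A | 4 | yes |
B | 7 | yes |
B | 11 | yes |
C | 4 | yes |
C | 5 | yes |
D | 7 | yes |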
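Desired output
ID | Timepoint 1 | Timepoint 2 | Timepoint 3 | Timepoint 4 | Value |
---|---|---|---|---|---|
A | 1 | 2 | 3 | 4 | yes |
B | 7 | 11 | | | yes |
C | 4 | 5 | | | yes |
D | 7 | | | | yes |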
We can use dcast from data.table:
library(data.table)
dcast(setDT(df1), ID + Value ~ paste0("Timepoint",
rowid(ID)), value.var = 'Timepoint')
-output
ID Value Timepoint1 Timepoint2 Timepoint3 Timepoint4
1: A yes 1 2 3 4
2: B yes 7 11 NA NA
3: C yes 4 5 NA NA
4: D yes 7 NA NA NA
data
df1 <- structure(list(ID = c("A", "A", "A", "A", "B", "B", "C", "C",
"D"), Timepoint = c(1L, 2L, 3L, 4L, 7L, 11L, 4L, 5L, 7L), Value = c("yes",
"yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes")),
class = "data.frame", row.names = c(NA,
-9L))
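The dcast call above relies on rowid(ID) to number the records within each ID, and that counter becomes the suffix of the new column names. The following is a minimal base R sketch of the same counter, assuming the df1 shown above; the variable names idx and col_names are only illustrative.

```r
# Sample data, same values as df1 above
df1 <- data.frame(
  ID = c("A", "A", "A", "A", "B", "B", "C", "C", "D"),
  Timepoint = c(1L, 2L, 3L, 4L, 7L, 11L, 4L, 5L, 7L),
  Value = "yes"
)

# Within-ID record counter: restarts at 1 for every new ID
idx <- ave(seq_along(df1$ID), df1$ID, FUN = seq_along)
idx
# 1 2 3 4 1 2 1 2 1

# The counter, not the Timepoint value, decides which column a record lands in
col_names <- paste0("Timepoint", idx)
```

Because the columns are keyed by the counter, an ID with 14 records would simply produce Timepoint1 through Timepoint14, and the cells keep the actual Timepoint values.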
Using the tidyverse:
library(dplyr)
library(tidyr)
df1 %>%
  group_by(ID) %>%
  # number the records within each ID
  mutate(id_rows = row_number()) %>%
  pivot_wider(
    id_cols = c(ID, Value),
    names_from = id_rows,
    values_from = Timepoint,
    names_prefix = "Timepoint"
  ) %>%
  ungroup()
Output
# A tibble: 4 x 6
ID Value Timepoint1 Timepoint2 Timepoint3 Timepoint4
<chr> <chr> <int> <int> <int> <int>
1 A yes 1 2 3 4
2 B yes 7 11 NA NA
3 C yes 4 5 NA NA
4 D yes 7 NA NA NA
In base R:
reshape(transform(df1, time = ave(ID, ID, FUN = seq)),
dir = 'wide', idvar = c('ID', 'Value'), sep='')
ID Value Timepoint1 Timepoint2 Timepoint3 Timepoint4
1 A yes 1 2 3 4
5 B yes 7 11 NA NA
7 C yes 4 5 NA NA
9 D yes 7 NA NA NA
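The question mentions IDs with up to 14 records. As a rough check that the base R approach scales, here is a sketch on made-up data (df2, with a hypothetical ID "E" holding 14 records). It is a variant of the call above rather than the same code: it uses seq_along for the counter and spells out direction and timevar.

```r
# Hypothetical data: ID "E" has 14 records, ID "F" has 2
df2 <- data.frame(
  ID = rep(c("E", "F"), times = c(14, 2)),
  Timepoint = c(1:14, 3L, 9L),
  Value = "yes"
)

# Number the records within each ID
df2$time <- ave(seq_along(df2$ID), df2$ID, FUN = seq_along)

# One row per ID/Value pair, one Timepoint column per record number
wide <- reshape(df2, direction = "wide", idvar = c("ID", "Value"),
                timevar = "time", sep = "")

dim(wide)
# 2 16
```

The result has ID, Value, and Timepoint1 through Timepoint14; the shorter ID "F" is padded with NA from Timepoint3 onward.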
Not the best way, but it works: a combination of splitstackshape and data.table (other solutions have already been posted):
library(splitstackshape)
library(data.table)
df <- dcast(getanID(df1, 'ID'), ID~.id, value.var='Timepoint')
colnames(df)[2:5] <- paste("Timepoint", colnames(df)[2:5], sep = "")
Output:
ID Timepoint1 Timepoint2 Timepoint3 Timepoint4
1: A 1 2 3 4
2: B 7 11 NA NA
3: C 4 5 NA NA
4: D 7 NA NA NA
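The colnames(df)[2:5] step above assumes exactly four timepoint columns. For IDs with more records (14 in the question) the range would need adjusting. One possible way to avoid the hard-coded range is to rename every column except ID. This sketch uses a small stand-in data frame in place of the dcast result, so it runs without either package; tp_cols is an illustrative name.

```r
# Stand-in for the dcast() result: ID plus columns named "1".."4"
df <- data.frame(ID = c("A", "B"), `1` = c(1L, 7L), `2` = c(2L, 11L),
                 `3` = c(3L, NA), `4` = c(4L, NA), check.names = FALSE)

# Rename every column except ID, however many there are
tp_cols <- names(df) != "ID"
names(df)[tp_cols] <- paste0("Timepoint", names(df)[tp_cols])

names(df)
# "ID" "Timepoint1" "Timepoint2" "Timepoint3" "Timepoint4"
```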