Reshape file: fewer rows, more columns
I have a file that basically looks like this:
1 A
2 A
2 B
3 A
3 B
3 C
4 A
4 C
...
I would like a file like this:
1 A
2 A B
3 A B C
4 A C
...
I tried the reshape function in R, but it did not work:
reshape(df, idvar = "V1", timevar = "V2", direction = "wide")
It gives the following warnings:
In reshapeWide(data, idvar = idvar, timevar = timevar, ... : multiple rows match for V2=A: first taken
In reshapeWide(data, idvar = idvar, timevar = timevar, ... : multiple rows match for V2=B: first taken
In reshapeWide(data, idvar = idvar, timevar = timevar, ... : multiple rows match for V2=C: first taken
A solution in R or Linux would be much appreciated. Thanks!
df <- read.table(header=FALSE, stringsAsFactors=FALSE, text="
1 A
2 A
2 B
3 A
3 B
3 C
4 A
4 C ")
Method 1: dplyr
library(dplyr)
library(tidyr)
df %>%
  group_by(V1) %>%
  mutate(rn = row_number()) %>%
  spread(rn, V2)
# # A tibble: 4 x 4
# # Groups: V1 [4]
# V1 `1` `2` `3`
# <int> <chr> <chr> <chr>
# 1 1 A <NA> <NA>
# 2 2 A B <NA>
# 3 3 A B C
# 4 4 A C <NA>
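In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal sketch of the same idea with the newer function, assuming the same sample data as above; the name `wide` is only illustrative and the data is re-read so the snippet runs on its own:

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer above.
df <- read.table(header = FALSE, stringsAsFactors = FALSE, text = "
1 A
2 A
2 B
3 A
3 B
3 C
4 A
4 C")

wide <- df %>%
  group_by(V1) %>%
  mutate(rn = row_number()) %>%  # position of each value within its V1 group
  ungroup() %>%
  pivot_wider(names_from = rn, values_from = V2)  # one column per position

wide
```

The result should have the same shape as the spread() output: one row per V1 and NA where a group has fewer values.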
Method 2: data.table
library(data.table)
DT <- as.data.table(df)[,rn := seq_len(.N),by="V1"]
dcast(DT, V1 ~ rn, value.var = "V2")
# V1 1 2 3
# 1: 1 A <NA> <NA>
# 2: 2 A B <NA>
# 3: 3 A B C
# 4: 4 A C <NA>
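Both methods above pad short groups with NA. If the goal is literally the ragged text file shown in the question, a base R sketch without extra packages might look like the following. It assumes the two columns are named V1 and V2 as in the sample data; `groups` and `out` are illustrative names.

```r
# Same sample data as in the answer above.
df <- read.table(header = FALSE, stringsAsFactors = FALSE, text = "
1 A
2 A
2 B
3 A
3 B
3 C
4 A
4 C")

# Collect the V2 values belonging to each V1 key, in order of appearance.
groups <- split(df$V2, df$V1)

# Build one line per key: the key followed by its values, space-separated.
out <- paste(names(groups),
             vapply(groups, paste, character(1), collapse = " "))

# Print the lines; pass a file path as the second argument to write a file.
writeLines(out)
# 1 A
# 2 A B
# 3 A B C
# 4 A C
```

Because each line is built as a plain string, no NA padding appears in the output.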