Reshape large dataframe by merging two columns in R
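I am working with a large data frame in R, perhaps 25k rows. My goal is to merge two columns (reshape the data frame) so that the values in the left column become the headers for the values in the right column. See the example below. Is this possible in R? Any help is appreciated.
What I have now: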
SomeVal1  O26FF
          8B53L
          FFS4C
          2L9PT
          Z3NW0
          X2SGF
SomeVal2  0D121
          Y0483
          YAAPT
          E0OVA
          AL4AW
SomeVal3  TFOA6
          3H5G3
This is what I want:
SomeVal1 SomeVal2 SomeVal3
O26FF 0D121 TFOA6
8B53L Y0483 3H5G3
FFS4C YAAPT
2L9PT E0OVA
Z3NW0 AL4AW
X2SGF
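Here is one way: change the blanks ("") to NA, then use fill to replace each NA with the previous non-NA value, get a row sequence within each group with rowid (from data.table), and use pivot_wider to reshape to 'wide' format. The sequence column gives pivot_wider a row identifier, so the values under each header are spread across separate rows instead of being collapsed together.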
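library(dplyr)
library(tidyr)
library(data.table)  # provides rowid()

df1 %>%
  # treat blank headers as missing so they can be filled
  mutate(Col1 = na_if(Col1, "")) %>%
  # carry each header down over the rows that belong to it
  fill(Col1) %>%
  # number the rows within each header; this becomes the output row index
  mutate(rn = rowid(Col1)) %>%
  pivot_wider(names_from = Col1, values_from = Col2, values_fill = "") %>%
  select(-rn)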
# A tibble: 6 x 3
SomeVal1 SomeVal2 SomeVal3
<chr> <chr> <chr>
1 O26FF "0D121" "TFOA6"
2 8B53L "Y0483" "3H5G3"
3 FFS4C "YAAPT" ""
4 2L9PT "E0OVA" ""
5 Z3NW0 "AL4AW" ""
6 X2SGF "" ""
Data
df1 <- structure(list(Col1 = c("SomeVal1", "", "", "", "", "", "SomeVal2",
"", "", "", "", "SomeVal3", ""), Col2 = c("O26FF", "8B53L", "FFS4C",
"2L9PT", "Z3NW0", "X2SGF", "0D121", "Y0483", "YAAPT", "E0OVA",
"AL4AW", "TFOA6", "3H5G3")), class = "data.frame", row.names = c(NA,
-13L))
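
For comparison, the same reshaping can be sketched in base R without any packages. This is only an illustrative alternative, not part of the original answer. It assumes the first row of Col1 holds a header, that every header is unique, and that blanks are exactly the empty string "". The names grp, cols, n and wide are made up for this sketch.

```r
# Self-contained copy of the example data (same values as df1 above)
df1 <- data.frame(
  Col1 = c("SomeVal1", "", "", "", "", "", "SomeVal2", "", "", "", "",
           "SomeVal3", ""),
  Col2 = c("O26FF", "8B53L", "FFS4C", "2L9PT", "Z3NW0", "X2SGF", "0D121",
           "Y0483", "YAAPT", "E0OVA", "AL4AW", "TFOA6", "3H5G3"),
  stringsAsFactors = FALSE
)

# A new group starts at every non-blank header in Col1
grp <- cumsum(df1$Col1 != "")

# One character vector of Col2 values per group, named by that group's header
cols <- split(df1$Col2, grp)
names(cols) <- df1$Col1[df1$Col1 != ""]

# Pad the shorter vectors with "" so every column has the same length
n <- max(lengths(cols))
wide <- as.data.frame(
  lapply(cols, function(x) c(x, rep("", n - length(x)))),
  stringsAsFactors = FALSE
)

print(wide)
```

Because cumsum increments the group counter at each header, no fill step is needed here; the padding step plays the role of values_fill = "" in the pivot_wider version.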