Convert every n rows to columns and stack them in R?
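I have a tab-delimited text file containing a series of timestamped data. I read it into R with read.delim(), which gives me all of the data as character values in a single column. Example: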
df <- data.frame(c("2017","A","B","C","2018","X","Y","Z","2018","X","B","C"))
colnames(df) <- "col1"
df
I would like to convert every n rows (4 in this example) into columns and stack the results, without using a for loop. Desired result:
col1 <- c("2017","2018","2018")
col2 <- c("A","X","X")
col3 <- c("B","Y","B")
col4 <- c("C","Z","C")
df2 <- data.frame(col1, col2, col3, col4)
df2
I wrote a for loop, but it cannot handle the millions of rows in my df. Should I convert to a matrix? Would converting to a list help? I tried as.matrix(read.table()) and unlist(), without success.
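You can use tidyr to reshape the data into the form you want. You first need to mutate the data to record which output row each value belongs to and which column it goes into.
Assuming you know that each record spans 4 rows (n = 4), you can do the following with the help of the dplyr package.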
library(tidyr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
n <- 4
df <- data.frame(x = c("2017","A","B","C","2018","X","Y","Z","2018","X","B","C")) %>%
  mutate(cols = rep(1:n, n()/n),
         id = rep(1:(n()/n), each = n))
pivot_wider(df, id_cols = id, names_from = cols, values_from = x, names_prefix = "cols")
#> # A tibble: 3 × 5
#> id cols1 cols2 cols3 cols4
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 2017 A B C
#> 2 2 2018 X Y Z
#> 3 3 2018 X B C
Alternatively, in base R you can use the split function on the vector and then do.call to build the data frame.
df <- data.frame(x = c("2017","A","B","C","2018","X","Y","Z","2018","X","B","C"))
split_df <- setNames(split(df$x, rep(1:4, 3)), paste0("cols", 1:4))
do.call("data.frame", split_df)
#> cols1 cols2 cols3 cols4
#> 1 2017 A B C
#> 2 2018 X Y Z
#> 3 2018 X B C
Created on 2022-02-01 by the reprex package (v2.0.1)
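The base R call above hard-codes 4 columns and 3 records. As a rough sketch only, the same split idea can be written in terms of n. The names x, n, pos, cols and wide are made up for this illustration, and it assumes the vector length is an exact multiple of n.

```r
# Sketch: generalize the split() approach to any record length n.
# Assumption: length(x) is an exact multiple of n.
x <- c("2017","A","B","C","2018","X","Y","Z","2018","X","B","C")
n <- 4

stopifnot(length(x) %% n == 0)

# Position of each value within its record: 1, 2, ..., n, 1, 2, ..., n, ...
pos <- rep(seq_len(n), times = length(x) / n)

# One list element per position, i.e. one element per output column
cols <- split(x, pos)
names(cols) <- paste0("cols", seq_len(n))

# Bind the equal-length vectors into a data frame
wide <- as.data.frame(cols, stringsAsFactors = FALSE)
wide
```

Changing n is then the only edit needed when each record spans a different number of rows.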
The easiest way is to build a matrix with matrix(ncol = x, byrow = TRUE) and then convert it back to a data.frame. It should be fairly fast too.
df |>
  unlist() |>
  matrix(ncol = 4, byrow = TRUE) |>
  as.data.frame() |>
  setNames(paste0('col', 1:4))
col1 col2 col3 col4
1 2017 A B C
2 2018 X Y Z
3 2018 X B C
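If this is needed for more than one file, the matrix approach could be wrapped in a small helper. The function below is only a sketch: rows_to_cols and its prefix argument are hypothetical names, not part of any package, and the length check is an added safeguard. Without such a check, matrix() recycles the input and only warns when the length is not a multiple of the number of columns.

```r
# Hypothetical helper (illustration only): reshape a long vector so that
# every n consecutive values become one row of an n-column data frame.
rows_to_cols <- function(x, n, prefix = "col") {
  # Flatten a one-column data frame or list into a plain character vector
  x <- as.character(unlist(x, use.names = FALSE))
  if (length(x) %% n != 0) {
    stop("length of x is not a multiple of n")
  }
  # byrow = TRUE fills row by row, so each record of n values becomes a row
  m <- matrix(x, ncol = n, byrow = TRUE)
  out <- as.data.frame(m, stringsAsFactors = FALSE)
  names(out) <- paste0(prefix, seq_len(n))
  out
}

df <- data.frame(col1 = c("2017","A","B","C","2018","X","Y","Z","2018","X","B","C"))
res <- rows_to_cols(df$col1, 4)
res
```

Failing early on a length mismatch seems safer for a file with millions of rows, since a single missing line would otherwise shift every later record.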