Convert every n rows to columns and stack them in R?

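I have a tab-delimited text file containing a series of timestamped data. I've read it into R using read.delim(), which gives me all the data as characters in a single column. Example: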

df <- data.frame(c("2017","A","B","C","2018","X","Y","Z","2018","X","B","C"))
colnames(df) <- "col1"
df

I want to convert every n rows (4 in this example) into columns and stack the results, without using a for loop. Desired result:

col1 <- c("2017","2018","2018")
col2 <- c("A","X","X")
col3 <- c("B","Y","B")
col4 <- c("C","Z","C")
df2 <- data.frame(col1, col2, col3, col4)
df2

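I wrote a for loop, but it can't handle the millions of rows in my df. Should I convert to a matrix? Would converting to a list help? I tried as.matrix(read.table()) and unlist(), without success.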

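You can use tidyr to reshape the data into the form you want. You first need to mutate the data to record, for each value, which output row it belongs to and which column it goes in.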

Assuming you know there are 4 values per group (n = 4), you could do something like this with the help of the dplyr package.

library(tidyr)
library(dplyr)
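n <- 4  # number of consecutive rows that make up one record
df <- data.frame(x = c("2017","A","B","C","2018","X","Y","Z","2018","X","B","C")) %>%
  # cols: position within the record (1..n), which becomes the target column
  # id:   record number, which becomes the target row
  mutate(cols = rep(1:n, n()/n),
         id = rep(1:(n()/n), each = n))
pivot_wider(df, id_cols = id, names_from = cols, values_from = x, names_prefix = "cols")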
#> # A tibble: 3 × 5
#>      id cols1 cols2 cols3 cols4
#>   <int> <chr> <chr> <chr> <chr>
#> 1     1 2017  A     B     C    
#> 2     2 2018  X     Y     Z    
#> 3     3 2018  X     B     C

Or, in base R, you can use the split function on the vector and then use do.call to build the data frame:

df <- data.frame(x = c("2017","A","B","C","2018","X","Y","Z","2018","X","B","C"))
split_df <- setNames(split(df$x, rep(1:4, 3)), paste0("cols", 1:4))
do.call("data.frame", split_df)
#>   cols1 cols2 cols3 cols4
#> 1  2017     A     B     C
#> 2  2018     X     Y     Z
#> 3  2018     X     B     C

Created on 2022-02-01 by the reprex package (v2.0.1)
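The base R answer above hard-codes 4 columns and 3 records in rep(1:4, 3). As a rough sketch of how the same split idea might be written for an arbitrary group size, the snippet below derives the repeat count from the vector length. The variable names x, n, pos, cols, and result are illustrative and not part of the original answer, and the divisibility check is an added assumption about how you would want bad input handled.

```r
# Same sample data as in the question, as a plain character vector.
x <- c("2017", "A", "B", "C", "2018", "X", "Y", "Z", "2018", "X", "B", "C")
n <- 4  # assumed number of rows per record

# Fail early if the vector does not divide evenly into records of length n.
stopifnot(length(x) %% n == 0)

# Position of each value within its record: 1, 2, ..., n, 1, 2, ..., n, ...
pos <- rep(seq_len(n), times = length(x) / n)

# split() groups the values by position; each group becomes one column.
cols <- setNames(split(x, pos), paste0("cols", seq_len(n)))
result <- do.call("data.frame", cols)
result
```

With n = 4 this should produce the same three-row data frame as the hard-coded version.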

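The simplest approach is to build a matrix with matrix(ncol = n, byrow = TRUE), where n is the number of rows per record, and then convert it back to a data.frame. It should be pretty fast too.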

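# The native pipe |> requires R >= 4.1.
df |>
  unlist() |>                       # flatten the single column to a vector
  matrix(ncol = 4, byrow = TRUE) |> # fill row by row, 4 values per row
  as.data.frame() |>
  setNames(paste0('col', 1:4))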

  col1 col2 col3 col4
1 2017    A    B    C
2 2018    X    Y    Z
3 2018    X    B    C
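For millions of rows it may be worth wrapping the matrix approach in a small function that checks the input length first, because matrix() recycles values (with only a warning) when the length is not a multiple of ncol. The following is a minimal sketch under that assumption. rows_to_columns and its prefix argument are hypothetical names introduced here, not something from the answers above.

```r
# Hypothetical helper: reshape a one-column vector into a data frame with
# n columns, filling row by row.
rows_to_columns <- function(x, n, prefix = "col") {
  x <- unlist(x, use.names = FALSE)
  # Stop instead of letting matrix() silently recycle a short last record.
  if (length(x) %% n != 0) {
    stop("length of x (", length(x), ") is not a multiple of n (", n, ")")
  }
  m <- matrix(x, ncol = n, byrow = TRUE)
  out <- as.data.frame(m, stringsAsFactors = FALSE)
  names(out) <- paste0(prefix, seq_len(n))
  out
}

# Usage with the sample data from the question.
df <- data.frame(col1 = c("2017", "A", "B", "C", "2018", "X", "Y", "Z",
                          "2018", "X", "B", "C"))
df2 <- rows_to_columns(df$col1, n = 4)
df2
```

Every column comes back as character, so a year column like col1 would still need converting (for example with as.integer()) if you want numeric values.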