How to use a for loop to extract columns from a data frame

Question

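I am trying to use a for loop to extract columns from a data frame (named table1) and create a new data frame (named smalldata) that has only these 3 columns (ID1, ID2, ID3). I have included my code below, but it does not work.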

for (i in 1:3) {
  idlist[[i]] <- table1$ID[i]
}
smalldata <- do.call(cbind, idlist)
View(smalldata)

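Is it possible to use [i] together with $ on a data frame to extract these columns in a for loop?

Edit: The reason for looping is that my column names are numbered sequentially. For example, I have ID1-ID100, EVENT1-EVENT100, EXP1-EXP100. What I want to do in this example is create 100 datasets. First, I want to extract ID1, EVENT1, EXP1, create a dataset, and export it. Then I want to pull ID2, EVENT2, EXP2 and export, and so on. Any additional input is appreciated.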

Answer 1

If you must do it with a for loop, you can work around the problem like this:

new <- list()      # construct as list -- data.frames are fancy lists
cols <- c(1, 5, 3) # use a vector of column indices
for (i in seq_along(cols)) {
  # append the list at each column
  new[[i]] <- mtcars[, cols[i], drop = FALSE]
}

new <- as.data.frame(new)      # make list into data.frame
identical(new, mtcars[, cols]) # check that this produces the same thing
#> [1] TRUE
head(new)
#>                    mpg drat disp
#> Mazda RX4         21.0 3.90  160
#> Mazda RX4 Wag     21.0 3.90  160
#> Datsun 710        22.8 3.85  108
#> Hornet 4 Drive    21.4 3.08  258
#> Hornet Sportabout 18.7 3.15  360
#> Valiant           18.1 2.76  225
str(new)
#> 'data.frame':    32 obs. of  3 variables:
#>  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#>  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#>  $ disp: num  160 160 108 258 360 ...

Created on 2022-05-20 by the reprex package (v2.0.1)
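On the original question of combining [i] with $: table1$ID[i] looks up a column literally named ID and takes its i-th element, so it never reaches ID1, ID2, or ID3. A minimal sketch of one way around this, assuming a toy table1 (the values and the extra column are made up for illustration), is to build the column name with paste0() and index with [[ ]]:

```r
# Toy stand-in for the asker's table1; the values are made up
table1 <- data.frame(
  ID1 = 1:4,
  ID2 = 5:8,
  ID3 = 9:12,
  other = letters[1:4]
)

idlist <- list()                     # the list must exist before assigning into it
for (i in 1:3) {
  col_name <- paste0("ID", i)        # builds "ID1", "ID2", "ID3"
  # [[ ]] accepts a computed name, whereas $ only takes a literal name
  idlist[[col_name]] <- table1[[col_name]]
}
smalldata <- as.data.frame(idlist)

# Loop-free equivalent: index by a character vector of names
smalldata2 <- table1[, paste0("ID", 1:3)]
```

Both smalldata and smalldata2 contain only the three ID columns; the second form is usually preferable when no per-column processing is needed.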

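Edit

Based on the additional information, the following should work. However, a for loop does not seem necessary; the apply family of functions appears to be sufficient. If your processing does require a for loop, combining these pieces should hopefully be enough for your needs.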

data <- Reduce(
  cbind,
  lapply(
    1:20,
    function(i) {
      out <- data.frame(
        id = order(runif(5)),
        event = runif(5) < .5,
        other_col = runif(5)
      )
      colnames(out) <- paste0(colnames(out), i)
      out
    }
  )
)

# just a quick peek
str(data[, c(1:3, 9:12, 21:24)])
#> 'data.frame':    5 obs. of  11 variables:
#>  $ id1       : int  3 2 1 4 5
#>  $ event1    : logi  FALSE FALSE TRUE TRUE FALSE
#>  $ other_col1: num  0.617 0.951 0.511 0.185 0.667
#>  $ other_col3: num  0.6856 0.0524 0.5786 0.9265 0.2291
#>  $ id4       : int  4 2 1 5 3
#>  $ event4    : logi  TRUE TRUE FALSE FALSE FALSE
#>  $ other_col4: num  0.0849 0.8345 0.8465 0.1958 0.2534
#>  $ other_col7: num  0.656 0.353 0.604 0.973 0.381
#>  $ id8       : int  2 3 5 4 1
#>  $ event8    : logi  TRUE FALSE FALSE TRUE TRUE
#>  $ other_col8: num  0.646 0.693 0.534 0.624 0.625

result <- lapply(1:20, function(i) {
  # make pattern (must have letters before number)
  pattern <- paste0("[a-z]", i, "$")

  # find the column indices that match the pattern
  ind <- grep(pattern, colnames(data))

  # extract those columns, keeping a data frame even if only one matches
  res <- data[, ind, drop = FALSE]
  
  # optional: rename columns
  colnames(res) <- sub(paste0(i, "$"), "", colnames(res))
  res
})

head(result)
#> [[1]]
#>   id event other_col
#> 1  3 FALSE 0.6174577
#> 2  2 FALSE 0.9509916
#> 3  1  TRUE 0.5107370
#> 4  4  TRUE 0.1851543
#> 5  5 FALSE 0.6670226
#> 
#> [[2]]
#>   id event other_col
#> 1  3  TRUE 0.8261719
#> 2  4 FALSE 0.4171351
#> 3  1  TRUE 0.5640345
#> 4  5  TRUE 0.6825371
#> 5  2 FALSE 0.4381013
#> 
#> [[3]]
#>   id event  other_col
#> 1  4 FALSE 0.68559712
#> 2  3 FALSE 0.05241906
#> 3  2 FALSE 0.57857342
#> 4  1  TRUE 0.92649458
#> 5  5  TRUE 0.22908630
#> 
#> [[4]]
#>   id event  other_col
#> 1  4  TRUE 0.08491369
#> 2  2  TRUE 0.83452439
#> 3  1 FALSE 0.84650621
#> 4  5 FALSE 0.19578470
#> 5  3 FALSE 0.25342999
#> 
#> [[5]]
#>   id event other_col
#> 1  4 FALSE 0.8912857
#> 2  1 FALSE 0.1261470
#> 3  3 FALSE 0.7962369
#> 4  5  TRUE 0.3911494
#> 5  2 FALSE 0.6041862
#> 
#> [[6]]
#>   id event other_col
#> 1  4  TRUE 0.8987728
#> 2  2  TRUE 0.2830371
#> 3  5 FALSE 0.6696249
#> 4  3 FALSE 0.6249742
#> 5  1 FALSE 0.4754757

Created on 2022-05-22 by the reprex package (v2.0.1)
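If a for loop is preferred after all, the same grouping can be sketched with the asker's naming scheme. This is only an illustration under assumptions: the toy table1 has three groups instead of 100, the values are made up, and the output directory and the dataset_<i>.csv file names are hypothetical. Building the exact column names with paste0() avoids pattern matching altogether.

```r
# Toy wide table following the asker's scheme (ID1..ID3, EVENT1..EVENT3, EXP1..EXP3)
set.seed(1)
n_sets <- 3
table1 <- data.frame(row = 1:5)      # placeholder column to fix the row count
for (i in seq_len(n_sets)) {
  table1[[paste0("ID", i)]] <- sample(5)
  table1[[paste0("EVENT", i)]] <- runif(5) < 0.5
  table1[[paste0("EXP", i)]] <- runif(5)
}
table1$row <- NULL                   # drop the placeholder again

out_dir <- tempdir()                 # hypothetical export location
paths <- character(n_sets)
for (i in seq_len(n_sets)) {
  # exact names of the i-th group: "IDi", "EVENTi", "EXPi"
  cols <- paste0(c("ID", "EVENT", "EXP"), i)
  subset_i <- table1[, cols, drop = FALSE]
  paths[i] <- file.path(out_dir, paste0("dataset_", i, ".csv"))
  write.csv(subset_i, paths[i], row.names = FALSE)
}
```

For the real data, n_sets would be 100 and out_dir a directory of your choosing.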

Answer 2

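After seeing your edit, this answer does not directly answer your question, but it does solve your problem. I would generally reshape your data into long format and then export it by group.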

df_main <- data.frame(
  id = 1:26, # you need a row ID so you can unpivot
  ID1 = sample(letters, 26),
  event1 = sample(1:26),
  ID2 = sample(letters, 26),
  event2 = sample(1:26)
)

library(tidyr)

df_pivot <- df_main |> 
  pivot_longer(
    # don't pivot the ID column
    cols = c(everything(), -id), names_to = c("type", "number"), 
    # transform values into lists so characters and integers can be in the same column
    names_pattern = "([A-Za-z]+)(\\d+)", values_transform = as.list
  ) |> 
  pivot_wider(names_from = type, values_from = value)

library(dplyr)

df_nested <- df_pivot |> 
  group_by(number) |> 
  nest()

library(purrr)

export_data <- function(number, data) {
  # write.xlsx for exporting, maybe
  # could include the number in the file name
  print(number)
  print(data)
}

df_nested |> 
  with(
    walk2(number, data, export_data)
  )
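The export_data placeholder above only prints. A hedged sketch of a version that writes files might look like the following, assuming each list-column cell holds exactly one value (as values_transform = as.list would produce here); the dir argument and the dataset_<number>.csv file name are hypothetical choices, and CSV is used instead of xlsx to stay within base R.

```r
# Hypothetical replacement for the printing placeholder: writes one CSV per group
export_data <- function(number, data, dir = tempdir()) {
  # flatten list-columns into plain vectors (assumes one value per cell)
  flat <- as.data.frame(lapply(data, unlist), stringsAsFactors = FALSE)
  path <- file.path(dir, paste0("dataset_", number, ".csv"))
  write.csv(flat, path, row.names = FALSE)
  invisible(path)
}

# Hand-built stand-in for one nested group, so the sketch runs without tidyr
example_group <- data.frame(id = 1:3)
example_group$ID <- list("a", "b", "c")
example_group$event <- list(2L, 3L, 1L)

path <- export_data("1", example_group)
written <- read.csv(path)
```

Because the first two arguments are still number and data, this function could be passed to walk2() in the same way as the placeholder.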

Old:

Sounds like a good use case for dplyr::select

library(dplyr)

# character vector of column names
vec_column_names <- c("Species", "Petal.Width")

df_small <- iris |> 
  select(all_of(vec_column_names))

# or a vector of column positions
df_small <- iris |> 
  select(1:3)
