Loop subset over several files in a directory and output files into a new directory with a suffix

I've worked out part of the code, described below, but I'm finding it hard to iterate (loop) the function over a list of files:

library(Hmisc)
filter_173 <- c("kp|917416", "kp|835898", "kp|829747", "kp|767311") 
# This is a vector of values that I want to exclude from the files
setwd("full_path_of_directory_with_desired_files")
filepath <- "//full_path_of_directory_with_desired_files"
list.files(filepath)
predict_files <- list.files(filepath, pattern = "_predict\\.txt$")
# all files that I want to filter have _predict.txt in them
predict_full <- file.path(filepath, predict_files)
# generates full pathnames of all desired files I want to filter
sample_names <- sapply(strsplit(predict_files, "_"), `[`, 1)
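
A small sketch of the same file-name handling on a hard-coded vector, so it runs without your directory (only the first file name comes from the question; the other two are made up). `list.files()` treats `pattern` as a regular expression, so in `"predict.txt"` the `.` matches any character; escaping it and anchoring with `$` is stricter. `sub()` gives the same sample names as the `strsplit()`/`sapply()` idiom.

```r
# Hypothetical file names; only the first one appears in the question.
predict_files <- c("a550673-4308980_A05_RepliG_rep2_predict.txt",
                   "b123456-0000001_B07_RepliG_rep1_predict.txt",
                   "notes_predict_txt.csv")

# Same regex semantics as list.files(pattern = ...): escape the dot and
# anchor at the end so only names ending in "_predict.txt" are kept.
# (The unescaped pattern "predict.txt" would also match "notes_predict_txt.csv".)
predict_files <- predict_files[grepl("_predict\\.txt$", predict_files)]

# Sample name = text before the first underscore
sample_names <- sub("_.*$", "", predict_files)

# Equivalent to the strsplit()/sapply() version used above
stopifnot(identical(sample_names,
                    sapply(strsplit(predict_files, "_"), `[`, 1)))
print(sample_names)
```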

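Here is an example of the simple filtering I want to do, applied to one specific example file, and it works well. How do I repeat this in a loop over all the file names in predict_full?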
test_predict <- read.table("a550673-4308980_A05_RepliG_rep2_predict.txt", header = TRUE, sep = "\t")
# this is a file in my current working directory that I set with setwd above
test_predict_filt <- test_predict[test_predict$target_id %nin% filter_173, ]
# the comma above selects rows; without it the logical vector would index columns
write.table(test_predict_filt, file = "test_predict", sep = "\t", quote = FALSE, row.names = FALSE)
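
The single-file filter on a tiny in-memory data frame, as a sketch: the `target_id` column and `filter_173` come from the question, while the `est_counts` column and the other IDs are made up. It uses base R's `!(x %in% y)`, which does the same job as Hmisc's `%nin%` without loading the package.

```r
# Small stand-in for one *_predict.txt table (values are hypothetical)
test_predict <- data.frame(
  target_id  = c("kp|917416", "kp|100001", "kp|829747", "kp|200002"),
  est_counts = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
filter_173 <- c("kp|917416", "kp|835898", "kp|829747", "kp|767311")

# Keep rows whose target_id is NOT in filter_173.
# The comma matters: df[rows, ] selects rows, df[rows] would index columns.
keep <- !(test_predict$target_id %in% filter_173)
test_predict_filt <- test_predict[keep, ]
print(test_predict_filt)
```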

Finally, how do I put the filtered files into a separate folder, with each file keeping its original name plus a "filtered" suffix?

predict_filt <- file.path(filepath, "filtered") 
# Place filtered files in the filtered/ subdirectory
filtPreds <- file.path(predict_filt, paste0(sample_names, "_filt_predict.txt"))
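
One way to handle the output folder and names, sketched with a temporary directory standing in for `filepath` (the `tempdir()` paths and the demo folder name are assumptions for illustration). `write.table()` will not create a missing folder, so `filtered/` has to exist first. Option 1 reproduces the `filtPreds` naming; option 2 keeps the full original name and adds a `_filtered` suffix, which matters if two input files happen to share the text before their first underscore, since option 1 would then write both to the same output file.

```r
# Stand-in for the question's filepath so the sketch runs anywhere
filepath <- file.path(tempdir(), "predict_naming_demo")
predict_filt <- file.path(filepath, "filtered")

# Create filtered/ (and filepath itself, via recursive = TRUE);
# showWarnings = FALSE keeps reruns quiet when the folder already exists
dir.create(predict_filt, recursive = TRUE, showWarnings = FALSE)

predict_files <- "a550673-4308980_A05_RepliG_rep2_predict.txt"

# Option 1 (the question's scheme): <sample>_filt_predict.txt
sample_names <- sub("_.*$", "", predict_files)
out1 <- file.path(predict_filt, paste0(sample_names, "_filt_predict.txt"))

# Option 2: keep the whole original name and add "_filtered" before ".txt"
out2 <- file.path(predict_filt, sub("\\.txt$", "_filtered.txt", predict_files))

print(basename(c(out1, out2)))
```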

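I always get stuck on loops! It's hard to share a 100% reproducible example because everyone's working directory and file paths are different, but all of the code I've shared works if you adapt the path names to your machine.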

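This should loop over each file and write it to the new location using the file-naming scheme you need. Be sure to change the directory paths first.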

filter_173 <- c("kp|917416", "kp|835898", "kp|829747", "kp|767311") #This is a vector of values that I want to exclude from the files

filepath <- "//full_path_of_directory_with_desired_files"
filteredpath <- "//full_path_of_directory_with_filtered_results"
dir.create(filteredpath, showWarnings = FALSE)  # write.table() needs the folder to exist

# Get vector of predict.txt files
predict_files <- list.files(filepath, pattern = "_predict\\.txt$")

# Get vector of full paths for predict.txt files
predict_full <- file.path(filepath, predict_files) 

# Get vector of sample names
sample_names <- sapply(strsplit(predict_files, "_"), `[`, 1)

# Set for loop to go from 1 to the number of predict.txt files
# (seq_along() avoids running the loop with i = 1, 0 when no files match)
for (i in seq_along(predict_full))
{
  # Load the current file into a dataframe
  df.predict <- read.table(predict_full[i], header = TRUE, sep = "\t")

  # Filter out the unwanted rows (the comma selects rows, not columns)
  df.predict <- df.predict[!(df.predict$target_id %in% filter_173), ]

  # Write the filtered data frame to the new directory as a tab-separated file
  write.table(df.predict,
              file = file.path(filteredpath, paste0(sample_names[i], "_filt_predict.txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
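
For comparison, a self-contained sketch of the same workflow written as a helper function applied with `vapply()` instead of an indexed `for` loop. It creates two made-up tab-separated files under `tempdir()` so it runs anywhere; `filter_one()`, the demo file names and the `est_counts` column are hypothetical. Output files keep the full original name with a `_filtered` suffix, so inputs that share a sample prefix cannot overwrite each other.

```r
filter_173 <- c("kp|917416", "kp|835898", "kp|829747", "kp|767311")

# Demo directories under tempdir(); replace with your real paths
filepath     <- file.path(tempdir(), "predict_loop_demo")
filteredpath <- file.path(filepath, "filtered")
dir.create(filteredpath, recursive = TRUE, showWarnings = FALSE)

# Write two small made-up *_predict.txt files to filter
demo <- data.frame(target_id  = c("kp|917416", "kp|1", "kp|767311", "kp|2"),
                   est_counts = 1:4)
for (nm in c("s1_A01_rep1_predict.txt", "s2_A02_rep1_predict.txt")) {
  write.table(demo, file.path(filepath, nm),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Hypothetical helper: read one file, drop excluded IDs, write the result,
# and return the output path
filter_one <- function(in_file, out_dir, exclude) {
  df <- read.table(in_file, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  df <- df[!(df$target_id %in% exclude), , drop = FALSE]
  out_file <- file.path(out_dir,
                        sub("\\.txt$", "_filtered.txt", basename(in_file)))
  write.table(df, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
  out_file
}

# full.names = TRUE returns complete paths; the filtered/ subfolder is not
# matched because the pattern requires a name ending in "_predict.txt"
predict_full <- list.files(filepath, pattern = "_predict\\.txt$",
                           full.names = TRUE)
out_files <- vapply(predict_full, filter_one, character(1),
                    out_dir = filteredpath, exclude = filter_173,
                    USE.NAMES = FALSE)
print(basename(out_files))
```

To use it on real data, point `filepath` and `filteredpath` at your own directories and drop the block that writes the demo files.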