R: How to batch create multiple copies of an .R script file with different variable names?

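This is my first post here, and I am trying to follow the guidelines as best I can, so please bear with me.

I want to create a large number of similar .R script files that differ only in the variable names they use and in the strings that mention those variables. This could of course be done with search and replace, but I would like to know whether there is a more convenient way to create a batch of them quickly.

Let's take this made-up script as an example (the actual data is irrelevant here):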

prefix.AnExemplaryRandomVariable <- rnorm(n = 100, mean = 0, sd = 1)
AnotherRandomVariable.suffix <- rnorm(n = 100, mean = 10, sd = 3)

plot(prefix.AnExemplaryRandomVariable, AnotherRandomVariable.suffix,
      type = "p", pch = "*", xlab = "An Exemplary Random Variable",
      ylab = "Another Random Variable", main = "A plot of An Exemplary
      Random Variable and Another Random Variable")

My idea is to define two vectors, each holding k new names.

newNamesVar1 <- c("prefix.FirstVariable", "prefix.SomeData")
newNamesVar2 <- c("SecondVariable.suffix", "CannotThinkOfMoreNames.suffix")

The result I am looking for is k new .R files, like this:

prefix.FirstVariable <- rnorm(n = 100, mean = 0, sd = 1)
SecondVariable.suffix <- rnorm(n = 100, mean = 10, sd = 3)

plot(prefix.FirstVariable, SecondVariable.suffix, type = "p",
      pch = "*", xlab = "First Variable", ylab = "Second Variable",
      main = "A plot of First Variable and Second Variable")

prefix.SomeData <- rnorm(n = 100, mean = 0, sd = 1)
CannotThinkOfMoreNames.suffix <- rnorm(n = 100, mean = 10, sd = 3)

plot(prefix.SomeData, CannotThinkOfMoreNames.suffix, type = "p",
      pch = "*", xlab = "Some Data", ylab = "Cant Think Of More Names",
      main = "A plot of Some Data and Cannot Think Of More Names")

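I see the following two challenges:

  1. Replacing the original variable names with the corresponding vector entries.
  2. Checking whether any string resembles an original variable name and replacing it, while keeping the syntax and formatting (case, spacing, ...) intact.

This is my first attempt at using R for anything beyond actual data analysis, so I cannot even offer much of a code draft. I was able to get the variable names with ls(), but I have no idea what to do next, mainly because the changes should not be applied to the currently active file but to an entirely new one.

Any solutions, tricks, hints, or suggestions are appreciated!

Thanks!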

Here is one approach.

Setup for this answer:

writeLines('
prefix.AnExemplaryRandomVariable <- rnorm(n = 100, mean = 0, sd = 1)
AnotherRandomVariable.suffix <- rnorm(n = 100, mean = 10, sd = 3)

plot(prefix.AnExemplaryRandomVariable, AnotherRandomVariable.suffix,
      type = "p", pch = "*", xlab = "An Exemplary Random Variable",
      ylab = "Another Random Variable",
      main = "A plot of An Exemplary Random Variable and Another Random Variable")
', "template.R")

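A table of the replacement values to use, where the column names are the template strings and the column values are the replacement text.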

replacements <- data.frame(
  "An Exemplary Random Variable" = c("First Variable", "Some Data"),
  "Another Random Variable" = c("Second Variable", "Cannot Think Of More Names"),
  check.names = FALSE
)
replacements
#   An Exemplary Random Variable    Another Random Variable
# 1               First Variable            Second Variable
# 2                    Some Data Cannot Think Of More Names

Do the work of replacing each template string in template.R, storing each result in a new file.

code <- readLines("template.R")
for (row in seq_len(nrow(replacements))) {
  newcode <- code
  for (col in seq_along(replacements)) {
    if (!is.na(replacements[row,col])) {
      ptn1 <- colnames(replacements)[col] # original
      ptn2 <- gsub(" +", "", ptn1)        # "Title Case Sentence" to "TitleCaseSentence"
      repl1 <- replacements[row,col]
      repl2 <- gsub(" +", "", repl1)
      # "\\b" is the regex word boundary; a single "\b" would be a backspace character in R
      newcode <- gsub(paste0("\\b", ptn1, "\\b"), repl1,
                      gsub(paste0("\\b", ptn2, "\\b"), repl2, newcode))
    }
  }
  writeLines(newcode, sprintf("code_%s.R", row))
}

If a replacement string (the value in a particular cell of replacements) is NA, no replacement is attempted for that pattern.
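To illustrate the NA rule, here is a minimal sketch in which only the first pattern has a replacement. The `apply_row` helper, the two-line in-memory `code` vector, and the "Only This One" label are hypothetical and exist only for this example; the helper mirrors the loop's logic, with the word boundary written as `"\\b"`.

```r
# Hypothetical helper: apply one row of a replacements table to a character
# vector of code lines, skipping cells that are NA.
apply_row <- function(code, replacements, row) {
  for (col in seq_along(replacements)) {
    repl1 <- replacements[row, col]
    if (is.na(repl1)) next                 # NA: leave this pattern untouched
    ptn1 <- colnames(replacements)[col]    # "Title Case Sentence"
    ptn2 <- gsub(" +", "", ptn1)           # "TitleCaseSentence"
    repl2 <- gsub(" +", "", repl1)
    code <- gsub(paste0("\\b", ptn1, "\\b"), repl1,
                 gsub(paste0("\\b", ptn2, "\\b"), repl2, code))
  }
  code
}

# Made-up two-line template held in memory instead of a file.
code <- c("prefix.AnExemplaryRandomVariable <- 1",
          "AnotherRandomVariable.suffix <- 2")

replacements <- data.frame(
  "An Exemplary Random Variable" = "Only This One",
  "Another Random Variable" = NA_character_,   # no replacement wanted
  check.names = FALSE,
  stringsAsFactors = FALSE
)

result <- apply_row(code, replacements, 1)
print(result)
```

The first line is rewritten to use `prefix.OnlyThisOne`, while the second line is left exactly as it was because its cell is NA.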

Output files written by the loop:

  • code_1.R

    prefix.FirstVariable <- rnorm(n = 100, mean = 0, sd = 1)
    SecondVariable.suffix <- rnorm(n = 100, mean = 10, sd = 3)
    
    plot(prefix.FirstVariable, SecondVariable.suffix,
          type = "p", pch = "*", xlab = "First Variable",
          ylab = "Second Variable",
          main = "A plot of First Variable and Second Variable")
    
  • code_2.R

    prefix.SomeData <- rnorm(n = 100, mean = 0, sd = 1)
    CannotThinkOfMoreNames.suffix <- rnorm(n = 100, mean = 10, sd = 3)
    
    plot(prefix.SomeData, CannotThinkOfMoreNames.suffix,
          type = "p", pch = "*", xlab = "Some Data",
          ylab = "Cannot Think Of More Names",
          main = "A plot of Some Data and Cannot Think Of More Names")
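
As an optional sanity check, generated scripts can be run through `parse()` to confirm that the substitutions left syntactically valid R. The sketch below is an assumption about how one might do that, not part of the approach itself: it writes two small made-up files to a temporary directory (the second deliberately broken) and reports which ones parse. The `parses` helper is hypothetical.

```r
# Hypothetical helper: TRUE if the file is syntactically valid R, else FALSE.
parses <- function(f) {
  tryCatch({
    parse(file = f)
    TRUE
  }, error = function(e) FALSE)
}

# Two stand-in files in a temporary directory (not the real code_1.R/code_2.R).
files <- file.path(tempdir(), sprintf("check_%s.R", 1:2))
writeLines("prefix.FirstVariable <- rnorm(n = 100, mean = 0, sd = 1)", files[1])
writeLines("prefix.SomeData <- rnorm(n = 100, mean = 0, sd = 1", files[2])  # missing ")"

ok <- vapply(files, parses, logical(1))
print(data.frame(file = basename(files), parses = unname(ok)))
```

Pointing `files` at the real `code_*.R` outputs (for example via `list.files(pattern = "^code_.*\\.R$")`) would apply the same check to them.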
    

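Limitations:

  • Pattern strings must be contiguous on a single line, so note that I changed the template's main= string so that it does not span two lines.
  • Pattern strings cannot be directly preceded or followed by letters (or by digits or underscores, which are also word characters); using \b (the regex word boundary) allows some neighboring characters (such as a literal .), but this makes no attempt to be any fancier.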
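
The word-boundary limitation can be seen in a small sketch. The test strings below are made up for illustration; only the `\\b...\\b` pattern shape comes from the approach described here.

```r
# Pattern for the squashed name, wrapped in word boundaries.
ptn <- "\\bSomeData\\b"

# Made-up strings showing what may and may not sit next to the name.
x <- c("prefix.SomeData",   # "." before the name: boundary, so it matches
       "SomeData.suffix",   # "." after the name: boundary, so it matches
       "plot(SomeData)",    # parentheses on both sides: matches
       "AwesomeSomeData",   # letter directly before: no boundary, no match
       "SomeData2")         # digit directly after: no boundary, no match

hits <- grepl(ptn, x)
print(data.frame(string = x, matched = hits))
```

So a variable such as `SomeData2` in the template would be left alone when replacing `SomeData`, which is usually what is wanted, but it also means a pattern glued to other letters or digits is never found.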

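Edited: after finishing, I realized it might be easier to define the pattern and replacement strings with spaces, and then remove the spaces for the second (TitleCase) pattern. This avoids some of the ambiguity and trickery of splitting a string by title case. It also allows your patterns or replacements to not be title case.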
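
A short sketch of why the spaced form is the safer starting point: removing spaces is unambiguous, whereas recovering the spaces from a squashed name needs a heuristic. The `split_guess` heuristic and the example labels are assumptions made for this illustration only.

```r
# Labels defined with spaces, including ones that are not strict title case.
spaced <- c("First Variable", "Cannot Think Of More Names",
            "lower case label", "An HTML Page")

# Spaced -> squashed: a single, unambiguous substitution.
squashed <- gsub(" +", "", spaced)
print(squashed)

# Squashed -> spaced: a guess that inserts a space wherever a lowercase
# letter is followed by an uppercase letter.
split_guess <- gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", squashed)
print(split_guess)

# Which labels survive the round trip unchanged?
print(split_guess == spaced)
```

The first two labels round-trip correctly, but "lower case label" and "An HTML Page" do not, because the squashed forms no longer carry enough information to tell where the spaces were.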