Run a script in R according to logical conditions

Question

In my dataset I work with groups (strata) defined by SKU-acnumber-year. Here is a small example:

df=structure(list(SKU = c(11202L, 11202L, 11202L, 11202L, 11202L, 
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L
), stuff = c(8.85947691, 9.450108704, 10.0407405, 10.0407405, 
10.63137229, 11.22200409, 11.22200409, 11.81263588, 12.40326767, 
12.40326767, 12.40326767, 12.99389947, 13.58453126, 14.17516306, 
14.76579485, 15.94705844, 17.12832203, 17.71895382, 21.26274458, 
25.98779894, 63.19760196), action = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L), 
    acnumber = c(137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 
    137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 
    137L, 137L, 137L), year = c(2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L)), .Names = c("SKU", 
"stuff", "action", "acnumber", "year"), class = "data.frame", row.names = c(NA, 
-21L))

Very important:

The action column has only two values, 0 and 1. As this example shows, there are 3 observations with action 1 and 18 observations with action 0.

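I need to set up logical conditions. For groups that have 1 to 4 observations with action 1, script1.r should run.

For groups that have >= 5 observations with action 1, script2.r must run.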

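I picture it like this: a script3.r is created with the following content (the conditions), but I don't know how to set these logical conditions correctly.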

# take the data from SQL Server (requires the RODBC package)
library(RODBC)
dbHandle <- odbcDriverConnect("driver={SQL Server};server=;database=;trusted_connection=true")
sql <- "select needed columns"   # placeholder for the actual query
df <- sqlQuery(dbHandle, sql)



# for groups with 1 to 4 observations of stuff where action is 1, run C:/path to/script1.r
# for groups with >= 5 observations of stuff where action is 1, run C:/path to/script2.r

How can I implement this? script3.r runs on a schedule, and its job is to launch the two scripts. I just don't want to set up a separate schedule for each script.

Answer 1

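Consider putting the if logic inside by, the method for slicing a data frame by factors. Run the other scripts by calling Rscript on the command line through system() (assuming the R bin directory is on your PATH environment variable):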

by_list <- by(df, df[,c("SKU", "acnumber", "year")], function(sub) {

  if (sum(sub$action == 1) %in% c(1:4))   system("Rscript /path/to/script1.r")
  if (sum(sub$action == 1) >= 5)          system("Rscript /path/to/script2.r")

  return(sub)
})
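
One detail worth checking with the system() approach: the paths in the question contain a space (C:/path to/...), so the command may be split in the wrong place unless the path is quoted. Below is a minimal sketch, assuming two hypothetical helpers, choose_script() and build_command(), and the placeholder paths from the question. It only builds the command strings, so nothing is launched here.

```r
# Hypothetical helper (not from the original answer): map the number of
# action == 1 rows in a group to the script that should run, or NA for none.
choose_script <- function(n_action1,
                          script1 = "C:/path to/script1.r",   # placeholder path
                          script2 = "C:/path to/script2.r") { # placeholder path
  if (n_action1 >= 5) return(script2)
  if (n_action1 >= 1) return(script1)
  NA_character_
}

# Build the command line; shQuote() keeps a path with spaces as one argument.
build_command <- function(script) paste("Rscript", shQuote(script))

cmd_small <- build_command(choose_script(3))
cmd_large <- build_command(choose_script(7))
print(cmd_small)
print(cmd_large)
# system(cmd_small) would then launch the script, assuming Rscript is on PATH
```

Returning NA for a group with no action == 1 rows makes the "run nothing" case explicit, so the caller can skip the system() call for that group.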

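Better yet, source() the external scripts in the main script, making sure to wrap the entire process of each script in a function() call, and even add parameters such as the specific SKU. Otherwise, source will simply run those files as soon as they are loaded. This way, you can return output: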

source("/path/to/script1.r")   # IMPORTS script1_function()
source("/path/to/script2.r")   # IMPORTS script2_function()

by_list <- by(df, df[,c("SKU", "acnumber", "year")], function(sub) {

  current_SKU <- max(sub$SKU)   # OR min(sub$SKU) OR sub$SKU[[1]]
  output <- NULL                # stays NULL for groups with no action == 1 rows

  if (sum(sub$action == 1) %in% c(1:4))  output <- script1_function()
  if (sum(sub$action == 1) >= 5)         output <- script2_function()

  return(output)
})
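
To make the function-wrapping idea concrete, here is a minimal sketch of what the sourced files might define. The bodies of script1_function() and script2_function(), their sku argument, and the toy data frame are placeholders assumed for illustration, since the original scripts are not shown. The functions are defined inline rather than loaded through source() so that the example runs by itself, and split() with lapply() stands in for by() only to get a plain named list back.

```r
# Stand-ins for what script1.r / script2.r would define once their contents
# are wrapped in functions (bodies are hypothetical placeholders).
script1_function <- function(sku) paste("script1 logic for SKU", sku)
script2_function <- function(sku) paste("script2 logic for SKU", sku)

# Toy data: SKU 1 has 3 rows with action == 1, SKU 2 has 5, SKU 3 has 0.
toy <- data.frame(
  SKU      = rep(c(1L, 2L, 3L), each = 6),
  acnumber = 137L,
  year     = 2018L,
  action   = c(1L, 1L, 1L, 0L, 0L, 0L,
               1L, 1L, 1L, 1L, 1L, 0L,
               0L, 0L, 0L, 0L, 0L, 0L)
)

# One data frame per SKU-acnumber-year group, named like "1.137.2018"
groups <- split(toy, toy[c("SKU", "acnumber", "year")], drop = TRUE)

results <- lapply(groups, function(sub) {
  n1 <- sum(sub$action == 1)
  if (n1 >= 5) return(script2_function(sub$SKU[[1]]))
  if (n1 >= 1) return(script1_function(sub$SKU[[1]]))
  NULL   # no action == 1 rows: neither script applies
})

print(results)
```

Passing the SKU as an argument lets each wrapped script work on the group that triggered it instead of rereading the whole table.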
