bash script with parallel execution

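I am trying to use parallel in a bash script to verify whether S3 paths exist, and I validate multiple S3 paths by counting the objects in each path. If the object count is zero, the script should continue to the next date in the for loop, but with parallel it does not work as expected.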

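For the date range I provide in the for loop, those folders do not actually exist in the S3 bucket. In the function checkS3Path I create a 0 KB file when the S3 path does not exist, but I do not see those 0 KB files after running the script. In the script's output I see S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-03 instead of S3 Path Doesnt Exists folder1:+2019-10-03. Please see the output below.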

Please let me know what the problem might be.

Here is the sample code.

#!/bin/bash
#set -x
s3Bucket=testbucket
version=v20
Array=(folder1 folder2 folder3)

checkS3Path() {
  fldName=$1
  date=$2
  objectNum=$(aws s3 ls s3://${s3Bucket}/${version}/${fldName}/date=${date}/ | wc -l)
  echo $objectNum
  if [ "$objectNum" -eq  0 ]
  then
    echo "S3 Path Doesnt Exists ${fldName}:${date}" >> /app/${fldName}.log
    touch /home/ubuntu/${fldName}_${date}.txt
    continue
  else
    echo "S3 Path Consists csv Files, Proceeding to next step ${fldName}:${date}"
  fi
}

final() {
  fldName=$1
  date=$2
  checkS3Path $fldName $date
  function2 $fldName $date
  function3 $fldName $date
}

export -f final checkS3Path

for date in 2019-10-{01..03}
do
#  finalstep folder1 $date
  parallel --jobs 4 --eta finalstep ::: "${Array[@]}" ::: +"$date"
done

Here is the output I see.

$ ./test.sh
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence this citation notice: run 'parallel --citation'.


Computers / CPU cores / Max jobs to run
1:local / 4 / 4

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 14 AVG: 0.00s  local:4/0/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-01
ETA: 0s Left: 13 AVG: 0.00s  local:4/1/100%/2.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder2:+2019-10-01
ETA: 0s Left: 12 AVG: 0.00s  local:4/2/100%/1.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder3:+2019-10-01
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence this citation notice: run 'parallel --citation'.


Computers / CPU cores / Max jobs to run
1:local / 4 / 4

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 14 AVG: 0.00s  local:4/0/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-02
ETA: 0s Left: 13 AVG: 0.00s  local:4/1/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder2:+2019-10-02
ETA: 6s Left: 12 AVG: 0.50s  local:4/2/100%/0.5s 202
S3 Path Consists CSV Files, Proceeding to next step folder3:+2019-10-02
ETA: 3s Left: 11 AVG: 0.33s  local:4/3/100%/0.3s 202
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence this citation notice: run 'parallel --citation'.


Computers / CPU cores / Max jobs to run
1:local / 4 / 4

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 14 AVG: 0.00s  local:4/0/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-03
ETA: 0s Left: 13 AVG: 0.00s  local:4/1/100%/1.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder2:+2019-10-03
ETA: 0s Left: 12 AVG: 0.00s  local:4/2/100%/0.5s 202
S3 Path Consists CSV Files, Proceeding to next step folder3:+2019-10-03
ETA: 0s Left: 11 AVG: 0.00s  local:4/3/100%/0.3s 202

$

Thanks

If checkS3Path works when you run it manually, then you probably only need:

export s3Bucket=testbucket
export version=v20

Every GNU Parallel job runs in its own shell (started from Perl), which is why you need to export the variables if you want them to be visible to the job.
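
The effect can be reproduced without GNU Parallel or AWS. The sketch below is only an illustration: it uses `bash -c` as a stand-in for the fresh shell that a job runs in, and `show_path` is a hypothetical helper (not part of the original script) that prints the S3 prefix the job would try to list. Before the export the bucket name expands to nothing, so the job would look at a different path than intended.

```shell
#!/bin/bash
# Stand-in for a GNU Parallel job: run an exported function in a fresh bash.
unset s3Bucket version
s3Bucket=testbucket          # plain shell variable, not exported
export version=v20           # exported, so child processes inherit it

# Hypothetical helper: prints the prefix that checkS3Path would list.
show_path() {
  # An unset variable expands to an empty string here.
  echo "s3://${s3Bucket}/${version}/$1/date=$2/"
}
export -f show_path

# The child shell sees the exported function and $version, but not $s3Bucket.
before=$(bash -c 'show_path folder1 2019-10-01')

# After exporting, the child shell sees the bucket name as well.
export s3Bucket
after=$(bash -c 'show_path folder1 2019-10-01')

echo "before export: $before"
echo "after export:  $after"
```

The same reasoning applies to functions: only names passed to `export -f` exist inside the job, so the command given to parallel has to match an exported function name.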

See also env_parallel, which does this automatically.
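
Separately from the export issue, `continue` is only meaningful inside a loop, and a job started by parallel has no enclosing for loop to continue. One possible way to skip the remaining steps is to have the check return a status and let the caller act on it. This is a sketch under assumptions, not the original script: the object count is passed in as an argument instead of coming from `aws s3 ls`, and the `echo` lines stand in for the real follow-up steps.

```shell
#!/bin/bash
# Sketch: signal "path is empty" with a return status instead of `continue`.
checkS3Path() {
  local objectNum=$1   # assumption: count supplied by the caller, no aws call
  if [ "$objectNum" -eq 0 ]; then
    return 1           # nothing found; let the caller decide what to skip
  fi
  return 0
}

final() {
  # Stop this job early when the check fails; later steps are not run.
  checkS3Path "$1" || { echo "skip"; return 0; }
  echo "process"       # placeholder for the follow-up functions
}

empty_result=$(final 0)
nonempty_result=$(final 3)
echo "count 0: $empty_result"
echo "count 3: $nonempty_result"
```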