Create a subset in bash script with variables that meet a condition

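I am working with the following dataset (a sample can be found below), and I want to create a bash script that selects only the records meeting a set of conditions and collects all records that meet those conditions in another file.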

1. A regex to check that the continent is Asia, Africa or Europe, discarding the rest.

2. A regex to check that "Death percentage" is greater than 0.50.

3. A regex to check that "Survival Percentage" is greater than 2.00.

It is important that this is a bash script that uses these regular expressions in if conditions.

Country,Other names,ISO 3166-1 alpha-3 CODE,Population,Continent,Total Cases,Total Deaths,Tot Cases//1M pop,Tot Deaths/1M pop,Death percentage, Survival Percentage
Afghanistan,Afghanistan,AFG,40462186,Asia,177827,7671,4395,190,4.31,0.42
Albania,Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.27,9.41
Algeria,Algeria,DZA,45236699,Africa,265691,6874,5873,152,2.58,0.57
Andorra,Andorra,AND,77481,Europe,40024,153,516565,1975,0.38,51.54

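For the records that go into the new file, "Survival Percentage" and "Death percentage" must also be subtracted to create a new variable named "diff.porc.pts" that holds the absolute value of the difference in percentage points.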

The code I came up with is below, but I have no experience with bash or other languages.


read
while IFS=, read _ _ _ Continent _ _ _ _ _ Death Percentatge Survival Percentatge; do
     if [[Continent ~ /Africa|Asia|Europe/) && (Death Percentage ~ /[0].[5-9][0-9] &&(Survival 
     Percentage ~ /[2-9].[0-9][0-9]]]
          diff.porc.pts=$($Survical Percentatge/$Death Percentatge)|sed 's/-//'
          paste -sd > new_file.txt
     fi
cat new_file.txt

I have also attached an example of the desired output.

Country,Other names,ISO 3166-1 alpha-3 CODE,Population,Continent,Total Cases,Total Deaths,Tot Cases//1M pop,Tot Deaths/1M pop,Death percentage, Survival Percentage, diff.porc.pts
Afghanistan,Afghanistan,AFG,40462186,Asia,177827,7671,4395,190,4.31,0.42,3.89
Albania,Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.27,9.41,8.14
Algeria,Algeria,DZA,45236699,Africa,265691,6874,5873,152,2.58,0.57,2.01
Andorra,Andorra,AND,77481,Europe,40024,153,516565,1975,0.38,51.54,51.16

I would be grateful if you could help me complete it.

Thanks in advance.

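  • You cannot have spaces in bash variable names.
  • You misspelled Percentage as Percentatge.
  • You have the column position of Continent wrong.
  • The regex operator in bash is =~, not ~.
  • You should not enclose the regex in slashes.
  • You will need bc or another external command for arithmetic on decimal numbers.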
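
A minimal sketch of the last three points, using the values from the Albania row of the sample data. The variable names are illustrative only, and awk stands in here for the external command; piping the same expression to bc should work equally well where bc is installed.

```bash
#!/bin/bash

# Sample values from the Albania row; the variable names are illustrative.
continent="Europe"
p_death="1.27"
p_survival="9.41"

# =~ takes the regex unquoted and without surrounding slashes.
if [[ $continent =~ ^(Africa|Asia|Europe)$ ]]; then
    echo "continent matches"
fi

# With slashes, the slashes become literal characters of the pattern,
# so a plain continent name no longer matches.
if [[ $continent =~ /Europe/ ]]; then
    echo "matched with slashes"
else
    echo "no match with slashes"
fi

# bash arithmetic is integer-only, so the subtraction is delegated
# to an external command.
diff=$(awk -v a="$p_survival" -v b="$p_death" 'BEGIN { printf "%.2f", a - b }')
echo "difference: $diff"
```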

Then would you please try the following:

#!/bin/bash

nr=0
while IFS= read -r line; do
    if (( nr++ == 0 )); then            # header line: append the new column name
        echo "$line,diff.porc.pts"
    else                                # body
        # split the row on commas; only columns 5, 10 and 11 are needed
        IFS=, read -r _ _ _ _ Continent _ _ _ _ pDeath pSurvival <<< "$line"
        if [[ $Continent =~ ^(Africa|Asia|Europe)$ && $pDeath =~ ^(0\.[5-9]|[1-9]) && $pSurvival =~ ^([2-9]\.|[1-9][0-9]) ]]; then
            diff=$(echo "$pSurvival - $pDeath" | bc)
            echo "$line,${diff#-}"      # strip a leading minus sign to get the absolute value
        fi
    fi
done < input_file.txt > new_file.txt

Output:

Country,Other names,ISO 3166-1 alpha-3 CODE,Population,Continent,Total Cases,Total Deaths,Tot Cases//1M pop,Tot Deaths/1M pop,Death percentage, Survival Percentage,diff.porc.pts
Albania,Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.27,9.41,8.14

It looks like only the Albania record meets the conditions, contrary to the desired output shown.
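
To see why, each sample row can be checked against the three conditions separately. This is a minimal sketch that assumes the four data rows from the question; check_row is a hypothetical helper name, and the regular expressions are the ones used in the script.

```bash
#!/bin/bash

# check_row is a hypothetical helper: it prints the country followed by
# the names of the conditions that the CSV row passed as $1 satisfies.
check_row() {
    local country continent p_death p_survival result=""
    IFS=, read -r country _ _ _ continent _ _ _ _ p_death p_survival <<< "$1"
    [[ $continent =~ ^(Africa|Asia|Europe)$ ]] && result+=" continent"
    [[ $p_death =~ ^(0\.[5-9]|[1-9]) ]] && result+=" death"
    [[ $p_survival =~ ^([2-9]\.|[1-9][0-9]) ]] && result+=" survival"
    echo "$country:$result"
}

check_row "Afghanistan,Afghanistan,AFG,40462186,Asia,177827,7671,4395,190,4.31,0.42"
check_row "Albania,Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.27,9.41"
check_row "Algeria,Algeria,DZA,45236699,Africa,265691,6874,5873,152,2.58,0.57"
check_row "Andorra,Andorra,AND,77481,Europe,40024,153,516565,1975,0.38,51.54"
```

Afghanistan and Algeria fail the survival condition (0.42 and 0.57 are below 2.00), and Andorra fails the death condition (0.38 is below 0.50), so Albania is the only row that passes all three.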