Search the last \-delimited column and save the email addresses associated with it to a variable

I have two files.

file1.txt contains:

META GAIN CORP
GG$
ABG$
PEPRA_UAT
12GHR
CC$
USDP_MAIN
XQ$
PR$
MIX_DEV

and file2.csv contains:

\fr.usdp.org\SOLE\Home\RD,Mailbox.FRmeshare@usdp.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER FLOOR,Jay.Pau@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER FLOOR,Jay.Pau@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,Mary.White@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,Sed.Rasonn@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,Farah.Karlus@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,Mer.Sus@usdpwater.org
\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,Geboi.torm@usdpwater.org
\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,Geboi.torm@usdpwater.org
\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,Josua.Durant@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\HHR DATABASE,Geboi.torm@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\HHR DB2 EDU,Geboi.torm@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\HHR DB2 EDU,Alex.Gold@usdp.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\NICE SHORT,Leni.Braft@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\PRO DEV,Kath.wetfield@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\DUK 20154 USER,
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\DUK 20154 USER,Carlo.Gomez@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\FARE GRUST,Jason.Desanre@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\XYZ GROUP,Aaron.Lee@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\XYZ TEAM TOOLKIT,Aaron.Lee@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\BILLING ELEMENT,Matheo.Logan@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\RRT_SEC,John.Tian@usdpwater.org

I have this in my script, but when a name contains spaces I can't get the last column correctly:

for sr in `cat file1.txt`; do
    sname=`echo ${sr} | awk -F: '{ print $1 }'`
    emdrs=`grep -Fw "${sname}" file2.csv | awk -F',' '{print $2}' | sed 's/[[:space:]]//' | xargs | sed -e 's/ /,/g'`
    echo "$sname || To: $emdrs" >> details.txt
done

Output in details.txt:

META || Mary.White@usdpwater.org,Sed.Rasonn@usdpwater.org,Farah.Karlus@usdpwater.org,Mer.Sus@usdpwater.org
GAIN || Mary.White@usdpwater.org,Sed.Rasonn@usdpwater.org,Farah.Karlus@usdpwater.org,Mer.Sus@usdpwater.org
CORP || Mary.White@usdpwater.org,Sed.Rasonn@usdpwater.org,Farah.Karlus@usdpwater.org,Mer.Sus@usdpwater.org
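The name is broken into META, GAIN and CORP by the loop itself, not by awk or grep: the unquoted `cat file1.txt` substitution is split on whitespace, so every word becomes a separate iteration. A small sketch contrasting that with a line-by-line while IFS= read -r loop; the helper names by_words and by_lines and the two-line sample file are made up for the illustration:

```shell
#!/bin/sh
# Hypothetical helpers: print each value the loop variable takes,
# wrapped in [ ] so embedded spaces are visible.

# Like the for loop in the question: the unquoted $(cat ...) result is
# split on whitespace (and glob-expanded), so one line can become several items.
by_words() {
    for sr in $(cat "$1"); do
        printf '[%s]\n' "$sr"
    done
}

# Line by line: IFS= keeps leading/trailing blanks, -r keeps backslashes.
by_lines() {
    while IFS= read -r sr; do
        printf '[%s]\n' "$sr"
    done < "$1"
}

# Two-line sample in the file1.txt format, in a temporary file.
sample=$(mktemp)
printf '%s\n' 'META GAIN CORP' 'ABG$' > "$sample"
by_words "$sample"    # [META] [GAIN] [CORP] [ABG$], one per line
by_lines "$sample"    # [META GAIN CORP] [ABG$], one per line
rm -f "$sample"
```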

But what I want is:

META GAIN CORP || To: Mary.White@usdpwater.org,Sed.Rasonn@usdpwater.org,Farah.Karlus@usdpwater.org,Mer.Sus@usdpwater.org

I should also be able to search for strings containing $, such as ABG$, without including duplicate email addresses:

ABG$ || To: Geboi.torm@usdpwater.org,Josua.Durant@usdpwater.org

Any help would be appreciated.

Something like this?

while read -r sr; do
  emails="$(grep -F "\\${sr}," file2.csv | cut -d',' -f2 | tr -d '\r' | sort -u | paste -sd',')"
  if [ -n "$emails" ]; then
    echo "$sr || To: $emails"
  fi
done < file1.txt
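To try the loop end to end, here is a sketch that wraps it in a function (mail_lines is a made-up name) taking the two file names as arguments, and runs it on a trimmed copy of the sample data in a temporary directory. It builds the grep pattern as a backslash, the name and a comma ("\\${sr}," inside double quotes) so that grep -F only matches when the name is the whole last path component, and it additionally drops empty email fields (like the DUK 20154 USER row) with sed '/^$/d':

```shell
#!/bin/sh
# mail_lines NAMES CSV (hypothetical helper): for each line of NAMES print
# "name || To: e1,e2,..." built from the CSV rows whose last \-separated
# path component is exactly that name.
mail_lines() {
    while IFS= read -r sr; do
        # Inside double quotes "\\" is one backslash, so the fixed-string
        # pattern is \NAME, and cannot match a partial path component.
        # Pipeline: email column, strip CR, drop empty emails, de-duplicate,
        # join with commas.
        emails="$(grep -F "\\${sr}," "$2" | cut -d',' -f2 | tr -d '\r' |
            sed '/^$/d' | sort -u | paste -sd',' -)"
        if [ -n "$emails" ]; then
            printf '%s || To: %s\n' "$sr" "$emails"
        fi
    done < "$1"
}

# Trimmed sample data from the question, written to a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'META GAIN CORP' 'GG$' 'ABG$' > "$dir/file1.txt"
printf '%s\n' \
    '\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,Mary.White@usdpwater.org' \
    '\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,Sed.Rasonn@usdpwater.org' \
    '\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,Geboi.torm@usdpwater.org' \
    '\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,Geboi.torm@usdpwater.org' \
    '\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,Josua.Durant@usdpwater.org' \
    '\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\DUK 20154 USER,' > "$dir/file2.csv"
mail_lines "$dir/file1.txt" "$dir/file2.csv"
# META GAIN CORP || To: Mary.White@usdpwater.org,Sed.Rasonn@usdpwater.org
# ABG$ || To: Geboi.torm@usdpwater.org,Josua.Durant@usdpwater.org
rm -rf "$dir"
```

GG$ produces no line because no row ends in \GG$, and the if [ -n ... ] test suppresses it.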

Some explanation:

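  • grep -F - treat the pattern ($sr) as a fixed string rather than a regular expression, so the $ is not taken as an end-of-line anchor
  • cut -d',' -f2 - split each matching line at the comma and output only the second field
  • sort -u - remove duplicates
  • tr -d '\r' - delete carriage returns
  • paste -sd',' - join the lines with commas
  • if [ -n "$emails" ] - only print when $emails is not empty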
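The grep -F point is easy to check. In a regex, the $ in ABG$ is an end-of-line anchor, and since every row of file2.csv ends with an email address, the plain pattern matches nothing. A sketch using rows from the question plus one hypothetical row (ABG$\ARCHIVE, not in the original data) where ABG$ appears in the middle of the path:

```shell
#!/bin/sh
# Three rows from the question plus one hypothetical row (the last one)
# in which ABG$ is not the last path component.
rows='\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,Geboi.torm@usdpwater.org
\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,Geboi.torm@usdpwater.org
\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,Josua.Durant@usdpwater.org
\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$\ARCHIVE,old.owner@usdpwater.org'

# Regex: "ABG$" means ABG at end of line, so nothing matches.
printf '%s\n' "$rows" | grep -c 'ABG$'       # 0
# Fixed string: $ is literal, so every row containing ABG$ matches.
printf '%s\n' "$rows" | grep -cF 'ABG$'      # 4
# A leading \ and trailing , restrict the match to the last column.
printf '%s\n' "$rows" | grep -cF '\ABG$,'    # 3
# Rest of the pipeline: email column, de-duplicated, comma-joined.
printf '%s\n' "$rows" | grep -F '\ABG$,' | cut -d',' -f2 | sort -u | paste -sd',' -
# Geboi.torm@usdpwater.org,Josua.Durant@usdpwater.org
```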

An awk idea (to replace the OP's current for loop):

awk -F',|\\\\' '                                       # field delimiter of "," or "\"
FNR==NR { srlist[$1]
          next
        }
        { email=$NF
          if (email == "") next
          sr=$(NF-1)

          if ((sr in srlist) && !((sr, email) in seen)) {   # skip duplicate email addresses (exact match)
                seen[sr, email]
                delim=(emlist[sr]) ? "," : ""
                emlist[sr]=emlist[sr] delim email
             }
        }
END     { for (sr in emlist)
              print sr " || To: " emlist[sr]
        }
' file1.txt file2.csv

This generates:

ABG$ || To: Geboi.torm@usdpwater.org,Josua.Durant@usdpwater.org
META GAIN CORP || To: Mary.White@usdpwater.org,Sed.Rasonn@usdpwater.org,Farah.Karlus@usdpwater.org,Mer.Sus@usdpwater.org

Notes:

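  • although it's a bit more typing than the OP's current for loop, this approach needs only a single pass through file2.csv and eliminates the 7 subprocess calls made on each pass through the OP's for loop
  • for any appreciable volume of data the awk solution should be noticeably faster
  • for the provided sample data:
    • 0.65 secs: awk
    • 1.80 secs: bash/for-loop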

A shell loop is never the right way to process text; see why-is-using-a-shell-loop-to-process-text-considered-bad-practice

Using GNU awk for arrays of arrays:

$ cat tst.awk
BEGIN { FS="[\\\\,]" }
NR == FNR {
    tgts[$0]
    next
}
{
    sr = $(NF-1)
    email = $NF
}
(sr in tgts) && (email != "") {
    emails[sr][email]
}
END {
    for ( sr in emails ) {
        printf "%s || To:", sr
        sep = " "
        for ( email in emails[sr] ) {
            printf "%s%s", sep, email
            sep = ","
        }
        print ""
    }
}

$ awk -f tst.awk file1.txt file2.csv
ABG$ || To: Geboi.torm@usdpwater.org,Josua.Durant@usdpwater.org
META GAIN CORP || To: Mary.White@usdpwater.org,Farah.Karlus@usdpwater.org,Mer.Sus@usdpwater.org,Sed.Rasonn@usdpwater.org