Search the last \-delimited path component and save the email addresses associated with it to a variable
I have two files.
file1.txt contains:
META GAIN CORP
GG$
ABG$
PEPRA_UAT
12GHR
CC$
USDP_MAIN
XQ$
PR$
MIX_DEV
and file2.csv contains:
\fr.usdp.org\SOLE\Home\RD,Mailbox.FRmeshare@usdp.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER FLOOR,Jay.Pau@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER FLOOR,Jay.Pau@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,Mary.White@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,Sed.Rasonn@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,Farah.Karlus@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\META GAIN CORP,Mer.Sus@usdpwater.org
\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,Geboi.torm@usdpwater.org
\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,Geboi.torm@usdpwater.org
\fr.usdp.org\SOLE\Shares\FR\USDP WATER\ABG$,Josua.Durant@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\HHR DATABASE,Geboi.torm@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\HHR DB2 EDU,Geboi.torm@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\HHR DB2 EDU,Alex.Gold@usdp.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\NICE SHORT,Leni.Braft@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\PRO DEV,Kath.wetfield@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\DUK 20154 USER,
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\DUK 20154 USER,Carlo.Gomez@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\FARE GRUST,Jason.Desanre@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\XYZ GROUP,Aaron.Lee@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\XYZ TEAM TOOLKIT,Aaron.Lee@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\BILLING ELEMENT,Matheo.Logan@usdpwater.org
\fr.usdp.org\SOLE\SHARES\FR\USDP WATER\RRT_SEC,John.Tian@usdpwater.org
This is what I have in my script, but when a name contains spaces I cannot match the last column correctly.
for sr in `cat file1.txt`; do
    sname=`echo ${sr} | awk -F: '{ print $1 }'`
    emdrs=`grep -Fw "${sname}" file2.csv | awk -F',' '{print $2}' | sed 's/[[:space:]]//' | xargs | sed -e 's/ /,/g'`
    echo "$sname || To: $emdrs" >> details.txt
done
Output in details.txt:
META || Mary.White@usdpwater.org,Sed.Rasonn@usdpwater.org,Farah.Karlus@usdpwater.org,Mer.Sus@usdpwater.org
GAIN || Mary.White@usdpwater.org,Sed.Rasonn@usdpwater.org,Farah.Karlus@usdpwater.org,Mer.Sus@usdpwater.org
CORP || Mary.White@usdpwater.org,Sed.Rasonn@usdpwater.org,Farah.Karlus@usdpwater.org,Mer.Sus@usdpwater.org
But what I want is:
META GAIN CORP || To: Mary.White@usdpwater.org,Sed.Rasonn@usdpwater.org,Farah.Karlus@usdpwater.org,Mer.Sus@usdpwater.org
I should also be able to search for strings containing $ (for example, ABG$), and duplicate email addresses should not be included:
ABG$ || To: Geboi.torm@usdpwater.org,Josua.Durant@usdpwater.org
Any help would be appreciated.
Something like this?
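while read -r sr; do
    # "\\${sr}," expands to a backslash, the name, and a comma, so only the last path component can match
    emails="$(grep -F "\\${sr}," file2.csv | cut -d',' -f2 | sort -u | tr -d '\r' | paste -sd',')"
    if [ -n "$emails" ]; then
        echo "$sr || To: $emails"
    fi
done < file1.txt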
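Some explanations:
- grep -F: treats the pattern ($sr) as a fixed string rather than a regular expression, so that $ does not match end of line
- cut -d',' -f2: splits the result at the comma and outputs only the second part
- sort -u: removes duplicates
- tr -d '\r': deletes carriage returns
- paste -sd',': joins the lines with commas
- if [ -n "$emails" ]: prints only when $emails is not empty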
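To illustrate why the fixed string is worth anchoring on the path separator in front of the name, here is a minimal sketch. The two sample lines, the temporary file, and the variable names loose and strict are made up for this illustration and are not part of the original data.

```shell
#!/bin/sh
# Hypothetical two-line sample in the same shape as file2.csv.
sample=$(mktemp)
printf '%s\n' \
  '\srv\SHARES\ABG$,first@example.org' \
  '\srv\SHARES\XABG$,second@example.org' > "$sample"

sr='ABG$'

# Fixed string without the leading backslash: also matches "XABG$,".
loose=$(grep -F "${sr}," "$sample" | cut -d',' -f2 | paste -sd',' -)

# Fixed string anchored on the preceding path separator: only the ABG$ share.
strict=$(grep -F "\\${sr}," "$sample" | cut -d',' -f2 | paste -sd',' -)

echo "loose:  $loose"
echo "strict: $strict"
rm -f "$sample"
```

In both cases grep -F keeps the $ literal; the leading backslash only narrows the match to a complete last path component.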
One awk idea (replacing OP's current for loop):
awk -F',|\\\\' '                        # field delimiter of "," or "\"
FNR==NR { srlist[$0]                    # first file: remember each name
          next
        }
        { email=$NF
          if (email == "") next
          sr=$(NF-1)
          if (sr in srlist && emlist[sr] !~ email) {   # skip duplicate email addresses
             delim=(emlist[sr]) ? "," : ""
             emlist[sr]=emlist[sr] delim email
          }
        }
END     { for (sr in emlist)
              print sr " || To: " emlist[sr]
        }
' file1.txt file2.csv
This generates:
ABG$ || To: Geboi.torm@usdpwater.org,Josua.Durant@usdpwater.org
META GAIN CORP || To: Mary.White@usdpwater.org,Sed.Rasonn@usdpwater.org,Farah.Karlus@usdpwater.org,Mer.Sus@usdpwater.org
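Notes:
- while this is a bit more typing than OP's current for loop, this approach requires a single scan of file2.csv and eliminates 7 subprocess calls (for each pass through OP's for loop)
- for any sizeable volume of data the awk solution should be noticeably faster
- for the sample data provided:
  - 0.65 seconds: awk
  - 1.80 seconds: bash/for-loop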
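One caveat about the emlist[sr] !~ email test: it uses the address as a regular expression against the whole list, so an address that happens to match inside an already collected one would be treated as a duplicate. The sketch below compares that test with an exact comparison; the addresses and variable names are invented for the illustration.

```shell
#!/bin/sh
# Compare a regex-based "already listed?" check with an exact one.
out=$(awk 'BEGIN {
    listed = "bob.lee@example.org"      # addresses collected so far
    email  = "ob.lee@example.org"       # a different, shorter address

    # Regex test: email is used as a pattern and matches inside the longer address.
    regex_says = (listed ~ email) ? "duplicate" : "new"

    # Exact test: compare against each comma-separated entry.
    exact_says = "new"
    n = split(listed, parts, ",")
    for (i = 1; i <= n; i++)
        if (parts[i] == email) exact_says = "duplicate"

    print "regex: " regex_says
    print "exact: " exact_says
}')
printf '%s\n' "$out"
```

With the sample data in the question this difference does not show up, so it only matters if addresses can be substrings of one another.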
A shell loop is never the right approach for processing text; see why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
Using GNU awk for arrays of arrays:
$ cat tst.awk
BEGIN { FS="[\\\\,]" }      # split on backslash or comma
NR == FNR {
    tgts[$0]                # first file: remember each name
    next
}
{
    sr = $(NF-1)
    email = $NF
}
(sr in tgts) && (email != "") {
    emails[sr][email]       # array of arrays: duplicates collapse automatically
}
END {
    for ( sr in emails ) {
        printf "%s || To:", sr
        sep = " "
        for ( email in emails[sr] ) {
            printf "%s%s", sep, email
            sep = ","
        }
        print ""
    }
}
$ awk -f tst.awk file1.txt file2.csv
ABG$ || To: Geboi.torm@usdpwater.org,Josua.Durant@usdpwater.org
META GAIN CORP || To: Mary.White@usdpwater.org,Farah.Karlus@usdpwater.org,Mer.Sus@usdpwater.org,Sed.Rasonn@usdpwater.org