Command substitution for (GNU coreutils) date in gawk gensub
I have a data file that contains a large number (~5K) of lines with dates in the format yy-mm-dd.
A typical line of the file might be:
bla bla 21-04-26 blabla blabla 18-01-28 bla bla bla bla 19-01-12 blabla
For every date I need to do this kind of substitution:
$ date --date="18-01-28" "+%A, %d %B %Y"
Sunday, 28 January 2018
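The yy-mm-dd strings carry no century, so the result depends on how GNU date fills it in. The GNU coreutils manual states that a two-digit year of 68 or less gets 2000 added and any other year below 100 gets 1900 added. This small loop shows that pivot directly; on a system with a 64-bit time_t it should print 2068, 1969 and 2018.

```shell
# Print the four-digit year GNU date assigns to some two-digit-year inputs.
for d in 68-01-01 69-01-01 18-01-28; do
  printf '%s -> %s\n' "$d" "$(date --date="$d" +%Y)"
done
```

Any awk-side replacement for date has to use the same pivot to produce identical output.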
I have already solved this with sed (see the P.S. for details).
I'd like to use gawk instead.
I came up with this command:
$ gawk '{b = gensub(/([0-9]{2}-[0-9]{2}-[0-9]{2})/,"$(date --date=\"\1\" \"+%A, %d %B %Y\")", "g")}; {print b}'
The problem is that bash does not expand the date command inside gensub; instead I get:
$ echo "bla bla 21-04-26 blabla blabla 18-01-28 bla bla bla bla 19-01-12 blabla" | gawk '{b = gensub(/([0-9]{2}-[0-9]{2}-[0-9]{2})/,"$(date --date=\"\1\" \"+%A, %d %B %Y\")", "g")}; {print b}'
bla bla $(date --date="21-04-26" "+%A, %d %B %Y") blabla blabla $(date --date="18-01-28" "+%A, %d %B %Y") bla bla bla bla $(date --date="19-01-12" "+%A, %d %B %Y") blabla
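The reason is quoting: the whole awk program is inside single quotes, so bash never sees the $(...) and passes it to awk as plain characters, and awk has no command-substitution syntax of its own. A minimal demonstration (plain awk is used here; gawk behaves the same):

```shell
# Inside single quotes bash performs no expansion, so awk receives the
# characters $(echo hi) verbatim and prints them as an ordinary string.
echo x | awk '{ print "$(echo hi)" }'    # prints: $(echo hi)

# Outside single quotes bash expands $(...) before the command runs.
echo "$(echo hi)"                        # prints: hi
```

To run an external command per match, awk itself has to start it, for example with a `cmd | getline var` pipe.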
I don't know how to modify the gawk command to get the desired result:
bla bla Monday, 26 April 2021 blabla blabla Sunday, 28 January 2018 bla bla bla bla Saturday, 12 January 2019 blabla
P.S.:
Regarding sed, I solved the problem with this script:
#!/bin/bash
#pathFile hard-coded here
pathFile='./data.txt'
#threshold to avoid a "too many arguments" error with sed
maxCount=1000
counter=0
#list of unique dates in the data file
dateList=($(grep -oE "[0-9]{2}-[0-9]{2}-[0-9]{2}" "$pathFile" | sort -u))
#string to pass multiple instructions to sed
sedCommand=''
for item in "${dateList[@]}"
do
    sedCommand+="s/$item/$(date --date="$item" "+%A, %d %B %Y")/g;"
    (( counter++ ))
    if [[ $counter -gt $maxCount ]]
    then
        sed -i "$sedCommand" "$pathFile"
        counter=0
        sedCommand=''
    fi
done
[[ -n "$sedCommand" ]] && sed -i "$sedCommand" "$pathFile"
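A possible simplification of the same sed approach, sketched below as a hypothetical helper (`convert_dates_in_file` is an illustrative name, not part of the original script): GNU date's `-f FILE` option converts every line of a file in one process, and handing the generated script to sed with `-f` keeps it off the command line, so no batching threshold is needed. It assumes GNU grep, date, paste and sed, and that every extracted yy-mm-dd string is a valid date; date -f reports invalid lines on stderr without printing a result for them, which would misalign the pairs built by paste.

```shell
#!/bin/bash
# Hypothetical helper: rewrite all yy-mm-dd dates in a file in place.
convert_dates_in_file() {
  local file=$1 dates script
  dates=$(mktemp) && script=$(mktemp) || return 1

  # Unique yy-mm-dd strings found in the file, one per line
  grep -oE '[0-9]{2}-[0-9]{2}-[0-9]{2}' "$file" | sort -u > "$dates"

  if [[ -s $dates ]]; then
    # One date process converts every line of $dates; paste joins each
    # input with its result as "yy-mm-dd/long date", and the last sed
    # wraps each pair into "s/yy-mm-dd/long date/g"
    date -f "$dates" '+%A, %d %B %Y' | paste -d '/' "$dates" - \
      | sed 's|.*|s/&/g|' > "$script"
    # The generated script is read from a file, not passed as an argument
    sed -i -f "$script" "$file"
  fi

  rm -f "$dates" "$script"
}
```

Usage: `convert_dates_in_file ./data.txt`. The day and month names follow the current locale, as they do with the original script.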
Gawk has built-in date/time functions (mktime() and strftime()), which are much faster than calling the external date command.
Sample input:
# cat file
79-03-21 | 21-01-01
79-04-17 | 20-12-31
gawk script:
# cat date.awk
{
    while (match($0, /([0-9]{2})-([0-9]{2})-([0-9]{2})/, arr)) {
        # arr[1] = YY, arr[2] = MM, arr[3] = DD
        date = sprintf("%s-%s-%s", arr[1], arr[2], arr[3])
        # mktime() expects "YYYY MM DD HH MM SS"; two-digit years 69-99
        # become 19xx and 00-68 become 20xx, the same rule GNU date uses
        if (arr[1] + 0 >= 69) {
            time = sprintf("19%s %s %s 1 0 0", arr[1], arr[2], arr[3])
        } else {
            time = sprintf("20%s %s %s 1 0 0", arr[1], arr[2], arr[3])
        }
        secs = mktime(time)
        new_date = strftime("%A, %d %B %Y", secs)
        $0 = gensub(date, new_date, "g")
    }
    print
}
Result:
# gawk -f date.awk file
Wednesday, 21 March 1979 | Friday, 01 January 2021
Tuesday, 17 April 1979 | Thursday, 31 December 2020
Just to show how to do "command substitution" with awk's pipes:
$ cat foo.awk
{
    while (match($0, /([0-9]{2}-[0-9]{2}-[0-9]{2})/, arr)) {
        date = arr[1]
        cmd = "date -d " date " +'%A, %d %B %Y'"
        cmd | getline new_date
        # pipes are not closed automatically!
        close(cmd)
        $0 = gensub(date, new_date, "g")
    }
    print
}
$ cat file
79-03-21 | 21-01-01
79-04-17 | 20-12-31
$ gawk -f foo.awk file
Wednesday, 21 March 1979 | Friday, 01 January 2021
Tuesday, 17 April 1979 | Thursday, 31 December 2020
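With ~5K lines the same dates probably repeat, so a natural refinement of the pipe approach is to cache each result in an array and run date only once per distinct date. The sketch below is a hypothetical variant, not the answer's script: `conv_dates_cached` and `long_date` are illustrative names, and it swaps gensub for a walk over the line with RSTART/RLENGTH, so the matched date is never reused as a regex and a failed conversion cannot cause an endless loop. It uses only POSIX awk features plus GNU date's `-d`.

```shell
# Hypothetical helper: replace yy-mm-dd dates on stdin (or in files given as
# arguments), calling GNU date once per distinct date.
conv_dates_cached() {
  awk '
  function long_date(d,    cmd, out) {
    if (d in cache) return cache[d]          # already converted earlier
    cmd = "date -d " d " +\"%A, %d %B %Y\""
    out = ""
    if ((cmd | getline out) <= 0) out = d    # on failure keep the original text
    close(cmd)                               # pipes are not closed automatically
    return cache[d] = out
  }
  {
    line = $0; res = ""
    # Copy the text before each match, then the converted date, then continue
    while (match(line, /[0-9][0-9]-[0-9][0-9]-[0-9][0-9]/)) {
      res = res substr(line, 1, RSTART - 1) long_date(substr(line, RSTART, RLENGTH))
      line = substr(line, RSTART + RLENGTH)
    }
    print res line
  }' "$@"
}
```

Usage: `conv_dates_cached file` or `... | conv_dates_cached`. The in-gawk mktime()/strftime() version above avoids external processes entirely; the cache matters only when shelling out to date.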