How do you convert an array to a string in awk?
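The intent of this question is to provide a robust, flexible solution to a common problem.
A situation that comes up often when processing text is needing to split the input into fields, manipulate the fields, and then recombine them for printing. For example, given this input: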
$ cat file
A 7 C 3
If we want to make sure every number is in .2f format and we want to preserve the spacing before/after/between fields, then we could write this (using GNU awk for the 4th argument to split()):
$ cat tst.awk
{
    split($0,flds,FS,seps)
    for (i in flds) {
        if (flds[i] ~ /[0-9]/) {
            flds[i] = sprintf("%.2f",flds[i])
        }
    }
    #### print the flds[] array, interleaving seps[] values:
    printf "%s", seps[0]
    for (i=1; i in flds; i++) {
        printf "%s%s", flds[i], seps[i]
    }
    print ""
    #####
}
$ awk -f tst.awk file
A 7.00 C 3.00
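The 4th argument to split() is a gawk extension. For other awks, a minimal sketch of the same script follows, assuming the fields are separated by runs of blanks and tabs (as with the default FS). The splitseps() helper and the shell variable prog are hypothetical, written only for this illustration, and the program uses nothing beyond POSIX awk (match(), substr(), RSTART, RLENGTH).

```shell
# splitseps() is a hypothetical helper, not a built-in of any awk.
# It mimics gawk's split(str, flds, " ", seps) for blank/tab-separated input only.
prog='
function splitseps(str, flds, seps,    n) {
    split("", flds); split("", seps)    # portable way to empty both arrays
    n = 0
    seps[0] = ""
    if (match(str, /^[ \t]+/)) {        # leading blanks go in seps[0]
        seps[0] = substr(str, 1, RLENGTH)
        str = substr(str, RLENGTH + 1)
    }
    while (match(str, /[ \t]+/)) {      # each later blank run follows a field
        flds[++n] = substr(str, 1, RSTART - 1)
        seps[n] = substr(str, RSTART, RLENGTH)
        str = substr(str, RSTART + RLENGTH)
    }
    if (str != "") {                    # last field with no trailing blanks
        flds[++n] = str
        seps[n] = ""
    }
    return n
}
{
    n = splitseps($0, flds, seps)
    for (i = 1; i <= n; i++)
        if (flds[i] ~ /[0-9]/)
            flds[i] = sprintf("%.2f", flds[i])
    out = seps[0]                       # interleave seps[] with flds[]
    for (i = 1; i <= n; i++)
        out = out flds[i] seps[i]
    print out
}'
printf '  A 7  C 3 \n' | awk "$prog"
```

As with gawk's split() and a single-space separator, leading blanks land in seps[0] and trailing blanks in seps[n], so the printed line keeps the input's spacing. The sketch makes no attempt to handle any other FS.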
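The last loop in tst.awk, where we flatten an array into a string for printing, is common to many awk scripts. Sometimes the separators are stored in a different array than above, sometimes they are a specific character, and sometimes they aren't needed at all. Also, the order in which we want flds[] printed could be based on ascending numeric indices as above, or it could be descending (e.g. to mimic the UNIX tool "rev"), or it could be based on the flds[] values rather than their indices.
So, is there an awk utility function that converts an array to a string in a specified order using the provided separators?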
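To make the point concrete, here is a sketch of three of those variants written out by hand in POSIX awk, each needing its own loop over the same flds[] array. The shell variable prog is only there so the program can be run directly.

```shell
prog='{
    n = split($0, flds)          # default FS: fields are runs of non-blanks

    out = ""                     # 1) a specific separator character
    for (i = 1; i <= n; i++)
        out = out (i > 1 ? "," : "") flds[i]
    print out

    out = ""                     # 2) no separator at all
    for (i = 1; i <= n; i++)
        out = out flds[i]
    print out

    out = ""                     # 3) descending indices, so the fields come out reversed
    for (i = n; i >= 1; i--)
        out = out flds[i] (i > 1 ? OFS : "")
    print out
}'
printf 'A 7 C 3\n' | awk "$prog"
```

For the input `A 7 C 3` this prints `A,7,C,3`, then `A7C3`, then `3 C 7 A`.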
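We've been discussing with the GNU awk developers providing the functionality described in the question, but until/unless that arrives, the user-space, gawk-specific (because of sorted_in) function below will do the job. It takes care not to add any elements to the flds, seps, or PROCINFO arrays that did not exist before it was called. It can be used as follows: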
$ cat tst.awk
{
    split($0,flds,FS,seps)
    for (i in flds) {
        if (flds[i] ~ /[0-9]/) {
            flds[i] = sprintf("%.2f",flds[i])
        }
    }
    print "arr2str() usage examples:"
    print "1)", arr2str(flds,OFS)
    print "2)", arr2str(flds,seps)
    print "3)", arr2str(flds,seps,"@ind_num_desc")
    print "4)", arr2str(flds,seps,"@val_str_asc")
    print "5)", arr2str(flds,",")
}
$ awk -f arr2str.awk -f tst.awk file
arr2str() usage examples:
1) A 7.00 C 3.00
2) A 7.00 C 3.00
3) 3.00 C 7.00 A
4) 3.007.00 A C
5) A,7.00,C,3.00
$ cat arr2str.awk
# Usage:
# arr2str(flds[,seps[,sortOrder]])
#
# flds:
# This function converts the mandatory "flds" array argument into a string.
#
# seps:
# If "seps" is not present then the "flds" values will simply be concatenated
# in the returned string.
#
# If "seps" is present and is a string then that "seps" value will be inserted
# between each "flds" value in the returned string.
#
# If "seps" is present and is an array then each "seps" value with the same index
# as a "flds" index will be inserted in the returned string before or after
# (sort order dependent) the corresponding "flds" value with that same index.
# - All "seps" values that do not have an index in "flds" will be inserted in
# the returned string before or after all of the "flds" and other "seps" values.
# This ensures that a "seps" array that, for example, starts at zero as a result
# of a previous split(str,flds,re,seps) will have its zeroth entry included.
#
# sortOrder:
# If "sortOrder" is present then it will be used as the order the "flds" values
# are visited in, otherwise it uses PROCINFO["sorted_in"] if set, otherwise it
# uses ascending numeric indices.
# - If the sort order is descending (ends in "desc") and "seps" is an array then
# the "seps" values are inserted before each "flds" value, otherwise after them.
#
# Example:
# $ cat tst.awk
# BEGIN {
# orig = ",a+b:c-d="
# split(orig,flds,/[^[:alpha:]]/,seps)
#
# printf "orig: <%s>\n", orig
# printf "asc: <%s>\n", arr2str(flds,seps)
# printf "desc: <%s>\n", arr2str(flds,seps,"@ind_num_desc")
# }
# $ awk -f arr2str.awk -f tst.awk
# orig: <,a+b:c-d=>
# asc: <,a+b:c-d=>
# desc: <=d-c:b+a,>
function arr2str(flds, seps, sortOrder,      sortedInPresent, sortedInValue, currIdx, idxCnt, outStr) {
    if ( "sorted_in" in PROCINFO ) {
        sortedInPresent = 1
        sortedInValue = PROCINFO["sorted_in"]
    }
    if ( sortOrder == "" ) {
        sortOrder = (sortedInPresent ? sortedInValue : "@ind_num_asc")
    }
    PROCINFO["sorted_in"] = sortOrder
    if ( isarray(seps) ) {
        # An array of separators.
        if ( sortOrder ~ /desc$/ ) {
            for (currIdx in flds) {
                outStr = outStr (currIdx in seps ? seps[currIdx] : "") flds[currIdx]
            }
        }
        for (currIdx in seps) {
            if ( !(currIdx in flds) ) {
                outStr = outStr seps[currIdx]
            }
        }
        if ( sortOrder !~ /desc$/ ) {
            for (currIdx in flds) {
                outStr = outStr flds[currIdx] (currIdx in seps ? seps[currIdx] : "")
            }
        }
    }
    else {
        # Fixed scalar separator.
        # We would use this if we could distinguish an unset variable arg from a missing arg:
        #    seps = (magic_argument_present_test == true ? seps : OFS)
        # but we can't, so just use whatever value was passed in.
        for (currIdx in flds) {
            outStr = outStr (idxCnt++ ? seps : "") flds[currIdx]
        }
    }
    # Restore PROCINFO["sorted_in"] to its state before the call.
    if ( sortedInPresent ) {
        PROCINFO["sorted_in"] = sortedInValue
    }
    else {
        delete PROCINFO["sorted_in"]
    }
    return outStr
}
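arr2str() relies on gawk extensions (PROCINFO["sorted_in"], isarray(), and the seps array from split()). If only a POSIX awk is available, a much narrower sketch can still cover the common case of contiguous numeric indices 1..n with a scalar separator, in ascending or descending order. join_fields() and the shell variable prog are hypothetical names used for illustration. Because POSIX awk has no isarray(), the sketch accepts only a scalar separator; it cannot order by value, and it takes the element count as an argument (here, the value returned by split()).

```shell
# join_fields() is a hypothetical POSIX-awk helper, not part of any awk.
prog='
# join_fields(arr, n, sep, desc): join arr[1..n] with the scalar string sep,
# visiting indices n..1 instead of 1..n when desc is non-zero.
# Assumes the indices are exactly 1..n, e.g. from n = split(str, arr).
function join_fields(arr, n, sep, desc,    i, out) {
    out = ""
    if (desc) {
        for (i = n; i >= 1; i--)
            out = out arr[i] (i > 1 ? sep : "")
    } else {
        for (i = 1; i <= n; i++)
            out = out arr[i] (i < n ? sep : "")
    }
    return out
}
{
    n = split($0, flds)
    for (i = 1; i <= n; i++)
        if (flds[i] ~ /[0-9]/)
            flds[i] = sprintf("%.2f", flds[i])
    print join_fields(flds, n, OFS, 0)     # similar to arr2str(flds,OFS)
    print join_fields(flds, n, ",", 1)     # comma-separated, reversed
    print join_fields(flds, n, "", 0)      # plain concatenation
}'
printf 'A 7 C 3\n' | awk "$prog"
```

For the sample input this prints `A 7.00 C 3.00`, `3.00,C,7.00,A` and `A7.00C3.00`. Anything beyond that (value ordering, separator arrays, sparse indices) is where the gawk-specific arr2str() earns its keep.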