Dynamically print a column by column name using awk or sed
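I have a file from which I am trying to print the column named "grant (actual)" dynamically, using the column name. I am able to derive the column by trying column numbers with the command below; it is currently the 6th column:
$ awk '/--/,/Datacenter/' cas.txt | awk '{print $6}'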
(actual)
49.9%
55.4%
53.5%
48.7%
(actual)
53.1%
50.0%
47.6%
48.3%
(actual)
50.0%
51.1%
48.9%
51.3%
But I want to determine the column number dynamically, so that my script keeps working if the position of the column changes.
$ cat cas.txt
Datacenter: DC01
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        USER  grant (actual)  Host ID   Vol
DN  10.0.0.138  221.03 MiB  256   49.9%           dd09f7aa  STG1
DN  10.0.0.139  173.47 MiB  256   55.4%           53179492  STG1
DN  10.0.0.136  200.08 MiB  256   53.5%           89a28140  STG1
DN  10.0.0.137  318.69 MiB  256   48.7%           8cc9dfac  STG1
Datacenter: DC02
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        USER  grant (actual)  Host ID   Vol
DN  10.0.0.142  270.01 MiB  256   53.1%           04210b53  STG1
DN  10.0.0.143  166.65 MiB  256   50.0%           d5469c9b STG1
DN  10.0.0.140  199.51 MiB  256   47.6%           fcc38a17  STG1
DN  10.0.0.141  170.52 MiB  256   48.3%           3d7b4e59  STG1
Datacenter: DC03
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        USER  grant (actual)  Host ID   Vol
DN  10.0.0.150  229.2 MiB   256   50.0%           0fa51a1a  STG1
DN  10.0.0.151  195.88 MiB  256   51.1%           e329ac17  STG1
DN  10.0.0.148  147.01 MiB  256   48.9%           c14bd7ae  STG1
DN  10.0.0.149  298.34 MiB  256   51.3%           6c73d2b5  STG1
Consider the following example. Let the content of file.txt be
-- Able Baker Charlie
DN 1 2 3
DN 4 5 6
DN 7 8 9
-- Charlie
DN 10
DN 11
DN 12
then
awk 'BEGIN{colname="Charlie"}/--/{delete names;for(i=1;i<=NF;i+=1){names[$i]=i};next}{print $(names[colname])}' file.txt
gives output
3
6
9
10
11
12
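Explanation: I use the colname variable to store the desired column name. When a line containing -- is encountered, it is treated as a header with column names. The names array is cleared, to prevent leftovers from a previous block, and then filled so that each column name (key) maps to its position (value). After doing that I instruct GNU AWK to go to the next line, i.e. nothing is printed for the header. For every other line I instruct GNU AWK to look up the number corresponding to the selected name and print that column.
(tested in gawk 4.2.1)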
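A minimal variation on the command above, assuming the column name is passed in with -v rather than hard-coded in BEGIN. The function name print_col and the temporary sample file are hypothetical. The sketch also guards against a block whose header lacks the column: without the guard, names[colname] is empty, so $(names[colname]) evaluates to $0 and the whole line is printed.

```shell
# print_col NAME FILE: print the column called NAME from each "--" block.
# Hypothetical helper. Column names must not contain whitespace here,
# because fields are split on awk's default separator.
print_col() {
    awk -v colname="$1" '
        /^--/ {                        # header line: rebuild name -> position map
            split("", names)           # portable way to empty an array
            for (i = 1; i <= NF; i++) names[$i] = i
            next                       # do not print the header itself
        }
        (colname in names) {           # skip blocks that lack the column
            print $(names[colname])
        }
    ' "$2"
}

# Sample data modelled on file.txt above (shortened)
sample=$(mktemp)
printf '%s\n' '-- Able Baker Charlie' 'DN 1 2 3' 'DN 4 5 6' \
              '-- Charlie' 'DN 10' > "$sample"
print_col Charlie "$sample"    # prints 3, 6 and 10, one per line
rm -f "$sample"
```

Because the lookup key is a single whitespace-delimited word, this form cannot find a name such as "grant (actual)"; that needs a different field separator.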
Combining @Dan's and @Daweo's ideas:
awk -F' {2,}' -v col='grant (actual)' '
/^Datacenter/ {i=0}
$1 == "--" {for (i=1; i<=NF; i++) if ($i == col) break; next}
i {print $i}
' cas.txt
49.9%
55.4%
53.5%
48.7%
53.1%
50.0%
47.6%
48.3%
50.0%
51.1%
48.9%
51.3%
If you want to see the column header in the output, just remove the next.
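For illustration, a sketch of that header-printing variant run on a made-up two-block sample (the sample text and the variable names sample and with_header are assumptions, not part of the answer). It uses -F'  +' (two spaces followed by +) in place of ' {2,}' only because not every awk implementation supports interval expressions; the two patterns match the same separators.

```shell
# Made-up sample with columns separated by two or more spaces
sample='Datacenter: DC01
--  Address     grant (actual)  Vol
DN  10.0.0.138  49.9%           STG1
Datacenter: DC02
--  Address     grant (actual)  Vol
DN  10.0.0.142  53.1%           STG1'

# Same logic as above, but without "next", so the header row also
# reaches the last rule and its matching field is printed.
with_header=$(printf '%s\n' "$sample" |
    awk -F'  +' -v col='grant (actual)' '
        /^Datacenter/ { i = 0 }     # new block: forget the old column number
        $1 == "--"    { for (i = 1; i <= NF; i++) if ($i == col) break }
        i             { print $i }  # i stays 0 until a header has been seen
    ')
printf '%s\n' "$with_header"
```

Each block then starts with the line "grant (actual)" followed by that block's values.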
Looking at your data, we'll use split() to split the records on 2 or more spaces (/  +/):
$ awk '$1~/^--$/ {                     # -- starts the header record
    n=split($0,h,/  +/)                # get field count n of header record
    for(i=1;i<=n;i++)                  # iterate fields
        if(h[i]=="grant (actual)")     # looking for desired header
            break                      # break once found, i is the field number
}
split($0,a,/  +/)==n {                 # process records with equal amount of fields
    print a[i]                         # and output ith field
}' file
Output:
grant (actual)
49.9%
55.4%
53.5%
48.7%
grant (actual)
53.1%
47.6%
48.3%
grant (actual)
50.0%
51.1%
48.9%
51.3%
The above fails for records where the last field is separated by only 1 space:
DN  10.0.0.143  166.65 MiB  256   50.0%           d5469c9b STG1
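To see why, it may help to count the pieces each record yields when split on two or more spaces. The sketch below uses made-up, shortened lines rather than the real file, and the helper name count_fields is hypothetical.

```shell
# Print how many pieces each input line splits into on runs of 2+ spaces
count_fields() {
    awk '{ print split($0, parts, /  +/) }'
}

# Made-up records: the last one has a single space before its final field
counts=$(printf '%s\n' \
    '--  Address     grant (actual)  Host ID   Vol' \
    'DN  10.0.0.142  53.1%           04210b53  STG1' \
    'DN  10.0.0.143  50.0%           d5469c9b STG1' | count_fields)
printf '%s\n' "$counts"
```

The header and the first record split into 5 pieces, while the last record splits into only 4, so a test of the form split(...)==n rejects it and its value is missing from the output.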
Using GNU awk for FIELDWIDTHS and the 4th arg to split(), you can create an array (f[] below) that maps the column names to their numbers, and then you can print, compare, reorder or do whatever you like with the columns just by indexing that array with the column name:
$ cat tst.awk
/^--/ {
    if ( FIELDWIDTHS == "" ) {
        wids = ""
        numFlds = split($0,flds,/  +/,seps)
        for ( fldNr=1; fldNr<=numFlds; fldNr++ ) {
            f[flds[fldNr]] = fldNr
            wids = (fldNr>1 ? wids " " : "") length(flds[fldNr] seps[fldNr])
        }
        FIELDWIDTHS = wids
        $0 = $0
    }
    inBlock = 1
}
inBlock {
    if ( /^Datacenter:/ ) {
        print ""
        inBlock = 0
        next
    }
    for ( i=1; i<=NF; i++ ) {
        gsub(/^\s+|\s+$/,"",$i)
    }
    print $(f["grant (actual)"])
}
$ awk -f tst.awk cas.txt
grant (actual)
49.9%
55.4%
53.5%
48.7%
grant (actual)
53.1%
50.0%
47.6%
48.3%
grant (actual)
50.0%
51.1%
48.9%
51.3%
SYNOPSIS
An awk-based solution that:
- doesn't require gnu-gawk for FIELDWIDTHS/fixed width fields
- doesn't require fudging with FS/OFS/RS/FPAT
- doesn't require a specialized regex engine,
e.g. with back-references support
- doesn't require array-splitting or dealing with the
painfully slow match() function
- doesn't *even* require a single call to any function
INPUT
Datacenter: DC01
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
-- Address Load USER grant (actual) Host ID Vol
DN 10.0.0.138 221.03 MiB 256 49.9% dd09f7aa STG1
DN 10.0.0.139 173.47 MiB 256 55.4% 53179492 STG1
DN 10.0.0.136 200.08 MiB 256 53.5% 89a28140 STG1
DN 10.0.0.137 318.69 MiB 256 48.7% 8cc9dfac STG1
Datacenter: DC02
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
-- Address Load USER grant (actual) Host ID Vol
DN 10.0.0.142 270.01 MiB 256 53.1% 04210b53 STG1
DN 10.0.0.143 166.65 MiB 256 50.0% d5469c9b STG1
DN 10.0.0.140 199.51 MiB 256 47.6% fcc38a17 STG1
DN 10.0.0.141 170.52 MiB 256 48.3% 3d7b4e59 STG1
Datacenter: DC03
====================
Status=TRUE/FALSE
|/ State=Normal/Leaving/Joining/Moving
-- Address Load USER grant (actual) Host ID Vol
DN 10.0.0.150 229.2 MiB 256 50.0% 0fa51a1a STG1
DN 10.0.0.151 195.88 MiB 256 51.1% e329ac17 STG1
DN 10.0.0.148 147.01 MiB 256 48.9% c14bd7ae STG1
DN 10.0.0.149 298.34 MiB 256 51.3% 6c73d2b5 STG1
CODE
< cas.txt |
{m,g}awk ' !NF ? !_ : /^[=]+/ ? ($!_=!__ ? "" : " ") \
: --NF<+_ ? !_ : __+=($!_=(/%/?"":$(_-_^!_)" ")($_))^!_' \_=6
OUTPUT
1 grant (actual)
2 49.9%
3 55.4%
4 53.5%
5 48.7%
6
7 grant (actual)
8 53.1%
9 50.0%
10 47.6%
11 48.3%
12
13 grant (actual)
14 50.0%
15 51.1%
16 48.9%
17 51.3%