Bash column altering
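I have some data in columns, but some of the values throw off my column numbering, which makes processing it in Bash awkward. Below is a sample of the data I am working with (the real file has over 1 million lines). I am interested in the numbers in columns 8 and 9: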
2014-05-10 08:47:57.373 3600.633 UDP 114.31.255.90:57844 -> 42.209.2.47:52436 1.3 M 1.8 G 1
2014-05-10 09:50:39.609 3601.385 UDP 114.31.255.90:57844 -> 60.120.101.149:47403 1.0 M 1.5 G 1
2014-05-10 10:00:14.064 3607.106 UDP 114.31.255.90:57844 -> 46.83.205.250:32307 2.0 M 3.0 G 1
2014-05-10 10:03:04.263 3644.192 UDP 114.31.255.90:57844 -> 1.32.33.64:10933 987743 1.4 G 1
2014-05-10 11:07:16.247 546.764 TCP 105.51.244.36:80 -> 114.31.255.222:55580 797919 1.2 G 1
2014-05-10 10:46:15.190 2332.334 UDP 114.31.255.90:57844 -> 43.95.27.215:53394 1.1 M 1.7 G 1
2014-05-10 11:00:49.005 1458.456 UDP 114.31.255.90:57844 -> 39.150.172.138:39326 1.2 M 1.7 G 1
2014-05-09 23:53:03.625 56.271 ICMP 61.114.116.140:3 -> 114.31.255.88:0.3 2 318 1
2014-05-09 23:53:59.833 0.000 UDP 114.31.255.88:15360 -> 24.56.237.230:24752 1 131 1
2014-05-09 23:53:59.835 0.000 UDP 114.31.255.88:15360 -> 154.115.89.25:28904 1 131 1
2014-05-09 23:53:59.767 0.174 TCP 105.51.244.40:80 -> 114.31.255.41:28520 13 6675 1
2014-05-09 23:53:59.409 0.000 UDP 114.31.255.70:53 -> 114.31.255.244:54604 1 536 1
2014-05-09 23:53:59.621 0.333 TCP 105.51.244.40:80 -> 114.31.255.41:28519 16 7034 1
I use tr to squeeze every run of spaces into a single space, which simplifies processing:
tr -s ' '
This makes it easier to use:
cut -f [column number(s)] -d ' '
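However, when a value carries a G or M suffix, the suffix counts as its own field and shifts the column numbers. I would like to change, for example: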
2014-05-10 11:00:49.005 1458.456 UDP 114.31.255.90:57844 -> 39.150.172.138:39326 1.2 M 1.7 G 1
to
2014-05-10 11:00:49.005 1458.456 UDP 114.31.255.90:57844 -> 39.150.172.138:39326 1.2M 1.7G 1
I tried:
tr ' G ' 'G '
tr ' M ' 'M '
and also [:space:] in various configurations, but without success.
tr does not work the way sed does, because it translates character by character rather than matching patterns. Use sed like this:
sed 's/ \([MG] \)/\1/g'
Explanation:
/ \([MG] \)/ # match a space followed by the letter M or G and followed by another space.
# The matched letter and the trailing space are captured in group #1
/\1/ # replace the match by back-reference #1, which drops the leading space
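Putting the pieces together, the whole pipeline could look something like the sketch below. The function name extract_cols is hypothetical, and the sketch assumes that an M or G suffix is always followed by another field (as in the sample data), since the pattern needs a trailing space to match.

```shell
# Hypothetical helper: squeeze repeated spaces, glue the M/G suffix onto
# its number, then print fields 8 and 9 of each line read from stdin.
extract_cols() {
  tr -s ' ' | sed 's/ \([MG] \)/\1/g' | cut -d ' ' -f 8,9
}

# Two sample lines: one with suffixes, one without.
printf '%s\n' \
  '2014-05-10 11:00:49.005  1458.456 UDP  114.31.255.90:57844 ->  39.150.172.138:39326  1.2 M  1.7 G 1' \
  '2014-05-09 23:53:59.833     0.000 UDP  114.31.255.88:15360 ->  24.56.237.230:24752  1  131 1' \
  | extract_cols
```

For these two lines the output should be "1.2M 1.7G" and "1 131". On the real file, the same function can be fed with input redirection, for example extract_cols < yourfile, where yourfile stands in for the actual file name.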