Noncapturing patterns with vi or sed
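I have a file of about 100,000 lines. Is there a good regular expression I can use with vi or sed to transform the input file into the output below? The pipe-delimited part of each line can contain hundreds of entries.
To summarize what needs to happen: I need to capture an expression at the start of the line and then append it to every entry (that is, just before each pipe and at the end of the line).
Input: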
G1778-BRAZIL .A3_Alagoas|.A5_Amazonas|.B3_Bahia|.C4_Ceara|.D5_Distrito Federal|.E8_Espirito Santo|.G6_Goias|.G8_Guanabara
G2807-ATLANTIC OCEAN .B3_Baffin Bay|.M4_Mexico, Gulf of|.N55_North Atlantic Ocean|.N6_North Sea
Output:
G1778-BRAZIL .A3_Alagoas+G1778-BRAZIL|.A5_Amazonas+G1778-BRAZIL|.B3_Bahia+G1778-BRAZIL|.C4_Ceara+G1778-BRAZIL|.D5_Distrito Federal+G1778-BRAZIL|.E8_Espirito Santo+G1778-BRAZIL|.G6_Goias+G1778-BRAZIL|.G8_Guanabara+G1778-BRAZIL
G2807-ATLANTIC OCEAN .B3_Baffin Bay+G2807-ATLANTIC OCEAN|.M4_Mexico, Gulf of+G2807-ATLANTIC OCEAN|.N55_North Atlantic Ocean+G2807-ATLANTIC OCEAN|.N6_North Sea+G2807-ATLANTIC OCEAN
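Because the question asks for sed specifically, here is a minimal sketch of one way to do it with a loop. It assumes GNU sed (for `-E`, and for `\t` and `\n` in patterns and replacements) and a single tab between the key and the pipe-delimited list. The function name `append_key_sed` is hypothetical. The script places a newline "cursor" right after the tab. Each pass of the `:a`/`ta` loop then moves one entry from after the cursor to before it and appends `+KEY|`. A final substitution handles the last entry. The key is reinserted through `\1`, which sed copies literally, so an `&` in the key is not a problem here.

```shell
#!/bin/sh
# append_key_sed: hypothetical helper; reads "KEY<TAB>e1|e2|..." lines on stdin.
# Assumes GNU sed and exactly one tab between the key and the list.
append_key_sed() {
  sed -E '
    # Put a newline cursor right after the first tab.
    s/\t/\t\n/
    :a
    # Move the first unprocessed entry before the cursor, appending +KEY|.
    s/^([^\t]*)(\t.*)\n([^|]*)\|/\1\2\3+\1|\n/
    ta
    # Last entry has no pipe after it: append +KEY and drop the cursor.
    s/^([^\t]*)(\t.*)\n(.*)$/\1\2\3+\1/
  '
}
```

Usage: `append_key_sed < file > file.out`. Lines without a tab pass through unchanged, and the tab between key and list is preserved.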
Oh, I see what you're doing now.
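perl -F'/\s{2,}|\t|\|/' -lanE '
BEGIN { $, = " " }   # print the key and the rebuilt list separated by one space
my $key = shift @F;  # first field (split on 2+ spaces, a tab, or a pipe) is the key
say $key, join "|", map {"$_+$key"} @F
' file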
Or:
gawk -F'[[:blank:]]{2,}|\t|[|]' '{
    printf "%s ", $1                      # the key, then one space
    for (i = 2; i <= NF; i++)             # each pipe-separated entry
        printf "%s+%s%s", $i, $1, (i == NF ? ORS : "|")
}' file
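I don't know whether the first long space is a tab or multiple spaces, but this works either way, assuming the captured string contains no backreference metacharacters (such as &):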
$ awk -F'  +|\t' '{gsub(/[|]|$/,"+"$1"&")}1' file
G1778-BRAZIL .A3_Alagoas+G1778-BRAZIL|.A5_Amazonas+G1778-BRAZIL|.B3_Bahia+G1778-BRAZIL|.C4_Ceara+G1778-BRAZIL|.D5_Distrito Federal+G1778-BRAZIL|.E8_Espirito Santo+G1778-BRAZIL|.G6_Goias+G1778-BRAZIL|.G8_Guanabara+G1778-BRAZIL
G2807-ATLANTIC OCEAN .B3_Baffin Bay+G2807-ATLANTIC OCEAN|.M4_Mexico, Gulf of+G2807-ATLANTIC OCEAN|.N55_North Atlantic Ocean+G2807-ATLANTIC OCEAN|.N6_North Sea+G2807-ATLANTIC OCEAN
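The & caveat exists because gsub treats & in its replacement string as "the matched text". A key such as A&B would therefore be mangled. The following is a minimal sketch that sidesteps this. It assumes a single tab between the key and the list, and uses only POSIX awk features. The name `append_key_awk` is hypothetical. The list is split with `split()` and rebuilt by plain string concatenation, so nothing in the key is ever interpreted.

```shell
#!/bin/sh
# append_key_awk: hypothetical helper; expects "KEY<TAB>e1|e2|..." on stdin.
# Builds the output by concatenation, so & or \ in KEY are copied literally.
append_key_awk() {
  awk -F'\t' '{
    n = split($2, entry, "[|]")   # the entries after the tab
    out = $1 "\t"                 # keep the key and the tab separator
    for (i = 1; i <= n; i++) {
      sep = (i < n) ? "|" : ""    # no pipe after the last entry
      out = out entry[i] "+" $1 sep
    }
    print out
  }'
}
```

For example, `printf 'A&B\tx|y z\n' | append_key_awk` keeps `A&B` intact after every entry, where the gsub version would not.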