Noncapturing patterns with vi or sed

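I have a file with about 100,000 lines. Is there a good regular expression I can use with vi or sed to transform the input file into the output? The pipe-delimited part of each line can contain hundreds of entries.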

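To summarize what needs to be done: I need to capture the expression at the start of the line and then append it to every entry (i.e., it should appear before each pipe and at the end of the line).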

Input

G1778-BRAZIL    .A3_Alagoas|.A5_Amazonas|.B3_Bahia|.C4_Ceara|.D5_Distrito Federal|.E8_Espirito Santo|.G6_Goias|.G8_Guanabara
G2807-ATLANTIC OCEAN    .B3_Baffin Bay|.M4_Mexico, Gulf of|.N55_North Atlantic Ocean|.N6_North Sea

Output

G1778-BRAZIL    .A3_Alagoas+G1778-BRAZIL|.A5_Amazonas+G1778-BRAZIL|.B3_Bahia+G1778-BRAZIL|.C4_Ceara+G1778-BRAZIL|.D5_Distrito Federal+G1778-BRAZIL|.E8_Espirito Santo+G1778-BRAZIL|.G6_Goias+G1778-BRAZIL|.G8_Guanabara+G1778-BRAZIL
G2807-ATLANTIC OCEAN    .B3_Baffin Bay+G2807-ATLANTIC OCEAN|.M4_Mexico, Gulf of+G2807-ATLANTIC OCEAN|.N55_North Atlantic Ocean+G2807-ATLANTIC OCEAN|.N6_North Sea+G2807-ATLANTIC OCEAN

Oh, I see what you're doing now.

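perl -F'/\t+|\x20{2,}|[|]/' -lanE '
    # Fields are split on a tab, a run of 2+ spaces, or a pipe, so single
    # spaces inside names such as "ATLANTIC OCEAN" stay intact.
    BEGIN { $, = " " }
    $key = shift @F;    # first field is the key to append
    say $key, join "|", map {"$_+$key"} @F
' file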

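gawk -F'\t+|  +|[|]' '{
    # $1 is the key; print it, then every entry with "+key" appended.
    # Splitting on a tab, 2+ spaces, or a pipe keeps names with single spaces whole.
    printf "%s ", $1
    for (i=2; i<=NF; i++) printf "%s+%s%s", $i, $1, (i == NF ? ORS : "|")
}' file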

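I don't know whether the first long gap is a tab or multiple spaces, so this will work either way, assuming the captured string doesn't contain any backreference metacharacters (e.g. &):

$ awk -F'  +|\t' '{gsub(/[|]|$/,"+"$1"&")}1' file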
G1778-BRAZIL    .A3_Alagoas+G1778-BRAZIL|.A5_Amazonas+G1778-BRAZIL|.B3_Bahia+G1778-BRAZIL|.C4_Ceara+G1778-BRAZIL|.D5_Distrito Federal+G1778-BRAZIL|.E8_Espirito Santo+G1778-BRAZIL|.G6_Goias+G1778-BRAZIL|.G8_Guanabara+G1778-BRAZIL
G2807-ATLANTIC OCEAN    .B3_Baffin Bay+G2807-ATLANTIC OCEAN|.M4_Mexico, Gulf of+G2807-ATLANTIC OCEAN|.N55_North Atlantic Ocean+G2807-ATLANTIC OCEAN|.N6_North Sea+G2807-ATLANTIC OCEAN
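
If the key might contain `&` after all, one way around the caveat above is to build the line by hand instead of going through gsub's replacement string. This is only a sketch under the same assumption as before (the key ends at the first tab or run of two or more spaces); `append_key` is just a name I made up for the wrapper function.

```shell
# Hypothetical helper: append the leading key to every pipe-separated entry.
# Assumes the key is terminated by the first tab or run of 2+ spaces.
append_key() {
    awk '{
        if (match($0, /\t+|  +/)) {
            key  = substr($0, 1, RSTART - 1)       # text before the separator
            sep  = substr($0, RSTART, RLENGTH)     # the separator itself, kept as-is
            rest = substr($0, RSTART + RLENGTH)    # the pipe-delimited entries
            n = split(rest, entry, "|")
            out = key sep
            for (i = 1; i <= n; i++)
                out = out entry[i] "+" key (i < n ? "|" : "")
            print out
        } else {
            print                                  # no separator found: leave the line alone
        }
    }' "$@"
}
```

Call it as `append_key file`, or pipe lines into it. Because the key is only ever concatenated, never used as a replacement string, characters like `&` and `\` in it are copied literally, and the original separator is preserved.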