Use sed (or similar) to remove anything between repeating patterns

Question

I'm trying to tidy a large amount of data in a CSV file. I don't need any of the information inside the "quotes".

I tried sed 's/".*"/""/', but when several quoted sections sit next to each other it also removes the commas between them.

I'd like to get from this:

1,2,"a",4,"b","c",5

to this:

1,2,,4,,,5

Any sed gurus out there who can help? :)

Answer 1

You can use

sed 's/"[^"]*"//g' file > newfile

See the online sed demo:

s='1,2,"a",4,"b","c",5'
sed 's/"[^"]*"//g' <<< "$s"
# => 1,2,,4,,,5
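For comparison, here is a small sketch that runs the attempt from the question next to the negated-class version on the same sample line. The variable names greedy and fixed are only illustrative. The greedy ".*" spans from the first quote on the line to the last one, which is why the commas between the quoted sections disappear.

```shell
s='1,2,"a",4,"b","c",5'

# Attempt from the question: .* is greedy, so one match runs from the
# first " to the last " and is replaced by a single "".
greedy=$(printf '%s\n' "$s" | sed 's/".*"/""/')

# Negated class: each match stops at the next ", and g repeats the
# substitution for every quoted section on the line.
fixed=$(printf '%s\n' "$s" | sed 's/"[^"]*"//g')

printf '%s\n' "$greedy"   # 1,2,"",5
printf '%s\n' "$fixed"    # 1,2,,4,,,5
```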

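Details

The "[^"]*" pattern matches a ", then zero or more characters other than ", then the closing ". Because the RHS (the replacement) is empty, each match is deleted. The g flag makes sed replace every match on each line rather than only the first.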

Answer 2

Could you please try the following.

awk -v s1="\"" 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++){if($i~s1){$i=""}}} 1' Input_file

A non-one-liner form of the solution is:

awk -v s1="\"" '
BEGIN{
  FS=OFS=","
}
{
  for(i=1;i<=NF;i++){
    if($i~s1){
      $i=""
    }
  }
}
1
'  Input_file

Detailed explanation:

awk -v s1="\"" '         ## Start the awk program; the variable s1 holds a double quote (").
BEGIN{                   ## The BEGIN block runs once, before any input is read.
  FS=OFS=","             ## Use a comma as both the input and the output field separator.
}
{
  for(i=1;i<=NF;i++){    ## Loop over every field of the current line.
    if($i~s1){           ## If the current field contains a double quote...
      $i=""              ## ...replace the field with an empty string.
    }
  }
}
1                        ## The pattern 1 is always true, so every line (edited or not) is printed.
'  Input_file            ## Name of the input file.
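One thing worth checking before choosing the field-based approach: it splits on every comma, so it assumes quoted sections never contain a comma themselves. The sketch below uses a hypothetical input line (not from the question) where that assumption fails, and compares the result with the pattern-based sed command from Answer 1.

```shell
# Hypothetical input: the quoted field itself contains a comma.
s='1,"a,b",3'

# Field-based approach: splitting on commas cuts the quoted section into
# two fields, and both of them are blanked, leaving an extra empty field.
awk_out=$(printf '%s\n' "$s" | awk -v s1="\"" 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++){if($i~s1){$i=""}}} 1')

# Pattern-based approach: the whole quoted section is removed as one unit.
sed_out=$(printf '%s\n' "$s" | sed 's/"[^"]*"//g')

printf '%s\n' "$awk_out"   # 1,,,3
printf '%s\n' "$sed_out"   # 1,,3
```

For data like the sample in the question, where no quoted value contains a comma, both approaches give the same result.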

Answer 3

Using Perl:

perl -p -e 's/".*?"//g' file

The ? makes the * non-greedy, so each match stops at the nearest closing " instead of the last one on the line.

Output:

1,2,,4,,,5

