Perl regex to remove dashes

I'm processing some files and I want to remove the dashes from the non-date fields.

I came up with s/([^0-9]+)-([^0-9]+)/$1 $2/g, but it only works when the string contains a single dash; or rather, it only ever removes one dash.

Suppose I have:

 2014-05-01
 this-and
 this-and-that
 this-and-that-and-that-too
 2015-01-01

What regex would I use to produce

 2014-05-01
 this and
 this and that
 this and that and that too
 2015-01-01

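Lose the +: it makes the pattern match the string all the way up to the last -, including any earlier - characters. Match a single non-digit (or the start or end of the string) on each side instead, and put the captured characters back in the replacement:

s/([^0-9]|^)-+([^0-9]|$)/$1 $2/g;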

Example: https://ideone.com/r2CI7v
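
To see what the greedy + does, here is a small sketch that runs both patterns on the sample strings. It assumes the replacement puts the captured text back with $1 $2; the sub names greedy_fix and single_char_fix are made up for illustration.

```perl
use strict;
use warnings;

# Greedy version from the question: [^0-9]+ swallows earlier dashes too,
# so only the last dash in the string ends up being replaced.
sub greedy_fix {
    my ($s) = @_;
    $s =~ s/([^0-9]+)-([^0-9]+)/$1 $2/g;
    return $s;
}

# Version without the +: each dash is matched together with just one
# non-digit character (or the string boundary) on either side.
sub single_char_fix {
    my ($s) = @_;
    $s =~ s/([^0-9]|^)-+([^0-9]|$)/$1 $2/g;
    return $s;
}

for my $s ('2014-05-01', 'this-and', 'this-and-that') {
    printf "%-15s greedy: %-15s single: %s\n",
        $s, greedy_fix($s), single_char_fix($s);
}
```

One caveat: because each match consumes a character on both sides of the dash, a one-character word between two dashes, as in a-b-c, leaves the second dash in place (the result is "a b-c").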

Use lookarounds:

$ perl -pe 's/
    (?<!\d)    # a negative look-behind with a digit: \d
    -          # a dash, literal 
    (?!\d)     # a negative look-ahead  with a digit: \d
/ /gx' file

Output

 2014-05-01
 this and
 this and that
 this and that and that too
 2015-01-01

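Lookarounds are assertions that make sure there is no digit on either side of the - (in this case). A lookaround does not capture or consume anything; it only tests the assertion. A good tool to have at hand.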
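
A short sketch of the same substitution inside a script rather than a one-liner. The sub name dashes_to_spaces and the extra test strings are hypothetical. Because the lookarounds consume no characters, nothing has to be put back in the replacement, and dashes on both sides of a single character are both matched.

```perl
use strict;
use warnings;

# Replace every dash that has no digit directly before it
# and no digit directly after it.
sub dashes_to_spaces {
    my ($s) = @_;
    $s =~ s/(?<!\d)-(?!\d)/ /g;
    return $s;
}

for my $s ('2014-05-01', 'this-and-that', 'a-b-c', 'room-101') {
    my $out = dashes_to_spaces($s);
    print "$s => $out\n";
}
```

Note that a dash with a digit on just one side, as in room-101, is also left alone, which may or may not be what you want for your data.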

See:

http://www.perlmonks.org/?node_id=518444
http://www.regular-expressions.info/lookaround.html

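Don't use a single regex. There is no requirement that one regex must contain all of the code's logic.

Use one regex to check whether the string is a date, then a second regex to do the transformation. If you split it into two steps, it will be clearer to the reader (who will be you in the future).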

#!/usr/bin/perl
use warnings;
use strict;

while ( my $str = <DATA>) {
    chomp $str;
    my $old = $str;
    if ( $str !~ /^\d{4}-\d{2}-\d{2}$/ ) {  # First regex to see if it's a date
        $str =~ s/-/ /g;                    # Second regex to do the transformation
    }
    print "$old\n$str\n\n";
}
__DATA__
2014-05-01
this-and
this-and-that
this-and-that-and-that-too
2015-01-01

Running it gives you:

2014-05-01
2014-05-01

this-and
this and

this-and-that
this and that

this-and-that-and-that-too
this and that and that too

2015-01-01
2015-01-01

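This should do it (the captured non-digit has to be put back with $1, otherwise it is deleted along with the dash):

$line =~ s/(\D)-/$1 /g;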

As long as your program receives each field separately in the $_ variable, all you need is

tr/-/ / if /[^-\d]/
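
As a rough sketch of how that statement behaves when each field arrives in $_: the condition /[^-\d]/ is true when the field contains any character other than a dash or a digit, so fields made up purely of digits and dashes, such as dates, are left alone. The field list here is made-up sample data.

```perl
use strict;
use warnings;

# Sample fields, one per element (illustrative data, not from a real file).
my @fields = ('2014-05-01', 'this-and-that', '2015-01-01', 'v2-beta');

for (@fields) {
    # $_ is aliased to the array element, so tr edits it in place.
    # Translate dashes to spaces only if the field contains at least
    # one character that is neither a dash nor a digit.
    tr/-/ / if /[^-\d]/;
}

print "$_\n" for @fields;
```

Unlike a digit-based lookaround, this check looks at the field as a whole, so a mixed field such as v2-beta has its dash replaced as well.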

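As I explained in my comment, you really need to use Text::CSV to split each record into fields before editing the data. That is because data containing whitespace needs to be enclosed in double quotes, so a field like this-and-that starts out without quotes but needs them added once its hyphens have been converted to spaces.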

This program shows a simple example using your own data.

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new({eol => $/});

while (my $row = $csv->getline(\*DATA)) {
  for (@$row) {
    tr/-/ / unless /^\d\d\d\d-\d\d-\d\d$/;
  }
  $csv->print (\*STDOUT, $row);
}

__DATA__
2014-05-01,this-and-that,this-and-that,this-and-that-and-that-too,2015-01-01

Output

2014-05-01,"this and that","this and that","this and that and that too",2015-01-01