Perl regex: square brackets and single quotes

I have this string:

ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722

The data repeats.

I need to remove the []' characters from the data so that it looks like this:

ABC,-0.5,10Y,10Y,TEST,ABC.1000145721ABC,-0.5,20Y,10Y,TEST,ABC.1000145722

I am also trying to split the data and assign it to variables, like this:

my($currency, $strike, $tenor, $tenor2,$ado_symbol) = split /,/, $_;

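This works for everything except the ['TEST'] part. Should I remove the []' characters first and leave the split unchanged, or is there an easier way to do this?

Thanks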

Clean up $ado_symbol after the split:

$ado_symbol =~ s/^\['//;
$ado_symbol =~ s/'\]$//;

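It is useful to know that split takes a regular expression. (It will even let you capture, but the captured text is inserted into the returned list, which is why I use (?: for non-capturing groups.)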
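To illustrate the point about captures, here is a minimal sketch using a made-up input string (`a,b;c` is not from the question). It compares a capturing group with a non-capturing one in the split pattern:

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only to show the difference.
my $line = "a,b;c";

# Capturing group: the matched separators are returned along with the fields.
my @with_seps = split /([,;])/, $line;

# Non-capturing group: only the fields are returned.
my @fields_only = split /(?:[,;])/, $line;

print join('|', @with_seps), "\n";    # a|,|b|;|c
print join('|', @fields_only), "\n";  # a|b|c
```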

I notice that in your data the [' and '] only appear next to a delimiter, so how about:

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

while ( <DATA> ) {
  chomp;
  # Split on commas, also consuming a '] just before or a [' just after the comma
  my @fields = split /(?:'\])?,(?:\[')?/;
  print Dumper \@fields;
}

__DATA__
ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722

Output:

$VAR1 = [
          'ABC',
          '-0.5',
          '10Y',
          '10Y',
          'TEST',
          'ABC.1000145721ABC',
          '-0.5',
          '20Y',
          '10Y',
          'TEST',
          'ABC.1000145722'
        ];

my $str = "ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722";

$str =~ s/\['|'\]//g;

print $str;

The output is

ABC,-0.5,10Y,10Y,TEST,ABC.1000145721ABC,-0.5,20Y,10Y,TEST,ABC.1000145722

Now it can be split.
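Putting the two steps together might look like the sketch below. It uses the sample string and the variable names from the question. The copy into `$clean` is an assumption made here so that the original string stays unmodified. Only the first five fields are assigned, and the rest of the list is discarded.

```perl
use strict;
use warnings;

my $str = "ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722";

# Work on a copy ($clean is a name introduced for this sketch):
# strip every [' and '] sequence, then split on commas.
(my $clean = $str) =~ s/\['|'\]//g;

my ($currency, $strike, $tenor, $tenor2, $ado_symbol) = split /,/, $clean;

print "$currency $strike $tenor $tenor2 $ado_symbol\n";  # ABC -0.5 10Y 10Y TEST
```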

You can use a global regex match to find all substrings that contain no commas, single quotes, or square brackets

Like this

use strict;
use warnings 'all';

my $s = q{ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722};

my @data = $s =~ /[^,'\[\]]+/g;

my ( $currency, $strike, $tenor, $tenor2, $ado_symbol ) = @data;

# Escape the first $ so the variable name is printed literally
print "\$currency   = $currency\n";
print "\$strike     = $strike\n";
print "\$tenor      = $tenor\n";
print "\$tenor2     = $tenor2\n";
print "\$ado_symbol = $ado_symbol\n";

Output

$currency   = ABC
$strike     = -0.5
$tenor      = 10Y
$tenor2     = 10Y
$ado_symbol = TEST

Another option

my $str = "ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722";

my ($currency, $strike, $tenor, $tenor2,$ado_symbol) = map{ s/[^A-Z0-9\.-]//g; $_} split ',',$str;
print "$currency, $strike, $tenor, $tenor2, $ado_symbol",$/;

The output is:

ABC, -0.5, 10Y, 10Y, TEST
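A possible variant of the same idea, assuming Perl 5.14 or later: the `/r` flag makes `s///` return the modified copy, so the map block does not need the trailing `$_`. This is a sketch, not part of the answer above.

```perl
use strict;
use warnings;

my $str = "ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722";

# s///r (Perl 5.14+) returns the cleaned copy and leaves $_ untouched.
my ($currency, $strike, $tenor, $tenor2, $ado_symbol) =
    map { s/[^A-Z0-9.-]//gr } split /,/, $str;

print "$currency, $strike, $tenor, $tenor2, $ado_symbol\n";  # ABC, -0.5, 10Y, 10Y, TEST
```

Note that this character class keeps only uppercase letters, digits, dots, and hyphens. If the fields can contain lowercase letters or other characters, those would be stripped as well.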