Perl regex: square brackets and single quotes
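I have this string:
ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722
The data repeats.
I need to remove the [, ] and ' characters from the data so that it looks like this:
ABC,-0.5,10Y,10Y,TEST,ABC.1000145721ABC,-0.5,20Y,10Y,TEST,ABC.1000145722
I am also trying to split the data and assign it to variables, like this:
my ($currency, $strike, $tenor, $tenor2, $ado_symbol) = split /,/, $_;
This works for everything except the ['TEST'] part. Should I remove the []' characters first and keep the split as it is, or is there a simpler way to do this?
Thanks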
Clean up $ado_symbol after the split:
$ado_symbol =~ s/^\['//;
$ado_symbol =~ s/'\]$//;
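To show the two substitutions in context, here is a minimal sketch that splits one record and then cleans only the fifth field. The helper name parse_record is hypothetical (it does not appear in the question), and the sample line is a single record taken from the data above.

```perl
use strict;
use warnings;

# Hypothetical helper: split one record on commas, then strip the
# leading [' and the trailing '] from the fifth field only.
sub parse_record {
    my ($line) = @_;
    my ($currency, $strike, $tenor, $tenor2, $ado_symbol) = split /,/, $line;
    $ado_symbol =~ s/^\['//;   # remove [' at the start of the field
    $ado_symbol =~ s/'\]$//;   # remove '] at the end of the field
    return ($currency, $strike, $tenor, $tenor2, $ado_symbol);
}

my @fields = parse_record(q{ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721});
print join('|', @fields), "\n";   # prints ABC|-0.5|10Y|10Y|TEST
```

Because the substitutions are anchored with ^ and $, they only touch brackets and quotes at the edges of that one field.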
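It is useful to know that split takes a regular expression. (It even lets you capture, but the captured text is inserted into the returned list, which is why I use (?: ) for non-capturing groups.)
I notice that in your data the [' and '] only ever appear right next to the delimiter, so how about: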
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
while ( <DATA> ) {
    chomp;
    my @fields = split /(?:'\])?,(?:\[')?/;
    print Dumper \@fields;
}
__DATA__
ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722
Output:
$VAR1 = [
          'ABC',
          '-0.5',
          '10Y',
          '10Y',
          'TEST',
          'ABC.1000145721ABC',
          '-0.5',
          '20Y',
          '10Y',
          'TEST',
          'ABC.1000145722'
        ];
my $str = "ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722";
$str =~ s/\['|'\]//g;
print $str;
The output is
ABC,-0.5,10Y,10Y,TEST,ABC.1000145721ABC,-0.5,20Y,10Y,TEST,ABC.1000145722
Now you can split it.
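As a rough sketch of that last step, the substitution and the split can be combined and the first five fields assigned to the variables from the question. The helper name strip_and_split is an assumption made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every [' and '] from a copy of the line,
# then split the cleaned copy on commas.
sub strip_and_split {
    my ($line) = @_;
    (my $clean = $line) =~ s/\['|'\]//g;   # the caller's string is left unchanged
    return split /,/, $clean;
}

my $str = q{ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722};
my ($currency, $strike, $tenor, $tenor2, $ado_symbol) = strip_and_split($str);
print "$currency $strike $tenor $tenor2 $ado_symbol\n";   # prints ABC -0.5 10Y 10Y TEST
```

Only the first five fields are assigned here; the remaining fields returned by the split are discarded by the list assignment.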
You can use a global regex match to find all substrings that contain no commas, single quotes, or square brackets, like this:
use strict;
use warnings 'all';
my $s = q{ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722};
my @data = $s =~ /[^,'\[\]]+/g;
my ( $currency, $strike, $tenor, $tenor2, $ado_symbol ) = @data;
# Escape the first $ so the variable name is printed literally
print "\$currency = $currency\n";
print "\$strike = $strike\n";
print "\$tenor = $tenor\n";
print "\$tenor2 = $tenor2\n";
print "\$ado_symbol = $ado_symbol\n";
Output
$currency = ABC
$strike = -0.5
$tenor = 10Y
$tenor2 = 10Y
$ado_symbol = TEST
Another option:
my $str = "ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722";
my ($currency, $strike, $tenor, $tenor2,$ado_symbol) = map{ s/[^A-Z0-9\.-]//g; $_} split ',',$str;
print "$currency, $strike, $tenor, $tenor2, $ado_symbol",$/;
The output is:
ABC, -0.5, 10Y, 10Y, TEST
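A variant of the same idea, sketched under the assumption that Perl 5.14 or later is available: the /r modifier makes s/// return the modified copy instead of changing $_ in place, so the map block needs no trailing $_. The helper name clean_fields is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas and keep only the characters
# A-Z, 0-9, '.' and '-' in each field.
sub clean_fields {
    my ($line) = @_;
    # /r (Perl 5.14+) returns the substituted copy of each field
    return map { s/[^A-Z0-9.-]//gr } split /,/, $line;
}

my $str = q{ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722};
my ($currency, $strike, $tenor, $tenor2, $ado_symbol) = clean_fields($str);
print "$currency, $strike, $tenor, $tenor2, $ado_symbol\n";   # prints ABC, -0.5, 10Y, 10Y, TEST
```

Note that this character class is a whitelist: anything outside A-Z, 0-9, '.' and '-' is dropped, so lowercase letters or spaces in a field would be removed as well.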