RegEx for extracting data (aggregate) from raw text / table
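I am working on a regex task. I need to create a regular expression that identifies, in a text file, all rows whose first column starts with VI- and ends with (PP) or (DB), then extracts the last column of those rows and adds up the absolute values (i.e. 0.73 + 0.11 ...).
Then print the total.
Here is the attached text file.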
FEES Amount charged to authorize, process and settle card transactions, along with transaction-based and/or fixed amounts charged for specific card processing services.
MC-WORLDCARD RESTAURANT Interchange charges -.85
MC-CORP T & E I(US) BUS Interchange charges -$0.85
MC-CORP T & E I(US) CORP Interchange charges -.18
MC-WORLD ELITE RESTAURANT Interchange charges -.02
MC-HIGH VAL RESTAURANT Interchange charges -.16
MC-DOMESTIC MERIT III Interchange charges -.74
MC-RESTAURANT (DB) Interchange charges -.22
MC-DOMESTIC MERIT III (DB) Interchange charges -.03
MASTERCARD SALES DISCOUNT .006 DISC RATE TIMES 43.61 Service charges -.46
MC LICENSE VOLUME FEE .000061 DISC RATE TIMES 43.14 Service charges -$0.19
MASTERCARD DEBIT SALES DISC .006 DISC RATE TIMES 9.53 Service charges -.40
MASTERCARD AUTH FEE 96 TRANSACTIONS AT .05 Fees -.80
MC NETWORK ACCESS AUTH FEE 96 TRANSACTIONS AT .0195 Fees -.87
VISA
VI-US REGULATED COMM (DB) Interchange charges -$0.51
VI-CPS SMALL TICKET (PP) Interchange charges -$0.11
VISA ASSESSMENT FEE CR .0014 TIMES 64.33 Interchange charges -.75
VISA ASSESSMENT FEE DB .0013 TIMES 68.68 Interchange charges -.82
VI-CPS/RESTAURANT (DB) Interchange charges -.77
VI-CORPORATE TRAVEL SVC Interchange charges -.73
VI-CPS/RESTAURANT CREDIT Interchange charges -.23
VI-PURCHASING TRAVEL SVC Interchange charges -.23
VI-ELECTRONIC (US ACQ) Interchange charges -$0.46
VI-INTER PREM LAC ISS US ACQ Interchange charges -.13
VI-SIGNATURE PREFERRED CRP ELC Interchange charges -.70
VI-SIGNATURE CARD ELECTRONIC Interchange charges -.58
VI-BUSINESS CARD TR2 ELEC T&E Interchange charges -.21
VI-BUSINESS CARD TR4 ELEC Interchange charges -.97
VI-BUSINESS CARD CP (DB) Interchange charges -$0.54
VI-CPS/RESTAURANT (PP) Interchange charges -$0.73
VI-CPS/SMALL TICKET Interchange charges -.62
VI-BUSINESS CARD TR1 ELEC T&E Interchange charges -.32
VI-BUSINESS CARD TR3 ELEC T&E Interchange charges -.46
VI-CPS SMALL TICKET (DB) Interchange charges -.12
VI-US REGULATED (DB) Interchange charges -.89
VI-CPS/REWARDS 2 Interchange charges -.87
VI-US HNW CONSUMER ELECT Interchange charges -$0.81
VI-US CPS/SMALL TCKT REG (DB) Interchange charges -.58
VISA DEBIT SALES DISCOUNT .006 DISC RATE TIMES 68.68 Service charges -.01
VISA SALES DISCOUNT .006 DISC RATE TIMES 64.33 Service charges -.79
VISA AUTH FEE 280 TRANSACTIONS AT .05 Fees -.00
ACQUIRER PROCESSOR FEE DB/PP 65 TRANSACTIONS AT .0155 Fees -.01
ACQUIRER PROCESSOR FEE CREDIT 212 TRANSACTIONS AT .0195 Fees -.13
DISCOVER
DSCVR PSL REST PR Interchange charges -.01
DSCVR PSL REST PP Interchange charges -$0.86
DISCOVER ASSESSMENT FEE .0013 TIMES 0.98 Interchange charges -.25
DSCVR COMML ELECT OTHER Interchange charges -.06
DSCVR PSL EXP SVC PR Interchange charges -$0.62
DSCVR PSL EXP SVC RW Interchange charges -.62
DSCVR PSL REST RW Interchange charges -.91
DISCOVER SALES DISCOUNT .006 DISC RATE TIMES 0.98 Service charges -.77
DISCOVER DATA USAGE FEE 35 TRANSACTIONS AT .0195 Service charges -$0.68
DISCOVER AUTH FEE 35 TRANSACTIONS AT .05 Fees -.75
NETWORK AUTHORIZATION FEE 35 TRANSACTIONS AT .0025 Fees -$0.09
AMERICAN EXPRESS
AMEX AUTH FEE 17 TRANSACTIONS AT .05 Fees -$0.85
Here is the PHP code.
<?php
$file = fopen("sampledata.txt", "r") or die("Cannot open file!\n");
$regex = "/VI-\w.+?(\(PP\)|\(DB\))+/g"; // regex, but it selected the individual row > field. See the screenshot.
$total = 0;
while ($line = fgets($file, 1024)) {
    preg_match_all($regex, $line, $matches, PREG_OFFSET_CAPTURE);
    if (count($matches) > 0) {
        // sum the matching value.
    } else {
        echo "No match: ";
    }
}
fclose($file);
print_r($total);
?>
Regex result
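There is no need for preg_match_all: each line is tested separately and holds at most one amount, so preg_match is enough. PHP's PCRE functions have no g flag (global matching is what preg_match_all does), so drop it from the pattern. If you want to sum the values, you have to capture the amount at the end of the line.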
Use: /VI-.+?\((?:PP|DB)\).+?\$(\d+(?:\.\d\d)?)/
Explanation:
/               # regex delimiter
VI-             # literally VI-
.+?             # 1 or more of any character but newline, not greedy
\(              # opening parenthesis
(?:             # non-capturing group
  PP|DB         # PP or DB
)               # end group
\)              # closing parenthesis
.+?             # 1 or more of any character but newline, not greedy
\$              # literal $ sign (unescaped, $ would be an end-of-line anchor)
(               # start group 1
  \d+           # 1 or more digits
  (?:           # non-capturing group
    \.\d\d      # a dot and 2 digits
  )?            # end group, optional
)               # end group 1
/               # regex delimiter
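To see which rows the pattern accepts, here is a minimal sketch that runs it against four rows copied from the question's data. It assumes the amounts are written with a literal dollar sign (for example -$0.51); the variable names are illustrative only.

```php
<?php
// Pattern from the answer: VI- prefix, a (PP) or (DB) tag, amount captured after the $ sign.
$regex = '/VI-.+?\((?:PP|DB)\).+?\$(\d+(?:\.\d\d)?)/';

// Sample rows taken from the question's data.
$rows = [
    'VI-US REGULATED COMM (DB) Interchange charges -$0.51',
    'VI-CPS SMALL TICKET (PP) Interchange charges -$0.11',
    'VI-ELECTRONIC (US ACQ) Interchange charges -$0.46', // VI- row, but neither (PP) nor (DB)
    'DSCVR PSL REST PP Interchange charges -$0.86',      // no VI- prefix, PP without parentheses
];

$captured = [];
foreach ($rows as $row) {
    if (preg_match($regex, $row, $m)) {
        $captured[] = $m[1]; // group 1 holds the digits after the $ sign
    }
}

print_r($captured); // only the first two rows match: 0.51 and 0.11
```

Because the minus sign sits before the dollar sign and outside the capture group, group 1 is already the absolute value.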
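Code:
$file = fopen("file.txt", "r") or die("Cannot open file!\n");
// Rows containing VI- ... (PP) or (DB); group 1 captures the amount after the $ sign.
$regex = '/VI-.+?\((?:PP|DB)\).+?\$(\d+(?:\.\d\d)?)/';
$total = 0;
while (($line = fgets($file)) !== false) {
    if (preg_match($regex, $line, $matches)) {
        // The minus sign precedes the $, so the captured digits are already the absolute value.
        $total += (float) $matches[1];
    }
}
fclose($file);
echo $total, "\n";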
Output (for the given example):
20.25
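If the whole file is already in memory as one string, a global search becomes the natural fit. The following is only a sketch of that variant, not part of the answer above: the helper name sumVisaFees is hypothetical, and the ^ anchor with the m modifier is an added assumption that VI- always starts the row. The sample text uses five rows from the question, again assuming a literal dollar sign in the amounts.

```php
<?php
// Hypothetical helper (not from the answer): sums the captured amounts in a block of text.
function sumVisaFees(string $text): float
{
    // m modifier: ^ matches at the start of every line, so VI- must open the row.
    $regex = '/^VI-.+?\((?:PP|DB)\).+?\$(\d+(?:\.\d\d)?)/m';
    preg_match_all($regex, $text, $matches);
    // $matches[1] lists every group 1 capture; convert the strings to floats and add them.
    return (float) array_sum(array_map('floatval', $matches[1]));
}

// Nowdoc string, so $0 is not treated as a PHP variable.
$text = <<<'TXT'
VISA
VI-US REGULATED COMM (DB) Interchange charges -$0.51
VI-CPS SMALL TICKET (PP) Interchange charges -$0.11
VI-ELECTRONIC (US ACQ) Interchange charges -$0.46
VI-BUSINESS CARD CP (DB) Interchange charges -$0.54
VI-CPS/RESTAURANT (PP) Interchange charges -$0.73
TXT;

echo number_format(sumVisaFees($text), 2), "\n"; // prints 1.89
```

Since . does not match a newline, each match stays within a single row, so a VI- row without (PP) or (DB) cannot borrow the tag from the next row.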