RegEx for extracting data (aggregate) from raw text / table

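I'm working on a regex task. I need to write a regular expression that finds, in a text file (a table), all rows whose first column starts with VI- and ends with (PP) or (DB), extracts the absolute value of the last column of each such row, and adds them up (i.e. 0.73 + 0.11 ...).

Then print the total.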
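The selection rule itself can also be expressed with plain string functions instead of one pattern for the whole row. This is a minimal PHP 8 sketch (it relies on `str_starts_with` and `str_ends_with`), assuming the first column is separated from the rest of the row by two or more spaces and that the amount is the last token on the line, such as `-$0.73`. The helper names `firstColumn`, `isTargetRow` and `absAmount` are hypothetical.

```php
<?php
// Hypothetical helpers; assumes PHP 8 and the column layout of the sample file.

// First column = text before the first run of two or more spaces (assumed layout).
function firstColumn(string $line): string
{
    $parts = preg_split('/\s{2,}/', trim($line));
    return $parts[0];
}

// A row qualifies if its first column starts with "VI-" and ends with "(PP)" or "(DB)".
function isTargetRow(string $line): bool
{
    $first = firstColumn($line);
    return str_starts_with($first, 'VI-')
        && (str_ends_with($first, '(PP)') || str_ends_with($first, '(DB)'));
}

// Absolute value of the last token on the line, e.g. "-$0.73" -> 0.73.
function absAmount(string $line): float
{
    $tokens = preg_split('/\s+/', trim($line));
    $last = end($tokens);
    return abs((float) str_replace('$', '', $last));
}
```

A row like `VI-ELECTRONIC (US ACQ)` is rejected because its first column does not end in `(PP)` or `(DB)`.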

Here is the attached text file:


FEES          Amount charged to authorize, process and settle card transactions, along with transaction-based and/or fixed amounts charged for specific card processingservices.
            MC-WORLDCARD RESTAURANT                                                                                      Interchange charges                      -.85
            MC-CORP T & E I(US) BUS                                                                                      Interchange charges                            -$0.85
            MC-CORP T & E I(US) CORP                                                                                     Interchange charges                            -.18
            MC-WORLD ELITE RESTAURANT                                                                                    Interchange charges                      -.02
            MC-HIGH VAL RESTAURANT                                                                                       Interchange charges                            -.16
            MC-DOMESTIC MERIT III                                                                                        Interchange charges                            -.74
            MC-RESTAURANT (DB)                                                                                           Interchange charges                            -.22
            MC-DOMESTIC MERIT III (DB)                                                                                   Interchange charges                            -.03
            MASTERCARD SALES DISCOUNT .006 DISC RATE TIMES 43.61                                                        Service charges                        -.46
            MC LICENSE VOLUME FEE .000061 DISC RATE TIMES 43.14                                                         Service charges                              -$0.19
            MASTERCARD DEBIT SALES DISC .006 DISC RATE TIMES 9.53                                                       Service charges                              -.40
            MASTERCARD AUTH FEE 96 TRANSACTIONS AT .05                                                                          Fees                                    -.80
            MC NETWORK ACCESS AUTH FEE 96 TRANSACTIONS AT .0195                                                                 Fees                                    -.87
        VISA
            VI-US REGULATED COMM (DB)                                                                                    Interchange charges                            -$0.51
            VI-CPS SMALL TICKET (PP)                                                                                     Interchange charges                            -$0.11
            VISA ASSESSMENT FEE CR .0014 TIMES 64.33                                                                  Interchange charges                            -.75
            VISA ASSESSMENT FEE DB .0013 TIMES 68.68                                                                  Interchange charges                            -.82
            VI-CPS/RESTAURANT (DB)                                                                                       Interchange charges                            -.77
            VI-CORPORATE TRAVEL SVC                                                                                      Interchange charges                            -.73
            VI-CPS/RESTAURANT CREDIT                                                                                     Interchange charges                            -.23
            VI-PURCHASING TRAVEL SVC                                                                                     Interchange charges                            -.23
            VI-ELECTRONIC (US ACQ)                                                                                       Interchange charges                            -$0.46
            VI-INTER PREM LAC ISS US ACQ                                                                                 Interchange charges                            -.13
            VI-SIGNATURE PREFERRED CRP ELC                                                                               Interchange charges                      -.70
            VI-SIGNATURE CARD ELECTRONIC                                                                                 Interchange charges                      -.58
            VI-BUSINESS CARD TR2 ELEC T&E                                                                                Interchange charges                            -.21
            VI-BUSINESS CARD TR4 ELEC                                                                                    Interchange charges                            -.97
            VI-BUSINESS CARD CP (DB)                                                                                     Interchange charges                            -$0.54
            VI-CPS/RESTAURANT (PP)                                                                                       Interchange charges                            -$0.73
            VI-CPS/SMALL TICKET                                                                                          Interchange charges                            -.62
            VI-BUSINESS CARD TR1 ELEC T&E                                                                                Interchange charges                            -.32
            VI-BUSINESS CARD TR3 ELEC T&E                                                                                Interchange charges                            -.46
            VI-CPS SMALL TICKET (DB)                                                                                     Interchange charges                            -.12
            VI-US REGULATED (DB)                                                                                         Interchange charges                            -.89
            VI-CPS/REWARDS 2                                                                                             Interchange charges                      -.87
            VI-US HNW CONSUMER ELECT                                                                                     Interchange charges                            -$0.81
            VI-US CPS/SMALL TCKT REG (DB)                                                                                Interchange charges                            -.58
            VISA DEBIT SALES DISCOUNT .006 DISC RATE TIMES 68.68                                                        Service charges                        -.01
            VISA SALES DISCOUNT .006 DISC RATE TIMES 64.33                                                              Service charges                        -.79
            VISA AUTH FEE 280 TRANSACTIONS AT .05                                                                               Fees                              -.00
            ACQUIRER PROCESSOR FEE DB/PP 65 TRANSACTIONS AT .0155                                                               Fees                                    -.01
            ACQUIRER PROCESSOR FEE CREDIT 212 TRANSACTIONS AT .0195                                                             Fees                                    -.13
        DISCOVER
            DSCVR PSL REST PR                                                                                            Interchange charges                            -.01
            DSCVR PSL REST PP                                                                                            Interchange charges                            -$0.86
            DISCOVER ASSESSMENT FEE .0013 TIMES 0.98                                                                  Interchange charges                            -.25
            DSCVR COMML ELECT OTHER                                                                                      Interchange charges                            -.06
            DSCVR PSL EXP SVC PR                                                                                         Interchange charges                            -$0.62
            DSCVR PSL EXP SVC RW                                                                                         Interchange charges                            -.62
            DSCVR PSL REST RW                                                                                            Interchange charges                      -.91
            DISCOVER SALES DISCOUNT .006 DISC RATE TIMES 0.98                                                           Service charges                              -.77
            DISCOVER DATA USAGE FEE 35 TRANSACTIONS AT .0195                                                               Service charges                              -$0.68
            DISCOVER AUTH FEE 35 TRANSACTIONS AT .05                                                                            Fees                                    -.75
            NETWORK AUTHORIZATION FEE 35 TRANSACTIONS AT .0025                                                                  Fees                                    -$0.09
        AMERICAN EXPRESS
            AMEX AUTH FEE 17 TRANSACTIONS AT .05                                                                                Fees                                    -$0.85

Here is the PHP code:

<?php 
    $file = fopen("sampledata.txt", "r") or die("Cannot open file!\n"); 

    $regex = "/VI-\w.+?(\(PP\)|\(DB\))+/g"; // my regex, but it only selects the first field of each row (see the screenshot)
    $total = 0;

    while ($line = fgets($file, 1024)) { 

        preg_match_all($regex, $line, $matches, PREG_OFFSET_CAPTURE);

        if (count($matches) > 0) { 

            // sum the matching value.
        } else { 
            echo "No match: "; 
        } 
    }

    fclose($file);

    print_r($total);

?>
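Two details of the code above account for part of the problem. PHP's PCRE functions have no `g` modifier (a pattern ending in `/g` is rejected with an "Unknown modifier" warning), because global matching is what `preg_match_all` already does. And with the default `PREG_PATTERN_ORDER`, `$matches` always holds one array per group, even when nothing matched, so `count($matches) > 0` is always true. The small check below uses the question's pattern without `/g`; it also shows that a successful match stops at `(PP)` or `(DB)` and never reaches the amount.

```php
<?php
// The question's pattern, minus the unsupported /g modifier.
$regex = '/VI-\w.+?(\(PP\)|\(DB\))+/';

// A row from the sample that should not match (it is not a VI- row).
$line = 'MC-CORP T & E I(US) BUS    Interchange charges    -$0.85';

$found = preg_match_all($regex, $line, $matches);
echo $found, "\n";           // 0: no match on this line
echo count($matches), "\n";  // 2: one (empty) array for the full match, one for group 1

// On a VI- row the match ends at "(PP)"; the amount is not part of it.
preg_match_all($regex, 'VI-CPS/RESTAURANT (PP)    Interchange charges    -$0.73', $hit);
echo $hit[0][0], "\n";       // VI-CPS/RESTAURANT (PP)
```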

[Screenshot: regex result]

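You don't need preg_match_all. PHP has no g modifier (global matching is exactly what preg_match_all does), and since the file is read one line at a time and each row holds at most one amount, preg_match is enough. To sum the amounts, you also have to capture the value at the end of each line.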
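The difference between the two functions shows up on any short string (the one below is made up for illustration): `preg_match` stops at the first match and returns 1 or 0, while `preg_match_all` returns the number of matches it found.

```php
<?php
$text = 'a1 b2 c3';

$first = preg_match('/\d/', $text, $m);      // stops at the first digit
$all   = preg_match_all('/\d/', $text, $mm); // collects every digit

echo $first, ' ', $m[0], "\n";               // 1 1
echo $all, ' ', implode(',', $mm[0]), "\n";  // 3 1,2,3
```

In a per-line loop where each line has at most one hit, the two give the same answer, and `preg_match` keeps `$matches` flat (`$matches[1]` is a string, not an array).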

Use: /VI-.+?\((?:PP|DB)\).+?\$(\d+(?:\.\d\d)?)/

Explanation:

/               # regex delimiter
  VI-           # literally VI-
  .+?           # 1 or more any character but newline, not greedy
  \(            # opening parenthesis
  (?:           # non capture group
    PP|DB       # PP or DB
  )             # end group
  \)            # closing parenthesis
  .+?           # 1 or more any character but newline, not greedy
  \$            # literal $ sign
  (             # start group 1
    \d+         # 1 or more digits
    (?:         # non capture group
      \.\d\d    # a dot and 2 digits
    )?          # end group, optional
  )             # end group 1
/               # regex delimiter
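To check the pattern, it can be run against a few rows taken from the sample data (column padding shortened for readability). The dollar sign is escaped as `\$` so it matches the literal `$` before the amount; unescaped, `$` is an end-of-line anchor and the pattern can never match.

```php
<?php
// The answer's pattern, with \$ matching the literal dollar sign.
$regex = '/VI-.+?\((?:PP|DB)\).+?\$(\d+(?:\.\d\d)?)/';

// Rows from the sample data, with the column padding shortened.
$rows = [
    'VI-US REGULATED COMM (DB)    Interchange charges    -$0.51',
    'VI-CPS SMALL TICKET (PP)    Interchange charges    -$0.11',
    'VI-ELECTRONIC (US ACQ)    Interchange charges    -$0.46',
    'VI-US HNW CONSUMER ELECT    Interchange charges    -$0.81',
    'VI-CPS/RESTAURANT (PP)    Interchange charges    -$0.73',
];

foreach ($rows as $row) {
    if (preg_match($regex, $row, $m)) {
        echo "match:    $m[1]\n";   // group 1 = the amount, without "-" and "$"
    } else {
        echo "no match: $row\n";
    }
}
```

Only the (PP) and (DB) rows produce a capture (0.51, 0.11 and 0.73), and because the minus sign sits before `\$`, the captured value is already the absolute value.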

Code:

<?php
$file = fopen("file.txt", "r") or die("Cannot open file!\n");

// \$ matches the literal dollar sign in front of the amount
$regex = '/VI-.+?\((?:PP|DB)\).+?\$(\d+(?:\.\d\d)?)/';
$total = 0;
while (($line = fgets($file, 1024)) !== false) {
    if (preg_match($regex, $line, $matches)) {
        $total += $matches[1];   // group 1: the amount, without "-" and "$"
    }
}
fclose($file);
echo $total, "\n";

Output (for the given example):

20.25
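For comparison, here is a sketch that is not part of the answer: it reads the whole text at once and lets `preg_match_all` collect every qualifying row, which is the case where global matching actually helps. It makes two assumptions of its own. First, `^[ \t]*VI-` with the `m` modifier requires `VI-` at the start of a row's first column. Second, dollars and cents are captured separately and summed as integer cents, because `$total += $matches[1]` produces a float and most cent values have no exact binary representation. The function name `sumViCents` is hypothetical.

```php
<?php
// Hypothetical variant: sum every qualifying row of the whole text, in integer cents.
function sumViCents(string $text): int
{
    // ^[ \t]*VI-            first column starts with VI- (indentation allowed)
    // \((?:PP|DB)\)         followed by (PP) or (DB)
    // \$(\d+)(?:\.(\d\d))?  dollars and optional cents after the literal "$"
    // m                     ^ matches at the start of every line
    $regex = '/^[ \t]*VI-.+?\((?:PP|DB)\).+?\$(\d+)(?:\.(\d\d))?/m';
    preg_match_all($regex, $text, $matches, PREG_SET_ORDER);

    $cents = 0;
    foreach ($matches as $m) {
        // $m[2] is absent when the amount has no cents part
        $cents += (int) $m[1] * 100 + (int) ($m[2] ?? 0);
    }
    return $cents;
}

// Usage, assuming the sample is saved as file.txt:
// $cents = sumViCents(file_get_contents('file.txt'));
// printf("%d.%02d\n", intdiv($cents, 100), $cents % 100);
```

Reading the file in one call assumes it fits comfortably in memory, which is true for a statement of this size.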