How to improve the performance of a masking method that uses credit card regexes in Java

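I have this function that uses regular expressions to find credit card numbers in an input string and masks them, leaving the last 4 digits visible: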

public CharSequence obfuscate(CharSequence data) {
    String[] result = data.toString().replaceAll("[^a-zA-Z0-9-_*]", " ").trim().replaceAll(" +", " ").split(" ");
    for(String str : result){
        String originalString = str;
        String cleanString = str.replaceAll("[-_]","");
        CardType cardType = CardType.detect(cleanString);
        if(!CardType.UNKNOWN.equals(cardType)){
            String maskedReplacement = maskWithoutLast4Digits(cleanString ,replacement);
            data = data.toString().replace(originalString , maskedReplacement);
        }
    }
    return data;
}

static String maskWithoutLast4Digits(String input , String replacement) {
    if(input.length() < 4){
        return input;
    }
    return input.replaceAll(".(?=.{4})", replacement);
}

// Pattern enum
// Requires: import java.util.regex.Pattern;

public enum CardType {
UNKNOWN,
VISA("^4[0-9]{12}(?:[0-9]{3}){0,2}$"),
MASTERCARD("^(?:5[1-5]|2(?!2([01]|20)|7(2[1-9]|3))[2-7])\\d{14}$"),
AMERICAN_EXPRESS("^3[47][0-9]{13}$"),
DINERS_CLUB("^3(?:0[0-5]|[68][0-9])[0-9]{11}$"),
DISCOVER("^6(?:011|[45][0-9]{2})[0-9]{12}$");

private Pattern pattern;

CardType() {
    this.pattern = null;
}

CardType(String pattern) {
    this.pattern = Pattern.compile(pattern);
}

public static CardType detect(String cardNumber) {

    for (CardType cardType : CardType.values()) {
        if (null == cardType.pattern) continue;
        if (cardType.pattern.matcher(cardNumber).matches()) return cardType;
    }

    return UNKNOWN;
}


public Pattern getPattern() {
    return pattern;
}
}

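Input 1: "Valid American Express card: 371449635398431"

Output 1: "Valid American Express card: ***********8431"

Input 2: "Invalid credit card: 1234222222222" // does not match any credit card pattern

Output 2: "Invalid credit card: 1234222222222"

Input 3: "Valid American Express card with garbage characters: <3714-4963-5398-431>"

Output 3: "Valid American Express card with garbage characters: <***********8431>"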

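This is not the best way to do the masking, because the method will be called for every tag in a huge HTML document and for every line in a huge text file. How can I improve the performance of this method?

Wouldn't it be nice if all validations were done before the card numbers went into the database (or data files).

If speed is what you want, I don't think using RegEx for any part of your code is necessarily the best choice, since processing regular expressions consumes a lot of time. Take, for example, the line that does the string masking in the maskWithoutLast4Digits() method: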

static String maskWithoutLast4Digits(String input, String replacement) {
    if(input.length() <= 4){
        return input;    // There is nothing to mask!
    }
    return input.replaceAll(".(?=.{4})", replacement);
}
    

and replace it with this code:

// Requires: import java.util.Arrays;
static String maskWithoutLast4Digits(String input, char replacement) {
    if (input.length() <= 4) {
        return input; // There is nothing to mask!
    }
    char[] chars = input.toCharArray();
    // Arrays.fill() on a char[] takes a char, so the mask is a single character.
    Arrays.fill(chars, 0, chars.length - 4, replacement);
    return new String(chars);
}

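You will probably find that the overall code performs the task on a single credit card number string almost twice as fast as the approach that uses the regular expression. That is a considerable difference. In fact, if you run the code through a profiler, you may well find that the method containing the regex gets progressively slower with each string processed, whereas the second method keeps things running at a steadier, more constant speed.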
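To check that claim on your own machine, a rough comparison along the following lines may help. It is only a sketch: the class name MaskComparison, the method names and the round count are mine, and a loop timed with System.nanoTime() is a crude measurement, so treat the numbers as an indication rather than a benchmark.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative and not from the original post.
final class MaskComparison {

    // Regex variant, as in the question.
    static String maskRegex(String input, String replacement) {
        if (input.length() <= 4) {
            return input;
        }
        return input.replaceAll(".(?=.{4})", replacement);
    }

    // Array variant, as in the answer (single-character mask only).
    static String maskArray(String input, char replacement) {
        if (input.length() <= 4) {
            return input;
        }
        char[] chars = input.toCharArray();
        Arrays.fill(chars, 0, chars.length - 4, replacement);
        return new String(chars);
    }

    public static void main(String[] args) {
        String card = "371449635398431";
        int rounds = 200_000; // arbitrary round count, chosen only for this demo

        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            maskRegex(card, "*");
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            maskArray(card, '*');
        }
        long t2 = System.nanoTime();

        System.out.println("regex: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("array: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both variants return the same masked string for the same input, so the array version can be swapped in without changing the output.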

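Different credit cards basically start with one specific single digit, with a few exceptions. For example, if a credit card number starts with 3, then it is always part of the American Express, Diners Club or Carte Blanche payment networks. If the card number starts with 4, it is a Visa. Card numbers that start with 5 belong to Mastercard, and those that start with 6 belong to the Discover network.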

  Card                   Starts With                   No. of Digits
  ==================================================================
  American Express       can be 34 or usually 37       15
  JCB                    35                            16
  Diners Club            usually 36 or can be 38       14
  VISA                   4                             16
  Mastercard             5                             16
  Discover               6                             16

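You don't need a regular expression to determine whether a credit card number starts with any of these values. It should be noted that cards do not necessarily always contain the same number of digits. This can depend on the card issuer, as I'm sure you already know, but in any case, credit cards belonging to the Visa, Mastercard and Discover payment networks have 16 digits, whereas those belonging to the American Express payment network have 15. Although 16 digits is the most common length, credit card numbers can be as short as 13 digits and as long as 19. I haven't studied your RegExes, but I'm sure they already cover that (right?).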
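As a rough illustration of a regex-free prefix and length check, the sketch below encodes only the rows of the table above. The class CardLengthCheck and its two methods are hypothetical names of mine, and the table is a simplification (it ignores the 13 to 19 digit variants just mentioned), so adjust the lengths to whatever your data really contains.

```java
// Hypothetical helper; it encodes only the prefix/length table shown above.
final class CardLengthCheck {

    /** Returns the digit count the table lists for this prefix, or -1 if the prefix is not listed. */
    static int expectedLength(String digits) {
        if (digits.length() < 2) {
            return -1;
        }
        char first = digits.charAt(0);
        char second = digits.charAt(1);
        switch (first) {
            case '3':
                if (second == '4' || second == '7') return 15; // American Express
                if (second == '5') return 16;                  // JCB
                if (second == '6' || second == '8') return 14; // Diners Club
                return -1;
            case '4': // Visa
            case '5': // Mastercard
            case '6': // Discover
                return 16;
            default:
                return -1;
        }
    }

    /** True when the digit string has exactly the length the table lists for its prefix. */
    static boolean hasExpectedLength(String digits) {
        return digits.length() == expectedLength(digits);
    }
}
```

With the table as written, CardLengthCheck.hasExpectedLength("371449635398431") is true, while the 13-digit string "1234222222222" is rejected because its prefix is not listed.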

To do away with the RegEx, you can use a switch/case mechanism instead, for example:

// Demo card number...
String cardNumber = "371449635398431";

/* Remove all characters other than digits.
   Don't want them for validation.      */
cardNumber = cardNumber.replaceAll("\\D", "");
String cardName;  // Used to store the card's name
switch (cardNumber.substring(0, 1)) {
    case "3":
        String typeNum = cardNumber.substring(0, 2);
        switch(typeNum) {
            case "34": case "37":
               cardName = "American-Express";
               break;
            case "35":
               cardName = "JCB";
               break;        
            case "30": case "36": case "38": case "39":
                cardName = "Diners-Club";
                break;
            default: 
                cardName = "UNKNOWN";
        }
        break;
    case "4":
        cardName = "Visa";
        break;
    case "5":
        cardName= "Mastercard";
        break;
    case "6":
        cardName = "Discover";
        break;
    default:
        cardName = "UNKNOWN";
}

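If you were to run a speed test on this code against iterating through a bunch of RegExes, I believe you would find a considerable speed improvement, even if you also want to check the length of each card number processed within each case.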

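The best way to validate a credit card number is to use the Luhn formula (also known as the Luhn algorithm), which basically follows this scheme:

  1. Starting from the rightmost digit of the card number you are validating and moving left, double the value of every second digit. If the result of any given doubling is greater than 9 (for example, 7 x 2 = 14 or 9 x 2 = 18), add the digits of that product together (for example, 14: 1 + 4 = 5 or 18: 1 + 8 = 9).
  2. Now add up all the resulting digits, including the digits that you did not multiply by two.
  3. If the total you get ends in 0, the card number is valid according to the Luhn algorithm; otherwise it is not valid.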
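To make the three steps concrete, here is a small trace for the demo number 371449635398431. It is only a sketch for inspecting the arithmetic; the class LuhnTrace and the method contributions() are names I made up, and the input is assumed to contain digits only.

```java
import java.util.Arrays;

// Hypothetical tracing helper; assumes the input contains digits only.
final class LuhnTrace {

    /** Returns each digit's contribution to the Luhn sum, in left-to-right order. */
    static int[] contributions(String digits) {
        int[] out = new int[digits.length()];
        boolean doubleIt = false; // the rightmost digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same result as adding the two digits of the product
                }
            }
            out[i] = d;
            doubleIt = !doubleIt;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] c = contributions("371449635398431");
        System.out.println(Arrays.toString(c));
        System.out.println("sum = " + Arrays.stream(c).sum());
    }
}
```

For this number the contributions are [3, 5, 1, 8, 4, 9, 6, 6, 5, 6, 9, 7, 4, 6, 1], which add up to 80. The total ends in 0, so the number passes the check.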

The whole process can of course be placed into a method for convenience, for example:

/**
 * Returns true if card (ie: MasterCard, Visa, etc) number is valid using
 * the 'Luhn Algorithm'.
 *
 * @param cardNumber (String)
 *
 * @return (Boolean)
 */
public static boolean isValidCardNumber(String cardNumber) {
    if (cardNumber == null) {
        return false;
    }
    // Strip everything except digits; a string with no digits cannot be valid.
    cardNumber = cardNumber.replaceAll("\\D", "");
    if (cardNumber.isEmpty()) {
        return false;
    }
    
    // Luhn algorithm
    int nDigits = cardNumber.length();

    int nSum = 0;
    boolean isSecond = false;
    for (int i = nDigits - 1; i >= 0; i--) {
        int d = cardNumber.charAt(i) - '0';
        if (isSecond == true) {
            d = d * 2;
        }
        // We add two digits to handle 
        // cases that make two digits  
        // after doubling 
        nSum += d / 10;
        nSum += d % 10;
        isSecond = !isSecond;
    }
    return (nSum % 10 == 0);
}

Putting all of this together, your code might look something like the following:

public static String validateCreditCardNumber(String cardNumber) {
    // Remove all characters other than digits
    cardNumber = cardNumber.replaceAll("\\D", "");
    String cardName;  // Used to store the card's name 
    switch (cardNumber.substring(0, 1)) {
        case "3":
            String typeNum = cardNumber.substring(0, 2);
            switch(typeNum) {
                case "34": case "37":
                   cardName = "American-Express";
                   break;
                case "35":
                   cardName = "JCB";
                   break;        
                case "30": case "36": case "38": case "39":
                    cardName = "Diners-Club";
                    break;
                default: 
                    cardName = "UNKNOWN";
            }
            break;
        case "4":
            cardName = "Visa";
            break;
        case "5":
            cardName= "Mastercard";
            break;
        case "6":
            cardName = "Discover";
            break;
        default:
            cardName = "UNKNOWN";
    }
    
    if (!cardName.equals("UNKNOWN") && isValidCardNumber(cardNumber)) {
        return ("The " + cardName + " card number (" + maskWithoutLast4Digits(cardNumber, '*') + ") is VALID!");
    }
    else {
        return ("The " + cardName + " card number (" +  maskWithoutLast4Digits(cardNumber, '*') + ") is NOT VALID!");
    }
}

public static String maskWithoutLast4Digits (String input, char replacement) {
    if (input.length() <= 4) {
        return input; // Nothing to mask
    }
    char[] buf = input.toCharArray();
    Arrays.fill(buf, 0, buf.length - 4, replacement);
    return new String(buf);
}

/**
 * Returns true if card (ie: MasterCard, Visa, etc) number is valid using
 * the 'Luhn Algorithm'.
 *
 * @param cardNumber (String)
 *
 * @return (Boolean)
 */
public static boolean isValidCardNumber(String cardNumber) {
    if (cardNumber == null) {
        return false;
    }
    // Strip everything except digits; a string with no digits cannot be valid.
    cardNumber = cardNumber.replaceAll("\\D", "");
    if (cardNumber.isEmpty()) {
        return false;
    }
    
    // Luhn algorithm
    int nDigits = cardNumber.length();

    int nSum = 0;
    boolean isSecond = false;
    for (int i = nDigits - 1; i >= 0; i--) {
        int d = cardNumber.charAt(i) - '0';
        if (isSecond == true) {
            d = d * 2;
        }
        // We add two digits to handle 
        // cases that make two digits  
        // after doubling 
        nSum += d / 10;
        nSum += d % 10;
        isSecond = !isSecond;
    }
    return (nSum % 10 == 0);
}

And you would basically use the above like this:

// Demo card number...
String cardNumber = "371449635398431";
    
String isItValid = validateCreditCardNumber(cardNumber);
System.out.println(isItValid);

The output to the console would be:

The American-Express card number (***********8431) is VALID!

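I'm not sure where your output goes, but it might be better to write it to a file somewhere before displaying it, since you will always be limited by the speed of that process. Also, splitting the data into manageable chunks and processing them on multiple ExecutorService threads should greatly improve speed, as could moving to one of the newer JDKs (above Java 8) and taking advantage of some of the newer methods.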
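A minimal sketch of the chunking idea could look like the code below. Everything in it is an assumption on my part: the class ParallelMasking, the stand-in maskLine() method (which only masks lines made up purely of digits) and the thread and chunk sizes would all need to be replaced by your real per-line masking and tuned for your data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; maskLine() is a stand-in for the real per-line work.
final class ParallelMasking {

    // Stand-in: masks a line only when it consists purely of digits.
    static String maskLine(String line) {
        if (line.length() <= 4 || !line.chars().allMatch(Character::isDigit)) {
            return line;
        }
        char[] chars = line.toCharArray();
        Arrays.fill(chars, 0, chars.length - 4, '*');
        return new String(chars);
    }

    /** Masks all lines, chunkSize lines per task, on a fixed pool of worker threads. */
    static List<String> maskAll(List<String> lines, int threads, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunkSize) {
                List<String> chunk = lines.subList(start, Math.min(start + chunkSize, lines.size()));
                futures.add(pool.submit(() -> {
                    List<String> out = new ArrayList<>(chunk.size());
                    for (String line : chunk) {
                        out.add(maskLine(line));
                    }
                    return out;
                }));
            }
            List<String> result = new ArrayList<>(lines.size());
            for (Future<List<String>> f : futures) {
                result.addAll(f.get()); // collecting in submission order keeps the line order
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while masking", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Masking task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this actually helps depends on how expensive the per-line work is. For very cheap work the thread overhead can outweigh the gain, so measure before committing to it.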

This post is based entirely on the comments under the answer above, specifically this comment from the OP:

And also the input string can be "my phone number 12345678 and credit card 1234567890"

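If you are keen on using RegEx and you want to retrieve phone numbers and/or credit card numbers from a particular string, then you can use this Java regular expression: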

String regex = "(\\+?\\d+.{0,1}\\d+.{0,1}\\d+.{0,1}\\d+)|"            // Phone Numbers
             + "(\\+{0,1}\\d+{0,3}\\s{0,1}\\-{0,1}\\({0,1}\\d+"   // Credit Cards
             + "\\){0,1}\\s{0,1}\\-{0,1}\\d+\\s{0,1}\\-{0,1}\\d+)";

To use this regex string you need to run it through a Pattern/Matcher mechanism, for example:

String strg = "Valid Phone #: <+1 (212) 555-3456> - "
            + "Valid American Express card 24 with garbage 33.6 characters: <3714-4963-5398-431>";

final java.util.List<String> numbers = new java.util.ArrayList<>();

final String regex = "(\\+?\\d+.{0,1}\\d+.{0,1}\\d+.{0,1}\\d+)|"            // Phone Numbers
                   + "(\\+{0,1}\\d+{0,3}\\s{0,1}\\-{0,1}\\({0,1}\\d+"   // Credit Cards
                   + "\\){0,1}\\s{0,1}\\-{0,1}\\d+\\s{0,1}\\-{0,1}\\d+)";

final java.util.regex.Pattern pattern = java.util.regex.Pattern.compile(regex); // the regex
final java.util.regex.Matcher matcher = pattern.matcher(strg); // your string
while (matcher.find()) { 
    numbers.add(matcher.group()); 
}
        
for (String str : numbers) {
    System.out.println(str);
}

With the string supplied above, the console window will display:

+1 (212) 555-3456
3714-4963-5398-431

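Consider these the original phone number and credit card number substrings. Place these strings into representative variables, for example origPhoneNum and origCreditCardNum. Now validate the numbers. You were already given the means to validate credit card numbers in the previous answer. Here is a method to validate phone numbers: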

public static boolean isValidPhoneNumber(String phoneNumber) {
    return phoneNumber.matches("^(?!\\b(0)\\1+\\b)(\\+?\\d{1,3}[. -]?)?"
                             + "\\(?\\d{3}\\)?([. -]?)\\d{3}\\3\\d{4}$");
}

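I have tested the regex string provided above against phone numbers from many different countries in many different formats, with success. It was also tested against credit card numbers in many different formats, again with success. Even so, there will of course always be some format that causes a particular problem, since there are evidently no rules whatsoever for number entry at the source where the data is generated.

Take a look at the comment line I showed at the top of this post: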

And also the input string can be "my phone number 12345678 and credit card 1234567890"

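There is no way to tell which number is supposed to be a phone number and which is supposed to be a credit card number unless the text inside the string says so explicitly, as it does in the string above. Tomorrow or next week it might not, because it looks like no data entry rules are in play here.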

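The string indicates a phone number of 12345678, which is 8 digits. The string also indicates a credit card number of 1234567890. Internationally, phone numbers can range from 9 to as many as 13 digits depending on the country. Local numbers are shorter again, also depending on the country. Because (international) phone numbers cover such a wide range of digit counts, there is no way to know that the number taken to be a credit card number really is a credit card number unless the string tells you so before or after the number. And if it does this time, will it in the next input string?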
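If the text does label the numbers, one way to pick up those labels is sketched below. This is purely an illustration of the idea: the class LabelledNumbers and the keyword list ("phone", "credit card") are my own assumptions, and real data would need its own keywords and probably a stricter number pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the keyword list is an assumption, not something from the OP's data.
final class LabelledNumbers {

    // A keyword, then any non-digits, then the first run of digits that follows.
    private static final Pattern LABELLED =
            Pattern.compile("(?i)\\b(phone|credit card)\\b\\D*?(\\d+)");

    /** Maps each number to the keyword that precedes it, in order of appearance. */
    static Map<String, String> label(String text) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = LABELLED.matcher(text);
        while (m.find()) {
            out.put(m.group(2), m.group(1).toLowerCase());
        }
        return out;
    }
}
```

For the OP's example string this maps 12345678 to "phone" and 1234567890 to "credit card". A string without such keywords yields an empty map, which is exactly the ambiguous case described above.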

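With that said, I leave it to you to decide how to handle that situation, but whatever you do, don't expect any speed from it. As I wrote at the start of my earlier answer: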

Wouldn't it be nice if all validations were done before the card numbers
went into the database (or data files).

EDIT: Based on your latest comment under the earlier answer:

I made a small demo:

// Place this code into a method or event somewhere...
String inputString = "my phone number is +54 123 344-4567 and CC 2222 4053 4324 8877 bla bla bla";
System.out.println("Input:  " + inputString);
System.out.println();

final java.util.List<String> numbers = new java.util.ArrayList<>();
    
final String regex = "(\\+?\\d+.{0,1}\\d+.{0,1}\\d+.{0,1}\\d+)|"            // Phone Numbers
                   + "(\\+{0,1}\\d+{0,3}\\s{0,1}\\-{0,1}\\({0,1}\\d+"   // Credit Cards
                   + "\\){0,1}\\s{0,1}\\-{0,1}\\d+\\s{0,1}\\-{0,1}\\d+)";

final java.util.regex.Pattern pattern = java.util.regex.Pattern.compile(regex);
final java.util.regex.Matcher matcher = pattern.matcher(inputString);
while (matcher.find()) {
    numbers.add(matcher.group());
}

String outputString = inputString;

for (String str : numbers) {
    //System.out.println(str);  // Uncomment for testing.
    // Is substring a valid Phone Number?
    if (isValidPhoneNumber(str)) {
        outputString = outputString.replace(str, maskAllExceptLast(str, 3, "x"));
    }
    // Otherwise, is it a valid Credit Card Number?
    else if (isValidCreditCardNumber(str)) {
        outputString = outputString.replace(str,
                maskAllExceptLast(str.replaceAll("\\D", ""), 4, "*"));
    }
}

System.out.println("Output: " + outputString);

Supporting methods....

public static String maskAllExceptLast (String inputString, int exceptLast_N, String... maskCharacter) {
    if(inputString.length() < exceptLast_N){
        return inputString;
    }
    String mask = "*";  // Default mask character.
    if (maskCharacter.length > 0) {
        mask = maskCharacter[0];
    }
    return inputString.replaceAll(".(?=.{" + exceptLast_N + "})", mask);
}

/**
 * Method to validate a supplied phone number. Currently validates phone
 * numbers supplied in the following fashion:
 * <pre>
 *
 *      Phone number 1234567890 validation result: true
 *      Phone number 123-456-7890 validation result: true
 *      Phone number 123-456-7890 x1234 validation result: true
 *      Phone number 123-456-7890 ext1234 validation result: true
 *      Phone number (123)-456-7890 validation result: true
 *      Phone number 123.456.7890 validation result: true
 *      Phone number 123 456 7890 validation result: true
 *      Phone number 01 123 456 7890 validation result: true
 *      Phone number 1 123-456-7890 validation result: true
 *      Phone number 1-123-456-7890 validation result: true</pre>
 *
 * @param phoneNumber (String) The phone number to check.<br>
 *
 * @return (boolean) True is returned if the supplied phone number is valid.
 *         False if it isn't.
 */
public static boolean isValidPhoneNumber(String phoneNumber) {
    boolean isValid = false;
    long len = phoneNumber.replaceAll("\\D", "").length(); // Crush the phone Number into only digits
    // Check phone Number's length range. Must be from 8 to 12 digits long
    if (len < 8 || len > 12) {
        return false;
    }
    // Validate phone numbers of format "xxxxxxxx to xxxxxxxxxxxx"
    else if (phoneNumber.matches("\\d+")) {
        isValid = true;
    }
    //validating phone number with -, . or spaces
    else if (phoneNumber.matches("^(\\+\\d{1,3}( )?)?((\\(\\d{1,3}\\))|\\d{1,3})[- .]?\\d{3,4}[- .]?\\d{4}$")) {
        isValid = true;
    }
    /* Validating phone number with -, . or spaces and long distance prefix.
       This regex also ensures:
          - The actual number (without LD prefix) should be 10 digits only.
          - For North American, numbers with area code may be surrounded 
              with parentheses ().
          - The country code can be 1 to 3 digits long. Optionally may be 
            preceded by a + sign.
          - There may be dashes, spaces, dots or no spaces between country 
            code, area code and the rest of the number.
          - A valid phone number cannot be all zeros.                 */
    else if (phoneNumber.matches("^(?!\\b(0)\\1+\\b)(\\+?\\d{1,3}[. -]?)?"
                               + "\\(?\\d{3}\\)?([. -]?)\\d{3}\\3\\d{4}$")) {
        isValid = true;
    }
    //validating phone number with extension length from 3 to 5
    else if (phoneNumber.matches("\\d{3}-\\d{3}-\\d{4}\\s(x|(ext))\\d{3,5}")) {
        isValid = true;
    } 
    //validating phone number where area code is in braces ()
    else if (phoneNumber.matches("^(\\(\\d{1,3}\\)|\\d{1,3})[- .]?\\d{2,4}[- .]?\\d{4}$")) {
        isValid = true;
    } 
    //return false if nothing matches the input
    else {
        isValid = false;
    }
    return isValid;
}

/**
 * Returns true if card (ie: MasterCard, Visa, etc) number is valid using
 * the 'Luhn Algorithm'. First this method validates for a correct Card 
 * Network Number. The supported networks are:<pre>
 * 
 *    Number            Card Network
 *    ====================================
 *      2               Mastercard (BIN 2-Series) This is NEW!!
 *      30, 36, 38, 39  Diners-Club
 *      34, 37          American Express
 *      35              JCB
 *      4               Visa
 *      5               Mastercard
 *      6               Discover</pre><br>
 * 
 * Next, the overall Credit Card number is checked with the 'Luhn Algorithm' 
 * for validity.<br>
 *
 * @param cardNumber (String)
 *
 * @return (Boolean) True if valid, false if not.
 */
public static boolean isValidCreditCardNumber(String cardNumber) {
    if (cardNumber == null || cardNumber.trim().isEmpty()) {
        return false;
    }
    // Strip card number of all non-digit characters.
    cardNumber = cardNumber.replaceAll("\\D", "");
    
    long len = cardNumber.length();
    if (len < 14 || len > 16) {   // Only going to 16 digits here 
        return false;
    }
        
    // Validate Card Network
    String[] cardNetworks = {"2", "30", "34", "35", "36", "37", "38", "39", "4", "5", "6"};
    String cardNetNum = cardNumber.substring(0, (cardNumber.startsWith("3") ? 2 : 1));
    boolean pass = false;
    for (String netNum : cardNetworks) {
        if (netNum.equals(cardNetNum)) {
            pass = true;
            break;
        }
    }
    if (!pass) {
        return false;  // Invalid Card Network
    }

    // Validate card number with the 'Luhn algorithm'.
    int nDigits = cardNumber.length();

    int nSum = 0;
    boolean isSecond = false;
    for (int i = nDigits - 1; i >= 0; i--) {
        int d = cardNumber.charAt(i) - '0';
        if (isSecond == true) {
            d = d * 2;
        }
        nSum += d / 10;
        nSum += d % 10;
        isSecond = !isSecond;
    }
    return (nSum % 10 == 0);
}

The code above is definitely not going to be fast!
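One likely contributor is that String.replaceAll() and String.matches() compile their regular expression again on every call. If you stay with regexes, compiling each pattern once and reusing it is a cheap improvement. The sketch below shows the idea for the digit-stripping step only; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper showing a pattern compiled once and reused.
final class PrecompiledPatterns {

    // Compiled a single time when the class is loaded.
    private static final Pattern NON_DIGITS = Pattern.compile("\\D");

    /** Same result as s.replaceAll("\\D", ""), without recompiling the pattern each time. */
    static String digitsOnly(String s) {
        return NON_DIGITS.matcher(s).replaceAll("");
    }
}
```

The same treatment applies to the phone number patterns: hold each one in a static final Pattern and call pattern.matcher(input).matches() instead of input.matches(regex).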

Tweak the regexes or the code to suit your specific needs.