How to extract comma separated substrings from a string?

I need to parse the comma-separated algorithms into a group.

SSH Enabled - version 2.0
Authentication methods:publickey,keyboard-interactive,password
Encryption Algorithms:aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc
MAC Algorithms:hmac-sha1,hmac-sha1-96
Authentication timeout: 120 secs; Authentication retries: 3
Minimum expected Diffie Hellman key size : 1024 bits
IOS Keys in SECSH format(ssh-rsa, base64 encoded):

I tried splitting them on the comma, but did not get the expected result:

^Encryption Algorithms:(.*?)(?:,|$)

The expected result is each algorithm in group 1, with no empty groups:

aes128-ctr
aes192-ctr
aes256-ctr
aes128-cbc
3des-cbc
aes192-cbc
aes256-cbc
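
For reference, a small Python sketch of why the attempt above yields only one item. It assumes the multiline flag is set so that ^ can match at the start of the Encryption Algorithms line; the shortened sample text and the variable names are illustrative only.

```python
import re

# Shortened sample input (assumption: same layout as the device output above).
text = (
    "Authentication methods:publickey,keyboard-interactive,password\n"
    "Encryption Algorithms:aes128-ctr,aes192-ctr,aes256-ctr\n"
    "MAC Algorithms:hmac-sha1,hmac-sha1-96\n"
)

attempt = r"^Encryption Algorithms:(.*?)(?:,|$)"

# The pattern is anchored to the "Encryption Algorithms:" prefix, so it can
# match only once per such line; the lazy group stops at the first comma.
found = re.findall(attempt, text, flags=re.MULTILINE)
print(found)  # ['aes128-ctr']
```

Every further item would need the prefix to appear again right before it, which is why the remaining algorithms are never captured.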

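This may not be the best approach, but one way is to split the string into three parts, possibly even before running it through the regex engine. If that is not an option and we want a single expression, this might be close: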

(.+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC.+)
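
The "split the string first" idea mentioned above might look like the following in Python. This is only a sketch without any regex; the function name `encryption_algorithms` and its `label` parameter are hypothetical, and it assumes the list sits on a single line that starts with the label.

```python
def encryption_algorithms(text, label="Encryption Algorithms:"):
    """Return the comma-separated items on the first line starting with label."""
    for line in text.splitlines():
        if line.startswith(label):
            tail = line[len(label):]
            # Drop empty pieces so a trailing comma does not produce "".
            return [item.strip() for item in tail.split(",") if item.strip()]
    return []


sample = (
    "SSH Enabled - version 2.0\n"
    "Encryption Algorithms:aes128-ctr,3des-cbc,aes256-cbc\n"
    "MAC Algorithms:hmac-sha1,hmac-sha1-96\n"
)
print(encryption_algorithms(sample))  # ['aes128-ctr', '3des-cbc', 'aes256-cbc']
```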


If the input also contains newlines, you may want to test other expressions, perhaps similar to:

([\s\S]+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC[\s\S]+)

([\w\W]+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC[\w\W]+)

([\d\D]+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC[\d\D]+)
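
The reason for swapping . with [\s\S], [\w\W] or [\d\D] can be checked with a short Python sketch: by default . does not match a newline, while a class paired with its negation matches any character. The sample text is a shortened assumption of the real input.

```python
import re

text = "SSH Enabled - version 2.0\nEncryption Algorithms:aes128-ctr,3des-cbc\n"

# "." stops at the first newline, so the prefix on line 2 is never reached.
dot_prefix = re.match(r".+Encryption Algorithms:", text)

# [\s\S] matches any character, including "\n".
any_prefix = re.match(r"[\s\S]+Encryption Algorithms:", text)

# re.DOTALL makes "." behave the same way.
dotall_prefix = re.match(r".+Encryption Algorithms:", text, flags=re.DOTALL)

print(dot_prefix)                                  # None
print(any_prefix.end() == text.index("aes128"))    # True
print(dotall_prefix.group(0) == any_prefix.group(0))  # True
```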

RegEx

If this expression is not what you need, it can be modified or changed at regex101.com.

RegEx Circuit

jex.im visualizes regular expressions.

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"([\w\W]+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC[\w\W]+)"

test_str = ("SSH Enabled - version 2.0\n"
    "Authentication methods:publickey,keyboard-interactive,password\n"
    "Encryption Algorithms:aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc\n"
    "MAC Algorithms:hmac-sha1,hmac-sha1-96\n"
    "Authentication timeout: 120 secs; Authentication retries: 3\n"
    "Minimum expected Diffie Hellman key size : 1024 bits\n"
    "IOS Keys in SECSH format(ssh-rsa, base64 encoded):\n")

# Raw string, so that \2 is a backreference to group 2 and not the character chr(2)
subst = r"\2 "

# You can limit the number of replacements with the count argument (0 replaces all)
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
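
If a list is more convenient than a substituted string, the same alternation could be used with re.finditer, keeping only the matches in which group 2 took part. This is a sketch on a shortened sample; like the expression itself, it assumes the MAC Algorithms line follows directly after the Encryption Algorithms line.

```python
import re

pattern = re.compile(
    r"([\w\W]+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC[\w\W]+)"
)

sample = (
    "SSH Enabled - version 2.0\n"
    "Authentication methods:publickey,keyboard-interactive,password\n"
    "Encryption Algorithms:aes128-ctr,3des-cbc,aes256-cbc\n"
    "MAC Algorithms:hmac-sha1,hmac-sha1-96\n"
    "Authentication timeout: 120 secs; Authentication retries: 3\n"
)

# Groups 1 and 3 swallow the text before and after the list;
# group 2 is set only for the individual algorithm names.
algorithms = [
    m.group(2) for m in pattern.finditer(sample) if m.group(2) is not None
]
print(algorithms)  # ['aes128-ctr', '3des-cbc', 'aes256-cbc']
```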

Demo

const regex = /(.+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC.+)/gm;
const str = `SSH Enabled - version 2.0 Authentication methods:publickey,keyboard-interactive,password Encryption Algorithms:aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc MAC Algorithms:hmac-sha1,hmac-sha1-96 Authentication timeout: 120 secs; Authentication retries: 3 Minimum expected Diffie Hellman key size : 1024 bits IOS Keys in SECSH format(ssh-rsa, base64 encoded):`;
const subst = `$2 `;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

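Another option is to match the string that starts with Encryption Algorithms: and then capture in a group a repeating pattern that matches the hyphenated parts, each repetition preceded by a comma.

If there is a match, you can split the first capturing group on the comma.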

^Encryption Algorithms:(\w+-\w+(?:,\w+-\w+)*)

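Explanation

  • ^ Start of the line (with the multiline flag)
  • Encryption Algorithms: Match literally
  • ( Start of the capturing group
    • \w+-\w+ Match 1+ word characters, - and 1+ word characters
    • (?:,\w+-\w+)* Repeat 0+ times a comma followed by 1+ word characters, - and 1+ word characters
  • ) Close the capturing group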


import re
regex = r"^Encryption Algorithms:(\w+-\w+(?:,\w+-\w+)*)"
test_str = ("SSH Enabled - version 2.0\n"
            "Authentication methods:publickey,keyboard-interactive,password\n"
            "Encryption Algorithms:aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc\n"
            "MAC Algorithms:hmac-sha1,hmac-sha1-96\n"
            "Authentication timeout: 120 secs; Authentication retries: 3\n"
            "Minimum expected Diffie Hellman key size : 1024 bits\n"
            "IOS Keys in SECSH format(ssh-rsa, base64 encoded):")

matches = re.search(regex, test_str, re.MULTILINE)
if matches:
    print(matches.group(1).split(","))

Result:

['aes128-ctr', 'aes192-ctr', 'aes256-ctr', 'aes128-cbc', '3des-cbc', 'aes192-cbc', 'aes256-cbc']
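
Note that \w+-\w+ expects exactly one hyphen per item, so on the MAC Algorithms line the match would stop before the -96 of hmac-sha1-96. If other lines need to be parsed as well, a looser item pattern such as [\w-]+ may work. The helper below is a hypothetical sketch (the name `parse_list` and its parameters are not from the original answer) and assumes items consist only of word characters and hyphens.

```python
import re


def parse_list(label, text):
    """Return the comma-separated items that follow 'label:' at a line start."""
    pattern = r"^" + re.escape(label) + r":([\w-]+(?:,[\w-]+)*)"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1).split(",") if match else []


sample = (
    "SSH Enabled - version 2.0\n"
    "Authentication methods:publickey,keyboard-interactive,password\n"
    "Encryption Algorithms:aes128-ctr,aes192-ctr,3des-cbc\n"
    "MAC Algorithms:hmac-sha1,hmac-sha1-96\n"
)

print(parse_list("MAC Algorithms", sample))
# ['hmac-sha1', 'hmac-sha1-96']
print(parse_list("Authentication methods", sample))
# ['publickey', 'keyboard-interactive', 'password']
```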