Get certain items based on their formatting

Question

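I have a list of values: some are just numbers, some are made up of words, and some are a mix of both. I only want to select the items that consist of digits, a single letter, and then more digits.

To illustrate, this is my list of values: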

l = ['980X2350', 'DO_UN_HPL_Glas_Links', 'DO_UN_HPL_Glas_Rechts',
     '930x2115', 'DO_UN_HPL_Links', 'DO_UN_HPL_Rechts', '830X2115',
     'Deuropening', 'BF_32_Tourniquets_dubbeledeur_Aluminium']

I want to get back:

['980X2350', '930x2115', '830X2115']

Answer 1

Assuming a list of strings as input, you can use a regular expression and a list comprehension:

l = ['980X2350', 'DO_UN_HPL_Glas_Links', 'DO_UN_HPL_Glas_Rechts',
     '930x2115', 'DO_UN_HPL_Links', 'DO_UN_HPL_Rechts', '830X2115',
     'Deuropening', 'BF_32_Tourniquets_dubbeledeur_Aluminium']
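import re

# Raw string so the \d escapes reach the regex engine unchanged;
# re.I makes the match case-insensitive, so both 'x' and 'X' are accepted.
regex = re.compile(r'\d+x\d+', flags=re.I)

out = [s for s in l if regex.match(s.strip())]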

Output:

['980X2350', '930x2115', '830X2115']
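One detail worth knowing: `re.match` only anchors the pattern at the start of the string, so a value with trailing characters after the second number would still pass. If the whole string has to fit the format, `re.fullmatch` is one option. The sketch below uses made-up sample strings (they are not from the question) to show the difference.

```python
import re

# Hypothetical sample strings, chosen only to show the difference.
samples = ['980X2350', '930x2115mm', 'x2115', '12x']

pattern = re.compile(r'\d+x\d+', flags=re.IGNORECASE)

# match() only requires the pattern to match at the start of the string.
loose = [s for s in samples if pattern.match(s)]

# fullmatch() requires the entire string to match the pattern.
strict = [s for s in samples if pattern.fullmatch(s)]

print(loose)   # ['980X2350', '930x2115mm']
print(strict)  # ['980X2350']
```

For the list in the question both calls give the same result, so which one to use depends on how strict the filter needs to be.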

Answer 2

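Assuming a list of strings:

You can keep a counter of the letters encountered in each item. If that count is exactly 1 and the item also contains at least one digit, add it to the output list: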

a = ['980X2350', 'DO_UN_HPL_Glas_Links', 'DO_UN_HPL_Glas_Rechts', '930x2115', 'DO_UN_HPL_Links',
    'DO_UN_HPL_Rechts', '830X2115', 'Deuropening' ]

alphabet = 'abcdefghijklmnopqrstuvwxyz'
alphabet+= alphabet.upper()
numeric = '0123456789'

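output = []

for item in a:
    alphabet_count = 0
    # Reset for every item; otherwise a digit seen in an earlier item
    # would leave the flag set for all later items.
    numeric_flag = False
    for char in item:
        if char in alphabet:
            alphabet_count += 1
        if char in numeric:
            numeric_flag = True

    if alphabet_count == 1 and numeric_flag:
        output.append(item)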

print(output)
# ['980X2350', '930x2115', '830X2115']
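The same counting idea can be written more compactly with the built-in string methods `str.isalpha` and `str.isdigit`. This is only a sketch of the approach above, and the function name `one_letter_with_digits` is made up for illustration. Note that both methods are Unicode-aware, so they accept more characters than the ASCII-only `alphabet` and `numeric` strings do.

```python
def one_letter_with_digits(items):
    """Keep items that contain exactly one letter and at least one digit."""
    result = []
    for item in items:
        # True counts as 1, so sum() gives the number of letters in the item.
        letters = sum(ch.isalpha() for ch in item)
        has_digit = any(ch.isdigit() for ch in item)
        if letters == 1 and has_digit:
            result.append(item)
    return result


a = ['980X2350', 'DO_UN_HPL_Glas_Links', 'DO_UN_HPL_Glas_Rechts', '930x2115',
     'DO_UN_HPL_Links', 'DO_UN_HPL_Rechts', '830X2115', 'Deuropening']

print(one_letter_with_digits(a))
# ['980X2350', '930x2115', '830X2115']
```

Like the loop above, this does not check where the letter sits, so a string such as 'X980' would also be kept.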

Answer 3

There is no need to import re for something this small.

Here is an approach that avoids regular expressions altogether:

allowed = '0123456789x'

def filter_str(lst):
    output = []
    for s in lst:
        c = s.lower().strip()
        if all(i in allowed for i in c) and c.count('x') == 1:
            output.append(s)
    return output

If the string must contain two numeric fields, that is, digits on both sides of the x:

allowed = '0123456789x'

def filter_str(lst):
    output = []
    for s in lst:
        c = s.lower().strip()
        n = len(c) - 1
        if all(i in allowed for i in c) and c.count('x') == 1 and c.index('x') not in (0, n):
            output.append(s)
    return output

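The all function short-circuits (it stops checking as soon as it sees a falsy value), and Python's and/or operators short-circuit as well: for and, if the left-hand operand is falsy, the right-hand operand is never evaluated. So although my code looks longer than the regex-based version, it does very little work on strings that fail early. Keep in mind that a regex match also gives up at the first character that cannot match, so whether this version is actually faster is something to measure rather than assume.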
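A quick way to check the speed question on your own machine is the standard-library `timeit` module. The sketch below is an assumption-laden micro-benchmark, not a definitive result: the helper name `filter_regex`, the repeat count, and the use of the question's list as test data are all choices made for illustration, and the timings will vary by interpreter and input.

```python
import re
import timeit

data = ['980X2350', 'DO_UN_HPL_Glas_Links', 'DO_UN_HPL_Glas_Rechts',
        '930x2115', 'DO_UN_HPL_Links', 'DO_UN_HPL_Rechts', '830X2115',
        'Deuropening', 'BF_32_Tourniquets_dubbeledeur_Aluminium']

pattern = re.compile(r'\d+x\d+', flags=re.IGNORECASE)
allowed = '0123456789x'


def filter_regex(lst):
    # Regex-based filter (hypothetical helper wrapping the Answer 1 idea).
    return [s for s in lst if pattern.match(s.strip())]


def filter_str(lst):
    # Character-set filter, as in the first version above.
    output = []
    for s in lst:
        c = s.lower().strip()
        if all(i in allowed for i in c) and c.count('x') == 1:
            output.append(s)
    return output


# Both approaches should agree on this input before timing them.
assert filter_regex(data) == filter_str(data)

regex_time = timeit.timeit(lambda: filter_regex(data), number=10_000)
str_time = timeit.timeit(lambda: filter_str(data), number=10_000)
print(f'regex: {regex_time:.4f}s, string checks: {str_time:.4f}s')
```

No expected timings are shown on purpose; run it against data that resembles your real input before drawing conclusions.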

python

ironpython