Regex expression for sublists

Question

I have a list of lists, for example:

my_list = [['aaa_house', 'aaa_car', 'aaa_table'], ['aaa_love', 'aaa_hate', 'aaa_life']]

desired_result = [['house', 'car', 'table'], ['love', 'hate', 'life']]

I am using a regular expression to filter out the strings I want.

I tried:

import re
pattern = re.compile(r'\baaa[_]')
[pattern.search(i).group(1) for i in lista_fim]

I also tried:

def find_fims(sublist):
    pattern = re.compile(r'\baaa_')
    return [pattern.search(i).group(1) for i in sublist]


answer = map(find_fims, lista_with_sublists)

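I can't get any result with either attempt. How can I apply a function to the sublists of a list while keeping the sublist structure? I just want the proper names from my sublists.

Any help?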

Answer 1

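Try the following pattern:

(\w)\1+_(\w+)

(\w)\1+ matches the repeated string you want to discard, e.g. aaa
(\w+) captures the target word as group 2

Note that you will have to use group 2 rather than group 1.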
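
As a rough sketch of how this could be applied to the nested list from the question, assuming the backreference pattern `(\w)\1+_(\w+)` and that every string actually matches it (otherwise `search` returns `None` and `.group(2)` fails):

```python
import re

# Assumed pattern: a repeated character (e.g. 'aaa'), an underscore,
# then the target word captured as group 2
pattern = re.compile(r'(\w)\1+_(\w+)')

my_list = [['aaa_house', 'aaa_car', 'aaa_table'], ['aaa_love', 'aaa_hate', 'aaa_life']]

# The nested comprehension keeps the sublist structure
result = [[pattern.search(s).group(2) for s in sub] for sub in my_list]
print(result)  # [['house', 'car', 'table'], ['love', 'hate', 'life']]
```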

Answer 2

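import re
# The nested list from the question
lista_with_sublists = [['aaa_house', 'aaa_car', 'aaa_table'], ['aaa_love', 'aaa_hate', 'aaa_life']]
out_list = [[re.findall(r'aaa_(\w+)', i)[0] for i in j] for j in lista_with_sublists]

# Output:

out_list = [['house', 'car', 'table'], ['love', 'hate', 'life']]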
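
One caveat: indexing `findall(...)[0]` raises `IndexError` for any string that does not contain the prefix. A possible variant, assuming you would rather leave such strings unchanged (`strip_aaa` is a hypothetical helper name, not something from the question):

```python
import re

def strip_aaa(s):
    """Return the word after 'aaa_', or the string unchanged if there is no match."""
    m = re.search(r'aaa_(\w+)', s)
    return m.group(1) if m else s

# Hypothetical input mixing prefixed and unprefixed strings
data = [['aaa_house', 'car'], ['aaa_love']]
out = [[strip_aaa(s) for s in sub] for sub in data]
print(out)  # [['house', 'car'], ['love']]
```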

Answer 3

Your pattern matches the part you want to discard, yet you are using it to extract (something you don't need). So you just need to use re.sub:

import re

pattern = re.compile(r'\baaa_')
my_list = [['aaa_house', 'aaa_car', 'aaa_table'], ['aaa_love', 'aaa_hate', 'aaa_life']]
print([[pattern.sub('', i) for i in y] for y in my_list])

Output:

[['house', 'car', 'table'], ['love', 'hate', 'life']]

See the Python demo and the regex demo.
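
To tie this back to the function-based attempt in the question: `pattern.search(i).group(1)` raises `IndexError` there because the pattern has no capturing group, and in Python 3 `map` returns a lazy iterator rather than a list. A minimal sketch of the same idea with `re.sub`, assuming Python 3 (`strip_prefix` is a hypothetical name):

```python
import re

PATTERN = re.compile(r'\baaa_')

def strip_prefix(sublist):
    """Return a new list with the 'aaa_' prefix removed from each string."""
    return [PATTERN.sub('', s) for s in sublist]

my_list = [['aaa_house', 'aaa_car', 'aaa_table'], ['aaa_love', 'aaa_hate', 'aaa_life']]

# map() is lazy in Python 3, so wrap it in list() to get the actual result
answer = list(map(strip_prefix, my_list))
print(answer)  # [['house', 'car', 'table'], ['love', 'hate', 'life']]
```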

Note: if you only want to match aaa at the start of the string, replace \b with ^. See this regex demo.
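
A small sketch of the difference between the two anchors, using a made-up input where the prefix appears in the middle of the string:

```python
import re

word_boundary = re.compile(r'\baaa_')   # matches at any word boundary
string_start = re.compile(r'^aaa_')     # matches only at the start of the string

text = 'x aaa_house'  # hypothetical input, prefix is not at the start

at_boundary = word_boundary.sub('', text)
at_start = string_start.sub('', text)

print(at_boundary)  # x house
print(at_start)     # x aaa_house
```

For the strings in the question, which all begin with aaa_, both patterns give the same result.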

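Also note that you do not need to put _ inside a character class. _ is not a special regex metacharacter, and there is no point in wrapping a single word character in a character class (that construct is meant to hold several characters or character ranges).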
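
A quick check, on a few sample strings, that the two spellings behave the same:

```python
import re

with_class = re.compile(r'\baaa[_]')    # underscore wrapped in a character class
without_class = re.compile(r'\baaa_')   # plain underscore

samples = ['aaa_house', 'aaa_car', 'no_prefix']
results_with = [with_class.sub('', s) for s in samples]
results_without = [without_class.sub('', s) for s in samples]

print(results_with == results_without)  # True
```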
