How to extract all text between certain characters with Python re

Question

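I'm trying to extract all text between certain characters, but my current code just returns an empty list. Each row holds one long text string that looks like this: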

"[{'index': 0, 'spent_transaction_hash': '4b3e9741022d4', 'spent_output_index': 68, 'script_asm': '3045022100e9e2280f5e6d965ced44', 'value': Decimal('381094.000000000')}\n {'index': 1, 'spent_transaction_hash': '0cfbd8591a3423', 'spent_output_index': 2, 'script_asm': '3045022100a', 'value': Decimal('3790496.000000000')}]"

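I only need the values of 'spent_transaction_hash'. For example, I want to create a new column containing the list ['4b3e9741022d4', '0cfbd8591a3423']. I'm trying to extract the value between 'spent_transaction_hash': and the following comma. This is my current code: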

my_list = []

for row in df['column']:
    value = re.findall(r'''spent_transaction_hash'\: \(\[\'(.*?)\'\]''', row)
    my_list.append(value)

This code just returns an empty list. Can anyone tell me which part of my code is wrong?

Answer 1

Is this what you're looking for? 'spent_transaction_hash'\: '([a-z0-9]+)'

Test: https://regex101.com/r/cnviyS/1
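
For context, the pattern in the question returns nothing because it requires a literal "([" before the quoted value and "])" after it, and those characters never appear in the data. Below is a minimal sketch of how the pattern above might be used with re.findall. The variable names (row, pattern, hashes) are made up for illustration, and the sample string is shortened from the one in the question.

```python
import re

# Shortened sample row, based on the string shown in the question.
row = (
    "[{'index': 0, 'spent_transaction_hash': '4b3e9741022d4', 'spent_output_index': 68}\n"
    " {'index': 1, 'spent_transaction_hash': '0cfbd8591a3423', 'spent_output_index': 2}]"
)

# Capture the lowercase letters and digits between the quotes that follow the key.
# Assumption: hashes only contain a-z and 0-9, as in the sample data.
pattern = re.compile(r"'spent_transaction_hash': '([a-z0-9]+)'")

# findall returns only the captured group for every match in the string.
hashes = pattern.findall(row)
print(hashes)
# ['4b3e9741022d4', '0cfbd8591a3423']
```

Applied to the loop in the question, this would presumably become my_list = [pattern.findall(row) for row in df['column']].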

Answer 2

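Since it looks like you already have a list of Python dict objects, just in string form, why not simply eval it and pull out the key you need? Of course, this approach doesn't use regex matching, which is what the question actually asked about.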

from decimal import Decimal

v = """\
[{'index': 0, 'spent_transaction_hash': '4b3e9741022d4', 'spent_output_index': 68, 'script_asm': '3045022100e9e2280f5e6d965ced44', 'value': Decimal('381094.000000000')}\n {'index': 1, 'spent_transaction_hash': '0cfbd8591a3423', 'spent_output_index': 2, 'script_asm': '3045022100a', 'value': Decimal('3790496.000000000')}]\
"""

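# The dicts are separated by a line break instead of a comma, so insert the comma before evaluating.
# Note: eval executes arbitrary code, so only use it on data you trust.
L = eval(v.replace('\n', ','))
# Collect the hash from each dict.
hashes = [e['spent_transaction_hash'] for e in L]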

print(hashes)
# ['4b3e9741022d4', '0cfbd8591a3423']
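
To apply the same idea to every row, as the loop in the question does, something like the sketch below might work. The helper name extract_hashes and the rows list are hypothetical, and the sample data is shortened. Passing an explicit namespace to eval limits which names the evaluated text can use, but it is not a real sandbox, so this still assumes the data comes from a trusted source. It also assumes the dicts are separated by a real line break character.

```python
from decimal import Decimal


def extract_hashes(text):
    """Hypothetical helper: parse one row and return its spent_transaction_hash values."""
    # Replace the line break between dicts with a comma so the text is a valid list literal.
    source = text.replace("\n", ",")
    # Expose only Decimal to the evaluated text; builtins are emptied.
    # This reduces, but does not remove, the risk of evaluating untrusted input.
    records = eval(source, {"__builtins__": {}, "Decimal": Decimal})
    return [record["spent_transaction_hash"] for record in records]


# Stand-in for df['column'] from the question (shortened sample data).
rows = [
    "[{'index': 0, 'spent_transaction_hash': '4b3e9741022d4', 'value': Decimal('381094.000000000')}\n"
    " {'index': 1, 'spent_transaction_hash': '0cfbd8591a3423', 'value': Decimal('3790496.000000000')}]",
]

my_list = [extract_hashes(row) for row in rows]
print(my_list)
# [['4b3e9741022d4', '0cfbd8591a3423']]
```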

Tags: python, python-3.x, python-re