Collect substrings in a list when only knowing start and end characters

Question

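The wording of this question is a bit awkward, but here is what I mean: I have a large string called text. It contains messages from different users, and I want to isolate certain parts of it. It looks like this: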

text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'

What I want to isolate are user1's messages and the responses to them, producing two lists like these:

messages = ['hey how are you']
responses = ['im doing good'] # and so on, for as many messages and responses as there are

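Of course, other users sometimes talk to each other, so I only want the first response after each user1 message. I imagine something with regular expressions could work here, but I can't figure out exactly what to use or how.
Let me know if anything needs clarifying. Thanks.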

Answer 1

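You don't really need a regular expression. Instead, scan the string and use prior knowledge of its layout (a speaker label on one line, that speaker's message on the next) to pick out the pieces you want.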

text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'

target = 'user1:'
messages = []
responses = []

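lastUser = None      # label of the current speaker, or None when a label is expected next
isResponse = False   # True after a user1 message until the next speaker's line is stored
for v in text.splitlines():
    if lastUser is None:
        # This line is a speaker label such as 'user1:'.
        lastUser = v
    else:
        # This line is the message that belongs to lastUser.
        if lastUser == target:
            messages.append(v)
            isResponse = True
        elif isResponse:
            # First message from someone else after a user1 message.
            responses.append(v)
            isResponse = False
        lastUser = None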

Answer 2

text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
tokens = text.splitlines()
print(tokens)

['user1:', 'hey how are you', 'randomuser:', 'im doing good', 'randomuser2:', 'oh hey user1']

I think you can handle the rest from here.
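
For anyone who wants the remaining step spelled out, here is a minimal sketch of one way to finish from the token list. It assumes the text strictly alternates between a label line and a message line, as in the sample. The function name collect_exchanges and the flag awaiting_response are made up for illustration and are not from the question.

```python
def collect_exchanges(text, target='user1:'):
    """Return (messages, responses) for the speaker label `target`.

    Assumes lines alternate strictly: label, message, label, message, ...
    """
    tokens = text.splitlines()
    messages, responses = [], []
    awaiting_response = False
    # Even-indexed tokens are speaker labels, odd-indexed tokens are their messages.
    for speaker, line in zip(tokens[::2], tokens[1::2]):
        if speaker == target:
            messages.append(line)
            awaiting_response = True
        elif awaiting_response:
            # Only the first line from another speaker counts as the response.
            responses.append(line)
            awaiting_response = False
    return messages, responses


text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
messages, responses = collect_exchanges(text)
print(messages)   # ['hey how are you']
print(responses)  # ['im doing good']
```

If a message can span several lines, the even/odd pairing no longer holds and the labels would need to be detected some other way, for example by checking for a trailing colon.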

Answer 3

If your format is always like this (user1 and randomuser1234...), you can use the code below:

import re

text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'

messages = []
responses = []

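# Anchor each label to the start of a line so 'user1:' cannot match inside a label like 'randomuser1:'.
user1_msg = re.compile(r'^user1:\n(.+)\n', re.MULTILINE)
randomusers_msg = re.compile(r'^randomuser\d*:\n(.+)\n', re.MULTILINE)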

messages.extend(user1_msg.findall(text))
responses.extend(randomusers_msg.findall(text))

print(messages)
print(responses)

Output:

['hey how are you']
['im doing good', 'oh hey user1']
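
Note that the responses list above holds every randomuser line, not only the first reply after a user1 message. If only the first reply is wanted, one possible variation is a single pattern that captures the user1 message together with the next speaker's line. This is a sketch under assumptions: labels contain no whitespace and end with a colon, and each message fits on one line. The name pair is made up for illustration.

```python
import re

text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'

# One match = a user1 message plus the line of whoever speaks next.
pair = re.compile(
    r'^user1:\n(.+)\n'    # user1's label and message
    r'(?!user1:)\S+:\n'   # the next label, as long as it is not user1 again
    r'(.+)$',             # that speaker's line, i.e. the first response
    re.MULTILINE,
)

pairs = pair.findall(text)
messages = [message for message, _ in pairs]
responses = [response for _, response in pairs]

print(messages)   # ['hey how are you']
print(responses)  # ['im doing good']
```

One trade-off of this pattern: a user1 message that gets no reply, or that is directly followed by another user1 message, produces no match and so does not appear in messages either.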

Hope this helps.

python

python-re