Collect substrings in a list when only knowing start and end characters
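The question is worded a bit awkwardly, but here is what I mean:
I have a large string called text. It contains messages from different users, and I want to isolate certain parts of it. It looks like this: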
text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
What I want to isolate are user1's messages and the responses to them, so I would build two lists like this:
messages = ['hey how are you']
responses = ['im doing good'] # and so on, as many messages and responses there are
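Of course, other users sometimes talk to each other, so I only want the first response after each user1 message. I imagine something with regular expressions could work here, but I can't figure out exactly what to use or how.
Let me know if anything needs clarification. Thanks.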
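You don't really need a regular expression. Instead, scan the string and use what you already know about its layout to pull out the pieces you want.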
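text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
target = 'user1:'
messages = []
responses = []
lastUser = None      # name line of the entry being read, or None between entries
isResponse = False   # True right after a message from the target user
# Lines alternate: a "name:" line, then that user's message.
for v in text.split('\n'):
    if lastUser is None:
        # This line is a user name; remember it for the next line.
        lastUser = v
    else:
        if lastUser == target:
            messages.append(v)
            isResponse = True
        elif isResponse:
            # First message from someone else after the target spoke.
            responses.append(v)
            isResponse = False
        lastUser = None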
text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
tokens = text.splitlines()
print(tokens)
['user1:', 'hey how are you', 'randomuser:', 'im doing good', 'randomuser2:', 'oh hey user1']
I think you can do the rest from here.
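One possible way to finish from the token list, as a minimal sketch. It assumes the lines strictly alternate between a "name:" line and a single-line message; the helper name collect and its target parameter are hypothetical and not part of the answer above.

```python
def collect(text, target="user1:"):
    """Hypothetical helper: pair each message from `target` with the reply after it."""
    tokens = text.splitlines()
    # Assumed layout: even indices are "name:" lines, odd indices are message bodies.
    pairs = list(zip(tokens[0::2], tokens[1::2]))
    messages, responses = [], []
    for i, (user, body) in enumerate(pairs):
        if user != target:
            continue
        messages.append(body)
        # Take the very next entry as the reply, if it exists and is from someone else.
        if i + 1 < len(pairs) and pairs[i + 1][0] != target:
            responses.append(pairs[i + 1][1])
    return messages, responses


text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
messages, responses = collect(text)
print(messages)   # ['hey how are you']
print(responses)  # ['im doing good']
```

Only the entry directly after a user1 message counts as a response here, so later chatter between other users is ignored.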
If your format is always like this (user1 and randomuser1234...), you can use the code below:
import re
text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
messages = []
responses = []
user1_msg = re.compile(r'user1:\n(.+)\n')
randomusers_msg = re.compile(r'randomuser\d*:\n(.+)\n')
messages.extend(user1_msg.findall(text))
responses.extend(randomusers_msg.findall(text))
print(messages)
print(responses)
Output:
['hey how are you']
['im doing good', 'oh hey user1']
Hope this helps.
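Note that the output above contains every randomuser message, while the question asks only for the first response after each user1 message. A hedged variation is to capture the message and the entry right after it in one pattern. This sketch assumes every entry is a "name:" line followed by a single-line message, and it skips a user1 message that has no reply after it; the names pair_pattern and found are made up for illustration.

```python
import re

text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'

# Group 1: the user1 message. Group 2: the body of the next entry,
# provided that entry is not another user1 message.
pair_pattern = re.compile(r'^user1:\n(.+)\n(?!user1:\n).+:\n(.+)$', re.MULTILINE)

found = pair_pattern.findall(text)
messages = [msg for msg, _ in found]
responses = [reply for _, reply in found]

print(messages)   # ['hey how are you']
print(responses)  # ['im doing good']
```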