How do I split a paragraph between customer and customer service agent based on rules?

Question

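I have a paragraph that records a conversation between a customer and a customer service agent. How can I separate the conversation and create two lists (or any other format, such as a dictionary), one containing only the customer's text and the other containing only the agent's text?

Example paragraph:
Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s ) Customer: My name is Y. Here is my issue ( 4m 57s ) Agent Name: Here's the solution ( 5m 40s ) Agent Name: Are you there? ( 6m 30s ) Customer: Yes I'm still here. I still don't understand... ( 6m 40s ) Agent Name: Ok. Let's try another way. ( 6m 50s ) Agent Name: Does that solve the problem? (7m 40s) Agent Name: Thank you for contacting the customer service.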

Expected output:
List containing only the agent's text: ['Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s )', 'Agent Name: Are you there? ( 6m 30s )', "Agent Name: Ok. Let's try another way. ( 6m 50s )", 'Agent Name: Does that solve the problem? (7m 40s) Agent Name: Thank you for contacting the customer service.']

List containing only the customer's text: ['Customer: My name is Y. Here is my issue ( 4m 57s )', "Customer: Yes I'm still here. I still don't understand... ( 6m 40s )"]

Thanks!

Answer 1

Given:

txt='''\
Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s ) Customer: My name is Y. Here is my issue ( 4m 57s ) Agent Name: Here's the solution ( 5m 40s ) Agent Name: Are you there? ( 6m 30s ) Customer: Yes I'm still here. I still don't understand... ( 6m 40s ) Agent Name: Ok. Let's try another way. ( 6m 50s ) Agent Name: Does that solve the problem? (7m 40s) Agent Name: Thank you for contacting the customer service.'''

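You can use re.findall with a lazy match that starts at one speaker's label and stops at a lookahead for the other speaker's label (or at the end of the string). Consecutive turns by the same speaker therefore end up in one item:

import re
s1='Agent Name:'
s2='Customer:'
>>> re.findall(rf'({s1}.*?(?={s2}|\Z))', txt)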
['Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s ) ', "Agent Name: Here's the solution ( 5m 40s ) Agent Name: Are you there? ( 6m 30s ) ", "Agent Name: Ok. Let's try another way. ( 6m 50s ) Agent Name: Does that solve the problem? (7m 40s) Agent Name: Thank you for contacting the customer service."]

>>> re.findall(rf'({s2}.*?(?={s1}|\Z))', txt)
['Customer: My name is Y. Here is my issue ( 4m 57s ) ', "Customer: Yes I'm still here. I still don't understand... ( 6m 40s ) "]
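If you would rather have a dictionary with one entry per individual turn (instead of consecutive turns by the same speaker merged together), the same lookahead idea can be extended. This is only a minimal sketch: the function name split_by_speaker and the LABELS tuple are names I made up for illustration, and it assumes the labels never appear inside a message body.

```python
import re
from collections import defaultdict

txt = (
    "Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s ) "
    "Customer: My name is Y. Here is my issue ( 4m 57s ) "
    "Agent Name: Here's the solution ( 5m 40s ) "
    "Agent Name: Are you there? ( 6m 30s ) "
    "Customer: Yes I'm still here. I still don't understand... ( 6m 40s ) "
    "Agent Name: Ok. Let's try another way. ( 6m 50s ) "
    "Agent Name: Does that solve the problem? (7m 40s) "
    "Agent Name: Thank you for contacting the customer service."
)

# Hypothetical label set; adjust to whatever prefixes your transcripts use.
LABELS = ("Agent Name:", "Customer:")


def split_by_speaker(text, labels=LABELS):
    """Group every turn in `text` under the label that starts it."""
    # Escape the labels so any regex metacharacters in them are literal.
    alt = "|".join(re.escape(label) for label in labels)
    # A turn is a label followed by the shortest run of text that ends
    # right before the next label (of either speaker) or the end of string.
    pattern = re.compile(rf"({alt})(.*?)(?=(?:{alt})|\Z)", re.DOTALL)
    turns = defaultdict(list)
    for label, body in pattern.findall(text):
        turns[label].append(f"{label} {body.strip()}")
    return dict(turns)


result = split_by_speaker(txt)
for label, messages in result.items():
    print(label, len(messages))
    for message in messages:
        print("   ", message)
```

Because the lookahead stops at any label rather than only at the other speaker's, each "Agent Name:" turn becomes its own list item. Use the two re.findall calls above if you want back-to-back turns from the same speaker kept together.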


Tags: python, regex, text-segmentation