Add character names and their lines to a new dictionary from array / list
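I have a movie script. My first task is to collect each character's lines into a dictionary.
Later I need to put the data into a Series.
Right now I have all the dialogue in a list, with each entry starting with the character name. It's in the following format:
Dialogue[0]
'NAME1\n(16 spaces)YO, YO, good that you're here man.'
All names end with \n, and all dialogue lines start with 16 spaces. I think this could be useful, but I'm not sure how to use it.
I've tried a lot of things, with little luck.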
result = {}
for lines in dialogue:
    first_token = para.split()[0]
    if first_token.endswith('\n'): #this would be the name
        name, line = para.split(on the new line?)
        name = name.strip()
        if name not in result:
            result[name] = []
        result[name].append(line)
return result
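This code gives me a whole bunch of errors, so I don't think listing them here would be useful.
Ideally, I need each character as a key in the dictionary, with all of their lines as the value.
Like this:
NAME1: [line 1, line 2, line 3...]
NAME2: [line 1, line 2, line 3...]
Edit:
Some character names are two words.
Edit 2:
Maybe it would be easier to go back to the original movie script text file.
It's formatted like this: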
                      NAME1
                Yo, Yo, good that you're here
                man.

                      NAME2
                (Laughing)
                I don't think that's good! We were
                at the club, smoking, laughing -- doing
                stuff.
Edited answer: going back to your original file, if we can assume that every character name is preceded by 22 whitespace characters, we can do this:
example = """
                      NAME1
                Yo, Yo, good that you're here
                man.

                      NAME2
                (Laughing)
                I don't think that's good! We were
                at the club, smoking, laughing -- doing
                stuff.
"""
lines = example.split('\n')
# Name lines are the ones indented by at least 22 spaces
characters = [line for line in lines if line.startswith(' ' * 22)]
result = {c.strip(): [] for c in characters}
current = ''
for line in lines:
    if line in characters:
        # A new speaker starts here
        current = line.strip()
    elif current:
        # Any other line belongs to the most recent speaker
        result[current].append(line.strip())
The result is now:
{'NAME1': ["Yo, Yo, good that you're here", 'man.', ''], 'NAME2': ['(Laughing)', "I don't think that's good! We were", 'at the club, smoking, laughing -- doing', 'stuff.', '']}
This may need some additional cleanup, for example dropping the empty strings.
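One possible cleanup step, as a minimal sketch: drop the empty strings that blank lines in the script leave behind. The helper name `drop_empty` is made up for illustration, and the `result` dictionary is copied from the output shown above.

```python
# Output of the loop above, copied here so the sketch runs on its own.
result = {
    'NAME1': ["Yo, Yo, good that you're here", 'man.', ''],
    'NAME2': ['(Laughing)', "I don't think that's good! We were",
              'at the club, smoking, laughing -- doing', 'stuff.', ''],
}


def drop_empty(lines_by_name):
    """Return a new dict with the empty strings removed from every list."""
    return {
        name: [line for line in lines if line]
        for name, lines in lines_by_name.items()
    }


cleaned = drop_empty(result)
print(cleaned['NAME1'])  # ["Yo, Yo, good that you're here", 'man.']
```

Whether stage directions such as '(Laughing)' should also be filtered out depends on what you need later, so the sketch leaves them in.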
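Method 1:
Split on '\n' and strip. The first element of the list is the name, and the rest are your lines. list.pop modifies your list in place.
This solution won't work if your dialogue has multiple lines.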
>>> dialogue
'NAME1\n                abc adbaiuho saidainbw\n                sadi waiudi qoweoq asodhoqndoqndqwdq.\n                qiudwqd aisdiqnd asfiqwofnqofoweqomdomkmq!!'
>>> lines = list(map(str.strip, dialogue.split('\n')))
>>> lines
['NAME1', 'abc adbaiuho saidainbw', 'sadi waiudi qoweoq asodhoqndoqndqwdq.', 'qiudwqd aisdiqnd asfiqwofnqofoweqomdomkmq!!']
>>> name = lines.pop(0)
>>> name
'NAME1'
>>> lines
['abc adbaiuho saidainbw', 'sadi waiudi qoweoq asodhoqndoqndqwdq.', 'qiudwqd aisdiqnd asfiqwofnqofoweqomdomkmq!!']
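The same steps can be wrapped in a small function. This is only a sketch: `parse_simple` and the sample entry are invented for illustration. Because the name is simply the whole first line, two-word names need no special handling.

```python
def parse_simple(entry):
    """Return (name, lines) for one entry: a name line followed by indented dialogue."""
    # splitlines() splits on the newlines; strip() removes the indentation.
    parts = [part.strip() for part in entry.splitlines()]
    name = parts[0]
    # Skip fragments that were blank lines in the script.
    lines = [part for part in parts[1:] if part]
    return name, lines


# Made-up sample entry with a two-word name and a blank line in the middle.
sample = 'MR SMITH\n' + ' ' * 16 + 'First line\n\n' + ' ' * 16 + 'second line.'
name, lines = parse_simple(sample)
print(name)   # MR SMITH
print(lines)  # ['First line', 'second line.']
```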
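Method 2:
When you have multi-line dialogue, i.e. the dialogue itself may contain '\n' characters, first split on the first occurrence of '\n'. The first element will be the name; the second element is then split further on the run of 16 spaces.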
>>> dialogue
'NAME1\n                abc adbaiuho saidainbw\n                sadi waiudi qoweoq asodhoqndoqndqwdq.\n                qiudwqd aisdiqnd asfiqwofnqofoweqomdomkmq!!'
>>> parse_temp = dialogue.split('\n',1)
>>> name = parse_temp[0]
>>> lines = parse_temp[1].split(" " * 16)[1:]
>>> name
'NAME1'
>>> lines
['abc adbaiuho saidainbw\n', 'sadi waiudi qoweoq asodhoqndoqndqwdq.\n', 'qiudwqd aisdiqnd asfiqwofnqofoweqomdomkmq!!']
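As a function:
def parse(dialogue):
    # Split once, on the first newline: name before it, dialogue after it
    parse_temp = dialogue.split('\n',1)
    name = parse_temp[0].strip()
    # Split the dialogue on the 16-space indent; the first piece is empty, so skip it
    lines = list(map(str.strip, parse_temp[1].split(" " * 16)[1:]))
    return name, lines
Note: for the second split, you can substitute whatever whitespace pattern your data actually has. You could even split with a regular expression. I used a plain run of 16 spaces here.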
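As a rough illustration of the regular-expression option mentioned in the note, the sketch below splits on a newline together with whatever whitespace surrounds it, so the indent does not have to be exactly 16 spaces. The name `parse_re` is hypothetical.

```python
import re


def parse_re(entry):
    """Return (name, lines) for one entry, splitting the dialogue with a regex."""
    # partition() splits once on the first newline: name before, dialogue after.
    name, _, body = entry.partition('\n')
    # Split on each newline plus any surrounding whitespace,
    # which also swallows blank lines and the indentation.
    pieces = re.split(r'\s*\n\s*', body.strip())
    lines = [piece for piece in pieces if piece]
    return name.strip(), lines


dialogue = ('NAME1\n' + ' ' * 16 + 'abc adbaiuho saidainbw\n'
            + ' ' * 16 + 'sadi waiudi qoweoq asodhoqndoqndqwdq.\n'
            + ' ' * 16 + 'qiudwqd aisdiqnd asfiqwofnqofoweqomdomkmq!!')
name, lines = parse_re(dialogue)
print(name)   # NAME1
print(lines)
```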
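Code added in response to the request for iterating over the list:
data = dict()
for _dialogue in dialogue:  # here `dialogue` is the full list of entries
    name, lines = parse(_dialogue)
    # Append to whatever has already been collected for this character
    data[name] = data.get(name, list()) + lines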
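An alternative way to accumulate the lines, sketched with `collections.defaultdict` so a missing name starts with an empty list automatically. The `parse` function is repeated so the snippet runs on its own, and the entries in `dialogues` are made-up sample data.

```python
from collections import defaultdict


def parse(dialogue):
    # Same approach as the function above: name first, then split on 16 spaces.
    parse_temp = dialogue.split('\n', 1)
    name = parse_temp[0].strip()
    lines = list(map(str.strip, parse_temp[1].split(" " * 16)[1:]))
    return name, lines


# Made-up sample entries in the 'NAME\n<16 spaces>line' format.
dialogues = [
    'NAME1\n' + ' ' * 16 + 'Yo, Yo, good that\n' + ' ' * 16 + "you're here man.",
    'NAME2\n' + ' ' * 16 + "I don't think that's good!",
    'NAME1\n' + ' ' * 16 + 'Why not?',
]

data = defaultdict(list)
for entry in dialogues:
    name, lines = parse(entry)
    data[name].extend(lines)  # extend in place instead of building a new list

data = dict(data)  # back to a plain dict
print(data['NAME1'])  # ['Yo, Yo, good that', "you're here man.", 'Why not?']
```

Note that `parse` assumes every entry contains at least one '\n'; an entry consisting of a name alone would raise an IndexError.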
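- Split the text lines
- Create a dictionary with a unique key for each actor
- Add each actor's lines to the dictionary
Edit: added a space to the name regex and stripped whitespace from the names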
import re
# Sample entries; each "\n" is a real newline followed by 16 spaces
lines = [
    "Dialogue[0] 'NAME1 \n                YO, YO, good that you're here man.'",
    "Dialogue[1] 'NAME 1\n                YO, YO, ",
    "Dialogue[2] 'NAME2\n                YO, YO, good that ",
    "Dialogue[3] 'NAME2\n                YO, YO, good that you're here'",
]
# Group 1: the name (capitals, digits, spaces); group 2: the first dialogue line
regex = re.compile(r"'([A-Z 0-9]+)\n[ ]{16}(.+)")
lineslist = [re.findall(regex, line) for line in lines]
lineslist = [match[0] for match in lineslist if len(match)]
keys = [l[0].strip() for l in lineslist]
result = {k: [] for k in set(keys)}
for l in lineslist:
    result[l[0].strip()].append(l[1])
result
Output:
{'NAME 1': ['YO, YO, '],
'NAME1': ["YO, YO, good that you're here man.'"],
'NAME2': ['YO, YO, good that ', "YO, YO, good that you're here'"]}
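Since `.` does not match a newline by default, the pattern above captures only the first dialogue line of each entry. A hypothetical variant for multi-line entries, assuming the list items are the raw strings themselves (without the `Dialogue[n] '` prefix), could look like this; `pattern` and `entries` are names invented for the sketch.

```python
import re

# re.DOTALL lets '.' cross newlines, so group 2 captures all dialogue lines.
pattern = re.compile(r"([A-Z0-9 ]+)\n(.+)", re.DOTALL)

# Made-up sample entries in the 'NAME\n<16 spaces>line' format.
entries = [
    'NAME 1\n' + ' ' * 16 + 'Yo, Yo, good that\n' + ' ' * 16 + "you're here man.",
    'NAME2\n' + ' ' * 16 + "I don't think that's good!",
]

result = {}
for entry in entries:
    match = pattern.match(entry)
    if match is None:
        continue  # entry does not start with an upper-case name line
    name = match.group(1).strip()
    lines = [line.strip() for line in match.group(2).splitlines()]
    result.setdefault(name, []).extend(lines)

print(result)
```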