Add character names and their lines to a new dictionary from array / list

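I have a movie script. My first task is to collect each character's lines in a dictionary.

Later I need to put the data into a Series.

Right now I have all the dialogue in a list, with each entry starting with the character's name. It is formatted like this:

Dialogue[0] 'NAME1\n(16 spaces)YO, YO, good that you're here.'

All names end with \n, and all dialogue lines start with 16 spaces. I think this could be useful, but I'm not sure how to use it.

I have tried a lot of approaches, with little luck.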

    result = {}
    for lines in dialogue:
        first_token = para.split()[0]
        if first_token.endswith('\n'): #this would be the name
            name, line = para.split(on the new line?)
            name = name.strip()
            if name not in result:
                result[name] = []
            result[name].append(line)
    return result

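This code gives me a whole bunch of errors, so I don't think it is useful to list them here.

Ideally, I need each character as a key in the dictionary, with all of their lines as the value.

Like this:

Name 1: [line 1, line 2, line 3...] Name 2: [line 1, line 2, line 3...]

EDIT: Some character names consist of two words.

EDIT 2: Maybe it would be easier to go back to the original movie script text file.

It is formatted like this: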

          NAME1
Yo, Yo, good that you're here
man.

          NAME2
     (Laughing)
I don't think that's good!  We were
at the club, smoking, laughing -- doing
stuff.

Edited answer: Going back to your original file, if we can assume that every character name is preceded by 22 whitespace characters, we can do this:

example = """
                      NAME1
            Yo, Yo, good that you're here
            man.

                      NAME2
                 (Laughing)
            I don't think that's good!  We were
            at the club, smoking, laughing -- doing
            stuff.
"""

lines = example.split('\n')
characters = [line for line in lines if line.startswith(' ' * 22)]
result = {c.strip(): [] for c in characters}
current = ''
for line in lines:
    if line in characters:
        current = line.strip()
    elif current:
        result[current].append(line.strip())

The result is now:

{'NAME1': ["Yo, Yo, good that you're here", 'man.', ''], 'NAME2': ['(Laughing)', "I don't think that's good!  We were", 'at the club, smoking, laughing -- doing', 'stuff.', '']}

This may need some additional cleanup, for example removing the empty strings left behind by blank lines.
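One possible cleanup pass, as a minimal sketch: it assumes a dictionary shaped like the result shown above and only drops the empty strings that blank lines leave behind. The helper name `clean_lines` is hypothetical, and whether stage directions such as `(Laughing)` should also be removed is left to you.

```python
def clean_lines(result):
    """Return a copy of result with empty strings removed from every list."""
    return {
        name: [line for line in lines if line]
        for name, lines in result.items()
    }


# Sample input copied from the result shown above.
result = {
    'NAME1': ["Yo, Yo, good that you're here", 'man.', ''],
    'NAME2': ['(Laughing)', "I don't think that's good!  We were",
              'at the club, smoking, laughing -- doing', 'stuff.', ''],
}

cleaned = clean_lines(result)
print(cleaned['NAME1'])  # ["Yo, Yo, good that you're here", 'man.']
```

The original dictionary is left untouched, so you can compare before and after while deciding which other fragments to filter.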

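Method 1:

Split on '\n' and strip each piece. The first element of the list is the name, and the rest are your lines. list.pop modifies your list in place. Every '\n' is treated as a separator, so this solution will not work if a single line of dialogue can itself contain '\n'.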

>>> dialogue
'NAME1\n                abc adbaiuho saidainbw\n                sadi waiudi qoweoq asodhoqndoqndqwdq.\n                qiudwqd aisdiqnd asfiqwofnqofoweqomdomkmq!!'
>>> lines = list(map(str.strip, dialogue.split('\n')))
>>> lines
['NAME1', 'abc adbaiuho saidainbw', 'sadi waiudi qoweoq asodhoqndoqndqwdq.', 'qiudwqd aisdiqnd asfiqwofnqofoweqomdomkmq!!']
>>> name = lines.pop(0)
>>> name
'NAME1'
>>> lines
['abc adbaiuho saidainbw', 'sadi waiudi qoweoq asodhoqndoqndqwdq.', 'qiudwqd aisdiqnd asfiqwofnqofoweqomdomkmq!!']
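Method 1 can also be wrapped in a function and applied to the whole list. This is only a sketch: the name `parse_simple` and the sample `dialogues` list are made up for illustration, and `collections.defaultdict` is one of several ways to accumulate the lines per character.

```python
from collections import defaultdict


def parse_simple(dialogue):
    """Method 1 as a function: split on newlines, strip, pop the name."""
    lines = [part.strip() for part in dialogue.split('\n')]
    name = lines.pop(0)  # list.pop removes the first element in place
    return name, lines


# Hypothetical sample data in the format described in the question.
dialogues = [
    'NAME1\n                Yo, Yo, good that you are here.',
    'NAME 2\n                I do not think so.',
    'NAME1\n                Second line.',
]

result = defaultdict(list)
for entry in dialogues:
    name, lines = parse_simple(entry)
    result[name].extend(lines)

print(dict(result))
```

Because only the first '\n' separates the name from the text, two-word names such as 'NAME 2' are kept intact.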

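Method 2:

When you have multi-line dialogue, i.e. the dialogue may contain '\n' characters, first split on the first occurrence of '\n'. The first element is the name; the remainder is then split further on the run of 16 spaces.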

>>> dialogue
'NAME1\n                abc adbaiuho saidainbw\n                sadi waiudi qoweoq asodhoqndoqndqwdq.\n                qiudwqd aisdiqnd asfiqwofnqofoweqomdomkmq!!'
>>> parse_temp = dialogue.split('\n',1)
>>> name = parse_temp[0]
>>> lines = parse_temp[1].split(" " * 16)[1:]
>>> name
'NAME1'
>>> lines
['abc adbaiuho saidainbw\n', 'sadi waiudi qoweoq asodhoqndoqndqwdq.\n', 'qiudwqd aisdiqnd asfiqwofnqofoweqomdomkmq!!']

As a function:

def parse(dialogue):
    parse_temp = dialogue.split('\n',1)
    name = parse_temp[0].strip()
    lines = list(map(str.strip, parse_temp[1].split(" " * 16)[1:]))
    return name, lines

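Note: for the second split, you can substitute whatever whitespace pattern your data actually uses. You could even split with a regular expression. I have used a plain run of 16 spaces here.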
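As a rough illustration of the regular-expression variant mentioned in the note, the sketch below splits wherever a newline is followed by indentation of any width, instead of exactly 16 spaces. The function name `parse_regex` and the pattern are assumptions; adjust the pattern to match your file.

```python
import re


def parse_regex(dialogue):
    """Split one dialogue entry into (name, lines) using a whitespace regex."""
    # Everything before the first newline is the name.
    name, _, body = dialogue.partition('\n')
    # Split where a newline is followed by spaces or tabs, then strip each
    # piece and drop anything that ends up empty.
    parts = re.split(r'\n[ \t]+', body)
    lines = [part.strip() for part in parts if part.strip()]
    return name.strip(), lines


sample = 'NAME1\n                abc def\n                ghi jkl.'
print(parse_regex(sample))  # ('NAME1', ['abc def', 'ghi jkl.'])
```

A newline that is not followed by indentation is left inside the line, which mirrors how the 16-space split behaves.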

Code added in response to the request to iterate over the list:

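# Here `dialogue` is the full list of dialogue strings, not a single entry.
data = {}
for _dialogue in dialogue:
    name, lines = parse(_dialogue)
    # Create the list on first sight of a name, then append this entry's lines.
    data.setdefault(name, []).extend(lines)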
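
  • Split the text into lines
  • Create a dictionary with a unique key for each actor
  • Add each actor's lines to the dictionary

EDIT: Added a space to the name regex and stripped whitespace from names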

import re

lines = [
    "Dialogue[0] 'NAME1 \n                YO, YO, good that you're here man.'",
    "Dialogue[1] 'NAME 1\n                YO, YO, ",
    "Dialogue[2] 'NAME2\n                YO, YO, good that ",
    "Dialogue[3] 'NAME2\n                YO, YO, good that you're here'",
]

# Group 1: the name (uppercase letters, digits, spaces) after the opening quote.
# Group 2: the text that follows the newline and the 16-space indent.
regex = re.compile("'([A-Z 0-9]+)\n[ ]{16}(.+)")

result = {}
for line in lines:
    match = regex.search(line)
    if match:  # skip entries that do not fit the pattern
        name, text = match.groups()
        result.setdefault(name.strip(), []).append(text)
result

Output:

{'NAME 1': ['YO, YO, '],
 'NAME1': ["YO, YO, good that you're here man.'"],
 'NAME2': ['YO, YO, good that ', "YO, YO, good that you're here'"]}