Counting the words a character said in a movie script
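With some help I have already managed to extract the spoken words from the script.
Now I am looking for a way to get the text spoken by a chosen person,
so that I can type in MIA and get every single word she says in the movie.
Like this: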
name = input("Enter name:")
wordsspoken(script, name)
name1 = input("Enter another name:")
wordsspoken(script, name1)
That way I can count the words afterwards.
This is what the movie script looks like:
An awkward beat. They pass a wooden SALOON -- where a WESTERN
is being shot. Extras in COWBOY costumes drink coffee on the
steps.
                                                Revision 25.
                   MIA (CONT'D)
      I love this stuff. Makes coming to work
      easier.
                   SEBASTIAN
      I know what you mean. I get breakfast
      five miles out of the way just to sit
      outside a jazz club.
                   MIA
      Oh yeah?
                   SEBASTIAN
      It was called Van Beek. The swing bands
      played there. Count Basie. Chick Webb.
            (then,)
      It's a samba-tapas place now.
                   MIA
      A what?
                   SEBASTIAN
      Samba-tapas. It's... Exactly. The joke's on
      history.
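I would first ask the user for all the names in the script, then ask which name they want the words for. I would search the text word by word until I found the wanted name, then copy the following words into a variable until I hit a name that matches someone else in the script. A character could of course say another character's name in dialogue, but if you assume that speaker headings are either all caps or on a line of their own, the text should be fairly easy to filter.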
recording = False
spoken_text = ""
for word in script.split():  # iterate over words, not characters
    if word == speaker and word.isupper():  # you may want to check that this is on its own line as well.
        recording = True
    elif word in character_names and word.isupper():  # you may want to check that this is on its own line as well.
        recording = False
    elif recording:
        spoken_text += word + " "
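The "on its own line" check mentioned in the comments can be sketched by walking the script line by line instead of word by word. This is only a minimal illustration: the function name `spoken_text_for` and the sample text are made up for the example, and it assumes a cue line holds nothing but a character name plus an optional parenthetical such as (CONT'D). Stage directions that follow a speech would still be picked up, so real scripts need extra filtering.

```python
def spoken_text_for(script, speaker, character_names):
    """Collect the words that follow `speaker` cue lines (hypothetical helper)."""
    recording = False
    spoken = []
    for line in script.splitlines():
        # A cue is a line that holds only a character name, optionally
        # followed by a parenthetical such as (CONT'D).
        cue = line.split("(")[0].strip()
        if cue in character_names and cue.isupper():
            # Start recording on the wanted speaker, stop on anyone else.
            recording = (cue == speaker)
            continue
        if recording and line.strip():
            spoken.extend(line.split())
    return " ".join(spoken)


sample = """\
                   MIA (CONT'D)
      I love this stuff. Makes coming to work
      easier.
                   SEBASTIAN
      I know what you mean, MIA
                   MIA
      Oh yeah?
"""

text = spoken_text_for(sample, "MIA", {"MIA", "SEBASTIAN"})
print(text)
print(len(text.split()))
```

Because the name is compared against the whole line, Sebastian saying "MIA" inside his dialogue does not switch the recording on.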
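If you want to compute your tally with a single pass over the script (which I imagine could be quite long), you can simply keep track of which character is currently speaking, setting things up like a small state machine: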
import re
from collections import Counter, defaultdict

words_spoken = defaultdict(Counter)
currently_speaking = 'Narrator'

for line in SCRIPT.split('\n'):
    name = line.replace('(CONT\'D)', '').strip()
    if re.match('^[A-Z]+$', name):
        currently_speaking = name
    else:
        words_spoken[currently_speaking].update(line.split())
You could use a more sophisticated regex to detect when the speaker changes, but this should do the trick.
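As one possible shape for such a regex, the sketch below also accepts multi-word names and any trailing parenthetical. The pattern `CUE`, the function name `tally_words`, and the sample lines are assumptions made for illustration, not part of the original answer. All-caps scene headings such as "INT. HOUSE - DAY" would still be mistaken for speakers, so treat it as a starting point.

```python
import re
from collections import Counter, defaultdict

# Assumed cue format: an all-caps name (spaces, periods, apostrophes and
# hyphens allowed, at least two characters) with an optional trailing
# parenthetical such as (CONT'D).
CUE = re.compile(r"^\s*([A-Z][A-Z .'\-]*[A-Z.])\s*(?:\([^)]*\))?\s*$")


def tally_words(script):
    """Return {speaker: Counter of words}, switching speaker on cue lines."""
    words_spoken = defaultdict(Counter)
    currently_speaking = "Narrator"
    for line in script.split("\n"):
        match = CUE.match(line)
        if match:
            currently_speaking = match.group(1)
        else:
            words_spoken[currently_speaking].update(line.split())
    return words_spoken


sample = "\n".join([
    "They pass a wooden SALOON.",
    "                   MIA (CONT'D)",
    "      I love this stuff.",
    "                   MR. SMITH",
    "      Oh yeah? Oh yeah?",
])

tally = tally_words(sample)
print(sum(tally["MIA"].values()))   # total words for MIA
print(tally["MR. SMITH"]["Oh"])     # how often MR. SMITH says "Oh"
```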
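I will outline how you could generate a dict that gives you the number of words spoken by every speaker, and a version that approximates your existing implementation.
General Use
If we define a word to be any chunk of characters in a string split along ' ' (space)...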
import re

speaker = ''  # current speaker
words = 0  # number of words on line
word_count = {}  # dict of speakers and the number of words they speak

for line in script.split('\n'):
    if re.match('^[ ]{19}[^ ]{1,}.*', line):  # name of speaker
        speaker = line.split(' (')[0][19:]
    if re.match('^[ ]{6}[^ ]{1,}.*', line):  # dialogue line
        words = len(line.split())
        if speaker in word_count:
            word_count[speaker] += words
        else:
            word_count[speaker] = words
This generates a dictionary of the form {'JOHN DOE':55} if John Doe says 55 words.
Example output:
>>> word_count['MIA']
13
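The same idea can be written a little more compactly with `collections.Counter`, which also makes it easy to rank speakers. This is a sketch under the same layout assumption as above (speaker names indented by 19 spaces, dialogue by 6). The function name `count_words` and the shortened sample are made up for the example.

```python
import re
from collections import Counter


def count_words(script):
    """Count words per speaker, assuming 19-space cues and 6-space dialogue."""
    word_count = Counter()
    speaker = None
    for line in script.split("\n"):
        if re.match(r"^ {19}\S", line):  # name of speaker
            speaker = line.split(" (")[0].strip()
        elif re.match(r"^ {6}\S", line) and speaker is not None:  # dialogue line
            word_count[speaker] += len(line.split())
    return word_count


sample = "\n".join([
    " " * 19 + "MIA (CONT'D)",
    " " * 6 + "I love this stuff. Makes coming to work",
    " " * 6 + "easier.",
    " " * 19 + "SEBASTIAN",
    " " * 6 + "I know what you mean.",
    " " * 19 + "MIA",
    " " * 6 + "Oh yeah?",
])

counts = count_words(sample)
print(counts.most_common())  # speakers ordered by word count
```

A `Counter` returns 0 for a name that never speaks, so looking up an unknown character does not raise a `KeyError`.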
Your Implementation
Here is a version of the above procedure that approximates your implementation.
import re

def wordsspoken(script, name):
    word_count = 0
    speaker = ''  # no speaker seen yet
    for line in script.split('\n'):
        if re.match('^[ ]{19}[^ ]{1,}.*', line):  # name of speaker
            speaker = line.split(' (')[0][19:]
        if re.match('^[ ]{6}[^ ]{1,}.*', line):  # dialogue line
            if speaker == name:
                word_count += len(line.split())
    print(word_count)

def main():
    # `script` is assumed to already hold the full script text
    name = input("Enter name:")
    wordsspoken(script, name)
    name1 = input("Enter another name:")
    wordsspoken(script, name1)
There are some good ideas above. The following should work just fine in Python 2.x and 3.x:
import codecs
from collections import defaultdict

speaker_words = defaultdict(str)

with codecs.open('script.txt', 'r', 'utf8') as f:
    speaker = ''
    for line in f.read().split('\n'):
        # skip empty lines
        if not line.split():
            continue
        # speakers have their names in all uppercase
        first_word = line.split()[0]
        if (len(first_word) > 1) and all([char.isupper() for char in first_word]):
            # remove the (CONT'D) from a speaker string
            speaker = line.split('(')[0].strip()
        # check if this is a dialogue line
        elif len(line) - len(line.lstrip()) == 6:
            speaker_words[speaker] += line.strip() + ' '

# get a Python-version-agnostic input
try:
    prompt = raw_input
except NameError:  # raw_input does not exist in Python 3
    prompt = input

speaker = prompt('Enter name: ').strip().upper()
print(speaker_words[speaker])
Example output:
Enter name: sebastian
I know what you mean. I get breakfast five miles out of the way just to sit outside a jazz club. It was called Van Beek. The swing bands played there. Count Basie. Chick Webb. It's a samba-tapas place now. Samba-tapas. It's... Exactly. The joke's on history.
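Once a character's text is collected into one string like this, counting the words is a small extra step. The sketch below is just one way to do it: the helper name `word_stats` is made up, and it assumes that punctuation should be ignored and that case does not matter. A plain `len(text.split())` is enough if punctuation attached to words is acceptable.

```python
import re
from collections import Counter


def word_stats(spoken_text):
    """Return (total word count, per-word frequencies) for a block of dialogue."""
    # Treat runs of letters, digits, apostrophes and hyphens as words.
    words = re.findall(r"[A-Za-z0-9'-]+", spoken_text.lower())
    return len(words), Counter(words)


text = "Oh yeah? A what? Oh."
total, freq = word_stats(text)
print(total)       # number of words spoken
print(freq["oh"])  # how often "oh" occurs
```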