How to match text in an .srt file and get the timestamp of the line in which the text occurs

The return value should be the start time of that sentence.

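import re

key = input("ENTER THE KEY PHRASE")

with open('tcs.srt', 'r') as file:
    for line in file:
        # '^' anchors the match to the start of the line, so a phrase that
        # appears in the middle of a line is never found by this search.
        if re.search(r'^%s' % re.escape(key), line, re.I):
            print(line)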

For example:

Search key: milestone

To be found in: 0:01:25,299 --> 0:01:31,099 one of the significant milestones and great momentum in many of the areas that

The start time 0:01:25,299 should be returned in seconds (85.299).

Instead of a regular expression, you can test for the phrase with if key in line and extract the timestamp with str.split.

For example:

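key = input("ENTER THE KEY PHRASE")

with open('tcs.srt', 'r') as file:
    for line in file:
        if key in line:
            # Assumes the timestamp and the text share one line, as in the
            # example above; the first whitespace-separated token is the start time.
            print(line.split()[0])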

Code:

import re

text = "0:01:25,299 --> 0:01:31,099 one of the significant milestones and great momentum in many of the areas that"
# Match H:MM:SS,mmm (or HH:MM:SS,mmm); ':' and ',' need no escaping.
print(re.findall(r"\d{1,2}:\d{2}:\d{2},\d{3}", text))

Output:

['0:01:25,299', '0:01:31,099']
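
Since the question asks for the start time in seconds, the matched string still has to be converted. Here is a minimal sketch of one way to do that with capture groups; the names TIMESTAMP and first_timestamp_seconds are made up for illustration, and the sample text is shortened from the question.

```python
import re

# Capture groups for hours, minutes, seconds and milliseconds.
TIMESTAMP = re.compile(r"(\d{1,2}):(\d{2}):(\d{2}),(\d{3})")

def first_timestamp_seconds(text):
    """Return the first timestamp found in text as seconds, or None if there is none."""
    match = TIMESTAMP.search(text)
    if match is None:
        return None
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000

text = "0:01:25,299 --> 0:01:31,099 one of the significant milestones"
print(first_timestamp_seconds(text))
```

Because search stops at the first match, the end time after --> is ignored.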

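An .srt file contains timestamps and subtitles. The time format is hours:minutes:seconds,milliseconds. Here is a function that takes a line of the form hours:minutes:seconds,milliseconds --> hours:minutes:seconds,milliseconds and returns the first timestamp in seconds.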

def return_seconds(line):
    # Take the start timestamp (the part before "-->") and split it into
    # hours, minutes, seconds and milliseconds.
    time_values = line[:line.index("-->")].strip().replace(",", ":").split(":")
    hours, minutes, seconds, milliseconds = map(int, time_values)
    return hours * 3600 + minutes * 60 + seconds + milliseconds / 1000

key = input("ENTER THE KEY PHRASE")

timestamp_line = ""

with open('tcs.srt', 'r') as file:
    for line in file:
        if "-->" in line:
            # Remember the most recent timestamp line, because the subtitle
            # text below it can span more than one line.
            timestamp_line = line
        elif key in line and timestamp_line:
            print("Starting seconds at line is {}".format(return_seconds(timestamp_line)))
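
If the file follows the usual .srt layout (a counter line, a timestamp line, one or more text lines, then a blank line), another option is to work on whole cues rather than single lines. The following is only a sketch under that assumption; to_seconds and find_start_times are hypothetical helper names, the sample cues are invented apart from the line quoted in the question, and the match is made case-insensitive to mirror the re.I flag in the question.

```python
import re

def to_seconds(stamp):
    """Convert 'H:MM:SS,mmm' to seconds as a float."""
    clock, millis = stamp.strip().split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000

def find_start_times(srt_text, key):
    """Return the start time in seconds of every cue whose text contains key."""
    results = []
    # Assumption: cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:
                # Everything after the timestamp line is the cue text.
                cue_text = " ".join(lines[i + 1:])
                if key.lower() in cue_text.lower():
                    results.append(to_seconds(line.split("-->")[0]))
                break
    return results

sample = """1
0:01:25,299 --> 0:01:31,099
one of the significant milestones and
great momentum in many of the areas that

2
0:01:31,100 --> 0:01:35,000
we have been working on
"""
print(find_start_times(sample, "Milestone"))
```

Joining the text lines of a cue also lets a phrase match when it is split across two lines of the same subtitle.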