How to make an SRT file into a dataset?
Is it possible to convert an SRT file (used for video subtitles) into a dataset?
When imported into Excel, the SRT file looks like this:
1
00:00:03,000 --> 00:00:04,000
OVERLAPS PURE COINCIDENCE THAT
...
This pattern repeats for the whole length of the video transcript. I would like to format the SRT file like this:
number ; start ; end ; text
1 ; 00:00:03,000 ; 00:00:04,000 ; OVERLAPS PURE COINCIDENCE THAT
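The VBA procedure below loads a standard .srt (SubRip movie subtitle) file from a local path and splits it into rows and columns on the active Excel worksheet.
Import SRT subtitles from a local file: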
Sub importSRTfromFile(fName As String)
    'Loads SRT from local file and converts to columns in Active Worksheet
    Dim sIn As String, sOut As String, sArr() As String, x As Long
    'load file
    Open fName For Input As #1
    While Not EOF(1)
        Line Input #1, sIn
        sOut = sOut & sIn & vbLf
    Wend
    Close #1
    'convert LFs to delimiters & split into array
    sOut = Replace(sOut, vbLf & vbLf, vbCr)
    sOut = Replace(Replace(sOut, vbLf, "|"), " --> ", "|")
    sArr = Split(sOut, vbCr)
    'check if activesheet is blank
    If ActiveSheet.UsedRange.Cells.Count > 1 Then
        If MsgBox(UBound(sArr) & " rows found." & vbLf & vbLf & _
            "Okay to clear worksheet '" & ActiveSheet.Name & "'?", _
            vbOKCancel, "Delete Existing Data?") <> vbOK Then Exit Sub
        ActiveSheet.Cells.ClearContents
    End If
    'breakout into rows
    For x = 1 To UBound(sArr)
        Range("A" & x) = sArr(x)
    Next x
    'split into columns
    Columns("A:A").TextToColumns Destination:=Range("A1"), _
        DataType:=xlDelimited, Other:=True, OtherChar:="|"
    MsgBox "Imported " & UBound(sArr) & " rows from:" & vbLf & fName
End Sub
Usage example:
Sub test_FileImport()
    importSRTfromFile "c:\yourPath\yourFilename.srt"
End Sub
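For anyone who wants the same result outside Excel, here is a minimal Python sketch of the idea behind the procedure above: split the file on blank lines, then break each block into number, start, end and text. The function name parse_srt and the second sample cue are made up for illustration, and multi-line subtitle text is joined with a space rather than spread over extra columns as the VBA version does.

```python
import re

def parse_srt(text):
    """Split SRT text into (number, start, end, text) tuples."""
    # Normalise Windows/old-Mac line endings so blank-line splitting works.
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip()
    rows = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text):
        lines = block.split("\n")
        if len(lines) < 2 or " --> " not in lines[1]:
            continue  # skip malformed blocks rather than guess
        start, end = lines[1].split(" --> ", 1)
        # Join multi-line subtitle text with a single space.
        rows.append((lines[0].strip(), start.strip(), end.strip(),
                     " ".join(lines[2:])))
    return rows

# First cue is from the question; the second is invented test data.
sample = (
    "1\n00:00:03,000 --> 00:00:04,000\nOVERLAPS PURE COINCIDENCE THAT\n\n"
    "2\n00:00:04,000 --> 00:00:06,500\nSECOND LINE\nCONTINUES HERE\n"
)

for row in parse_srt(sample):
    print(" ; ".join(row))
```

The list of tuples can be handed to whatever tool you use for the dataset; the semicolon-joined print is only there to mirror the layout asked for in the question.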
Import SRT subtitles from a website URL:
Alternatively, you can import a .srt (or other similar text file) directly from a website URL, such as https://subtitle-index.org/ :
Sub importSRTfromWeb(url As String)
    'Loads SRT from URL and converts to columns in Active Worksheet
    Dim sIn As String, sOut As String, sArr() As String, rw As Long
    Dim httpData() As Byte, XMLHTTP As Object
    'load file from URL
    Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
    XMLHTTP.Open "GET", url, False
    XMLHTTP.send
    httpData = XMLHTTP.responseBody
    Set XMLHTTP = Nothing
    sOut = StrConv(httpData, vbUnicode)
    'convert LFs to delimiters & split into array
    sOut = Replace(sOut, vbLf & vbLf, vbCr)
    sOut = Replace(Replace(sOut, vbLf, "|"), " --> ", "|")
    sArr = Split(sOut, vbCr)
    'check if activesheet is blank
    If ActiveSheet.UsedRange.Cells.Count > 1 Then
        If MsgBox(UBound(sArr) & " rows found." & vbLf & vbLf & _
            "Okay to clear worksheet '" & ActiveSheet.Name & "'?", _
            vbOKCancel, "Delete Existing Data?") <> vbOK Then Exit Sub
        ActiveSheet.Cells.ClearContents
    End If
    'breakout into rows
    For rw = 1 To UBound(sArr)
        Range("A" & rw) = sArr(rw)
    Next rw
    'split into columns
    Columns("A:A").TextToColumns Destination:=Range("A1"), _
        DataType:=xlDelimited, Other:=True, OtherChar:="|"
    MsgBox "Imported " & UBound(sArr) & " rows from:" & vbLf & url
End Sub
Usage example:
Sub testImport()
    importSRTfromWeb _
        "https://subtitle-index.org/download/4670541854528212663953859964/SRT/Pulp+Fiction"
End Sub
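Many sites offer .srt files for free; you may have to right-click the download button to copy the link (which may have a .srt extension or may be a pointer, as in the example above). The procedure will not work on .zip files.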
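One thing to watch with downloaded files: the raw bytes may start with a UTF-8 byte-order mark and may use CRLF line endings, in which case two consecutive LF characters never occur and a blank-line split finds nothing. The Python sketch below shows one way to clean the bytes before splitting. The helper name decode_srt_bytes is hypothetical, and UTF-8 is an assumption; a real file may use another encoding.

```python
def decode_srt_bytes(data, encoding="utf-8-sig"):
    """Decode downloaded bytes and normalise line endings to LF."""
    # "utf-8-sig" drops a leading UTF-8 byte-order mark if one is present.
    text = data.decode(encoding, errors="replace")
    # CRLF first, then any remaining bare CR.
    return text.replace("\r\n", "\n").replace("\r", "\n")

# Sample bytes: BOM + the cue from the question, with CRLF line endings.
raw = (b"\xef\xbb\xbf1\r\n00:00:03,000 --> 00:00:04,000\r\n"
       b"OVERLAPS PURE COINCIDENCE THAT\r\n\r\n")
text = decode_srt_bytes(raw)
print(repr(text))
```

In practice the bytes would come from the download itself, for example urllib.request.urlopen(url).read().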
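In the code above, this loop:
'breakout into rows
For rw = 1 To UBound(sArr)
    Range("A" & rw) = sArr(rw)
Next rw
should be replaced with:
'breakout into rows
For rw = 0 To UBound(sArr)
    Range("A" & rw + 1) = sArr(rw)
Next rw
Otherwise the output starts from the second subtitle, because Split returns a zero-based array and sArr(0) is never written.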
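The off-by-one is easy to see in a small Python imitation of the two loops. This is only an illustration: the cue strings are made up, and a dict keyed by sheet row stands in for the worksheet.

```python
# Made-up cue strings; index 0 holds the first cue, as with VBA's Split.
cues = "1|a\r2|b\r3|c".split("\r")

# Original loop: For rw = 1 To UBound(sArr): Range("A" & rw) = sArr(rw)
original = {rw: cues[rw] for rw in range(1, len(cues))}

# Corrected loop: For rw = 0 To UBound(sArr): Range("A" & rw + 1) = sArr(rw)
corrected = {rw + 1: cues[rw] for rw in range(0, len(cues))}

print(original)   # first cue is missing
print(corrected)  # every cue lands on its own row, starting at row 1
```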
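I use Vim and wrote a quick set of regular expressions to convert .srt files to .csv for a translator friend who needed a similar conversion. The csv file can then be opened in Excel / LibreOffice and saved as .xls, .ods or another format.
My friend did not need the subtitle number in the first column, so the regex code looks like this: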
set fileencoding=utf-8
%s/"/""/g
g/^\d\+$/d
%s@^\(.*\) --> \(.*\)\n@"\1","\2","@g
%s/\n^$/"/g
A variant that keeps the subtitle number:
set fileencoding=utf-8
%s/"/""/g
%s@\(^\d\+\)$\n^\(.*\) --> \(.*\)\n@"\1","\2","\3","@g
%s/\n^$/"/g
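Save this code to a text file with a .vim extension, then source that file while editing your .srt file in Vim / Gvim. Save the result as a .csv. Enjoy the magic of regular expressions!
Note: my code uses commas as the field separator. Change the commas in the code above to semicolons to use semicolons instead. I also added double quotes as string delimiters, in case double quotes or commas appear in the subtitle text. Much more error proof!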
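The same quoting rules (wrap every field in double quotes, double any quote inside a field) are what Python's standard csv module applies, so a rough equivalent of the Vim output can be sketched as follows. The sample row is invented to show a subtitle that contains both a comma and double quotes; the semicolon delimiter matches the layout asked for in the question.

```python
import csv
import io

# Invented row: the text field contains a comma and double quotes.
rows = [("1", "00:00:03,000", "00:00:04,000", 'He said "hi", then left')]

buf = io.StringIO()
writer = csv.writer(buf, delimiter=";", quoting=csv.QUOTE_ALL,
                    lineterminator="\n")
writer.writerow(("number", "start", "end", "text"))  # header row
writer.writerows(rows)

print(buf.getvalue())

# Reading it back recovers the original text, quotes and comma included.
parsed = list(csv.reader(io.StringIO(buf.getvalue()), delimiter=";"))
```

To write to a file instead of a string buffer, open it with newline="" as the csv documentation recommends.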