Return Only Specific Values With Python Script
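I am a complete beginner with Python and am trying to write a script that finds black video and silent audio in a file and returns only the time instances at which they occur.
I have the following code, which uses the ffmpeg-python wrapper to get the values on stdout, but I can't figure out an efficient way to parse stdout or stderr so that it returns only the instances of black_start, black_end, black_duration, silence_start, silence_end, silence_duration.
For those who are not ffmpeg experts: putting ffmpeg aside, how can I use re.findall or something similar to define a regex that returns only the values above?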
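import ffmpeg

# `source` is the path to the media file to analyse (defined elsewhere)
stream = ffmpeg.input(source)  # renamed from `input` to avoid shadowing the builtin
video = stream.video.filter('blackdetect', d=0, pix_th=0.00)
audio = stream.audio.filter('silencedetect', d=0.1, n='-60dB')
out = ffmpeg.output(audio, video, 'out.null', format='null')
run = out.run_async(pipe_stdout=True, pipe_stderr=True)
result = run.communicate()  # tuple of (stdout, stderr), both bytes
print(result)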
This produces the ffmpeg output, which contains the results I need. Here is the output (edited for brevity):
(b'', b"ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
built with Apple clang version 11.0.0 (clang-1100.0.33.17)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.2.2_3 --enable-shared --enable-pthreads --...
[silencedetect @ 0x7fdd82d011c0] silence_start: 0
frame= 112 fps=0.0 q=-0.0 size=N/A time=00:00:05.00 bitrate=N/A speed=9.96x
[blackdetect @ 0x7fdd82e06580] black_start:0 black_end:5 black_duration:5
[silencedetect @ 0x7fdd82d011c0] silence_end: 5.06285 | silence_duration: 5.06285
frame= 211 fps=210 q=-0.0 size=N/A time=00:00:09.00 bitrate=N/A speed=8.97x
frame= 319 fps=212 q=-0.0 size=N/A time=00:00:13.00 bitrate=N/A speed=8.63x
frame= 427 fps=213 q=-0.0 size=N/A time=00:00:17.08 bitrate=N/A speed=8.51x
frame= 537 fps=214 q=-0.0 size=N/A time=00:00:22.00 bitrate=N/A speed=8.77x
frame= 650 fps=216 q=-0.0 size=N/A time=00:00:26.00 bitrate=N/A speed=8.63x
frame= 761 fps=217 q=-0.0 size=N/A time=00:00:31.00 bitrate=N/A speed=8.82x
frame= 874 fps=218 q=-0.0 size=N/A time=00:00:35.00 bitrate=N/A speed=8.71x
frame= 980 fps=217 q=-0.0 size=N/A time=00:00:39.20 bitrate=N/A speed=8.67x
...
frame= 5680 fps=213 q=-0.0 size=N/A time=00:03:47.20 bitrate=N/A speed=8.53x
[silencedetect @ 0x7fdd82d011c0] silence_start: 227.733
[silencedetect @ 0x7fdd82d011c0] silence_end: 229.051 | silence_duration: 1.3184
[silencedetect @ 0x7fdd82d011c0] silence_start: 229.051
[blackdetect @ 0x7fdd82e06580] black_start:229.28 black_end:230.24 black_duration:0.96
frame= 5757 fps=214 q=-0.0 Lsize=N/A time=00:03:50.28 bitrate=N/A speed=8.54x
video:3013kB audio:43178kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect @ 0x7fdd82d011c0] silence_end: 230.28 | silence_duration: 1.22856
\n")
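What is the most efficient way to parse the output so that it finds and returns only those result values, so that I can build more logic on them in my code? In this case, I only need the following values returned: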
silence_start: 0
silence_end: 5.06285
silence_duration: 5.06285
black_start:0
black_end:5
black_duration:5
silence_start: 227.733
silence_end: 229.051
silence_duration: 1.3184
black_start:229.28
black_end:230.24
black_duration:0.96
silence_start: 229.051
silence_end: 230.28
silence_duration: 1.22856
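I have tried many different re.findall() options with regular expressions, but the closest I have come is returning the names of the values. For example, if I add this to the above:
import re
found = re.findall(r'\b' + 'silence_end' + r'\b', str(result))
print(found)
all I get is the names: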
['silence_end', 'silence_end', 'silence_end']
Borrowing from Mikel's answer:
import re

regex = re.compile(r'''
    [\S]+:                # a key (any word followed by a colon)
    (?:
        \s                # then a space in between
        (?!\S+:)\S+\d+    # then a value (a word not followed by a colon, ending in a digit)
    )                     # exactly one value per key
''', re.VERBOSE)
matches = regex.findall(text)  # text: the ffmpeg output as a string
matches
['configuration: --prefix=/usr/local/Cellar/ffmpeg/4.2.2_3',
'silence_end: 5.06285',
'silence_duration: 5.06285',
'silence_start: 227.733',
'silence_end: 229.051',
'silence_duration: 1.3184',
'silence_start: 229.051',
'silence_end: 230.28',
'silence_duration: 1.22856']
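The result above is missing silence_start: 0 and every black_* entry, and it also picks up the unrelated configuration line. A minimal sketch of why, running the same pattern on three lines copied from the log in the question (the names `sample` and `borrowed` are made up for this illustration):

```python
import re

# Three lines copied from the ffmpeg log shown in the question.
sample = (
    "[silencedetect @ 0x7fdd82d011c0] silence_start: 0\n"
    "[blackdetect @ 0x7fdd82e06580] black_start:0 black_end:5 black_duration:5\n"
    "[silencedetect @ 0x7fdd82d011c0] silence_end: 5.06285 | silence_duration: 5.06285\n"
)

# Same pattern as above, written on one line.
borrowed = re.compile(r"[\S]+:(?:\s(?!\S+:)\S+\d+)")

found = borrowed.findall(sample)
print(found)
# ['silence_end: 5.06285', 'silence_duration: 5.06285']
```

The pattern requires whitespace after the colon, so entries written as black_start:0 never match, and `\S+\d+` needs at least two characters, so the single-digit value in silence_start: 0 is skipped as well.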
You can use two alternations to cover all the possibilities, then match 1+ digits with an optional dot and 1+ digits:
\b(?:silence|black)_(?:start|end|duration):\s*\d+(?:\.\d+)?\b
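The pattern matches:
\b                            A word boundary
(?:silence|black)_            Either silence or black, followed by an underscore
(?:start|end|duration):\s*    Either start, end, or duration, then : and 0+ whitespace characters
\d+(?:\.\d+)?                 1+ digits with an optional dot-and-digits part
\b                            A word boundary
For example: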
import re
test_str = "your string"
regex = r"\b(?:silence|black)_(?:start|end|duration):\s*\d+(?:\.\d+)?\b"
print(re.findall(regex, test_str))
Output
['silence_start: 0', 'black_start:0', 'black_end:5', 'black_duration:5', 'silence_end: 5.06285', 'silence_duration: 5.06285', 'silence_start: 227.733', 'silence_end: 229.051', 'silence_duration: 1.3184', 'silence_start: 229.051', 'black_start:229.28', 'black_end:230.24', 'black_duration:0.96', 'silence_end: 230.28', 'silence_duration: 1.22856']
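Since the goal is to build more logic on these values, it may help to get names and numbers instead of plain strings. The following is only a sketch under a few assumptions: the helper name `parse_events` and the group names are invented here, and the input is assumed to be the stderr bytes, which in the question's printed tuple is the second element (so something like `parse_events(result[1])`). It uses the same pattern with named groups and converts each value to a float.

```python
import re

# Same pattern as above, with named groups so each part can be read separately.
EVENT_RE = re.compile(
    r"\b(?P<kind>silence|black)_(?P<field>start|end|duration):"
    r"\s*(?P<value>\d+(?:\.\d+)?)\b"
)


def parse_events(log):
    """Return a list of (name, value) pairs found in an ffmpeg log.

    `log` may be bytes (as returned by communicate()) or str.
    """
    if isinstance(log, bytes):
        # Decode instead of calling str() on bytes, which would keep the b"..." repr.
        log = log.decode("utf-8", errors="replace")
    return [
        (m.group("kind") + "_" + m.group("field"), float(m.group("value")))
        for m in EVENT_RE.finditer(log)
    ]


# Lines copied from the log in the question, as bytes.
sample = (
    b"[silencedetect @ 0x7fdd82d011c0] silence_start: 0\n"
    b"[blackdetect @ 0x7fdd82e06580] black_start:0 black_end:5 black_duration:5\n"
    b"[silencedetect @ 0x7fdd82d011c0] silence_end: 5.06285 | silence_duration: 5.06285\n"
)

events = parse_events(sample)
for name, value in events:
    print(name, value)
```

Keeping the pairs in log order preserves which start belongs to which end; grouping them into per-event records would be a further step on top of this.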