How do I use ffmpeg to concatenate 2 videos and add a period of silence + black screen between them?

Question

There are several existing examples, but I'd like a single command that uses -filter_complex to do this, without extra steps such as generating blank video/audio files.

After looking around, this is the best I've come up with so far:

ffmpeg -i video1.mp4 -i video2.mp4 -filter_complex "color=black:s=960x540:d=1[b0];aevalsrc=0:d=1[s0];[0:v:0][0:a:0][b0][s0][1:v:0][1:a:0]concat=n=3:v=1:a=1[outv][outa]" -map '[outv]' -map '[outa]' out.mp4

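My understanding of how this command breaks down:

Specify 2 inputs
Define a filter_complex filtergraph
Define a 1 s black video stream, labeled b0
Define a 1 s silent audio stream, labeled s0
Pass three segments to concat, in order:

First segment: [0:v:0][0:a:0] // video1.mp4
Second segment: [b0][s0] // the streams I defined above
Third segment: [1:v:0][1:a:0] // video2.mp4

Define 2 output streams, [outv] for video and [outa] for audio
Mux them into out.mp4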
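The breakdown above matches how concat reads its inputs: segments are listed in playback order, and within each segment the video pad comes before the audio pad (for v=1:a=1). As a minimal POSIX sh sketch, the hypothetical helper concat_inputs below assembles that part of the filtergraph from a list of segments. It only builds the string; it does not address the frame-rate problem discussed in the answer.

```shell
#!/bin/sh
# Hypothetical helper: build the input side of a concat filter.
# Each argument is one segment written as "VIDEO_LABEL AUDIO_LABEL";
# for every segment the video pad is listed first, then its audio pad.
concat_inputs() {
  pads=""
  n=0
  for seg in "$@"; do
    v=${seg% *}   # text before the space: video label
    a=${seg#* }   # text after the space: audio label
    pads="$pads[$v][$a]"
    n=$((n + 1))
  done
  # One video and one audio output stream.
  printf '%sconcat=n=%d:v=1:a=1[outv][outa]\n' "$pads" "$n"
}

# The three segments from the question: video1.mp4, the filler, video2.mp4.
concat_inputs "0:v:0 0:a:0" "b0 s0" "1:v:0 1:a:0"
```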

All input files are 960x540 as expected and use the same audio and video codecs, but ffmpeg gives me this error and starts using 100% CPU:

More than 1000 frames duplicated

I think something is wrong with the streams I defined in the filter. What is it? Do I need to specify more parameters somewhere?

Edit: here is the MediaInfo output for the input video's metadata:

Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/iso2/mp41)
File size                                : 7.82 MiB
Duration                                 : 31 s 449 ms
Overall bit rate                         : 2 086 kb/s
Writing application                      : Lavf59.16.100

Video
ID                                       : 1
Format                                   : HEVC
Format/Info                              : High Efficiency Video Coding
Format profile                           : Main@L3@Main
Codec ID                                 : hvc1
Codec ID/Info                            : High Efficiency Video Coding
Duration                                 : 31 s 449 ms
Bit rate                                 : 1 973 kb/s
Maximum bit rate                         : 2 000 kb/s
Width                                    : 960 pixels
Height                                   : 540 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Constant
Frame rate                               : 23.976 (24000/1001) FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0 (Type 0)
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.159
Stream size                              : 7.40 MiB (95%)
Writing library                          : x265 3.4+31-6722fce1f:[Mac OS X][clang 12.0.0][64 bit] 8bit+10bit+12bit
Encoding settings                        : cpuid=1111039 / frame-threads=4 / wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=960x540 / interlace=0 / total-frames=0 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=3 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=23 / keyint=250 / gop-lookahead=0 / bframes=4 / b-adapt=2 / b-pyramid / bframe-bias=0 / rc-lookahead=20 / lookahead-slices=0 / scenecut=40 / hist-scenecut=0 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=3 / limit-refs=1 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / no-frame-dup / no-hme / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / sao / no-sao-non-deblock / rd=3 / selective-sao=4 / early-skip / rskip / no-fast-intra / no-tskip-fast / no-cu-lossless / b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=abr / bitrate=2000 / qcomp=0.60 / qpstep=4 / stats-write=1 / stats-read=0 / slow-firstpass / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=1 / overscan=0 / videoformat=5 / range=0 / colorprim=1 / transfer=1 / colormatrix=1 / chromaloc=1 / chromaloc-top=0 / chromaloc-bottom=0 / display-window=0 / cll=0,0 / min-luma=0 / max-luma=255 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / hist-threshold=0.03 / no-opt-cu-delta-qp / no-aq-motion / no-hdr10 / no-hdr10-opt / no-dhdr10-opt / no-idr-recovery-sei / 
analysis-reuse-level=0 / analysis-save-reuse-level=0 / analysis-load-reuse-level=0 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=1 / refine-ctu-distortion=0 / no-limit-sao / ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / no-svt / no-field / qp-adaptation-range=1.00 / scenecut-aware-qp=0conformance-window-offsets / right=0 / bottom=0 / decoder-max-rate=0 / no-vbv-live-multi-pass
Color range                              : Limited
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709
Codec configuration box                  : hvcC

Audio
ID                                       : 2
Format                                   : AAC LC
Format/Info                              : Advanced Audio Codec Low Complexity
Codec ID                                 : mp4a-40-2
Duration                                 : 31 s 449 ms
Source duration                          : 31 s 492 ms
Source_Duration_LastFrame                : -17 ms
Bit rate mode                            : Constant
Bit rate                                 : 106 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 48.0 kHz
Frame rate                               : 46.875 FPS (1024 SPF)
Compression mode                         : Lossy
Stream size                              : 405 KiB (5%)
Source stream size                       : 406 KiB (5%)
Title                                    : Core Media Audio
Language                                 : English
Default                                  : Yes
Alternate group                          : 1

Answer 1

warning/error

More than 1000 frames duplicated

stems from the concat filter; as @Gyan eloquently put it in a comment under the question:

This usually happens when the videos are VFR or inputs have different frame rates or frame rate can't be ascertained by the concat filter.

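So make sure all video streams have the same frame rate, including the filler stream created by the color source filter, which defaults to 25 fps while these inputs run at 23.976 (24000/1001) fps. The same applies to the sample rate of the audio streams.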
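Since the filler has to copy the inputs' size, frame rate and sample rate, one way to make that explicit is to pass those values in instead of hard-coding them. Below is a minimal POSIX sh sketch; gap_filter is a hypothetical helper name, and the values in the example call are the ones from the MediaInfo output in the question (960x540, 24000/1001 fps, 48 kHz). The ffmpeg invocation is left as a comment because it needs the real input files.

```shell
#!/bin/sh
# Hypothetical helper: build the filter_complex string for
#   input 0 -> black/silent gap -> input 1
# with the filler's size, frame rate and sample rate passed in explicitly
# so they can be set to match the inputs.
gap_filter() {
  size=$1   # frame size, e.g. 960x540
  fps=$2    # frame rate, e.g. 24000/1001
  srate=$3  # audio sample rate in Hz, e.g. 48000
  dur=$4    # gap length in seconds
  printf '%s;%s;%s\n' \
    "color=black:s=$size:r=$fps:d=$dur[b0]" \
    "aevalsrc=0:s=$srate:d=$dur[s0]" \
    "[0:v:0][0:a:0][b0][s0][1:v:0][1:a:0]concat=n=3:v=1:a=1[outv][outa]"
}

# Values taken from the MediaInfo output: 960x540, 24000/1001 fps, 48 kHz.
filter=$(gap_filter 960x540 24000/1001 48000 1)
echo "$filter"
# With the real input files this would be passed to ffmpeg as:
# ffmpeg -i video1.mp4 -i video2.mp4 -filter_complex "$filter" \
#   -map '[outv]' -map '[outa]' out.mp4
```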

So try:

ffmpeg -i video1.mp4 -i video2.mp4 \
  -filter_complex "color=black:s=960x540:r=24000/1001:d=1[b0];\
                   aevalsrc=0:s=48000:d=1[s0];\
                   [0:v:0][0:a:0][b0][s0][1:v:0][1:a:0]concat=n=3:v=1:a=1[outv][outa]" \
  -map '[outv]' -map '[outa]' out.mp4

Note that aevalsrc's default sample rate is 44100 Hz, while these inputs are 48 kHz (see the MediaInfo output), so s=48000 is set explicitly to match.
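The same pattern extends to more than two inputs: each input contributes [i:v:0][i:a:0], and a black/silent filler is inserted after every input except the last, so N inputs give 2N - 1 concat segments. A POSIX sh sketch, assuming all inputs share the same size, frame rate and sample rate; multi_gap_filter is a hypothetical helper, not an ffmpeg feature, and the file names in the comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: concatenate N inputs (N >= 2) with the same
# black/silent gap between each consecutive pair.
# Assumes every input shares the given size, frame rate and sample rate.
multi_gap_filter() {
  n=$1      # number of inputs
  size=$2   # e.g. 960x540
  fps=$3    # e.g. 24000/1001
  srate=$4  # e.g. 48000
  dur=$5    # gap length in seconds
  chains=""
  pads=""
  i=0
  while [ "$i" -lt "$n" ]; do
    pads="$pads[$i:v:0][$i:a:0]"
    # Add a filler after every input except the last. Each gap gets its own
    # color/aevalsrc pair, since a labeled pad can only be consumed once.
    if [ "$i" -lt $((n - 1)) ]; then
      chains="${chains}color=black:s=$size:r=$fps:d=$dur[b$i];"
      chains="${chains}aevalsrc=0:s=$srate:d=$dur[s$i];"
      pads="$pads[b$i][s$i]"
    fi
    i=$((i + 1))
  done
  # N inputs plus N - 1 fillers = 2N - 1 segments.
  printf '%s%sconcat=n=%d:v=1:a=1[outv][outa]\n' "$chains" "$pads" $((2 * n - 1))
}

# Example with three inputs; a real run would be, for instance:
# ffmpeg -i a.mp4 -i b.mp4 -i c.mp4 \
#   -filter_complex "$(multi_gap_filter 3 960x540 24000/1001 48000 1)" \
#   -map '[outv]' -map '[outa]' out.mp4
multi_gap_filter 3 960x540 24000/1001 48000 1
```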
