Combine Audio and Images in Stream

Question

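I would like to be able to create images on the fly, create audio on the fly, and combine the two into an rtmp stream (for Twitch or YouTube). The goal is to do this in Python 3, since that is the language my bot is written in. Bonus points for not having to save to disk.

So far I have figured out how to use ffmpeg to stream to an rtmp server by loading a PNG image and looping it, loading an mp3, and combining the two in the stream. The problem is that I have to load at least one of them from a file.

I know I can use Moviepy to create videos, but I cannot figure out whether I can stream the video from Moviepy to ffmpeg or directly to rtmp. I think I would have to generate a lot of very short clips and send them, but I want to know if there is an existing solution.

I have also heard that OpenCV can stream to rtmp, but that it cannot handle audio.

A redacted version of an ffmpeg command I have successfully tested is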

ffmpeg -loop 1 -framerate 15 -i ScreenRover.png -i "Song-Stereo.mp3" -c:v libx264 -preset fast -pix_fmt yuv420p -threads 0 -f flv rtmp://SITE-SUCH-AS-TWITCH/.../STREAM-KEY

or

cat Song-Stereo.mp3 | ffmpeg -loop 1 -framerate 15 -i ScreenRover.png -i - -c:v libx264 -preset fast -pix_fmt yuv420p -threads 0 -f flv rtmp://SITE-SUCH-AS-TWITCH/.../STREAM-KEY

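I know these commands are not set up properly for smooth streaming. The result manages to screw up both Twitch's and YouTube's players, and I will have to figure out how to fix that.

The problem is that I don't think I can stream both the image and the audio when I create them on the spot. I have to load one of them from the hard drive. This becomes a problem when trying to react to commands, user chat, or anything else that requires a live reaction. I also do not want to wear out my hard drive by constantly saving to it.

As for the Python code, what I have tried so far in order to create a video is the following. It still saves to the HD and does not respond in real time, so it is not very useful to me. The video itself is fine, with one exception: as time goes on, the clock shown in the QR code and the video's own clock drift farther and farther apart as the video nears its end. I can work around that limitation if it shows up while live streaming.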

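import numpy  # needed by make_frame; the other imports are listed further down

def make_frame(t):
  # t is the clip time in seconds; render it into a QR code image
  img = qrcode.make("Hello! The second is %s!" % t)
  # Moviepy expects each frame as an RGB numpy array
  return numpy.array(img.convert("RGB"))

# Render 120 seconds of generated frames to a GIF on disk
clip = mpy.VideoClip(make_frame, duration=120)
clip.write_gif("test.gif", fps=15)

# Load the GIF back from disk and re-encode it as an MP4
gifclip = mpy.VideoFileClip("test.gif")
gifclip.set_duration(120).write_videofile("test.mp4", fps=15)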

My goal is to be able to produce something along the lines of the following pseudocode:

original_video = qrcode_generator("I don't know, a clock, pyotp, today's news sources, just anything that can be generated on the fly!")
original_video.overlay_text(0,0,"This is some sample text, the left two are coordinates, the right three are font, size, and color", Times_New_Roman, 12, Blue)
original_video.add_audio(sine_wave_generator(0,180,2)) # frequency min-max, seconds

# NOTICE - I did not add any time measurements to the actual video itself. The whole point is that this is a live stream and not a video clip, so the time frame would be now. The 2 seconds listed above is for our pseudo sine wave generator to know how long the audio clip should be, not for the actual streaming library.

stream.send_to_rtmp_server(original_video) # Doesn't matter if ffmpeg or some native library

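The example above is what I am after: creating video in Python and then streaming it. I am not trying to create a clip and stream it later. I am trying to have the program respond to outside events and then update its stream to do whatever it wants. It is a bit like a chat bot, but with video instead of text.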

def track_movement(...):
  ...
  return ...

original_video = user_submitted_clip(chat.lastVideoMessage)
original_video.overlay_text(0,0,"The robot watches the user's movements and puts a blue square around it.", Times_New_Roman, 12, Blue)
original_video.add_audio(sine_wave_generator(0,180,2)) # frequency min-max, seconds

# It would be awesome if I could also figure out how to perform advanced actions such as tracking movements or pulling a face out of a clip and then applying effects to it on the fly. I know OpenCV can track movements and I hear that it can work with streams, but I cannot figure out how that works. Any help would be appreciated! Thanks!

Since I forgot to add the imports, here are some useful imports I have in my file!

import pyotp
import qrcode
from io import BytesIO
from moviepy import editor as mpy

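The pyotp library is for generating one-time verification codes, qrcode is for the QR codes, BytesIO is for virtual files, and moviepy is what I used to generate the GIF and MP4. I believe BytesIO might be useful for piping data to the streaming service, but how that happens depends entirely on how the data is sent to the service, whether through ffmpeg on the command line (from subprocess import Popen, PIPE) or through a native library.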
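To make the BytesIO idea concrete, here is a minimal sketch of generating audio entirely in memory. It uses only the standard library and stands in for the sine_wave_generator from the pseudocode. The function names, the 44100 Hz sample rate, and the mono 16-bit format are assumptions chosen for illustration, not anything the question specifies.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed value)

def sine_wave_pcm(frequency_hz, seconds, sample_rate=SAMPLE_RATE, volume=0.5):
    """Return mono signed 16-bit little-endian PCM bytes for a sine tone."""
    n_samples = int(seconds * sample_rate)
    amplitude = int(volume * 32767)
    samples = (
        int(amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)

def wav_in_memory(pcm_bytes, sample_rate=SAMPLE_RATE):
    """Wrap raw PCM in a WAV container held in a BytesIO, never on disk."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    buffer.seek(0)            # rewind so a reader starts at the header
    return buffer

pcm = sine_wave_pcm(440, 2)   # two seconds of a 440 Hz tone
wav_file = wav_in_memory(pcm)
```

The raw bytes in pcm could in principle be written straight to the stdin of an ffmpeg process started with Popen, as long as ffmpeg is told the format with input options such as -f s16le -ar 44100 -ac 1 -i -.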

Answer 1

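Are you running ffmpeg.exe commands from CMD? If so, you can use either the concat demuxer or a pipe. With the concat demuxer, ffmpeg takes its image inputs from a text file. The text file should contain the image paths, and ffmpeg can find the images even if they are in different folders. The following line shows how to use the concat demuxer. The image locations are saved in the input.txt file.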

ffmpeg -f concat -i input.txt -vsync vfr -pix_fmt yuv420p output.mp4
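As a rough illustration of what input.txt could contain, the sketch below builds the list in Python using the concat demuxer's file and duration directives. The helper name, the example paths, and the fixed per-image duration are assumptions made for this example. Note that this route still needs the images to exist on disk.

```python
def concat_list(image_paths, seconds_per_image):
    """Return concat demuxer list text: one file/duration pair per image."""
    lines = []
    for path in image_paths:
        # Paths containing single quotes would need escaping; not handled here
        lines.append(f"file '{path}'")
        lines.append(f"duration {seconds_per_image}")
    return "\n".join(lines) + "\n"

# Hypothetical image paths, possibly in different folders
listing = concat_list(["frames/a.png", "other/b.png"], 2)
print(listing)
```

Writing the returned text to input.txt gives the command above something to read.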

However, the most suitable solution is to use a data pipe to feed the images to ffmpeg.

cat *.png | ffmpeg -f image2pipe -i - output.mkv

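You can check this link for more information about data pipes in ffmpeg.

Generating multiple videos and streaming them in real time is not a very stable solution. You can run into several problems.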
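The same piping idea can be driven from Python instead of cat. The following is a minimal sketch, not a tested streaming setup: it feeds raw RGB frames to ffmpeg over stdin and raw PCM audio over a second pipe, so nothing touches the disk. It assumes a POSIX system (it relies on pass_fds), an ffmpeg build with libx264 and aac, and placeholder frame and audio generators. All function names, the frame size, and the audio format are assumptions. It also does not pace its writes to wall-clock time, which a real live stream would need.

```python
import os
import subprocess
import threading

WIDTH, HEIGHT, FPS = 320, 240, 15      # assumed frame geometry and rate
SAMPLE_RATE, CHANNELS = 44100, 2       # assumed audio format

def make_frame(index):
    """Return one raw RGB24 frame: a solid color that shifts over time."""
    shade = (index * 4) % 256
    return bytes((shade, 0, 255 - shade)) * (WIDTH * HEIGHT)

def make_silence(seconds):
    """Return signed 16-bit PCM silence; swap in generated audio here."""
    return b"\x00\x00" * int(SAMPLE_RATE * seconds) * CHANNELS

def build_command(audio_fd, url):
    """Build an ffmpeg command: video from stdin, audio from audio_fd."""
    return [
        "ffmpeg",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{WIDTH}x{HEIGHT}", "-r", str(FPS), "-i", "pipe:0",
        "-f", "s16le", "-ar", str(SAMPLE_RATE), "-ac", str(CHANNELS),
        "-i", f"pipe:{audio_fd}",
        "-c:v", "libx264", "-preset", "fast", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-f", "flv", url,
    ]

def stream(url, seconds=10):
    """Feed generated frames and audio to ffmpeg through two pipes."""
    audio_read, audio_write = os.pipe()
    proc = subprocess.Popen(build_command(audio_read, url),
                            stdin=subprocess.PIPE, pass_fds=(audio_read,))
    os.close(audio_read)  # the child process keeps its own copy

    def feed_audio():
        # Runs in a thread so audio and video writes cannot block each other
        with os.fdopen(audio_write, "wb") as audio:
            for _ in range(seconds):
                audio.write(make_silence(1))

    worker = threading.Thread(target=feed_audio)
    worker.start()
    for index in range(seconds * FPS):
        proc.stdin.write(make_frame(index))
    proc.stdin.close()
    worker.join()
    return proc.wait()
```

Calling stream with the rtmp URL would start the process. Replacing make_frame and make_silence with real generators (for example, a QR code converted to RGB bytes) is where the bot's own logic would go.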

Answer 2

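I decided to use Gstreamer to create my stream on the fly. It lets me take separate video and audio streams and combine them. I do not have an exact working example right now, but I hope to either find an answer or work it out myself soon.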
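For readers curious what combining separate streams might look like, here is a hedged sketch using the Python GStreamer bindings. It is not the working example mentioned above. It assumes PyGObject and the x264enc, voaacenc, flvmux, and rtmpsink plugins are installed, and it uses the videotestsrc and audiotestsrc test sources as stand-ins for generated content. The function names are made up for this example.

```python
def build_pipeline(url):
    """Return a gst-launch style description muxing video and audio to rtmp."""
    # Each branch ends in "mux." which links it to the named flvmux element
    video = ("videotestsrc is-live=true ! videoconvert ! "
             "x264enc tune=zerolatency ! queue ! mux.")
    audio = ("audiotestsrc is-live=true ! audioconvert ! "
             "voaacenc ! queue ! mux.")
    sink = f'flvmux name=mux streamable=true ! rtmpsink location="{url}"'
    return f"{sink} {video} {audio}"

def run(url):
    """Start the pipeline and block until interrupted (needs GStreamer)."""
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import GLib, Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(build_pipeline(url))
    pipeline.set_state(Gst.State.PLAYING)
    try:
        GLib.MainLoop().run()
    finally:
        pipeline.set_state(Gst.State.NULL)

description = build_pipeline("rtmp://example.invalid/live/KEY")
```

To push frames and samples generated in Python rather than test patterns, the test sources would presumably be replaced with appsrc elements, which is beyond this sketch.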


ffmpeg

rtmp

python-3.x

moviepy