Python: How to convert PyAudio bytes into a virtual file?

Question

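In short

Is there a way to convert raw audio data (obtained with the PyAudio module) into the form of a virtual file (like what you get from the Python open() function), without saving it to disk and reading it back? Details are below.

What I am doing

I am using PyAudio to record audio, which is then fed into a TensorFlow model to get a prediction. Currently it works when I first save the recorded sound to disk as a .wav file and then read it back to feed it into the model. Here is the code for recording and saving: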

import pyaudio
import wave

CHUNK_LENGTH = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 1

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK_LENGTH)

print("* recording")
frames = [stream.read(RATE * RECORD_SECONDS)]  # here is the recorded data, in the form of list of bytes
print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()

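After I get the raw audio data (the variable frames), it can be saved with the Python wave module as shown below. Note that when saving, some metadata has to be written by calling functions such as wf.setxxx.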

import os
from datetime import datetime

output_dir = "data/"
output_path = output_dir + "{:%Y%m%d_%H%M%S}.wav".format(datetime.now())

if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# save the recorded data as wav file using python `wave` module
wf = wave.open(output_path, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

Here is the code that runs inference on the TensorFlow model using the saved file. It simply reads the file as binary, and the model handles the rest.

import classifier  # my tensorflow model

with open(output_path, 'rb') as f:
    w = f.read()
    classifier.run_graph(w, labels, 5)

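Problem

For real-time use, I need to keep streaming the audio and feed it into the model every so often. But saving the file to disk and reading it back over and over seems unreasonable, and it wastes time on I/O.

I want to keep the data in memory and use it directly instead of saving and reading it repeatedly. However, the Python wave module does not support reading and writing at the same time (refer here).

If I feed the data directly, without the metadata (for example channels and frame rate) that the wave module adds during saving, like this: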

w = b''.join(frames)
classifier.run_graph(w, labels, 5)

I get the following error:

2021-04-07 11:05:08.228544: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected RIFF but found 
Traceback (most recent call last):
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Header mismatch: Expected RIFF but found
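
The message indicates that the decoder expects the input to begin with a WAV container header, whose first four bytes are the ASCII characters RIFF, followed by a 4-byte size field and the characters WAVE. Raw PCM bytes from PyAudio carry no such header. A minimal sketch of a check for it, where looks_like_wav is a hypothetical helper name and the sample bytes are made up for illustration:

```python
def looks_like_wav(data):
    """Return True if the bytes begin with a RIFF/WAVE container header."""
    # Bytes 0-3 hold b'RIFF', bytes 4-7 a size field, bytes 8-11 b'WAVE'.
    return len(data) >= 12 and data[:4] == b'RIFF' and data[8:12] == b'WAVE'


# Made-up stand-in for b''.join(frames): plain 16-bit samples, no header.
raw_pcm = b'\x00\x01' * 8
print(looks_like_wav(raw_pcm))
```

Running a check like this on the data before passing it to the model would show whether the header is missing.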

The TensorFlow model I am using is available here: ML-KWS-for-MCU, in case it helps. Here is the code that produces the error (classifier.run_graph()):

def run_graph(wav_data, labels, num_top_predictions):
    """Runs the audio data through the graph and prints predictions."""
    with tf.Session() as sess:
        #   Feed the audio data as input to the graph.
        #   predictions  will contain a two-dimensional array, where one
        #   dimension represents the input image count, and the other has
        #   predictions per class
        softmax_tensor = sess.graph.get_tensor_by_name("labels_softmax:0")
        predictions, = sess.run(softmax_tensor, {"wav_data:0": wav_data})

        # Sort to show labels in order of confidence
        top_k = predictions.argsort()[-num_top_predictions:][::-1]
        for node_id in top_k:
            human_string = labels[node_id]
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))

        return 0

Answer 1

You should be able to use io.BytesIO instead of a physical file. The two share the same interface, but a BytesIO object is kept only in memory:

import io
import wave
container = io.BytesIO()
wf = wave.open(container, 'wb')
wf.setnchannels(4)
wf.setsampwidth(4)
wf.setframerate(4)
wf.writeframes(b'abcdef')

# Read the data up to this point
container.seek(0)
data_package = container.read()

# add some more data...
wf.writeframes(b'ghijk')

# read the data added since last
container.seek(len(data_package))
data_package = container.read()

This should allow you to keep streaming data into the file while reading the newly added data with your TensorFlow code.
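
One caveat: bytes read from the middle of the buffer are raw frames and do not start with a RIFF header, so a model that decodes WAV input may reject them. One possible workaround is to build a complete in-memory WAV file for each recorded chunk. The sketch below assumes the recording settings from the question (1 channel, 16-bit samples, 16000 Hz); frames_to_wav_bytes is a hypothetical helper name, and the silent audio is only a stand-in for real recorded frames.

```python
import io
import wave


def frames_to_wav_bytes(frames, channels=1, sample_width=2, rate=16000):
    """Wrap raw PCM chunks in a complete in-memory WAV file and return its bytes."""
    buffer = io.BytesIO()
    # wave.open accepts a seekable file-like object, so nothing is written to disk.
    with wave.open(buffer, 'wb') as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # 2 bytes per sample matches pyaudio.paInt16
        wf.setframerate(rate)
        wf.writeframes(b''.join(frames))
    # Closing the writer finalizes the header sizes; the BytesIO object stays open.
    return buffer.getvalue()


# One second of silence standing in for the recorded frames (assumption).
silence = [b'\x00\x00' * 16000]
wav_bytes = frames_to_wav_bytes(silence)
```

With the question's code, the result could then be passed along as classifier.run_graph(frames_to_wav_bytes(frames), labels, 5), on the assumption that the model only needs the bytes of a complete WAV file, as in the disk-based version.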
