Python: How to convert PyAudio bytes into a virtual file?

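In short

Is there a way to convert raw audio data (obtained with the PyAudio module) into a virtual file (like the object returned by Python's open() function), without saving it to disk and reading it back? Details are below.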

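What I'm doing

I am using PyAudio to record audio and then feeding it into a TensorFlow model to get a prediction. Currently, it works when I first save the recorded sound to disk as a .wav file and then read it back to feed the model. Here is the code for recording: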

import pyaudio
import wave

CHUNK_LENGTH = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 1

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK_LENGTH)

print("* recording")
frames = [stream.read(RATE * RECORD_SECONDS)]  # here is the recorded data, in the form of list of bytes
print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()
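
Reading in smaller chunks is a common alternative to one large stream.read() call. The sketch below assumes a hypothetical helper, read_in_chunks, and uses a stand-in class (_SilentStream, not part of PyAudio) so that it runs without a microphone. With real hardware, the PyAudio stream opened above would be passed in instead.

```python
class _SilentStream:
    """Hypothetical stand-in for a PyAudio input stream (16-bit mono silence)."""

    def read(self, num_frames):
        # Each 16-bit mono frame is two bytes.
        return b"\x00\x00" * num_frames


def read_in_chunks(stream, rate, seconds, chunk_length):
    """Collect `rate * seconds` frames by reading at most `chunk_length` frames per call."""
    frames = []
    remaining = rate * seconds
    while remaining > 0:
        n = min(chunk_length, remaining)
        frames.append(stream.read(n))
        remaining -= n
    return frames


# One second at 16 kHz, read 1024 frames at a time.
frames = read_in_chunks(_SilentStream(), rate=16000, seconds=1, chunk_length=1024)
```

The result has the same shape as the frames list above (a list of bytes objects), so the rest of the code would not need to change.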

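After obtaining the raw audio data (the variable frames), it can be saved with the Python wave module as shown below. Note that when saving, some metadata must be stored as well by calling functions such as wf.setxxx.
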
import os
from datetime import datetime

output_dir = "data/"
output_path = output_dir + "{:%Y%m%d_%H%M%S}.wav".format(datetime.now())

if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# save the recorded data as wav file using python `wave` module
wf = wave.open(output_path, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

Here is the code that uses the saved file to run inference with the TensorFlow model. It simply reads the file as binary, and the model handles the rest.

import classifier  # my tensorflow model

with open(output_path, 'rb') as f:
    w = f.read()
    classifier.run_graph(w, labels, 5)

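Problem

For real-time use, I need to keep streaming audio and feed it into the model every so often. But repeatedly saving the file to disk and reading it back seems unreasonable, since it wastes time on I/O.

I want to keep the data in memory and use it directly, rather than saving and reading it over and over. However, the Python wave module does not support reading and writing at the same time (see here).

If I feed the data directly, without the metadata (e.g. channels, frame rate) that the wave module adds when saving, like this: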

w = b''.join(frames)
classifier.run_graph(w, labels, 5)

I get the following error:

2021-04-07 11:05:08.228544: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected RIFF but found 
Traceback (most recent call last):
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Header mismatch: Expected RIFF but found
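
The message suggests that the decoder expects a complete WAV file, which starts with the ASCII tag RIFF, while the joined frames are bare samples. As a minimal sketch of where the metadata lives, the block below writes one second of silence (stand-in data, not a real recording) through the wave module into memory and unpacks the header fields. It assumes the plain PCM header layout that the wave module writes.

```python
import io
import struct
import wave

raw_pcm = b"\x00\x00" * 16000  # stand-in data: 1 s of 16-bit mono silence

# Write a WAV container into memory instead of onto disk.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(raw_pcm)
wav_bytes = buffer.getvalue()

# First 12 bytes: "RIFF", total size, "WAVE".
riff_tag, _riff_size, wave_tag = struct.unpack("<4sI4s", wav_bytes[:12])

# Next 24 bytes: the "fmt " chunk carrying channels, sample rate, bit depth.
(fmt_tag, _fmt_size, audio_format, channels,
 sample_rate, _byte_rate, _block_align, bits_per_sample) = struct.unpack(
    "<4sIHHIIHH", wav_bytes[12:36])

print(riff_tag, wave_tag, channels, sample_rate, bits_per_sample)
print(raw_pcm[:4] == b"RIFF")  # the raw samples carry no such tag
```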

The TensorFlow model I am using is available here: ML-KWS-for-MCU, in case it helps. This is the code that produces the error (classifier.run_graph()):

def run_graph(wav_data, labels, num_top_predictions):
    """Runs the audio data through the graph and prints predictions."""
    with tf.Session() as sess:
        #   Feed the audio data as input to the graph.
        #   predictions  will contain a two-dimensional array, where one
        #   dimension represents the input image count, and the other has
        #   predictions per class
        softmax_tensor = sess.graph.get_tensor_by_name("labels_softmax:0")
        predictions, = sess.run(softmax_tensor, {"wav_data:0": wav_data})

        # Sort to show labels in order of confidence
        top_k = predictions.argsort()[-num_top_predictions:][::-1]
        for node_id in top_k:
            human_string = labels[node_id]
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))

        return 0

You should be able to use io.BytesIO instead of a physical file. The two share the same interface, but a BytesIO object lives only in memory:

import io
import wave

container = io.BytesIO()
wf = wave.open(container, 'wb')
wf.setnchannels(4)
wf.setsampwidth(4)
wf.setframerate(4)
wf.writeframes(b'abcdef')

# Read the data up to this point
container.seek(0)
data_package = container.read()

# add some more data...
wf.writeframes(b'ghijk')

# read the data added since last
container.seek(len(data_package))
data_package = container.read()

This should let you keep streaming data into the file while your TensorFlow code reads the newly added data.
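
Note that a read starting past the header returns only the appended samples. If the consumer needs a complete WAV each time (as the RIFF error in the question suggests), one option is to wrap each recorded batch in its own in-memory container. A minimal sketch, assuming a hypothetical helper named frames_to_wav_bytes and silence as stand-in data:

```python
import io
import wave


def frames_to_wav_bytes(frames, channels, sample_width, rate):
    """Wrap raw PCM chunks in a WAV container held entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    # wave does not close a file object it did not open itself,
    # so the buffer is still readable here.
    return buffer.getvalue()


# Stand-in for the list returned by the PyAudio recording code.
frames = [b"\x00\x00" * 16000]
wav_data = frames_to_wav_bytes(frames, channels=1, sample_width=2, rate=16000)
```

The resulting bytes have the same layout as the contents of the saved .wav file, so they could presumably be passed wherever f.read() was used, for example classifier.run_graph(wav_data, labels, 5).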