Python: How to convert PyAudio bytes into a virtual file?
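In short
Is there a way to convert raw audio data (obtained with the PyAudio module) into a virtual file (like the object returned by Python's open() function) without saving it to disk and reading it back? Details follow.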
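What I'm doing
I'm using PyAudio to record audio, which is then fed into a TensorFlow model to get a prediction. Currently, it works when I first save the recorded sound as a .wav file on disk and then read it back to feed it into the model. Here is the code for recording and saving: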
import pyaudio
import wave
CHUNK_LENGTH = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 1
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK_LENGTH)
print("* recording")
frames = [stream.read(RATE * RECORD_SECONDS)] # here is the recorded data, in the form of list of bytes
print("* done recording")
stream.stop_stream()
stream.close()
p.terminate()
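After obtaining the raw audio data (the variable frames), it can be saved with the Python wave module as shown below. Note that when saving, some metadata must be stored by calling functions such as wf.setxxx.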
import os
from datetime import datetime

output_dir = "data/"
output_path = output_dir + "{:%Y%m%d_%H%M%S}.wav".format(datetime.now())
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
# save the recorded data as wav file using python `wave` module
wf = wave.open(output_path, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
Here is the code that uses the saved file to run inference on the TensorFlow model. It simply reads the file as binary, and the model handles the rest.
import classifier # my tensorflow model
with open(output_path, 'rb') as f:
    w = f.read()
classifier.run_graph(w, labels, 5)
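Question
For real-time needs, I need to keep streaming audio and feed it to the model every so often. Saving the file to disk and reading it back again and again seems unreasonable, since it wastes a lot of time on I/O.
I want to keep the data in memory and use it directly instead of saving and reading it repeatedly. However, the Python wave module does not support reading and writing simultaneously (refer here).
If I feed the data directly, without the metadata (e.g. channels, frame rate) that the wave module adds during saving, like this: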
w = b''.join(frames)
classifier.run_graph(w, labels, 5)
I get the following error:
2021-04-07 11:05:08.228544: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected RIFF but found
Traceback (most recent call last):
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Header mismatch: Expected RIFF but found
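The message is consistent with the decoder expecting a complete WAV container, which starts with the ASCII tag RIFF, while b''.join(frames) holds only bare PCM samples. A minimal sketch of that check follows; the helper name looks_like_wav is hypothetical and is not part of PyAudio, wave, or TensorFlow.

```python
def looks_like_wav(data: bytes) -> bool:
    """Return True if data begins with a RIFF/WAVE container header."""
    # Bytes 0-3 hold "RIFF", bytes 4-7 the chunk size, bytes 8-11 "WAVE".
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


# Raw 16-bit mono PCM, standing in for what a PyAudio stream returns:
# samples only, no header.
raw_pcm = b"\x00\x00" * 16000

# The first 12 bytes of a WAV file: tag, little-endian size, form type.
wav_prefix = b"RIFF" + b"\x24\x00\x00\x00" + b"WAVE"

print(looks_like_wav(raw_pcm))     # False
print(looks_like_wav(wav_prefix))  # True
```

This only inspects the leading tags; it does not validate the rest of the header.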
The TensorFlow model I'm using is available here: ML-KWS-for-MCU. I hope this helps.
Here is the code that produces the error (classifier.run_graph()):
def run_graph(wav_data, labels, num_top_predictions):
    """Runs the audio data through the graph and prints predictions."""
    with tf.Session() as sess:
        # Feed the audio data as input to the graph.
        # predictions will contain a two-dimensional array, where one
        # dimension represents the input image count, and the other has
        # predictions per class
        softmax_tensor = sess.graph.get_tensor_by_name("labels_softmax:0")
        predictions, = sess.run(softmax_tensor, {"wav_data:0": wav_data})
        # Sort to show labels in order of confidence
        top_k = predictions.argsort()[-num_top_predictions:][::-1]
        for node_id in top_k:
            human_string = labels[node_id]
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))
        return 0
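You should be able to use io.BytesIO instead of a physical file. The two share the same interface, but a BytesIO object is held only in memory:
import io
import wave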
container = io.BytesIO()
wf = wave.open(container, 'wb')
wf.setnchannels(4)
wf.setsampwidth(4)
wf.setframerate(4)
wf.writeframes(b'abcdef')
# Read the data up to this point
container.seek(0)
data_package = container.read()
# add some more data...
wf.writeframes(b'ghijk')
# read the data added since last
container.seek(len(data_package))
data_package = container.read()
This should allow you to keep streaming data into the file while reading the newly added data with your TensorFlow code.
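Applied to the question's setup, one possible variation is to wrap each recorded window in its own in-memory WAV container, so that every payload passed to the model carries a complete header. The sketch below assumes 16-bit mono audio at 16 kHz, as in the question; the helper name pcm_to_wav_bytes is hypothetical, and the result has not been tested against the ML-KWS-for-MCU graph.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, channels: int = 1, sample_width: int = 2,
                     rate: int = 16000) -> bytes:
    """Wrap raw PCM samples in a WAV container held entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)
    # Closing the Wave_write object finalizes the header but leaves the
    # BytesIO open, so its contents can still be retrieved.
    return buffer.getvalue()


# One second of silence standing in for b''.join(frames) from PyAudio.
pcm = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav_bytes(pcm)
print(wav_bytes[:4])  # b'RIFF'
```

With the question's variables, the call would look like classifier.run_graph(pcm_to_wav_bytes(b''.join(frames), CHANNELS, p.get_sample_size(FORMAT), RATE), labels, 5). Building a fresh buffer per window avoids reusing a header whose length fields describe an earlier chunk.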