How to read an MP3 audio file into a numpy array / save a numpy array to MP3?
Is there a way to read/write an MP3 audio file into/from a numpy array, with an API similar to scipy.io.wavfile.read and scipy.io.wavfile.write:
sr, x = wavfile.read('test.wav')
wavfile.write('test2.wav', sr, x)
?
Note: pydub's AudioSegment object does not provide direct access to a numpy array.
PS: I have already read Importing sound files into Python as NumPy arrays (alternatives to audiolab) and tried all the answers, including those that require running ffmpeg through Popen and reading the content from the stdout pipe, etc. I have also read other related questions and tried the main answers, but there was no simple solution, and the multi-channel case is not easily covered. After spending hours on this, I'm posting it here with "Answer your own question – share your knowledge, Q&A-style".
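For context, the ffmpeg pipe approach mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes an ffmpeg binary is on the PATH, it forces the output to 16-bit little-endian PCM at a fixed sample rate and channel count chosen by the caller, and the helper names pcm_bytes_to_array and read_via_ffmpeg are hypothetical, not part of any library.

```python
import subprocess
import numpy as np

def pcm_bytes_to_array(raw, channels):
    """Turn raw interleaved 16-bit little-endian PCM bytes into a numpy array."""
    y = np.frombuffer(raw, dtype="<i2")
    if channels > 1:
        # Samples are interleaved (L, R, L, R, ...): one row per frame.
        y = y.reshape((-1, channels))
    return y

def read_via_ffmpeg(path, sr=44100, channels=2):
    """Decode an audio file by piping raw PCM out of ffmpeg (assumed to be installed)."""
    cmd = [
        "ffmpeg", "-v", "error", "-i", path,
        "-f", "s16le", "-acodec", "pcm_s16le",   # raw 16-bit little-endian PCM
        "-ac", str(channels), "-ar", str(sr),    # force channel count and sample rate
        "-",                                     # write to stdout
    ]
    raw = subprocess.run(cmd, stdout=subprocess.PIPE, check=True).stdout
    return sr, pcm_bytes_to_array(raw, channels)
```

Note that this resamples and remixes the audio to whatever sr and channels are passed in, instead of reporting the file's own values, which is one of the corner cases that makes the approach awkward.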
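Calling ffmpeg and manually parsing its stdout, as suggested in many posts about reading MP3, is a tedious task (there are many corner cases, because the number of channels can differ, etc.), so here is a working solution using pydub (you need to pip install pydub first).
This code allows reading an MP3 into a numpy array / writing a numpy array to an MP3 file, with an API similar to scipy.io.wavfile.read/write: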
import pydub
import numpy as np

def read(f, normalized=False):
    """MP3 to numpy array"""
    a = pydub.AudioSegment.from_mp3(f)
    y = np.array(a.get_array_of_samples())
    if a.channels == 2:
        # stereo samples are interleaved: one row per frame, one column per channel
        y = y.reshape((-1, 2))
    if normalized:
        return a.frame_rate, np.float32(y) / 2**15
    else:
        return a.frame_rate, y

def write(f, sr, x, normalized=False):
    """numpy array to MP3"""
    channels = 2 if (x.ndim == 2 and x.shape[1] == 2) else 1
    if normalized:  # normalized array - each item should be a float in [-1, 1)
        y = np.int16(x * 2 ** 15)
    else:
        y = np.int16(x)
    song = pydub.AudioSegment(y.tobytes(), frame_rate=sr, sample_width=2, channels=channels)
    song.export(f, format="mp3", bitrate="320k")
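Notes:
- It currently only works for 16-bit files (although 24-bit WAV files are pretty common, I've rarely seen 24-bit MP3 files... do they exist?)
- normalized=True allows working with a float array (each item in [-1, 1))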
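The normalized option amounts to scaling between 16-bit integers and floats by a factor of 2**15. Here is a minimal sketch of that conversion in isolation; the helper names int16_to_float and float_to_int16 are hypothetical, and the rounding and clipping steps are additions not present in the code above. Clipping matters because a float of exactly 1.0 would scale to 32768, which is outside the int16 range.

```python
import numpy as np

def int16_to_float(y):
    """Map int16 samples to float32 values in [-1, 1)."""
    return np.float32(y) / 2**15

def float_to_int16(x):
    """Map floats (nominally in [-1, 1)) to int16, clipping out-of-range values."""
    scaled = np.round(np.asarray(x, dtype=np.float64) * 2**15)
    # Clip so that 1.0 (or anything louder) saturates instead of overflowing.
    return np.clip(scaled, -2**15, 2**15 - 1).astype(np.int16)
```

A round trip through both helpers returns the original int16 samples unchanged, since every int16 value divided by 2**15 is exactly representable in float32.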
Usage example:
sr, x = read('test.mp3')
print(x)
#[[-225 707]
# [-234 782]
# [-205 755]
# ...,
# [ 303 89]
# [ 337 69]
# [ 274 89]]
write('out2.mp3', sr, x)
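If no MP3 file is at hand, a synthetic signal is enough to try the writing side. The following sketch builds a stereo test tone as a float array of shape (n, 2); the function name make_stereo_tone and its default frequencies and amplitude are arbitrary choices for illustration.

```python
import numpy as np

def make_stereo_tone(sr=44100, seconds=1.0, freq_left=440.0,
                     freq_right=660.0, amplitude=0.5):
    """Return a float32 array of shape (n, 2): one sine tone per channel."""
    n = int(sr * seconds)
    t = np.arange(n) / sr  # time stamps in seconds
    left = amplitude * np.sin(2 * np.pi * freq_left * t)
    right = amplitude * np.sin(2 * np.pi * freq_right * t)
    # Column 0 is the left channel, column 1 the right channel.
    return np.stack([left, right], axis=1).astype(np.float32)

x = make_stereo_tone()
```

Because the values are floats in [-1, 1), the array would be passed to the write function from this answer with normalized=True, for example write('tone.mp3', 44100, x, normalized=True).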
You can use the audio2numpy library.
Install it with:
pip install audio2numpy
Then your code would be:
import audio2numpy as a2n
x, sr = a2n.audio_from_file("test.mp3")
For writing, use @Basj's answer.