How to read an MP3 audio file into a numpy array / save a numpy array to MP3?

Question

Is there a way to read/write an MP3 audio file into/from a numpy array, with an API similar to scipy.io.wavfile.read and scipy.io.wavfile.write:

from scipy.io import wavfile

sr, x = wavfile.read('test.wav')
wavfile.write('test2.wav', sr, x)

?

Note: pydub's AudioSegment object does not provide direct access to a numpy array.

PS: I have already read "Importing sound files into Python as NumPy arrays (alternatives to audiolab)" and tried all the answers, including those that require calling ffmpeg through Popen and reading the content from its stdout pipe. I have also read other related questions and tried their main answers, but there was no simple solution, and the approaches I found do not easily cover the multi-channel case, etc. After spending hours on this, I'm posting it here with "Answer your own question – share your knowledge, Q&A-style".

Answer 1

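Calling ffmpeg and manually parsing its stdout, as many posts about reading MP3 suggest, is tedious (there are many corner cases, because the number of channels can differ, etc.), so here is a working solution using pydub (you need to pip install pydub first; pydub in turn needs ffmpeg or libav installed to handle MP3).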

This code reads an MP3 into a numpy array and writes a numpy array to an MP3 file, with an API similar to scipy.io.wavfile.read/write:

import pydub 
import numpy as np

def read(f, normalized=False):
    """MP3 to numpy array"""
    a = pydub.AudioSegment.from_mp3(f)
    y = np.array(a.get_array_of_samples())
    if a.channels == 2:
        y = y.reshape((-1, 2))
    if normalized:
        return a.frame_rate, np.float32(y) / 2**15
    else:
        return a.frame_rate, y

def write(f, sr, x, normalized=False):
    """numpy array to MP3"""
    channels = 2 if (x.ndim == 2 and x.shape[1] == 2) else 1
    if normalized:  # normalized array - each item should be a float in [-1, 1)
        y = np.int16(x * 2 ** 15)
    else:
        y = np.int16(x)
    song = pydub.AudioSegment(y.tobytes(), frame_rate=sr, sample_width=2, channels=channels)
    song.export(f, format="mp3", bitrate="320k")
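The reshape in read works because the decoded stereo samples come back interleaved frame by frame (L, R, L, R, ...), and write relies on the same layout when it calls tobytes() on an array of shape (n_frames, 2). Here is a minimal sketch of that layout using only numpy. The helper names deinterleave and interleave are hypothetical and are not part of pydub or of the code above.

```python
import numpy as np

def deinterleave(samples, channels):
    """Turn a flat interleaved sequence into an array of shape (n_frames, channels)."""
    y = np.asarray(samples)
    if channels == 1:
        return y  # mono stays one-dimensional, as in read() above
    return y.reshape((-1, channels))

def interleave(x):
    """Flatten an (n_frames, channels) array back to interleaved (row-major) order."""
    return np.asarray(x).ravel()  # ravel() defaults to C order: frame by frame
```

One consequence: an array of shape (2, n_frames) would be treated as mono by write, so transpose it to (n_frames, 2) first.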

Notes:

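It currently only works for 16-bit files (24-bit WAV files are fairly common, but I have rarely seen 24-bit MP3 files... do they exist?)
normalized=True allows working with a float array (each item in [-1, 1))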
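With normalized=True, a sample equal to exactly 1.0 scales to 32768, which does not fit in int16, so the input really has to stay below 1.0. As a hedged sketch (not part of the original answer), the conversion can clip instead. The helpers float_to_int16 and int_to_float are hypothetical names, and int_to_float assumes signed integer samples whose width in bytes is given by sample_width.

```python
import numpy as np

def float_to_int16(x):
    """Convert floats in [-1, 1] to int16, clipping instead of wrapping around."""
    scaled = np.round(np.asarray(x, dtype=np.float64) * 2**15)
    return np.clip(scaled, -2**15, 2**15 - 1).astype(np.int16)

def int_to_float(y, sample_width=2):
    """Scale signed integer PCM samples to float32 in [-1, 1).

    sample_width is the number of bytes per sample (2 for 16-bit audio).
    """
    full_scale = 2 ** (8 * sample_width - 1)
    return (np.asarray(y, dtype=np.float64) / full_scale).astype(np.float32)
```

Clipping slightly distorts samples at full scale, but it avoids the much louder artifact of a sample jumping from the positive to the negative extreme.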

Usage example:

sr, x = read('test.mp3')
print(x)

#[[-225  707]
# [-234  782]
# [-205  755]
# ..., 
# [ 303   89]
# [ 337   69]
# [ 274   89]]

write('out2.mp3', sr, x)
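For comparison, the ffmpeg route mentioned at the start of this answer looks roughly like the sketch below. It is only an illustration under stated assumptions: ffmpeg must be on the PATH, the function names are hypothetical, and the caller has to choose the sample rate and channel count up front because raw PCM on stdout carries no header. That last point is a large part of what makes this approach tedious.

```python
import subprocess
import numpy as np

def build_ffmpeg_cmd(path, sr=44100, channels=2):
    """Command that decodes `path` to raw signed 16-bit little-endian PCM on stdout."""
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", path,
        "-f", "s16le",           # raw output container: no header
        "-acodec", "pcm_s16le",  # 16-bit little-endian samples
        "-ar", str(sr),          # resample to the requested rate
        "-ac", str(channels),    # mix to the requested channel count
        "-",                     # write to stdout
    ]

def pcm_to_array(raw, channels=2):
    """Interpret raw s16le bytes as an int16 array of shape (n_frames, channels)."""
    y = np.frombuffer(raw, dtype="<i2")
    return y.reshape((-1, channels)) if channels > 1 else y

def read_with_ffmpeg(path, sr=44100, channels=2):
    """Decode an audio file by running ffmpeg (assumed to be installed)."""
    out = subprocess.run(build_ffmpeg_cmd(path, sr, channels),
                         stdout=subprocess.PIPE, check=True).stdout
    return sr, pcm_to_array(out, channels)
```

Calling read_with_ffmpeg('test.mp3') would return the rate and samples in the same (sr, x) form as read above, except that the audio is resampled to whatever sr and channels were requested rather than kept in the file's native format.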

Answer 2

You can use the audio2numpy library. Install it with:

pip install audio2numpy

Then your code would be:

import audio2numpy as a2n
x,sr=a2n.audio_from_file("test.mp3")

For writing, use @Basj's answer.
