How to create a numpy array from a pydub AudioSegment?
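I'm aware of the following question:
My question is the reverse. If I have a pydub AudioSegment, how can I convert it to a numpy array?
I'd like to use scipy filters and so on.
It's not very clear to me what the internal structure of the AudioSegment raw data is.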
Pydub has a facility for getting the audio data as an array of samples. It is an array.array instance (not a numpy array), but you should be able to convert it to a numpy array relatively easily:
from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")
# this is an array
samples = sound.get_array_of_samples()
That said, you may be able to create a numpy variant of the implementation. The method is implemented pretty simply:
def get_array_of_samples(self):
    """
    returns the raw_data as an array of samples
    """
    return array.array(self.array_type, self._data)
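As a rough illustration of what such a numpy variant could look like, here is a minimal sketch that works directly on the raw bytes. The helper name `samples_as_ndarray` and the `WIDTH_TO_DTYPE` mapping are hypothetical (not part of pydub), and the sketch assumes signed PCM samples in native byte order, mirroring what `array.array` would do with the same bytes.

```python
import numpy as np

# Assumption: signed integer PCM, one dtype per sample width in bytes.
WIDTH_TO_DTYPE = {1: np.int8, 2: np.int16, 4: np.int32}

def samples_as_ndarray(raw_data: bytes, sample_width: int) -> np.ndarray:
    """Hypothetical numpy counterpart of get_array_of_samples()."""
    # np.frombuffer reinterprets the bytes without copying them;
    # the resulting array is read-only because bytes are immutable.
    return np.frombuffer(raw_data, dtype=WIDTH_TO_DTYPE[sample_width])

# Stand-in for sound.raw_data: four 16-bit samples.
raw = np.array([0, 1000, -1000, 32767], dtype=np.int16).tobytes()
samples = samples_as_ndarray(raw, sample_width=2)
print(samples, samples.dtype)
```

With a real segment, the call would presumably be `samples_as_ndarray(sound.raw_data, sound.sample_width)`. Call `.copy()` on the result if you need to modify the samples in place.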
Creating a new audio segment from a (modified?) array of samples is also possible:
new_sound = sound._spawn(samples)
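The above is a little hacky. It was written for internal use within the AudioSegment class, but it mainly just figures out what type of audio data you're using (array of samples, list of samples, bytes, bytestring, etc.). It's safe to use despite the underscore prefix.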
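If the samples have been processed in numpy (for example by a filter that returns floats), they need to go back to the segment's integer sample type before being handed to `_spawn`. The following is a minimal sketch of that step only. The helper name `to_sample_array` is hypothetical, and it assumes the typecode is one of the signed integer codes used for the segment's `array_type`.

```python
import array
import numpy as np

def to_sample_array(processed: np.ndarray, typecode: str = "h") -> array.array:
    """Round and clip processed samples, then pack them into an array.array."""
    itemsize = array.array(typecode).itemsize        # bytes per sample
    info = np.iinfo(np.dtype(f"int{8 * itemsize}"))  # valid integer range
    # Clipping avoids wrap-around when processing pushed values out of range.
    clipped = np.clip(np.rint(processed), info.min, info.max).astype(info.dtype)
    return array.array(typecode, clipped.tobytes())

# Stand-in for processed audio: 16-bit samples amplified by 1.5.
amplified = np.array([1000, -2000, 30000], dtype=np.int16) * 1.5
result = to_sample_array(amplified, "h")
print(result)
```

With a real segment, the result could then be passed on as `sound._spawn(to_sample_array(filtered, sound.array_type))`, where `filtered` is whatever array your processing produced.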
You can get an array.array from an AudioSegment, then convert it to a numpy.ndarray:
from pydub import AudioSegment
import numpy as np
song = AudioSegment.from_mp3('song.mp3')
samples = song.get_array_of_samples()
samples = np.array(samples)
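None of the existing answers is perfect: they miss reshaping and sample width. I have written this function, which converts the audio to the standard audio representation in np: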
def pydub_to_np(audio: pydub.AudioSegment) -> (np.ndarray, int):
    """
    Converts pydub audio segment into np.float32 of shape [duration_in_seconds*sample_rate, channels],
    where each value is in range [-1.0, 1.0].
    Returns tuple (audio_np_array, sample_rate).
    """
    return np.array(audio.get_array_of_samples(), dtype=np.float32).reshape((-1, audio.channels)) / (
            1 << (8 * audio.sample_width - 1)), audio.frame_rate
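To make the reshaping and scaling concrete, here is a small sketch of the same arithmetic applied to hand-made interleaved stereo data, without pydub. The function name `pcm_to_float` and the sample values are made up for illustration, and signed PCM is assumed. For 16-bit audio the divisor is `1 << 15`, i.e. 32768.

```python
import numpy as np

def pcm_to_float(raw_data: bytes, sample_width: int, channels: int) -> np.ndarray:
    """Interleaved signed PCM bytes -> float32 array of shape (frames, channels)."""
    dtype = {1: np.int8, 2: np.int16, 4: np.int32}[sample_width]
    ints = np.frombuffer(raw_data, dtype=dtype)
    scale = 1 << (8 * sample_width - 1)  # 32768 for 16-bit samples
    # Interleaved layout is L0, R0, L1, R1, ... so each row becomes one frame.
    return ints.astype(np.float32).reshape(-1, channels) / scale

# Two stereo frames: (L=-32768, R=32767) and (L=0, R=16384).
raw = np.array([-32768, 32767, 0, 16384], dtype=np.int16).tobytes()
audio = pcm_to_float(raw, sample_width=2, channels=2)
print(audio.shape)
print(audio)
```

The most negative sample maps to exactly -1.0, while the most positive one lands just below 1.0, because the signed integer range is asymmetric.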
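get_array_of_samples (not found on [ReadTheDocs.AudioSegment]: audiosegment module) returns a 1-dimensional array, which doesn't work well because it loses information about the audio stream (frames, channels, ...).
A few days ago I ran into this problem, as I was using [PyPI]: sounddevice (which expects a numpy.ndarray) to play the sound (I needed to play it on a different output audio device). Here's what I came up with.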
code00.py:
#!/usr/bin/env python

import sys
from pprint import pprint as pp
import numpy as np
import pydub
import sounddevice as sd


def audio_file_to_np_array(file_name):
    asg = pydub.AudioSegment.from_file(file_name)
    dtype = getattr(np, "int{:d}".format(asg.sample_width * 8))  # Or could create a mapping: {1: np.int8, 2: np.int16, 4: np.int32, 8: np.int64}
    arr = np.ndarray((int(asg.frame_count()), asg.channels), buffer=asg.raw_data, dtype=dtype)
    print("\n", asg.frame_rate, arr.shape, arr.dtype, arr.size, len(asg.raw_data), len(asg.get_array_of_samples()))  # @TODO: Comment this line!!!
    return arr, asg.frame_rate


def main(*argv):
    pp(sd.query_devices())  # @TODO: Comment this line!!!
    a, fr = audio_file_to_np_array("./test00.mp3")
    dvc = 5  # Index of an OUTPUT device (from sd.query_devices() on YOUR machine)
    #sd.default.device = dvc  # Change default OUTPUT device
    sd.play(a, samplerate=fr)
    sd.wait()


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.")
    sys.exit(rc)
Output:
[cfati@CFATI-5510-0:e:\Work\Dev\Whosebug\q038015319]> set PATH=%PATH%;f:\Install\pc064\FFMPEG\FFMPEG.3.1\bin
[cfati@CFATI-5510-0:e:\Work\Dev\Whosebug\q038015319]> dir /b
code00.py
test00.mp3
[cfati@CFATI-5510-0:e:\Work\Dev\Whosebug\q038015319]> "e:\Work\Dev\VEnvs\py_pc064_03.09.01_test0\Scripts\python.exe" code00.py
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec 7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] 064bit on win32
0 Microsoft Sound Mapper - Input, MME (2 in, 0 out)
> 1 Microphone (Logitech USB Headse, MME (2 in, 0 out)
2 Microphone (Realtek Audio), MME (2 in, 0 out)
3 Microsoft Sound Mapper - Output, MME (0 in, 2 out)
< 4 Speakers (Logitech USB Headset), MME (0 in, 2 out)
5 Speakers / Headphones (Realtek , MME (0 in, 2 out)
6 Primary Sound Capture Driver, Windows DirectSound (2 in, 0 out)
7 Microphone (Logitech USB Headset), Windows DirectSound (2 in, 0 out)
8 Microphone (Realtek Audio), Windows DirectSound (2 in, 0 out)
9 Primary Sound Driver, Windows DirectSound (0 in, 2 out)
10 Speakers (Logitech USB Headset), Windows DirectSound (0 in, 2 out)
11 Speakers / Headphones (Realtek Audio), Windows DirectSound (0 in, 2 out)
12 Realtek ASIO, ASIO (2 in, 2 out)
13 Speakers (Logitech USB Headset), Windows WASAPI (0 in, 2 out)
14 Speakers / Headphones (Realtek Audio), Windows WASAPI (0 in, 2 out)
15 Microphone (Logitech USB Headset), Windows WASAPI (1 in, 0 out)
16 Microphone (Realtek Audio), Windows WASAPI (2 in, 0 out)
17 Microphone (Realtek HD Audio Mic input), Windows WDM-KS (2 in, 0 out)
18 Speakers (Realtek HD Audio output), Windows WDM-KS (0 in, 2 out)
19 Stereo Mix (Realtek HD Audio Stereo input), Windows WDM-KS (2 in, 0 out)
20 Microphone (Logitech USB Headset), Windows WDM-KS (1 in, 0 out)
21 Speakers (Logitech USB Headset), Windows WDM-KS (0 in, 2 out)
44100 (82191, 2) int16 164382 328764 164382
--- (Manually inserted line) Sound is playing :) ---
Done.
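The numbers on the debug line of the output are consistent with each other: 82191 frames of 2 channels give 164382 samples, and at 2 bytes per int16 sample that is 328764 bytes of raw data. The short sketch below re-checks this with a silent stand-in buffer instead of a real file, so the zero-filled `raw_data` is an assumption made purely for illustration.

```python
import numpy as np

# Values taken from the debug line of the output above.
frame_rate, frame_count, channels, sample_width = 44100, 82191, 2, 2

# Silent stand-in for asg.raw_data (all zero bytes).
raw_data = bytes(frame_count * channels * sample_width)

# Same construction as in audio_file_to_np_array: view the bytes as a 2-D array.
arr = np.ndarray((frame_count, channels), buffer=raw_data, dtype=np.int16)

print(arr.shape, arr.dtype, arr.size, len(raw_data))
print("duration in seconds:", frame_count / frame_rate)
```

The last line also shows why the sample rate has to be returned separately: only together with the frame count does it give the clip's duration (a little under 2 seconds here).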
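Notes:
- As seen above, there are no hardcoded values (in terms of dimensions, dtype, ...)
- I also needed to return the sample rate (as it can't be embedded in the array), and it's required by the device (in this case it's the default 44.1k, but I've tested with files having half that value)
- All the existing answers use float to represent samples. This didn't work for me: for most of the test files the sample size was 16bit long, and np.float16 is not supported (by my FPU), so I had to use int
- As a side note, while testing on various files, an .m4a one couldn't be played by SoundDevice on my Win laptop (most likely because of its 32k sample rate), but PyDub was able to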