Make chunks in audio files with overlap in Python

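I want to cut my audio files into chunks that overlap each other. For example, if each chunk is 4 seconds long and the overlap is 1 second, the first chunk should run from 0 to 4 and the second from 3 to 7. Following "How to splice an audio file (wav format) into 1 sec splices in python?", I used the pydub module and its make_chunks(your_audio_file_object, chunk_length_ms) method, but it produces no overlap between chunks; it just cuts the audio file into fixed-length pieces. Does anyone have an idea how to do this? Thanks.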

Here is one way to do it:

import numpy as np
from scipy.io import wavfile

# path points to your WAV file; frequency is the sample rate in Hz
frequency, signal = wavfile.read(path)

slice_length = 4 # in seconds
overlap = 1 # in seconds
# start time of each slice, in whole seconds
slices = np.arange(0, len(signal)/frequency, slice_length-overlap, dtype=int)

for start, end in zip(slices[:-1], slices[1:]):
    # convert seconds to sample indices; the end is pushed out by the overlap
    start_audio = start * frequency
    end_audio = (end + overlap) * frequency
    audio_slice = signal[int(start_audio): int(end_audio)]

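Essentially, we did the following:

  1. Load the file and its corresponding frequency (the sample rate). For the sake of the example I assume it is single-channel; it also works with multiple channels, it just takes more code.
  2. Define the desired slice length and overlap. The array gives us the start of each audio slice. By zipping it with a copy of itself shifted by one element and adding the overlap to each end, we get the desired chunks.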
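
The two steps above can be wrapped in a small helper. This is only a sketch: the function name overlapping_chunks and its parameters are made up for illustration, and a synthetic signal with an assumed sample rate of 8 stands in for a real WAV file so the snippet runs without any audio on disk.

```python
import numpy as np

def overlapping_chunks(signal, frequency, slice_length=4, overlap=1):
    """Yield (first_sample, last_sample, chunk) using the scheme described above.

    Hypothetical helper for illustration; not part of scipy or pydub.
    """
    step = slice_length - overlap  # distance between chunk starts, in seconds
    starts = np.arange(0, len(signal) / frequency, step, dtype=int)
    for start, end in zip(starts[:-1], starts[1:]):
        first = int(start * frequency)
        last = int((end + overlap) * frequency)
        yield first, last, signal[first:last]

# Synthetic 26-second mono signal at 8 samples per second (assumed values)
frequency = 8
signal = np.arange(26 * frequency)
chunks = list(overlapping_chunks(signal, frequency))
print(len(chunks), chunks[0][:2], chunks[1][:2])
```

Each chunk here is 4 * 8 = 32 samples long, and the last 8 samples of one chunk are the first 8 samples of the next.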

To convince yourself that the slicing works, check this snippet:

import numpy as np

slice_length = 4 # in seconds
overlap = 1 # in seconds
slices = np.arange(0, 26, slice_length-overlap, dtype=int) # 26 is arbitrary

frequency = 1 # one sample per second, so the printed values read as seconds
for start, end in zip(slices[:-1], slices[1:]):
    start_audio = start * frequency
    end_audio = (end + overlap) * frequency
    print(start_audio, end_audio)

Output:

0 4
3 7
6 10
9 13
12 16
15 19
18 22
21 25
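
Note that with this scheme any audio after the last full window is dropped: in the run above, nothing past second 25 of the 26 is returned. If the tail matters, one possible variation is to compute the window bounds directly and clip the last one to the total duration. The helper name window_bounds below is hypothetical, and keeping a shorter trailing window is a design choice of this sketch, not something the approach above prescribes.

```python
def window_bounds(total, slice_length=4, overlap=1):
    """Return (start, end) pairs covering [0, total], with the last end clipped.

    Hypothetical helper; the unit is whatever `total` is in (seconds, ms, samples).
    """
    step = slice_length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than slice_length")
    bounds = []
    start = 0
    while start < total:
        end = min(start + slice_length, total)
        bounds.append((start, end))
        if end == total:
            break  # the tail is covered; later windows would add nothing new
        start += step
    return bounds

print(window_bounds(26))
# [(0, 4), (3, 7), (6, 10), (9, 13), (12, 16), (15, 19), (18, 22), (21, 25), (24, 26)]
```

The same pairs can be turned into sample indices by multiplying by the sample rate. They should also be usable with pydub by passing millisecond values, since an AudioSegment can be sliced by millisecond positions.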