Remove noise from vocals of a song python

Question

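I am trying to separate the vocals from a song using a deep learning model. The output is not wrong, but some extra noise makes the signal sound bad.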

Here are 3 seconds of the output file where the noise is present (the noise is inside the rectangles):

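How can I remove these noises from my output file? I can see that these parts have a different amplitude than the other parts of the song that I want to keep. Is there a way to filter the signal based on these amplitudes and allow only a specific amplitude range to exist in my signal?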

Thanks

Update: please see the accepted answer and my denoising algorithm code, which works as intended!

Answer 1

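'How can I remove these noises from my output file?' You can 'window' it: multiply those parts of the signal by a step function (e.g. 0.001 over the noise and 1 over the signal). This mutes the noisy regions and keeps the regions you are interested in. However, it is not general-purpose: it only works for pre-specified audio segments, because the window is fixed.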
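A minimal sketch of this step-function window in Python with NumPy, assuming a mono signal stored as a 1-D array and noisy spans that you have already located by hand. The helper name `apply_step_window` and the `noisy_spans` list of (start, end) times in seconds are hypothetical, not from the answer:

```python
import numpy as np


def apply_step_window(signal, sample_rate, noisy_spans, noise_gain=0.001):
    """Multiply the signal by a step function: noise_gain inside each
    (start_s, end_s) span given in seconds, 1.0 everywhere else."""
    window = np.ones(len(signal), dtype=float)
    for start_s, end_s in noisy_spans:
        # Convert seconds to sample indices and clip them to the signal length.
        start = max(int(start_s * sample_rate), 0)
        end = min(int(end_s * sample_rate), len(signal))
        window[start:end] = noise_gain
    return np.asarray(signal, dtype=float) * window
```

Because the spans are hard-coded, the same call will not transfer to another song, which is exactly the limitation the answer points out.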

'I can see that these parts have a different amplitude than the other parts of the songs I want. Is there a way to filter the signal based on these amplitudes and only allow a specific amplitude range to exist in my signal?'

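Here you can use one of two approaches: 1) compute the energy over a running window (the sum of X^2 over N samples, where X is your audio signal), or 2) compute the Hilbert envelope of your signal and smooth it with a window of suitable length (perhaps 1 to 100 ms long). You can then set a threshold on either the energy or the Hilbert envelope.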
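Both approaches can be sketched in Python with NumPy and SciPy as below, assuming a mono signal `x` and a window of `n` samples. The function names, the 20 ms window (picked from the 1 to 100 ms range above) and the threshold values are illustrative assumptions that need tuning by ear:

```python
import numpy as np
from scipy.signal import hilbert


def running_energy(x, n):
    """Approach 1: sum of x[k]**2 over a sliding window of n samples.
    mode="same" keeps the output aligned with x (assumes n <= len(x))."""
    squared = np.asarray(x, dtype=float) ** 2
    return np.convolve(squared, np.ones(n), mode="same")


def smoothed_hilbert_envelope(x, n):
    """Approach 2: magnitude of the analytic signal, smoothed by an
    n-sample moving average."""
    envelope = np.abs(hilbert(np.asarray(x, dtype=float)))
    return np.convolve(envelope, np.ones(n) / n, mode="same")


def gate(x, feature, threshold):
    """Keep samples whose feature value is at or above threshold, zero the rest."""
    return np.where(feature >= threshold, x, 0.0)
```

For example, at 8 kHz a 20 ms window is `n = int(0.02 * 8000)`, i.e. 160 samples, and the gated signal is `gate(x, running_energy(x, n), threshold)`. Zeroing sample by sample can produce clicks at the gate boundaries, so a short fade is a common refinement.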

Answer 2

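I used the accepted answer's suggestion and created the following algorithm, which uses the Hilbert envelope and denoises the parts of the song that have no vocals.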

import numpy as np
import scipy as sp
import scipy.signal  # makes sp.signal available on older SciPy versions
import scipy.stats   # makes sp.stats available on older SciPy versions


def hilbert_metrics(signal, sample_rate=44100):
    '''Calculate the amplitude envelope and the instantaneous frequency (Hz) of a 1-D signal.'''
    analytic_signal = sp.signal.hilbert(signal)
    amplitude_envelope = np.abs(analytic_signal)
    instantaneous_phase = np.unwrap(np.angle(analytic_signal))
    # Phase derivative converted to Hz; the result is one sample shorter than the signal.
    instantaneous_frequency = np.diff(instantaneous_phase) / (2.0 * np.pi) * sample_rate
    return amplitude_envelope, instantaneous_frequency


def denoise(wav_file_handler, hop_length: int = 1024, window_length_in_second: float = 0.5,
            threshold_softness: float = 4.0, stat_mode: str = "mean", verbose: int = 0) -> np.ndarray:
    '''Slide a window over the wav signal and mute segments that look like noise.

    For each segment, the envelope statistic of the previous and of the next segment
    is compared with a global threshold (the statistic of the whole envelope divided
    by threshold_softness). If both neighbours are below the threshold, the area is
    probably only noise, so the middle segment is muted as well. Because it looks at
    the local neighbourhood, segments surrounded by louder audio are kept as part of the song.
    The lower the threshold_softness, the more aggressive the noise detection.'''
    stat_mode = stat_mode.lower()
    if stat_mode not in ("median", "mean", "mode"):
        raise ValueError(f"expected 'mean', 'median' or 'mode' for `stat_mode` but received: '{stat_mode}'")

    def amps_reducer_function(amps):
        if stat_mode == "median":
            return np.median(amps)
        if stat_mode == "mean":
            return np.mean(amps)
        # stats.mode returns a ModeResult, not a number; ravel handles old and new SciPy shapes.
        # Note: the mode of a continuous envelope is rarely meaningful.
        return float(np.ravel(sp.stats.mode(amps).mode)[0])

    wav = np.copy(wav_file_handler.wav_file)
    amp, _ = hilbert_metrics(wav, wav_file_handler.sample_rate)
    window_length_frames = int(window_length_in_second * wav_file_handler.sample_rate)
    threshold = amps_reducer_function(amp) / threshold_softness
    muted_segments_count = 0
    # Stop early enough that the next segment is always a full window (no empty slices).
    for i in range(window_length_frames, len(wav) - 2 * window_length_frames + 1, hop_length):
        previous_segment_stat = amps_reducer_function(amp[i - window_length_frames: i])
        next_segment_stat = amps_reducer_function(amp[i + window_length_frames: i + 2 * window_length_frames])
        if previous_segment_stat < threshold and next_segment_stat < threshold:
            if verbose:
                print(f"previous segment stat: {previous_segment_stat}, threshold: {threshold}, "
                      f"next segment stat: {next_segment_stat}")
            muted_segments_count += 1
            # Mute the audio directly; the envelope `amp` is not modified,
            # so later decisions are not biased by already-muted segments.
            wav[i: i + window_length_frames] = 0
    if verbose:
        print(f"Denoising completed! Muted {muted_segments_count} segments")
    return wav

This method could be improved noticeably by using different thresholds, or even by applying low-pass and high-pass filters to remove unwanted frequencies.
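For the filtering idea, one option (a suggestion, not part of the algorithm above) is a zero-phase Butterworth band-pass built with SciPy's `butter` and `sosfiltfilt`, which combines the high-pass and low-pass into one filter. The function name `bandpass`, the 80 Hz to 8 kHz band and the filter order are assumptions about where the vocal energy sits and should be adjusted for the material:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(x, sample_rate, low_hz=80.0, high_hz=8000.0, order=4):
    """Zero-phase Butterworth band-pass between low_hz and high_hz.
    high_hz must stay below sample_rate / 2 (the Nyquist frequency)."""
    # Second-order sections are numerically safer than (b, a) coefficients
    # when the low cutoff is tiny relative to the sample rate.
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=sample_rate, output="sos")
    # Filtering forward and backward cancels the phase shift, so the
    # filtered audio stays time-aligned with the original.
    return sosfiltfilt(sos, np.asarray(x, dtype=float))
```

Applying this before `denoise` means the envelope threshold is computed on the band-limited signal, so the `threshold_softness` value may need retuning.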

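Below is an example of applying this method to a wav signal, where you can see the denoising effect:

This is the original signal:

This is the signal denoised with the default parameters:

This is the same signal denoised with threshold_softness = 2 instead of 4: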

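This is the same denoising algorithm as the previous one, but using the median (stat_mode="median", i.e. np.median) instead of np.mean; in my test this ran faster and gave similar results: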


Tags: python, audio, audio-processing, librosa