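How to detect pitch using mic as source?
How can I detect pitch using a microphone as the source (and also print it out)? I've seen some sources that allow pitch detection from a wav file, but I'm wondering whether there is a way to do it with a microphone.
Here's the base I'm working with: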
import speech_recognition as sr

r = sr.Recognizer()
mic = sr.Microphone()
with mic as source:
    r.adjust_for_ambient_noise(source, duration=0.3)
    audio = r.listen(source)
    transcript = r.recognize_google(audio)
    print(transcript)
Edit: Specifically, I want to do general detection of male/female voices.
aubio has good pitch detection methods and Python bindings. Here's how you could use it:
import aubio
import numpy as np

# `audio` must be a 1-D float32 NumPy array of mono samples
downsample = 1  # assumed value (undefined in the original snippet); 1 means no downsampling
samplerate = 44100
tolerance = 0.8
win_s = 4096 // downsample  # fft size
hop_s = 512 // downsample  # hop size

pitch_o = aubio.pitch("yin", win_s, hop_s, samplerate)
pitch_o.set_unit("Hz")
pitch_o.set_tolerance(tolerance)

# split the signal into hop-sized frames; the last frame is dropped because it may be shorter
signal_win = np.array_split(audio, np.arange(hop_s, len(audio), hop_s))

pitch_profile = []
for frame in signal_win[:-1]:
    pitch = pitch_o(frame)[0]
    if pitch > 0:  # skip frames without a pitch estimate
        pitch_profile.append(pitch)

if pitch_profile:
    pitch_array = np.array(pitch_profile)
    Q25, Q50, Q75 = np.quantile(pitch_array, [0.25, 0.50, 0.75])
    IQR = Q75 - Q25
    median = np.median(pitch_array)
    pitch_min = pitch_array.min()
    pitch_max = pitch_array.max()
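Obviously, you need to get the audio as an array. The next thing to note is that the code above computes statistics over the pitch contour rather than a single value. The reason is that even a clip of 0.3 seconds is much longer than the number of samples normally considered for pitch tracking (512 samples per hop here), so the tracker yields one estimate per frame and those estimates have to be summarized.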
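To get from the question's `audio` object to such an array, something like the following might work. This is a minimal sketch, not part of the original answer: `pcm16_to_float32` is a hypothetical helper name, and it assumes mono, signed 16-bit PCM in native byte order, which is what `AudioData.get_raw_data(convert_width=2)` from speech_recognition should return for a microphone source.

```python
import numpy as np


def pcm16_to_float32(raw: bytes) -> np.ndarray:
    """Convert signed 16-bit mono PCM bytes to float32 samples in [-1, 1)."""
    # Interpret the byte string as 16-bit integers (native byte order assumed).
    samples = np.frombuffer(raw, dtype=np.int16)
    # aubio expects float32 input, so scale the integers into [-1, 1).
    return samples.astype(np.float32) / 32768.0


# Hypothetical usage with the `audio` object from the question:
# raw = audio.get_raw_data(convert_rate=44100, convert_width=2)
# audio = pcm16_to_float32(raw)
```

Resampling to 44100 Hz in `get_raw_data` keeps the data consistent with the `samplerate` passed to the pitch detector.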
Other examples:
- Aubio demo
- Audio Explorer - my code. The example above is taken from it.
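For the male/female detection mentioned in the question's edit, one rough option is to compare the median of the pitch profile with a threshold. The sketch below is only a heuristic and not part of the original answer: `guess_voice_type` is a hypothetical helper, and the 165 Hz default is an assumed cut-off based on commonly cited adult ranges (roughly 85 to 155 Hz for male voices and 165 to 255 Hz for female voices). These ranges overlap in practice, so expect misclassifications.

```python
from statistics import median


def guess_voice_type(pitches_hz, threshold_hz=165.0):
    """Return "male", "female", or None from a list of pitch estimates in Hz.

    threshold_hz is an assumed cut-off, not a value from the original answer.
    """
    # Discard non-positive values, which carry no pitch information.
    voiced = [p for p in pitches_hz if p > 0]
    if not voiced:
        return None
    # The median is less sensitive to octave errors than the mean.
    return "male" if median(voiced) < threshold_hz else "female"


# Hypothetical usage with the pitch_profile list built above:
# print(guess_voice_type(pitch_profile))
```

Using the median rather than the mean limits the influence of occasional octave jumps in the tracker output.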