In Unity, how to segment the user's voice from microphone based on loudness?

Question

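I need to collect voice segments from a continuous audio stream, so that I can later process the segment the user has just spoken (not for speech recognition). All I care about is segmenting the voice based on its loudness.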

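If, after at least 1 second of silence, the user's voice becomes loud for a while and is then silent again for at least 1 second, I call that one sentence, and the audio should be segmented there.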

All I know is that I can get raw audio data from the AudioClip created by Microphone.Start(). I want to write some code like this:

void Start()
{
    audio = Microphone.Start(deviceName, true, 10, 16000);
}

void Update()
{
    audio.GetData(fdata, 0);
    for(int i = 0; i < fdata.Length; i++) {
        u16data[i] = Convert.ToUInt16(fdata[i] * 65535);
    }
    // ... Process u16data
}

But there are a few things I am not sure about:

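Every frame, when I call audio.GetData(fdata, 0), do I get the latest 10 seconds of sound data if fdata is big enough, and less than 10 seconds if fdata is not big enough?
fdata is a float array, and what I need is a 16 kHz, 16-bit PCM buffer. Is it correct to convert the data like this: u16data[i] = fdata[i] * 65535?
What is the right way to detect loud moments and quiet moments in fdata?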

Answer 1

No. You have to read starting at the current position in the AudioClip, which you get from Microphone.GetPosition

Get the position in samples of the recording.

and pass the obtained index to AudioClip.GetData

Use the offsetSamples parameter to start the read from a specific position in the clip
```
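// audio is the AudioClip returned by Microphone.Start; allocate the buffer once, not every frame
fdata = new float[audio.samples * audio.channels];

// current recording position in samples (null = default microphone)
var currentIndex = Microphone.GetPosition(null);
audio.GetData(fdata, currentIndex);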
```
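Because the microphone clip loops, a common follow-up is to read only the samples recorded since the previous frame. The following is a minimal sketch of that index arithmetic, not part of the Unity API: `RingBufferMath` and `NewSampleCount` are hypothetical names, and `lastPos` and `currentPos` are assumed to be values returned by Microphone.GetPosition on two consecutive frames.

```csharp
using System;

// Hypothetical helper (not part of Unity): index arithmetic for a looping recording clip.
public static class RingBufferMath
{
    // Number of samples written between two Microphone.GetPosition readings,
    // assuming fewer than clipSamples samples were recorded in between.
    public static int NewSampleCount(int lastPos, int currentPos, int clipSamples)
    {
        if (clipSamples <= 0) throw new ArgumentOutOfRangeException(nameof(clipSamples));
        int diff = currentPos - lastPos;
        // A negative difference means the write position wrapped past the end of the clip.
        return diff >= 0 ? diff : diff + clipSamples;
    }
}
```

With a mono clip, one could then allocate a float array of that length and call audio.GetData(buffer, lastPos) before storing currentPos as the new lastPos. This relies on GetData wrapping around to the start of the clip when the read runs past its end, as the Unity documentation describes.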
I don't understand what you are trying to convert it to. fdata will contain

floats ranging from -1.0f to 1.0f (AudioClip.GetData)

So if for some reason you need values between short.MinValue (= -32768) and short.MaxValue (= 32767) instead, you can use
```
// s16data is a short[] (signed 16-bit); Convert.ToUInt16 would throw on negative samples
s16data[i] = Convert.ToInt16(fdata[i] * short.MaxValue);
```
But note the rounding behavior of Convert.ToUInt16(float) (Convert.ToInt16 rounds the same way):

value, rounded to the nearest 16-bit unsigned integer. If value is halfway between two whole numbers, the even number is returned; that is, 4.5 is converted to 4, and 5.5 is converted to 6.

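You might want halves such as 4.5 rounded away from zero instead. Note that Mathf.RoundToInt also returns the even number for halves, so round explicitly with Math.Round and MidpointRounding.AwayFromZero first.
```
// s16data is a short[] holding signed 16-bit samples
s16data[i] = Convert.ToInt16(Math.Round(fdata[i] * short.MaxValue, MidpointRounding.AwayFromZero));
```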
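However, your naming suggests that you are actually trying to get unsigned values, ushort (or UInt16). For those you cannot have negative values! So you have to shift the float values up, mapping the range [-1.0f, 1.0f] to [0.0f, 1.0f], before multiplying by ushort.MaxValue (= 65535)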
```
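// shift and scale in floating point first, then round; rounding before dividing would lose the signal
u16data[i] = Convert.ToUInt16(Mathf.RoundToInt((fdata[i] + 1f) / 2f * ushort.MaxValue));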
```
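For reference, 16-bit PCM is conventionally stored as signed samples, so a whole-buffer conversion for that format might look like the sketch below. `PcmConverter` and `ToPcm16` are hypothetical names, and the clamp is an added assumption to guard against samples that fall slightly outside [-1, 1].

```csharp
using System;

// Hypothetical helper (not part of Unity): float samples to signed 16-bit PCM.
public static class PcmConverter
{
    public static short[] ToPcm16(float[] samples)
    {
        if (samples == null) throw new ArgumentNullException(nameof(samples));
        var pcm = new short[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            // Clamp first so that out-of-range input cannot overflow the short range.
            float clamped = Math.Max(-1f, Math.Min(1f, samples[i]));
            // Scale to [-32767, 32767] and round to the nearest integer.
            pcm[i] = (short)Math.Round(clamped * short.MaxValue);
        }
        return pcm;
    }
}
```

Scaling by short.MaxValue keeps the result symmetric around zero, at the cost of never producing -32768.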
What you receive from AudioClip.GetData are the gain values of the audio track, between -1.0f and 1.0f.

So a "loud" moment would be
```
Mathf.Abs(fdata[i]) >= aCertainLoudThreshold;
```
and a "silent" moment would be
```
Mathf.Abs(fdata[i]) <= aCertainSilentThreshold;
```

where aCertainSilentThreshold might be, for example, 0.2f and aCertainLoudThreshold might be, for example, 0.8f.
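To turn such a threshold into the 1-second segmentation the question asks for, one possible approach is sketched below. It is an assumption-laden illustration rather than the answer's method: it swaps the per-sample check for the RMS of short windows, because a speech waveform crosses zero constantly and single samples are a noisy measure of loudness. `LoudnessSegmenter`, `Segment`, and all parameter names are hypothetical, the input is assumed to be mono, and the start of the buffer is treated as silence.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Unity): splits a buffer into loud segments.
public static class LoudnessSegmenter
{
    // Returns (Start, End) sample indices, End exclusive, of each loud run that is
    // followed by at least minSilenceSamples of quiet audio.
    // A run still open at the end of the buffer is not emitted.
    public static List<(int Start, int End)> Segment(
        float[] samples, int windowSize, float threshold, int minSilenceSamples)
    {
        var segments = new List<(int Start, int End)>();
        int segStart = -1;   // start of the open segment, -1 while in silence
        int lastLoudEnd = 0; // end (exclusive) of the most recent loud window

        for (int pos = 0; pos + windowSize <= samples.Length; pos += windowSize)
        {
            // Root mean square of the current window as a loudness estimate.
            double sum = 0;
            for (int i = pos; i < pos + windowSize; i++) sum += samples[i] * samples[i];
            double rms = Math.Sqrt(sum / windowSize);

            if (rms >= threshold)
            {
                if (segStart < 0) segStart = pos; // silence ended, open a segment
                lastLoudEnd = pos + windowSize;
            }
            else if (segStart >= 0 && pos + windowSize - lastLoudEnd >= minSilenceSamples)
            {
                // Quiet for long enough: close the segment at the last loud window.
                segments.Add((segStart, lastLoudEnd));
                segStart = -1;
            }
        }
        return segments;
    }
}
```

At 16000 Hz, a minSilenceSamples of 16000 corresponds to the 1-second pause from the question, and a windowSize of 320 would be 20 ms. The threshold has to be tuned for the microphone, and RMS values are typically lower than peak values, so the 0.2f and 0.8f examples above may not carry over directly.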

audio-recording

unity3d