Decoding DTMF from a WAV file
As per my previous question, my goal is to detect DTMF tones in a WAV file from C#. However, I'm really struggling to understand how this can be done.
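I know that DTMF uses a combination of frequencies, and that a Goertzel algorithm can be used... somehow. I've grabbed a Goertzel code snippet and tried shoving a .WAV file into it (using NAudio to read the file, which is an 8 kHz mono 16-bit PCM WAV):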
using (WaveFileReader reader = new WaveFileReader(@"dtmftest_w.wav"))
{
    byte[] buffer = new byte[reader.Length];
    int read = reader.Read(buffer, 0, buffer.Length);
    short[] sampleBuffer = new short[read/2];
    Buffer.BlockCopy(buffer, 0, sampleBuffer, 0, read/2);
    Console.WriteLine(CalculateGoertzel(sampleBuffer,8000,16));
}
public static double CalculateGoertzel(short[] sample, double frequency, int samplerate)
{
    double Skn, Skn1, Skn2;
    Skn = Skn1 = Skn2 = 0;
    for (int i = 0; i < sample.Length; i++)
    {
        Skn2 = Skn1;
        Skn1 = Skn;
        Skn = 2 * Math.Cos(2 * Math.PI * frequency / samplerate) * Skn1 - Skn2 + sample[i];
    }
    double WNk = Math.Exp(-2 * Math.PI * frequency / samplerate);
    return 20 * Math.Log10(Math.Abs((Skn - WNk * Skn1)));
}
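Two details in the snippet above are worth checking before looking at the algorithm itself. First, the count argument of Buffer.BlockCopy is measured in bytes, so passing read/2 fills only the first half of sampleBuffer. Second, given the signature CalculateGoertzel(short[] sample, double frequency, int samplerate), the call CalculateGoertzel(sampleBuffer,8000,16) passes 8000 as the frequency and 16 as the sample rate. Here is a minimal sketch of the byte-to-sample step. It assumes 16-bit little-endian PCM and a little-endian machine, and the class and method names are hypothetical:

```csharp
using System;

public static class PcmConvert
{
    // Converts raw 16-bit PCM bytes (as read from the WAV data) into samples.
    // Buffer.BlockCopy's count is in BYTES, so the byte count is passed,
    // not the sample count. An odd trailing byte, if any, is ignored.
    // Assumes a little-endian machine, matching WAV's little-endian byte order.
    public static short[] ToSamples(byte[] buffer, int bytesRead)
    {
        int byteCount = bytesRead - (bytesRead % 2);
        var samples = new short[byteCount / 2];
        Buffer.BlockCopy(buffer, 0, samples, 0, byteCount);
        return samples;
    }
}
```

With read bytes from reader.Read, PcmConvert.ToSamples(buffer, read) yields read / 2 samples.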
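I know that what I'm doing is wrong. I assume I'm supposed to iterate through the buffer and calculate the Goertzel value for only a small chunk at a time. Is that correct?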
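Secondly, I don't really understand what the output of the Goertzel method is telling me. I get a double returned (example: 210.985812), but I don't know how to relate that to the presence and value of a DTMF tone in the audio file.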
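I've searched everywhere for an answer, including the libraries referenced in this answer; unfortunately, the code there doesn't appear to work (as noted in the comments on the site). There is a commercial library offered by TAPIEx. I've tried their evaluation library and it does exactly what I need, but they don't reply to emails, which makes me wary of actually buying their product.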
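I'm very aware that I'm looking for an answer when perhaps I don't know the exact question. Ultimately, though, all I need is a way to find DTMF tones in a .WAV file. Am I on the right lines? If not, can anyone point me in the right direction?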
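EDIT: Using @Abbondanza's code as a basis, I now have this (very rough, proof-of-concept only) code. It rests on the (possibly fundamentally wrong) assumption that I need to drip-feed small sections of the audio file in: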
const short sampleSize = 160;
using (WaveFileReader reader = new WaveFileReader(@"\mac\home\dtmftest.wav"))
{
    byte[] buffer = new byte[reader.Length];
    reader.Read(buffer, 0, buffer.Length);
    int bufferPos = 0;
    while (bufferPos < buffer.Length-(sampleSize*2))
    {
        short[] sampleBuffer = new short[sampleSize];
        Buffer.BlockCopy(buffer, bufferPos, sampleBuffer, 0, sampleSize*2);
        var frequencies = new[] {697.0, 770.0, 852.0, 941.0, 1209.0, 1336.0, 1477.0};
        var powers = frequencies.Select(f => new
        {
            Frequency = f,
            Power = CalculateGoertzel(sampleBuffer, f, 8000)
        });
        const double AdjustmentFactor = 1.05;
        var adjustedMeanPower = AdjustmentFactor*powers.Average(result => result.Power);
        var sortedPowers = powers.OrderByDescending(result => result.Power);
        var highestPowers = sortedPowers.Take(2).ToList();
        float seconds = bufferPos / (float)16000;
        if (highestPowers.All(result => result.Power > adjustedMeanPower))
        {
            // Use highestPowers[0].Frequency and highestPowers[1].Frequency to
            // classify the detected DTMF tone.
            switch (Convert.ToInt32(highestPowers[0].Frequency))
            {
                case 1209:
                    switch (Convert.ToInt32(highestPowers[1].Frequency))
                    {
                        case 697:
                            Console.WriteLine("1 pressed at " + bufferPos + " (" + seconds + "s)");
                            break;
                        case 770:
                            Console.WriteLine("4 pressed at " + bufferPos + " (" + seconds + "s)");
                            break;
                        case 852:
                            Console.WriteLine("7 pressed at " + bufferPos + " (" + seconds + "s)");
                            break;
                        case 941:
                            Console.WriteLine("* pressed at " + bufferPos);
                            break;
                    }
                    break;
                case 1336:
                    switch (Convert.ToInt32(highestPowers[1].Frequency))
                    {
                        case 697:
                            Console.WriteLine("2 pressed at " + bufferPos + " (" + seconds + "s)");
                            break;
                        case 770:
                            Console.WriteLine("5 pressed at " + bufferPos + " (" + seconds + "s)");
                            break;
                        case 852:
                            Console.WriteLine("8 pressed at " + bufferPos + " (" + seconds + "s)");
                            break;
                        case 941:
                            Console.WriteLine("0 pressed at " + bufferPos + " (" + seconds + "s)");
                            break;
                    }
                    break;
                case 1477:
                    switch (Convert.ToInt32(highestPowers[1].Frequency))
                    {
                        case 697:
                            Console.WriteLine("3 pressed at " + bufferPos + " (" + seconds + "s)");
                            break;
                        case 770:
                            Console.WriteLine("6 pressed at " + bufferPos + " (" + seconds + "s)");
                            break;
                        case 852:
                            Console.WriteLine("9 pressed at " + bufferPos + " (" + seconds + "s)");
                            break;
                        case 941:
                            Console.WriteLine("# pressed at " + bufferPos + " (" + seconds + "s)");
                            break;
                    }
                    break;
            }
        }
        else
        {
            Console.WriteLine("No DTMF at " + bufferPos + " (" + seconds + "s)");
        }
        bufferPos = bufferPos + (sampleSize*2);
    }
}
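The loop above tracks positions in bytes: bufferPos, sampleSize*2, and bufferPos / 16000, where 16000 is the number of bytes per second for 8 kHz 16-bit mono. The sketch below does the same non-overlapping framing in sample units instead. It assumes the whole file has already been converted to a short[], and the Frames helper is hypothetical. Unlike the while condition above, it also keeps a final frame that ends exactly at the end of the data.

```csharp
using System;
using System.Collections.Generic;

public static class Framing
{
    // Yields consecutive, non-overlapping frames of frameSize samples, together
    // with each frame's start position in samples and in seconds.
    // A trailing partial frame is dropped.
    public static IEnumerable<(int StartSample, double StartSeconds, short[] Frame)> Frames(
        short[] samples, int frameSize, int sampleRate)
    {
        for (int start = 0; start + frameSize <= samples.Length; start += frameSize)
        {
            var frame = new short[frameSize];
            Array.Copy(samples, start, frame, 0, frameSize);
            yield return (start, start / (double)sampleRate, frame);
        }
    }
}
```

For 480 samples and a frame size of 160, this yields three frames starting at 0, 160 and 320. The byte-based condition above would stop after two.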
Here is the sample file as viewed in Audacity; I've marked the DTMF keys that were pressed.
And... it almost works. In the file above I shouldn't see any DTMF until almost exactly 3 seconds in, but my code reports:
9 pressed at 1920 (0.12s)
1 pressed at 2880 (0.18s)
* pressed at 3200
1 pressed at 5120 (0.32s)
1 pressed at 5440 (0.34s)
7 pressed at 5760 (0.36s)
7 pressed at 6080 (0.38s)
7 pressed at 6720 (0.42s)
5 pressed at 7040 (0.44s)
7 pressed at 7360 (0.46s)
7 pressed at 7680 (0.48s)
1 pressed at 8000 (0.5s)
7 pressed at 8320 (0.52s)
... and so on until it reaches 3 seconds, and THEN it starts to settle on the correct answer, that 1 was pressed:
7 pressed at 40000 (2.5s)
# pressed at 43840 (2.74s)
No DTMF at 44800 (2.8s)
1 pressed at 45120 (2.82s)
1 pressed at 45440 (2.84s)
1 pressed at 46080 (2.88s)
1 pressed at 46720 (2.92s)
4 pressed at 47040 (2.94s)
1 pressed at 47360 (2.96s)
1 pressed at 47680 (2.98s)
1 pressed at 48000 (3s)
1 pressed at 48960 (3.06s)
4 pressed at 49600 (3.1s)
1 pressed at 49920 (3.12s)
1 pressed at 50560 (3.16s)
1 pressed at 51520 (3.22s)
1 pressed at 52160 (3.26s)
4 pressed at 52480 (3.28s)
If I raise AdjustmentFactor above 1.2, I get almost no detections at all.
I sense that I'm nearly there, but can anyone see what I'm missing?
EDIT2: The test file above is available here. The adjustedMeanPower in the example above is 47.6660450354638, and the powers are:
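CalculateGoertzel() returns the power of the selected frequency within the provided sample.
Calculate this power for each of the DTMF frequencies (697, 770, 852, 941, 1209, 1336 and 1477 Hz). Then sort the resulting powers and pick the highest two. If both are above a certain threshold, a DTMF tone has been detected.
The value to use as the threshold depends on the signal-to-noise ratio (SNR) of your sample. To start with, it should be enough to do three things:
- calculate the mean of all Goertzel values,
- multiply that mean by a factor (e.g. 2 or 3),
- check whether the two highest Goertzel values are above the result.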
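For reference, here is a sketch of the textbook Goertzel power computation for one frequency over a block of N samples. It assumes float samples, for example 16-bit values divided by 32768. The target frequency is rounded to the nearest DFT bin, k = round(N * f / fs). The returned value is the squared magnitude of that bin, which is the quantity compared across the seven DTMF frequencies. The class and method names are hypothetical, and this is not the answerer's pastebin version.

```csharp
using System;

public static class Goertzel
{
    // Goertzel recurrence for the single DFT bin nearest targetHz:
    //   s[n] = x[n] + 2cos(w) * s[n-1] - s[n-2],  with w = 2*pi*k/N.
    // Returns |X[k]|^2 = s[N-1]^2 + s[N-2]^2 - 2cos(w) * s[N-1] * s[N-2].
    public static double Power(float[] block, double targetHz, int sampleRate)
    {
        int n = block.Length;
        int k = (int)Math.Round(n * targetHz / sampleRate);   // nearest bin index
        double coeff = 2.0 * Math.Cos(2.0 * Math.PI * k / n);
        double sPrev = 0.0, sPrev2 = 0.0;
        foreach (float x in block)
        {
            double s = x + coeff * sPrev - sPrev2;
            sPrev2 = sPrev;
            sPrev = s;
        }
        return sPrev * sPrev + sPrev2 * sPrev2 - coeff * sPrev * sPrev2;
    }
}
```

Suppose this is run over a 205-sample block containing a 770 Hz + 1336 Hz tone (the "5" key). The bins for 770 Hz and 1336 Hz should then clearly dominate the other five.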
Here is a code snippet that expresses what I mean more formally:
var frequencies = new[] {697.0, 770.0, 852.0, 941.0, 1209.0, 1336.0, 1477.0};
var powers = frequencies.Select(f => new
{
    Frequency = f,
    Power = CalculateGoertzel(sample, f, samplerate)
});
const double AdjustmentFactor = 1.0;
var adjustedMeanPower = AdjustmentFactor * powers.Average(result => result.Power);
var sortedPowers = powers.OrderByDescending(result => result.Power);
var highestPowers = sortedPowers.Take(2).ToList();
if (highestPowers.All(result => result.Power > adjustedMeanPower))
{
    // Use highestPowers[0].Frequency and highestPowers[1].Frequency to
    // classify the detected DTMF tone.
}
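Start with an AdjustmentFactor of 1.0. If you get false positives from your test data (i.e. you detect DTMF tones in samples that shouldn't contain any), keep increasing it until the false positives stop.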
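Update #1
I tried your code on the wave file and adjusted a few things:
I materialized the enumerable after the Goertzel calculations (important for performance):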
var powers = frequencies.Select(f => new
{
    Frequency = f,
    Power = CalculateGoertzel(sampleBuffer, f, 8000)
// Materialize enumerable to avoid multiple calculations.
}).ToList();
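Materializing matters because powers is enumerated twice, once by Average and once by OrderByDescending. A LINQ Select query is deferred, so without ToList() every Goertzel computation runs again on each enumeration. The small sketch below counts the projection calls. The counter is a hypothetical stand-in for CalculateGoertzel.

```csharp
using System;
using System.Linq;

public static class DeferredDemo
{
    // Counts how often the projection runs when the query is consumed by
    // Average and then by OrderByDescending/Take, with and without ToList().
    public static (int Lazy, int Materialized) CountEvaluations()
    {
        var freqs = new[] { 697.0, 770.0, 852.0, 941.0, 1209.0, 1336.0, 1477.0 };
        int calls = 0;
        Func<double, double> fakeGoertzel = f => { calls++; return f; };

        // Deferred query: each enumeration re-runs the projection.
        var lazy = freqs.Select(f => new { Frequency = f, Power = fakeGoertzel(f) });
        lazy.Average(r => r.Power);
        lazy.OrderByDescending(r => r.Power).Take(2).ToList();
        int lazyCalls = calls;

        // Materialized list: the projection runs exactly once per frequency.
        calls = 0;
        var list = freqs.Select(f => new { Frequency = f, Power = fakeGoertzel(f) }).ToList();
        list.Average(r => r.Power);
        list.OrderByDescending(r => r.Power).Take(2).ToList();
        return (lazyCalls, calls);
    }
}
```

With seven frequencies, the deferred query runs the projection 14 times and the materialized list runs it 7 times.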
I didn't use the adjusted mean for thresholding. I simply used 100.0 as the threshold:
if (highestPowers.All(result => result.Power > 100.0))
{
    ...
}
I doubled the sample size (I believe you used 160):
int sampleSize = 160 * 2;
I fixed your DTMF classification. I used nested dictionaries to capture all possible cases:
var phoneKeyOf = new Dictionary<int, Dictionary<int, string>>
{
    {1209, new Dictionary<int, string> {{1477, "?"}, {1336, "?"}, {1209, "?"}, {941, "*"}, {852, "7"}, {770, "4"}, {697, "1"}}},
    {1336, new Dictionary<int, string> {{1477, "?"}, {1336, "?"}, {1209, "?"}, {941, "0"}, {852, "8"}, {770, "5"}, {697, "2"}}},
    {1477, new Dictionary<int, string> {{1477, "?"}, {1336, "?"}, {1209, "?"}, {941, "#"}, {852, "9"}, {770, "6"}, {697, "3"}}},
    { 941, new Dictionary<int, string> {{1477, "#"}, {1336, "0"}, {1209, "*"}, {941, "?"}, {852, "?"}, {770, "?"}, {697, "?"}}},
    { 852, new Dictionary<int, string> {{1477, "9"}, {1336, "8"}, {1209, "7"}, {941, "?"}, {852, "?"}, {770, "?"}, {697, "?"}}},
    { 770, new Dictionary<int, string> {{1477, "6"}, {1336, "5"}, {1209, "4"}, {941, "?"}, {852, "?"}, {770, "?"}, {697, "?"}}},
    { 697, new Dictionary<int, string> {{1477, "3"}, {1336, "2"}, {1209, "1"}, {941, "?"}, {852, "?"}, {770, "?"}, {697, "?"}}}
};
The phone key is then retrieved with:
var key = phoneKeyOf[(int)highestPowers[0].Frequency][(int)highestPowers[1].Frequency];
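Every DTMF key pairs exactly one low-group (row) frequency with one high-group (column) frequency. So an alternative to the nested dictionary and its "?" entries is to take the strongest frequency within each group separately. Below is a sketch of that variant. It assumes the powers have already been computed per group, and the names and the threshold parameter are hypothetical.

```csharp
public static class DtmfKeypad
{
    // Low-group (row) and high-group (column) frequencies in Hz, in keypad order.
    public static readonly double[] RowHz = { 697, 770, 852, 941 };
    public static readonly double[] ColHz = { 1209, 1336, 1477 };

    static readonly char[,] Keys =
    {
        { '1', '2', '3' },
        { '4', '5', '6' },
        { '7', '8', '9' },
        { '*', '0', '#' }
    };

    // rowPowers[i] is the measured power at RowHz[i]; colPowers[j] at ColHz[j].
    // Returns the key for the strongest row and strongest column, or null when
    // either of them does not exceed the threshold.
    public static char? Classify(double[] rowPowers, double[] colPowers, double threshold)
    {
        int r = ArgMax(rowPowers);
        int c = ArgMax(colPowers);
        if (rowPowers[r] <= threshold || colPowers[c] <= threshold) return null;
        return Keys[r, c];
    }

    static int ArgMax(double[] values)
    {
        int best = 0;
        for (int i = 1; i < values.Length; i++)
            if (values[i] > values[best]) best = i;
        return best;
    }
}
```

Because one row and one column are always chosen, this variant cannot produce a row-row or column-column pair. The threshold check is what rejects frames that contain no tone.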
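The results are not perfect, but they are reasonably reliable.
Update #2
I think I've figured out the problem, but I can't try it myself right now. You must not pass the target frequency directly to CalculateGoertzel(). It has to be normalized so that it is centered on a DFT bin. Try this approach when calculating the powers: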
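var powers = frequencies.Select(f => new
{
    Frequency = f,
    // Pass the normalized frequency (DFT bin index k = round(f * N / fs)).
    // CalculateGoertzel() evaluates 2 * PI * frequency / samplerate, so when
    // "frequency" is a bin index, "samplerate" must be the block length N.
    Power = CalculateGoertzel(sampleBuffer, Math.Round(f*sampleSize/8000.0), sampleSize)
}).ToList();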
Additionally, you have to use 205 as the sampleSize to minimize the error.
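The bin arithmetic behind this update can be checked directly. For an N-sample block at fs Hz, the nearest bin is k = round(N * f / fs) and its center lies at k * fs / N. Neighboring bins are fs / N apart. Here is a sketch with hypothetical names:

```csharp
using System;

public static class DtmfBins
{
    // Nearest DFT bin to targetHz for an n-sample block at sampleRate Hz,
    // plus the frequency that bin is actually centred on.
    public static (int K, double CenterHz) NearestBin(double targetHz, int n, int sampleRate)
    {
        int k = (int)Math.Round(n * targetHz / sampleRate);
        return (k, k * (double)sampleRate / n);
    }
}
```

For N = 205 at 8000 Hz, this gives k = 18, 20, 22, 24, 31, 34 and 38 for the seven DTMF frequencies. The bin spacing is about 39 Hz, compared with 50 Hz for N = 160 and 25 Hz for N = 320.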
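Update #3
I rewrote the prototype to use NAudio's ISampleProvider interface, which returns normalized sample values (floats in the range [-1.0; 1.0]). I also rewrote CalculateGoertzel() from scratch. It's still not optimized for performance, but it gives much, much more pronounced power differences between the frequencies. When I run it on your test data, there are no false positives at all. I strongly recommend you take a look: http://pastebin.com/serxw5nG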
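Normalization matters because the Goertzel power grows with the square of the sample amplitude. A fixed threshold, such as 100.0 in Update #1, therefore only makes sense for a known sample scale. The sketch below is a minimal stand-in for the normalized values an ISampleProvider delivers. It assumes signed 16-bit input and the common divide-by-32768 convention. The helper is hypothetical and is not NAudio's API.

```csharp
public static class SampleScale
{
    // Maps signed 16-bit PCM samples to floats in [-1.0, 1.0)
    // by dividing by 32768 (the magnitude of short.MinValue).
    public static float[] ToFloat(short[] samples)
    {
        var result = new float[samples.Length];
        for (int i = 0; i < samples.Length; i++)
            result[i] = samples[i] / 32768f;
        return result;
    }
}
```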
Update #4
I created a GitHub project and two NuGet packages for detecting DTMF tones in live (captured) audio and in pre-recorded audio files.