将音频样本添加到没有剪辑的正确方法是什么
What is correct way to add audio samples into the one without clipping
我生成不同频率的声音样本(sin/saw/triangle 发生器)作为双值数组 [-1...1](1-最大振幅)。我想将所有信号合并为一个。
1) 如果我添加 (combineWithNormalize) 并最终归一化为 [-1...1] - 声音质量很好,但信号太安静了。
2) 如果我添加使用线性(combineWithLinearDynaRangeCompression)或日志(combineWithLnDynaRangeCompression)压缩 - 信号更响亮,但质量是可怕的(金属声音)。
我究竟做错了什么?我想我错过了处理步骤。
从多个源 wav 文件添加音频信号并创建最终文件的一般可接受的算法是什么(像 Yamaha 等合成器使用哪些方法用于此目的)?
附加值:
-我生成的音频(组合两个样本:顶部 - combineWithNormalize,底部 - combineWithLnDynaRangeCompression)。顶部信号安静,但正确。底部 - 声音更大,但更可怕。
audio samples
-Java代码(草稿,未优化):
// add samples and linear normalize to [-1,1]
public static double[] combineWithNormalize( double[]... audio) {
if (audio.length == 0) return null;
if (audio.length == 1) return audio[0];
int maxIdx = 0;
// look for the longest sample
for(double[] arr: audio)
if (arr.length > maxIdx) maxIdx = arr.length;
// add 0 to the end of short samples
for(int i=0; i < audio.length; i++)
if (audio[i].length < maxIdx) audio[i] = Arrays.copyOf(audio[i], maxIdx);
// add all samples to result (+ find absolute max value)
double[] result = new double[maxIdx];
double normalizer = 1.0;
for (int i = 0; i < maxIdx; i++) {
for (int j = 0; j < audio.length; j++)
result[i] += audio[j][i];
double res = Math.abs(result[i]);
if (res > normalizer)
normalizer = res;
}
//normalize rezult
double coeff = 1.0/ normalizer;
if (normalizer !=1.0)
for (int i = 0; i < maxIdx; i++)
result[i] *= coeff;
return result;
}
// add samples and liners compression (all samples must be [-1..1])
public static double[] combineWithLinearDynaRangeCompression(double threshold, double[]... audio) {
if (audio.length == 0 || threshold >= 1 || threshold < 0) return null;
if (audio.length == 1) return audio[1];
int maxIdx = 0;
// look for the longest sample
for(double[] arr: audio)
if (arr.length > maxIdx) maxIdx = arr.length;
// add 0 to the end of short samples
for(int i=0; i < audio.length; i++)
if (audio[i].length < maxIdx) audio[i] = Arrays.copyOf(audio[i], maxIdx);
double[] result = Arrays.copyOf(audio[0], maxIdx); // Copy first sample
double linearCoeff = (1-threshold)/(2-threshold);
// Add all samples to first + compression
for (int j = 0; j < maxIdx; j++) {
double res = 0;
for (int i = 1; i < audio.length; i++)
res = result[j] + audio[i][j];
double absRes = Math.abs(res);
result[j] = (absRes <= threshold) ? res : Math.signum(res) * (threshold + linearCoeff * (absRes - threshold));
}
return result;
}
// add samples and log compression (all samples must be [-1..1])
public static double[] combineWithLnDynaRangeCompression(double threshold, double[]... audio) {
if (audio.length == 0 || threshold >= 1 || threshold < 0) return null;
if (audio.length == 1) return audio[0];
int maxIdx = 0;
// look for the longest sample
for(double[] arr: audio)
if (arr.length > maxIdx) maxIdx = arr.length;
// add 0 to the end of short samples
for(int i=0; i < audio.length; i++)
if (audio[i].length < maxIdx) audio[i] = Arrays.copyOf(audio[i], maxIdx);
double[] result = Arrays.copyOf(audio[0], maxIdx); // Copy first sample
double expCoeff = alphaT[(int) threshold*100];
// Add all samples to first + compression
for (int j = 0; j < maxIdx; j++) {
double res = 0;
for (int i = 1; i < audio.length; i++)
res = result[j] + audio[i][j];
double absRes = Math.abs(res);
result[j] = (absRes <= threshold) ? res :
Math.signum(res) * (threshold + ( 1 - threshold) *
Math.log(1.0 + expCoeff * (absRes-threshold) /(2-threshold)) / Math.log(1.0 + expCoeff ));
}
return result;
}
// Solutions of equations pow(1+x,1/x)=exp((1-t)/(2-t)) for t=0, 0.01, 0.02 ... 0.99
final private static double[] alphaT =
{
2.51286, 2.54236, 2.57254, 2.60340, 2.63499, 2.66731, 2.70040, 2.73428, 2.76899, 2.80454,
2.84098, 2.87833, 2.91663, 2.95592, 2.99622, 3.03758, 3.08005, 3.12366, 3.16845, 3.21449,
3.26181, 3.31048, 3.36054, 3.41206, 3.46509, 3.51971, 3.57599, 3.63399, 3.69380, 3.75550,
3.81918, 3.88493, 3.95285, 4.02305, 4.09563, 4.17073, 4.24846, 4.32896, 4.41238, 4.49888,
4.58862, 4.68178, 4.77856, 4.87916, 4.98380, 5.09272, 5.20619, 5.32448, 5.44790, 5.57676,
5.71144, 5.85231, 5.99980, 6.15437, 6.31651, 6.48678, 6.66578, 6.85417, 7.05269, 7.26213,
7.48338, 7.71744, 7.96541, 8.22851, 8.50810, 8.80573, 9.12312, 9.46223, 9.82527, 10.21474,
10.63353, 11.08492, 11.57270, 12.10126, 12.67570, 13.30200, 13.98717, 14.73956, 15.56907, 16.48767,
17.50980, 18.65318, 19.93968, 21.39661, 23.05856, 24.96984, 27.18822, 29.79026, 32.87958, 36.59968,
41.15485, 46.84550, 54.13115, 63.74946, 76.95930, 96.08797, 125.93570, 178.12403, 289.19889, 655.12084
};
提前致谢。
我还没有测试过你的代码,但我会分享一些一般性的技巧:
"I generate sound samples at different frequency (sin/saw/triangle generators)"
所以您在某个字节数组中有 PCM 样本。让我们假设 16 位,每个 Short @ [i]
保存 sample @ [i]
的振幅。其中 [i]
是您在样本总数中的位置。
..."As an array of Double
values"...
对于您的数字声音 (PCM),您应该使用 Floats
。您的输入声音是 16 位格式吗?您稍后可以转换为 16 位值整数(或短整数)。
同时检查这个其他答案:
对于问题...
"What is correct way to add audio samples into the one without clipping"
使用 +
进行 adding 有什么问题?
final_sample[i] = ( sourceA[i] /2 ) + ( sourceB[i] /2 ); //divide by 2 to halve amplitudes
我们除以 2 将每个源的振幅减半。这样,即使每个 Source
的样本值(振幅)为 1.0
,在混合过程中它们也会给出 0.5
作为它们的最大值。
混合 final_sample
现在总计为 1.0
。希望没有剪辑。
"1) If I add (combineWithNormalize
) and finally normalize to [-1...1]
:
result: Quality of sound is good, but signal is too silent."
尝试通过乘以样本值来增强信号。例如:signal * 2.0 //double volume
.
PS: 查看这篇文章 + 其他 Stack Exchange 任何想法的答案:
(1) blog: Mix Audio Samples on iOS(试试同样的逻辑,代码很容易理解)。
(2) SO: Modify volume gain on audio sample buffer.
(3) SO: Mixing PCM audio samples.
(4) SO: Algorithm To Mix Sound.
(5) DSP: Algorithm(s) to mix audio signals without clipping.
我生成不同频率的声音样本(sin/saw/triangle 发生器)作为双值数组 [-1...1](1-最大振幅)。我想将所有信号合并为一个。
1) 如果我添加 (combineWithNormalize) 并最终归一化为 [-1...1] - 声音质量很好,但信号太安静了。
2) 如果我添加使用线性(combineWithLinearDynaRangeCompression)或日志(combineWithLnDynaRangeCompression)压缩 - 信号更响亮,但质量是可怕的(金属声音)。 我究竟做错了什么?我想我错过了处理步骤。 从多个源 wav 文件添加音频信号并创建最终文件的一般可接受的算法是什么(像 Yamaha 等合成器使用哪些方法用于此目的)?
附加值:
-我生成的音频(组合两个样本:顶部 - combineWithNormalize,底部 - combineWithLnDynaRangeCompression)。顶部信号安静,但正确。底部 - 声音更大,但更可怕。 audio samples
-Java代码(草稿,未优化):
// add samples and linear normalize to [-1,1]
public static double[] combineWithNormalize( double[]... audio) {
if (audio.length == 0) return null;
if (audio.length == 1) return audio[0];
int maxIdx = 0;
// look for the longest sample
for(double[] arr: audio)
if (arr.length > maxIdx) maxIdx = arr.length;
// add 0 to the end of short samples
for(int i=0; i < audio.length; i++)
if (audio[i].length < maxIdx) audio[i] = Arrays.copyOf(audio[i], maxIdx);
// add all samples to result (+ find absolute max value)
double[] result = new double[maxIdx];
double normalizer = 1.0;
for (int i = 0; i < maxIdx; i++) {
for (int j = 0; j < audio.length; j++)
result[i] += audio[j][i];
double res = Math.abs(result[i]);
if (res > normalizer)
normalizer = res;
}
//normalize rezult
double coeff = 1.0/ normalizer;
if (normalizer !=1.0)
for (int i = 0; i < maxIdx; i++)
result[i] *= coeff;
return result;
}
// add samples and liners compression (all samples must be [-1..1])
public static double[] combineWithLinearDynaRangeCompression(double threshold, double[]... audio) {
if (audio.length == 0 || threshold >= 1 || threshold < 0) return null;
if (audio.length == 1) return audio[1];
int maxIdx = 0;
// look for the longest sample
for(double[] arr: audio)
if (arr.length > maxIdx) maxIdx = arr.length;
// add 0 to the end of short samples
for(int i=0; i < audio.length; i++)
if (audio[i].length < maxIdx) audio[i] = Arrays.copyOf(audio[i], maxIdx);
double[] result = Arrays.copyOf(audio[0], maxIdx); // Copy first sample
double linearCoeff = (1-threshold)/(2-threshold);
// Add all samples to first + compression
for (int j = 0; j < maxIdx; j++) {
double res = 0;
for (int i = 1; i < audio.length; i++)
res = result[j] + audio[i][j];
double absRes = Math.abs(res);
result[j] = (absRes <= threshold) ? res : Math.signum(res) * (threshold + linearCoeff * (absRes - threshold));
}
return result;
}
// add samples and log compression (all samples must be [-1..1])
public static double[] combineWithLnDynaRangeCompression(double threshold, double[]... audio) {
if (audio.length == 0 || threshold >= 1 || threshold < 0) return null;
if (audio.length == 1) return audio[0];
int maxIdx = 0;
// look for the longest sample
for(double[] arr: audio)
if (arr.length > maxIdx) maxIdx = arr.length;
// add 0 to the end of short samples
for(int i=0; i < audio.length; i++)
if (audio[i].length < maxIdx) audio[i] = Arrays.copyOf(audio[i], maxIdx);
double[] result = Arrays.copyOf(audio[0], maxIdx); // Copy first sample
double expCoeff = alphaT[(int) threshold*100];
// Add all samples to first + compression
for (int j = 0; j < maxIdx; j++) {
double res = 0;
for (int i = 1; i < audio.length; i++)
res = result[j] + audio[i][j];
double absRes = Math.abs(res);
result[j] = (absRes <= threshold) ? res :
Math.signum(res) * (threshold + ( 1 - threshold) *
Math.log(1.0 + expCoeff * (absRes-threshold) /(2-threshold)) / Math.log(1.0 + expCoeff ));
}
return result;
}
// Solutions of equations pow(1+x,1/x)=exp((1-t)/(2-t)) for t=0, 0.01, 0.02 ... 0.99
final private static double[] alphaT =
{
2.51286, 2.54236, 2.57254, 2.60340, 2.63499, 2.66731, 2.70040, 2.73428, 2.76899, 2.80454,
2.84098, 2.87833, 2.91663, 2.95592, 2.99622, 3.03758, 3.08005, 3.12366, 3.16845, 3.21449,
3.26181, 3.31048, 3.36054, 3.41206, 3.46509, 3.51971, 3.57599, 3.63399, 3.69380, 3.75550,
3.81918, 3.88493, 3.95285, 4.02305, 4.09563, 4.17073, 4.24846, 4.32896, 4.41238, 4.49888,
4.58862, 4.68178, 4.77856, 4.87916, 4.98380, 5.09272, 5.20619, 5.32448, 5.44790, 5.57676,
5.71144, 5.85231, 5.99980, 6.15437, 6.31651, 6.48678, 6.66578, 6.85417, 7.05269, 7.26213,
7.48338, 7.71744, 7.96541, 8.22851, 8.50810, 8.80573, 9.12312, 9.46223, 9.82527, 10.21474,
10.63353, 11.08492, 11.57270, 12.10126, 12.67570, 13.30200, 13.98717, 14.73956, 15.56907, 16.48767,
17.50980, 18.65318, 19.93968, 21.39661, 23.05856, 24.96984, 27.18822, 29.79026, 32.87958, 36.59968,
41.15485, 46.84550, 54.13115, 63.74946, 76.95930, 96.08797, 125.93570, 178.12403, 289.19889, 655.12084
};
提前致谢。
我还没有测试过你的代码,但我会分享一些一般性的技巧:
"I generate sound samples at different frequency (sin/saw/triangle generators)"
所以您在某个字节数组中有 PCM 样本。让我们假设 16 位,每个 Short @ [i]
保存 sample @ [i]
的振幅。其中 [i]
是您在样本总数中的位置。
..."As an array of
Double
values"...
对于您的数字声音 (PCM),您应该使用 Floats
。您的输入声音是 16 位格式吗?您稍后可以转换为 16 位值整数(或短整数)。
同时检查这个其他答案:
对于问题...
"What is correct way to add audio samples into the one without clipping"
使用 +
进行 adding 有什么问题?
final_sample[i] = ( sourceA[i] /2 ) + ( sourceB[i] /2 ); //divide by 2 to halve amplitudes
我们除以 2 将每个源的振幅减半。这样,即使每个 Source
的样本值(振幅)为 1.0
,在混合过程中它们也会给出 0.5
作为它们的最大值。
混合 final_sample
现在总计为 1.0
。希望没有剪辑。
"1) If I add (
combineWithNormalize
) and finally normalize to[-1...1]
:
result: Quality of sound is good, but signal is too silent."
尝试通过乘以样本值来增强信号。例如:signal * 2.0 //double volume
.
PS: 查看这篇文章 + 其他 Stack Exchange 任何想法的答案:
(1) blog: Mix Audio Samples on iOS(试试同样的逻辑,代码很容易理解)。
(2) SO: Modify volume gain on audio sample buffer.
(3) SO: Mixing PCM audio samples.
(4) SO: Algorithm To Mix Sound.
(5) DSP: Algorithm(s) to mix audio signals without clipping.