Algorithm and package to modify the pitch of the sound for certain durations repeatedly

Question

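I want to create an audio file from an existing one in which the pitch is modified over different durations of the file. For example, if the file is 36 seconds long, I want to modify the pitch of the first 2 seconds by some value, then from the 6th second to the 9th by some other value, and so on.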

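Basically, I'm trying to modify the audio based on each character of a message (k, i, l, b, ...). I've made an array that stores different durations, like a table for the 26 letters a, b, c, d, and so on. Based on these durations, I'm trying to modify the file over those specific durations. The problem is that I'm not very good at actually manipulating audio; I even tried the same thing in Java but couldn't get it to work.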

Is there some other parameter I could change in the audio file that wouldn't make the change too noticeable?

These are the values I'm referring to. The code is in Java, but ignore that; I'll convert it to Python later. The values are in milliseconds.

    // Durations (in milliseconds) for the letters A..Z, in alphabetical order
    static final double[] LETTER_DURATIONS = {
          50000,  100000,  150000,  200000,  250000,  300000,  350000,
         400000,  450000,  500000,  550000,  600000,  650000,  700000,
         750000,  800000,  850000,  900000,  950000, 1000000, 1100000,
        1200000, 1300000, 1400000, 1500000, 1600000
    };

    // One duration per character of the message
    static double[] duration;

    public static void convertMsgToAudio(String msg) {
        msg = msg.toUpperCase();
        System.out.println("Msg 2 : " + msg);

        // Size the array after upper-casing, so it matches the loop bound
        duration = new double[msg.length()];
        for (int i = 0; i < msg.length(); i++) {
            char ch = msg.charAt(i);
            // Characters outside A-Z keep the default duration of 0
            if (ch >= 'A' && ch <= 'Z') {
                duration[i] = LETTER_DURATIONS[ch - 'A'];
            }
        }
    }
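Since the plan is to port this to Python, here is a minimal sketch of the same lookup as a dictionary. The values are copied from the Java table; the helper name `message_to_durations` is hypothetical, and characters outside A-Z map to 0, just as unassigned slots keep their default 0 in the Java array.

```python
import string

# Durations (in milliseconds, as stated in the Java code) for A..Z, in order
_DURATIONS_MS = [
    50000, 100000, 150000, 200000, 250000, 300000, 350000,
    400000, 450000, 500000, 550000, 600000, 650000, 700000,
    750000, 800000, 850000, 900000, 950000, 1000000, 1100000,
    1200000, 1300000, 1400000, 1500000, 1600000,
]
LETTER_DURATIONS = dict(zip(string.ascii_uppercase, _DURATIONS_MS))


def message_to_durations(msg):
    """Return one duration per character of msg (hypothetical helper).

    Characters that are not letters A-Z get 0, matching the Java array's
    default value for slots that are never assigned.
    """
    return [LETTER_DURATIONS.get(ch, 0) for ch in msg.upper()]


print(message_to_durations("kilb"))  # [550000, 450000, 600000, 100000]
```

A dictionary keeps the letter-to-value table in one place, so changing a duration no longer means editing a 26-branch `if`/`else` chain.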

Now I'm trying to do the same thing in Python. I'm new to this concept, and this is the first time I've run into a problem with it.

Answer 1

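A simple approach is to work directly on the raw PCM data. In this format the audio is just a sequence of values in the range -32768...32767, each stored as 2 bytes (assuming 16-bit signed, mono), sampled at regular intervals (e.g. 44100 Hz).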
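In practice the raw PCM can be taken from (and written back to) a WAV file. A minimal sketch using only the standard-library `wave` and `struct` modules, assuming a 16-bit mono file; the helper names are hypothetical. It also shows how a time offset in milliseconds (the unit of the question's table) maps to a sample index: at 44100 Hz the 6-second mark is sample 6 × 44100 = 264600.

```python
import struct
import wave


def read_wav_mono16(path):
    """Read a 16-bit signed mono WAV file; return (sample_rate, samples)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    # WAV stores samples little-endian; each entry is one signed 16-bit value
    samples = list(struct.unpack("<%dh" % (len(frames) // 2), frames))
    return rate, samples


def write_wav_mono16(path, rate, samples):
    """Write samples as a 16-bit mono WAV file, clipping to -32768..32767."""
    clipped = [max(-32768, min(32767, int(s))) for s in samples]
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(clipped), *clipped))


def ms_to_sample_index(ms, rate):
    """Convert a time offset in milliseconds to a sample index."""
    return ms * rate // 1000


print(ms_to_sample_index(6000, 44100))  # 264600
```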

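To change the pitch you can "read" this data faster or slower, e.g. at 45000 Hz or 43000 Hz instead of 44100 Hz, which is easy to do with a resampling routine. For example: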

    import struct

    # Raw 16-bit signed mono PCM; "<" = little-endian, the usual raw PCM layout
    with open("pcm.raw", "rb") as f:
        data = f.read()
    n = len(data) // 2                       # number of 2-byte samples
    parsed = struct.unpack("<%ih" % n, data[:2 * n])
    # Here parsed is a tuple of numbers

    pos = 0.0     # position in the source file
    speed = 1.0   # current read speed = original sampling speed
    result = []

    while pos < len(parsed) - 1:
        # Compute a new sample (linear interpolation)
        ip = int(pos)
        v = int(parsed[ip] + (pos - ip) * (parsed[ip + 1] - parsed[ip]))
        result.append(v)

        pos += speed     # Next position
        speed += 0.0001  # raise the pitch

    # Write the result to disk (unpack the list into pack's arguments)
    with open("out.raw", "wb") as f:
        f.write(struct.pack("<%ih" % len(result), *result))
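The question asks for different pitch changes on different time ranges (the first 2 seconds by one value, seconds 6 to 9 by another). A minimal sketch of that, applying the same linear-interpolation idea at a fixed speed inside each range and copying everything else unchanged; the function `shift_segments` and its `(start, end, speed)` segment format are hypothetical. Ranges are given as sample indices, so a time of t seconds corresponds to index `int(t * rate)`.

```python
def shift_segments(samples, segments):
    """Resample selected ranges of samples at different read speeds.

    samples:  list of ints (16-bit PCM values)
    segments: list of (start, end, speed) with start/end as sample indices,
              sorted and non-overlapping, end <= len(samples);
              speed > 1 raises the pitch, speed < 1 lowers it.
    Samples outside every segment are copied unchanged.
    """
    out = []
    prev_end = 0
    for start, end, speed in segments:
        out.extend(samples[prev_end:start])  # untouched part before the range
        pos = float(start)
        while pos < end:
            # Linear interpolation between neighbours inside the range
            ip = int(pos)
            nxt = samples[ip + 1] if ip + 1 < end else samples[ip]
            out.append(int(samples[ip] + (pos - ip) * (nxt - samples[ip])))
            pos += speed
        prev_end = end
    out.extend(samples[prev_end:])  # untouched tail
    return out


print(shift_segments([0, 10, 20, 30, 40, 50], [(2, 6, 2.0)]))  # [0, 10, 20, 40]
```

For the question's example at an assumed 44100 Hz, the segment list would look like `[(0, 2 * 44100, s1), (6 * 44100, 9 * 44100, s2)]` with `s1` and `s2` the chosen speeds. Each processed range comes out about `(end - start) / speed` samples long, so everything after it shifts in time.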

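This is a very, very simple approach to the problem. Note, however, that raising the pitch this way also shortens the audio (and lowering it lengthens it); avoiding that requires more complex math than interpolation.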
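To put numbers on the speed/pitch/length relationship: reading at speed r multiplies every frequency by r (a shift of 12 · log2(r) semitones) and divides the duration by r. A small sketch of those conversions; the function names are hypothetical. Reading 44100 Hz data as if it were 45000 Hz is a shift of about +0.35 semitones, and one whole tone (2 semitones) corresponds to r = 2^(2/12) ≈ 1.1225, which shortens a 36-second file to about 32.1 seconds.

```python
import math


def speed_for_semitones(semitones):
    """Read-speed factor that shifts the pitch by the given semitones."""
    return 2 ** (semitones / 12)


def semitones_for_speed(speed):
    """Pitch shift, in semitones, produced by reading at the given speed."""
    return 12 * math.log2(speed)


def resampled_length(n_samples, speed):
    """Approximate output length when n_samples are read at a fixed speed."""
    return math.ceil(n_samples / speed)


print(round(semitones_for_speed(45000 / 44100), 2))  # 0.35
print(round(speed_for_semitones(2), 4))              # 1.1225 (one whole tone)
```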

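As an example, I used this approach to raise the pitch of a song by one tone over its whole length (I wanted to see whether it was noticeable... it isn't).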


Tags: python, audio, pitch, pitch-shifting