How to Change the Pitch and Tempo of a Sound

Calinyara

August 25, 2011

LGPL3

This article shows how to change the pitch and tempo of a sound.

1. Introduction

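Have you ever had a sound whose tempo was too fast and wanted to slow it down so you could hear it more clearly? Or perhaps you just want to play a joke on your boyfriend by modifying his voice so that he sounds like a woman. This article shows how to change the pitch and tempo of a sound to achieve effects like these.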

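The article is organized as follows. Sections 2-4 describe how to use cpct_dll.dll. Sections 5-7 explain how cpct_dll.dll is developed. Section 8 shows how to wrap cpct_dll.dll into CpctDotNet.dll with C#. Section 9 presents a simple demo, and Section 10 concludes.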

2. Working with .wav Files

Download CPCT for Windows. In the package you will find WavManipulateDll.dll, which is used to read and write .wav files. The most important APIs we will use are:

typedef void* HANDLE;

/// API for wav reading
// open a wav file for reading by name
API_AudioManipulate HANDLE getWavInFileByName(const char* filename);

// close a wav file opened for reading
API_AudioManipulate void destroyWavInFile(HANDLE h);

// get the sample rate
API_AudioManipulate uint getSampleRate(HANDLE h);

// get the number of bits per sample, i.e. 8 or 16
API_AudioManipulate uint getNumBits(HANDLE h);

// get number of audio channels in the file (1=mono, 2=stereo)
API_AudioManipulate uint getNumChannels(HANDLE h);

// Reads audio samples from the WAV file to floating point format, converting 
// sample values to range [-1,1]. Reads given number of elements from the file
// or if end-of-file reached, as many elements as are left in the file.
// return Number of elements read from the file.
API_AudioManipulate int readFloat(HANDLE h, float *buffer, int maxElems);

// Check end-of-file.
// return Nonzero if end-of-file reached.
API_AudioManipulate int isFileEnd(HANDLE h);

/// API for wav writing
// create a wav file for writing by name
API_AudioManipulate HANDLE saveWavOutFileByName(
    const char* fileName, int sampleRate, int bits, int channels);

// close a wav file opened for writing
API_AudioManipulate void destroyWavOutFile(HANDLE h);

// Write data to WAV file in floating point format, saturating sample values to range
// [-1,1]. Throws a 'runtime_error' exception if writing to file fails.
API_AudioManipulate void writeFloat(HANDLE h, const float* buffer, int numElems);
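readFloat and writeFloat exchange samples in the range [-1,1], whereas a 16-bit .wav file stores integers. The conversion involved can be sketched as follows. This is a minimal sketch assuming a scale factor of 32768; the exact factor WavManipulateDll uses is not documented here, and the helper names pcm16ToFloat and floatToPcm16 are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts 16-bit PCM samples to floats in [-1, 1),
// assuming a scale factor of 32768.
std::vector<float> pcm16ToFloat(const std::vector<int16_t>& pcm)
{
    std::vector<float> out(pcm.size());
    for (std::size_t i = 0; i < pcm.size(); ++i)
        out[i] = pcm[i] / 32768.0f;
    return out;
}

// Hypothetical helper: converts floats back to 16-bit PCM, saturating values
// outside [-1, 1] in the way the writeFloat comment describes.
std::vector<int16_t> floatToPcm16(const std::vector<float>& samples)
{
    std::vector<int16_t> out(samples.size());
    for (std::size_t i = 0; i < samples.size(); ++i) {
        float scaled = std::round(samples[i] * 32768.0f);
        scaled = std::max(-32768.0f, std::min(32767.0f, scaled));   // saturate
        out[i] = static_cast<int16_t>(scaled);
    }
    return out;
}
```

Saturating before the cast matters: a value such as 1.5 would otherwise overflow int16_t instead of clipping to 32767.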

3. Changing the Pitch and Tempo of a Sound

Download CPCT for Windows and locate cpct_dll.dll in the package. This is the core DLL for changing the pitch and tempo of a sound. It exposes the following API:

typedef void* HANDLE;

// create a cpct-mstftm instance with default parameters
API_CPCT HANDLE createCpctByDefault();

// create a cpct-mstftm instance with specific parameters:
// winlen = window length, hoplen = hop size, nit = number of iterations
API_CPCT HANDLE createCpctByParams(int winlen, int hoplen, int nit);

// data is the input data, datalength is the length of data,
// nChannels is the number of channels
API_CPCT void setData(HANDLE h, const float* data, int datalength, int nChannels);

// set the tempo and pitch
API_CPCT void setParams(HANDLE h, float tempo, float pitch);

// get the output data; datalength returns the length of the output data
API_CPCT void getData(HANDLE h, float* data, int& datalength);

// destroy the cpct-mstftm instance
API_CPCT void destroyCpct(HANDLE h);

Next, I will show how to use cpct_dll.dll with an example.

#include <cstdio>
#include <stdexcept>
// Also include the headers from the CPCT package that declare the
// WavManipulateDll and cpct_dll APIs above and the ParseParams class.

#define DATA_LENGTH 4096                // elements read from the .wav file per iteration
#define BUFFER_SIZE (DATA_LENGTH * 3)   // headroom: a slower tempo makes the output longer

static void openFile(void** infile, void** outfile, ParseParams *param)
{
    *infile = getWavInFileByName(param->getInputFile());
    int samplerate = (int)getSampleRate(*infile);
    int bits = (int)getNumBits(*infile);
    int channels = (int)getNumChannels(*infile);
    // the output file uses the same format as the input file
    *outfile = saveWavOutFileByName(param->getOutputFile(), 
               samplerate, bits, channels);

    printf("openFile done!\n");
}

static void process(void* infile, void* outfile, 
            void* cpct, ParseParams *param)
{
    float sampleBuffer[BUFFER_SIZE];
    int nChannels = (int)getNumChannels(infile);

    while (isFileEnd(infile) == 0)
    {
        // num is the number of elements actually read; the last block
        // of the file is usually shorter than DATA_LENGTH
        int num = readFloat(infile, sampleBuffer, DATA_LENGTH);
        if (num <= 0)
            break;

        int datalength = 0;
        setData(cpct, sampleBuffer, num, nChannels);
        setParams(cpct, param->getTempo(), param->getPitch());
        getData(cpct, sampleBuffer, datalength);   // datalength: length of the output

        writeFloat(outfile, sampleBuffer, datalength);
    }
    destroyWavInFile(infile);
    destroyWavOutFile(outfile);

    printf("process done!\n");
}

int main(int numparams, char* params[])
{
    void* infile;
    void* outfile;
    void* cpct = createCpctByParams(512, 256, 5);   // winlen, hoplen, nit

    try
    {
        ParseParams parameter(numparams, params);
        openFile(&infile, &outfile, &parameter);
        process(infile, outfile, cpct, &parameter);
    }
    catch (const std::runtime_error &e) 
    {
        fprintf(stderr, "%s\n", e.what());
        destroyCpct(cpct);   // release the instance on the error path too
        return -1;
    }
    destroyCpct(cpct);
    printf("Done!!!\n");

    return 0;
}

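DATA_LENGTH is the number of elements read from the .wav file at a time. After building the project you get cpct.exe. Copy cpct.exe, cpct_dll.dll, WavManipulateDll.dll and your .wav file into the same folder, then run one of the following commands in a console: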

cpct input.wav output.wav -t:0.5        -> speed up the tempo by 0.5
cpct input.wav output.wav -p:5          -> raise the pitch by 5
cpct input.wav output.wav -t:0.5 -p:5   -> change both tempo and pitch

-t:0.5 speeds the tempo up and -t:-0.5 slows it down; the range of -t is [-1,1].

-p:5 raises the pitch and -p:-5 lowers it; the range of -p is [-12,12].
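The ParseParams class is not listed in this article. The following is a minimal sketch of how flags of the form -t:<value> and -p:<value> could be parsed and range-checked; the struct CpctSettings and the function parseFlag are hypothetical and may not match the real ParseParams.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical settings parsed from the command line.
struct CpctSettings {
    float tempo = 0.0f;   // -t, range [-1, 1]
    float pitch = 0.0f;   // -p, range [-12, 12]
};

// Parses one "-t:<value>" or "-p:<value>" argument into s. Throws
// std::runtime_error for malformed, unknown or out-of-range flags;
// std::stof throws std::invalid_argument for non-numeric values.
void parseFlag(const std::string& arg, CpctSettings& s)
{
    if (arg.size() < 4 || arg[0] != '-' || arg[2] != ':')
        throw std::runtime_error("malformed argument: " + arg);

    float value = std::stof(arg.substr(3));
    switch (arg[1]) {
    case 't':
        if (value < -1.0f || value > 1.0f)
            throw std::runtime_error("tempo must be in [-1, 1]");
        s.tempo = value;
        break;
    case 'p':
        if (value < -12.0f || value > 12.0f)
            throw std::runtime_error("pitch must be in [-12, 12]");
        s.pitch = value;
        break;
    default:
        throw std::runtime_error("unknown flag: " + arg);
    }
}
```

Reporting invalid values as std::runtime_error fits the catch (const runtime_error &e) block in main above, so a bad flag prints a message instead of being passed on to setParams.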

4. CPCT for Linux

The process on Linux is similar to that on Windows. cpct_dll.dll is replaced by libCpctDll.so, and WavManipulateDll.dll is replaced by libWavManipulateDll.so.

5. Background on Changing the Pitch and Tempo of a Sound

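In this section, I introduce the theory behind cpct_dll.dll and how it is developed in C++. If you want to modify a discrete signal sequence x(n), n = 1, 2, ..., you can simply modify each x(n) in the time domain. In many cases, however, we first need to transform the original signal into the frequency domain with the short-time Fourier transform (STFT), then modify the spectrogram in the frequency domain, and finally transform the modified signal back into the time domain.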
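The analysis step of that pipeline can be sketched as a minimal STFT. It assumes a symmetric Hamming window (CPCT's hamming() may use a different variant) and, for brevity, a direct O(N^2) DFT in place of an FFT such as fft1. The names hammingWindow and stft are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Symmetric Hamming window of the given length (length >= 2).
std::vector<double> hammingWindow(int length)
{
    const double pi = std::acos(-1.0);
    std::vector<double> w(length);
    for (int n = 0; n < length; ++n)
        w[n] = 0.54 - 0.46 * std::cos(2.0 * pi * n / (length - 1));
    return w;
}

// Short-time Fourier transform: frame m starts at sample m * hoplen, is
// multiplied by the window and transformed with a direct DFT.
// Returns winlen complex bins per frame; only full frames are kept.
std::vector<std::vector<std::complex<double>>>
stft(const std::vector<double>& x, int winlen, int hoplen)
{
    assert(winlen >= 2 && hoplen > 0);
    const double pi = std::acos(-1.0);
    const std::vector<double> w = hammingWindow(winlen);
    std::vector<std::vector<std::complex<double>>> frames;

    for (std::size_t start = 0; start + winlen <= x.size(); start += hoplen) {
        std::vector<std::complex<double>> bins(winlen);
        for (int k = 0; k < winlen; ++k) {
            std::complex<double> acc = 0.0;
            for (int n = 0; n < winlen; ++n)
                acc += w[n] * x[start + n] *
                       std::polar(1.0, -2.0 * pi * k * n / winlen);
            bins[k] = acc;
        }
        frames.push_back(bins);
    }
    return frames;
}
```

Each returned frame is one column of the spectrogram; the processing step modifies these columns before they are transformed back to the time domain.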

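Short-time analysis is an important technique in sound processing. Sound signals such as human speech are time-varying, but over a very short span, on the order of 10-20 ms, their parameters can be treated as constant. We therefore usually split a sound signal into many short blocks (frames) for analysis and processing. Applying the STFT yields a parameter matrix of the original signal x(n), usually called the spectrogram, which can be regarded as the frequency-domain representation of x(n). Its horizontal axis is time, and the vertical axis holds the parameters of the frame at each point in time. In their paper "Signal Estimation from Modified Short-Time Fourier Transform" (IEEE Transactions on Acoustics, Speech, and Signal Processing), Daniel W. Griffin and Jae S. Lim describe how to estimate a signal from a modified STFT.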
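For reference, here are the core equations of Griffin and Lim's method in their notation, with analysis window w, hop S and frame index m; how CPCT discretizes them is not shown in this article. The first line defines the STFT. The second gives the signal whose STFT is closest, in the least-squares sense, to a modified STFT Y_w, where y_w(mS, n) is the inverse transform of frame m. The third is one iteration of the magnitude-only (MSTFTM) estimation: it keeps the target magnitude |Y_w| and borrows the phase of the STFT X^i_w of the current estimate x^i, and \hat{x}^i_w(mS, n) is the inverse transform of \hat{X}^i_w. In CPCT the number of iterations is the nit parameter.

```latex
X_w(mS, \omega) = \sum_{n} w(mS - n)\, x(n)\, e^{-j\omega n}

x(n) = \frac{\sum_{m} w(mS - n)\, y_w(mS, n)}{\sum_{m} w^2(mS - n)},
\qquad
y_w(mS, n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} Y_w(mS, \omega)\, e^{j\omega n}\, d\omega

\hat{X}^{i}_w(mS, \omega) = |Y_w(mS, \omega)|\,
  \frac{X^{i}_w(mS, \omega)}{|X^{i}_w(mS, \omega)|},
\qquad
x^{i+1}(n) = \frac{\sum_{m} w(mS - n)\, \hat{x}^{i}_w(mS, n)}{\sum_{m} w^2(mS - n)}
```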

5.1. Changing the Tempo of a Sound

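Figure 1 shows how the tempo of a sound is changed. L is the length of the analysis window, La is the analysis hop length and Ls is the synthesis hop length. If La > Ls, the result has a faster tempo; if La < Ls, it has a slower tempo. In Figure 1 the sound becomes shorter after tempo processing, so a faster tempo is produced.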

Figure 1. The process of changing the tempo of a sound
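To see why La > Ls shortens the sound, here is a minimal sketch of plain windowed overlap-add (OLA) time scaling. This is a simpler technique than CPCT's MSTFTM approach. It copies frames read every La samples to positions every Ls samples without any phase reconstruction, so it only illustrates how the two hop lengths determine the output length. The function name olaTimeScale and the Hann window are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Plain windowed overlap-add time scaling: frame m is read at m * La
// (analysis hop) and written at m * Ls (synthesis hop). No phase
// reconstruction is done, unlike CPCT's MSTFTM-based tsm().
std::vector<double> olaTimeScale(const std::vector<double>& x,
                                 int L, int La, int Ls)
{
    assert(L > 0 && La > 0 && Ls > 0 && x.size() >= static_cast<std::size_t>(L));
    const double pi = std::acos(-1.0);
    const std::size_t frames = (x.size() - L) / La + 1;
    const std::size_t outLen = (frames - 1) * Ls + L;

    std::vector<double> y(outLen, 0.0), norm(outLen, 0.0);
    for (std::size_t m = 0; m < frames; ++m) {
        for (int n = 0; n < L; ++n) {
            double w = 0.5 - 0.5 * std::cos(2.0 * pi * n / L);   // Hann window
            y[m * Ls + n] += w * x[m * La + n];
            norm[m * Ls + n] += w;
        }
    }
    // divide by the summed window so overlapping frames keep the original level
    for (std::size_t i = 0; i < outLen; ++i)
        if (norm[i] > 1e-8)
            y[i] /= norm[i];
    return y;
}
```

With L = 256, La = 128 and Ls = 64, a 1024-sample input gives 7 frames and (7 - 1) × 64 + 256 = 640 output samples; choosing Ls larger than La lengthens the output instead.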

5.2. Changing the Pitch of a Sound

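Figure 2 shows how the pitch of a sound is changed. The analysis hop La equals the synthesis hop, but the analysis window L differs from the synthesis window: each block of L samples is resampled to length L'. If L > L', the result has a higher pitch; if L < L', it has a lower pitch. In Figure 2 the length of the sound is unchanged after pitch processing, and a higher pitch is produced.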

Figure 2. The process of changing the pitch of a sound
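The resampling step can be illustrated with linear interpolation. This is a simpler stand-in for the aflibConverter-based resample() that CPCT actually uses, and resampleFrame is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Resamples one frame of L samples to newLength (L') samples by linear
// interpolation; the first and last samples stay aligned.
std::vector<double> resampleFrame(const std::vector<double>& frame, int newLength)
{
    assert(frame.size() >= 2 && newLength >= 2);
    std::vector<double> out(newLength);
    const double step = static_cast<double>(frame.size() - 1) / (newLength - 1);
    for (int i = 0; i < newLength; ++i) {
        double pos = i * step;                        // position in the input frame
        std::size_t j = static_cast<std::size_t>(pos);
        if (j >= frame.size() - 1)                    // last sample has no right neighbour
            j = frame.size() - 2;
        double frac = pos - j;
        out[i] = (1.0 - frac) * frame[j] + frac * frame[j + 1];
    }
    return out;
}
```

For example, resampling a frame of L = 512 samples to L' = 256 samples roughly halves every period in it, so when the frame is played at the original sample rate its frequencies roughly double (about one octave up). This is why L > L' raises the pitch.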

6. Design of the CPCT Library

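Everything is defined in the namespace CPCT. The most important functions are void tsm() and void pm(), whose principles were explained above. The fast Fourier transform function void fft1(...) and the resampling function int resample(...) are based on the classes aflibFFT and aflibConverter, which come from the free, open-source Open Source Audio Library Project (OSALP).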

#ifndef CPCT_MSTFTM_H
#define CPCT_MSTFTM_H

namespace CPCT
{
    class CPCT_MSTFTM
    {
    private:
/// input parameters
        float* dataInput;    // input data
        int nDataInput;        // length of input data    
        int nChannels;        // number of channels

/// control parameters, default: tempo = 0, pitch = 0;
        // used to control the tempo of the audio [-1,1], + means faster, - means slower
        float tempo;    
        // used to control the pitch of the audio [-12,12] + means higher,- means lower    
        float pitch;        

/// output parameters
        float* dataOutput;    // output data
        int nDataOutput;    // length of output data

/// processing parameters, initialized by the constructor
        int winlen;        // length of the processing window
        int hoplen;        // hop size; overlap size = winlen - hoplen
        int nit;        // number of iterations in the MSTFTM-based signal estimation
        double *hamwin;    // Hamming window

    private:
/// helper functions
        // sums the elements of the array x
        double sum(double* x, int length);
        // converts a double array in [-1, 1] to a short array in [-32768, 32767]
        void float2short(const double* f, short* s, int numElems);
        // converts a short array in [-32768, 32767] to a double array in [-1, 1]
        void short2float(const short* s, double* f, int numElems);

/// private functions

        // creates a Hamming window
        void hamming(double* win, int length);
        
        // fft function, fft1 is based on class aflibFFT
        void fft1(unsigned NumSamples,
            int InverseTransform,
            const double   *RealIn,
            const double   *ImagIn,
            double   *RealOut,
            double   *ImagOut );

        // resampling function, resample is based on class aflibConverter
        int resample(double factor, 
            int channels, 
            int &inCount, 
            int outCount, 
            short inArray[], 
            short outArray[] );

        // xSTFTM: STFT magnitude of the data being processed
        // x_res: updated in each iteration; holds the estimation result
        // nit: number of iterations
        // win: Hamming window
        // datalength: length of the processed data,
        //   length(xSTFTM) = length(x_res) = length(win) = datalength
        void recon(const double* xSTFTM,
            double* x_res,          
            int nit,                   
            const double* win,    
            int datalength);    

        // function for changing the tempo 
        void tsm();

        // function for changing the pitch
        void pm();
        
        void process();

    public:
/// public function
        CPCT_MSTFTM(void);
        ~CPCT_MSTFTM(void);
        CPCT_MSTFTM(int winlen, int hoplen, int nit);

        // float* data is the processed sound data
        // int& datalength return the processed data length
        void getData(float* data, int& datalength);
        
        // const float* data is the unprocessed data
        // int datalength is the unprocessed data length
        // int nChannels is the number of channels
        void setData(const float* data, int datalength, int nChannels);
        
        // set the float tempo, float pitch
        void setParams(float tempo, float pitch);

    };
    
}

#endif

7. Development of cpct_dll.dll

Exporting the functions of the CPCT library as follows gives us cpct_dll.dll.

API_CPCT HANDLE createCpctByDefault()
{
    CPCT_MSTFTM *cpct = new CPCT_MSTFTM();
    return (HANDLE)cpct;
}

API_CPCT HANDLE createCpctByParams( int winlen, int hoplen, int nit )
{
    CPCT_MSTFTM *cpct = new CPCT_MSTFTM(winlen, hoplen, nit);
    return (HANDLE)cpct;
}

API_CPCT void setData( HANDLE h, const float* data, int datalength, int nChannels )
{
    CPCT_MSTFTM *cpct = (CPCT_MSTFTM*)h;
    cpct->setData(data, datalength, nChannels);
}

API_CPCT void setParams( HANDLE h, float tempo, float pitch )
{
    CPCT_MSTFTM *cpct = (CPCT_MSTFTM*)h;
    cpct->setParams(tempo, pitch);
}

API_CPCT void getData( HANDLE h, float* data, int& datalength )
{
    CPCT_MSTFTM *cpct = (CPCT_MSTFTM*)h;
    cpct->getData(data, datalength);
}

API_CPCT void destroyCpct( HANDLE h )
{
    CPCT_MSTFTM *cpct = (CPCT_MSTFTM*)h;
    delete cpct;
}
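The exported functions above follow the opaque-handle pattern: each C-style function casts the HANDLE back to the C++ object it wraps. The sketch below reproduces the pattern in a self-contained form. Processor is a hypothetical toy class standing in for CPCT_MSTFTM. A std::unique_ptr deleter is added so that C++ callers can have the destroy function run automatically. The extern "C" linkage is an assumption, since the definition of API_CPCT is not shown; it keeps the exported names undecorated, which a by-name DllImport lookup needs.

```cpp
#include <memory>

// Hypothetical toy class standing in for CPCT_MSTFTM.
class Processor {
public:
    void setParams(float tempo, float pitch) { tempo_ = tempo; pitch_ = pitch; }
    float tempo() const { return tempo_; }
    float pitch() const { return pitch_; }
private:
    float tempo_ = 0.0f;
    float pitch_ = 0.0f;
};

typedef void* HANDLE;

// C-style API in the style of cpct_dll.dll: the handle is an opaque pointer.
extern "C" HANDLE createProcessor() { return new Processor(); }

extern "C" void setProcessorParams(HANDLE h, float tempo, float pitch)
{
    static_cast<Processor*>(h)->setParams(tempo, pitch);
}

extern "C" void destroyProcessor(HANDLE h) { delete static_cast<Processor*>(h); }

// Lets std::unique_ptr call destroyProcessor automatically, so the handle
// is released even if an exception is thrown.
struct ProcessorDeleter {
    void operator()(void* h) const { destroyProcessor(h); }
};
using ProcessorPtr = std::unique_ptr<void, ProcessorDeleter>;
```

Usage: ProcessorPtr p(createProcessor()); then pass p.get() wherever a HANDLE is expected.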

8. CpctDotNetDll: A C# Wrapper for cpct_dll.dll

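.NET development is becoming increasingly popular, and GUI development in C# is very easy. In this section I wrap cpct_dll.dll in C#, which I hope will make the CPCT library convenient for .NET developers.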

cpct_dll.dll exposes six functions: createCpctByDefault(), createCpctByParams(...), setData(...), setParams(...), getData(...) and destroyCpct(...). The code below wraps these six APIs.

using System;
using System.Runtime.InteropServices;

public class CpctDotNet
{
#region Members
    private IntPtr m_handle = IntPtr.Zero;
#endregion

#region Native C++ API Methods
    private const string DllName = "cpct_dll.dll";

    [DllImport(DllName)]
    private static extern IntPtr createCpctByDefault();

    [DllImport(DllName)]
    private static extern IntPtr createCpctByParams(int winlen, int hoplen, int nit);

    [DllImport(DllName)]
    private static extern void setData(IntPtr h, [MarshalAs(UnmanagedType.LPArray)] 
            float[] data, int datalength, int nChannels);

    [DllImport(DllName)]
    private static extern void setParams(IntPtr h, float tempo, float pitch);

    [DllImport(DllName)]
    private static extern void getData(IntPtr h, 
      [MarshalAs(UnmanagedType.LPArray)] float[] data, out int datalength);

    [DllImport(DllName)]
    private static extern void destroyCpct(IntPtr h);

#endregion

#region C# Wrapper Methods

    public void CreateCpctByDefault()
    {
        m_handle = createCpctByDefault();
    }

    public void CreateCpctByParams(int winlen, int hoplen, int nit)
    {
        m_handle = createCpctByParams(winlen, hoplen, nit);
    }

    public void SetData(float[] data, int datalength, int nChannels)
    {
        setData(m_handle, data, datalength, nChannels);
    }

    public void SetParams(float tempo, float pitch)
    {
        setParams(m_handle, tempo, pitch);
    }

    public void GetData(float[] data, out int datalength)
    {
        getData(m_handle, data, out datalength);
    }

    public void DestroyCpct()
    {
        if (m_handle != IntPtr.Zero)
        {
            destroyCpct(m_handle);
            m_handle = IntPtr.Zero;   // prevents a double free if called twice
        }
    }
#endregion
}

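Note how getData(...) is wrapped: it shows how to import a function that takes both pointer and reference parameters. The float* parameter is marshaled as a float[] array (UnmanagedType.LPArray), and the int& reference parameter becomes an out int.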

9. CPCT Demo


CpctDemo.zip contains a simple CPCT demo. Click the "Start" button and speak into the microphone, then change the pitch and try again.

This simple demo shows how to build a CPCT application in C#:

CpctDotNet cpct = new CpctDotNet();
...
cpct.CreateCpctByParams(..., ..., ...); // create cpct
cpct.SetData(..., ..., ...); // set the input data
cpct.SetParams(..., ...); // set processing parameters
cpct.GetData(..., ...); // get processed data
...
// use the processed data
...
cpct.DestroyCpct(); // destroy cpct

10. Conclusion

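I hope this article helps you understand how to change the pitch and tempo of a sound and makes sound signal processing fun for you. The CPCT library is used in DAEAPP, an application you may also find interesting. It can generate digital audio effects such as time stretching (changing the tempo), pitch modification (changing the pitch), stereo effects (3D audio), echo effects and more.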

History

August 26, 2011: First draft: how to use cpct_dll.dll.
August 28, 2011: Added details on the development of cpct_dll.dll.
September 19, 2011: Added a C# wrapper for cpct_dll.dll and a GUI demo of CPCT.