How to change the pitch and tempo of a sound

Calinyara

Aug 25, 2011

LGPL3

This article shows how to change the pitch and tempo of a sound.

1. Introduction

Have you ever found a recording too fast and wanted to slow it down to make it clearer? Or perhaps you just want to play a joke on your boyfriend by making his voice sound as if it were spoken by a woman. This article shows how to change the pitch and tempo of a sound to achieve these effects.

The article is organized as follows. Sections 2-4 introduce how to use cpct_dll.dll. Sections 5-7 illustrate how to develop cpct_dll.dll. Section 8 shows how to use C# to wrap cpct_dll.dll into CpctDotNet.dll. Section 9 gives a simple demo. Conclusions are drawn in Section 10.

2. Interact with a .wav file

Download CPCT for Windows. You will find WavManipulateDll.dll in the package. It is used to interact with a .wav file. Some important APIs we will use are as follows:

typedef void* HANDLE;

/// API for wav reading
// get wav file by name.
API_AudioManipulate HANDLE getWavInFileByName(const char* filename);

// close reading wav file.
API_AudioManipulate void destroyWavInFile(HANDLE h);

// get sample rate
API_AudioManipulate uint getSampleRate(HANDLE h);

// get number of bits per sample, i.e. 8 or 16.
API_AudioManipulate uint getNumBits(HANDLE h);

// get number of audio channels in the file (1=mono, 2=stereo)
API_AudioManipulate uint getNumChannels(HANDLE h);

// Reads audio samples from the WAV file to floating point format, converting 
// sample values to range [-1,1]. Reads given number of elements from the file
// or if end-of-file reached, as many elements as are left in the file.
// return Number of elements read from the file.
API_AudioManipulate int readFloat(HANDLE h, float *buffer, int maxElems);

// Check end-of-file.
// return Nonzero if end-of-file reached.
API_AudioManipulate int isFileEnd(HANDLE h);

/// API for wav writing
// save wav file by name
API_AudioManipulate HANDLE saveWavOutFileByName(
    const char* fileName,int sampleRate,int bits,int channels);

// close writing wav file 
API_AudioManipulate void destroyWavOutFile(HANDLE h);

// Write data to WAV file in floating point format, saturating sample values to range
// [-1,1]. Throws a 'runtime_error' exception if writing to file fails.
API_AudioManipulate void writeFloat(HANDLE h, const float* buffer, int numElems);
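
The comments above say that readFloat converts samples to the range [-1,1] and that writeFloat saturates values to that range. As a rough illustration of what this could mean for 16-bit PCM data, here is a minimal sketch. The helper names pcm16ToFloat and floatToPcm16 and the scale factors are assumptions made for this example; they are not the code inside WavManipulateDll.dll.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: map 16-bit PCM samples to floats in [-1, 1).
void pcm16ToFloat(const std::int16_t* in, float* out, int numElems)
{
    assert(numElems >= 0);
    for (int i = 0; i < numElems; ++i)
        out[i] = static_cast<float>(in[i]) / 32768.0f;
}

// Hypothetical helper: saturate floats to [-1, 1], then scale to 16-bit PCM.
void floatToPcm16(const float* in, std::int16_t* out, int numElems)
{
    assert(numElems >= 0);
    for (int i = 0; i < numElems; ++i)
    {
        const float clamped = std::max(-1.0f, std::min(1.0f, in[i]));
        // Scale by 32767 so that +1.0 maps to the largest positive sample.
        out[i] = static_cast<std::int16_t>(clamped * 32767.0f);
    }
}
```

Saturating before scaling means an out-of-range value such as 2.0 is clipped to full scale instead of wrapping around to a negative sample.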

3. Change the pitch and tempo of a sound

Download CPCT for Windows and find cpct_dll.dll in the package. This is the core DLL used to change the pitch and tempo of a sound. It exposes the following APIs:

typedef void* HANDLE;

// create cpct-mstftm by default parameters
API_CPCT HANDLE createCpctByDefault();

// create cpct-mstftm by specific parameters
API_CPCT HANDLE createCpctByParams(int winlen, int hoplen, int nit);

// float* data is the input data, datalength
// is the length of data, nChannels is the number of channels
API_CPCT void setData(HANDLE h, const float* data, int datalength, int nChannels);

// set the tempo and pitch
API_CPCT void setParams(HANDLE h, float tempo, float pitch);

// get the output data; on return, datalength holds
// the length of the output data
API_CPCT void getData(HANDLE h, float* data, int& datalength);

// destroy the cpct-mstftm instance
API_CPCT void destroyCpct(HANDLE h);

Next, I will give an example to show how to use cpct_dll.dll.

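#include <cstdio>      // printf, fprintf
#include <stdexcept>   // std::runtime_error
// (project headers declaring the DLL APIs and ParseParams are omitted here)

using std::runtime_error;

#define DATA_LENGTH 4096
#define BUFFER_SIZE (DATA_LENGTH * 3)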

static void openFile(void** infile, void** outfile, ParseParams *param)
{
    *infile = getWavInFileByName(param->getInputFile());
    int samplerate = (int)getSampleRate(*infile);
    int bits = (int)getNumBits(*infile);
    int channels = (int)getNumChannels(*infile);
    *outfile = saveWavOutFileByName(param->getOutputFile(), 
               samplerate, bits, channels);

    printf("openFile done!\n");
}

static void process(void* infile, void* outfile, 
            void* cpct, ParseParams *param)
{
    float sampleBuffer[BUFFER_SIZE];
    int nChannels;

    nChannels = (int)getNumChannels(infile);

    while (isFileEnd(infile)==0)
    {
        int num;
        int datalength = 0;

        // Read up to DATA_LENGTH elements; the last read may return fewer.
        num = readFloat(infile, sampleBuffer, DATA_LENGTH);
        if (num <= 0)
            break;

        // Pass the number of elements actually read, not DATA_LENGTH,
        // so stale data left in the buffer is never processed.
        setData(cpct, sampleBuffer, num, nChannels);
        setParams(cpct, param->getTempo(), param->getPitch());
        getData(cpct, sampleBuffer, datalength);

        writeFloat(outfile , sampleBuffer, datalength);
    }
    destroyWavInFile(infile);
    destroyWavOutFile(outfile);

    printf("process done!\n");
}

int main(int numparams, char* params[])
{
    void* infile;
    void* outfile;
    void* cpct = createCpctByParams(512, 256, 5);

    try
    {
        // Keep the parameters on the stack so they are released automatically.
        ParseParams parameter(numparams, params);
        openFile(&infile, &outfile, &parameter);
        process(infile, outfile, cpct, &parameter);
        destroyCpct(cpct);
    }
    catch (const runtime_error &e) 
    {
        fprintf(stderr, "%s\n", e.what());
        return -1;
    }
    printf("Done!!!\n");

    return 0;
}

DATA_LENGTH is the number of elements read from the .wav file in each pass. Building the project produces cpct.exe. Copy cpct.exe, cpct_dll.dll, WavManipulateDll.dll, and your .wav file into the same folder, then run commands such as the following on the console:

cpct input.wav output.wav -t:0.5 -> make the tempo of the sound faster by 0.5
cpct input.wav output.wav -p:5 -> make the pitch of the sound higher by 5
cpct input.wav output.wav -t:0.5 -p:5 -> change both tempo and pitch

-t:0.5 means a faster tempo and -t:-0.5 means a slower tempo; -t is limited to [-1,1].

-p:5 means a higher pitch and -p:-5 means a lower pitch; -p is limited to [-12,12].
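
The ParseParams class that handles these switches is not shown in this article. As a hedged sketch of how "-t:" and "-p:" switches could be parsed and limited to the ranges above, consider the following. The CpctOptions struct, the parseSwitch function, and the choice to clamp out-of-range values (rather than reject them) are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical settings holder; the real ParseParams class is not shown here.
struct CpctOptions
{
    float tempo = 0.0f;  // 0 is the default tempo setting
    float pitch = 0.0f;  // 0 is the default pitch setting
};

// Clamp v to [lo, hi].
static float clampTo(float v, float lo, float hi)
{
    assert(lo <= hi);
    return std::max(lo, std::min(hi, v));
}

// Parse one switch such as "-t:0.5" or "-p:-5" into opts.
// Returns false if the argument is not a recognized switch.
bool parseSwitch(const std::string& arg, CpctOptions& opts)
{
    if (arg.size() < 4 || arg[0] != '-' || arg[2] != ':')
        return false;
    const std::string text = arg.substr(3);
    char* end = nullptr;
    const float value = std::strtof(text.c_str(), &end);
    if (end == text.c_str() || *end != '\0')
        return false;  // the text after ':' is not a number
    switch (arg[1])
    {
    case 't': opts.tempo = clampTo(value, -1.0f, 1.0f);   return true;
    case 'p': opts.pitch = clampTo(value, -12.0f, 12.0f); return true;
    default:  return false;
    }
}
```

With this sketch, "-p:20" would be accepted and stored as 12, while a plain file name such as "input.wav" is reported as not being a switch, so the caller can treat it as an input or output path.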

4. CPCT for Linux

The process on Linux is similar to that on Windows: libCpctDll.so replaces cpct_dll.dll, and libWavManipulateDll.so replaces WavManipulateDll.dll.

5. Background knowledge on changing the pitch and tempo of a sound

In this part, I introduce the background theory behind cpct_dll.dll and how to develop it in C++. If you want to change a discrete signal sequence x(n), n = 1,2,..., you may simply modify each x(n) in the time domain. In many cases, however, we first need to transform the original signal into the frequency domain through a Short Time Fourier Transform (STFT), then modify the spectrogram of the signal in the frequency domain, and finally transform the modified signal back to the time domain.

Short time analysis is an important technique in sound processing. Sound signals such as human speech vary over time. However, over a very short span, on the order of 10-20 milliseconds, the parameters of the sound signal can be considered constant. So we usually cut the sound signal into many short frames for analysis and processing. Applying the STFT to the original signal x(n) yields a parameter matrix, usually called a spectrogram, which can be regarded as a representation of x(n) in the frequency domain. The horizontal axis denotes elapsed time, and the vertical axis holds the frame of frequency parameters at the corresponding instant. Daniel W. Griffin and Jae S. Lim describe how to estimate a signal from a modified Short-Time Fourier Transform in their paper "Signal Estimation from Modified Short-Time Fourier Transform" (IEEE Transactions on Acoustics, Speech, and Signal Processing).
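
To make the idea of a spectrogram concrete, here is a minimal sketch that cuts a signal into overlapping frames, applies a Hamming window, and computes the magnitude of each frame's spectrum. The function names hammingWindow and stftMagnitude are assumptions for this example, and a direct DFT is used for readability where the library itself uses an FFT routine.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hamming window of the given length (length >= 2).
std::vector<double> hammingWindow(int length)
{
    assert(length >= 2);
    const double pi = std::acos(-1.0);
    std::vector<double> w(length);
    for (int n = 0; n < length; ++n)
        w[n] = 0.54 - 0.46 * std::cos(2.0 * pi * n / (length - 1));
    return w;
}

// Magnitude spectrogram: one row per frame, one column per frequency bin.
// Uses a direct O(L^2) DFT for clarity; real code would use an FFT.
std::vector<std::vector<double>> stftMagnitude(const std::vector<double>& x,
                                               int winlen, int hoplen)
{
    assert(hoplen > 0);
    const double pi = std::acos(-1.0);
    const std::vector<double> win = hammingWindow(winlen);
    std::vector<std::vector<double>> spec;
    for (std::size_t start = 0; start + winlen <= x.size(); start += hoplen)
    {
        std::vector<double> mag(winlen / 2 + 1);
        for (int k = 0; k <= winlen / 2; ++k)
        {
            std::complex<double> acc(0.0, 0.0);
            for (int n = 0; n < winlen; ++n)
                acc += x[start + n] * win[n]
                     * std::polar(1.0, -2.0 * pi * k * n / winlen);
            mag[k] = std::abs(acc);
        }
        spec.push_back(mag);
    }
    return spec;
}
```

Each row of the result corresponds to one time frame, and each entry in a row is the strength of one frequency bin in that frame. Only bins 0 to winlen/2 are kept because the spectrum of a real signal is symmetric.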

5.1. How to change the tempo of a sound

Fig. 1 shows how to change the tempo of a sound. L is the length of the analysis window, La is the analysis hop length, and Ls is the synthesis hop length. If La > Ls, a faster sound is produced; if La < Ls, a slower sound is produced. In Fig. 1, the sound is shorter once the tempo processing is finished, so a faster sound is produced.

Fig. 1 The process of changing the tempo of a sound
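
The relationship between La, Ls, and the output length can be sketched with plain overlap-add. This is a deliberately simplified illustration: the CPCT library reconstructs the signal with an iterative estimation from the modified STFT magnitude, whereas the sketch below just windows each frame and adds it at its new position. The function name olaTimeScale and the use of a Hann window are assumptions for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Time-scale x by plain overlap-add: frames are read every La samples
// and written every Ls samples. La > Ls shortens (speeds up) the signal.
std::vector<double> olaTimeScale(const std::vector<double>& x,
                                 int L, int La, int Ls)
{
    assert(L > 1 && La > 0 && Ls > 0);
    if (x.size() < static_cast<std::size_t>(L))
        return std::vector<double>();
    const double pi = std::acos(-1.0);
    const std::size_t frames = (x.size() - L) / La + 1;
    const std::size_t outLen = (frames - 1) * Ls + L;
    std::vector<double> y(outLen, 0.0), norm(outLen, 0.0);
    for (std::size_t f = 0; f < frames; ++f)
    {
        for (int n = 0; n < L; ++n)
        {
            // Hann window value for position n within the frame
            const double w = 0.5 - 0.5 * std::cos(2.0 * pi * n / (L - 1));
            y[f * Ls + n] += w * x[f * La + n];
            norm[f * Ls + n] += w;
        }
    }
    // Divide by the summed window so overlapping regions keep their level.
    for (std::size_t i = 0; i < outLen; ++i)
        if (norm[i] > 1e-12)
            y[i] /= norm[i];
    return y;
}
```

For a 4096-sample input with L = 512 and La = 256, choosing Ls = 128 gives 2304 output samples (faster), while Ls = 512 gives 7680 output samples (slower). Plain overlap-add ignores phase continuity between frames, which is one reason a more careful reconstruction is used in practice.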

5.2. How to change the pitch of a sound

Fig. 2 shows how to change the pitch of a sound. The analysis hop length La is equal to the synthesis hop length, but the analysis window length L differs from the synthesis window length: each block of L samples is resampled to L' samples. If L > L', a higher-pitched sound is produced; if L < L', a lower-pitched sound is produced. In Fig. 2, the length of the sound is unchanged once the pitch processing is finished, and a higher-pitched sound is produced.

Fig. 2 The process of changing the pitch of a sound
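
The resampling step from L samples to L' samples can be illustrated with simple linear interpolation. This is only a sketch under that assumption: the library relies on the aflibConverter resampler, and the function name resampleLinear is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Resample a frame to newLen samples by linear interpolation.
std::vector<double> resampleLinear(const std::vector<double>& frame, int newLen)
{
    assert(frame.size() >= 2 && newLen >= 2);
    std::vector<double> out(newLen);
    // Distance between output samples, measured in input samples.
    const double step = static_cast<double>(frame.size() - 1) / (newLen - 1);
    for (int i = 0; i < newLen; ++i)
    {
        const double pos = i * step;
        std::size_t i0 = static_cast<std::size_t>(pos);
        if (i0 >= frame.size() - 1)
            i0 = frame.size() - 2;   // keep i0 + 1 inside the frame
        const double frac = pos - static_cast<double>(i0);
        out[i] = (1.0 - frac) * frame[i0] + frac * frame[i0 + 1];
    }
    return out;
}
```

Squeezing a frame into fewer samples (L > L') compresses its waveform, so when the result is played at the same sample rate each cycle takes less time and the pitch rises. Stretching it into more samples (L < L') has the opposite effect.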

6. Design of the CPCT library

Everything is defined in the namespace CPCT. The most important functions are void tsm() and void pm(), whose principles are illustrated above. The fast Fourier transform function void fft1(...) and the resampling function int resample(...) are based on the classes aflibFFT and aflibConverter, respectively. Both classes come from the free Open Source Audio Library Project (OSALP).

#ifndef CPCT_MSTFTM_H
#define CPCT_MSTFTM_H

namespace CPCT
{
    class CPCT_MSTFTM
    {
    private:
/// input parameters
        float* dataInput;    // input data
        int nDataInput;        // length of input data    
        int nChannels;        // number of channels

/// control parameters, default: tempo = 0, pitch = 0;
        // used to control the tempo of the audio [-1,1], + means faster, - means slower
        float tempo;    
        // used to control the pitch of the audio [-12,12] + means higher,- means lower    
        float pitch;        

/// output parameters
        float* dataOutput;    // output data
        int nDataOutput;    // length of output data

/// processing parameters, initialized by constructor function
        int winlen;        // length of processing window
        int hoplen;        // length of hop size, overlap size = winlen - hoplen;
        int nit;        // number of iterations in mstftm based signal estimation
        double *hamwin;    // hamming window

    private:
/// helper functions
        // sum the elements of the array x
        double sum(double* x, int length);
        // convert a floating-point array in [-1,1] to a short array in [-32768,32767]
        void float2short(const double* f, short* s, int numElems);
        // convert a short array in [-32768,32767] to a floating-point array in [-1,1]
        void short2float(const short* s, double* f, int numElems);

/// private function

        // function for creating hamming window
        void hamming(double* win, int length);
        
        // fft function, fft1 is based on class aflibFFT
        void fft1(unsigned NumSamples,
            int InverseTransform,
            const double   *RealIn,
            const double   *ImagIn,
            double   *RealOut,
            double   *ImagOut );

        // resampling function, resample is based on class aflibConverter
        int resample(double factor, 
            int channels, 
            int &inCount, 
            int outCount, 
            short inArray[], 
            short outArray[] );

        // xSTFTM: magnitude of the data being processed
        // x_res: estimation result, updated in each iteration
        // nit: number of iterations
        // win: hamming window
        // datalength: length of the data being processed,
        //   length(xSTFTM) = length(x_res) = length(win) = datalength
        void recon(const double* xSTFTM,
            double* x_res,          
            int nit,                   
            const double* win,    
            int datalength);    

        // function for changing the tempo 
        void tsm();

        // function for changing the pitch
        void pm();
        
        void process();

    public:
/// public function
        CPCT_MSTFTM(void);
        ~CPCT_MSTFTM(void);
        CPCT_MSTFTM(int winlen, int hoplen, int nit);

        // float* data is the processed sound data
        // int& datalength return the processed data length
        void getData(float* data, int& datalength);
        
        // const float* data is the unprocessed data
        // int datalength is the unprocessed data length
        // int nChannels is the number of channels
        void setData(const float* data, int datalength, int nChannels);
        
        // set the float tempo, float pitch
        void setParams(float tempo, float pitch);

    };
    
}

#endif

7. Development of cpct_dll.dll

Export the functions of the CPCT library as follows to build cpct_dll.dll.

API_CPCT HANDLE createCpctByDefault()
{
    CPCT_MSTFTM *cpct = new CPCT_MSTFTM();
    return (HANDLE)cpct;
}

API_CPCT HANDLE createCpctByParams( int winlen, int hoplen, int nit )
{
    CPCT_MSTFTM *cpct = new CPCT_MSTFTM(winlen, hoplen, nit);
    return (HANDLE)cpct;
}

API_CPCT void setData( HANDLE h, const float* data, int datalength, int nChannels )
{
    CPCT_MSTFTM *cpct = (CPCT_MSTFTM*)h;
    cpct->setData(data, datalength, nChannels);
}

API_CPCT void setParams( HANDLE h, float tempo, float pitch )
{
    CPCT_MSTFTM *cpct = (CPCT_MSTFTM*)h;
    cpct->setParams(tempo, pitch);
}

API_CPCT void getData( HANDLE h, float* data, int& datalength )
{
    CPCT_MSTFTM *cpct = (CPCT_MSTFTM*)h;
    cpct->getData(data, datalength);
}

API_CPCT void destroyCpct( HANDLE h )
{
    CPCT_MSTFTM *cpct = (CPCT_MSTFTM*)h;
    delete cpct;
}
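
The exports above follow the opaque handle pattern: the caller receives a void* and never sees the C++ class behind it. The following self-contained sketch shows the same pattern on a toy class. The Gain class and the createGain, applyGain, and destroyGain functions are hypothetical stand-ins for CPCT_MSTFTM and its exports, and the use of extern "C" is an assumption here, since the definition of the API_CPCT macro is not shown in this article.

```cpp
#include <cassert>

typedef void* HANDLE;

// Hypothetical class standing in for CPCT_MSTFTM.
class Gain
{
public:
    explicit Gain(float factor) : factor_(factor) {}
    float apply(float sample) const { return sample * factor_; }
private:
    float factor_;
};

// C-style exports: callers only ever see an opaque HANDLE.
extern "C" HANDLE createGain(float factor)
{
    return static_cast<HANDLE>(new Gain(factor));
}

extern "C" float applyGain(HANDLE h, float sample)
{
    assert(h != nullptr);
    return static_cast<Gain*>(h)->apply(sample);
}

extern "C" void destroyGain(HANDLE h)
{
    delete static_cast<Gain*>(h);  // deleting a null pointer is a no-op
}
```

Because only plain functions and a void* cross the DLL boundary, the same library can be called from C, from C++ built with another compiler, or from managed code through P/Invoke. The price is that every handle obtained from a create function must be passed to the matching destroy function exactly once.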

8. CpctDotNetDll: A C# wrapper for cpct_dll.dll

.NET development is increasingly popular, and GUI development is very easy in C#. In this part, I use C# to wrap cpct_dll.dll, which I hope makes the CPCT library convenient for .NET developers.

cpct_dll.dll exposes six functions: createCpctByDefault(), createCpctByParams(...), setData(...), setParams(...), getData(...), and destroyCpct(...). The following code wraps these six APIs:

using System;                          // IntPtr
using System.Runtime.InteropServices;  // DllImport, MarshalAs

public class CpctDotNet
{
#region Members
    private IntPtr m_handle = IntPtr.Zero;
#endregion

#region Native C++ API Methods
    private const string DllName = "cpct_dll.dll";

    [DllImport(DllName)]
    private static extern IntPtr createCpctByDefault();

    [DllImport(DllName)]
    private static extern IntPtr createCpctByParams(int winlen, int hoplen, int nit);

    [DllImport(DllName)]
    private static extern void setData(IntPtr h, [MarshalAs(UnmanagedType.LPArray)] 
            float[] data, int datalength, int nChannels);

    [DllImport(DllName)]
    private static extern void setParams(IntPtr h, float tempo, float pitch);

    [DllImport(DllName)]
    private static extern void getData(IntPtr h, 
      [MarshalAs(UnmanagedType.LPArray)] float[] data, out int datalength);

    [DllImport(DllName)]
    private static extern void destroyCpct(IntPtr h);

#endregion

#region C# Wrapper Methods

    public void CreateCpctByDefault()
    {
        m_handle = createCpctByDefault();
    }

    public void CreateCpctByParams(int winlen, int hoplen, int nit)
    {
        m_handle = createCpctByParams(winlen, hoplen, nit);
    }

    public void SetData(float[] data, int datalength, int nChannels)
    {
        setData(m_handle, data, datalength, nChannels);
    }

    public void SetParams(float tempo, float pitch)
    {
        setParams(m_handle, tempo, pitch);
    }

    public void GetData(float[] data, out int datalength)
    {
        getData(m_handle, data, out datalength);
    }

    public void DestroyCpct()
    {
        destroyCpct(m_handle);
    }
#endregion
}

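Please pay attention to the wrapping of getData(...): the C++ float* parameter is marshaled as a float[] array, and the C++ reference parameter int& datalength is declared as out int datalength. The above code thus shows how to import functions with pointer and reference parameters.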

9. A demo for CPCT

[Screenshots of the CPCT demo: CpctDemo1.png, CpctDemo2.png]

CpctDemo.zip contains a simple CPCT demo. Click the "Start" button and speak into your microphone. Then change the pitch and try again.

This simple demo shows you how to build a CPCT application using C#.

CpctDotNet cpct = new CpctDotNet();
...
cpct.CreateCpctByParams(..., ..., ...); // create cpct
cpct.SetData(..., ..., ...); // set the input data
cpct.SetParams(..., ...); // set processing parameters
cpct.GetData(..., ...); // get processed data
...
// use the processed data
...
cpct.DestroyCpct(); // destroy cpct

10. Conclusion

I hope this article helps you understand how to change the pitch and tempo of a sound and brings you some fun in processing sound signals. The CPCT library is used in DAEAPP, an app you may also be interested in. It can generate digital audio effects such as time scale modification (change tempo), pitch modification (change pitch), a stereo effect (3D audio effect), an echo effect, and so on.

History

26th August, 2011: Initial version of the article: how to use cpct_dll.dll.
28th August, 2011: Added development details of cpct_dll.dll.
19th September, 2011: Added C# wrapper for cpct_dll.dll and a GUI demo for CPCT.