How to change the pitch and tempo of a sound
This article shows how to change the pitch and tempo of a sound.
- Download source code (Windows) - 1.15 MB
- Download source code (Linux) - 1.34 MB
- Download library - 2.98 MB
- Download .NET DLL - 32.2 KB
- Download demo - 117 KB
1. Introduction
Have you ever found a recording too fast to follow and wanted to slow it down to make it clearer? Or perhaps you just want to play a joke on your boyfriend by making his voice sound as if it were a woman's. This article shows how to change the pitch and tempo of a sound to achieve these effects.
The article is organized as follows. Sections 2-4 introduce how to use cpct_dll.dll. Sections 5-7 illustrate how to develop cpct_dll.dll. Section 8 shows how to use C# to wrap cpct_dll.dll into CpctDotNet.dll. Section 9 gives a simple demo. Conclusions are drawn in Section 10.
2. Interact with a .wav file
Download CPCT for Windows. You will find WavManipulateDll.dll in the package. It is used to interact with a .wav file. Some important APIs we will use are as follows:
typedef void* HANDLE;
/// API for wav reading
// get wav file by name.
API_AudioManipulate HANDLE getWavInFileByName(const char* filename);
// close reading wav file.
API_AudioManipulate void destroyWavInFile(HANDLE h);
// get sample rate
API_AudioManipulate uint getSampleRate(HANDLE h);
// get number of bits per sample, i.e. 8 or 16.
API_AudioManipulate uint getNumBits(HANDLE h);
// get number of audio channels in the file (1=mono, 2=stereo)
API_AudioManipulate uint getNumChannels(HANDLE h);
// Reads audio samples from the WAV file to floating point format, converting
// sample values to range [-1,1]. Reads given number of elements from the file
// or if end-of-file reached, as many elements as are left in the file.
// return Number of elements read from the file.
API_AudioManipulate int readFloat(HANDLE h, float *buffer, int maxElems);
// Check end-of-file.
// return Nonzero if end-of-file reached.
API_AudioManipulate int isFileEnd(HANDLE h);
/// API for wav writing
// save wav file by name
API_AudioManipulate HANDLE saveWavOutFileByName(
const char* fileName,int sampleRate,int bits,int channels);
// close writing wav file
API_AudioManipulate void destroyWavOutFile(HANDLE h);
// Write data to WAV file in floating point format, saturating sample values to range
// [-1,1]. Throws a 'runtime_error' exception if writing to file fails.
API_AudioManipulate void writeFloat(HANDLE h, const float* buffer, int numElems);
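The comments on readFloat and writeFloat mention converting samples to the range [-1,1] and saturating them on the way back. As a rough illustration of what that could mean for 16-bit data, here is a minimal sketch. The helper names pcm16ToFloat and floatToPcm16 and the scale factor 32768 are assumptions made for this example; they are not part of WavManipulateDll.dll.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: map a 16-bit PCM sample to a float in [-1, 1).
float pcm16ToFloat(std::int16_t s)
{
    return static_cast<float>(s) / 32768.0f;
}

// Hypothetical helper: saturate a float to [-1, 1], then scale it back
// to a 16-bit PCM sample.
std::int16_t floatToPcm16(float f)
{
    assert(!std::isnan(f));          // NaN has no meaningful PCM value
    if (f > 1.0f)  f = 1.0f;         // saturate instead of wrapping around
    if (f < -1.0f) f = -1.0f;
    long v = std::lround(f * 32768.0f);
    if (v > 32767) v = 32767;        // +1.0 would not fit into int16_t
    return static_cast<std::int16_t>(v);
}
```

Saturating before the conversion matters because an out-of-range value would otherwise wrap around and produce a loud click.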
3. Change the pitch and tempo of a sound
Download CPCT for Windows. Find cpct_dll.dll in the package. It exports the following APIs:
typedef void* HANDLE;
// create cpct-mstftm by default parameters
API_CPCT HANDLE createCpctByDefault();
// create cpct-mstftm by specific parameters
API_CPCT HANDLE createCpctByParams(int winlen, int hoplen, int nit);
// float* data is the input data, datalength
// is the length of data, nChannels is the number of channels
API_CPCT void setData(HANDLE h, const float* data, int datalength, int nChannels);
// set the tempo and pitch
API_CPCT void setParams(HANDLE h, float tempo, float pitch);
// get the output data; datalength receives the
// length of the output data
API_CPCT void getData(HANDLE h, float* data, int& datalength);
// destroy the cpct-mstftm instance
API_CPCT void destroyCpct(HANDLE h);
Next, I will give an example to show you how to use cpct_dll.dll.
#define DATA_LENGTH 4096
#define BUFFER_SIZE (DATA_LENGTH * 3)

// Open the input file and create an output file with the same format.
static void openFile(void** infile, void** outfile, ParseParams *param)
{
    *infile = getWavInFileByName(param->getInputFile());
    int samplerate = (int)getSampleRate(*infile);
    int bits = (int)getNumBits(*infile);
    int channels = (int)getNumChannels(*infile);
    *outfile = saveWavOutFileByName(param->getOutputFile(),
                                    samplerate, bits, channels);
    printf("openFile done!\n");
}

// Read the input block by block, process each block, and write the result.
static void process(void* infile, void* outfile,
                    void* cpct, ParseParams *param)
{
    float sampleBuffer[BUFFER_SIZE];
    int nChannels = (int)getNumChannels(infile);
    while (isFileEnd(infile) == 0)
    {
        int datalength = 0;
        int num = readFloat(infile, sampleBuffer, DATA_LENGTH);
        if (num <= 0)
            break;
        // The last read may return fewer than DATA_LENGTH elements; zero the
        // rest so that stale samples from the previous block are not reused.
        for (int i = num; i < DATA_LENGTH; ++i)
            sampleBuffer[i] = 0.0f;
        setData(cpct, sampleBuffer, DATA_LENGTH, nChannels);
        setParams(cpct, param->getTempo(), param->getPitch());
        getData(cpct, sampleBuffer, datalength);
        writeFloat(outfile, sampleBuffer, datalength);
    }
    destroyWavInFile(infile);
    destroyWavOutFile(outfile);
    printf("process done!\n");
}

int main(int numparams, char* params[])
{
    void* infile;
    void* outfile;
    void* cpct = createCpctByParams(512, 256, 5);
    try
    {
        // A stack object avoids leaking the parser.
        ParseParams parameter(numparams, params);
        openFile(&infile, &outfile, &parameter);
        process(infile, outfile, cpct, &parameter);
        destroyCpct(cpct);
    }
    catch (const std::runtime_error &e)
    {
        fprintf(stderr, "%s\n", e.what());
        return -1;
    }
    printf("Done!!!\n");
    return 0;
}
DATA_LENGTH is the number of elements read from the .wav file in each call to readFloat. Once you have built the project, you will get cpct.exe. Copy cpct.exe, cpct_dll.dll, WavManipulateDll.dll, and your .wav file into the same folder. Then run commands on the console as follows:
- cpct input.wav output.wav -t:0.5 -> make the tempo faster by 0.5
- cpct input.wav output.wav -p:5 -> make the pitch higher by 5
- cpct input.wav output.wav -t:0.5 -p:5 -> change both tempo and pitch
A positive -t value (e.g. -t:0.5) means a faster tempo and a negative one (e.g. -t:-0.5) a slower tempo; -t is limited to [-1,1].
A positive -p value (e.g. -p:5) means a higher pitch and a negative one (e.g. -p:-5) a lower pitch; -p is limited to [-12,12].
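The ParseParams class used in the example is not listed in this article. To make the switch format concrete, here is a minimal sketch of how a single switch such as -t:0.5 or -p:-5 could be parsed. The names CpctOptions, clampTo and parseSwitch are hypothetical, and clamping out-of-range values (rather than rejecting them) is an assumption; the real tool may behave differently.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical container for the two switches described above.
struct CpctOptions
{
    float tempo = 0.0f;  // -t, expected in [-1, 1]
    float pitch = 0.0f;  // -p, expected in [-12, 12]
};

// Clamp v to the closed interval [lo, hi].
static float clampTo(float v, float lo, float hi)
{
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

// Parse one switch of the form "-t:<number>" or "-p:<number>" into opts.
// Returns false if the argument is not a recognised switch.
bool parseSwitch(const std::string& arg, CpctOptions& opts)
{
    if (arg.size() < 4 || arg[0] != '-' || arg[2] != ':')
        return false;
    const char* begin = arg.c_str() + 3;
    char* end = nullptr;
    const float value = std::strtof(begin, &end);
    if (end == begin || *end != '\0')
        return false;                 // no number, or trailing characters
    switch (arg[1])
    {
    case 't': opts.tempo = clampTo(value, -1.0f, 1.0f);   return true;
    case 'p': opts.pitch = clampTo(value, -12.0f, 12.0f); return true;
    default:  return false;
    }
}
```

Arguments for which parseSwitch returns false, such as input.wav, would be treated as file names by the caller.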
4. CPCT for Linux
The process on Linux is similar to that on Windows: libCpctDll.so takes the place of cpct_dll.dll, and libWavManipulateDll.so takes the place of WavManipulateDll.dll.
5. Background knowledge on changing the pitch and tempo of a sound
In this part, I will introduce the background theory of cpct_dll.dll and how to develop cpct_dll.dll using C++. If you want to change a discrete signal sequence x(n), n = 1,2,..., you may simply modify each x(n) in the time domain. However, in many cases, we need to first transform the original signal into the frequency domain through a Short-Time Fourier Transform (STFT), then modify the spectrogram of the signal in the frequency domain, and finally transform the modified signal back to the time domain.
Short-time analysis is an important technique in sound processing. Sound signals such as human speech vary over time. However, over a very short span, on the order of 10-20 milliseconds, the parameters of the sound signal can be considered constant. So we usually cut the sound signal into many small pieces (frames) for analysis and processing. Applying the STFT to the original signal x(n) yields a parameter matrix, usually called the spectrogram, which can be regarded as a representation of x(n) in the frequency domain. The horizontal axis denotes elapsed time, and each column along the vertical axis holds the frequency parameters of the frame at that instant. Daniel W. Griffin and Jae S. Lim showed how to estimate a signal from a modified short-time Fourier transform in their paper "Signal Estimation from Modified Short-Time Fourier Transform", IEEE Transactions on Acoustics, Speech, and Signal Processing.
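The framing step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names hammingWindow and frameSignal are invented for this example, the window formula is the common textbook Hamming window, and trailing samples that do not fill a whole frame are simply dropped. It does not reproduce the library's internal code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hamming window: w(n) = 0.54 - 0.46 * cos(2*pi*n / (length - 1)).
std::vector<double> hammingWindow(std::size_t length)
{
    assert(length >= 2);
    const double pi = 3.14159265358979323846;
    std::vector<double> w(length);
    for (std::size_t n = 0; n < length; ++n)
        w[n] = 0.54 - 0.46 * std::cos(2.0 * pi * n / (length - 1));
    return w;
}

// Cut x into overlapping frames of winlen samples, advancing by hoplen
// samples each time, and multiply every frame by the window.
std::vector<std::vector<double>> frameSignal(const std::vector<double>& x,
                                             std::size_t winlen,
                                             std::size_t hoplen)
{
    assert(hoplen >= 1);
    std::vector<std::vector<double>> frames;
    if (x.size() < winlen)
        return frames;                       // not even one full frame
    const std::vector<double> w = hammingWindow(winlen);
    for (std::size_t start = 0; start + winlen <= x.size(); start += hoplen)
    {
        std::vector<double> frame(winlen);
        for (std::size_t n = 0; n < winlen; ++n)
            frame[n] = x[start + n] * w[n];
        frames.push_back(frame);
    }
    return frames;
}
```

Taking the Fourier transform of each frame and stacking the results column by column gives the spectrogram. With 4096 input samples, a window of 512 and a hop of 256, this sketch produces 15 frames.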
5.1. How to change the tempo of a sound
Fig. 1 shows how to change the tempo of a sound. L is the length of the analysis window, La is the analysis hop length, and Ls is the synthesis hop length. If La > Ls, a faster sound is produced; if La < Ls, a slower sound is produced. In the case shown in Fig. 1, the sound becomes shorter once tempo processing is finished, so a faster sound is produced.
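To make the relationship between La, Ls and the output length concrete, here is a minimal sketch that uses plain windowed overlap-add: frames are taken every La samples and written back every Ls samples. This is a simplified stand-in, not the method used by the library, which additionally estimates the signal from the modified STFT magnitude. The function name olaStretch is an assumption of this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Plain overlap-add time-scale modification.
// L: window length, La: analysis hop, Ls: synthesis hop.
std::vector<double> olaStretch(const std::vector<double>& x,
                               std::size_t L, std::size_t La, std::size_t Ls)
{
    assert(L >= 2 && La >= 1 && Ls >= 1);
    if (x.size() < L)
        return std::vector<double>();
    const double pi = 3.14159265358979323846;
    const std::size_t frames = (x.size() - L) / La + 1;
    std::vector<double> y((frames - 1) * Ls + L, 0.0);
    std::vector<double> norm(y.size(), 0.0);   // summed window weights
    for (std::size_t m = 0; m < frames; ++m)
    {
        for (std::size_t n = 0; n < L; ++n)
        {
            const double w = 0.54 - 0.46 * std::cos(2.0 * pi * n / (L - 1));
            y[m * Ls + n]    += w * x[m * La + n];   // read at m*La, write at m*Ls
            norm[m * Ls + n] += w;
        }
    }
    // Divide by the summed weights so overlapping windows do not change the gain.
    for (std::size_t i = 0; i < y.size(); ++i)
        if (norm[i] > 1e-12)
            y[i] /= norm[i];
    return y;
}
```

For 4096 input samples with L = 512, La = 256 and Ls = 128, the output has 2304 samples (faster); swapping the hops to La = 128 and Ls = 256 gives 7680 samples (slower). Plain overlap-add tends to produce audible artifacts because the phases of neighbouring frames no longer line up, which is the problem the iterative estimation is meant to address.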
5.2. How to change the pitch of a sound
Fig. 2 shows how to change the pitch of a sound. The analysis hop length La is equal to the synthesis hop length Ls, but the length L of the analysis window differs from that of the synthesis window: each block of L samples is resampled to L' samples. If L > L', a higher-pitched sound is produced; if L < L', a lower-pitched sound is produced. In the case shown in Fig. 2, the length of the sound is unchanged once pitch processing is finished, and a higher-pitched sound is produced.
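The resampling step can be sketched as follows. Two assumptions are made here and should be treated as such: the pitch value is interpreted as semitones (the article only gives the range [-12,12]), so that L' = L / 2^(pitch/12), and linear interpolation is used instead of the aflibConverter resampler that the library relies on. All function names are invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumption: the pitch value is in semitones, so +12 doubles the frequency.
double semitonesToRatio(double semitones)
{
    return std::pow(2.0, semitones / 12.0);
}

// Length L' that a frame of L samples is resampled to for the given shift.
std::size_t resampledLength(std::size_t L, double semitones)
{
    return static_cast<std::size_t>(std::lround(L / semitonesToRatio(semitones)));
}

// Resample a frame to outLen samples by linear interpolation.
std::vector<double> resampleLinear(const std::vector<double>& frame,
                                   std::size_t outLen)
{
    assert(frame.size() >= 2 && outLen >= 2);
    std::vector<double> out(outLen);
    const double step = static_cast<double>(frame.size() - 1)
                      / static_cast<double>(outLen - 1);
    for (std::size_t i = 0; i < outLen; ++i)
    {
        const double pos = i * step;                  // position in the input
        std::size_t k = static_cast<std::size_t>(pos);
        if (k > frame.size() - 2)
            k = frame.size() - 2;                     // keep k + 1 in range
        const double frac = pos - static_cast<double>(k);
        out[i] = frame[k] * (1.0 - frac) + frame[k + 1] * frac;
    }
    return out;
}
```

Under the semitone assumption, a shift of +12 turns a 512-sample frame into 256 samples (L > L', higher pitch), and a shift of -12 turns it into 1024 samples (L < L', lower pitch).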
6. Design of the CPCT library
Everything is defined in the namespace CPCT. The most important functions are void tsm() and void pm(), whose principles are illustrated above. The fast Fourier transform function void fft1(...) and the resampling function int resample(...) are based on the classes aflibFFT and aflibConverter, both of which come from the free Open Source Audio Library Project (OSALP).
#ifndef CPCT_MSTFTM_H
#define CPCT_MSTFTM_H
namespace CPCT
{
    class CPCT_MSTFTM
    {
    private:
        /// input parameters
        float* dataInput;   // input data
        int nDataInput;     // length of input data
        int nChannels;      // number of channels
        /// control parameters, default: tempo = 0, pitch = 0;
        // used to control the tempo of the audio [-1,1], + means faster, - means slower
        float tempo;
        // used to control the pitch of the audio [-12,12], + means higher, - means lower
        float pitch;
        /// output parameters
        float* dataOutput;  // output data
        int nDataOutput;    // length of output data
        /// processing parameters, initialized by the constructor
        int winlen;         // length of processing window
        int hoplen;         // length of hop size, overlap size = winlen - hoplen;
        int nit;            // number of iterations in MSTFTM-based signal estimation
        double *hamwin;     // Hamming window
    private:
        /// helper functions
        // sum the elements of the array x
        double sum(double* x, int length);
        // convert floating-point array [-1,1] to short array [-32768,32767]
        void float2short(const double* f, short* s, int numElems);
        // convert short array [-32768,32767] to floating-point array [-1,1]
        void short2float(const short* s, double* f, int numElems);
        /// private functions
        // create a Hamming window
        void hamming(double* win, int length);
        // FFT function, fft1 is based on class aflibFFT
        void fft1(unsigned NumSamples,
                  int InverseTransform,
                  const double *RealIn,
                  const double *ImagIn,
                  double *RealOut,
                  double *ImagOut );
        // resampling function, resample is based on class aflibConverter
        int resample(double factor,
                     int channels,
                     int &inCount,
                     int outCount,
                     short inArray[],
                     short outArray[] );
        // xSTFTM: magnitude of the data being processed
        // x_res: updated in each iteration, holds the estimation result
        // nit: number of iterations
        // win: Hamming window
        // datalength: length(xSTFTM) = length(x_res) = length(win) = datalength
        void recon(const double* xSTFTM,
                   double* x_res,
                   int nit,
                   const double* win,
                   int datalength);
        // function for changing the tempo
        void tsm();
        // function for changing the pitch
        void pm();
        void process();
    public:
        /// public functions
        CPCT_MSTFTM(void);
        ~CPCT_MSTFTM(void);
        CPCT_MSTFTM(int winlen, int hoplen, int nit);
        // float* data receives the processed sound data
        // int& datalength returns the processed data length
        void getData(float* data, int& datalength);
        // const float* data is the unprocessed data
        // int datalength is the unprocessed data length
        // int nChannels is the number of channels
        void setData(const float* data, int datalength, int nChannels);
        // set the tempo and the pitch
        void setParams(float tempo, float pitch);
    };
}
#endif
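The comments on recon(...) describe an iterative estimate that is refined nit times from a target magnitude xSTFTM. The core step of such an iteration can be sketched on a single frame: transform the current estimate, keep its phase, replace its magnitude with the target, and transform back. The sketch below is an assumption-laden illustration only. It uses a naive O(N^2) DFT instead of fft1, ignores windowing and the overlap between frames, and the names dft and projectOntoMagnitude are invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> Cplx;

// Naive O(N^2) discrete Fourier transform; the inverse includes the 1/N scaling.
std::vector<Cplx> dft(const std::vector<Cplx>& in, bool inverse)
{
    const double pi = 3.14159265358979323846;
    const std::size_t N = in.size();
    const double sign = inverse ? 1.0 : -1.0;
    std::vector<Cplx> out(N);
    for (std::size_t k = 0; k < N; ++k)
    {
        Cplx acc(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n)
            acc += in[n] * std::polar(1.0, sign * 2.0 * pi * double(k * n) / double(N));
        out[k] = inverse ? acc / double(N) : acc;
    }
    return out;
}

// One magnitude-constrained step on a single frame: keep the phase of the
// current estimate's spectrum, but replace its magnitude by targetMag.
// targetMag must be non-negative and have the same length as estimate.
void projectOntoMagnitude(const std::vector<double>& targetMag,
                          std::vector<double>& estimate)
{
    assert(targetMag.size() == estimate.size());
    std::vector<Cplx> spec(estimate.begin(), estimate.end());
    spec = dft(spec, false);
    for (std::size_t k = 0; k < spec.size(); ++k)
        spec[k] = std::polar(targetMag[k], std::arg(spec[k]));
    spec = dft(spec, true);
    for (std::size_t n = 0; n < estimate.size(); ++n)
        estimate[n] = spec[n].real();   // discard the residual imaginary part
}
```

A signal whose spectrum already has the target magnitude is left unchanged by this step, which makes a convenient sanity check. Calling the step repeatedly corresponds to the role of nit.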
7. Development of cpct_dll.dll
The functions of the CPCT library are exported as follows to produce cpct_dll.dll. Each exported function casts the opaque HANDLE back to a CPCT_MSTFTM pointer and forwards the call:
API_CPCT HANDLE createCpctByDefault()
{
    CPCT_MSTFTM *cpct = new CPCT_MSTFTM();
    return (HANDLE)cpct;
}

API_CPCT HANDLE createCpctByParams( int winlen, int hoplen, int nit )
{
    CPCT_MSTFTM *cpct = new CPCT_MSTFTM(winlen, hoplen, nit);
    return (HANDLE)cpct;
}

API_CPCT void setData( HANDLE h, const float* data, int datalength, int nChannels )
{
    CPCT_MSTFTM *cpct = (CPCT_MSTFTM*)h;
    cpct->setData(data, datalength, nChannels);
}

API_CPCT void setParams( HANDLE h, float tempo, float pitch )
{
    CPCT_MSTFTM *cpct = (CPCT_MSTFTM*)h;
    cpct->setParams(tempo, pitch);
}

API_CPCT void getData( HANDLE h, float* data, int& datalength )
{
    CPCT_MSTFTM *cpct = (CPCT_MSTFTM*)h;
    cpct->getData(data, datalength);
}

API_CPCT void destroyCpct( HANDLE h )
{
    CPCT_MSTFTM *cpct = (CPCT_MSTFTM*)h;
    delete cpct;
}
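Because the caller only holds an opaque handle, every exit path has to call the matching destroy function. One way to make that automatic on the C++ side is a small guard built on std::unique_ptr with a custom deleter. The sketch below shows the pattern with a toy class so that it compiles on its own; Gain, createGain, applyGain, destroyGain and GainHandle are all hypothetical names and not part of cpct_dll.dll.

```cpp
#include <cassert>
#include <memory>

typedef void* HANDLE;

// Toy stand-in for CPCT_MSTFTM; only here so the sketch is self-contained.
class Gain
{
public:
    explicit Gain(float g) : gain_(g) {}
    float apply(float x) const { return gain_ * x; }
private:
    float gain_;
};

// Flat API in the style shown above: the caller only ever sees a void*.
HANDLE createGain(float g)
{
    return new Gain(g);
}

float applyGain(HANDLE h, float x)
{
    assert(h != nullptr);
    return static_cast<Gain*>(h)->apply(x);
}

void destroyGain(HANDLE h)
{
    delete static_cast<Gain*>(h);
}

// Caller-side guard: destroyGain runs automatically when the guard goes
// out of scope, including when an exception is thrown.
struct GainDeleter
{
    void operator()(void* h) const { destroyGain(h); }
};
typedef std::unique_ptr<void, GainDeleter> GainHandle;
```

A caller would write GainHandle h(createGain(2.0f)); and pass h.get() to the flat functions. The same idea could be applied to the handle returned by createCpctByParams, with destroyCpct as the deleter.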
8. CpctDotNetDll: A C# wrapper for cpct_dll.dll
.NET development is becoming more and more popular, and GUI development is very easy in C#. In this part, I use C# to wrap cpct_dll.dll, which I hope will make the CPCT library convenient for .NET developers to use.
cpct_dll.dll exposes six functions: createCpctByDefault(), createCpctByParams(...), setData(...), setParams(...), getData(...), and destroyCpct(...). The following code is used to wrap these six APIs:
using System;
using System.Runtime.InteropServices;

public class CpctDotNet
{
    #region Members
    // Opaque handle returned by the native create functions
    private IntPtr m_handle = IntPtr.Zero;
    #endregion

    #region Native C++ API Methods
    private const string DllName = "cpct_dll.dll";

    [DllImport(DllName)]
    private static extern IntPtr createCpctByDefault();

    [DllImport(DllName)]
    private static extern IntPtr createCpctByParams(int winlen, int hoplen, int nit);

    [DllImport(DllName)]
    private static extern void setData(IntPtr h, [MarshalAs(UnmanagedType.LPArray)]
        float[] data, int datalength, int nChannels);

    [DllImport(DllName)]
    private static extern void setParams(IntPtr h, float tempo, float pitch);

    [DllImport(DllName)]
    private static extern void getData(IntPtr h,
        [MarshalAs(UnmanagedType.LPArray)] float[] data, out int datalength);

    [DllImport(DllName)]
    private static extern void destroyCpct(IntPtr h);
    #endregion

    #region C# Wrapper Methods
    public void CreateCpctByDefault()
    {
        m_handle = createCpctByDefault();
    }

    public void CreateCpctByParams(int winlen, int hoplen, int nit)
    {
        m_handle = createCpctByParams(winlen, hoplen, nit);
    }

    public void SetData(float[] data, int datalength, int nChannels)
    {
        setData(m_handle, data, datalength, nChannels);
    }

    public void SetParams(float tempo, float pitch)
    {
        setParams(m_handle, tempo, pitch);
    }

    public void GetData(float[] data, out int datalength)
    {
        getData(m_handle, data, out datalength);
    }

    public void DestroyCpct()
    {
        destroyCpct(m_handle);
    }
    #endregion
}
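Please pay attention to the wrapping of getData(...): the float* parameter is imported as a float[] marshalled with UnmanagedType.LPArray, and the int& parameter is imported as out int. The above code thus shows how to import functions with pointer and reference parameters.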
9. A demo for CPCT
CpctDemo.zip contains a simple CPCT demo. Click the "Start" button and speak into your microphone. Then change the pitch and try again.
This simple demo shows you how to build a CPCT application using C#.
CpctDotNet cpct = new CpctDotNet();
...
cpct.CreateCpctByParams(.., ..., ...); // create cpct
cpct.SetData(..., ..., ...); // set the input data
cpct.SetParams(..., ...); // set processing parameters
cpct.GetData(..., ...); // get processed data
...
// use the processed data
...
cpct.DestroyCpct(); // destroy cpct
10. Conclusion
I hope this article helps you understand how to change the pitch and tempo of a sound and brings you some fun in processing sound signals. The CPCT library is used in DAEAPP, an app you may also be interested in. It can be used to generate digital audio effects, e.g. time-scale modification (change tempo), pitch modification (change pitch), stereo effect (3D audio effect), echo effect...
History
- 26th August, 2011: Initial version of the article: how to use cpct_dll.dll.
- 28th August, 2011: Added development details of cpct_dll.dll.
- 19th September, 2011: Added C# wrapper for cpct_dll.dll and a GUI demo for CPCT.