Click here to Skip to main content
Licence CPOL
First Posted 27 Jan 2011
Views 4,134
Downloads 210
Bookmarked 3 times

Iterative Fast 1D Forvard DCT

By | 27 Jan 2011 | Article
Algorithm of fast forward DCT in 1D by only iteration and without branching of code

Introduction

Discret cosine’s transformation (DCT) is a well known transformation. It has very wide implementation and is used for compression, processing, e.g., due to real character of coefficients. DCT is used more often than Fourier transform. But for DCT, only one fast algorithm is well known - for 2D processing. For 1D, there is the Byeong Lee’s DCT algorithm. But its algorithm is recursive. Using it is useless for some type of equipments (e.g., GPGPU). For this target, I have created an iterative algorithm, without recursive calling and almost unlimited input data (only power of 2).

Background

For base was chosen Byeong Lee’s DCT, that is shown in the figure. The algorithm is based on Butterfly distribution. This algorithm is used wide for hardware implementation.

Lee_s_algorithm.gif

Using the Code

The developed code is based on two Paths: input with Input Butterfly and output with Output Butterfly.

Massive for keeping data between iterations is one, and allocated from heap by demand. Other massive is used for storing computed cosine’s coefficient. Every iteration divides the Front buffer into two parts: Even and Odd. After copying them into Back, the Front and Back buffers are exchanged.

//
double *Front, *Back, *temp; // Pointers on Front, Back, and temp massives of iteration.
double *Array = new double[N<<1]; // Massive with doubling size. 
double *Curent_Cos = new double[N>>1]; // store for precomputing.

Front = Array; // Front pointed on begin Massive.
step = 0;
Back = Front + N; // Back pointed on center of Massive.
        
for(i = 0; i < N; i++)
{
    Front[i] = block[i]; // Load input Data
}

while(j > 1)// Cycle of iterations Input Butterfly
{
    half_N = half_N >> 1;
    current_PI_half_By_N = PI_half / (double)prev_half_N;
    current_PI_By_N = 0;
    step_Phase = current_PI_half_By_N * 2.0;
    for(i = 0; i < half_N; i++)
    {
        //Precompute Cosine's coefficients
        Curent_Cos[i] = 0.5 / cos(current_PI_By_N + current_PI_half_By_N);
        current_PI_By_N += step_Phase;
    }
    shif_Block = 0;
    for(int x = 0; x < step; x++)
    {    
        for(i= 0; i<half_N; i++)
        {
            shift = shif_Block + prev_half_N - i - 1;
            Back[shif_Block + i] = Front[shif_Block + i] + Front[shift];

            Back[shif_Block + half_N + i] = 
              (Front[shif_Block + i] - Front[shift]) * Curent_Cos[i];
        }
        shif_Block += prev_half_N;
    }
    temp = Front;
    Front = Back;
    Back = temp;
    j = j >> 1;
    step = step << 1;
    prev_half_N = half_N;
}      

while(j < N) // Cycle of Out ButterFly
{
    shif_Block = 0;
    for(int x = 0; x < step; x++)
    {     
        for(i = 0; i<half_N - 1; i++)
        {
            Back[shif_Block + (i << 1)] = Front[shif_Block + i];    
            Back[shif_Block + (i << 1) + 1] = 
               Front[shif_Block + half_N + i] + Front[shif_Block + half_N + i + 1];    
        }
        Back[shif_Block + ((half_N - 1) << 1)] = Front[shif_Block + half_N - 1]; 
        Back[shif_Block + (half_N << 1) - 1] = Front[shif_Block + (half_N << 1) - 1]; 
        shif_Block += prev_half_N;
    }
    temp = Front;
    Front = Back;
    Back = temp;
    j = j << 1;
    step = step >> 1;
    half_N = prev_half_N;
    prev_half_N = prev_half_N << 1;
}
for(int i = 0; i < N; i++)
{
    block[i] = Front[i]; // Unload DCT coefficients 
}
block[0] = block[0] / dN; // Compute DC.

Points of Interest

A very important fact is that after the first iteration, we have only one group of odd coefficients, but after the second iteration, we have two groups of odd coefficients. For those two groups, you will need identical cosine's coefficients. Store for precomputing allows to compute that coefficient only for one group. For another group will be used data from store. It's very useful for computing 1024 and more points.

In the end, I can say that the developed algorithm has more precision than the usual implementation (due to less computing of cosine's coefficient).

History

  • 27th Jan, 2011: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Evgeny Pereguda

Web Developer

Russian Federation Russian Federation

Member



Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board. (secure sign-in)
 
Search this forum  
 FAQ
    Noise  Layout  Per page   
  Refresh
GeneralMy vote of 5 Pinmemberxfx16:22 8 Mar '12  
GeneralMy vote of 3 PinmemberEmilio Garavaglia9:48 27 Jan '11  
GeneralMy vote of 1 PinmvpRichard MacCutchan6:13 27 Jan '11  
GeneralRe: My vote of 1 PinmemberRugbyLeague6:29 27 Jan '11  
GeneralRe: My vote of 1 PinmvpRichard MacCutchan7:02 27 Jan '11  
GeneralRe: My vote of 1 Pinmemberxfx16:22 8 Mar '12  

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Mobile
Web04 | 2.5.120517.1 | Last Updated 27 Jan 2011
Article Copyright 2011 by Evgeny Pereguda
Everything else Copyright © CodeProject, 1999-2012
Terms of Use
Layout: fixed | fluid