12,507,845 members (56,015 online)
alternative version

652.1K views
122 bookmarked
Posted

# A C++ implementation of Douglas-Peucker Line Approximation Algorithm

, 3 Mar 2003
 Rate this:
DP Line approximation algorithm is a well-known method to approximate 2D lines. It is quite fast, O(nlog_2(n)) for a n-points line and can drastically compress a data curve. Here, a fully OOP implementation is given.

## Introduction

When working with mathematical simulations or engineering problems, it is not unusual to handle curves that contains thousands of points. Usually, displaying all the points is not useful, a number of them will be rendered on the same pixel since the screen precision is finite. Hence, you use a lot of resource for nothing!

This article presents a fast 2D-line approximation algorithm based on the Douglas-Peucker algorithm (see [1]), well-known in the cartography community. It computes a hull, scaled by a tolerance factor, around the curve by choosing a minimum of key points. This algorithm has several advantages:

• It is fast!: For a n point curve, computation time of the approximation is proportional to nlog_2(n),
• You don't need a priori knowledge on the curve,
• The compression ratio can be easily tuned using the tolerance parameter.

The class has been integrated to a plotting library: Plot Graphic Library.

## Douglas-Peucker (DP) Line Simplification Algorithm

The DP line simplification algorithm was originally proposed in [1]. John Hershberger and Jack Snoeyink have implemented it in C in [2] as a package named DPSimp:

DPsimp is a Douglas-Peucker line simplification algorithm implementation by John Hershberger and Jack Snoeyink. They analyze the line simplification algorithm reported by Douglas and Peucker and show that its worst case is quadratic in n, the number of input points. The algorithm is based on path hulls, that uses the geometric structure of the problem to attain a worst-case running time proportional to nlog_2(n), which is the best case of the Douglas algorithm.

The algorithm is based on a recursive construction of path hull, as depicted in the picture below. They did all the hard work (writing the base code in C), I ported it to C++ templates.

## Modifications to DPSimp

DPSimp was using a recursive call in the DP method. This could lead to a stack overflow when the algorithm would go deep in recursion. To avoid this problem an internal stack has been added to the class to mimic the recursive function call without stack overflow.

## Concepts and class design

Let the points denote all the points of the original curve and the keys the points from the original curve that are kept for the approximation.

The idea is that the user provides a container for the points, denoted PointContainer, and for the keys, denoted KeyContainer, and the link between those containers will be the line simplification class, denoted LineApproximator.

How do we build a class hierarchy without restricting ourselves to particular container? In fact, one user might store it's points in vector< pair<T,T>> and another one in 2 separate vector<T>. Of course, the same argument applies to the KeyContainer.

A possible answer is templating. Passing the PointContainer and KeyContainer as template arguments for LineApproximator allows to build the approximation class without specifying the containers, since the class is built at compilation time (We could write interface for those containers but in fact, I'm too lazy for that :) ).

With this idea in mind, here are the specifications of the container:

### PointContainer

Let

• Point a structure, template or class that has x,y of type T as member,
• PointRef a structure, template or class that has x,y of type T& as member,

PointContainer behaves like a vector of Point:

• has clear(), size() and resize() methods,
• has random access iterator,
• const_iterator points to a structure similar to Point,
• iterator points to a structure similar to PointRef
• operator[] const returns a Point,
• operator[] returns a PointRef

A simple example of valid PointContainer is

vector<Point>

However, a hybrid container has been developed to handle the case where x and y are in separate containers (See below).

### KeyContainer

KeyContainer behaves like a list of PointContainer::const_iterator:

• has size(), clear() methods,
• support push_back method

A simple example of valid KeyContainer is

vector<PointContainer::const_iterator>

Again, a hydrid container to handle the case where the keys must be outputted in separate containers is provided.

## Templates

All the classes are templated to support float and double version of the algorithm.

The template TDPHull is the user interface to the DP algorithm. However, it relies on a series of subclasses detailed below:

NameDescriptionUse
TLine<T, TPointContainer, TKeyContainer>2D Line templatePoints and keys
TLineApproximator<T, TPointContainer, TKeyContainer>2D Line approximator base classDefault interface to approximation algorithms
TDPHull<T, TPointContainer, TKeyContainer>Implementing Douglas-Peukler algorithmUser front end
TPathHull<T, TPointContainer, TKeyContainer>Path hullInternal in TDPHull<T>
TPoint<T>A pair of T: x, yTemplate for 2D point TLineApproximator<T>
TPointRef<T>A pair of T&: x, yTemplate for 2D point TLineApproximator<T>
SHomog2D Homogenous point, T x,y,wInternal structure to TLineApproximator<T>

## How to use TDPHull?

In the following examples, we adopt the following notations

using namespace hull; // all classes are in the hull namespace,
using namespace std; // using some STL containers

// defining the point container
typedef vector<TPoint<float>> VectorPointContainer;
// defining the key container
typedef vector<MyPointContainer::const_iterator> ListKeyContainer;
// defining the line approximation class
typedef TDPHull<float, VectorPointContainer, ListKeyContainer> CDPHullF;
CDPHullF dp; // a DPHull object, note that you can also work with doubles

The approximation methods throw exception, so you should always enclose them in a try-catch statement.

### Normalization

The data points are, by default, normalized before approximation. This is in order to reduce numerical errors in the gradients computations. This happens when the data is badly scaled: using big numbers close together will lead to disastrous loss of significant numbers.

However, if you feel confident about your data, you can disable it by using SetNormalization.

### Handling points and keys

Get a reference to the point container and modify it:

// getting reference to container,
TDPHull<float>::PointContainer& pc = dp.GetPoints();
for (UINT i=0;i<pc.size();i++)
{
pc[i].x=...;
pc[i].y=...;
}

If you are using normalization (default behavior), do not forget to re-compute the data bounding box after your changes:

dp.ComputeBoundingBox();

### Approximation tuning

You can control the compression ratio by different ways:

• Setting the tolerance
// Setting tolerance of approximation
dp.SetTol(0.5);
• Setting a compression ratio, an acceptable compression threshold:
// dp will find the tolerance corresponding to 10 % of
// the original points, with 5% of possible error.
try
{
dp.ShrinkNorm(0.1,0.05);
}
catch(TCHAR* str)
{
// catch and process errors throw by dp
// example: output to cerr
cerr<<str<<endl;
}

The method uses dichotomy to accelerate convergence.

• Setting the desired number of points, an acceptable number of points threshold:
// dp will find the tolerance corresponding to 100
// in the approximated curve, with 5 points of possible error.
try
{
dp.Shrink(100,5);
}
catch(TCHAR* str)
{
// catch and process errors throw by dp
...
}

### Simplifaction

The easiest part of the job:

try
{
dp.Simplify();
}
catch(TCHAR* str)
{
// catch and process errors throw by dp
...
}

or by using ShrinkNorm, Shrink methods.

### Accessing the approximated curve

The keys are stored as PointContainer::const_iterator. You can access the key container by using GetKeys:

// getting conts reference to keys
const TDPHull<float>::KeyContainer& kc = dp.GetKeys();
TDPHull<float>::KeyContainer::const_iterator it; // iterator for keys
for (it = kc.begin(); it != kc.end(); it++)
{
// it is an const_iterator pointing to a PointContainer::const_iterator
xi=(*it)->x;
yi=(*it)->y;
}

All you have to do is inherit a class from TLineApproximator and override the function ComputeKeys.

## Hydrid containers

You can implement your own containers for points and keys as long as they meet the requirements.

### Separate containers for x,y

It is not unusual to have x,y stored in separate containers and moreover these containers are not of the same type. To tackle this problem, two wrapper templates have been written: TPointDoubleContainer and TKeyDoubleContainer which serve as an interface between the approximation algorithms and the containers:

CVectorX vX;    // the vector of x coordinates
CVectorY vY;    // the vector of y coordinates

// defining the hydrid container
typedef TPointDoubleContainer<float, CVectorX,
CVectorY> HybridPointContainer;

// a list of key x coordinates
CListX lKeyX;
// a vector of key y coordinates, any container
// that has push_back will do the job :)
CVectorY vKeyY;
// defining the hydrid container
typedef TKeyDoubleContainer<float, CListX, CVectorY> HybridKeyContainer;

// creating approximator
TDPHull< float, HybridPointContainer, HydridKeyContainer> dp;
// attaching point containers
dp.GetPoints().SetContainers( &vX, &vY);
// attaching key containers
dp.GetKeys().SetContainers( &lKeyX, &vKeyY);
// dp is now ready to work

## Using the demo

The demo shows a real time approximation of a curve by different algorithms.

• You can stop/start the animation using the toolbar buttons,
• You can modify the shrinking ration with the scroll bar,
• You can load your own data with the menu, File->Load Data Set. The file must be formatted with a pair x,y per line.

## Using it in your project

Insert the following files in your project and you're done.

LineApproximator.h,DPHull.h, PathHull.h

## Known issues

• The original curve must not self-intersect. This means also that the start and end points must be different, no closed curve !
• Sick dataset and stack overflow: solved. The problem was due to recursion leading to stack overflow. It is solved now.

## Update history

• 04-03-2003
• Got rid of DP recursion by adding an internal function call stack. Hence, the stack overflow problem is solved!!!
• Applied stuff I learned in Effective STL to the classes: using algorithms, functors, etc...
• Changed class T to typename T
• Better floating point comparison using boost::close_at_tolerance
• 15-11-2002
• More and more templating,
• Detecting when curve is closed
• Hybrid containers
• Fixed bug in compute limits
• Added LOTS of ASSERT, so Debug version is significantly slower than release build
• 7-11-2002
• Fixed a bug in the SLimits structure (GetCenterY)
• TPathHull working with iterators rather that SPoint*
• Added exception to handle errors
• Fixed a bug in TDPHull::ComputeKeys. Was using pc.end() rather that pc.end()--
• 6-11-2002
• Added base class TLineApproximator
• Added proposed algorithm by S.Rog: see TKeyFramer, TGlobalKeyFramer, TDispKeyFramer, TVectKeyFramer
• Updated demo
• 3-11-2002
• Added data normalization for better numerical behavior. Avoids the algorithm to crash when using badly conditioned data. Problem submitted by Corey W.
• Templated version
• Got rid of macro and rewrote in more OOP style

## References

1. D. H. Douglas and T. K. Peucker. Algorithms for the reduction of the number of points required to represent a line or its caricature. The Canadian Cartographer, 10(2):112--122, 1973.
2. J. Hershberger and J. Snoeyink. Speeding up the Douglas-Peucker line simplification algorithm. In Proc. 5th Intl. Symp. Spatial Data Handling. IGU Commission on GIS, pages 134--143, 1992. (home page).

A list of licenses authors might use can be found here

## Share

 Engineer United States
Jonathan de Halleux is Civil Engineer in Applied Mathematics. He finished his PhD in 2004 in the rainy country of Belgium. After 2 years in the Common Language Runtime (i.e. .net), he is now working at Microsoft Research on Pex (http://research.microsoft.com/pex).

## You may also be interested in...

 Pro Pro

 First PrevNext
 dp.Shrink - specifying number of points in sampled data rather than tolerance Member 1200651625-Sep-15 6:54 Member 12006516 25-Sep-15 6:54
 My vote of 5 manoj kumar choubey15-Feb-12 23:16 manoj kumar choubey 15-Feb-12 23:16
 Makefile? daviddoria25-Jul-11 3:31 daviddoria 25-Jul-11 3:31
 how to convert a recursive DP algorithm to non-recursive? unravel-the-sky4-May-11 3:19 unravel-the-sky 4-May-11 3:19
 this template version has bugs yhu2213001-Sep-10 3:33 yhu221300 1-Sep-10 3:33
 Re: this template version has bugs Elmar de Koning8-Oct-10 1:17 Elmar de Koning 8-Oct-10 1:17
 Good job stcornflakes19-Dec-07 0:30 stcornflakes 19-Dec-07 0:30
 what is the license Andreas Fabri13-Nov-07 21:58 Andreas Fabri 13-Nov-07 21:58
 collinear points [modified] Romain Michel12-Mar-07 4:33 Romain Michel 12-Mar-07 4:33
 Re: collinear points Ralf Wirtz3-Apr-07 20:34 Ralf Wirtz 3-Apr-07 20:34
 Re: collinear points Romain Michel11-Apr-07 5:01 Romain Michel 11-Apr-07 5:01
 Re: collinear points donkey30124-Aug-09 16:41 donkey301 24-Aug-09 16:41
 BUG?? Maniez5-Mar-07 1:55 Maniez 5-Mar-07 1:55
 Re: BUG?? donkey30124-Aug-09 16:44 donkey301 24-Aug-09 16:44
 It is still crashing after I change the code to (k = m_pPHtag - 2; k > i; --k). Do you know how to fix it? thanks.
 another simple implementation in ANSI C Albertino14-Feb-07 12:04 Albertino 14-Feb-07 12:04
 any plans for 3d? gok25-Jan-06 11:42 gok 25-Jan-06 11:42
 Original C code Reunion18-Jul-05 19:15 Reunion 18-Jul-05 19:15
 Re: Original C code John M. Drescher18-Jul-05 21:09 John M. Drescher 18-Jul-05 21:09
 Re: Original C code Reunion19-Jul-05 4:04 Reunion 19-Jul-05 4:04
 Loop when calling Shrink function Christian Staffe8-Aug-04 21:59 Christian Staffe 8-Aug-04 21:59
 Re: Loop when calling Shrink function Corey W25-Nov-04 11:52 Corey W 25-Nov-04 11:52
 Re: Loop when calling Shrink function Christian Staffe26-Nov-04 9:20 Christian Staffe 26-Nov-04 9:20
 Re: Loop when calling Shrink function Corey W26-Nov-04 17:44 Corey W 26-Nov-04 17:44
 Re: Loop when calling Shrink function Corey W26-Nov-04 17:58 Corey W 26-Nov-04 17:58
 douglas line simplification jehovah10-Jun-04 10:24 jehovah 10-Jun-04 10:24
 How do you pronounce Peucker? Jeff Bogan19-Apr-04 7:55 Jeff Bogan 19-Apr-04 7:55
 Re: How do you pronounce Peucker? Hans Peucker22-Jun-04 3:46 Hans Peucker 22-Jun-04 3:46
 using douglas mih221-Jan-04 23:54 mih2 21-Jan-04 23:54
 problem with dphull mih221-Jan-04 23:44 mih2 21-Jan-04 23:44
 Compiling with VS.Net 2003 Corey W10-Oct-03 11:55 Corey W 10-Oct-03 11:55
 Re: Compiling with VS.Net 2003 centomo25-Nov-04 8:54 centomo 25-Nov-04 8:54
 Re: Compiling with VS.Net 2003 Corey W25-Nov-04 11:49 Corey W 25-Nov-04 11:49
 Re: Compiling with VS.Net 2003 Christian Staffe18-May-05 11:44 Christian Staffe 18-May-05 11:44
 Confused ashnyr25-Sep-03 22:48 ashnyr 25-Sep-03 22:48
 another implementation faraz05-Aug-03 2:00 faraz0 5-Aug-03 2:00
 Re: another implementation Jonathan de Halleux5-Aug-03 5:24 Jonathan de Halleux 5-Aug-03 5:24
 Well done Bitterman4-Mar-03 13:19 Bitterman 4-Mar-03 13:19
 SQL Jonathan de Halleux5-Mar-03 8:14 Jonathan de Halleux 5-Mar-03 8:14
 Re: SQL Carsten Breum21-Mar-03 1:44 Carsten Breum 21-Mar-03 1:44
 Diving on hold Jonathan de Halleux27-Mar-03 6:43 Jonathan de Halleux 27-Mar-03 6:43
 Re: Diving on hold Carsten Breum27-Mar-03 20:30 Carsten Breum 27-Mar-03 20:30
 Re: Diving on hold Jonathan de Halleux27-Mar-03 22:18 Jonathan de Halleux 27-Mar-03 22:18
 Re: Diving on hold Ceox14-Oct-03 23:37 Ceox 14-Oct-03 23:37
 Re: SQL Jonathan de Halleux28-Jul-03 13:26 Jonathan de Halleux 28-Jul-03 13:26
 Examples Chopper4-Mar-03 2:53 Chopper 4-Mar-03 2:53
 Re: Examples Corey W4-Mar-03 4:27 Corey W 4-Mar-03 4:27
 Re: Examples Chopper5-Mar-03 4:46 Chopper 5-Mar-03 4:46
 Sick datatypes Jonathan de Halleux12-Mar-03 7:27 Jonathan de Halleux 12-Mar-03 7:27
 Re: Sick datatypes Corey W12-Mar-03 7:34 Corey W 12-Mar-03 7:34
 Cool Jonathan de Halleux13-Mar-03 0:46 Jonathan de Halleux 13-Mar-03 0:46
 Last Visit: 31-Dec-99 18:00     Last Update: 28-Sep-16 13:36 Refresh 12 Next »