Click here to Skip to main content
15,880,891 members
Articles / Programming Languages / C++

The Concave Hull of a Set of Points

Rate me:
Please Sign up or sign in to vote.
4.96/5 (31 votes)
16 Aug 2017CPOL8 min read 64.8K   5K   42   38
A C++ implementation of a published algorithm for computing the concave hull using a k-nearest neighbour approach.

Introduction

The algorithm is described in the published paper "Concave Hull: A k-nearest neighbours approach for the computation of the region occupied by a set of points" by A. Moreira and M. Santos, 2007, University of Minho, Portugal. 

I achieved significant performance gains over the unoptimised algorithm. I recognised that the algorithm would benefit from a C++ implementation using the Flann library for the k-nearest neighbour searches and OpenMP parallelism for point-in-polygon checks. This article demonstrates my findings such as what affects the computation time.

My curiosity is on performance when the datasets get large. I use ground point datasets such as those from aerial Lidar scans as test datasets. My program is lightweight and depends only on Flann.

Examples

The examples below each show a set of points where the blue polygon is the computed concave hull.

Example 1: 771 input points, 166 concave hull points, 0.0 seconds to compute.

Image 1

Example 2: 4726 input points, 406 concave hull points, 0.1 seconds to compute

Image 2

Example 3: 54323 input points, 1135 concave hull points, 0.4 seconds to compute

Image 3

Example 4: 312428 input points, 1162 concave hull points, 26.0 seconds to compute (see section Analysis below)

Image 4

How it works

The Moreira-Santos algorithm is an iterative solution, where an initial nearest neighbour K-value is set to 3 and iteratively increased until a polygon is found that encloses all the points. That is the concave hull. Increasing K further will smooth out the hull but perhaps making it less aesthetically correct. 

When K=3 is used by the k-nearest neighbour search it means the three closest points to any test point are found. Too small a K means an enclosing hull doesn't get found. Too large a K might mean important concavities in the hull are smoothed over. 

The algorithm stops on the first K that produces a valid concave hull. My program has a command line parameter to pass a higher K to produce a smoother polygon if so desired. Ideally there would be some visual inspection to determine whether the first K which determines the hull is sufficient, or whether an increased K gives a more pleasing result. 

See the section Finding Performance below on where I achieved performance gains over the unoptimised algorithm. 

Analysis

The time to compute the hull starting with K=3 increases fairly linearly with the number of input points. However other factors do affect the absolute running time. Because of these factors, one dataset of n points will not compute in the same time as another set of n points. 

The number of input points

All the points need to pass a point-in-polygon test for the iteration to stop. Therefore the more input points there are, the slower this test will be. I try minimise this by parallelising the input to the point-in-polygon test. 

The point spacing near the edge, whether uniform or having large variations

The geometry of the points near the edge has an impact on the K iteration. If the edge is "messy", more neighbouring points (K) will need to be inspected than if the edge has little variation in point density and spacing. A regular grid of points will have a faster computation time than a disordered and irregular point set. Example 4 above is an example where K needs to reach 50+ before completion because of very irregular point spacing.

The number of iterations of K needed to reach a solution

The more iterations there are as a result of (2) above, the more times the point-in-polygon test probably has to be done, this being one of two iteration step criteria. (See a possible optimisation in the section Finding Performance below.)

The size of the hull polygon

The point-in-polygon test will be slower for hull polygons with many vertices than if the polygon is smaller, though this is less important than the number of points passed to the test. 

The bottleneck

In my code, after many performance gains (see section Finding Performance below), profiling shows the biggest bottleneck to be the point-in-polygon check. This test is the stopping condition for the iteration. If any points in the dataset are not inside the polygon computed in that iteration, then K is incremented. So the point-in-polygon test is crucial, but expensive.

Benchmarks of large examples

My emphasis is on large input datasets. Here follows three examples where I have test cases in increments of 100,000 up to one million. This is to compare the affect of input point size and the number of iterations of K have on computation time. 

Large Example 1: 995165 input points, 5221 concave hull points, 10.7 seconds to compute.

Image 5

Image 6


Large Example 2: 984119 input points, 4907 concave hull points, 16.8 seconds to compute.

Image 7

Image 8


Large Example 3: 998817 input points, 8787 concave hull points, 31.1 seconds to compute.

Image 9

Image 10


This algorithm could still be useful for a large dataset if it is first downsampled to a more manageable size, and only then the concave hull computed. This would only be an option though if the concave hull needn't be accurate to the original dataset's edge. 

Above timings were made on a Core i7-2600 @3.4 GHz, 16GB RAM, Windows 10.

Finding performance

Perfomance gains I found listed in order of impact:

  1.  Using Flann for the nearest neighbour point searches gave me the most immediate performance boost.
  2. The algorithm in the paper requires frequent deletions of points from the dataset. Initially, by doing the same using std::vector::erase I found to be a major bottleneck. I did away with that storage entirely as the Flann way is already storing points and capable of marking points as deleted.
  3. Testing whether any points fail a point-in-polygon check is an ideal candidate for parallelising. I used an OpenMP parallel implementation of any_of
  4. Along with the 2D coordinate I also store a unique ID for each point which stays sorted and allows me to binary-search for a point ID rather than searching by value semantics.

Optimisations I foresee as being possibly worthwhile, but were not attempted:

  1. The final K could be found faster by instead of incrementing K only by 1, a bigger increment could be chosen to find a valid, but overly smoothed polygon quicker, and then K decremented to converge in a binary-search type manner.
  2. Keep a count of the number of times each point has passed the point-in-polygon test and then sort by this number to first test those that have never passed in a previous iteration. The idea being that if it has passed before, it'll probably pass again.

Using the code or program

Building the code

Other than the Flann library calls and a little OpenMP, the code is standard C++ with a heavy reliance on STL algorithms. Flann also needs to be separately installed, but thereafter it is just a header-only include. I built this with Visual Studio but it will build with any C++11 compliant compiler. 

Running the executable

If you get a system error stating the program can't start because VCOMP140.dll, MSVCP140.dll or VCRUNTIME140.dll is missing, then you need to download and install the Microsoft Visual C++ 2015 Redistributable.

The command line usage of is as follows:

Usage: concave.exe filename [-out arg] [-k arg] [-field_for_x arg] [-field_for_y arg] [-no_out] [-no_iterate]

 filename     (required) : file of input coordinates, one row per point.
-out          (optional) : output file for the hull polygon coordinates. Default=stdout.
-k            (optional) : start iteration K value. Default=3.
-field_for_x  (optional) : 1-based column number of input for x-coordinate. Default=1.
-field_for_y  (optional) : 1-based column number of input for y-coordinate. Default=2.
-no_out       (optional) : disable output of the hull polygon coordinates.
-no_iterate   (optional) : stop after only one iteration of K, irrespective of result.

The only required parameter is filename of input coordinates. The fields in the input file can be comma, tab or space delimited. Adjacent spaces are treated as one. By default the first two fields are used for x and y, but this can be changed with the -field_for_x and -field_for_y parameters. Output of the hull coordinates is by default to stdout, but can be directed to a file with the -out parameter. In this case output is space delimited unless the filename has a .csv extension to make it comma delimited. The parameters of -k, -no_out and -no_iterate are mostly for analysis/testing purposes but are self explanatory. 

Comparison with alpha-shapes 

Alpha-shapes are the usual mentioned way of determining concave hulls. C++ implementations of alpha-shapes are available in CGAL, PCL (Point Cloud Library) and LASLib. However these either have commercial unfriendly licensing requirements, or very onerous build dependencies that are prohibitive if otherwise not needed. 

I still did compare the performance of my program with PCL's alpha-shape implementation pcl::ConcaveHull<>. I found the Moreira-Santos algorithm to compute the concave hull mostly faster than PCL alpha-shape. I did not attempt to look for speed optimisations with the PCL code, and perhaps better performance could be had with more experience with PCL

Just like with this algorithm, the alpha-shape computation is also an iterative solution where an initial guess for α is increased or decreased repeatedly until one polygon is found and that is the concave hull. A fair comparison between the two should therefore be a single iteration only for both. i.e. a final K-value vs. a final α value. That is what I show here for the above three large examples: 

Image 11

Image 12

Image 13

Conclusion

  • The Moreira-Santos algorithm is very attractive especially if the dataset size is fairly small, say in the tens of thousands. 
  • When datasets are in the order of many hundreds of thousands or millions as is typical with Lidar data, then the algorithm will need further optimisations over and above what I achieved. 
  • The bottleneck is the iteration criteria of all points passing a point-in-polygon test. A faster point-in-polygon test, or more selective use of it, would improve performance.
  • My implementation of the algorithm is lightweight and depends only on Flann. It doesn't really satisfy my requirements of large datasets, but it might prove useful to others.

References

  1. The Moreira-Santos paper: Concave Hull: A k-nearest neighbours approach for the computation of the region occupied by a set of points.
  2. FLANN - Fast Library for Approximate Nearest Neighbors
  3. PCL - Point Cloud Library
  4. Civil Designer was used for the screenshots.

History

Aug 2017: Initial post.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
South Africa South Africa
Capetownian, software developer, land surveyor.

Comments and Discussions

 
QuestionNot able to compile using VC++ 2022 Pin
tyjiang31-Mar-24 16:25
tyjiang31-Mar-24 16:25 
QuestionNot giving results when on Iterate Pin
Member 1607541124-Dec-23 4:22
Member 1607541124-Dec-23 4:22 
SuggestionAlso suitable for enclosed interior holes Pin
Manicat23-Dec-22 4:42
Manicat23-Dec-22 4:42 
GeneralRe: Also suitable for enclosed interior holes Pin
Alasdair Craig23-Dec-22 20:34
Alasdair Craig23-Dec-22 20:34 
PraiseAdapted your code Pin
Manicat23-Dec-22 8:55
Manicat23-Dec-22 8:55 
QuestionBUILD DLL to reuse in other .NET languages like C# Pin
Romulo Magalhaes8-Dec-22 3:54
Romulo Magalhaes8-Dec-22 3:54 
QuestionResults are not as expected when the code with the same dataset runs in Unix platform Pin
Member 145331278-Jun-20 15:36
Member 145331278-Jun-20 15:36 
Bug[UNIX] executables compiled with other compiler than visual studio obtain poor results Pin
jrobichaud7-Jan-20 6:22
jrobichaud7-Jan-20 6:22 
QuestionProblems With Co-Linear Points Pin
Kyudos12-Nov-19 11:06
Kyudos12-Nov-19 11:06 
I think some 'unsolvable' problems here arise when points are co-linear or nearly so. Without some tolerance on the angle normalisation, for instance, when the initial point is added back in (on step 4), it can be given an angle of ~2PI, instead of ~0. This can then make it the 'most clockwise' of the nearest points, messing things up.

I can also confirm that the 'inside' test doesn't always work, but I haven't yet investigated replacing it with something more robust (e.g., perhaps using a tolerance to check if points are on the boundary, before checking if they are 'inside').

In my test cases co-linear starting points with decreasing clockwise angle (i.e., sloping up) cause problems, whereas increasing angle (or sloping down) is OK.

EDIT:

In my problem cases, this results in better behaviour:

C++
// Return angle in range: 0 <= angle < 2PI
double NormaliseAngle(double radians)
{
	if (radians < 0.0)
	{
		if (fabs(radians) < 0.02) // Approx 1 degree
		{
			return 0.0L;
		}
		else
		{
			return radians + M_PI + M_PI;
		}
	}
	else
	{
		return radians;
	}
}


modified 12-Nov-19 19:51pm.

QuestionGuide for compiling and running experiment Pin
Member 145518206-Aug-19 2:04
Member 145518206-Aug-19 2:04 
QuestionGenerally works well with my datasets but sometimes never completes and runs endlessly. Any fixes? Pin
Alec Barker25-Jan-19 13:59
Alec Barker25-Jan-19 13:59 
AnswerRe: Generally works well with my datasets but sometimes never completes and runs endlessly. Any fixes? Pin
GregDude11-Feb-19 14:28
GregDude11-Feb-19 14:28 
GeneralRe: Generally works well with my datasets but sometimes never completes and runs endlessly. Any fixes? Pin
Kyudos12-Nov-19 11:07
Kyudos12-Nov-19 11:07 
GeneralRe: Generally works well with my datasets but sometimes never completes and runs endlessly. Any fixes? Pin
GregDude12-Nov-19 21:48
GregDude12-Nov-19 21:48 
GeneralRe: Generally works well with my datasets but sometimes never completes and runs endlessly. Any fixes? Pin
Kyudos13-Nov-19 8:56
Kyudos13-Nov-19 8:56 
GeneralRe: Generally works well with my datasets but sometimes never completes and runs endlessly. Any fixes? Pin
GMGDerek17-Jun-22 10:14
GMGDerek17-Jun-22 10:14 
BugDid not work with a small set of random points Pin
Andreas Killer3-Sep-18 2:10
Andreas Killer3-Sep-18 2:10 
GeneralRe: Did not work with a small set of random points Pin
jerry c simington4-Jan-19 8:04
jerry c simington4-Jan-19 8:04 
GeneralRe: Did not work with a small set of random points Pin
Alasdair Craig8-Jun-19 1:04
Alasdair Craig8-Jun-19 1:04 
Questioncan't get same performance Pin
shandoosheru7-Aug-18 11:18
shandoosheru7-Aug-18 11:18 
Questioncan i get a 3D concave hull from point cloud? Pin
Member 139347415-Aug-18 5:13
Member 139347415-Aug-18 5:13 
AnswerRe: can i get a 3D concave hull from point cloud? Pin
Member 1472863628-Jan-20 1:19
Member 1472863628-Jan-20 1:19 
QuestionIs there a parameter to adjust concaveness? Pin
Seunghoon.Shin12-Jul-18 0:21
Seunghoon.Shin12-Jul-18 0:21 
AnswerRe: Is there a parameter to adjust concaveness? Pin
Alasdair Craig17-Jul-18 20:22
Alasdair Craig17-Jul-18 20:22 
QuestionPort to C++ Builder from Embarcadero Pin
Peter Stadler29-May-18 21:58
Peter Stadler29-May-18 21:58 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.