Click here to Skip to main content
13,456,475 members
Click here to Skip to main content
Add your own
alternative version


29 bookmarked
Posted 3 Aug 2003

Template Specialisation - An Introductory Survey

, 3 Aug 2003
Rate this:
Please Sign up or sign in to vote.
An in-depth description of the user's perspective to working with complete and partial template specialisations and why they are useful.


Modern programming is more than just producing a big slew of C code that seemingly gets the job done and should work without any major problems. As C programmers, we have probably experienced the need to reinvent basic data structures over and over again, thus C++ entered the scene and with its standard library produced a number of common data structures, but it did more than that – it provided common data structures that were type safe and generic at the same time. If you have had some exposure to C++, and I hope you have if you are reading this, then it is fairly obvious of what I’m talking... templates.

What many books describe as being templates is “a way to generalise the type that a function or class uses” – an example of this would be a single-linked list taking a user-defined type. C++’s template system provides more than that, it provides further control than merely generalising over a type, it provides functionality for creating specific implementations for specific types through (partial) template specialisation. This may boost the efficiency of an implementation greatly as you can usually make more presumptions on your data than your compiler can, because your compiler will always need to make a conservative guess about your intentions.

In this article I will cover how to specialise templates, both completely and partially, and illustrate how this might help in increasing the efficiency of an implementation greatly, and I will also show some more strange tricks that rely on more in-depth knowledge of the template mechanism more-so than it is a way to ‘generalise’ types.

A Motivating Example

Let’s presume that we have a functor, we cannot rely solely on functions because we cannot partially specialise these1, that copies data from one std::vector to another:

<PRE lang=c++>#include <vector> #include <cstdlib> template<typename T> class copier { public: void operator()(std::vector<T>& dest, const std::vector<T>& src) { dest.resize(src.size()); for (size_t i = 0; i < src.size(); ++i) dest[i] = src[i]; } };

Since we cannot make presumptions of our data this is the best we can do... or is it?

Complete Specialisation

A complete specialisation of a template is one where we assign a type/value to each template parameter in a new declaration of a class. With our motivating example this is fairly easy as there is only one parameter. To return to the question I let hang in the air in the previous section: “[...] this is the best we can do... or is it?”, what if we knew that our type was a pointer-to-an-int, in that case we would be able to use std::memcpy to move the contents2 from the source to the destination. So, how exactly do we specialise this thing?

<PRE lang=c++>template<> class copier<int*> { public: void operator()(std::vector<int*>& dest, const std::vector<int*>& src) { dest.resize(src.size()); std::memcpy(&dest[0], &src[0], sizeof(int*)*src.size()); } };

This doesn’t seem so bad, apart from the code duplication, of course, and the way to specialise should be fairly straightforward. The above specialisation of our copier will approximately (without optimisations turned on) run somewhere between three and seven times faster than the general version, which is, of course, a rather nice speed improvement. However, it becomes tedious, and not to mention error-prone, to do this for each and every single type you will use your functor on. It would be a lot better if we could say given a type T that is indeed a pointer, let us use std::memcpy instead of a per element copy.

Partial Specialisation

There is, of course, also a way to accomplish specialising on some type that is a pointer, this is through partial template specialisation3. Partial specialisation looks like normal template declaration coupled with complete specialisation, like this:

<PRE lang=c++ id=PRE1>template<typename T> class copier<T*> { public: void operator()(std::vector<T*>& dest, const std::vector<T*>& src) { dest.resize(src.size()); std::memcpy(&dest[0], &src[0], sizeof(T*)*src.size()); } };

This looks surprisingly a lot like our complete specialisation for the int, right? So with an extra 8 lines of code we now have an appealing speed gain for all pointers as we can utilise the faster std::memcpy .

There are, of course, more advanced things one can accomplish using this: compile-time based size constraints, compile-time assertions, type manipulations, and many, many more. This beginning was, however, merely a short example of why it can be interesting to use complete and/or partial template specialisation.

Notes and Caveats

Due to the fact that we can't guarantee that std::vector uses contiguous storage it would be sensible to, as one normally would, use std::copy appropriately, but this illustrates how to make a functor-version of this, providing std::vector does indeed use contiguous storage.

Timing the Difference

To notice the benefit of our specialisation it is, of course, useful with a good benchmark, and for this I’ve written the following little program. It is written for linux, but I am sure that it can be reworked to work in windows with relatively little effort. I will just present main here as the specialisations are shown above. First compile without the specialisation and then later with the specialisation and notice the difference in speed.

<PRE lang=c++>int main() { std::vector<long long> times; // make the benchmark over 10 iterations for (size_t i = 0; i < 10; ++i) { std::vector<int*> dest; std::vector<int*> src; for (int i = 0; i < 1000000; ++i) src.push_back((int*)i); timeval tv, tv2; gettimeofday(&tv, 0); copier<int*> cp; cp(dest, src); // call the operator() gettimeofday(&tv2, 0); // compute the time taken to copy all the elements long long l = (tv2.tv_sec - tv.tv_sec) * 1000000 + tv2.tv_usec - tv.tv_usec; times.push_back(l); } // compute the average time of the 10 runs: long long l = 0; for (size_t i = 0; i < times.size(); ++i) { l += times[i]; } l /= times.size(); std::cout << l << ’\n’; return 0; }

I have run the above on my 1.5 GHz Athlon XP+ with 256 MB RAM and it produced the following results:

VersionTime (in micro-seconds)
Without Specialisation292925
With Specialisation51254

That is, indeed, a speed benefit that is tangible, and one that should provide the first inkling of motivation.

Non-Type Related Templates

Apart from providing a generalisation over user-defined types templates can also be used together with a number of simple types, bool, (unsigned) char, (unsigned) short, (unsigned) int, (unsigned) long, enumerations and pointers. In other words templates can be used with bool, simple integer-types and pointers. This can, for instance, be used for compile-time computations that aren’t trivial and will incur a noticeable overhead at runtime. We might, for instance, be computing something requiring the calculation of the same factorial in the body of a loop that is run hundreds of times, rather than compute this at runtime we would like to calculate it to a constant number and just use this, however just replacing the call to the factorial function with a number introduces one of the so-called ‘magic’ numbers that will leave maintainers scratching their heads in puzzlement. So... time for yet another motivating example:

<PRE lang=c++>for (unsigned int i = 0; i < 1000; ++i) { ... fac(10) * ... ... }

This will evaluate fac(10) for every single iteration, but we would, of course, like to do better than that (we cannot rely on the compiler hoisting the computation from the loop as it is a function call). This is critical code after all, but we would like all our changes to take place so another programmer will be able to deduce the meaning of the code without having to ponder said magic constants. So what can we do? Considering this is a paper on templates I'm almost positive that it was a leading question.

So how, exactly, do we use the integral types with templates? Almost exactly like we abstract types, let us try to implement a simple recursive factorial using templates. I will come back to why it has to be recursive later in the article.

<PRE lang=c++>template<unsigned long n> struct factorial { static const unsigned long result = n * factorial<n-1>::result; };

This code might require a bit of explanation. The template ``type'' is as we are used to it, just instead of class or typename we have now used unsigned long instead. Nothing too complicated there. To be able to specify a value of a variable inside a class4 the variable must both be static and const. Compared to a normal recursive definition of factorial we can see the similarity:

<PRE lang=c++>unsigned long factorial(unsigned long n) { if (n <= 1) return 1; else return n * factorial(n-1); }

You may already have spotted the problem with our template code now... we don't have the check for the stopping criterion, oh horror! Our template will just keep on going and going and going, we will have sent the compiler into an infinite loop5 trying to compile our code (!) If I didn't have a solution to this it would be a rather short and disappointing article. We can, of course, use our knowledge from the last section and completely specialise the factorial template to catch n equal to zero, like this:

<PRE lang=c++>template<> struct factorial<0> { static const unsigned long result = 1; };

So when we instantiate factorial with a zero result will be 1. Now it is time to see how, exactly, all this takes place within the compiler. Knowing how the compiler deals with things often makes it a lot easier to choose the better way to implement something. Do remember that this is happening at compile time, your compiler is actually doing the computation for you when it is compiling your source file. So let us presume we instantiate our factorial template with 10 like this:

<PRE lang=c++>std::cout << factorial<10>::result << '\n';

What exactly happens at the instantiation of factorial<10>::result? It implicitly instantiates factorial<9>, which instantiates factorial<8>, etc. down to factorial<0>, which returns 1. When the 1 is found we proceed all the way back up through the instantiation sequence and thus gets the following sequence of computations: 1*1*2*3*4*5*6*7*8*9*10, and the result of this will be stored in factorial<10>::result .

Using all of this we can apply it to our loop mentioned earlier and we get this:

<PRE lang=c++>for (unsigned int i = 0; i < 1000; ++i) { ... factorial<10>::result * ... ... }

Considering factorial<10>::result is evaluated at compile-time we will merely be multiplying with a constant at runtime and we have thus saved a significant amount of computations in a critical loop.

There are, of course, other things that we can use this for that aren't only related to mathematics, some of the things you might have heard of, or will come to hear of is compile-time assertions and compile-time enforcements. For good measure I will illustrate the first of these here.

Compile-time Assertions

The C++ Standard Library provides the assert macro in the cassert header, but whatever you assert here will only be checked at runtime - that is one more thing to do at runtime that might be unnecessary. There are, of course, some things you want to assert at runtime, but what about the things we can tell already at compile-time?

What exactly is there that we can assert at compile-time? Plenty of stuff, actually. If we know the size of two matrices at compile-time we can also check that they comply with the normal mathematical rules if we wish to multiply them with each other. The benefit of doing this at compile-time is, once again, that we save the time at runtime. This means that the program will potentially run faster for those of you following at home. The way to do compile-time assertions is actually amazingly simple, and like all good ideas you sit wondering afterwards why you didn't think of that yourself.

<PRE lang=c++>template<bool> struct CTAssert; template<> struct CTAssert<true> {};

This code is, indeed, fairly simple, right? What we do is we forward-declare a general CTAssert taking a bool (which is a fairly limited data structure as values can either be true or false). We then proceed to make a complete specialisation for when the bool is true, but we omit any definition of the false specialisation or the general implementation. This means that if you try to use it with a bool that is false it cannot figure out what class to instantiate (as it hasn't been defined), so it will issue a compile-time error. Pretty fancy, isn't it?

So how are we going to use it, and for what? We can use it in any place we might've used assert to test something that we already know at the time of compilation, for instance whether sizeof(int) is 4. This is accomplished using: CTAssert<sizeof(int) == 4>(); in some code block. As this is an empty object any optimising compiler should be able to optimise the construction away if it isn't subsequently used (which is not the case in the example, at least). The use of CTAssert to test the size of int might seem a bit dull to you, but you could, for instance, use it to test whether the compile-time known sizes of two matrices fit for multiplication, and there are many other things you can use it for.

Summing up the Paradigms

As the past sections should have illustrated we have in templates an entire unique programming language within the confines of C++. The templates actually feature a simple second order lambda calculus6 based system (that is a good thing in case you are wondering). Unlike the rest of C++ templates aren't bound in the procedural paradigm, but rather in the functional programming language paradigm7 and as much as a pure functional programming language templates only provide some concepts. The concepts it provide is, however, adequate for pretty much everything: recursion and testing conditions. You have seen examples of both previously, and soon it is time to see how we can use these to design two mathematical vector classes with a maximal amount of code reusability between the two implementations.


We have already seen recursion implemented using templates in the factorial example where the template calls itself with a value one lower. This is recursion in its most easily identifiable form. Whenever you need to make a loop with templates you use recursion, be it for iterating over types, numbers, or enumerations.


We can hardly live without our ancient and trust-worthy if-construct, all non-trivial programming depends on it, and as such it is fortunate that we have it with templates as well. ``Where?'', you may ask. It is the complete and partial specialisations, nothing less, nothing more. Remember the factorial example once more, it was through complete specialisation we tested for when we called factorial with 0 and it is through complete and/or partial specialisation that we can test on one or more of the template arguments. Of course doing or-constructs (that with partial specialisation) is a great deal more `wordy' than a normal if-construct, but the mere fact that we are able to do so will aid us greatly in the future as we delve deeper into the art of template metaprogramming.

An Advanced Example: Mathematical Vectors

There is a duality between many things in how you can implement them with C++: as a run-time size specified data structure or as a compile-time size specified data structure. As such you seem to get either or, never both within the same data structure, and hardly ever with the exact same interface. There is, of course, a way to remedy this, which I am about to explain. A way that I have often a time used when developing templated code for my template library: template metaprogramming.

I will make the (evil) presumption that you, the reader, is proficient enough in mathematics to understand vectors and what they do and instead of explaining these things focus on how we are going to implement it. The first, and most obvious, thing to do would be to design the call interface.

The Interface

I will here only describe the public interface, which is to be shared between the compile-time size specified vector and the runtime size specified vector, because it is in the private sections of the scene that the true adventure will unfold.

<PRE lang=c++>template<typename Type, ...> // to appear later class Vector { public: Vector(); Vector(const Vector& v); Vector(size_t dimension, const Type& t); ~Vector(); Vector& operator=(const Vector& v); inline size_t size() const; inline size_t length() const; inline T& operator[](size_t idx); inline const T& operator[](size_t idx) const; Vector& operator+=(const Vector& v); Vector operator+(const Vector& v) const; Vector& operator-=(const Vector& v); Vector operator-(const Vector& v) const; Vector operator-() const; // unary negation Type operator*(const Vector& v); Vector& operator*=(const Type& t); Vector operator*(const Type& t) const; friend Vector operator*(const Type& t, const Vector& v); bool operator==(const Vector& v) const; bool operator!=(const Vector& v) const; };

These were all trivial functions to implement when we had a simple Vector without all these template frills and whatnot8. They will, actually, look almost completely alike this time around, with the exception of size, which will differ. Having established the external interface of the Vector it is time to figure out what is going to go on behind the scenes - how to choose between runtime and compile-time based storage.

Behind the Scenes

The tricky work happens behind the scenes, as is usually the case with templates. The devise goes 'Easy to use, hard to write'. First we need two distinct classes to specialise to depending on whether we are using runtime sized vectors or compile-time sized vectors. These we will call Vector_rtsize and Vector_ctsize respectively. The latter of these will be templated with a size (we need to be able to specify the size somehow).

<PRE lang=c++>class Vector_rtsize {}; template<size_t dimension> class Vector_ctsize {};

So now we have two distinct classes to work with. We now need to define the actual storage. To do that we need to know the type of the data we are going to store and the size of the data to store (in case it is compile-time sized). If we call our storage class for Vector_storage then an initial sketch of the runtime-sized storage could look like this:

<PRE lang=c++>template<typename StorageType, typename Type> struct Vector_storage; template<typename Type> struct Vector_storage< Vector_rtsize, Type > { Vector_storage() {} Vector_storage(size_t dim, const Type& t) : elements(dim, t) {} Vector_storage(const Vector_storage& v) : elements(v.elements) {} ~Vector_storage() {} Vector_storage& operator=(const Vector_storage& v) { elements = v.elements; return *this; } inline size_t size() const { return elements.size(); } std::vector<Type> elements; };

The actual body here isn't too hard, it is the specialisation that matters to us mainly. As you may recall this is a partial template specialisation that takes Vector_storage and specialises it to Vector_rtsize as a Storage specifier while leaving Type still a template parameter. While the contents of Vector_storage specialised to Vector_ctsize is remarkably the same contents-wise it will still be interesting to see how the actual specialisation is done, considering Vector_ctsize takes a template parameter on its own and that size() relies on the compile-time constant as well.

<PRE lang=c++>template<typename Type, size_t dimension> struct Vector_storage< Vector_ctsize<dimension>, Type > { ... inline size_t size() const { return dimension; } Type elements[dimension]; };

So we can also specify further template parameters to specialisations as can be seen above. This allows for even more expressive power when programming with templates.

What we have now is a general interface for mathematical vectors, as well as two distinct storage policies, one using the free store, the other using the stack. All we need to do now is add it all together. This we do by instantiating the appropriate storage inside the Vector class and for that we need to know the storage policy, thus we make this a part of the template parameter. So, without further ado, I give you the entirety of the Vector class and its auxiliary classes:

<PRE lang=c++>#include <vector> #include <stdexcept> class Vector_rtsize {}; template<size_t dimension> class Vector_ctsize {}; template<typename StorageType, typename Type> struct Vector_storage; template<typename Type> struct Vector_storage< Vector_rtsize, Type > { Vector_storage() {} Vector_storage(size_t dim, const Type& t) : elements(dim, t) {} Vector_storage(const Vector_storage& v) : elements(v.elements) {} ~Vector_storage() {} Vector_storage& operator=(const Vector_storage& v) { elements = v.elements; return *this; } inline size_t size() const { return elements.size(); } std::vector<Type> elements; }; template<typename Type, size_t dimension> struct Vector_storage< Vector_ctsize<dimension>, Type > { Vector_storage() {} Vector_storage(size_t dim, const Type& t) { assert(dim == dimension); for (size_t i = 0; i < dimension; ++i) elements[i] = t; } Vector_storage(const Vector_storage& v) { std::copy(v.elements, v.elements + dimension, elements); } ~Vector_storage() {} Vector_storage& operator=(const Vector_storage& v) { std::copy(v.elements, v.elements + dimension, elements); return *this; } inline size_t size() const { return dimension; } Type elements[dimension]; }; template<typename Type, class Storage> class Vector { public: Vector() {} Vector(size_t dim, const Type& t) : storage_(dim, t) {} Vector(const Vector& v) : storage_(v.storage_) {} ~Vector() {} Vector& operator=(const Vector& v) { storage_ = v.storage_; return *this; } inline size_t size() const { return storage_.size(); } inline size_t length() const { return size(); } inline Type& operator[](size_t idx) { if (idx >= size()) throw std::out_of_range(); return storage_.elements[idx]; } inline const Type& operator[](size_t idx) const { if (idx >= size()) throw std::out_of_range(); return storage_.elements[idx]; } Vector& operator+=(const Vector& v) { if (size() != v.size()) throw std::runtime_error("Size mismatch"); for (size_t i = 0; i < size(); ++i) storage_.elements[i] += v[i]; return *this; } Vector operator+(const Vector& v) const; Vector& operator-=(const Vector& v); Vector operator-(const Vector& v) const; Vector operator-() const; // unary negation Type operator*(const Vector& v); Vector& operator*=(const Type& t); Vector operator*(const Type& t) const; friend Vector operator*(const Type& t, const Vector& v); bool operator==(const Vector& v) const; bool operator!=(const Vector& v) const; private: Vector_storage<Storage, Type> storage_; };

The remainder of the Vector methods are fairly trivial to write and I will, as most other lazy authors, leave it to the avid reader to write these. There should never be written code without at least a single example or two on how to use it, so a small example of instantiating the Vector in different ways and to use them is shown in the next piece of code:

<PRE lang=c++>#include <ostream> #include <iostream> template<typename T, class S> std::ostream& operator<<(std::ostream& os, const Vector<T,S>& v) { os << '{'; for (size_t i = 1; i < v.size(); ++i) os << v[i-1] << ", "; if (v.size()) os << v[v.size()-1]; os << '}'; return os; } int main() { Vector<int, Vector_ctsize<3> > a, b; a[0] = 1; a[1] = 2; a[2] = 3; b[0] = 4; b[1] = 5; b[2] = 6; std::cout << a+b << '\n'; Vector<float, Vector_rtsize> c(8, 0.5f); std::cout << c << '\n'; }

This will print the following text:

{5, 7, 9}
{0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5}

The Vector at a Review

What was it that allowed us to create one vector class yet be able to differentiate between storage mechanisms: using the free store or using the stack. Rather than specialising the actual Vector class we have specialised the storage of the Vector as appropriate. This leaves us with the same interface for the class the user sees, but with differing internals depending on the template parameter.

This is but one more place where partial template specialisation may aid our programming and allow us to use the stack-bound version, which doesn't have the dynamic memory overhead, in time-critical code, and allow us to use the dynamic version when we cannot predict at compile time the size of the storage. Much like using fixed-width arrays as opposed to dynamic arrays in C.

All in all we have a Vector where the storage method is transparent to the user (just like any good object-oriented encapsuled class should guarantee), but with which the user can transparently switch between storage types without changing any other places in the code but the declaration of the variable. This is in places known as a policy-based design that defers several choices of specific implementation to the application programmer rather than letting the library developer take the decisions. Policy-based design is described thoroughly in [Alexandrescu] chapter 1.

Further Down the Road

What you have learned above, providing you have actually read it all and managed to digest it, should provide you with the tools to program just about anything relating to template specialisations. The basic understanding of templates I trust you had already prior to throwing yourself into this article. So what are some things that you are ready to tackle and play with? To name a few: smart pointers, type lists, matrix classes, object factories, and many, many other design patterns. Further information on design patterns and their applicability in object-oriented design can be found in [GOF].


1 It is possible to completely specialise functions, and it is used in several places of the C++ Standard Library.

2 This is presuming std::vector uses contiguous memory, as it is now this is not a requirement on behalf of the standard, but as to my knowledge then all implementations of the STL uses contiguous memory for it. However, this is merely to serve as an example.

3 At the time of writing only a small handful of compilers supports this, so take great care what compiler you use.

4 A struct is equivalent to a class in C++, just with a different default visibility specifier.

5 This isn't exactly correct as most compilers will have a limit on the maximum recursion depth of template parameters so most compilers will produce an error message stating that the maximum recursion depth has been exceeded, but in theory we would get an infinite loop.

6 A typed lambda calculus.

7 This might be why it draws us nerdy Computer Science majors to experiment with it.

8 Providing you have actually tried to implement a Vector class before.


[Alexandrescu]: Andrei Alexandrescu, Modern C++ Design: Generic Programming and Design Patterns Applied. Addison-Wesley, Inc., 2001

[GOF]: Erich Gamma, Richard Helm, Ralph Johnson & John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Inc., 1995


5th August 2003: Initial version uploaded.


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here


About the Author

Henrik Stuart
Denmark Denmark
No Biography provided

You may also be interested in...

Comments and Discussions

GeneralRe: vector's *do* use contiguous memory. Pin
Taka Muraoka5-Aug-03 1:07
memberTaka Muraoka5-Aug-03 1:07 
GeneralRe: vector's *do* use contiguous memory. Pin
Taka Muraoka7-Aug-03 15:26
memberTaka Muraoka7-Aug-03 15:26 
GeneralRe: vector's *do* use contiguous memory. Pin
Henrik Stuart7-Aug-03 23:18
memberHenrik Stuart7-Aug-03 23:18 
GeneralRe: vector's *do* use contiguous memory. Pin
peterchen5-Aug-03 1:32
memberpeterchen5-Aug-03 1:32 
GeneralRe: vector's *do* use contiguous memory. Pin
Johnny ²5-Aug-03 2:09
memberJohnny ²5-Aug-03 2:09 
GeneralRe: vector's *do* use contiguous memory. Pin
peterchen5-Aug-03 6:12
memberpeterchen5-Aug-03 6:12 
GeneralRe: vector's *do* use contiguous memory. Pin
Henrik Stuart5-Aug-03 3:07
memberHenrik Stuart5-Aug-03 3:07 
GeneralRe: vector's *do* use contiguous memory. Pin
Nemanja Trifunovic5-Aug-03 5:09
memberNemanja Trifunovic5-Aug-03 5:09 
GeneralRe: vector's *do* use contiguous memory. Pin
peterchen5-Aug-03 5:30
memberpeterchen5-Aug-03 5:30 
GeneralRe: vector's *do* use contiguous memory. Pin
Henrik Stuart5-Aug-03 5:46
memberHenrik Stuart5-Aug-03 5:46 
GeneralRe: vector's *do* use contiguous memory. Pin
Nemanja Trifunovic5-Aug-03 5:49
memberNemanja Trifunovic5-Aug-03 5:49 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web03-2016 | 2.8.180322.1 | Last Updated 4 Aug 2003
Article Copyright 2003 by Henrik Stuart
Everything else Copyright © CodeProject, 1999-2018
Layout: fixed | fluid