Just a few words to introduce the "why" of this series of articles.
After my submission to CP of the article MFC, namespaces, MI and Serialization, a friend working on a WTL project phoned me and asked, "Good thing, but can I use it with a WTL project?"
My very first answer was … "Why not! After all, there is very little MFC in there – apart from the demo app, of course. The only thing you'll miss will probably be …"
But when I started to look at whether and how to port the code, I found myself in a coding dilemma.
WTL is a template library. Many of my classes are too, but others are not. Should I provide them in "inline" form, or in library form? And what changes in the final code of an application if I choose "inline" rather than "library"?
Other problems come from the nature of WTL itself: as a "template library", it produces no binaries until it is used (no OBJ, LIB or DLL comes directly from its sources). But among my classes there are certain global or static objects that must exist in memory even before you instantiate your own objects (think of SFactory from that article).
And what should I do with my code? Make it a template library too, or a normal static library?
In this article
I'll not supply any code (I'm currently working on the port from MFC to WTL), but I want to discuss some aspects of code styling, general "good programming techniques", and how they can be implemented in libraries. The idea is to share opinions on the subject. The goal is "writing a Windows application" (an exe) on top of the Windows API.
The use of MFC or WTL as an API wrapper can also be considered, but not the use of those libraries as "frameworks" (sets of interrelated classes), because we cannot compare them as such: they take different approaches to various things, so it is improper to compare code coming from both frameworks.
What and Why libraries?
A library – in the general sense – is a "collection of code" that someone wrote and made available for others to use.
The reason for doing this can be described in terms of two needs: code reusability and code structuring.
Different models, and even philosophies, have been invented to do this, depending on what the "collection" is and what the "usage" is. In this sense, we can speak about "open" and "closed" code.
Different coding techniques (like “information hiding” or “code sharing”) may then be differently used in these two different situations.
At this point, some confusion arises from mixing different concepts (which code, to do what, and how to do it) under similar terminology (the real meaning of the word "open" is quite different in each of these 2x2x2 cases, but it is always the word used), starting some strange "religious wars". Just to avoid confusion, let me clarify how some terms will be used in this article.
Information hiding – despite what many open source fans say – was not invented by Bill Gates: it is a theory that is part of information science. The concept comes from some considerations of industrial design and manufacturing:
If you are the designer of a car, you are also involved in "using" wheels: you must know about their mechanics and physics, but you're probably not interested in their production process. And often that process is in the charge of a completely different industry.
The same happens with coding:
If you are the user of an object, you must know how to use its interface, not the way the designer of that object implemented it internally (apart from certain "quality assurance" about its correctness and performance). Now the point is: are languages really oriented this way? Are "real" coding production processes really based on these assumptions?
It's my opinion that C++ isn't: if I want to "hide" the implementation, I should give you only headers (so you can derive your classes and call my functions) and binaries (LIB or DLL), but not sources. But – admitting this is good – why does C++ force me to also declare my "private" members in the header I give you? What are you supposed to do with them? (I know: there are technical reasons, since the compiler must know how much space those members require, but this – whether the reason is good or bad – violates the principle of hiding.) Again: are headers enough to describe an object's functionality? No, they aren't: they describe data and functions, but not the sequence in which you must call them to get a certain effect: they cannot describe "protocols"! You need extra documentation that says "you must first CreateWindow and then MoveWindow, and not vice versa".
There is both an excess of information (that is: entropy in the header code) and a lack of it (that is: entropy in your process, to compensate for what I didn't or couldn't document).
Is "open coding" a solution to all those problems? (I'm talking about "open source" as "source", not as a "community of developers"; thus I call it "open coding" to avoid confusion.)
Not yet. Does source code completely describe an object's behavior? (Leaving aside hiding details: that's not what open source is for, obviously!) Technically speaking, yes: after all, it is what the machine executes. But think about it a bit: if I give you a source, do you really understand what an object is for, and what it was intended for?
It depends on very many things: styling (if you don't like the way I write, you'll probably encounter more difficulties), "entanglement" (if a function has side effects on data in other objects, in completely different parts of the source(s), you may have trouble tracking them) and even psychoanalysis (why did I design it that way? What was I trying to do with that design and not another?).
Again, knowing "how" a thing is done doesn't tell you "why" it was done that way (or why it was done at all). And if you find an error, you can correct it in many ways: what is the "correct" way to fix it without changing the nature of the design? How many of you would think differently? And how many different "variations on a theme" may be born from those assumptions?
None of the philosophies by itself solves all the problems: whichever one you like, you always need some discipline to manage what the languages themselves do not provide.
The advantage of keeping the sources "open" or "closed", at this point, comes down to whether or not you consider it an advantage that everyone can somehow modify your code to suit their needs.
I don't want to go further into this argument (it's really a matter of religion, not technology: see this article... and its comments!), but what I want to make clear is that these facts (which pertain to "organizational" aspects) must not be confused with the technical aspects.
The fact that you are going to deploy a number of classes in an "open source way" does not in itself mean that you cannot organize those sources into libraries that other programmers can use without having to include your sources in their projects and recompile them. Nor that you cannot take advantage of organizing your code into a set of libraries.
Also, the fact that you're giving your sources away does not mean you can avoid documenting the "reasons" for your design and the way it was intended to be used.
Now, assuming we are going to make our sources public, what's best for a Windows application?
- Inline coding (give away a pure collection of headers)
- "Near line" coding (give away headers, but with implementation in different files than declarations)
- "Offline" coding (give away headers and sources to be included into other projects)
- Static libraries (give away headers and a .lib, with sources only for documentation and debugging)
- Dynamic libraries (give away headers and .dll plus export .lib)
Of course, no universal answer exists, but here I propose some experiments to show benefits and drawbacks.
The translation process
When you create an application, your development environment is involved in this kind of process:
- A number of headers (.h) are precompiled into a .pch
- A number of headers (.h) are included into a source (.cpp)
- A number of sources are translated into object files (.obj)
- A number of object files (.obj) are packed into a library (.lib)
- A number of object files (.obj) and libraries (.lib) are linked into a binary (.exe or .dll)
Code styling may be different in how declarations, definitions and implementations are distributed among those files. There are - however - some constraints.
Now, suppose I provide you with some classes: how should the code appear, how should it be used, and what does your translation process become?
Whatever your preferred style or framework, I think a model is generally valid that supposes:
- One precompiled header (will contain a number of frequently used headers)
- Zero or more of your headers (zero is atypical, but possible ...)
- A number of library headers (some standards, some of other origin - like mine used by you)
- Zero or more static libraries (or referenced DLL import libraries)
- Zero or more imported type libraries ("tlb")
- One or more source files (your "cpp" files)
- One or more "somebody else's sources" that you include in your project
Let's consider different possible styles.
A. Inline coding
The code looks like this:
#include "MyBase.h"
And this is the only file I should give you.
- All function bodies are inline, and all functions are declared and defined in one single step
- Instantiation requires the _declspec(selectany) declaration: if you include this header in more than one of your sources (.cpp) in the same project, the instantiation will be present in each of the .obj files the compiler generates. So I have to tell your linker to treat all those instances as a single one (not distinct ones). Otherwise, you get the message "Object already defined or declared", and if you set the linker to continue, the code from each original .cpp file will refer to its own instance of that object.
- Definitions of global functions require the inline declaration. If the function is recursive (or part of a recursion mechanism), or its body is too long, the compiler will not expand the body at the point of call, but will place an ordinary function call instead. In any case, the "inline" here plays the same role the selectany played above. In fact, member functions defined this way are also inline, but for them the inline is implicit.
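To make those mechanics concrete, here is a minimal sketch of a style A header (all names are hypothetical, not taken from the actual library). With MSVC, __declspec(selectany) lets the linker fold the duplicated instantiations into one; the non-MSVC branch is only there so the sketch compiles anywhere as a single translation unit.

```cpp
// Style A sketch: everything lives in one header (hypothetical names).

class CMyClass
{
public:
    // Declared and defined in one step: implicitly inline.
    int DoSomething(int x) { return x * 2; }

    static int s_counter;   // instantiated below, in the header itself
};

#ifdef _MSC_VER
// Tell the linker to keep a single instance even if several .cpp files
// include this header.
__declspec(selectany) int CMyClass::s_counter = 0;
#else
int CMyClass::s_counter = 0;   // fallback so the sketch builds elsewhere
#endif

// A free function must be marked inline explicitly, otherwise every .obj
// that includes this header carries its own copy and the linker complains.
inline int GlobalFunction(int x) { return CMyClass().DoSomething(x) + 1; }
```

(As an aside: C++17 later made this portable with "inline variables", which give the same single-instance guarantee without vendor extensions.)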
B. Near line coding
Style A has the advantage of putting everything in one place, but it has drawbacks:
- Code readability: you may have difficulty understanding which functions are made available: too dispersive
- Coding functionality: if A uses B and B uses A ... you cannot define A until you have defined B, and you cannot define B until you have defined A.
In this last case, the only solution is to keep declarations and definitions separate. But if I still want to use the "headers only" paradigm, here's a possible solution:
#include "MyBase.h"

inline returntype CMyClass::DoSomething( parameters ) { /* function body goes here */ }
inline returntype CMyClass::DoOverride ( parameters ) { /* function body goes here */ }
inline returntype CMyClass::DoImplement( parameters ) { /* function body goes here */ }

_declspec(selectany) CSometypeB CMyClass::s_staticMember;
_declspec(selectany) CSometypeC g_globalObject;

inline returntype GlobalFunction( parameters ) { /* function body goes here */ }
As you note, the .hpp file plays the same role as a .cpp, but it isn't one. By including it at the end of the .h, I ensure that if you include my .h in your code, I also force your compiler to translate my definitions.
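A compact sketch of the B style (hypothetical names), collapsed into one listing with comments marking where the .h ends and the .hpp begins. It also shows how separating declarations from definitions unties the A-uses-B / B-uses-A knot mentioned above:

```cpp
// --- MyClass.h --------------------------------------------------------
class B;                         // forward declaration breaks the cycle

class A
{
public:
    int UseB(B& b);              // declared only: the body needs B complete
};

class B
{
public:
    int UseA(A& a);
};

// --- MyClass.hpp (included at the very end of MyClass.h) ---------------
inline int A::UseB(B& b) { return b.UseA(*this) + 1; }
inline int B::UseA(A&)   { return 41; }
```

Both bodies appear only after both classes are fully declared, so each can refer to the other freely.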
But because you may include it in different .cpp files, we still need "inline" and "selectany".
We can conclude that – from the translation process point of view – very little changes: all symbols are generated by the compiler for every source ("cpp") that includes the headers. If more sources refer to the same headers, and the headers contain instantiations, multiple instantiations arise, so the linker must be instructed accordingly (via selectany and inline).
What really changes is the code styling. Style B is probably more suitable for long and complex classes with lots of internal helper functions: you (as a user) can concentrate better on interfaces rather than implementation details.
We must also note that this is the only possible style if the classes are templates: we cannot define templates in sources, because the compiler cannot translate them until the template parameters are supplied. And that will be done by you, not by me.
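That is why a template class such as the hypothetical one below can only ship as a header: the compiler produces no code for it until a concrete type parameter is supplied by the user's own translation unit.

```cpp
// The compiler cannot translate this until T is known, so the full
// definition must be visible wherever the template is used.
template <typename T>
class CWrapper
{
    T m_value;
public:
    explicit CWrapper(T v) : m_value(v) {}
    T Twice() const { return m_value + m_value; }
};

// Only at the point of instantiation, e.g. CWrapper<int> w(3); in the
// user's .cpp, does the compiler actually generate code for CWrapper<int>.
```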
C. Offline coding
The idea of "offline coding" is to avoid delaying code translation, performing it up front for all the classes and instances that can exist (or must exist) independently of their usage.
The same sample will look like this:
#include "MyBase.h"
#include "myclass.h"

returntype CMyClass::DoSomething( parameters ) { /* function body goes here */ }
returntype CMyClass::DoOverride ( parameters ) { /* function body goes here */ }

returntype GlobalFunction( parameters ) { /* function body goes here */ }
And here you see very few "styling" differences with respect to the previous style, but there is one substantial fact: the .cpp can be translated, and static or global objects instantiated, once. There is no need for the compiler to redo those tasks for every .cpp of your project: this is just another .cpp of your project!
But there is a problem: it's written by me, without knowing your project settings and without knowing what you put in any precompiled header, which I must include (or the compiler will not translate my source). This makes the model less attractive. It may be good if you're going to manipulate my code yourself, reusing it "adjusted" to your specific needs; not if you simply want to use it in your project.
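For comparison, here is a minimal C-style pair (hypothetical names), shown as one listing: because the .cpp is translated exactly once, the definitions need neither inline nor selectany.

```cpp
// --- myclass.h ---------------------------------------------------------
class CMyClass
{
public:
    int DoSomething(int x);      // declaration only: the body lives elsewhere
    static int s_count;
};

// --- myclass.cpp (translated once, as an ordinary project source) -------
int CMyClass::s_count = 0;       // one instantiation, one .obj, no selectany

int CMyClass::DoSomething(int x)
{
    ++s_count;                   // plain out-of-line body, no inline needed
    return x * 2;
}
```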
D. Static libraries
To overcome the previous problem, the solution is for me to provide my own project and do the translation of my code myself, letting you implement your project as "dependent on mine".
I can have my precompiled header, and you can have yours. I can have my compiler settings and you can have yours.
Code styling is identical to the previous style, but there is a problem with includes: if, in defining my headers, I use headers from other libraries without including them (relying on precompilation), you also have to precompile them.
This can be done providing:
- a header you have to precompile with yours,
- the headers describing my classes (you will include where needed) and
- a .lib file containing the translated code.
Your compiling process is simpler because you don't have to translate my classes every time you use them in every source: you simply link my translation just before creating the binary (EXE or DLL). Information hiding is simpler because I can provide only the headers you need to use my classes, not everything I happened to use. I can even arrange my classes with an internal private implementation you need not care about.
Note also that the linker does not link the whole .lib, but only the parts your sources actually reference. Efficient modularization of classes (different classes in different sources, with very little entanglement) can make your code as short and fast as it is with style A.
E. Dynamic libraries
One step further is to not only translate my classes, but also link them and give you a DLL.
To do this:
- The headers I provide to you must declare everything you'll use as __declspec(dllimport).
- I also have to provide a .lib containing the stubs that allow your sources to place calls to the functions and then perform the "relocation" towards the supplied DLL (the real function addresses are not known at compile time ...).
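The usual way to arrange those headers is an export macro, sketched below with hypothetical names: the same header declares the classes dllexport while I build the DLL and dllimport while you consume it. The non-Windows branch is only there so the sketch compiles anywhere.

```cpp
#if defined(_WIN32)
  #ifdef MYLIB_EXPORTS                       // defined only in my DLL project
    #define MYLIB_API __declspec(dllexport)  // building: export the symbols
  #else
    #define MYLIB_API __declspec(dllimport)  // consuming: import via the .lib stubs
  #endif
#else
  #define MYLIB_API                          // no-op outside Windows
#endif

class MYLIB_API CMyClass
{
public:
    int DoSomething(int x) { return x + 1; }
};
```

Your project simply defines nothing and gets the dllimport view; my DLL project defines MYLIB_EXPORTS in its settings.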
The advantages are essentially better isolation of my code from your code, and the fact that if you use the library in many applications, you load it into memory just once (a static library is linked into every executable).
The drawbacks are essentially that:
- All my code is loaded into your memory (even if you use only part of it)
- Your executable is no longer autonomous
The first point makes a DLL a real advantage only if you are going to share it among a large number of related applications. Say I provide a DLL with 3 classes and you use only two: if you are deploying just one application, you load all 3 of my classes into memory (with a static .lib, you'd link only the two you used). But if you are deploying ten applications using my classes, loading a shared DLL loads all my classes once, and – even if you use only a part – that may be better than statically linking them ten times.
It is clear, at this point, that there should be a balance between the content of a DLL, the part of that content you are interested in, and how frequently you use it.
As a consequence
It seems (to me) that the advantages of using libraries hold even with "open code": libraries are useful to give projects a better structure and to improve the way parts of the code are reused.
Probably, for relatively small frameworks, the static library approach is the best tuned: it avoids DLL complications and costs relatively little in code replication across different applications.
But it cannot be applied to templates. For them, "near line coding" is probably the closest approach.
Templates versus Inheritance
Another consideration is the way polymorphism can be implemented: with "inheritance" of bases and interfaces, or by "wrapping" with templates.
And from these philosophies originate MFC, WFC, etc. on one side, and STL, ATL, WTL, etc. on the other – or even languages like C# (which does everything with inheritance: it doesn't even have templates).
And this is another issue that's beginning to become another religious battle. But however the battle goes, what are the advantages, in terms of coding, of one way or the other?
This will be the subject for the next article, where I'll start to compare some samples.
- to be continued -