Introduction

Cpphtml is a utility that converts your C++ code to HTML. If you have a C++ file, say myprogram.cpp, and you want to put it on your website, you can run it through Cpphtml, which will convert the code to HTML with all comments, keywords and preprocessor directives highlighted. Cpphtml sends all output to cout, so you have to redirect the output to a file if you want to create an HTML file:

C:\>cpphtml myprogram.cpp >myprogram.htm

Cpphtml will convert all tabs to 4 spaces. If you want the tab size to be 8 spaces, you can specify the tab size on the command line:

C:\>cpphtml myprogram.cpp 8 >myprogram.htm

The HTML code contains a <style> element which contains style rules for comments, keywords and preprocessor directives. So, you don't have to do a search-and-replace if you want to change, say, the color of keywords. For example, if you want all keywords in bold, just change the .keyword style rule: .keyword{color:rgb(0,0,255);font-weight:bold}. It's that easy.

I don't claim Cpphtml works perfectly. I tested it on the Dinkumware STL files, the source of Cpphtml, and a large Microsoft CPP file. The results are great. Cpphtml was compiled with the Borland C++ 5.5 command line compiler: bcc32 cpphtml.cpp.

A walk through the code

#include<cstdlib>   // atoi
#include<cstring>   // strchr
#include<fstream>
#include<iostream>  // cout, cerr
#include<string>
#include<ctype.h>

Cpphtml will replace all tabs by 4 spaces if no tab size is specified. Change _TABSIZE to 8 if you want the default tab size to be 8.

#define _TABSIZE    4

using namespace std;

int tabsize = _TABSIZE;

Token is a class which represents chunks of code. A token can have the type comment, pp (preprocessor directive), keyword or code. Code is everything which is not a comment, pp or keyword. Note that there are no getter and setter methods: because operator>> and operator<< are friends of class token, we don't need any.

class token {
public:
    token() : _what(code) {}
protected:
    enum type {code, comment, pp, keyword};
    string _str;
    type _what;
    friend istream& operator>>(istream&, token&);
    friend ostream& operator<<(ostream&, const token&);
};

The function iskeyword() returns true if string s is a C++ keyword, false if not. You may not recognize some of them, e.g. and. These alternative tokens let programmers whose character set or keyboard lacks symbols such as & and | write the corresponding operators. I've never seen code that uses them, though.

bool iskeyword(const string& s)
{
    static const char* keywords[] = {
        "and",
        "and_eq",
        "asm",
        "auto",
        "bitand",
        "bitor",
        "bool",
        "break",
        "case",
        "catch",
        "char",
        "class",
        "compl",
        "const",
        "const_cast",
        "continue",
        "default",
        "delete",
        "do",
        "double",
        "dynamic_cast",
        "else",
        "enum",
        "explicit",
        "export",
        "extern",
        "false",
        "float",
        "for",
        "friend",
        "goto",
        "if",
        "inline",
        "int",
        "long",
        "mutable",
        "namespace",
        "new",
        "not",
        "not_eq",
        "operator",
        "or",
        "or_eq",
        "private",
        "protected",
        "public",
        "register",
        "reinterpret_cast",
        "return",
        "short",
        "signed",
        "sizeof",
        "static",
        "static_cast",
        "struct",
        "switch",
        "template",
        "this",
        "throw",
        "true",
        "try",
        "typedef",
        "typeid",
        "typename",
        "union",
        "unsigned",
        "using",
        "virtual",
        "void",
        "volatile",
        "wchar_t",
        "while",
        "xor",
        "xor_eq"
    };

    for (int i = 0; i < sizeof(keywords) / sizeof(char*); i++)
        if (string(keywords[i]) == s)
            return true;

    return false;
}
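
The linear scan above is perfectly adequate for a table of this size. As the table appears to be in alphabetical order already, a binary search is also possible. The following is only a minimal sketch of that idea: iskeyword_sorted() and less_cstr() are hypothetical names that are not part of Cpphtml, and the table is shortened for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Orders C strings the same way the table below is sorted.
static bool less_cstr(const char* a, const char* b)
{
    return std::strcmp(a, b) < 0;
}

// Hypothetical alternative to iskeyword(): binary search over a sorted table.
// Only a few keywords are listed; the full table must stay in strcmp order.
bool iskeyword_sorted(const std::string& s)
{
    static const char* const keywords[] = {
        "and", "bool", "class", "double", "float", "int", "while", "xor_eq"
    };
    const std::size_t n = sizeof(keywords) / sizeof(keywords[0]);
    return std::binary_search(keywords, keywords + n, s.c_str(), less_cstr);
}
```

The lookup only works while the table stays sorted, so any keyword added later has to be inserted at the right position.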

The function containspp() returns true if string s contains a substring which is a preprocessor directive. A token of type pp can contain a string of the form "#...define" (whitespace may separate the '#' from the directive name), so we have to search for a substring.

bool containspp(const string& s)
{
    static const char* pptokens[] = {
        "define",
        "elif",
        "else",
        "endif",
        "error",
        "if",
        "ifdef",
        "ifndef",
        "include",
        "line",
        "pragma",
        "undef"
    };

    for (int i = 0; i < sizeof(pptokens) / sizeof(char*); i++)
        if (s.find(pptokens[i]) != string::npos)
            return true;

    return false;
}
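
Because find() accepts any string that merely contains a directive name, a token such as "#elseif" is also classified as pp. A stricter check could skip the '#' and the whitespace and then compare the rest of the token exactly. This is a sketch only; isppdirective() is a hypothetical name and not part of Cpphtml.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stricter variant of containspp(): the directive name must match exactly.
bool isppdirective(const std::string& s)
{
    static const char* const names[] = {
        "define", "elif", "else", "endif", "error", "if",
        "ifdef", "ifndef", "include", "line", "pragma", "undef"
    };
    if (s.empty() || s[0] != '#')
        return false;
    // Skip the whitespace that may follow the '#'.
    const std::string::size_type pos = s.find_first_not_of(" \t\r\n", 1);
    if (pos == std::string::npos)
        return false;
    const std::string name = s.substr(pos);
    for (std::size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
        if (name == names[i])
            return true;
    return false;
}
```

With this variant "#  include" is still accepted, while "#elseif" and a lone "#" are not.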

Operator>> extracts a token from an input stream. It recognizes "//" and "/*...*/" comments, preprocessor directives of the form "#...define", and keywords. String and character constants are also recognized, so that keywords inside them are not highlighted.

istream& operator>>(istream& is, token& t)
{
    t._str = "", t._what = token::code;
    int c = is.get();
    switch (c) {
        case '/':
            c = is.get();
            if (c == '*') {
                t._str = "/*";
                t._what = token::comment;
                while (1) {
                    c = is.get();
                    if (c == EOF)
                        return is.unget(), is.clear(), is;
                    if (c == '/') {
                        if (t._str.length() > 2 &&
                            t._str[t._str.length() - 1] == '*') {
                            return t._str += '/', is;
                        }
                    }
                    t._str += (char)c;
                }
            } else if (c == '/') {
                t._str = "//";
                t._what = token::comment;
                c = is.get();
                while (c != '\n' && c != EOF) {
                    t._str += (char)c;
                    c = is.get();
                }
                if (c == '\n') {
                    t._str += '\n';
                }
                return is;
            }
            t._str = '/';
            return is.unget(), is.clear(), is;
        case '#':
            t._str = '#';
            c = is.get();
            while (strchr(" \r\n\t", c)) {
                t._str += (char)c;
                c = is.get();
            }
            if (c == EOF)
                return is.unget(), is.clear(), is;
            while (strchr("abcdefghijklmnopqrstuvwxyz", c)) {
                t._str += (char)c;
                c = is.get();
            }
            is.unget(), is.clear();
            if (containspp(t._str))
                t._what = token::pp;
            return is;
        case '\'':
        case '"': {
            char q = (char)c;
            t._str = q;
            while (1) {
                c = is.get();
                if (c == EOF)
                    return is.unget(), is.clear(), is;
                if (c == q) {
                    if (t._str.length() >= 2) {
                        if (!(t._str[t._str.length() - 1] == '\\' &&
                            t._str[t._str.length() - 2] != '\\'))
                            return t._str += q, is;
                    } else {
                        return t._str += q, is;
                    }
                }
                t._str += (char)c;                
            }
        }
        case 'a':
        case 'b':
        case 'c':
        case 'd':
        case 'e':
        case 'f':
        case 'g':
        case 'i':
        case 'l':
        case 'm':
        case 'n':
        case 'o':
        case 'p':
        case 'r':
        case 's':
        case 't':
        case 'u':
        case 'v':
        case 'w':
        case 'x':
            t._str += (char)c;
            c = is.get();
            while (isalpha(c) || isdigit(c) || c == '_') {
                t._str += (char)c;
                c = is.get();
            }
            is.unget(), is.clear();
            if (iskeyword(t._str))
                t._what = token::keyword;
            return is;
        case EOF:
            return is;
        default:
            t._str += (char)c;
            c = is.get();
            while (c != '/' && c != '#' && !strchr("abcdefgilmnoprstuvwx", c) &&
                c != '\'' && c != '"' && c != EOF) {
                t._str += (char)c;
                c = is.get();
            }
            is.unget(), is.clear();
            return is;
    }
}
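
The string branch above decides whether a quote is escaped by looking at the last two characters only. With three backslashes in front of a quote (an escaped backslash followed by an escaped quote) it would end the string too early. A more general test counts the backslashes directly in front of the quote. The helper below is a sketch under that assumption; quote_is_escaped() is a hypothetical name and not part of Cpphtml.

```cpp
#include <string>

// Hypothetical helper: `sofar` is the text collected before the quote that
// was just read. The quote is escaped if an odd number of backslashes
// directly precedes it.
bool quote_is_escaped(const std::string& sofar)
{
    std::string::size_type backslashes = 0;
    std::string::size_type i = sofar.length();
    while (i > 0 && sofar[i - 1] == '\\') {
        ++backslashes;
        --i;
    }
    return backslashes % 2 == 1;
}
```

Inside the loop, the test could then read if (c == q && !quote_is_escaped(t._str)) before the closing quote is appended.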

The function html() replaces the characters '&', '<', '>' and '"' in string s with their HTML entities and replaces tabs with spaces.

string html(const string& s)
{
    string s1;
    string::size_type i;
    for (i = 0; i < s.length(); i++) {
        switch (s[i]) {
            case '&':
                s1 += "&amp;";
                break;
            case '<':
                s1 += "&lt;";
                break;
            case '>':
                s1 += "&gt;";
                break;
            case '"':
                s1 += "&quot;";
                break;
            case '\t':
                s1.append(tabsize, ' ');
                break;
            default:
                s1 += s[i];
        }
    }
    return s1;
}

Operator<< sends a token to an output stream. The code is straightforward.

ostream& operator<<(ostream& os, const token& t)
{
    // Write to the stream that was passed in, not to cout directly.
    if (t._what == token::code)
        os << html(t._str);
    else if (t._what == token::comment)
        os << "<span class=comment>" << html(t._str) << "</span>";
    else if (t._what == token::keyword)
        os << "<span class=keyword>" << html(t._str) << "</span>";
    else if (t._what == token::pp)
        os << "<span class=pp>" << html(t._str) << "</span>";
    else
        os << html(t._str);
    return os;
}
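
An inserter that writes to its os parameter instead of a fixed stream can also fill a string in memory, which is handy for coloring just a fragment of a document. The sketch below illustrates this with a simplified stand-in: mini_token and render() are hypothetical names, the members are public for brevity, and the html() escaping is left out.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Simplified stand-in for the token class (assumption: public members, two types only).
struct mini_token {
    enum type { code, keyword };
    std::string str;
    type what;
};

// Writes to whatever stream it is given.
std::ostream& operator<<(std::ostream& os, const mini_token& t)
{
    if (t.what == mini_token::keyword)
        os << "<span class=keyword>" << t.str << "</span>";
    else
        os << t.str;
    return os;
}

// Renders an array of tokens into a string held in memory.
std::string render(const mini_token* tokens, std::size_t count)
{
    std::ostringstream out;
    for (std::size_t i = 0; i < count; i++)
        out << tokens[i];
    return out.str();
}
```

The same operator<< works unchanged with cout, an ofstream, or an ostringstream.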

This is the entry point of Cpphtml. All code will be wrapped in a <pre> element. By overloading operator>> and operator<<, the while loop is very short and clean. All output is sent to cout.

int main(int argc, char **argv)
{
    if (argc != 2 && argc != 3) {
        cout << "usage: cpphtml file [tab size]" << endl;
        return 0;
    }
    ifstream is(argv[1]);
    if (!is.good()) {
        cerr << "bad input file" << endl;
        return -1;
    }
    if (argc == 3) {
        tabsize = atoi(argv[2]);
        if (tabsize <= 0)
            tabsize = _TABSIZE;
    }
    cout << "<html>" << endl 
      << "<head>" << endl 
      << "<style>" << endl;
    cout << ".keyword{color:rgb(0,0,255);}" << endl;
    cout << ".comment{color:rgb(0,128,0);}" << endl;
    cout << ".pp{color:rgb(0,0,255);}" << endl;
    cout << "</style>" << endl << "</head>" << endl << "<body>" << endl;
    cout << "<pre style=\"font-family:courier;font-size:10pt\">";
    token t;
    while (is >> t) {
        cout << t;
    }
    cout << "</pre>" << "</body>" 
         << endl << "</html>" << endl;
    return 0;
}
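
Note that atoi() accepts a leading number followed by junk, so an argument such as "8x" is silently read as 8. If stricter checking is wanted, strtol() reports where parsing stopped. The helper below is a sketch only: parse_tabsize() is a hypothetical name, and the upper limit of 64 is an arbitrary assumption.

```cpp
#include <cstdlib>

// Hypothetical helper: parse a tab size, returning `fallback` on bad input.
int parse_tabsize(const char* arg, int fallback)
{
    char* end = 0;
    const long n = std::strtol(arg, &end, 10);
    // Reject empty input, trailing junk, and values outside 1..64 (arbitrary cap).
    if (end == arg || *end != '\0' || n <= 0 || n > 64)
        return fallback;
    return static_cast<int>(n);
}
```

In main() the call would then look like tabsize = parse_tabsize(argv[2], _TABSIZE);.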
Wow. Thanks!
CaptainMorganRawks
20:50 26 Aug '07  
I began writing my own version, and then I figured I was going to need more time to be able to write this and wham! I came across yours! Thanks for saving me a lot of time coding this up. Although, that javascript version is pretty sweet and I'm not a fan of javascript, thus I think I will improve on yours to change coloring schemes and to recognize decimal values and cout output( cout << "Output text" << endl;).

Thanks for getting this puppy started. You rock.

Capt
Header
Flame Alchemist
2:50 20 Jul '07  
i think you forgot the iostream header, because the cout doesn't work... and how i can specify the path? when i drag a file to the exe the conversion begin immediately...
Re: Header
q123456789
23:56 21 Jul '07  
1) I didn't forget to include iostream, I used the free Borland C++ compiler at that time and it seems to include the necessary file. But indeed, Microsoft C++ seems not to include the file needed.

2) "...drag a file to the exe..."? I didn't even know you could do that, I learned something new! You could change the utility to write to a file (a.out, for example) instead of to std::cout. The utility writes to std::cout so that you can redirect the output to whatever you want: a file or maybe another utility using pipes.
Re: Header
Flame Alchemist
3:34 24 Jul '07  
i use dev cpp.. i must include manually. Thanks!
N00b needs help with this program
Harrierfalcon
18:39 2 May '06  
I am very involved in HTML and know the basics of C++, but how do you send the output to, say, a text file? In fact, can someone send me a redone program whose output leads to "C:\Windows\Desktop\HTML.txt" or " C:\Windows\Desktop\WebPages\HTML.htm" that would be much easier just to harrierfalcon@yahoo.com

-- modified at 23:39 Tuesday 2nd May, 2006
Re: N00b needs help with this program
Wes Aday
18:56 2 May '06  
The very top of the article tells you how to do exactly that:

C:\>cpphtml myprogram.cpp >myprogram.htm

Why is common sense not common?
Never argue with an idiot. They will drag you down to their level where they are an expert.
Re: N00b needs help with this program
Harrierfalcon
12:10 3 May '06  
Yeah, I know that, but where do you put it? would it look something like this?

C:\>cpphtml FunctionGraceCalc.cpp>HTML.htm <<

with FunctionGradeCalc being the program, HTML.htm the page, and the whole thing replacing the "cout"
Re: N00b needs help with this program
Wes Aday
12:19 3 May '06  
It has nothing to do with cout. You would run this program on the command line, giving it your cpp file and the name of the output file:

C:\>cpphtml FunctionGraceCalc.cpp > NameofHTMLfile.htm

This would turn your FunctionGraceCalc.cpp file into an HTML file named NameofHTMLfile.htm. If you do not supply the htm filename then the program just dumps the output to the screen.

digrams/trigrams
toxcct
23:53 31 Jan '06  
you don't take into account the digrams/trigrams C++ operators... ;)

this could be an improvement to do....



TOXCCT >>> GEII power
[toxcct][VisualCalc 2.20][VCalc 3.0 soon...]
Re: digrams/trigrams
Jörgen Sigvardsson
9:14 4 Feb '06  
Anyone using digrams and trigrams deserves no HTML :P
I want Line Number
onjo
18:49 14 Nov '05  
Java2html => Line Number (0)

c++2html => Line Number (x)

I want Line Number !!!
Re: I want Line Number
q123456789
23:11 14 Nov '05  
If you want line numbers, at first sight, I think you would have to do something like this:

1. add a variable "int lineno=0" to main()

2. write a function print(int *lineno, const token& t) which breaks up the token text in lines (text separated by '\n') and do what operator<< does for each line of text (can't be that difficult) and spit out each line in the desired format, for example:

printf("(%d) %s\n", *lineno, t._code);

and update lineno:

*lineno += lines.size().

3. call print(&lineno, t) in the loop instead of doing cout << t;

Another suggestion... not to the author
gcampbell
15:26 14 Oct '05  
This application is very useful, but you might consider this other alternative too... :D

http://www.bedaux.net/cpp2html/
Thank you (and suggestion)
Pablo Aliskevicius
5:29 1 May '05  
Thank you, you've saved me hours of coding.
A couple of suggestions:

1. Add another level of syntax coloring - user defined words (such as can be found in Visual Studio).

2. A change to the inserter:

ostream& operator<<(ostream& os, const token& t)
{
    if (t._what == token::code)
        os << html(t._str);
    else if (t._what == token::comment)
        os << "<span class=comment>" << html(t._str) << "</span>";
    else if (t._what == token::keyword)
        os << "<span class=keyword>" << html(t._str) << "</span>";
    else if (t._what == token::pp)
        os << "<span class=pp>" << html(t._str) << "</span>";
    else if (t._what == token::udf)
        os << "" << html(t._str) << "";
    else
        os << html(t._str);
    return os;
}

Using os instead of cout enables using string streams, which makes the tool suitable for syntax coloring just parts of a document in memory.



Pablo
Re: Thank you (and suggestion)
q123456789
4:41 2 May '05  
1. I'm not going to do this :)

2. Yes, you're right. It should have been like you say.

3. Recently, I discovered a bug: if you have code with types like "GLfloat", cpphtml will highlight the "float" in "GLfloat".
Thanks, It's very good
lynhoo
16:49 2 Jun '04  
Thanks, It's very good


Last Updated 25 May 2004 | Copyright © CodeProject, 1999-2010