Click here to Skip to main content
15,883,827 members
Articles / Programming Languages / C++

BEncode Lexing in C++

Rate me:
Please Sign up or sign in to vote.
4.67/5 (3 votes)
21 Jun 2009CPOL2 min read 24K   202   9   2
A very efficient BEncode Lexer in C++.

Introduction

BENCODE is a data encoding method which is strongly associated with .torrent files. This article introduces a C++ class that can decode streams in that format.

Background

Actually, I needed a BENCODE lexing code in a project I'm working on, and no matter how extensive my web searches were, all I could come up with was either non-C/C++ code, the code was too dependant on other files which I couldn't locate, or the code took care of things I didn't want it to. So, as usual, the only sensible solution for me was to spend couple of hours and do it myself, the way I want it.

Using the code

Before we go to the actual usage part which is totally straightforward, I would like to emphasis a few points about the bencode_lexer class:

  • The bencode_lexer is stream-aware, which means you can feed it arbitrarily-sized chunks of data in the order they are read in by your code.
  • The class acts as the parsing context as well. In other words, a single bencode_lexer instance cannot process more than one content stream at a time.
  • The class is not thread-safe, but since it's not supposed to run on parallel streams, that is perfectly acceptable in my opinion.

Here is a simple code that dumps the content of a bencode stream onto the standard output device. It might not be too useful in the real world, but it shows how to use class:

C++
#include <stdio.h>
#include "bencode.h"

int g_iIdent = 0;

bool Callback(bencode_lexer::Event e, const void* pv, int cb, long lParam)
{
 switch( e )
 {
  case bencode_lexer::eBeginDictionary:
  {
   for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
   fputs("dictionary {\n", stdout);
   ++g_iIdent;
   break;
  }
  case bencode_lexer::eEndDictionary:
  {
   --g_iIdent;
   for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
   fputs("}\n", stdout);
   break;
  }
  case bencode_lexer::eBeginList:
  {
   for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
   fputs("list {\n", stdout);
   ++g_iIdent;
   break;
  }
  case bencode_lexer::eEndList:
  {
   --g_iIdent;
   for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
   fputs("}\n", stdout);
   break;
  }
  case bencode_lexer::eInteger:
  {
   for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
   fprintf(stdout, "integer=%lu\n", *(int*)pv);
   break;
  }
  case bencode_lexer::eString:
  {
   for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
   fprintf(stdout, "string=%s\n", (const char*)pv);
   break;
  }
 }
 return true;
}

int main(int argc, char* argv[])
{
 if( argc < 2 )
  return 1;

 FILE* pf = fopen(argv[1], "rb");
 if( pf )
 {
  bencode_lexer bed(Callback);
  char chBuf[1024];

  for(;;)
  {
   int cbRead = fread(chBuf, 1, sizeof(chBuf), pf);
   if( cbRead < 1 )
    break;
   int cbParsed = bed.lex_chunk(chBuf, cbRead, 0);
   if( cbParsed < cbRead )
    break;
  }

  fclose(pf);
 }

 return 0;
}

Again, since bencode_lexer doesn't create any data-structures, it is up to you to construct the extracted data according to your project's needs.

Points of Interest

The bencode_lexer uses a hand-written stack class for managing its internal state machine. This class mimics the STL's stack class in a lot more limited but efficient way. In a real project, you will have to replace the hard-coded breakpoints I put into the ASSERT calls.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Israel Israel
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
RantMore info needed! Pin
PatLeCat22-Jun-09 23:30
PatLeCat22-Jun-09 23:30 
GeneralRe: More info needed! Pin
airlobster23-Jun-09 3:46
airlobster23-Jun-09 3:46 
wow, was i that unclear?
ok, i'll try to help a little.
the class i provided in this article decodes bencode streams. that's it. no more, no less. i didnt provide an encoder, simply because...well...i didnt need one myself Smile | :)
anyway, streams of that kind are used, among other places, in .torrent files and in responses from torrent trackers.
about how to use it, well, i did gave a sample code for that. it's just too simple so you must have missed it.
Also, the term "bencode" is just a term to describe a way of encoding arbitrary data, and just like all other encoding methods out there, if you want to read the data, you need to DECODE it first, and that's what i provided.

sorry for the confusion. hope i assisted a little here.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.