Click here to Skip to main content
Click here to Skip to main content

BEncode Lexing in C++

, 21 Jun 2009 CPOL
Rate this:
Please Sign up or sign in to vote.
A very efficient BEncode Lexer in C++.

Introduction

BENCODE is a data encoding method which is strongly associated with .torrent files. This article introduces a C++ class that can decode streams in that format.

Background

Actually, I needed a BENCODE lexing code in a project I'm working on, and no matter how extensive my web searches were, all I could come up with was either non-C/C++ code, the code was too dependant on other files which I couldn't locate, or the code took care of things I didn't want it to. So, as usual, the only sensible solution for me was to spend couple of hours and do it myself, the way I want it.

Using the code

Before we go to the actual usage part which is totally straightforward, I would like to emphasis a few points about the bencode_lexer class:

  • The bencode_lexer is stream-aware, which means you can feed it arbitrarily-sized chunks of data in the order they are read in by your code.
  • The class acts as the parsing context as well. In other words, a single bencode_lexer instance cannot process more than one content stream at a time.
  • The class is not thread-safe, but since it's not supposed to run on parallel streams, that is perfectly acceptable in my opinion.

Here is a simple code that dumps the content of a bencode stream onto the standard output device. It might not be too useful in the real world, but it shows how to use class:

#include <stdio.h>
#include "bencode.h"

int g_iIdent = 0;

bool Callback(bencode_lexer::Event e, const void* pv, int cb, long lParam)
{
 switch( e )
 {
  case bencode_lexer::eBeginDictionary:
  {
   for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
   fputs("dictionary {\n", stdout);
   ++g_iIdent;
   break;
  }
  case bencode_lexer::eEndDictionary:
  {
   --g_iIdent;
   for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
   fputs("}\n", stdout);
   break;
  }
  case bencode_lexer::eBeginList:
  {
   for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
   fputs("list {\n", stdout);
   ++g_iIdent;
   break;
  }
  case bencode_lexer::eEndList:
  {
   --g_iIdent;
   for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
   fputs("}\n", stdout);
   break;
  }
  case bencode_lexer::eInteger:
  {
   for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
   fprintf(stdout, "integer=%lu\n", *(int*)pv);
   break;
  }
  case bencode_lexer::eString:
  {
   for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
   fprintf(stdout, "string=%s\n", (const char*)pv);
   break;
  }
 }
 return true;
}

int main(int argc, char* argv[])
{
 if( argc < 2 )
  return 1;

 FILE* pf = fopen(argv[1], "rb");
 if( pf )
 {
  bencode_lexer bed(Callback);
  char chBuf[1024];

  for(;;)
  {
   int cbRead = fread(chBuf, 1, sizeof(chBuf), pf);
   if( cbRead < 1 )
    break;
   int cbParsed = bed.lex_chunk(chBuf, cbRead, 0);
   if( cbParsed < cbRead )
    break;
  }

  fclose(pf);
 }

 return 0;
}

Again, since bencode_lexer doesn't create any data-structures, it is up to you to construct the extracted data according to your project's needs.

Points of Interest

The bencode_lexer uses a hand-written stack class for managing its internal state machine. This class mimics the STL's stack class in a lot more limited but efficient way. In a real project, you will have to replace the hard-coded breakpoints I put into the ASSERT calls.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

airlobster
Software Developer (Senior)
Israel Israel
No Biography provided

Comments and Discussions

 
RantMore info needed! PinmemberPatLeCat23-Jun-09 0:30 
GeneralRe: More info needed! Pinmemberairlobster23-Jun-09 4:46 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.150327.1 | Last Updated 21 Jun 2009
Article Copyright 2009 by airlobster
Everything else Copyright © CodeProject, 1999-2015
Layout: fixed | fluid