Click here to Skip to main content
15,883,979 members
Articles / .NET

A Simple Brainf**k To COBOL Compiler In Managed COBOL

Rate me:
Please Sign up or sign in to vote.
4.50/5 (4 votes)
25 May 2010CC (ASA 2.5)7 min read 21.9K   1   6
A 101 on writing a compiler - a Managed COBOL compiler which compiles Brainf**k (a very simple language) to Managed COBOL

Introduction

Brainf**k (replace ** with u and c and you get the actual name - but I don't want to cause offence) is a super minimal programming language. Managed COBOL is a very large programming language. I just love the idea of translating BF into Managed COBOL. BF (as I shall call it from here on) contains only 8 symbols and is derived from P" which contains only 6. Both languages are Turin Complete and neither is of any utility in actual programming. P" was created as a exercise; the font of all knowledge (ahem) Wikipedia has this to say about it 'P'' was the first "GOTO-less" imperative structured programming language to be proven1,2 Turing-complete.'

My idea here is:
  1. Make a translator from BF to Managed COBOL in Managed COBOL.
  2. The result should be able to be used to create classes where the BF code is a method body.
  3. Demonstrate these being compile and called from COBOL with Visual COBOL 2010.
  4. Demonstrate some simple abstract symbol tree optimizations.
  5. Demonstrate BF being called from C# and (using a developmental compiler) Java [Later post]. 
  6. Go the whole-hog and translate BF to COBOL and then call from F# [Later post].
Yes - I want to prove that Managed COBOL can do fun stuff like making it possible to call BF from F#! 

For more on Managed COBOL please see my blog.  

What is BF and how can we translate it to COBOL?
BF has a notion of a continuous area of bytes which hold integers. There is a pointer into those bytes referred to as the data-pointer. There is also the program which is a sequence of symbols. All symbols are legal, but anything other than the 8 BF symbols are ignored. The program notionally moves from the start to the end of the symbols but does have a looping construct. P" and BF are the same to this point, however, BF extends the model into having some form of IO. This is normally associated with keyboard and display, but I will use the IO extensions of BF to provide the mechanism for passing in and returning data when BF is invoked from other languages.


Here as the BF symbols:
> move the data pointer right (higher value)
< move the data pointer left (lower value)
+ increment by 1 the byte at the current data pointer
- decrement by 1 the byte at the current data pointer
[ begin loop
] close loop
. move the value at the data pointer to the output
, move the next input value to the value at the data pointer

Notes: Loops continue to execute the symbols contained repeatedly until the data pointed to by the data pointer is 0 at the start of an iteration. Bytes can take the decimal values 0-255. Incrementing a value 255 will result in 0; decrementing a value 0 will result in 255. Loops can be nested.

Arguments And Returns
To allow my BF routine to be invoked from other languages, I have the idea of using the IO symbols. A table of bytes will be passed to BF. And BF will 'return' a new table of bytes (for the non COBOL heads, a table is pretty much the same thing as an array in C#).

The passed in table is the arguments to the routine and the passed back table is the return. BF has an input pointer which starts at the start on the arguments table; each time BF reads an bytes  from the arguments the pointer is incremented. Incrementing the input pointer beyond the end of the table causes the pointer to reset to 1 (the start).

The returned table is initialized as empty then each time there is an output from BF the output bytes value is added to the end of the output table. (The implementation does not actually append to a table, it appends to a list and converts the result to a table, but the logical effect is the same.)

Compilation
The compilation process involves simply iterating over to input bytes and each one being converted into an abstract symbol tree (AST) token or ignored. The AST can then be translated into COBOL and finally wrapped in some boilerplate COBOL to give it the appropriate environment. Optimization can happen to the AST.

The only part of BF which requires any serious thought (and only a few seconds at that) is the [] looping construct. Fortunately, modern COBOL dialects have the in-line, nested perform syntax. The [] BF syntax directly translated to...

    until data(data-pointer) = 0
...
end-perform

... assuming that the data area is simply a table of bytes and there is a local-storage item data-pointer which points into that table.

OK - Let's write a compiler!
I have chosen not to implement the normal file read and write stuff which a compiler does. This compiler will accept a string and return a string. The input string being BF and the output being the source for a Managed COBOL method body. This is the most straight forward implementation which can then be added to other programs to compile files or other source of source.

The compiler is massive overkill. BF is such a simple language that a very simple compiler will do. My aim with this compiler it to make it have a very clean and structured design. I have not actually implemented a full AST but rather just a list (a tree with only one branch). This is possible because there is no calling structure and no variables. It is really a testament to the efficiency of languages with only conditional loops.

Also, I have represented each of the BF symbols as an object (I call them tokens) which 'knows' the symbol and the required COBOL to enact that symbol. These objects all implement a common interface (BFToken). By moving the symbol definition and COBOL representation to a polymorphic type, it is possible to move the responsibility for understanding the lex from the main compiler loop and into these token objects. Although BF has only 8 symbols, the approach would allow additions of new symbols without changing the lexer or parser at all. This was a design feature I used in VSPL to some success.

Here is the code defining the token class interface and an instance of a token class:

interface-id BFToken as "BF.BFToken".

    method-id get property BFSymbol.
    procedure division returning ret as string.
    end method.

    method-id get property COBOLCode.
    procedure division returning ret as string.
    end method.

end interface.

class-id BFIncr as "BF.BFIncr" implements type BFToken.

    01 source-code.
      03 filler pic x(29) value "if data-block(data-ptr) = 255".
      03 filler pic x(02) value x"0D0A".
      03 filler pic x(34) value "    move 0 to data-block(data-ptr)".
      03 filler pic x(02) value x"0D0A".
      03 filler pic x(04) value "else".
      03 filler pic x(02) value x"0D0A".
      03 filler pic x(34) value "    add 1 to data-block(data-ptr)".
      03 filler pic x(02) value x"0D0A".
      03 filler pic x(06) value "end-if".

    method-id get property BFSymbol.
    procedure division returning ret as string.
       move "+" to ret.
    end method.

    method-id get property COBOLCode.
    procedure division returning ret as string.
       move source-code to ret
    end method.

end class.

The AST is just a list of already created token classes. Here is the code which does the lexing/parsing and creates the AST:

   table of type BFToken
   (
    new BFDecr
    new BFIncr
    new BFLeft
    new BFRight
    new BFInput
    new BFOutput
    new BFLoopStart
    new BFLoopEnd
   ).
01 separator string value x"0D0A".

method-id compile public.
    01 current-char character.
    01 current-token   type BFToken.
    *> Use a list as a dicationary is overkill when we only
    *> have 8 memebers and this is a trivial implementation
    01 ast type List[type  BFToken] value new List[type BFToken].
    01 code-builder type StringBuilder value new StringBuilder.

    procedure division using by value
        source-code as string
        returning generated-code as string.

    *> These loops make up the parser and lexer
    *> all in one
    perform varying current-char through source-code::ToCharArray
        *> respocibility for what characters make up the language
        *> is devolved to the token objects
        perform varying current-token through tokens
            if current-token::BFSymbol::Chars(0) equals current-char
                invoke ast::Add(current-token)
            end-if
        end-perform
    end-perform

Once this loop has constructed the AST it can be interrogated directly to get the COBOL source to inject into the boiler plate.

By going to the effort of laying out the COBOL to be generated in the value clauses of COBOL group items we cause the generated COBOL code to be human readable and have a clean representation in the compiler source as well.

Here is an example of the generated code:

01 data-block binary-short occurs 32768.
01 data-ptr binary-long value 1.
01 input-ptr binary-long value 1.
01 output-list type List[binary-short] value new List[binary-short].
01 data-block-size binary-long.
01 input-block-size binary-long.
procedure division
    using by value input-block as binary-short occurs any
    returning output-block     as binary-short occurs any.
set data-block-size  to data-block::Length
set input-block-size to input-block::Length
if data-block(data-ptr) = 255
    move 0 to data-block(data-ptr)
else
    add 1 to data-block(data-ptr) 
end-if
if data-block(data-ptr) = 255
    move 0 to data-block(data-ptr)
else
    add 1 to data-block(data-ptr) 
end-if
if data-block(data-ptr) = 255
    move 0 to data-block(data-ptr)
else
    add 1 to data-block(data-ptr) 
end-if
if data-block(data-ptr) = 255
    move 0 to data-block(data-ptr)
else
    add 1 to data-block(data-ptr) 
end-if

Visual COBOL 2010 - this makes playing around like this really straight forward!
Below is the compiler in Visual COBOL 2010 and then the source code for the compiler. I have included the entire compiler in one source file.



Here is my working compiler with the null optimizer:

$set sourceformat(variable)
$set ilusing(BF)

 class-id BFTester as "BF.BFTester".

     method-id main static.
     01 compiler type BF.BFCompiler value new BF.BFCompiler.
     procedure division.
         display compiler::compile("++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.")
         display "All done"
     end method.
 end class.

 class-id BFCompiler as "BF.BFCompiler".

     01 optimizer type BFOptimizer value new NullBFOptimizer.
     01 tokens    type BFToken occurs any value
        table of type BFToken
        (
         new BFDecr
         new BFIncr
         new BFLeft
         new BFRight
         new BFInput
         new BFOutput
         new BFLoopStart
         new BFLoopEnd
        ).
     01 separator string value x"0D0A".

     method-id compile public.
         01 current-char character.
         01 current-token   type BFToken.
         *> Use a list as a dicationary is overkill when we only
         *> have 8 memebers and this is a trivial implementation
         01 ast type List[type  BFToken] value new List[type BFToken].
         01 code-builder type StringBuilder value new StringBuilder.

         procedure division using by value
             source-code as string
             returning generated-code as string.

         *> These loops make up the parser and lexer
         *> all in one
         perform varying current-char through source-code::ToCharArray
             *> respocibility for what characters make up the language
             *> is devolved to the token objects
             perform varying current-token through tokens
                 if current-token::BFSymbol::Chars(0) equals current-char
                     invoke ast::Add(current-token)
                 end-if
             end-perform
         end-perform

         *> Now we have the ast we optimize it
         invoke optimizer::Optimize(ast)

         *> Now we generate
         *> First put in boiler plate
         invoke code-builder::Append("01 data-block binary-short occurs 32768.")
         invoke code-builder::Append(separator)
         invoke code-builder::Append("01 data-ptr binary-long value 1.")
         invoke code-builder::Append(separator)
         invoke code-builder::Append("01 input-ptr binary-long value 1.")
         invoke code-builder::Append(separator)
         invoke code-builder::Append("01 output-list type List[binary-short] value new List[binary-short].")
         invoke code-builder::Append(separator)
         invoke code-builder::Append("01 data-size binary-long.")
         invoke code-builder::Append(separator)
         invoke code-builder::Append("01 input-size binary-long.")
         invoke code-builder::Append(separator)
         invoke code-builder::Append("procedure division")
         invoke code-builder::Append(separator)
         invoke code-builder::Append("    using by value input-block as binary-short occurs any")
         invoke code-builder::Append(separator)
         invoke code-builder::Append("    returning output-block     as binary-short occurs any.")
         invoke code-builder::Append(separator)
         invoke code-builder::Append("set data-block-size  data-block::Length")
         invoke code-builder::Append(separator)
         invoke code-builder::Append("set input-block-size input-block::Length")
         invoke code-builder::Append(separator)

         perform varying current-token through ast
            invoke code-builder::Append(current-token::COBOLCode)
            invoke code-builder::Append(separator)
         end-perform
         *> Final boiler plate
         invoke code-builder::Append("set output-block to output-list::ToArray")
         invoke code-builder::Append(separator)

         set generated-code to code-builder::ToString
         goback.
     end method.

 end class.

 interface-id BFToken as "BF.BFToken".

     method-id get property BFSymbol.
     procedure division returning ret as string.
     end method.

     method-id get property COBOLCode.
     procedure division returning ret as string.
     end method.

 end interface.

 class-id BFIncr as "BF.BFIncr" implements type BFToken.

     01 source-code.
       03 filler pic x(29) value "if data-block(data-ptr) = 255".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(34) value "    move 0 to data-block(data-ptr)".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(04) value "else".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(34) value "    add 1 to data-block(data-ptr)".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(06) value "end-if".

     method-id get property BFSymbol.
     procedure division returning ret as string.
        move "+" to ret.
     end method.

     method-id get property COBOLCode.
     procedure division returning ret as string.
        move source-code to ret
     end method.

 end class.

 class-id BFDecr as "BF.BFDecr" implements type BFToken.

     01 source-code.
       03 filler pic x(27) value "if data-block(data-ptr) = 0".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(36) value "    move 255 to data-block(data-ptr)".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(04) value "else".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(40) value "    subtract 1 from data-block(data-ptr)".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(06) value "end-if".

     method-id get property BFSymbol.
     procedure division returning ret as string.
        move "-" to ret.
     end method.

     method-id get property COBOLCode.
     procedure division returning ret as string.
        move source-code to ret
     end method.

 end class.

 class-id BFDecr as "BF.BFRight" implements type BFToken.

     01 source-code.
       03 filler pic x(30) value "if data-ptr = data-block-size".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(22) value "    move 1 to data-ptr".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(04) value "else".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(21) value "    add 1 to data-ptr".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(06) value "end-if".

     method-id get property BFSymbol.
     procedure division returning ret as string.
        move ">" to ret.
     end method.

     method-id get property COBOLCode.
     procedure division returning ret as string.
        move source-code to ret
     end method.

 end class.

 class-id BFDecr as "BF.BFLeft" implements type BFToken.
     01 source-code.
       03 filler pic x(15) value "if data-ptr = 1"
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(37) value "    move data-block-size to data-ptr".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(04) value "else".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(28) value "    subtract 1 from data-ptr".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(06) value "end-if".

     method-id get property BFSymbol.
     procedure division returning ret as string.
        move "<" to ret.
     end method.

     method-id get property COBOLCode.
     procedure division returning ret as string.
        move source-code to ret
     end method.

 end class.

 class-id BFDecr as "BF.BFInput" implements type BFToken.

     01 source-code.
       03 filler pic x(51) value "move input-block(input-ptr) to data-block(data-ptr)"
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(32) value "if input-ptr = input-block-size".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(23) value "    move 1 to input-ptr".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(04) value "else".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(22) value "    add 1 to input-ptr".
       03 filler pic x(02) value x"0D0A".
       03 filler pic x(06) value "end-if".

     method-id get property BFSymbol.
     procedure division returning ret as string.
        move "," to ret.
     end method.

     method-id get property COBOLCode.
     procedure division returning ret as string.
        move source-code to ret
     end method.

 end class.

 class-id BFDecr as "BF.BFOutput" implements type BFToken.

     method-id get property BFSymbol.
     procedure division returning ret as string.
        move "." to ret.
     end method.

     method-id get property COBOLCode.
     procedure division returning ret as string.
        move "invoke output-list::Add(data-block(data-ptr))" to ret
     end method.

 end class.

 class-id BFDecr as "BF.BFLoopStart" implements type BFToken.

     01 source-code pic x(55) value "perform with test before until data-block(data-ptr) = 0".

     method-id get property BFSymbol.
     procedure division returning ret as string.
        move "[" to ret.
     end method.

     method-id get property COBOLCode.
     procedure division returning ret as string.
        move source-code to ret
     end method.

 end class.

 class-id BFDecr as "BF.BFLoopEnd" implements type BFToken.

     01 source-code pic x(11) value "end-perform".

     method-id get property BFSymbol.
     procedure division returning ret as string.
        move "]" to ret.
     end method.

     method-id get property COBOLCode.
     procedure division returning ret as string.
        move source-code to ret
     end method.

 end class.

 interface-id BFOptimizer as "BF.BFOptimizer".

     method-id Optimize.
     procedure division using by value ast
               as type System.Collections.Generic.List[type BFToken].
     end method.

 end interface.

 class-id NullBFOptimizer as "BF.NullBFOptimizer" implements type BFOptimizer.

     method-id Optimize.
     procedure division using by value ast
               as type System.Collections.Generic.List[type BFToken].
     *> Do nothing - does not optimize
     end method.

 end class.

I ran this compiler which was compiling a simple 'Hello World' program. I directed the output to a text file and then cut and paste the COBOL into a Visual Studio 2010 Managed COBOL project. When I ran the project, it produced the expected array (which in ASCII says Hello World! followed by a new line).



Simple Optimization
In the above implementation I have used a null optimiser; it does not do anything. The optimiser has the AST passed to it and is expected to perform all optimisations in place. I have been doing this project for fun (lunch is for lightweights etc.) and so I have not implemented much in the way of an optimiser. One thing which is really simple but insanely verbose about BF is its need of long sections of identical symbols. My optimiser takes chains of incr tokens and converts them into a single token which performs the same operation but in one step. To do this I have added a new token which is not added to the lexer but which is  usable by the generator. This token replaces the inefficient incr chains in the AST.

The optimiser works by iterating over the AST and adding all non incr tokens directly to a new AST. When it encounters incr tokens, it counts over them and when the next non incr token is found it creates a single token to replace the chain of incr tokens and adds this new token to the new AST. Once the entire AST has been iterated over, it replaced the contents of the AST with the new AST.

class-id BFIncrMany as "BF.BFIncrMany" implements type BFToken.

    01 source-code string.

    method-id new.
    01 source-builder type StringBuilder value new StringBuilder.
    procedure division using by value how-many as binary-long.

      invoke source-builder::AppendFormat("add {0} to data-block(data-ptr)" how-many)
      invoke source-builder::Append(x"0D0A")
      invoke source-builder::AppendFormat("compute data-block(data-ptr) = data-block(data-ptr) b-and 255")
      set source-code to source-builder::ToString

    end method.

    method-id get property BFSymbol.
    procedure division returning ret as string.
       move "Not Implemented" to ret.
    end method.

    method-id get property COBOLCode.
    procedure division returning ret as string.
       move source-code to ret
    end method.

end class.

class-id IncrBFOptimizer as "BF.IncrBFOptimizer" implements type BFOptimizer.

    method-id Optimize.
       01 in-incr condition-value value false.
       01 how-many binary-long.
       01 current-token type BFToken.
       01 swapsies type List[type BFToken] value new List[Type BFToken].

    procedure division using by value ast
              as type System.Collections.Generic.List[type BFToken].

        perform varying current-token through ast
            if in-incr
                if current-token::BFSymbol = "+"
                   *> This simple logic means very long chains of
                   *> + could overflow - but this is demo only
                   add 1 to how-many
                else
                   set in-incr to false
                   invoke swapsies::Add(new BFIncrMany(how-many))
                end-if
            else
                if current-token::BFSymbol = "+"
                   add 1 to how-many
                   set in-incr to true
                else
                   invoke swapsies::Add(current-token)
                end-if
            end-if
        end-perform

        invoke ast::Clear
        perform varying current-token through swapsies
            invoke ast::Add(current-token)
        end-perform

    end method.

end class.

Now we can see how the large chains of increment token generated code have been replaced by simple additions:

01 data-block binary-short occurs 32768.
01 data-ptr binary-long value 1.
01 input-ptr binary-long value 1.
01 output-list type List[binary-short] value new List[binary-short].
01 data-size binary-long.
01 input-size binary-long.
procedure division
    using by value input-block as binary-short occurs any
    returning output-block     as binary-short occurs any.
set data-block-size  data-block::Length
set input-block-size input-block::Length
add 10 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
if data-ptr = data-block-size 
    move 1 to data-ptr
else
    add 1 to data-ptr
end-if
add 17 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
add 27 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
add 30 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
add 31 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
if data-ptr = 1
    move data-block-size to data-ptr 
else
    subtract 1 from data-ptr
end-if
if data-ptr = 1
    move data-block-size to data-ptr 
else
    subtr...


I really enjoyed that. I got back to my nerd roots!
If you got as far as the bottom of this post - thanks for reading and keep having fun!

License

This article, along with any associated source code and files, is licensed under The Creative Commons Attribution-ShareAlike 2.5 License


Written By
Web Developer
United Kingdom United Kingdom
I am now a Software Systems Developer - Senior Principal at Micro Focus Plc. I am honoured to work in a team developing new compiler and runtime technology for Micro Focus.

My past includes a Ph.D. in computational quantum mechanics, software consultancy and several/various software development and architecture positions.

For more - see

blog: http://nerds-central.blogspot.com

twitter: http://twitter.com/alexturner

Comments and Discussions

 
GeneralMy vote of 5 Pin
RugbyLeague10-Oct-12 4:25
RugbyLeague10-Oct-12 4:25 
GeneralYou're a sick, sick man, part deux Pin
Gary Wheeler1-Jun-10 0:41
Gary Wheeler1-Jun-10 0:41 
GeneralRe: You're a sick, sick man, part deux Pin
alex turner1-Jun-10 1:50
alex turner1-Jun-10 1:50 
GeneralRe: You're a sick, sick man, part deux Pin
Gary Wheeler1-Jun-10 1:58
Gary Wheeler1-Jun-10 1:58 
GeneralRe: You're a sick, sick man, part deux Pin
alex turner1-Jun-10 1:59
alex turner1-Jun-10 1:59 
GeneralYou're a sick sick man Pin
Lee Humphries26-May-10 13:48
professionalLee Humphries26-May-10 13:48 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.