Click here to Skip to main content
Click here to Skip to main content

A Simple Brainf**k To COBOL Compiler In Managed COBOL

, 25 May 2010
Rate this:
Please Sign up or sign in to vote.
A 101 on writing a compiler - a Managed COBOL compiler which compiles Brainf**k (a very simple language) to Managed COBOL

Introduction

Brainf**k (replace ** with u and c and you get the actual name - but I don't want to cause offence) is a super minimal programming language. Managed COBOL is a very large programming language. I just love the idea of translating BF into Managed COBOL. BF (as I shall call it from here on) contains only 8 symbols and is derived from P" which contains only 6. Both languages are Turin Complete and neither is of any utility in actual programming. P" was created as a exercise; the font of all knowledge (ahem) Wikipedia has this to say about it 'P'' was the first "GOTO-less" imperative structured programming language to be proven1,2 Turing-complete.'

My idea here is:
  1. Make a translator from BF to Managed COBOL in Managed COBOL.
  2. The result should be able to be used to create classes where the BF code is a method body.
  3. Demonstrate these being compile and called from COBOL with Visual COBOL 2010.
  4. Demonstrate some simple abstract symbol tree optimizations.
  5. Demonstrate BF being called from C# and (using a developmental compiler) Java [Later post]. 
  6. Go the whole-hog and translate BF to COBOL and then call from F# [Later post].
Yes - I want to prove that Managed COBOL can do fun stuff like making it possible to call BF from F#! 

For more on Managed COBOL please see my blog.  

What is BF and how can we translate it to COBOL?
BF has a notion of a continuous area of bytes which hold integers. There is a pointer into those bytes referred to as the data-pointer. There is also the program which is a sequence of symbols. All symbols are legal, but anything other than the 8 BF symbols are ignored. The program notionally moves from the start to the end of the symbols but does have a looping construct. P" and BF are the same to this point, however, BF extends the model into having some form of IO. This is normally associated with keyboard and display, but I will use the IO extensions of BF to provide the mechanism for passing in and returning data when BF is invoked from other languages.


Here as the BF symbols:
> move the data pointer right (higher value)
< move the data pointer left (lower value)
+ increment by 1 the byte at the current data pointer
- decrement by 1 the byte at the current data pointer
[ begin loop
] close loop
. move the value at the data pointer to the output
, move the next input value to the value at the data pointer

Notes: Loops continue to execute the symbols contained repeatedly until the data pointed to by the data pointer is 0 at the start of an iteration. Bytes can take the decimal values 0-255. Incrementing a value 255 will result in 0; decrementing a value 0 will result in 255. Loops can be nested.

Arguments And Returns
To allow my BF routine to be invoked from other languages, I have the idea of using the IO symbols. A table of bytes will be passed to BF. And BF will 'return' a new table of bytes (for the non COBOL heads, a table is pretty much the same thing as an array in C#).

The passed in table is the arguments to the routine and the passed back table is the return. BF has an input pointer which starts at the start on the arguments table; each time BF reads an bytes  from the arguments the pointer is incremented. Incrementing the input pointer beyond the end of the table causes the pointer to reset to 1 (the start).

The returned table is initialized as empty then each time there is an output from BF the output bytes value is added to the end of the output table. (The implementation does not actually append to a table, it appends to a list and converts the result to a table, but the logical effect is the same.)

Compilation
The compilation process involves simply iterating over to input bytes and each one being converted into an abstract symbol tree (AST) token or ignored. The AST can then be translated into COBOL and finally wrapped in some boilerplate COBOL to give it the appropriate environment. Optimization can happen to the AST.

The only part of BF which requires any serious thought (and only a few seconds at that) is the [] looping construct. Fortunately, modern COBOL dialects have the in-line, nested perform syntax. The [] BF syntax directly translated to...

    until data(data-pointer) = 0
...
end-perform

... assuming that the data area is simply a table of bytes and there is a local-storage item data-pointer which points into that table.

OK - Let's write a compiler!
I have chosen not to implement the normal file read and write stuff which a compiler does. This compiler will accept a string and return a string. The input string being BF and the output being the source for a Managed COBOL method body. This is the most straight forward implementation which can then be added to other programs to compile files or other source of source.

The compiler is massive overkill. BF is such a simple language that a very simple compiler will do. My aim with this compiler it to make it have a very clean and structured design. I have not actually implemented a full AST but rather just a list (a tree with only one branch). This is possible because there is no calling structure and no variables. It is really a testament to the efficiency of languages with only conditional loops.

Also, I have represented each of the BF symbols as an object (I call them tokens) which 'knows' the symbol and the required COBOL to enact that symbol. These objects all implement a common interface (BFToken). By moving the symbol definition and COBOL representation to a polymorphic type, it is possible to move the responsibility for understanding the lex from the main compiler loop and into these token objects. Although BF has only 8 symbols, the approach would allow additions of new symbols without changing the lexer or parser at all. This was a design feature I used in VSPL to some success.

Here is the code defining the token class interface and an instance of a token class:

       interface-id BFToken as "BF.BFToken".
       
           method-id get property BFSymbol.
           procedure division returning ret as string.
           end method.
       
           method-id get property COBOLCode.
           procedure division returning ret as string.
           end method.
       
       end interface.

       class-id BFIncr as "BF.BFIncr" implements type BFToken.

           01 source-code.
             03 filler pic x(29) value "if data-block(data-ptr) = 255".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(34) value "    move 0 to data-block(data-ptr)".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(04) value "else".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(34) value "    add 1 to data-block(data-ptr)".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(06) value "end-if".
         
           method-id get property BFSymbol.
           procedure division returning ret as string.
              move "+" to ret.
           end method.
       
           method-id get property COBOLCode.
           procedure division returning ret as string.
              move source-code to ret
           end method.

       end class.

The AST is just a list of already created token classes. Here is the code which does the lexing/parsing and creates the AST:

              table of type BFToken
              (
               new BFDecr
               new BFIncr
               new BFLeft
               new BFRight
               new BFInput
               new BFOutput
               new BFLoopStart
               new BFLoopEnd
              ).
           01 separator string value x"0D0A".

           method-id compile public.
               01 current-char character.
               01 current-token   type BFToken.
               *> Use a list as a dicationary is overkill when we only
               *> have 8 memebers and this is a trivial implementation
               01 ast type List[type  BFToken] value new List[type BFToken].
               01 code-builder type StringBuilder value new StringBuilder.
       
               procedure division using by value 
                   source-code as string
                   returning generated-code as string.
               
               *> These loops make up the parser and lexer 
               *> all in one
               perform varying current-char through source-code::ToCharArray
                   *> respocibility for what characters make up the language
                   *> is devolved to the token objects
                   perform varying current-token through tokens
                       if current-token::BFSymbol::Chars(0) equals current-char
                           invoke ast::Add(current-token)
                       end-if
                   end-perform
               end-perform          

Once this loop has constructed the AST it can be interrogated directly to get the COBOL source to inject into the boiler plate.

By going to the effort of laying out the COBOL to be generated in the value clauses of COBOL group items we cause the generated COBOL code to be human readable and have a clean representation in the compiler source as well.

Here is an example of the generated code:

01 data-block binary-short occurs 32768.
01 data-ptr binary-long value 1.
01 input-ptr binary-long value 1.
01 output-list type List[binary-short] value new List[binary-short].
01 data-block-size binary-long.
01 input-block-size binary-long.
procedure division
    using by value input-block as binary-short occurs any
    returning output-block     as binary-short occurs any.
set data-block-size  to data-block::Length
set input-block-size to input-block::Length
if data-block(data-ptr) = 255
    move 0 to data-block(data-ptr)
else
    add 1 to data-block(data-ptr) 
end-if
if data-block(data-ptr) = 255
    move 0 to data-block(data-ptr)
else
    add 1 to data-block(data-ptr) 
end-if
if data-block(data-ptr) = 255
    move 0 to data-block(data-ptr)
else
    add 1 to data-block(data-ptr) 
end-if
if data-block(data-ptr) = 255
    move 0 to data-block(data-ptr)
else
    add 1 to data-block(data-ptr) 
end-if

Visual COBOL 2010 - this makes playing around like this really straight forward!
Below is the compiler in Visual COBOL 2010 and then the source code for the compiler. I have included the entire compiler in one source file.



Here is my working compiler with the null optimizer:

      $set sourceformat(variable)
      $set ilusing(BF)
       
       class-id BFTester as "BF.BFTester".
       
           method-id main static.
           01 compiler type BF.BFCompiler value new BF.BFCompiler.
           procedure division.
               display compiler::compile("++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.")
               display "All done"
           end method.
       end class.
       
       class-id BFCompiler as "BF.BFCompiler".

           01 optimizer type BFOptimizer value new NullBFOptimizer.
           01 tokens    type BFToken occurs any value
              table of type BFToken
              (
               new BFDecr
               new BFIncr
               new BFLeft
               new BFRight
               new BFInput
               new BFOutput
               new BFLoopStart
               new BFLoopEnd
              ).
           01 separator string value x"0D0A".

           method-id compile public.
               01 current-char character.
               01 current-token   type BFToken.
               *> Use a list as a dicationary is overkill when we only
               *> have 8 memebers and this is a trivial implementation
               01 ast type List[type  BFToken] value new List[type BFToken].
               01 code-builder type StringBuilder value new StringBuilder.
       
               procedure division using by value 
                   source-code as string
                   returning generated-code as string.
               
               *> These loops make up the parser and lexer 
               *> all in one
               perform varying current-char through source-code::ToCharArray
                   *> respocibility for what characters make up the language
                   *> is devolved to the token objects
                   perform varying current-token through tokens
                       if current-token::BFSymbol::Chars(0) equals current-char
                           invoke ast::Add(current-token)
                       end-if
                   end-perform
               end-perform
               
               *> Now we have the ast we optimize it
               invoke optimizer::Optimize(ast)
               
               *> Now we generate
               *> First put in boiler plate
               invoke code-builder::Append("01 data-block binary-short occurs 32768.")
               invoke code-builder::Append(separator)
               invoke code-builder::Append("01 data-ptr binary-long value 1.")
               invoke code-builder::Append(separator)
               invoke code-builder::Append("01 input-ptr binary-long value 1.")
               invoke code-builder::Append(separator)
               invoke code-builder::Append("01 output-list type List[binary-short] value new List[binary-short].")
               invoke code-builder::Append(separator)
               invoke code-builder::Append("01 data-size binary-long.")
               invoke code-builder::Append(separator)
               invoke code-builder::Append("01 input-size binary-long.")
               invoke code-builder::Append(separator)               
               invoke code-builder::Append("procedure division")
               invoke code-builder::Append(separator)               
               invoke code-builder::Append("    using by value input-block as binary-short occurs any")
               invoke code-builder::Append(separator)               
               invoke code-builder::Append("    returning output-block     as binary-short occurs any.")
               invoke code-builder::Append(separator)               
               invoke code-builder::Append("set data-block-size  data-block::Length")
               invoke code-builder::Append(separator)               
               invoke code-builder::Append("set input-block-size input-block::Length")
               invoke code-builder::Append(separator)               
               
               perform varying current-token through ast
                  invoke code-builder::Append(current-token::COBOLCode)
                  invoke code-builder::Append(separator)
               end-perform
               *> Final boiler plate
               invoke code-builder::Append("set output-block to output-list::ToArray")
               invoke code-builder::Append(separator) 

               set generated-code to code-builder::ToString
               goback.
           end method.
                  
       end class.

       interface-id BFToken as "BF.BFToken".
       
           method-id get property BFSymbol.
           procedure division returning ret as string.
           end method.
       
           method-id get property COBOLCode.
           procedure division returning ret as string.
           end method.
       
       end interface.

       class-id BFIncr as "BF.BFIncr" implements type BFToken.

           01 source-code.
             03 filler pic x(29) value "if data-block(data-ptr) = 255".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(34) value "    move 0 to data-block(data-ptr)".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(04) value "else".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(34) value "    add 1 to data-block(data-ptr)".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(06) value "end-if".
         
           method-id get property BFSymbol.
           procedure division returning ret as string.
              move "+" to ret.
           end method.
       
           method-id get property COBOLCode.
           procedure division returning ret as string.
              move source-code to ret
           end method.

       end class.

       class-id BFDecr as "BF.BFDecr" implements type BFToken.

           01 source-code.
             03 filler pic x(27) value "if data-block(data-ptr) = 0".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(36) value "    move 255 to data-block(data-ptr)".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(04) value "else".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(40) value "    subtract 1 from data-block(data-ptr)".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(06) value "end-if".

           method-id get property BFSymbol.
           procedure division returning ret as string.
              move "-" to ret.
           end method.
       
           method-id get property COBOLCode.
           procedure division returning ret as string.
              move source-code to ret
           end method.

       end class.

       class-id BFDecr as "BF.BFRight" implements type BFToken.

           01 source-code.
             03 filler pic x(30) value "if data-ptr = data-block-size".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(22) value "    move 1 to data-ptr".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(04) value "else".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(21) value "    add 1 to data-ptr".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(06) value "end-if".
                  
           method-id get property BFSymbol.
           procedure division returning ret as string.
              move ">" to ret.
           end method.
       
           method-id get property COBOLCode.
           procedure division returning ret as string.
              move source-code to ret
           end method.

       end class.

       class-id BFDecr as "BF.BFLeft" implements type BFToken.
           01 source-code.
             03 filler pic x(15) value "if data-ptr = 1"
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(37) value "    move data-block-size to data-ptr".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(04) value "else".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(28) value "    subtract 1 from data-ptr".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(06) value "end-if".
             
           method-id get property BFSymbol.
           procedure division returning ret as string.
              move "<" to ret.
           end method.
       
           method-id get property COBOLCode.
           procedure division returning ret as string.
              move source-code to ret
           end method.

       end class.

       class-id BFDecr as "BF.BFInput" implements type BFToken.

           01 source-code.
             03 filler pic x(51) value "move input-block(input-ptr) to data-block(data-ptr)"
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(32) value "if input-ptr = input-block-size".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(23) value "    move 1 to input-ptr".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(04) value "else".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(22) value "    add 1 to input-ptr".
             03 filler pic x(02) value x"0D0A".
             03 filler pic x(06) value "end-if".
             
           method-id get property BFSymbol.
           procedure division returning ret as string.
              move "," to ret.
           end method.
       
           method-id get property COBOLCode.
           procedure division returning ret as string.
              move source-code to ret
           end method.

       end class.

       class-id BFDecr as "BF.BFOutput" implements type BFToken.

           method-id get property BFSymbol.
           procedure division returning ret as string.
              move "." to ret.
           end method.
       
           method-id get property COBOLCode.
           procedure division returning ret as string.
              move "invoke output-list::Add(data-block(data-ptr))" to ret
           end method.

       end class.

       class-id BFDecr as "BF.BFLoopStart" implements type BFToken.

           01 source-code pic x(55) value "perform with test before until data-block(data-ptr) = 0".

           method-id get property BFSymbol.
           procedure division returning ret as string.
              move "[" to ret.
           end method.
       
           method-id get property COBOLCode.
           procedure division returning ret as string.
              move source-code to ret
           end method.

       end class.

       class-id BFDecr as "BF.BFLoopEnd" implements type BFToken.

           01 source-code pic x(11) value "end-perform".

           method-id get property BFSymbol.
           procedure division returning ret as string.
              move "]" to ret.
           end method.
       
           method-id get property COBOLCode.
           procedure division returning ret as string.
              move source-code to ret
           end method.

       end class.

       interface-id BFOptimizer as "BF.BFOptimizer".
         
           method-id Optimize.
           procedure division using by value ast 
                     as type System.Collections.Generic.List[type BFToken]. 
           end method.
       
       end interface.

       class-id NullBFOptimizer as "BF.NullBFOptimizer" implements type BFOptimizer.
         
           method-id Optimize.
           procedure division using by value ast 
                     as type System.Collections.Generic.List[type BFToken]. 
           *> Do nothing - does not optimize
           end method.
       
       end class.

I ran this compiler which was compiling a simple 'Hello World' program. I directed the output to a text file and then cut and paste the COBOL into a Visual Studio 2010 Managed COBOL project. When I ran the project, it produced the expected array (which in ASCII says Hello World! followed by a new line).



Simple Optimization
In the above implementation I have used a null optimiser; it does not do anything. The optimiser has the AST passed to it and is expected to perform all optimisations in place. I have been doing this project for fun (lunch is for lightweights etc.) and so I have not implemented much in the way of an optimiser. One thing which is really simple but insanely verbose about BF is its need of long sections of identical symbols. My optimiser takes chains of incr tokens and converts them into a single token which performs the same operation but in one step. To do this I have added a new token which is not added to the lexer but which is  usable by the generator. This token replaces the inefficient incr chains in the AST.

The optimiser works by iterating over the AST and adding all non incr tokens directly to a new AST. When it encounters incr tokens, it counts over them and when the next non incr token is found it creates a single token to replace the chain of incr tokens and adds this new token to the new AST. Once the entire AST has been iterated over, it replaced the contents of the AST with the new AST.

       class-id BFIncrMany as "BF.BFIncrMany" implements type BFToken.

           01 source-code string.
       
           method-id new.
           01 source-builder type StringBuilder value new StringBuilder.
           procedure division using by value how-many as binary-long.
           
             invoke source-builder::AppendFormat("add {0} to data-block(data-ptr)" how-many)
             invoke source-builder::Append(x"0D0A")               
             invoke source-builder::AppendFormat("compute data-block(data-ptr) = data-block(data-ptr) b-and 255")
             set source-code to source-builder::ToString
             
           end method.
       
           method-id get property BFSymbol.
           procedure division returning ret as string.
              move "Not Implemented" to ret.
           end method.
       
           method-id get property COBOLCode.
           procedure division returning ret as string.
              move source-code to ret
           end method.

       end class.
       
       class-id IncrBFOptimizer as "BF.IncrBFOptimizer" implements type BFOptimizer.
         
           method-id Optimize.
              01 in-incr condition-value value false.
              01 how-many binary-long.
              01 current-token type BFToken.
              01 swapsies type List[type BFToken] value new List[Type BFToken].
              
           procedure division using by value ast 
                     as type System.Collections.Generic.List[type BFToken]. 
                     
               perform varying current-token through ast
                   if in-incr
                       if current-token::BFSymbol = "+"
                          *> This simple logic means very long chains of
                          *> + could overflow - but this is demo only
                          add 1 to how-many
                       else
                          set in-incr to false
                          invoke swapsies::Add(new BFIncrMany(how-many))
                       end-if
                   else 
                       if current-token::BFSymbol = "+"
                          add 1 to how-many
                          set in-incr to true
                       else
                          invoke swapsies::Add(current-token)
                       end-if
                   end-if
               end-perform
               
               invoke ast::Clear
               perform varying current-token through swapsies
                   invoke ast::Add(current-token)
               end-perform
               
           end method.
       
       end class.       

Now we can see how the large chains of increment token generated code have been replaced by simple additions:

01 data-block binary-short occurs 32768.
01 data-ptr binary-long value 1.
01 input-ptr binary-long value 1.
01 output-list type List[binary-short] value new List[binary-short].
01 data-size binary-long.
01 input-size binary-long.
procedure division
    using by value input-block as binary-short occurs any
    returning output-block     as binary-short occurs any.
set data-block-size  data-block::Length
set input-block-size input-block::Length
add 10 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
if data-ptr = data-block-size 
    move 1 to data-ptr
else
    add 1 to data-ptr
end-if
add 17 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
add 27 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
add 30 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
add 31 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
if data-ptr = 1
    move data-block-size to data-ptr 
else
    subtract 1 from data-ptr
end-if
if data-ptr = 1
    move data-block-size to data-ptr 
else
    subtr...


I really enjoyed that. I got back to my nerd roots!
If you got as far as the bottom of this post - thanks for reading and keep having fun!

License

This article, along with any associated source code and files, is licensed under The Creative Commons Attribution-ShareAlike 2.5 License

About the Author

alex turner
Web Developer
United Kingdom United Kingdom
I am now a Software Systems Developer - Senior Principal at Micro Focus Plc. I am honoured to work in a team developing new compiler and runtime technology for Micro Focus.
 
My past includes a Ph.D. in computational quantum mechanics, software consultancy and several/various software development and architecture positions.
 
For more - see
 
blog: http://nerds-central.blogspot.com
 
twitter: http://twitter.com/alexturner

Comments and Discussions

 
GeneralMy vote of 5 PinmemberRugbyLeague10-Oct-12 4:25 
GeneralYou're a sick, sick man, part deux PinmemberGary Wheeler1-Jun-10 0:41 
GeneralRe: You're a sick, sick man, part deux Pinmemberalex turner1-Jun-10 1:50 
GeneralRe: You're a sick, sick man, part deux PinmemberGary Wheeler1-Jun-10 1:58 
GeneralRe: You're a sick, sick man, part deux Pinmemberalex turner1-Jun-10 1:59 
GeneralYou're a sick sick man PinmemberLee Humphries26-May-10 13:48 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web03 | 2.8.140709.1 | Last Updated 25 May 2010
Article Copyright 2010 by alex turner
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid