Click here to Skip to main content
16,017,238 members
Articles / Programming Languages / F#

Using an F# DSL to Generate C# code

Rate me:
Please Sign up or sign in to vote.
4.33/5 (2 votes)
3 May 2021CPOL10 min read 5.9K   1   2
More readable Domain Specific Language in F#
My current project is for a company called HDMP (The website is in Dutch / French). We make software for medical practitioners. I am working on a service that communicates with web services created by the Belgian government. The services are quite old, proven by the fact that they expect as input a flat file with specific record formats, the so called efact format. I am parsing this file using the excellent FileHelpers package, which allows describing fixed length file formats. This article is only about the code generation, not about the use of the Filehelpers library.

In the efact file, there over 10 different record formats, all with a fixed length of 370 bytes. Fields in a record are positional. To describe this, I created a more readable Domain Specific Language in F#. In this article, I will demonstrate how this works. The code for this article can be found on GVerelst/CodeGen: F# DSL for C#Code generation. (github.com).

Prerequisites

  • You’ll need a little bit of C# knowledge to follow this post. In particular, I will show how to create a couple of C# classes, with some (custom) attributes. But in the end, the F# program will just generate some text that happens to be C# code.
  • F# knowledge will help, but I will explain most of what I’m doing in this post.

The Problem

The code that is needed for FileHelpers to work with fixed records looks like this:

C#
// //////////////////////////////////////
// FileInfoBase
// //////////////////////////////////////

[FixedLengthRecord()]
public partial class FileInfoBase
{
    // Segment segment200
    [EFactMetadata("200", "6N", "1-6", "Naam van het bericht", "Nom du message"),
     FieldFixedLength(6), FieldAlign(AlignMode.Right, '0')]
     public int MessageName { get; set; } = 920000; // 920000|920900|...
    [EFactMetadata("2001", "2N", "7-8", "Code fout", "Code érreur"),
     FieldFixedLength(2), FieldAlign(AlignMode.Right, '0')]
     public byte Error2001 { get; set; } = 0;
    [EFactMetadata("201", "2N", "9-10", "Versienummer formaat van het bericht",
     "N° version du format du message"), FieldFixedLength(2),
     FieldAlign(AlignMode.Right, '0')] public byte MessageVersionNumber { get; set; } // 2
    // ...
    [EFactMetadata("204", "14N", "21-34", "Referentie bericht ziekenhuis",
     "Reference du message"), FieldFixedLength(14), FieldTrim(TrimMode.Both)]
     public string InputReference { get; set; } = new string('0', 14);
    // ...
    [EFactMetadata("3091", "2N", "206-207", "Code fout", "Code érreur"),
     FieldFixedLength(2), FieldAlign(AlignMode.Right, '0')]
     public byte Error3091 { get; set; } = 0;
}

We describe the record fields mainly using attributes:

  • The FileHelper attributes that describe the format of the field
  • The EFactMetadata custom attribute that gives some additional information about the field. It contains the name of the zone (example: “200”. Remember, this is an archaic record format), the type of the zone (example: “6N” or “45A”), the position in the record (calculated), and then the translation in Dutch and French.
  • And in addition, we also give the field a data type, a name, an optional default value and an optional comment.

The documentation describes the file format using these terms. I also created a file viewer to show the contents in a user-friendly way, hence the translated fields. This will not be on GitHub. The attributes allow the use of reflection in the user interface to show the file format. The documentation also uses a notion of segments to describe a block in the record formats that can be reused in similar record formats. We want to mimic this behavior as well.

Clearly, this is a lot of error prone code to type, and also not very readable because of all the clutter. If only we could represent this in a more concise and readable way, and generate the necessary code from this …

Defining the Internal Domain Specific Language in F#

Looking at this, we can see some needed entities. The first thing we need to describe is a zone (which will translate into a property in the generated class. For the first zone (“200”), this can look like:

F#
Z "200" (N 6) "Dutch name" "French name" Int "PropertyName" "920000" "920000|920900|..."

This contains all the data we need to describe a field with all its attributes. Let’s create the Zone type:

F#
type Zone = { zone: string; length: Length; nl: string; fr: string; 
              datatype: Datatypes; name: string; defaultvalue: string; comments: string}

and a constructor for this type:

F#
let Z zone length nl fr dt name dft comments =
    { zone=zone; length = length; nl= nl; fr = fr; datatype=dt; name=name; 
      defaultvalue=dft; comments=comments }
  • zone: the zone name (being “200”, “2001”, …)
  • length: definition of the length (ex N 6 indicates 6 digits, A 5 indicates 5 characters)
  • nl: description in Dutch
  • fr: description in French
  • The rest of the fields are clear.
This is valid F# code, embedded in our project. The nice thing is that the compiler will prevent a lot of errors for us. This is what is called an “internal DSL”. An external DSL describes a separate language, with its own syntax rules. This means that the interpretation of the external DSL needs to be written as well.

There are some unknown parts in there:

F#
Z "200" (N 6) "Naam van het bericht" "Nom du message" 
               Int "PropertyName" "920000" "920000|920900|..."

The datatype is Int, we must describe this as well. This could have been just a string, but that doesn’t allow for validation. Ideally, we want the F# compiler to catch as many errors as possible before we start to generate the code. So here is the Datatypes enumeration:

F#
type Datatypes = Bool | CRC | Byte | Short | Int | DateTime | 
                 Time | String | Money | AmbHos | Gender | Error

Now the F# compiler will only allow these datatypes. Depending on the datatype, we can generate slightly different C# code. Example:

Int generates this:

C#
[EFactMetadata("200", "6N", "1-6", "Naam van het bericht", "Nom du message"), 
FieldFixedLength(6), FieldAlign(AlignMode.Right, '0')] 
public int MessageName { get; set; } = 920000; // 920000|920900|...

And String will generate this:

C#
[EFactMetadata("204", "14N", "21-34", "Referentie bericht ziekenhuis", 
"Reference du message"), FieldFixedLength(14), FieldTrim(TrimMode.Both)] 
public string InputReference { get; set; } = new string('0', 14);

Of course, the other datatypes generate their own versions.

Having this in place already reduces the number of hard to find errors in the C# code.

F#
Z "200" (N 6) "Naam van het bericht" "Nom du message" 
Int "PropertyName" "920000" "920000|920900|..."

We also see a Length. The constructor (N 6) is actually composed of the length type (“A” is alphabetic, “N” is numeric, “S” is numeric, but prefixed with ‘+’ or ‘-‘. This will later be used in the code generation. Let’s describe this:

F#
type LengthType = A | N | S
type Length = { ltype: LengthType; length: int }

and we create three constructor functions:

F#
let N x = { ltype= N; length= x }
let A x = { ltype= A; length= x }
let S x = { ltype= S; length= x }

N 6 will now return a new Length record with ltype = N, length = 6. Having these three little functions allows the F# again to validate the code at compile time.

Recap of the Definition of the Zone So Far

F#
type Datatypes = Bool | CRC | Byte | Short | Int | DateTime | Time | 
                 String | Money | AmbHos | Gender | Error 
type LengthType = A | N | S
type Length = { ltype: LengthType; length: int }
type Zone = { zone: string; length: Length; nl: string; fr: string; 
              datatype: Datatypes; name: string; defaultvalue: string; comments: string}

let N x = { ltype= N; length= x }
let A x = { ltype= A; length= x }
let S x = { ltype= S; length= x }

let Z zone length nl fr dt name dft comments =
    { zone=zone; length = length; nl= nl; fr = fr; datatype=dt; name=name; 
      defaultvalue=dft; comments=comments }

These 9 lines of code allow us to create zones in a concise and clear way. Let’s add semantics to this. Some zones are of the same type, and have a specific meaning. For example, I defined the Recordtype function as:

F#
let Recordtype rectype zone = 
    let rt = rectype.ToString()
    Z zone (N 2) ("recordtype " + rt) ("enregistrement de type " + rt) 
                   Byte "Recordtype" rt ("Always " + rt);

Every efact record will have a specific record type, the 2 first bytes of the record. They always have the same NL and FR description, so I made a new function for this. The function on itself is not to save typing, but to give semantics to this field.

F#
Recordtype 95 "400"

Indicates a record that is a record type. We can also write this out in the code as:

F#
Z "40" (N 2) "recordtype 95" " enregistrement de type 95" Byte "Recordtype" "95" "Always 95"

It is not much longer (copy / paste is your friend here), but a lot clearer on what it means. So in the same style, I defined Mutuality:

F#
let Mutuality zone =
    Z zone (N 3) "Nummer mutualiteit" "Numéro de Mutualité" Int "MutualityNumber" "" ""

And again:

F#
Mutuality "401"

indicates very clearly what we mean here. I made some more:

F#
let Errorcode zone =
    let nzone = normalizeName zone
    Z zone (N 2) "Code fout" "Code érreur" Error ("Error" + nzone) "0" ""

let Reserved l zone =
    let nzone = normalizeName zone
    let dft = match l.ltype with
                | A -> sprintf "new string(' ', %d)" l.length 
                | N -> sprintf "new string('0', %d)" l.length 
                | S -> sprintf "'+' + new string('0', %d)" (l.length - 1)
    Z zone l "Reserve" "Reserve" String ("Reserved" + nzone) dft ""

As you can see, a reserved zone can only be of datatypes A | N | S. For each of the cases, I defined the outcome. No more need to think about what kind of attributes need to be generated, and it is clear that this is a zone that is there as a filler, in case more zones would be needed in the future (remember, this is an archaic format).

This now gives us a (domain specific) language to describe the records, for example:

F#
Recordtype 95 "400"
Errorcode "4001"
Mutuality "401"
Errorcode "4011"
Z "402" (N 12) "Nummer van verzamelfactuur"
               "Numéro de facture récapitulative" String "RecapInvoiceNumber" "" ""
Errorcode "4021"
// ...
Reserved (N 257) "413"

Now we have a way to describe the zones in the flat file that will be converted into properties in a C# class. Let’s extend the DSL to include classes. In the eFact documentation, there are some predefined structures called segments. A segment has a name and is composed of 1 or more zones. These segments will be put together in a class. So a class is a named collection of segments, and a segment is a named collection of zones. A class can also inherit from another class, which saves some more typing. A namespace is a named collection of classes, and finally a program (I didn’t find a better name for this) is composed of namespaces, and has a filename.

Image 1

Here are the definitions:

F#
type Segment = { name: string; zones: Zone list }
type Interface = { name: string; lines: string list }
type Record = { name: string; inherits: Record option; 
                implements: Interface list; segments: Segment list }
type Namespace = { name: string; records: Record list }
type Program = { filename: string; baseNamespace: string; namespaces: Namespace list }

Let’s define a small program:

F#
let segment200 = 
    { 
        name= "segment200"; 
        zones= 
        [
            Z "200" (N 6) "Naam van het bericht" "Nom du message" Int " 
                           MessageName" "920000" "920000|920900|..."
            Errorcode "2001" 
            Z "201" (N 2) "Versienummer formaat van het bericht" 
                    "N° version du format du message" Byte " MessageVersionNumber" "" "2"
            Errorcode "2011" 
	     // ...
            Z "205" (N 14) "Referentie bericht VI" 
                           "Reference du message OA" String " ReferenceOA" "" ""
            Errorcode "2051" 
            Reserved (N 15) "206"
        ] 
    }

let segment300 = 
    { 
        name= "segment300"; 
        zones= 
        [
            Z "300a" (N 4) "Factureringsjaar"  "Année de facturation"  Int "YearBilled" "" ""
            Z "300b" (N 2) "Factureringsmaand" "Mois de facturation"  Byte "MonthBilled" "" ""
            Errorcode "3001" 
            Z "301" (N 3) "Nummer van de verzendingen" "Numero d''envoi" Int " RequestNr" "" ""
            Errorcode "3011" 
            Z "302" (N 8) "Datum opmaak factuur" 
              "Date de création de facture" DateTime " Creationdate" "" ""
            Errorcode "3021" 
	     // ...
            Z "309" (N 2) "Type facturering" "Type facturation" Byte "Invoicingtype" "" ""
            Errorcode "3091" 
        ] 
    }

let fileInfoBase =
   {
        name= "FileInfoBase"; 
        inherits = None;
        implements = [];
        segments= 
        [
           segment200 
           segment300
        ]
   }

let fileInfo =
    {
        name= "FileInfo"; 
        inherits = Some fileInfoBase;
        implements = [];
        segments= 
        [
            segment300a
        ]
    }

// ...
let namespaceRequests =
    {
        name="Requests";
        records=
        [
            fileInfoBase
            fileInfo
	     // ...
        ]
    }

let namespaceSettlement =
    {
        name="Settlement";
        records=
        [
	     // ...
        ]
    }

let prog = 
    { 
        filename="eFact.cs";
        baseNamespace="HdmpCloud.eHealth.eFact.Serializer.Recordformats.";
        namespaces = 
        [
            namespaceRequests
            namespaceSettlement
        ]
    }

As you can see, the definition of all the needed datatypes is about 10 lines, and very readable:

F#
type Datatypes = Bool | CRC | Byte | Short | Int | DateTime | Time | 
                 String | Money | AmbHos | Gender | Error 
type LengthType = A | N | S
type Length = { ltype: LengthType; length: int }
type Zone = { zone: string; length: Length; nl: string; fr: string; 
              datatype: Datatypes; name: string; defaultvalue: string; comments: string}
type Segment = { name: string; zones: Zone list }
type Interface = { name: string; lines: string list }
type Record = { name: string; inherits: Record option; implements: Interface list; 
                segments: Segment list }
type Namespace = { name: string; records: Record list }
type Program = { filename: string; baseNamespace: string; namespaces: Namespace list }

Then we defined some helper functions to make the definition of the zones a bit easier, and to give it semantic meaning. And now we have described the zones, segments, records, namespaces and the program. This is done in about 1000 lines of code.

Let’s Generate Some C#

Nice. We have described our language (DSL), and we have described what our C# classes should look like. We can compile this program, and if it succeeds, we know that the program in our DSL is syntactically correct. Time to generate the code, so this becomes useful.

To start, let’s output a Zone. This will be output as a property in a C# class. Don’t mind the pos parameter yet.

F#
let outputZone pos zone  =
    let (declaration, att3) = outputDeclaration zone
    let att1 = outputEFactMetadata zone pos
    let att2 = sprintf "FieldFixedLength(%d)" zone.length.length

    let attslist = [ att1; att2; att3 ]

    let atts = attslist |> List.reduce (fun a b -> a + ", " + b)
    let comment = if zone.comments.Length = 0 then "" else (C2 zone.comments)

    "[" + atts + "] " + declaration + (outputDefaultValue zone) + " " + comment

As you can see, there are some helper functions here. I’ll discuss them below.

The outputZone function takes 2 parameters: pos and zone. The output is a string describing a C# property with the necessary attributes. This is the central function in the code generation. The output type of this function is a string. In the end, the generated program will just be a list of strings to be written into a file.

In F#, a function can only be used if it was defined before the calling function. At first this is a pain, but it forces you to have a correct dependency structure. Typically, this results in a list of small functions that are composed into more useful functions. Let’s look at some of the functions in “generator.fs”, which contains the code to generate the C# classes.

Very simple function to generate the string “5N” from the type (N 5):

F#
let outputLength (l: Length) =
    sprintf "%d%A" l.length l.ltype

Make the first character of a string uppercase:

F#
let captitalize (s:string) =
    if s.Length = 0 then ""
    else s.Substring(0,1).ToUpper() + s.Substring(1)

Create the EFactMetadata attribute:

F#
// EFactMetadata("312", "449", "352-800", "Reserve", "Reserve") 
let outputEFactMetadata zone pos =
    let nl = captitalize zone.nl
    let fr = captitalize zone.fr
    let rng = sprintf "%d-%d" pos (pos + zone.length.length - 1)
    sprintf "EFactMetadata(\"%s\", \"%s\", \"%s\", \"%s\", \"%s\")" 
             zone.zone (outputLength zone.length) rng nl fr

The function is straightforward, thanks to the use of the small helpers.

F#
// public string Reserve9 { get; set; } = new string(' ', 449);
let outputDeclaration zone =
    let (dt, att) = match zone.datatype with
                     | CRC -> ("byte", "FieldTrim(TrimMode.Both)")
                     | Int -> ("int", if (zone.length.ltype = LengthType.S )
                                         then sprintf 
                                         "FieldConverter(typeof(SignedIntConverter), %d)" 
                                         zone.length.length
                                         else "FieldAlign(AlignMode.Right, '0')")
                    // ...
                     | Gender -> ("Gender", "FieldConverter(typeof(EnumIntConverter),1)")

    (sprintf "public %s %s { get; set; }" dt zone.name, att)

// [EFactMetadata("312", "449", "352-800", "Reserve", "Reserve"), 
//  FieldFixedLength(450), FieldValueDiscarded] 
//  public string Reserve9 { get; set; } = new string(' ', 449);
let outputZone pos zone  =
    let (declaration, att3) = outputDeclaration zone
    let att1 = outputEFactMetadata zone pos
    let att2 = sprintf "FieldFixedLength(%d)" zone.length.length

    let attslist = [ att1; att2; att3 ]

    let atts = attslist |> List.reduce (fun a b -> a + ", " + b)
    let comment = if zone.comments.Length = 0 then "" else (C2 zone.comments)

    "[" + atts + "] " + declaration + (outputDefaultValue zone) + " " + comment

The First Function With Some Logic in it: outputSegment

We want to output a segment, which is a number of zones. There will be a loop to cover all the zones, but in functional programming, we avoid loops as much as possible. F# provides us with a lot of functions to handle collections.

The output we want is not just a line for each zone, but given that eFact files are records with fixed-length fields, we also want to indicate the position of the field in the record. We saw before that each record has a length, this allows us to calculate the positions. Here is some partial output of a zone:

F#
// Segment segment200
[EFactMetadata("200", "6N", "1-6", "Naam van het bericht", "Nom du message"), 
FieldFixedLength(6), FieldAlign(AlignMode.Right, '0')] 
public int MessageName { get; set; } = 920000; // 920000|920900|...
[EFactMetadata("2001", "2N", "7-8", "Code fout", "Code érreur"), 
FieldFixedLength(2), FieldAlign(AlignMode.Right, '0')] 
public byte Error2001 { get; set; } = 0;
[EFactMetadata("201", "2N", "9-10", "Versienummer formaat van het bericht", 
"N° version du format du message"), FieldFixedLength(2), 
FieldAlign(AlignMode.Right, '0')] public byte MessageVersionNumber { get; set; } // 2
[EFactMetadata("2011", "2N", "11-12", "Code fout", "Code érreur"), 
FieldFixedLength(2), FieldAlign(AlignMode.Right, '0')] 
public byte Error2011 { get; set; } = 0;

Notice the 3rd parameter of the eFactMetadata attribute (“1-6”, “7-8”, “9-10”, “11-12”, …). This is a running total that is calculated using a start position and the lengths of the zones. Remember that the outputZone function takes a pos parameter, this explains why. Here is the function:

let outputSegment start (seg: Segment)  =
    let (endpos, lines) = 
        seg.zones |> List.fold (fun (pos, lines) z -> 
            let z2 = outputZone pos z
            (pos + z.length.length, z2::lines)
                                    ) (start, [])

    let zs2 = (C2 ("Segment " + seg.name)) :: (lines |> List.rev)

    (endpos, zs2)

A record is composed of one or multiple segments, so we need a start position and we return the end position for this segment. Later the outputRecord function will use the same trick as we use here for the position in the EfactMetadata attribute.

The main loop is implemented in the List.fold function:

F#
seg.zones |> List.fold (fun (pos, lines) z -> 
            let z2 = outputZone pos z
            (pos + z.length.length, z2::lines)
                       ) (start, [])

Taking the collection of zones as its input, List.fold will iterate over each zone and apply an accumulator function to it. The accumulator is the tuple (pos, lines), which indicates that we are accumulating 2 things at the same time: the position and the generated lines.

F#
let z2 = outputZone pos z			// generates the line for the current position
(pos + z.length.length, z2::lines)	// returns pos plus the length of the zone 
       // and the generated line in front of all the lines that were already generated

The result is that we now have our lines with the position correctly filled, but in reverse order. This explains the following line:

F#
let zs2 = (C2 ("Segment " + seg.name)) :: (lines |> List.rev)

If you like, you can read the rest of the code on GitHub. Most of the code is straightforward from this point on.

More Enhancements

One simple enhancement is this:

F#
let C2 s = "// " + s

Now we can generate comments like C2 “Segment 200”.

Errorcodes

In the efact format, there are many Errorcode fields. They always look the same:

F#
Z  "2001" (N 2) "Code fout" "Code érreur" Error ("Error" + nzone) "0" ""

This is always a 2-digit field (N 2), so we can define a new function for this:

let Errorcode zone =
    let nzone = normalizeName zone
    Z zone (N 2) "Code fout" "Code érreur" Error ("Error" + nzone) "0" ""

Errorcode “2001” will now create a 2-digit zone in a descriptive way.

Reserved Zones

There are also 2 types of reserved zones: numeric and alphabetic. Depending on their type, they will be filled up with different values. They are the FILLERS in good old COBOL (and yes, this says something about my age).

To describe them, we make another function:

F#
let Reserved l zone =
    let nzone = normalizeName zone
    let dft = match l.ltype with
                | A -> sprintf "new string(' ', %d)" l.length 
                | N -> sprintf "new string('0', %d)" l.length 
                | S -> sprintf "'+' + new string('0', %d)" (l.length - 1)
    Z zone l "Reserve" "Reserve" String ("Reserved" + nzone) dft ""

Conclusion

Describing the data model for the classes to be generated takes about 15 lines of code. Then we defined a couple of small helper functions and some bigger functions to generate the code. The generator.fs file contains 163 lines of code. With this, we can describe our program in a readable way. We also added some semantics to the code with constructor functions to describe fillers, errors, a mutuality, … I think this is a nice demonstration of F# as a functional language.

References

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect Faq.be bvba
Belgium Belgium
Gaston Verelst is the owner of Faq.be, an IT consultancy company based in Belgium (the land of beer and chocolate!) He went through a variety of projects during his career so far. Starting with Clipper - the Summer '87 edition, he moved on to C and mainly C++ during the first 15 years of his career.

He quickly realized that teaching others is very rewarding. In 1995, he became one of the first MCT's in Belgium. He teaches courses on various topics:
• C, C++, MFC, ATL, VB6, JavaScript
• SQL Server (he is also an MSDBA)
• Object Oriented Analysis and Development
• He created courses on OMT and UML and trained hundreds of students in OO
• C# (from the first beta versions)
• Web development (from ASP, ASP.NET, ASP.NET MVC)
• Windows development (WPF, Windows Forms, WCF, Entity Framework, …)
• Much more

Of course, this is only possible with hands-on experience. Gaston worked on many large scale projects for the biggest banks in Belgium, Automotive, Printing, Government, NGOs. His latest and greatest project is all about extending an IoT gateway built in MS Azure.

"Everything should be as simple as it can be but not simpler!" – Albert Einstein

Gaston applies this in all his projects. Using frameworks in the best ways possible he manages to make code shorter, more stable and much more elegant. Obviously, he refuses to be paid by lines of code!

This led to the blog at https://msdev.pro. The articles of this blog are also available on https://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=4423636, happy reading!

When he is not working or studying, Gaston can be found on the tatami in his dojo. He is the chief instructor of Ju-Jitsu club Zanshin near Antwerp and holds high degrees in many martial arts as well.

Gaston can best be reached via https://www.linkedin.com/in/gverelst/.


Comments and Discussions

 
PraiseVery similar approach Pin
Jarmo Muukka12-May-21 11:03
Jarmo Muukka12-May-21 11:03 
GeneralRe: Very similar approach Pin
Gaston Verelst15-May-21 0:09
Gaston Verelst15-May-21 0:09 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.