
XsdTidy beautifies the Xsd.exe output *with full DocBook .NET Wrapper*

By Jonathan de Halleux, 1 Mar 2004
 

[Figure: XsdTidy sample image (xsdtidy.png)]

If you like this tool, support it by voting, if you don't like it, make your vote verbose...

Pre-Introduction

The XsdTidy tool has been entirely rebuilt from scratch using CodeDom, which is much easier to handle than Emit. The new version is called Refly and is available at the Refly article.

Introduction

XsdTidy is a refactoring tool to overcome some silly limitations of the exceptional Xsd.exe (see [1]) tool provided with the .NET framework. More specifically, XsdTidy addresses the following problems:

  • Name normalization: if your XSD schema uses lower case names or, more generally, names that do not follow the .NET conventions, you will end up with types that make FxCop (see [2]) spit out hundreds of infractions.
  • Fixed Array Sizes: xsd.exe handles multiple elements by creating an array. There is no problem when you are loading the data, but this is not convenient if you want to populate a document, since arrays do not support Add or Remove. XsdTidy uses ArrayList for more flexibility.
  • Default Constructor: Xsd.exe does not provide a default constructor that initializes the fields with the proper values. Doing this by hand becomes very tedious when the object structure gets big.

XsdTidy achieves refactoring by creating a new class for each type exported by the Xsd.exe tool, using the System.Reflection.Emit namespace. It also takes care of "transferring" the Xml.Serialization attributes to the factored classes. Hence, the factored classes are more .NET-ish and still output the same XML. Moreover, there is no dependency between the refactored code and the original code.

As a nice application of the tool, a full .NET wrapper of the DocBook schema (see [3]) is provided with the project. This .NET wrapper lets you write or generate DocBook XML easily with the help of Intellisense.

Fixing problems

Name conversion

The .NET standards define specific naming conventions for all kinds of identifiers: arguments should be camel case, function names capitalized, etc. This is really helpful to keep the framework consistent. Tools like FxCop help us stay on the "normalized" side.

This problem is tackled the dumb way: given a dictionary of "common" words, the class NameConformer tries to split a name into separate words and then renders it in the required convention.

There is much room for improvement in the list of words and in the algorithm that splits the name. Any contribution is welcome.
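To make the idea concrete, here is a minimal sketch of a dictionary-based splitter. It is not the NameConformer class shipped with XsdTidy: the class name NameSplitterSketch, its method names and the greedy longest-match strategy are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch, not the NameConformer class from XsdTidy.
public static class NameSplitterSketch
{
    // Splits a name into known words using greedy longest-match.
    // Characters that match no word are grouped into their own part.
    public static List<string> Split(string name, ICollection<string> words)
    {
        List<string> parts = new List<string>();
        StringBuilder unknown = new StringBuilder();
        string lower = name.ToLowerInvariant();
        int pos = 0;
        while (pos < lower.Length)
        {
            // Find the longest dictionary word starting at pos.
            int best = 0;
            foreach (string w in words)
            {
                if (w.Length > best &&
                    string.CompareOrdinal(lower, pos, w, 0, w.Length) == 0)
                {
                    best = w.Length;
                }
            }
            if (best == 0)
            {
                unknown.Append(lower[pos]);
                pos++;
            }
            else
            {
                if (unknown.Length > 0)
                {
                    parts.Add(unknown.ToString());
                    unknown.Length = 0;
                }
                parts.Add(lower.Substring(pos, best));
                pos += best;
            }
        }
        if (unknown.Length > 0)
        {
            parts.Add(unknown.ToString());
        }
        return parts;
    }

    // Renders the split name with every word capitalized (type and property names).
    public static string ToPascalCase(string name, ICollection<string> words)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string part in Split(name, words))
        {
            sb.Append(char.ToUpperInvariant(part[0]));
            sb.Append(part, 1, part.Length - 1);
        }
        return sb.ToString();
    }

    // Same as ToPascalCase, but with a lower-case first letter (arguments).
    public static string ToCamelCase(string name, ICollection<string> words)
    {
        string pascal = ToPascalCase(name, words);
        if (pascal.Length == 0)
        {
            return pascal;
        }
        return char.ToLowerInvariant(pascal[0]) + pascal.Substring(1);
    }
}
```

With the word list { "test", "class" }, ToPascalCase("testclass", ...) gives "TestClass". A greedy split can choose the wrong boundary when words overlap, which is one reason the word list and the algorithm leave room for improvement.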

FixedArraySize

Arrays are replaced by System.Collections.ArrayList, which is much more flexible. Moreover, collection fields are created in the default constructor of the class. This saves you the hassle of creating a collection before using it.

Properties

Fields are hidden behind properties, which are more convenient to use. Moreover, collection properties do not have a setter, in accordance with the corresponding FxCop rule.

public class testclass
{
    [XmlElement("values",typeof(int))]
    public int[] values;
}

becomes:

public class TestClass
{
    private ArrayList values;

    [XmlElement("values",typeof(int))]
    public ArrayList Values
    {
        get
        {
            return this.values;
        }
    }
}
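As a quick check that a getter-only ArrayList property still serializes as expected, here is a small self-contained sketch. The class names TidyTestClass and PropertySketch are invented for this example, and the field is initialized inline instead of in a generated default constructor, which is a simplification.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml.Serialization;

// Hand-written stand-in for a class produced by XsdTidy (hypothetical name).
public class TidyTestClass
{
    // Preallocated so that callers can use Add right away.
    private ArrayList values = new ArrayList();

    [XmlElement("values", typeof(int))]
    public ArrayList Values
    {
        get { return this.values; }
    }
}

public static class PropertySketch
{
    // Serializes an instance holding two integers and returns the XML text.
    public static string Serialize()
    {
        TidyTestClass t = new TidyTestClass();
        t.Values.Add(1);
        t.Values.Add(2);

        XmlSerializer ser = new XmlSerializer(typeof(TidyTestClass));
        using (StringWriter sw = new StringWriter())
        {
            ser.Serialize(sw, t);
            return sw.ToString();
        }
    }

    // Reads the XML back; the serializer fills the existing collection.
    public static TidyTestClass Deserialize(string xml)
    {
        XmlSerializer ser = new XmlSerializer(typeof(TidyTestClass));
        using (StringReader sr = new StringReader(xml))
        {
            return (TidyTestClass)ser.Deserialize(sr);
        }
    }
}
```

XmlSerializer fills a read-only collection property by adding to the existing instance, which is why the collection has to be allocated before deserialization starts.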

System.Reflection.Emit

The System.Reflection.Emit namespace is truly and amazingly powerful: it enables you to create new types at runtime and execute them or store them in assemblies for further use. Unfortunately, there are not many tutorials and examples on this advanced topic. In this chapter, I will try to explain my limited understanding of this tool.

What is Emit?

The Emit namespace gives you the tools to write IL (Intermediate Language) instructions and compile them into types. Hence, you can basically do anything with Emit. Typical Emit code looks like this:

// emit
ILGenerator il = ...;
il.Emit(OpCodes.Ldarg_0); 
il.Emit(OpCodes.Ldarg_1);
il.Emit(OpCodes.Stfld, fb);
il.Emit(OpCodes.Ret);

// C# equivalent
this.fb = value;

If you are a newcomer, it can look cryptic, but we'll try to explain the above a bit.

Where to start?

The problem with Emit is that debugging is complicated: if you generate invalid IL code, the runtime will refuse to execute it and throw an error that gives little clue about what went wrong. Moreover, you usually don't have the time to learn the dozens of opcodes that are part of the OpCodes class. Therefore, it would be nice to always have some "model" IL and then try to implement it with Emit.

Fortunately, creating this model is easy! It is possible, using decompilers such as Reflector (see [4]), to read the IL code of any .NET assembly. The idea is simple: open a dummy project where you create the model class that needs to be factored, compile it and use a decompiler to read the IL of your model, and there you go... you have IL code!

[Figure: Reflector]

Emit, the basics

I will cover some very basic facts about using Emit. As mentioned above, the most efficient way to learn is to work with a dummy project and Reflector on the side. We will see here how to emit the IL for a basic C# statement in an instance method, where value is the first argument and field is a member of the class.

if (value==null)
    throw new ArgumentNullException("value");
this.field=value;

Getting an ILGenerator

Usually, you start by creating an AssemblyBuilder, then a ModuleBuilder, then a TypeBuilder and finally you can add methods to the TypeBuilder using TypeBuilder.DefineMethod which returns a MethodBuilder. This instance is then used to retrieve an ILGenerator object which we use to output IL code:

MethodBuilder mb = ...;
ILGenerator il = mb.GetILGenerator();
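The whole builder chain can be sketched as follows. This is an illustration, not code from XsdTidy: the names BuilderChainSketch, DemoAssembly, Demo and Answer are made up, and the static AssemblyBuilder.DefineDynamicAssembly call is the entry point of more recent .NET versions (on the framework versions current when this article was written, the equivalent call was AppDomain.DefineDynamicAssembly).

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class BuilderChainSketch
{
    // Builds a type "Demo" with one static method "Answer" that returns 42.
    public static Type BuildType()
    {
        // 1. assembly, in memory only
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DemoAssembly"), AssemblyBuilderAccess.Run);
        // 2. module
        ModuleBuilder modb = ab.DefineDynamicModule("DemoModule");
        // 3. type
        TypeBuilder tb = modb.DefineType("Demo", TypeAttributes.Public);
        // 4. method: public static int Answer()
        MethodBuilder mb = tb.DefineMethod(
            "Answer",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(int),
            Type.EmptyTypes);
        // 5. IL generator for the method body
        ILGenerator il = mb.GetILGenerator();
        il.Emit(OpCodes.Ldc_I4, 42); // push the constant 42
        il.Emit(OpCodes.Ret);        // return the value on top of the stack

        // The type becomes usable only once it has been created.
        return tb.CreateType();
    }
}
```

The resulting type can be used through reflection, for instance BuilderChainSketch.BuildType().GetMethod("Answer").Invoke(null, null).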

OpCodes

The OpCodes class contains all the IL operations. It has to be used in conjunction with ILGenerator.Emit as we will see in the following.

Arguments

Each time you call a method (static or non-static), the method arguments are accessible through OpCodes.Ldarg_0, OpCodes.Ldarg_1, ... In an instance method, OpCodes.Ldarg_0 loads the "this" reference, so the first declared argument is loaded with OpCodes.Ldarg_1.

Labels

Labels are used to make jumps in the IL code. You need to set up Labels if you want to build instructions such as if...else.... A Label is defined as follows:

Label isTrue = il.DefineLabel();

Once the Label is defined, it can be used in an instruction that makes a jump. When you reach the instruction that the Label should mark, call MarkLabel:

il.MarkLabel(isTrue);

Comparing values to null

Comparing a value to null is done using the OpCodes.Brtrue_S. This instruction makes a jump to a Label if the value is not null.

Label isTrue = il.DefineLabel();
il.Emit(OpCodes.Ldarg_1); // pushing value on the stack
il.Emit(OpCodes.Brtrue_S,isTrue); // if non null, jump to label
// IL code to throw an exception here
...
// marking label
il.MarkLabel(isTrue);
...

Creating objects

To create an object, you must first retrieve the ConstructorInfo of the type, push the constructor arguments on the stack and call the constructor using OpCodes.Newobj. If we use the default constructor of ArgumentNullException, we have:

ConstructorInfo ci = 
    typeof(ArgumentNullException).GetConstructor(Type.EmptyTypes);

Label isTrue = il.DefineLabel();
il.Emit(OpCodes.Ldarg_1);
il.Emit(OpCodes.Brtrue_S,isTrue);
il.Emit(OpCodes.Newobj,ci); // creating new exception
il.Emit(OpCodes.Throw); // throwing the exception
il.MarkLabel(isTrue);
...

You can clearly see the "jump across the exception" with the label isTrue.

Assigning fields

The last step is to assign the value (stored in the first argument) to the field. To do so, we need to push the "this" reference on the stack (OpCodes.Ldarg_0), push the first argument (OpCodes.Ldarg_1) and use OpCodes.Stfld:

// Type t is the class type
FieldInfo fi = t.GetField("field");
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldarg_1);
il.Emit(OpCodes.Stfld,fi);

Finish the work

To close a method, use OpCodes.Ret:

il.Emit(OpCodes.Ret);
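Putting the pieces of this chapter together, the following sketch emits the complete null-checked setter and wraps it in a type that can be instantiated. The names SetterSketch, Holder and SetField are assumptions for this example. Two details differ from the fragments above: the field comes from TypeBuilder.DefineField rather than GetField, and the exception is built with the constructor that takes the parameter name, to match the C# model.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class SetterSketch
{
    // Emits:  public class Holder {
    //             public string field;
    //             public void SetField(string value) {
    //                 if (value == null) throw new ArgumentNullException("value");
    //                 this.field = value;
    //             }
    //         }
    public static Type BuildHolder()
    {
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("SetterDemo"), AssemblyBuilderAccess.Run);
        ModuleBuilder modb = ab.DefineDynamicModule("SetterDemo");
        TypeBuilder tb = modb.DefineType("Holder", TypeAttributes.Public);

        FieldBuilder fb = tb.DefineField(
            "field", typeof(string), FieldAttributes.Public);

        MethodBuilder mb = tb.DefineMethod(
            "SetField", MethodAttributes.Public,
            typeof(void), new Type[] { typeof(string) });
        ILGenerator il = mb.GetILGenerator();

        // ArgumentNullException(string paramName)
        ConstructorInfo ci = typeof(ArgumentNullException)
            .GetConstructor(new Type[] { typeof(string) });

        Label isNotNull = il.DefineLabel();
        il.Emit(OpCodes.Ldarg_1);             // push value
        il.Emit(OpCodes.Brtrue_S, isNotNull); // non-null: skip the throw
        il.Emit(OpCodes.Ldstr, "value");      // constructor argument
        il.Emit(OpCodes.Newobj, ci);          // new ArgumentNullException("value")
        il.Emit(OpCodes.Throw);
        il.MarkLabel(isNotNull);
        il.Emit(OpCodes.Ldarg_0);             // push this
        il.Emit(OpCodes.Ldarg_1);             // push value
        il.Emit(OpCodes.Stfld, fb);           // this.field = value
        il.Emit(OpCodes.Ret);

        // No constructor was defined, so a default one is added automatically.
        return tb.CreateType();
    }
}
```

An instance can then be created with Activator.CreateInstance and the method called through reflection. Passing null surfaces as a TargetInvocationException whose InnerException is the ArgumentNullException.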

Refactoring

Main steps

The refactoring is handled by the XsdWrapperGenerator class. The main factoring steps are:

  1. Create an AssemblyBuilder and define a new ModuleBuilder.
  2. For each user-provided type that needs to be refactored, define a new TypeBuilder in the ModuleBuilder.
  3. For public fields in the source type, generate a field in the refactored type.
  4. Define default constructors for each factored class.
  5. Add properties for each field in the factored types and copy the XML serialization attributes to the properties.

During the process of factoring, special care is taken with nullable/non-nullable types and collection handling:

  • collections are preallocated for easier use,
  • non-nullable fields are allocated, nullable fields are left null,
  • non-nullable fields are always checked against null, while nullable fields are not checked.

Once the factoring is finished, the types are created and saved to an assembly.
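The field-scanning part of these steps can be illustrated with plain reflection. The sketch below is not taken from XsdWrapperGenerator: the names FieldScanSketch, SampleSource, Plan and CountXmlElementAttributes are hypothetical, and the sketch only decides which type each refactored field would get and locates the XmlElement attributes that would have to be copied.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
using System.Xml.Serialization;

// Stand-in for a class generated by Xsd.exe (hypothetical).
public class SampleSource
{
    [XmlElement("values", typeof(int))]
    public int[] values;

    public string name;
}

public static class FieldScanSketch
{
    // Maps each public instance field to the type its refactored
    // counterpart would use: arrays become ArrayList, the rest is unchanged.
    public static Dictionary<string, Type> Plan(Type source)
    {
        Dictionary<string, Type> plan = new Dictionary<string, Type>();
        FieldInfo[] fields = source.GetFields(
            BindingFlags.Public | BindingFlags.Instance);
        foreach (FieldInfo f in fields)
        {
            plan[f.Name] = f.FieldType.IsArray ? typeof(ArrayList) : f.FieldType;
        }
        return plan;
    }

    // Counts the XmlElement attributes on a field, i.e. the attributes
    // that would have to be copied to the generated property.
    public static int CountXmlElementAttributes(Type source, string fieldName)
    {
        FieldInfo f = source.GetField(fieldName);
        return f.GetCustomAttributes(typeof(XmlElementAttribute), false).Length;
    }
}
```

For SampleSource, the plan maps values to ArrayList and name to string, and one XmlElement attribute is found on values.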

Using XsdWrapperGenerator

The XsdWrapperGenerator class encapsulates all the "wrapping" functionality: create a new instance, add the types you need refactored and save the result to a file:

XsdWrapperGenerator gen = new XsdWrapperGenerator(
    "CodeWrapper", // output namespace and assembly name
    new Version(1, 0) // output assembly version
    );

// adding types
gen.AddClass( typeof(myclass) );
...

// refactor
gen.WrapClasses();
// save to file, this invalidates gen.
gen.Save();

The name passed to the constructor is used as default namespace and output assembly name.

Using the Command Line application

XsdWrapperGenerator comes with a minimal console application that loads an assembly, searches it for types, refactors them and outputs the results. The calling convention is as follows:

XsdTidy.Cons.exe AssemblyName WrappedClassNamespace OutputNamespace Version

where

  • AssemblyName is the name of the assembly to scan (without .dll)
  • WrappedClassNamespace is the namespace from which the types are extracted
  • OutputNamespace is the factored namespace
  • Version is the version number: major.minor.build.revision

NDocBook

DocBook is an XML standard to describe a document. It is a very powerful tool since the same XML source can be rendered in almost all possible output formats: HTML, CHM, PDF, etc. This richness comes at a price: DocBook is complicated for the beginner and it tends to be XML-ish.

This was the starting point of the article for me: I needed to generate DocBook XML to automatically generate code in GUnit (see [5]) but I wanted to take advantage of VS Intellisense.

The first step was to generate the .NET classes mapping the DocBook schema using the Xsd.exe tool. The generated code had some problems that would make it unusable: non-nullable fields were not initialized automatically and this would lead to a lot of manual work.

Hence, the second step was to write XsdTidy and apply it to DocBook. Here's an example of use:

// creating a book object
Book b = new Book();
// title is nullable, so we must allocate it
b.Title = new Title();
// text is a collection, preallocated
b.Title.Text.Add("My first book");
// nullable
b.Subtitle = new Subtitle();
b.Subtitle.Text.Add("A subtitle");

Toc toc = new Toc();
b.Items.Add(toc);
toc.Title = new Title();
toc.Title.Text.Add("What a Toc!");

Part part = new Part();
b.Items.Add(part);
part.Title = new Title();
part.Title.Text.Add("My first page");
            
// generate xml using XmlSerialization tools
using (StreamWriter sw = new StreamWriter("mybook.xml"))
{
    XmlTextWriter writer = new XmlTextWriter(sw);
    writer.Formatting = Formatting.Indented;
    XmlSerializer ser = new XmlSerializer(typeof(Book));
    ser.Serialize(writer,b);
}

The output of this snippet looks like this:

<?xml version="1.0" encoding="utf-8"?>
<book xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <title>My first book</title>
  <subtitle>A subtitle</subtitle>
  <toc>
    <title>What a Toc!</title>
  </toc>
  <part>
    <title>My first page</title>
  </part>
</book>

Now, with Intellisense on our side, I am much more comfortable with DocBook...

Conclusion

System.Reflection.Emit is a powerful tool that deserves more attention than it currently gets. It can be used to generate optimized parsers (as Regex does), runtime typed DataSets, etc.

History

  • 20/2/2004, initial try-out.

References

  1. XSD Schema Definition Tool
  2. FxCop
  3. DocBook
  4. Reflector
  5. GUnit

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Jonathan de Halleux
Engineer
United States
Jonathan de Halleux is Civil Engineer in Applied Mathematics. He finished his PhD in 2004 in the rainy country of Belgium. After 2 years in the Common Language Runtime (i.e. .net), he is now working at Microsoft Research on Pex (http://research.microsoft.com/pex).


Comments and Discussions

QuestionHelp, Update or Alternative?memberJames Coleman26 Feb '07 - 4:54 
It has been a couple of years now since this article has been posted. XSD.EXE still has the same drawbacks of using arrays and such. I wasn't successful in using this tool from a cmd line nor could I figure out how to use it in an application correctly. Can someone tell me how I would be able to use XSDTidy (or some other app) to pass in an XSD file and generate a class file that has strongly typed collections (and possibly instantiates any child collections so I can avoid using "New" so often).
 
Thanks
~james
colema18@gmail.com
AnswerRe: Help, Update or Alternative?membertrx110 Jul '08 - 5:23 
I wrote a Visual Studio (2008) macro that takes an XSD and generates a .cs class file using XSD.exe then does a regular expression search & replace to add generic lists for all the fixed arrays created by XSD.exe. I can share if interested.
 
The search & replace is an adaptation from this: Tweaking the output of XSD.exe to use generics[^]
 
Chad
QuestionHow can I download this tool?memberboazru27 Nov '05 - 5:27 
As far as I know I have here only the source and I don't know what I need to with it..........
I am java Dev........
 
Thanks
Generalcool tool..few questions/issues...memberRama K22 Jun '04 - 3:43 
It's a great tool and I am really impressed. I have a question (might be dumb), I see that all the properties are being generated as "virtual". Is there any specific reason??
 
Also, It would be nice if the default path to xsd.exe is like this "C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\Bin" instead of "C:\Program Files\Microsoft Visual Studio 2003\SDK\v1.1\Bin"
 
I had an error in my xsd and xsdtidy launched the xsd window and closed it before I can read what is the error. Is there anyway to pause that window or show the error in a dialog??
 
These are very minor issues, but thought I should bring them to your notice.
 
Thanks
rama
GeneralRe: cool tool..few questions/issues...memberJonathan de Halleux23 Jun '04 - 10:43 
Hi Rama,
 
I'm happy to see that you like XsdTidy. I think the most important about it, it is that... you actually contribute to it quite easily. XsdTidy is part of REfly which is hosted at http://mbunit.tigris.org.
 
If you have requests and so, you can either post them at refly@mbunit.tigris.org or using the issue tracking there.
 
ps: hope it's not too much of a hassle to ask you to repost :)
 
Jonathan de Halleux - My Blog
GeneralRe: cool tool..few questions/issues...memberrama_k23 Jun '04 - 11:56 
done. i think the issue is #53. i still have question about the properties being virtual. :)
 
thanks
rama
GeneralRe: cool tool..few questions/issues...memberrama k23 Jun '04 - 11:57 
oops its #51
GeneralRe: cool tool..few questions/issues...memberJonathan de Halleux23 Jun '04 - 12:05 
It's the default behavior of CodeDom. You can change that by changing the Attributes of the members.
 
Jonathan de Halleux - My Blog
GeneralAlternative ImplementationmemberEron Wright20 Apr '04 - 18:49 
The concept of extending the output of the XSD tool is a good one. Here is an alternative implementation that generates constructors, properties, strong collections, IClonable implementation, comparers for the properties, ISerializable, and more.
 
.NET Framework Tools - Code Generation Extensions
GeneralRe: Alternative ImplementationmemberJonathan de Halleux20 Apr '04 - 21:20 
Thanks for the heads up.
 
Jonathan de Halleux - My Blog - www.dotnetwiki.org -
MbUnit - QuickGraph - NCollection

GeneralFix for duplicate elementsmemberobe1line29 Mar '04 - 4:40 
Using the following schema, causes a reflection error which relates to two [XmlElement(ElementName="Fullname")] clauses:
 
<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="http://tempuri.org/test.xsd" xmlns="http://tempuri.org/test.xsd" xmlns:mstns="http://tempuri.org/test.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" elementFormDefault="qualified" id="test">
     <xs:element name="Name">
          <xs:complexType>
               <xs:sequence>
                    <xs:element name="Surname" type="xs:string"/>
                    <xs:element name="Forenames" type="xs:string"/>
               </xs:sequence>
          </xs:complexType>
     </xs:element>
     <xs:element name="AllNames">
          <xs:complexType>
               <xs:sequence>
                    <xs:element name="FullName" type="xs:Name" minOccurs="0" maxOccurs="unbounded"/>
               </xs:sequence>
          </xs:complexType>
     </xs:element>
</xs:schema>
 

 

Seems to fix this by adding a line into XsdWrapperGenerator.cs (AttachXmlElementAttributes) to skip the foreach loop if (el.ElementName == f.Name).
 

Not sure of knock-on effects though.
 

Chris

GeneralRe: Fix for duplicate elementsmemberJonathan de Halleux19 Apr '04 - 3:41 
This problem is fixed in the latest version of xsdtidy. You can download the latest bits from the http://mbunit.tigris.org CVS.
 
Jonathan de Halleux - My Blog - www.dotnetwiki.org -
MbUnit - QuickGraph - NCollection

GeneralWould love to try...memberVerdant12310 Mar '04 - 18:04 
but i can't seem to get the solution to open correctly, no matter what i try, it says the referenced component "NDocBook" could not be found,(at compile time) (and it appears with an exlamation next to its icon under Test)
 
but if i doubleclick on it to open the object browser, everything works fine...
 
i realize the ndocbook part is just a test, but i would love to be able to play around with it...
GeneralRe: Would love to try...memberJonathan de Halleux16 Mar '04 - 8:30 
You can download the latest version of XsdTidy at http://www.dotnetwiki.org
 
The zip contains the generate NDocBook file that you must insert into one of the project and recompile to use them.
 
Jonathan de Halleux.

www.dotnetwiki.org

GUnit

GeneralRe: Would love to try...memberMagick9320 Oct '07 - 6:19 
Hi,
 
is there another link?
 
http://www.dotnetwiki.org/ is now 404
 
Thanks
GeneralOne nice article per week!memberxxxyyyzzz20 Feb '04 - 6:12 
I just want to say 'gg~'
GeneralRe: One nice article per week!memberJonathan de Halleux20 Feb '04 - 10:54 
Till next week :)
 
Jonathan de Halleux.

www.dotnetwiki.org

QuestionHow does intellisense help DocBook authoring?sussNorman Walsh20 Feb '04 - 1:31 
Hi Jonathan,
 
The NDocBook example is interesting, but I don't understand how it helps the author. Writing the document as a series of method calls looks more complex to me. Can you explain how "intellisense" comes into play and how it helps? Is this something that the GUI does for you to generate that code because it knows the schema?
AnswerRe: How does intellisense help DocBook authoring?memberJonathan de Halleux20 Feb '04 - 5:52 
I was more thinking about automatic report generation, not real "literal" context.
 
Intellisence helps you becauses it proposes the properties of the object which are possible childs/arguments of the corresponding docbook tag. When I write b. , intellisense pops the possible properties so I know what I can add.
 

 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralCoolmemberrondalescott19 Feb '04 - 12:18 
This sounds *really wicked* man, I gotta try it. Although I have already coded around XSD.EXE's shortcomings, you have no idea how many times I've typed (MyType[])myArrayList.ToArray(typeof(MyType)). So using this would take some pretty hefty refactoring as I'm using like eight docs genned from XSD.EXE...
 
I'm gonna have to seriously consider it.
GeneralRe: CoolmemberJonathan de Halleux19 Feb '04 - 12:24 
Since I was in a hurry I did take the time to generate more "strongly typed" solutions. Right now, I'm transforming all arrays to ArrayList which is a shame.
 
One solution, would be to generate strongly typed collection for each "array type". I wonder if I have the courage to do that unless there is a real need for this.
 
The most important thing to me is how to get the "documentation" in the Xsd schema back into metadata in the assembly so that Intellisense can give us more info :)
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: CoolmemberLowell Heddings19 Feb '04 - 16:39 
Arraylists!!! That is exactly what was missing.. Thank you so much.
GeneralRe: CoolmemberJonathan de Halleux19 Feb '04 - 21:21 
:) I knew people would like this feature :)
 
Hey, don't forget to vote, this article is again under-rated (the responsability of the assemblytion lies with the author)!
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: CoolmemberLowell Heddings19 Feb '04 - 21:24 
I gave it a 5... this is something I've been wishing I could find for quite some time.
GeneralRe: CoolmemberJonathan de Halleux19 Feb '04 - 22:13 
If you need more features, don't hesitate to feedback!
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: Coolmemberasm4324 Feb '04 - 7:53 
You've got my 5. I have to admit though, I would really love to see strongly-typed collections and then I'd use this for sure. It's just that with ArrayLists you've blown away all info about what the element type is. I'd rather have my code blow up in the particular method that I'm creating objects in than when it's all done and tries to serialize. But I'd rather have it be entirely type-safe in the first place!
 
Also, wondering why you used Reflection.Emit instead of CodeDom? I think XSD.exe ability to generate C# source file is nice. I will admit that I haven't made any changes to ours yet though ;) But with partial classes in Whidbey it could be really cool.
GeneralRe: CoolmemberJonathan de Halleux24 Feb '04 - 12:24 
asm43 wrote:
I would really love to see strongly-typed collections and then I'd use this for sure
 
It's done! You can download the new version from my personal web site at http://www.dotnetwiki.org. It is called Refly.
 
Actually I'm generating smart strong "multi" typed collections. For instance, if the xsd outputs something like
 
public class vehicles
{
    [XmlElement("car",typeof(car))]
    [XmlElement("truck",typeof(truck))]
    public Object[] Items;
}
 
my new tool will generate this:
public class Vehicles
{
    private ItemCollection items;
 
    [XmlElement("car",typeof(Car))]
    [XmlElement("truck",typeof(Truck))]
    public ItemCollection Items
    {
      get
      {
         return this.items;
      }
    }
 
    public class ItemCollection : System.Collections.CollectionBase
    {
        ...
 
        public void AddCar(Car car)
        {...}
        public void RemoveCar(Car car)
        {...}
        public bool ContainsCar(Car car)
        {...}
        public void AddTruck(Truck truck)
        {...}
        public void RemoveTruck(Truck truck)
        {...}
        public bool ContainsTruck(Truck truck)
        {...}
 
    }
}
 
:)
 
asm43 wrote:
Also, wondering why you used Reflection.Emit instead of CodeDom?
 
I was just testing how was Emit working but you are totally right. CodeDom is way much easier to use, no question about. That's why I've totally dropped Emit for CodeDom in the new version.

 
Jonathan de Halleux.

www.dotnetwiki.org

Last Updated 2 Mar 2004
Article Copyright 2004 by Jonathan de Halleux