Generic Data Points Series XML format and its validated loading with LINQ to XML

24 Apr 2009 | License: CPOL
How to express a series of generic data points in XML and read them without much pain.

Foreword

It's often desirable to provide generic data point series in an XML format. XML makes it possible to decorate a series with attributes, nest series in complex data objects, mix different data series and data objects in one data file, and load them with concise LINQ to XML code.

A generic data point is a structure with just two required properties, {X, Y}, expressing the point position in 2D space. Each point dimension has its own "Base Type", e.g., numeric, DateTime, etc. That's why the term "generic" is applied. The Data Point object type is defined by the pair of these Base Types. There are many point types we can express in XML. For example, the basic set of Data Point types is produced by the Cartesian product of the set of XML atomic types with itself. This basic set could be extended by including XPath data types, simple types derived from XML simple types by restriction, etc.

A Data Point Series contains one or more Data Points of the same type. The Data Point Series type is defined by the type of Data Points it contains.

One Data Point Series XML document could contain multiple Data Point Series of different types. Every application can impose its own requirements on the Data Point Series types it will accept or forbid. The Loader library must be able to validate the file format against the following list of requirements:

  1. Ensure that all the Data Points in the Data Point Series have the same type.
  2. Restrict the list of Data Point Series types it contains.
  3. Validate the Data Point dimension values against the XSD namespace and, optionally, other namespaces (like XPath) where the Base Types are defined.
  4. More...

Some of these requirements are application-specific, so the application must provide the Loader class with the appropriate information in some way.

We'll use an XML schema to validate the content of a Data Point Series XML document. This approach gives the following opportunities:

  1. Abstract the Loader code from the features which could be described in terms of the XML schema. This allows the Loader code to be both generic and concise. We'll use LINQ to XML to load the data.
  2. Pass the XML schema data to the Loader class in one form or another. The Loader class can use:
    1. The default schema stored in the library. This is the easy-to-use option, but it suffers from the lack of configurability and extensibility.
    2. A dynamically generated schema based on type mappings (see below for details). This option lets you define the list of expected Data Point Series types, but (at present) limits the list of Base Types to the XML schema's atomic types.
    3. A user-provided schema. This is the most powerful option, but the user must be familiar with the XML schema language.

Generic Data Point Series XML Format

First, we have to define the root element. Suppose it is called Items. For the sake of safety, we'll require it to declare the default XML namespace urn:PointSeries-schema. The root element looks like this:

XML
<?xml version="1.0" encoding="utf-8"?>
<Items xmlns="urn:PointSeries-schema">
...
</Items>

The root element contains an unrestricted number of Data Point Series. First, we will try to define point series elements as follows:

XML
<Items xmlns="urn:PointSeries-schema">
  <Points ...>
  </Points>
  <Points ...>
  </Points>
  ...
</Items>

That won't work because different Data Point Series elements could contain points of different Base Types, so the Data Point Series elements themselves could be of different types. XML schema rules don't allow elements of different types to have the same name in the same scope. Hence, we must assign different names to Data Point Series elements of different types.

So, we name the Data Point Series elements according to the following patterns:

  • <Points.BaseType ...> if both data series dimensions have the same Base Type. E.g., <Points.Double ...>.
  • <Points.XBaseType.YBaseType ...> if data series dimensions have different Base Types. E.g., <Points.DateTime.Int ...>.

The BaseType, XBaseType, and YBaseType parts of the Data Point Series element name are collectively called "type strings". We need to agree on how these type strings are defined, and establish the mapping between the type strings, XSD-defined data types, and CLR data types.
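The naming rule above can be sketched as a small helper. The class and method names here (SeriesNaming, ElementName) are illustrative assumptions and are not part of the article's Loader library.

```csharp
using System;

// Hypothetical helper, not part of the article's Loader library.
public static class SeriesNaming
{
    // Composes the series element local name from the x/y type strings:
    // "Points.T" when both dimensions share a type string,
    // "Points.X.Y" otherwise.
    public static string ElementName(string xTypeString, string yTypeString)
    {
        if (string.IsNullOrEmpty(xTypeString))
            throw new ArgumentException("x type string is required", "xTypeString");
        if (string.IsNullOrEmpty(yTypeString))
            throw new ArgumentException("y type string is required", "yTypeString");

        return xTypeString == yTypeString
            ? "Points." + xTypeString
            : "Points." + xTypeString + "." + yTypeString;
    }
}
```

For instance, the pair ("DateTime", "Int") gives Points.DateTime.Int, while ("Double", "Double") collapses to Points.Double.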

Table 1. Example of XSD type to CLR type to type string mapping

| XSD Type | Description | Examples | Type string | .NET type |
|---|---|---|---|---|
| xsd:int | An integer that can be represented as a four-byte, two's complement number | -2147483648, 2147483645,..., -3, -2, -1, 0, 1, 2, 3, ... | Int | System.Int32 |
| xsd:double | IEEE 754 64-bit floating-point number | -INF, 1.401E-90, -1E4, -0, 0, 12.78E-2, 12, INF, NaN, 3.4E42 | Double | System.Double |
| xsd:dateTime | A particular moment in Coordinated Universal Time, up to an arbitrarily small fraction of a second | 1999-05-31T13:20:00.000-05:00, 1999-05-31T18:20:00.000Z, 1999-05-31T13:20:00.000, 1999-05-31T13:20:00.000-05:32 | DateTime | System.DateTime |
| xsd:date | A specific day in history | 0044-03-15, 0001-01-01, 1969-06-27, 2000-10-31, 2001-11-17 | Date | System.DateTime |
| xsd:gMonth | A month in no particular year | --01--, --02--, --03--,..., --09--, --10--, --11--, --12-- | Month | System.Int32 |

This table contains a partial list of XSD simple types. You can extend it by including other XML types.

According to the mapping above, for example, the <Points.Double ...> Data Point Series XML element should contain Data Points of xsd:double type for both x and y dimensions, and these points will be loaded as points with System.Double x, y properties.
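To make the XSD-to-CLR step concrete, here is a minimal sketch that converts XSD lexical values to CLR values with the standard System.Xml.XmlConvert class. The LexicalValues wrapper is a made-up name for illustration; the article's own converter is described later and may work differently.

```csharp
using System;
using System.Xml;

// Illustrative wrapper around XmlConvert; not the article's converter class.
public static class LexicalValues
{
    // xsd:double lexical form (e.g. "12.78E-2", "INF", "NaN") to System.Double.
    public static double ParseDouble(string text)
    {
        return XmlConvert.ToDouble(text);
    }

    // xsd:int lexical form (e.g. "-20") to System.Int32.
    public static int ParseInt(string text)
    {
        return XmlConvert.ToInt32(text);
    }

    // xsd:date or xsd:dateTime lexical form to System.DateTime.
    public static DateTime ParseDate(string text)
    {
        return XmlConvert.ToDateTime(text, XmlDateTimeSerializationMode.RoundtripKind);
    }
}
```

XmlConvert is culture-independent, which matters here: the values in the data file always use the XSD lexical forms regardless of the machine's regional settings.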

The Point element itself is something like <Point x="2008-01-01" y="-20"/> with the required x and y attributes.

Shown below is an excerpt from the example input XML data file:

XML
<?xml version="1.0" encoding="utf-8"?>
<Items xmlns="urn:PointSeries-schema">
  <Points.Int.Double YName="y=x^2">
    <Point x="0" y="0"/>
    <Point x="1" y="0.01"/>
    ...
  </Points.Int.Double>
  <Points.Date.Int YName="temperature" XName="Date">
    <Point x="2008-01-01" y="-20"/>
    <Point x="2008-02-01" y="-25"/>
    ...
  </Points.Date.Int>
  <Points.Month.Double YName="2008 year month temperatures" XName="Month">
    <Point x="--01--" y="-20.8"/>
    <Point x="--02--" y="-25.2"/>
    ...
  </Points.Month.Double>
  ...
</Items>

Note: the point series elements are decorated with the optional YName and XName attributes, which are intended to hold the y and x dimension labels.
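As a quick orientation before validation enters the picture, the sketch below walks a trimmed copy of the example with plain LINQ to XML and counts the points in each series. The SampleReader class is hypothetical, and the "..." placeholders of the excerpt are simply left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical reader used only to illustrate the document layout.
public static class SampleReader
{
    // A trimmed copy of the example data file (single quotes keep the
    // C# string readable; they are equivalent to double quotes in XML).
    public const string Xml =
@"<Items xmlns='urn:PointSeries-schema'>
  <Points.Int.Double YName='y=x^2'>
    <Point x='0' y='0'/>
    <Point x='1' y='0.01'/>
  </Points.Int.Double>
  <Points.Date.Int YName='temperature' XName='Date'>
    <Point x='2008-01-01' y='-20'/>
    <Point x='2008-02-01' y='-25'/>
  </Points.Date.Int>
</Items>";

    // Returns (series element local name, number of Point children)
    // for every series, in document order.
    public static List<KeyValuePair<string, int>> CountPoints(string xml)
    {
        XNamespace ns = "urn:PointSeries-schema";
        XElement items = XElement.Parse(xml);
        return items.Elements()
            .Select(series => new KeyValuePair<string, int>(
                series.Name.LocalName,
                series.Elements(ns + "Point").Count()))
            .ToList();
    }
}
```

Note that the child elements inherit the default namespace, so Point must be looked up as ns + "Point", while the unprefixed attributes (x, y, XName, YName) are in no namespace.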

XML Schema

A generic Data Point Series XML format is defined by an XML schema whose excerpt follows:

XML
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" 
           targetNamespace="urn:PointSeries-schema"
           xmlns="urn:PointSeries-schema"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Root element -->
  <xs:element name="Items" type="itemsType"/>
  
  <!-- Root element type -->
  <xs:complexType name="itemsType">
    <xs:choice maxOccurs="unbounded">
      <xs:element name="Points.Int" type="pointsIntIntType"/>
      <xs:element name="Points.Int.DateTime" type="pointsIntDttmType"/>
      ...
      <xs:element name="Points.Double" type="pointsDblDblType"/>
      <xs:element name="Points.Double.Int" type="pointsDblIntType"/>
      ...
    </xs:choice>
  </xs:complexType>
  
  <!-- Point Series Type attributes -->
  <xs:attributeGroup name="pointSetAttributes">
    <xs:attribute name="YName" 
      type="xs:string" use="optional" />
    <xs:attribute name="XName" 
      type="xs:string" use="optional" />
  </xs:attributeGroup>

  <!-- Point Series Types -->
  <xs:complexType name="pointsIntIntType">
      <xs:sequence>
        <xs:element minOccurs="1" 
            maxOccurs="unbounded" name="Point">
          <xs:complexType>
            <xs:attribute name="x" 
              type="xs:int" use="required" />
            <xs:attribute name="y" 
              type="xs:int" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    <xs:attributeGroup ref="pointSetAttributes"/>
  </xs:complexType>
  <xs:complexType name="pointsIntDttmType">
      <xs:sequence>
        <xs:element minOccurs="1" 
                  maxOccurs="unbounded" name="Point">
          <xs:complexType>
            <xs:attribute name="x" 
                     type="xs:int" use="required" />
            <xs:attribute name="y"
                     type="xs:dateTime" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    <xs:attributeGroup ref="pointSetAttributes"/>
  </xs:complexType>
  ...
  <xs:complexType name="pointsDblIntType">
      <xs:sequence>
        <xs:element minOccurs="1" 
              maxOccurs="unbounded" name="Point">
          <xs:complexType>
            <xs:attribute name="x" 
                 type="xs:double" use="required" />
            <xs:attribute name="y" 
                 type="xs:int" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    <xs:attributeGroup ref="pointSetAttributes"/>
  </xs:complexType>
  <xs:complexType name="pointsDblDblType">
      <xs:sequence>
        <xs:element minOccurs="1" 
                 maxOccurs="unbounded" name="Point">
          <xs:complexType>
            <xs:attribute name="x" 
                 type="xs:double" use="required" />
            <xs:attribute name="y" 
                 type="xs:double" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    <xs:attributeGroup ref="pointSetAttributes"/>
  </xs:complexType>
  ...
</xs:schema>

This schema defines the <Items ...> root element, whose expected content is defined by the xs:choice group. You should trim the contents of the choice to just those Data Point Series element types your application expects.

The rest of the schema contains a long list of element type definitions. Each of these types defines a Data Point Series with specific x, y Base Types.

You can define new Base Types in the schema using XML Schema type derivation rules.
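As a hedged illustration of such a derivation, the sketch below declares a made-up percentType (an xs:double restricted to the range 0 to 100) and validates one point against it. The type name, the Points.Int.Percent element, and the DerivedTypeDemo class are all assumptions for this example; for brevity, the series element is declared as a global element rather than as a child of Items.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

// Hypothetical demo of a Base Type derived by restriction.
public static class DerivedTypeDemo
{
    const string Xsd =
@"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
           targetNamespace='urn:PointSeries-schema'
           xmlns='urn:PointSeries-schema'
           elementFormDefault='qualified'>
  <xs:simpleType name='percentType'>
    <xs:restriction base='xs:double'>
      <xs:minInclusive value='0'/>
      <xs:maxInclusive value='100'/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name='Points.Int.Percent'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Point' maxOccurs='unbounded'>
          <xs:complexType>
            <xs:attribute name='x' type='xs:int' use='required'/>
            <xs:attribute name='y' type='percentType' use='required'/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Validates a single point with the given y value and returns
    // the validation messages (empty when the value is accepted).
    public static List<string> Validate(string yValue)
    {
        var schemas = new XmlSchemaSet();
        using (XmlReader xsdReader = XmlReader.Create(new StringReader(Xsd)))
            schemas.Add(null, xsdReader); // null: use the schema's targetNamespace

        XDocument doc = XDocument.Parse(
            "<Points.Int.Percent xmlns='urn:PointSeries-schema'>" +
            "<Point x='1' y='" + yValue + "'/></Points.Int.Percent>");

        var errors = new List<string>();
        doc.Validate(schemas, (sender, e) => errors.Add(e.Message));
        return errors;
    }
}
```

A value such as 42.5 passes, while 150 is rejected by the maxInclusive facet even though it is a perfectly valid xs:double.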

Type Mapping

Writing or editing the Data Point Series XML schema by hand is tedious and requires knowledge of the XML Schema specification (see Part 1 and Part 2).

Instead, the schema could be composed on the fly. If you look at the schema excerpt above, you'll see that most of the text is repeated from one type definition to another. The information that varies from one schema to another can be expressed in a much more compact form than the schema itself. All that is required to compose the schema is data like that in Table 1. We should describe the Data Point Series types by defining the Base Types and the mapping between the XSD and CLR types, along with the "type string" used to construct the Data Point Series XML element tag name.

Here is an excerpt from an example type mapping XML document:
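To show how little information each type definition really needs, here is a rough sketch that builds one series xs:complexType with LINQ to XML from a type name and two XSD type names. It is not the article's SchemaBuilder (which is not shown in this part of the text); SchemaSketch and its members are hypothetical.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical sketch; the article's SchemaBuilder may be implemented differently.
public static class SchemaSketch
{
    static readonly XNamespace Xs = "http://www.w3.org/2001/XMLSchema";

    // Builds the xs:complexType for one series type, mirroring the pattern
    // repeated in the schema excerpt. The "xs:" prefix written into the type
    // attribute values assumes that the enclosing xs:schema element declares
    // xmlns:xs for the XML Schema namespace.
    public static XElement SeriesType(string typeName, string xXsdType, string yXsdType)
    {
        return new XElement(Xs + "complexType",
            new XAttribute("name", typeName),
            new XElement(Xs + "sequence",
                new XElement(Xs + "element",
                    new XAttribute("name", "Point"),
                    new XAttribute("minOccurs", "1"),
                    new XAttribute("maxOccurs", "unbounded"),
                    new XElement(Xs + "complexType",
                        PointAttribute("x", xXsdType),
                        PointAttribute("y", yXsdType)))),
            new XElement(Xs + "attributeGroup",
                new XAttribute("ref", "pointSetAttributes")));
    }

    // Builds <xs:attribute name="..." type="xs:..." use="required"/>.
    static XElement PointAttribute(string name, string xsdType)
    {
        return new XElement(Xs + "attribute",
            new XAttribute("name", name),
            new XAttribute("type", "xs:" + xsdType),
            new XAttribute("use", "required"));
    }
}
```

Only the three arguments vary between type definitions, which is exactly the information a mapping entry carries.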

XML
<?xml version="1.0" encoding="utf-8"?>
<Mappings xmlns="urn:PointSeries-mapping">
  <Mapping>
    <XAxis xsd-type="int" clr-type="System.Int32" type-string="Int"/>
    <YAxis xsd-type="double" type-string="Double"/>
  </Mapping>
  <Mapping>
    <XAxis xsd-type="double" clr-type="System.Double" type-string="Double"/>
    <YAxis xsd-type="date" clr-type="System.DateTime" type-string="Date"/>
  </Mapping>
  <Mapping>
    <XAxis xsd-type="double" clr-type="System.Double" type-string="Double"/>
    <YAxis xsd-type="gMonth" clr-type="System.Int32" type-string="Month"/>
  </Mapping>
  ...
  <Mapping>
    <XAxis xsd-type="dateTime" type-string="DateTime"/>
    <YAxis xsd-type="double" type-string="Double"/>
  </Mapping>
</Mappings>

The root Mappings element declares the urn:PointSeries-mapping XML namespace. It contains one or more Mapping elements.

A Mapping element defines a Data Point Series type. It contains exactly two elements: XAxis for the x dimension, and YAxis for the y dimension.

Every ...Axis element defines the type of the dimension in the world of XML (xsd-type) and in the world of .NET (clr-type). The type-string attribute provides the name used to compose the name of the Data Point Series element in the data XML file. For example, the first Mapping element in the snippet above produces the type definition for the <Points.Int.Double> element.

The xsd-type and type-string attributes are required; the clr-type attribute is optional. If it is omitted, the CLR type is taken from the default XSD-to-CLR mapping table hardcoded into the Loader library (it's the same mapping .NET uses, see Mapping XML Data Types to CLR Types). If it is present, the Loader attempts to convert the value of the XSD type to the CLR type specified. For example, see the third Mapping element: the default CLR type for the gMonth XSD type is DateTime, but the clr-type attribute value is System.Int32, so the Loader converts the gMonth value to Int32 with the help of the XML Converter class instance (see below). Note that the clr-type attribute value can be a full assembly-qualified type name.
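The gMonth-to-Int32 case can be pictured with the following minimal sketch. It is not the article's XML Converter class; GMonthSketch is a hypothetical name, and the code only handles the plain "--MM--" form used in this article (plus "--MM"), without time zone suffixes.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for a gMonth to Int32 conversion.
public static class GMonthSketch
{
    // Extracts the month number from an xsd:gMonth lexical value such as
    // "--02--" or "--02". Time zone suffixes are not handled.
    public static int ToInt32(string gMonth)
    {
        if (gMonth == null || gMonth.Length < 4 ||
            !gMonth.StartsWith("--", StringComparison.Ordinal))
            throw new FormatException("Not a gMonth value: " + gMonth);

        // NumberStyles.None accepts digits only (no sign, no whitespace).
        int month = int.Parse(gMonth.Substring(2, 2),
                              NumberStyles.None, CultureInfo.InvariantCulture);
        if (month < 1 || month > 12)
            throw new FormatException("Month out of range: " + gMonth);
        return month;
    }
}
```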

The mapping file must not contain contradictory entries: it must not define two Data Point Series elements with the same element name.
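One way to detect such contradictions is to compose the element name for every mapping and look for repeats. The sketch below assumes each mapping is reduced to its pair of (x type string, y type string); MappingChecks is a hypothetical helper, not part of the Loader library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical consistency check for a list of mappings.
public static class MappingChecks
{
    // Each pair holds the (x type string, y type string) of one Mapping element.
    // Returns the series element names that are defined more than once.
    public static List<string> DuplicateElementNames(
        IEnumerable<KeyValuePair<string, string>> typeStrings)
    {
        return typeStrings
            .Select(p => p.Key == p.Value
                ? "Points." + p.Key
                : "Points." + p.Key + "." + p.Value)
            .GroupBy(name => name)
            .Where(group => group.Count() > 1)
            .Select(group => group.Key)
            .ToList();
    }
}
```

An empty result means the mapping file is free of name clashes.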

The mapping document is validated against the following schema:

XML
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" 
           targetNamespace="urn:PointSeries-mapping"
           xmlns="urn:PointSeries-mapping"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  
  <xs:element name="Mappings">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" name="Mapping" type="mappingType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  
  <xs:complexType name="axisType">
    <xs:attribute name="xsd-type" type="xs:string" use="required" />
    <xs:attribute name="clr-type" type="xs:string" use="optional" />
    <xs:attribute name="type-string" type="xs:string" use="required" />
  </xs:complexType>
  
  <xs:complexType name="mappingType">
    <xs:sequence>
      <xs:element name="XAxis" type="axisType"/>
      <xs:element name="YAxis" type="axisType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

This schema is stored in the Loader library assembly as a resource.

Reading the Data

In the code attached to the article, all data reading code is placed in the Loader class library project, which produces the XmlDataPointSeries.Loader assembly. The Loader class contains the data reading/parsing code, and the supplementary classes XsdDataPoint, DataPoint, and DataPointSeries hold the results.

C#
/// <summary>
/// Loads <see cref="DataPointSeries"/> collection from an XML file or Stream.
/// </summary>
/// <remarks>
/// The contents of the Data Point Series XML document are validated against an XML schema. 
/// <para>That schema is either
/// <list type="number">
/// <item>Prebuilt and stored as the resource.</item>
/// <item>Provided by the user.</item>
/// <item>Dynamically constructed from the contents of XML mapping data.</item>
/// </list>
/// </para>
/// </remarks>
public static class Loader
{
    // XML namespace that must be used in XML data files.
    internal const string dataNamespaceName = "urn:PointSeries-schema";

    #region LoadWithSchema
    /// <summary>
    /// Loads a <see cref="DataPointSeries"/> collection from
    /// the <paramref name="dataReader"/> specified with
    /// the XML schema provided by <paramref name="schemaReader"/>.
    /// </summary>
    /// <param name="dataReader">XML DataPointSeries collection
    /// <see cref="System.IO.TextReader"/>.</param>
    /// <param name="schemaReader">XML schema <see cref="System.Xml.XmlReader"/>.</param>
    /// <returns><see cref="DataPointSeries"/> collection.</returns>
    /// <exception cref="ValidationException"/>
    public static IEnumerable<DataPointSeries> 
           LoadWithSchema(TextReader dataReader, XmlReader schemaReader){ ... }
    /// <summary>
    /// Loads <see cref="DataPointSeries"/> collection from
    /// the <paramref name="dataStream"/> specified with
    /// the XML schema provided by <paramref name="schemaStream"/>.
    /// </summary>
    /// <param name="dataStream">Input XML data <see cref="System.IO.Stream"/>.</param>
    /// <param name="schemaStream">Input XML schema <see cref="System.IO.Stream"/>.</param>
    /// <returns><see cref="DataPointSeries"/> collection.</returns>
    /// <exception cref="ValidationException"/>
    public static IEnumerable<DataPointSeries> 
           LoadWithSchema(Stream dataStream, Stream schemaStream) { ... }
    /// <summary>
    /// Loads <see cref="DataPointSeries"/> collection from the
    /// <paramref name="dataFileName"/> file specified with
    /// the XML schema provided by <paramref name="schemaFileName"/>.
    /// </summary>
    /// <param name="dataFileName">Name of the data file.</param>
    /// <param name="schemaFileName">Name of the schema file.</param>
    /// <returns><see cref="DataPointSeries"/> collection.</returns>
    /// <exception cref="ValidationException"/>
    public static IEnumerable<DataPointSeries> 
           LoadWithSchema(string dataFileName, string schemaFileName) { ... }
    /// <summary>
    /// Loads a <see cref="DataPointSeries"/> collection from
    /// the XML data <paramref name="dataReader"/>
    /// specified with prebuilt XML schema.
    /// </summary>
    /// <param name="dataReader">DataPointSeries collection
    /// XML <see cref="System.IO.TextReader"/>.</param>
    /// <returns><see cref="DataPointSeries"/> collection.</returns>
    /// <exception cref="ValidationException"/>
    public static IEnumerable<DataPointSeries> LoadWithSchema(TextReader dataReader) { ... }
    /// <summary>
    /// Loads a <see cref="DataPointSeries"/> collection
    /// from the XML data stream specified with the prebuilt XML schema.
    /// </summary>
    /// <param name="dataStream">Input XML data <see cref="System.IO.Stream"/>.</param>
    /// <returns><see cref="DataPointSeries"/> collection.</returns>
    /// <exception cref="ValidationException"/>
    public static IEnumerable<DataPointSeries> LoadWithSchema(Stream dataStream) { ... }
    /// <summary>
    /// Loads a <see cref="DataPointSeries"/> collection
    /// from the XML data file specified with the prebuilt XML schema.
    /// </summary>
    /// <param name="fileName">DataPointSeries collection XML file Name.</param>
    /// <returns><see cref="DataPointSeries"/> collection.</returns>
    /// <exception cref="ValidationException"/>
    public static IEnumerable<DataPointSeries> LoadWithSchema(string fileName) { ... }
    #endregion LoadWithSchema

    #region LoadWithMappings
    /// <summary>
    /// Loads a <see cref="DataPointSeries"/> collection
    /// from the <paramref name="dataReader"/>
    /// specified with the mappings provided by the <paramref name="mappingReader"/>.
    /// </summary>
    /// <param name="dataReader">Input XML data
    /// <see cref="System.IO.TextReader"/>.</param>
    /// <param name="mappingReader">Input XML mapping
    /// <see cref="System.IO.TextReader"/>.</param>
    /// <returns><see cref="DataPointSeries"/> collection.</returns>
    /// <exception cref="ValidationException"/>
    public static IEnumerable<DataPointSeries> 
           LoadWithMappings(TextReader dataReader, TextReader mappingReader) { ... }
    /// <summary>
    /// Loads <see cref="DataPointSeries"/> collection from the 
    /// <paramref name="dataStream"/> specified.
    /// </summary>
    /// <param name="dataStream">Input XML data
    /// <see cref="System.IO.Stream"/>.</param>
    /// <param name="mappingStream">Input XML mapping
    /// <see cref="System.IO.Stream"/>.</param>
    /// <returns><see cref="DataPointSeries"/> collection.</returns>
    /// <exception cref="ValidationException"/>
    public static IEnumerable<DataPointSeries> 
           LoadWithMappings(Stream dataStream, Stream mappingStream) { ... }
    /// <summary>
    /// Loads <see cref="DataPointSeries"/> collection from the file specified.
    /// </summary>
    /// <param name="dataFileName">Name of the data file.</param>
    /// <param name="mappingFileName">Name of the mapping file.</param>
    /// <returns><see cref="DataPointSeries"/> collection.</returns>
    /// <exception cref="ValidationException"/>
    public static IEnumerable<DataPointSeries> 
           LoadWithMappings(string dataFileName, string mappingFileName) { ... }
    #endregion LoadWithMappings

    /// <summary>
    /// Parses the point series element tag and returns x,y type strings.
    /// </summary>
    /// <param name="tagName">Tag name.</param>
    /// <param name="xType">Output Type of the x-dimension.</param>
    /// <param name="yType">Output Type of the y-dimension.</param>
    static void getXYTypeStrings(string tagName, out string xType, out string yType)
    {
        int n = tagName.IndexOf('}');
        Debug.Assert(n > 0, "n > 0");
        const string pointsTagPrefix = "Points";
        int pointsTagPrefixLength = pointsTagPrefix.Length;
        Debug.Assert(tagName.Length > n + pointsTagPrefixLength + 1, 
                     "tagName.Length > n + pointsTagPrefixLength + 1");
        string xyTypes = tagName.Substring(n + pointsTagPrefixLength + 2);
        n = xyTypes.IndexOf('.');
        if (n < 0)
        {
            xType = xyTypes;
            yType = xyTypes;
        }
        else
        {
            xType = xyTypes.Substring(0, n);
            yType = xyTypes.Substring(n + 1);
        }
    }
}

This class provides the LoadWithSchema and LoadWithMappings method overloads, which load Data Point Series XML documents and validate them against the default schema, a user-supplied schema, or a dynamically generated schema.

By design, the LoadWithSchema and LoadWithMappings methods fail on any error that occurs while opening, reading, parsing, or validating the input, and throw either System.IO exceptions or the Loader library's ValidationException containing the error descriptions. All validation errors are exposed through the ValidationException.ValidationErrors property, which gives the user a chance to fix all the errors at once.

Load with Schema

The principal LoadWithSchema method overload is:

C#
public static IEnumerable<DataPointSeries> 
       LoadWithSchema(TextReader dataReader, XmlReader schemaReader)
{
    StringBuilder sbErrors = null;
    List<ValidationException.ValidationError> errors = null;
    // Load and validate the schema.
    XmlSchema schema = XmlSchema.Read(schemaReader, (sender, e) =>
    {
        if (sbErrors == null)
            sbErrors = new StringBuilder();
        sbErrors.AppendFormat(
            "Schema validation error: {1}{0}Line={2}, position={3}{0}", 
            System.Environment.NewLine, e.Exception.Message, 
            e.Exception.LineNumber, e.Exception.LinePosition);
        if (errors == null)
            errors = new List<ValidationException.ValidationError>();
        errors.Add(new ValidationException.ValidationError()
        {
            Message = e.Exception.Message,
            Line = e.Exception.LineNumber,
            Position = e.Exception.LinePosition
        });
    });
    if (sbErrors != null)
        // Validation error(s) occurred.
        throw new ValidationException(sbErrors.ToString(), errors.ToArray());
    XmlSchemaSet schemaSet = new XmlSchemaSet();
    schemaSet.Add(schema);

    // Load and validate the data file.
    using (XmlReader reader = XmlReader.Create(dataReader))
    {
        XDocument doc = XDocument.Load(reader);
        doc.Validate(schemaSet, (sender, e) =>
        {
            if (sbErrors == null)
                sbErrors = new StringBuilder();
            sbErrors.AppendFormat("Validation error: {1}{0}Line={2}, position={3}{0}"
                , System.Environment.NewLine, e.Exception.Message
                , e.Exception.LineNumber, e.Exception.LinePosition);
            if (errors == null)
                errors = new List<ValidationException.ValidationError>();
            errors.Add(new ValidationException.ValidationError()
            {
                Message = e.Exception.Message,
                Line = e.Exception.LineNumber,
                Position = e.Exception.LinePosition
            });
        }, true); 
        if (sbErrors != null)
            // Validation error(s) occurred.
            throw new ValidationException(sbErrors.ToString(), errors.ToArray());

        XNamespace xns = dataNamespaceName;
        XElement items = doc.Element(xns + "Items");
        // Check the root element name (i.e. Items in "urn:PointSeries-schema" xmlns).
        //if (items.Name != xns + "Items")
        if (items == null)
            throw new ValidationException(string.Format("Root element {0} missed", 
                                                        xns + "Items"));
        // Parse the Points.XXX elements.
        return items.Elements().Select<XElement, DataPointSeries>(
            (item) =>
            {
                // Parse item tag name for X/Y type strings.
                string xType, yType;
                getXYTypeStrings(item.Name.ToString(), out xType, out yType);
                // Optional attributes.
                var yName = item.Attribute("YName");
                var xName = item.Attribute("XName");

                IXmlSchemaInfo schemaInfo = item.GetSchemaInfo();
                XmlSchemaElement e = schemaInfo.SchemaElement;

                DataPointSeries series = new DataPointSeries()
                {
                    XName = xName == null ? "" : xName.Value,
                    XTypeString = xType,
                    YName = yName == null ? "" : yName.Value,
                    YTypeString = yType
                };
                foreach (XElement pt in item.Elements(xns + "Point"))
                {
                    XAttribute xAttr = pt.Attribute("x");
                    if (series.XXsdTypeString == null)
                        series.XXsdTypeString = 
                          xAttr.GetSchemaInfo().SchemaAttribute.SchemaTypeName.Name;
                    XAttribute yAttr = pt.Attribute("y");
                    if (series.YXsdTypeString == null)
                        series.YXsdTypeString = 
                          yAttr.GetSchemaInfo().SchemaAttribute.SchemaTypeName.Name;
                    series.XsdPoints.Add(new XsdDataPoint((string)xAttr, (string)yAttr));
                }
                return series;
            });
    }
}

First, this method loads the XML schema with the XmlSchema schema = XmlSchema.Read() method call. Then, it creates the XmlReader reader object, loads the XML document with XDocument doc = XDocument.Load(reader), and validates the loaded XML with the Validate extension method. If no errors happen at this point, the data is loaded and validated against the schema.
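The load-then-validate pattern can be tried in isolation with the self-contained sketch below. It uses a deliberately tiny inline schema (one Points.Int series type inside a sequence) rather than the article's schema, and ValidateAllDemo is a hypothetical name. The point it illustrates is that the validation callback is invoked once per problem, so all errors can be collected in a single pass.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

// Hypothetical demo: collect every validation error instead of stopping at the first.
public static class ValidateAllDemo
{
    // Simplified stand-in for the article's schema.
    const string Xsd =
@"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
           targetNamespace='urn:PointSeries-schema'
           xmlns='urn:PointSeries-schema'
           elementFormDefault='qualified'>
  <xs:element name='Items'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Points.Int' maxOccurs='unbounded'>
          <xs:complexType>
            <xs:sequence>
              <xs:element name='Point' maxOccurs='unbounded'>
                <xs:complexType>
                  <xs:attribute name='x' type='xs:int' use='required'/>
                  <xs:attribute name='y' type='xs:int' use='required'/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    public static List<string> Validate(string xml)
    {
        var schemas = new XmlSchemaSet();
        using (XmlReader xsdReader = XmlReader.Create(new StringReader(Xsd)))
            schemas.Add(null, xsdReader); // null: use the schema's targetNamespace

        // LoadOptions.SetLineInfo records line numbers on the loaded nodes
        // so that they are available for error reporting.
        XDocument doc = XDocument.Parse(xml, LoadOptions.SetLineInfo);

        var errors = new List<string>();
        doc.Validate(schemas, (sender, e) =>
            errors.Add(string.Format("{0} (line {1})",
                                     e.Message, e.Exception.LineNumber)));
        return errors;
    }
}
```

Because a handler is supplied, validation continues after each error; without a handler, the first error would surface as an XmlSchemaValidationException.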

The LoadWithSchema method gets the root element with:

C#
XNamespace xns = dataNamespaceName;
XElement items = doc.Element(xns + "Items");

Note the xns variable: it ensures that the Items element is looked up in the right XML namespace. After that, the LoadWithSchema method parses the loaded XML and returns the result with:

C#
return items.Elements().Select<XElement, DataPointSeries>(...)

DataPointSeries instances are created by the lambda expression, which:

  1. Extracts the data series Base Types from the XElement tag name with the getXYTypeStrings method.
  2. Gets the optional attributes.
  3. Creates the instance of the DataPointSeries class. Its XXsdTypeString and YXsdTypeString property values are extracted from the post-validation IXmlSchemaInfo instances associated with the x and y attributes of a Point element. The XClrType and YClrType property values are left null.
  4. Fills that instance's XsdPoints property with the Points collection.
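Step 1 above parses the expanded name string ("{namespace}Points.X.Y") by character offsets. As an alternative sketch under the same naming rules, the type strings can be taken from XName.LocalName, which already excludes the namespace part. TagNameParsing is a hypothetical class and is not the article's getXYTypeStrings implementation.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical alternative to offset-based tag name parsing.
public static class TagNameParsing
{
    // Splits a series element name such as "Points.Date.Int" or
    // "Points.Double" into its x and y type strings.
    public static void GetXYTypeStrings(XName name, out string xType, out string yType)
    {
        string[] parts = name.LocalName.Split('.');
        if (parts.Length < 2 || parts.Length > 3 || parts[0] != "Points")
            throw new FormatException("Unexpected series element name: " + name);

        xType = parts[1];
        // A single type string applies to both dimensions.
        yType = parts.Length == 3 ? parts[2] : parts[1];
    }
}
```

Unlike Debug.Assert, the explicit check also fires in release builds, which may be preferable when the document has not been validated beforehand.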

Some of the LoadWithSchema method overloads have just one argument. These overloads use the default schema stored as a resource in the Loader assembly.
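The namespace-qualified root lookup used by LoadWithSchema can be checked with a few lines of standalone code. NamespaceLookupDemo is a hypothetical name; the sketch only shows that XDocument.Element returns null when the root is in a different namespace or in none.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical demo of namespace-qualified element lookup.
public static class NamespaceLookupDemo
{
    // Returns true when the document root is Items in urn:PointSeries-schema.
    public static bool HasItemsRoot(string xml)
    {
        XNamespace xns = "urn:PointSeries-schema";
        XDocument doc = XDocument.Parse(xml);
        // xns + "Items" builds the XName {urn:PointSeries-schema}Items.
        return doc.Element(xns + "Items") != null;
    }
}
```

This is why a data file that forgets the xmlns declaration is reported as having a missing root element rather than being silently accepted.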

Load with Mappings

The principal LoadWithMappings method overload is:

C#
public static IEnumerable<DataPointSeries> 
       LoadWithMappings(TextReader dataReader, TextReader mappingReader)
{
    // Load mappings.
    List<Mapping> mappings = Mapping.Load(mappingReader);

    // Prepare XmlReaderSettings for input file validation.
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ValidationType = ValidationType.Schema;
    settings.Schemas.Add(SchemaBuilder.Build(mappings));
    StringBuilder sbErrors = null;
    List<ValidationException.ValidationError> errors = null;
    settings.ValidationEventHandler += (sender, e) =>
    {
        if (sbErrors == null)
            sbErrors = new StringBuilder();
        sbErrors.AppendFormat(
             "Validation error: {1}{0}Line={2}, position={3}{0}",
             System.Environment.NewLine, e.Exception.Message, 
             e.Exception.LineNumber, e.Exception.LinePosition);
        if (errors == null)
            errors = new List<ValidationException.ValidationError>();
        errors.Add(new ValidationException.ValidationError()
        {
            Message = e.Exception.Message,
            Line = e.Exception.LineNumber,
            Position = e.Exception.LinePosition
        });
    };

    // Load and validate the file.
    using (XmlReader reader = XmlReader.Create(dataReader, settings))
    {
        XElement items = XElement.Load(reader);
        if (sbErrors != null)
            // Validation error(s) occurred.
            throw new ValidationException(sbErrors.ToString(), errors.ToArray());

        XNamespace xns = dataNamespaceName;
        // Check the root element name (i.e. Items in "urn:PointSeries-schema" xmlns).
        if (items.Name != xns + "Items")
            throw new ValidationException(string.Format("Root element {0} missed", 
                                                        xns + "Items"));
        // Parse the Points.XXX elements.
        return items.Elements().Select<XElement, DataPointSeries>(
            (item) =>
            {
                // Parse item tag name for X/Y type strings.
                string xType, yType;
                getXYTypeStrings(item.Name.ToString(), out xType, out yType);
                
                // Dot-separated type string.
                string xyType = xType == yType ? xType : xType + "." + yType;
                Mapping map = (from mapItem in mappings
                               where mapItem.DotSeparatedTypeString == xyType
                               select mapItem).Single();

                // Optional attributes.
                var yName = item.Attribute("YName");
                var xName = item.Attribute("XName");
                
                DataPointSeries series = new DataPointSeries()
                {
                    XName = xName == null ? "" : xName.Value,
                    XXsdTypeString = map.XAxis.XsdTypeString,
                    XClrType = map.XAxis.ClrType,
                    XTypeString = map.XAxis.TypeString,
                    YName = yName == null ? "" : yName.Value,
                    YXsdTypeString = map.YAxis.XsdTypeString,
                    YClrType = map.YAxis.ClrType,
                    YTypeString = map.YAxis.TypeString
                };
                foreach (XElement pt in item.Elements(xns + "Point"))
                {
                    series.XsdPoints.Add(new XsdDataPoint((string)pt.Attribute("x"), 
                                        (string)pt.Attribute("y")));
                }
                return series;
            });
    }
}

The LoadWithMappings method calls List<Mapping> mappings = Mapping.Load(mappingReader) to load the mappings from the mapping reader provided, and builds the schema from them with SchemaBuilder.Build(mappings) (see later). Then, it creates the XmlReader instance with the reader = XmlReader.Create(dataReader, settings) statement, and loads the XML into memory with XElement items = XElement.Load(reader). If no errors happen at this point, the data is loaded and validated against the generated schema.

Then, the LoadWithMappings method parses the loaded XML and returns the result with:

C#
return items.Elements().Select<XElement, DataPointSeries>(...)

DataPointSeries instances are created by the lambda statement which:

  1. Extracts the data series Base Types from the XElement tag name with the getXYTypeStrings method.
  2. Gets the Mapping class instance associated with the XElement.
  3. Gets the optional attributes.
  4. Creates the instance of the DataPointSeries class. The XXsdTypeString, YXsdTypeString, XClrType, YClrType, XTypeString, and YTypeString property values of that instance are taken from the Mapping class instance.
  5. Fills that instance's XsdPoints property with the Points collection.

Constructing the XML Schema from the Mappings XML

The XML-CLR-string type mapping XML document and its associated XML schema are described above. In the code, this mapping is represented by two classes.

The first one represents the XML-CLR-string type mapping in one dimension:

C#
/// <summary>
/// XML-CLR-string type mapping in one dimension.
/// </summary>
public class AxisMapping
{
    /// <summary>
    /// Initializes a new instance of the <see cref="AxisMapping"/> class.
    /// </summary>
    /// <param name="xsdType">XML Type Name.</param>
    /// <param name="clrType">CLR Type Name.</param>
    /// <param name="typeString">The type string.</param>
    public AxisMapping(string xsdType, string clrType, string typeString)
    {
        XsdTypeString = xsdType;
        ClrType = string.IsNullOrEmpty(clrType) ? null : Type.GetType(clrType);
        TypeString = typeString;
    }

    /// <summary>
    /// Gets XML atomic type name like "double" or "gMonth".
    /// </summary>
    /// <value>XML atomic type name string.</value>
    /// <remarks>XSD type name string doesn't
    ///   contain a namespace prefix. </remarks>
    public string XsdTypeString { get; private set; }

    /// <summary>
    /// Gets the CLR type.
    /// </summary>
    /// <value>The CLR type or <c>null</c>.</value>
    public Type ClrType { get; private set; }

    /// <summary>
    /// Gets the "type string" assigned to this mapping like Double, Int, etc.
    /// </summary>
    /// <value>The type string.</value>
    /// <remarks>"Type string" is used in XML
    ///         schema construction.</remarks>
    public string TypeString { get; private set; }

    ...
}
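
The ClrType assignment in the constructor relies on Type.GetType(string), which returns null rather than throwing when the name cannot be resolved. The following minimal sketch illustrates that behavior; the ClrTypeResolutionSketch class and its Resolve method are hypothetical names introduced here for illustration and are not part of the Loader library.

```csharp
using System;

// Hypothetical helper, not part of the article's Loader library.
public static class ClrTypeResolutionSketch
{
    // Mirrors the expression used in the AxisMapping constructor:
    // a null or empty name yields null, anything else goes to Type.GetType.
    public static Type Resolve(string clrType)
    {
        return string.IsNullOrEmpty(clrType) ? null : Type.GetType(clrType);
    }

    public static void Main()
    {
        // Full type names from the core library resolve without an assembly name.
        Console.WriteLine(Resolve("System.Double") == typeof(double));     // True
        Console.WriteLine(Resolve("System.DateTime") == typeof(DateTime)); // True

        // C# keywords are not CLR type names, so the lookup yields null.
        Console.WriteLine(Resolve("double") == null);                      // True

        // An empty clr-type attribute value is mapped to null as well.
        Console.WriteLine(Resolve("") == null);                            // True
    }
}
```

Types that live outside the core library generally need an assembly-qualified name in the clr-type attribute, otherwise ClrType silently ends up as null.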

The second one contains the XML-CLR-string type mapping for the x, y dimensions, and defines several Load method overloads to load the mappings from the mappings XML document:

C#
/// <summary>
/// X,Y dimensions XML-CLR-string type mappings.
/// </summary>
/// <remarks>
/// <see cref="IEquatable{Mapping}"/> interface
/// implemented for use with the Distinct() LINQ operator.
/// <para>In order to compare the elements,
/// the Distinct operator uses the elements'
/// implementation of the IEquatable{T}.Equals method if the elements
/// implement the IEquatable{T} interface.
/// It uses their implementation of the
/// Object.Equals method otherwise.</para>
/// </remarks>
public class Mapping : IEquatable<Mapping> 
{
    public AxisMapping XAxis { get; private set; }
    public AxisMapping YAxis { get; private set; }

    /// <summary>
    /// Loads XML-CLR-string type mappings from the
    /// <see cref="System.IO.TextReader"/> specified.
    /// </summary>
    /// <param name="mappingReader">Input Mapping XML
    /// <see cref="System.IO.TextReader"/>.</param>
    /// <returns>List of <see cref="Mapping"/> objects.</returns>
    /// <remarks>
    /// <see cref="Mapping"/> instances with recurring
    /// dot-separated type strings are not removed from output;
    /// they cause a <see cref="RecurringMappingEntriesException"/>.
    /// </remarks>
    /// <exception cref="ValidationException"/>
    /// <exception cref="RecurringMappingEntriesException"/>
    public static List<Mapping> Load(TextReader mappingReader) { ... }
    /// <summary>
    /// Loads XML-CLR-string type mappings from the
    /// <see cref="System.IO.Stream"/> specified.
    /// </summary>
    /// <param name="stm">Input Mapping XML data
    /// <see cref="System.IO.Stream"/>.</param>
    /// <returns>List of <see cref="Mapping"/> objects.</returns>
    /// <remarks>
    /// <see cref="Mapping"/> instances with
    /// recurring <see cref="TypeString"/> property 
    /// values are removed from output.
    /// </remarks>
    /// <exception cref="ValidationException"/>
    /// <exception cref="RecurringMappingEntriesException"/>
    public static List<Mapping> Load(Stream stm) { ... }
    /// <summary>
    /// Loads XML-CLR-string type mappings from the file specified.
    /// </summary>
    /// <param name="mappingFileName">Mapping file name.</param>
    /// <returns>List of <see cref="Mapping"/> objects.</returns>
    /// <remarks>
    /// <see cref="Mapping"/> instances with recurring
    /// <see cref="TypeString"/> property values are removed from output.
    /// </remarks>
    /// <exception cref="ValidationException"/>
    /// <exception cref="RecurringMappingEntriesException"/>
    public static List<Mapping> Load(string mappingFileName) { ... }

    ...
}

The principal Load method overload is as follows:

C#
public static List<Mapping> Load(TextReader mappingReader)
{
    // XML Mapping Schema resource name.
    const string mappingSchemaResourceName = "typemappings.xsd";
    // XML namespace must be used in XML mappings files.
    const string mappingNamespaceName = "urn:PointSeries-mapping";
    // Mapping element attributes.
    const string attrNameXsdType = "xsd-type"
        , attrNameClrType = "clr-type"
        , attrNameTypeString = "type-string";

    // Get xml schema stream from the "mappingSchemaResourceName" resource.
    Assembly assembly = Assembly.GetAssembly(typeof(Loader));
    ResourceManager rm = new ResourceManager(assembly.GetName().Name + 
                                             ".g", assembly);
    using (XmlTextReader schemaReader = 
           new XmlTextReader(rm.GetStream(mappingSchemaResourceName)))
    {
        // Prepare XmlReaderSettings for input file validation.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(mappingNamespaceName, schemaReader);
        StringBuilder sbErrors = null;
        List<ValidationException.ValidationError> errors = null;
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (sbErrors == null)
                sbErrors = new StringBuilder();
            sbErrors.AppendFormat(
                "Validation error: {1}{0}Line={2}, position={3}{0}", 
                System.Environment.NewLine, e.Exception.Message, 
                e.Exception.LineNumber, e.Exception.LinePosition);
            if (errors == null)
                errors = new List<ValidationException.ValidationError>();
            errors.Add(new ValidationException.ValidationError()
            {
                Message = e.Exception.Message,
                Line = e.Exception.LineNumber,
                Position = e.Exception.LinePosition
            });
        };

        // Load and validate the file.
        using (XmlReader reader = XmlReader.Create(mappingReader, settings))
        {
            XElement mappings = XElement.Load(reader);
            if (sbErrors != null)
                // Validation error(s) occurred.
                throw new ValidationException("Mapping file validation errors\n"
                    + sbErrors.ToString(), errors.ToArray());

            XNamespace xns = mappingNamespaceName;
            // Check the root element name
            // (i.e. Mappings in "urn:PointSeries-mapping" xmlns).
            if (mappings.Name != xns + "Mappings")
                throw new ValidationException(string.Format("Root element {0} is missing", 
                                                            xns + "Mappings"));
            // Parse the Mapping elements.
            List<Mapping> mappingList = (from mapping in mappings.Elements(xns + "Mapping")
                    let xAxis = mapping.Element(xns + "XAxis")
                    let yAxis = mapping.Element(xns + "YAxis")
                    select new Mapping()
                    {
                        XAxis = new AxisMapping((string)xAxis.Attribute(attrNameXsdType),
                            (string)xAxis.Attribute(attrNameClrType),
                            (string)xAxis.Attribute(attrNameTypeString)),
                        YAxis = new AxisMapping((string)yAxis.Attribute(attrNameXsdType),
                            (string)yAxis.Attribute(attrNameClrType),
                            (string)yAxis.Attribute(attrNameTypeString))
                    }
                ).ToList();

            // Check result for recurring entries.
            List<Mapping> recurring = new List<Mapping>();
            for (int i = 0; i < mappingList.Count - 1; i++)
            {
                Mapping map = mappingList[i];
                for (int j = i + 1; j < mappingList.Count; j++)
                {
                    Mapping map1 = mappingList[j];
                    if (map.DotSeparatedTypeString == map1.DotSeparatedTypeString)
                        recurring.Add(map1);
                }
            }
            if (recurring.Count > 0)
            {
                StringBuilder sb = 
                  new StringBuilder("Recurring entries found in the mapping file:");
                foreach (Mapping map in recurring)
                {
                    sb.Append(System.Environment.NewLine + map.ToString());
                }
                throw new RecurringMappingEntriesException(sb.ToString(), 
                                                           recurring.ToArray());
            }

            return mappingList;
        }
    }
}

The mapping XML schema file is stored in the Loader library assembly as a resource. The Load method gets it with the ResourceManager, and uses it to prepare the XmlReaderSettings class instance for loading the mappings XML document with validation. Then, the Load method loads the mappings XML with XmlReader, and converts its content to the Mapping object collection with a LINQ query. Finally, it checks whether the Mapping object collection contains recurring entries and, if so, throws the RecurringMappingEntriesException.
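
The recurring-entry check is written as two nested loops. As an alternative, the same idea can be sketched with the GroupBy operator. The sketch below works on plain strings standing in for the DotSeparatedTypeString values; the RecurringKeysSketch class and the FindRecurring method are hypothetical names, not part of the Loader library. Note one difference in behavior: the nested loops add an entry for every later duplicate found, while this version reports each recurring key once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the article's Loader library.
public static class RecurringKeysSketch
{
    // Returns every key that occurs more than once,
    // ordered by the first appearance of the key in the input.
    public static List<string> FindRecurring(IEnumerable<string> keys)
    {
        return keys.GroupBy(k => k)
                   .Where(g => g.Count() > 1)
                   .Select(g => g.Key)
                   .ToList();
    }

    public static void Main()
    {
        // Stand-ins for Mapping.DotSeparatedTypeString values.
        string[] keys = { "Double", "Double.DateTime", "Int", "Double" };

        foreach (string key in FindRecurring(keys))
            Console.WriteLine(key); // prints "Double"
    }
}
```

In the Load method, the same query could group the Mapping objects by their DotSeparatedTypeString and pass the recurring ones to the exception, assuming that reporting each duplicated key once is acceptable.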

Loaded Data Representation

The result of data loading is stored in the DataPointSeries object collection.

The DataPointSeries class contains the properties describing the x, y dimension types in terms of both the XML and the CLR. The points loaded are returned as a Collection<XsdDataPoint> by the DataPointSeries.XsdPoints property. The XsdDataPoint structure stores the x, y point coordinate values as strings in the same form as they were presented in the input XML file.

To get the typed Data Points, use the public IEnumerable<DataPoint> GetPoints(IXmlTypeConverter converter) method, which converts the XsdDataPoint x, y field string values to the specific CLR types with the help of the XML-to-CLR type converter provided by the caller. As an alternative, you can use the GetPoints method overload without parameters. It uses the default converter hardcoded into the Loader assembly.
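
The members of IXmlTypeConverter are not listed in this article, so the following is only a sketch of how an XML-to-CLR converter might be written, assuming a single conversion method that takes the lexical value and the XSD type name. The IXsdValueConverterSketch interface and the XmlConvertBasedConverter class are hypothetical; the conversions themselves use the standard System.Xml.XmlConvert class, which understands XSD lexical forms such as "INF" for doubles and "1"/"0" for booleans.

```csharp
using System;
using System.Xml;

// Hypothetical interface shape; the real IXmlTypeConverter may differ.
public interface IXsdValueConverterSketch
{
    object Convert(string lexicalValue, string xsdTypeName);
}

// Hypothetical converter covering only a few XSD atomic types.
public class XmlConvertBasedConverter : IXsdValueConverterSketch
{
    public object Convert(string lexicalValue, string xsdTypeName)
    {
        switch (xsdTypeName)
        {
            case "double":
                return XmlConvert.ToDouble(lexicalValue);
            case "int":
                return XmlConvert.ToInt32(lexicalValue);
            case "boolean":
                return XmlConvert.ToBoolean(lexicalValue);
            case "dateTime":
                // RoundtripKind keeps the time zone information found in the string.
                return XmlConvert.ToDateTime(lexicalValue,
                    XmlDateTimeSerializationMode.RoundtripKind);
            default:
                // Types not handled here are returned as the raw string.
                return lexicalValue;
        }
    }
}

public static class ConverterDemo
{
    public static void Main()
    {
        IXsdValueConverterSketch converter = new XmlConvertBasedConverter();
        object x = converter.Convert("2009-04-24T00:00:00", "dateTime");
        object y = converter.Convert("1.5", "double");
        Console.WriteLine(x.GetType().Name + ", " + y.GetType().Name); // DateTime, Double
    }
}
```

The results come back as System.Object, which matches the way the DataPoint class stores its x, y values.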

Note that the DataPoint class stores the x, y values in fields of the System.Object type. A more type-safe design is possible, but with C# 3.0 we would be forced, sooner or later, to pass such values around as System.Object and use Reflection to work with them. Let's wait for C# 4.0 dynamic types.

Using the Code

The code attached to this article contains the Visual Studio 2008 SP1 solution targeted at .NET Framework 3.5, with three projects. The main part is the Loader class library project described above.

The other two projects are the simple Console applications which load the data from the XML file pointed to by the first command line argument and (for the second project) the mapping XML file pointed to by the second argument. They either report errors, or display the results of the XML data parsing. The sample input files for these applications are in the root solution directory.

Pay attention to the Unit Test project. It contains tests for many Data Point Series types, and provides examples of which data the XML format in question supports and what that data should look like.

History

  • 9th April, 2009: Initial post.
  • 16th April, 2009: Second article revision with the following additions:
    1. Added support for on-the-fly XML schema generation.
    2. The Loader class interface modified to load Data Points Series XML data with either the default schema, the schema provided by the caller, or the schema generated from the type mappings XML file.
    3. The IXmlTypeConverter interface and its default implementation added.
    4. The DataPointSeries class interface modified to return the results of the Data Points Series XML data parsing as either a collection of raw XsdDataPoint objects or typesafe DataPoint objects.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Team Leader
Russian Federation
