XSD Tutorial - Part 2 of 5 - Conventions & Recommendations

3 Jul 2014, CPOL, 3 min read
This article gives a basic overview of the building blocks underlying XML Schemas.

XSD Tutorial Parts

  1. Elements and Attributes
  2. Conventions and Recommendations
  3. Extending Existing Types
  4. Namespaces
  5. Other Useful bits...

Introduction

This section covers conventions and recommendations when designing your schemas.

When to Use Elements or Attributes

There is often some confusion over when to use an element or an attribute. Some people say that elements describe data and attributes describe metadata. Another view is that attributes suit small pieces of data such as order ids. In practice, the choice is largely a matter of personal taste. Generally, it is best to use a child element if the information feels like data. Some of the problems with using attributes are:

  • Attributes cannot contain multiple values (child elements can)
  • Attributes are not easily expandable (to incorporate future changes to the schema)
  • Attributes cannot describe structures (child elements can)

If you use attributes as containers for data, you end up with documents that are difficult to read and maintain. Try to use elements to describe data. In short, metadata (data about data) should be stored as attributes, and the data itself should be stored as elements.
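
As a rough illustration of that split, here is a small instance document. All element and attribute names are hypothetical and chosen only for this sketch: the order id is carried as an attribute, while everything that reads as data lives in child elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical order document; names are illustrative, not from a real schema -->
<Order Id="12345">
  <!-- Structured data: an attribute could not hold a nested Name -->
  <Customer>
    <Name>Jane Smith</Name>
  </Customer>
  <!-- Repeating data: an attribute can appear only once per element -->
  <Item>
    <Description>Widget</Description>
    <Quantity>2</Quantity>
  </Item>
  <Item>
    <Description>Gadget</Description>
    <Quantity>1</Quantity>
  </Item>
</Order>
```

If Item had been squeezed into attributes, adding a second item or a nested structure later would mean restructuring the document rather than simply adding elements.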

Mixed Element Content

Mixed content (an element that contains both text and child elements) is something you should try to avoid as much as possible. It is used heavily on the web in the form of XHTML, but it has many limitations: it is difficult to parse, and it can lead to unforeseen complexity in the resulting data. XML Data Binding tools also have limitations that make such documents difficult to manipulate.
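
For readers unsure what mixed content looks like, the sketch below contrasts the two styles. The Notes wrapper and every element name are assumptions made up for this example. In a schema, the mixed form would be declared with mixed="true" on the complexType.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical wrapper element so both styles fit in one well-formed document -->
<Notes>
  <!-- Mixed content: free text interleaved with child elements -->
  <MixedNote>Please deliver to <Recipient>Jane Smith</Recipient> before <DeliverBy>2014-07-03</DeliverBy>.</MixedNote>

  <!-- Element-only content: each piece of data sits in its own element -->
  <StructuredNote>
    <Recipient>Jane Smith</Recipient>
    <DeliverBy>2014-07-03</DeliverBy>
  </StructuredNote>
</Notes>
```

In the mixed form, the text fragments between the child elements carry meaning only through their position, which is what makes such documents awkward to map onto objects.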

Conventions

  • All elements and attributes should use upper camel case (UCC), e.g., PostalAddress. Avoid hyphens, spaces, or other punctuation.
  • Readability is more important than tag length. There is always a line to draw between document size and readability; wherever possible, favor readability.
  • Try to avoid abbreviations and acronyms for element, attribute, and type names. Exceptions should be well known within your business area, e.g., ID (Identifier), and POS (Point of Sale).
  • Postfix new types with the name 'Type', e.g., AddressType, USAddressType.
  • Enumerations should use names not numbers, and the values should be UCC camel case.
  • Names should not include the name of the containing structure, e.g., CustomerName should be Name within the sub element Customer.
  • Only produce complexTypes or simpleTypes for types that are likely to be re-used. If the structure only exists in one place, define it inline with an anonymous complexType.
  • Avoid the use of mixed content.
  • Only define root level elements if the element is capable of being the root element in an XML document.
  • Use consistent namespace aliases.
  • Try to think about versioning early on in your schema design. If it is important for a new version of a schema to be backward compatible, then all additions to the schema should be optional. If it is important that existing products should be able to read newer versions of a given document, then consider adding any and anyAttribute entries to the end of your definitions. See Versioning recommendations.
  • Define a targetNamespace in your schema. This better identifies your schema and can make things easier to modularize and re-use.
  • Set elementFormDefault="qualified" in the schema element of your schema. This makes qualifying the namespaces in the resulting XML simpler (if more verbose).
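
Several of the conventions above can be seen together in one small schema. This is only a sketch: the namespace URI, the po alias, and all element and type names are assumptions invented for illustration, and restricting the wildcards to namespace="##other" is one possible choice rather than something the conventions require.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema: the namespace URI and all names are illustrative assumptions -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:po="http://example.com/PurchaseOrder"
           targetNamespace="http://example.com/PurchaseOrder"
           elementFormDefault="qualified">

  <!-- Re-used in two places, so it is a named type postfixed with 'Type' -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="Street" type="xs:string"/>
      <xs:element name="City" type="xs:string"/>
      <xs:element name="PostalCode" type="xs:string"/>
      <!-- Versioning hook at the end: tolerates elements added by later versions -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute namespace="##other" processContents="lax"/>
  </xs:complexType>

  <!-- The only global element, because it is the only valid document root -->
  <xs:element name="PurchaseOrder">
    <xs:complexType>
      <xs:sequence>
        <!-- Used in one place only, so defined inline with an anonymous complexType -->
        <xs:element name="Customer">
          <xs:complexType>
            <xs:sequence>
              <!-- 'Name', not 'CustomerName': the parent element supplies the context -->
              <xs:element name="Name" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="DeliveryAddress" type="po:AddressType"/>
        <!-- Optional, so documents written without it remain valid -->
        <xs:element name="BillingAddress" type="po:AddressType" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="Id" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```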

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) Liquid Technologies
United Kingdom

Comments and Discussions

 
Question: What is mixed element content?
Nathan Holt at EMOM, 25-Apr-07 11:00
Answer: Re: What is mixed element content?
Sprotty, 28-Apr-07 11:42
Question: A little more detail on some of the recommendations?
HerbCSO, 23-Apr-07 17:39
Answer: Re: A little more detail on some of the recommendations?
Sprotty, 28-Apr-07 11:40
I have tried to pull together a number of common and best practices that I have seen used in other standards. I am happy to use this area as a discussion for a 'best practices' page, so if anyone has any input, I'm happy to update the page to reflect it.

Looking at your list.

Enumerations. This was a practice taken from IBM. To quote their full text:

Enumeration values should use names only (not numbers) and the names used for enumeration values must conform to the guidelines for element or attribute names. If suitable names already exist, they should be used. Prefer ISO standards to national standards or consortium specifications. Names composed of natural language words can suggest the meaning of the value. Numbered enumerations invite nonstandard extensions that do not interoperate. (A criticism of this guideline is that the requirement to use names forces a choice of natural language. The language chosen for these names should be the one most helpful to those who maintain and extend the messaging system. However, these names should be limited to differentiating information handled differently by the information technology system; users should always be presented with messages in each user's chosen language. This guideline is not an excuse to avoid good user interfaces.)

I would also add that when using XML Data Binding, numbered enumerations produce invalid property names in most languages, so the binding tool (or the user) has to alias them, which makes for a messy class library.
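
A minimal sketch of a named enumeration follows. The type name and its values are hypothetical, picked only to show the shape.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical type and values, for illustration only -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="OrderStatusType">
    <xs:restriction base="xs:string">
      <!-- Named values in upper camel case suggest their meaning -->
      <xs:enumeration value="Pending"/>
      <xs:enumeration value="Dispatched"/>
      <xs:enumeration value="Delivered"/>
      <!-- Avoid numbered values such as value="1": a bare number is not a
           legal identifier in most languages, so generated code needs an alias -->
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```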


OK, root elements. Say you have two messages, a purchase request and a purchase response. These would be root-level elements in your schema, as they are the only elements that your XML documents can start with. Now say you have an Address element that is used within one of these elements. If it were declared as a root-level element, that would imply that an XML document containing only an Address element is valid; the schema would validate it, but it has no meaning on its own. However, if the Address element is declared within the PurchaseOrder element, then the schema does not imply that Address is valid on its own, and would not validate it. If Address is common to both root elements, then it should be defined as a complexType.
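
To make that concrete, here is a sketch with two root-level messages sharing an address. The namespace URI, the po alias, and the element names are assumptions for this example only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema: namespace URI and names are illustrative assumptions -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:po="http://example.com/Purchase"
           targetNamespace="http://example.com/Purchase"
           elementFormDefault="qualified">

  <!-- Shared by both messages, so it is a named type, NOT a global element.
       A document whose root is Address will therefore fail validation. -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="Street" type="xs:string"/>
      <xs:element name="City" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Global element: a valid document root -->
  <xs:element name="PurchaseRequest">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Address" type="po:AddressType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Global element: the other valid document root -->
  <xs:element name="PurchaseResponse">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Address" type="po:AddressType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```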

The alias used for a namespace can be anything; however, there are a number of standard namespaces, and it's better to use a consistent alias for these. The main two are ‘http://www.w3.org/2001/XMLSchema’, which should be aliased with ‘xs’, and ‘http://www.w3.org/2001/XMLSchema-instance’, which should be aliased with ‘xsi’.

Namespacing in XSDs is a bit of a minefield. I’ve touched on it in this article, but it really is not that simple, so anything you can do to simplify it is a good thing. elementFormDefault="qualified" does make the resulting XML a little more verbose, but it is much easier to understand which alias should be applied to each element.
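
The difference shows up in the instance documents. The sketch below assumes a hypothetical schema with target namespace http://example.com/PurchaseOrder that declares PurchaseOrder globally and Customer and Name locally. The Examples wrapper exists only so both variants fit in one well-formed document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical wrapper; each child below would normally be its own document -->
<Examples>
  <!-- elementFormDefault="qualified": every element is in the target namespace,
       so one alias (here 'po') applies to all of them -->
  <po:PurchaseOrder xmlns:po="http://example.com/PurchaseOrder">
    <po:Customer>
      <po:Name>Jane Smith</po:Name>
    </po:Customer>
  </po:PurchaseOrder>

  <!-- elementFormDefault="unqualified" (the default): only the global element
       is in the target namespace; locally declared elements are in no namespace,
       so the author must know which elements take the alias and which do not -->
  <po:PurchaseOrder xmlns:po="http://example.com/PurchaseOrder">
    <Customer>
      <Name>Jane Smith</Name>
    </Customer>
  </po:PurchaseOrder>
</Examples>
```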

I hope this covers your questions. Please follow up if you're not happy with any of the answers!

Cheers, Simon
