Introduction

In this article I discuss how XML can be used to simplify the electronic communication of financial documents. Corporate financial documents are commonly presented as balance sheets, income statements, statements of cash flows, and so on. These parts of an organization’s financial statements adhere to guidelines established by the Securities and Exchange Commission (Pakistan) or to GAAP. Because financial documents fit neatly (most of the time) into a well-defined template, documents from one organization are readily useful to other organizations, stockholders or just keen individuals. When it comes to electronic transmittal of such documents, however, no such standardization exists. Technologies do exist, most notably the Financial Information Exchange Markup Language, but they have not yet been standardized. Furthermore, there is no generic mechanism that can process such formats. Custom applications can be developed to process them, but that would be too expensive, in both a monetary and a non-monetary sense, while demand is still low.

This article focuses on how existing server-side technologies can be used to mark up financial data for exchange over the Internet. Readers are asked not to confuse this with a proposal for a new markup language. As I mentioned earlier, mechanisms that allow the markup of financial information already exist. This article, on the other hand, explains how markup can be used to transmit and process information inexpensively, efficiently and reliably, using the existing infrastructure (in our case, the Internet) and widely supported industry technologies (ASP, in our case). Isn’t it kind of ironic that an entirely new technology should be used in conjunction with an old one? Moreover, why bother with all this markup baloney when the data can just as easily be processed without marking it up? There are very good reasons to do so:

True, data can be processed using a host of technologies that we have at our disposal. More often than not, however, the data and the format it is stored in are exclusive to one organization. That is well and good as far as that one organization goes. But organizations seldom exist in seclusion; indeed, if they did, they probably would not survive for long. Organizations survive by maintaining long-term liaisons with both customers and other organizations. Like other things in this world, a liaison or relationship lasts only as long as there is effective communication, that is, transfer of information between the parties. When data is standardized in some manner (markup, in this case), the reliability of data transmittal can, to some extent, be taken for granted.
Cost is another factor, and an important one. When organizations have an accepted mechanism for sending and receiving data, the costs of hiring programmers to do the transformation, having applications built to support the various exchange formats, hiring staff to train people on the different formats, publishing extensive documentation, and record keeping are considerably reduced.
Since XML marks up information in a text format, processing can be carried out economically. A benefit of using text files is that we do not need expensive, proprietary software to extract data, as we would with a binary file.
Every conforming XML parser must support Unicode (the UTF-8 and UTF-16 encodings). Therefore, XML documents can easily be internationalized.
Besides saving cost, these benefits also save time. Organizations can thus divert their energy and time to other projects.

So there you go. I am quite sure that several other reasons for marking up data could be highlighted. However, I think I have made my point. So, without further ado, I will move on to show how a markup language can be created to process financial data.

Top Level View of the Process

For the sake of those just entering the arena, I will go over some of the rudimentary concepts. The financial data is marked up using custom tags (the syntax of markup languages). The tags developed by the programmers are human-readable. To make them machine-readable, we need to define a vocabulary or grammar that tells the computer how to interpret the tags. Essentially, the tags have no meaning unless a meaning is assigned to them. Also, since we are aiming to standardize how data is stored and communicated, we need to make sure that the data entered by everyone conforms to some standard. The definition of the tags and the validation of syntax would be carried out using a Document Type Definition (DTD); XML Schemas and Relax-NG are the other two main contenders that can do the same. The reason for choosing a DTD over the other two is that it is fairly simple to use. The other two methods offer far more power than a DTD, at the expense of simplicity.

Once the document and its tags conform to a standard, a program called a parser has to parse the data and pass it on to a higher-level application for processing. For the purposes of this introduction, I will use Microsoft’s MSXML parser, which comes bundled with IE 5.0. The application would be designed using a server-side technology called Active Server Pages (ASP). I have chosen MSXML and IE 5.0 because that makes it convenient for a majority of readers to try out the examples presented in this article. However, a variety of parsers are available on the Internet, some of which are free to use. Once a parser is downloaded, it can be used in a custom-built application to parse the markup language and produce results.

The XML application for financial data exchange

Table I: XML document

<?xml version="1.0" standalone="no" ?>
<!DOCTYPE balancesheet SYSTEM "balsheet.dtd">
<balancesheet>
  <contents date="June 30, 2001">
    <assets>
      <fassets>
        <item id="1">
          <title>Operating Fixed Assets</title>
          <rupee>Rs. 3,305,760</rupee>
        </item>
        <item id="2">
          <title>Capital work-in-progress</title>
          <rupee>Rs. 686,035</rupee>
        </item>
      </fassets>
      <cassets>
        <item id="3">
          <title>Cash and bank balances</title>
          <rupee>Rs.963,994</rupee>
        </item>
        <item id="4">
          <title>Trade debts</title>
          <rupee>Rs.8,385,843</rupee>
        </item>
      </cassets>
    </assets>
    <liabilities>
      <cliabilities>
        <item id="5">
          <title>Short term loans</title>
          <rupee>Rs.1,979,219</rupee>
        </item>
      </cliabilities>
    </liabilities>
    <equities>
      <item id="6">
        <title>Share capital</title>
        <rupee>Rs.1,429,325</rupee>
      </item>
    </equities>
  </contents>
</balancesheet>

I bet most of you can figure out what the document is about; it is pretty much self-explanatory and shows the structure of a corporate balance sheet. However, because we are going through this whole “ordeal” to make electronic communication of data more reliable, we need some mechanism for the computer to understand this markup as well and judge its validity. This can be accomplished using a DTD. A sample DTD is shown below:

Table II: DTD for the XML document

<!ELEMENT balancesheet (contents)>
<!ELEMENT contents (assets,liabilities,equities)>
<!ATTLIST contents date CDATA #REQUIRED>
<!ELEMENT assets (fassets,cassets)>
<!ELEMENT fassets (item+)>
<!ATTLIST item id CDATA #REQUIRED>
<!ELEMENT cassets (item+)>
<!ELEMENT item (title,rupee)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT rupee (#PCDATA)>
<!ELEMENT liabilities (cliabilities,fliabilities?)>
<!ELEMENT cliabilities (item+)>
<!ELEMENT fliabilities (item+)>
<!ELEMENT equities (item+)>

The DTD clearly identifies each tag and what content it can hold. The specification of content is not to be confused with data types. As their prime limitation (and a very good excuse to avoid DTD’s in favor of XML Schemas or Relax-NG), DTD’s do not allow the specification of data types. Thus, a user can just as well enter a non-monetary value for <rupee> and get away with it, without the parser ever noticing the error. Data type checks can, however, be done in the application that processes the data.
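As a rough illustration of such an application-level check, here is a minimal sketch. It is written in Python rather than ASP only so that it can be run without a web server. The sample fragment, the function names parse_rupee and find_invalid_items, and the assumed amount format ("Rs." followed by digits in comma-separated groups of three) are my own illustrative assumptions, not part of the DTD.

```python
import re
from xml.dom.minidom import parseString

# Hypothetical sample: the second <rupee> holds text that the DTD accepts
# (it is just #PCDATA) but that is clearly not a monetary value.
SAMPLE = """<fassets>
  <item id="1"><title>Operating Fixed Assets</title><rupee>Rs. 3,305,760</rupee></item>
  <item id="2"><title>Capital work-in-progress</title><rupee>to be confirmed</rupee></item>
</fassets>"""

# Assumed format: "Rs.", an optional space, then digits grouped in threes by commas.
RUPEE_PATTERN = re.compile(r"^Rs\.\s?(\d{1,3}(?:,\d{3})*)$")


def parse_rupee(text):
    """Return the amount as an int, or raise ValueError if it is not a rupee figure."""
    match = RUPEE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError("not a monetary value: %r" % text)
    return int(match.group(1).replace(",", ""))


def find_invalid_items(xml_text):
    """Return the id of every <item> whose <rupee> content fails parse_rupee."""
    invalid = []
    document = parseString(xml_text)
    for item in document.getElementsByTagName("item"):
        rupee = item.getElementsByTagName("rupee")[0]
        text = rupee.firstChild.data if rupee.firstChild else ""
        try:
            parse_rupee(text)
        except ValueError:
            invalid.append(item.getAttribute("id"))
    return invalid


print(find_invalid_items(SAMPLE))
```

A check of this kind runs after the parser has validated the structure, so the DTD and the application each catch a different class of error.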

At this point, we have an elementary system with which to:

Share data as needed
Process data as per the organizational requirements
Display data as per the organizational requirements

We have effectively completed a document structure that can be used to exchange financial information. The proposed language structure could be made to comply with GAAP and with the requirements of the Securities and Exchange Commission, and the structural side of that compliance can be enforced using the DTD. Due to space restrictions and the continuous evolution of information-based projects, I have not included the entire code for the markup data and the DTD.

What to do once the document is validated?

OK, so the data has been marked up and validated. What next? How do we actually process that data? Processing XML data requires a software program called a parser. Parsers read XML syntax and retrieve the information for us. They are then used inside applications, which essentially provide an integrated user interface on top of the data.

So how is data access actually accomplished? This requires an additional layer in our model, called the DOM. DOM stands for Document Object Model. It is an Application Programming Interface (API) that allows the programmer to access the marked up data and process it.

Accessing Data using DOM in our applications

Before I dive further into this topic, a discussion of the versions of the DOM and the support for it is in order. Since XML-related technologies are the new kids on the block, most of them are still in the construction and testing phases. The DOM, however, was in use before XML became a W3C recommendation. In fact, it was the very combination of DOM and HTML that made Dynamic HTML (DHTML) possible. Nonetheless, the DOM has been changing to accommodate the latest XML family of technologies (the various languages that work together to enhance the functionality of XML, including XPath, XSL, XLink, XPointer, XQuery, etc.).

The DOM specification discussed in this article is Level 1. Users with IE 5.0 can safely assume that this level of DOM is already installed on their systems.

For our purposes, an elementary web-based application would be designed in ASP. ASP would, in turn, be responsible for communicating with the DOM functions to process the marked up data.

Show me HOW

The ASP code below accesses two balance sheet values, with ID attributes of 1 and 2 respectively, and adds them. The total is then output to the browser. I will not go further into the workings of this code, since that would involve a whole discussion of Active Server Pages. This example should suffice to show how easily we can use the DOM API to access data and manipulate it to suit organizational needs.

Table III: ASP application sample code

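<%
' Create the MSXML DOM parser and load the balance sheet synchronously,
' so that the document is fully parsed before we query it
Set objxml = Server.CreateObject("Microsoft.XMLDOM")
objxml.async = False
objxml.load(Server.MapPath("balsheet.xml"))
Set objbalsheet = objxml.documentElement
' Select the <rupee> elements of the items with id 1 and id 2
Set objfig1 = objbalsheet.selectSingleNode("//item[@id='1']/rupee")
Set objfig2 = objbalsheet.selectSingleNode("//item[@id='2']/rupee")
fig1 = objfig1.text
fig2 = objfig2.text
' Skip the leading "Rs." prefix, then convert the remainder to a Long
fig1 = Mid(fig1, 4, 10)
fig2 = Mid(fig2, 4, 10)
fig1 = CLng(fig1)
fig2 = CLng(fig2)
' Add the two figures and send the total to the browser
total = fig1 + fig2
Response.Write total
%>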

It would be safe to say that the versatility and power that this application can command is limited only by the creativity of the development team.
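For readers who do not have IIS and MSXML at hand, the same lookup can be sketched with a different DOM implementation. The version below uses Python's standard xml.dom.minidom module purely as a stand-in for MSXML; the helper name rupee_amount and the embedded excerpt of Table I are my own assumptions, and the DOM calls (getElementsByTagName, getAttribute) are the Level 1 counterparts of the XPath lookup used in the ASP sample.

```python
from xml.dom.minidom import parseString

# Excerpt of Table I, embedded so that the sketch is self-contained.
BALANCE_SHEET = """<balancesheet>
  <contents date="June 30, 2001">
    <assets>
      <fassets>
        <item id="1"><title>Operating Fixed Assets</title><rupee>Rs. 3,305,760</rupee></item>
        <item id="2"><title>Capital work-in-progress</title><rupee>Rs. 686,035</rupee></item>
      </fassets>
    </assets>
  </contents>
</balancesheet>"""


def rupee_amount(document, item_id):
    """Find the <item> with the given id and return its <rupee> value as an int."""
    for item in document.getElementsByTagName("item"):
        if item.getAttribute("id") == item_id:
            text = item.getElementsByTagName("rupee")[0].firstChild.data
            # Strip the "Rs." prefix and the thousands separators before converting.
            return int(text.replace("Rs.", "").replace(",", "").strip())
    raise KeyError("no item with id %s" % item_id)


document = parseString(BALANCE_SHEET)
total = rupee_amount(document, "1") + rupee_amount(document, "2")
print(total)
```

The point of the comparison is that the DOM is an interface rather than a product: the tree-walking logic carries over even when the parser and the host language change.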

What have we achieved?

Huh ha! A question that we all have asked or heard; yet it never fails to bother us each time it is asked. I think I owe the answer to my audience, who patiently sat through the whole article.

To be truthful, we have not achieved anything. Now wait a minute. Don’t get all uptight; read on. Why I say such a thing will soon become apparent. We have not achieved anything in the sense that we have done nothing new, nothing that has not been done before. In fact, mathematical processing and calculation is at the very heart of a system’s processor.

However, what we have achieved is a new way to communicate information and exchange it electronically. We have discovered how the structure of data can be made both apparent (to applications and humans) and transparent, so that otherwise incompatible systems can exchange documents without running into all sorts of errors. Whereas the markup languages so far have used American standards, we have found a mechanism to document data using standards devised for businesses that are not based in North America. Internationalization is at the very core of all these efforts. Even though what we have done is not new, what we have accomplished is worth investigating. Indeed, if the potential of the XML family of technologies is realized and their power is capitalized upon, it would be nothing short of a revolution.

Anything Else I can do with Marked up Data?

Why not! Indeed, that is the whole point of this markup deal. Once our document follows our standards and has a strict hierarchy, it is possible to do just about anything with the data. It can be stored in a database as is, since vendors have already started to support XML as a native format. It may be imported into Microsoft Word or Excel. It may even be converted to formats such as Adobe’s PDF. The data may be rendered using a web browser. Basically, any application can make use of our data.
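To make one of these conversions concrete, here is a small sketch that flattens the balance sheet items into comma-separated values, a format that spreadsheet programs such as Excel can open. It is written in Python with the standard xml.etree.ElementTree and csv modules; the function name to_csv and the choice of columns are illustrative assumptions on my part, not a prescribed export format.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Excerpt of Table I (the current assets), embedded so the sketch is self-contained.
CURRENT_ASSETS = """<cassets>
  <item id="3"><title>Cash and bank balances</title><rupee>Rs.963,994</rupee></item>
  <item id="4"><title>Trade debts</title><rupee>Rs.8,385,843</rupee></item>
</cassets>"""


def to_csv(xml_text):
    """Return one CSV row per <item>, with its id, title and rupee figure."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "title", "rupee"])
    for item in root.iter("item"):
        writer.writerow([item.get("id"), item.findtext("title"), item.findtext("rupee")])
    return buffer.getvalue()


print(to_csv(CURRENT_ASSETS))
```

Because the rupee figures contain commas, the csv module wraps them in quotes, which is what keeps each figure in a single spreadsheet cell.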

Last Thoughts

As with other things in life, there is more than one way to go about designing a data exchange mechanism. The DTD that I have presented above is expected to be replaced by XML Schemas. XML Schemas are more robust than DTD’s and can support data types. XML Schemas are also more flexible and easier to get used to. Another advantage of Schemas over DTD’s is that the former are written in plain old XML, while the latter use a syntax of their own, one that the XML specification defines using Extended Backus-Naur Form (EBNF). DTD’s cannot be accessed using the DOM either. A major limitation of DTD’s is that only one of them can be associated with an XML document. Therefore, it is difficult to break a rather complex DTD into smaller DTD’s for simplicity and clarity.

XML Schemas, however, boast the features that DTD’s lack. Unfortunately, as the technology is still in its infancy, XML Schemas are not foolproof, at least not yet. Thus, care must be taken if schemas are adopted as the preferred mechanism to validate XML documents. In fact, I was unable to validate my document against an XML schema using the MSXML parser. However, as this whole markup deal matures, we can reasonably expect XML Schemas to replace DTD’s.

I have also mentioned how the DOM can be used to manipulate the marked up data. Another API of use to XML programmers is the Simple API for XML (SAX). While the DOM treats an XML document as a tree structure in memory, SAX is an event-driven set of Java interfaces that processes an XML document as a stream.

SAX is open-source and freely available for download. SAX, much like the DOM, is used to analyze XML documents and extract information from them. However, for larger documents SAX can perform far better than the DOM (partly because the DOM models the entire document in memory and requires more system resources). Most people would probably not even come across SAX because it is most often used in conjunction with the Java language. To find out more about SAX, please visit its web site at http://www.megginson.com/SAX/
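To give a feel for the event-driven style, here is a minimal sketch that totals the rupee figures as they stream past, without ever building a tree. It uses Python's standard xml.sax module, which follows the SAX model, instead of the original Java interfaces; the handler class name RupeeTotalHandler and the embedded excerpt of Table I are my own illustrative assumptions.

```python
import xml.sax

# Excerpt of Table I (the current assets), embedded so the sketch is self-contained.
CURRENT_ASSETS = """<cassets>
  <item id="3"><title>Cash and bank balances</title><rupee>Rs.963,994</rupee></item>
  <item id="4"><title>Trade debts</title><rupee>Rs.8,385,843</rupee></item>
</cassets>"""


class RupeeTotalHandler(xml.sax.ContentHandler):
    """Adds up every <rupee> figure as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self._in_rupee = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Event: an opening tag was read.
        if name == "rupee":
            self._in_rupee = True
            self._buffer = []

    def characters(self, content):
        # Event: character data was read; it may arrive in several chunks.
        if self._in_rupee:
            self._buffer.append(content)

    def endElement(self, name):
        # Event: a closing tag was read; the figure is now complete.
        if name == "rupee":
            text = "".join(self._buffer)
            self.total += int(text.replace("Rs.", "").replace(",", "").strip())
            self._in_rupee = False


handler = RupeeTotalHandler()
xml.sax.parseString(CURRENT_ASSETS.encode("utf-8"), handler)
print(handler.total)
```

The handler keeps only a running total and a small text buffer, which is why this style suits documents that are too large to hold comfortably in memory.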

Before I wrap up, there is another concept that must be discussed. Since XML allows the definition of custom tags, it is quite likely that two people will come up with the same tag to represent different entities. What if the automobile industry decides to use a <name> tag to mark up the names of automobiles, while the pharmaceutical industry also wants to use <name> to mark up the names of drugs? Clearly, such a situation would not cause problems as long as there is no exchange of information between the two industries. However, that is rarely the case. Therefore, XML introduced the concept of Namespaces to avoid such conflicts.
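The <name> clash described above can be sketched as follows. Each industry binds a prefix to its own namespace URI, and an application then asks for the <name> it wants by namespace. The example is in Python with the standard xml.etree.ElementTree module; the namespace URIs, the prefixes and the sample values are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; any unique URI would serve the same purpose.
AUTO_NS = "http://example.com/ns/automobile"
PHARMA_NS = "http://example.com/ns/pharmaceutical"

# One document that carries both kinds of <name> without ambiguity.
CATALOG = """<catalog xmlns:auto="http://example.com/ns/automobile"
         xmlns:pharma="http://example.com/ns/pharmaceutical">
  <auto:name>Sample Sedan</auto:name>
  <pharma:name>Sample Painkiller</pharma:name>
</catalog>"""

# Prefix-to-URI map used when searching; the prefixes here are local to this code.
NAMESPACES = {"auto": AUTO_NS, "pharma": PHARMA_NS}

root = ET.fromstring(CATALOG)
car_name = root.findtext("auto:name", namespaces=NAMESPACES)
drug_name = root.findtext("pharma:name", namespaces=NAMESPACES)

print(car_name)
print(drug_name)
```

What distinguishes the two elements is the URI, not the prefix; the prefix is only a shorthand that each document or program is free to choose.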

The definition of a new markup language would never be complete without the assignment of namespaces to its tags. I have avoided namespaces here because the intent of this article was to present a macro view of the entire markup process. I will leave them for another article.