Vector Data Language Specification (v1.1)

Stephan Johnson

Rate me:

4.71/5 (4 votes)

22 Feb 2004CPOL8 min read

31.6K

353

This article covers the Vector Data Language Specification for Vector version 1.1

Download Vector - 78.2 KB

Language Specification

Below is the basic structure of the XML document that is called a Vector.

XML

<vectorRoot>
     <header>
     </header>
     <command>
     </command>
</vectorRoot>

The <header> Tag

The <header> tag contains information about the Vector.

XML

<header>
  <name></name>
  <description></description>
  <version>
     <major></major>
     <minor></minor>
     <build></build>
  </version>
</header>

<name>

This tag contains the name of the Vector. It is not used for any other reason than to identify the Vector.

<description>

This tag contains a description for the Vector.

<version>

This tag contains the version information for the Vector.

<header> extension tags

The <header> tag can contain some extensible elements.

The <param> Tag

XML

<param name="myParameter">value</param>

The <param> tag creates a parameter for use inside the Vector. Parameters can be referenced by using the name attribute prefixed with an @ in any string value inside the Vector. By assigning a value to the parameter, the Vector does not require the value to be set at runtime, but if a value is set at runtime, the default value will be overridden.

XML

<var name="myVariable">@myParameter</var>

The code in the above sample will create a new variable, myVariable and assign its value to the value of myParameter.

The <include> Tag

XML

<include path="MSSQL.dll" prefix="mssql"/>

Ever thought of including Microsoft .NET assemblies inside your XML? NO? By implementing the <include> tag, you can include compiled assemblies into Vector. The purpose of this is to call methods inside your assemblies to execute custom actions against your data.

By specifying the full path of the assembly (if it is not in the local executing directory) and a prefix for later access from the Vector.

The <command> Tag

The <command> tag contains the Vector. The Vector is a series of commands that are understood by the Parser. The base Vector Language currently only contains 3 known and understood tags.

<var>

XML

<var name="myVariable">value</var>

Creating and assigning variables are as easy as 1-2-3. The parser always checks to see if the variable has been created before, if the variable exists, the value is simply replaced by the new value, otherwise the variable is created and the value is assigned.

Using the variable inside the Vector is just as simple as using a parameter except that instead of using an @ as the prefix, you use a $ as the prefix. Below are some samples of variables:

XML

<var name="var1">100.25</var>
<var name="var2">@parm1</var>
<var name="var3">$var1</var>
<var name="var2">$var3</var>

<var name="cs">Password=myPass;User ID=sa;
 Initial Catalog=Northwind;Data Source=.;</var>

In the sample above, we create 3 variables and re-assign the value of an existing variable. var1 is created and assigned a value of '100.25'. var2 is created and assigned to the value of parm1. var3 is created and assigned the value of var1, and then finally var2 is re-assigned to the value of var3.

The $cs variable is used in samples through this document.

<create>

The <create> tag creates an object from the assembly defined in an <include> tag.

XML

<create type="object" ns="mssql" 
typename="Vector.Data.MSSQL" name="o_mssql"/>

The syntax is actually quite simple to understand. First, you see the type attribute, it only accepts the value of "object" at this point in time, but later extendable types such as "connection" and "transaction" will also be supported. Don't worry too much about this attribute in this release, just ensure that it is always present.

The ns attribute tells the create procedure at which included assembly to look. Remember that you set the prefix attribute when you added your include tag. These two values must match.

The typename points to the class inside the assembly, use the full namespace and classname.

Then finally the name attribute is the name of the newly created object. This is the value that you must use when calling this object.

NOTE: None of the tags or attributes in Vector have any specific order in which they must appear, I don't look for attributes or elements using numeric indexes, but rather find them by name. Vector relies heavily on the Garbage Collector that .NET claims to be the best in the world, otherwise there would be a <destroy> tag as well.

<call>

The <call> tag is responsible for calling methods inside the objects that you have created.

XML

<call object="o_mssql" methodname="connect" cs="$cs" 
  name="mssql_connect"></call>

<call object="mssql" methodname="query" 
  name="customer" qry="select distinct 
 c.customerid, c.companyname, c.contactname 
 from customers as c right join orders as o 
 on o.customerid = c.customerid where
 c.customerid = @custId order by c.customerid" />

Hmmm, how do I explain the <call> tag. It is in essence a very powerful tag. It has some basic attributes which are required to execute the method, and any other attributes are matched up to the parameter names that are required by the method and passed to the method as such. Simple, eh?

I thought so too. The first required attribute would be the object. Remember the object that you created using the <create> tag. This is the name that you assigned to the object. In this case, it is <ecode>o_mssql.

Then you require the methodname attribute to tell the Parser which method to look for inside the object. In this case, we are using the connect method.

The last required attribute is the name for the results returned from the call. Right now, the relevance for having a name assigned to a call is not clear, but by the time you reach the end of this document, and have read through the data processing section, you will understand the relevance that this attribute has.

In the above sample, you will notice the cs attribute. It is pointing to the $cs variable that we have declared before. It contains the Connection String for connecting to a Microsoft SQL Server database.

Now let's have a look at the code for the Vector Extension.

using System;
using System.Xml;

namespace Vector.Data
{
  /// <SUMMARY>
  /// The Vector.Data.MSSQL class contains methods
  /// to connect to Microsoft
  /// SQL Server data sources and execute TransactSQL
  /// queries against them.
  /// </SUMMARY>
  public class MSSQL
  {
    private System.Data.SqlClient.SqlConnection x_connect;

    /// <SUMMARY>
    /// Vector.Data.MSSQL Constructor.
    /// </SUMMARY>
    public MSSQL()
    {
    }

    public void connect(XmlNode commandNode, string cs)
    {
      // this code removes the '' that the replaveValues method in
      // the parser adds to replaced values.
      string cn = cs.Substring(1, (cs.Length - 2));

      x_connect = new System.Data.SqlClient.SqlConnection(cn);

      x_connect.Open();
    }

    /// <SUMMARY>
    /// query VDL extension. This language extension 
    /// executes TSQL queries 
    /// against MSSQL data sources.
    /// </SUMMARY>
    ///<param name="commandNode" />The node containing the command meta data. System.
    ///<param name="cs" />The connection string for the MSSQL data source.
    /// <RETURNS></RETURNS>
    public XmlNodeList query(XmlNode commandNode, string qry)
    {
      string ct = qry;
      System.Data.DataSet m_dataset;

      System.Data.SqlClient.SqlDataAdapter o_query = new
        System.Data.SqlClient.SqlDataAdapter(ct,
        x_connect);

      m_dataset = new System.Data.DataSet(commandNode.Attributes["name"].Value 
        + "s");
      o_query.Fill(m_dataset, commandNode.Attributes["name"].Value);

      XmlDocument tmp_query_xml = new XmlDocument();
      tmp_query_xml.LoadXml(m_dataset.GetXml());

      XmlNode retNode = tmp_query_xml.CreateNode(XmlNodeType.Element, 
        commandNode.Attributes["name"].Value, "");
            
      XmlNodeList results = tmp_query_xml.SelectNodes(@"//" + 
        commandNode.Attributes["name"].Value);

      return results;
    }
  }
}

You notice that there are two publicly exposed methods in the Vector.Data.MSSQL object. connect and query, the connect method creates a new database connection, whereas the query method executes the query (specified in the qry parameter) against the database.

And if you have actually been reading this document... you will notice that there are anomalies in what I have said as to how the <call> method works. These are visible in the required commandNode parameters in the methods. I never pass this parameter from the <call> tag. The reason for this is that there are three parameters that are populated by the Parser if they are required by the calling method.

These are commandNode, dataNode and dataNodes. The commandNode passes the current node from Vector to the method.

XML

<call object="o_mssql" methodname="connect" cs="$cs" 
  name="mssql_connect"></call>

The two dataNode parameters individually pass either the single data node which is being processed at this time, or the collection of sibling nodes that are being processed at this time.

How Vector Processes the Data

Vector has a very unique way of processing data. Due to the flexibility of XML, Vector can process all of its commands recursively, and therefore execute recursive commands on recursive data.

Look at the sample below:

XML

<?xml version="1.0" encoding="UTF-8"?>
<vectorRoot>
  <header>
    <name>Vector 1.1</name>
    <description>A complex Vector</description>
    <version>
      <major>1</major>
      <minor>1</minor>
      <build>jackal</build>
    </version>
        <param name="custId">A%</param>
    <include path="MSSQL.dll" prefix="mssql"/>
  </header>
  <command>
    <var name="cs">Password=myPass;User ID=sa;Initial Catalog=Northwind;
      Data Source=.;</var>
        <create type="object" ns="mssql" 
              typename="Vector.Data.MSSQL" name="o_mssql"/>
    <call object="o_mssql" methodname="connect" 
              cs="$cs" name="mssql_connect"/>
    <call object="o_mssql" methodname="query" 
            name="customer" qry="select distinct 
     c.customerid, c.companyname, c.contactname 
     from customers as c right join 
     orders as o on o.customerid = c.customerid where c.customerid LIKE @custId 
     order by c.customerid">
      <var name="cid">!#.//customerid##</var>
      <call object="o_mssql" methodname="query" name="order" 
       qry="declare @oc int select @oc = count(customerid) from orders where 
       customerid = $cid if (@oc > 0) begin select customerid, orderid, 
       orderdate, shippeddate, employeeid from orders where customerid = 
       $cid end">
        <var name="oid">!#.//orderid##</var>
        <call object="o_mssql" methodname="query" name="item" 
         qry="declare @oi int select @oi = count(orderid) from 
         [order details] where orderid = $oid if (@oi > 0) begin 
          select orderid, productid, unitprice, quantity, discount 
           from [order details] where orderid = $oid end"/>
      </call>
    </call>
  </command>
</vectorRoot>

WOW! Looks complicated, doesn't it! Well, it isn't. We will skip the <header> tag except for the <param> and the <include> tags.

Firstly, we create a parameter called custId which is assigned to 'A%' (for use with a LIKE compare in SQL). Then we include the MSSQL.dll assembly as 'mssql', this assembly must be located in the same directory as the Parser.dll because there is no full path specified.

Moving on to the <command> tag, we create a variable to hold the connection string for the database. Then we create the MSSQL data access object. We need to connect to the database before we can execute our queries against it, therefore the <call> to the connect method.

The first process is where we select the required customer data by using:

SQL

select distinct c.customerid, c.companyname, c.contactname from customers as c 
right join orders as o on o.customerid = c.customerid where c.customerid = @custId 
order by c.customerid

You will notice the use of the parameter custId in this query.

NOTE: Parameters and variables are CASE SENSITIVE.

Now you will notice that I have nested some commands inside the <call> tag that executes the query:

XML

<var name="cid">!#.//customerid##</var>

Huh? !#.//customerid##. That looks almost like an XPath. That is because it is an XPath. When the Parsers loads, it creates a blank XMLDocument for the result that is to be passed back to the calling application.

As every command tag executes, it appends the output of the command as a child node of the XMLDocument. This means that you can assign variables to the value of an XPath query. XPath queries are located between the start delimiter !# and the end delimiter ##.

In the example above, the variable is assigned to the value of the customerid that is returned from the database. I think it is important to note at this stage that when you are using nested command tags, the nested command will be executed against every returned result node (or record for that matter) from the parent tag.

This means that if we get 5 results back from the database, the nested commands will be executed 5 times. Each time against the individual record.

Now that we have our variable assigned to the value of the customerid, we can look for any Orders that the customer might have placed with our company. You will notice the SQL gets a little more complicated at this point in time.

SQL

declare @oc int select @oc = count(customerid) from orders where customerid = $cid 
if (@oc > 0) begin select customerid, orderid, orderdate, shippeddate, employeeid 
from orders where customerid = $cid end

The simple reason for this is that not all of our customers might have placed orders yet, and that the results will throw an error if no orders are found. So the SQL was modified to first check if the customer has orders.

Now that you have the hang of this, you will see the exact same principle applied to get the Order Items for every Order. This is the output generated by the Parser.

XML

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <customer>
    <customerid>ALFKI</customerid>
    <companyname>Alfreds Futterkiste</companyname>
    <contactname>Maria Anders</contactname>
    <orders>
      <order>
        <customerid>ALFKI</customerid>
        <orderid>10643</orderid>
                <orderdate>1997-08-25T00:00:00.0000000+02:00</orderdate>
        <shippeddate>1997-09-02T00:00:00.0000000+02:00</shippeddate>
        <employeeid>6</employeeid>
        <items>
          <item>
            <orderid>10643</orderid>
            <productid>28</productid>
            <unitprice>45.6</unitprice>
            <quantity>15</quantity>    <discount>0.25</discount>
          </item>
                    <item>
            <orderid>10643</orderid>
            <productid>39</productid>
            <unitprice>18</unitprice>
            <quantity>21</quantity>
            <discount>0.25</discount>
          </item>
          <item>
            <orderid>10643</orderid>
            <productid>46</productid>
            <unitprice>12</unitprice>
            <quantity>2</quantity>    <discount>0.25</discount>
          </item>
        </items>
      </order>
    </orders>
  </customer>
</root>

Due to brevity, I have omitted all but the very first results.

Now, some of you, especially those with extensive SQL experience will say that all of this is useless and over complicated because "I can do the same thing in SQL".

SQL

SELECT customers.customerid, customers.companyname, customers.contactname,
  orders.orderid, orders.employeeid, orders.orderdate,
  items.productid, items.unitprice, items.quantity, items.discount
FROM customers AS customers
RIGHT JOIN orders AS orders
ON orders.customerid = customers.customerid
RIGHT JOIN [order details] AS items
ON items.orderid = orders.orderid
WHERE customers.customerid LIKE 'A%'
FOR XML AUTO, ELEMENTS

All fine and well, but you cannot write your own custom extensions that can process on a per record basis, or with the same ease, execute the sub-queries against different databases (e.g., a SQL Server hosts the Customers table, and ODBC Oracle server hosts the Orders table, and an old UNIX system stores the Order Items in separate CSV files.) Only Vector can achieve this seamless data integration and provide useful data back in an XML format.

Conclusion

Visit my website for more information about the future development of Vector. My plans are to release the code under a Shared Source license similar to the Shared Source License found in the .NET CLI.

Register as a developer. The serial number required to install the Vector distribution included in this article is 00000-00000-00000-00000-00070.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Stephan Johnson

We Fix Code

South Africa

This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.