For enterprises which have been in business for decades, problems due to silos of applications and data that evolved over the years is a common issue. These issues sometimes become show stoppers when an enterprise is starting a new strategic initiative to revamp its IT portfolio, to float new business models and explore new business opportunities.
This article is going to discuss possible options available for unification of data silos, and how efficiently an enterprise can expose its data with minimal effort by leveraging the recent advancements in technology.
Scenarios in age old enterprises
Common key scenarios in enterprises that are in business for decades are:
- Data in silos spread across multiple, intermediate RDBMS systems and legacy data sources like Mainframes, AS400 etc.
- Data duplicated across multiple applications / data sources, and hence lot of synchronization batch jobs.
- Business exceptions in applications because of non-availability of real time data.
- Different applications adopting different paths to access the same data and hence embedded with the same validation logics / business rules in multiple places - business rules duplicated.
- Data in proprietary databases / directories aroused as a result of M&A events.
- Applications on heterogeneous platforms consuming data from different types of data sources through proprietary protocols.
The scenarios listed are not complete, just cursors. These scenarios become constraints for any new initiative that an enterprise wants to start, to remain competitive in business.
Possible solutions usually adopted are unification of data silos, and dismantling the monolithic applications into services to eliminate duplicated business rules and logics. In some cases, enterprises will take an iterative journey where they will allow applications that are resulting out of new initiatives to rest aside the existing applications with some stop gap arrangements. Even then, unless data silos are unified and made available for consumption by the applications, problems due to duplicated data will affect the benefits from new applications.
Unification can happen under different levels: at the data source level, or at consumption level. When it happens at data source level, applications can still adopt their own style to consume the data, but all the applications should talk to the same source. This style of unification could be more time consuming, and involves effort related to testing, data migration, etc. But it will eliminate synchronization batch routines. This should be the long term goal of any enterprise.
In case of unification at consumption level, all the applications have to be made to adopt the same style of data consumption. Here, there will not be any effort related to data migration. But all the applications have to be revamped to adopt the one common style/pattern of consumption.
Unification at consumption level can be a good option based on a good school of thought: "Abstraction". It could be a starting point for a strategic initiative with a goal to achieve unification both at source level and consumption level. The amount of effort involved in this style depends on the number of applications. But the good thing is, it will not encourage applications to adopt different styles of data access mechanisms. This style of unification results in façades over data, which are called "Data Services".
Data Services can be considered in one of the categories in the service taxonomy of an enterprise. They are also called "Entity Services". The dynamics of these services will change based on the context under which they are used. In the scenarios where they are used for Insert / Delete / Update operations, it is the responsibility of the data service facades to perform any validations / business rules execution. In the context where consumers use them only for "lookup" data or under read-only mode, they have no other responsibility apart from performing initial security checks (authentication alone).
Data services can be made available in different formats / modes: SOAP services (Operations), RESTful services (Resources). Here, I used the word "Modes" to mean Operations and Resources. The factors that could decide the format and mode are contexts [read-only data access, data access for IDU (Insert, Delete, and Update operations)], and the type of the customers (which we will see in detail in the coming section).
In the scenarios where it involves lots of IDU (Insert, Delete, and Update) operations, since the validation logic will be part of data services, there will not be any duplication as in the case of "unification at data source" level. Hence, maintenance will be comparatively easier. When unification is carried out at data source level, especially in scenarios where data sources are available in both RDBMs like MS SQL Server, Oracle etc., and legacy systems like Mainframes, the application will hold the responsibility of data / rule validation. The more the number of applications, the more duplications and maintenance will become a headache.
In case of "Read-only data access", exposing data as RESTful services could be the best option because of the following key benefits:
- Light weight - RESTful services generate simple, plain old XML output which is lighter than the XML output from SOAP based Web Services. In the case of Web Services, the XML output generated is heavy because of lots of metadata information.
- Flexibility - It provides flexibility to generate output in different data formats: XML and JSON. JSON is even lighter than plain old XML.
In addition, REST has a lot of other benefits. Detailing them here is out of the scope of this article.
In the case of the "data access for IDU operations" context, SOAP Web Services will be the best option. If we go for adopting REST style here, we cannot leverage the benefits of advanced Web Service standards (WS-*, i.e., WS-Security, WS-ReliableMessaging, WS-addressing, WS-Coordination etc.) like enlist in transactions, reliable messaging, message level encryption, etc.
Data Services Consumers
Consumers of data services can be broadly classified as:
- Consumers that can consume SOAP /RPC Web Services (typical .NET client applications)
These includes typical web applications, desktop applications, and Enterprise applications (ERP, CRM etc).
- Consumers that need to rely on AJAX for responsiveness (AJAX applications)
- RIA consumers
These are browser plug-in based applications that provide a rich user experience. They can consume SOAP Web Services as well as REST style services, but mostly restricted to asynchronous mode. Silverlight is one such browser plug-in that helps in building RIAs.
Based on the type of the client, data services need to be made available in a particular format. For the clients that can consume XML and SOAP, it can be made available as Web Services. In this case, client applications communicate with the Web Services through proxy objects generated in the respective platform based on the Web Services metadata. This is what is generally adopted in many of the enterprises.
Nowadays, even applications that are capable of consuming Web Services are switched to consume RESTful URLs because of their advantages over SOAP-RPC.
Data on REST
The fundamental difference between SOAP /RPC and RESTful is that in SOAP based Web Services, data is exposed as "operations", while in the case of REST, it is exposed as "resources". When data entities are exposed as resources, consumers have the freedom to query it in a way they want, instead of confining to operation signatures.
In the communication between the web client and server side service endpoints, a serialization format is required to convert platform specific types to HTTP network packets. In the case of RESTful services, the formats available are XML and JSON. For decades, XML remained as the only choice for this. In the past few years, JSON is gradually becoming the choice of data format for the communication between browser and server because of its simplicity and lightweight.
With RESTful services, we have options to expose our entities in both XML and JSON so that we can cover a wider client base.
Having seen the significance of REST and its benefits, it's time now to see how we can expose our entities through RESTful URLs with minimal effort. The picture shown below depicts ADO.NET Data Services in the context of different consumers:
How to REST it?
There are different options available to expose our data / entities as RESTful URIs / services. The general approach is to develop WCF services and to do a workaround to make them available as RESTful URIs. Here, we need to develop the data access layer, business components layer, and then the service layer. With the release of the "WCF REST Starter Kit", this work becomes a bit easier. But, we still need to stick to different layers for various reasons like security, abstraction, maintainability, etc.
When we need to expose business logic or processes, there is a justification to stick to these layering principles and expose them as operations. But when we need to just make available our data / entities for consumption for various types of clients, or when we need to allow external clients just to perform a lookup on enterprise data, the better option will be to adopt a framework which provides us facilities to expose our entities / data as RESTful URLs without much effort. One of such frameworks is "ADO.NET Data Services", available from Microsoft.
ADO.NET Data Services allows us to expose our data across the network, by leveraging the Microsoft Windows Communication Foundation's capability to create REST style services and Microsoft LINQ's capability to perform queries on data sources like entity objects (like the MS ADO.NET Entity Framework).
ADO.NET Data Services allows us to expose our Entities / Data in a RESTful way viz. through RESTful URIs without much effort. It provides us the options to expose data in two formats: XML Atom and JSON. It serializes data as Atom or JSON based on the "Accept" HTTP header of the request.
Let me walk you through the steps that will allow you to expose your data in a few clicks:
For this exercise, I have created two tables in MS SQL Server Express 2008:
- tblEmp - To hold the basic details of employees
- tblCompDtls - To hold the compensation details of employees
[I have attached the scripts to create those tables and to insert the data in the download file above.]
Step 1: Create a new ASP.NET web application in MS Visual Studio 2008 [Professional edition would suffice].
Step 2: Add a database connection to your new ASP.NET Web Application project through the Tools > Connect to Database menu options. This connection should point to the database in which you create these tables.
Step 3: Now, from the Add > New Item menu options of your ASP.NET web application project, add a new "ADO.NET Entity Data Model" project item. It is available under the section "Data". Name it as "EmployeeModel".
Clicking "Add" will bring you the next screen.
Select the "Generate from database" option, as highlighted in the above picture. Then, click Next. You will get the screen depicted below. Specify the entity name as "EmployeeEntity".
Click the "Next" button now.
The next screen will show the objects from the database. Select the tables "tblCompDtls" and "tblEmp". Type the name of the model as "EmployeeModel". Here, tblEmp represents the entity "Employee". Click the "Finish" button. Now, the entity model will be generated.
The picture depicted above shows the model generated.
Having created the entity model, now you are ready to expose your data in a "RESTful" way.
Step 4: Now from the Add > New Item menu options of your ASP.NET web application project, add a new "ADO.NET Data Services" project item. It will be available under the "Web" section.
Name it as "EmployeeDataService". Notice that its extension is ".svc". Does it remind you of some thing? Yes, it is the same extension used to represent a WCF service. The ADO.NET Data Services framework totally relies on the WCF model. Click the "Add" button.
Now, Visual Studio will generate for you the code-behind file [EmployeeDataService.svc.cs] for the data service you have created. Change the following portion of code from:
public class EmployeeDataService :
public class EmployeeDataService : DataService<EmployeeEntity>
Also, uncomment the following portion of the code:
Don't forget to alter this code portion as [replace the words "
MyEntityset" and "
MyServiceOperation" as "*"]:
Else, you will hit the error "The server encountered an error processing the request. See server logs for more details." when running the project.
Hit F5 now. You will see the browser window as:
It shows all the tables (Entities) selected. To view only the employee entity, change the URL as http://localhost:62800/EmployeeDataService.svc/tblEmp. It will list the employees. [Check for the exact port number in your system.]
If you host this service in IIS, you can access it as http://localhost/DataServicesDemo/EmployeeDataService.svc/tblEmp, without worrying about the port address.
So now, we successfully exposed our data in a RESTful way without much effort. The ADO.NET Data Services default format is XML Atom. We can also make it to emit in JSON format.
Let us perform it with a small tool named "fiddler2", which is an amazing tool from Microsoft, for inspecting HTTP traffic between a client and server. For more details on this tool related to download, installation, and usage, visit http://www.fiddler2.com/fiddler2/.
In Fiddler, click the "Request Builder" button. Type the URL http://localhost/DataServicesDemo/EmployeeDataService.svc/tblEmp in the URL text box. Select the option as "GET". Let "HTTP/1.1" be the request type. Now, click the "Execute" button.
From the left pane, click the request link. With the options "Raw" and "Textview" selected in the right upper and lower pane, respectively, you can see the details on the HTTP request that was made and the response. The above picture shows the default response (XML-Atom). You can make the service to emit in JSON format by switching back to the "Request Builder" view and adding the following lines:
The resulting request header info will look like:
GET /NorthwindDataService/Northwind.svc/ HTTP/1.1
Now, click the "Execute" button. Now, the service will return a JSON output as depicted in the following screen:
As I had mentioned earlier, JSON is lightweight than Atom format. I exported both the outputs in two separate Notepad files and saved them [you can do this by clicking the "View in Notepad" button]. The resulting file sizes are:
- XML - Atom output - 10 KB
- JSON output - 4 KB
As you can see, the XML output is approximately 50% heavier than JSON!
In every enterprise, there is a common need to expose data/entities as services. That too with the growth of Web 2.0 / RIA / AJAX applications, there is a definite need to expose our data as RESTful URIs and in multiple formats like JSON and Atom. Leveraging ADO.NET Data Services will make it comparatively easier for us to expose our data for the wider variety of applications, in various formats. Exposing data in JSON format improves performance because of the decrease in response payload, and hence minimizes the impact on the network bandwidth and other resources.