Introduction
In continuation to my CodeProject article that I wrote last year on Cassandra/.NET programming, this year's article is related to .NET programming with HP Vertica, which is another BigData platform.
This article covers three aspects in details. First on HP Big Data platform Vertica and its dimension. Secondly, .NET application layer leverages HP Vertica data storage. Finally, .NET code samples are used for CRUD (Create Read Update Delete) data operations.
Data Warehouse
A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect. The term was coined by W. H. Inmon.
Applications of data warehouses include data mining, Web Mining, and decision support systems (DSS).
Traditional Data Warehouse
Traditionally, data warehouse is housed on an enterprise mainframe server. Data from various online transaction processing (OLTP) applications and other sources is selectively extracted and organized on the data warehouse database for use by analytical applications and user queries. Data warehousing emphasizes the capture of data from diverse sources for useful analysis and access.
Latest Data Warehouse
In the latest industry trend, Data Warehouse is built not only with internal system but also from the related external systems for better clarity and granularity.
The world of business intelligence (BI) and data warehousing keeps evolving. Technologies such as analytical database servers, big data, NoSQL, mashups, SaaS BI, data virtualization servers, advanced analytics, and SOA, have become available and offer new possibilities to organisations.
Comparison
As the result of detailed analysis, traditional warehouse and latest big data warehouse are compared as below:
HP Vertica
What Is It?
Vertica Systems is an analytic database management software company, founded in 2005, acquired by HP (Hewlett Packard) on Mar 2011.
The cluster-based, column-oriented, Vertica Analytics Platform is designed to manage large, fast-growing volumes of data and provide very fast query performance when used for data warehouses and other query-intensive applications.
Concepts
HP Vertica is built on a foundation of four distinguishing characteristics (the four Cs):
- Column storage: store data to optimize for access
- Compression: store more data in less physical storage
- Clustering: scale your database easily, whether you need three nodes or 300
- Continuous performance: optimize the database on an ongoing basis, automatically
Why Vertica?
Pretty straight forward question as the traditional data storage programmer.
In modern business, actionable data is becoming more important than ever. The ability to manage data quickly and effectively is a huge competitive advantage - an advantage easily attained with the Vertica Analytics Platform. Vertica is a powerful, analytic database that allows users to extract value from massive amounts of data at lightning fast speeds. Big Data needs are achievable using the above 4C concepts.
Who Use It?
Some of the biggest names in data, including Facebook. Facebook selected the HP Vertica Analytics Platform as one component of its big data infrastructure. Reference Link.
Vertica External Tables
With the latest version of Vertica, it is now possible to use the full expressive power of Vertica Analytics Engine and its analytics without having to load the data into Vertica. It is depicted in the below diagram.
Microsoft Connectivity
HP Vertica allows you to integrate your HP Vertica server with an environment that includes Microsoft components previously installed on your system. The Connectivity Pack includes the ADO.NET client driver and additional tools for integration with Microsoft Visual Studio and Microsoft SQL Server.
HP Vertica's Microsoft Connectivity Pack allows you to integrate your HP Vertica server with an environment that includes Microsoft components previously installed on your system. The Connectivity Pack includes the ADO.NET client driver and additional tools for integration with Microsoft Visual Studio and Microsoft SQL Server.
Specifically, the Connectivity Pack is an enhanced .NET data provider package and provides integration with the following Microsoft components:
- Business Intelligence Development Studio (BIDS) for Visual Studio 2008 for use with SQL Server 2012. BIDS is a client-based application used to develop business intelligence solutions and based on the Microsoft Visual Studio development environment; BIDS includes additional project types specific to SQL Server Business Intelligence. Developers use BIDS to develop business solutions.
- SQL Server Data Tool - Business Intelligence (SSDT-BI) for Visual Studio 2010/2012 for use with SQL Server 2012. SSDT-BI replaces BIDS for Visual Studio 2010 and 2012. It serves the same purpose as BIDS, providing a development environment for developing business intelligence solutions.
- SQL Server Analysis Services (SSAS) for SQL Server 2008 and 2012. Use SSAS for OLAP and data mining, using HP Vertica as source for cube creation.
- SQL Server Integration Services (SSIS) for SQL Server 2008 and 2012, which provides SQL Server Type Mappings to map data types between HP Vertica and SQL Server. Use SSIS for data migration, data integration and workflow, and ETL.
- SQL Server Reporting Services (SSRS). Use SSRS to generate reports.
Vertica .NET Integration
Infra Architecture
In Vertica .NET integration effort, Infra architecture is designed as below:
App Server layer is added with Vertica ODBC connectivity component to establish the communication between .NET App and HP Vertica data layer.
Data layer contains cluster-based HP Vertica DB. Presentation layer stays as it is.
Logical Architecture
As per the industry standard n-tier architecture, HP Vertica .NET integrated Application logical architecture goes as below:
Connectivity Code
Programmers should use connection properties to configure the connection between ADO.NET client application and HP Vertica database.
You can set a connection property in two ways:
- Include the property name and value as part of the connection string you pass to a
VerticaConnection
. - Set the properties in a
VerticaConnectionStringBuilder
object, and then pass the object as a string
to a VerticaConnection
.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Vertica.Data.VerticaClient;
namespace VerticaApp
{
class Connectivity
{
static void Main(string[] args)
{
VerticaConnectionStringBuilder builder = new VerticaConnectionStringBuilder();
builder.Host = "192.10.10.10";
builder.Database = "TestDB";
builder.User = "AppUser";
VerticaConnection conn = new VerticaConnection(builder.ToString());
conn.Open();
conn.Close();
}
}
}
CRUD Operation
CRUD operation is the best use case to explain the functionalities between Application and Data Storage layer. In the given sample, the working code (except delete) has been attached in details.
Create Code
You can load data in batches using a prepared statement with parameters. You can also use transactions to rollback the batch load if any errors are encountered.
If you are loading large batches of data (more than 100MB), then consider using a direct batch insert.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Vertica.Data.VerticaClient;
namespace VerticaApp
{
class Create
{
static void Main(string[] args)
{
VerticaConnectionStringBuilder builder = new VerticaConnectionStringBuilder();
builder.Host = "192.10.10.10";
builder.Database = "TestDB";
builder.User = "AppUser";
VerticaConnection conn = new VerticaConnection(builder.ToString());
conn.Open();
VerticaCommand command = conn.CreateCommand();
command.CommandText =
"INSERT into Employee values(1001, 'FirstName', 'LastName', 'eMail', 'Dept7')";
Int32 rowsAdd = command.ExecuteNonQuery();
Console.WriteLine(rowsAdd + " rows added!");
conn.Close();
}
}
}
Read Code
The attached code snap is to retrieve the content from the existing vertica table into .NET application layer.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Vertica.Data.VerticaClient;
namespace VerticaApp
{
class Read
{
static void Main(string[] args)
{
VerticaConnectionStringBuilder builder = new VerticaConnectionStringBuilder();
builder.Host = "192.10.10.10";
builder.Database = "TestDB";
builder.User = "AppUser";
VerticaConnection conn = new VerticaConnection(builder.ToString());
conn.Open();
VerticaCommand command = conn.CreateCommand();
command.CommandText =
"SELECT Mast.EmpID, Mast.FirstName, Mast.LastName, Mast.Mail, Ref.DeptName " +
" FROM Employee Mast, Department Ref " +
" WHERE Mast.DeptID == Ref.DeptID " +
" ORDER BY Mast.EmpID " +
" LIMIT 100 ";
VerticaDataReader reader = command.ExecuteReader();
Console.WriteLine(" EmployeeName\t Department");
Console.WriteLine("------------\t -------------------");
int rows = 0;
if (VerticaDataReader.HasRows())
{
while (reader.Read())
{
Console.WriteLine(" " + reader[1] + reader[2] + " \t " + reader[4]);
++rows;
}
}
Console.WriteLine("Total Read Rows: (" + rows + ")");
reader.Close();
conn.Close();
}
}
}
Update Code
Update scenario is to alter the content of the existing Vertica table elements for the given condition. It is derived by Vertica Client library as given.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Vertica.Data.VerticaClient;
namespace VerticaApp
{
class Update
{
static void Main(string[] args)
{
VerticaConnectionStringBuilder builder = new VerticaConnectionStringBuilder();
builder.Host = "192.10.10.10";
builder.Database = "TestDB";
builder.User = "AppUser";
VerticaConnection conn = new VerticaConnection(builder.ToString());
conn.Open();
VerticaCommand command = new VerticaCommand();
command.Text =
"UPDATE Employee " +
" SET Mail = " + "test@test.com" +
" WHERE FirstName = @First AND " +
" LastName = @Last ";
command.Parameters.Add(new VerticaParameter( "First", VerticaType.VarChar));
command.Parameters.Add(new VerticaParameter( "Last", VerticaType.VarChar));
command.Parameters["First"] = 'Ganesan';
command.Parameters["Last"] = 'Senthilvel';
Int32 rowsUpdate = command.ExecuteNonQuery();
conn.Close();
}
}
}
Points of Interest
It is interesting to know the HP Vertica big data and .NET programming concepts along with CRUD code level implementation.
History
Ganesan Senthilvel is a passionate IT leader with demonstrated 2 decades’ experience in architecture, design and implementing cutting edge solutions to address business opportunities of Enterprise applications. He earned Masters in Computer Science and Master in Business Administration. Now, he is pursing Doctorate program in Big Data. He is consistent technical contributor via COE, Blog, Whitepaper, Summit, Certification, etc. and also highly skilled at providing coaching and mentoring to internal teams and external institutes. Earned Microsoft Certifications like MCP, MCAD and financial domain certification.
He maintains his weekly blog at http://ganesansenthilvel.blogspot.in