How to Communicate to Hadoop via Hive using .NET/C#

4 Mar 2014
Connect to a database in Hive

Introduction

Before describing my problem, I define a few terms that are relevant to it. All the definitions are excerpts from Wikipedia.

What is BigData?

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization. Big data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead massively parallel software running on tens, hundreds, or even thousands of servers.

What is Hadoop?

Hadoop is an open-source framework from the Apache Software Foundation. It emerged as a solution for both storing and processing BigData. Hadoop consists of the Hadoop Common package, which provides filesystem and OS-level abstractions, a MapReduce engine, and the Hadoop Distributed File System (HDFS).

What is MapReduce?

MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. A MapReduce program is composed of:

  1. A Map() procedure that performs filtering and sorting.
  2. A Reduce() procedure that performs a summary operation.
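
To make the two phases concrete, here is a minimal in-memory sketch in C# using LINQ. It is not Hadoop code and involves no cluster. The word-count task and the names WordCountSketch, MapPhase, and ReducePhase are assumptions chosen for illustration, and the grouping step that Hadoop performs between the two phases is folded into ReducePhase here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountSketch
{
    // Map: split each input line into words and emit one (word, 1) pair per word.
    public static IEnumerable<KeyValuePair<string, int>> MapPhase(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .Select(word => new KeyValuePair<string, int>(word.ToLowerInvariant(), 1));
    }

    // Reduce: group the pairs by key and sum the values for each key.
    public static Dictionary<string, int> ReducePhase(IEnumerable<KeyValuePair<string, int>> pairs)
    {
        return pairs
            .GroupBy(pair => pair.Key)
            .ToDictionary(group => group.Key, group => group.Sum(pair => pair.Value));
    }
}
```

Calling WordCountSketch.ReducePhase(WordCountSketch.MapPhase(new[] { "big data", "big cluster" })) yields a dictionary in which "big" maps to 2 and the other two words map to 1.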

What is Hive?

Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.

What is HiveQL?

HiveQL is based on SQL but does not strictly follow the full SQL-92 standard. Internally, a compiler translates HiveQL statements into a directed acyclic graph of MapReduce jobs, which are submitted to Hadoop for execution.

What is my problem?

I was looking for a code snippet that connects to Hadoop via Hive using C#. The following discussion will help you connect to Hive and work with the tables and data underneath. It also gives you a starting point for exploring Hadoop/Hive from C#/.NET.

Background

I searched extensively but found only a few vague references on Stack Overflow and other sites. I also had the constraint that I could not use Azure HDInsight.

Using the Code

To begin, download the Microsoft® Hive ODBC Driver. The parameters and the values they accept are explained in detail in the section "Appendix C: Driver Configuration Options" of the referenced article.

The following are the important parameters to set in the ConnectionString. The remaining parameters can be set as your application requires.

  • DRIVER={Microsoft Hive ODBC Driver}
  • Host=server_name
  • Port=10000
  • Schema=default
  • DefaultTable=table_name

DRIVER={Microsoft Hive ODBC Driver} is the name of the actual driver.

Host=server_name is the name of the server where Hadoop is running.

Port=10000 is the default port, but you can assign your own.

Schema=default is the default database. You can create your own.

DefaultTable=table_name is the name of a table in the Hive system.
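
As an alternative to writing the string by hand, the parameters above can be assembled with OdbcConnectionStringBuilder from System.Data.Odbc. This is a minimal sketch, not part of the original snippet: server_name and table_name are the same placeholders used above, and the names HiveConnectionSketch and BuildHiveConnectionString are hypothetical. On .NET Core and later, System.Data.Odbc comes from the NuGet package of the same name.

```csharp
using System;
using System.Data.Odbc;

public static class HiveConnectionSketch
{
    // Hypothetical helper: collects the five key parameters into one ODBC connection string.
    public static string BuildHiveConnectionString(string host, int port, string schema, string defaultTable)
    {
        var builder = new OdbcConnectionStringBuilder
        {
            // Pass the bare driver name; the builder is expected to wrap it
            // in braces when it renders the connection string.
            Driver = "Microsoft Hive ODBC Driver"
        };

        // Keys other than Driver and Dsn are set through the indexer.
        builder["Host"] = host;
        builder["Port"] = port.ToString();
        builder["Schema"] = schema;
        builder["DefaultTable"] = defaultTable;

        return builder.ConnectionString;
    }
}
```

The result of BuildHiveConnectionString("server_name", 10000, "default", "table_name") can be passed directly to the OdbcConnection constructor. Any further driver options can be added through the same indexer.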

The function GetDataFromHive() connects to Hadoop/Hive using the Microsoft® Hive ODBC Driver.

SELECT * FROM table_name LIMIT 10 tells the database to return at most 10 records. It is the HiveQL counterpart of TOP(10) in SQL Server.

C#
using System;
using System.Data;
using System.Data.Odbc;

public class HiveExample
{
    private void GetDataFromHive()
    {
        // One key per line, concatenated, so that no line breaks or
        // indentation end up inside the connection string.
        var connectionString =
            "DRIVER={Microsoft Hive ODBC Driver};" +
            "Host=server_name;" +
            "Port=10000;" +
            "Schema=default;" +
            "DefaultTable=table_name;" +
            "HiveServerType=1;" +
            "ApplySSPWithQueries=1;" +
            "AsyncExecPollInterval=100;" +
            "AuthMech=0;" +
            "CAIssuedCertNamesMismatch=0;" +
            @"TrustedCerts=C:\Program Files\Microsoft Hive ODBC Driver\lib\cacerts.pem;";

        // The using block closes the connection even if an exception is thrown.
        using (var conn = new OdbcConnection(connectionString))
        {
            try
            {
                conn.Open();

                var adp = new OdbcDataAdapter("SELECT * FROM table_name LIMIT 10", conn);
                var ds = new DataSet();
                adp.Fill(ds);

                // DataSet.Tables and DataTable.Rows are non-generic collections,
                // so the loop variables are typed explicitly.
                foreach (DataTable dataTable in ds.Tables)
                {
                    //log.Info("Records found " + dataTable.Rows.Count);

                    foreach (DataRow dataRow in dataTable.Rows)
                    {
                        //log.Info(dataRow[0].ToString() + " " + dataRow[1].ToString());
                    }
                }
            }
            catch (Exception ex)
            {
                // Report the failure instead of silently swallowing it.
                Console.Error.WriteLine("Failed to read from data source: " + ex.Message);
            }
        }
    }
}
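
For larger result sets, a forward-only reader avoids loading every row into a DataSet. The following is a sketch of that variation rather than the article's original method. It assumes the same connection string and the same table_name placeholder, and the names HiveReaderSketch, FormatRows, and PrintFirstRows are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Odbc;

public static class HiveReaderSketch
{
    // Turns each row of any IDataReader into one space-separated line.
    // It depends only on the IDataReader interface, so it can be tried
    // against an in-memory DataTable without a Hive server.
    public static IEnumerable<string> FormatRows(IDataReader reader)
    {
        while (reader.Read())
        {
            var values = new object[reader.FieldCount];
            reader.GetValues(values);
            yield return string.Join(" ", values);
        }
    }

    // Hypothetical caller: opens the connection, runs the query, prints each row.
    public static void PrintFirstRows(string connectionString)
    {
        using (var conn = new OdbcConnection(connectionString))
        using (var cmd = new OdbcCommand("SELECT * FROM table_name LIMIT 10", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                foreach (var line in FormatRows(reader))
                    Console.WriteLine(line);
            }
        }
    }
}
```

Keeping the formatting separate from the connection handling means FormatRows can be checked locally, for example by passing it the reader returned by DataTable.CreateDataReader() on a small hand-built table.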

Points of Interest

BigData is gaining ground as traditional relational databases such as SQL Server, Oracle, Sybase, and others find it increasingly difficult to handle large volumes of data in varied formats (structured, document-style, unstructured, and so on). Hadoop is fast emerging as one of the solutions that big banks and other data-mining industries are embracing. This piece of code will help you talk to Hadoop and speed up your effort to solve the problem at hand.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect, Siliconguys Inc
United States
Has worked on various projects involving WPF, WCF, Silverlight, MongoDB, and Hadoop, and on web development projects using ASP.NET, AJAX, C#, JavaScript, Web Services, SQL Server, and Oracle.

Comments and Discussions

 
Question: Error IM002 Data source name not found and no default driver specified
anhdung88, 25-Oct-14 6:18
Hi Rajib,
I'm following your tutorial. I installed the ODBC driver, tested the connection successfully in the ODBC admin console, and was able to query data into Excel. But when I debug your code, opening the connection raises the exception in the subject.
This is my connection string:
C#
@"DRIVER={Microsoft Hive ODBC Driver};
                                        Host=192.168.0.104;
                                        Port=10000;
                                        Schema=default;
                                        DefaultTable=nyse_stocks;
                                        HiveServerType=2;
                                        ApplySSPWithQueries=1;
                                        AsyncExecPollInterval=100;
                                        AuthMech=0;
                                        CAIssuedCertNamesMismatch=0;
                                        TrustedCerts=C:\Program Files\Microsoft Hive ODBC Driver\lib\cacerts.pem;"

Please help me correct it if anything is wrong. Thanks!