Posted 25 Jan 2017

Hadoop Beginners Guide - How To Install

A step-by-step procedure for installing Hadoop 2.7.3 on the Ubuntu 16.04 operating system. The same steps may also work for other versions of Hadoop and Ubuntu.

Editorial Note

This article appears in the Third Party Products and Tools section. Articles in this section are for the members only and must not be used to promote or advertise products in any way, shape or form. Please report any spam or advertising.

Introduction

In my previous article, I tried to give an overview of Big Data and Hadoop. In this article, I will show you how to install Hadoop (as a single-node cluster) on the Ubuntu operating system. Windows users can also follow this article to install Ubuntu in a virtual machine and get a flavor of Hadoop :)

Prerequisites for Hadoop

  • JDK: The Java Development Kit (JDK) is a software development environment used for developing Java applications and applets. It includes the Java Runtime Environment (JRE), an interpreter/loader (java), a compiler (javac), an archiver (jar), a documentation generator (javadoc) and other tools needed in Java development. Since the Hadoop framework is written in Java, it requires a JDK
  • SSH: SSH ("Secure SHell") is a protocol for securely accessing one computer from another. Despite the name, SSH allows you to run command line and graphical programs, transfer files, and even create secure virtual private networks over the Internet. Hadoop's start scripts use SSH to start the daemons on the cluster's nodes, even when the only node is localhost
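On the Ubuntu machine, a quick way to see which of these prerequisites are already installed is to look the relevant commands up on the PATH. This is a minimal sketch assuming a bash shell; `check_prereqs` is a hypothetical helper name, not part of Hadoop or Ubuntu.

```shell
# Report which of the given commands cannot be found on the PATH.
# Prints "missing: <name>" for each one and returns 1 if any are missing.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Example: the JDK tools plus the SSH client and server.
# check_prereqs java javac ssh sshd
```

On a fresh Ubuntu desktop install, java and javac will typically be reported as missing until the JDK step below is done.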

Install VMware Player and Ubuntu Operating System

This step is for Windows users only. If you already have an Ubuntu system installed, skip this step and start from the step "Install Java 8 JDK".

  • Download VMware Player here

          http://www.vmware.com/products/player/playerpro-evaluation.html

  • Install VMware Player
  • Download Ubuntu here

          https://www.ubuntu.com/download/desktop

  • Open VMware Player

  • Click on “Create a New Virtual Machine”, which starts the new virtual machine wizard

  • Choose the option “I will install the operating system later” and click on the “Next” button

  • Choose the option “Linux”, select “Ubuntu 64-bit” from the version drop-down list and click on the “Next” button

  • Enter the name of the virtual machine, set its location and click on the “Next” button

  • Set the maximum disk size to 40 GB if you have enough disk space, choose the option “Store virtual disk as a single file” and click on the “Next” button

  • If your machine has more than 4 GB of RAM, click on “Customize Hardware”

  • Set the memory to 2 GB and click on the “Close” button. Then click on the “Finish” button

  • Click on “Edit virtual machine settings”

  • Click on “CD/DVD (SATA)” hardware, choose the option “Use ISO image file” and browse to the Ubuntu ISO file. Click “OK” to close this window

  • Click on “Play Virtual Machine”. This starts the Ubuntu installer. Follow its steps to finish the installation

Install Java 8 JDK

  • Log in to the Ubuntu machine
  • Open a terminal by pressing Ctrl+Alt+T
  • Switch to the root (superuser) account using the following command. Enter the password you set when installing Ubuntu
  • sudo su
  • Type "cd" (change directory) and press enter to move to root's home directory (/root)
  • cd
  • Type the following command and press enter
  • apt-get install openjdk-8-jdk
  • This will ask for a confirmation. Type Y and press enter
  • This will take some time to complete. Execute the “clear” command to clear the screen
  • clear
  • Execute the following commands to check that the JDK was installed successfully. Each one prints its version
  • java -version
    javac -version
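OpenJDK 8 reports its version in the legacy `1.8.0_<update>` form, and `java -version` writes it to standard error rather than standard output. The sketch below extracts the major version from that first line, assuming the usual `openjdk version "..."` format; `java_major_version` is a hypothetical helper.

```shell
# Extract the Java major version from the first line of `java -version`.
# Legacy versions look like "1.8.0_121" (major 8); newer ones like "11.0.2".
java_major_version() {
  local line=$1 ver
  ver=${line#*\"}          # drop everything up to the first double quote
  ver=${ver%%\"*}          # drop everything from the next double quote on
  case $ver in
    1.*) ver=${ver#1.} ;;  # strip the legacy "1." prefix
  esac
  echo "${ver%%[._-]*}"    # keep only the leading number
}

# Usage (note the 2>&1, because java -version prints to stderr):
# java_major_version "$(java -version 2>&1 | head -n 1)"
```

For this article, the usage line should print 8.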

Setting JAVA_HOME Variable

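  • Run this command to get the JDK path
  • update-alternatives --config java
  • The output shows the path of the java binary, which for OpenJDK 8 is under /usr/lib/jvm/java-8-openjdk-amd64 (the binary itself is /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java). So the JDK is installed in the “/usr/lib/jvm/java-8-openjdk-amd64” path
  • Edit the system-wide environment file by typing the following command
  • gedit /etc/environment
  • This will open an editor. Add the following line to the end of the file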
  • JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
  • Click on “Save” and close the window

  • Run this command to load the variable into the current shell. This also reveals any syntax errors in the edited file

  • source /etc/environment
  • Run this command to check if JAVA_HOME variable has been added properly
  • echo $JAVA_HOME
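The JDK path can also be derived without reading the `update-alternatives` output: resolve the `java` symlink chain with `readlink -f`, then strip the trailing `bin/java` (and `jre` for Java 8 layouts). A sketch; `java_home_from_bin` is a hypothetical helper.

```shell
# Derive a JAVA_HOME directory from the full path of the java binary.
# OpenJDK 8 on Ubuntu keeps it under <JDK>/jre/bin/java; newer JDKs use
# <JDK>/bin/java. Both layouts are handled.
java_home_from_bin() {
  local p=$1
  p=${p%/bin/java}   # drop the trailing /bin/java
  p=${p%/jre}        # drop /jre, present in Java 8 layouts
  echo "$p"
}

# Usage: resolve /usr/bin/java -> /etc/alternatives/java -> real binary first.
# java_home_from_bin "$(readlink -f "$(command -v java)")"
```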

Installing SSH

  • Run the following command
  • apt-get install ssh
  • This will ask for a confirmation. Type Y and press enter
  • Once done, generate an RSA public/private key pair with an empty passphrase by executing the following command
  • ssh-keygen -t rsa -P ""
  • This will ask “Enter file in which to save the key (/root/.ssh/id_rsa):”. Press enter to accept the default location
  • Authorize the generated public key by running the following command
  • cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
  • Check if ssh is installed and running properly by executing the following command
  • ssh localhost
  • This will ask “Are you sure you want to continue connecting (yes/no)?”. Type yes and press enter
  • If it shows an error, execute the same command again
  • ssh localhost
  • If ssh is installed and running properly, this logs you in to localhost without asking for a password. Type exit to return to your original session
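Running the `cat ... >> authorized_keys` step twice appends the same key twice. Below is a sketch of an idempotent variant that also sets the permissions sshd expects (sshd can ignore an authorized_keys file that other users can write to). `authorize_key` is a hypothetical helper, and it assumes the default key file name id_rsa.pub.

```shell
# Append <ssh_dir>/id_rsa.pub to <ssh_dir>/authorized_keys unless that
# exact key line is already present, then tighten the permissions.
authorize_key() {
  local ssh_dir=$1
  local pub="$ssh_dir/id_rsa.pub" auth="$ssh_dir/authorized_keys"
  [ -f "$pub" ] || { echo "no public key at $pub" >&2; return 1; }
  chmod 700 "$ssh_dir"
  touch "$auth"
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  if ! grep -qxF -- "$(cat "$pub")" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"
}

# Usage, after ssh-keygen -t rsa -P "":
# authorize_key "$HOME/.ssh"
```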

Download Hadoop

  • Download Hadoop version 2.7.3 from the following link

          http://hadoop.apache.org/releases.html

  • Click on the binary link for version 2.7.3
  • Click on the suggested mirror link to download the file. This will open a window. Select the “Save File” option and click on the “Save” button
  • This will start downloading the file
  • The file will be saved in the default download location set in the browser
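The tarball can also be fetched from the terminal. This sketch assumes the Apache release archive layout (https://archive.apache.org/dist/hadoop/common/hadoop-<version>/), which keeps old releases after the regular mirrors drop them; `hadoop_url` is a hypothetical helper.

```shell
# Build the Apache archive URL of a Hadoop binary tarball.
hadoop_url() {
  local version=$1
  echo "https://archive.apache.org/dist/hadoop/common/hadoop-${version}/hadoop-${version}.tar.gz"
}

# Usage: save into ~/Downloads, the same place the browser would use.
# wget -P "$HOME/Downloads" "$(hadoop_url 2.7.3)"
```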

Installing Hadoop

  • Close the terminal and open it again. This time there is no need to log in as root
  • Find the path where the Hadoop installation file was downloaded and run the following command to unpack it
  • tar -xvzf '<downloaded package path>'
  • In my case, it is
  • tar -xvzf '/home/fazlur/Downloads/hadoop-2.7.3.tar.gz'
  • This creates a directory "hadoop-2.7.3" in the current directory, which is your home directory in a newly opened terminal
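`tar` extracts into the current directory, so the result depends on where the terminal happens to be. The sketch below makes the destination explicit with `tar -C`; `unpack_hadoop` is a hypothetical helper, and defaulting the destination to `$HOME` is an assumption matching this article.

```shell
# Unpack a Hadoop tarball into a destination directory (default: $HOME),
# independent of the current working directory.
unpack_hadoop() {
  local pkg=$1 dest=${2:-$HOME}
  mkdir -p "$dest"
  tar -xzf "$pkg" -C "$dest"
}

# Usage:
# unpack_hadoop "$HOME/Downloads/hadoop-2.7.3.tar.gz"
```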

 

Configuring Hadoop

  • In the terminal, log in as root again using the following command. Enter the password you set when installing Ubuntu
  • sudo su
  • Run this command to edit the “.bashrc” file
  • gedit ~/.bashrc
  • This will open an editor. Add the following lines to the end of the file. Replace <JAVA PATH> and <HADOOP HOME PATH> with the appropriate paths
  • #HADOOP VARIABLES START
    export JAVA_HOME=<JAVA PATH>
    export PATH=${JAVA_HOME}/bin:${PATH}
    export HADOOP_INSTALL=<HADOOP HOME PATH>
    export PATH=$PATH:$HADOOP_INSTALL/bin
    export PATH=$PATH:$HADOOP_INSTALL/sbin
    export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
    export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
    export HADOOP_COMMON_HOME=$HADOOP_INSTALL
    export HADOOP_HDFS_HOME=$HADOOP_INSTALL
    export YARN_HOME=$HADOOP_INSTALL
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
    #HADOOP VARIABLES END
  • In my case, <JAVA PATH> is /usr/lib/jvm/java-8-openjdk-amd64 and <HADOOP HOME PATH> is /home/fazlur/hadoop-2.7.3
  • Save and close the editor
  • Run the following command to load the new variables into the current shell and check the .bashrc file for errors
  • source ~/.bashrc
  • Go to the “hadoop-2.7.3/etc/hadoop” directory by running the following command
  • cd <HADOOP HOME PATH>/etc/hadoop
  • In my case, it is
    
    cd /home/fazlur/hadoop-2.7.3/etc/hadoop
  • Edit the “hadoop-env.sh” file using the following command
  • gedit hadoop-env.sh
  • This will open an editor. Append this line to the end of the file, then save and close the editor
  • export JAVA_HOME=<Your Java Path>
  • In my case, it looks like this
    
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
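Editing `.bashrc` and `hadoop-env.sh` by hand makes it easy to add the same export twice when a step is repeated. Below is a sketch of an idempotent append, assuming bash; `append_line_once` is a hypothetical helper that adds a line only when that exact line is not already in the file.

```shell
# Append a line to a file unless that exact line is already present.
append_line_once() {
  local file=$1 line=$2
  touch "$file"
  # If the file is non-empty and lacks a trailing newline, add one first
  # so the new line is not glued onto the last existing line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage with the paths from this article (single quotes keep $PATH literal):
# append_line_once ~/.bashrc 'export PATH=$PATH:$HADOOP_INSTALL/bin'
# append_line_once /home/fazlur/hadoop-2.7.3/etc/hadoop/hadoop-env.sh \
#   'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64'
```

Since these steps run after `sudo su`, `~/.bashrc` here is root's .bashrc.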

         

  • Run the following command to check the hadoop-env.sh file for errors
  • source hadoop-env.sh
  • Make a directory called “hadoop_store” in the same directory where hadoop-2.7.3 exists (your home directory), and go into it. Run the following commands to do that
  • cd <HOME PATH>
    mkdir hadoop_store
    cd hadoop_store
  • In my case, it is
    
    cd /home/fazlur
  • Make a directory called “hdfs” and go into it. Run these commands to do that
  • mkdir hdfs
    cd hdfs
  • Make two directories called “namenode” and “datanode” inside the “hdfs” directory by running these commands. The result is hadoop_store/hdfs/namenode and hadoop_store/hdfs/datanode
  • mkdir namenode
    mkdir datanode
  • Go to the “hadoop-2.7.3/etc/hadoop” directory by running the following command
  • cd <HADOOP HOME PATH>/etc/hadoop
    
    In my case, it is
    
    cd /home/fazlur/hadoop-2.7.3/etc/hadoop
  • Edit “hdfs-site.xml” by running the following command. This will open an editor
  • gedit hdfs-site.xml
  • Append the following lines between <configuration></configuration> tags. Replace <NAMENODE_FOLDER_PATH> and <DATANODE_FOLDER_PATH> with appropriate paths
  • <property>
     <name>dfs.replication</name>
     <value>1</value>
     <description>Default block replication.
     The actual number of replications can be specified when the file is created.
     The default is used if replication is not specified in create time.
     </description>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
     <value>file:<NAMENODE_FOLDER_PATH></value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:<DATANODE_FOLDER_PATH></value>
    </property>
    
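  • In my case, <NAMENODE_FOLDER_PATH> is /home/fazlur/hadoop_store/hdfs/namenode and <DATANODE_FOLDER_PATH> is /home/fazlur/hadoop_store/hdfs/datanode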
  • Save and close the editor
  • Go to the “hadoop-2.7.3” folder and create a directory called “tmp” with the following commands
  • cd <hadoop-2.7.3 path>
    mkdir tmp
    
    In my case
    
    cd /home/fazlur/hadoop-2.7.3
    mkdir tmp
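  • Go back to the “hadoop-2.7.3/etc/hadoop” directory and edit the “core-site.xml” file using the following commands
  • cd etc/hadoop
    gedit core-site.xml
  • This will open an editor. Append the following lines between the <configuration></configuration> tags. Replace the hadoop.tmp.dir value (my path, /home/fazlur/hadoop-2.7.3/tmp) with the path of your tmp folder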
  • <property>
     <name>hadoop.tmp.dir</name>
     <value>/home/fazlur/hadoop-2.7.3/tmp</value>
     <description>A base for other temporary directories.</description>
    </property>
    
    <property>
     <name>fs.default.name</name>
     <value>hdfs://localhost:54310</value>
     <description>The name of the default file system.  A URI whose
     scheme and authority determine the FileSystem implementation.  The
     uri's scheme determines the config property (fs.SCHEME.impl) naming
     the FileSystem implementation class.  The uri's authority is used to
     determine the host, port, etc. for a filesystem.</description>
    </property>
  • In my case, the file contains exactly the lines above
  • Save and close the editor
  • Run the following command to create the “mapred-site.xml” file from the “mapred-site.xml.template” template
  • cp mapred-site.xml.template mapred-site.xml
  • Edit “mapred-site.xml” using the following command
  • gedit mapred-site.xml
  • This will open an editor. Append the following lines between the <configuration></configuration> tags
  • <property>
     <name>mapred.job.tracker</name>
     <value>localhost:54311</value>
     <description>The host and port that the MapReduce job tracker runs
     at.  If "local", then jobs are run in-process as a single map
     and reduce task.
     </description>
    </property>
  • In my case, the file contains exactly the lines above
  • Save and close the editor
  • Go to your home directory by executing the command “cd”
  • Format the Hadoop file system (HDFS) by running the following command
  • hadoop namenode -format
  • Restart your machine
  • Open the terminal and log in as root (sudo su)
  • Run this command to start Hadoop
  • start-all.sh
  • Run this command to check if all the services have been started
  • jps
  • If the jps output does not list NameNode, the NameNode service is not running. Follow these steps to get it working
    • Restart your machine
    • Open the terminal and log in as root (sudo su)
    • Type “cd” to move to your home directory
    • Execute the command “hadoop namenode -format” to format the Hadoop file system
    • Execute the command “start-all.sh” to start all services
    • Execute the command “jps” to check if all the services have been started
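The directory and XML steps above can also be scripted, which makes them easy to redo when a daemon such as the NameNode fails to start because of a wrong path. This sketch is not the article's exact configuration: it uses `fs.defaultFS`, the Hadoop 2.x name for the deprecated `fs.default.name`, and replaces the MapReduce v1 property `mapred.job.tracker` with `mapreduce.framework.name=yarn` plus a `yarn-site.xml` enabling the `mapreduce_shuffle` auxiliary service, as in the Apache single-node setup guide for Hadoop 2.x. Without `mapreduce.framework.name`, MapReduce jobs run in local mode rather than on YARN. `write_hadoop_conf` is a hypothetical helper, and it overwrites the existing files.

```shell
# Sketch: write single-node Hadoop 2.x config files into conf_dir.
#   $1 conf_dir   e.g. /home/fazlur/hadoop-2.7.3/etc/hadoop
#   $2 store_dir  e.g. /home/fazlur/hadoop_store
#   $3 tmp_dir    e.g. /home/fazlur/hadoop-2.7.3/tmp
# Existing files are overwritten, so back them up first.
write_hadoop_conf() {
  local conf=$1 store=$2 tmp=$3
  mkdir -p "$store/hdfs/namenode" "$store/hdfs/datanode" "$tmp"

  cat > "$conf/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
  <property><name>dfs.namenode.name.dir</name><value>file:$store/hdfs/namenode</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:$store/hdfs/datanode</value></property>
</configuration>
EOF

  cat > "$conf/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>hadoop.tmp.dir</name><value>$tmp</value></property>
  <!-- fs.defaultFS replaces the deprecated fs.default.name -->
  <property><name>fs.defaultFS</name><value>hdfs://localhost:54310</value></property>
</configuration>
EOF

  cat > "$conf/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <!-- submit MapReduce jobs to YARN instead of the local runner -->
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
EOF

  cat > "$conf/yarn-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <!-- shuffle service the NodeManager needs for MapReduce jobs -->
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>
EOF
}
```

With the paths in this article, that would be `write_hadoop_conf /home/fazlur/hadoop-2.7.3/etc/hadoop /home/fazlur/hadoop_store /home/fazlur/hadoop-2.7.3/tmp`, followed by `stop-all.sh`, `hadoop namenode -format` and `start-all.sh`. Formatting erases anything already stored in HDFS, so only do this on a fresh install.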

                 

  • Now open your favourite browser and type the following URL
  • http://localhost:8088
  • If everything is up and running, this opens the YARN ResourceManager web UI
  • Type the following URL to check the DataNodes and browse the Hadoop file system
  • http://localhost:50070
  • This opens the NameNode web UI
  • Navigate to “Utilities > Browse the file system” to browse the Hadoop file system
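Instead of reading the `jps` output by eye, a small filter can list which daemons are missing. This sketch assumes the five daemons a single-node Hadoop 2.x setup normally runs after `start-all.sh` (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager); `missing_daemons` is a hypothetical helper.

```shell
# Read `jps` output on stdin and print each expected daemon that is not
# listed. Prints nothing when all five are running.
missing_daemons() {
  local out d
  out=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps prints "<pid> <MainClass>"; anchor the name so that
    # "NameNode" does not match the "SecondaryNameNode" line.
    printf '%s\n' "$out" | grep -qE "^[0-9]+ ${d}\$" || echo "$d"
  done
}

# Usage (as the same user that ran start-all.sh):
# jps | missing_daemons
```

The NameNode web UI can be checked from the terminal as well: `curl -sf http://localhost:50070/ >/dev/null && echo up` prints up when the UI is answering.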

Conclusion

I hope you enjoyed reading this article and now have a working Hadoop installation on your Ubuntu system. In my upcoming articles, I will explain the different components of Hadoop in detail.

Thanks for reading my article, and keep in touch.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Fazlur Rahman
Technical Lead Ominto Inc
United Arab Emirates
I have a Bachelor's degree in CSE from Khulna University of Engineering & Technology, Bangladesh. I have more than 11 years of experience in software design & development, data analysis & modeling and project management, and I currently work at a software company in Dubai, UAE as a Lead Software Engineer. I have been MCAD (Microsoft Certified Application Developer) certified since 2005. Please feel free to contact me at nill_akash_7@yahoo.com.



Comments and Discussions

 
Question: I haven't seen you make any changes to yarn-site.xml
Member 130445827-Mar-17 10:15


Last Updated 26 Jan 2017
Article Copyright 2017 by Fazlur Rahman