Apache Hadoop Tutorial – In this tutorial, we shall learn to install Apache Hadoop on Ubuntu. Java is a prerequisite for running Hadoop.

Install Apache Hadoop on Ubuntu

The following is a step-by-step guide to installing Apache Hadoop on Ubuntu.

Install Java

Hadoop is an open-source framework written in Java, so Java must be installed before Hadoop can run on your computer.

Open a terminal and run the following command:

$ sudo apt-get install default-jdk

To verify the Java installation, run the following command in the terminal:

$ java -version

The output should be similar to the following (the exact version numbers depend on your system).

hadoopuser@tutorialkart:~# java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
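
If you would rather have a script check the version than read it by eye, keep in mind that java -version writes to standard error, so the stream has to be redirected before it can be piped. The following is only a minimal sketch; the function name java_version_of is made up for this example.

```shell
# Extract the quoted version number from the first line of `java -version` output.
# java_version_of is a hypothetical helper name, not part of Java or Hadoop.
java_version_of() {
    sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# java -version prints to stderr, hence the 2>&1 redirection.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | java_version_of
else
    echo "java not found on PATH" >&2
fi
```

For the sample output shown above, the helper would print only the version string from the first line.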

Install Hadoop

Download the latest Hadoop binary package from [http://hadoop.apache.org/releases.html].

Look for the latest stable release (not an alpha release) and click the binary link provided for that release.

Click on the first mirror link to start the download.
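
The download can also be done from the terminal. This sketch assumes the release is still published on the Apache archive server under its usual directory layout; the mirror suggested by the website may differ, and the helper name hadoop_archive_url is invented for this example.

```shell
# Build the archive URL for a given Hadoop version.
# Assumption: the release lives under archive.apache.org/dist/hadoop/common/.
hadoop_archive_url() {
    version="$1"
    echo "https://archive.apache.org/dist/hadoop/common/hadoop-${version}/hadoop-${version}.tar.gz"
}

# Print the URL. To download, pass it to wget, for example:
#   wget "$(hadoop_archive_url 2.8.1)"
hadoop_archive_url 2.8.1
```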

Copy the downloaded tar file to /usr/lib/ and extract it there.

$ sudo cp hadoop-2.8.1.tar.gz /usr/lib/
$ cd /usr/lib/
$ sudo tar zxf hadoop-2.8.1.tar.gz
$ sudo rm hadoop-2.8.1.tar.gz

Provide the password if asked.
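
As an alternative to copying the archive first, tar can extract directly into a target directory with its -C option. A minimal sketch follows, where extract_into is a hypothetical helper name; for /usr/lib/ it would have to run with root privileges.

```shell
# Extract a gzip-compressed tar archive into a target directory.
# extract_into is a made-up helper for illustration.
extract_into() {
    archive="$1"
    target="$2"
    tar -xzf "$archive" -C "$target"
}

# Example (as root, since /usr/lib/ is not writable by normal users):
#   extract_into hadoop-2.8.1.tar.gz /usr/lib/
#   ls /usr/lib/hadoop-2.8.1/bin/hadoop
```

Listing bin/hadoop afterwards is a quick way to confirm that the files ended up where the later steps expect them.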

Set Java and Hadoop Path

Make sure the paths for Java and Hadoop are set in your .bashrc file. Open a terminal and run the following command to edit the file. The file belongs to your own user, so sudo is not needed.

$ nano ~/.bashrc

Paste the following entries at the end of the .bashrc file.

export JAVA_HOME=/usr/lib/jvm/default-java/jre
export HADOOP_INSTALL=/usr/lib/hadoop-2.8.1
export PATH=$PATH:$HADOOP_INSTALL/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
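
The JAVA_HOME value above assumes that the default-java symlink exists on your system. If it does not, one way to find the real location is to resolve the java command and strip the trailing /bin/java. This is a sketch under that assumption, with java_home_from as a hypothetical helper name.

```shell
# Turn the full path of the java binary into a JAVA_HOME candidate
# by removing the trailing /bin/java.
java_home_from() {
    java_path="$1"
    echo "${java_path%/bin/java}"
}

# Resolve symlinks first (readlink -f comes with GNU coreutils on Ubuntu).
if command -v java >/dev/null 2>&1; then
    java_home_from "$(readlink -f "$(command -v java)")"
fi
```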

Run Hadoop

After setting up the paths for Java and Hadoop, open a new terminal (or reload the file with source ~/.bashrc) so that the changes take effect. You may then run the hadoop command from anywhere in the terminal.

$ hadoop

The output should look like this:

Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
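
If the shell reports that hadoop cannot be found instead of printing this usage text, the PATH setting is the usual suspect. A small check can be sketched as follows; on_path is a hypothetical helper name, while hadoop version is the subcommand listed in the usage text above.

```shell
# Return success if the given command can be found on PATH.
on_path() {
    command -v "$1" >/dev/null 2>&1
}

if on_path hadoop; then
    hadoop version
else
    echo "hadoop not found; check that the Hadoop bin directory is in PATH" >&2
fi
```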


In this Apache Hadoop Tutorial, we have successfully installed Hadoop on Ubuntu. In subsequent tutorials, we shall look into HDFS and MapReduce, starting with the Word Count example in Hadoop.