How To Install Hadoop On Ubuntu 18.04 or 20.04
Introduction
Apache Hadoop is widely adopted across industries as a framework for
processing and storing big data. Hadoop is designed to be deployed across a network
of hundreds or even thousands of dedicated servers. All these machines work together
to deal with the massive volume and variety of incoming datasets.
Deploying Hadoop services on a single node is a great way to get yourself acquainted
with basic Hadoop commands and concepts.
This easy-to-follow guide helps you install Hadoop on Ubuntu 18.04 or Ubuntu 20.04.
1 of 20 10/05/22, 20:29
How to Install Hadoop on Ubuntu 18.04 or 20.04 https://phoenixnap.com/kb/install-hadoop-ubuntu
Prerequisites
At the moment, Apache Hadoop 3.x fully supports Java 8. The OpenJDK 8 package in
Ubuntu contains both the runtime environment and development kit.
The OpenJDK or Oracle Java version can affect how elements of a Hadoop ecosystem
interact. To install a specific Java version, check out our detailed guide on how to
install Java on Ubuntu.
Once the installation process is complete, verify the current Java version:
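The check can be run as shown here. This is a minimal sketch that assumes the JDK binaries are on the PATH; the guard only avoids an error on a machine where Java is not installed yet, and the java_status variable is a hypothetical name used for illustration.

```shell
# Print the runtime and compiler versions if a JDK is on the PATH.
if command -v java >/dev/null 2>&1 && command -v javac >/dev/null 2>&1; then
  java -version
  javac -version
  java_status="installed"
else
  echo "No JDK found on the PATH; install OpenJDK 8 first."
  java_status="missing"
fi
```

For OpenJDK 8, both commands should report a 1.8.x version.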
In the example below, the output confirms that the latest version is already installed.
If you have installed OpenSSH for the first time, use this opportunity to implement
these vital SSH security recommendations.
The username, in this example, is hdoop. You are free to use any username and
password you see fit. Switch to the newly created user and enter the corresponding
password:
su - hdoop
The user now needs to be able to SSH to the localhost without being prompted for a
password.
The system proceeds to generate and save the SSH key pair.
Use the cat command to store the public key as authorized_keys in the .ssh directory:
Set the permissions for your user with the chmod command:
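The three steps above (generate the key pair, authorize the public key, restrict the permissions) can be sketched as follows. The sketch assumes an RSA key with an empty passphrase stored at the default ~/.ssh/id_rsa location; adjust the key type and path to your own policy.

```shell
# Create ~/.ssh with owner-only access if it does not exist yet.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Generate an RSA key pair with an empty passphrase, unless one already exists.
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
  ssh-keygen -t rsa -P '' -f "$HOME/.ssh/id_rsa"
fi

# Append the public key to authorized_keys, then restrict the file to its owner.
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 0600 "$HOME/.ssh/authorized_keys"
```

Run these commands as the hdoop user so that the key lands in that user's home directory.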
The new user is now able to SSH without needing to enter a password every time.
Verify everything is set up correctly by using the hdoop user to SSH to localhost:
ssh localhost
After an initial prompt, the Hadoop user is now able to establish an SSH connection to
the localhost seamlessly.
The steps outlined in this tutorial use the Binary download for Hadoop Version 3.2.1.
Select your preferred option, and you are presented with a mirror link that allows you to
download the Hadoop tar package.
Use the provided mirror link and download the Hadoop package with the wget
command:
wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
Once the download is complete, extract the files to initiate the Hadoop installation:
The Hadoop binary files are now located within the hadoop-3.2.1 directory.
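The extraction step might look like this. It is a sketch that assumes the archive was downloaded into the current directory under its original name.

```shell
# Unpack the archive in the current directory if it has been downloaded.
if [ -f hadoop-3.2.1.tar.gz ]; then
  tar xzf hadoop-3.2.1.tar.gz
  ls -d hadoop-3.2.1
else
  echo "hadoop-3.2.1.tar.gz not found in $(pwd)"
fi
```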
A single-node setup, also called pseudo-distributed mode, allows each Hadoop daemon to run
as a single Java process. A Hadoop environment is configured by editing a set of
configuration files:
• .bashrc
• hadoop-env.sh
• core-site.xml
• hdfs-site.xml
• mapred-site.xml
• yarn-site.xml
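In the standard Hadoop 3.x binary layout, all of these files except .bashrc sit in the etc/hadoop directory of the installation. The sketch below lists where each file is expected. The fallback install path ($HOME/hadoop-3.2.1) and the CONF_DIR variable are assumptions for illustration; adjust them to your system.

```shell
# Assumed install location; adjust if Hadoop was extracted elsewhere.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-3.2.1}"
CONF_DIR="$HADOOP_HOME/etc/hadoop"

# .bashrc lives in the home directory; the rest sit in the Hadoop configuration directory.
for f in "$HOME/.bashrc" \
         "$CONF_DIR/hadoop-env.sh" \
         "$CONF_DIR/core-site.xml" \
         "$CONF_DIR/hdfs-site.xml" \
         "$CONF_DIR/mapred-site.xml" \
         "$CONF_DIR/yarn-site.xml"; do
  if [ -f "$f" ]; then
    echo "found:   $f"
  else
    echo "missing: $f"
  fi
done
```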
Define the Hadoop environment variables by adding the following content to the end of
the .bashrc file:
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Once you add the variables, save and exit the .bashrc file.
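A fuller set of variables for .bashrc might look like the sketch below. The variable names match the ones listed in the yarn.nodemanager.env-whitelist setting used later in this guide. The install path /home/hdoop/hadoop-3.2.1 is an assumption based on the hdoop user and the hadoop-3.2.1 directory from the earlier steps.

```shell
# Hadoop-related options (assumes the archive was extracted in the hdoop home directory).
export HADOOP_HOME=/home/hdoop/hadoop-3.2.1
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
# Native libraries shipped with the binary distribution.
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
# Make the hadoop, hdfs, and start/stop scripts available on the PATH.
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
```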
It is vital to apply the changes to the current running environment by using the following
command:
source ~/.bashrc
When setting up a single node Hadoop cluster, you need to define which Java
implementation is to be utilized. Use the previously created $HADOOP_HOME variable to
access the hadoop-env.sh file:
Uncomment the $JAVA_HOME variable (i.e., remove the # sign) and add the full path to
the OpenJDK installation on your system. If you have installed the same version as
presented in the first part of this tutorial, add the following line:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
The path needs to match the location of the Java installation on your system.
If you need help to locate the correct Java path, run the following command in your
terminal window:
which javac
The resulting output provides the path to the Java binary directory.
Use the provided path to find the OpenJDK directory with the following command:
readlink -f /usr/bin/javac
The section of the path just before the /bin/javac directory needs to be assigned to the
$JAVA_HOME variable.
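The two lookups can be combined into one step. In this sketch, java_home_from is a hypothetical helper name; it strips the trailing /bin/javac from the resolved path, which leaves the directory to assign to $JAVA_HOME.

```shell
# Strip the trailing /bin/javac from a resolved compiler path.
java_home_from() {
  dirname "$(dirname "$1")"
}

# Resolve the javac symlink and print the directory to use as JAVA_HOME.
if command -v javac >/dev/null 2>&1; then
  java_home_from "$(readlink -f "$(command -v javac)")"
fi
```

For the path /usr/lib/jvm/java-8-openjdk-amd64/bin/javac, the helper prints /usr/lib/jvm/java-8-openjdk-amd64.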
To set up Hadoop in a pseudo-distributed mode, you need to specify the URL for your
NameNode, and the temporary directory Hadoop uses for the map and reduce process.
Add the following configuration to the core-site.xml file to override the default values
for the temporary directory and add your HDFS URL to replace the default local file
system setting:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdoop/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
This example uses values specific to the local system. You should use values that
match your system's requirements. The data needs to be consistent throughout the
configuration process.
Do not forget to create a Linux directory in the location you specified for your
temporary data.
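Creating that directory could look like this. It is a sketch that assumes you are logged in as hdoop, so $HOME expands to /home/hdoop and the path matches the hadoop.tmp.dir value above.

```shell
# Create the directory referenced by hadoop.tmp.dir.
# When logged in as hdoop, $HOME expands to /home/hdoop.
mkdir -p "$HOME/tmpdata"
```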
Use the following command to open the hdfs-site.xml file for editing:
Add the following configuration to the file and, if needed, adjust the NameNode and
DataNode directories to your custom locations:
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hdoop/dfsdata/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hdoop/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
If necessary, create the specific directories you defined for the NameNode and DataNode.
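A minimal sketch of creating the storage directories, assuming the /home/hdoop/dfsdata paths from the example configuration and a session running as the hdoop user:

```shell
# Create the NameNode and DataNode storage directories.
# When logged in as hdoop, $HOME expands to /home/hdoop.
mkdir -p "$HOME/dfsdata/namenode" "$HOME/dfsdata/datanode"
```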
Add the following configuration to the mapred-site.xml file to change the default
MapReduce framework name value to yarn:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Add the following configuration to the yarn-site.xml file:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
The shutdown notification signifies the end of the NameNode format process.
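The format step itself can be sketched as follows, assuming the Hadoop bin directory is on the PATH. Format the NameNode only before the first start of HDFS: formatting an existing NameNode discards its file system metadata, and the command asks for confirmation if the storage directory is already formatted. The format_attempted variable is a hypothetical name used for illustration.

```shell
# Format the NameNode once, before HDFS is started for the first time.
if command -v hdfs >/dev/null 2>&1; then
  hdfs namenode -format
  format_attempted="yes"
else
  echo "hdfs not found; check that the Hadoop bin directory is on the PATH"
  format_attempted="no"
fi
```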
From the sbin directory of the Hadoop installation, start the NameNode and DataNode:
./start-dfs.sh
Once the namenode, datanodes, and secondary namenode are up and running, start
the YARN resource and nodemanagers by typing:
./start-yarn.sh
As with the previous command, the output informs you that the processes are starting.
Type this simple command to check if all the daemons are active and running as Java
processes:
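The jps tool that ships with the JDK lists running Java processes. The sketch below wraps it in a small check. The missing_daemons helper is a hypothetical name; it reads jps-style output and prints any of the five daemons expected on a single-node setup that are not running.

```shell
# Report which of the expected daemons are absent from jps-style output on stdin.
missing_daemons() {
  input="$(cat)"
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w matches whole words, so NameNode does not match SecondaryNameNode.
    echo "$input" | grep -qw "$d" || echo "$d"
  done
}

# List the Java processes, then print any daemon that is not among them.
if command -v jps >/dev/null 2>&1; then
  jps
  jps | missing_daemons
else
  echo "jps not found; it ships with the JDK"
fi
```

If every daemon is up, the second command prints nothing.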
Use your browser to open the NameNode web interface, which listens on port 9870 by default:
http://localhost:9870
The NameNode user interface provides a comprehensive overview of the entire cluster.
The default port 9864 is used to access individual DataNodes directly from your
browser:
http://localhost:9864
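If no browser is available on the machine, the web interfaces can also be probed from the terminal. This sketch assumes curl is installed and checks the ports used in this guide: 9870 (NameNode), 9864 (DataNode), and 8088 (the default YARN Resource Manager port).

```shell
# Print the HTTP status returned by each web UI (000 means no response).
for port in 9870 9864 8088; do
  code="$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' "http://localhost:$port")"
  echo "port $port: HTTP ${code:-unavailable}"
done
```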
The YARN Resource Manager web interface is available on port 8088 by default:
http://localhost:8088
Conclusion
Vladimir Kaplarevic