
Installing Hadoop on Ubuntu

AMRITPAL SINGH
Introduction
• Hadoop is a Java-based programming framework that supports the
processing and storage of extremely large datasets on a cluster of
inexpensive machines.

• It was the first major open source project in the big data playing field
and is sponsored by the Apache Software Foundation.
Introduction
• Hadoop 2.7 comprises four main layers:

• Hadoop Common is the collection of utilities and libraries that support the other Hadoop modules.

• HDFS, which stands for Hadoop Distributed File System, is responsible for persisting data to disk.
Introduction
• YARN, short for Yet Another Resource Negotiator, acts as the "operating
system" of the cluster: it schedules jobs and manages compute resources.

• MapReduce is the original processing model for Hadoop clusters. It
distributes work across the cluster (the map step), then organizes and
reduces the results from the nodes into a response to a query (the
reduce step).

• Many other processing models are available for the 2.x version of
Hadoop.
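As a loose local analogy (not Hadoop itself, and not part of this installation), the map-then-shuffle-then-reduce pattern can be sketched with ordinary shell tools:

```shell
# Word count in the MapReduce style, using plain shell tools:
#   tr    splits the input into one word per line  (the "map")
#   sort  groups identical words together          (the "shuffle")
#   uniq  collapses each group into a count        (the "reduce")
printf 'to be or not to be' | tr ' ' '\n' | sort | uniq -c
```

Hadoop performs the same kind of grouping and aggregation, but partitions the work across the nodes of a cluster.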
Introduction
• Hadoop clusters are relatively complex to set up, so the project
includes a stand-alone mode which is suitable for learning about
Hadoop, performing simple operations, and debugging.

• We'll install Hadoop in stand-alone mode and run one of the example
MapReduce programs it includes to verify the installation.
Prerequisites
• An Ubuntu 16.04 server with a non-root user with sudo privileges

• Java
Steps
• Step 1 — Installing Java

• To get started, we'll update our package list:

• sudo apt-get update

• Next, install OpenJDK, the default Java Development Kit on Ubuntu 16.04.
Steps
• sudo apt-get install default-jdk

• Once the installation is complete, let's check the version.

• java -version

• openjdk version "1.8.0_91"


• OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-
3ubuntu1~16.04.1-b14)
• OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
Steps
• Step 2 — Installing Hadoop

• With Java in place, we'll visit the Apache Hadoop Releases page to
find the most recent stable release.

• http://hadoop.apache.org/releases.html
Steps
• On the server, we'll use wget to fetch it:

• wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

• In order to make sure that the file we downloaded hasn't been altered, we'll do a quick check using SHA-256.
Steps
• Again, we'll right-click to copy the file location, then use wget to
transfer the file:

• wget https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz.mds
Steps
• Then run the verification:

• shasum -a 256 hadoop-2.7.3.tar.gz

• Output
• d489df3808244b906eb38f4d081ba49e50c4603db03efd5e594a1e98b09259c2 hadoop-2.7.3.tar.gz
Steps
• Compare this value with the SHA-256 value in the .mds file:

• cat hadoop-2.7.3.tar.gz.mds
Steps
• You can safely ignore differences in case and in whitespace.

• The output of the command we ran against the file we downloaded from the mirror should match the value in the file we downloaded from apache.org.
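The manual comparison above can also be scripted. Here's a small sketch (verify_sha256 is a hypothetical helper of ours, not a standard command) that normalizes case before comparing the two digests:

```shell
# Sketch: compare a file's SHA-256 digest against an expected value.
verify_sha256() {
  file="$1"; expected="$2"
  # Compute the digest (shasum on most systems, sha256sum as a fallback)
  # and keep only the hex field.
  actual=$( (shasum -a 256 "$file" 2>/dev/null || sha256sum "$file") | awk '{print $1}')
  # Lowercase both sides so a difference in case is ignored.
  actual=$(echo "$actual" | tr 'A-Z' 'a-z')
  expected=$(echo "$expected" | tr 'A-Z' 'a-z')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```

For example, `verify_sha256 hadoop-2.7.3.tar.gz <digest-from-the-.mds-file>` prints "OK" only when the tarball is intact.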
Steps
• Now that we've verified that the file wasn't corrupted or changed,
we'll use the tar command with the -x flag to extract, -z to
uncompress, -v for verbose output, and -f to specify that we're
extracting from a file.

• Use tab-completion or substitute the correct version number in the command below:
Steps
• tar -xzvf hadoop-2.7.3.tar.gz

• Finally, we'll move the extracted files into /usr/local, the appropriate
place for locally installed software.

• Change the version number, if needed, to match the version you downloaded.
Steps
• sudo mv hadoop-2.7.3 /usr/local/hadoop

• With the software in place, we're ready to configure its environment.


Steps
• Step 3 — Configuring Hadoop's Java Home

• Hadoop requires that you set the path to Java, either as an environment variable or in the Hadoop configuration file.
Steps
• The path to Java, /usr/bin/java, is a symlink to /etc/alternatives/java,
which is in turn a symlink to the default Java binary.

• We will use readlink with the -f flag to follow every symlink in every
part of the path, recursively.

• Then, we'll use sed to trim bin/java from the output to give us the
correct value for JAVA_HOME.
Steps
• To find the default Java path:

• readlink -f /usr/bin/java | sed "s:bin/java::"

• Output
• /usr/lib/jvm/java-8-openjdk-amd64/jre/
Steps
• You can copy this output to set Hadoop's Java home to this specific
version, which ensures that if the default Java changes, this value will
not.

• Alternatively, you can use the readlink command dynamically in the file so that Hadoop will automatically use whatever Java version is set as the system default.
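Concretely, the choice looks like this inside hadoop-env.sh (pick one line; the static path is the one readlink printed above):

```shell
# In /usr/local/hadoop/etc/hadoop/hadoop-env.sh, set one of:

# Option 1: static value, pinned to the path found with readlink above
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/

# Option 2: dynamic value, re-resolved each time Hadoop starts
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
```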
Steps
• To begin, open hadoop-env.sh:

• sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh


Step 4 — Running Hadoop
• Now we can make sure Hadoop runs:

• /usr/local/hadoop/bin/hadoop

• Invoking Hadoop this way prints its usage help, which confirms the installation is working.
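To exercise the installation end to end, we can run one of the example MapReduce programs that ship with Hadoop against Hadoop's own configuration files. This is a sketch following the paths used above; adjust the jar filename if your version differs from 2.7.3:

```shell
# Build an input directory from Hadoop's own XML config files.
mkdir ~/input
cp /usr/local/hadoop/etc/hadoop/*.xml ~/input

# Run the bundled "grep" example: a MapReduce job that counts
# matches of a regular expression in the input files.
/usr/local/hadoop/bin/hadoop jar \
    /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar \
    grep ~/input ~/grep_example 'principal[.]*'

# The job writes its results to the output directory:
cat ~/grep_example/*
```

If the job completes and the output directory contains match counts, the stand-alone installation is working.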
