Install and Run Hadoop On Windows

1. The document provides instructions on how to install and run Hadoop on Windows 10. It describes downloading and configuring Java, Hadoop, and editing configuration files. 2. The prerequisites and steps include setting environment variables, creating HDFS directories, editing configuration files for core-site, mapred-site, hdfs-site, and yarn-site, and replacing the bin folder. 3. Formatting the namenode and starting HDFS and YARN services are also covered. Finally, commands for creating a directory and listing files in HDFS are demonstrated.

Uploaded by

sunilswastik

Install and Run Hadoop on Windows

Introduction
Hadoop is a software framework from the Apache Software Foundation that is used to store and process Big Data. It has two
main components: the Hadoop Distributed File System (HDFS), its storage system, and MapReduce, its data processing
framework. Hadoop manages large datasets by splitting them into smaller chunks, distributing the chunks across
multiple machines, and performing computation on them in parallel.
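
To make the chunk-and-process idea concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code: the function names are hypothetical, and threads stand in for the separate machines of a real cluster. It only illustrates the pattern of splitting the input, counting words in each chunk independently (the map step), and merging the partial counts (the reduce step).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Split a list of lines into consecutive chunks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: count the words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def word_count(lines, chunk_size=2, workers=4):
    """Count words by processing the chunks in parallel and merging the partial counts."""
    chunks = split_into_chunks(lines, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    total = Counter()
    for partial in partial_counts:  # reduce step: merge the per-chunk results
        total.update(partial)
    return total
```

Calling word_count on a list of text lines returns a Counter with the combined totals. In Hadoop the chunks would be HDFS blocks stored on different machines, and the map and reduce steps would run as MapReduce tasks.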
Overview of HDFS
HDFS is the storage layer of Hadoop. It is designed to be reliable and to scale to very large clusters, which makes Hadoop
an essential component of the Big Data industry. Companies like Yahoo and Facebook use HDFS to store their data.
HDFS has a master-slave architecture in which the master node is called the NameNode and each slave node is called a DataNode.
The NameNode and its DataNodes form a cluster. The NameNode keeps the file system metadata and instructs the DataNodes, while the DataNodes
store the actual data.

Another component of Hadoop is YARN, which manages cluster resources and schedules and monitors jobs.
YARN has two main components, the ResourceManager and the NodeManager. The ResourceManager
has the authority to allocate resources to the various applications running in a cluster. The NodeManager on each machine is
responsible for monitoring resource usage (CPU, memory, disk) on that machine and reporting it to the ResourceManager.

Advantages of Hadoop

1. Economical – Hadoop is an open source Apache product, so the software itself is free. The only cost is hardware,
and that cost is low because Hadoop stores its datasets on inexpensive commodity machines rather than on specialized
hardware.
2. Scalable – Hadoop distributes large data sets across the machines of a cluster. New machines can easily be
added to a cluster, which can scale to thousands of nodes storing thousands of terabytes of data.
3. Fault Tolerance – Hadoop, by default, stores 3 replicas of the data across the nodes of a cluster, so if any node goes
down, the data can be retrieved from other nodes.
4. Fast – Since Hadoop processes distributed data in parallel, it can process large data sets much faster than
traditional systems. It is highly suitable for batch processing of data.
5. Flexibility – Hadoop can store structured, semi-structured, and unstructured data. It can accept data in the form
of text files, images, CSV files, XML files, emails, etc.
6. Data Locality – Traditionally, to process data, the data was fetched from the location where it is stored to the location
where the application is submitted. In Hadoop, the processing application goes to the location of the data to perform the
computation. This reduces the delay in processing of data.
7. Compatibility – Most of the emerging big data tools, such as Spark, can be easily integrated with Hadoop. They use Hadoop
as a storage platform and work as its processing system.
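
The fault tolerance point can be turned into a quick back-of-the-envelope calculation. The sketch below is only an estimate under stated assumptions: the function name is hypothetical, the replication factor of 3 is the default mentioned above, and the 128 MB block size is the usual HDFS default in Hadoop 3.x (a cluster may be configured differently). It also assumes each replica of a block sits on a different node.

```python
import math


def hdfs_storage_estimate(file_size_mb, block_size_mb=128, replication=3):
    """Estimate how HDFS would store a file of file_size_mb megabytes.

    Assumes the last block is not padded, so raw storage is simply
    file size multiplied by the replication factor.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        "raw_storage_mb": file_size_mb * replication,
        # With replicas on distinct nodes, this many nodes can fail
        # before any block of the file becomes unavailable.
        "tolerated_node_failures": replication - 1,
    }
```

For example, hdfs_storage_estimate(1000) describes a 1000 MB file as 8 blocks and 24 block replicas occupying 3000 MB of raw storage. This is the price of being able to lose nodes without losing data.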

Hadoop Deployment Methods

1. Standalone Mode – This is the default configuration mode of Hadoop. It doesn't use HDFS; instead, it uses the local file
system for both input and output. It is useful for debugging and testing.
2. Pseudo-Distributed Mode – This is also called a single-node cluster, where both the NameNode and the DataNode reside on the
same machine. All the daemons run on the same machine in this mode. It produces a fully functioning cluster on a single
machine.
3. Fully Distributed Mode – Hadoop runs on multiple nodes, with separate nodes for the master and slave
daemons. The data is distributed among a cluster of machines, providing a production environment.

Hadoop Installation on Windows 10

As a beginner, you might be reluctant to use cloud computing services, which require subscriptions. You could install
a virtual machine on your system instead, but it needs a large amount of RAM to run smoothly; otherwise it
hangs constantly.
Installing Hadoop directly on your system is a feasible way to learn Hadoop.
We will install a single-node, pseudo-distributed Hadoop cluster on Windows 10.
Prerequisite: To install Hadoop, you need Java version 1.8 on your system.
Check your Java version by running this command in Command Prompt

java -version

If Java is not installed on your system, then –

Go to this link –
Accept the license,
Download the file for your operating system. Keep the Java folder directly under the local disk
(C:\Java\jdk1.8.0_152) rather than in Program Files (C:\Program Files\Java\jdk1.8.0_152), because the space in the
Program Files path can cause errors later.

After installing Java 1.8, download Hadoop version 3.1 from this link –
Extract it to a folder.
Setup System Environment Variables
Open Control Panel to edit the system environment variables.

Go to Environment Variables in System Properties.

Create a new user variable. Set the variable name to HADOOP_HOME and the variable value to the path of the folder
where you extracted Hadoop (the folder that contains bin, not the bin folder itself).
Likewise, create a new user variable named JAVA_HOME whose value is the path of the JDK folder
(for example C:\Java\jdk1.8.0_152), not its bin folder. Hadoop looks for the bin folder underneath both of these paths.

Now we need to add the Hadoop bin directory and the Java bin directory to the Path system variable.
Edit Path under System variables.

Click on New and add the bin directory paths of Hadoop and Java.
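
A mistake in these variables tends to surface later as a confusing error, so it can help to check them up front. The following Python sketch is one possible check, not part of Hadoop. The function names are hypothetical, and the only assumptions are that both variables must be set and that at least one Path entry lies under each of them. Open a new Command Prompt before running it so that the changed variables are picked up.

```python
import os

REQUIRED_VARIABLES = ("JAVA_HOME", "HADOOP_HOME")


def normalize(path):
    """Lower-case a Windows path and drop surrounding spaces and trailing slashes."""
    return path.strip().rstrip("\\/").lower()


def check_environment(env, separator=";"):
    """Return a list of problems found in env (a mapping such as os.environ)."""
    problems = []
    path_entries = [
        normalize(entry)
        for entry in env.get("PATH", "").split(separator)
        if entry.strip()
    ]
    for name in REQUIRED_VARIABLES:
        value = env.get(name, "")
        if not value.strip():
            problems.append(name + " is not set")
            continue
        home = normalize(value)
        # The bin directory added to Path should live under the variable's folder.
        if not any(entry.startswith(home) for entry in path_entries):
            problems.append("no PATH entry found under " + name)
    return problems


if __name__ == "__main__":
    for line in check_environment(os.environ) or ["environment looks consistent"]:
        print(line)
```

An empty result only means the variables are consistent with each other. It does not prove that the folders exist or contain a working installation.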
Configurations
Now we need to edit some files located in the etc\hadoop folder of the directory where we installed Hadoop. The files that
need to be edited have been highlighted.
1. Edit the file core-site.xml in the etc\hadoop folder. Copy this XML property into the configuration element of the file

<configuration>
   <property>
       <name>fs.defaultFS</name>
       <value>hdfs://localhost:9000</value>
   </property>
</configuration>

2. Edit mapred-site.xml and copy this property into the configuration

<configuration>
   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>
</configuration>

3. Create a folder ‘data’ in the hadoop directory

Create a folder with the name ‘datanode’ and a folder ‘namenode’ in this data directory

4. Edit the file hdfs-site.xml and add the properties below to the configuration
Note: The values of dfs.namenode.name.dir and dfs.datanode.data.dir must be the paths of the namenode and datanode folders you
just created. Do not leave spaces around the paths inside the value elements.
<configuration>
   <property>
       <name>dfs.replication</name>
       <value>1</value>
   </property>
   <property>
       <name>dfs.namenode.name.dir</name>
       <value>C:\Users\hp\Downloads\hadoop-3.1.0\hadoop-3.1.0\data\namenode</value>
   </property>
   <property>
       <name>dfs.datanode.data.dir</name>
       <value>C:\Users\hp\Downloads\hadoop-3.1.0\hadoop-3.1.0\data\datanode</value>
   </property>
</configuration>

5. Edit the file yarn-site.xml and add the properties below to the configuration

<configuration>
   <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
   </property>
   <property>
       <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
</configuration>

6. Edit hadoop-env.cmd and replace %JAVA_HOME% with the path of the folder where your JDK 1.8 is installed
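
The XML files edited in the steps above all share the same layout: a configuration element containing property elements, each with a name and a value. A typo in any of them is easy to miss, so a small script that lists the properties it finds can be a useful sanity check. The sketch below uses only Python's standard xml.etree.ElementTree module. The function name is hypothetical, and the sample text is the core-site.xml content from step 1.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Return a dict of name -> value for every <property> in a Hadoop site file."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("expected <configuration> as the root element")
    properties = {}
    for prop in root.findall("property"):
        # findtext returns the default when the child element is missing.
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            properties[name] = value
    return properties


CORE_SITE = """
<configuration>
   <property>
       <name>fs.defaultFS</name>
       <value>hdfs://localhost:9000</value>
   </property>
</configuration>
"""

if __name__ == "__main__":
    for name, value in read_hadoop_properties(CORE_SITE).items():
        print(name, "=", value)
```

To check a real file, read its contents and pass the text to read_hadoop_properties. Note that the sketch strips whitespace around values for display, so it will not reveal a stray space inside a value element.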

Hadoop needs Windows-specific files that do not come with the default Hadoop download.
To include those files, replace the bin folder in the Hadoop directory with the bin folder provided in this GitHub repository.
https://github.com/s911415/apache-hadoop-3.1.0-winutils
Download it as a zip file, extract it, and copy its bin folder. If you want to keep the old bin folder, rename it to something like bin_old,
then paste the copied bin folder into the Hadoop directory.

Check whether Hadoop is installed correctly by running this command in cmd –

hadoop version

If the command prints the Hadoop version without an error, Hadoop is installed correctly on the
system.

Format the NameNode

Format the NameNode only once, when Hadoop is first installed. Do not format it again on a running file system, because formatting deletes
all the data inside HDFS. Run this command –

hdfs namenode -format

The output looks something like this –

Now change the directory in cmd to the sbin folder of the Hadoop directory with this command.
(Note: Make sure you write the path as it is on your system)
cd C:\Users\hp\Downloads\hadoop-3.1.0\hadoop-3.1.0\sbin

Start the NameNode and DataNode with this command –

start-dfs.cmd

Two more cmd windows will open, one for the NameNode and one for the DataNode.
Now start YARN with this command –

start-yarn.cmd

Two more windows will open, one for the YARN ResourceManager and one for the YARN NodeManager.

Note: Make sure all 4 Apache Hadoop Distribution windows are up and running. If one of them is not running, it will show an
error or a shutdown message. In that case, you need to debug the error.
To see the ResourceManager's current, successful, and failed jobs, open this link in a browser –
http://localhost:8088/cluster
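
If the page does not load, it helps to know whether anything is listening on the port at all. The sketch below is a generic TCP check written with Python's standard socket module. The function name is hypothetical, and 8088 is simply the port taken from the link above. A successful connection only shows that some process accepts connections on that port, not that the ResourceManager is healthy.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open("localhost", 8088) else "not reachable"
    print("Port 8088 on localhost is", state)
```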
To check the details of HDFS (NameNode and DataNode),
open this link in a browser –

Note: If you are using Hadoop version prior to 3.0.0 – Alpha 1, then use port here

Working with HDFS
I will use a small text file from my local file system and put it in HDFS using the hdfs command line tool.
First I will create a directory named ‘sample’ in HDFS using the following command –

hdfs dfs -mkdir /sample

To verify that the directory was created in HDFS, we will use the ‘ls’ command, which lists the files present in HDFS –

hdfs dfs -ls /

Then I will copy a text file named ‘potatoes’ from my local file system to the folder that I just created in HDFS using the
copyFromLocal command –

hdfs dfs -copyFromLocal C:\Users\hp\Downloads\potatoes.txt /sample

To verify that the file was copied to the folder, I will use the ‘ls’ command with the folder name, which lists the
files in that folder –

hdfs dfs -ls /sample

To view the contents of the file we copied, I will use the cat command –

hdfs dfs -cat /sample/potatoes.txt

To copy a file from HDFS to a local directory, I will use the get command –

hdfs dfs -get /sample/potatoes.txt C:\Users\hp\Desktop\priyanka

These were some basic Hadoop commands. You can refer to this HDFS commands guide to learn more here
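
The same commands can also be driven from a script, which is handy once you repeat them often. The Python sketch below is a thin wrapper, not an official Hadoop API. The function names are hypothetical, and it assumes the hdfs launcher is on Path as set up earlier. It builds the argument list for an hdfs dfs command with a plain ASCII hyphen in front of the action and runs it with the standard subprocess module.

```python
import shutil
import subprocess


def build_hdfs_dfs_command(action, *args, executable="hdfs"):
    """Build the argument list for 'hdfs dfs -<action> <args...>'."""
    if not action or action.startswith("-"):
        raise ValueError("pass the action without a leading hyphen, e.g. 'ls'")
    return [executable, "dfs", "-" + action, *args]


def run_hdfs_dfs(action, *args):
    """Run an hdfs dfs command and return its standard output as text."""
    # shutil.which resolves the launcher script (hdfs.cmd on Windows) from Path.
    executable = shutil.which("hdfs")
    if executable is None:
        raise FileNotFoundError("hdfs was not found on Path")
    command = build_hdfs_dfs_command(action, *args, executable=executable)
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout
```

With the daemons running, run_hdfs_dfs("ls", "/sample") would return the same listing as the ls command shown earlier. Because check=True is set, a failing command raises subprocess.CalledProcessError instead of failing silently.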

Conclusion
Hadoop MapReduce can be used to process data. However, it has limitations, which is why
tools like Spark and Pig emerged and gained popularity. For example, roughly 200 lines of MapReduce code can often be written in
fewer than 10 lines of Pig code. Hadoop has various other components in its ecosystem, such as Hive, Sqoop, Oozie, and
HBase. You can install these on your Windows system as well and run data processing operations from
cmd.
