Hadoop Mapreduce Python Script

Uploaded by

zammy official

Hadoop Streaming Program using Python

____________MAPPER_____________

1> Create a file named mapper.py and paste the Python mapper code below into it

$ nano mapper.py

#!/usr/bin/env python

import sys

# read the input line by line from standard input
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        # print(...) with a single argument runs under both Python 2 and Python 3
        print('%s\t%s' % (word, 1))

--------understanding above code---------------

#[ for line in sys.stdin: ] reads the input line by line from standard input (STDIN), which is where Hadoop Streaming delivers the input data to the script

#[ line = line.strip() ] removes leading and trailing whitespace, including the newline

#[ words = line.split() ] splits the line into words on whitespace

#[ for word in words: ] loops over every word in the line

#[ print('%s\t%s' % (word, 1)) ] writes the word, a tab, and the count 1 to standard output (STDOUT). This output becomes the input for the reducer
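To see what the mapper emits without involving standard input, the same strip, split, and print steps can be wrapped in small functions. This is only an illustrative sketch: `map_line` and `format_pairs` are hypothetical helper names that do not appear in mapper.py, and the sample line is made up.

```python
def map_line(line):
    """Return the (word, 1) pairs the mapper would emit for one input line."""
    # strip() drops surrounding whitespace, split() breaks on any whitespace run
    return [(word, 1) for word in line.strip().split()]


def format_pairs(pairs):
    """Format pairs as the tab-separated records written to standard output."""
    return ['%s\t%s' % (word, count) for word, count in pairs]


sample = "  the quick the  \n"  # made-up input line
emitted = format_pairs(map_line(sample))
for record in emitted:
    print(record)
```

Note that the mapper does not add anything up: a word that appears twice in a line produces two separate records, each with a count of 1. The summing is left to the reducer.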

2> Grant execute permission to mapper.py

$ chmod 744 /home/ubuntu/mapper.py
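As a quick check of what mode 744 actually grants, the permission bits can be decoded with Python's standard `stat` module. This is a side illustration only and is not part of the job itself.

```python
import stat

mode = 0o744  # the octal mode passed to chmod above

# Render the mode the way `ls -l` would show it for a regular file
description = stat.filemode(stat.S_IFREG | mode)
print(description)  # -rwxr--r--

# The owner may execute the script; group and others may only read it
owner_can_execute = bool(mode & stat.S_IXUSR)
others_can_execute = bool(mode & stat.S_IXOTH)
```

So 744 gives the owner read, write, and execute rights, while group and others get read access only.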

____________REDUCER_____________

3> Create a file named reducer.py and paste the Python reducer code below into it

$ nano reducer.py

#!/usr/bin/env python

import sys

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    # parse the "word<TAB>count" line written by mapper.py
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so skip this line
        continue

    if current_word == word:
        current_count += count
    else:
        if current_word:
            # the word changed: write the total for the previous word
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# write the total for the last word (nothing is written for empty input)
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))

----understanding above code----

#reducer.py reads the results of mapper.py through standard input, so the output format of mapper.py and the input format expected by reducer.py must match.

#[ word, count = line.split('\t', 1) ] parses the input received from the mapper by splitting each line at the first tab

#[ try:
count = int(count)
except ValueError: ] converts count from a string to an int, because everything read from standard input is text

#The [continue] statement in the except branch skips the line if count was not a number

#[ if current_word == word:
current_count += count
else:
if current_word: ] this comparison works because Hadoop sorts the map output by key (here the word) before passing it to the reducer, so all lines for the same word arrive one after another

#[ print('%s\t%s' % (current_word, current_count))
current_count = count
current_word = word ] when the word changes, this writes the total for the previous word to standard output (STDOUT) and restarts the count for the new word

#[ if current_word is not None: ] after the loop, writes the total for the last word
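The reason the reducer depends on sorted input can be shown with a shorter variant built on `itertools.groupby`, which also groups only consecutive equal keys. This is an alternative sketch, not the code in reducer.py: `reduce_sorted` is a hypothetical name and the sample records are made up.

```python
from itertools import groupby


def reduce_sorted(lines):
    """Sum the counts per word for records that are already sorted by word."""
    parsed = []
    for line in lines:
        word, _, count = line.strip().partition('\t')
        try:
            parsed.append((word, int(count)))
        except ValueError:
            continue  # skip records whose count is not a number
    # groupby merges only neighbouring records with the same word
    return [(word, sum(count for _, count in group))
            for word, group in groupby(parsed, key=lambda pair: pair[0])]


# Sorted input: each word is totalled once
totals = reduce_sorted(["quick\t1", "the\t1", "the\t1", "the\tx"])

# Unsorted input: "the" is reported twice because its records are not adjacent
unsorted_totals = reduce_sorted(["the\t1", "quick\t1", "the\t1"])
```

With sorted input each word gets a single total. With unsorted input the same word shows up more than once, which is the failure the sort step between map and reduce prevents.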

4> Grant execute permission to reducer.py

$ chmod 744 /home/ubuntu/reducer.py

____________RUNNING PYTHON CODE ON HADOOP_____________

5> First copy the files that have to be processed from the local file system to Hadoop's HDFS.

$ hadoop fs -put <filename> <input>

6> Run the Hadoop Streaming jar, which lets Hadoop run the Python code, followed by the mapper, reducer, input and output options

$ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar \
    -file /home/ubuntu/mapper.py -mapper /home/ubuntu/mapper.py \
    -file /home/ubuntu/reducer.py -reducer /home/ubuntu/reducer.py \
    -input in -output out1

----------Understanding above command-------------------

Here:
-file takes a file or directory to be shipped in the job jar file.
-mapper takes the streaming command to run for the map step.
-reducer takes the streaming command to run for the reduce step.
-input takes the DFS input file for the map step.
-output takes the DFS output directory for the reduce step.
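Before submitting the job, the same map, sort, and reduce sequence can be tried on a small sample without a cluster. The sketch below is a local stand-in under simplifying assumptions: everything runs in one process, a plain list sort plays the role of Hadoop's sort between the two steps, and `run_local` is a hypothetical helper name. It does not reproduce how Hadoop splits or distributes the work.

```python
def run_local(lines):
    """Imitate the word-count job on a list of text lines in a single process."""
    # Map step: one (word, 1) pair per word, as mapper.py emits
    mapped = [(word, 1) for line in lines for word in line.strip().split()]

    # Sort step: bring identical words next to each other
    mapped.sort(key=lambda pair: pair[0])

    # Reduce step: add up consecutive counts, as reducer.py does
    results = []
    current_word, current_count = None, 0
    for word, count in mapped:
        if word == current_word:
            current_count += count
        else:
            if current_word is not None:
                results.append((current_word, current_count))
            current_word, current_count = word, count
    if current_word is not None:
        results.append((current_word, current_count))
    return results


counts = run_local(["foo bar foo", "bar baz"])  # made-up sample input
for word, total in counts:
    print('%s\t%s' % (word, total))
```

If the totals look right on a sample like this, the scripts are more likely to behave as expected when run through the streaming jar.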
