
SAGE UNIVERSITY, INDORE

Institute of Advance Computing


Laboratory Manual
B. Tech- Practical
Academic Session: Jan-Jun 2025

Name : Yuvraj Tomar


Enroll number : 22ADV3CSE1089
Program : B.Tech CSE
Course Code : ECSDEBDH001P
Course Name : Big Data Lab

Course Coordinator Head of Department


Name : Prof. Sudha Kore Name : Dr. Manoj K. Ramaiya

Signature :
INDEX

S.NO   LIST OF EXPERIMENTS                                                              DATE          SIGNATURE

1      Downloading and installing Hadoop; understanding different Hadoop modes,         10/01/2025
       startup scripts, configuration files.

2      Hadoop implementation of file management tasks, such as adding files and         17/01/2025
       directories, retrieving files and deleting files.

3      Implementation of Matrix Multiplication with Hadoop MapReduce.                   24/01/2025

4      Run a basic Word Count MapReduce program to understand the MapReduce             07/02/2025
       paradigm.

5      Implementation of K-means clustering using MapReduce.                            14/02/2025

6      Installation of Hive along with practice examples.                               17/02/2025

7      Installation of HBase, installing Thrift along with practice examples.           21/03/2025

8      To study and implement basic functions and commands in programming.              04/04/2025
EXPERIMENT – 1

Objective: Downloading and installing Hadoop; Understanding different


Hadoop modes. Startup scripts, Configuration files.

Hadoop Installation

Environment required for Hadoop: The production environment for Hadoop is UNIX, but it can also be used on Windows using Cygwin. Java 1.6 or above is needed to run MapReduce programs. For a Hadoop installation from a tarball on UNIX you need:

1. Java Installation

2. SSH installation

3. Hadoop Installation and File Configuration

Java Installation

Step 1. Type "java -version" at the prompt to find out whether Java is installed or not.
If not, download Java from
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-
downloads-1880260.html . The tar file jdk-7u71-linux-x64.tar.gz will be
downloaded to your system.

Step 2. Extract the file using the below command

#tar zxf jdk-7u71-linux-x64.tar.gz

Step 3. To make Java available to all users, move the extracted directory to
/usr/lib and set the path. At the prompt, switch to the root user and then type
the command below to move the JDK to /usr/lib.

# mv jdk1.7.0_71 /usr/lib/

Now in ~/.bashrc file add the following commands to set up the path.

# export JAVA_HOME=/usr/lib/jdk1.7.0_71

# export PATH=$PATH:$JAVA_HOME/bin
Now, you can check the installation by typing "java -version" in the
prompt.

2) SSH Installation

SSH is used to interact with the master and slave computers without any
prompt for a password. First of all, create a hadoop user on the master and
slave systems

# useradd hadoop

# passwd hadoop

To map the nodes, open the hosts file in the /etc/ folder on all the
machines and put the IP address of each node along with its host name.

# vi /etc/hosts

Enter the lines below

190.12.1.114 hadoop-master

190.12.1.121 hadoop-slave-one

190.12.1.143 hadoop-slave-two

Set up SSH keys on every node so that the nodes can communicate among
themselves without a password. The commands for this are:

# su hadoop

$ ssh-keygen -t rsa

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-one

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-two

$ chmod 0600 ~/.ssh/authorized_keys

$ exit
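
Once the keys have been copied, it is worth confirming that password-less login actually works before moving on. A quick check from the hadoop user (host names as defined in /etc/hosts above) might look like:

$ ssh hadoop-master

$ ssh hadoop-slave-one

$ ssh hadoop-slave-two

Each command should open a shell on the remote node without asking for a password; type exit to come back.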

3) Hadoop Installation

Hadoop can be downloaded from


http://developer.yahoo.com/hadoop/tutorial/module3.html
Now extract the Hadoop and copy it to a location.

$ mkdir /usr/hadoop

$ sudo tar xvzf hadoop-2.2.0.tar.gz -C /usr/hadoop

Change the ownership of Hadoop folder

$ sudo chown -R hadoop /usr/hadoop

Change the Hadoop configuration files:

All the files are present in /usr/hadoop/etc/hadoop

1) In hadoop-env.sh file add

export JAVA_HOME=/usr/lib/jdk1.7.0_71

2) In core-site.xml, add the following between the configuration tags:

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://hadoop-master:9000</value>

</property>

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>

</configuration>

3) In hdfs-site.xml, add the following between the configuration tags:

<configuration>

<property>

<name>dfs.data.dir</name>

<value>/usr/hadoop/dfs/name/data</value>
<final>true</final>

</property>

<property>

<name>dfs.name.dir</name>

<value>/usr/hadoop/dfs/name</value>

<final>true</final>

</property>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>

4) Open mapred-site.xml and make the change as shown below:

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>hadoop-master:9001</value>

</property>

</configuration>

5) Finally, update your $HOME/.bashrc

cd $HOME

vi .bashrc

Append the following lines at the end, then save and exit.

#Hadoop variables

export JAVA_HOME=/usr/lib/jdk1.7.0_71
export HADOOP_INSTALL=/usr/hadoop

export PATH=$PATH:$HADOOP_INSTALL/bin

export PATH=$PATH:$HADOOP_INSTALL/sbin

export HADOOP_MAPRED_HOME=$HADOOP_INSTALL

export HADOOP_COMMON_HOME=$HADOOP_INSTALL

export HADOOP_HDFS_HOME=$HADOOP_INSTALL

export YARN_HOME=$HADOOP_INSTALL

On the slave machines, install Hadoop by copying it from the master using the commands below

# su hadoop

$ cd /usr

$ scp -r hadoop hadoop-slave-one:/usr/hadoop

$ scp -r hadoop hadoop-slave-two:/usr/hadoop

Configure master node and slave node

$ vi etc/hadoop/masters

hadoop-master

$ vi etc/hadoop/slaves

hadoop-slave-one

hadoop-slave-two

After this, format the NameNode and start all the daemons:

# su hadoop

$ cd /usr/hadoop

$ bin/hadoop namenode -format

$ cd $HADOOP_HOME/sbin

$ start-all.sh
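
If the cluster has come up correctly, the daemon processes can be verified with the jps utility that ships with the JDK, run as the hadoop user on each node:

$ jps

Depending on the Hadoop version and configuration, the listing should include entries such as NameNode, SecondaryNameNode and DataNode, plus either JobTracker/TaskTracker (Hadoop 1.x) or ResourceManager/NodeManager (Hadoop 2.x with YARN).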
EXPERIMENT – 2
Objective: Hadoop implementation of file management tasks, such as adding
files and directories, retrieving files and deleting files.

Execute various read and write operations such as creating a directory,
setting permissions, copying files, updating files, deleting files, etc. You can
also set access rights and browse the file system to get cluster information such
as the number of dead nodes, live nodes, space used, etc. The corresponding HDFS
shell commands are summarised just below.
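
As a quick reference (the paths below are only placeholders; on older Hadoop 1.x releases the recursive delete is -rmr instead of -rm -r):

$ hadoop fs -mkdir /user/hadoop/dir1          (create a directory)

$ hadoop fs -chmod 755 /user/hadoop/dir1      (set permissions)

$ hadoop fs -cp /user/hadoop/file1 /user/hadoop/dir1/    (copy a file within HDFS)

$ hadoop fs -rm /user/hadoop/dir1/file1       (delete a file)

$ hadoop fs -rm -r /user/hadoop/dir1          (delete a directory recursively)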

HDFS Operations to Read the file

To read any file from HDFS, you have to interact with the NameNode, as it
stores the metadata about the DataNodes. The user gets a token from the
NameNode that specifies the address where the data is stored.

You put a read request to the NameNode for a particular block location through
the distributed file system. The NameNode then checks your privileges for accessing
the DataNode and allows you to read the requested block if the access is valid.

$ hadoop fs -cat <file>

HDFS Operations to Write a File

Similar to the read operation, the HDFS write operation is used to write a file
at a particular address through the NameNode. The NameNode provides the
slave address where the client/user can write or add data. After the block has
been written, the slave replicates that block and copies it to other slave locations
according to the replication factor (3 by default). The slave then reports back to
the client.

The process for accessing the NameNode is pretty similar to that of a read
operation. Below is the HDFS write command:

bin/hdfs dfs -put <local_file> <hdfs_path>

Listing Files in HDFS

You can find the list of files in a directory, and the status of a file, with the ‘ls’ command
in the terminal. ls can be given a directory or a filename as an argument, and the
results are displayed as follows:

$ $HADOOP_HOME/bin/hadoop fs -ls <args>

Inserting Data into HDFS

The steps mentioned below are followed to insert the required file into the Hadoop file
system.

Step1: Create an input directory

$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input

Step2: Use the put command to transfer and store the data file from the local
system to HDFS using the following command in the terminal.

$ $HADOOP_HOME/bin/hadoop fs -put /home/intellipaat.txt /user/input

Step3: Verify the file using ls command.

$ $HADOOP_HOME/bin/hadoop fs -ls /user/input

Retrieving Data from HDFS

For instance, suppose you have a file in HDFS called intellipaat. Retrieve the
required file from the Hadoop file system by carrying out the following steps:

Step1: View the data from HDFS using the cat command.

$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/intellipaat

Step2: Get the file from HDFS to the local file system using the get command as
shown below

$ $HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/


Shutting Down the HDFS

Shut down HDFS using the command below

$ stop-dfs.sh

Multi-Node Cluster

Installing Java

Syntax of java version command

$ java -version

The following output is presented:

java version "1.7.0_71"

Java(TM) SE Runtime Environment (build 1.7.0_71-b13)

Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)


Creating User Account

A system user account is used on both the master and slave systems for the Hadoop
installation.

# useradd hadoop

# passwd hadoop

Mapping the nodes

The hosts file in the /etc/ folder should be edited on every node, and the IP address
of each system followed by its host name must be specified.

# vi /etc/hosts

Enter the following lines in the /etc/hosts file.


192.168.1.109 hadoop-master

192.168.1.145 hadoop-slave-1

192.168.56.1 hadoop-slave-2

Configuring Key Based Login

SSH should be set up on each node so that the nodes can easily converse with one
another without any prompt for a password.

# su hadoop

$ ssh-keygen -t rsa

$ ssh-copy-id -i ~/.ssh/id_rsa.pub tutorialspoint@hadoop-master

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop_tp1@hadoop-slave-1

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop_tp2@hadoop-slave-2

$ chmod 0600 ~/.ssh/authorized_keys

$ exit

Installation of Hadoop

Hadoop should be downloaded on the master server using the following procedure.

# mkdir /opt/hadoop

# cd /opt/hadoop/

# wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-
1.2.0.tar.gz

# tar -xzf hadoop-1.2.0.tar.gz

# mv hadoop-1.2.0 hadoop

# chown -R hadoop /opt/hadoop


Experiment – 3
Objective: Implementation of Matrix Multiplication with Hadoop MapReduce.

Description

Let A be an m x n matrix and B an n x p matrix; we want to compute the product
AB, an m x p matrix. Here the input is a text file containing the matrix elements,
one per line, as given below, so the key will be the byte offset of each line and the
value will be the line itself (e.g. A,0,1,1.0).

Each Mapper then implements the following logic:

Mapper Class
map(key, value):   // value is ("A", i, j, a_ij) or ("B", j, k, b_jk)
    if value[0] == "A":
        i = value[1]
        j = value[2]
        a_ij = value[3]
        for k = 1 to p:
            context.write((i, k) as key, ("A", j, a_ij) as value)
    else:
        j = value[1]
        k = value[2]
        b_jk = value[3]
        for i = 1 to m:
            context.write((i, k) as key, ("B", j, b_jk) as value)

After the Mapper generates the intermediate <key, value> pairs, the shuffle-and-sort
phase groups all the values for each unique key, e.g.

<(0,0), [(A,1,1.0),(A,2,2.0),(A,3,3.0),(A,4,4.0),(B,1,3.0),(B,2,6.0),(B,3,9.0),(B,4,12.0)]>

The Reducer then implements the logic below to compute the output:

Reducer Class
reduce(key, values):
    // key is (i, k)
    // values is a list of ("A", j, a_ij) and ("B", j, b_jk)
    hash_A = {j: a_ij for (x, j, a_ij) in values if x == "A"}
    hash_B = {j: b_jk for (x, j, b_jk) in values if x == "B"}
    result = 0
    for j = 1 to n:
        result += hash_A[j] * hash_B[j]
    context.write(key, result)
So for the intermediate key-value input <(0,0),
[(A,1,1.0),(A,2,2.0),(A,3,3.0),(A,4,4.0),(B,1,3.0),(B,2,6.0),(B,3,9.0),(B,4,12.0)]>
shown above, the Reducer will emit <(0,0), 90>.
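
The same map and reduce logic can be checked locally before packaging it as a Hadoop job. The short Python sketch below is only an illustration (the matrices A and B and the dimensions m, n, p are made-up sample values, not the manual's data set); it reproduces the mapper, the shuffle/grouping and the reducer steps in memory:

from collections import defaultdict

# Illustrative matrices: A is m x n, B is n x p (sample values only)
A = [[1.0, 2.0], [3.0, 4.0]]             # m = 2, n = 2
B = [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]  # n = 2, p = 3
m, n, p = 2, 2, 3

# Mapper: emit ((i, k), (tag, j, value)) pairs, as in the pseudocode above
intermediate = defaultdict(list)
for i in range(m):
    for j in range(n):
        for k in range(p):            # A[i][j] is needed for every column k of the result
            intermediate[(i, k)].append(("A", j, A[i][j]))
for j in range(n):
    for k in range(p):
        for i in range(m):            # B[j][k] is needed for every row i of the result
            intermediate[(i, k)].append(("B", j, B[j][k]))

# Reducer: join the A and B values on j and sum the products
for (i, k), values in sorted(intermediate.items()):
    hash_A = {j: v for (tag, j, v) in values if tag == "A"}
    hash_B = {j: v for (tag, j, v) in values if tag == "B"}
    result = sum(hash_A[j] * hash_B[j] for j in range(n))
    print(f"{i},{k},{result}")

Running it prints each element i,k,value of the product in the same comma-separated form used for the output file below.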

Input
The input file has one line of the following format for each non-zero element m_{ij} of a
matrix M:
<M>,<i>,<j>,<m_ij>
Suppose a sample of the matrix values is given below:

1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 2.0 9.0 1.0 2.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 2.0 9.0 1.0 2.0
1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 2.0 9.0 1.0 2.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 2.0 9.0 1.0 2.0

The input file that represents A and B has the following lines:
A,0,1,1.0
A,0,2,2.0
............ ..........
B,0,1,1.0
B,0,2,2.0

Output
The output file has one line of the following format for each non-zero element of the
product matrix AB:
<i>,<j>,<m_ij>
In our example, the output file that represents AB should have the following lines:
0,0,90.0
0,1,100.0
0,2,110.0
1,0,240.0
1,1,275.0
1,2,310.0
Experiment – 4
Objective: Run a basic Word Count MapReduce program to understand the
MapReduce paradigm.

In the MapReduce word count example, we find out the frequency of
each word. Here, the role of the Mapper is to map the keys to the existing
values and the role of the Reducer is to aggregate the keys with common
values. So, everything is represented in the form of key-value pairs, as
illustrated just below.
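
The flow of key-value pairs can be pictured with a tiny, purely local Python illustration (this is conceptual only and is not the Hadoop program of this experiment; the sample sentence is made up): the map step emits a (word, 1) pair for every word, and the reduce step sums the values that share the same key.

from collections import defaultdict

line = "big data makes big clusters"                 # illustrative input line
pairs = [(word, 1) for word in line.split()]         # map: emit (word, 1) for each word

counts = defaultdict(int)
for word, one in pairs:                              # shuffle + reduce: group by key and sum
    counts[word] += one

print(dict(counts))  # {'big': 2, 'data': 1, 'makes': 1, 'clusters': 1}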

Pre-requisite
o Java Installation - Check whether Java is installed or not
using the following command.
java -version
o Hadoop Installation - Check whether Hadoop is installed or
not using the following command.
hadoop version
If either of them is not installed on your system, follow the link below to
install it.
www.javatpoint.com/hadoop-installation
Steps to execute MapReduce word count example
o Create a text file in your local machine and write some
text into it.
$ nano data.txt
o Check the text written in the data.txt file.
$ cat data.txt

In this example, we find out the frequency of each word that exists in this text file.

o Create a directory in HDFS where the text file will be kept.


$ hdfs dfs -mkdir /test

o Upload the data.txt file on HDFS in the specific directory.


$ hdfs dfs -put /home/codegyani/data.txt /test
o Write the MapReduce program using Eclipse.

File: WC_Mapper.java

package com.javatpoint;

import java.io.IOException;

import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapred.MapReduceBase;

import org.apache.hadoop.mapred.Mapper;

import org.apache.hadoop.mapred.OutputCollector;

import org.apache.hadoop.mapred.Reporter;

public class WC_Mapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text,IntWritable> output,
                    Reporter reporter) throws IOException{
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()){
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}

File: WC_Reducer.java

package com.javatpoint;

import java.io.IOException;

import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapred.MapReduceBase;

import org.apache.hadoop.mapred.OutputCollector;

import org.apache.hadoop.mapred.Reducer;

import org.apache.hadoop.mapred.Reporter;

public class WC_Reducer extends MapReduceBase implements Reducer<Text,IntWritable,Text,IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text,IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}

File: WC_Runner.java

package com.javatpoint;

import java.io.IOException;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapred.FileInputFormat;

import org.apache.hadoop.mapred.FileOutputFormat;

import org.apache.hadoop.mapred.JobClient;

import org.apache.hadoop.mapred.JobConf;

import org.apache.hadoop.mapred.TextInputFormat;

import org.apache.hadoop.mapred.TextOutputFormat;

public class WC_Runner {

    public static void main(String[] args) throws IOException{
        JobConf conf = new JobConf(WC_Runner.class);
        conf.setJobName("WordCount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(WC_Mapper.class);
        conf.setCombinerClass(WC_Reducer.class);
        conf.setReducerClass(WC_Reducer.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}


o Create the jar file of this program and name it wordcountdemo.jar.

o Run the jar file


hadoop jar /home/codegyani/wordcountdemo.jar
com.javatpoint.WC_Runner /test/data.txt /r_output

o The output is stored in /r_output/part-00000


o Now execute the command to see the output.
hdfs dfs -cat /r_output/part-00000
Experiment – 5
Objective: Write a recursive function to compute the nth term of the Fibonacci
series.

Experiment/Code:
# Python function to calculate the nth term of the Fibonacci series.

def recur_fibo(n):
    if n <= 1:
        return n
    else:
        return recur_fibo(n-1) + recur_fibo(n-2)

nterms = int(input("How many terms do you want? "))

# check if the number of terms is valid
if nterms <= 0:
    print("Please enter a positive integer")
else:
    print("Fibonacci sequence -->")
    for i in range(nterms):
        print(recur_fibo(i))

Output:
How many terms do you want? 7

Fibonacci sequence -->
0
1
1
2
3
5
8
Experiment -6
Objective: Write a recursive function to find the factorial of a natural
number.

Experiment/Code:
# Python function to calculate the factorial of a number.
def fac(n):
    if n == 1:
        return 1
    else:
        return n * fac(n-1)

# Take the number as input to calculate the factorial.
num = int(input("Enter a number for factorial: "))

# Process the factorial
if num < 0:
    print("Please enter a positive number.")
elif num == 0:
    print("The factorial of zero is one.")
else:
    print(f"Factorial of {num} is {fac(num)}")

Output:
Enter a number for factorial: 5
Factorial of 5 is 120
Experiment – 7
Objective: Write a Python function powerFn(num, pr) to calculate the result of
raising num to the power pr.

Experiment/Code:
def pwr(base, power):
    result = 1
    for i in range(power):
        result *= base
    return result

# Take base and exponent as input to calculate the power.
base = int(input("Enter base number: "))
exp = int(input("Enter exponent: "))
print(f"The calculated answer is {pwr(base, exp)}")

Output:
Enter base number: 2
Enter exponent: 3
The calculated answer is 8
Experiment – 8
Objective: Write a Python program to find the sum of all elements of a list.

Experiment/Code:
total = 0
li = [1, 2, 3, 4, 5]
for i in li:
    total += i
print("The sum of all list elements:", total)
Output:
The sum of all list elements: 15
Experiment – 9
Objective: Write a Python program to implement a stack and a queue using a
list data structure.

Stack Implementation
Experiment/Code:
stack = []
while True:
    op = input("""Choose any operation.
1. PUSH
2. POP
3. SHOW
4. OVERFLOW
5. UNDERFLOW
6. EXIT
""")

    if op == "1":
        if len(stack) == 10:
            print("----> Stack overflow.")
            continue
        ele = input("Enter an element to push: ")
        stack.insert(0, ele)
        print(f"----> Element {ele} added.")
    elif op == "2":
        if len(stack) == 0:
            print("----> Stack underflow.")
            continue
        stack.pop(0)
        print("----> Element popped.")
    elif op == "3":
        print("----> Stack:", stack)
    elif op == "4":
        if len(stack) == 10:
            print("----> Ohh! Stack is overflowing.")
        else:
            print("----> No, stack is not overflowing.")
    elif op == "5":
        if len(stack) == 0:
            print("----> Ohh! Stack is underflowing.")
        else:
            print("----> No, stack is not underflowing.")
    elif op == "6":
        print("Thank you")
        break
Output:
Queue Implementation
Experiment/Code:
queue = []

while True:
    op = input("""Choose any operation.
1. PUSH
2. POP
3. SHOW
4. OVERFLOW
5. UNDERFLOW
6. EXIT
""")

    if op == "1":
        if len(queue) == 10:
            print("-------> Queue overflow.")
            continue
        ele = input("Enter an element to push: ")
        queue.insert(0, ele)
        print(f"-------> Element {ele} added.")
    elif op == "2":
        if len(queue) == 0:
            print("-------> Queue underflow.")
            continue
        queue.pop(-1)
        print("-------> Element popped.")
    elif op == "3":
        print("-------> Queue:", queue)
    elif op == "4":
        if len(queue) == 10:
            print("-------> Ohh! Queue is overflowing.")
        else:
            print("-------> No, queue is not overflowing.")
    elif op == "5":
        if len(queue) == 0:
            print("-------> Ohh! Queue is underflowing.")
        else:
            print("-------> No, queue is not underflowing.")
    elif op == "6":
        print("Thank you")
        break
Output:
EXPERIMENT - 10
Objective: Write a Python program to perform linear and binary
search on list elements.
Experiment/ Code For Linear Search:
# Function to implement linear search
def l_search(list, n):
    for i in list:
        if i == n:
            return True
    return False

# List of elements for testing
list = [5, 10, 2, 3, 45, 2, 34]

# Use the defined function to find the element
n = int(input("Enter an element: "))

if l_search(list, n):
    print("Element Found.")
else:
    print("Element Not Found.")

Output:
Experiment/ Code For Binary Search:
def binary_search(arr, low, high, x):
    # Check base case
    if high >= low:
        mid = (high + low) // 2

        # If the element is present at the middle itself
        if arr[mid] == x:
            return mid
        # If the element is smaller than mid, it can only be in the left subarray
        elif arr[mid] > x:
            return binary_search(arr, low, mid - 1, x)
        # Else the element can only be in the right subarray
        else:
            return binary_search(arr, mid + 1, high, x)
    else:
        return -1  # Element is not present in the array

arr = [2, 3, 4, 10, 40]  # Test array
x = 10

# Function call
result = binary_search(arr, 0, len(arr) - 1, x)

if result != -1:
    print("Element is present at index", str(result))
else:
    print("Element is not present in array")

Output:

Element is present at index 3
