Big Data Lab Manual
Signature :
INDEX
S.NO   LIST OF EXPERIMENTS                                                          DATE         SIGNATURE
1      Downloading and installing Hadoop; understanding different Hadoop modes,
       startup scripts, and configuration files.                                    10/01/2025
Hadoop Installation
1. Java Installation
2. SSH installation
Java Installation
Step 1. Type "java -version" at the prompt to check whether Java is already installed.
If it is not, download Java from
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html . The tar file jdk-7u71-linux-x64.tar.gz will be
downloaded to your system.
Step 2. Extract the downloaded archive:
# tar -zxvf jdk-7u71-linux-x64.tar.gz
Step 3. To make Java available to all users of UNIX, move the extracted JDK to
/usr/lib and set the path. At the prompt, switch to the root user and then type
the command below to move the JDK to /usr/lib.
# mv jdk1.7.0_71 /usr/lib/
Now add the following lines to the ~/.bashrc file to set up the path.
export JAVA_HOME=/usr/lib/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin
Now you can verify the installation by typing "java -version" at the prompt.
2) SSH Installation
SSH is used to interact with the master and slave machines without being prompted
for a password. First of all, create a hadoop user on the master and
slave systems:
# useradd hadoop
# passwd hadoop
To map the nodes, open the hosts file present in the /etc/ folder on all the
machines and put the IP address of each node along with its host name.
# vi /etc/hosts
190.12.1.114 hadoop-master
190.12.1.121 hadoop-slave-one
190.12.1.143 hadoop-slave-two
Set up SSH keys on every node so that they can communicate among
themselves without a password, and copy the public key to each node. Commands for the same are:
# su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id hadoop@hadoop-master
$ ssh-copy-id hadoop@hadoop-slave-one
$ ssh-copy-id hadoop@hadoop-slave-two
$ exit
3) Hadoop Installation
Create the installation directory and extract the Hadoop release into it:
$ mkdir /usr/hadoop
Set JAVA_HOME for Hadoop (typically in etc/hadoop/hadoop-env.sh):
export JAVA_HOME=/usr/lib/jdk1.7.0_71
In core-site.xml, add the following properties:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:9000</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
In hdfs-site.xml, add the following properties:
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/usr/hadoop/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/hadoop/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
In mapred-site.xml, add the following property:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoop-master:9001</value>
</property>
</configuration>
cd $HOME
vi .bashrc
#Hadoop variables
export JAVA_HOME=/usr/lib/jdk1.7.0_71
export HADOOP_INSTALL=/usr/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
# su hadoop
$ cd /usr/hadoop
$ vi etc/hadoop/masters
hadoop-master
$ vi etc/hadoop/slaves
hadoop-slave-one
hadoop-slave-two
After this, format the NameNode and start all the daemons:
# su hadoop
$ cd /usr/hadoop
$ hadoop namenode -format
$ cd $HADOOP_HOME/sbin
$ start-all.sh
EXPERIMENT - 2
Objective: Hadoop implementation of file management tasks, such as adding files
and directories, retrieving files, and deleting files.
To read any file from HDFS, you have to interact with the NameNode, as it
stores the metadata about the DataNodes. The client receives a token from the
NameNode that specifies the address where the data is stored.
You send a read request to the NameNode for a particular block location through
the distributed file system client. The NameNode then checks your privilege to access
the DataNode and, if the access is valid, returns the block addresses to read from.
Similar to the read operation, the HDFS write operation is used to write a file
to a particular address obtained through the NameNode. The NameNode provides the
address of the DataNode where the client can write or add data. After the block is
written, the DataNode replicates it to other DataNodes according to the replication
factor (3 by default). An acknowledgement is then sent back to the client.
Listing Files in HDFS
You can find the list of files in a directory, or the status of a file, using the 'ls' command
in the terminal. It can be passed a directory or a file name as an argument:
$ hadoop fs -ls <path>
The steps mentioned below are followed to insert the required file into the Hadoop file
system.
Step 1: Create an input directory in HDFS (the path /user/input here is only an example).
$ hadoop fs -mkdir /user/input
Step 2: Use the put command to transfer and store the data file from the local
system into HDFS.
$ hadoop fs -put <local file> /user/input
For instance, suppose you have a file in HDFS called Intellipaat. You can retrieve it
from the Hadoop file system by carrying out the following steps (the paths shown are only examples):
Step 1: View the data from HDFS using the cat command.
$ hadoop fs -cat /user/input/Intellipaat
Step 2: Get the file from HDFS to the local file system using the get command as
shown below.
$ hadoop fs -get /user/input/Intellipaat /home/hadoop/
To shut down the HDFS daemons, run:
$ stop-dfs.sh
Multi-Node Cluster
Installing Java
$ java -version
A system user account is used on both the master and slave systems for the Hadoop
installation.
# useradd hadoop
# passwd hadoop
The hosts file in the /etc/ folder should be edited on each and every node, and the IP
address of each system followed by its host name must be specified.
# vi /etc/hosts
192.168.1.145 hadoop-slave-1
192.168.56.1 hadoop-slave-2
SSH should be set up on each node so that the nodes can easily converse with one another
without any prompt for a password.
# su hadoop
$ ssh-keygen -t rsa
$ exit
Installation of Hadoop
# mkdir /opt/hadoop
# cd /opt/hadoop/
# wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.0.tar.gz
# tar -xzf hadoop-1.2.0.tar.gz
# mv hadoop-1.2.0 hadoop
EXPERIMENT - 3
Objective: Implement multiplication of two sparse matrices A (m x n) and B (n x p) with a
Hadoop MapReduce program.
Description
Mapper Class
map(key, value):
    // value is ("A", i, j, a_ij) or ("B", j, k, b_jk)
    if value[0] == "A":
        i = value[1]
        j = value[2]
        a_ij = value[3]
        for k = 1 to p:
            context.write((i, k) as key, ("A", j, a_ij) as value)
    else:
        j = value[1]
        k = value[2]
        b_jk = value[3]
        for i = 1 to m:
            context.write((i, k) as key, ("B", j, b_jk) as value)
Reducer Class
reduce(key, values):
// key is (i, k)
// values is a list of ("A", j, a_ij) and ("B", j, b_jk)
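The reducer body is not listed above. As a minimal illustration of what the reduce step computes, here is a plain-Python sketch; the function name reduce_matrix and the sample values are illustrative only and are not part of the Hadoop API:
# Sketch of the reduce step: for key (i, k), pair up A and B entries by j
# and sum the products a_ij * b_jk.
def reduce_matrix(key, values):
    a_row = {}   # j -> a_ij for entries tagged "A"
    b_col = {}   # j -> b_jk for entries tagged "B"
    for tag, j, v in values:
        if tag == "A":
            a_row[j] = v
        else:
            b_col[j] = v
    total = sum(a_row[j] * b_col[j] for j in a_row if j in b_col)
    return key, total

# Example call for key (0, 0) with two A entries and two B entries:
print(reduce_matrix((0, 0), [("A", 0, 1.0), ("B", 0, 1.0), ("A", 1, 2.0), ("B", 1, 3.0)]))
# prints: ((0, 0), 7.0)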
Input
The input file has one line of the following format for each non-zero element m_{ij} of a
matrix M:
<M><i><j><m_ij>
Suppose
The input file that represents A and B has the following lines:
A,0,1,1.0
A,0,2,2.0
............ ..........
B,0,1,1.0
B,0,2,2.0
Output
The output file has one line of the following format for each non-zero element m_{ij} of a
matrix M:
<i><j><m_ij>
In our example, the output file that represents AB should have the following lines:
0,0,90.0
0,1,100.0
0,2,110.0
1,0,240.0
1,1,275.0
1,2,310.0
Experiment -4
Objective: Write a program to check whether an input number is a prime
number or not.
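A minimal Python sketch for this objective, using simple trial division (the function name is illustrative):
# Check whether a number is prime by trial division up to its square root.
def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

num = int(input("Enter a number: "))
if is_prime(num):
    print(num, "is a prime number")
else:
    print(num, "is not a prime number")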
In the MapReduce word count example, we find out the frequency of
each word. Here, the role of the Mapper is to map the keys to the existing
values and the role of the Reducer is to aggregate the values of common
keys. So, everything is represented in the form of key-value pairs.
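As a plain-Python illustration of this key-value flow (only a sketch of the idea, not the Hadoop API used in the listings below):
# Map step: emit a (word, 1) pair for every word in a line of text.
def word_count_map(line):
    return [(word, 1) for word in line.split()]

# Reduce step: sum the counts collected for one word.
def word_count_reduce(word, counts):
    return word, sum(counts)

pairs = word_count_map("big data is big")
grouped = {}
for word, count in pairs:          # group values by key (the shuffle/sort stage)
    grouped.setdefault(word, []).append(count)
for word, counts in grouped.items():
    print(word_count_reduce(word, counts))
# prints ('big', 2), ('data', 1), ('is', 1) on separate lines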
Pre-requisite
o Java Installation - Check whether Java is installed or not
using the following command.
java -version
o Hadoop Installation - Check whether Hadoop is installed or
not using the following command.
hadoop version
If either of them is not installed on your system, follow the link below to
install it.
www.javatpoint.com/hadoop-installation
Steps to execute MapReduce word count example
o Create a text file in your local machine and write some
text into it.
$ nano data.txt
o Check the text written in the data.txt file.
$ cat data.txt
In this example, we find out the frequency of each word that exists in this text file.
File: WC_Mapper.java
package com.javatpoint;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
// Mapper: emits (word, 1) for every token in the input line.
public class WC_Mapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,IntWritable>{
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(LongWritable key, Text value, OutputCollector<Text,IntWritable> output,
                    Reporter reporter) throws IOException{
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()){
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
File: WC_Reducer.java
package com.javatpoint;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
// Reducer: sums the counts emitted for each word.
public class WC_Reducer extends MapReduceBase implements Reducer<Text,IntWritable,Text,IntWritable>{
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text,IntWritable> output,
                       Reporter reporter) throws IOException{
        int sum = 0;
        while (values.hasNext()){
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
File: WC_Runner.java
package com.javatpoint;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
// Driver: configures and submits the word count job.
public class WC_Runner {
    public static void main(String[] args) throws IOException{
        JobConf conf = new JobConf(WC_Runner.class);
        conf.setJobName("WordCount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(WC_Mapper.class);
        conf.setCombinerClass(WC_Reducer.class);
        conf.setReducerClass(WC_Reducer.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
Experiment - 5
Objective: Write a recursive function to print the Fibonacci series.
Experiment /code :
# Python function to calculate the nth term of the Fibonacci series.
def recur_fibo(n):
    if n <= 1:
        return n
    else:
        return recur_fibo(n-1) + recur_fibo(n-2)

nterms = int(input("How many terms you want? "))
if nterms <= 0:
    print("Please enter a positive integer.")
else:
    print("Fibonacci sequence-->")
    for i in range(nterms):
        print(recur_fibo(i))
Output :
How many terms you want? 7
Fibonacci sequence-->
0
1
1
2
3
5
8
Experiment -6
Objective: Write a recursive function to find the factorial of a natural
number.
Experiment /code:
# Python function to calculate the factorial of a number.
def fac(n):
    if n == 1:
        return 1
    else:
        return n * fac(n-1)

num = int(input("Enter a number to factorial. "))
if num < 0:
    print("Factorial is not defined for negative numbers.")
elif num == 0:
    print("Factorial of 0 is 1")
else:
    print("Factorial of", num, "is", fac(num))
Output:
Enter a number to factorial. 5
Factorial of 5 is 120
Experiment – 7
Objective: Write a Python function pwr(num, pr) to calculate num raised to
the power pr.
Experiment /code :
def pwr(base, power):
    result = 1
    for i in range(power):
        result *= base
    return result

# Take base and exponent as input to calculate the power.
base = int(input("Enter base number. "))
exp = int(input("Enter exponent. "))
print(f"The calculated answer is {pwr(base, exp)}")
Output:
Enter base number. 2
Enter exponent. 3
The calculated answer is 8
Experiment -8
Objective : Write a python code to find the sum of all elements of a
list.
Experiment /code :
total = 0
li = [1, 2, 3, 4, 5]
for i in li:
    total += i
print("The sum of all list elements:", total)
Output:
The sum of all list elements: 15
Experiment-09
Objective: Write a python program to implement a stack and queue
using a list data-structure
Stack implementation
Experiment /code:
stack = []
while True:
    op = input("""Choose any operation.
1.PUSH
2.POP
3.SHOW
4.OVERFLOW
5.UNDERFLOW
6.EXIT
""")
    if op == "1":
        if len(stack) == 10:
            print("----> Stack overflow.")
            continue
        ele = input("Enter an element to push: ")
        stack.insert(0, ele)
        print(f"----> Element {ele} added.")
    elif op == "2":
        if len(stack) == 0:
            print("----> Stack underflow.")
            continue
        stack.pop(0)
        print("----> Element popped.")
    elif op == "3":
        print("----> Stack:", stack)
    elif op == "4":
        if len(stack) == 10:
            print("----> Stack is overflowing.")
        else:
            print("----> Stack is not overflowing.")
    elif op == "5":
        if len(stack) == 0:
            print("----> Stack is underflowing.")
        else:
            print("----> Stack is not underflowing.")
    elif op == "6":
        print("Thank you.")
        break
Output:-
Queue Implementation
Experiment/ Code:
queue = []
while True:
    op = input("""Choose any operation.
1.PUSH
2.POP
3.SHOW
4.OVERFLOW
5.UNDERFLOW
6.EXIT
""")
    if op == "1":
        if len(queue) == 10:
            print("----> Queue overflow.")
            continue
        ele = input("Enter an element to push: ")
        queue.insert(0, ele)
        print(f"----> Element {ele} added.")
    elif op == "2":
        if len(queue) == 0:
            print("----> Queue underflow.")
            continue
        queue.pop(-1)
        print("----> Element popped.")
    elif op == "3":
        print("----> Queue:", queue)
    elif op == "4":
        if len(queue) == 10:
            print("----> Queue is overflowing.")
        else:
            print("----> Queue is not overflowing.")
    elif op == "5":
        if len(queue) == 0:
            print("----> Queue is underflowing.")
        else:
            print("----> Queue is not underflowing.")
    elif op == "6":
        print("Thank You.")
        break
Output:
EXPERIMENT - 10
Objective: Write a Python program to perform linear and binary
search on list elements.
Experiment/ Code For Linear Search:
# Function to implement linear search
def l_search(list, n):
    for i in list:
        if i == n:
            return True
    return False

list = [5, 8, 4, 6, 9, 2]   # sample list (illustrative values)
n = 4                       # element to search for
if l_search(list, n):
    print("Element Found.")
else:
    print("Element not Found.")
Output:
Experiment/ Code For Binary Search:
def binary_search(arr, low, high, x):
    if high < low:
        return -1
    mid = (high + low) // 2
    if arr[mid] == x:
        return mid
    elif arr[mid] > x:
        return binary_search(arr, low, mid - 1, x)
    else:
        return binary_search(arr, mid + 1, high, x)
arr = [2, 3, 4, 10, 40]   # sample sorted list (illustrative values)
x = 10
# Function call
result = binary_search(arr, 0, len(arr) - 1, x)
if result != -1:
    print("Element is present at index", result)
else:
    print("Element is not present in the array")
Output: