BDA Manual
AIM:
To download and install Hadoop; to understand the different Hadoop modes, startup scripts and configuration files.
PREREQUISITES:
• VIRTUALBOX (for Linux): used to install a Linux operating system as a virtual machine.
• OPERATING SYSTEM: Hadoop can be installed on Windows or Linux based operating systems; Ubuntu and CentOS are very commonly used.
• JAVA: the Java 8 JDK must be installed on your system.
• HADOOP: the latest stable Hadoop release (this manual uses Hadoop 3.3.0).
1. Install Java
• Java JDK download link:
https://www.oracle.com/java/technologies/javase-jdk8-downloads.html
• Extract and install Java in C:\Java
• Open cmd and run: javac -version
2. Download Hadoop
https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
• Extract the archive so that the Hadoop folder is at C:\Hadoop-3.3.0 (the path used in the later steps)
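Before editing the configuration files, Hadoop needs the JAVA_HOME and HADOOP_HOME environment variables. A minimal sketch for the install locations used above (the exact JDK folder name is illustrative):
setx JAVA_HOME "C:\Java\jdk1.8.0_202"
setx HADOOP_HOME "C:\Hadoop-3.3.0"
Then add %JAVA_HOME%\bin, %HADOOP_HOME%\bin and %HADOOP_HOME%\sbin to the Path variable (System Properties -> Environment Variables).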
Paste the following XML into core-site.xml (under C:\Hadoop-3.3.0\etc\hadoop) and save:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Similarly, hdfs-site.xml (in the same folder) holds the DataNode storage directory:
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop-3.3.0/data/datanode</value>
</property>
</configuration>
6. Hadoop Configurations
Download:
https://github.com/brainmentorspvtltd/BigData_RDE/blob/master/Hadoop%20Configuration.zip
or (for Hadoop 3):
https://github.com/s911415/apache-hadoop-3.1.0-winutils
• Copy the bin folder from the download and replace the existing bin folder at C:\Hadoop-3.3.0\bin
• Format the NameNode
• Open cmd and type the command "hdfs namenode -format"
7. Testing
• Open cmd and change directory to C:\Hadoop-3.3.0\sbin
• Type start-all.cmd to start the Hadoop daemons
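To verify that the daemons have started, the jps command (shipped with the JDK) can be run in a new cmd window; for a default single-node setup it should list at least the NameNode, DataNode, ResourceManager and NodeManager processes:
jps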
EXP.NO:2 HADOOP IMPLEMENTATION OF FILE MANAGEMENT TASKS, SUCH AS ADDING FILES AND DIRECTORIES, RETRIEVING FILES AND DELETING FILES
DATE:
AIM:
To implement the following file management tasks in Hadoop:
1. Adding files and directories
2. Retrieving files
3. Deleting Files
1. Create a directory in HDFS at the given path(s).
Usage:
hadoop fs -mkdir <paths>
Example:
hadoop fs -mkdir /user/saurzcode/dir1 /user/saurzcode/dir2
2. Copy a file in HDFS from source to destination.
Usage:
hadoop fs -cp <source URI> <destination URI>
Example:
hadoop fs -cp /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2
3. copyFromLocal
Usage:
hadoop fs -copyFromLocal <localsrc> URI
Example:
hadoop fs -copyFromLocal /home/saurzcode/abc.txt /user/saurzcode/abc.txt
Similar to the put command, except that the source is restricted to a local file reference.
copyToLocal
Usage:
hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>
Similar to the get command, except that the destination is restricted to a local file reference.
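An example, following the same illustrative paths used above:
hadoop fs -copyToLocal /user/saurzcode/abc.txt /home/saurzcode/abc.txt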
4. Remove a file or directory in HDFS.
Removes the files specified as arguments. Deletes a directory only when it is empty.
Usage:
hadoop fs -rm <arg>
Example:
hadoop fs -rm /user/saurzcode/dir1/abc.txt
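If a directory is not empty, the recursive form removes it together with its contents (illustrative path):
hadoop fs -rm -r /user/saurzcode/dir2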
RESULT:
Thus the file management tasks in Hadoop (adding files and directories, retrieving files, and deleting files) were implemented successfully.
EXP.NO:3 IMPLEMENTATION OF MATRIX MULTIPLICATION WITH HADOOP MAP REDUCE
DATE:
AIM:
To write a Map Reduce Program that implements Matrix Multiplication.
ALGORITHM:
We assume that the input matrices are already stored in Hadoop Distributed File System
(HDFS) in a suitable format (e.g., CSV, TSV) where each row represents a matrix element. The
matrices are compatible for multiplication (the number of columns in the first matrix is equal
to the number of rows in the second matrix).
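For instance, a 2x2 matrix whose elements are stored one per line in row,column,value form (the format the mapper below parses) would look like:
0,0,1
0,1,2
1,0,3
1,1,4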
STEP 1: MAPPER
The mapper will take the input matrices and emit key-value pairs for each element in
the result matrix. The key will be the (row, column) index of the result element, and the value
will be the corresponding element value.
STEP 2: REDUCER
The reducer will take the key-value pairs emitted by the mapper and calculate the partial
sum for each element in the result matrix.
PROGRAM:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.fs.Path;

// MatrixMultiplicationMapper.java
public class MatrixMultiplicationMapper extends Mapper<LongWritable, Text, Text, Text> {
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Parse the input line to get the row, column, and value of each element in the input matrices
        String[] elements = value.toString().split(",");
        int row = Integer.parseInt(elements[0]);
        int col = Integer.parseInt(elements[1]);
        int val = Integer.parseInt(elements[2]);
        // Emit key-value pairs where the key is the (row, column) index of the result element
        // and the value is the corresponding element value
        context.write(new Text(row + "," + col), new Text(String.valueOf(val)));
    }
}

// MatrixMultiplicationReducer.java
public class MatrixMultiplicationReducer extends Reducer<Text, Text, Text, IntWritable> {
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int result = 0;
        for (Text value : values) {
            // Accumulate the partial sum for the result element
            result += Integer.parseInt(value.toString());
        }
        // Emit the final result for the result element
        context.write(key, new IntWritable(result));
    }
}

// MatrixMultiplicationDriver.java
public class MatrixMultiplicationDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Matrix Multiplication");
        job.setJarByClass(MatrixMultiplicationDriver.class);
        job.setMapperClass(MatrixMultiplicationMapper.class);
        job.setReducerClass(MatrixMultiplicationReducer.class);
        // The mapper emits Text keys and values; the reducer emits Text keys and IntWritable values
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
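A hedged example of how the job might be run, assuming the three classes above are packaged into a jar named matrixmultiplication.jar (the jar name and the HDFS paths are illustrative):
hadoop jar matrixmultiplication.jar MatrixMultiplicationDriver /user/saurzcode/matrix/input /user/saurzcode/matrix/output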
EXP.NO:4 RUN A BASIC WORD COUNT MAP REDUCE PROGRAM
TO UNDERSTAND MAP REDUCE PARADIGM
DATE:
AIM:
To write a Basic Word Count program to understand Map Reduce Paradigm.
ALGORITHM:
The entire MapReduce program can be fundamentally divided into three parts:
• Mapper Phase Code
• Reducer Phase Code
• Driver Code
Reducer Phase:
Input:
• The key is nothing but the unique words generated after the sorting and shuffling phase: Text
• The value is a list of integers corresponding to each key: IntWritable
• Example - Bear, [1, 1], etc.
Output:
• The key is all the unique words present in the input text file: Text
• The value is the number of occurrences of each unique word: IntWritable
• Example - Bear, 2; Car, 3, etc.
• The values in the list corresponding to each key are aggregated to produce the final answer.
• In general, a single reduce call is made for each unique key, but you can specify the number of reducers in mapred-site.xml.
PROGRAM:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.fs.Path;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the input line into individual words
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // Emit each word with a count of 1
                value.set(tokenizer.nextToken());
                context.write(value, new IntWritable(1));
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum up all the counts emitted for this word
            int sum = 0;
            for (IntWritable x : values) {
                sum += x.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "My Word Count Program");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path outputPath = new Path(args[1]);
        // Configuring the input/output path from the file system into the job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);
        // Deleting the output path automatically from HDFS so that we don't have to delete it explicitly
        outputPath.getFileSystem(conf).delete(outputPath, true);
        // Exiting the job only if the flag value becomes false
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Run the MapReduce code:
The command for running a MapReduce code is:
hadoop jar hadoop-mapreduce-example.jar WordCount /sample/input /sample/output
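Once the job completes, the word counts can be inspected from the reducer output file in HDFS (part-r-00000 is the standard name of the first reducer's output file):
hadoop fs -cat /sample/output/part-r-00000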
OUTPUT:
EXP.NO:5
INSTALLATION OF HIVE ALONG WITH PRACTICE EXAMPLES.
DATE:
AIM:
To install HIVE along with practice examples.
PREREQUISITES:
• Java Development Kit (JDK) installed and the JAVA_HOME environment variable
set.
• Hadoop installed and configured on your Windows system.
STEP-BY-STEP INSTALLATION:
1. Download HIVE:
Visit the Apache Hive website and download the latest stable version of Hive.
Official Apache Hive website: https://hive.apache.org/
2. Extract the Downloaded Hive Archive to a Directory on Your Windows Machine,
e.g., C:\hive.
3. Configure Hive:
• Open the Hive configuration file (hive-site.xml) located in the conf folder of the
extracted Hive directory.
• Set the necessary configurations, such as the Hive Metastore connection settings and Hadoop configurations, adjusting the paths for Windows as needed (a minimal sketch follows).
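A minimal sketch of such settings, assuming the embedded Derby Metastore and a local warehouse directory (both values are illustrative, not taken from this manual):
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
</configuration>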
4. Environment Variables Setup:
• Add the Hive binary directory (C:\hive\bin in this example) to your PATH environment
variable.
• Set the HIVE_HOME environment variable to point to the Hive installation directory
(C:\hive in this example).
5. Start the Hive Metastore service:
To start the Hive Metastore service, you can use the schematool script:
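A hedged example, assuming the embedded Derby Metastore: the schema is initialized once with schematool, after which the Metastore service can be started:
schematool -dbType derby -initSchema
hive --service metastore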
6. Start Hive:
• Open a command prompt or terminal and navigate to the Hive installation directory.
• Execute the hive command to start the Hive shell.
EXAMPLES:
1. Create a Database:
To create a new database in HIVE, use the following syntax:
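CREATE DATABASE database_name;
Example (the name matches the database used in the next step):
CREATE DATABASE mydatabase;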
2. Use a Database:
To use a specific database in HIVE, use the following syntax:
USE database_name;
Example:
USE mydatabase;
3. Show Databases:
To display a list of available databases in HIVE, use the following syntax:
SHOW DATABASES;
4. Create a Table:
To create a table in HIVE, use the following syntax:
CREATE TABLE table_name (
column1 datatype,
column2 datatype
);
Example:
CREATE TABLE mytable
( id INT,
name STRING,
age INT
);
5. Show Tables:
To display a list of tables in the current database, use the following syntax:
SHOW TABLES;
6. Describe a Table:
To view the schema and details of a specific table, use the following syntax:
DESCRIBE table_name;
Example:
DESCRIBE mytable;
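7. Insert and Query Data:
As a further practice example, rows can be inserted into and queried from the table created above (the values are illustrative; INSERT ... VALUES requires Hive 0.14 or later):
INSERT INTO TABLE mytable VALUES (1, 'Alice', 25);
INSERT INTO TABLE mytable VALUES (2, 'Bob', 30);
SELECT * FROM mytable WHERE age > 26;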
RESULT:
Thus the Installation of HIVE was done successfully.
EXP.NO:6 INSTALLATION OF THRIFT
DATE:
AIM:
To install Apache thrift on Windows OS.
ALGORITHM:
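On Windows, Apache Thrift is typically installed from the prebuilt compiler binary; a minimal hedged sketch (the version number and folder are illustrative):
1. Download the prebuilt Windows compiler (thrift-0.x.y.exe) from https://thrift.apache.org/download
2. Rename the file to thrift.exe and place it in a folder that is on the PATH (for example C:\thrift).
3. Open cmd and verify the installation with: thrift -version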
EXP.NO:7 PRACTICE IMPORTING AND EXPORTING DATA FROM
VARIOUS DATABASES.
DATE:
AIM:
To import and export data from various Databases using SQOOP.
ALGORITHM:
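To import a table from a relational database into HDFS, a hedged sketch with placeholder values (every angle-bracketed token must be filled in for the target database):
sqoop import \
--connect <JDBC_CONNECTION_STRING> \
--username <DB_USER> \
--password <DB_PASSWORD> \
--table <TABLE_NAME> \
--target-dir <HDFS_IMPORT_DIR>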
To export data from HDFS into a database table:
sqoop export \
--connect <JDBC_CONNECTION_STRING> \
--username <DB_USER> \
--password <DB_PASSWORD> \
--table <TABLE_NAME> \
--export-dir <HDFS_EXPORT_DIR> \
--input-fields-terminated-by '<DELIMITER>'
RESULT:
Thus the import and export of data from various databases using SQOOP was implemented successfully.
EXP.NO:8 INSTALLATION OF HBASE ALONG WITH PRACTICE EXAMPLES
DATE:
AIM:
To install HBASE using Virtual Machine and perform some operations in HBASE.
ALGORITHM:
Step 1: Install a Virtual Machine
• Download and install a virtual machine software such as VirtualBox
(https://www.virtualbox.org/) or VMware (https://www.vmware.com/).
• Create a new virtual machine and install a Unix-based operating system like Ubuntu or
CentOS. You can download the ISO image of your desired Linux distribution from their
official websites.
Step 2: Download and Install HBase
• Move the extracted HBase directory to a desired location:
sudo mv <hbase_extracted_directory> /opt/hbase
• Replace <hbase_extracted_directory> with the actual name of the extracted HBase directory.
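Before tables can be created, HBase has to be started and its shell opened; a minimal sketch assuming the standalone installation at /opt/hbase from the step above:
/opt/hbase/bin/start-hbase.sh
/opt/hbase/bin/hbase shell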
Step 3: Create a Table
• In the HBase shell, you can create a table with column families.
• For example, let's create a table named "my_table" with a column family called "cf":
create 'my_table', 'cf'
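A few follow-up operations on the table just created, as a hedged sketch of further practice (the row key, column qualifier and values are illustrative):
put 'my_table', 'row1', 'cf:name', 'Alice'
get 'my_table', 'row1'
scan 'my_table'
list
disable 'my_table'
drop 'my_table'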
RESULT:
Thus the installation of HBase using Virtual Machine was done successfully.