BDA CIA 2 IMP Questions
Two marks
db.students.updateOne(
{ name: "John Doe" }, // Filter condition
{ $set: { age: 23 } } // Update operation
);
Example: Updating Multiple Documents
db.students.updateMany(
{ course: "Computer Science" }, // Filter condition
{ $set: { status: "Graduated" } } // Update operation
);
8. Infer how you can manage compute node failures in Hadoop.
• Hadoop detects a failed compute node when it stops sending periodic heartbeats to the master (JobTracker/ResourceManager).
• Tasks that were running or had completed on the failed node are automatically rescheduled and re-executed on other healthy nodes.
• Because input data is replicated in HDFS (default replication factor 3), a rescheduled task can read its input split from another replica.
• Speculative execution launches backup copies of unusually slow tasks so that a struggling node does not delay the whole job.
Primitive Data Types: These are the most common data types in Hive, used for simple data
values.
Complex Data Types: These types can store more structured data.
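Both categories can be illustrated in a single CREATE TABLE statement; the table and column names below are illustrative, not from the notes:

```sql
CREATE TABLE employee_profile (
  emp_id   INT,                           -- primitive: integer
  emp_name STRING,                        -- primitive: text
  salary   FLOAT,                         -- primitive: floating point
  skills   ARRAY<STRING>,                 -- complex: ordered list of values
  contacts MAP<STRING, STRING>,           -- complex: key-value pairs
  address  STRUCT<city:STRING, pin:INT>   -- complex: named nested fields
);
```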
15. Relate how to read external data into R from different file formats
JSON Files
• Function: jsonlite::fromJSON()
• Example: library(jsonlite)
data <- fromJSON("file.json")
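CSV Files
• Function: read.csv() (base R, no package needed)
• A minimal runnable sketch using a temporary file; the sample names and scores are illustrative:

```r
# Create a small sample CSV in a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Asha,90", "Ravi,85"), path)

# Read it back into a data frame
data <- read.csv(path)
print(data)

# Related functions: read.delim() for tab-separated files,
# readxl::read_excel() for Excel files (needs the readxl package)
```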
16. Highlight the key differences between MapReduce and Apache Pig.
• MapReduce requires writing low-level Java code; Pig uses the high-level Pig Latin scripting language.
• Pig scripts are much shorter: a few lines of Pig Latin can replace many lines of equivalent MapReduce code.
• Pig Latin scripts are compiled into MapReduce jobs internally, so Pig runs on top of MapReduce.
• MapReduce offers fine-grained control and performance tuning; Pig is better suited for rapid, ad-hoc data analysis.
• Pig provides built-in operators such as JOIN, FILTER, and GROUP; in MapReduce these must be hand-coded.
19. Write a program in R language to print prime number from 1 to given number.
While Loop
• Repeats a block of code as long as a specified condition is TRUE.
• Syntax:
while (condition) {
# code to be executed
}
Repeat Loop
• Executes an infinite loop until a break condition is met.
• Syntax:
repeat {
# code to be executed
if (condition) break
}
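Using these loop constructs, one possible answer to Question 19 (printing primes from 1 to a given number) is sketched below with a while loop; the helper name is_prime is my own, not from the notes:

```r
# Check whether a single number is prime by trial division
is_prime <- function(x) {
  if (x < 2) return(FALSE)
  limit <- floor(sqrt(x))
  if (limit >= 2) {
    for (i in 2:limit) {
      if (x %% i == 0) return(FALSE)
    }
  }
  TRUE
}

# Collect and print all primes from 1 to n using a while loop
print_primes <- function(n) {
  primes <- c()
  i <- 2
  while (i <= n) {
    if (is_prime(i)) primes <- c(primes, i)
    i <- i + 1
  }
  print(primes)
  invisible(primes)
}

print_primes(20)  # 2 3 5 7 11 13 17 19
```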
1. Explain the update and delete operations in MongoDB Query Language.
db.collection.updateOne(
<filter>,
<update>,
{ <options> }
);
Example:
db.employees.updateOne(
{ _id: 1 }, // Filter: match the document with _id 1
{ $set: { salary: 60000 } } // Update: set the salary field to 60000
);
updateMany():
• Updates multiple documents that match the filter criteria.
Syntax:
db.collection.updateMany(
<filter>,
<update>,
{ <options> }
);
Example:
db.employees.updateMany(
{ department: "HR" }, // Filter: match all documents where department is "HR"
{ $set: { salary: 50000 } } // Update: set the salary field to 50000 for all matching employees
);
replaceOne():
• Replaces the entire document that matches the filter with a new document.
Syntax:
db.collection.replaceOne(
<filter>,
<replacement>,
{ <options> }
);
Example:
db.employees.replaceOne(
{ _id: 1 }, // Filter: match the document with _id 1
{ _id: 1, name: "John Doe", department: "Engineering", salary: 70000 } // Replace the document entirely
);
Update Operators:
MongoDB provides several operators that can be used with update operations:
• $set: Sets the value of a field.
• $inc: Increments the value of a field.
• $push: Adds an element to an array.
• $addToSet: Adds an element to an array only if it doesn't already exist.
• $unset: Removes a field from a document.
• $rename: Renames a field.
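Several of these operators can be combined in one update call; in this mongosh sketch the skills array and dept field are illustrative, not from the notes, and a running MongoDB instance is assumed:

```javascript
db.employees.updateOne(
  { _id: 1 },
  {
    $inc:  { salary: 5000 },         // increase salary by 5000
    $push: { skills: "SQL" },        // append "SQL" to the skills array
    $rename: { dept: "department" }  // rename the dept field
  }
);
```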
deleteOne():
• Deletes the first document that matches the filter criteria.
Syntax: db.collection.deleteOne(<filter>);
Example: db.employees.deleteOne({ _id: 1 });
deleteMany():
• Deletes multiple documents that match the filter criteria.
Syntax: db.collection.deleteMany(<filter>);
Example: db.employees.deleteMany({ department: "HR" });
Program in MongoDB Using the Aggregate Function to Calculate Total Sales Revenue for
Each Product Category
In this example, we will calculate the total sales revenue for each product category in a
MongoDB collection named sales. The sales collection contains documents with information
about the products sold, their prices, quantities, and categories.
Collection Structure (sales):
{
"_id": ObjectId("..."),
"product_name": "Laptop",
"category": "Electronics",
"price": 1000,
"quantity": 10
}
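The pipeline itself groups documents by category and sums price × quantity per document using $group, $sum, and $multiply. A mongosh sketch against the sales collection above (requires a running MongoDB instance):

```javascript
db.sales.aggregate([
  {
    $group: {
      _id: "$category",  // Group documents by product category
      total_revenue: { $sum: { $multiply: ["$price", "$quantity"] } }  // Sum price * quantity
    }
  },
  { $sort: { total_revenue: -1 } }  // Optional: highest revenue first
]);
```

For the sample document shown above, the Electronics group would accumulate 1000 × 10 = 10000 from that one sale.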
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
// Mapper class: emits (word, 1) for every token in the input line
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) context.write(new Text(itr.nextToken()), new IntWritable(1));
}
}
// Reducer class: sums the counts emitted for each word
public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) sum += val.get();
context.write(key, new IntWritable(sum));
}
}
R Program Code:
# Function to calculate factorial of a number using iterative approach
factorial_iterative <- function(n) {
# Initialize the result variable to 1
result <- 1
# Multiply result by every integer from 1 up to n
for (i in seq_len(n)) result <- result * i
return(result)
}
print(factorial_iterative(5)) # prints 120
10. Write an R program to perform data visualization using the ggplot2 library.
# Step 1: Load ggplot2 and create a sample data frame
library(ggplot2)
data <- data.frame(
Category = c("A", "B", "C"),
Value = c(10, 25, 15) # sample values; any numbers work
)
# Step 2: Create a Bar Plot to visualize the 'Value' for each 'Category'
ggplot(data, aes(x = Category, y = Value, fill = Category)) +
geom_bar(stat = "identity") +
ggtitle("Bar Plot of Category vs Value") +
xlab("Category") +
ylab("Value") +
theme_minimal()
2. Creating a Table
In Hive, you can create a table with a defined schema to store data.
• Command: CREATE TABLE
Syntax:
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
...
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY 'delimiter';
Example:
CREATE TABLE employee_details (
emp_id INT,
emp_name STRING,
emp_age INT,
emp_salary FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
5. Dropping a Table
To remove a table from Hive, use the DROP TABLE command.
• Command: DROP TABLE
Syntax:
DROP TABLE table_name;
Example:
DROP TABLE employee_details;
6. Altering a Table
You can alter a table's schema by adding, modifying, or dropping columns.
• Command: ALTER TABLE
Syntax:
ALTER TABLE table_name ADD COLUMNS (column_name datatype);
Example:
ALTER TABLE employee_details ADD COLUMNS (emp_department STRING);
7. Dropping a Database
To remove a database from Hive, use the DROP DATABASE command. The database must be
empty to drop it.
• Command: DROP DATABASE
Syntax:
DROP DATABASE database_name;
Example:
DROP DATABASE employee_db;
-- To drop a non-empty database, append CASCADE:
DROP DATABASE employee_db CASCADE;
8. Listing Tables
You can list all tables in the current database.
• Command: SHOW TABLES
Syntax:
SHOW TABLES;
Example:
SHOW TABLES;
9. Describing a Table
To view the schema of a table, use the DESCRIBE command.
• Command: DESCRIBE
Syntax:
DESCRIBE table_name;
Example:
DESCRIBE employee_details;
13. Elaborate the Mapper and Reducer task with a neat sketch.
Elaboration of Mapper and Reducer Task in MapReduce with Diagram
MapReduce is a distributed data processing framework used in Hadoop to handle large-scale
data. It consists of two key phases:
1. Mapper Phase – Processes input data and converts it into key-value pairs.
2. Reducer Phase – Aggregates and processes intermediate key-value pairs to generate
the final output.
1. Mapper Task
• The Mapper takes input data, processes it, and emits key-value pairs as intermediate
output.
• It runs in parallel on multiple nodes to increase efficiency.
Example: Word Count in a Document
• Input: A text file containing sentences.
• The Mapper reads the file and splits it into words.
• It emits each word with a count of 1.
Input File Content:
Hello Hadoop
Hello Big Data
Mapper Output (Key-Value Pairs):
(Hello, 1)
(Hadoop, 1)
(Hello, 1)
(Big, 1)
(Data, 1)
2. Reducer Task
• The Reducer takes the output from the Mapper, aggregates the values based on keys,
and produces the final result.
• It processes data after the shuffle & sort step, in which values with the same key are
grouped together.
Reducer Input (After Shuffling & Sorting):
(Big, [1])
(Data, [1])
(Hadoop, [1])
(Hello, [1,1])
Reducer Output (Final Word Count):
(Big, 1)
(Data, 1)
(Hadoop, 1)
(Hello, 2)
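The map, shuffle-and-sort, and reduce steps above can be simulated in plain Java (no Hadoop cluster needed) to verify the word counts; the class and variable names here are illustrative:

```java
import java.util.*;

public class WordCountSim {
    public static void main(String[] args) {
        String[] lines = {"Hello Hadoop", "Hello Big Data"};
        // Map phase: emit a (word, 1) pair for every token
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines)
            for (String word : line.split("\\s+"))
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
        // Shuffle & sort: group values by key, with keys in sorted order
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        // Reduce phase: sum the grouped values for each key
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            System.out.println("(" + e.getKey() + ", " + sum + ")");
        }
    }
}
```

Running this prints the same four pairs as the Reducer output table above: (Big, 1), (Data, 1), (Hadoop, 1), (Hello, 2).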
2. MapReduce Implementation
Mapper Class
• Reads input records.
• Emits student name as the key and the rest of the record as the value.
Reducer Class
• Since keys are automatically sorted by Hadoop during the shuffle phase, the Reducer
simply outputs them in sorted order.
MapReduce Java Program for Sorting by Name
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class StudentSort {
// Mapper Class
public static class NameMapper extends Mapper<Object, Text, Text, Text> {
public void map(Object key, Text value, Context context) throws IOException,
InterruptedException {
String[] fields = value.toString().split(","); // Split by comma
if (fields.length == 3) {
String studentName = fields[1].trim(); // Name as key
String studentData = fields[0] + "," + fields[2]; // StudentID, Marks as value
context.write(new Text(studentName), new Text(studentData));
}
}
}
// Reducer Class
public static class NameReducer extends Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException,
InterruptedException {
for (Text value : values) {
context.write(key, value); // Output sorted by key (student name)
}
}
}
// Driver Code
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Sort Students by Name");
job.setJarByClass(StudentSort.class);
job.setMapperClass(NameMapper.class);
job.setReducerClass(NameReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
// Mapper for keyword search (class header and field restored for completeness; class name assumed)
public static class KeywordSearchMapper extends Mapper<Object, Text, Text, Text> {
private String searchKeyword;
@Override
protected void setup(Context context) {
Configuration conf = context.getConfiguration();
searchKeyword = conf.get("keyword"); // Get keyword from configuration
}
public void map(Object key, Text value, Context context) throws IOException,
InterruptedException {
String line = value.toString();
if (line.contains(searchKeyword)) { // Check if line contains the keyword
context.write(new Text("Matching Line:"), new Text(line));
}
}
}