BDA CIA 2 IMP Questions

MODULE 2&3 QUESTION BANK

Two marks

1. Point out Replication and scaling features of MongoDB.

Replication and Scaling Features of MongoDB:


1. Replication: MongoDB uses Replica Sets, where multiple copies of data are
maintained across different servers. This ensures high availability and data
redundancy, allowing automatic failover in case of server failure.
2. Scaling: MongoDB supports Sharding, a horizontal scaling mechanism that
distributes data across multiple servers (shards). This enables the handling of large
datasets and high-throughput operations efficiently.
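As a quick illustration, here is a minimal mongo shell sketch of starting a replica set (the set name "rs0" and the host names are assumptions for illustration only):

rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "node1:27017" },
    { _id: 1, host: "node2:27017" },
    { _id: 2, host: "node3:27017" }
  ]
});
rs.status();   // verify the members and see which node was elected primary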

2. Show how sharding is done in big data

Sharding in Big Data


Sharding is the process of partitioning large datasets across multiple servers to ensure
efficient querying and load balancing. Here’s how it is done:
1. Choose a Shard Key: A field in the document is selected to distribute data (e.g., _id,
date, location).
2. Data Distribution: Data is divided into chunks based on the shard key and
distributed across multiple shards (servers).
3. Query Routing: A mongos router directs queries to the appropriate shard based on
the shard key.
4. Balancing: MongoDB’s balancer ensures an even data distribution across shards.
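A minimal shell sketch of these steps (the database name companyDB, collection orders, and shard key orderDate are illustrative assumptions):

sh.enableSharding("companyDB");                            // allow the database to be sharded
sh.shardCollection("companyDB.orders", { orderDate: 1 });  // distribute chunks by the chosen shard key
sh.status();                                               // inspect how chunks are balanced across shards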

3. Write about Data Locality in MapReduce.

Data Locality in MapReduce


Data Locality in MapReduce refers to the concept of processing data close to where it is
stored to reduce network congestion and improve performance.
• Hadoop sends computation to the node where the data resides instead of moving
large datasets across the network.
• This minimizes I/O overhead and network latency, making the MapReduce job
more efficient.
• The Hadoop Distributed File System (HDFS) ensures that data blocks are distributed
across nodes, allowing task trackers to assign map tasks to nodes storing the
required data.

4. Differentiate between MongoDB and traditional RDBMS in terms of data storage.

5. Generalize the term Record Reader/Writer.

Record Reader/Writer in Big Data


1. Record Reader: In Hadoop’s MapReduce, a Record Reader converts input data from
HDFS blocks into key-value pairs for processing by the Mapper. It reads raw data
and splits it logically rather than physically.
2. Record Writer: After processing, the Record Writer takes the key-value pairs
produced by the Reducer and writes them back to HDFS or another storage system
in the desired format.
6. How to insert a document in Mongo DB?

Inserting a Document in MongoDB


To insert a document into a MongoDB collection, use the insertOne() or insertMany()
method.
Example: Insert a Single Document
db.students.insertOne({
name: "John Doe",
age: 22,
course: "Computer Science"
});
Example: Insert Multiple Documents
db.students.insertMany([
{ name: "Alice", age: 21, course: "Mathematics" },
{ name: "Bob", age: 23, course: "Physics" }
]);

7. How to update data in a document in MongoDB using the update command, with a suitable example?

Updating Data in a MongoDB Document


In MongoDB, the updateOne() and updateMany() methods are used to update documents.
Example: Updating a Single Document

db.students.updateOne(
{ name: "John Doe" }, // Filter condition
{ $set: { age: 23 } } // Update operation
);
Example: Updating Multiple Documents
db.students.updateMany(
{ course: "Computer Science" }, // Filter condition
{ $set: { status: "Graduated" } } // Update operation
);
8. Infer how you can manage compute node failures in Hadoop.

Managing Compute Node Failures in Hadoop


Hadoop handles compute node failures through fault tolerance mechanisms:
1. Task Reassignment: When a node fails, the JobTracker (Hadoop v1) or
ResourceManager (Hadoop v2) reassigns the failed task to another healthy node.
2. Speculative Execution: Hadoop detects slow or failing tasks and runs duplicate tasks
on different nodes to ensure job completion.
3. Data Replication: Hadoop’s HDFS replicates data blocks across multiple nodes
(default 3 replicas) to prevent data loss if a node crashes.
4. Heartbeat Monitoring: Hadoop constantly monitors nodes using heartbeats; if a
node stops responding, it is marked as dead, and its tasks are reassigned.

9. Interpret the phases of Map and Reduce task.

Phases of Map and Reduce Task in Hadoop


1. Map Phase
• Input Splitting: The input data is split into chunks.
• Mapping: The Mapper processes each split and converts it into key-value pairs.
• Partitioning: Data is divided based on keys for efficient processing.
• Shuffling & Sorting: Intermediate key-value pairs are sorted and grouped by key.
2. Reduce Phase
• Grouping: The sorted data is grouped by key.
• Reducing: The Reducer processes grouped data to generate the final output.
• Writing Output: The final result is written back to HDFS.
10. Distinguish between Mapper and Reducer in a MapReduce job.

11. Analyze the role of Combiner in MapReduce programming.

Role of Combiner in MapReduce Programming


A Combiner in MapReduce is a local, mini-reducer that helps optimize performance by
reducing the amount of data transferred between the Map and Reduce phases.
1. Data Aggregation: The Combiner processes the output of the Mapper locally,
aggregating or summarizing data (e.g., summing values, counting occurrences)
before sending it to the Reducer.
2. Reduce Network Load: By applying a reduction operation on data at the Mapper
level, the Combiner reduces the amount of data that needs to be shuffled and
transferred over the network to the Reducer, improving overall efficiency.
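In a Hadoop job driver, the combiner is enabled with a single call, typically reusing a Reducer whose operation is associative and commutative (as in the word-count program later in this bank):

job.setCombinerClass(IntSumReducer.class); // run a local reduce on each Mapper's output before the shuffle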

12. Write a query to create and drop database in Hive.

Hive Query to Create and Drop a Database


Create a Database in Hive
CREATE DATABASE my_database;

Drop a Database in Hive


DROP DATABASE my_database;
13. Apply the concept of partitioning to distribute data in a MapReduce job.

Applying Partitioning to Distribute Data in a MapReduce Job


In MapReduce, partitioning is the process of distributing the output of the Mapper across
multiple Reducers based on the key. The partitioning ensures that all records with the
same key are sent to the same Reducer.
1. Partitioning Key: The partitioner uses the key of the key-value pair produced by the
Mapper to determine which Reducer will process the data.
2. Custom Partitioning: By default, the Hadoop partitioner uses a hash function to
assign keys to Reducers. However, you can implement a custom partitioner to
control data distribution, allowing for better load balancing and optimized
processing.
Example of Custom Partitioner:
public class MyCustomPartitioner extends Partitioner<Text, IntWritable> {
@Override
public int getPartition(Text key, IntWritable value, int numPartitions) {
// Example: Partition based on the first character of the key
return key.toString().charAt(0) % numPartitions;
}
}
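For the custom partitioner to take effect, it must be registered in the job driver; a brief sketch (the reducer count of 4 is an arbitrary illustration):

job.setPartitionerClass(MyCustomPartitioner.class); // route Mapper output through the custom partitioner
job.setNumReduceTasks(4); // numPartitions received by getPartition() equals the number of reducers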
14. Identify the different Hive datatypes and their uses.

Primitive Data Types: These are the most common data types in Hive, used for simple scalar
values, e.g., TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, BOOLEAN, DATE, and TIMESTAMP.
Complex Data Types: These types can store more structured or nested data, e.g., ARRAY, MAP,
STRUCT, and UNIONTYPE (see the sketch below).
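A short HiveQL sketch showing both categories in one table definition (the table and column names are illustrative only):

CREATE TABLE employee_profile (
  emp_id   INT,                          -- primitive
  emp_name STRING,                       -- primitive
  skills   ARRAY<STRING>,                -- complex: ordered list of values
  phone    MAP<STRING, STRING>,          -- complex: key-value pairs
  address  STRUCT<city:STRING, pin:INT>  -- complex: nested record
);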

15. Relate how to read external data into R from different file formats

Reading External Data into R from Different File Formats


CSV Files
o Function: read.csv()
o Example: data <- read.csv("file.csv")
Excel Files
• Function: readxl::read_excel()
• Example:
library(readxl)
data <- read_excel("file.xlsx")

Text Files (Tab-delimited)


• Function: read.table()
• Example: data <- read.table("file.txt", sep = "\t", header = TRUE)

JSON Files
• Function: jsonlite::fromJSON()
• Example: library(jsonlite)
data <- fromJSON("file.json")
16. Highlight the key differences between MapReduce and Apache Pig.

17. Give the Data types in Hive.

Here are some data types in Hive:


1. TINYINT
2. SMALLINT
3. INT
4. BIGINT
5. FLOAT
6. DOUBLE
7. STRING
8. BOOLEAN
9. DATE
10. TIMESTAMP
18. Evaluate the integration of R with Hadoop for data analysis.

Integration of R with Hadoop for Data Analysis


1. Data Processing at Scale: Hadoop provides a distributed environment for storing
and processing large datasets, while R is powerful for statistical analysis and data
visualization. By integrating the two, R can be used to analyze large datasets stored
in Hadoop's HDFS (Hadoop Distributed File System) efficiently.
2. Using RHadoop: RHadoop is a collection of R packages (like rmr2, rhdfs, rhbase)
that facilitate the integration of R with Hadoop. With RHadoop, R can access and
process data directly from HDFS, run MapReduce jobs, and analyze large-scale data
in a parallel and distributed manner.
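A minimal word-count sketch using the rmr2 package (assuming rmr2 is installed and the HADOOP_CMD/HADOOP_STREAMING environment variables are configured; the sample vector is made up for illustration):

library(rmr2)

# Put a small in-memory vector into HDFS for demonstration
words <- to.dfs(c("apple", "orange", "apple", "banana"))

# Express word count as a MapReduce job written entirely in R
wc <- mapreduce(
  input  = words,
  map    = function(k, v) keyval(v, 1),               # emit (word, 1)
  reduce = function(k, counts) keyval(k, sum(counts)) # sum the counts for each word
)

from.dfs(wc)  # read the (word, count) pairs back into R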

19. Write a program in R language to print prime number from 1 to given number.

# Function to check if a number is prime


is_prime <- function(n) {
  if (n <= 1) return(FALSE)
  if (n <= 3) return(TRUE)   # 2 and 3 are prime; also avoids a descending 2:sqrt(n) range
  for (i in 2:floor(sqrt(n))) {
    if (n %% i == 0) return(FALSE)
  }
  return(TRUE)
}

# Function to print prime numbers up to a given number


print_primes <- function(limit) {
for (i in 2:limit) {
if (is_prime(i)) {
print(i)
}
}
}

# Example: Print prime numbers from 1 to 30


print_primes(30)

20. Mention the Types of Iterative Programming in R.

Types of Iterative Programming in R


1. For Loop
o Used for repeating a block of code a fixed number of times.
o Syntax:
for (i in 1:10) {
  # code to be executed
}

2. While Loop
o Repeats a block of code as long as a specified condition is TRUE.
o Syntax:
while (condition) {
  # code to be executed
}

3. Repeat Loop
o Executes an infinite loop until a break condition is met (a short combined example follows).
o Syntax:
repeat {
  # code to be executed
  if (condition) break
}
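A short runnable illustration of all three loop forms (the values are arbitrary):

# for: sum the numbers 1 to 5
total <- 0
for (i in 1:5) total <- total + i

# while: halve a value until it drops below 1
x <- 10
while (x >= 1) x <- x / 2

# repeat: count iterations until a break condition is met
n <- 0
repeat {
  n <- n + 1
  if (n == 3) break
}
cat(total, x, n, "\n")  # prints: 15 0.625 3
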
1. Explain the update and delete operations in MongoDB Query Language.

Update and Delete Operations in MongoDB Query Language (MQL)


MongoDB provides powerful methods to update and delete data in a database. The update
and delete operations are essential for managing and maintaining data integrity. Below is an
explanation of both operations, their syntax, and common use cases.

1. Update Operations in MongoDB


The update operation in MongoDB modifies existing documents in a collection. It allows you to
change values of specific fields, add new fields, or remove existing ones.
Types of Update Operations:
1. updateOne():
o Updates one document that matches the filter criteria.
o If multiple documents match the filter, only the first one is updated.
Syntax:

db.collection.updateOne(
<filter>,
<update>,
{ <options> }
);

Example:
db.employees.updateOne(
{ _id: 1 }, // Filter: match the document with _id 1
{ $set: { salary: 60000 } } // Update: set the salary field to 60000
);

2. updateMany():
o Updates multiple documents that match the filter criteria.
Syntax:
db.collection.updateMany(
<filter>,
<update>,
{ <options> }
);
Example:
db.employees.updateMany(
{ department: "HR" }, // Filter: match all documents where department is "HR"
{ $set: { salary: 50000 } } // Update: set the salary field to 50000 for all matching employees
);
3. replaceOne():
o Replaces the entire document that matches the filter with a new document.
Syntax:
db.collection.replaceOne(
<filter>,
<replacement>,
{ <options> }
);

Example:
db.employees.replaceOne(
{ _id: 1 }, // Filter: match the document with _id 1
{ _id: 1, name: "John Doe", department: "Engineering", salary: 70000 } // Replace the document entirely
);

Update Operators:
MongoDB provides several operators that can be used with update operations (a combined example follows the list):
• $set: Sets the value of a field.
• $inc: Increments the value of a field.
• $push: Adds an element to an array.
• $addToSet: Adds an element to an array only if it doesn't already exist.
• $unset: Removes a field from a document.
• $rename: Renames a field.
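A brief sketch combining several of these operators in a single update on a hypothetical employees document (field names are illustrative):

db.employees.updateOne(
  { name: "John Doe" },
  {
    $inc:      { salary: 5000 },          // increase salary by 5000
    $push:     { projects: "Migration" }, // append an element to the projects array
    $addToSet: { skills: "MongoDB" },     // add only if not already present
    $unset:    { tempFlag: "" },          // remove the tempFlag field
    $rename:   { dept: "department" }     // rename the dept field
  }
);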

2. Delete Operations in MongoDB


The delete operation is used to remove documents from a collection. There are two main
methods for deleting documents in MongoDB.
Types of Delete Operations:
1. deleteOne():
o Deletes one document that matches the filter criteria.
Syntax: db.collection.deleteOne(<filter>);

Example: db.employees.deleteOne({ _id: 1 }); // Deletes the document with _id 1

2. deleteMany():
o Deletes multiple documents that match the filter criteria.
Syntax: db.collection.deleteMany(<filter>);
Example: db.employees.deleteMany({ department: "HR" });

2. Discuss the advantages of using MongoDB's JSON-like document structure for storing employee records over a traditional relational database schema.

Advantages of Using MongoDB's JSON-like Document Structure for Storing Employee Records Over a Traditional Relational Database Schema
MongoDB's JSON-like document structure (BSON - Binary JSON) offers several advantages
over traditional relational database schemas for storing employee records. These benefits
stem from the flexibility, scalability, and ease of use provided by MongoDB’s document-based
approach.
Here are the key advantages:

1. Flexible Schema Design


MongoDB:
• MongoDB uses a schemaless design, meaning each document (record) can have
different fields and structures.
• For employee records, this flexibility allows you to store various data without rigid
constraints. For example, some employee records might have additional information
such as multiple phone numbers or work locations, while others might not.
• This flexibility allows quick adaptation to changing business requirements, such as
adding new fields to employee records without needing to alter the database schema.
Traditional RDBMS:
• Relational databases use a fixed schema, where all rows in a table must conform to the
same structure (columns). Any changes, like adding or removing columns, often require
modifying the table structure, which can be cumbersome and disruptive.
Example: In MongoDB, you can store an employee’s skills as an array of strings, and the
contact info as a nested document, which is much more difficult to achieve in a relational
database.
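For instance, a single employee document might look like the following (field names and values are purely illustrative):

{
  "_id": 101,
  "name": "Jane Smith",
  "skills": ["Python", "MongoDB", "Spark"],
  "contact": {
    "email": "jane@example.com",
    "phones": ["+1-555-0100", "+1-555-0101"]
  },
  "work_history": [
    { "company": "Acme Corp", "role": "Analyst", "years": 2 },
    { "company": "Globex", "role": "Engineer", "years": 3 }
  ]
}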

2. Easier Data Representation (Complex Structures)


MongoDB:
• With its JSON-like structure, MongoDB can natively represent complex and nested
data, which is common for employee records. For example, you can easily represent
nested information such as addresses, work experience, or departments as sub-
documents within an employee record.
• MongoDB’s embedded documents and arrays allow storing employee's related
information (like past projects, previous job roles, and certifications) together within a
single record, leading to simpler data retrieval and better organization.
Traditional RDBMS:
• Representing complex relationships, such as an employee with multiple work locations
or a history of job positions, typically requires multiple related tables and JOIN
operations. This results in complicated queries and often affects performance.
• In relational databases, this requires normalization, which may result in multiple tables
and an increased need for foreign keys and JOINs.

3. Horizontal Scalability and Performance


MongoDB:
• MongoDB is designed to scale horizontally, which means it can handle large volumes of
employee data by distributing it across multiple servers using sharding.
• The document structure and the ability to store data in a denormalized way (reducing the
need for joins) allow for faster reads and writes, especially when dealing with large
datasets like employee records in large organizations.
• MongoDB provides better performance when storing large, unstructured, or semi-
structured data, making it easier to scale as the company grows.
Traditional RDBMS:
• Relational databases typically scale vertically, which means increasing the capacity of a
single server (adding CPU, RAM, storage, etc.). Scaling horizontally (across multiple
servers) requires complex sharding or partitioning, which can be difficult to implement
and manage.
• Also, in relational databases, normalization and JOIN operations can slow down
performance as the dataset grows, especially when handling many relationships
between employee data and other entities.

4. Real-Time Analytics and Aggregation


MongoDB:
• MongoDB’s aggregation framework allows for powerful, real-time analytics directly on
employee records, such as calculating the number of employees in different
departments, average salary by region, or sorting employees by their years of service.
• It enables ad-hoc queries, which means employees’ records can be queried and
analyzed without predefined schema restrictions, allowing for more flexible and faster
decision-making.
Traditional RDBMS:
• In a relational database, similar analytics often require complex JOIN operations and
may need pre-built views or materialized views to optimize performance. The queries are
more rigid due to the fixed schema and may not be as efficient as MongoDB’s
aggregation framework for large datasets.

5. NoSQL Model for Fast Development and Iteration


MongoDB:
• MongoDB’s flexible schema design facilitates faster application development, particularly
when you need to iterate quickly. As business needs change (e.g., new employee
benefits or tracking additional employee metrics), the schema can evolve without the
overhead of modifying the underlying database structure.
• MongoDB’s support for dynamic schemas allows developers to store different kinds of
information with varying structures, making it ideal for fast-paced development
environments, such as tech startups or companies experimenting with new HR systems.
Traditional RDBMS:
• In a relational model, developers must define the schema upfront and make changes
carefully. Altering the database schema for evolving requirements (e.g., adding a new
field for "remote work status") can involve significant overhead, including database
migrations and potential downtime.

6. High Availability with Replica Sets


MongoDB:
• MongoDB supports replica sets, which provide automatic failover and data redundancy.
This means employee records are always available even if one node fails, making it
ideal for mission-critical systems where uptime is important.
• Replica sets ensure that employee data is consistently available for read/write
operations, even during server maintenance or unexpected failures.
Traditional RDBMS:
• Achieving high availability in relational databases typically involves complex
configurations and clustering technologies, such as master-slave replication or
clustering. These configurations may involve manual intervention during failovers or
backup processes.

7. Handling Unstructured Data


MongoDB:
• MongoDB excels in handling unstructured and semi-structured data (such as
employee comments, documents, or logs) alongside structured data. For example, you
can store employee feedback in a text field and use it for analysis without having to fit it
into a strict schema.
Traditional RDBMS:
• Relational databases are not well-suited for unstructured or semi-structured data. Any
unstructured data (like text or images) often requires separate tables or special handling,
adding complexity to the database schema.

8. Reduced Data Duplication and Complexity


MongoDB:
• MongoDB’s denormalized data model reduces the need for complex JOINs. Employee
data, including related data such as department information, can be embedded directly
within the document, thus avoiding the need to reference multiple tables.
Traditional RDBMS:
• In relational databases, data normalization can lead to multiple tables and frequent JOIN
operations. These operations are often more complex and slower, especially when
dealing with large datasets of employee records, where data redundancy is minimized
through normalization.

3. Describe the aggregate function in MongoDB Query Language.

Aggregate Function in MongoDB Query Language (MQL)


The aggregate function in MongoDB is a powerful and flexible tool for performing data
transformation and computation operations. It allows you to process and summarize data in a
collection by applying multiple operations such as filtering, grouping, sorting, projecting, and
joining data. MongoDB's aggregation framework processes data through a series of stages,
each of which performs a specific operation on the data, forming an aggregation pipeline.
Key Features of the Aggregate Function in MongoDB:
1. Aggregation Pipeline: The aggregation framework in MongoDB works by using an
aggregation pipeline. The pipeline is a series of stages that process documents in a
sequence. The output of one stage becomes the input for the next.
o Each stage is represented by a MongoDB aggregation operator.
o The aggregation framework processes data efficiently and can handle complex
queries involving filtering, grouping, sorting, and more.
2. Stages in Aggregation Pipeline: Each stage in the aggregation pipeline performs a
specific operation on the data. Common stages include:
o $match: Filters documents to pass only those that match a specified condition
(similar to the WHERE clause in SQL).
o $group: Groups documents by a specified field and applies aggregate functions
like sum, avg, count, etc. (similar to GROUP BY in SQL).
o $sort: Sorts the documents by specified fields.
o $project: Reshapes the document, allowing you to include or exclude specific
fields.
o $limit: Limits the number of documents passed to the next stage.
o $skip: Skips a specified number of documents.
o $unwind: Deconstructs an array field and creates a document for each element.
o $lookup: Performs a left outer join with another collection (similar to SQL joins).

Syntax of the Aggregate Function:


db.collection.aggregate([
{ $stage1: { ... } },
{ $stage2: { ... } },
...
]);
Each stage is enclosed in a {} and is separated by commas.

Commonly Used Pipeline Stages:


1. $match: The $match stage filters the documents in the collection based on the specified
criteria. It is similar to the WHERE clause in SQL.
Example:
db.sales.aggregate([
{ $match: { region: "North" } }
]);
o This query filters the sales records to include only those where the region is
"North".
2. $group: The $group stage is used to group documents by a specific field and apply
aggregate functions such as sum, avg, count, etc.
Example:
db.sales.aggregate([
{ $group: { _id: "$region", total_sales: { $sum: "$amount" } } }
]);
o This groups the sales documents by region and calculates the total sales ($sum)
for each region.
3. $sort: The $sort stage sorts the documents by specified fields.
Example:
db.sales.aggregate([
{ $sort: { amount: -1 } }
]);
o This sorts the documents by the amount field in descending order.
4. $project: The $project stage reshapes each document in the pipeline by specifying
which fields to include or exclude.
Example:
db.sales.aggregate([
{ $project: { region: 1, amount: 1, _id: 0 } }
]);
o This projects the region and amount fields, while excluding the _id field from the
result.
5. $limit: The $limit stage limits the number of documents passed to the next stage of the
pipeline.
Example:
db.sales.aggregate([
{ $limit: 5 }
]);
o This limits the result to the first 5 documents.
6. $unwind: The $unwind stage deconstructs an array field and creates a separate
document for each element in the array.
Example:
db.orders.aggregate([
{ $unwind: "$items" }
]);
o This unwinds the items array field, creating a new document for each item in the
array.
7. $lookup: The $lookup stage performs a left outer join with another collection.
Example:
db.orders.aggregate([
{ $lookup: {
from: "products",
localField: "product_id",
foreignField: "_id",
as: "product_info"
}}
]);
o This performs a left outer join between the orders collection and the products
collection, matching product_id from orders with _id in products, and stores the
joined data in a field called product_info.

Example of a Complete Aggregation Pipeline:


Below is an example of a full aggregation pipeline that combines multiple stages.
Scenario: You want to get the top 3 regions with the highest total sales.
Pipeline:
db.sales.aggregate([
{ $match: { region: { $in: ["North", "South", "East", "West"] } } }, // Match specific regions
{ $group: { _id: "$region", total_sales: { $sum: "$amount" } } }, // Group by region and calculate
total sales
{ $sort: { total_sales: -1 } }, // Sort by total sales in descending order
{ $limit: 3 } // Limit the result to top 3 regions
]);

4. Write a Program in MongoDB using the aggregate function to calculate the total sales revenue for each product category in a collection.

Program in MongoDB Using the Aggregate Function to Calculate Total Sales Revenue for
Each Product Category
In this example, we will calculate the total sales revenue for each product category in a
MongoDB collection named sales. The sales collection contains documents with information
about the products sold, their prices, quantities, and categories.
Collection Structure (sales):

{
"_id": ObjectId("..."),
"product_name": "Laptop",
"category": "Electronics",
"price": 1000,
"quantity": 10
}

Steps to Calculate Total Sales Revenue:


• Step 1: Use the $group stage to group the documents by category.
• Step 2: Use the $sum operator to calculate the total revenue for each category by
multiplying the price and quantity fields.
• Step 3: Optionally, use the $sort stage to order the categories by the total sales revenue
in descending order.
MongoDB Aggregation Pipeline Program:
db.sales.aggregate([
  {
    // Step 1: Group by category
    $group: {
      _id: "$category", // Group by 'category'
      total_revenue: {
        $sum: { $multiply: ["$price", "$quantity"] } // Multiply price and quantity for total sales revenue
      }
    }
  },
  {
    // Step 2: Sort by total_revenue in descending order
    $sort: { total_revenue: -1 }
  }
]);
Explanation:
1. $group Stage:
o _id: "$category": Groups the documents by the category field.
o total_revenue: { $sum: { $multiply: ["$price", "$quantity"] } }: Calculates the
total revenue for each category by multiplying the price and quantity fields and
summing the results for each category.
2. $sort Stage:
o $sort: { total_revenue: -1 }: Sorts the categories by the calculated total revenue
in descending order (i.e., highest revenue first).
Example Output:
The output will be a list of product categories with their corresponding total sales revenue:
[
{
"_id": "Electronics",
"total_revenue": 15000
},
{
"_id": "Clothing",
"total_revenue": 8000
},
{
"_id": "Furniture",
"total_revenue": 4000
}
]

5. How does sorting and searching work in the MapReduce framework? Explain with an example.

Sorting and Searching in the MapReduce Framework (8 Marks)


Sorting and searching are key operations in the MapReduce framework, used to organize
data and efficiently find specific results during data processing.
Sorting in MapReduce:
1. Map Phase:
o The Mapper outputs key-value pairs, but sorting is not performed here.
o Sorting occurs later in the pipeline.
2. Shuffle and Sort Phase:
o The MapReduce framework automatically sorts the key-value pairs emitted by
the Mappers by key.
o All values for a specific key are grouped together and sent to the same Reducer.
3. Reduce Phase:
o The Reducer receives sorted key-value pairs and performs operations such as
summing or filtering. Sorting ensures that the data is processed in a predictable
order.
Searching in MapReduce:
1. Map Phase:
o Searching/filtering can be done by emitting only specific key-value pairs based
on a condition (e.g., specific words or ranges).
2. Reduce Phase:
o In the Reduce phase, complex searching tasks like top N or aggregation can be
performed after sorting the data.
Example: Finding the Top N Frequent Words:
1. Map Phase: The Mapper emits key-value pairs where the key is the word and the value is 1.
Example output:
("apple", 1)
("orange", 1)
("apple", 1)
("banana", 1)
("apple", 1)
("orange", 1)
2. Shuffle and Sort: The framework automatically groups the values by key and sorts the keys (words).
Grouped and sorted output:
("apple", [1, 1, 1])
("banana", [1])
("orange", [1, 1])
3. Reduce Phase: The Reducer sums the values for each key (word), giving the total count.
Output:
("apple", 3)
("banana", 1)
("orange", 2)
4. Final Sorting: Sorting the results gives the top N most frequent words.

6. Write a MapReduce program to count the number of occurrences of each word in a given dataset.

MapReduce Program to Count the Number of Occurrences of Each Word


In this program, we'll write a MapReduce job to count the occurrences of each word in a given
dataset (e.g., a text file). The MapReduce framework will break the task into two phases: the
Map phase and the Reduce phase.
Steps to Implement the Word Count Program:
1. Map Phase:
o The Mapper will read each line of the input text file, split it into words, and emit
each word with a count of 1.
2. Reduce Phase:
o The Reducer will take all the values (counts) for each word and sum them up to
compute the total occurrences of that word in the dataset.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCount {

// Mapper class
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

private final static IntWritable one = new IntWritable(1);


private Text word = new Text();
@Override
public void map(Object key, Text value, Context context) throws IOException,
InterruptedException {
// Split the input line into words
String[] tokens = value.toString().split("\\s+");

for (String token : tokens) {
word.set(token); // Set the word (Text field) as the key
context.write(word, one); // Emit the word with a count of 1
}
}
}

// Reducer class
public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

private IntWritable result = new IntWritable();

@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws
IOException, InterruptedException {
int sum = 0;

// Sum up all the occurrences of the word


for (IntWritable val : values) {
sum += val.get();
}

result.set(sum); // Set the total count for the word


context.write(key, result); // Emit the word and its total count
}
}

// Main method to set up the job


public static void main(String[] args) throws Exception {
// Configure the job
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);

// Set Mapper and Reducer classes


job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class); // Optional, to reduce intermediate data
job.setReducerClass(IntSumReducer.class);
// Set output key and value types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

// Set input and output file paths


FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

// Submit the job and wait for completion


System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

7. Discuss the Hive architecture with a neat diagram

Hive Architecture Overview


Hive is a data warehousing and SQL-like query language that is built on top of Hadoop. It
provides an interface for querying and managing large datasets residing in Hadoop Distributed
File System (HDFS). The architecture of Hive is designed to allow users to query data stored in
Hadoop using a simplified, SQL-like syntax (HiveQL). It abstracts the complexities of writing
MapReduce programs directly and provides an easy-to-use platform for data analysts.
Components of Hive Architecture:
1. Hive Client:
o This is the interface through which users interact with Hive. Users can submit
queries using Hive CLI, JDBC, ODBC, or Web UI (like Beeline or Hive Web UI).
2. Hive Driver:
o The Driver component receives the queries submitted by the users and
manages the lifecycle of query execution.
o It parses the query, compiles it, optimizes it, and finally executes it by generating
the appropriate MapReduce jobs (or other execution plans).
3. Compiler:
o The Compiler translates the HiveQL queries into a series of MapReduce jobs
that can be run on Hadoop.
o It performs tasks such as syntax checking, semantic analysis, query
optimization, and query plan generation.
4. Execution Engine:
o The Execution Engine is responsible for executing the MapReduce jobs
generated by the compiler.
o It interfaces with Hadoop's MapReduce framework to execute the tasks in
parallel across the Hadoop cluster.
o It can also execute Tez or Spark jobs (in newer versions of Hive that support
execution engines beyond MapReduce).
5. MetaStore:
o MetaStore is a central repository where all metadata about the tables, partitions,
and schemas are stored.
o It contains information such as the table schema, location of data in HDFS,
and other metadata.
o The MetaStore can be stored in a relational database like MySQL, PostgreSQL,
etc.
6. Hive SerDe (Serializer/Deserializer):
o SerDe is used to convert the data from its original format into a format that Hive
can work with and vice versa.
o It supports various file formats like Text, Avro, Parquet, ORC, etc.
7. HDFS:
o HDFS (Hadoop Distributed File System) is where the actual data resides. Hive
stores large amounts of structured and unstructured data in HDFS, and the
Execution Engine interacts with this data during query execution.
8. Hadoop:
o Hive utilizes Hadoop MapReduce for query execution (though it can also use
Tez or Spark as an execution engine). MapReduce jobs process the data in
HDFS and return the results back to the Hive system.
8. Write an R program by creating a function to calculate factorial of a number using an iterative approach.

R Program to Calculate Factorial of a Number Using Iterative Approach


In this program, we will create a function in R that calculates the factorial of a number using an
iterative approach. The factorial of a number n is the product of all positive integers less than
or equal to n. The factorial is represented as n! and calculated as:
• n! = n * (n-1) * (n-2) * ... * 1

R Program Code:
# Function to calculate factorial of a number using an iterative approach
factorial_iterative <- function(n) {
  # Initialize the result variable to 1
  result <- 1

  # Loop to calculate factorial (seq_len(0) is empty, so factorial(0) correctly stays 1)
  for (i in seq_len(n)) {
    result <- result * i
  }

  # Return the calculated factorial
  return(result)
}

# Example usage of the function


number <- as.integer(readline(prompt = "Enter a number to calculate its factorial: "))

# Check if the number is non-negative


if (number < 0) {
print("Factorial is not defined for negative numbers")
} else {
# Call the function and display the result
fact_result <- factorial_iterative(number)
cat("The factorial of", number, "is", fact_result, "\n")
}
9. Describe the CRUD operations in MongoDB with an example.

CRUD Operations in MongoDB


CRUD stands for Create, Read, Update, and Delete, which are the four basic operations you
can perform on a MongoDB database. MongoDB, being a NoSQL database, provides a set of
methods to perform these operations efficiently.
Below is a description of each CRUD operation in MongoDB with examples:
1. Create Operation
The Create operation is used to insert documents into a collection.
• Method: insertOne() and insertMany()
• Example: Inserting a single document into a collection

// Connect to the MongoDB database


db = connect('mongodb://localhost:27017/testdb');

// Inserting a single document into the "employees" collection


db.employees.insertOne({
name: "John Doe",
age: 30,
position: "Software Engineer",
department: "IT"
});

// Inserting multiple documents into the "employees" collection


db.employees.insertMany([
{
name: "Alice Smith",
age: 25,
position: "Data Analyst",
department: "Data Science"
},
{
name: "Bob Johnson",
age: 35,
position: "Project Manager",
department: "IT"
}
]);
• Explanation:
o insertOne() is used to insert a single document.
o insertMany() is used to insert multiple documents at once.
2. Read Operation
The Read operation is used to retrieve documents from a collection.
• Method: find() and findOne()
• Example: Finding documents in the "employees" collection

// Find all employees in the "employees" collection


db.employees.find();

// Find employees with a specific condition


db.employees.find({ department: "IT" });

// Find a single employee by name


db.employees.findOne({ name: "John Doe" });
• Explanation:
o find() returns all documents that match the specified query.
o findOne() returns a single document that matches the query condition.
3. Update Operation
The Update operation is used to modify an existing document in a collection.
• Method: updateOne(), updateMany(), and replaceOne()
• Example: Updating documents in the "employees" collection

// Update a single employee's position in the "employees" collection


db.employees.updateOne(
{ name: "John Doe" }, // Filter condition
{ $set: { position: "Senior Software Engineer" } } // Update operation
);

// Update multiple employees' department to "Engineering"


db.employees.updateMany(
{ department: "IT" },
{ $set: { department: "Engineering" } }
);

// Replace a document with new data


db.employees.replaceOne(
{ name: "Alice Smith" },
{ name: "Alice Brown", age: 26, position: "Senior Data Analyst", department: "Data Science" }
);
• Explanation:
o updateOne() updates the first document that matches the query.
o updateMany() updates all documents that match the query.
o replaceOne() replaces an entire document with a new one.
4. Delete Operation
The Delete operation is used to remove documents from a collection.
• Method: deleteOne() and deleteMany()
• Example: Deleting documents from the "employees" collection

// Delete a single employee by name


db.employees.deleteOne({ name: "John Doe" });

// Delete multiple employees from the "employees" collection


db.employees.deleteMany({ department: "Engineering" });
• Explanation:
o deleteOne() deletes the first document that matches the query.
o deleteMany() deletes all documents that match the query condition.

10. Write an R program to perform data visualization using the ggplot2 library.

R Program to Perform Data Visualization Using ggplot2


In this program, we will use the ggplot2 library to visualize data in R. ggplot2 is one of the most
popular libraries for data visualization and is part of the tidyverse package. It allows users to
create elegant and customizable plots.
Steps:
1. Install and Load ggplot2: First, we need to install the ggplot2 package (if not already
installed) and then load it.
2. Create Sample Data: We will create a sample data frame for visualization.
3. Create Plots: We will create different types of plots such as a scatter plot, bar plot, and
histogram.
R Program Code:
# Install ggplot2 if it's not installed already
# install.packages("ggplot2")

# Load ggplot2 library


library(ggplot2)

# Step 1: Create a Sample Data Frame


data <- data.frame(
Category = c('A', 'B', 'C', 'D', 'E'),
Value = c(23, 45, 56, 78, 89),
Age = c(25, 30, 35, 40, 45)
)

# Step 2: Create a Bar Plot to visualize the 'Value' for each 'Category'
ggplot(data, aes(x = Category, y = Value, fill = Category)) +
geom_bar(stat = "identity") +
ggtitle("Bar Plot of Category vs Value") +
xlab("Category") +
ylab("Value") +
theme_minimal()

# Step 3: Create a Scatter Plot to visualize 'Age' vs 'Value'


ggplot(data, aes(x = Age, y = Value)) +
geom_point(color = "blue", size = 3) +
ggtitle("Scatter Plot of Age vs Value") +
xlab("Age") +
ylab("Value") +
theme_minimal()

# Step 4: Create a Histogram for the 'Value' column to show distribution


ggplot(data, aes(x = Value)) +
geom_histogram(bins = 5, fill = "skyblue", color = "black", alpha = 0.7) +
ggtitle("Histogram of Value") +
xlab("Value") +
ylab("Frequency") +
theme_minimal()

11. Generalize HIVE commands with an example.

General Hive Commands with Examples


Hive is a data warehouse system built on top of Hadoop, which provides an SQL-like query
language known as HiveQL to query and manage large datasets. Hive is commonly used to
work with large-scale structured data in a Hadoop ecosystem. Below are some common Hive
commands along with examples.
1. Creating Databases
Hive allows you to create databases for organizing tables.
• Command: CREATE DATABASE
Syntax:
CREATE DATABASE database_name;
Example:
CREATE DATABASE employee_db;

2. Creating a Table
In Hive, you can create a table with a defined schema to store data.
• Command: CREATE TABLE
Syntax:
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
...
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY 'delimiter';
Example:
CREATE TABLE employee_details (
emp_id INT,
emp_name STRING,
emp_age INT,
emp_salary FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

3. Loading Data into a Table


After creating a table, you can load data from a file into the table.
• Command: LOAD DATA
Syntax:
LOAD DATA INPATH 'hdfs_path' INTO TABLE table_name;
Example:
LOAD DATA INPATH '/user/hadoop/employee_data.csv' INTO TABLE employee_details;

4. Querying Data (SELECT)


Hive supports SQL-like queries to retrieve data from tables.
• Command: SELECT
Syntax:
SELECT column1, column2, ... FROM table_name WHERE condition;
Example:
SELECT emp_name, emp_salary FROM employee_details WHERE emp_age > 30;

5. Dropping a Table
To remove a table from Hive, use the DROP TABLE command.
• Command: DROP TABLE
Syntax:
DROP TABLE table_name;
Example:
DROP TABLE employee_details;

6. Altering a Table
You can alter a table's schema by adding, modifying, or dropping columns.
• Command: ALTER TABLE
Syntax:
ALTER TABLE table_name ADD COLUMNS (column_name datatype);
Example:
ALTER TABLE employee_details ADD COLUMNS (emp_department STRING);

7. Dropping a Database
To remove a database from Hive, use the DROP DATABASE command. The database must be
empty to drop it.
• Command: DROP DATABASE
Syntax:
DROP DATABASE database_name;
Example:
DROP DATABASE employee_db;

8. Listing Tables
You can list all tables in the current database.
• Command: SHOW TABLES
Syntax:
SHOW TABLES;
Example:
SHOW TABLES;

9. Describing a Table
To view the schema of a table, use the DESCRIBE command.
• Command: DESCRIBE
Syntax:
DESCRIBE table_name;
Example:
DESCRIBE employee_details;

10. Inserting Data into a Table


You can insert data into a table using the INSERT INTO command.
• Command: INSERT INTO
Syntax:
INSERT INTO TABLE table_name VALUES (value1, value2, ...);
Example:
INSERT INTO TABLE employee_details VALUES (1, 'John Doe', 28, 50000.0);
12. Compare the functionalities and use cases of MongoDB and traditional
relational databases.

13. Elaborate the Mapper and Reducer task with a neat sketch.
Elaboration of Mapper and Reducer Task in MapReduce with Diagram
MapReduce is a distributed data processing framework used in Hadoop to handle large-scale
data. It consists of two key phases:
1. Mapper Phase – Processes input data and converts it into key-value pairs.
2. Reducer Phase – Aggregates and processes intermediate key-value pairs to generate
the final output.

1. Mapper Task
• The Mapper takes input data, processes it, and emits key-value pairs as intermediate
output.
• It runs in parallel on multiple nodes to increase efficiency.
Example: Word Count in a Document
• Input: A text file containing sentences.
• The Mapper reads the file and splits it into words.
• It emits each word with a count of 1.
Input File Content:

Hello Hadoop
Hello Big Data
Mapper Output (Key-Value Pairs):

(Hello, 1)
(Hadoop, 1)
(Hello, 1)
(Big, 1)
(Data, 1)

2. Reducer Task
• The Reducer takes the output from the Mapper, aggregates the values based on keys,
and produces the final result.
• It processes data after shuffling & sorting, where keys with the same values are
grouped together.
Reducer Input (After Shuffling & Sorting):

(Big, [1])
(Data, [1])
(Hadoop, [1])
(Hello, [1,1])
Reducer Output (Final Word Count):

(Big, 1)
(Data, 1)
(Hadoop, 1)
(Hello, 2)

14. Write a Map reduce program to sort data by student name.

MapReduce Program to Sort Data by Student Name


In this MapReduce program, we will sort student names in ascending order. The Mapper will
read the student data and emit the name as the key and other details as the value. The
Reducer will collect the sorted names and output the final sorted list.

1. Input Data (students.txt)


Assume we have a file with student records in the format:
102, Alice, 85
101, Bob, 90
103, Charlie, 78
104, David, 88
(Format: StudentID, Name, Marks)

2. MapReduce Implementation
Mapper Class
• Reads input records.
• Emits student name as the key and the rest of the record as the value.
Reducer Class
• Since keys are automatically sorted by Hadoop during the shuffle phase, the Reducer
simply outputs them in sorted order.
MapReduce Java Program for Sorting by Name

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class StudentSort {

// Mapper Class
public static class NameMapper extends Mapper<Object, Text, Text, Text> {
public void map(Object key, Text value, Context context) throws IOException,
InterruptedException {
String[] fields = value.toString().split(","); // Split by comma
if (fields.length == 3) {
String studentName = fields[1].trim(); // Name as key
String studentData = fields[0] + "," + fields[2]; // StudentID, Marks as value
context.write(new Text(studentName), new Text(studentData));
}
}
}

// Reducer Class
public static class NameReducer extends Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException,
InterruptedException {
for (Text value : values) {
context.write(key, value); // Output sorted by key (student name)
}
}
}

// Driver Code
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Sort Students by Name");
job.setJarByClass(StudentSort.class);
job.setMapperClass(NameMapper.class);
job.setReducerClass(NameReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

15. Differentiate between the various file compression techniques in MapReduce and their impact on performance.

Impact on MapReduce Performance


1. Splittability:
o Splittable formats (Bzip2, LZO, Zstandard) allow parallel processing across
multiple nodes, improving efficiency.
o Non-splittable formats (Gzip, Snappy) require a single node to process the entire
file, reducing parallelism.
2. Compression Ratio vs. CPU Overhead:
o Higher compression ratios (Bzip2, Gzip) save disk space but require more
CPU power for decompression.
o Lower compression ratios (LZO, Snappy) are optimized for speed, reducing
CPU overhead.
3. Best Practices for MapReduce Jobs:
o Use Bzip2 or LZO for input files to enable splitting and parallel processing.
o Use Gzip or Zstandard for output files if storage efficiency is a priority.
o Use Snappy for applications where real-time speed is more important than
compression ratio (a driver-side configuration sketch follows).
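A brief driver-side sketch of how such choices are typically wired into a Hadoop job (the specific codec choices here are illustrative, not prescribed by the question):

import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = new Configuration();
// Compress intermediate map output with Snappy to cut shuffle traffic
conf.setBoolean("mapreduce.map.output.compress", true);
conf.setClass("mapreduce.map.output.compress.codec", SnappyCodec.class, CompressionCodec.class);

Job job = Job.getInstance(conf, "compressed job");
// Compress the final job output with Gzip to save HDFS space
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);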

16. Write a Map reduce Program to search a specific keyword in a file.

MapReduce Program to Search for a Specific Keyword in a File


In this MapReduce program, we will search for a specific keyword in a text file and output the
lines that contain the keyword.

1. Input File (input.txt)


Example file content:
Hadoop is a distributed computing framework.
MapReduce is a programming model for big data.
Hadoop and Spark are used for large-scale data processing.
Data analytics is an important field in big data.
If the search keyword is "Hadoop", the program should return lines that contain the word
"Hadoop".

2. MapReduce Java Program for Keyword Search


import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class KeywordSearch {

public static class KeywordMapper extends Mapper<Object, Text, Text, Text> {


private String searchKeyword;

@Override
protected void setup(Context context) {
Configuration conf = context.getConfiguration();
searchKeyword = conf.get("keyword"); // Get keyword from configuration
}

public void map(Object key, Text value, Context context) throws IOException,
InterruptedException {
String line = value.toString();
if (line.contains(searchKeyword)) { // Check if line contains the keyword
context.write(new Text("Matching Line:"), new Text(line));
}
}
}

public static class KeywordReducer extends Reducer<Text, Text, Text, Text> {


public void reduce(Text key, Iterable<Text> values, Context context) throws IOException,
InterruptedException {
for (Text value : values) {
context.write(key, value); // Output matching lines
}
}
}

public static void main(String[] args) throws Exception {


Configuration conf = new Configuration();
conf.set("keyword", args[2]); // Set the keyword to search

Job job = Job.getInstance(conf, "Keyword Search");


job.setJarByClass(KeywordSearch.class);
job.setMapperClass(KeywordMapper.class);
job.setReducerClass(KeywordReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
17. Summarize the Architecture of HIVE in detail.

Hive Architecture Overview


Hive is a data warehousing and SQL-like query language that is built on top of Hadoop. It
provides an interface for querying and managing large datasets residing in Hadoop Distributed
File System (HDFS). The architecture of Hive is designed to allow users to query data stored in
Hadoop using a simplified, SQL-like syntax (HiveQL). It abstracts the complexities of writing
MapReduce programs directly and provides an easy-to-use platform for data analysts.
Components of Hive Architecture:
1. Hive Client:
o This is the interface through which users interact with Hive. Users can submit
queries using Hive CLI, JDBC, ODBC, or Web UI (like Beeline or Hive Web UI).
2. Hive Driver:
o The Driver component receives the queries submitted by the users and
manages the lifecycle of query execution.
o It parses the query, compiles it, optimizes it, and finally executes it by generating
the appropriate MapReduce jobs (or other execution plans).
3. Compiler:
o The Compiler translates the HiveQL queries into a series of MapReduce jobs
that can be run on Hadoop.
o It performs tasks such as syntax checking, semantic analysis, query
optimization, and query plan generation.
4. Execution Engine:
o The Execution Engine is responsible for executing the MapReduce jobs
generated by the compiler.
o It interfaces with Hadoop's MapReduce framework to execute the tasks in
parallel across the Hadoop cluster.
o It can also execute Tez or Spark jobs (in newer versions of Hive that support
execution engines beyond MapReduce).
5. MetaStore:
o MetaStore is a central repository where all metadata about the tables, partitions,
and schemas are stored.
o It contains information such as the table schema, location of data in HDFS,
and other metadata.
o The MetaStore can be stored in a relational database like MySQL, PostgreSQL,
etc.
6. Hive SerDe (Serializer/Deserializer):
o SerDe is used to convert the data from its original format into a format that Hive
can work with and vice versa.
o It supports various file formats like Text, Avro, Parquet, ORC, etc.
7. HDFS:
o HDFS (Hadoop Distributed File System) is where the actual data resides. Hive
stores large amounts of structured and unstructured data in HDFS, and the
Execution Engine interacts with this data during query execution.
8. Hadoop:
o Hive utilizes Hadoop MapReduce for query execution (though it can also use
Tez or Spark as an execution engine). MapReduce jobs process the data in
HDFS and return the results back to the Hive system.

18. Write an R program by creating a function to calculate factorial of a number using an iterative approach.

R Program to Calculate Factorial of a Number Using Iterative Approach


In this program, we will create a function in R that calculates the factorial of a number using an
iterative approach. The factorial of a number n is the product of all positive integers less than
or equal to n. The factorial is represented as n! and calculated as:
• n! = n * (n-1) * (n-2) * ... * 1

R Program Code:
# Function to calculate factorial of a number using an iterative approach
factorial_iterative <- function(n) {
  # Initialize the result variable to 1
  result <- 1

  # Loop to calculate factorial (seq_len(0) is empty, so factorial(0) correctly stays 1)
  for (i in seq_len(n)) {
    result <- result * i
  }

  # Return the calculated factorial
  return(result)
}

# Example usage of the function


number <- as.integer(readline(prompt = "Enter a number to calculate its factorial: "))

# Check if the number is non-negative


if (number < 0) {
print("Factorial is not defined for negative numbers")
} else {
# Call the function and display the result
fact_result <- factorial_iterative(number)
cat("The factorial of", number, "is", fact_result, "\n")
}

19. Evaluate the advantages and limitations of integrating MapReduce with R in data analytics.

Advantages and Limitations of Integrating MapReduce with R in Data Analytics


MapReduce and R can be integrated using RHadoop, RHIPE, or SparkR for large-scale data
analytics. This integration combines the distributed processing power of MapReduce with R's
statistical and machine-learning capabilities.
Advantages:
1. Scalability: R alone is limited by a single machine's memory; running R logic inside
MapReduce jobs lets analyses scale to datasets stored across an HDFS cluster.
2. Rich analytics: R's statistical, modelling, and visualization functions can be applied to
big data without rewriting them in Java.
Limitations:
1. Performance overhead: data must be serialized and passed between Hadoop and the R
processes, so jobs run slower than equivalent native Java MapReduce code.
2. Complexity: R and the integration packages must be installed and configured on every
cluster node, and debugging distributed R code is harder than working in standalone R.
20. Describe the CRUD operations in MongoDB with an example.

CRUD Operations in MongoDB


CRUD stands for Create, Read, Update, and Delete, which are the four basic operations you
can perform on a MongoDB database. MongoDB, being a NoSQL database, provides a set of
methods to perform these operations efficiently.
Below is a description of each CRUD operation in MongoDB with examples:
1. Create Operation
The Create operation is used to insert documents into a collection.
• Method: insertOne() and insertMany()
• Example: Inserting a single document into a collection

// Connect to the MongoDB database


db = connect('mongodb://localhost:27017/testdb');
// Inserting a single document into the "employees" collection
db.employees.insertOne({
name: "John Doe",
age: 30,
position: "Software Engineer",
department: "IT"
});

// Inserting multiple documents into the "employees" collection


db.employees.insertMany([
{
name: "Alice Smith",
age: 25,
position: "Data Analyst",
department: "Data Science"
},
{
name: "Bob Johnson",
age: 35,
position: "Project Manager",
department: "IT"
}
]);
• Explanation:
o insertOne() is used to insert a single document.
o insertMany() is used to insert multiple documents at once.
2. Read Operation
The Read operation is used to retrieve documents from a collection.
• Method: find() and findOne()
• Example: Finding documents in the "employees" collection

// Find all employees in the "employees" collection


db.employees.find();

// Find employees with a specific condition


db.employees.find({ department: "IT" });

// Find a single employee by name


db.employees.findOne({ name: "John Doe" });
• Explanation:
o find() returns all documents that match the specified query.
o findOne() returns a single document that matches the query condition.
3. Update Operation
The Update operation is used to modify an existing document in a collection.
• Method: updateOne(), updateMany(), and replaceOne()
• Example: Updating documents in the "employees" collection

// Update a single employee's position in the "employees" collection


db.employees.updateOne(
{ name: "John Doe" }, // Filter condition
{ $set: { position: "Senior Software Engineer" } } // Update operation
);

// Update multiple employees' department to "Engineering"


db.employees.updateMany(
{ department: "IT" },
{ $set: { department: "Engineering" } }
);

// Replace a document with new data


db.employees.replaceOne(
{ name: "Alice Smith" },
{ name: "Alice Brown", age: 26, position: "Senior Data Analyst", department: "Data Science" }
);
• Explanation:
o updateOne() updates the first document that matches the query.
o updateMany() updates all documents that match the query.
o replaceOne() replaces an entire document with a new one.
4. Delete Operation
The Delete operation is used to remove documents from a collection.
• Method: deleteOne() and deleteMany()
• Example: Deleting documents from the "employees" collection

// Delete a single employee by name


db.employees.deleteOne({ name: "John Doe" });

// Delete multiple employees from the "employees" collection


db.employees.deleteMany({ department: "Engineering" });
• Explanation:
o deleteOne() deletes the first document that matches the query.
o deleteMany() deletes all documents that match the query condition.
