BDA Question Bank AY 2023-24

The document outlines a question bank for a big data analytics course divided into 6 units. The units cover topics like getting an overview of big data, technologies for handling big data including Hadoop, MapReduce, YARN, Hive and Pig, analytics approaches and tools, and exploring R. Multiple questions are provided under each unit covering concepts, processes, architectures and applications of the topics.

Uploaded by

yashodeep1050

Subject: Big Data Analytics

Question Bank
Unit-1 Getting an Overview of Big Data
1. What is Big Data? List and explain the common types of data and their sources.
2. List and discuss the four elements of Big Data.
3. What is Big Data Analytics? Describe the three types of analytics.
4. Describe the advantages of Big Data Analytics.
5. Describe the challenges faced during the evolution of Big Data.
6. Write a note on the following:
a. Big Data Companies
b. Job Titles in Big Data
c. Skills required for Big Data Professionals

Unit-2 Technologies for Handling Big Data


1. What is distributed computing? Explain the working of a distributed computing
environment.
2. List and explain the differences between parallel and distributed systems.
3. Discuss the techniques of parallel computing.
4. What is Hadoop? Draw and explain the Hadoop multinode cluster architecture.
5. Describe the important features of Hadoop.
6. What is MapReduce? Draw the Hadoop MapReduce architecture and explain it with the role of
the MapReduce components.
7. Describe the Job Tracking Process in MapReduce.
8. What is the Hadoop Ecosystem? How are the various elements of Hadoop involved at the various
stages of data processing?
9. What is Hadoop Distributed File System (HDFS)? Draw and explain the architecture of
HDFS.
10. Illustrate the use of heartbeat messages to maintain Hadoop performance.

Unit-3 Understanding Hadoop MapReduce and YARN Fundamentals


1. List and explain the main features of MapReduce.
2. Discuss the working of the MapReduce algorithm.
3. Discuss the techniques that can be used to optimize MapReduce jobs.
4. List and describe the fields benefitted by the use of MapReduce.
5. What are the limitations of MapReduce?
6. Discuss the advantages of YARN over MapReduce.
7. Explain the YARN architecture.
8. Describe the working of YARN.
9. Describe the two types of schedulers commonly available with YARN.

Unit-4 Exploring Hive and Pig

1. Define Hive. Draw and describe the architecture of Hive.
2. List and explain the Hive Services.
3. List and explain both primitive and complex data types available in Hive.
4. List and explain any seven built-in functions available in Hive.
5. List and explain any seven aggregate functions available in Hive.
6. Write the Hive Commands for:
i. Create a database named Employee_DB
ii. Create a table employee with columns (empid, ename, designation) partitioned by
designation
iii. Show the structure of the employee table
iv. Modify the name of column ename to emp_name
v. Add the column salary
vi. Rename the table to employee_new
vii. Delete the table employee_new

7. Write the complete syntax of the SELECT command in Hive and write the Hive queries for
the following:
i. Retrieve all the columns and rows from the student table.
ii. Retrieve the sales records that have an amount greater than 15000 from the US
region.
iii. Calculate the average marks obtained by students from all semesters. The result should
display the semester and the average marks.
iv. Display only ten records of students.
v. For the table Sales (product, category, salesvalue), retrieve the categories of products
with total salesvalue greater than 300.
8. What is the use of Pig? Explain the benefits of Pig.
9. Discuss the two modes used for running the Pig scripts.
10. What are the main reasons for developing Pig Latin?
11. Describe the Pig Latin application flow with the different types of statements used in it.
12. Describe the use of the following operators in Pig Latin:
i. FOREACH
ii. ASSERT
iii. FILTER
13. Describe the use of the following operators in Pig Latin:
i. GROUP
ii. ORDER BY
iii. DISTINCT
14. Describe the use of the following operators in Pig Latin:
i. JOIN
ii. SAMPLE
iii. SPLIT

Unit-5 Understanding Analytics, Analytical Approaches and Tools to Analyze Data

1. Compare reporting with analysis.
2. Describe the phases of the analytic process.
3. Describe basic analytics in detail.
4. Describe advanced analytics in detail.
5. Describe the ensemble method of analytical approaches.
6. What is Text Data Analysis? Give some examples of the data sources and the different data
structure types used in text data analysis.
7. List the analytical tools. Explain the features and limitations of R.
8. Describe the analytical tool IBM SPSS with its features.
9. Describe the Statistical Analysis System (SAS) as an information delivery system with its
features.
10. Compare the different analytical tools such as R, IBM SPSS and SAS.

Unit-6 Exploring R

1. What is R? Explain the programming features of R.
2. Demonstrate the use of the functions that allow users to handle the data in the workspace:
a) ls() b) rm() c) save() d) load()
3. Demonstrate the use of the following commands used to import and export large amounts of
data in R:
a) read.csv() b) read.table() c) write.csv() d) write.table()
4. Demonstrate the different ways of combining data by using the merge() function.
5. Demonstrate the use of the sort() and order() functions.
6. Demonstrate the use of the melt() and dcast() functions.
