BDA Question Bank
UNIT-2 Hadoop
1. Explain the working of the following phases of MapReduce using one common example:
● Map Phase
● Combiner Phase
● Shuffle and Sort Phase
● Reducer Phase
2. Write MapReduce code to count occurrences of specific words in the input text
file(s). Also write the commands to compile and run the code.
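For questions 1 and 2, the four phases can be traced end to end on one small word-count job. The sketch below is a single-process Python simulation, not Hadoop code. It mimics the Map, Combiner, Shuffle and Sort, and Reduce phases while counting only a chosen set of words. `TARGET_WORDS`, the sample splits, and all function names are hypothetical. An actual Hadoop job would implement Hadoop's `Mapper` and `Reducer` classes in Java and be submitted with `hadoop jar`.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical set of words to count (assumption for illustration only).
TARGET_WORDS = {"hadoop", "spark"}


def map_phase(line):
    """Map: emit (word, 1) for every target word in one input line."""
    for token in line.lower().split():
        word = token.strip(".,;:!?\"'()")
        if word in TARGET_WORDS:
            yield (word, 1)


def combine_phase(pairs):
    """Combiner: pre-aggregate one mapper's output locally before the shuffle."""
    partial = defaultdict(int)
    for word, count in pairs:
        partial[word] += count
    return list(partial.items())


def shuffle_and_sort(mapper_outputs):
    """Shuffle and sort: merge all mapper outputs, sort by key, group values."""
    merged = sorted(
        (pair for output in mapper_outputs for pair in output),
        key=itemgetter(0),
    )
    return [(key, [v for _, v in group])
            for key, group in groupby(merged, key=itemgetter(0))]


def reduce_phase(key, values):
    """Reduce: sum the partial counts received for one word."""
    return (key, sum(values))


def run_job(splits):
    # Each input split is handled by its own simulated mapper + combiner.
    mapper_outputs = [
        combine_phase(pair for line in split for pair in map_phase(line))
        for split in splits
    ]
    grouped = shuffle_and_sort(mapper_outputs)
    return dict(reduce_phase(k, vs) for k, vs in grouped)


if __name__ == "__main__":
    splits = [
        ["Hadoop stores data in HDFS.", "Spark can read from Hadoop."],
        ["Spark is fast.", "Hadoop MapReduce is batch oriented; Spark is not."],
    ]
    print(run_job(splits))
```

Running the script prints `{'hadoop': 3, 'spark': 3}`. Each simulated combiner first reduces its own split to partial counts (for example `hadoop: 2, spark: 1` for the first split), so fewer pairs cross the shuffle.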
3. Explain job scheduling in MapReduce. How is it done in the case of:
● The Fair Scheduler
● The Capacity Scheduler
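The core idea of the Fair Scheduler can be modeled as max-min fairness. Every pool (queue) is offered an equal share of the cluster, and whatever a pool does not need is redistributed to the others. The Python sketch below is a simplified model over a single resource (slots). The function name and pool names are hypothetical. It ignores the weights, minimum shares, preemption, and hierarchical queues that the real scheduler supports. The Capacity Scheduler works differently: it guarantees each queue a configured percentage of the cluster, which this sketch does not model.

```python
def max_min_fair_share(capacity, demands):
    """Divide `capacity` slots among pools using max-min fairness (water-filling).

    Pools that need less than an equal split get exactly what they need;
    the leftover is split equally again among the pools still waiting.
    """
    shares = {pool: 0.0 for pool in demands}
    remaining = dict(demands)   # pools whose demand is not yet satisfied
    free = float(capacity)
    while remaining and free > 1e-9:
        equal = free / len(remaining)
        satisfied = {p: d for p, d in remaining.items() if d <= equal}
        if not satisfied:
            # Every remaining pool wants more than an equal split: cap each at it.
            for p in remaining:
                shares[p] += equal
            break
        for p, d in satisfied.items():
            shares[p] += d
            free -= d
            del remaining[p]
    return shares


if __name__ == "__main__":
    print(max_min_fair_share(10, {"A": 2, "B": 4, "C": 10}))
```

With 10 slots and demands A=2, B=4, C=10, pools A and B receive their full demand and C receives the remaining 4 slots. When total demand is below capacity, every pool is fully satisfied and the rest stays idle.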
4. Explain the Avro data serialization technique in MapReduce.
5. Explain “Map Phase” and “Combiner Phase” in MapReduce.
6. What is a Resilient Distributed Dataset (RDD) in Apache Spark? Explain in detail. Write a
note on why RDDs are better than MapReduce data storage.
7. What are the advantages of Hadoop? Explain the Hadoop architecture and its
components with a proper diagram.
8. Explain the working of Hive with proper steps and a diagram.
9. What is meant by HiveQL Data Definition Language (DDL)? Explain any three
HiveQL DDL commands with their syntax and an example of each.
10. Draw the HDFS architecture. Explain any two of the following HDFS commands
with syntax and at least one example of each:
● copyFromLocal
● setrep
● checksum
11. Explain the core architecture of Hadoop with a suitable block diagram. Discuss the role of
each component in detail.
12. What is the Hadoop ecosystem? Discuss the various components of the Hadoop ecosystem.
13. List the various configuration files used in a Hadoop installation. What is the use of
mapred-site.xml?
UNIT-3 NoSQL
1. Write a short note on NoSQL databases. List the differences between NoSQL and
relational databases.
2. Define a NoSQL database.
3. What is a key-value data store?
4. Compare document stores and key-value stores.
5. Differentiate between master-slave and peer-to-peer models.
6. List the classifications of NoSQL databases and explain key-value stores.
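A small sketch makes the key-value model concrete. The store treats each value as an opaque blob addressed only by its key, and a distributed store must also decide which node holds each key. The Python code below is a toy in-memory illustration. The class and method names are hypothetical. It uses simple hash-modulo placement, whereas many distributed key-value stores use consistent hashing so that adding a node relocates fewer keys.

```python
import hashlib


class PartitionedKeyValueStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'.

    Values are opaque to the store: it only supports put/get/delete by key,
    which is the defining property of the key-value data model.
    """

    def __init__(self, num_nodes=3):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        # Deterministic hash so the same key always maps to the same node
        # (Python's built-in hash() of str is randomized per process).
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

    def delete(self, key):
        self._node_for(key).pop(key, None)


if __name__ == "__main__":
    store = PartitionedKeyValueStore(num_nodes=3)
    store.put("user:42", {"name": "Asha", "city": "Pune"})
    print(store.get("user:42"))
    store.delete("user:42")
    print(store.get("user:42", "not found"))
```

Lookups work only by the full key. There is no way to query on a field inside the stored value, and that is the main contrast with a document store.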
7. What is NoSQL? What are the advantages of NoSQL? Explain the types of NoSQL
databases.
8. Explain the differences between SQL and NoSQL with a suitable example.
UNIT-5 Frameworks
1. What is ZooKeeper? What are the benefits of ZooKeeper?
2. Draw the architecture of Apache Pig and explain it briefly.
3. Explain Hive in detail.
4. What is Hive?
5. Differentiate between HBase and Hive.
6. What is HBase?
UNIT-6 Spark
1. Explain the Spark components in detail. Also list the features of Spark.
2. What are the problems related to MapReduce data storage? How does Apache Spark
solve them using Resilient Distributed Datasets? Explain RDDs in detail.
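These RDD questions turn on two properties. Transformations are lazy, and lost data is recovered through lineage rather than by replicating intermediate results to stable storage as chained MapReduce jobs do. A toy model can illustrate both. The Python class below is hypothetical and runs on one machine with no partitions. It only mirrors the shape of Spark's RDD API (`map`, `filter`, `persist`, `collect`). In Spark, transformations build a lineage graph, actions trigger computation, and lost in-memory data is recomputed from that lineage.

```python
class ToyRDD:
    """Toy single-machine model of an RDD: an immutable dataset defined by
    its lineage (parent + transformation) and computed only when an action runs."""

    def __init__(self, parent=None, transform=None, source=None, op="source"):
        self.parent = parent          # upstream ToyRDD, or None for a source
        self.transform = transform    # function: list -> list
        self.source = source          # base data, used only when parent is None
        self.op = op                  # name of the transformation, for lineage
        self._persist = False
        self._cache = None
        self.evaluations = 0          # how many times this RDD was (re)computed

    # Transformations are lazy: they only record a new node in the lineage.
    def map(self, f):
        return ToyRDD(self, lambda xs: [f(x) for x in xs], op="map")

    def filter(self, pred):
        return ToyRDD(self, lambda xs: [x for x in xs if pred(x)], op="filter")

    def persist(self):
        """Keep the computed result in memory after it is first evaluated."""
        self._persist = True
        return self

    # Action: walks the lineage back to the source and evaluates it.
    def collect(self):
        if self._cache is not None:
            return list(self._cache)
        if self.parent is None:
            data = list(self.source)
        else:
            data = self.transform(self.parent.collect())
        self.evaluations += 1
        if self._persist:
            self._cache = data
        return list(data)

    def lose_cached_data(self):
        """Simulate losing in-memory data, e.g. after an executor failure."""
        self._cache = None

    def lineage(self):
        chain, node = [], self
        while node is not None:
            chain.append(node.op)
            node = node.parent
        return list(reversed(chain))


if __name__ == "__main__":
    numbers = ToyRDD(source=range(1, 6))
    odd_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 1).persist()
    print(odd_squares.lineage())      # nothing has been computed yet
    print(odd_squares.collect())
    odd_squares.lose_cached_data()
    print(odd_squares.collect(), odd_squares.evaluations)
```

After `lose_cached_data()`, the next `collect()` rebuilds `[1, 9, 25]` by replaying `source -> map -> filter`, and `evaluations` rises from 1 to 2.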
3. What is Apache Spark?
4. Explain the key features of Apache Spark.
5. What are the benefits of Spark over MapReduce?
6. Describe HBase and ZooKeeper in detail.