CCBD Question Bank
Course Objectives
• To provide an overview of Apache Hadoop, HDFS concepts, and interfacing with HDFS
Course Outcomes
After completion of the course, the students should be able to:
CO1 – Explain the core concepts of the cloud computing paradigm: how and why this paradigm
shift came about, the characteristics, advantages and challenges brought about by the various
models and services in cloud computing. (K3)
CO2 – Apply fundamental concepts in cloud infrastructures to understand the tradeoffs in
power, efficiency and cost, and then study how to leverage and manage single and multiple data
centers to build and deploy cloud applications that are resilient, elastic and cost-efficient. (K2)
CO3 – Illustrate the fundamental concepts of cloud virtualization. (K4)
CO4 – Identify Big Data and its Business Implications. (K2)
CO5 – List the components of Hadoop and the Hadoop ecosystem; access and process data on a
distributed file system. (K3)
Types of Digital Data, Introduction to Big Data, Big Data Analytics, History of Hadoop, Apache
Hadoop, Analyzing Data with Unix Tools, Analyzing Data with Hadoop, Hadoop Streaming,
Hadoop Ecosystem, IBM Big Data Strategy, Introduction to InfoSphere BigInsights and
BigSheets.
The Design of HDFS, HDFS Concepts, Command Line Interface, Hadoop File System Interfaces,
Data Flow, Data Ingest with Flume and Sqoop and Hadoop Archives, Hadoop I/O: Compression,
Serialization, Avro and File-Based Data Structures. Anatomy of a MapReduce Job Run, Failures,
Job Scheduling, Shuffle and Sort, Task Execution, MapReduce Types and Formats, MapReduce
Features.
UNIT I
2Marks
1.
2. Define Cloud Computing.
3. Define Parallel Computing and Centralized computing.
4. List out the cluster design issues.
5. Describe the applications of high-performance and high-throughput systems.
6. Tabulate the differences between high-performance computing and high-throughput
computing.
7. Name the essential characteristics of cloud computing.
8. Give the advantages of cloud computing.
9. Highlight the importance of the term “cloud computing.”
10. Identify any two advantages of distributed computing.
11. Bring out the differences between private cloud and public cloud.
12. Illustrate the evolutionary trend towards distributed and cloud computing.
13. What are the characteristics of cloud architecture that separate it from a traditional one?
14. Interpret cloud resource pooling.
15. Outline elasticity in cloud.
16. What is the difference between elasticity and scalability in cloud computing?
17. List a few drawbacks of grid computing.
18. How is On Demand provisioning of resources applied in cloud computing?
19. Assess properties of Cloud Computing.
20. Formulate the technologies on which cloud computing relies.
21. Investigate how a company can benefit from cloud computing.
22. Define public clouds.
23. Write a short note on community cloud.
24. Define IaaS.
25. State the differences between PaaS and SaaS.
26. Why do we need a hybrid cloud?
27. State the role of the cloud auditor in cloud computing.
28. What are the different layers available in cloud architecture design?
29. What are the various components of the NIST cloud computing reference architecture?
30. Differentiate between cloud consumer and cloud provider.
5Marks
10Marks
1. Discuss the various dimensions of scalability and the performance laws in distributed systems.
2. It is said, ‘cloud computing can save money’. What is your view? Can you name some open
source cloud computing platform databases? Explain any one database in detail.
3. Create and justify a cloud architecture application design with a neat sketch.
4. Briefly explain each of the cloud computing services. Identify two cloud providers by
company name in each service category.
5. I am starting a new company to analyse videos. I’ll need a lot of storage as videos consume
quite a bit of disk. Additionally, I’ll need ample computational power, possibly running
applications concurrently. I have discovered some very good tools to facilitate development in
Windows, but the deployment will be more efficiently handled in the Linux environment. All
the pointers say that I need to move to cloud. I have found that SaaS is the most attractive
service, followed by PaaS and IaaS, in that order. Given the above information, which service
do you recommend? Why?
6. Evaluate and contrast the merits and demerits of cloud deployment models: public, private,
hybrid, community.
7. Evaluate the architectural design of compute and storage clouds.
8. Under what circumstances should you prefer to use PaaS over IaaS? Formulate it with an
example.
UNIT II
2Marks
10Marks
UNIT III
2Marks
5Marks
10Marks
UNIT IV
2Marks
1. Define MapReduce.
2. What is the role of the Reduce function?
3. List out the Hadoop core fundamental layers.
4. Compare reporting and analysis along with their processes.
5. Explain the following.
a. Advanced analytics
b. Operationalized analytics
c. Monetized analytics
6. How is an analytical team developed, and what skills are required of an analyst?
7. Distinguish statistical significance and business importance.
8. What are the roles of analytical team and IT team with a detailed note on text analysis?
9. Explain in detail the commonly used analytical approaches.
11. How have analytical tools evolved from graphical user interfaces to point solutions to data
visualization tools?
9. Give a detailed note on features and limitations of R programming and IBM SPSS.
a. SAS
5Marks
3. Discuss the techniques used to optimize MapReduce jobs.
d. org.apache.hadoop.io package
9. What is metadata? What information does it provide? Explain the role of the NameNode in an HDFS
cluster.
10. Define Command line interface using HDFS files and give a brief note on Hadoop-specific file
10Marks
Define “big data” and state the conditions under which data is given that name.
10. State the reason behind the phrase “Web data is the most popular big data”.
UNIT V
2Marks
5Marks
1. Define Big Data. Describe the main features of big data in detail.
2. (i) Explain the main characteristics, features and structure of big data in detail.
diagram.
(ii) Describe the use of massively parallel processing (MPP) systems in big data analytics.
7. Point out in detail the analysis tools and reporting tools used in big data.
8. Discuss in detail the analytical data set and its types.
10. Illustrate in detail how big data is effectively filtered and mixed with traditional data. (13)
14. (i) Explain in detail how web data is used in action today.
10Marks
1. Summarize in detail the challenges of big data in modern data analytics.
2. Hypothesize the statement “Web Data is the Most Popular Big Data” with reference to the data
analytics professional.
3. Infer on the statement “Is the ‘Big’ Part or the ‘Data’ Part More Important?”
4. Formulate the role of the analytic sandbox, its benefits and types. Give the definition of Hadoop.
9. Differentiate between Hadoop and MapReduce.
10. Point out the characteristics of Hadoop.