
Ministère de l’Enseignement Supérieur et de la Recherche Scientifique

Université de Carthage
Institut Supérieur des Technologies de l’Information et de la Communication

Class: LFSI-3                                   Number of pages: 8


Devoir Surveillé
Subject: BIG DATA
Instructor: Zayneb Trabelsi        Date: 09/03/2020        Duration: 1h30
Documents: authorized ☐    not authorized ☐
Last name: ……………………………………….    Group: ……………………………

First name: …………………………………..

Please answer the following multiple-choice questions:


1. Under MapReduce V1, what is one purpose of the TaskTracker?

A1. ☐ Coordinates MapReduce jobs

A2. ☒ Manages storage and transmission of intermediate output

A3. ☐ Accepts MapReduce jobs submitted by clients

2. What is the default number of replicas in a Hadoop system?

A1. ☐ two

A2. ☒ three

A3. ☐ four
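
For reference, the default replication factor comes from the dfs.replication property of the HDFS configuration; a minimal sketch of the corresponding hdfs-site.xml entry (3 is the shipped default):

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>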

3. Spark provides interactive shells in two programming languages. (Select two.)

A1. ☒ Scala

A2. ☒ Python

A3. ☐ R

A4. ☐ SQL
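
As an illustration, the two interactive shells referred to here are started from the command line (assuming Spark's bin directory is on the PATH):

spark-shell     # Scala shell
pyspark         # Python shell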

4. True or False: One of the driving principles of Hadoop is that the data is brought to the program.

A1. ☐ True

A2. ☒ False

5. Which database is a NoSQL key-value storage database?

A1. ☒ REDIS

A2. ☐ HBase

A3. ☐ Neo4j

6. Which of the following options is NOT correct?

A1. ☐ Big data solutions are ideal for analyzing not only raw structured data, but also semi-structured and unstructured data from a wide variety of sources.

A2. ☒ Big data solutions are ideal for Online Transaction Processing (OLTP) environments.

A3. ☐ Big data solutions are ideal for iterative and exploratory analysis when business measures on
data are not predetermined.

7. True or False: The CREATE statement in HBase requires the name of the table and the list of its columns.

a. ☐ True

b. ☒ False
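
The statement is false because the HBase shell's create command takes the table name and its column family names, not a list of columns; columns appear on the fly when data is written. A minimal sketch with hypothetical table, column family, and cell names:

create 'users', 'info'                       # table + column family only
put 'users', 'row1', 'info:name', 'Alice'    # column 'info:name' is created at write time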

8. What is NOT true regarding Hive External Tables?

a. ☐ data is stored outside the Hive warehouse directory

b. ☒ data is stored outside the Hadoop cluster

c. ☐ when dropping table, only the metadata is deleted, and data is left untouched

d. ☐ All of the above.
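
To illustrate the correct statements: an external table keeps its data at an HDFS location outside the Hive warehouse directory (but still inside the Hadoop cluster), and dropping it removes only the metadata. A sketch with hypothetical table and path names:

CREATE EXTERNAL TABLE logs (ts STRING, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/raw/logs';

DROP TABLE logs;   -- metadata removed, the files under /data/raw/logs are left untouched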

9. If the Active NameNode fails, which of the following nodes takes over all the responsibilities of the active node?

A1. ☐ Secondary NameNode

A2. ☒ Standby NameNode

A3. ☐ All of the above.

10. Which of the following statements is true about metadata?

A1. ☐ Metadata shows the structure of HDFS directories/files

A2. ☐ FsImage & EditLogs are metadata files

A3. ☐ Metadata contains information such as the number of blocks, their locations, and replicas

A4. ☒ All of the above
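
FsImage and EditLogs are files kept under the NameNode's metadata directory; the fsimage can be dumped to a readable form with the offline image viewer, e.g. (the fsimage file name here is hypothetical):

hdfs oiv -i fsimage_0000000000000000042 -p XML -o fsimage.xml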

11. The HDFS architecture in Hadoop originated from:

A1. ☒ Google distributed filesystem

A2. ☐ Yahoo distributed filesystem

A3. ☐ Facebook distributed filesystem

A4. ☐ None of the above

12. What does Big Data represent?

A1. ☐ A Hadoop feature capable of processing vast amounts of data in-parallel on large clusters of
commodity hardware in a reliable, fault-tolerant manner.

A2. ☒ A concept and platform of technologies with the characteristics of the 5 Vs that is able to handle
large amounts of unstructured, semi-structured, and structured raw data unlike traditional systems.

A3. ☐ A database feature capable of converting pre-existing structured data into unstructured raw
data.

A4. ☐ Only data stored in the BIGDATA table in any relational database.

13. Which is not a principle of Hadoop?

A1. ☐ Bring processing to Data

A2. ☐ Replicate data blocks to multiple nodes

A3. ☒ None of the above

14. Which is the 5th V that represents the real purpose of working with Big Data: obtaining business insight?

A1. ☐ Volume

A2. ☐ Variety

A3. ☒ Value

A4. ☐ None of the above

15. What is Hadoop?

A1. ☒ An open-source software framework for distributed storage and distributed processing of Big
Data on clusters of commodity hardware.

A2. ☐ It was conceived for high-end, expensive hardware.

A3. ☐ An environment for on-line transaction processing.

A4. ☐ It consists of 3 sub projects: MapReduce, Hive and Hadoop Common.

A5. ☐ All of the above

16. What are the main components of Hadoop? (Select four.)

A1. ☒ MapReduce

A2. ☐ HBase

A3. ☒ Hadoop Distributed File System

A4. ☒ YARN

A5. ☒ Hadoop Common

17. What is NOT true about Hadoop Distributed File System (HDFS)?

A1. ☐ Can create, delete, copy but not update

A2. ☐ Files split into blocks

A3. ☐ Data access through MapReduce

A4. ☒ Designed for random access not streaming reads

18. What is NOT true about the NameNode startup?

A1. ☐ NameNode reads fsimage in memory

A2. ☐ NameNode applies editlog changes

A3. ☐ NameNode exits safemode when 99.9% of blocks have at least one copy accounted for.

A4. ☒ NameNode stores data blocks

A5. ☐ NameNode waits for block data from data nodes

19. What is an optional MapReduce task?

A1. ☐ Map

A2. ☐ Shuffle

A3. ☐ Reduce

A4. ☒ Combiner

20. Which is the Hadoop-related Apache project that utilizes an in-memory architecture to run
applications faster than MapReduce?
A1. ☐ Hive

A2. ☐ Python

A3. ☒ Spark

A4. ☐ Pig

21. Apache Spark can run on which two of the following cluster managers? (Select two.)

A1. ☒ Apache Mesos

A2. ☒ Hadoop YARN

A3. ☐ Linux cluster manager

A4. ☐ onesies
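
For instance, the cluster manager is chosen through the --master option of spark-submit; a sketch with a hypothetical application jar, main class, and Mesos master address:

spark-submit --class com.example.App --master yarn --deploy-mode cluster app.jar
spark-submit --class com.example.App --master mesos://mesos-master:5050 app.jar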

22. Which Hadoop command is used to copy files from HDFS to the local file system?

a. ☐ hadoop fs -put

b. ☒ hadoop fs -get

c. ☐ hadoop copyfromlocal
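
As a reminder of the direction of each command, -get copies from HDFS to the local file system and -put copies from the local file system to HDFS; a sketch with hypothetical paths:

hadoop fs -get Gutenberg/Pride_and_Prejudice.txt /tmp/
hadoop fs -put /tmp/Pride_and_Prejudice.txt Gutenberg/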

23. What is NOT true about HDFS-2 NameNode HA?

A1. ☐ Memory state of the Standby NameNode is very close to that of the Active NameNode

A2. ☒ DataNodes send heartbeats to the Active NameNode only

A3. ☐ Unlike the Secondary NameNode, the Standby NameNode allows a fast failover to a new
NameNode in the case that a machine crashes

A4. ☐ None of the above

24. True or False: Hive comes with an HBase storage handler.

A1. ☒ True

A2. ☐ False
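
Hive ships with org.apache.hadoop.hive.hbase.HBaseStorageHandler, which lets a Hive table be backed by an HBase table; a minimal sketch with hypothetical table and column names:

CREATE TABLE hbase_users (key STRING, name STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,info:name')
TBLPROPERTIES ('hbase.table.name' = 'users');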

25. Which clause indicates the storage file/record format on HDFS?

A1. ☒ stored as

A2. ☐ stored in

A3. ☐ stored to

A4. ☐ None of the above
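
The STORED AS clause appears at the end of a CREATE TABLE statement; a sketch with hypothetical names (ORC could equally be TEXTFILE, SEQUENCEFILE, PARQUET, ...):

CREATE TABLE events (id INT, payload STRING)
STORED AS ORC;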

26. Which two factors in a Hadoop cluster increase performance most significantly? (Select
two.)

A1. ☐ immediate failover of failed disks

A2. ☐ large number of small data files

A3. ☒ high-speed networking between nodes

A4. ☐ data redundancy on management nodes

A5. ☒ parallel reading of large data files

27. What is not true about HBase?

A1. ☐ An industry leading implementation of Google's Big Table design

A2. ☐ An open source Apache Top Level Project

A3. ☒ An RDBMS that powers some of the leading sites on the Web

A4. ☐ A NoSQL data store

A5. ☐ None of the above

28. Under the HDFS architecture, what is one purpose of the NameNode?

A1. ☐ to coordinate MapReduce jobs

A2. ☐ to periodically report status to DataNodes

A3. ☒ to regulate client access to files

29. What is the "scan" command used for in HBase?


a. ☐ to get detailed information about the table
b. ☐ to report any inconsistencies in the database
c. ☐ to list all tables in HBase
d. ☒ to view data in an HBase table
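
For example, in the HBase shell (hypothetical table name):

scan 'users'                    # view every row of the table
scan 'users', {LIMIT => 5}      # view only the first five rows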

30. Hadoop environments are optimized for:

A1. ☐ Processing transactions (random access).

A2. ☐ Low latency data access.

A3. ☒ Batch processing on large files.

A4. ☐ Intensive calculation with little data.

31. How is data stored in a Hadoop cluster?

A1. ☒ The data is divided into blocks, and copies of these blocks are replicated across multiple servers
in the Hadoop cluster.

A2. ☐ The data is converted into a single block, and the block is stored in just one of the servers in the Hadoop cluster.

A3. ☐ The data is divided into blocks, each block is stored in a different server in the Hadoop cluster,
but the blocks are not replicated.

A4. ☐ The data is converted into a single block, and copies of this block are replicated across multiple
servers in the Hadoop cluster.
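
The block split and the placement of each replica can be checked with fsck; a sketch with a hypothetical HDFS path:

hdfs fsck /user/student/Gutenberg/Pride_and_Prejudice.txt -files -blocks -locations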

32. Which of the following options best describes the proper usage of MapReduce jobs in
Hadoop environments?

A1. ☒ MapReduce jobs are used to process vast amounts of data in-parallel on large clusters of
commodity hardware in a reliable, fault-tolerant manner.

A2. ☐ MapReduce jobs are used to process small amounts of data in-parallel on expensive hardware,
without fault-tolerance.

A3. ☐ MapReduce jobs are used to process structured data in sequence, with fault-tolerance.

33. In a traditional Hadoop stack, which of the following components provides data warehouse
infrastructure and allows SQL developers and business analysts to leverage their existing SQL
skills?

A1. ☐ MapReduce.

A2. ☒ Hive.

A3. ☐ Zookeeper.
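
Hive lets SQL developers express such analyses directly in SQL, which it translates into jobs on the cluster; a sketch with hypothetical table and column names:

SELECT country, COUNT(*) AS nb_visits
FROM web_logs
GROUP BY country;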

34. Following the most common HDFS replica placement policy, when the replication factor is
three, how many replicas will be located on the local rack?

A1. ☐ two

A2. ☒ one

A3. ☐ three

35. Which primary computing bottleneck of modern computers is addressed by Hadoop?

A1. ☐ MIPS

A2. ☒ disk latency

A3. ☐ limited disk capacity

36. What are two primary limitations of MapReduce v1? (Select two.)
A1. ☒ Resource utilization

A2. ☒ Scalability

A3. ☐ Workloads limited to MapReduce

A4. ☐ Number of TaskTrackers limited to 1000

A5. ☐ TaskTrackers can be a bottleneck to MapReduce jobs

37. In Spark, to create an RDD by reading in a file that was previously loaded to HDFS, we type the following:
val pp = sc.textFile("Gutenberg/Pride_and_Prejudice.txt")

True or False: “pp” is just a pointer to the RDD

a. ☒ True

b. ☐ False
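
Calling textFile does not read any data; pp only points to the RDD and its lineage, and the file is actually read when an action is executed. A sketch continuing the statement above (the filter condition is hypothetical):

val pp = sc.textFile("Gutenberg/Pride_and_Prejudice.txt")   // lazy: nothing read yet
val darcy = pp.filter(line => line.contains("Darcy"))       // transformation, still lazy
println(darcy.count())                                      // action: triggers the actual read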

38. Which HBase command lists individual row data?

a. ☐ scan

b. ☐ show all

c. ☒ get

d. ☐ All of the above.
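
For example, in the HBase shell (hypothetical table and row key):

get 'users', 'row1'                # returns the cells of that single row
get 'users', 'row1', 'info:name'   # restricts the result to one column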

39. Which command must sometimes be executed before deleting an HBase table or changing its settings?

a. ☐ alter

b. ☒ disable

c. ☐ None of the above
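
Destructive or structural operations require the table to be offline first, which is what disable does; a sketch with a hypothetical table name (older HBase versions also required disabling before alter, hence "sometimes"):

disable 'users'
drop 'users'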

40. The following command is executed in HDFS.


hadoop fs -copyFromLocal /home/student/labfiles/*.txt Gutenberg

This command can also be executed as:


a. ☒ hadoop fs -put /home/student/labfiles/*.txt Gutenberg

b. ☐ hadoop fs -get /home/student/labfiles/*.txt Gutenberg

c. ☐ None of the above
