IT6006-Data Analytics Department of CSE 2018-2019
St. Joseph’s College of Engineering
[Figure: diagram with labels Outlook, Temperature, Humidity, Windy, Play]
UNIT I INTRODUCTION TO BIG DATA
Introduction to Big Data Platform – Challenges of conventional systems – Web data –
Evolution of Analytic scalability, analytic processes and tools, Analysis vs reporting –
Modern data analytic tools, Statistical concepts: Sampling distributions, resampling,
statistical inference, prediction error.
UNIT-I / PART-A
1. What are the Three Vs of big data? (Nov/Dec 2016)
The 3Vs (volume, variety and velocity) are the three defining properties or dimensions of
big data. Volume refers to the amount of data, variety to the number of types of data,
and velocity to the speed at which data is generated and processed.
2. What are the sources of Big Data?
•Human Generated Data
◦Email
◦Social networks
◦Cloud storage hosted information
◦Electronic publications (scientific, engineering, social, administrative)
◦Enterprise web pages
◦Product documents
◦Legacy documents presented in electronic form
◦Online videos (TV, YouTube)
◦Photos
•Machine Generated Data
3. What do we do with Big Data?
•Extract useful patterns from Big Data and put them to use, for example:
•Short Case Study 1: Amazon’s personalized product recommendations
•Short Case Study 2: Facebook’s personalized product marketing
•Short Case Study 3: Google search, gaining a competitive edge over Wikipedia.
4. List some Big Data Platforms.
Hadoop
•Supports a large Hadoop Distributed File System (HDFS)
•Supports a batch-oriented distributed computing architecture with MapReduce
•HDFS and MapReduce are the two core components of Hadoop
•Apache Hadoop is open source and suited to batch processing
•Not suitable for ad hoc queries (processing)
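The batch-oriented MapReduce model mentioned above can be illustrated with a minimal single-machine sketch. This is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration, and a real Hadoop job would distribute these phases across the nodes of a cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical sample input, for illustration only.
counts = word_count(["big data big", "data analytics"])
print(counts)  # {'big': 2, 'data': 2, 'analytics': 1}
```

Each phase here only sees its own input (one line, or one key with its values), which is what allows the same logic to be spread over many machines in a batch job.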
5. What are the other alternative application development languages?
The alternative is to use one of the MapReduce application development
languages. Among these languages, Pig, Hive and Jaql are popular ones.
6. What is Hive?
Hive is one of the programming languages that support MapReduce application
development. It was developed at Facebook. Its query language is the Hive Query
Language (HQL). Hive queries are broken down into MapReduce jobs and executed
across a Hadoop cluster.
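As a rough illustration of how a query can be broken down into map and reduce steps, the sketch below simulates one aggregation query in plain Python. The query text, the `employees` table and the helper names are hypothetical, and the real Hive query planner is considerably more involved than this.

```python
from collections import defaultdict

# Hypothetical query, for illustration only:
#   SELECT dept, COUNT(*) FROM employees WHERE salary > 50000 GROUP BY dept;
# The rows below stand in for the assumed `employees` table.
employees = [
    {"name": "A", "dept": "CSE", "salary": 60000},
    {"name": "B", "dept": "CSE", "salary": 40000},
    {"name": "C", "dept": "IT", "salary": 70000},
    {"name": "D", "dept": "CSE", "salary": 80000},
]

def map_row(row):
    """Map side: apply the WHERE filter and emit (GROUP BY key, 1)."""
    if row["salary"] > 50000:
        yield row["dept"], 1

def reduce_group(dept, ones):
    """Reduce side: compute COUNT(*) for one GROUP BY key."""
    return dept, sum(ones)

def run_query(rows):
    """Group the mapped pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_row(row):
            groups[key].append(value)
    return dict(reduce_group(k, v) for k, v in groups.items())

result = run_query(employees)
print(result)  # {'CSE': 2, 'IT': 1}
```

In this sketch the filter runs on the map side, so rows that fail the condition are never passed on to the grouping step.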
7. What are the risks of BIG DATA?
•An organization may be so overwhelmed by big data that it makes no progress.
•Costs may escalate too fast because too much big data is captured before the
organization knows what to do with it. As with anything, avoiding this is a
matter of making sure that progress moves at a pace that allows the organization
to keep up.
•The biggest risk with many sources of big data is privacy.
8. Draw the big data taxonomy.