
BIG DATA ANALYTICS

Unit Structure
2.0 Objectives
2.1 Introduction to big data analytics
2.2 Classification of Analytics
2.3 Challenges of Big Data
2.4 Importance of Big Data
2.5 Big Data Technologies
2.6 Data Science
2.7 Responsibilities
2.8 Soft state eventual consistency
2.9 Data Analytics Life Cycle
Summary
Review Questions

2.0 OBJECTIVES

Big Data is creating significant new opportunities for organizations to derive new value and create competitive advantage from their most valuable asset: information. For businesses, Big Data helps drive efficiency, quality, and personalized products and services, producing improved levels of customer satisfaction and profit. For scientific efforts, Big Data analytics enable new avenues of investigation with potentially richer results and deeper insights than previously available. In many cases, Big Data analytics integrate structured and unstructured data with real-time feeds and queries, opening new paths to innovation and insight.

2.1 INTRODUCTION TO BIG DATA ANALYTICS


Big Data Analytics is...
1. Technology-enabled analytics: Quite a few data analytics and visualization tools are available in the market today from leading vendors such as IBM, Tableau, SAS, R Analytics, Statistica, World Programming Systems (WPS), etc. to help process and analyze your big data.

2. About gaining a meaningful, deeper, and richer insight into your business to steer it in the right direction: understanding the customer's demographics to cross-sell and up-sell to them, better leveraging the services of your vendors and suppliers, etc.

3. About a competitive edge over your competitors by enabling you with findings that allow quicker and better decision-making.

4. A tight handshake between three communities: IT, business users, and data scientists. Refer Figure 3.3.

5. Working with datasets whose volume and variety exceed the current
storage and processing capabilities and infrastructure of your
enterprise.

6. About moving code to data. This makes perfect sense as the program for distributed processing is tiny (just a few KBs) compared to the data (terabytes or petabytes today, and likely to be exabytes or zettabytes in the near future).
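To see why shipping code to data makes sense, here is a minimal, hypothetical sketch of a Hadoop-Streaming-style word-count mapper in Python: the whole program is a few hundred bytes, while the data blocks it runs against on each node may add up to terabytes.

    # Illustrative sketch only: a Hadoop-Streaming-style word-count mapper.
    # This tiny script is what gets shipped to the nodes holding the data blocks.
    import sys

    def main():
        for line in sys.stdin:                    # each node reads its local block
            for word in line.split():
                sys.stdout.write(word + "\t1\n")  # emit (word, 1); reducers sum per key

    if __name__ == "__main__":
        main()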

2.2 CLASSIFICATION OF ANALYTICS

There are basically two schools of thought:


1. Those that classify analytics into basic, operationalized, advanced and monetized analytics.
2. Those that classify analytics into analytics 1.0, analytics 2.0, and analytics 3.0.

2.2.1. First School of Thought

It includes Basic analytics, Operationalized analytics, Advanced analytics and Monetized analytics.

Basic analytics: This primarily is slicing and dicing of data to help with
basic business insights. This is about reporting on historical data, basic
visualization, etc.

Operationalized analytics: It is operationalized analytics if it gets woven into the enterprise's business processes.
Advanced analytics: This largely is about forecasting for the future by way of predictive and prescriptive modelling.
Monetized analytics: This is analytics in use to derive direct business revenue.

2.2.2 Second School of Thought:


Let us take a closer look at analytics 1.0, analytics 2.0, and analytics 3.0. Refer Table 2.1. Figure 2.1 shows the subtle growth of analytics from Descriptive 🡪 Diagnostic 🡪 Predictive 🡪 Prescriptive analytics.

Analytics 1.0
• Era: mid-1990s to 2009.
• Descriptive statistics (report on events, occurrences, etc. of the past).
• Key questions asked: What happened? Why did it happen?
• Data from legacy systems, ERP, CRM, and 3rd-party applications.
• Small and structured data sources. Data stored in enterprise data warehouses or data marts.
• Data was internally sourced.
• Relational databases.

Analytics 2.0
• Era: 2005 to 2012.
• Descriptive statistics + predictive statistics (use data from the past to make predictions for the future).
• Key questions asked: What happened? Why will it happen?
• Big data.
• Big data is being taken up seriously. Data is mainly unstructured, arriving at a much higher pace. This fast flow of data entailed that the influx of big-volume data had to be stored and processed rapidly, often on massively parallel servers running Hadoop.
• Data was often externally sourced.
• Database appliances, Hadoop clusters, SQL-to-Hadoop environments, etc.

Analytics 3.0
• Era: 2012 to present.
• Descriptive + predictive + prescriptive statistics (use data from the past to make prophecies for the future and at the same time make recommendations to leverage the situation to one's advantage).
• Key questions asked: What will happen? When will it happen? Why will it happen? What should be the action taken to take advantage of what will happen?
• A blend of big data and data from legacy systems, ERP, CRM, and 3rd-party applications.
• A blend of big data and traditional analytics to yield insights and offerings with speed and impact.
• Data is both internally and externally sourced.
• In-memory analytics, in-database processing, agile analytical methods, machine learning techniques, etc.

Table 2.1 Analytics 1.0, 2.0 and 3.0 (Big Data and Analytics)
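As a toy illustration of the progression in Table 2.1, the hypothetical Python sketch below computes a descriptive summary of past sales, fits a simple linear trend as a stand-in for predictive statistics, and applies a naive rule as a stand-in for a prescriptive recommendation (all figures and thresholds are made up):

    import numpy as np

    sales = np.array([120, 132, 128, 141, 150, 158.0])    # made-up monthly sales

    # Descriptive: report on what happened in the past
    print("mean:", sales.mean(), "max:", sales.max())

    # Predictive: fit a linear trend and forecast the next month
    months = np.arange(len(sales))
    slope, intercept = np.polyfit(months, sales, 1)
    forecast = slope * len(sales) + intercept
    print("forecast for next month:", round(forecast, 1))

    # Prescriptive (naive stand-in): recommend an action based on the forecast
    threshold = 155                                        # hypothetical stocking threshold
    print("action:", "increase stock" if forecast > threshold else "hold stock")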

2.3 CHALLENGES OF BIG DATA

There are mainly seven challenges of big data: scale, security, schema, continuous availability, consistency, partition tolerance and data quality.

Scale: Storage (RDBMS (Relational Database Management System) or NoSQL (Not only SQL)) is one major concern that needs to be addressed to handle the need for scaling rapidly and elastically. The need of the hour is storage that can best withstand the onslaught of large volume, velocity and variety of big data. Should you scale vertically or should you scale horizontally?

Security: Most of the NoSQL big data platforms have poor security mechanisms (lack of proper authentication and authorization mechanisms) when it comes to safeguarding big data. This cannot be ignored given that big data carries credit card information, personal information and other sensitive data.

Schema: Rigid schemas have no place. We want the technology to be able to fit our big data and not the other way around. The need of the hour is a dynamic schema; static (pre-defined) schemas are obsolete.
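As a minimal illustration of what a dynamic schema allows (plain Python dictionaries used as a stand-in for a document store such as MongoDB), two records with different fields can live side by side in the same collection without any table definition changing:

    # Hypothetical "collection": a list standing in for a schema-less document store
    collection = []

    # Two records with different shapes coexist; no predefined schema is required
    collection.append({"user": "alice", "email": "alice@example.com"})
    collection.append({"user": "bob", "clicks": 42, "last_login": "2016-05-01"})

    for doc in collection:
        print(doc.get("user"), "->", sorted(doc.keys()))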

Continuous availability: The big question here is how to provide 24/7 support, because almost all RDBMS and NoSQL big data platforms have a certain amount of downtime built in.

Consistency: Should one opt for consistency or eventual consistency?


Partition tolerance: How to build partition-tolerant systems that can take care of both hardware and software failures?

Data quality: How to maintain data quality - data accuracy, completeness, timeliness, etc.? Do we have appropriate metadata in place?
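A minimal, hypothetical sketch of the kind of data-quality checks these questions imply (the field names, dates and thresholds are all made up):

    from datetime import date

    records = [
        {"id": 1, "amount": 120.0, "updated": date(2016, 5, 1)},
        {"id": 2, "amount": None,  "updated": date(2014, 1, 10)},   # incomplete and stale
    ]

    # Completeness: what fraction of records has all required fields populated?
    complete = sum(1 for r in records if r["amount"] is not None)
    print("completeness:", complete / len(records))

    # Timeliness: flag records not updated within a (made-up) two-year window
    stale = [r["id"] for r in records if (date.today() - r["updated"]).days > 730]
    print("stale record ids:", stale)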

2.4 IMPORTANCE OF BIG DATA

Let us study the various approaches to the analysis of data and what they lead to.
Reactive - Business Intelligence: What does Business Intelligence (BI) help us with? It allows businesses to make faster and better decisions by providing the right information to the right person at the right time in the right format. It is about analysis of past or historical data and then displaying the findings of the analysis or reports in the form of enterprise dashboards, alerts, notifications, etc. It has support for both pre-specified reports as well as ad hoc querying.

Reactive - Big Data Analytics: Here the analysis is done on huge datasets, but the approach is still reactive as it is still based on static data.

Proactive - Analytics: This is to support futuristic decision making by the use of data mining, predictive modelling, text mining, and statistical analysis. This analysis is not on big data; it still uses traditional database management practices and therefore has severe limitations on storage capacity and processing capability.

Proactive - Big Data Analytics: This is sifting through terabytes, petabytes, even exabytes of information to filter out the relevant data to analyze. This also includes high-performance analytics to gain rapid insights from big data and the ability to solve complex problems using more data.

2.5 BIG DATA TECHNOLOGIES

The following are the technology requirements for meeting the challenges of big data:
• The first requirement is cheap and ample storage.
• We need faster processors to help with quicker processing of big data.
• Affordable open-source, distributed big data platforms, such as Hadoop.
• Parallel processing, clustering, virtualization, large grid environments (to distribute processing to a number of machines), high connectivity, and high throughput (the rate at which something is processed).
• Cloud computing and other flexible resource allocation arrangements.

2.6 DATA SCIENCE

Data science is the science of extracting knowledge from data. In other words, it is the science of drawing out hidden patterns in data using statistical and mathematical techniques.

It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics and information technology, including machine learning, data engineering, probability models, statistical learning, pattern recognition and learning, etc.

A data scientist works on massive datasets for weather prediction, oil drilling, earthquake prediction, financial fraud, terrorist networks and activities, global economic impacts, sensor logs, social media analytics, customer churn, collaborative filtering (predicting users' interests), regression analysis, etc. Data science is multi-disciplinary. Refer to Figure 2.2.

Figure 2.2 Data Scientist (Big Data and Analytics)
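Collaborative filtering, mentioned above, is easy to sketch: the hypothetical example below predicts one user's rating for an unseen item from the ratings of similar users (the matrix and the similarity measure are illustrative choices, not a prescribed method):

    import numpy as np

    # Made-up user x item ratings matrix (0 = not yet rated)
    ratings = np.array([
        [5, 4, 0, 1],
        [4, 5, 1, 0],
        [1, 0, 5, 4],
    ], dtype=float)

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Predict user 0's rating for item 2 from users who did rate it,
    # weighting each neighbour's rating by its similarity to user 0
    target_user, target_item = 0, 2
    sims, weighted = [], []
    for other in range(ratings.shape[0]):
        if other != target_user and ratings[other, target_item] > 0:
            s = cosine(ratings[target_user], ratings[other])
            sims.append(s)
            weighted.append(s * ratings[other, target_item])

    print("predicted rating:", round(sum(weighted) / sum(sims), 2))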

2.6.1 Business Acumen (Expertise) Skills:

A data scientist should have the following abilities to play the role of a data scientist:
• Understanding of domain
• Business strategy
• Problem solving
• Communication
• Presentation
• Keenness

2.6.2 Technology Expertise:


The following skills are required as far as technical expertise is concerned:
• Good database knowledge such as RDBMS.
• Good NoSQL database knowledge such as MongoDB, Cassandra,
HBase, etc.
• Programming languages such as Java, Python, C++, etc.
• Open-source tools such as Hadoop.
• Data warehousing.
• Data mining
• Visualization such as Tableau, Flare, Google visualization APIs, etc.

2.6.3 Mathematics Expertise:

The following are the key skills that a data scientist must have to comprehend, interpret and analyze data.
• Mathematics.
• Statistics.
• Artificial Intelligence (AI).
• Algorithms.
• Machine learning.
• Pattern recognition.
• Natural Language Processing.
To sum it up, the data science process is as follows (a minimal end-to-end sketch appears after the list):
• Collecting raw data from multiple different data sources.
• Processing the data.
• Integrating the data and preparing clean datasets.
• Engaging in explorative data analysis using model and algorithms.
• Preparing presentations using data visualizations.
• Communicating the findings to all stakeholders.
• Making faster and better decisions.
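A minimal, hypothetical Python sketch of that end-to-end process (the file name, column names and the "model" are assumptions made purely for illustration):

    import csv
    from statistics import mean

    # 1. Collect raw data from a (hypothetical) source file with columns month, units
    with open("sales_raw.csv", newline="") as f:
        raw = list(csv.DictReader(f))

    # 2-3. Process and integrate the data, preparing a clean dataset
    clean = [(int(r["month"]), float(r["units"])) for r in raw if r["units"]]

    # 4. Explorative analysis with a very simple model (the mean as a naive forecast)
    forecast = mean(units for _, units in clean)

    # 5-6. Prepare and communicate the findings (a report line standing in for a chart)
    print(f"{len(clean)} clean rows; naive forecast for next period: {forecast:.1f}")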

2.7 RESPONSIBILITIES

Refer to Figure 2.3 to understand the responsibilities of a data scientist.

Data Management: A data scientist employs several approaches to develop the relevant datasets for analysis. Raw data is just "RAW", unsuitable for analysis. The data scientist works on it to prepare it to reflect the relationships and contexts. This data then becomes useful for processing and further analysis.

Analytical Techniques: Depending on the business questions which we are trying to find answers to and the type of data available at hand, the data scientist employs a blend of analytical techniques to develop models and algorithms to understand the data, interpret relationships, spot trends, and reveal patterns.

Business Analysis: A data scientist is a business analyst who distinguishes cool facts from insights and is able to apply his business expertise and domain knowledge to see the results in the business context.

Figure 2.3 Data scientist: your new best friend!!!
(Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data)

Communicator: He is a good presenter and communicator who is able to communicate the results of his findings in a language that is understood by the different business stakeholders.

2.8 SOFT STATE EVENTUAL CONSISTENCY

ACID property in RDBMS:


Atomicity: Either the task (or all tasks) within a transaction is performed or none of them are. This is the all-or-none principle. If one element of a transaction fails, the entire transaction fails.

Consistency: The transaction must meet all protocols or rules defined by the system at all times. The transaction does not violate those protocols, and the database must remain in a consistent state at the beginning and end of a transaction; there are never any half-completed transactions.

Isolation: No transaction has access to any other transaction that is in an intermediate or unfinished state. Thus, each transaction is independent unto itself. This is required for both performance and consistency of transactions within a database.

Durability: Once the transaction is complete, it will persist as complete and cannot be undone; it will survive system failure, power loss and other types of system breakdowns.
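To make the ACID guarantees concrete, here is a small illustrative sketch using Python's built-in sqlite3 module as a stand-in for any RDBMS (the table and the amounts are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
    conn.commit()

    try:
        # Both updates form one transaction: either both persist or neither does
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
        conn.commit()       # durability: once committed, the transfer persists
    except sqlite3.Error:
        conn.rollback()     # atomicity: on any failure, undo the partial work

    print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())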

BASE (Basically Available, Soft state, Eventual consistency): In a system where BASE is the prime requirement for reliability, the activity/potential (pH) of the data changes; it essentially slows down.

Basically Available: This constraint states that the system does guarantee the availability of the data as regards the CAP theorem; there will be a response to any request. But that response could still be 'failure' to obtain the requested data, or the data may be in an inconsistent or changing state, much like waiting for a check to clear in your bank account.

Eventual consistency: The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere it should sooner or later, but the system will continue to receive input and is not checking the consistency of every transaction before it moves on to the next one. Werner Vogels' article “Eventually Consistent – Revisited” covers this topic in much greater detail.

Soft state: The state of the system could change over time, so even
during times without input there may be changes going on due to
‘eventual consistency,’ thus the state of the system is always ‘soft.’
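A toy simulation of soft state and eventual consistency (the node names and the propagation model are invented purely for illustration): a write is accepted by one replica immediately, a read from the other replica briefly returns stale data, and once input stops and propagation completes the replicas converge.

    replicas = {"node_a": {}, "node_b": {}}    # two hypothetical replicas
    pending = []                               # writes not yet propagated

    def write(key, value):
        replicas["node_a"][key] = value        # accepted immediately (basically available)
        pending.append((key, value))           # propagation to node_b happens later

    def read(node, key):
        return replicas[node].get(key)         # may be stale (soft state)

    def propagate():
        while pending:                         # once input stops, replicas converge
            key, value = pending.pop(0)
            replicas["node_b"][key] = value

    write("cart:42", ["book"])
    print(read("node_b", "cart:42"))    # None - a stale read before propagation
    propagate()
    print(read("node_b", "cart:42"))    # ['book'] - eventually consistent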

2.9 DATA ANALYTICS LIFE CYCLE

Here is a brief overview of the main phases of the Data Analytics Lifecycle:

Phase 1 - Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which they can learn. The team assesses the resources available to support the project in terms of people, technology, time and data. Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.

Phase 2 - Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project. The team needs to execute extract, load, and transform (ELT) or extract, transform and load (ETL) to get data into the sandbox. ELT and ETL are sometimes abbreviated as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.
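A minimal sketch of the ETL step into an analytic sandbox (the source file name, its columns, and the in-memory SQLite database standing in for the sandbox are all assumptions for illustration):

    import csv
    import sqlite3

    sandbox = sqlite3.connect(":memory:")      # stand-in for the analytic sandbox
    sandbox.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")

    # Extract from a hypothetical export file, transform, then load into the sandbox
    with open("orders_export.csv", newline="") as f:
        for row in csv.DictReader(f):          # expected columns: order_id, amount
            if row["amount"]:                  # transform: drop rows with missing amounts
                sandbox.execute("INSERT INTO orders VALUES (?, ?)",
                                (int(row["order_id"]), float(row["amount"])))
    sandbox.commit()
    print(sandbox.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())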

Phase 3 - Model planning: Phase 3 is model planning, where the team determines the methods, techniques and workflow it intends to follow for the subsequent model building phase. The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.

Figure 2.4 - Overview of the Data Analytics Lifecycle


(Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data)
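Exploring relationships between variables in Phase 3 can be as simple as checking each candidate variable's correlation with the target before shortlisting key variables. A hypothetical sketch (all observations are made up):

    import numpy as np

    # Made-up observations for three candidate variables and the target
    ad_spend = np.array([10, 12, 15, 18, 20.0])
    visits   = np.array([200, 210, 260, 300, 310.0])
    temp     = np.array([21, 25, 19, 23, 22.0])
    sales    = np.array([100, 110, 140, 160, 170.0])

    # Inspect each variable's correlation with the target to shortlist key variables
    for name, var in [("ad_spend", ad_spend), ("visits", visits), ("temp", temp)]:
        r = np.corrcoef(var, sales)[0, 1]
        print(f"{name}: r = {r:.2f}")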

Phase 4 - Model building: In Phase 4, the team develops data sets for
testing, training, and production purposes. In addition, in this phase the
team builds and executes models based on the work done in the model
planning phase. The team also considers whether its existing tools will
suffice for running the models, or if it will need a more robust
environment for executing models and workflows (for example, fast
hardware and parallel processing, if applicable).
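A minimal sketch of developing training and testing data sets before model building (plain Python; the data and the 80/20 split ratio are arbitrary illustrative choices):

    import random

    random.seed(42)
    data = [(x, 2 * x + random.gauss(0, 1)) for x in range(100)]   # made-up (x, y) pairs

    random.shuffle(data)
    split = int(0.8 * len(data))           # 80/20 train/test split
    train, test = data[:split], data[split:]

    print(len(train), "training rows,", len(test), "test rows")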

Phase 5 - Communicate results: In Phase 5, the team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1. The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.

Phase 6 - Operationalize: In Phase 6, the team delivers final reports, briefings, code and technical documents. In addition, the team may run a pilot project to implement the models in a production environment.

