Introduction to Business Analytics

Outline
- Big Data Drivers
- Challenges of Analytics
- Types of Business Analytics
- Benefits of Business Analytics
- Careers in Big Data Analytics
- Benefits of Big Data Analytics
What's driving Big Data

The shift is away from traditional analytics:
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets

and toward big data analytics:
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time processing
The Big-Data Challenge

"Everywhere you look, the quantity of information in the world is soaring. Merely keeping up with this flood, and storing the bits that might be useful, is difficult enough. Analyzing it, to spot patterns and extract useful information, is harder still."
The Economist, "The Data Deluge", 2/10/2010

Gartner says:
- "The Big Data Challenge Involves More Than Just Managing Volumes of Data"
- "The real issue is making sense out of the data and (...) helping organizations make better decisions."
Definition of Insights

"Insights are thoughts, facts, data, or analysis of facts and data that induce meaning and further understanding of a business challenge and answer essential questions and create an urgency to act or rethink a business challenge in terms of its problems or solutions."
Value of Big Data Analytics

- Big data is more real-time in nature than traditional DW applications
- Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps
- Shared-nothing, massively parallel processing, scale-out architectures are well-suited for big data apps
Predictive Analytics

Prescriptive Analytics
- Big problem: understanding the output

What Technology Do We Have for Big Data?
Why is Hadoop able to compete?

Hadoop:
- Scalability (petabytes of data, thousands of machines)
- Flexibility in accepting all data formats (no schema)
- Efficient and simple fault-tolerance mechanism
- Commodity, inexpensive hardware

vs. Database:
- Performance (tons of indexing, tuning, and data-organization techniques)
- Features:
  - Provenance tracking
  - Annotation management
  - ...
What is Hadoop

- Hadoop is a software framework for distributed processing of large datasets across large clusters of computers
  - Large datasets: terabytes or petabytes of data
  - Large clusters: hundreds or thousands of nodes
- Hadoop is an open-source implementation of Google's MapReduce
- Hadoop is based on a simple programming model called MapReduce (see the sketch after this list)
- Hadoop is based on a simple data model: any data will fit
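As the MapReduce bullet above suggests, the programming model boils down to writing a map function and a reduce function. Below is a minimal word-count sketch in Java against the standard org.apache.hadoop.mapreduce API; the class names (WordCount, TokenizerMapper, IntSumReducer) and the choice of word count as the job are illustrative, not taken from these slides.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

  // Map phase: called once per input record (here, one line of text).
  // Emits (word, 1) for every token in the line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);   // the framework shuffles these pairs by key
      }
    }
  }

  // Reduce phase: called once per distinct key with all of its values.
  // Sums the counts emitted by the mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);   // emits (word, total count)
    }
  }
}
```

Each mapper runs on one split of the input in parallel; the framework groups the emitted (word, 1) pairs by key and hands each group to a reducer, which is what makes the model automatically parallelizable.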
What is Hadoop (Cont'd)

- The Hadoop framework consists of two main layers (a small HDFS usage sketch follows this list):
  - Distributed file system (HDFS)
  - Execution engine (MapReduce)
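As a small illustration of the HDFS layer, the sketch below writes and reads one file through the standard org.apache.hadoop.fs.FileSystem API. It assumes a reachable cluster configured via the usual core-site.xml/hdfs-site.xml files; the path /user/demo/sample.txt and the class name HdfsHello are made up for the example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();       // reads core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);           // connects to the configured file system

    Path path = new Path("/user/demo/sample.txt");  // hypothetical path, for illustration only

    // Write a small file; HDFS splits files into blocks and replicates
    // each block across several slave (DataNode) machines.
    try (FSDataOutputStream out = fs.create(path, true)) {
      out.writeUTF("hello hadoop");
    }

    // Read it back; the client fetches each block from whichever node stores it.
    try (FSDataInputStream in = fs.open(path)) {
      System.out.println(in.readUTF());
    }

    fs.close();
  }
}
```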
Hadoop Master/Slave Architecture

- Hadoop is designed as a master-slave, shared-nothing architecture
  - Master node (single node)
  - Many slave nodes

Design Principles of Hadoop

- Need to process big data
- Need to parallelize computation across thousands of nodes
- Commodity hardware
  - Large number of low-end, cheap machines working in parallel to solve a computing problem
  - This is in contrast to parallel DBs: a small number of high-end, expensive machines
Design Principles of Hadoop (Cont'd)

- Automatic parallelization and distribution
  - Hidden from the end user
- Fault tolerance and automatic recovery
  - Nodes/tasks will fail and will recover automatically
- Clean and simple programming abstraction
  - Users only provide two functions, "map" and "reduce" (see the driver sketch after this list)
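To make the last bullet concrete, here is a sketch of a job driver that supplies only the map and reduce classes from the earlier word-count sketch; parallelization, distribution, and recovery from failed tasks are left to the framework, as the slides describe. The command-line input and output paths are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountDriver.class);

    // The user-supplied pieces: just the map and reduce functions.
    job.setMapperClass(WordCount.TokenizerMapper.class);
    job.setReducerClass(WordCount.IntSumReducer.class);

    // Types of the final (key, value) pairs written by the reducer.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Illustrative paths: args[0] = HDFS input dir, args[1] = output dir (must not exist yet).
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // The framework parallelizes, distributes, and retries failed tasks automatically.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Assuming the classes are packaged into a jar, a run would look roughly like `hadoop jar wordcount.jar WordCountDriver /input /output`, with the output directory not existing beforehand.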
