
1 Introduction

The document discusses the concept of Big Data, defined by its three main characteristics: velocity, volume, and variety, which present challenges to computational resources. It outlines the data lifecycle, including data collection, processing, and visualization, and highlights examples such as the Netflix Prize and flu prediction using search engine data. Additionally, it covers data management infrastructure and the role of databases in handling structured and unstructured data.

Uploaded by

Mx A

+ Big Data Challenge

+ How big is “Big Data”?

Big Data and You

It is the amount of data that is big enough to challenge our computational resources!

+ Big Data Defined (3V)
• Velocity
  • Moves at very high rates (think sensor-driven systems)
  • Valuable in its temporal state

• Volume
  • Fast-moving data creates massive historical archives
  • Valuable for mining patterns, trends and relationships

• Variety
  • From structured to unstructured
  • Valuable for data enrichment/fusion

• 4V?
+ Big Data – Storage and Analytics
+ Where does Big Data come from?
[Figure: data sources, storage systems and analytics]
Sources: online gaming, ad serving, sensor data, financial trade, Internet commerce, SaaS, Web 2.0, mobile platforms
Storage: NewSQL
§ Structured data
§ ACID guarantees
§ Relational/SQL
Storage: NoSQL
§ Unstructured data
§ Eventual consistency
§ Schemaless
§ KV, document
Analytics: Analytic Datastore
+ Big Data Management Infrastructure
[Figure: the same data sources, feeding storage (NewSQL, NoSQL) and an analytic datastore]
+ The Data LifeCycle

[Figure: lifecycle stages: Data collection, Data processing, Exploratory analysis & Data visualization, Data Analytics, Insights & Decision Making]
+ The Two Steps of Working with Data

(1) Collect data

Via computers, sensors, people, events, …

(2) Do something with it

Make decisions, gain insights, predict future, …


+ Netflix Prize I

• Recommender systems: predict a user’s rating of an item

          Twilight   Wall-E   Twilight II   Furious 7
User 1    +1         -1       +1            ?
User 2    +1         -1       ?             ?
User 3    -1         +1       -1            +1

Netflix Prize: $1 million to the first team that beats Netflix’s in-house engine by 10%
• Happened after about three years
• Model was never used by Netflix for a variety of reasons
  • Out of date (DVDs vs streaming)
  • Too complicated / not interpretable
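
To make the rating-prediction task concrete, here is a minimal sketch that fills in a "?" from the table using a similarity-weighted average of other users' ratings. This is a textbook user-based approach chosen for illustration; it is not the Netflix engine or the winning Prize model, and the function names are hypothetical.

```python
import math

# Ratings from the slide's table; None marks an unknown rating ("?").
ratings = {
    "User 1": {"Twilight": 1, "Wall-E": -1, "Twilight II": 1, "Furious 7": None},
    "User 2": {"Twilight": 1, "Wall-E": -1, "Twilight II": None, "Furious 7": None},
    "User 3": {"Twilight": -1, "Wall-E": 1, "Twilight II": -1, "Furious 7": 1},
}


def similarity(a, b):
    """Cosine similarity computed only over items both users have rated."""
    shared = [i for i in a if a[i] is not None and b.get(i) is not None]
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Similarity-weighted average of the other users' ratings of `item`."""
    num = 0.0
    den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or other_ratings.get(item) is None:
            continue
        w = similarity(ratings[user], other_ratings)
        num += w * other_ratings[item]
        den += abs(w)
    return num / den if den else 0.0


# User 1 disagrees with User 3 on every shared movie, so the sketch
# predicts the opposite of User 3's rating for Furious 7.
print(predict("User 1", "Furious 7"))
print(predict("User 2", "Twilight II"))
```

With only three users the prediction is dominated by one or two neighbours; real recommender systems work with far larger and sparser rating matrices.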
+ Flu Prediction

• “Detecting Influenza Epidemics Using Search Engine Query Data”, Nature 2009
  • 50M common search terms
  • 2003–2008 CDC seasonal flu spreading trends
  • Training data: 2003–2006; Test data: 2007–2008

• 450M combinations → 45 search terms with strong association
  • Validation: successful prediction of 2009 bird flu outbreak
  • Deployed in 29 countries by WHO
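
One simple way to look for search terms with a "strong association" to flu activity is to rank terms by their correlation with the reported flu rate. The sketch below does this with Pearson correlation. It is an illustrative stand-in, not the paper's actual procedure, and all numbers and term names are synthetic, made up for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def top_terms(term_freqs, flu_rate, k):
    """Return the k terms whose frequency correlates most with flu_rate."""
    scores = {term: pearson(freq, flu_rate) for term, freq in term_freqs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic weekly values (assumption: not real CDC or search data).
flu_rate = [1.0, 2.0, 4.0, 3.0, 1.5]
term_freqs = {
    "flu symptoms": [10, 21, 39, 31, 14],
    "cough medicine": [5, 9, 12, 11, 6],
    "movie tickets": [7, 7, 6, 8, 7],
}

selected = top_terms(term_freqs, flu_rate, k=2)
print(selected)
```

In practice the ranking would be fitted on the training years and checked on the held-out test years, as the training/test split on the slide suggests.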
+ Sports

(1) Collect data

(2) Do something with it

Overview CS102
+ Data Collection and Preparation

• Data Preparation
  • Filling in missing values
  • Removing suspicious data
  • Making formats and units consistent

• Data Collection
  • Storing big data
  • Collecting (big) continuous stream data
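
The three preparation steps above can be sketched on a tiny data set. The readings, the plausibility bounds and the choice of the median as fill value are all assumptions made for illustration; real pipelines pick these per data set.

```python
from statistics import median

# Hypothetical temperature readings with mixed units, a gap and an outlier.
readings = [
    {"city": "A", "temp": 20.0, "unit": "C"},
    {"city": "B", "temp": 68.0, "unit": "F"},
    {"city": "C", "temp": None, "unit": "C"},   # missing value
    {"city": "D", "temp": 900.0, "unit": "C"},  # suspicious value
]


def to_celsius(value, unit):
    """Convert Fahrenheit to Celsius; Celsius values pass through."""
    return (value - 32.0) * 5.0 / 9.0 if unit == "F" else value


def prepare(rows, low=-90.0, high=60.0):
    # 1. Make units consistent: everything in Celsius.
    converted = [
        {
            "city": r["city"],
            "temp_c": None if r["temp"] is None else to_celsius(r["temp"], r["unit"]),
        }
        for r in rows
    ]
    # 2. Remove suspicious data: drop values outside the assumed bounds.
    plausible = [
        r for r in converted
        if r["temp_c"] is None or low <= r["temp_c"] <= high
    ]
    # 3. Fill in missing values with the median of the known ones.
    known = [r["temp_c"] for r in plausible if r["temp_c"] is not None]
    fill = median(known) if known else None
    for r in plausible:
        if r["temp_c"] is None:
            r["temp_c"] = fill
    return plausible


clean = prepare(readings)
print(clean)
```

Suspicious values are removed before filling so that an outlier cannot distort the fill value.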
+ Data Processing

Performing well-defined computations or asking well-defined questions (“queries”)

• Average January low temperature for each country over the last 20 years
• Number of items over $100 bought by females between ages 20 and 30
• The ten stocks whose price varied the most over the past year
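
As a sketch of what such a query looks like in practice, the second question above can be written in SQL and run with Python's built-in `sqlite3` module. The `purchases` table, its column names and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical purchases table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "item TEXT, price REAL, buyer_gender TEXT, buyer_age INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("laptop", 899.0, "F", 25),
        ("book", 15.0, "F", 22),
        ("headphones", 120.0, "M", 27),
        ("camera", 450.0, "F", 41),
        ("jacket", 130.0, "F", 30),
    ],
)

# Number of items over $100 bought by females between ages 20 and 30.
# BETWEEN is inclusive on both ends.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM purchases "
    "WHERE price > 100 AND buyer_gender = 'F' "
    "AND buyer_age BETWEEN 20 AND 30"
).fetchone()
print(count)
```

The query states what is wanted, not how to compute it; the database decides how to scan or index the table.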
+ What is a DBMS?

• Database
  • Data
  • DBMS
  • Applications/Users

• What is a DBMS?
  • The relational model
  • SQL/Query processing
  • ACID/Transaction Management

• What does an RDBMS do, and what does it not do?

[Figure labels: Application Programs, Interactive Queries, DBMS, Database]
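
The atomicity part of ACID can be illustrated with a small transfer between two accounts: either both updates happen or neither does. This is a minimal sketch using Python's built-in `sqlite3` module; the `accounts` table and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move `amount` from src to dst as one transaction."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


ok = transfer("alice", "bob", 30)       # succeeds
failed = transfer("alice", "bob", 500)  # violates CHECK, rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to bob is applied first and then undone by the rollback, so no money appears from nowhere.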
+ The Database

• Data models
  • The relational model
  • Non-relational: object-oriented, object-relational, network, hierarchical
  • Now semi-structured and unstructured

• Database design
  • Entity-Relationship diagrams, functional dependencies, normalisation

• Physical storage
  • Organisation, hashing, indexing
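
To show why hashing and indexing matter for physical storage, here is a minimal sketch comparing a full scan with a lookup through a hash index. It uses an in-memory Python list as a stand-in for a stored table; the rows and helper names are hypothetical, and a real DBMS keeps such structures on disk.

```python
from collections import defaultdict

# Hypothetical table rows: (id, name, country).
rows = [
    (1, "alice", "NL"),
    (2, "bob", "US"),
    (3, "carol", "NL"),
]


def scan(table, country):
    """Full scan: inspect every row."""
    return [r for r in table if r[2] == country]


def build_hash_index(table, column):
    """Map each value of `column` to the positions of the matching rows."""
    index = defaultdict(list)
    for pos, row in enumerate(table):
        index[row[column]].append(pos)
    return index


country_index = build_hash_index(rows, column=2)


def lookup(country):
    """Index lookup: jump straight to the matching row positions."""
    return [rows[pos] for pos in country_index.get(country, [])]


print(lookup("NL"))
```

Both functions return the same rows; the index trades extra storage and maintenance on every insert for lookups that do not touch unrelated rows.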
+ Data Visualization
Basic Data Visualization
• “A picture is worth a thousand words”
Don’t underestimate the power of basic visualizations

§ Bar charts

§ Pie charts

§ Scatterplots

§ Maps
