OC - Module 2 - DA Lifecycle 021312

Module 2 focuses on the Data Analytics Lifecycle, teaching participants to apply it to case studies, frame business problems as analytics problems, and identify key deliverables in analytics projects. The module covers essential topics such as the roles necessary for successful analytics projects and the structured approach needed for data science projects. It emphasizes the importance of a well-defined process to guide analytics efforts and includes practical tips for interviewing stakeholders and formulating hypotheses.

Module 2 – Data Analytics Lifecycle

2
EMC PROVEN PROFESSIONAL

Copyright © 2012 EMC Corporation. All Rights Reserved. Module 2: Data Analytics Lifecycle 1
Module 2: Data Analytics Lifecycle

Upon completion of this module, you should be able to:


• Apply the Data Analytics Lifecycle to a case study scenario
• Frame a business problem as an analytics problem
• Identify the four main deliverables in an analytics project

Module 2: Data Analytics Lifecycle

During this module the following topics are covered:


• Data Analytics Lifecycle
• Roles for a Successful Analytics Project
• Case Study to apply the data analytics lifecycle

How to Approach Your Analytics Problems
Your Thoughts?

• How do you currently approach your analytics problems?

• Do you follow a methodology or some kind of framework?

• How do you plan for an analytic project?

Value of Using the Data Analytics Lifecycle

• Focus your time

• Ensure rigor and completeness

• Enable better transition to members of the cross-functional analytic teams
  - Repeatable
  - Scale to additional analysts
  - Support validity of findings

“A journey of a thousand miles begins with a single step” (Lao Tzu)

Need For a Process to Guide Data Science Projects
1. Well-defined processes can help guide any analytic project
2. Focus of Data Analytics Lifecycle is on Data Science projects, not business intelligence
3. Data Science projects tend to require a more consultative approach, and differ in a few ways
  - More due diligence in Discovery phase
  - More projects which lack shape or structure
  - Less predictable data

Key Roles for a Successful Analytic Project
Role: Description

• Business User: Someone who benefits from the end results and can consult and advise project team on value of end results and how these will be operationalized
• Project Sponsor: Person responsible for the genesis of the project, providing the impetus for the project and core business problem, generally provides the funding and will gauge the degree of value from the final outputs of the working team
• Project Manager: Ensure key milestones and objectives are met on time and at expected quality.
• Business Intelligence Analyst: Business domain expertise with deep understanding of the data, KPIs, key metrics and business intelligence from a reporting perspective
• Data Engineer: Deep technical skills to assist with tuning SQL queries for data management, extraction and support data ingest to analytic sandbox
• Database Administrator (DBA): Database Administrator who provisions and configures database environment to support the analytical needs of the working team
• Data Scientist: Provide subject matter expertise for analytical techniques, data modeling, applying valid analytical techniques to given business problems and ensuring overall analytical objectives are met

Data Analytics Lifecycle
1. Discovery
   Do I have enough information to draft an analytic plan and share for peer review?
2. Data Prep
   Do I have enough good quality data to start building the model?
3. Model Planning
   Do I have a good idea about the type of model to try? Can I refine the analytic plan?
4. Model Building
   Is the model robust enough? Have we failed for sure?
5. Communicate Results
6. Operationalize
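
The six phases and the checkpoint questions in the diagram can be sketched as a small data structure. This is only an illustration: the names `Phase`, `GATE_QUESTIONS`, and `next_phase` are hypothetical, the pairing of each question with the phase it closes is an assumption read off the diagram, and the sketch simply stays in the current phase when a checkpoint is not met (a real project may also loop back to an earlier phase).

```python
from enum import IntEnum


class Phase(IntEnum):
    DISCOVERY = 1
    DATA_PREP = 2
    MODEL_PLANNING = 3
    MODEL_BUILDING = 4
    COMMUNICATE_RESULTS = 5
    OPERATIONALIZE = 6


# Checkpoint questions copied from the slide; which phase each one closes
# is an assumption of this sketch.
GATE_QUESTIONS = {
    Phase.DISCOVERY: "Do I have enough information to draft an analytic plan "
                     "and share for peer review?",
    Phase.DATA_PREP: "Do I have enough good quality data to start building "
                     "the model?",
    Phase.MODEL_PLANNING: "Do I have a good idea about the type of model to "
                          "try? Can I refine the analytic plan?",
    Phase.MODEL_BUILDING: "Is the model robust enough? Have we failed for sure?",
}


def next_phase(current: Phase, gate_passed: bool) -> Phase:
    """Advance one phase when the checkpoint is met; otherwise keep iterating."""
    if not gate_passed or current is Phase.OPERATIONALIZE:
        return current
    return Phase(current + 1)
```

For example, `next_phase(Phase.DISCOVERY, True)` moves on to `Phase.DATA_PREP`, while a failed checkpoint keeps the team in the same phase.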

Data Analytics Lifecycle
Phase 1: Discovery
• Learn the Business Domain
  - Determine amount of domain knowledge needed to orient you to the data and interpret results downstream
  - Determine the general analytic problem type (such as clustering, classification)
  - If you don’t know, then conduct initial research to learn about the domain area you’ll be analyzing
• Learn from the past
  - Have there been previous attempts in the organization to solve this problem?
  - If so, why did they fail? Why are we trying again? How have things changed?

Data Analytics Lifecycle
Phase 1: Discovery
• Resources
  - Assess available technology
  - Available data – sufficient to meet your needs
  - People for the working team
  - Assess scope of time for the project in calendar time and person-hours
  - Do you have sufficient resources to attempt the project? If not, can you get more?

Data Analytics Lifecycle
Phase 1: Discovery
• Frame the problem: framing is the process of stating the analytics problem to be solved
  - State the analytics problem, why it is important, and to whom
  - Identify key stakeholders and their interests in the project
  - Clearly articulate the current situation and pain points
  - Objectives – identify what needs to be achieved in business terms and what needs to be done to meet the needs
  - What is the goal? What are the criteria for success? What’s “good enough”?
  - What is the failure criterion (when do we just stop trying or settle for what we have)?
  - Identify the success criteria, key risks, and stakeholders (such as RACI)

Tips for Interviewing the Analytics Sponsor
• Even if you are “given” an analytic problem, you should work with clients to clarify and frame the problem
  - You’re typically handed solutions; you need to identify the problem and their desired outcome
Sponsor Interview Tips
• Prepare for the interview – draft your questions, review with colleague, team
• Use open-ended questions, don’t ask leading questions
• Probe for details, follow-up
• Don’t fill every silence – give them time to think
• Let them express their ideas, don’t put words in their mouth, let them share their feelings
• Ask clarifying questions, ask why – is that correct? Am I on target? Is there anything else?
• Use active listening – repeat it back to make sure you heard it correctly
• Don’t express your opinions
• Be mindful of your body language and theirs – use eye contact, be attentive
• Minimize distractions
• Document what you heard and review it back with the sponsor

Tips for Interviewing the Analytics Sponsor
Interview Questions
• What is the business problem you’re trying to solve?
• What is your desired outcome?
• Will the focus and scope of the problem change if the following dimensions change:
  - Time – analyzing 1 year or 10 years’ worth of data?
  - People – how would this project change this?
  - Risk – conservative to aggressive
  - Resources – none to unlimited (tools, tech, …)
  - Size and attributes of Data
• What data sources do you have?
• What industry issues may impact the analysis?
• What timelines are you up against?
• Who could provide insight into the project? Consulted?
• Who has final say on the project?

Data Analytics Lifecycle
Phase 1: Discovery
• Formulate Initial Hypotheses
  - IH, H1, H2, H3, … Hn
  - Gather and assess hypotheses from stakeholders and domain experts
  - Preliminary data exploration to inform discussions with stakeholders during the hypothesis forming stage
• Identify Data Sources – Begin Learning the Data
  - Aggregate sources for previewing the data and provide high-level understanding
  - Review the raw data
  - Determine the structures and tools needed
  - Scope the kind of data needed for this kind of problem

Using a Sample Case Study to Track the Phases
in the Data Analytics Lifecycle
Mini Case Study: Churn Prediction for
Yoyodyne Bank
Situation Synopsis
• Yoyodyne Bank, a retail bank, wants to improve the Net Present Value
(NPV) and retention rate of customers
• They want to establish an effective marketing campaign targeting
customers to reduce the churn rate by at least five percent
• The bank wants to determine whether those customers are worth
retaining. In addition, the bank also wants to analyze reasons for
customer attrition and what they can do to keep them
• The bank wants to build a data warehouse to support Marketing
and other related customer care groups

How to Frame an Analytics Problem

Sample Business Problems
• How can we improve on x?
• What’s happening real-time? Trends?
• How can we use analytics to differentiate ourselves?
• How can we use analytics to innovate?
• How can we stay ahead of our biggest competitor?

Qualifiers
Will the focus and scope of the problem change if the following dimensions change:
• Time
• People – how would x change this?
• Risk – conservative/aggressive
• Resources – none/unlimited
• Size of Data?

Analytical Approach
Define an analytical approach, including key terms, metrics, and data needed.

Mini Case Study: Churn Prediction for Yoyodyne Bank

Business Problem
• How can we improve Net Present Value (NPV) and retention rate of the customers?

Qualifiers
• Time: Trailing 5 months
• People: Working team and business users from the Bank
• Risk: the project will fail if we cannot determine valid predictors of churn
• Resources: EDW, analytic sandbox, OLTP system
• Data: Use 24 months for the training set, then analyze 5 months of historical data for those customers who churned

Analytical Approach
• How do we identify churn/no churn for a customer?
• Pilot study followed by full scale analytical model

Data Analytics Lifecycle
Phase 2: Data Preparation
• Prepare Analytic Sandbox
  - Work space for the analytic team
  - 10x+ vs. EDW
• Perform ELT
  - Determine needed transformations
  - Assess data quality and structuring
  - Derive statistically useful measures
  - Extract data and determine data connections for raw data, OLTP transactions, OLAP cubes or data feeds
  - Big ELT and Big ETL
• Useful Tools for this phase:
  - For Data Transformation & Cleansing: SQL, Hadoop, MapReduce, Alpine Miner
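
The ELT pattern named above (load the raw data first, then transform it inside the sandbox) can be sketched as follows. This is a minimal illustration, not the course's lab setup: an in-memory SQLite database stands in for the analytic sandbox, and the table names, columns, and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from an OLTP extract.
raw_rows = [
    ("c001", "2011-11-03", " 25.25"),
    ("c001", "2011-11-17", "40.00"),
    ("c002", "2011-11-05", ""),        # missing amount
    ("c002", "2011-12-01", "12.50 "),
]

conn = sqlite3.connect(":memory:")  # stand-in for the analytic sandbox

# Extract + Load: land the data untouched, with every column as text.
conn.execute(
    "CREATE TABLE raw_txn (customer_id TEXT, txn_date TEXT, amount TEXT)"
)
conn.executemany("INSERT INTO raw_txn VALUES (?, ?, ?)", raw_rows)

# Transform: clean and aggregate inside the database, after loading.
conn.execute("""
    CREATE TABLE customer_month AS
    SELECT customer_id,
           substr(txn_date, 1, 7)          AS month,
           COUNT(*)                        AS txn_count,
           SUM(CAST(TRIM(amount) AS REAL)) AS total_amount
    FROM raw_txn
    WHERE TRIM(amount) <> ''               -- drop rows with no amount
    GROUP BY customer_id, substr(txn_date, 1, 7)
""")

summary = conn.execute(
    "SELECT customer_id, month, txn_count, total_amount "
    "FROM customer_month ORDER BY customer_id, month"
).fetchall()
```

Because the raw table is kept alongside the derived one, analysts can revisit the transformation later without re-extracting from the source system.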

Data Analytics Lifecycle
Phase 2: Data Preparation
• Familiarize yourself with the data thoroughly
  - List your data sources
  - What’s needed vs. what’s available
• Data Conditioning
  - Clean and normalize data
  - Discern what you keep vs. what you discard
• Survey & Visualize
  - Overview, zoom & filter, details-on-demand
  - Descriptive Statistics
  - Data Quality
• Useful Tools for this phase:
  - Descriptive Statistics on candidate variables for diagnostics & quality
  - Visualization: R (base package, ggplot and lattice), GnuPlot, Ggobi/Rggobi, Spotfire, Tableau
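
As a rough illustration of descriptive statistics and data conditioning on one candidate variable, the sketch below uses only the Python standard library. The sample values are invented, and both dropping missing values and min-max scaling are just one possible conditioning choice, not a method prescribed by the slides.

```python
import statistics

# Hypothetical monthly debit-card transaction counts; None marks a missing value.
txn_counts = [12, 7, None, 3, 25, 9, None, 14]

# Data quality: count the gaps, then keep only observed values.
missing = sum(1 for v in txn_counts if v is None)
observed = [v for v in txn_counts if v is not None]

# Descriptive statistics on the candidate variable.
summary = {
    "n": len(observed),
    "missing": missing,
    "min": min(observed),
    "max": max(observed),
    "mean": statistics.mean(observed),
    "median": statistics.median(observed),
    "stdev": statistics.stdev(observed),  # sample standard deviation
}


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = min_max_normalize(observed)
```

A summary like this is usually reviewed before deciding what to keep and what to discard, since a high missing count or an extreme maximum may change the conditioning plan.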

Data Analytics Lifecycle
Phase 3: Model Planning
• Determine Methods
  - Select methods based on hypotheses, data structure and volume
  - Ensure techniques and approach will meet business objectives
• Techniques & Workflow
  - Candidate tests and sequence
  - Identify and document modeling assumptions
• Useful Tools for this phase: R/PostgreSQL, SQL Analytics, Alpine Miner, SAS/ACCESS, SPSS/ODBC

Data Analytics Lifecycle
Phase 3: Model Planning
• Data Exploration
• Variable Selection
  - Inputs from stakeholders and domain experts
  - Capture essence of the predictors, leverage a technique for dimensionality reduction
  - Iterative testing to confirm the most significant variables
• Model Selection
  - Conversion to SQL or database language for best performance
  - Choose technique based on the end goal
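
One simple way to start variable selection is to screen candidate predictors by how strongly each correlates with the outcome. The sketch below is an assumption-laden illustration rather than the course's technique: the variable names and values are invented, and a correlation screen is only a first pass, not a full dimensionality-reduction method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no signal
    return cov / (sx * sy)


# Hypothetical data: 1 = customer churned, 0 = customer stayed.
churned = [1, 1, 1, 0, 0, 0, 0, 1]
candidates = {
    "debit_txn_per_month": [2, 4, 1, 15, 22, 9, 18, 3],
    "account_age_years": [3, 8, 1, 4, 9, 2, 7, 6],
    "num_products": [1, 1, 2, 3, 2, 3, 2, 1],
}

# Rank candidates by absolute correlation with the churn flag.
ranked = sorted(
    candidates,
    key=lambda name: abs(pearson(candidates[name], churned)),
    reverse=True,
)
```

The ranking is a starting point for the iterative testing mentioned above; stakeholder input may still argue for keeping a weakly correlated variable.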

Sample Research: Churn Prediction in Other Verticals
Mini Case Study:
Churn Prediction for
Yoyodyne Bank

• After conducting research on churn prediction, you have identified many methods for analyzing customer churn across multiple verticals (those in bold are taught in this course)
• At this point, a Data Scientist would assess the methods and select the best model for the situation

Market Sector: Analytic Techniques/Methods Used
• Wireless Telecom: DMEL method (data mining by evolutionary learning)
• Retail Business: Logistic regression, ARD (automatic relevance determination), decision tree
• Daily Grocery: MLR (multiple linear regression), ARD, and decision tree
• Wireless Telecom: Neural network, decision tree, hierarchical neurofuzzy systems, rule evolver
• Retail Banking: Multiple regression
• Wireless Telecom: Logistic regression, neural network, decision tree
Data Analytics Lifecycle
Phase 4: Model Building
• Develop data sets for testing, training, and production purposes
  - Need to ensure that the model data is sufficiently robust for the model and analytical techniques
  - Smaller test sets for validating approach, training set for initial experiments
• Get the best environment you can for building models and workflows: fast hardware, parallel processing
• Useful Tools for this phase: R, PL/R, SQL, Alpine Miner, SAS Enterprise Miner
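
Splitting the available records into a training set and a smaller held-out test set can be sketched as below. The function name `train_test_split`, the 80/20 ratio, and the customer IDs are assumptions for illustration; the slides do not prescribe a split ratio.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle reproducibly, then hold out a fraction of records for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)               # copy, so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical customer IDs.
customers = [f"c{i:03d}" for i in range(100)]
train, test = train_test_split(customers)
```

Fixing the seed keeps the split repeatable, which supports the lifecycle's goal of findings that other analysts can reproduce.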

Data Analytics Lifecycle
Phase 5: Communicate Results
Did we succeed? Did we fail?
• Interpret the results
• Compare to IH’s from Phase 1
• Identify key findings
• Quantify business value
• Summarizing findings, depending on audience

Mini Case Study: Churn Prediction for Yoyodyne Bank
For the YoyoDyne Case Study, what would be some possible results and key findings?

Data Analytics Lifecycle
Phase 6: Operationalize
• Run a pilot
• Assess the benefits
• Deliver final deliverables
• Model Execution in Production Environment
• Define process to update and retrain the model, as needed

Analytic Plan
Mini Case Study: Churn Prediction for Yoyodyne Bank

Components of Analytic Plan: Retail Banking: Yoyodyne Bank
• Phase 1: Discovery, Business Problem Framed: How do we identify churn/no churn for a customer?
• Initial Hypotheses: Transaction volume and type are key predictors of churn rates.
• Data: 5 months of customer account history.
• Phase 3: Model Planning - Analytic Technique: Logistic regression to identify most influential factors predicting churn.
• Phase 5: Result & Key Findings:
  - Once customers stop using their accounts for gas and groceries, they will soon erode their accounts and churn.
  - If customers use their debit card fewer than 5 times per month, they will leave the bank within 60 days.
• Business Impact: If we can target customers who are high-risk for churn, we can reduce customer attrition by 25%. This would save $3 million in lost customer revenue and avoid $1.5 million in new customer acquisition costs each year.
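
The analytic technique named in the plan, logistic regression, can be illustrated with a single predictor. This is a toy sketch under stated assumptions: the debit-card usage figures and churn labels are invented, and the model is fitted with plain batch gradient descent instead of the statistical tools the course uses (such as R).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.01, epochs=20000):
    """Fit p(churn) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. z
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical data: debit-card uses per month and whether the customer churned.
debit_uses = [1, 2, 3, 4, 6, 8, 10, 12, 15, 20]
churned = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

w, b = fit_logistic(debit_uses, churned)


def churn_probability(uses_per_month):
    return sigmoid(w * uses_per_month + b)
```

On this toy data the fitted weight `w` comes out negative, so lower debit-card usage maps to a higher predicted churn probability, which is the direction of the key finding in the plan.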

Key Outputs from a Successful Analytic Project, by Role
Role, Description, and What the Role Needs in the Final Deliverables

• Business User
  Description: Someone who benefits from the end results and can consult and advise project team on value of end results and how these will be operationalized
  Needs: Sponsor Presentation addressing:
    - Are the results good for me?
    - What are the benefits of the findings?
    - What are the implications of this for me?
• Project Sponsor
  Description: Person responsible for the genesis of the project, providing the impetus for the project and core business problem, generally provides the funding and will gauge the degree of value from the final outputs of the working team
  Needs: Sponsor Presentation addressing:
    - What’s the business impact of doing this?
    - What are the risks? ROI?
    - How can this be evangelized within the organization (and beyond)?
• Project Manager
  Description: Ensure key milestones and objectives are met on time and at expected quality.
• Business Intelligence Analyst
  Description: Business domain expertise with deep understanding of the data, KPIs, key metrics and business intelligence from a reporting perspective
  Needs:
    - Show the analyst presentation
    - Determine if the reports will change
• Data Engineer
  Description: Deep technical skills to assist with tuning SQL queries for data management, extraction and support data ingest to analytic sandbox
  Needs:
    - Share the code from the analytical project
    - Create technical document on how to implement it.
• Database Administrator (DBA)
  Description: Database Administrator who provisions and configures database environment to support the analytical needs of the working team
  Needs:
    - Share the code from the analytical project
    - Create technical document on how to implement it.
• Data Scientist
  Description: Provide subject matter expertise for analytical techniques, data modeling, applying valid analytical techniques to given business problems and ensuring overall analytical objectives are met
  Needs:
    - Show the analyst presentation
    - Share the code

4 Core Deliverables to Meet Most Stakeholder Needs
1. Presentation for Project Sponsors
• “Big picture” takeaways for executive level stakeholders
• Determine key messages to aid their decision-making process
• Focus on clean, easy visuals for the presenter to explain and for the viewer to grasp

2. Presentation for Analysts
• Business process changes
• Reporting changes
• Fellow Data Scientists will want the details and are comfortable with technical graphs (such as ROC curves, density plots, histograms)

3. Code for technical people

4. Technical specs of implementing the code

Analyst Wish List for a Successful Analytics Project

Data & Workspaces


• Access to all the data, including aggregated OLAP data, BI tools, raw data, structured
and various states of unstructured data as needed
• Up-to-date data dictionary to describe the data
• Area for staging and production data sets
• Ability to move data back and forth between workspaces and staging areas
• Analytic sandbox with strong compute power to experiment and play with the data

Tools
• Statistical/mathematical/visual software of choice for a given situation and problem set,
such as SAS, Matlab, R, Java tools, Tableau, Spotfire
• Collaboration: an online platform or environment for collaboration and communicating
with team members
• Tool or place to log errors with systems, environments or data sets

Concepts in Practice
Greenplum’s Approach to Analytics

• Magnetic: Attract all kinds of data
• Agile: Flexible and elastic data structures
• Deep: Rich data repository and algorithmic engine

[Diagram: a cycle on the EDC platform (get data into the cloud; analyze and model in the cloud; push results back into the cloud), shown beside a grid of analytic questions along Past/Future and Facts/Interpretation axes: What happened where and when? How and why did it happen? What will happen? How can we do better?]


Source: MAD Skills: New Analysis Practices for Big Data, March 2009

“The pessimist complains about the wind.
The optimist expects it to change.
The leader adjusts the sails.”
John Maxwell (Leadership Author)

Check Your Knowledge
• In which phase would you expect to invest most of your project time and
why? Where would you expect to spend the least time? Your Thoughts?

• What are the benefits of doing a pilot program before a full scale rollout of a
new analytical methodology? Discuss this in the context of the mini case
study.

• What kinds of tools would be used in the following phases, and for which
kinds of use scenarios?
  - Phase 2: Data Preparation
  - Phase 4: Model Building
• Now that you have completed the analytical project at Yoyodyne, you have an
opportunity to repurpose this approach for an online eCommerce company.
What phases of the lifecycle do you need to focus on to identify ways to do
this?

Module 2: Summary

Key points covered in this module:


• The Data Analytics Lifecycle was applied to a case study
scenario
• A business problem was framed as an analytics problem
• The four main deliverables in an analytics project were
identified

Lab Exercise 1: Introduction to Data Environment
This first lab introduces the Analytics Lab Environment you
will be working on throughout the course.

After completing the tasks in this lab you should be able to:
• Authenticate and access the Virtual Machine (VM)
assigned to you for all of your lab exercises
• Locate data sets you will be working with for the
course’s labs
• Use meta commands and PSQL to navigate through
the data sets
• Create subsets of the big data, using table joins and
filters, to analyze in subsequent lab exercises
