
wk6 - Data Analytics

The document outlines the objectives and content of the Week 6 module on Data Analytic Techniques in the MSc Computer Science program, focusing on various types of data analytics including descriptive, diagnostic, predictive, and prescriptive analytics. It discusses the importance of data analytics in making informed business decisions and highlights the methodologies and tools used in each type of analysis. Additionally, it covers the evaluation of models, challenges in data quality, and available academic and wellbeing support for students.

Uploaded by

binaya Dai
Copyright © All Rights Reserved

MSc Computer Science

LDS7005M Big Data & Cloud Computing

Week 6: Data Analytic Techniques

Module Director(s): Dr Gayathri


Lecturer(s):

#WeAreYSJ

@YorkStJohn

@YorkStJohnUniversity
Objectives

▪ Learn what data analytics is and how it works
▪ Analyse and understand the lifecycle of
data analytics
▪ Overview of statistical and machine
learning methods for data analysis
▪ Predictive modelling, clustering,
classification, and regression
▪ Evaluating model performance and
selection criteria
What is Data Analytics?

▪ Data analytics is the science of analysing raw data to draw conclusions about that information and make better business decisions.

▪ Many of the techniques and processes of data analytics have been automated into mechanical processes and algorithms that work over raw data for human consumption.
What is Data Analytics?
▪ Data analytics is a broad term that encompasses many diverse types of data analysis. Any type of information can be subjected to data analytics techniques to gain insights that can be used to improve processes, products, or decisions.
▪ For example, content companies use data analytics to keep you clicking, watching, or reorganizing content to get another view or another click.
Data Analytics Cycle

(Naveen, 2023)
Data Analytics Types

(Vinit Kachchi, 2021)


Descriptive analytics: What has happened?

▪ Descriptive analytics answers the “what happened?” question by summarizing past data, without explaining why it happened or predicting future outcomes.
▪ Descriptive analytics is the ability to quantify events and report on them in a human-readable way. It’s the first step in turning big data into actionable insights.
▪ Easy to visualize (e.g., bar graphs, pie charts, histograms)
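To make “quantify events and report on them” concrete, here is a minimal sketch using only Python's standard library. The exam scores are hypothetical, and `describe` and `text_histogram` are illustrative helper names, not part of any library; in practice pandas (`DataFrame.describe`) or a BI tool would usually do the same job.

```python
import statistics
from collections import Counter


def describe(values):
    """Summarise a list of numbers: what happened, not why it happened."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def text_histogram(values, bin_width):
    """Count values per bin and render each bin as a row of '#' characters."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    rows = []
    for start in sorted(bins):
        rows.append(f"{start:>3}-{start + bin_width - 1:<3} {'#' * bins[start]}")
    return rows


# Hypothetical exam scores for one module over a semester
scores = [55, 62, 68, 71, 74, 74, 79, 83, 88, 91]
summary = describe(scores)
print(summary)
for row in text_histogram(scores, 10):
    print(row)
```

The summary dictionary is the “report”; the text histogram stands in for the bar chart a dashboard would draw.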
Descriptive analytics: What has happened?
▪ Typically, descriptive analytics takes the form of reports that synthesise the most relevant tendencies in the data.
▪ Such reports usually include several types of plots, each communicating a different message.
▪ Dashboards are a good example of descriptive analytics.
▪ It involves summarizing historical data to reveal patterns, trends, and key characteristics.
▪ It relies on summarization techniques, visualizations, and performance metrics to provide insights into past events, serving as the foundational stage in the data analytics process.
▪ Examples include sales analysis, website traffic examination, and financial reporting.
▪ Education: Student performance statistics over a semester.
▪ Healthcare: Number of hospital visits per month.

“Descriptive analysis is often the first step in data exploration before moving on to diagnostic, predictive,
or prescriptive analysis.”
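The healthcare example (hospital visits per month) is, underneath the dashboard, just a roll-up of raw records. The sketch below assumes hypothetical visit dates and a hypothetical helper `visits_per_month`; a real dashboard would typically run the equivalent GROUP BY in SQL or a BI tool.

```python
from collections import Counter
from datetime import date

# Hypothetical visit records: one date per hospital visit
visits = [
    date(2024, 1, 3), date(2024, 1, 17), date(2024, 1, 29),
    date(2024, 2, 5), date(2024, 2, 20),
    date(2024, 3, 1), date(2024, 3, 9), date(2024, 3, 15), date(2024, 3, 30),
]


def visits_per_month(dates):
    """Roll daily records up to (year, month) counts, sorted chronologically."""
    counts = Counter((d.year, d.month) for d in dates)
    return dict(sorted(counts.items()))


monthly = visits_per_month(visits)
for (year, month), n in monthly.items():
    # One text bar per month: the kind of view a dashboard tile would chart
    print(f"{year}-{month:02d} {'#' * n} ({n})")
```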
Diagnostic Analytics: Why did it happen?

▪ This involves investigating and identifying the root causes of specific events or trends revealed through descriptive analytics. It focuses on understanding why certain patterns or outcomes occurred in historical data.

▪ When new problems arise, you may already have collected data about similar past issues, using questions that focus on the reason behind the event. Having that data at your disposal avoids repeating work and makes it easier to see how related problems are connected.
Diagnostic Analytics: Why did it happen?

▪ Sample questions may include: Why were Q2 sales lower than Q1 sales?
▪ Diagnostic analytics usually requires collecting data from multiple sources and storing it in a structure that lends itself to performing drill-down, correlation, regression, and roll-up analysis.
▪ Applications: risk analysis, anomaly detection.
▪ Results are viewed via interactive visualisation tools that enable users to
identify trends and patterns.
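A minimal sketch of how roll-up, drill-down and correlation might be combined to answer “Why were Q2 sales lower than Q1 sales?”. The sales records, the discount column and all function names are hypothetical; the point is the sequence of steps, not a specific tool.

```python
from collections import defaultdict

# Hypothetical sales records: (quarter, region, units_sold, discount_pct)
sales = [
    ("Q1", "North", 120, 10), ("Q1", "South", 100, 12), ("Q1", "West", 90, 8),
    ("Q2", "North", 118, 10), ("Q2", "South", 60, 2), ("Q2", "West", 88, 8),
]


def roll_up(records):
    """Total units per quarter (the descriptive view)."""
    totals = defaultdict(int)
    for quarter, _, units, _ in records:
        totals[quarter] += units
    return dict(totals)


def drill_down(records, q_from, q_to):
    """Change in units per region between two quarters (the diagnostic view)."""
    by_key = {(q, region): units for q, region, units, _ in records}
    regions = sorted({region for _, region, _, _ in records})
    return {r: by_key[(q_to, r)] - by_key[(q_from, r)] for r in regions}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


print(roll_up(sales))                  # Q2 total is lower than Q1
changes = drill_down(sales, "Q1", "Q2")
print(changes)                         # the drop is concentrated in one region
discounts = [rec[3] for rec in sales]
units = [rec[2] for rec in sales]
r = pearson(discounts, units)
print(round(r, 2))                     # correlation between discount and units sold
```

The roll-up shows that sales dropped, the drill-down localizes the drop to one region, and the correlation hints that discounts move with units sold. Correlation alone does not prove the discount change caused the drop; that still needs domain investigation.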
Predictive analytics: What’s probably going to happen?

▪ Predictive analytics leverages statistical algorithms and machine learning techniques to forecast future
outcomes based on historical data patterns. It involves building models that can make predictions,
such as sales forecasting or customer behaviour, aiding in proactive decision-making. The goal is to
anticipate trends and events, enabling organizations to take pre-emptive actions.
▪ It uses historical data to predict future events. Typically, historical data is used to build a mathematical
model that captures important trends. That predictive model is then used on current data to predict
what will happen next, or to suggest actions to take for optimal outcomes.
▪ The accuracy of predictions is field-dependent; for example, predicting whether a machine will fail is less complex than predicting cancer.
▪ Cloud Platforms: AWS SageMaker, Google Vertex AI.
▪ Python/R: Scikit-learn, TensorFlow, Prophet.
▪ SQL: Query large datasets for training models.
▪ BI Tools: Power BI
Predictive analytics: What’s probably going to happen?

▪ Machine learning is a clear example of where this type of analysis takes place.
▪ Questions are usually formulated using a what-if rationale, such as:
▪ What are the chances that a customer will default on a loan if they have missed a monthly payment?
▪ The tools used generally abstract away the underlying statistical intricacies by providing user-friendly front-end interfaces.
• Data Quality Matters: Garbage in → garbage out.
• Ethical Risks: Bias in data can lead to unfair predictions (e.g., loan denials).
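The loan-default question amounts to estimating a probability from historical data. As one possible approach (logistic regression, chosen here for illustration; the slides do not prescribe a method), the sketch below fits a one-feature model by gradient descent on a small hypothetical history, then applies it to new applicants. Libraries such as scikit-learn provide production-ready versions (`sklearn.linear_model.LogisticRegression`).

```python
import math

# Hypothetical history: (missed_payments, defaulted) where defaulted is 1 or 0
history = [(0, 0), (0, 0), (0, 0), (1, 0), (1, 1),
           (2, 0), (2, 1), (3, 1), (3, 1), (4, 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, lr=0.1, epochs=5000):
    """Fit p(default) = sigmoid(w * missed + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            error = sigmoid(w * x + b) - y   # derivative of log-loss w.r.t. the logit
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


w, b = fit_logistic(history)
# Apply the model built on historical data to new applicants
for missed in range(5):
    print(missed, "missed payments -> p(default) =", round(sigmoid(w * missed + b), 2))
```

Because the model is only as good as its training data, the “garbage in, garbage out” and bias warnings above apply directly: a skewed history produces skewed probabilities.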
Prescriptive analytics: What to do next?

▪ Prescriptive analytics recommends optimal actions by analysing data, considering possible scenarios, and suggesting decisions that align with organizational goals.
▪ It goes beyond predicting outcomes, guiding decision-makers on the best course of action for desired results.
▪ It requires such a seamless, fully integrated data analytics infrastructure that only a few organizations have the capability to engage in it meaningfully.
Prescriptive analytics: What to do next?

▪ Prescriptive analytics is the frontier of data analysis, combining the insight from all
previous analyses to determine the course of action to take in a current problem or
decision.
▪ AI-driven decision systems are a prime example of prescriptive analytics.
▪ Sample questions may include: When is the best time to trade a particular stock?

▪ Tools for Prescriptive Analytics
▪ Advanced Software: IBM Decision Optimization, Gurobi, SAS.
▪ AI Platforms: Google Vertex AI, Azure Machine Learning.
▪ Custom Code: Python (PuLP, SciPy) for optimization models.
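The slides mention PuLP and SciPy for optimization models. To stay dependency-free, this sketch swaps in brute-force enumeration over a tiny, hypothetical product-mix problem: it recommends the action (how many units of A and B to make) that maximizes profit under capacity constraints. All numbers and names are assumptions for illustration.

```python
from itertools import product

# Hypothetical product-mix problem: how many units of A and B to make next week?
PROFIT = {"A": 40, "B": 30}            # profit per unit
LABOUR = {"A": 2, "B": 1}              # labour hours per unit
MACHINE = {"A": 1, "B": 1}             # machine hours per unit
LABOUR_HOURS, MACHINE_HOURS = 100, 80  # weekly capacity limits


def feasible(a, b):
    """Check both capacity constraints for a plan of a units of A and b units of B."""
    return (LABOUR["A"] * a + LABOUR["B"] * b <= LABOUR_HOURS
            and MACHINE["A"] * a + MACHINE["B"] * b <= MACHINE_HOURS)


def best_plan():
    """Enumerate every integer plan and recommend the most profitable feasible one."""
    candidates = (
        (PROFIT["A"] * a + PROFIT["B"] * b, a, b)
        for a, b in product(range(MACHINE_HOURS + 1), repeat=2)
        if feasible(a, b)
    )
    profit, a, b = max(candidates)
    return {"A": a, "B": b, "profit": profit}


print(best_plan())
```

Enumeration only works for toy problems; at realistic sizes a linear-programming solver such as `scipy.optimize.linprog` or PuLP would be used instead.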
Data Analysis

▪ Statistical Methods
  • Descriptive Statistics: summarizes the data
  • Inferential Statistics: hypothesis testing
  • Regression Analysis: relationships between variables
  • Analysis of Variance (ANOVA): compares groups
▪ Machine Learning Methods
  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning
  • Deep Learning
▪ Common Python Tools and Libraries
  • TensorFlow
  • PyTorch
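As an illustration of the ANOVA entry (comparing groups), the sketch below computes the one-way ANOVA F statistic by hand. The three groups of task-completion times are hypothetical. A large F means the variation between group means is large relative to the variation within groups.

```python
def one_way_anova(*groups):
    """Return the F statistic and degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group variation: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of values around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical task-completion times (minutes) under three interface designs
design_a = [4, 5, 6]
design_b = [6, 7, 8]
design_c = [9, 10, 11]
print(one_way_anova(design_a, design_b, design_c))
```

To turn F into a p-value, compare it against the F distribution with the returned degrees of freedom; `scipy.stats.f_oneway` performs the whole test in one call.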
Types of Machine Learning
Clustering (Unsupervised Learning)

Clustering
▪ Clustering aims to group similar data points together based on certain
features, uncovering inherent patterns or relationships.
▪ Applications: Customer segmentation, anomaly detection, or organizing
documents into topics.

[Figure: data points grouped into Cluster 1 and Cluster 2]
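A minimal k-means (Lloyd's algorithm) sketch for the customer-segmentation use case. The two-feature customer data (annual spend, visits per month) is hypothetical, and this plain-Python version is for illustration only; scikit-learn's `KMeans` is the usual choice in practice.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers: (annual spend in £k, visits per month)
customers = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (8.0, 9.0), (8.5, 9.5), (9.0, 8.8)]
centroids, labels = k_means(customers, 2)
print(centroids)
print(labels)
```

No labels are supplied; the two segments emerge purely from the distances between customers.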
Classification
▪ Classification assigns predefined categories or labels to data points based on their
characteristics.

▪ Applications: Spam detection in emails, sentiment analysis in social media, or identifying fraudulent transactions.
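Classification can be illustrated with k-nearest neighbours, chosen here only because it is short; the slides do not name an algorithm. The two features (number of links and exclamation marks per email) and the labelled examples are hypothetical stand-ins for a real spam dataset.

```python
import math
from collections import Counter

# Hypothetical labelled emails: ((links, exclamation_marks), label)
training = [
    ((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"), ((1, 1), "ham"),
    ((6, 5), "spam"), ((7, 8), "spam"), ((5, 6), "spam"), ((8, 7), "spam"),
]


def knn_predict(features, data, k=3):
    """Assign the predefined label held by the majority of the k nearest examples."""
    nearest = sorted(data, key=lambda item: math.dist(features, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((0, 2), training))  # few links and exclamation marks
print(knn_predict((6, 7), training))  # many links and exclamation marks
```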
Regression (Supervised Learning)
▪ Regression analyses the relationship between variables to predict a
continuous outcome, helping understand the impact of one variable on
another.

▪ Applications: Predicting house prices based on features, estimating sales based on advertising spending, or forecasting temperature based on historical data.
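For the “sales based on advertising spending” example, the simplest regression model is a straight line fitted by ordinary least squares. The figures below are hypothetical; the slope estimates how much sales change per extra unit of advertising, and R² indicates how much of the variation the line explains.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


ad_spend = [1, 2, 3, 4, 5]            # hypothetical, in £k
sales = [3.1, 4.9, 7.2, 8.8, 11.0]    # hypothetical, in £k
intercept, slope = fit_line(ad_spend, sales)
r2 = r_squared(ad_spend, sales, intercept, slope)
print(f"sales ≈ {intercept:.2f} + {slope:.2f} × ad_spend, R² = {r2:.3f}")
print("predicted sales at £6k spend:", round(intercept + slope * 6, 2))
```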
Research time- 15 mins
Could you provide an in-depth explanation of specific algorithms across different machine learning paradigms? In
your discussion, please cover:

• Supervised Learning: Detail one or two algorithms (e.g., Linear Regression, Support Vector Machines) including
their core concepts, typical applications, advantages, and limitations.
• Unsupervised Learning: Explain specific algorithms (such as K-Means Clustering or Principal Component Analysis
and Apriori Algorithm) with an emphasis on how they uncover hidden patterns in data.
• Reinforcement Learning: Describe a key algorithm (for instance, Q-Learning or Policy Gradient methods),
outlining how it learns through interactions with the environment and the challenges it faces.

• Semi-Supervised Learning: Discuss an algorithm that leverages both labeled and unlabeled data, and explain its
benefits in scenarios where fully labeled datasets are scarce.
• Self-Supervised Learning: Provide an overview of a self-supervised approach, highlighting how it generates
supervisory signals from the data itself.

Please ensure your explanation includes fundamental concepts, examples, and a discussion on the pros and
cons of each approach.
Model Evaluation
▪ Both statistical and machine learning models need
rigorous evaluation and validation to ensure their
reliability and generalizability.

▪ Techniques include cross-validation, training and testing datasets, and various performance metrics such as accuracy, precision, recall, and F1 score.
▪ Hyperparameter tuning
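A sketch tying the three evaluation ideas together: metrics computed from a confusion matrix, k-fold cross-validation, and hyperparameter tuning (choosing the number of neighbours for a tiny 1-D k-NN classifier by cross-validated F1). The data, helper names and choice of k-NN are assumptions made for illustration; scikit-learn offers equivalents such as `cross_val_score` and `GridSearchCV`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 (0.0 where a ratio is undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def k_fold_indices(n, n_folds):
    """Yield (train, test) index lists; every sample lands in exactly one test fold."""
    for fold in range(n_folds):
        test = list(range(fold, n, n_folds))
        train = [i for i in range(n) if i % n_folds != fold]
        yield train, test


def knn_1d(train_x, train_y, x, k):
    """Majority vote of the k training points closest to x (labels are 0/1)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return 1 if 2 * sum(train_y[i] for i in nearest) > k else 0


def cross_validated_f1(xs, ys, k, n_folds=5):
    """Collect out-of-fold predictions for one value of k, then score them once."""
    predictions = [0] * len(xs)
    for train, test in k_fold_indices(len(xs), n_folds):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in test:
            predictions[i] = knn_1d(train_x, train_y, xs[i], k)
    return metrics(ys, predictions)["f1"]


# Hypothetical 1-D risk scores with binary labels
scores = [0.10, 0.30, 0.35, 0.40, 0.50, 0.60, 0.65, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

# Hyperparameter tuning: try several k values and keep the best by cross-validated F1
results = {k: cross_validated_f1(scores, labels, k) for k in (1, 3, 5)}
best_k = max(results, key=results.get)
print(results, "-> chosen k =", best_k)
```

Each sample is predicted only by a model that never saw it during training, which is what makes the cross-validated score a fair basis for choosing the hyperparameter.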
Challenges and Considerations
▪ Data Quality: Both statistical and machine learning methods require clean and relevant
data for accurate analysis.
▪ Interpretability: Statistical models often provide interpretable results, while some
complex machine learning models may lack transparency.
▪ Bias and Fairness: Addressing potential biases in data and models is crucial, especially in
machine learning applications.
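One simple way to start checking for bias is to compare outcome rates across groups (demographic parity). The decisions below are hypothetical, and a gap on its own does not prove unfairness, since groups may differ in legitimate ways; it is only one of several fairness metrics.

```python
from collections import defaultdict

# Hypothetical loan decisions: (applicant_group, approved)
decisions = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]


def approval_rates(records):
    """Fraction of approved applications per group."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        approved[group] += ok  # True counts as 1, False as 0
    return {g: approved[g] / totals[g] for g in totals}


rates = approval_rates(decisions)
gap = max(rates.values()) - min(rates.values())  # demographic parity difference
print(rates, "demographic parity gap:", gap)
```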
Seminar Activity
• Part 1: Azure Virtual Machine creation (1 hour)
• Parts 2 and 3: Data Structuring (45 mins)

• Please upload your lab work via the submission links provided.


Support available

• Academic Quality: check in with the Student Support and Guidance Manager and the Head of Student Opportunities

• Academic:
• Academic Skills Session (Compulsory – Scheduled session available on timetable)
• Small Group Academic Writing Tutorials, both online and in-person (currently covering: Writing Critically, Essay Writing, Report Writing, Paraphrasing, and Harvard Style Referencing); 1:1 support available on request if needed
• Targeted 1:1 support for students referred for Academic Misconduct
• Skills Guides available online to support

• Wellbeing:
• 1:1 Wellbeing Appointments
• Mental Health Support (online)
• Welfare Appointments (online)
• Wellbeing Breakfasts
Thank You ☺
