wk6 - Data Analytics
Objectives
(Naveen, 2023)
Data Analytics Types
“Descriptive analysis is often the first step in data exploration before moving on to diagnostic, predictive,
or prescriptive analysis.”
Diagnostic Analytics: Why did it happen?
▪ This involves investigating and identifying the root causes of specific events or trends
revealed through descriptive analytics. It focuses on understanding why certain
patterns or outcomes occurred in historical data.
▪ When a new problem arises, you may already have collected data about similar
past events by asking questions that focus on the reason behind each event.
Having that data at your disposal avoids repeating work and helps reveal how
problems are interconnected.
Diagnostic Analytics: Why did it happen?
▪ A sample question: Why were Q2 sales lower than Q1 sales?
▪ Diagnostic analytics usually requires collecting data from multiple
sources and storing it in a structure that lends itself to
drill-down, roll-up, correlation and regression analysis.
▪ Typical uses include risk analysis and anomaly detection.
▪ Results are viewed via interactive visualisation tools that enable users to
identify trends and patterns.
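The Q2-versus-Q1 question can be approached with a roll-up followed by a drill-down. The sketch below is a minimal illustration in plain Python; the sales records, region names and product names are invented for the example, and a real analysis would more likely use a BI tool or a data-frame library.

```python
from collections import defaultdict

# Hypothetical sales records: (quarter, region, product, revenue).
sales = [
    ("Q1", "North", "Laptops", 120),
    ("Q1", "North", "Phones", 80),
    ("Q1", "South", "Laptops", 100),
    ("Q1", "South", "Phones", 90),
    ("Q2", "North", "Laptops", 115),
    ("Q2", "North", "Phones", 85),
    ("Q2", "South", "Laptops", 60),
    ("Q2", "South", "Phones", 88),
]

def total_by(records, key_indexes):
    """Roll-up: sum revenue grouped by the chosen columns."""
    totals = defaultdict(int)
    for row in records:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[3]
    return dict(totals)

def change_between(totals, first="Q1", second="Q2"):
    """Return second-minus-first revenue for every group below the quarter level."""
    groups = {key[1:] for key in totals}
    return {
        g: totals.get((second,) + g, 0) - totals.get((first,) + g, 0)
        for g in groups
    }

# Roll-up: one total per quarter confirms that Q2 is lower than Q1.
by_quarter = total_by(sales, [0])

# Drill-down: split the change by region, then by region and product.
by_region = change_between(total_by(sales, [0, 1]))
by_region_product = change_between(total_by(sales, [0, 1, 2]))

# The largest drop shows where to start looking for a root cause.
worst = min(by_region_product, key=by_region_product.get)
print(by_quarter)
print(by_region)
print(worst, by_region_product[worst])
```

In this made-up data the drop is concentrated in one region and one product, which is the kind of pattern a drill-down is meant to surface. It shows where the change happened, not yet why.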
Predictive analytics: What’s probably going to happen?
▪ Predictive analytics leverages statistical algorithms and machine learning techniques to forecast future
outcomes based on historical data patterns. It involves building models that can make predictions,
such as sales forecasting or customer behaviour, aiding in proactive decision-making. The goal is to
anticipate trends and events, enabling organizations to take pre-emptive actions.
▪ It uses historical data to predict future events. Typically, historical data is used to build a mathematical
model that captures important trends. That predictive model is then used on current data to predict
what will happen next, or to suggest actions to take for optimal outcomes.
▪ The accuracy of the predictions is field-dependent: for example, predicting whether a machine will fail
is less complex than predicting cancer.
▪ Cloud Platforms: AWS SageMaker, Google Vertex AI.
▪ Python/R: Scikit-learn, TensorFlow, Prophet.
▪ SQL: Query large datasets for training models.
▪ BI Tools: Power BI
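The idea of building a mathematical model from historical data and then applying it to a new case can be sketched with a straight-line fit. This is a minimal illustration using ordinary least squares in plain Python; the monthly figures are invented, and libraries such as Scikit-learn or Prophet would normally do this work.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: month number and units sold.
months = [1, 2, 3, 4, 5, 6]
units = [102, 108, 121, 131, 138, 152]

# Build the model from history ...
slope, intercept = fit_line(months, units)

def predict(month):
    """Apply the fitted trend to a month that has not happened yet."""
    return slope * month + intercept

# ... then use it to forecast the next month.
forecast = predict(7)
print(round(slope, 2), round(intercept, 2), round(forecast, 2))
```

A straight line only captures a steady trend; seasonality or sudden changes would need a richer model.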
Predictive analytics: What’s probably going to happen?
▪ Machine Learning is a clear example where this type of analysis takes place.
▪ Questions are usually formulated using a what-if rationale, such as:
▪ What are the chances that a customer will default on a loan if they have missed a monthly
payment?
▪ The tools used generally abstract the underlying statistical intricacies by providing
user-friendly front-end interfaces.
• Data Quality Matters: Garbage in → garbage out.
• Ethical Risks: Bias in data can lead to unfair predictions (e.g., loan denials).
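The loan question above can be answered in its simplest form by counting past cases. The sketch below estimates the chance of default with and without a missed payment; the ten loan records are invented for illustration, and a real model would use many more records and features.

```python
# Hypothetical loan records: (missed_payment, defaulted).
loans = [
    (True, True), (True, True), (True, False), (True, False), (True, True),
    (False, False), (False, False), (False, True), (False, False), (False, False),
]

def default_rate(records, missed):
    """Estimate P(default | missed_payment == missed) by counting."""
    outcomes = [defaulted for m, defaulted in records if m == missed]
    if not outcomes:
        # No data for this condition means no trustworthy estimate.
        raise ValueError("no records for this condition")
    return sum(outcomes) / len(outcomes)

p_default_if_missed = default_rate(loans, True)
p_default_if_on_time = default_rate(loans, False)
print(p_default_if_missed, p_default_if_on_time)
```

The estimate is only as good as the records behind it: if the historical data is incomplete or skewed towards one group of customers, the predicted rates inherit that problem.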
Prescriptive analytics: What to do next?
▪ Prescriptive analytics is the frontier of data analysis, combining the insight from all
previous analyses to determine the course of action to take in a current problem or
decision.
▪ Artificial Intelligence (AI) systems that recommend actions are a common example of prescriptive analytics.
▪ A sample question: When is the best time to trade a particular stock?
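One simple way to turn a prediction into a recommended action is to compare the expected payoff of each option. The sketch below assumes a predictive model has already produced scenario probabilities; the probabilities, payoffs and action names are all invented for illustration and are not trading advice.

```python
# Hypothetical output of a predictive model: probability of each scenario.
scenarios = {"price_up": 0.5, "price_flat": 0.3, "price_down": 0.2}

# Hypothetical payoff of each action under each scenario (in pounds).
payoffs = {
    "buy": {"price_up": 100, "price_flat": 0, "price_down": -150},
    "hold": {"price_up": 0, "price_flat": 0, "price_down": 0},
    "sell": {"price_up": -100, "price_flat": 0, "price_down": 150},
}

def expected_payoff(action):
    """Weight each scenario's payoff by its predicted probability."""
    return sum(prob * payoffs[action][name] for name, prob in scenarios.items())

# Prescriptive step: score every action and recommend the best one.
expected = {action: expected_payoff(action) for action in payoffs}
recommended = max(expected, key=expected.get)
print(expected)
print(recommended)
```

Maximising expected payoff ignores risk appetite; a real prescriptive system would usually add constraints such as a maximum acceptable loss.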
Common Python Tools and Libraries
▪ TensorFlow
▪ PyTorch
Types of Machine Learning
Clustering: Unsupervised Learning
Clustering
▪ Clustering aims to group similar data points together based on certain
features, uncovering inherent patterns or relationships.
▪ Applications: Customer segmentation, anomaly detection, or organizing
documents into topics.
[Figure: data points grouped into Cluster 1 and Cluster 2]
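Grouping similar points without labels can be sketched with k-means, one common clustering algorithm. This is a minimal plain-Python version under simplifying assumptions: the six points are invented, and the starting centroids are picked by hand, whereas library implementations usually choose them randomly or with a smarter seeding scheme.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, initial_centroids, max_iterations=100):
    """Assign each point to its nearest centroid, move each centroid to the
    mean of its points, and repeat until the assignments stop changing."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iterations):
        new_assignments = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break
        assignments = new_assignments
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return assignments, centroids

# Hypothetical 2-D data with two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]

# Assumption: start from one point in each apparent group.
labels, centers = k_means(points, [points[0], points[3]])
print(labels)
print(centers)
```

No labels were supplied: the two groups emerge only from the distances between points, which is what makes this unsupervised.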
Classification
▪ Classification assigns predefined categories or labels to data points based on their
characteristics.
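Assigning a predefined label from a point's characteristics can be sketched with k-nearest neighbours, one simple classification method. The labelled examples, the two features and the "pass"/"fail" labels below are invented for illustration.

```python
from collections import Counter

# Hypothetical labelled data: (hours_studied, classes_attended) -> outcome.
training = [
    ((1, 2), "fail"), ((2, 1), "fail"), ((2, 3), "fail"),
    ((7, 8), "pass"), ((8, 7), "pass"), ((9, 9), "pass"),
]

def knn_predict(examples, query, k=3):
    """Label `query` by majority vote among its k nearest labelled examples."""
    def squared_distance(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))

    nearest = sorted(examples, key=lambda item: squared_distance(item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

prediction_low = knn_predict(training, (2, 2))
prediction_high = knn_predict(training, (8, 8))
print(prediction_low, prediction_high)
```

Unlike clustering, the categories here are fixed in advance by the labelled examples, so this is supervised learning.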
• Supervised Learning: Detail one or two algorithms (e.g., Linear Regression, Support Vector Machines) including
their core concepts, typical applications, advantages, and limitations.
• Unsupervised Learning: Explain specific algorithms (such as K-Means Clustering, Principal Component Analysis
or the Apriori Algorithm) with an emphasis on how they uncover hidden patterns in data.
• Reinforcement Learning: Describe a key algorithm (for instance, Q-Learning or Policy Gradient methods),
outlining how it learns through interactions with the environment and the challenges it faces.
• Semi-Supervised Learning: Discuss an algorithm that leverages both labeled and unlabeled data, and explain its
benefits in scenarios where fully labeled datasets are scarce.
• Self-Supervised Learning: Provide an overview of a self-supervised approach, highlighting how it generates
supervisory signals from the data itself.
Please ensure your explanation includes fundamental concepts, examples, and a discussion on the pros and
cons of each approach.
Model Evaluation
▪ Both statistical and machine learning models need
rigorous evaluation and validation to ensure their
reliability and generalizability.
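One common way to check generalizability is k-fold cross-validation: the model is fitted on part of the data and scored on the part it has not seen, and this is repeated for each fold. The sketch below is a minimal plain-Python version; the straight-line model, the contiguous folds and the data are assumptions chosen to keep the example short.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(xs, ys, k=3):
    """Return the mean absolute error measured on each held-out fold."""
    errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        fold_error = sum(
            abs(ys[i] - (slope * xs[i] + intercept)) for i in test_idx
        ) / len(test_idx)
        errors.append(fold_error)
    return errors

# Hypothetical data: month number and units sold.
months = [1, 2, 3, 4, 5, 6]
units = [102, 108, 121, 131, 138, 152]

fold_errors = cross_validate(months, units, k=3)
average_error = sum(fold_errors) / len(fold_errors)
print(fold_errors, average_error)
```

Scoring on held-out data matters because a model can fit its training data closely and still predict new cases poorly. In practice the rows are often shuffled before splitting, unless the data is ordered in time.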
• Academic Quality to check in with Student Support and Guidance Manager and Head of Student Opportunities
• Academic:
• Academic Skills Session (Compulsory – Scheduled session available on timetable)
• Small Group Academic Writing Tutorials, both online and in-person (currently covering: Writing Critically, Essay Writing, Report Writing, Paraphrasing and
Harvard Style Referencing). 1:1 support available on request.
• Targeted 1:1 support for students referred for Academic Misconduct
• Skills Guides available online to support
• Wellbeing:
• 1:1 Wellbeing Appointments
• Mental Health Support (online)
• Welfare Appointments (online)
• Wellbeing Breakfasts
Thank You ☺