FIT1043 - Lecture 1 - 2024 Data Science
Mahsa Salehi*
Semester 2, 2024
We try to cover the full extent of what makes up Data Science:
Teaching Team
Staff                  Role                          Email
Dr. Mahsa Salehi       Chief Examiner and Lecturer   mahsa.salehi@monash.edu
Dr. Heshan Kumarage    Admin TA                      heshan.kumarage@monash.edu
2. additional textbook:
► no “perfect” Introduction to Data Science textbook available
► but a good introductory text available for purchase is:
The Art of Data Science by Peng & Matsui
► Doing Data Science by Rachel Schutt and Cathy O’Neil
► Python Data Science Handbook by Jake VanderPlas
Resources
3. review of Ed Lessons (will be added)
► LOTS of additional resources and exercises
► get the big picture from articles/videos
► a “critical mindset”:
► you will read/view a variety of material
► basic exposure to information technology and internet
businesses:
► Amazon, Google, Twitter, ...
Getting Started
6 Regression analysis
PollEv.com/mahsasalehi868
Learning Outcomes (Week 1)
Jupyter Notebook
► To be achieved in your applied session
Overview of Data Science
What is Data Science?
Data Science is
A. machine learning on big data
B. extraction of knowledge/value from data
through the complete data lifecycle
process
C. almost everything that has something to
do with data: collecting, analyzing,
modeling, etc., yet the most important
part is its applications — all sorts of
applications
Data Science Venn Diagram
Drew Conway’s Venn diagram of data science
Conclusion:
§ Combination of different skill sets
§ Diverse skills are needed
Data Science Examples
A. Video Games
B. Self-driving cars
C. Spam filtering
D. Predictions
E. All of the options
The Data Science Process
image src: Stephen Ausmus acquired from USDA ARS, public domain.
3. Integration: Data can come from many different
sources.
icons by Openclipart.org, public domain; Good and Evil by AJC, ajcann.wordpress.com
5. Governance: managing data standards and formats
9. Visualisation: Choosing appropriate
visualisations for the data. Many different options
exist!
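As a rough illustration of the Integration step listed above, the sketch below combines records from two sources that share a key. The record layouts, the field names (`customer_id`, `name`, `total`) and the helper `integrate` are all hypothetical and chosen only for this example; real projects would more likely use a database or a library such as pandas.

```python
# Hypothetical records from two separate sources that share a customer id.
crm_records = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
web_orders = [
    {"customer_id": 1, "total": 30.0},
    {"customer_id": 1, "total": 12.5},
    {"customer_id": 3, "total": 8.0},  # no matching customer in the CRM source
]


def integrate(customers, orders):
    """Attach each customer's summed order total (a simple left join)."""
    totals = {}
    for order in orders:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["total"]
    # Customers without orders get a total of 0.0; unmatched orders are dropped.
    return [
        {**customer, "total_spent": totals.get(customer["customer_id"], 0.0)}
        for customer in customers
    ]


combined = integrate(crm_records, web_orders)
for row in combined:
    print(row)
```

Note the design choice: the order for customer 3 is silently dropped because no matching customer exists. Deciding how to handle such mismatches is part of what makes integration hard.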
The Data Science Process:
Our Standard Value Chain
our model of the process
Data Science Project Tasks
Collection: getting the data
Engineering: storage and computational resources across full lifecycle
Governance: overall management of data such as security across full
lifecycle
Wrangling: data preprocessing, cleaning
Analysis: discovery (learning, visualisation, etc.)
Presentation: arguing the case that the results are significant and
useful
Operationalisation: putting the results to work, so as to gain benefits
or value
We call this the
Standard Value Chain
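To make the chain more concrete, here is a minimal sketch of four of the tasks (Collection, Wrangling, Analysis, Presentation) as small Python functions chained together. It is only an illustration: the data in `RAW_CSV`, the column names and the function names are invented for this example, and Engineering, Governance and Operationalisation are not modelled.

```python
import csv
import io
import statistics

# Hypothetical raw data standing in for whatever the Collection step gathers.
RAW_CSV = """city,temperature_c
Melbourne,14.5
Sydney,18.0
Melbourne,
Brisbane,21.5
Sydney,17.0
"""


def collect(raw_text):
    """Collection: read the raw records into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def wrangle(rows):
    """Wrangling: drop rows with missing values and convert text to numbers."""
    clean = []
    for row in rows:
        value = (row["temperature_c"] or "").strip()
        if value:
            clean.append({"city": row["city"], "temperature_c": float(value)})
    return clean


def analyse(rows):
    """Analysis: compute the mean temperature per city."""
    by_city = {}
    for row in rows:
        by_city.setdefault(row["city"], []).append(row["temperature_c"])
    return {city: statistics.mean(values) for city, values in by_city.items()}


def present(summary):
    """Presentation: turn the results into readable report lines."""
    return [f"{city}: {mean:.1f} C" for city, mean in sorted(summary.items())]


report = present(analyse(wrangle(collect(RAW_CSV))))
for line in report:
    print(line)
```

Each function takes the output of the previous one, which mirrors how the tasks feed into each other along the chain.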
Data Science Process
from Doing Data Science by Schutt and O’Neil, 2013 (available digitally through the library)
We learnt
► what data science is and
► what Drew Conway’s Venn diagram is