IDS - Lecture 1
CUST
Fall 2024
● Class projects must focus on a real data problem (ideally using data that
you collect yourself), not an already-curated data set
How to take the course
● The course is interesting and challenging, and its skills are in high demand in industry
● Try to spend extra time learning about the content discussed in class
● Practice programming as much as possible
● Try to grasp the math involved (there is not too much of it here)
Some possible definitions
So what is the process of data science?
So Now, Who Is a Data Scientist?
Data scientists use their skills to extract knowledge and insights from data to solve real-world
problems.
In academia:
Background: Typically has a scientific background (social science, biology, etc.)
• Working with large datasets
• Addressing challenges related to data structure, size, quality, and complexity
• Applying computational methods to solve problems using data
In industry:
• Designing experiments
• Collecting, cleaning, and munging data
• Exploratory data analysis, which combines visualization and data sense
• Finding patterns and building models and algorithms
• Using analyses for decision making
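The collect, clean, and explore steps listed above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the CSV contents, the column names, and the helper functions `load_rows`, `clean_rows`, and `summarize` are all hypothetical and not taken from the course material.

```python
import csv
import io
import statistics

# Hypothetical raw data: a tiny CSV with a blank and a non-numeric age.
RAW_CSV = """name,age,city
Ali,23,Islamabad
Sara,,Lahore
Omar,thirty,Karachi
Hina,31,Islamabad
"""

def load_rows(text):
    """Collect: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_rows(rows):
    """Clean: keep only rows whose age parses as an integer."""
    cleaned = []
    for row in rows:
        try:
            age = int(row["age"])
        except ValueError:
            continue  # drop rows with missing or malformed ages
        cleaned.append({"name": row["name"], "age": age, "city": row["city"]})
    return cleaned

def summarize(rows):
    """Explore: compute simple summary statistics over the cleaned rows."""
    ages = [r["age"] for r in rows]
    return {"count": len(ages), "mean_age": statistics.mean(ages)}

rows = clean_rows(load_rows(RAW_CSV))
summary = summarize(rows)
print(summary)
```

In a real project the cleaning rule (here, dropping rows with bad ages) is itself a decision that should be justified, since it changes what the later analysis can conclude.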
Pop Up Quiz!!!
Question!
What is Data?
Structured
Any data that can be stored, accessed, and processed in a fixed format is
termed ‘structured’ data.
Over time, computer science has developed increasingly successful techniques for
working with this kind of data (where the format is well known in advance) and for
deriving value from it. However, issues now arise when the size of such data grows
to a huge extent, with typical sizes in the range of multiple zettabytes.
Do you know? 10^21 bytes equal 1 zettabyte; equivalently, one billion terabytes make up one zettabyte.
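The unit conversion can be checked with a short calculation. The sketch below assumes decimal (SI) units, where one terabyte is 10^12 bytes; the constant names are chosen here for illustration.

```python
# Decimal (SI) storage units, expressed in bytes.
TERABYTE = 10**12
ZETTABYTE = 10**21

# How many terabytes fit in one zettabyte?
terabytes_per_zettabyte = ZETTABYTE // TERABYTE
print(terabytes_per_zettabyte)  # 1000000000, i.e. one billion
```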
Example of Structured Data
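As one possible illustration, a relational table is structured data: its columns and types are fixed before any row is stored. The sketch below uses Python's built-in `sqlite3` module; the `employee` table, its columns, and its rows are hypothetical and are not taken from the original slide.

```python
import sqlite3

# Hypothetical table: column names and types are declared up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "department TEXT NOT NULL, "
    "salary INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee (id, name, department, salary) VALUES (?, ?, ?, ?)",
    [
        (1, "Ayesha", "Finance", 65000),
        (2, "Bilal", "Admin", 50000),
        (3, "Sana", "Finance", 70000),
    ],
)

# Because the format is known in advance, queries can address columns by name.
finance = conn.execute(
    "SELECT name, salary FROM employee WHERE department = ? ORDER BY id",
    ("Finance",),
).fetchall()
print(finance)
conn.close()
```

Every row has the same fields in the same order, which is what makes this kind of data easy to store, access, and process with standard tools.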
The term “cloud computing” can be used to describe applications and data that users
access over the Internet rather than on their local computer.
Cloud Computing Services/Service Providers
● Amazon Web Services
● Google Cloud
● IBM Watson Studio
Big Data
Big Data is a collection of data that is huge in volume and still growing
exponentially over time.
• Its size and complexity are so great that no traditional data
management tool can store or process it efficiently.
• Big data is still data, just of enormous size.
Example of Big Data?
The New York Stock Exchange is an example of a Big Data source: it generates about
one terabyte of new trade data per day.
Characteristics of Big Data
Volume – The name Big Data itself refers to enormous size. The size of data plays a crucial
role in determining the value that can be derived from it. Whether a particular data set can be considered
Big Data also depends on its volume. Hence, ‘Volume’ is one characteristic that must be
considered when dealing with Big Data solutions.