UNIT 2: AI Project Cycle
1. Problem Scoping
Problem Scoping refers to understanding a problem, identifying the various factors that affect it, and defining the goal or aim of the project.
Sustainable Development Goals
Sustainable Development: to develop for the present without exploiting the resources of the future.
o 17 goals announced by the United Nations.
o The aim is to achieve them by 2030.
o A pledge taken by all the member nations of the UN.
The Sustainable Development Goals (SDGs), also known as the Global Goals, were adopted by all United Nations Member States in 2015 as a universal call to action to end poverty, protect the planet, and ensure that all people enjoy peace and prosperity.
4 W's of Problem Scoping
The 4 W's of Problem Scoping are Who, What, Where, and Why. These four W's help in identifying and understanding the problem in a better and more efficient manner.
❖ Who - The "Who" part helps us comprehend and categorize all the people who are affected, directly or indirectly, by the problem; they are called the Stakeholders.
❖ What - The "What" part helps us understand and identify the nature of the problem; under this block, you also gather evidence to prove that the problem you have selected exists.
❖ Where - The "Where" part covers where the problem arises: the situation, context, and location.
❖ Why - The "Why" part asks why the given problem is worth solving.
Problem Statement Template
The Problem Statement Template helps us summarize all the key points in one single template, so that whenever there is a need to look back at the basis of the problem in the future, we can refer to the template and understand its key elements.
2. Data Acquisition
Data Acquisition is the stage in which we collect reliable data for the project from various sources.
Data Features
o Data features refer to the type of data you want to collect.
o E.g., salary amount, increment percentage, increment period, bonus, etc.
Big Data
o It includes data with sizes that exceed the capacity of traditional software to process within an acceptable time and value.
o The main focus is on unstructured types of data.
Web Scraping
• Web scraping means collecting data from the web using certain technologies.
• We use it for monitoring prices, news, and so on.
• Example: web scraping using Beautiful Soup in Python.
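To make this concrete, here is a minimal sketch of web scraping with requests and Beautiful Soup. The URL and the <span class="price"> tag structure are assumptions invented for this example, not a real site.

import requests
from bs4 import BeautifulSoup

# Hypothetical page listing product prices (placeholder URL).
url = "https://example.com/products"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early if the request failed

soup = BeautifulSoup(response.text, "html.parser")

# Assumption: each price on this page sits in a <span class="price"> tag.
for tag in soup.find_all("span", class_="price"):
    print(tag.get_text(strip=True))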
Sensors
• Sensors are a simple but very important source of data.
• Sensors are part of the IoT (Internet of Things).
• Sensors collect physical data and detect changes in it.
Cameras
• A camera captures visual information; that information, called an image, is then used as a source of data.
• Cameras are used to capture raw visual data.
❖ API
• An API (Application Programming Interface) lets a program request data directly from a service, e.g., a weather service that returns current temperature readings.
❖ Observations
• When we observe something carefully, we gain information.
• For example, scientists observe creatures in order to study them.
• Observation is a time-consuming data source.
❖ Surveys
• A survey is a method of gathering specific information from a sample of people.
• Example: a census survey for analysing the population.
3. Data Exploration
In this stage of the project cycle, we try to interpret useful information from the data we have acquired. For this purpose, we need to explore the data and try to put it into a uniform shape for better understanding. This stage deals with validating or verifying the collected data and analysing whether:
➢ The data is according to the specifications decided.
➢ The data is free from errors.
➢ The data meets our needs.
This stage is divided into two sub-stages:
1) Data Cleaning
2) Data Visualization
Data Cleaning
Data cleaning helps in getting rid of commonly found errors and mistakes in a data set. These are the three commonly found errors in data:
1) Outliers: data points lying outside the expected range.
2) Missing data: data points missing at certain places.
3) Erroneous data: incorrect data points.
Outliers
An outlier is a data point in a dataset that is distant from all other observations, i.e. something that behaves differently from the combination/collection of the rest of the data.
Missing Data
Missing values typically show up as NaN entries in a data set. We can handle them in two ways:
1) Remove the rows or records that contain missing values.
2) Estimate (impute) the missing values, e.g., fill them in with the average of that column.
Erroneous Data
Erroneous data consists of values that fall outside of what is acceptable and should be rejected by the system.
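A small pandas sketch of these cleaning steps follows; the "salary" column and its values are invented purely for illustration.

import pandas as pd

df = pd.DataFrame({"salary": [30000, 32000, None, 31000, 900000]})

# Missing data: either drop the affected rows or impute, e.g. with the mean.
df_dropped = df.dropna()
df_filled = df.fillna(df["salary"].mean())

# Outliers: a common rule flags points far outside the interquartile range (IQR).
q1, q3 = df["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["salary"] < q1 - 1.5 * iqr) | (df["salary"] > q3 + 1.5 * iqr)
print(df[is_outlier])  # the 900000 entry is flagged here

# Erroneous data: coerce bad entries to NaN so they can be handled like missing data.
df["salary"] = pd.to_numeric(df["salary"], errors="coerce")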
Data Visualization
Why do we need to explore data through visualization?
1) We want to quickly get a sense of the trends, relationships, and patterns
contained within the data.
2) It helps us define strategy for which model to use at a later stage.
3) Visual representation is easier to understand and communicate to others.
Data Visualization Techniques
1. Area Graphs
Area Graphs are Line Graphs with the area below the line filled in; they are used to display the development of quantitative values over an interval or time period. They are most commonly used to show trends rather than to convey specific values.
2. Bar Charts
The classic Bar Chart uses either horizontal or vertical bars (column chart) to show discrete, numerical comparisons across categories. Bar Charts are distinguished from Histograms, as they do not display continuous developments over an interval. A Bar Chart's discrete data is categorical and therefore answers the question of "how many?" in each category.
3. Histogram
A Histogram visualises the distribution of data over a continuous interval, where each bar indicates the frequency of values falling within that interval (bin).
4. Line Graphs
Line Graphs are used to display quantitative values over a continuous interval or
time period. A Line Graph is most frequently used to show trends and analyze how
the data has changed over time. Line Graphs are drawn by first plotting data points
on a Cartesian coordinate grid, then connecting a line between all of these points.
Typically, the y-axis has a quantitative value, while the x-axis is a timescale or a
sequence of intervals. Negative values can be displayed below the x-axis.
5. Pie Charts
Pie Charts help show proportions and percentages between categories, by dividing
a circle into proportional segments. Each arc length represents a proportion of each
category, while the full circle represents the total sum of all the data, equal to 100%.
Pie Charts are ideal for giving the reader a quick idea of the proportional distribution
of the data.
6. Scatterplots
Scatterplots place a collection of points on a Cartesian grid to display the relationship between two variables; the pattern of the points can reveal any correlation present between them.
7. Flow Charts
This type of diagram is used to show the sequential steps of a process. Flow Charts map out a process using a series of connected symbols, which makes the process easy to understand and aids in communicating it to other people. Flow Charts are useful for explaining how a complex and/or abstract procedure, system, concept or algorithm works. Drawing a Flow Chart can also help in planning and developing a process or improving an existing one.
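As a quick illustration of two of these techniques, here is a matplotlib sketch drawing a bar chart and a line graph side by side; the monthly sales figures are made-up sample data.

import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 150, 90, 180]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.bar(months, sales)  # bar chart: discrete comparison across categories
ax1.set_title("Bar Chart")

ax2.plot(months, sales, marker="o")  # line graph: trend over a sequence of intervals
ax2.set_title("Line Graph")

plt.tight_layout()
plt.show()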
4. Modelling
It's the fourth stage of the AI project cycle. In the previous stage, graphical representation made the data understandable for humans, since we can discover trends and patterns from it. But when machines access and analyse data, they need it in the most basic form of numbers (binary: 0s and 1s), and when it comes to discovering patterns and trends in data, the machine relies on mathematical representations of the same. The ability to mathematically describe the relationship between parameters is the heart of every AI model. Broadly, AI models follow either a rule-based or a learning-based approach.
Rule-Based Approach
In this approach, the rules are defined by the developer. The machine follows the rules or instructions mentioned by the developer and performs its task accordingly. It is therefore a static model: once trained, the machine does not take into consideration any changes made in the original training dataset. Machine learning is introduced as an extension to this, since in that case the machine adapts to changes in the data and rules and follows the updated path, while a rule-based model only does what it has been taught once.
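A tiny sketch of a rule-based model, using the grading example from later in this unit: the rules are hard-coded by the developer, so the model never changes unless the code itself is edited.

def grade(marks):
    # Hand-written rules mapping marks to a grade (the thresholds are invented).
    if marks >= 90:
        return "A"
    elif marks >= 75:
        return "B"
    elif marks >= 50:
        return "C"
    return "D"

print(grade(82))  # -> B, and it stays B no matter how much new data arrives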
Learning-Based Approach
After training, the machine is fed with testing data. The testing data might not contain the same examples as those on which the model was trained, so the model generalises from the features it has been trained on and predicts the output accordingly. In this way, the machine learns by itself, adapting to the new data flowing in. This is the machine learning approach, which introduces dynamicity into the model. Generally, learning-based models can be classified as follows:
I. Supervised Learning
In a supervised learning model, the dataset fed to the machine is labelled. In other words, the dataset is known to the person training the machine; only then can he/she label the data. A label is some information which can be used as a tag for data. For example, students get grades according to the marks they secure in examinations. These grades are labels which categorize the students according to their marks.
a) Classification
In this model, data is classified according to the labels. For example, in the grading system, students are classified on the basis of the grades they obtain with respect to their marks in the examination. This model works on a discrete dataset, which means the data need not be continuous.
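A hedged classification sketch with scikit-learn: a decision tree learns grade labels from marks. The marks and grade boundaries are invented training data.

from sklearn.tree import DecisionTreeClassifier

marks = [[95], [88], [72], [65], [40], [30]]   # feature: examination marks
grades = ["A", "A", "B", "B", "C", "C"]        # discrete labels

clf = DecisionTreeClassifier().fit(marks, grades)
print(clf.predict([[80]]))  # e.g. ['B'] -- a discrete category, not a number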
b) Regression
This model works on continuous data. For example, if you wish to predict your next salary, you would put in the data of your previous salary, any increments, etc., and train the model. Here, the data fed to the machine is continuous.
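Echoing the salary example, here is a hedged regression sketch with scikit-learn; the salary figures are invented, not taken from the text.

from sklearn.linear_model import LinearRegression

years = [[1], [2], [3], [4]]           # feature: years of experience
salary = [30000, 34000, 38500, 43000]  # continuous label: salary drawn

model = LinearRegression().fit(years, salary)
print(model.predict([[5]]))  # about 47250 for this toy data -- a continuous value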
II. Unsupervised Learning
An unsupervised learning model works on an unlabelled dataset. This means the data fed to the machine is random, and the person training the model may not have any information about it. Unsupervised learning models are used to identify relationships, patterns and trends in the data fed to them. They help the user understand what the data is about and what major features the machine has identified in it.
For example, if you have random data of 1000 dog images and you wish to find some pattern in it, you would feed this data into an unsupervised learning model and train the machine on it. After training, the machine would come up with the patterns it was able to identify. The machine might come up with patterns that are already known to the user, such as colour, or it might come up with something unusual, such as the size of the dogs.
a) Clustering
It refers to an unsupervised learning algorithm which can cluster unknown data according to the patterns or trends identified in it. The patterns observed might be ones already known to the developer, or the algorithm might come up with some unique patterns of its own.
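A minimal clustering sketch with scikit-learn's KMeans; the 2D points are made-up data standing in for real features, and no labels are supplied.

from sklearn.cluster import KMeans

points = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)
print(labels)  # e.g. [1 1 1 0 0 0] -- two groups found without any labels given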
b) Dimensionality Reduction
This technique reduces the number of features (dimensions) in a dataset while retaining as much meaningful information as possible, which makes large datasets easier to visualise and process.
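One standard instance of dimensionality reduction is Principal Component Analysis (PCA); the sketch below squeezes the 4 features of the classic iris dataset down to 2.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

features = load_iris().data   # 150 samples x 4 features
pca = PCA(n_components=2)
reduced = pca.fit_transform(features)
print(reduced.shape)          # (150, 2) -- easier to visualise and process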
5. Evaluation
Evaluation is the stage in which the model's reliability is checked by feeding it test data and comparing its predictions against the actual answers. Common evaluation metrics are:
1. Accuracy
Accuracy is defined as the percentage of correct predictions out of all the observations.
2. Precision
Precision is defined as the percentage of true positive cases versus all the cases where the prediction is true.
3. Recall
Recall is defined as the fraction of positive cases that are correctly identified.
4. F1 score
The F1 score is a number between 0 and 1 and is the harmonic mean of precision and recall.
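These four metrics can be computed directly from the counts of true/false positives and negatives; the sketch below uses invented counts for illustration.

tp, fp, fn, tn = 8, 2, 4, 6  # invented true/false positive/negative counts

accuracy = (tp + tn) / (tp + fp + fn + tn)          # correct predictions over all cases
precision = tp / (tp + fp)                          # true positives over predicted positives
recall = tp / (tp + fn)                             # true positives over actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)  # 0.7 0.8 0.666... 0.727...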