
UNIT-2

AI Project Cycle

The AI Project Cycle is a step-by-step process for solving problems using proven scientific methods and drawing inferences from them. Let us take a daily example as a project that requires a series of steps to solve the problem.

Creating a birthday card

1. Checking factors such as the budget, which help us decide the next steps and understand the project.
2. Acquiring data for designs and ideas from different sources, such as online searches or friends.
3. Making an organized list of the gathered data.
4. Creating or modelling a card on the basis of the data collected.
5. Showing it to parents or cousins so they can check and evaluate it.
Components of AI Project Cycle
Components of the project cycle are the steps that contribute to completing the project.
The components of the AI Project Cycle are:
❖ Problem Scoping - Understanding the problem
❖ Data Acquisition - Collecting accurate and reliable data
❖ Data Exploration - Arranging the data uniformly
❖ Modelling - Creating Models from the data
❖ Evaluation - Evaluating the project

1. Problem Scoping
Problem Scoping refers to understanding a problem, finding out the various factors which affect the problem, and defining the goal or aim of the project.
Sustainable Development Goals
Sustainable Development: to develop for the present without exploiting the resources of the future.
o 17 goals announced by the United Nations.
o Aim to achieve them by 2030.
o Pledge taken by all the member nations of the UN.
The Sustainable Development Goals (SDGs), also known as the Global Goals, were adopted by all United Nations Member States in 2015 as a universal call to action to end poverty, protect the planet, and ensure that all people enjoy peace and prosperity.

4 W's of Problem Scoping
The 4 W's of Problem Scoping are Who, What, Where, and Why.
These W's help in identifying and understanding the problem in a better and more efficient manner.
❖ Who - The "Who" part helps us comprehend and categorize who all are affected directly and indirectly by the problem; these people are called the Stakeholders.
❖ What - The "What" part helps us understand and identify the nature of the problem; under this block, you also gather evidence to prove that the problem you have selected exists.
❖ Where - Where does the problem arise: in what situation, context, and location?
❖ Why - Why is the given problem worth solving?
Problem Statement Template

The Problem Statement Template helps us summarize all the key points of the problem into one single template, so that in the future, whenever there is a need to look back at the basis of the problem, we can refer to the Problem Statement Template and understand its key elements.
2. Data Acquisition

Data Acquisition is the process of collecting accurate and reliable data to work with. The acquired data is usually divided into two types of data sets: the Training Data Set, which is used to train the model, and the Testing Data Set, which is kept aside to evaluate the model later.

Data Features
o Data features refer to the type of data you want to collect.
o E.g.: salary amount, increment percentage, increment period, bonus, etc.

Big Data
o Big data includes data with sizes that exceed the capacity of traditional software to process within an acceptable time and value.
o The main focus here is on unstructured types of data.

Data can be acquired from sources such as the following:

❖ Web Scraping
• Web Scraping means collecting data from the web using some technologies.
• We use it for monitoring prices, news, etc.
• Example: web scraping using Beautiful Soup in Python (a small sketch follows below).
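As a rough illustration of this example, here is a minimal Python sketch using the requests and Beautiful Soup libraries; the URL and the "price" tag/class are hypothetical placeholders, not a real site.

```python
# Minimal web-scraping sketch with requests + beautifulsoup4.
# The URL and the CSS class "price" are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/products"          # hypothetical page
html = requests.get(url, timeout=10).text     # download the page's HTML

soup = BeautifulSoup(html, "html.parser")     # parse the HTML
for tag in soup.find_all("span", class_="price"):   # hypothetical tag/class
    print(tag.get_text(strip=True))           # print each price found
```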
❖ Sensors
• Sensors are very important and quite simple to understand.
• Sensors are part of IoT (the Internet of Things).
• Sensors collect physical data and detect changes.
❖ Cameras
• A camera captures visual information; that information, called an image, is then used as a source of data.
• Cameras are used to capture raw visual data.
❖ API
• API stands for Application Programming Interface.
• An API is a messenger which takes requests, tells the system about those requests, and gives back the response (a small sketch follows below).
• Examples: Twitter API, Google Search API.
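As a rough illustration, here is a minimal Python sketch that sends a request to a hypothetical web API using the requests library; the endpoint and the "q" parameter are assumptions, and real APIs such as the Twitter API also require authentication keys.

```python
# Minimal sketch of calling a web API over HTTP.
# The endpoint and parameter are hypothetical placeholders.
import requests

response = requests.get(
    "https://api.example.com/search",        # hypothetical endpoint
    params={"q": "artificial intelligence"}, # hypothetical query parameter
)
data = response.json()                       # most APIs respond with JSON
print(data)
```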

❖ Observations
• When we observe something carefully, we get some information.
• For example, scientists observe creatures to study them.
• Observations are a time-consuming data source.
❖ Surveys
• A survey is a method of gathering specific information from a sample of people.
• Example: a census survey for analysing the population.
3. Data Exploration

In this stage of the project cycle, we try to interpret some useful information out of the data we have acquired. For this purpose, we need to explore the data and try to put it in a uniform format for a better understanding. This stage deals with validating or verifying the collected data and analysing whether:
➢ The data is according to the specifications decided.
➢ The data is free from errors.
➢ The data meets our needs.
This stage is divided into two sub-stages:
1) Data Cleaning
2) Data Visualization
Data Cleaning
Data cleaning helps in getting rid of commonly found errors and mistakes in a data set. These are the three commonly found errors in data:
1) Outliers: data points existing outside of the expected range.
2) Missing data: data points missing at certain places.
3) Erroneous data: incorrect data points.
Outliers
An outlier is a data point in a dataset that is distant from all other observations; in other words, it is something that behaves differently from the combination/collection of the rest of the data.
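As a rough illustration, here is a minimal pandas sketch that flags outliers with the common interquartile-range (IQR) rule; both the rule and the salary figures are assumptions added for the example, not part of the original text.

```python
# Minimal sketch: flag outliers with the IQR rule (values beyond
# 1.5 * IQR from the quartiles). The salary numbers are invented.
import pandas as pd

salaries = pd.Series([21000, 22000, 23000, 21500, 95000])  # 95000 looks unusual

q1, q3 = salaries.quantile(0.25), salaries.quantile(0.75)
iqr = q3 - q1
outliers = salaries[(salaries < q1 - 1.5 * iqr) | (salaries > q3 + 1.5 * iqr)]
print(outliers)   # flags 95000 as a point distant from the rest
```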

Missing Data

Missing values usually appear as NaN entries in a data set. We can handle them in two ways:

1. By eliminating the rows containing missing values. (This is generally not recommended, as it might shrink the data set and leave less data to train on.)

2. By using an imputer to find the best possible substitute to replace the missing values.
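As a rough illustration of these two options, here is a minimal sketch using pandas and scikit-learn's SimpleImputer; the marks and attendance values are invented for the example.

```python
# Minimal sketch of the two ways of handling missing (NaN) values.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"marks": [78, np.nan, 65, 90],
                   "attendance": [92, 88, np.nan, 95]})

# Option 1: drop the rows that contain missing values (loses data).
dropped = df.dropna()

# Option 2: replace missing values with a substitute, here the column mean.
imputer = SimpleImputer(strategy="mean")
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(filled)
```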

Erroneous Data
Erroneous data is test data that falls outside of what is acceptable and should be
rejected by the system.
Data Visualization
Why do we need to explore data through visualization?
1) We want to quickly get a sense of the trends, relationships, and patterns
contained within the data.
2) It helps us define strategy for which model to use at a later stage.
3) Visual representation is easier to understand and communicate to others.
Data Visualization Techniques

1. Area Graphs

Area Graphs are Line Graphs with the area below the line filled in. They are used to display the development of quantitative values over an interval or time period, and are most commonly used to show trends rather than to convey specific values.

2. Bar Charts

The classic Bar Chart uses either horizontal or vertical bars (a column chart) to show discrete, numerical comparisons across categories. Bar Charts are distinguished from Histograms in that they do not display continuous developments over an interval. A Bar Chart's discrete data is categorical data and therefore answers the question of "how many?" in each category.
3. Histogram

A Histogram visualizes the distribution of data over a continuous interval or certain


time period. Each bar in a histogram represents the tabulated frequency at each
interval/bin. Histograms help give an estimate as to where values are concentrated,
what the extremes are and whether there are any gaps or unusual values.

4. Line Graphs

Line Graphs are used to display quantitative values over a continuous interval or
time period. A Line Graph is most frequently used to show trends and analyze how
the data has changed over time. Line Graphs are drawn by first plotting data points
on a Cartesian coordinate grid, then connecting a line between all of these points.
Typically, the y-axis has a quantitative value, while the x-axis is a timescale or a
sequence of intervals. Negative values can be displayed below the x-axis.

5. Pie Charts

Pie Charts help show proportions and percentages between categories, by dividing
a circle into proportional segments. Each arc length represents a proportion of each
category, while the full circle represents the total sum of all the data, equal to 100%.
Pie Charts are ideal for giving the reader a quick idea of the proportional distribution
of the data.

6. Scatterplots

Scatterplots use a collection of points placed using Cartesian Coordinates to display


values from two variables. By displaying a variable in each axis, you can detect if a
relationship or correlation between the two variables exists.

7. Flow Charts
This type of diagram is used to show the sequential steps of a process. Flow Charts
map out a process using a series of connected symbols, which makes the process
easy to understand and aids in its communication to other people. Flow Charts are
useful for explaining how a complex and/or abstract procedure, system, concept or
algorithm works. Drawing a Flow Chart can also help in planning and developing a
process or improving an existing one.
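As a rough illustration of a few of the chart types described above, here is a minimal matplotlib sketch; the category counts and monthly sales figures are invented purely for the example.

```python
# Minimal sketch of three chart types: bar chart, line graph, pie chart.
# All values are invented for illustration.
import matplotlib.pyplot as plt

categories = ["A", "B", "C"]
counts = [12, 7, 5]
months = [1, 2, 3, 4, 5]
sales = [10, 14, 9, 17, 20]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].bar(categories, counts)          # bar chart: "how many?" per category
axes[1].plot(months, sales)              # line graph: trend over time
axes[2].pie(counts, labels=categories)   # pie chart: proportions of a whole
plt.tight_layout()
plt.show()
```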

4. Modelling

Modelling is the fourth stage of the AI Project Cycle. In the previous stage, graphical representation made the data understandable for humans, since we can discover trends and patterns from it. But when it comes to machines accessing and analysing data, they need the data in its most basic form of numbers (which is binary, 0s and 1s), and when it comes to discovering patterns and trends in data, the machine goes in for mathematical representations of the same. The ability to mathematically describe the relationship between parameters is the heart of every AI model.

Generally, AI models can be classified as follows:

Rule Based Approach

In this approach, the rules are defined by the developer. The machine follows the rules or instructions mentioned by the developer and performs its task accordingly, so it is a static model: once trained, the machine does not take into consideration any changes made in the original training dataset. Machine learning is introduced as an extension to this, because in that case the machine adapts to changes in data and rules and follows the updated path, while a rule-based model does only what it has been taught once.
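As a rough illustration, here is a minimal rule-based sketch in Python: the grade boundaries are fixed rules written by the developer, and the boundaries themselves are assumptions added for the example.

```python
# Minimal rule-based model: the developer writes the rules (fixed grade
# boundaries) and they never change, no matter what new data arrives.
def grade(marks: int) -> str:
    if marks >= 90:
        return "A"
    elif marks >= 75:
        return "B"
    elif marks >= 50:
        return "C"
    else:
        return "D"

print(grade(82))  # -> "B", always, because the rules are static
```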

Learning Based Approach

This is a type of AI modelling where the machine learns by itself. Under the Learning Based approach, the AI model gets trained on the data fed to it and is then able to design a model which is adaptive to changes in the data. That is, if the model is trained with X type of data and the machine designs the algorithm around it, the model will modify itself according to the changes which occur in the data, so that all exceptions are handled.

After training, the machine is fed with testing data. The testing data might not contain images similar to the ones on which the model was trained, so the model adapts to the features on which it has been trained and predicts the output accordingly. In this way, the machine learns by itself by adapting to the new data flowing in. This is the machine learning approach, which introduces dynamicity into the model. Generally, learning based models can be classified as follows:

I. Supervised Learning

In a supervised learning model, the dataset which is fed to the machine is labelled. In other words, the dataset is known to the person who is training the machine; only then is he/she able to label the data. A label is some information which can be used as a tag for the data. For example, students get grades according to the marks they secure in examinations. These grades are labels which categorize the students according to their marks.

There are two main types of supervised learning models:


a) Classification

In this model, data is classified according to the labels. For example, in the grading system, students are classified on the basis of the grades they obtain with respect to their marks in the examination. This model works on a discrete dataset, which means the data need not be continuous.
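As a rough illustration of the grading example, here is a minimal scikit-learn classification sketch; the training marks, the labels, and the choice of a decision tree are assumptions added for the example.

```python
# Minimal supervised classification: marks are the feature, grades the labels.
from sklearn.tree import DecisionTreeClassifier

marks = [[35], [48], [62], [71], [85], [93]]   # labelled training data
grades = ["D", "C", "B", "B", "A", "A"]        # labels attached by a person

model = DecisionTreeClassifier().fit(marks, grades)
print(model.predict([[78]]))                   # predicts a discrete class (grade)
```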

b) Regression

This model works on continuous data. For example, if you wish to predict your next salary, then you would put in the data of your previous salary, any increments, etc., and would train the model. Here, the data which is fed to the machine is continuous.
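As a rough illustration of the salary example, here is a minimal scikit-learn regression sketch; the experience and salary figures are invented for the example.

```python
# Minimal regression: continuous input (years) predicts continuous output (salary).
from sklearn.linear_model import LinearRegression

years = [[1], [2], [3], [4], [5]]              # previous experience
salary = [30000, 35000, 41000, 46000, 52000]   # previous salaries

model = LinearRegression().fit(years, salary)
print(model.predict([[6]]))                    # estimate of the next salary
```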

II. Unsupervised Learning

An unsupervised learning model works on an unlabelled dataset. This means that the data fed to the machine is random, and there is a possibility that the person training the model does not have any prior information about it. Unsupervised learning models are used to identify relationships, patterns and trends in the data fed into them. They help the user understand what the data is about and what the major features identified by the machine in it are.

For example, if you have random data of 1000 dog images and you wish to discover some pattern in it, you would feed this data into an unsupervised learning model and train the machine on it. After training, the machine would come up with the patterns it was able to identify. The machine might come up with patterns which are already known to the user, like colour, or it might even come up with something very unusual, like the size of the dogs.

There are two main types of unsupervised learning models:

a) Clustering

Clustering refers to an unsupervised learning algorithm which can cluster unknown data according to the patterns or trends identified in it. The patterns observed might be ones which are already known to the developer, or the algorithm might come up with some unique patterns of its own.
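As a rough illustration, here is a minimal scikit-learn sketch using KMeans, one common clustering algorithm; the points and the choice of two clusters are assumptions added for the example.

```python
# Minimal unsupervised clustering: no labels are given, the algorithm
# groups the points purely from the patterns it finds.
from sklearn.cluster import KMeans

points = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # cluster assigned to each point
print(kmeans.cluster_centers_)  # centre of each discovered cluster
```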

b) Dimensionality Reduction

We humans are able to visualize only up to 3 dimensions, but according to many theories and algorithms, there are various entities which exist beyond 3 dimensions. For example, in Natural Language Processing, words are considered to be N-dimensional entities, which means that we cannot visualize them, as they exist beyond our visualization ability. Hence, to make sense of them, we need to reduce their dimensions; this is where a dimensionality reduction algorithm is used.
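As a rough illustration, here is a minimal sketch using PCA (one common dimensionality reduction algorithm available in scikit-learn) to compress random 4-dimensional points down to 2 dimensions; the data is random and purely for the example.

```python
# Minimal dimensionality reduction: compress 4-D points to 2-D with PCA
# so that they could be visualized.
import numpy as np
from sklearn.decomposition import PCA

high_dim = np.random.rand(100, 4)       # 100 points in 4 dimensions

pca = PCA(n_components=2)
low_dim = pca.fit_transform(high_dim)   # the same points in 2 dimensions
print(low_dim.shape)                    # (100, 2)
```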

III. Reinforcement Learning

Reinforcement learning is a type of machine learning technique that enables an agent (model) to learn in an interactive environment by trial and error, using feedback from its own actions and experiences. Though both supervised and reinforcement learning use a mapping between input and output, unlike supervised learning, where the feedback provided to the agent (model) is the correct set of actions for performing a task, reinforcement learning uses rewards and punishments as signals for positive and negative behaviour. Reinforcement learning is all about making decisions sequentially.
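As a toy illustration of learning from rewards by trial and error, here is a minimal tabular Q-learning sketch in a made-up one-dimensional world; the environment, reward, and hyper-parameters are all assumptions added for the example, not part of the original text.

```python
# Toy Q-learning: the agent starts at position 0 and is rewarded only
# when it reaches position 4. It learns which moves lead to the reward.
import random

n_states, actions = 5, [-1, +1]          # move left or right
q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration

for episode in range(200):
    state = 0
    while state != n_states - 1:
        # explore sometimes, otherwise exploit the best known action
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(state, a)])
        next_state = min(max(state + action, 0), n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0   # reward signal
        # Q-learning update: learn from the reward and the best future value
        best_next = max(q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

print(max(actions, key=lambda a: q[(0, a)]))   # learned best first move
```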

5. Evaluation

Evaluation is the process of understanding the reliability of any AI model, based on its outputs, by feeding the test dataset into the model and comparing its predictions with the actual answers. That is, once a model has been made and trained, it needs to go through proper testing so that one can calculate its efficiency and performance. Hence, the model is tested with the help of the Testing Data (which was separated out of the acquired dataset at the Data Acquisition stage). The efficiency of the model is calculated on the basis of the parameters mentioned below:

1. Accuracy

Accuracy is defined as the percentage of correct predictions out of all the observations.

2. Precision

Precision is defined as the percentage of true positive cases out of all the cases where the prediction was positive.

3. Recall

Recall is defined as the fraction of positive cases that are correctly identified.

4. F1 Score

The F1 score is a number between 0 and 1 and is the harmonic mean of precision and recall.
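As a rough numerical illustration of these four parameters, here is a minimal Python sketch that computes them from hypothetical counts of true/false positives and negatives; the counts themselves are invented.

```python
# Minimal sketch: compute the four evaluation parameters from a
# hypothetical confusion matrix (counts are invented).
tp, fp, fn, tn = 40, 10, 5, 45          # true/false positives and negatives

accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct predictions / all predictions
precision = tp / (tp + fp)                   # true positives / all predicted positives
recall = tp / (tp + fn)                      # true positives / all actual positives
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print(accuracy, precision, recall, f1)       # 0.85 0.8 0.888... 0.842...
```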
