Algorithmic Models
Managerial (Intuition) Based
① Subjective in nature.
② For the same problem, different estimators will produce different estimates.
③ The estimator's experience level affects the estimate.
④ Unstructured process.
⑤ Hard to convince the customer.
⑥ Difficult to validate the estimate.
Intuition Based cont…
Analogy
Used to estimate the effort for a new problem by analyzing the solutions that were used to solve old problems. The analogy method usually follows a three-step process.
Product metrics
◦ quantify characteristics of the product being
developed
size, cost
Process metrics
◦ quantify characteristics of the process being
used to develop the software
efficiency of fault detection
Cont…
A Few Famous Metrics
◦ productivity = KLOC/person-month
◦ quality = faults/KLOC
◦ cost = $$/KLOC
◦ documentation = doc_pages/KLOC
LOC cont…
Why used?
◦ early systems placed their emphasis on coding
Criticisms
◦ cross-language inconsistencies
◦ within-language counting variations
◦ changes in program structure can affect the count
◦ stimulates programmers to write lots of code
◦ system-oriented, not user-oriented
How many Lines of Code in this program?
Program length (N) = N1 + N2
The following equations are used to compute the estimates:
N = Observed program length = N1 + N2
N* = Estimated program length = n1 log2(n1) + n2 log2(n2)
n = Program vocabulary = n1 + n2
V = Program volume (the information content of the program, measured in bits) = N x log2(n)
D = Program difficulty = (n1/2) x (N2/n2)
E = Programming effort = D x V
Where
n1 = number of distinct operators in a program
n2 = number of distinct operands in a program
N1 = number of occurrences of operators in a program
N2 = number of occurrences of operands in a program
Halstead’s Example
if (k < 2)
{
    if (k > 3)
        x = x * k;
}
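As a rough illustration of the formulas above, the snippet's tokens can be counted and the Halstead measures computed directly. The operator/operand split used below (if, (), <, >, =, *, {}, ; as operators; k, x, 2, 3 as operands) is only one common counting convention, so treat the exact tallies as an assumption.

import math

# Hypothetical token counts for the snippet above; counting conventions vary,
# so these operator/operand tallies are an assumption.
operators = {"if": 2, "()": 2, "<": 1, ">": 1, "=": 1, "*": 1, "{}": 1, ";": 1}
operands = {"k": 3, "x": 2, "2": 1, "3": 1}

n1, n2 = len(operators), len(operands)                    # distinct operators/operands
N1, N2 = sum(operators.values()), sum(operands.values())  # total occurrences

N = N1 + N2                                      # observed program length
N_est = n1 * math.log2(n1) + n2 * math.log2(n2)  # estimated program length N*
n = n1 + n2                                      # program vocabulary
V = N * math.log2(n)                             # program volume in bits
D = (n1 / 2) * (N2 / n2)                         # program difficulty
E = D * V                                        # programming effort

print(f"N={N}, N*={N_est:.1f}, n={n}, V={V:.1f}, D={D:.2f}, E={E:.1f}")

With these counts, N = 17, n = 12, V is about 60.9 bits, D = 7 and E is about 427.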
(Flow-graph figures: if-then-else and until constructs.)
Cyclomatic Complexity
Set of independent paths through the graph
(basis set)
V(G) = E – N + 2
◦ E is the number of flow graph edges
◦ N is the number of nodes
V(G) = P + 1
◦ P is the number of predicate nodes
Meaning
V(G) is the number of (enclosed) regions/areas of the planar graph.
The V(G) or cyclomatic number is a measure of the complexity of a function and is correlated with the difficulty of testing it. The recommended range is 1 to 10.
A value of 1 means the code has no branching.
(Flow-graph figure with nodes 1–7; see the V(G) computation and basis paths below.)
Computing V(G)
V(G) = 9 – 7 + 2 = 4
V(G) = 3 + 1 = 4
Basis Set
◦ 1, 7
◦ 1, 2, 6, 1, 7
◦ 1, 2, 3, 4, 5, 2, 6, 1, 7
◦ 1, 2, 3, 5, 2, 6, 1, 7
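A minimal sketch of the same computation, using an edge list inferred from the basis paths above (so the exact edges are an assumption about the figure):

# Edge list inferred from the basis paths above: 7 nodes, 9 edges.
edges = [(1, 7), (1, 2), (2, 3), (2, 6), (3, 4), (3, 5), (4, 5), (5, 2), (6, 1)]

nodes = {v for edge in edges for v in edge}
E, N = len(edges), len(nodes)

# Predicate nodes are the branching nodes, i.e. those with out-degree > 1.
out_degree = {}
for u, _ in edges:
    out_degree[u] = out_degree.get(u, 0) + 1
P = sum(1 for d in out_degree.values() if d > 1)

print("V(G) = E - N + 2 =", E - N + 2)  # 9 - 7 + 2 = 4
print("V(G) = P + 1     =", P + 1)      # 3 + 1 = 4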
Another Example
(Flow-graph figure with numbered nodes.)
What is V(G)?
Function Points
Function points emerged as a measure of the size of a system in terms of its functionality and usability.
History
Non-code oriented size measure
Developed at IBM by A. Albrecht (1979, 1983)
Now in use by more than 500 organizations worldwide
Each of the following 14 factors is rated on a scale equivalent to:
Not present = 0, Incidental influence = 1, Moderate influence = 2, Average influence = 3, Significant influence = 4, Strong influence = 5

1) data communications
2) distributed functions
3) performance
4) heavily used configuration
5) transaction rate
6) on-line data entry
7) end user efficiency
8) on-line update
9) complex processing
10) reusability
11) installation ease
12) operational ease
13) multiple sites
14) facilitates change
Function Point Calculation
Function Counts: FC = Σ (i = 1 to 5) Σ (j = 1 to 3) x_ij x w_ij

Function Points: FP = FC x (0.65 + 0.01 x Σ (k = 1 to 14) c_k)

where
x_ij = count of function type i at complexity level j
w_ij = weight of function type i at complexity level j
c_k = complexity factor k
Computing Function Points
1. Analyze the information domain of the application and develop counts: establish counts for the input domain and the system interfaces.
2. Compute function points:
FP = SUM(count x weight) x C
where
complexity multiplier C = (0.65 + 0.01 x N)
degree of influence N = SUM(F_i)
Analyzing the Information Domain

                                        weighting factor
measurement parameter        count      simple  avg.  complex
number of user inputs         ___  x      3      4      6     =  ___
number of user outputs        ___  x      4      5      7     =  ___
number of user inquiries      ___  x      3      4      6     =  ___
number of files               ___  x      7     10     15     =  ___
number of ext. interfaces     ___  x      5      7     10     =  ___
count-total                                                      ___
complexity multiplier                                            ___
function points                                                  ___
Example: SafeHome Functionality
(Context-diagram figure for the SafeHome system, showing inputs such as Password, Zone Setting, Zone Inquiry and Test Sensor, the Sensors and System Config Data, and the Monitor and Response System receiving Password, Alarm Alert, Sensors, etc.)
Example: SafeHome FP Calc
                                        weighting factor
measurement parameter        count      simple  avg.  complex
number of user inputs          3   x      3      4      6     =   9
number of user outputs         2   x      4      5      7     =   8
number of user inquiries       2   x      3      4      6     =   6
number of files                1   x      7     10     15     =   7
number of ext. interfaces      2   x      5      7     10     =  10
count-total                                                      40
complexity multiplier                                          1.11
function points                                               44.4
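A short sketch of the same calculation, assuming all five parameters are rated simple (which is how the example multiplies the counts) and reusing the 1.11 multiplier from the table:

# Counts and simple-complexity weights taken from the SafeHome table above.
counts = {"inputs": 3, "outputs": 2, "inquiries": 2, "files": 1, "interfaces": 2}
simple_weights = {"inputs": 3, "outputs": 4, "inquiries": 3, "files": 7, "interfaces": 5}

count_total = sum(counts[p] * simple_weights[p] for p in counts)  # 40

# Complexity multiplier C = 0.65 + 0.01 * N; the example uses C = 1.11,
# i.e. a total degree of influence N = 46.
C = 1.11
FP = count_total * C

print(count_total, round(FP, 1))  # 40 44.4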
Attempt
Effort E = a * (KDSI)^b * EAF
Advanced COCOMO
It is used for large-sized projects.
The cost drivers depend upon requirements, analysis, design, testing and maintenance.
Team size is large.
Organic mode projects are developed by relatively small teams in a familiar environment.
E = 2.4 (KDSI)^1.05, where E is the effort in person-months, and the nominal development time S = 2.5 (E)^0.38 months.
(Plot: estimated effort in person-months, 0 to 7000, versus size in thousands of lines of code, 0 to 600, for the organic, semidetached and embedded modes.)
Solve
Assume that the size of an organic-type software product has been estimated to be 32,000 lines of source code, and that the average salary of a software engineer is Rs. 15,000 per month. Determine the effort required to develop the software product and the nominal development time.
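A quick sketch of one way to answer this, using the organic-mode equations given earlier; the cost figure is simply effort times the stated monthly salary.

# Organic mode, as given above: E = 2.4 * (KDSI)^1.05, S = 2.5 * (E)^0.38.
KDSI = 32          # 32,000 lines of source code
salary = 15_000    # Rs. per person-month

E = 2.4 * KDSI ** 1.05     # effort in person-months  (about 91 PM)
S = 2.5 * E ** 0.38        # nominal development time (about 14 months)
cost = E * salary          # development cost         (about Rs. 13.7 lakh)

print(f"Effort = {E:.1f} PM, time = {S:.1f} months, cost = Rs. {cost:,.0f}")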
Issues in Algorithmic Models
Specific Input, Specific Output
Parameter dependent
Regression based
Sizing source code volumes
Studies show that a conversion between LOC and function points is possible.
Soft Computing Amalgamation
Research Directions
◦ Divergent
◦ Convergent
◦ Compound

Convergent
Single machine learning methods applied to the same estimation method.
For example:
Roheet Bhatnagar et al. compared Development Time (DT') estimates by applying the Feed Forward Back Propagation Neural Network, Cascaded Feed Forward Back Propagation Neural Network, Elman Back Propagation Neural Network, Layer Recurrent Neural Network and Generalised Regression Neural Network models to the Lopez et al. dataset.
Paper discussion cont…
They used MMRE, BRE, MdMRE and pred(z) as the evaluation criteria for classifying the results.
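These criteria are easy to compute once actual and estimated efforts are available. The sketch below uses the usual definitions (MRE = |actual - estimated| / actual, BRE divides by the smaller of the two values, pred(z) is the fraction of projects with MRE no greater than z); the sample numbers are invented purely for illustration.

import statistics

def mre(actual, estimated):
    return abs(actual - estimated) / actual

def bre(actual, estimated):
    # Balanced Relative Error: error relative to the smaller of the two values.
    return abs(actual - estimated) / min(actual, estimated)

def mmre(actuals, estimates):
    return statistics.mean(mre(a, e) for a, e in zip(actuals, estimates))

def mdmre(actuals, estimates):
    return statistics.median(mre(a, e) for a, e in zip(actuals, estimates))

def pred(z, actuals, estimates):
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(1 for r in errors if r <= z) / len(errors)

# Invented effort values (person-months), for illustration only.
actual = [120, 45, 300, 80]
estimated = [100, 50, 350, 90]
print(mmre(actual, estimated), mdmre(actual, estimated), pred(0.25, actual, estimated))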
Results of Bhatnagar et al.
What can be added?
Different Data set/s
More assessment techniques, for example MRSE, Bayesian analysis, etc.
Use the models' results and compare them with those of the ML techniques.
Single Technique, Single Method
When one machine learning technique is applied to calibrate an existing model.
Chohan et al. implemented COCOMO with the help of artificial neural networks, trained using the perceptron learning algorithm. The COCOMO dataset is used to train and test the network, and the test results from the trained neural network are compared with those of the COCOMO model.
Chohan et al. cont…
They used 17 inputs to the network: the size of the project in KLOC, 15 effort multipliers, the actual effort of the project and one bias value. These inputs enter the network as weighted inputs, and the effort is calculated using the equation proposed in their paper. The weights are initialized as Wi = 1 for i = 1 to 17, the learning rate α = 0.001 and the bias b = 1. The inputs, as received, are multiplied by the weights and fed to the network.
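The equation itself is not reproduced on the slide, so the sketch below only illustrates the general scheme described: a weighted sum over the 17 inputs with Wi = 1, α = 0.001 and b = 1, followed by a delta-rule style correction toward the actual effort. The exact update rule used by Chohan et al. is an assumption here.

# Illustrative sketch only; not Chohan et al.'s actual equation.
alpha = 0.001
bias = 1.0
weights = [1.0] * 17          # W_i = 1 for i = 1..17, as stated

def predict(inputs, weights, bias):
    # Weighted sum of the 17 inputs plus the bias term.
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def train_step(inputs, actual_effort, weights, bias, alpha):
    # Delta-rule style update nudging the output toward the actual effort.
    error = actual_effort - predict(inputs, weights, bias)
    new_weights = [w + alpha * error * x for w, x in zip(weights, inputs)]
    return new_weights, bias + alpha * error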
Study Results with MRE as the Evaluation Criterion
What can be added
Calibrate the factors involved in the regression-based equation.
Involve more than one dataset to validate the results.
Divergent
A single machine learning method applied to various estimation techniques.
Divergent Example
Ali Abbas et al. compared the effort estimates of 11 former estimation methods and 3 function-point-based methods computed from 4 commercially available data sets. Back propagation neural networks with varying numbers of hidden layers are also used to compare the effort results and to evaluate whether the former methods, function points or the computational intelligence technique has the potential to yield close-to-realistic estimates. It is also intended to examine the behavior of the estimation methods by identifying anomalies among the data sets. Magnitude of Relative Error (MRE), Mean Magnitude of Relative Error (MMRE), pred(0.25) and pred(0.50) are used for the computational work.
What can be added
Refine the datasets, cluster them according to similarity, and then apply the machine learning technique.
Apply a technique other than the back propagation neural network.
Apply the same or different machine learning techniques to linear and non-linear estimation methods.
Ratio of other techniques for
prediction of different aspects
Other techniques and results
Ziauddin et al. fuzzified the input parameters of the COCOMO II model and defuzzified the result to obtain the estimated effort. Triangular fuzzy numbers are used to represent the linguistic terms in the COCOMO II model. The results of this model are compared with COCOMO II and the Alaa Sheta model; the proposed model yields better results in terms of MMRE, PRED(n) and VAF.
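For context, a triangular fuzzy number is defined by three points (a, m, b), with membership rising linearly from a to the peak m and falling back to zero at b. A minimal membership-function sketch follows; the specific fuzzy sets and rules of Ziauddin et al.'s model are not shown on the slide, so the example values are invented.

def triangular_membership(x, a, m, b):
    # Membership of x in the triangular fuzzy number (a, m, b):
    # 0 outside [a, b], 1 at the peak m, linear in between.
    if x <= a or x >= b:
        return 0.0
    if x <= m:
        return (x - a) / (m - a)
    return (b - x) / (b - m)

# Example: a 'nominal' cost-driver rating centred at 1.0 (values illustrative only).
print(triangular_membership(0.95, 0.85, 1.0, 1.15))  # about 0.67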
Other techniques cont…
Pervinder et al., in the paper Software Effort Estimation Using Soft Computing Techniques, used a Neuro-Fuzzy system as a soft computing approach to generate a model by formulating the relationship based on its training. The Neuro-Fuzzy technique is used for software estimation modeling on NASA software project data, and the performance of the developed models is compared with the Halstead, Walston-Felix, Bailey-Basili and Doty models.
Other techniques cont…
B. Kumar, in Software Effort Estimation by Genetic Algorithm Tuned Parameters of Modified Constructive Cost Model for NASA Software Projects, used COCOMO as the algorithmic model, and an attempt is made to validate the soundness of the genetic algorithm technique using NASA project data. The main objective of the research is to investigate the effect of crisp inputs and the genetic algorithm technique on the accuracy of the system's output when a modified version of the famous COCOMO model is applied to the NASA dataset. The proposed model is validated using 5 out of the 18 NASA project datasets. Empirical results show that the modified COCOMO yields slightly better software effort estimates than those reported in previous literature. The proposed model successfully improves the performance of the estimated effort with respect to the Variance Accounted For (VAF) criterion, MMRE and Pred.
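As a toy illustration of the GA-tuning idea (not Kumar's actual setup), one can search over the COCOMO coefficients a and b in E = a * KLOC^b so as to minimise MMRE on a dataset. Everything below, including the data points, parameter ranges and GA settings, is invented for the sketch.

import random

# Invented (KLOC, actual effort) pairs, purely for illustration.
data = [(10, 25), (32, 91), (60, 180), (120, 390)]

def mmre(a, b):
    return sum(abs(eff - a * kloc ** b) / eff for kloc, eff in data) / len(data)

def evolve(pop_size=30, generations=50):
    # Each individual is a candidate (a, b) pair for E = a * KLOC^b.
    pop = [(random.uniform(1.0, 4.0), random.uniform(0.9, 1.3)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ab: mmre(*ab))
        parents = pop[: pop_size // 2]          # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            (a1, b1), (a2, b2) = random.sample(parents, 2)
            a = (a1 + a2) / 2 + random.gauss(0, 0.05)   # crossover + mutation
            b = (b1 + b2) / 2 + random.gauss(0, 0.01)
            children.append((a, b))
        pop = parents + children
    return min(pop, key=lambda ab: mmre(*ab))

best_a, best_b = evolve()
print(f"a = {best_a:.2f}, b = {best_b:.2f}, MMRE = {mmre(best_a, best_b):.3f}")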
So far:
Mostly, the COCOMO model has been calibrated or validated by applying neural networks, fuzzy logic and genetic algorithms. The evaluation criteria seen so far are based on MRE, MdMRE, MMRE and Pred(z).
What can be added:
Apply the same techniques to algorithmic methods other than COCOMO.
Apply HRV prediction and clustering techniques to the available software estimation datasets and the available former estimation methods (next page).
Qualitatively, the management aspect of software estimation has not been amalgamated with machine learning techniques. For example, a genetic algorithm may be applied to identify the best working team for a particular software development aspect.
Or devise a hierarchy of the activities to be estimated in order, avoiding unnecessary estimation activities. One such piece of research appears in Application of 80/20 Rule in Software Engineering Waterfall Model by Muzaffar Iqbal.
For the application of HRV techniques:
◦ One HRV technique on all datasets (the reasons for variation in the results may be identified and the best technique may be inferred).
◦ One HRV technique on a single dataset (results may be compared with other algorithmic models).
◦ Multiple HRV techniques on all datasets (results may be discussed on the basis of the HRV techniques applied).
◦ Multiple HRV techniques on a single dataset (identify the best technique).
◦ Categorize the datasets according to closely related observations and apply the techniques (an extension of case 1, where loosely connected datasets yield poor results).
◦ Cluster the same datasets and apply the HRV techniques (an extension of case 1, where loosely connected datasets yield poor results).
◦ Apply neural networks and genetic algorithms along with HRV techniques on a single dataset or all datasets (this allows the literature to be compared with the current results).
Commercially available datasets
Albrecht Data set
Dataset for Projects in 4GL
Kemerer Dataset
Bailey and Basili/ NASA Dataset
Hallmark Dataset