
Effort Estimation

 As software projects have grown from small to medium and large, the need for precise,
well-understood software estimation has grown with them.

 Effective prediction is influenced by several known and unknown factors: imprecise and
drifting requirements; novelty (of the whole project, the technology, or both); pressure to
fit the estimates to the available time and budget; impractical or heavy changes of plan
during the execution of the project; the type and size of the software; the programming
languages; the capability of the team; and the stage of development at which the
estimation is performed.

 As the project tasks are completed by the team, both the rate of change of the system's
estimated final size and the uncertainty about that estimated final size should approach
zero.
Estimation Methods
 Intuition Based/Experience based

 Algorithmic Models

 Managerial based

 Soft computing amalgamated


Estimating Principle
 Project Size × Project Attributes yields the estimates, in the form of
 Effort
 Cost
 Schedule
 Deliverables
Affecting Factors
 Rate of Change
 Experience of development team
 Process mode
 Project Size
 Development Languages
 Reusability
 Delivery dates
 And many more….
What is required to be estimated

 Size of the project


 Effort required for the project
 Delivery time for the project/product
 Staff required for completion
Intuition Based
 Expert Judgment
 Expert judgment, sometimes known as the Delphi technique, is one of the most
widely used methods; it is sometimes described as an educated-guessing technique
based on the intuition of experts.
 The opinions of the different experts are then analyzed to produce the estimate.
 Human experts provide cost and schedule estimates based on their experience.
Usually they have knowledge of similar development efforts, and the degree of
similarity is relative to their understanding of the proposed project.
 The guess is a sophisticated judgment supported by a variety of tools. The whole
estimation process may be a group discussion that ends in an effort estimate agreed
by all.
Shortcomings of Expert Judgment

① Subjective in nature
② For the same problem, different estimators will produce different estimates.
③ The experience level of the estimator affects the estimate.
④ Unstructured process
⑤ Hard to convince the customer
⑥ Difficult to validate the estimate.
Intuition Based cont…
Analogy
 Used to estimate effort for a new problem by analyzing the solutions that were used to
solve similar old problems. The analogy method usually follows a three-step process
(a small sketch follows the list).

 ① Selection of relevant cost attributes
 The relevant cost attributes may be selected by looking for the best possible
combination of variables through a comprehensive search, using available tools such
as ANGEL, ACE and ESTOR.

 ② Selection of suitable similarity/distance functions
 Similarity functions are defined in order to compare the data of different projects.

 ③ Number of analogues to consider for prediction
 For prediction, one may use the effort value of the most similar analogue. When
considering more than one analogue, the simple mean (e.g. the mean effort of the two
most similar retrieved projects) or a weighted mean may be used.
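A minimal sketch of the three steps, assuming two hypothetical cost attributes (size in KLOC and team size), Euclidean distance as the similarity function, and the mean effort of the k most similar past projects as the prediction; all project values below are made-up illustrations.

#include <stdio.h>
#include <math.h>

/* A past project: two illustrative cost attributes plus its known effort. */
typedef struct {
    double kloc;        /* size in thousands of lines of code */
    double team;        /* team size */
    double effort_pm;   /* actual effort in person-months */
} Project;

/* Euclidean distance over the two attributes (a simple similarity function). */
static double dist(const Project *p, double kloc, double team) {
    double dk = p->kloc - kloc, dt = p->team - team;
    return sqrt(dk * dk + dt * dt);
}

int main(void) {
    /* Hypothetical analogue base of completed projects. */
    Project base[] = {
        { 10.0, 4.0,  28.0 },
        { 12.0, 5.0,  35.0 },
        { 40.0, 9.0, 130.0 },
        {  8.0, 3.0,  20.0 },
    };
    int n = sizeof base / sizeof base[0];

    double new_kloc = 11.0, new_team = 4.0;   /* the new project */
    int k = 2;                                /* number of analogues to use */

    /* Order the analogue indices by distance to the new project (selection sort). */
    int idx[4];
    for (int i = 0; i < n; i++) idx[i] = i;
    for (int i = 0; i < n - 1; i++)
        for (int j = i + 1; j < n; j++)
            if (dist(&base[idx[j]], new_kloc, new_team) <
                dist(&base[idx[i]], new_kloc, new_team)) {
                int t = idx[i]; idx[i] = idx[j]; idx[j] = t;
            }

    /* Predict effort as the mean effort of the k nearest analogues. */
    double sum = 0.0;
    for (int i = 0; i < k; i++)
        sum += base[idx[i]].effort_pm;
    printf("Estimated effort: %.1f person-months\n", sum / k);
    return 0;
}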
Shortcomings of Analogy
1) Availability of an appropriate analogue.
2) A sound strategy for selecting the analogue.
3) Accuracy of the data used for both the analogue and the new project.
Managerial Based
 Work breakdown structure (WBS)
I. Teams work on different tasks.
II. Top-down estimation suggests that the total effort can be estimated without
decomposing or breaking the project down into smaller activities or parts.
III. Bottom-up estimation suggests that the work should first be broken down into a
number of activities, which are estimated individually and then summed.
IV. Discuss any idea
Software Metric
 Quantifiable measures that could be used to measure
characteristics of a software system or the software
development process
 Managers need quantifiable information, and not subjective
information
 Measure: Quantitative indication of the extent, amount,
dimension, or size of some attribute of a product or process.
 Indicators: A combination of metrics that provides insight into
the software process, project or product
 Direct Metrics: Immediately measurable attributes (e.g. lines of
code, execution speed, defects reported)
 Indirect Metrics: Aspects that are not immediately quantifiable
(e.g. functionality, reliability)
Software Metric cont…
 Types of Metrics

 Product metrics
◦ quantify characteristics of the product being
developed
 size, cost
 Process metrics
◦ quantify characteristics of the process being
used to develop the software
 efficiency of fault detection
Cont…
 Few Famous Metrics

 Line of Code (LOC)


 Halstead Equation
 McCabe’s Cyclomatic Complexity
 Function Points
Line of Code (LOC)
 perhaps the simplest metric:
◦ Lines-of-code measures are the most traditional measures used to quantify software
complexity and to estimate software development effort.
◦ Count the number of lines of code (LOC) and use the count as a measure of
program complexity.
◦ Simple to use:
 if errors occur at a rate of 2 per 100 lines, a 5000-line program should have about
100 errors.

Some common measures
◦ productivity: KLOC / person-month
◦ quality: faults / KLOC
◦ cost: $$ / KLOC
◦ documentation: doc_pages / KLOC
LOC cont…
 Why used?
◦ early systems emphasis on coding

 Criticisms
◦ cross-language inconsistencies
◦ within language counting variations
◦ change in program structure can affect count
◦ stimulates programmers to write lots of code
◦ system-oriented, not user-oriented
How many Lines of Code in this program?

#include <stdio.h>

#define LOWER 0     /* lower limit of table */
#define UPPER 300   /* upper limit */
#define STEP  20    /* step size */

/* print a Fahrenheit-Celsius conversion table */
int main(void)
{
    int fahr;
    for (fahr = LOWER; fahr <= UPPER; fahr = fahr + STEP)
        printf("%4d %6.1f\n", fahr, (5.0 / 9.0) * (fahr - 32));
    return 0;
}
Halstead Equation
 Gives more weight to lines that are more complex.
 The metrics should reflect the implementation or expression of algorithms in different
languages, but be independent of their execution on a specific platform. These metrics
are therefore computed statically from the code.
 In order to estimate code length, volume, complexity and effort, software science uses
counts of operators and operands.

 The following equations are used for computing the estimates:
 N  = observed program length = N1 + N2
 N^ = estimated program length = n1*log2(n1) + n2*log2(n2)
 n  = program vocabulary = n1 + n2
 V  = program volume (the information content of the program, in bits) = N*log2(n)
 D  = program difficulty = (n1/2) * (N2/n2)
 E  = effort = D * V

 where
 n1 = number of distinct operators in a program
 n2 = number of distinct operands in a program
 N1 = number of occurrences of operators in a program
 N2 = number of occurrences of operands in a program
Halstead’s Example
if (k < 2)
{
    if (k > 3)
        x = x * k;
}

 Distinct operators (8): if, ( ), { }, >, <, =, *, ;
 Distinct operands (4): k, 2, 3, x
 n1 = 8
 n2 = 4
 N1 = 10
 N2 = 7
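Substituting these counts into the equations gives N = 17, N^ = 32, n = 12, V ≈ 60.9, D = 7 and E ≈ 426.6. A small sketch that performs the same arithmetic:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Counts taken from the example fragment above. */
    double n1 = 8, n2 = 4;    /* distinct operators / operands          */
    double N1 = 10, N2 = 7;   /* total operator / operand occurrences   */

    double N    = N1 + N2;                         /* observed length   */
    double Nhat = n1 * log2(n1) + n2 * log2(n2);   /* estimated length  */
    double n    = n1 + n2;                         /* vocabulary        */
    double V    = N * log2(n);                     /* volume (bits)     */
    double D    = (n1 / 2.0) * (N2 / n2);          /* difficulty        */
    double E    = D * V;                           /* effort            */

    printf("N=%.0f  N^=%.1f  n=%.0f  V=%.1f  D=%.1f  E=%.1f\n",
           N, Nhat, n, V, D, E);
    /* Prints roughly: N=17  N^=32.0  n=12  V=60.9  D=7.0  E=426.6 */
    return 0;
}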
Known weaknesses:
◦ Call depth is not taken into account:
  a program with a sequence of 10 successive calls is rated no more complex than one
  with 10 nested calls.
◦ An if-then-else sequence is given the same weight as a loop structure.
◦ The added complexity of nested if-then-else constructs or loops is not taken into
  account, etc.
McCabe’s Complexity Measures
 It measures the amount of decision logic in a program module. Cyclomatic
complexity gives the number of linearly independent paths through the module, and
is often referred to as McCabe's complexity.
 It is important to testers because it provides an indication of the amount of testing
(including reviews) necessary to practically avoid defects.
 McCabe's complexity is used to define the minimum number of test cases required
for a module.
 McCabe’s metrics are based on a control flow
representation of the program.
 A program graph is used to depict control flow.
 Nodes represent processing tasks (one or more code
statements)
 Edges represent control flow between nodes
Flow Graph Notation
(Standard flow-graph fragments, not reproduced here, for the sequence, if-then-else,
while and until constructs.)
Cyclomatic Complexity
 Set of independent paths through the graph
(basis set)

 V(G) = E – N + 2
◦ E is the number of flow graph edges
◦ N is the number of nodes

 V(G) = P + 1
◦ P is the number of predicate nodes
Meaning
 V(G) is the number of (enclosed) regions/areas of the planar graph.
 The V(G) or cyclomatic number is a measure of the complexity of a function that is
correlated with the difficulty of testing it. The recommended value is between 1 and 10.
 A value of 1 means the code has no branching.

 The number of regions increases with the number of decision paths and loops.

 It is a quantitative measure of testing difficulty and an indication of ultimate
reliability.

 Experimental data show that the value of V(G) should be no more than 10;
testing is very difficult above this value.
Example
i = 0;
while (i < n-1) do
    j = i + 1;
    while (j < n) do
        if A[i] < A[j] then
            swap(A[i], A[j]);
    end do;
    i = i + 1;
end do;
Flow Graph
(Flow graph of the example above, not reproduced here: 7 nodes, numbered 1 to 7,
and 9 edges; the node numbers are used in the basis set below.)
Computing V(G)
 V(G) = 9 – 7 + 2 = 4
 V(G) = 3 + 1 = 4
 Basis Set
◦ 1, 7
◦ 1, 2, 6, 1, 7
◦ 1, 2, 3, 4, 5, 2, 6, 1, 7
◦ 1, 2, 3, 5, 2, 6, 1, 7
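A tiny check of the two formulas for this flow graph (9 edges, 7 nodes, 3 predicate nodes):

#include <stdio.h>

int main(void) {
    int E = 9, N = 7, P = 3;   /* edges, nodes, predicate nodes */
    printf("V(G) = E - N + 2 = %d\n", E - N + 2);   /* 4 */
    printf("V(G) = P + 1     = %d\n", P + 1);       /* 4 */
    return 0;
}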
Another Example
(Flow graph figure not reproduced here.)

What is V(G)?
Function Points
 The function point emerged as a measure of the size of a system in terms of its
functionality and usability rather than its code.

 History
 Non-code-oriented size measure
 Developed at IBM by A. Albrecht (1979, refined 1983)
 Now in use by more than 500 organizations world-wide

 What are they?


 5 weighted functionality types
 14 complexity factors
Processing Complexity Adjustment

1) data communications
2) distributed functions
3) performance
4) heavily used configuration
5) transaction rate
6) on-line data entry
7) end-user efficiency
8) on-line update
9) complex processing
10) reusability
11) installation ease
12) operational ease
13) multiple sites
14) facilitates change

Each factor is rated on a scale equivalent to the following:
Not present = 0
Incidental influence = 1
Moderate influence = 2
Average influence = 3
Significant influence = 4
Strong influence = 5
Function Point Calculation

Function Counts:  FC = SUM over i = 1..5, j = 1..3 of (x_ij * w_ij)

Function Points:  FP = FC * (0.65 + 0.01 * SUM over k = 1..14 of c_k)

where
x_ij = count of function type i at complexity level j
w_ij = weight of function type i at complexity level j
c_k  = complexity factor k
Computing Function Points

1) Analyze the information domain of the application and develop counts:
   establish counts for inputs, outputs, inquiries, files and external interfaces.

2) Weight each count by assessing complexity:
   assign a complexity level (simple, average, complex) and the corresponding
   weight to each count.

3) Assess the influence of global factors that affect the application:
   grade the significance of the external factors F_i, such as reuse, concurrency, OS, ...

4) Compute function points:
   FP = SUM(count x weight) x C
   where the complexity multiplier C = 0.65 + 0.01 x N
   and the degree of influence N = SUM(F_i)
Analyzing the Information Domain

                                       weighting factor
measurement parameter        count   simple  avg.  complex
number of user inputs         ___  x    3      4      6    =  ___
number of user outputs        ___  x    4      5      7    =  ___
number of user inquiries      ___  x    3      4      6    =  ___
number of files               ___  x    7     10     15    =  ___
number of ext. interfaces     ___  x    5      7     10    =  ___
count-total                                                    ___
complexity multiplier                                          ___
function points                                                ___
Example: SafeHome Functionality

(Context diagram, not reproduced here: the User interacts with the SafeHome System
through password, zone setting, zone inquiry, sensor inquiry, panic button and
(de)activate functions; the system tests sensors, reads sensor status, produces user
messages, and passes password, sensor and system configuration data to a monitoring
and response subsystem that raises alarm alerts.)
Example: SafeHome FP Calc

                                       weighting factor
measurement parameter        count   simple  avg.  complex
number of user inputs          3   x    3      4      6    =   9
number of user outputs         2   x    4      5      7    =   8
number of user inquiries       2   x    3      4      6    =   6
number of files                1   x    7     10     15    =   7
number of ext. interfaces      2   x    5      7     10    =  10
count-total                                                    40
complexity multiplier                                        1.11
function points                                              44.4
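A sketch of the same calculation, using the simple-complexity weights from the table and the SafeHome counts; the fourteen adjustment-factor ratings are hypothetical values chosen so that the multiplier comes out at the 1.11 shown above.

#include <stdio.h>

int main(void) {
    /* SafeHome counts and the "simple" weights from the table above:
       inputs, outputs, inquiries, files, external interfaces. */
    int count[5]  = { 3, 2, 2, 1, 2 };
    int weight[5] = { 3, 4, 3, 7, 5 };

    int fc = 0;                         /* function count (count-total) */
    for (int i = 0; i < 5; i++)
        fc += count[i] * weight[i];

    /* Hypothetical ratings (0..5) for the 14 adjustment factors, chosen so
       that their sum is 46 and the multiplier matches the 1.11 above. */
    int f[14] = { 4, 3, 4, 3, 3, 4, 3, 3, 4, 3, 3, 3, 3, 3 };
    int nsum = 0;
    for (int k = 0; k < 14; k++)
        nsum += f[k];

    double c  = 0.65 + 0.01 * nsum;     /* complexity multiplier */
    double fp = fc * c;                 /* adjusted function points */

    printf("count-total = %d, C = %.2f, FP = %.1f\n", fc, c, fp);
    /* Prints: count-total = 40, C = 1.11, FP = 44.4 */
    return 0;
}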
Attempt

 Compute the function point value for a project with the following information
domain characteristics:
◦ Number of user inputs: 32
◦ Number of user outputs: 60
◦ Number of user enquiries: 24
◦ Number of files: 8
◦ Number of external interfaces: 2
◦ Assume that the weights are average and that the external complexity
adjustment values are not important.
Algorithmic Models
 Algorithmic models may be further broken down into discrete models, power-function
models, linear/non-linear models, multiplicative models and others.
FORMER ESTIMATION MODELS

1. FARR & ZAGORSKI
   Effort = 2.86*x1 + 2.3*x2 + 33*x3 - 17*x4 + 10*x5 - 188
   x1, x2, ..., x0 are 6 estimating factors out of the 13 factors incorporated by this
   model. This is one of the earliest known models, proposed around 1965.

2. NAVAL AIR DEVELOPMENT CENTER (NADC)
   Effort = 2.86*x1 + 2.3*x2 + 99*x3 - 17*x4 + 10*x5 - x0
   NADC is similar to the second equation of Farr & Zagorski and incorporates the
   same 13 productivity factors; however, the effort for documentation is tripled.

3. ARON
   Effort = deliverable instructions / instructions per man-month (MM)
   The total job is divided into smaller tasks classified as easy, medium and hard. The
   effort for each category is given by the deliverable instructions divided by the
   appropriate productivity; the categories are then summed with the man-months of
   the other tasks to obtain the total effort.

4. WALSTON & FELIX
   Effort = 5.2 * L^0.91
   Estimates the total man-months of effort as a function of the lines of code L to be
   produced. 29 factors associated with productivity are used to calculate a
   productivity index and to obtain an equation for estimating the productivity of new
   projects.
FORMER ESTIMATION MODELS CONT...

5. DOTY
   Effort = 5.288*(KDSI)^1.047 for KDSI > 10
   Effort = 2.060*(KDSI)^1.047 for KDSI < 10
   Doty incorporates 14 estimating factors (optional use) to estimate the effort in
   person-months (man-months).

6. SCHNEIDER
   Effort = 0.3 * I^1.83
   Schneider used Halstead's software-science ideas to estimate effort in man-months,
   where I is the number of instructions, given in thousands.

7. KUSTANOWITZ
   Effort = number of instructions / productivity
   This model suggests developing a project profile based on the major phases of the
   software development process and assigning a percentage of the total effort to each
   phase. Productivity factors are then determined, and the total man-months of effort
   are obtained by dividing the number of instructions by the appropriate productivity.

8. PUTNAM
   K = (S/Ck)^3 * (1/td^4)
   K is the total effort (in PM), S is the product size and td is the time required to
   develop the software. Ck is a constant value that reflects constraints arising from
   the working environment.
FORMER ESTIMATION MODELS CONT...

9. JENSEN
   Effort = 0.4 * (S/Cte)^2 * (1/T^2)
   S is the effective software size, T is the development time in years, and Cte is
   Jensen's technology constant, a slight variation of Putnam's constant.

10. COCOMO
    Effort = a * (KLOC)^b
    COCOMO offers three models, namely Basic, Intermediate and Detailed; this effort
    expression is used for Basic COCOMO, where a and b represent the mode values.
    COCOMO is considered more completely and thoroughly documented than any
    other cost estimation model.

11. BAILEY & BASILI
    Effort = 0.73 * DL^1.16 + 3.5
    Developed Lines (DL) is a measure that combines total lines and new lines of code.
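To make the size-driven expressions above concrete, a sketch that evaluates a few of them for a hypothetical project of 20 KLOC (20 KDSI); the constants are exactly those in the table, so the numbers are only as meaningful as those old calibrations.

#include <stdio.h>
#include <math.h>

int main(void) {
    double kloc = 20.0;   /* hypothetical project size: 20 KLOC / 20 KDSI */

    /* Size-driven expressions copied from the table above. */
    double walston_felix = 5.2  * pow(kloc, 0.91);
    double doty          = (kloc > 10.0)
                         ? 5.288 * pow(kloc, 1.047)
                         : 2.060 * pow(kloc, 1.047);
    double schneider     = 0.3  * pow(kloc, 1.83);   /* I in thousands */
    double bailey_basili = 0.73 * pow(kloc, 1.16) + 3.5;

    printf("Walston & Felix : %6.1f man-months\n", walston_felix);
    printf("Doty            : %6.1f man-months\n", doty);
    printf("Schneider       : %6.1f man-months\n", schneider);
    printf("Bailey & Basili : %6.1f man-months\n", bailey_basili);
    return 0;
}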
COCOMO
 COCOMO was developed by Barry W. Boehm in 1981.
 It is an algorithmic cost model.
 It is based on the size of the project.
 The size of the project may vary depending on the function points.
 Basic COCOMO
 Used for relatively smaller projects.
 Team size is considered to be small.
 Cost drivers depend upon the size of the project.

 Effort E = a * (KDSI)^b * EAF

 where KDSI is the number of thousands of delivered source instructions, and a and b
are constants that vary depending on the size (mode) of the project.

 Schedule S = c * (E)^d, where E is the effort and c, d are constants.

 EAF is the Effort Adjustment Factor; it is 1 for Basic COCOMO. In Intermediate
COCOMO it is the product of the multipliers of the 15 cost-driver attributes.
Each of the 15 attributes receives a rating on a six-point scale that ranges from "very low" to "extra
high" (in importance or value). An effort multiplier corresponds to each rating, and the product of
all effort multipliers results in the effort adjustment factor (EAF).
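A minimal sketch of how the EAF is formed: each rated cost driver contributes one multiplier and the product of the multipliers scales the nominal effort. The multiplier values below are made-up placeholders, not Boehm's published table; the constants a = 3.2, b = 1.05 are the commonly quoted Intermediate COCOMO organic-mode values, used here only for illustration.

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Hypothetical ratings for a few of the 15 cost drivers, already
       converted to effort multipliers (placeholder values). */
    double multipliers[] = { 1.15,   /* e.g. required reliability rated high  */
                             0.94,   /* e.g. programmer capability rated high */
                             1.08,   /* e.g. product complexity rated high    */
                             1.00 }; /* remaining drivers left nominal        */
    int n = sizeof multipliers / sizeof multipliers[0];

    double eaf = 1.0;
    for (int i = 0; i < n; i++)
        eaf *= multipliers[i];

    /* E = a * (KDSI)^b * EAF; a = 3.2, b = 1.05 taken as illustrative
       Intermediate COCOMO organic-mode constants. */
    double kdsi   = 32.0;
    double effort = 3.2 * pow(kdsi, 1.05) * eaf;

    printf("EAF = %.3f, effort = %.1f person-months\n", eaf, effort);
    return 0;
}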
 Intermediate COCOMO
 It is used for medium sized projects.
 Cost drivers depend upon product reliability, database
size, execution and storage.
 Team size is medium.

 Advanced (Detailed) COCOMO
 It is used for large sized projects.
 The cost drivers depend upon requirements, analysis,
design, testing and maintenance.
 Team size is large.
 Organic mode: projects developed by relatively small teams in a familiar environment.
E = 2.4*(KDSI)^1.05 (E in person-months) and S = 2.5*(E)^0.38

 Semidetached mode: projects lying between the organic and embedded modes in terms
of team size, with a mix of experienced and inexperienced staff; team members are
unfamiliar with the system under development.
E = 3.0*(KDSI)^1.12 (E in person-months) and S = 2.5*(E)^0.35

 Embedded mode: projects whose environment is complex; team members are highly
skilled and familiar with the system under development.
E = 3.6*(KDSI)^1.20 (E in person-months) and S = 2.5*(E)^0.32
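A small sketch that evaluates the three Basic COCOMO mode equations above for a hypothetical project of 50 KDSI:

#include <stdio.h>
#include <math.h>

/* One Basic COCOMO mode: E = a*(KDSI)^b, schedule S = c*(E)^d. */
typedef struct { const char *name; double a, b, c, d; } Mode;

int main(void) {
    Mode modes[] = {
        { "organic",      2.4, 1.05, 2.5, 0.38 },
        { "semidetached", 3.0, 1.12, 2.5, 0.35 },
        { "embedded",     3.6, 1.20, 2.5, 0.32 },
    };
    double kdsi = 50.0;   /* hypothetical size in thousands of DSI */

    for (int i = 0; i < 3; i++) {
        double e = modes[i].a * pow(kdsi, modes[i].b);   /* person-months */
        double s = modes[i].c * pow(e,    modes[i].d);   /* months        */
        printf("%-12s  E = %6.1f PM   S = %5.1f months\n",
               modes[i].name, e, s);
    }
    return 0;
}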



COCOMO Models

(Chart, not reproduced here: estimated effort in person-months, up to about 7000,
plotted against size in thousands of lines of code, 0 to 600, for the organic,
semidetached and embedded modes; effort grows fastest for the embedded mode.)
Solve
 Assume that the size of an organic-type software product has been estimated to be
32,000 lines of source code, and that the average salary of a software engineer is
Rs. 15,000 per month. Determine the effort required to develop the software product
and the nominal development time.
Issues in Algorithmic Models
 Specific Input, Specific Output
 Parameter dependent
 Regression based
Sizing source code volumes
 Studies show that conversion between LOC and function points is possible, so a size
expressed in function points can be translated into an approximate code size and
vice versa.
Soft computing Amalgamation
Research Direction
 Divergent
 Single machine learning technique on a single estimation method
 Convergent
 Compound
Convergent
 Single machine learning methods applied to the same estimation method.
 For example:
 Roheet Bhatnagar et al. compared the predicted Development Time (DT') obtained by
applying the Feed Forward Back Propagation, Cascaded Feed Forward Back
Propagation, Elman Back Propagation, Layer Recurrent and Generalised Regression
neural network models to the Lopez et al. dataset.
Paper discussion cont…
 They used MMRE, BRE, MdMRE and pred(z) as evaluation criteria.
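These criteria are straightforward to compute: MRE = |actual - estimated| / actual, MMRE is the mean MRE over all projects, and Pred(z) is the fraction of projects whose MRE does not exceed z. A small sketch with hypothetical actual and estimated effort values:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Hypothetical actual vs. estimated effort (person-months). */
    double actual[]    = { 30.0, 120.0, 55.0, 10.0 };
    double estimated[] = { 36.0, 100.0, 50.0, 14.0 };
    int n = 4;
    double z = 0.25;                  /* threshold for Pred(z) */

    double sum_mre = 0.0;
    int within = 0;
    for (int i = 0; i < n; i++) {
        double mre = fabs(actual[i] - estimated[i]) / actual[i];
        sum_mre += mre;
        if (mre <= z) within++;
    }
    printf("MMRE       = %.3f\n", sum_mre / n);
    printf("Pred(%.2f) = %.2f\n", z, (double)within / n);
    return 0;
}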
Results of Bhatnagar et al.
(Result tables not reproduced here.)
What can be added?
 Different dataset(s)
 More assessment techniques, for example MRSE, Bayesian analysis, etc.
 Use the models' results and compare them with those of the ML techniques
Single Technique, Single Method
 When one machine learning technique is applied to calibrate an existing model.
 Chohan et al. implemented COCOMO with the help of an artificial neural network
trained using the perceptron learning algorithm. The COCOMO dataset is used to
train and to test the network, and the test results from the trained neural network are
compared with those of the COCOMO model.
Chohan et al. cont…
 They used 17 inputs to the network: the size of the project in KLOC, the 15 effort
multipliers, the actual effort of the project and one bias value. These inputs enter the
network as weighted inputs, and the effort is then calculated from them. The weights
are initialized as Wi = 1 for i = 1 to 17, the learning rate α = 0.001 and the bias b = 1.
The inputs, as received, are multiplied by the weights and provided to the network.
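A minimal sketch of the kind of network described, not Chohan et al.'s actual implementation: 17 weighted inputs (project size, 15 effort multipliers set to nominal values, and the bias), a weighted-sum output taken as the effort estimate, and a delta-rule (perceptron-style) weight update driven by the estimation error, using the stated initial values Wi = 1, α = 0.001 and b = 1. The training project and its actual effort are hypothetical.

#include <stdio.h>

#define N_IN 17   /* KLOC, 15 effort multipliers, bias */

int main(void) {
    /* One hypothetical training project: size, 15 nominal multipliers, bias. */
    double x[N_IN] = { 25.0,
                       1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
                       1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
                       1.0 };                        /* bias input b = 1     */
    double target = 95.0;                            /* actual effort (PM)   */

    double w[N_IN];
    for (int i = 0; i < N_IN; i++) w[i] = 1.0;       /* Wi = 1               */
    double alpha = 0.001;                            /* learning rate        */

    for (int epoch = 0; epoch < 1000; epoch++) {
        double estimate = 0.0;                       /* weighted-sum output  */
        for (int i = 0; i < N_IN; i++) estimate += w[i] * x[i];

        double error = target - estimate;            /* estimation error     */
        for (int i = 0; i < N_IN; i++)               /* delta-rule update    */
            w[i] += alpha * error * x[i];
    }

    double estimate = 0.0;
    for (int i = 0; i < N_IN; i++) estimate += w[i] * x[i];
    printf("Trained estimate: %.1f person-months (target %.1f)\n",
           estimate, target);
    return 0;
}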
Study Results with MRE as the evaluation criterion
(Result table not reproduced here.)
What can be added
 Calibrate the factors involved in the regression-based equation.
 Involve more than one dataset to validate the results.
Divergent
 A single machine learning method applied
on various Estimation Techniques.
Divergent Example
 Ali Abbas et al. compared the effort estimates of 11 former estimation methods and
3 function-point based methods computed from 4 commercially available datasets. A
back-propagation neural network with varying numbers of hidden layers is also used
to compare the effort results and to evaluate whether the former methods, function
points or the computational intelligence technique has the potential to yield the most
realistic estimates. The study also examines the behaviour of the estimation methods
by identifying irregularities among the datasets. Magnitude of Relative Error (MRE),
Mean Magnitude of Relative Error (MMRE), Pred(0.25) and Pred(0.50) are used for
the computational work.
What can be added
 Refine the datasets, cluster them according to similarity, and then apply the machine
learning technique.
 Apply a technique other than the back-propagation neural network.
 Apply the same or a different machine learning technique to linear and non-linear
estimation methods.
Ratio of other techniques for
prediction of different aspects
Other techniques and results
 Ziauddin et al. fuzzify the input parameters of the COCOMO II model and defuzzify
the result to obtain the resultant effort. Triangular fuzzy numbers are used to
represent the linguistic terms in the COCOMO II model. The results of this model are
compared with COCOMO II and the Alaa Sheta model; the proposed model yields
better results in terms of MMRE, PRED(n) and VAF.
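A minimal illustration of the fuzzify/defuzzify idea, not Ziauddin et al.'s actual model: a value is represented as a triangular fuzzy number and a crisp value is recovered by centroid defuzzification, which for a triangular membership function is the mean of its three corner points. The numbers are hypothetical.

#include <stdio.h>

/* Triangular fuzzy number: lowest, most likely and highest value. */
typedef struct { double low, mid, high; } Tri;

/* Centroid of a triangular membership function. */
static double defuzzify(Tri t) {
    return (t.low + t.mid + t.high) / 3.0;
}

int main(void) {
    /* Hypothetical fuzzified effort multiplier for one cost driver. */
    Tri multiplier = { 0.95, 1.10, 1.30 };

    /* Hypothetical fuzzified size (KLOC) expressed the same way. */
    Tri size = { 28.0, 32.0, 40.0 };

    printf("Crisp multiplier = %.3f, crisp size = %.1f KLOC\n",
           defuzzify(multiplier), defuzzify(size));
    return 0;
}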
Other techniques cont…
 Pervinder et al., in the paper Software Effort Estimation Using Soft Computing
Techniques, used a Neuro-Fuzzy system as a soft-computing approach to generate a
model by formulating the relationship through training. In that paper the Neuro-Fuzzy
technique is used for software estimation modelling on NASA software project data,
and the performance of the developed models is compared with the Halstead,
Walston-Felix, Bailey-Basili and Doty models.
Other techniques cont…
 B. Kumar, in Software Effort Estimation by Genetic Algorithm Tuned Parameters of
Modified Constructive Cost Model for NASA Software Projects, used COCOMO as the
algorithmic model and attempted to validate the soundness of the genetic algorithm
technique using NASA project data. The main objective of the research is to
investigate the effect of crisp inputs and the genetic algorithm technique on the
accuracy of the system's output when a modified version of the famous COCOMO
model is applied to the NASA dataset. The proposed model is validated using 5 out of
the 18 NASA project records. Empirical results show that the modified COCOMO
gives slightly better effort estimates than the previous literature, and the proposed
model improves the performance of the estimated effort with respect to the Variance
Accounted For (VAF) criterion, MMRE and Pred.
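A toy sketch of the idea, not B. Kumar's actual setup: a small population of candidate (a, b) constants for an effort equation E = a * size^b is evolved with tournament selection, arithmetic crossover and mutation to minimise MMRE on a handful of made-up (size, actual effort) records.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define POP   20
#define GENS  200
#define NPROJ 4

typedef struct { double a, b; } Genome;   /* candidate constants for E = a * size^b */

/* Made-up project records: size in KDSI and actual effort in person-months. */
static const double size_k[NPROJ] = { 10.0, 25.0, 46.0, 90.0 };
static const double actual[NPROJ] = { 26.0, 70.0, 140.0, 300.0 };

/* Fitness: MMRE of the candidate constants over the records (lower is better). */
static double mmre(Genome g) {
    double sum = 0.0;
    for (int i = 0; i < NPROJ; i++) {
        double est = g.a * pow(size_k[i], g.b);
        sum += fabs(actual[i] - est) / actual[i];
    }
    return sum / NPROJ;
}

/* Uniform random double in [lo, hi]. */
static double frand(double lo, double hi) {
    return lo + (hi - lo) * ((double)rand() / RAND_MAX);
}

int main(void) {
    srand(42);

    Genome pop[POP];
    for (int i = 0; i < POP; i++) {            /* random initial population */
        pop[i].a = frand(1.0, 5.0);
        pop[i].b = frand(0.9, 1.3);
    }

    for (int gen = 0; gen < GENS; gen++) {
        Genome next[POP];
        for (int i = 0; i < POP; i++) {
            /* Tournament selection of two parents. */
            Genome p1 = pop[rand() % POP], c1 = pop[rand() % POP];
            if (mmre(c1) < mmre(p1)) p1 = c1;
            Genome p2 = pop[rand() % POP], c2 = pop[rand() % POP];
            if (mmre(c2) < mmre(p2)) p2 = c2;

            /* Arithmetic crossover plus a small mutation. */
            next[i].a = 0.5 * (p1.a + p2.a) + frand(-0.05, 0.05);
            next[i].b = 0.5 * (p1.b + p2.b) + frand(-0.01, 0.01);
        }
        for (int i = 0; i < POP; i++) pop[i] = next[i];
    }

    Genome best = pop[0];
    for (int i = 1; i < POP; i++)
        if (mmre(pop[i]) < mmre(best)) best = pop[i];

    printf("Tuned constants: a = %.2f, b = %.3f, MMRE = %.3f\n",
           best.a, best.b, mmre(best));
    return 0;
}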
So far:
 Mostly, the COCOMO model is calibrated or validated by applying neural networks,
fuzzy logic and genetic algorithms. The evaluation criteria seen so far are based on
MRE, MdMRE, MMRE and Pred(z).
What can be added:
 Apply the same techniques to algorithmic methods other than COCOMO.
 Apply HRV prediction and clustering techniques to the available software estimation
datasets and the available former estimation methods (next page).
 Qualitatively, the management aspect of software estimation has not been
amalgamated with machine learning techniques. For example, a genetic algorithm may
be applied to identify the best working team for any particular software development
aspect.
 Or devise a hierarchy of activities to be estimated in order, avoiding unnecessary
estimation activities. One such piece of research appears in
 APPLICATION OF 80/20 RULE IN SOFTWARE ENGINEERING WATERFALL MODEL
by Muzaffar Iqbal.
For the application of HRV techniques
 One HRV technique on all datasets (reasons for variation in the results may be
identified and the best technique inferred)
 One HRV technique on a single dataset (results may be compared with other
algorithmic models)
 Multiple HRV techniques on all datasets (results may be discussed on the basis of the
HRV techniques applied)
 Multiple HRV techniques on a single dataset (identify the best technique)
 Categorize the datasets according to closely related observations and apply the
techniques (an extension of case 1, where loosely connected datasets yield poor results)
 Cluster the same datasets and apply the HRV techniques (an extension of case 1,
where loosely connected datasets yield poor results)
 Apply neural networks and genetic algorithms along with HRV techniques on a single
dataset or on all datasets (allows the literature to be compared with the current results)
Commercially available datasets
Albrecht Data set
Dataset for Projects in 4GL
Kemerer Dataset
Bailey and Basili/ NASA Dataset
Hallmark Dataset
