
Rafael Herrerias Pleguezuelo
Jose Callejon Cespedes
Jose Manuel Herrerias Velasco
editors

DISTRIBUTION MODELS THEORY
Rafael Herrerias Pleguezuelo


Jose Callejon Cespedes
Jose Manuel Herrerias Velasco
University of Granada, Spain

World Scientific


NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Cataloging-in-Publication Data


Distribution models theory / edited by Rafael Herrerias-Pleguezuelo,
Jose Callejon-Cespedes, and Jose Manuel Herrerias-Velasco.
p. cm.
Includes bibliographical references and index.
ISBN 981-256-900-6 (alk. paper)
1. Model theory. 2. Distribution (Probability theory). I. Herrerias-Pleguezuelo, Rafael.
II. Callejon-Cespedes, Jose. III. Herrerias-Velasco, Jose Manuel.

QA9.7.D58 2006
511.3'4--dc22 2006048221

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library.

Copyright © 2006 by World Scientific Publishing Co. Pte. Ltd.


All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to
photocopy is not required from the publisher.

Printed in Singapore by World Scientific Printers (S) Pte Ltd


Preface

This monograph contains a compilation of papers chosen by the
Scientific Committee of the Fifth Workshop of the Spanish Scientific
Association of Applied Economy on Distribution Models Theory, held in
Granada (Spain) in September 2005.
As editors, we have endeavoured to ensure a high scientific level in
this volume. All the papers have been carefully selected, revised and
presented at a high standard. This volume therefore offers an essential
point of reference on models theory for statisticians, economists,
mathematicians and, in general, for all researchers who work on models
theory and wish to know the most recent advances from methodological
and practical points of view.
Among the authors, we particularly appreciate the efforts of
Prof. van Dorp, who made it possible for us to include his paper
coauthored with Prof. Samuel Kotz, Editor-in-Chief of the
Encyclopedia of Statistical Sciences. We would also like to warmly
acknowledge all those who, through their papers, have contributed to
making this high-quality volume possible. To all of them, thank you
very much.

Rafael Herrerias Pleguezuelo


Jose Callejon Cespedes
Jose Manuel Herrerias Velasco
University of Granada, Spain
April 2006
Contents

Preface v

Chapter 1
Modeling Income Distributions Using Elevated Distributions
on a Bounded Domain 1
J.R. van Dorp and S. Kotz

Chapter 2
Making Copulas Under Uncertainty 27
C. Garcia Garcia, J.M. Herrerias Velasco and
J.E. Trinidad Segovia

Chapter 3
Valuation Method of the Two Survival Functions 55
M. Franco Nicolas, R. Herrerias Pleguezuelo, J. Callejon Cespedes
and J.M. Vivo Molina

Chapter 4
Weighting Tools and Alternative Techniques to Generate
Weighted Probability Models in Valuation Theory 67
M. Franco Nicolas and J.M. Vivo Molina

Chapter 5
On Generating and Characterizing Some Discrete and
Continuous Distributions 85
M.A. Fajardo Caldera and J. Perez Mayo

Chapter 6
Some Stochastic Properties in Sampling from the Normal
Distribution 101
J.M. Fernandez Ponce, T. Gomez Gomez, J.L. Pino Mejias
and R. Rodriguez Grinolo

Chapter 7
Generating Function and Polarization 111
R.M. Garcia Fernandez

Chapter 8
A New Measure of Dissimilarity Between Distributions:
Application to the Analysis of Income Distributions
Convergence in the European Union 125
F.J. Callealta Barroso

Chapter 9
Using the Gamma Distribution to Fit Fecundity Curves for
Application in Andalusia (Spain) 161
F. Abad Montes, M.D. Huete Morales and M. Vargas Jimenez

Chapter 10
Classes of Bivariate Distributions with Normal and Lognormal
Conditionals: A Brief Revision 173
J.M. Sarabia, E. Castillo, M. Pascual and M. Sarabia

Chapter 11
Inequality Measures, Lorenz Curves and Generating Functions 189
J.J. Nunez Velazquez

Chapter 12
Extended Waring Bivariate Distribution 221
J. Rodriguez Avi, A. Conde Sanchez, A.J. Saez Castillo
and M.J. Olmo Jimenez

Chapter 13
Applying a Bayesian Hierarchical Model in Actuarial Science:
Inference and Ratemaking 233
J.M. Perez Sanchez, J. M. Sarabia Alegria, E. Gomez Deniz
and F.J. Vazquez Polo

Chapter 14
Analysis of the Empirical Distribution of the Residuals Derived
from Fitting the Heligman and Pollard Curve to Mortality Data 243
F. Abad Montes, M.D. Huete Morales and M. Vargas Jimenez

Chapter 15
Measuring the Efficiency of the Spanish Banking Sector:
Super-Efficiency and Profitability 285
J. Gomez Garcia, J. Solana Ibanez and J.C. Gomez Gallego
Chapter 1

MODELING INCOME DISTRIBUTIONS USING ELEVATED


DISTRIBUTIONS ON A BOUNDED DOMAIN

J. RENE VAN DORP


Engineering Management and Systems Engineering Department
The George Washington University
1776 G street, Suite 110, NW, Washington DC, 20052

SAMUEL KOTZ
Engineering Management and Systems Engineering Department
The George Washington University
1776 G street, Suite 110, NW, Washington DC, 20052

This paper presents a new two-parameter family of continuous distributions on a bounded
domain which has an elevated but finite density value at its lower bound. Such a
characteristic appears to be useful, for example, when representing income distributions
at lower income ranges. The family generalizes the one-parameter Topp and Leone
distribution, which originated in the 1950's and was recently rediscovered. The family of
beta distributions has been used for modeling bounded income distribution phenomena,
but it only allows for infinite or zero density values at its lower bound, apart from the
constant density of 1 of its uniform member. The proposed family alleviates this apparent
jump discontinuity at the lower bound. The U.S. income distribution data for the year
2001 is used to fit distributions for Caucasian (Non-Hispanic), Hispanic and
African-American populations via a maximum likelihood procedure.
ordering when comparing the Caucasian (Non-Hispanic) income distribution to that of
the Hispanic or African-American population. The latter indicates that although
substantial advances have reportedly been made in reducing the income distribution gap
amongst different ethnic groups in the U.S. during the last 20 years or so, these
differences still exist.

1. Introduction
In a 1955 issue of the Journal of the American Statistical Association an
isolated paper on a bounded continuous distribution by Topp and Leone [1]
appeared which received little attention. The paper was re-discovered by
Nadarajah and Kotz [2], motivated by investigations of van Dorp and Kotz
[3,4] on the Two-Sided Power (TSP) distribution and other alternatives to the


popular and versatile beta distribution which has been used in various
applications for over a century. Even in the late nineties of the 20th century the
arsenal of bounded univariate distributions contained very few members.
Amongst them, the triangular and uniform distribution are the most widely used
together with some "curious" distributions appearing as problems or exercises in
various Mathematical and Statistical journals. Other, somewhat artificial
empirical bounded continuous distributions are based on mathematical
transformations of the normal distribution (of an unbounded domain) - the most
widespread amongst them is perhaps the Johnson [5] family of
transformations. On the other hand, the existence of multitudes of unbounded
continuous distributions developed in the 20th century is well known and amply
documented.
The construction of the Topp and Leone distribution is quite straightforward
and is based on the principle that by raising an arbitrary cdf F(x) ∈ [0,1] to an
arbitrary power β > 0, a new cdf G(x) = F^β(x) emerges with one additional
parameter. This device was used in 1939 by W. Weibull [6] when proposing his
Weibull distribution, which achieved substantial popularity in the second half
of the 20th century, especially in reliability and biometrical applications. The
cdf F(x) in the above construction method may be referred to as the generating
cdf. Figure 1 demonstrates the construction of the Topp and Leone distribution.
The generating density of the Topp and Leone family is the right triangular
density (2 - 2x), x ∈ [0,1]. It is displayed in Figure 1A. Figure 1B depicts its
cdf (2x - x^2), and Figures 1C and 1D plot the pdf and cdf of a one-parameter
Topp and Leone distribution for β = 3.

Figure 1. Construction of Topp and Leone distribution from a right triangular distribution

Note the appearance of a mode in the pdf presented in Figure 1C, due to the
S-shapedness of the corresponding cdf in Figure 1D obtained by using a cdf
transformation with β > 1. Topp and Leone's [1] original interest focused on the
construction of J-shaped distributions utilizing similar cdf transformations with
0 < β < 1; they fitted their distribution to transmitter tube failure data.
Nadarajah and Kotz [2] showed that the J-shaped Topp and Leone distributions
exhibit bathtub failure rate functions with natural applications in reliability.
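The cdf-power construction above is easy to sketch numerically. The following is our illustration (function names are ours, not the paper's), using the right-triangular generating cdf:

```python
# Construction G(x) = F(x)**beta with the right-triangular generating
# cdf F(x) = 2x - x^2 of Topp and Leone; a sketch, not the authors' code.
def topp_leone_cdf(x, beta):
    """Topp-Leone cdf on [0, 1]: G(x) = (2x - x^2)**beta."""
    return (2 * x - x * x) ** beta

def topp_leone_pdf(x, beta):
    """Derivative of G: beta * (2 - 2x) * (2x - x^2)**(beta - 1)."""
    return beta * (2 - 2 * x) * (2 * x - x * x) ** (beta - 1)

# beta = 3 gives the S-shaped cdf and the unimodal pdf of Figures 1C-1D.
print(topp_leone_cdf(0.5, 3))  # 0.421875, i.e. 0.75**3
```

Raising any generating cdf to a power β > 1 produces the S-shaped cdf (hence an interior mode); 0 < β < 1 produces the J-shaped forms mentioned above.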
Our generalization of the Topp and Leone distribution (GTL) utilizes a
slightly more general slope distribution with pdf α - 2(α-1)x, 0 ≤ α ≤ 2, as
the generating density (see Figure 2A with α = 1.5), where x ∈ (0,1). Slope
distributions possess linear pdf's and play a central role in deriving a
generalization of the trapezoidal distribution (see, e.g., Van Dorp and Kotz [7]).
From the restriction that α - 2(α-1)x ≥ 0 for all x ∈ (0,1), it follows that
0 ≤ α ≤ 2. For α ∈ [0,1) (α ∈ (1,2]), the slope of the pdf is increasing
(decreasing). For α = 1, the slope distribution simplifies to a uniform
distribution on (0,1). Figure 2B plots the cdf of the linear pdf in Figure 2A.

Figure 2. Construction of generalized Topp and Leone distribution from a slope distribution

Now the Generalized Topp and Leone (GTL) distribution that follows from
Figure 2B (utilizing the above construction method with β = 3) is depicted in
Figure 2D. The density associated with this cdf is displayed in Figure 2C. Note
that, while a mode in (0,1) is present in Figure 2C, it has been shifted to the
right when compared to the situation in Figure 1C. More importantly, the
density at the upper bound is strictly positive in Figure 2C while being zero in
Figure 1C (representing the original Topp and Leone density).
Our main interest in this paper is to represent income distributions. We shall
therefore consider the reflected version of the Generalized Topp and Leone
(GTL) distribution utilizing the cdf transformation H(x) = 1 - G(1 - x), where G
is a GTL cdf on [0,1]. The latter transformation typically assigns the mode
towards the left hand side of its support and allows for strictly positive density
values at the lower bound. This form seems to be appropriate when representing
income distributions at lower income ranges. (Compare, e.g., with Figure 2 of
Barsky et al. [8], p. 668). The U.S. Income distribution data for the year 2001 is
used to fit Reflected GTL (RGTL) distributions for Caucasian (Non-Hispanic),
Hispanic and African-American populations via a maximum likelihood
procedure. The results reveal stochastic ordering when comparing the Caucasian
(Non-Hispanic) income distribution to that of the Hispanic or African-American
populations. In particular, compared with Americans of Caucasian origin,
African-Americans appear to be approximately 1.9 times as likely, and
Hispanics 1.5 times as likely, to have inadequate or no income at all. The latter
indicates that although substantial advances have indeed occurred in reducing
the income distribution gap amongst different ethnic groups in the U.S. during
the last 20 years or so (see, e.g., Couch and Daly [9]), these differences still
exist.
Another reason to consider reflected GTL distributions rather than GTL
distributions is that a drift of the mode towards the left hand side mimics the
behavior of the classical unbounded continuous distributions such as the
Gamma, Weibull and Lognormal. (We note, in passing, that these three
distributions are in a strong competition amongst themselves as to which is the
best one for fitting numerous phenomena in economics, engineering and medical
applications). One can therefore conjecture that application of Reflected GTL
(RGTL) distributions may not be limited to the area of income distributions.
In Section 2, we shall present the cdf and pdf of a four parameter RGTL
distribution and investigate its various forms. In Section 3, we will elaborate on
some properties of RGTL distributions. Moment expressions for RGTL
distributions, to the best of our knowledge, cannot be derived in closed form
(except for certain special cases). The cdf of the beta distribution, while not
available in closed form (whereas that of an RGTL distribution is), is nevertheless
useful for calculating moments of RGTL distributions for 1 < α ≤ 2. In Section 4,
we shall discuss a Maximum Likelihood Estimation (MLE) procedure utilizing

standard root finding algorithms that are readily available in various software
packages such as e.g. Microsoft Excel. In Section 5, we shall fit RGTL
distributions to the U.S. 2001 income distribution data with seemingly
satisfactory results. Some brief concluding remarks are presented in Section 6.

2. Cumulative distribution function and density function


The four parameter RGTL distribution with support [a, b] has the cdf

F(x | a, b, α, β) = 1 - ((b-x)/(b-a))^β {α - (α-1)(b-x)/(b-a)}^β    (1)

where a ≤ x ≤ b, 0 ≤ α ≤ 2 and β > 0. Evidently, F(a) = 0 and F(b) = 1. The
probability density function (pdf) follows from (1) to be

f(x | a, b, α, β) = (β/(b-a)) ((b-x)/(b-a))^(β-1) {α - (α-1)(b-x)/(b-a)}^(β-1) {α - 2(α-1)(b-x)/(b-a)}    (2)

with the same constraints on x, α and β as in (1). From (2) it follows in particular that

f(a | a, b, α, β) = β(2-α)/(b-a)    (3)

and

f(b | a, b, α, β) = 0 for β > 1;  βα/(b-a) for β = 1;  f(x | a, b, α, β) → ∞ as x ↑ b for β < 1.    (4)

Relation (3) shows that the RGTL family allows for arbitrary density values
at its lower bound a. Expressions (1) and (2) reduce to the Topp and Leone
distribution (see Topp and Leone [1])

F(x | b, β) = (x/b)^β (2 - x/b)^β,  0 ≤ x ≤ b,    (5)

by setting a = 0 and α = 2 and utilizing the reflection transformation
Y = b + a - X. Figure 1C depicts a graph of the Topp and Leone distribution
with parameters b = 1 and β = 3 in (5). Figure 3A displays its reflected
version. Note the transition in the form of the graphical representations of the
pdf's from Figure 3B to Figure 3D, which all have the same value of α with
decreasing values of β.
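Expressions (1)-(3) transcribe directly into code; the following is our hedged sketch (helper names are ours):

```python
# Sketch of the four-parameter RGTL cdf (1) and pdf (2); names are ours.
def rgtl_cdf(x, a, b, alpha, beta):
    y = (b - x) / (b - a)  # reflected, rescaled argument
    return 1 - (y * (alpha - (alpha - 1) * y)) ** beta

def rgtl_pdf(x, a, b, alpha, beta):
    y = (b - x) / (b - a)
    return (beta / (b - a)) * (y * (alpha - (alpha - 1) * y)) ** (beta - 1) \
        * (alpha - 2 * (alpha - 1) * y)

# Endpoint check against (3): f(a) = beta * (2 - alpha) / (b - a)
print(rgtl_pdf(0, 0, 1, 1.5, 2))  # 1.0 = 2 * (2 - 1.5)
```

The elevated-but-finite density value β(2 - α)/(b - a) at the lower bound, which motivates the family, is visible directly in the endpoint check.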

Figure 3. Examples of Standard RGTL distributions (a = 0, b = 1):
A: α = 2, β = 3; B: α = 1.5, β = 6; C: α = 1.5, β = 2; D: α = 1.5, β = 1;
E: α = 0.5, β = 2; F: α = 0.5, β = 1; G: α = 0.5, β = 0.75; H: α = 0.5, β = 0.25

Note that in the case of Figure 3B the pdf assumes a form similar to that of a
reliability function, whereas Figure 3C displays a mode at a value greater than 0.
Similarly, in Figures 3E to 3H the pdf's with the same value of α (= 0.5) and
progressively decreasing β from 2 to 0.25 illustrate the change in form of the pdf
from a monotonically decreasing concave form, through a linear function with
decreasing slope and a mild U-shaped function, up to a monotonically increasing
convex curve.
The J-shaped form of the pdf in Figure 3E (a = 0, b = 1, α = 0.5, β = 2)
resembles that of a Weibull distribution with shape parameter less than one
(but on a bounded domain). Note that the structure of (1) is reminiscent of that
of the Weibull cdf. Figures 3G and 3H depict a U-shaped pdf form (a = 0, b = 1,
α = 0.5, β = 0.75) and a J-shaped pdf form (a = 0, b = 1, α = 0.5, β = 0.25),
respectively; these are similar to forms appearing in the beta family, but with a
bounded density value at the lower bound (cf. (3)). Setting α = 1, β = 1 in (2)
yields a uniform distribution on [a, b]. Hence, analogously to the four
parameter beta distribution with the pdf

f(x | a, b, α, β) = Γ(α+β) / {Γ(α)Γ(β)(b-a)} ((x-a)/(b-a))^(α-1) ((b-x)/(b-a))^(β-1)    (6)

where a ≤ x ≤ b, α > 0 and β > 0, and the Two-Sided Power family (see van
Dorp and Kotz [3,4]) with the pdf

f(x | a, m, b, n) = (n/(b-a)) ((x-a)/(m-a))^(n-1) for a ≤ x < m;  (n/(b-a)) ((b-x)/(b-m))^(n-1) for m ≤ x ≤ b,    (7)

where n > 0, the RGTL family has the uniform distribution on [a, b] as one of
its members. Another common member amongst these 3 families (Beta, TSP
and RGTL) is the reflected power (RP) distribution on [a, b] with the pdf

f(x | a, b, β) = (β/(b-a)) ((b-x)/(b-a))^(β-1)    (8)

obtained by substituting α = 1 in (2). Substituting α = 0 in (2) also yields the
reflected power distribution but with parameter 2β. The reader is encouraged to
construct diagrams connecting the above cited distributions.
A distinguishing feature of RGTL distributions, compared with
distributions (6) and (7), is the existence of additional pdf forms with a positive
density value at the lower bound (see Figures 3B-3H), allowing representation of
uncertain phenomena with such a property. Another feature of the RGTL
distribution (indicating a lesser flexibility within the same family) is that the
pdf's of a GTL distribution and its reflection possess different functional
forms, whereas the reflection of a TSP pdf as well as of a beta pdf belongs to the
same functional family.

3. Properties of Standard RGTL distributions


We shall provide some properties of the Standard RGTL (SRGTL)
distributions, setting a = 0 and b = 1 in (1) and (2), with the cdf

F(x | α, β) = 1 - (1-x)^β {α - (α-1)(1-x)}^β    (9)

and the pdf

f(x | α, β) = β(1-x)^(β-1) {α - (α-1)(1-x)}^(β-1) {α - 2(α-1)(1-x)}    (10)

where 0 ≤ α ≤ 2 and β > 0. Results may be extended to the general forms of (1)
and (2) by means of a simple linear transformation.
Limiting Distributions
It immediately follows from (9) that the pdf (10) converges to a degenerate
distribution with a probability mass of 1 at 0 (at 1) when β → ∞ (β ↓ 0),
regardless of the value of α.
Stochastic Dominance Properties
Note that for β = 1, (9) simplifies to a slope distribution with the cdf

F(x | α, β = 1) = 1 - {α(1-x) - (α-1)(1-x)^2}    (11)

which is stochastically decreasing in α, i.e.,

α1 < α2, x ∈ (0,1)  ⇒  F(x | α1, β = 1) > F(x | α2, β = 1)    (12)

Let now β1 > β2 > 0. From (12) it follows that for all x ∈ (0,1) and for any β1

1 - {1 - F(x | α1, β = 1)}^β1 > 1 - {1 - F(x | α2, β = 1)}^β1    (13)

From the fact that the function z^β is a decreasing function in β for
z ∈ (0,1), it follows from β1 > β2 > 0 that

1 - {1 - F(x | α2, β = 1)}^β1 > 1 - {1 - F(x | α2, β = 1)}^β2    (14)

However, simple algebra shows that

F(x | α, β) = 1 - {1 - F(x | α, β = 1)}^β    (15)

where F(x | α, β) and F(x | α, β = 1) are given by (9) and (11), respectively,
which together with (13) and (14) implies

α1 < α2, β1 > β2, x ∈ (0,1)  ⇒  F(x | α1, β1) > F(x | α2, β2)    (16)

Hence, RGTL distributions are stochastically increasing in α and
stochastically decreasing in β. This seems to be an interesting property,
shedding additional light on the meaning of the parameters α and β in (9)
and (10), especially in applications. Note that relation (16) could be verbally
expressed as connecting the generating cdf F(x | α, β = 1) with the generated
one, i.e. F(x | α, β).
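Identity (15) and the ordering (16) are easy to confirm numerically; the following check is our sketch under the standard support a = 0, b = 1:

```python
# Numerical check of (15) and (16): F(x|alpha,beta) = 1 - (1 - F(x|alpha,1))**beta,
# and F is pointwise decreasing in alpha and increasing in beta.
def srgtl_cdf(x, alpha, beta):
    y = 1 - x
    return 1 - (y * (alpha - (alpha - 1) * y)) ** beta

for i in range(1, 10):
    x = i / 10
    slope = srgtl_cdf(x, 1.5, 1)  # generating slope cdf, beta = 1
    # identity (15)
    assert abs(srgtl_cdf(x, 1.5, 3) - (1 - (1 - slope) ** 3)) < 1e-12
    # ordering (16): alpha1 < alpha2 and beta1 > beta2 imply F1 > F2
    assert srgtl_cdf(x, 0.5, 3) > srgtl_cdf(x, 1.5, 2)
print("stochastic ordering confirmed on the grid")
```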
Mode Analysis
As was already mentioned, for β = 1 and α = 1 the pdf (10) simplifies to
a uniform [0,1] density. For α = 1, β ≠ 1, the pdf (10) becomes an RP
distribution (cf. (8)) with a finite mode at 0 with value β for β > 1 and an
infinite mode at 1 for β < 1. Taking the derivative of (10) with respect to x we have

df(x | α, β)/dx = C(x | α, β) f(x | α, β)    (17)

where the multiplier

C(x | α, β) = 2(α-1)/{α - 2(α-1)(1-x)} - (β-1){α - 2(α-1)(1-x)} / [(1-x){α - (α-1)(1-x)}]    (18)

is a linear function in β. From the relations

f(x | a, b, α, β) > 0,  {α - 2(α-1)(1-x)} > 0,  {α - (α-1)(1-x)} > 0    (19)

for α ∈ [0,2] and x ∈ (0,1), it follows from (17) and (18) that the following four
additional cases should be considered: Case 1: 0 < α < 1, β ≥ 1; Case 2:
1 < α < 2, β ≤ 1; Case 3: 1 < α < 2, β > 1; Case 4: 0 < α < 1, β < 1.
Case 1: 0 < α < 1, β ≥ 1 (see Figures 3E and 3F):
From (17), (18) and (19) it follows that the SRGTL pdf (10) is strictly
decreasing on [0,1] and hence possesses a mode at 0 with the value β(2-α)
(cf. (3)). For example, setting α = 0.5 and β = 2 (as in Figure 3E) yields a
mode at 0 with value 3. Setting α = 0.5 and β = 1 (as in Figure 3F) yields a
mode at 0 with value 1.5.
Case 2: 1 < α < 2, β ≤ 1 (see Figure 3D):
From (17), (18) and (19) it follows that the SRGTL pdf (10) is strictly
increasing on [0,1]. From (4) it follows that the pdf (10) has an infinite mode
at 1 for β < 1 and a finite mode at 1 for β = 1. Setting α = 1.5 and β = 1 (as in
Figure 3D) yields a finite mode at 1 with value 1.5.
Case 3: 1 < α < 2, β > 1 (see Figures 3A, 3B and 3C):
This seems to be the most interesting case. From (17), (18) and (19)
it follows that the SRGTL pdf (10) may possess a mode in (0,1). Defining
y = 1 - x and setting the derivative (17) to zero yields the following quadratic
equation in y:

2(α-1)^2 y^2 - 2α(α-1) y + α^2 (β-1)/(2β-1) = 0    (20)

(The left hand side of (20) is a parabolic function in y.) Noting that the
symmetry axis of the parabola associated with the l.h.s. of (20) has the value

α / {2(α-1)}    (21)

which is strictly greater than 1 for 1 < α < 2, and that y = 1 - x ∈ [0,1] ⇔
x ∈ [0,1], it follows that out of the two possible solutions of (20) only the
solution

y* = α / {2(α-1)} (1 - 1/√(2β-1))    (22)

can yield a mode x* ∈ (0,1). Moreover, from 1 < α < 2, β > 1 it follows that
y* > 0. Also, from (22) we have that y* → α/{2(α-1)} > 1 for 1 < α < 2 when
β → ∞. Hence, from (22) we conclude that the mode x* = 1 - y* is

x* = Max{0, 1 - α/{2(α-1)} (1 - 1/√(2β-1))}    (23)

Setting α = 1.5 and β = 2 (as in Figure 3C) yields
x* = Max[0, -1/2 + (1/2)√3] ≈ 0.366. Setting α = 1.5 and β = 6 (as in Figure 3B)
yields x* = Max[0, -1/2 + (3/22)√11] = 0, and hence a mode is located at the lower
bound 0 with value β(2-α) = 3 (cf. (3) with a = 0, b = 1). Utilizing (23) it
follows that a Standard Reflected Topp and Leone distribution (α = 2) has a
mode at

1/√(2β-1)

for β > 1. Setting β = 3 (as in Figure 3A) yields a mode at (1/5)√5 ≈ 0.447.

Case 4: 0 < α < 1, β < 1 (see Figures 3G and 3H):
Similarly to Case 2, it follows that the pdf (10) has an infinite mode at 1 for
0 < α < 1, β < 1. However, from (17), (18) and (19) it follows that the pdf (10)
may also have an anti-mode x* ∈ (0,1) (resulting in a U-shaped form) in this
case. The formula for the anti-mode is also given by (23), provided β > 1/2. For
example, setting α = 0.5, β = 0.75 (as in Figure 3G) yields
x* = Max[0, 3/2 - (1/2)√2] and hence an anti-mode at approximately 0.793. For
β ≤ 1/2 (as in Figure 3H) the anti-mode of an RGTL distribution occurs at
x* = 0, with value β(2-α) (cf. (3) with a = 0, b = 1).
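The mode/anti-mode formula (23) can be checked against the figures; below is a small sketch of ours (the helper name is an assumption), valid for β > 1/2 and α ≠ 1:

```python
import math

# x* = max(0, 1 - alpha/(2(alpha-1)) * (1 - 1/sqrt(2*beta-1))), per (23);
# undefined at alpha = 1 (the RP case has its mode at an endpoint).
def srgtl_mode(alpha, beta):
    return max(0.0, 1 - alpha / (2 * (alpha - 1))
               * (1 - 1 / math.sqrt(2 * beta - 1)))

print(round(srgtl_mode(1.5, 2), 3))     # 0.366, the mode of Figure 3C
print(round(srgtl_mode(2.0, 3), 3))     # 0.447 = 1/sqrt(5), Figure 3A
print(round(srgtl_mode(0.5, 0.75), 3))  # 0.793, the anti-mode of Figure 3G
print(srgtl_mode(1.5, 6))               # 0.0: mode at the lower bound (Figure 3B)
```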

Failure Rate
The failure rate function r(t) = f(t)/{1 - F(t)} of an SRGTL density
follows from (9) and (10) to be

r(t) = D(α, t) β/(1 - t)    (24)

where

D(α, x) = {α - 2(α-1)(1-x)} / {α - (α-1)(1-x)}    (25)

and it is straightforward to check that β/(1 - x) is the failure rate of a standard
reflected power (SRP) distribution ((10) with α = 1). From (24) it follows that
D(α, x) may be interpreted as the relative increase (or decrease) in the failure
rate of an SRGTL distribution as compared to an SRP distribution. Taking the
derivative of (25) with respect to x yields

dD(α, x)/dx = α(α-1) / {α - (α-1)(1-x)}^2    (26)

Hence, D(1, x) = 1 for all x ∈ [0,1], and it follows from (26) that
D(α, x) ≤ 1 (≥ 1) for all x ∈ [0,1] when 1 < α < 2 (0 < α < 1). Thus, α may be
interpreted as a failure deceleration parameter (relative to the reflected standard
power distribution) when 1 < α < 2 and a failure acceleration parameter when
0 < α < 1. On the other hand, (24) shows that β is a failure acceleration
parameter for all β > 0.
Cumulative Moments
Due to the functional form of the cdf (9), calculations of the cumulative
moments

M_k = ∫[0,1] x^k (1 - F(x)) dx    (27)

for SRGTL distributions have a slight advantage over those of central moments
about the mean. The mean μ1' and the central moments about the mean μ2
(variance), μ3 (skewness) and μ4 (kurtosis) are connected with the cumulative
moments M_k, k = 0,...,3, via

μ1' = M0
μ2 = 2M1 - M0^2
μ3 = 3M2 - 6M1M0 + 2M0^3    (28)
μ4 = 4M3 - 12M2M0 + 12M1M0^2 - 3M0^4

(see, e.g., Stuart and Ord [10]). The cumulative moments M_k for SRGTL
distributions follow from (9) and (27) to be

M_k = ∫[0,1] x^k (1-x)^β {α - (α-1)(1-x)}^β dx
    = Σ_{i=0}^{k} (-1)^i (k choose i) α^β ∫[0,1] (1-x)^(β+i) {1 - ((α-1)/α)(1-x)}^β dx    (29)

For α = 1, expression (29) simplifies to that of the cumulative moments of
an SRP distribution (cf. (10) with α = 1). For α ∈ (1,2], the cumulative
moments can be expressed utilizing the incomplete Beta function

B(x | a, b) = Γ(a+b)/{Γ(a)Γ(b)} ∫[0,x] p^(a-1) (1-p)^(b-1) dp    (30)

as

M_k = Σ_{i=0}^{k} (-1)^i (k choose i) α^β (α/(α-1))^(β+i+1) B((α-1)/α | β+i+1, β+1) B(β+i+1, β+1)    (31)

where B(a, b) = Γ(a)Γ(b)/Γ(a+b) is the Beta function. Numerical routines for
evaluating the incomplete Beta function (30) have been well known for a long time
and are provided in standard PC software such as, e.g., Microsoft Excel. However,
for α ∈ (0,1), expression (29) cannot be further simplified and one has to resort to
numerical integration. The cumulative moments of the original Topp and Leone [1]
distribution (cf. (5)) were derived by Nadarajah and Kotz [2]. For α ∈ (1,2], we
have for the cumulative moments M0, M1, M2 and M3:

M0 = α^β (α/(α-1))^(β+1) B((α-1)/α | β+1, β+1) B(β+1, β+1)
M1 = M0 - α^β (α/(α-1))^(β+2) B((α-1)/α | β+2, β+1) B(β+2, β+1)
M2 = -M0 + 2M1 + α^β (α/(α-1))^(β+3) B((α-1)/α | β+3, β+1) B(β+3, β+1)    (32)
M3 = M0 - 3M1 + 3M2 - α^β (α/(α-1))^(β+4) B((α-1)/α | β+4, β+1) B(β+4, β+1)

Substituting α = 2 in (32) yields the mean μ1' = M0 = 4^β B(β+1, β+1)
of a Standard Reflected Topp and Leone (SRTL) distribution, and hence
1 - 4^β B(β+1, β+1) is the mean of a Standard Topp and Leone (STL)
distribution on (0,1) (see Nadarajah and Kotz [2]).
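As a cross-check of the α = 2 special case M0 = 4^β B(β+1, β+1), one can integrate the survival function numerically; the following stdlib sketch is ours:

```python
import math

# Mean M0 = integral over [0,1] of 1 - F(x); for alpha = 2 it should equal
# 4**beta * B(beta+1, beta+1). Midpoint-rule quadrature, our sketch.
def srgtl_survival(x, alpha, beta):
    y = 1 - x
    return (y * (alpha - (alpha - 1) * y)) ** beta  # 1 - F(x), from (9)

def mean_by_quadrature(alpha, beta, n=20000):
    h = 1.0 / n
    return h * sum(srgtl_survival((i + 0.5) * h, alpha, beta) for i in range(n))

beta = 3
closed_form = 4 ** beta * math.gamma(beta + 1) ** 2 / math.gamma(2 * beta + 2)
print(abs(mean_by_quadrature(2, beta) - closed_form) < 1e-6)  # True
```

For α ∈ (0,1), where (29) does not simplify, the same quadrature routine serves as the numerical-integration fallback mentioned above.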

Inverse Cumulative Distribution Function


Utilizing the inverse cdf technique, random samples from RGTL
distributions may straightforwardly be generated. From (9) we derive that
y = 1 - F^{-1}(z | α, β), z ∈ [0,1], is one of the roots of the quadratic equation in y

(α-1)y^2 - αy + (1-z)^(1/β) = 0    (33)

Noting that (similarly to equation (20)) the symmetry axis associated with
the l.h.s. of the quadratic (33) has the value (21), which is strictly larger than 1 for
1 < α < 2, it follows that out of the two solutions of (33) only the solution

y = [α - √(α^2 - 4(α-1)(1-z)^(1/β))] / {2(α-1)}

can yield 1 - F^{-1}(z | α, β) ∈ [0,1]. Analogously, it follows that for 0 < α < 1
only the solution

y = [√(α^2 - 4(α-1)(1-z)^(1/β)) - α] / {2(1-α)}

(the same algebraic root, written with a positive denominator) can result in
1 - F^{-1}(z | α, β) ∈ [0,1]. Hence, we have

F^{-1}(z | α, β) =
1 - [α - √(α^2 - 4(α-1)(1-z)^(1/β))] / {2(α-1)},  1 < α ≤ 2
1 - (1-z)^(1/β),  α = 1    (34)
1 - [√(α^2 - 4(α-1)(1-z)^(1/β)) - α] / {2(1-α)},  0 < α < 1

where the case α = 1 follows from the cdf of a standard reflected power (SRP)
distribution (α = 1 in (9)).
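A sketch of the resulting sampler (ours, not the authors' code); note that the branch written over 2(α-1) covers both 0 < α < 1 and 1 < α ≤ 2:

```python
import math

# Inverse cdf (34); the quadratic (33) is solved for y = 1 - F^{-1}(z).
def srgtl_inv_cdf(z, alpha, beta):
    w = (1 - z) ** (1 / beta)  # the beta-th root of 1 - z
    if alpha == 1:
        return 1 - w           # standard reflected power case
    y = (alpha - math.sqrt(alpha * alpha - 4 * (alpha - 1) * w)) / (2 * (alpha - 1))
    return 1 - y

def srgtl_cdf(x, alpha, beta):
    y = 1 - x
    return 1 - (y * (alpha - (alpha - 1) * y)) ** beta

# round-trip check F(F^{-1}(z)) = z on both sides of alpha = 1
for alpha in (0.5, 1.0, 1.5):
    for z in (0.1, 0.5, 0.9):
        x = srgtl_inv_cdf(z, alpha, 2)
        assert 0 <= x <= 1 and abs(srgtl_cdf(x, alpha, 2) - z) < 1e-9
print("inverse cdf round-trip verified")
```

Feeding uniform [0,1] variates through `srgtl_inv_cdf` yields RGTL samples after the linear rescaling a + (b - a)x.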

4. Maximum likelihood estimation


Below we shall discuss an approximate MLE procedure for a total of N
observations grouped in m intervals [ X J . ^ X J ] with n{ observations each and
interval mean values Xj, where x 0 = 0 , x m =1 and
m
N = 2>i
i=l

The data described above may be summarized in an m-vector x whose


elements are the interval mean values and an m-vector n containing the number
of observations in each interval. The approximate MLE procedure below may
easily be modified to a non-approximate MLE procedure utilizing order
statistics, but here our approach is tailored to the format of the income
distribution data to be presented in Table 1. The approximate MLE procedure
will assume that the probability mass is concentrated at the interval mean X; of
the intervals [ x ^ ^ x j . Utilizing (10) we have the likelihood L(or,/?|x,n) to
be proportional to

fi"fl\{*yi-(a-l)yi2Y~l{a-2{a-l)yi} (35)
i=lL
where
Yi = l - x , (36)
Instead of maximizing L(a, ft | x, n) we may equivalently maximize the
log-likelihood. Taking the logarithm of (35) and calculating the derivative with
respect to /? we obtain
4 + In i Ln{ a; y 1 -(a-l)y i 2 } (37)
p 1=1
Using Elevated Distributions on a Bounded Domain 15

It follows from (37) that

1
/? = N (38)
(a-l)y/
is the unique MLE of ft given a particular value of a . Taking the logarithm of
(35) and calculating the derivative with respect to a , one obtains
ni(l-yi) . 2 ni(l-2yi)
(£-1)1 -+£ (39)
i=i a - ( a - l ) y j i=i a - 2 ( a - l ) y j
Substituting (38) into (39) (utilizing /? instead of p and expressing f5 in
terms of a ) the following function *¥(a) is derived:

x N g "i(l-yj) , g nj(l-2yi)
¥(a) = -1
i=i a - ( a - l ) y i i a-2(a-l)yi
ZnjLn-
i=l [ayi-(a-l)yi
(40)
where y_i is given by (36) and the function is defined on the bounded range 0 < α < 2. The MLE α̂ follows as one of the roots of the equation Ψ(α) = 0 or as one of the boundary values α = 0 or α = 2. The bounded domain of Ψ(α) allows for straightforward plotting of the function in standard spreadsheet software such as Microsoft Excel and subsequent determination of an approximate solution for α̂. Using the root finding algorithm Goalseek, available in Microsoft Excel, together with this approximate solution allows us to calculate α̂ up to a desired level of accuracy. Finally, substitution of α̂ into (38) yields the MLE β̂. The MLE procedure above will be demonstrated in the next section using U.S. 2001 income data.

5. Fitting 2001 U.S. income distribution data


In a leading article in issue 459 of the Journal of the American Statistical Association (2002, Vol. 97, pp. 663-673), Barsky et al. [8] presented an illuminating and comprehensive analysis of the African-American and Caucasian (Non-Hispanic) wealth gap, based on a longitudinal survey of approximately 6000 households over the period 1968-1992. The authors argue that a parametric estimation of the wealth-earning relationship by race is not an appropriate approach. Their main objection is that the wealth-earning relationship is non-linear with an unknown functional form which is difficult to parameterize, and parametric estimation may thus yield
16 J.R. van Dorp and S. Kotz

inaccurate estimates. The authors also provide an extensive and up-to-date


bibliography up to and including 2001. Barsky et al. [8] note that the racial
wealth gap far exceeds the racial income gap at the higher wealth ranges,
suggesting that the racial wealth gap is too large to be explained by income gap
alone. On the other hand, they conclude that the role of earnings differences is
largest at the lower tails of the wealth distribution and decreases dramatically at
higher wealth levels. In fact, their results indicate that differences in household
earnings account for all of the racial wealth difference in the first quartile of the
wealth distribution. Interested readers are also referred to Couch and Daly [9]
and O'Neill et al. [11] who study the related topic of the racial wage gap in the
U.S.
Our approach to this problem is somewhat different. We attempt to use the
distribution developed in the previous sections to fit the more recent household
income data in the U.S. for the year 2001 (Source: U.S. Census Bureau, Current
Population Survey, March 2002) classified according to the Caucasian (Non-
Hispanic), African-American and Hispanic populations and draw some tentative
conclusions about the racial income gap based on this data. Parametric
estimation of income data has been common practice for almost 100 years, and a wide variety of distributions has been proposed (see Kleiber and Kotz [12] for an extensive bibliography). RGTL distributions (which are not discussed in Kleiber and Kotz [12]) allow for a strictly positive density value at their lower bound, which is observed in a nonparametric kernel density estimate of the 1989 income data (see Figure 2 of Barsky et al. [8], p. 668). The new distribution we are proposing turns out to be appropriate for the U.S. 2001 household income data, especially for that of the African-American subpopulation. We emphasize that the main purpose of the numerical analysis below is to illustrate the fitting attributes of the RGTL distribution and the properties of its parameters. The numerical analysis herein in no way yields a conclusive answer to the problem of racial income gaps (nor to that of racial wealth and racial wage gaps); it merely provides indications of the current state of affairs, and further study is in order.
Table 1 below contains income distribution data for households in the year
2001 for the different ethnic groups: Caucasian (Non-Hispanic), African-
American and Hispanic throughout the U.S.A. The MLE procedure above will
be used to fit RGTL distributions for incomes of these three groups. Only the
data up to $250,000 in Table 1 will be used, since the U.S. Census Bureau does not provide the maximum observed income in its statistics. Of the total number of U.S. households surveyed, 98.58%, 99.44% and 99.65% had in 2001 an income of less than $250,000 for the Caucasian (Non-Hispanic), Hispanic and African-American ethnic groups, respectively.

Figure 4 displays a graph of the function Ψ(α) (cf. (40)) for the income data of Caucasian (Non-Hispanic) Americans presented in Table 1. From Figure 4 we observe an approximate root of the equation Ψ(α) = 0 at the value α* ≈ 1.70. Since Ψ(α) > 0 for 0 < α < α* and Ψ(α) < 0 for α* < α < 2, it follows that α = α* is the unique MLE of (35) for α. Using Goalseek (a standard root finding algorithm in Microsoft Excel) with an accuracy of 1·10⁻⁶ and the approximate solution 1.70, we obtain α̂ = 1.679. The unique MLE β̂ = 6.767 follows from substituting α̂ = 1.679 into (38). Figure 5 below plots both the empirical cdf and pdf and their fitted RGTL counterparts (cf. (1) and (2)) with a = $0, b = $250,000, α̂ = 1.679 and β̂ = 6.767. Differences between the empirical cdf and fitted cdf can be observed in Figure 5A. The Kolmogorov-Smirnov statistic D, which is the maximum observed difference between the empirical and fitted cdfs (see, e.g., DeGroot [13]), in Figure 5A equals 8.60%.

Figure 4. A graph of the function Ψ(α) (cf. (40)) for the income data of Caucasian (Non-Hispanic) Americans presented in Table 1

Table 1. U.S. income distribution for households in year 2001 (Source: U.S. Census Bureau, Current
Population Survey, March 2002. Numbers in thousands, households as of March of the following
year)
                        Caucasian (Non-Hispanic)    African-American         Hispanic
Income                  Number      Mean Income     Number     Mean Income   Number     Mean Income

$10,000 to $12,499 3,142 $11,220 621 $11,173 458 $11,214


$12,500 to $14,999 2,946 $13,615 543 $13,672 411 $13,659
$15,000 to $17,499 3,167 $16,091 660 $16,089 553 $15,993
$17,500 to $19,999 2,803 $18,660 479 $18,655 418 $18,579
$20,000 to $22,499 3,099 $21,082 610 $21,094 490 $21,005
$22,500 to $24,999 2,697 $23,706 447 $23,682 373 $23,691
$25,000 to $27,499 3,055 $26,064 570 $26,061 477 $26,011
$27,500 to $29,999 2,446 $28,673 464 $28,544 330 $28,617
$30,000 to $32,499 3,277 $31,059 492 $31,040 479 $30,998
$32,500 to $34,999 2,330 $33,679 375 $33,655 335 $33,601
$35,000 to $37,499 2,950 $36,045 437 $35,944 412 $36,082
$37,500 to $39,999 2,114 $38,713 310 $38,626 249 $38,641
$40,000 to $42,499 2,846 $41,052 434 $41,004 424 $40,938
$42,500 to $44,999 1,924 $43,679 260 $43,693 231 $43,668
$45,000 to $47,499 2,236 $46,058 289 $45,908 291 $46,044
$47,500 to $49,999 1,986 $48,709 256 $48,655 205 $48,607
$50,000 to $52,499 2,403 $51,042 350 $50,924 247 $51,021
$52,500 to $54,999 1,736 $53,679 210 $53,553 153 $53,725
$55,000 to $57,499 2,014 $56,127 249 $55,972 224 $55,992
$57,500 to $59,999 1,528 $58,650 177 $58,680 177 $58,764
$60,000 to $62,499 2,047 $61,053 248 $60,979 219 $61,106
$62,500 to $64,999 1,417 $63,719 162 $63,761 141 $63,801
$65,000 to $67,499 1,710 $66,048 175 $65,990 157 $66,018
$67,500 to $69,999 1,325 $68,677 150 $68,705 124 $68,734
$70,000 to $72,499 1,622 $71,067 190 $71,090 159 $71,112
$72,500 to $74,999 1,248 $73,707 142 $73,589 128 $73,711
$75,000 to $77,499 1,608 $75,981 133 $75,974 132 $75,860
$77,500 to $79,999 1,073 $78,662 100 $78,693 72 $78,726
$80,000 to $82,499 1,380 $81,051 100 $80,950 125 $80,976
$82,500 to $84,999 993 $83,688 90 $83,584 90 $83,708
$85,000 to $87,499 1,144 $86,057 103 $85,984 76 $85,830

$87,500 to $89,999 803 $88,696 86 $88,754 55 $88,636


$90,000 to $92,499 985 $91,051 78 $91,103 83 $90,997
$92,500 to $94,999 701 $93,658 83 $93,666 41 $93,579
$95,000 to $97,499 915 $96,071 71 $95,901 65 $95,999
$97,500 to $99,999 712 $98,682 65 $98,639 48 $98,811
$100,000 to $149,999 8,374 $119,083 554 $117,549 515 $119,016
$150,000 to $199,999 2,689 $169,312 115 $172,222 113 $164,692
$200,000 to $249,999 993 $219,285 29 $218,672 43 $221,737
$250,000 and above 1,345 $462,675 46 $433,097 59 $474,843

13,315 $39,248 10,499 $44,383




Figure 5. Empirical and an MLE fitted RGTL distribution (α̂ = 1.679 and β̂ = 6.767) of the Caucasian (Non-Hispanic) income data in Table 1; A: CDF; B: PDF

Hence, with 43 degrees of freedom (Table 1 has 43 rows up to $250,000) the Kolmogorov-Smirnov test accepts the fitted RGTL distribution at the 10% (D_0.10 ≈ 0.182), 5% (D_0.05 ≈ 0.203) as well as 1% (D_0.01 ≈ 0.243) levels, respectively. Table 2 provides the unique MLE estimators for α and β (obtained using the procedure described in Section 4) for the Caucasian (Non-Hispanic), African-American and Hispanic income data presented in Table 1. Figure 6A (Figure 6B) plots the empirical and fitted RGTL pdf with MLE α̂ = 1.613, β̂ = 10.629 (α̂ = 1.685 and β̂ = 10.306) for the African-American (Hispanic) income data as presented in Table 1. The Kolmogorov-Smirnov statistic D for the African-American (Hispanic) income data equals 6.01% (8.09%), which is smaller than that of the Caucasian (Non-Hispanic) income data (indicating a better fit). Hence the Kolmogorov-Smirnov test accepts both MLE fitted RGTL distributions in Figures 6A and 6B at the 10%, 5% and 1% levels, respectively.
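The D statistics quoted above are easy to reproduce once a fitted cdf is in hand. The explicit cdf (1)-(2) is not reproduced in this excerpt; the closed form used below, F(x) = 1 − {α(1−x) − (α−1)(1−x)²}^β on the standardized interval [0, 1], is the antiderivative implied by the likelihood (35), and the bin edges and counts in the sketch are illustrative placeholders rather than the Table 1 data.

```python
def rgtl_cdf(x, alpha, beta):
    # cdf implied by the likelihood (35): F(x) = 1 - {alpha*y - (alpha - 1)*y^2}^beta, y = 1 - x
    y = 1.0 - x
    return 1.0 - (alpha * y - (alpha - 1.0) * y ** 2) ** beta

def ks_statistic(edges, counts, alpha, beta):
    # Maximum gap between the binned empirical cdf and the fitted cdf, evaluated at the bin edges
    total = float(sum(counts))
    d = cum = 0.0
    for right, ni in zip(edges[1:], counts):
        cum += ni
        d = max(d, abs(cum / total - rgtl_cdf(right, alpha, beta)))
    return d

# Illustrative binned sample on [0, 1], evaluated with the Caucasian (Non-Hispanic) MLE fit
edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
counts = [40, 30, 15, 10, 5]
d = ks_statistic(edges, counts, 1.679, 6.767)
```

Evaluating the gap only at bin edges mirrors the binned format of the data; with individual observations one would use the usual order-statistic form of D.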

Table 2. Maximum likelihood estimators for the parameters α and β of RGTL distributions for the income data in Table 1 up to $250,000

                              α̂         β̂
Caucasian (Non-Hispanic)    1.679      6.767
Hispanic                    1.685     10.306
African-American            1.613     10.629

Table 3 contains the (standardized) cumulative moments M₀ = μ′₁, M₁, M₂, M₃ and the central moments μ₂, μ₃ and μ₄ calculated utilizing (32) and (28). Note that there is a strict column-wise ordering of all the values in Table 3 in the order: Caucasian (Non-Hispanic), Hispanic, African-American. From Table 3 we can calculate values for the mean and standard deviation utilizing the transformation Y = $250,000 · X. In a similar manner, the median and mode of the MLE RGTL distributions can be evaluated utilizing the parameter values in Table 2, (34) and (23). In addition, we may utilize Table 3 to calculate the coefficient of skewness β₁ and the coefficient of kurtosis β₂ given by

β₁ = μ₃² / μ₂³ ,    β₂ = μ₄ / μ₂²
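These coefficients can be checked directly against Tables 3 and 4; the snippet below plugs the Caucasian (Non-Hispanic) central moments of Table 3 into the formulas above and recovers, up to the rounding of the tabulated inputs, the 0.424 and 2.858 reported in Table 4.

```python
def pearson_coefficients(mu2, mu3, mu4):
    # Pearson's coefficient of skewness beta_1 = mu3^2 / mu2^3 and of kurtosis beta_2 = mu4 / mu2^2
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

# Central moments of the Caucasian (Non-Hispanic) row of Table 3
b1, b2 = pearson_coefficients(2.47e-2, 2.54e-3, 1.75e-3)
```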


Figure 6. Empirical and MLE fitted RGTL pdfs for the income data in Table 1; A: African-American (α̂ = 1.613 and β̂ = 10.629); B: Hispanic (α̂ = 1.685 and β̂ = 10.306)

These estimated statistics are provided in Table 4 for the three


subpopulations under consideration.

Table 3. Cumulative moments M_k and central moments μ_{k+1} of the MLE fitted RGTL distributions for the income data in Table 1 up to $250,000, calculated utilizing (32) and (28), k = 1, ..., 3

                             M₀ = μ′₁   M₁        M₂        M₃        μ₂        μ₃        μ₄
Caucasian (Non-Hispanic)     2.34e-1   3.97e-2   1.11e-2   3.80e-3   2.47e-2   2.54e-3   1.75e-3
Hispanic                     1.77e-1   2.38e-2   5.26e-3   1.51e-3   1.60e-2   1.66e-3   8.40e-4
African-American             1.59e-1   1.98e-2   4.17e-3   1.14e-3   1.44e-2   1.60e-3   7.31e-4

Table 4. Statistics associated with the MLE fitted RGTL distributions for the income data in Table 1 up to $250,000

                             Mean      Median    Mode      St. Dev.   β₁       β₂
Caucasian (Non-Hispanic)     $58,393   $52,534   $28,306   $39,326   0.424    2.858
Hispanic                     $44,316   $38,606   $11,851   $31,710   0.660    3.248
African-American             $39,786   $33,599   $0        $30,002   0.858    3.522

A similar ordering to that observed in Table 3 can be observed throughout Table 4. Note that the differences in the point estimates in Table 4 between the Caucasian (Non-Hispanic) population and the African-American population are approximately $18607 or more, and those associated with the Hispanic population $13928 or more. The latter observation is amplified somewhat in Table 4 by the fact that the fitted mean income for the Caucasian (Non-Hispanic) population overestimates the empirical mean (of income up to $250,000) by $3936, whereas the fitted mean income for the African-American (Hispanic) population is overestimated by only $1898 ($2357). Perhaps the most notable difference is the modal income value of $0 for the MLE fitted RGTL distribution for the African-American population, while the modal income values for the Caucasian (Non-Hispanic) and Hispanic populations are substantially larger than zero (and the mode for the Caucasian (Non-Hispanic) population is more than twice that of Hispanics). A similar observation can be made by comparing the RGTL distributions in Figures 5B, 6A and 6B. Finally, from Table 2 and (3) we may evaluate the density values at the lower bound, i.e. f(0 | 0, $250,000, α̂, β̂), presented in Table 5. Hence, in comparison with Americans of Caucasian origin, African-Americans appear to be approximately 1.9 times as likely, and Hispanics 1.5 times as likely, in the year 2001, to have negligible income. It is the fact that our MLE fitted RGTL pdfs may take any positive value at the lower bound that allows us to reach such a conclusion.

Table 5. Density values at the lower bound of the MLE fitted RGTL distributions for the income data in Table 1 up to $250,000

                             f(0 | 0, $250,000, α̂, β̂)
Caucasian (Non-Hispanic)     8.68e-6
Hispanic                     1.30e-5
African-American             1.65e-5

The analysis presented below seems to us to be of some interest and value. In Figure 7A, we utilize the MLE fitted RGTL income distributions by plotting the percentiles of the African-American and Hispanic income distributions against those of the Caucasian (Non-Hispanic) one, using (9), (34) and the corresponding MLE values for α and β in Table 2. For example, from Figure 7A we observe that approximately 70% (65%) of the African-American (Hispanic) population have less income than the median (50%) of the Caucasian (Non-Hispanic) income distribution. Similar comparisons can be made for other percentiles of the Caucasian (Non-Hispanic) income distribution utilizing Figure 7A. For example, 34% (29%) of the African-American (Hispanic) population earn less than the 20th percentile of the Caucasian (Non-Hispanic) income distribution. Note that the solid curve in Figure 7A involving the African-American (Hispanic) income distribution is located completely above the unit diagonal, which implies stochastic dominance of Caucasian (Non-Hispanic) income over that of the African-American (Hispanic) one. The latter can be directly concluded from the MLE values for α and β in Table 2 and (16) for the African-American and Caucasian (Non-Hispanic) comparison, but not for the Hispanic and Caucasian (Non-Hispanic) comparison. This shows that the implication arrow in (16) cannot in general be reversed.
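The percentile-against-percentile curves of Figure 7 can be reproduced from Table 2 alone. As before, the cdf below is the closed form implied by the likelihood (35) (the explicit cdf (1)-(2) is not reproduced in this excerpt), and the quantile function is obtained by bisection rather than from (34); with the Table 2 parameters it recovers the approximately 70% figure read off Figure 7A.

```python
def rgtl_cdf(x, alpha, beta):
    # cdf implied by the likelihood (35) on the standardized interval [0, 1]
    y = 1.0 - x
    return 1.0 - (alpha * y - (alpha - 1.0) * y ** 2) ** beta

def rgtl_quantile(p, alpha, beta, tol=1e-10):
    # Invert the cdf by bisection; it is continuous and increasing on [0, 1]
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rgtl_cdf(mid, alpha, beta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

caucasian = (1.679, 6.767)          # MLE parameters from Table 2
african_american = (1.613, 10.629)

# Share of the African-American distribution below the Caucasian (Non-Hispanic) median
x_med = rgtl_quantile(0.5, *caucasian)       # about 0.21, i.e. roughly $52,500 of $250,000
share = rgtl_cdf(x_med, *african_american)   # about 0.70, as read off Figure 7A
```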
In a similar manner, Figure 7B utilizes the MLE fitted RGTL income distributions by plotting the percentiles of the African-American and Caucasian (Non-Hispanic) income distributions against those of the Hispanic one. For example, from Figure 7B we observe that approximately 56% (37%) of the African-American (Caucasian Non-Hispanic) population have less income than the median (50%) of the Hispanic income distribution. We now conclude from Figure 7B that Hispanic income stochastically dominates the African-American one. The latter conclusion also follows immediately from the corresponding MLE values for α and β in Table 2 and (16). However, we can conclude, once again only by observation of Figure 7B, that Hispanic income is stochastically dominated by Caucasian (Non-Hispanic) income (since the line associated with the Caucasian (Non-Hispanic) distribution now happens to be completely below the unit diagonal). This conclusion, as before, cannot be directly obtained from the corresponding MLE values for α and β in Table 2 and (16).

Figure 7. Stochastic dominance analysis by ethnicity for the income data in Table 1 utilizing the MLE fitted RGTL cdfs; A: percentiles plotted against the Caucasian (Non-Hispanic) percentile; B: percentiles plotted against the Hispanic percentile

Summarizing, Table 2 and (16) alone imply that the chances of a Caucasian
(Non-Hispanic) or Hispanic American earning more than a specified amount
(anywhere within the range from $0 to $250,000) are higher than those for an
African-American. In addition, the analysis in Figure 7 allows us to conclude
that the chances of a Caucasian (Non-Hispanic) earning more than a specified
amount (anywhere within the range from $0 to $250,000) are higher than those
of a Hispanic. Moreover, Figure 7 and Table 4 demonstrate that although

substantial advances have reportedly been made in reducing the income


distribution gap amongst these three subpopulations in the U.S. during the last
20 years or so (see, e.g., Couch and Daly [9]), these differences still exist and
are quite noticeable.

6. Concluding remarks
We have attempted to construct and investigate a new four-parameter
continuous family of distributions on a bounded domain possessing arbitrary
strictly positive density values at its lower bound. As an illustration, the new
family is applied to fitting the distributions of income of Caucasians (Non-
Hispanic), Hispanics and African-Americans in the U.S.A. in the year 2001
based on U.S. Census Bureau data. The results seem to be quite satisfactory and allow us to compare the incomes of the above three groups in a novel manner, shedding additional light on features which are not obvious from a direct examination of the raw data.

Acknowledgments
We are indebted to T.A. Mazzuchi for his helpful comments in the course of
developing this paper and to Dr. David Findley (U.S. Bureau of Census) for
helping us to obtain recent data.

References
1. Topp, C.W. and Leone, F.C. (1955). A family of J-shaped frequency
functions. Journal of the American Statistical Association, 50(269), 209-
219.
2. Nadarajah, S. and Kotz, S. (2003). Moments of some J-shaped distributions.
Journal of Applied Statistics, 30(3), 311-317.
3. Van Dorp, J.R. and Kotz, S. (2002). The standard two sided power
distribution and its properties: With applications in financial engineering.
The American Statistician, 56(2), 90-99.
4. Van Dorp, J.R. and Kotz, S. (2002). A novel extension of the triangular
   distribution and its parameter estimation. Journal of the Royal Statistical
   Society, Series D, The Statistician, 51(1), 63-79.
5. Johnson, N.L. (1949). Systems of frequency curves generated by the
methods of translation. Biometrika, 36, 149-176.
6. Weibull, W. (1939). A statistical distribution of wide applicability. Journal
   of Applied Mechanics, 18, 293-297.
7. Van Dorp, J.R. and Kotz, S. (2003). Generalized trapezoidal distributions.
Metrika, 58(1), 85-97.

8. Barsky, R., Bound, J., Kerwin, K.C. and Lupton, J.P. (2002). Accounting
for the Black-White wealth gap: A nonparametric approach. Journal of the
American Statistical Association, 97(459), 663-673.
9. Couch, K. and Daly, M.C. (2000). Black-White inequality in the 1990's: A
decade of progress. Working Papers in Applied Economic Theory, No.
2000-07, Federal Reserve Bank of San Francisco.
10. Stuart, A. and Ord, J.K. (1994). Kendall's Advanced Theory of Statistics
(Vol. 1, Distribution Theory). New York, Wiley.
11. O'Neill, D., Sweetman, O. and Van de Gaer, D. (2002). Estimating
counterfactual densities: An application to Black-White wage differentials
in the U.S., Economics Department Working Paper Series, Department of
Economics, National University of Ireland - Maynooth.
12. Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics
and Actuarial Sciences. New York, Wiley.
13. DeGroot, M.H. (1991). Probability and Statistics, 3rd ed. Reading, MA:
Addison-Wesley.
Chapter 2
MAKING COPULAS UNDER UNCERTAINTY

C. GARCIA-GARCIA
Department of Quantitative Methods in Economics
University of Granada
Campus de Cartuja s/n. Granada, 18071, Spain

J.M. HERRERIAS-VELASCO
Department of Quantitative Methods in Economics
University of Granada
Campus de Cartuja s/n. Granada, 18071, Spain

J.E. TRINIDAD-SEGOVIA
Department of Business Administration
University of Almeria, Ctra. Sacramento s/n
La Cañada de San Urbano, 04120
Almeria, Spain

This paper is based on the MTDF methodology, which consists in obtaining the value of an asset from the value of a specific index (Ballestero, 1973). The aims of this paper are to apply this methodology to the case of two indexes under uncertainty, to construct an FGM copula with TSP marginals given the classical values (a, m, b), and to apply the result in an empirical case. Under an uncertainty environment a high correlation exists between the indexes, which precludes applying the FGM copula, restricted as it is to the case of weak correlations. This article overcomes this disadvantage by presenting an alternative that is later applied in a practical case.

1. Introduction
The method of the two distribution functions has been developed as a valuation method recommended for uncertainty environments, that is, when there is no information on the asset to be valued and an expert is consulted, acting in a similar way as in the PERT method.
The present paper is based on the method of the two distribution functions, also known as the method of the two betas. This method was presented by Ballestero (1971) and is widely used in valuation. It represents an improvement on the synthetic method and was formalized later by its author, Ballestero (1973),

28 C. Garcia-Garcia, J.M. Herrerias-Velasco and J.E. Trinidad-Segovia

who describes it as follows: the variable market value of a good will statistically follow a distribution function F. On the other hand, the index, parameter or explanatory variable will statistically follow another distribution function G. We suppose that the functions F and G have the form of a bell or similar; the method of the two betas then establishes a relationship between both variables. So, it is necessary to adopt the following hypothesis: if the index L_i of an asset F_i is higher than the index L_j of another asset F_j, the market value V_i corresponding to the first asset will also be greater than the market value V_j corresponding to the second one.
From this, if the distribution F of the market value is known as well as the distribution G of the index, the market value V_K corresponding to an index L_K is established by means of the transformation:

V_K = Φ(L_K) ⇔ F(V_K) = G(L_K)    (1)

Palacios, Callejon and Herrerias (2000) have presented a rigorous formalization of the method:
Two random variables related to an asset are considered: the variable I, which represents a quality index of the asset, and the variable V, the asset's market value. It is supposed that the value of V is a function of its quality, that is to say V = Φ(I), where Φ is a strictly increasing function on a certain interval [I₁, I₂]. If I has a probability distribution whose distribution function is G(i), then V is a random variable whose distribution function is:

F(v) = P[V ≤ v] = P[Φ(I) ≤ v] = P[I ≤ Φ⁻¹(v)] = G(Φ⁻¹(v))    (2)

or,

G(i) = P[I ≤ i] = P[Φ(I) ≤ Φ(i)] = P[V ≤ Φ(i)] = F(Φ(i)),

where Φ is a strictly increasing function.
It is evident that if F is strictly increasing on the interval [Φ(I₁), Φ(I₂)], F is invertible on the mentioned interval; from the last expression one obtains Φ(i) = F⁻¹(G(i)), defined from (I₁, I₂) → (Φ(I₁), Φ(I₂)), which is a bijection that transforms qualities into market values.
Figure 1 represents the two density functions, given the values (a, m, b) and (a′, m′, b′). A bijection is thereby established between the value of the index and the value of the asset (Garcia and Garcia, 2003). From here, if the quality index of a good is I₀, then its market value must be:

V₀ = Φ(I₀) = F⁻¹(G(I₀))    (3)
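The transformation (3) is straightforward to carry out once concrete forms for F and G have been chosen. The sketch below is a minimal illustration using the triangular distributions of Romero's (1977) extension, whose cdf and quantile function are available in closed form; the (a, m, b) expert estimates are invented placeholders.

```python
import math

def tri_cdf(x, a, m, b):
    # cdf of the triangular distribution with support [a, b] and mode m
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= m:
        return (x - a) ** 2 / ((b - a) * (m - a))
    return 1.0 - (b - x) ** 2 / ((b - a) * (b - m))

def tri_quantile(p, a, m, b):
    # Quantile (inverse cdf) of the triangular distribution
    pm = (m - a) / (b - a)            # cdf value at the mode
    if p <= pm:
        return a + math.sqrt(p * (b - a) * (m - a))
    return b - math.sqrt((1.0 - p) * (b - a) * (b - m))

def mtdf_value(i0, index_amb, asset_amb):
    # Market value V0 = F^{-1}(G(I0)), cf. (3)
    return tri_quantile(tri_cdf(i0, *index_amb), *asset_amb)

index_amb = (2.0, 5.0, 9.0)               # expert's (a', m', b') for the quality index
asset_amb = (10000.0, 24000.0, 40000.0)   # expert's (a, m, b) for the market value
v0 = mtdf_value(5.0, index_amb, asset_amb)
```

With beta (PERT) marginals the same composition F⁻¹(G(·)) applies; only the cdf and quantile routines change.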


Figure 1. Probability density functions in the MTDF for the assets and the index, respectively

Since the presentation of the method of the two betas in Ballestero (1973), numerous contributions have been published. These contributions have extended the application of the method and, summarizing, we can distinguish the following lines:
a) Practical applications of the method of the two distribution functions: Ballestero and Caballer (1982), Caballer (1994), Caballer (1998), Caballer (1999) and Ballestero and Rodriguez (1999) extend its use to the valuation of fruit-bearing trees and real estate. Alonso and Lozano (1985) present an application to the valuation of properties in the region of Valladolid; Guadalajara (1996) presents a series of practical cases; Garcia, Trinidad and Sanchez (1997) also fall within this line. Cañas, Domingo and Martinez (1994) present a practical application in the province of Cordoba.
b) Extension of the method to different distributions: Romero (1977) extends the method using uniform and triangular distributions; Garcia, Cruz and Andujar (1998) present a review of the application with triangular distributions. Garcia, Trinidad and Gomez (1999) extend the method to a special class of trapezoidal distributions; Herrerias, Garcia, Cruz and Herrerias (2001) extend the method to trapezoidal distributions of any type. Garcia, Trinidad and Garcia (2004) present an application using the generalized triangular functions of Van Dorp and Kotz, which can be fitted in an uncertainty environment.
c) Utilization of two or more indexes, under the independence hypothesis or not, and implementation of econometric applications. In this line, Garcia, Cruz and Rosado (2000, 2002) present an extension of the method to the multi-index case under the hypothesis of independence between the indexes. Herrerias Velasco (2002), in his Doctoral Thesis, extends the method of the two distribution functions to the bivariate case in an exhaustive form and, in

general, to the multivariate case, without the hypothesis of independence; the pyramidal distribution is also presented there. Finally, Garcia, Cruz and Garcia (2002.b) present an econometric application of the multi-index case of the two distribution functions method.

d) Development of statistical tests to confirm the adequacy of the chosen distribution functions and the goodness of the indexes. Garcia, Cruz and Garcia (2002.a) extend the use of the method of two distribution functions to the mesokurtic, constant variance, Caballer and classic beta families, introducing a method to select the most appropriate distribution in every case, and presenting a computational program that solves the investment problem. To conclude, Herrerias, Palacios, Callejon and Perez (2001) develop a method to evaluate the goodness of an expert in the PERT methodology.

e) Iterative valuation procedures: Garcia, Cruz and Garcia (2002.c) and Garcia, Cruz and Garcia (2004.a).

In this work the MTDF will be applied to the case of two indexes under an uncertainty environment. The FGM family will be used to construct a joint distribution function given the TSP marginals.

2. Initial approach

When one tries to value an asset that depends on one or more indexes and there is no statistical information, we are said to be in an uncertainty environment. The habitual procedure in these cases is to turn to an expert, who will be asked for the optimistic value, the pessimistic value and the most probable value of the asset and of at least one reference index (Garcia, Trinidad and Garcia, 2004). Suppose that we have information about the PERT values for the asset and two reference indexes (see Table 1):

Table 1. PERT values for the assets and two reference indexes¹

Assets (V)      Index 1 (I₁)      Index 2 (I₂)
(a, m, b)       (a₁, m₁, b₁)      (a₂, m₂, b₂)

¹ The multi-index case has also been considered in Garcia, Cruz and Rosado (2000, 2002).

The distribution functions F(I₁) and F(I₂) will be obtained from these estimations using classical methods. Then, once the marginal distribution functions are known, it is necessary to create a joint distribution function, F(I₁, I₂). This question has been studied at length in the literature.
The first references are Frechet (1951) and Levy (1950); the latter proved, while looking for a definition of the distance between two distributions, that once a distance d(X, Y) between random variables is given, the minimum of this distance when the distributions of X and Y are fixed is again a distance. Frechet, building on Levy's paper, began to study the problem of creating a joint distribution function when the marginal distributions are known. He proved that, given two cumulative distribution functions F₁(x) and F₂(y), the joint cumulative distribution function lies between W(x,y) and M(x,y):

W(x,y) ≤ F(x,y) ≤ M(x,y)    (4)

The lower and upper limits in the previous inequality are usually called the Frechet limits, and these limits are likewise distribution functions, with the following expressions:

W(x,y) = max[F₁(x) + F₂(y) − 1, 0]
M(x,y) = min[F₁(x), F₂(y)]    (5)

The upper limit is the distribution function of (x, y) when x = y with probability equal to 1, and the lower limit corresponds to the case when y = 1 − x.
Frechet was the first to state the problem systematically; nevertheless, the later developments were carried out by other authors, mainly Pompilj (1984). Also of interest are the works of Morgenstern (1956), Farlie (1960), Nataf (1962), Plackett (1965), Mardia (1967) and Kimeldorf and Sampson (1975). These papers were reviewed by S. Kotz and N.L. Johnson (1977), and also by Conway (1979) and by Barnett (1980). Schweizer and Sklar (1983) presented the copulas later studied by Genest and Mackay (1986). On the other hand, Johnson and Tenenbein (1989) presented the combined linear weighted method based on Kendall's tau and Spearman's correlation coefficient. Many of these advances are shown in Dall'Aglio, Kotz and Salinetti (1991). It can be observed that this problem has attracted attention in the literature throughout many years, and is still present.
There are different topics, such as: the Frechet structure, joint distribution functions with known marginal distribution functions and their limits, the compatibility of distribution functions, the efficiency of the average value of

some functions, and the connection with linear programming. Very important was the introduction of copulas by Sklar in 1961, and his later paper with Schweizer, which opened a new route of investigation.

3. FGM distribution functions


For many years there has been interest in bivariate distribution families with known marginals F(X₁) and F(X₂). These bivariate distributions usually have the following form:

F(X₁, X₂ | α) = ψ[F(X₁), F(X₂) | α]    (6)

where α is a parameter or a vector of parameters.
Some of the most relevant distributions in the literature were introduced by Farlie (1960), Gumbel (1960), Morgenstern (1956), Sibuya (1960), Gumbel (1958), Plackett (1965), Frank (1979) and Cook and Johnson (1981). For a discussion of these distributions see Mardia (1970) and Johnson (1987). Genest and Mackay (1986) proved that most of these distributions are obtained from a single method. Later, Marshall and Olkin (1988) obtained similar conclusions and presented new distributions by considering certain mixed models.
In this section we present the FGM distribution function family, which has been widely studied in the literature. The first reference is Eyraud (1936), who worked on this distribution using uniform marginals.
The cumulative distribution function of this family has the next form:
F(X₁, X₂) = F₁(X₁)F₂(X₂)[1 + α(1 - F₁(X₁))(1 - F₂(X₂))]    (7)

where:
F(X₁, X₂) is the joint cumulative distribution function of X₁ and X₂.
F₁(X₁) and F₂(X₂) are the marginal cumulative distribution functions.
The expression for the probability density function is given by:

f(X₁, X₂) = f₁(X₁)f₂(X₂)[1 + α(1 - 2F₁(X₁))(1 - 2F₂(X₂))]    (8)

Regarding the correlation between X and Y, it is easy to prove that:

E(Y | x) = E(Y) + αJ₂(2F₁(x) - 1)    (9)

where J₂ = ∫_{-∞}^{+∞} F₂(y)(1 - F₂(y)) dy. See Kotz and Drouet (2001).
Making Copulas Under Uncertainty 33

The parameter α belongs to the interval [-1, 1], so the cases in which α = -1
and α = 1 represent the maximum degrees of negative and positive dependence,
respectively, allowed in this family. The dependence properties of this family
are associated with the correlation coefficient, though a priori the parameter of
the FGM distribution, α, is not associated with this concept.
It is proved that:

• If the marginals follow a N(0,1) distribution, the correlation is α/π;
this is equivalent to saying that the correlation coefficient moves in the
interval (-0.318, 0.318).

• If the marginals follow a uniform distribution, the correlation coefficient
is α/3 and varies between -1/3 and 1/3. It is deduced that for the
FGM distribution with absolutely continuous marginals, the correlation
coefficient between X and Y cannot be higher than 1/3.
Summarizing, it is possible to affirm that the structural dependence between
X and Y is controlled by the parameter α. For the density function to be positive,
α has to vary between -1 and 1. This restricts the possible values of the
correlation coefficient, which varies within (-1/3, 1/3), a circumstance that limits
the application of the FGM distributions to cases in which the dependence is
weak enough. See Athanassoulis, Skarsoulis and Belibassakis (1994).
It will be proved later that, in an uncertainty environment, since we have
only three data, the existing correlation between the indexes will be outside the
range (-1/3, 1/3) described previously. For this reason, it is necessary to look for an
alternative in order to apply the family of FGM distribution functions under uncertainty.
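The α/3 bound for uniform marginals can be checked numerically. The sketch below is our own illustration (function names are ours, not the chapter's): it builds the FGM joint CDF of Eq. (7) with uniform marginals and recovers the correlation coefficient through Hoeffding's covariance identity, which the chapter applies later in expression (27).

```python
# Illustrative sketch: FGM joint CDF (Eq. 7) with uniform(0,1) marginals,
# where F(u) = u, and the resulting correlation coefficient alpha/3.

def fgm_cdf(u, v, alpha):
    # Eq. (7) with F1(u) = u and F2(v) = v
    return u * v * (1.0 + alpha * (1.0 - u) * (1.0 - v))

def fgm_uniform_corr(alpha, steps=400):
    # Hoeffding identity: cov = double integral of H(u,v) - F1(u)F2(v)
    h = 1.0 / steps
    cov = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        for j in range(steps):
            v = (j + 0.5) * h
            cov += (fgm_cdf(u, v, alpha) - u * v) * h * h
    return cov / (1.0 / 12.0)   # divide by sd(U)*sd(V) = var of a uniform

rho = fgm_uniform_corr(0.9)     # numerically close to 0.9/3 = 0.3
```

Even at the extreme α = ±1 the coefficient stays at ±1/3, which is the limitation discussed above.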

4. The van Dorp and Kotz distribution families and their subfamilies

Recently, Van Dorp and Kotz (2002a, 2002b) have introduced the Two-Sided
Power (TSP) distribution, which is a generalization of the triangular
distribution and is defined as follows. Let x be a random variable which is said
to follow a TSP distribution. Then the probability density function of x is:

f(x | a, m, b, n) = (n/(b - a)) ((x - a)/(m - a))^(n-1),   if a ≤ x ≤ m
                    (n/(b - a)) ((b - x)/(b - m))^(n-1),   if m ≤ x ≤ b    (10)

Standardizing the random variable x, that is, making the change of variable:

t = (x - a)/(b - a)

we obtain a new random variable t whose density function is given by:

f(t | M, n) = n (t/M)^(n-1),                 if 0 ≤ t ≤ M
              n ((1 - t)/(1 - M))^(n-1),     if M ≤ t ≤ 1    (11)

where M = (m - a)/(b - a). The cumulative distribution function is:

F(t | M, n) = M (t/M)^n,                       if 0 ≤ t ≤ M
              1 - (1 - M)((1 - t)/(1 - M))^n,  if M ≤ t ≤ 1    (12)

where:

E(t) = ((n - 1)M + 1)/(n + 1)    (13)

and

var(t) = (n - 2(n - 1)M(1 - M)) / ((n + 2)(n + 1)²)    (14)

Here a is the pessimistic value, m the most probable value and b the
optimistic value, all of them provided by the expert. The parameter n has a more
complex interpretation, since it is not known exactly what it means, nor
what question should be asked to the expert to obtain this information.
However, we can affirm that n verifies the following properties:
1. n > 0.
2. For n = 1, the STSP distribution degenerates into a uniform
distribution.
3. For n = 2, the TSP distribution is transformed into a triangular distribution
with parameters a, m and b.
4. Finally, for a = 0 and m = b = 1, f(x | a, m, b, n) is a power function, and
for a = m = 0 and b = 1 we would obtain its reflection.
In spite of this, Van Dorp and Kotz point out the intuitive meaning of n,
since the expected value of x adopts the following expression:

E(x) = (a + (n - 1)m + b)/(n + 1)    (15)

So, n - 1 is the coefficient that weights the mode to obtain the expected
value of the random variable, supposing that the extremes a and b are weighted
by 1. In our opinion this property places the STSP distribution in the field of
PERT. From the three habitual classical values a, m and b, whose meaning is
known, it would be impossible to determine a unique STSP distribution, since
it is a tetraparametric distribution with parameters a, m, b and n. Therefore, it is
necessary to restrict the election of the unique STSP distribution to one of
its subfamilies (see Garcia, Cruz and Garcia, 2004b, 2005).
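Before turning to the subfamilies, the moment formulas (13) and (14) can be verified numerically. The sketch below is our own illustration, not the authors' code: it integrates the STSP density of Eq. (11) by a midpoint rule and compares the result with the closed forms.

```python
def stsp_pdf(t, M, n):
    # Standardized two-sided power density, Eq. (11)
    if 0.0 <= t <= M:
        return n * (t / M) ** (n - 1)
    if M < t <= 1.0:
        return n * ((1.0 - t) / (1.0 - M)) ** (n - 1)
    return 0.0

def stsp_moments(M, n, steps=100_000):
    # Midpoint-rule mean and variance of the STSP distribution
    h = 1.0 / steps
    m1 = m2 = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        w = stsp_pdf(t, M, n) * h
        m1 += t * w
        m2 += t * t * w
    return m1, m2 - m1 * m1

M, n = 0.3, 3.0
mean, var = stsp_moments(M, n)
expected_mean = ((n - 1) * M + 1) / (n + 1)                                # Eq. (13)
expected_var = (n - 2 * (n - 1) * M * (1 - M)) / ((n + 2) * (n + 1) ** 2)  # Eq. (14)
```

For M = 0.3 and n = 3 the formulas give E(t) = 0.4 and var(t) = 0.027, and for n = 1 the density collapses to the uniform one, as stated in property 2 above.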
We will define, first of all, the constant-variance family as the set formed
by the STSP distributions with the same variance as the normal distribution of
the classic PERT. In the case of working with standardized random variables, the
following equation is fulfilled:

n³ + 4n² + (-72M² + 72M - 31)n + (72M² - 72M + 2) = 0    (16)

This equation allows us to obtain, for each value of M ∈ (0, 1), a unique
value of n > 1. Therefore, we can affirm that, given the three habitual values a,
m and b, a unique unimodal STSP distribution with constant variance is set. This
result allows the use of this family in the field of the PERT, that is, to estimate a
TSP given the values a, m and b.
On the other hand, we will define the mesokurtic family as the set of STSP
distributions whose kurtosis coefficient (β₂) is equal to 3 (normal kurtosis
coefficient). Then the following equation is fulfilled:

a n⁴ + b n³ + c n² + d n + e = 0    (17)

where a, b, c, d and e are polynomials in M:

a(M) = 2M⁴ - 4M³ + 6M² - 4M + 1
b(M) = -14M⁴ + 28M³ - 22M² + 8M - 1
c(M) = -2M⁴ + 4M³ - 22M² + 20M - 6    (18)
d(M) = 62M⁴ - 124M³ + 94M² - 32M + 2
e(M) = -48M⁴ + 96M³ - 56M² + 8M

It can be proved that for every M ∈ (0, 1), equation (17) has only one
solution that verifies n > 1, so we can affirm that a mesokurtic STSP
distribution will always exist. This result improves on the one obtained with
mesokurtic beta distributions, for which it is impossible to get a solution when
M ∈ (0.2763933...; 0.7236067...). Solving the system formed by equations (16)
and (17), the unique solutions for n > 1 are:

M = 0.747133...,  n = 3.02344...
M = 0.252867...,  n = 3.02344...    (19)

These solutions correspond to the STSP distributions that simultaneously
verify the conditions σ² = 1/36 and β₂ = 3, and these are called the
classic family.
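These numbers can be checked directly. The sketch below (an illustration of ours) evaluates the left-hand sides of the constant-variance equation (16) and of the mesokurtic quartic (17), with the coefficients (18), at the classic-family pair of Eq. (19); both vanish up to the rounding of the quoted decimals.

```python
def constant_variance_lhs(M, n):
    # Left-hand side of Eq. (16)
    return (n**3 + 4*n**2 + (-72*M**2 + 72*M - 31)*n
            + (72*M**2 - 72*M + 2))

def mesokurtic_lhs(M, n):
    # Left-hand side of Eq. (17), with the polynomial coefficients of Eq. (18)
    a = 2*M**4 - 4*M**3 + 6*M**2 - 4*M + 1
    b = -14*M**4 + 28*M**3 - 22*M**2 + 8*M - 1
    c = -2*M**4 + 4*M**3 - 22*M**2 + 20*M - 6
    d = 62*M**4 - 124*M**3 + 94*M**2 - 32*M + 2
    e = -48*M**4 + 96*M**3 - 56*M**2 + 8*M
    return a*n**4 + b*n**3 + c*n**2 + d*n + e

M_classic, n_classic = 0.747133, 3.02344
```

Both polynomials are symmetric under M ↔ 1 - M, which is why the mirrored mode 0.252867... shares the same n.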
To conclude, in the PERT, the family of STSP distributions always improves
on the beta distribution families because:
1. It is always possible to select a mesokurtic STSP distribution, while in
the case of the beta distribution it is not.
2. Parallel to the classic beta distribution, there exists a STSP distribution
with n = 3.0234.
3. The STSP distribution is more moderate in mean and more
conservative in variance for every M value. See Garcia, Cruz and Garcia
(2005).
This can be explained by the behaviour of the kurtosis coefficient. If we
compare the STSP distribution family, depending on n and M, with the beta
distribution family, depending on k = n - 1 and M, the first one allows selecting
a distribution with higher kurtosis than the second one. It can be proved that,
in symmetric distributions, the value of the kurtosis coefficient of the beta

A
-j Beta
4 _ s l s p
—j cuitosis = 1
H curiosi; = 6

60

Valores dek — n-l


Figure 2. Kurtosis of the Beta and STSP symmetric distributions
distribution is lower than the normal one (3), whereas in the STSP distribution
we can find weighting values that yield kurtosis values higher or lower than the
normal one (see Figure 2). In conclusion, this distribution can be an
alternative to the normal distribution and others when we want to fit
distributions with a higher kurtosis (see Herrerias, Callejon, Perez and Herrerias,
2001).

5. An approach to the problem: Application of the MTDF with van
Dorp and Kotz's marginals in an uncertainty environment

Once the FGM distribution families and the TSP distribution have been
introduced, in this section we will obtain the joint distribution function by
applying the FGM distribution family and the STSP distributions. Given two
indexes, X₁ and X₂, and their highest, most probable and lowest values, we
obtain the following standardized ones:

X₁ = (0, M₁, 1)′ ,   X₂ = (0, M₂, 1)′    (20)

The indexes X₁ and X₂ follow a STSP distribution, so, applying
expression (12), their distribution functions are:

F(X₁) = M₁ (X₁/M₁)^(n₁),                         if 0 ≤ X₁ ≤ M₁
        1 - (1 - M₁)((1 - X₁)/(1 - M₁))^(n₁),    if M₁ ≤ X₁ ≤ 1    (21)

F(X₂) = M₂ (X₂/M₂)^(n₂),                         if 0 ≤ X₂ ≤ M₂
        1 - (1 - M₂)((1 - X₂)/(1 - M₂))^(n₂),    if M₂ ≤ X₂ ≤ 1    (22)

Using (7), the following equation is obtained:

F(X₁, X₂) =

M₁(X₁/M₁)^(n₁) M₂(X₂/M₂)^(n₂) [1 + α(1 - M₁(X₁/M₁)^(n₁))(1 - M₂(X₂/M₂)^(n₂))],
    if 0 ≤ X₁ ≤ M₁; 0 ≤ X₂ ≤ M₂

[1 - (1 - M₁)((1 - X₁)/(1 - M₁))^(n₁)] M₂(X₂/M₂)^(n₂) [1 + α(1 - M₁)((1 - X₁)/(1 - M₁))^(n₁)(1 - M₂(X₂/M₂)^(n₂))],
    if M₁ ≤ X₁ ≤ 1; 0 ≤ X₂ ≤ M₂                                        (23)

M₁(X₁/M₁)^(n₁) [1 - (1 - M₂)((1 - X₂)/(1 - M₂))^(n₂)] [1 + α(1 - M₁(X₁/M₁)^(n₁))(1 - M₂)((1 - X₂)/(1 - M₂))^(n₂)],
    if 0 ≤ X₁ ≤ M₁; M₂ ≤ X₂ ≤ 1

[1 - (1 - M₁)((1 - X₁)/(1 - M₁))^(n₁)] [1 - (1 - M₂)((1 - X₂)/(1 - M₂))^(n₂)] [1 + α(1 - M₁)((1 - X₁)/(1 - M₁))^(n₁)(1 - M₂)((1 - X₂)/(1 - M₂))^(n₂)],
    if M₁ ≤ X₁ ≤ 1; M₂ ≤ X₂ ≤ 1

Applying expression (8), we get the density function:

f(X₁, X₂) =

n₁(X₁/M₁)^(n₁-1) n₂(X₂/M₂)^(n₂-1) [1 + α(1 - 2M₁(X₁/M₁)^(n₁))(1 - 2M₂(X₂/M₂)^(n₂))],
    if 0 ≤ X₁ ≤ M₁; 0 ≤ X₂ ≤ M₂

n₁((1 - X₁)/(1 - M₁))^(n₁-1) n₂(X₂/M₂)^(n₂-1) [1 + α(-1 + 2(1 - M₁)((1 - X₁)/(1 - M₁))^(n₁))(1 - 2M₂(X₂/M₂)^(n₂))],
    if M₁ ≤ X₁ ≤ 1; 0 ≤ X₂ ≤ M₂                                        (24)

n₁(X₁/M₁)^(n₁-1) n₂((1 - X₂)/(1 - M₂))^(n₂-1) [1 + α(1 - 2M₁(X₁/M₁)^(n₁))(-1 + 2(1 - M₂)((1 - X₂)/(1 - M₂))^(n₂))],
    if 0 ≤ X₁ ≤ M₁; M₂ ≤ X₂ ≤ 1

n₁((1 - X₁)/(1 - M₁))^(n₁-1) n₂((1 - X₂)/(1 - M₂))^(n₂-1) [1 + α(-1 + 2(1 - M₁)((1 - X₁)/(1 - M₁))^(n₁))(-1 + 2(1 - M₂)((1 - X₂)/(1 - M₂))^(n₂))],
    if M₁ ≤ X₁ ≤ 1; M₂ ≤ X₂ ≤ 1
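As a sanity check that (24) is a proper density, the following sketch (ours, not the chapter's code) assembles it from the STSP marginals (11)-(12) and sums it over a grid on the unit square; the total mass comes out numerically close to 1.

```python
def stsp_pdf(t, M, n):
    # STSP density, Eq. (11), for t in (0, 1)
    if t <= M:
        return n * (t / M) ** (n - 1)
    return n * ((1.0 - t) / (1.0 - M)) ** (n - 1)

def stsp_cdf(t, M, n):
    # STSP distribution function, Eq. (12), for t in (0, 1)
    if t <= M:
        return M * (t / M) ** n
    return 1.0 - (1.0 - M) * ((1.0 - t) / (1.0 - M)) ** n

def joint_pdf(x1, x2, M1, n1, M2, n2, alpha):
    # Eq. (24): FGM density (8) evaluated with STSP marginals
    return (stsp_pdf(x1, M1, n1) * stsp_pdf(x2, M2, n2)
            * (1.0 + alpha * (1.0 - 2.0 * stsp_cdf(x1, M1, n1))
                           * (1.0 - 2.0 * stsp_cdf(x2, M2, n2))))

def total_mass(M1, n1, M2, n2, alpha, steps=250):
    # Midpoint double sum of the joint density over the unit square
    h = 1.0 / steps
    s = 0.0
    for i in range(steps):
        x1 = (i + 0.5) * h
        for j in range(steps):
            x2 = (j + 0.5) * h
            s += joint_pdf(x1, x2, M1, n1, M2, n2, alpha) * h * h
    return s

mass = total_mass(0.8, 3.0, 0.6, 3.0, 0.9)   # close to 1
```

The parameter values M₁ = 0.8, M₂ = 0.6, α = 0.9 are the ones used in Figures 3 and 4; n₁ = n₂ = 3 is our own choice for the sketch.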

Figure 3 represents the joint distribution function for two STSP marginals,
obtained by means of the FGM distribution function.
Figure 3. FGM joint distribution function with TSP marginals, M₁ = 0.8; M₂ = 0.6; α = 0.9

Figure 4. FGM joint density function with TSP marginals, M₁ = 0.8; M₂ = 0.6; α = 0.9
Figure 4 presents the FGM joint density function given the van Dorp
marginal functions. To carry out these representations, it is necessary to find
the value of α beforehand. Let us remember that the parameter α belongs to the
interval [-1, 1], so the cases α = -1 and α = 1 represent the maximum degrees
of negative and positive dependence, respectively, allowed in the FGM
family. The parameter α is thus associated with measurements of dependence,
and that is why these are used for the calculation of this parameter.
The basic measurement of linear dependence between two variables X₁ and
X₂ is the covariance:

cov(X₁, X₂) = E[(X₁ - E(X₁))(X₂ - E(X₂))]    (25)

In order that this measurement be independent of the units in which the
variables are expressed, the covariance is divided by the product of the standard
deviations. This is the widely known correlation coefficient:

ρ = cov(X₁, X₂) / (σ(X₁)σ(X₂))    (26)

This coefficient has been the basic measurement of linear dependence for
more than 100 years. Many other measurements have been proposed during the
20th century to calculate positive or negative dependence, for example
Spearman's coefficient, Kendall's tau, Blomquist's q coefficient and Hoeffding's
Δ. Specifically, the correlation coefficient has been obtained for different
families of FGM distributions introduced in the literature: it takes the values
α/4, α/π and 0.281α for exponential, normal and Laplace marginals,
respectively. See Table 2.

Table 2. Correlation coefficient values given the different marginal distributions in the FGM family

Marginals:     Normal   Uniform   Exponential   Laplace
Correlation:   α/π      α/3       α/4           0.281α

In these distributions it is not necessary to consider the values of the
parameters of the marginal distributions in order to calculate the value of the
coefficient. Nevertheless, this is not the case for FGM distributions with TSP
marginals, since a functional dependence on the parameters of the distribution
exists. According to Hoeffding (1940):
cov(X₁, X₂) = ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} (F(X₁, X₂) - F(X₁)F(X₂)) dX₁ dX₂    (27)

Using (7) and (27) we obtain:

corr(X₁, X₂) = α h_{X₁} h_{X₂} / (σ(X₁)σ(X₂))    (28)

where:

h_{Xᵢ} = ∫₀¹ F(Xᵢ)[1 - F(Xᵢ)] dXᵢ    (29)

Finally, substituting expression (12) into (29), and using (28) and (14),
we obtain the following expression for the correlation coefficient:

r(X₁, X₂, M₁, M₂, n₁, n₂) = α ∏_{i=1,2} h_{Xᵢ}/σ(Xᵢ)
  = α ∏_{i=1,2} [ (Mᵢ² + (1 - Mᵢ)²)/(nᵢ + 1) - (Mᵢ³ + (1 - Mᵢ)³)/(2nᵢ + 1) ] (nᵢ + 1)√(nᵢ + 2) / √(nᵢ - 2(nᵢ - 1)Mᵢ(1 - Mᵢ))    (30)
Thus, once the correlation coefficient is known, we can find the parameter α
by solving for it in the previous expression. To find the correlation coefficient,
we will pose the relation between index 1 and index 2 as a basic regression
problem, so that one of the indexes will be the explanatory variable (X) and the
other one the explained variable (Y):

Y = β₀ + β₁X + u

We have three observations (the maximum, the most probable and the
minimum value) for each of the indexes. These values must be standardized.
We then proceed with the calculations to obtain the estimation of the parameters
and the correlation coefficient:

y = ( 0  )        X = ( 1  0  )
    ( M₁ )            ( 1  M₂ )
    ( 1  )            ( 1  1  )

Next, we find the parameters with the well-known formula:

β̂ = (X′X)⁻¹X′y

where:
(X′X)⁻¹ = 1/(2(M₂² - M₂ + 1)) · ( M₂² + 1     -(M₂ + 1) )
                                 ( -(M₂ + 1)   3         )

X′y = ( M₁ + 1   )
      ( M₂M₁ + 1 )

Solving, we obtain:

β̂₀ = [(M₂² + 1)(M₁ + 1) - (M₂ + 1)(M₂M₁ + 1)] / [2(M₂² - M₂ + 1)] = (M₁ + M₂² - M₂ - M₁M₂) / [2(M₂² - M₂ + 1)]

β̂₁ = [-(M₂ + 1)(M₁ + 1) + 3(M₂M₁ + 1)] / [2(M₂² - M₂ + 1)] = (2M₁M₂ - M₁ - M₂ + 2) / [2(M₂² - M₂ + 1)]

The expressions for the variances and the covariance can be obtained from
Table 3:

Table 3. Calculations

Y    X    y - ȳ            x - x̄            (y - ȳ)(x - x̄)
0    0    -(M₁ + 1)/3      -(M₂ + 1)/3      (M₁ + 1)(M₂ + 1)/9
M₁   M₂   (2M₁ - 1)/3      (2M₂ - 1)/3      (2M₁ - 1)(2M₂ - 1)/9
1    1    (2 - M₁)/3       (2 - M₂)/3       (2 - M₁)(2 - M₂)/9

With regard to the variance of X and the variance of Y:

var(Y) = (1/27)[(M₁ + 1)² + (2M₁ - 1)² + (2 - M₁)²] = (2/9)(M₁² - M₁ + 1)    (31)

var(X) = (2/9)(M₂² - M₂ + 1)    (32)

Finally, the correlation coefficient is given by:

ρ = [(2M₁M₂ - M₁ - M₂ + 2)/9] / √(var(X)var(Y)) = (2M₁M₂ - M₁ - M₂ + 2) / [2√((M₁² - M₁ + 1)(M₂² - M₂ + 1))]    (33)
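The closed form (33) can be cross-checked against a direct computation of the Pearson coefficient on the two standardized triples. The sketch below is our own illustration:

```python
def rho_closed_form(M1, M2):
    # Correlation coefficient of Eq. (33)
    num = 2 * M1 * M2 - M1 - M2 + 2
    den = 2 * ((M1**2 - M1 + 1) * (M2**2 - M2 + 1)) ** 0.5
    return num / den

def rho_direct(M1, M2):
    # Population Pearson coefficient over the points (0,0), (M1,M2), (1,1)
    y, x = [0.0, M1, 1.0], [0.0, M2, 1.0]
    my, mx = sum(y) / 3, sum(x) / 3
    cov = sum((a - my) * (b - mx) for a, b in zip(y, x)) / 3
    vy = sum((a - my) ** 2 for a in y) / 3
    vx = sum((b - mx) ** 2 for b in x) / 3
    return cov / (vy * vx) ** 0.5
```

For example, with the standardized modes of the later practical application, M₁ = 0.3 and M₂ = 0.375, both routes agree; when M₁ = M₂ the three points are collinear and the coefficient equals 1.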
If we represent this function in space we obtain Figure 5.

Figure 5. Representation of the correlation coefficient between index 1 and index 2 under
uncertainty

Once the correlation coefficient is known, it is possible to substitute it into
expression (30) and obtain the value of α. Nevertheless, as can be observed
in Figure 5, the correlation coefficient varies between 0.4986
and 0.9016, and this gives values of α in the interval (1.55, 3.16).
This fact implies that expressions (7) and (8) cannot be used, since the
FGM family is only applicable for values of α between -1 and 1.
In the section dedicated to the presentation of this family we warned of the
disadvantages when the values of α are outside the interval (-1, 1), and we
concluded that the FGM model is adequate for variables with moderate or small
dependence. Nevertheless, in valuation under uncertainty we only have the
information contributed by the expert, and this makes the correlation between the
indexes very strong.

6. A solution
Under uncertainty we only have very limited information, and this makes
the measures of correlation conclude the existence of a high correlation
between the variables. Considering that the parameter α is related to the
measures of correlation, it is possible to affirm that the strong existing
correlation under uncertainty involves values of α outside the interval (-1, 1). This
entails the impossibility of applying the FGM family in these cases.
The basic problem is the absence of observations for each index, but if we
consider the parameter n as the number of times that the mode is observed,
so that n₁ is the number of times that the mode of index 1 has been
observed and n₂ the number of times that the mode of the second index has been
observed, we would possess a total of observations of:
(n₁ + 2)(n₂ + 2) = n₁n₂ + 2n₁ + 2n₂ + 4

Hereby we have gone from having three observations for each index to having
n₁ + 2 for the first index and n₂ + 2 for the second index (see Figure 6). The
intention of this is to avoid high values of the correlation coefficient as a
consequence of the absence of information. Nevertheless, the result is that the
correlation coefficient will have a null value for every value of n₁ and n₂. It is
proposed to omit some of the observations, and it seems logical to
eliminate the one in which index 1 takes the optimistic value whereas
index 2 gives us the pessimistic value, and vice versa, since these are extreme
cases that under a supposition of correlation between the indexes would not be
possible. Hereby, the number of observations becomes:

n₁n₂ + 2n₁ + 2n₂ + 2    (34)

Then,

var(I₁) = [M₁²(2n₁n₂² + 6n₁n₂ + 4n₁) - 2M₁(n₂ + 1)(n₁n₂ + 2n₁) + n₂² + n₁n₂² + 3n₁n₂ + 2n₁ + 2n₂ + 1] / (n₁n₂ + 2n₁ + 2n₂ + 2)²    (35)

var(I₂) = [M₂²(2n₂n₁² + 6n₁n₂ + 4n₂) - 2M₂(n₁ + 1)(n₁n₂ + 2n₂) + n₁² + n₂n₁² + 3n₁n₂ + 2n₂ + 2n₁ + 1] / (n₁n₂ + 2n₁ + 2n₂ + 2)²    (36)

cov(I₁, I₂) = [(M₁ + M₂ - 2M₁M₂)n₁n₂ + (n₁ + n₂ + 1)] / (n₁n₂ + 2n₁ + 2n₂ + 2)²    (37)
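Formulas (35)-(37) can be verified by rebuilding the proposed sample explicitly: each index contributes its mode nᵢ times plus the two extremes, all pairs are formed, and the two opposite-extreme pairs (0, 1) and (1, 0) are removed. The sketch below is our own check; it uses a compact but algebraically equivalent factorization of the numerators of (35)-(36).

```python
def sample(M1, n1, M2, n2):
    xs = [0.0] + [M1] * n1 + [1.0]   # index 1: extremes plus n1 copies of the mode
    ys = [0.0] + [M2] * n2 + [1.0]   # index 2 likewise
    pairs = [(x, y) for x in xs for y in ys]
    pairs.remove((0.0, 1.0))         # drop the two opposite-extreme pairs
    pairs.remove((1.0, 0.0))
    return pairs

def direct_cov(pairs):
    N = len(pairs)
    mx = sum(x for x, _ in pairs) / N
    my = sum(y for _, y in pairs) / N
    return sum((x - mx) * (y - my) for x, y in pairs) / N

def direct_var_first(pairs):
    N = len(pairs)
    mx = sum(x for x, _ in pairs) / N
    return sum((x - mx) ** 2 for x, _ in pairs) / N

def cov_formula(M1, n1, M2, n2):
    # Eq. (37)
    N = n1 * n2 + 2 * n1 + 2 * n2 + 2
    return ((M1 + M2 - 2 * M1 * M2) * n1 * n2 + n1 + n2 + 1) / N**2

def var_formula(M, n_own, n_other):
    # Eq. (35)/(36) with the numerator factorized
    N = n_own * n_other + 2 * n_own + 2 * n_other + 2
    return (n_own * (n_other + 1) * (n_other + 2) * (2 * M * M - 2 * M + 1)
            + (n_other + 1) ** 2) / N**2

pairs = sample(0.3, 3, 0.375, 3)
```

With n₁ = n₂ = 3 and the standardized modes of the practical application, the direct sample statistics match the closed forms.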

We have increased the number of observations with regard to the initial
problem while avoiding a null value of the correlation coefficient. Hereby,
when the correlation coefficient takes a value in the range (-1/3, 1/3), we will
be able to apply the FGM distribution family.

Figure 6. Graph of the proposed solution, in which the parameter n is considered as the number of
times that the mode of the index is observed
In Figure 7 and Figure 8 it can be observed that, applying the proposed
solution, the correlation coefficient will always be less than 0.3, so the necessary
condition to use the FGM distribution family in the case of uncertainty is
fulfilled.
We know that:

ρ = cov(I₁, I₂) / √(var(I₁)var(I₂))

Using the previous expression we obtain Figure 7, in which the
correlation coefficient is represented using expressions (35), (36) and (37).

Figure 7. Representation of the correlation coefficient using the proposed solution


Figure 8. Detail of the representation of the correlation coefficient using the proposed solution

7. A valuation method
Up to now, the procedure to calculate a joint distribution function when the
TSP marginals are known has been developed, in the case of uncertainty as well
as under risk, and an expression for it has been achieved. In addition, a
mathematical formulation of the above-mentioned procedure has been given.
The next step is to present the valuation procedure, which consists of, given
values (x₀, y₀) of indexes 1 and 2, respectively, calculating the value of the asset
for those values.
First, the value F₀ = F(x₀, y₀) is calculated. Then, two possibilities are
presented depending on whether or not F₀ is greater than the standardized mode
of the asset (M), and the final value of the asset will depend on this. These
possibilities are defined in expressions (38) and (39):

Branch 1: If F₀ ≤ M then:

M(V/M)^n = F₀  ⇒  V = M(F₀/M)^(1/n)    (38)
Branch 2: If F₀ > M then:

1 - (1 - M)((1 - V)/(1 - M))^n = F₀  ⇒  V = 1 - (1 - M)((1 - F₀)/(1 - M))^(1/n)    (39)
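The two branches simply invert the STSP distribution function (12) at F₀. A sketch of this valuation step (our illustration; the function and variable names are ours):

```python
def stsp_cdf(t, M, n):
    # STSP distribution function, Eq. (12)
    if t <= M:
        return M * (t / M) ** n
    return 1.0 - (1.0 - M) * ((1.0 - t) / (1.0 - M)) ** n

def asset_value(F0, M, n):
    # Branch 1, Eq. (38), when F0 <= M; branch 2, Eq. (39), otherwise
    if F0 <= M:
        return M * (F0 / M) ** (1.0 / n)
    return 1.0 - (1.0 - M) * ((1.0 - F0) / (1.0 - M)) ** (1.0 / n)

V1 = asset_value(0.2, 0.428, 3.0)   # branch 1 (F0 below the standardized mode)
V2 = asset_value(0.7, 0.428, 3.0)   # branch 2 (F0 above the standardized mode)
```

Both values round-trip through Eq. (12): F(V) recovers F₀ exactly, which is all the branches assert.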

When the application is developed under uncertainty, it will be necessary to
carry out the valuation procedure for each of the different subfamilies of the
TSP distribution, since the parameter n has a complex interpretation and it is not
a known piece of information. Nevertheless, under risk it is possible to estimate the
parameter n, clearly defining the TSP distribution of each one of the indexes.
Now, two practical applications will be presented: first, under uncertainty
and, later, under risk.

8. Practical application of the MTDF with van Dorp and Kotz's
marginals in an uncertainty environment
Table 4 contains the optimistic, most probable and pessimistic values
provided by the expert for the asset and the two indexes. In addition, we have
information about reference values of the indexes. The standardized values are
presented in brackets.

Table 4. Inputs for the practical application

           a        m          b
Assets     110      140        180
           (0)      (0.428)    (1)
Index 1    200      230        300
           (0)      (0.3)      (1)
Index 2    90       120        170
           (0)      (0.375)    (1)

As commented in the previous section, the parameter n has a difficult
interpretation, and the question that should be asked to the expert to obtain its
value has not been defined, if such a question even exists. For this reason, in this
practical application the different values of n are calculated for each of the TSP
distribution subfamilies, using equations (16), (17), (18) and (19). The
above-mentioned values are contained in Table 5:
Table 5. Values of n

n          Mesokurtic   Constant variance   Classic
Assets     3.31885      2.7926              3.02344
Index 1    3.09726      2.93465             3.02344
Index 2    3.23222      2.83454             3.02344
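The constant-variance column of Table 5 can be reproduced by solving the cubic (16) at each standardized mode M of Table 4. The bisection sketch below is our own; the bracket [1, 10] is an assumption that holds here, since the left-hand side of (16) is negative at n = 1 and positive at n = 10.

```python
def cubic_lhs(M, n):
    # Left-hand side of Eq. (16)
    return (n**3 + 4*n**2 + (-72*M**2 + 72*M - 31)*n
            + (72*M**2 - 72*M + 2))

def solve_n(M, lo=1.0, hi=10.0, iters=80):
    # Bisection for the root n > 1 of Eq. (16)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cubic_lhs(M, lo) * cubic_lhs(M, mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

n_assets = solve_n((140 - 110) / (180 - 110))   # M ~ 0.4286
n_index1 = solve_n((230 - 200) / (300 - 200))   # M = 0.3
n_index2 = solve_n((120 - 90) / (170 - 90))     # M = 0.375
```

The three roots agree with the constant-variance column of Table 5 to the quoted precision.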

Once the values of the parameters n and M are known, it is possible to
substitute them into expression (23) and obtain the joint distribution function.
Analogously, substituting into expression (24), the joint density function can be
obtained.
Table 6 contains the values of the correlation coefficient as well as the
parameter alpha for each of the subfamilies referred to in the practical case:

Table 6. Outputs of the practical application

                          Mesokurtic    Constant variance
Correlation coefficient   0.253233397   0.2624621
Alpha                     0.43737487    0.453333

It can be observed that, in this case, the correlation coefficient is always
lower than 1/3, and therefore we will be able to apply the FGM distribution
family to obtain the joint distribution function with TSP marginals since, when
the correlation coefficient is lower than the above-mentioned value, we will
obtain values of alpha in the interval (-1, 1), as this distribution family
requires.
Finally, expressions (38) and (39), which define the two possible branches to
calculate the value of the asset once the two values of both indexes are known,
are applied. Figures 9 and 10 present the functions of the asset value
for each of the TSP distribution subfamilies.
Figure 9. Function of the asset value for the constant variance subfamily

Figure 10. Function of the asset value for the mesokurtic subfamily

9. Conclusions
1. Introducing two indexes in the method of the two distribution functions
under uncertainty requires the construction of a copula and the definition
of the marginal distributions. We use the STSP distribution as the
marginal distribution and the FGM copula.
2. In the FGM copula the parameter α varies within (-1, 1), which implies that
the correlation coefficient varies within (-1/3, 1/3).
3. Under uncertainty a high correlation exists between the indexes, due to
the absence of observations for each index. As a consequence, the FGM
copula could not be used, because it is restricted to the weak
correlation case.
4. We propose to consider the parameter n as the number of times that the
mode is observed. In this way, we have nᵢ + 2 observations for each
index Iᵢ, and we avoid high values of the correlation coefficient as a
consequence of the absence of information. The result, however, is a null
value of the correlation coefficient for every value of nᵢ.
5. It is proposed to omit extreme cases that, under a supposition of
correlation between the indexes, would not be possible. It is shown that in
this way the correlation coefficient will always be less than 0.3, so
the necessary condition to use the FGM distribution family is
fulfilled. The valuation method is shown in the practical application.

References
1. Athanassoulis, G.A., Skarsoulis, E.K. and Belibassakis, K.A. (1994).
Bivariate distributions with given marginals with an application to wave
climate description. Applied Ocean Research, 16, 1-17.
2. Alonso, R. and Lozano, J. (1985). El metodo de las dos funciones de
distribution: Una aplicacion a la valoracion de fincas agricolas en las
comarcas Centra y Tierra de Campos (Valladolid). Anales del INIA,
Economia, 9, 295-325.
3. Ballestero, E. (1971). Sobre la valoracion sintetica de tierras y un nuevo
metodo aplicable a la concentration parcelaria. Revista de Economia
Politica, 225-238.
4. Ballestero, E. (1973). Nota sobre un nuevo metodo rapido de valoracion.
Revista de Estudios Agrosociales, 85, 75-78.
5. Ballestero, E. and Caballer, V. (1982). II metodo delle due beta. Un
procedimiento rapido nella stima dei beni fondiari. Genio Rurale, 6, 33-36.
6. Ballestero, E. and Rodriguez, J.A. (1999). El precio de los inmuebles
urbanos. CIE Inversiones Editoriales DOSSAT 2000.
7. Barnett, V. (1980). Some bivariate uniform distributions.
Communications in Statistics, A9, 453-461.
8. Caballer, V. (1994). Metodos de valoracion de empresas. Ediciones
Piramide, S.A. 101-104.
9. Caballer, V. (198). Valoracion agraria. Teoria y practica. Ediciones
Mundi-Prensa. 4th edition.
10. Caballer, V. (1999). Valoracion de arboles, frutales, forestales,
medioambientales, ornamentales. Ediciones Mundi-Prensa.
11. Cañas, J.A., Domingo, J. and Martinez, J.A. (1994). Valoracion de tierras en
las campinas y la Subetica de la provincia de Cordoba por el metodo de las
funciones de distribucion. Investigacion Agraria. Serie Economia, 9, 447-467.
12. Conway, D.A. (1979). Multivariate Distributions with specified Marginals.
Technical Report no. 145, Stanford University, Dept. of Statistics.
13. Cook, R.D. and Johnson, M.E. (1981). A family of distributions for
modelling non-elliptically symmetric multivariate data. Journal of the Royal
Statistical Society, Series B, 43, 210-218.
14. Dall'Aglio, G., Kotz, S. and Salinetti, G. (1991). Advances in Probability
Distributions with Given Marginals. Kluwer Academic Publishers.
15. Eyraud, H. (1936). Les principes de la mesure des correlations. Ann. Univ.
Lyon, Sect. A, 1, 30-47.
16. Farlie, D.J.G. (1960). The performance of some correlation coefficients for a
general bivariate distribution. Biometrika, 47, 307-323.
17. Frank, M.J. (1979). On the simultaneous associativity of F(x,y) and
x+y-F(x,y). Aequationes MATH, 19, 194-226.
18. Frechet, M. (1951). Sur les tableaux de correlation dont les marges sont
donnees. Annales de l'Universite de Lyon, Ser. 3, 14, 53-77.
19. Garcia, J., Cruz, S. and Andujar, A.S. (1999). II metodo delle due funzioni
di distribuzione: II modello triangolare. Una revisione. Genio Rurale, 11,
3-8.
20. Garcia, J., Cruz, S. and Garcia L.B. (2002a). Generalization del Metodo de
las dos funciones de distribucion (MTDF) a familias betas determinadas con
los tres valores habituales. Analisis, Seleccion y Control de Proyectos y
Valoracion. Servicio de publicaciones de la Universidad de Murcia.
21. Garcia, J., Cruz, S. and Garcia, L.B. (2002b). Regresion a traves de las
funciones de Distribucion. Actas de la XVI Reunion Asepelt-Espaiia,
Madrid.
22. Garcia, J., Cruz, S. and Garcia, L.B. (2002c). Iterative valuation process in
the method of the two beta distributions. Spanish Journal of Agricultural
Research, 2(1).
23. Garcia, J. Cruz, S. and Garcia, L.B. (2004a). La STSP como distribucion
subyacente en el ambito del PERT. Capitulo de libro: Aspectos teoricos y
aplicados en la generation de distribuciones de probabilidad. ISBN:
84-931950-8-1.
24. Garcia, J. Cruz, S. and Garcia, L.B. (2004b). Proceso iterativo de valoracion
en el metodo de las dos betas. Programacion, seleccion, control y valoracion
de proyectos. Capitulo 3. 37-65. Editado por Universidad de Granada.
ISBN: 84-338-3108-9.
25. Garcia, J., Cruz, S. and Garcia, L.B. (2005). The two-sided power
distribution for the treatment of uncertainty. Statistical Methods &
Applications, 4(5), 209-222.
26. Garcia, J., Cruz, S. and Rosado, Y. (2000). Las funciones de distribucion
multivariantes en la teoria general de valoracion. Actas de la XIV Reunion
Asepelt-Espafia, Oviedo (publicacion en CD-Rom).
27. Garcia, J., Cruz, S. and Rosado, Y. (2002). Extension multi-indice del
metodo beta en valoracion agraria. Economia Agraria y Recursos Naturales,
2(2), 3-26.
28. Garcia, J. and Garcia, L.B. (2003). Teoria General de valoracion. Metodo de
las dos funciones de distribucion. ISBN 84 95979 09 8.
29. Garcia, J., Trinidad, J.E. and Garcia, L.B. (2004). Valoracion por el metodo
de las dos funciones de distribucion: Como seleccionar la mejor
distribucion. XVIII Reunion ASEPELT 84-60947165-.
30. Garcia, J., Trinidad, J.E. and Gomez, J. (1999). El metodo de las dos
funciones de distribucion: la version trapezoidal. Revista Espanola de
Estudios Agrosociales y Pesqueros, 185, 57-80.
31. Garcia, J., Trinidad, J.E. and Sanchez, M. (1997). Seleccion de una cartera
de cultivos: el principio primero la seguridad de Roy. Investigation Agraria.
Serie Economia, 12(1,2,3), 425-445.
32. Genest, C. and Mackay J. (1986). The joy of copulas: Bivariate distributions
with uniform marginals. The American Statistician, 40(4), 280-283.
33. Guadalajara, N. (1996). Valoracion Agraria. Casos Practicos. Ediciones
Mundi-Prensa.
34. Gumbel, E.J. (1960). Bivariate exponential distributions. Journal of the
American Statistical Association, 55, 698-707.
35. Gumbel, E.J. (1961). Bivariate logistic distributions. Journal of the
American Statistical Association, 55, 335-349.
36. Herrerias, R., Garcia, J., Cruz, S. and Herrerias Velasco, J.M. (2001). Il
modello probabilistico trapezoidale nel metodo delle due distribuzioni della
teoria generale delle valutazioni. Genio Rurale. Estimo e Territorio. Rivista di
Scienze Ambientali, ANNO LXIV, 4, 3-9.
37. Herrerias, R., Palacios, F., Callejon, J. and Perez, E. (2001). Un metodo
para contrastar la bondad de un experto en la metodologia PERT.
Programacion, seleccion y control de proyectos en ambiente de incertidumbre.
38. Herrerias, R., Callejon, J., Perez, E. and Herrerias, J.M. (2001). Las familias
de distribuciones beta de varianza constante y mesocurticas en el metodo
PERT. Programacion, seleccion y control de proyectos en ambiente de
incertidumbre.
39. Herrerias Velasco, J.M. (2002). Avances en la teoria general de valoracion
en ambiente de incertidumbre. Tesis Doctoral.
40. Hoeffding, W. (1940). Masstabinvariante Korrelationstheorie. Schriften
des Mathematischen Instituts und des Instituts fur Angewandte Mathematik
der Universitat Berlin, 5, 181-233.
41. Johnson, M.E. and Tenenbein, A. (1981). A bivariate distribution family
with specified marginals. Journal of the American Statistical Association,
76(373), 198-201.
42. Johnson, M.E. (1987). Multivariate Statistical Simulation. New York: John
Wiley.
43. Kimeldorf, G. and Sampson, A.R. (1975). One-parameter families of
bivariate distributions with fixed marginals. Communications in Statistics,
4,293-301.
44. Kotz, S. and Johnson, N.L. (1977). On some generalized Farlie-Gumbel-
Morgenstern distributions II: Regression, correlation and further
generalization. Communications in Statistics, 6, 415-427.
45. Kotz, S. and Drouet, M. (2001). Correlation and dependence. Imperial
College Press.
46. Levy, P. (1950). Distance de deux variables aleatoires et distance de deux
lois de probabilite, in Generalities sur les probabilites. Elements aleatoires
by M. Frechet, Gauthier-Villars, Paris.
47. Mardia, K.V. (1967). Some contributions to contingency-type bivariate
distributions. Biometrika, 54, 235-249.
48. Marshall, A.W. and Olkin, I. (1988). Families of multivariate distributions.
Journal of American Statistical Association, 83, 803-806.
49. Morgenstern, D. (1956). Einfache Beispiele zweidimensionaler
Verteilungen. Mitteilungsblatt für Mathematische Statistik, 8, 234-235.
50. Nataf, A. (1962). Determination des distributions de probabilites dont les
marges sont donnees. Comptes Rendus de l'Academie des Sciences, 255, 42-43.
51. Palacios, F., Callejon, J. and Herrerias, J.M. (2000). Fundamentos
probabilisticos del Metodo de Valoracion de las dos distribuciones. Actas
de la XIV Reunion Asepelt-Espana, Oviedo (publicacion en CD-Rom).
52. Plackett, R.L. (1965). A class of bivariate distributions. Journal of the
American Statistical Association, 60, 516-522.
53. Pompilj, G. (1984). Le variabili casuali, Ist. Calcolo Probab. Univ. Roma.
54. Romero, C. (1977). Valoracion por el metodo de las dos distribuciones beta:
Una extension. Revista de Economia Politica, 75, 47-62.
55. Sibuya, M. (1960). Bivariate extreme statistics. Annals of the Institute of
Statistical Mathematics, 19, 195-210.
56. Van Dorp, J.R. and Kotz, S. (2002a). A novel extension of the triangular
distribution and its parameter estimation. The Statistician, 51(1), 63-79.
57. Van Dorp, J.R. and Kotz, S. (2002b). The standard two sided power
distribution and its properties: With applications in financial engineering.
The American Statistician, 56(2), 90-99.
58. Van Dorp, J.R. and Kotz, S. (2003). Generalizations of two sided power
Distributions and their convolution. Communications and Statistics: Theory
and Method, 32(9).
Chapter 3
VALUATION METHOD OF THE TWO SURVIVAL FUNCTIONS

M. FRANCO-NICOLAS
Dpto. Estadistica e Investigacion Operativa, Universidad de Murcia
Campus de Espinardo, Murcia, 30100, Spain

R. HERRERIAS-PLEGUEZUELO
Department of Quantitative Methods in Economics, University of Granada
Campus de Cartuja s/n. Granada, 18071, Spain

J. CALLEJON-CESPEDES
Department of Quantitative Methods in Economics, University of Granada
Campus de Cartuja s/n. Granada, 18071, Spain

J.M. VIVO-MOLINA
Dpto. Metodos Cuantitativos para la Economia, Universidad de Murcia
Campus de Espinardo, Murcia, 30100, Spain

In this paper, we discuss a new application of the survival functions in asset pricing from
quality indexes. Thus, we propose the valuation method based on the two survival
functions (VMTS) to find, under uncertainty, the market value from a quality index.
Within this framework, for a one-dimensional quality index, the VMTS is equivalent to the
valuation method of the two distribution functions (VMTD). For a multidimensional quality
index, the VMTD produces loss with respect to the assessments from each component,
whereas the VMTS provides profit with respect to these assessments. Finally, we motivate
the use of the VMTS, as a tool for the valuation of an asset, through a practical
application on land pricing.

1. Introduction
In the literature, the survival or reliability measures have been widely used
in many areas of economics, in political science, in biology, and in industrial
engineering. In particular, many interesting results of reliability theory have
been applied in risk analysis, and their properties have interesting qualitative
implications in these fields (see, e.g. Bagnoli and Bergstrom (2005) and the
references therein).


One of these measures is the well-known survival function, also called
decumulative distribution function by Yaari (1987), wherein the dual theory of
choice under risk is introduced. This paper proposes a new application of this
measure in asset pricing from quality indexes.
The asset pricing, under uncertainty, is often analyzed by econometric
modelling and hedonic price indexes (see, e.g. Banerjee et al. (2004), Deltas and
Zacharias (2004) and Benkard and Bajari (2005) and the references therein) as
improvements of the classical synthetic method, but the weakness of these
techniques is known in the absence of data. In particular, the valuation method
based on the two distribution functions (VMTD) has been used to find the
market value of an asset; it was introduced by Ballestero (1971) as the valuation
method of the two beta distributions, and extended by Caballer (1975) and
Romero (1977), while Alonso and Lozano (1985) showed the practical utility of
this methodology when only small samples are available.
Specifically, the beta distribution has been suggested as a rough model in
the absence of data, e.g. Law and Kelton (1982), for problems of assessment of
risk and uncertainty, such as the program evaluation and review technique
(PERT) and the appraisal of an asset from a quality index. Thus, Berny (1989)
proposed a distribution more complicated than the beta model, although its
parameters are more intuitive, and Williams (1992) and Johnson (1997)
suggested the use of a triangular distribution as a simpler distribution than the
beta model, which only requires three parameters (pessimistic, optimistic and
most likely).
In this setting, the VMTD allows one to appraise an asset under uncertainty
when only the pessimistic, optimistic and most likely values are available to the
appraiser, which may be supplied by expert judgement. Besides, it reduces to the
classical synthetic method when both the market value and the quality index
follow uniform distributions, and it has been used in several applications, such
as the valuation of real estates, irrigation, trees, business, etc.
In recent years, some authors have paid more attention to the study and
generalization of probability models required in PERT methodology and
valuation theory (see, e.g. Williams (1992), Johnson (1997), Johnson and Kotz
(1999), Herrerias et al. (2001), Herrerias (2002), van Dorp and Kotz (2002a),
(2002b) and (2003), Garcia and Garcia (2003) and Herrerias et al. (2003)), as
well as in analysis and development of the VMTD (see, e.g. Garcia et al. (1999),
Cruz et al. (2002), Garcia et al. (2002), Herrerias (2002) and Garcia and Garcia
(2003)).
In particular, the VMTD has been extended to the case in which greater
information through more than one quality index of the asset is considered (see, e.g. Garcia
et al. (2002), Herrerias (2002) and Garcia and Garcia (2003)). Unfortunately, the
VMTD produces loss with respect to the assessments from each component of a
multidimensional quality index, so weights among the components are often
used to adjust the asset pricing. Therefore, we consider that the new
valuation method based on the two survival functions (VMTS) might help to
appraise an asset under uncertainty from a quality index, even more so when the
dimension is reduced by unobserved components of quality.
The purpose of this paper is to establish the theoretical framework of a new
valuation method and exhibit its practical application. To that end, we study this
new technique, based on the two survival functions corresponding to the two
probability models, providing an in-depth explanation of the principles
underlying the analysis of the economic value of an asset by means of the
VMTS and its comparison with the VMTD.
In Section 2, the VMTD is briefly introduced and the new VMTS is given
in order to value an asset, under uncertainty, from a one-dimensional quality
index. Section 3 analyzes the VMTS to find the assessment from a
bidimensional or multidimensional quality index, wherein the differences
between both methods are shown when greater information through more than
one quality index of an asset is made available. Likewise, in Section 4, the use of the
VMTS, as a tool for the valuation of an asset, is motivated through a practical
application on land pricing, and finally, we provide some concluding remarks in
Section 5.

2. Valuation method of the two survival functions


In this section, we introduce a new viewpoint in valuation theory to appraise
an asset, under uncertainty, from a one-dimensional quality index, and we
establish its relationship with the VMTD in this case. For that, in economic
modelling, it is usual to assume certain logical rules of the market. In particular,
when we attempt to obtain the market value of an asset from a quality index, the
following basic valuation principle is assumed: the asset with the greater quality
index has the greater market value, which may be stated as follows:
Let j and k be two assets, with i_j and i_k their values of the quality index
and v_j and v_k their market values, respectively. If i_j < i_k then v_j < v_k.
Under this assumption, the valuation method of the two distribution
functions is based on the equality between the distribution function F_V of the
market value V of the asset and the distribution function F_I of its quality index
I. Thus, the market value of an asset with quality index I = i by the VMTD is
v_D = φ_D(i)    (1)
where φ_D = F_V^{-1} ∘ F_I.
In this sense, it is possible to consider other valuation techniques where the
basic principle holds. In particular, taking into account the survival function Sv
of the market value V of the asset and the survival function Sj of a quality
index of this asset, instead of their distribution functions, we can consider an
alternative method based on the equality of both survival functions, and
consequently, the market value of an asset with quality index I = i by this new
valuation method of the two survival function is
v5=#s(0 (2)
l
where 0S = Sy °Sj, and these survival functions are defined by
Sy(v) = l-Fy(v) and Sj(i) = \-Fj(i). Besides, this VMTS verifies the basic
valuation principle, since both survival functions are decreasing.
In order to compare the assessments obtained by Eqs. (1) and (2) through
both methods (VMTD and VMTS) from a one-dimensional quality index, we
give the following result.
Theorem 1. Let I = i be the value of the one-dimensional quality index, with
v_D its market value by the VMTD and v_S its assessment by the VMTS. Then v_D = v_S.
Proof. From Eqs. (1) and (2), it is immediate, since
v_S = S_V^{-1}(S_I(i)) = (1 - F_V)^{-1}(1 - F_I(i)) = F_V^{-1}(F_I(i)) = v_D.


Consequently, in the case of a one-dimensional quality index, the VMTS is
only an alternative viewpoint on the VMTD. However, in the next section, we will see
that these methodologies provide different results in the valuation of an asset
when more than one quality index is made available to the appraiser.
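As a quick numerical illustration of Theorem 1 (a sketch, not code from the paper), the following assumes triangular models, with hypothetical parameters, for both the quality index and the market value; computing Eqs. (1) and (2) yields the same appraisal:

```python
# Sketch of Eqs. (1)-(2); the triangular models and their parameters are
# illustrative assumptions, not taken from the paper.
def tri_cdf(x, a, m, b):
    """CDF F(x) of a triangular distribution on [a, b] with mode m."""
    if x <= a:
        return 0.0
    if x < m:
        return (x - a) ** 2 / ((b - a) * (m - a))
    if x < b:
        return 1.0 - (b - x) ** 2 / ((b - a) * (b - m))
    return 1.0

def tri_quantile(u, a, m, b):
    """Inverse CDF F^{-1}(u) of the same triangular distribution."""
    if u <= (m - a) / (b - a):
        return a + ((b - a) * (m - a) * u) ** 0.5
    return b - ((b - a) * (b - m) * (1.0 - u)) ** 0.5

INDEX = (1.0, 1.5, 3.0)     # pessimistic, most likely, optimistic (index I)
VALUE = (10.0, 14.0, 20.0)  # the same three parameters for the market value V

def vmtd(i):
    return tri_quantile(tri_cdf(i, *INDEX), *VALUE)   # Eq. (1)

def vmts(i):
    s_i = 1.0 - tri_cdf(i, *INDEX)                    # S_I(i)
    return tri_quantile(1.0 - s_i, *VALUE)            # S_V^{-1}(s) = F_V^{-1}(1 - s)

assert abs(vmtd(2.2) - vmts(2.2)) < 1e-9   # Theorem 1: v_D = v_S
```

Both functions also respect the basic valuation principle, since the triangular CDF and its inverse are increasing.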

3. VMTS from a multidimensional quality index


This section provides a comprehensive development of the VMTS when the
quality index of the asset is multidimensional, since professionals oftentimes
need to determine the value of an asset through a particular set of quality
indexes which affect this asset, i.e., using a bidimensional or multidimensional
quality index, whose components are each of the one-dimensional quality
indexes. For that, we assume the same basic valuation principle, the asset with
greater quality index has greater market value, where the ordering between two
vectors is determined by the orderings between the corresponding components
of both vectors.
In this context, we analyze the bidimensional case, and subsequently, we


will extend the results to the multidimensional quality index.

3.1.1. Bidimensional quality index


In this case, the basic valuation principle can be established as follows:
Let j and k be two assets, with (i_{1j}, i_{2j}) and (i_{1k}, i_{2k}) their values of the
quality index and v_j and v_k their market values, respectively. If
(i_{1j}, i_{2j}) < (i_{1k}, i_{2k}) then v_j < v_k.
Analogous to the one-dimensional case, the VMTD is based on the equality
between both distribution functions, F_V of the market value and F_I of the
bidimensional quality index, and so the appraisal of an asset with quality index
I = (i_1, i_2) by the VMTD is
v_D = φ_D(i_1, i_2)    (3)
where φ_D = F_V^{-1} ∘ F_I, which provides a market value of the asset lower than the
valuations obtained for each component of its quality index, i.e., a reduction or
loss in the appraisal of the asset when greater information through more than
one quality index is considered, since
v_D ≤ inf{v_1, v_2}    (4)
where v_1 and v_2 are the assessments of the asset through each one of the
components of the quality index, given by Eq. (1) from their marginal
distribution functions F_1 and F_2, respectively.
In this context, the use of an alternative valuation method, wherein the basic
valuation principle holds and which equals the VMTD in the one-dimensional
case, is more relevant if it allows us to solve the depreciation in the market
value of the asset when there is more than one quality index. Specifically, taking into
account the survival function S_V of the market value V of the asset and the
bivariate survival function S_I of its quality index, the VMTS is based on the
equality between both survival functions; consequently, the assessment of
an asset with quality index I = (i_1, i_2) by the VMTS is
v_S = φ_S(i_1, i_2)    (5)
where φ_S = S_V^{-1} ∘ S_I and the bivariate survival function is defined as
S_I(i_1, i_2) = P(I > (i_1, i_2))
which is determined by the bivariate distribution function of the quality index
and its marginal distributions
S_I(i_1, i_2) = F_I(i_1, i_2) - F_1(i_1) - F_2(i_2) + 1.    (6)
Likewise, this alternative methodology, which could be called the dual of the VMTD, is a
new viewpoint to deal with the market value of an asset with more than one quality
index, which does not lead to loss in the appraisal of the asset. Moreover, the
VMTS produces an appraisal of the asset higher than the valuations obtained for
each component of its quality index, i.e., an appreciation when more than one
quality index is made available to value the asset, as we prove in the next result.
Theorem 2. Let I = (i_1, i_2) be the value of the bidimensional quality index, with
v_1 and v_2 its market values by the VMTS from the components I_1 and I_2,
respectively. Let v_S be the assessment by the VMTS. Then v_S ≥ sup{v_1, v_2}.
Proof. Taking into account that every bivariate survival function satisfies the
inequalities
S_I(i_1, i_2) ≤ S_1(i_1) and S_I(i_1, i_2) ≤ S_2(i_2)
where S_j(i_j) = 1 - F_j(i_j), for j = 1, 2, are the marginal survival functions of
each component of the quality index, i.e.
S_I(i_1, i_2) ≤ inf{S_1(i_1), S_2(i_2)}
and the market values v_1 and v_2 given by Eq. (2) for each component, we have
the following inequality
v_S ≥ sup{v_1, v_2}
since S_V is decreasing, and so is S_V^{-1}.


Now, we establish the comparison between both methods from a
bidimensional quality index: when we have greater information through more
quality characteristics of the asset, the appraisals by the VMTS are greater than
the ones obtained by the VMTD.
Theorem 3. Let I = (i_1, i_2) be the value of the bidimensional quality index,
where v_D is its market value by the VMTD and v_S is its assessment by the VMTS.
Then v_D ≤ v_S.
Proof. Obvious from Theorem 2, the inequality (4) and the equivalence between
both methods in the one-dimensional case of Theorem 1.
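The chain of inequalities in Theorems 2 and 3 can be checked numerically. The sketch below is not from the paper: it assumes, purely for illustration, a market value uniform on [0, 1] (so F_V and S_V are identity maps) and a Farlie-Gumbel-Morgenstern joint model with uniform marginals for the two dependent components; the bivariate survival function is obtained through Eq. (6).

```python
def fgm_cdf(u, v, theta):
    """Farlie-Gumbel-Morgenstern joint CDF with uniform [0, 1] marginals."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def bivariate_survival(u, v, theta):
    # Eq. (6): S_I(i1, i2) = F_I(i1, i2) - F_1(i1) - F_2(i2) + 1
    return fgm_cdf(u, v, theta) - u - v + 1.0

theta = 0.5            # FGM dependence parameter, |theta| <= 1 (hypothetical)
i1, i2 = 0.6, 0.7      # observed values of the two components
v1, v2 = i1, i2        # per-component appraisals (uniform case of Eqs. (1)-(2))
v_d = fgm_cdf(i1, i2, theta)                    # VMTD, Eq. (3)
v_s = 1.0 - bivariate_survival(i1, i2, theta)   # VMTS, Eq. (5)

# Eq. (4) plus Theorems 2 and 3: v_D <= inf{v1, v2} <= sup{v1, v2} <= v_S
assert v_d <= min(v1, v2) <= max(v1, v2) <= v_s
```

The same check passes for any |theta| ≤ 1, since every copula is bounded above by its marginals.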

3.1.2. Multidimensional quality index


Let us now see the extension of the VMTS to the multidimensional quality
index I = (I_1, ..., I_n) with n ≥ 2, under the assumption of the basic valuation
principle, which may be stated as follows:
Let j and k be two assets, with (i_{1j}, ..., i_{nj}) and (i_{1k}, ..., i_{nk}) their values of
the quality index and v_j and v_k their market values, respectively. If
(i_{1j}, ..., i_{nj}) < (i_{1k}, ..., i_{nk}) then v_j < v_k.
So, from a value of the quality index I = (i_1, ..., i_n) of the asset, the VMTD
provides the assessment
v_D = φ_D(i_1, ..., i_n)    (7)
where φ_D = F_V^{-1} ∘ F_I, which is lower than the valuations obtained for each of its
components, since
v_D ≤ inf{v_1, ..., v_n}    (8)
where the v_j's are the assessments of the asset through the marginal distributions F_j
by Eq. (1), j = 1, ..., n.
Likewise, using the survival function S_V of the market value V of the asset
and the multivariate survival function S_I of its quality index, the assessment
obtained by the VMTS from I = (i_1, ..., i_n) is
v_S = φ_S(i_1, ..., i_n)    (9)
where φ_S = S_V^{-1} ∘ S_I.
Analogous to the bidimensional case, the VMTS solves the depreciation
undergone by the VMTD when more than one quality index is made available to
value the asset, and provides appraisals of the asset greater than the best
valuation obtained for each component of its quality index.
Theorem 4. Let I = (i_1, ..., i_n) be the value of the multidimensional quality
index, with v_j's its market values by the VMTS from the components I_j's,
respectively. Let v_S be the assessment by the VMTS. Then v_S ≥ sup{v_1, ..., v_n}.
Theorem 5. Let I = (i_1, ..., i_n) be the value of the multidimensional quality
index, where v_D is its market value by the VMTD and v_S is its assessment by the
VMTS. Then v_D ≤ v_S.
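A minimal n-dimensional sketch (again under illustrative assumptions, not the paper's models): with independent components, all uniform on [0, 1], and a uniform market value, F_I is the product of the marginal distribution functions and S_I the product of the marginal survival functions, so Eqs. (7)-(9) reduce to one line each.

```python
from math import prod

def vmtd_nd(indexes):
    # v_D = F_V^{-1}(F_I(i_1, ..., i_n)); here F_V is the identity and
    # independence makes F_I the product of the marginals, Eq. (7)
    return prod(indexes)

def vmts_nd(indexes):
    # v_S = S_V^{-1}(S_I(i_1, ..., i_n)) = 1 - prod of marginal survivals, Eq. (9)
    return 1.0 - prod(1.0 - i for i in indexes)

i = (0.5, 0.6, 0.7)
assert vmtd_nd(i) <= min(i)   # Eq. (8): depreciation of the VMTD
assert vmts_nd(i) >= max(i)   # Theorem 4: appreciation of the VMTS
```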

4. Practical application
In this section, we apply the new valuation method based on the two
survival functions to appraise agricultural plots, obtaining the assessments using
both valuation methods, wherein one can check the results established in the
former sections.
For that, we use the 2nd practical case of Guadalajara (1996), which studies
the valuation of an agricultural plot, used for growing grapes, in the
Vinalopo Medio region (Alicante, Spain). The quality indexes considered to
describe the market value (€/m²) are the gross production of grapes (kg/m²),
together with the percentage of sand in the soil of the plot.
Table 1 displays the minimum (pessimistic), maximum (optimistic)
and mode (most likely) values for each variable; the goal is to value a
plot of agricultural land with an area of 12010.3833 m², a gross production of
2.0399 kg/m² and a sand/soil content of 32%.

Table 1. Agricultural plots used for growing grapes

                                   Pessimistic   Optimistic   Most likely
V = Market value (€/m²)               0.8132       1.5012       1.0634
I_1 = Gross production (kg/m²)        1.5611       2.6019       1.8734
I_2 = Sand/soil content (%)           15           50           25

In order to apply the VMTS and compare its assessments of this agricultural
plot with the ones obtained by the VMTD from the bivariate
probability model of the quality index, we will consider that this quality index
follows a pyramidal model and that the market value has either a triangular or a
trapezoidal model, which are a sample of the different models that might be
considered both for the market value and for the bidimensional quality
index. Note that to choose the four basic trapezoidal parameters from these
three parameters (pessimistic, optimistic and most likely), we use the
specification of the modal interval given in Callejon et al. (1996).
Table 2 displays the assessments in both models of the market value, which
are multiplied by the 12010.3833 m² of the particular plot; in each and every
case, the VMTS allows us to obtain assessments which avoid the loss undergone by
the VMTD with respect to the ones obtained for each component of the quality
index, and the remaining comparisons of Section 3 can also be checked.

Table 2. Assessments from pyramidal quality index

F_V           v_D       inf{v_1, v_2}   sup{v_1, v_2}   v_S
Triangular    12587.1   13539.2         13747.4         15033.8
Trapezoidal   12773.2   13785.0         13983.3         15198.9

Likewise, it is understandable to think that the gross production of grapes is
related to the sand content of the soil, although this correlation may not be very
strong. In any case, if we assume independence between the two components of
the quality index, that both have a triangular model, and that the market value
follows either a triangular or a trapezoidal model, Table 3 gives the assessments of
this particular property; where one can also check the comparisons established in
Section 3.

Table 3. Assessments from independent and triangular components

F_V           v_D       inf{v_1, v_2}   sup{v_1, v_2}   v_S
Triangular    12787.3   13775.7         14018.9         15441.2
Trapezoidal   12994.2   14010.0         14239.9         15583.8
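The triangular row of Table 3 can be reproduced from the data of Table 1. The sketch below is not the authors' code; it relies on the closed-form triangular CDF and quantile, the independence assumption, and the index values 2.0399 kg/m² and 32%.

```python
def tri_cdf(x, a, m, b):
    """CDF of a triangular distribution on [a, b] with mode m (a <= x <= b)."""
    if x < m:
        return (x - a) ** 2 / ((b - a) * (m - a))
    return 1.0 - (b - x) ** 2 / ((b - a) * (b - m))

def tri_quantile(u, a, m, b):
    """Inverse CDF of the same triangular distribution."""
    if u <= (m - a) / (b - a):
        return a + ((b - a) * (m - a) * u) ** 0.5
    return b - ((b - a) * (b - m) * (1.0 - u)) ** 0.5

VALUE = (0.8132, 1.0634, 1.5012)   # market value (euro/m2), Table 1
AREA = 12010.3833                  # plot area in m2

f1 = tri_cdf(2.0399, 1.5611, 1.8734, 2.6019)  # gross production component
f2 = tri_cdf(32.0, 15.0, 25.0, 50.0)          # sand/soil content component

v1 = AREA * tri_quantile(f1, *VALUE)          # per-component appraisals, Eq. (1)
v2 = AREA * tri_quantile(f2, *VALUE)
# Independence: F_I = f1 * f2 and S_I = (1 - f1) * (1 - f2)
v_d = AREA * tri_quantile(f1 * f2, *VALUE)                        # VMTD
v_s = AREA * tri_quantile(1.0 - (1.0 - f1) * (1.0 - f2), *VALUE)  # VMTS

print(round(v_d, 1), round(min(v1, v2), 1), round(max(v1, v2), 1), round(v_s, 1))
# Matches the triangular row of Table 3: 12787.3 13775.7 14018.9 15441.2
```

That these four numbers agree with the published table also confirms the repaired index value of 2.0399 kg/m².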

Nevertheless, to give a clearer exposition of the behaviour of these
assessments, obtained by both the VMTS and the VMTD from the bidimensional quality
index, and a better comparison with respect to the appraisals obtained for each
one of the components, we show some graphs corresponding to the triangular model
for the market value of this agricultural plot, for both bivariate probability models
of its quality index (pyramidal, or independent and triangular components). Note
that, to ease the interpretation of these graphs in the plane, both
components of the quality index have been simultaneously taken from the least
to the supreme value of their supports.
Figure 1 displays the assessments when the quality index has a pyramidal
model and Figure 2 depicts the appraisals when the quality index has
independent and triangular components, wherein one can see the inequalities of
Theorems 2 and 3.

[Graph omitted: assessment curves for inf(V1, V2), sup(V1, V2), VD and VS over the support of the quality index]
Figure 1. Assessments from pyramidal quality index


[Graph omitted: assessment curves for inf(V1, V2), sup(V1, V2), VD and VS over the support of the quality index]
Figure 2. Assessments from independent and triangular components

5. Conclusions
Finally, we point out the main conclusions of this paper.
The VMTD and the VMTS are equivalent for appraising an asset from a
one-dimensional quality index.
The VMTS provides a greater assessment than the one obtained by the
VMTD from a bidimensional or multidimensional quality index. Moreover, the
VMTS produces profit with respect to the assessments given by each
component of the quality index, solving the depreciation of the VMTD when
more than one quality index is made available.

References
1. Alonso, R. and Lozano, J. (1985). El metodo de las dos funciones de
distribucion: Una aplicacion a la valoracion de fincas agricolas en las
comarcas Centro y Tierra de Campos (Valladolid). Anales del INIA:
Economia, 9, 295-325.
2. Ballestero, E. (1971). Sobre valoracion sintetica de tierras y un nuevo
metodo aplicable a la concentracion parcelaria. Revista de Economia
Politica, 57, 225-238.
3. Bagnoli, M. and Bergstrom, T. (2005). Log-concave probability and its
applications. Economic Theory, 26, 445-469.
4. Banerjee, A., Gelfand, A.E., Knight, J.R. and Sirmans, C.F. (2004). Spatial
modeling of house prices using normalized distance-weighted sums. Journal
of Business and Economic Statistics, 22, 206-213.
5. Benkard, C.L. and Bajari, P. (2005). Hedonic price indexes with unobserved
product characteristics, and application to personal computers. Journal of
Business and Economic Statistics, 23, 61-75.
6. Berny, J. (1989). A new distribution function for risk analysis. Journal of
the Operational Research Society, 40, 1121-1127.
7. Caballer, V. (1975). Concepto y metodos de valoracion agraria. Ed. Mundi-
Prensa, Madrid.
8. Callejon, J., Perez, E. and Ramos, A. (1996). La distribucion trapezoidal
como modelo probabilistico para la metodologia PERT. In Programacion,
seleccion y control de proyectos en ambiente de incertidumbre, R. Herrerias
(ed) (2001), 167-177.
9. Cruz, S., Garcia, C.B. and Garcia, J. (2002). Statistical test for the method
of the two distribution functions. An application in finance. In VI Congreso
de Matematica Financiera y Actuarial and 5th Italian-Spanish Conference in
Financial Mathematics, Valencia.
10. Deltas, G. and Zacharias, E. (2004). Sampling frequency and the
comparison between matched-model and hedonic regression price indexes.
Journal of Business and Economic Statistics, 22, 206-213.
11. Garcia, J., Cruz, S. and Andujar, A.S. (1999). II metodo delle due funzioni
di distribuzione: II modello triangolare. Una revisione. Genio Rurale, 11,
3-8.
12. Garcia, J., Cruz, S. and Rosado, Y. (2002). Extension multi-indice del
metodo beta en valoracion agraria. Economia Agraria y Recursos Naturales,
2, 3-26.
13. Garcia, J. and Garcia, L.B. (2003). Teoria General de Valoracion. Metodo
de las dos funciones de distribucion. Ed. Fundacion Unicaja, Malaga.
14. Guadalajara, N. (1996). Valoracion Agraria. Casos Practicos. Ed. Mundi-
Prensa, Madrid.
15. Herrerias, J.M. (2002). Avances en la Teoria General de Valoracion en
Ambiente de Incertidumbre. PhD Dissertation, Universidad de Granada.
16. Herrerias, R., Garcia, J. and Cruz, S. (2003). A note on the reasonableness
of PERT hypotheses. Operations Research Letters, 31, 60-62.
17. Herrerias, R., Garcia, J., Cruz, S. and Herrerias, J.M. (2001). Il modello
probabilistico trapezoidale nel metodo delle due distribuzioni della teoria
generale delle valutazioni. Genio Rurale. Rivista di Scienze Ambientali,
LXIV, 3-9.
18. Johnson, D. (1997). The triangular distribution as a proxy for the beta
distribution in risk analysis. Journal of the Royal Statistical Society, Ser. D,
46, 387-398.
19. Johnson, N.L. and Kotz, S. (1999). Non-smooth sailing or triangular
distributions revisited after some 50 years. Journal of the Royal Statistical
Society, Ser. D, 48, 179-187.
20. Law, A.M. and Kelton, W.D. (1982). Simulation modelling and analysis.
Ed. New York: McGraw-Hill.
21. Romero, C. (1977). Valoracion por el metodo de las dos distribuciones beta:
una extension. Revista de Economia Politica, 75, 47-62.
22. van Dorp, J.R. and Kotz, S. (2002a). The standard two sided power
distribution and its properties: with applications in financial engineering.
The American Statistician, 56, 90-99.
23. van Dorp, J.R. and Kotz, S. (2002b). A novel extension of the triangular
distribution and its parameter estimation. Journal of the Royal Statistical
Society, Ser. D, 51, 63-79.
24. van Dorp, J.R. and Kotz, S. (2003). Generalized trapezoidal distributions.
Metrika, 58, 85-97.
25. Williams, T.M. (1992). Practical use of distributions in network analysis.
Journal of the Operational Research Society, 43, 265-270.
26. Yaari, M. (1987). The dual theory of choice under risk. Econometrica, 55,
95-115.
Chapter 4
WEIGHTING TOOLS AND ALTERNATIVE TECHNIQUES TO
GENERATE WEIGHTED PROBABILITY MODELS IN
VALUATION THEORY

M. FRANCO-NICOLAS
Dpto. Estadistica e I.O., Universidad de Murcia
Campus de Espinardo, Murcia, 30100, Spain

J.M. VIVO-MOLINA
Dpto. Metodos Cuantitativos para la Economia, Universidad de Murcia
Campus de Espinardo, Murcia, 30100, Spain

In risk analysis, different procedures based on weighted probability models are usual
tools to reduce loss of the assessments in multivariate scenarios. In particular, in the
field of Valuation Theory, weighted distribution functions have been widely used to
correct and fit the market value of an asset, through the valuation methods of the two
functions, with respect to the appraisals from each component of the multidimensional
quality index.
In this context, weighting procedures are of interest to find the weights and,
consequently, to generate these weighted probability models.
The main objective of this paper is to analyze the different weighting techniques used in
Valuation Theory, as well as to propose an alternative way to calculate the weights and a
new tool to generate these weighted probability models.
First, the well-known weighting techniques to generate the weights are introduced, under
both independence and dependence of the components of the quality index.
Secondly, we expand these weighting techniques via the survival functions, which allows
us to generate other weighted probability models.
Likewise, we discuss a new tool to determine the weights of the components of the
quality index, the modal mean technique, based on the mode values of its marginal
distribution functions, which extends the range of possible weighted probability models
to approach the market value of the asset.
Finally, we give an application of these weighting techniques to generate weighted
probability models in one example of land pricing, and thus we obtain the assessments of
the land property according to each weighted probability model.

1. Introduction
In recent years, some authors have paid more attention to the study and
generalization of probability models required in PERT methodology and
Valuation Theory (see, e.g. Williams (1992), Callejon, Perez and Ramos (1996),


Johnson (1997), Johnson and Kotz (1999), Herrerias, Garcia, Cruz and Herrerias
(2001), Herrerias (2002), van Dorp and Kotz (2002a), (2002b) and (2003),
Garcia and Garcia (2003) and Herrerias, Garcia and Cruz (2003)). Likewise, the
valuation method of the two distributions (VMTD) has been studied and
applied, under uncertainty, to approach the market value of an asset from a
quality index (see, e.g. Garcia, Cruz and Andujar (1999), Garcia, Trinidad and
Gomez (1999), Cruz, Garcia and Garcia (2002), Garcia, Cruz and Garcia (2002),
Garcia, Cruz and Rosado (2002), Herrerias (2002), Garcia and Garcia (2003)
and Garcia, Herrerias and Garcia (2003)); in the same way, the valuation method
based on the two survival functions (VMTS) has also been used to find the
market value from a quality index (see, e.g. Callejon, Franco, Herrerias and
Vivo (2005), Franco, Callejon, Herrerias and Vivo (2005) and Franco, Herrerias,
Vivo and Callejon (2005)).
Unfortunately, the valuation methods based on two probability models
present some disadvantages when greater information through more than one
quality index of the asset is considered, such as loss or profit with respect to the
assessments from each component of the quality index. In order to reduce loss of
the assessments in risk analysis, it is useful to consider probability models
weighing the distinct components of the quality index; these procedures allow
one to correct and fit the market value of an asset. In particular, seeking to
reduce the depreciation (appreciation) undergone in the market by the
VMTD (VMTS) when more than one quality index is made available, it is
common to use weighted probability models based on the marginal
distribution (survival) functions of the multidimensional quality index, in both
cases, independence and dependence among its components.
Therefore, these weighting procedures, based on marginal distribution or survival
functions, in both the independence and dependence cases, require determining the
weights, i.e., the coefficients α and β of the weighted probability models.
Herrerias (2002) and Garcia and Garcia (2003) analyze three techniques to
calculate these weights, and consequently, to generate weighted probability
models:
1. subjective (supplied by an expert judgment).
2. modes (relationship between the modal values).
3. econometrics (fit by the linear models).
Within this framework, under independence of the components of the quality
index, Herrerias (2002) summarizes the use of the three procedures, with their
respective advantages and disadvantages. However, under dependence
between the components of the quality index, he comments that these techniques
are reduced to two: subjective and econometric. Nevertheless, we have not
found any reason to discard the mode technique under dependence, beyond the
well-known ones that also apply in the case of independence.
Remark that, in the subjective technique, the expert (appraiser) supplies the
information about the weights of the indexes, and so one might point out the
drawbacks of its subjectivity, see Herrerias (2002). Therefore, we analyze
the remaining tools.
First, the weighting techniques to generate the weights are introduced, under
both independence and dependence between the components of the quality
index. Secondly, we expand these weighting tools via the survival functions,
which allows us to generate other weighted probability models. Likewise, we
discuss an alternative tool to determine the weights of the components of the
quality index, the modal mean technique, based on the modal values of its marginal
distribution functions, which increases the number of possible weighted probability
models to approach the market value of the asset. Finally, we give an application
of these weighting procedures to generate weighted probability models in one
example of land pricing, and thus we obtain the assessments of the land property
according to each weighted probability model.

2. Techniques to generate weighted valuation models


In this section, we discuss the mode and econometric techniques, which
were introduced by the marginal distribution functions. Subsequently, we
analyze both methods from an alternative viewpoint, using the marginal survival
functions.

2.1. Econometric technique


Let us see now, the econometric technique to generate the weights of the
components of the quality index, in each case, independence and dependence.

2.1.1. Econometric technique by distribution functions


The generation of the weights by the econometric technique, when the
components are independent, is based on the estimation of the following
regression model
log F_V(v_t) = α log F_1(i_1t) + β log F_2(i_2t) + u_t,   t = 1, ..., n
where F_V is the distribution function of the market value V of the asset, F_i is
the marginal distribution function of the i-th component of the quality index,
with i = 1, 2, and F(i_1, i_2) its joint distribution function. So, the estimate
70 M. Franco-Nicolas and J.M. Vivo-Molina

of the coefficient of the first component might be calculated by restricted least
squares, i.e., by least squares under the restriction α + β = 1.
On the other hand, under dependence, the weights of the components using
the econometric technique, is determined by the next regression model
F_V(v_t) = p F_1(i_1t) + q F_2(i_2t) + u_t,   t = 1, ..., n
and therefore, taking into account the restriction p + q = 1, the weight p of the
first component is estimated by restricted least squares.

2.1.2. Econometric technique by survival functions


In this item, we propose a different viewpoint in the econometric technique,
by the use of the survival function instead of the distribution function in the
above regression models.
Thus, assuming independence between the components of the quality index,
the estimate of α can be obtained through restricted least squares in the
following regression model
log S_V(v_t) = α log S_1(i_1t) + β log S_2(i_2t) + u_t,   t = 1, ..., n
Likewise, when the components of the quality index are dependent, the
estimate of p might be calculated by restricted least squares in the next
regression model
S_V(v_t) = p S_1(i_1t) + q S_2(i_2t) + u_t,   t = 1, ..., n
where S_V = 1 − F_V is the survival function of the market value of the asset, and
S_i = 1 − F_i is the marginal survival function of the i-th component of the quality index, for i = 1, 2.
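Substituting β = 1 − α (or q = 1 − p) reduces each of these restricted regressions to ordinary least squares on a single regressor: y_t − x_2t = α (x_1t − x_2t) + u_t. A minimal sketch in Python, with hypothetical data (for the independence case y, x1, x2 hold the log-transformed distribution or survival function values; for the dependence case, the raw values):

```python
import numpy as np

def restricted_ls_weight(y, x1, x2):
    """Estimate a in y = a*x1 + (1 - a)*x2 + u under the restriction a + b = 1.

    Substituting b = 1 - a gives y - x2 = a*(x1 - x2) + u,
    so a is obtained by ordinary least squares with one regressor.
    """
    z = np.asarray(x1) - np.asarray(x2)
    w = np.asarray(y) - np.asarray(x2)
    return float(z @ w / (z @ z))

# Hypothetical check: data generated exactly with a = 0.7
x1 = np.linspace(0.1, 0.9, 20)
x2 = np.linspace(0.2, 0.8, 20)
y = 0.7 * x1 + 0.3 * x2
print(restricted_ls_weight(y, x1, x2))  # recovers a = 0.7
```

The same helper serves the four regression models above; only the transformation of the inputs changes.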
However, the advantage of the two-function valuation methods (VMTD and
VMTS) over other appraisal methods is precisely their practical utility when
only small samples are available, a situation in which other methods fail for
lack of data. This scarcity of data reduces the usefulness of the above
regression models and, consequently, of the estimation of the weights by the
previous econometric techniques, which attempt to improve the assessments by
the best fit among the distribution or survival functions. Besides, one can note
the accumulation of errors, both in the estimation of the weights and in the fit
of the probability models (market value and quality index).

2.2. Mode technique


In this subsection, we analyze the generation of the weighted models by the
mode technique, which is based on the equality of the modal values between the
market value and the quality index through their probability models.

First, the mode technique allows us to determine the weights by the following
relationship between the distribution functions
F_V(m) = F_WD(m_1, m_2)   (1)
where FWD is the distribution function of the weighted model from the marginal
distribution functions of the quality index.

2.2.1. Mode technique by distribution functions


Let us see now the behaviour of the mode technique when both components
of the quality index are independent, and secondly, when the components are
dependent.
On the one hand, under independence of the components of the quality
index, from Eq. (1) the weight of the first component is given by
F_V(m) = F_1^α(m_1) F_2^(1−α)(m_2)
or equivalently,
(F_1(m_1) / F_2(m_2))^α = F_V(m) / F_2(m_2)   (2)
where 0 < a < 1 represents the weight of the first component of the quality
index.
If F_1(m_1) = F_2(m_2), then Eq. (2) only makes sense when
F_V(m) = F_1(m_1) = F_2(m_2), which is a strong restriction on the modal market
value of the asset. In that case, α might be any point in (0, 1).
If F_1(m_1) ≠ F_2(m_2), then the weight of the first component holds
α = (log F_V(m) − log F_2(m_2)) / (log F_1(m_1) − log F_2(m_2))
wherein we point out the following contradictory situations:
1. When F_V(m) < F_2(m_2) < F_1(m_1), then α < 0
2. When F_V(m) < F_1(m_1) < F_2(m_2), then α > 1
Therefore, in order that α ∈ (0, 1), it is necessary to impose the following
restriction on the modal market value of the asset
F_V(m) ∈ [F_i(m_i), F_j(m_j)] with i ≠ j ∈ {1, 2} such that F_i(m_i) < F_j(m_j)   (3)
Hence, we conclude with the following disadvantages of the mode
technique by marginal distribution functions under independence:
1. The modal valuation need not correspond with the modal quality index
(see Ballestero and Rodriguez (1999) and Herrerias (2002)).
2. Strong restrictions on the modal value of the distribution function of the
market value are required in order to generate feasible weights.

On the other hand, when the two components of the quality index are
dependent, from Eq. (1), the mode technique allows us to get the weights by
F_V(m) = p F_1(m_1) + (1 − p) F_2(m_2)
or equivalently,
F_V(m) − F_2(m_2) = p (F_1(m_1) − F_2(m_2))   (4)
where 0 < p < 1 represents the weight of the first component of the quality
index.
If F_1(m_1) = F_2(m_2), then Eq. (4) only makes sense for
F_V(m) = F_1(m_1) = F_2(m_2), which is a strong restriction on the modal market
value of the asset, and thus, p could be chosen in (0, 1).
If F_1(m_1) ≠ F_2(m_2), then the weight holds
p = (F_V(m) − F_2(m_2)) / (F_1(m_1) − F_2(m_2))
wherein we point out the following contradictory cases:
1. When F_V(m) < F_2(m_2) < F_1(m_1), then p < 0
2. When F_V(m) < F_1(m_1) < F_2(m_2), then p > 1

Therefore, in order that p ∈ (0, 1), it is necessary to impose the restriction
(3) on the modal market value.
Consequently, under dependence between the components of the quality
index, the disadvantages of the mode technique by marginal distribution
functions are the same as in the independent case.
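In both cases the weight is a ratio: of log-differences under independence, and of plain differences under dependence. A small sketch (the helper name is ours) that also enforces the feasibility restriction (3):

```python
import math

def mode_weight(Fv_m, F1_m1, F2_m2, independent=True):
    """Mode-technique weight of the first component.

    Returns None when restriction (3) fails, i.e. when Fv(m) does not
    lie between F1(m1) and F2(m2), so no weight in (0, 1) exists.
    """
    if F1_m1 == F2_m2:
        raise ValueError("F1(m1) = F2(m2): any weight in (0, 1) is admissible")
    lo, hi = sorted((F1_m1, F2_m2))
    if not lo <= Fv_m <= hi:
        return None
    if independent:
        return (math.log(Fv_m) - math.log(F2_m2)) / (math.log(F1_m1) - math.log(F2_m2))
    return (Fv_m - F2_m2) / (F1_m1 - F2_m2)
```

With the triangular data of the land-pricing application of Section 4, mode_weight(0.3, 75.13/180.31, 1/15) returns approximately 0.8207, the mode weight reported in Table 2.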

2.2.2. Mode technique by survival functions


Similar to the econometric technique, we propose a different viewpoint in
the mode technique based on the use of the survival functions instead of the
distribution functions. So, the weights with the mode technique are determined
by the following relationship between the survival functions
S_V(m) = S_WS(m_1, m_2)   (5)
where Sws is the survival function of the weighted model from the marginal
survival functions of the quality index.
So, let us see now, the behaviour of the mode technique from weighted
models through the marginal survival functions of the bidimensional quality
index, in both cases, independence and dependence of their components.
First, in the case of independence between the two components, from
Eq. (5) the weight of the first component is given by
S_V(m) = S_1^α(m_1) S_2^(1−α)(m_2)

or equivalently,
(S_1(m_1) / S_2(m_2))^α = S_V(m) / S_2(m_2)   (6)
If S_1(m_1) = S_2(m_2), then Eq. (6) only makes sense when
S_V(m) = S_1(m_1) = S_2(m_2), which is a strong restriction on the modal market
value.
If S_1(m_1) ≠ S_2(m_2), then the weight holds
α = (log S_V(m) − log S_2(m_2)) / (log S_1(m_1) − log S_2(m_2))
wherein we remark the following contradictory situations:
1. When S_V(m) > S_2(m_2) > S_1(m_1), then α < 0
2. When S_V(m) > S_1(m_1) > S_2(m_2), then α > 1
Therefore, in order that α ∈ (0, 1), it is necessary to impose the following
restriction on the modal market value of the asset
S_V(m) ∈ [S_i(m_i), S_j(m_j)] with i ≠ j ∈ {1, 2} such that S_i(m_i) < S_j(m_j)   (7)
Remark that the disadvantage of the mode technique by the weighting of the
marginal survival functions is the strong restriction (7) on the modal value of the
survival function of the market value in order to generate feasible weights.
On the other hand, under dependence between the components of the quality
index, from Eq. (5) the mode technique allows us to obtain the weights by
S_V(m) = p S_1(m_1) + (1 − p) S_2(m_2)
or equivalently,
S_V(m) − S_2(m_2) = p (S_1(m_1) − S_2(m_2))   (8)
where 0 < p < 1 is the weight of the first component of the quality index.
If S_1(m_1) = S_2(m_2), then Eq. (8) only makes sense for
S_V(m) = S_1(m_1) = S_2(m_2), which is a strong restriction on the modal market
value.
If S_1(m_1) ≠ S_2(m_2), then the weight holds
p = (S_V(m) − S_2(m_2)) / (S_1(m_1) − S_2(m_2))
wherein we remark the following contradictory cases:
1. When S_V(m) > S_2(m_2) > S_1(m_1), then p < 0
2. When S_V(m) > S_1(m_1) > S_2(m_2), then p > 1

Therefore, in order that p ∈ (0, 1), it is necessary to impose the restriction (7)
on the modal market value. So, the mode technique by marginal survival
functions has the same disadvantages as the mode technique by distribution
functions, in both cases, independence and dependence between the components.
Besides, note that S_1(m_1) = S_2(m_2) if and only if F_1(m_1) = F_2(m_2), and
the restrictions (3) and (7) are equivalent. Consequently, we have found the
same disadvantages of this technique by both weightings, the marginal survival
functions and the marginal distribution functions.

3. New technique to generate weighted models


In this section, we discuss a new technique to find the weights of the
components, avoiding the disadvantages of the former tools: the subjectivity, the
weakness of the econometric methods and the restrictions on the modal market
value of the mode technique. For that, we consider a method based on the
marginals of the probability model of the quality index and the weighted model
from them, to generate the weights.

3.1. Modal mean technique by distribution functions


In order to generate weighted probability models by the marginal
distribution functions, which reduce the depreciation of the VMTD with respect
to the assessments from each component, the modal mean technique is based on
the modal values of the quality index.
Remark that for any weight of the first component, 0 < α < 1 or 0 < p < 1
under independence or dependence, respectively, the weighted model is bounded
by the marginal distribution functions
inf{F_1(i_1), F_2(i_2)} ≤ F_WD(i_1, i_2) ≤ sup{F_1(i_1), F_2(i_2)}
for all quality index (i_1, i_2). In particular, for the modal quality index (m_1, m_2),
the following inequalities hold
inf{F_1(m_1), F_2(m_2)} ≤ F_WD(m_1, m_2) ≤ sup{F_1(m_1), F_2(m_2)}
In this setting, we propose the modal mean technique to generate the
weighted model with parameter (a or p) such that the distance among these
three values is minimized, i.e., the modal value of the weighted distribution
function and the two modal values of the marginal distribution functions, which
is given by
F_WD(m_1, m_2) = (F_1(m_1) + F_2(m_2)) / 2   (9)

Note that the modal mean technique generates a weighted model using the
available information by these quality indexes; so it is not influenced by the
market value, and consequently, this procedure does not require any restriction
on the modal market value.
Firstly, when the components of the quality index are independent, from
Eq. (9) the weight α provided by the modal mean method is given by
F_1^α(m_1) F_2^(1−α)(m_2) = (F_1(m_1) + F_2(m_2)) / 2
or equivalently,
(F_1(m_1) / F_2(m_2))^α = (F_1(m_1) + F_2(m_2)) / (2 F_2(m_2))
where 0 < α < 1 represents the weight of the first component of the quality
index.
If F_1(m_1) = F_2(m_2), then α can take any value in (0, 1).
If F_1(m_1) ≠ F_2(m_2), then the weight holds
α = (log((F_1(m_1) + F_2(m_2)) / 2) − log F_2(m_2)) / (log F_1(m_1) − log F_2(m_2)) ∈ (0, 1)
On the other hand, when the components of the quality index are dependent,
from Eq. (9) we have the next equation
p F_1(m_1) + (1 − p) F_2(m_2) = (F_1(m_1) + F_2(m_2)) / 2
or equivalently,
p (F_1(m_1) − F_2(m_2)) = (F_1(m_1) − F_2(m_2)) / 2
where 0 < p < 1 represents the weight of the first component of the quality
index.
If F_1(m_1) = F_2(m_2), then p may be any point in (0, 1).
If F_1(m_1) ≠ F_2(m_2), then p = 1/2.
Remark that under dependence between the components, the modal mean
technique provides the same weight for each component of the quality index,
which exhibits the coherence of this technique: the dependence structure
between the components already includes the predominance and importance of
one over the other, and therefore it would be contradictory to assign different
coefficients in the weighted model.
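Since the half-sum (F_1(m_1) + F_2(m_2))/2 always lies strictly between the two marginal values, the resulting weight is guaranteed to fall in (0, 1), with no restriction on the modal market value. A sketch of the computation under independence (hypothetical helper name; inputs are the marginal distribution functions evaluated at their modes); under dependence no computation is needed, since p = 1/2 whenever F_1(m_1) ≠ F_2(m_2):

```python
import math

def modal_mean_weight(F1_m1, F2_m2):
    """Weight a solving F1(m1)**a * F2(m2)**(1 - a) = (F1(m1) + F2(m2)) / 2."""
    if F1_m1 == F2_m2:
        return 0.5  # any value in (0, 1) is admissible; take the midpoint
    target = (F1_m1 + F2_m2) / 2.0
    return (math.log(target) - math.log(F2_m2)) / (math.log(F1_m1) - math.log(F2_m2))
```

With the triangular data of the land-pricing application of Section 4, modal_mean_weight(75.13/180.31, 1/15) returns approximately 0.7028, the modal mean weight reported in Table 2.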

3.2. Modal mean technique by survival functions


Taking into account that for any weight of the first component, 0 < α < 1 or
0 < p < 1 under independence or dependence, respectively, the weighted model
is bounded by the marginal survival functions
inf{S_1(i_1), S_2(i_2)} ≤ S_WS(i_1, i_2) ≤ sup{S_1(i_1), S_2(i_2)}
for all quality index (i_1, i_2), where the survival function S_WS of this weighted
model allows one to reduce the appreciation of the VMTS with respect to the
assessments from each component.
In particular, for the modal quality index (m_1, m_2), we have
inf{S_1(m_1), S_2(m_2)} ≤ S_WS(m_1, m_2) ≤ sup{S_1(m_1), S_2(m_2)}
and therefore, the weighted model with coefficient (α or p), using the modal
mean technique, is defined by minimizing the distance among the modal value of
the weighted model and the two modal values of the marginal survival
functions,
S_WS(m_1, m_2) = (S_1(m_1) + S_2(m_2)) / 2   (10)
Remark that this relationship generates a weighted model using only the
quality index, just like the modal mean method by distribution functions.
In the first place, when the components of the quality index are
independent, from Eq. (10) the weight α of the first component is determined
by
S_1^α(m_1) S_2^(1−α)(m_2) = (S_1(m_1) + S_2(m_2)) / 2
or equivalently,
(S_1(m_1) / S_2(m_2))^α = (S_1(m_1) + S_2(m_2)) / (2 S_2(m_2))
If F_1(m_1) = F_2(m_2), then α can be any value in (0, 1).
If F_1(m_1) ≠ F_2(m_2), then the weight holds
α = (log(1 − (F_1(m_1) + F_2(m_2)) / 2) − log(1 − F_2(m_2))) / (log(1 − F_1(m_1)) − log(1 − F_2(m_2))) ∈ (0, 1)
In the second place, when the components of the quality index are
dependent, from Eq. (10) the weight p of the first component is given by
p S_1(m_1) + (1 − p) S_2(m_2) = (S_1(m_1) + S_2(m_2)) / 2
or equivalently,
p (S_1(m_1) − S_2(m_2)) = (S_1(m_1) − S_2(m_2)) / 2
If F_1(m_1) = F_2(m_2), then p can be chosen in (0, 1).
If F_1(m_1) ≠ F_2(m_2), then p = 1/2.
Consequently, under dependence between the components and using their
marginal survival functions, the modal mean technique provides the same
weights for each component of the quality index, just as when one uses the
marginal distribution functions, and therefore it would be contradictory to
assign different coefficients in the weighted model, because the dependence
structure includes the prevalence of one component over the other.

4. Practical application
In this section, we expose a practical application of the weighted probability
models, by these techniques, in one example of land pricing. In particular, we
consider the transactions of agricultural properties in the Tierra de Campos and
Centro regions (Valladolid, Spain) given in Alonso and Lozano (1985) and
Garcia and Garcia (2003). The quality indexes used to explain the market values
(€) are the income per hectare (€/Ha) and the inverse distance to Valladolid
(1/km).
Table 1 displays the minimum (pessimistic), maximum (optimistic)
and modal (most likely) values for each variable; the objective is to
appraise an agricultural property whose income per hectare is 194.31 and
whose location is … km from Valladolid.

Table 1. Transactions of agricultural plots

                                 Pessimistic   Optimistic   Most likely
V = Market value (€/Ha)          1502.53       3005.06      1953.29
I1 = Income (€/Ha)               120.20        300.51       195.33
I2 = Inverse distance (1/km)     1/70          1/10         1/50

Remark that the independence between both components of the
bidimensional quality index is assumed in this application by Garcia and Garcia
(2003), as well as their triangular distributions, and to reduce the depreciation of
the VMTD with respect to the appraisals obtained for each component, the
weight of the first component, α = 0.75, is provided by an expert judgment.
Besides, we will consider that the market value of an agricultural plot follows a
triangular or trapezoidal model, which are a sample of the different models that
might be considered.
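For a triangular model on [a, b] with mode c, the distribution function at the mode is simply F(c) = (c − a)/(b − a). A sketch (the helper names are ours) that reproduces, up to rounding, the mode and modal mean weights used below (0.82074, 0.702754, 0.612085 and 0.441782) directly from the Table 1 data:

```python
import math

def tri_cdf_at_mode(a, b, c):
    """F(c) = (c - a) / (b - a) for a triangular law on [a, b] with mode c."""
    return (c - a) / (b - a)

def log_ratio(num, den):
    return math.log(num) / math.log(den)

# Distribution functions at the modes (data from Table 1)
Fv = tri_cdf_at_mode(1502.53, 3005.06, 1953.29)   # market value
F1 = tri_cdf_at_mode(120.20, 300.51, 195.33)      # income per hectare
F2 = tri_cdf_at_mode(1/70, 1/10, 1/50)            # inverse distance
Sv, S1, S2 = 1 - Fv, 1 - F1, 1 - F2               # survival counterparts

a_mode_F = log_ratio(Fv / F2, F1 / F2)               # mode, distributions (~0.82074)
a_mm_F = log_ratio((F1 + F2) / (2 * F2), F1 / F2)    # modal mean, distributions (~0.702754)
a_mode_S = log_ratio(Sv / S2, S1 / S2)               # mode, survivals (~0.612085)
a_mm_S = log_ratio((S1 + S2) / (2 * S2), S1 / S2)    # modal mean, survivals (~0.441782)
```

The econometric weights (0.615456 and 0.671763) cannot be reproduced here, since they require the sample of transactions rather than the three characteristic values alone.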

Thus, in Tables 2 and 3, the weights of the first component have been
determined by the marginal distribution functions, and the same weights are
used for a better comparison in both cases, triangular and trapezoidal
market value, when the weighting technique is econometric or mode, since the
other procedures are not influenced by the distribution function of the market
value.
In particular, Table 2 displays the assessments of the property through both
methods, VMTD and VMTS, when the weighted probability models are based
on the marginal distribution functions

Table 2. Weighted model of the marginal distribution functions (FWD)

Weighting technique   α          F_V           V_DWD     V_SWD
Subjective            0.75       Triangular    2054.33   2655.11
Subjective            0.75       Trapezoidal   2113.79   2681.07
Econometric           0.615456   Triangular    2064.94   2609.91
Econometric           0.615456   Trapezoidal   2125.23   2639.22
Mode                  0.82074    Triangular    2048.92   2695.77
Mode                  0.82074    Trapezoidal   2107.90   2718.71
Modal mean            0.702754   Triangular    2058.01   2635.08
Modal mean            0.702754   Trapezoidal   2117.77   2662.52

and Table 3 includes the appraisals of the weighted probability models based on
the marginal survival functions

Table 3. Weighted model of the marginal survival functions (S_WS)

Weighting technique   α          F_V           V_DWS     V_SWS
Subjective            0.75       Triangular    1689.98   2057.37
Subjective            0.75       Trapezoidal   1707.88   2117.09
Econometric           0.615456   Triangular    1711.83   2068.83
Econometric           0.615456   Trapezoidal   1731.80   2129.40
Mode                  0.82074    Triangular    1669.16   2051.29
Mode                  0.82074    Trapezoidal   1685.07   2110.49
Modal mean            0.702754   Triangular    1699.95   2061.41
Modal mean            0.702754   Trapezoidal   1718.79   2121.44

Analogously, in Tables 4 and 5, the weights of the first component are


generated from the marginal survival functions.

In particular, Table 4 shows the valuations when the weighted probability


models are based on the marginal distribution functions by both methods,
VMTD and VMTS

Table 4. Weighted model of the marginal distribution functions (FWD)

Weighting technique   α          F_V           V_DWD     V_SWD
Econometric           0.671763   Triangular    2060.45   2624.50
Econometric           0.671763   Trapezoidal   2120.40   2652.73
Mode                  0.612085   Triangular    2065.21   2609.22
Mode                  0.612085   Trapezoidal   2125.52   2638.58
Modal mean            0.441782   Triangular    2079.29   2598.49
Modal mean            0.441782   Trapezoidal   2140.51   2628.65

and Table 5 displays the assessments using the weighted probability models
through the marginal survival functions

Table 5. Weighted model of the marginal survival functions (S_WS)

Weighting technique   α          F_V           V_DWS     V_SWS
Econometric           0.671763   Triangular    1705.06   2064.05
Econometric           0.671763   Trapezoidal   1724.39   2124.28
Mode                  0.612085   Triangular    1712.13   2069.12
Mode                  0.612085   Trapezoidal   1732.14   2129.70
Modal mean            0.441782   Triangular    1714.64   2083.42
Modal mean            0.441782   Trapezoidal   1734.89   2144.86

Remark that in all cases, the VMTS proposes appraisals greater than the
VMTD.
Besides, we show some graphics on the behaviour of both VMTS and
VMTD from the weighted models, in which the market value of an agricultural
land follows a triangular model and the quality index has independent and
triangular components.
In the first place, from marginal distribution functions, α = 0.82074,
0.615456 and 0.702754 by the mode, econometric and modal mean
techniques, respectively, and the valuations obtained from these procedures will
be marked by "m", "e" and "mm", respectively. So, Figures 1 and 2 describe the
assessments by the weighted models of the marginal distribution and survival
functions, respectively, where V1 and V2 are the valuations from each
component of the quality index.


Figure 1. Weighted models of the marginal distribution functions

Figure 2. Weighted models of the marginal survival functions

Similarly, from marginal survival functions, α = 0.612085, 0.671763 and
0.441782 by the mode, econometric and modal mean methods. Thus, Figures 3
and 4 depict the appraisals by the weighted probability models of the marginal
distribution and survival functions, respectively.

Figure 3. Weighted models of the marginal distribution functions



Figure 4. Weighted models of the marginal survival functions

5. Comments and conclusions


Finally, we give some comments and we point out the main conclusions of
this work.
In the subjective technique, the expert judgment (appraiser) supplies the
information about the weights of the quality indexes, and its subjectivity was
commented by Herrerias (2002).
The main advantage of the valuation methods of the two functions (VMTD
and VMTS), against other appraisal methods, is their usefulness in the absence
of data; in that situation, the weakness of the regression models is well known,
and consequently, so is that of the estimation of the weights by the econometric
technique in the generation of weighted models to correct and fit the assessments.
Likewise, the mode technique also has some disadvantages: in general, the
modal valuation need not correspond with the modal quality index (see
Ballestero and Rodriguez (1999) and Herrerias (2002)). Furthermore, a strong
restriction on the modal market value is required in order to generate
feasible weights, in both cases, independence and dependence between the
components of the quality index.
Finally, the modal mean technique allows one to generate weighted models
avoiding the subjectivity of the expert judgment, the weakness of the
econometric methods and the restrictions on the modal market value. Besides,
the modal mean procedure provides feasible weights for the components of the
quality index.

References
1. Alonso, R. and Lozano, J. (1985). El método de las dos funciones de
distribución: Una aplicación a la valoración de fincas agrícolas en las
comarcas Centro y Tierra de Campos (Valladolid). Anales del INIA:
Economía, 9, 295-325.
2. Ballestero, E. and Rodriguez, J.A. (1999). El precio de los inmuebles
urbanos. CIE Inversiones, Ed. DOSSAT 2000, Madrid.
3. Callejon, J., Franco, M., Herrerias, R. and Vivo, J.M. (2005). El método de
valoración de las dos funciones de supervivencia como metodología
alternativa al de las dos funciones de distribución. In XIX Reunión
ASEPELT-ESPAÑA, Badajoz.
4. Callejon, J., Perez, E. and Ramos, A. (1996). La distribución trapezoidal
como modelo probabilístico para la metodología PERT. In X Reunión de
ASEPELT-ESPAÑA, Albacete. Content in Programación, Selección y
Control de Proyectos en ambiente de incertidumbre. R. Herrerias (ed.).
Universidad de Granada, 2001, 167-177.
5. Cruz, S., Garcia, C.B. and Garcia, J. (2002). Statistical test for the method
of the two distribution functions. An application in finance. In VI Congreso
de Matemática Financiera y Actuarial and 5th Italian-Spanish Conference in
Financial Mathematics, Valencia.
6. Franco, M., Callejon, J., Herrerias, R. and Vivo, J.M. (2005). Procedimiento
para reducir la depreciación del valor de mercado del método de valoración
de las dos funciones de distribución: Funciones de supervivencia y máximo.
In VI Seminario de ASEPELT sobre Análisis, Selección, Valoración,
Control y Eficiencia de Proyectos, Murcia.
7. Franco, M., Herrerias, R., Vivo, J.M. and Callejon, J. (2005). Valuation
method of the two survival functions as a proxy methodology in risk
analysis. In CIMMA2005: International Mediterranean Congress of
Mathematics Almería 2005, Almería.
8. Garcia, J., Cruz, S. and Andújar, A.S. (1999). Il metodo delle due funzioni
di distribuzione: Il modello triangolare. Una revisione. Genio Rurale, 11, 3-8.
9. Garcia, J., Cruz, S. and Garcia, L.B. (2002). Generalización del método de
las dos funciones de distribución (MTDF) a familias beta determinadas con
los tres valores habituales. In III Reunión Científica ASEPELT: Análisis,
Selección, Control de Proyectos y Valoración, Murcia, 89-113.
10. Garcia, J., Cruz, S. and Rosado, Y. (2002). Extensión multi-índice del
método beta en valoración agraria. Economía Agraria y Recursos Naturales,
2, 3-26.
11. Garcia, J. and Garcia, L.B. (2003). Teoría General de Valoración. Método
de las dos funciones de distribución. Ed. Fundación Unicaja, Málaga.
12. Garcia, J., Herrerias, R. and Garcia, L.B. (2003). Valoración agraria:
Contrastes estadísticos para índices y distribuciones en el método de las dos
funciones de distribución. Revista Española de Estudios Agrosociales y
Pesqueros, 199, 93-118.

13. Garcia, J., Trinidad, J.E. and Gomez, J. (1999). El método de las dos
funciones de distribución: la versión trapezoidal. Revista Española de
Estudios Agrosociales y Pesqueros, 185, 57-80.
14. Herrerias, J.M. (2002). Avances en la Teoría General de Valoración en
Ambiente de Incertidumbre. Tesis Doctoral. Universidad de Granada.
15. Herrerias, R., Garcia, J. and Cruz, S. (2003). A note on the reasonableness
of PERT hypotheses. Operations Research Letters, 31, 60-62.
16. Herrerias, R., Garcia, J., Cruz, S. and Herrerias, J.M. (2001). Il modello
probabilistico trapezoidale nel metodo delle due distribuzioni della teoria
generale delle valutazioni. Genio Rurale. Rivista di Scienze Ambientali,
LXIV, 3-9.
17. Johnson, D. (1997). The triangular distribution as a proxy for the beta
distribution in risk analysis. Journal of the Royal Statistical Society, Ser. D,
46, 387-398.
18. Johnson, N.L. and Kotz, S. (1999). Non-smooth sailing or triangular
distributions revisited after some 50 years. Journal of the Royal Statistical
Society, Ser. D, 48, 179-187.
19. van Dorp, J.R. and Kotz, S. (2002a). The standard two sided power
distribution and its properties: With applications in financial engineering.
The American Statistician, 56, 90-99.
20. van Dorp, J.R. and Kotz, S. (2002b). A novel extension of the triangular
distribution and its parameter estimation. Journal of the Royal Statistical
Society, Ser. D, 51, 63-79.
21. van Dorp, J.R. and Kotz, S. (2003). Generalized trapezoidal distributions.
Metrika, 58, 85-97.
22. Williams, T.M. (1992). Practical use of distributions in network analysis.
Journal of the Operational Research Society, 43, 265-270.
Chapter 5
ON GENERATING AND CHARACTERIZING SOME DISCRETE
AND CONTINUOUS DISTRIBUTIONS

M.A. FAJARDO-CALDERA
Dpto. de Economía Aplicada y Organización de Empresas
University of Extremadura
Camino de Elbas s/n, Badajoz, 06071, Spain

J. PEREZ-MAYO
Dpto. de Economía Aplicada y Organización de Empresas
University of Extremadura
Camino de Elbas s/n, Badajoz, 06071, Spain

The main aim of this paper is to generate compound distributions, discrete or continuous,
from Binomial conditional distributions by means of Bayesian techniques. Besides, the
authors extend Kowar's paper (1975) by characterizing some discrete and continuous
distributions, in the context of some well-known distributions, from the conditional
distribution of a random variable (r. v.) and by the linear regression of the latter given the
former.

1. Introduction
One of the main aims of Probability Calculus is to determine some
theoretical distributions useful for modeling the random phenomena that appear
in Experimental Sciences.
Many methods have been applied to generate or characterize discrete or
continuous distributions of probability: change of variables, functional equations
(differential or in differences), etc. These methods supply the theoretical issues
needed to describe a random phenomenon and to obtain the explicit probability
law.
In the direct method, some distributions are obtained by the expression of a
mathematical model, which is the abstraction of a random experiment. An
example of this method is the theory of combinatory numbers to directly get the
probabilities that correspond to each value of a random variable. From this
theory, some important and well-known distributions, such as the binomial,
hypergeometric, geometric or negative binomial, are generated.

86 M.A. Fajardo-Caldera and J. Perez-Mayo

Sometimes, while trying to establish the model, one must solve an equation
(functional, differential, in differences) to explicitly obtain the probability law.
For example, an equation in differences is obtained when one establishes the
probability of getting r successes in n independent trials, assuming that
the probability of success varies in each trial. This equation appears in the
generalization of repeated Bernoulli trials, having the Binomial distribution
as a particular case.
It is also possible to start from a differential equation, the Poisson
distribution being the best-known case. Systems of differential equations have
been proposed in the literature. The most important one is the well-known
Pearson system of curves, which is a generalization of the differential equation
generated from the Normal distribution and whose solution contains many of
the continuous probability distributions, such as the Normal, Gamma, Beta and
Exponential. Later, this system was studied by Elderton and Johnson (1969) and
extended by Herrerias Pleguezuelo (1975) and Callejon (1995).
The discrete analogue of the Pearson system is due to Ord (1972)
and was generalized by Fajardo (1985) and Rodriguez Avi (1993); its most
important consequence is the extended analysis of the family of discrete
probability distributions defined by the generalized hypergeometric series.
This analysis was done by Dacey (1971) and later extended and generalized by
Hermoso Gutierrez (1986) and Saez Castillo (2002).
An alternative method used in Statistics for generating distributions of
probability is the use of functions of random variables, i.e. variables
transformations. The most usual transformations are the sum, product or
quotient of two variables.
Finally, another method of obtaining probability distributions, by means of
limits, should not be forgotten. Two well-known examples are the convergence
of a Binomial distribution to a Poisson or a Normal one.
Following the steps above, in this paper we try to generate probability
distributions from compound distributions by means of the well-known
Bayesian techniques and, on the other hand, to characterize discrete distributions
from Binomial conditional distributions and linear regression.

2. The binomial model


In many statistical experiments, the observations are considered to be
generated by a binomial distribution. This distribution describes discrete data,
resulting from an experiment called Bernoulli's process in Jacob Bernoulli's
honour.
Generating and Characterizing Distributions 87

Consider a population in which an event happens as the outcome of a


Bernoulli trial with probability p. Thus, given 0 < p < 1, the number of
occurrences k in r trials has the binomial distribution,
P[k | r, p] = C(r, k) p^k (1 − p)^(r−k),   k = 0, 1, ..., r   (1)
It is necessary to be careful in using the binomial distribution because the
following conditions must be fulfilled: each trial has only two possible results,
the probability of each trial remains fixed over time, and the trials are
statistically independent. Specifically, the second and third conditions require
that the probability of the results in every trial remains fixed over time and
that the trials or attempts in a Bernoulli process are statistically independent,
that is, the result of one trial cannot affect the result of any other trial.

3. Generating compound distributions


As commented in the previous section, the conditions necessary for using
the binomial distribution are not satisfied in most experiments because, in
general, the parameters (r, p) are usually random instead of fixed and, therefore,
one needs to define a distribution of probability for them.
Compound distributions such as the compound Poisson and the compound
negative binomial are used extensively in the theory of risk to model the
distribution of the total claims incurred in a fixed period of time.
Considering the former, one can take a random variable ξ depending on a
parameter θ, which is itself a random variable Θ. The joint distribution of
both variables is:

$P[\xi = s, \Theta = \theta] = P[\xi = s \mid \Theta = \theta]\, P[\Theta = \theta]$   (2)

In many situations, the main interest lies in the marginal distribution of ξ, for
prediction, rather than in the value of the parameter θ. This marginal distribution
is called in the statistical literature a compound distribution and can be obtained as

$P[\xi = s] = \int f(\theta)\, P[\xi = s \mid \Theta = \theta]\, d\theta$   (3)

if θ is continuous.
If θ is discrete and finite, then the compound distribution is called a mixture
of distributions and is given by

$P[\xi = s] = \sum_{\theta} P[\xi = s \mid \Theta = \theta]\, P[\Theta = \theta]$   (4)
88 M.A. Fajardo-Caldera and J. Perez-Mayo

4. A compound distribution when the parameters of a binomial distribution are variable (n) and fixed (p)
Assume the case of a random variable ξ with a binomial distribution with
parameter θ, p being fixed, as defined below:

$P[\xi = s \mid \theta] = \binom{\theta}{s} p^s (1-p)^{\theta-s}, \quad s = 0, 1, \ldots, \theta$   (5)

Assume that θ varies across the population according to a Poisson
distribution with parameter λ:

$P[\Theta = \theta] = \frac{\lambda^{\theta}}{\theta!} e^{-\lambda}, \quad \theta = 0, 1, 2, \ldots$   (6)

Then, from (5) and (6) one obtains the marginal distribution of ξ, a Poisson-
Binomial compound distribution, given by:

$P[\xi = s] = \sum_{\theta \ge s} \binom{\theta}{s} p^s (1-p)^{\theta-s} \frac{\lambda^{\theta}}{\theta!} e^{-\lambda} = \frac{(\lambda p)^s}{s!} e^{-\lambda p}$   (7)

This distribution matches the Poisson distribution with parameter λp.


As an application, consider a population in which the probability of having
a male newborn is the parameter p (fixed) and one wants to compute the
probability of having s males in a family.
If the probability for a newborn to be male is p = 0.51, the probability of
having s males in a family of n children is given by:

$\binom{n}{s} 0.51^s (1-0.51)^{n-s}$

Then, by assuming that $\frac{\lambda^n}{n!} e^{-\lambda}$ is the probability of having n children, the
probability for any family to have s male children will be:

$\frac{(0.51\lambda)^s}{s!} e^{-0.51\lambda}$
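The identity in (7) is easy to confirm numerically. In the sketch below, λ = 4 and p = 0.51 are illustrative values and the truncation bound nmax is an assumption; the binomial pmf is mixed over a Poisson family size and compared with a Poisson(λp) pmf:

```python
from math import comb, exp, factorial

lam, p = 4.0, 0.51                   # illustrative values of lambda and p

def poisson_pmf(n, rate):
    return rate ** n * exp(-rate) / factorial(n)

def compound_pmf(s, nmax=120):
    # eq. (7): mix the Bi(n, p) pmf over a Poisson(lambda) family size n
    return sum(comb(n, s) * p ** s * (1 - p) ** (n - s) * poisson_pmf(n, lam)
               for n in range(s, nmax))

# the compound pmf should coincide with the Poisson(lambda * p) pmf
diffs = [abs(compound_pmf(s) - poisson_pmf(s, lam * p)) for s in range(8)]
```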

5. A compound distribution when the parameters of a binomial distribution are variable (p) and fixed (n)

Consider that the probability distribution of the random variable ξ given
in (1) depends on the parameter p, r being fixed. Therefore, one can generate a
mixture of distributions by considering the following theoretical situation:

$P[\xi = k \mid r, \theta] = \binom{r}{k} \theta^k (1-\theta)^{r-k}, \quad k = 0, 1, \ldots, r$   (8)

Suppose that θ varies across the population according to a Beta distribution,

$f(\theta \mid \alpha, \beta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}, \quad \alpha > 0,\ \beta > 0,\ 0 < \theta < 1$   (9)

where B(α, β) is the complete beta function. Since θ is not observable, the
probability distribution of k in r trials, given α and β, for a randomly chosen
member is the following simple mixture model

$P[\xi = k \mid r, \alpha, \beta] = \int_0^1 P[\xi = k \mid r, \theta]\, f(\theta; \alpha, \beta)\, d\theta$   (10)

By using (8) and (9), the probability distribution of ξ is defined as

$P[\xi = k \mid r, \alpha, \beta] = \binom{r}{k} \frac{B(\alpha + k, \beta + r - k)}{B(\alpha, \beta)}, \quad k = 0, 1, \ldots, r$   (11)
This compound distribution is the beta-binomial model.
As an application of this beta-binomial model, suppose one wants to
determine the probability that k randomly selected elements of a population have
influenza, knowing that the distribution of the proportion θ of elements
with influenza is Be(θ | 3, 12) and that a random sample of 20
elements contained five sick people. Then, considering the former, the
probability that k randomly selected elements of the population have influenza
is given by

$P[\xi = k \mid 20, 3, 12] = \binom{20}{k} \frac{B(3 + k, 12 + 20 - k)}{B(3, 12)}$
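A minimal sketch of the beta-binomial pmf (11), with the complete beta function expressed through the gamma function, and with the influenza example's values r = 20, α = 3, β = 12 plugged in:

```python
from math import comb, gamma

def beta_fn(a, b):
    # complete beta function via the gamma function
    return gamma(a) * gamma(b) / gamma(a + b)

def beta_binom_pmf(k, r, a, b):
    # eq. (11): C(r, k) B(a + k, b + r - k) / B(a, b)
    return comb(r, k) * beta_fn(a + k, b + r - k) / beta_fn(a, b)

r, a, b = 20, 3, 12                             # the influenza example
pmf = [beta_binom_pmf(k, r, a, b) for k in range(r + 1)]
total = sum(pmf)                                # should be 1
mean = sum(k * w for k, w in enumerate(pmf))    # should be r*a/(a+b) = 4.0
```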

6. Poisson compound distributions


It is usual to find situations where the binomial distribution is applied with
a small p and a large r. Assuming rp = λ constant and taking limits in the
binomial distribution as r → ∞, one obtains the distribution
known as Poisson's Law. This reasoning means that the Poisson distribution
approximates the binomial distribution rather well if r is large and p small.
Then, under those circumstances, the probability distribution
given by (1) becomes a Poisson distribution with the following conditional density
function, given Θ = θ:

$f(y \mid \theta) = \frac{\theta^y}{y!} e^{-\theta}, \quad \text{with } \theta > 0 \text{ and } y = 0, 1, 2, \ldots$   (12)

Let Θ be a r.v. with density function f(θ) and assume that Θ follows a
Gamma distribution, given by:

$f(\theta) = \frac{\alpha^u}{\Gamma(u)} \theta^{u-1} e^{-\alpha\theta}$ for θ > 0, and f(θ) = 0 for θ ≤ 0, with u > 0, α > 0   (13)

Then, the compound Gamma-Poisson distribution is given by:

$P[Y = y] = \int_0^{\infty} \frac{\theta^y}{y!} e^{-\theta} \frac{\alpha^u}{\Gamma(u)} \theta^{u-1} e^{-\alpha\theta}\, d\theta = \frac{\Gamma(u+y)}{y!\, \Gamma(u)} \left( \frac{\alpha}{1+\alpha} \right)^u \left( \frac{1}{1+\alpha} \right)^y$   (14)

which is known as the negative binomial distribution with parameters (u, p),
where p = α/(1 + α). This distribution, introduced by Greenwood and Yule, has
been used to represent industrial accidents: the probability of y accidents for a
given worker is given by a Poisson distribution with parameter θ, but this
parameter changes from worker to worker and, assuming it follows a Gamma
distribution, one observes that the observations follow the compound
distribution above.
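The closed form (14) can be verified by direct quadrature. In the sketch below, u = 3 and α = 2 are illustrative values, and the midpoint-rule step and truncation point are assumptions; the Poisson pmf is integrated against the Gamma density and compared with the negative binomial formula:

```python
from math import exp, gamma

u, alpha = 3.0, 2.0                  # illustrative Gamma mixing parameters
p, q = alpha / (1 + alpha), 1 / (1 + alpha)

def nb_pmf(y):
    # eq. (14): negative binomial with parameters (u, p)
    return gamma(u + y) / (gamma(u) * gamma(y + 1)) * p ** u * q ** y

def mixture_pmf(y, h=1e-3, tmax=60.0):
    # midpoint-rule integral of the Poisson(theta) pmf against the Gamma density
    total, t = 0.0, h / 2
    while t < tmax:
        poisson = t ** y * exp(-t) / gamma(y + 1)
        gam = alpha ** u / gamma(u) * t ** (u - 1) * exp(-alpha * t)
        total += poisson * gam * h
        t += h
    return total

errors = [abs(mixture_pmf(y) - nb_pmf(y)) for y in range(5)]
```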

7. On characterizing some discrete distributions by linear regression


In this section, we consider characterizations of discrete distributions, in the
context of some well known bivariate distributions, by the conditional
distribution of one random variable (r.v.) given another one, and the linear
regression of the second r.v. on the first.
Let X be a discrete r.v. taking the values 0, ..., m, where m may be a positive
integer or ∞. Let the conditional distribution of another r.v. Y given X be given
by:

$P[Y = y \mid X = x] = \binom{x}{y} p^y q^{x-y}$   (15)

with 0 < p < 1 and q = 1 − p. Here p is independent of x.
Finally, let

$E[X \mid Y = y] = a y + b$   (16)

where a and b are constants.
The question is: which distributions of X are characterized by (15) and (16)?
We have the following propositions that prove, among other things, that a
and b in (16) must be positive.

Proposition 6.1
Under the previous conditions, with E(X) finite, we have the following:
(A) b > 0; (B) X is bounded if, and only if, 0 < a < 1; if X is bounded, then
b = m(1 − a); (C) 0 < a < p⁻¹.

Proof.
(A) Specifying (16) for y = 0, we have E[X | Y = 0] = b. On the other hand,

$P[Y = y] = \sum_x P[X = x, Y = y] = \sum_x P[X = x]\, P[Y = y \mid X = x] = \sum_x \binom{x}{y} p^y q^{x-y} P[X = x]$   (17)

For y = 0 and y = m, we have that:

$P[Y = 0] = \sum_{x=0}^{m} q^x P[X = x]$   (18)

and

$P[Y = m] = p^m P[X = m]$   (19)

Then, since X ≥ 0 we have that

$0 \le E[X \mid Y = 0] = \sum_x x P[X = x \mid Y = 0] = \sum_x \frac{x P[X = x]\, P[Y = 0 \mid X = x]}{P[Y = 0]} = \sum_x x q^x \frac{P[X = x]}{P[Y = 0]}$

If b = 0, then P[X = x] = 0 for x = 1, 2, ..., m and hence X is degenerate at 0,
which is a contradiction. Then b > 0.
(B) Let X be bounded, that is, let m be a positive integer. Then, specifying (16)
for y = m, we have

$E[X \mid Y = m] = a m + b$   (20)

On the other hand, by (19),

$E[X \mid Y = m] = \sum_x x P[X = x \mid Y = m] = \frac{m\, p^m P[X = m]}{P[Y = m]} = m$   (21)

Then, from (20) and (21), we have that m = E[X | Y = m] = am + b, so
b = m(1 − a); since b > 0, this implies a < 1 (that a > 0 follows from (C)).
Now, assume that X is unbounded. Noting that Y also takes the values 0, 1, 2, ...
and Y ≤ X, we have from (16) that, for y = 0, 1, 2, ...,

$y = \sum_{x \ge y} y P[X = x \mid Y = y] \le \sum_{x \ge y} x P[X = x \mid Y = y] = E[X \mid Y = y] = a y + b$   (22)

or (1 − a) y ≤ b, which cannot hold for all the nonnegative integers unless
1 − a ≤ 0. This proves (B).
(C) By considering that

$E[X \mid Y = y] = \sum_x x P[X = x \mid Y = y] = a y + b$   (23)

and by multiplying both sides of (23) by P[Y = y] and summing over y, one can
obtain that

$\sum_y \sum_x x P[X = x, Y = y] = \sum_y (a y + b) P[Y = y]$   (24)

From (24) we have

$E[X] = a E[Y] + b$   (25)

Besides,

$E[Y \mid X = x] = \sum_y y P[Y = y \mid X = x] = p x$   (26)

and by summing both sides of equation (26), we have

$E[Y] = \sum_x \sum_y y P[X = x, Y = y] = \sum_x p x P[X = x] = p E[X]$   (27)

and, therefore, from equations (25) and (27), one can obtain that

$E[X] = \frac{b}{1 - ap} > 0$

from which a < 1/p, as we wanted to prove in (C).

Theorem 6.1
Assume that X is a discrete r.v. taking the values 0, ..., m, where m may be a
positive integer or ∞, and let E[X] be finite. Let Y be another r.v. whose
conditional distribution given X is given by (15). Then (16) holds for some
constants a, b if, and only if, X is binomial, negative binomial or Poisson.
Furthermore, X is binomial iff 0 < a < 1, negative binomial iff a > 1, and
Poisson iff a = 1.
Proof (⇒):
Let (16) hold for some constants a and b. Writing P[X = x], x = 0, ..., m, and
G(t) = E[t^X] for the probability generating function (p.g.f.) of X, we have

$E[X \mid Y = y] = \sum_x x P[X = x \mid Y = y] = \sum_x x \frac{P[X = x]\, P[Y = y \mid X = x]}{P[Y = y]} = \sum_{x \ge y} x \binom{x}{y} p^y q^{x-y} \frac{P[X = x]}{P[Y = y]} = a y + b$   (28)

for y = 0, 1, ..., m. That is,

$\sum_{x \ge y} x \binom{x}{y} p^y q^{x-y} P[X = x] = (a y + b) P[Y = y]$   (29)

If we use the identity $x \binom{x}{y} = (y+1) \binom{x}{y+1} + y \binom{x}{y}$, we have

$\sum_x \left[ (y+1) \binom{x}{y+1} + y \binom{x}{y} \right] p^y q^{x-y} P[X = x] = (a y + b) P[Y = y]$   (30)

Then, by (30), we have

$(y+1) \frac{q}{p} P[Y = y+1] = (a-1) y P[Y = y] + b P[Y = y]$   (31)

By multiplying both sides of (31) by t^y and summing over y = 0, 1, ..., m, we
obtain the differential equation

$\frac{q}{p} H'(t) = (a-1)\, t H'(t) + b H(t)$   (32)

for the p.g.f. H(t) of Y. However, it is known that the p.g.f. of Y is given by

$H(t) = \sum_y t^y P[Y = y] = \sum_y t^y \sum_x \binom{x}{y} p^y q^{x-y} P[X = x] = \sum_x \sum_y \binom{x}{y} (tp)^y q^{x-y} P[X = x] = \sum_x (tp + q)^x P[X = x] = G(tp + q)$   (33)

From (32) and (33), we get the differential equation

$q\, G'(t_1) = (a-1)(t_1 - q)\, G'(t_1) + b\, G(t_1)$   (34)

for the p.g.f. G(t₁) of X (remember that t₁ = pt + q). To solve (34) for G(t₁), we
consider two cases: (i) a = 1 and (ii) a ≠ 1.
When a = 1, qG'(t₁) = bG(t₁), so

$G(t_1) = c\, e^{b t_1 / q}$   (35)

with c = e^{−b/q}. Thus, G(t) = e^{b(t−1)/q}, showing thereby that X is Poisson with
λ = b/q (b > 0). [Remember that G(t) = e^{λ(t−1)}, with λ > 0, is the p.g.f. of the
Poisson distribution.]

When a ≠ 1, the equation (34) can be written as

$\frac{G'(t_1)}{G(t_1)} = \frac{b}{q - (a-1)(t_1 - q)}$

whose solutions are

$G(t_1) = c\, \{ q - (a-1)(t_1 - q) \}^{-v}, \quad v = b/(a-1), \quad c = (1 - ap)^v$   (36)

Thus,

$G(t_1) = \left[ \frac{1 - ap}{aq - (a-1) t_1} \right]^v$   (37)

Now, let 0 < a < 1. Then, Proposition 6.1(B) shows that b = m(1 − a), so
v = −m. Thus, considering the equation (37), we have

$G(t_1) = \left[ \frac{aq + (1-a) t_1}{1 - ap} \right]^m$   (38)

From (38) it follows that X has a binomial distribution with parameters
(m, α), where α = (1 − a)/(1 − ap). [Remember that G(t) = (αt + 1 − α)^m is the
p.g.f. of the binomial distribution.]
Finally, assume a > 1. Considering the equation (37), it follows that X has a
negative binomial distribution with parameters (v, α), where α = (1 − ap)/aq. By
Proposition 6.1(C) we have ap < 1, so α is indeed positive. [Remember that
G(t) = (α/(1 − (1 − α)t))^v is the p.g.f. of the negative binomial distribution.]
This proves the "only if" part of the theorem.
(⇐) To prove the "if" part of the theorem, we consider the following three cases.
(A) Suppose that X is Poisson with parameter λ and that Y | X = x is given by
(15). Then E[X | Y = y] = ay + b, where a and b are constants. Indeed, the joint
distribution of the bivariate random vector (X, Y) is

$P[X = x, Y = y] = P[X = x]\, P[Y = y \mid X = x] = e^{-\lambda} \frac{\lambda^x}{x!} \binom{x}{y} p^y q^{x-y} = e^{-\lambda} \frac{(\lambda p)^y (\lambda q)^{x-y}}{y!\, (x-y)!}$   (39)

Then,

$P[Y = y] = \sum_{x \ge y} P[X = x, Y = y] = e^{-\lambda} \frac{(\lambda p)^y}{y!} \sum_{x \ge y} \frac{(\lambda q)^{x-y}}{(x-y)!} = e^{-\lambda} \frac{(\lambda p)^y}{y!} e^{\lambda q} = e^{-\lambda p} \frac{(\lambda p)^y}{y!}$   (40)

Therefore, Y has a Poisson distribution with parameter λp.


From (39) and (40), we have that:

$P[X = x \mid Y = y] = \frac{P[X = x, Y = y]}{P[Y = y]} = \frac{e^{-\lambda q} (\lambda q)^{x-y}}{(x-y)!}, \quad x = y, y+1, \ldots$   (41)

Therefore, X | Y = y follows, up to the shift y, a Poisson distribution with
parameter λq. The conditional expectation E[X | Y = y] is the following:

$E[X \mid Y = y] = \sum_{x \ge y} x \frac{e^{-\lambda q} (\lambda q)^{x-y}}{(x-y)!} = \sum_{x \ge y} (x - y + y) \frac{e^{-\lambda q} (\lambda q)^{x-y}}{(x-y)!} = \lambda q + y$   (42)

which has the form of (16), with a = 1 and b = λq.
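The linearity in (42) is easy to confirm numerically. A minimal sketch, where λ = 3 and p = 0.4 are arbitrary illustrative values and the truncation bound xmax is an assumption:

```python
from math import comb, exp, factorial

lam, p = 3.0, 0.4                    # illustrative parameter values
q = 1 - p

def joint(x, y):
    # eq. (39): X ~ Poisson(lam), Y | X = x ~ Bi(x, p)
    return exp(-lam) * lam ** x / factorial(x) * comb(x, y) * p ** y * q ** (x - y)

def cond_mean_X(y, xmax=100):
    num = sum(x * joint(x, y) for x in range(y, xmax))
    den = sum(joint(x, y) for x in range(y, xmax))
    return num / den

# eq. (42) predicts E[X | Y = y] = y + lam*q, i.e. a = 1 and b = lam*q
errors = [abs(cond_mean_X(y) - (y + lam * q)) for y in range(5)]
```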
(B) Suppose now that X follows a binomial distribution with parameters (m, α)
and that Y | X = x is given by (15). Then E[X | Y = y] = ay + b, where a and b
are constants. Indeed,

$P[X = x, Y = y] = P[X = x]\, P[Y = y \mid X = x] = \binom{m}{x} \alpha^x (1-\alpha)^{m-x} \binom{x}{y} p^y q^{x-y} = \binom{m}{y} (\alpha p)^y \binom{m-y}{x-y} (1-\alpha)^{m-x} (\alpha q)^{x-y}$   (43)

Thus, from (43), we have that

$P[Y = y] = \sum_{x \ge y} P[X = x, Y = y] = \binom{m}{y} (\alpha p)^y \sum_{x \ge y} \binom{m-y}{x-y} (1-\alpha)^{m-x} (\alpha q)^{x-y} = \binom{m}{y} (\alpha p)^y (1 - \alpha p)^{m-y}$   (44)

Therefore, Y has a binomial distribution with parameters (m, αp).
Then, from (43) and (44) we have that

$P[X = x \mid Y = y] = \frac{P[X = x, Y = y]}{P[Y = y]} = \binom{m-y}{x-y} \left( \frac{\alpha q}{1 - \alpha p} \right)^{x-y} \left( \frac{1 - \alpha}{1 - \alpha p} \right)^{m-x}$   (45)

Therefore, X − y | Y = y has a binomial distribution with parameters
(m − y, αq/(1 − αp)). Then

$E[X \mid Y = y] = y + (m - y) \frac{\alpha q}{1 - \alpha p} = \left( 1 - \frac{\alpha q}{1 - \alpha p} \right) y + m \frac{\alpha q}{1 - \alpha p} = \frac{1 - \alpha}{1 - \alpha p}\, y + m \left( 1 - \frac{1 - \alpha}{1 - \alpha p} \right) = a y + m(1 - a) = a y + b$

with a = (1 − α)/(1 − αp).
(C) Suppose now that X has a negative binomial distribution with parameters
(v, α), where α = (1 − ap)/aq, and that Y | X = x is given by (15). Then
E[X | Y = y] = ay + b, where a and b are constants. Indeed,

$P[X = x, Y = y] = P[X = x]\, P[Y = y \mid X = x] = \binom{v+x-1}{x} \alpha^v (1-\alpha)^x \binom{x}{y} p^y q^{x-y} = \binom{v+y-1}{y} \binom{v+x-1}{x-y} \alpha^v \{p(1-\alpha)\}^y \{q(1-\alpha)\}^{x-y}$   (46)

Thus, from (46), we have that

$P[Y = y] = \sum_{x \ge y} P[X = x, Y = y] = \binom{v+y-1}{y} \alpha^v \{p(1-\alpha)\}^y \{1 - q(1-\alpha)\}^{-(v+y)}$   (47)

Therefore, Y has a negative binomial distribution with parameters
(v, α/(1 − q(1 − α))).
Then, from (46) and (47), we have that

$P[X = x \mid Y = y] = \frac{P[X = x, Y = y]}{P[Y = y]} = \binom{v+x-1}{x-y} \{1 - q(1-\alpha)\}^{v+y} \{q(1-\alpha)\}^{x-y}$   (48)

Therefore, X − y | Y = y has a negative binomial distribution with parameters
(v + y, 1 − q(1 − α)). Then

$E[X \mid Y = y] = y + (v + y) \frac{q(1-\alpha)}{1 - q(1-\alpha)} = \frac{1}{1 - q(1-\alpha)}\, y + v \frac{q(1-\alpha)}{1 - q(1-\alpha)} = a y + v(a - 1) = a y + b$

with a = 1/(1 − q(1 − α)), which has the form of (16).

8. On characterizing discrete distributions by taking limits in the binomial distribution

Let Θ be a continuous r.v. with density function f(θ), with θ > 0 and E[Θ]
finite. Let Y be another r.v. whose conditional distribution given Θ = θ is a
Poisson distribution with parameter θ, P(θ). Then E[Θ | Y = y] = ay + b holds
for some constants a, b (a ≠ 0) if, and only if, Θ follows a Gamma distribution.
Proof (⇒):
Considering the particular case of E[Θ | Y = y] = ay + b for y = 0, we have that:

$b = E[\Theta \mid Y = 0] = \int_0^{\infty} \theta f(\theta \mid 0)\, d\theta > 0$   (49)

On the other hand, from E[Y | θ] = θ, we have that

$\theta = E[Y \mid \theta] = \sum_y y P[Y = y \mid \theta] = \sum_y y \frac{P[Y = y, \Theta = \theta]}{f(\theta)}$

from which

$\theta f(\theta) = \sum_y y P[Y = y, \Theta = \theta]$   (50)

By integrating both sides of (50),

$\int_0^{\infty} \theta f(\theta)\, d\theta = \sum_y y \int_0^{\infty} P[Y = y, \Theta = \theta]\, d\theta \;\Rightarrow\; E[\Theta] = E[Y]$   (51)
Let

$a y + b = E[\Theta \mid y] = \int_0^{\infty} \theta f(\theta \mid y)\, d\theta = \int_0^{\infty} \theta \frac{P[Y = y, \Theta = \theta]}{P[Y = y]}\, d\theta \;\Rightarrow\; (a y + b) P[Y = y] = \int_0^{\infty} \theta P[Y = y, \Theta = \theta]\, d\theta$   (52)

By summing over y on both sides of (52) and considering (51), we have that:

$\sum_y (a y + b) P[Y = y] = \sum_y \int_0^{\infty} \theta P[Y = y, \Theta = \theta]\, d\theta \;\Rightarrow\; a E[Y] + b = E[\Theta] \;\Rightarrow\; E[\Theta] = b/(1-a) > 0$

Therefore, a < 1.
From

$P[Y = y] = \int_0^{\infty} \frac{\theta^y}{y!} e^{-\theta} f(\theta)\, d\theta$   (53)

one obtains that

$a y + b = E[\Theta \mid Y = y] = \frac{1}{P[Y = y]} \int_0^{\infty} \theta \frac{\theta^y}{y!} e^{-\theta} f(\theta)\, d\theta = \frac{(y+1) P[Y = y+1]}{P[Y = y]}$   (54)

From (54):

$(a y + b) P[Y = y] = (y + 1) P[Y = y + 1]$   (55)

Multiplying both sides of equation (55) by t^y and summing over y, one obtains
that

$a t H'(t) + b H(t) = H'(t)$   (56)

whose solution is given by

$H(t) = \left( \frac{p}{1 - a t} \right)^v, \quad \text{with } v = b/a \text{ and } p = 1 - a$   (57)

Hence, if 0 < a < 1, H(t) is the probability generating function of a
probability distribution whenever v is a positive real number. When v is a
positive integer, it is the negative binomial distribution, that is, Y ~ NeBi(v, p),
and when v = 1 it is the geometric distribution.
Besides,

$H(t) = \sum_y t^y P[Y = y] = \sum_y t^y \int_0^{\infty} e^{-\theta} \frac{\theta^y}{y!} f(\theta)\, d\theta = \int_0^{\infty} e^{-\theta} \left( \sum_y \frac{(t\theta)^y}{y!} \right) f(\theta)\, d\theta = \int_0^{\infty} e^{\theta(t-1)} f(\theta)\, d\theta = G(t^*)$

where G(t*) is the m.g.f. of Θ with t* = t − 1.
By replacing H(t) by G(t*) in the differential equation (56):

$a (t^* + 1) G'(t^*) + b G(t^*) = G'(t^*)$   (58)

whose solution is given by:

$G(t^*) = \left( \frac{1}{1 - \beta t^*} \right)^v, \quad \text{with } v = b/a \text{ and } \beta = a/(1-a) = q/p$   (59)

If 0 < a < 1, then G(t*) is the moment generating function of a Gamma
distribution γ(v, β). Besides, if v = 1, it is an exponential distribution and,
finally, if v = n/2 and β = 2, it is a χ² distribution with n degrees of freedom.
(⇐) To prove the converse, considering that Y | θ ~ P(θ) and Θ ~ Ga(v, β), the
joint distribution of (Y, Θ) is given by:

$P[Y = y, \Theta = \theta] = P[Y = y \mid \theta]\, f(\theta) = \frac{\theta^y}{y!} e^{-\theta} \frac{1}{\Gamma(v) \beta^v} \theta^{v-1} e^{-\theta/\beta}$   (60)

Dividing (60) by the marginal density function of Y ~ NeBi(v, p), with
p = 1/(1 + β) and q = 1 − p:

$f(\theta \mid y) = \frac{P[Y = y, \Theta = \theta]}{P[Y = y]} = \frac{\theta^{v+y-1} e^{-\theta/q}}{\Gamma(v+y)\, q^{v+y}} \;\Rightarrow\; f(\theta \mid y) \sim Ga(v + y, q)$   (61)

from which it is obtained that E[Θ | y] = q(v + y), which is linear, as we
wanted to prove.
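The linear posterior expectation can be confirmed by direct quadrature. In this sketch, the prior shape v and scale β are arbitrary illustrative values, and the quadrature step and truncation point are assumptions; the posterior mean should equal (β/(1+β))(v + y):

```python
from math import exp, gamma

v, beta = 2.5, 1.5                   # illustrative prior: Theta ~ Ga(shape v, scale beta)

def post_mean(y, h=1e-3, tmax=80.0):
    # E[Theta | Y = y] by a midpoint rule over the joint density, eq. (60)
    num = den = 0.0
    t = h / 2
    while t < tmax:
        prior = t ** (v - 1) * exp(-t / beta) / (gamma(v) * beta ** v)
        lik = t ** y * exp(-t) / gamma(y + 1)
        num += t * lik * prior * h
        den += lik * prior * h
        t += h
    return num / den

a = beta / (1 + beta)                # slope: posterior is Ga(v + y, beta/(1 + beta))
errors = [abs(post_mean(y) - a * (v + y)) for y in range(4)]
```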

References
1. Dacey, M.F. (1972). A family of discrete probability distributions defined
by the generalized hypergeometric series. Sankhya, Series B, 34, 243-250.
2. Elderton, William P. and Johnson, Norman L. (1969). System of Frequency
Curves. Cambridge University Press.
3. Fajardo Caldera, M.A. (1985). Generalizaciones de los sistemas
Pearsonianos discretos. Tesis doctoral. Universidad de Extremadura.
4. Hermoso Gutierrez, J.A. (1986). Estudio sobre distribuciones generadas por
funciones hipergeometricas de argumento matricial. Tesis doctoral.
Universidad de Granada
5. Herrerias Pleguezuelo, R. (1975). Sobre las estructuras estadisticas de
Pearson y exponenciales: problemas asociados. Tesis doctoral. Universidad
de Granada.

6. Korwar R.M. (1975). On characterizing some discrete distributions by


linear regression. Communications in Statistics, 4(12), 1133-1147.
7. Ord, J.K. (1972). Families of Frequency Distributions. New York: Hafner
Publishing Company.
8. Pearson, K. (1920a). Systematic Fitting of Curves to Observations.
Biometrika, 1, 265-303.
9. Pearson, K. (1920b). Systematic Fitting of Curves to Observations.
Biometrika, 2, 1-23.
10. Rodriguez Avi, J. (1993). Contribution a los metodos de generacion de
distribuciones multivariantes discretas. Tesis doctoral. Universidad de
Granada.
11. Saez Castillo, A.J. (2002). Generacion de distribuciones multivariantes
discretas mediante funciones hipergeometricas de argumento matricial.
Tesis doctoral. Universidad de Granada.
Chapter 6
SOME STOCHASTIC PROPERTIES IN SAMPLING FROM THE
NORMAL DISTRIBUTION

J.M. FERNANDEZ-PONCE
Departamento de Estadistica e I.O., Universidad de Sevilla

T. GOMEZ-GOMEZ
Departamento de Estadistica e I.O., Universidad de Sevilla

J.L. PINO-MEJIAS
Departamento de Estadistica e I.O., Universidad de Sevilla

R. RODRIGUEZ-GRINOLO
Departamento de Estadistica e I.O., Universidad de Sevilla

Univariate stochastic and dispersive orderings have been extensively characterized by
many authors over the last two decades. Stochastic orderings are also applied in
Economics. In particular, it is interesting to compare situations where one utility function
(or one distribution function) is obtained from the other by means of some operation that
has an economic meaning. To this end, stochastic properties for distributions associated
with the normal distribution in sampling are studied in this paper. An application of the
multivariate dispersion order to the problem of detection and characterization of
influential observations in regression analysis is also shown. This problem can often be
reduced to comparing two multivariate t-distributions.

1. Introduction
Stochastic orderings arise in statistical decision theory in the comparison of
experiments and estimation problems. Many useful characterizations of the
usual stochastic and dispersion orders can be found in the literature. An excellent
handbook is Shaked and Shanthikumar [13]. One of the most interesting
characterizations of the dispersion order is given in Shaked [12]. In particular,
dispersion and spread have been used to characterize the variability of
distributions and have been extensively studied (see Lewis and Thompson [10];
Shaked [12]; Hickey [8]; Rojo and He [11]; Fernandez-Ponce et al. [6];
among others). An extension of the univariate dispersion order to the
multivariate case was given by Giovagnoli and Wynn [7].
Stochastic orderings are also applied in Economics. The typical problems that
can be considered are how two different people with two different utilities react to
the same uncertain situation and how one person reacts to two different
uncertain situations. Stochastic orderings come into play only in the second
problem, but the two questions are deeply related (one is, in some sense, the
dual of the other). In particular, it is interesting to compare situations where one
utility function (or one distribution function) is obtained from the other by
means of some operation that has an economic meaning.
The paper is organized as follows. In Section 2, the usual stochastic and
dispersion orders are introduced, together with some interesting characterization
theorems which will be used later. In Section 3, stochastic properties for
distributions in sampling from the normal distribution are studied. In Section 4, an
application of the multivariate dispersion order in Bayesian influence analysis
is explained.

2. Univariate stochastic orderings


In this section, the usual stochastic and dispersive orderings are introduced.
Moreover, some interesting characterization theorems are given, which will be
used to compare the distributions associated with the normal distribution in
sampling.
Definition 2.1. The random variable X is said to be smaller than the random
variable Y with respect to the usual stochastic order, denoted as X ≤st Y,
if F_X(t) ≥ F_Y(t) for all t ∈ ℝ or, equivalently, if $\bar F_X(t) \le \bar F_Y(t)$ for all
t ∈ ℝ, where $\bar F_X$ denotes the survival function of X given by
$\bar F_X(t) = P(X > t)$.
At first sight it might seem counterintuitive to say that X ≤st Y if
F_X(t) ≥ F_Y(t) for all t ∈ ℝ. On the one hand, it is clear that we want to
define Y as stochastically larger than X when Y has large values with higher
probability than X. However, the distribution function describes the probability
of assuming small values; hence the reversal of the inequality sign holds. A
closure property of the stochastic ordering is given in the next theorem.
Theorem 2.1. Let {X_i, i = 1, 2, ...} be a sequence of non-negative
independent random variables, and let M be a non-negative integer-valued
random variable which is independent of the X_i's. Let {Y_i, i = 1, 2, ...} be
another sequence of non-negative independent random variables, and let N be
a non-negative integer-valued random variable which is independent of the Y_i's.
If X_i ≤st Y_i for all i and M ≤st N, then

$\sum_{j=1}^{M} X_j \le_{st} \sum_{j=1}^{N} Y_j$

Proof. See Shaked and Shanthikumar [13].


It seems intuitive that the usual stochastic order can be characterized by
using the corresponding density functions. A sufficient condition to order two
random variables in the usual stochastic sense is given in the following theorem.
A definition is previously needed. Let a(x) be defined on I, where I is a
subset of the real line. The number of sign changes of a in I is defined by
S⁻(a) = sup S⁻[a(x₁), ..., a(x_m)], where S⁻(y₁, y₂, ..., y_m) is
the number of sign changes of the indicated sequence and the supremum is
extended over all sets x₁ < x₂ < ... < x_m such that x_i is in I and m ≥ 1.
Theorem 2.2. Let X and Y be two random variables with density functions f
and g, respectively. If S⁻(g − f) = 1 and the sign sequence is −, +, then
X ≤st Y.
Definition 2.2. Let X and Y be two random variables with distribution
functions F and G, respectively. Let F⁻¹ and G⁻¹ be the right continuous
inverses of F and G, respectively. Then X is said to be smaller than Y in
the dispersive sense (X ≤disp Y) if F⁻¹(v) − F⁻¹(u) ≤ G⁻¹(v) − G⁻¹(u) whenever
0 < u ≤ v < 1.
In the following theorems, necessary and sufficient conditions are given to
order two distributions in the dispersive sense. Let supp(X) be the support of the
random variable X, that is, supp(X) = {x ∈ ℝ : x > F⁻¹(0)}.
Theorem 2.3. X ≤disp Y if and only if Y =st φ(X) for some increasing function φ
with φ(x₂) − φ(x₁) ≥ x₂ − x₁ for all x₁ ≤ x₂ with x₁, x₂ in supp(X). Furthermore, if
this is the case, then
φ(x) = G⁻¹F(x) for all x in supp(X).
Proof. See Shaked [12].
Theorem 2.4. The random variable X satisfies X ≤disp X + Y for any Y independent
of X if and only if X has a log-concave density.
Proof. See Droste and Wefelmeyer [4].

3. Stochastic properties in normal sampling


In this section, the usual stochastic and dispersive orderings are studied for
the distributions associated with the normal distribution in sampling.

3.1. The normal distribution


Let X and Y be two random variables with normal distributions N(μ₁, σ₁)
and N(μ₂, σ₂), respectively.
Theorem 3.1. Assume that σ₁ = σ₂. X ≤st Y if and only if μ₁ ≤ μ₂.
Proof. Since G(t) = F(t − μ₂ + μ₁), we have μ₁ ≤ μ₂ if and only if

F(t) ≥ F(t − μ₂ + μ₁) = G(t) for all t ∈ ℝ.

Now assume that μ₁ = μ₂. Then X and Y cannot be compared in the usual
stochastic sense. Consider the following example: let X and Y be two normal random
variables with distributions N(0, 1) and N(0, 3), respectively. Hence, it is
obtained that
F(0.5) = 0.69 > G(0.5) = 0.56
F(−1) = 0.15 < G(−1) = 0.36
Consequently, the usual stochastic ordering does not hold.
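The crossing of the two distribution functions can be reproduced with the error function; a minimal sketch of the N(0, 1) versus N(0, 3) example:

```python
from math import erf, sqrt

def norm_cdf(t, sigma):
    # distribution function of N(0, sigma) via the error function
    return 0.5 * (1 + erf(t / (sigma * sqrt(2))))

F = lambda t: norm_cdf(t, 1.0)       # X ~ N(0, 1)
G = lambda t: norm_cdf(t, 3.0)       # Y ~ N(0, 3)

# the cdfs cross, so neither X <=st Y nor Y <=st X can hold
crossing = (F(0.5) > G(0.5)) and (F(-1.0) < G(-1.0))
```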

Theorem 3.2. X ≤disp Y if and only if σ₁ ≤ σ₂.
Proof. By taking into account that the function φ(x) = (σ₂/σ₁)(x − μ₁) + μ₂ is an
expansion function when σ₁ ≤ σ₂, the result is immediately obtained by applying
Theorem 2.3.
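Theorem 3.2 can be illustrated numerically by checking the quantile-spread inequality of Definition 2.2 on a grid. This is a sketch: σ₁ = 1, σ₂ = 2 and the probability grid are arbitrary choices, and the quantiles are obtained by bisection:

```python
from math import erf, sqrt

def norm_cdf(t, sigma):
    return 0.5 * (1 + erf(t / (sigma * sqrt(2))))

def norm_quantile(u, sigma):
    # bisection inverse of the N(0, sigma) distribution function
    lo, hi = -60.0, 60.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if norm_cdf(mid, sigma) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

s1, s2 = 1.0, 2.0                    # sigma1 <= sigma2, so X <=disp Y is expected
grid = [0.05 * i for i in range(1, 20)]
max_violation = max(
    (norm_quantile(v, s1) - norm_quantile(u, s1))
    - (norm_quantile(v, s2) - norm_quantile(u, s2))
    for i, u in enumerate(grid) for v in grid[i + 1:]
)
```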

3.2. The χ²-distribution

Let X and Y be two random variables with χ²-distributions with m and n
degrees of freedom, respectively. This fact is denoted as X ~ χ²_m and Y ~ χ²_n.
Theorem 3.3. If m ≤ n then X ≤st Y.
Proof. Assume that M and N are random variables with one-point distributions
on m and n, respectively. Obviously, M ≤st N. Furthermore, assume that X_j and Y_j
are independent random variables with the standard normal distribution; then
X_j² =st Y_j² for all j. Thus, by using Theorem 2.1 it is obtained that

$X =_{st} \sum_{j=1}^{M} X_j^2 \le_{st} \sum_{j=1}^{N} Y_j^2 =_{st} Y$

Theorem 3.4. If m ≤ n then X ≤disp Y.
Proof. It is well known that $X = \sum_{i=1}^{m} Z_i^2$ and $Y = \sum_{i=1}^{n} Z_i^2 = X + \sum_{i=m+1}^{n} Z_i^2$,
where Z_i ~ N(0, 1). By using that the χ²-distribution has a log-concave density,
the result is obtained by Theorem 2.4.
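Theorem 3.3 can be illustrated numerically by evaluating the χ² distribution function through the power series of the regularized incomplete gamma function and checking F_m(t) ≥ F_n(t). A sketch; the degrees of freedom 3 and 7 and the grid of points are arbitrary choices:

```python
from math import exp, lgamma, log

def reg_lower_gamma(k, x):
    # regularized lower incomplete gamma P(k, x) via its power series
    if x <= 0:
        return 0.0
    term = exp(k * log(x) - x - lgamma(k + 1))
    total, n = term, 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (k + n)
        total += term
    return total

def chi2_cdf(x, m):
    return reg_lower_gamma(m / 2, x / 2)

# m = 3 < n = 7 implies F_3(t) >= F_7(t) for every t (usual stochastic order)
ts = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
gaps = [chi2_cdf(t, 3) - chi2_cdf(t, 7) for t in ts]
```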

3.3. The t-Student distribution

Let X be a random variable with a univariate t-Student distribution with m
degrees of freedom and precision parameter σ, denoted as X ~ St(0, σ, m). The
density function is given by

$f(x \mid \sigma, m) = \frac{\Gamma\left(\frac{m+1}{2}\right)}{\Gamma\left(\frac{m}{2}\right)} \left( \frac{\sigma}{m\pi} \right)^{1/2} \left( 1 + \frac{\sigma x^2}{m} \right)^{-(m+1)/2}$ for all x ∈ ℝ.

The standard t-Student distribution, i.e. for σ equal to 1, is denoted by t_m.
Now, let X and Y be two random variables with univariate standard t-Student
distributions with m and n degrees of freedom, respectively. It is easy to
check that X and Y cannot be compared in the usual stochastic sense. Consider the
following example: if X ~ t₂ and Y ~ t₅ then
F(−2) = 0.091 > G(−2) = 0.051
F(1) = 0.788 < G(1) = 0.818.
The definition of a univariate partial order strongly connected with the
univariate dispersive ordering is now needed in order to compare t-Student
distributions in the dispersive sense. Therefore, the tail ordering defined by
Lawrence [9] is considered. Let X and Y be two univariate random variables
symmetric about 0; then we say that X is less than Y in the tail order sense,
denoted as X ≤T Y, if the ratio G⁻¹(u)/F⁻¹(u) is non-decreasing (non-increasing)
for u in (1/2, 1) (u in (0, 1/2)). The following theorem establishes a sufficient
condition to order the t-Student distributions in the dispersive sense.

Theorem 3.3.1. Let St₁(0, 1, m) and St₁(0, 1, n) be two univariate
t-distributions. If n ≤ m then t_m ≤disp t_n.

Proof. Caperaa [3] showed that if n ≤ m then t_m ≤T t_n. In addition, Doksum [5]
showed that for univariate absolutely continuous distributions with F(0) =
G(0) = 0 such that f(0) ≥ g(0) > 0 and G⁻¹(u)/F⁻¹(u) non-decreasing for all u in
(0, 1), F ≤disp G holds. In light of this, we consider the random
variable |t_m| with density function given by f_{|t_m|}(t) = 2 f_{t_m}(t) if t > 0 and 0
otherwise. A straightforward computation shows that the distribution function of
|t_m| is F_{|t_m|}(t) = 2 F_{t_m}(t) − 1 for t ≥ 0. Hence, F⁻¹_{|t_m|}(u) = F⁻¹_{t_m}[(u + 1)/2] for
all u in the interval (0, 1). Therefore, by using Caperaa [3], if n ≤ m then
G⁻¹(u)/F⁻¹(u) is non-decreasing for all u in (0, 1). Since F_{|t_m|}(0) = F_{|t_n|}(0) and
f_{|t_m|}(0) ≥ f_{|t_n|}(0), by using the result in Doksum [5], |t_m| ≤disp |t_n| holds. It is
easy to check, by using properties of symmetry, that this last result implies that
t_m ≤disp t_n.
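The theorem can be illustrated numerically for t₂ and t₅ (the pair from the example above). The sketch below computes quantiles by integrating the density with a midpoint rule and inverting by bisection (step sizes and bisection ranges are assumptions), then checks the quantile-spread inequality of Definition 2.2:

```python
from math import gamma, pi, sqrt

def t_pdf_factory(m):
    # standard t density with m degrees of freedom
    c = gamma((m + 1) / 2) / (gamma(m / 2) * sqrt(m * pi))
    return lambda x: c * (1 + x * x / m) ** (-(m + 1) / 2)

def t_cdf(t, pdf, h=2e-3):
    # symmetry about 0 plus a midpoint rule on [0, |t|]
    s, x = 0.0, h / 2
    while x < abs(t):
        s += pdf(x) * h
        x += h
    return 0.5 + s if t >= 0 else 0.5 - s

def t_quantile(u, pdf):
    lo, hi = -50.0, 50.0
    for _ in range(45):
        mid = (lo + hi) / 2
        if t_cdf(mid, pdf) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

pdf2, pdf5 = t_pdf_factory(2), t_pdf_factory(5)
us = [0.1, 0.25, 0.5, 0.75, 0.9]
q2 = {u: t_quantile(u, pdf2) for u in us}
q5 = {u: t_quantile(u, pdf5) for u in us}

# n = 2 < m = 5: every quantile spread of t_5 is bounded by that of t_2
max_violation = max((q5[v] - q5[u]) - (q2[v] - q2[u])
                    for i, u in enumerate(us) for v in us[i + 1:])
```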

Note that the degrees of freedom of a t-Student distribution are always
associated with the dispersion and the lack of knowledge of the experiment.
That is, the lower the degrees of freedom, the bigger the dispersion and,
therefore, the bigger the lack of knowledge of the experiment. To study the
implications of the univariate dispersion order in depth, see Shaked and
Shanthikumar [13]. If the precisions are different, the following corollary holds.
Corollary 3.3.1. Let St₁(0, σ₁, m) and St₁(0, σ₂, n) be two univariate
t-distributions which satisfy n ≤ m and σ₂ ≤ σ₁. Then St₁(0, σ₁, m) ≤disp
St₁(0, σ₂, n).
Proof. The proof is trivial and hence is omitted.

3.4. The F-Snedecor distribution

Let X be a random variable with an F-Snedecor distribution with n₁ and m
degrees of freedom in the numerator and denominator, respectively. This fact is
denoted as X ~ F(n₁, m). Assume that X ~ F(n₁, m) and Y ~ F(n₂, m). Then X and Y
cannot be compared in the usual stochastic or dispersive sense. Consider the
following example: if X ~ F(2, 10) and Y ~ F(7, 10) then it is easy to check that
F(0.5) = 0.379 > G(0.5) = 0.185
F(2.5) = 0.868 < G(2.5) = 0.908
Note that the plot of the quantile difference function F⁻¹ − G⁻¹ is neither
non-increasing nor non-decreasing.

Figure 1. Plot of the function (F⁻¹ − G⁻¹)

4. An application
In this section, some of the results obtained above are applied to the particular
t-distribution family. For this purpose, the definition of the t-distribution from
Bernardo and Smith ([2], pp. 139-140) is used. A continuous random vector X
has a multivariate t-distribution, or multivariate Student distribution, of
dimension k, with parameters μ = (μ₁, ..., μ_k), Σ and n, where μ ∈ ℝ^k, Σ is a
symmetric positive definite k × k matrix, and n > 0, if its probability density
function, denoted St_k(x; μ, Σ, n), is

$St_k(x; \mu, \Sigma, n) = c \left[ 1 + \frac{1}{n} (x - \mu)' \Sigma (x - \mu) \right]^{-(n+k)/2}$

for all x ∈ ℝ^k, where

$c = \frac{\Gamma\left(\frac{n+k}{2}\right)}{\Gamma\left(\frac{n}{2}\right) (n\pi)^{k/2}} |\Sigma|^{1/2}$
Although not exactly equal to the inverse of the covariance matrix, the
parameter Σ is often referred to as the precision matrix of the distribution or,
equivalently, as the inverse of the dispersion matrix. In the general case,
E[X] = μ and Var(X) = Σ⁻¹ n/(n − 2). An extension of the univariate dispersion
order to the multivariate case was given by Giovagnoli and Wynn [7]. A
function Φ : ℝⁿ → ℝⁿ is called an expansion if ||Φ(x) − Φ(x')|| ≥ ||x − x'|| for
all x and x' in ℝⁿ. Let X and Y be two n-dimensional random vectors. Suppose
that Y =st Φ(X) for some expansion function Φ. Then we say that X is less than Y
in the strong multivariate dispersion order (denoted by X ≤SD Y). Roughly
speaking, the strong multivariate dispersive order is based on the existence of an
expansion function which maps stochastically a random vector to another one.
The ordering in the ≤SD sense is intuitively reasonable and it satisfies many
desirable properties. The next result serves to compare t-distributions when both
the degrees of freedom and the precision matrices are different.
Corollary 4.1. Let Y₁ ~ St_k(0, Σ₁, m) and Y₂ ~ St_k(0, Σ₂, n) be two multivariate
t-distributions with different precision matrices and degrees of freedom. If
λ(Σ₂⁻¹) ≥ λ(Σ₁⁻¹) and n ≤ m hold, then Y₁ ≤SD Y₂,
where λ(·) is the vector of ordered eigenvalues and ≥ refers to the usual
entrywise ordering.
Proof. See Arias et al. [1].
The following model is considered, Y = Xβ + ε, where ε is an N × 1 random
vector distributed as MN_N(0, θI) (N-dimensional multivariate normal (MN)) with
mean vector zero and covariance matrix θI, θ scalar; β is the p × 1 vector of
regression coefficients; X is an N × p matrix of fixed "independent" variables;
and Y is the N × 1 vector of responses on the "dependent" variable. We assume
the prior density for β and θ to be g(β, θ) ∝ θ⁻¹, where ∝ means that the first
member of this equation is proportional to the second member. This distribution
presumes that little prior information is available relative to that information
inherent in the data. Assume that a particular subset of size k has been
deleted; we denote this by (i), while the subset itself is indicated by i. Then the
general linear model may be expressed as

y′ = (y′_i, y′_(i)) = β′(X′_i, X′_(i)) + (ε′_i, ε′_(i)).
Thus the predictive densities based on the full and the subset-deleted data sets,
when θ is unknown, are two multivariate t-distributions with parameters

St_N(ŷ, (s²(I + H))⁻¹, N − p) and St_N(ŷ_(i), (s²_(i)(I + H_(i)))⁻¹, N − k − p)

where

S = X′X,  H = XS⁻¹X′,  H_(i) = XS_(i)⁻¹X′,  ŷ = Xβ̂,
r = y − ŷ,  ŷ_(i) = Xβ̂_(i),  σ̂² = r′r,  s² = σ̂²/(N − p)

and S_(i), σ̂²_(i), s²_(i) are similarly defined.
In this case, the detection of influential observations is based on
comparing two multivariate t-distributions. If we study the comparison only in
terms of variability, it seems intuitive that if a subset of data is deleted then the
resulting predictive density will be more dispersive than the
predictive density based on the full data. That is, the following order is verified:

f(·) <SD f_(i)(·)
Properties in Sampling from the Normal Distribution 109

This fact may be interpreted as the added variability due to deletion of the data
subset i. However, not every subset of data with a fixed size k has
the same influence. Consequently, a Dispersion Bayesian Influence in terms of
Variability (DBIV) measure for the i-th subset can be defined as

Q_i² = ||λ(s²_(i)(I + H_(i))) − λ(s²(I + H))||²

and the subsets are ordered from least to most influential according to the
magnitude of Q_i². Note that, under the assumptions in Corollary 4.1, if the
inequality

λ(s²_(i)(I + H_(i))) ≥ λ(s²(I + H))

holds then

f(·) <SD f_(i)(·).
For more details on this application see Arias et al. [1].
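As a numerical illustration, the DBIV measure can be computed directly from its definition. The sketch below is a minimal implementation assuming NumPy, with simulated data; the function name is our own, and we take s²_(i) as the residual variance of the subset-deleted fit (on N − k − p degrees of freedom), which is one reasonable reading of "similarly defined":

```python
import numpy as np

def dbiv(X, y, idx):
    """DBIV measure Q_i^2 for deletion of the observation subset `idx`.

    Compares the ordered eigenvalues of s^2 (I + H) (full data) with those
    of s_(i)^2 (I + H_(i)) (subset-deleted data); note that H_(i) is built
    with the full X but with S_(i) = X_(i)' X_(i), as in the text.
    """
    N, p = X.shape
    keep = np.setdiff1d(np.arange(N), idx)
    k = len(idx)

    # Full-data quantities: S = X'X, H = X S^{-1} X', s^2 = r'r / (N - p)
    S = X.T @ X
    H = X @ np.linalg.solve(S, X.T)
    beta = np.linalg.solve(S, X.T @ y)
    r = y - X @ beta
    s2 = (r @ r) / (N - p)

    # Subset-deleted quantities, with s_(i)^2 on N - k - p degrees of freedom
    Xd, yd = X[keep], y[keep]
    Sd = Xd.T @ Xd
    Hd = X @ np.linalg.solve(Sd, X.T)
    bd = np.linalg.solve(Sd, Xd.T @ yd)
    rd = yd - Xd @ bd
    s2d = (rd @ rd) / (N - k - p)

    # Q_i^2 = || lambda(s_(i)^2 (I + H_(i))) - lambda(s^2 (I + H)) ||^2
    lam = np.sort(np.linalg.eigvalsh(s2 * (np.eye(N) + H)))
    lamd = np.sort(np.linalg.eigvalsh(s2d * (np.eye(N) + Hd)))
    return float(np.sum((lamd - lam) ** 2))

# Illustrative data: a small simulated regression, deleting subset {0, 1}
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(25), rng.normal(size=25)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=25)
q = dbiv(X, y, np.array([0, 1]))
```

Subsets with larger Q_i² are then flagged as more influential.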

References
1. Arias-Nicolas, J.P., Fernandez-Ponce, J.M., Luque-Calvo, P. and
   Suarez-Llorens, A. (2005). Multivariate dispersion order and the notion of
   copula applied to the multivariate t-distribution. Probability in the
   Engineering and Informational Sciences, 19, 361-375.
2. Bernardo, J.M. and Smith, A.F.M. (1994). Bayesian Theory. John Wiley
   and Sons.
3. Caperaa, P. (1988). Tail ordering and asymptotic efficiency of rank tests.
   The Annals of Statistics, 16, 470-478.
4. Droste, W. and Wefelmeyer, W. (1985). A note on strong unimodality and
   dispersivity. Journal of Applied Probability, 22(1), 235-239.
5. Doksum, K. (1969). Starshaped transformations and the power of rank tests.
   Annals of Mathematical Statistics, 40, 1167-1176.
6. Fernandez-Ponce, J.M., Kochar, S.C. and Munoz-Perez, J. (1998). Partial
   orderings of distributions based on right spread functions. Journal of
   Applied Probability, 35, 221-228.
7. Giovagnoli, A. and Wynn, H.P. (1995). Multivariate dispersion orderings.
   Statistics and Probability Letters, 22, 325-332.
8. Hickey, R.J. (1986). Concepts of dispersion in distributions: A comparative
   note. Journal of Applied Probability, 23, 924-929.
9. Lawrence, M.J. (1975). Inequalities of s-ordered distributions. Annals of
   Statistics, 3, 413-428.
10. Lewis, T. and Thompson, J.W. (1981). Dispersive distributions and the
    connection between dispersivity and strong unimodality. Journal of Applied
    Probability, 18, 76-90.
11. Rojo, J. and Guo Zhong He. (1991). New properties and characterizations
    of the dispersive ordering. Statistics and Probability Letters, 11, 365-372.
12. Shaked, M. (1982). Dispersive ordering of distributions. Journal of Applied
    Probability, 19, 310-320.
13. Shaked, M. and Shanthikumar, J.G. (1994). Stochastic Orders and Their
    Applications. New York: Academic Press.
Chapter 7
GENERATING FUNCTION AND POLARIZATION

R.M. GARCIA-FERNANDEZ
Department of Quantitative Methods in Economics, University of Granada
Campus de Cartuja s/n. Granada, 18071, Spain

In this paper we apply the generating function to obtain the density of the overall sample.
This density is called the mix density and is proportional to the geometric mean of the
subgroup densities. This approach can be used to measure polarization when it is
understood as an economic distance between distributions. An empirical illustration is
provided using data from the Spanish Household Expenditure Survey corresponding
to the regions of Andalucia and Cataluna, elaborated by the Instituto Nacional de
Estadistica (INE) for the year 1999.

1. Introduction
The main objective of this paper is to extend the economic applications of
the generating function concept. The generating function was defined by
Callejon [1] by considering that the right hand side of the Pearson system, which
is given by:

f′(y)/f(y) = (y − a)/(b₀ + b₁y + b₂y²)

is a function of real variable g(y), that is to say, f′(y)/f(y) = g(y).
The generating function has been applied successfully to the estimation of
the income distribution as we can see for instance in the papers of Herrerias,
Palacios and Ramos [8] and Herrerias, Palacios and Callejon [9]. In addition, the
concept of generating function can be used to generate Lorenz curves and
therefore to measure the inequality of the income distribution [1].
Another economic problem related with the income distribution is the
measurement of the polarization of the income as shown by the increasing
publications related to this topic (see Esteban and Ray [5], Wolfson [14], Tsui
and Wang [13] among others). As we will discuss in Section 4, there are several
approaches to measure the polarization. Following Gertel, Giuliodori and
Rodriguez [7] we are going to focus on the analysis of the polarization when it is

understood as an economic distance between distributions. On this point the
properties of the generating function provide a useful frame to the measure of
the polarization. Assuming that the income distribution is partitioned in
subgroups, by means of the generating function we are going to obtain the
density of the overall sample as a geometric normalized mean of the densities of
the subgroups. These densities will be used to measure the economic distance
between the subgroups distribution.
The approach proposed is developed assuming that the income distribution
follows a gamma distribution. We make this assumption because as we can see
in empirical studies, the gamma distribution has good properties to fit the
income distribution (see among others Lafuente [10] and Prieto [11]). It will be
interesting to use other distributions, but a full exploration of the different
distributions must await a future paper.
This paper is organized as follows. In Section 2 we define the generating
function model and obtain the density function of the overall sample as a mix of
the density functions of each subgroup. In addition this Section shows how the
parameters of the model are estimated. In Section 3, the approach proposed in
Section 2 is applied to a gamma distribution. In Section 4, an introduction to the
measurement of the polarization is provided focusing on the measure that we are
going to use. Section 5, provides an empirical illustration, using the data from
the Spanish Household Expenditure Survey corresponding to the regions of
Andalucia and Cataluna, elaborated by the Instituto Nacional de Estadistica
(INE) for the year 1999. The main conclusions are discussed in Section 6.

2. Generating function
The starting point will be the definition of the generating function provided
by Callejon [1]. Let Y be a real variable defined over the bounded
support (a, b). Suppose that g(y) is a function of real variable such that
i) G(y) = ∫ g(y) dy and ii) ∫ₐᵇ e^G(y) dy < ∞ is verified. Then it is possible to
obtain a continuous probability distribution with density
function f(y) = K e^G(y) (a < y < b), in which K = (∫ₐᵇ e^G(y) dy)⁻¹. Observe that it
is verified:

(d/dy) Ln f(y) = f′(y)/f(y) = g(y)    (1)

The function g(y) receives the name of generating function of probability (for
more details about this function and its properties see Callejon [1]).
Let the support of the distribution be contained in some bounded
interval [a, b]. Assume that the interval is partitioned into n subgroups. Let gᵢ be
the generating function corresponding to each subgroup. It is verified that
g(y) can be expressed as the weighted arithmetic mean:

g(y) = p₁g₁(y) + p₂g₂(y) + ... + pₙgₙ(y)    (2)

where each weight is a non-negative real number and Σᵢ₌₁ⁿ pᵢ = 1.

Denoting by fᵢ the density function associated with subgroup i, and
considering expression (1), we can write f(y) as the following normalized
geometric mean:

f(y) = K f₁(y)^p₁ f₂(y)^p₂ ... fₙ(y)^pₙ

where K is the constant of normalization.

Expressions (1) and (2) allow us to obtain f(y) as a mix of the density
functions of each subgroup. This approach, as we will show, can be used to
study the degree of polarization presented by the distributions.

3. Application to a gamma distribution


In this Section, we describe the previous process assuming that Y follows a
gamma distribution with parameters α, θ. We make this assumption because our
main purpose is to apply this approach to an income distribution, and the gamma
distribution, as empirical studies show, has good properties for fitting income
distributions (see among others Lafuente [10] and Prieto [11]). Of course, it
would be interesting to use other distributions, but a full exploration of the
different distributions must await a future paper.

To divide the sample, we consider particular characteristics, for example
region, occupation, etc., that provide an exhaustive partition of the sample into n
subgroups. For simplicity of exposition, we consider two subgroups whose
generating functions are g₁(y; α₁, θ₁) and g₂(y; α₂, θ₂) respectively.

The generating functions g₁(y; α₁, θ₁) and g₂(y; α₂, θ₂) of a gamma
distribution are defined in the following form (Callejon [1]):

g₁(y; α₁, θ₁) = (α₁ − 1)/y − 1/θ₁    g₂(y; α₂, θ₂) = (α₂ − 1)/y − 1/θ₂
According to expression (2) we can write

g(y) = p₁[(α₁ − 1)/y − 1/θ₁] + p₂[(α₂ − 1)/y − 1/θ₂]
     = [p₁(α₁ − 1) + p₂(α₂ − 1)]/y − (p₁/θ₁ + p₂/θ₂)
     = (α − 1)/y − 1/θ = g(y; α, θ)

Therefore the density of the overall sample is given by:

f(y) = K e^∫g(y)dy = [1/(Γ(α)θ^α)] y^(α−1) e^(−y/θ)

This density is called the mix density and is distributed as a gamma distribution,
where Γ(α) is the gamma function.
Observe that the mix density function is proportional to the geometric mean
of the densities of the subgroups:

f(y) = K [1/(Γ(α₁)θ₁^α₁) y^(α₁−1) e^(−y/θ₁)]^p₁ [1/(Γ(α₂)θ₂^α₂) y^(α₂−1) e^(−y/θ₂)]^p₂    (3)

where K is a constant of renormalization given by:

K = Γ(α₁)^p₁ θ₁^(p₁α₁) Γ(α₂)^p₂ θ₂^(p₂α₂) (p₁/θ₁ + p₂/θ₂)^(p₁α₁+p₂α₂) / Γ(p₁α₁ + p₂α₂)    (4)

Introducing (4) into (3), the mix density function can be rewritten as follows:

f(y) = [1/(Γ(α)θ^α)] y^(α−1) e^(−y/θ)

where α = p₁α₁ + p₂α₂ and θ = (p₁/θ₁ + p₂/θ₂)⁻¹.
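The identification of the mix density can be checked numerically. The following sketch (SciPy assumed, parameter values purely illustrative) verifies that the weighted geometric mean of two gamma densities differs from the gamma density with α = p₁α₁ + p₂α₂ and θ = (p₁/θ₁ + p₂/θ₂)⁻¹ only by the constant K of (4):

```python
import math
import numpy as np
from scipy.stats import gamma

# Illustrative subgroup parameters and weights (p1 + p2 = 1)
a1, t1, p1 = 2.0, 1.5, 0.6
a2, t2, p2 = 4.0, 0.8, 0.4

a = p1 * a1 + p2 * a2              # alpha of the mix density
th = 1.0 / (p1 / t1 + p2 / t2)     # theta of the mix density

y = np.linspace(0.05, 10.0, 200)
geo = gamma.pdf(y, a1, scale=t1) ** p1 * gamma.pdf(y, a2, scale=t2) ** p2
mix = gamma.pdf(y, a, scale=th)

# mix / geo must equal the renormalization constant K of (4) for every y
K = (math.gamma(a1) ** p1 * t1 ** (p1 * a1)
     * math.gamma(a2) ** p2 * t2 ** (p2 * a2)
     * (p1 / t1 + p2 / t2) ** a) / math.gamma(a)
assert np.allclose(mix / geo, K)
```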

In relation to the empirical work, it is necessary to estimate the parameters
of the density, that is, α₁, θ₁, α₂, θ₂, p₁, p₂. We follow these
steps. First, the parameters of the densities f₁(y; α₁, θ₁) and f₂(y; α₂, θ₂) are
estimated using the Method of Maximum Likelihood Estimation (MLE).
That is, we obtain the values of the parameters that maximize the following
loglikelihood functions:

ln L(y₁, ..., y_{n₁}; α₁, θ₁) = −n₁ ln Γ(α₁) − n₁α₁ ln θ₁ + (α₁ − 1) Σᵢ ln yᵢ − n₁Ȳ₁/θ₁

ln L(y₁, ..., y_{n₂}; α₂, θ₂) = −n₂ ln Γ(α₂) − n₂α₂ ln θ₂ + (α₂ − 1) Σᵢ ln yᵢ − n₂Ȳ₂/θ₂

where n₁ and n₂ are the sizes of the two subgroups and Ȳ₁ and Ȳ₂ are the
respective sample means.

The values of the parameters that maximize the above loglikelihood
functions are denoted by α̂₁, θ̂₁, α̂₂, θ̂₂. Secondly, we introduce α̂₁, θ̂₁, α̂₂, θ̂₂
into f(y), and apply again the Method of Maximum Likelihood to
estimate p₁, p₂. Empirical work shows that p̂₁, p̂₂ are good approximations to
the group shares. Observe that the parameters α and θ can be expressed as
functions hᵢ(·) of the parameters α₁, θ₁, α₂, θ₂, p₁, p₂, that is

α = h₁(α₁, α₂, p₁, p₂)
θ = h₂(θ₁, θ₂, p₁, p₂)

Hence, according to Zehna's theorem (Rohatgi [12]) we can
conclude that

α̂ = h₁(α̂₁, α̂₂, p̂₁, p̂₂)
θ̂ = h₂(θ̂₁, θ̂₂, p̂₁, p̂₂)

are the MLE of the parameters α and θ.
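The two-step estimation just described can be sketched as follows, assuming SciPy; the data are simulated and `floc=0` fixes the gamma location parameter at zero, so that `gamma.fit` returns ML estimates of (αᵢ, θᵢ). The weight p₁ is then found by maximizing the mix loglikelihood with the plug-in estimates:

```python
import numpy as np
from scipy.stats import gamma
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
g1 = rng.gamma(shape=2.0, scale=1.5, size=600)   # subgroup 1 sample
g2 = rng.gamma(shape=8.0, scale=1.0, size=400)   # subgroup 2 sample
pooled = np.concatenate([g1, g2])

# Step 1: ML estimates of each subgroup's gamma parameters
a1, _, t1 = gamma.fit(g1, floc=0)
a2, _, t2 = gamma.fit(g2, floc=0)

# Step 2: plug the estimates into the mix density and maximize over p1,
# with p2 = 1 - p1, alpha = p1 a1 + p2 a2, theta = 1 / (p1/t1 + p2/t2)
def negloglik(p1):
    p2 = 1.0 - p1
    a = p1 * a1 + p2 * a2
    th = 1.0 / (p1 / t1 + p2 / t2)
    return -np.sum(gamma.logpdf(pooled, a, scale=th))

res = minimize_scalar(negloglik, bounds=(1e-6, 1.0 - 1e-6), method="bounded")
p1_hat, p2_hat = res.x, 1.0 - res.x
```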
After describing the estimation process, we are going to apply these results
to the polarization measurement.

4. Polarization of the income distribution


First of all, it is necessary to point out that, in this Section, we do not
pretend to make an exhaustive study of polarization measurement. We think
that the method proposed in this paper could be useful to analyze group
polarization, but this paper is a first approach and it is necessary to continue
working on this theme.

Let us first start by defining the notion of polarization. According to
Esteban and Ray [5], "in any given distribution of characteristics we mean by
polarization the extent to which the population is clustered around a small number
of distant poles". Several measures of polarization have been defined according
to different approaches, emphasizing the differences between inequality and
polarization. Wolfson [14] proposed the following measure of polarization based
on the Lorenz curve:

W = 4 [1/2 − L(1/2) − GI/2] (μ/m)

where μ is the mean, m is the median income, L(1/2) is the Lorenz curve evaluated
at the median income and GI is the Gini index.


Tsui and Wang [13], following the measure of Wolfson, defined a new class
of indices expressed by:

P = (θ/N) Σᵢ₌₁ᵏ nᵢ |(mᵢ − m)/m|^r

where N is the total population size, nᵢ is the number of individuals that belong
to group i, k is the number of groups, mᵢ is the median of group i, m is the
overall median income, θ is a positive constant and r takes
values in the interval [0, 1].
Esteban and Ray [5] provided a measure of polarization based on the sum of
antagonisms between individuals that belong to different groups. The
antagonism felt by each individual of group i is the joint result of the inter-group
alienation combined with the sense of identification with the group to which
individual i belongs. The measure proposed by these authors is:

P = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ pᵢ^(1+α) pⱼ |yᵢ − yⱼ|,    1 ≤ α ≤ 1.6

where |yᵢ − yⱼ| represents the alienation (distance) felt by the individuals of
incomes yᵢ and yⱼ. The share of population is given by pᵢ, and pᵢ^α represents the
sense of group identification of each of the pᵢ members of group i within their
own group. The parameter α falls into the interval [1, 1.6] to be consistent with
the set of axioms proposed by Esteban and Ray. Before applying this measure it
is necessary to arrange the population into groups according to characteristics, for
instance, region, race, etc.
Esteban, Gradin and Ray [6] proposed an extension of the Esteban and Ray
measure which corrects the error that may appear when the distribution is
pre-arranged into groups.
As we can see, the previous measures are defined for the discrete case. A
recent paper by Duclos, Esteban and Ray [4] developed the measurement of
income polarization for distributions described by density functions. The measure
proposed by these authors is based on what they refer to as basic densities, that is,
densities unnormalized (by population), symmetric, unimodal and with compact
support. It has the following expression:

P_α(f) = ∫∫ f(x)^(1+α) f(y) |y − x| dy dx,    α ∈ [0.25, 1]

where |y − x| represents the alienation (distance) felt by the individuals located at
x and y. The sense of group identification that an individual with income x feels
is given by f(x)^α, where α is the sensitivity to polarization and falls into the
interval [0.25, 1], in order to be consistent with the set of axioms proposed by
Duclos, Esteban and Ray.
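The measure P_α(f) can be approximated by a double Riemann sum when f is evaluated on a grid. A minimal sketch, assuming SciPy and an illustrative gamma density; as a sanity check, for α = 0 the expression reduces to the Gini mean difference E|X − Y| = 2μG, which equals 3 for a gamma density with shape 2 and scale 2:

```python
import numpy as np
from scipy.stats import gamma

def der_polarization(alpha, shape, scale, upper=30.0, n=800):
    """P_alpha(f) = int int f(x)^(1+alpha) f(y) |y - x| dy dx (Riemann sum)."""
    x = np.linspace(0.0, upper, n)
    dx = x[1] - x[0]
    f = gamma.pdf(x, shape, scale=scale)
    A = np.abs(x[:, None] - x[None, :])   # |y - x| on the grid
    return float(np.sum((f ** (1.0 + alpha))[:, None] * f[None, :] * A) * dx * dx)

# Sanity check: alpha = 0 gives E|X - Y| = 2*mu*G = 3 for gamma(2, scale=2)
assert abs(der_polarization(0.0, 2.0, 2.0) - 3.0) < 0.05

# A value in the admissible range [0.25, 1]
P = der_polarization(0.5, 2.0, 2.0)
```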
Gertel, Giuliodori and Rodriguez [7] measured the polarization of the
income distribution using the relative economic affluence measure (D)
introduced by Dagum [2] (see also Dagum [3]). To define this measure we need
to introduce several definitions. Let P be a population with n income units yᵢ.
P is partitioned into k sub-populations Pⱼ (j = 1, ..., k) of size nⱼ, with cumulative
distribution function Fⱼ(x) and mean income μⱼ. The income level of the i-th
individual that belongs to the j-th group is yᵢⱼ.

Definition 1. The Gini mean difference, Δⱼₕ, is the mathematical expectation of
the absolute difference between the income variables X and Y:

Δⱼₕ = E(|y − x|) = ∫₀^∞ ∫₀^∞ |y − x| dFₕ(x) dFⱼ(y)

Definition 2. The gross economic affluence dⱼₕ is a weighted average of the
income differences yⱼᵢ − yₕᵣ, for each yⱼᵢ of Pⱼ which is higher than yₕᵣ of Pₕ,
given that Pⱼ is in mean more affluent than Pₕ (μⱼ > μₕ):

dⱼₕ = ∫₀^∞ dFⱼ(y) ∫₀^y (y − x) dFₕ(x)    (5)

Definition 3. The first order moment of transvariation pⱼₕ between the j-th and
the h-th sub-population (such that μⱼ > μₕ) is:

pⱼₕ = ∫₀^∞ dFₕ(y) ∫₀^y (y − x) dFⱼ(x)    (6)

Dagum resolved the integrals (5) and (6) obtaining:

dⱼₕ = Eⱼ(YFₕ) + Eₕ(YFⱼ) − Eₕ(Y)
pⱼₕ = Eⱼ(YFₕ) + Eₕ(YFⱼ) − Eⱼ(Y)

where Eⱼ(YFₕ) = ∫₀^∞ y Fₕ(y) dFⱼ(y) and Eⱼ(Y) = μⱼ.


Considering Definitions 1, 2 and 3, the relative economic affluence
measure is defined as follows:

D = (dⱼₕ − pⱼₕ)/Δⱼₕ = (μⱼ − μₕ)/Δⱼₕ

The Gini mean difference can be written as:

Δⱼₕ = 2dⱼₕ + μₕ − μⱼ

Hence, the ratio D can be rewritten as:

D = (μⱼ − μₕ)/(2dⱼₕ + μₕ − μⱼ)    (7)

The ratio D is a measure of the degree of proximity of the distributions. It
takes values in the interval [0, 1]. It is zero when μⱼ = μₕ, meaning that the
distributions completely overlap, and equal to one when the distributions are
totally separate. Therefore, when polarization is interpreted as a distance
between distributions, D can be used to measure the polarization. The higher the
value of D, the larger the polarization of the income distribution.

In our opinion the last approach is the most appropriate to analyze
polarization in the context that we are working on. That is, we know the
densities of the subgroups, f₁(y) and f₂(y), and we want to see how separate
or polarized they are.

In the next stage, we obtain D according to the results
provided in Section 3. Let us consider two regions; the first group collects the
income data from the individuals that belong to region 1, and the second one
from the individuals that belong to region 2. The mean incomes of the two regions
are given by μ₁ and μ₂, and we assume that μ₂ > μ₁. The corresponding densities
are:

f₁(y) = [1/(Γ(α₁)θ₁^α₁)] y^(α₁−1) e^(−y/θ₁)    (8)

f₂(y) = [1/(Γ(α₂)θ₂^α₂)] y^(α₂−1) e^(−y/θ₂)    (9)

Given that μ₁ = α₁θ₁ and μ₂ = α₂θ₂, we can write expression (7) as
follows:

D = (α₂θ₂ − α₁θ₁)/(2d₂₁ + α₁θ₁ − α₂θ₂)

The gross economic affluence, d₂₁, is given by:

d₂₁ = ∫₀^∞ y F₁(y) f₂(y) dy + ∫₀^∞ y F₂(y) f₁(y) dy − μ₁

where f₁(y) and f₂(y) are the density functions (8) and (9) and F₁(y) and
F₂(y) are their cumulative distribution functions, respectively.

As we can see, the ratio D is expressed in terms of the parameters of the
gamma distributions and d₂₁. In Section 3, we described an approach based on
the MLE method to estimate the parameters α₁, θ₁, α₂, θ₂, p₁, p₂, so the
following step will be to apply this theoretical result to an empirical distribution.

5. Empirical application
We want to point out that the main object of this Section is to show how the
method proposed works. This is a preliminary version and we do not pretend to
do an exhaustive analysis of the income polarization. We are going to use the
data from the Spanish Household Expenditure Survey, Encuesta Continua de
Presupuestos Familiares, elaborated by the Instituto Nacional de Estadistica
Generating Function and Polarization 119

(INE) for the year 1999. We are going to focus on the income per capita of two
autonomous regions (Comunidades Automonas), Andalucia and Cataluna. First,
we estimate the density function of Andalucia, / , (y), and of Cataluna, f2 (y).
Secondly the mix density associated with the overall sample is estimated, see
Figures from 1 to 4.

α̂₁ = 3.60153259,  θ̂₁ = 5.202E−06
Figure 1. Estimated density function of Andalucia

α̂₁ = 3.60153259,  θ̂₁ = 5.202E−06
Figure 2. Estimated density function of Cataluna

Figure 3. Estimated density functions of Andalucia and Cataluna

α̂₁ = 3.60153259,  θ̂₁ = 5.202E−06
Figure 4. Estimated density function of the mix density

The ratio D can be estimated from the observed values or from a parametric
model of the income distribution. The estimation presented in this Section is done
from the estimated parametric model. To obtain d₂₁ we have to resolve by
numerical methods the following integrals:

∫₀^∞ y F₁(y) f₂(y) dy + ∫₀^∞ y F₂(y) f₁(y) dy

The Gini index of Andalucia and Cataluna, as well as the Gini index for
both regions jointly, are obtained. Given that the income is distributed according
to a gamma distribution, the Gini indices (Lafuente [10]) for Andalucia, IG₁,
and Cataluna, IG₂, are calculated using the following expression:

IGᵢ = Γ(αᵢ + 1/2) / [√π Γ(αᵢ + 1)],    i = 1, 2

The Gini index for the overall sample, considering that α = p₁α₁ + p₂α₂, is
given by:

IG = Γ(p₁α₁ + p₂α₂ + 1/2) / [√π Γ(p₁α₁ + p₂α₂ + 1)]
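The closed-form Gini index of a gamma distribution can be verified against the integral representation G = 1 − (1/μ)∫₀^∞ (1 − F(y))² dy, a standard identity; the sketch assumes SciPy and uses θ = 1, since the Gini index does not depend on the scale:

```python
import math
import numpy as np
from scipy.stats import gamma
from scipy.integrate import quad

def gini_gamma(a):
    """Gini index of a gamma distribution with shape a (any scale)."""
    return math.gamma(a + 0.5) / (math.sqrt(math.pi) * math.gamma(a + 1.0))

a = 2.7
# Numerical check via G = 1 - (1/mu) int (1 - F(y))^2 dy, with theta = 1, mu = a
num = 1.0 - quad(lambda y: (1.0 - gamma.cdf(y, a)) ** 2, 0, np.inf)[0] / a
assert abs(num - gini_gamma(a)) < 1e-6
```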
The joint analysis of the Gini index and the ratio D shows, on the one hand,
the distance between the income distributions of Andalucia and Cataluna and, on
the other hand, the inequality within each region. The value taken by D, see Table 1,
indicates that the income distributions of these two regions are located at an
intermediate point between total overlapping and complete separation.
Concerning the Gini index, we conclude that incomes are more equally
distributed in Cataluna than in Andalucia.

As we pointed out at the beginning of this Section, our purpose is to explain
how the method developed in this preliminary paper works. It would be
interesting to obtain the D ratio for other years to establish comparisons, and to
consider other characteristics to group the population, such as education level,
occupation, etc.

Table 1. Gini indices and D ratio

                           IG            D
Andalucia: f₁(y)           0.28718086
Cataluna: f₂(y)            0.25807088    0.554589619
Mix density: f(y)          0.27290117

6. Conclusion and further extensions


First of all, we want to emphasize that the properties of the generating
function provide a useful framework for measuring polarization when it is
understood as an economic distance between distributions. The generating
function allows us to obtain the density of the overall sample, which is
proportional to the geometric mean of the subgroup densities. This approach
eases the estimation of the parameters of the mix density. In addition, the
generating function is a useful tool to extend the measurement of
polarization to asymmetric density functions.

The ratio D indicates that the income distributions of Andalucia and
Cataluna are located at an intermediate point between total overlapping and
complete separation. In relation to the Gini index, we conclude that incomes
are more equally distributed in Cataluna than in Andalucia.

The approach proposed is developed assuming that the income distribution
follows a gamma distribution. It would be interesting to use other distributions
and to extend the empirical analysis to see how polarization and inequality
change over time.

References
1. J. Callejon. (1995). Un nuevo método para generar distribuciones de
   probabilidad. Problemas asociados y aplicaciones. Tesis Doctoral.
   Universidad de Granada.
2. C. Dagum. (1985). Analyses of income distribution and inequality by
   education and sex. Advances in Econometrics, 4, 167-227.
3. C. Dagum. (2001). Desigualdad del rédito y bienestar social,
   descomposición, distancia direccional y distancia métrica entre
   distribuciones. Estudios de Economía Aplicada, 17, 5-52.
4. J.Y. Duclos, J.M. Esteban and D. Ray. (2004). Polarization: Concepts,
   measurement, estimation. Econometrica, 72(6), 1737-1772.
5. J.M. Esteban and D. Ray. (1994). On the measurement of polarization.
   Econometrica, 62(4), 819-851.
6. J.M. Esteban, C. Gradin and D. Ray. (1999). Extensions of a Measure of
   Polarization: OECD Countries. Luxembourg Income Study Working Paper
   218, New York.
7. R.H. Gertel, R.F. Giuliodori and A. Rodriguez. (2004). Cambios en la
   diferenciación de los ingresos de la población del Gran Córdoba entre 1992
   y 2000 según el género y el nivel de escolaridad. Revista de Economía y
   Estadística, XLII.
8. R. Herrerias, F. Palacios and A. Ramos. (1998). Una metodología flexible
   para la modelización de la distribución de la renta. Décima reunión
   ASEPELT-ESPAÑA, Actas en CD-ROM.
9. R. Herrerias, F. Palacios and J. Callejon. (2001). Las curvas de Lorenz y el
   sistema de Pearson. In Aplicaciones Estadísticas y Económicas de los
   sistemas de funciones generadoras, 135-151. Universidad de Granada.
10. M. Lafuente. (1994). Medidas de cuantificación de la desigualdad de la
    renta en España según la E.P.F. 1990-91. Tesis Doctoral. Universidad de
    Murcia.
11. M. Prieto. (1998). Modelización paramétrica de la distribución personal de
    la renta para España mediante métodos robustos. Tesis Doctoral.
    Universidad de Valladolid.
12. V.K. Rohatgi. (1976). An Introduction to Probability Theory and
    Mathematical Statistics. New York: John Wiley and Sons.
13. K. Tsui and Y. Wang. (1998). Polarisation Ordering and New Classes of
    Polarisation Indices. Memo, The Chinese University of Hong Kong.
14. M.C. Wolfson. (1994). When inequalities diverge. American Economic
    Review, 84, 353-358.
Chapter 8
A NEW MEASURE OF DISSIMILARITY BETWEEN
DISTRIBUTIONS: APPLICATION TO THE ANALYSIS OF
INCOME DISTRIBUTIONS CONVERGENCE IN THE
EUROPEAN UNION

F.J. CALLEALTA-BARROSO
Departamento de Estadística, Estructura Económica y O.E.I., University of Alcalá
Plaza de la Victoria no. 2, 28802 Alcalá de Henares (Madrid), Spain

This study introduces a new measure of dissimilarity between distributions, related to
Gini's mean difference, and applies it to analyse the convergence between personal
income distributions within the 15 EU member states, during the period 1993-2000.
According to this measure of dissimilarity, relationships of proximity between these
distributions during that period of time constitute the basis of the analysis.
Multidimensional scaling techniques are used to construct the temporal trajectories of
such distributions in a factor space, optimally reduced for the analysis of their
differences. Data are taken from the European Community Household Panel.

1. Introduction
Personal income distribution has been the subject of study from very
different perspectives during the last decades. These perspectives have been
characterized by terms such as inequality, poverty, deprivation, mobility or
convergence.
This study focuses on the measurement of differences between personal
income distributions in order to use such a measure as an index of convergence
between them.
Measuring these differences raises an important problem for which there is
not only one solution. Several interesting aspects can be observed in the
personal income distribution of a population, which explains the multiplicity of
instruments needed to inform about each of them. Thus, from the simplest
descriptive statistics of a distribution to the most sophisticated measures of
inequality and poverty, all of them allow us to compare populations in some of
their specific aspects.
However, although they achieve successfully the informative specialization
for which they were set out, using these measures produces biased results when


our aim is to measure the overall difference resulting from the comparison of the
individuals that constitute the compared populations. Thus, we can compare the
average wealth of two populations from their means, or the internal inequality
within them by comparing their Gini's concentration indices. But, for example,
in the first case, we are disregarding the information about the shapes of such
distributions (it must be remembered that the same mean can be obtained from
distributions with different shapes), while in the second case, we are
disregarding the localizations of such distributions (it must be remembered that
two very different populations can present similar concentration indices, even
when one of them can be much richer).
One attempt to avoid this problem is to combine localization statistics with
inequality indices. For example, we can consider for this purpose the index
I = μ·G, where μ and G are the corresponding mean and the Gini index of the
considered distribution, respectively. This index, I, is closely related to Gini's
mean difference between the individuals of a populationᵃ.
Could we, therefore, use Gini's mean difference to measure the difference
between distributions? Unfortunately, this measure only informs about
inter-population inequalityᵇ, and not about proximityᶜ between populations. It must
be noted that Gini's mean difference between identically distributed populations
is not zero but equals twice the product of their common mean and their
common Gini index, as can be deduced from footnote b.
In this paper we propose a new dissimilarity measure related to Gini's mean
difference, intuitively interpretable and also clearly informative, which can be
used to measure the resulting overall difference between two compared random
variable distributions.

ᵃ Let Δ = E[|X − Y|] be the Gini's mean difference between two random variables X and Y. Then, for
X and Y identically distributed, the following equality holds: I = μ·G = Δ/2.
ᵇ For any two random variables X and Y, Δ is related to Gini's inter-population inequality index,
G_xy, and their localizations, μ_x and μ_y, as follows:

Δ = (μ_x + μ_y)·G_xy

ᶜ We use the term proximity as a generic reference to any of either dissimilarity or similarity
measures, following the terminology used in Cuadras (1996). In order to compare pairs of random
variables, (X, Y), this study will concentrate specifically on dissimilarity measures defined as real
functions, "d", which increase with the difference and comply with the following properties:
a) X = Y ⇒ d(X, Y) = 0
b) d(X, Y) = d(Y, X)
These measures are discussed in more detail in Everitt (1993).
Once we have introduced this measure, our objective is to use it to attempt
to determine the degree of proximity or convergence that could exist between
personal income distributions of different populations over time. Therefore, as
an application of what is developed in this paper, we present the study carried
out on the convergence between net personal income distributions within the 15
EU member states, during the period 1993-2000, according to the data from the
European Community Household Panel (ECHP).
The complexity of the volume of numerical information increases
quadratically when we try to address this problem. The dynamic analysis of the
degree of convergence between the populations under study requires the
measurement of the proximity between them, not only for each period but also
throughout the whole period. Thus, to compare p populations over t periods of
time we need to take into account
C(pt, 2) = pt(pt − 1)/2
non-trivial informative indices of proximity calculated between populations for
each pair of different periods of time, which have to be interpreted in
comparative terms.
This generally large number of informative indices makes it necessary to
use a technique beforehand that will allow us to simplify the overall
interpretation. We propose, therefore, to apply multidimensional scaling
techniques to help us to understand the evolution of distributions in a reduced
factor space, whose reference system we will additionally try to explain.
Consequently, for the analysis of the relationships of proximity and distance
(convergence) between distributions of net equivalent personal income in the
countries under study we will visualize their respective temporal trajectories,
which will be found in such a factor space optimally reduced, starting from
multidimensional scaling techniques applied to proximity measures previously
calculated according to what is proposed in this study.
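The scaling step can be illustrated with a minimal sketch. The paper's own computations were carried out in SAS; the following classical (Torgerson) multidimensional scaling in Python, assuming numpy is available and using an invented 3×3 dissimilarity matrix, only shows the idea of recovering coordinates in a reduced factor space from a matrix of proximities (the function name `classical_mds` is ours, not the paper's).

```python
import numpy as np

# Classical (Torgerson) metric MDS: given a dissimilarity matrix, recover
# coordinates in a k-dimensional factor space whose Euclidean distances
# approximate the given dissimilarities.
def classical_mds(d, k=2):
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    b = -0.5 * j @ (d ** 2) @ j               # double-centered squared dissimilarities
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]        # keep the k largest eigenvalues
    vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(vals)     # coordinates, one row per object

# Toy dissimilarities for three "country-year" points (a 3-4-5 right triangle,
# which embeds exactly in two dimensions):
d = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])
coords = classical_mds(d, k=2)
print(np.round(np.linalg.norm(coords[0] - coords[1]), 6))   # recovers 3.0
```

In the application below, `d` would instead be the matrix of dissimilarities between all available country-year income distributions.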
The problem set out here deals, therefore, with two main issues. On the one
hand, we want to find a new measure of dissimilarity, as an informative
expression of the degree and quality of the differences observed between the
distributions under study. On the other hand, we would like to propose a
synthesising methodology for the analysis of these measures, when the objective
is to analyse a set of multiple populations through a large number of periods.
128 F.J. Callealta-Barroso

2. Measurement of proximity between income distributions


The measure we propose starts from the intuitive idea of the "opulence
measure"^a introduced by Dagum (1980), which he denotes as distance d_1, and
which is closely related to Gini's mean difference.
For μ_X and μ_Y the means or average incomes of two populations P_X and P_Y,
whose income distributions are represented by random variables X and Y with
probability distribution functions F_X(·) and F_Y(·), respectively, Dagum
establishes that the population P_Y is more opulent than P_X when μ_X < μ_Y. In this
case, he defines the opulence measure d_1 as follows:

d_1 = E[(Y - X) \cdot I(Y - X)] = \int_0^{\infty} dF_Y(y) \int_0^{y} (y - x) \, dF_X(x)    (1)

where

I(Y - X) = \begin{cases} 1 & , \; Y > X \\ 1/2 & , \; Y = X \\ 0 & , \; Y < X \end{cases}    (2)
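For finite samples the double integral in (1) reduces to an average over all pairs. The following toy sketch (invented numbers, not data from the paper) estimates d_1 for two independent samples:

```python
# Toy estimate of Dagum's d1 = E[(Y - X) I(Y - X)] from two independent
# samples, averaging (y - x) over all pairs with y > x (a tie would enter
# with weight 1/2, but its contribution (y - x)/2 is zero anyway).
def dagum_d1(xs, ys):
    total = sum(y - x for x in xs for y in ys if y > x)
    return total / (len(xs) * len(ys))

xs = [10, 20, 30]            # incomes in population P_X (mean 20)
ys = [12, 22, 32]            # incomes in population P_Y (mean 22)
d1_yx = dagum_d1(xs, ys)     # opulence of P_Y over P_X: 52/9
d1_xy = dagum_d1(ys, xs)     # opulence of P_X over P_Y: 34/9
print(d1_yx - d1_xy)         # their difference equals mu_Y - mu_X = 2
```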
Despite the clearly intuitive basis of Dagum's proposal, this measure was
harshly criticised by Shorrocks (1982), mainly for two reasons:
• Shorrocks considers the measure d_1 inadequate as a relative opulence
measure, because Dagum establishes, for its calculation, the a priori
assumption that one of the populations is more opulent, based exclusively on
their mean incomes. Thus, Shorrocks considers that using d_1 as a
measurement of the degree of opulence of a population over another
might be inconsistent and biased.
• Additionally, Shorrocks considers that d_1 cannot be used as a measure
of economic "distance" since the measure d_1, applied to compare a
distribution to another identically distributed, is not zero, as it should
logically be. In fact, it equals the product of its mean and its Gini index.
The first observation made by Shorrocks, related to Dagum's proposal of
prefixing one of the distributions as a reference (that with the larger mean)
once it has been "established" that it is more opulent, also reveals a problem
when using d_1 as a dissimilarity measure of the difference between distributions.

•"The concept of "opulence" introduced originally by Dagum, corresponds to that of "satisfaction"


introduced by Hey and Lambert (1980). The concept of "deprivation" is obtained by changing the
role played by both populations. Thus, deprivation of X with respect to Y is defined as opulence of
Y with respect to X.
Dagum's proposal introduces a certain economic directionality in his measure,
thus making it asymmetrical. Moreover, if we try to use d_1 as a dissimilarity
measure, the dissimilarity between the less opulent distribution and the more
opulent distribution would not be defined.
However, considering that the intuitive idea that underlies Dagum's
measure informs appropriately about the existing economic difference between
two distributions, according to Gini's mean difference, we will try to adapt his
measure, for our purpose, as we develop below.

2.1. Reformulation of Dagum's measure d_1 and its relationship to
Gini's mean difference
Gini's mean difference Δ can be re-written as follows:

\Delta = E[|Y - X|] = E[|Y - X| \, I(Y - X) + |Y - X| \, (1 - I(Y - X))]
       = E[|Y - X| \, I(Y - X)] + E[|Y - X| \, I(X - Y)]    (3)
       = E[(Y - X) \, I(Y - X)] + E[(X - Y) \, I(X - Y)]

where

I(Y - X) = \begin{cases} 1 & , \; Y > X \\ 0 & , \; Y \le X \end{cases}    (4)

According to the definitions of opulence and deprivation of a population with
respect to another, we could say that two income levels x and y of their
respective populations P_X and P_Y support the argument that "P_Y has a greater
opulence with respect to P_X" (reciprocally, "greater deprivation of P_X with respect
to P_Y") if and only if y > x. In this case, the amount that this pair of compared
levels, (x,y), contributes to the greater opulence of P_Y with respect to P_X, in the
sense used by Dagum (reciprocally, to the deprivation of P_X with respect to P_Y),
could be evaluated by the difference y - x.
Similarly, we could say that two income levels x and y of their respective
populations P_X and P_Y support the argument that "the deprivation of P_Y is greater
with respect to P_X" (reciprocally, "greater opulence of P_X with respect to P_Y") if
and only if y < x. In this case, the amount that this pair of compared levels, (x,y),
contributes to the greater deprivation of P_Y with respect to P_X, in the sense used
by Dagum (reciprocally, to the greater opulence of P_X with respect to P_Y), could
be evaluated by the difference x - y.
The above suggests a decomposition of Gini's mean difference as follows:
\Delta = d^+_{yx} + d^-_{yx} = d^-_{xy} + d^+_{xy} = E[(Y - X) \, I(Y - X)] + E[(X - Y) \, I(X - Y)]    (5)

where:
a) d^+_{yx} = d^-_{xy} = E[(Y - X) \, I(Y - X)] is the part of Δ due to the mean opulence
of P_Y with respect to P_X, which evaluates the difference for the cases in
which Y > X. This measure can be interpreted as the mean opulence
(satisfaction) of population P_Y with respect to the individuals of P_X with
lower incomes (reciprocally, mean deprivation of population P_X with
respect to the individuals of P_Y with higher incomes).
b) d^-_{yx} = d^+_{xy} = E[(X - Y) \, I(X - Y)] is the part of Δ due to the mean
deprivation of P_Y with respect to P_X, which evaluates the difference for
the cases in which X > Y. This measure can be interpreted as the mean
opulence (satisfaction) of population P_X with respect to the individuals
of P_Y with lower incomes (reciprocally, mean deprivation of population
P_Y with respect to the individuals of P_X with higher incomes).
Given these two definitions, the following properties, which relate them to
Gini's mean difference and the means of both compared populations, are
satisfied:
• Relationship to Gini's Mean Difference:

\Delta = E[|Y - X|] = d^+_{yx} + d^-_{yx} = d^-_{xy} + d^+_{xy}    (6)

• Relationship to the Difference of Means:

\mu_Y - \mu_X = E[Y - X] = d^+_{yx} - d^-_{yx} = -(d^+_{xy} - d^-_{xy})    (7)

• Explicit expressions for d^+ and d^-:

d^+_{yx} = d^-_{xy} = \frac{\Delta}{2} + \frac{\mu_Y - \mu_X}{2}    (8)

d^-_{yx} = d^+_{xy} = \frac{\Delta}{2} - \frac{\mu_Y - \mu_X}{2}    (9)

• Ranges for d^+ and d^-:

0 \le d^+_{yx} = d^-_{xy} \le \Delta    (10)

0 \le d^-_{yx} = d^+_{xy} \le \Delta    (11)
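These identities are easy to check numerically. Here is a toy verification (sample values invented) of properties (6)-(9), using pairwise averages as finite-sample versions of d^+, d^- and Δ:

```python
# Finite-sample check of the decomposition of Gini's mean difference:
# Delta = d_plus + d_minus, mu_Y - mu_X = d_plus - d_minus, and the
# explicit expressions d_plus = Delta/2 + (mu_Y - mu_X)/2, etc.
def decompose(xs, ys):
    n = len(xs) * len(ys)
    d_plus = sum(max(y - x, 0) for x in xs for y in ys) / n    # E[(Y-X) I(Y>X)]
    d_minus = sum(max(x - y, 0) for x in xs for y in ys) / n   # E[(X-Y) I(X>Y)]
    delta = sum(abs(y - x) for x in xs for y in ys) / n        # Gini's mean difference
    return d_plus, d_minus, delta

xs, ys = [10, 20, 30], [12, 22, 32]
d_plus, d_minus, delta = decompose(xs, ys)
mu_x, mu_y = sum(xs) / len(xs), sum(ys) / len(ys)

assert abs(delta - (d_plus + d_minus)) < 1e-9                   # property (6)
assert abs((mu_y - mu_x) - (d_plus - d_minus)) < 1e-9           # property (7)
assert abs(d_plus - (delta / 2 + (mu_y - mu_x) / 2)) < 1e-9     # property (8)
assert abs(d_minus - (delta / 2 - (mu_y - mu_x) / 2)) < 1e-9    # property (9)
```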

Starting from these definitions and properties, Dagum's measure d_1 could be
reformulated less ambiguously, as follows:

d_1 = \max\{d^+_{yx}, d^-_{yx}\} = \frac{d^+_{yx} + d^-_{yx}}{2} + \frac{|d^+_{yx} - d^-_{yx}|}{2} = \frac{\Delta}{2} + \frac{|\mu_Y - \mu_X|}{2}    (12)

or, alternatively:

d_1 = \max\{d^-_{xy}, d^+_{xy}\} = \frac{\Delta}{2} + \frac{|\mu_Y - \mu_X|}{2}    (13)

This measure always lies between the limits:

0 \le \frac{\Delta}{2} \le d_1 = \frac{\Delta}{2} + \frac{|\mu_Y - \mu_X|}{2} \le \Delta    (14)

and it takes the maximum value:

d_1 = \Delta \iff X > Y \text{ (a.e.) or } X < Y \text{ (a.e.)}    (15)

We observe that this measure corresponds to the average of two indices of a
very different nature, Δ and |μ_Y - μ_X|. While |μ_Y - μ_X| summarises the mean
difference of wealth, not taking into account the distribution shapes of the
populations, Δ measures, in absolute terms, the inter-population inequality, which
appears in the decomposition of the Gini index of two joint populations^c.
With this reformulation, we solve the drawback of asymmetry or
unidirectionality presented by the measure of opulence proposed by Dagum
when we tried to use d_1 as a dissimilarity measure between both compared
populations.
However, the nature of the concentration measure involved in its calculation
means that d_1 cannot be considered a proper measure of dissimilarity. Indeed,

c When the Gini index is calculated for a population coming from the joining of two others, the part
of inequality due to the relationship between the two joint populations, after eliminating the part of
inequality presented internally by both populations separately, is:

\frac{\Delta}{\mu_X + \mu_Y}

the measure d_1 of a distribution X compared to another identically distributed to it is not
zero, as it should logically be, but instead:

d_1 = \frac{\Delta}{2} = \mu_X \, G_X    (16)

where G_X is the Gini index of X.
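Shorrocks' second objection, restated in (16), can also be seen numerically. In this toy sketch (invented sample) the reformulated d_1 of a sample against an identical, independent copy of itself equals μ·G rather than zero:

```python
# d1 = Delta/2 + |mu_Y - mu_X|/2 applied to a distribution and an identically
# distributed, independent copy gives Delta/2 = mu * G (eq. 16), not zero as
# a genuine dissimilarity measure should.
def gini_mean_difference(xs, ys):
    return sum(abs(y - x) for x in xs for y in ys) / (len(xs) * len(ys))

def d1(xs, ys):
    mu_x, mu_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return gini_mean_difference(xs, ys) / 2 + abs(mu_y - mu_x) / 2

xs = [10, 20, 30]
mu = sum(xs) / len(xs)                           # 20
gini = gini_mean_difference(xs, xs) / (2 * mu)   # Gini index G_X = Delta / (2 mu)
print(d1(xs, xs), mu * gini)                     # equal, and strictly positive
```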
With reference to the alternative proposal of relative distance D_1, which
Dagum constructs from d_1, consequently, it leads to consider:

D_1 = \frac{d_1 - \min(d_1)}{\max(d_1) - \min(d_1)} = \frac{d_1 - \Delta/2}{\Delta - \Delta/2} = \frac{|\mu_Y - \mu_X|}{\Delta}    (17)

Now, 0 ≤ D_1 ≤ 1, and D_1 reaches its minimum value of 0 when the means of the
distributions coincide, not taking into account, in this case, the way in which
they distribute their wealth. And it reaches its maximum of 1 whenever one of the
variables X or Y is greater than the other (almost everywhere), not taking into
account, in this case, their localizations and the distance between their means.
This renders it inadequate for our purpose.
However, continuing in the spirit of Dagum concerning this measure, and in
an attempt to solve the problem presented by using it as a dissimilarity measure,
we suggest the following new measure of dissimilarity, based on Gini's
mean difference between sub-populations of the compared populations.

2.2. Intuitive approach to a new measure of dissimilarity


Let X and Y be a pair of absolutely continuous and independent variables
with their respective density functions f_X(x) and f_Y(y) defined over ℝ.
Comparing now the populations, represented by their respective probability
density functions, we observe that we can extract two sub-populations, each one
coming from each original population, entirely comparable in their values. We
can also differentiate them clearly from two other sub-populations, each one
coming from each original population, perfectly distinguishable from the other
for having "distinctive" values for any individual, as is intuitively reflected in
Figure 1.
[Figure 1 here: the two density curves, with the shaded overlap marking the comparable sub-populations and the non-shaded areas the distinctive sub-populations of P_X and P_Y]

Figure 1. Comparison of populations P_X and P_Y

According to this argumentation, for any pair of absolutely continuous
variables, X and Y, the subject of comparison here, we can define the following
auxiliary variables:
a) Variable C, which represents the behaviour of the "comparable sub-
populations". Here "comparable sub-populations" refers to sub-
populations of P_X and P_Y respectively, for each of which we can find another
sub-population, coming from the other population, with similar
characteristics; i.e. with similar values of the variable, meaning
common behaviour for both variables X and Y (related to the shaded
area in Figure 1). Thus, the density function of C would be set up as follows:

f_C(t) = \frac{\min\{f_X(t), f_Y(t)\}}{1 - p} = \begin{cases} \dfrac{f_X(t)}{1 - p} & , \; f_X(t) \le f_Y(t) \\ \dfrac{f_Y(t)}{1 - p} & , \; f_Y(t) < f_X(t) \end{cases}    (18)

where 1 - p is the proportion of each population P_X and P_Y that is
"comparable" to another equal proportion in the other one:

1 - p = \int_{\{f_X \le f_Y\}} f_X(t) \, dt + \int_{\{f_Y < f_X\}} f_Y(t) \, dt = \int_{\mathbb{R}} \min\{f_X(t), f_Y(t)\} \, dt    (19)

b) Variable X^*, which represents the behaviour of the "distinctive sub-
population of P_X". Here "distinctive sub-population of P_X" refers to the
sub-population of P_X complementary to that selected as the "comparable
sub-population" to another one of P_Y, with specific characteristics of X
and for which it is not possible to find any other element of P_Y
"comparable" to its own (related to the non-shaded area on the left
hand side, in Figure 1). Its density function would be set up as follows:

f_{X^*}(x) = \frac{f_X(x) - f_Y(x)}{p} \cdot I\{f_X(x) > f_Y(x)\} = \begin{cases} \dfrac{f_X(x) - f_Y(x)}{p} & , \; f_X(x) > f_Y(x) \\ 0 & , \; f_X(x) \le f_Y(x) \end{cases}    (20)

where p now represents the proportion of population P_X which is "not
comparable" to any sub-population of P_Y, and where I{·} is the
indicative function for the proposition in brackets^f.
c) Variable Y^*, which represents the behaviour of the "distinctive sub-
population of P_Y". Here, "distinctive sub-population of P_Y" refers to the
sub-population of P_Y complementary to that selected as the "comparable
sub-population" to another one of P_X, with specific characteristics of Y
and for which it is not possible to find any other element of P_X
"comparable" to its own (related to the right hand side non-shaded area,
in Figure 1). Its density function will be written as follows:

f_{Y^*}(y) = \frac{f_Y(y) - f_X(y)}{p} \cdot I\{f_Y(y) > f_X(y)\} = \begin{cases} \dfrac{f_Y(y) - f_X(y)}{p} & , \; f_Y(y) > f_X(y) \\ 0 & , \; f_Y(y) \le f_X(y) \end{cases}    (21)

where p now represents the proportion of population P_Y that is "not
comparable" to any sub-population of P_X.
With these definitions, the original distributions can be expressed as
mixtures of the variables defined above, as follows:

f
The indicative function for a proposition A has a value of
1 , A true
1
' 0 , A false
A New Measure of Dissimilarity Between Distributions 135

{ll)
/ jr (*) = 0-/0-/c(*)+/>-./>(*)
fr(y)=Q-p>fc(y)+rfr-(y) (23)
where variable C informs about the characteristics of the sub-populations selected as
"comparable" in both populations P_X and P_Y, with a proportion of 1 - p, while the
variables X^* and Y^* inform about the specific "distinctive" or "non-comparable"
sub-populations, of proportions p, coming respectively from either compared
population P_X or P_Y.
Some of the properties of these distributions are the following:
a) "Distinctive sub-populations" represent a proportion p of the populations
from which they come, and:

p = 1 - \int_{\mathbb{R}} \min\{f_X(t), f_Y(t)\} \, dt    (24)
b) The means of these auxiliary distributions (C, X^*, Y^*) decompose the
means of the original distributions, informing of the contributions to the
latter of each "comparable" and "distinctive" sub-population, according
to their weights in the corresponding mixtures, as follows:

E[X] = p \, E[X^*] + (1 - p) \, E[C]
E[Y] = p \, E[Y^*] + (1 - p) \, E[C]    (25)

From the above we derive the following properties:

(1 - p) \, E[C] = E[X] - p \, E[X^*] = E[Y] - p \, E[Y^*]
E[X] - E[Y] = p \, (E[X^*] - E[Y^*])    (26)
E[X] + E[Y] = p \, (E[X^*] + E[Y^*]) + 2 \, (1 - p) \, E[C]

2.3. Definition of the proposed measure of dissimilarity

According to the definitions presented above, we propose, as a dissimilarity
measure between distributions X and Y, Gini's mean difference between the
associated distributions X^* and Y^*, weighted by the product of the proportions
they represent of the original populations X and Y^g.

g Note that we introduce the weight factor because our objective is, firstly, to make the measure as
intuitive as possible (it leads to the direct evaluation of differences related to non-shaded areas in
Figure 1). Secondly, we want to introduce in the expression the effect of the relative sizes of the
"distinctive" sub-populations (the proportions of the populations that the "distinctive sub-populations"
represent).

d(X,Y) = p^2 \, E[|Y^* - X^*|] = \int_{\mathbb{R}} \int_{\mathbb{R}} |y - x| \, [f_X(x) - f_Y(x)] \, I\{f_X(x) > f_Y(x)\} \, [f_Y(y) - f_X(y)] \, I\{f_Y(y) > f_X(y)\} \, dx \, dy    (27)
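As an illustration only, the double integral in (27) can be approximated by a Riemann sum. With the hypothetical triangular densities used below (our invention, not the paper's data), the distinctive sub-populations have disjoint supports, so d(X,Y) = p²·E[Y* - X*] = p·(μ_Y - μ_X) = 0.75·2 = 1.5, which the sum reproduces:

```python
# Riemann-sum approximation of the proposed dissimilarity (27),
# d(X,Y) = p^2 E|Y* - X*|, for two hypothetical triangular densities
# (f_X on [0, 4], f_Y the same shape on [2, 6]).
step = 0.05
grid = [i * step for i in range(201)]                # incomes in [0, 10]

def f_x(t):
    return max(0.0, 1 - abs(t - 2) / 2) / 2

def f_y(t):
    return max(0.0, 1 - abs(t - 4) / 2) / 2

gx = [max(f_x(t) - f_y(t), 0.0) for t in grid]       # (f_X - f_Y)^+, distinctive excess of P_X
gy = [max(f_y(t) - f_x(t), 0.0) for t in grid]       # (f_Y - f_X)^+, distinctive excess of P_Y

d = sum(abs(y - x) * gxv * gyv
        for x, gxv in zip(grid, gx)
        for y, gyv in zip(grid, gy)) * step * step
print(round(d, 2))                                   # close to p (mu_Y - mu_X) = 1.5
```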
2.3.1. Properties of the proposed measure of dissimilarity
• d(X,Y) = 0 ⇔ X = Y (a.e.)
• The measure d(X,Y) increases with the difference between X and Y;
i.e., it increases not only with the increase of the proportion of X and Y
represented by their "distinctive" sub-populations, but also with the
increase of the separation between them.
• The measure is symmetrical: d(X,Y) = p² E[|X^* - Y^*|] = d(Y,X)
• 0 ≤ d(X,Y) ≤ Δ
• For X > Y (a.e.) or X < Y (a.e.): d(X,Y) = Δ = |μ_Y - μ_X|
• This dissimilarity measure is invariant under the same translation of the
compared variables; and it is proportionally affected by the common
scale factor under the same changes of scale for the compared
variables.
It is worth noticing, however, that this measure of dissimilarity, which
measures proximity between distributions in the way we have proposed, does
not strictly fulfil the triangular property; and, therefore, it is not strictly speaking
a distance^h.

3. Case study: Convergence of income distributions in the EU-15

3.1. Concepts and Data


Following the introduction of the proposed dissimilarity measure, our
objective will be to apply it to the analysis of the degree of proximity, and

h There are counter-examples in the matrix of dissimilarities calculated in the application developed
in a later section. One of them occurs between the countries GER, BEL and FRA in 1993, for which
d(GER,FRA) = 223 while d(GER,BEL) = 59 and d(BEL,FRA) = 142, so that
d(GER,BEL) + d(BEL,FRA) = 201 < d(GER,FRA).
consequently to the analysis of the convergence that we may find between
personal income distributions within the EU-15 in recent years.
For this purpose, we have used data on family incomes from the
European Community Household Panel (ECHP) between 1994 and 2001, which
ensures that the information provided is homogeneous over time and across the
different countries, allowing cross-sectional and dynamic comparisons.
Looking at the sample sizes from the ECHP for each year, we can see that we do
not have homogeneous data for Austria and Luxembourg for the first year (1994),
for Finland for the first two years (1994 and 1995), or for Sweden for the first
three years (1994-1996). Furthermore, in 1997 Germany, Luxembourg and the
United Kingdom (UK) stopped collecting the original ECHP questionnaires, and
the information requested by the ECHP has since been collected from their own
national panels (SOEP, PSELL and BHPS, respectively); we have selected
these series of data to preserve longitudinal homogeneity for these countries.
The concept of income used as a starting point is "Total Net Household
Income" (variable HI100), which includes incomes after transfer payments and
deduction of taxes and Social Security contributions. The years of reference for
the ECHP data about incomes correspond to the years before the surveys were
carried out, and we will use those for this study.
To render incomes comparable across the different countries and waves taken
into account, variable HI100 has been adjusted according to the purchasing powers
of national currencies within each country, using for this purpose the OECD
Purchasing Power Parity for each year and currency, also taken from the ECHP
(variables PPPyy, yy = 93 to 00). And since the welfare of a household depends not
only on its income but also on its size and composition, we have finally
calculated the variable "Comparable Equivalent Personal Income (net)", for
each country and survey wave, adjusting comparable incomes to this effect.
In short, these adjustments were carried out by dividing "Total Net
Household Income", previously modified according to the purchasing power
parity for each country and year, by the equivalised size resulting for each
household when applying the conventional OECD equivalence scale^i (variable
HD004).
The "Comparable Equivalent Personal Income (net)" has been assigned to
each member in every household assuming that all members enjoy the same
level of economic welfare. From this approach, the analysis unit is the
individual; therefore, in each wave and country the variable "Comparable

'In the conventional OECD equivalent scale the first adult counts as 1 unit, next adults as 0.7 and
each child under the age of 16 years as 0.5 units.
138 F.J. Callealta-Barroso

Equivalent Personal Income (net)" constructed this way, has been weighted by a
variable "weight" constructed as the product of the household cross-sectional
weight (variable HG004) and its size (variable HD001).
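As a schematic illustration of this adjustment (the numbers below are invented; HI100, PPP and the HD004-style equivalised size are the ECHP variables named in the text, while the function names are ours):

```python
# Hypothetical sketch of the income adjustment described above.
def oecd_equivalised_size(n_adults, n_children):
    """Conventional OECD scale: first adult 1.0, each further adult 0.7,
    each child under 16 counts 0.5 units."""
    return 1.0 + 0.7 * (n_adults - 1) + 0.5 * n_children

def comparable_equivalent_income(hi100, ppp, n_adults, n_children):
    """'Comparable Equivalent Personal Income (net)' in purchasing-parity units:
    total net household income, PPP-adjusted, divided by the equivalised size."""
    return (hi100 / ppp) / oecd_equivalised_size(n_adults, n_children)

# A household of 2 adults and 2 children with 27,000 units of net income,
# in a country whose currency has a PPP factor of 1.2:
income = comparable_equivalent_income(27000, 1.2, 2, 2)
print(round(income, 2))   # 8333.33, assigned to every household member
```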

Table 1. Number of available cases for the variable "Comparable Equivalent Personal Income
(net)", by countries and waves

Wave 1 Wave 2 Wave 3 Wave 4 Wave 5 Wave 6 Wave 7 Wave 8


(1994) (1995) (1996) (1997) (1998) (1999) (2000) (2001)

Germany 6163 6293 6207 6098 5891 5782 5619 5474


Austria 0 3365 3280 3130 2951 2809 2637 2535
Belgium 3454 3341 3189 3009 2857 2684 2549 2322
Denmark 3478 3218 2950 2739 2504 2379 2273 2279
Spain 7142 6448 6132 5714 5438 5299 5047 4948
Finland 0 0 4138 4103 3917 3818 3101 3106
France 7108 6679 6554 6141 5849 5593 5332 5268
Greece 5480 5173 4851 4543 4171 3952 3893 3895
Netherlands 5139 5035 5097 5019 4922 4981 4974 4824
Ireland 4036 3562 3164 2935 2723 2372 1944 1757
Italy 6915 7004 7026 6627 6478 6273 5989 5525
Luxembourg 0 2976 2471 2651 2521 2550 2373 2428
Portugal 4787 4869 4807 4767 4666 4645 4606 4588
U.K. 5024 4987 4991 4958 4958 4914 4842 4749
Sweden 0 0 0 5286 5208 5165 5116 5085

Source: Author's own, from ECHP data

Table 2. Sums of household weights from available cases for the variable "Comparable
Equivalent Personal Income (net)", by countries and waves

Wave 1 Wave 2 Wave 3 Wave 4 Wave 5 Wave 6 Wave 7 Wave 8


(1994) (1995) (1996) (1997) (1998) (1999) (2000) (2001)

Germany 6140 6280 6207 6125 5921 5812 5646 5506


Austria - 3366 3280 3133 2954 2809 2636 2539
Belgium 3446 3341 3188 3012 2862 2689 2552 2331
Denmark 3478 3218 2950 2740 2505 2380 2276 2280
Spain 7146 6443 6121 5724 5442 5296 5032 4952
Finland - - 4139 4100 3918 3820 3099 3108
France 7113 6683 6564 6141 5853 5596 5333 5277
Greece 5486 5173 4851 4544 4170 3954 3897 3891
Netherlands 5152 5050 5114 5024 4929 4987 4978 4827
Ireland 4038 3565 3164 2938 2725 2374 1947 1759
Italy 6894 6994 7024 6634 6498 6295 6004 5540
Luxembourg - 2975 2471 2652 2522 2550 2373 2428
Portugal 4799 4868 4809 4780 4653 4655 4614 4592
U.K. 5028 4994 4989 4956 4967 4924 4852 4762
Sweden - - - 5807 5717 5667 5633 5568

Source: Author's own, from ECHP data



Table 3. Weighted means for the variable "Comparable Equivalent Personal Income (net)",
by countries and waves (previous year incomes in purchasing parity units)

Wave 1 Wave 2 Wave 3 Wave 4 Wave 5 Wave 6 Wave 7 Wave 8


(1993) (1994) (1995) (1996) (1997) (1998) (1999) (2000)
Germany 11479 11424 11959 12429 12727 13102 14012 15166
Austria - 11917 11887 12070 12159 12579 13686 14359
Belgium 11803 11981 12056 12752 13209 13921 14094 14832
Denmark 11030 11721 12007 12858 13208 13856 14606 14982
Spain 7257 7246 7488 7881 8238 8650 9604 10409
Finland - - 9631 10031 10246 10603 10929 11799
France 11022 11052 11224 11358 12116 12673 12709 13549
Greece 6149 6587 6884 7273 7759 7833 8563 8743
Netherlands 10237 10482 11007 11467 12184 12989 13031 13287
Ireland 7702 8849 9481 9614 10860 10672 10709 11616
Italy 8074 8651 8749 8887 9372 9883 10508 10605
Luxembourg - 18166 18369 19364 19536 20233 21931 23101
Portugal 5898 6270 6377 6719 7018 7432 7792 8619
U.K. 10151 11174 10852 12118 12828 12588 13574 14675
Sweden - - - 10220 10597 10650 11023 12041

Source: Author's own, from ECHP data

Summarizing this first process, Tables 1 and 2 show respectively, by
countries and waves, the effective sample sizes and the aggregated sums of the
household weights, once we have eliminated the cases for which there is no
available data or whose variable "Comparable Equivalent Personal Income
(net)" cannot be calculated. Similarly, Table 3 shows the weighted means of
the finally adjusted variable.

3.2. Non-parametric estimation of income distributions


Before calculating the proposed measure of dissimilarity between the
distributions of "Comparable Equivalent Personal Income (net)", we proceeded
to estimate their density functions non-parametrically using univariate Gaussian
kernels, with optimal bandwidth, following Silverman's procedure (1986) for
each country and year considered. For this evaluation we used the SAS/STAT
procedure KDE^j, which allowed us to calculate the corresponding estimates at
each of the 601 equidistant points into which we had divided the common range
taken into account (from 0 to 60,000 purchasing parity units), prefixed for all
income distributions in all countries and different waves of the panel^k. For their
analysis, charts for the density functions calculated in this way were obtained
using the SAS/GRAPH procedure GPLOT.

j SAS/STAT® and SAS/GRAPH® are registered products of SAS Institute Inc., Cary, NC, USA.
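The estimation step can be sketched outside SAS as follows (toy sample, our own function names; the simplified Silverman factor 0.9·s·n^(-1/5) is used here instead of the full rule 0.9·min(s, IQR/1.34)·n^(-1/5)):

```python
import math

# Univariate Gaussian kernel density estimate with a Silverman-type
# rule-of-thumb bandwidth, evaluated on 601 equidistant points of the
# common range [0, 60000] used in the paper.
def silverman_bandwidth(sample):
    n = len(sample)
    mean = sum(sample) / n
    sd = (sum((x - mean) ** 2 for x in sample) / (n - 1)) ** 0.5
    return 0.9 * sd * n ** (-0.2)

def gaussian_kde(sample, grid):
    h, n = silverman_bandwidth(sample), len(sample)
    c = 1 / (n * h * math.sqrt(2 * math.pi))
    return [c * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in sample) for t in grid]

grid = [i * 100.0 for i in range(601)]                           # 0, 100, ..., 60000
sample = [8000, 9500, 11000, 12500, 15000, 18000, 22000, 30000]  # toy incomes
density = gaussian_kde(sample, grid)
mass = sum(density) * 100.0           # Riemann sum over the range, close to 1
print(round(mass, 2))
```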
From these charts we can extract some remarkably different behaviours:
Firstly, we see how Luxembourg has a distribution of "Comparable
Equivalent Personal Incomes (net)" clearly displaced to the right of those for the
rest of the countries, standing out for its higher personal incomes.
Towards the middle of these charts we can see two other groups of
countries behaving differently. The Nordic countries (Finland, Sweden and
Denmark), together with the Netherlands, present more leptokurtic distributions,
higher in their central sections (although Denmark and the Netherlands present
medium degrees of kurtosis). In contrast, the rest of the Central European countries
present a wider diversity in their central sections of income.
Lastly, on the left hand side of these charts, we find those countries
conventionally considered poorer (Italy, Greece, Spain, Portugal and Ireland).
However, if we observe the dynamics of these distributions over time, we
can see that although these trends are preserved, most distributions in the EU-15
countries, in general, tend to approach the others, leaning towards a common
average behaviour in the centre of the chart, with the clear exception of
Luxembourg and the different particularities presented by each country in each
period of time.
Additionally, if we observe in these charts the evolution of the distributions
for each country through the 8 waves, we can see their systematic movement to
the right (a tendency to a higher level of income) with noticeable decreases in
modal probability densities (a tendency to a wider diversity of incomes and
possibly to a higher inequality) including, in some cases, the presence of central
flatness in their density functions and even a pair of relative modes.
We will attempt below to study these first impressions in depth, and for this
purpose we will analyse the information obtained from the proposed measure of
dissimilarity, calculated between each pair of these distributions.

k Density functions estimated by stochastic kernels produce small deviations in the estimates of
population means. Assuming that the ECHP sample sizes have been calculated to obtain
parametric estimates rather than for any other reason, we have proceeded to correct slightly
the corresponding density function in each case, regrouping the upper 1% of probability from the
right tail into a unique interval. Thus, we have conveniently determined its range and class-mark so
that the mean of the corrected density function reproduces faithfully the corresponding mean
estimated by the ECHP.

3.3. Dissimilarities
From the estimates of the 120 density functions obtained in the way
mentioned in the previous section (in fact there are 113 since 7 of them are not
available, for Austria, Luxembourg, Finland and Sweden, for some of the years)
and which represent the behaviours of "Comparable Equivalent Personal
Incomes (net)" in the 15 countries studied through the 8 years of the panel, we
have proceeded to evaluate the proposed measure of dissimilarity for each pair
compared. Consequently, we have constructed the matrix, which reflects the
totality of the dissimilarity coefficients calculated between every pair of density
functions, each one corresponding to a "country-year", using the programme
SAS/IML^l.
To sum up, differences between distributions of "Comparable Equivalent
Personal Incomes (net)" within the 15 countries for the initial and final years of
the period studied are presented in Tables 4 and 5.
Obviously, we cannot calculate the corresponding dissimilarity measures
between countries for which data were not available, as it is clearly shown in the
table of dissimilarities for the initial year (Table 4). This is the case of Austria,
Luxembourg, Finland and Sweden in 1993, Finland and Sweden in 1994 and
Sweden in 1995, as mentioned earlier.
Table 4. Dissimilarities between countries for the year 1993

1993  GER  DK_  NL_  BEL LUX  FRA  UK_  IRL  ITA  GRE  SPA  POR AUS FIN SWE
GER     0  219  195   59   -  223  147 1482  953 2508 1655 2885   -   -   -
DK_   219    0  215  202   -  449  484 1332 1064 2447 1712 2964   -   -   -
NL_   195  215    0  244   -  117  109  829  526 1717  911 2133   -   -   -
BEL    59  202  244    0   -  142  217 1341 1024 2694 1722 3006   -   -   -
LUX     -    -    -    -   -    -    -    -    -    -    -    -   -   -   -
FRA   223  449  117  142   -    0   82  947  533 1842 1072 2192   -   -   -
UK_   147  484  109  217   -   82    0  732  301 1460  730 1746   -   -   -
IRL  1482 1332  829 1341   -  947  732    0  121  208   43  387   -   -   -
ITA   953 1064  526 1024   -  533  301  121    0  436  108  551   -   -   -
GRE  2508 2447 1717 2694   - 1842 1460  208  436    0  144   48   -   -   -
SPA  1655 1712  911 1722   - 1072  730   43  108  144    0  221   -   -   -
POR  2885 2964 2133 3006   - 2192 1746  387  551   48  221    0   -   -   -
AUS     -    -    -    -   -    -    -    -    -    -    -    -   -   -   -
FIN     -    -    -    -   -    -    -    -    -    -    -    -   -   -   -
SWE     -    -    -    -   -    -    -    -    -    -    -    -   -   -   -

Source: Author's own, from ECHP data

l SAS/IML® is a registered product of SAS Institute Inc., Cary, NC, USA.



Table 5. Dissimilarities between countries for the year 2000

2000  GER  DK_  NL_  BEL  LUX  FRA  UK_  IRL  ITA  GRE  SPA  POR  AUS  FIN  SWE

GER     0   93  243   45 2665  158  153  765 1411 3002 1583 3533   90  688  592
DK_    93    0  363  263 2910  290  374  918 1432 2961 1821 3652  139  863  780
NL_   243  363    0   96 4016   35  111  212  439 1484  674 1903  130  165  110
BEL    45  263   96    0 3388  103  112  657 1078 2553 1453 3062   86  573  488
LUX  2665 2910 4016 3388    0 4024 3039 6004 7029 9403 7382 9843 3222 6042 5407
FRA   158  290   35  103 4024    0  114  250  514 1587  755 1993   93  240  203
UK_   153  374  111  112 3039  114    0  424  884 2140 1159 2612  204  641  424
IRL   765  918  212  657 6004  250  424    0   61  605  197 1050  578  117  141
ITA  1411 1432  439 1078 7029  514  884   61    0  315   77  676  925  171  245
GRE  3002 2961 1484 2553 9403 1587 2140  605  315    0  171   97 2306  881 1016
SPA  1583 1821  674 1453 7382  755 1159  197   77  171    0  417 1264  441  535
POR  3533 3652 1903 3062 9843 1993 2612 1050  676   97  417    0 2750 1463 1639
AUS    90  139  130   86 3222   93  204  578  925 2306 1264 2750    0  505  433
FIN   688  863  165  573 6042  240  641  117  171  881  441 1463  505    0   25
SWE   592  780  110  488 5407  203  424  141  245 1016  535 1639  433   25    0

Source: Author's own, from ECHP data

3.4. Direct analysis of measures of dissimilarity


In general, we observe a wide range of dissimilarities, going from a few
tens of purchasing parity units (43 units for Spain-Ireland in 1993, or 25 units
for Finland-Sweden in 2000) to various thousands of units (9,843 units in the
case of Luxembourg-Portugal in 2000).
The evolution over time, according to the similarity presented by their
distributions, leads to a classification of countries that agrees with that generally
found in the economic literature related to the course of these countries.
In Table 6, we have reorganized Table 5 sorting countries in descending
order according to their means of the "Comparable Equivalent Personal Income
(net)" variable, and set apart the different levels of proximity with different
background patterns. Thus, in the year 2000, we would have the following
groups of countries with more similar income distributions (some of these
countries could be situated alternatively in different contiguous groups,
according to the different internal degrees of similarity set up within groups):
{Luxembourg}, {Denmark-Germany}, {Germany-Austria-Belgium}, {United
Kingdom}, {France-Netherlands}, {Sweden-Finland}, {Ireland-Italy}, {Italy-
Spain}, {Greece-Portugal}.

Table 6. Classification of countries in the year 2000, according to their dissimilarities

2000  LUX  DK_  GER  AUS  BEL  UK_  FRA  NL_  SWE  FIN  IRL  ITA  SPA  GRE  POR
LUX     0 2910 2665 3222 3388 3039 4024 4016 5407 6042 6004 7029 7382 9403 9843
DK_  2910    0   93  139  263  374  290  363  780  863  918 1432 1821 2961 3652
GER  2665   93    0   90   45  153  158  243  592  688  765 1411 1583 3002 3533
AUS  3222  139   90    0   86  204   93  130  433  505  578  925 1264 2306 2750
BEL  3388  263   45   86    0  112  103   96  488  573  657 1078 1453 2553 3062
UK_  3039  374  153  204  112    0  114  111  424  641  424  884 1159 2140 2612
FRA  4024  290  158   93  103  114    0   35  203  240  250  514  755 1587 1993
NL_  4016  363  243  130   96  111   35    0  110  165  212  439  674 1484 1903
SWE  5407  780  592  433  488  424  203  110    0   25  141  245  535 1016 1639
FIN  6042  863  688  505  573  641  240  165   25    0  117  171  441  881 1463
IRL  6004  918  765  578  657  424  250  212  141  117    0   61  197  605 1050
ITA  7029 1432 1411  925 1078  884  514  439  245  171   61    0   77  315  676
SPA  7382 1821 1583 1264 1453 1159  755  674  535  441  197   77    0  171  417
GRE  9403 2961 3002 2306 2553 2140 1587 1484 1016  881  605  315  171    0   97
POR  9843 3652 3533 2750 3062 2612 1993 1903 1639 1463 1050  676  417   97    0

Source: Author's own, from ECHP data

Legend: Dissimilarity less than 100 units
        Dissimilarity less than 150 units
        Dissimilarity less than 275 units

If we compare the dissimilarities in the final year 2000 to the corresponding
dissimilarities in the initial year 1993, we can see which countries have closer
distributions at the end of the period than they had at the beginning, and which
ones have a greater degree of separation.
To analyse the degree of convergence between countries during this period,
we have calculated the convergence indices for each pair of countries, resulting
from the ratio of the dissimilarity presented by their distributions in the last
year of the survey (X_2000 and Y_2000) to that presented in the first year of
the survey (X_1993 and Y_1993).

IC_δ(X, Y) = δ(X_2000, Y_2000) / δ(X_1993, Y_1993)    (28)

Consequently, a value of 1 for this index would show that the distributions
compared remain at the same degree of proximity, values greater than 1
would show separation or divergence between the distributions of the countries
compared, and values smaller than 1 would show proximity or convergence.
For the cases in which we did not have a dissimilarity measure (in the years
1993, 1994 and 1995), we employed, for the same pair of countries, the one
obtained in the first subsequent year for which data were available.
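As an illustration of how these convergence indices might be computed, the following sketch forms the ratio of final-year to initial-year dissimilarities; all matrices and values below are made up for the example, not ECHP results:

```python
import numpy as np

# Hypothetical symmetric dissimilarity matrices for three countries A, B, C
# in the first and last survey years (illustrative values only).
delta_1993 = np.array([[0.0, 200.0, 500.0],
                       [200.0, 0.0, 400.0],
                       [500.0, 400.0, 0.0]])
delta_2000 = np.array([[0.0, 100.0, 600.0],
                       [100.0, 0.0, 380.0],
                       [600.0, 380.0, 0.0]])

# Convergence index: final dissimilarity divided by initial dissimilarity.
# Values below 1 indicate convergence, values above 1 divergence.
ic = np.full_like(delta_1993, np.nan)
off_diag = delta_1993 > 0
ic[off_diag] = delta_2000[off_diag] / delta_1993[off_diag]

print(ic[0, 1])  # A and B have converged (index 0.5)
print(ic[0, 2])  # A and C have diverged (index 1.2)
```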
144 F.J. Callealta-Barroso

The results obtained are shown in Table 7. From it, we can infer that
there are groups of countries whose income distributions have come closer
during the period 1993-2000. However, there are other countries that present
greater differences between them at the end of this period. Consequently,
looking at Tables 6 and 7, we can highlight:
a) The country with the highest mean of "Comparable Equivalent
Personal Income (net)", Luxembourg, presents a final distribution of
incomes clearly distanced from those of the other EU-15 countries.
b) The four countries that follow Luxembourg, according to their mean
income (Germany, Denmark, Belgium and the United Kingdom), form
a group in which, generally, there is a greater final proximity between
income distributions, although with some internal polarizations. Thus,
Denmark with Germany and Belgium with the United Kingdom have
respectively reduced their differences to approximately half those
presented initially. However, Germany and the United Kingdom
practically retain their differences, while Belgium and Denmark have
distanced themselves to some extent.
c) Austria has also distanced itself somewhat from the previous countries,
with the exception of Denmark, while the latter has in turn distanced
itself from the other two northern countries, Sweden and Finland.
d) Income distributions in these two countries, Sweden and Finland,
together with France, Netherlands, Ireland and Italy, become much
closer to each other.
e) Out of these countries, Ireland is closer to those with mean incomes
higher than its own, with the only exception of Luxembourg, which
distances itself more rapidly.
f) Spain and Greece (although more so in the case of Spain, which
therefore distances itself to a certain extent from Greece) also approach
this last group of 6 countries, with the exception of Ireland, which
seems to distance itself more rapidly.
g) Income distributions in both Spain and Greece distance themselves
from that in Portugal, which seems further from its initial position with
respect to the richest countries, slowly approaching the group of six
countries mentioned in section d, with the exceptions of Italy and, as
already mentioned, Ireland.

Table 7. Indices of convergence between countries: 1993-2000

[Matrix of convergence indices between each pair of countries, with rows
ordered by descending mean of "Comparable Equivalent Personal Income (net)":
LUX (23.101), GER (15.166), DK (14.982), BEL (14.833), UK (14.676), AUS
(14.359), FRA (13.549), NL (13.287), SWE (12.041), FIN (11.799), IRL (11.616),
ITA (10.605), SPA (10.409), GRE (8.743), POR (8.619). Among the legible rows,
Luxembourg's indices against all other countries exceed 1 (from 1.03 with
Germany to 1.56 with Belgium), indicating divergence; most other entries are
not reliably recoverable.]
Source: Author's o w n , from E C H P data

Legend: Reduction to less than 85% of dissimilarity in 1993
        Reduction to less than 90% of dissimilarity in 1993
        Reduction to less than 95% of dissimilarity in 1993

As can be deduced from the above, the greater or smaller proximity
between income distributions in the countries depends not only on the course of
the countries' economies, but also on the rhythm or speed with which the other
countries move. For this reason, we are not only interested in knowing their
current positions but, to a greater extent, in knowing how they have arrived at
them over time: whether trends of proximity (or distance) remained stable
throughout that period or not, whether every country tends towards the same
distribution pattern or not, whether their paths have been relatively similar or
not, etc.
To take into account as much information as possible about the course of
these income distributions' behaviours, in terms of proximity or distance, we
have considered all the dissimilarities calculated between countries'
distributions for the years available in the ECHP. Consequently, we have used
the whole triangular dissimilarity matrix, of order (8×15)×(8×15), which
includes (8×15+1)(8×15)/2 = 7260 dissimilarity coefficients between the
15 countries' distributions throughout the 8 years of the survey.
As we can see, the complexity of the numeric information increases since,
generally, the measurement of the difference between the behaviours of p
populations through t periods leads us to consider

C(pt+1, 2) = (pt+1)·pt / 2

dissimilarity coefficients^m between the p populations for the t different
periods of time, which have to be interpreted in comparative terms.
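The count quoted above can be checked directly for p = 15 countries and t = 8 survey years; a quick sketch:

```python
from math import comb

p, t = 15, 8  # 15 EU countries observed over the 8 ECHP survey years

# Entries of the triangular dissimilarity matrix, diagonal included:
total = comb(p * t + 1, 2)      # (pt + 1) * pt / 2
print(total)  # 7260, the figure quoted in the text

# Excluding the zero coefficients of each distribution against itself:
non_trivial = comb(p * t, 2)    # pt * (pt - 1) / 2
print(non_trivial)  # 7140
```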

3.5. Application of the ALSCAL multidimensional scaling model


To analyse these results and to understand the relative temporal evolution of
the distributions studied, it is convenient to treat the previously calculated
dissimilarities using a technique that will help us to interpret them globally.
For this purpose, we have used a multidimensional scaling technique^n which will
give us a representation of the income distributions compared in a Euclidean
factor space of a reduced number of dimensions, deduced optimally according to
previously calculated proximity measures. In this space, the analysis of
relationships of proximity and convergence between distributions of
"Comparable Equivalent Personal Income (net)" in the countries observed will
be made easier by the visualization of their respective temporal trajectories.
Moreover, once the reference system of this space has been explained in
economic terms, the analysis of these trajectories will be more informative.
To simplify the interpretation of dissimilarity coefficients calculated
between the distributions of "Comparable Equivalent Personal Income (net)",
for each country in each period of time observed, we have used the ALSCAL
model, following the procedure established by Young, Lewyckyj and Takane
(1986).
This model will attempt to find, in a certain p-dimensional space, the
coordinates (points) representative of each country's distribution in each
survey year, so that the Euclidean distances d_ij between each pair of these
points, or their monotone transformations T(d_ij), will reproduce, as closely as
possible, the observed dissimilarities δ_ij between the represented
distributions.
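The ALSCAL fit itself is carried out in SAS (see below); purely as an illustration of the underlying idea, finding coordinates whose Euclidean distances approximate given dissimilarities, here is a minimal sketch of classical (Torgerson) scaling on toy data:

```python
import numpy as np

def classical_mds(delta, dim=2):
    """Torgerson's classical scaling: returns coordinates whose pairwise
    Euclidean distances approximate the dissimilarities in `delta`."""
    n = delta.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n     # centring matrix
    b = -0.5 * j @ (delta ** 2) @ j         # double-centred squared dissimilarities
    vals, vecs = np.linalg.eigh(b)          # eigendecomposition of B
    order = np.argsort(vals)[::-1][:dim]    # keep the leading eigenpairs
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

# Toy dissimilarities between four hypothetical distributions
delta = np.array([[0.0, 2.0, 5.0, 6.0],
                  [2.0, 0.0, 4.0, 5.0],
                  [5.0, 4.0, 0.0, 2.0],
                  [6.0, 5.0, 2.0, 0.0]])
coords = classical_mds(delta, dim=2)
d_fit = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.round(d_fit, 2))  # close to the original delta matrix
```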

m Actually, the number of non-trivial dissimilarity coefficients in the matrix
resulting from the comparison of the pt distributions of p countries through t
periods, excluding the zero coefficients derived from comparing a country in a
period to itself, is:

C(pt, 2) = pt·(pt−1) / 2
n Torgerson (1958) proposed the fundamentals of multidimensional scaling. For an
introduction to these methods see Kruskal and Wish (1978).

The SAS/STAT procedure MDS^o has been used in order to solve the adequate
ALSCAL model. The model has been fitted trying a variety of monotone
transformations (identity, affine, linear, power and step-monotone), as well as
several dimensions for the factor space of representation (between 1 and 6).
The goodness-of-fit criterion used is Kruskal's Stress-1^p, whose formulation is
as follows:

S_1 = sqrt[ Σ_{i<j} (δ_ij − T(d_ij))² / Σ_{i<j} δ_ij² ]    (29)

and, according to it, we finally find that in all the spaces considered, the
best approximation was always given by the power transformation model (or,
equivalently, a linear transformation in logarithms), as follows:

δ_ij ≈ T(d_ij) = s·(d_ij)^t,  or equivalently  log(δ_ij) = log(s) + t·log(d_ij)    (30)

For every space of different dimensions considered, this model has provided the
values of the goodness of fit criterion reflected in Figure 2, which have also
been represented in an elbow chart.
According to these results and following the parsimony principle, two
dimensions should be enough to represent quite well the diversity reflected in
the calculated dissimilarities; or three if we want the adjustment to be qualified
as "excellent", according to Kruskal's scale. Increasing the dimension of the
representation space to more than three does not seem to improve the goodness
of fit of the model substantially, although it improves it to some extent.
Thus, the model has been solved in three dimensions for the power
transformation model (or, equivalently, the linear transformation in
logarithms), obtaining the following optimal solution, whose associated
Shepard's diagram is presented in Figure 3:

δ_ij ≈ 234.9·(d_ij)^1.963,  or equivalently  log(δ_ij) ≈ log(234.9) + 1.963·log(d_ij)    (31)
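The power-law relation above, and the Stress-1 criterion used to assess it, can be illustrated on synthetic data; the sketch below simply reuses the fitted values 234.9 and 1.963 as the "true" parameters of a simulated data set (everything else is made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated model distances d and "observed" dissimilarities delta following
# delta = s * d**t with s = 234.9, t = 1.963, plus mild multiplicative noise.
d = rng.uniform(0.5, 5.0, size=200)
delta = 234.9 * d ** 1.963 * np.exp(rng.normal(0.0, 0.01, size=200))

# Least-squares fit of log(delta) = log(s) + t * log(d)
design = np.column_stack([np.ones_like(d), np.log(d)])
(log_s, t_hat), *_ = np.linalg.lstsq(design, np.log(delta), rcond=None)
s_hat = np.exp(log_s)

# Kruskal's Stress-1 for the fitted transformation T(d) = s_hat * d**t_hat
fitted = s_hat * d ** t_hat
stress1 = np.sqrt(np.sum((delta - fitted) ** 2) / np.sum(delta ** 2))

print(round(s_hat, 1), round(t_hat, 3))  # close to 234.9 and 1.963
print(round(stress1, 3))                 # small: a good fit on Kruskal's scale
```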

o SAS/STAT® and SAS/GRAPH® are registered products of SAS Institute Inc., Cary,
NC, USA.
p The SAS/STAT MDS procedure calculates Kruskal's Stress-1 when the options
Fit=1, Formula=1 and Coef=Identity are selected. According to Kruskal's
criterion, Stress-1 characterizes the goodness of fit of the model as follows:
0 = perfect, 0.025 = excellent, 0.05 = good, 0.1 = fair, 0.2 = poor. This is
actually the reason why, in the terminology of the MDS procedure, it is
qualified as a "Badness of Fit Criterion".

Dimensions Stress-1
1 0.058921
2 0.028915
3 0.022431
4 0.020017
5 0.018404
6 0.017446

Figure 2. Goodness of fit and dimensionality


Figure 3. Shepard's diagram

As we can see, Shepard's diagram^q confirms the goodness of fit obtained,
indicating a very high linear correlation coefficient between the
dissimilarities originally observed and the transformations of the
corresponding distances reproduced by the coordinates obtained from the model;
indeed, this linear correlation coefficient takes the approximate value of 1.00
to two decimal places.

q Graphic representation of the pairs (T(d_ij), δ_ij), ordered from lowest to
highest δ_ij.
As a consequence, we obtained the coordinates of each country's yearly
income distribution in the optimal factor space, which we analyse below.

3.6. Trajectories of countries' income distributions in the factor space


By joining, in chronological order, the coordinates in the factor space of a
specific country throughout the successive years of the survey, we can
visualize the trajectory of its behaviour and analyse it comparatively with
that of others.
Figure 4 presents the trajectories of Countries' Income Distributions during
the period 1993-2000 in the projection plane formed by the two main
dimensions of the factor space.
At first glance, we can see that nearly all countries present a quite sustained
movement in this period, from right to left along the first dimension, from
their positions in the initial year to those in the final year, indicated in
the chart by the countries' identification labels followed by 00.


Figure 4. Dynamic of countries using the proposed dissimilarity measure


Source: Author's own, from ECHP 1994-2001

We can also see another generalized movement of concentration of the
countries' positions, over time, towards positions close to the reference axis
in the first dimension; i.e., towards values close to zero in the second
dimension.

There are only two exceptions to this rule: the United Kingdom, whose
coordinates seem to rise slightly in the second dimension, although it remains
at relatively low levels (+0.29); and Luxembourg, which increases its
coordinates in the second dimension substantially, and is far away from the
area where the trajectories of the rest of the countries are situated.
Although the movement of concentration is generalized with the exceptions
mentioned above, we can distinguish five groups of countries with quite
different value levels at the end of the period studied: Luxembourg (+2.28),
Portugal (+0.68), the United Kingdom (+0.29), Sweden and Finland (-0.59 and
-0.65 respectively) and the rest of the countries (between -0.20 and +0.10).

3.7. Understanding the factor space


Since our aim is to analyse the whole cloud of points in its three
dimensions, through its two-dimensional projections onto each pair of them, we
will try to study their statistical and economic interpretation in more depth,
in an attempt to better understand what is reflected in the charts.
To this end, Figures 5 and 6 show correlations between the scores on the
dimensions of the factor space and the following descriptive variables
(represented in these figures as they appear in brackets): arithmetic average
(mean); quantiles of orders 0.001, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.90,
0.95, 0.99, 0.999 (P_001, P_01, P_05, P_10, P_25, P_50, P_75, P_90, P_95, P_99,
P_999); ratios of these quantiles to the mean (Pr_001, Pr_01, Pr_05, Pr_10,
Pr_25, Pr_50, Pr_75, Pr_90, Pr_95, Pr_99, Pr_999); standard deviation (dt);
range (rango); interquantile ranges P_25−P_001 (r1), P_50−P_25 (r2),
P_75−P_50 (r3) and P_999−P_75 (r4); Pearson's variation coefficient (cvar);
ratios of these four interquantile ranges to the median (rr1, rr2, rr3, rr4);
Gini's mean difference (dmgini); Gini's concentration index (igini); and the
squared Pearson's variation coefficient (cvar2).
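Most of these descriptive variables are straightforward to compute; the following sketch evaluates a few of them on a hypothetical lognormal income sample (variable names follow the bracketed labels above; the sample itself is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
income = rng.lognormal(mean=9.5, sigma=0.6, size=10_000)  # hypothetical incomes

mean = income.mean()
q = {s: np.quantile(income, s) for s in (0.001, 0.25, 0.5, 0.75, 0.999)}

pr_25 = q[0.25] / mean        # a quantile-to-mean ratio (Pr_25)
r1 = q[0.25] - q[0.001]       # interquantile range P_25 - P_001
r2 = q[0.5] - q[0.25]         # interquantile range P_50 - P_25
r3 = q[0.75] - q[0.5]         # interquantile range P_75 - P_50
r4 = q[0.999] - q[0.75]       # interquantile range P_999 - P_75
rr1 = r1 / q[0.5]             # ratio of r1 to the median (rr1)
cvar = income.std() / mean    # Pearson's variation coefficient (cvar)

# Gini's mean difference E|X - Y| via the sorted-sample identity,
# and the Gini concentration index as dmgini / (2 * mean)
xs = np.sort(income)
n = xs.size
i = np.arange(1, n + 1)
dmgini = 2.0 * np.sum((2 * i - n - 1) * xs) / (n * (n - 1))
igini = dmgini / (2.0 * mean)

print(round(igini, 2), round(cvar, 2))
```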
The graphic representation of these descriptive variables, according to their
correlations with the dimensions of the factor space, will allow us to study their
intuitive meaning. In order to simplify these graphic representations, we only
represent those descriptive variables for which at least one of the
correlations with any represented dimension is higher than 0.4. Thus, Figure 5
shows descriptive variables in the sub-space of the first two main dimensions.
We can see that the first dimension is highly and negatively correlated
(correlations close to −1) with nearly all the localization measures, absolute
and relative, and also with dispersion measures such as Gini's mean difference
and the standard deviation. In addition, it is also positively correlated with
Gini's

concentration index. Therefore, a country will be located further to the left
of the chart the more its income distribution moves to the right over the
income-size axis, providing higher average incomes and distributing greater
wealth (which usually happens with an increase of dispersion in the
distribution), and the lower the inequality it presents (the more evenly the
incomes are distributed).
Therefore, the first dimension can be interpreted as an index of welfare or
an index of "standards of living-income"r which takes into account jointly the
general level of wealth in the population and the degree of equality in the way it
is distributed.
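The welfare reading of the first dimension can be made concrete. Following the definition recalled in footnote r, mean income times the complement to 1 of a normalized inequality index, here is a minimal sketch that uses the Gini index as the inequality component (an illustrative choice, not necessarily the exact index used by Pena et al.):

```python
import numpy as np

def standard_of_living_income(income):
    """Welfare index in the spirit of Pena et al. (1996): mean income
    times (1 - G), where G is the Gini concentration index.
    The choice of the Gini index here is an illustrative assumption."""
    xs = np.sort(np.asarray(income, dtype=float))
    n = xs.size
    mu = xs.mean()
    i = np.arange(1, n + 1)
    gini = np.sum((2 * i - n - 1) * xs) / (n * (n - 1) * mu)
    return mu * (1.0 - gini)

# Two hypothetical populations with the same mean income (100) but
# different degrees of inequality:
perfectly_equal = [100.0, 100.0, 100.0, 100.0]
unequal = [10.0, 50.0, 140.0, 200.0]

print(standard_of_living_income(perfectly_equal))  # 100.0: no inequality penalty
print(standard_of_living_income(unequal))          # lower, despite the equal mean
```

Two populations with the same mean income thus receive different index values once inequality is accounted for, which is exactly the behaviour the first dimension captures.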


Figure 5. Chart of descriptives in dimensions 1 and 2

Looking now at the second dimension, we observe that its correlations with
the descriptive measures are not too high, and therefore its interpretation
could be risky. In any case, the descriptive statistic most closely correlated
with it is Gini's concentration index (positively correlated), although ratios
of low percentiles to the mean are also positively correlated, and ratios of
high percentiles to the mean are negatively correlated as well.

r The group of "standards of living-income" indices introduced by Pena et al. (1996) is defined as the
product of the income distribution mean and the complement to 1 of a normalized inequality
index. It belongs to a wider class of welfare indices introduced by Blackorby and Donaldson
(1978).

Therefore, this dimension classifies the income distributions of the different
countries, placing in the more positive values those that have a greater
concentration of percentiles around their means and reach at the same time high
degrees of inequality. Reciprocally, it places in the more negative values those
distributions that have a greater separation of percentiles around their means and
reach at the same time low degrees of inequality. Hence, this dimension seems
to inform about the contribution of the right distribution tail to inequality. More
positive values in this dimension denote countries where the right tail has a
higher relative weight in the inequality, to compensate the higher equality in the
rest of the distribution, and vice versa.
To interpret the third dimension, let us observe the corresponding chart of
descriptive variables on the projection plane over dimensions 1 and 3, as shown
in Figure 6. We can see that, as was the case with the second dimension,
correlations with the third one are not high and, therefore, its interpretation is
risky. In any case, the highest correlations in absolute value are negative and
correspond to dispersion statistics, especially relative dispersion statistics, and
indices of inequality, while all the localization statistics present positive
correlations, especially the ratios of low percentiles to the mean. Furthermore,
we can see that this dimension is virtually not correlated with the mean.


Figure 6. Chart of descriptives in dimensions 1 and 3



Thus, this dimension classifies income distributions in the different
countries, placing in more positive values those that present a greater distance
with respect to the mean of percentiles above this and a smaller distance with
respect to the mean of percentiles below this, presenting at the same time a
lower inequality and a lower dispersion. Reciprocally, this dimension places in
more negative values those distributions that present a greater distance with
respect to the mean of percentiles below it and a greater proximity to the mean
of percentiles above it, presenting at the same time a greater inequality and a
greater dispersion.
Consequently, this dimension seems to inform about the contribution to the
inequality of the lower, middle and middle-upper classes or about the structure
of incomes in these classes.
The kind of difference that dimension 3 informs about seems to be related
to the structures of the left tail and central section, sometimes providing local
positive or negative skewness in these sections of the distributions, sometimes
favouring flatness or more than one relative mode, and sometimes producing
more bell-shaped and symmetric forms.
To sum up, the first dimension informs us about welfare in the sense of
"standards of living-income", fundamentally influenced by the mean of the
population's incomes. The second and third dimensions seem to inform about
the different patterns of the same standards of living-income, using different
ways of internal distribution of wealth (i.e., different ways of obtaining similar
levels of global welfare with different forms of internal inequality).

3.8. Analysis of trajectories of income distributions


Bearing in mind the above interpretations of dimensions of the factor space
in which we can observe most of the variability between income distributions in
the considered countries, during the years observed by the ECHP, we will finally
analyse below the trajectories followed by these countries throughout the years.
This basic analysis will be carried out on the representations of income
distributions in the three projection planes formed by each pair of the three
dimensions considered, at a larger scale (i.e., without Luxembourg).
Starting from the representations in Figures 7, 8 and 9, we can now
complement the analysis from convergence indices carried out in Section 3.4:
a) All countries show a rather sustained movement towards the left on
dimension 1 (positions of greater welfare or standards of living-
income). Only the United Kingdom seems to have had a few setbacks
during the years 1995 and 1998, compensated largely by its progress
during the rest of the period.


Figure 7. Dynamic of countries (without Luxembourg) using the proposed dissimilarity measure:
Dimensions 1 and 2
Source: Author's own, from ECHP 1994-2001


Figure 8. Dynamic of countries (without Luxembourg) using the proposed dissimilarity measure:
Dimensions 1 and 3
Source: Author's own, from ECHP 1994-2001

Figure 9. Dynamic of countries (without Luxembourg) using the proposed dissimilarity measure:
Dimensions 2 and 3
Source: Author's own, from ECHP 1994-2001

b) Luxembourg, the country with the highest equivalent and comparable
mean income, presents a final income distribution clearly distanced
from that of the rest of the EU-15. This is due to the inequality caused
by the progressively heavier weight in its right tail (dimension 2), as
well as its clearly higher mean income, which compensates its greater
inequality and leads it to have the highest "standards of living-income"
in the EU-15 (as shown by dimension 1, in Figure 4).
c) The internal polarizations in the convergent group of countries formed
by Germany, Denmark, Belgium and the United Kingdom, identified in
the analysis of convergence indices, are due to the different ways in
which they distribute their wealth. Despite the fact that Denmark with
Germany and Belgium with the United Kingdom converge in very
similar standards of living-income (dimension 1), they differ in the way
in which they distribute their wealth. Particularly, in the first case, in
dimension 3 (inequality due to the low and central areas of their
distributions) and in the second case, in dimension 2 (inequality due to
the right tail of the distribution).
d) Despite the exceptional behaviour of Luxembourg's income
distributions, quite different from that in the rest of the EU-15, virtually
all countries have a tendency towards average values (close to zero) in
the second dimension, with the only exception of the United Kingdom.

Those countries tend to converge, therefore, towards a single model
according to the kind of inequality that characterises this dimension
(with a right tail of medium weight).
e) However, the United Kingdom increases its value in the second
dimension since 1996, slightly but in a sustained way, remaining within
values of around +0.30.
f) In any case, according to the inequality that characterizes the second
dimension (heaviness on the right tail), we find different states of
mutual proximity at the end of the period analysed, as a consequence of
this convergence process: Luxembourg (+2.28), Portugal (+0.68), the
United Kingdom (+0.29), Sweden and Finland (-0.59 and -0.65
respectively, around a value of -0.60) and the rest of the countries
(between -0.20 and +0.10), where higher positive values indicate
higher relative heaviness of right tails.
g) Austria has distanced itself to some extent from Germany and Belgium.
But this distance is due to their different behaviours in dimension 3
(inequality due to the lower and central sections of their distributions)
since they started off from a similar position in the first year, in
dimensions 1 and 2, and they have only distanced themselves very
slightly in absolute terms.
h) With respect to dimension 1, dimension 3 seems to take a "U" shape,
for most of the countries. Dimension 3 decreases in countries with
lower levels of welfare as they increase them (increasing the part of
inequality due to the enlargement of lower and middle sections of the
income distribution). Dimension 3 increases in countries with higher
levels of welfare (decreasing the part of the inequality due to the lower
and central sections of the income distribution). Exceptions to this rule
are Belgium and the United Kingdom, whose values decrease following
corrections in this dimension for the year 1997, while Sweden and
Finland remain at stable levels.
i) Denmark, Sweden and Finland share trends of growth in the first two
dimensions towards a distribution pattern, which could be referred to as
"Central-European". Denmark, in particular, shows a greater impetus,
with higher growth in welfare and inequality related to the second
dimension (weight in its right tail), and therefore distances itself more.
However, inequality in the lower and middle sections of income
distributions in Sweden and Finland also increases, while global
inequality in Denmark is compensated to a certain extent with greater
equality in these sections (according to the third dimension).

j) Income distributions in France, Netherlands, Ireland and Italy have also
come closer, with levels of inequality approximately similar in both
dimensions (2 and 3). Sweden and Finland tend to converge to them,
although the final dissimilarities in the second dimension continue to be
greater than in these four countries.
k) Starting off with high inequality levels in the second dimension,
Ireland, Spain and Greece have decreased their inequalities to average
European levels. But this is not the case with the inequality associated
with the third dimension, which increases in the middle-lower section,
even though in Ireland it changes its trend in 1995-96. In any case,
Ireland is the country that comes closest to all the countries with a
higher mean income than its own, not only in welfare but also in levels
of inequality. The only exception is Luxembourg, which distances itself
more rapidly.
l) Spain and Greece (although more so in the case of Spain) also get
closer to the group of Sweden, Finland, France, Netherlands, Ireland
and Italy (but Ireland gets closer to the richer countries more rapidly
and increases its distance from Greece and Spain). The growth of
inequality in the third dimension for Spain implies levels of inequality
in its middle and lower classes above the average of the EU-15.
m) Spain and Greece distance their income distributions with respect to
that of Portugal, which maintains levels of inequality above the EU-15
average, in both dimensions. This occurs despite the fact that inequality
is reduced in the second dimension because of its increase in the third
dimension. Regarding welfare, Portugal seems to end up even further than
initially with respect to the richer countries, getting slowly closer to
the group of six countries referred to above (in j), with the exception
of Ireland, as mentioned earlier.

4. Conclusions
Taking as a starting point the problem posed by Dagum (1980) of measuring
the distance between income distributions, we have introduced in this study a
new measure of dissimilarity, based on Gini's mean difference.
To test its validity we have calculated the corresponding measures of
dissimilarity between all the yearly distributions of "Comparable Equivalent
Personal Incomes (net)" in the EU-15. Distributions and dissimilarities have
been constructed on the basis of the data from the EU-15 Household Panels
between 1994 and 2001.

The analysis of each yearly table of dissimilarities between countries thus
calculated allows us to describe the relative situation of the countries in
each year.
In this way, we have analysed the data for the last year of reference about
incomes (2000) and established the groups of more similar countries in that
year.
Since these results are a consequence of an evolutionary process over time,
comparison of tables of dissimilarities from two different years allows us to
analyse the transformation experienced in the period of time studied. We have,
therefore, extracted some specific consequences based on the magnitudes of the
proximity relationships between the different countries, by comparing their
position in the final year (2000) with that of the initial year (1993). Thus we
have determined the groups of countries whose distributions have come closer
(converged) and those whose distributions have mutually distanced themselves
(diverged).
However, this static comparison ignores the dynamic of the process, i.e. the
evolution of the countries to reach the point of the final transformation observed,
and does not reflect the possible sources of diversity which could explain the
different behaviours of the studied income distributions. In order to visualise
these dynamics, we have applied the ALSCAL multidimensional scaling model
to determine, first of all, in which dimensions the differences between
distributions manifest themselves (dimensions of a factor space). We have
concluded, in short, that they are fundamentally three: standard of living-
income, inequality due to heaviness on the right tail of the distribution, and
inequality due to the lower and middle classes of the distribution. The model
has also been applied to describe the trajectories followed by the distributions of
the countries considered, in that factor space.
The conclusions drawn from this analysis, applied to the general and
specific dynamics of the set of considered countries, have been detailed in
Sections 3.4 and 3.8; we refer the reader to these sections to avoid repetition
here.

Acknowledgments
This study has been partially supported by Project I+D+I ref.: SEC2002-00999,
from the Spanish Ministerio de Ciencia y Tecnología. Data from the European
Community Household Panel have been used here by permission given in the
agreement ECHP/15/00, between EUROSTAT and the University of Alcalá
(Spain).
A New Measure of Dissimilarity Between Distributions 159

References
1. C. Blackorby and D. Donaldson. (1978). Measures of relative equality and
their meaning in terms of social welfare. Journal of Economic Theory, 18,
59-80.
2. C.M. Cuadras. (1996). Métodos de Análisis Multivariante. EUB.
3. C. Dagum. (1980). Inequality measures between income distributions with
applications. Econometrica, 48(7), 1791-1803.
4. Eurostat. (2004). ECHP UDB Manual: European Community Household
Panel Longitudinal Users' Database. Eurostat.
5. B.S. Everitt. (1993). Cluster Analysis. New York: John Wiley and Sons.
6. C. García, F.J. Callealta and J.J. Núñez. (2005). La Interpretación
Económica de los Parámetros de los Modelos Probabilísticos para la
Distribución Personal de la Renta. Una Propuesta de Caracterización y su
Aplicación a los Modelos de Dagum en el Caso Español. Estadística
Española, I.N.E.
7. J.D. Hey and P.J. Lambert. (1980). Relative deprivation and the Gini coefficient:
comment. Quarterly Journal of Economics, 95, 567-573.
8. J.B. Kruskal and M. Wish. (1978). Multidimensional Scaling. Sage
University Paper Series on Quantitative Applications in the Social Sciences,
07-011. Sage Publications.
9. B. Pena, F.J. Callealta, J.M. Casas, A. Merediz and J.J. Núñez. (1996).
Distribución Personal de la Renta en España. Pirámide.
10. B.W. Silverman. (1986). Density Estimation for Statistics and Data
Analysis. London: Chapman and Hall.
11. A.F. Shorrocks. (1982). On the distance between income distributions.
Econometrica, 50(5), 1337-1339.
12. W.S. Torgerson. (1958). Theory and Methods of Scaling. John Wiley and
Sons, Inc.
13. J. Villaverde Castro and A. Maza Fernández. (2003). Desigualdades
Regionales y Dependencia Espacial en la Unión Europea. CLM Economía,
2, 109-128.
14. F.W. Young, R. Lewyckyj and Y. Takane. (1986). The ALSCAL
Procedure. SUGI Supplemental Library User's Guide, Version 5 Edition.
SAS Institute Inc.
Chapter 9
USING THE GAMMA DISTRIBUTION TO FIT FECUNDITY
CURVES FOR APPLICATION IN ANDALUSIA (SPAIN)

F. ABAD-MONTES
Dpto. Estadística e Investigación Operativa, Universidad de Granada
C/ Fuentenueva, s/n, Granada, España

M.D. HUETE-MORALES
Dpto. Estadística e Investigación Operativa, Universidad de Granada
C/ Fuentenueva, s/n, Granada, España

M. VARGAS-JIMENEZ
Dpto. Estadística e Investigación Operativa, Universidad de Granada
C/ Fuentenueva, s/n, Granada, España

Analysis of the evolution of specific fecundity rates, by the age of the mother, i.e.
fecundity curves, and their modelling, is of vital importance when we seek to obtain
projections or forecasts of the behaviour of this demographic phenomenon. Indeed, on
some occasions these estimates need not be realistic from the population
standpoint, but may instead serve to establish hypothetical scenarios. The present
study includes an analysis of the observed data for total births (without taking into
account the order of birth) by age and by female population. These data, for the period
1975-2001, were provided by the Statistical Institute of Andalusia (IEA) and were used
to construct synthetic fecundity indicators, which are the most basic and the most
effective means of accounting for the global behaviour pattern of the phenomenon within
a given period. Subsequently, the observed fecundity curves were fitted using a Gamma-
type distribution. This distribution is one of the most commonly used, for two main
reasons: it provides very good quality fits, and the parameters of the distribution are
identified perfectly with the indicators of fecundity. Finally, various behaviour
hypotheses are proposed, on the basis of the information obtained during the period of
analysis.

1. Data utilized and basic indicators


In order to address the demographic phenomenon we are concerned with,
we must first obtain a series of fecundity rates ranked by the mother's age, this
series being known as the Fecundity Curve. The following data were provided
by the Statistical Institute of Andalusia (IEA): number of births by mother's age


and the female population by age, on 1 January of each year being


considered, which in the case of the present study was 1975-2001, within the
area comprising the Autonomous Community of Andalusia. The study was
carried out for a population of women of fertile age, this being taken as ages 15
to 49. With this information, we calculated the specific fecundity rates for each
age (x) and each year (t), these rates being denoted by $f_x^t$:

$$f_x^t = \frac{N_x^t}{\left(P_x^{1/1/t} + P_x^{1/1/t+1}\right)/2} \qquad (1)$$

where $N_x^t$ is the number of births to mothers who have passed their 'x'
birthday during year 't', and $P_x^{1/1/t}$ is the female population having
passed their 'x' birthday by 1 January in year 't'. These rates, for some of the years in
question, are represented as follows:

Figure 1. Fecundity curves in Andalusia

It is very apparent that in little more than a quarter of a century the pattern
of fecundity in Andalusia has varied spectacularly. In 1975, fecundity rates were
very high for almost all the ages, which suggests that the number of births was
also high. These high rates were mainly due to the fact that families began to
have children at a fairly young age and went on to have a lot of them; this
explains why fecundity rates were so high at the end of the fertile period. This
situation did not last, however, and the above figure shows that by 1985 the
fecundity rates had fallen significantly. Subsequently, they continued to fall,
though less dramatically. Nevertheless, it can be seen that the bell shape of the

fecundity curve was distorted, with the mode of the distribution shifting to the
right (as a result of the age of first pregnancy being delayed) and the appearance
of a "second mode", which reflects births to very young mothers, normally
unmarried, whose children were often unplanned.
Let us now define and construct the most commonly used indicators of
fecundity. First, we obtain the Synthetic Fecundity Index (SFI) which describes
the mean number of children per woman of fertile age:
$$SFI^t = \sum_{x=15}^{49} f_x^t \qquad (2)$$
Other relevant indicators include the Mean Age at Maternity (MAM), which
describes whether the age of maternity is rising or falling, and the Variance in
the Age at Maternity (VAM), which provides a measure of the variability of the
occurrence of births, i.e. whether these occur at widely-spaced ages or are
closely grouped around the mean age:

$$MAM^t = \frac{\displaystyle\sum_{x=15}^{49} (x+0.5)\, f_x^t}{\displaystyle\sum_{x=15}^{49} f_x^t} \qquad (3)$$

$$\sigma^{2\,t} = VAM^t = \frac{\displaystyle\sum_{x=15}^{49} \left[(x+0.5)-MAM^t\right]^2 f_x^t}{\displaystyle\sum_{x=15}^{49} f_x^t} \qquad (4)$$
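The indicator definitions (2)-(4) translate directly into code. The following is a minimal sketch in Python; the flat rate series used as input is purely illustrative, not Andalusian data:

```python
# Computing the synthetic indicators (2)-(4) from a series of specific
# fecundity rates f[x], x = 15..49. The flat rate value used here is purely
# illustrative, not Andalusian data.
rates = {x: 0.1 for x in range(15, 50)}

def indicators(f):
    """Return (SFI, MAM, VAM) as defined in equations (2)-(4)."""
    sfi = sum(f.values())
    # ages are taken at the class mark x + 0.5
    mam = sum((x + 0.5) * fx for x, fx in f.items()) / sfi
    vam = sum(((x + 0.5) - mam) ** 2 * fx for x, fx in f.items()) / sfi
    return sfi, mam, vam

sfi, mam, vam = indicators(rates)
print(sfi, mam, vam)  # 3.5 children per woman, mean age 32.5, variance 102.0
```

For a constant curve the three indicators reduce to the number of ages times the rate, the midpoint of the fertile interval, and the variance of a discrete uniform distribution, which gives a quick consistency check.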
Table 1 shows the application of the above expressions to the available
information. The pattern of this series of indices might be more apparent in
graphical form:

Figure 2. Variation in SFI and MAM in Andalusia



Table 1. Variation of SFI, MAM and VAM in Andalusia

Year SFI MAM VAM


1975 3.212 29.138 35.882
1976 3.238 28.873 35.225
1977 3.132 28.769 35.685
1978 3.041 28.675 35.740
1979 2.861 28.469 35.905
1980 2.739 28.387 36.236
1981 2.535 28.388 35.787
1982 2.444 28.453 35.326
1983 2.275 28.484 34.918
1984 2.140 28.472 34.823
1985 1.990 28.470 34.324
1986 1.891 28.529 34.015
1987 1.819 28.525 33.141
1988 1.760 28.471 32.322
1989 1.689 28.576 31.556
1990 1.656 28.636 30.087
1991 1.612 28.758 29.807
1992 1.581 28.936 29.087
1993 1.527 29.095 28.262
1994 1.426 29.305 28.196
1995 1.375 29.493 27.931
1996 1.329 29.704 27.518
1997 1.336 29.843 27.908
1998 1.303 29.961 28.088
1999 1.335 30.099 28.527
2000 1.358 30.157 29.011
2001 1.354 30.209 29.283

The Synthetic Fecundity Index and that of the mean age of maternity reveal
a very different behaviour pattern; the former has fallen gradually over the
years, from 3.2 children per woman in 1975 to 1.3 in the year 2001. With
respect to the mean age of maternity, the graph might be considered to present a
distorted view of reality, since although the mean age seems to fall in the initial
years, then stabilise and then rise from the late 1980s onwards, we must take
into account the very high values recorded at the beginning of this period. This
latter fact was due to the very long period of fecundity commonly presented

then, with mothers having a large number of children; thus, the mean age of
maternity was higher than that of mothers today.
This situation is reflected in the Index of the Variance; in the initial
years of the study, the variance was very high, and so births were not
concentrated around the mean, but widely distributed throughout the fertile life
of the mothers:
Figure 3. Variation in VAM in Andalusia

It should be noted that in very recent years there has been a moderate rise in
the SFI (which shows that women in Andalusia are starting to have more
children), a levelling off in the rise in the mean age of maternity and a rise in the
variance (partly due to the "second mode", referred to above, in the fecundity
curves).

2. Fitting and modelling the fecundity curves


The series of specific rates of fecundity by age for each year of the observed
series can be fitted by means of various distributions, including the Hadwiger,
Lognormal, Miras and Beta functions and, of course, the one that is most often
used, the Gamma (or Pearson type III) distribution, because of its extraordinary
advantages.
These advantages include its ease of application, the very acceptable fits it
produces and the fact that its parameters are identified with the above-listed
indicators (SFI, MAM, VAM), i.e. it depends on them.
The following expression is used to fit the fecundity curve:
$$F(y) = \frac{a\, b^c\, y^{c-1} \exp\{-by\}}{\Gamma(c)} \qquad (5)$$

where y is the class mark of the interval considered less the minimum fertile
age, i.e. y = (x + 0.5) − 15, and Γ(c) is the gamma function. The parameters a, b
and c of F(y) are related to the fertility indicators as follows:

$$a = SFI, \qquad b = \frac{MAM-15}{VAM}, \qquad c = \frac{(MAM-15)^2}{VAM} \qquad (6)$$


Thus, by fitting the above to the series of rates per year, we obtain:

Table 2. Fertility indicators in Andalusia

Year a b c
1975 3.212 0.39402 5.57054
1976 3.238 0.39384 5.46361
1977 3.132 0.38584 5.31246
1978 3.041 0.38263 5.23262
1979 2.861 0.37512 5.05241
1980 2.739 0.36942 4.94530
1981 2.535 0.37411 5.00861
1982 2.444 0.38081 5.12291
1983 2.275 0.38615 5.20674
1984 2.140 0.38687 5.21184
1985 1.990 0.39243 5.28599
1986 1.891 0.39774 5.38110
1987 1.819 0.40812 5.51987
1988 1.760 0.41678 5.61443
1989 1.689 0.43020 5.84028
1990 1.656 0.45324 6.18063
1991 1.612 0.46157 6.35014
1992 1.581 0.47910 6.67658
1993 1.527 0.49871 7.02920
1994 1.426 0.50733 7.25726
1995 1.375 0.51888 7.52001
1996 1.329 0.53433 7.85663
1997 1.336 0.53184 7.89414
1998 1.303 0.53263 7.96850
1999 1.335 0.52930 7.99202
2000 1.358 0.52244 7.91845
2001 1.354 0.51938 7.89913
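The identification (6) can be checked against Table 2. A small sketch, assuming the Gamma mean-variance identities b = (MAM − 15)/VAM and c = (MAM − 15)²/VAM, with the 1975 row of Table 1 as input:

```python
# Sketch of the identification (6): mapping the synthetic indicators
# (SFI, MAM, VAM) to the Gamma parameters (a, b, c), assuming the usual
# Gamma mean-variance identities with the origin shifted to age 15.
def gamma_params(sfi, mam, vam, min_age=15.0):
    a = sfi
    b = (mam - min_age) / vam
    c = (mam - min_age) ** 2 / vam
    return a, b, c

# 1975 values of SFI, MAM, VAM from Table 1
a, b, c = gamma_params(3.212, 29.138, 35.882)
print(a, b, c)  # close to the 1975 row of Table 2: (3.212, 0.39402, 5.57054)
```

The small discrepancies in the last decimal places come from the rounding of MAM and VAM in Table 1.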

Below, we illustrate some of the fits that were made:

Figure 4. Specific rates in Andalusia


These figures show that the Gamma fit for the fecundity curves is
considerably better in the initial years; the "second mode" that appears at the
early ages, in the later years of the series, is fitted less well, as in all of these
years the fitted curve is shifted to the left.

Figure 5. Fitted fecundity curves for Andalusia



3. Use of the fit in forecasts of fecundity curves


Let us now make a forecast (or rather, a simulation) of the fecundity curve
that might be recorded for Andalusia for the coming years. To establish
reasonable hypotheses of future behaviour, it would be necessary to perform a
more exhaustive study of the current characteristics of fecundity in this region,
as regards fecundity by order of birth, within and outside the marriage, by
foreigners, and many other parameters. However, this is not the aim of the
present analysis; rather, we seek to perform simulations of the fecundity rates
under various more or less plausible scenarios of behaviour. Therefore, we shall
limit ourselves to establishing different hypotheses about the synthetic
parameters of fecundity. Let us examine the trend in the series of indicators for
recent years:

Figure 6. Indicators for recent years

A clear pattern can be observed in all the series. The SFI, although it has
fallen, seems to have recovered in the last few years; the MAM is also
increasing, albeit slowly (which might be a consequence of the fact that the SFI

is improving); and what is most dramatic is the recovery of the variance (which
could indicate that women in Andalusia are having children at more widely
spaced intervals, and perhaps too that the number of children born in higher
orders of birth is greater). Taking all this into account, we assume the following
values:
1st Hypothesis: SFI = 1.4, MAM = 30.4, VAM = 29.8
2nd Hypothesis: SFI = 1.6, MAM = 31, VAM = 30
3rd Hypothesis: SFI = 1.3, MAM = 29, VAM = 28
These hypotheses would correspond, respectively, to: 1) a slight
improvement in fecundity rates in Andalusia; 2) a markedly higher number of
children being born to each woman, with women having children at wider age
intervals; 3) fewer children born to each woman and an advance in the mean age
of maternity. Let us examine a graphic representation of these three hypotheses,
compared to the observed data for 2001:


Figure 7. Forecast fecundity curves



Table 3. Fertility indicators in Andalusia (hypotheses 1, 2 and 3)

SFI MAM VAM a b c


Hypothesis 1 1.4 30.4 29.8 1.400 0.51678 7.95839
Hypothesis 2 1.6 31.0 30.0 1.600 0.53333 8.53333
Hypothesis 3 1.3 29.0 28.0 1.300 0.50000 7.00000

4. Influence of the Index of Generational Replacement


The Index of Generational Replacement is the number of children per
woman that is necessary so that the study population can replace itself; this
indicator would correspond to an SFI of 2.1 children per woman.
Let us implement a simulation exercise in which in the coming years each
woman in the population has 2.1 children. According to the data for the current
situation in Andalusia, in order to reach this level there would first have to be an
increase in the variance of the age of maternity, as births would be more widely
dispersed with respect to age. Therefore, let us assume a value of 30. As
concerns the age at maternity, this too would have to rise (the current trend for
women is to have their children at later ages), and so we would take a mean age
of 30.5 years.
The result of the projection of the fecundity curve, under the above
assumptions for the fecundity characteristics, is shown below.

Figure 8. Fecundity curves
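A sketch of how such a projected curve can be generated from the hypothesised indicators (SFI = 2.1, MAM = 30.5, VAM = 30), using the fit (5)-(6). Since the Gamma density integrates to one, summing the fitted rates over the fertile ages should approximately recover the hypothesised SFI:

```python
import math

# Sketch: projected fecundity curve F(y) of (5) under the generational
# replacement scenario discussed above (SFI = 2.1, MAM = 30.5, VAM = 30).
sfi, mam, vam = 2.1, 30.5, 30.0
a = sfi
b = (mam - 15.0) / vam              # identification (6)
c = (mam - 15.0) ** 2 / vam

def F(y):
    """Fitted rate at y = (age + 0.5) - 15, as in equation (5)."""
    return a * b ** c * y ** (c - 1) * math.exp(-b * y) / math.gamma(c)

curve = {age: F((age + 0.5) - 15.0) for age in range(15, 50)}
# the Gamma density integrates to 1, so the rates should sum to roughly the SFI
print(sum(curve.values()))  # close to 2.1
```

The small shortfall with respect to 2.1 is the Gamma mass lying beyond age 49, which the truncated sum does not capture.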



5. Conclusions
The Gamma distribution is a powerful tool for the analysis and subsequent
projection of fecundity curves for a given zone, and this distribution is the most
widely used by demographers and other researchers in the field. The results
obtained reveal its clarity and suitability for modelling fecundity patterns and for
carrying out simulations or predictions of future patterns, largely because its
parameters depend on synthetic indicators of fecundity.

References
1. Arroyo, A. (coordinator), Hernández, J., Romero, J., Viciana, F. and Zoido,
F. (2004). Tendencias demográficas durante el siglo XX en España. INE.
Madrid.
2. Brass, W. (1971). Seminario sobre modelos para medir variables
demográficas (Fecundidad y mortalidad). CELADE. S. José de Costa Rica.
3. Brass, W. (1974). Métodos para estimar la fecundidad y la mortalidad en
poblaciones con datos limitados. CELADE. Santiago de Chile.
4. I.E.A. (1999). Un siglo de demografía en Andalucía. La población desde
1900. Sevilla.
5. Leridon, H. and Toulemon, L. (1997). Démographie. Economica. Paris.
6. Pressat, R. (1995). Eléments de démographie mathématique. AIDELF.
Paris.
Chapter 10
CLASSES OF BIVARIATE DISTRIBUTIONS WITH NORMAL
AND LOGNORMAL CONDITIONALS: A BRIEF REVISION*

J.M. SARABIA
Department of Economics, University of Cantabria
Avda. de los Castros s/n. Santander, 39005, Spain

E. CASTILLO
Dept. of Applied Mathematics and Computational Sciences
University of Cantabria
Avda. de los Castros s/n. Santander, 39005, Spain

M. PASCUAL
Department of Economics, University of Cantabria
Avda. de los Castros s/n. Santander, 39005, Spain

M. SARABIA
Department of Business Administration, University of Cantabria
Avda. de los Castros s/n. Santander, 39005, Spain

The present paper is a brief survey of the classes of bivariate distributions with normal
and lognormal conditionals. Basic properties including conditional moments, marginal
distributions, characterizations, parameterizations, dependence and modality are revised.
Estimation and applications of these models are studied. Finally, some extensions of the
bivariate conditional normal model are reviewed.

1. Introduction and motivation


Let (X, Y) be a bivariate random variable with joint probability density
function (pdf) f(x,y). It is well known that the couple of marginal
distributions does not determine the bivariate distribution. For example, a
bivariate distribution with normal marginals need not be a classical bivariate
normal density. Probably, the simplest example is

The authors thank the Ministerio de Educación y Ciencia (project SEJ2004-02810) for partial
support of this work.


$$f(x,y) = \frac{1}{\pi}\exp\left[-(x^2+y^2)/2\right]\, I(xy>0),$$

where $I(\cdot)$ is the indicator function. However, the conditional distribution
functions uniquely determine a joint density function [1].
The classical bivariate normal distribution has both marginal and
conditional normal distributions. A natural question then arises: Must a bivariate
distribution with normal conditionals be a classical bivariate normal
distribution? The answer is negative, and then, another question arises: What is
this class of distributions?
We answer this question by studying and reviewing the class of bivariate
distributions with normal conditionals, and next we study the class of
distributions with lognormal conditionals. The present paper surveys bivariate
distributions with normal and lognormal conditionals.

2. Bivariate distribution with normal conditionals


Assume that (X, Y) is a random vector that has a joint density. The
marginal, conditional and joint densities are denoted by $f_X(x)$, $f_Y(y)$,
$f_{X|Y}(x\mid y)$, $f_{Y|X}(y\mid x)$ and $f(x,y)$. We are interested in obtaining the most
general bivariate random variable whose conditional distributions are normal,
$$X\mid Y=y \sim N(\mu_1(y), \sigma_1^2(y)), \qquad (1)$$
$$Y\mid X=x \sim N(\mu_2(x), \sigma_2^2(x)), \qquad (2)$$
that is

$$f_{X|Y}(x\mid y) = \frac{1}{\sigma_1(y)\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu_1(y)}{\sigma_1(y)}\right)^2\right\}, \qquad (3)$$

$$f_{Y|X}(y\mid x) = \frac{1}{\sigma_2(x)\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{y-\mu_2(x)}{\sigma_2(x)}\right)^2\right\}, \qquad (4)$$

where $\mu_i(u): \mathbb{R} \to \mathbb{R}$ and $\sigma_i(u): \mathbb{R} \to \mathbb{R}^+$, $i = 1, 2$, are unknown functions.
This bivariate distribution was obtained by Castillo and Galambos [2,3].
Bhattacharyya [4] obtained the same expression solving another different
problem.
If we write the joint density as product of marginals and conditionals we
obtain the functional equation:

$$\frac{f_Y(y)}{\sigma_1(y)\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu_1(y))^2}{2\sigma_1^2(y)}\right\} = \frac{f_X(x)}{\sigma_2(x)\sqrt{2\pi}}\exp\left\{-\frac{(y-\mu_2(x))^2}{2\sigma_2^2(x)}\right\}$$

This is a functional equation with 6 different unknown functions. There are


two ways of solving this functional equation: using general methods of
functional equations or using standard calculus techniques. For this kind of
equation, functional-equation methods have been used widely by Arnold,
Castillo and Sarabia [5]. In this particular case, it is possible to use standard
calculus. Taking logarithms we get:

' fT(y) " 1


log/(x,.y) = log <x-Mi(y)Y,
2ff,Cv)s

1
\ogfix,y) = \og -0>-//2«)2.
2a2(xy

We write:
flog f{x, y) = a,iy) + bx iy)x + c, (y)x
1 log fix, y) = a, ix) + b, ix)y + c, ix)y

Now, if we assume differentiability for fix,y),

$$\frac{\partial^2 \log f(x,y)}{\partial x^2} = 2c_1(y), \quad \forall y,$$

$$\frac{\partial^2 \log f(x,y)}{\partial x^2} = a_2''(x) + b_2''(x)y + c_2''(x)y^2, \quad \forall y.$$

In consequence, the functions $c_1(y)$ and $a_2(x)$, $b_2(x)$, $c_2(x)$ must be
polynomials of degree 2. Similarly, computing $(\partial^2/\partial y^2)\log f(x,y)$, we conclude
that $a_1(y)$ and $b_1(y)$ are polynomials of degree two. Finally, the joint pdf
$f(x,y)$ must be of the form

$$f(x,y) = \exp\{m_{00} + m_{10}x + m_{01}y + m_{20}x^2 + m_{02}y^2 + m_{11}xy + m_{12}xy^2 + m_{21}x^2y + m_{22}x^2y^2\}. \qquad (5)$$

2.1. Conditional moments and marginal distributions


From general expression (5) and by identification with (3) and (4) we obtain,

$$\log\left[\frac{f_Y(y)}{\sigma_1(y)\sqrt{2\pi}}\right] - \frac{\mu_1^2(y)}{2\sigma_1^2(y)} = m_{00} + m_{01}y + m_{02}y^2,$$

$$\frac{\mu_1(y)}{\sigma_1^2(y)} = m_{10} + m_{11}y + m_{12}y^2,$$

$$-\frac{1}{2\sigma_1^2(y)} = m_{20} + m_{21}y + m_{22}y^2,$$
which leads to
$$E(X\mid Y=y) = \mu_1(y) = -\frac{m_{12}y^2 + m_{11}y + m_{10}}{2(m_{22}y^2 + m_{21}y + m_{20})}, \qquad (6)$$

$$Var(X\mid Y=y) = \sigma_1^2(y) = \frac{-1}{2(m_{22}y^2 + m_{21}y + m_{20})}, \qquad (7)$$

$$E(Y\mid X=x) = \mu_2(x) = -\frac{m_{21}x^2 + m_{11}x + m_{01}}{2(m_{22}x^2 + m_{12}x + m_{02})}, \qquad (8)$$

$$Var(Y\mid X=x) = \sigma_2^2(x) = \frac{-1}{2(m_{22}x^2 + m_{12}x + m_{02})}. \qquad (9)$$
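The closed forms (6)-(7) can be verified numerically against the joint density (5). The following sketch uses illustrative coefficients satisfying conditions (b) of Section 2.2; the normalizing constant cancels when computing conditional moments:

```python
import math

# Numerical check of the closed forms (6) and (7), using illustrative
# coefficients that satisfy conditions (b) of Section 2.2:
# m22 = m20 = m02 = -1, m11 = 1 and all remaining m_ij = 0. The value of
# m00 (normalization) is irrelevant for conditional moments.
m10 = m01 = m12 = m21 = 0.0
m20 = m02 = m22 = -1.0
m11 = 1.0

def kernel(x, y):
    # unnormalized joint density of the form (5)
    return math.exp(m10 * x + m01 * y + m20 * x * x + m02 * y * y
                    + m11 * x * y + m12 * x * y * y
                    + m21 * x * x * y + m22 * x * x * y * y)

y = 1.5  # condition on Y = y
xs = [-6.0 + 0.001 * k for k in range(12001)]
w = [kernel(x, y) for x in xs]
total = sum(w)
mean = sum(x * wi for x, wi in zip(xs, w)) / total
var = sum((x - mean) ** 2 * wi for x, wi in zip(xs, w)) / total

mu1 = -(m12 * y * y + m11 * y + m10) / (2.0 * (m22 * y * y + m21 * y + m20))
s1 = -1.0 / (2.0 * (m22 * y * y + m21 * y + m20))
print(mean, mu1)  # both close to 0.2308
print(var, s1)    # both close to 0.1538
```

The grid quadrature and the closed forms agree to several decimal places, as expected for a conditional that is exactly normal.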

The marginal densities are given by:

$$f_X(x) = \sqrt{2\pi}\left(-2(m_{22}x^2 + m_{12}x + m_{02})\right)^{-1/2} \exp\left\{m_{00} + m_{10}x + m_{20}x^2 - \frac{(m_{21}x^2 + m_{11}x + m_{01})^2}{4(m_{22}x^2 + m_{12}x + m_{02})}\right\} \qquad (10)$$

$$f_Y(y) = \sqrt{2\pi}\left(-2(m_{22}y^2 + m_{21}y + m_{20})\right)^{-1/2} \exp\left\{m_{00} + m_{01}y + m_{02}y^2 - \frac{(m_{12}y^2 + m_{11}y + m_{10})^2}{4(m_{22}y^2 + m_{21}y + m_{20})}\right\} \qquad (11)$$

The joint pdf can be written in the alternative form (with the notation used
for more general exponential families):

$$f(x,y) = \exp\left\{(1,\,x,\,x^2)\begin{pmatrix} m_{00} & m_{01} & m_{02}\\ m_{10} & m_{11} & m_{12}\\ m_{20} & m_{21} & m_{22}\end{pmatrix}\begin{pmatrix}1\\ y\\ y^2\end{pmatrix}\right\} \qquad (12)$$

It remains only to determine appropriate conditions on the constants


$m_{ij}$, $i, j = 0, 1, 2$, in (12) to ensure the integrability of those marginals. The
constant $m_{00}$ will be a function of the other parameters.

2.2. Properties of the normal conditionals distribution


The normal conditionals distribution has joint density of the form (12)
where the $m_{ij}$ constants satisfy one of the two sets of conditions
(a) $m_{22} = m_{12} = m_{21} = 0$; $m_{20} < 0$; $m_{02} < 0$; $m_{11}^2 < 4m_{02}m_{20}$.
(b) $m_{22} < 0$; $4m_{22}m_{02} > m_{12}^2$; $4m_{20}m_{22} > m_{21}^2$.

Models satisfying conditions (a), are the classical bivariate normal models with:
• Normal marginals,
• Normal conditionals,
• Linear regressions and constant conditional variances.
More interesting are the models satisfying conditions (b). These models have:
• Normal conditional distributions,
• Non-normal marginal densities (see (10) and (11)),
• The regression functions are either constant or non-linear given by (6)
and (8). Each regression function is bounded (in contrast with the
classical bivariate normal model).
• The conditional variance functions are also bounded and non-constant.
They are given by (7) and (9).
What if we require normal conditionals and independent marginals?
Referring to (12), the requirement of independence translates into the following
functional equation:

$$(1,\,x,\,x^2)\begin{pmatrix} m_{00} & m_{01} & m_{02}\\ m_{10} & m_{11} & m_{12}\\ m_{20} & m_{21} & m_{22}\end{pmatrix}\begin{pmatrix}1\\ y\\ y^2\end{pmatrix} = r(x) + s(y). \qquad (13)$$

Its solution eventually leads us to
$$m_{11} = m_{12} = m_{21} = m_{22} = 0,$$
which is the independence model. This result shows that independence is only
possible within the classical bivariate normal model. As consequences of the
above discussion, Castillo and Galambos [3] derive the following interesting
conditional characterizations of the classical bivariate normal distribution.

Theorem. $f(x,y)$ is a classical bivariate normal density if and only if all
conditional distributions, both of X given Y and of Y given X, are normal and any
one of the following properties holds:
(i) $\sigma_2^2(x) = Var(Y\mid X=x)$ or $\sigma_1^2(y) = Var(X\mid Y=y)$ is constant;
(ii) $\lim_{y\to\infty} y^2\sigma_1^2(y) = \infty$ or $\lim_{x\to\infty} x^2\sigma_2^2(x) = \infty$;
(iii) $\lim_{y\to\infty} \sigma_1(y) \neq 0$ or $\lim_{x\to\infty} \sigma_2(x) \neq 0$;
(iv) $E(Y\mid X=x)$ or $E(X\mid Y=y)$ is linear and non-constant.
Proof: Direct, using the general expression for $f(x,y)$.
Other characterizations of the classical bivariate normal distribution by
conditional properties have been proposed by Ahsanullah [6], Arnold, Castillo
and Sarabia [7,8], Bischoff [9,10,11], Bischoff and Fieger [12] and Nguyen,
Rempala and Wesolowski [13].

2.3. Convenient parameterizations
Expression (5) depends on 8 parameters, and the normalizing constant is not
available in closed form. From a practical point of view it is convenient to
provide some simpler models or some convenient parameterization. In this way
Gelman and Meng [14] proposed a simple parameterization. If in (12) we make
location and scale transformations in each variable we get:
$$f(x,y) \propto \exp\{-(\alpha x^2y^2 + x^2 + y^2 + \beta xy + \gamma x + \delta y)\},$$

where $\alpha, \beta, \gamma$ and $\delta$ are the new parameters, which are functions of the old $m_{ij}$
parameters. In this parameterization, the conditional distributions are

$$X\mid Y=y \sim N\left(-\frac{\beta y+\gamma}{2(\alpha y^2+1)},\; \frac{1}{2(\alpha y^2+1)}\right),$$

$$Y\mid X=x \sim N\left(-\frac{\beta x+\delta}{2(\alpha x^2+1)},\; \frac{1}{2(\alpha x^2+1)}\right).$$

The only constraints for this parameterization are:


$$\alpha \geq 0, \ \text{and if } \alpha = 0 \text{ then } |\beta| < 2. \qquad (15)$$

An advantage of this Gelman and Meng parameterization is that


multimodality can be studied easily.
Another important parameterization was proposed by Sarabia [15]. This
author proposed the choice $\mu_i(u) = \mu_i$, $i = 1, 2$, obtaining the joint density

$$f(x,y;\mu,\sigma,c) = \frac{k(c)}{2\pi\sigma_1\sigma_2}\exp\left\{-\frac{1}{2}\left(z_1^2 + z_2^2 + c\,z_1^2 z_2^2\right)\right\}, \qquad (16)$$

where $z_1 = (x-\mu_1)/\sigma_1$, $z_2 = (y-\mu_2)/\sigma_2$ and $c > 0$. In this case the
conditional distributions are given by

$$X\mid Y=y \sim N\left(\mu_1,\; \frac{\sigma_1^2}{1+c(y-\mu_2)^2/\sigma_2^2}\right),$$

$$Y\mid X=x \sim N\left(\mu_2,\; \frac{\sigma_2^2}{1+c(x-\mu_1)^2/\sigma_1^2}\right),$$

and the normalizing constant by

$$k(c) = \frac{\sqrt{2c}}{U(1/2,\,1,\,1/(2c))},$$

where $U(a,b,z)$ represents the confluent hypergeometric function ($a, z > 0$)

$$U(a,b,z) = \frac{1}{\Gamma(a)}\int_0^\infty e^{-zt}\, t^{a-1}(1+t)^{b-a-1}\, dt.$$

2.4. How many modes?


Gelman and Meng [14] gave an example of a distribution of this type with
two modes. The conditional mode curves (which correspond to the conditional
mean curves (6) and (8)) can intersect in more than one point. The general
problem was solved by Arnold, Castillo, Sarabia and Gonzalez-Vega [16]. Since
in this model, modes are at the intersection of regression lines, the coordinates
of the modes satisfy the following system of equations

$$x = -\frac{\beta y+\gamma}{2(\alpha y^2+1)}, \qquad y = -\frac{\beta x+\delta}{2(\alpha x^2+1)}. \qquad (17)$$

Substituting the first into the second we get


$$4\alpha^2 y^5 + 2\alpha^2\delta y^4 + 8\alpha y^3 + \alpha(4\delta+\beta\gamma)y^2 + (4-\beta^2+\alpha\gamma^2)y + 2\delta - \beta\gamma = 0,$$

which is a polynomial of degree five. When this polynomial has a unique real
root, the density is unimodal, if it has three distinct real roots (two modes and a

saddle point) the density is bimodal, and with 5 distinct real roots (three relative
maxima and two saddle points) we have 3 modes. For example, in the
symmetric case $\delta = \gamma$, we have 3 real roots and consequently $f(x,y)$ will be
bimodal if and only if
$$\alpha\delta^2 > 8(2-\beta).$$
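The modes can be located numerically by solving the degree-five polynomial in y derived from system (17). A sketch with the symmetric choice α = 1, β = γ = δ = 0, for which the density is unimodal with mode at the origin:

```python
# Mode search for the Gelman and Meng model: real roots of the quintic
#   4 a^2 y^5 + 2 a^2 d y^4 + 8 a y^3 + a (4d + b g) y^2
#     + (4 - b^2 + a g^2) y + (2d - b g) = 0
# (a = alpha, b = beta, g = gamma, d = delta). Illustrative parameters:
alpha, beta, gamma_, delta = 1.0, 0.0, 0.0, 0.0

def quintic(y):
    return (4 * alpha ** 2 * y ** 5
            + 2 * alpha ** 2 * delta * y ** 4
            + 8 * alpha * y ** 3
            + alpha * (4 * delta + beta * gamma_) * y ** 2
            + (4 - beta ** 2 + alpha * gamma_ ** 2) * y
            + 2 * delta - beta * gamma_)

# bracket sign changes on a grid, then refine each bracket by bisection
roots = []
step = 0.001
prev_y = -10.0
prev_v = quintic(prev_y)
y = prev_y
while y < 10.0:
    y += step
    v = quintic(y)
    if prev_v * v < 0 or prev_v == 0.0:
        lo, hi = prev_y, y
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if quintic(lo) * quintic(mid) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    prev_y, prev_v = y, v

print(roots)  # a single root near y = 0 for this parameter choice
```

For these parameters the quintic reduces to 4y(y² + 1)², which is strictly increasing, so the single real root y = 0 corresponds to the unique mode.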

2.5. Dependence
For this model the usual correlation coefficient is not limited. An alternative,
non-scalar dependence measure is the local dependence function [17,18], defined by

$$\gamma(x,y) = \frac{\partial^2 \log f(x,y)}{\partial x\,\partial y}, \qquad (18)$$

which gives more detailed information about the dependence. In this case, the
local dependence function is
$$\gamma(x,y) = \frac{\partial^2 \log f(x,y)}{\partial x\,\partial y} = m_{11} + 2m_{21}x + 2m_{12}y + 4m_{22}xy.$$
An interpretation of this function is possible: random variables X and Y are
positively associated in the first and third quadrants and negatively associated in
the second and fourth which supposes non-linear dependences in the model.

3. Interesting properties and applications of models with conditional specification
The properties of the model with normal conditionals (some of them
unexpected properties) appear in another model with conditional specification.
We enumerate some of these properties of such conditional models.
• Models with conditional specification tend to depend on a large number
of parameters.
• They include as particular cases the independence case and well-known
classical models.
• Their dependence structure is richer than that of the usual models (local
and global).
• Sometimes they present multimodality.
• Characterizations of some classical models can be obtained based on
these models.
• In the most limited case, if we begin with a one-parameter family, the
dependence structure can be limited. However, the marginal
distributions are wider than in the usual models. They present
overdispersion.
• The resulting densities are easy to simulate using Gibbs sampler
techniques; indeed, they are tailor made for such simulation.
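To illustrate the last point, a minimal Gibbs sampler for the normal conditionals model in the Gelman and Meng parameterization of Section 2.3: both conditionals are explicit normals, so the sampler simply alternates between them. The parameter values below are illustrative only:

```python
import random

# Gibbs sampler sketch for the normal conditionals model in the Gelman and
# Meng parameterization: X | Y = y is N(-(beta*y + gamma)/(2(alpha*y^2 + 1)),
# 1/(2(alpha*y^2 + 1))), and symmetrically for Y | X = x (with delta).
alpha, beta, gamma_, delta = 1.0, 0.0, 0.0, 0.0

def gibbs(n, burn=500, seed=42):
    rng = random.Random(seed)
    x = y = 0.0
    draws = []
    for i in range(n + burn):
        vx = 1.0 / (2.0 * (alpha * y * y + 1.0))
        x = rng.gauss(-(beta * y + gamma_) * vx, vx ** 0.5)
        vy = 1.0 / (2.0 * (alpha * x * x + 1.0))
        y = rng.gauss(-(beta * x + delta) * vy, vy ** 0.5)
        if i >= burn:
            draws.append((x, y))
    return draws

sample = gibbs(20000)
mean_x = sum(p[0] for p in sample) / len(sample)
print(mean_x)  # near 0: this parameter choice gives a symmetric density
```

Because each full conditional is available in closed form, no accept-reject step is needed, which is what makes these models "tailor made" for Gibbs sampling.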

4. Bivariate distributions with lognormal conditionals


In this section the class of bivariate distributions with log-normal
conditionals is reviewed. This model has been recently proposed by Sarabia,
Castillo, Pascual and Sarabia [19] and has important applications in the study of
bivariate income distributions. We work with the triparametric version of the
lognormal distribution, which we will denote by $X \sim LN(\mu,\sigma,\delta)$ and which has pdf

$$f(x;\mu,\sigma,\delta) = \frac{1}{(x-\delta)\,\sigma\sqrt{2\pi}}\exp\left\{-\frac{[\log(x-\delta)-\mu]^2}{2\sigma^2}\right\}, \qquad x > \delta,$$

with $\mu \in \mathbb{R}$ and $\sigma > 0$. Then, we are interested in the more general random


variable (X,Y) satisfying,

$$X\mid Y=y \sim LN(\delta_1, \mu_1(y), \sigma_1(y)), \qquad (19)$$

$$Y\mid X=x \sim LN(\delta_2, \mu_2(x), \sigma_2(x)). \qquad (20)$$

If conditions (19) and (20) are satisfied, the joint probability density
function takes the form,

$$f(x,y;\delta,M) = (x-\delta_1)^{-1}(y-\delta_2)^{-1}\exp\{-\,u_{\delta_1}(x)^T M\, u_{\delta_2}(y)\}, \qquad (21)$$

where $u_{\delta_i}(\cdot)$ denotes the vector

$$u_{\delta_i}(z) = \left(1,\ \log(z-\delta_i),\ [\log(z-\delta_i)]^2\right)^T, \quad i = 1, 2,$$

and $M = (m_{ij})$ is a $3\times 3$ parameter matrix. The parameters $\{m_{ij}\}$ must be
chosen such that (21) is integrable. Expanding formula (21) we obtain the
formula:

$$f(x,y;\delta,m) = [(x-\delta_1)(y-\delta_2)]^{-1}\exp\{-[m_{00} + u(z_1,z_2) + v(z_1,z_2)]\}, \qquad (22)$$

where
$$u(x,y) = m_{10}x + m_{20}x^2 + m_{01}y + m_{02}y^2 + m_{11}xy,$$
$$v(x,y) = m_{12}xy^2 + m_{21}x^2y + m_{22}x^2y^2,$$
and $z_1 = \log(x-\delta_1)$, $z_2 = \log(y-\delta_2)$. The function $u(\cdot,\cdot)$ contains the
terms that appear in the classical model and the function $v(\cdot,\cdot)$ contains the new
terms that appear in these conditional models. The conditional parameters $\mu_i(u)$
and $\sigma_i(u)$, $i = 1, 2$, are:

$$\mu_1(y) = -\frac{m_{12}z_2^2 + m_{11}z_2 + m_{10}}{2(m_{22}z_2^2 + m_{21}z_2 + m_{20})}, \qquad (23)$$

$$\mu_2(x) = -\frac{m_{21}z_1^2 + m_{11}z_1 + m_{01}}{2(m_{22}z_1^2 + m_{12}z_1 + m_{02})}, \qquad (24)$$

and

$$\sigma_1^2(y) = \frac{1}{2(m_{22}z_2^2 + m_{21}z_2 + m_{20})}, \qquad (25)$$

$$\sigma_2^2(x) = \frac{1}{2(m_{22}z_1^2 + m_{12}z_1 + m_{02})}. \qquad (26)$$

4.1. General properties


The constant $\exp(-m_{00})$ is the normalizing constant and it is a function of
the rest of the parameters. In order to have a genuine joint pdf, sufficient conditions
for integrability of (22) are that the parameters satisfy one of the following two
sets of conditions:
$$m_{12} = m_{21} = m_{22} = 0, \quad m_{02} > 0, \quad m_{20} > 0, \quad m_{11}^2 < 4m_{02}m_{20}, \qquad (27)$$

$$m_{22} > 0, \quad m_{12}^2 < 4m_{22}m_{02}, \quad m_{21}^2 < 4m_{22}m_{20}. \qquad (28)$$

If (27) is satisfied, we obtain the classical bivariate lognormal distribution.


If (28) is satisfied, we find a new class of distributions. The marginal distributions
are given by ($x > \delta_1$):

$$f_X(x;\delta_1,m) = \frac{\exp\left\{-(m_{20}z_1^2 + m_{10}z_1 + m_{00}) + \dfrac{(m_{21}z_1^2 + m_{11}z_1 + m_{01})^2}{4(m_{22}z_1^2 + m_{12}z_1 + m_{02})}\right\}}{(x-\delta_1)\sqrt{(m_{22}z_1^2 + m_{12}z_1 + m_{02})/\pi}} \qquad (29)$$

and ($y > \delta_2$)

$$f_Y(y;\delta_2,m) = \frac{\exp\left\{-(m_{02}z_2^2 + m_{01}z_2 + m_{00}) + \dfrac{(m_{12}z_2^2 + m_{11}z_2 + m_{10})^2}{4(m_{22}z_2^2 + m_{21}z_2 + m_{20})}\right\}}{(y-\delta_2)\sqrt{(m_{22}z_2^2 + m_{21}z_2 + m_{20})/\pi}} \qquad (30)$$

Note that (29) and (30) are not lognormal distributions if conditions (28)
hold. These marginals depend on all eight parameters and thus present a high
flexibility. The conditional moments of (22) are ($r = 1, 2, \ldots$):

$$E[(X-\delta_1)^r \mid Y] = \exp\{r\mu_1(Y) + r^2\sigma_1^2(Y)/2\}, \qquad (31)$$

$$E[(Y-\delta_2)^r \mid X] = \exp\{r\mu_2(X) + r^2\sigma_2^2(X)/2\}, \qquad (32)$$

where $\mu_i(U)$ and $\sigma_i^2(U)$, $i = 1, 2$, are given in (23) to (26). Combining (31)-(32)
with (29)-(30), the moments of the marginal distributions as well as the
correlation coefficient can be obtained. As before, the usual correlation coefficient is
not limited, and the local dependence function (18) is

$$\gamma(x,y) = \frac{m_{11} + 2m_{21}\log(x) + 2m_{12}\log(y) + 4m_{22}\log(x)\log(y)}{xy}.$$
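The conditional-moment identity behind (31)-(32), namely that for a lognormal variable E[(X − δ)^r] = exp(rμ + r²σ²/2), can be checked by direct numerical integration; the values of r, μ and σ below are illustrative:

```python
import math

# Check of the lognormal moment identity behind (31)-(32): if
# X ~ LN(mu, sigma, delta) then E[(X - delta)^r] = exp(r*mu + r^2*sigma^2/2).
# The values of r, mu, sigma are illustrative (delta cancels out).
r, mu, sigma = 2, 0.3, 0.4

# E[(X - delta)^r] = integral of e^{r z} against the N(mu, sigma^2) density,
# with z = log(x - delta); midpoint rule over +/- 10 standard deviations
lo, hi, n = mu - 10.0 * sigma, mu + 10.0 * sigma, 200000
h = (hi - lo) / n
acc = 0.0
for k in range(n):
    z = lo + (k + 0.5) * h
    acc += math.exp(r * z) * math.exp(-0.5 * ((z - mu) / sigma) ** 2)
numeric = acc * h / (sigma * math.sqrt(2.0 * math.pi))
exact = math.exp(r * mu + r * r * sigma * sigma / 2.0)
print(numeric, exact)  # both close to 2.5093
```

The substitution z = log(x − δ) turns the lognormal moment into a Gaussian integral, which is why the closed form in (31)-(32) is exact.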

5. Estimation
For this kind of conditional models, several estimation strategies have been
proposed by Arnold, Castillo and Sarabia [5]. Here we focus on
techniques based on the likelihood. The family of densities (5) is a member of
the exponential family with natural sufficient statistics:

$$\left(\sum x_i,\ \sum x_i^2,\ \sum y_i,\ \sum y_i^2,\ \sum x_iy_i,\ \sum x_iy_i^2,\ \sum x_i^2y_i,\ \sum x_i^2y_i^2\right). \qquad (33)$$
However, inference from conditionally specified models is not direct
because the normalizing constant is an unknown function of the parameters. The
shape of the likelihood is known but not the factor required to make it integrate
to 1. A method to avoid dealing with the normalizing constant consists of using
both conditional distributions. We define the pseudolikelihood estimate of 0 to
be that value of 0 which maximizes the pseudolikelihood function defined by:

7@) = n;., /,,,(*, | y,;i)fr\Ay, \ xni), (34)


184 J.M. Sarabia et al.

According to Arnold and Strauss [20], these estimators are consistent and
asymptotically normal. For this kind of conditional model, these estimators are
much easier to obtain than the maximum likelihood estimates.
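Estimation by maximizing (34) can be sketched in code. The example below is an illustrative assumption, not part of the original text: it uses the simpler exponential conditionals model of Arnold and Strauss (X | Y = y ~ Exp(1 + θy), Y | X = x ~ Exp(1 + θx)), whose conditional densities are known in closed form even though the joint normalizing constant is not; the function names and the crude grid search are our own choices.

```python
import math
import random

def log_pseudolikelihood(theta, data):
    """Log of (34): sum of log f(x|y; theta) + log f(y|x; theta) over the sample."""
    s = 0.0
    for x, y in data:
        rx = 1.0 + theta * y          # conditional rate of X given Y = y
        ry = 1.0 + theta * x          # conditional rate of Y given X = x
        s += math.log(rx) - rx * x    # log of an Exp(rx) density evaluated at x
        s += math.log(ry) - ry * y
    return s

def fit_theta(data, grid=None):
    """Grid-search maximizer of the pseudolikelihood (crude but dependency-free)."""
    if grid is None:
        grid = [i / 100.0 for i in range(0, 301)]   # theta in [0, 3], step 0.01
    return max(grid, key=lambda t: log_pseudolikelihood(t, data))

if __name__ == "__main__":
    # Gibbs-sample a synthetic data set from the model with theta = 1
    random.seed(0)
    theta_true, y = 1.0, 1.0
    data = []
    for _ in range(2000):
        x = random.expovariate(1.0 + theta_true * y)
        y = random.expovariate(1.0 + theta_true * x)
        data.append((x, y))
    print(fit_theta(data))   # should land near the true value 1.0
```

The same strategy applies verbatim to the normal and lognormal conditionals models: only the two conditional log-densities change.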

6. Applications
The model with normal conditionals can present several modes and is
consequently a natural alternative to mixture models for modelling
heterogeneity; it can also be used for modelling a population composed of
several clusters. Arnold, Castillo and Sarabia [21] used this bivariate distribution
to fit the classical Fisher data, where two different samples are pooled.
The model was fitted by pseudolikelihood.
The model with lognormal conditionals has been used by Sarabia, Castillo,
Pascual and Sarabia [19] for modelling bivariate income distributions, using the
information contained in the European Community Household Panel. These
authors used the Spanish microdata (approximately 10,500 individuals),
focusing the analysis on waves 1, 3 and 6. It is important to point out that these
are large bivariate data sets with high variability. They fitted the classical
bivariate lognormal distribution and the bivariate lognormal conditionals
distribution (22), with δi = 0, to these data sets, maximizing the pseudolikelihood
function given in (34). The resulting fit is very acceptable, and the bivariate
lognormal conditionals distribution implies a very significant improvement in fit.

7. An extension: Bivariate distributions with skew-normal conditionals
Several extensions of the previous models are possible. Bivariate and
multivariate distributions with Student-t conditionals were studied by Sarabia
[22]. Sarabia, Castillo, Pascual and Sarabia [19] proposed several extensions of
the bivariate distribution with lognormal conditionals given by (21). In this
section we review models with skew-normal conditionals, which were studied by
Arnold, Castillo and Sarabia [23]. The univariate skew-normal distribution is a
class of distributions whose density takes the form

f(x; λ) = 2φ(x)Φ(λx),   x ∈ R,   (35)

where φ(x) and Φ(x) denote the standard normal density and distribution
functions, respectively. The parameter λ ∈ R governs the skewness of the
distribution. We will write X ~ SN(λ). The skewness of this

distribution is a bit limited. In order to increase coverage of the (β1, β2) plane, it
is convenient to introduce an extra parameter and define densities:

f(x; λ0, λ1) = φ(x)Φ(λ0 + λ1x) / Φ(λ0/√(1 + λ1²)),   x ∈ R.   (36)

We are interested in the form of the density for a two-dimensional random
variable (X, Y) such that:

X | Y = y ~ SN(λ1(y))   (37)

and

Y | X = x ~ SN(λ2(x))   (38)

for some functions λ1(y) and λ2(x). If (37)-(38) are to hold, there must exist
densities fX(x) and fY(y) such that

fXY(x, y) = 2φ(x)Φ(λ1(y)x)fY(y) = 2φ(y)Φ(λ2(x)y)fX(x).   (39)

In this functional equation, fX(x), fY(y), λ1(y) and λ2(x) are unknown
functions to be determined. It is not hard to prove that fY(y) = φ(y) and
fX(x) = φ(x). Then we have:

Φ(λ1(y)x) = Φ(λ2(x)y),   ∀x, y

and then we get the solutions λ1(y) = λy and λ2(x) = λx, where λ is a
constant. In consequence, we have two types of solutions to the previous
functional equation. The first one corresponds to the independence case. In this
situation we have λ1(y) = λ1, λ2(x) = λ2, X ~ SN(λ1), Y ~ SN(λ2) and

fXY(x, y) = 4φ(x)φ(y)Φ(λ1x)Φ(λ2y).

The second situation corresponds to the dependent case. Here λ1(y) = λy
and λ2(x) = λx, and consequently fX(x) = φ(x), fY(y) = φ(y) and

fXY(x, y) = 2φ(x)φ(y)Φ(λxy).   (40)

The previous joint density has standard normal marginals together with
skew-normal conditionals. The corresponding regression functions are non-linear
and take the form:

E(X | Y = y) = √(2/π) · λy / √(1 + λ²y²).

The correlation coefficient can be expressed in terms of the Confluent
Hypergeometric function U(a, b, z); it can be shown that |ρ(X, Y)| < 2/π ≈ 0.63662.
Again, multimodality is possible. If |λ| < √(π/2) ≈ 1.25, the density (40) has a
unique mode at the origin, (0, 0), and if |λ| > √(π/2), the density (40) is bimodal.
More complicated models based on the density (36) have been considered by
Arnold, Castillo and Sarabia [23].
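The closed-form facts stated above for the density (40) can be checked numerically. The sketch below is an illustrative verification written from scratch (the helper names and the simple trapezoidal integrator are our own): it recovers the standard normal marginal by integrating the joint density, and compares the regression function with direct numerical integration.

```python
import math

SQRT2PI = math.sqrt(2.0 * math.pi)

def phi(t):   # standard normal density
    return math.exp(-0.5 * t * t) / SQRT2PI

def Phi(t):   # standard normal cdf, via the error function
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def f_joint(x, y, lam):
    """Joint density (40): skew-normal conditionals, standard normal marginals."""
    return 2.0 * phi(x) * phi(y) * Phi(lam * x * y)

def trapz(f, a, b, n=4000):
    """Composite trapezoidal rule; very accurate here since the integrands decay fast."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

lam = 2.0
x0 = 0.7
# marginal of X at x0: integrating the joint over y should give phi(x0)
marg = trapz(lambda y: f_joint(x0, y, lam), -10.0, 10.0)
print(abs(marg - phi(x0)))   # close to zero

# regression function: E(X | Y = y) = sqrt(2/pi) * lam*y / sqrt(1 + lam^2 y^2)
y0 = 1.3
num = trapz(lambda x: x * 2.0 * phi(x) * Phi(lam * y0 * x), -10.0, 10.0)
closed = math.sqrt(2.0 / math.pi) * lam * y0 / math.sqrt(1.0 + (lam * y0) ** 2)
print(abs(num - closed))     # close to zero
```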

References
1. B.C. Arnold and S.J. Press. (1989). Compatible conditional distributions.
Journal of the American Statistical Association, 84, 152-156.
2. E. Castillo and J. Galambos. (1987). Bivariate distributions with normal
conditionals. Proceedings of the International Association of Science and
Technology for Development, 59-62. Anaheim, CA: Acta Press.
3. E. Castillo and J. Galambos. (1989). Conditional distributions and the
Bivariate normal distribution. Metrika, 36, 209-214.
4. A. Bhattacharyya. (1943). On some sets of sufficient conditions leading to
the normal Bivariate distribution. Sankhya, 6, 399-406.
5. B.C. Arnold, E. Castillo and J.M. Sarabia. (1999). Conditional specification
of statistical models. Springer Series in Statistics. New York: Springer
Verlag.
6. M. Ahsanullah. (1985). Some characterizations of the Bivariate normal
distribution. Metrika, 32, 215-218.
7. B.C. Arnold, E. Castillo and J.M. Sarabia. (1994a). A conditional
characterization of the multivariate normal distribution. Statistics and
Probability Letters, 19, 313-315.
8. B.C. Arnold, E. Castillo and J.M. Sarabia. (1994b). Multivariate normality
via conditional specification. Statistics and Probability Letters, 20, 353-354.
9. W. Bischoff. (1993). On the greatest class of conjugate priors and
sensitivity of multivariate normal posterior distributions. Journal of
Multivariate Analysis, 44, 69-81.
10. W. Bischoff. (1996a). Characterizing Multivariate Normal Distributions by
Some of its Conditionals. Statistics and Probability Letters, 26, 105-111.
11. W. Bischoff. (1996b). On distributions whose conditional distributions are
normal. A vector space approach. Mathematical Methods of Statistics, 5,
443-463.
12. W. Bischoff and W. Fieger. (1991). Characterization of the multivariate
normal distribution by conditional normal distributions. Metrika, 38, 239-
248.

13. T.T. Nguyen, G. Rempala and J. Wesolowski. (1996). Non-Gaussian


measures with Gaussian structure. Probability and Mathematical Statistics,
16, 287-298.
14. A. Gelman and X.L. Meng. (1991). A note on Bivariate distributions that
are conditionally normal. The American Statistician, 45, 125-126.
15. J.M. Sarabia. (1995). The centered normal conditionals distribution.
Communications in Statistics, Theory and Methods, 24, 2889-2900.
16. B.C. Arnold, E. Castillo, J.M. Sarabia and L. Gonzalez-Vega. (2000).
Multiple modes in densities with normal conditionals. Statistics and
Probability Letters, 49, 355-363.
17. P.W. Holland and Y.L. Wang. (1987). Dependence function for continuous
Bivariate densities. Communications in Statistics, Theory and Methods, 16,
863-876.
18. M.C. Jones. (1996). The local dependence function. Biometrika, 83, 899-
904.
19. J.M. Sarabia, E. Castillo, M. Pascual and M. Sarabia. (2005). Bivariate
income distributions with lognormal conditionals. International Conference
in Memory of Two Eminent Social Scientists: C. Gini and M.O. Lorenz,
23-26.
20. B.C. Arnold and D. Strauss. (1988). Pseudolikelihood estimation. Sankhya,
Ser. B, 53, 233-243.
21. B.C. Arnold, E. Castillo and J.M. Sarabia. (2001). Conditionally specified
distributions: An introduction (with discussion). Statistical Science, 16,
151-169.
22. J.M. Sarabia. (1994). Distribuciones Multivariantes con Distribuciones
Condicionadas t de Student. Estadistica Espanola, 36, 389-402.
23. B.C. Arnold, E. Castillo and J.M. Sarabia. (2002). Conditionally specified
multivariate skewed distributions. Sankhya, A, 64, 1-21.
Chapter 11
INEQUALITY MEASURES, LORENZ CURVES AND
GENERATING FUNCTIONS

J.J. NÚÑEZ-VELÁZQUEZ*
Departamento de Estadística, Estructura Económica y O.E.I., University of Alcalá
Plaza de la Victoria, 2, 28802 Alcalá de Henares (Madrid), Spain

This paper studies the foundations of income inequality measures and their relations with
Lorenz curves, the Pigou-Dalton transfer principle and majorization relations among
income vectors. The historical development of these concepts is surveyed to see how the
current set of properties and axioms was generated, in order to define when an inequality
measure performs well. Finally, this work includes an analysis of the problems associated
with inequality orders and dominance relations among income vectors.

1. Introduction
It may be considered that the interest raised in the last thirty years in the
research community, related to the study of economic inequality, began with
the seminal paper by Atkinson (1970) and the book by Sen (1973) as its main
focal points. Both of them have had profound effects on this research field.
Since then, papers and books on this subject appear frequently in the economic
literature, and this interest has spread to several related and important social
problems, like poverty, mobility, polarization and deprivation studies, among
others.
In this period of time, different approaches to this problem have been
developed, including social welfare assumptions from Economic Theory to
support several economic inequality measuresa. However, the number and
variety of these assumptions have increased considerably, in such a way that
some of them have been matters of hard controversy. Some outstanding examples

This work is dedicated to the memory of Camilo Dagum, recently deceased. He was a direct disciple of
C. Gini and a master of several generations of researchers.
a
When inequality measures are referred to, we must understand them as functions or indicators defined
over an income distribution. So, these indicators are supposed to measure how much inequality is
present in the sharing of resources. In other words, there is no connection with the same concept
commonly used in Measure Theory. So, along the paper, we shall use the words indicator and
measure in an interchangeable manner.

190 J.J. Nunez-Velazquez

of these works could be Cowell (1995), Foster (1985), Nygård and Sandström
(1981) or Dagum (2001), among others. In the Spanish case, we would quote the
works published by Zubiri (1985), Ruiz-Castillo (1987) or Peña et al. (1996).
Nevertheless, despite the huge amount of related literature, the Lorenz curve
paradigm nowadays remains the cornerstone of economic inequality analysis.
Indeed, the Lorenz curve should be considered the basic tool to support
inequality analysis, even though this proposal was presented by Lorenz (1905),
more than a century ago. In all this time, the Lorenz curve has resisted all the
alternative proposals suggested to modify it.
Because of the above argument, one of the main objectives of this paper
must be to pay tribute to Lorenz, a century after his curve's proposal. To put
Lorenz curves in context, a description of the 9-page original paper is
quoted from Arnold (2005), delivered at the Siena Congress, held in
commemoration of that event. He wrote: ... In the
last 3 pages of the paper he describes what will become the Lorenz curve.
Actually there are only 35 lines of text and two diagrams devoted to the topic. It
has all grown from that! ...
First of all, in this paper we review the classical concepts related to income
majorization, in order to identify the theoretical background underlying Lorenz
curves and economic inequality measures as we understand them nowadays.
This aim is justified because we must reconsider which basic concepts really
underlie economic inequality measurement. Doing so results in a better
comprehension of which elements play a significant role when economic
inequality is to be measured. Moreover, this understanding allows us to back up
an efficient selection of the better inequality measures. In this sense, a set of
properties will be proposed in order to analyze the suitability of a huge number
of economic inequality measures. Additionally, a brief analysis of other related
concepts and methods, recently proposed, will be included. The variety of
themes this paper deals with advises providing the paper with a
well-disaggregated structure, which is set out next.
So, the paper is structured as follows. In section 2, a brief chronology of
published concepts related to economic inequality is developed, emphasizing
those which are close to Lorenz curve methodology. Section 3 is devoted to setting
the basic framework with respect to the income distribution space and to presenting the
crucial majorization concepts. Section 4 studies, on the one hand, the meaning
of economic inequality and, on the other hand, is dedicated to the Lorenz curve
methodology to analyze income inequality, as well as connected methods like
Inequality Measures, Lorenz Curves and Generating Functions 191

direct functional forms estimation or the less-known generating functions.


Section 5 presents the Pigou-Dalton Transfer Principle as a key element playing
a role in economic inequality, and enlightens its relation with majorization
comparisons. Section 6 connects income inequality measurement with
Schur-convex or S-convex functions, closing the circle of relationships among
analytical concepts exposed before. However, section 7 intends to face another
point of view in economic inequality analysis, as the axiomatic approach may be
considered; nevertheless, it will be shown how, in essence, it constitutes another
way of expressing the same ideas. Section 8 carries out the discussion about
alternative inequality comparison criteria suggested in the literature. Once all
these elements have been discussed, a group of properties is proposed in
section 9 to be used in selecting economic inequality measures, and it will be
shown how some well-known indicators fail to fulfil them. Finally, the paper
ends by summing up the outstanding conclusions.

2. Brief historical evolution of concepts related to economic


inequality analysis
The first studies about economic inequality of income distributions must be
related to the majorization relationship between a pair of them. So, Muirhead
(1903) relates the majorization concept to progressive income
transfers, which will be expressed in formal terms later. In 1905, M.O.
Lorenz proposes his curves to analyze income and wealth inequality, and he
points out that his curve's bow is an indicator of the degree of inequality included
in the distribution.
In 1912, C. Gini proposes his indicator to measure inequality, using the
mean difference measure obtained by averaging the differences between every
pair of incomes from the distribution. In the same year, A.C. Pigou suggests the
ideas which will later be stated as the Transfer Principle, when H. Dalton
expressed them in rigorous terms in 1920. This principle is one of the
four well-known properties, including the so-called Dalton Population Principle.
Related to the majorization concept again, I. Schur proposed his convexity
concept in 1923. This concept is strongly connected to bi-stochastic
matrices and therefore close to the progressive transfer concept too.
In 1929, G.H. Hardy, J.E. Littlewood and G. Pólya publish their first results
about inequality in an article in the journal The Messenger of
Mathematics. It can be considered a precedent of their seminal book
Inequalities, whose first edition appeared in 1934. However, in 1932, J.
Karamata proves the theorem which has been named after him since then, although

Hardy, Littlewood and Pólya had proposed it in 1929. Its content can be
regarded as one of the cornerstones of economic inequality measurement.
In 1971, J. Gastwirth proposes the explicit expression of general
Lorenz curves, allowing the use of those based on random variables, whatever
their type may be.
Obviously, this brief review must contain a mention of the aforementioned
paper by A.B. Atkinson, in 1970, where he sets key arguments on the normative
content of inequality measures through a family of indicators named after him.
These arguments are based on the general mean function or generalized mean,
but they are not free of controversy. Again, it is necessary to make reference to
the appearance of the book On Economic Inequality, by A.K. Sen in 1973,
which was reedited in 1997, including a wide annexe with several advances
in economic inequality and poverty registered during the 25 years elapsed
between the two editions. This new annexe was written by the same author with
J.E. Foster.
Precisely, J.E. Foster published, in 1985, his renowned theorem, where he
determined the conditions an indicator has to fulfil to be compatible with the
order generated by the Lorenz curve. These conditions impose suitable
properties on inequality measures in order to perform as Lorenz curves do,
and they are conceptually different from the aforementioned
normative ones. This result constitutes the basic system of properties required
of an inequality indicator, and it may be considered a starting
point in the search for relevant properties, so-called inequality axioms, to select
an adequate indicator. Nevertheless, this way of choosing an inequality indicator
had some precedents in the literature.
Finally, in 2001, C. Dagum publishes in the Spanish journal Estudios de
Economía Aplicada a summary of several papers published before in different
journals, since 1981. In this work, the author exposes his point of view about the
economic foundations of different inequality measures in contrast to the
normative view derived through Atkinson's approachb.
Along this necessarily brief revision, we have tried to point out the
evolution that the study of economic inequality has registered, taking into account
those concepts configured as fundamental in the treatment of this subject.
Although nowadays these contents are usually presented as properties or axioms,
as explained before, we believe this paper will show the links between
these properties and the basic concepts supporting them. In the following sections,
the aforementioned concepts will be developed.

b
A more detailed description of this point of view can be seen in Dagum (1990).

3. Income distribution space and the majorization conceptc


Firstly, we are going to define the income distribution space as a support
tool for the remaining concepts. So, if the population contains N individuals, an
income distribution will be any real multidimensional vector from R^N, provided
that all its components are non-negative and that there exists something to
share among the individuals composing the population. More precisely:

DN = {(x1, x2, ..., xN) : xi ≥ 0, i = 1, ..., N; Σi=1..N xi > 0}   (1)

But attending only to the inequality in the distribution, every vector
permutation provides the same resource sharing, whatever the identity
or the place corresponding to each single individuald. To formalize this idea, let
ΠN×N be the set of N-order permutation matrices, and let us define the following
equivalence relation over DN:

x ≈ y ⟺ x = Π·y,   Π ∈ ΠN×N,   (2)

so that we shall choose the ordered income vector, from smallest to largest, as
the canonical element of each equivalence class. Thus:

DN = DN/≈ = {(x1, x2, ..., xN) : x1 ≤ x2 ≤ ... ≤ xN}   (3)

Then, the income distribution space to be considered will be:

D = ∪N=2..∞ DN

Now, we can go on to define the majorization relation between income
distributions belonging to D. Let x, y be two income distributions belonging to
DN; then we shall say that x is majorized by y, and write x ≺ y, when it is more
equally distributed. So:

c
Here, we are referring only to the economic concept of income, although the analysis can be applied
directly to other concepts related to the individual or household economic position, like earnings,
expenditures or wealth. However, there is controversy about which economic position must be
used, because of both theoretical grounds and the reliability of available data (Ruiz-Castillo, 1987;
Peña et al., 1996, among others).
d
This argument is usually known as the symmetry or anonymity axiom related to inequality
measurement (Foster, 1985).

x ≺ y ⟺ Σi=1..k xi ≥ Σi=1..k yi, k = 1, 2, ..., N−1, and Σi=1..N xi = Σi=1..N yi   (4)

It is easy to note how this relation constitutes the straight precedent of
the Lorenz curve comparison between pairs of income distributions, as we shall
present later. However, majorization turns out to be a more restrictive relationship,
because it only allows comparisons between income distributions defined over
equally sized populations and where the total amount of shared resources is
the same. It is trivial to prove that this relation defined over DN presents a
partial order or quasi-order structure.
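Definition (4) translates directly into code. The sketch below is an illustrative helper (names and tolerances are our own choices) that checks whether one income vector is majorized by another:

```python
# A direct transcription of definition (4); components are sorted first, as in
# the canonical representative (3).

def majorized(x, y, tol=1e-12):
    """True when x is majorized by y, i.e. x is more equally distributed."""
    assert len(x) == len(y)
    xs, ys = sorted(x), sorted(y)
    if abs(sum(xs) - sum(ys)) > tol:
        return False                      # totals must coincide
    cx = cy = 0.0
    for k in range(len(xs) - 1):          # partial sums, k = 1, ..., N-1
        cx += xs[k]
        cy += ys[k]
        if cx < cy - tol:                 # the poorest k must hold at least as much under x
            return False
    return True

print(majorized([2, 2, 2], [1, 2, 3]))    # True: the equal split is majorized by (1, 2, 3)
print(majorized([1, 2, 3], [2, 2, 2]))    # False
```

As the definition requires, vectors with different total income are never comparable under this relation; the Lorenz dominance criterion, presented in the next section, relaxes exactly this restriction.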

4. Economic inequality, Lorenz curves and generating functions


An early precedent of the inequality concept can be found when V. Pareto
(1897) identifies smaller inequality with a situation where personal incomes
tend to be more similar. Castagnoli and Muliere (1990) point out how this
argument constitutes an early version of the Transfer Principle, although it is not
yet formalized.
On the other hand, the majorization relationship includes the more-equally-distributed
concept, meaning that the components of one vector are more similar than the
respective ones in the other vector we are comparing with. Using this fact, it is
possible to clarify definitively the inequality concept we are trying to measure.
In that respect, the following quote from S. Kuznets is very useful to describe
the concept: When we speak about income inequality, we are simply referring to
income differences, without taking into account its desirability as a reward
system or its undesirability as a system contradicting a certain equality scheme.
(S. Kuznets, 1953, pg. xxvii). According to the previous statement, an economic
inequality measure is not supposed to judge whether the sharing is adequate or not,
but to quantify whether the actual distribution is near to or far from the equality
situation, where all members of the population receive the same income, although
this need not be a goal in itself. In such a sense, Bartels (1977) demands a
reference distribution with which an inequality measure could compare.
Although such a reference might be what society considers a fair distribution,
this approach did not find support in the literature, because of the inherent
difficulty involved in determining the reference distribution. Therefore, the
usual reference distribution to compare with turns out to be the egalitarian one,
where all its components are the same and equal to the mean income.

And still inequality assertions generate great repercussions, as A.K. Sen


points out: The idea of inequality is both very simple and very complex. At one
level it is the simplest of all ideas and has moved people with an immediate
appeal hardly matched by any other concept. At another level, however, it is an
exceedingly complex notion which makes statements on inequality highly
problematic, and it has been, therefore, the subject of much research by
philosophers, statisticians, political theorists, sociologists and economists (A.K.
Sen, 1973, pg. vii). Of course, thirty-three years later, it cannot be said that
an agreement exists among researchers about a consensus method to measure
inequality, while the instrument nearest to this situation comes to be the Lorenz
curve we are going to present now.

4.1. Lorenz curves and the Lorenz dominance criterion


The curve, proposed by Lorenz (1905), can be defined in the following way.
Let x be an income distribution from D. Using its ordered components,
cumulative relative frequencies of individuals and resource shares are
calculated, keeping in mind that they are non-negative. Also, let x̄ be the
income mean; then:

p0 = 0;  pi = i/N,  i = 1, 2, ..., N
q0 = 0;  qi = (1/(N·x̄)) Σj=1..i xj,  i = 1, 2, ..., N   (5)

Thus, the Lorenz curve, L(p), is obtained by linking the points contained in
the set {(pi, qi); i = 0, 1, ..., N}, using linear interpolation to generate a polygonal
curve. Obviously, L(p) is inscribed within the unit square. So, if L(p) is near to
the unit square's diagonal, then the income sharing will be near to the egalitarian
situation. Otherwise, the more bent the curve's bow is, the more inequality will be
present in the income distribution.
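The construction (5) and the polygonal interpolation can be sketched as follows (illustrative code; function names are assumptions):

```python
# The construction (5) as code: cumulative population shares p_i and income
# shares q_i for an ordered income vector, plus linear interpolation for L(p).

def lorenz_points(incomes):
    xs = sorted(incomes)
    n, total = len(xs), float(sum(xs))
    p = [i / n for i in range(n + 1)]          # p_0 = 0, p_i = i/N
    csum, q = 0.0, [0.0]                       # q_0 = 0
    for x in xs:
        csum += x
        q.append(csum / total)                 # q_i = (sum of the i smallest) / (N * mean)
    return p, q

def lorenz(p_val, p, q):
    """Polygonal Lorenz curve: linear interpolation between the (p_i, q_i)."""
    for i in range(1, len(p)):
        if p_val <= p[i]:
            w = (p_val - p[i - 1]) / (p[i] - p[i - 1])
            return q[i - 1] + w * (q[i] - q[i - 1])
    return 1.0

p, q = lorenz_points([1, 2, 3, 4])
print(list(zip(p, q)))   # (0,0), (0.25,0.1), (0.5,0.3), (0.75,0.6), (1,1)
```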
The previous definition is a descriptive one, but it can be easily generalized
to the case where a non-negative random variable, X, is used to model incomes.
In such a case, let μ be its expectation E(X) and let F(x) be its
cumulative distribution function. Now, definition (5) can be expressed as
(Kendall and Stuart, 1977, for example):

p = F(x) = ∫0..x dF(t)
q = L[F(x)] = (1/μ) ∫0..x t·dF(t)   (6)

In this context, Gastwirth (1971) suggests an integrated framework,
allowing us to express the Lorenz curve in an explicit general way:

L(p) = (1/μ) ∫0..p F⁻¹(t) dt   (7)

where F⁻¹(p) = inf {x : F(x) ≥ p}.


Lorenz curve properties are very well-knowne, but it is worth pointing out
that if L(p) is differentiable, its slope will be given by:

L′(p) = F⁻¹(p)/μ,   p ∈ (0, 1)   (8)

Also, the difference (with respect to the diagonal) function will be:

Δ(p) = p − L(p),   p ∈ [0, 1]   (9)

and it reaches a maximum at the point p = F(μ). Moreover, the following result
is particularly interestingf:
Theorem 1 (Iritani and Kuga, 1983): Let q = L(p) be a function defined over
the interval [0, 1]. Then, L(p) is a Lorenz curve corresponding to some
non-negative random variable X, if and only if L(p) satisfies the following
properties:
i) L(0) = 0, L(1) = 1.
ii) L(p) is convex and non-decreasing.
However, the precedent discussion about Lorenz curve suitability had as its
main objective making inequality comparisons between income distributions. To
accomplish this aim, the following relationship, called the Lorenz dominance
criterion, is going to be established.
Definition 1: Let x, y ∈ D. Then x is said to be less unequal than y in the
Lorenz sense (x ≤L y) when the Lorenz curve associated to y completely contains
the one corresponding to x. Formally:

x ≤L y ⟺ Lx(p) ≥ Ly(p),   ∀p ∈ [0, 1]   (10)
Compared to the majorization relation, the Lorenz criterion turns out to be a more
versatile relation, because of its capability of making comparisons between

e
See, for example, Casas and Núñez (1987) or Nygård and Sandström (1981), for more details.
f
Analyses of sampling results about Lorenz curves are out of the scope of this paper. However, there
are very interesting references in this field, beginning with Goldie (1977) on the strong consistency of
empirical Lorenz curves, and Beach and Davidson (1983) or Beach and Richmond (1985) on the
asymptotic normality of Lorenz curve estimates.

income vectors coming from different-sized populations. Nevertheless, in this
last case, it becomes evident that:

x ≤L y ⟺ x ≺ y,   x, y ∈ DN   (11)

Thus, what lies under the Lorenz dominance criterion is just the
majorization concept. Now, the Lorenz relationship presents a pre-order structure
(reflexive and transitive properties) whenever it is defined between classes of
proportional income distributions (or a partial order if not). Therefore, if
two Lorenz curves cross each other, the associated income distributions will not
be comparable, and this is a very frequent situation in practice, giving an
incomplete inequality ranking as a result. In applications, this structure is usually
plotted using the so-called Hasse diagrams, as can be seen in Peña et al.
(1996), for example.
Actually, the absence of a total order is the main reason to use inequality
measures, in order to overcome the lack of ranking in an appreciable number of
paired comparisons and to quantify inner inequality levels too. Inequality
indicators induce a total order because their values are real numbers, but the price
we have to pay is the inclusion of underlying weighting schemes on the
distributions, and these are not always clear enough to deduce. So, several
inequality indicators may produce different rankings on the implied
distributions.
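The dominance criterion (10), including the non-comparable case in which the curves cross, can be sketched in code (an illustrative grid-based check; the function names and the crossing example are our own):

```python
# Pairwise Lorenz dominance (10) checked on a common grid of population shares;
# returns "x", "y", "equal", or None when the curves cross (not comparable).

def lorenz_curve(incomes):
    xs = sorted(incomes)
    total = float(sum(xs))
    n = len(xs)
    def L(p):
        k = int(p * n)                    # whole individuals covered
        part = sum(xs[:k]) + (p * n - k) * (xs[k] if k < n else 0)
        return part / total
    return L

def lorenz_compare(x, y, grid=100):
    Lx, Ly = lorenz_curve(x), lorenz_curve(y)
    ge = le = True
    for i in range(1, grid):
        p = i / grid
        if Lx(p) < Ly(p) - 1e-12:
            ge = False
        if Lx(p) > Ly(p) + 1e-12:
            le = False
    if ge and le:
        return "equal"
    if ge:
        return "x"                        # x less unequal than y
    if le:
        return "y"
    return None                           # curves cross: not comparable

print(lorenz_compare([2, 2, 2], [1, 2, 3]))   # 'x'
```

Note that proportional distributions come out as "equal": Lorenz curves depend only on income shares, which is why the criterion, unlike majorization, can also compare vectors with different totals.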

4.2. Parametric estimation of Lorenz curves


Theorem 1 turns out to be a very important result because of its characterization
of Lorenz curves. Moreover, the researcher may sometimes decide to estimate a
parametric Lorenz curve directly from the data (pi, qi). In such a case, it is strictly
necessary to know which parametric functional forms could perform
like a Lorenz curve, and Theorem 1 shows what the required properties must be.
Recently, this adjusting procedure has constituted an active research field,
whose guidelines are presented below.
Some of the simplest functional forms used as a parametric Lorenz curve
are the following ones:
i) Potential: L(p) = p^b, b > 1 (e.g., Casas and Núñez, 1991)
ii) Exponential: L(p) = p·a^(p−1), a > 1 (e.g., Gupta, 1984)
iii) Potential-Exponential: L(p) = p^b·e^(−c(1−p)), b > 1, c > 0 (Kakwani and
Podder, 1973).

There exists a great deal of more complex functional forms. Furthermore, the
next assertion can be proved: if some Lorenz curves satisfy the conditions
included in Theorem 1, then every convex mixture of them will fulfil such
conditions too (Casas, Herrerías and Núñez, 1997). In other words, every
convex mixture of Lorenz curves turns out to be another Lorenz curve. So, we
have an infinite number of possible functional forms for estimating Lorenz
curves.
In addition, there is another method to generate new functional forms
capable of estimating Lorenz curves. Now, the procedure consists in
obtaining new functional Lorenz curves by applying specific transformations
to an original one.
Theorem 2 (Sarabia, Castillo and Slottje, 1999): Let L(p) be a Lorenz curve.
Then, the following transformations generate Lorenz curves toog:
a) Lα(p) = p^α·L(p),   α ≥ 1.
b) Lα(p) = p^α·L(p),   0 ≤ α ≤ 1,   L′′′(p) ≥ 0.
c) Lγ(p) = L(p)^γ,   γ ≥ 1.
Moreover, another approach in this field consists of directly imposing
parametric cumulative distribution functions on the income. In this respect,
examples of such models used in practice are the Wakeby distribution
(Houghton, 1978), the generalized Tukey lambda (Ramberg et al., 1979) or the
McDonald distribution (Sarabia, Castillo and Slottje, 2002).
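Theorem 2 can be illustrated numerically: the sketch below (our own grid-based test, using Gupta's curve L(p) = p·a^(p−1) as the base) checks the Theorem 1 conditions (endpoints, monotonicity and convexity) for the transformed curves.

```python
# Grid test of the Theorem 1 conditions for a candidate Lorenz curve, applied
# to the transformations of Theorem 2.

def is_lorenz(L, n=1000, tol=1e-9):
    ps = [i / n for i in range(n + 1)]
    vals = [L(p) for p in ps]
    if abs(vals[0]) > tol or abs(vals[-1] - 1.0) > tol:
        return False                               # endpoints L(0)=0, L(1)=1
    ok_monotone = all(v2 >= v1 - tol for v1, v2 in zip(vals, vals[1:]))
    diffs = [v2 - v1 for v1, v2 in zip(vals, vals[1:])]
    ok_convex = all(d2 >= d1 - tol for d1, d2 in zip(diffs, diffs[1:]))
    return ok_monotone and ok_convex

base = lambda p, a=3.0: p * a ** (p - 1.0)         # Gupta (1984)
print(is_lorenz(base))                             # True
print(is_lorenz(lambda p: p ** 2.5 * base(p)))     # (a): p^alpha * L(p), alpha >= 1
print(is_lorenz(lambda p: base(p) ** 1.7))         # (c): L(p)^gamma, gamma >= 1
print(is_lorenz(lambda p: p ** 0.5))               # concave, so not a Lorenz curve
```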

4.3. Generating functions and Lorenz curve functional forms


In this section, we start with the definition of the density generating
function, go on with its generalization related to Lorenz curves and, finally,
explore the relationship between both types of generating functions.
So, the (density) generating function associated to every continuous random
variable with f(·) as its probability density function may be defined as (Callejón,
1995):

gF(x) = f′(x)/f(x)   (12)

This generating function allows us to obtain ordered families of Lorenz
curves. Such ordered families depend on only one parameter and, therefore, give
us a total order structure on paired comparisons using the Lorenz dominance

g
Regarding estimation methods in such a matter, see e.g. Castillo, Hadi and Sarabia (1998).

criterion, precisely because of the single parameter. The simplest case corresponds to strongly unimodal distributions or, in other words, those whose probability density function is log-concave. That is:

d/dx [f'(x)/f(x)] = g'_F(x) < 0,  ∀x ∈ R     (13)

If the support of the random variable X is an interval (a,+∞), the Lorenz curves derived from this definition will be (Arnold et al., 1987):

L_τ(p) = Ψ(Ψ⁻¹(p) − τ),  τ > 0     (14)

Some examples of this kind of random variables are the log-normal and Pareto distributions. As these examples suggest, the main drawback of such distributions is their rigidity as real income models.
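For instance, the log-normal income model fits the family (14) with Ψ taken to be the standard normal cdf Φ: a log-normal with scale parameter σ has the well-known Lorenz curve L(p) = Φ(Φ⁻¹(p) − σ). The following minimal numerical sketch (standard-library Python; all function names are illustrative, not from the text) checks this closed form against the defining integral L(p) = (1/μ)·∫₀^p F⁻¹(t)dt:

```python
import math

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(p, lo=-10.0, hi=10.0):
    """Inverse of Phi by bisection (ample accuracy for this check)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lorenz_closed(p, sigma):
    """Closed-form Lorenz curve of a log-normal(0, sigma) income model."""
    return Phi(Phi_inv(p) - sigma)

def lorenz_numeric(p, sigma, steps=3000):
    """L(p) = (1/mu) * integral_0^p F^{-1}(t) dt, by the midpoint rule,
    with F^{-1}(t) = exp(sigma * Phi_inv(t)) and mu = exp(sigma^2 / 2)."""
    mu = math.exp(0.5 * sigma ** 2)
    h = p / steps
    total = 0.0
    for k in range(steps):
        total += math.exp(sigma * Phi_inv((k + 0.5) * h)) * h
    return total / mu

sigma = 0.8
for p in (0.25, 0.5, 0.75):
    assert abs(lorenz_closed(p, sigma) - lorenz_numeric(p, sigma)) < 1e-3
```

The single parameter σ orders these curves: larger σ pushes L(p) down uniformly, which is the one-parameter Lorenz dominance ordering mentioned above.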
On the other hand, in the same way as before, the Lorenz curve generating function can be defined mutatis mutandis, assuming now that L(.) is the Lorenz curve of a continuous random variable:

g_L(p) = L'(p)/L(p)  ⇒  L(p) = k·exp(∫ g_L(p)·dp) = k·e^{G(p)},     (15)
but, in this case, the function so obtained might not be a Lorenz curve, because of the additional properties a Lorenz curve must satisfy. So, we need to establish what restrictions must be imposed on that definition. To answer this question, the following result can easily be proved:
Proposition 1 (Herrerias, Palacios and Callejon, 2001): Under the above circumstances, L(p) is a Lorenz curve if and only if the following conditions are satisfied:
a) k = exp(−G(1))
b) lim_{p→0+} G(p) = −∞
c) g_L(p) > 0,  ∀p ∈ (0,1]
d) (g_L(p))² + g'_L(p) > 0,  ∀p ∈ (0,1]
So, if a function g_L(p) fulfils the above conditions, then it will yield a Lorenz curve through the associated generating function. Using several generating functions, Garcia and Herrerias (2001) have obtained a number of well-known

h Although more complex, another method to generate ordered families of Lorenz curves can be seen in Sarabia, Castillo and Slottje (1999).
200 J.J. Nunez-Velazquez

functional forms corresponding to the Lorenz curves associated with some probabilistic income models.
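As a sketch of Proposition 1 in action, consider the hypothetical generating function g_L(p) = a/p with a > 1 (an illustrative choice, not one from the text). Its primitive is G(p) = a·log p, so condition a) gives k = 1 and the induced curve is L(p) = p^a, a valid Lorenz curve (it arises, e.g., from a power-function income model). The conditions can be checked numerically:

```python
import math

# Hypothetical generating function g_L(p) = a/p, a > 1; primitive G(p) = a*log(p).
a = 2.5
g_L = lambda p: a / p
G = lambda p: a * math.log(p)

k = math.exp(-G(1.0))                 # condition a): k = exp(-G(1)) = 1
L = lambda p: k * math.exp(G(p))      # the induced curve, here L(p) = p**a

assert G(1e-12) < -50                 # condition b): G(p) -> -inf as p -> 0+
for i in range(1, 1001):
    p = i / 1000.0
    assert g_L(p) > 0                                   # condition c)
    g_prime = (g_L(p + 1e-6) - g_L(p)) / 1e-6           # numerical g_L'(p)
    assert g_L(p) ** 2 + g_prime > 0                    # condition d)

# The resulting curve behaves as a Lorenz curve: L(1) = 1 and L(p) <= p.
assert abs(L(1.0) - 1.0) < 1e-12
assert all(L(i / 100.0) <= i / 100.0 for i in range(1, 101))
```

Here condition d) holds analytically as well, since (a/p)² − a/p² = a(a−1)/p² > 0 whenever a > 1.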
However, the relationship between the Lorenz curve and the probability density function is so close that we would expect a readily explicit relationship between both generating functions, but this is not the case. The aforementioned relationship turns out to be hard to establish. If the implied functions are sufficiently differentiable, then we can only prove the following system of equations, using the same notation as above:

g_F(x) = E(X)·f'(x)·L''[F(x)]
g_L[F(x)] = x / (E(X)·L[F(x)])     (16)
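Both equations in (16) follow from the quantile identity L'(F(x)) = x/E(X), which also gives E(X)·f(x)·L''(F(x)) = 1. A quick numerical check with a unit-mean exponential income model (an illustrative choice), whose Lorenz curve is L(p) = p + (1−p)·log(1−p), confirms the system:

```python
import math

# Unit-mean exponential income model: mu = E(X) = 1.
mu = 1.0
f   = lambda x: math.exp(-x)                        # density f(x)
fp  = lambda x: -math.exp(-x)                       # f'(x)
F   = lambda x: 1.0 - math.exp(-x)                  # cdf F(x)
L   = lambda p: p + (1.0 - p) * math.log(1.0 - p)   # Lorenz curve L(p)
Lp  = lambda p: -math.log(1.0 - p)                  # L'(p)
Lpp = lambda p: 1.0 / (1.0 - p)                     # L''(p)

for x in (0.2, 1.0, 2.5):
    p = F(x)
    # First equation of (16): g_F(x) = f'(x)/f(x) = E(X) * f'(x) * L''(F(x))
    g_F = fp(x) / f(x)
    assert abs(g_F - mu * fp(x) * Lpp(p)) < 1e-12
    # Second equation of (16): g_L(F(x)) = L'(F(x))/L(F(x)) = x / (E(X) * L(F(x)))
    g_L = Lp(p) / L(p)
    assert abs(g_L - x / (mu * L(p))) < 1e-12
```

For this model g_F(x) = −1 identically, reflecting the log-linearity of the exponential density.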

5. Majorization and the Pigou-Dalton transfer principle


Despite the informal precedent that appeared in Pareto (1897), it was H. Dalton (1920, pg. 351) who stated the Transfer Principle, following the guidelines set out by Pigou (1912, pg. 24): if there are only two income receptors, and a transfer from the richer to the poorer is produced, inequality must decrease. He further imposes, as an obvious restriction, that the transferred amount must not change the relative positions of the two individuals involved, and he concludes that the most equalizing transfer is half of the income difference between them.
In a general version, the Pigou-Dalton Transfer Principle can be stated as follows: if an income distribution y is obtained from x by a progressive (regressive) transfer, or a non-empty sequence of them, then inequality decreases (increases). We can now state the progressive transfer concept from a more rigorous point of view.
Definition 2: Let x, y ∈ D_N. Then y is said to be obtained from x through a progressive income transfer if:

x = (x₁,...,xᵢ,...,xⱼ,...,x_N)'  ⇒  y = (x₁,...,xᵢ+δ,...,xⱼ−δ,...,x_N)',  δ ∈ (0, (xⱼ−xᵢ)/2]

In such a case, x is said to be obtained from y through a regressive transfer.
Next, the link between this concept and the majorization relationship is analyzed. A pioneering result published in 1903 serves this purpose; it can be stated, in the terminology presented here, as follows:

Theorem 3 (Muirhead, 1903): Let x, y ∈ D_N. Then x is majorized by y (x ≺ y) if and only if x can be obtained from y through a finite number of progressive transfers.
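Theorem 3 can be illustrated numerically: a single progressive transfer produces a vector majorized by the original. The sketch below (helper names are illustrative) checks majorization through the standard equivalent characterization via ordered partial sums:

```python
def majorized(x, y):
    """True if x is majorized by y (x ≺ y): equal totals, and the partial sums
    of the increasingly ordered x are >= those of y at every rank."""
    xs, ys = sorted(x), sorted(y)
    if abs(sum(xs) - sum(ys)) > 1e-9:
        return False
    cx = cy = 0.0
    for a, b in zip(xs, ys):
        cx += a
        cy += b
        if cx < cy - 1e-9:
            return False
    return True

def progressive_transfer(y, i, j, delta):
    """Move delta from the richer y[j] to the poorer y[i]
    (assumes y[i] <= y[j] and 0 < delta <= (y[j] - y[i]) / 2)."""
    x = list(y)
    x[i] += delta
    x[j] -= delta
    return x

y = [10.0, 20.0, 40.0, 90.0]
x = progressive_transfer(y, 0, 3, 25.0)   # delta <= (90 - 10) / 2
assert majorized(x, y) and not majorized(y, x)
```

Chaining such transfers stays inside the majorization order, which is exactly the content of Muirhead's theorem.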
Therefore, we must conclude that the Pigou-Dalton Transfer Principle captures the essence of the majorization relationship defined over pairs of income distributions, and hence also of the dominance criterion in the sense of Lorenz and, furthermore, of economic inequality. However, in spite of the importance of the previous assertion, we must admit that this formulation is scarcely operative. So, the following objective will be to achieve an effective characterization of both concepts.
To fulfil this aim, we appeal to the set of bi-stochastic matrices, whose definition is presented below.
Definition 3: A matrix P_{N×N} is said to be bi-stochastic or doubly stochastic if it satisfies the following properties:
i) 0 ≤ p_{ij} ≤ 1,  ∀i,j = 1,2,...,N
ii) Σ_{j=1}^{N} p_{ij} = 1,  ∀i = 1,2,...,N     (17)
iii) Σ_{i=1}^{N} p_{ij} = 1,  ∀j = 1,2,...,N
Thus, doubly stochastic matrices are finite matrices with a probability distribution defined over each row and column. The set of all these matrices is closely related to the permutation matrices, in the way expressed by the following result.
Theorem 4 (Birkhoff, 1976): The set of (N×N) bi-stochastic matrices is the convex hull of the set of (N×N) permutation matrices.
Furthermore, it is easy to see that applying a doubly stochastic matrix to an income distribution produces an equalizing effect. It suffices to let P be a bi-stochastic matrix and x, y ∈ D_N with x = P·y; then each component of the vector x is a convex mixture of the components of y, and thus we have a progressive transfer. In other words:

x_i = Σ_{j≠i} y_j·p_{ij} + y_i·(1 − Σ_{j≠i} p_{ij}) = y_i + Σ_{j≠i} (y_j − y_i)·p_{ij},  i = 1,2,...,N     (18)

i In that sense, Arnold (2005), quoting from Schur (1923), refers to them by defining x as an averaging of y.

The preceding explanation is enough to make the following result plausible, notwithstanding the great advance it represented in this field.
Theorem 5 (Hardy, Littlewood and Polya, 1959):
(x ≺ y)  ⇔  x = P·y,  ∀x, y ∈ D_N,
where P is some (N×N) doubly stochastic matrix.
Taking into account the results stated above, progressive income transfers have been characterized through operations involving income vectors and bi-stochastic matrices. So, progressive transfers have been reduced to algebraic operations, making their treatment easier.
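The interplay of Theorems 4 and 5 can be sketched numerically: a convex mixture of random permutation matrices is doubly stochastic (Birkhoff), and x = P·y is then majorized by y, so every top-k cumulative income shrinks or stays equal. Function names below are illustrative:

```python
import random

def random_bistochastic(n, rounds=200):
    """Convex mixture of random permutation matrices (Birkhoff's theorem)."""
    weights = [random.random() for _ in range(rounds)]
    s = sum(weights)
    P = [[0.0] * n for _ in range(n)]
    for w in weights:
        perm = random.sample(range(n), n)   # a random permutation of 0..n-1
        for i, j in enumerate(perm):
            P[i][j] += w / s
    return P

def top_shares(v):
    """Cumulative sums of the components sorted in decreasing order."""
    out, c = [], 0.0
    for a in sorted(v, reverse=True):
        c += a
        out.append(c)
    return out

random.seed(1)
y = [5.0, 15.0, 30.0, 70.0]
P = random_bistochastic(4)
x = [sum(P[i][j] * y[j] for j in range(4)) for i in range(4)]
# x = P.y is majorized by y: every top-k cumulative sum is no larger.
for cx, cy in zip(top_shares(x), top_shares(y)):
    assert cx <= cy + 1e-9
```

The total income is preserved (row and column sums of P are 1), so the averaging is purely redistributive, which is exactly the equalizing effect described above.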

6. Inequality measures and Schur convexity


The main objective of this section is to obtain functions compatible with the majorization relation, in order to construct inequality measures over income distributions. In doing so, the links with the Transfer Principle and Lorenz dominance must be taken into account. These functions are defined below.
Definition 4: A real function φ(.) defined over D_N is called Schur convex or S-convex when it is monotone with respect to the majorization relationship. Formally:

(x ≺ y)  ⇒  φ(x) ≤ φ(y)     (19)

If the inequality is strict, the function is called strictly S-convex.
A useful characterization of such functions, which makes them easier to handle, is contained in the next result.
Theorem 6 (Schur and Ostrowski, 1952): Let I be a real interval and φ(.) a continuously differentiable function defined over I^N. Then φ(.) will be S-convex if and only if it satisfies the following conditions:
i) φ(.) is symmetric over I^N.
ii) Schur condition:

(x_i − x_j)·(∂φ(x)/∂x_i − ∂φ(x)/∂x_j) ≥ 0,  ∀i ≠ j,  ∀x ∈ D_N ∩ I^N     (20)

Furthermore, it can be proved that every convex and symmetric function is S-convex too (Marshall and Olkin, 1979, pg. 67).
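As a sketch of Theorem 6, the Schur condition (20) can be spot-checked numerically for the symmetric convex (hence S-convex) function φ(x) = Σᵢ(xᵢ − x̄)², i.e. N times the variance, using finite-difference partial derivatives (helper names are illustrative):

```python
def phi(x):
    """Symmetric, convex, hence S-convex: sum of squared deviations from the mean."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x)

def partial(f, x, i, h=1e-6):
    """Central finite-difference approximation of the i-th partial derivative."""
    xp = list(x); xp[i] += h
    xm = list(x); xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)

x = [3.0, 8.0, 8.0, 21.0]   # an increasingly ordered income vector
n = len(x)
for i in range(n):
    for j in range(n):
        if i != j:
            s = (x[i] - x[j]) * (partial(phi, x, i) - partial(phi, x, j))
            assert s >= -1e-6   # Schur condition (20), up to rounding
```

Here the condition can also be verified analytically: ∂φ/∂xᵢ = 2(xᵢ − x̄), so the product in (20) equals 2(xᵢ − xⱼ)² ≥ 0.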
From now on, it becomes evident that inequality measures should be S-convex functions defined over income distributions, keeping in mind all the equivalences stated before. For example, the Gini index (Gini, 1912) is a strictly S-convex function. However, the usual construction of inequality measures is based on the next statement, which connects all the implications related to inequality and majorization set out above.
Theorem 7 (Karamata, 1932): Let g(.) be a convex, continuous and real function; then:

(x ≺ y)  ⇔  Σ_{i=1}^{N} g(x_i) ≤ Σ_{i=1}^{N} g(y_i),  ∀x, y ∈ D_N

Further, if g(.) is a convex real function, then h(x) = Σ_{i=1}^{N} g(x_i) is said to be a convex separable function, provided that x ∈ D_N.
It is easy to see that every convex separable function is S-convex too. Nevertheless, the converse is not true, as can readily be checked from Theorem 7. It is important to observe how Theorem 7 relates majorization to the construction of economic inequality measures. Moreover, the following property links this to Lorenz dominance.
Corollary 1 (Arnold, 1987): Let g(.) be a convex, continuous and real function; then:

(X ≤_L Y)  ⇒  E[g(X/E(X))] ≤ E[g(Y/E(Y))]

As a result, it is worth mentioning that each convex, continuous and real function can generate a genuine inequality indicator, because it will be compatible with the Lorenz dominance criterion, by Corollary 1. So, the partial order deduced from the Lorenz dominance criterion is still present, connecting to the intersection quasi-order (Sen, 1973, pg. 72), which constitutes another, rather less restrictive, partial order.
Evidently, choosing a single inequality measure yields a total order as a result, but Lorenz compatibility hides the causes of the different orderings that may arise when several inequality indicators are used. The reasons for this fact lie in the distinct weighting schemes placed on the income distribution, which are associated with each inequality measure. So, a research field has emerged that considers batteries of inequality indicators instead of choosing only one of
j Marshall and Olkin (1979) contains an extensive exposition of S-convex functions, including the result covered in Theorem 6.
k Obviously, this will be true only if all of the considered inequality indicators are compatible with the Lorenz relation. Otherwise, there is no inclusion relationship linking both partial orders.

them, in order to extract the common information included in such a set using
Principal Component Analysis or to eliminate the redundant inequality
information through Ivanovic-Pena DP2 distance (Garcia et al., 2002). This new
approach can be modified to allow dynamic inequality evaluations too
(Dominguez and Nunez, 2005).
On the other hand, Corollary 1 allows comparisons between income distributions from populations of different sizes. However, this is possible only when homogeneous functions are used as inequality measures, so that proportional income vectors give the same value. This formal fact is equivalent to imposing the so-called Dalton Population Principle, proposed by that author under the name Individuals Proportional Addition Principle (Dalton, 1920, pg. 357): inequality is invariant under population replicas. In formal terms, this restriction imposes that inequality measures have to be functions defined over the empirical cumulative distribution function.
Finally, we can summarize a great part of the preceding discussion by reproducing the next statement, which also describes the relative sensitivity to income transfers, depending on the chosen inequality measure.
Theorem 8 (Atkinson, 1970; Kakwani, 1980): If V(.) is a strictly convex real function, then every inequality measure defined by I(x) = E[V(x)] will satisfy the Pigou-Dalton Transfer Principle, whatever the income level. Furthermore, if V(.) is also differentiable, then its relative sensitivity to income transfers will be proportional to:

T(x) = V'(x) − V'(x − δ),  δ > 0.

7. Axiomatic approach to economic inequality


This approach consists of stating desirable properties that a good inequality measure should fulfil, in order for it to be chosen among all possible ones. The name axiom must therefore be understood in this context, not in the mathematical sense of an unchanging truth. In this way, we can impose more and more restrictive properties so as to choose from a smaller set of alternatives. The best option would be the formulation of a group of properties capable of characterizing a single inequality measure, since this would allow us to select it whenever we agree with its properties. Nevertheless, this is a difficult goal to achieve.

l An r-order population replica is an income vector which repeats each component of the original income distribution r times, giving (x₁,...,x₁, x₂,...,x₂, ..., x_N,...,x_N)' as a result, with each component repeated r times.

In this section, we do not intend to offer an exhaustive exposition of properties, but only to present the most commonly accepted ones. Indeed, we will include some controversial properties in order to clearly establish the links between this axiomatic approximation and the analytical treatment presented before.
In the preceding sections, we have shown how the Pigou-Dalton Transfer Principle plays a fundamental role in inequality measurement. We are not going to repeat its statement again, though it has to be understood as included in the basic set of properties. Consequently, we present below the aforementioned basic properties or axioms, where I(.) stands for a real function defined over D as a basic formulation of an inequality measure.
1. Symmetry or Anonymity Axiom
Let x = (x₁,x₂,...,x_N)' ∈ D and y = (x_{σ(1)},x_{σ(2)},...,x_{σ(N)})', where σ(.) is a permutation of the set {1,2,...,N}. Then I(x) = I(y).
2. Homothetic or Scale Invariance Axiom
I(λ·x) = I(x),  ∀x ∈ D,  ∀λ > 0
3. Dalton Population Principle
Let x, y ∈ D, in such a way that y is an r-order replica of x (in other words, y = (x',x',...,x')', r times), with its components increasingly ordered. Then I(x) = I(y).
4. Normalization Axiom
Its weak version states that I(x) ≥ 0, ∀x ∈ D. Furthermore:
I(x) = 0  ⇔  ∃c > 0 : x = (c,c,...,c)'
There exists a strong version of the axiom, the so-called Range Normalization Axiom, where the inequality measure must equal 1 in the case of maximum inequality.
5. Constant Addition Axiom
Let x, y ∈ D, so that x = (x₁,x₂,...,x_N)', y = (x₁+c, x₂+c,..., x_N+c)'. Then:
I(y) ≤ I(x),  ∀c > 0.
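These axioms can be spot-checked numerically for a concrete measure. The coefficient of variation (an illustrative choice; it is Lorenz-compatible, as noted later) satisfies all five:

```python
import math

def cv(x):
    """Coefficient of variation: standard deviation over the mean."""
    n = len(x)
    m = sum(x) / n
    var = sum((xi - m) ** 2 for xi in x) / n
    return math.sqrt(var) / m

x = [12.0, 20.0, 33.0, 55.0]
perm = [33.0, 55.0, 12.0, 20.0]
assert abs(cv(x) - cv(perm)) < 1e-12                     # 1. symmetry
assert abs(cv([7.5 * xi for xi in x]) - cv(x)) < 1e-12   # 2. scale invariance
replica = [xi for xi in x for _ in range(3)]             # 3-fold population replica
assert abs(cv(replica) - cv(x)) < 1e-12                  # 3. population principle
assert cv([10.0, 10.0, 10.0]) == 0.0                     # 4. normalization (weak)
assert cv([xi + 100.0 for xi in x]) < cv(x)              # 5. constant addition
```

The constant-addition check reflects the intuition behind axiom 5: adding the same amount to everybody leaves the absolute dispersion unchanged while the mean grows, so relative inequality falls.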

m A wide exposition of the axioms proposed in the economic literature can be found in Nygard and Sandstrom (1981) or Ruiz-Castillo (1986), for example.
n The axioms presented in this section are considered the basic ones related to Lorenz dominance. Among the omitted ones, we must mention the additive decomposability axiom (Bourguignon, 1979), which stands out because of its repercussion and controversy. Moreover, this axiom allows us to characterize a family of inequality measures.
o This axiom appears in Dalton (1920, pg. 357).

On the other hand, to connect this approximation with the more analytical one developed before, we introduce the next definition.
Definition 5: A real function I(.) defined over D is said to be a Lorenz-compatible inequality measure when it is monotone with respect to the Lorenz dominance criterion. More formally:

x ≥_L y  ⇔  L_x(p) ≤ L_y(p), ∀p ∈ [0,1]  ⇒  I(x) ≥ I(y)

To characterize this kind of inequality measures, we need to formalize the restrictions that were included in the analytical framework related to Lorenz dominance, and this necessity leads us to the first three axioms presented above. The next result summarizes this reasoning.
Theorem 9 (Foster, 1985): Let I(.) be a real function defined over D. Then, I(.)
is a Lorenz-compatible inequality measure if and only if it satisfies the following
axioms:
i) Symmetry.
ii) Scale Invariance.
iii) Dalton Population Principle.
iv) Pigou-Dalton Transfer Principle.
As can readily be seen, Theorem 9 is a reformulation of the whole preceding analytical exposition in axiomatic terms. However, as might be expected, there exist many Lorenz-compatible inequality measures; among them are the coefficient of variation, the Gini index, and Atkinson's and Theil's families of measures, to name a few.
This axiomatic approach has allowed us to express desirable properties in order to narrow the set of alternative inequality measures to choose from. Among them all, the Pigou-Dalton Transfer Principle plays a crucial role in inequality measurement related to the Lorenz dominance criterion and the majorization relationship, whereas the rest of the axioms presented have a more instrumental character.
In addition, the restrictions these axioms impose on inequality measures may clarify some details about the underlying weighting scheme included in each indicator. It is generally accepted that inequality measures should weight more heavily the incomes near the bottom of the distribution, while the

p It has been suggested that absolute measures be used instead of relative ones. This approach implies the suppression of the Scale Invariance Axiom (Moyes, 1987). Nevertheless, this kind of measures is closer to the so-called generalized Lorenz dominance relation (Shorrocks, 1983).

limit case would be configured using only the poorest income (Rawls, 1972). Therefore, this research field intends to restrict the Pigou-Dalton Transfer Principle by placing more weight on transfers involving the smaller incomes. Some related results are Shorrocks and Foster (1987) or Fleurbaey and Michel (2001), among others.
In the same line of thought, another related research field aims at using weighting schemes on the Lorenz curve directly. It is a well-known fact that the Gini index equals twice the Lorenz area (e.g. Wold, 1935 or Kakwani, 1980). Following this idea, some authors have proposed inequality measures based on geometrical elements of the Lorenz curve, like the maximum distance to the egalitarian line (Pietra, 1914-15, 1948; Schutz, 1951), its length (Kakwani, 1980), or weighted Lorenz areas using specific functions (Mehran, 1976; Casas and Nunez, 1991, among others).

8. Alternative comparison criteria


Both the majorization and Lorenz dominance relationships generate a partial order structure over D_N and D, respectively. Obviously, this fact constitutes an important drawback, because it is well known that Lorenz curve crossings occur frequently in practice (Shorrocks, 1983) and, therefore, the number of non-comparable pairs of income distributions may be relatively large. So, other alternative comparison criteria have been proposed in search of a smaller number of non-comparable situations, admitting that inequality is, in essence, a quasi-order (Sen, 1973) and that the only way to achieve a total order is by using inequality measures, as was shown previously. Indeed, it can be argued that a partial order is inherent to the problem of defining order relations in vector spaces.
In this section, we will present some of the most studied proposals in this research field. To begin with, Shorrocks (1983) proposed the use of generalized Lorenz curves, claiming that they significantly reduce the number of non-comparable pairs of income vectors with respect to the Lorenz dominance criterion. To this end, he defined his generalized curves by re-scaling the Lorenz curve in the following way:

LG_x(p) = x̄·L_x(p),  p ∈ [0,1],  x ∈ D     (21)

where L_x(p) stands for the Lorenz curve of the income vector x and x̄ for its mean income. The properties of these curves are easy to establish as direct consequences of those of Lorenz curves. Consequently, the dominance relationship can be established:

q It refers to the area located between the Lorenz curve and the diagonal of the unit square.

x ≤_LG y  ⇔  LG_x(p) ≤ LG_y(p),  ∀p ∈ [0,1]     (22)
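A minimal sketch of (21) with discrete Lorenz ordinates (helper names are illustrative): a vector and its doubled copy are Lorenz-equivalent, yet the richer one dominates in the generalized sense, illustrating that the re-scaling by the mean brings income levels, not just shares, into the comparison:

```python
def lorenz_ordinates(x):
    """Discrete Lorenz ordinates L(k/N) for an income vector x."""
    xs = sorted(x)
    total = sum(xs)
    out, c = [], 0.0
    for xi in xs:
        c += xi
        out.append(c / total)
    return out

def gen_lorenz_ordinates(x):
    """Generalized Lorenz ordinates, as in (21): mean times Lorenz ordinates."""
    mean = sum(x) / len(x)
    return [mean * L for L in lorenz_ordinates(x)]

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]   # y = 2x: same relative inequality, twice the income
assert lorenz_ordinates(x) == lorenz_ordinates(y)      # Lorenz-equivalent
# ...but y dominates x in the generalized Lorenz sense:
for gx, gy in zip(gen_lorenz_ordinates(x), gen_lorenz_ordinates(y)):
    assert gx <= gy
```

This is precisely why generalized curves are read as welfare rather than inequality comparisons, as discussed next.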

However, the scale change induced by multiplying by the mean income implies that generalized curves do not measure inequality; rather, they embody postulates related to social welfare valuation from a strictly monetary point of view. For this reason they are sometimes called income-welfare curves (Pena et al., 1996).
Another interesting proposal has been the rank dominance criterion (Nygard and Sandstrom, 1981), whose definition is presented next, assuming a pair of income vectors x, y ∈ D_N:

x ≤_R y  ⇔  x_i ≤ y_i,  ∀i = 1,2,...,N     (23)

closely related to majorization, as can be observed. Again, this relation induces a partial order structure over D_N, as we may expect. Moreover, this relationship is related to generalized Lorenz dominance, as can easily be proved:

x ≤_R y  ⇒  x ≤_LG y     (24)

Lately, a great deal of research effort has been devoted to the application of well-known stochastic dominance criteria to provide alternative tools in the study of economic inequality and other related concepts, such as poverty, welfare and so on. Stochastic dominance consists of several relationships defined on pairs of random variables through their cumulative distribution functions. To define them, let X be a non-negative random variable representing a society's income and let F(.) be its cumulative distribution function; then the successive-order cumulative distribution functions can be defined through the following expressions:

F₁(z) = F(z) = P(X ≤ z),  ∀z ≥ 0
F_j(z) = ∫₀^z F_{j−1}(t)·dt,  ∀z ≥ 0,  ∀j = 2,3,...     (25)

Furthermore, we can define the j-order stochastic dominance criterion as follows, where X and Y stand for two non-negative income random variables and F(.), G(.) are their respective cumulative distribution functions:

r Relations between Lorenz dominance and welfare have been studied in Bishop, Formby and Smith (1991) and subsequent papers.
s A sufficient condition for this dominance criterion is given in Ramos, Ollero and Sordo (2000).
t More details may be seen in Muliere and Scarsini (1989) or Bishop, Formby and Sakano (1995), among others.

x ≤_Dj y  ⇔  F_j(z) ≥ G_j(z),  ∀z ≥ 0     (26)
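Definitions (25) and (26) can be sketched numerically from empirical cdfs. Below, y is a mean-preserving spread of x (an illustrative pair), so the first-order cdfs cross while the second-order functions are uniformly ordered, F₂(z) ≤ G₂(z) for all z, i.e. y ≤_D2 x in the notation of (26):

```python
def ecdf(sample):
    """Empirical cumulative distribution function of a sample."""
    s = sorted(sample)
    n = len(s)
    return lambda z: sum(1 for v in s if v <= z) / n

def next_order(Fj, grid):
    """F_{j+1}(z) = integral_0^z F_j(t) dt, via the trapezoidal rule on a grid."""
    out, acc = [0.0], 0.0
    prev_z, prev_f = grid[0], Fj[0]
    for z, f in zip(grid[1:], Fj[1:]):
        acc += 0.5 * (prev_f + f) * (z - prev_z)
        out.append(acc)
        prev_z, prev_f = z, f
    return out

x = [4.0, 5.0, 5.0, 6.0]                   # concentrated around 5
y = [1.0, 5.0, 5.0, 9.0]                   # mean-preserving spread of x
grid = [i * 0.05 for i in range(0, 201)]   # z in [0, 10]
Fx, Fy = ecdf(x), ecdf(y)
F1 = [Fx(z) for z in grid]
G1 = [Fy(z) for z in grid]
F2 = next_order(F1, grid)
G2 = next_order(G1, grid)
# No first-order dominance: the cdfs cross...
assert any(f < g for f, g in zip(F1, G1)) and any(f > g for f, g in zip(F1, G1))
# ...but second-order dominance holds: G2 >= F2 everywhere, so y <=_D2 x.
assert all(f <= g + 1e-9 for f, g in zip(F2, G2))
```

The pattern matches the discussion that follows: as the order j grows, fewer pairs remain non-comparable.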

The first- and second-order stochastic dominance criteria are strongly connected to the rank and generalized Lorenz dominance relationships, respectively (Bishop, Formby and Smith, 1991). Again, all these criteria generate partial order structures, though there are progressively fewer non-comparable cases as the dominance order increases. At the same time, each order maintains the structure induced at lower orders: if X dominates Y at the first order, for example, then X dominates Y at every order. In recent years, research interest has focused on third-degree stochastic dominance, to analyze its normative implications and what the decision would be if Lorenz curves crossed each other (Shorrocks and Foster, 1987; Davies and Hoy, 1994, 1995, among others).
It might be possible to define a total order structure by adopting the Rawls postulate (Rawls, 1972), thus focusing the comparison only on the poorest income. The Rawls comparison criterion would then be defined as follows, where x, y ∈ D:

x ≤_Rawls y  ⇔  min_i {x_i} ≤ min_i {y_i}     (27)

But in this case we lose the sense of measuring inequality, and what this criterion compares may be located nearer to poverty analysis.
In addition, there exist other, more sophisticated proposals, like successive-order Lorenz dominance criteria, but there is reasonable doubt about whether they effectively measure inequality (Nygard and Sandstrom, 1981; Ramos and Sordo, 2001).
On the other hand, absolute Lorenz curves have recently been proposed as an alternative (Moyes, 1987). These curves are constructed using income differentials instead of the classical relative ones, and so a new research field has emerged in which neither the Pigou-Dalton Transfer Principle nor the Scale Invariance Principle has to be included in the essential framework. Ramos and Sordo (2003) proved its relation to second-order absolute Lorenz ordering. Nevertheless, both this approach and generalized Lorenz curves are subject to the same conceptual controversy.

9. Inequality measures as an average of individual inequalities


In looking for Lorenz-compatible inequality measures, Theorem 7 and Corollary 1 allow us to consider economic inequality measures as averages of individual income valuations. To understand this assertion, think of individual inequality as the amount each person contributes to the global result. If the sharing of resources were egalitarian (every individual receiving the mean income), then each contribution to inequality would be null; but when some individuals get more or less income than the mean, they contribute to raising inequality.
Therefore, the aim of this interpretation is to find out how such an individual contribution to inequality should be measured. This individual contribution must be coherent with the inequality concepts above, so we might expect, as a result, at least a reduction of the set of optional indicators to choose from.
In the first subsection below, a precedent family of inequality indicators related to this approach is presented, whereas a new proposal about what an inequality indicator must fulfil is presented in the second.

9.1. Generalized mean deviation family


Castagnoli and Muliere (1991) consider inequality measures belonging to the following family:

C(x) = Σ_{i=1}^{N} γ_i·|x_i − A|,  x ∈ D_N;  γ_i > 0.     (28)

So {γ_i, i = 1,2,...,N} stands for the weights used to average the individual inequality contributions valued by the function |x_i − A|, which expresses the difference between each income and a reference point A. Some particular cases are described below:
described below:
• The mean deviation about the mean is obtained when A = μ and γ_i = 1/N, i = 1,2,...,N. With the same weights but A = Me, the mean deviation with respect to the median appears, where Me stands for the median income.
• The Pietra index is included too, using A = μ and γ_i = 1/(2Nμ), i = 1,2,...,N.
• The Gini index, because it can be obtained using the following alternative expression (Berrebi and Silber, 1987):

I_G(x) = (1/(N²μ))·Σ_{i=1}^{N} |N − 2i + 1|·|x_{N−i+1} − Me|,  x ∈ D_N     (29)

In addition, C(x) is an S-convex function if and only if the weights γ_i are non-increasing for x_i < A and non-decreasing for x_i > A. In particular, this is true when all the weights are equal and positive.
A more general formulation admits the use of monotone non-decreasing functions g(.), so that:

C(x) = g⁻¹(Σ_{i=1}^{N} γ_i·g(|x_i − A|)),  x ∈ D_N;  γ_i > 0,     (30)

where g⁻¹(t) = inf{x: g(x) ≥ t}; this includes the first formulation when g(.) is the identity function.
This family values the individual inequality contribution through income differences with respect to a reference point, usually the mean or median income. Only the normalizing constants included in the weight specification allow habitual relative indicators, like the Pietra and Gini ones, to be obtained. Therefore, this family can be regarded as one of generalized mean deviations.
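A short sketch of the family (28) (helper names are illustrative): choosing the weights and reference point as above recovers the mean deviation, the Pietra index and, through (29), the Gini index, which is cross-checked here against the mean-difference definition of Gini:

```python
import statistics

def C(x, gammas, A):
    """The family (28): a weighted average of |x_i - A|."""
    return sum(g * abs(xi - A) for g, xi in zip(gammas, x))

x = sorted([3.0, 5.0, 8.0, 24.0])
N = len(x)
mu = sum(x) / N
Me = statistics.median(x)

mean_dev = C(x, [1.0 / N] * N, mu)              # mean deviation about the mean
pietra   = C(x, [1.0 / (2 * N * mu)] * N, mu)   # Pietra index
assert abs(mean_dev - sum(abs(xi - mu) for xi in x) / N) < 1e-12

# Berrebi-Silber weights reproduce the Gini index, as in (29); note the
# reversal x_{N-i+1} paired with the weights |N - 2i + 1| / (N^2 * mu).
gini = C(list(reversed(x)),
         [abs(N - 2 * i + 1) / (N ** 2 * mu) for i in range(1, N + 1)],
         Me)
# Cross-check against the mean-difference definition of Gini.
gini_md = sum(abs(a - b) for a in x for b in x) / (2 * N ** 2 * mu)
assert abs(gini - gini_md) < 1e-12
```

For this sample the two Gini computations agree (both give 0.4125), which supports the reading of (29) as a member of the family (28) with median reference point.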

9.2. Individual inequality average indicators


Corollary 1 shows a way of designing inequality measures through the characterization it contains. Separable convex functions defined on relative incomes are therefore suitable tools for obtaining "genuine" inequality indicators, by taking expectations over them. This justifies indicators defined as averages of individual inequality contributions, provided they make sense. Let us express these ideas through the definition below.
Definition 6: Let X be a non-negative random variable modelling income and let μ = E(X) be its expectation. Then an indicator I(.) is said to be an individual inequality average if it has the form:

I(X) = E[g(X/μ)]

and the following conditions are fulfilled:
i) g(.) is a convex, continuous and real function.
ii) g(.) is non-negative.
iii) g(.) is non-increasing when x < μ.
iv) g(.) is non-decreasing when x > μ.
These conditions ensure that I(X) will be a genuine inequality indicator, because they impose such behaviour on the valuation of the individual contributions. Indeed, the first condition implies that g(.) yields a convex separable function; the second is necessary because an individual contribution to inequality must not be negative, keeping in mind that incomes cannot diminish inequality, only accumulate it or not. The last two conditions impose the genuine behaviour of individual contributions: they must increase as income moves farther away from the mean, in either direction (an egalitarian distribution represents absence of inequality, and hence null individual contributions).
The Gini index is one of the most renowned measures not included in this family, because it is a strictly S-convex function but not a convex separable one. Of course, this fact does not mean the Gini index is a bad inequality index.
Nevertheless, the generalized mean deviation family is included within the individual inequality average indices, when A = μ and {γ_i, i = 1,2,...,N} constitutes a probability distribution.
Next, some of the most commonly proposed inequality measures are analyzed with regard to their membership of this new family.
Proposition 2: Both the Pietra index and the squared coefficient of variation are individual inequality average indicators.
Proof
a) The Pietra index can be expressed as:

I(X) = (1/(2μ))·E[|X − μ|] = E[(1/2)·|X/μ − 1|]

and so g(x) = |x − 1|/2. Obviously, g(x) is a non-negative, continuous and real function. In addition, g'(x) = −1/2 if x < 1, and g'(x) = 1/2 when x > 1. Furthermore, g''(x) = 0 assures its convexity.
b) The squared coefficient of variation is:

CV²(X) = (1/μ²)·E[(X − μ)²] = E[(X/μ − 1)²]

so g(x) = (x − 1)², which is a non-negative, continuous and real function. Finally, g'(x) = 2·(x − 1) satisfies conditions iii) and iv), and g''(x) = 2 shows g(x) to be a convex function.
Proposition 3: Neither the Theil order 1 nor the order 0 indicator is an individual inequality average indicator.
Proof
a) The Theil order 1 index can be expressed as:

T₁(X) = E[(X/μ)·log(X/μ)]

Hence g(x) = x·log x, and its first derivatives are g'(x) = 1 + log x, g''(x) = 1/x. So this function is convex, continuous and real, but g(x) > 0 ⇔ x > 1, and it fails ii) because g(.) is negative when x < 1. This fact implies that T₁(X) admits negative contributions to inequality when incomes are smaller than the mean. Also, condition iii) is not fulfilled.
b) The Theil order 0 indicator is defined by:

T₀(X) = E[log(μ/X)] = E[log(1/(X/μ))]

Hence g(x) = log(1/x), g'(x) = −1/x, g''(x) = 1/x², and so it satisfies i). But g(x) > 0 ⇔ x < 1, and it fails to satisfy ii). Then T₀(X) allows negative contributions when incomes are greater than the mean. In fact, condition iv) is not fulfilled.
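Propositions 2 and 3 can be spot-checked by evaluating the four contribution functions g(.) on a grid of relative incomes x/μ: the Pietra and CV² kernels are non-negative everywhere, while each Theil kernel takes negative values on one side of the mean:

```python
import math

# Individual-contribution functions g(.) behind some classical indices.
g_pietra = lambda x: abs(x - 1.0) / 2.0        # Pietra index
g_cv2    = lambda x: (x - 1.0) ** 2            # squared coefficient of variation
g_theil1 = lambda x: x * math.log(x)           # Theil order 1
g_theil0 = lambda x: math.log(1.0 / x)         # Theil order 0

grid = [0.05 * k for k in range(1, 61)]        # relative incomes x/mu in (0, 3]
assert all(g_pietra(x) >= 0 for x in grid)     # condition ii) holds
assert all(g_cv2(x) >= 0 for x in grid)        # condition ii) holds
# Theil order 1 goes negative below the mean (x < 1)...
assert any(g_theil1(x) < 0 for x in grid if x < 1)
# ...and Theil order 0 goes negative above the mean (x > 1).
assert any(g_theil0(x) < 0 for x in grid if x > 1)
```

All four kernels vanish at x = 1, the egalitarian point, so the failure is not in the normalization but in the sign of the contributions away from the mean.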
Hence, Theil's indicators seem ill-conditioned to measure inequality, as Proposition 3 has shown, taking into account the reasons they fail. So, the proposed family can be used to check whether convex separable inequality indicators are really measuring what they are supposed to, despite their Lorenz compatibility. Furthermore, this last result enlightens us about some well-known inequality measures whose performance may not be adequate. Perhaps what this family shows is that Lorenz curves do analyze inequality, but other things may be included too. However, this is a task which might require more investigation in the future.

10. Conclusions
In order to provide an adequate comprehension of economic inequality measures, the underlying statistical theory has been presented throughout this paper. In doing so, we must conclude that economic inequality measures are firmly connected to the majorization and Lorenz dominance relationships between pairs of income distributions. This conclusion has important consequences for the selection of inequality indicators compatible with those relationships.
To reach such conclusions, a historical revision has been carried out to recover statistical terms related to economic inequality, seldom used nowadays, as well as their relations with current trends and proposals. Even so, Lorenz curves have withstood new theoretical approaches since their appearance more than a century ago, and they can be

*Dagum (1990, 2001) had warned about the ill-conditioned performance of Theil's indicators, but he did it only in a social welfare framework.
214 J.J.Nunez-Velazquez

considered as a cornerstone of inequality analysis, despite the great research efforts registered in this field.
Moreover, Pigou-Dalton Transfer Principle can be considered another
cornerstone of inequality measures design, as it has been clearly shown in the
paper. In fact, this property turns out to be the essential underlying feature in majorization relations through the use of doubly stochastic matrices. This has led researchers to the study of restrictions placed on this
Principle, in order to investigate possible inequality indicator characterizations
and to analyze concrete inequality measure performances, including their
weighting schemes over income distributions.
Connection to Lorenz dominance criterion has been explained through
Schur-convex, or S-convex, functions so that they are present in the construction
of the most adequate kinds of indicators to measure inequality. As a matter of
fact, convex separable functions and Karamata's Theorem are the key results in
this line of work.
Basic inequality axioms are determined by relaxations carried out over the
original majorization concept, which is defined only over pairs of equally-sized
income vectors. So, Dalton Population Principle makes possible comparisons
between different-sized income vectors, and Scale Invariance Axiom permits the
same but using income vectors where the total amount of shared resources could
be distinct. Foster's Theorem fixes what axioms are necessary to obtain Lorenz
compatible inequality indicators. Additional restrictions like additive
decomposition axioms are useful in characterizing inequality indicators families
or in achieving interesting performances, but those properties are rarely linked to
genuine inequality concepts.
Until now, other relations derived from global income vector comparisons
have not allowed us to reach a total order structure over the income distributions
space, and thus Lorenz dominance criterion continues as the most accepted
background to measure economic inequality. This affirmation embodies Sen's
statement about the partial order nature of economic inequality when income
vector comparisons have to be considered and it suggests the use of batteries of
economic indicators as a valid alternative.
Lorenz curve generating functions have been revealed as an interesting approach to generating functional forms for the direct estimation of Lorenz curves. Nevertheless, relations to density generating functions are hard to achieve, possibly because of the inherent difficulty in establishing direct relations between Lorenz curves and cumulative distribution functions, except in the simplest cases. In addition, ordered families of Lorenz curves allow us to obtain a total

order over income distributions, but it might be due to the assumption of scarcely realistic models of income.
Finally, obtaining a consensus inequality measure implies the necessity of
more restrictions to be imposed together with the basic inequality axioms, but
those new properties should not lose connection with the essential inequality
concepts. Nevertheless, a set of properties has been stated in order to evaluate
individual inequality contributions when convex separable functions are used as
inequality indicators. Surprisingly, both Theil's order 0 and 1 fail to fulfill them,
although they are Lorenz-compatible indicators. In the end, this fact might point towards the idea that Lorenz curves do measure inequality, but that other things may be involved too.

Acknowledgments
The author gratefully acknowledges partial financial support from the University of Alcala (grant UAH-PI2004/034) and from the Junta de Comunidades de Castilla-La Mancha together with the Fondo Social Europeo (Project PBI-05-004).

References
1. B.C. Arnold. (1987). Majorization and the Lorenz Order: A Brief
Introduction. Lecture Notes in Statistics. New York: Springer Verlag.
2. B.C. Arnold. (2005). The Lorenz curve: Evergreen after 100 years. Int.
Conference in Memory of C. Gini and M.O. Lorenz. Siena.
[http://www.unisi.it/eventi/GiniLorenz05].
3. B.C. Arnold, C.A. Robertson, P.L. Brockett and B.Y. Shu. (1987).
Generating ordered families of Lorenz curves by strongly unimodal
distributions. Journal of Business and Economic Statistics, 5(2), 305-308.
4. A.B. Atkinson. (1970). On the measurement of inequality. Journal of
Economic Theory, 2, 244-263.
5. C.P.A. Bartels. (1977). Economic Aspects of Regional Welfare. Martinus Nijhoff Social Sciences Division.
6. C.M. Beach and R. Davidson. (1983). Distribution-free statistical inference
with Lorenz curves and income shares. Review of Economic Studies, L,
723-735.
7. C.M. Beach and J. Richmond. (1985). Joint confidence intervals for income
shares and Lorenz curves. International Economic Review, 26(2), 439-450.
8. Z.M. Berrebi and J. Silber. (1987). Dispersion, asymmetry and the Gini
index of inequality. International Economic Review, 28(2), 331-338.
9. G. Birkhoff. (1946). Tres observaciones sobre el algebra lineal. Univ. Nacional de Tucuman Rev., Serie A, 5, 147-151.

10. J.A. Bishop, J.P. Formby and R. Sakano. (1995). Lorenz and stochastic-
dominance comparisons of European income distributions. Research on
Economic Inequality, 6, 77-92.
11. J.A. Bishop, J.P. Formby and W.J. Smith. (1991). Lorenz dominance and
welfare: Changes in the U.S. distribution of income, 1967-1986. Review of
Economics and Statistics, 73, 134-139.
12. F. Bourguignon. (1979). Decomposable income inequality measures.
Econometrica, 47, 901-920.
13. J. Callejon. Un nuevo metodo para generar distribuciones de probabilidad.
Problemas asociados y aplicaciones. Ph. D. dissertation. University of
Granada.
14. J.M. Casas, R. Herrerias and J.J. Nunez. (1997). Familias de Formas
Funcionales para estimar la Curva de Lorenz. Actas de la IV Reunión Anual de ASEPELT-España. Servicio de Estudios de Cajamurcia, 171-176.
Reprinted in Aplicaciones estadisticas y economicas de los sistemas de
funciones indicadoras (R. Herrerias, F. Palacios and J. Callejon, eds.). Univ.
Granada, 119-125(2001).
15. J.M. Casas and J.J. Nunez. (1987). Algunas Consideraciones sobre las Medidas de Concentración. Aplicaciones. Actas de las II Jornadas sobre Modelización Económica, 49-62. Barcelona. Reprinted in Aplicaciones
estadisticas y economicas de los sistemas de funciones indicadoras (R.
Herrerias, F. Palacios and J. Callejon, eds.). Univ. Granada, 111-118
(2001).
16. J.M. Casas and J.J. Nunez. (1991). Sobre la Medición de la Desigualdad y Conceptos Afines. Actas de la V Reunión Anual de ASEPELT-España,
Caja de Canarias, 2, 77-84. Reprinted in Aplicaciones estadisticas y
economicas de los sistemas de funciones indicadoras (R. Herrerias, F.
Palacios and J. Callejon, eds.). Univ. Granada, 127-133 (2001).
17. E. Castagnoli and P. Muliere. (1990). A note on inequality measures and the
Pigou-Dalton Principle of Transfers. Income and Wealth Distribution,
Inequality and Poverty. (C. Dagum and M. Zenga, eds.) Springer Verlag,
171-127.
18. E. Castillo, A.S. Hadi and J.M. Sarabia. (1998). A method for estimating
Lorenz curves. Communications in Statistics, Theory and Methods, 27,
2037-2063.
19. F.A. Cowell. (1995). Measuring Inequality. 2nd ed. LSE Handbooks in Economics. Prentice Hall/Harvester Wheatsheaf.
20. C. Dagum. (1990). Relationship between income inequality measures and
social welfare functions. Journal of Econometrics, 43(1-2), 91-102.
21. C. Dagum. (2001). Desigualdad del redito y bienestar social,
descomposicion, distancia direccional y distancia metrica entre
distribuciones. Estudios de Economia Aplicada, 17, 5-52.
22. H. Dalton. (1920). The measurement of the inequality of incomes.
Economic Journal, 30,348-361.

23. J. Davies and M. Hoy. (1994). The normative significance of using third-
degree stochastic dominance in comparing income distributions. Journal of
Economic Theory, 64, 520-530.
24. J. Davies and M. Hoy. (1995). Making inequality comparisons when Lorenz
curves intersect. American Economic Review, 85(4), 980-986.
25. J. Dominguez and J.J. Nunez. (2005). The evolution of economic inequality
in the EU countries during the nineties. First Meeting of the Society for the
Study of Economic Inequality (ECINEQ). Palma de Mallorca. Available at
[http://www.ecineq.org]
26. M. Fleurbaey and P. Michel. (2001). Transfer Principles and inequality
aversion, with an application to optimal growth. Mathematical Social
Sciences, 42, 1-11.
27. J.E. Foster. (1985). Inequality measurement. Published in Fair Allocation
(H.P. Young, ed.), Proceedings of Symposia in Applied Mathematics, 33,
Providence, American Mathematical Society, 31-68.
28. C. Garcia, J.J. Nunez, L.F. Rivera and A.I. Zamora. (2002). Analisis
comparativo de la desigualdad a partir de una bateria de indicadores. El
caso de las Comunidades Autonomas espafiolas en el periodo 1973-1991.
Estudios de Economia Aplicada, 20(1), 137-154.
29. R.M. Garcia and J.M. Herrerias. (2001). Inclusion de curvas de Lorenz en
las funciones generadoras. Aplicaciones estadisticas y economicas de los
sistemas de funciones indicadoras (R. Herrerias, F. Palacios and J. Callejon,
eds.). Univ. Granada, 185-191.
30. J.L. Gastwirth. (1971). A general definition of the Lorenz curve.
Econometrica, 39, 1037-1039.
31. C. Gini. (1912). Variabilità e Mutabilità: Contributo allo studio delle distribuzioni e relazioni statistiche. Studi Economico-Giuridici dell'Università di Cagliari, 3, 1-158.
32. C. Gini. (1921). Measurement of inequality of incomes. The Economic
Journal, 31, 124-126.
33. C.M. Goldie. (1977). Convergence Theorems for empirical Lorenz curves
and their inverses. Advances in Applied Probability, 9, 765-791.
34. M.R. Gupta. (1984). Functional form for estimating the Lorenz curve.
Econometrica, 52(5), 1313-1314.
35. G.H. Hardy, J.E. Littlewood and G. Polya. (1929). Some simple inequalities
satisfied by convex functions. The Messenger of Mathematics, 26, 145-153.
36. G.H. Hardy, J.E. Littlewood and G. Polya. (1952). Inequalities. 2nd ed.
Cambridge University Press.
37. R. Herrerias, F. Palacios and J. Callejon. (2001). Las curvas de Lorenz y el
sistema de Pearson. Published in Aplicaciones estadisticas y economicas de
los sistemas de funciones indicadoras (R. Herrerias, F. Palacios and J.
Callejon, eds.). Univ. Granada, 135-151.
38. J.C. Houghton. (1978). Birth of a parent: The Wakeby distribution for
modelling flood flows. Water Resources Research, 14, 1105-1109.

39. J. Iritani and K. Kuga. (1983). Duality between the Lorenz curves and the
income distribution functions. Economic Studies Quarterly, 23, 9-21.
40. N.C. Kakwani. (1980). Income Inequality and Poverty. Methods of
Estimation and Policy Applications. Oxford University Press.
41. N.C. Kakwani and N. Podder. (1973). On the estimation of Lorenz curves
from grouped observations. International Economic Review, 14(2), 278-
291.
42. J. Karamata. (1932). Sur une inegalite relative aux fonctions convexes.
Publ. Math. Univ. Belgrade, 1, 145-148.
43. M. Kendall and A. Stuart. (1977). The Advanced Theory of Statistics, 1, 4th ed. C. Griffin. London.
44. S. Kuznets. (1953). Share of upper income groups in income and savings.
National Bureau of Economic Research. New York.
45. M.O. Lorenz. (1905). Methods of measuring the concentration of wealth.
Journal of the American Statistical Association, 9,209-219.
46. A.W. Marshall and I. Olkin. (1979). Inequalities: Theory of Majorization and its Applications. New York: Academic Press.
47. F. Mehran. (1976). Linear measures of income inequality. Econometrica,
44, 805-809.
48. P. Moyes. (1987). A new concept of Lorenz domination. Economics
Letters, 23, 203-207.
49. R.F. Muirhead. (1903). Some methods applicable to identities and
inequalities of symmetric algebraic functions of n letters. Proceedings of
Edinburgh Mathematical Society, 21, 144-157.
50. P. Muliere and M. Scarsini. (1989). A note on stochastic dominance and
inequality measures. Journal of Economic Theory, 49, 314-323.
51. F. Nygard and A. Sandstrom. (1981). Measuring Income Inequality.
Stockholm: Amqvist and Wiksell International.
52. A.M. Ostrowski. (1952). Sur quelques applications des fonctions convexes
et concaves au sens de I. Schur. Journal of Math. Pures Appl., 9, 253-292.
53. V. Pareto. (1897). Cours d'Economie Politique. Rouge. Lausanne.
54. J.B. Pena (Dir.), F.J. Callealta, J.M. Casas, A. Merediz and J.J. Nunez.
(1996). Distribucion Personal de la Renta en Espana. Piramide. Madrid.
55. G. Pietra. (1914-15). Delle relazioni tra gli indici di variabilità. Note I in Atti del R. Istituto Veneto di Scienze, Lettere ed Arti, LXXIV (II), 775-
804.
56. G. Pietra. (1948). Studi di statistica metodologica. Giuffre. Milan.
57. A.C. Pigou. (1912). Wealth and welfare. McMillan. New York.
58. J.S. Ramberg, E.J. Dudewicz, P.R. Tadikamalla and E.F. Mykytka. (1979).
A probability distribution and its uses in fitting data. Technometrics, 21,
201-214.
59. H.M. Ramos, J. Ollero and M.A. Sordo. (2000). A sufficient condition for generalized Lorenz order. Journal of Economic Theory, 90, 286-292.

60. H.M. Ramos and M.A. Sordo. (2001). El orden de Lorenz generalizado de orden j, ¿un orden en desigualdad? Estudios de Economía Aplicada, 19, 139-149.
61. H.M. Ramos and M.A. Sordo. (2003). Dispersion measures and dispersive
orderings. Statistics and Probability Letters, 61, 123-131.
62. J. Rawls. (1972). A Theory of Justice. London: Oxford University Press.
63. J. Ruiz-Castillo. (1986). Problemas conceptuales en la medición de la desigualdad. Hacienda Pública Española, 101, 17-31.
64. J. Ruiz-Castillo. (1987). La medición de la pobreza y de la desigualdad en España, 1980-81. Estudios Económicos, 42. Servicio de Estudios del Banco de España. Madrid.
65. J.M. Sarabia, E. Castillo and D. Slottje. (1999). An ordered family of
Lorenz curves. Journal of Econometrics, 91,43-60.
66. J.M. Sarabia, E. Castillo and D. Slottje. (2002). Lorenz ordering between
McDonald's generalized functions of the income size distribution.
Economic Letters. 75, 265-270.
67. I. Schur. (1923). Über eine Klasse von Mittelbildungen mit Anwendungen auf die Determinantentheorie. Sitzungsberichte der Berliner Mathematischen Gesellschaft, 22, 9-20.
68. R.R. Schutz. (1951). On the measurement of income inequality. American
Economic Review, 41, 107-122.
69. A.K. Sen. (1973). On Economic Inequality. Oxford: Clarendon Press.
70. A.K. Sen and J.E. Foster. (1997). On Economic Inequality. Expanded
edition. Clarendon Press Paperbacks and Oxford University Press.
71. A. Shorrocks. (1983). Ranking income distributions. Economica, 50, 3-18.
72. A. Shorrocks and J.E. Foster. (1987). Transfer sensitive inequality
measures. Review of Economic Studies, 54, 485-497.
73. H. Wold. (1935). A study of the mean difference, concentration curves and
concentration ratio. Metron, 12, 39-58.
74. I. Zubiri. (1985). Una introducción al problema de la medición de la desigualdad. Hacienda Pública Española, 95, 291-317.
Chapter 12
EXTENDED WARING BIVARIATE DISTRIBUTION

J. RODRIGUEZ-AVI
Department of Statistics and Operations Research, University of Jain
Campus Las Lagunillas, B3, Jaen, 23071, Spain

A. CONDE-SANCHEZ
Department of Statistics and Operations Research, University of Jaen
Campus Las Lagunillas, B3, Jaen, 23071, Spain

A.J. SAEZ-CASTILLO
Department of Statistics and Operations Research, University of Jaen
Campus Las Lagunillas, B3, Jaen, 23071, Spain

M.J. OLMO-JIMENEZ
Department of Statistics and Operations Research, University of Jaen
Campus Las Lagunillas, B3, Jaen, 23071, Spain

The aim of this paper is to obtain a bivariate distribution that extends the Bivariate
generalized Waring distribution (BGWD) and that preserves some of its properties, such
as the partition of the variance into three distinguishable components due to randomness,
proneness and liability. Finally, an example in the context of accident theory is included
in order to illustrate the versatility of this new distribution.

1. Introduction
Accident theory has become the object of numerous studies that have tried to develop several hypotheses in order to interpret the causes of an accident. Among them, the idea of accident proneness has stimulated much interesting statistical theory. One important contribution in this direction is the "proneness-liability" model proposed by Irwin [1] and Xekalaki [5], giving rise to a three-parameter discrete distribution, the univariate generalized Waring distribution (UGWD), with probability generating function (p.g.f.) given by the Gauss hypergeometric function:

222 J. Rodriguez-Avi et al.

G(t) = [(ρ)_k / (a+ρ)_k] 2F1(a, k; a+k+ρ; t),   (1)

where a, k, ρ > 0.
This model assumes that all non-random factors may be split into internal
and external factors. So, the term "accident proneness" refers to a person's
predisposition to accidents and the term "accident liability" refers to a person's
exposure to external risk of accident. Then, the UGWD arises from a Poisson
distribution where the parameter A is the "liability" that follows a Gamma
distribution and the parameter p is the "proneness" that follows a Beta
distribution, that is:
( l-p^
(2)
Poisson(A) A Gamma\ a, A Beta I(p,k).
A y p j p " ' '
This way of obtaining the distribution as a mixture allows the variability to
be split into three additive components due to proneness, liability and
randomness:
Var(X) = ak/(ρ−1) + ak(k+1)/[(ρ−1)(ρ−2)] + a²k(ρ+k−1)/[(ρ−1)²(ρ−2)],   (3)

where the three terms are due, respectively, to randomness, liability and proneness.
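This decomposition can be checked numerically. The sketch below (Python, with illustrative values a = 2, k = 3, ρ = 5 chosen so that ρ > 2; these values are not taken from the text) builds the UGWD probabilities from the term ratio of the series behind the p.g.f. in Eq. (1) and compares the variance with the sum of the three components:

```python
def poch(z, n):
    """Pochhammer symbol (z)_n = z (z+1) ... (z+n-1)."""
    out = 1.0
    for i in range(n):
        out *= z + i
    return out

def ugwd_pmf(a, k, rho, nmax):
    """UGWD(a, k; rho) probabilities p_0 .. p_nmax, built from the term ratio
    p_{x+1}/p_x = (a+x)(k+x) / ((a+k+rho+x)(x+1)) of the 2F1 series."""
    p = [poch(rho, k) / poch(a + rho, k)]   # normalising constant (k integer here)
    for x in range(nmax):
        p.append(p[-1] * (a + x) * (k + x) / ((a + k + rho + x) * (x + 1)))
    return p

a, k, rho = 2, 3, 5                          # rho > 2 so the variance exists
p = ugwd_pmf(a, k, rho, 4000)
mean = sum(x * px for x, px in enumerate(p))
var = sum(x * x * px for x, px in enumerate(p)) - mean ** 2

randomness = a * k / (rho - 1)
liability = a * k * (k + 1) / ((rho - 1) * (rho - 2))
proneness = a ** 2 * k * (rho + k - 1) / ((rho - 1) ** 2 * (rho - 2))

assert abs(sum(p) - 1) < 1e-6
assert abs(mean - randomness) < 1e-6         # E(X) = ak/(rho-1)
assert abs(var - (randomness + liability + proneness)) < 1e-4
```

Note that the mean coincides with the randomness term, as expected for a mixed Poisson model.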

However, there is a problem arising from the fact that the UGWD is
symmetrical in the parameters a and k and, hence, distinguishable estimates for
non-random components cannot be obtained.
Moreover, it is observed that the UGWD belongs to the family of Gaussian
hypergeometric distributions, GHD (Kemp and Kemp [2]). Thus, Rodriguez
et al. [4] have considered an extension of this distribution, introducing a
parameter λ, 0 < λ ≤ 1, in such a way that the p.g.f. is given by:

G(t) = 2F1(α, β; γ; λt) / 2F1(α, β; γ; λ),   α, β, γ > 0, 0 < λ ≤ 1.   (4)

This distribution, denoted by GHD1(α, β, γ; λ), may also be obtained as a
mixture of a Poisson distribution with a Gamma and a generalized Beta distribution, so that the property of partition of the variance is preserved, and data that cannot be adequately fitted by the UGWD are successfully modeled by the proposed distribution. However, the two non-random variance components cannot be separately estimated either.
Xekalaki [6] proposed a solution of this problem dividing the whole period
of observation into two non-overlapping sub-periods and then studying the
resulting bivariate accident distribution. Following a similar process to the
Extended Waring Bivariate Distribution 223

univariate case, this distribution, which she called the bivariate generalized Waring distribution (BGWD), has p.g.f. generated by the F1 Appell hypergeometric function:

G(t1, t2) = [(ρ)_{k+m} / (a+ρ)_{k+m}] F1(a; k, m; a+k+m+ρ; t1, t2),   (5)

where

F1(α; β1, β2; γ; t1, t2) = Σ_{x=0..∞} Σ_{y=0..∞} (α)_{x+y} (β1)_x (β2)_y t1^x t2^y / [(γ)_{x+y} x! y!],   (6)

with a, k, m, ρ > 0.
Then, the accident distribution in the whole period is also a UGWD, as in
each one of the sub-periods considered. Moreover, in this situation it is possible
to distinguish the non-random components in the partition of the variance. In
Kocherlakota and Kocherlakota [3] some of the most interesting properties of
the UGWD are listed.
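Since the Appell function F1 drives all that follows, it may help to verify its basic reduction numerically: on the diagonal t1 = t2 = x, the double series collapses to a Gauss function, F1(a; b1, b2; c; x, x) = 2F1(a, b1+b2; c; x). The following sketch (Python, truncated series, illustrative parameter values) checks this identity:

```python
import math

def poch(z, n):
    """Pochhammer symbol (z)_n."""
    out = 1.0
    for i in range(n):
        out *= z + i
    return out

def appell_f1(a, b1, b2, c, x, y, terms=40):
    """Truncated double series for Appell's F1; valid for |x|, |y| < 1."""
    s = 0.0
    for i in range(terms):
        for j in range(terms):
            s += (poch(a, i + j) * poch(b1, i) * poch(b2, j)
                  / (poch(c, i + j) * math.factorial(i) * math.factorial(j))
                  * x ** i * y ** j)
    return s

def gauss_2f1(a, b, c, z, terms=200):
    """Truncated Gauss hypergeometric series 2F1(a, b; c; z), |z| < 1."""
    s, t = 0.0, 1.0
    for n in range(terms):
        s += t
        t *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return s

# On the diagonal the Appell series collapses to a Gauss series.
a, b1, b2, c, x = 1.2, 0.7, 1.5, 6.0, 0.4
assert abs(appell_f1(a, b1, b2, c, x, x) - gauss_2f1(a, b1 + b2, c, x)) < 1e-8
```

This is the reduction that later identifies the distribution of the total number of accidents as a univariate GHD1.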
Our aim is to obtain a bivariate distribution that extends the BGWD
introducing a parameter λ, but without losing its excellent properties, in order to
be used in fields such as accident theory. Thus, distinguishable estimates for the
two non-random variance components are obtained and, moreover, fits achieved
by the BGWD are improved.

2. Extension of the BGWD in accident theory


We will generalize the result obtained by Xekalaki [6] that presents the
bivariate Waring distribution as a mixture of a double Poisson distribution with
two independent Gamma distributions and a Beta distribution.
We consider that the number of accidents that a person incurs in two
consecutive sub-periods is determined by a proneness (internal risk), constant
throughout the entire period of observation, and by a liability (external risk) that
varies from one period to the other. This hypothesis seems to be reasonable, at
least for a limited period of time, as Xekalaki points out. In this situation, let (X, Y, Λ1, Λ2, P) be a random vector where Λ1|P and Λ2|P represent liability in each period and P proneness, so that:
• (X,Y) | Λ1=l1, Λ2=l2, P=p has a double Poisson distribution with probability mass function (p.m.f.)

f_{(X,Y)|Λ1=l1,Λ2=l2,P=p}(x,y) = (e^{−l1} l1^x / x!) (e^{−l2} l2^y / y!).   (7)

This means that the number of accidents in each period has a Poisson
distribution, both independent.
• Liability parameters have two independent Gamma distributions:

Λ1 | P=p → Gamma(β1, ν)
Λ2 | P=p → Gamma(β2, ν),   (8)

with ν = λ(1−p)/[1−λ(1−p)], β1, β2 > 0 and density function

f(l) = l^{β−1} e^{−l/ν} / [Γ(β) ν^β],   l > 0.   (9)

• P has a generalized Beta distribution with density

f_P(p) = [1 / F1(α; β1, β2; γ; λ, λ)] [Γ(γ) / (Γ(α) Γ(γ−α))] p^{γ−α−1} (1−p)^{α−1} / (1 − λ(1−p))^{β1+β2},   (10)

where γ > α, 0 < λ ≤ 1 and 0 < p < 1.
Therefore:

1. (X,Y) | P=p has a double negative binomial distribution with p.m.f.:

f_{(X,Y)|P=p}(x,y) = [(β1)_x (β2)_y / (x! y!)] (1 − λ(1−p))^{β1+β2} (λ(1−p))^{x+y}.   (11)

2. (X,Y) is an extended bivariate Waring distribution (from now on EBWD) with p.m.f.:

f_{(X,Y)}(x,y) = f0 (α)_{x+y} (β1)_x (β2)_y λ^{x+y} / [(γ)_{x+y} x! y!],   (12)

where the constant of normalization, f0, is

f0 = F1(α; β1, β2; γ; λ, λ)^{−1} = 2F1(α, β1+β2; γ; λ)^{−1}.   (13)
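Before the proofs, the p.m.f. in Eq. (12) and the normalization constant of Eq. (13) can be checked numerically. The sketch below (Python, illustrative parameter values α = 1.5, β1 = 2, β2 = 1, γ = 8, λ = 0.5, not taken from the text) verifies that the probabilities sum to one:

```python
import math

def poch(z, n):
    """Pochhammer symbol (z)_n."""
    out = 1.0
    for i in range(n):
        out *= z + i
    return out

def gauss_2f1(a, b, c, z, terms=300):
    """Truncated Gauss series 2F1(a, b; c; z), |z| < 1."""
    s, t = 0.0, 1.0
    for n in range(terms):
        s += t
        t *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return s

alpha, b1, b2, gamma, lam = 1.5, 2.0, 1.0, 8.0, 0.5
f0 = 1.0 / gauss_2f1(alpha, b1 + b2, gamma, lam)   # Eq. (13)

def ebwd_pmf(x, y):
    """Joint probability following the p.m.f. of Eq. (12)."""
    return (f0 * poch(alpha, x + y) * poch(b1, x) * poch(b2, y) * lam ** (x + y)
            / (poch(gamma, x + y) * math.factorial(x) * math.factorial(y)))

total = sum(ebwd_pmf(x, y) for x in range(40) for y in range(40))
assert abs(total - 1.0) < 1e-8
```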


Below, we are going to prove these statements:

1. Integrating in l1 and l2:

f_{(X,Y)|P=p}(x,y) = ∫0^∞ ∫0^∞ (e^{−l1} l1^x / x!) (e^{−l2} l2^y / y!) [l1^{β1−1} e^{−l1/ν} / (Γ(β1) ν^{β1})] [l2^{β2−1} e^{−l2/ν} / (Γ(β2) ν^{β2})] dl1 dl2

= [1 / (x! y! Γ(β1) Γ(β2) ν^{β1+β2})] ∫0^∞ e^{−l1(1+1/ν)} l1^{x+β1−1} dl1 ∫0^∞ e^{−l2(1+1/ν)} l2^{y+β2−1} dl2

= [Γ(x+β1) Γ(y+β2) / (x! y! Γ(β1) Γ(β2) ν^{β1+β2})] (ν/(ν+1))^{x+β1} (ν/(ν+1))^{y+β2}

= [(β1)_x (β2)_y / (x! y!)] (1/(ν+1))^{β1+β2} (ν/(ν+1))^{x+y}

= [(β1)_x (β2)_y / (x! y!)] (1 − λ(1−p))^{β1+β2} (λ(1−p))^{x+y},   (14)

since ν/(ν+1) = λ(1−p) and 1/(ν+1) = 1 − λ(1−p).
2. Firstly, we note that since

F1(α; β1, β2; γ; λ, λ) = [Γ(γ) / (Γ(α) Γ(γ−α))] ∫0^1 p^{γ−α−1} (1−p)^{α−1} / (1 − λ(1−p))^{β1+β2} dp,   (15)

the function in Eq. (10) is a density one. Then,

f_{(X,Y)}(x,y) = ∫0^1 f_{(X,Y)|P=p}(x,y) f_P(p) dp

= [(β1)_x (β2)_y λ^{x+y} / (x! y!)] [1 / F1(α; β1, β2; γ; λ, λ)] [Γ(γ) / (Γ(α) Γ(γ−α))] ∫0^1 p^{γ−α−1} (1−p)^{x+y+α−1} dp

= [(β1)_x (β2)_y λ^{x+y} / (x! y!)] [1 / F1(α; β1, β2; γ; λ, λ)] [Γ(γ) / (Γ(α) Γ(γ−α))] [Γ(γ−α) Γ(x+y+α) / Γ(x+y+γ)]

= [1 / F1(α; β1, β2; γ; λ, λ)] (α)_{x+y} (β1)_x (β2)_y λ^{x+y} / [(γ)_{x+y} x! y!].   (16)

It can be observed that if λ=1 the expressions in Eqs. (10), (11), (12) and (13) reduce to those deduced by Xekalaki [6].

3. Properties of the EBWD
In this section we show some of the properties of the EBWD. Firstly, the p.g.f. is given by:

g(t1, t2) = f0 F1(α; β1, β2; γ; λt1, λt2),   (17)

which is convergent for |t1| ≤ 1, |t2| ≤ 1 if λ < 1 (with γ > α + β1 + β2 in the case λ = 1).
The probabilities may be obtained in a recursive way, since this distribution, like the BGWD, belongs to Pearson's system. Then, the p.m.f., f_{r,s}, satisfies the following system of difference equations:

(γ + r + s)(r + 1) f_{r+1,s} − λ(α + r + s)(β1 + r) f_{r,s} = 0
(γ + r + s)(s + 1) f_{r,s+1} − λ(α + r + s)(β2 + s) f_{r,s} = 0.   (18)
So, if the constant of normalization, f_{0,0} = f0, given in Eq. (13), is known, the remainder probabilities are obtained. When λ=1 this constant may be computed exactly from the Gauss summation theorem:

f0 = 2F1(α, β1+β2; γ; 1)^{−1} = Γ(γ−α) Γ(γ−β1−β2) / [Γ(γ) Γ(γ−α−β1−β2)].   (19)

In the general case, the value of this constant is computed by approximation.
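The recursive scheme above can be verified directly against the p.m.f. of Eq. (12); the sketch below (Python, illustrative parameter values not taken from the text) checks both difference equations at an arbitrary point (r, s):

```python
import math

def poch(z, n):
    """Pochhammer symbol (z)_n."""
    out = 1.0
    for i in range(n):
        out *= z + i
    return out

def gauss_2f1(a, b, c, z, terms=300):
    s, t = 0.0, 1.0
    for n in range(terms):
        s += t
        t *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return s

alpha, b1, b2, gamma, lam = 1.5, 2.0, 1.0, 8.0, 0.5
f0 = 1.0 / gauss_2f1(alpha, b1 + b2, gamma, lam)

def ebwd_pmf(x, y):
    return (f0 * poch(alpha, x + y) * poch(b1, x) * poch(b2, y) * lam ** (x + y)
            / (poch(gamma, x + y) * math.factorial(x) * math.factorial(y)))

r, s = 3, 2
# First difference equation: advances r by one.
lhs1 = (gamma + r + s) * (r + 1) * ebwd_pmf(r + 1, s)
rhs1 = lam * (alpha + r + s) * (b1 + r) * ebwd_pmf(r, s)
assert abs(lhs1 - rhs1) < 1e-10
# Second difference equation: advances s by one.
lhs2 = (gamma + r + s) * (s + 1) * ebwd_pmf(r, s + 1)
rhs2 = lam * (alpha + r + s) * (b2 + s) * ebwd_pmf(r, s)
assert abs(lhs2 - rhs2) < 1e-10
```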

3.1. Mixture of bivariate confluent hypergeometric distributions

In a similar way to Xekalaki [7], the EBWD may be obtained as a mixture of a generalized Gamma distribution and a bivariate confluent hypergeometric distribution. Specifically, suppose that:
• (X,Y) | Λ=l has a joint distribution with p.g.f. given by:

g(t1, t2) = Φ2(β1, β2; γ; l t1, l t2) / 1F1(β1+β2; γ; l),   (20)

where

Φ2(β1, β2; γ; t1, t2) = Σ_{i=0..∞} Σ_{j=0..∞} (β1)_i (β2)_j t1^i t2^j / [(γ)_{i+j} i! j!].   (21)

• Λ has a generalized Gamma distribution with density given by:

f(l) = 1F1(β1+β2; γ; l) l^{α−1} e^{−l/λ} / [λ^α Γ(α) 2F1(α, β1+β2; γ; λ)],   l > 0.   (22)

Then, the p.g.f. of (X,Y) is:

g(t1, t2) = F1(α; β1, β2; γ; λt1, λt2) / 2F1(α, β1+β2; γ; λ) = f0 F1(α; β1, β2; γ; λt1, λt2).   (23)
However, Xekalaki does not study this distribution in depth.

3.2. Marginal and conditional distributions


The marginal distributions for λ=1 are generated by a 2F1(α, β1; γ−β2; t) and a 2F1(α, β2; γ−β1; t), respectively. Therefore, they are UGWD.
The following result is verified for any λ:

f_r = Σ_{s=0..∞} f_{r,s} = f0 [(α)_r (β1)_r λ^r / ((γ)_r r!)] Σ_{s=0..∞} (α+r)_s (β2)_s λ^s / [(γ+r)_s s!]

= f0 [(α)_r (β1)_r λ^r / ((γ)_r r!)] 2F1(α+r, β2; γ+r; λ).   (24)

Thus, the marginal distributions have the p.m.f.:

f_r = f0 [(α)_r (β1)_r λ^r / ((γ)_r r!)] 2F1(α+r, β2; γ+r; λ)
f_s = f0 [(α)_s (β2)_s λ^s / ((γ)_s s!)] 2F1(α+s, β1; γ+s; λ),   (25)

where f0 is the constant of normalization given in Eq. (13). Then, it should be emphasized that:
• The marginal distributions are not GHD, but they are UGWD when λ=1, so they are more general distributions than the Waring distribution.
• We may obtain the p.g.f. of the marginal distributions since:

g_X(t) = g(t, 1) = F1(α; β1, β2; γ; λt, λ) / 2F1(α, β1+β2; γ; λ)
g_Y(t) = g(1, t) = F1(α; β1, β2; γ; λ, λt) / 2F1(α, β1+β2; γ; λ).   (26)

Another important question, which will be finalized later, is the distribution of X+Y, that is, the distribution of the number of accidents in the whole period:

g_{X+Y}(t) = g(t, t) = F1(α; β1, β2; γ; λt, λt) / F1(α; β1, β2; γ; λ, λ) = 2F1(α, β1+β2; γ; λt) / 2F1(α, β1+β2; γ; λ).   (27)

It is a GHD1(α, β1+β2, γ; λ), as was desirable. Hence, the total number of accidents has a GHD1, independently of the division in two sub-periods, while the number of accidents in each sub-period has a distribution with p.m.f. given by Eq. (25).
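This statement about X+Y can also be checked numerically: summing the joint p.m.f. of Eq. (12) along the diagonal x+y = n must reproduce the coefficients of the univariate GHD1 generating function. A sketch (Python, illustrative parameter values not taken from the text):

```python
import math

def poch(z, n):
    """Pochhammer symbol (z)_n."""
    out = 1.0
    for i in range(n):
        out *= z + i
    return out

def gauss_2f1(a, b, c, z, terms=300):
    s, t = 0.0, 1.0
    for n in range(terms):
        s += t
        t *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return s

alpha, b1, b2, gamma, lam = 1.5, 2.0, 1.0, 8.0, 0.5
f0 = 1.0 / gauss_2f1(alpha, b1 + b2, gamma, lam)

def ebwd_pmf(x, y):
    return (f0 * poch(alpha, x + y) * poch(b1, x) * poch(b2, y) * lam ** (x + y)
            / (poch(gamma, x + y) * math.factorial(x) * math.factorial(y)))

# P(X+Y = n) from the joint p.m.f. versus the GHD1 series coefficient.
for n in range(8):
    diagonal = sum(ebwd_pmf(x, n - x) for x in range(n + 1))
    ghd1 = (f0 * poch(alpha, n) * poch(b1 + b2, n) * lam ** n
            / (poch(gamma, n) * math.factorial(n)))
    assert abs(diagonal - ghd1) < 1e-12
```

The underlying identity is the Vandermonde convolution of the Pochhammer symbols (β1)_x (β2)_y over x+y = n.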
In order to obtain the conditional distributions, we can operate in the following way:

f_{r|s} = f_{r,s} / f_s,   (28)

having the expressions

f_{r|s} = f_{0|s} (α+s)_r (β1)_r λ^r / [(γ+s)_r r!]
f_{s|r} = f_{0|r} (α+r)_s (β2)_s λ^s / [(γ+r)_s s!],   (29)

where f_{0|s} = 2F1(α+s, β1; γ+s; λ)^{−1} and f_{0|r} = 2F1(α+r, β2; γ+r; λ)^{−1}. Their p.g.f., therefore, are:

g(t) = f_{0|s} 2F1(α+s, β1; γ+s; λt)
g(t) = f_{0|r} 2F1(α+r, β2; γ+r; λt).   (30)

So, these distributions belong to the GHD1 family.

3.3. Components of the variance


Xekalaki [6] obtained the components of the variance for X+Y that, in our case, has a GHD1. So, we have the following variance components (Rodriguez et al. [4]):

σ² = Var(X+Y) = (β1+β2) E_P(V) + (β1+β2) E_P(V²) + (β1+β2)² Var_P(V),   (31)

where the three terms are due, respectively, to randomness, liability and proneness, V = λ(1−P)/[1−λ(1−P)] and P has a distribution with the density function given in Eq. (10).
Concerning X and Y, since both variables are obtained as mixtures, their variances may be split into three components

σ²_X = Var(X) = β1 E_P(V) + β1 E_P(V²) + β1² Var_P(V)
σ²_Y = Var(Y) = β2 E_P(V) + β2 E_P(V²) + β2² Var_P(V),   (32)

in the same way as the BGWD.
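These components can be recovered numerically: the moments of V are obtained by integrating against the density of Eq. (10), and their weighted sum must equal the variance of X+Y computed from its p.m.f. The sketch below (Python, illustrative parameter values not taken from the text, midpoint-rule integration) checks this:

```python
import math

def poch(z, n):
    out = 1.0
    for i in range(n):
        out *= z + i
    return out

def gauss_2f1(a, b, c, z, terms=300):
    s, t = 0.0, 1.0
    for n in range(terms):
        s += t
        t *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return s

alpha, b1, b2, gamma, lam = 1.5, 2.0, 1.0, 8.0, 0.5
beta = b1 + b2

# Variance of X+Y straight from its GHD1 probabilities.
pn = [1.0 / gauss_2f1(alpha, beta, gamma, lam)]
for n in range(200):
    pn.append(pn[-1] * (alpha + n) * (beta + n) / ((gamma + n) * (n + 1)) * lam)
mean = sum(n * p for n, p in enumerate(pn))
var = sum(n * n * p for n, p in enumerate(pn)) - mean ** 2

# Moments of V = lam(1-P)/(1-lam(1-P)) under the density of P, by the midpoint rule.
const = math.gamma(gamma) / (gauss_2f1(alpha, beta, gamma, lam)
                             * math.gamma(alpha) * math.gamma(gamma - alpha))
N = 20000
ev = ev2 = 0.0
for i in range(N):
    p = (i + 0.5) / N
    dens = (const * p ** (gamma - alpha - 1) * (1 - p) ** (alpha - 1)
            / (1 - lam * (1 - p)) ** beta)
    v = lam * (1 - p) / (1 - lam * (1 - p))
    ev += v * dens / N
    ev2 += v * v * dens / N

randomness = beta * ev
liability = beta * ev2
proneness = beta ** 2 * (ev2 - ev ** 2)
assert abs(mean - beta * ev) < 1e-4          # E(X+Y) equals the randomness term
assert abs(var - (randomness + liability + proneness)) < 1e-3
```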

4. Applications
To conclude, we consider data about the number of driver accidents in Connecticut (Xekalaki [6]).
The parameters are estimated by the maximum likelihood method because the method of moments does not provide good estimates. Then, the log-likelihood function, whose expression is

ln L(α, β1, β2, γ, λ) = n ln f0 + Σ_{i=1..n} ln (α)_{x_i+y_i} + Σ_{i=1..n} ln (β1)_{x_i} + Σ_{i=1..n} ln (β2)_{y_i} − Σ_{i=1..n} ln (γ)_{x_i+y_i} + ln λ Σ_{i=1..n} (x_i + y_i) − Σ_{i=1..n} ln x_i! − Σ_{i=1..n} ln y_i!,   (33)

is maximized, for (x_1, y_1), ..., (x_n, y_n) a sample of size n.
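As a consistency check, the expanded log-likelihood of Eq. (33) must agree with the sum of the logarithms of the p.m.f. of Eq. (12) over the sample. A sketch (Python, with a small artificial sample and illustrative parameter values, neither taken from the text):

```python
import math

def poch(z, n):
    out = 1.0
    for i in range(n):
        out *= z + i
    return out

def gauss_2f1(a, b, c, z, terms=300):
    s, t = 0.0, 1.0
    for n in range(terms):
        s += t
        t *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return s

def ebwd_pmf(x, y, alpha, b1, b2, gamma, lam):
    f0 = 1.0 / gauss_2f1(alpha, b1 + b2, gamma, lam)
    return (f0 * poch(alpha, x + y) * poch(b1, x) * poch(b2, y) * lam ** (x + y)
            / (poch(gamma, x + y) * math.factorial(x) * math.factorial(y)))

def loglik(sample, alpha, b1, b2, gamma, lam):
    """Term-by-term expansion following the structure of Eq. (33)."""
    out = len(sample) * math.log(1.0 / gauss_2f1(alpha, b1 + b2, gamma, lam))
    for x, y in sample:
        out += (math.log(poch(alpha, x + y)) + math.log(poch(b1, x))
                + math.log(poch(b2, y)) - math.log(poch(gamma, x + y))
                + (x + y) * math.log(lam)
                - math.log(math.factorial(x)) - math.log(math.factorial(y)))
    return out

sample = [(0, 0), (1, 0), (2, 1), (0, 3)]     # artificial sample
theta = (1.5, 2.0, 1.0, 8.0, 0.5)
direct = sum(math.log(ebwd_pmf(x, y, *theta)) for x, y in sample)
assert abs(loglik(sample, *theta) - direct) < 1e-10
```

In practice this function would be handed to a numerical optimizer over (α, β1, β2, γ, λ).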


The parameter estimates provide an EBWD(1.0133, 8.091, 7.2535, 63.346, 0.77468).
Table 1 includes the results of the χ²-goodness of fit test (observed and expected frequencies), indicating the classes that have been grouped in order to consider expected values greater than or equal to 5. The value of the χ²-statistic (14.046) is less than the one obtained for the BGWD and, also, the p-value is higher (0.0806).
With regard to the components of the variance, the values obtained are
included in Table 2. It should be noted that the majority of the variability is due
to randomness. Moreover, the external factors or liability have less incidence
than the internal factors or proneness in the explanation of the behavior of the
number of accidents. It should be pointed out that even though the BGWD and
the EBWD are different, the values obtained for the variance components are
very similar to those obtained by Xekalaki, so it seems that both models
coincide in the explanation of the factors that influence the number of accidents.

Table 1. Observed and expected values (expected frequencies in parentheses; rows: accidents in 1934-36, columns: accidents in 1931-33)

          0                   1                 2               3             4
0   23881 (23887.9478)  2117 (2146.1793)  242 (214.6711)  17 (23.6536)  2 (0.4292)
1    2386 (2378.6215)    419 (418.1887)    57 (61.6481)    9 (8.9106)   3 (0.2410)
2     275 (260.5670)      64 (67.5159)     12 (13.0563)    5 (2.3224)   1 (0.0874)
3      22 (31.1481)        5 (10.5873)      2 (2.5195)     2 (0.5297)   0 (0.0264)
4       5 (4.0282)         4 (1.6850)       0 (0.4739)     1 (0.1145)   0 (0.0073)

Table 2. Components of the variance

Components    1931-33             1934-36             1931-36
Randomness    0.1261 (86.3336%)   0.1138 (87.2669%)   0.2398 (78.5701%)
Proneness     0.0160 (10.9505%)   0.0130 (9.9876%)    0.0579 (18.9580%)
Liability     0.0040 (2.7159%)    0.0036 (2.7457%)    0.0075 (2.4719%)
Total         0.1460              0.1304              0.3052

References
1. J.O. Irwin. (1968). The generalized Waring distribution applied to accident theory. Journal of the Royal Statistical Society, Series A, 131, 205.
2. A.W. Kemp and C.D. Kemp. (1975). Models for Gaussian hypergeometric distributions. Statistical Distributions in Scientific Work, 1, 31.
3. S. Kocherlakota and K. Kocherlakota. (1992). Bivariate Discrete
Distributions. Marcel Dekker.
4. J. Rodriguez-Avi, A. Conde-Sanchez, M.J. Olmo-Jimenez and A.J. Saez-
Castillo. (2004). Properties and applications of the family of Gaussian
discrete distributions. Proceedings of the International Conference on
Distribution Theory, Order Statistics and Inference in Honour of Barry C.
Arnold, Santander, Spain.

5. E. Xekalaki. (1983). The univariate generalized Waring distribution in relation to accident theory: Proneness, spells or contagion? Biometrics, 39, 887.
6. E. Xekalaki. (1984a). The bivariate generalized Waring distribution and its application to accident theory. Journal of the Royal Statistical Society, Series A, 147, 488.
7. E. Xekalaki. (1984b). Models leading to the bivariate generalized Waring distribution. Utilitas Mathematica, 25, 263.
Chapter 13
APPLYING A BAYESIAN HIERARCHICAL MODEL IN
ACTUARIAL SCIENCE: INFERENCE AND RATEMAKING

J.M. PEREZ-SANCHEZ
Department of Quantitative Methods in Economics
University of Granada, 18071-Granada, Spain

J.M. SARABIA-ALEGRIA
Department of Economics, University ofCantabria, 39005-Santander, Spain

E. GOMEZ-DENIZ
Department of Quantitative Methods in Economics
University of Las Palmas de Gran Canaria, 3'5017'-Las Palmas de G.C. Spain

F.J. VAZQUEZ-POLO
Department of Quantitative Methods in Economics
University of Las Palmas de Gran Canaria, 35017-Las Palmas de G. C. Spain

In a standard Bayesian model, a prior distribution is elicited for the structure parameter in
order to obtain an estimate of this unknown parameter. The hierarchical model is a two-stage Bayesian model which incorporates a hyperprior distribution for some of the hyperparameters of the prior. In this way, and under the Poisson-Gamma-Gamma model,
a new distribution is obtained by computing the unconditional distribution of the random
variable of interest. This distribution seems to provide a better fit to the data, given a
policyholders' portfolio. Furthermore, Bayes premiums are thus obtained under a bonus-
malus system and solve some of the problems of surcharges which appear in these
systems when they are applied in a simple manner.

1. Introduction
From the Bayesian standard model point of view, a structure parameter
follows a prior distribution. A hierarchical model is a two-stage Bayesian model
which incorporates a hyperprior distribution for some of the hyperparameters of
the prior. A new distribution is obtained by computing the unconditional
distribution of the random variable of interest if the Poisson-Gamma-Gamma
model is used. This distribution provides a better fit to the data. The hierarchical
approach reflects a different statistical perspective on how to model the
expert's information within the Bayesian framework. This Bayesian hierarchical
methodology incorporates both the prior distribution and the data information
into one unified modelling framework. In order to consider a hierarchical Bayes
elicitation, we have to assume a framework in which structural and subjective
prior information can be used to yield an elicited prior. In the hierarchical Bayes
scenario, we have to specify our subjective beliefs about the hyperparameters of
the prior distribution.
A Bayesian approach allows the statistician to compute the posterior
probability for each model in a set of possible models. Using a hierarchical
approach, the analysis can facilitate the choice of a satisfactory prior distribution. In
this paper, we use this methodology in order to analyze its application to an
insurance framework.
We apply the hierarchical model for computing bonus-malus premiums
(BMP) in the same way as Lemaire [3]. Thus, hierarchical methodology
incorporates knowledge about the number of claims believed a priori. The
distribution of the number of car accidents in an automobile portfolio is known
to be well fitted by a Poisson distribution, assuming that θ is the mean of the
number of claims. Let us assume that the portfolio is not homogeneous and that
the frequency of the risks is different in each case.
Bayesian hierarchical methodology is based on the use of hierarchical priors
and we need to specify how the data (x) depend on the parameter of interest
(θ) through the likelihood function, f(x | θ, F), where x represents the sample
information and F is an unknown parameter.
The prior specification is restricted to two-stage priors:
• The standard prior distribution, π₁(θ | Λ, G), where Λ is a hyperparameter.
This level indicates how the parameter of interest (θ) varies throughout
the population, depending on two unknown parameters Λ and G.
• The proper prior, π₂(Λ, F, G). In the second stage, instead of estimating
θ, it will be considered as a random variable. At this level, we obtain a true
prior density on the set of nuisance parameters, depending on Λ, G and F.
The variables could be scalars, vectors or matrices, but here they are
represented as scalars.
The third stage is to specify the posterior distribution of θ, or
some features thereof. In this sense, we must express the posterior distribution in
terms of the posterior distributions at the various stages of the hierarchical
structure. Therefore, in the third stage we need to specify π(θ | x).
The main goal of a hierarchical Bayesian analysis is often to obtain the
posterior distribution. If we apply the Bayes' theorem, we would obtain the
posterior distribution in the following form:

$$\pi(\theta \mid x) = \frac{\iiint f(x \mid \theta, F)\,\pi_1(\theta \mid \Lambda, G)\,\pi_2(\Lambda, F, G)\,d\Lambda\,dF\,dG}{\iiiint f(x \mid \theta, F)\,\pi_1(\theta \mid \Lambda, G)\,\pi_2(\Lambda, F, G)\,d\Lambda\,dF\,dG\,d\theta} \qquad (1)$$
It is of great interest to estimate the posterior mean E(θ | x) and the
posterior variance Var(θ | x). However, it may be that the posterior distribution of Λ, F
and G is of interest. In this case, we need to compute:

$$\pi(\Lambda, F, G \mid x) = \frac{\pi_2(\Lambda, F, G) \int f(x \mid \theta, F)\,\pi_1(\theta \mid \Lambda, G)\,d\theta}{\iiiint f(x \mid \theta, F)\,\pi_1(\theta \mid \Lambda, G)\,\pi_2(\Lambda, F, G)\,d\Lambda\,dF\,dG\,d\theta} \qquad (2)$$

This model was introduced by Lindley and Smith [1]. More recently,
Klugman [7] analyzed the normal-normal hierarchical structure from the
Bayesian point of view. Cano [8] applied this methodology to study the
Bayesian robustness of the model.
However, a continuous distribution is clearly inappropriate for frequency
counts, since such a distribution places probability on negative (and non-integer)
numbers of claims; the Poisson and negative binomial distributions are therefore much more
commonly used.
The rest of this paper is structured as follows: Section 2 analyzes a
hierarchical Bayesian structure, the Poisson-Gamma-Gamma model. In Section
3 we use this model to compute premiums under a bonus-malus system. Section
4 applies the above results to an actuarial example. Finally, section 5 contains a
discussion of related work.

2. Inference procedure
In this section, the hierarchical Bayesian Poisson-Gamma-Gamma model is
studied. In this case, the hierarchical model is a two-stage standard Bayesian
model which is built in the following way:
Firstly, we have the model depending on an unknown parameter θ,
f(x | θ). We assume a Poisson distribution, i.e.,

$$f(x \mid \theta) \sim \mathcal{P}(\theta). \qquad (3)$$
Secondly, the parameter θ follows a prior distribution which is assumed to be a
Gamma distribution. Then:

$$\pi_1(\theta \mid a, b) \sim G(a, b), \quad a, b > 0 \qquad (4)$$

where the Gamma distribution has a probability density function proportional to
$\theta^{a-1} e^{-b\theta}$.
Thirdly, and finally, a Gamma hyperprior distribution is assumed for the b
parameter of the prior π₁(θ | a, b), i.e., b ~ G(α, β), α, β > 0.
Therefore

$$\pi_2(b) \propto b^{\alpha - 1} e^{-\beta b}. \qquad (5)$$
It is well known that a mixture of distributions is a simple way to obtain new
probability distributions. Thus, we can build the prior distribution of θ without
depending on the b parameter, to obtain:

$$\pi_1(\theta \mid a, \alpha, \beta) = \int_0^\infty \pi_1(\theta \mid a, b)\,\pi_2(b)\,db$$
$$= \int_0^\infty \frac{b^a}{\Gamma(a)}\,\theta^{a-1} e^{-b\theta}\,\frac{\beta^\alpha}{\Gamma(\alpha)}\,b^{\alpha - 1} e^{-\beta b}\,db$$
$$= \frac{1}{B(a, \alpha)}\,\frac{(\theta/\beta)^{a-1}}{\beta\,(1 + \theta/\beta)^{a + \alpha}}, \qquad (6)$$

where B(x, y) denotes the usual beta function.


This distribution corresponds to the Pearson type VI distribution, sometimes
called the second-kind beta distribution or beta-prime distribution, with scale
parameter β (Stuart and Ord [5] and Johnson et al. [9]). A random variable
with pdf (6) is denoted by θ ~ B2(a, α; β).
The moments of π₁(θ | a, α, β) can be calculated by using:

$$E(\theta^r) = \frac{\beta^r\,\Gamma(a + r)\,\Gamma(\alpha - r)}{\Gamma(a)\,\Gamma(\alpha)}, \quad \text{if } \alpha > r. \qquad (7)$$
Thus, the mean and the variance are:

$$E(\theta) = \frac{a\beta}{\alpha - 1}, \quad \text{if } \alpha > 1,$$

$$Var(\theta) = \frac{a\,(a + \alpha - 1)\,\beta^2}{(\alpha - 1)^2 (\alpha - 2)}, \quad \text{if } \alpha > 2.$$
These results are obtained by straightforward computations. An
interesting property of the prior distribution in the Poisson-Gamma-Gamma
model (or Poisson-second-kind beta) is the over-dispersion it presents with
respect to the classical Gamma distribution. In other words, when the mean of
the second-kind beta distribution is equal to that of the Gamma
distribution, the variance of the former is greater than that of the latter. This
property gives the model more flexibility and makes it appropriate for use in a
BMS, where the variance of the observed data is generally greater than the mean
(Shengwang et al. [12]).
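As an illustrative aside (not part of the original text), the closed-form moments above can be checked numerically: the mixed prior (6) is exactly SciPy's betaprime distribution with shape parameters a, α and scale β. The parameter values below are the estimates reported in Section 4; the comparison Gamma prior (same mean, shape a) is our own choice.

```python
from scipy import stats

# Sketch: the mixed prior (6) is a second-kind beta B2(a, alpha; beta),
# available in SciPy as betaprime(a, alpha, scale=beta).
a, alpha, beta = 3.25585, 6.13732, 0.159492      # estimates from Section 4

prior = stats.betaprime(a, alpha, scale=beta)

# closed-form mean and variance given in the text
mean = a * beta / (alpha - 1)
var = a * (a + alpha - 1) * beta**2 / ((alpha - 1)**2 * (alpha - 2))
assert abs(prior.mean() - mean) < 1e-6
assert abs(prior.var() - var) < 1e-6

# over-dispersion: a Gamma prior with the same mean (and shape a) has a
# smaller variance than the second-kind beta prior
gamma_same_mean = stats.gamma(a, scale=mean / a)  # mean = a * (mean / a)
assert prior.var() > gamma_same_mean.var()
```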
The following proposition gives the posterior distribution of θ under the
hierarchical Bayesian model.
Proposition 1 The posterior distribution of θ given the data x in the
hierarchical Poisson-Gamma-Gamma model is given by

$$\pi(\theta \mid x) = \frac{\beta^{\alpha - x}\,\theta^{a + x - 1}\,e^{-t\theta}\,(\beta + \theta)^{-(a + \alpha)}}{\Gamma(a + x)\,\mathcal{U}(a + x,\, x - \alpha + 1,\, \beta t)}, \qquad (8)$$

where

$$\mathcal{U}(m, n, z) = \frac{1}{\Gamma(m)} \int_0^\infty e^{-zs}\,s^{m - 1}\,(1 + s)^{n - m - 1}\,ds, \quad m, z > 0, \qquad (9)$$
is the confluent hypergeometric function (Goovaerts and De Pril [4]).
Proof. It is straightforward by applying Bayes' Theorem.
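The integral identity behind (8)-(9) can be checked numerically; in the sketch below (a, α, β are the Section 4 estimates, while x and t are arbitrary illustrative values, not from the text) the normalising constant β^(x-α) Γ(a+x) U(a+x, x-α+1, βt) is compared with direct quadrature, using SciPy's hyperu for the confluent hypergeometric function U.

```python
import numpy as np
from scipy import special, integrate

a, alpha, beta = 3.25585, 6.13732, 0.159492   # Section 4 estimates
x, t = 1, 2.0                                  # illustrative claim history

# kernel of the posterior (8): theta^(a+x-1) e^(-t theta) (beta+theta)^(-(a+alpha))
kernel = lambda th: th**(a + x - 1) * np.exp(-t * th) * (beta + th)**(-(a + alpha))

numeric, _ = integrate.quad(kernel, 0.0, np.inf)
closed = (beta**(x - alpha) * special.gamma(a + x)
          * special.hyperu(a + x, x - alpha + 1, beta * t))
assert abs(numeric - closed) / closed < 1e-7
```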

3. Experience rating
To illustrate our approach, we apply the results obtained for computing
premiums under a BMS. This is a merit rating method used in automobile
insurance where the number of claims modifies the premium. A model often
used for experience rating in a BMS assumes that each individual risk has its
own Poisson distribution for a number of claims, assuming that the mean
number of claims is distributed across individual policyholders (Coene and
Doray [10]; Corlier et al. [2]; Lemaire [3], [6], [11]).
A bonus-malus premium (BMP) can be computed under the variance
principle (Gomez and Vazquez, [13]) in the same way as Lemaire [3] built a
BMP under the net principle. In this sense, we have:

\ a + \)2K<<X\x)dX f (X + \)x(X)dX
J
PBH'HX,0= J -r 5 (io)

Observe that this expression is simply a rate between a posterior magnitude


and the corresponding prior. Next proposition gives the BMP in (10) under the
model assumed in Section 2.
Proposition 2 Under the Poisson-Gamma-Gamma model, the variance bonus-
malus premium given in (10) is computed as:

$$P^{BMV}(x, t) = \frac{A + B + C}{K\,C}, \qquad (11)$$

where

$$A = \beta^2\,(a + x + 1)(a + x)\,\mathcal{U}(a + x + 2,\, x - \alpha + 3,\, \beta t),$$
$$B = 2\beta\,(a + x)\,\mathcal{U}(a + x + 1,\, x - \alpha + 2,\, \beta t),$$
$$C = \mathcal{U}(a + x,\, x - \alpha + 1,\, \beta t),$$
$$K = \frac{a\,(a + \alpha - 1)\,\beta^2}{(\alpha - 1)^2 (\alpha - 2)} + \frac{a^2 \beta^2}{(\alpha - 1)^2} + \frac{2a\beta}{\alpha - 1} + 1,$$

and \(\mathcal{U}(m, n, z)\) is the confluent hypergeometric function defined in (9).


Proof. It is straightforward to prove this proposition by using:

$$\int_\Lambda (\lambda + 1)\,\pi(\lambda \mid x)\,d\lambda = \frac{\beta\,(a + x)\,\mathcal{U}(a + x + 1,\, x - \alpha + 2,\, \beta t)}{\mathcal{U}(a + x,\, x - \alpha + 1,\, \beta t)} + 1,$$

and

$$\int_\Lambda (\lambda + 1)^2\,\pi(\lambda \mid x)\,d\lambda = \int_\Lambda \lambda^2\,\pi(\lambda \mid x)\,d\lambda + 2\int_\Lambda \lambda\,\pi(\lambda \mid x)\,d\lambda + 1$$
$$= \frac{\beta^2\,(a + x + 1)(a + x)\,\mathcal{U}(a + x + 2,\, x - \alpha + 3,\, \beta t) + 2\beta\,(a + x)\,\mathcal{U}(a + x + 1,\, x - \alpha + 2,\, \beta t)}{\mathcal{U}(a + x,\, x - \alpha + 1,\, \beta t)} + 1.$$

Although we do not have a closed form for this BMP in terms of elementary
functions, its computation is simple by using, for example, MATHEMATICA
software, because the confluent hypergeometric function is tabulated.
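As a sketch (not part of the original chapter), the closed form (11) can be cross-checked against direct numerical integration of the ratio (10); the structure parameters are the Section 4 estimates, while the claim history (x, t) is arbitrary.

```python
import numpy as np
from scipy import special, integrate

a, alpha, beta = 3.25585, 6.13732, 0.159492   # Section 4 estimates
x, t = 2, 2                                    # illustrative claim history

U = special.hyperu
A = beta**2 * (a + x + 1) * (a + x) * U(a + x + 2, x - alpha + 3, beta * t)
B = 2 * beta * (a + x) * U(a + x + 1, x - alpha + 2, beta * t)
C = U(a + x, x - alpha + 1, beta * t)
K = (a * (a + alpha - 1) * beta**2 / ((alpha - 1)**2 * (alpha - 2))
     + a**2 * beta**2 / (alpha - 1)**2
     + 2 * a * beta / (alpha - 1) + 1)
premium = (A + B + C) / (K * C)               # closed form (11)

# direct numerical evaluation of the ratio (10)
posterior_kernel = lambda l: l**(a + x - 1) * np.exp(-t * l) * (beta + l)**(-(a + alpha))
prior_kernel = lambda l: l**(a - 1) * (beta + l)**(-(a + alpha))
post_mag = (integrate.quad(lambda l: (l + 1)**2 * posterior_kernel(l), 0, np.inf)[0]
            / integrate.quad(posterior_kernel, 0, np.inf)[0])
prior_mag = (integrate.quad(lambda l: (l + 1)**2 * prior_kernel(l), 0, np.inf)[0]
             / integrate.quad(prior_kernel, 0, np.inf)[0])
assert abs(premium - post_mag / prior_mag) < 1e-6
```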

4. Numerical example
In this section, the results obtained in the preceding sections are illustrated
with an example from Lemaire [3], which represents the claims made by
policyholders of a Belgian insurance company during four periods.
Figure 1 shows the distribution for the number of claims, which provides a
fairly good fit, accepted by the χ²-test of goodness of fit.
The mean and variance of this distribution are 0.1011 and 0.1074,
respectively. The parameters of the structure function were estimated by
applying the method of moments. The estimated parameters are â = 3.25585,
α̂ = 6.13732 and β̂ = 0.159492.

The results are illustrated in Table 1, which shows the BMP for the
standard Bayesian methodology and, below it in each pair of rows, the BMP for
the hierarchical structure considered.

Figure 1. Observed distributions (absolute frequencies, adjusted vs. observed, by number of claims)

Table 1. Bonus-malus premiums under both standard and hierarchical models

                             x
    t          0         1         2         3
    1        0.994     1.050     1.105     1.161
             0.993     1.048     1.131     1.265
    2        0.998     1.041     1.094     1.146
             0.988     1.036     1.104     1.202
    3        0.984     1.033     1.083     1.133
             0.984     1.027     1.086     1.164

(In each pair of rows, the upper row gives the standard Bayesian premium and the
lower row, set in bold in the original, the hierarchical premium.)

It is clear from Table 1 that the relative premiums follow the transition rules
commented on above. For example, a policyholder has to pay 1.104 monetary units
in the second period because of his/her two previous claims. In the next period,
the policyholder will have to pay 1.164 monetary units if he/she makes a claim.
However, the premium will decrease to 1.086 monetary units if he/she does not
make a claim. This behaviour is observed for all the premiums, and so we obtain
BMP by using a hierarchical Bayesian model.

Table 2 shows how a hierarchical BMP gives a bonus to good drivers with
respect to standard Bayesian premiums by decreasing their percentage of
penalization for the transition x = 0 → x = 1 and t = 1 → t = 2. However, the
hierarchical structure increases the percentage of penalization for the other
transitions.

Table 2. Percentage of penalization

    Δt          Δx = 1    Δx = 2    Δx = 3
    1→2          4.7%      10%       15.2%
                 4.3%      11.1%     20.9%
    2→3          3.5%      8.9%      13.9%
                 3.9%      9.3%      17.1%

(In each pair of rows, the upper row corresponds to the standard model and the
lower to the hierarchical one, as in Table 1.)

5. Conclusions
In this article we review some aspects of the hierarchical Bayesian models
and emphasize the Poisson-Gamma-Gamma model because of its practical use
in actuarial science. In order to model the number of claims of a BMS, we use a
hierarchical structure in which the second-kind beta distribution arises as the
hyperprior distribution.
The model poses no additional complications, as many of its positive
properties can be deduced analytically. The model can be applied
straightforwardly to actuarial premium-setting problems, and we show that these
premiums follow the transition rules of a BMS. These transition rules allow a
surcharge to be applied to malus policyholders and a bonus to be given to bonus ones.
In order to check the prior distribution, we can carry out a Bayesian
robustness analysis of the premiums in the same way as Gomez and Vazquez
[13]. These authors studied the sensitivity of a BMS from a standard Bayesian
point of view. In the hierarchical setting, a Bayesian robustness analysis can be
carried out in the same way as in Cano [8], where the normal-normal
hierarchical model is analyzed.

References
1. D.V. Lindley and A.F.M. Smith. (1972). Bayes estimates for the linear model.
Journal of the Royal Statistical Society B, 34, 1-41.
2. F. Corlier, J. Lemaire and D. Muhokolo. (1979). Simulation of an
Automobile Portfolio. Essays in the Economic Theory of Risk and
Insurance, 11, 40-46.

3. J. Lemaire. (1979). How to define a bonus-malus system with an


exponential utility function. Astin Bulletin, 10, 274-282.
4. M.J. Goovaerts and N. De Pril. (1980). Survival probabilities based on
Pareto claim distributions. Astin Bulletin, 11, 154-157.
5. A. Stuart and J.K. Ord. (1987). Kendall's Advanced Theory of Statistics
(Vol. 1, Chapter 6). New York: Oxford University Press.
6. J. Lemaire. (1988). Construction of the new Belgian motor third party tariff
structure. Astin Bulletin, 18(1), 99-112.
7. S. Klugman. (1992). Loss Models: From Data to Decisions. New York:
Wiley.
8. J.A. Cano. (1993). Robustness of the posterior mean in normal hierarchical
models. Communications in Statistics, 22(7), 1999-2014.
9. N.L. Johnson, S. Kotz and N. Balakrishnan. (1995). Continuous Univariate
Distributions (vol. 2, second edition, chapter 27). John Wiley, New York.
10. G. Coene and L. Doray. (1996). A financially balanced Bonus-Malus
system. Astin Bulletin, 26, 107-116.
11. J. Lemaire. (1998). Bonus-Malus system: The European and Asian
approach to merit-rating. (With discussion by Krupa Subramanian, "Bonus-
Malus system in a competitive environment"), North American Actuarial
Journal, 2(1), 1-22.
12. M. Shengwang, W. Yuan and G. Whitmore. (1999). Accounting for
individual over-dispersion in a bonus-malus automobile insurance system.
Astin Bulletin, 29(2), 327-337.
13. E. Gomez and F.J. Vazquez. (2005). Modelling uncertainty in insurance
bonus-malus premiums principles by using a Bayesian robustness approach.
Journal of Applied Statistics, 32(7), 771-784.
Chapter 14
ANALYSIS OF THE EMPIRICAL DISTRIBUTION OF THE
RESIDUALS DERIVED FROM FITTING THE HELIGMAN AND
POLLARD CURVE TO MORTALITY DATA

F. ABAD-MONTES
Dpto. Estadística e Investigación Operativa, Universidad de Granada
C/ Fuentenueva, s/n, Granada, España

M.D. HUETE-MORALES
Dpto. Estadística e Investigación Operativa, Universidad de Granada
C/ Fuentenueva, s/n, Granada, España

M. VARGAS-JIMENEZ
Dpto. Estadística e Investigación Operativa, Universidad de Granada
C/ Fuentenueva, s/n, Granada, España

In studying the behaviour of human phenomena, it is of interest to examine the patterns
that remain more or less stable, whether the comparison is made of different populations
at a given moment or at different times, or of the same population in different situations.
Such regularities have long been modelled, and this has enabled researchers to discover
aspects and properties that are inherent to the phenomenon being studied. In the present
paper, various techniques, some of which are relatively modern, are applied to the
analysis of the empirical distribution of the residuals derived from fitting the Heligman
and Pollard curve to mortality data. Firstly, we perform a graphical illustration from the
time perspective (curves fitted over various periods) and then a static one for the ages (i.e.
obtaining fits to different ages). The aim of this study is to explore the different
distributions of the residuals at each age and thus to evaluate the correspondence between
models (such as the Heligman and Pollard curve) and reality (the observed rates of
mortality). For this purpose, we use graphical techniques, non-parametric techniques such
as kernel smoothing, splines and weighted local fit, and generalised additive models,
together with bootstrap sampling techniques to describe distributions of statistical
measures of the residuals.

1. Introduction
It is frequently necessary to determine the density function of certain data
sets, especially when such data present characteristics which, a priori, cannot be
assumed to behave like standard probability models. The experience of

demographers in fitting the Heligman and Pollard (H-P) curve to rates of
mortality shows that there exists a systematic bias pattern in the values fitted
for given age ranges. This curve provides a fairly good description of the
behaviour of mortality rates as a function of age, and thus it is widely used.
However, in the present study we seek to better identify the limitations of
forecasts made using H-P fitting, by means of a statistical analysis of the
behaviour of the distribution of residuals.
Analysis of such Heligman and Pollard residuals (rHP) was carried out
using standard current techniques to estimate the properties of distributions from
a perspective that is basically non-parametric.

2. Data
We took the rHP residuals derived from the results of fitting H-P curves to
the mortality rates, qx, observed for ages 0 to 84 years for the population of
Andalusia for the period 1976-2002.

3. Exploration and graphical summary of the distributions of the residuals
3.1. Behaviour of the residuals in each period (H-P fit)
Apart from a few anomalous points, the average behaviour is analogous in
each fit. The curves are assumed to be fitted in a similar way in each period
observed.
The fit of a spline shows a line close to the zero line, in accordance with the
previous figure.
In short, the above figures show a similar behaviour pattern for the residuals
derived from an H-P fit for each curve fitted for the corresponding period.

3.2. Behaviour of the fits for each age group

The figure shows the differences between the distributions of the residuals
for each age group: box plot and scatter plot diagrams.
The same cannot be said of the pattern of the residuals when we consider
the pattern of the distribution of each age that is examined.
Figure 1. Distribution of the H-P residuals by period

Figure 2. Distribution of the H-P residuals over the period


Figure 3. Distribution of the residuals for each age

Figure 4. Distribution of the residuals by age



3.3. Behaviour pattern of means and variances of the residuals according
to the age and period examined

The next figure shows the systematic behaviour pattern of the means and
variances of the residuals according to the age at which the fit to the mortality
rate is carried out. The trend of the latter is seen to be less regular for the fit in
relation to the period.

Figure 5. Means and variances for ages and periods

The top left figure shows that the assumption that the distribution of the
residuals presents an approximately zero mean for each age is unlikely to be
fulfilled.

It can be seen that the curves are not fitted in the same way for every age; at
some (60-80 years), the figures show the residuals to be systematically negative.
Another noteworthy aspect is the diversity in the variability.

4. Non-parametric regression curves

Sometimes it is impossible to model a function using parametric techniques.
The scatter plot of residuals versus age seems to show non-linear effects of age
on the value of the residuals. There are, however, flexible methods of describing
such non-linear relationships, namely non-parametric regression techniques.
Different algorithms for fitting non-parametric curves enable us to represent the
effects of independent variables without specifying the global shape of the
relationship, which facilitates a clearer visual interpretation of local behaviour
patterns.
Assume a sample (x_i, y_i), i = 1, ..., n, of values of the variables X and Y.
Let us denote the relation between x and y by

$$y = \mu(x) + \varepsilon \qquad (1)$$

where μ(x) denotes an unknown function of x, representing the trend
underlying the data, which is normally a smoothed trend fitted to the scatter
plot, and which can be estimated by various methods, among which the
following are the most widely used:
a) The results obtained from locally averaging the response values observed in a
range of values close to x, as in kernel smoothing. This method produces an
estimate of the mean response of Y at x by means of the following ratio:

$$\hat{\mu}(x) = \frac{\sum_{i=1}^{n} k\!\left(\frac{x - x_i}{b}\right) y_i}{\sum_{i=1}^{n} k\!\left(\frac{x - x_i}{b}\right)} \qquad (2)$$

where the kernel function k is a symmetric density (normally, the standardised
normal) and b is a constant that determines the size of the averaging operation;
its value represents a compromise between an estimate that is more or less
biased and one that contains a greater or lesser degree of variability. The
weights used for calculating the average of the response values decrease with
increasing distance from the point x.
b) The results obtained from the weighted local fit of a degree-p polynomial.
Several variations of this method have been developed, including loess and
locpoly, implemented in R, which differ from each other in the parameter used
for the smoothing process. Cleveland's local regression method (loess)
establishes a neighborhood at the point x, determining a proportion of points (the
span) to be used to estimate the mean response of Y at this point.
The loess function enables us to achieve a local fit that adapts more flexibly
to the trend of the data. It occupies an intermediate position between the global
fitting of a function (linear, quadratic, cubic, etc.) and local fitting based on
averaging the points (a calculated percentage of the total n) which constitute
the closest neighborhood to each fitted point. Higher span values correspond to
smoother curves.
The method consists, for each point x to be fitted, of performing a weighted
regression, whether linear or polynomial, on the proportion of points
closest to x that comprise the neighborhood in question. The weights, which
reflect the proximity to or distance from the point via the tri-cube function, are
assigned to each point of the neighborhood.
Given a point x_i of the neighborhood of x, let M(x) = max_j |x - x_j| be the
maximum distance over the points x_j of the neighborhood of x.
The weight of each point of the neighborhood is equal to

$$w(x_i) = \left(1 - \left(\frac{|x - x_i|}{M(x)}\right)^3\right)^3 \qquad (3)$$
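A minimal sketch of the tricube weighting (3), with an invented neighborhood:

```python
import numpy as np

def tricube_weights(x0, neighbours):
    """Tricube weights (3) over the points of the neighborhood of x0."""
    d = np.abs(x0 - neighbours)
    u = d / d.max()                  # M(x): largest distance within the neighborhood
    return (1.0 - u**3) ** 3

# invented neighborhood around x0 = 0
w = tricube_weights(0.0, np.array([-0.2, -0.1, 0.05, 0.2]))
assert w[0] == 0.0 and w[3] == 0.0   # points at the maximum distance get zero weight
assert np.argmax(w) == 2             # the closest point gets the largest weight
```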

The method implemented in R, locpoly, enables both a regression fit and a
density function to be obtained. In the process of fitting the local polynomial, it
utilises the kernel weights derived from a function k (normally the standardised
normal),

$$k\!\left(\frac{x_i - x}{b}\right), \qquad (4)$$

the values of which decrease as x_i becomes more distant from x.
The value of the estimated curve is equal to the intercept of the fitted local
polynomial, and is obtained by minimising the weighted sum of squares

$$\sum_{i=1}^{n} \left[y_i - \left(\beta_0 + \beta_1 (x_i - x) + \beta_2 (x_i - x)^2 + \ldots + \beta_p (x_i - x)^p\right)\right]^2 k\!\left(\frac{x_i - x}{b}\right). \qquad (5)$$

Assuming that the weights matrix at a point x is

$$W(x) = \mathrm{Diag}\left[k\!\left(\frac{x_i - x}{b}\right)\right] \qquad (6)$$
and that the design matrix evaluated at the point x is

$$X(x) = \begin{pmatrix} 1 & x_1 - x & \cdots & (x_1 - x)^p \\ \vdots & \vdots & & \vdots \\ 1 & x_n - x & \cdots & (x_n - x)^p \end{pmatrix}, \qquad (7)$$

the value fitted at x is the first term (corresponding to the intercept) of the vector
solution by weighted least squares:

$$\left(X(x)'W(x)X(x)\right)^{-1} X(x)'W(x)\,y. \qquad (8)$$
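Equations (5)-(8) translate directly into code; the sketch below (toy data and Gaussian kernel, both our own choices) exploits the fact that a local linear fit reproduces an exactly linear trend whatever the weights:

```python
import numpy as np

def local_poly_fit(x0, x, y, b, p=1):
    """Intercept of the kernel-weighted degree-p fit at x0 (eqs. (5)-(8))."""
    X = np.vander(x - x0, p + 1, increasing=True)   # columns 1, (x-x0), ..., (x-x0)^p
    w = np.exp(-0.5 * ((x - x0) / b) ** 2)          # kernel weights (4)
    XtW = X.T * w                                    # X'W with W = Diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ y)         # weighted least squares (8)
    return beta[0]                                   # fitted value = intercept

x = np.linspace(0.0, 1.0, 100)
y = 2.0 + 3.0 * x                                    # noiseless linear trend
assert abs(local_poly_fit(0.4, x, y, b=0.1) - 3.2) < 1e-8   # recovers 2 + 3 * 0.4
```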
c) The results obtained from the definition of a curve as a linear combination of
base functions that constitute powers of x.
The splines method defines a curve in terms of linear combinations of
functions of powers of x that constitute a base. These are made up of polynomial
fragments that are defined in regions which are separated by knots or cutoff
points a_1, ..., a_K.
This method may be considered an extension of standard linear regression.
Under linear regression, the estimated values derived from a polynomial
expression in x are obtained by

$$\hat{y} = X (X'X)^{-1} X' y = H y \qquad (9)$$

where X is the n×(p+1) matrix, for fitting a degree-p polynomial, whose columns
form the base {1, x, x², ..., x^p}, evaluated at the n points of
the sample.
The structure of the linear model can be generalised for the treatment of
non-linear, more complex structures, by including new functions in the above
base to represent truncated polynomials. For example, the degree-p spline with K
knots at a_k has the following parametric expression:

$$\mu(x) = \beta_0 + \beta_1 x + \ldots + \beta_p x^p + \sum_{k=1}^{K} a_k\,(x - a_k)_+^p \qquad (10)$$

where the truncated polynomial term

$$(x - a_k)_+^p = \begin{cases} (x - a_k)^p & \text{for } x > a_k \\ 0 & \text{otherwise} \end{cases} \qquad (11)$$

extends the base functions to {1, x, ..., x^p, (x - a_1)_+^p, ..., (x - a_K)_+^p}. In total there are
K+p+1 base functions, and this is described as a degree-p truncated power base of
the spline model.
For any set of knots, the curve can be estimated by least squares using
multiple regression on the base functions evaluated in the n values observed
in X.
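The truncated power base (10)-(11) can be sketched as follows; the knot and the piecewise-linear test function are illustrative:

```python
import numpy as np

def truncated_power_basis(x, knots, p):
    """Design matrix with base functions 1, x, ..., x^p, (x-a_k)_+^p (eqs. (10)-(11))."""
    cols = [x**j for j in range(p + 1)]
    cols += [np.clip(x - a, 0.0, None) ** p for a in knots]
    return np.column_stack(cols)

x = np.linspace(0.0, 1.0, 50)
y = np.minimum(x, 0.5)                           # piecewise-linear with a kink at 0.5
X = truncated_power_basis(x, knots=[0.5], p=1)   # linear spline base, one knot
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
assert X.shape[1] == 1 + 1 + 1                   # K + p + 1 base functions
assert np.allclose(X @ coef, y, atol=1e-10)      # y lies exactly in the spline space
```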
One base that is widely used is that of cubic splines, specifically, a series of
cubic polynomials grouped around certain values of x (the knots), {a_j}, such
that the curve is continuous and has continuous first and second derivatives. Each
spline is a third-degree polynomial function over the interval [a_j, a_{j+1}].
The scatter plot diagram may sometimes suggest the
approximate location of the knots, these being the points where the curve seems to
cross the trend line. The greater the number of knots, the greater the flexibility
of the curve. Nevertheless, an excessive number of knots may give an
impression of random fluctuations in the curve, thus obscuring the mean trend.
When there are many knots, and it is not straightforward to reduce this
number, their influence can be restricted by adopting a specific criterion, such as
the following:

$$\sum_{k=1}^{K} a_k^2 \leq \text{const.} \qquad (12)$$

In this case, rather than minimising

$$\left\| y - X \binom{\beta}{a} \right\|^2 \qquad (13)$$

we seek the solution to

$$\left\| y - X \binom{\beta}{a} \right\|^2 + \lambda\,(\beta', a')\,D \binom{\beta}{a} \qquad (14)$$

where D is the diagonal matrix in which the first p+1 elements are null and the
rest are ones. The solution is given by

$$\hat{y} = X (X'X + \lambda D)^{-1} X' y = S_\lambda\,y \qquad (15)$$
S_λ is termed a smoother matrix.
If lambda is zero, the case is unrestricted. If the knots cover the range of
values of Xj reasonably well, the fit approaches the interpolation of the data. A
very large value of lambda weakens the influence of the knots and the fit is
smoother. As the effect of the knots decreases, the results are closer to a
standard parametric regression, the shape of which depends on the degree of the
spline.
In practice, we seek a lambda that produces a curve that is reasonably close
to the data but which eliminates the superfluous variability. In general, logically,
a spline of the order of p=3, for example, is more flexibly adapted to the data
than is a linear spline, but if there are many knots and penalised splines are used,
the differences are imperceptible.
In addition to the base functions of truncated polynomials, others can be
used. Indeed, in practice, it tends to be more useful and easier to implement
certain bases which produce equivalent results. One of the main disadvantages
of truncated polynomial bases is their lack of orthogonality and the instability
that can arise when many knots are used.
Among the possible base functions are B-splines, which are useful because
of their property of numerical stability, in comparison with the base of series of
truncated powers, and natural splines, which are linear in the intervals
(-∞, a_1] and [a_K, ∞), where a_1 and a_K are the first and last (external) knots.
The spline functions implemented in the statistical software R, bs() and ns(),
generate the bases of B-splines and of natural splines, respectively, that can be
used in regression.
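A rough analogue of R's bs() is available in SciPy (BSpline.design_matrix, SciPy 1.8 or later); the cubic base below, with three illustrative interior knots of our own choosing, has K + p + 1 = 7 base functions:

```python
import numpy as np
from scipy.interpolate import BSpline

k = 3                                                   # cubic
interior = [0.25, 0.5, 0.75]                            # illustrative interior knots
t = np.r_[[0.0] * (k + 1), interior, [1.0] * (k + 1)]   # clamped knot vector on [0, 1]
x = np.linspace(0.0, 1.0, 11)
B = BSpline.design_matrix(x, t, k).toarray()

assert B.shape == (11, len(t) - k - 1)   # K + p + 1 = 3 + 3 + 1 = 7 base functions
assert np.allclose(B.sum(axis=1), 1.0)   # B-splines form a partition of unity
```

The partition-of-unity property, together with local support, is what gives the B-spline base its numerical stability relative to the truncated power base.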
By means of the smoothing spline method, a compromise can be obtained
between the degree of fit of the curve to the data and the smoothing of the shape.
It is not constructed explicitly, but rather is obtained as the solution to an
optimisation problem.
It is estimated using the criterion of penalised least squares, which
minimises the sum of the squared residuals and penalises them using the integral
of the squared second derivative, thus taking into account the degree of curvature
of the estimated function:

$$\sum_{i=1}^{n} \left[y_i - \mu(x_i)\right]^2 + \lambda \int \left[\mu''(x)\right]^2 dx \qquad (16)$$

It has been shown that the minimiser is a cubic spline with knots at the n
points x_i.
A cubic spline with knots at the n points x_i of the sample does not
interpolate the data if lambda is greater than zero. The greater or lesser value of
lambda controls the smoothing of the curve. A small lambda value gives rise to a
curve that fits closely or interpolates the sampled data, while a large value
produces a parametric fit that is dependent on the baseline functions of the
spline.
The goodness of the fit will depend on the degree of the polynomial
elements, on the number of knots and on the value of the lambda parameter that
is used for smoothing. This lambda value has a great influence on the results of
the fitting. By varying the lambda value, from lesser to greater, we can see, on a
two-dimensional figure, how the curve tracks a trend that is perhaps clearer but
at the cost of being less well adapted to the whole data set. The choice of the
most appropriate lambda value is a difficult one. There are automatic procedures

for this, which are based on the nature of the data, and one of the most
commonly used such procedures is that of cross validation (CV).
The cross validation technique consists of dividing the data set into two
parts: one that is used to estimate the model and another that enables us to make
a prediction. Thus, the values that are used for predicting do not play any part in
the fitting procedure. A particular case consists of reserving a single observation
for predicting, the rest (n-1) being used to estimate the model, in each of the n
partitions created.
Given the n values of the response Y: y_1, ..., y_n and the corresponding predicted
values, ŷ_{-1}, ..., ŷ_{-i}, ..., ŷ_{-n}, CV is defined as the sum of the squared residuals:

$$CV = \sum_{i=1}^{n} \left(y_i - \hat{y}_{-i}\right)^2 \qquad (17)$$

where ŷ_{-i} is the predicted value of the i-th case, when this case has not
been used to estimate the model.
In particular, given a lambda value and the predicted value at x_i in the non-
parametric regression curve, computed without the observation (x_i, y_i), which we
shall denote as $\hat{\mu}_{\lambda,(-i)}(x_i)$, the following definition may be made:

$$CV_\lambda = \sum_{i=1}^{n} \left(y_i - \hat{\mu}_{\lambda,(-i)}(x_i)\right)^2 \qquad (18)$$

What is chosen is the lambda value that minimises CV_λ.


In most statistical programs that implement this procedure, the fit is
obtained by specifying the degrees of freedom of the curve or by applying cross
validation.
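Leave-one-out cross validation (17)-(18) can be sketched for the kernel smoother of this section; the toy data and the two bandwidth values compared are our own choices:

```python
import numpy as np

def loo_cv(b, x, y):
    """Leave-one-out CV score (eqs. (17)-(18)) for a Gaussian-kernel smoother."""
    score = 0.0
    for i in range(x.size):
        w = np.exp(-0.5 * ((x[i] - x) / b) ** 2)
        w[i] = 0.0                                  # leave the i-th observation out
        score += (y[i] - np.sum(w * y) / np.sum(w)) ** 2
    return score

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 80)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)

# a moderate bandwidth should beat a heavily oversmoothing one
assert loo_cv(0.03, x, y) < loo_cv(0.3, x, y)
```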
The splines described above can be presented in the form

$$\hat{\mu} = S_\lambda\,y \qquad (19)$$

They are described as linear because they are linear functions of the data
vector y, where the matrix S_λ does not depend on y.
The lambda parameter is difficult to interpret, but a transformation of this
value, given by the trace of the matrix S_λ, also reflects the amount of smoothing
applied to the curve. Under standard (parametric) regression analysis, the trace
of the matrix H (the hat matrix) is equal to the number of parameters fitted, which is
equal to the degrees of freedom of the fit. In a similar way, the trace of S_λ can be
seen as a generalisation of this concept, being interpreted as "equivalent"
degrees of freedom of the fit.
254 F. Abad-Montes, M.D. Huete-Morales and M. Vargas-Jimenez

5. Estimating the probability density function


Estimation of the density using the kernel method is done by means of the expression:

\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} k\left( \frac{x - x_i}{h} \right)    (20)

estimating f(x) for a random sample x_1, ..., x_n, where k is a symmetric density function, for example, the standardised normal density. The value h is usually chosen small enough that excessive smoothing is not produced, thus avoiding the elimination of significant modes, but not so small as to allow too many random spikes. A large value of h would lead to an excessively biased estimate, while a small one would produce an estimate with too much variability. The choice of h is not immediate. Some authors have proposed comparing various solutions in order to determine an optimum value. The method implemented in R is that proposed by Sheather and Jones (1991).
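A minimal sketch of the estimator in Eq. (20), with a standard normal kernel and an illustrative, hand-picked h (the chapter relies on the Sheather-Jones bandwidth available in R); the sample here is simulated as a stand-in for the residuals:

```python
import numpy as np

def kernel_density(grid, sample, h):
    """Kernel estimate of f (Eq. 20) with a standard normal kernel k."""
    u = (grid[:, None] - sample[None, :]) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return k.sum(axis=1) / (len(sample) * h)

rng = np.random.default_rng(1)
residuals = rng.normal(0.0, 0.01, 200)       # stand-in for the H-P residuals
grid = np.linspace(-0.04, 0.04, 201)
dens = kernel_density(grid, residuals, h=0.004)
mass = dens.sum() * (grid[1] - grid[0])      # total mass, should be close to 1
print(mass)
```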
The following figures show an initial approximation of the density function.

Figure 6. Distributions of residuals by periods

Although the sample size is small, we can see the high degree of similarity
in the pattern of the probability density function in each period, with similar
ranges of variability and similar function shapes.
A graphic examination of the distribution, according to the age of the
subject, reveals patterns that are much more varied.

[Four panels: All ages; Age 39; Age 64; Age 69; axis: Residual H-P]

Figure 7. Distributions of residuals by ages

The above figure shows various shapes and differing ranges of variability in
the density functions that were estimated for different age values.

6. Statistical inference based on the empirical distribution


It is of interest to estimate aspects of the probability distribution F of the
residuals, based on a sample of size n. The estimated empirical distribution of F

is the discrete distribution with a probability 1/n associated with each sample
value. This plays the role of a fitted model when no mathematical shape is
assumed for F.
To proceed with the statistical inference, here we assume a non-parametric
model with a sample of independent and identically distributed observations of
an unknown distribution F. In a parametric model, the estimator has a parametric
distribution, while in the non-parametric situation, we work with an empirical
distribution function. In the methods described below, we make use of
simulation to estimate the quantities of interest. The aim of this is to explore the
sample distribution of the mean and the variance as estimators of the mean value
and the variance of the residual associated with a particular age. The utility of
the bootstrap procedure is greater in cases for which there is no theoretical
knowledge of the distributions of the values.

6.1. Bootstrap
These methods are applied both when the probability models are well
defined and when they are not. One of the greatest proponents of the bootstrap
method of simulation is Efron. Based on the sample data, it is possible to make
an inference regarding certain aspects of the distribution.
Thus it is possible to explore, in a relatively straightforward way, the sample
distribution of the estimator of a parameter, for which we cannot a priori assume
any given model.
Let us assume that the parameter θ is estimated from the sample x = (x_1, ..., x_n), from which we calculate the value of interest t(x). The bootstrap sample x* = (x_1*, ..., x_n*) is then obtained by sampling n values with replacement from the observed sample. For each bootstrap sample, we obtain the corresponding replica of the statistic t(x*).
The bootstrap procedure consists of selecting B samples of size n with
replacement of the original sample x, and estimating the value t(x*) for each one
of these.
One of the most interesting values for measuring the accuracy of a statistical
measure in making an inference is the standard error associated with the
estimation. In this context, it is obtained as the standard deviation of the B
replicas of the bootstrap value corresponding to the B samples selected with
replacement.

\widehat{se}(t(x)) = \sqrt{ \frac{ \sum_{b=1}^{B} \left[ t(x^{*b}) - \bar{t}(x^*) \right]^2 }{B-1} }    (21)

where

\bar{t}(x^*) = \frac{ \sum_{b=1}^{B} t(x^{*b}) }{B}    (22)
The bias is estimated as the difference between the mean of the bootstrap
distribution and the value observed in the original sample.
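The scheme in Eqs. (21)-(22), together with the bias estimate, can be sketched on simulated stand-in data (B = 1000):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0005, 0.001, 25)            # stand-in residuals for one age
t_obs = x.mean()                             # t(x): the statistic of interest

B = 1000
t_star = np.empty(B)
for b in range(B):
    resample = rng.choice(x, size=x.size, replace=True)  # bootstrap sample x*
    t_star[b] = resample.mean()                          # replica t(x*)

se_boot = t_star.std(ddof=1)                 # standard error, Eq. (21)
bias_boot = t_star.mean() - t_obs            # bootstrap estimate of the bias
print(se_boot, bias_boot)
```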
Here, in particular, we are interested in the mean value of the residuals for
each age value, together with the variance or standard deviation as a measure
of dispersion.
One of our goals is to calculate the approximate distributions of the mean
and the standard deviation of the residuals for different ages. We wish to study
the differences there may be between the behaviour patterns of the residuals
derived from the fits, using a non-parametric analysis, that is, one based on the
pattern of the empirical distribution or the non-parametric estimation of F.
The graphic representation of the distributions of the estimators, in turn, allows us to see whether the distribution is symmetric or skewed. The graphic
representation of the estimate of the probability density function for each age
enables us to make visual comparisons.
The various methods of constructing confidence intervals also constitute a
powerful inferential tool.

6.2. Density (kernel) function of statistical values obtained with the bootstrap method
Sometimes it is useful to represent the density function of the estimator in
order to study the differences with respect to the normal model, for example the
mode or modes, and the symmetry. The histogram constructed using the
distribution of the bootstrap values gives us an overall idea of the shape. A more
refined method is that of estimating the density function.
One of the most commonly used methods is that of the kernel function,
which can be estimated by
\hat{f}(t) = \frac{1}{Bh} \sum_{b=1}^{B} k\left( \frac{t - t^{*b}}{h} \right)    (23)

where k is the standard normal density function. As observed above, the value h determines the degree of smoothing of the estimated function, and the selection of this parameter is more important than that of the k function; its choice is a crucial element in the estimation process. A value that is too high could mask possible modes, producing too much smoothing of the shape of the function; one that is too low could produce a behaviour pattern with multiple spikes, possibly a chance occurrence. For this type of estimation, it is recommended that the number of bootstrap samples should be quite large (1000 or more).

[Two panels: Mean, age 54; Standard deviation, age 54]
Figure 8. Histogram and density of bootstrap distributions (means and standard deviation: age 54)

[Two panels: bootstrap distribution (Mean); bootstrap quantiles against normal quantiles]
Figure 9. Bootstrap distributions (histogram, density, and quantiles of means: age 75)

Figure 10. Bootstrap distributions of mean residuals for various ages


Figure 11. Bootstrap distributions of standard deviations of residuals for various ages

6.3. Bootstrap confidence intervals


As we are unaware of the theoretical distribution of the residuals, we shall use bootstrap techniques to construct confidence intervals for a parameter θ whose estimate t(x) is evaluated in the observed sample.
Among the best-known such techniques are the following:
Standard normal bootstrap interval: this is the simplest, being obtained from the estimate t(x) in the original sample, adding and subtracting the product of the bootstrap standard deviation and the α/2 order quantile of the standard normal:

t(x) ± z_{α/2} · (bootstrap standard deviation)

The percentile interval: this is obtained from the α/2 and 1−α/2 order quantiles of the bootstrap distribution formed by the B bootstrap values of the parameter in question:

1 − α = Pr[ t(x*)_{(α/2)} ≤ θ ≤ t(x*)_{(1−α/2)} ]

Another interval based on percentiles is the so-called basic interval, which is obtained from

1 − α = Pr[ 2t(x) − t(x*)_{(1−α/2)} ≤ θ ≤ 2t(x) − t(x*)_{(α/2)} ]
For the basic interval, an appropriate transformation (for example, the logarithmic transformation when estimating a standard deviation) can improve the limits to a certain extent; this contrasts with the percentile interval, which respects such transformations. Differences between the two may appear in the case of asymmetric distributions.
Note: a greater number of bootstrap replicas are required than are used to determine the mean and the standard deviation, because of the need to estimate the percentiles of the bootstrap distribution. The value normally taken is B = 1000 or more.
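The first three intervals can be sketched directly from the bootstrap replicas (simulated stand-in data; the quantile orders follow the definitions above):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.0005, 0.001, 25)
t_obs = x.mean()

B = 2000
t_star = np.array([rng.choice(x, x.size, replace=True).mean()
                   for _ in range(B)])

alpha = 0.10                                # 90% confidence level
z = 1.645                                   # standard normal quantile z_{alpha/2}
se = t_star.std(ddof=1)
normal_ci = (t_obs - z * se, t_obs + z * se)

lo, hi = np.quantile(t_star, [alpha / 2, 1.0 - alpha / 2])
percentile_ci = (lo, hi)                    # percentile interval
basic_ci = (2.0 * t_obs - hi, 2.0 * t_obs - lo)   # basic interval
print(normal_ci, percentile_ci, basic_ci)
```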
Other, improved, versions include:
t-intervals. These are useful for statistical measures such as the mean (in
general, for statistical measures of location). The idea is to imitate a Student-t
measure to overcome our ignorance of the standard deviation when an inference
is made concerning the mean. These intervals require us to estimate the variance
of the statistic for each bootstrap sample. The interval is based on the
Studentised statistic.
BCa intervals. Intended to correct bias. These, too, are calculated from percentiles of the distribution of the B bootstrap replicas of the statistic, but while the percentile intervals directly use the α/2 and 1−α/2 order quantiles to define the extreme values of the confidence interval, those employed in BCa are obtained by first deriving new orders α1 and α2 for the quantiles of the distribution; the values of these depend on two constants termed acceleration, a, and bias correction, z_0, and are estimated from the bootstrap values (Efron and Tibshirani, 1993).

The following results show the confidence intervals for the mean of the
residuals for ages 37, 54 and 75 years.

Table 1. 37 years
Level Normal Basic Percentile BCa
90% (-0.0001, 0.0001) (-0.0001, 0.0001) (-0.0001, 0.0001) (-0.0001, 0.0001)
95% (-0.0001, 0.0001) (-0.0001, 0.0001) (-0.0001, 0.0001) (-0.0001, 0.0001)

Table 2. 54 years
Level Normal Basic Percentile BCa
90% (0.0003, 0.0006) (0.0003, 0.0006) (0.0003, 0.0006) (0.0003, 0.0006)
95% (0.0003, 0.0006) (0.0003, 0.0006) (0.0003, 0.0006) (0.0003, 0.0007)

Table 3. 75 years
Level Normal Basic Percentile BCa
90% (-0.0021,-0.0008) (-0.0021,-0.0008) (-0.0021,-0.0009) (-0.0021,-0.0009)
95% (-0.0022, -0.0007) (-0.0022, -0.0007) (-0.0022, -0.0008) (-0.0022, -0.0007)

Bootstrap intervals for standard deviation: ages 37, 54, 75 years

Table 4. 37 years
Level Normal Basic Percentile BCa
90% (0.0001, 0.0002) (0.0001, 0.0002) (0.0001, 0.0002) (0.0001, 0.0002)
95% (0.0001, 0.0002) (0.0001, 0.0002) (0.0001, 0.0002) (0.0001, 0.0002)

Table 5. 54 years
Level Normal Basic Percentile BCa
90% (0.0003, 0.0006) (0.0003, 0.0006) (0.0003, 0.0006) (0.0003, 0.0007)
95% (0.0003, 0.0006) (0.0003, 0.0006) (0.0003, 0.0006) (0.0003, 0.0007)

Table 6. 75 years
Level Normal Basic Percentile BCa
90% (0.0016, 0.0025) (0.0016, 0.0025) (0.0014, 0.0024) (0.0016, 0.0026)
95% (0.0015, 0.0026) (0.0015, 0.0026) (0.0013, 0.0025) (0.0015, 0.0027)

6.4. Diagnostic figure of the specific effect of each observation (jackknife-after-bootstrap)

Jackknife
One of the most commonly used methods of estimating the bias and error of
an estimator is known as the jackknife. This technique was proposed by Tukey
and is less computationally intensive than the bootstrap method. With this technique it is also possible to make inferences in situations in which little population information is available.

For i = 1, ..., n, the i-th jackknife sample, denoted by x(-i), is obtained by eliminating the i-th element x_i from the observed sample. The i-th replica of the statistic t(x) is the partial estimator evaluated in this sample, t(x(-i)); it therefore uses the empirical distribution of the n−1 points in x(-i). Thus, we obtain the following set of pseudovalues, which represent a new sample:

t*(x(1)), t*(x(2)), ..., t*(x(n)), where t*(x(i)) = n·t(x) − (n−1)·t(x(-i)) for i = 1, ..., n
The jackknife estimator of θ is obtained as the mean of these pseudovalues:

t^*(x) = \frac{ \sum_{i=1}^{n} t^*(x(i)) }{n}    (24)

The variance of the jackknife estimator is obtained in a similar way to that used to derive the variance of a sample mean:

\widehat{V}(t(x)) = \frac{ \sum_{i=1}^{n} \left[ t^*(x(i)) - t^*(x) \right]^2 }{n(n-1)}    (25)
Jackknife influence values are established for the n values of the sample as the differences t(x(-i)) − t(x), for i = 1, ..., n.
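A compact sketch of Eqs. (24)-(25) on simulated data (for the sample mean, the pseudovalues reduce to the observations themselves, so the jackknife estimate coincides with the sample mean):

```python
import numpy as np

def jackknife(x, stat):
    """Jackknife estimate and variance from the pseudovalues
    t*(x(i)) = n*t(x) - (n-1)*t(x(-i))  (Eqs. 24-25)."""
    n = len(x)
    t_full = stat(x)
    pseudo = np.array([n * t_full - (n - 1) * stat(np.delete(x, i))
                       for i in range(n)])
    t_jack = pseudo.mean()                    # Eq. (24)
    var_jack = pseudo.var(ddof=1) / n         # Eq. (25)
    return t_jack, var_jack

rng = np.random.default_rng(4)
x = rng.normal(0.0, 0.001, 30)                # stand-in residuals
t_jack, var_jack = jackknife(x, np.mean)
print(t_jack, var_jack)
```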
The techniques known as "jackknife-after-bootstrap" consist of applying the jackknife to the results generated by the bootstrap method. One means of checking or diagnosing the degree of influence of a given observation x_i of the sample on the value of the statistic t used in the bootstrap is the jackknife-after-bootstrap figure. This method enables us to detect the changes produced in the empirical quantiles of t* − t if an observation x_i is eliminated from the sample. Specifically, we construct a figure with various quantiles (such as 0.05, 0.10, 0.16, 0.5, 0.84, 0.9, 0.95) that are determined using the bootstrap with all the values of the original sample and represented by horizontal lines. Each of the n points x_i of the sample is represented with an abscissa equal to the corresponding value of empirical influence (for example, the jackknife value obtained by regression) and an ordinate equal to the difference between the quantile obtained with the complete bootstrap simulation and the quantile obtained with the simulations from which x_i is absent.
Note: The influence function or influence component can be considered a type of derivative that reflects the change in t(F) when the distribution F is subjected to a small contamination in x. These values are useful for determining

the approximate variance of a statistic taking into account that such a statistic
may be a kind of first order expansion of a Taylor series (for more information,
see Efron and Tibshirani, 1993, pp. 298-302).

[Two panels: Mean, age = 54; Stand. Dev., age = 54; abscissa: standardized jackknife value]

Figure 12. Jackknife-after-bootstrap (mean and stand. dev., age = 54)

The figure highlights the noticeable effect of Observation 19. The sensitivity of bootstrap techniques to anomalous values makes it advisable for these to be eliminated.

7. Aspects of inference in non-parametric regression

7.1. Confidence intervals in splines


As in parametric regression, questions may be raised concerning inference
on the fitted curve. Specifically, we are interested in obtaining confidence
intervals for values fitted on the curve.
Confidence and prediction intervals for fitted values
Given the model

y = μ(x) + ε    (26)

and assuming that, for a given smoothing parameter, the non-parametric regression curve can be stated in the linear form

\hat{\mu} = S y    (27)
The covariance matrix of the fitted vector \hat{\mu} is Cov(\hat{\mu}) = S S' σ².
Given an estimator σ̂² of the residual variance, we can obtain an estimate of this matrix, whose diagonal elements represent the estimated variances of the components of the vector \hat{\mu}, by merely replacing σ² with σ̂² in the expression of the above covariance matrix. Thus it is possible to derive confidence intervals in a similar way to parametric regression, as well as prediction intervals for new estimations of the dependent variable.
If the errors ε in the model y = μ(x) + ε are normal with constant variance σ², the intervals are defined by:

\hat{\mu}(x_0) \pm z_{1-\alpha/2}\, \hat{\sigma}_{\hat{\mu}(x_0)}    (28)

where σ̂_{μ̂(x_0)} is the standard deviation of the value fitted at x_0, \hat{\mu}(x_0), obtained as the square root of the estimated variance \widehat{V}(\hat{\mu}(x_0)) = s_{x_0}' s_{x_0} \hat{\sigma}^2, where s_{x_0}, the row of S corresponding to x_0, defines the linear combination of values of y such that \hat{\mu}(x_0) = s_{x_0}' y.
For a small sample size, the normal value may be replaced by the Student t; the degrees of freedom, rounded to the closest integer, correspond to the residual part of the fitted model. If the errors are not normal
and if n is large enough, the intervals given above may continue to be valid,
because of the central limit theorem.
The prediction intervals are also derived in a similar way to the parametric
regression, that is, by means of

\hat{\mu}(x_0) \pm z_{1-\alpha/2} \sqrt{ \hat{\sigma}^2 + \widehat{V}(\hat{\mu}(x_0)) }    (29)


Logically, these are broader because they reflect, additionally, the
uncertainty in the observation about its mean.
The estimated value σ̂² of the residual variance is obtained in a similar way to parametric regression, as the ratio of the sum of the squares of the residuals (SSR) to the associated degrees of freedom. In parametric regression, the expected value of the SSR is equal to the product (n − p)σ², where p is the number of parameters in the model. Thus, we obtain as the estimator of the variance of the residuals the ratio

\hat{\sigma}^2 = \frac{SSR}{n - p}    (30)
In non-parametric regression, it can be shown that the expectation of the sum of the squares of the residuals is approximately

E(SSR) \approx \sigma^2 \left[ \mathrm{trace}(SS') - 2\,\mathrm{trace}(S) + n \right]    (31)

and we obtain as the estimate of the variance of the residuals the following ratio:

\hat{\sigma}^2 = \frac{SSR}{df_{resid}} = \frac{SSR}{n - 2\,\mathrm{trace}(S) + \mathrm{trace}(SS')}    (32)
In fact, the intervals constructed in this way are not really confidence intervals for μ(x_0), but for the expected value of μ̂(x_0). They can only be interpreted as confidence intervals for μ(x_0) if there is no inherent bias in the regression curve, and this is very difficult to detect. Therefore, it is more appropriate to use the term bands of variability, and a value of 2 is normally used for z. In practice, such intervals are usually interpreted as confidence intervals, because the bias is usually small compared to the variability, and can therefore be ignored.
It should be remembered that these intervals cannot be interpreted as
descriptors of the global characteristics of the entire curve, as they only reflect
the behaviour pattern of each point that is estimated.
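Under the assumption that the smoother is linear, μ̂ = S y, the bands can be sketched numerically; in this illustration (not from the chapter) a kernel smoother stands in for the spline and the data are simulated:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)

# Linear smoother mu_hat = S y (kernel smoother standing in for the spline)
h = 0.08
W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
S = W / W.sum(axis=1, keepdims=True)        # rows sum to one
mu_hat = S @ y

# Residual variance, Eq. (32)
ssr = np.sum((y - mu_hat) ** 2)
df_resid = n - 2.0 * np.trace(S) + np.trace(S @ S.T)
sigma2 = ssr / df_resid

# Pointwise bands mu_hat +/- 2 sd, using Cov(mu_hat) = S S' sigma^2
sd_fit = np.sqrt(np.diag(S @ S.T) * sigma2)
lower, upper = mu_hat - 2.0 * sd_fit, mu_hat + 2.0 * sd_fit
print(lower[:3], upper[:3])
```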
The following figure shows the variability bands of the fitted curve:

[Plot: Confidence intervals (2 stand. dev.) against Age]

Figure 13. Variability bands of the fitted curve

7.2. Confidence intervals (bootstrap procedure)


As remarked above, one of the problems inherent in fitting a non-parametric
curve is that of bias. For the above-constructed intervals to be interpreted as
confidence intervals it is necessary for the fitted curve to be free of bias.
In constructing confidence intervals, various authors have proposed more or
less complex methods to take the above consideration into account. A strategy is
also employed in bootstrap methods, for example that of using a smoothing
parameter that is small enough to simulate the residuals of the fit and thus reduce
the degree of bias, but describing a curve with a shape that is less smoothed than
that corresponding to an optimum smoothing parameter derived from a cross-
validation criterion.

In a similar way to the use of bootstrap methods in standard regression, here we carry out a bootstrap simulation of splines fitted to the residuals as a function of age, in order to determine the mean expected residual for a given age.
To do so, we take 999 bootstrap samples, in order to obtain confidence
intervals for the values fitted at given ages. Specifically, once we have fitted a spline to the data, and in order to take into account the problem of bias that is inherent to these non-parametric regression methods, the following simulation scheme is adopted.
From the optimum smoothing parameter derived from the fit to the data of the original sample, we obtain a spline that produces a greater degree of smoothing in the model, using double the original parameter, and whose estimations therefore present less variability. Moreover, we determine another spline that produces greater variability in the estimations, and therefore reduces the bias.
The residuals derived from this latter spline (with greater variability), sampled with replacement, are used to generate the new sets of responses, being added to the values fitted with the spline obtained with the doubled smoothing parameter.
The following figure shows the confidence intervals resulting from the
simulation using this strategy, which enables us to alleviate the problem of bias.
The intervals were determined for certain ages. Note that in some of these
the interval does not contain the value zero.

Age: 37, 54, 75


Fitted value: 3.427417e-05, 3.125760e-04, -1.132426e-03
Lower limit: -0.0001572603, 0.0001310923, -0.0013644013
Upper limit: 0.0001851737, 0.0004741195, -0.0009346646
Confidence level: 90%

Figure 14. Confidence intervals for values fitted with splines

7.3. Comparison of linear and non-parametric models


It can be seen, moreover, that the proposed model with a non-parametric curve produces a better fit than does the linear model, and the test comparing the two models is highly significant.
The following results show that replacing a straight line by the smooth curve gives rise to a significant reduction in the residual part of the model, which demonstrates that the residuals present a non-linear dependence on age, corroborating the earlier graphical studies.

The test statistic used follows approximately an F distribution. Thus, given the following models:
Model 1: linear model that expresses the residuals as a function of age;
Model 2: smoothing spline.
The statistic is given by the ratio

F = \frac{ (SSR_1 - SSR_2)/(g.l._2 - g.l._1) }{ SSR_2/(n - g.l._2) } \sim F_{g.l._2 - g.l._1,\; n - g.l._2}    (33)

where SSR_1 and SSR_2 are the sums of the squares of the residuals in Models 1 and 2, respectively, and g.l._1 and g.l._2 are the corresponding degrees of freedom. Thus, we obtain a significance of the order of 2.2e-16.
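A sketch of the comparison in Eq. (33) on simulated residuals with a non-linear age trend; a linear kernel smoother stands in for the smoothing spline, with trace(S) as its equivalent degrees of freedom (the p-value would then be read from the F distribution with the indicated degrees of freedom):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 80
age = np.linspace(0.0, 90.0, n)
resid = 1e-3 * np.sin(age / 15.0) + rng.normal(0.0, 2e-4, n)  # non-linear trend

# Model 1: straight line in age
X = np.column_stack([np.ones(n), age])
beta, *_ = np.linalg.lstsq(X, resid, rcond=None)
ssr1, df1 = np.sum((resid - X @ beta) ** 2), 2.0

# Model 2: linear smoother standing in for the smoothing spline
h = 8.0
W = np.exp(-0.5 * ((age[:, None] - age[None, :]) / h) ** 2)
S = W / W.sum(axis=1, keepdims=True)
ssr2 = np.sum((resid - S @ resid) ** 2)
df2 = np.trace(S)                           # equivalent degrees of freedom

# Eq. (33)
F = ((ssr1 - ssr2) / (df2 - df1)) / (ssr2 / (n - df2))
print(F, df2 - df1, n - df2)
```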

8. Brief review of generalised linear models, additive models and generalised additive models
Generalised linear models
Generalised linear models (GLM) are an extension of linear models, and their characteristics enable the application of a unified statistical approach, based on the common structure they share.
The linear model with normal errors, Y = Xβ + ε, is extended to responses with other distributions within the exponential family, which also includes the normal distribution, to facilitate the modelling of variables with densities belonging to this family, whether discrete, such as the Poisson and binomial, or continuous, such as the normal and gamma.
The GLM consists of a random component, Y, a systematic component or linear predictor, η = Xβ, and a link function, g, to relate them.
The distribution of Y has a density function of the following shape:

f_Y(y; \theta, \Phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\Phi)} + c(y, \Phi) \right\}    (34)

θ is called the natural parameter, while Φ is the dispersion parameter.
The link function, g, relates the mean of Y, μ = E(Y), to the linear predictor η = Xβ by means of g(μ) = Xβ.
The solution for the parameters β of the model is obtained by maximum likelihood estimation. The solution, derived from applying weighted least squares to a new response variable Z, is given by:

\hat{\beta} = \left[ X' \,\mathrm{Diag}\!\left( \frac{1}{Var(z_i)} \right) X \right]^{-1} X' \,\mathrm{Diag}\!\left( \frac{1}{Var(z_i)} \right) Z    (35)

where the dependent variable Z is:

Z = X\beta + \mathrm{Diag}\big( g'(\mu_i) \big)(y - \mu)    (36)

(this can be considered a first-order approximation to a Taylor series), and the variance is expressed as:

Var(Z) = \Phi \,\mathrm{Diag}\big( V(\mu_i)\,[g'(\mu_i)]^2 \big)    (37)

with the weights being determined from the inverse values of the variance.
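For the Poisson family with log link (V(μ) = μ, g'(μ) = 1/μ), the iteration of Eqs. (35)-(37) can be sketched as follows (simulated data; the true coefficients are 0.5 and 1.2):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x = rng.uniform(0.0, 2.0, n)
y = rng.poisson(np.exp(0.5 + 1.2 * x))      # Poisson responses, log link

X = np.column_stack([np.ones(n), x])
beta = np.array([np.log(y.mean() + 1.0), 0.0])   # crude starting value
for _ in range(25):
    eta = X @ beta
    mu = np.exp(eta)                        # inverse link
    z = eta + (y - mu) / mu                 # working response Z, Eq. (36)
    w = mu                                  # 1/Var(z_i); V(mu)=mu, g'(mu)=1/mu
    beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))  # Eq. (35)

print(beta)
```

The iteration converges to the maximum likelihood estimate of β.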
Additive models
In an additive model, the linear terms β_j X_j in the expression representing the linear predictor Xβ of the linear model are replaced by smooth, non-linear functional terms f_j(X_j). In this sense, the additive model can be considered an extension of the linear model. It is described by the following expression:

Y = \alpha + f_1(X_1) + \ldots + f_p(X_p) + \varepsilon    (38)
The f_j functions, like the regression coefficients in the linear regression model, describe the effects of each independent variable, and it is important to detect whether their inclusion in the model significantly improves it or not.
Once the model has been fitted, the additivity property of the effects enables
us to examine and to evaluate separately the particular way in which each
variable affects the response.
We have seen, in the two-dimensional case (X, Y), how to find a smoothed function that adapts to the trend of a set of two-dimensional data (x_i, y_i). Forms of non-parametric regression such as smoothing splines can now be considered candidates to represent, simultaneously, the effects produced on a variable Y in a model with multiple independent variables X_1, ..., X_p.
The problem now encountered is that of finding the smoothing parameters simultaneously. The most commonly adopted method consists of estimating each term with its own smoothing parameter.
An iterative solution has been proposed to fit these f_j functions, namely the backfitting algorithm. In general terms, the reasoning underlying this is based on carrying out two-dimensional fits, such as smoothing splines or local regressions, to two-dimensional data that are successively generated.
If we assume, in principle, that the model is correct, that is, if Y = α + f_1(X_1) + ... + f_p(X_p) + ε and the corresponding f_j terms (j = 1, ..., p) are optimal, then it is acceptable to assume that the expectation of the residuals derived from subtracting from the response the sum of all the terms except the j-th one will be equal to f_j:

E\{R_j\} = E\big\{ Y - [\alpha + f_1(X_1) + \ldots + f_{j-1}(X_{j-1}) + f_{j+1}(X_{j+1}) + \ldots + f_p(X_p)] \big\} = f_j(X_j)    (39)

Therefore, (X_j, R_j) would be well represented by a non-parametric regression curve of the type described. In practice, we begin with an initial solution (a non-parametric curve for each term) and then iteratively obtain new estimations for each f_j, fitting non-parametric curves \hat{f}_j to the partial residuals R_j, which are updated at each step, eliminating the effects of all the other variables from Y before performing the smoothed fit:

\hat{f}_j(X_j) = f(R_j) = f\big( Y - [\hat{\alpha} + \hat{f}_1(X_1) + \ldots + \hat{f}_{j-1}(X_{j-1}) + \hat{f}_{j+1}(X_{j+1}) + \ldots + \hat{f}_p(X_p)] \big)    (40)

where \hat{f}_j(X_j) = f(R_j) is a smoothed non-parametric regression curve for the response R_j on the independent variable X_j. The process ends when the solution stabilises.
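A minimal sketch of backfitting with two smooth terms, using kernel smoothers for the two-dimensional fits (simulated data; each term is centred at every step for identifiability):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 150
x1 = rng.uniform(-1.0, 1.0, n)
x2 = rng.uniform(-1.0, 1.0, n)
y = np.sin(np.pi * x1) + x2 ** 2 + rng.normal(0.0, 0.1, n)

def smooth(x, r, h=0.15):
    """Two-dimensional kernel smoother of partial residuals r on x."""
    W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (W / W.sum(axis=1, keepdims=True)) @ r

alpha = y.mean()
f1 = np.zeros(n)
f2 = np.zeros(n)
for _ in range(20):                        # iterate until the solution stabilises
    f1 = smooth(x1, y - alpha - f2)        # fit to partial residuals R_1
    f1 -= f1.mean()                        # centre the term
    f2 = smooth(x2, y - alpha - f1)        # fit to partial residuals R_2
    f2 -= f2.mean()

resid = y - alpha - f1 - f2
print(np.std(resid))
```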
Generalised additive models
In a similar way to the extension of the linear model to additive models, we can consider that of the generalised linear model (GLM) to the generalised additive model (GAM), assuming, instead of the systematic component or linear predictor η = α + β_1X_1 + ... + β_pX_p with link function g(μ) = Xβ, a non-linear component of the form α + f_1(X_1) + ... + f_p(X_p).
Broadly speaking, the fitting of a generalised additive model is based upon fitting a GLM by means of an iterative process of weighted least squares, substituting the steps concerning the weighted fits of parametric linear regression with steps of non-parametric additive regression, after having modified the algorithm to fit a weighted additive model.

9. Deriving the density function by fitting a generalised additive model (GAM)
Some authors have proposed the strategy of estimating the density function
of a variable X by means of regression analysis. In this context, the independent
variable is comprised of the points in the range of values observed in X that
represent the mean points of the rectangles making up the histogram that
describes the data sample, with intervals of equal amplitude, a. The dependent
variable is formed by the corresponding heights of the rectangles, obtained as the

ratios between the numbers of observations in each interval and the corresponding amplitude of the latter.
Given the sample size, n, it can be assumed that the number of observations lying within the i-th interval follows a binomial model B(n, p_i), where p_i is estimated by the ratio between the number of observations in the interval and n. For a large n and a small p_i, the binomial model can be approximated by a Poisson model, and so we may assume that for the centre of the i-th interval, x_i, we have the value of the variable Y = n_i = the number of observations in interval i.
Therefore, a generalised additive regression model can be applied to the set of data (x_i, n_i).
The generalised linear model

\log(\mu_i) = \beta_0 + \beta_1 x_i    (41)

is not flexible enough to fit the density curve. If, instead of the linear expression as a function of x_i, we take a curve s(x_i), such that

\log(\mu_i) = s(x_i)    (42)

the resulting fitted curve enables us to obtain the frequencies estimated for each interval, from which we can derive the corresponding heights of the rectangles of the histogram, by dividing by the product of the sample size and the amplitude of the interval.

\hat{f}(x_i) = \frac{\hat{n}_i}{n \cdot a}    (43)

where a is the amplitude of the interval.
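A sketch of the procedure under a simplifying assumption: a quartic polynomial is used for s(x) in Eq. (42) in place of a smoothing spline, fitted by the Poisson weighted least squares iteration of Section 8, and the density is recovered via Eq. (43); the sample is simulated:

```python
import numpy as np

rng = np.random.default_rng(9)
sample = rng.normal(0.0, 1.0, 5000)

# Histogram counts n_i at interval midpoints x_i
counts, edges = np.histogram(sample, bins=40)
mids = 0.5 * (edges[:-1] + edges[1:])
a = edges[1] - edges[0]                    # common interval amplitude
n = len(sample)

# Poisson fit of log(mu_i) = s(x_i), with s a quartic polynomial (Eq. 42)
X = np.vander(mids, 5)                     # columns x^4, ..., x, 1
beta = np.zeros(5)
beta[-1] = np.log(counts.mean() + 1.0)     # start from a constant fit
for _ in range(50):
    mu = np.exp(X @ beta)
    z = X @ beta + (counts - mu) / mu      # working response
    beta = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (mu * z))

f_hat = np.exp(X @ beta) / (n * a)         # heights of the rectangles, Eq. (43)
print(f_hat.max())
```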
To achieve acceptable results, the sample size and the number of intervals must be large. Although the procedure is of most interest for statistics for which it is difficult to identify the shape of the density function, especially in cases where the curve may present several modes and perhaps skewed behaviour patterns, we shall apply it here to show, for example, the distribution of the data resulting from a bootstrap simulation of the sampling means of the H-P residuals recorded at 64 years of age.

In total, there were 9999 values of the means of the H-P residuals, from
which we obtained a frequency table of 100 intervals of equal amplitude, of
approximately a=0.00001.

The following figure shows the results obtained.



[Two panels: Distribution (x_i, n_i) with non-parametric component (6 d.f.); Density function; axis: Mean]

Figure 15. Results for a bootstrap simulation of the sampling means of H-P residuals (64 years)

In the next figure, we compare the density obtained by the above-described procedure with the density function corresponding to a normal distribution whose mean and variance coincide with those of the data.
276 F. Abad-Montes, M.D. Huete-Morales and M. Vargas-Jimenez

Figure 16. Density functions of the mean: the generalised additive model fit compared with the normal fit.

10. Approximation of the distribution using the saddlepoint method


In practice, various methods have been applied to approximate the
distribution of a statistic. One of the most commonly used is
the normal approximation, although this does not always provide accurate
results. A useful tool for describing distributions, and one that is sometimes
employed, is the so-called saddlepoint technique, which is based on the
cumulant generating function and generally provides good approximations
even with small samples and in the tails of distributions.

Although this technique was first used in the 1930s, it has recently
become popular again as a means of approximating the density function. It is, in
fact, a refinement of the Edgeworth expansion, which is frequently used to
approximate an unknown distribution for which the moments are known. This
technique gives good results in the centre of the distribution but sometimes
leaves much to be desired in the tails, and can even give negative results for the
density in such zones.
The derivation of the density function and the distribution function is based
on the cumulant generating function K(t) and on its first two derivatives with
respect to t, K'(t) and K''(t). It therefore requires the cumulant generating
function to have a known, manageable form, which means the method cannot
always be used in practice. Moreover, the so-called saddlepoint equation must
be solved numerically for each value of the variable of interest.
The cumulant generating function K(t) of a variable X is given by the
logarithm of the moment generating function m(t):
m(t) = E(e^{tX}) = ∫ e^{tx} f(x) dx    (44)

K(t) = log{E(e^{tX})} = log m(t)    (45)
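As a quick numerical check of (44) and (45) (a Python sketch; the choice of a Poisson variable with λ = 3 is an illustrative assumption), the derivatives of K(t) at t = 0 recover the first two cumulants, which for a Poisson variable both equal λ:

```python
import math

lam = 3.0

def m(t):
    # moment generating function of a Poisson(lam) variable
    return math.exp(lam * (math.exp(t) - 1.0))

def K(t):
    # cumulant generating function: the logarithm of m(t)
    return math.log(m(t))

h = 1e-4
K1 = (K(h) - K(-h)) / (2.0 * h)              # numerical K'(0) -> the mean
K2 = (K(h) - 2.0 * K(0.0) + K(-h)) / h**2    # numerical K''(0) -> the variance
```

Both finite-difference values come out very close to 3.0, the mean and variance of the Poisson(3) law.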


The general procedure for a saddlepoint approximation to the density and
distribution functions of a statistic Y, expressed as a linear combination
Y = Σ_i a_i X_i of n random variables X_1, X_2, ..., X_n that are independently
and identically distributed with distribution F, is as follows:
Let K_X(t) be the cumulant generating function of each variable X_i, from
which that corresponding to Y can be obtained using

K(t) = Σ_i K_X(t a_i)
For every value y of the variable Y for which we wish to approximate the
distribution function F_Y(y) and the density function f_Y(y), it is necessary to
solve the saddlepoint equation K'(t) = y, whose solution t = t_y can be
obtained, for example, by the Newton-Raphson method.
Different forms of the saddlepoint method are used in practice, among the
simplest being the Lugannani-Rice and Barndorff-Nielsen approximations
to the distribution function. These are given, respectively, by the
following formulas:

P(Y ≤ y) = F_sadd(y) = Φ(w) + φ(w) (1/w - 1/v)    (46)



P(Y ≤ y) = F_sadd(y) = Φ(w + (1/w) log(v/w))    (47)
where the functions Φ and φ are, respectively, the distribution function
and the density of the standardised normal, and the values w and v are obtained
from the solution t_y of the saddlepoint equation as

w = sign(t_y) [2{t_y y - K(t_y)}]^{1/2}    (48)

v = t_y [K''(t_y)]^{1/2}    (49)
The saddlepoint density function is approximated using:
f_sadd(y) = [2π K''(t_y)]^{-1/2} exp{K(t_y) - t_y y}    (50)
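The whole chain, from the cumulant generating function to the Lugannani-Rice formula, can be tested on a case with a known answer. In the Python sketch below, Y is the mean of n = 10 unit exponential variables, an illustrative assumption chosen because the saddlepoint equation then solves in closed form and nY follows a Gamma(n, 1) law whose exact distribution function is a finite sum:

```python
import math

def Phi(x):  # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):  # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Y = mean of n iid Exp(1): K_X(t) = -log(1 - t) and a_i = 1/n, so
# K(t) = -n log(1 - t/n), K'(t) = 1/(1 - t/n), K''(t) = (1/n)(1 - t/n)**-2,
# and the saddlepoint equation K'(t) = y gives t_y = n(1 - 1/y).
n, y = 10, 1.2          # y must differ from E(Y) = 1, otherwise w = 0
t_y = n * (1.0 - 1.0 / y)
K = -n * math.log(1.0 - t_y / n)
K2 = (1.0 / n) * (1.0 - t_y / n) ** -2
w = math.copysign(math.sqrt(2.0 * (t_y * y - K)), t_y)   # as in (48)
v = t_y * math.sqrt(K2)                                   # as in (49)
F_sadd = Phi(w) + phi(w) * (1.0 / w - 1.0 / v)            # Lugannani-Rice (46)

# exact value: n*Y ~ Gamma(n, 1) with integer shape, so the cdf is a finite sum
x = n * y
F_exact = 1.0 - math.exp(-x) * sum(x**k / math.factorial(k) for k in range(n))
```

Even with only n = 10 the two values agree to about three decimal places, illustrating the accuracy claimed above for small samples.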
In particular, in the context of resampling with replacement from a sample
X_1, X_2, ..., X_n, where each X_i is selected with probability p_i = 1/n, we can
assume a multinomial distribution with sum equal to n for the variables
(n*_1, n*_2, ..., n*_n) that describe the number of times each of (X_1, X_2, ..., X_n)
appears, and the sample mean statistic is given by the linear combination

X̄* = Σ_i n*_i X_i / n = Σ_i a_i n*_i    (51)

with a_i = X_i/n, which has the cumulant generating function

K(t) = n log( Σ_i p_i e^{t a_i} )    (52)

The corresponding saddlepoint equation for a point x_0 in the range of X is
given by

K'(t) = x_0    (53)

Of more interest is the application of these results to bootstrap
techniques with the linear approximation of a statistic, using the empirical
influence values. For example, if we approximate the statistic T* by

T* = t + Σ_i n*_i l_i / n    (54)

then T* - t can be expressed as the linear combination of the n*_i, with
a_i = l_i/n, where the l_i are the influence values of the statistic.
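A minimal sketch of solving the saddlepoint equation (53) by Newton-Raphson for the bootstrap mean, using the cumulant generating function (52) with p_i = 1/n and a_i = X_i/n; the toy data vector, starting point and tolerance are illustrative assumptions:

```python
import math

def saddlepoint_t(data, x0, t0=0.0, tol=1e-10, max_iter=100):
    """Solve K'(t) = x0 by Newton-Raphson, where
    K(t) = n log(sum_i p_i exp(t a_i)), with p_i = 1/n and a_i = x_i/n."""
    n = len(data)
    a = [xi / n for xi in data]
    t = t0
    for _ in range(max_iter):
        e = [math.exp(t * ai) for ai in a]
        s0 = sum(e) / n                              # sum_i p_i e^{t a_i}
        s1 = sum(ai * ei for ai, ei in zip(a, e)) / n
        s2 = sum(ai * ai * ei for ai, ei in zip(a, e)) / n
        K1 = n * s1 / s0                             # K'(t)
        K2 = n * (s2 / s0 - (s1 / s0) ** 2)          # K''(t), always > 0
        step = (K1 - x0) / K2                        # Newton-Raphson update
        t -= step
        if abs(step) < tol:
            break
    return t

data = [0.8, 1.1, 1.4, 0.9, 1.6, 1.2, 0.7, 1.3]
t_hat = saddlepoint_t(data, 1.3)   # x0 must lie inside (min(data), max(data))
```

Since K'(0) equals the sample mean, the solver returns t = 0 when x_0 is the mean itself; for other x_0 inside the range of the data, the convexity of K guarantees a unique root.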

The following figure shows the density of the variance statistic for the H-P
residuals corresponding to the age of 75 years. It can be seen that the normal
density does not produce such a good fit as does the saddlepoint approximation,
especially in the tails.

Figure 17. Saddlepoint and normal distributions, superimposed on the histogram of the bootstrap variances (age 75).

11. Conclusions
The most important source of heterogeneity in the residuals is not in the sets
generated by the different curves that are fitted for each year, but within each
curve, in those generated for different ages.

The mixing of residuals into a single set reveals a distribution with a
behaviour pattern far removed from the normal one. The different means and
dispersions, corresponding to residuals derived from fits for different ages, give
rise to a distribution with various modes.
The following figures show the results of distributions of the simulated
mean and variance estimators of the total set of H-P residuals, without
distinguishing by age or period.

Figure 18. Distributions of the simulated means: bootstrap distribution (left) and normal quantile plot of the bootstrap values (right).



Figure 19. Distributions of the simulated variances: bootstrap distribution (left) and normal quantile plot of the bootstrap values (right).

Exploration of the distributions of certain statistical measures of interest
enables us to evaluate behaviour patterns. Graphic techniques, as well as fits
of models of greater or lesser complexity, and particularly non-parametric
techniques, can be complementary and constitute useful tools for performing
this task, in which we seek to discover how the modelled structures (the
Heligman and Pollard curve) are adapted to reality (the observed mortality
rates).

References
1. Booth, J.G., Hall, P. and Wood, A.T.A. (1993). Balanced importance
resampling for the bootstrap. Annals of Statistics, 21, 286-298.

2. Davison, A.C., Hinkley, D.V. and Schechtman, E. (1986). Efficient
bootstrap simulation. Biometrika, 73, 555-566.
3. Davison, A.C. and Wang, S. (2002). Saddlepoint approximations as
smoothers. Biometrika, 89(4), 933-938.
4. DiNardo, J. and Tobias, J.L. (2001). Nonparametric density and regression
estimation. Journal of Economic Perspectives, 15(4), 11-28.
5. Efron, B. (1990). More efficient bootstrap computations. Journal of the
American Statistical Association, 55, 79-89.
6. Efron, B. (1986). How biased is the apparent error rate of a prediction rule?
Journal of the American Statistical Association, 81, 461-470.
7. Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap.
Chapman & Hall.
8. Efron, B. (1992). Jackknife-after-bootstrap standard errors and influence
functions (with Discussion). Journal of the Royal Statistical Society,
Series B, 54, 83-127.
9. Gleason, J.R. (1988). Algorithms for balanced bootstrap simulations.
American Statistician, 42, 263-266.
10. Johns, M.V. (1988). Importance sampling for bootstrap confidence
intervals. Journal of the American Statistical Association, 83, 709-714.
11. Hall, P. (1989). Antithetic resampling for the bootstrap. Biometrika, 73,
713-724.
12. Hinkley, D.V. (1988). Bootstrap methods (with Discussion). Journal of the
Royal Statistical Society, Series B, 50, 312-337, 355-370.
13. Hinkley, D.V. and Shi, S. (1989). Importance sampling and the nested
bootstrap. Biometrika, 76, 435-446.
14. Kuonen, D. (1999). Saddlepoint approximations for distributions of
quadratic forms in normal variables. Biometrika.
15. McCullagh, P. and Nelder, J.A. (1989). Generalized linear models.
Chapman & Hall.
16. Rust, R.T. (1988). Flexible Regression. Journal of Marketing Research.
17. Sheather, S.J. and Jones, M.C. (1991). A reliable data-based bandwidth
selection method for kernel density estimation. Journal of the Royal
Statistical Society, Series B, 53, 683-690.
18. Silverman, B.W. (1985). Some aspects of the spline smoothing approach
to non-parametric curve fitting. Journal of the Royal Statistical Society,
Series B, 47, 1-52.
19. Stone, M. (1974). Cross-validation choice and assessment of statistical
predictions (with Discussion). Journal of the Royal Statistical Society,
Series B, 36, 111-147.
20. Terrell, G.R. (1998). The Gradient Statistic. Department of Statistics, Virginia
Polytechnic Institute and State University, Blacksburg, Virginia.
21. Terrell, G.R. (2003). A stabilized Lugannani-Rice formula. Department of
Statistics, BPI&SU, Blacksburg, Virginia.

22. Wang, S. (1995). One-step saddlepoint approximations for quantiles.
Computational Statistics and Data Analysis.
23. Wolf, C.A. and Summer, D.A. (2001). Are farm size distributions bimodal?
Evidence from kernel density estimates of dairy farm size distributions.
American Agricultural Economics Association.
24. Wu, J. and Wong, A.C.M. (2003). A note on determining the p-value of
Bartlett's test of homogeneity of variances.
Chapter 15
MEASURING THE EFFICIENCY OF THE SPANISH BANKING
SECTOR: SUPER-EFFICIENCY AND PROFITABILITY

J. GOMEZ-GARCIA
Department of Quantitative Methods for Economics, University of Murcia
Campus de Espinardo, s/n, Espinardo 30100 Murcia, Spain

J. SOLANA-IBANEZ
Department of Business and Management, Catholic University San Antonio of Murcia
Campus de los Jeronimos, s/n, Guadalupe 30107 Murcia, Spain

J.C. GOMEZ GALLEGO


Department of Business and Management, Catholic University San Antonio of Murcia
Campus de los Jeronimos, s/n, Guadalupe 30107 Murcia, Spain

We analyse the dependent relationship between the technical efficiency and profitability of
commercial banks in Spain, using multivariate techniques such as factorial, cluster and
discriminant analyses. Efficiency measurements are obtained by Data Envelopment
Analysis (DEA), incorporating size- and management-related variables as inputs and
outputs. Efficiency and super-efficiency coefficients are obtained for each bank and
conclusions are drawn concerning the existence of differing levels of profitability
according to the efficiency level with which the banks are managed.

1. Introduction
The high degree of correlation between the behaviour of the economy and
the banking sector, together with the sector's role as financial intermediary,
Pastor [1], is ample reason for the continual interest in different aspects of the
banking system.
Traditionally, this kind of study has been approached through the use of
costs and profitability ratios, Pastor, Perez and Quesada [2], although more
recently these traditional techniques have tended to be replaced by the use of
econometric techniques that look at an institution from a global viewpoint,
considering the inputs used and the outputs obtained; that is, techniques that
permit the efficiency of an organization to be measured. One such technique is
Data Envelopment Analysis (DEA), a non-parametric econometric technique


that permits the way in which a company is managed to be evaluated more
thoroughly.
From the end of the 1980s, there has been a growing interest in the
importance of X-type efficiency as opposed to scale efficiency in the banking
sector. The fact that several studies of the banking sector have shown that the
spread of mean costs was greater among banks of a given size than among banks
of different sizes, points to the greater importance of reducing X-type
inefficiencies rather than of attaining an optimal production size (economies of
scale) as a means of reducing costs. After this analysis of the relation between
efficiency and costs, interest turned to the analysis of the relation between
efficiency and the profitability of different banks.
Economic theory says that companies wishing to maximize profits must
produce at the minimum cost possible. In other words, obtaining the maximum
level of profits and, hence, attaining maximum profitability involves being
economically efficient. Berger [3] in the USA, Goldberg and Rai [4] in Europe
and Maudos and Pastor [5] in Spain, suggested that efficient banks are generally
more profitable than inefficient banks.
Berger and De Young [6] demonstrated that efficiency, as an indicator of
management quality, influences the assignation of lendable funds between
clients. Since, according to Freixas [7], the rate at which clients fall into
arrears helps to explain the evolution of profitability in the banking
sector, the effect of efficiency on profitability is not limited to the reduction
of costs but also has implications for the process of granting credit. Efficiency,
then, affects profitability not only through cost reduction but also through the
management quality that any efficient bank enjoys and that manifests itself in
many banking spheres.
This study represents an analysis of the efficiency and profitability of
Spanish banks. Insofar as the measurement of efficiency is based on DEA, it
is closely interrelated with a research line looking at methods of analysing
global efficiency and the establishment of an efficiency ranking. The study
focuses on the relation between the productive efficiency and profitability of
banks. For this reason, in the second part we describe the methodology for
estimating the productive efficiency of banks. In the third section, we describe
the way in which the banks were sampled and the statistical sources used.
In the fourth section, we relate the profitability of a company with its
super-efficiency and, lastly, we present our conclusions.

So-called X-type inefficiencies are those due to errors in management and/or organization; they
include both technical and allocative inefficiencies, and differ from scale inefficiencies.

2. Scope of the study


The database used is that published by the Spanish Banking Association
(AEB) [8] for 2002 and 2003, which provided ample information on
the different characteristics concerning the type and volume of activity of
Spanish banks. However, the wide-ranging specializations of many banks meant
that certain areas were not covered and led us to choose a group of 36 banks for
which complete information on the selected variables was available (Table 1).
For each variable we estimated the minimum, mean, standard deviation and
skewness coefficient. Of note is the differing degree of representativeness of
the mean value of each variable since, as can be seen, the standard deviations
took extreme values, either very small or very large. The non-ratio variables
were widely dispersed and the skewness coefficient pointed to generally very
asymmetric distributions, except in the case of CROF, ICAT and ROE; in the
case of DPRA, the skewness was significantly negative.

Table 1. Descriptive statistics

Variable          Code    Unit     Min        Mean         S.D.          Skew.
Mean T. Assets    ATM     E. 10^3  104928.0   16383920.3   44228723.4    3.8
Cashiers          CAJ     Unit.    .0         536.6        1086.8        2.9
Current Acc.      CC      Unit.    826.0      268718.5     504860.0      2.6
Credits           CRD     E. 10^3  54540.0    10101139.3   22688793.9    3.6
Debts             DEB     E. 10^3  7899.0     9424198.0    21553846.8    3.5
Employees         EMP     Unit.    26.0       2922.5       6294.74       3.5
Expl. Margin      MEX     E. 10^3  -1373.0    434417.9     1235077.7     3.7
Int. Margin       MIN     E. 10^3  2227.0     614255.7     1702052.5     3.8
Net               NET     E. 10^3  22841.0    1099687.3    3020618.6     4.4
Cards             TAR     Unit.    .0         622977.3     1429731.1     3.7
Cashiers/Offic.   CAOF    Unit.    .00        .89          .65           .38
Cred./Offic.      CROF    E. 10^3  -52.63     11.60        43.26         3.51
Dep./Offic.       DPOF    E. 10^3  343.43     116919.07    442366.0      5.61
Dep./R.Aj.        DPRA    %        .51        .91          .08           -4.07
Empl./Offic.      EMOF    Unit.    1.18       13.26        24.61         4.37
Cred./ATM         ICAT    %        .09        .60          .26           -.45
Cred./Emp.        ICEM    E. 10^3  371.02     10378.3      34920.2       5.23
Expl. M./ATM      MEATM   %        -1.31      2.08         1.97          1.53
Int. M./ATM       MIATM   %        .32        3.41         2.35          1.59
Net/ATM           NEATM   %        .02        .08          .08           4.47
ROE               ROE     %        -.08       .17          .15           1.1
Card./SharHdr.    TARAC   Unit.    .00        16777.2      68808.8       5.42
Card./Offic.      TAROF   Unit.    .00        51456.01     270029.8      5.86

3. Methodology

3.1. DEA Origin and diffusion


Economics and Operational Research share many interests, one of the most
important being the analysis of the production possibilities of a productive unit. The
definitive connection arose in 1978 from the work of Abraham Charnes,
William W. Cooper and Edward Rhodes [9] (CCR) entitled "Measuring the
Efficiency of Decision Making Units", published in the European Journal of
Operational Research. The DEA model that they presented led to the growing
popularity of the empirical use of linear programming techniques for calculating
coefficients of efficiency; so much so that by 1999 the work of CCR had been
cited more than 700 times in the SSCI.
The starting point was the seminal work of Michael James Farrell [10],
"The Measurement of Productive Efficiency", published in the Journal of the
Royal Statistical Society in 1957, where the concept of efficiency was first mooted.

The most influential work related to such aspects at the macroeconomic level was
that of Solow [11], published in the Review of Economics and Statistics and
entitled "Technical change and the aggregate production function". At the same
time, Farrell established the bases for studying efficiency and productivity at
the microeconomic scale, putting forward two novel aspects: how to define
efficiency and productivity, and how to measure efficiency.
Faced with the possibility of inefficiency, Farrell opted for the concept of
frontier production, as opposed to the mean efficiency underlying most of the
econometric literature to date on the production function. Farrell's new focus
consisted of decomposing efficiency into technical and allocative
efficiency at the individual production unit level. The radial contraction/expansion
connecting inefficient units with efficient units with respect to the production
function constitutes the basis for measuring efficiency and is Farrell's true
contribution.
Farrell proposed a measure of efficiency consisting of two components,
technical efficiency and allocative efficiency, which combine to
provide a measure of total economic efficiency. These measures assume that the
production function of efficient companies is known. Since this function is never
known in practice, as Farrell recognized, he proposed two possibilities:
obtaining a non-parametric function or a parametric function.
The first alternative gave rise to the models of estimating non-parametric
frontiers and was followed by Charnes, Cooper and Rhodes [9], and resulted in
an approach to DEA. A subsequent model, denominated FDH (free disposal
hull), formulated in 1984 by Deprins, Simar and Tulkens [12] and developed by
Tulkens [13] in 1994, gave rise to a great quantity of research. The second
pathway was followed by Afriat [14] and Aigner [15], resulting in two
approximations known as the deterministic and stochastic frontier models.
An intermediate pathway comprising models that we might term models
that do not use production frontiers, is provided by index numbers, and their use
in measuring efficiency and productivity is indirect. They are used, rather, to
generate variables or data that can be used in the application of the DEA models
or in the estimation of stochastic frontiers, Solana [16].

3.2. Models
Since its genesis with Charnes et al. [9], a variety of DEA models have been
developed, both input and output oriented, depending on the existence of constant
or variable returns (in the latter case, depending, too, on whether these are
increasing or decreasing) and on whether the inputs can or cannot be controlled,
among other aspects. The first model we applied was that initially proposed by
Charnes et al. [9] and known as CCR, after its authors. This model assumes
constant returns to scale and is input oriented. In accordance with Cooper et al.
[17], the starting point is the traditional definition of efficiency (the ratio
between outputs and inputs) and the aim is, by means of linear programming, to
obtain weights so that the ratio between outputs and inputs is maximized.
To calculate the efficiency of n units, n linear programming problems must
be solved to obtain both the values of the weights (v_i) associated with the inputs
(x_i) and the weights (u_r) associated with the outputs (y_r). Assuming m inputs
and s outputs, and transforming the fractional programming model into a linear
programming problem, the input oriented CCR model is formulated as follows:
Max θ = u_1 y_1o + u_2 y_2o + ... + u_s y_so
s.t.
  v_1 x_1o + v_2 x_2o + ... + v_m x_mo = 1    (1)
  u_1 y_1j + u_2 y_2j + ... + u_s y_sj ≤ v_1 x_1j + v_2 x_2j + ... + v_m x_mj,   j = 1, 2, ..., n
  v_i ≥ 0   (i = 1, 2, ..., m)
  u_r ≥ 0   (r = 1, 2, ..., s)

The output oriented linear version is formulated as follows:

Min ρ = v_1 x_1o + v_2 x_2o + ... + v_m x_mo
s.t.
  u_1 y_1o + u_2 y_2o + ... + u_s y_so = 1    (2)
  u_1 y_1j + u_2 y_2j + ... + u_s y_sj ≤ v_1 x_1j + v_2 x_2j + ... + v_m x_mj,   j = 1, 2, ..., n
  v_i ≥ 0   (i = 1, 2, ..., m)
  u_r ≥ 0   (r = 1, 2, ..., s)
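For the special case of one input and one output, the CCR programme above has a closed-form solution: the weights cancel and the constant-returns efficiency of unit o is simply its own output/input ratio divided by the best ratio in the sample. The Python sketch below illustrates this; the five hypothetical units and their figures are invented for illustration and are not taken from the banking sample studied here:

```python
def ccr_single(inputs, outputs):
    """Input-oriented CCR efficiency under constant returns to scale for
    the one-input, one-output case: each unit's output/input ratio is
    compared with the best ratio observed in the sample."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

# hypothetical data for five decision-making units
inputs  = [10.0, 20.0, 15.0, 30.0, 25.0]   # e.g. employees
outputs = [ 8.0, 24.0, 12.0, 21.0, 15.0]   # e.g. credits granted
scores = ccr_single(inputs, outputs)        # the second unit defines the frontier
```

With several inputs and outputs the weights no longer cancel and each unit's score requires solving the linear programme (1); the scalar case is only meant to make the logic of the envelopment visible.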

Given the lack of information on the form of the production frontier, we
have used models analogous to [1] and [2] but which permit variable
returns, known as BCC, after its authors, Banker et al. [18]. In this work, we
use the output oriented model (BCC-O), formulated as:

Min Z = Σ_i v_i x_io + v_0
s.t.
  Σ_r u_r y_ro = 1
  Σ_i v_i x_ij - Σ_r u_r y_rj + v_0 ≥ 0,   j = 1, 2, ..., n    (3)
  v_i ≥ 0   (i = 1, 2, ..., m)
  u_r ≥ 0   (r = 1, 2, ..., s)
  v_0 free
where v_0 is the variable that permits us to identify the nature of the returns
to scale. To obtain a more complete ranking, efficient units are classified by
applying the MDEA models proposed by Andersen and Petersen [19].

3.3. Efficiency of banking management


As explained by Thanassoulis [20], banking institutions have two activities
whose efficiency can be analysed: production and intermediation.
The efficiency of production refers to how banks use labour, capital,
space, service accounts, etc., all of which is reflected in a wide range of
transactions such as the search for resources, the process of advancing credit and
other income-generating activities.
The choice of inputs and outputs is a controversial subject that presents
several problems, since the products of banks are immaterial, heterogeneous and
jointly produced. Furthermore, this heterogeneity is continually changing: not
only do new products appear and disappear, but the proportions of the
components of the output vector also change.
Two basic solutions have been proposed to resolve this problem, Pastor
[21]:
• The first involves measuring output by adding given sections of the balance
sheet of the institutions (deposits, total assets, loans, etc.). This is known as a
monetary focus, according to which the total assets and/or deposits are
magnitudes representative of financial services and payments, respectively.
This approach has the advantage of simplicity and the availability of relevant
data, so that it is frequently used in studies of economics of scale.
• The second solution, known as the physical or non-monetary focus, equates
banking activity with the productive processes of industrial companies by
using magnitudes, such as the number of loans and deposits, etc., which are
the equivalent of the number of service units offered. This approach is very
suitable for studying some aspects associated with cost-size relations. The
lack of information makes it difficult to apply, at least in the case of Spanish
banks.
In this work we consider a bank as a company producing a flow of services,
which involves the consumption of inputs. This flow of services, associated with
active and passive items, will constitute the measurement of ideal output.
The choice of volume of credits and loans as basic representative measures
of input and output supposes that these factors provide the clients of assets and
liabilities with greater fluidity of resources and services. The conceptualization
of a bank as a company that produces services, and the use of proxy variables,
such as deposits and loans, normally associated with the provision of such services,
obliges us to consider an additional output that is closely related with the
conditions of providing these services: the number of current accounts.
Taking into consideration the cited literature, and the information available
in the case under study, the variables selected as inputs and outputs are the
following: Current Accounts, Intermediation Margin, Net Profit, Debits.

4. Results
When a large number of correlated variables are available for a given
population, factorial analysis (FA) permits the information contained in these
variables to be synthesized into a smaller number of variables (factors).
After typifying the original variables and demonstrating the existence of
significant correlation, Bartlett's sphericity test and the Kaiser-Meyer-Olkin
(KMO) statistic were applied. These gave a Chi-squared value of 1283.59,
with 120 d.f., for the Bartlett test and a KMO value for sample suitability of
0.637, with an associated significance level of 0.000. Next, the factorial axes
were extracted by principal components analysis. Lastly, the chosen axes were
rotated by Varimax to facilitate interpretation.
Of the original variables observed, those related with size, profitability,
management and risk were selected, Moya and Caballer [22]. In this way the
following fifteen variables were included: Current Accounts, Debits, Time
deposits, ATM, Intermediation Margin, Intermediation Margin on ATM, ROE,
Operating Margin, Operating Margin over ATM, Credit investment over
employee, Credit Investment over ATM, Net profit over AMT, Debits per
employee, Deposits over debt capital.
Applying the FA procedure, four factorial axes were obtained which
explained 91.27% of the global variance. These were chosen bearing in mind the
value of the eigenvalues of the characteristic equation, in accordance with the

criterion of the arithmetic mean. From the matrix of rotated components, the
factorial axes were defined as follows:
Factor 1: loaded by C/C (.900), CRD (.989), DEB (.995), IMPL (.984),
ATM (.986), ME (.965), MI (.967). Factor 2: loaded by MEATM (.983),
MIATM (.942), ROE (.623), BATM (.932). Factor 3: loaded by ICEMP
(.970), DBEMP (.986). Factor 4: loaded by ICATM (.729), DPRA (.820).
Factor 1, with an associated eigenvalue of 7.09, explains 47.26% of the total
variance; Factor 2, with an associated eigenvalue of 3.21, explains 21.41% of the
total variance; Factor 3, with an associated eigenvalue of 1.37, explains 13.28%
of the total variance; and Factor 4, with a characteristic root of 1.37, explains
9.18% of the total variance.
From the correlations between the factorial axes and the original variables,
we have interpreted and, consequently, denominated the factorial axes as
follows: Factor 1: Size; Factor 2: Profitability; Factor 3: Management; and
Factor 4: Risk.
Applying the BCC-O model, efficiency coefficients were obtained which
situated 14 financial institutions on the frontier, while the remaining 22 showed
differing percentages of technical inefficiency. The MDEA-O model was used
to establish a complete ranking, and the corresponding super-efficiency
coefficients were obtained for each bank.

4.1. Analysis of the efficiency-profitability relation


Applying the Cluster Analysis procedure to the efficiency and super-efficiency
distributions, the banks were grouped into homogeneous
conglomerates in order to study the profitability of these groups. In this work, we apply
the hierarchical grouping method, whereby the groups are merged in pairs so as
to produce the smallest increase in the total distances. The different cluster levels
are established by taking into account the value of the intra-group variance. Three
highly homogeneous groups were formed, since the coefficient of variability did
not exceed 20%. To test the suitability of the grouping obtained in the cluster
analysis, we applied discriminant analysis, obtaining the discriminant function
from the values of efficiency and super-efficiency. The results confirmed that the
classification was correct in 100% of cases.
Table 2 contains information on the financial institutions of each cluster and the
mean values and standard deviations obtained for the variables profitability and
super-efficiency. It also contains the coefficients of efficiency for each bank, and
of super-efficiency where applicable.

Table 2. Clusters: mean values, standard deviations, and coefficients of efficiency (* = super-efficiency)

Cluster 1 (N=5). Efficiency: mean 469.96, S.D. 45.33. Profitability: mean -0.85, S.D. 0.61.
  Banks: Sabad. BPr. (500)*, Patagon (451.8)*, Popular BP. (500)*, Cred. Local (500)*, Pueyo (398.9)*

Cluster 2 (N=22). Efficiency: mean 56.10, S.D. 19.89. Profitability: mean 0.03, S.D. 0.94.
  Banks: Cooperat. E. (47.3), De Pyme (36.9), Simeon (56.1), Urquijo (58.4), Halifax (20.8), Espirito S. (55.4), Barclays (78.2), Bankoa (38.9), Bancofar (11.29), Deutsche (81.2), Gallego (55.7), Sabadell (80.6), Pastor (71.7), Guipuzc. (82.3), Citybank (29.1), Atlantico (83.8), Vasconia (55.9), March (58.2), Castilla (50.6), C. Balear (64.8), Galicia (53.7), Andalucia (62.5)

Cluster 3 (N=9). Efficiency: mean 205.05, S.D. 58.70. Profitability: mean 0.40, S.D. 1.10.
  Banks: Fibanc (174.3)*, Bankinter (171.0)*, BBVA (191.2)*, E. Credito (220.2)*, BSCH (154.1)*, Valencia (204.0)*, Popular Esp. (134)*, Banif (305.2)*, Santan. C.F. (290)*

ANOVA was applied to ascertain whether the three groups defined by
super-efficiency differ significantly as regards their levels of profitability.
Significantly different (p=0.043) levels of profitability were found. Applying
Bonferroni's test, differences in profitability were seen between groups 1 and 3
(p=0.038), while a p-value of 0.056 was observed for groups 1 and 2, and a
p-value of 0.356 for groups 2 and 3.

5. Conclusions
Factorial analysis provided four factors that encapsulated the main
characteristics of Spanish banks: size, management, profitability and risk. In this
way, each bank is represented in a four-dimensional space by the vector whose
components are the scores of the bank on each of the four factorial axes.
Applying DEA analysis, the BCC-O model and MDEA, we obtained an
efficiency ranking for 36 financial institutions. In 22 banks we observed a
percentage of technical inefficiency. Changes in the way these banks are
managed could bring them to the production frontier.
Applying cluster analysis to the super-efficiency scores enabled us to form
homogeneous groups of the banks analysed. In this way we obtained three
groups of minimal intra-group variance.
As regards profitability, we conclude that there are significant differences
(p=0.043) between the groups of banks established from the measures of
super-efficiency. Group 3, with a high mean efficiency (205.05), presents the
highest level of profitability, while group 1 (469.96) shows quite low
profitability. Despite the significant differences, we need to take into account
other characteristics, such as specialization, in order to explain these findings.
This could be the topic of future research.

References
1. J.M. Pastor. (1998). Gestión del Riesgo y Eficiencia en los Bancos y Cajas de Ahorros. Serie Documentos de Trabajo, No. 142/1998. Fundación de Cajas de Ahorro Confederadas para la Investigación Económica y Social, España.
2. J.M. Pastor, F. Pérez and J. Quesada. (1995). Are European Banks Equally Efficient? Revue de la Banque, June, 324-33.
3. A.N. Berger. (1995). The Profit-Structure Relationship in Banking: Tests of Market-Power and Efficient-Structure Hypotheses. Journal of Money, Credit and Banking, 27(2), 405-431.
4. L.G. Goldberg and A. Rai. (1996). The structure-performance relationship for European banking. Journal of Banking and Finance, 20, 745-771.
5. J. Maudos and J.M. Pastor. (1998). La eficiencia del sistema bancario español en el contexto de la Unión Europea. Papeles de Economía Española, 84/85, 155-168.
6. A.N. Berger and R. DeYoung. (1997). Problem Loans and Cost Efficiency in Commercial Banks. Journal of Banking & Finance, 21(6), 849-870.
296 J. Gómez-García, J. Solana-Ibáñez and J.C. Gómez-Gallego

7. X. Freixas, J. de Hevia and A. Inurrieta. (1993). Componentes macroeconómicos de la morosidad bancaria: un modelo empírico para el caso español. Moneda y Crédito, 99, 125-156.
8. Anuario Estadístico de la Banca en España. (2003). Asociación Española de Banca.
9. A. Charnes, W.W. Cooper and E. Rhodes. (1978). Measuring the efficiency of decision-making units. European Journal of Operational Research, 2, 429-444.
10. M.J. Farrell. (1957). The measurement of productive efficiency. Journal of the Royal Statistical Society, Series A, 120(III), 253-281.
11. R.M. Solow. (1957). Technical change and the aggregate production function. Review of Economics and Statistics, 39, 312-320.
12. D. Deprins, L. Simar and H. Tulkens. (1984). Measuring labour efficiency in post offices. In M. Marchand, P. Pestieau and H. Tulkens (eds.), The Performance of Public Enterprises: Concepts and Measurement, 243-267. Amsterdam: North-Holland.
13. H. Tulkens. (1994). On FDH analysis: Some methodological issues and applications to retail banking, courts and urban transit. Journal of Productivity Analysis, 4(1-2), 183-210.
14. S. Afriat. (1972). Efficiency estimation of production functions. International Economic Review, 13(3), 568-598.
15. D.J. Aigner, C.A. Knox Lovell and P. Schmidt. (1977). Formulation and estimation of stochastic frontier production function models. Journal of Econometrics, 6(1), 21-37.
16. J. Solana. (2003). Modelos DEA para la evaluación global de la eficiencia técnica. Obtención de un ranking de unidades productivas. Tesis doctoral, UCAM.
17. W.W. Cooper, L.M. Seiford and K. Tone. (2000). Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software. Boston: Kluwer.
18. R.D. Banker, A. Charnes and W.W. Cooper. (1984). Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management Science, 30, 1078-1092.
19. P. Andersen and N.C. Petersen. (1993). A procedure for ranking efficient units in Data Envelopment Analysis. Management Science, 39(10), 1261-1264.
20. E. Thanassoulis. (1999). Data Envelopment Analysis and its use in banking. Interfaces, 29(3), May/June.
21. J.M. Pastor. (1998). Diferentes metodologías para el análisis de la eficiencia de los bancos y cajas de ahorro españoles. Departament d'Anàlisi Econòmica, Universitat de València.
22. Moya and V. Caballer. (1994). Un modelo analógico bursátil para la valoración de cajas de ahorro. In R.M. Hernández Mogollón (ed.), La reconstrucción de la empresa en el nuevo orden económico, 287-297.

DISTRIBUTION MODELS THEORY

Distribution Models Theory is a revised edition of papers specially selected by the Scientific Committee for the Fifth Workshop of the Spanish Scientific Association of Applied Economy on Distribution Models Theory, held in Granada (Spain) in September 2005. The contributions offer a must-have point of reference on models theory.
