MATHEMATICS
SEMESTER IV
May, 2020
All rights reserved. No part of this work may be reproduced in any form without permission in writing from the Directorate of Open and Distance Learning, University of Kalyani.
Director’s Message
Satisfying the varied needs of distance learners, overcoming the obstacle of distance and reaching the unreached students are the threefold functions catered to by Open and Distance Learning (ODL) systems. The
onus lies on writers, editors, production professionals and other personnel involved in the process to overcome
the challenges inherent to curriculum design and production of relevant Self Learning Materials (SLMs). At
the University of Kalyani a dedicated team under the able guidance of the Hon’ble Vice-Chancellor has in-
vested its best efforts, professionally and in keeping with the demands of Post Graduate CBCS Programmes
in Distance Mode to devise a self-sufficient curriculum for each course offered by the Directorate of Open and
Distance Learning (DODL), University of Kalyani.
Development of printed SLMs for students admitted to the DODL within a limited time to cater to the
academic requirements of the Course as per standards set by Distance Education Bureau of the University
Grants Commission, New Delhi, India under Open and Distance Mode UGC Regulations, 2017 had been our
endeavour. We are happy to have achieved our goal.
Utmost care and precision have been ensured in the development of the SLMs, making them useful to the learners, besides avoiding errors as far as practicable. Further suggestions from the stakeholders in this regard would be welcome.
During the production process of the SLMs, the team continuously received positive stimulation and feedback from Professor (Dr.) Sankar Kumar Ghosh, Hon’ble Vice-Chancellor, University of Kalyani, who kindly offered directions, encouragement, suggestions and constructive criticism to develop the SLMs within the proper requirements. We gratefully acknowledge his inspiration and guidance.
Sincere gratitude is due to the respective chairpersons as well as each and every member of PGBOS (DODL), University of Kalyani. Heartfelt thanks are also due to the Course Writers, faculty members at the DODL, subject-experts serving at the University Post Graduate departments, and also to the authors and academicians whose academic contributions have enriched the SLMs. We humbly acknowledge their valuable academic contributions. I would especially like to convey gratitude to all other University dignitaries and personnel involved either at the conceptual or operational level of the DODL of the University of Kalyani.
Their persistent and co-ordinated efforts have resulted in the compilation of comprehensive, learner-friendly,
flexible texts that meet the curriculum requirements of the Post Graduate Programme through Distance Mode.
The Self Learning Materials (SLMs) have been published by the Directorate of Open and Distance Learning, University of Kalyani, Kalyani-741235, West Bengal, and all copyrights are reserved by the University of Kalyani.
No part of this work may be reproduced in any form without permission in writing from the appropriate authority of the University of Kalyani.
All the Self Learning Materials are self-written and compiled from e-books, journals and websites.
Director
University of Kalyani
Core Paper
MATC 4.1
Marks : 100 (SEE : 80; IA : 20)
• Unit 1: Definition of graphs, circuits, cycles, Subgraphs, induced subgraphs, degree of a vertex, Con-
nectivity.
• Unit 2: Trees, Euler’s formula for connected graphs, Spanning trees, Complete and complete bipartite
graphs.
• Unit 3: Planar graphs and their properties, Fundamental cut set and cycles. Matrix representation of
graphs, Kuratowski’s theorem (statement only) and its use, Chromatic index, chromatic numbers and
stability numbers.
• Unit 4: Lattices as partially ordered sets and their properties. Lattices as algebraic systems. Sublattices. Direct products and homomorphisms. Some special lattices, e.g. complete, complemented and distributive lattices.
• Unit 5: Boolean Algebra: Basic Definitions, Duality, Basic theorems, Boolean algebra as lattices.
• Unit 6: Boolean Algebra: Boolean functions, Sum and product of Boolean expressions, Minimal Boolean expressions, Prime implicants, Propositions and Truth tables.
• Unit 7: Boolean Algebra: Logic gates and circuits, Applications of Boolean Algebra to Switching
theory (using AND, OR, & NOT gates), Karnaugh Map method.
• Unit 8: Combinatorics : Introduction, Basic counting principles, Permutation and combination, pigeon-
hole principle, Recurrence relations and generating functions.
• Unit 9: Grammar and Language: Introduction, Alphabets, Words, Free semigroup, Languages, Regular expressions and regular languages. Finite Automata (FA). Grammars.
• Unit 10: Finite State Machines. Non-deterministic and deterministic FA. Push Down Automata (PDA). Equivalence of PDAs and Context Free Languages (CFLs), Computable Functions.
• Unit 11: Fields and σ-fields of events. Probability as a measure. Random variables. Probability
distribution.
• Unit 12: Expectation. Moments. Moment inequalities, Characteristic function. Convergence of se-
quence of random variables-weak convergence, strong convergence and convergence in distribution,
continuity theorem for characteristic functions. Weak and strong law of large numbers. Central Limit
Theorem.
• Unit 13: Definition and classification of stochastic processes. Markov chains with finite and countable
state space, classification of states.
• Unit 14: Statistical Inference, Estimation of Parameters, Minimum Variance Unbiased Estimator, Method
of Maximum Likelihood for Estimation of a parameter.
• Unit 15: Interval estimation, Method for finding confidence intervals, Statistical hypothesis, Level of
significance; Power of the test.
• Unit 16: Analysis of variance, One factor experiments, Linear mathematical model for ANOVA.
Contents
Director’s Message
Unit 1
1.1 Introduction
1.2 Graphs
1.3 Directed Graphs
1.4 Simple Graphs
1.5 Subgraph
1.6 Walks, Paths, Cycles, Circuits
1.7 Few Probable Questions
Unit 2
2.1 Introduction
2.2 Bipartite graphs
2.3 Special Circuits
2.3.1 Euler Circuits
2.4 Trees
2.5 Spanning Tree
2.6 Few Probable Questions
Unit 3
3.1 Introduction
3.2 Matrix Representation of a Graph
3.3 Isomorphism
3.4 Planar Graphs
3.5 Graph Coloring
3.6 Few Probable Questions
Unit 4
4.1 Introduction
4.2 Partially Ordered Sets
4.2.1 Digraphs of Posets
4.3 Lattice
4.4 Sublattice
4.5 Direct Products
4.6 Few Probable Questions
Unit 5
5.1 Introduction
5.2 Boolean Algebra
5.3 Boolean Algebra as Lattices
5.4 Few Probable Questions
Unit 6
6.1 Introduction
6.2 Disjunctive Normal Form
6.3 Conjunctive Normal Form
6.4 Propositions and definitions of symbols
6.5 Truth tables
6.6 Few Probable Questions
Unit 7
7.1 Introduction
7.2 Switching Circuits
7.2.1 Simplification of circuits
7.3 Logical Circuit elements
7.4 Karnaugh Maps
7.5 Few Probable Questions
Unit 8
8.1 Introduction
8.2 Basic Counting principles
8.3 Mathematical Functions
8.3.1 Factorial Function
8.3.2 Binomial Coefficients
8.4 Permutations
8.4.1 Permutations with Repetitions
8.5 Combinations
8.6 Pigeonhole Principle
8.7 Inclusion-Exclusion Principle
8.8 Tree Diagrams
8.9 Few Probable Questions
Unit 9
9.1 Introduction
9.2 Alphabet, Words, Free Semigroup
9.3 Languages
9.3.1 Operations on Languages
9.4 Regular Expressions and Regular Languages
9.5 Finite State Automata
9.5.1 State Diagram of an Automaton M
9.6 Grammars
9.6.1 Language L(G) of a Grammar G
9.7 Few Probable Questions
Unit 10
10.1 Introduction
10.2 Finite State Machines
10.2.1 State Table and State Diagram of a Finite State Machine
10.3 Turing Machines
10.3.1 Computing with a Turing Machine
10.4 Computable Functions
10.4.1 Functions of Several Variables
10.5 Few Probable Questions
Unit 11
11.1 Introduction
11.2 Random Variables
11.3 Discrete Probability Distribution
11.4 Distribution Functions for Random Variables
11.5 Continuous Random Variables
11.6 Joint Distributions
11.6.1 Discrete Case
11.7 Change of Variables
11.7.1 Discrete Variables
11.8 Convolutions
Unit 12
12.1 Mathematical Expectation
12.2 Moments
12.3 Moment Generating Functions
12.4 Characteristic Function
12.5 Chebyshev’s Inequality
12.6 Law of Large Numbers
12.7 Special Probability Distributions
12.7.1 The Binomial Distribution
12.7.2 The Normal Distribution
12.7.3 Relation Between Binomial and Normal Distributions
12.7.4 The Poisson Distribution
12.7.5 Relation Between the Poisson and Normal Distribution
12.8 The Central Limit Theorem
Unit 13
13.1 Introduction
13.2 Specification of Stochastic Processes
13.3 Markov Chains
13.4 Transition Probabilities and Transition Matrix
13.5 Classification of States
Unit 14
14.1 Introduction
14.2 Estimation of Parameters
14.3 Unbiasedness
14.4 Minimum-Variance Unbiased (M.V.U.) Estimator
Unit 15
15.1 Introduction
15.2 Interval Estimation
15.3 Method for finding confidence intervals
15.4 Confidence interval for some special cases
15.5 Statistical Hypothesis
15.6 Null Hypothesis and Alternative Hypothesis
15.7 Critical Region
15.8 Two Types of Errors
15.9 Level of Significance
15.10 Power of the test
Unit 16
16.1 Introduction
16.2 One-Way Classification or One-Factor Experiments
16.3 Total Variation, Variation Within Treatments, Variation Between Treatments
16.4 Shortcut Methods for Obtaining Variations
16.5 Linear Mathematical Model for Analysis of Variance
16.6 Expected Values of the Variations
16.7 Distributions of the Variations
16.8 The F Test for the Null Hypothesis of Equal Means
16.9 Analysis of Variance Tables
16.10 Modifications for Unequal Number of Observations
Unit 1
Course Structure
• Definition of graphs, circuits, cycles
• Subgraphs, induced subgraphs, degree of a vertex
• Connectivity
1.1 Introduction
In the time of Euler, in the town of Königsberg in Prussia, there was a river containing two islands. The islands were connected to the banks of the river by seven bridges (figure 1.1.1). The bridges were very beautiful, and on their days off, townspeople would spend time walking over them (see the figure below). As time passed, a question arose: was it possible to plan a walk so that one crosses each bridge once and only once? This is known as the Königsberg seven bridge problem. In the year 1736, Euler represented the problem as a graph and answered the question in the negative. This marked the birth of graph theory.
Since then, graph theory has blossomed into a powerful tool used in nearly every branch of science and is currently an active area of mathematical research. Over the past 200 years, it has been used in a variety of
applications. Graphs are used to model electric circuits, chemical compounds, highway maps, and many more.
They are also used in the analysis of electrical circuits, finding the shortest route, project planning, linguistics,
genetics and social science.
Objectives
After reading this unit, you will be able to
• define graphs, directed graphs and simple graphs
• define subgraphs
• define walks, paths, cycles and circuits
1.2 Graphs
Definition 1.2.1. A graph G is a triple (V, E, g), where
1. V is a nonempty finite set whose elements are called vertices,
2. E is a finite set whose elements are called edges, and
3. g is a function, called the incidence function, that assigns to each edge e ∈ E a one-element subset {v} or a two-element subset {u, v}, where u, v are vertices.
For convenience, we will write g(e) = {u, v}, where u and v may be the same, in which case we write g(e) = {v}.
Let G = (V, E, g) be a graph. Suppose e is an edge of this graph. Then there are vertices u and v such that g(e) = {u, v}; the vertices u and v are called the end vertices or the endpoints of the edge e. When a vertex v is an endpoint of an edge e, we say that e is incident with the vertex v and v is incident with the edge e. Two vertices u and v are said to be adjacent if there exists an edge e ∈ E such that g(e) = {u, v}. Two edges e and f are said to be adjacent if they have a common endpoint, that is, if g(e) = {u, v} and g(f) = {v, w}. If e is an edge with g(e) = {u, v} where u = v, that is, g(e) = {u}, then e is an edge from u to itself, or u is adjacent to itself; such an edge e is called a loop on the vertex u.
From now on, we will simply write the graph G = (V, E, g) as G.
Example 1.2.2. Let V = {a, b, c, d} and E = {e, f, h, i, j, k}, and let the incidence function g be given by
g(e) = {a, b}, g(f) = {b, c}, g(h) = {c, d}, g(i) = {d, a}, g(j) = {d, b}, g(k) = {d, b}.
Thus, G = (V, E, g) is a graph. We can also write the above definition of g as follows:
e f h i j k
{a, b} {b, c} {c, d} {d, a} {d, b} {d, b}
Such a representation of the incidence function g is called the incidence table; its columns are indexed by the edges, and the end vertices of each edge are placed in the second row below the edge.
In this example, we see that the edge e is incident with the two vertices a and b. Thus, the vertices a and b are adjacent. Similarly, we see that the edges e and f are adjacent, since the vertex b is common to both.
The set of vertices and the set of edges of a graph are finite. One of the features that makes the study of graphs easy and interesting is that they can be represented pictorially; the corresponding diagram helps us to visualize the facts easily. If we represent the graph in the above example pictorially, then we get something as depicted in figure 1.2.1.
The incidence function need not be one-to-one. There may be more than one edge having the same end-
points. Such edges are called parallel edges. We formally define parallel edges as follows.
Definition 1.2.3. Let G = (V, E, g) be a graph. Two edges e and f are said to be parallel if g(e) = g(f ) =
{u, v} for u, v ∈ V .
In the previous example, the edges j and k are parallel edges since g(j) = g(k) = {d, b}. This can easily
be seen from the figure.
Definition 1.2.4. Let G be a graph and v be a vertex in G. We call v an isolated vertex if it is not incident with any edge, that is, if v is not an endpoint of any edge.
Definition 1.2.5. Let G be a graph and v be a vertex in G. Then the degree of v is defined as the number of
edges incident with v. It is written as deg(v) or d(v). By convention, it is considered that each loop contributes
2 to the degree of a vertex.
Note that for an isolated vertex v, we will always have d(v) = 0. In fact, this is a necessary and sufficient
condition for a vertex to be isolated.
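To make these definitions concrete, here is a minimal Python sketch (an illustration added for this edition, not part of the original text; the chosen representation is an assumption) that stores a graph as a vertex set together with an incidence function and computes degrees, counting each loop twice as the convention above requires. The data are those of example 1.2.2.

```python
# A graph as a triple (V, E, g): the incidence function g maps each edge
# name to its pair of end vertices (a loop appears as a pair (v, v)).
V = {"a", "b", "c", "d"}
g = {"e": ("a", "b"), "f": ("b", "c"), "h": ("c", "d"),
     "i": ("d", "a"), "j": ("d", "b"), "k": ("d", "b")}

def degree(v, g):
    """Count edge-ends at v; a loop (v, v) contributes 2 to the degree."""
    return sum((x == v) + (y == v) for (x, y) in g.values())

for v in sorted(V):
    print(v, degree(v, g))   # prints a 2, b 4, c 2, d 4
```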
Example 1.2.6. G = (V, E, g) is a graph (see figure 1.2.2), where V = {A, B, C, D} and E = {e, f, h, i, j}, and g is defined as
e f h i j
{A, B} {B, C} {C, B} {B, A} {A, A}
Then, we can see that D is an isolated vertex. Also, d(A) = 4, d(B) = 4, d(C) = 2 and d(D) = 0. The edges e and i are parallel, and so are f and h. Notice that g(e) = g(i) irrespective of the order in which A and B are written in the incidence table of g. But this is not always the case (as we will see for directed graphs). The graphs that we are studying now are also called undirected graphs (or simply, graphs).
Exercise 1.2.7. Represent the following graphs pictorially and find the degree of each of their vertices. Also state the parallel edges and loops, if any.
1. V = {v1 , v2 , . . . , v7 } and E = {e1 , e2 , . . . , e7 } where g is defined as
e1 e2 e3 e4 e5 e6 e7
{v1 , v2 } {v1 , v2 } {v4 , v3 } {v6 , v3 } {v2 , v4 } {v6 , v3 } {v6 , v3 }
2. V = {v1 , v2 , v3 , v4 } and E = {e1 , e2 , e3 } where g is defined as
e1 e2 e3
{v1 , v2 } {v3 , v3 } {v4 , v3 }
Graphs in which all the vertices are of the same degree are called regular graphs. The two examples of graphs we have seen so far are not regular graphs (verify this for example 1.2.2).
Definition 1.2.8. Let G be a graph and k be a non-negative integer. Then G is called a k-regular graph if the
degree of each vertex of G is k.
An interesting k-regular graph is the Petersen graph, which is 3-regular, as shown in the figure.
Definition 1.2.9. Let G be a graph and v be a vertex of G. v is called an even degree vertex if d(v) is an even
number. Similarly, v is odd degree vertex if d(v) is odd.
Definition 1.2.10. The degree sequence of a graph is the sequence of the degrees of its vertices, usually written in non-decreasing order.
Clearly, every graph has a unique degree sequence. But we can construct completely different graphs having the same degree sequence.
Exercise 1.2.11. 1. State the even and odd degree vertices of the graphs in exercise 1.2.7. Also find their degree sequences.
Consider the degree sequence of the graph in example 1.2.2, which is 2, 2, 4, 4; adding the degrees gives 2 + 2 + 4 + 4 = 12, an even number. In fact, the sum of the degrees of all the vertices is always an even number, as the following theorem due to Euler shows.
Theorem 1.2.12. The sum of the degrees of all the vertices of a graph is twice the number of edges.
Proof. Let G be a graph with n edges and m vertices, say v1, v2, . . . , vm. We want to determine the sum
d(v1) + d(v2) + · · · + d(vm).
Now the degree d(vi) of vi is the number of edges incident with vi. Each edge e is either a loop or incident with two distinct vertices. If e is a loop on a vertex v, then e contributes 2 to the degree of v. On the other hand, if e is incident with two distinct vertices v and w, then e contributes 1 to the degree of each of these vertices. Thus we find that when we compute the sum of the degrees, each edge contributes 2 to the sum. Because there are n edges, the total contribution to the above sum is 2n. Hence
d(v1) + d(v2) + · · · + d(vm) = 2n.
Corollary 1.2.13. The sum of the degrees of all the vertices of a graph is an even integer.
Proof. Since 2n is an even integer, the corollary follows from the previous theorem.
Corollary 1.2.14. The number of odd degree vertices in a graph is even.
Proof. Suppose a graph G has k odd degree vertices, v1, v2, . . . , vk, and t even degree vertices, u1, u2, . . . , ut. Then, by theorem 1.2.12,
d(v1) + d(v2) + · · · + d(vk) + d(u1) + d(u2) + · · · + d(ut) = 2n,
where n is the number of edges. Because each d(uj) is even, it follows that d(u1) + d(u2) + · · · + d(ut) is an even integer. Also, 2n is even. Hence, d(v1) + d(v2) + · · · + d(vk) must also be even. Now, the sum of an odd number of odd integers is an odd integer. Because each d(vi) is an odd number and d(v1) + d(v2) + · · · + d(vk) is even, it follows that the number k cannot be odd. So k is even, and this completes the proof.
1.3 Directed Graphs
Definition 1.3.1. A directed graph (or digraph) G is a triple (V, E, g), where
1. V is a nonempty finite set whose elements are called vertices,
2. E is a finite set whose elements are called arcs (or directed edges), and
3. g : E → V × V is a function that assigns to each arc e ∈ E an ordered pair (u, v), where u, v are vertices (u and v may be the same).
We can represent a digraph pictorially. The only difference between the representation of graph and digraph
is in the directed edges which are drawn with arrows representing the starting and terminating vertices.
If g(e) = (u, v), then u is called the starting vertex and v is called the terminating vertex of the arc e. The
in-degree of a vertex v is the number of arcs with v as the terminating vertex and the out-degree of v is the
number of arcs with v as the starting vertex. In computing in-degree and out-degree of a vertex, we assume
that each loop contributes 1 to the in-degree and 1 to the out-degree of v.
Theorem 1.3.2. In any digraph G = (V, E, g), the following three numbers are equal:
1. the sum of the in-degrees of all the vertices,
2. the sum of the out-degrees of all the vertices,
3. the number of arcs of G.
Proof. The proof is similar to that of theorem 1.2.12. We just use the fact that each arc e with starting vertex u and terminating vertex v contributes 1 to the out-degree of u and 1 to the in-degree of v. The details are left as an exercise.
Example 1.3.3. Let G be a digraph such that V = {a, b, c, d}, E = {e, f, h} and g(e) = (a, a), g(f ) = (b, c)
and g(h) = (b, d). The diagram is as follows:
The in-degrees of a, b, c and d are 1, 0, 1 and 1 respectively, and the out-degrees are 1, 2, 0 and 0 respectively. Thus, the sum of the in-degrees of all the vertices = the sum of the out-degrees of all the vertices = the number of arcs = 3.
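The following short Python sketch (an added illustration, not from the original text) verifies theorem 1.3.2 on the data of example 1.3.3, with each arc stored as an ordered pair (starting vertex, terminating vertex).

```python
# The digraph of example 1.3.3: g maps each arc to (start, terminal).
V = {"a", "b", "c", "d"}
g = {"e": ("a", "a"), "f": ("b", "c"), "h": ("b", "d")}

indeg  = {v: sum(t == v for (_, t) in g.values()) for v in V}
outdeg = {v: sum(s == v for (s, _) in g.values()) for v in V}

# Both sums equal the number of arcs, here 3 (theorem 1.3.2).
assert sum(indeg.values()) == sum(outdeg.values()) == len(g)
```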
1.4 Simple Graphs
Definition 1.4.1. A graph G is called a simple graph if G has no loops and no parallel edges.
Theorem 1.4.2. Let G be a simple graph with at least two vertices. Then G has at least two vertices of same
degree.
Proof. Let G be a simple graph with n ≥ 2 vertices. G has no loops or parallel edges. Thus, the degree of a
vertex v is the same as the number of vertices adjacent to it. The graph G has n vertices. Thus, a vertex v has
at most n − 1 adjacent vertices, because v is not adjacent to itself. Hence, for any vertex v, the degree of v is
one of integers: 0, 1, 2, . . . , n − 1.
We now show that if there exists a vertex v such that d(v) = 0, then for each vertex u of G, d(u) < n − 1.
On the contrary, suppose that in G, v is a vertex with degree 0 and u is a vertex with degree n − 1. Then v is
an isolated vertex and u has n − 1 adjacent vertices. Because G is a simple graph, u is not adjacent to itself.
From this and the fact that G is simple and d(u) = n − 1, it follows that every vertex of G other than u is
adjacent to u. This implies that v is adjacent to u, which is a contradiction since v is an isolated vertex. This
proves our claim.
In a similar manner, we can prove that if there exists a vertex v in G such that the degree of v is n − 1, then
for each vertex u in G, d(u) > 0.
We now conclude that the degree of all the vertices of G are either in the set {0, 1, 2, . . . , n − 2} or in the
set {1, 2, . . . , n − 1}.
Let v1, v2, . . . , vn be the n vertices of G. Then either d(vi) ∈ {0, 1, 2, . . . , n − 2} for all i = 1, 2, . . . , n, or d(vi) ∈ {1, 2, . . . , n − 1} for all i. Thus, by the pigeonhole principle, there exist i and j, 1 ≤ i ≤ n, 1 ≤ j ≤ n, i ≠ j, such that d(vi) = d(vj). Hence there are at least two vertices of the same degree.
Remark 1.4.3. The converse of the above theorem is not true in general. For example, a and c have equal
degree in example 1.2.2, but the graph G is not simple.
Definition 1.4.4. A simple graph with n vertices in which there is an edge between every pair of distinct
vertices is called a complete graph on n vertices. This is denoted by Kn .
Theorem 1.4.5. The complete graph Kn has exactly n(n − 1)/2 edges.
Proof. Let G be a complete graph with n vertices. Then G is a simple graph such that there exists an edge
between any two distinct vertices. Hence, for any vertex v of G, each of the remaining n − 1 vertices is
adjacent to v. Hence the degree of each vertex is n − 1. Also, since G has n vertices, the sum of the degrees of all the vertices is n(n − 1). We know that the sum of the degrees of all the vertices is twice the number of edges. Let the number of edges be m. So we have n(n − 1) = 2m, and thus we get
m = n(n − 1)/2.
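For example, the complete graph K3 (a triangle) has 3 · 2/2 = 3 edges, and K5 has 5 · 4/2 = 10 edges.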
1.5 Subgraph
Consider the graphs G = (V, E, g) and H = (V1, E1, g1) such that V = {A, B, C, D, E, F}, E = {a, b, c, d, e, f, h} and V1 = {A, B, C, D, E}, E1 = {a, b, d}, where g and g1 are as shown in the figure. It is worth noting that V1 ⊂ V and E1 ⊂ E. Also, g1 is the function g restricted to E1. Such a graph H is called a subgraph of G. We will now formally define subgraphs.
Definition 1.5.1. Let G = (V, E, g) be a graph. A graph H = (V1, E1, g1) is called a subgraph of G if V1 ⊆ V, E1 ⊆ E, and g1 is the restriction of g to E1.
Remark 1.5.2. Let G = (V, E) be a graph and H = (V1, E1) be a subgraph of G. From the previous definition, it follows that if e ∈ E1, and u, v are the end vertices of e in G, then u, v ∈ V1.
Let G be a graph with vertex set V and edge set E. Suppose that V contains more than one vertex. Then
for any vertex v ∈ V , G \ {v} denotes the subgraph whose vertex set is V1 = V \ {v} and the edge set is
E1 = {e ∈ E| v is not an end vertex of e}. Then G \ {v} is called a subgraph obtained from G by deleting
the vertex v.
Let e ∈ E, and G \ {e} denote the subgraph whose edge set is E \ {e} and the vertex set is V1 = V . Then
G \ {e} is the subgraph obtained by deleting the edge e.
Remark 1.5.3. G\{v} is obtained by deleting the vertex v and at the same time deleting all the edges incident
with v. However, the graph G \ {e} is obtained from G by deleting only the edge e without deleting any of
the vertices of G.
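The two deletion operations are easy to express in code. Here is a hedged Python sketch (added for illustration; the dictionary representation of the incidence function is an assumption carried over from the earlier sketches).

```python
# The graph is a pair (V, g), with g mapping edge names to endpoint pairs.
def delete_vertex(V, g, v):
    """Subgraph obtained by deleting v and every edge incident with v."""
    V1 = V - {v}
    g1 = {e: ends for e, ends in g.items() if v not in ends}
    return V1, g1

def delete_edge(V, g, e):
    """Subgraph obtained by deleting only the edge e; all vertices stay."""
    g1 = {f: ends for f, ends in g.items() if f != e}
    return V, g1
```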
2. Draw a graph having the following properties, or explain why no such graph exists:
3. Find three subgraphs of G in the figure 1.5.1 with at least four vertices and six edges:
4. How many vertices are there in a graph with 20 edges if each vertex is of degree 5?
6. Does there exist a graph with five edges and degree sequence 1, 2, 3, 4?
1.6 Walks, Paths, Cycles, Circuits
Definition 1.6.1. Let u and v be two vertices in a graph G. A walk from u to v in G is an alternating sequence of n + 1 vertices and n edges of G,
(u = v1, e1, v2, e2, . . . , vn, en, vn+1 = v),
beginning with vertex u, called the initial vertex, and ending with vertex v, called the terminal vertex, in which vi and vi+1 are the endpoints of the edge ei, for i = 1, 2, . . . , n.
Figure 1.5.1
Definition 1.6.2. Let u and v be two vertices in a digraph G. A directed walk from u to v in G is an alternating sequence of n + 1 vertices and n arcs of G,
(u = v1, e1, v2, e2, . . . , vn, en, vn+1 = v),
beginning with vertex u and ending with vertex v, in which each ei is an arc from vi to vi+1, for i = 1, 2, . . . , n.
Definition 1.6.3. The length of a walk (or a directed walk) is the total number of occurrences of edges (or arcs) in the walk (or directed walk). A walk or directed walk of length zero consists of a single vertex.
A walk (or, directed walk) from a vertex u to v in a graph (or, digraph) G is also called a u − v walk (or,
directed walk). If u and v are the same, then u − v walk (or, directed walk) is called a closed walk (or, directed
walk). Otherwise, it is called an open walk (or, directed walk).
Definition 1.6.4. A walk with no repeated edges is called a trail, and a walk with no repeated vertices except
possibly the initial and terminal vertices is called a path.
Thus, from the previous definitions, it is clear that in a path, no edge can be repeated. Hence, every path is
a trail, but not conversely.
Definition 1.6.5. A walk, path, or trail is called trivial if it has only one vertex and no edges. A walk, path, or
trail that is not trivial is called nontrivial.
Definition 1.6.6. A nontrivial closed trail from a vertex u to itself is called a circuit.
Hence, a circuit is a closed walk of nonzero length from a vertex u to itself with no repeated edges.
Example 1.6.7. Consider the graph in figure 1.6.1. In this graph,
(A, a, B, b, C, f, E, e, B, d, D)
is a walk of length 5. It is an open walk from A to D with no repeated edges; hence this walk is a trail. It is not a path, however, since the vertex B appears twice. But
(B, b, C, f, E, i, D, j, G)
is a path of length 4 from vertex B to G.
Figure 1.6.1
Definition 1.6.8. A circuit that does not contain any repetition of vertices except the starting and terminal
vertices is called a cycle. A cycle of length k is called a k-cycle. A cycle is called even (odd) if it contains an
even (odd) number of edges.
It follows from definition that a 3-cycle is a triangle.
Directed walks, trails, paths, circuits, cycles are defined analogously.
Definition 1.6.9. Let P = (v1 , e1 , v2 , e2 , . . . , vn−1 , en−1 , vn ) be a walk in a graph G. A subwalk of P is a
subsequence of consecutive entries Q = (vi , ei , vi+1 , ei+1 , . . . , vk−1 , ek−1 , vk ), 1 ≤ i ≤ k ≤ n, that begins
at a vertex and ends at a vertex.
From the definition, it follows that every subwalk is a walk.
Let P = (v1 , e1 , v2 , e2 , . . . , vn−1 , en−1 , vn ) be a walk in a graph G and Q = (vi , ei , vi+1 , ei+1 , . . . ,
vk−1 , ek−1 , vk = vi ) be a closed subwalk of P . If we delete this subwalk Q from P except for the ver-
tex vi , then we obtain a new walk. This walk is denoted by P − Q and is called the reduction of P by
Q.
Theorem 1.6.10. Let G be a graph and u, v be two vertices of G. If there is a walk from u to v, there is a path
from u to v.
Proof. Let P = (u = v1, e1, v2, e2, . . . , vn−1, en−1, vn = v) be a walk. If u = v, then this is a closed walk; in this case, (u) is a path from u to u consisting of a single vertex and no edge. Suppose P = (u = v1, e1, v2, e2, . . . , vn−1, en−1, vn = v) is an open walk. If this is not a path, then vi = vj for some 1 ≤
i < j ≤ n. This shows that there is a closed subwalk Q from vi to vj . We reduce P to P − Q. Now, P − Q
is a new walk from u to v. If this walk is not a path, we repeat this deletion process of subwalks. Because the
number of closed subwalks in P is finite, we eventually obtain a path from u to v.
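The reduction process used in this proof can be mechanised. The Python sketch below (an added illustration; it identifies a walk with its vertex sequence, which suffices when the choice of edges is immaterial) repeatedly deletes closed subwalks, exactly as in the proof, until a path remains.

```python
def walk_to_path(walk):
    """Reduce a u-v walk, given as its list of vertices, to a u-v path by
    cutting out the closed subwalk between two visits to the same vertex."""
    path = []
    for v in walk:
        if v in path:
            path = path[:path.index(v)]  # delete the closed subwalk at v
        path.append(v)
    return path

print(walk_to_path(["u", "a", "b", "a", "c", "v"]))  # ['u', 'a', 'c', 'v']
```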
We can also follow the proof of the above theorem to deduce an analogous result for circuits.
Theorem 1.6.11. Every circuit contains a subwalk that is a cycle.
Proof. Let T be a circuit. Let S be the collection of all closed nontrivial subwalks of T . Because T ∈ S,
S is nonempty. Now S is a finite set. Thus we can find a member of S of minimum length. Let T1 be a
nontrivial closed subwalk (u = v1 , e1 , v2 , e2 , . . . , vn−1 , en−1 , vn = u) of T of minimum length. Since T1 is
of minimum length, T1 cannot contain a nontrivial closed subwalk other than T1 . This implies that T1 has no
repeated vertices except the vertex u. Hence T1 is a cycle.
12 UNIT 1.
Definition 1.6.12. Two vertices u and v of a graph G are said to be connected if there is a walk from u to v in G, and G itself is called connected if every pair of its vertices is connected. Define a relation R on the vertex set V by: u R v if and only if there is a u − v walk in G. We show that R is an equivalence relation.
Since the trivial walk (u) is a u − u walk in G, R is reflexive. Suppose there is a u − v walk (u = v1, e1, v2, e2, . . . , vn−1, en−1, vn = v). Then (v = vn, en−1, vn−1, . . . , v2, e1, v1 = u) is a v − u walk in G. Thus, R is symmetric. Again, suppose there is a u − v walk
(u = v1, e1, v2, e2, . . . , vn−1, en−1, vn = v)
and a v − w walk (v = u1, f1, u2, f2, . . . , fm−1, um = w). Then
(u = v1, e1, v2, e2, . . . , vn−1, en−1, vn = v = u1, f1, u2, f2, . . . , um = w)
is a walk from vertex u to vertex w. Thus, the relation R is transitive. Hence R partitions the vertex set V into disjoint equivalence classes. Let V1 be an equivalence class of R and E1 be the set of edges joining the vertices of V1 in the graph G. Then G1 = (V1, E1) is a subgraph of G. In this subgraph, we see that any two vertices are connected. This subgraph is called a component of G.
Definition 1.6.13. A subgraph H of a graph G is called a component of G if
1. any two vertices of H are connected in H, and
2. H is not properly contained in any connected subgraph of G.
The graph G in the above figure has only one component, which is G itself. The graph H, on the other hand, has two components, with vertex sets {A, B, C, D, E, F} and {a, b, c, d, e}.
From the definition, it follows that any component of a graph is always connected. Now, every equivalence class of the equivalence relation R gives a component of G. Hence, every graph can be partitioned into a finite number of components. It follows that a graph G is connected if and only if G has only one component.
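Computing the components amounts to computing the equivalence classes of R, which a breadth-first search does directly. A minimal Python sketch follows (added for illustration; it assumes the graph is given by neighbour sets adj[v], which is harmless here since loops and parallel edges do not affect connectivity).

```python
from collections import deque

def components(V, adj):
    """Partition V into the vertex sets of the components of the graph;
    adj[v] is the set of vertices adjacent to v."""
    seen, comps = set(), []
    for s in V:
        if s in seen:
            continue
        comp, queue = set(), deque([s])  # start a new component at s
        seen.add(s)
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps
```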
1.7 Few Probable Questions
4. Define a path of a graph G. If G has exactly two vertices of odd degree, then show that there exists a
path between these two vertices.
5. Define simple graph. If there is a trail from a vertex u to another vertex v of a graph G, then show that
there is a path from u to v.
6. Define connected graph. Show that a simple graph with n vertices and m components can have at most (n − m)(n − m + 1)/2 edges.
7. Let G be a connected graph with at least two vertices. If the number of edges in G is less than the
number of vertices, then prove that G has a vertex of degree 1.
Unit 2
Course Structure
• Trees, Euler’s formula for connected graphs, Spanning trees
• Complete and complete bipartite graphs
2.1 Introduction
In the previous unit, we learnt about the basic definitions of graph theory and certain properties related to
them. This unit is a continuation of the previous unit.
Objectives
After reading this unit, you will be able to
• define bipartite graphs and complete bipartite graphs
• define Euler circuits and Euler trails
• define trees and spanning trees
2.2 Bipartite graphs
Definition 2.2.1. A graph G = (V, E) is called a bipartite graph if its vertex set V can be partitioned into two disjoint nonempty subsets V1 and V2 such that each edge of G is incident with one vertex in V1 and one vertex in V2. The partition V = V1 ∪ V2 is called a bipartition of G.
In the figure 2.2.1, the graph in (a) is a bipartite graph with bipartition {A} and {B, C, D}, whereas the second graph is not bipartite, as we can easily verify. (Verify!)
Definition 2.2.2. A bipartite graph G with bipartition V1 ∪ V2 is called a complete bipartite graph on m and
n vertices if the subsets V1 and V2 contain m and n vertices, respectively, such that there is an edge between
each pair of vertices v1 ∈ V1 and v2 ∈ V2 . A complete bipartite graph with m and n vertices is denoted by
Km,n .
Figure 2.2.1
The two graphs in the above figure represent two complete bipartite graphs: (a) is K1,3 while (b) is K2,3. Note that the number of edges in the graph Km,n is mn.
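For instance, K2,3 in (b) above has 2 · 3 = 6 edges, one for each pair (v1, v2) with v1 ∈ V1 and v2 ∈ V2.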
Definition 2.2.3. Let G be a graph. Then the distance between two vertices u, v of G, written as d(u, v), is
the length of a shortest path, if any exists, from u to v.
We will now deduce a necessary and sufficient condition for a graph to be bipartite.
Theorem 2.2.4. A graph is bipartite if and only if it does not contain any cycle of odd length.
Proof. Let G = (V, E) be a bipartite graph with bipartition V = V1 ∪ V2 . Now, each edge of G is incident
with one vertex in V1 and one vertex in V2 . Let (v1 , e1 , v2 , e2 , . . . , vk , ek , v1 ) be a cycle in G. Because vi and
vi+1 are end vertices of ei , for i = 1, 2, . . . , k (assuming vk+1 = v1 ), it follows that for i = 1, 2, . . . , k, if
vi ∈ V1 , then vi+1 ∈ V2 . Suppose v1 ∈ V1 . This implies that vk ∈ V2 . Also it follows that vi ∈ V1 if and only
if i is odd. Now vk ∈ V2 , which implies that k is even and hence the length of this cycle is even. This implies
that the length of each cycle is even.
Conversely, let G be a graph such that G has no odd cycle. Suppose G is partitioned into components C1, C2, . . . , Cm, m ≥ 1. If we can show that each Ci is a bipartite graph, then G will be so too. We may therefore assume that G is connected. Let u be an arbitrary but fixed vertex of G. Define the subsets V1 and V2 by
V1 = {v ∈ V : d(u, v) is even} and V2 = {v ∈ V : d(u, v) is odd}.
From our assumption that G is a connected graph, it follows that every vertex of G is either in V1 or in V2. Then {V1, V2} is a partition of V. Because d(u, u) = 0, it follows that u ∈ V1. Let v be a vertex adjacent to u. Then d(u, v) = 1. Hence, v ∈ V2.
Suppose there are two distinct vertices v and w in V1 and suppose there exists an edge e with v, w as end
vertices. Then there is a walk from u to v in G and hence there is a shortest path, say P1 , from u to v.
Similarly, we have a shortest path P2 , from u to w. Because v and w belong to V1 , these two shortest paths
are of even length. Paths P1 and P2 may have several vertices and edges in common.
Now, starting from u, let x be the last vertex common to both P1 and P2. Let P1∗ be the section of the path P1 from u to x and let P2∗ be the section of the path P2 from u to x. Because P1 and P2 are shortest paths, P1∗ and P2∗ have equal lengths. Let P1′ be the part of P1 from x to v and P2′ be the part of P2 from x to w. It follows that the lengths of P1′ and P2′ are both even or both odd. Now the walk P1′, followed by e, followed by P2′ traversed backwards, forms a closed walk C from x to x. Moreover, C does not contain any repetition of vertices. Hence C is a cycle. Because the lengths of the paths P1′ and P2′ are both even or both odd, C must be of odd length, which is a contradiction. Thus, v and w cannot both be in V1.
Similarly, we can show that v and w cannot both belong to V2 . Hence each edge of G connects one vertex of
V1 with one vertex of V2 . Consequently, G is bipartite.
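The proof above is constructive, and the construction is exactly a breadth-first 2-colouring: colour each vertex by the parity of its distance from a fixed start vertex, one component at a time, and report failure when some edge joins two like-coloured vertices (which, by theorem 2.2.4, happens precisely when the graph has an odd cycle). Here is a hedged Python sketch of this idea, not part of the original text; the neighbour-set representation adj is an assumption.

```python
from collections import deque

def bipartition(V, adj):
    """Return a bipartition (V1, V2) of the graph, or None if it has an
    odd cycle; adj[v] is the set of vertices adjacent to v."""
    colour = {}
    for s in V:                          # handle each component in turn
        if s in colour:
            continue
        colour[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in colour:
                    colour[w] = 1 - colour[v]   # opposite side of v
                    queue.append(w)
                elif colour[w] == colour[v]:
                    return None                 # odd cycle detected
    V1 = {v for v in V if colour[v] == 0}
    return V1, V - V1
```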
3. Prove that a simple graph with a cycle of length 3 can’t be a bipartite graph.
2.3 Special Circuits
2.3.1 Euler Circuits
Definition 2.3.1. A circuit in a graph that includes all the edges of the graph is called an Euler circuit. A graph G is called Eulerian if either G is the trivial graph or G has an Euler circuit.
Recall the Königsberg bridge problem from the beginning of unit 1. The problem was to determine whether it is possible to take a walk that crosses each bridge exactly once before returning to the starting point. Euler converted this into a problem of graph theory as follows: each of the four land areas A, B, C and D is taken as a vertex of a graph, and the seven bridges as its seven edges. The problem then reduces to finding a circuit in the graph that contains all the edges, that is, to finding an Euler circuit, or equivalently, to showing that the graph is Eulerian. It is evident from the figure that the graph has no Euler circuit.
Example 2.3.2. Consider the graph below.
Every vertex of the above graph is of even degree. In fact, this is a feature of Eulerian graphs, as we will soon show.
Theorem 2.3.3. If a connected graph G is Eulerian, then every vertex of G has even degree.
Proof. Suppose that G is Eulerian.
First suppose that G is the trivial graph. Then G has only one vertex v and no edges. Hence the degree of
v is 0 which is even.
Next suppose that G contains more than one vertex. Since G is Eulerian, it has an Euler circuit, say
C : (v1 , e1 , v2 , e2 , v3 , . . . , en−1 , vn = v1 )
from a vertex v1 to vn = v1 . Now, C contains all the vertices (since G is connected) and all the edges of
G. However, there are no repeated edges in C, though in C a vertex may appear more than once. Let u be a
vertex of G. Since G is connected, u is not an isolated vertex. So u is the end vertex of some edge. Since C
contains all the edges, it follows that u is a member of C.
Suppose u is v1, and regard this as the first appearance of u in C. Since vn = v1, the vertex u also appears as vn, its last appearance in C. For these two appearances of u, the edges e1 and en−1 together contribute 2 to the degree of u.
Suppose now u is vi in C for some i, 1 < i < n. Then u is an end vertex of the edges ei−1 and ei . These
edges together contribute 2 to the degree of u. It now follows that the degree of any vertex in C is even. Hence
the degree of any vertex in G is even.
Conversely, suppose G is a connected graph in which every vertex is of even degree. We shall show that G contains an Euler circuit. To do so, we first prove the following lemma.
Lemma 2.3.4. Let G be a connected graph with one or two vertices. If every vertex of G is of even degree,
then G has an Euler circuit.
Proof. Suppose G is a graph with only one vertex, say u. There may exist zero or more loops at u; in any case, the number of loops at u is finite. If there is no loop at u, then (u) is an Euler circuit of G. Now suppose that there are loops e1, e2, . . . , en, n ≥ 1, at u. Then (u, e1, u, e2, . . . , en, u) is an Euler circuit of G. Hence, G contains an Euler circuit.
Suppose now that G has two vertices u and v, both of even degree. Because G is connected, there is at least one edge between u and v; and since each loop contributes 2 to a degree, the even degree of u forces the number of edges between u and v to be even. Let {f1, f2, . . . , f2k}, k ≥ 1, be the set of all edges between u and v. Let e1, e2, . . . , en, n ≥ 0, be the loops at u and let g1, g2, . . . , gm, m ≥ 0, be the loops at v. (If n = 0, then there are no loops at u; similarly, if m = 0, there are no loops at v.) Now,
(u, e1, u, . . . , en, u, f1, v, g1, v, . . . , gm, v, f2, u, f3, v, f4, u, . . . , f2k, u)
is a trail that begins at u, traverses all the loops incident with u, traverses one edge from u to v, traverses all the loops at v, then traverses one edge from v to u, and then traverses the remaining edges between u and v. This trail
does not contain any repeated edges. Hence, it is a circuit from u to u. Because this circuit contains all the
edges of G, it follows that the graph G has an Euler circuit.
Theorem 2.3.5. Let G be a connected graph such that every vertex of G is of even degree. Then G has an
Euler circuit.
Proof. Suppose G has n edges. We prove by induction on the number of edges of G to show that G has an
Euler circuit.
Basis step: Suppose n = 0. Because G has no edges, it follows that G has a single vertex, say u. Then (u)
is an Euler circuit.
Inductive hypothesis: Let n be a positive integer. Assume that any connected graph with k edges, 0 ≤ k <
n, in which every vertex has even degree has an Euler circuit.
Inductive step: Let G = (V, E) be a connected graph with n edges and the degree of each vertex of G is
even. If the number of vertices of G is 1 or 2, then by previous lemma, it follows that G has an Euler circuit.
So assume that G has at least three vertices.
Since G is connected, there are vertices v1 , v2 , v3 and edges e1 , e2 such that v1 , v2 are the end vertices of
e1 , and v2 , v3 are the end vertices of e2 . Now consider the subgraph G1 = (V1 , E1 ), where V1 = V and
E1 = E − {e1 , e2 }. Next we add a new edge e with v1 , v3 as end vertices to the subgraph and obtain a new
graph G2 = (V2 , E2 ), where V2 = V , E2 = E1 ∪ {e}.
20 UNIT 2.
Notice that the graph G2 is obtained from G by deleting edges e1 , e2 , but not removing any vertices, and
adding a new edge e with end vertices v1 and v3 .
In G, suppose deg(v1) = r, deg(v2) = m, and deg(v3) = t. Because we deleted the edges e1, e2, in G1 we have deg(v1) = r − 1, deg(v2) = m − 2, and deg(v3) = t − 1. Now in graph G2, we add a new edge e with end vertices v1 and v3. Hence, in graph G2, we have deg(v1) = r, deg(v2) = m − 2, deg(v3) = t. While
constructing G1 from G and G2 from G1 , the other vertices of G were not disturbed; i.e., their degree in G2
is the same as their degree in G. Thus, it follows that every vertex of G2 is of even degree.
Now graph G2 may not be a connected graph. We show that the number of components of G2 is less than
or equal to two.
Since v1 and v3 are the end vertices of the edge e in G2 , it follows that v1 and v3 belong to the same
component of G2 , say C1 . Now, vertex v2 may not be in C1 . Let C2 be the component of G2 that contains v2 .
Let v be a vertex of G2 . Then v is also a vertex of G. Since G is a connected graph, there is a path P from v
to v1 in G.
If P contains one of the edges e1 or e2, then P cannot be a path from v to v1 in G2. Let P1 be the path in G2 that is the portion of the path P starting at v whose edges are also in G2. Path P1 may terminate at v1, v2, or v3. If P1 is a path from v to v1 in G2, then v and v1 belong to the same component, C1. If P1 ends at v3, then
(P1 , e, v1 ) is a path from v to v1 . Hence in this case, v also belongs to the same component, C1 . Suppose P1
ends at v2. Then v belongs to the component C2. Thus, any vertex v of G2 belongs to either C1 or C2. Hence, G2 has one (if C1 = C2) or two components.
Suppose G2 has only one component, C1 . Then G2 is a connected graph with n − 1 edges. Thus, by the
inductive hypothesis G2 has an Euler circuit, say T1 . From circuit T1 , we can construct an Euler circuit T in
G by simply replacing the subpath (v1 , e, v3 ) by the path (v1 , e1 , v2 , e2 , v3 ). Hence in this case, we find that
G is Eulerian.
Suppose G2 has two components, C1 and C2 . Now, each component Ci , i = 1, 2 is a connected graph such
that each vertex has even degree and the number of edges in Ci is ni < n. Hence, by the inductive hypothesis,
Ci has an Euler circuit Ti , i = 1, 2. Now, T1 contains v1 , v3 and T2 contains v2 . Hence (v1 , e, v3 ) is a subpath
of T1 . Moreover, we can assume that T2 is a circuit from v2 to v2 .
We now construct an Euler circuit in G by modifying T1 as follows: in T1, replace (v1, e, v3) by (v1, e1, v2), followed by T2, followed by (v2, e2, v3). Thus, we find that G has an Euler circuit. The result now follows by induction.
The above theorem is an effective way of determining when a connected graph is Eulerian.
Example 2.3.6. Consider the Königsberg bridge problem. All the vertices in the graph are of odd degree.
Then by the preceding two theorems, we can say that there does not exist an Euler circuit for the problem.
But if we add two more edges as shown in the figure, then the resulting graph is Eulerian since every vertex
is of even degree.
Definition 2.3.7. An open trail in a graph is called an Euler trail if it contains all the edges.
Example 2.3.8. Consider the following graph. It is a connected graph having two vertices of odd degree. So
it does not have an Euler circuit but the trail (B, g, F, e, E, d, D, c, C, b, B, a, A, f, F ) contains all the edges
of G. Hence this is an Euler trail.
Theorem 2.3.9. A connected graph G has an open Euler trail if and only if G has only two vertices of odd
degree.
Proof. Suppose G has an open Euler trail P from a vertex u to a vertex v of G. Construct a new graph G1 by
adding a new edge e to G with u and v as the end vertices. In G1 , the trail P with e forms an Euler circuit.
Hence every vertex of G1 is of even degree. In graph G1 , e contributes 1 each to the degree of the vertices u
and v. Since G does not contain the edge e, it follows that u and v are the only vertices of odd degree in G.
Conversely assume that a connected graph G has only two vertices u and v of odd degree. Construct a new
graph, G1 , by adding a new edge, e, to G with u and v as the end vertices. Then G1 is a connected graph
where every vertex is of even degree. Then G1 contains an Euler circuit, say P . Now, (u, e, v) is a subpath of
P . This subpath is not present in G. Hence, if we delete (u, e, v) from P , then we obtain an open Euler trail
P1 from u to v in G. This proves the theorem.
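Theorems 2.3.5 and 2.3.9 together give a purely degree-based test for a connected graph. The Python sketch below (an added illustration; it assumes its input graph is connected and does not check this, and the labelling of the Königsberg land areas and bridges is one standard choice) classifies a graph by counting its odd degree vertices.

```python
def euler_classification(V, g):
    """For a connected graph, return 'Euler circuit' if every vertex has
    even degree, 'open Euler trail' if exactly two vertices have odd
    degree, and 'neither' otherwise. g maps edges to endpoint pairs."""
    def degree(v):
        return sum((x == v) + (y == v) for (x, y) in g.values())
    odd = [v for v in V if degree(v) % 2 == 1]
    if len(odd) == 0:
        return "Euler circuit"
    if len(odd) == 2:
        return "open Euler trail"
    return "neither"

# The Königsberg graph: four land areas and seven bridges.
V = {"A", "B", "C", "D"}
g = {1: ("A", "B"), 2: ("A", "B"), 3: ("A", "C"), 4: ("A", "C"),
     5: ("A", "D"), 6: ("B", "D"), 7: ("C", "D")}
print(euler_classification(V, g))  # neither: all four degrees are odd
```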
2.4 Trees
Definition 2.4.1. A graph that is connected and has no cycles is called a tree. Generally, a graph that does not
contain any cycles is called an acyclic graph.
Consider the graphs in the figure below. All of them are connected. The graphs (a) and (c) clearly contain no cycles and hence are trees, while the graphs (b) and (d) contain cycles and hence are not trees.
Let T be a tree. Then T is a simple connected graph, so T does not have any parallel edges or loops. Let u
and v be two vertices in T. It follows that there is at most one edge connecting u and v. Since T is connected,
there is a path from u to v. Let P = (u, e1 , u1 , e2 , . . . , uk , ek , v). If no confusion arises, then we write the
path P as (u, u1 , . . . , uk , v), that is, when listing the vertices of the path, we will omit the edges.
Theorem 2.4.3. Let u and v be two vertices of a tree T . Then there exists only one path from u to v.
Proof. Since T is connected, there is at least one path from u to v. If possible, suppose P = (u, u1, . . . , uk, v) and Q = (u, v1, . . . , vt, v) are two distinct paths from u to v. Following P from u to v and then Q backwards from v to u gives the closed walk
(u, u1, . . . , uk, v, vt, . . . , v1, u),
which, since P and Q are distinct, contains a cycle. This contradicts the fact that T is a tree. Hence there is exactly one path from u to v.
Theorem 2.4.4. In a tree with more than one vertex, there are at least two vertices of degree 1.
Proof. Let T be a tree with more than one vertex. Since T is a connected graph with at least two vertices,
there is a path with at least two distinct vertices. Because the number of vertices and the number of edges is
finite, the number of paths in T is also finite. Thus we can find a path P of maximal length. Suppose path P
is from vertex u to vertex v. We show that deg(u) = deg(v) = 1.
Suppose deg(v) ≠ 1. Let P be the path (u = v1, e1, v2, e2, v3, . . . , vk−1, ek−1, v). Since deg(v) ≠ 1, there exists an edge ek with v as an end vertex such that ek ≠ ek−1. Since T has no loops, the other end of ek cannot be v. Suppose the other end is vk. If vk = vi for some i such that 1 ≤ i ≤ k − 1, then (v, ek, vi, ei+1, vi+1, . . . , vk−1, ek−1, v) is a cycle from v to v, which contradicts the fact that T is a tree. If vk ≠ vi for all 1 ≤ i ≤ k − 1, then we get the path (v1, e1, v2, e2, v3, . . . , vk−1, ek−1, v, ek, vk), whose length is greater than that of P. This contradicts the fact that the path P is of maximal length in T. It now follows that deg(v) = 1. Similarly, we can show that deg(u) = 1.
The converse of the above theorem is not true as shown by the following example.
Theorem 2.4.6. Let T be a tree with n vertices, n ≥ 1. Then T has exactly n − 1 edges.
Proof. We prove the theorem by induction on the number of vertices n.
Basis step: If n = 1, then T has a single vertex and, being acyclic, has no edges. So T has 0 = n − 1 edges.
Inductive hypothesis: Let k ≥ 1 be a positive integer. We assume that the theorem holds for any tree with
k vertices.
Inductive step: Let T be a tree with k + 1 vertices. Since k + 1 ≥ 2, it follows from theorem 2.4.4 that
T has at least two vertices of degree 1. Let u be a vertex of degree 1 in T . We construct a new graph G by
deleting u from T and also the edge e, which is incident on u. Now, G is still a connected graph and does
not contain any cycle. Hence G is a tree with k vertices. By inductive hypothesis, we find that G has exactly
k − 1 edges. This implies that T has k edges. Hence by induction, the theorem holds for any integer n.
The converse of the above theorem is not true in general. This is proved by the following example.
Example 2.4.7. The graph in the figure contains 4 vertices and 3 = 4 − 1 edges. But it is clearly not a tree. Indeed, it is not even connected.
Theorem 2.4.8. Let T be a graph with n vertices. Then the following are equivalent:
1. T is a tree.
2. T has no loops and if u and v are two distinct vertices in T , then there exists only one path from u to v.
2.5 Spanning Tree
Definition 2.5.1. A tree T is called a spanning tree of a graph G if T is a subgraph of G and T contains all
the vertices of G.
Note that the spanning tree of a graph need not be unique. The following theorem gives a necessary and
sufficient condition for a graph to have a spanning tree.
Theorem 2.5.2. A graph G has a spanning tree if and only if G is connected.
Proof. Suppose G has a spanning tree G1 . Then G1 contains all the vertices of G, and between any two vertices
there exists a path in G1 , which is also a path of G. Hence, G is a connected graph.
Conversely, suppose G is a connected graph. If G has no cycles, then G is a tree. Suppose G has cycles.
Let C1 be a cycle in G and e1 be an edge in C1 . Now construct the graph G1 = G \ {e1 }, which is obtained
by deleting the edge e1 from G but not removing any vertex from G. Clearly, G1 is a subgraph of G and it
contains all the vertices of G. Because e1 is an edge of a cycle, G1 is still a connected graph. If G1 is acyclic,
then G1 is a tree. If G1 contains a cycle C2 , then we delete an edge e2 from C2 and construct a connected
subgraph G2 that contains all the vertices. If G2 contains cycles, then we continue this process. Since G has
a finite number of edges, it contains only a finite number of cycles. Hence, continuing the process of deleting
an edge from a cycle, we eventually obtain a connected subgraph Gk that contains all the vertices of G and is
also acyclic. It follows that Gk is a spanning tree of G.
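The edge-deletion argument above can be mechanised. The following is a minimal Python sketch (not part of the text; the function name and input format are illustrative) that builds a spanning tree of a connected graph. Keeping an edge only when it joins two previously separate components is equivalent to the proof's procedure of deleting one edge from each remaining cycle.

    def spanning_tree(n, edges):
        """Return the edges of a spanning tree of a connected graph.

        n     -- number of vertices, labelled 0..n-1
        edges -- list of (u, v) pairs
        """
        parent = list(range(n))

        def find(x):                      # union-find with path compression
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        tree = []
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:                  # edge joins two components: keep it
                parent[ru] = rv
                tree.append((u, v))
        return tree

    # A connected graph on 5 vertices containing two cycles; the spanning
    # tree returned has exactly 5 - 1 = 4 edges, as theorem 2.4.6 predicts.
    print(spanning_tree(5, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]))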
Exercise 2.5.3. 1. Draw a tree with 9 vertices such that three vertices are of degree 3.
4. Suppose there exists a simple connected graph with 16 vertices that has 15 edges. Does it contain a
vertex of degree 1? Justify your answer.
2.6 Few Probable Questions

2. Define Euler circuit. Deduce a necessary condition for a connected graph to be Eulerian.
3. Deduce a necessary and sufficient condition for a connected graph G to have an Euler trail.
4. Define a tree. Show that in a tree T , there exists only one path between two vertices of T .
5. Show that in a tree with more than one vertex, there exist at least two vertices of degree 1. Is the
converse true? Justify.
6. Show that a tree with n vertices has n − 1 edges. Is the converse true? Justify.
Unit 3
Course Structure
• Planar graphs and their properties
3.1 Introduction
The present unit starts with the matrix representation of graphs. We have dealt with two types of matrix
representations of graphs, viz., the adjacency matrix and the incidence matrix. The matrix representations are
compact and say everything about the graph in a very simple manner as we shall see.
The next topic that is covered is the graph isomorphisms. A graph can exist in different forms having the
same number of vertices, edges, and also the same edge connectivity. Such graphs are called isomorphic
graphs or "equal" or "same" graphs.
Next we have covered planar graphs. A graph that can be drawn on a plane sheet of paper so that no two
of its edges cross is called a planar graph, and a graph that does not satisfy this property is called a non-planar graph.
Of particular importance are the connected simple planar graphs, for which we state Kuratowski's
theorem characterising simple non-planar graphs. The proof of this is, however, excluded.
Lastly, we have dealt with the graph coloring. It is nothing but a simple way of labelling graph compo-
nents such as vertices, edges, and regions under some constraints. Vertex coloring and edge coloring are two
common graph coloring problems. The graph coloring problem has a huge number of applications, such as making
schedules or timetables, sudoku, map coloring, etc.
We have now given a brief idea about all that we are about to study. Let’s explore!
Objectives
After reading this unit, you will be able to
• define planar graphs and related terms like faces, boundaries, etc.
Definition 3.2.1. Let G be a graph with n vertices v1 , v2 , . . . , vn , where n > 0. The adjacency matrix AG of G with respect to this listing of the vertices is the n × n matrix [aij ] in which aij is the number of edges joining vi and vj .
Since aij is the number of edges from vi to vj , the adjacency matrix AG is a square matrix over the set of
non-negative integers.
If G is a digraph, then the adjacency matrix AG with respect to the particular listing v1 , v2 , . . . , vn of the n
vertices of G is an n × n matrix [aij ] such that the (i, j)th entry is the number of arcs from vi to vj .
Example 3.2.2. Consider a graph G with vertex set {A, B, C, D, E, F }. Then the adjacency matrix of G with respect to this listing is the 6 × 6 matrix formed as in the definition above (the graph and its matrix are omitted here).
Example 3.2.3. Consider the graph of figure 3.2.1. The vertices of the graph are {A, B, C, D}. The adjacency
matrix of the graph with respect to the listing A, B, C, D is
    [ 0 2 1 0 ]
    [ 2 0 1 1 ]
    [ 1 1 0 0 ]
    [ 0 1 0 1 ]
Figure 3.2.1
Notice that the matrix AG is symmetric since aij = aji . But, if G is a digraph, then the
adjacency matrix need not be symmetric. The adjacency matrix has the following properties:
1. If G does not contain any loops and parallel edges, then each element of AG is either 0 or 1.
2. If G does not contain any loops, then all the diagonal elements of AG are 0.
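The adjacency matrix is easy to compute from a vertex listing and an edge list. Below is a small Python sketch (illustrative, not from the text); it counts a loop once in the corresponding diagonal entry, which matches the matrix of example 3.2.3 (some texts count a loop twice).

    def adjacency_matrix(vertices, edges):
        """Adjacency matrix of an undirected graph.

        vertices -- ordered listing of the vertices, e.g. 'ABCD'
        edges    -- list of (u, v) pairs; a loop is written (u, u)
        """
        idx = {v: i for i, v in enumerate(vertices)}
        n = len(vertices)
        A = [[0] * n for _ in range(n)]
        for u, v in edges:
            A[idx[u]][idx[v]] += 1
            if u != v:                # record a non-loop edge in both entries
                A[idx[v]][idx[u]] += 1
        return A

    # Two parallel A-B edges, edges A-C, B-C, B-D and a loop at D
    # reproduce the matrix of example 3.2.3.
    for row in adjacency_matrix('ABCD', [('A', 'B'), ('A', 'B'), ('A', 'C'),
                                         ('B', 'C'), ('B', 'D'), ('D', 'D')]):
        print(row)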
Example 3.2.4. Let A be a given 5 × 5 symmetric matrix over the non-negative integers (the matrix itself is omitted here). We construct a graph G such that AG = A. For this, we denote the rows by A, B, C, D, E and the columns
by A, B, C, D, E. Now we draw a graph with vertices A, B, C, D, E. Since (1, 1) and (4, 4) are the only
diagonal elements with entries equal to one, we draw one loop each at the vertices A and D only. Now, we
see that the (1, 2)th element = the (2, 1)th element = 0, so there is no edge between A and B. Again, the (1, 3)th element
= the (3, 1)th element = 1, so there exists one edge between A and C. Continuing in this way, we find the
following graph
Definition 3.2.5. Let G be a graph with n vertices v1 , v2 , . . . , vn , where n > 0 and m edges e1 , e2 , . . . , em .
The incidence matrix IG with respect to the ordering v1 , v2 , . . . , vn of vertices and e1 , e2 , . . . , em edges is an
n × m matrix [aij ] such that aij = 1 if the edge ej is incident on the vertex vi , and aij = 0 otherwise.
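The incidence matrix can be generated in the same mechanical way. A minimal Python sketch follows (illustrative; it assumes the convention of the definition above, with an entry 1 exactly when the vertex is an end of the edge):

    def incidence_matrix(vertices, edges):
        """Incidence matrix: rows indexed by vertices, columns by edges."""
        idx = {v: i for i, v in enumerate(vertices)}
        M = [[0] * len(edges) for _ in vertices]
        for j, (u, v) in enumerate(edges):
            M[idx[u]][j] = 1
            M[idx[v]][j] = 1
        return M

    # A triangle on A, B, C: every column has exactly two 1s, and the
    # sum of each row equals the degree of the corresponding vertex.
    for row in incidence_matrix('ABC', [('A', 'B'), ('B', 'C'), ('A', 'C')]):
        print(row)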
Exercise 3.2.6. 1. Find the adjacency matrix of the following graphs with respect to the listing A, B, C, D
of the vertices:
3. Find the adjacency matrix of the digraph with respect to the listing A, B, C, D:
3.3 Isomorphism
Definition 3.3.1. Let G1 = (V1 , E1 , g1 ) and G2 = (V2 , E2 , g2 ) be two graphs. G1 is said to be isomorphic to
G2 if there exists a one-to-one correspondence f : V1 → V2 and a one-to-one correspondence h : E1 → E2
in such a way that for any edge ek ∈ E1 , g1 (ek ) = {vi , vj } in G1 if and only if g2 (h(ek )) = {f (vi ), f (vj )}
in G2 .
In other words, if G1 = (V1 , E1 ) and G2 = (V2 , E2 ) are two graphs, then G1 is said to be isomorphic to
G2 if there exist a one-to-one correspondence f : V1 → V2 and a one-to-one correspondence h : E1 → E2
such that for any edge ek in E1 , vertices vi , vj are end vertices of ek in G1 if and only if f (vi ), f (vj ) are end
vertices of h(ek ) in G2 . When we say two graphs are same, we mean they are isomorphic to each other.
Example 3.3.2. Let G and H be graphs as in figure 3.3.1. Both these graphs have six vertices and six edges.
Figure 3.3.1
Moreover, both the graphs are simple. The degree sequence of both the graphs is 2, 2, 2, 2, 2, 2.
Let us define f : V1 → V2 and h : E1 → E2 by
f : A ↦ U, B ↦ V, C ↦ W, D ↦ X, E ↦ Y, F ↦ Z;
h : a ↦ u, b ↦ v, c ↦ w, d ↦ x, e ↦ y, f ↦ z.
Then we can check that these maps f and h serve as the one-to-one correspondences between the vertex
sets and edge sets of the two graphs that satisfy the isomorphism conditions. Thus G and H are isomorphic.
If two graphs G1 and G2 are isomorphic, then it is written as G1 ' G2 .
The following theorem is evident.
Theorem 3.3.3. Let G, G1 , G2 and G3 be graphs. Then the following assertions hold:
(i) G ' G;
(ii) If G1 ' G2 , then G2 ' G1 ;
(iii) If G1 ' G2 , and G2 ' G3 , then G1 ' G3 .
Proof. Left as an exercise.
Definition 3.3.4. Two graphs G1 and G2 are said to be different if G1 is not isomorphic to G2 .
Let us write a few properties of isomorphic graphs.
1. Two graphs G1 and G2 are isomorphic if and only if there exists a one-to-one correspondence f between
the vertex sets of them such that if v1 , v2 are adjacent vertices in G1 , then f (v1 ) and f (v2 ) are adjacent
vertices in G2 .
2. If two graphs G1 and G2 are isomorphic, then G1 has a vertex of degree k if and only if G2 has a vertex
of degree k.
3. If two graphs G1 and G2 are isomorphic, then G1 has a cycle of length k if and only if G2 has a cycle of
length k.
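For small simple graphs, these conditions can be tested by brute force: try every bijection between the vertex sets and check that it carries edges to edges. A Python sketch follows (illustrative names; exponential in the number of vertices, so only usable for tiny examples):

    from itertools import permutations

    def are_isomorphic(v1, e1, v2, e2):
        """Brute-force isomorphism test for simple graphs.

        v1, v2 -- vertex listings; e1, e2 -- lists of edges (u, v).
        Tries every bijection f : V1 -> V2 and checks that {u, v} is an
        edge of G1 exactly when {f(u), f(v)} is an edge of G2.
        """
        if len(v1) != len(v2) or len(e1) != len(e2):
            return False
        s2 = {frozenset(e) for e in e2}
        for perm in permutations(v2):
            f = dict(zip(v1, perm))
            if {frozenset((f[a], f[b])) for a, b in e1} == s2:
                return True
        return False

    # Two 4-cycles drawn with different labels are isomorphic.
    print(are_isomorphic('ABCD', [('A','B'), ('B','C'), ('C','D'), ('D','A')],
                         'WXYZ', [('W','X'), ('X','Y'), ('Y','Z'), ('Z','W')]))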
Figure: two drawings, (a) and (b), of the same graph.
We can also say that the above two graphs are isomorphic or equal. In the latter graph, notice that no two
edges intersect except at the vertices. Such graphs are called planar graphs as we will formally define now.
3.4 Planar Graphs

Definition 3.4.1. A graph G is called a planar graph if it can be drawn in the plane such that no two edges
intersect except at the vertices, which may be the common end points of the edges. We can also say that a graph
is planar if it is isomorphic to a graph having the property said above.
Definition 3.4.2. A graph drawn in the plane (on paper or a chalkboard) is called a plane graph if no two
edges meet at any point except the common vertex, if they meet at all.
From the preceding two definitions, it is clear that a graph is a planar graph if and only if it has a pictorial
representation in a plane which is a plane graph. The pictorial representation of a planar graph G as a plane
graph is called the planar representation of G.
Consider the planar representation of a planar graph given below
Let G denote the graph in the above figure. Then G divides the plane into different regions, called the
faces of G. Suppose x is a point in the plane that is not a vertex of G or a point on any edge of G. Then a
face of G containing x is the set of all points on the plane that can be reached from x by a straight line or a
curved line that does not cross any edge of G or pass through any vertex of G. Thus, it follows that a face is
a region produced by a planar graph that is an area of the plane bounded by the edges and that is not further
subdivided into sub-areas.
The set of edges that bound a region is called its boundary. Of course, there exists a region of infinite area
in any plane graph G. This is the part of the plane that lies outside the planar representation of G. This region
is called the exterior face. A face that is not exterior is called an interior face. We illustrate these concepts
by the following example.
P: Bounded by the cycle (A, a, B, b, C, f, A). The boundary of P consists of the edges a, b, f .
Q: Bounded by the cycle (D, d, E, e, C, c, D). The boundary of Q consists of the edges d, e, c.
R: The part of the plane outside this plane graph. The boundary of this region consists of the edges
a, b, c, d, e and f .
It follows that this plane graph contains three faces, namely P, Q and R.
For this plane graph, the number of edges ne = 6, the number of vertices nv = 5, the number of faces
nf = 3, and we see that
nv − ne + nf = 2.
Theorem 3.4.4. Let G be a connected planar graph with nv vertices, ne edges and nf faces. Then nv − ne +
nf = 2.
Proof. We use induction on the number of edges. If G has no cycles, then G is a tree with t vertices, t − 1 edges, and only the exterior face, so that nv − ne + nf = t − (t − 1) + 1 = 2. Suppose now that the theorem holds for every connected planar graph with fewer than k edges, and let G be a connected planar graph with t vertices, k edges and m faces that contains a cycle C. Let e be an edge of C and construct G1 = G \ {e}. In the construction of G1 , we delete only the edge without deleting any vertex. Therefore, nv = t, ne = k − 1. Now,
C \ {e} will not form a boundary in G1 . Thus in G1 , nf = m − 1. Hence G1 is a connected planar graph
with nv = t vertices, ne = k − 1 edges, and nf = m − 1 faces. By the inductive hypothesis, it follows that
t − (k − 1) + (m − 1) = 2. This implies that t − k + m = 2. Hence, nv − ne + nf = 2.
The result now follows by induction.
Exercise 3.4.5. Verify the above theorem for the following graphs:
Theorem 3.4.7. Let G be a connected simple planar graph with nv ≥ 3 vertices and ne edges. Then
ne ≤ 3nv − 6.
Proof. Since G is a planar graph, it has a planar representation. Consider a planar representation of G.
Suppose nv = 3. Because G is a simple connected graph with 3 vertices, it follows that ne ≤ 3. Then
ne ≤ 3 · 3 − 6 = 3, which implies that ne ≤ 3nv − 6.
Suppose now nv ≥ 3. If G does not contain any cycles then we can show that ne = nv − 1. Now,
3nv − 6 = (nv − 1) + (nv − 2) + (nv − 3) > (nv − 1) = ne .
Suppose G contains a cycle. Because G is simple, every cycle contains at least 3 edges. Thus, the number of
edges in the boundary of a face is ≥ 3. Now, there are nf faces and every edge is a member of some boundary
of the planar representation. Hence, the total number of appearances of the edges in boundaries of the nf faces is
≥ 3nf . In counting these appearances, an edge may be counted at most two times. Thus, the total number of
appearances of the ne edges in boundaries is ≤ 2ne . Hence, nf · 3 ≤ 2ne . Now, by Euler’s theorem,
nv − ne + nf = 2
⇒ 3nv − 3ne + 3nf = 6
⇒ 3ne = 3nv + 3nf − 6
⇒ 3ne ≤ 3nv + 2ne − 6
⇒ ne ≤ 3nv − 6.
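The inequality gives a quick computational test for non-planarity. A short Python sketch (illustrative):

    def passes_edge_bound(nv, ne):
        """Necessary condition of theorem 3.4.7 for a connected simple
        planar graph with nv >= 3 vertices: ne <= 3*nv - 6."""
        return ne <= 3 * nv - 6

    # K5 has 5 vertices and 10 edges; 10 > 3*5 - 6 = 9, so K5 is non-planar.
    print(passes_edge_bound(5, 10))   # False
    # K3,3 (6 vertices, 9 edges) passes the test even though it is
    # non-planar, so the condition is necessary but not sufficient.
    print(passes_edge_bound(6, 9))    # True, yet K3,3 is non-planar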
Let G = (V, E) be a graph. Suppose that e is an edge with v1 , v2 as end vertices. Construct the subgraph
G1 = G \ {e}. To construct G1 , we have deleted edge e without deleting any vertices from G. We now
construct a new graph, G2 = (V2 , E2 ), by taking V2 = V ∪ {w}, E2 = (E \ {e}) ∪ {f1 , f2 } such that w ∉ V ,
f1 , f2 ∉ E, v1 , w are end vertices of f1 and v2 , w are end vertices of f2 . The process of obtaining G2 from G
is called a one-step subdivision of an edge of G.
Definition 3.4.9. A graph H is said to be a subdivision of a graph G if there exist graphs H0 , H1 , H2 , . . . , Hn
such that H0 = G, Hn = H, and Hi is obtained from Hi−1 by a one-step subdivision of an edge of Hi−1 for
i = 1, 2, . . . , n.
If a graph H is a subdivision of a graph G, then we say that H is obtained from G by subdividing the edges
of G.
Example 3.4.10. Consider graphs G and H below.
We see that H is obtained from G by a finite sequence of subdivisions of edges: H is obtained from G by
subdividing the edge a once, b once and c twice.
Definition 3.4.11. Two graphs G and H are said to be homeomorphic graphs if there is an isomorphism from
a subdivision of G to a subdivision of H.
Consider the following example.
Example 3.4.12. Consider the graphs G and H below. We see that G contains a cycle of length 5, and H
contains a cycle of length 4. Hence these two graphs are not isomorphic. But we find a subdivision G0 of G
and H 0 of H such that G0 and H 0 are isomorphic (see fig. 3.4.2). Hence G and H are homeomorphic.
In 1930, Kuratowski proved the following famous theorem, characterising simple planar graphs in terms of
K5 and K3,3 .
Theorem 3.4.13 (Kuratowski). A simple graph is planar if and only if it does not contain a subgraph homeo-
morphic to K5 or K3,3 .
The proof of the above theorem is omitted.
Figure 3.4.2
3.5 Graph Coloring

Definition 3.5.1. Let G = (V, E) be a simple graph and C = {c1 , c2 , . . . , cn } be a set of n colors. A vertex coloring of G using the colors of C is a function f : V → C. If f (u) ≠ f (v) whenever u and v are adjacent vertices, then f is called a proper vertex coloring.

For each vertex v, its image f (v) is called the color of v. It follows that a vertex coloring of a graph G is
an assignment of the colors c1 , c2 , . . . , cn to the vertices of graph G. Similarly, a proper vertex coloring of
G is an assignment of the colors c1 , c2 , . . . , cn to the vertices of G such that adjacent vertices have different
colors. The following graph is an illustration.
This is a graph with 4 vertices A, B, C and D. Suppose C = {r, b, y, g}, where r denotes red, b denotes
blue, y denotes yellow and g denotes green. Define f : V → C by
A ↦ r
B ↦ g
C ↦ y
D ↦ b.
Definition 3.5.3. The smallest number of colors needed to make a proper vertex coloring of a simple graph G
is called the chromatic number of G and is denoted by χ(G).
Theorem 3.5.4. Let G be a nontrivial simple graph. Then χ(G) = 2 if and only if G is a bipartite graph.
3.5. GRAPH COLORING 35
Proof. Let G = (V, E) be a bipartite graph. Then vertex set V can be partitioned into two non-empty subsets
V1 and V2 such that each edge of G is incident with one vertex of V1 and one vertex of V2 . Let C = {c1 , c2 }
be a set of two colors.
Define a function f : V → C such that
f (v) = c1 ; if v ∈ V1
= c2 ; if v ∈ V2 .
Since V1 ∩ V2 = ∅, it follows that f is well-defined. Now, no two vertices of V1 are adjacent. Therefore, all the
vertices can have the same color. Similarly, all the vertices of V2 can have the same color. From the definition
of f , it follows that two adjacent vertices of G have different colors. Thus, χ(G) ≤ 2. Also, since G has at
least one edge, χ(G) > 1. Hence combining, we get χ(G) = 2.
Conversely suppose that χ(G) = 2. This implies that the graph contains at least one edge. Also, there
exists a function f : V → C = {c1 , c2 } such that no two adjacent vertices have the same image.
Let V1 = {v ∈ V : f (v) = c1 } and V2 = {v ∈ V : f (v) = c2 }. It follows that V1 ∩ V2 = ∅ and
V1 ∪ V2 = V . Let e be an edge with end vertices v1 and v2 . Because v1 and v2 can't have the same color,
one of them belongs to V1 and the other to V2 . Thus, G is a bipartite graph.
Definition 3.5.5. Let G be a graph with vertices v1 , v2 , . . . , vn−1 , vn . The maximum of the integers deg(vi ),
for i = 1, 2, . . . , n is denoted by ∆(G).
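Although not proved in this text, a standard greedy argument shows that every simple graph admits a proper vertex coloring with at most ∆(G) + 1 colors: color the vertices one by one, always using the smallest color not already used by a neighbour. A minimal Python sketch of this greedy procedure (names are illustrative):

    def greedy_coloring(adj):
        """Proper vertex coloring using at most (max degree + 1) colors.

        adj -- dict mapping each vertex to the set of its neighbours.
        At most deg(v) colors are blocked when v is processed, so some
        color in {0, 1, ..., deg(v)} is always available.
        """
        color = {}
        for v in adj:
            used = {color[u] for u in adj[v] if u in color}
            c = 0
            while c in used:
                c += 1
            color[v] = c
        return color

    # A 4-cycle is bipartite; the greedy procedure uses only 2 colors here.
    cycle4 = {'A': {'B', 'D'}, 'B': {'A', 'C'},
              'C': {'B', 'D'}, 'D': {'C', 'A'}}
    print(greedy_coloring(cycle4))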
Definition 3.5.7. Let G = (V, E) be a simple graph and C = {c1 , c2 , . . . , cn } be a set of n colors. An edge
coloring of G using the colors of C is a function f : E → C. Let f : E → C be an edge coloring of G.
If, for any two edges e1 and e2 meeting at a common vertex, f (e1 ) ≠ f (e2 ), then f is called a proper edge
coloring.
For each edge e, its image f (e) is called the color of e. It follows that a proper edge coloring of a graph
G is an assignment of the colors c1 , c2 , . . . , cn to the edges of graph G such that any two edges meeting at a
common vertex have different colors. The following graph is an illustration.
Figure 3.5.1
The graph G has six edges a, b, c, d, e, f . Suppose C = {R, B, Y, G}, where R denotes red, B denotes
blue, Y denotes yellow, and G denotes green. Define f : E → C by
a ↦ R
c ↦ G
d ↦ B
f ↦ Y
b ↦ B
e ↦ R.
Example 3.5.12. Next we determine the chromatic number of the complete bipartite graph K2,3 of figure 3.5.2, with partite sets {a, b} and {p, q, r}. Define a vertex coloring with the colors R and G by
p ↦ R
a ↦ G
b ↦ G
q ↦ R
r ↦ R.
This is a proper coloring of K2,3 , and since K2,3 has at least one edge, one color cannot suffice. Hence χ(K2,3 ) = 2.
Figure 3.5.2
Example 3.5.13. For the graph Kn , we find χ(Kn ). Kn is a complete graph with n vertices. For any vertex
v of Kn , each of the remaining n − 1 vertices is adjacent to v. Hence we need n distinct colors for a
proper coloring of Kn , so χ(Kn ) ≥ n. But Kn has only n vertices, so n colors certainly suffice, giving χ(Kn ) ≤ n. Hence χ(Kn ) = n.
3.6 Few Probable Questions

2. Define planar graphs. Show that for a connected simple planar graph G with nv ≥ 3, ne ≤ 3nv − 6.
4. Define chromatic number of a graph G. Show that a simple nontrivial graph G has chromatic number 2
if and only if G is bipartite.
Unit 4

Course Structure
• Lattices as partial ordered sets. Their properties.
4.1 Introduction
A lattice is an abstract structure studied in the mathematical subdisciplines of order theory and abstract algebra.
It consists of a partially ordered set in which every two elements have a unique supremum (also called a least
upper bound or join) and a unique infimum (also called a greatest lower bound or meet). An example is given
by the natural numbers, partially ordered by divisibility, for which the unique supremum is the least common
multiple and the unique infimum is the greatest common divisor.
Lattices can also be characterized as algebraic structures satisfying certain axiomatic identities. Since the
two definitions are equivalent, lattice theory draws on both order theory and universal algebra. Semilattices
include lattices, which in turn include Heyting and Boolean algebras. These "lattice-like" structures all admit
order-theoretic as well as algebraic descriptions.
Objectives
After reading this unit, you will be able to
4.2 Partially Ordered Sets

Definition 4.2.1. A relation R on a set A is called antisymmetric if, for all a, b ∈ A, aRb and bRa together imply a = b.
On the set of all integers, the usual "less than or equal to" relation is an antisymmetric relation since a ≤ b
and b ≤ a implies a = b.
Similarly, if T is the set of all subsets of a set A, then the inclusion relation "⊆" is an antisymmetric relation,
since for any two subsets X and Y of A, X ⊆ Y and Y ⊆ X together imply X = Y .
Definition 4.2.2. A relation R on a set A is called a partial order on A if R is reflexive, antisymmetric and
transitive. In other words, R is a partial order if it satisfies the following conditions: (i) aRa for all a ∈ A; (ii) aRb and bRa imply a = b; (iii) aRb and bRc imply aRc.
A set A together with a partial order relation R is called a partially ordered set, or poset, and we denote
this poset by (A, R).
Let (A, R) be a poset. If there is no confusion about the partial order, we may refer to the poset simply by
A.
Example 4.2.3. The set Z, together with the usual "less than or equal to", ≤ relation is a poset. Note that the
relation ’<’ is not a partial order relation on Z since the relation is not reflexive.
Example 4.2.4. Consider N, the set of all natural numbers, and the divisibility relation R on N. That is, for
all a, b ∈ N, aRb if a|b, that is, there exists a positive integer c such that b = ac. Check that this relation R is
a partial order. Thus, N with the divisibility relation is a poset.
Though the divisibility relation is a partial order relation on the set of all positive integers, it is not so on
the set of all nonzero integers. For example, 5 = (−1)(−5) and also, −5 = (−1)(5) and thus, 5|(−5) and
(−5)|5 but 5 ≠ −5.
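The three defining properties of a partial order can be checked exhaustively on a small finite set. A Python sketch (illustrative) that confirms divisibility is a partial order on positive integers but not on nonzero integers:

    def is_partial_order(S, rel):
        """Check that rel is reflexive, antisymmetric and transitive on S."""
        S = list(S)
        refl = all(rel(a, a) for a in S)
        anti = all(not (rel(a, b) and rel(b, a)) or a == b
                   for a in S for b in S)
        trans = all(not (rel(a, b) and rel(b, c)) or rel(a, c)
                    for a in S for b in S for c in S)
        return refl and anti and trans

    divides = lambda a, b: b % a == 0

    print(is_partial_order(range(1, 13), divides))   # True on positives
    print(is_partial_order([5, -5, 1], divides))     # False: 5|(-5) and (-5)|5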
Let R be a partial order on a set A, that is, (A, R) is a poset. We usually denote R by ≤A , or simply ≤. If
A is a partially ordered set with a partial order ≤, then we denote it as (A, ≤A ) or (A, ≤).
Definition 4.2.5. Let (S, ≤) be a poset and a, b ∈ S. If either a ≤ b or b ≤ a holds, then we say that a and b
are comparable. The poset (S, ≤) is called a linearly ordered set, or a totally ordered set, or a chain, if for all a, b ∈ S,
either a ≤ b or b ≤ a.
Example 4.2.6. Consider the poset (Z, ≤) with the usual "less than or equal to" relation. For any two integers a and b, either
a < b , or a = b, or b < a. Thus, any two integers with respect to the partial order ≤ are comparable. Hence
(Z, ≤) is a chain.
Example 4.2.7. Consider the poset (N, ≤) with respect to the divisibility relation. Notice here that 2 does not
divide 5 and 5 does not divide 2. Thus, 2 and 5 are not comparable and hence (N, ≤) is not a chain.
Theorem 4.2.8. Any subset T of a poset S is itself a poset under the same relation (restricted to T ). Any
subset of a chain is a chain.
From the directed graph it follows that the given relation is reflexive and transitive. This relation is also
antisymmetric because there is a directed edge from a to b, but there is no directed edge from b to a. Again, in
the graph we notice that there are two distinct vertices a and c such that there are no directed edges from a to
c and from c to a.
In a digraph of a partial order, one can see that if there is a directed edge from a vertex a to a different
vertex b, then there is no directed edge from b to a.
Theorem 4.2.10. A digraph of a partial order relation R cannot contain a closed directed path other than
loops. (A path a1 , a2 , . . . , an in a digraph is closed if a1 Ra2 , a2 Ra3 , . . . , an Ra1 .)
By the above theorem, it follows that if a digraph of a relation contains a closed path other than loops, then
the corresponding relation is not a partial order.
R = {(a, a), (b, b), (c, c), (a, b), (b, c), (c, a)}.
In this digraph, a, b, c, a forms a closed path. Hence, the given relation is not a partial order relation.
Hasse Diagram
Posets can also be represented visually by Hasse diagram. First we define a few terms that we will need in the
sequel.
Let (S, ≤) be a poset and x, y ∈ S. We say that y covers x if x ≤ y, x ≠ y, and there is no element
z ∈ S such that x < z < y.
We draw a diagram using the elements of S as follows: We represent the elements of S in the diagram by
the elements themselves such that if x ≤ y, then y is placed above x. We connect x with y by a line segment
if and only if y covers x. The resulting diagram is called the Hasse diagram of (S, ≤). We see the illustration
below.
Example 4.2.12. Let S = {1, 2, 3}. Then P(S) = {∅, {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3}, S}. Now, (P(S), ≤) is a poset, where ≤ denotes the set inclusion relation. The poset diagram of (P(S), ≤) is shown in fig. 4.2.1.
Figure 4.2.1
Definition 4.2.13. Let (S, ≤) be a poset. An element a ∈ S is called the greatest element of S if b ≤ a for all b ∈ S, and the least element of S if a ≤ b for all b ∈ S. An element a ∈ S is called maximal if there is no b ∈ S with a < b, and minimal if there is no b ∈ S with b < a.

Example 4.2.14. Consider the poset S = {2, 5, 15, 20} under the divisibility relation (the Hasse diagram is omitted here). It is clear from the Hasse diagram that there exists no greatest or least element of the poset, since no
element a satisfies b ≤ a for all b ∈ S (for example, 2 ≤ 15 is not satisfied and also, 15 ≤ 20 is not satisfied),
and also, no element a exists that satisfies a ≤ b for every b ∈ S (if we consider 2 or 5 as the least element, then
we see that 2 ≤ 5 does not hold and also 5 ≤ 2 does not hold). Now, 5 and 2 are definitely minimal elements,
since there exists no element b ∈ S such that b < 2 or b < 5 (in other words, there is no line segment in the
Hasse diagram extending below 2 or 5). Also, 20 and 15 are maximal elements of the poset (verify).
The following lemma ensures the existence of minimal element for every finite poset.
Lemma 4.2.15. Let (S, ≤) be a poset such that S is a finite non-empty set. Then this poset has a minimal
element.
Proof. Let a1 be an element of S. If a1 is a minimal element, then we are done. Suppose a1 is not a minimal
element. Then there exists a2 ∈ S such that a2 < a1 . If a2 is a minimal element, then we are done, otherwise
there exists a3 ∈ S such that a3 < a2 . If a3 is not a minimal element, then we repeat this process. Now,
a3 < a2 < a1 shows that a3 , a2 , a1 are distinct elements in S. Since S is finite, after a finite number of steps,
we get an element an ∈ S, such that an is a minimal element.
We must note that, a poset (S, ≤), where S is a finite non-empty set, has minimal and maximal elements
but may not have least or greatest elements. You can take the previous example as a confirmation of this fact.
Definition 4.2.16. Let S be a set and let ≤1 and ≤2 be two partial orders on S. The relation ≤2 is said to be
compatible with the relation ≤1 if a ≤1 b implies a ≤2 b.
It should be noted that given a finite non-empty set, say S, we can define a linear order in it as follows.
Since S is non-empty, S has at least one element. Choose an element of S, and call it the first element, a1 . Let
S1 = S \ {a1 }. If S1 is not empty, then from S1 , choose an element a2 . Let S2 = S \ {a1 , a2 }. If S2 is not
empty, then from S2 , choose an element a3 . Let S3 = S\{a1 , a2 , a3 }. If S3 is not empty, continue this process.
Since S is a finite set, this process must stop after a finite number of steps. Hence, there exists a positive integer
n such that Sn = S \ {a1 , . . . , an } is empty, where an is the element of Sn−1 = S \ {a1 , . . . , an−1 }. We now
define a partial order ≤1 on S by a1 ≤1 a2 ≤1 a3 ≤1 · · · ≤1 an . This means that ai ≤1 aj if and only if
i ≤ j, where i, j ∈ {1, 2, . . . , n}. It follows that this is a linear order.
Next suppose that not only is S a finite non-empty set, but S also has a partial order ≤. Can we define a
linear order ≤1 on S that is compatible with the partial order ≤? The following theorem answers
this question.
Theorem 4.2.17. Let (S, ≤) be a finite poset. There exists a linear order ≤1 on S which is compatible with
the relation ≤.
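The idea behind this theorem is exactly the repeated removal of minimal elements guaranteed by lemma 4.2.15 (in computer science this is known as topological sorting). A minimal Python sketch, assuming the poset is given as a finite list together with a comparison function:

    def compatible_linear_order(S, leq):
        """List the elements of a finite poset in a linear order
        compatible with leq: repeatedly remove a minimal element."""
        remaining = list(S)
        order = []
        while remaining:
            # a minimal element: nothing strictly below it remains
            m = next(a for a in remaining
                     if not any(leq(b, a) and b != a for b in remaining))
            order.append(m)
            remaining.remove(m)
        return order

    divides = lambda a, b: b % a == 0
    # Every element is preceded by all of its divisors in the output.
    print(compatible_linear_order([2, 5, 15, 20, 10], divides))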
4.3 Lattice
Definition 4.3.1. Let (S, ≤) be a poset and let {a, b} be a subset of S. An element c ∈ S is called an upper
bound of {a, b} if a ≤ c and b ≤ c. Also, if T is any subset of S, then c ∈ S is called an upper bound of T if
t ≤ c for all t ∈ T .
An element d ∈ S is called a least upper bound (lub) of {a, b} if (i) d is an upper bound of {a, b}, and (ii) d ≤ c for every upper bound c of {a, b}.
We can also define the lub of any general subset T of S and denote it by sup T .
Example 4.3.2. Consider the set N together with the divisibility relation. Consider the subset {12, 8}. We
see that 24, 48, 72 are all common multiples of 12 and 8. Hence 12 ≤ 24 and 8 ≤ 24; 12 ≤ 48 and 8 ≤ 48;
12 ≤ 72 and 8 ≤ 72. Thus, 24, 48, 72 are upper bounds of {12, 8}, and since 24 divides every common multiple of 12 and 8, 24 is the
least upper bound of {12, 8}. Notice that 24 ∉ {12, 8}.
Theorem 4.3.3. In a poset (S, ≤), if a subset {a, b} of S has a lub, then it is unique.
Proof. Let a, b ∈ S and a lub of {a, b} exists. Suppose c, d ∈ S are two lubs of {a, b}. Then c and d are upper
bounds of {a, b}. Since c is a lub of {a, b} and d is an upper bound, so c ≤ d. Similarly, d ≤ c. Then we have
c ≤ d and d ≤ c. By antisymmetry, we can say that c = d. Hence the result.
The lub of {a, b} in (S, ≤), if it exists, is denoted by a ∨ b, or the "join" of a and b.
Definition 4.3.4. Let (S, ≤) be a poset and let {a, b} be a subset of S. An element c ∈ S is called a lower
bound of {a, b} if c ≤ a and c ≤ b. Also, if T is any subset of S, then c ∈ S is called a lower bound of T if
c ≤ t for all t ∈ T .
An element d ∈ S is called a greatest lower bound (glb) of {a, b} if (i) d is a lower bound of {a, b}, and (ii) c ≤ d for every lower bound c of {a, b}.
We can also define the glb of any general subset T of S and denote it by inf T .
Theorem 4.3.5. In a poset (S, ≤), if a subset {a, b} of S has a glb, then it is unique.
The glb of {a, b} in (S, ≤), if it exists, is denoted by a ∧ b, or the "meet" of a and b.
Definition 4.3.6. A poset (L, ≤) is called a lattice if both a ∨ b and a ∧ b exist for all a, b ∈ L. A lattice L is
called complete if each of its subsets has a lub and glb in L.
Example 4.3.7. Any chain is a lattice in which a ∧ b is simply the smaller of a and b and a ∨ b is simply the
bigger of the two. Not every lattice is complete; the rational numbers are not complete with respect to the
usual "less than or equal to" relation, and the real numbers (in their natural order) are also not complete unless
−∞ and ∞ are adjoined to them.
Example 4.3.8. Let L be the set of all nonnegative real numbers. Then (L, ≤) is a poset, where ≤ denotes the
usual "less than or equal to" relation. Let a, b ∈ L. Now, max{a, b} ∈ L and min{a, b} ∈ L. It is easy to see
that max{a, b} is the lub of {a, b} and min{a, b} is the glb of {a, b}. For example, max{2, 5} = 5 = 2 ∨ 5
and min{2, 5} = 2 = 2 ∧ 5. Hence (L, ≤) is a lattice. But it is not complete as we have discussed in the
previous example.
Example 4.3.9. Let S be a set. Then (P(S), ≤) is a poset, where ≤ is the set inclusion relation. For A, B ∈
P(S), we can show that A ∨ B = A ∪ B and A ∧ B = A ∩ B. Hence (P(S), ≤) is a lattice. This lattice is,
however, complete: the glb of any family {Aα } of subsets of S is simply ⋂α Aα and the lub is ⋃α Aα , both
of which belong to P(S).
Theorem 4.3.11. Let (S, ≤) be a poset and a, b ∈ S. Then the following conditions are equivalent:
1. a ≤ b;
2. a ∨ b = b;
3. a ∧ b = a.
Theorem 4.3.12. In any lattice (L, ≤), the operations of join and meet are isotonic, that is, if b ≤ c, then
a ∧ b ≤ a ∧ c and a ∨ b ≤ a ∨ c.
Theorem 4.3.13. In any lattice (L, ≤), we have the distributive inequalities
D (a ∧ b) ∨ (a ∧ c) ≤ a ∧ (b ∨ c),
D’ a ∨ (b ∧ c) ≤ (a ∨ b) ∧ (a ∨ c),
for all a, b, c ∈ L.
Definition. A lattice (L, ≤) is said to be distributive if
D1. a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c),
for all a, b, c ∈ L.
The two previous examples of lattices that we discussed earlier were both distributive lattices. However, it
is worth mentioning that not all lattices are distributive, as we see in the following example.
The next theorem gives a necessary and sufficient condition for a lattice to be distributive.
Figure 4.3.1
Theorem. A lattice (L, ≤) is distributive if and only if
D2. a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c),
for all a, b, c ∈ L.
Proof. Suppose (L, ≤) is distributive. Let a, b, c ∈ L. Then
(a ∨ b) ∧ (a ∨ c) = ((a ∨ b) ∧ a) ∨ ((a ∨ b) ∧ c) by D1
= (a ∧ (a ∨ b)) ∨ ((a ∨ b) ∧ c) by L1
= a ∨ ((a ∨ b) ∧ c) by L4
= a ∨ (c ∧ (a ∨ b)) by L1
= a ∨ ((c ∧ a) ∨ (c ∧ b)) by D1
= (a ∨ (c ∧ a)) ∨ (c ∧ b) by L2
= (a ∨ (c ∧ a)) ∨ (b ∧ c) by L1
= a ∨ (b ∧ c) by L4.
Theorem 4.3.17. In a distributive lattice (L, ≤),
a ∧ b = a ∧ c and a ∨ b = a ∨ c ⇒ b = c,
for all a, b, c ∈ L.
Proof. Let (L, ≤) be a distributive lattice. Now,
b = b ∧ (a ∨ b)
= b ∧ (a ∨ c)
= (b ∧ a) ∨ (b ∧ c)
= (a ∧ c) ∨ (b ∧ c)
= (c ∧ a) ∨ (c ∧ b)
= c ∧ (a ∨ b)
= c ∧ (a ∨ c)
= c.
Note that a poset (L, ≤) may not contain a greatest element, but from the antisymmetric property of ≤, it
can be shown that if there exists a greatest element in a poset, then it is unique, for if, a and b are two such
elements, then a ≤ b and by the same argument, b ≤ a, which implies that a = b. Similarly, a poset may
contain at most one least element. We denote the greatest element of L by I and the least element by O. The
elements O and I, when they exist, are called the universal bounds of L, since then O ≤ x and x ≤ I for all
x ∈ L.
When they exist, the universal bounds satisfy the identities
O ∧ x = O and O ∨ x = x,
x ∧ I = x and x ∨ I = I,
for all x ∈ L.
Moreover, in any lattice the modular inequality holds:
a ≤ c ⇒ a ∨ (b ∧ c) ≤ (a ∨ b) ∧ c.
Definition 4.3.20. Let (L, ≤) be a lattice with I and O. If a ∈ L, then an element b ∈ L is said to be a
complement of a if a ∨ b = I and a ∧ b = O.
Example 4.3.21. Let D30 denote the set of all positive divisors of 30. Then
D30 = {1, 2, 3, 5, 6, 10, 15, 30}.
Now, (D30 , ≤) is a poset, where a ≤ b if and only if a divides b. Since 1 divides all the elements of D30 , it
follows that 1 ≤ m for all m ∈ D30 . Thus, 1 is the least element of this poset. Again, every member of D30
divides 30. Thus, m ≤ 30. Hence, 30 is the greatest element of this poset. The Hasse diagram of this poset is
given by fig. 4.3.2.
Figure 4.3.2
Let a, b ∈ D30 . Let d = gcd(a, b) and m = lcm(a, b). Now, d|a and d|b. Hence, d ≤ a and d ≤ b. This
shows that d is a lower bound of {a, b}. Let c ∈ D30 and c ≤ a and c ≤ b. Then c|a and c|b and since d is
the gcd of a and b, so c|d, and hence c ≤ d. Thus, d = gcd(a, b) = glb{a, b}. Since all the positive divisors
4.4. SUBLATTICE 47
of a, b are also divisors of 30, d ∈ D30 , so d = a ∧ b. Similarly we can show that m ∈ D30 and m = a ∨ b.
Hence D30 is a complete lattice with least element 1 and greatest element 30.
Now, for any a ∈ D30 , 30/a ∈ D30 . Using properties of gcd and lcm, we can show that for any a ∈ D30 ,
a ∧ (30/a) = 1 and a ∨ (30/a) = 30.
Hence, every element a has the complement 30/a in D30 .
Note that for any positive integer n, we can construct the lattice (Dn , ≤), where ≤ denotes the usual
divisibility relation in a similar way as shown in the preceding example.
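The lattice (Dn , ≤) can be explored computationally: the meet is the gcd, the join is the lcm, and a complement of a is any b with a ∧ b = 1 and a ∨ b = n. A Python sketch (illustrative):

    from math import gcd

    def divisors(n):
        return [d for d in range(1, n + 1) if n % d == 0]

    def lcm(a, b):
        return a * b // gcd(a, b)

    def complement(a, n):
        """An element b of D_n with gcd(a, b) = 1 and lcm(a, b) = n, if any."""
        for b in divisors(n):
            if gcd(a, b) == 1 and lcm(a, b) == n:
                return b
        return None

    # In D30 every element a has the complement 30/a ...
    print([(a, complement(a, 30)) for a in divisors(30)])
    # ... but in D12 the element 2 has no complement at all.
    print(complement(2, 12))   # None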
Theorem 4.3.22. In a distributive lattice (L, ≤) with I and O, every element has at most one complement.
Proof. Let a ∈ L. Suppose b, c are two complements of a in L. Then a ∨ b = I and a ∧ b = O; a ∨ c = I and
a ∧ c = O. Hence a ∨ b = a ∨ c and a ∧ b = a ∧ c. Then by theorem 4.3.17, it follows that b = c. Hence the
result.
A special type of distributive lattice is the Boolean Algebra. We will read about it in the next unit.
4.4 Sublattice
Definition 4.4.1. A sublattice of a lattice L is a subset X of L such that a, b ∈ X imply a ∧ b ∈ X and
a ∨ b ∈ X.
A sublattice is a lattice in its own right with the same join and meet operations. The empty set is a sublattice;
so is any one-element subset. More generally, given a ≤ b in a lattice L, the interval [a, b] of all elements
x ∈ L such that a ≤ x and x ≤ b is a sublattice.
A subset of a lattice L can itself be a lattice under the same (relative) order without being a sublattice of L. Let us check the
following example.
Example 4.4.2. Let Σ consist of the subgroups of a group G and let ≤ be the usual set inclusion relation.
Then Σ is a complete lattice with H ∧ K = H ∩ K and H ∨ K the least subgroup in Σ containing H and
K (which is not their set-theoretic union). Here, the set-union of two non-comparable subgroups is never a
subgroup (since we know that the union of two subgroups H and K is a subgroup if and only if either H ≤ K
or K ≤ H). Hence this lattice is not a sublattice of the lattice of all subsets of G.
Definition 4.4.3. A property of subsets of a set I is a closure property when
1. I has the property, and
2. any intersection of subsets having the given property itself has this property.
Theorem 4.4.4. Let L be any complete lattice and let S be any subset of L such that
1. I ∈ S, and
2. T ⊂ S implies inf T ∈ S.
Then S is a complete lattice.
Proof. For any non-empty subset T of S, evidently inf T ∈ L is a member of S by 2, and it is the glb of T in
S. Also, let U be the set of all upper bounds of T in S. It is non-empty since I ∈ S. Then, inf U ∈ S is also
an upper bound of T . Moreover, it is the least upper bound since inf U ≤ u for all u ∈ U . This proves that S
is a complete lattice.
Corollary 4.4.5. Those subsets of any set which have a given closure property form a complete lattice, in
which the lattice meet of any family of subsets Sα is their intersection, and their lattice join is the intersection
of all subsets Tβ which contain every Sα .
4.5 Direct Product of Lattices

Definition 4.5.1. The direct product PQ of two posets P and Q is the set of all ordered pairs (a, b) with
a ∈ P and b ∈ Q, partially ordered by the rule (a1 , b1 ) ≤ (a2 , b2 ) if and only if a1 ≤P a2 in P and b1 ≤Q b2
in Q.
Theorem 4.5.2. The direct product LM of two lattices L and M is again a lattice.
Proof. For any two elements (a1 , b1 ) and (a2 , b2 ) in LM , the element (a1 ∨ a2 , b1 ∨ b2 ) (here we have taken
the join operations in all L, M and LM as ∨) contains both (a1 , b1 ) and (a2 , b2 ), hence is an upper bound for
the pair. Moreover every other upper bound (u, v) of the two satisfies a1 ≤ u and a2 ≤ u and hence by the
definition of lub, a1 ∨ a2 ≤ u. Similarly, b1 ∨ b2 ≤ v, and so, (a1 ∨ a2 , b1 ∨ b2 ) ≤ (u, v). This shows that
(a1 , b1 ) ∨ (a2 , b2 ) = (a1 ∨ a2 , b1 ∨ b2 ),
if the latter exists; since L and M are lattices, it always does. By a similar argument for lower bounds, we can show that
(a1 , b1 ) ∧ (a2 , b2 ) = (a1 ∧ a2 , b1 ∧ b2 ).
Hence LM is a lattice.
5. Define the direct product of lattices. Show that the direct product of two lattices is a lattice.
6. Draw the Hasse diagram of D36 with respect to the usual divisibility relation and show that it is a lattice.
Also, find the complement of each of the elements of D36 , if it exists.
Unit 5
Course Structure
• Boolean Algebra: Basic Definitions, Duality, Basic theorems,
5.1 Introduction
In mathematics and mathematical logic, Boolean algebra is the branch of algebra in which the values of the
variables are the truth values true and false, usually denoted 1 and 0 respectively. Instead of elementary algebra
where the values of the variables are numbers, and the prime operations are addition and multiplication, the
main operations of Boolean algebra are the meet (and) denoted as ∧, the join (or) denoted as ∨, and the
negation (not) denoted as ¬. It is thus a formalism for describing logical operations in the same way that
elementary algebra describes numerical operations.
Boolean algebra was introduced by George Boole in his first book The Mathematical Analysis of Logic
(1847), and set forth more fully in his An Investigation of the Laws of Thought (1854). According to Hunt-
ington, the term "Boolean algebra" was first suggested by Sheffer in 1913, although Charles Sanders Peirce
in 1880 gave the title "A Boolian Algebra with One Constant" to the first chapter of his "The Simplest Mathe-
matics". Boolean algebra has been fundamental in the development of digital electronics, and is provided for
in all modern programming languages. It is also used in set theory and statistics.
Objectives
After reading this unit, you will be able to:
• deduce that a Boolean algebra is a lattice with respect to the partial order defined
5.2 Boolean Algebra

Definition 5.2.1. A Boolean algebra is a set B together with two binary operations (+) and (·) (we write ab for a · b) which satisfy the following postulates:
B1. Each of the operations (+) and (·) is commutative: a + b = b + a and ab = ba for all a, b ∈ B.
B2. There exist identity elements 0 and 1 relative to the operations (+) and (·) respectively: a + 0 = a and a(1) = a for all a ∈ B.
B3. Each operation is distributive over the other: a(b + c) = ab + ac and a + bc = (a + b)(a + c) for all a, b, c ∈ B.
B4. For each a ∈ B there exists an element a′ ∈ B such that a + a′ = 1 and aa′ = 0.

The symbols "+" and "·" are just a convention. We could use any other symbols in place of these two.
Example 5.2.2. Let S be any set and P(S) be the set of all subsets of S. Then P(S) forms a Boolean
algebra where the binary operations (+) and (·) are the set-theoretic union and intersections respectively. The
corresponding identity elements are S and ∅ respectively. For every element T ∈ P(S), the complement is
given by S \ T .
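The postulates for this example can be verified exhaustively for a small set S. A Python sketch (illustrative), using union as (+), intersection as (·), S as 1 and ∅ as 0:

    from itertools import chain, combinations

    S = frozenset({1, 2, 3})
    P = [frozenset(c) for c in chain.from_iterable(
            combinations(S, r) for r in range(len(S) + 1))]

    join = lambda a, b: a | b          # (+) : set union
    meet = lambda a, b: a & b          # (.) : set intersection
    comp = lambda a: S - a             # complement relative to S

    assert all(join(a, frozenset()) == a and meet(a, S) == a
               for a in P)                                   # identities (B2)
    assert all(join(a, comp(a)) == S and meet(a, comp(a)) == frozenset()
               for a in P)                                   # complements (B4)
    assert all(meet(a, join(b, c)) == join(meet(a, b), meet(a, c))
               for a in P for b in P for c in P)             # distributivity (B3)
    print("P(S) satisfies the Boolean algebra postulates")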
Theorem 5.2.3. Every statement or algebraic identity deducible from the postulates of a Boolean algebra
remains valid if the operations (+) and (·), and the identity elements 0 and 1 are interchanged throughout.
This theorem is called the principle of duality.
Proof. The proof of this theorem follows at once from the symmetry of the postulates with respect to the two
operations and the two identities.
It should be noted that the steps in one proof are dual statements to those in the other, and the justification
for each step is the same postulate or theorem in one case as in the other.
Theorem 5.2.4. For every element a in a Boolean algebra B,
a + a = a and aa = a.
Proof.
a = a + 0 by B2
= a + aa′ by B4
= (a + a)(a + a′) by B3
= (a + a)(1) by B4
= a + a, by B2
and similarly,
a = a(1) by B2
= a(a + a′) by B4
= aa + aa′ by B3
= aa + 0 by B4
= aa. by B2
Thus, we can say that (+) and (·) operations are idempotent.
Theorem 5.2.5. For every element a in a Boolean algebra B,
a + 1 = 1 and a · 0 = 0.
Proof.
1 = a + a′ by B4
= a + a′(1) by B2
= (a + a′)(a + 1) by B3
= 1(a + 1) by B4
= a + 1. by B2
The other part is left as an exercise.
Theorem 5.2.7. In every Boolean algebra B, each of the binary operations (+) and (·) is associative. That is,
for every a, b, and c in B,
a + (b + c) = (a + b) + c and a(bc) = (ab)c.
Proof. First we will show that a + a(bc) = a + (ab)c, as follows:
a + a(bc) = a by Theorem 5.2.6
= a(a + c) by Theorem 5.2.6
= (a + ab)(a + c) by Theorem 5.2.6
= a + (ab)c. by B3
Next we will show that a′ + a(bc) = a′ + (ab)c, as follows:
a′ + a(bc) = (a′ + a)(a′ + bc) by B3
= 1(a′ + bc) by B4
= a′ + bc by B2
= (a′ + b)(a′ + c) by B3
= [1(a′ + b)](a′ + c) by B2
= [(a′ + a)(a′ + b)](a′ + c) by B4
= (a′ + ab)(a′ + c) by B3
= a′ + (ab)c. by B3
Combining the two computations, and using the fact that x = (a + x)(a′ + x) for every x (by B3, B4 and B2), we get
a(bc) = (a + a(bc))(a′ + a(bc)) = (a + (ab)c)(a′ + (ab)c) = (ab)c.
The associativity of (+) then follows by the principle of duality.
From now on, we shall write both a(bc) and (ab)c as abc, and similarly, we shall write both (a + b) + c and
a + (b + c) as a + b + c.
Theorem 5.2.8. The element a′ associated with the element a in a Boolean algebra is unique.
Proof. Suppose that x and y are both associated with a as in B4, that is, a + x = 1, ax = 0 and a + y = 1, ay = 0. Then
x = (1)x by B2
= (a + y)x by assumption
= ax + yx by B3 and B1
= 0 + yx by assumption
= yx by B2
= xy by B1
= xy + 0 by B2
= xy + ay by assumption
= (x + a)y by B3 and B1
= (1)y by assumption
= y. by B2
Thus any two elements associated with a as specified in B4 are equal. In other words, a′ is uniquely
determined by a. We will refer to a′ as the complement of a.
Theorem 5.2.9. For every element a in a Boolean algebra B, (a′)′ = a.
Proof. By B4, a + a′ = 1 and aa′ = 0. But this is exactly the condition that a is a complement of a′, that is, that a satisfies B4 in the role of (a′)′. By the previous theorem, the complement is unique, and hence (a′)′ = a.
Theorem 5.2.10. In every Boolean algebra B, 0′ = 1 and 1′ = 0.
Proof. By theorem 5.2.5, 1 + 0 = 1 and 1 · 0 = 0. Since theorem 5.2.8 shows that for each a there is only
one element a′, these equations imply that 0′ = 1 and 1′ = 0.
Theorem 5.2.11 (De Morgan's laws). For all a, b in a Boolean algebra B, (ab)′ = a′ + b′ and (a + b)′ = a′b′.
Proof. First,
(ab)(a′ + b′) = aba′ + abb′ by B3
= 0 + 0 = 0. by B4
Further,
ab + a′ + b′ = a′ + b′ + ab by B1
= (a′ + b′ + a)(a′ + b′ + b) by B3
= (1 + b′)(1 + a′) by B4 and B1
= 1. by theorem 5.2.5 and B2
Now, by B4 and theorem 5.2.8, it follows that (ab)′ = a′ + b′. The other part follows by the duality principle.
5.3 Boolean Algebra as Lattices

Definition 5.3.1. In a Boolean algebra B, we define a relation ≤ by: x ≤ y if and only if xy′ = 0.

Theorem 5.3.2. The following properties of ≤ are valid in every Boolean algebra for arbitrary elements
x, y, and z:
1. x ≤ x (reflexive);
2. x ≤ y and y ≤ x imply x = y (antisymmetric);
3. x ≤ y and y ≤ z imply x ≤ z (transitive);
5. x ≤ y and x ≤ z imply x ≤ yz;
6. x ≤ y if and only if y′ ≤ x′.
Proof. 1. x ≤ x since xx′ = 0 by B4.
2. Suppose x ≤ y and y ≤ x, that is, xy′ = 0 and yx′ = 0. Then
x = x(1) by B2
= x(y + y′) by B4
= xy + xy′ by B3
= xy by assumption
= yx by B1
= yx + yx′ by B2 and assumption
= y(x + x′) by B3
= y(1) = y. by B4
3. Suppose x ≤ y and y ≤ z, that is, xy′ = 0 and yz′ = 0. Then
xz′ = xz′(1) by B2
= xz′(y + y′) by B4
= xyz′ + xy′z′ by B1 and associativity
= 0 + 0 = 0. by assumption
Thus, x ≤ z.
5. Suppose x ≤ y and x ≤ z, that is, xy′ = 0 and xz′ = 0. Then x(yz)′ = x(y′ + z′) = xy′ + xz′ = 0 + 0 = 0, by De Morgan's laws and B3. Hence x ≤ yz.
6. x ≤ y is equivalent to xy′ = 0. Thus, if x ≤ y, then y′(x′)′ = y′x = xy′ = 0 by theorem 5.2.9 and B1.
Hence, y′ ≤ x′. The converse follows by applying the same argument to y′ ≤ x′ and using theorem 5.2.9.
The first three points of the above theorem show that B forms a poset with respect to the relation ≤ defined
above. We will now show that a Boolean algebra forms a lattice with respect to the defined partial order.
Theorem 5.3.3. Let B be a Boolean algebra with respect to the partial order ≤ defined as x ≤ y if and only
if xy 0 = 0. Then B is a lattice with respect to ≤.
Proof. We will be done if we show that {x, y} has a lub and a glb in B. We show that x + y is the lub and xy is
the glb of the set. Since x(x + y)′ = x(x′y′) = xx′y′ = 0 and similarly, y(x + y)′ = 0, so x ≤ (x + y) and
y ≤ (x + y). Thus, x + y is an upper bound of {x, y}. Let z be any other upper bound of {x, y}. Then x ≤ z
and y ≤ z, which imply xz′ = 0 and yz′ = 0. Now,
(x + y)z′ = xz′ + yz′ = 0
which shows that x + y ≤ z. Thus, x + y is the lub of {x, y}. We can similarly show that xy is the glb of
{x, y}. Thus, (B, ≤) forms a lattice.
The join and meet are defined as x ∨ y = x + y and x ∧ y = xy, for any arbitrary x, y ∈ B.
Also, note from the previous theorems that a Boolean algebra is distributive, and each element of it has a
complement. Thus, a Boolean algebra is a distributive complemented lattice. Let us see certain examples.
Example 5.3.4. Let B be a Boolean algebra. We simplify the expression x + (yx)′, where x, y ∈ B.
We have,
x + (yx)′ = x + (y′ + x′) by De Morgan's laws
= (x + x′) + y′ by B1 and associativity
= 1 + y′ = 1. by B4 and theorem 5.2.5
Example 5.3.5. In a Boolean algebra B, we simplify (xy)′(x′ + y)(y′ + y), for x, y ∈ B.
We have,
(xy)′(x′ + y)(y′ + y) = (x′ + y′)(x′ + y)(1) by De Morgan's laws and B4
= x′ + y′y by B2 and B3
= x′ + 0 = x′. by B4 and B2
2. Define a partial order relation on a Boolean algebra B. Hence show that it is a lattice with respect to the
defined partial order.
7. Deduce the idempotent property of both the binary operators (+) and (·) in a Boolean algebra.
Unit 6

Course Structure
• Boolean Algebra: Boolean functions, Sum and Product of Boolean algebra,
6.1 Introduction
This unit starts with the dnf and cnf, which are normal forms, in continuation of the previous unit. Next we
move on to logic gates.
Logic is an extensive field of study with many special areas of inquiry. In general, logic is concerned with
the study and analysis of methods of reasoning or argumentation. Symbolic logic is not precisely defined as
distinct from logic in general, but might be described as a study of logic which employs an extensive use of
symbols. In any discussion of logic, the treatment centers around the concept of a proposition (statement).
The principal tool for treatment of propositions is the algebra of propositions, a Boolean algebra. In talking
about propositions, we will also investigate certain logical forms which represent acceptable techniques for
constructing precise proofs of theorems. Since statements are formed from words, it is apparent that some
consideration must be given to words and their meanings. No logical argument can be based on words that are
not precisely described. That part of logic which is concerned with the structure of statements is much more
difficult than the areas mentioned previously, and in fact, has not been satisfactorily formalized.
Objectives
After reading this unit, you will be able to
• define propositions and learn to form complex propositions by conjunction, disjunction and negation
• show that the set of all propositions form a Boolean algebra with respect to the conjunction, disjunction
and negation so defined
6.2 Disjunctive Normal Form

Definition 6.2.1. A Boolean function is said to be in disjunctive normal form in n variables x1 , x2 , . . . , xn , for
n > 0, if the function is a sum of terms of the type f1 (x1 )f2 (x2 ) · · · fn (xn ), where fi (xi ) is xi or x′i for each
i = 1, 2, . . . , n, and no two terms are identical. In addition, 0 and 1 are said to be in disjunctive normal form
in n variables for any n ≥ 0.
Some important properties of the disjunctive normal form are given in the following theorems.
Theorem 6.2.2. Every function in a Boolean algebra which contains no constants is equal to a function in
disjunctive normal form.
Example 6.2.3. Write the function f = (xy′ + xz)′ + x′ in disjunctive normal form.
We have,
f = (xy′ + xz)′ + x′
= (x′ + y)(x′ + z′) + x′ by De Morgan's laws
= x′ + yz′ by B3 and absorption
= x′(y + y′)(z + z′) + (x + x′)yz′
= x′yz + x′yz′ + x′y′z + x′y′z′ + xyz′.
The usefulness of the normal form lies primarily in the fact that each function uniquely determines a normal
form in a given number of variables, as we shall see in later theorems. However, any function may be placed
in normal form in more than one way by changing the number of variables. For example, f = xy is in normal
form in x and y, but if xy is multiplied by z + z′, then f = xyz + xyz′ is also in normal form in the variables
x, y, and z. Similarly, g = x′yz + xyz + x′yz′ + xyz′ is in normal form in x, y, and z, but reduces, on
factoring, to g = x′y + xy, which is in normal form in x and y. From now on we shall assume that unless
stated otherwise, disjunctive normal form refers to that disjunctive normal form which contains the smallest
possible number of variables. With this exception, we will be able to show that the normal form of a function
is uniquely determined by the function.
Suppose that we desire to select a single term out of the possible terms in a disjunctive normal form in n
variables. This corresponds to selecting either xi or x′i , for each of the n variables xi , i = 1, 2, . . . , n. Thus
there are exactly 2^n distinct terms which may occur in a normal form in n variables.
Definition 6.2.4. That disjunctive normal form in n variables which contains 2^n terms is called the complete
disjunctive normal form in n variables.
It will be a consequence of the following theorems that the complete disjunctive normal form is identically
1. A simple argument to prove this directly is to note that for any variable xj , the coefficients of xj and x0j
must be identical in a complete normal form, namely, these coefficients are each the complete normal form in
the remaining n − 1 variables. Factoring serves to eliminate xj , and this process may be repeated to eliminate
each variable in succession, thus reducing the expression to 1.
Theorem 6.2.5. If each of n variables is assigned the value 0 or 1 in an arbitrary, but fixed manner, then
exactly one term of the complete disjunctive normal form in the n variables will have the value 1 and all other
terms will have the value 0.
Proof. Let a1 , a2 , . . . , an represent the values assigned to x1 , x2 , . . . , xn in that order, where each ai is 0 or
1. Select a term from the complete normal form as follows: use xi if ai = 1, and use x′i if ai = 0 for each
xi , i = 1, 2, . . . , n. The term so selected is then a product of n ones, and hence is 1. All other terms in the
complete normal form will contain at least one factor 0 and hence will be 0.
Corollary 6.2.6. Two functions are equal if and only if their respective disjunctive normal forms contain the
same terms.
Proof. Two functions with the same terms are obviously equal. Conversely, if two functions are equal, then
they must have the same value for every choice of value for each variable. In particular, they assume the same
value for each set of values 0 and 1 which may be assigned to the variables. By theorem 6.2.5, the
combinations of values of 0 and 1 which, when assigned to the variables, make the function assume the value
1 uniquely determine the terms which are present in the normal form for the function. Hence both normal
forms contain the same terms.
Corollary 6.2.7. To establish any identity in Boolean algebra, it is sufficient to check the value of each
function for all combinations of 0 and 1 which may be assigned to the variables.
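This corollary can be used directly as an algorithm: to verify a proposed identity, evaluate both sides on all 2^n assignments. A Python sketch (illustrative):

    from itertools import product

    def boolean_identity_holds(f, g, nvars):
        """Corollary 6.2.7: two Boolean functions are equal iff they
        agree on every assignment of 0 and 1 to the variables."""
        return all(f(*vals) == g(*vals)
                   for vals in product((0, 1), repeat=nvars))

    # Check the absorption-style identity x + x'y = x + y.
    lhs = lambda x, y: x | ((1 - x) & y)
    rhs = lambda x, y: x | y
    print(boolean_identity_holds(lhs, rhs, 2))   # True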
We have seen in the preceding theorems that a function is completely determined by the values it assumes
for each possible assignment of 0 and 1 to the respective variables. This suggests that functions could be
conveniently specified by giving a table to represent such properties. In applications, particularly to the design
of circuits, this is precisely the way in which Boolean functions are constructed. If such a table has been given,
then the function, in disjunctive normal form, may be written down by inspection. For each set of conditions
for which the function is to be 1, a corresponding term is included in the disjunctive normal form selected,
as indicated in the proof of theorem 6.2.5 above. The sum of these terms gives the
function, although not necessarily in simplest form. The following example indicates this method.
Row x y z f (x, y, z)
1 1 1 1 0
2 1 1 0 1
3 1 0 1 1
4 1 0 0 0
5 0 1 1 0
6 0 1 0 0
7 0 0 1 1
8 0 0 0 0
Table 6.1
Example 6.2.8. Find and simplify the function f (x, y, z) specified by table 6.1.
Note that the table shows the value of f for each of the 2^3 = 8 possible assignments of 0 and 1 to x, y, and
z.
We observe that for the combinations represented by rows 2, 3 and 7 of the table, the function will have
the value 1. Thus the disjunctive normal form of f will contain three terms. For row 2, since the x variable is
1, the y variable is 1 and the z variable is 0, the term in f corresponding to this combination will be xyz′ (note
that its value is 1 by theorem 6.2.5). Similarly, for the terms in the 3rd and 7th rows, we get xy′z and x′y′z
respectively (each having value 1). Thus, summing these terms, we get f (x, y, z) = xyz′ + xy′z + x′y′z.
We have
f (x, y, z) = xyz′ + xy′z + x′y′z
= xyz′ + (x + x′)y′z
= xyz′ + y′z.
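The construction used in this example is entirely mechanical, so it can be coded directly: for each row of the table on which the function is 1, emit the corresponding product term. A Python sketch (illustrative):

    from itertools import product

    def dnf_terms(f, names):
        """Disjunctive normal form of a function given by its table:
        for each assignment with value 1, use x when the variable is 1
        and x' when it is 0 (theorem 6.2.5)."""
        terms = []
        for vals in product((1, 0), repeat=len(names)):
            if f(*vals):
                terms.append(''.join(n if v else n + "'"
                                     for n, v in zip(names, vals)))
        return ' + '.join(terms)

    # The function of table 6.1 is 1 exactly on rows 2, 3 and 7.
    f = lambda x, y, z: int((x, y, z) in {(1, 1, 0), (1, 0, 1), (0, 0, 1)})
    print(dnf_terms(f, 'xyz'))   # xyz' + xy'z + x'y'z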
Exercise 6.2.9. 1. Express the following in disjunctive normal form in the smallest possible number of
variables:
(a) x′yz + xy′z′ + x′y′z + x′yz′ + xy′z + x′y′z′
(b) (x + y′)(y + z′)(z + x′)(x′ + y′)
(c) (u + v + w)(uv + u′w)′
(d) xy′ + xz + xy
(e) xyz + (x + y)(x + z)
(f) x + x′y
(g) (x + y)(x + y′)(x′ + z)
2. Write separately, and simplify, the three functions f1 , f2 and f3 as given in table 6.2.
Row x y z f1 f2 f3
1 1 1 1 0 0 1
2 1 1 0 1 1 1
3 1 0 1 0 1 0
4 1 0 0 1 0 0
5 0 1 1 0 0 0
6 0 1 0 0 1 0
7 0 0 1 0 1 1
8 0 0 0 0 0 1
Table 6.2
6.3 Conjunctive Normal Form

Definition 6.3.4. That conjunctive normal form in n variables which contains 2^n factors is called the complete
conjunctive normal form in n variables.
Theorem 6.3.5. If each of n variables is assigned the value 0 or 1 in an arbitrary, but fixed manner, then
exactly one factor of the complete conjunctive normal form in the n variables will have the value 0 and all
other factors will have the value 1.
Note that to select the factor which will be 0 when a set of values a1 , a2 , . . . , an are assigned to x1 , x2 , . . . , xn
in that order, where each ai is 0 or 1, we simply apply duality principle of that described in the previous sec-
tion. xi is selected if ai = 0 and x′i is selected if ai = 1 for each i = 1, 2, . . . , n. The proper factor is then the
sum of these letters, each of which has value 0. All other factors have the value 1.
Corollary 6.3.6. Two functions, each expressed in conjunctive normal form in n variables, are equal if and
only if they contain identical factors.
Row x y z f (x, y, z)
1 1 1 1 1
2 1 1 0 1
3 1 0 1 0
4 1 0 0 1
5 0 1 1 1
6 0 1 0 1
7 0 0 1 0
8 0 0 0 1
Example 6.3.7. Find and simplify the function f (x, y, z) specified in the table above.
Observe that only two rows of the table show the value 0 for f . Corresponding to the third row, we see that
x is 1, y is 0 and z is 1. So the corresponding factor will be x′ + y + z′. Similarly, for the 7th row, we have
the factor x + y + z′. Thus, we would have f (x, y, z) = (x′ + y + z′)(x + y + z′), which gives us,
f (x, y, z) = (y + z′ + x′)(y + z′ + x) = y + z′ + x′x = y + z′.
In problems of this type, the disjunctive normal form would normally be used if the number of 1's were
less than the number of 0's in the f column, and the conjunctive normal form would be used if the number of
0's were less than the number of 1's.
Again, as in the previous section, we can use the conjunctive normal form to find complements of functions
written in this form by inspection. The complement of any function written in conjunctive normal form is that
function whose factors are exactly those factors of the complete conjunctive normal form which are missing
from the given function. For example, the complement of (x + y′)(x′ + y) is (x + y)(x′ + y′).
It may be desirable to change a function from one normal form to the other. This can be done more readily
than by following the general procedure for converting a function to a particular form. An example will
illustrate the method, which is based on the fact that (f 0 )0 = f .
Example 6.3.8. Find the conjunctive normal form for the function
f = xyz + x′yz + xy′z′ + x′yz′.
We have,
f = xyz + x′yz + xy′z′ + x′yz′
= [(xyz + x′yz + xy′z′ + x′yz′)′]′
= [(x′ + y′ + z′)(x + y′ + z′)(x′ + y + z)(x + y′ + z)]′
= (x + y + z)(x′ + y + z′)(x + y + z′)(x′ + y′ + z).
Here, the first complement was taken with the aid of De Morgan's laws and the second complement was taken
by the method discussed above. These steps could have been reversed, with the same results. A similar
procedure will change a function from conjunctive normal form to disjunctive normal form.
Exercise 6.3.9. 1. Express each of the following in conjunctive normal form in the smallest possible num-
ber of variables:
2. Change each of the following from disjunctive normal form to conjunctive normal form:
(a) uv + u′v + u′v′
(b) abc + ab′c′ + a′bc′ + a′b′c + a′b′c′
3. Change each of the following from conjunctive normal form to disjunctive normal form:
4. Write separately, and simplify, the four functions f1 , f2 , f3 and f4 as given in the table below. Use
whichever normal form seems easier.
Row x y z f1 f2 f3 f4
1 1 1 1 1 0 0 1
2 1 1 0 0 1 1 1
3 1 0 1 1 0 0 1
4 1 0 0 1 0 1 0
5 0 1 1 1 0 1 1
6 0 1 0 1 0 1 1
7 0 0 1 0 1 0 1
8 0 0 0 1 0 0 0
Table 6.3
6.4 Propositions and Definitions of Symbols

By a proposition we mean a declarative sentence which has the property that it is either true or false, but not both. The following examples are typical
propositions:
3 is a prime number;
living creatures exist on the planet Venus.
Note that of these propositions, the first is known to be true, while the second is either true or false. In contrast
to these, the following is not a proposition:
We shall use lower case italic letters to represent propositions. Where no specific proposition is given, these
will be called propositional variables and used to represent arbitrary propositions.
From any proposition, or set of propositions, other propositions may be formed. The simplest example is
that of forming from the proposition p, the negation of p, denoted by ¬p or p′. For example, if p is
the proposition
sleeping is pleasant,
then its negation p′ is the proposition
sleeping is unpleasant.
Any two propositions p and q may be combined in various ways to form new propositions. To illustrate, let p
be the proposition
ice is cold,
and let q be the proposition
blood is green.
These propositions may be combined by the connective and to form the proposition
ice is cold and blood is green.
This proposition is referred to as the conjunction of p and q. We will denote the conjunction of p and q by
pq, and we will require that the proposition be true in those cases in which both p and q are true, and false in
cases in which either one or both of p and q are false.
Another way in which the propositions in the preceding paragraph may be combined is indicated in the
proposition
either ice is cold or blood is green.
This proposition is referred to as the disjunction of p and q. We will denote the disjunction of p and q by p + q
and is the proposition "either p or q or both". We will require that this proposition be true whenever either one
of p and q or both are true, and false only when both are false.
It follows from our definitions that the negation of "p or q" is the proposition "not p and not q," which can
also be stated "neither p nor q." Likewise, the negation of "p and q" is "either not p or not q." That is, De Morgan's laws hold for propositions just as they do for sets. In symbolic form we have the following laws for propositions:
(p + q)' = p'q'
(pq)' = p' + q'.
Example 6.4.1. Let p be the proposition "missiles are costly" and q be the proposition "Grandpa chews gum".
Write in English the propositions represented by the symbols
1. p + q'   2. p'q'   3. pq' + p'q
We have:
1. p + q': Either missiles are costly or Grandpa does not chew gum.
2. p'q': Missiles are not costly and Grandpa does not chew gum.
3. pq' + p'q: Either missiles are costly and Grandpa does not chew gum, or missiles are not costly and Grandpa chews gum.
2. Let p be the proposition "mathematics is easy," and let q be the proposition "two is less than three."
Write out, in reasonable English, the propositions represented by
i. p + q   ii. pq' + p'q
6.5 Truth Tables
Row  p  q  pq  p + q
1 1 1 1 1
2 1 0 0 1
3 0 1 0 1
4 0 0 0 0
We again attempt to draw the truth table of (r' + pq)', for three propositions p, q and r. Since we have three
propositions and two truth values, viz., 0 and 1, so we have 23 = 8 possible combinations of truth values for
the propositions. The table is given in table 6.4.
Table 6.4
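Tables such as 6.3 and 6.4 can also be generated mechanically. The following minimal Python sketch (our illustration; encoding ' as 1 − x, (+) as OR and juxtaposition as AND is an assumption of the sketch) prints the truth table of (r' + pq)':

from itertools import product

# Enumerate all 2^3 combinations of truth values for p, q, r
# and evaluate (r' + pq)': complement is 1 - x, + is OR, juxtaposition is AND.
for p, q, r in product([1, 0], repeat=3):
    f = 1 - ((1 - r) | (p & q))
    print(p, q, r, f)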
An illustration of the usefulness of truth tables occurs in the proof of the following theorem. From the
definition of equality, it follows that two functions are equal if and only if their truth tables are identical. This
fact is used in the third part of the proof below.
Theorem 6.5.1. The set of all propositions forms a Boolean algebra.
Proof. In order to prove that the set of propositions forms a Boolean algebra, we will have to show that the
four postulates hold which we stated in the beginning of the previous unit. We begin with them one by one.
(a) From the definitions of conjunction and disjunction of propositions (denoted by (·) and (+), respectively), it follows that both operations are commutative, and hence the first postulate holds true.
(b) 0 is the identity element for the operation (+) since 0 + p has the same truth value as p and hence equals
p. Similarly, (1)(q) has the same truth value as q and hence equals q, showing that 1 is the identity for
the operation of conjunction.
(c) Each operation is distributive over the other as is shown by the table below (table 6.5).
From table 6.5, it can be seen that the truth values of p + qr and (p + q)(p + r) are the same, and hence they are equal. Also, the truth values of pq + pr and p(q + r) are the same, and hence they are equal.
(d) For each proposition p, there is a second proposition p0 , the negation of p, which satisfies the relations
pp0 = 0 and p + p0 = 1 as can be verified by the truth table 6.6.
Thus, p0 is the complement of p.
Table 6.5
p  p'  pp'  p + p'
1  0   0    1
0  1   0    1
Table 6.6
Exercise 6.5.2. 1. Determine which of the following are tautologies by constructing the truth table for
each.
(a) pq + p' + q'   (b) p + q + p'
2. Define conjunctive normal form. Find the conjunctive normal form for the function f(x, y, z) = xy' + xz + xy, in the smallest possible number of variables.
4. Let p be the proposition "x is an even number," and let q be the proposition "x is the product of two
integers." Translate into symbols each of the following propositions.
(b) Either x is an even number and a product of integers, or x is an odd number and is not a product
of integers.
(c) x is neither an even number nor a product of integers.
Unit 7
Course Structure
• Boolean Algebra: Logic gates and circuits,
• Applications of Boolean Algebra to Switching theory (using AND, OR, & NOT gates),
• Karnaugh Map method.
7.1 Introduction
In this unit, we will introduce a third important application of Boolean algebra, the algebra of circuits, involv-
ing two-state (bistable) devices. The simplest example of such a device is a switch or contact. The theory
introduced holds equally well for such two-state devices as rectifying diodes, magnetic cores, transistors,
various types of electron tubes, etc. The nature of the two states varies with the device and includes conduct-
ing versus nonconducting, closed versus open, charged versus discharged, magnetized versus nonmagnetized,
high-potential versus low-potential, and others. The algebra of circuits is receiving more attention at present,
both from mathematicians and from engineers, than either of the two applications of Boolean algebra which
we considered in the previous chapters. The importance of the subject is reflected in the use of Boolean al-
gebra in the design and simplification of complex circuits involved in electronic computers, dial telephone
switching systems, and many varied kinds of electronic control devices. The algebra of circuits fits into the
general picture of Boolean algebra as an algebra with two elements 0 and 1. This means that except for the
terminology and meaning connecting it with circuits, it is identical with the algebra of propositions considered
as an abstract system. Either of these Boolean algebras is much more restricted than an algebra of sets.
Objectives
After reading this unit, you will be able to
• learn basic elements of a switching circuit
• learn to minimize a switching circuit using Boolean function
• define the logical circuit elements
• learn to simplify functions using Karnaugh maps
7.2 Switching Circuits
Figure 7.2.1
Each switch is denoted by a letter and is assigned the value 1 if it represents a closed switch, and the value 0 if it represents an open switch. If a and a' both appear, then a is 1 if and only if a' is 0. A switch that is always closed is represented by 1, one that is always open by 0. Letters play the
role of variables which take on the value 0 or 1, and we note the close analogy to proposition variables, which
have the same possible values, although the meaning attached to these values has changed.
Two circuits involving switches a, b, . . . are said to be equivalent if the closure conditions of the two circuits
are the same for any given position of the switches involved (values of the variables a, b, . . .). That is, they are
equivalent if for every position of the switches, current may either pass through both (both closed) or not pass
through either (both open). Two algebraic expressions are defined to be equal if and only if they represent
equivalent circuits.
It is now possible, by drawing the appropriate circuits and enumerating the possible positions of the
switches involved, to check that each of the laws of Boolean algebra is valid when interpreted in terms of
switching circuits. For example, consider the circuits that realize the functions on each side of the identity
stating the distributive law for (+) over (·), shown in figure 7.2.2. By inspection, it is apparent that the circuit
Figure 7.2.2
is closed (current can pass) if switch x is closed, or if both y and z are closed, and that the circuit is open
(current cannot pass) if x and either y or z are open. Hence the circuits are equivalent, and this distributive
law holds.
A simpler procedure for checking the validity of the fundamental laws is to note that numerical values of
the switching functions a0 , ab, and a + b are identical to the truth tables for the corresponding propositional
functions (table 7.1).
Row  a  b  a'  ab  a + b
1 1 1 0 1 1
2 1 0 0 0 1
3 0 1 1 0 1
4 0 0 1 0 0
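Because the fundamental laws can be checked by running through all switch positions, such a check is easy to automate. A small Python sketch (ours, not part of the text) verifies the distributive law of (+) over (·) illustrated in fig. 7.2.2:

from itertools import product

# Check x + yz = (x + y)(x + z) for every position of the switches.
for x, y, z in product([0, 1], repeat=3):
    lhs = x | (y & z)            # + is a parallel connection (OR), . a series one (AND)
    rhs = (x | y) & (x | z)
    assert lhs == rhs
print("distributive law holds for all 8 switch positions")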
Example 7.2.1. We want to find a circuit which realizes the Boolean function xyz' + x'(y + z').
This expression indicates a series connection of x, y, and z' in parallel with a circuit corresponding to x'(y + z'). This latter circuit consists of x' in series with a parallel connection of y and z'. Hence the circuit is as shown in fig. 7.2.3.
Figure 7.2.3
Example 7.2.2. We want to find the Boolean function which represents the circuit shown in fig. 7.2.4.
Figure 7.2.4
Table 7.2
Figure 7.2.5
In using the basic laws of Boolean algebra, it often happens that a possible simplification is overlooked. It
may happen that a certain step is easier to recognize if stated in terms of one of the dual laws rather than in
terms of the other. This suggests another method of simplification which may help. To simplify a function f ,
Row x y z f1
1 1 1 1 0
2 1 1 0 1
3 1 0 1 1
4 1 0 0 0
5 0 1 1 0
6 0 1 0 0
7 0 0 1 0
8 0 0 0 1
Table 7.3
Figure 7.2.6
the dual of f may be taken and the resulting expression simplified. If the dual is taken again, the function f is
obtained in a different form. This will usually be simpler than the original.
Example 7.2.6. Simplify the circuit in fig. 7.2.8. The circuit is represented by the function f = cb + ab'cd + cd' + ac' + a'bc' + b'c'd'. Consider the first three terms as the function g, and the last three terms as the function h. Then g = cb + ab'cd + cd'. The dual of g, which we write as d(g), is then
d(g) = (b + c)(a + b' + c + d)(c + d') = c + abd',
so that, taking the dual again, g = c(a + b + d'). Similarly,
h = ac' + a'bc' + b'c'd'
d(h) = (a + c')(a' + b + c')(b' + c' + d') = c' + abd',
so that h = c'(a + b + d'). Hence f = g + h = c(a + b + d') + c'(a + b + d') = a + b + d'.
Figure 7.2.7
Figure 7.2.8
7.3 Logical Circuit Elements
For our purposes it is enough to know that these elements can be constructed; in fact, commercially packaged elements of these
types, suitable for use in many types of equipment, can be purchased directly. We will conceive of a logical
circuit element as a little box or package with one or more input leads (wire connections) and one or more
output leads. These leads will carry signals in the form of positive voltage corresponding to a value 1, or zero
voltage corresponding to a value 0. We will use a single letter, say x, to stand for the condition of the lead.
When the lead carries a signal, we will say that x takes on the value 1. When the lead does not carry a signal,
we say that x has the value 0. This represents only a slight modification of our earlier point of view, where
1 and 0 meant closed or open circuits, since we can think of a closed circuit as one carrying a signal, and of
an open circuit as one incapable of carrying a signal. Other signals than that of a positive voltage could be
used equally well, and in fact the signal used will in general depend on the type of components used in circuit
construction. We will use just this one type of signal for simplicity, and we will adapt all our circuits to its
use.
We will draw a circuit element as a circle with a letter inside to designate the type of element, and with lines
indicating inputs and outputs. Arrows on these lines will indicate the difference between input and output, an
arrow pointing toward the circle being used on each input.
The first logical circuit element we will consider has a single input and a single output. The function of
this element is to obtain the complement of a given signal; that is, the output is 0 when the input is 1, and
conversely. Fig. 7.3.1 shows the notation we will use, a circle with C in the center. The input is designated x,
so the output is x0 .
The next two logical circuit elements correspond to the logical connections "and" and "or." Each may have
two or more inputs and only a single output. The "and" element is shown in diagrams as a circle with A in the
center. This element produces an output signal (output has value 1) if and only if every input carries a signal
(has value 1). If the inputs to an "and" element are x, y, and z, for example, the output function may be written
as xyz, where the notation is that of Boolean algebra. The "or" element, represented graphically by a circle
with O in the center, produces an output signal whenever one or more inputs carry a signal. If the inputs to an
"or" element are x, y, and z, for example, the output is the Boolean function x + y + z. Fig. 7.3.2 shows the
symbolic notations for these elements. Each is shown with only two inputs.
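The behaviour of the three elements can be made concrete with a small Python sketch (an added illustration; the function names C, A, O simply mirror the element labels used above):

def C(x):
    # Complement element: output is 0 when the input is 1, and conversely.
    return 1 - x

def A(*inputs):
    # "And" element: output 1 if and only if every input carries a signal.
    out = 1
    for x in inputs:
        out &= x
    return out

def O(*inputs):
    # "Or" element: output 1 whenever one or more inputs carry a signal.
    out = 0
    for x in inputs:
        out |= x
    return out

print(C(1), A(1, 1, 1), A(1, 0, 1), O(0, 0, 1))   # 0 1 0 1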
Figure 7.2.9
Figure 7.2.10
7.4 Karnaugh Maps
A Karnaugh map is a diagrammatic way of presenting the information of a truth table.
The diagram below illustrates the correspondence between the Karnaugh map and the truth table for the general case of a two-variable problem (fig. 7.4.1).
The values inside the squares are copied from the output column of the truth table; therefore there is one square in the map for every row in the truth table. Around the edge of the Karnaugh map are the values of the two input variables: x is along the top and y is down the left-hand side. The diagram 7.4.2 explains this:
The values around the edge of the map can be thought of as coordinates. So as an example, the square on
the top right hand corner of the map in the above diagram has coordinates x = 1 and y = 0. This square
corresponds to the row in the truth table where x = 1 and y = 0 and f = 1. Note that the value in the f
column represents a particular function to which the Karnaugh map corresponds.
Example 7.4.1. Consider the following map (fig. 7.4.3). The function plotted is:
f(x, y) = xy' + xy.
Note that values of the input variables form the rows and columns. That is the logic values of the variables
x and y (with one denoting true form and zero denoting false form) form the head of the rows and columns
respectively. Bear in mind that the above map is a one dimensional type which can be used to simplify an
expression in two variables. There is a two-dimensional map that can be used for up to four variables, and a
three-dimensional map for up to six variables.
Using algebraic simplification,
f = xy' + xy = x(y' + y) = x.
Figure 7.3.1
Figure 7.3.2
x y f
0 0 a
0 1 b
1 0 c
1 1 d
Figure 7.4.1
x y f
0 0 0
0 1 1
1 0 1
1 1 1
Figure 7.4.2
Referring to the map 7.4.3, the two adjacent 1's are grouped together. Through inspection it can be seen that variable y has both its true and false form within the group; y is therefore redundant and is eliminated, leaving only variable x, which appears only in its true form. The minimised answer therefore is f = x.
Example 7.4.2. Consider the expression f(x, y) = x'y' + xy' + x'y plotted on the Karnaugh map 7.4.4. Pairs
of 1’s are grouped as shown in the figure, and the simplified answer is obtained by using the following steps:
Note that two groups can be formed for the example given above, bearing in mind that the largest rectangu-
lar clusters that can be made consist of two 1’s. Notice that a 1 can belong to more than one group. The first
group, labelled I, consists of two 1's which correspond to x = 0, y = 0 and x = 1, y = 0. Put another way, all squares in this example that correspond to the area of the map where y = 0 contain 1's, independent of the value of x. So when y = 0, the output is 1, and the expression of the output will contain the term y'. The second group, labelled II, corresponds to the area of the map where x = 0, and can therefore be defined as x'. This implies that when x = 0 the output is 1. The output is therefore 1 whenever y = 0 or x = 0. Hence the simplified answer is f = x' + y'.
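The map result can be confirmed by brute force; the short Python check below (ours) compares the plotted expression with the simplified answer on all four input combinations:

from itertools import product

# Confirm that x'y' + xy' + x'y equals x' + y' for every input.
for x, y in product([0, 1], repeat=2):
    original = ((1 - x) & (1 - y)) | (x & (1 - y)) | ((1 - x) & y)
    simplified = (1 - x) | (1 - y)
    assert original == simplified
print("f = x' + y' agrees with the Karnaugh map")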
Figure 7.4.3
Figure 7.4.4
Figure 7.5.1
7.5 Few Probable Questions
4. Find circuits which realize each of the functions given in table 7.6.
Row x y z f1 f2
1 1 1 1 1 1
2 1 1 0 0 1
3 1 0 1 0 0
4 1 0 0 1 1
5 0 1 1 1 1
6 0 1 0 1 0
7 0 0 1 0 1
8 0 0 0 1 1
Table 7.6
Figure 7.5.2
Unit 8
Course Structure
• Combinatorics: Introduction, basic counting principles,
8.1 Introduction
Combinatorics studies the way in which discrete structures can be combined or arranged. Enumerative com-
binatorics concentrates on counting the number of certain combinatorial objects - e.g. the twelvefold way
provides a unified framework for counting permutations, combinations and partitions. Analytic combinatorics
concerns the enumeration (i.e., determining the number) of combinatorial structures using tools from complex
analysis and probability theory. In contrast with enumerative combinatorics which uses explicit combinatorial
formulae and generating functions to describe the results, analytic combinatorics aims at obtaining asymptotic
formulae. Design theory is a study of combinatorial designs, which are collections of subsets with certain in-
tersection properties. Partition theory studies various enumeration and asymptotic problems related to integer
partitions, and is closely related to q-series, special functions and orthogonal polynomials. Originally a part
of number theory and analysis, partition theory is now considered a part of combinatorics or an independent
field. Order theory is the study of partially ordered sets, both finite and infinite.
Objectives
After reading this unit, you will be able to
• learn the sum rule and product rule principles and solve examples related to them
• learn various mathematical functions such as factorial function, and solve examples related to them
• learn the pigeonhole and generalized pigeonhole principles and solve related problems
8.2 Basic Counting Principles
1. Sum Rule Principle: Suppose some event E can occur in m ways and a second event F can occur in
n ways, and suppose both events cannot occur simultaneously. Then E or F can occur in m + n ways.
2. Product Rule Principle: Suppose there is an event E which can occur in m ways and, independent of
this event, there is a second event F which can occur in n ways. Then combinations of E and F can
occur in mn ways.
The above principles can be extended to three or more events. That is, suppose an event E1 can occur in n1
ways, a second event E2 can occur in n2 ways, and, following E2 ; a third event E3 can occur in n3 ways, and
so on.
Sum Rule: If no two events can occur at the same time, then one of the events can occur in:
n1 + n2 + · · · ways.
Product Rule: If the events occur one after the other, then all the events can occur in the order indicated
in:
n1 · n2 · · · ways.
Example 8.2.1. Suppose a college has 3 different history courses, 4 different literature courses, and 2 different
sociology courses.
1. The number m of ways a student can choose one of each kind of course is m = 3(4)(2) = 24.
2. The number n of ways a student can choose just one of the courses is n = 3 + 4 + 2 = 9.
There is a set theoretical interpretation of the above two principles. Specifically, suppose n(A) denotes the
number of elements in a set A. Then:
1. Sum Rule Principle: If A and B are disjoint sets, then n(A ∪ B) = n(A) + n(B).
2. Product Rule Principle: Let A × B be the Cartesian product of sets A and B. Then n(A × B) = n(A) · n(B).
Example 8.2.2. There are four bus lines between A and B, and three bus lines between B and C. Find the
number m of ways that a man can travel by bus: (a) from A to C by way of B; (b) roundtrip from A to C by
way of B; (c) roundtrip from A to C by way of B but without using a bus line more than once.
(a) There are 4 ways to go from A to B and 3 ways from B to C; hence m = 4 · 3 = 12.
(b) There are 12 ways to go from A to C by way of B, and 12 ways to return. Thus m = 12 · 12 = 144.
(c) The man will travel from A to B to C to B to A. Enter these letters with connecting arrows as follows:
A → B → C → B → A.
The man can travel four ways from A to B and three ways from B to C, but he can only travel two
ways from C to B and three ways from B to A since he does not want to use a bus line more than once.
Enter these numbers above the corresponding arrows as follows:
A →(4) B →(3) C →(2) B →(3) A.
Thus m = 4 · 3 · 2 · 3 = 72.
8.3 Mathematical Functions
The binomial coefficients appear as the coefficients in the expansion of (a + b)^n. Specifically:
Theorem 8.3.2. (Binomial Theorem)
(a + b)^n = Σ_{k=0}^{n} C(n, k) a^{n−k} b^k.
The coefficients of the successive powers of a + b can be arranged in a triangular array of numbers, called
Pascal’s triangle, as pictured in fig. 8.3.1. The numbers in Pascal’s triangle have the following interesting
properties:
Figure 8.3.1
1. The first and last number in each row is 1.
2. Every other number can be obtained by adding the two numbers appearing above it.
Since these numbers are binomial coefficients, we state the second property formally.
Theorem 8.3.3.
C(n + 1, r) = C(n, r − 1) + C(n, r).
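The identity of Theorem 8.3.3 is precisely the rule by which each row of Pascal's triangle is built from the previous one, as the following minimal Python sketch (ours) shows:

def pascal_rows(n):
    # Build successive rows using C(n+1, r) = C(n, r-1) + C(n, r).
    row = [1]
    for _ in range(n + 1):
        yield row
        row = [1] + [row[i - 1] + row[i] for i in range(1, len(row))] + [1]

for row in pascal_rows(5):
    print(row)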
Exercise 8.3.4. 1. Compute: (a) 4!, 5!; (b) 6!, 7!, 8!, 9!; (c) 50! [Hint: For large n, use Stirling's approximation: n! ≈ √(2πn) n^n e^{−n}, where e ≈ 2.718].
2. Compute: (a) C(18, 5); (b) C(12, 4).
3. Prove: C(17, 6) = C(16, 5) + C(16, 6).
8.4 Permutations
Definition 8.4.1. Any arrangement of a set of n objects in a given order is called a permutation of the objects
(taken all at a time). Any arrangement of any r ≤ n of these objects in a given order is called an "r-
permutation" or "a permutation of the n objects taken r at a time."
For example, consider the four letters A, B, C, D. Then:
• BDCA, DCBA, and ACDB are permutations of the four letters (taken all at a time).
• BAD, ACB, DBC are permutations of the four letters taken three at a time.
• AD, BC, CA are permutations of the four letters taken two at a time.
We usually are interested in the number of such permutations without listing them. The number of permuta-
tions of n objects taken r at a time will be denoted by P (n, r). We have the following theorem.
Theorem 8.4.2.
P(n, r) = n!/(n − r)!.
Example 8.4.3. Find the number m of three-letter words, with distinct letters, that can be formed from six given letters. Represent the word by three blank positions: __ , __ , __ .
The first letter can be chosen in 6 ways; following this the second letter can be chosen in 5 ways; and, finally,
the third letter can be chosen in 4 ways. Write each number in its appropriate position as follows:
6 , 5 , 4
By the Product Rule there are m = 6 · 5 · 4 = 120 possible three-letter words without repetition from the
six letters. Namely, there are 120 permutations of 6 objects taken 3 at a time. This agrees with the formula in
the previous theorem.
P(6, 3) = 6 · 5 · 4 = 120.
Consider now the special case of P (n, r) when r = n. We get the following result.
Corollary 8.4.4. There are n! permutations of n objects (taken all at a time).
For example, there are 3! = 6 permutations of the three letters A, B, C. These are:
ABC, ACB, BAC, BCA, CAB, CBA.
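Both the counting formula and the listing of permutations are easy to reproduce in Python (a hedged illustration using only the standard library):

from itertools import permutations
from math import factorial

def P(n, r):
    # P(n, r) = n! / (n - r)!
    return factorial(n) // factorial(n - r)

print(P(6, 3))                                     # 120, as in the example above
print([''.join(p) for p in permutations('ABC')])   # the 3! = 6 permutations of A, B, C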
Ordered Samples
Definition 8.4.7. Many problems are concerned with choosing an element from a set S, say, with n elements.
When we choose one element after another, say, r times, we call the choice an ordered sample of size r.
We consider two cases.
1. Sampling with replacement: Here the element is replaced in the set S before the next element is
chosen. Thus, each time there are n ways to choose an element (repetitions are allowed). The Product
rule tells us that the number of such samples is:
n · n · n · · · n (r factors) = n^r.
2. Sampling without replacement: Here the element is not replaced in the set S before the next element
is chosen. Thus, there is no repetition in the ordered sample. Such a sample is simply an r-permutation.
Thus the number of such samples is:
P(n, r) = n(n − 1)(n − 2) · · · (n − r + 1) = n!/(n − r)!.
Example 8.4.8. Three cards are chosen one after the other from a 52-card deck. Find the number m of ways
this can be done: (a) with replacement; (b) without replacement.
(a) Here each of the three cards can be chosen in 52 ways; thus m = 52 · 52 · 52 = 140608.
(b) Here there is no replacement. Thus the first card can be chosen in 52 ways, the second in 51 ways, and the third in 50 ways. Therefore, m = P(52, 3) = 52(51)(50) = 132600.
Exercise 8.4.9. 1. Find the number n of distinct permutations that can be formed from all the letters of
each word: (a) THOSE; (b) UNUSUAL; (c) SOCIOLOGICAL.
3. A class contains 8 students. Find the number n of samples of size 3: (a)With replacement; (b)Without
replacement.
8.5 Combinations
Definition 8.5.1. Let S be a set with n elements. A combination of these n elements taken r at a time is any
selection of r of the elements where order does not count. Such a selection is called an r-combination; it is
simply a subset of S with r elements. The number of such combinations will be denoted by C(n, r).
Before we give the general formula for C(n, r), we consider a special case.
Example 8.5.2. Find the number of combinations of 4 objects, A, B, C, D, taken 3 at a time.
Each combination of three objects determines 3! = 6 permutations of the objects as follows:
Thus the number of combinations multiplied by 3! gives us the number of permutations; that is,
C(4, 3) · 3! = P(4, 3),  or  C(4, 3) = P(4, 3)/3!.
But P (4, 3) = 4 · 3 · 2 = 24 and 3! = 6; hence C(4, 3) = 4 as noted above.
As indicated above, any combination of n objects taken r at a time determines r! permutations of the objects
in the combination; that is,
P (n, r) = r!C(n, r).
Accordingly, we obtain the following formula for C(n, r) which we formally state as a theorem.
Theorem 8.5.3. We have
C(n, r) = P(n, r)/r! = n!/(r!(n − r)!).
Recall that the binomial coefficient "n choose r" was defined by exactly this formula, n!/(r!(n − r)!). Hence C(n, r) is the binomial coefficient, and we shall use the two notations interchangeably.
Example 8.5.4. A farmer buys 3 cows, 2 pigs, and 4 hens from a man who has 6 cows, 5 pigs, and 8 hens.
Find the number m of choices that the farmer has.
The farmer can choose the cows in C(6, 3) ways, the pigs in C(5, 2) ways, and the hens in C(8, 4) ways.
Thus the number m of choices follows:
m = C(6, 3) · C(5, 2) · C(8, 4) = 20 · 10 · 70 = 14000.
Example 8.5.5. A class contains 10 students with 6 men and 4 women. We want to find the number n of ways
to:
(a) select a 4-member committee from the students. This concerns combinations, not permutations, since
order does not count in a committee. There are "10 choose 4" such committees. That is:
n = C(10, 4) = 210.
(b) select a 4-member committee with 2 men and 2 women. The 2 men can be chosen from the 6 men in
C(6, 2) ways, and the 2 women can be chosen from the 4 women in C(4, 2) ways. Thus, by the Product
Rule:
n = C(6, 2) · C(4, 2) = 15 · 6 = 90.
(c) elect a president, vice president, and treasurer. This concerns permutations, not combinations, since order does count. Thus, n = P(10, 3) = 10 · 9 · 8 = 720.
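For numerical work these counts are available directly; the following Python lines (ours; math.comb requires Python 3.8 or later) reproduce the combination counts computed above:

from math import comb

print(comb(10, 4))                            # 210 four-member committees
print(comb(6, 2) * comb(4, 2))                # 90 committees with 2 men and 2 women
print(comb(6, 3) * comb(5, 2) * comb(8, 4))   # 14000, the farmer's choices in Example 8.5.4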
Exercise 8.5.6. 1. A box contains 8 blue socks and 6 red socks. Find the number of ways two socks can
be drawn from the box if: (a) They can be any color. (b) They must be the same color.
2. Find the number m of committees of 5 with a given chairperson that can be selected from 12 people.
8.6 Pigeonhole Principle
Theorem 8.6.1. (Pigeonhole Principle) If n pigeonholes are occupied by n + 1 or more pigeons, then at least
one pigeonhole is occupied by more than one pigeon.
This principle can be applied to many problems where we want to show that a given situation can occur.
Example 8.6.2. 1. Suppose a department contains 13 professors. Then two of the professors (pigeons) must have been born in the same month (pigeonholes).
2. Find the minimum number of elements that one needs to take from the set S = {1, 2, . . . , 9} to be sure
that two of the numbers add up to 10.
Here the pigeonholes are the five sets {1, 9}, {2, 8}, {3, 7}, {4, 6}, {5}. Thus any choice of six elements
(pigeons) of S will guarantee that two of the numbers add up to ten.
Theorem 8.6.3. (Generalized Pigeonhole Principle) If n pigeonholes are occupied by kn + 1 or more pigeons, then at least one pigeonhole is occupied by k + 1 or more pigeons.
Example 8.6.4. Find the minimum number of students in a class to be sure that three of them are born in the
same month.
Here n = 12 months are the pigeonholes, and k + 1 = 3, so k = 2. Hence among any kn + 1 = 25 students
(pigeons), three of them are born in the same month.
Exercise 8.6.5. 1. Find the minimum number of students needed to guarantee that five of them belong to
the same class (Freshman, Sophomore, Junior, Senior).
2. Let L be a list (not necessarily in alphabetical order) of the 26 letters in the English alphabet (which
consists of 5 vowels, A, E, I, O, U, and 21 consonants).
(a) Show that L has a sublist consisting of four or more consecutive consonants.
(b) Assuming L begins with a vowel, say A, show that L has a sublist consisting of five or more
consecutive consonants.
8.7 Inclusion–Exclusion Principle
For finite sets A and B we have n(A ∪ B) = n(A) + n(B) − n(A ∩ B).
In other words, to find the number n(A ∪ B) of elements in the union of A and B, we add n(A) and n(B) and
then we subtract n(A ∩ B). This follows from the fact that, when we add n(A) and n(B), we have counted
the elements of n(A ∩ B) twice. The principle in fact holds for any finite number of sets. We state it for three
sets.
Example 8.7.2. Find the number of mathematics students at a college taking at least one of the languages
French, German, and Russian, given the following data:
We want to find n(F ∪ G ∪ R), where F, G, and R denote the sets of students studying French, German, and
Russian, respectively.
By the Inclusion–Exclusion Principle,
n(A_1 ∪ A_2 ∪ · · · ∪ A_m) = s_1 − s_2 + s_3 − · · · + (−1)^{m−1} s_m.
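The two-set case is easy to check directly with Python's set type (a small added illustration):

# Verify n(A ∪ B) = n(A) + n(B) - n(A ∩ B) on a concrete pair of sets.
A = {1, 2, 3, 4}
B = {3, 4, 5}
assert len(A | B) == len(A) + len(B) - len(A & B)
print(len(A | B))   # 5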
Exercise 8.7.4. 1. Suppose among 32 people who save paper or bottles (or both) for recycling, there are
30 who save paper and 14 who save bottles. Find the number m of people who: (a) save both; (b) save
only paper; (c) save only bottles.
2. Let A, B, C, D denote, respectively, art, biology, chemistry, and drama courses. Find the number N of
students in a dormitory given the data:
8.8 Tree Diagrams
Figure 8.8.1
(b) Mark and Erik are to play a tennis tournament. The first person to win two games in a row or who wins
a total of three games wins the tournament. We want to find the number of ways the tournament can
occur.
The tree diagram showing the possible outcomes of the tournament appears in fig. 8.8.1 (b). Here the tree is constructed from the top down rather than from left to right. (That is, the "root" is at the top of the tree.) Note that there are 10 endpoints, and the endpoints correspond to the following 10 ways the tournament can occur:
MM, MEMM, MEMEM, MEMEE, MEE, EMM, EMEMM, EMEME, EMEE, EE.
The path from the beginning (top) of the tree to the endpoint describes who won which game in the
tournament.
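The same ten outcomes can be generated by a short recursion that mirrors the tree (our sketch; 'M' and 'E' stand for a game won by Mark and by Erik respectively):

def tournaments(history=""):
    # The tournament ends with two wins in a row or three wins in total.
    for p in "ME":
        h = history + p
        if h.endswith(p * 2) or h.count(p) == 3:
            yield h
        else:
            yield from tournaments(h)

outcomes = list(tournaments())
print(len(outcomes), outcomes)   # 10 outcomes, matching the tree diagram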
Exercise 8.8.3. 1. Teams A and B play in a tournament. The first team to win three games wins the
tournament. Find the number n of possible ways the tournament can occur. Construct the appropriate
tree diagram.
2. Construct the tree diagram that gives the permutations of {a, b, c}.
3. Find n if: (a) P (n, 4) = 42P (n, 2); (b) 2P (n, 2) + 50 = P (2n, 2).
4. Consider all positive integers with three different digits. (Note that zero cannot be the first digit.) Find
the number of them which are: (a) greater than 700; (b) odd; (c) divisible by 5.
5. A class contains 10 students. Find the number n of ordered samples of size 4: (a) with replacement; (b)
without replacement.
6. A woman student is to answer 10 out of 13 questions. Find the number of her choices where she must
answer:
7. Consider all integers from 1 up to and including 300. Find the number of them that are divisible by:
9. State pigeonhole principle. Suppose 5 points are chosen at random in the interior of an equilateral
triangle T where each side has length two inches. Show that the distance between two of the points
must be less than one inch.
Unit 9
Course Structure
• Grammar and Language: Introduction, alphabets, words, free semigroup,
• Languages, Regular expression and regular languages. Finite Automata (FA). Grammars.
9.1 Introduction
Automata theory is the study of abstract machines and automata, as well as the computational problems that
can be solved using them. It is a theory in theoretical computer science and discrete mathematics (a subject of
study in both mathematics and computer science). The word automata (the plural of automaton) comes from
a Greek word, which means "self-making".
Automata theory is closely related to formal language theory. An automaton is a finite representation of a
formal language that may be an infinite set. Automata are often classified by the class of formal languages
they can recognize, typically illustrated by the Chomsky hierarchy, which describes the relations between
various languages and kinds of formalized logics.
Automata play a major role in theory of computation, compiler construction, artificial intelligence, parsing
and formal verification.
Objectives
After reading this unit, you will be able to
• see that the set of all words forms a semigroup with respect to the concatenation of words
• define finite state automata and related terms and find its relation with languages
9.2 Alphabet, Words, Free Semigroup
For example, suppose A = {a, b, c}. Then the following sequences are words on A: u = ababb and v = accbaaa.
When discussing words on A, we frequently call A the alphabet, and its elements are called letters. We will also abbreviate our notation and write a^2 for aa, a^3 for aaa, and so on. Thus, for the above words, u = abab^2 and v = ac^2ba^3.
The empty sequence of letters, denoted by λ, or ε, or 1, is also considered to be a word on A, called the
empty word. The set of all words on A is denoted by A∗ .
Definition 9.2.2. The length of a word u, written |u| or l(u), is the number of elements in its sequence of
letters.
For the above words u and v, we have l(u) = 5 and l(v) = 7. Also, l(λ) = 0.
Unless otherwise stated, the alphabet A will be finite, the symbols u, v, w will be reserved for words on A,
and the elements of A will come from the letters a, b, c.
Definition 9.2.3. (Concatenation) Consider two words u and v on the alphabet A. The concatenation of u
and v, written uv, is the word obtained by writing down the letters of u followed by the letters of v.
As with letters, for any word u, we define u^2 = uu, u^3 = uuu, and in general, u^{n+1} = u u^n.
Clearly, for any words u, v, w, the words (uv)w and u(vw) are identical, they simply consist of the letters
of u, v, w written down one after the other. Also, adjoining the empty word before or after a word u does not
change the word u. That is:
Theorem 9.2.4. The concatenation operation for words on an alphabet A is associative. The empty word λ is
an identity element for the operation.
(Generally speaking, the operation is not commutative; e.g., uv ≠ vu for the above words u and v.)
Definition 9.2.5. (Subwords, Initial Segments) Consider any word u = a_1 a_2 · · · a_n on an alphabet A. Any sequence of consecutive letters w = a_j a_{j+1} · · · a_k is called a subword of u. In particular, the subword w = a_1 a_2 · · · a_k, beginning with the first letter of u, is called an initial segment of u. In other words, w is a subword of u if u = v_1 w v_2, and w is an initial segment of u if u = wv. Observe that λ and u are both subwords of u since u = λu.
Consider the word u = abca. The subwords and initial segments of u are as follows:
Observe that the subword w = a appears in two places in u. The word ac is not a subword of u even though
all its letters belong to u.
Definition 9.2.6. Let F denote the set of all non-empty words from an alphabet A with the operation of
concatenation. As noted above, the operation is associative. Thus F is a semigroup; it is called the free
semigroup over A or the free semigroup generated by A.
One can easily show that F satisfies the right and left cancellation laws. However, F is not commutative
when A has more than one element. We will write FA for the free semigroup over A when we want to specify
the set A.
Now let M = A∗ be the set of all words from A including the empty word λ. Since λ is an identity element
for the operation of concatenation, M is a monoid, called the free monoid over A.
9.3 Languages
Definition 9.3.1. A language L over an alphabet A is a collection of words on A. Recall that A∗ denotes the
set of all words on A. Thus a language L is simply a subset of A∗ .
1. L_1 = {a, ab, ab^2, . . .}, consisting of all words beginning with an a and followed by zero or more b's.
L_1 L_2 = {ab^m ab^n : m ≥ 0, n ≥ 0}
Clearly, the concatenation of languages is associative since the concatenation of words is associative.
Powers of a language L are defined as follows:
9.4 Regular Expressions and Regular Languages
1. The symbol λ and the pair "( )" (the empty expression) are regular expressions;
2. Let r = aa*. Then L(r) consists of all positive powers of a, excluding the empty word.
3. Let r = a ∨ b*. Then L(r) consists of a or any word in b's; that is, L(r) = {a, λ, b, b^2, . . .}.
2. L_2 = {a^m b^m : m > 0}. L_2 consists of all words beginning with one or more a's followed by the same number of b's. There exists no regular expression r such that L_2 = L(r); that is, L_2 is not a regular language.
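Python's re module implements essentially these regular expressions, so membership in a regular language can be tested directly, while L_2 must be checked by counting (a hedged illustration, ours):

import re

r = re.compile(r'ab*\Z')             # the regular expression ab* (the language L_1 above)
print(bool(r.match('a')))            # True
print(bool(r.match('abbb')))         # True: ab^3
print(bool(r.match('ba')))           # False

def in_L2(w):
    # {a^m b^m : m > 0} is not regular, so we simply count letters.
    m = len(w) // 2
    return m > 0 and w == 'a' * m + 'b' * m

print(in_L2('aabb'), in_L2('aab'))   # True False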
Exercise 9.4.6. 1. Let u = a^2 b and v = b^3 ab. Find: (a) uvu; (b) λu, uλ.
2. State the difference between the free semigroup on an alphabet A and the free monoid on A.
4. Let A = {a, b, c}. State whether w belongs to L(r) or not, where: (a) r = a* ∨ (b ∨ c)*; (b) r = a*(b ∨ c)*.
9.5 Finite State Automata
A finite state automaton (FA) M consists of five parts:
1. A finite set (alphabet) A of inputs.
2. A finite set S of (internal) states.
3. A subset Y of S, whose elements are called accepting states.
4. An initial state s0 in S.
5. A next-state function F : S × A → S.
Example 9.5.2. The following defines an automaton M with two input symbols and three states:
4. s0 , initial state,
1. The vertices of D(M ) are the states in S and an accepting state is denoted by means of a double circle.
2. There is an arrow (directed edge) in D(M ) from state sj to state sk labelled by an input a if F (sj , a) =
sk .
3. The initial state s0 is indicated by means of a special arrow which terminates at s0 but has no initial
vertex.
For each vertex sj and each letter a in the alphabet A, there will be an arrow leaving sj , which is labelled by
a; hence the outdegree of each vertex is equal to the number of elements in A. For notational convenience, we
label a single arrow by all the inputs which cause the same change of state rather than having an arrow for
each such input.
The state diagram D = D(M ) of the automaton M in the preceding Example is shown in fig. 9.5.1.
Figure 9.5.1
Each input word w = a_1 a_2 · · · a_m determines a path in the state diagram beginning at the initial state:
P = (s_0, a_1, s_1, a_2, s_2, . . . , a_m, s_m),
where F(s_{i−1}, a_i) = s_i for each i.
We say that M recognizes the word w if the final state sm is an accepting state in Y . The language L(M ) of
M is the collection of all words from A which are accepted by M .
Example 9.5.3. We determine whether or not the automaton M in fig. 9.5.1 accepts the words
w1 = ababba, w2 = baab, w3 = λ.
Using fig. 9.5.1 and the words w_1 and w_2, we obtain the respective paths:
P_1 = s_0 →(a) s_0 →(b) s_1 →(a) s_0 →(b) s_1 →(b) s_2 →(a) s_2,  and  P_2 = s_0 →(b) s_1 →(a) s_0 →(a) s_0 →(b) s_1.
The final state in P1 is s2 which is not in Y . Hence w1 is not accepted by M . Also, the final state of P2 is s1
which is in Y so w2 is accepted by M . The final state determined by w3 is the initial state s0 which is in Y .
Thus w3 is also accepted by M .
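The computation of such paths is mechanical. In the Python sketch below (ours), the transition table is read off the paths P_1 and P_2 above, with accepting set Y = {s_0, s_1}; since fig. 9.5.1 itself is not reproduced here, the table should be treated as a reconstruction:

# Next-state function of the automaton of fig. 9.5.1 (reconstructed from P1 and P2).
F = {('s0', 'a'): 's0', ('s0', 'b'): 's1',
     ('s1', 'a'): 's0', ('s1', 'b'): 's2',
     ('s2', 'a'): 's2', ('s2', 'b'): 's2'}
Y = {'s0', 's1'}                       # accepting states

def accepts(word, state='s0'):
    for letter in word:
        state = F[(state, letter)]
    return state in Y

print(accepts('ababba'), accepts('baab'), accepts(''))   # False True True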
We also describe the language L(M ). L(M ) will consist of all words w on A which do not have two
successive b’s. This comes from the following facts:
1. We can enter the state s2 if and only if there are two successive b’s.
Example 9.5.4. Consider the automaton M in fig. 9.5.2. We want to find the words w in language L that are
accepted by M . The system can reach the accepting state s2 only when there exists an a in w which follows
a b.
The fundamental relationship between regular languages and automata is contained in the following theo-
rem.
Theorem 9.5.5. (Kleene): A language L over an alphabet A is regular if and only if there is a finite state
automaton M such that L = L(M ).
Figure 9.5.2
Figure 9.5.3
Example 9.5.6. Let A = {a, b}. We construct an automaton M which will accept precisely those words from
A which end in two b’s. Since b2 is accepted, but not λ or b, we need three states, s0 , the initial state, and s1
and s2 with an arrow labelled b going from s0 to s1 and one from s1 to s2 . Also, s2 is an accepting state, but
not s0 nor s1 . This gives the graph in fig. 9.5.3(a). On the other hand, if there is an a, then we want to go
back to s0 , and if we are in s2 and there is a b, then we want to stay in s2 . These additional conditions give
the required automaton M which is shown in fig. 9.5.3(b).
Example 9.5.7. Let A = {a, b}. We construct an automaton M which will accept those words from A which
begin with an a followed by (zero or more) b’s in fig. 9.5.4.
Figure 9.5.4
Pumping Lemma
Let M be an automaton over A with k states. Suppose w = a_1 a_2 · · · a_n is a word over A accepted by M, and suppose |w| = n > k, the number of states. Let P = (s_0, s_1, . . . , s_n) be the corresponding sequence of states determined by the word w. Since n > k, two of the states in P must be equal, say s_i = s_j where i < j. Let
x = a_1 a_2 · · · a_i,   y = a_{i+1} · · · a_j,   z = a_{j+1} · · · a_n.
As shown in fig. 9.5.5, xy ends in s_i = s_j; hence xy^m also ends in s_i. Thus, for every m, w_m = xy^m z ends in s_n, which is an accepting state.
Figure 9.5.5
1. M has k states.
Example 9.5.9. We want to show that the language L = {a^m b^m : m > 0} is not regular.
Suppose L is regular. Then by theorem 9.5.5, there exists a finite state automaton M which accepts L. Suppose M has k states. Let w = a^k b^k. Then |w| > k. By theorem 9.5.8, w = xyz where y is not empty and w_2 = xy^2 z is also accepted by M. If y consists of only a's or only b's, then w_2 will not have the same number of a's as b's. If y contains both a's and b's, then w_2 will have a's following b's. In either case, w_2 does not belong to L, which is a contradiction. Hence L is not regular.
9.6 Grammars
Fig. 9.6.1 shows the grammatical construction of a specific sentence. Observe that there are:
Figure 9.6.1
The final sentence only contains terminals, although both variables and terminals appear in its construction by
the productions. This intuitive description is given in order to motivate the following definition of a grammar
and the language it generates.
Definition 9.6.1. A phrase structure grammar or, simply, a grammar G consists of four parts:
1. A finite set (vocabulary) V ;
2. A subset T of V whose elements are called terminals; the elements of N = V \ T are called non-terminals or variables;
3. A non-terminal symbol S called the start symbol;
4. A finite set P of productions. (A production is an ordered pair (α, β), usually written α → β, where α and β are words in V, and the production must contain at least one non-terminal on its left side α.)
Such a grammar G is denoted by G = G(V, T, S, P ).
The following notation, unless otherwise stated or implied, will be used for our grammars. Terminals will
be denoted by italic lower case Latin letters a, b, c . . ., and non-terminals will be denoted by italic capital Latin
letters A, B, C, . . ., with S as the start symbol. Also, Greek letters, α, β, . . ., will denote words in V , that is,
words in terminals and non-terminals. Furthermore, we will write α → (β_1, β_2, . . . , β_k) instead of listing the productions α → β_1, α → β_2, . . . , α → β_k separately.
Given words w and w' in V, we write
w ⇒ w'
if w' can be obtained from w by using one of the productions; that is, if there exist words u and v such that w = uαv and w' = uβv and there is a production α → β. Furthermore, we write
w ⇒⇒ w'
if w' can be obtained from w by a finite sequence of production applications. The language generated by G consists of all words in the terminals which can be so obtained from the start symbol S:
L(G) = {w ∈ T* : S ⇒⇒ w}.
Example 9.6.2. The following defines a grammar G with S as the start symbol:
V = {A, B, S, a, b},   T = {a, b},   P = {S → AB, A → Aa, B → Bb, A → a, B → b},
the five productions being numbered 1 to 5 in the order listed. Applying production 1, then production 2 r times, production 3 s times, and finally productions 4 and 5, will produce the word w = a^r a b^s b = a^{r+1} b^{s+1}, where r and s are non-negative integers. On the other hand, no sequence of productions can produce an a after a b. Accordingly,
L(G) = {a^m b^n : m > 0, n > 0}.
That is, the language L(G) of the grammar G consists of all words which begin with one or more a’s followed
by one or more b’s.
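One can also generate sample words of L(G) by applying the productions mechanically; the following Python sketch (ours) derives random words from the grammar of Example 9.6.2:

import random

# Productions of Example 9.6.2: S -> AB, A -> Aa | a, B -> Bb | b.
P = {'S': ['AB'], 'A': ['Aa', 'a'], 'B': ['Bb', 'b']}

def derive(word='S'):
    # Replace the leftmost non-terminal until only terminals remain.
    while any(c in P for c in word):
        i = next(i for i, c in enumerate(word) if c in P)
        word = word[:i] + random.choice(P[word[i]]) + word[i + 1:]
    return word

print(sorted({derive() for _ in range(200)})[:8])   # words of the form a^m b^n, m, n >= 1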
9.7 Few Probable Questions
2. Define finite state automaton. Let M be the automaton with the following input set A, state set S with
initial state s0 and accepting set Y :
Draw the State diagram D(M ) of M . Also, describe the language L = L(M ) accepted by M .
3. Let A = {a, b}. Construct an automaton M which will accept precisely those words from A which
have an even number of a’s.
4. Find the language L(G) generated by the grammar G with variables S, A, B, terminals a, b, and pro-
ductions S → aB, B → b, B → bA, A → aB.
Unit 10
Course Structure
• Finite State Machine. Non-deterministic and deterministic FA.
• Computable Functions.
10.1 Introduction
This unit discusses two types of "machines." The first is a finite state machine (FSM) which is similar to a
finite state automaton (FSA) except that the finite state machine "prints" an output using an output alphabet
which may be distinct from the input alphabet. The second is the celebrated Turing machine which may be
used to define computable functions.
Objectives
After reading this unit, you will be able to
• define finite state machines and draw their state tables and diagrams
10.2 Finite State Machines
F a b
s0 s1 , x s2 , y
s1 s2 , x s1 , z
s2 s0 , z s1 , y
Table 10.1
Table 10.1 is the state table of a finite state machine M with three states. The states are listed on the left of the table with the initial state first, and the input symbols are listed on the top of the table.
The entry in the table is a pair (sk , zr ) where sk = f (si , aj ) is the next state and zr = g(si , aj ) is the output
symbol. The corresponding state diagram is given in fig. 10.2.1.
The state diagram D = D(M ) of a finite state machine M is a labelled digraph the vertices of which are
the states of M . Moreover, if
F (si , aj ) = (sk , zr ) or equivalently, sk = f (si , aj ), zr = g(si , aj )
then there is an arc (arrow) from si to sk which is labelled with the pair aj , zr . We usually put the input
symbol ai near the base of the arrow (near si ) and the output symbol zr near the center of the arrow. We also
label the initial state s0 by drawing an extra arrow into s0 . See fig. 10.2.1.
Figure 10.2.1
Suppose M is given an input word
u = a_1 a_2 · · · a_m.
We visualize these symbols on an "input tape." The machine M "reads" these input symbols one by one and, simultaneously, changes through a sequence of states
v = s_0 s_1 s_2 · · · s_m,
while printing the output word
w = z_1 z_2 · · · z_m
on an "output tape." Formally, the initial state s_0 and the input string u determine the strings v and w as follows, where i = 1, 2, . . . , m:
s_i = f(s_{i−1}, a_i)   and   z_i = g(s_{i−1}, a_i).
Example 10.2.3. Consider the machine M of fig. 10.2.1. Suppose the input is the word u = abaab. We
calculate the sequence v of states and the output word w from the state diagram as follows. Beginning at the
initial state s0 , we follow the arrows which are labelled by the input symbols as follows:
s_0 →(a,x) s_1 →(b,z) s_1 →(a,x) s_2 →(a,z) s_0 →(b,y) s_2.
Hence,
v = s_0 s_1 s_1 s_2 s_0 s_2 and w = xzxzy.
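The same trace can be produced by a few lines of Python (ours) driven directly by table 10.1:

# State/output table of M (table 10.1): (state, input) -> (next state, output).
F = {('s0', 'a'): ('s1', 'x'), ('s0', 'b'): ('s2', 'y'),
     ('s1', 'a'): ('s2', 'x'), ('s1', 'b'): ('s1', 'z'),
     ('s2', 'a'): ('s0', 'z'), ('s2', 'b'): ('s1', 'y')}

def run(u, state='s0'):
    states, out = [state], []
    for a in u:
        state, z = F[(state, a)]
        states.append(state)
        out.append(z)
    return states, ''.join(out)

print(run('abaab'))   # (['s0', 's1', 's1', 's2', 's0', 's2'], 'xzxzy')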
Binary Addition
This subsection describes a finite state machine M which can do binary addition. By adding 0’s at the begin-
ning of our numbers, we can assume that our numbers have the same number of digits. If the machine is given
the input 1101011 + 0111011 then we want the output to be the binary sum 10100110. Specifically, the input
is the string of pairs of digits to be added:
where b denotes blank spaces, and the output should be the string:
0, 1, 1, 0, 0, 1, 0, 1
We also want the machine to enter a state called "stop" when the machine finishes the addition.
The input symbols and output symbols are, respectively, as follows:
Here n is the initial (no-carry) state. The machine is shown in fig. 10.2.2. In order to show the limitations of our machines, we state the following theorem.
Figure 10.2.2
Theorem 10.2.4. There is no finite state machine M which can do binary multiplication.
If we limit the size of the numbers that we multiply, then such machines do exist. Computers are important
examples of finite state machines which multiply numbers, but the numbers are limited as to their size.
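Since fig. 10.2.2 is not reproduced here, the following Python sketch is a reconstruction of a serial adder consistent with the description: the carry plays the role of the two states n (no carry) and c (carry), and digit pairs are fed in from the least significant end:

def add_binary(x, y):
    # Assumes x and y have been padded with leading 0's to the same length.
    carry, out = 0, []
    for a, b in zip(reversed(x), reversed(y)):   # least significant pair first
        s = int(a) + int(b) + carry
        out.append(str(s % 2))                   # output digit
        carry = s // 2                           # state: c if s >= 2, else n
    if carry:
        out.append('1')
    return ''.join(reversed(out))

print(add_binary('1101011', '0111011'))          # 10100110, as in the text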
10.3 Turing Machines
Our Turing machine M involves three disjoint non-empty sets. The first is a tape set
A = {a_1, a_2, . . . , a_m} ∪ {B},
where B denotes a blank square; the second is a state set
S = {s1 , s2 , . . . , sn } ∪ {sH , sY , sN }
where sH (HALT) is the halting state, sY (YES) is the accepting state, and sN (NO) is the non-accepting
state.
and the third is a direction set d = {L, R}, where L denotes "left" and R denotes "right".
Definition 10.3.1. An expression is a finite (possibly empty) sequence of elements from A ∪ S ∪ d. In other
words, an expression is a word whose letters (symbols) come from the sets A, S, and d.
Definition 10.3.2. A tape expression is an expression using only elements from the tape set A.
The Turing machine M may be viewed as a read/write tape head which moves back and forth along an
infinite tape. The tape is divided lengthwise into squares (cells), and each square may be blank or hold one
tape symbol. At each step in time, the Turing machine M is in a certain internal state si scanning one of the
tape symbols aj on the tape. We assume that only a finite number of non-blank symbols appear on the tape.
Fig. 10.3.1(a) is a picture of a Turing machine M in state s2 scanning the second symbol where a1 a3 Ba1 a1
is printed on the tape. (Note again that B is the blank symbol.) This picture may be represented by the
expression α = a1 s2 a3 Ba1 a1 where we write the state s2 of M before the tape symbol a3 that M is scanning.
Observe that α is an expression using only the tape alphabet A except for the state symbol s2 which is not at
the end of the expression since it appears before the tape symbol a3 that M is scanning. Fig. 10.3.1 shows
two other informal pictures and their corresponding picture expressions.
Figure 10.3.1
Definition 10.3.3. A picture α is an expression as follows where P and Q are tape expressions (possibly
empty):
α = P si ak Q
Definition 10.3.4. Let α = P si ak Q be a picture. We say that the Turing machine M is in state si scanning
the letter ak and that the expression on the tape is the expression P ak Q, that is, without its state symbol si .
As mentioned above, at each step in time the Turing machine M is in a certain state si and is scanning a
tape symbol ak . The Turing machine M is able to do the following three things simultaneously:
1. M erases the scanned symbol ak and writes in its place a tape symbol al (where we permit al = ak );
3. M moves one square to the left or moves one square to the right.
The above action by M may be described by a five-letter expression called a quintuple which we define below.
The first condition guarantees that the machine M cannot do more than one thing at any given step, and the second condition guarantees that M halts in state s_H, s_Y, or s_N.
The following is an alternative equivalent definition.
A Turing machine M is a partial function from (S \ {s_H, s_Y, s_N}) × A into A × S × d.
The term partial function simply means that the domain of M is a subset of (S \ {s_H, s_Y, s_N}) × A.
The action of the Turing machine described above can now be formally defined.
Definition 10.3.8. Let α and β be pictures. We write α → β if one of the following holds where a, b, c are
tape letters and P and Q are tape expressions (possibly empty):
Observe that, in all four cases, M replaces a on the tape by b (where we permit b = a), and M changes its
state from si to sj (where we permit sj = si ). Furthermore:
3. Here M moves to the right; however, since M is scanning the rightmost letter, it must add the blank
symbol B on the right.
4. Here M moves to the left; however, since M is scanning the leftmost letter, it must add the blank symbol
B on the left.
Observe that the initial picture α(W ) of the input W is obtained by placing the initial state s0 in front of the
input tape expression W . In other words, the Turing machine M begins in its initial state s0 and it is scanning
the first letter of W .
Definition 10.3.12. Let M be a Turing machine and let W be an input. We say M halts on W if there is a
computation beginning with the initial picture α(W ).
That is, given an input W , we can form the initial picture α(W ) = s0 (W ) and apply M to obtain the
sequence
α(W ) → α1 → α2 → . . .
Two things can happen:
1. M halts on W; that is, the sequence ends with some terminal picture α_r.
2. M never halts on W; that is, the sequence of pictures can be extended indefinitely.
Theorem 10.3.13. A language L is recognizable by a Turing machine M if and only if L is a type 0 language.
10.4 Computable Functions
Definition 10.4.1. Each number n will be represented by the tape expression ⟨n⟩, where ⟨n⟩ = 1^{n+1}.
Definition 10.4.2. Let E be an expression. Then [E] will denote the number of times 1 occurs in E.
Definition 10.4.3. A function f : N0 → N0 is computable if there exists a Turing machine M such that, for
every integer n, M halts on hni and
f(n) = [term(α(⟨n⟩))].
We then say that M computes f .
That is, given a function f and an integer n, we input ⟨n⟩ and apply M. If M always halts on ⟨n⟩ and the number of 1's in the final picture is equal to f(n), then f is a computable function and we say that M
computes f .
Example 10.4.4. The function f(n) = n + 3 is computable. The input is W = 1^{n+1}. Thus we need only add two 1's to the input. A Turing machine M which computes f follows:
Observe that:
1. q1 moves the machine M to the left.
Thus M computes f (n) = n + 3. It is clear that, for any positive integer k, the function f (n) = n + k is
computable.
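Since the quintuples of M are not reproduced above, the following Python sketch (ours) uses a reconstructed machine of the same kind: it walks right over the input ⟨n⟩ = 1^{n+1} and appends two further 1's before halting:

# (state, scanned symbol) -> (symbol written, next state, move); a reconstruction.
delta = {('s0', '1'): ('1', 's0', 'R'),   # move right across the block of 1's
         ('s0', 'B'): ('1', 's1', 'R'),   # write the first extra 1
         ('s1', 'B'): ('1', 'sH', 'R')}   # write the second extra 1 and halt

def run(n):
    tape, pos, state = list('1' * (n + 1)), 0, 's0'
    while state != 'sH':
        if pos == len(tape):
            tape.append('B')              # extend the tape with a blank square
        symbol, state, move = delta[(state, tape[pos])]
        tape[pos] = symbol
        pos += 1 if move == 'R' else -1
    return ''.join(tape).count('1')       # the number of 1's in the terminal picture

print(run(4))   # 7 = 4 + 3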
Theorem 10.4.5. Suppose f : N0 → N0 and g : N0 → N0 are computable. Then the composition function
h = g ◦ f is computable.
f(m) = [term(α(⟨m⟩))]
Observe that:
Accordingly, if m ≠ 0, we have,
Table 10.2
(a) Find the input set A, the state set S, the output set Z, and the initial state.
(b) Draw the state diagram D = D(M ) of M
(c) Suppose w = aababaabbab is an input word (string). Find the corresponding output word v.
2. Define Turing machine. Suppose α = aas2 ba is a picture. Find β such that α → β if the Turing
machine M has the quintuple q where: (a) q = s2 bas1 L; (b) q = s2 bbs3 R.
Unit 11
Course Structure
• Fields and σ-fields of events. Probability as a measure. Random variables. Probability distribution.
11.1 Introduction
The theory of probability had its origin in gambling and games of chance. It owes much to the curiosity
of gamblers who pestered their friends in the mathematical world with all sorts of questions. A random (or
statistical) experiment is an experiment in which
• Any performance of the experiment results in an outcome that is not known in advance.
In probability theory we study this uncertainty of a random experiment. It is convenient to associate with
each such experiment a set Ω, the set of all possible outcomes of the experiment. To engage in any meaningful
discussion about the experiment, we associate with Ω a σ-field S of subsets of Ω. We recall that a σ-field is a
non-empty class of subsets of Ω that is closed under the formation of countable unions and complements and
contains the null set φ.
A sample space is a pair (Ω, S), where
• Ω is the set of all possible outcomes of the experiment, and
• S is a σ-field of subsets of Ω.
The elements of Ω are called sample points. Any set A ∈ S is known as an event. Clearly, A is a collection
of sample points. We say that an event A happens if the outcome of the experiment corresponds to a point in
A. Each one point set is known as a simple or elementary event. If the set Ω contains only a finite number of
points, we say that (Ω, S) is a finite sample space. If Ω contains at most a countable number of points, we call
(Ω, S) a discrete sample space. If, however, Ω contains uncountably many points, we say that (Ω, S) is an
uncountable sample space. In particular, if Ω = Rk or some rectangle in Rk , we call it a continuous sample
space.
Let us toss a coin. The set Ω is the set of symbols H and T , where H denotes head and T represents tail.
Also, S is the class of all subsets of Ω, namely {{H}, {T}, {H, T}, φ}. If the coin is tossed two times, then
Ω = {(H, H), (H, T), (T, H), (T, T)}
and
S = {φ, {(H, H)}, {(H, T)}, {(T, H)}, {(T, T)},
{(H, H), (H, T)}, {(H, H), (T, H)}, {(H, H), (T, T)}, {(H, T), (T, H)},
{(T, T), (T, H)}, {(T, T), (H, T)}, {(H, H), (H, T), (T, H)},
{(H, H), (H, T), (T, T)}, {(H, H), (T, H), (T, T)}, {(H, T), (T, H), (T, T)}, Ω}   (11.1.1)
where the first element of a pair denotes the outcome of the first toss, and the second element, the outcome of
the second toss. The event at least one head consists of sample points (H, H), (H, T ), (T, H). The event at
most one head is the collection of sample points (H, T ), (T, H), (T, T ).
11.2 Random Variables
A random variable X on a sample space is a (measurable) function which assigns a real number to each sample point.
Example 11.2.1. Suppose that a coin is tossed twice so that the sample space is S = {HH, HT, TH, TT}. Let
X represent the number of heads that can come up. With each sample point we can associate a number for X
as shown in Table 11.1. Thus, for example, in the case of HH (i.e., 2 heads), X = 2 while for TH (1 head), X
= 1. It follows that X is a random variable.
Table 11.1
A random variable that takes on a finite or countably infinite number of values is called a discrete random
variable while one which takes on a non-countably infinite number of values is called a non-discrete random
variable.
11.3 Discrete Probability Distributions
Let X be a discrete random variable, and suppose that the possible values which it can assume are x_1, x_2, . . ., arranged in some order. Suppose also that these values are assumed with probabilities given by
P(X = x_k) = f(x_k),   k = 1, 2, . . .   (11.3.1)
It is convenient to introduce the probability function, also referred to as probability distribution, given by
P (X = x) = f (x) (11.3.2)
For x = xk , this reduces to (11.3.1) while for other values of x, f (x) = 0. In general, f (x) is a probability
function if
1. f(x) ≥ 0;
2. Σ_x f(x) = 1, where the sum is taken over all possible values of x.
11.4 Distribution Functions for Random Variables
The distribution function F(x) = P(X ≤ x) of a random variable X has, among others, the following properties:
1. F(x) is non-decreasing [i.e., F(x) ≤ F(y) if x ≤ y];
2. F(x) is continuous from the right [i.e., lim_{h→0+} F(x + h) = F(x) for all x].
It follows from the above that if X is a continuous random variable, then the probability that X takes on any
one particular value is zero, whereas the interval probability that X lies between two different values, say, a
and b, is given by
P(a < X < b) = ∫_a^b f(x) dx   (11.5.2)
Example. Find the constant c such that the function f(x) = cx^2 for 0 < x < 3 (and f(x) = 0 otherwise) is a density function.
Solution: Since f(x) satisfies Property 1 if c ≥ 0, it must satisfy Property 2 in order to be a density function. Now
∫_{−∞}^{∞} f(x) dx = ∫_0^3 cx^2 dx = [cx^3/3]_0^3 = 9c,
and since this must equal 1, it follows that c = 1/9.
Continuous case:
The case where both variables are continuous is obtained easily by analogy with the discrete case on replacing
sums by integrals. Thus the joint probability function for the random variables X and Y (or, as it is more
commonly called, the joint density function of X and Y ) is defined by
1. f(x, y) ≥ 0;
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.
Eq. (11.6.4) and (11.6.5) are called the marginal distribution functions, or simply the distribution functions,
of X and Y , respectively. The derivatives of (11.6.4) and (11.6.5) with respect to x and y are then called the
marginal density functions, or simply the density functions, of X and Y and are given by
f_1(x) = ∫_{−∞}^{∞} f(x, v) dv,   f_2(y) = ∫_{−∞}^{∞} f(u, y) du   (11.6.6)
Theorem 11.7.2. Let X and Y be discrete random variables having joint probability function f (x, y). Sup-
pose that two discrete random variables U and V are defined in terms of X and Y by U = φ1 (X, Y ), V =
φ2 (X, Y ), where to each pair of values of X and Y there corresponds one and only one pair of values of U
and V and conversely, so that X = ψ1 (U, V ), Y = ψ2 (U, V ). Then the joint probability function of U and V
is given by
g(u, v) = f [ψ1 (u, v), ψ2 (u, v)] (11.7.2)
Continuous variables
Theorem 11.7.3. Let X be a continuous random variable with probability density f(x). Let us define U = φ(X), where X = ψ(U) as in Theorem 11.7.1. Then the probability density of U is g(u), given by

g(u) = f(ψ(u)) |ψ′(u)| = f(x) |dx/du|
Theorem 11.7.4. Let X and Y be continuous random variables having joint density function f(x, y). Let us define U = φ1(X, Y), V = φ2(X, Y), where X = ψ1(U, V), Y = ψ2(U, V) as in Theorem 11.7.2. Then the joint density function of U and V is g(u, v), given by

g(u, v) = f(ψ1(u, v), ψ2(u, v)) |J|,   where J = ∂(x, y)/∂(u, v) is the Jacobian.
Example 11.7.5. The random variable X has probability function

f(x) = 2^{−x} for x = 1, 2, 3, . . ., and f(x) = 0 otherwise.

Find the probability function of U = X⁴ + 1.
Solution: Since U = X⁴ + 1, the relationship between the values u and x of the random variables U and X is given by u = x⁴ + 1 or x = (u − 1)^{1/4}, where u = 2, 17, 82, . . . and the real positive root is taken. Then the required probability function for U is given by

g(u) = 2^{−(u−1)^{1/4}} for u = 2, 17, 82, . . ., and g(u) = 0 otherwise.
Example 11.7.6. If the random variables X and Y have joint density function

f(x, y) = xy/96 for 0 < x < 4, 1 < y < 5, and f(x, y) = 0 otherwise,

find the joint density function of U = XY² and V = X²Y.
Solution: Consider u = xy 2 , v = x2 y. Dividing these equations, we obtain y/x = u/v so that y = ux/v.
This leads to the simultaneous solution x = v 2/3 u−1/3 , y = u2/3 v −1/3 . The image of 0 < x < 4, 1 < y < 5
in the uv-plane is given by
0 < v^{2/3} u^{−1/3} < 4,   1 < u^{2/3} v^{−1/3} < 5

which are equivalent to

v² < 64u,   v < u² < 125v
The Jacobian is given by

J = ∂(x, y)/∂(u, v) = −1/(3 u^{2/3} v^{2/3}),

so that the required joint density function is g(u, v) = f(x, y)|J| = (u^{1/3} v^{1/3}/96) · 1/(3 u^{2/3} v^{2/3}) = 1/(288 u^{1/3} v^{1/3}) in the region v² < 64u, v < u² < 125v, and g(u, v) = 0 otherwise.
11.8 Convolutions
As a particular consequence of the above theorems, we can show that the density function of the sum of two continuous random variables X and Y, i.e., of U = X + Y, having joint density function f(x, y), is given by

g(u) = ∫_{−∞}^{∞} f(x, u − x) dx   (11.8.1)
In the special case where X and Y are independent, f(x, y) = f1(x) f2(y), and (11.8.1) reduces to

g(u) = ∫_{−∞}^{∞} f1(x) f2(u − x) dx   (11.8.2)
which is called the convolution of f1 and f2, abbreviated f1 ∗ f2. The following are some important properties of the convolution:

1. f1 ∗ f2 = f2 ∗ f1
2. f1 ∗ (f2 ∗ f3) = (f1 ∗ f2) ∗ f3
3. f1 ∗ (f2 + f3) = f1 ∗ f2 + f1 ∗ f3

These results show that f1, f2, f3 obey the commutative, associative and distributive laws of algebra with respect to the operation of convolution.
Theorem 11.8.1. Let X and Y be random variables having joint density function f(x, y). Prove that the density function of U = X + Y is

g(u) = ∫_{−∞}^{∞} f(v, u − v) dv
Proof. Let U = X + Y, V = X, where we have arbitrarily added the second equation. Corresponding to these we have u = x + y, v = x, or x = v, y = u − v. The Jacobian of the transformation is given by

J = | ∂x/∂u  ∂x/∂v |   | 0   1 |
    | ∂y/∂u  ∂y/∂v | = | 1  −1 | = −1

Thus, by Theorem 11.7.4, the joint density function of U and V is

g(u, v) = f(v, u − v),

from which the (marginal) density function of U follows on integrating out v: g(u) = ∫_{−∞}^{∞} f(v, u − v) dv.
Example 11.8.2. If X and Y are independent random variables having density functions

f1(x) = 2e^{−2x} for x ≥ 0 (0 for x < 0),   f2(y) = 3e^{−3y} for y ≥ 0 (0 for y < 0),

find the density function of their sum, U = X + Y.
Solution: The required density function is the convolution of f1 and f2, given by

g(u) = (f1 ∗ f2)(u) = ∫_{−∞}^{∞} f1(v) f2(u − v) dv
In the integrand, f1 vanishes when v < 0 and f2 vanishes when v > u. Hence

g(u) = ∫_0^u (2e^{−2v})(3e^{−3(u−v)}) dv
     = 6e^{−3u} ∫_0^u e^v dv = 6e^{−3u}(e^u − 1) = 6(e^{−2u} − e^{−3u})

for u ≥ 0 (and g(u) = 0 for u < 0).
Unit 12
Course Structure
• Expectation. Moments. Moment inequalities, Characteristic function. Convergence of sequence of
random variables-weak convergence, strong convergence and convergence in distribution, continuity
theorem for characteristic functions. Weak and strong law of large numbers. Central Limit Theorem.
12.1 Expectation
For a continuous random variable X having density function f(x), the expectation of X is defined as

E(X) = ∫_{−∞}^{∞} x f(x) dx   (12.1.2)
Another quantity of great importance in probability and statistics is called the variance and is defined by

Var(X) = E[(X − µ)²],   (12.1.3)

where µ = E(X). The variance is a non-negative number. The positive square root of the variance is called the standard deviation and is given by

σ_X = √Var(X) = √E[(X − µ)²]   (12.1.4)
If X is a continuous random variable having density function f (x), then the variance is given by
σ²_X = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² f(x) dx   (12.1.5)
12.2 Moments
The r-th moment of a random variable X about the mean µ, also called the r-th central moment, is defined as

µ_r = E[(X − µ)^r],   (12.2.1)

where r = 0, 1, 2, . . .. It follows that µ0 = 1, µ1 = 0 and µ2 = σ², i.e., the second central moment, or second moment about the mean, is the variance. We have, assuming absolute convergence,
µ_r = Σ_x (x − µ)^r f(x)   (discrete variable)   (12.2.2)

µ_r = ∫_{−∞}^{∞} (x − µ)^r f(x) dx   (continuous variable)   (12.2.3)
The r-th moment of X about the origin, also called the r-th raw moment, is defined as

µ′_r = E(X^r),   (12.2.4)

where r = 0, 1, 2, . . ., and in this case there are formulas analogous to (12.2.2) and (12.2.3) in which µ = 0.
The relation between these moments is given by

µ_r = µ′_r − C(r, 1) µ′_{r−1} µ + · · · + (−1)^j C(r, j) µ′_{r−j} µ^j + · · · + (−1)^r µ^r,   (12.2.5)

where C(r, j) denotes the binomial coefficient.
Proof.

µ_r = E[(X − µ)^r]
    = E[ X^r − C(r, 1) X^{r−1} µ + · · · + (−1)^j C(r, j) X^{r−j} µ^j + · · · + (−1)^{r−1} C(r, r−1) X µ^{r−1} + (−1)^r µ^r ]
    = E(X^r) − C(r, 1) E(X^{r−1}) µ + · · · + (−1)^j C(r, j) E(X^{r−j}) µ^j + · · · + (−1)^{r−1} C(r, r−1) E(X) µ^{r−1} + (−1)^r µ^r
    = µ′_r − C(r, 1) µ′_{r−1} µ + · · · + (−1)^j C(r, j) µ′_{r−j} µ^j + · · · + (−1)^{r−1} r µ^r + (−1)^r µ^r,

where the last two terms can be combined to give (−1)^{r−1}(r − 1)µ^r.
M_X(t) = E(e^{tX}) = E[1 + tX + t²X²/2! + t³X³/3! + · · ·]
       = 1 + t E(X) + (t²/2!) E(X²) + (t³/3!) E(X³) + · · ·
       = 1 + µt + µ′₂ t²/2! + µ′₃ t³/3! + · · ·   (12.3.2)
Since the coefficients in this expansion enable us to find the moments, the reason for the name moment
generating function is apparent. From the expansion we can show that
µ′_r = [d^r/dt^r M_X(t)]_{t=0}   (12.3.3)
Example. A random variable X has density function f(x) = 2e^{−2x} for x ≥ 0 (and f(x) = 0 otherwise). Find the moment generating function and the first four moments about the origin.
Solution: We have

M(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx
     = ∫_0^∞ e^{tx} (2e^{−2x}) dx = 2 ∫_0^∞ e^{(t−2)x} dx
     = [2e^{(t−2)x}/(t − 2)]_0^∞ = 2/(2 − t),   assuming t < 2.
12.4 Characteristic Function
If, in the moment generating function, we replace t by iω (where i is the imaginary unit and ω is real), we obtain the characteristic function

φ_X(ω) = M_X(iω) = E(e^{iωX})   (12.4.1)

It follows that
φ_X(ω) = Σ_x e^{iωx} f(x)   (discrete variable)   (12.4.2)

φ_X(ω) = ∫_{−∞}^{∞} e^{iωx} f(x) dx   (continuous variable)   (12.4.3)
Since |e^{iωx}| = 1, the series and the integral always converge absolutely.
The corresponding results (12.3.2) and (12.3.3) become
φ_X(ω) = 1 + iµω − µ′₂ ω²/2! + · · · + i^r µ′_r ω^r/r! + · · ·   (12.4.4)
where

µ′_r = (−1)^r i^r [d^r/dω^r φ_X(ω)]_{ω=0}   (12.4.5)
Theorem 12.4.1. If φ_X(ω) is the characteristic function of the random variable X and a and b (b ≠ 0) are constants, then the characteristic function of (X + a)/b is

φ_{(X+a)/b}(ω) = e^{aiω/b} φ_X(ω/b)   (12.4.6)
Theorem 12.4.2. If X and Y are independent random variables having characteristic functions φX (ω) and
φY (ω), respectively, then
φX+Y (ω) = φX (ω)φY (ω) (12.4.7)
Example 12.4.3. Find the characteristic function of the random variable X having density function given by
f(x) = 1/(2a) for |x| < a, and f(x) = 0 otherwise.
Solution:

E(e^{iωX}) = ∫_{−∞}^{∞} e^{iωx} f(x) dx = (1/2a) ∫_{−a}^{a} e^{iωx} dx
           = (1/2a) [e^{iωx}/(iω)]_{−a}^{a} = (e^{iaω} − e^{−iaω})/(2iaω) = sin(aω)/(aω)
Example 12.4.4. Find the characteristic function of the random variable X having density function f (x) =
ce−a|x| , −∞ < x < ∞, where a > 0, and c is a suitable constant.
Solution: The constant c is determined from

∫_{−∞}^{∞} f(x) dx = 1,  i.e.,  2c ∫_0^∞ e^{−ax} dx = 2c/a = 1,

so that c = a/2. Then
E(e^{iωX}) = ∫_{−∞}^{∞} e^{iωx} f(x) dx
           = (a/2) [ ∫_{−∞}^0 e^{iωx} e^{ax} dx + ∫_0^∞ e^{iωx} e^{−ax} dx ]
           = (a/2) [ ∫_{−∞}^0 e^{(a+iω)x} dx + ∫_0^∞ e^{−(a−iω)x} dx ]
           = (a/2) [ e^{(a+iω)x}/(a + iω) |_{−∞}^0 + e^{−(a−iω)x}/(−(a − iω)) |_0^∞ ]
           = a/(2(a + iω)) + a/(2(a − iω))
           = a²/(a² + ω²)
Theorem 12.5.1 (Chebyshev’s inequality). Suppose that X is a random variable with mean µ and variance σ². Then, for any ε > 0,

P(|X − µ| ≥ ε) ≤ σ²/ε².

Proof (continuous case). We have

σ² = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² f(x) dx

Since the integrand is non-negative, the value of the integral can only decrease when the range of integration is diminished. Therefore,

σ² ≥ ∫_{|x−µ|≥ε} (x − µ)² f(x) dx ≥ ∫_{|x−µ|≥ε} ε² f(x) dx = ε² ∫_{|x−µ|≥ε} f(x) dx

Since the last integral equals P(|X − µ| ≥ ε), it follows that

P(|X − µ| ≥ ε) ≤ σ²/ε²
Theorem 12.6.1 (Weak law of large numbers). Let X1, X2, . . . , Xn be mutually independent random variables, each with finite mean µ and variance σ², and let Sn = X1 + · · · + Xn. Then, for every ε > 0, P(|Sn/n − µ| ≥ ε) → 0 as n → ∞.

Proof. We have

E(Sn/n) = E[(X1 + · · · + Xn)/n] = (1/n)[E(X1) + · · · + E(Xn)] = (1/n)(nµ) = µ

and, by independence,

Var(Sn) = Var(X1 + · · · + Xn) = Var(X1) + · · · + Var(Xn) = nσ²,

so that

Var(Sn/n) = (1/n²) Var(Sn) = σ²/n.

Therefore, by Chebyshev’s inequality with X = Sn/n, we have

P(|Sn/n − µ| ≥ ε) ≤ σ²/(nε²),

and the right side tends to zero as n → ∞.
Note: Since Sn/n is the arithmetic mean of X1, . . . , Xn, this theorem states that the probability of the arithmetic mean Sn/n differing from its expected value µ by more than ε approaches zero as n → ∞. A stronger result, which we might expect to be true, is that
Sn
lim = µ,
n→∞ n
but this is actually false. However, we can prove that lim Sn /n = µ with probability one. This result is
n→∞
often called the strong law of large numbers, and, by contrast Theorem 12.6.1 is called the weak law of large
numbers.
Binomial distribution: Let p be the probability that an event will happen in any single trial (called the probability of success) and q = 1 − p the probability that it will fail to happen (the probability of failure). The probability that the event will happen exactly x times in n trials (i.e., x successes and n − x failures will occur) is given by the probability function

f(x) = P(X = x) = C(n, x) p^x q^{n−x} = [n!/(x!(n − x)!)] p^x q^{n−x}   (12.7.1)
where the random variable X denotes the number of successes in n trials and x = 0, 1, . . . , n.
Normal distribution: One of the most important continuous distributions is the normal distribution, defined by the density function

f(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)},   −∞ < x < ∞   (12.7.2)
where µ and σ are the mean and standard deviation, respectively. The corresponding distribution function is
given by
F(x) = P(X ≤ x) = (1/(σ√(2π))) ∫_{−∞}^x e^{−(v−µ)²/(2σ²)} dv   (12.7.3)
If X has the distribution function given by (12.7.3), we say that the random variable X is normally distributed
with mean µ and variance σ 2 . If we let Z be the standardized variable corresponding to X, i.e., if we let
Z = (X − µ)/σ   (12.7.4)
then the mean or expected value of Z is 0 and the variance is 1. In such case the density function for Z can be
obtained from (12.7.2) by formally placing µ = 0 and σ = 1, yielding
f(z) = (1/√(2π)) e^{−z²/2}   (12.7.5)
Here X is the random variable giving the number of successes in n Bernoulli trials and p is the probability of
success. The approximation becomes better with increasing n and is exact in the limiting case. The fact that
the binomial distribution approaches the normal distribution can be described by writing
lim_{n→∞} P(a ≤ (X − np)/√(npq) ≤ b) = (1/√(2π)) ∫_a^b e^{−u²/2} du   (12.7.7)
In words, we say that the standardized random variable (X − np)/√(npq) is asymptotically normal.
Theorem 12.8.1. Let X1 , X2 , . . . , Xn be independent random variables that are identically distributed (i.e.,
all have the same probability function in the discrete case or density function in the continuous case) and have
finite mean µ and variance σ². Then, if Sn = X1 + X2 + · · · + Xn (n = 1, 2, . . .),

lim_{n→∞} P(a ≤ (Sn − nµ)/(σ√n) ≤ b) = (1/√(2π)) ∫_a^b e^{−u²/2} du   (12.8.1)

that is, the random variable (Sn − nµ)/(σ√n), which is the standardized variable corresponding to Sn, is asymptotically normal.
Proof. Write S*_n = (Sn − nµ)/(σ√n). Then

E(e^{tS*_n}) = E[ e^{t(X1−µ)/(σ√n)} e^{t(X2−µ)/(σ√n)} · · · e^{t(Xn−µ)/(σ√n)} ]
             = E[e^{t(X1−µ)/(σ√n)}] · · · E[e^{t(Xn−µ)/(σ√n)}]
             = { E[e^{t(X1−µ)/(σ√n)}] }^n,

where, in the last two steps, we have respectively used the facts that the Xk are independent and are identically distributed. Now, by a Taylor series expansion,
E[e^{t(X1−µ)/(σ√n)}] = E[ 1 + t(X1 − µ)/(σ√n) + t²(X1 − µ)²/(2σ²n) + · · · ]
                     = E(1) + (t/(σ√n)) E(X1 − µ) + (t²/(2σ²n)) E[(X1 − µ)²] + · · ·
                     = 1 + (t/(σ√n))(0) + (t²/(2σ²n))(σ²) + · · ·
                     = 1 + t²/(2n) + · · ·
so that

E(e^{tS*_n}) = [1 + t²/(2n) + · · ·]^n
But the limit of this as n → ∞ is e^{t²/2}, which is the moment generating function of the standardized normal distribution. Hence, the required result follows.
Unit 13
Course Structure
• Definition and classification of stochastic processes
• Classification of states.
13.1 Introduction
Since the last century there have been marked changes in the approach to scientific enquiry. There has been a greater realisation that probabilistic (or non-deterministic) models are more realistic than deterministic models in many situations. Observations taken at different time points, rather than those taken at a fixed period of time, began to engage the attention of probabilists. This led to a new concept of indeterminism: indeterminism in dynamic studies, which has been called “dynamic indeterminism”. Many phenomena occurring in the physical and life sciences are now studied not only as random phenomena but also as phenomena changing with time or space. Similar considerations are also made in other areas, such as the social sciences, engineering, management and so on. The scope of applications of random variables which are functions of time or space or both has been on the increase.
Families of random variables which are functions of, say, time are known as stochastic processes (or random processes, or random functions). A few simple examples are given as illustrations.
Example 13.1.1. Consider a simple experiment like throwing a true die.
(i) Suppose that Xn is the outcome of the n-th throw, n ≥ 1. Then {Xn, n ≥ 1} is a family of random variables such that for each distinct value of n (= 1, 2, . . .) one gets a distinct random variable Xn; {Xn, n ≥ 1} constitutes a stochastic process, known as a Bernoulli process.
(ii) Suppose that Xn is the number of sixes in the first n throws. For each distinct value of n = 1, 2, . . ., we get a distinct binomial variable Xn; the family {Xn, n ≥ 1} is a stochastic process.
(iii) Suppose that Xn is the maximum number shown in the first n throws. Here {Xn , n ≥ 1} constitutes
a stochastic process.
Example 13.1.2. Consider that there are r cells and an infinitely large number of identical balls and that balls
are thrown at random, one by one, into the cells, the ball thrown being equally likely to go into any one of
the cells. Suppose that Xn is the number of occupied cells after n throws. Then {Xn , n ≥ 1} constitutes a
stochastic process.
For example, if Xn is the total number of sixes appearing in the first n throws of a die, the set of possible values of Xn is the finite set of non-negative integers 0, 1, . . . , n. Here, the state space of Xn is discrete. We can
write
Xn = Y1 + Y2 + . . . + Yn ,
where Yi is a discrete random variable denoting the outcome of the i-th throw and Yi = 1 or 0 according as
the i-th throw shows six or not. Secondly, consider
Xn = Z1 + Z2 + . . . + Zn ,
where Zi is a continuous random variable assuming values in [0, ∞). Here, the set of possible values of Xn is the interval [0, ∞), and so the state space of Xn is continuous.
In the above two examples, we assume that the parameter n of Xn is restricted to the non-negative integers
n = 0, 1, 2, . . . We consider the state of the system at distinct time points n = 0, 1, 2, . . . , only. Here the
word time is used in a wide sense. We note that in the first case considered above the “time n” implies throw
number n.
On the other hand, one can visualise a family of random variables {Xt , t ∈ T } (or {X(t), t ∈ T }) such
that the state of the system is characterized at every instant over a finite or infinite interval. The system is then
defined for a continuous range of time and we say that we have a family of random variable in continuous
time. A stochastic process in continuous time may have either a discrete or a continuous state space. For
example, suppose that X(t) gives the number of incoming calls at a switchboard in an interval (0, t). Here
the state space of X(t) is discrete though X(t) is defined for a continuous range of time. We have a process
in continuous time having a discrete state space. Suppose that X(t) represents the maximum temperature at
a particular place in (0, t), then the set of possible values of X(t) is continuous. Here we have a system in
continuous time having a continuous state space.
So far we have assumed that the values assumed by the random variable Xn (or X(t)) are one-dimensional, but the process {Xn} (or {X(t)}) may be multi-dimensional. Consider X(t) = (X1(t), X2(t)), where X1 represents the maximum and X2 the minimum temperature at a place in an interval of time (0, t). We have
here a two-dimensional stochastic process in continuous time having continuous state space. One can similarly
have multi-dimensional processes. One-dimensional processes can be classified, in general, into the following four types:
(i) discrete time, discrete state space;
(ii) discrete time, continuous state space;
(iii) continuous time, discrete state space;
(iv) continuous time, continuous state space.
All the four types may be represented by {X(t), t ∈ T }. In case of discrete time, the parameter generally
used is n, i.e., the family is represented by {Xn , n = 0, 1, 2, . . .}. In case of continuous time both the symbols
{Xt, t ∈ T} and {X(t), t ∈ T} (where T is a finite or infinite interval) are used. The parameter t is usually
interpreted as time, though it may represent such characters as distance, length, thickness and so on. We shall
use the notation {X(t), t ∈ T } both in the cases of discrete and continuous parameters and shall specify,
whenever necessary.
Relationship
In some of the cases, the random variable Xn , i.e., members of the family {Xn , n ≥ 1} are mutually in-
dependent, but more often they are not. We generally come across processes whose members are mutually
dependent. The relationship among them is often of great importance.
The nature of dependence could be infinitely varied. Here dependence of some special types, which occurs
quite often and is of great importance, will be considered. We may broadly describe some stochastic processes
according to the nature of dependence relationship existing among the members of the family.
Suppose that we wish to consider the discrete parameter case, and consider a process in discrete time with independent increments.
Markov Process: If {X(t), t ∈ T} is a stochastic process such that, given the value of X(s), the values of X(t), t > s, do not depend on the values of X(u), u < s, then the process is said to be a Markov process. A definition of such a process is given below. If, for t1 < t2 < · · · < tn < t,

P{a ≤ X(t) ≤ b | X(t1) = x1, . . . , X(tn) = xn} = P{a ≤ X(t) ≤ b | X(tn) = xn},

then the process {X(t), t ∈ T} is a Markov process. A discrete parameter Markov process is known as a Markov chain.
Example: Consider the repeated tossing of a coin with probability p of a head at each toss and q = 1 − p of a tail. Denote the occurrence of a head by 1, that of a tail by 0, and the random variable denoting the result of the n-th toss by Xn. Then, for n = 1, 2, 3, . . .,

P{Xn = 1} = p,   P{Xn = 0} = q.
Thus we have a sequence of random variables X1 , X2 , . . .. The trials are independent and the result of the n-
th trial does not depend in any way on the previous trials numbered 1, 2, . . . , (n − 1). The random variables
are independent.
Consider now the random variable given by the partial sum Sn = X1 + . . . + Xn . The sum Sn gives the
accumulated number of heads in the first n trials and its possible values are 0, 1, . . . , n.
We have Sn+1 = Sn + Xn+1 . Given that Sn = j (j = 0, 1, . . . , n), the random variable Sn+1 can assume
only two possible values: Sn+1 = j with probability q and Sn+1 = j +1 with probability p; these probabilities
are not at all affected by the values of the variables S1 , S2 , . . . , Sn−1 . Thus
P {Sn+1 = j + 1|Sn = j} = p
P {Sn+1 = j|Sn = j} = q.
We have here an example of a Markov chain: a case of simple dependence in which the outcome of the (n + 1)-st trial depends directly on that of the n-th trial and only on it. The conditional probability of Sn+1 given Sn depends on the value of Sn, and the manner in which the value of Sn was reached is of no consequence.
Definition 13.3.1. The stochastic process {Xn, n = 0, 1, 2, . . .} is called a Markov chain if, for j, k, j1, . . . , jn−1 ∈ N (or any subset of I),

P{Xn = k | Xn−1 = j, Xn−2 = j1, . . . , X0 = jn−1} = P{Xn = k | Xn−1 = j} = pjk.
The outcomes are called the states of the Markov chain; if Xn has the outcome j (i.e., Xn = j), the process is said to be at state j at the n-th trial. To a pair of states (j, k) at two successive trials (say, the n-th and (n + 1)-th trials) there is an associated conditional probability pjk. It is the probability of transition from state j at the n-th trial to state k at the (n + 1)-th trial. The transition probabilities pjk are basic to the study of the structure of the Markov chain.
The transition probability may or may not be independent of n. If the transition probability pjk is independent of n, the Markov chain is said to be homogeneous (or to have stationary transition probabilities). If it is dependent on n, the chain is said to be non-homogeneous. Here we shall confine ourselves to homogeneous chains.
The one-step transition probabilities are

pij = P{Xn+1 = j | Xn = i},   (13.4.1)

where i, j = 1, 2, . . . , m. The numbers pij are known as the transition probabilities of the chain, and must satisfy

pij ≥ 0,   Σ_{j=1}^m pij = 1
for each i = 1, 2, . . . , m.
Transition probabilities form an m × m array which can be assembled into a transition matrix T, where

T = [pij] = [ p11  p12  · · ·  p1m ]
            [ p21  p22  · · ·  p2m ]
            [  ⋮    ⋮    ⋱     ⋮  ]
            [ pm1  pm2  · · ·  pmm ]   (13.4.2)
Note that each row of T is a probability distribution. Any square matrix for which pij ≥ 0 and Σ_{j=1}^m pij = 1 is said to be row-stochastic.
Example 13.4.1. The matrices A = [aij] and B = [bij] are m × m row-stochastic matrices. Show that C = AB is also row-stochastic.
Solution: By the multiplication rule for matrices, C = AB = [cij], where

cij = Σ_{k=1}^m aik bkj.
Since aij ≥ 0 and bij ≥ 0 for all i, j = 1, 2, . . . , m, it follows that cij ≥ 0. Also

Σ_{j=1}^m cij = Σ_{j=1}^m Σ_{k=1}^m aik bkj = Σ_{k=1}^m aik Σ_{j=1}^m bkj = Σ_{k=1}^m aik · 1 = 1,

since Σ_{j=1}^m bkj = 1 and Σ_{k=1}^m aik = 1.
It follows from this example that any power T n of the transition matrix T must also be row-stochastic.
Consider a homogeneous chain with m states E1, E2, . . . , Em and transition matrix T = [pij] (1 ≤ i, j ≤ m).
For a homogeneous chain, recollect that pij is the probability that a transition occurs between Ei and Ej at
any step or change of state in the chain. We intend to investigate and classify some of the more common types
of states which can occur in Markov chains.
(a) Absorbing state: An absorbing state Ei is characterised by the probabilities
pii = 1,   pij = 0   (j ≠ i, j = 1, 2, . . . , m)
(b) Periodic state: Suppose that a return to the state Ei is possible only at steps t, 2t, 3t, . . ., where t > 1. In this case, the state Ei is said to be periodic with period t. If, for a state, no such t exists with this property, then the state is described as aperiodic. Let

d(i) = gcd{n : p_ii^(n) > 0},   (13.5.1)

that is, the greatest common divisor of the set of integers n for which p_ii^(n) > 0. Then the state Ei is said to be periodic if d(i) > 1 and aperiodic if d(i) = 1.
Example 13.5.1. A four-state Markov chain has the transition matrix

T = [ 0  ½  0  ½ ]
    [ 0  0  1  0 ]
    [ 1  0  0  0 ]
    [ 0  0  1  0 ]
Show that all states have period 3.
Solution: The transition diagram is shown in Fig. 13.5.1, from which it is clear that all states have period 3. For example, if the chain starts in E1, then returns to E1 are only possible at steps 3, 6, 9, . . ., either through E2 or E4.
The analysis of chains with periodic states can be complicated. However, one can check for a suspected periodicity by computing successive powers of T: here the powers T^(3r+1) and T^(3r+2) both have zero diagonal elements for r = 1, 2, 3, . . .. Hence, for i = 1, 2, 3, 4,

p_ii^(n) = 0 for n ≠ 3, 6, 9, . . . ,
p_ii^(n) ≠ 0 for n = 3, 6, 9, . . . ,
and, in general,

p_jj^(n) = f_j^(n) + Σ_{r=1}^{n−1} f_j^(r) p_jj^(n−r)   (n ≥ 2).   (13.5.5)
The terms in Eqn.(13.5.4) imply that the probability of a return at the third step is the probability of a
first return at the third step, or the probability of a first return at the first step and a return two steps later,
or the probability of a first return at the second step and a return one step later.
Equations (13.5.2) and (13.5.5) become iterative formulas for the sequence of first returns f_j^(n), which can be expressed as:

f_j^(1) = p_jj,   (13.5.6)

f_j^(n) = p_jj^(n) − Σ_{r=1}^{n−1} f_j^(r) p_jj^(n−r)   (n ≥ 2).   (13.5.7)
Example 13.5.2. A three-state Markov chain has the transition matrix

T = [  p   1−p  0 ]
    [  0    0   1 ]
    [ 1−q   0   q ]

where 0 < p < 1, 0 < q < 1. Show that the state E1 is persistent.
Solution: For simple chains a direct approach using the transition diagram is often easier than the formula (13.5.7) for f_j^(n). For this example the transition diagram is shown in Fig. 13.5.2.
If a sequence starts in E1, then it can be seen that first returns to E1 can occur at every step except n = 2, since after two steps the chain must be in state E3. From the figure it can be argued that

f_1^(1) = p,   f_1^(2) = 0,   f_1^(n) = (1 − p)(1 − q) q^{n−3}   (n ≥ 3).

The last result, for n ≥ 4, follows from the sequence of transitions

E1 E2 E3 E3 · · · E3 E1,   with E3 occupied (n − 3) times.
f1 = Σ_{n=1}^∞ f_1^(n) = p + Σ_{n=3}^∞ (1 − p)(1 − q) q^{n−3}
   = p + (1 − p)(1 − q) Σ_{s=0}^∞ q^s   (s = n − 3)
   = p + (1 − p)(1 − q)/(1 − q)
   = p + (1 − p) = 1
The mean recurrence time µj of a persistent state Ej, for which Σ_{n=1}^∞ f_j^(n) = 1, is given by

µj = Σ_{n=1}^∞ n f_j^(n).   (13.5.8)
In Example 13.5.2, the state E1 is persistent and its mean recurrence time is given by

µ1 = Σ_{n=1}^∞ n f_1^(n) = p + (1 − p)(1 − q) Σ_{n=3}^∞ n q^{n−3}
   = p + (1 − p)(1 − q)(3 − 2q)/(1 − q)²
   = (3 − 2p − 2q + pq)/(1 − q),

which is finite; a persistent state with finite mean recurrence time is called nonnull. For some chains, however, the mean recurrence time can be infinite; in other words, the mean number of steps to a first return is unbounded, and such a persistent state is called null.
Solution: The transition diagram at a general step n is shown in Fig. 13.5.3. From the figure, the first-return probabilities satisfy f_1^(1) + f_1^(2) + f_1^(3) = 5/8 and f_1^(n) = 3/[2n(n + 1)] for n ≥ 4. Since

Σ_{n=4}^∞ 1/(n(n + 1)) = lim_{N→∞} Σ_{n=4}^N (1/n − 1/(n + 1)) = lim_{N→∞} (1/4 − 1/(N + 1)) = 1/4,
Hence

f1 = 5/8 + 3/8 = 1,

which means E1 is persistent. On the other hand, the mean recurrence time is

µ1 = Σ_{n=1}^∞ n f_1^(n) = 7/8 + (3/2) Σ_{n=4}^∞ n/[n(n + 1)]
   = 7/8 + (3/2)(1/5 + 1/6 + 1/7 + · · ·)
   = 7/8 + (3/2) Σ_{n=5}^∞ 1/n.
This is the harmonic series minus its first four terms. The harmonic series is a well-known divergent series, which means that µ1 = ∞. Hence E1 is persistent and null.
(d) Transient state: For a persistent state the probability of a first return at some step in the future is certain. For some states,

f_j = Σ_{n=1}^∞ f_j^(n) < 1,   (13.5.10)

which means that the probability of a first return is not certain. Such states are described as transient.
Example 13.5.4. A four-state Markov chain has the transition matrix

T = [ 0  ½  ¼  ¼ ]
    [ ½  ½  0  0 ]
    [ 0  0  1  0 ]
    [ 0  0  ½  ½ ]

Show that the state E1 is transient.
Solution: The transition diagram is shown in Fig. 13.5.4. From the figure,

f_1^(1) = 0,   f_1^(2) = (1/2)·(1/2) = (1/2)²,   f_1^(3) = (1/2)³,   . . . ,   f_1^(n) = (1/2)^n.

Hence

f1 = Σ_{n=1}^∞ f_1^(n) = Σ_{n=2}^∞ (1/2)^n = 1/2 < 1,

implying that E1 is a transient state. The reason for the transience of E1 can be seen from Fig. 13.5.4, where transitions from E3 or E4 to E1 or E2 are not possible.
(e) Ergodic state: A state which is persistent, nonnull and aperiodic is called an ergodic state.

Example 13.5.5. For the three-state chain of Example 13.5.2, where 0 < p < 1, 0 < q < 1, show that the state E1 is ergodic.

Solution: It was shown above that E1 is persistent with finite mean recurrence time µ1; the convergence of µ1 implies that E1 is nonnull. Also, the diagonal elements p_ii^(n) > 0 for n ≥ 3 and i = 1, 2, 3, which means that E1 is aperiodic. Hence, from the definition above, E1 (and E2 and E3 also) is ergodic.
Unit 14
Course Structure
• Statistical Inference
• Estimation of Parameters
14.1 Introduction
To study the features of any population we first select a sample from it. A carefully selected sample may be expected to possess the characteristics of the population. The scientific theory developed to draw conclusions about the properties of a population on the basis of the knowledge of the properties of a sample drawn from it is known as Statistical Inference.
In the case of point estimation, a single statistic θ̂(x1, x2, . . . , xn) is used to estimate θ; its value may vary from sample to sample. This function is known as the ‘estimator’ of the parameter, and its value for a particular sample is called an ‘estimate’.
In the case of interval estimation, two statistics θ̂1 (x1 , x2 , . . . , xn ) and θ̂2 (x1 , x2 , . . . , xn ) are selected
within which the value of the parameter θ is expected to lie. This interval is known as Confidence Interval and
the two quantities used to specify the interval are known as Confidence Limits.
A good estimator is expected to possess the following properties:
(i) Unbiasedness,
(ii) Consistency,
(iii) Efficiency,
(iv) Sufficiency.
14.3 Unbiasedness
A statistic T is said to be an unbiased estimator of a parameter θ if the expected value of the statistic coincides
with the actual value of the parameter, i.e., if
E(T ) = θ
Otherwise, the estimator is called biased. E(T) − θ is called the bias of the statistic T in estimating θ; the estimator is positively or negatively biased according as E(T) − θ is greater or less than zero.
Theorem 14.3.1. The sample mean is an unbiased estimate of the population mean.
Proof. Let x1, x2, . . . , xn be a simple random sample drawn with replacement from a finite population X1, X2, . . . , XN. In this case, each xi has an equal chance of being any of the N population values. Therefore,
E(xi) = (1/N) X1 + (1/N) X2 + · · · + (1/N) XN
      = (1/N)(X1 + X2 + · · · + XN)
      = m, the population mean,   i = 1, 2, . . . , n   (14.3.1)
Again,
x = sample mean
x1 + x2 + . . . + xn
=
n
Now,

E(x̄) = E[(x1 + x2 + · · · + xn)/n]
     = (1/n)[E(x1) + E(x2) + · · · + E(xn)]
     = (1/n)(m + m + · · · + m)
     = nm/n = m = population mean   (14.3.2)
Theorem 14.3.2. The sample variance is a biased estimator of the population variance.
Proof. Let m and σ 2 be the population mean and variance respectively and let x and S 2 be the corresponding
sample mean and variance.
Again, the sample mean is x̄ = (x1 + x2 + · · · + xn)/n, and the sample variance is

S² = (1/n) Σ_{i=1}^n (xi − x̄)² = (1/n) Σ_{i=1}^n (xi − m + m − x̄)²
   = (1/n) Σ_{i=1}^n [(xi − m)² − 2(xi − m)(x̄ − m) + (x̄ − m)²]
   = (1/n) [ Σ_{i=1}^n (xi − m)² − 2(x̄ − m) Σ_{i=1}^n (xi − m) + n(x̄ − m)² ]
   = (1/n) [ Σ_{i=1}^n (xi − m)² − 2(x̄ − m)(Σ_{i=1}^n xi − nm) + n(x̄ − m)² ]
   = (1/n) [ Σ_{i=1}^n (xi − m)² − 2(x̄ − m)(n x̄ − nm) + n(x̄ − m)² ]
   = (1/n) Σ_{i=1}^n (xi − m)² − (x̄ − m)².
Therefore,

E(S²) = E[(1/n) Σ_{i=1}^n (xi − m)²] − E[(x̄ − m)²]
      = (1/n) Σ_{i=1}^n E[(xi − m)²] − Var(x̄)
      = (1/n)(nσ²) − σ²/n
      = σ² − σ²/n = ((n − 1)/n) σ²,

so that

bias = E(S²) − σ² = ((n − 1)/n) σ² − σ² = −σ²/n.
Again, if we write

s² = (n/(n − 1)) S²,   (14.3.3)

then

E(s²) = (n/(n − 1)) E(S²) = (n/(n − 1)) · ((n − 1)/n) σ² = σ²,

so that s² is an unbiased estimator of the population variance.
14.4 Consistency
A statistic Tn computed from a sample of size n is said to be a consistent estimator of θ if Tn converges to θ in probability. In other notation, for every ε > 0,

lim_{n→∞} P(|Tn − θ| < ε) = 1   (14.4.2)

or, equivalently,

lim_{n→∞} P(|Tn − θ| ≥ ε) = 0   (14.4.3)

Thus, a consistent estimator is expected to come closer to the parameter as the size of the sample becomes larger.
It may be shown that two sufficient conditions for an estimator Tn to be a consistent estimator of θ are that E(Tn) → θ and Var(Tn) → 0 as n → ∞.
14.5 Efficiency
If Vm is the variance of the most efficient estimator and V the variance of another estimator of a parameter θ, then the efficiency of the latter estimator is defined as

Efficiency = Vm / V.

Since Vm ≤ V, the efficiency cannot exceed 1.
14.6 Sufficiency
Thus, for a random sample x1, x2, . . . , xn from a population whose probability density (or mass) function is f(x, θ), if T is a sufficient estimator of θ, then the joint distribution factorizes as L(x1, . . . , xn; θ) = g(T, θ) h(x1, . . . , xn), where h does not depend on θ (the factorization criterion).
14.7 Method of Maximum Likelihood
Let x1, x2, . . . , xn be a random sample of size n drawn from a population and let θ1, θ2, . . . , θk be k parameters of the distribution. The event of drawing this sample can be denoted by (X1 = x1, X2 = x2, . . . , Xn = xn), and its probability is clearly a function of the sample values x1, x2, . . . , xn and of the parameters θ1, θ2, . . . , θk. This function is known as the likelihood function of the sample and is generally denoted by L(x1, x2, . . . , xn; θ1, θ2, . . . , θk). Thus,

L(x1, x2, . . . , xn; θ1, θ2, . . . , θk) = P(X1 = x1, X2 = x2, . . . , Xn = xn).

Since X1, X2, . . . , Xn are mutually independent random variables, each having the distribution of the population, in the discrete case

P(X = xi) = f_{xi}(θ1, θ2, . . . , θk),

while in the continuous case the density of X at xi is f(xi; θ1, θ2, . . . , θk). The likelihood function L in the two cases is then given by

L(x1, . . . , xn; θ1, . . . , θk) = P(X1 = x1) P(X2 = x2) · · · P(Xn = xn)
  = f_{x1}(θ1, . . . , θk) f_{x2}(θ1, . . . , θk) · · · f_{xn}(θ1, . . . , θk)   (discrete case)
  = f(x1; θ1, . . . , θk) f(x2; θ1, . . . , θk) · · · f(xn; θ1, . . . , θk)   (continuous case)
Now, this method states that, regarding the sample values as fixed, we try to find the values of θ1, θ2, . . . , θk for which the likelihood function L is maximised. Since L > 0, when L is maximum, log L is also maximum. The corresponding equations for determining θ1, θ2, . . . , θk are

∂ log L/∂θ1 = 0,   ∂ log L/∂θ2 = 0,   . . . ,   ∂ log L/∂θk = 0,

which are called the likelihood equations. Solving these k equations, we get the likelihood estimates of θ1, θ2, . . . , θk, generally denoted by

θ1 = θ̂1(x1, x2, . . . , xn),   θ2 = θ̂2(x1, x2, . . . , xn),   . . . ,   θk = θ̂k(x1, x2, . . . , xn).

It may be verified that for these values θ̂1, θ̂2, . . . , θ̂k, L is maximum.
Example 14.7.1. Let T1 and T2 be two unbiased estimators of the parameter θ. Under what condition will aT1 + bT2 be an unbiased estimator of θ?

Solution: Since T1 and T2 are two unbiased estimators of θ, E(T1) = E(T2) = θ. Again, if aT1 + bT2 is to be an unbiased estimator of θ, then

E(aT1 + bT2) = θ
⇒ aE(T1) + bE(T2) = θ
⇒ aθ + bθ = θ
⇒ a + b = 1, which is the required condition.
Example 14.7.2. If X1, X2, . . . , Xn is a random sample from an N(µ, σ²) population, show that the estimator

T = (1/(n + 1)) Σ_{i=1}^n Xi

is biased but consistent for µ. Hence obtain an unbiased estimator for µ.
Solution: We have

T = (1/(n + 1)) Σ_{i=1}^n Xi = (n/(n + 1)) · (1/n) Σ_{i=1}^n Xi = (n/(n + 1)) x̄
We know that x̄ converges in probability to µ as n → ∞, and n/(n + 1) → 1 as n → ∞. So T = (n/(n + 1)) x̄ converges in probability to µ, i.e., T is consistent for µ. But E(T) = (n/(n + 1)) E(x̄) = (n/(n + 1)) µ ≠ µ, so T is a biased estimator of µ. If we put T1 = ((n + 1)/n) T, then E(T1) = ((n + 1)/n) E(T) = µ. Thus, T1 = ((n + 1)/n) T (= x̄) is an unbiased estimator for µ.
Example 14.7.3. Find the maximum likelihood estimate of the parameter p of a Binomial (N, p) population, using n sample values x1, x2, . . . , xn.

Solution: The likelihood function is

L = Π_{i=1}^n C(N, xi) p^{xi} (1 − p)^{N−xi},

so that

log L = Σ_{i=1}^n log C(N, xi) + n x̄ log p + (nN − n x̄) log(1 − p),   where x̄ = (x1 + x2 + · · · + xn)/n.

Now,

∂ log L/∂p = n x̄/p − (nN − n x̄)/(1 − p).

Thus, ∂ log L/∂p = 0 gives

n x̄/p = (nN − n x̄)/(1 − p)   ⇒   p = x̄/N.

Thus, p̂ = x̄/N is the maximum likelihood estimate of p. It can be verified that

[∂² log L/∂p²]_{p=p̂} < 0,

so that L is indeed maximised at p = p̂.
Example 14.7.4. Find the maximum likelihood estimate of the parameter µ of a Poisson population, using n sample values.

Solution: Let x1, x2, . . . , xn be n sample values drawn from a Poisson distribution having parameter µ. Then

f(x, µ) = e^{−µ} µ^x / x!   (x = 0, 1, 2, . . .)

The likelihood function L of the sample observations is given by

L = Π_{i=1}^n e^{−µ} µ^{xi}/xi!,   so that   log L = −nµ + (Σ_{i=1}^n xi) log µ − Σ_{i=1}^n log(xi!).

Now, ∂ log L/∂µ = 0 gives

−n + (1/µ) Σ_{i=1}^n xi = 0   ⇒   µ = (1/n) Σ_{i=1}^n xi = x̄.

Therefore, µ̂ = x̄. Again,

[∂² log L/∂µ²]_{µ=µ̂} = −(1/µ̂²) Σ_{i=1}^n xi = −n x̄/(x̄)² = −n/x̄ < 0.

Thus, µ̂ = x̄, the sample mean, is the maximum likelihood estimate of the parameter µ of a Poisson distribution.
Example 14.7.5. Find the maximum likelihood estimates of the parameters m and σ of a Normal (m, σ) population for a sample of size n. Solving the likelihood equations ∂ log L/∂m = 0 and ∂ log L/∂σ = 0 gives m̂ = x̄ and σ̂² = (1/n) Σ_{i=1}^n (xi − x̄)²; thus, σ̂² = S².
Example 14.7.6. Find the maximum likelihood estimate of the parameter λ for the Weibull distribution

f(x) = λα x^{α−1} e^{−λx^α},   (x > 0)

for a sample x1, x2, . . . , xn (α known). Then

log L = n log λ − λ(x1^α + x2^α + · · · + xn^α) + terms independent of λ.

So, the likelihood equation ∂ log L/∂λ = 0 gives

n/λ − (x1^α + x2^α + · · · + xn^α) = 0   ⇒   λ̂ = n / Σ_{i=1}^n xi^α

Again, [∂² log L/∂λ²]_{λ=λ̂} = −n/λ̂² < 0. So for this λ̂, L is maximum.
Example 14.7.7. Prove that the maximum likelihood estimate of the parameter α of a population having density function

f(x) = (2/α²)(α − x),   0 < x < α,

for a sample of unit size is 2x, x being the sample value. Show also that the estimate is biased.
Solution: Since the sample is of unit size, the likelihood function L is given by

L = (2/α²)(α − x)
⇒ log L = log 2 − 2 log α + log(α − x)

Now, the likelihood equation ∂ log L/∂α = 0 gives

−2/α + 1/(α − x) = 0   ⇒   α = 2x

Moreover,

∂² log L/∂α² = 2/α² − 1/(α − x)² < 0 for α̂ = 2x,

so L is maximised at α̂ = 2x. Finally, E(α̂) = 2E(x) = 2 ∫_0^α x (2/α²)(α − x) dx = 2(α/3) = 2α/3 ≠ α, which shows that the estimate is biased.
Unit 15
Course Structure
• Interval estimation
• Statistical hypothesis
15.1 Introduction
We have studied the problem of estimating a parameter occurring in a distribution by a single value; such an estimate is called a point estimate, and the corresponding problem is known as the problem of point estimation. Such an estimate is always associated with random error. For this reason, it is sometimes desirable to find two statistics T1(x1, x2, . . . , xn) and T2(x1, x2, . . . , xn) such that

P(T1 ≤ θ ≤ T2) = 1 − ε   (15.2.1)

where ε (0 < ε < 1) is a preassigned small number. Then the interval (T1, T2) is called an interval estimate or a confidence interval for the parameter θ with confidence coefficient 1 − ε; the statistics T1 and T2 are respectively called the lower and upper confidence limits for θ.
A practical interpretation of this result is that if a long sequence of random samples is drawn from a population under uniform conditions, and the statistics T1 and T2 are computed each time, then

(number of times the interval (T1, T2) includes the true parameter θ) / (total number of samples drawn) = 1 − ε
The number ε is usually chosen to be very small, like 0.05, 0.01, 0.001, etc.; the corresponding confidence coefficients are 0.95, 0.99, 0.999, etc., and the corresponding confidence intervals are then called 95%, 99%, 99.9%, etc. confidence intervals.
The length of the interval (T2 − T1) is used as an inverse measure of the precision of the interval estimate.
Then (T1 , T2 ) is the desired confidence interval for the population parameter θ.
Case I: σ known. Consider the statistic

z = (x̄ − m)/(σ/√n),

whose sampling distribution is normal (0, 1) and which depends on the parameter m. Since the normal curve is symmetrical, we take two points ±u_ε symmetrically about the origin (Fig. 15.4.1) such that

P(−u_ε < z < u_ε) = 1 − ε,   or, from symmetry,   P(z > u_ε) = ε/2.

Figure 15.4.1

For a 95% confidence interval, 1 − ε = 0.95 and u_ε = 1.96, and the corresponding confidence interval for the population mean m is

(x̄ − 1.96 σ/√n,  x̄ + 1.96 σ/√n)   (15.4.2)
Case II: σ unknown. In this case, the suitable statistic is

t = (x̄ − m)/(s/√n),   where   s² = (1/(n − 1)) Σ_{i=1}^n (xi − x̄)².

Now, proceeding exactly as in Case I, we can calculate two numbers ±t_ε (Fig. 15.4.2), given by P(−t_ε < t < t_ε) = 1 − ε or by P(t > t_ε) = ε/2. In the case of large samples, if σ is unknown, an approximate confidence interval for m may be obtained by replacing σ by s or S in (15.4.2).

Figure 15.4.2
For a confidence interval for the population variance, the statistic

χ² = nS²/σ²

is χ²-distributed with (n − 1) degrees of freedom, where S² is the sample variance, σ² is the population variance and n is the size of the sample. We choose a positive number χ²₁ and determine χ²₂ such that P(χ²₁ ≤ χ² ≤ χ²₂) = 1 − ε (Fig. 15.4.3).

Figure 15.4.3
Example 15.4.1. A sample 2.3, −0.2, −0.4, −0.9 is taken from a normal population with variance 9. Find a 95% confidence interval for the population mean. (Given P(U > 1.96) = 0.025, where U is a normal (0, 1) variate.)
Solution: With usual notation, we have

x̄ = [2.3 + (−0.2) + (−0.4) + (−0.9)]/4 = 0.2

Also, n = 4, σ² = 9, ε = 0.05, u_ε = 1.96. Hence, the confidence interval for the mean when σ is known is

(x̄ − 1.96 σ/√n,  x̄ + 1.96 σ/√n) = (0.2 − 1.96 × 3/2,  0.2 + 1.96 × 3/2) = (−2.74, 3.14)
Example 15.4.2. The mean and variance of a sample of size 400 from a normal population are found to be
18.35 and 3.25 respectively. Given P (U > 1.96) = 0.025, U being a standard normal variate, find 95%
confidence interval for the population mean.
Example 15.4.3. A sample of size 10 drawn from a normal population gives Σxi = 620 and Σxi² = 39016. Find a 99% confidence interval for the population standard deviation σ. (Given χ²₂ = 23.59 and χ²₁ = 1.74 for 9 degrees of freedom.)

Solution:

Sample variance S² = (1/10) Σ_{i=1}^{10} xi² − [(1/10) Σ_{i=1}^{10} xi]²
  = 39016/10 − (620/10)²
  = 3901.6 − 3844 = 57.6
For a 99% confidence interval,

1 − ε = 0.99,  i.e., ε = 0.01 and ε/2 = 0.005.

Here, n = 10. We know that the confidence interval for σ is

(S√(n/χ²₂),  S√(n/χ²₁)) = (√(57.6 × 10/23.59),  √(57.6 × 10/1.74)) = (4.94, 18.19)
15.5 Statistical Hypothesis
There are two types of hypotheses, viz. simple and composite. When a statistical hypothesis completely specifies the population distribution, it is called a simple hypothesis; when it does not completely specify the population distribution, it is called a composite hypothesis. In the case of a composite hypothesis, the number of unspecified parameters is called the degrees of freedom of the composite hypothesis.
As an illustration, let us consider a Normal (m, σ) distribution and let m0 and σ0 be two given values of m and σ respectively. Then
(i) the hypothesis (m = m0, σ = σ0) is simple, since it specifies the distribution completely;
(ii) the hypothesis σ = σ0, with m unspecified, is composite with 1 degree of freedom;
(iii) the hypothesis m = m0 is composite if both m and σ are unknown, and its degrees of freedom is 1.
The hypothesis which is set up and tested, possibly for rejection, is called the Null Hypothesis and is written

H0 : θ = θ0

Any other hypothesis about the parameter θ, against which we wish to test the null hypothesis, is called the Alternative Hypothesis, and this is written as

H1 : θ = θ1

Generally, the hypothesis we wish to reject by the test is taken as the null hypothesis: if we have two alternatives, i.e., either θ = θ0 or θ = θ1, and we have a priori reason to be more inclined to believe the second, then we take H0 : θ = θ0 as the null hypothesis.
Figure 15.7.1
Let us divide the sample space S into two disjoint parts W and W̄ (= S − W). Let us assume that we reject the null hypothesis H0 : θ = θ0 if the observed sample point falls in W, and in this case we accept H1 : θ = θ1. On the other hand, we accept H0 if the point falls in W̄. Technically, the region W, i.e., the region of rejection of the null hypothesis H0, is called the critical region or region of rejection.
A decision arrived at on the basis of the sample observations may not always be true in respect of the population. The following two cases of wrong decision are called Type I and Type II errors.
Type I Error: When the null hypothesis H0 is rejected (i.e., H1 is accepted) although H0 is true, the error arising in this situation is called a Type I error. If α is the probability of a Type I error, then

α = Probability of Type I error
  = Probability of rejecting H0 when H0 is true
  = P(x ∈ W | H0 : θ = θ0),   where x = (x1, x2, . . . , xn)   (15.8.1)

Type II Error: When the null hypothesis H0 is accepted (i.e., H1 is rejected) although H0 is false, the error arising in this situation is called a Type II error. If β is the probability of a Type II error, then

β = Probability of Type II error
  = Probability of accepting H0 when H0 is false
  = P(x ∈ W̄ | H1 : θ = θ1),   where x = (x1, x2, . . . , xn)   (15.8.2)
Example 15.10.1. For the sample of size 10 with Σxi = 620 and Σxi² = 39016 considered above, test the hypothesis H0 : σ = 8 at the 5% level of significance (given χ²_{0.05,9} = 16.92).

Solution: As before,

S² = 39016/10 − (620/10)² = 3901.6 − 3844 = 57.6

Under H0, the statistic is

χ² = nS²/σ² = (10 × 57.6)/8² = 9
Since χ²_observed = 9 < χ²_{0.05,9} = 16.92, H0 is accepted, and thus we conclude that the value of σ may be taken as 8 at the 5% level of significance.
Example 15.10.2. For a large lot of freshly minted coins, a random sample of size 50 is taken. The mean weight of coins in the sample is found to be 28.57 gm. Assuming that the population standard deviation of weight is 1.25 gm, will it be reasonable to suppose that the population mean is 28 gm?
Solution: The size of the sample is 50, so n = 50. The hypothesised population mean is m = 28 gm and the population s.d. is σ = 1.25 gm. Let the null and alternative hypotheses be

H0 : m = 28
H1 : m ≠ 28

S.E. of x̄ = σ/√n = 1.25/√50 = 1.25/7.071 = 0.18

Let us consider the statistic

z = (x̄ − m)/(S.E. of x̄),

which is standard normal under H0. Therefore,

z = (28.57 − 28)/0.18 = 3.17

Since the observed value of z exceeds 1.96, z falls in the (two-sided) critical region at the 5% level of significance, and so the null hypothesis H0 is rejected at the 5% level. So it will not be reasonable to suppose that the population mean is 28 gm.
Example 15.10.3. The mean life time of a sample of 100 electric bulbs produced by a manufacturing company
is estimated to be 1570 hours with a standard deviation of 120 hours. If µ be the mean life time of all the bulbs
produced by the company, test the hypothesis µ = 1600 hours against the alternative hypothesis µ 6= 1600
hours, using level of significance 0.05.
Solution: Here n = size of the sample = 100, the sample mean is x̄ = 1570 and the s.d. is σ = 120. We test the null hypothesis H0 : µ = 1600 against the alternative hypothesis H1 : µ ≠ 1600 at the 5% level of significance.

S.E. of x̄ = σ/√n = 120/√100 = 12

Therefore,

z = (x̄ − µ)/(S.E. of x̄) = (1570 − 1600)/12 = −2.5
Since |z| = 2.5 > 1.96, z falls in the critical region at the 5% level of significance, and so we reject the null hypothesis. Thus, at the 5% level of significance, it is not reasonable to suppose that the mean life of the bulbs is 1600 hours.
Example 15.10.4. In a sample of 600 students of a certain college, 400 are found to use dot pens. In another
college from a sample of 900 students 450 were found to use dot pens. Test whether the colleges are signifi-
cantly different with respect to the habit of using dot pens. (Null and alternative hypothesis should be stated
clearly.)
Solution: With usual notations, the null hypothesis is that the population proportions of the two colleges regarding the habit of using dot pens are equal. So H0 : P1 = P2, and the alternative hypothesis is H1 : P1 ≠ P2.
Here,

n1 = 600,  p1 = 400/600 = 0.667
n2 = 900,  p2 = 450/900 = 0.5
Under the null hypothesis, P1 = P2 = P, and the sample estimate of P is

p = (n1 p1 + n2 p2)/(n1 + n2) = (600 × 0.667 + 900 × 0.5)/(600 + 900) = 0.567
Now,

S.E. of (p1 − p2) = √[ pq (1/n1 + 1/n2) ]
  = √[ 0.567 × (1 − 0.567) × (1/600 + 1/900) ]
  = √[ 0.567 × 0.433 × (0.0017 + 0.0011) ]
  = 0.026
Now,

z = (p1 − p2)/S.E. = (0.667 − 0.5)/0.026 = 6.42
At 1% level the critical region is |z| > 2.58. So this z falls in the critical region and hence H0 is rejected. So
the two colleges are significantly different with respect to the habit of using dot pens.
Unit 16
Course Structure
• Analysis of variance
• One factor experiments
• Linear mathematical model for ANOVA
16.1 Introduction
Suppose that in an agricultural experiment, four different chemical treatments of soil produced mean wheat
yields of 28, 22, 18 and 24 bushels per acre, respectively. Is there a significant difference in these means, or is
the observed spread simply due to chance?
Such problems can be solved by using an important technique known as the analysis of variance, developed by Fisher. It makes use of the F distribution already considered in the previous unit. Basically, in many situations there is a need to test the significance of differences among three or more sample means, or, equivalently, to test the null hypothesis that the sample means are all equal.
The results of a one-factor experiment can be presented in a table having a rows and b columns (Table.
16.1). Here xjk denotes the measurement in the j-th row and k-th column, where j = 1, 2, . . . , a and k =
1, 2, . . . , b. For example, x35 refers to the fifth measurement for the third treatment.
Table 16.1

Treatment 1:  x11  x12  · · ·  x1b   (mean x̄1·)
Treatment 2:  x21  x22  · · ·  x2b   (mean x̄2·)
  ⋮
Treatment a:  xa1  xa2  · · ·  xab   (mean x̄a·)
We shall denote by x̄j· the mean of the measurements in the j-th row. We have

x̄j· = (1/b) Σ_{k=1}^b xjk,   j = 1, 2, . . . , a   (16.2.1)
The dot in x̄j· is used to show that the index k has been summed out. The values x̄j· are called group means, treatment means or row means. The grand mean, or overall mean, is the mean of all the measurements in all the groups and is denoted by x̄, i.e.,

x̄ = (1/ab) Σ_{j,k} xjk = (1/ab) Σ_{j=1}^a Σ_{k=1}^b xjk.   (16.2.2)
Writing xjk − x̄ = (xjk − x̄j) + (x̄j − x̄), and then squaring and summing over j and k, we can show that

Σ_{j,k} (xjk − x̄)² = Σ_{j,k} (xjk − x̄j)² + Σ_{j,k} (x̄j − x̄)²   (16.3.3)

⇒ Σ_{j,k} (xjk − x̄)² = Σ_{j,k} (xjk − x̄j)² + b Σ_j (x̄j − x̄)²   (16.3.4)
We call the first summation on the right side of (16.3.4) the variation within treatments (since it involves the squares of the deviations of xjk from the treatment means x̄j) and denote it by vw. Therefore,

vw = Σ_{j,k} (xjk − x̄j)²   (16.3.5)

The second summation on the right side of (16.3.4) is called the variation between treatments (since it involves the squares of the deviations of the various treatment means x̄j from the grand mean x̄) and is denoted by vb. Therefore,

vb = Σ_{j,k} (x̄j − x̄)² = b Σ_j (x̄j − x̄)²   (16.3.6)
v = Σ_{j,k} x²jk − τ²/(ab)   (16.4.1)

vb = (1/b) Σ_j τ²j· − τ²/(ab)   (16.4.2)

vw = v − vb   (16.4.3)

where τ is the total of all values xjk and τj· is the total of all values in the j-th treatment, i.e.,

τ = Σ_{j,k} xjk,   τj· = Σ_k xjk   (16.4.4)
In practice it is convenient to subtract some fixed value from all the data in the table; this has no effect on the
final results.
We can describe the situation by the linear mathematical model

Xjk = µj + ∆jk,   (16.5.1)

where µj is the population mean for the j-th treatment. The ∆jk can be taken as independent (relative to j as well as to k), normally distributed random variables with mean zero and variance σ². This is equivalent to assuming that the Xjk (j = 1, 2, . . . , a; k = 1, 2, . . . , b) are mutually independent normal variables with means µj and common variance σ². Let us define the constant µ by

µ = (1/a) Σ_j µj

We can think of µ as the mean for a sort of grand population comprising all the treatment populations. Then (16.5.1) can be rewritten as
Xjk = µ + αj + ∆jk,   where Σ_j αj = 0   (16.5.2)
The constant αj can be viewed as the special effect of the j-th treatment.
The null hypothesis that all treatment means are equal is given by H0 : αj = 0 (j = 1, 2, . . . , a), or, equivalently, by H0 : µj = µ (j = 1, 2, . . . , a). If H0 is true, the treatment populations, which by assumption are normal, have a common mean as well as a common variance. Then there is just one treatment population, and all treatments are statistically identical.
Table 16.2
Example 16.9.1. Table 16.3 shows the yields in bushels per acre of a certain variety of wheat grown in a
particular type of soil treated with chemicals A, B, or C.
Table 16.3
A:  48  49  50  49
B:  47  49  48  48
C:  49  51  50  50
Find (a) the mean yields for the different treatments, (b) the grand mean for all treatments, (c) the total
variation, (d) the variation between treatments, (e) the variation within treatments. Use the long method.
Solution: To simplify the arithmetic, we may subtract some suitable number, say, 45, from all the data
without affecting the values of the variations. We then obtain the data of Table 16.4
Table 16.4
A:  3  4  5  4
B:  2  4  3  3
C:  4  6  5  5
(a) The treatment (row) means for Table 16.4 are given, respectively, by

x̄1 = (1/4)(3 + 4 + 5 + 4) = 4,
x̄2 = (1/4)(2 + 4 + 3 + 3) = 3,
x̄3 = (1/4)(4 + 6 + 5 + 5) = 5.

Therefore, the mean yields, obtained by adding 45 to these, are 49, 48 and 50 bushels per acre for A, B and C respectively.
(b) x̄ = (1/12)(3 + 4 + 5 + 4 + 2 + 4 + 3 + 3 + 4 + 6 + 5 + 5) = 4

Therefore, the grand mean for the original set of data is 45 + 4 = 49 bushels per acre.
(c) Total variation:

v = Σ_{j,k} (xjk − x̄)² = (3 − 4)² + (4 − 4)² + · · · + (5 − 4)² = 14

(d) Variation between treatments:

vb = b Σ_j (x̄j − x̄)² = 4[(4 − 4)² + (3 − 4)² + (5 − 4)²] = 8

(e) Variation within treatments:

vw = v − vb = 14 − 8 = 6
The analysis of variance table for Examples 16.9.1 - 16.9.3 is shown in Table 16.5.
Table 16.5
Between treatments:  vb = 8,   a − 1 = 2 d.f.,   ŝ²b = vb/2 = 4
Within treatments:   vw = 6,   a(b − 1) = 9 d.f.,   ŝ²w = vw/9 = 2/3
Total:               v = 14,   ab − 1 = 11 d.f.;   F = ŝ²b/ŝ²w = 6 with 2, 9 d.f.
Exercise 16.9.4. Use the shortcut formulas (16.4.1) through (16.4.3) to obtain the results of Example 16.9.1.
In case the treatments 1, . . . , a have different numbers of observations, equal to n1, . . . , na respectively, the above results are easily modified. The shortcut formulas become

v = Σ_{j,k} x²jk − τ²/n   (16.10.1)

vb = Σ_j (τ²j·/nj) − τ²/n   (16.10.2)

vw = v − vb   (16.10.3)

where n = Σ_j nj is the total number of observations in all treatments, τ is the sum of all observations, τj· is the sum of all values in the j-th treatment, and Σ_j denotes the sum from j = 1 to a. The analysis of variance table for this case is given in Table 16.6.
Example 16.10.1. Table 16.7 shows the lifetimes in hours of samples from three different types of television
tubes manufactured by a company. Using the long method, test at (a) the 0.05, (b) the 0.01 significance level
whether there is a difference in the three types. (Given that F0.95,2,9 = 4.26 and F0.99,2,9 = 8.02).
Solution. It is convenient to subtract a suitable number, say, 400, obtaining Table 16.8. In this table we have indicated the row totals, the sample or group means, and the grand mean. We then have

v = Σ_{j,k} (xjk − x̄)² = (7 − 7)² + (11 − 7)² + · · · + (8 − 7)² = 72

vb = Σ_{j,k} (x̄j· − x̄)² = Σ_j nj (x̄j· − x̄)² = 3(9 − 7)² + 5(5 − 7)² + 4(8 − 7)² = 36

vw = v − vb = 72 − 36 = 36
The data can be summarized in the analysis of variance table, Table 16.9. The resulting F statistic is F = (vb/2)/(vw/9) = 18/4 = 4.5, with 2 and 9 degrees of freedom, for which F0.95,2,9 = 4.26 and F0.99,2,9 = 8.02. Since 4.26 < 4.5 < 8.02, we can reject the hypothesis of equal means (i.e., of no difference in the three types of tubes) at the 0.05 level but not at the 0.01 level.
Table 16.8
Table 16.9
Between treatments:  vb = 36,   2 d.f.,   ŝ²b = 18
Within treatments:   vw = 36,   9 d.f.,   ŝ²w = 4
F = 18/4 = 4.5 with 2, 9 d.f.
Exercise 16.10.2. Use the shortcut formulas (16.10.1) through (16.10.3) to obtain the results of Example
16.10.1.
Optional Paper
MATO 4.2
Marks : 100 (SEE : 80; IA : 20)
• Unit 2: Analysis of stochastically failing equipment, including the reliability function, reliability and growth models.
• Unit 3: Information Theory: Information concept, expected information, Entropy and properties of
entropy function.
• Unit 7: Shannon-Fano encoding procedure, Huffman encoding, noiseless coding theory, noisy coding.
• Unit 8: Family of codes, Hamming code, Golay code, BCH codes, Reed-Muller code, perfect codes, codes and designs, linear codes and their duals, weight distribution.
• Unit 10: Imbedded Markov Chain method for Steady State solution.
• Unit 11: Posynomial, Signomial, Degree of difficulty, Unconstrained minimization problems, Solution using Differential Calculus, Solution using the Arithmetic-Geometric inequality, Primal-dual relationship & sufficiency conditions in the unconstrained case.
• Unit 12: Constrained minimization, Solution of a constrained Geometric Programming problem, Geometric programming with mixed inequality constraints, Complementary Geometric programming.
• Unit 13: A brief introduction to Inventory Control, Single-item deterministic models without shortages.
• Unit 14: Single-item deterministic models with shortages Dynamic Demand Inventory Models.
• Unit 15: Multi-item inventory models with the limitations on warehouse capacity
• Unit 16: Models with price breaks, single-item stochastic models without Set-up cost and with Set-up
cost, Average inventory capacity, Capital investment.
Contents
Director’s Message
1 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 MTTF in terms of failure density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 10
2.1 Linearly Increasing Hazard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 System Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 17
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 Fundamental theorem of information theory . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2.1 Origination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3 Measure of information and characterisation . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3.1 Units of information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4 Entropy (Shannon’s Definition) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4.1 Units of entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4.2 Properties of entropy function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4 25
4.1 Joint, conditional and relative entropies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2 Mutual information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.1 Conditional mutual information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5 33
5.1 Conditional relative entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.1.1 Convex and Concave functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.1.2 Jensen’s Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2 Channel Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.3 Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6 43
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.1.1 Expected or average length of a code . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.1.2 Uniquely decodable (separable) code . . . . . . . . . . . . . . . . . . . . . . . . . . 45
7 52
7.1 Shannon-Fano Encoding Procedure for Binary code: . . . . . . . . . . . . . . . . . . . . . . 52
7.2 Construction of Huffman binary code . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.3 Construction of Huffman D-ary code (D > 2) . . . . . . . . . . . . . . . . . . . . . . . . . 58
8 63
8.1 Error correcting codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
8.2 Construction of linear codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
8.3 Standard form of parity check matrix: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
8.4 Hamming Code: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
8.5 Cyclic Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
8.6 BCH Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
9 70
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.2 Powers of Stochastic Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
10 73
10.1 Ergodic Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
11 85
11.1 Geometric Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
11.1.1 General form of G.P (Unconstrained G.P) (Primal Problem) . . . . . . . . . . . . . . 86
11.1.2 Necessary conditions for optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
12 95
12.1 Constraint Geometric Programming Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 95
13 99
13.1 Inventory Control/Problem/Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
13.1.1 Production Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
13.1.2 Inventory Decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
13.1.3 Inventory related cost: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
13.1.4 Why inventory is maintained? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
13.1.5 Variables in Inventory Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
13.1.6 Some Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
13.2 The Economic Order Quantity (EOQ) model without shortage . . . . . . . . . . . . . . . . . 101
13.2.1 Model I(a): Economic lot size model with uniform demand . . . . . . . . . . . . . . . 101
13.2.2 Model I(b): Economic lot size with different rates of demand in different cycles . . . . 102
13.2.3 Model I(c): Economic lot size with finite rate of Replenishment (finite production)
[EPQ model] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
14 108
14.1 Model II(a) : EOQ model with constant rate of demand scheduling time constant . . . . . . . 108
14.2 Model II(b) : EOQ model with constant rate of demand scheduling time variable . . . . . . . 110
14.3 Model II(c) : EPQ model with shortages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
15 118
15.1 Model III: Multi-item inventory model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
15.1.1 Model III(a): Limitation on Investment . . . . . . . . . . . . . . . . . . . . . . . . . 119
15.1.2 Model III(b): Limitation on inventory . . . . . . . . . . . . . . . . . . . . . . . . . . 121
15.1.3 Model III(c): Limitation on floor space . . . . . . . . . . . . . . . . . . . . . . . . . 123
16 125
16.1 Model IV: Deterministic inventory model with price breaks of quantity discount . . . . . . . . 125
16.1.1 Model IV(a): Purchase inventory model with one price break . . . . . . . . . . . . . . 127
16.1.2 Model IV(b): Purchase inventory model with two price breaks . . . . . . . . . . . . . 128
16.2 Probabilistic Inventory Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
16.2.1 Instantaneous demand, no set up cost . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Unit 1
Course Structure
• Reliability Theory
• MTTF in terms of failure density
1.1 Introduction
Reliability is the probability of a device performing its purpose adequately for the period of time intended
under the operating conditions encountered. The definition brings into focus four important factors, namely:
• the reliability of a device is expressed as a probability;
• the device is required to give adequate performance;
• the duration of adequate performance is specified;
• the environment or operating conditions are prescribed.
Some of the important aspects of reliability are:
a) Reliability is a function of time. We cannot expect an almost worn-out light bulb to be as reliable as
one recently put into service.
b) Reliability is a function of the conditions of use. In very severe environments we expect to encounter
more frequent system breakdowns than in normal environments.
c) Reliability is expressed as a probability, which helps us to quantify it and to think of optimizing system
reliability.
1.2 Reliability
Definition 1.2.1. Hazard Rate/Failure Rate: Failure rate is the ratio of the number of failures during a
particular unit interval to the average population during that interval. Thus the failure rate for the ith interval
is
$$ Z_i = \frac{n_i}{\frac{1}{2}\left[\left(N - \sum_{k=1}^{i-1} n_k\right) + \left(N - \sum_{k=1}^{i} n_k\right)\right]}, $$
where ni is the number of failures during the ith interval and N is the total number of components.
Definition 1.2.2. Failure Density: The failure density in a particular unit interval is the ratio of the number of
failures during that interval to the total number of components. So the failure density during the ith interval is
$$ f_{d_i} = \frac{n_i}{N}. $$
Let l be the last interval, after which there are no further failures. Then
$$ f_{d_l} = \frac{n_l}{N}. $$
Thus,
$$ f_{d_1} + f_{d_2} + \cdots + f_{d_l} = \frac{1}{N}(n_1 + n_2 + \cdots + n_l) = \frac{N}{N} = 1. $$
Hence the sum of values entered in column 5 is 1 (Table 1.1).
Definition 1.2.3. Reliability: Reliability (R) is the ratio of the number of survivors at any given time to the
total initial population. That is, the reliability at the ith time is
$$ R(i) = \frac{s_i}{N}, $$
where s_i is the number of survivors at the end of the ith interval.
Definition 1.2.4. Probability of failure: The concept of probability of failure is similar to that of probability
of survival. It is the ratio of the number of units failed within a certain time to the total population.
Hence, the probability of failure within the ith interval is
$$ \frac{n_1 + n_2 + \cdots + n_i}{N} = \frac{F_i}{N}, $$
so that the probability of failure up to the ith interval plus the reliability at the ith interval is
$$ \frac{F_i}{N} + \frac{s_i}{N} = 1 $$
(since F_i + s_i = N); that is, the probability of failure and the reliability at the same time always add up to 1.
Definition 1.2.5. Mean Failure Rate (h̄): If Z1 is the failure rate for the first unit of time, Z2 is the failure
rate for the second unit of time, . . ., ZT is the failure rate for the Tth unit of time, then the mean failure rate
over T units of time will be
$$ \bar{h}(T) = \frac{Z_1 + Z_2 + \cdots + Z_T}{T}. $$
The mean failure rate is also obtained from the formula
$$ \bar{h}(T) = \frac{1}{T}\,\frac{N(0) - N(T)}{N(0)}, $$
where N (0) is the total population at t = 0 and N (T ) is the total population remaining at time t = T .
Definition 1.2.6. Mean time to failure (MTTF): In general, if t1 is the time to failure for the first specimen,
t2 is the time to failure for the second specimen, . . ., tN is the time to failure for the N th specimen, then the
MTTF for N specimens is
$$ \text{MTTF} = \frac{t_1 + t_2 + \cdots + t_N}{N}. $$
If n1 is the number of specimens that failed during the first unit of time, n2 that during the second unit of time,
. . ., nl that during the last (lth) unit of time, then the MTTF for the N specimens will be
$$ \text{MTTF} = \frac{n_1 + 2n_2 + \cdots + l\,n_l}{N}, $$
where N = n1 + n2 + · · · + nl. If the time interval is δt units instead of 1 unit, then
$$ \text{MTTF} = \frac{n_1 + 2n_2 + \cdots + l\,n_l}{N}\,\delta t = \frac{\sum_{k=1}^{l} k\,n_k}{N}\,\delta t. $$
Example 1.2.7. In the life testing of 100 specimens of a particular device, the number of failures during each
time interval of 20 hours is shown in the following table:
Solution. As the number of specimens tested is large, it is tedious to record the time of failure of each
specimen, so we note the number of specimens that fail during each 20-hour interval.
Example 1.2.8. The following table gives the results of tests conducted under severe adverse conditions on
1000 safety valves. Calculate the failure density fd (t) and the hazard rates Z(t) where the time interval is 4
hours instead of 1 hour.
Solution.
Time interval   Failures   Cumulative   Survivors (S)   Failure density (f_d)   Hazard rate Z(t)                      Reliability (R)
t = 0           0          0            1000            0                       0                                     1
0 < t ≤ 4       267        267          733             0.067                   267 / (4·(1000+733)/2) = 0.077        1 − 0.067 = 0.933
4 < t ≤ 8       59         326          674             0.0148                  59 / (4·(733+674)/2)  = 0.021         1 − (0.067 + 0.0148) = 0.9182
8 < t ≤ 12      36         362          638             0.009                   36 / (4·(674+638)/2)  = 0.014         0.9092
12 < t ≤ 16     24         386          614             0.006                   24 / (4·(638+614)/2)  = 0.009         0.9032
16 < t ≤ 20     23         409          591             0.0057                  23 / (4·(614+591)/2)  = 0.009         0.8975
20 < t ≤ 24     11         420          580             0.0027                  11 / (4·(591+580)/2)  = 0.0047        0.8948
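The arithmetic in such tables is mechanical, so it is easy to check with a short program. The following is a minimal sketch (our own illustration, not part of the text), with the reliability taken as s_i/N as in Definition 1.2.3:

    # Recompute failure density, hazard rate and reliability from raw counts.
    N = 1000                      # initial number of safety valves
    dt = 4                        # length of each time interval (hours)
    failures = [267, 59, 36, 24, 23, 11]

    survivors = N
    for n in failures:
        avg_pop = survivors - n / 2.0   # average population in the interval
        fd = n / (N * dt)               # failure density, per hour
        Z = n / (dt * avg_pop)          # hazard (failure) rate, per hour
        survivors -= n
        R = survivors / N               # reliability s_i / N (Definition 1.2.3)
        print(f"fd = {fd:.4f}  Z = {Z:.4f}  R = {R:.3f}")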
For the discrete case,
$$ f_{d_1} + f_{d_2} + \cdots + f_{d_l} = \sum_{i=1}^{l} f_{d_i} = 1, $$
and for the continuous case,
$$ \int_0^T f_d(\xi)\,d\xi = 1, $$
where the limits of integration run from the beginning of the test at t = 0 to the time t = T at which
all the specimens have failed.
Hence the reliability R(t) for the tth hour in the continuous case is given by
$$ R(t) = 1 - \int_0^t f_d(\xi)\,d\xi = \int_t^T f_d(\xi)\,d\xi. $$
The cumulative probability of failure up to the ith hour is
$$ F(i) = f_{d_1} + f_{d_2} + \cdots + f_{d_i} = \sum_{k=1}^{i} f_{d_k}. $$
(iv) The failure rate or hazard rate for the ith hour is
$$ Z(i) = \frac{n_i}{N - \sum_{k=1}^{i-1} n_k - \frac{1}{2}n_i} = \frac{2\,[R(i-1) - R(i)]}{R(i-1) + R(i)}, $$
where N is the initial total number of specimens, n1 is the total number of specimens that failed during the first δt time
interval, n2 is the total number that failed during the second δt time interval, . . ., nk is the total number
of specimens that failed during the kth δt interval.
1.3 MTTF in terms of failure density
Now, by definition,
$$ f_{d_k} = \frac{n_k}{N\,\delta t} \;\Rightarrow\; \frac{n_k}{N} = f_{d_k}\,\delta t. $$
Further, kδt is the elapsed time t. Hence the expression for MTTF can be written as
$$ \text{MTTF} = \sum_{k=1}^{l} (k\,f_{d_k}\,\delta t)\,\delta t = \sum_{k=1}^{l} f_{d_k}\,(k\,\delta t)\,\delta t \tag{1.3.1} $$
where the summation is for the period from the first δt time interval to lth δt interval.
For the continuous case, when δt → 0, kδt becomes the elapsed time t and f_{d_k} becomes the failure density f_d(t) at
time t; then
$$ \text{MTTF} = \int_0^T t\,f_d(t)\,dt, \tag{1.3.2} $$
where T is the number of hours after which there are no survivors.
Now we have F(t) + R(t) = 1. Thus,
$$ F(t) = 1 - R(t) = \int_0^t f_d(\xi)\,d\xi, $$
so that
$$ \frac{d}{dt}F(t) = -\frac{d}{dt}R(t) = f_d(t). $$
Thus,
$$ \text{MTTF} = \int_0^\infty t\,f_d(t)\,dt \quad [\text{for } t > T \text{ there are no survivors, so the integrand vanishes for } t > T] $$
$$ = -\int_0^\infty t\,\frac{d}{dt}R(t)\,dt = -\big[t\,R(t)\big]_0^\infty + \int_0^\infty R(t)\,dt = \int_0^\infty R(t)\,dt \tag{1.3.3} $$
[since R(0) = 1 and R(t) → 0 as t → ∞, there being no survivors].
For the continuous case, when the hazard rate is constant, say Z(t) = λ, we have
$$ \int_0^t Z(\xi)\,d\xi = \int_0^t \lambda\,d\xi = \lambda t. $$
Thus,
$$ R(t) = e^{-\int_0^t Z(\xi)\,d\xi} = e^{-\lambda t} $$
and F(t) = 1 − e^{−λt}. Similarly,
$$ f_d(t) = Z(t)\,R(t) = \lambda\,e^{-\lambda t}. $$
Thus,
$$ \text{MTTF} = \int_0^\infty R(t)\,dt = \int_0^\infty e^{-\lambda t}\,dt = \left[-\frac{e^{-\lambda t}}{\lambda}\right]_0^\infty = \frac{1}{\lambda}. $$
Thus, for a constant-hazard model, the MTTF is simply the reciprocal of the hazard rate.
The constant-hazard model is also known as the exponential reliability law. Alternatively,
$$ \text{MTTF} = \int_0^\infty t\,f_d(t)\,dt = \int_0^\infty \lambda t\,e^{-\lambda t}\,dt = \left[-t\,e^{-\lambda t}\right]_0^\infty + \int_0^\infty e^{-\lambda t}\,dt = 0 + \frac{1}{\lambda}. $$
Example 1.3.1. The breakdown voltage of a certain device is exponentially distributed with mean value
100 V. Find the probability that the voltage lies between V1 and V2.
Solution. For an exponential distribution, the MTTF is the reciprocal of the hazard rate λ (say), where λ is a
constant, that is, MTTF = 1/λ. Here we identify the MTTF with the mean value 100 V. Thus,
$$ \frac{1}{\lambda} = 100 \;\Rightarrow\; \lambda = 0.01. $$
Hence the p.d.f. f_d(t) for the voltage distribution is λe^{−λt} = 0.01 e^{−0.01t}.
Now the probability that the voltage lies between V1 and V2 is given by
$$ F(V_2) - F(V_1) = \int_{V_1}^{V_2} f_d(t)\,dt = \int_{V_1}^{V_2} \lambda e^{-\lambda t}\,dt = e^{-\lambda V_1} - e^{-\lambda V_2}. $$
Example 1.3.2. It is observed that the failure pattern of an electronic system follows an exponential distribu-
tion with a mean time to failure of 1000 hours. What is the probability that the system fails within 750
hours?
Solution. MTTF = 1/λ = 1000 hours, where λ is the constant hazard rate. Thus,
$$ \lambda = \frac{1}{1000} \quad\text{and}\quad f_d(t) = \lambda e^{-\lambda t}. $$
Hence the probability that the system fails within a period V is
$$ F(V) = \int_0^V f_d(t)\,dt = \int_0^V \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda V}. $$
For V = 750 hours, F(750) = 1 − e^{−0.75} ≈ 0.5276.
Unit 2
Course Structure
• Linearly Increasing Hazard
• System Reliability
• Redundancy
2.1 Linearly Increasing Hazard
Suppose the hazard rate increases linearly with time, say Z(t) = kt with k > 0. Then
$$ R(t) = e^{-\int_0^t Z(\xi)\,d\xi} = e^{-\frac{k}{2}t^2} \quad\text{and}\quad f_d(t) = Z(t)\,R(t) = kt\,e^{-\frac{k}{2}t^2}. $$
Differentiating,
$$ \frac{d}{dt}f_d(t) = k\,e^{-\frac{k}{2}t^2}\,(1 - kt^2) $$
and
$$ \frac{d^2}{dt^2}f_d(t) = k(-kt)\,e^{-\frac{k}{2}t^2}(1 - kt^2) + k\,e^{-\frac{k}{2}t^2}(-2kt) = -3k^2 t\,e^{-\frac{k}{2}t^2} + k^3 t^3\,e^{-\frac{k}{2}t^2}. $$
Now,
$$ \frac{d}{dt}f_d(t) = 0 \;\Rightarrow\; t = \frac{1}{\sqrt{k}} \quad [\text{since } t > 0]. $$
At t = 1/√k,
$$ \frac{d^2}{dt^2}f_d(t) = -\frac{3k^2}{\sqrt{k}}\,e^{-1/2} + \frac{k^3}{k\sqrt{k}}\,e^{-1/2} = -2k\sqrt{k}\,e^{-1/2} = -\frac{2k\sqrt{k}}{\sqrt{e}} < 0, $$
and
$$ f_d(t)\Big|_{t = 1/\sqrt{k}} = k\cdot\frac{1}{\sqrt{k}}\,e^{-1/2} = \sqrt{k}\,e^{-1/2} = \sqrt{\frac{k}{e}}. $$
Hence f_d(t) reaches a maximum value √(k/e) at t = 1/√k and tends to zero as t becomes larger. Now we calculate
the MTTF when the hazard rate increases linearly:
$$ \text{MTTF} = \int_0^\infty R(t)\,dt = \int_0^\infty e^{-\frac{k}{2}t^2}\,dt = \sqrt{\frac{2}{k}}\int_0^\infty e^{-z^2}\,dz \quad \left[\text{put } \sqrt{\tfrac{k}{2}}\,t = z\right] = \sqrt{\frac{2}{k}}\cdot\frac{\sqrt{\pi}}{2} = \sqrt{\frac{\pi}{2k}}. $$
2.2 System Reliability
A. Series Configuration: A series system consists of n units connected in series. Let the successful operation of these individ-
ual units be represented by X1, X2, . . . , Xn and their respective probabilities by P(X1), P(X2), . . . , P(Xn).
For the successful operation of the system it is necessary that all n units function satisfactorily. Hence
the probability of successful operation of the system is P(X1 and X2 and · · · and Xn).
We shall not assume that these units are independent of one another; that is, the successful operation of
unit 1 might affect the successful operation of the other units, and so on.
The system reliability is given by
$$ P(S) = P(X_1)\,P(X_2|X_1)\,P(X_3|X_1 X_2)\cdots P(X_n|X_1 X_2\cdots X_{n-1}), $$
which for independent units reduces to P(S) = P(X1)P(X2) · · · P(Xn).
Example 2.2.1. In a hydraulic control system, the connecting linkage has a reliability factor 0.98 and
the valve has a reliability factor 0.92. Also the pressure sensor which activates the linkage has
a reliability factor 0.90. Assume that the three elements, namely the activator, the linkage and the
hydraulic valve, are connected in series with independent reliability factors. What is the reliability of
the control system?
Solution. Let the successful operation of the activator, the linkage and the hydraulic valve be denoted
by X1, X2 and X3 respectively. Thus P(X1) = 0.90, P(X2) = 0.98, P(X3) = 0.92.
Since these elements are connected in series with independent reliability factors, the reliability of
the control system S (say) is
$$ P(S) = P(X_1)\,P(X_2)\,P(X_3) = 0.90 \times 0.98 \times 0.92 \approx 0.811. $$
Note 2.2.2. There is an important point here: the reliability of a series system is never better than that of the
poorest component of the system.
Example 2.2.3. If the system consists of n identical units in series and if each unit has a reliability
factor p, determine the system reliability under the assumption that all units function independently.
Solution. P (S) = p.p . . . p(n times) = pn . Now, if q is the probability of failure of each unit, then
p = 1 − q.
Hence the system reliability
P (S) = pn = (1 − q)n = 1 − nq + · · ·
Example 2.2.4. A system has 10 identical equipments in series. It is desired that the system reliability be 0.95.
Determine how good each component should be.
Solution. If p is the reliability of each component, then p^10 = 0.95, so p = (0.95)^{1/10} ≈ 0.9949; each
component must have a reliability of about 99.5%.
B. Parallel Configuration: Several systems exist in which successful operation depends on the satisfac-
tory functioning of any one of their n subsystems or elements. These are said to be connected in parallel.
We can also consider a system in which several signal paths perform the same operation, and the satisfactory
performance of any one of these paths is sufficient to ensure the successful operation of the system. The
elements of such a system are also said to be connected in parallel.
A block diagram representing a parallel configuration is shown in the figure below. The reliability of the
system can be calculated very easily by considering the conditions for system failure.
Let X1, X2, . . . , Xn represent the successful operation of units 1, 2, . . . , n respectively. Similarly, let
X̄1, X̄2, . . . , X̄n represent their unsuccessful operation, that is, the failure of the units.
If P(X1) is the probability of successful operation of unit 1, then P(X̄1) = 1 − P(X1), and so on.
For the complete failure of the system S, all the n units have to fail simultaneously. If P(S̄) is the
probability of failure of the system, then
$$ P(\bar S) = P(\bar X_1 \text{ and } \bar X_2 \text{ and } \cdots \text{ and } \bar X_n) = P(\bar X_1)\,P(\bar X_2|\bar X_1)\,P(\bar X_3|\bar X_1\bar X_2)\cdots P(\bar X_n|\bar X_1\bar X_2\cdots\bar X_{n-1}). $$
The expression P(X̄3|X̄1X̄2) represents the probability of failure of unit 3 under the condition that
units 1 and 2 have failed. The other terms are interpreted in the same manner. If the unit failures are
independent of one another, then
$$ P(\bar S) = P(\bar X_1)\,P(\bar X_2)\cdots P(\bar X_n) = [1 - P(X_1)][1 - P(X_2)]\cdots[1 - P(X_n)]. $$
Since the system operates successfully if at least one unit does not fail, the probability of successful
operation of the system is P(S) = 1 − P(S̄).
For independent failures, P(S) = 1 − [1 − P(X1)][1 − P(X2)] · · · [1 − P(Xn)]. If the n elements are
identical and the unit failures are independent of one another, then
$$ P(S) = 1 - (1 - P(X))^n. $$
Example 2.2.5. Three identical elements, each with a low reliability factor of 0.10, are connected in parallel
and fail independently. Find the system reliability.
Solution. P(S) = 1 − (1 − 0.10)³ = 1 − 0.729 = 0.271. This reveals the important fact that a parallel
configuration can greatly increase system reliability: with just three elements connected in parallel, the
reliability rises from 0.10 to 0.271.
Example 2.2.6. A parallel system is composed of 10 independent identical components. If the system
reliability P (S), is to be 0.95, how poor can the components be?
Solution. Let P(X) be the probability of successful operation of each component. Then
$$ 0.95 = 1 - (1 - P(X))^{10} \;\Rightarrow\; (1 - P(X))^{10} = 0.05 \;\Rightarrow\; 1 - P(X) = (0.05)^{1/10} \approx 0.7411, $$
so P(X) ≈ 0.2589. Each component can have a very low reliability factor of 0.2589 and still give the system a
reliability factor as high as 0.95.
(KU 2011)
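Under independence, both configurations are one-liners; a sketch reproducing Examples 2.2.1, 2.2.5 and 2.2.6 (the function names are ours):

    from functools import reduce

    def series(ps):
        """Reliability of independent units in series: product of reliabilities."""
        return reduce(lambda a, b: a * b, ps, 1.0)

    def parallel(ps):
        """Reliability of independent units in parallel: 1 - product of failure probabilities."""
        return 1.0 - reduce(lambda a, b: a * b, [1.0 - p for p in ps], 1.0)

    print(series([0.90, 0.98, 0.92]))   # Example 2.2.1 -> ~0.811
    print(parallel([0.10] * 3))         # Example 2.2.5 -> 0.271
    print(1 - 0.05 ** (1 / 10))         # Example 2.2.6 -> ~0.2589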
2.3 Redundancy
If the state of the art is such that either it is not possible to produce highly reliable components or the cost of
producing such components is very high, then we can improve the system reliability by the technique of
introducing redundancies.
This involves the deliberate creation of new parallel paths in a system. If two elements A and B with probabilities
of success P(A) and P(B) are connected in parallel, then the probability of successful operation of the
system is
$$ P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) = P(A) + P(B) - P(A)P(B). $$
Unit 3
Course Structure
• Information Theory: Fundamentals of Information theory
3.1 Introduction
In everyday life we observe that there are numerous means for the transmission of information. For example,
the information is usually transmitted by means of a human voice, i.e., as in telephone, radio, television etc.,
by means of letters, newspapers, books, etc. We often come across sentences that convey information in this everyday sense.
But few people have suspected that it is really possible to measure information quantitatively. An amount
of information has a useful numeric value just like an amount of sugar or an amount of bank balance. For
example, suppose a man goes to a new community to rent a house and asks an unreliable agent “is this house
cool in summer season?" If the agent answers ‘yes’, the man has received very little information, because
more than likely the agent would have answered 'yes' regardless of the facts. If, on the other hand, the man
has a friend who lives in a neighbouring house, he can get more information by asking his friend the same
question because the answer will be more reliable.
In a general way it would appear that the amount of information in a message should be measured by the extent
of the change in probability produced by the message. There are at least three essential parts in the simplest
communication system:
• Transmitter or Source,
• Communication channel or transmission network which carries the message from the transmitter to the
receiver,
• Receiver or Sink
3.2 Fundamental theorem of information theory
3.2.1 Origination
Information theory is an appealing name assigned to a scientific discipline which deals with the mathe-
matical theory of communication. The origin of information theory dates back to the work of R. V. L. Hartley
("Transmission of Information", Bell System Technical Journal, vol. 7, 1928), who tried to develop a quantitative
measure of information in the telecommunication system. The field of information theory grew consider-
ably after the publication of C. E. Shannon's "A Mathematical Theory of Communication" (Bell System Technical
Journal, vol. 27, 1948). Information theory answers two fundamental questions in communication systems.
For this reason, some consider information theory a subset of communication theory. Indeed, it has made fun-
damental contributions to statistical physics, computer science, probability and statistics, biology, economics,
etc. We seek information only when we are in doubt, which arises when there are a number of alternatives and we
are uncertain about the outcome of an event. On the other hand, if the event can occur in just one way, there
is no uncertainty about it and no information is called for. We get some information from the occurrence of an
event only when there was some uncertainty before its occurrence. Therefore, the amount of information received
must be equal to the amount of uncertainty that existed before the occurrence of the event.
Consider two events E1 and E2 with probabilities of occurrence p1 and p2 respectively, with p1 < p2.
Then the event E2 is more likely to occur, and so the message conveying the occurrence of E2 contains less
information than that conveying the occurrence of E1. Further, if p2 is continually decreased towards
p1, the uncertainty associated with the occurrence of E2 increases continually towards that associated with E1.
The above intuitive idea suggests that the measure of information conveyed by a message stating the
occurrence of an event with probability p must be a function of p only, say h(p), which is non-negative,
strictly decreasing, continuous, and satisfies h(1) = 0. Also h(p) is very large when p is nearly equal to 0.
Next consider two events E1 and E2 with probabilities of occurrence p1 and p2 respectively. If we are told
that the event E1 has occurred, then we have received an amount of information h(p1). Given this message,
the probability that E2 will occur is
$$ p_{21} = p(E_2|E_1). $$
Suppose now we are told that the event E2 has also occurred. Then the additional amount of information
received is h(p21).
Therefore, the total amount of information received from these two successive messages is
$$ h(p_1) + h(p_{21}). $$
If E1 and E2 are independent, then p21 = p2.
Again, the probability of both the events E1 and E2 is p1 p2 and the amount of information conveyed by the
message stating that both the events E1 and E2 have occurred is h(p1 p2 ).
Thus from the above discussion we see that the amount of information received from the message stating
that the event E with probability p has occurred is a function of p only, say h(p) and has the following
characterisations.
(i) h(p) is non-negative, continuous and strictly decreasing function in p in (0, 1].
(ii) h(1) = 0 and h(p) is very large when p is very close to 0, i.e., h(p) → ∞ as p → 0.
(iii) if E1 and E2 are independent events with probabilities of occurrence p1 and p2 respectively, then the
amount of information conveyed by the message stating the occurrence of both events E1 and E2 is
equal to the amount of information conveyed by the message dealing with the event E1 plus the amount
of information dealing with the event E2, i.e.,
$$ h(p_1 p_2) = h(p_1) + h(p_2). $$
Theorem 3.3.1. Let h(p) denote the amount of information received form the message stating the event E
with probability p has occurred. Then
h(p) = −k log p,
where, k is a positive constant.
Proof. Take any p ∈ (0, 1] and let n be a positive integer. We first show that
$$ h(p^n) = n\,h(p). \tag{3.3.1} $$
Clearly, (3.3.1) holds for n = 1.
3.4 Entropy (Shannon's Definition)
Let X be a random variable taking the values x1, x2, . . . , xn with probabilities p1, p2, . . . , pn. The entropy
of X is defined by
$$ H(X) = H_n(p_1, p_2, \ldots, p_n) = -\sum_{i=1}^{n} p_i\log p_i. $$
1. Clearly, H(X) ≥ 0.
Note 3.4.1. x log x → 0 as x → 0+, and we have used the convention that 0 log 0 = 0.
2. If p_i = 1/n, i = 1, 2, . . . , n, then
$$ H_n(p_1, p_2, \ldots, p_n) = H_n\!\left(\frac1n, \frac1n, \cdots, \frac1n\right) = -\sum_{i=1}^{n}\frac1n\log\frac1n = \log n. $$
So H_n(1/n, 1/n, · · · , 1/n) is a strictly increasing function of n.
3. Let
s1 = p1 + p2 + . . . + p_{n1},
s2 = p_{n1+1} + p_{n1+2} + . . . + p_{n2},
. . . . . . . . . . . . ,
sk = p_{n_{k−1}+1} + p_{n_{k−1}+2} + . . . + p_{n_k} (n_k = n),
and m1 = n1, m2 = n2 − n1, . . . , mk = nk − n_{k−1}. Then
$$ H_n(p_1,\ldots,p_n) = H_k(s_1,\ldots,s_k) + s_1 H_{m_1}\!\left(\frac{p_1}{s_1},\ldots,\frac{p_{n_1}}{s_1}\right) + s_2 H_{m_2}\!\left(\frac{p_{n_1+1}}{s_2},\ldots,\frac{p_{n_2}}{s_2}\right) + \cdots + s_k H_{m_k}\!\left(\frac{p_{n_{k-1}+1}}{s_k},\ldots,\frac{p_{n_k}}{s_k}\right). $$
The above relation may be expressed as follows:
If a random experiment is decomposed into several successive ones, then the original value of H is equal
to the weighted sum of the corresponding values of H with weights 1, s1, s2, . . . , sk.
Now, we have
$$ H_k(s_1, s_2, \ldots, s_k) = -\sum_{i=1}^{k} s_i\log s_i $$
and
$$ s_1 H_{m_1}\!\left(\frac{p_1}{s_1},\ldots,\frac{p_{n_1}}{s_1}\right) = -s_1\sum_{i=1}^{n_1}\frac{p_i}{s_1}\log\frac{p_i}{s_1} = -\sum_{i=1}^{n_1}p_i\log p_i + s_1\log s_1. $$
Similarly, we find
$$ s_2 H_{m_2}\!\left(\frac{p_{n_1+1}}{s_2},\ldots,\frac{p_{n_2}}{s_2}\right) = -\sum_{i=n_1+1}^{n_2}p_i\log p_i + s_2\log s_2, \;\ldots,\; s_k H_{m_k}\!\left(\frac{p_{n_{k-1}+1}}{s_k},\ldots,\frac{p_{n_k}}{s_k}\right) = -\sum_{i=n_{k-1}+1}^{n_k}p_i\log p_i + s_k\log s_k. $$
Theorem 3.4.2. For a fixed n, the entropy function Hn(p1, p2, . . . , pn) is maximum when p1 = p2 = . . . =
pn = 1/n, and Hn(max) = log n.
Proof. We first show that log x ≤ x − 1 for all x > 0, with equality iff x = 1. Let φ(x) = x − 1 − log x.
Then φ′(x) = 1 − 1/x. If x > 1 then φ′(x) > 0, and if 0 < x < 1 then φ′(x) < 0.
So φ(x) is strictly increasing in (1, ∞) and strictly decreasing in (0, 1); hence φ(x) ≥ φ(1) = 0, i.e., log x ≤ x − 1.
Note 3.4.4. The entropy of X may be interpreted as the expected value of the function log(1/p_i), where p_i is the
p.m.f. of X. Thus
$$ E\!\left[\log\frac{1}{p_i}\right] = \sum_{i=1}^{n} p_i\log\frac{1}{p_i} = -\sum_{i=1}^{n} p_i\log p_i = H(X). $$
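Shannon's entropy is immediate to compute. A minimal sketch (log base 2, with the 0 log 0 = 0 convention of Note 3.4.1):

    import math

    def entropy(ps, base=2):
        """H = -sum p log p, with the convention 0 log 0 = 0."""
        return -sum(p * math.log(p, base) for p in ps if p > 0)

    print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits
    n = 8
    print(entropy([1 / n] * n))                # log2 n = 3, the maximum for n = 8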
Unit 4
Course Structure
• Bivariate Information Theory
• Joint, conditional and relative entropies
• Mutual Information
Similarly, we can show that
$$ H(Y|X) = E_{p(x,y)}\!\left[\log\frac{1}{p(y|x)}\right]. $$
(iii) The relative entropy or Kullback-Leibler distance between two probability mass functions p(x) and q(x)
with X = {x1, x2, . . . , xm} is defined as
$$ D(p\|q) = \sum_{i=1}^{m} p(x_i)\log\frac{p(x_i)}{q(x_i)} = E_{p(x)}\!\left[\log\frac{p(x)}{q(x)}\right]. $$
Now, let pk = 0 for some k with qk ≠ 0, while pi > 0, qi > 0 for i ≠ k. Then (4.2.2) clearly holds, because
pk log pk = 0 and pk log qk = 0. If qk = 0 for some k but pk ≠ 0, then pk log qk = −∞, and so (4.2.2) holds trivially.
Suppose that equality holds in (4.2.2) but pk ≠ qk for some k. Then qk/pk ≠ 1, and so
$$ \log\frac{q_k}{p_k} < \frac{q_k}{p_k} - 1. $$
This gives
$$ \sum_{i=1}^n p_i\log\frac{q_i}{p_i} < \sum_{i=1}^n (q_i - p_i) = 0 \;\Rightarrow\; \sum_{i=1}^n p_i\log q_i < \sum_{i=1}^n p_i\log p_i, $$
which contradicts the assumed equality in (4.2.2), since here equality does not hold (qk/pk ≠ 1).
∴ pi = qi for all i.
Now let the base D ≠ e. Then log_D x = log_D e · log_e x for any x > 0, and log_D e > 0. So, multiplying (4.2.2) by
log_D e we get
$$ \sum_{i=1}^n p_i\log_D q_i \le \sum_{i=1}^n p_i\log_D p_i. $$
Proof. Let X, Y be two discrete random variables with ranges X = {x1, x2, . . . , xm}, Y = {y1, y2, . . . , yn}
and probability mass functions (p.m.f.) p(x) and q(y), with joint p.m.f. p(x, y) = P(X = x, Y = y). We have
$$ H(X) + H(Y) = -\sum_{i=1}^m p(x_i)\log p(x_i) - \sum_{j=1}^n q(y_j)\log q(y_j) = -\sum_{i=1}^m\sum_{j=1}^n p(x_i,y_j)\log p(x_i) - \sum_{i=1}^m\sum_{j=1}^n p(x_i,y_j)\log q(y_j) = -\sum_{i=1}^m\sum_{j=1}^n p(x_i,y_j)\log\big(p(x_i)\,q(y_j)\big). $$
Also,
$$ H(X,Y) = -\sum_{i=1}^m\sum_{j=1}^n p(x_i,y_j)\log p(x_i,y_j). $$
Now,
$$ \sum_{i=1}^m\sum_{j=1}^n p(x_i)q(y_j) = \sum_{i=1}^m p(x_i)\sum_{j=1}^n q(y_j) = 1 \quad\text{and}\quad \sum_{i=1}^m\sum_{j=1}^n p(x_i,y_j) = 1. $$
Proof. Let X, Y, Z be three discrete random variables with ranges X = {x1, . . . , xm}, Y = {y1, . . . , yn}
and Z = {z1, . . . , zk} respectively, with joint p.m.f. p(x, y, z) = P(X = x, Y = y, Z = z). Then
$$ H(X|Z) + H\big(Y|(X,Z)\big) = -\sum_{i=1}^m\sum_{l=1}^k p(x_i,z_l)\log p(x_i|z_l) - \sum_{i=1}^m\sum_{j=1}^n\sum_{l=1}^k p(x_i,y_j,z_l)\log p\big(y_j|(x_i,z_l)\big) $$
$$ = -\sum_{i,j,l} p(x_i,y_j,z_l)\log p(x_i|z_l) - \sum_{i,j,l} p(x_i,y_j,z_l)\log p\big(y_j|(x_i,z_l)\big) $$
$$ = -\sum_{i,j,l} p(x_i,y_j,z_l)\log\Big(p(x_i|z_l)\,p\big(y_j|(x_i,z_l)\big)\Big) = -\sum_{i,j,l} p(x_i,y_j,z_l)\log\left(\frac{p(x_i,z_l)}{p(z_l)}\cdot\frac{p(x_i,y_j,z_l)}{p(x_i,z_l)}\right) $$
$$ = -\sum_{i,j,l} p(x_i,y_j,z_l)\log\frac{p(x_i,y_j,z_l)}{p(z_l)} = -\sum_{i,j,l} p(x_i,y_j,z_l)\log p\big((x_i,y_j)|z_l\big) = H\big((X,Y)|Z\big). $$
ii) The conditional mutual information of random variables X and Y given Z1, Z2, . . . , Zn is defined,
analogously to I(X; Y|Z) below, by
$$ I(X;Y|Z_1,\ldots,Z_n) = E\!\left[\log\frac{p(x,y|z_1,\ldots,z_n)}{p(x|z_1,\ldots,z_n)\,p(y|z_1,\ldots,z_n)}\right]. $$
Theorem 4.2.10. (Information inequality): Let p(x) and q(x) for x ∈ X be two probability mass functions.
Then
$$ D(p\|q) \ge 0. $$
Proof. We have
$$ D(p\|q) = \sum_{i=1}^n p(x_i)\log\frac{p(x_i)}{q(x_i)} = \sum_{i=1}^n p_i\log\frac{p_i}{q_i}. $$
Also we have Σ_{i=1}^n pi = Σ_{i=1}^n qi = 1. So, by Theorem 4.2.1,
$$ \sum_{i=1}^n p_i\log q_i \le \sum_{i=1}^n p_i\log p_i \;\Rightarrow\; \sum_{i=1}^n p_i\log\frac{p_i}{q_i} \ge 0 \;\Rightarrow\; D(p\|q) \ge 0. $$
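The relative entropy and the inequality D(p‖q) ≥ 0 can be checked directly in code; a sketch (our own illustration):

    import math

    def kl_divergence(p, q, base=2):
        """D(p||q) = sum p_i log(p_i / q_i); infinite if q_i = 0 while p_i > 0."""
        d = 0.0
        for pi, qi in zip(p, q):
            if pi > 0:
                if qi == 0:
                    return math.inf
                d += pi * math.log(pi / qi, base)
        return d

    p = [0.5, 0.25, 0.25]
    q = [1 / 3, 1 / 3, 1 / 3]
    print(kl_divergence(p, q))  # > 0
    print(kl_divergence(p, p))  # = 0; equality holds iff p = q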
Theorem 4.2.11. (Non-negativity of mutual information) For any two random variables X and Y , I(X; Y ) ≥
0.
Proof. Let X and Y be two discrete random variables with ranges {x1, . . . , xm} and {y1, . . . , yn} re-
spectively, p.m.f.s p(x) and q(y), and joint p.m.f. p(x, y) = P(X = x, Y = y). Then the mutual information
I(X; Y) between X and Y is given by
$$ I(X;Y) = \sum_{i=1}^m\sum_{j=1}^n p(x_i,y_j)\log\frac{p(x_i,y_j)}{p(x_i)\,q(y_j)}. $$
Now, we have
$$ \sum_{i=1}^m\sum_{j=1}^n p(x_i,y_j) = 1 \quad\text{and}\quad \sum_{i=1}^m\sum_{j=1}^n p(x_i)q(y_j) = \sum_{i=1}^m p(x_i)\sum_{j=1}^n q(y_j) = 1. $$
So by Theorem 4.2.1,
$$ \sum_{i=1}^m\sum_{j=1}^n p(x_i,y_j)\log p(x_i,y_j) \ge \sum_{i=1}^m\sum_{j=1}^n p(x_i,y_j)\log\big(p(x_i)q(y_j)\big), $$
i.e.,
$$ \sum_{i=1}^m\sum_{j=1}^n p(x_i,y_j)\log\frac{p(x_i,y_j)}{p(x_i)q(y_j)} \ge 0, \quad\text{i.e.,}\quad I(X;Y) \ge 0. $$
Theorem 4.2.12. (Non-negativity of conditional mutual information) For any two random variables X and
Y given Z, the conditional mutual information I(X; Y |Z) ≥ 0.
Proof. Let X, Y, Z be three discrete random variables with ranges {x1, . . . , xm}, {y1, . . . , yn},
{z1, . . . , zk} respectively, and joint p.m.f. p(x, y, z) = P(X = x, Y = y, Z = z).
Then by definition,
$$ I(X;Y|Z) = \sum_{i=1}^m\sum_{j=1}^n\sum_{l=1}^k p(x_i,y_j,z_l)\log\frac{p(x_i,y_j|z_l)}{p(x_i|z_l)\,p(y_j|z_l)}. $$
Now,
$$ \sum_{i=1}^m\sum_{j=1}^n\sum_{l=1}^k p(x_i,y_j,z_l) = 1, $$
and
$$ \sum_{i,j,l}\frac{p(x_i,z_l)\,p(y_j,z_l)}{p(z_l)} = \sum_{i,l} p(x_i,z_l)\sum_{j}\frac{p(y_j,z_l)}{p(z_l)} = \sum_{i,l} p(x_i,z_l)\cdot\frac{p(z_l)}{p(z_l)} = \sum_{i,l} p(x_i,z_l) = 1 \quad\left[\because \sum_j p(y_j,z_l) = p(z_l)\right]. $$
So by Theorem 4.2.1, I(X; Y|Z) ≥ 0.
Unit 5
Course Structure
• Conditional Relative Entropy
• Channel Capacity
• Redundancy
(Log-sum inequality.) For non-negative numbers a1, . . . , an and b1, . . . , bn,
$$ \sum_{i=1}^n a_i\log_D\frac{a_i}{b_i} \;\ge\; \left(\sum_{i=1}^n a_i\right)\log_D\frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}, $$
with equality iff ai/bi is constant. To prove this, consider
$$ \lambda = \sum_{i=1}^n b_i, \qquad \alpha_i = \frac{b_i}{\lambda}, \qquad t_i = \frac{a_i}{b_i}. $$
Then Σ αi = 1 and αi > 0 for all i. So, by Jensen's inequality applied to the convex function f(t) = t log_D t, we have
$$ \sum_{i=1}^n \alpha_i f(t_i) \ge f\!\left(\sum_{i=1}^n \alpha_i t_i\right) \tag{5.1.1} $$
$$ \Rightarrow\; \sum_{i=1}^n \frac{b_i}{\lambda}\cdot\frac{a_i}{b_i}\log_D\frac{a_i}{b_i} \;\ge\; \left(\sum_{i=1}^n\frac{b_i}{\lambda}\cdot\frac{a_i}{b_i}\right)\log_D\left(\sum_{i=1}^n\frac{b_i}{\lambda}\cdot\frac{a_i}{b_i}\right) $$
$$ \Rightarrow\; \sum_{i=1}^n a_i\log_D\frac{a_i}{b_i} \;\ge\; \left(\sum_{i=1}^n a_i\right)\log_D\frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}. \tag{5.1.2} $$
If ai/bi = constant = k (say), for i = 1, 2, . . . , n, then clearly equality in (5.1.2) holds.
Conversely, equality in Jensen's inequality for the strictly convex f requires
$$ t_1 = t_2 = \ldots = t_n \;\Rightarrow\; \frac{a_1}{b_1} = \frac{a_2}{b_2} = \ldots = \frac{a_n}{b_n}, $$
i.e., ai/bi = constant, i = 1, 2, . . . , n.
Theorem 5.1.2. D(p||q) is convex in the pair (p, q); i.e., if (p1, q1), (p2, q2) are two pairs of probability mass
functions and λ > 0, µ > 0 with λ + µ = 1, then
$$ D(\lambda p_1 + \mu p_2\,\|\,\lambda q_1 + \mu q_2) \le \lambda\,D(p_1\|q_1) + \mu\,D(p_2\|q_2). $$
Proof. Let (p1, q1) and (p2, q2) be two pairs of probability mass functions and λ > 0, µ > 0 with λ + µ = 1.
Then the result follows by applying the log-sum inequality term by term.
For the entropy function H = −Σ pi log_D pi, we have
$$ \frac{\partial H}{\partial p_i} = -(1 + \log_e p_i)\log_D e, \qquad \frac{\partial^2 H}{\partial p_i^2} = -\frac{1}{p_i}\log_D e, \qquad \frac{\partial^2 H}{\partial p_i\,\partial p_j} = 0, \; i \ne j. $$
Proof. We have
$$ D\big(p(y|x)\,\|\,q(y|x)\big) = \sum_x\sum_y p(x,y)\log\frac{p(y|x)}{q(y|x)} = \sum_x\sum_y p(x,y)\log\left(\frac{p(x,y)}{q(x,y)}\cdot\frac{q(x)}{p(x)}\right). $$
Now,
$$ \sum_x\sum_y p(x,y)\cdot\frac{q(x,y)\,p(x)}{p(x,y)\,q(x)} = \sum_x\frac{p(x)}{q(x)}\sum_y q(x,y) = \sum_x\frac{p(x)}{q(x)}\,q(x) = \sum_x p(x) = 1, $$
so the non-negativity follows as in Theorem 4.2.1.
Example 5.1.5. In a certain community, 25% of all girls are blondes, and 75% of all blondes are blue-eyed.
Also, 50% of all girls in the community have blue eyes. If you know that a girl has blue eyes, how much
additional information do you get by being informed that she is blonde?
Solution. Let p1 = probability of a girl being blonde = 0.25,
p2 = probability of a girl having blue eyes if she is blonde = p(blue eyes | blonde) = 0.75,
p3 = probability of a girl having blue eyes = 0.50,
p4 = p(blonde, blue eyes) = probability that a girl is blonde and has blue eyes,
and px = p(blonde | blue eyes) = probability that a blue-eyed girl is blonde = ?
Then
$$ p_4 = p_1 p_2 = p_3 p_x \;\Rightarrow\; p_x = \frac{p_1 p_2}{p_3} = \frac{0.25 \times 0.75}{0.50} = 0.375. $$
If a girl has blue eyes, the additional information obtained by being informed that she is blonde is
$$ \log_2\frac{1}{p_x} = \log_2\frac{p_3}{p_1 p_2} = \log_2 p_3 - \log_2 p_1 - \log_2 p_2 = \log_2\frac12 - \log_2\frac14 - \log_2\frac34 = \log_2 4 + \log_2\frac43 - \log_2 2 = 1.41503 \approx 1.42 \text{ bits}. $$
Example 5.1.6. Evaluate the average uncertainty associated with the events A, B, C, D having
probabilities 1/2, 1/4, 1/8, 1/8 respectively.
Solution. We have
$$ H\!\left(\frac12,\frac14,\frac18,\frac18\right) = -\frac12\log_2\frac12 - \frac14\log_2\frac14 - \frac18\log_2\frac18 - \frac18\log_2\frac18 = \frac12\log_2 2 + \frac12\log_2 2 + \frac34\log_2 2 = \left(\frac12 + \frac12 + \frac34\right)\log_2 2 = \frac74 \text{ bits}. $$
Example 5.1.7. A transmitter has an alphabet consisting of 5 letters {x1, x2, x3, x4, x5} and the receiver has
an alphabet consisting of 4 letters {y1, y2, y3, y4}. The joint probabilities for communication are given below:
        y1      y2      y3      y4
x1     0.25    0.00    0.00    0.00
x2     0.10    0.30    0.00    0.00
x3     0.00    0.05    0.10    0.00
x4     0.00    0.00    0.05    0.10
x5     0.00    0.00    0.05    0.00
Determine the marginal, conditional and joint entropies for this channel. (Assume 0 log 0 ≡ 0.)
Solution. The channel is described here by the joint probabilities pij, i = 1, 2, 3, 4, 5; j = 1, 2, 3, 4. The
conditional and marginal probabilities are easily obtained from the pij's. Using p_{j|i} = p_{ij}/p_{i0},
the conditional probabilities give the following channel matrix:
        y1      y2      y3      y4
x1      1       0       0       0
x2     1/4     3/4      0       0
x3      0      1/3     2/3      0
x4      0       0      1/3     2/3
x5      0       0       1       0
Marginal entropies: the marginal probabilities are p_{i0} = (0.25, 0.40, 0.15, 0.15, 0.05) and
p_{0j} = (0.35, 0.35, 0.20, 0.10).
$$ H(X) = -\sum_{i=1}^5 p_{i0}\log_2 p_{i0} = -(0.25\log_2 0.25 + 0.40\log_2 0.40 + 0.15\log_2 0.15 + 0.15\log_2 0.15 + 0.05\log_2 0.05) = 2.0660 \text{ bits}, $$
$$ H(Y) = -\sum_{j=1}^4 p_{0j}\log_2 p_{0j} = 1.8568 \text{ bits}. $$
Conditional entropies:
$$ H(Y|X) = -\sum_{i=1}^5\sum_{j=1}^4 p_{ij}\log_2 p_{j|i} = 0.6 \text{ bits}, $$
$$ H(X|Y) = H(X) + H(Y|X) - H(Y) = 2.0660 + 0.6 - 1.8568 = 0.8092 \text{ bits}. $$
Joint entropy:
$$ H(X,Y) = H(X) + H(Y|X) = 2.0660 + 0.6 = 2.6660 \text{ bits}. $$
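These entropies follow mechanically from the joint matrix; a sketch that recomputes them (our own illustration):

    import math

    def H(ps):
        return -sum(p * math.log2(p) for p in ps if p > 0)

    joint = [
        [0.25, 0.00, 0.00, 0.00],
        [0.10, 0.30, 0.00, 0.00],
        [0.00, 0.05, 0.10, 0.00],
        [0.00, 0.00, 0.05, 0.10],
        [0.00, 0.00, 0.05, 0.00],
    ]
    px = [sum(row) for row in joint]             # marginal of X
    py = [sum(col) for col in zip(*joint)]       # marginal of Y
    HXY = H([p for row in joint for p in row])   # joint entropy
    print(H(px), H(py), HXY)                     # ≈ 2.066, 1.857, 2.666
    print(HXY - H(px))                           # H(Y|X) = 0.6
    print(HXY - H(py))                           # H(X|Y) ≈ 0.809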
5.2 Channel Capacity
Since max{H(X)} occurs when all symbols have equal probabilities, the channel capacity for a noise-free
channel is
$$ C = -\log_2\frac{1}{n} = \log_2 n \text{ bits/symbol}. $$
5.3 Redundancy
i) Absolute redundancy = C − I(X; Y) = C − H(X) = log n − H(X) (for a noise-free channel).
ii) Relative redundancy = (C − I(X; Y))/C = (log n − H(X))/log n = 1 − H(X)/log n.
iii) Efficiency of a noise-free system = H(X)/log n = 1 − relative redundancy.
Example 5.3.1. Find the capacity of the memoryless channel specified by the channel matrix
$$ P = \begin{pmatrix} \tfrac12 & \tfrac14 & \tfrac14 & 0 \\ \tfrac14 & \tfrac14 & \tfrac14 & \tfrac14 \\ 0 & 0 & 1 & 0 \\ \tfrac12 & 0 & 0 & \tfrac12 \end{pmatrix}. $$
Solution.
$$ C = \max I(X;Y) = \max\{H(X) + H(Y) - H(X,Y)\} = -\sum_{i=1}^4\sum_{j=1}^4 p_{ij}\log p_{ij}, $$
where the rows of P are
p_{1j} = (1/2, 1/4, 1/4, 0), p_{2j} = (1/4, 1/4, 1/4, 1/4), p_{3j} = (0, 0, 1, 0), p_{4j} = (1/2, 0, 0, 1/2).
Thus,
$$ C = 3\cdot\frac12\log_2 2 + 6\cdot\frac14\log_2 4 + 1\cdot\log_2 1 = \frac32\log_2 2 + 3\log_2 2 = \frac92 \text{ bits/symbol}. $$
Example 5.3.2. Show that the entropy of the following probability distribution is 2 − (1/2)^{n−2}:
Probabilities: 1/2, 1/2², . . . , 1/2^i, . . . , 1/2^{n−1}, 1/2^{n−1}.
Solution. Here
$$ p_i = \frac{1}{2^i}, \; i = 1, 2, \ldots, n-1, \quad\text{and}\quad p_n = \frac{1}{2^{n-1}}, $$
and
$$ \sum_{i=1}^n p_i = \frac12 + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} + \frac{1}{2^{n-1}} = \frac12\cdot\frac{1 - \frac{1}{2^{n-1}}}{1 - \frac12} + \frac{1}{2^{n-1}} = 1 - \frac{1}{2^{n-1}} + \frac{1}{2^{n-1}} = 1. $$
Now,
$$ H(p_1,\ldots,p_n) = -\sum_{i=1}^n p_i\log_2 p_i = \sum_{i=1}^{n-1}\frac{1}{2^i}\log_2(2^i) + \frac{1}{2^{n-1}}\log_2(2^{n-1}) = \sum_{i=1}^{n-1}\frac{i}{2^i} + \frac{n-1}{2^{n-1}}. \tag{5.3.1} $$
Multiplying by 1/2,
$$ \frac12 H(p_1,\ldots,p_n) = \sum_{i=1}^{n-1}\frac{i}{2^{i+1}} + \frac{n-1}{2^n}. \tag{5.3.2} $$
Subtracting (5.3.2) from (5.3.1),
$$ \frac12 H(p_1,\ldots,p_n) = \sum_{i=1}^{n-1}\frac{1}{2^i} - \frac{n-1}{2^n} + \frac{n-1}{2^{n-1}} - \frac{n-1}{2^n} = \left(1 - \frac{1}{2^{n-1}}\right) + (n-1)\left(\frac{1}{2^{n-1}} - \frac{2}{2^n}\right) = 1 - \frac{1}{2^{n-1}}, $$
hence
$$ H(p_1,\ldots,p_n) = 2 - \left(\frac12\right)^{n-2}. $$
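A quick numerical check of this closed form (our own illustration):

    import math

    def H(ps):
        return -sum(p * math.log2(p) for p in ps if p > 0)

    for n in range(2, 8):
        ps = [2.0 ** -i for i in range(1, n)] + [2.0 ** -(n - 1)]
        print(n, H(ps), 2 - 2.0 ** -(n - 2))   # the two columns agree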
Example 5.3.3. Let Φ = (p1, p2, . . .), pi ≥ 0, Σ pi = 1, be a probability distribution with
H(Φ) = −Σ pi log pi < ∞. Show that Σ_{i=1}^∞ pi log i < ∞.
Solution. Let us assume that the {pi} are decreasing in i, which is permissible because reordering the {pi}
does not affect the value of the entropy. Then
$$ 1 = \sum_{j=1}^\infty p_j \ge \sum_{j=1}^i p_j \ge i\,p_i, $$
so that pi ≤ 1/i, i.e., log i ≤ log(1/pi). Hence
$$ \sum_{i=1}^\infty p_i\log i \le -\sum_{i=1}^\infty p_i\log p_i = H(\Phi) < \infty. $$
Example 5.3.4. If the probability distribution Φ = (p1, p2, . . .), pi ≥ 0, Σ pi = 1, is such that
Σ_{i=1}^∞ pi log i < ∞, then show that H(Φ) = −Σ_{i=1}^∞ pi log pi < ∞.
Example 5.3.5. Let H be the entropy of the probability distribution p1, p2, . . . , pn. If H1 is the entropy of
the probability distribution p1 + p2, p3, . . . , pn, then show that
$$ H - H_1 = P_s H_s, \quad\text{where } P_s = p_1 + p_2 \text{ and } H_s = \frac{p_1}{P_s}\log\frac{P_s}{p_1} + \frac{p_2}{P_s}\log\frac{P_s}{p_2}. $$
Solution. We have
$$ H - H_1 = -p_1\log p_1 - p_2\log p_2 + (p_1 + p_2)\log(p_1 + p_2) = p_1\log\frac{P_s}{p_1} + p_2\log\frac{P_s}{p_2} = P_s\left[\frac{p_1}{P_s}\log\frac{P_s}{p_1} + \frac{p_2}{P_s}\log\frac{P_s}{p_2}\right] = P_s H_s. $$
Unit 6
Course Structure
• Coding Theory
6.1 Introduction
Coding theory is the study of methods for the efficient transfer of information from a source to a receiver.
The physical medium through which the information is transmitted is called the channel; the telephone
line and the atmosphere are examples of channels. The undesirable disturbances are called noise. The
following diagram provides a rough idea of a general information system:
Definition 6.1.1. Code: Let X be a random variable with range S = {x1, x2, . . . , xq} and let D be the set of
all finite strings of symbols from the D-ary alphabet {0, 1, 2, . . . , D − 1}. A mapping C : S → D will be
called a code for the random variable X; S is called the source alphabet and D is called the code alphabet.
If xi ∈ S, then C(xi) is called a codeword. The number of symbols in the codeword C(xi)
is called the length of the codeword and is denoted by l(xi).
Example 6.1.2. Let X be a random variable with range S = {x1 , x2 , x3 , x4 }, D = {0, 1} be the code
alphabet. Define C : S → D as follows
x1 → 0, x2 → 00, x3 → 01, x4 → 11
Definition 6.1.3. A code with code alphabet D = {0, 1} is called a binary code. A code with code alphabet
D = {0, 1, 2} is called a ternary code.
Definition 6.1.4. A code C is said to be non-singular code if the mapping C is one-to-one, i.e., if C(xi ) 6=
C(xj ) for xi 6= xj . Clearly the code C in Example 6.1.2 is a non-singular code.
Definition 6.1.5. Extension of a code: Let X be a random variable with range S = {x1, x2, . . . , xq},
D = {0, 1, 2, . . . , D − 1} the code alphabet, and C a code for the random variable X. The n-th
extension of C is the mapping C^n : S^n (= S × S × . . . × S, n times) → D defined by concatenation:
$$ C^n(x_{i_1} x_{i_2}\cdots x_{i_n}) = C(x_{i_1})\,C(x_{i_2})\cdots C(x_{i_n}). $$
Example 6.1.6. Let X be a random variable with range S = {x1 , x2 , x3 , x4 }, D = {0, 1} as the code alphabet
and C : S → D be a code defined by
x1 → 0, x2 → 00, x3 → 01, x4 → 11
Example 6.1.7. Let X be a random variable with range S = {x1 , x2 , x3 , x4 } and code alphabet D = {0, 1}
with p.m.f p(x) defined by
1 1 1
p(x1 ) = , p(x2 ) = , p(x3 ) = = p(x4 )
2 4 8
Let the code C be defined as follows:
Theorem 6.1.10. Every instantaneous code is uniquely decodable.
Proof. Let C : S → D be an instantaneous code for the random variable X. The codewords are C(x1), C(x2), . . . ,
C(xq). Since no codeword is a prefix of any other codeword, we have C(xi) ≠ C(xj) for xi ≠ xj,
so C is one-to-one. Assume C is not uniquely decodable. Then there is a smallest positive integer n > 1 such that
the 1st, 2nd, . . . , (n − 1)-th extensions of C are one-to-one but the n-th extension is not. So there exist
$$ x = x_{i_1} x_{i_2}\cdots x_{i_n} \quad\text{and}\quad y = y_{\nu_1} y_{\nu_2}\cdots y_{\nu_n} \;\text{ in } S^n \text{ such that } x \ne y \tag{6.1.1} $$
but
$$ C^n(x) = C^n(y). \tag{6.1.2} $$
Write x′ = x_{i_2} x_{i_3} · · · x_{i_n} and y′ = y_{ν_2} y_{ν_3} · · · y_{ν_n}; then x = x_{i_1} x′, y = y_{ν_1} y′, and
$$ C^n(x) = C(x_{i_1})\,C^{n-1}(x'), \tag{6.1.3} $$
$$ C^n(y) = C(y_{\nu_1})\,C^{n-1}(y'). \tag{6.1.4} $$
Suppose l(x_{i_1}) ≤ l(y_{ν_1}), where l(x_{i_1}) is the length of the codeword C(x_{i_1}) and l(y_{ν_1}) that of C(y_{ν_1}). From (6.1.2), (6.1.3) and
(6.1.4) it follows that the codeword C(x_{i_1}) is a prefix of the codeword C(y_{ν_1}). Since C is an instantaneous
code, it follows that
$$ x_{i_1} = y_{\nu_1} \;\Rightarrow\; C^{n-1}(x') = C^{n-1}(y'). $$
Since C^{n−1} is one-to-one, we have x′ = y′. So we have x = y [∵ x_{i_1} = y_{ν_1}], which contradicts (6.1.1).
Theorem 6.1.11. Kraft inequality for instantaneous codes: Let S = {x1, x2, . . . , xq} be the source al-
phabet and D = {0, 1, 2, . . . , D − 1} a code alphabet for a random variable X. Then a necessary and
sufficient condition for the existence of an instantaneous code for the random variable X with codeword
lengths l1, l2, . . . , lq formed by the elements of D is that
$$ \sum_{i=1}^q D^{-l_i} \le 1. $$
Proof. We first show that the condition is sufficient. Suppose we are given codeword lengths l1, l2, . . . , lq
satisfying the condition
$$ \sum_{i=1}^q D^{-l_i} \le 1. \tag{6.1.5} $$
We show that there exists an instantaneous code for the random variable X with these codeword lengths. The
lengths l1, l2, . . . , lq may or may not be distinct; we shall find it useful to consider all codewords of the same
length at a time. Let n_j denote the number of the l_i equal to j (j = 1, 2, . . . , l, where l is the largest length).
Then (6.1.5) can be written as Σ_{j=1}^{l} n_j D^{−j} ≤ 1, which is equivalent to the inequalities
$$ n_1 \le D, \quad n_2 \le D^2 - n_1 D, \quad n_3 \le D^3 - n_1 D^2 - n_2 D, \;\ldots \tag{6.1.10} $$
We form n1 codewords of length 1. Then there are (D − n1) unused symbols of length 1 which may be
used as prefixes. By adding one symbol to the end of these permissible prefixes we may form as many as
(D − n1)D = D² − n1D codewords of length 2. The inequalities (6.1.10) assure that we need no more than
this number of codewords of length 2. As before, we choose n2 codewords arbitrarily from
the (D² − n1D) choices and are left with (D² − n1D − n2) unused prefixes of length 2, with which we may
form (D² − n1D − n2)D = D³ − n1D² − n2D codewords of length 3. We select n3 codewords arbitrarily
from them and are left with D³ − n1D² − n2D − n3 unused prefixes of length 3.
Continuing this process we obtain a code in which no codeword is a prefix of any other codeword. So the
code constructed is an instantaneous code.
We now show that the condition is necessary. Suppose that C(x1), C(x2), . . . , C(xq), of
lengths l1, l2, . . . , lq, are the codewords of an instantaneous code for a random variable X.
There are altogether D possible codewords of length 1, of which only n1 have been used. So (D − n1)
symbols of length 1 are left unused. By adding one symbol to the end of these (D − n1) permissible prefixes
we may form (D − n1)D = D² − n1D codewords of length 2. Of these (D² − n1D) codewords of length
2, n2 are used.
$$ \therefore\; n_1 \le D, \qquad n_2 \le D^2 - n_1 D. $$
Similarly,
$$ n_3 \le D^3 - n_1 D^2 - n_2 D, \quad n_4 \le D^4 - n_1 D^3 - n_2 D^2 - n_3 D, \;\ldots,\; n_l \le D^l - n_1 D^{l-1} - n_2 D^{l-2} - \cdots - n_{l-1} D. $$
The last inequality gives
$$ n_l + n_{l-1}D + n_{l-2}D^2 + \cdots + n_1 D^{l-1} \le D^l \;\Rightarrow\; \sum_{j=1}^{l} n_j D^{-j} \le 1 \;\Rightarrow\; \sum_{i=1}^q D^{-l_i} \le 1. $$
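Kraft's condition is trivially checkable, and the sufficiency proof is constructive. A sketch that tests the condition and then assigns prefix-free codewords greedily, shortest lengths first, exactly in the spirit of the counting argument above:

    def kraft_ok(lengths, D=2):
        """Check sum D^{-l_i} <= 1."""
        return sum(D ** -l for l in lengths) <= 1

    def instantaneous_code(lengths, D=2):
        """Assign prefix-free codewords greedily, shortest lengths first."""
        assert kraft_ok(lengths, D), "Kraft inequality violated"
        words, value, prev = [], 0, 0
        for l in sorted(lengths):
            value *= D ** (l - prev)        # extend the current prefix with zeros
            digits, v = [], value
            for _ in range(l):              # write `value` as l base-D digits
                digits.append(str(v % D))
                v //= D
            words.append("".join(reversed(digits)))
            value += 1                      # next unused codeword
            prev = l
        return words

    print(instantaneous_code([1, 2, 3, 3]))   # -> ['0', '10', '110', '111']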
Definition 6.1.12. Optimal code: An instantaneous code is said to be optimal if the expected length of the
code is less than or equal to the expected length of all other instantaneous codes for the same source alphabet
and the same code alphabet.
Theorem 6.1.13. Let S = {x1, x2, . . . , xq} be the source alphabet and D = {0, 1, 2, . . . , D − 1} be the
code alphabet for a random variable X. Then the expected length L* of an optimal instantaneous code for the
random variable X satisfies
$$ L^* \ge \frac{H(X)}{\log D}, $$
where H(X) is the entropy of the random variable X.
Theorem 6.1.14. Let S = {x1, x2, . . . , xq} be the source alphabet and D = {0, 1, 2, . . . , D − 1} be the code
alphabet for the random variable X with p.m.f. p(x). Then the expected length L(C) of any instantaneous
code C for X satisfies the inequality
$$ L(C) \ge \frac{H(X)}{\log D}. $$
Proof. Let pi = p(xi) = P(X = xi) and li = l(xi). Since C is an instantaneous code, by the Kraft inequality
$$ \mu = \sum_{i=1}^q D^{-l_i} \le 1. \tag{6.1.11} $$
Using log x ≤ x − 1 with x = D^{−l_i}/(µp_i),
$$ \log\frac{D^{-l_i}}{\mu p_i} \le \frac{D^{-l_i}}{\mu p_i} - 1 \;\Rightarrow\; -l_i\log D - \log\mu - \log p_i \le \frac{D^{-l_i}}{\mu p_i} - 1. $$
Multiplying by pi and summing over i, we get
$$ -\sum_i p_i l_i\log D - \log\mu\sum_i p_i - \sum_i p_i\log p_i \le \frac{1}{\mu}\sum_i D^{-l_i} - \sum_i p_i $$
$$ \Rightarrow\; -L(C)\log D - \log\mu + H(X) \le \frac{1}{\mu}\cdot\mu - 1 = 0 \quad \left[\because \mu = \sum_i D^{-l_i},\; \sum_i p_i = 1,\; L(C) = \sum_i p_i l_i\right] $$
$$ \Rightarrow\; H(X) - L(C)\log D \le \log\mu \le 0 \quad [\because 0 < \mu \le 1] \;\Rightarrow\; L(C) \ge \frac{H(X)}{\log D}. $$
Theorem 6.1.15. Let L* be the expected length of an optimal instantaneous code for the random variable X
with code alphabet D = {0, 1, 2, . . . , D − 1}. Then
$$ \frac{H(X)}{\log D} \le L^* \le \frac{H(X)}{\log D} + 1, $$
where H(X) is the entropy function of the random variable X.
Proof. Let S = {x1, x2, . . . , xq} be the source alphabet and D = {0, 1, . . . , D − 1} the code alphabet of
the random variable X with p.m.f. p(x). Let pi = p(xi) = P(X = xi) and li = l(xi). Now, L* is the
minimum value of Σ_{i=1}^q pi li subject to the constraint
$$ \sum_{i=1}^q D^{-l_i} \le 1. \tag{6.1.13} $$
If we neglect the integer constraint on l1, l2, . . . , lq and assume that (6.1.13) holds, the choice of the codeword
lengths li = −log pi/log D gives
$$ L = \sum_{i=1}^q p_i l_i = \sum_{i=1}^q\frac{-p_i\log p_i}{\log D} = \frac{H(X)}{\log D}. $$
Since −log pi/log D may not be an integer, we round it up to the next integer: we take
$$ l_i = \left\lceil -\frac{\log p_i}{\log D}\right\rceil, $$
where ⌈x⌉ denotes, for any real x > 0, the smallest integer not less than x. Then
$$ -\frac{\log p_i}{\log D} \le l_i < -\frac{\log p_i}{\log D} + 1. \tag{6.1.14} $$
The first inequality gives
$$ -\log p_i \le l_i\log D \;\Rightarrow\; \log p_i \ge \log D^{-l_i} \;\Rightarrow\; p_i \ge D^{-l_i}. $$
Therefore,
$$ \sum_{i=1}^q D^{-l_i} \le \sum_{i=1}^q p_i = 1, $$
so these lengths satisfy the Kraft inequality and an instantaneous code C with these lengths exists.
Multiplying (6.1.14) by pi and summing, its expected length satisfies
$$ L(C) = \sum_i p_i l_i < \frac{H(X)}{\log D} + 1. \tag{6.1.15} $$
Since L* is optimal, by Theorem 6.1.14,
$$ \frac{H(X)}{\log D} \le L^* \le L(C). \tag{6.1.16} $$
$$ \therefore\; \frac{H(X)}{\log D} \le L^* \le \frac{H(X)}{\log D} + 1. $$
Example 6.1.16. Let S = {x1, x2, . . . , xq} be the source alphabet and D = {0, 1, . . . , D − 1} be the code
alphabet of the random variable X with p.m.f. p(xi) = D^{−αi}, where α1, α2, . . . , αq are positive integers.
Show that any code C : S → D for X with codeword lengths α1, α2, . . . , αq is an instantaneous optimal
code.
Solution. Let C be any code for the random variable X with codeword lengths li = l(xi) = αi, i =
1, 2, . . . , q. Then
$$ \sum_{i=1}^q D^{-l_i} = \sum_{i=1}^q D^{-\alpha_i} = \sum_{i=1}^q p_i = 1. $$
Thus the codeword lengths l1, l2, . . . , lq of the code C satisfy the Kraft inequality. Hence C is an instantaneous
code. Again,
$$ p_i = D^{-\alpha_i} = D^{-l_i} \;\Rightarrow\; \log p_i = -l_i\log D \;\Rightarrow\; -\sum_{i=1}^q p_i\log p_i = \sum_{i=1}^q l_i p_i\log D \;\Rightarrow\; H(X) = L(C)\log D \;\Rightarrow\; L(C) = \frac{H(X)}{\log D}. $$
Therefore, the expected length L(C) of the code C attains the lower bound of Theorem 6.1.14, and hence C is an instantaneous optimal code.
Definition 6.1.17. Efficiency of a code: Let C be a uniquely decodable D-ary code for the random variable
X and L(C) its expected length. Then the efficiency η of the code C is defined by
$$ \eta = \frac{H(X)}{L(C)\log D}, $$
and the redundancy of the code is β = 1 − η.
Theorem 6.1.18. Let C* be a code of the random variable X with the following distribution:
X : x1 x2 . . . xn
pi : p1 p2 . . . pn
where p1 ≥ p2 ≥ . . . ≥ pn. If L(C*) ≤ L(C) for every code C of X, then l*1 ≤ l*2 ≤ . . . ≤ l*n, where l*i is the
length of the codeword C*(xi).
Proof. Let E = {1, 2, . . . , n}. We take any two elements i and j of E with i < j. Denote by α the permuta-
tion of the set E such that α(i) = j and α(j) = i, all other elements of E remaining unchanged.
Let C be the code of the random variable X obtained by permuting the codewords of C*, so that l_k = l*_{α(k)},
where l_k is the length of the codeword C(x_k). Then
$$ l_i = l^*_{\alpha(i)} = l^*_j \;[\because \alpha(i) = j] \quad\text{and}\quad l_j = l^*_{\alpha(j)} = l^*_i \;[\because \alpha(j) = i]. $$
Since L(C*) ≤ L(C),
$$ 0 \le L(C) - L(C^*) = \sum_k p_k l_k - \sum_k p_k l^*_k = p_i l^*_j + p_j l^*_i - p_i l^*_i - p_j l^*_j = (p_i - p_j)(l^*_j - l^*_i). $$
Since i < j implies pi ≥ pj (and when pi = pj the codewords may simply be relabelled), it follows that
l*i ≤ l*j.
∴ l*1 ≤ l*2 ≤ . . . ≤ l*n.
Unit 7
Course Structure
• Shannon-Fano Encoding Procedure for Binary code
7.1 Shannon-Fano Encoding Procedure for Binary Code
Let pi = p(xi) = P(X = xi), i = 1, 2, . . . , q.
(ii) The binary digits in each codeword appear independently with equal probabilities.
Step 2: Partition the set S of source symbols into two most equiprobable groups S0 and S1, say
S0 = {x1, x2, . . . , xr}, S1 = {xr+1, . . . , xq}, i.e., P(S0) ≈ P(S1).
Assign the digit 0 to the symbols of S0 and the digit 1 to the symbols of S1.
Step 3: We further partition each of the subgroups S0 and S1 into two most equiprobable subgroups S00, S01
and S10, S11 respectively, appending a further 0 or 1 accordingly.
Step 4: We continue partitioning each of the resulting subgroups into two most equiprobable subgroups till each
subgroup contains only one source symbol.
The successive partitions may be displayed as follows:
S0 = {x1, x2, x3}: S00 = {x1}; S01 = {x2, x3}: S010 = {x2}, S011 = {x3}.
S1 = {x4, x5, x6, x7, x8, x9}: S10 = {x4, x5}: S100 = {x4}, S101 = {x5};
S11 = {x6, x7, x8, x9}: S110 = {x6, x7}: S1100 = {x6}, S1101 = {x7}; S111 = {x8, x9}: S1110 = {x8}, S1111 = {x9}.
Therefore, the codewords are
x1 → 00, x2 → 010, x3 → 011, x4 → 100, x5 → 101, x6 → 1100, x7 → 1101, x8 → 1110, x9 → 1111.
Clearly no codeword is a prefix of any other codeword. So it is an instantaneous code and hence it is uniquely
decodable.
Advantages:
Example 7.1.1. Construct the Shannon-Fano binary code for the random variable X with the following distri-
bution:
Source symbols : x1 x2 x3 x4 x5 x6
Probability : 1/3 1/4 1/8 1/8 1/12 1/12
Calculate the expected length and the efficiency of the code.
Solution. We have
$$ p_1 + p_2 = \frac13 + \frac14 = \frac{7}{12}, \qquad p_3 + p_4 + p_5 + p_6 = \frac14 + \frac16 = \frac{5}{12}. $$
Since 7/12 and 5/12 are close to each other, we take the most equiprobable groups as follows:
S0 = {x1, x2}: S00 = {x1}, S01 = {x2};
S1 = {x3, x4, x5, x6}: S10 = {x3, x4}: S100 = {x3}, S101 = {x4}; S11 = {x5, x6}: S110 = {x5}, S111 = {x6}.
So the Shannon-Fano binary code is
x1 → 00, x2 → 01, x3 → 100, x4 → 101, x5 → 110, x6 → 111.
$$ L(C) = 2\cdot\frac13 + 2\cdot\frac14 + 3\cdot\frac18 + 3\cdot\frac18 + 3\cdot\frac1{12} + 3\cdot\frac1{12} = \frac23 + \frac12 + \frac34 + \frac12 = 1 + \frac23 + \frac34 = \frac{12 + 8 + 9}{12} = \frac{29}{12} \text{ bits/symbol}. $$
$$ \text{Entropy, } H(X) = -\sum_{i=1}^6 p_i\log_2 p_i = 2.3758 \text{ bits}. $$
$$ \text{Efficiency, } \eta = \frac{H(X)}{L(C)\log_2 2} = \frac{2.3758}{29/12} = 0.9830 = 98.30\%. $$
Similar problems:
Example 7.1.2.
Source symbols : x1 x2 x3 x4 x5 x6 x7 x8 x9
Probability : 0.49 0.14 0.14 0.07 0.07 0.04 0.02 0.02 0.01
Solution. The successive most-equiprobable partitions are
S0 = {x1};
S1 = {x2, . . . , x9}: S10 = {x2, x3}: S100 = {x2}, S101 = {x3};
S11 = {x4, . . . , x9}: S110 = {x4, x6, x9}: S1100 = {x4}, S1101 = {x6, x9}: S11010 = {x6}, S11011 = {x9};
S111 = {x5, x7, x8}: S1110 = {x5}, S1111 = {x7, x8}: S11110 = {x7}, S11111 = {x8}.
So the code is
x1 → 0
x2 → 100
x3 → 101
x4 → 1100
x5 → 1110
x6 → 11010
x7 → 11110
x8 → 11111
x9 → 11011
Therefore,
$$ L(C) = 1(0.49) + 3(0.14 + 0.14) + 4(0.07 + 0.07) + 5(0.04 + 0.02 + 0.02 + 0.01) = 2.34 \text{ bits/symbol}, $$
$$ H(X) = -\sum_{i=1}^9 p_i\log_2 p_i \approx 2.3136 \text{ bits}, \qquad \eta = \frac{2.3136}{2.34} \approx 98.87\%. $$
Example 7.1.3.
Source symbols : x1 x2 x3 x4 x5 x6 x7 x8
Probability : 1/4 1/4 1/8 1/8 1/16 1/16 1/16 1/16
Solution. The successive most-equiprobable partitions are
S0 = {x1, x2}: S00 = {x1}, S01 = {x2};
S1 = {x3, . . . , x8}: S10 = {x3, x4}: S100 = {x3}, S101 = {x4};
S11 = {x5, . . . , x8}: S110 = {x5, x6}: S1100 = {x5}, S1101 = {x6}; S111 = {x7, x8}: S1110 = {x7}, S1111 = {x8}.
So the code is
x1 → 00
x2 → 01
x3 → 100
x4 → 101
x5 → 1100
x6 → 1101
x7 → 1110
x8 → 1111
Therefore,
$$ L(C) = \frac14\cdot2 + \frac14\cdot2 + \frac18\cdot3 + \frac18\cdot3 + \frac1{16}\cdot4 + \frac1{16}\cdot4 + \frac1{16}\cdot4 + \frac1{16}\cdot4 = 1 + \frac34 + 1 = \frac{11}{4} = 2.75 \text{ bits/symbol}. $$
$$ H(X) = -\sum_{i=1}^8 p_i\log_2 p_i = \frac{11}{4} = 2.75 \text{ bits}. $$
$$ \therefore\; \text{Efficiency of the code} = \eta = \frac{H(X)}{L(C)\log_2 2} = \frac{2.75}{2.75} = 100\%. $$
7.2 Construction of Huffman binary code
Let X be a random variable with distribution
X : x1 x2 . . . xn
Probability : p1 p2 · · · pn
Step 1: We arrange the source symbols xi in descending order of their probabilities. Without loss of generality
we may assume that p1 ≥ p2 ≥ . . . ≥ pn.
Step 2: We combine the last two symbols x_{n−1} and x_n to form a new symbol of probability p_{n−1} + p_n. Then
we arrange the resulting n − 1 symbols in descending order of their probabilities (if, for instance,
p_{n−1} + p_n ≥ p1, the new symbol moves to the top of the list).
Step 3: Again we combine the last two symbols to form a new symbol and proceed as in Step 2.
Step 4: The process is continued until we reach a stage where we get only one symbol, of probability 1. Reading
the 0/1 labels of the successive combinations backwards gives the codewords.
Example 7.2.1. Construct the Huffman binary code for the random variable X whose distribution is given by
X : x1 x2 x3 x4 x5
Probability : 0.25 0.25 0.2 0.15 0.15
Solution. The successive reductions are
0.25, 0.25, 0.20, 0.15, 0.15 → 0.30, 0.25, 0.25, 0.20 → 0.45, 0.30, 0.25 → 0.55, 0.45 → 1.
We arrange the above scheme as a tree in reverse order, from which we can write down the corresponding
Huffman binary code:
x1 → 01
x2 → 10
x3 → 11
x4 → 000
x5 → 001
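The same construction in code: a standard heap-based sketch of binary Huffman coding (it may produce a different, but equally optimal, labelling than the tree above):

    import heapq
    from itertools import count

    def huffman_binary(symbol_probs):
        """Binary Huffman code; repeatedly merges the two least-probable nodes."""
        tick = count()                          # tie-breaker so heap tuples compare
        heap = [(p, next(tick), {s: ""}) for s, p in symbol_probs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)     # two smallest probabilities
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c0.items()}
            merged.update({s: "1" + w for s, w in c1.items()})
            heapq.heappush(heap, (p0 + p1, next(tick), merged))
        return heap[0][2]

    code = huffman_binary({"x1": 0.25, "x2": 0.25, "x3": 0.2,
                           "x4": 0.15, "x5": 0.15})
    print(code)   # lengths 2,2,2,3,3 -> expected length 2.30, as in Example 7.2.1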
7.3 Construction of Huffman D-ary code (D > 2)
Let X be a random variable with distribution
X : x1 x2 . . . xq
Probability : p1 p2 · · · pq
Case 1: If (q − D) is divisible by (D − 1), we proceed directly: at each stage the last D symbols are combined
into a new symbol and the list is rearranged in descending order of probability, so that the final stage
contains exactly D symbols.
Case 2: If (q − D) is not divisible by (D − 1), then we add new dummy symbols with zero probability to make
(q* − D) divisible by (D − 1), where q* is the number of symbols after the addition of the dummy symbols.
Now we proceed as in Case 1. The codes for the dummy symbols are discarded.
Example 7.3.1.
Source symbols : x1 x2 x3 x4 x5 x6 x7 x8 x9
Probability : 0.20 0.18 0.16 0.12 0.10 0.08 0.07 0.05 0.04
Construct a Huffman ternary code for X. Calculate the expected length and efficiency of the code.
Solution. Here q = 9, D = 3, and q − D = 9 − 3 = 6 is divisible by D − 1 = 2, so no dummy symbols are needed.
Combining the last three symbols at each stage:
0.20, 0.18, 0.16, 0.12, 0.10, 0.08, 0.07, 0.05, 0.04 → 0.20, 0.18, 0.16, 0.16, 0.12, 0.10, 0.08 → 0.30, 0.20, 0.18, 0.16, 0.16 → 0.50, 0.30, 0.20.
We arrange the above scheme as a tree in reverse order, from which we write down the corresponding
Huffman ternary code.
So the code is
x1 → 2
x2 → 00
x3 → 02
x4 → 10
x5 → 11
x6 → 12
x7 → 010
x8 → 011
x9 → 012
Therefore,
$$ L(C) = 1(0.20) + 2(0.18 + 0.16 + 0.12 + 0.10 + 0.08) + 3(0.07 + 0.05 + 0.04) = 0.20 + 1.28 + 0.48 = 1.96 \text{ ternary digits/symbol}, $$
$$ H(X) = -\sum_{i=1}^9 p_i\log_2 p_i = 2.9939 \text{ bits}. $$
$$ \therefore\; \text{Efficiency of the code} = \eta = \frac{H(X)}{L(C)\log_2 3} = \frac{2.9939}{1.96 \times 1.585} = 0.9637 = 96.37\%. $$
Example 7.3.2. Construct the Huffman ternary code with the following distribution:
Source symbols : x1 x2 x3 x4 x5 x6
Probability : 1/3 1/4 1/8 1/8 1/12 1/12
Calculate the expected length and its efficiency.
Solution. Here q = 6, D = {0, 1, 2}, D = 3, and q − D = 3 is not divisible by D − 1 = 2, so we add one
dummy symbol x7 with probability 0; then q* = 7 and q* − D = 4 is divisible by 2. Combining the last
three symbols at each stage:
1/3, 1/4, 1/8, 1/8, 1/12, 1/12, 0 → 1/3, 1/4, 1/6, 1/8, 1/8 → 5/12, 1/3, 1/4.
We arrange the above scheme as a tree in reverse order, from which we write down the Huffman ternary code.
So the code is
x1 → 1
x2 → 2
x3 → 01
x4 → 02
x5 → 000
x6 → 001
x7 → 002 (discarded).
Therefore,
$$ L(C) = 1\times\frac13 + 1\times\frac14 + 2\times\frac18 + 2\times\frac18 + 3\times\frac1{12} + 3\times\frac1{12} = \frac{19}{12} \text{ ternary digits/symbol}, $$
$$ H(X) = -\sum_{i=1}^6 p_i\log_2 p_i = 2.3758 \text{ bits}. $$
$$ \therefore\; \text{Efficiency of the code} = \eta = \frac{H(X)}{L(C)\log_2 3} = \frac{2.3758}{(19/12)\times 1.585} = 0.9467 = 94.67\%. $$
Example 7.3.3. Construct the Shannon-Fano ternary code for the following distribution of the random variable
X:
Source symbols : x1 x2 x3 x4 x5 x6 x7 x8 x9
Probability : 0.20 0.18 0.16 0.12 0.10 0.08 0.07 0.05 0.04
Solution. We partition S into three most equiprobable groups and continue:
S0 = {x1, x2} (P = 0.38): S00 = {x1}, S01 = {x2};
S1 = {x3, x4, x5} (P = 0.38): S10 = {x3}, S11 = {x4}, S12 = {x5};
S2 = {x6, x7, x8, x9} (P = 0.24): S20 = {x6}, S21 = {x7}, S22 = {x8, x9}: S220 = {x8}, S221 = {x9}.
So the code is
x1 → 00, x2 → 01, x3 → 10, x4 → 11, x5 → 12, x6 → 20, x7 → 21, x8 → 220, x9 → 221.
Unit 8
Course Structure
• Error correcting codes
8.1 Error correcting codes
Let Fq be a finite field with q elements, and let Vn(Fq) denote the set of all n-tuples x = (x1, x2, . . . , xn)
with xi ∈ Fq. For x = (x1, . . . , xn), y = (y1, . . . , yn) ∈ Vn(Fq) and λ ∈ Fq, define
x + y = (x1 + y1, x2 + y2, . . . , xn + yn),
λx = (λx1, λx2, . . . , λxn).
Then
x + y, λx ∈ Vn(Fq).
It is easy to see that Vn(Fq) is a vector space over the field Fq.
Theorem 8.1.1. For any x, y ∈ Vn(Fq), if we define d(x, y) = the number of indices i with xi ≠ yi, then d is
a metric on Vn(Fq) (the Hamming distance).
Proof (triangle inequality). Let x, y, z ∈ Vn(Fq) and put A = {i : xi ≠ yi}, B = {i : yi ≠ zi}. If xi ≠ zi,
then xi ≠ yi or yi ≠ zi,
∴ i ∈ A ∪ B.
Hence d(x, z) ≤ |A ∪ B| ≤ |A| + |B| = d(x, y) + d(y, z). The remaining metric properties are immediate.
Definition 8.1.2. q-ary code of length n: A non empty subset C of Vn (Fq ) is called a q-ary code of length n
and members of C are called codeword. If q = 2, the corresponding code is called binary code and so on.
Definition 8.1.3. Weight of a codeword: Let x ∈ Vn(Fq) be a codeword. The weight of the codeword
x, denoted by w(x), is defined by
w(x) = the number of nonzero coordinates of x = d(x, 0).
Definition 8.1.4. Linear code: A linear subspace C of Vn(Fq) is called a linear code of length n over the field
Fq, and the dimension k of the subspace C is called the dimension of the code C. It is also called an (n, k)
linear code over the field Fq.
Definition 8.1.5. Minimum distance of a code: Let C be a code in Vn(Fq). The minimum distance δ(C)
of the code C is defined by
δ(C) = min{d(x, y) : x, y ∈ C, x ≠ y}.
Definition 8.1.6. Generator matrix: Let C be an (n, k) linear code over the field Fq with q elements. A
k × n matrix G with entries from the field Fq is said to be the generator matrix of code C if the row space
of the matrix G is the same as the subspace C. We also say that the matrix G generates the code C. Since
the dimension of C is k, the dimension of the rowspace of G is k which implies that the row vectors of G are
linearly independent and so they form a basis of C.
Definition 8.1.7. Parity check matrix: Let C be an (n, k) linear code over the field Fq with q elements. An
(n − k) × n matrix H with entries from the field Fq is called a parity check matrix of code C iff Hx = 0 for
all x ∈ C.
The matrix H also generates an (n, n − k) linear code over Fq which is denoted by C ⊥ and is called the
dual space of C.
∴ dim(C) + dim(C ⊥ ) = n and rank(H) = n − k.
8.2 Construction of linear codes
• By using a generator matrix: Let G be a k × n (k < n) generator matrix with entries from Fq with q
elements and rank(G) = k.
Let C denote the row space of the matrix G. Then C is an (n, k) linear code. Denote by α1, α2, . . . , αk the
row vectors of G. For any a = (a1, a2, . . . , ak) ∈ Vk(Fq), the corresponding codeword is
u = aG = a1α1 + a2α2 + . . . + akαk ∈ C.
Example 8.2.1. Find the codewords determined by the binary generator matrix
$$ G = \begin{pmatrix} 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 1 \end{pmatrix}. $$
Solution. G is a binary generator matrix with 5 columns, and clearly rank(G) = 3. The linear code C
generated by G is given by
C = {x : x = aG and a ∈ V3(F2)}.
The vector a = (a1 a2 a3) may be chosen in 2³ = 8 ways, namely (0 0 0), (0 0 1), (0 1 0),
(1 0 0), (0 1 1), (1 0 1), (1 1 0), (1 1 1). Working over F2 (so that 1 + 1 = 0):
(0 0 0)G = (0 0 0 0 0)
(0 0 1)G = (0 0 1 1 1)
(0 1 0)G = (0 1 0 0 1)
(1 0 0)G = (1 0 0 1 1)
(0 1 1)G = (0 1 1 1 0)
(1 0 1)G = (1 0 1 0 0)
(1 1 0)G = (1 1 0 1 0)
(1 1 1)G = (1 1 1 0 1)
• By using Parity check matrix: Let H be an r × n(r < n) parity check matrix with entries from Fq with
q elements and rank(H) = r. Let
C = {x : x ∈ Vn (Fq ) and Hx = 0}
Hx = 0, Hy = 0.
H(x + y) = H(x) + H(y) = 0
and H(αx) = αH(x) = 0.
Therefore, C is a linear subspace of Vn (Fq ) and so a linear code over the field Fq .
Clearly H is a parity check matrix for the code C. The dimension of code C is n − r.
Example 8.2.2. Find a codeword determined by the binary parity check matrix
1 0 1 0
H=
0 1 1 1
Solution. Here H is a binary parity check matrix with 4 columns and rank(H) = 2. Therefore, the linear
8.3. STANDARD FORM OF PARITY CHECK MATRIX: 67
code C determined by the parity check matrix H consists of binary codewords (x1 x2 x3 x4 ) satisfies
x1
1 0 1 0 x2 = 0
0 1 1 1 x3
x4
⇒ x1 + x3 = 0 and x2 + x3 + x4 = 0
⇒ x1 = x3 and x2 = x3 + x4 [∵ 2 · 1 = 0; 1 + 1 = 0; − 1 = 1]
If the values of x3 and x4 are assigned then x1 and x2 are determined. There are four ways of choosing x3
and x4 i.e., 00, 01, 10, 11, leading to the codewords 0000, 0101, 1110, 1011.
The following is a similar problem.
Example 8.2.3. Find the codewords determined by the P.C.M
1 0 0 0 1
H = 0 1 0 1 1
0 0 1 1 1
Let C be a cyclic code in Vn (Fq ) and a ∈ C. Then the words are obtained from a by n number of cyclic
shifts. Any number of cyclic shifts such as
belong to C.
Cyclic codes are useful for two reasons; from the practical point of view, it is possible to implement by
simple devices known as shift resister. On the other hand, cyclic code can be constructed and investigated by
means of algebraic theory of rings and polynomials.
xn − 1 = h(x)g(x) (8.5.1)
Let
h(x) = h0 + h1 x + h2 x2 + . . . + hk xk
g(x) = g0 + g1 x + g2 x2 + . . . gn−k−1 xn−k−1 + gn−k xn−k
where gn−k = 1.
It is easy to see from (8.5.1) that hk = 1 and h0 g0 = −1, which gives that h0 6= 0, g0 6= 0. The polynomial
g(x) corresponds to the codeword
g = g0 g1 g2 · · · gn−k 0 0 · · · 0 in Vn (Fq )
g(i) = 0 0 · · · g0 g1 · · · gn−k 0 0 · · · 0
There are i zeros at the beginning and k − 1 − i zeros at the end. We denote by h, the codeword whose 1st
k + 1 bits are hk hk−1 · · · h1 h0 followed by n − k − 1 zeros.
∴ h = hk hk−1 · · · h1 h0 0 0 · · · 0
Let H denote the (n − k) × n matrix whose rows are h, h(1) , h(2) , . . . , h(n−k+1) , where h(i) is the i-th cyclic
shift of the codeword h. Hence
hk hk−1 · · · h1 h0 0 0 ··· 0
0 h k h k−1 · ·· h1 h0 0 ··· 0
H= · · · · · ·
··· ··· ··· ··· ··· ··· · · ·
0 0 ··· 0 hk hk−1 · · · h1 h0
8.6. BCH CODES 69
Example 8.5.1. Determine the binary parity check matrix for the cyclic code C =< g(x) > of length 7 where
g(x) = 1 + x2 + x3 and obtain the code C.
x7 − 1 = (1 + x)(1 + x + x3 )(1 + x2 + x3 )
= h(x)g(x) [∵ In a binary code, -1 = 1] (8.5.2)
∴ h(x) = (1 + x)(1 + x + x3 )
= (1 + x2 + x3 + x4 ) [∵ 1 + 1 = 0]
∴ h0 = 1, h1 = 0, h2 = 1, h3 = 1, h4 = 1.
1 1 1 0 1 0 0
H= 0 1 1 1 0
1 0
0 0 1 1 1 0 1
No column of H consist entirely 00 s and no two columns are exactly same. So the code determined by H is a
Hamming code of length 7.
Course Structure
• Markovian decision Process
• Regular matrices
9.1 Introduction
A Markov Process consists of a set of objects and a set of states such that
(i) at any given time, each object must be in a state (distinct objects need not be in distinct states).
(ii) the probability that an object moves from one state to another state which may be the same as the first
state, in one time period depends only on those two states.
The integral numbers of time periods past the moment when the process is started represent the stages of the
process which may be finite or infinite.
If the number of states is finite or countably infinite, the Markov process is called a Markov Chain. A
finite Markov chain is one having a finite number of states. We denote the probability of moving from state
i to state j in one time period by pij . For an N state Markov chain, where N is a fixed positive integer, the
N × N matrix P = [pij ] is the stochastic or transition matrix associated with the process. Necessarily, the
elements of each row of P sum to unity.
Theorem 9.1.1. Every stochastic matrix has 1 as an eigen value (possible multiple and none of the eigen
values exceed 1 in absolute value).
Because of the way P is defined, it proves convenient in this chapter to indicate N -dimensional vectors as
row vectors.
According to the theorem, there exists a vector X 6= 0 such that XP = X. This left eigen value is called a
fixed point of P .
70
9.2. POWERS OF STOCHASTIC MATRICES 71
Example 9.2.1. Grapes in Kashmir are classified as either superior, average or poor. Following a superior
harvest, the probabilities of having a superior, average and poor harvest in the next year are 0, 0.8 and 0.2.
Following an average harvest, the probabilities of a superior, average and poor harvest are 0.2, 0.6 and 0.1.
Following a poor harvest, the probabilities of a superior, average and poor harvest are 0.1, 0.8 and 0.1. De-
termine the probabilities of a superior harvest for each of the next five years if the most recent harvest was
average.
S A P
(0)
X = 0 1 0
Now,
0 0.8 0.2 0 0.8 0.2
P2 = 0.2 0.6 0.2 0.2 0.6 0.2
0.1 0.8 0.1 0.1 0.8 0.1
0 + 0.16 + 0.02 0 + 0.48 + 0.16 0 + 0.16 + 0.02
= 0 + 0.12 + 0.02 0.16 + 0.36 + 0.16 0.04 + 0.12 + 0.02
0 + 0.16 + 0.01 0.08 + 0.48 + 0.08 0.02 + 0.16 + 0.01
0.18 0.64 0.18
= 0.14 0.68 0.18
0.17 0.64 0.19
72 UNIT 9.
0.18 0.64 0.18 0.18 0.64 0.18
P4 = 0.14 0.68 0.18 0.14 0.68 0.18
0.17 0.64 0.19 0.17 0.64 0.19
0.1526 0.6656 0.1818
= 0.1510 0.6672 0.1818 .
0.1525 0.6656 0.1819
0.1526 0.6656 0.1818 0.18 0.64 0.18
P5 = 0.1510 0.6672 0.1818 0.14 0.68 0.18
0.1525 0.6656 0.1819 0.17 0.64 0.19
0.151558 0.666624 0.181818
= 0.151494 0.666688 0.181818 .
0.151557 0.666624 0.181819
Thus,
X (5) = 0 1 0 P5
= 0.151494 0.666688 0.181818 .
Hence the probability of a superior harvest for each of the next five years is 0.151494.
Definition 9.2.2. (Regular Matrix:) A stochastic matrix is regular if one of its powers contains only positive
entries.
Theorem 9.2.3. If a stochastic matrix is regular, then 1 is an eigen value of multiplicity one, and all other
eigen values λi satisfy |λi | ≤ 1.
Solution.
2 0 1 0 1 0.40 0.60
P = = .
0.4 0.6 0.4 0.6 0.24 0.76
Since each entry of P 2 is positive, hence P is regular.
Unit 10
Course Structure
• Ergodic Matrices
The components of X (∞) are limiting state distributions and represent the approximate proportions of
objects in the various states of a Markov chain after a large number of time periods.
Theorem 10.1.2. A stochastic matrix is ergodic if and only if the only eigen value λ of magnitude 1 is 1 itself
and if λ = 1 has multiplicity k, then there exists k linearly independent (left) eigen vectors associated with
this eigen value.
Theorem 10.1.3. A regular matrix is ergodic but the converse is not true in general.
If P is regular with limit matrix L, then the rows of L are identical with one another, each being the unique
left eigen vector of P associated with the eigen value λ = 1 and having the sum of its components equal to
unity.
Let us denote this eigen vector by E1 . Now, if P is regular, then regardless of the initial distribution X (0) ,
we can write X (∞) = E1 (= X (0) L).
73
74 UNIT 10.
Theorem 10.1.5. If every eigen value of a matrix P yields linearly independent (left) eigen vectors in number
equal to its multiplicity, then there exists a non-singular matrix M , whose rows are left eigen vectors of P ,
such that D ≡ M P M −1 is a diagonal matrix. The diagonal elements of D are the eigen values of P , repeated
according to multiplicity.
We have,
L = lim P n
n→∞
−1
= (M M ) lim P n (M −1 M )
n→∞
−1
= M lim M P n M −1 M
n→∞
= M −1 lim (M P M −1 )n M
n→∞
= M −1 lim Dn M
n→∞
1
1
..
.
−1
= M 1
M.
0
..
.
0 N ×N
The diagonal matrix on the right has k 10 s and (N − k) 00 s on the main diagonal.
Example 10.1.6. Is the stochastic matrix
1 0 0 0
0.4 0 0.6 0
P =
0.2
0 0.1 0.7
0 0 0 1
1−λ 0 0 0
0.4 −λ 0.6 0
= 0
0.2 0 0.1 − λ 0.7
0 0 0 1−λ
⇒ (1 − λ)(−λ)(0.1 − λ)(1 − λ) = 0
⇒ λ = 1, 1, 0.1, 0.
Thus, λ1 = 1 (multiplicity 2), λ2 = 0.1, λ3 = 0 are the eigen values of P . Hence P is not regular.
The left eigen vectors for the double eigen value λ1 = 1 are [1, 0, 0, 0] and [0, 0, 0, 1], which are linearly
independent. Hence P is ergodic. Thus, L = lim P n exists.
n→∞
We now fine the eigen vectors corresponding to λ2 = 0.1 and λ3 = 0.
1 0 0 0
0.4 0 0.6 0
x1 x2 x3 x4 0.2 0 0.1 0.7 = 0.1 x1 x2 x3 x4
0 0 0 1
⇒ (1 − 0.1)x1 + 0.4x2 + 0.2x3 = 0
−0.2x2 = 0
0.6x2 + (0.1 − 0.1)x3 = 0
0.7x3 + (1 − 0.1)x4 = 0
⇒ 0.9x1 + 0.4x2 + 0.2x3 = 0
−0.1x2 = 0
0.6x2 = 0
0.7x3 + 0.9x4 = 0.
x1 = −2, x2 = 0, x3 = 9, x4 = −7.
0 0 0 1
⇒ x1 + 0.6x2 + 0.2x3 = 0
0.6x2 + 0.1x3 = 0
0.7x3 + x4 = 0.
x1 = 4, x2 = 5, x3 = −30, x4 = 21.
1 0 0 0 1 0 0 0
0 0 0 1 0 1 0 0
M =
−2 and D = .
0 9 −7 0 0 1 0
4 5 −30 21 0 0 0 1
We now find M −1 .
1 0 0 0 : 1 0 0 0
0 0 0 1 : 0 1 0 0
[M : I] =
−2 0
9 −7 : 0 0 1 0
4 5 −30 21 : 0 0 0 1
1 0 0 0 : 1 0 0 0
R →R4 4 5 −30 21 : 0 0 0 1
−−2−−−→
−2 0
R4 →R2 9 −7 : 0 0 1 0
0 0 0 1 : 0 1 0 0
1 0 0 0 : 1 0 0 0
R →R −4R1 0 5 −30 21 : −4 0 0 1
−−2−−−2−−−→
R3 →R3 +2R1 0 0 9 −7 : 2 0 1 0
0 0 0 1 : 0 1 0 0
1 0 0 0 : 1 0 0 0
R2 → 51 R2 0 1 −6 21 : − 4 0 0 1
−−−−− −→ 5 5 5
R3 → 19 R3
0 0 1 − 7 : 2 0 1 0
9 9 9
0 0 0 1 : 0 1 0 0
1 0 0 0 : 1 0 0 0
R →R +6R3 0 1 0 − 7 : 8 0 2 1
−−2−−−2−−−→ 15 15
0 0 1 − 7 : 2 0 1 0
3 5
9 9 9
0 0 0 1 : 0 1 0 0
1 0 0 0 : 1 0 0 0
R2 →R2 + 7 R4 0 1 0 0 : 8 7 2 1
−−−−−−−15 −−→
0 0 1 0 :
15
2
15
7
3
1
5 .
R3 →R3 + 97 R4 9 0
9 9
0 0 0 1 : 0 1 0 0
Thus
1 0 0 0
8 7 2 1
M −1 =
2
15 15
7
3
1
5 .
9 9 9 0
0 1 0 0
10.1. ERGODIC MATRIX 77
Thus,
1 0 0 0 1 0 0 0 1 0 0 0
8 7 2 1
5 0 1 0 0 0 0 0 1
L = 15 15 3
2 7 1
0 0 0 0 0 −2 0 9 −7
9 9 9
0 1 0 0 0 0 0 0 4 5 −30 21
1 0 0 0 1 0 0 0
8 7
0 0 0
0 0 1
= 15 15
2 7
0 0 −2 0 9 −7
9 9
0 1 0 0 4 5 −30 21
1 0 0 0
8 0 0 7
= 15 15 .
2 0 0 7
9 9
0 0 0 1
Example 10.1.7. Construct the state-transition diagram for the Markov chain
1 2 3 4
1 1 0 0 0
2 0.4 0 0.6 0
P =
0.2
3 0 0.1 0.7
4 0 0 0 1
Solution. [A state-transition diagram is an oriented network in which the nodes represent states and the arcs
represent possible transitions.]
Labelling the states by 1, 2, 3, 4, we have the following state-transition diagram.
The number on each arc is the probability of the transition.
Example 10.1.8. Prove that if P is regular, then all the rows of L = lim P n are identical.
n→∞
which implies that every row of L is a left eigen vector of P corresponding to the eigen value λ = 1.
Now, P being regular, all such eigen vectors are scalar multiples of a single vector.
On the other hand, L being stochastic, each row of it sums to unity. Thus it follows that all the rows are
identical.
Example 10.1.9. Prove that if λ is an eigen value of a stochastic matrix P , then |λ| ≤ 1.
78 UNIT 10.
Solution. Let E = [e1 e2 . . . eN ]T be a right eigen vector corresponding to λ. Then P E = λE, and
considering the jth component of both sides of this equality, we conclude that
N
X
pjk ek = λej . (10.1.3)
k=1
By definition, E 6= 0, so that |ei | > 0. Thus, it follows from (10.1.3), with j = i and from (10.1.4) that,
N
X N
X N
X
|λ||ei | = |λei | = pik ek ≤ pik |ek | ≤ |ei | pik = |ei |,
k=1 k=1 k=1
The manufacturer of Hi-Glo toothpaste currently controls 60% of the market in a particular city. Data
from the previous year show that 88% of Hi-Glo’s customers remained loyal to Hi-Glo, while 12% of Hi-
Glo’s customers switched to rival brands. In addition, 85% of the competition’s customers remained loyal to
the competition, while the other 15% switched to Hi-Glo. Assuming that these trends continue, determine
Hi-Glo’s share of the market
(a) in 5 years and (b) over the long run.
Solution. We take state 1 to be consumption of Hi-Glo toothpaste and state 2 to be consumption of a rival
brand. Then p11 is the probability that a Hi-Glo customer remains loyal to Hi-Glo, that is, 0.88; p12 is the
probability that a Hi-Glo customer switches to another brand, that is, 0.12; p21 is the probability that the
customer of another brand switches to Hi-Glo, that is, 0.15; p22 is the probability that customer of another
brand remains loyal to the competition, that is, 0.85.
The stochastic matrix (Markov chain) defined by these transition probabilities is
1 2
1 0.88 0.12
P =
2 0.15 0.85
(0)
The initial probability distribution vector is X (0) = [0.60 0.40], where, the components x1 = 0.60 and
(0)
x2 = 0.40 represent the proportions of people initially in states 1 and 2, respectively.
(a) Thus,
X (5) = X (0) P 5
0.6477 0.3523
= 0.60 0.40
0.4404 0.5596
= 0.5648 0.4352 .
After 5 years, Hi-Glo’s share of the market will have declined to 56.48%. Now,
0.88 0.12
P =
0.15 0.85
10.1. ERGODIC MATRIX 79
is regular, since each entry of the first power of P is positive, that is, P is positive. Hence P is ergodic.
So, lim P n = L(say) exists. Now, the left eigen vector corresponding to λ = 1 is given by
n→∞
0.88 0.12
x1 x2 = x1 x2
0.15 0.85
⇒ 0.12x1 − 0.15x2 = 0 and x1 + x2 = 1.
Solving, we get,
5 4
x1 = and x2 =
9 9
and thus
E1 = x1 x2 = 95 4
9 .
Hence, 5 4
n 9 9
L = lim P = 5 4 .
n→∞ 9 9
(b)
X (∞) = X (0) L
5 4
= 0.60 0.40 59 94
9
9 5
= 13 + 29 12 16 4
45 + 45 = 9 9 = E1 .
Therefore, over the long run, Hi-Glo’s share of the market will stabilize at 59 , that is, approximately
55.56%.
Example 10.1.11. Solve the previous problem, if Hi-Glo currently controls 90% of the market
(a)
X (5) = X (0) P 5
0.6477 0.3523
= 0.90 0.10
0.4404 0.5596
= 0.6270 0.3730 .
Example 10.1.12. The geriatric ward of a hospital lists its patients as bedridden or ambulatory. Historical data
indicate that over a 1-week period, 30% of all ambulatory patients are discharged, 40% remain ambulatory, and
30% are remanded to complete bed rest. During the same period, 50% of all the bedridden patients become
ambulatory, 20% remain bedridden, and 30% die. Currently the hospital has 100 patients in its geriatric ward,
with 30 bedridden and 70 ambulatory. Determine the status of the patients
(The status of a discharged patient does not change if the patient die).
Solution. We take state 1 to be discharged, sate 2 to be ambulatory, state 3 to be bedridden or bed rest and
state 4 to be died patients. Consider 1 time period to be 1 week.
The transition probabilities given by the following transition matrix:
Since, currently the hospital has 100 patients in its geriatric ward, with 30 bedridden and 70 ambulatory, so
the initial probability distribution vector is
1 2 3 4
(0)
X = 0 0.7 0.3 0
Now,
1 0 0 0 1 0 0 0
0.3 0.4 0.3 0 0.3 0.4 0.3 0
P2 =
0
0.5 0.2 0.3 0 0.5 0.2 0.3
0 0 0 1 0 0 0 1
1 0 0 0
0.42 0.31 0.18 0.09
=
0.15
.
0.30 0.19 0.36
0 0 0 1
(a)
X (2) = X (0) P 2
1 0 0 0
0.42 0.31 0.18 0.09
= 0 0.7 0.3 0
0.15 0.30 0.19 0.36
0 0 0 1
= 0.339 0.307 0.183 0.171 .
After 2 weeks, there are approximately 34% discharged, 30% ambulatory, 18% bedridden and 17%
dead patients.
Now, the characteristic equation of P is
|P − λI| = 0
1 0 0 0
0.3 0.4 0.3 0
⇒ = 0
0 0.5 0.2 0.3
0 0 0 1
⇒ (1 − λ)2 (λ2 − 0.6λ − 0.07) = 0
⇒ λ = 1, 1, 0.7, − 0.1.
10.1. ERGODIC MATRIX 81
Since λ1 = 1 (multiplicity 2), λ2 = 0.7, λ3 = −0.1 are the eigen values of P , so P is not regular.
The left eigen vectors for the double eigen value 1 are [1 0 0 0] and [0 0 0 1] which are linearly
independent. Hence P is ergodic. Therefore,
L = lim P n .
n→∞
Now,
1 0 0 0
0.3 0.4 0.3 0
x1 x2 x3 x4 0 0.5 0.2 0.3 = 0.7 x1 x2 x3 x4
0 0 0 1
⇒ (1 − 0.7)x1 + 0.3x2 = 0
(0.4 − 0.7)x2 + 0.5x3 = 0
0.3x2 + (0.2 − 0.7)x3 = 0
0.3x3 + (1 − 0.7)x4 = 0
⇒ 0.3x1 + 0.3x2 = 0
0.3x2 − 0.5x3 = 0
0.3x3 + 0.3x4 = 0.
To find M −1 :
1 0 0 0 : 1 0 0 0
0 0 0 1 : 0 1 0 0
[M : I] =
5 −5 −3 3 : 0 0 1 0
−3 11 3 3 : 0 0 0 1
1 0 0 0 : 1 0 0 0
R ↔R
2
−3 11 3 3 : 0 0 0 1
−−−−−−−4−→
0 −5 −3 3 : −5 0 1 0
R3 →R3 −5R1
0 0 0 1 : 0 1 0 0
1 0 0 0 : 1 0 0 0
R →R +3R 0 11 3 3 : 3 0 0 1
−−2−−−2−−−→
1
0 −5 −3 3 : −5 0 1 0
0 0 0 1 : 0 1 0 0
1 0 0 0 : 1 0 0 0
R →R −3R 0 11 3 0 : 3 −3 0 1
−−2−−−2−−−→
4
0 −5 −3 0 : −5 −3 1 0
R3 →R3 −3R4
0 0 0 1 : 0 1 0 0
1 0 0 0 : 1 0 0 0
1 3 3 3 1
R2 → R2 0 1 0 : 11 − 11 0 11
−−−−11
−−→ 11
0 −5 −3 0 : −5 −3 1 0
0 0 0 1 : 0 1 0 0
1 0 0 0 : 1 0 0 0
3 3 3 1
R →R +5R 0 1 0 : 11 − 11 0 11
−−3−−−3−−−→
2 11
0 0 − 18 0 : − 40 − 48 1 5
11 11 11 11
0 0 0 1 : 0 1 0 0
1 0 0 0 : 1 0 0 0
R2 →R2 + 1 R3 0 1 0 0 : − 1 −1 1 1
−−−−−−−6−→ 20
3
8
6
11
66 .
5
R3 →− 11 R
18 3
0 0 1 0 :
9 3 − 18 − 18
0 0 0 1 : 0 1 0 0
Thus,
1 0 0 0
− 1 −1 1 1
M −1 =
20
3
8
6 66 .
9 3 − 11
18
5
− 18
0 1 0 0
10.1. ERGODIC MATRIX 83
Thus,
1 0 0 0
1 0 00 1 0 0 0
− 13 −1 1 1
66 0 1 0
0
0 0 0 1
lim P n = L =
6
20 8 11 5
n→∞
9 3 − 18 − 18 0 0 00
5 −5 −3 3
0 1 0 0 0 0 0 −3 11 3 3
0
1 0 0 0 1 0 0 0
− 1 −1 0 0 0 0 0 1
= 3
20 8
0 0 5 −5 −3 3
9 3
0 1 0 0 −3 11 3 3
1 0 0 0
− 1 0 0 −1
= 3 .
20 0 0 83
9
0 0 0 1
(b) Thus, the status of the patients over the long run is
X (∞) = X (0) L
1 0 0 0
− 1 0 0 −1
= 0 0.7 0.3 0 3
20 0 0 8
9 3
0 0 0 1
= 13 1
30 0 0 10 = 0.43 0 0 0.1 .
Therefore, over the long run, there are 43% discharged patients and 10% patients die. No ambulatory
or bedridden patients remain in the geriatric ward.
Example 10.1.13. The training programme for production supervisors at a particular company consists of two
phases. Phase 1, which involves 3 weeks of classroom work, is followed by Phase 2, which is a 3 week appren-
ticeship program under the direction of working supervisors. From past experience, the company expects only
60% of those beginning classroom training to be graduated into the apprenticeship phase, with the remaining
40% dropped completely from the training program. Of those who make it to the apprenticeship phase, 70%
are graduated as supervisors, 10% are asked to repeat the second phase, and 20% are dropped completely from
the program. How many supervisors can the company expect from its current training programme if it has 45
people in the classroom phase and 21 people in the apprenticeship phase?
Solution. We consider one time period to be 3 weeks and define states 1 through 4 as the conditions of being
dropped, a classroom trainee, an apprentice, and a supervisor, respectively. If we assume that discharged
individuals never re-enter the training programme and that supervisors remain supervisors, then the transition
probabilities are given by the Markov chain
1 2 3 4
1 1 0 0 0
2 0.4 0 0.6 0
P =
0.2
3 0 0.1 0.7
4 0 0 0 1
84 UNIT 10.
. Since there are 45 + 21 = 66 people in the training programme currently, so the initial probability vector is
given by
(0) 45 21
X = 0, , , 0 .
66 66
We have from example 10.1.6,
1 0 0 0
8 0 0 7
lim P n = L =
2
15 15 .
7
n→∞ 9 0 0 9
0 0 0 1
X (∞) = X (0) L
0 0 1 0
8 7
0 0
= 0 45 21
15 15
66 66 0 2 0 0 7
9 9
0 0 0 1
= 0.4343 0 0 0.5657 .
Eventually, 43.43% of those currently in training (or about 29 people) will be dropped from the programme
and 56.67% (or about 37 people) will become supervisors.
Example 10.1.14. Solve the previous problem if all 66 people are currently in the classroom phase of training
programme.
X (∞) = X (0) L
1 0 0 0
8 0 0 7
= 0 1 0 0 15 15
2 0 0 7
9 9
0 0 0 1
8 7
= 15 0 0 15 .
8
Thus, 15 × 66 ' 35 people will ultimately drop from the program and the remaining 66 − 35 = 31 people
eventually become supervisors.
Unit 11
Course Structure
• Geometric programming
Posynomial and Signomial: A generalised polynomial that consist of a finite number of monomials such
as
n
X m
Y
f (x) = Cj (xi )aij
j=1 i=1
is said to be posynomial if all the coefficients Cj are positive; is called the signomial if the coefficients Cj are
negative.
85
86 UNIT 11.
The G.P approach instead of solving a non-linear programming problem first finds the optimal value of the
objective function by solving its dual problem and then determines an optimal solution to the given NLPP
from the optimal solution of the dual.
n
X
min f (x) = cj uj (x)
j=1
such that xi ≥ 0 with cj > 0
Yn
and uj (x) = (xj )aij ,
i=1
But,
∂ arj
uj (x) = uj (x).
∂xr xr
Putting this result in the previous equation, we get,
n
∂f (x) 1 X
= arj cj uj (x) = 0.
∂xr xr
j=1
Let, f ∗ (x) be the minimum value of f (x). Since, each xr and cj is positive, therefore f ∗ (x) will also be
∂f (x)
positive. Defining by f ∗ (x) we get,
∂xr
n
X arj cj uj (x)
= 0.
f ∗ (x)
j=1
cj uj (x)
yj = , j = 1, 2, . . . , n.
f ∗ (x)
Using this transformation, the necessary conditions for local minimum becomes,
n
X
arj yj = 0; r = 1, 2, . . . , m. (11.1.1)
j=1
11.1. GEOMETRIC PROGRAMMING 87
At the optimal solution, conditions (11.1.1) and (11.1.2) are the necessary conditions for optimality of non-
linear function and also known as orthogonality and normality conditions respectively. This condition give a
unique value of yj for m + 1 = n and all equations are independent but for n > (m + 1), the value of yj no
longer remains independent.
[Degree of G.P difficulty (D.D) of G.P is equal to number of terms in G.P -(1 + number of variables in G.P]
AY = b,
where
1 1 ··· 1 y1 1
a11 a12 ··· a1n y2 0
A= . Y = . and b = .
.. .. ..
.. . . . .. ..
am1 am2 · · · amn yn 0
Thus, we require to form the normality and orthogonality condition AY = B. This means that the original
NLP problem is reduced to one of finding the set of values of Y that satisfy this linear non-homogeneous
equation. Hence, to determine the unique value of yj for the purpose of minimizing effect.
(i) Rank (A, b) > Rank (A), there will be no solution, where (A, b) denote the augmented matrix.
(iii) Rank (A) < n, i.e n > m + 1, that is infinite number of solutions exist.
To find the minimum value of f (x)
n
X
Now, since yj = 1, therefore
j=1
n
n
Y h i P yj
{f ∗ (x)}yj = f ∗ (x) j=1
= f ∗ (x)
j=1
88 UNIT 11.
Thus,
n
cj yj
Y
∗
min f (x) = f (x) = and
yj
j=1
n
cj yj
Y
∴ f (x) ≥ .
yj
j=1
where yj must satisfy the orthogonality and normality conditions. For the given value of f ∗ and unique value
of yj , the solution to a set of equations can be obtained from
m
Y
cj (xi )aij = yj f ∗ (x).
i=1
Dual Problem:
n
cj yj
Y
max g(y) =
yj
j=1
Xn
subject to aij yj = 0
j=1
n
X
and yj = 1
j=1
yj ≥ 0.
Theorem 11.1.1. If x is a feasible solution vector of the unconstraint of a primal geometric programming and
y is a feasible solution vector for DP (Dual problem), then
f (x) ≥ g(y). (Primal Dual inequality)
Proof. The expression for f (x) can be written as
m
Y
n
Cj (xi )aij
i=1
X
f (x) = .
yj
j=1
m y1 m y2 m yn
Y Y Y
ai1 ai2 ain
C1 (xi ) C2 (xi ) Cn (xi )
i=1 i=1 i=1
≥ · ···
y1
y2
yn
m
yj
Y
aij
n j
C (xi )
Y i=1
or, f (x) ≥ [since y1 + y2 + · · · + yn = 1 for normality condition]
y j
j=1
n
n m
Cj yj Y
P
Y aij yj
or, f (x) ≥ (xi )j=1 (11.1.3)
yj
j=1 i=1
n n
" #
Cj yj
Y X
or, f (x) ≥ aij yj = 0, orthogonality condition
yj
j=1 i=1
or, f (x) ≥ g(y).
P (i) yir
P yir
n
Y
r=1 airj
g (x) m P (i) Cir (xi )
i Y Y i=1
≥
yir
P (i)
P
i=1 r=1
yir
r=1
PP
(i) PP(i)
yir
m P (i) n P (i)
Cir yir Y
yir Y Y airj yir X
(gi (x)) r=1 ≥ (xi ) r=1 yir .
yir
i=1 r=1 i=1 r=1
Hence, yir
PP(i)
m P (i) n
yir Y P (i)
Y Y Cir airj yir X
1≥ (xi ) r=1 yir . (11.1.4)
yir
i=1 r=1 i=1 r=1
Solution.
1 0 −3 1 y1 0
−1 1 1 1 y2 0
A=
0 −2 1 1
Y =
y3 and b=
0
1 1 1 1 y4 1
and we get AY = b with
1 1 5 3 761
y1 = , y2 = , y3 = , y4 = , f ∗ (x) =
2 6 24 24 50
x∗1 = 1.315, x∗2 = 1.21, x∗3 = 1.2
Now AY = b gives
1 0 −3 1 y1 0
−1 1 1 1 y2 0
0 −2 1 1 y3 = 0
1 1 1 1 y4 1
which leads to the following system of equations
y1 − 3y3 + y4 = 0 (11.1.5)
−y1 + y2 + y3 + y4 = 0 (11.1.6)
−2y2 + y3 + y4 = 0 (11.1.7)
y1 + y2 + y3 + y4 = 1 (11.1.8)
11.1. GEOMETRIC PROGRAMMING 91
−y1 + y2 + y3 + y4 − y1 − y2 − y3 − y4 = −1
1
⇒ −2y1 = −1 ⇒ y1 =
2
−y1 + y2 + y3 + y4 + 2y2 − y3 − y4 = 0
⇒ −y1 + 3y2 = 0 ⇒ 3y2 = y1
1 1
⇒ 3y2 = ⇒ y2 = .
2 6
y1 − 3y3 + y4 + 2y2 − y3 − y4 = 0
⇒ y1 + 2y2 − 4y3 = 0 ⇒ 4y3 = y1 + 2y2
1 1 5 5
⇒ 4y3 = + ⇒ 4y3 = ⇒ y3 = .
2 3 6 24
Now,
y4 = 1 − (y1 + y2 + y3 )
1 1 5
= 1− + +
2 6 24
12 + 4 + 5
= 1−
24
21
= 1−
24
3
=
24
1 1 5 3
∴ y1 = , y2 = , y3 = , y4 =
2 6 24 24
Now
m
Y
cj (xi )aij = yj f ∗ (x)
i=1
1 761
∴ 7x1 x−1
2 = ×
2 50
761
⇒ x1 x−1
2 = (11.1.9)
700
−2 1 761
and 3x2 x3 = ×
6 50
761
⇒ x2 x−2
3 =
900
−3 5 761
5x1 x2 x3 = ×
24 50
761
⇒ x−3
1 x2 x3 = (11.1.10)
1200
3 761
and x1 x2 x3 = ×
24 50
761
⇒ x1 x2 x3 = (11.1.11)
400
Now (11.1.10) and (11.1.11) gives
x−3
1 x2 x3 761/1200
=
x1 x2 x3 761/400
1
⇒ x−41 =
3
−1/4
1
x1 = = 31/4 = 1.316.
3
∴ x∗1 = 1.316
Now from, (11.1.9) we get
761
x1 x−1
2 =
700
761 1
⇒ x−1
2 = ×
700 x1
700
⇒ x2 = × x1
761
700
⇒ x2 = × 1.3616
761
⇒ x2 = 1.21
∴ x∗2 = 1.21
Now, from (11.1.9) we get
761
x2 x−2
3 =
900
900
x23 = x2
761
900 √
r
x3 = × 1.21 = 1.2
761
∴ x∗3 = 1.2
11.1. GEOMETRIC PROGRAMMING 93
Since, maximization of f (y) is equivalent to log f (y), taking log both sides we have
log f (y) = 0.5(1 − 3y4 ){log 10 − log(1 − 3y4 )} + 0.5(1 − y4 ){log 4 − log(1 − y4 )}
+y5 (log 5 − log y4 ) + y4 {log 1 − log y4 } (11.1.12)
The value of y4 maximizing log f (y) must be unique, because the primal problem has a unique minimum.
Differentiating (11.1.12) with respect to y4 and equating to zero, we have
∂ 3 3 3
f (y) = − log 10 − − + − log(1 − 3y4 )
∂y4 2 2 2
1 1 1
− log 4 − {− + − log(1 − y4 )}
2 2 2
+ log 5 − {1 + log y4 } + log 1 − {1 + log y4 } = 0
x−1
2 = 0.42 × 9.661
1
⇒ x∗2 = = 0.647
0.16 × 9.661
Unit 12
Course Structure
• Constraint Geometric Programming Problem
min z = f (x)
P (i)
X
such that gi (x) = cij uir (x) = 1, i = 1, 2, . . . , M.
r=1
n
Y
where P (i) denotes the number of terms in the i-th constraint and uir (x) = (xi )airj .
j=1
∂F
(ii) = gi (x) − 1 = 0; i = 1, 2, . . . , M.
∂λi
So, long as right hand side in the second constraint gi (x) = 1, it can be obtained in this form by simple
transformation. However, gi (x) = 0 is not admissible because solution space required x > 0. Considering
once again condition (i), we have
n M P (i)
∂F X cj atj cj (x) X X cir airt uir (x)
= + λi .
∂xt xt xt
j=1 i=1 r=1
95
96 UNIT 12.
cj uj (x)
We have seen in earlier discussion that yj were all positive, because yj = > 0. However, in the
f ∗ (x)
equality constraint case, yj are again positive. But, yir may be negative because λi need not be non-negative.
To formulate a dual function it is desirable to all yir > 0. But if one of the yir is negative, then its sign
can be reversed by writing the term in the Lagrange function as λq {1 − gq (x)}. Once again normality and
orthogonality conditions can be derived by solving a system of linear equations
n
X
atj yj
j=1
When these equations have a unique solution, the optimal of the original problem can be obtained from the
definition of yj and yir in terms of f ∗ (x) and x. In case, these equations have an infinite number of solution,
we tend to maximize the dual function given by
n M
yj Y P (i) yrj YM
Y cj Y crj
max f (y) = (vi )
yj yij
j=1 i=1 r=1 i=1
P (i)
X
where vi = yir such that the orthogonality and normality constraints.
r=1
In the above functions the constraints are linear and therefore it is easy to obtain the optimal solution.
Moreover, we may also work with log of the dual function which is linear in the variable δi = log yj and
δir = log yir .
Example 12.1.1. Solve the following NLPP by G.P.
32
min f (x) = 2x1 x−3 −1 −2
2 + 4x1 x2 + x1 x2
3
such that x−1 2
1 x2 = 0
x1 , x2 ≥ 0.
Solution. Given problem derive as
32
min f (x) = 2x1 x−3 −1 −2
2 + 4x1 x2 + x1 x2
3
such that 0.1x−1 2
1 x2 = 1
x1 , x2 ≥ 0.
12.1. CONSTRAINT GEOMETRIC PROGRAMMING PROBLEM 97
Dual problem:
y1 y2 y3 y4
2 4 32 0.1
max f (y) = (y4 )y4
y1 y2 3y3 y4
such that
y1 + y2 + y3 = 1
y1 − y2 + y3 − y4 = 0
−3y1 − 2y2 + y3 + 2y4 = 0
Expressing each of the variable in the objective function in terms of y1 , we get
y1 !1− 4 y1 1
2 4 3
32 3 y1 8
max f (y1 ) = 4 (0.1) 3 y1 −1
y1 1 − 3 y1 y1
where
4
y2 = 1 − y1
3
y1
y3 =
3
8
y4 = y1 − 1
3
Taking log both sides of f (y1 ) and differentiating with respect to y1 , we have,
F (y1 ) = log f (y1 )
2 4 4
= y1 log + 1− y1 log 4 − log 1 − y1
y1 3 3
y1 8
{log 32 − log y1 } + y1 − 1 log(0.1)
3 3
Now,
dF 2 16 32 8
= log +2− y1 + log + log(0.1) = 0
dy1 y1 3 y1 3
⇒ y1 = 0.662
The values of the other variables are
y1 = 0.662, y2 = 0.217, y3 = 0.221, y4 = 0.766
cj uj
Using the relation yj = ∗ we obtain
f (x)
c1 u1 2x1 x−12
y1 = =
f ∗ (x) f ∗ (x)
c2 u2 4x−1
1 x2
−1
y2 = =
f ∗ (x) f ∗ (x)
c3 u3 32x1 x2
y3 = ∗ =
f (x) 3f ∗ (x)
c4 u4 x−1
1 x2
2
y4 = =
f ∗ (x) f ∗ (x)
98 UNIT 12.
1.
2.
10
min f (x) = 2x1 + 4x2 +
x1 x2
x1 , x2 ≥ 0
3.
3x1 x22
min z = + + x21 x2
x2 x1
such that
1 2 −1 1
x x + x2 x1 = 1
4 1 2 9
1 x2
2 2 +4 =2
x1 x21
x1 , x2 ≥ 0.
Unit 13
Course Structure
• Inventory Control/Problem/Model
Indirect inventory includes those items which are necessarily required for manufacturing but do not become
the component of finished products like oil, grease, lubricants, petrol, office materials, etc.
99
100 UNIT 13.
Inventory Decision
Deterministic Probabilistic
(2) Shortage or stockout cost (C2 or Cs ): The penalty cost which is incurred as a result of running out of
stock or shortage is known as shortage or stockout cost. It is usually denoted by C2 per unit of goods
for a specified period. This cost arises due to shortage of goods, sales may be lost, goodwill may be lost
and so on.
(3) Set up or ordering cost (C3 or C0 ): This includes the fixed cost associated with obtaining goods during
placing of an order or purchasing or manufacturing or setting up a machinery before starting production.
It is usually denoted by C3 or C0 per production run (cycle).
Inventory level
q = Rt
Time
t t
Let each production cycle be made at fixed interval t and therefore the quantity q already present in the
beginning should be
q = Rt, (13.2.1)
where R is a demand rate. Since, the stock in small time dt is Rt dt, therefore, the stock in total time t will be
Zt
1 1
R t dt = Rt2 = qt.
2 2
0
102 UNIT 13.
Thus,
1 1
The cost of holding inventory per production run = C1 qt = C1 Rt2 (13.2.2)
2 2
The set up cost = C3 per production run for interval t.
1
Total cost = C1 Rt2 + C3 (13.2.3)
2
Therefore, total average cost is given by
1 2
2 C1 Rt + C3 1 C3
C(t) = = C1 Rt + (Cost Equation) (13.2.4)
t 2 t
The condition of minimum or maximum of C(t),
dh i
C(t) = 0
dt
1 C3
⇒ C1 R − 2 = 0
2 r t
2C3
⇒ t∗ = (13.2.5)
C1 R
d2 2C3
Also, 2 C(t) = 3 , which is obviously positive for the value of t∗ . Hence, C(t) is minimum for optimum
dt t
time interval t∗ and optimum quantity to be produced or ordered at each interval t∗ is given by
r r
∗ ∗ 2C3 2C3 R
q = Rt = R = (13.2.6)
C1 R C1
which is called optimal lost size formula and the corresponding minimum cost
r r
∗ 1 2C3 C1 R
Cmin = RC1 + C3
2 C1 R 2C3
r r
C1 C3 R C1 C3 R
= +
2 2
p
= 2C1 C3 R per unit time.
13.2.2 Model I(b): Economic lot size with different rates of demand in different cycles
In model I(a), the total demand D is prescribed over the total period T instead of demand rate being constant
for each production cycle, that is rate of demand being different in different production cycles.
Let q be the fixed quantity produced in each production cycle. Since, D is the total demand prescribed
over the time period T , the number of production cycle will be n = D/q. Also, let the total time period
T = t1 + t2 + t3 + · · · + tn . Obviously, the carrying cost for the period T will be
1 1 1 1 1
qt1 C1 + qt2 C1 + · · · + qtn C1 = C1 q(t1 + t2 + · · · + tn ) = C1 qT
2 2 2 2 2
13.2. THE ECONOMIC ORDER QUANTITY (EOQ) MODEL WITHOUT SHORTAGE 103
D
Set up cost will be equal to C3 . Thus, we obtain the cost equation for period T .
q
1 D
C(q) = C1 qT + C3
2 q
Inventory level
1 1 1 qt 1
qt qt 2 qt
2 1 2 2 3 2 n
Time
t1 t2 t3 tn
d2 C 2C3 D
Also, = > 0, which minimizes the total cost C(q) and the corresponding minimum value will be
dq 2 q3
s s
2C3 D
1 T C1
Cmin = C1 T + C3 D
2C3 D
2 C1 T
r r
C1 C3 T D C1 C3 T D
= +
2 2
p
= 2C1 C3 DT
Note 13.2.2. Here we observed that the fixed demand rate R in model I(a) is replaced by the average demand
rate D/T .
104 UNIT 13.
Example 13.2.3. You have to supply your customer 100 units of a certain product every Monday. You
obtained the product from a local supplier at Rs. 60 per unit. The cost of ordering and transportation from the
supplier is Rs. 150 per order. The cost of carrying inventory is estimated at 15% per year of the cost of the
product carried.
(ii) Find the lot size which will minimize the cost of the system.
Solution. Here
R = 100 units/week.
C3 = 150 per order.
15 × 60
C1 = Rs. per unit per week
100 × 52
9
= Rs.
52
(i)
1 C3
C(t) = 60R + C1 Rt + .
2 t
(ii)
r
∗ 2C3 R
q =
C
r 1
2 × 150 × 100 × 52
= .
9
= 416 units
(iii)
q∗ 416
t∗ = = = 4.16 weeks
R 100
(iv)
R 100
η= = orders per week
q∗ 416
(v)
p
Cmin = 60R + 2C1 C3 R
r
9
= (60 × 100) + 2× × 150 × 100
52
= 6000 + 72
= Rs. 6072
13.2. THE ECONOMIC ORDER QUANTITY (EOQ) MODEL WITHOUT SHORTAGE 105
Example 13.2.4. An aircraft company uses rebate at an approximate customer rate of 2500 kg per year. Each
unit costs Rs. 30 per kg and the company personal estimate that it cost Rs. 130 to place an order and that
the carrying cost of inventory is 10% per year. How frequently should orders be placed? Also determine the
optimum size of each order.
Solution. Here
R = 2500 kg per year.
C3 = Rs. 130
C1 = Cost of each unit × inventory carrying cost
1
= Rs. 30 ×
30
= Rs. 3 per unit per year
r
∗ 2C3 R
q =
C
r 1
2 × 130 × 2500
=
3
= 466 units
q∗ 466
∴ t∗ = = = 0.18 year = 0.18 × 12 months = 2.16 months
R 2500
13.2.3 Model I(c): Economic lot size with finite rate of Replenishment (finite production)
[EPQ model]
Some Notations:
C1 = Holding cost per unit item per unit time.
R = Demand rate.
K = Production rate is finite, uniform and greater than R.
t = interval between production cycle.
q = Rt
In this model, each production cycle time t consists of two parts: t1 and t2 , where
(i) t1 is the period during which the stock is growing up at a rate of (K − R) items per unit time.
(ii) t2 is the period during which there is supply but there is only a constant demand at the rate of R.
It is evident from the graphical situation (see fig. 13.1) that
Q Q
t1 = and t2 =
K −R R
t = t1 + t2
Q Q
= +
K −R R
QK
=
R(K − R)
106 UNIT 13.
Inventory level
R
Q
R
K-
Q Q
K-R R
Time
t1 t2
t
Figure 13.1
which gives
K −R
Q = Rt
K
K −R
= q [∵ q = Rt]
K
1
Now, Holding cost for the time period t is C1 Qt and the set up cost for period t is C3 .
2
∴ The total average cost is
1
2 C1 Qt + C3
C(t) =
t
1 K −R R
C(q) = C1 q + C3 [∵ q = Rt] (13.2.7)
2 K q
For optimum value of q, we have
dC
=0
dq
1 R C3 R
⇒ 1− C1 − 2 = 0
2 K q
s s
2C3 RK 2C3 R
⇒q= = R
C1 (K − R) C1 1 − K
d2 C 2C3 R
Now, 2
= >0
dq q3
s
2C3 R
∴ q∗ = R
(optimal lot size)
C1 1 − K
s
∗ q∗ 2C3
and t = = R
R C1 R 1 − K
13.2. THE ECONOMIC ORDER QUANTITY (EOQ) MODEL WITHOUT SHORTAGE 107
Note 13.2.5. 1. If K = R, Cmin = 0, which implies that there will be no carrying cost and set up cost.
2. If K → ∞, i.e., production rate is infinite, then this model becomes exactly same as Model I(a).
Example 13.2.6. A contractor has to supply 10,000 bearings per day to an auto-mobile manufacturer. He
finds that when he starts a production run, he can produce 25,000 bearings per day. The cost of holding a
bearing in stock for one year is 20 paisa and set up cost of a production run is Rs. 180. How frequently (time)
should production run be made?
Solution.
Course Structure
• Model II(a): EOQ model with constant rate of demand scheduling time constant.
• Model II(b): EOQ model with constant rate of demand scheduling time variable.
14.1 Model II(a) : EOQ model with constant rate of demand scheduling time
constant
Model II is the extension of Model I allowing shortages.
Inventory level
z
z
Stock tp R
z
R B D
q Time
p
O
Shortage
C
tp
108
14.1. MODEL II(A) : EOQ MODEL WITH CONSTANT RATE OF DEMAND SCHEDULING TIME CONSTANT109
Some Notations:
C1 = Holding cost
C2 = Shortage cost
R = Demand rate
tp = Scheduling time period is constant
qp = Fixed lot size (Rtp )
z = Order level to which the inventory raised in the beginning of each scheduling period.
In this model, we can easily observe that the inventory carrying cost C1 and also the shortage cost C2 will
be involved only when 0 ≤ z ≤ qp .
1 z 2 C1 1 C2 C3
Total average cost is C(z) = + (qp − z)2 +
2 qp 2 qp tp
C3
Note 14.1.1. Since, the set up cost C3 and period tp are constant, the average set up cost also being
tp
constant, will be considered in the cost equation.
Now
dC 1 C1 1 C2
= · · 2z + 2(qp − z)(−1) = 0
dz 2 qp 2 qp
C2 C2
⇒z= qp = Rtp .
C1 + C2 C1 + C2
d2 C C1 C2 C1 + C2
2
= + = > 0.
dz qp qp qp
C2
∴ z∗ = Rtp
C1 + C2
C1 C2
Cmin = Rtp .
2(C1 + C2 )
110 UNIT 14.
14.2 Model II(b) : EOQ model with constant rate of demand scheduling time
variable
Assumptions:
(iii) q = Rt.
Formulate the model. Show that the optimal order quantity per run which minimizes the total cost is
s
2RC3 (C1 + C2 )
q=
C1 C2
Since, all the assumptions in this model are same as in Model II(a), except with the difference that the schedul-
C3
ing time period t is not constant here, so, it now becomes important to consider the average set up cost in
t
the cost equation.
C1 z 2 1 C2 C3
C(z, t) = + (Rt − z)2 + .
2Rt 2 Rt t
∂C ∂C
For the optimization, = 0 and = 0 which gives
∂z ∂t
1 2C1 z 2C3
− (Rt − z) = 0
t 2R 2R
C2 Rt
∴z=
C1 + C2
Now
1 C1 z 2
C2 2 1 C2
− 2 + (Rt − z) + C3 + 0+ 2(Rt − z) + 0 = 0
t 2R 2R t 2R
1 C1 z 2
C2 2 C2
⇒ − 2 + (Rt − z) + C3 + (Rt − z) = 0
t 2R 2R t
−(C1 + C2 )z 2 + C2 R2 t2 = 2RC3
14.2. MODEL II(B) : EOQ MODEL WITH CONSTANT RATE OF DEMAND SCHEDULING TIME VARIABLE111
√
Further, it is interesting to note that the minimum cost is less than that already given by Model I(a) 2C1 C3 R.
(Draw figure as like Model II(a) replaced by t).
A
K-R
Q1
t1 t2 C D E
Time
O Q1 B Q1 t3 t4
K-R R
Q2
(vi) Shortage cost is Rs. C2 per quantity unit per unit time.
1
Holding Cost = C1 × ∆OAC = C1 × Q1 (t1 + t2 ).
2
1
Shortage Cost = C2 Q2 (t3 + t4 )
2
and Set up cost C3 .
14.3. MODEL II(C) : EPQ MODEL WITH SHORTAGES 113
Q2
Q2 = Rt3 , t4 = .
K −R
Rt3
Q2 = (K − R)t4 = .
K −R
Finally,
q = Rt = R(t1 + t2 + t3 + t4 )
Rt2 Rt3
= R + t2 + t3 +
K −R K −R
(t2 + t3 )KR
=
nK − R o
1 Rt2 Rt3
2 C 1 (Rt 2 ) K−R + t2 + C 3 Rt3 t3 + K−R + C3
C = Rt2 Rt3
K−R + t2 + t3 + K−R
n 2
C2 t23 RK
o
1 C1 t2 RK
2 K−R + K−R + C3
=
R
(t2 + t3 ) 1 + K−R
1 2 + C t2 ) RK
2 (C 1 t2 2 3 K−R + C3
=
K
(t2 + t3 ) K−R
1 2
2 (C1 t2 + C2 t23 )RK + C3 (K − R)
= .
K(t2 + t3 )
∂C ∂C
= 0, = 0,
∂t2 ∂t3
s s
2C3 C2 (1 − R/K) 2RC3 (C1 + C2 ) 1
t∗2 = , ∗
q =
(R(C1 + C2 )C1 (C1 C2 1 − R/K
s s
2C3 C1 (1 − R/K) 2RC1 C2 C3 (1 − R/K)
t∗3 = , Cmin =
(R(C1 + C2 )C2 C1 + C2
1 2
2 (C1 t2 + C2 t23 )RK + C3 (K − R)
C= .
K(t2 + t3 )
114 UNIT 14.
Now,
∂C
= 0.
∂t2
K(t2 + t3 ) 12 C1 × 2t2 RK − 12 (C1 t22 + C2 t23 )RK + C3 (K − R) K
⇒ =0
K 2 (t2 + t3 )2
1
⇒ K(t2 + t3 ) · C1 t2 RK − (C1 t22 + C2 t23 )RK + C3 (K − R) K = 0
2
1 1
⇒ C1 t22 RK 2 + C1 t2 t3 RK 2 − C1 t22 RK 2 − C2 t23 RK 2 − C3 K(K − R) = 0
2 2
1 1
⇒ C1 t22 RK 2 + C1 t2 t3 RK 2 − C2 t23 RK 2 − C3 K(K − R) = 0
2 2
1 1
⇒ C1 t2 RK + C1 t2 t3 RK − C2 t23 RK 2 = C3 K(K − R)
2 2 2
2 2
1
⇒ RK (C1 t2 + 2C1 t2 t3 − C2 t23 ) = C3 K(K − R)
2 2
2
2C3 (1 − R/K)
⇒ C1 t22 + 2C1 t2 t3 − C2 t23 =
R
2C3 (1 − R/K)
⇒ C1 t22 + 2C1 t2 t3 + C1 t23 − C1 t23 − C2 t23 =
R
2C 3 (1 − R/K)
⇒ C1 (t2 + t3 )2 − t23 (C1 + C2 ) =
R
2C 3 (1 − R/K)
⇒ C1 (t2 + t3 )2 = + t23 (C1 + C2 )
R
2C3 (1 − R/K) t23 (C1 + C2 )
⇒ (t2 + t3 )2 = +
RC1 C1
s
2
2C3 (1 − R/K) t3 (C1 + C2 )
⇒ t2 + t3 = + .
RC1 C1
Also,
1 1 2 2
K(t2 + t3 ) × C2 × 2C3 RK − (C1 t2 + C2 t3 )RK + C3 (K − R) K = 0
2 2
1 1
⇒ C2 t2 t3 RK 2 + C2 t23 RK 2 − C1 t22 RK 2 − C2 t23 RK 2 − C3 (K − R)K = 0
2 2
1 2 2 2 1 2 2
⇒ C2 t3 RK + C2 t2 t3 RK − C1 t2 RK = C3 (K − R)K
2 2
2C3 (1 − R/K)
⇒ C2 t23 + 2C2 t2 t3 + C2 t22 − (C1 + C2 )t22 =
R
2C 3 (1 − R/K)
⇒ C2 (t2 + t3 )2 − (C1 + C2 )t22 = .
R
14.3. MODEL II(C) : EPQ MODEL WITH SHORTAGES 115
Now,
2C3 (1 − R/K)
C1 (t2 + t3 )2 − (C1 + C2 )t23 =
R
2 2 2C3 (1 − R/K)
C2 (t2 + t3 ) − (C1 + C2 )t2 =
R
⇒ C1 (t2 + t3 ) − (C1 + C2 )t3 = C2 (t2 + t3 )2 − (C1 + C2 )t22
2 2
Thus,
2C3 (1 − R/K)
C2 (t2 + t3 )2 − (C1 + C2 )t22 =
R
2 2
C2 C2 2C3 (1 − R/K)
⇒ C2 t3 + t3 − (C1 + C2 ) t3 =
C1 C1 R
2
C2
C2 2C3 (1 − R/K)
⇒ C2 + 1 t23 − (C1 + C2 ) 22 t23 =
C1 C1 R
C2 (C2 + C1 )2 2 (C1 + C2 )C22 2 2C3 (1 − R/K)
⇒ t3 − t3 =
C12 C12 R
2
t3 2C3 (1 − R/K)
(C1 + C2 ) C2 (C1 + C2 ) − C22 =
⇒ 2
C1 R
2
t3 (C1 + C2 ) 2C3 (1 − R/K)
⇒ 2 C1 C2 =
C1 R
t23 (C1 + C2 ) 2C3 (1 − R/K)
⇒ C2 =
C1 R
2C 1 C 3 (1 − R/K)
⇒ t23 =
R(C1 + C2 )C2
s
2C1 C3 (1 − R/K)
⇒ t∗3 = .
R(C1 + C2 )C2
116 UNIT 14.
Now,
C2 ∗
t∗2 = t
C1 3
s
C2 2C1 C3 (1 − R/K)
=
C1 R(C1 + C2 )C2
s
2C1 C3 (1 − R/K)C22
=
C12 R(C1 + C2 )C2
s
2C2 C3 (1 − R/K)
= .
R(C1 + C2 )C1
Now,
"s s #
KR 2C2 C3 (1 − R/K) 2C1 C3 (1 − R/K)
q∗ = +
K −R R(C1 + C2 )C1 R(C1 + C2 )C2
"s (r r )#
R 2C3 (1 − R/K) C2 C1
= +
(1 − R/K) R(C1 + C2 ) C1 C2
s
2C3 (C1 + C2 )R
= × √
R(C1 + C2 )(1 − R/K) C1 C2
s
2C3 (C1 + C2 )2 R2
=
R(C1 + C2 )C1 C2 (1 − R/K)
s s
2RC3 (C1 + C2 ) 2RC3 (C1 + C2 ) 1
= = .
C1 C2 (1 − R/K) C1 C2 1 − R/K
So,
1 ∗2 + C2 t∗2
2 (C1 t2 3 )RK + C3 (K − R)
Cmin =
K(t∗2 + t∗3 )
h i
1 2C1 C2 C3 (1−R/K) 2C1 C2 C3 (1−R/K)
2 R(C1 +C2 )C1 + R(C1 +C2 )C2 + C3 (K − R)
= q
2C3 (1−R/K) (C1 +C2 )
K R(C1 +C2 ) · √
C1 C2
h i
C2 C3 (1−R/K) C1 C3 (1−R/K) C3 (K−R)
R(C1 +C2 ) + R(C1 +C2 ) + 2
= q
2C3 (1−K/R)(C1 +C2 )
K RC1 C2
2C2 C3 (1 − R/K) + 2C1 C3 (1 − R/K) + C3 K(1 − R/K)R(C1 + C2 )
= q .
2C3 (1−R/K)(C1 +C2 )
K RC1 C2
Example 14.3.1. The demand of an item is uniform at a rate of 25 units per month. The fixed cost is Rs. 15
each time a production run is made (Setup cost). The production cost if Rs. 1 per item and inventory carrying
cost is Rs. 0.30 per item per month. If the shortage cost is Rs. 1.50 per item per month, determine how often
to make a production run and of what size it should be?
14.3. MODEL II(C) : EPQ MODEL WITH SHORTAGES 117
Solution. We have,
Thus,
C1 = Rs. 0.30 per item per month.
Here, the demand of an item is uniform. So,
s s
2RC 3 (C 1 + C2 ) 2 × 25 × 15 × (0.30 + 1.50)
q∗ = = ≈ 54 units.
C1 C2 0.30 × 1.50)
and s r
∗ 2C3 (C1 + C2 ) 2 × 15 × (0.30 + 1.50)
t = = = 2.19 months.
RC1 C2 25 × 0.30 × 1.50
Unit 15
Course Structure
• Model III: Multi-item inventory model
2. R1i is the uniform demand rate for the ith item (i = 1, 2, . . . , n).
1 C3i
Ci (t) = C1i Ri t + ,
2 t
1 C3i Ri
or, Ci (qi ) = C1i qi + (15.1.1)
2 qi
118
15.1. MODEL III: MULTI-ITEM INVENTORY MODEL 119
∂C
=0
∂qi
1 C3i Ri
⇒ C1i − =0
2 qi2
r
2c3i Ri
⇒ qi = .
C1i
Thus,
∂2C
> 0, ∀ qi .
∂qi2
The total cost is minimum. Hence, the optimum cost of
r
2C3i Ri
qi∗ = , (i = 1, 2, . . . , n) (15.1.3)
C1i
1. limitation on investment
Now, our problem is to minimize the total cost C given by equation (15.1.2) subject to the constraint (15.1.4).
In this situation, two cases may arise.
n r
X 2C3i Ri
Case I: When C4i qi ≤ M and qi∗ = .
C1i
i=1
In this case, there is no difficulty and hence qi∗ is the optimal solution.
n r
X 2C3i Ri
Case II: When C4i qi > M and qi∗ = .
C1i
i=1
In this case, qi∗ are not required optimal solutions. Thus, we shall use the Lagrange’s multiplier tech-
nique.
n n
!
X 1 C3i Ri X
L= C1i qi + +λ C4i qi − M .
2 qi
i=1 i=1
Example 15.1.1. Consider a shop producing three items, the items are produced in lots. The demand rate for
each item is constant and can be assumed to be deterministic. No back order (shortages) are allowed. The
following data are given below.
Item 1 2 3
H.C 20 20 20
S.C 50 40 60
Cost per unit item 6 7 5
Yearly demand rate 10,000 12,000 7,500
Determine approximately the EOQ when the total value of average inventory levels of three items if Rs.
1,000.
Solution.
√
r r
2C31 R 2 × 50 × 10, 000
q1∗ = = = 100 5 ≈ 223
C 20
∗
√ 11
q2 = 40 30 ≈ 216
√
q3∗ = 150 2 ≈ 210.
Since the average optimal inventory at any time is qi∗ /2, the investment over the average inventory is obtained
by replacing qi by qi∗ /2, that is,
n
X 1 ∗ 223 216 210
C4i q = Rs. 6× +7× +5× = Rs. 1950.
2 i 2 2 2
i=1
We observe that the amount of Rs. 1950 is greater than the upper limit of Rs. 1000. Thus, we try to find the
suitable value of λ by trial and error method for computing qi∗ .
If we put λ = 4, we get
r
∗ 2 × 50 × 10, 000
q1 = = 121
20 + 2 × 4 × 6
q2∗ = 112
q3∗ = 123.
15.1. MODEL III: MULTI-ITEM INVENTORY MODEL 121
q1∗ = 111
q2∗ = 102
q3∗ = 113.
and
Corresponding cost = Rs. 972.50
which is less than Rs. 1000.
From this, we conclude that, the most suitable value of λ lies between 4 and 5.
1112.5
1000
972.5
4 5
To find the most suitable value of λ, we draw a graph between cost and the value of λ as shown in the figure.
This graph indicates that λ = 4.7 is the most suitable value corresponding to which the cost of inventory is
Rs. 999.5, which is sufficiently close to Rs. 1000. Hence, for λ = 4.7, we obtain
q1∗ = 114
q2∗ = 105
q3∗ = 116.
n
1X
Case II: qi > N , then qi∗ are not the required values. So, we use Lagrange’s multiplier technique. Here,
2
i=1
Lagrangian function
n n
!
X 1 C3i Ri 1X
L= C1i qi + +λ qi − N
2 qi 2
i=1 i=1
∂L 1 C3i Ri λ
= C1i − + =0
∂qi 2 qi2 2
n
∂L 1X
= qi − N = 0, i = 1, 2, . . . , n.
∂λ 2
i=1
Solving, we get
r
2C3i Ri
qi∗ =
C1i + λ
n
1X
qi = N.
2
i=1
To obtain the value of qi∗ , we obtain the value of λ by successive trial and error method and satisfying
the given constraint in equality sign.
Example 15.1.2. A company producing three items have a limited storage space of 750 items of all types in
average. Determine the optimal production quantity for each item separately when the following information
is given
Product 1 2 3
H.S(Rs.) 0.05 0.02 0.04
S.C(Rs.) 50 40 60
D.R(per unit) 100 120 75
Solution. We have
q1∗ = 447
q2∗ = 693
q3∗ = 464.
1
The total average inventory is = (447 + 693 + 464) = 802 units,
2
which is greater than 750 units per year. Thus, we have to find the value of the parameter λ by trial and error
method.
From these, we observe that the average inventory level is less than the available amount of items. So we
try for some other values of λ,
λ = 0.004, 0.003, 0.002, etc.
15.1. MODEL III: MULTI-ITEM INVENTORY MODEL 123
For λ = 0.002,
q1∗ = 428
q2∗ = 628
q3∗ = 444
1
Average inventory level = (428 + 628 + 444) = 750,
2
which is equivalent to the given amount of average inventory. Hence, the optimal solutions are
q1∗ = 428
q2∗ = 628
q3∗ = 444.
For λ = 0.004,
r
2 × 50 × 100
q1∗ = = 430
0.05 + 0.004
r
∗ 2 × 40 × 120
q2 = = 632
0.02 + 0.004
r
2 × 60 × 75
q1∗ = = 452
0.04 + 0.004
1
Average inventory level = (430 + 632 + 452) = 757.
2
For λ = 0.003,
r
2 × 50 × 100
q1∗ = = 434
0.05 + 0.003
r
2 × 40 × 120
q2∗ = = 646
0.02 + 0.003
r
∗ 2 × 60 × 75
q1 = = 457
0.04 + 0.003
1
Average inventory level = (434 + 646 + 457) = 768.5.
2
∂L ∂L
= 0, = 0.
∂qi ∂λ
Then, solving we have
r n
2C3i Ri X
qi∗ = , i = 1, 2, . . . , n, and ai qi∗ = A.
C1i + 2λai
i=1
The second equation implies that qi∗ must satisfy the storage constraint in equality sense. The determi-
nation of λ by usual trial and error method automatically gives the optimal value of qi∗ .
Unit 16
Course Structure
• Model IV: Deterministic inventory model with price breaks of quantity discount
16.1 Model IV: Deterministic inventory model with price breaks of quantity
discount
Notations:
Assumptions:
Determine:
125
126 UNIT 16.
We have,
q = Rt (16.1.1)
1
The number of inventories will be given by qt.
2
1 1 q q2
qt = q = (16.1.2)
2 2 R R
The number of lot of inventories will be given by
q2
1 qt 1 1q
= R = (16.1.3)
2q 2 q 2R
C3 = Setup Cost.
qP = the purchasing cost of q units.
1q
C3 I = Cost associated with setup of inventory for period t.
2R
1q
qP I = Cost associated with purchase of inventory for period t.
2R
1q 1q
C3 + qP + C3 I + qP · I
2R 2R
Hence, average cost per unit time,
1 1q 1q
C(q) = C3 + qP + C3 I + qP · I
t 2R 2R
C3 R C3 I qP I q
C(q) = + pR + + since t =
q 2 2 R
1
But the term C3 I being constant throughout the model, it maybe neglected for the purpose of minimization.
2
Therefore,
C3 R qP I
C(q) = + PR + (16.1.4)
q 2
d
For minimum value of C(q), C(q) = 0.
dq
d
C(q) = 0
dq
−C3 R 1
⇒ + PI = 0
q2 2
r
2C3 R
⇒ q∗ = (16.1.5)
IP
Therefore,
C(q ∗ ) =
p
2C3 RP I + P R (16.1.6)
16.1. MODEL IV: DETERMINISTIC INVENTORY MODEL WITH PRICE BREAKS OF QUANTITY DISCOUNT127
16.1.1 Model IV(a): Purchase inventory model with one price break
Consider the table 16.1
where b is the quantity at and beyond which the quantity discount applies. Obviously, P2 < P1 . For any
purchase quantity q1 in the range 1 ≤ q1 < b,
C3 R P1 q1 I
C(q1 ) = + P1 R + (16.1.7)
q1 2
Similarly, for q2 ,
C3 R P2 q2 I
C(q2 ) = + P2 R + (16.1.8)
q2 2
Rule I Compute q2∗ , using (16.1.5). If q2 ≥ b, then the optimum lot size will be q2∗ .
Rule II If, q2 < b, then the quantity discount no longer applies to the purchase quantity q2∗ . Compute q1∗ , then
compare C(q1∗ ) and C(b) given by,
C3 R P1 q1∗ I
C(q1∗ ) = + P 1 R +
q1∗ 2
C3 R P2 Ib
C(b) = + + P2 R
b 2
It shows that,
C3 R C3 R
+ P2 R < ∗ + P1 R [since q1∗ < b and P2 < P1 ]
b q1
P2 Ib P1 Iq1∗
However, may or may not be less than . Hence, we must compare the total cost. So, q ∗ = b.
2 2
Example 16.1.1. Find the optimum order quantity for a product for which the price breaks are as follows:
The monthly demand for a product is 200 units, the cost of storage is 2% of unit cost and cost of ordering
is Rs. 350.
Solution.
R = 200 units per month
I = Rs. 0.02
C3 = Rs. 350
P1 = Rs. 10.00
P2 = Rs. 9.25
r r
∗ 2C3 R 2 × 350 × 200
q2 = = = 870 units > b = 500.
P2 I 9.25 × 0.02
128 UNIT 16.
Since q2∗ = 870 lies within the range q2 ≥ 500, hence the optimum purchase quantity will be q2∗ = 870
units.
Example 16.1.2. Same as the previous example with C3 = Rs. 100. Thus,
r r
2C3 R 2 × 100 × 200
q2∗ = = = 447 units < b = 500.
P2 I 9.25 × 0.02
Then compare C(447) with C(500), that is, the optimum cost of procuring the least quantity which will entitle
or price break, that is,
16.1.2 Model IV(b): Purchase inventory model with two price breaks
Table 16.2
Consider the table 16.2, where b1 and b2 are the quantities which determine the price breaks. The working
rule is as follows:
Step 2: Compute q2∗ , since q3∗ < b2 and q2∗ is also less than b2 because q1∗ < q2∗ < . . . < qn∗ in general. Thus,
there are only two possibilities when q2∗ < b2 , that is, either q2∗ ≥ b1 or q2∗ < b1 .
(i) When q2∗ < b2 but ≥ b1 , then proceed as in case of one price- break only, that is, compare the cost
C(q2∗ ) and C(b2 ) to obtain the optimum purchase quantity.
The quantity with least cost will naturally be optimum.
(ii) If q2∗ < b2 and b1 , then go to step 3.
Step 3: If q2∗ < b2 (and b1 both). Then compute q1∗ which will satisfy the inequality q1∗ < b1 . In this case,
compare the cost C(q1∗ ) with C(b1 ) and C(b2 ) both to determine the optimum purchase quantity.
Example 16.1.3. Find the optimum order quantity for a product for which the price breaks are in table 16.3.
The monthly demand for a product is 200 units. The cost of storage is 2% of the unit cost. Cost of ordering is
Rs. 350.
16.2. PROBABILISTIC INVENTORY MODEL 129
Solution.
Thus, C(750) < C(500) < C(q1∗ ). This shows that, the optimum purchase quantity is q ∗ = 750 units.
(i) t is the constant interval between orders. (daily, monthly, weekly, etc.)
(iii) d is the estimated (random) demand at a discontinuous rate with probability P(d)
Inventory Inventory
z
Stock
z-d z
No Shortage
Time Time
d<z d
(Over Supply)
Shortage
d-z
No Stock
In the model with instantaneous demand, it is assumed that the total demand is fulfilled at the beginning of
the period. Thus, depending on the demanded amount the inventory position may either be positive (surplus
or stock) or negative (shortage).
Case I: d ≤ z
To get the expected cost, we have to multiply the cost by given probability P (d). Further to get the total
expected cost we must sum over all the expected cost. So, the total expected cost per unit time is,
z
X ∞
X z
X ∞
X
C (z) = (z − d) C1 P (d) + C1 · 0 · P (d) + C2 · 0 · P (d) + C2 · (d − z) P (d)
d=0 d=z+1 d=0 d=z+1
Xz X∞
= (z − d) C1 P (d) + C2 · (d − z) P (d) (16.2.1)
d=0 d=z+1
16.2. PROBABILISTIC INVENTORY MODEL 131
But, we can difference (16.2.1) under the summation sign for d = z + 1, the following condition satisfied
Now,
z
X ∞
X
∆C(z) = C1 [((z + 1) − d) − (z − d)]P (d) + C2 [(d − (z + 1)) − (d − z)]P (d)
d=0 d=z+1
Xz ∞
X
= C1 P (d) − C2 P (d)
d=0 d=z+1
z
"∞ z
#
X X X
= C1 P (d) − C2 P (d) − P (d)
d=0 d=0 d=0
z ∞
" #
X X
= (C1 + C2 ) P (d) − C2 . since P (d) = 1
d=0 d=0
∆C(z) > 0
z
X
⇒ (C1 + C2 ) P (d) − C2 > 0
d=0
z
X C2
⇒ P (d) > (16.2.3)
C1 + C2
d=0
Similarly,
∆C(z − 1) < 0
z−1
X C2
P (d) < .
C1 + C2
d=0
Combining, we get
z−1 z
X C2 X
P (d) < < P (d). (16.2.4)
C1 + C2
d=0 d=0
Example 16.2.1. (Newspaper boy problem) A newspaper boy buys papers for Rs. 2.60 each and sells
them for Rs. 3.60 each. He can not return unsold newspapers. Daily demand has the following probability
distribution (Table 16.4).
No. of customers : 23 24 25 26 27 28 29 30 31 32
Probability : 0.01 0.03 0.06 0.10 0.20 0.25 0.15 0.10 0.05 0.05
Table 16.4
If each day, demand is independent of the previous days, how many papers should be ordered each day?
132 UNIT 16.
Solution. Let z=The number of newspapers ordered per day and d=demand that is, the number that could be
sold per day if z ≥ d, P (d)=The probability that the demand will be equal to on a randomly selected day,
If the demand d exceeds z, his profit would become equal to (C2 − C1 )z, and no newspaper will be let unsold.
On the other hand, if d does not exceed z, his profit becomes equal to (C2 − C1 )d − (z − d)C1 , where
(C2 − C1 )d is for the sold papers and (z − d)C1 for the unsold papers. Then the expected net profit per day
becomes equal to
Xz X∞
P (z) = (C2 d − C1 z)P (d) + (C2 − C1 )zP (d).
d=0 d=z+1
where, (C2 d − C1 z)P (d) is for d ≤ z and (C2 − C1 )zP (d) for d > z.
Using finite difference calculus, we know that the condition for maximum value of P (z) is
z
X ∞
X
∆P (z) = [{C2 d − C1 (z + 1)} − (C2 d − C1 z)] P (d) + (C2 − C1 ){(z + 1) − z}P (d)
d=0 d=z+1
z
X ∞
X
= −C1 P (d) + (C2 − C1 ) P (d)
d=0 d=z+1
z
(∞ z
)
X X X
= −C1 P (d) + (C2 − C1 ) P (d) − P (d)
d=0 d=0 d=0
z z
( )
X X
= −C1 P (d) + (C2 − C1 ) 1 − P (d)
d=0 d=0
Xz
= −C2 P (d) + (C2 − C1 ).
d=0
∆P (z) < 0
Xz
or, −C2 P (d) + (C2 − C1 ) < 0
d=0
z
X C2 − C1
or, P (d) > . (16.2.5)
C2
d=0
In this problem, C1 = Rs. 2.60, C2 = Rs. 3.60. The lower limit for demand d is 23 and upper limit is 32.
Therefore, substituting these values in (16.2.5), we get,
z
X 3.60 − 2.60
P (d) > = 0.28.
3.60
d=0
Now, we can easily verify that this inequality holds for z = 27, that is,
27
X
P (d) = P (23) + P (24) + P (25) + P (26) + P (27)
d=23
= 0.01 + 0.03 + 0.06 + 0.10 + 0.20 = 0.40 > 0.28.
Similarly,
26
X
P (d) = 0.20 < 0.28.
d=23
Continuous Case
This model is same as the previous model except that the stock levels are now assumed to be continuous quan-
tities. So, instead of probability P (d), we shall have f (x)dx and in place of summation, we take integration,
where f (x) is the pdf (probability density function). The cost equation for this model becomes
Z z Z ∞
C(z) = C1 (z − x)f (x)dx + C2 (x − z)f (x)dx. (16.2.6)
0 z
dC
The optimal value of z is obtained by equating z to zero the first derivative of c(z), that is, = 0.
dz
Differentiating (16.2.6), we get,
dx z dx ∞
Z z Z ∞
dC
= C1 (1 − 0)f (x)dx + C1 (z − x)f (x) + C2 (0 − 1)f (x)dx + C2 (x − z)f (x)
dz 0 dz 0 z dz z
Z z Z ∞
= C1 f (x)dx − C2 f (x)dx
Z0 z 0 Z z
= C1 f (x)dx − C2 1 − f (x)dx
0 0
Z z
= (C1 + C2 ) f (x)dx − C2 .
0
Thus,
dC
=0
dz Z z
⇒ (C1 + C2 ) f (x)dx − C2
0
Z z
C2
⇒ f (x)dx =
0 C1 + C2
2 dx z
d C
= (C 1 + C 2 ) f (x) = (C1 + C2 )f (x) > 0.
dz 2 dz 0
Hence, we can get optimum value of z satisfying the sufficient condition for which the total expected cost C
is minimum.
134 UNIT 16.
Example 16.2.2. A baking company sells cake by the kg weight, it makes a profit of Rs 5.00 per kg on each
kg sold on the day it is baked. It disposes off all cakes not sold on the day it is baked at a loss of Rs. 1.20 per
kg. If demand is known to be rectangular between 2000 and 3000 kgs, determine the optimal daily amount
baked.
Solution.
where, Z x2
f (x)dx = the probability of an order within x1 to x2 .
x1
Case I: If x ≤ z, then clearly the demand x is satisfied and unsold (z − x) quantities are returned with a loss of
C2 per kg, so, profit is C1 x and loss is C2 (z − x). Hence the net profit becomes, C1 x − C2 (z − x) =
(C1 + C2 )x − C2 z.
Case II: If x > z, then the net profit becomes C1 z. Thus, the total expected profit is given by
P(z) = \int_{x_1}^{z} [(C_1 + C_2)x - C_2 z] f(x)\,dx + \int_z^{x_2} C_1 z f(x)\,dx = P_1(z) + P_2(z) \quad \text{(say)}.
For a maximum,
\frac{dP(z)}{dz} = \frac{d}{dz}P_1(z) + \frac{d}{dz}P_2(z) = 0.
Now,
P_1(z) = \int_{x_1}^{z} [(C_1 + C_2)x - C_2 z] f(x)\,dx,
\frac{d}{dz}P_1(z) = \int_{x_1}^{z} (0 - C_2) f(x)\,dx + \{(C_1 + C_2)x - C_2 z\} f(x)\Big|_{x=z}
= -C_2 \int_{x_1}^{z} f(x)\,dx + C_1 z f(z).
Similarly,
\frac{d}{dz}P_2(z) = \int_z^{x_2} C_1 f(x)\,dx - C_1 z f(x)\Big|_{x=z}
= \int_z^{x_2} C_1 f(x)\,dx - C_1 z f(z).
Hence, we have
\frac{dP(z)}{dz} = -C_2 \int_{x_1}^{z} f(x)\,dx + C_1 z f(z) + C_1 \int_z^{x_2} f(x)\,dx - C_1 z f(z) = 0
\Rightarrow -C_2 \int_{x_1}^{z} f(x)\,dx + C_1 \int_z^{x_2} f(x)\,dx = 0
\Rightarrow -C_2 \int_{x_1}^{z} f(x)\,dx + C_1 \left[ \int_{x_1}^{x_2} f(x)\,dx - \int_{x_1}^{z} f(x)\,dx \right] = 0
\Rightarrow -(C_1 + C_2) \int_{x_1}^{z} f(x)\,dx + C_1 = 0
\Rightarrow \int_{x_1}^{z} f(x)\,dx = \frac{C_1}{C_1 + C_2}. \qquad (16.2.7)
Also,
\frac{d^2 P(z)}{dz^2} = -(C_1 + C_2) f(z) < 0,
which satisfies the sufficient condition for a maximum of P(z).
In this problem,
C_1 = Rs. 5.00, \quad C_2 = Rs. 1.20, \quad x_1 = 2000, \quad x_2 = 3000,
f(x) = \frac{1}{x_2 - x_1} = \frac{1}{1000}.
Substituting these values in equation (16.2.7), we have
\int_{2000}^{z} \frac{1}{1000}\,dx = \frac{5}{5 + 1.20} \approx 0.806
\Rightarrow \frac{1}{1000}(z - 2000) \approx 0.806
\Rightarrow z \approx 2806 \text{ kg}.
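The continuous critical-fractile rule (16.2.7) for a uniform demand reduces to a one-line computation. The following sketch in Python (illustrative, using the data of the example) evaluates it directly.

# Continuous critical-fractile rule (16.2.7) for uniform demand on [x1, x2]:
#   F(z) = (z - x1)/(x2 - x1) = C1/(C1 + C2)
#   =>    z = x1 + (x2 - x1) * C1/(C1 + C2)
C1, C2 = 5.00, 1.20       # profit per kg sold, loss per kg unsold (Rs)
x1, x2 = 2000, 3000       # demand limits in kg

z = x1 + (x2 - x1) * C1 / (C1 + C2)
print(f"optimal daily baking z = {z:.1f} kg")   # about 2806.5 kg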
POST GRADUATE DEGREE PROGRAMME (CBCS) IN
MATHEMATICS
SEMESTER IV
Optional Paper
MATO 4.3
Marks : 100 (SEE : 80; IA : 20)
Mathematical Biology
(Applied Stream)
Syllabus
• Unit 1 •
Epidemic models: Simple epidemic; SIS epidemic model; SIS epidemic model with specific rate of infection,
SIS epidemic model with constant number of carriers.
• Unit 2 •
General epidemic model; Approximate solution, Recurring epidemic model.
• Unit 3 •
Stochastic epidemic models without removal, Basic system of equations and its solution.
• Unit 4 •
Stochastic epidemic models: with multiple infections; Removal; Carriers; Infectives, immigration and emi-
gration.
• Unit 5 •
Basic model for inheritance of genetic characteristics, Hardy-Weinberg law.
• Unit 6 •
Correlation between genetic composition of siblings, Bayes theorem and its applications in genetics
• Unit 7 •
Extension of basic model for inheritance of genetic characteristics, Models for genetic improvement
• Unit 8 •
Genetic Improvement through elimination of Recessives, Selection and Mutation, Alternative Discussion of
selection.
• Unit 9 •
Some basic concepts of fluid dynamics, Hagen-Poiseuille Flow, Reynolds Number of Flows, Non-Newtonian Fluids
• Unit 10 •
Basic concepts about blood, Cardiovascular system, Special Characteristics of Blood flow, Structure and me-
chanical properties of blood vessels
• Unit 11 •
Non-Newtonian Flow in Circular Tubes, Power-Law, Herschel-Bulkley and Casson fluid flow in circular tubes.
• Unit 12 •
Fahraeus-Lindqvist Effect, Pulsatile Flow in Circular Rigid Tube, Blood Flow through Artery with Mild
Stenosis.
• Unit 13 •
Peristaltic Flows in Tubes and Channel, Peristaltic Flows in Biomechanics.
• Unit 14 •
Two-dimensional Flow in Renal Tubule, Function of Renal Tubule, Basic Equations and Boundary Condi-
tions, Solution under approximations.
• Unit 15 •
Diffusion and Diffusion-Reaction Models, Fick's Law of Diffusion, Solutions of One- and Two-dimensional Diffusion Equations.
• Unit 16 •
Diffusivity in Population Models, Influence of Diffusion on the Stability of Single Species, Two Species and Prey-Predator Models
Contents

Unit 1
1.1 Introduction
1.2 Simple Epidemic Model
1.3 SIS Epidemic Model
  1.3.1 SIS Model with Specific Rate of Infection as a Function of t
  1.3.2 SIS Model with Constant Number of Carriers
1.4 Simple Epidemic Model with Carriers

Unit 2
2.1 General Epidemic Model
2.2 Approximate Solution
2.3 Recurring Epidemic

Unit 3
3.1 Stochastic Epidemic Model Without Removal
3.2 Basic System of Equations
  3.2.1 Solution of the System of Equations

Unit 4
4.1 Other Stochastic Epidemic Models
  4.1.1 Epidemics with Multiple Infections
  4.1.2 Stochastic Epidemic Model with Removal

Unit 5
5.1 Introduction
5.2 Basic Model for Inheritance
5.3 Hardy-Weinberg Law

Unit 6
6.1 Correlation between Genetic Composition of Siblings
6.2 Bayes Theorem and Its Applications in Genetics

Unit 7
7.1 Further Discussion of Basic Model for Inheritance of Genetic Characteristics
  7.1.1 Phenotype Ratios
7.2 Multiple Alleles and Application to Blood Groups
7.3 Models for Genetic Improvement: Selection and Mutation
  7.3.1 Genetic Improvement through Cross Breeding

Unit 8
8.1 Genetic Improvement through Elimination of Recessives
8.2 Selection and Mutation
8.3 An Alternative Discussion of Selection

Unit 9
9.1 Introduction
9.2 Some Basic Concepts of Fluid Dynamics
  9.2.1 Navier-Stokes Equations for the Flow of a Viscous Incompressible Fluid
9.3 Hagen-Poiseuille Flow
9.4 Inlet Length Flow
9.5 Reynolds Number of Flows
9.6 Non-Newtonian Fluids

Unit 10
10.1 Basic Concepts about Blood, Cardiovascular System and Blood Flow
  10.1.1 Constitution of Blood
  10.1.2 Viscosity of Blood
  10.1.3 Cardiovascular System
  10.1.4 Special Characteristics of Blood Flow
  10.1.5 Structure and Function of Blood Vessels
  10.1.6 Principal Blood Vessels
  10.1.7 Mechanical Properties of Blood Vessels

Unit 11
11.1 Steady Non-Newtonian Fluid Flow in Circular Tubes
  11.1.1 Basic Equations for Fluid Flow
11.2 Flow of Power-Law Fluid in Circular Tube
11.3 Flow of Herschel-Bulkley Fluid in Circular Tube
11.4 Flow of Casson Fluid in Circular Tube

Unit 12
12.1 Newtonian Fluid Models
  12.1.1 Fahraeus-Lindqvist Effect
12.2 Blood Flow through Artery with Mild Stenosis
  12.2.1 Effect of Stenosis
  12.2.2 Analysis of Mild Stenosis

Unit 13
13.1 Peristaltic Flows in Tubes and Channel
  13.1.1 Peristaltic Flows in Biomechanics

Unit 14
14.1 Two Dimensional Flow in Renal Tubule
  14.1.1 Function of Renal Tubule
  14.1.2 Basic Equations and Boundary Conditions
  14.1.3 Solution When Radial Velocity at Wall Decreases Linearly with z

Unit 15
15.1 The Diffusion Equation
  15.1.1 Fick's Laws of Diffusion
  15.1.2 Some Solutions of the One-dimensional Diffusion Equation
  15.1.3 Some Solutions of the Two-dimensional Diffusion Equation

Unit 16
16.1 Application of Diffusion and Diffusion-Reaction Models in Population Biology
16.2 Absence of Diffusive Instability for Single Species
16.3 Possibility of Diffusive Instability for Two Species
16.4 Influence of Diffusion on the Stability of Prey-Predator Models
Unit 1
Course Structure
• Terminologies related to epidemic
• Simple epidemic
• SIS Epidemic Model
• SIS Epidemic Model with Specific Rate of Infection as a Function of time
• SIS Model with Constant Number of Carriers
• Simple Epidemic Model with Carriers
1.1 Introduction
The study of mathematical theory of epidemic can be look upon as a continuation of our previous study in the
sense that here also our concern is about the population sizes when effected by epidemics. In fact, we will
draw our attention in modelling of problems of epidemics in mathematical terms. Sometimes such study is
also called the study of mathematical epidemiology.
In order to pose a problem of epidemic, let us think of a small group of individuals who can carry a commu-
nicable infection to a large group of individuals, who can therefore be consider to be capable of the conducting
the disease. Our immediate problem is to investigate how the disease is develop. In order to have a mathe-
matical model of such situation we need some assumption regarding the characteristic of the disease as well
as the mixing of the population. For this we need to consider the basic definition.
• Susceptible Individuals: An individual who is capable of contracting the disease directly or indirectly from another infected individual, and who thereby becomes an infective.
• Removed Individuals: An individual who has had the disease and has recovered or died and is permanently immune, or is latent (existing but not developed) until recovery and permanent immunity occur.
• Latent Period: This is the period during which the disease develops within a newly infected individual in a purely internal way.
• Infectious Period: This is the period during which the infective is liable to communicate infectious material to susceptibles.
• Incubation Period: This is the interval between exposure to the disease and the appearance of symptoms.
• Serial Interval: This is the time interval between the appearance of symptoms in one case and the appearance of symptoms in another case infected from the first.
Remark 1.1.1. The disease under consideration confers permanent immunity upon any individual who has completely recovered from it, and has a negligibly short incubation period.
The population is obviously divided into three classes, viz. susceptible class, infective class and re-
moved class.
We next consider some simple cases depending on the nature of the epidemic. We consider three types of
epidemic, viz. simple, general and recurring epidemic.
1.2 Simple Epidemic Model
Let there be n susceptibles (S) and let us introduce a single infective (I) into this group at time t = 0, so that we have a group of (n + 1) individuals. Let S(t) and I(t) be the respective numbers of susceptibles and infectives at time t, so that
S(t) + I(t) = n + 1. \qquad (1.2.1)
We now assume that the disease spreads in such a way that the average number of new cases of the disease in an interval ∆t is proportional to both the number of susceptibles and the number of infectives. Let γ > 0 be the constant of proportionality, so that
\Delta S = -\gamma S I \Delta t. \qquad (1.2.2)
Fig. 1.1 is known as the epidemic curve and has a maximum at \tau = \frac{\ln n}{n+1}. We therefore conclude that the rate of appearance of new cases increases rapidly to begin with, rises to a maximum, and thereafter falls to zero.
[Fig. 1.1: the epidemic curve, dI/dt plotted against t.]
Remark 1.2.1. The above analysis does not tell us the rate at which the infection is spreading. To do this we take the basic equations
S + I = N, \qquad (1.2.7)
\frac{dS}{dt} = -\gamma S I, \qquad (1.2.8)
where N is the size of the total population. Therefore we have
\frac{d}{dt}(N - I) = -\gamma (N - I) I
\Rightarrow \frac{dI}{dt} = \gamma (N - I) I \quad [\because N \text{ is time independent}]. \qquad (1.2.9)
On integration of Eq. (1.2.9), with the condition I(0) = 1, we have
I(t) = \frac{N}{(N-1)e^{-\gamma N t} + 1}. \qquad (1.2.10)
Since γ is positive, I(t) goes to N as t → ∞, so one can conclude that every individual in the population will eventually contract the disease. One can then calculate S(t) using Eq. (1.2.7) in the form
S(t) = \frac{N(N-1)e^{-\gamma N t}}{1 + (N-1)e^{-\gamma N t}}. \qquad (1.2.11)
The rate dI/dt tends to zero as t → ∞, so there exists an extreme value when (N - 1)e^{-\gamma N t} - 1 = 0. So the rate has a maximum at
t = \frac{\ln(N-1)}{\gamma N} = t_{\max}. \qquad (1.2.14)
Then \left(\frac{dI}{dt}\right)_{\max} = \frac{N^2 \gamma}{4} and I_{\max} = \frac{N}{2}.
Note:
1. One can note from the expression for t_{\max} that if γ is small, t_{\max} becomes large, i.e., the smaller the γ, the longer it takes to reach the peak value. Also, the epidemic will be complete in a much shorter time for a dense population than for a sparse one.
2. A serious limitation of this epidemic model is that everyone in the population eventually contracts the disease, whereas in reality many susceptibles remain in the population.
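The closed-form solution (1.2.10) and the peak time (1.2.14) are easy to evaluate numerically. Below is a minimal illustrative sketch in Python; the values of N and γ are hypothetical and are not taken from the text.

import math

# Evaluate I(t) = N / ((N-1)exp(-gamma*N*t) + 1) from (1.2.10)
# and the peak time t_max = ln(N-1)/(gamma*N) from (1.2.14).
N, gamma = 100, 0.002          # illustrative values

t_max = math.log(N - 1) / (gamma * N)
I = lambda t: N / ((N - 1) * math.exp(-gamma * N * t) + 1)

print(f"t_max = {t_max:.2f}; I(t_max) = {I(t_max):.2f} (should equal N/2 = {N/2})")
print(f"peak rate = {gamma * (N - I(t_max)) * I(t_max):.3f} "
      f"(formula N^2*gamma/4 = {N * N * gamma / 4:.3f})")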
As t → ∞,
I(t) \to \begin{cases} N - \rho & \text{if } N > \rho = \gamma/\beta, \\ 0 & \text{if } N \le \rho = \gamma/\beta. \end{cases} \qquad (1.3.6)
Unit 2
Course Structure
• General Epidemic Model
• Approximate Solution
• Recurring Epidemic
• The population is treated as closed (constant) and continuous, and can be represented by S moving over to I moving over to R (we ignore both birth and immigration).
• The rate of change of the susceptible population is proportional to the number of contacts between the members of the classes S and I, in which we take, in turn, the number of contacts to be proportional to the product of the numbers of S and I. This assumption takes care of uniform mixing of the population.
Let r > 0 be the infection rate and γ > 0 the removal rate, and let S_0, I_0 be the initial numbers of members of S and I respectively. Then the governing equations are given by
\frac{dS}{dt} = -rSI, \qquad (2.1.1)
\frac{dI}{dt} = rSI - \gamma I, \qquad (2.1.2)
\frac{dR}{dt} = \gamma I. \qquad (2.1.3)
We are to study these equations with the conditions S = S_0, I = I_0 and R = 0 initially at t = 0. In addition to these we have
S(t) + I(t) + R(t) = \text{constant}, \quad \text{i.e.,} \quad \frac{d}{dt}(S + I + R) = 0. \qquad (2.1.4)
From Eq. (2.1.2), we have
\frac{dI}{dt} = r\left(S - \frac{\gamma}{r}\right) I. \qquad (2.1.5)
If S_0 < \frac{\gamma}{r} then \frac{dI}{dt} < 0 at t = 0, and since S(t) < S_0, one can conclude that \frac{dI}{dt} < 0 for all t. Therefore, in such a case the infection dies out, i.e., no epidemic takes place. This is known as a “Threshold Phenomenon”. We therefore conclude that there exists a critical value \frac{\gamma}{r} which the initial number of susceptibles has to exceed for there to be an epidemic; in other words, the relative removal rate \frac{\gamma}{r} must be sufficiently small so as to allow the epidemic to spread.
The Eqs. (2.1.1)–(2.1.4) also enable us to study another aspect of the spread of the disease. Since S(t) is non-increasing and positive, \lim_{t\to\infty} S(t) = S(\infty) exists, and since \frac{dR}{dt} \ge 0 and R(t) \le N, R(\infty) exists. Again, since I(t) = N - R(t) - S(t), it follows that \lim_{t\to\infty} I(t) exists; moreover, since R(\infty) is finite while \frac{dR}{dt} = \gamma I, we must have I(t) \to 0.
Now we obtain some further results by dividing Eq. (2.1.1) by Eq. (2.1.3), which gives
\frac{dS}{dR} = -\frac{r}{\gamma} S. \qquad (2.1.6)
On integration we have
S = S_0 \exp\left(-\frac{r}{\gamma} R\right). \qquad (2.1.7)
Now since R \le N, which implies -R \ge -N, we get
S = S_0 \exp\left(-\frac{r}{\gamma} R\right) \ge S_0 \exp\left(-\frac{r}{\gamma} N\right) = \alpha > 0 \quad \text{(say)}. \qquad (2.1.8)
Therefore \lim_{t\to\infty} S(t) is always positive; one can interpret this by saying that there will always be susceptibles remaining in the population. Thus we conclude that some individuals will escape the disease altogether, and in particular the spread of the disease does not stop for lack of a susceptible population. Let us consider the function
f(z) = S_0 \exp\left(-\frac{1}{\rho}(N - z)\right) - z, \quad \text{in which } \rho = \frac{\gamma}{r}. \qquad (2.1.9)
Now, f(0) > 0 and f(N) = S_0 - N < 0. Therefore, there must be a positive root of f(z) = 0; let z_0 be the root. Then we have
f'(z) = \frac{1}{\rho} S_0 \exp\left(-\frac{1}{\rho}(N - z)\right) - 1, \qquad (2.1.10)
f''(z) = \frac{1}{\rho^2} S_0 \exp\left(-\frac{1}{\rho}(N - z)\right). \qquad (2.1.11)
Now since f''(z) > 0 and f(N) < 0, there is only one such root z_0 < N. Now we have seen that
S = S_0 \exp\left(-\frac{R}{\rho}\right), \quad \text{i.e.,} \quad S_\infty = S_0 \exp\left(-\frac{1}{\rho}(N - S_\infty)\right). \qquad (2.1.12)
Hence we can say that S_\infty is the root of the equation f(z) = 0. Now we can sum up all the results in the form of a theorem as follows:
Theorem 2.1.1. If S_0 < \rho then I(t) decreases monotonically to zero. If S_0 > \rho then the number of infectives first increases as t increases, reaches a maximum, and then decreases monotonically to zero. Further, \lim_{t\to\infty} S(t) exists, and S_\infty is a root of the transcendental equation f(z) = 0.
Remark 2.1.1. The equation \frac{dS}{dR} = -\frac{r}{\gamma}S can be used to estimate the ultimate size of the epidemic. As t \to \infty, \frac{dR}{dt} = \gamma\left(N - R - S_0 e^{-R/\rho}\right) \to 0. Now we expand the exponential term on the right-hand side in powers of R/\rho and approximate up to the second power of R/\rho; taking S_0 \approx N, we have
0 \approx \gamma\left[N - N\left(1 - \frac{R}{\rho} + \frac{R^2}{2\rho^2}\right) - R\right]
\Rightarrow R \approx N\left(\frac{R}{\rho} - \frac{R^2}{2\rho^2}\right)
\Rightarrow \frac{1}{N} \approx \frac{2\rho - R}{2\rho^2}
\Rightarrow \frac{2\rho^2}{N} \approx 2\rho - R
\Rightarrow R \approx 2\rho\left(1 - \frac{\rho}{N}\right).
This is the approximate value of R as t → ∞ and hence gives the ultimate size of the epidemic. If \rho > N, there is no true epidemic, and hence an epidemic appears only when \rho < N, i.e., when the relative removal rate is less than the initial number of susceptibles; in this case not all persons get infected. A stage may be reached when all the infected persons are immediately removed.
This shows that the initial density of susceptibles, namely S_0 (= N = \rho + \gamma), is reduced to S_\infty (= \rho - \gamma), which means that the final number of susceptibles falls to a point as far below the threshold value \rho as it originally was above it. This is known as the “Kermack & McKendrick Threshold Theorem”.
Remark 2.2.1. • The above theorem corresponds to the general observation that an epidemic tends to build up more rapidly when the density of susceptibles is high on account of overcrowding and the removal rate is relatively low because of factors such as ignorance and inadequate isolation.
• The Eq. (2.2.2) can also be integrated when the approximation is taken up to second powers of R.
Integration leading to an approximate solution. We have
\frac{dR}{dt} = \gamma\left[N - R - S_0 \exp\left(-\frac{R}{\rho}\right)\right]. \qquad (2.2.3)
Substituting \exp\left(-\frac{R}{\rho}\right) = 1 - \frac{R}{\rho} + \frac{R^2}{2\rho^2} into the above equation, one gets
\frac{dR}{dt} = \gamma\left[N - R - S_0\left(1 - \frac{R}{\rho} + \frac{R^2}{2\rho^2}\right)\right]
\Rightarrow \frac{dR}{dt} = \gamma\left[N - S_0 + R\left(\frac{S_0}{\rho} - 1\right) - \frac{S_0 R^2}{2\rho^2}\right]
\Rightarrow \frac{dR}{dt} = a + bR - cR^2,
where a = \gamma(N - S_0), \; b = \gamma\left(\frac{S_0}{\rho} - 1\right), \; c = \frac{\gamma S_0}{2\rho^2}.
ρ 2ρ
2 2cR − b p
tanh−1 = t + c1 , c1 being a constant and q = b2 + 4ac
q q
1 qt
⇒ R(t) = b + q tanh + c2 , c2 is a different constant
2c 2
−qt+c
1 1−e 3
⇒ R(t) = b+q
2c 1 + e−qt+c3
Since q > b and since tanh x increases monotonically from −1 to +1 when x increases from −∞ to +∞,
it follows that the constant c2 and c3 exists and have real values. These constants can also be chosen in such
a way that R(0) = 0. Behaviour of R(t) for large values of time or in other words asymptotic behaviour of
R(t) can be found as
1
lim R(t) = (b + q) (2.2.4)
t→∞ 2c
R_\infty = \frac{1}{S_0}\left[\rho(S_0 - \rho) + \rho\{(S_0 - \rho)^2 + 2 S_0 I_0\}^{1/2}\right]. \qquad (2.2.5)
Let us now see if some additional assumptions regarding the relative sizes of the parameters give the results of the threshold theorem mentioned earlier.
In particular, it is customary to assume that an epidemic is generated through the introduction of a small number of infected individuals into a population of susceptibles. Mathematically, S_0 > \rho and I_0 \to 0. We now use the quantity
\lim_{I_0 \to 0} R(\infty) = \frac{2(S_0 - \rho)\rho}{S_0} \qquad (2.2.6)
to represent the asymptotic size of an epidemic resulting from the introduction of a small number of infectives into a group of susceptibles.
Finally, let us assume that S_0 is close to the threshold value \rho; then the epidemic develops only if S_0 > \rho, i.e., S_0 = \rho + \gamma, where \gamma > 0 is small. Therefore,
\lim_{I_0 \to 0} R_\infty = \frac{2\gamma\rho}{\rho + \gamma} = 2\gamma\left(1 + \frac{\gamma}{\rho}\right)^{-1} \approx 2\gamma. \qquad (2.2.7)
Therefore the asymptotic size of the epidemic is approximately equal to 2\gamma. Hence, we can state as follows:
The total size of the epidemic resulting from the introduction of a trace of infection into a population of susceptibles whose size S_0 is close to the threshold value \rho is approximately equal to 2(S_0 - \rho).
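A quick numerical comparison of (2.2.5) with its small-I_0 limit (2.2.6) illustrates how good the approximation is. The following Python sketch uses illustrative values of S_0, I_0 and \rho (not taken from the text).

import math

# Compare the approximate ultimate size (2.2.5) with the
# small-I0 limit (2.2.6) and with 2*(S0 - rho).
S0, I0, rho = 220.0, 1.0, 200.0   # illustrative values

R_inf = (rho * (S0 - rho) + rho * math.sqrt((S0 - rho)**2 + 2 * S0 * I0)) / S0
R_lim = 2 * (S0 - rho) * rho / S0  # I0 -> 0 limit (2.2.6)
print(f"R_inf = {R_inf:.2f}, small-I0 limit = {R_lim:.2f}, "
      f"2*(S0 - rho) = {2*(S0 - rho):.2f}")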
Remark 2.2.2. • It may be remarked that this result is also taken as a part of the threshold theorem of epidemiology.
The approximate solution can be written in the form
R(t) = \frac{\rho^2}{S_0}\left[\frac{S_0}{\rho} - 1 + \alpha \tanh\left(\frac{\alpha\gamma t}{2} - \phi\right)\right], \qquad (2.2.8)
where \alpha = \left[\left(\frac{S_0}{\rho} - 1\right)^2 + \frac{2S_0}{\rho^2}(N - S_0)\right]^{1/2} and \phi = \tanh^{-1}\left[\frac{1}{\alpha}\left(\frac{S_0}{\rho} - 1\right)\right].
Differentiating, we get
\frac{dR}{dt} = \frac{\gamma \alpha^2 \rho^2}{2 S_0} \operatorname{sech}^2\left(\frac{1}{2}\alpha\gamma t - \phi\right). \qquad (2.2.9)
This equation defines a symmetrical bell-shaped curve in the (t, dR/dt) plane (see Fig. 2.1). It may be noted that Kermack & McKendrick compared the values of dR/dt from this equation and found complete agreement with the data from an actual plague which occurred during 1905–06 in Bombay. The typical variations of S(t), I(t) and R(t) are represented graphically in Fig. 2.2.
[Fig. 2.1: the symmetrical epidemic curve, dR/dt plotted against t. Fig. 2.2: typical variations of S(t), I(t) and R(t) with time.]
2.3 Recurring Epidemic
For a recurring epidemic we allow a constant inflow \mu of new susceptibles, so that
\frac{dS}{dt} = -rSI + \mu, \qquad (2.3.1)
\frac{dI}{dt} = rSI - \gamma I. \qquad (2.3.2)
The steady state conditions are given by
\frac{dS}{dt} = 0 = \frac{dI}{dt}.
Therefore the steady states are given by
S = \frac{\gamma}{r} = S_0 \quad \text{and} \quad I = \frac{\mu}{\gamma} = I_0.
Let us now study the departure from the equilibrium position through the use of
S = S_0(1 + u) \quad \text{and} \quad I = I_0(1 + v),
where u and v are small quantities. Substituting the above quantities into Eqs. (2.3.1) and (2.3.2), we have
\frac{1}{rI_0}\frac{du}{dt} = -(u + v + uv) \;\Rightarrow\; \sigma\frac{du}{dt} = -(u + v + uv), \quad \text{where } \sigma = \frac{\gamma}{r\mu}, \qquad (2.3.3)
\frac{dv}{dt} = \gamma u(1 + v) \;\Rightarrow\; \tau\frac{dv}{dt} = u(1 + v), \quad \text{where } \tau = \frac{1}{\gamma}. \qquad (2.3.4)
Since u and v are small, their products may be neglected, so that Eqs. (2.3.3) and (2.3.4) reduce to
\sigma\frac{du}{dt} = -(u + v), \qquad (2.3.5)
\tau\frac{dv}{dt} = u. \qquad (2.3.6)
From these equations, we get
\tau\frac{d^2 v}{dt^2} = \frac{du}{dt} = -\frac{1}{\sigma}(u + v) = -\frac{1}{\sigma}\left(\tau\frac{dv}{dt} + v\right)
\Rightarrow \frac{d^2 v}{dt^2} + \frac{1}{\sigma}\frac{dv}{dt} + \frac{1}{\tau\sigma} v = 0. \qquad (2.3.7)
The general solution of Eq. (2.3.7), with v(0) = v_0 and u(0) = 0, is given by
v(t) = v_0 e^{-t/2\sigma}\left(\cos \xi t + \frac{1}{2\sigma\xi}\sin \xi t\right), \quad \text{where } \xi^2 = \frac{1}{\tau\sigma} - \frac{1}{4\sigma^2} \; (\text{assumed positive, i.e., } 4\sigma > \tau),
and hence
u(t) = \tau\frac{dv}{dt}
= \tau v_0 e^{-t/2\sigma}\left[-\frac{1}{2\sigma}\left(\cos \xi t + \frac{1}{2\sigma\xi}\sin \xi t\right) - \xi\sin \xi t + \frac{1}{2\sigma}\cos \xi t\right]
= -\tau v_0 e^{-t/2\sigma}\left(\frac{1}{4\sigma^2\xi} + \xi\right)\sin \xi t.
This clearly represents damped harmonic motion, describing the small departures from equilibrium.
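The damping time 2\sigma and the period 2\pi/\xi of these oscillations are easily evaluated. Below is a minimal Python sketch; the parameter values r, \gamma (written g) and \mu are hypothetical.

import math

# Damped oscillations of the linearized recurring-epidemic model (2.3.7):
# damping time 2*sigma, angular frequency xi with
# xi^2 = 1/(sigma*tau) - 1/(4*sigma^2).
r, g, mu = 0.001, 0.1, 5.0          # illustrative values
sigma, tau = g / (r * mu), 1.0 / g

xi2 = 1.0 / (sigma * tau) - 1.0 / (4.0 * sigma * sigma)
if xi2 > 0:
    xi = math.sqrt(xi2)
    print(f"damping time 2*sigma = {2*sigma:.1f}, period 2*pi/xi = {2*math.pi/xi:.1f}")
else:
    print("no oscillations: equilibrium approached monotonically")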
Unit 3
Course Structure
• Stochastic Epidemic Model without Removal
Let p_n(t) be the probability that the system is in state n at time t, and let the probability of a change of size j in the interval (t, t + ∆t) be f_j(n)∆t + o(∆t), where
\frac{o(\Delta t)}{\Delta t} \to 0 \quad \text{as } \Delta t \to 0. \qquad (3.2.1)
The probability that there is no change in the time interval (t, t + ∆t) is then given by
1 - \sum_j f_j(n)\Delta t + o(\Delta t), \qquad (3.2.2)
so that
\frac{p_n(t + \Delta t) - p_n(t)}{\Delta t} = -p_n(t)\sum_j f_j(n) + \sum_j p_{n-j}(t) f_j(n - j) + \frac{o(\Delta t)}{\Delta t}. \qquad (3.2.4)
Multiplying (3.2.5) by x^n, summing for all n, and using the definition of the probability generating function, namely
\phi(x, t) = \sum_{n=0}^{\infty} p_n(t) x^n, \qquad (3.2.6)
we get
\frac{\partial\phi}{\partial t} = -\sum_n \sum_j f_j(n) p_n(t) x^n + \sum_n \sum_j p_{n-j}(t) f_j(n-j) x^n. \qquad (3.2.7)
Here we use the operator identity
\psi\left(x\frac{\partial}{\partial x}\right)\phi = \sum_n \psi(n) p_n(t) x^n, \qquad (3.2.10)
where \psi(x) is any polynomial function of x, to convert (3.2.7) into the partial differential equation (3.2.8) for \phi. In order to find all the probabilities, we either solve the finite system of differential-difference equations (3.2.5) or solve the partial differential equation (3.2.8) subject to the initial condition
\phi(x, 0) = \sum_n p_n(0) x^n = x^{n_0}, \qquad (3.2.11)
where n_0 is the number of susceptibles in the system at t = 0.
f_j(r) = \begin{cases} \beta r(n + 1 - r), & j = 1, \\ 0, & j \ne 1. \end{cases} \qquad (3.2.13)
Substituting \phi(x, t) = \sum_{r=0}^{n} p_r(t) x^r in (3.2.14) and equating the coefficients of the various powers of x, we get
\frac{dp_r}{dt} = \beta(r+1)(n-r) p_{r+1} - \beta r(n-r+1) p_r, \quad r = 0, 1, 2, \ldots, n-1, \qquad (3.2.16)
\frac{dp_n}{dt} = -\beta n p_n, \qquad (3.2.17)
with initial conditions
p_n(0) = 1, \quad p_r(0) = 0 \quad (r = 0, 1, 2, \ldots, n-1). \qquad (3.2.18)
We can now follow either of the two procedures:
• We can solve the partial differential equation (3.2.14) subject to initial condition (3.2.15), or
• We can solve the system of n + 1 differential-difference equations, namely (3.2.16) and (3.2.17), subject to the initial conditions (3.2.18). We adopt the second procedure here.
p_{n-1}(t) = e^{-2\beta(n-1)t} \int_0^t n\beta e^{-n\beta t'} e^{2(n-1)\beta t'}\,dt' = \frac{n}{n-2}\left[e^{-n\beta t} - e^{-(2n-2)\beta t}\right]. \qquad (3.2.21)
We can proceed in this way systematically step by step to find pn−2 (t), pn−3 (t), . . . , p0 (t).
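Rather than proceeding analytically, one can also integrate the whole system (3.2.16)–(3.2.17) numerically. The sketch below is an illustrative Python fragment using a crude Euler scheme, with hypothetical values of n and \beta; note that the total probability \sum_r p_r(t) is conserved by the system.

# Euler integration of the differential-difference system
# (3.2.16)-(3.2.17) for the stochastic simple epidemic,
# starting from p_n(0) = 1 (all n susceptibles at t = 0).
n, beta, dt = 5, 0.1, 1e-4          # illustrative values

p = [0.0] * (n + 1)
p[n] = 1.0

for _ in range(int(2.0 / dt)):      # integrate up to t = 2
    dp = [0.0] * (n + 1)
    dp[n] = -beta * n * p[n]        # (3.2.17)
    for r in range(n):              # (3.2.16)
        dp[r] = (beta * (r + 1) * (n - r) * p[r + 1]
                 - beta * r * (n - r + 1) * p[r])
    p = [pi + dt * di for pi, di in zip(p, dp)]

print("p_r(2) =", [round(x, 4) for x in p], " sum =", round(sum(p), 6))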
Alternatively, we can use the Laplace transform method to solve (3.2.16) and (3.2.17) subject to (3.2.18).
Let
q_r(s) = \int_0^\infty e^{-st} p_r(t)\,dt. \qquad (3.2.22)
Multiplying both sides of (3.2.16) and (3.2.17) by e^{-st} and integrating over the range 0 to \infty, we get
\int_0^\infty e^{-st}\frac{dp_r}{dt}\,dt = \beta(r+1)(n-r)\int_0^\infty e^{-st} p_{r+1}\,dt - \beta r(n-r+1)\int_0^\infty e^{-st} p_r\,dt,
\int_0^\infty e^{-st}\frac{dp_n}{dt}\,dt = -\beta n \int_0^\infty e^{-st} p_n\,dt.
s q_r(s) = \beta(r+1)(n-r) q_{r+1}(s) - \beta r(n-r+1) q_r(s), \quad r = 0, 1, 2, \ldots, n-1, \qquad (3.2.23)
s q_n(s) = 1 - \beta n q_n(s). \qquad (3.2.24)
From (3.2.23),
q_r(s) = \frac{\beta(r+1)(n-r)}{s + r(n-r+1)\beta}\, q_{r+1}(s)
= \frac{\beta^2 (r+1)(n-r)(r+2)(n-r-1)}{[s + r(n-r+1)\beta][s + (r+1)(n-r)\beta]}\, q_{r+2}(s)
= \cdots
= \frac{\beta^{n-r}\, [n!/r!]\, [(n-r)!]}{\prod_{j=r}^{n}\big[s + j(n-j+1)\beta\big]} \quad (r = 0, 1, 2, \ldots, n-1), \qquad (3.2.25)
q_n(s) = \frac{1}{s + n\beta}. \qquad (3.2.26)
By inverting the Laplace transforms, we can find pr (t). This can be easily done by splitting the product on the
right-hand side of Eq. (3.2.25) into partial fractions.
• If r > n/2, there are no repeated factors, and this is relatively easier.
Unit 4
Course Structure
• Stochastic Epidemic Model with Multiple Infections
Let \beta_j be the contact rate for j-fold infections. In such a case, the basic partial differential equation takes the form
\frac{\partial\phi}{\partial t} = \sum_{j=1}^{m} (x^{-j} - 1)\beta_j \left(x\frac{\partial}{\partial x}\right)\left(n + 1 - x\frac{\partial}{\partial x}\right)\phi
\Rightarrow \frac{\partial\phi}{\partial t} = \sum_{j=1}^{m} \frac{1 - x^j}{x^{j-1}}\,\beta_j\left[(n+1)\frac{\partial\phi}{\partial x} - \frac{\partial\phi}{\partial x} - x\frac{\partial^2\phi}{\partial x^2}\right]
\Rightarrow \frac{\partial\phi}{\partial t} = \sum_{j=1}^{m} \beta_j \frac{1 - x^j}{x^{j-1}}\left[n\frac{\partial\phi}{\partial x} - x\frac{\partial^2\phi}{\partial x^2}\right]
\Rightarrow \frac{\partial\phi}{\partial t} = \left[n\frac{\partial\phi}{\partial x} - x\frac{\partial^2\phi}{\partial x^2}\right]\left[\beta_1(1 - x) + \beta_2\left(\frac{1}{x} - x\right) + \beta_3\left(\frac{1}{x^2} - x\right) + \cdots + \beta_m\left(\frac{1}{x^{m-1}} - x\right)\right].
This result is equivalent to (3.2.14) if one takes m = 1 and \beta_1 = \beta. We can also write the system of differential-difference equations from first principles and solve them one by one, directly or by using the Laplace transformation technique.
Let pm,n (t) be the probability that there are m susceptibles and n infectives in the population at time t. If N
is the total size of the population, then the number of persons in the removed category is N − m − n.
Let the probability of a susceptible being infected in the time interval (t, t + ∆t) be \beta mn\Delta t + o(\Delta t), and let the corresponding probability of one infective being removed in the same time interval be \gamma n\Delta t + o(\Delta t). The probability of not having any change in this time interval is then
1 - (\beta mn + \gamma n)\Delta t + o(\Delta t). \qquad (4.1.1)
Now there can be m susceptibles and n infected persons at time t + ∆t if there are
(i) m + 1 susceptibles and n − 1 infectives at time t and if one person has become infected in time ∆t, or
(ii) m susceptibles and n + 1 infectives at time t and if one infected person has been removed in time ∆t,
or
(iii) m susceptibles and n infectives at time t and if there is no change in time ∆t.
We assume, as usual, that the probability of more than one change in time ∆t is o(∆t). Then, using the
theorem of total and compound probability, we get
\frac{d}{dt} p_{m,n}(t) = \beta(m+1)(n-1) p_{m+1,n-1}(t) - \beta mn\, p_{m,n}(t) + \gamma(n+1) p_{m,n+1}(t) - \gamma n\, p_{m,n}(t). \qquad (4.1.2)
Initially, let there be s susceptibles and a infectives. Then we define the probability generating function by
\phi(x, y, t) = \sum_{m=0}^{s} \sum_{n=0}^{s+a-m} p_{m,n}(t)\, x^m y^n. \qquad (4.1.3)
\frac{\partial}{\partial t}\sum_{m=0}^{s}\sum_{n=0}^{s+a-m} p_{m,n}(t)\, x^m y^n = \beta y^2 \sum_m \sum_n p_{m+1,n-1}(t)(m+1)(n-1)\, x^m y^{n-2}
- \beta xy \sum_m \sum_n p_{m,n}(t)\, mn\, x^{m-1} y^{n-1}
+ \gamma \sum_m \sum_n p_{m,n+1}(t)(n+1)\, x^m y^n
- \gamma y \sum_m \sum_n p_{m,n}(t)\, n\, x^m y^{n-1}, \qquad (4.1.4)
i.e.,
\frac{\partial\phi}{\partial t} = \beta(y^2 - xy)\frac{\partial^2\phi}{\partial x\partial y} + \gamma(1 - y)\frac{\partial\phi}{\partial y}. \qquad (4.1.5)
Now the equation (4.1.5) can be solved subject to the initial condition
\phi(x, y, 0) = x^s y^a, \quad \text{since} \quad p_{m,n}(0) = \begin{cases} 1, & m = s,\, n = a, \\ 0, & \text{otherwise}. \end{cases} \qquad (4.1.6)
\frac{\partial\phi}{\partial t} = (x^{-1}y - 1)\beta xy \frac{\partial^2\phi}{\partial x\partial y} + (y^{-1} - 1)\gamma y \frac{\partial\phi}{\partial y}
\Rightarrow \frac{\partial\phi}{\partial t} = \beta(y^2 - xy)\frac{\partial^2\phi}{\partial x\partial y} + \gamma(1 - y)\frac{\partial\phi}{\partial y}. \qquad (4.1.8)
It is worthwhile to note here that Eq. (4.1.5) and Eq. (4.1.8) are identical.
f_{-1,1}(m, n) = \beta mn, \quad f_{0,-1}(m, n) = \gamma n, \quad f_{1,0}(m, n) = \mu, \quad f_{0,1}(m, n) = \nu, \quad f_{-1,0}(m, n) = \delta m.
\frac{\partial\phi}{\partial t} = (x^{-1}y - 1)\beta xy \frac{\partial^2\phi}{\partial x\partial y} + (y^{-1} - 1)\gamma y \frac{\partial\phi}{\partial y} + (x - 1)\mu\phi + (y - 1)\nu\phi + (x^{-1} - 1)\delta x \frac{\partial\phi}{\partial x}
\Rightarrow \frac{\partial\phi}{\partial t} = \beta(y^2 - xy)\frac{\partial^2\phi}{\partial x\partial y} + \gamma(1 - y)\frac{\partial\phi}{\partial y} + \mu(x - 1)\phi + \nu(y - 1)\phi + \delta(1 - x)\frac{\partial\phi}{\partial x}. \qquad (4.1.9)
In the absence of immigration and emigration, (4.1.9) gives
\frac{\partial\phi}{\partial t} = \beta(y^2 - xy)\frac{\partial^2\phi}{\partial x\partial y} + \gamma(1 - y)\frac{\partial\phi}{\partial y}. \qquad (4.1.10)
It is worthwhile to note here that Eq. (4.1.8) and Eq. (4.1.10) are identical.
\frac{\partial\phi}{\partial t} = (x^{-1} - 1)\beta xy \frac{\partial^2\phi}{\partial x\partial y} + (y^{-1} - 1)\gamma y \frac{\partial\phi}{\partial y}
\Rightarrow \frac{\partial\phi}{\partial t} = \beta y(1 - x)\frac{\partial^2\phi}{\partial x\partial y} + \gamma(1 - y)\frac{\partial\phi}{\partial y}. \qquad (4.1.12)
If we allow immigration and emigration of susceptibles and carriers, we get
\frac{\partial\phi}{\partial t} = \beta y(1 - x)\frac{\partial^2\phi}{\partial x\partial y} + \gamma(1 - y)\frac{\partial\phi}{\partial y} + \mu(x - 1)\phi + \nu(y - 1)\phi + \delta(1 - x)\frac{\partial\phi}{\partial x}. \qquad (4.1.13)
4.1. OTHER STOCHASTIC EPIDEMIC MODELS 23
\frac{\partial\phi}{\partial t} = (x^{-1}y - 1)\left[\beta xy\frac{\partial^2\phi}{\partial x\partial y} + \gamma xz\frac{\partial^2\phi}{\partial x\partial z}\right] + (z^{-1} - 1)\delta z\frac{\partial\phi}{\partial z}
\Rightarrow \frac{\partial\phi}{\partial t} = (y - x)\left[\beta y\frac{\partial^2\phi}{\partial x\partial y} + \gamma z\frac{\partial^2\phi}{\partial x\partial z}\right] + \delta(1 - z)\frac{\partial\phi}{\partial z}. \qquad (4.1.15)
Unit 5
Course Structure
• Basic model for inheritance of genetic characteristic
5.1 Introduction
Population genetics deals with genetic differences within and between populations, and is a part of evolu-
tionary biology. Studies in this branch of biology examine such phenomena as adaptation, speciation, and
population structure. Population genetics was a vital ingredient in the emergence of the modern evolutionary
synthesis. Traditionally a highly mathematical discipline, modern population genetics encompasses theoreti-
cal, lab, and field work. Population genetic models are used both for statistical inference from DNA sequence
data and for proof/disproof of concept.
When two individuals mate, the offspring gets from each parent either of the two forms of genes with the
same probability 1/2. Thus, if (G, G) is crossed with (G, g), there are four possibilities:
(i) The offspring gets the first G from the first parent and G from the second parent. The probability of this
is 1/2 × 1/2 = 1/4, and the offspring is D.
(ii) The offspring gets the first G from the first parent and g from the second parent. The probability of this
is 1/2 × 1/2 = 1/4, and the offspring is H.
(iii) The offspring gets the second G from the first parent and G from the second parent. The probability of
this is 1/2 × 1/2 = 1/4, and the offspring is D.
(iv) The offspring gets the second G from the first parent and g from the second parent. The probability of
this is 1/2 × 1/2 = 1/4, and the offspring is H.
Thus the probabilities of the offspring being D, H, R are 1/2, 1/2, 0 respectively. Arguing in the same way, we get the results given below, which give the probabilities of the offspring being D, H, R when these are crossed with D, H, R, in that order.
The three fundamental genetic matrices which refer to mating with D, H, R, respectively, are obtained as
follows:
A = \begin{pmatrix} 1 & 0 & 0 \\ 1/2 & 1/2 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/4 & 1/2 & 1/4 \\ 0 & 1/2 & 1/2 \end{pmatrix}, \quad C = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1/2 & 1/2 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (5.2.1)
Each of these matrices is a stochastic matrix since all of its elements are non-negative and the row sums are
unity (this is so because in each case the probability that the offspring is a D or H or R is unity).
Let us consider a population in which the probabilities of a person being D, H, R are p, q, r, respectively,
so that p + q + r = 1. We shall call P = (p, q, r) the probability vector of the population.
If each individual in this population is mated with a dominant, then the first matrix gives the probabilities of the offspring being D, H, R as
p \cdot 1 + q \cdot \tfrac{1}{2} + r \cdot 0 = p + \tfrac{1}{2}q, \qquad (5.2.2)
p \cdot 0 + q \cdot \tfrac{1}{2} + r \cdot 1 = \tfrac{1}{2}q + r, \qquad (5.2.3)
p \cdot 0 + q \cdot 0 + r \cdot 0 = 0. \qquad (5.2.4)
Thus the probability vector for the first generation, on population being mated with pure dominants, is ob-
tained by taking the product of the row matrix P with the first matrix A, i.e., it is given by P A. Similarly, the
probability vector for the first generation when population with the probability vector P is mated with pure
hybrids (pure recessives) is given by P B (P C).
If the original population is mated with dominants, hybrids, dominants, recessives, hybrids, in that order,
the probability vector for the fifth generation is given by P ABACB.
Similarly,
PB = P \Rightarrow P = \left(\tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{4}\right), \qquad (5.2.6)
PC = P \Rightarrow P = (0, 0, 1). \qquad (5.2.7)
Now, if the population with the probability vector P is crossed with pure dominants n times, the probability vector of the n-th generation is given by P A^n. To find this, A^n has to be determined, and this is easily done by first diagonalising the matrix A. The eigenvalues of A are easily found to be 1, \tfrac{1}{2}, 0, with corresponding eigenvectors (1, 1, 1), (0, 1, 2) and (0, 0, 1), so we can write
A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & -2 & 1 \end{pmatrix} = S \Lambda S^{-1}, \qquad (5.2.8)
so that
A^n = (S\Lambda S^{-1})(S\Lambda S^{-1})\cdots(S\Lambda S^{-1}) = S\Lambda^n S^{-1}
= \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1/2^n & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & -2 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 1 - \frac{1}{2^n} & \frac{1}{2^n} & 0 \\ 1 - \frac{1}{2^{n-1}} & \frac{1}{2^{n-1}} & 0 \end{pmatrix}. \qquad (5.2.9)
Therefore,
P A^n = \left(1 - \frac{q}{2^n} - \frac{r}{2^{n-1}},\; \frac{q}{2^n} + \frac{r}{2^{n-1}},\; 0\right). \qquad (5.2.10)
As n tends to infinity, P A^n approaches the vector (1, 0, 0).
Thus, if any population is mated at random with only dominants successively, we find that (i) the recessives
never appear, (ii) the proportion of hybrids tends to zero, and (iii) the proportion of dominants tends to unity.
Even if the original population consists of only recessives, we shall have a proportion 15/16 of dominants in the fifth generation and a proportion 511/512 of dominants in the tenth generation. Thus, if dominants are desired, we can transform even a breed of recessives into a breed of dominants in a number of generations by repeated mating with dominants.
Exercise 5.2.1. 1. Prove that PAB \ne PBA, PBC \ne PCB, and PCA \ne PAC. What conclusions can you draw?
2. Suppose an individual of unknown genotype is crossed with a recessive and the offspring is again
crossed with a recessive, and so on. Show that, after a long period of such breeding, it is almost certain
that the offspring will be a recessive genotype.
5.3 Hardy-Weinberg Law
Consider random mating or panmixia in a population with probability vector P. The probability vectors for mating with D, H, R are given by PA, PB, PC, but the relative proportions of D, H, R in the population are p, q, r, so that the probability vector for the first generation, say F_1, is given by
p\,PA + q\,PB + r\,PC = \left(\left(p + \tfrac{1}{2}q\right)^2,\; 2\left(p + \tfrac{1}{2}q\right)\left(r + \tfrac{1}{2}q\right),\; \left(r + \tfrac{1}{2}q\right)^2\right) = (p', q', r') = P' \quad \text{(say)}. \qquad (5.3.1)
The three components of the probability vector for the second generation, say F_2, are then given by
\left(p' + \tfrac{1}{2}q'\right)^2 = \left[\left(p + \tfrac{1}{2}q\right)^2 + \left(p + \tfrac{1}{2}q\right)\left(r + \tfrac{1}{2}q\right)\right]^2
= \left(p + \tfrac{1}{2}q\right)^2\left(p + \tfrac{1}{2}q + r + \tfrac{1}{2}q\right)^2
= \left(p + \tfrac{1}{2}q\right)^2 = p', \qquad (5.3.2)
2\left(p' + \tfrac{1}{2}q'\right)\left(r' + \tfrac{1}{2}q'\right) = 2\left[\left(p + \tfrac{1}{2}q\right)^2 + \left(p + \tfrac{1}{2}q\right)\left(r + \tfrac{1}{2}q\right)\right]\left[\left(r + \tfrac{1}{2}q\right)^2 + \left(p + \tfrac{1}{2}q\right)\left(r + \tfrac{1}{2}q\right)\right]
= 2\left(p + \tfrac{1}{2}q\right)\left(p + \tfrac{1}{2}q + r + \tfrac{1}{2}q\right)\left(r + \tfrac{1}{2}q\right)\left(r + \tfrac{1}{2}q + p + \tfrac{1}{2}q\right)
= 2\left(p + \tfrac{1}{2}q\right)\left(r + \tfrac{1}{2}q\right) = q', \qquad (5.3.3)
\left(r' + \tfrac{1}{2}q'\right)^2 = \left[\left(r + \tfrac{1}{2}q\right)^2 + \left(p + \tfrac{1}{2}q\right)\left(r + \tfrac{1}{2}q\right)\right]^2
= \left(r + \tfrac{1}{2}q\right)^2\left(r + \tfrac{1}{2}q + p + \tfrac{1}{2}q\right)^2
= \left(r + \tfrac{1}{2}q\right)^2 = r'. \qquad (5.3.4)
Thus the probability vector for F_2 is the same as that for F_1. This shows that, due to random mating, the probability vectors for the first generation and all succeeding generations are the same. This is known as the Hardy-Weinberg law, named after the mathematician G. H. Hardy and the geneticist W. Weinberg.
Note: We may note that the Hardy-Weinberg law holds for a gene if the mating is random with respect to
that gene. Thus, in human populations, the law is likely to hold for genes for blood groups, since, in general,
people do not worry about blood groups when marrying, but the law may not hold for the gene determining
heights since tall people, in general, tend to marry tall people.
Note: If we can identify the three genotypes for a particular gene in a population and if their relative pro-
portions verify (5.3.6), then it confirms that mating is likely to be random for that gene. If (5.3.6) can not be
verified, it may be due to non-random mating or differential mortality of dominant and recessive genes.
Note: In general, however, it is not easy to distinguish between the three genotypes. If G is dominant to g, then individuals having (G, G) and (G, g) have the same appearance and belong to the same phenotype. Thus, while there are three genotypes, only two distinct phenotypes exist, namely {(G, G), (G, g)} and {(g, g)}, with respect to a gene.
The individuals with (G, G), (g, g) are said to be homozygous, and the individuals with (G, g) are said to be heterozygous.
The Hardy-Weinberg law can also be stated in terms of gene frequencies. If P, Q are the relative gene
frequencies in a population and (p, q, r) are the relative frequencies of (G, G), (G, g) and (g, g), then it is
easily seen that
P = p + \tfrac{1}{2}q, \quad Q = \tfrac{1}{2}q + r. \qquad (5.3.7)
Knowing p, q, r, we can find P and Q uniquely, but knowing P and Q, we cannot find p, q, r uniquely. However, for random mating, (5.3.6) and (5.3.7) give
P = 1 - \sqrt{r}, \quad Q = \sqrt{r}. \qquad (5.3.8)
Now, in any generation, if the relative frequencies of the genes G and g are P and Q, with P + Q = 1, in both males and females, then, in random mating, the probability of an offspring getting G from both parents is P^2, the probability of its getting G from one parent and g from the other is 2PQ, and the probability of its getting g from both parents is Q^2, so that the proportions of D, H, R in F_1 are
P^2, \quad 2PQ, \quad Q^2. \qquad (5.3.9)
From (5.3.2)–(5.3.4) and (5.3.7), we get the Hardy-Weinberg ratio. The relative gene frequencies in F_1 are
P^2 + \tfrac{1}{2}(2PQ) = P, \quad Q^2 + \tfrac{1}{2}(2PQ) = Q,
so that the proportions of the genes are the same as in the original population, and F_2 has the ratios given by (5.3.9). This again confirms the Hardy-Weinberg law.
Unit 6
Course Structure
• Correlation between Genetic Composition of Siblings
• Bayes Theorem and Its Applications in Genetics
Let Y_1 and Y_2 denote the genotypes of two siblings in a randomly mating stable population. Then, for instance,
P(Y_1 = D, Y_2 = D) = p^2 + \tfrac{1}{2}pq + \tfrac{1}{16}q^2
= (1 - \sqrt{r})^4 + \sqrt{r}(1 - \sqrt{r})^3 + \tfrac{1}{4}\, r\,(1 - \sqrt{r})^2
= \tfrac{1}{4}(1 - \sqrt{r})^2 (2 - \sqrt{r})^2. \qquad (6.1.2)
Similarly,
P(Y_1 = D, Y_2 = H) = P(Y_1 = H, Y_2 = D) = \tfrac{1}{2}\sqrt{r}(1 - \sqrt{r})^2(2 - \sqrt{r}), \qquad (6.1.3)
P(Y_1 = D, Y_2 = R) = P(Y_1 = R, Y_2 = D) = \tfrac{1}{4}\, r\,(1 - \sqrt{r})^2, \qquad (6.1.4)
P(Y_1 = H, Y_2 = R) = P(Y_1 = R, Y_2 = H) = \tfrac{1}{2}\, r\,(1 - \sqrt{r})(1 + \sqrt{r}), \qquad (6.1.5)
P(Y_1 = H, Y_2 = H) = \sqrt{r}(1 - \sqrt{r})(1 + \sqrt{r} - r), \qquad (6.1.6)
P(Y_1 = R, Y_2 = R) = \tfrac{1}{4}\, r\,(1 + \sqrt{r})^2. \qquad (6.1.7)
If we assign the arbitrary values 1, 0, -1 to D, H and R, respectively, we get the bivariate probability distribution as follows:

Y_1   Y_2   Probability
 1     1    \tfrac{1}{4}(1 - \sqrt{r})^2(2 - \sqrt{r})^2
-1    -1    \tfrac{1}{4}\, r\,(1 + \sqrt{r})^2
 0     0    \sqrt{r}(1 - \sqrt{r})(1 + \sqrt{r} - r)
 1     0    \tfrac{1}{2}\sqrt{r}(1 - \sqrt{r})^2(2 - \sqrt{r})
 0     1    \tfrac{1}{2}\sqrt{r}(1 - \sqrt{r})^2(2 - \sqrt{r})
 1    -1    \tfrac{1}{4}\, r\,(1 - \sqrt{r})^2
-1     1    \tfrac{1}{4}\, r\,(1 - \sqrt{r})^2
 0    -1    \tfrac{1}{2}\, r\,(1 - \sqrt{r})(1 + \sqrt{r})
-1     0    \tfrac{1}{2}\, r\,(1 - \sqrt{r})(1 + \sqrt{r})
The marginal distributions of Y_1 and Y_2 are the same and are given by

Y_1 (or Y_2)    1                    0                          -1
Probability     (1 - \sqrt{r})^2    2\sqrt{r}(1 - \sqrt{r})    r
6.2 Bayes Theorem and Its Applications in Genetics
Before an event happens, we assign to the various possible hypotheses certain degrees of confidence, called a priori probabilities. The occurrence of an event changes our degrees of confidence, in the sense that the probabilities of some hypotheses may increase and of others may decrease. The new probabilities are called a posteriori probabilities. Bayes theorem connects a posteriori and a priori probabilities.
Let H_1, H_2, \ldots, H_n be n mutually exclusive hypotheses, and let their a priori probabilities be P(H_1), P(H_2), \ldots, P(H_n). Now let an event A happen, and let the probabilities of the happening of this event on the basis of the various hypotheses be given by P(A/H_1), P(A/H_2), \ldots, P(A/H_n). Our object is to find the a posteriori probabilities P(H_1/A), P(H_2/A), \ldots, P(H_n/A) in terms of the known probabilities P(H_i), P(A/H_i), i = 1, 2, \ldots, n.
so that
P(H_i/A) = \frac{P(H_i)\, P(A/H_i)}{P(A)}, \quad i = 1, 2, \ldots, n. \qquad (6.2.1)
Since H_1, H_2, \ldots, H_n are mutually exclusive and exhaustive hypotheses under consideration, we have, by the theorem of total probability,
P(A) = P(AH_1) + P(AH_2) + \cdots + P(AH_n) = \sum_{j=1}^n P(AH_j) = \sum_{j=1}^n P(H_j)\, P(A/H_j). \qquad (6.2.2)
Illustration - I
As an illustration of Bayes theorem in genetics, we investigate the probability that two blue-eyed twin boys are monovular (i.e., from the same egg). Here we have two possible hypotheses:
(i) H1 : Both are from the same egg, i.e., both are monovular;
(ii) H2 : Both are from the different eggs, i.e., both are binovular.
To find P(H_1) and P(H_2), we use the observation that 32 percent of all twin pairs are of unlike sex. Since binovular twins are as likely to be of like as of unlike sex, another 32 percent are expected to be binovular pairs of like sex, and the remaining 36 percent are expected to be monovular, so that
P(H_1) = \frac{0.36}{0.68} = \frac{9}{17}, \quad P(H_2) = \frac{0.32}{0.68} = \frac{8}{17}. \qquad (6.2.4)
To find P(A/H_1) and P(A/H_2), we assume that mating is random and genetically stable, so that the proportions of D, H, R are given by
p = (1 - \sqrt{r})^2, \quad q = 2\sqrt{r}(1 - \sqrt{r}), \quad r = r. \qquad (6.2.5)
Also, blue eyes are known to arise due to a recessive gene, so that
P(A/H_1) = probability that both boys are recessive when they are from the same egg = r, \qquad (6.2.6)
P(A/H_2) = probability that both boys are recessive when they are from different eggs
= P(parents are Bb, Bb, and both children are bb) + P(parents are Bb, bb, and both children are bb) + P(parents are bb, Bb, and both children are bb) + P(parents are bb, bb, and both children are bb)
= q^2 \cdot \tfrac{1}{4} \cdot \tfrac{1}{4} + qr \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} + qr \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} + r^2 \cdot 1 \cdot 1
= \tfrac{1}{4}\, r\,(1 - \sqrt{r})^2 + r\sqrt{r}(1 - \sqrt{r}) + r^2
= \tfrac{1}{4}\, r\,(1 + r - 2\sqrt{r} + 4\sqrt{r} - 4r + 4r)
= \tfrac{1}{4}\, r\,(1 + \sqrt{r})^2. \qquad (6.2.7)
Using (6.2.3), (6.2.4), (6.2.6) and (6.2.7), we can find P (H1 /A) and P (H2 /A).
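For concreteness, the computation can be carried through numerically once a value of r is chosen. The Python sketch below is illustrative only; the proportion r of recessives is a hypothetical value, not one given in the text.

# Bayes computation for the blue-eyed twin boys.
r = 0.36                          # assumed proportion of blue-eyed (bb) people
s = r ** 0.5

P_H1, P_H2 = 9 / 17, 8 / 17       # a priori probabilities from (6.2.4)
P_A_H1 = r                        # (6.2.6): monovular twins
P_A_H2 = r * (1 + s) ** 2 / 4     # (6.2.7): binovular twins

P_A = P_H1 * P_A_H1 + P_H2 * P_A_H2
print(f"P(H1/A) = {P_H1 * P_A_H1 / P_A:.3f}, P(H2/A) = {P_H2 * P_A_H2 / P_A:.3f}")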
Illustration - II
To illustrate another application of Bayes theorem, we consider the following problem. From the mating of two dominant-looking individuals, a dominant-looking offspring is obtained. What is the probability that both the parents are real dominants?
Exercise 6.2.1. 1. A flock of certain species of fowls consists of 117, 191 and 16 with blue, black, and
white plumages. Assuming that black and white plumages are the phenotypes corresponding to the ho-
mozygous genotypes (b, b) and (w, w) and the blue plumage corresponds to the heterozygous genotype
(w, b), find the genotype and gene frequencies.
2. In a certain human population, dominants, hybrids and recessives are 16 per cent, 48 per cent and 36
per cent, respectively. Given that a man is recessive and has a brother, show that the probability of the
brother being recessive is 0.66. What are the probabilities of the brother being a dominant or a hybrid?
3. Assuming Mendel's law of independent assortment, which postulates that, when there are two or more gene pairs segregating at the same time, they do so independently, prove that the double inter-cross AaBb × AaBb results in four phenotypes, namely AB, Ab, aB, and ab, in the ratios 9 : 3 : 3 : 1.
4. From the mating of two hybrids Gg and Gg, a dominant-looking offspring Gx is obtained. This in-
dividual is mated with another hybrid, and as a result, n individuals are obtained, all of whom look
dominant. What is the a posteriori probability that x = G?
5. From the mating of two dominant-looking individuals, n offspring are produced, of which r are reces-
sives. What is the probability that both the parents are hybrid?
Unit 7
Course Structure
• Extension of basic model for inheritance of genetic characteristics
G1G1G2G2   G1G1G2g2   G1G1g2G2   G1G1g2g2
G1g1G2G2   G1g1G2g2   g1G1g2G2   G1g1g2g2
G1g1G2G2   G1g1G2g2   g1G1g2G2   G1g1g2g2
G1g1G2G2   G1g1G2g2   g1G1g2G2   G1g1g2g2

The nine genotypes with respect to the two genes are
D1D2, D1H2, D1R2, H1D2, H1H2, H1R2, R1D2, R1H2, R1R2,
and the four phenotypes are
D1D2, D1R2, R1D2, R1R2.
Let us now generalize to the case of n genes. For each gene there are 4 possibilities, and so the total number of possibilities for n genes is 4^n. For each gene there are 3 genotypes, and so the total number of genotypes is 3^n. For each gene there are 2 phenotypes, and so the total number of phenotypes for n genes is 2^n. With respect to each gene, there are 3 possibilities giving the dominant phenotype for each one giving the recessive phenotype.
Let us find how many possibilities are dominant with respect to r genes. We can choose the r genes in \binom{n}{r} ways and, corresponding to each of these, there are 3 possibilities for each of the r dominant genes and 1 for each of the remaining n - r recessive genes, so that the frequency of genotypes which are dominant with respect to r genes and recessive with respect to n - r genes is \binom{n}{r} 3^r 1^{n-r}, and the total of all these frequencies is
\sum_{r=0}^{n} \binom{n}{r} 3^r = (3 + 1)^n = 4^n. \qquad (7.1.1)
Thus, of the 4^n possibilities with n genes, we have \binom{n}{r} groups of 3^{n-r} each, dominant with respect to n - r genes, for r = 0, 1, 2, \ldots, n. Thus we have one group of 3^n, \binom{n}{1} groups of 3^{n-1} each, \ldots, \binom{n}{r} groups of 3^{n-r} each, \ldots, and one group of 3^0 = 1.
The frequencies of the phenotypes are given by the coefficients in the expansion of (3x + y)^n. Similarly, the frequencies of the genotypes are given by the coefficients of (x + 2y + z)^n, and the frequency of a genotype dominant with respect to r genes, hybrid with respect to s genes, and recessive with respect to n - r - s genes is
\frac{n!}{r!\, s!\, (n - r - s)!}\, 2^s.
We now get the results given in Table 1 for genotypes and blood groups of offspring. From Table 1, we
can deduce the table for the possible blood groups for the father when we know the blood group of mother
and child (Table 2). Table 2 is used in certain disputed legal cases to decide whether a certain child born of a
certain mother can be the child of a given male.
Again, if the proportions of the alleles A, B, O in the population are p, q, r, then the proportions of persons with blood groups A, B, AB, and O in the population are
p^2 + 2pr, \quad q^2 + 2qr, \quad 2pq, \quad r^2.
If we know the division of the population according to blood groups, we can calculate p, q, r.
g1 g1 , g2 g2 , g3 g3 , ··· gn gn (7.3.1)
G1 G 1 , G2 G2 , G3 G3 , ··· Gn Gn . (7.3.2)
We shall say that an individual having the gene pairs (7.3.2) belongs to the G-race. We are not implying here that G's are dominant and g's are recessive. On crossing the given generation F_0 with the G-race, we get the
first generation F1 , namely,
G1 g1 , G2 g2 , G3 g3 , ··· Gn gn , (7.3.3)
so that one g in each pair is replaced by the corresponding G. Our object is to replace the other g also by
successive crossing with the G-race. Successive crosses give us the generations
In every generation, there is a probability 1/2 that g_i has been replaced by G_i and a probability 1/2 that it has not, and so the probability that the (m + 1)-th generation still has g_i is (1/2)^m. Also, the probability that r of the n replacements of genes have not taken place is given by the binomial distribution as
\binom{n}{r}\left(\frac{1}{2^m}\right)^r \left(1 - \frac{1}{2^m}\right)^{n-r}. \qquad (7.3.5)
As m approaches infinity, the probability that all the replacements have taken place, namely \left(1 - \frac{1}{2^m}\right)^n, approaches unity, regardless of the value of n. Thus, ultimately all genes g_i will be replaced by genes G_i for i = 1, 2, \ldots, n.
Unit 8
Course Structure
• Genetic Improvement through Elimination of Recessives
In the n-th generation, if the proportions of dominants, hybrids, and recessives are p_n, q_n and r_n, then, in the (n + 1)-th generation, these proportions are
p_{n+1} = \left(p_n + \tfrac{1}{2}q_n\right)^2, \quad q_{n+1} = 2\left(p_n + \tfrac{1}{2}q_n\right)\left(r_n + \tfrac{1}{2}q_n\right), \quad r_{n+1} = \left(r_n + \tfrac{1}{2}q_n\right)^2. \qquad (8.1.1)
If, in the n-th generation, the recessives are eliminated, then the new proportions in the (n + 1)-th generation are given by
p_{n+1} = \left(p'_n + \tfrac{1}{2}q'_n\right)^2, \quad q_{n+1} = 2\left(p'_n + \tfrac{1}{2}q'_n\right)\tfrac{1}{2}q'_n, \quad r_{n+1} = \left(\tfrac{1}{2}q'_n\right)^2, \qquad (8.1.2)
where p'_n, q'_n are the new proportions in the n-th generation after the elimination of the recessives, so that
\frac{p'_n}{q'_n} = \frac{p_n}{q_n}, \quad p'_n + q'_n = 1. \qquad (8.1.3)
Since p'_n + \tfrac{1}{2}q'_n = 1 - \tfrac{1}{2}q'_n, equations (8.1.2) become
p_{n+1} = \left(1 - \tfrac{1}{2}q'_n\right)^2, \quad q_{n+1} = q'_n\left(1 - \tfrac{1}{2}q'_n\right), \quad r_{n+1} = \left(\tfrac{1}{2}q'_n\right)^2. \qquad (8.1.4)
Then
\frac{p'_{n+1}}{q'_{n+1}} = \frac{p_{n+1}}{q_{n+1}} = \frac{1 - \tfrac{1}{2}q'_n}{q'_n} \;\Rightarrow\; \frac{1}{q'_{n+1}} = \frac{1 - \tfrac{1}{2}q'_n}{q'_n} + 1 = \frac{1 + \tfrac{1}{2}q'_n}{q'_n}
\Rightarrow q'_{n+1} = \frac{q'_n}{1 + \tfrac{1}{2}q'_n}. \qquad (8.1.5)
Putting
u_n = \frac{1}{q'_n} \qquad (8.1.6)
in (8.1.5), we get
u_{n+1} = u_n + \frac{1}{2}, \qquad (8.1.7)
whose solution is
u_n = A + \frac{1}{2}n, \qquad (8.1.8)
so that
q'_n = \frac{1}{A + \tfrac{1}{2}n}. \qquad (8.1.9)
To determine A, we make use of p = (1 - \sqrt{r})^2, q = 2\sqrt{r}(1 - \sqrt{r}), r = r, to get
q'_1 = \frac{q}{p + q} = \frac{2\sqrt{r}}{1 + \sqrt{r}}, \qquad (8.1.10)
so that
A = \frac{1}{2\sqrt{r}}. \qquad (8.1.11)
Also,
q'_n = \frac{2\sqrt{r}}{1 + n\sqrt{r}}, \qquad (8.1.12)
r_{n+1} = \left(\tfrac{1}{2}q'_n\right)^2 = \frac{r}{(1 + n\sqrt{r})^2}. \qquad (8.1.13)
This gives the proportion of recessives in the (n + 1)-th generation. Given the proportion of recessives in the original stable population, we can find, by using (8.1.13), the number of generations needed to reduce the proportion of recessives below any given limit by eliminating the recessives at all stages. We also find that p_n \to 1, q_n \to 0, r_n \to 0 as n \to \infty.
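The closed form (8.1.12) can be checked against direct iteration of the recurrence (8.1.5). Below is a minimal Python sketch; the initial proportion r of recessives is an illustrative value.

# Iterate q'_{n+1} = q'_n / (1 + q'_n/2) from (8.1.5) and compare
# with the closed form q'_n = 2*sqrt(r)/(1 + n*sqrt(r)) of (8.1.12).
r = 0.25                          # illustrative initial proportion of recessives
s = r ** 0.5
q = 2 * s / (1 + s)               # q'_1 from (8.1.10)

for n in range(1, 11):
    closed = 2 * s / (1 + n * s)
    print(n, round(q, 6), round(closed, 6),
          "r_{n+1} =", round((q / 2) ** 2, 6))   # (8.1.13)
    q = q / (1 + q / 2)           # recurrence (8.1.5)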
Instead of eliminating all the recessives, we may keep a fraction k of the recessives. The basic equations in this case give
\frac{P_{n+1}}{Q_{n+1}} = \frac{(1 - K)(P_n^2/Q_n^2) + (P_n/Q_n)}{(P_n/Q_n) + (1 - k)} \qquad (8.2.3)
\Rightarrow u_{n+1} = \frac{(1 - K)u_n^2 + u_n}{u_n + (1 - k)}, \quad \text{where } u_n = \frac{P_n}{Q_n}. \qquad (8.2.4)
This is a nonlinear difference equation of the first order. Knowing u_1, we can find u_n step by step. The equilibrium solution is obtained by putting u_n = u_{n+1} = u, which gives
u = \frac{(1 - K)u^2 + u}{u + 1 - k}
\Rightarrow u(uK - k) = 0, \qquad (8.2.5)
so that u = 0, or u = k/K, or 1/u = 0, i.e., either the dominants alone or the recessives alone survive. However, a non-trivial equilibrium solution is
u = k/K. \qquad (8.2.6)
Since this equilibrium solution has to be positive, both k and K have to be either positive or negative, i.e., either the heterozygotes have to be the fittest or they have to be the least fit. If K = k, the equilibrium proportions of G and g are the same.
To discuss the stability of the equilibrium solution of (8.2.5), we note that (8.2.4) gives
u_{n+1} - u_n = \frac{K u_n}{u_n + 1 - k}\left(\frac{k}{K} - u_n\right) \qquad (8.2.7)
or
\frac{u_{n+1} - u_n}{k/K - u_n} = \frac{K u_n}{u_n + 1 - k} \qquad (8.2.8)
and
\frac{u_{n+1} - k/K}{u_n - k/K} = \frac{u_n(1 - K) + (1 - k)}{u_n + 1 - k}. \qquad (8.2.9)
From (8.2.9), we deduce the following results:
(i) If 0 < k < K < 1 when un > k/K, we find that un+1 < un and un+1 > k/K, i.e., un+1
is nearer to k/K than un , and the sequence {un } monotonically decreases to k/K. On the other
hand, if un < k/K, then un+1 > un and un+1 < k/K so that the sequence {un } monotonically
increases to k/K. In the first case, we get a monotonically decreasing sequence bounded below; in the
second case, we get a monotonically increasing sequence bounded above. In either case, we find that,
if 0 < k < K < 1, then the equilibrium solution is stable.
(ii) If k and K are both negative and k/K < 1, then (8.2.8) and (8.2.9) show that u_{n+1} - u_n, u_n - k/K, and u_{n+1} - k/K all have the same sign, so that, if u_n > k/K, then u_{n+1} > k/K and u_{n+1} > u_n, and hence u_{n+1} is farther from k/K than u_n. Similarly, if u_n < k/K, then u_{n+1} < k/K and u_{n+1} < u_n, so that here too u_{n+1} is farther from k/K than u_n. Thus, when k and K are both negative, the equilibrium solution is unstable.
Thus, when the hybrids are the fittest, we get a stable equilibrium; when they are the least fit, we obtain an unstable equilibrium. Now (8.2.8) can be written as
\frac{u_{n+1} - u_n}{(n + 1) - n} = \frac{K u_n}{u_n + 1 - k}\left(\frac{k}{K} - u_n\right). \qquad (8.2.10)
When the change in one generation is not substantially different from the changes in the preceding or succeeding generations (e.g., when K is very small or when there are small oscillations about the equilibrium position), we can replace (8.2.10) by the differential equation
\frac{du}{dn} = \frac{K u}{u + 1 - k}\left(\frac{k}{K} - u\right). \qquad (8.2.11)
Similarly, we can discuss the balance between selection and mutation. Let the probabilities of survival of
D, H, R be S(1 − K), S(1 − K), and S, respectively, and let µ be the probability of a mutation from g to G
in one generation. Then we get
\frac{P_{n+1}}{Q_{n+1}} = \frac{2S(1 - K)P_n^2 + 2S(1 - K)P_n Q_n + \mu[2S(1 - K)P_n Q_n + 2S Q_n^2]}{[2S(1 - K)P_n Q_n + 2S Q_n^2](1 - \mu)}
= \frac{(1 - K)\frac{P_n}{Q_n}\left(\frac{P_n}{Q_n} + 1\right) + \mu\left(\frac{P_n}{Q_n} + 1 - K\frac{P_n}{Q_n}\right)}{\left(\frac{P_n}{Q_n} + 1 - K\frac{P_n}{Q_n}\right)(1 - \mu)} \qquad (8.2.13)
so that
u_{n+1} = \frac{(1 - K)u_n(u_n + 1) + \mu(u_n + 1 - K u_n)}{(u_n + 1 - K u_n)(1 - \mu)}. \qquad (8.2.14)
If we assume that u_n is very small (which is justified since mutation rates are small, i.e., of the order of 10^{-5} or less, so that genes with a lower fitness level can be maintained only at a very low frequency by mutation), Equation (8.2.14) can be written as
u_{n+1} = u_n(1 - K) + \mu. \qquad (8.2.15)
In equilibrium, this gives
u = \mu/K. \qquad (8.2.16)
8.3 An Alternative Discussion of Selection
(i) If \sigma_2 > \sigma_1, \sigma_3, then the first factor on the right-hand side of (8.3.4) is less than unity, and
so that P_n \to P_e as n \to \infty, regardless of the initial value P_0. Therefore, the equilibrium point is stable (see Fig. 1).
[Fig. 1: P_n converging monotonically to P_e, whether P_0 > P_e or P_0 < P_e.]
(ii) If \sigma_2 < \sigma_1, \sigma_3, then the first factor on the right-hand side of (8.3.4) is greater than unity, and
hence the equilibrium is unstable. If P_0 < P_e, then P_n \to 0, and if P_0 > P_e, then P_n \to 1 (see Fig. 2).
[Fig. 2: P_n moving away from P_e towards 1 when P_0 > P_e and towards 0 when P_0 < P_e.]
These results are the same as those of §8.2 and show that the equilibrium is stable if the heterozygotes have
the greatest chance of survival and is unstable if the heterozygotes have the least chance of survival.
The convergence of a sequence {Pn } to a limit Pe is said to be geometric at the rate c for 0 < |c| < 1, 0 <
a < |c| if
|Pn − Pe | |Pn − Pe |
lim n
< ∞, lim = ∞. (8.3.9)
n→∞ c n→∞ an
The convergence is said to be algebraic if
Using (8.3.9) and (8.3.10), we find that, when σ2 > σ1 , σ3 , the convergence is geometric at the rate
2σ2 − σ1 − σ3 σ2 (σ1 + σ3 ) − 2σ1 σ3
1/ 1 + Pe Qe = . (8.3.11)
σ1 Pe + σ3 Qe σ22 − σ1 σ3
Unit 9
Course Structure
• Some basic concepts of fluid dynamics
• Hegen-Poiseuille Flow
• Non-Newtonian Fluids
9.1 Introduction
In large and medium sized arteries, those more typically affected by vascular diseases, blood can be mod-
elled by means of the Navier-Stokes (NS) equation for incompressible homogeneous Newtonian fluids. Non-
Newtonian rheological models are necessary for describing some specific flow processes, such as clotting or
sickle cell diseases, or more generally flow in capillaries. Let us recall some preliminary concepts of fluid
dynamics.
49
50 UNIT 9.
the element. The equations of motion, knows as Navier-Stokes equations, for the of a Newtonian viscous
incompressible fluid are
2
∂ u ∂2u ∂2u
∂u ∂u ∂u ∂u ∂p
ρ +u +v +w = X− +µ + + (9.2.2)
∂t ∂x ∂y ∂z ∂x ∂x2 ∂y 2 ∂z 2
2
∂2v ∂2v
∂v ∂v ∂v ∂v ∂p ∂ v
ρ +u +v +w = Y − +µ + + (9.2.3)
∂t ∂x ∂y ∂z ∂y ∂x2 ∂y 2 ∂z 2
2
∂ w ∂2w ∂2w
∂w ∂w ∂w ∂w ∂p
ρ +u +v +w = Z− +µ + + (9.2.4)
∂t ∂x ∂y ∂z ∂z ∂x2 ∂y 2 ∂z 2
If the external body forces X, Y, Z form a conservative system, there exists a potential function Ω such that
∂Ω ∂Ω ∂Ω
X=− , Y =− , Z=−
∂x ∂y ∂z
(9.2.5)
∂p ∂ ∂p ∂ ∂p ∂
X− = − (Ω + p), Y − = − (Ω + p), Z− = − (Ω + p)
∂x ∂x ∂y ∂y ∂z ∂z
so that p is effectively replaced by p + Ω.
If X, Y, Z are known or are absent, (9.2.1)-(9.2.4) give a system of four coupled nonlinear partial differ-
ential equations for the four unknown functions u, v, w, and p. These equations have to be solved subject
to certain initial conditions giving the motion of the fluid at time t = 0 and certain prescribed boundary
conditions on the surfaces with which the fluid may be in contact or conditions which may hold at very large
distances from the surfaces. Usually, the boundary conditions are provided by the no-slip condition according
to which both tangential and normal components of the fluid velocity vanish at all points of the surfaces of the
stationary bodies with which the the fluid may be in contact. However, if a body is moving, then the tangential
and normal components of the fluid velocity at any point of contact are the same as those of the moving body
at that point.
(iv) The basic equations (9.2.2)-(9.2.4) can also be simplified when the motion is axially symmetric, i.e.,
when it is symmetrical about an axis. Here we use the cylindrical polar coordinate (r, θ, z), where the
axis of symmetry is taken as the axis of z. There are, in general, three components of velocity, namely,
vr along the radius vector perpendicular to the axis, vθ perpendicular to the axis and the radius vector,
and vz parallel to the axis of z. For the axi-symmetric case, we take vθ = 0, and we also take vr , vz and
p to be independent of θ. In this case, the equation of continuity and the equations of motion are given
by
1 ∂ ∂
(rvr ) + vz = 0, (9.2.14)
r∂r ∂z 2
∂ 2 vr
∂vr ∂vr ∂vr ∂p ∂ vr 1 ∂vr vr
ρ + vr + vz =− +µ + + − 2 , (9.2.15)
∂t ∂r ∂z ∂r ∂r2 ∂z 2 r ∂r r
2 2
∂vz ∂vz ∂vz ∂p ∂ vz ∂ vr 1 ∂vr
ρ + vr + vz =− +µ 2
+ 2
+ . (9.2.16)
∂t ∂r ∂z ∂z ∂r ∂z r ∂r
We can satisfy (9.2.14) by introducing the stream function ψ defined by
1 ∂ψ 1 ∂ψ
= vz , = −vr (9.2.17)
r ∂r r ∂z
Substituting (9.2.17) in (9.2.15) and (9.2.16) and eliminating p, we get the fourth-order partial differen-
tial equation for ψ, as
∂ 1 ∂(ψ, D2 ψ) 2 ∂ψ 2
(D2 ψ) − − 2 D ψ = νD4 ψ, (9.2.18)
∂t r ∂(r, z) r ∂z
where
∂2 1 ∂ ∂2
D2 ≡ − + , D2 ψ = D2 (D2 ψ). (9.2.19)
∂r2 r ∂r ∂z 2
After solving for ψ, we can obtain pressure p and vorticity ω by using the equation
" 2 #
∂ 2 p ∂ 2 p 1 ∂p 2 ∂ 2 ψ ∂ 2 ψ 1 ∂ψ ∂ 2 ψ 1 ∂ψ ∂ 2 ψ
∂ψ
+ + = − − + − (9.2.20)
∂r2 ∂z 2 r ∂r r ∂z 2 ∂r2 r ∂r ∂z ∂z∂r r ∂r ∂z 2
1 ∂ 2 ψ 1 ∂ψ ∂ 2 ψ
2
ω = −D ψ = − 2 − + . (9.2.21)
r ∂z 2 r ∂r ∂z 2
52 UNIT 9.
We consider steady flow when there is only one velocity component parallel to the axis so that vr = 0, vθ =
0, and vz = v. Then the equation of continuity gives
vz = v(r). (9.3.1)
The equations of motion, (9.2.15) and (9.2.16), now give
∂p d2 v 1 dv 1 ∂p
= 0, + = . (9.3.2)
∂r dr2 r dr µ ∂z
∂p
From (9.3.2), − must be a constant. Let us denote this constant pressure gradient by P . Then (9.3.2) gives
∂z
1 d dv P
r =− . (9.3.3)
r dr dr µ
Integrating (9.3.3) twice, we get
dv 1 P r2
r = − P r2 + A, v(r) = − + A ln r + B, (9.3.4)
dr 2µ 4µ
but velocity on the axis (i.e., at r = 0) must be finite, giving A = 0, and it should vanish on r = a because of
the no-slip condition so that
P a2 P 2
B= , v= (a − r2 ). (9.3.5)
4µ 4µ
The velocity is zero on the surface and is maximum on the axis. In fact, the velocity profile is parabolic and
in the three-dimensional space, it may be regarded as a paraboloid of revolution.
The total flux across any section, i.e., the total volume of the fluid crossing any section per unit time, is
given by
Za
πa4
Q = 2πr v dr = P. (9.3.6)
8µ
0
The result that the flux is proportional to the pressure gradient and to the fourth power of the radius of the
tube was discovered experimentally by Hagen and rediscovered independently by Poiseuille. The importance
of this result is that it can be confirmed experimentally and can be used to determine µ.
9.4. INLET LENGTH FLOW 53
ρU 2 L2 ρU L UL
Re = = = (9.5.1)
µLU µ µ
where µ = µ/ρ is called the kinematic viscosity of the fluid. Now the dimensions of µ and ρU L are given by
Thus, Re is a dimensionless number. It is called Reynold’s number, after Osborn Reynold who in 1890 showed
that the fully developed Poiseuille flow in a circular tube changes from stream line or laminar flow to turbulent
flow when this number, based on the diameter of the tube, exceed a critical value of about 2000.
54 UNIT 9.
When Reynold number is small, viscous forces dominate over inertial forces. If we neglect the inertial
forces, which we can justifiably do when Re << 1, (9.2.13) and (9.2.18) give
∇4 ψ = 0. (9.5.4)
(i) lubrication theory, which we shall find useful in our study of lubrication of human joints;
(ii) microcirculation or flows of blood in blood vessel of diameter less than 100 µm;
(iii) air flows in alveolar passages of diameter less than a few hundred micron; and
τ = µe, (9.6.1)
where µ is the constant coefficient of viscosity. We have fluids for which µ itself may be a function of strain
rate, i.e., for which stress becomes a non-linear or non-homogeneous function of strain rate (see Fig. 9.3).
Such fluids are called non-Newtonian fluids. One important call of non-Newtonian fluids is that of power-law
fluids with constitutive equations
τ = µen = µen−1 e. (9.6.2)
If n < 1, we get a pseudo-plastic power-law fluid in which the effective viscosity coefficient µen−1
decreases with intreaing strain rate.
If n > 1, we a dilatant power-law fluid in which the effective viscosity coefficient increases with
increasing strain rate.
If n = 1, Eq.(9.6.2) gives the Newtonian viscous fluid as a special case.
Another important non-Newtonian fluid, namely, the Bingham plastic, has the constitutive equation
τ = µe + τ0 (τ ≥ τ0 ),
(9.6.3)
e=0 (τ ≤ τ0 ).
It shows an yield stress τ0 and, if τ < τ0 , no flow takes place. Some other laws which have been proposed for
special non-Newtonian fluids are:
• Herschel-Bulkley fluid
τ = µen + τ0 (τ ≥ τ0 ),
(9.6.4)
e=0 (τ ≤ τ0 ).
• Casson fluid
1 1 1 1
τ 2 = µ 2 e 2 + τ02 (τ ≥ τ0 ),
(9.6.5)
e=0 (τ ≤ τ0 ).
• Prandtl fluid e
τ = A sin−1 (9.6.6)
c
• Prandtl-Eyring fluid e
τ = Ae + B sin−1 (9.6.7)
c
(i) the plates are at rest and there is an external pressure gradient;
(ii) one plate is moving in relation to the other and there is no external constant pressure gradient;
(iii) one plate is moving in relation to the other and there is also an external constant pressure gradient.
3. For steady motion between coaxial circular cylinders, show that
b2 ln(r/a) − a2 ln(r/b)
ln(r/b) ρ 2
v=V − r − ,
ln(a/b) 4π ln(b/a)
where the inner cylinder moves with velocity V and the outer cylinder is at rest. Show also that
" #
1 2 2)
− (b2 − a2 )
2 (b a 2 πρ 4 4
Q = πV −a + b −a − . (9.6.8)
ln(b/a) 8µ ln(b/a)
56 UNIT 9.
Unit 10
Course Structure
• Basic Concepts about Blood
10.1 Basic Concepts about Blood, Cardiovascular System and Blood Flow
10.1.1 Constitution of Blood
Blood consists of a suspension of cells in an aqueous solution called plasma which is composed of about 90
per cent water and 7 per cent protein. There are about 5 × 109 cells in a millilitre (1 cc) of healthy human
blood, of which about 95 per cent are red cells or erythrocytes whose main function is to transport oxygen
from the lungs to all the cells of the body and the removal of carbon-dioxide formed by metabolic processes in
the body to the lungs. About 45 per cent of the blood volume in an average man is occupied by red c ells. This
fraction is known as the hematocrit. Of the remaining, white cells or leucocytes constitute about one-sixth or
1 per cent of the total, and these play a role in the resistance of the body to infection; platelets form 5 per cent
of the total, and they perform a function related to blood clotting.
(i) τ = µen (power law equation). This is found to hold good for strain rates between 5 and 200 sec−1 ,
with n having a value between 0.68 and 0.80.
57
58 UNIT 10.
1/2
(iii) τ 1/2 = µ1/2 e1/2 + τ0 (Casson equation). This holds for strain rated between 0 and 100000 sec−1 .
The yield stress arises because, at low shear stress, red cells form aggregates in the form of rouleaux which
are stacks of red cells in the shape of a roll of coins (see Fig. 10.1). At some finite stress, which is usually
small (of the order of 0.005 dyne/cm2 ), the aggregate is disrupted and blood begins to flow.
For hematocrits exceeding 5.8 per cent, it has been found that the yield stress is given by
1/2
τ0 = A(H − Hm )/100, (10.1.1)
where A = (0.008 ± 0.002 dyne/cm2 )1/3 , H is the normal hematocrit, and Hm is the hematocrit below which
there is no yield stress. Taking H as 45 per cent and Hm as 5 per cent, the yield stress of normal human blood
should be between 0.01 and 0.06 dyne/cm2 .
Not only τ0 , but also τ and effective viscosity, depend significantly on the hematocrit. The effective viscosity
is also apparently found to depend on capillary radius when measurements are made in capillaries of diameters
less than 300 µm. This apparent dependence of viscosity on capillary radius is known as Fahraeus-Lindqvist
effect. We shall explain this effect which is based on the hypothesis of a two layer flow (a plasma layer and a
core layer) with different viscosities.
(i) The heart (which acts as a pump, whose elastic muscular walls contract rhythmically, making possible
the pulsatile flow of blood through the vascular system)
(ii) The distributory system (comprising arteries and arterioles for sending blood to the various organs of
the body)
(iii) The diffusing system (made up of fine capillaries which are in contact with the cells of the body)
(iv) The collecting system of veins (which collects blood depleted of oxygen and full of products of metabolic
processes of the system).
10.1. BASIC CONCEPTS ABOUT BLOOD, CARDIOVASCULAR SYSTEM AND BLOOD FLOW 59
The organs which supplement the function of the cardiovascular system are (i) the lungs which provide a
region of inter-phase transfer of O2 to the blood and removal of CO2 from it, and (ii) the kidney, liver, and
spleen, which help in maintaining the chemical quality of blood under normal conditions and under conditions
of extreme stress.
Deoxygenated blood enters the right atrium (RA) from where it goes to the right ventricle (RV), as shown
in Fig. 10.2. When the heart contracts, the tricuspid valve between the RA and RV closes and blood is pushed
out to the lung through the pulmonary artery (PA) which branches to the right and left lungs where CO2 is
removed and blood is oxygenated. The blood returns from the lungs through the pulmonary vein (PV) to left
atrium (LA) and then it goes to the left ventricle (LV) and from there, due to contraction of the heart, it enters
the aorta from which it travels to other arteries and the rest of the vascular system.
(vi) Unusual pulsatility of flows. This arises from the rhythmic action of the heart.
There is also an unusual separation of flows, leading to increased resistance to flow and undesirable effects,
e.g., hardening of arteries. The separation occurs due to various reasons, some of which are as follows (see
Fig. ??):
(ii) Atheroma of blood vessels or fatty degeneration of the inner walls of the blood vessel
(iii) Stenosis of heart valve or narrowing of the heart valve when the valve is fully open
(iv) Secular aneurysm or a sac-like permanent abnormal blood-filled dilatation of blood vessel, resulting
from a disease of the vessel wall
(i) The innermost layer called Tunika-Intiama, consist of thin layer of endothelial cells,
(ii) The middle layer called Tunika-Median consists of plain muscles and a network of elastic fibres, and
(iii) The outer most layer, called Tunika-Adventesia, is made up of fibrous tissues and elastic tissue.
Veins are the blood vessels which carrying blood to the heart. The venous cross-sectional area at any point is
larger than of arteries and the velocity of blood is considerably lower when the arteries break up into minute
vessels, they are turned to capillaries.
10.1. BASIC CONCEPTS ABOUT BLOOD, CARDIOVASCULAR SYSTEM AND BLOOD FLOW 61
Collagen: It is the most important structure element of animal. There is a high amount of collagen present
in bone materials. Collagen is relatively inextensible fibrous protein. The fibres can be identified by light or
electron microscope.
Elastin: Unlike collagen elastin is an extensible fibrous protein present in large amount in skin, blood
vessels, lung etc. The elastic behaviour of this structure is solely due to the presence of elastin,. The fact that
elastin never appears without collagen, leads us to think that there must be resembles in structure of both.
Smooth muscles: Muscles consist of many fibres held together by connective tissues. Their structure and
function varying widely in different organ and animal. One of the basic structure they are divided into smooth
and straight muscles.
Inhomogeneity: Usually the wall of blood vessels are inhomogeneous. But experimental investigations
showed that the outermost layer, adventesia has a very lose network and merges externally with the surround-
ing tissues. The inner most layer intima, is very tin and can be easily neglected. The remaining layer, the
media, is considered homogeneous containing a matrix of smooth muscles elastic and collagen.
Compressibility: A material is said to be compressible if it changes its volume when it subjected to stress.
It is said to be incompressible if the change of the volume is ignorable. The experimental studied showed that
there is 20-40% change in volume and hence, for practical purpose the compressibility of vascular tissue can
be considerably very small.
Anisotropy: Healthy arteries are highly deformable comfit structures and show a non-linear stress strain
response with a typical stiffening effect at high pressure. This stiffening effect, common to all biological
tissues is based on the recruitment of embedded wavy collagen fibrils which leads to the characteristics of
anisotropic behaviour of artery.
Visco-elasticity: For a perfectly elastic body, there must be a single valued relationship between the ap-
plied strain and resulting stress. But when artery is subject to a cyclically varying strain the stress response
exhibits a hysteresis loop called it cycle. The rate of decreases is very rapid in the beginning, but a steady state
is observed after a numbers of cycles.
62 UNIT 10.
Moreover, two main characteristic of visco-elastic martial as for example creep and stress relaxation wave
also observed in vascular tissue.
σ ε
p
ee
Relaxation Cr
ε1
t
t1 t1+p1 - p1 t1
In the first stage, increases under the constant stress. This phenomena is called creep. In the second stage,
the stress decrease under constant strain, i.e., the material relaxes. This phenomena is called stress relaxation.
Unit 11
Course Structure
• Steady non-Newtonian fluid flow in circular tubes
Due to the pressure gradient, there is a forward force P × 2π[(r + r dr) − r] = P × 2πr dr on it. Let the
stress be τ (r) at a distance r from the axis. Then the force on the inner cylindrical surface is 2πrτ , and the
63
64 UNIT 11.
Since v = 0 at r = R, we have
ZR
Q=π r2 e(r) dr. (11.1.10)
0
11.2. FLOW OF POWER-LAW FLUID IN CIRCULAR TUBE 65
Also,
ZR 1/n
1P nπ 1
Q= 2πr v dr = R n +3 . (11.2.3)
2µ 3n + 1
0
P × πrp2 = τ0 × 2πrp
⇒ rp = 2τ0 /P (11.3.1)
This gives the relative change in Q with τ0 . Fig. 11.3 illustrates the variation of f (cp ) with cp for various
values of n. The figure shows that:
11.4. FLOW OF CASSON FLUID IN CIRCULAR TUBE 67
(i) As τ0 increases (µ and n remaining the same), the flux decreases rapidly and approaches zero as cp
approaches unity.
(ii) If n < 1, the curve is always concave upwards; when n = 1, the curve is always a straight line in the
beginning and becomes concave upwards; and when n > 1, the curve is convex in the beginning and
becomes concave near cp = 1, and, therefore, it has a pint of inflexion.
(iii) If τ0 and µ are constant, the decline in Q is more when n < 1 and less when n > 1. If we put n = 1 in
(11.3.8) and (11.3.9), we get the results for the special case of a Bingham plastic.
(iv) If we put τ0 = 0, rp = 0 in (11.3.8), we get results for the special case of a power-law fluid. Further, if
we put n = 1, we obtain results for Poiseuille flow.
or
dv 1 P h 1/2 i2 1 P h √ i
=− r − rp1/2 = 2 rp r − r − rp . (11.4.3)
dr 2µ 2µ
Figure 11.4 shows the variation of g(cp ) with cp . This shows that, as τ0 increases (µ remaining the same),
the plug velocity or the maximum velocity of flow decreases rapidly till cp reaches 0.6 when the velocity
is reduced to about 6 per cent of the value and then it rises sightly. For blood, small changes in τ0 lead to
significant changes in maximum velocity.
The flux Q is given by
"
2 Pπ 8 √ 1 1
Q = πvp rp + rp (R7/2 − rp7/2 ) − (R4 − rp4 ) − rp (R3 − rp3 )
µ 21 8 3
#
2√ 3/2 2 2 1 2 2 2 1 2 2
− rp R (R − rp ) + R (R − rp ) + rp R(R − rp )
3 4 2
"
π P R4 2 πP R4 8 √ 1 1
= cp g(cp ) + cp (1 − c7/2 4 3
p ) − (1 − cp ) − cp (1 − cp )
4 µ µ 21 8 3
#
2√ 1 1
− cp (1 − c2p ) + (1 − c2p ) + cp (1 − c2p )
3 4 2
πP R4
= h(cp ), (say) (11.4.7)
8µ
so that
Q
= h(cp ). (11.4.8)
Q0
11.4. FLOW OF CASSON FLUID IN CIRCULAR TUBE 69
Figure 11.5 gives the graph of h(cp ) against cp . Its shows that, as τ0 increases (µ remaining the same), the
flux decreases rapidly till cp = 0.6 and till it has fallen to about 5 per cent of Q0 and then it rises again. For
blood, small changes in τ0 can make significant changes in Q. The Casson fluid flows in the tube takes place
only if rp < R, i.e., if
2τ0 < P R. (11.4.9)
70 UNIT 11.
Unit 12
Course Structure
• Fahraeus-Lindqvist Effect
In arteries, blood flows in two layers, a plasma layer near the walls consisting of only the plasma and almost
no cells and a core layer consisting of red cells in plasma (see Fig. 12.1). If µp and µc are the viscosities of
the two fluids, which are assumed Newtonian, we get
P
vp = (R2 − r2 ), R − δ ≤ r ≤ R, (12.1.1)
4µp
71
72 UNIT 12.
P P 2 µc
(R2 − r2 ) + R − (R − δ)2
vc = −1 , 0 ≤ r ≤ R − δ. (12.1.2)
4µc µc µp
Thus the velocity in the plasma layer is the same as it would be when the whole tube is filled with plasma, but
the velocity in the core layer is more than it would be when the whole tube is filled with the core fluid. This is
what is expected.
Now,
R−δ
Z ZR
Q = 2πrvc dr + 2πrvp dr
0 R−δ
" #
πP R4 δ 4
µp
= 1− 1− 1− . (12.1.3)
8µp R µc
If the whole tube were filled with a single Newtonian fluid with viscosity coefficient µ, we would have
πP R4
Q= . (12.1.4)
8µ
where µ is the effective viscosity of the two fluids taken together. From (12.1.5), it can be seen that the effective
δ
viscosity depends on R. In practice, << 1, and hence (12.1.5) gives
R
4δ µc
µ = µp 1 − −1 . (12.1.6)
R µp
We find that, as R decreases, µ decreases. This explains the Fahraeus-Lindqvist effect. Here it has been
assumed that δ is independent of R.
so that the equation of continuity and the equation of motion are given by
∂v ∂p
= 0, = 0, (12.1.8)
∂z ∂r 2
∂ v 1 ∂v ∂ 2 v
∂v ∂v 1 ∂p
+v =− +ν + + . (12.1.9)
∂t ∂z ρ ∂z ∂r2 r ∂r ∂z 2
From (12.1.7) and (12.1.8), v is a function of r and t only and p is a function of z and t only. From (12.1.10),
∂p/∂z is a function of t only. Thus for a pulsatile sinusoidal flow, we take
∂p √
= −P eiωt , (i = −1), (12.1.11)
∂z
v(r, t) = V (r)eiωt . (12.1.12)
This means that the real part gives the velocity for pressure gradient P cos(ωt) and the imaginary part gives
the velocity for the pressure gradient P sin ωt.
From (12.1.10)-(12.1.12),
d2 V
1 dV
iωV ρ = P + µ + (12.1.13)
dr2 r dr
2
d V 1 dV iω P
⇒ + − ρV = − . (12.1.14)
dr2 r dr µ µ
d2 y 1 dy
2
+ − k2 y = 0 (12.1.15)
dx x dx
is
y = AJ0 (ikx) + BY0 (ikx), (12.1.16)
where both J0 (x) and Y0 (x) are Bessel functions of zero order and are of the first and second kind, respec-
tively.
Since v and V have to be finite on the axis (i.e., at r = 0) and Y0 (0) is not finite, B has to be zero. Also,
because of the no-slip condition v(r) = 0 when r = R, we have
r
3 ωρ P
AJ0 i 2 R + = 0, B = 0. (12.1.18)
µ ωρi
Let
ωρ 2 ωR2
α2 = R = (12.1.19)
µ ν
so that
P 1
A = i , (12.1.20)
ωρ J0 (i3/2 α
" #
P J0 (i3/2 αs)
V (r) = − i 1 − , (12.1.21)
ωρ J0 (i3/2 α
where
r
s= . (12.1.22)
R
74 UNIT 12.
ZR
Q = v2πr dr
0
Z1
2
= 2φR vs ds
0
1
Z1
2πP R4
Z
iωt 1
= − ie s ds − J0 (i3/2 αs) s ds
µα2 J0 (i3/2 α)
0 0
3/2α
iZ
πP R4
2 xJ0 (x)
= − ieiωt 1 − dx . (12.1.24)
µα2 J0 (i3/2 α) i3 α2
0
But Z
xJ0 (x) dx = xJ1 (x) (12.1.25)
so that
" #
πP R4 iωt 2i i3/2 αJ1 (i3/2 α)
Q = − ie 1−
µα2 J0 (i3/2 α) α2
" #
πR4 2J1 (i3/2 α)
= − 2 iP 1 − 3/2 eiωt
µα i αJ0 (i3/2 α)
πR4 P
= X(α)eiωt (say). (12.1.26)
µα2 i
Now the series expansion for J0 (x) and J1 (x) are given by
1
J0 (x) = 1 − x2 + . . . , (12.1.27)
2
x (x/2)3 (x/2)5
J1 (x) = − 2 + 2 2 − ... (12.1.28)
2 1 ·2 1 ·2 ·3
For small values of α,
3
i3/2 α i3/2 α
2 − + ... i3 α2
2 2/2
1− + ... iα2
X(α) = 1 − 2 =1− 8
i3 α2
= + O(α4 ) (12.1.29)
3/2 1− + ... 8
i3/2 1 − i 2 α + . . . 4
πR4 P
Q= + O(α ) eiωt .
2
(12.1.30)
8
12.2. BLOOD FLOW THROUGH ARTERY WITH MILD STENOSIS 75
The stenosis growth usually passes through three stages, as shown in Fig. 12.2. In stage I, there is no
separation of flow and there is no back flow. In stage II, the flow is laminar, but separation occurs and there is
back flow. In stage III, turbulence develops in a certain region of the down stream. We shall discuss here only
Stage I, called mild stenosis.
The development of stenosis in artery can have serious consequences and can disrupt the normal functioning
of the circulatory system. In particular, it may lead to
(i) increased resistance to flow, with possible severe reduction in blood flow;
(ii) increased danger of complete occlusion (obstruction);
(iii) abnormal cellular growth in the vicinity of the stenosis, which increases the intensity of the stenosis;
and
(iv) tissue damage leading to post-stenosis dilatation.
76 UNIT 12.
equations of motion in cylindrical polar coordinates, it can be shown that the radial velocity can be neglected
in relation to axial velocity v which is determined by
2
∂p ∂ v 1 ∂v
0 = − +µ + , (12.2.3)
∂z ∂r2 r ∂r
∂p
0 = − , (12.2.4)
∂r
or
µ ∂ ∂v
−P (z) = r . (12.2.5)
r ∂r ∂r
The no-slip condition on the stenosis surface gives
v = 0 at r = R(z), −z0 ≤ z ≤ z0 ,
(12.2.6)
v = 0 at r = R0 , |z| ≥ z0 .
Thus for a mild stenosis, the main difference from the usual Poiseuille flow is that the pressure gradient and
axial velocity are functions of z also. However, for a stenosis in stage II or stage III, the radial velocity can be
significant, and turbulence may have to be considered. Obviously then, the analysis is more complicated.
Since Q is constant for all section of the tube, the pressure gradient varies inversely as the fourth power of the
surface distance of the stenosis from the axis of the artery so that it (the pressure gradient) is minimum at the
middle of the stenosis and is maximum at the ends.
78 UNIT 12.
Unit 13
Course Structure
• Peristaltic Flows in Tubes and Channel
• Long-wavelength Approximation
(ii) celia transport through the ducts efferents of the male reproductive organ,
79
80 UNIT 13.
The wide occurrence of peristaltic motion should not be surprising since it results physiologically from neuro-
muscular properties of any tubular smooth muscle.
We now consider peristaltic motion in channels or tubes. The fluid involved may be non-Newtonian (e.g.,
power-law, viscoelastic, or micropolar fluid) or Newtonian, and the flow may take place in two layers (a core
layer and a peripheral layer). The equations of motion in their complete generality do nt admit of simple
solutions and we have to look for reasonable approximations. For this we first transform these equations in
terms of dimensionless variables.
where is the amplitude ratio, λ the wavelength, and c the phase velocity of the waves. Now using Eq. 9.10
of Unit 9, the stream function ψ(X, Y ) for the two-dimensional motion satisfies the equation
ν∇4 ψ = ∇2 ΨT + ΨY ∇2 ΨX − ΨX ∇2 ΨY , (13.1.2)
U = ΨY , V = −ΨX . (13.1.3)
Assuming that the walls have only transverse displacements at all times, we get the boundary conditions as
2πac 2π
U = 0, V = ± sin (X − cT ) at Y = ±η(X, T ). (13.1.4)
λ λ
X Y cT Ψ a ac
x= , y= , t= , ψ= , δ= , Re = (13.1.5)
λ a λ ac λ ν
so that (13.1.2) becomes
2
2
∂2 2 ∂2 2 ∂2 2 ∂2
1 2 ∂ 2 ∂ 2 ∂ 2 ∂
δ + ψ= δ + ψt + ψy δ + ψx − ψx δ + ψy .
δ Re ∂x2 ∂y 2 ∂x2 ∂y 2 ∂x2 ∂y 2 ∂x2 ∂y 2
(13.1.6)
The boundary conditions becomes
Thus the basic partial differential equations and the boundary consdition together involve three dimensionless
parameters:
(i) The Reynolds number, Re determined by the phase velocity, half the mean distance between the plates,
and the kinematic viscosity. (This number is small if the distance between the walls is small or the
phase velocity is small or the kinematic viscosity is large.)
13.1. PERISTALTIC FLOWS IN TUBES AND CHANNEL 81
(ii) The wave number δ which is small if the wavelength is large as compared to the distance between the
walls.
(iii) The amplitude ratio which is small if the amplitude of the wave is small as compared to the distance
between the walls.
In obtaining the equations for the stream funtion, the pressure gradient was eliminated. Hence there may
arise a fourth dimensionless parameter, depending on the pressure gradient. Non-Newtonian fluids give rise
to additional dimensionless parameters, depending on the parameters occurring in the constitutive equations
of the fluids.
It is not possible to solve (13.1.2) for arbitrary values of δ, , Re and, therefore, this equation is solved
under, among others, the following alternative sets of assumptions:
(i) << 1, and Stoke’s assumption of slow motion so that inertial terms can be neglected.
where a is the undisturbed radius of the tube and the amplitude ratio, a(1 + ) and a(1 − ) are the maximum
and minimum disturbed radii, and λ is the wave velocity and c the phase velocity (see Fig. 13.1). Under the
a ac
assumptions << 1 and << 1, we conduct an order of magnitude study of the various terms in the
λ ν
equation of continuity and equations of motion in cylindrical polar coordinates to find
∂p ∂p
<< (13.1.9)
∂R ∂Z
82 UNIT 13.
Now it is convenient to use the moving coordinate system (r, z) travelling with the wave so that
r = R, z = Z − ct. (13.1.11)
In this system, p is a function of z only. The equations of continuity and motion reduce respectively to
∂ ∂
(ru) + (rw) = 0, (13.1.12)
∂r ∂z
∂ 2 w 1 ∂w
dp µ ∂ ∂w
=µ + = r , (13.1.13)
dz ∂r2 r ∂r r ∂r ∂r
where u and w are the velocity components for the motion of the fluid in relation to the moving coordinate
system.
∂h
u= , w = −c at r = h. (13.1.14)
∂t
Integrating (13.1.13) at the constant z, we obatin
1 dp 2
w = −c − (h − r2 ). (13.1.15)
4µ dz
To an observer moving with velocity c in the axial direction, the pressure and flow appear stationary. Hence
the flow rate q measured in the moving coordinate system is a constant, independent of position and time.
Now
Zh
q = 2π rw dr. (13.1.16)
0
Zr
∂w
ru = − r dr. (13.1.20)
∂z
0
13.1. PERISTALTIC FLOWS IN TUBES AND CHANNEL 83
cr3 2qr3
dh 2qr
u=− − + . (13.1.21)
dz h3 πh3 πh5
We now revert to the stationary coordinate system with the coordinates R, Z, the velocity components U, W ,
and the flow rate Q so that
W = w + c, U = u, (13.1.22)
Z h
Q = 2π W R dR or Q = q + πch2 . (13.1.23)
0
Let Q denote the time average of Q over a complete time period T for h so that
λ
T = (13.1.24)
c
ZT
1 2 1 2
Q = Q dt = q + πca 1+ . (13.1.25)
T 2
0
Here h is determined as a function of Z and t from (13.1.26), and q is known from (13.1.25) after Q is deter-
mined experimentally.
To determine the pressure drop across a length equal to the wavelength λ, we integrate (13.1.18) to get
Zλ Zλ
8µq dz 8µc dz
(∆p)k = − 4 4 − 2 2
πa 2π πa 2π
1 + sin λ z 1 + sin λ z
0 0
Z2π
πca2
4µλ q
= − 2 4 + dr
π a [1 + sin τ ]4 [1 + sin τ ]2
0
2 + 32 2πca2
4µλ
= − 4 q + . (13.1.30)
πa (1 − 2 )7/2 (1 − 2 )3/2
84 UNIT 13.
a2 (1 − 2 )2
q = −2πc , (13.1.31)
2 + 32
and then from (13.1.25),
πa2 c(162 − 4 )
Q= . (13.1.32)
2(2 + 32 )
Substituting (13.1.31) in (13.1.28) and (13.1.29), we get
4ca2 (1 − 2 )2 R2
2πacR 2π 2
U = − cos (Z − ct) R + 1− 2 , (13.1.33)
λh3 λ 2 + 32 h
2a2 (1 − 2 )2 R2
W = 2c 1 − 2 1 − . (13.1.34)
h (2 + 32 ) h2
For every fixed z, we can draw the velocity profiles U/c and W/C in the special case (∆p)λ = 0. If
(∆p)λ 6= 0, then velocity profiles will depend also on q.
Unit 14
Course Structure
• Two Dimensional Flow in Renal Tubule
This reabsorption or seepage creates a radial component of the velocity in the cylindrical tubule, which
must be considered along with the axial component of the velocity (see Fig. 14.1). Due to loss of fluid from
the walls, both the radial and axial velocities decrease with z. Mathematically, we have to solve the problem
of flow of viscous fluid in circular cylinder when there are axial and radial components of velocity and the
radial velocity at all points on the surface of the cylinder is prescribed and is a decreasing function φ(z) of z.
85
86 UNIT 14.
is abut 10−1 cm/sec, since this is very much less than one, we neglect the inertial terms to get the following
equations of continuity and motion
1 ∂ ∂vz
(rvr ) + = 0, (14.1.1)
r ∂r ∂z
∂ 2 vr
1 ∂p ∂ 1 ∂
= (rvr ) + , (14.1.2)
µ ∂r ∂r r ∂r ∂z 2
∂ 2 vz
1 ∂p ∂ 1 ∂
= (rvz ) + , (14.1.3)
µ ∂z ∂r r ∂r ∂z 2
The boundary conditions are
∂vz
= 0, vr = 0, vz = finite at r = 0, (14.1.4)
∂r
vz = 0, vr = φ(z) at r = R, (14.1.5)
p = p0 at z = 0, (14.1.6)
p = pL at z = L. (14.1.7)
If
vr = f (r)g(z), (14.1.13)
then the form of (14.1.8) suggests that an analytical solution may be possible if
This suggests that we may get an analytical solution when the radial component of velocity on the surface of
the cylinder is given by
φ(z) = a0 + a1 z or φ(z) = ceγz . (14.1.16)
We shall give the solutions for a special cases in §14.1.3.
Ar4 Br2
F (r) = C + Dr2 + + ln r. (14.1.27)
8 2
From (14.1.23) and (14.1.27), we have
2 2
d 1 d d 1 d
− − G(r) + 2a1 F (r) = 0. (14.1.28)
dr2 r dr dr2 r dr
Using (14.1.24) and (14.1.25), we get
2
d 1 d
− G(r) + 2a1 F (r) = M r2 + N. (14.1.29)
dr2 r dr
Now from (14.1.24), (14.1.25), (14.1.18) and (14.1.19), we have
d 1 0
F (r) = 0 at r = 0, (14.1.30)
dr r
d 1 0
G (r) = 0 at r = 0, (14.1.31)
dr r
1
F (r) = 0 at r = 0, (14.1.32)
r
1 0 1 0
F (r) and G (r) are finite at r = 0, (14.1.33)
r r
F 0 (R) = 0, G0 (R) = 0, F (R) = R. (14.1.34)
C = 0, B = 0. (14.1.35)
d2 G 1 dG 2 r2 r4
− = M r + N − 4a1 + 2a1 . (14.1.38)
dr2 r dr R R3
Integrating (14.1.38), we obtain
M r4 N r2 ln r a1 r4 a1 r6
G(r) = M1 r2 + N1 + + − + . (14.1.39)
8 2 2 R 12 R3
From (14.1.33) and (14.1.39), we have
N = 0. (14.1.40)
From (14.1.34) and (14.1.39), we have
1 3a1 2
2M1 R + M R3 − R = 0. (14.1.41)
2 2
14.1. TWO DIMENSIONAL FLOW IN RENAL TUBULE 89
Equation 14.1.41 can determine only one of the two unknown constants M and M1 . To determine both of
them, we need one more relation. This relation can be found in terms of Q0 which is the total flux at z = 0.
Using (14.1.18) and (14.1.19), we get
ZR
Q(z) = 2πrvz (r, z) dr
0
ZR
4r3 4r M r3 2a1 3 a1 r5
1 2
= 2π − a0 z + a1 z − 2M1 r − − r + dr,
(14.1.42)
R3 R 2 2 R 2R3
0
Q0 M R 2 a1
∴ = − R, (14.1.43)
2πR2 8 3
8 Q0 a1 R
⇒M = + , (14.1.44)
R2 2πR2 3
Q0 a1 R
⇒ M1 = − 2+ . (14.1.45)
πR 12
From (14.1.39), (14.1.40), 14.1.44 and 14.1.45,
a1 R 4 a1 r 4 a1 r6
a1 R Q0 2 1 Q0
G(r) = − r + N 1 + + r − + . (14.1.46)
12 πR2 R2 2πR2 3 2 R 12 R3
The constant N1 need not to be determined since ψ(r, z) can always contain an arbitrary constant without
affecting the velocity components.
so that the decrease of flux is equal to the amount of the fluid coming out of the cylinder per unit length per
unit time. Integrating (14.1.49), we get
r2
2Q
vz = 1 − 2 . (14.1.52)
R πR2
Comparing (14.1.51) and (14.1.52), we find that there are two changes:
(ii) there is further distortion due to the varying nature of the radial flow.
∂p 8µr
= − (a0 + a1 z), (14.1.53)
∂r R2
4a1 µ r2
∂p 2Q(z) 1
= − + + . (14.1.54)
∂z R R2 a1 πR3 2
4µr2
p(r, z) = − (a0 + a1 z) + K(z). (14.1.55)
R3
Differentiating (14.1.55) partially with respect to z and then substituting ∂p/∂z in (14.1.54), we get
0 4a1 µ 1 2Q(z)
K (z) = − + (14.1.56)
R 3 a1 πR2
so that
4a1 µ 1 2zQ(z)
K(z) = − z+ + K0 , (14.1.57)
R 3 a1 πR3
where
Zz
Q(z) = Q(z) dz. (14.1.58)
0
r2
4µ 4a1 8Q
p(r, z) − p(0, 0) = − (a0 + a1 z) 2 − µ + z. (14.1.59)
R R 3R πR4
RR
p(r, z)2πr dr
0 2a0 8Q(z) 10a1
p(z) = = −µ + + z (14.1.60)
RR R πR4 3R
2πr dr
0
Course Structure
• Diffusion and Diffusion-Reaction Models
or
∂c ∂c ∂c
jx = −D , jy = −D , jz = −D . (15.1.2)
∂x ∂y ∂z
Here the quantities jx , jy , jz give respectively the amounts of the solute crossing the planes perpendicular to
x, y, z axes per unit area per unit time so that the dimensions of D are
M L−2 T −1
= L2 T −1 . (15.1.3)
M L−3 L−1
The negative signs in (15.1.1) and (15.1.2) indicate that the flow takes place in the direction of decresing con-
centration. D can vary with x, y, z but we shall take it to be constant. Its values for some common biological
solutes in water lie between 0.05 × 10−6 and 10 × 10−6 cm2 /sec.
91
92 UNIT 15.
Now, consider a volume V with surface S (see Fig. 15.1). The rate of change of the amount of the solute is
given by Z
∂
c(x, y, z, t) dx dy dz. (15.1.4)
∂t
V
The amount of the solute which comes out of the surface S per unit time is given by
Z
j · n̂ dS, (15.1.5)
S
where n̂ is the unit normal vector to the surface. If there is no source or sink inside the volume, then on using
(15.1.1), (15.1.4) and (15.1.5), and Gauss’ divergence theorem, we get
Z Z
∂
c(x, y, z, t) dx dy dz = − j · n̂ dS
∂t
V
Z S
= (D grad c) · n̂ dS
S
Z
= div (D grad c) dx dy dz (15.1.6)
V
so that Z
∂c
− div (D grad c) dx dy dz = 0. (15.1.7)
∂t
V
Since (15.1.7) holds for all volumes, we get Fick’s second law of diffusion as
∂c
= div (D grad c). (15.1.8)
∂t
Since D is assumed to be constant, we get the diffusion equation
2
∂2c ∂2c
∂c 2 ∂ c
= D div (grad c) = D∇ c = D + + . (15.1.9)
∂t ∂x2 ∂y 2 ∂z 2
The equation governing the temperature θ of a heat-conducting homogeneous solid is given by
2
∂2θ ∂2θ
∂θ ∂ θ
=k + + . (15.1.10)
∂t ∂x2 ∂y 2 ∂z 2
where k is called the thermal diffusivity of the solid. The diffusion equation is therefore also known as the
heat-conduction equation.
15.1. THE DIFFUSION EQUATION 93
∂c ∂2c
= D 2. (15.1.11)
∂t ∂x
By differentiating and substituting in (15.1.11), it can be easily verified that
x2
m
c = c(x, t) = exp − (15.1.12)
(4πDt)1/2 4Dt
c
so that m denotes the total amount of the diffusing solute. It is easily seen that is the density function
m c
for the normal probability distribution with mean zero and variance 2Dt. The graphs of m against x for
Dt = 4, 1, 1/4, 1/9 and 1/6 are given in Fig. 15.2.
The area under each of these curves is unity. As t → 0, the variance tends to zero and we get Dirac
delta-function δ(x) which vanishes everywhere except at x = 0 and is such that
Z∞ Z∞
δ(x) dx = 1, f (x)δ(x) dx = f (0). (15.1.14)
−∞ −∞
(x − ξ)2
1
c(x, t) = exp − . (15.1.16)
(4πDt)1/2 Dt
If the solute has an initial density distribution A(ξ) dξ, then the concentration of the solute at time t is given
by
Z∞
(x − ξ)2
1
c(x, t) = A(ξ) exp − dξ. (15.1.17)
(4πDt)1/2 Dt
0
Solution II
∂c
For obtaining the second solution of (15.1.11), if c(x, t) satisfies (15.1.11), then also satisfies it. Con-
∂x
versely, if (15.1.12) is a solution of (15.1.11), then
Zx Zη
x2
m m
exp − dx = √ exp[−η 2 ] dη, (15.1.18)
(4πDt)1/2 4Dt π
−∞ 0
where
x
η= , (15.1.19)
(4Dt)1/2
is also a solution of (15.1.11). If we define error function erf(z) and error function complement erfc(z) as
Zz
2
erf(z) = √ exp[−η 2 ] dη,
π
0
(15.1.20)
Z∞
2
ercf(z) = 1 − erf(z) = √ exp[−η 2 ] dη,
π
z
x
then we find that erfc is a solution of the one-dimensional diffusion equation. We may note that
(4Dt)1/2
Since both erf(z) and erfc(z) are tabulated functions, we have a convenient solution of the one dimensional
diffusion equation.
15.1. THE DIFFUSION EQUATION 95
Solution III
For solving the boundary value problem for which there is no flux at x = 0 and x = a, i.e., for solving
(15.1.11) subject to the boundary conditions
∂c
= 0 at x = 0, x = a, (15.1.22)
∂x
we use the method of separation of variables and try the solution of the form
which is only the average value of the initial concentration. This shows that, as t → ∞, the concentration
tends to become uniform and equal to the average value of the initial concentration. In fact, from (15.1.27),
Za Za
c(x, t) dx = C0 a = f (x) dx (15.1.32)
0 0
so that the total amount of the solute at any time t is equal to the initial total amount. This result is expected
since, according to boundary conditions (15.1.22), no solute enters or leaves the boundaries.
96 UNIT 15.
is XX
c(x, y, t) = Cλµ cos(λx + k ) cos(µx + m u) exp[−(λ2 + µ2 )Dt]. (15.1.34)
λ µ
∂c
= 0 when x = 0, a,
∂x
(15.1.35)
∂c
= 0 when y = 0, b,
∂y
we get
mπ nπ
λ = 0, µ = 0, λ= , µ= (15.1.36)
a b
so that
∞ X
∞ 2 2
n2 π 2
mπx nπx
X m π
c(x, y, t) = Cmn cos cos exp − + 2 Dt . (15.1.37)
a b a2 b
m=0 n=0
Zb Za
1
C00 = f (x, y) dy dx, (15.1.38)
ab
0 0
Zb Za
2 mπx
Cm0 = f (x, y) cos dy dx, (15.1.39)
ab a
0 0
Zb Za
2 mπy
C0n = f (x, y) cos dy dx, (15.1.40)
ab b
0 0
Zb Za
4 mπx nπy
Cmn = f (x, y) cos cos dy dx, (15.1.41)
ab a b
0 0
so that, as expected
Zb Za
1
lim c(x, y, t) = C00 = f (x, y) dy dx, (15.1.42)
t→∞ ab
0 0
Zb Za Zb Za
c(x, y, t) dy dx = abC00 = f (x, y) dy dx. (15.1.43)
0 0 0 0
15.1. THE DIFFUSION EQUATION 97
Solution II
For the axially-symmetric case, the diffusion equation in cylindrical polar coordinates is
2
∂2c
∂c ∂ c 1 ∂c
=D + + . (15.1.44)
∂t ∂r2 r ∂r ∂z 2
A solution independent of z is X
c(r, t) = Ak J0 (λr) exp(−λ2 Dt). (15.1.46)
λ
∂c
If the flux = 0 across the cylindrical boundary r = a, then
∂r
ξ
J1 (λa) = 0 or λ = , (15.1.47)
a
where ξ is a zero of the first order Bessel function. Hence
∞ 2
X r ξ
c(r, t) = Bn J0 ξn exp − n2 Dt , (15.1.48)
a a
n=1
where ξn is the n-th zero of J1 (x). The constants Bn are to be determined from
∞
X r
c(r, 0) = f (r) = Bn J0 ξn . (15.1.49)
a
n=1
J0 (λa) = 0 (15.1.50)
so that
∞
ηn2 Dt ηn r
X
c(r, t) = Dn exp − J0 , (15.1.51)
a a
n=1
where
Za
2 η r
n
Dn = 2 2 rf (r)J0 dr (15.1.52)
a J1 (ηn ) a
0
Course Structure
• Ecological Application of Diffusion Models
• Diffusion on the Stability of Single Species Model
• Diffusion on the Stability of Two Species Model
• Diffusion on the Stability of Prey-Predator Models
99
100 UNIT 16.
Now we will discuss the stabilities of the equilibrium states of these models.
dN
= f (N ). (16.2.1)
dt
Let the population be confined to the volume 0 ≤ x ≤ a, 0 ≤ y ≤ b, 0 ≤ z ≤ c, and let there be diffusion.
Let there be no flux across the faces of the rectangular parallelepiped so that (16.2.1) becomes
2
∂2N ∂2N
∂N ∂ N
= f (N ) + D + + . (16.2.2)
∂t ∂x2 ∂y 2 ∂z 2
∂N
= 0 at x = 0, a,
∂x
∂N
= 0 at y = 0, b, (16.2.3)
∂y
∂N
= 0 at z = 0, c.
∂z
If N gives an equilibrium value for (16.2.1), it also gives an equilibrium value for (16.2.2). Let
where u is sufficiently small so its squares and higher powers can be neglected. Then (16.2.2) gives
2
∂ u ∂2u ∂2u
∂u ∂f
=u +D + 2 + 2 . (16.2.5)
∂t ∂N ∂x2 ∂y ∂z
∂f ∂f
where denote the value of at the equilibrium point N . Now the boundary condition (16.2.3) becomes
∂N ∂N
∂u
= 0 at x = 0, a,
∂x
∂u
= 0 at y = 0, b, (16.2.6)
∂y
∂u
= 0 at z = 0, c.
∂z
For (16.2.5), we try the solution
XXX mπx nπy pπz
u(x, y, z, t) = eλt Amnp cos cos cos (16.2.7)
p n m
a b c
which automatically satisfies boundary conditions (16.2.6). Substituting (16.2.7) in (16.2.5), we get
2 2
n2 π 2 p2 π 2
∂f m π
λ− +D + 2 + =0 (16.2.8)
∂N ∂a2 b ∂c2
16.3. POSSIBILITY OF DIFFUSIVE INSTABILITY FOR TWO SPECIES 101
or
m2 n2 p2
∂f
λ= − Dσ 2 , where σ 2 = + + π2. (16.2.9)
∂N ∂a2 b2 ∂c2
∂f
If, in the absence of diffusion, the equilibrium position is unstable, then is negative, and so λ is also
∂N
negative. Therefore, a position of equilibrium, which is stable in the absence of diffusion remains stable
when there is diffusion in a finite domain with no flux across its surfaces. Thus there is no possibility of
diffusion-induced instability when there is only one single species.
If
N1 (x, y, z, t) = N1 + u1 (x, y, z, t),
(16.3.4)
N2 (x, y, z, t) = N2 + u2 (x, y, z, t),
then, after substituting (16.3.1) and (16.3.2) and linearlizing, we get
2
∂ 2 u1 ∂ 2 u1
∂u1 ∂f1 ∂f1 ∂ u1
= u1 + u2 + D1 + + , (16.3.5)
∂t ∂N1 ∂N2 ∂x2 ∂y 2 ∂z 2
2
∂ 2 u2 ∂ 2 u2
∂u2 ∂f2 ∂f2 ∂ u2
= u1 + u2 + D2 + + (16.3.6)
∂t ∂N1 ∂N2 ∂x2 ∂y 2 ∂z 2
∂fi ∂fi
where , i = 1, 2, denotes the value of at the equilibrium point N1 , N2 . When there is no flux, the
∂Ni ∂Ni
boundary conditions are
∂ui
= 0 at x = 0, a,
∂x
∂ui
= 0 at y = 0, b, (16.3.7)
∂y
∂ui
= 0 at z = 0, c.
∂z
where i = 1, 2. Trying the solution
XXX mπx nπy pπz
u1 = eλt amnp cos cos cos ,
p n m
a b c
XXX mπx nπy pπz (16.3.8)
u2 = eλt bmnp cos cos cos ,
p n m
a b c
102 UNIT 16.
we get
∂f1 ∂f1
λ− + D1 σ 2 −
∂N1 ∂N2
=0 (16.3.9)
∂f2 ∂f2
− λ− + D2 σ 2
∂N1 ∂N2
m2 n2 p2
2
where σ = + 2 + 2 π2.
a2 b c
or
2 2 ∂f1 ∂f2 ∂f1 ∂f2 ∂f1 ∂f2
λ + λ (D1 + D2 )σ − − + −
∂N1 ∂N2 ∂N1 ∂N2 ∂N2 ∂N1
∂f2 ∂f1
− σ 2 D1 + D2 + D1 D2 σ 4 = 0. (16.3.10)
∂N2 ∂N1
In the absence of diffusion, the equation corresponding to (16.3.10) is
2 ∂f1 ∂f2 ∂f1 ∂f2 ∂f1 ∂f2
λ −λ + + − = 0. (16.3.11)
∂N1 ∂N2 ∂N1 ∂N2 ∂N2 ∂N1
We assume that the equilibrium position (N1 , N2 ) is stable in the absence of diffusion so that
∂f1 ∂f2 ∂f1 ∂f2 ∂f1 ∂f2
+ < 0, − > 0. (16.3.12)
∂N1 ∂N2 ∂N1 ∂N2 ∂N2 ∂N1
Inequalities (16.3.12) show that the coefficient of λ in (16.3.10) is positive and the constant term in (16.3.10)
is also positive if
∂f2 ∂f1
D1 + D2 < 0. (16.3.13)
∂N2 ∂N1
Thus, if (16.3.13) is satisfied, the equilibrium position which is stable in the absence of diffusion remains stable
when there is diffusion. In particular, in view of the first inequality in (16.3.12), if the diffusion coefficients are
equal, diffusion fails to induce instability. Thus for diffusion-induced instability to occur, it is necessary that
D1 and D2 should be unequal; but his condition is obviously not sufficient. Even when inequality (16.3.13)
is reversed, the constant term in (16.3.10) may be (but need not to be) negative, and the equilibrium position
may be unstable when there is diffusion. A sufficient condition for diffusion-induced instability is
∂f1 ∂f2 ∂f1 ∂f2 4 2 ∂f2 ∂f1
− + D1 D2 σ − σ D1 + D2 <0 (16.3.14)
∂N1 ∂N2 ∂N2 ∂N1 ∂N2 ∂N1
for some integral values of m, n, p. We may note that the stable equilibrium remain stable in spite of diffusion
(16.3.13) is satisfied or if D1 = D2 or if
∂f1 ∂f2
< 0, < 0. (16.3.15)
∂N1 ∂N2
λ2 + λ[(D1 + D2 )σ 2 ] + a1 a2 + σ 4 D1 D2 = 0, (16.4.3)
2
λ + a1 a2 = 0, (16.4.4)
so that the equilibrium is neutral without diffusion and is neutral or stable with diffusion. Thus diffusion
may ‘increase’ stability; at least it does not ‘decrease stability’.
(ii) For the more general prey-predator model given by (16.3.1) and (16.3.2), we have
∂f1 ∂f2 ∂f1 ∂f2
≥ 0, ≤ 0, < 0, > 0. (16.4.5)
∂N1 ∂N2 ∂N2 ∂N1
Thus, if the equilibrium is stable without diffusion and is unstable with diffusion, we get
∂f2 ∂f1 ∂f1 ∂f2
≥ , D2 > D1 (16.4.6)
∂N2 ∂N1 ∂N1 ∂N2
which give D1 < D2 . Thus for diffusion-induced instability, it is necessary that the coefficient of
diffusion for prey should be less than the diffusion coefficient for predator. Again, this condition is not
sufficient.
(iii) Consider the model which, in the absence of diffusion, is given by
dN1
= N1 [f (N1 ) − N2 ],
dt (16.4.7)
dN2
= N2 [N1 − g(N2 )].
dt
Then we have
f (N1 ) = N2 , g(N2 ) = N2 ,
∂f1 ∂f1
= N1 f 0 (N1 ), = −N1 ,
∂N1 ∂N2 (16.4.8)
∂f2 ∂f2
= N2 , = −N2 g 0 (N2 ).
∂N1 ∂N2
Now, (16.3.10) and (16.3.11) gives
By the same reasoning as before, D1 < D2 . If f 0 (N1 ) > 0, g 0 (N2 ) > 0, we can find the values of
m, n, p so that the equilibrium with diffusion is unstable. However, if f 0 (N1 ) < 0, then g 0 (N2 ) > 0.
This is not possible, and the equilibrium continues to be stable.
References
8. F. Verhulust (1996): Nonlinear Differential Equations and Dynamical Systems, Springer Verlag.
10. Mark Kot (2001): Elements of Mathematical Ecology, Cambridge Univ. Press
105
POST GRADUATE DEGREE PROGRAMME (CBCS) IN
MATHEMATICS
SEMESTER IV
May, 2020
All rights reserved. No part of this work should be reproduced in any form without the permission in writing
form the Directorate of Open and Distance Learning, University of Kalynai.
Director’s Message
Satisfying the varied needs of distance learners, overcoming the obstacle of distance and reaching the un-
reached students are the threefold functions catered by Open and Distance Learning (ODL) systems. The
onus lies on writers, editors, production professionals and other personnel involved in the process to overcome
the challenges inherent to curriculum design and production of relevant Self Learning Materials (SLMs). At
the University of Kalyani a dedicated team under the able guidance of the Hon’ble Vice-Chancellor has in-
vested its best efforts, professionally and in keeping with the demands of Post Graduate CBCS Programmes
in Distance Mode to devise a self-sufficient curriculum for each course offered by the Directorate of Open and
Distance Learning (DODL), University of Kalyani.
Development of printed SLMs for students admitted to the DODL within a limited time to cater to the
academic requirements of the Course as per standards set by Distance Education Bureau of the University
Grants Commission, New Delhi, India under Open and Distance Mode UGC Regulations, 2017 had been our
endeavour. We are happy to have achieved our goal.
Utmost care and precision have been ensured in the development of the SLMs, making them useful to the
learners, besides avoiding errors as far as practicable. Further suggestions from the stakeholders in this would
be welcome.
During the production-process of the SLMs, the team continuously received positive stimulations and feed-
back from Professor (Dr.) Sankar Kumar Ghosh, Hon’ble Vice-Chancellor, University of Kalyani, who kindly
accorded directions, encouragements and suggestions, offered constructive criticism to develop it within
proper requirements. We gracefully, acknowledge his inspiration and guidance.
Sincere gratitude is due to the respective chairpersons as weel as each and every member of PGBOS
(DODL), University of Kalyani, Heartfelt thanks is also due to the Course Writers-faculty members at the
DODL, subject-experts serving at University Post Graduate departments and also to the authors and aca-
demicians whose academic contributions have enriched the SLMs. We humbly acknowledge their valuable
academic contributions. I would especially like to convey gratitude to all other University dignitaries and
personnel involved either at the conceptual or operational level of the DODL of University of Kalyani.
Their persistent and co-ordinated efforts have resulted in the compilation of comprehensive, learner-friendly,
flexible texts that meet the curriculum requirements of the Post Graduate Programme through Distance Mode.
Self Learning Materials (SLMs) have been published by the Directorate of Open and Distance Learning,
University of Kalyani, Kalyani-741235, West Bengal and all the copyright reserved for University of Kalyani.
No part of this work should be reproduced in any from without permission in writing from the appropriate
authority of the University of Kalyani.
All the Self Learning Materials are self writing and collected from e-book, journals and websites.
Director
University of Kalyani
Optional Paper
MATO 4.3
Marks : 100 (SEE : 80; IA : 20)
• Unit 1: The functions- M (r) and A(r). Hadamard theorem on the growth of log M (r)
• Unit 3: Dirichlet series, abscissa of convergence and abscissa of absolute convergence, their represen-
tations in terms of the coefficients of the Dirichlet series.
• Unit 4: The Riemann Zeta function, the product development and the zeros of the zeta functions.
• Unit 5: Entire functions, growth of an entire function, order and type and their representations in terms
of the Taylor coefficients.
• Unit 8: Canonical product, Borel’s first theorem. Borel’s second theorem (statement only), Hadamard’s
factorization theorem, Schottky’s theorem (no proof), Picard’s first theorem.
√
• Unit 9: Multiple-valued functions, Riemann surface for the functions z , log z.
• Unit 10: Analytic continuation, uniqueness, continuation by the method of power series
• Unit 11: Continuation by the method of natural boundary. Existence of singularity on the circle of
convergence.
• Unit 12: Functions element, germ and complete analytic functions. Monodromy theorem.
• Unit 13: Conformal transformations, Riemanns theorems for circle, Schwarz principle of symmetry
1 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The functions M (r) and A(r) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Hadamard’s theorem on the growth of log M (r) . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3.1 Analytical condition for convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 8
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Schwarz Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Borel-Caratheodory theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Open Mapping Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3 15
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Dirichlet Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2.1 Convergence of Dirichlet’s series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4 24
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2 Riemann Zeta Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3 The Product Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.4 Functional Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.4.1 Relationship with the Gamma Function . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.4.2 Theta Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.4.3 Functional equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.5 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5 31
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.2 Entire Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2.1 Order of an entire function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2.2 Type of an entire function of finite non-zero order . . . . . . . . . . . . . . . . . . . . 34
5.2.3 Order for sum and multiplications of entire functions . . . . . . . . . . . . . . . . . . 36
5.2.4 Order and coefficients in terms of Taylor’s Coefficients . . . . . . . . . . . . . . . . . 38
5.3 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
CONTENTS
6 41
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.2 Distribution of zeros of analytic functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.3 Distribution of zeros of entire functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
6.3.1 Convergence exponent of zeros of entire functions . . . . . . . . . . . . . . . . . . . 45
6.4 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
7 50
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
7.2 Infinite Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7.2.1 Infinite product of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
7.3 Factorization of Entire functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
7.4 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
8 60
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
8.2 Canonical Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
8.3 Hadamard’s Factorization theorem and results . . . . . . . . . . . . . . . . . . . . . . . . . . 63
8.4 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
9 68
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
9.2 Multiple-Valued Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
9.3 Argument as a function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
9.4 Branch Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.4.1 Multibranches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
9.4.2 Branch Cuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
9.5 Riemann Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
9.5.1 Square Root function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
9.5.2 Logarithm Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
9.6 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
10 78
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
10.2 Analytic Continuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
10.3 Analytic Continuation along a curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
10.4 Power Series Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
10.5 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
11 86
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
11.2 Continuation by method of natural boundary . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
11.3 Existence of singularities on the circle of convergence . . . . . . . . . . . . . . . . . . . . . . 87
11.4 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
12 92
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
12.2 Monodromy Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
12.3 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
13 99
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
13.2 Conformal Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
13.3 Conformal Equivalences and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
13.4 Möbius Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
13.5 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
14 110
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
14.2 Schwarz Principle of Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
14.3 Schwarz Christoffel formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
14.4 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
15 115
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
15.2 Normal Families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
15.3 Univalent Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
15.4 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
16 119
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
16.2 Area Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
16.3 Growth and Distortion Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
16.4 Few Probable Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Unit 1
Course Structure
• The functions M(r) and A(r).
1.1 Introduction
We have read about the Maximum and Minimum modulus theorems for a non-constant analytic function on a bounded set G. In this unit we introduce two new functions, viz., M(r) and A(r), and derive their properties. The main motive is to study the growth of an analytic function f. Since f is a complex function, its order of growth cannot be compared directly with that of real functions. So, to be able to measure the order of growth of such functions, we need to define suitable real functions associated with f and derive the desired results with respect to them.
Objectives
After reading this unit, you will be able to
• define the functions M(r) and A(r) and deduce their properties with the help of the maximum modulus theorem
1.2 The Functions M(r) and A(r)

We first recall the following results.

Theorem 1.2.1. If a function f is analytic in a bounded region G and continuous on its closure Ḡ, and M = max{|f(z)| : z ∈ ∂G}, where ∂G is the boundary of G, then |f(z)| < M in G, unless f is a constant function.

Corollary 1.2.1. Suppose that f is analytic in a bounded region G and continuous on Ḡ. Then each of Re f(z), −Re f(z), Im f(z) and −Im f(z) attains its maximum at some point on the boundary ∂G of G.

Proof. Let u(x, y) = Re f(z) and g(z) = e^{f(z)}. By the Maximum modulus theorem, |g(z)| = e^{u(x,y)} cannot assume its maximum value in G. Since e^u is maximized exactly when u is maximized, we obtain that u(x, y) cannot attain its maximum value in G. The other cases are proved similarly.

The minimum modulus theorem comes as a direct corollary of the above theorem:

Theorem 1.2.2. Let f be a non-constant analytic function in a bounded region G and continuous on Ḡ. If f(z) ≠ 0 inside ∂G, then |f(z)| must attain its minimum value on ∂G.
Example 1.2.2. Suppose that f and g are analytic on the closed unit disc |z| ≤ 1 such that
We wish to use the Maximum modulus theorem to find the maximum value of |f (z)| on |z| ≤ 1/3. To do
this, we proceed as follows. On |z| = 1, we have
and so, |g(z)| ≤ M for |z| ≤ 1. Now, for |z| ≤ 1/3, we have,
The hypothesis that G is bounded cannot be dropped, however, as the following example shows.
Example 1.2.3. Define f (z) = e−iz on G = {z : Im z > 0}. Then |f (z)| = 1 on the boundary ∂G =
{z : Im z = 0}, that is, the real axis. But, for z = x + iy ∈ G, we have,
|f (x + iy)| = ey → ∞ as y → +∞;
that is, f itself is not bounded, and the Maximum modulus theorem fails.
We will now define the terms M(r) and A(r) related to an analytic function f as follows.

Definition 1.2.1. Let f be a non-constant analytic function defined in |z| ≤ R. Then, for 0 ≤ r < R, we define
$$ M(r) = \max\{ |f(z)| : |z| = r \} \qquad\text{and}\qquad A(r) = \max\{ \operatorname{Re} f(z) : |z| = r \}. $$

Theorem 1.2.3. Let f be a non-constant analytic function defined in |z| ≤ R. Then, for 0 ≤ r < R, both M(r) and A(r) are increasing functions of r.
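As a quick illustration (our own sketch, not part of the original notes), the following Python code approximates M(r) and A(r) by sampling circles, for the sample choice f(z) = e^z; all names in the code are our own.

```python
import numpy as np

def M(f, r, samples=2000):
    # Approximate max_{|z|=r} |f(z)| by sampling the circle.
    theta = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    return np.abs(f(r * np.exp(1j * theta))).max()

def A(f, r, samples=2000):
    # Approximate max_{|z|=r} Re f(z) by sampling the circle.
    theta = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    return np.real(f(r * np.exp(1j * theta))).max()

f = np.exp
for r in (0.5, 1.0, 2.0, 4.0):
    print(f"r = {r}:  M(r) = {M(f, r):.4f}   A(r) = {A(f, r):.4f}")
# Both columns increase with r, as Theorem 1.2.3 asserts.
```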
1.3 Hadamard's Theorem on the Growth of log M(r)
Geometrically, f(x) is said to be convex downwards, or simply convex, in [a, b] if the curve y = f(x) between any two points x_1 and x_2 in [a, b] always lies below the chord joining the points (x_1, f(x_1)) and (x_2, f(x_2)).
Figure 1.1
Theorem 1.3.1. (Hadamard's three-circles theorem) Let f be analytic on the closed annulus 0 < r_1 ≤ |z| ≤ r_3 (see Figure 1.2). If r_1 < r_2 < r_3, then
$$ M(r_2)^{\log(r_3/r_1)} \le M(r_1)^{\log(r_3/r_2)} \, M(r_3)^{\log(r_2/r_1)}. $$
Proof. Let φ(z) = z^λ f(z), where λ is a real constant to be chosen later. If λ is not an integer, φ(z) is multi-valued in r_1 ≤ |z| ≤ r_3. So we cut the annulus along the negative part of the real axis, obtaining a simply connected region G in which the principal branch of φ is analytic.

The maximum modulus of this branch of φ in G, that is, in the cut annulus, is attained on the boundary of G. Since λ is real, all the branches of φ have the same modulus. By considering another branch of φ, analytic in another cut annulus obtained by using a different cut, it is clear that the principal branch of φ must attain its maximum modulus on at least one of the bounding circles of the annulus. Thus, |φ(z)| ≤ max{r_1^λ M(r_1), r_3^λ M(r_3)}. Hence, on |z| = r_2, we have
$$ r_2^{\lambda} M(r_2) \le \max\{\, r_1^{\lambda} M(r_1),\; r_3^{\lambda} M(r_3) \,\}. \tag{1.3.2} $$

Figure 1.2

We now choose the real number λ so that the two quantities on the right of (1.3.2) are equal, that is, r_1^λ M(r_1) = r_3^λ M(r_3), which gives
$$ \lambda = \frac{\log\big( M(r_1)/M(r_3) \big)}{\log(r_3/r_1)}. $$
With this λ, we get from (1.3.2), using the fact that a^{\log b} = b^{\log a} holds for all positive real numbers a and b,
$$ M(r_2) \le \left( \frac{r_1}{r_2} \right)^{\lambda} M(r_1). $$
Raising both sides to the power \log(r_3/r_1) and noting that \lambda \log(r_3/r_1) = \log\big( M(r_1)/M(r_3) \big), we get
$$ M(r_2)^{\log(r_3/r_1)} \le \left( \frac{r_1}{r_2} \right)^{\log( M(r_1)/M(r_3) )} M(r_1)^{\log(r_3/r_1)} = \left( \frac{M(r_1)}{M(r_3)} \right)^{\log(r_1/r_2)} M(r_1)^{\log(r_3/r_1)}, $$
so that
$$ M(r_2)^{\log(r_3/r_1)} \le M(r_1)^{\log(r_3/r_1) + \log(r_1/r_2)} \, M(r_3)^{\log(r_2/r_1)} = M(r_1)^{\log(r_3/r_2)} \, M(r_3)^{\log(r_2/r_1)}. $$
Note 1.3.1. Equality in the above theorem is achieved when φ(z) is constant, that is, when f(z) is of the form f(z) = k z^λ for some real λ, k being a constant.
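The three-circles inequality is equivalent to the convexity of log M(r) as a function of log r, which is easy to test numerically. The sketch below is our own illustration; the sample function and the three radii are arbitrary choices.

```python
import numpy as np

def M(f, r, samples=4000):
    # Approximate max_{|z|=r} |f(z)| by sampling the circle.
    theta = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    return np.abs(f(r * np.exp(1j * theta))).max()

f = lambda z: np.exp(z) + z**2        # analytic on every annulus
r1, r2, r3 = 0.5, 1.3, 3.0
lhs = np.log(r3 / r1) * np.log(M(f, r2))
rhs = np.log(r3 / r2) * np.log(M(f, r1)) + np.log(r2 / r1) * np.log(M(f, r3))
print(lhs <= rhs, lhs, rhs)           # True: log M(r) is convex in log r
```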
Theorem 1.3.2. Let f(z) = \sum_{n=0}^{\infty} a_n z^n be analytic in |z| ≤ r. Then, for n ≥ 1,
$$ |a_n| r^n \le \max\{ 4A(r),\, 0 \} - 2\operatorname{Re} f(0). $$
Proof. Let z = r e^{iθ}, f(z) = \sum_{n=0}^{\infty} a_n z^n = u(r,θ) + i v(r,θ) and a_n = α_n + i β_n. Thus,
$$ u(r,\theta) + i v(r,\theta) = \sum_{n=0}^{\infty} (\alpha_n + i\beta_n) r^n e^{in\theta} = \sum_{n=0}^{\infty} (\alpha_n + i\beta_n) r^n (\cos n\theta + i \sin n\theta) = \sum_{n=0}^{\infty} r^n \big\{ (\alpha_n \cos n\theta - \beta_n \sin n\theta) + i (\alpha_n \sin n\theta + \beta_n \cos n\theta) \big\}, $$
so that, equating real parts,
$$ u(r,\theta) = \sum_{n=0}^{\infty} r^n (\alpha_n \cos n\theta - \beta_n \sin n\theta). \tag{1.3.3} $$
Hence,
$$ \int_0^{2\pi} u(r,\theta)\, d\theta = 2\pi \alpha_0 \quad\Rightarrow\quad \alpha_0 = \frac{1}{2\pi} \int_0^{2\pi} u(r,\theta)\, d\theta. $$
Also, multiplying (1.3.3) by \cos n\theta,
$$ u(r,\theta)\cos n\theta = \alpha_0 \cos n\theta + (\alpha_1 \cos\theta\cos n\theta - \beta_1 \sin\theta\cos n\theta)\, r + \cdots + (\alpha_n \cos^2 n\theta - \beta_n \sin n\theta \cos n\theta)\, r^n + \cdots, $$
so that
$$ \int_0^{2\pi} u(r,\theta) \cos n\theta \, d\theta = \alpha_n r^n \int_0^{2\pi} \cos^2 n\theta \, d\theta = \pi \alpha_n r^n. $$
Hence, for n > 0,
$$ \alpha_n r^n = \frac{1}{\pi} \int_0^{2\pi} u(r,\theta) \cos n\theta \, d\theta. $$
Similarly, multiplying (1.3.3) by \sin n\theta and integrating term by term from 0 to 2π, we get
$$ -\beta_n r^n = \frac{1}{\pi} \int_0^{2\pi} u(r,\theta) \sin n\theta \, d\theta, \qquad n > 0. $$
Hence,
$$ a_n r^n = (\alpha_n + i\beta_n) r^n = \frac{1}{\pi} \int_0^{2\pi} u(r,\theta) \cos n\theta \, d\theta - \frac{i}{\pi} \int_0^{2\pi} u(r,\theta) \sin n\theta \, d\theta = \frac{1}{\pi} \int_0^{2\pi} u(r,\theta)\, e^{-in\theta} \, d\theta, \qquad n > 0. $$
Thus,
$$ |a_n| r^n \le \frac{1}{\pi} \int_0^{2\pi} |u(r,\theta)| \, d\theta, \qquad n > 0, $$
and hence
$$ |a_n| r^n + 2\alpha_0 \le \frac{1}{\pi} \int_0^{2\pi} \big\{ |u(r,\theta)| + u(r,\theta) \big\} \, d\theta. \tag{1.3.4} $$
Now, |u(r,θ)| + u(r,θ) = 0 when u(r,θ) < 0. Hence, if A(r) < 0, the right-hand side of (1.3.4) is 0. Again, if A(r) ≥ 0, the right-hand side of (1.3.4) does not exceed
$$ \frac{1}{\pi} \int_0^{2\pi} 2A(r) \, d\theta = 4A(r). $$
Thus, in either case,
$$ |a_n| r^n \le \max\{ 4A(r),\, 0 \} - 2\operatorname{Re} f(0), $$
which proves the theorem.

1.4 Few Probable Questions
1. State the Maximum modulus theorem. Show that M (r) is an increasing function of r.
2. Show that for any non-constant analytic function f defined on a bounded region G and continuous on G,
Re f (z) attains its maximum at some point on the boundary ∂G of G. Show that A(r) is an increasing
function of r.
Unit 2

Course Structure
• Schwarz Lemma, Open mapping theorem.
• Borel-Caratheodory inequality
2.1 Introduction
Let us denote D = {z ∈ C : |z| < 1}. In this unit, we start with a simple but classical theorem of complex analysis, Schwarz's Lemma, which states that if f is analytic and satisfies |f(z)| < 1 in D and f(0) = 0, then |f(z)| ≤ |z| for each z ∈ D, with equality if and only if f has the form f(z) = e^{iα} z for some α ∈ R. Furthermore, |f′(0)| ≤ 1, with equality if and only if f has the form stated previously. This result plays an important role in the proof of the Riemann mapping theorem.
This unit also deals with deducing the Open mapping theorem and the Borel-Caratheodory theorem.
Objectives
After reading this unit, you will be able to
• deduce the Schwarz lemma and its various variants, and apply it to various problems

• deduce the Borel–Carathéodory inequality from the Schwarz lemma and discuss some of its consequences
2.2 Schwarz Lemma

Theorem 2.2.1. (Schwarz Lemma) Let f : D → D be analytic, having a zero of order n at the origin. Then

1. |f(z)| ≤ |z|^n for all z ∈ D;

2. |f^{(n)}(0)| ≤ n!,

and equality holds in 1 at some point 0 ≠ z_0 ∈ D, or in 2, if and only if f(z) = c z^n with |c| = 1.
Proof. Let f : D → D be analytic on D with a zero of order n at the origin. Then we have f(0) = f′(0) = ⋯ = f^{(n−1)}(0) = 0, so we can write
$$ f(z) = \sum_{k=n}^{\infty} a_k z^k = z^n g(z), \qquad z \in D, $$
where
$$ a_k = \frac{f^{(k)}(0)}{k!} \qquad\text{and}\qquad g(z) = \sum_{k=n}^{\infty} a_k z^{k-n}. $$
The function g(z) = f(z)/z^n has a removable singularity at the origin, so that if we set g(0) = a_n, then g is analytic in D \ {0} and continuous in D. Since, by Cauchy's theorem for a disc, \int_C g(z)\, dz = 0 for all closed contours C inside D, Morera's theorem shows that g is analytic on D.
We claim that |g(z)| ≤ 1 for all z ∈ D. For 0 < r < 1, g is analytic on the bounded domain D_r = {z : |z| < r} and continuous on the closure of D_r, so the maximum modulus theorem is applicable. As |f(z)| ≤ 1 for all z ∈ D, it follows that for |z| = r,
$$ |g(z)| = \frac{|f(z)|}{|z|^n} \le \frac{1}{r^n}. $$
By the Maximum modulus theorem, |g(z)| ≤ r^{−n} for all z with |z| ≤ r. Since r is arbitrary, letting r → 1 we find that
$$ |g(z)| \le 1 \quad\text{for all } z \in D, \tag{2.2.1} $$
and this implies that |f(z)| ≤ |z|^n for all z ∈ D.
If equality holds in 1 at some point z_0 ∈ D \ {0}, then |g(z_0)| = 1, so g attains its maximum modulus at an interior point z_0. Consequently, by the Maximum modulus theorem, g must reduce to a constant, say c. Then f(z) = c z^n, where |c| = 1.

Also, note that |g(z)| ≤ 1 throughout the disc D. Since |a_n| = |g(0)|, we get by equation (2.2.1) that |g(0)| ≤ 1, so |f^{(n)}(0)|/n! ≤ 1, and hence 2 follows.

Again, if |f^{(n)}(0)| = n!, then |g(0)| = 1, showing that g attains its maximum modulus 1 at the interior point 0. So g is a constant function of absolute value 1 and, as before, it follows that f(z) = c z^n with |c| = 1.
Remark 2.2.1. Note that the case n = 1 of the previous theorem is the original Schwarz lemma stated at the beginning of this unit.

For example, if f is an analytic function on D with |f(z)| ≤ 1 and f(0) = 0, then what kind of function is f if f(1/2) = 1/2? It must be none other than the identity function, since equality in the preceding theorem holds with n = 1 at the point z_0 = 1/2 ∈ D.
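A small numerical sanity check (ours, not from the text): the function f(z) = z²(z + 3)/4 maps D into itself, since |f(z)| ≤ |z|² · 4/4 on D, and has a double zero at the origin, so the theorem predicts |f(z)| ≤ |z|² throughout D.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda z: z**2 * (z + 3) / 4      # double zero at 0, maps D into D

# Random sample points in the open unit disc.
z = rng.uniform(-1, 1, 5000) + 1j * rng.uniform(-1, 1, 5000)
z = z[np.abs(z) < 1]
assert np.all(np.abs(f(z)) <= np.abs(z)**2 + 1e-12)
print("Schwarz bound |f(z)| <= |z|^2 verified at", z.size, "points")
```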
Corollary 2.2.1. If f is analytic and satisfies |f(z)| ≤ M in B(a; R) and f(a) = 0, then

1. |f(z)| ≤ M|z − a|/R for all z ∈ B(a; R);

2. |f′(a)| ≤ M/R,

with equality if and only if f has the form f(z) = cM(z − a)/R for some constant c with |c| = 1.

Proof. Apply the Schwarz lemma to g(z) = f(Rz + a)/M, |z| < 1.

Corollary 2.2.2. Suppose f is analytic, satisfies |f(z)| ≤ M in B(a; R), and a is a zero of f of order n. Then |f(z)| ≤ M|z − a|^n/R^n for all z ∈ B(a; R), with equality if and only if f has the form f(z) = cM(z − a)^n/R^n for some constant c with |c| = 1.

Proof. Left as an exercise: prove the corollary independently, without using the Schwarz lemma.
Does the Schwarz lemma hold for real-valued functions? Consider the function
$$ u(x) = \frac{2x}{x^2 + 1}. $$
Then u is infinitely differentiable on R; in particular, u′(x) is continuous on [−1, 1], u(0) = 0 and |u(x)| ≤ 1. But |u(x)| > |x| for 0 < |x| < 1.
Example 2.2.1. Let ω = e^{2πi/n} be an nth root of unity, where n ∈ N is fixed. Suppose that f : D → D is analytic and f(0) = 0. We wish to apply the Schwarz lemma to show that
$$ \frac{1}{n} \left| \sum_{k=0}^{n-1} f(\omega^k z) \right| \le |z|^n \quad\text{for all } z \in D, \tag{2.2.2} $$
with equality at some point 0 ≠ z_0 ∈ D if and only if f(z) = c z^n with |c| = 1. To do this, we define F : D → D by
$$ F(z) = \frac{1}{n} \sum_{k=0}^{n-1} f(\omega^k z), $$
so that (as ω^n = 1), for m = 1, 2, …, n − 1,
$$ F^{(m)}(0) = \frac{1}{n} \sum_{k=0}^{n-1} (\omega^k)^m f^{(m)}(0) = \frac{f^{(m)}(0)}{n} \cdot \frac{1 - (\omega^m)^n}{1 - \omega^m} = 0. $$
Thus F has a zero of order at least n at the origin. By the Schwarz lemma, it follows that |F(z)| ≤ |z|^n for all z ∈ D, which is the same as (2.2.2). Equality in this inequality at some point z_0 ≠ 0 occurs if and only if F(z) = c z^n with |c| = 1, or equivalently,
$$ \sum_{k=0}^{n-1} \big[ f(\omega^k z) - c z^n \big] = 0. \tag{2.2.3} $$
We claim that the above equation implies that f(z) = c z^n. If we let f(z) = \sum_{m=1}^{\infty} a_m z^m, then (2.2.3) becomes
$$ \sum_{m=1}^{\infty} a_m \left( \sum_{k=0}^{n-1} \omega^{km} \right) z^m = n c z^n. $$
In view of the identity
$$ \sum_{k=0}^{n-1} \omega^{km} = \begin{cases} n & \text{if } m \text{ is a multiple of } n, \\ 0 & \text{otherwise}, \end{cases} $$
the last equation implies that a_n = c and a_{2n} = a_{3n} = ⋯ = 0. On the other hand, as |f(z)| ≤ 1 on D and |a_n| = 1, we have
$$ \lim_{r \to 1^-} \frac{1}{2\pi} \int_0^{2\pi} \big| f(r e^{i\theta}) \big|^2 \, d\theta = \sum_{m=1}^{\infty} |a_m|^2 \le 1, $$
which shows that all the Taylor coefficients of f except a_n must vanish, and so f(z) = c z^n with |c| = 1.
Theorem 2.2.2. (Borel–Carathéodory theorem) Let f be analytic on D : |z| ≤ R, and let M(r) and A(r) be as defined in the previous unit. Then, for 0 < r < R,
$$ M(r) \le \frac{2r}{R-r}\, A(R) + \frac{R+r}{R-r}\, |f(0)|. $$
Proof. We consider the following cases.

Case I. f(z) is a constant, say a + ib, with a and b real. Then M(r) = |f(0)| = \sqrt{a^2 + b^2} and A(R) = a. Now,
$$ \frac{2r}{R-r}\, A(R) + \frac{R+r}{R-r}\, |f(0)| - M(r) = \frac{2r}{R-r}\, a + \left( \frac{R+r}{R-r} - 1 \right) \sqrt{a^2 + b^2} = \frac{2r}{R-r} \left( a + \sqrt{a^2 + b^2} \right) \ge 0. $$
Hence,
$$ M(r) \le \frac{2r}{R-r}\, A(R) + \frac{R+r}{R-r}\, |f(0)|. $$
Case II. f(z) is non-constant and f(0) = 0. Then A(R) > A(0) = 0, since A(r) is an increasing function of r. Let
$$ g(z) = \frac{f(z)}{2A(R) - f(z)}. \tag{2.2.4} $$
Here 2A(R) − f(z) ≠ 0 for all z ∈ D, since the real part of 2A(R) − f(z) does not vanish in D, and g(0) = 0. Let f(z) = u + iv. Then, since u ≤ A(R) in D,
$$ |g(z)|^2 = \frac{u^2 + v^2}{(2A(R) - u)^2 + v^2} \le 1, \qquad z \in D. $$
As g(0) = 0, the Schwarz lemma (applied to g(Rz)) gives |g(z)| ≤ |z|/R. Solving (2.2.4) for f gives f(z) = 2A(R)\, g(z)/(1 + g(z)). Hence, for |z| = r < R,
$$ |f(z)| = \left| \frac{2A(R)\, g(z)}{1 + g(z)} \right| \le \frac{2A(R)\, |g(z)|}{1 - |g(z)|} \le \frac{2A(R)\, \frac{r}{R}}{1 - \frac{r}{R}} = \frac{2r}{R-r}\, A(R). $$
Thus M(r) ≤ \frac{2r}{R-r} A(R), and since f(0) = 0,
$$ M(r) \le \frac{2r}{R-r}\, A(R) + \frac{R+r}{R-r}\, |f(0)|. $$
Case III. f(z) is non-constant and f(0) ≠ 0. Let h(z) = f(z) − f(0). Then h(0) = 0, so by Case II we have
$$ \max\{ |h(z)| : |z| = r \} \le \frac{2r}{R-r} \max\{ \operatorname{Re} h(z) : |z| = R \}. \tag{2.2.5} $$
Now, Re h(z) = Re f(z) − Re f(0) ≤ A(R) + |f(0)| on |z| = R, and |f(z)| ≤ |h(z)| + |f(0)|, so that M(r) ≤ \frac{2r}{R-r}\big( A(R) + |f(0)| \big) + |f(0)|, and hence
$$ M(r) \le \frac{2r}{R-r}\, A(R) + \frac{R+r}{R-r}\, |f(0)| \le \frac{R+r}{R-r} \big( A(R) + |f(0)| \big), $$
since \frac{2r}{R-r} < \frac{R+r}{R-r} (as R + r > 2r).
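The inequality is easy to test numerically. The sketch below is our own; the sample function, R and r are arbitrary choices.

```python
import numpy as np

def circle_samples(r, n=4000):
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return r * np.exp(1j * theta)

f = lambda z: np.exp(z) - 3j          # f(0) = 1 - 3i
R = 2.0
A_R = np.real(f(circle_samples(R))).max()        # A(R)
for r in (0.5, 1.0, 1.5):
    M_r = np.abs(f(circle_samples(r))).max()     # M(r)
    bound = 2*r/(R - r) * A_R + (R + r)/(R - r) * abs(f(0))
    print(f"r = {r}:  M(r) = {M_r:.4f}  <=  bound = {bound:.4f}")
```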
Corollary 2.2.3. Let f be analytic on |z| ≤ R. Then, for 0 < r < R and n ≥ 1,
$$ \max\{ |f^{(n)}(z)| : |z| = r \} \le \frac{2^{n+2}\, n!\, R}{(R-r)^{n+1}} \big( A(R) + |f(0)| \big). $$
Proof. By Cauchy's integral formula for derivatives, we have
$$ f^{(n)}(z) = \frac{n!}{2\pi i} \int_{\gamma} \frac{f(t)\, dt}{(t-z)^{n+1}}, \tag{2.2.6} $$
where γ : |t − z| = δ = (R − r)/2 and |z| = r. On γ,
$$ |t| = |t - z + z| \le |t - z| + |z| = \frac{1}{2}(R - r) + r = \frac{1}{2}(R + r) < R, $$
which ensures that γ lies within |z| = R. By the Borel–Carathéodory theorem,
$$ \max_{t \in \gamma} |f(t)| \le \frac{R + \frac{1}{2}(R+r)}{R - \frac{1}{2}(R+r)} \big( A(R) + |f(0)| \big) = \frac{3R + r}{R - r} \big( A(R) + |f(0)| \big) < \frac{4R}{R-r} \big( A(R) + |f(0)| \big). $$
Hence, from (2.2.6),
$$ |f^{(n)}(z)| \le \frac{n!}{2\pi \delta^{n+1}} \cdot \frac{4R}{R-r} \big( A(R) + |f(0)| \big) \cdot 2\pi\delta = \frac{n!}{\delta^n} \cdot \frac{4R}{R-r} \big( A(R) + |f(0)| \big) = \frac{2^{n+2}\, n!\, R}{(R-r)^{n+1}} \big( A(R) + |f(0)| \big). $$
Hence,
$$ \max\{ |f^{(n)}(z)| : |z| = r \} \le \frac{2^{n+2}\, n!\, R}{(R-r)^{n+1}} \big( A(R) + |f(0)| \big). $$
Exercise 2.2.1. If f is an analytic function defined on D such that |f (z)| < 1 for all z in D and f fixes two
distinct points of D, then show that f is the identity function.
2.3 Open Mapping Theorem

Consider the functions f, g, h : R → R given by
$$ f(x) = x^2, \qquad g(x) = \sin x, \qquad h(x) = \frac{e^x + e^{-x}}{2}, $$
respectively. Clearly,
$$ f(\mathbb{R}) = [0, \infty), \qquad g(\mathbb{R}) = [-1, 1], \qquad h(\mathbb{R}) = [1, \infty), $$
showing that none of f, g and h is an open mapping, although each is an infinitely differentiable, non-constant real-valued function defined on the real line. Thus the above examples show that the following theorem does not hold over the real line R. Let us now state the open mapping theorem.
Theorem 2.3.1. Let G be a region and suppose that f is a non-constant analytic function on G. Then for any
open set U in G, f (U ) is open.
Proof. Let U ⊂ G be open. To show that f(U) is open, we show that for each a ∈ U there exists δ > 0 such that the open ball B(f(a); δ) ⊂ f(U). Let φ(z) = f(z) − f(a). Then a is a zero of φ. Since the zeros of a non-constant analytic function are isolated, there exists an open ball B(a; r) with B(a; r) ⊂ U such that φ(z) ≠ 0 in 0 < |z − a| < r. In particular, φ(α) ≠ 0 for α ∈ ∂B(a; ρ), where ρ < r. Let
$$ 2\delta = \min\{ |\varphi(\alpha)| : \alpha \in \partial B(a;\rho) \}. $$
Then δ > 0. Now, for any w ∈ B(f(a); δ) and any α ∈ ∂B(a; ρ), we have
$$ |f(\alpha) - w| \ge |\varphi(\alpha)| - |f(a) - w| > 2\delta - \delta = \delta > |f(a) - w|. \tag{2.3.1} $$
Let F(z) = f(z) − w. Then F has a zero in B(a; ρ). For, if F(z) ≠ 0 in B(a; ρ), there exists a neighbourhood N(a) of a, containing B(a; ρ) and lying in G, such that F(z) ≠ 0 in N(a). Then 1/F(z) is analytic in N(a), and by the maximum modulus theorem,
$$ \left| \frac{1}{F(a)} \right| < \max\left\{ \left| \frac{1}{F(\alpha)} \right| : \alpha \in \partial B(a;\rho) \right\} = \frac{1}{\min\{ |F(\alpha)| : \alpha \in \partial B(a;\rho) \}}, $$
that is,
$$ \min\{ |f(\alpha) - w| : \alpha \in \partial B(a;\rho) \} < |f(a) - w|, $$
which contradicts (2.3.1). Hence there exists z_0 ∈ B(a; ρ) such that f(z_0) = w. Since w is an arbitrary point of B(f(a); δ), it follows that B(f(a); δ) ⊂ f(U), and hence the theorem.
Unit 3

Course Structure
• Dirichlet series, abscissa of convergence and abscissa of absolute convergence,
3.1 Introduction
In mathematics, a Dirichlet series is any series of the form
$$ f(s) = \sum_{n=1}^{\infty} \frac{a_n}{n^s}, $$
where s is a complex number and (a_n) is a sequence of complex numbers. It is a special case of the general Dirichlet series.
Dirichlet’ s series were, as their name implies, first introduced into analysis by Dirichlet, primarily with a
view to applications in the theory of numbers. A number of important theorems concerning them were proved
by Dedekind, and incorporated by him in his later editions of Dirichlet’s Vorlesungen uber Zahlentheorle.
Dirichlet and Dedekind, however, considered only real values of the variable s. The first theorems involving
complex values of s are due to Jensen, who determined the nature of the region of convergence of the general
series; and the first attempt to construct a systematic theory of the function f (s) was made by Cahent in a
memoir which, although much of the analysis which it contains is open to serious criticism, has served and
possibly just for that reason as the starting point of most of the later researches in the subject. We will however,
not go into a very vigorous treatment of the subject. We will mainly concern ourselves with the preliminaries
of Dirichlet series and gain some idea about their convergence.
Objectives
After reading this unit, you will be able to
• define certain terms related to the convergence of Dirichlet’s series and deduce certain properties
3.2 Dirichlet Series

Consider a series of the form
$$ f(s) = \sum_{n=1}^{\infty} a_n e^{-\lambda_n s}, \tag{3.2.1} $$
where {λ_n} is an increasing sequence of real numbers whose limit is infinity, and s = σ + it is a complex variable with real part σ and imaginary part t. Such a series is called a Dirichlet series of type λ_n. If λ_n = n, then (3.2.1) is a power series in e^{−s}. If λ_n = log n, then (3.2.1) becomes
$$ f(s) = \sum_{n=1}^{\infty} a_n n^{-s}, \tag{3.2.2} $$
which is called an ordinary Dirichlet series. In this unit, we will mainly deal with ordinary Dirichlet series.

It is clear that all but a finite number of the numbers λ_n must be positive. It is often convenient to suppose that they are all positive, or at any rate that λ_1 ≥ 0. Sometimes an additional assumption is needed, such as the Bohr condition, namely λ_{n+1} − λ_n ≥ c/n for some c > 0.
We will look into certain examples of Dirichlet’s series now.
Example 3.2.1. A very important example is the Riemann zeta function
$$ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. $$
For t = 0, that is, s = σ ∈ R, it is proved in elementary calculus that ζ(σ) diverges for σ = 1 and converges absolutely for σ > 1 (the "p-test", with p = σ). We will learn more about this function in the next unit.
Example 3.2.2. Another standard example is the alternating series
$$ \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^s}, $$
again when s = σ ∈ R, known as the Euler–Dedekind function. It is proved in elementary calculus that this series converges for σ > 0, where the convergence is conditional for 0 < σ ≤ 1 and absolute for 1 < σ. In this section we shall prove that very similar results hold, with appropriate hypotheses on the coefficients a_n, for s ∈ C, that is, dropping the condition t = 0.
3. for any r < R, the series converges uniformly and absolutely in {|z| ≤ r}, and the sum is bounded on this set;
Note 3.2.1. Setting α = log m, β = log n in the above lemma, with 0 < m < n and σ > 0, we get
$$ |m^{-s} - n^{-s}| \le \frac{|s|}{\sigma} \left( m^{-\sigma} - n^{-\sigma} \right). $$
Lemma 3.2.2. (Abel's summation by parts formula) Let A_n = \sum_{k=1}^{n} a_k. Then
$$ \sum_{k=1}^{n} a_k b_k = A_n b_{n+1} - \sum_{k=1}^{n} A_k (b_{k+1} - b_k). $$
Theorem 3.2.1. If the partial sums A_n = a_1 + a_2 + ⋯ + a_n are bounded, then the series \sum_{n=1}^{\infty} a_n n^{-s} converges for every s with Re s = σ > 0.

Proof. We have |A_n| ≤ C for some C > 0 and all n. We shall use Corollary 3.2.1, with a_n replaced by A_n and b_n = n^{−s}. Then
$$ |A_n b_{n+1}| = |A_n| \cdot |b_{n+1}| \le C (n+1)^{-\sigma} \to 0 \quad\text{as } n \to \infty. $$
Hence the second condition of Corollary 3.2.1 is satisfied, that is, {A_n b_{n+1}} converges (in this case, to 0). For the first condition, we apply the Cauchy convergence criterion to \sum_{k=1}^{\infty} A_k \big( (k+1)^{-s} - k^{-s} \big). Given ε > 0 and using Note 3.2.1, if {S_n} denotes the sequence of partial sums of this series, then for all sufficiently large m < n,
$$ |S_n - S_m| = \left| \sum_{k=m+1}^{n} A_k \big( (k+1)^{-s} - k^{-s} \big) \right| \le C \sum_{k=m+1}^{n} \big| (k+1)^{-s} - k^{-s} \big| \le \frac{C|s|}{\sigma} \sum_{k=m+1}^{n} \left( \frac{1}{k^{\sigma}} - \frac{1}{(k+1)^{\sigma}} \right) = \frac{C|s|}{\sigma} \left( \frac{1}{(m+1)^{\sigma}} - \frac{1}{(n+1)^{\sigma}} \right) \le \frac{C|s|}{\sigma (m+1)^{\sigma}} < \varepsilon. $$
Theorem 3.2.2. If the series is convergent for s = σ + it, then it is convergent for any value of s whose real part is greater than σ.

This theorem is included in the more general and less elementary theorem which follows; indeed, it can be obtained as a corollary of it.

Theorem 3.2.3. If the series \sum_{n=1}^{\infty} a_n n^{-s} converges at some s_0 ∈ C, then, for every δ > 0, it converges uniformly in the sector
$$ \left\{ s : -\frac{\pi}{2} + \delta < \arg(s - s_0) < \frac{\pi}{2} - \delta \right\}. $$
Proof. Without any loss of generality, we may assume that s_0 = 0, that is, that the series \sum_{n=1}^{\infty} a_n converges. Let r_n = \sum_{k=n+1}^{\infty} a_k, and fix ε > 0. Then there exists n_0 ∈ N such that |r_n| < ε for all n ≥ n_0. Using summation by parts, for s in the sector and N > M > n_0, we get
$$ \sum_{n=M}^{N} a_n n^{-s} = \sum_{n=M}^{N} (r_{n-1} - r_n) n^{-s} = \sum_{n=M}^{N-1} r_n \left( \frac{1}{(n+1)^s} - \frac{1}{n^s} \right) + \frac{r_{M-1}}{M^s} - \frac{r_N}{N^s}. \tag{3.2.3} $$
The absolute values of the last two terms are bounded by ε: their numerators are bounded by ε, while the denominators have absolute value at least 1. To estimate the summation part of (3.2.3), note that
$$ \frac{1}{(n+1)^s} - \frac{1}{n^s} = \int_n^{n+1} \frac{-s}{x^{s+1}} \, dx, $$
so that
$$ \left| \frac{1}{(n+1)^s} - \frac{1}{n^s} \right| \le |s| \int_n^{n+1} \frac{dx}{|x^{s+1}|} = \frac{|s|}{\sigma} \left( \frac{1}{n^{\sigma}} - \frac{1}{(n+1)^{\sigma}} \right). \tag{3.2.4} $$
Thus the absolute value of the summation part of (3.2.3) satisfies, for N > M > n_0,
$$ \left| \sum_{n=M}^{N-1} r_n \left( \frac{1}{(n+1)^s} - \frac{1}{n^s} \right) \right| \le \sum_{n=M}^{N-1} |r_n| \, \frac{|s|}{\sigma} \left( \frac{1}{n^{\sigma}} - \frac{1}{(n+1)^{\sigma}} \right) \le \varepsilon \, \frac{|s|}{\sigma} \left( \frac{1}{M^{\sigma}} - \frac{1}{N^{\sigma}} \right) \le \varepsilon \, c(\delta), $$
since, for s in the sector,
$$ \frac{|s|}{\sigma} = \frac{1}{\cos(\arg s)} \le \frac{1}{\cos\left( \frac{\pi}{2} - \delta \right)} =: c(\delta). $$
This proves that the series is uniformly Cauchy, and hence uniformly convergent.
There are now three possibilities as regards the convergence of the series: it may converge for all, for no, or for some values of s. In the last case, it follows from Theorem 3.2.2, by a classical argument, that we can find a number σ_0 such that the series is convergent for σ > σ_0 and divergent or oscillatory for σ < σ_0.

Theorem 3.2.4. The series may be convergent for all values of s, or for none, or for some only. In the last case there is a number σ_0 such that the series is convergent for σ > σ_0 and divergent or oscillatory for σ < σ_0.

Proof. If the series converges at some s_0 ∈ C, the theorem follows from Theorem 3.2.3 together with the inclusion
$$ \{ s : \operatorname{Re} s > \operatorname{Re} s_0 \} \subset \bigcup_{\delta > 0} \left\{ s : \big| \arg(s - s_0) \big| < \frac{\pi}{2} - \delta \right\}. $$
In other words, the region of convergence is a half-plane. We call σ_0 the abscissa of convergence, and the line σ = σ_0 the line of convergence. It is convenient to write σ_0 = −∞ or σ_0 = ∞ when the series is convergent for all or for no values of s. On the line of convergence the question of the convergence of the series remains open, and requires considerations of a much more delicate character.

We formally define the abscissa of convergence as follows:

Definition 3.2.2. The abscissa of convergence of the Dirichlet series \sum_{n=1}^{\infty} a_n n^{-s} is the extended real number σ_0 ∈ [−∞, ∞] with the following properties:

1. If Re s > σ_0, then the series converges.

2. If Re s < σ_0, then the series diverges. If Re s = σ_0, nothing can be said about the convergence of the series.
Note 3.2.2. To determine the abscissa of convergence, it is enough to look at convergence of the series for
s ∈ R.
Example 3.2.3. 1. The series \sum_{n=1}^{\infty} a^n n^{-s}, where |a| < 1, is convergent for all s. If |a| > 1, the series converges for no value of s. And for a = 1, the abscissa of convergence is σ_0 = 1, and the series is not convergent at any point of the line of convergence, diverging to +∞ for s = 1 and oscillating finitely at the other points of the line.

2. The series \sum_{n=2}^{\infty} (\log n)^{-2} n^{-s} has the same line of convergence as the last series, but is convergent (indeed absolutely convergent) at all points of the line.

3. The series \sum_{n=2}^{\infty} a_n n^{-s}, where a_n = (−1)^n + (\log n)^{-2}, has the same line of convergence, and is convergent (though not absolutely) at all points of it.
The abscissa of absolute convergence σ_a is defined analogously:
$$ \sigma_a = \inf \left\{ \rho : \sum_{n=1}^{\infty} a_n n^{-s} \text{ converges absolutely for some } s \text{ with } \operatorname{Re} s = \rho \right\} = \inf \left\{ \rho : \sum_{n=1}^{\infty} a_n n^{-s} \text{ converges absolutely for all } s \text{ with } \operatorname{Re} s \ge \rho \right\}. $$
The following theorem gives the relationship between σ_0 and σ_a for a Dirichlet series.

Theorem 3.2.5.
$$ \sigma_0 \le \sigma_a \le \sigma_0 + 1. $$
Proof. The first inequality is obvious. For the second, we may assume (after a translation) that σ_0 = 0. We need to show that \sum_{n=1}^{\infty} |a_n| n^{-\sigma} converges for every σ > 1. Take ε > 0 such that σ − ε > 1. Since the series \sum a_n n^{-\varepsilon} converges, its terms are bounded: |a_n| n^{-\varepsilon} \le C for all n. Then
$$ \sum_{n=1}^{\infty} \big| a_n n^{-\sigma} \big| = \sum_{n=1}^{\infty} \frac{|a_n|}{n^{\varepsilon}} \cdot \frac{1}{n^{\sigma-\varepsilon}} \le C \sum_{n=1}^{\infty} \frac{1}{n^{\sigma-\varepsilon}} < \infty, $$
which proves the claim.
Remark 3.2.1. If a_n > 0 for all n ∈ N, then σ_0 = σ_a. This follows immediately by considering s ∈ R.

Recall that for the radius of convergence R of a power series we have the formula
$$ \frac{1}{R} = \limsup_{n \to \infty} |a_n|^{1/n}. $$
The following is an analogous formula for the abscissa of convergence of a Dirichlet series.
Theorem 3.2.6. Let \sum_{n=1}^{\infty} a_n n^{-s} be a Dirichlet series, and let σ_0 be its abscissa of convergence. Let
$$ s_n = a_1 + a_2 + \cdots + a_n \qquad\text{and}\qquad r_n = a_{n+1} + a_{n+2} + \cdots. $$

1. If \sum a_n diverges, then
$$ 0 \le \sigma_0 = \limsup_{n \to \infty} \frac{\log |s_n|}{\log n}. $$

2. If \sum a_n converges, then
$$ 0 \ge \sigma_0 = \limsup_{n \to \infty} \frac{\log |r_n|}{\log n}. $$
Proof. 1. We assume that \sum_{n=1}^{\infty} a_n diverges and define
$$ \alpha = \limsup_{n \to \infty} \frac{\log |s_n|}{\log n}. $$
We first show that α ≤ σ_0. Assume that \sum_{n=1}^{\infty} a_n n^{-\sigma} converges for some real σ. Since \sum a_n diverges, σ > 0, and we need to show that σ ≥ α. Let b_n = a_n n^{-\sigma} and B_n = \sum_{k=1}^{n} b_k (so that B_0 = 0). By assumption, the sequence {B_n} is bounded, say by M, and we can use summation by parts as follows:
$$ s_N = \sum_{n=1}^{N} a_n = \sum_{n=1}^{N} b_n n^{\sigma} = \sum_{n=1}^{N-1} B_n \big[ n^{\sigma} - (n+1)^{\sigma} \big] + B_N N^{\sigma}, $$
so that
$$ |s_N| \le M \sum_{n=1}^{N-1} \big[ (n+1)^{\sigma} - n^{\sigma} \big] + M N^{\sigma} \le 2M N^{\sigma}. $$
Hence
$$ \frac{\log |s_N|}{\log N} \le \sigma + \frac{\log 2M}{\log N}, $$
and this tends to σ as N → ∞, giving the desired upper bound for α.

We now show the other inequality, σ_0 ≤ α. Suppose σ > α; we need to show that \sum_{n=1}^{\infty} a_n n^{-\sigma} converges. Choose ε > 0 such that α + ε < σ. By the definition of α, there exists n_0 ∈ N such that for all n ≥ n_0,
$$ \frac{\log |s_n|}{\log n} \le \alpha + \varepsilon, \qquad\text{that is,}\qquad |s_n| \le n^{\alpha + \varepsilon}. $$
Using summation by parts, for N > M ≥ n_0,
$$ \left| \sum_{n=M+1}^{N} \frac{a_n}{n^{\sigma}} \right| = \left| \sum_{n=M+1}^{N-1} s_n \big[ n^{-\sigma} - (n+1)^{-\sigma} \big] + s_N N^{-\sigma} - s_M (M+1)^{-\sigma} \right| \le \sum_{n=M+1}^{N-1} n^{\alpha+\varepsilon} \, \sigma n^{-\sigma-1} + N^{\alpha+\varepsilon-\sigma} + M^{\alpha+\varepsilon-\sigma} \lesssim (M-1)^{\alpha+\varepsilon-\sigma}, $$
which tends to 0 as M → ∞, so the series converges by the Cauchy criterion. Here we estimated \sum_{n=M}^{N} n^{\alpha+\varepsilon-\sigma-1} by the integral
$$ \int_{M-1}^{N} x^{\alpha+\varepsilon-\sigma-1} \, dx \lesssim (M-1)^{\alpha+\varepsilon-\sigma}, $$
and the symbol ≲ means "less than or equal to a constant times the right-hand side" (where the constant depends on α + ε − σ but, crucially, not on M).
2. Similar to the first part.
From the formulae above we can deduce corresponding formulae for the abscissa of absolute convergence, although these can also be derived easily on their own.

Corollary 3.2.2. For a Dirichlet series \sum_{n=1}^{\infty} a_n n^{-s}, we have:

1. if \sum |a_n| diverges, then
$$ \sigma_a = \limsup_{n \to \infty} \frac{\log\big( |a_1| + |a_2| + \cdots + |a_n| \big)}{\log n} \ge 0; $$

2. if \sum |a_n| converges, then
$$ \sigma_a = \limsup_{n \to \infty} \frac{\log\big( |a_{n+1}| + |a_{n+2}| + \cdots \big)}{\log n} \le 0. $$
Example 3.2.4. The series
$$ \sum_{n=1}^{\infty} \frac{(-1)^n}{p_n^{\,s}} $$
(where p_n denotes the nth prime) has σ_0 = 0 and σ_a = 1. Written as an ordinary Dirichlet series \sum a_m m^{-s}, its coefficients a_m are ±1 when m is prime and 0 otherwise. The series of absolute coefficients diverges, so we use the first of the pair of formulae for each abscissa. The partial sums s_m are bounded, so
$$ \sigma_0 = \limsup_{m \to \infty} \frac{\log |s_m|}{\log m} = 0, $$
and, using the prime number theorem,
$$ \sigma_a = \limsup_{m \to \infty} \frac{\log \pi(m)}{\log m} = \limsup_{m \to \infty} \frac{\log m - \log\log m}{\log m} = 1, $$
where π(x) denotes the number of primes less than or equal to x.
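The slow approach of log π(n)/log n to 1 can be watched numerically. The sketch below is our own; the sieve bound and the sample points are arbitrary choices.

```python
import numpy as np

N = 10**6
is_prime = np.ones(N + 1, dtype=bool)
is_prime[:2] = False
for p in range(2, int(N**0.5) + 1):
    if is_prime[p]:
        is_prime[p*p::p] = False

# |a_1| + ... + |a_n| = pi(n) for the series of Example 3.2.4.
abs_partial_sums = np.cumsum(is_prime)
for n in (10**3, 10**4, 10**5, 10**6):
    print(n, np.log(abs_partial_sums[n]) / np.log(n))
# The ratio creeps upward toward 1, as the prime number theorem predicts,
# so sigma_a = 1; the bounded partial sums s_n give sigma_0 = 0.
```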
Theorem 3.2.7. Suppose that the series \sum_{n=1}^{\infty} a_n n^{-s} converges absolutely to f(s) in some half-plane H_c = {s : Re s > c} and f(s) ≡ 0 in H_c. Then a_n = 0 for all n ∈ N.

Proof. We may assume that c < 0, so that, in particular, \sum |a_n| < ∞. Suppose that not all the a_n are zero, and let n_0 be the smallest natural number such that a_{n_0} ≠ 0. We claim that \lim_{\sigma \to \infty} f(\sigma)\, n_0^{\sigma} = a_{n_0}. To prove the claim, note that
$$ \left| n_0^{\sigma} \sum_{n > n_0} a_n n^{-\sigma} \right| \le \sum_{n > n_0} |a_n| \left( \frac{n_0}{n} \right)^{\sigma} \le \left( \frac{n_0}{n_0 + 1} \right)^{\sigma} \sum_{n > n_0} |a_n|, $$
and the last term tends to 0 as σ → ∞, since \sum |a_n| converges. As
$$ f(\sigma)\, n_0^{\sigma} = a_{n_0} + n_0^{\sigma} \sum_{n > n_0} a_n n^{-\sigma}, $$
the claim follows. But f(σ) = 0 for all σ > c, so a_{n_0} = \lim_{\sigma \to \infty} f(\sigma)\, n_0^{\sigma} = 0, a contradiction. Hence a_n = 0 for all n.

3.3 Few Probable Questions
1. Define the abscissa of convergence of a Dirichlet series \sum_{n=1}^{\infty} a_n n^{-s}. Show that if the series \sum a_n diverges, then
$$ 0 \le \sigma_0 = \limsup_{n \to \infty} \frac{\log |s_n|}{\log n}. $$

3. Show that if the series \sum_{n=1}^{\infty} a_n n^{-s} converges for some s_0 ∈ C, then, for every δ > 0, it converges uniformly in the sector
$$ \left\{ s : -\frac{\pi}{2} + \delta < \arg(s - s_0) < \frac{\pi}{2} - \delta \right\}. $$
Unit 4
Course Structure
• The Riemann Zeta function
4.1 Introduction
As we have introduced in the previous unit, the Riemann zeta function ζ(s) is a function of the complex
variable s, defined as
$$ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}, $$
which plays a pivotal role in analytic number theory and has applications in physics, probability theory, and applied statistics.
As a function of a real variable, Leonhard Euler first introduced and studied it in the first half of the eighteenth century without using complex analysis, which was not available at the time. Bernhard Riemann's 1859 article "On the Number of Primes Less Than a Given Magnitude" extended the Euler definition to a complex variable, proved its meromorphic continuation and functional equation, and established a relation between its zeros and the distribution of prime numbers.

The values of the Riemann zeta function at even positive integers were computed by Euler. The first of them, ζ(2), provides a solution to the Basel problem. In 1979 Roger Apéry proved the irrationality of ζ(3).
The values at negative integer points, also found by Euler, are rational numbers and play an important role
in the theory of modular forms. Many generalizations of the Riemann zeta function, such as Dirichlet series,
Dirichlet L-functions and L-functions, are known. We will, however, not go into such advanced treatments of the zeta function; we restrict ourselves to some preliminary ideas, starting with the definition, convergence, etc.
Objectives
After reading this unit, you will be able to
• define the Riemann zeta function and know about its origins in a preliminary level
4.2 Riemann Zeta Function

The Riemann zeta function is defined by
$$ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}, $$
where, s = σ + it is a complex number. First, we will discuss the convergence of the function. See that
$$ |\zeta(s)| = \left| \sum_{n=1}^{\infty} \frac{1}{n^s} \right| \le \sum_{n=1}^{\infty} \frac{1}{|n^s|} = \sum_{n=1}^{\infty} \frac{1}{|n^{\sigma+it}|} = \sum_{n=1}^{\infty} \frac{1}{|n^{\sigma}| \cdot |n^{it}|} = \sum_{n=1}^{\infty} \frac{1}{n^{\sigma} \, | e^{it \log n} |} = \sum_{n=1}^{\infty} \frac{1}{n^{\sigma}}. $$
By the p-test, the last series converges for σ > 1, and hence the series defining ζ(s) converges absolutely in the half-plane Re s > 1.

4.3 Euler Product Representation

Multiplying ζ(s) by 2^{−s}, we get
$$ \zeta(s) \cdot \frac{1}{2^s} = \sum_{n=1}^{\infty} \frac{1}{(2n)^s}, $$
and subtracting this from ζ(s),
$$ \zeta(s) \left( 1 - \frac{1}{2^s} \right) = \sum_{n=1;\; n \ne 2k}^{\infty} \frac{1}{n^s}. \tag{4.3.3} $$
The last term in the above equation is the sum of all terms n^{−s}, excluding those n which are multiples of 2.
Again, multiplying the equation (4.3.3) by 3^{−s}, we get
$$ \zeta(s) \left( 1 - \frac{1}{2^s} \right) \cdot \frac{1}{3^s} = \frac{1}{3^s} \sum_{n=1;\; n \ne 2k}^{\infty} \frac{1}{n^s} = \sum_{n=1;\; n \ne 2k}^{\infty} \frac{1}{(3n)^s}, \tag{4.3.4} $$
and subtracting this from (4.3.3),
$$ \zeta(s) \left( 1 - \frac{1}{2^s} \right) \left( 1 - \frac{1}{3^s} \right) = \sum_{n=1;\; n \ne 2k,\, 3k}^{\infty} \frac{1}{n^s}. $$
The last term in the above equation is the sum of all terms n^{−s}, excluding those n which are multiples of 2 and 3.
Continuing in this way, we get
$$ \zeta(s) \left( 1 - \frac{1}{2^s} \right) \left( 1 - \frac{1}{3^s} \right) \cdots \left( 1 - \frac{1}{p_n^{\,s}} \right) = \sum_{n \ne p_i k} \frac{1}{n^s}, $$
where the term on the right-hand side is the sum of all those terms n^{−s} for which n is not a multiple of any of the primes 2, 3, 5, …, p_n, arranged in ascending order. Thus, taking the limit as n → ∞, we get
$$ \lim_{n \to \infty} \zeta(s) \left( 1 - \frac{1}{2^s} \right) \left( 1 - \frac{1}{3^s} \right) \cdots \left( 1 - \frac{1}{p_n^{\,s}} \right) = \sum \frac{1}{n^s}, $$
where the sum is taken over all those n which are not multiples of any prime, and the only such number is n = 1. So the above equation becomes
$$ \zeta(s) \left( 1 - \frac{1}{2^s} \right) \left( 1 - \frac{1}{3^s} \right) \cdots \left( 1 - \frac{1}{p_n^{\,s}} \right) \cdots = 1, $$
which finally gives
$$ \frac{1}{\zeta(s)} = (1 - p_1^{-s})(1 - p_2^{-s}) \cdots (1 - p_n^{-s}) \cdots, $$
where p_1, p_2, p_3, …, p_n, … is the complete list of prime numbers arranged in ascending order.
The above representation of the zeta function is called the Euler product representation. Notice that the product development explained above involves an infinite product. Infinite products, like infinite series, are convergent when the sequence of partial products converges, as we will see in subsequent units. The infinite product in this case converges uniformly in the region σ > 1.
We have taken for granted that there are infinitely many primes. Actually, the above reasoning can be used to prove this fact. For if p_n were the largest prime, then we would have
$$ \zeta(s) \left( 1 - \frac{1}{2^s} \right) \left( 1 - \frac{1}{3^s} \right) \cdots \left( 1 - \frac{1}{p_n^{\,s}} \right) = 1, $$
and it would follow that ζ(σ) has a finite limit as σ → 1. This contradicts the divergence of the series \sum_{n=1}^{\infty} n^{-1}.
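For real s > 1 the Euler product can be compared with the series numerically. The following sketch is our own; the truncation points are arbitrary, and it uses s = 2, where ζ(2) = π²/6.

```python
import numpy as np

def primes_up_to(n):
    sieve = np.ones(n + 1, dtype=bool)
    sieve[:2] = False
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = False
    return np.nonzero(sieve)[0]

s = 2.0
series = np.sum(1.0 / np.arange(1, 10**7, dtype=float)**s)
product = np.prod(1.0 / (1.0 - primes_up_to(10**4).astype(float)**(-s)))
print(series, product, np.pi**2 / 6)   # the three values agree closely
```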
4.4 Functional Equations

Recall the gamma function,
$$ \Gamma(s) = \int_0^{\infty} t^{s-1} e^{-t} \, dt, $$
and the integral converges for all values of Re s > 0. We make the substitution t = nu, so that dt = n\,du, which gives
$$ \frac{\Gamma(s)}{n^s} = \int_0^{\infty} u^{s-1} e^{-nu} \, du. $$
Summing over n from 1 to ∞, we get
$$ \Gamma(s) \sum_{n=1}^{\infty} \frac{1}{n^s} = \sum_{n=1}^{\infty} \int_0^{\infty} u^{s-1} e^{-nu} \, du. $$
Since the integral on the right-hand side is absolutely convergent, the sum and the integral can be interchanged. Thus, the above equation becomes
$$ \Gamma(s)\,\zeta(s) = \int_0^{\infty} u^{s-1} \sum_{n=1}^{\infty} e^{-nu} \, du = \int_0^{\infty} u^{s-1} \left( \frac{1}{1 - e^{-u}} - 1 \right) du = \int_0^{\infty} u^{s-1} \, \frac{e^{-u}}{1 - e^{-u}} \, du = \int_0^{\infty} \frac{u^{s-1}}{e^u - 1} \, du $$
for Re s > 1. Thus, the celebrated relationship between the gamma and zeta functions is
$$ \Gamma(s)\,\zeta(s) = \int_0^{\infty} \frac{u^{s-1}}{e^u - 1} \, du, \qquad \operatorname{Re} s > 1. $$
Let f be any complex function that is analytic in the strip {z ∈ C : |Im z| < a}, for some a > 0, and satisfies |f(x + iy)| ≤ A/(1 + x²) for some constant A > 0, all x ∈ R and |y| < a. The Gaussian e^{−πtz²} (for fixed t > 0) satisfies these properties. By the Poisson summation formula, for such f,
$$ \sum_{n \in \mathbb{Z}} f(n) = \sum_{n \in \mathbb{Z}} \hat{f}(n), $$
where \hat{f} is the Fourier transform of f. The Fourier transform of the function e^{−πx²} is the function itself, that is,
$$ \int_{-\infty}^{\infty} e^{-\pi x^2} e^{-2\pi i x \xi} \, dx = e^{-\pi \xi^2}. $$
For fixed values of t > 0 and a ∈ R, the change of variables x ↦ t^{1/2}(x + a) in the above integral shows that the Fourier transform of the function
$$ f(x) = e^{-\pi t (x + a)^2} $$
is
$$ \hat{f}(\xi) = t^{-1/2}\, e^{-\pi \xi^2 / t}\, e^{2\pi i a \xi}. $$
This identity has noteworthy consequences. For instance, the special case a = 0, combined with the Poisson summation formula, gives the transformation law for the theta function ϑ(t) = \sum_{n \in \mathbb{Z}} e^{-\pi n^2 t}:
$$ \vartheta(t) = t^{-1/2}\, \vartheta\!\left( \frac{1}{t} \right), \qquad t > 0. \tag{4.4.1} $$
We have
$$ \Gamma(s) = \int_0^{\infty} t^{s-1} e^{-t} \, dt. $$
Thus,
$$ \Gamma\!\left( \frac{s}{2} \right) = \int_0^{\infty} t^{\frac{s}{2}-1} e^{-t} \, dt. \tag{4.4.2} $$
We use the substitution t = πn²x, so that dt = πn²\,dx. Thus, equation (4.4.2) becomes
$$ \Gamma\!\left( \frac{s}{2} \right) = \int_0^{\infty} (\pi n^2 x)^{\frac{s}{2}-1} e^{-\pi n^2 x} \, \pi n^2 \, dx = \pi^{\frac{s}{2}} n^{s} \int_0^{\infty} x^{\frac{s}{2}-1} e^{-\pi n^2 x} \, dx, \tag{4.4.3} $$
which, on dividing by π^{s/2} n^{s} and summing over n, gives
$$ \pi^{-\frac{s}{2}}\, \Gamma\!\left( \frac{s}{2} \right) \zeta(s) = \int_0^{\infty} x^{\frac{s}{2}-1} \sum_{n=1}^{\infty} e^{-\pi n^2 x} \, dx. \tag{4.4.4} $$
We have
$$ \vartheta(x) = \sum_{n \in \mathbb{Z}} e^{-\pi n^2 x} = 1 + 2 \sum_{n=1}^{\infty} e^{-\pi n^2 x} = 1 + 2\psi(x). $$
Using ψ(x) in the right-hand side of equation (4.4.4), we get
$$ \int_0^{\infty} x^{\frac{s}{2}-1} \sum_{n=1}^{\infty} e^{-\pi n^2 x} \, dx = \int_0^{\infty} x^{\frac{s}{2}-1} \psi(x) \, dx = \int_0^{1} x^{\frac{s}{2}-1} \psi(x)\, dx + \int_1^{\infty} x^{\frac{s}{2}-1} \psi(x) \, dx. \tag{4.4.5} $$
We have, by (4.4.1),
$$ 2\psi(x) + 1 = \frac{1}{\sqrt{x}} \left\{ 2\psi\!\left( \frac{1}{x} \right) + 1 \right\}, $$
which gives
$$ \psi(x) = \frac{1}{\sqrt{x}}\, \psi\!\left( \frac{1}{x} \right) + \frac{1}{2\sqrt{x}} - \frac{1}{2}. $$
Thus,
$$ \int_0^1 x^{\frac{s}{2}-1} \psi(x)\, dx = \int_0^1 x^{\frac{s}{2}-1} \left\{ \frac{1}{\sqrt{x}}\, \psi\!\left( \frac{1}{x} \right) + \frac{1}{2\sqrt{x}} - \frac{1}{2} \right\} dx = \int_0^1 x^{\frac{s}{2}-\frac{3}{2}}\, \psi\!\left( \frac{1}{x} \right) dx + \frac{1}{2} \int_0^1 \left( x^{\frac{s-3}{2}} - x^{\frac{s}{2}-1} \right) dx = \int_0^1 x^{\frac{s}{2}-\frac{3}{2}}\, \psi\!\left( \frac{1}{x} \right) dx + \frac{1}{s(s-1)}. \tag{4.4.6} $$
Using the substitution
$$ x = \frac{1}{u}, \qquad dx = -\frac{1}{u^2}\, du $$
in the integral on the right of the above equation (the limits changing accordingly), and renaming the dummy variable u back to x, equation (4.4.6) becomes
$$ \int_0^1 x^{\frac{s}{2}-1} \psi(x) \, dx = \int_1^{\infty} x^{-\frac{s}{2}-\frac{1}{2}} \psi(x) \, dx + \frac{1}{s(s-1)}. $$
Finally, (4.4.5) becomes
$$ \int_0^{\infty} x^{\frac{s}{2}-1} \psi(x) \, dx = \int_1^{\infty} x^{\frac{s}{2}-1} \psi(x)\, dx + \int_1^{\infty} x^{-\frac{s}{2}-\frac{1}{2}} \psi(x)\, dx + \frac{1}{s(s-1)} = \int_1^{\infty} \left( x^{\frac{s}{2}-1} + x^{-\frac{s}{2}-\frac{1}{2}} \right) \psi(x) \, dx + \frac{1}{s(s-1)}. $$
Thus, (4.4.4) and (4.4.5) together give
$$ \pi^{-\frac{s}{2}}\, \Gamma\!\left( \frac{s}{2} \right) \zeta(s) = \int_1^{\infty} \left( x^{\frac{s}{2}-1} + x^{-\frac{s}{2}-\frac{1}{2}} \right) \psi(x)\, dx + \frac{1}{s(s-1)} = \int_1^{\infty} \left( x^{\frac{s}{2}} + x^{\frac{1-s}{2}} \right) \frac{\psi(x)}{x} \, dx + \frac{1}{s(s-1)}. \tag{4.4.7} $$
Replacing s by 1 − s in the above equation, we get
$$ \pi^{-\frac{1-s}{2}}\, \Gamma\!\left( \frac{1-s}{2} \right) \zeta(1-s) = \int_1^{\infty} \left( x^{\frac{1-s}{2}} + x^{\frac{s}{2}} \right) \frac{\psi(x)}{x}\, dx + \frac{1}{(1-s)(1-s-1)} = \int_1^{\infty} \left( x^{\frac{1-s}{2}} + x^{\frac{s}{2}} \right) \frac{\psi(x)}{x}\, dx + \frac{1}{s(s-1)}. \tag{4.4.8} $$
Notice that the right-hand sides of the equations (4.4.7) and (4.4.8) are the same. So we get
$$ \pi^{-\frac{s}{2}}\, \Gamma\!\left( \frac{s}{2} \right) \zeta(s) = \pi^{-\frac{1-s}{2}}\, \Gamma\!\left( \frac{1-s}{2} \right) \zeta(1-s), $$
which is our required functional equation.
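The functional equation (in the asymmetric form of question 5 below) can be verified numerically with a multiple-precision library. The following sketch is our own and uses mpmath's built-in zeta and gamma functions; the sample points are arbitrary.

```python
import mpmath as mp

mp.mp.dps = 30                        # 30 significant digits
for s in (mp.mpf('0.3'), mp.mpc('0.5', '14.1'), mp.mpc('-2.5', '1.0')):
    lhs = mp.zeta(s)
    rhs = (2**s * mp.pi**(s - 1) * mp.sin(mp.pi*s/2)
           * mp.gamma(1 - s) * mp.zeta(1 - s))
    print(mp.nstr(lhs - rhs, 5))      # ~0 in each case
```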
4.5 Few Probable Questions

1. With proper justification, find the abscissa of convergence of the zeta function.
2. Establish the Euler product representation of the zeta function.
3. Establish the relation between zeta function and the gamma function.
4. Deduce the Riemann functional equation.
5. Write down the Riemann functional equation. Hence deduce that
$$ \zeta(s) = 2^s \pi^{s-1} \sin\!\left( \frac{\pi s}{2} \right) \Gamma(1-s)\, \zeta(1-s). $$
Unit 5
Course Structure
• Entire functions, growth of an entire function
• Order and type and their representations in terms of the Taylor coefficients.
5.1 Introduction
In complex analysis, an entire function, also called an integral function, is a complex-valued function that
is holomorphic at all finite points over the whole complex plane. Typical examples of entire functions are
polynomials and the exponential function, and any finite sums, products and compositions of these, such
as the trigonometric functions sine and cosine and their hyperbolic counterparts sinh and cosh, as well as
derivatives and integrals of entire functions such as the error function. If an entire function f (z) has a root
at w, then f (z)/(z − w), taking the limit value at w, is an entire function. On the other hand, neither the
natural logarithm nor the square root is an entire function, nor can they be continued analytically to an entire
function. Also, a transcendental entire function is an entire function that is not a polynomial. For example, the exponential, sine and cosine functions are the most common transcendental entire functions. This unit is dedicated to the study of the growth of entire functions.
Objectives
After reading this unit, you will be able to
• learn the behaviour of the maximum modulus function M (r) for entire functions
• define the order and type of entire functions with the help of the Taylor coefficients
5.2 Entire Functions

Every entire function f has a power series expansion
$$ f(z) = a_0 + a_1 z + \cdots + a_n z^n + \cdots, $$
which converges for all finite z. Moreover, according to the Cauchy–Hadamard formula,
$$ \lim_{n \to \infty} \sqrt[n]{|a_n|} = 0. $$
There are three possible ways in which an entire function f(z) can behave at infinity:

1. f(z) can have a regular point at infinity; then, according to Liouville's theorem, f is a constant function;

2. f(z) can have a pole at infinity; then f is a polynomial;

3. f can have an essential singularity at infinity, and then f is said to be an entire transcendental function.
Note that the behaviour of f(z) at infinity is determined by the behaviour of f(1/z) at 0. We will mainly be concerned with transcendental entire functions from now on. If f is such a function then, since M(r) is a strictly increasing function of r, we clearly have
$$ \lim_{r \to \infty} M(r) = \infty. $$
Theorem 5.2.1. If f(z) is a transcendental entire function with maximum modulus function M(r), then
$$ \liminf_{r \to \infty} \frac{\log M(r)}{\log r} = \infty. $$
Proof. Let f(z) = \sum_{k=0}^{\infty} a_k z^k be a transcendental entire function with maximum modulus function M(r). If possible, let
$$ \liminf_{r \to \infty} \frac{\log M(r)}{\log r} = \mu < \infty. $$
Then, for ε > 0, we can find an increasing sequence {r_n} tending to ∞ such that
$$ \frac{\log M(r_n)}{\log r_n} < \mu + \varepsilon $$
for every r_n, that is, M(r_n) < r_n^{\mu+\varepsilon}. By Cauchy's inequality, |a_k| ≤ M(r_n)/r_n^{\,k} < r_n^{\mu+\varepsilon-k} for every k, and letting n → ∞ we get a_k = 0 for all k > μ + ε. Hence f is a polynomial, contradicting the hypothesis that f is transcendental.
We have another result analogous to the above for non-transcendental entire functions as follows.
Theorem 5.2.2. For an entire function f, if there exists a positive integer k such that
$$ \lim_{r \to \infty} \frac{M(r)}{r^k} < \infty, $$
then f is a polynomial of degree at most k.

Proof. Suppose
$$ \lim_{r \to \infty} \frac{M(r)}{r^k} = \mu < \infty. $$
Then
$$ M(r) < (\mu + \varepsilon)\, r^k $$
for any positive ε and all r ≥ r_0, for some r_0. By Cauchy's inequality,
$$ |a_n| \le \frac{M(r)}{r^n} < (\mu + \varepsilon)\, r^{k-n} $$
for all r ≥ r_0. Since r can be chosen arbitrarily large, the right-hand side tends to 0 as r → ∞ for n > k. Hence a_n = 0 for all n > k, and thus f is a polynomial of degree at most k.
Definition 5.2.1. An entire function f is said to be of finite order if there exists a positive number k such that the inequality
$$ \log M(r) < r^k, \qquad\text{that is,}\qquad M(r) < e^{r^k}, $$
holds for all sufficiently large r. Then
$$ \rho = \inf\{ k : M(r) < e^{r^k} \text{ holds for all sufficiently large } r \} $$
is called the order of f. If ρ = ∞, that is, if for every number k there exist arbitrarily large values of r such that log M(r) > r^k, then f is said to be of infinite order.

For example, e^z is of finite order (in fact, of order 1), while e^{e^z} is of infinite order. From the definition, it is clear that the order of an entire function is always non-negative.
Theorem 5.2.3. The order ρ of an entire function f is given by
$$ \rho = \limsup_{r \to \infty} \frac{\log\log M(r)}{\log r}. $$

Proof. Let ρ be the order of f. Then, from the definition of ρ, for any ε > 0 there exists a number r_0(ε) > 0 such that
$$ \log M(r) < r^{\rho+\varepsilon} $$
holds for all r > r_0. On the other hand, there exists an increasing sequence {r_n} tending to infinity such that
$$ \log M(r_n) > r_n^{\rho-\varepsilon}. $$
Taking logarithms, \log\log M(r) < (\rho+\varepsilon)\log r for all r > r_0, while \log\log M(r_n) > (\rho-\varepsilon)\log r_n; since ε > 0 is arbitrary, these two facts together give
$$ \rho = \limsup_{r \to \infty} \frac{\log\log M(r)}{\log r}. $$
Definition 5.2.2. Let f be an entire function of finite non-zero order ρ. By the type τ of f we mean the greatest lower bound of the positive numbers k for which the inequality
$$ \log M(r) < k r^{\rho} $$
holds for all sufficiently large r. If, however, τ = ∞, that is, if for every positive number k there exist arbitrarily large values of r such that
$$ \log M(r) > k r^{\rho}, $$
then f is said to be of infinite (or maximum) type. When τ = 0, f is said to be of minimum type. From the definition of type it is clear that τ is always non-negative. When 0 < τ < ∞, f is said to be of normal type.
Theorem 5.2.4. The type τ of an entire function f of finite non-zero order ρ is given by the formula
$$ \tau = \limsup_{r \to \infty} \frac{\log M(r)}{r^{\rho}}. $$
Proof. Let τ be the type of an entire function f of order ρ (≠ 0). Then, from the definition of τ, for any ε > 0 there exists a number r_0(ε) > 0 such that
$$ \log M(r) < (\tau + \varepsilon)\, r^{\rho} \quad\text{for all } r > r_0. $$
On the other hand, there exists an increasing sequence {r_n} tending to infinity such that
$$ \log M(r_n) > (\tau - \varepsilon)\, r_n^{\rho}. $$
In other words,
$$ \frac{\log M(r)}{r^{\rho}} < \tau + \varepsilon \quad \forall\, r > r_0, \tag{5.2.4} $$
and
$$ \frac{\log M(r)}{r^{\rho}} > \tau - \varepsilon \tag{5.2.5} $$
for a sequence of values of r tending to infinity. Equations (5.2.4) and (5.2.5) together mean
$$ \tau = \limsup_{r \to \infty} \frac{\log M(r)}{r^{\rho}}. $$
Example 5.2.2. We show that the order of any polynomial is zero. Let f(z) = a_0 + a_1 z + \cdots + a_n z^n be a polynomial. Then
$$ |f(z)| = |a_0 + a_1 z + \cdots + a_n z^n| \le |a_0| + |a_1||z| + \cdots + |a_n||z|^n. $$
Thus,
$$ M(r) \le |a_0| + |a_1| r + \cdots + |a_n| r^n \le r^n \big( |a_0| + |a_1| + \cdots + |a_n| \big) = B r^n $$
(taking r ≥ 1; this choice is justified since ultimately r → ∞), where B = |a_0| + |a_1| + \cdots + |a_n|. Hence,
$$ \log M(r) \le \log B + n \log r \le \log r + n \log r = (n+1) \log r $$
for r sufficiently large. Hence,
$$ \rho = \limsup_{r \to \infty} \frac{\log\log M(r)}{\log r} \le \limsup_{r \to \infty} \frac{\log(n+1) + \log\log r}{\log r} = 0, $$
that is, ρ ≤ 0. Also, we know that by definition ρ ≥ 0. Hence ρ = 0.
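Numerically, the ratio log log M(r)/log r for a polynomial indeed drifts to 0, though slowly. The sketch below is our own illustration; the polynomial is an arbitrary choice.

```python
import numpy as np

def M(f, r, samples=2000):
    # Approximate max_{|z|=r} |f(z)| by sampling the circle.
    theta = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    return np.abs(f(r * np.exp(1j * theta))).max()

f = lambda z: 3*z**5 + z + 2
for r in (1e2, 1e4, 1e8, 1e16):
    print(r, np.log(np.log(M(f, r))) / np.log(r))
# The ratio decreases toward 0, confirming that polynomials have order zero.
```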
From the above example it is clear that the order of any constant function is zero. But it does not follow that an entire function of order zero is always a polynomial.

Example 5.2.3. The order of a transcendental entire function may also be zero. For example, the function
$$ f(z) = \sum_{n=0}^{\infty} \frac{z^n}{n^{\,n^{1+\delta}}}, \qquad \delta > 0, $$
is a transcendental entire function of order zero.

Theorem 5.2.5. Let f_1 and f_2 be entire functions of orders ρ_1 and ρ_2 respectively. Then

1. the order ρ of φ(z) = f_1(z) f_2(z) satisfies ρ ≤ max{ρ_1, ρ_2};

2. the order ρ of ψ(z) = f_1(z) ± f_2(z) satisfies ρ ≤ max{ρ_1, ρ_2}.

Proof. 1. Let φ(z) = f_1(z) f_2(z) be of order ρ, and suppose ρ_1 ≥ ρ_2. Let M(r, φ) = \max_{|z|=r} |φ(z)|.
Since ρ_1 and ρ_2 are the orders of f_1 and f_2 respectively, for any given ε > 0 we have
$$ \log M(r, f_1) < r^{\rho_1+\varepsilon}, \qquad \log M(r, f_2) < r^{\rho_2+\varepsilon} $$
for all sufficiently large r. Now,
$$ |\varphi(z)| = |f_1(z)|\,|f_2(z)| \le M(r, f_1) \cdot M(r, f_2) \qquad \forall\, z \text{ with } |z| \le r. $$
Hence,
$$ \log M(r, \varphi) \le \log M(r, f_1) + \log M(r, f_2) \le r^{\rho_1+\varepsilon} + r^{\rho_2+\varepsilon} \le 2 r^{\rho_1+\varepsilon} < r^{\varepsilon} \cdot r^{\rho_1+\varepsilon} = r^{\rho_1+2\varepsilon} $$
for large r. Thus,
$$ \log M(r, \varphi) < r^{\rho_1+2\varepsilon} \tag{5.2.6} $$
for all sufficiently large r. Since ε > 0 is arbitrary, it follows from (5.2.6) that
$$ \rho \le \rho_1 = \max\{\rho_1, \rho_2\}. $$
The result follows similarly when ρ_2 > ρ_1.
2. Let ψ(z) = f_1(z) ± f_2(z) be of order ρ, and let ρ_1 ≥ ρ_2. Also, let M(r, ψ) = \max_{|z|=r} |ψ(z)|. Since ρ_1 and ρ_2 are the orders of f_1 and f_2 respectively, for any given ε > 0 we have
$$ M(r, f_1) < e^{r^{\rho_1+\varepsilon}}, \qquad M(r, f_2) < e^{r^{\rho_2+\varepsilon}} $$
for all sufficiently large r. Now,
$$ |\psi(z)| \le |f_1(z)| + |f_2(z)| \quad\Rightarrow\quad M(r, \psi) \le M(r, f_1) + M(r, f_2) < e^{r^{\rho_1+\varepsilon}} + e^{r^{\rho_2+\varepsilon}} < 2\, e^{r^{\rho_1+\varepsilon}} $$
for all sufficiently large r (since the exponential function and r^n are both increasing). Thus,
$$ M(r, \psi) < e^{r^{\rho_1+2\varepsilon}} \quad\Rightarrow\quad \log M(r, \psi) < r^{\rho_1+2\varepsilon} $$
for all large r. Since ε > 0 is arbitrary, it follows that the order of ψ cannot exceed ρ_1, that is, ρ ≤ ρ_1 = max{ρ_1, ρ_2}. Similarly, if ρ_2 > ρ_1, the result can be proved.
Corollary 5.2.1. If ρ_1 > ρ_2, then the order of ψ = f_1 ± f_2 is exactly ρ_1 = max{ρ_1, ρ_2}. Indeed, there is a sequence r_n → ∞ with log M(r_n, f_1) > r_n^{\rho_1-\varepsilon}, while log M(r_n, f_2) < r_n^{\rho_2+\varepsilon} for all large n. Hence,
$$ M(r_n, \psi) \ge e^{r_n^{\rho_1-\varepsilon}} - e^{r_n^{\rho_2+\varepsilon}} = \exp\big( r_n^{\rho_1-\varepsilon} \big) \Big\{ 1 - \exp\big( r_n^{\rho_2+\varepsilon} - r_n^{\rho_1-\varepsilon} \big) \Big\} > \frac{1}{2} \exp\big( r_n^{\rho_1-\varepsilon} \big), $$
if ε is chosen so small that ρ_2 + ε < ρ_1 − ε, for sufficiently large n. Hence ρ ≥ ρ_1. But we already have ρ ≤ ρ_1; thus ρ = ρ_1 = max{ρ_1, ρ_2}.
Remark 5.2.1. The result of the above corollary is not true if ρ_1 = ρ_2. For example, let f_1(z) = e^z and f_2(z) = −e^z. Then the orders of f_1 and f_2 are both 1, but the order of f_1 + f_2 is 0. Similarly, if f_1(z) = e^z and f_2(z) = e^{−z}, then the orders of f_1 and f_2 are both 1 and the order of f_1 f_2 is 0.
Example 5.2.4. Let P(z) be a polynomial of degree n. Then the order of e^{P(z)} is n, and its type is the modulus of the coefficient of the highest-degree term of P(z).

Let P(z) = a_0 + a_1 z + \cdots + a_n z^n, a_n ≠ 0, and f(z) = \exp(a_0 + a_1 z + \cdots + a_n z^n). Let us find \max_{|z|=r} | e^{a_m z^m} |. Let a_m = t e^{i\phi} and z = r e^{i\theta}. Then
$$ \max_{|z|=r} \big| e^{a_m z^m} \big| = \max_{\theta} \Big| \exp\big( t r^m e^{i(m\theta+\phi)} \big) \Big| = \max_{\theta} \Big| \exp\big( t r^m \{ \cos(m\theta+\phi) + i \sin(m\theta+\phi) \} \big) \Big| = \max_{\theta} \exp\big( t r^m \cos(m\theta+\phi) \big) = \exp(t r^m) = \exp\big( |a_m| r^m \big). $$
Hence,
$$ \log M(r, f) \le |a_0| + |a_1| r + \cdots + |a_n| r^n, $$
while, choosing θ so that \cos(n\theta + \phi_n) = 1, we also get \log M(r, f) \ge |a_n| r^n - (|a_0| + \cdots + |a_{n-1}| r^{n-1}). Hence,
$$ \rho = \limsup_{r \to \infty} \frac{\log\log M(r, f)}{\log r} = n, $$
and
$$ \tau = \limsup_{r \to \infty} \frac{\log M(r, f)}{r^n} = |a_n|. $$
Let f(z) = \sum_{n=0}^{\infty} a_n z^n be an entire function of order ρ and type τ. We now state the formulae for the order and type of f in terms of its Taylor coefficients.

Theorem 5.2.6. Let f(z) = \sum_{n=0}^{\infty} a_n z^n be an entire function of finite order ρ. Then
$$ \rho = \limsup_{n \to \infty} \frac{\log n}{\log \dfrac{1}{|a_n|^{1/n}}}. $$

Theorem 5.2.7. Let f(z) = \sum_{n=0}^{\infty} a_n z^n be an entire function of finite order ρ. Then its type is
$$ \tau = \frac{1}{e\rho} \limsup_{n \to \infty} n\, |a_n|^{\rho/n}. $$
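For f(z) = e^z we have a_n = 1/n!, and the formula of Theorem 5.2.6 can be watched converging numerically. The sketch below is our own and uses log n! = lgamma(n+1).

```python
import math

# For f(z) = e^z, a_n = 1/n!, so log(1/|a_n|^(1/n)) = log(n!)/n and the
# expression of Theorem 5.2.6 becomes n*log(n)/log(n!).
for n in (10, 100, 1000, 10**4, 10**5):
    print(n, n * math.log(n) / math.lgamma(n + 1))
# The values decrease slowly toward rho = 1 (the error is O(1/log n)).
```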
Exercise 5.2.1. 1. Find the order and type of the following functions:

(a) e^z

(b) z^4 e^{z^4}

(c) sin z
5.3 Few Probable Questions

1. Show that for a transcendental entire function f with maximum modulus function M(r),
$$ \liminf_{r \to \infty} \frac{\log M(r)}{\log r} = \infty. $$
2. If, for an entire function f, there exists a positive integer k such that
$$ \lim_{r \to \infty} \frac{M(r)}{r^k} < \infty $$
holds, then show that f is a polynomial of degree at most k.

3. Show that the order ρ of an entire function with maximum modulus function M(r) is given by
$$ \rho = \limsup_{r \to \infty} \frac{\log\log M(r)}{\log r}. $$
4. Show that any polynomial is of order zero. Is the converse true? Justify.
5. Define the type τ of an entire function f having finite non-zero order ρ. Also, show that
$$ \tau = \limsup_{r \to \infty} \frac{\log M(r)}{r^{\rho}}. $$
6. Show that for two entire functions f_1 and f_2 of orders ρ_1 and ρ_2 respectively, the order of f_1 f_2 is at most max{ρ_1, ρ_2}.

7. Show that for two entire functions f_1 and f_2 of orders ρ_1 and ρ_2 respectively, the order of f_1 ± f_2 is at most max{ρ_1, ρ_2}.
Unit 6
Course Structure
• Distribution of zeros of entire functions.
• The exponent of convergence of zeros.
6.1 Introduction
The zeroes of entire functions play an important role in determining their growth rates. We will start off with
the Jensen’s theorems for analytic functions. In complex analysis, Jensen’s formula, introduced by Johan
Jensen (1899), relates the average magnitude of an analytic function on a circle with the number of its zeros
inside the circle. It forms an important statement in the study of entire functions as we will soon come to see.
Objectives
After reading this unit, you will be able to
• study Jensen’s theorems and related results
• define convergence exponent of the zeros of an entire function and deduce various related results
6.2 Distribution of Zeros of Analytic Functions

Theorem 6.2.1. (Jensen's Formula) Let f be analytic on |z| ≤ R, with f(0) ≠ 0 and f(z) ≠ 0 on |z| = R. If a_1, a_2, …, a_n are the zeros of f in |z| < R, multiple zeros being repeated, and |a_k| = r_k, then
$$ \log \frac{R^n}{r_1 r_2 \cdots r_n} = \frac{1}{2\pi} \int_0^{2\pi} \log \big| f(R e^{i\theta}) \big| \, d\theta - \log |f(0)|. $$

Proof. Let
$$ \varphi(z) = f(z) \prod_{k=1}^{n} \frac{R^2 - \bar{a}_k z}{R(z - a_k)}. \tag{6.2.1} $$
The zeros of the denominator of φ are also zeros of f of the same order. Hence the zeros of f cancel the poles a_k in the product, and so φ is analytic on |z| ≤ R. Also, φ(z) ≠ 0 in |z| ≤ R, since
$$ R^2 - \bar{a}_k z = 0 \;\Rightarrow\; z = \frac{R^2}{\bar{a}_k} \;\Rightarrow\; |z| = \frac{R^2}{|a_k|} > R, $$
because |a_k| < R for all k = 1, 2, …, n. Thus any zero of the numerator lies outside the circle |z| = R. So φ has neither zeros nor poles in |z| ≤ R, and the function log φ(z) is analytic in |z| ≤ R. Thus, by Cauchy's integral formula, we have
$$ \log \varphi(0) = \frac{1}{2\pi i} \int_{|z|=R} \frac{1}{z} \, \log\!\left[ f(z) \prod_{k=1}^{n} \frac{R^2 - \bar{a}_k z}{R(z - a_k)} \right] dz. \tag{6.2.2} $$
On |z| = R we have z = R e^{iθ}, θ ∈ [0, 2π], so that dz = i R e^{iθ}\, dθ. Also,
$$ |\varphi(z)| = |f(z)| \prod_{k=1}^{n} \left| \frac{R^2 - \bar{a}_k z}{R(z - a_k)} \right|, $$
and on |z| = R,
$$ \left| \frac{R^2 - \bar{a}_k z}{R(z - a_k)} \right| = \left| \frac{z\bar{z} - \bar{a}_k z}{R(z - a_k)} \right| = \frac{|z|}{R} \left| \frac{\bar{z} - \bar{a}_k}{z - a_k} \right| = \left| \frac{\overline{z - a_k}}{z - a_k} \right| = 1, $$
and thus |φ(z)| = |f(z)| on |z| = R.
The equation (6.2.2) becomes
$$ \log \varphi(0) = \frac{1}{2\pi i} \int_0^{2\pi} \frac{1}{R e^{i\theta}} \, \log\!\left[ f(R e^{i\theta}) \prod_{k=1}^{n} \frac{R^2 - \bar{a}_k z}{R(z - a_k)} \right] i R e^{i\theta} \, d\theta = \frac{1}{2\pi} \int_0^{2\pi} \left[ \log f(R e^{i\theta}) + \sum_{k=1}^{n} \log \frac{R^2 - \bar{a}_k z}{R(z - a_k)} \right] d\theta. \tag{6.2.3} $$
k=1
Taking real parts of equation (6.2.3), we get, by using the conditions deduced in the previous discussions,
Z 2π
1
log |φ(0)| = log |f (R eiθ )|dθ. (6.2.4)
2π 0
Since equation (6.2.1) gives
$$ |\varphi(0)| = |f(0)| \prod_{k=1}^{n} \frac{R}{|a_k|} = |f(0)| \prod_{k=1}^{n} \frac{R}{r_k} = |f(0)| \, \frac{R^n}{r_1 r_2 \cdots r_n}, $$
we have
$$ \log |\varphi(0)| = \log |f(0)| + \log \frac{R^n}{r_1 r_2 \cdots r_n}, $$
and thus equation (6.2.4) gives
$$ \log \frac{R^n}{r_1 r_2 \cdots r_n} = \frac{1}{2\pi} \int_0^{2\pi} \log \big| f(R e^{i\theta}) \big| \, d\theta - \log |f(0)|. $$
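Jensen's formula is easy to test numerically. The sketch below is our own; the zeros are an arbitrary choice inside |z| < 1, and the integral is approximated by an equally spaced Riemann sum.

```python
import numpy as np

zeros = np.array([0.3, 0.5j, -0.8])              # all inside |z| < R = 1
f = lambda z: np.prod([z - a for a in zeros], axis=0)
R = 1.0

theta = np.linspace(0.0, 2.0 * np.pi, 200000, endpoint=False)
mean_log = np.mean(np.log(np.abs(f(R * np.exp(1j * theta)))))

lhs = np.log(R**len(zeros) / np.prod(np.abs(zeros)))
rhs = mean_log - np.log(abs(f(0)))
print(lhs, rhs)                                  # the two sides agree
```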
Theorem 6.2.2. (Jensen's Inequality) Let f be analytic on |z| ≤ R, with f(0) ≠ 0 and f(z) ≠ 0 on |z| = R. If a_1, a_2, …, a_n are the zeros of f in |z| < R, multiple zeros being repeated, and |a_i| = r_i, then
$$ \frac{R^n |f(0)|}{r_1 r_2 \cdots r_n} \le M(R). $$
Proof. Let
$$ \varphi(z) = f(z) \prod_{k=1}^{n} \frac{R^2 - \bar{a}_k z}{R(z - a_k)}. $$
As before, the zeros of the denominator of φ are also zeros of f of the same order; hence the zeros of f cancel the poles a_k in the product, and φ is analytic on |z| ≤ R. Also,
$$ |\varphi(z)| = |f(z)| \prod_{k=1}^{n} \left| \frac{R^2 - \bar{a}_k z}{R(z - a_k)} \right|, $$
and, exactly as in the previous proof, on |z| = R each factor of the product has modulus 1, so that |φ(z)| = |f(z)| on |z| = R.
By the Maximum modulus theorem, |φ(z)| ≤ M(R) for |z| ≤ R. In particular, |φ(0)| ≤ M(R), that is,
$$ |f(0)| \prod_{k=1}^{n} \frac{R}{|a_k|} \le M(R) \quad\Rightarrow\quad \frac{R^n |f(0)|}{r_1 r_2 \cdots r_n} \le M(R). $$
Definition 6.2.1. Let f be analytic on |z| ≤ R, with zeros at the points a_1, a_2, …, arranged in order of non-decreasing modulus, multiple zeros being repeated. We define the function n(r) as the number of zeros of f in |z| ≤ r, r ≤ R. Evidently, n(r) is a non-negative, non-decreasing function of r which is constant in any interval that does not contain the modulus of a zero of f. Observe that if f(0) ≠ 0, then n(r) = 0 for r < |a_1|. Also, n(r) = n for |a_n| ≤ r < |a_{n+1}|.
Theorem 6.2.3. Let f be analytic on |z| ≤ R with f(0) ≠ 0. Let its zeros, arranged in order of non-decreasing modulus, be a_1, a_2, …, multiple zeros being repeated according to their multiplicities. If |a_n| ≤ r < |a_{n+1}|, then
$$ \int_0^{r} \frac{n(x)}{x} \, dx \le \log M(r) - \log |f(0)|. $$
Proof. Let |a_i| = r_i, i = 1, 2, …, and let r be a positive number such that r_N ≤ r < r_{N+1} (r ≤ R). Let x_1, x_2, …, x_m be the distinct numbers of the set E = {r_1, r_2, …, r_N}, so that x_1 = r_1 and x_m = r_N. Suppose x_i is repeated p_i times in E; then p_1 + ⋯ + p_m = N. Also set s_i = p_1 + ⋯ + p_i, i = 1, 2, …, m. We consider two cases.

Case I: Let r_N < r. Since n(x) = 0 for 0 ≤ x < x_1, and n(x) = s_i for x_i ≤ x < x_{i+1}, we have
$$ \int_0^{r} \frac{n(x)}{x} \, dx = \lim_{\epsilon \to 0} \left\{ \int_{x_1}^{x_2-\epsilon} \frac{s_1}{x}\, dx + \int_{x_2}^{x_3-\epsilon} \frac{s_2}{x}\, dx + \cdots + \int_{x_{m-1}}^{x_m-\epsilon} \frac{s_{m-1}}{x}\, dx \right\} + \int_{x_m}^{r} \frac{N}{x}\, dx $$
$$ = s_1 (\log x_2 - \log x_1) + s_2 (\log x_3 - \log x_2) + \cdots + s_{m-1} (\log x_m - \log x_{m-1}) + N (\log r - \log x_m) $$
$$ = N \log r - \big( p_1 \log x_1 + p_2 \log x_2 + \cdots + p_m \log x_m \big) = \log \frac{r^N}{x_1^{p_1} x_2^{p_2} \cdots x_m^{p_m}} = \log \frac{r^N}{r_1 r_2 \cdots r_N}. $$
By Jensen's inequality (Theorem 6.2.2),
$$ \log \frac{r^N}{r_1 r_2 \cdots r_N} \le \log M(r) - \log |f(0)|, $$
which proves the theorem in this case; the case r_N = r is treated similarly.
Note 6.2.2. Jensen's inequality is also true for entire functions, where R can be taken arbitrarily large.
6.3 Distribution of Zeros of Entire Functions

6.3.1 Convergence exponent of zeros of entire functions

Definition 6.3.1. Let f be an entire function with zeros z_1, z_2, …, arranged in order of non-decreasing modulus, and let |z_n| = r_n. The convergence exponent of the zeros of f is defined as
$$ \rho_1 = \limsup_{n \to \infty} \frac{\log n}{\log r_n}. $$

Theorem 6.3.2. Let f be an entire function with zeros z_1, z_2, …, arranged in order of non-decreasing modulus, and |z_n| = r_n. If the convergence exponent ρ_1 of the zeros of f is finite, then the series \sum_{n=1}^{\infty} \frac{1}{r_n^{\alpha}} converges when α > ρ_1 and diverges when α < ρ_1. If ρ_1 is infinite, the above series diverges for all positive values of α.
Proof. Let ρ_1 be finite and α > ρ_1. Then ρ_1 < \frac{1}{2}(ρ_1 + α). Hence, from the definition of ρ_1, we have
$$ \frac{\log n}{\log r_n} < \frac{1}{2}(\rho_1 + \alpha) \quad\text{for all large } n. $$
Hence
$$ \log n < \frac{1}{2}(\rho_1 + \alpha) \log r_n = \log r_n^{\frac{1}{2}(\rho_1+\alpha)}, $$
that is,
$$ n < r_n^{\frac{1}{2}(\rho_1+\alpha)}, \qquad\text{or}\qquad n^{\frac{2}{\rho_1+\alpha}} < r_n, $$
so that
$$ r_n^{\alpha} > n^{\frac{2\alpha}{\rho_1+\alpha}} = n^{1 + \frac{\alpha-\rho_1}{\alpha+\rho_1}} = n^{1+p}, \qquad\text{where } p = \frac{\alpha-\rho_1}{\alpha+\rho_1} > 0. $$
Hence \frac{1}{r_n^{\alpha}} < \frac{1}{n^{1+p}} for large n, and therefore \sum_{n=1}^{\infty} \frac{1}{r_n^{\alpha}} converges.
Next, let α < ρ_1. Then \frac{\log n}{\log r_n} > α for a sequence of values of n tending to ∞, that is, \log n > α \log r_n = \log r_n^{\alpha}. Hence n > r_n^{\alpha}, or \frac{1}{r_n^{\alpha}} > \frac{1}{n}, for a sequence of values of n tending to infinity. Let N be such a value of n, so that \frac{1}{r_N^{\alpha}} > \frac{1}{N}, and let m be the least integer greater than \frac{N}{2}. Then, since r_n is non-decreasing in n, we have
$$ \sum_{n=N-m}^{N} \frac{1}{r_n^{\alpha}} = \frac{1}{r_{N-m}^{\alpha}} + \frac{1}{r_{N-m+1}^{\alpha}} + \cdots + \frac{1}{r_N^{\alpha}} \ge \frac{m+1}{r_N^{\alpha}} > \frac{m+1}{N} > \frac{1}{2}. $$
Since there are such values of N as large as we please, by Cauchy's principle of convergence the series \sum_{n=1}^{\infty} \frac{1}{r_n^{\alpha}} diverges.
If ρ_1 is infinite, then for any value of α we have \frac{\log n}{\log r_n} > α for a sequence of values of n tending to infinity, that is, \log n > \log r_n^{\alpha}, i.e. n > r_n^{\alpha}, for a sequence of values of n tending to infinity, from which we may similarly conclude that the series \sum_{n=1}^{\infty} \frac{1}{r_n^{\alpha}} diverges for every positive α.
Note 6.3.1. We may also define the convergence exponent ρ_1 as the greatest lower bound of the positive numbers α for which the series \sum_{n=1}^{\infty} \frac{1}{r_n^{\alpha}} is convergent. For an entire function with no zeros we define ρ_1 = 0, and if the series diverges for all positive α, then ρ_1 = ∞.
Note 6.3.2. If ρ_1 is finite, the series \sum_{n=1}^{\infty} \frac{1}{r_n^{\rho_1}} may be either convergent or divergent. For example, if r_n = n, we have
$$ \rho_1 = \limsup_{n \to \infty} \frac{\log n}{\log r_n} = \limsup_{n \to \infty} \frac{\log n}{\log n} = 1, \qquad\text{and}\qquad \sum_{n=1}^{\infty} \frac{1}{r_n^{\rho_1}} = \sum_{n=1}^{\infty} \frac{1}{n} \text{ diverges.} $$
Again, if r_n = n (\log n)^2, then
$$ \rho_1 = \limsup_{n \to \infty} \frac{\log n}{\log n + 2 \log\log n} = \limsup_{n \to \infty} \frac{1}{1 + 2\,\frac{\log\log n}{\log n}} = 1, \qquad\text{and}\qquad \sum_{n=1}^{\infty} \frac{1}{r_n^{\rho_1}} = \sum_{n=1}^{\infty} \frac{1}{n (\log n)^2} \text{ converges.} $$
Theorem 6.3.3. If f is an entire function of finite order ρ and r_1, r_2, … are the moduli of the zeros of f, then \sum_{n=1}^{\infty} \frac{1}{r_n^{\alpha}} converges if α > ρ.
Proof. Let β be a number such that ρ < β < α. Since n(r) = O(r^{\rho+\varepsilon}) for any ε > 0, we have n(r) < A r^{\beta} for all large r, A being a constant. Putting r = r_n (n large), this inequality gives n ≤ n(r_n) < A r_n^{\beta}, that is,
$$ r_n^{\beta} > \frac{n}{A}, \qquad\text{or}\qquad r_n > \frac{n^{1/\beta}}{A^{1/\beta}}, \qquad\text{or}\qquad r_n^{\alpha} > \frac{n^{\alpha/\beta}}{A^{\alpha/\beta}} = B\, n^{\alpha/\beta}, \qquad B \text{ a constant.} $$
Hence \frac{1}{r_n^{\alpha}} < \frac{B_1}{n^{\alpha/\beta}} for large n, B_1 a constant. Since \frac{\alpha}{\beta} > 1, it follows that \sum_{n=1}^{\infty} \frac{1}{r_n^{\alpha}} converges.
Corollary 6.3.1. Since the convergence exponent ρ_1 of the zeros of f is the greatest lower bound of the positive numbers α for which \sum_{n=1}^{\infty} \frac{1}{r_n^{\alpha}} is convergent, it follows from the above theorem that ρ_1 ≤ ρ.
Remark 6.3.1. We can prove the result ρ_1 ≤ ρ without using the last theorem.

Proof. We have
$$ \rho_1 = \limsup_{n \to \infty} \frac{\log n}{\log r_n} \le \limsup_{r \to \infty} \frac{\log n(r)}{\log r} \le \limsup_{r \to \infty} \frac{\log\big( A r^{\rho+\varepsilon} \big)}{\log r} = \limsup_{r \to \infty} \left( \rho + \varepsilon + \frac{\log A}{\log r} \right) = \rho + \varepsilon, $$
A being a constant, for any ε > 0. Hence ρ_1 ≤ ρ.
Note 6.3.3. The convergence exponent may be 0 or ∞. For example, if r_n = e^n, then
$$ \rho_1 = \limsup_{n \to \infty} \frac{\log n}{n} = 0. $$
Also, if r_n = \log n, then
$$ \rho_1 = \limsup_{n \to \infty} \frac{\log n}{\log\log n} = \infty. $$
We may also have ρ_1 < ρ. For example, if f(z) = e^z, then ρ = 1 and ρ_1 = 0, since f has no zeros. For sin z or cos z, ρ = ρ_1 = 1.
Theorem 6.3.4. Let f be an entire function of finite order. If the convergence exponent ρ_1 of the zeros of f is greater than zero, then f has infinitely many zeros.

Proof. If possible, suppose f has only finitely many zeros, and let r_1, r_2, …, r_N be their moduli arranged in non-decreasing order. The series \sum_{n=1}^{N} \frac{1}{r_n^{\alpha}}, being a sum of finitely many terms, converges for every positive value of α. It follows that ρ_1 = 0, which contradicts our assumption. Hence f has infinitely many zeros.
π −π 3π −3π
Solution. The zeros of cos z are , , , , . . .. Now,
2 2 2 2
∞ α α α α
X 1 2 2 2 1 2 1
= + + · α+ · α + ···
rn α π π π 3 π 3
n=1
α
2 1 1
= 2 1 + α + α + ···
π 3 5
1 1 1
The series α
+ α + α + · · · converges when α > 1 and diverges when α < 1. Hence, the lower bound
1 3 5
∞
X 1
of positive numbers α for which the series converges us 1. Hence, ρ1 = 1.
r α
n=1 n
π π π 3π
Aliter: The zeros of cos z are (2n + 1) , n = 0, ±1, ±2, . . .. Let z1 = , z10 = − , z2 = ,
2 2 2 2
3π π π π
z20 = − , . . . , zn = (2n − 1) , zn0 = −(2n − 1) , . . .. Hence, r1 = |z1 | = |z10 | = ,
2 2 2 2
0 3π 0 π
r2 = |z2 | = |z2 | = , . . ., rn = |zn | = |zn | = (2n − 1) , . . .
2 2
Hence,
log n
ρ1 = lim sup
n→∞ log rn
log n
= lim sup
n→∞ log(2n − 1)π/2
log n
= lim sup
n→∞ log(2n − 1) + log π/2
log n
= lim sup
n→∞ log n(2 − 1/n) + log π/2
1
= lim sup log(2−1/n)
= 1.
n→∞ 1 +
log n + log π/2
log n
Theorem 6.3.5. If f is an entire function having no zeros, then f is of the form f (z) = eg(z) , where g(z) is
an entire function.
6.4. FEW PROBABLE QUESTIONS 49
f 0 (z)
h(z) = (6.3.3)
f (z)
is also an entire function. Integrating (6.3.3) along any path joining the two points z0 and z, we get
Z z Z z 0
f (z)
h(z)dz = dz = log f (z) − log f (z0 ),
z0 z0 f (z)
The right hand side of equation (6.3.4) is an entire function, say g(z). Hence,
3. For a function f analytic in |z| ≤ R, f (0) 6= 0, if a1 , a2 , . . . are its zeros, arranged in the order
of non-decreasing modulus, multiple zeros repeated according to their multiplicities, show that, for
|an | ≤ r < |an+1 |, Z x
n(x)
dx ≤ log M (r) − log |f (0)|.
0 x
4. Show that for an entire function f of finite order ρ and f (0) 6= 0, n(r) = O(rρ+ ) for any > 0 and
for sufficiently large values of r.
5. Define convergence exponent ρ1 of the zeros of an entire function. Show that if ri be the moduli of
∞
X 1
the zeros of an entire function f , arranged in order of non-decreasing modulus, then the series
rnα
n=1
converges for α > ρ1 and diverges for α < ρ1 .
6. Show that an entire function f having no zeros, is of the form f (z) = eg(z) , where g(z) is an entire
function.
Unit 7
Course Structure
• Infinite products and infinite product of functions
7.1 Introduction
In this unit, our main objective is to deduce Weierstrass’ factorization theorem, which asserts that every entire
function can be represented as a (possibly infinite) product involving its zeroes. The theorem may be viewed
as an extension of the fundamental theorem of algebra, which asserts that every polynomial may be factored
into linear factors, one for each root.
The theorem, which is named for Karl Weierstrass, is closely related to a second result that every sequence
tending to infinity has an associated entire function with zeroes at precisely the points of that sequence.
A generalization of the theorem extends it to meromorphic functions and allows one to consider a given
meromorphic function as a product of three factors: terms depending on the function’s zeros and poles, and
an associated non-zero analytic function.
We will start off with the infinite products of complex numbers and study the conditions required for their
convergence and thereafter establish the results for factorisation of entire functions in the upcoming units.
Objectives
After reading this unit, you will be able to
• define the conditions of convergence of infinite products and also the infinite product of functions
• learn about the factorisations of entire functions and deduce the Weierstrass’ factorisation theorem
50
7.2. INFINITE PRODUCTS 51
∞
Y
Proof. Let un = u, then Pn = u1 u2 · · · un → u 6= 0 as n → ∞, and Pn−1 = u1 u2 · · · un−1 → u. Hence
n=1
Pn lim Pn u
lim un = lim = n→∞ = = 1.
n→∞ n→∞ Pn−1 lim Pn−1 u
n→∞
∞
Y n
Remark 7.2.1. The condition is however, not sufficient. For example, if we take the product , then
n+1
n=1
we have,
1 2 n−1 n 1
Pn = · ··· · = → 0 as n → ∞
2 3 n n+1 n+1
n
and thus the product is divergent. But, lim un = = 1. So, the condition in the preceding theorem is
n→∞ n+1
not sufficient.
In view of the necessary condition for convergence, we write the general term of the product (7.2.1) in the
∞
Y
form un = (1 + an ), (an 6= 1), so that a necessary condition for convergence of the product (1 + an ) is
n=1
lim an = 0.
n→∞
52 UNIT 7.
∞
Y
Definition 7.2.1. (Absolute Convergence) An infinite product (1 + an ) is said to be absolutely conver-
n=1
∞
Y
gent if the product (1 + |an |) is convergent.
n=1
In case of absolute convergence, the factors of the product can be rearranged arbitrarily without affecting
the convergence of the product or changing the value of the product.
∞
Y ∞
X
Theorem 7.2.2. The infinite product (1 + an ), (an 6= 1), converges if and only if the series log(1 + an )
n=1 n=1
converges, where each logarithm has its principal value. Also,
∞ ∞
!
Y X
(1 + an ) = exp log(1 + an ) .
n=1 n=1
n
Y n
X
Proof. Let Pn = (1 + ak ), Sn = log(1 + ak ) and lim Sn = S. Since ez+w = ez · ew for all z, w ∈
n→∞
k=1 k=1
C, we have,
As Pn → P 6= (−∞, 0], we have, Pn ∈ C \ (−∞, 0] for large n and since log z is continuous at P ,
log Pn → log P as n → ∞. Since eSn = Pn , we can write
Then,
log(1 + an+1 ) = Sn+1 − Sn = log Pn+1 − log Pn + 2πi(kn+1 − kn ), (7.2.3)
where kn+1 and kn are integers. Equating the imaginary parts on both sides of (7.2.3), we get
∞
Y
Theorem 7.2.3. If an ≥ 0 for all n ∈ N, then the product (1 + an ) converges if and only if the series
n=1
∞
X
an converges.
n=1
n
Y n
X
Proof. Let Pn = (1 + ak ), and Sn = ak . Since an ≥ 0 for all n, {Pn } and {Sn } are both increasing
k=1 k=1
sequences. Since 1 + x ≤ ex for x ≥ 0, we have, a1 < 1 + a1 ≤ ea1 . Hence,
that is, Sn < Pn ≤ eSn . Thus, by the monotonic bounded principle, the series and the product are both
convergent or both divergent according as both are bounded or unbounded. Let Sn be bounded. Then Sn ≤ M
for some M > 0 and for all n. Now, Pn ≤ eSn and so Pn ≤ eM for all n, that is Pn is bounded. Thus, both the
Y∞
series and the product are either both bounded or both unbounded. Hence the product (1 + an ) converges
n=1
∞
X
if and only if the series an converges.
n=1
∞
Y ∞
X
Corollary 7.2.1. The product (1 + an ) is absolutely convergent if and only if the series an is abso-
n=1 n=1
lutely convergent.
∞
Y
Proof. Since |an | ≥ 0 for all n ∈ N, so by the previous theorem, the product (1 + |an |) is convergent if
n=1
∞
X
and only if the series |an | is convergent. Hence the result.
n=1
∞
X ∞
X ∞
X
Theorem 7.2.4. The three series |an |, | log(1 + an )| and log(1 + |an |) either converge or diverge
n=1 n=1 n=1
together.
log(1 + z) log(1 + z)
Proof. We know that lim = 1. Hence for with 0 < < 1, we get, −1 <
z→0 z z
whenever z → 0. The triangle inequality shows that
log(1 + z)
(1 − ) < < 1 + , whenever z → 0
z
that is,
(1 − )|z| < | log(1 + z)| < (1 + )|z|, whenever z → 0. (7.2.5)
log(1 + t)
Similarly, lim = 1 for 0 < t ≤ 1. Then,
t→0 t
(1 − )t < log(1 + t) < (1 + )t, whenever t → 0. (7.2.6)
∞
X
From (7.2.7) and (7.2.8), by comparison test, for any sequence {an } convergent to 0, the three series |an |,
n=1
∞
X ∞
X
| log(1 + an )| and log(1 + |an |) converge or diverge together.
n=1 n=1
∞
Y
And, the infinite product (1 + fn (z)) is said to be uniformly convergent to a function f (z) in E if the
n=1
n
Y
sequence {Pn (z)} of partial products, defined by Pn (z) = (1 + fk (z)) is uniformly convergent to the
k=1
function f (z) in E, with f (z) 6= 0 in E.
Theorem 7.2.5. Let every term of the sequence of functions {fn (z)} be analytic in a region G and suppose
∞
X
the infinite series log(1 + fn (z)) is uniformly convergent on every compact subset of G (in particular,
n=1
∞
Y
none of the terms fn (z) can take the value −1 at any point of G). Then the infinite product (1 + fn (z))
n=1
converges uniformly on every compact subset of G.
Theorem 7.2.6. (M test) Let every term of the sequence of functions {fn (z)} be analytic in a region G and
suppose none of the terms fn (z) takes the value −1 at any point of G. Moreover, suppose that there is a
∞
X
convergent series Mn , whose terms are non-negative constants, such that |fn (z)| ≤ Mn for all z ∈ G,
n=1
∞
Y
and for all n ≥ N , N being a positive integer. Then the infinite product (1 + fn (z)) converges uniformly
n=1
and absolutely to a non-vanishing analytic function f (z) on every compact subset of G.
z2 zp
E(z, 0) = 1 − z, E(z, p) = (1 − z) exp z + + ··· , p = 1, 2, . . .
2 p
are called Weierstrass’ primary factors. Each primary factor is an entire function with only one zero, a simple
zero at z = 1.
1
Lemma 7.3.1. If |z| ≤ , | log E(z, p)| ≤ 2|z|p+1 .
2
Proof. First let |z| < 1. Then,
z2 zp
log E(z, p) = log(1 − z) + z + + ···
2 p
2 p p+1 z2 zp
z z z
= −z − − ··· − − ··· + z + + ···
2 p p+1 2 p
z p+1 z p+2
= − − − ···
p+1 p+2
(7.3.1)
Now if |z| ≤ 21 ,
Theorem 7.3.1. Weierstrass’ Factorization theorem: Let {an } be an arbitrary sequence of complex num-
bers whose only limit point is ∞, that is, an → ∞ as n → ∞. Then it is possible to construct an entire
function f (z) with zeros precisely at these points.
Proof. We may suppose that the origin is not a zero of the entire function f (z) to be constructed so that
an 6= 0 ∀n. Because if origin is a zero of f (z) of order n, we need only multiply the constructed function by
56 UNIT 7.
z m . We also arrange the zeros in order of non-decreasing modulus(if several distinct points an have the same
modulus, we take them in any order) so that |a1 | ≤ |a2 | ≤ · · · .
Let |an | = rn . Since rn → ∞, we can always find a sequence of positive integers {pn } such that
∞
r pn
X
converges for all r > 0. In fact, if pn = n, for any given value of r, the inequality
rn
n=1
n
r 1
<
rn 2n
holds for all sufficiently large values of n and hence the series is convergent.
Next, we take an arbitrary positive number R and choose the integer N such that rN ≤ 2R < rN +1 . Then,
for n > N and |z| ≤ R, we have
z R R 1
≤ ≤ < .
an rn rN +1 2
pn
z R
From the previous lemma, we get log E , pn − 1 ≤ 2 . By Weierstrass’ M-test, the series
an rn
z
log E , pn − 1 converges absolutely and uniformly in |z| ≤ R. This implies that the infinite product
an
∞
Y z
E , pn − 1 converges absolutely and uniformly in the disk |z| ≤ R, however large R may be.
an
n=1
∞
Y z
Hence, the above product represents an entire function, say G(z). Thus G(z) = E , pn − 1 with
an
n=1
the same value of R. We choose another integer K such that rK ≤ R < rK+1 . Then each of the functions of
m
Y z
the sequence E , pn − 1 , m = K + 1, K + 2, . . . vanishes at the points a1 , a2 , . . . ak and nowhere
an
n=1
else in |z| ≤ R. Hence by Hurwitz’s theorem(Let each function of the sequence {fn } be analytic in the
closed region D and bounded by a closed contour γ. The sequence {fn } converges uniformly to f in D. If
f (z) 6= 0 on γ, then f and the functions fn , for all large values of n have the same number of zeros within
γ. Also, a zero of f is either a zero of fn for all sufficiently large values of n, or, else is a limit point of the
set of zeros of the functions of the sequence), the only zeros of G in |z| ≤ R are a1 , a2 , . . . , aK . Since R is
arbitrary this implies that the only zeros of G are the points of the sequence {an }. Thus, G is our required
entire function. Now, if the origin is a zero function of order m of the required entire function f (z), then f (z)
is of the form:
f (z) = z m G(z)
Note 7.3.1. Since there are many possible sequences {pn } in the construction of the function G(z) and
ultimately of f (z), the function f (z) is not uniquely determined. Again, for any entire function g(z), eg(z) is
also an entire function without any zeros. Hence, the general form of the required entire function f (z) is of
the form:
∞
m g(z) m g(z)
Y z
f (z) = z e G(z) = z e E , pn − 1 .
an
n=1
Theorem 7.3.2. If f is an entire function and f (0) 6= 0, then f (z) = f (0)G(z) eg(z) , where G(z) is a product
of primary factors and g(z) is an entire function.
7.3. FACTORIZATION OF ENTIRE FUNCTIONS 57
f 0 (z) G0 (z)
Proof. We form G(z) as in the previous theorem from the zeros of f . Let φ(z) = − . Then φ is
f (z) G(z)
an entire function, since the poles of one term are cancelled by those of the other. Let
Z z
g(z) = φ(t)dt.
0
Then g(z) is also an entire function. Now,
Z z 0
f (t) G0 (t)
g(z) = − dt
0 f (t) G(t)
Z z
d
= (log f (t) − log G(t)) dt
0 dt
= [log f (t) − log G(t)]z0
= log f (z) − log f (0) − log G(z) + log G(0)
= log f (z) − log f (0) − log G(z) [since log G(0) = 1]
f (z)
= log .
f (0)G(z)
Hence,
f (z)
eg(z) = ⇒ f (z) = f (0)G(z) eg(z) .
f (0)G(z)
Theorem 7.3.3. If the real part of an entire function f satisfies the inequality Re f < rk+ for any > 0 and
for a sequence of values of r tending to infinity, then f is a polynomial of degree not exceeding k.
∞ Z
X 1 f (z)
Proof. By Taylor’s theorem, f (z) = an z n , where an = dz, γ : |z| = r, r being a positive
2πi γ z n+1
n=0
number. On γ, z = r eiθ , 0 ≤ θ ≤ 2π. Now, when n > 0,
Z Z X ∞
f (z) dz
n+1
dz = am z m n+1
γ z γ m=0 z
∞
XZ dz
= am z m n+1
z
m=0 γ
∞ Z 2π
X am rm e−imθ ·ir eiθ
= dθ
m=0 0
rn+1 ei(n+1)θ
∞ Z 2π
X
= am rm−n e−i(m+n)θ idθ = 0,
m=0 0
∞
X
the term by term integration is valid since the series am z m converges uniformly. Hence, for n > 0,
m=0
Z Z
1 f (z) 1 f (z)
an = n+1
dz + dz
2πi γ z 2πi γ z n+1
Z
1 dz
= f (z) + f (z) n+1
2πi γ z
1 2π Re f (r eiθ )
Z Z
1 dz
= Re f (z) n+1 = dθ.
πi γ z π 0 rn einθ
58 UNIT 7.
Hence,
2π 2π
Re f (r eiθ )
Z Z
n 1 1
an r = inθ
dθ. ⇒ |an |rn ≤ |Re f (r eiθ )|dθ, for n > 0.
π 0 e π 0
Also, Z Z 2π
1 f (z) 1
a0 = dz = f (r eiθ )dθ
2πi γ z 2π 0
and so, Z 2π
1
Re a0 = Re f (r eiθ )dθ.
2π 0
Hence,
Z 2π
1n
2Re a0 + |an |r ≤ (|Re f | + Re f ) dθ = 2Re f ; if Re f > 0
π 0
= 0; if Re f ≤ 0.
Since Re f < rk+ for any > 0 and for a sequence of values {rm } of r tending to infinity, we have,
n k+ k+−n −n
2Re a0 + |an |rm < 4rm ⇒ |an | < 4rm − 2Re a0 rm
for any > 0. Taking limit as rm → ∞, we have, an = 0 when n > k and so, f is a polynomial of degree
not exceeding k.
Theorem 7.3.4. The function ef (z) is an entire function of finite order with no zeros if and only if f (z) is a
polynomial.
Proof. We already know that ef (z) is an entire function with no zeros if and only if f is an entire function.
Moreover, if f is a polynomial of degree k, then ef (z) is of finite order k.
Conversely, we assume that ef (z) is an entire function with finite order ρ and without any zeros. Then, f is
ρ+
an entire function. Also, ef (z) < er , for all r > r0 , r = |z| and > 0 is arbitrary, that is,
ρ+
eRe f < er ⇒ Re f < rρ+ , ∀r > r0 and > 0.
Hence, by the previous theorem, f is a polynomial of degree not exceeding ρ.
∞
Y
1. When is an infinite product un , un 6= 0 for all n, said to be convergent. Deduce a necessary
n=1
condition for the convergence of the product.
∞
Y
2. Show that if an infinite product un , un 6= 0 for all n, is convergent, then lim un = 1. Is the
n→∞
n=1
condition sufficient? Justify your answer.
∞
Y ∞
X
3. Show that the infinite product (1 + an ), (an 6= −1) converges if and only if the series log(1 + an )
n=1 n=1
converges and
∞ ∞
!
Y X
(1 + an ) = exp log(1 + an ) .
n=1 n=1
7.4. FEW PROBABLE QUESTIONS 59
∞
Y ∞
X
4. Show that the infinite product (1 + an ), (an ≥ 0) converges if and only if the series an con-
n=1 n=1
verges.
6. If for an entire function f satisfies the inequality Re f < rk+ for any > 0 and a sequence of values
of r tending to infinity, then show that f is a polynomial of degree not exceeding k.
Unit 8
Course Structure
• Canonical product, Borel’s first theorem. Borel’s second theorem (statement only),
8.1 Introduction
This unit deals with the factorization of entire functions of finite order with the help of the newly defined
canonical product. Hence we have deduced several results related to the relationship between the order of an
entire function f and the convergence exponent of its zeros culminating in the Picard’s little theorem. Little
Picard’s theorem, named after Charles E. Picard, is an important theorem that puts light on the range set of a
non-constant entire function. We will discuss few examples in this light.
Objectives
After reading this unit, you will be able to
60
8.2. CANONICAL PRODUCT 61
∞
Y z
form the infinite product G(z) = E , p . By Weierstrass’ factorization theorem, G(z) represents an
an
n=1
entire function having zeros precisely at the points an . We call G(z) as the canonical product formed with the
sequence {an } of zeros of f and the integer p is called its genus.
If z = 0 is a zero of f of order m, then the canonical product is z m G(z).
Observe that, if the convergence exponent ρ1 6= an integer, then p = [ρ1 ] and if ρ1 = an integer, then
∞
X 1
1. p = ρ1 , if ρ1 is divergent;
n=1
rn
∞
X 1
2. p = ρ1 − 1, if is convergent.
n=1
rnρ1
Theorem 8.2.1. (Borel’s theorem) The order of a canonical product is equal to the convergence exponent of
its zeros.
Proof. Let
∞
Y z
G(z) = E ,p
an
n=1
be a canonical product with zeros at the points a1 , a2 , . . . and genus p. Let ρ1 and ρ be the convergence
exponent and order respectively of G(z). Since ρ1 ≤ ρ for any entire function, we only need to prove that
ρ ≤ ρ1 for G(z). Let |an | = rn and K(> 1) be a constant. Then for |z| = r,
X z X z
log |G(z)| = log E ,p + log E , p = Σ1 + Σ2 (say).
an an
rn ≤Kr rn >Kr
r 1
We first estimate Σ2 . In Σ2 , < < 1. Hence,
rn K
p+1 p+2
z 1 z 1 z
log E ,p = − − − ···
an p + 1 an p + 2 an
( )
r p+1
p+2
z 1 r
⇒ log E ,p < + + ···
an p+1 rn rn
p+1
r p+1
1 rn r
= r <A ,
p+1 1 − rn rn
62 UNIT 8.
A being a constant. Also, we know that log |f | = Re(log f ) ≤ |log f | [since log f (z) = log |f (z)|+i arg f (z)],
for any function f .
Hence,
! !
X
z
X r p+1 X 1
= O rp+1 = O rp+1
Σ2 = log E ,p = O p+1
an rn rn
rn >Kr rn >Kr rn >Kr
" #
X 1
since is convergent by the definition of p and converges to B(say) . If p + 1 = ρ1 , then
rn p+1
r n >Kr
ρ1
P
2 = O (r ). Otherwise, p + 1 > ρ1 + , > 0 being small enough. Then
h i
since rn >Kr rn−ρ1 − is convergent . Thus, in any case,
P
Σ2 = O rρ1 + .
(8.2.1)
P P r 1
Next, we estimate 1. In 1, ≥ . Now,
rn K
1 z p
z z z
log E ,p = log 1 − exp + ...
an an an p an
p
r r 1 r
≤ log 1 + + + ··· .
rn rn p rn
p
r r |x| z r
Also, log 1 + < [since 1+|x| < e , hence log (1 + |x|) < |x|]. Hence, log E ,p < A
rn rn a n rn
p
X r X
= O r p rn −p
P
where A depends only on K. Hence 1 = O
rn
rn ≤Kr rn ≤Kr
X X
= O r p rnρ1 +−p · rn−ρ1 − = O rp (Kr)ρ1 +−p · rn−ρ1 − = O rρ1 + . Using this and
rn ≤Kr rn ≤Kr
equation (8.2.1), we get, log |G(z)| = O(rρ1 + ). This implies that ρ ≤ ρ1 . Combining, we have ρ = ρ1 .
is divergent, but
∞ ∞
X 1 1 X 1
=
rn2 π2 n2
n=1 n=1
8.3. HADAMARD’S FACTORIZATION THEOREM AND RESULTS 63
is convergent. Hence the genus of the required canonical product is 1. Hence the canonical product is given
by
−1 ∞
Y z z Y z z
G(z) = 1− e an · 1− e an
n=−∞
an an
n=1
∞
Y z z z −z
= 1− e nπ 1 + e nπ
nπ nπ
n=1
∞
z2
Y
= 1− 2 2 .
n π
n=1
where Q(z) is an entire function and G(z) is the canonical product with genus p formed with the zeros
a1 , a2 , . . . of f . Since ρ is finite, we need to show that Q(z) is a polynomial of degree less than or equal to ρ.
Let m = [ρ]. Then genus p ≤ m. Taking logarithms on both sides of (8.3.1), we get
Now, Q(z) will be a polynomial of degree n at most if we can show that Q(m+1) (z) = 0.
Let
z −1
f (z) Y
gR (z) = 1− .
f (0) an
|an |≤R
Y z −1
Then since f (z) is entire and f (0) 6= 0 and 1− gR (z) cancels with the factors in f (z), so
an
|an |≤R
z
gR (z) is an entire function and gR (z) 6= 0 in |z| ≤ R. For |z| = 2R and |an | ≤ R, we have, 1 − ≥ 1.
an
Hence,
|f (z)|
< A exp (2R)ρ+ ,
|gR (z)| ≤ for |z| = 2R, A = constant.
|f (0)|
By maximum modulus theorem,
Let hR (z) = log gR (z), the logarithm being determined such that hR (0) = 0. Then hR (z) is analytic in
|z| ≤ R. Now, from (8.3.4), we have
Hence,
Re hR (z) < KRρ+ , K = constant. (8.3.5)
Hence, from the second corollary of Borel Caratheodory theorem, we have,
Hence,
dm f 0 (z)
(m+1)
X 1
hR (z) = + m! .
dz m f (z) (an − z)m+1
|an |≤R
R
and so also for |z| < by maximum modulus theorem. The first term on the right hand side of (8.3.7)
2
tends to zero as R → ∞ if > 0 is small enough, since m + 1 > ρ. Also, the second term tends to 0
8.3. HADAMARD’S FACTORIZATION THEOREM AND RESULTS 65
∞
X 1 X 1
since m+1
is convergent. In fact, becomes the remainder term for large R. Hence,
|an | |an |m+1
n=1 |an |>R
Q(m+1) (z) = 0, since Q(m+1) is independent of R. Thus, Q(z) is a polynomial of degree not greater than
ρ.
Theorem 8.3.2. If f is an entire function of order ρ and ρ1 is the convergence exponent of its zeros, then
ρ1 = ρ if ρ is not an integer.
Proof. Since the zeros of f coincide with the zeros of its canonical product G(z), we can take ρ1 to be
the convergence exponent of the zeros of G(z). By Hadamard’s factorization theorem, we have, f (z) =
eQ(z) G(z), where Q(z) is a polynomial of degree not exceeding ρ. In any case, ρ1 ≤ ρ. Suppose if possible
ρ1 < ρ. Also, if degree of Q(z) = q, then eQ(z) is of order q ≤ ρ. In this case, q < ρ, since q is an integer
and ρ is not an integer. Thus, f is the product of two entire functions each of order less than ρ. Hence, order
of f is less than ρ which contradicts the given hypothesis. Hence, ρ1 = ρ.
Theorem 8.3.3. Let f be an entire function with order ρ and g be an entire function with order ≤ ρ. If the
f (z)
zeros of g are all zeros of f , then H(z) = is an entire function of order ρ at most.
g(z)
Proof. Since the zeros of g are all zeros of f , H(z) is an entire function. Let G1 (z) and G2 (z) be the canonical
products formed with the zeros of f and g respectively. By Hadamard’s factorization theorem, we have
where Q1 (z) and Q2 (z) are polynomials with degrees less than or equal to ρ. Then,
G1 (z)
H(z) = G(z) eQ1 (z)−Q2 (z) , where G(z) =
G2 (z)
is the canonical product formed with the zeros of G1 (z) that are not zeros of G2 (z). Since the convergence
exponent of a sequence is not increased by removing some of the terms, the convergence exponent and hence
the order of G(z) does not exceed ρ. Also, Q1 (z) − Q2 (z) is a polynomial of degree ≤ ρ. Hence, order of
eQ1 (z)−Q2 (z) is ≤ ρ. Thus, H is the product of two entire functions, each of order ≤ ρ. Hence, order of H(z)
is ρ at most.
Theorem 8.3.4. (Picard’s little theorem) An entire function of finite order takes any complex number except
at most one number.
Proof. Let f be an entire function of finite order. If possible, let f do not take two values a and b. Then
f (z)−a 6= 0 and f (z)−b 6= 0 for all z ∈ C. Thus, there exists an entire function g such that f (z) − a = eg(z) .
Since f is of finite order, the function f (z) − a is also of finite order. By Hadamard’s factorization theorem,
g(z) must be a polynomial. Now,
Hence, eg(z) 6= b − a for all z ∈ C. This is a contradiction, since g(z) being a polynomial, by fundamental
theorem of algebra, g(C) = C. Hence the theorem.
Example 8.3.1. 1. The most common example is the non-constant entire function ez which omits only the
value 0.
66 UNIT 8.
2. Any non-constant polynomial f takes all the values of the finite complex plane. This is due to the fact
that for any complex number a, the function f (z) − a is also a polynomial in C having zero in C by the
fundamental theorem of algebra.
Theorem 8.3.5. Let f be an entire function of finite order ρ which is not an integer. Then f has infinitely
many zeros.
Proof. Let f be an entire function of finite order ρ which is not an integer. If possible, suppose that the zeros
of f are {a1 , a2 , . . . , an }, finite in number, counted according to multiplicities. Then f (z) can be expressed
as f (z) = (z − a1 )(z − a2 ) · · · (z − an ) eg(z) , where g(z) is an entire function. By Hadamard’s factorization
theorem, g(z) is a polynomial whose degree is less than or equal to ρ. Clearly, f (z) and eg(z) are of same
order. But the order of eg(z) is exactly the degree of g(z), which is an integer. This implies that ρ is an integer.
This is a contradiction and hence the result.
∞
Y z 1
Example 8.3.2. If α > 1, we show that the entire function 1 − α is of finite order . Firstly, the
n α
n=1
∞ ∞
Y z Y z
infinite product 1 − α is of the form E , 0 . Here, p = 0 is the least non-negative integer for
n nα
n=1 n=1
which,
∞ ∞
X 1 X 1
p+1 =
rn nα
n=1 n=1
is convergent since α > 1. Hence, the given infinite product is a canonical product. So, the order of
∞
Y z
1 − α is the same as the convergence exponent of its zeros. Here, zeros are an = nα . Hence,
n
n=1
rn = |an | = nα . Hence the convergence exponent
log n log n 1
ρ1 = lim sup = lim sup = .
n→∞ log rn n→∞ α log n α
(a) An entire function that omit two values on the complex plane;
(b) An entire function that omit values on the negative real axis;
(c) An entire function whose range set is a dense set;
(d) An entire function whose range set is not a dense set;
(e) An entire function whose range set is a straight line;
(f) An entire function whose range set is B(0; R), R > 0 is a real number;
(g) An entire function for which M (r1 ) = M (r2 ) for r1 < r2 ;
8.4. FEW PROBABLE QUESTIONS 67
1. Define canonical product of the zeros of an entire function. Find the canonical product of the function
sin z.
4. Show that for an entire function f of finite order ρ, which is not an integer, and convergence exponent
ρ1 , ρ = ρ1 .
5. State the Hadamard’s factorization theorem. If f and g be entire functions of order ρ and ρ0 respectively,
f (z)
such that ρ0 ≤ ρ, and also if the zeros of g are all zeros of f , the show that the function H(z) =
g(z)
is an entire function of order at most ρ.
7. Show that an entire function f of finite order ρ, which is not an integer, has infinitely many zeros.
8. State Picard’s Little theorem. Hence prove the Fundamental theorem of Algebra.
Unit 9
Course Structure
• Multiple-valued functions,
√
• Riemann surface for the functions z , log z.
9.1 Introduction
Multifunctions are hard to avoid. Many complex functions, like the complex exponential, are not globally
one-to-one. We may view such a function as having an inverse, so long as we allow the inverse to be a mul-
tifunction. Constructing, at least locally, a well-behaved functional inverse will involve extracting a suitable
value from this multifunction at each point of the domain. But in order to treat more complicated exam-
√
ples, such as z, z 2/3 , etc., we begin a deeper analysis of many-valuedness. which will enable us to handle
logarithms and powers of rational functions.
Objectives
After reading this unit, you will be able to
• discuss in detail the argument function and its rle in the many-valuedness of complex functions
• discuss the Riemann surfaces of the square root function and logarithm function
68
9.3. ARGUMENT AS A FUNCTION 69
Figure 9.1
√
For example, if we consider the function 3 z, then it has three different values (if z is non-zero) for a single
value of z and hence is a three-valued multifunction.
√ √
Let us see how. The function z 7→ 3 z, that is, if w = 3 z, and a is a solution of this equation, let us find
the other two solutions too. If z = r eiθ orbits round an origin-centred circles, z 3 = r3 e3iθ orbits three times
faster executing a complete revolution each time z executes one third of a revolution. Put differently, reversing
the direction of the mapping divides the angular speed by three. This is an essential ingredient, which we will
now study in detail.
√ √ √
Writing z = r eiθ , we have 3 z = 3 r ei(θ/3) . Here, 3 r uniquely defined as the real cube root of the
length of z; the sole source of the three-fold ambiguity in the formula is the fact that there are infinitely many
different choices for the angle θ of a given point z.
Think of z as a moving point that is initially at z = p. If we arbitrarily choose θ to be the angle φ as shown
√
in fig. 9.1, then 3 p = a. As z gradually moves away from p, θ gradually changes from its initial value φ,
√ √
and 3 z = 3 r ei(θ/3) gradually moves away from its initial position a, but in a completely determined way-its
distance from the origin is the cubic root of the distance of z, and its speed of movement is one-third that of z.
The bracket notation [arg z] is designed to emphasize that the argument of z is a set of numbers, not a single
number. In fact, [arg z] is an infinite set, consisting of all numbers from θ + 2kπ for k ∈ Z, where θ is any
fixed real number such that eiθ = z/|z|.
The restriction −π < θ ≤ π, or alternatively, 0 ≤ θ < 2π, uniquely determines θ in the equation
0 6= z = |z| eiθ .
Now, consider what happens to a principal value determination of argument Arg z = θ, where z = |z| eiθ ,
0 ≤ θ < 2π, where z performs a complete anticlockwise circuit round the unit circle starting from z = 1,
with θ ∈ [0, 2π). Within the chosen range [0, 2π), θ has value 0 at the start and increases steadily towards 2π
as z moves round the circle until it arrives back at 1, when θ must jump back to 0. Thus, Arg z has a jump
discontinuity. On the other hand, if we insist on choosing θ so that it varies continuously with z, then its final
70 UNIT 9.
value has to be 2π, a different choice from [arg 1] from that we made at the start. We can give a more formal
treatment of the issues just discussed.
We show that there is no way to impose a restriction which selects θ(z) ∈ [arg z] for all z ∈ C \ {0}, so
θ : z 7→ θ(z) is continuous as a function of z. We assume for a contradiction that such a continuous function
θ does not exist and consider
1
θ(eit ) + θ(e−it ) , t ∈ R.
k(t) =
2π
Then k is continuous and
1
k(t) = ((t + 2mt π) + (−t + 2nt π)) , mt , nt ∈ Z,
2π
so k takes only integer values. Also k(0) is even and k(π) is odd, so k is non-constant. This contradicts the
intermediate value theorem from real analysis.
This result has implications for other multifunctions. For example, it tells us that there cannot be a con-
tinuous logarithm in C \ {0}: if there were one, then its imaginary part - an argument function - would be
continuous too.
Figure 9.2
√
3
loop A (finally returning to p), z travels along the illustrated closed loop and returns to its original value a.
9.4. BRANCH POINTS 71
√
However, if z instead travels along the closed loop B, which goes round the origin once, then 3 z does not
return to its original value, but instead it ends up at a different cube root of p, namely b. Note that the detailed
shape of B is irrelevant, all that matters is that it encircles the origin once. Similarly, if z travels along C,
√
encircling the origin twice, then 3 z ends up at c, the third and final cube root of p. Clearly, if z ere to travel
√
along the loop (not shown) that encircled the origin three times, then 3 z would return to the original value a.
√ √
The premise for this picture of z 7→ 3 z was the arbitrary choice of 3 p = a, rather than b or c. If we
√
instead chose 3 p = b, then the orbits on the left of fig. 9.2 would simply be rotated by 2π/3. Similarly, if we
√
chose 3 p = c, then the orbits would be rotated by 4π/3.
√
The point z = 0 is the branch point of 3 z. More generally, let f (z) be a multifunction and let a = f (p) be
one of its values at some point z = p. Arbitrarily choosing the initial position of f (z) to be a, we may follow
the movement of f (z) as z travels along a closed loop beginning and ending at p. When z returns to p, f (z)
will either return to a or it will not. A branch point z = q of f is a point such that f (z) fails to return to a as
z travels along any loop that encircles q once.
√
Returning to the specific example f (z) = 3 z, we have seen that if z executes three revolutions round the
branch point at z = 0 then f (z) returns to its original value. If f (z) were an ordinary, single-valued function
then it would return to its original value after only one revolution. Thus, relative to an ordinary function, two
extra revolutions are needed to restore the original value of f (z). We summarize this by saying that 0 is a
√
branch point of 3 z of order two.
Definition 9.4.1. If q is a branch point of some multifunction f (z), and f (z) first returns to its original value
after N revolutions round q, then q is called an algebraic branch point of order (N − 1); an algebraic branch
point of order 1 is called a simple branch point. We should stress that it is perfectly possible that f (z) never
returns to its original value, no matter how many times z travels round q. In this case q is the logarithmic
branch point-the name will be explained in the next section.
√
By extending the above discussion of 3 z, check for yourself that if n is an integer, then z 1/n is an n-valued
multifunction whose only (finite) branch point is at z = 0, the order of this branch point being (n − 1). More
generally, the same is true for any fractional power z m/n , where m/n is a fraction reduced to lowest terms.
9.4.1 Multibranches
Suppose we are given a multifunction f (z). Our goal is to select a value w = f (z) from various possible
values, for each z in as large a domain as possible, so that f is holomorphic. In particular. f has to be
continuous. We now introduce multibranches. These provide a stepping stone on the way to our goal.
There is a sense in which we can make continuous selections from multifunctions in a natural way. The key
idea is the following. Rather than considering z as our variable we introduce, for each branch point a, new
√
variables (r, θ), where z = a + r eiθ . Let us illustrate this with the example of 3 z.
√
By arbitrarily picking one of the three values of 3 p at z = p, and then allowing z to move, we see that we
√
obtain a unique value of 3 Z associated with any particular path from p to Z. However, we are still dealing
√ multifunction: by going round the branch point 0, we can end up at any one of the three possible values
with
of 3 Z. √
On the other hand, the value of 3 Z does not depend on the detailed shape of the √ path: if we continuously
deform the path without crossing the branch point, then we obtain the same value of 3 Z. This shows us how
we may obtain a single-valued function. If we restrict z to a simply connected set S that contains
√ p, but does
3
not contain the branch point, then every path in S from p to Z will yield the same value of Z, which we will
call f1 (Z). Since the path is irrelevant, f1 is an ordinary, single-valued function of position in S, which is a
√
branch of the 3 z.
√
Fig. 9.3 illustrates such a set S, together with its image under the branch f1 of 3 z. Here, we have reverted
√
to our normal practice of depicting the mapping going from left to right. If we instead choose 3 p = b, then
72 UNIT 9.
Figure 9.3
√ √
we obtain a second branch f2 of 3 z, while 3 p = c yields a third and final branch f3 .
We can simplify it as considering three different single-valued functions
√ √ √
f1 (r, θ) = 3 r eiθ/3 , f2 (r, θ) = 3 r ei(θ+2π)/3 , f3 (r, θ) = 3 r ei(θ+4π)/3
Figure 9.4
√
called the principal branch of the cube root. Let us denote this as [ 3 z]. Note that the principal branch agrees
with the real cube root function on the positive real axis, but not the negative real axis and note that the other
√ √
two branches can be expressed in terms of the principal branch as ei(2π/3) [ 3 z] and ei(4π/3) [ 3 z].
so f1 and f2 can be thought of as ”plus” and ”minus” square root functions. The negative real axis is called
a branch cut for the functions f1 and f2 . Each point on the branch cut is a point of discontinuity for both
functions f1 and f2 .
Example 9.5.1. We show that f1 is discontinuous along the negative real axis. Let z0 = r eiπ denote a
negative real number. We compute the limit as z approaches z0 through the upper half plane and the limit as
z approaches z0 through the lower half plane. In polar coordinates, these are given by
iθ √ θ θ √
lim f1 (r e ) = lim r0 cos + i sin = i r0 , and
(r,θ)→(r0 ,π) (r,θ)→(r0 ,π) 2 2
iθ √ θ θ √
lim f1 (r e ) = lim r0 cos + i sin = −i r0 .
(r,θ)→(r0 ,−π) (r,θ)→(r0 ,−π) 2 2
The two limits are distinct, so f1 is discontinuous at z0 . Since z0 is arbitrary, so f1 is discontinuous on the
whole negative real axis.
We will now draw the Riemann surface for f . f (z) has two values for any z 6= 0. Each functions f1 and f2
are single-valued on the domain formed by cutting the z plane along the negative real axis. Let D1 (see fig.
9.5) and D2 (see fig. 9.6) be the domains of f1 and f2 respectively. The range set for f1 is H1 consisting of
the right-half plane and the positive v-axis; and the range set for f2 is H2 consisting of the left-half plane and
the negative v-axis. The sets H1 and H2 are ”glued together” along the positive v-axis and the negative v-axis
to form the w plane with the origin deleted.
74 UNIT 9.
√
Figure 9.5: A portion of D1 and its image under w = z
√
Figure 9.6: A portion of D2 and its image under w = z
We stack D1 directly above D2 . The edge of D1 in the upper half-plane is joined to the edge of D2 in the
lower half-plane, and the edge of D1 in the lower half-plane is joined to the edge of D2 in the upper half-plane.
When these domains are glued together in this manner, they form R, which is a Riemann surface domain for
√
the mapping w = f (z) = z. The portions of D1 , D2 and R that lie in {z : |z| < 1} are shown in fig. 9.7.
The beauty of this structure is that it makes this ”full square root function” continuous for all z 6= 0.
Normally, the principal square root function would be discontinuous along the negative real axis, as points
near −1 but above that axis would get mapped to points close to i, and points near −1 but below the axis
would get mapped to points close to −i. As fig. 9.7 indicates, however, between the point A and the point B,
the domain switches from the edge of D1 in the upper half-plane to the edge of D2 in the lower half plane.
The corresponding mapped points A0 and B 0 are exactly where they should be. The surface works in such a
way that going directly between the edges of D1 in the upper and lower half planes is impossible (likewise for
D2 ). Going counterclockwise, the only way to get from the point A to the point C, for example, is to follow
the path indicated by the arrows in fig. 9.7.
We now move on to the logarithmic function.
Exercise 9.5.1. Show that f2 is discontinuous at every point on the negative real axis.
9.5. RIEMANN SURFACES 75
√
Figure 9.7: A portion of R and its image under w = z
elog z = z.
It follows that
log z = ln |z| + i arg(z).
Since arg(z) takes infinitely many values, differing from each other by multiples of 2π, we see that log(z) is
a multifunction taking infinitely many values, differing from each other by multiples of 2πi. For example,
√ π
log(2 + 2i) = ln 2 2 + i + 2nπi,
4
where n is an arbitrary integer. The reason we get infinitely many values is clear if we see the behaviour of the
exponential function ez . Each time z travels straight upwards by 2πi, ez executes a complete revolution and
returns to its original value. Clearly, log(z) has a branch point at 0. However, this branch point is quite unlike
√
that of n z, for no matter how many times we loop around the origin, log(z) never returns to its original value,
rather it continues moving upwards forever. You can now understand previously introduced term ”logarithmic
branch point”.
√
Here is another difference between the branch point of n z and log(z). As z approaches the origin, say
√
along a ray, | n z| tends to zero, but | log(z)| tends to infinity, and in this sense, origin is a singularity as well
as a branch point.
To define single-valuedness of log(z), we make a branch cut from 0 out to infinity. The most common
choice for this cut is the negative real axis. In this cut plane, we may restrict arg(z) to its principal value
Arg z. This yields the principal branch of logarithm, written as Log z, defined as
We are in a position to draw the Riemann surface for log(z). Let define the multibranches of logarithm
function as follows
Fk (z) = log |z| + i(θ + 2kπ), k ∈ Z.
Each Fk is a continuous function of z. Furthermore, for any fixed c ∈ R, and for 0 6= z = r eiθ ,
Figure 9.8
with no values repeated, and if similarly, θ is restricted to any interval (c, c + 2π].
The domain of definition of each Fk are none other than the whole complex plane with different arguments.
Let Sk be the domain of definition of the branch Fk of log(z). (These copies of the complex plane are each
cut along the negative real axis) These cut planes are then stacked directly on top of each other and joined as
follows. For each integer k, the edge of Sk in the upper half plane is joined to the edge of Sk+1 in the lower half
plane. The resulting Riemann surface of log(z) looks like a spiral staircase that extends upwards on S1 , S2 , . . .
and downwards on S−1 , S−2 , . . . as shown in fig. 9.8. If we start on S0 and make a counterclockwise circuit
around the origin, we end up on S1 , and the next circuit brings us to S2 , etc, so each time we cross the negative
real axis, we end up on a new branch of log(z).
A. 0 B. 1 C. ∞ D. −1
√
3
3. The function z 2 has an algebraic branch point of order ...... at z = 0.
A. 3/2 B. 2/3 C. 2 D. 3
√
4. For the function z − 1, the point z = 1 is a/an ...... branch point.
9.6. FEW PROBABLE QUESTIONS 77
5. Define branch of a multifunction. Show that the branches of the square root function are discontinuous
at each point of the negative real axis.
Unit 10
Course Structure
• Analytic continuation, uniqueness,
10.1 Introduction
Analytic continuation is an important idea since it provides a method for making the domain of definition of an
analytic function as large as possible. Usually, analytic functions are defined by means of some mathematical
expressions such as polynomials, infinite series, integrals, etc. The domain of definition of such an analytic
function is often restricted by the manner of defining the function. For instance the power series representation
of such analytic functions does not provide any direct information as to whether we could have a function
analytic in a domain larger than disc of convergence which coincides with the given function. We have
previously seen that an analytic function is determined by its behaviour at a sequence of points having limit
point. This was precisely the content of the identity theorem which is also referred to as the principle of
analytic continuation. For example, as a consequence, there is precisely a unique entire function on C which
agrees with sin x on the real axis, namely sin z.
Objectives
After reading this unit, you will be able to
• define analytic continuation of an analytic function and consider examples of such process
• define chain and function elements and discuss the condition for analytic continuation from a domain
into another
78
10.2. ANALYTIC CONTINUATION 79
The series on the right hand side of (10.2.1), as is well known, is convergent for |z| < 1 and diverges for
|z| ≥ 1. On the other hand, we know that the series given by the formula (10.2.1) represents an analytic
function for |z| < 1 and the sum of the series (10.2.1) for |z| < 1 is 1/(1 − z). However, the function F
defined by the formula
1
F (z) =
1−z
is analytic for z ∈ C∞ \ {1} = D, since
1 1 z
F = =
z 1 − z −1 z−1
is analytic at z = ∞. Now, f (z) = F (z) for all z ∈ D ∩ D, and we call F an analytic continuation of f from
D into D, that is, the function f , given at first for |z| < 1, has been extended to the extended complex plane
but for the point 1, at which the function has a simple pole. Thus, it seems that F , which is analytic globally,
is represented by a power series only locally.
We now, formally define analytic continuation of a function f as follows.
Definition 10.2.1. Suppose that f and F are two functions such that
1. f is analytic on some domain D ⊂ C;
2. F is analytic in a domain D1 such that D1 ∩D 6= ∅ and D ⊂ D1 , such that f (z) = F (z) for z ∈ D∩D1 .
Then we call F an analytic continuation or holomorphic extension of f from domain D into D1 . In other
words, f is said to be analytically continuable into D1 .
The definition can also be given as follows.
Definition 10.2.2. A function f (z), together with a domain D in which it is analytic, is said to be a function
element and is denoted by (f, D). Two function elements (f1 , D1 ) and (f2 , D2 ) are called direct analytic
continuations of each other if and only if
D1 ∩ D2 6= ∅ and f1 = f2 on D1 ∩ D2 .
80 UNIT 10.
Remark 10.2.1. Whenever there exists a direct analytic continuation of (f1 , D1 ) into a domain D2 , it must
be uniquely determined, for any two direct analytic continuations would have to agree on D1 ∩ D2 , and by
identity theorem, would consequently have to agree throughout D2 . That is, given an analytic function f1 on
D1 , there is at most one way to extend f1 from D1 into D2 so that the extended function is analytic in D2 .
The property of being a direct analytic continuation is not transitive. That is, even if (f1 , D1 ) and (f2 , D2 )
are direct analytic continuations of each other, and (f2 , D2 ) and (f3 , D3 ) are direct analytic continuations
of each other, we cannot conclude that (f1 , D1 ) and (f3 , D3 ) are direct analytic continuations of each other.
A simple example of this occurs whenever D1 and D3 have no points in common. However, there is a
relationship between f1 (z) and f3 (z) that is worth explaining.
Definition 10.2.3. Suppose that {(f1 , D1 ), (f2 , D2 ), . . . , (fn , Dn )} is a finite set of function elements with the
property that (fk , Dk ) and (fk+1 , Dk+1 ) are direct analytic continuations of each other for k = 1, 2, . . . , n−1.
Then the set of function elements are said to be analytic continuations of one another. Such a set of function
elements is then called a chain.
Figure 10.1
f1 (z) = Log z, z ∈ D1
f2 (z) = Log z, z ∈ D2
f3 (z) = Log z + 2πi, z ∈ D3 .
Then {(f1 , D1 ), (f2 , D2 ), (f3 , D3 )} is a chain with n = 3. Note that 0 = f1 (1) 6= f3 (1) = 2πi.
Note that (fi , Di ) and (fj , Dj ) are analytic continuations of each other if and only if they can be connected
by finitely many direct analytic continuations.
10.3. ANALYTIC CONTINUATION ALONG A CURVE 81
Figure 10.2
Proof. Suppose there are two analytic continuations of (f1 , D1 ) along the curve γ, namely,
such that z(t) ∈ Di for ti−1 ≤ t ≤ ti , for i = 1, 2, . . . , n and z(t) ∈ Ej for sj−1 ≤ t ≤ sj for j =
1, 2, . . . , m. We claim that if 1 ≤ i ≤ n, 1 ≤ j ≤ m and
[ti−1 , ti ] ∩ [sj−1 , sj ] 6= ∅
then (fi , Di ) and (gj , Ej ) are direct analytic continuations of each other. This is certainly true when i = j = 1,
since f1 = g1 and E1 = D1 . If it is not true for all i and j, then we may pick from all (i, j), for which
the statement is false and such that i + j is minimal. Suppose that ti−1 ≥ sj−1 , where i ≥ 2. Since
[ti−1 , ti ] ∩ [sj−1 , sj ] 6= ∅ and sj−1 ≤ ti−1 , we must have ti−1 ≤ sj . Thus, sj−1 ≤ ti−1 ≤ sj . It follows
that z(ti−1 ) ∈ Di−1 ∩ Ei ∩ Ej . In particular, this intersection is non-empty. None of (fi , Di ) is a direct
analytic continuation of (fi−1 , Di−1 ). Moreover, (fi−1 , Di−1 ) is a direct analytic continuation of (gj , Ej )
since i + j is minimal, where we observe that ti−1 ∈ [ti−2 , ti−1 ] ∩ [sj−1 , sj ] so that the hypothesis of the claim
is satisfied. Since Di−1 ∩ Di ∩ Ej 6= ∅, (fi , Di ) must be direct analytic continuation of (gj , Ej ) which is a
contradiction. Hence our claim holds for all i and j. In particular, it holds for i = n and j = m, which proves
the theorem.
Given a chain {(f1 , D1 ), (f2 , D2 ), . . . , (fn , Dn )}, can a function f (z) be defined such that f (z) is analytic
in the domain D1 ∪ D2 ∪ · · · ∪ Dn ? Certainly this can be done when n = 2. The function
f (z) = f1 (z) if z ∈ D1
= f2 (z) if z ∈ D2
Figure 10.3
of the original series? One way to provide an affirmative answer is by the power series method. Let us start
our discussion on this method and see how one can use the power series to go beyond the boundary of the disc
of convergence. A fundamental fact about a function f , analytic in a domain D, is that, for each a ∈ D, there
exists a sequence {an }n≥0 and a number ra ∈ (0, ∞] such that
∞
X
f (z) = an (z − a)n for all z ∈ B(a; ra ).
n=0
To extend f , we choose a point b other than a in the disc of convergence B(a; ra ). Then |b − a| < ra and
∞
X ∞
X
n
an (z − a) = an [z − b + b − a]n
n=0 n=0
∞ n
!
X X n
= an (b − a)n−k (z − b)k
k
n=0 k=0
∞ ∞
!
X X n
n−k
= an (b − a) (z − b)k
k
k=0 n=k
∞
X
= Ak (z − b)k .
k=0
whenever |z − b| + |b − a| < ra . Therefore, the series about b converges at least for |z − b| < ra − |b − a|.
However, this may happen that the disc of convergence B(b; rb ) for this new series extends outside B(a; ra ),
that is, it may be possible that rb > ra − |b − a|. In this case, the function can be analytically continued to the
union of these two discs. This process may be continued.
84 UNIT 10.
∞ ∞
X
n
X √
Thus, An (z − i) is an analytic continuation of z n in D to the disc B(i; 2). Similarly, one can see
n=0 n=0
∞ ∞
X 1 X
that (z + 1)n is an analytic continuation of z n from D to the disc B(−1; 2).
2n+1
n=0 n=0
1 z z2
f (z) = + 2 + 3 + ···
a a a
can be continued analytically. This series converges within the circle C0 : |z| = |a| and has the sum
1 1 1
f (z) = z = .
a1− a a−z
The only singularity of f (z) on C0 is at z = a. Hence the analytic continuation of f (z) beyond C0 is possible.
For this purpose we take a point z = b not lying on the line segment joining z = 0 and z = a. We draw a
circle C1 with centre b and radius |a − b| that is, C1 is |z − b| = |a − b|. This new circle C1 clearly extends
beyond C0 as shown in the figure 10.4
Now, we reconstruct the power series given in powers of (z − b) in the form
∞
X (z − b)n n!
, where f (n) (b) = . (10.4.1)
(a − b)n+1 (a − b)n+1
n=0
1
This power series has the cirlce of convergence C1 and has cum function . Thus, the power series
a−z
(10.4.1) and the given power series represent the same function in the region common to C0 and C1 . Hence
(10.4.1) represents an analytic continuation of the given series.
10.5. FEW PROBABLE QUESTIONS 85
a
b
x
O
Figure 10.4
1. Define analytic continuation of an analytic function f from a domain D1 into another domain D2 . Show
that such a continuation is unique. Is the analytic continuation of an analytic function always possible?
Justify your answer.
3. Let {(f1 , D1 ), (f2 , D2 ), . . . , (fn , Dn )} be a chain. With proper justifications, show that a function
defined by f (z) = fi (z), for z ∈ Di is analytic in D1 ∪ D2 ∪ · · · ∪ Dn if D1 ∩ D2 ∩ · · · ∩ Dn 6= ∅. Is
the result true for any general case? Justify.
Unit 11
Course Structure
• Continuation by the method of natural boundary,
• Existence of singularity on the circle of convergence
11.1 Introduction
This unit deals with the continuation by natural boundary. Suppose that a power series has radius of conver-
gence R and defines an analytic function f inside that disc. Consider points on the circle of convergence. A
point for which there is a neighbourhood on which f has an analytic extension is regular, otherwise singular.
Convergence is limited to within by the presence of at least one singularity on the boundary of . If the sin-
gularities on are so densely packed on the circle, that analytic continuation cannot be carried out on a path
that crosses, then it is said to form a natural boundary. In particular, the circle is a natural boundary if all its
points are singular. More generally, we may apply the definition to any open connected domain on which f is
analytic, and classify the points of the boundary of the domain as regular or singular: the domain boundary is
then a natural boundary if all points are singular. We will study about this in details.
Objectives
After reading this unit, you will be able to
• define natural boundary of an analytic function f on a domain D
• deduce certain results on the existence of singularities on the circle of convergence
86
11.3. EXISTENCE OF SINGULARITIES ON THE CIRCLE OF CONVERGENCE 87
A direct consequence of the Root test is that the radius of convergence of the above series is 1 and so, f (z)
defined as above is analytic for |z| < 1. If |z| ≥ 1, then lim |z 2n | =
6 0 is therefore, the series diverges for
n→∞
|z| ≥ 1.
n n
Let ζ = e2πim/2 , m = 0, 1, 2, . . . , 2n − 1, (n ∈ N) be the 2n th root of unity. If z = r e2πim/2 ∈ D, then
n−1 ∞
2k k
X X
f (z) = z + z2
k=0 k=n
lim |f (ζr)| = ∞.
r→1−
Therefore, if D is a domain containing points of D and of its complement, then D contains the points ζ =
n
e2πim/2 and so any function F in D which coincides with f in D ∩ D cannot be continued analytically
through ζ 2n = 1 for each n ∈ N. In other words, any root of the equation
z 2 = 1, z 4 = 1, . . . , z 2n = 1 (n ∈ N)
is a singular point of f and hence any arc, however small it may be, of ∂D contains an infinite number of
singularities. Thus, f on D cannot be continued analytically across the boundary ∂D of D. This observation
shows that the unit circle |z| = 1 is a natural boundary for the power series defined by (11.2.1).
Proof. Suppose, on contrary that f has no singularity on |z| = R. Then f must be analytic at all points of
|z| = R. This implies f is analytic on |z| ≤ R. It follows, from the definition of analyticity at a point, that for
each ζ ∈ ∂DR there exists for some Rζ > 0 and a function fζ which is analytic in B(ζ; Rζ ) and
f = fζ on B(ζ; Rζ ) ∩ DR
In this way, if ζk and ζl ∈ ∂DR (k 6= l) with G = B(ζk ; Rζk ) ∩ B(ζl ; Rζl ) 6= φ, then we have two functions
fζk and fζl which are respectively analytic in B(ζk ; Rζk ) and B(ζl ; Rζl ) such that
f = fζk = fζl G ∩ DR
Since G is connected and G ∩ DR is an open subset of G, by the uniqueness theorem, fζk = fζl on G. Since
y
∂B(ζ; Rζ )
x
O R
∂DR
|ζ| = R is compact, by the Heine-Borel theorem, we may select a finite number of B(ζ1 ; Rζ1 ), B(ζ2 ; Rζ2 ), . . . ,
B(ζn ; Rζn ) from the collection {B(ζ; Rζ ) : ζ ∈ ∂DR } such that it covers the circle ∂DR . Let
Ω = ∪nk=1 B(ζk ; Rζk ) and δ = dist(∂DR Ω)
Then, as Rζk > 0 for each k, we have δ > 0. Moreover,
{z : R − δ < |z| < R + δ} ⊂ Ω and DR+δ ⊂ D = DR ∪ Ω
Then g is defined by
g(z) = f (z) for |z| < R
= fζk (z) for |z − ζk | < Rζk , k = 1, 2, . . . , n
as well defined, single-valued and analytic on D and has same power series representation as f for |z| < R.
Thus there exists an analytic function, say φ, in DR+δ , which coincides with f on DR . But, then by Taylor’s
theorem we have the power series representation
X
φ(z) = bn z n for z ∈ DR+δ
n≥0
11.3. EXISTENCE OF SINGULARITIES ON THE CIRCLE OF CONVERGENCE 89
Since f = g on DR , by the uniqueness theorem, we have an = bn for each n. This shows that the radius of
convergence of f is R + δ, which is a contradiction.
X
Theorem 11.3.2. If an ≥ 0 and f (z) = an z n has radius of convergence 1, then (f, D) has no direct
n≥0
analytic continuation to a function element (F, D) with 1 ∈ D.
eiα
r
δ δ+r x
O 1
Proof. For each z = r eiθ ∈ D(0 < r < 1; θ ∈ [0, 2π]), we have
X
f (k) (z) = n(n − 1) · · · (n − (k − 1))an z n−k (11.3.1)
n≥k
so that since an ≥ 0
X
|f (k) (r eiθ )| ≤ n(n − 1) · · · (n − (k − 1))an z n−k = f (k) (r) (11.3.2)
n≥k
We have to show that 1 is a singular point of f . Suppose, on the contrary that 1 is a regular point of f . Then, f
can be analytically continued in a neighbourhood of z = 1 and so there is a δ with 0 < δ < 1 (see fig. (11.2))
for which the Taylor’s series expansion of f about δ, namely the series
X f (k) (δ
(z − δ)k , (11.3.3)
k!
k≥0
would be convergent for |z − δ| < r with δ + r > 1. Then by (11.3.2), we find that
|f (k) (δ eiθ )| f (k) (δ)
≤ .
k! k!
From this, the root test and the comparison test with (11.3.3), it follows that the radius of convergence of the
Taylor series about δ eiθ is at least r. This observation implies that the Taylor series
X f (k) (δ eiθ ) k
z − δ eiθ
k!
k≥0
90 UNIT 11.
would be convergent in the disc |z − δ eiθ | < r for each θ, with δ + r > 1. In other words, the Taylor series
X f (k) (z0 )
(z − z0 )k
k!
k≥0
about each z0 with |z0 | = δ would have radius of convergence ≥ r > 1−δ. Since this contradicts the previous
theorem, and hence 1 must be a singular point of f . This completes the proof.
X
Notice that the last series is actually a rearrangement of an z n . Indeed, by (11.3.1),
n≥0
X n n
X XX n
an z0n−k (z − z0 ) k
= an z0n−k (z − z0 )k
k k
k≥0 n≥k n≥0 k=0
X
= an (z − z0 + z0 )n
n≥0
X
= an z n .
n≥0
X
Corollary 11.3.1. If an ≥ 0 and f (z) = an z n has the radius of convergence R > 0, then z = R is a
n≥0
singularity of f (z).
Clearly, nk = 3k . Thus,
nk+1 3k+1 3k
lim inf = lim inf k = lim inf 3 k = 3 > 1.
k→∞ nk k→∞ 3 k→∞ 3
Thus, by the previous theorem, the circle of convergence of the given series is the natural boundary for f . We
find the radius of convergence by Cauchy-Hadamard’s theorem. Let R be the radius of convergence of the
X∞
power series. Writing the given power series as an z n , we get the terms an of the power series as
n=0
1 1
f (z) = 0 · z 0 + 1 · z 1 + 0 · z 2 + z 3 + · · · + z 9 + · · ·
3 9
Thus, the set of values |an|^{1/n} is
{ 0, 1, 0, (1/3)^{1/3}, 0, 0, . . . , (1/9)^{1/9}, . . . },
and since (3^{−k})^{1/3^k} = 3^{−k/3^k} → 1 as k → ∞, we get lim sup_{n→∞} |an|^{1/n} = 1.
Hence, R = 1 is the radius of convergence of the given power series and |z| = 1 is the natural boundary of f .
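The computation can be checked numerically. The following Python sketch (our own illustration; the variable names and the cut-off K are arbitrary choices) tabulates |an|^{1/n} for the non-zero coefficients an = 3^{−k} at n = 3^k and confirms that lim sup |an|^{1/n} = 1, so that R = 1 by the Cauchy-Hadamard theorem.

K = 12                                   # number of non-zero terms to inspect
roots = []
for k in range(1, K):
    n = 3 ** k                           # exponent of the k-th non-zero term
    a_n = 3.0 ** (-k)                    # its coefficient a_n = 3^(-k)
    roots.append(a_n ** (1.0 / n))       # |a_n|^(1/n) = 3^(-k/3^k)

print([round(x, 5) for x in roots[-4:]])    # values approaching 1 from below
print("R =", round(1.0 / roots[-1], 5))     # Cauchy-Hadamard: R = 1/limsup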
11.4 Few Probable Questions

1. Show that a function f(z) = Σ_{n≥0} an z^n having radius of convergence R > 0 must have at least one
singularity on |z| = R.

2. If an ≥ 0 and f(z) = Σ_{n≥0} an z^n has radius of convergence 1, then show that (f, D) has no direct
analytic continuation to a function element (F, D) with 1 ∈ D.

3. Find the natural boundary of the function
f(z) = Σ_{k=0}^{∞} z^{2^k}/2^{k²}.
Unit 12
Course Structure
• Monodromy theorem
• Germs
12.1 Introduction
This unit is a continuation of the previous unit and deals with the Monodromy theorem. In complex analysis, the
Monodromy theorem is an important result about analytic continuation of a complex-analytic function to a
larger set. The idea is that one can extend a complex-analytic function (from here on called simply analytic
function) along curves starting in the original domain of the function and ending in the larger set. A potential
problem of this analytic continuation along a curve strategy is there are usually many curves which end up at
the same point in the larger set. The Monodromy theorem gives sufficient conditions for analytic continuation
to give the same value at a given point regardless of the curve used to get there, so that the resulting extended
analytic function is well-defined and single-valued.
Objectives
After reading this unit, you will be able to
12.2 Monodromy Theorem
Definition 12.2.1. Let γ0, γ1 : [0, 1] → G be two closed rectifiable curves in a region G. Then γ0 is homotopic
to γ1 in G if there is a continuous function
F : [0, 1] × [0, 1] → G
such that
F (s, 0) = γ0 (s)
F (s, 1) = γ1 (s) (0 ≤ s ≤ 1)
F (0, t) = F (1, t) (0 ≤ t ≤ 1)
Definition 12.2.2. Let γ0, γ1 : [0, 1] → G be two rectifiable curves in G such that γ0(0) = γ1(0) = a
and γ0(1) = γ1(1) = b. Then γ0 and γ1 are fixed-end-point homotopic (FEP homotopic) if there is a
continuous map F : [0, 1] × [0, 1] → G such that
F(s, 0) = γ0(s), F(s, 1) = γ1(s) (0 ≤ s ≤ 1),
F(0, t) = a, F(1, t) = b (0 ≤ t ≤ 1).
We note that the relation of being FEP homotopic is an equivalence relation on the curves from one given point
to another.
Definition 12.2.3. An open set G is called simply connected if G is connected and every closed curve in G is
homotopic to zero.
This is equivalent to the definition of a simply connected region which we learnt previously, which states
that a set is simply connected if every closed rectifiable curve in it can be continuously deformed to a single point
without passing through any point outside the set. Now, we define the germ of a function f.
Definition 12.2.4. Let (f, G) be a function element and let a ∈ G. Then the germ of f at a is the collection of all function
elements (g, D) such that a ∈ D and f(z) = g(z) for all z in a neighbourhood of a. The germ of f at a is
denoted by [f]a.
Definition 12.2.5. Let γ : [0, 1] → C be a path and suppose that for each t ∈ [0, 1] there is a function element
(ft , Dt ) such that
1. γ(t) ∈ Dt ;
2. for each t ∈ [0, 1], there is a δ > 0 such that |s − t| < δ implies that γ(s) ∈ Dt and
[fs]γ(s) = [ft]γ(s).
Then {(ft, Dt) : 0 ≤ t ≤ 1} is called an analytic continuation along γ.
Remark 12.2.1. Since γ is a continuous function and γ(t) is in the open set Dt, there is a δ > 0 such that
γ(s) ∈ Dt whenever |s − t| < δ.
Theorem 12.2.1. Let γ : [0, 1] → C be a path from a to b and let {(ft, Dt) : 0 ≤ t ≤ 1} and {(gt, Bt) : 0 ≤ t ≤ 1}
be analytic continuations along γ such that [f0]a = [g0]a. Then [f1]b = [g1]b.
Proof. Let T = {t ∈ [0, 1] : [ft]γ(t) = [gt]γ(t)}. Since [f0]a = [g0]a, we have 0 ∈ T, so T ≠ ∅. Let t ∈ T. By
the definition of analytic continuation, there is a δ > 0 such that |s − t| < δ implies
γ(s) ∈ Dt ∩ Bt and
[fs]γ(s) = [ft]γ(s), [gs]γ(s) = [gt]γ(s).   (12.2.1)
But t ∈ T implies
ft (z) = gt (z) ∀z ∈ Dt ∩ Bt
Hence, [ft ]γ(s) = [gt ]γ(s) for all γ(s) ∈ Dt ∩ Bt . So, [fs ]γ(s) = [gs ]γ(s) whenever |s − t| < δ. That is, s ∈ T
whenever |s − t| < δ or (t − δ, t + δ) ⊂ T .
If t = 0, the above argument shows that [0, δ) ⊂ T for some δ > 0. Hence T is open.
To show that T is closed, let t be a limit point of T. Again, by the definition of analytic continuation there is a
δ > 0 such that |s − t| < δ implies γ(s) ∈ Dt ∩ Bt and (12.2.1) holds.
Since t is a limit point of T, there is a point s in T such that |s − t| < δ. Let G = Dt ∩ Bt ∩ Ds ∩ Bs. Then
γ(s) ∈ G. So, G is a non-empty open set. Thus, by the definition of T, fs(z) = gs(z) for all z ∈ G. But (12.2.1)
implies that ft = fs and gt = gs in a neighbourhood of γ(s), so that ft(z) = gt(z) for all z in a neighbourhood
of γ(s) contained in G.
Since G has a limit point in Dt ∩ Bt, this gives [ft]γ(t) = [gt]γ(t). Thus, t ∈ T and so T is closed.
Now, T is a non-empty subset of [0, 1] which is both open and closed. So, the connectedness of [0, 1] implies
T = [0, 1]. Thus 1 ∈ T and hence [f1]γ(1) = [g1]γ(1), that is, [f1]b = [g1]b as γ(1) = b.
Remark 12.2.2. Suppose a and b are two complex numbers and let γ and σ be two paths from a to b. Suppose,
{(ft, Dt)} and {(gt, Bt)} are analytic continuations along γ and σ respectively such that [f0]a = [g0]a. Now,
the question is, does it follow that [f1 ]b = [g1 ]b ? If γ and σ are the same path then the above result gives an
affirmative answer. However, if γ and σ are distinct then the answer can be no.
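This failure can be watched numerically. The following Python sketch (our own illustration, not part of the text) continues a branch of √z along two paths from 1 to −1 that are not FEP homotopic in C \ {0}, namely the upper and the lower unit semicircle, by always selecting the square root nearest the previous value; the two continuations arrive at the distinct germs i and −i.

import cmath

def continue_sqrt(path):
    # Continue the branch of sqrt with value sqrt(path[0]) along the points
    # of `path`, choosing at each step the root closest to the previous value.
    w = cmath.sqrt(path[0])
    for z in path[1:]:
        r = cmath.sqrt(z)
        w = r if abs(r - w) <= abs(-r - w) else -r
    return w

N = 200
upper = [cmath.exp(1j * cmath.pi * k / N) for k in range(N + 1)]    # 1 -> -1 via i
lower = [cmath.exp(-1j * cmath.pi * k / N) for k in range(N + 1)]   # 1 -> -1 via -i

print(continue_sqrt(upper))   # approximately  i
print(continue_sqrt(lower))   # approximately -i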
Lemma 12.2.1. Let γ : [0, 1] → C be a path and let {(ft , Dt ) : 0 ≤ t ≤ 1} be an analytic continuation along
γ. For 0 ≤ t ≤ 1, let R(t) be the radius of convergence of the power series expansion of ft about z = γ(t).
Then either R(t) ≡ ∞ or R : [0, 1] → (0, ∞) is continuous.
Figure 12.1: [discs of radius R(t) centred at points γ(t) along a path starting from a.]
Proof. Suppose R(t) = ∞ for some value of t. Then, ft can be extended to an entire function. It follows that
fs (z) = ft (z) for all z ∈ Ds so that R(s) = ∞ for all s ∈ [0, 1]. That is R(s) ≡ ∞. Now, suppose that
R(t) < ∞ for all t. Let t be a fixed number in [0, 1] and let a = γ(t). Let
ft(z) = Σ_{n=0}^{∞} an (z − a)^n
be the power series expansion of ft about a. Now, let δ1 > 0 be such that |s − t| < δ1 implies that γ(s) ∈
Dt ∩ B(a; R(t)) and [fs ]γ(s) = [ft ]γ(s) . Fix s with |s − t| < δ1 and let b = γ(s). Now, ft can be extended
to an analytic function on B(a; R(t)). Since fs agrees with ft on a neighbourhood of b, fs can be extended so
that it is also analytic on B(a; R(t)) ∪ Ds. If fs has power series expansion
fs(z) = Σ_{n=0}^{∞} bn (z − b)^n about z = b,
then the radius of convergence R(s) must be at least as big as the distance from b to the circle |z − a| = R(t);
that is,
R(s) ≥ R(t) − |a − b|.
This implies R(t) − R(s) ≤ |a − b|, that is, R(t) − R(s) ≤ |γ(t) − γ(s)|. Similarly, we can show R(s) − R(t) ≤
|γ(t) − γ(s)|. Hence,
|R(s) − R(t)| ≤ |γ(t) − γ(s)| for |s − t| < δ1 .
Since γ : [0, 1] → C is continuous, given ε > 0 there exists δ2 > 0 such that |γ(t) − γ(s)| < ε for |s − t| < δ2. Let
δ = min{δ1, δ2}. Then δ > 0 and |R(s) − R(t)| < ε for |s − t| < δ. Hence R is continuous at t.
Lemma 12.2.2. Let γ : [0, 1] → C be a path from a to b and let {(ft, Dt) : 0 ≤ t ≤ 1} be an analytic
continuation along γ. There is a number ε > 0 such that if σ : [0, 1] → C is any path from a to b with
|γ(t) − σ(t)| < ε for all t, and if {(gt, Bt) : 0 ≤ t ≤ 1} is any continuation along σ with [g0]a = [f0]a, then
[g1]b = [f1]b.
Proof. For 0 ≤ t ≤ 1, let R(t) be the radius of convergence of the power series expansion of ft about
z = γ(t). If R(t) ≡ ∞ then any value of ε will suffice. So, suppose R(t) < ∞ for all t. Since R is a
continuous function and R(t) > 0 for all t, R has a positive minimum value. Let 0 < ε < (1/2) min{R(t) : 0 ≤
t ≤ 1}. Suppose σ : [0, 1] → C is any path from a to b with |γ(t) − σ(t)| < ε for all t, and {(gt, Bt) : 0 ≤
t ≤ 1} is any continuation along σ with [g0]a = [f0]a. Suppose Dt is a disk of radius R(t) about γ(t). Since
|σ(t) − γ(t)| < ε < R(t), we have σ(t) ∈ Bt ∩ Dt for all t.
Define the set T = {t ∈ [0, 1] : ft(z) = gt(z) ∀ z ∈ Bt ∩ Dt}. Then 0 ∈ T, since [g0]a = [f0]a. So
T ≠ ∅. We will show that 1 ∈ T. For this, it is sufficient to show that T is both an open and a closed subset of
[0, 1].
To show T is open, let t be any fixed point of T. Choose δ > 0 such that |s − t| < δ implies |γ(s) − γ(t)| < ε,
[fs]γ(s) = [ft]γ(s) and [gs]γ(s) = [gt]γ(s). Then, for |s − t| < δ,
|σ(s) − γ(t)| = |σ(s) − γ(s) + γ(s) − γ(t)| ≤ |σ(s) − γ(s)| + |γ(s) − γ(t)| < 2ε < R(t),
so that σ(s) ∈ Dt. Arguing as in the proof of Theorem 12.2.1, it follows that s ∈ T whenever |s − t| < δ;
hence T is open. A similar argument shows that T is closed, so T = [0, 1] and, in particular, [g1]b = [f1]b.
Definition 12.2.7. Let (f, D) be a function element and let G be a region which contains D. Then (f, D)
admits unrestricted analytic continuation in G if for any path γ in G with initial point in D there is an analytic
continuation of (f, D) along γ.
Theorem 12.2.2. (Monodromy Theorem) Let (f, D) be a function element and let G be a region containing
D such that (f, D) admits unrestricted continuation in G. Let a ∈ D, b ∈ G and let γ0 and γ1 be paths in G
from a to b; let {(ft , Dt ) : 0 ≤ t ≤ 1} and {(gt , Dt ) : 0 ≤ t ≤ 1} be analytic continuations of (f, D) along
γ0 and γ1 respectively. If γ0 and γ1 are FEP homotopic in G, then
[f1 ]b = [g1 ]b .
Proof. Since γ0 and γ1 are fixed-end-point homotopic in G, there is a continuous function F : [0, 1] × [0, 1] →
G such that
F(t, 0) = γ0(t), F(t, 1) = γ1(t), F(0, u) = a, F(1, u) = b,
for all t and u in [0, 1]. Let u be a fixed point of [0, 1]. Consider the path γu defined by
γu(t) = F(t, u), 0 ≤ t ≤ 1.
Then,
γu(0) = F(0, u) = a, γu(1) = F(1, u) = b.
Since (f, D) admits unrestricted continuation in G, there is an analytic continuation
{(h_{t,u}, D_{t,u}) : 0 ≤ t ≤ 1}
of (f, D) along γu. Now, {(h_{t,0}, D_{t,0}) : 0 ≤ t ≤ 1} and {(ft, Dt) : 0 ≤ t ≤ 1} are analytic continuations
along γ0, so by theorem 12.2.1, we have
along γ0 so by theorem 12.2.1, we have
[f1 ]b = [h1,0 ]b .
Similarly,
[g1 ]b = [h1,1 ]b .
To prove the theorem, it is sufficient to show
[h1,0 ]b = [h1,1 ]b .
By lemma 12.2.2, there is an ε > 0 such that if σ is any path from a to b with |γu(t) − σ(t)| < ε for all t, and
if {(kt, Et)} is any continuation of (f, D) along σ with [k0]a = [h0,u]a, then [k1]b = [h1,u]b. Since F is
uniformly continuous on the compact square [0, 1] × [0, 1], there is a δ > 0 such that |u − v| < δ implies
|γu(t) − γv(t)| < ε for all t, and hence
[h1,u]b = [h1,v]b.   (12.2.3)
Let U = {u ∈ [0, 1] : [h1,u]b = [h1,0]b}. Then 0 ∈ U, so U ≠ ∅.
Suppose u ∈ U, so that [h1,u]b = [h1,0]b. Then, as proved above, there is a δ > 0 such that |u − v| < δ
implies that
[h1,u ]b = [h1,v ]b
i.e. v ∈ (u − δ, u + δ) ⇒ [h1,v ]b = [h1,0 ]b
i.e. v ∈ (u − δ, u + δ) ⇒ v ∈ U
i.e. (u − δ, u + δ) ⊂ U.
Hence U is open.
To show that U is closed, we show that U = Ū. Let u ∈ Ū and let δ be the positive number satisfying
(12.2.3). Then there is a v ∈ U such that |u − v| < δ. So, by (12.2.3), [h1,u]b = [h1,v]b. Since v ∈ U,
[h1,v]b = [h1,0]b. Thus, [h1,u]b = [h1,0]b, so that u ∈ U. Thus, U is closed as U = Ū.
Now, U is a non-empty open and closed subset of [0, 1] and since [0, 1] is connected, so, U = [0, 1]. So,
1 ∈ U and the result is proved.
Corollary 12.2.1. Let (f, D) be a function element which admits unrestricted continuation in the simply
connected region G. Then there is an analytic function F : G → C such that F (z) = f (z) for all z ∈ D.
Proof. Let a be a fixed point in D and let z be any point in G. If γ is a path in G from a to z and {(ft, Dt) : 0 ≤
t ≤ 1} is an analytic continuation of (f, D) along γ, then let F(z, γ) = f1(z). Since G is simply connected,
any two paths in G from a to z are FEP homotopic, so by the Monodromy theorem F(z, γ) = F(z, σ) for any
two paths γ and σ in G from a to z. Thus, F(z) = F(z, γ) is a well-defined function from G to C. To show
that F is analytic, let z ∈ G. Let γ be a path in G from a to z and {(ft, Dt)}
be the analytic continuation of (f, D) along γ. Then F (ω) = f1 (ω) for all ω in a neighbourhood of z. Hence
F must be analytic.
Unit 13

Course Structure
• Conformal transformations,
13.1 Introduction
In mathematics, a conformal map is a function that locally preserves angles, but not necessarily lengths. We
shall see that the derivative relates the angle between two curves to the angle between their images. In addition,
the derivative will be seen to measure the ”distortion” of image curves. They are also worth studying because
of their usefulness in solving certain physical problems, for example, problems about two-dimensional fluid
flow, the idea being to transform a given problem into an equivalent one which is easier to solve. So we wish
to consider the problem of mapping a given region G onto a geometrically simpler region G′, for example
the open unit disc or the open upper half-plane.
Objectives
After reading this unit, you will be able to
• define conformal and isogonal maps and see certain examples
• define Möbius transformation and related terms and deduce few results related to symmetry
13.2 Conformal Transformations
Suppose now that a function f is analytic on a smooth (parametrized) curve z = z(t), so that the image curve
w(t) = f(z(t)) has derivative f′(z(t)) z′(t) (by the chain rule). A smooth curve is characterized by having a
tangent at each point. So, we interpret z′(t) as a vector in the direction of the tangent at the point z(t). Our purpose is to compare
the inclination of the tangent to the curve at a point with the inclination of the tangent to the image curve at
the image of the point.
Let z0 = z(t0) be a point on the curve z = z(t). Then the vector z′(t0) is tangent to the curve at
the point z0 and arg z′(t0) is the angle this directed tangent makes with the positive x-axis. Suppose that
w = w(t) = f(z(t)) with w0 = f(z0). For any point z on the curve other than z0, we have the identity
w − w0 = [(f(z) − f(z0))/(z − z0)] (z − z0).
Thus,
arg(w − w0) = arg[(f(z) − f(z0))/(z − z0)] + arg(z − z0) (mod 2π),   (13.2.1)
where it is assumed that f(z) ≠ f(z0) so that (13.2.1) has meaning. Note that arg(z − z0) is the angle in the
z plane between the x axis and the straight line passing through the points z and z0 , while arg(w − w0 ) is the
angle in the w plane between the u axis and the straight line passing through the points w and w0 . Hence, as
z approaches z0 along the curve z(t), arg(z − z0 ) approaches a value θ, which is the angle that the tangent to
the curve z(t) at z0 makes with the x-axis. Similarly, arg(w − w0 ) approaches a value φ, the angle that the
tangent to the curve f (z(t)) at w0 makes with the u axis.
Suppose that f′(z0) ≠ 0 so that arg f′(z0) has a meaning. Then, taking limits in (13.2.1), we find (mod 2π) that
φ = arg f′(z0) + θ, or arg w′(t0) = arg f′(z0) + arg z′(t0).   (13.2.2)
That is, the difference between the inclination of the tangent to a curve at a point and the inclination of the
tangent to the image curve at the image of the point depends only on the derivative of the function at the point.
Theorem 13.2.1. Suppose f(z) is analytic at z0 with f′(z0) ≠ 0. Let C1 : z1(t) and C2 : z2(t) be smooth
curves in the z plane that intersect at z0 = z1(t0) = z2(t0), and let C′1 : w1(t) and C′2 : w2(t) be the images of
C1 and C2, respectively. Then the angle between C1 and C2, measured from C1 to C2, is equal to the angle
between C′1 and C′2, measured from C′1 to C′2.
Proof. Let the tangents to C1 and C2 make angles θ1 and θ2 respectively with the x-axis. Then the angle
between C1 and C2 at z0 is θ2 − θ1 (see fig. 13.1). According to (13.2.2), the angle between C′1 and C′2, which
is the angle between the tangent vectors f′(z0) z′1(t0) and f′(z0) z′2(t0) of the image curves, is
[arg f′(z0) + θ2] − [arg f′(z0) + θ1] = θ2 − θ1.
Figure 13.1
A function that preserves both angle size and orientation is said to be conformal. Theorem 13.2.1 says that
an analytic function is conformal at all points where the derivative is non-zero. For example, the function
f(z) = e^z maps vertical and horizontal lines onto circles and orthogonal radial rays, respectively.
A function that preserves angle size but not orientation is said to be isogonal. An example of such a
function is f(z) = z̄. To illustrate, z̄ maps the positive real axis and the positive imaginary axis onto the
positive real axis and the negative imaginary axis respectively (see fig. 13.2). Although the two curves intersect at
right angles in each plane, a "counterclockwise" angle is mapped onto a "clockwise" angle.
Figure 13.2
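Both behaviours can be verified numerically. In the following Python sketch (our own illustration; the helper tangent_angle is a hypothetical name), the angle between two directions through a point is computed before and after mapping: the analytic map e^z preserves the signed angle, while conjugation reverses its sign.

import cmath

def tangent_angle(f, z0, direction, h=1e-6):
    # Inclination of the image of a short ray from z0 in the given direction.
    return cmath.phase(f(z0 + h * direction) - f(z0))

z0 = 0.3 + 0.4j
d1, d2 = cmath.exp(0.2j), cmath.exp(1.1j)     # directions at angles 0.2 and 1.1

conformal = lambda z: cmath.exp(z)            # analytic with non-zero derivative
isogonal = lambda z: z.conjugate()            # keeps angle size, flips orientation

a = tangent_angle(conformal, z0, d2) - tangent_angle(conformal, z0, d1)
b = tangent_angle(isogonal, z0, d2) - tangent_angle(isogonal, z0, d1)
print(round(a, 4))   # ~ 0.9, the original angle 1.1 - 0.2
print(round(b, 4))   # ~ -0.9, same size but reversed orientation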
The non-vanishing of the derivative of f has certain implications, which we shall see now.
Theorem 13.2.2. If f(z) is analytic at z0 with f′(z0) ≠ 0, then f(z) is one-to-one in some neighbourhood of
z0.
Proof. Since f′(z0) ≠ 0 and f′(z) is continuous at z0, there exists a δ > 0 such that
|f′(z) − f′(z0)| < |f′(z0)|/2 for all |z − z0| < δ.
Let z1 and z2 be two distinct points in |z − z0| < δ, and let γ be the line segment connecting z1 and z2. Set
φ(z) = f(z) − f′(z0) z, so that |φ′(z)| < |f′(z0)|/2 for all |z − z0| < δ. Now we have
|φ(z2) − φ(z1)| = |∫_γ φ′(z) dz| < (|f′(z0)|/2) |z2 − z1|,
or equivalently,
|f(z2) − f(z1) − f′(z0)(z2 − z1)| < (|f′(z0)|/2) |z2 − z1|.
Thus, by the triangle inequality, we obtain
|f(z2) − f(z1)| ≥ |f′(z0)| |z2 − z1| − |f(z2) − f(z1) − f′(z0)(z2 − z1)| > (|f′(z0)|/2) |z2 − z1| > 0,
so that f(z1) ≠ f(z2). Hence f is one-to-one in |z − z0| < δ.
The vanishing of a derivative does not preclude the possibility of a real function being one-to-one. Although
the derivative of f(x) = x³ is zero at the origin, the function is still one-to-one on the real line. That this
cannot occur for complex functions is seen by the following theorem.
Theorem 13.2.3. If f (z) is analytic and one-to-one in a domain D, then f 0 (z) 6= 0 in D, so that f is conformal
in D.
Proof. If f′(z) = 0 at some point z0 in D, then
f(z) − f(z0) = [f″(z0)/2!] (z − z0)² + · · ·
has a zero of order k (k ≥ 2) at z0. Since zeros of an analytic function are isolated, there exists an r > 0
so small that both f(z) − f(z0) and f′(z) have no zeros in the punctured disk 0 < |z − z0| ≤ r. Let
g(z) := f(z) − f(z0), C = {z : |z − z0| = r} and m = min_{z∈C} |g(z)|.
Then g has a zero of order k (k ≥ 2) and m > 0. Let b be such that 0 < |b − f(z0)| < m. Then, as
m ≤ |g(z)| on C,
|f(z0) − b| < |g(z)| on C.
It follows from Rouché's theorem that g(z) and g(z) + (f(z0) − b) = f(z) − b have the same number of zeros
inside C. Thus, f(z) − b has at least two zeros inside C. Observe that none of these zeros can be at z0. Since
f′(z) ≠ 0 in the punctured disk 0 < |z − z0| ≤ r, these zeros must be simple and so, distinct. Thus, f(z) = b
at two or more points inside C. This contradicts the fact that f is one-to-one on D.
We sum up our results for differentiable functions. In the real case, the nonvanishing of a derivative on an
interval is a sufficient but not a necessary condition for the function to be one-to-one on the interval; whereas
in the complex case, the nonvanishing of a derivative on a domain is a necessary but not a sufficient condition
for the function to be one-to-one on the domain.
An analytic function f : D → C is called locally bianalytic at z0 ∈ D if there exists a neighbourhood
N of z0 such that the restriction of f from N onto f(N) is bianalytic. Clearly, a locally bianalytic map on D
need not be bianalytic on D, as the example f(z) = z^n (n ≥ 2) on C − {0} illustrates.
Combining Theorems 13.2.2 and 13.2.3 leads to the following criterion for local bianalytic maps.
Theorem 13.2.4. Let f(z) be analytic in a domain D and z0 ∈ D. Then f is locally bianalytic at z0 if and only if f′(z0) ≠ 0.
A sufficient condition for an analytic function to be one-to-one in a simply connected domain is that it be
one-to-one on its boundary. More formally, we have
Theorem 13.2.5. Let f (z) be analytic in a simply connected domain D and on its boundary, the simple closed
contour C. If f (z) is one-to-one on C, then f (z) is one-to-one in D.
Proof. (See fig. 13.3.) Choose a point z0 ∈ D such that w0 = f(z0) ≠ f(z) for z on C. According to
the argument principle, the number of zeros of f(z) − f(z0) in D is given by
(1/2πi) ∫_C f′(z)/(f(z) − f(z0)) dz.
By hypothesis, the image of C must be a simple closed contour, which we shall denote by C′. Thus the net
change in the argument of w − w0 = f(z) − f(z0) as w = f(z) traverses the contour C′ is either +2π or
−2π, according to whether the contour is traversed counterclockwise or clockwise. Since f(z) assumes the
value w0 at least once in D, we must have
(1/2πi) ∫_C f′(z)/(f(z) − f(z0)) dz = (1/2πi) ∫_{C′} dw/(w − w0) = 1.
Figure 13.3
This proves the theorem for all points z0 in D at which f(z) ≠ f(z0) when z is on C. If f(z) = f(z0)
at some point on C, then the expression (1/2πi) ∫_C f′(z)/(f(z) − f(z0)) dz is not defined. We leave for the reader the
completion of the proof in this special case.
13.3 Conformal Equivalences and Examples

Example 13.3.1. Let H = {z ∈ C : Im z > 0} be the upper half-plane. A remarkable fact, which at first
seems surprising, is that the unbounded set H is conformally equivalent to the unit disc. Moreover, an explicit
formula giving this equivalence exists. Indeed, let
F(z) = (i − z)/(i + z) and G(w) = i (1 − w)/(1 + w).
Then it is a routine exercise to check that the map F : H → D is conformal, with inverse G : D → H. An
interesting aspect of these functions is their behaviour on the boundaries of our open sets. Observe that F is
analytic everywhere on C except at z = −i, and in particular it is continuous everywhere on the boundary of
H, namely, the real line. If we take z = x real, then the distance from x to i is the same as the distance from
x to −i, therefore |F (x)| = 1. Thus, F maps R onto the boundary of D. We get more information by writing
F(x) = (i − x)/(i + x) = (1 − x²)/(1 + x²) + i · 2x/(1 + x²),
and parametrizing the real line by x = tan t with t ∈ (−π/2, π/2). Since
sin 2a = 2 tan a/(1 + tan² a) and cos 2a = (1 − tan² a)/(1 + tan² a),
we have F(x) = cos 2t + i sin 2t = e^{2it}. Hence the image of the real line is the arc consisting of the unit circle
omitting the point −1. Moreover, as x travels from −∞ to ∞, F(x) travels along this arc, starting from −1
and first going through that part of the circle that lies in the lower half-plane. The point −1 on the circle
corresponds to the ”point at infinity” of the upper half-plane.
Figure 13.4
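These formulas are easy to test numerically. The following Python sketch (our own check, with arbitrarily chosen sample points) verifies that F sends points of H into D, that G inverts F, and that real boundary points land on |w| = 1.

import random

F = lambda z: (1j - z) / (1j + z)       # upper half-plane -> unit disc
G = lambda w: 1j * (1 - w) / (1 + w)    # unit disc -> upper half-plane

random.seed(0)
for _ in range(5):
    z = random.uniform(-5, 5) + 1j * random.uniform(0.01, 5)   # a point of H
    w = F(z)
    assert abs(w) < 1                   # the image lies in the unit disc
    assert abs(G(w) - z) < 1e-9         # G undoes F

print(abs(F(0.0)), abs(F(2.0)))         # both 1.0: the real axis maps to |w| = 1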
Example 13.3.2. Consider next the map f(z) = (1 + z)/(1 − z), which takes the unit disc D conformally onto the right half-plane. As θ travels from 0 to π we see that f(e^{iθ}) travels along the imaginary axis from infinity to 0. Moreover, if
z = x is real, then
f(x) = (1 + x)/(1 − x)
is also real; and one sees from this that f is actually a bijection from (−1, 1) to the positive real axis, with
f(x) increasing from 0 to infinity as x travels from −1 to 1. Note also that f(0) = 1.
Exercise 13.3.1. 1. Show that for h ∈ C, the translation map f (z) = z + h is a conformal map from C to
itself.
2. Show that the map f(z) = e^{iz} takes the half-strip {z = x + iy : −π/2 < x < π/2, y > 0} conformally
to the half-disc {w = u + iv : |w| < 1, u > 0}.
13.4 Möbius Transformations

A mapping of the form
f(z) = (az + b)/(cz + d)   (13.4.1)
is called a linear fractional transformation. If ad − bc ≠ 0, then f(z) is called a Möbius Transformation. If f is a
Möbius Transformation, then
f⁻¹(z) = (dz − b)/(−cz + a)
is the inverse map of f . Also, if f and g are two linear fractional transformations, then their composition f ◦ g
is also so. Hence, the set of all Möbius Transformations form a group under group composition.
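The group structure is conveniently encoded by coefficient matrices: composition of Möbius maps corresponds to multiplication of the matrices ((a, b), (c, d)). The short Python sketch below (our own illustration; moebius and compose are hypothetical helper names) checks that composing (13.4.1) with the stated inverse yields the identity map.

def moebius(M, z):
    # Apply the Möbius map with coefficient matrix M = ((a, b), (c, d)).
    (a, b), (c, d) = M
    return (a * z + b) / (c * z + d)

def compose(M, N):
    # Matrix product: coefficient matrix of z -> moebius(M, moebius(N, z)).
    (a, b), (c, d) = M
    (p, q), (r, s) = N
    return ((a * p + b * r, a * q + b * s), (c * p + d * r, c * q + d * s))

M = ((1, 2j), (3, 4))           # ad - bc = 4 - 6j, non-zero
Minv = ((4, -2j), (-3, 1))      # ((d, -b), (-c, a)), the inverse from the text

z = 0.5 + 0.25j
print(abs(moebius(compose(M, Minv), z) - z) < 1e-9)   # True: the identity map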
Theorem 13.4.1. If f is a Möbius Transformation, then f is the composition of translations, dilations and
inversion.
The fixed points of a Möbius Transformation (13.4.1) are the points where f (z) = z, that is,
cz 2 + (d − a)z − b = 0.
Hence a Möbius Transformation has at most two fixed points unless it is the identity transformation.
Now, let f be a Möbius Transformation and let a, b, c be distinct points in C∞ such that f (a) = α, f (b) =
β, f (c) = γ. Suppose that g is another Möbius Transformation with the same property. Then g −1 ◦ f
has a, b and c as fixed points and hence it is the identity transformation and thus, f ≡ g. Thus a Möbius
Transformation is uniquely determined by its action on three points in C∞ .
Let z2, z3 and z4 be points in C∞. Define f : C∞ → C∞ by
f(z) = [(z − z3)/(z − z4)] · [(z2 − z4)/(z2 − z3)] if z2, z3, z4 ∈ C;
f(z) = (z − z3)/(z − z4) if z2 = ∞;
f(z) = (z2 − z4)/(z − z4) if z3 = ∞;
f(z) = (z − z3)/(z2 − z3) if z4 = ∞.
In any case, f (z2 ) = 1, f (z3 ) = 0, f (z4 ) = ∞ and f is the only transformation having this property.
Definition 13.4.1. If z1 ∈ C∞, then the cross ratio of z1, z2, z3 and z4 is the image of z1 under the unique
Möbius transformation which takes z2 to 1, z3 to 0 and z4 to ∞. The cross ratio of z1, z2, z3 and z4 is denoted by
(z1, z2, z3, z4).
For example, (z2 , z2 , z3 , z4 ) = 1 and (z, 1, 0, ∞) = z. Also, if M is a Möbius map and w2 , w3 , w4 are the
points such that M w2 = 1, M w3 = 0 and M w4 = ∞, then M z = (z, w2 , w3 , w4 ).
Theorem 13.4.2. If z2, z3 and z4 are distinct points and T is any Möbius transformation, then (z1, z2, z3, z4) = (T z1, T z2, T z3, T z4) for any point z1.
Proof. Let S(z) = (z, z2, z3, z4). Then S is a Möbius map. If M = ST⁻¹, then M(T(z2)) = 1, M(T(z3)) =
0, M(T(z4)) = ∞. Hence, ST⁻¹(z) = (z, T(z2), T(z3), T(z4)) for all z ∈ C∞. In particular, putting z = T(z1),
the desired result follows.
Theorem 13.4.3. If z2 , z3 , z4 are distinct points in C∞ and w2 , w3 , w4 are also distinct points of C∞ , then
there is one and only one Möbius transformation S such that S(z2 ) = w2 , S(z3 ) = w3 , S(z4 ) = w4 .
Proof. Let T(z) = (z, z2, z3, z4), M(z) = (z, w2, w3, w4) and put S = M⁻¹T. Clearly, S has the desired
property. If R is another Möbius transformation with Rzj = wj for j = 2, 3, 4, then R⁻¹S has three fixed
points (z2, z3 and z4). Hence, R⁻¹S = I, that is, S = R.
It is well known that three points in the plane determine a circle. The next result explains when four points
lie on a circle.
Theorem 13.4.4. Let z1 , z2 , z3 , z4 be four distinct points in C∞ . Then (z1 , z2 , z3 , z4 ) is a real number iff all
four points lie on a circle.
Proof. Let S : C∞ → C∞ be defined by S(z) = (z, z2, z3, z4); then S⁻¹(R∞) is the set of all z such that
(z, z2, z3, z4) is real. Hence, we will be finished if we can show that the image of R∞ under a Möbius
map is a circle.
Let
S(z) = (az + b)/(cz + d).   (13.4.2)
If x ∈ R and w = S⁻¹(x), then x = S(w) and, since x is real, S(w) = conj(S(w)). That is,
(aw + b)/(cw + d) = (ā w̄ + b̄)/(c̄ w̄ + d̄).
Cross-multiplying this gives
(a c̄ − ā c)|w|² + (a d̄ − b̄ c) w + (b c̄ − ā d) w̄ + (b d̄ − b̄ d) = 0.   (13.4.3)
If a c̄ is real then a c̄ − ā c = 0; putting α = 2(a d̄ − b̄ c), β = i(b d̄ − b̄ d) and multiplying (13.4.3) by i gives
Im(α w) = β,   (13.4.4)
since β is real. That is, w lies on the line determined by (13.4.4) for fixed α and β. If a c̄ is not real then
(13.4.3) becomes
|w|² + γ̄ w + γ w̄ − δ = 0   (13.4.5)
for some constants γ ∈ C, δ ∈ R. Hence,
|w + γ| = λ,   (13.4.6)
where
λ = √(|γ|² + δ) = |ad − bc| / |a c̄ − ā c| > 0.
Since γ and λ are independent of x and since (13.4.6) is the equation of a circle, the proof is done.
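Theorem 13.4.4 is easy to probe numerically. In the Python sketch below (our own check; the sample points are arbitrary), four points chosen on a common circle give a real cross ratio, while a generic quadruple does not.

import cmath

def cross_ratio(z1, z2, z3, z4):
    # (z1, z2, z3, z4) = [(z1 - z3)/(z1 - z4)] * [(z2 - z4)/(z2 - z3)]
    return (z1 - z3) / (z1 - z4) * (z2 - z4) / (z2 - z3)

center, radius = 2 + 3j, 5.0
concyclic = [center + radius * cmath.exp(1j * t) for t in (0.3, 1.2, 2.5, 4.0)]
print(abs(cross_ratio(*concyclic).imag) < 1e-9)   # True: real cross ratio
print(cross_ratio(0, 1, 1j, 2 + 2j))              # non-real: not concyclic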
Theorem 13.4.5. A Möbius transformation takes circles onto circles.

Proof. Let Γ be any circle in C∞ and let S be any Möbius transformation. Let z2, z3, z4 be three distinct
points on Γ and put wj = S(zj) for j = 2, 3, 4. Then w2, w3, w4 determine a circle Γ′. We claim that
S(Γ) = Γ′. In fact, for any z in C∞,
(z, z2, z3, z4) = (S(z), w2, w3, w4),   (13.4.7)
by theorem 13.4.2. By the preceding theorem, if z is on Γ, then both sides of (13.4.7) are real. But this says
that S(z) ∈ Γ′.
Now, let Γ and Γ′ be two circles in C∞ and let z2, z3, z4 ∈ Γ and w2, w3, w4 ∈ Γ′. Put R(z) = (z, z2, z3, z4),
S(z) = (z, w2, w3, w4). Then T = S⁻¹ ◦ R maps Γ onto Γ′. In fact, T(zj) = wj for j = 2, 3, 4 and, as in the
above proof, it follows that T(Γ) = Γ′.
Theorem 13.4.6. For any given circles Γ and Γ′ in C∞, there is a Möbius transformation T such that T(Γ) =
Γ′. Furthermore, we can specify that T takes any three points on Γ onto any three points on Γ′. If we specify
T(zj) for j = 2, 3, 4 (distinct zj on Γ), then T is unique.
Now that we know that a Möbius map takes circles to circles, the next question is: What happens to the
inside and the outside of these circles? To answer this we introduce some new concepts.
Definition 13.4.2. Let Γ be a circle through the points z2, z3, z4. The points z, z* in C∞ are said to be symmetric
with respect to Γ if
(z*, z2, z3, z4) = (z̄, z̄2, z̄3, z̄4).   (13.4.8)
Figure 13.5: [the points z and z* placed symmetrically on either side of a line Γ.]
As it stands, this definition not only depends on the circle but also on the points z2 , z3 , z4 .
Also, by theorem 13.4.4, z is symmetric to itself with respect to Γ if and only if z ∈ Γ. Let us investigate
what it means for z and z* to be symmetric. If Γ is a straight line, then our linguistic prejudices lead us to
believe that z and z* are symmetric with respect to Γ if they are the same distance from Γ but on opposite
sides of it (see fig. 13.5).
If Γ is a straight line then, choosing z4 = ∞, (13.4.8) becomes
(z* − z3)/(z2 − z3) = (z̄ − z̄3)/(z̄2 − z̄3).
This gives |z* − z3| = |z − z3|. Since z3 was not specified, we have that z and z* are equidistant from each
point on Γ. Also,
Im[(z* − z3)/(z2 − z3)] = Im[(z̄ − z̄3)/(z̄2 − z̄3)] = −Im[(z − z3)/(z2 − z3)].
Hence, we have (unless z ∈ Γ) that z and z* lie in different half planes determined by Γ. It now follows that
the segment [z, z*] is perpendicular to Γ.
Now, suppose that Γ = {z : |z − a| = R} (0 < R < ∞). Let z2, z3, z4 be points on Γ. Using (13.4.8) and
theorem 13.4.2 for a number of Möbius transformations gives
(z*, z2, z3, z4) = (z̄, z̄2, z̄3, z̄4)
  = (z̄ − ā, z̄2 − ā, z̄3 − ā, z̄4 − ā)
  = (z̄ − ā, R²/(z2 − a), R²/(z3 − a), R²/(z4 − a))
  = (R²/(z̄ − ā), z2 − a, z3 − a, z4 − a)
  = (R²/(z̄ − ā) + a, z2, z3, z4),
where we have used that z̄j − ā = R²/(zj − a) for zj on Γ. Hence, z* = a + R²/(z̄ − ā), so that
(z* − a)/(z − a) = R²/|z − a|² > 0,
Figure 13.6
so that z* lies on the ray {a + t(z − a) : 0 < t < ∞} from a through z. Using the fact that |z − a||z* − a| = R²,
we obtain z* from z (if z lies inside Γ) as in figure 13.6. That is, let L be the ray from a through z.
Construct the line P perpendicular to L at z, and at the point where P intersects Γ construct the tangent to Γ.
The point of intersection of this tangent with L is the point z*. Thus, the points a and ∞ are symmetric with
respect to Γ.
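The formula z* = a + R²/(z̄ − ā) derived above is easily implemented. The following Python sketch (our own illustration) verifies the two geometric facts used in the construction: |z − a||z* − a| = R², and z* lies on the ray from a through z.

def symmetric_point(z, a, R):
    # z* = a + R^2 / conj(z - a), symmetric to z in the circle |w - a| = R.
    return a + R ** 2 / (z - a).conjugate()

a, R = 1 + 1j, 2.0
z = 1.5 + 1.8j                        # a point inside the circle
zs = symmetric_point(z, a, R)
print(abs(z - a) * abs(zs - a))       # R^2 = 4.0
print((zs - a) / (z - a))             # a positive real number: same ray from a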
Theorem 13.4.7. (Symmetry Principle) If a Möbius transformation T takes a circle Γ1 onto the circle Γ2 ,
then any pair of points symmetric with respect to Γ1 are mapped by T onto a pair of points symmetric with
respect to Γ2 .
13.5 Few Probable Questions

1. Define conformal maps. Show that a map f, analytic at z0 with f′(z0) ≠ 0, is one-to-one in a neighbourhood of z0.
3. If a function f is analytic in a simply connected domain D and on its boundary C (which is a simple
closed contour), then show that if f is one-to-one on C, it is one-to-one in D.
4. Define conformally equivalent regions. Show that the upper half disc {z : |z| < 1, Im z > 0} is
conformally equivalent to the first quadrant {w = u + iv : u > 0, v > 0}.
5. Define cross ratio of z1 , z2 , z3 , z4 . For z1 , z2 , z3 , z4 ∈ C∞ , show that the cross ratio is a real number if
and only if all the four points lie on a circle.
6. Show that a Möbius transformation takes circles into circles. When are two points said to be symmetric
with respect to a circle Γ?
Unit 14
Course Structure
• Schwarz principle of symmetry
14.1 Introduction
In mathematics, the Schwarz reflection principle, or the Schwarz principle of symmetry, is a way to extend
the domain of definition of a complex analytic function, i.e., it is a form of analytic continuation. It states that
if an analytic function is defined on the upper half-plane, and has well-defined (non-singular) real values on
the real axis, then it can be extended to the conjugate function on the lower half-plane as we shall see. This
unit is also dedicated to a preliminary study of the Schwarz-Christoffel mapping which is mainly a conformal
transformation of the upper half-plane onto the interior of a simple polygon. Schwarz–Christoffel mappings are
used in potential theory and some of its applications, including minimal surfaces and fluid dynamics. They
are named after Elwin Bruno Christoffel and Hermann Amandus Schwarz.
Objectives
After reading this unit, you will be able to
• define symmetric open set and deduce the symmetry principle
14.2 Schwarz Principle of Symmetry

A function defined on an interval of the real line can, in general, be extended beyond that interval in many ways; no uniqueness can be expected unless one imposes
more conditions on the extension. The situation is very different for holomorphic functions. Not only are these
functions indefinitely differentiable in their domain of definition, but they also have additional characteristi-
cally rigid properties, which make them difficult to mould. For example, there exist holomorphic functions in
a disc which are continuous on the closure of the disc, but which cannot be continued (analytically) into any
region larger than the disc.
Let Ω be an open subset of C that is symmetric with respect to the real line, that is,
z ∈ Ω if and only if z̄ ∈ Ω.
Let Ω+ denote the part of Ω that lies in the upper half-plane and Ω− the part that lies in the lower half-plane
(see fig. 14.1 for an illustration).
Figure 14.1
Also, let I = Ω ∩ R so that I denotes the interior of that part of the boundary of Ω+ and Ω− that lies on the
real axis. Then we have
Ω+ ∪ I ∪ Ω− = Ω
and the only interesting case of the next theorem occurs, of course, when I is non-empty.
Theorem 14.2.1. (Symmetry principle) If f+ and f− are analytic functions in Ω+ and Ω− respectively, which extend
continuously to I and satisfy
f+(x) = f−(x) for all x ∈ I,
then the function f defined on Ω by
f(z) = f+(z) if z ∈ Ω+,
f(z) = f+(z) = f−(z) if z ∈ I,
f(z) = f−(z) if z ∈ Ω−,
is analytic on all of Ω.
Proof. One notes first that f is continuous throughout Ω. The only difficulty is to prove that f is analytic at
points of I. Suppose D is a disc centred at a point of I and entirely contained in Ω. We prove that f is analytic
in D by Morera's theorem. Suppose T is a triangle in D. If T does not intersect I, then
∫_T f(z) dz = 0,
since f is analytic in the upper and lower half-discs. Suppose now that one side or vertex of T is contained
in I, and the rest of T is in, say, the upper half-disc. If Tε is the triangle obtained from T by slightly raising
the edge or vertex which lies on I, we have ∫_{Tε} f = 0, since Tε is entirely contained in the upper half-disc
(an illustration of the case when an edge lies on I is given in Figure 14.2). When we let ε → 0, by continuity
we conclude that
∫_T f(z) dz = 0.
If the interior of T intersects I, we can reduce the situation to the previous one by writing T as the union of
triangles, each of which has an edge or vertex on I, as shown in Figure 14.3. By Morera's theorem we conclude
that f is analytic in D, as was to be shown.
We can now state the extension principle, where we use the above notation.
Theorem 14.2.2. (Schwarz reflection principle) Suppose that f is an analytic function in Ω+ that extends
continuously to I and such that f is real-valued on I. Then there exists a function F analytic in all of Ω such
that F = f on Ω+ .
Proof. Define F on Ω by F(z) = f(z) for z ∈ Ω+ ∪ I and
F(z) = conj(f(z̄)) for z ∈ Ω−,
where conj(·) denotes complex conjugation. To prove that F is analytic in Ω−, we note that if z, z0 ∈ Ω−, then z̄, z̄0 ∈ Ω+ and hence the power series
expansion of f near z̄0 gives
f(z̄) = Σ_n an (z̄ − z̄0)^n.
As a consequence we see that
F(z) = Σ_n ān (z − z0)^n,
and F is analytic in Ω−. Since f is real-valued on I, we have f(x) = conj(f(x)) whenever x ∈ I, and hence F
extends continuously up to I. The proof is complete once we invoke the symmetry principle.
14.3 Schwarz–Christoffel Formula

In particular, if C is a segment of the x-axis with positive sense to the right, then t = 1 and arg t = 0 at each
point z0 = x on C. In that case, equation (14.3.1) becomes
arg τ = arg f′(x).   (14.3.2)
If f′(z) has a constant argument along that segment, it follows that arg τ is constant. Hence, the image Γ of
C is also a segment of a straight line.
Let us now construct a transformation w = f(z) that maps the whole x-axis onto a polygon of n sides,
where x1, x2, . . . , x_{n−1} and ∞ are the points on that axis whose images are to be the vertices of the polygon,
and where x1 < x2 < · · · < x_{n−1}. The vertices are the n points wj = f(xj) (j = 1, 2, . . . , n − 1) and
wn = f(∞). The function f should be such that arg f′(z) jumps from one constant value to another at the
points z = xj as the point z traces out the x-axis. If the function f is chosen such that
[Figure: the points x1, x2, x3, . . . on the x-axis of the z plane, and the vertices w1, w2, w3, . . . , wn of the polygon in the w plane.]
f′(z) = A (z − x1)^{−k1} (z − x2)^{−k2} · · · (z − x_{n−1})^{−k_{n−1}},   (14.3.3)
where A is a complex constant and each kj is a real constant, then the argument of f′(z) changes in the
prescribed manner as z describes the real axis. This is seen by writing the argument of the derivative (14.3.3) as
arg f′(z) = arg A − k1 arg(z − x1) − k2 arg(z − x2) − · · · − k_{n−1} arg(z − x_{n−1}).   (14.3.4)
When x1 < x < x2 , the argument arg(z − x1 ) is 0 and each of the other arguments is π. According to
equation (14.3.4), then arg f 0 (z) increases abruptly by the angle k1 π as z moves to the right through the point
z = x1 . It again jumps in value, by the amount k2 π, as z passes through the point x2 , etc.
In view of (14.3.2), the unit vector τ is constant in direction as z moves from xj−1 to xj ; the point w thus
moves in that fixed direction along a straight line. The direction of τ changes abruptly by the angle kj π at the
image point wj of xj . Those angles kj π are the exterior angles of the polygon described by the point w.
The exterior angles can be limited to angles between −π and π, in which case −1 < kj < 1. We assume
that the sides of the polygon never cross one another and that the polygon is given a positive, or
counterclockwise, orientation. The sum of the exterior angles of a closed polygon is then 2π, and the exterior
angle at the vertex wn, which is the image of the point z = ∞, can be written as
kn π = 2π − (k1 + k2 + · · · + kn−1 )π
Note that kn = 0 if k1 + k2 + · · · + kn−1 = 2. This means that the direction of τ does not change at the point
wn . So, wn is not a vertex, and the polygon has n − 1 sides.
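The behaviour described above can be traced numerically. The following Python sketch is a crude illustration under our own choices: x1 = −1, x2 = 1 and k1 = k2 = 2/3, which forces k3 = 2/3, the exterior angles of an equilateral triangle. It integrates f′(z) = (z + 1)^{−2/3}(z − 1)^{−2/3} along the real axis, nudged slightly into the upper half-plane so that the principal branch of the complex power varies continuously; the integrable singularities at ±1 are handled only roughly.

N = 4000
eps = 1e-9j                               # stay just inside the upper half-plane
xs = [-3 + 6 * k / N for k in range(N + 1)]

def fp(z):
    # f'(z) for the Schwarz-Christoffel map of a triangle, principal branches.
    return (z + 1) ** (-2 / 3) * (z - 1) ** (-2 / 3)

w, path = 0, []
for k in range(N):
    a, b = xs[k] + eps, xs[k + 1] + eps
    w += 0.5 * (fp(a) + fp(b)) * (b - a)  # trapezoidal step of w = integral of f'
    path.append(w)

# The image runs straight between the images of x = -1 and x = 1 and turns
# through the exterior angle k_j * pi = 60 degrees at each of them.
print(path[0], path[N // 2], path[-1])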
Unit 15

Course Structure
• Univalent functions, general theorems
15.1 Introduction
In mathematics, in the branch of complex analysis, an analytic function on an open subset of the complex plane
is called univalent if it is injective. The theory of univalent functions is an old subject, born around the turn of
the century, yet it remains an active field of research. This unit introduces the class S of univalent functions
and some of its subclasses defined by geometric conditions. A number of basic questions are answered by
elementary methods. Most of the results concerning the class S are direct consequences of the area theorem,
which may be regarded as the cornerstone of the entire subject.
Objectives
After reading this unit, you will be able to
A family F of analytic functions on a domain D is said to be locally bounded if the functions in F are
uniformly bounded on each compact subset of D. If F is a locally bounded family of analytic functions, then
by the Cauchy integral formula, the family of derivatives {f′ : f ∈ F} is also locally bounded.
We have the following theorem concerning locally bounded family of analytic functions.
Theorem 15.2.1. A necessary and sufficient condition for a family of analytic functions to be locally bounded
is that it is normal.
The Koebe function k(z) = z(1 − z)^{−2} = z + 2z² + 3z³ + · · · maps the disc D onto the entire plane minus the part of the negative real axis from −1/4
to infinity. This is best seen by writing
k(z) = (1/4)[(1 + z)/(1 − z)]² − 1/4
and observing that the function
w = (1 + z)/(1 − z)
maps D conformally onto the right half-plane Re{w} > 0.
Other simple examples of functions in S are:

1. f(z) = z, the identity mapping;

2. f(z) = z(1 − z)^{−1}, which maps D conformally onto the half-plane Re{w} > −1/2;

3. f(z) = z(1 − z²)^{−1}, which maps D onto the entire plane minus the two half-lines 1/2 ≤ x < ∞ and
−∞ < x ≤ −1/2;

4. f(z) = z − (1/2) z² = (1/2)[1 − (1 − z)²], which maps D onto the interior of a cardioid.
The sum of two functions in S need not be univalent. For example, the sum of z(1 − z)^{−1} and z(1 + iz)^{−1}
has a derivative which vanishes at (1 + i)/2 (verify!). However, the class S is preserved under a number of
elementary transformations.
1. Conjugation: If f ∈ S and
g(z) = conj(f(z̄)) = z + ā2 z² + ā3 z³ + · · · ,
then g ∈ S.

2. Rotation: If f ∈ S and
g(z) = e^{−iθ} f(e^{iθ} z),
then g ∈ S.

3. Dilation: If f ∈ S and
g(z) = (1/r) f(rz), where 0 < r < 1,
then g ∈ S.

4. Range transformation: If f ∈ S and ψ is a function analytic and univalent on the range of f, with
ψ(0) = 0 and ψ′(0) = 1, then g = ψ ◦ f ∈ S.

5. Omitted-value transformation: If f ∈ S and f(z) ≠ ω, then
g = ωf/(ω − f) ∈ S.

6. Square-root transformation: If f ∈ S and g(z) = √(f(z²)), then g ∈ S.
The square root transformation requires a word of explanation. Since f (z) = 0 only at the origin, a single
valued branch of the square root may be chosen by writing
g(z) = √(f(z²)) = z{1 + a2 z² + a3 z⁴ + · · ·}^{1/2}
  = z + c3 z³ + c5 z⁵ + · · · , |z| < 1.
Note that g is an odd analytic function, so that g(−z) = −g(z). If g(z1 ) = g(z2 ), then f (z12 ) = f (z22 ) and
z12 = z22 , which gives z1 = ±z2 . But, if z1 = −z2 , then g(z1 ) = g(z2 ) = −g(z1 ). Thus g(z1 ) = 0, and
z1 = 0. This shows that z1 = z2 in either case, proving that g is univalent.
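The square-root transformation can be seen in action on the Koebe function: since k(z²) = z²/(1 − z²)², the odd branch of √(k(z²)) is z/(1 − z²), which is exactly example 3 above. The Python sketch below (our own numerical check) confirms this identity and the oddness of g.

import cmath

k = lambda z: z / (1 - z) ** 2                    # the Koebe function

def g(z):
    # Branch of sqrt(k(z^2)) normalized so that g(z)/z -> 1 as z -> 0.
    return z * cmath.sqrt(k(z * z) / (z * z))

for z in (0.3 + 0.2j, -0.5j, 0.7):
    assert abs(g(z) - z / (1 - z * z)) < 1e-12    # g(z) = z/(1 - z^2)
    assert abs(g(-z) + g(z)) < 1e-12              # g is odd
print("sqrt-transform of the Koebe function is z/(1 - z^2)")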
Closely related to S is the class Σ of functions
g(z) = z + b0 + b1 z^{−1} + b2 z^{−2} + · · ·
which are analytic and univalent in the domain E = {z : |z| > 1} exterior to the closed unit disc, except for a simple pole
at infinity with residue 1. Each function g ∈ Σ maps E onto the complement of a compact connected set K.
It is useful to consider the subclass Σ′ of functions g ∈ Σ for which 0 ∈ K; that is, for which g(z) ≠ 0 in E.
Any function g ∈ Σ will belong to Σ′ after a suitable adjustment of the constant term b0. Such an adjustment
only translates the range of g and does not destroy the univalence.
For each f ∈ S, the function
g(z) = {f(1/z)}^{−1} = z − a2 + (a2² − a3) z^{−1} + · · ·
belongs to Σ′; this is an inversion. A square-root transformation g(z) ↦ {g(z²)}^{1/2} may also be performed within Σ.
It is important to observe that this operation cannot be applied to every function g ∈ Σ, but is permissible only
if g ∈ Σ′, because the square root would introduce a branch point wherever g(z²) = 0.
Sometimes, it is convenient to consider the subclass Σ0 consisting of all g ∈ Σ with b0 = 0. Obviously
this can be achieved by a suitable translation, but it may not be possible to translate a given function g ∈ Σ
simultaneously into both Σ′ and Σ0.
It is also useful to distinguish the subclass Σ̃ of all functions g ∈ Σ whose omitted set K has two-dimensional
Lebesgue measure zero. The functions g ∈ Σ̃ will be called full mappings.
3. Define a univalent function on a domain D. Show that for an analytic function f on D, f′(z0) ≠ 0 at
z0 ∈ D is equivalent to the local univalence of f at z0.
Unit 16
Course Structure
• Area theorem
16.1 Introduction
The univalence of a function
g(z) = z + b0 + Σ_{n=1}^{∞} bn z^{−n}, |z| > 1,
places strong restrictions on the size of the Laurent coefficients bn, n = 1, 2, . . .. This is expressed by the area
theorem, which is fundamental to the theory of univalent functions. The reason for the name will be apparent
from the proof. Gronwall discovered the theorem in 1914.
Objectives
After reading this unit, you will be able to
16.2 Area Theorem

Theorem 16.2.1. (Area Theorem) If g ∈ Σ, then Σ_{n=1}^{∞} n |bn|² ≤ 1.

Proof. Let E be the set omitted by g. For r > 1, let Cr be the image under g of the circle |z| = r. Since g is
univalent, Cr is a simple closed curve which encloses a domain Er ⊃ E. By Green's theorem, the area of Er
is
Ar = (1/2i) ∫_{Cr} w̄ dw = (1/2i) ∫_{|z|=r} conj(g(z)) g′(z) dz
  = (1/2) ∫_0^{2π} { r e^{−iθ} + Σ_{n=0}^{∞} b̄n r^{−n} e^{inθ} } { 1 − Σ_{v=1}^{∞} v bv r^{−v−1} e^{−i(v+1)θ} } r e^{iθ} dθ
  = π { r² − Σ_{n=1}^{∞} n |bn|² r^{−2n} },  r > 1.   (16.2.1)
Since Ar ≥ 0 for every r > 1, letting r → 1 in (16.2.1) gives Σ_{n=1}^{∞} n |bn|² ≤ 1.
An immediate corollary is the inequality |bn| ≤ n^{−1/2}, n = 1, 2, . . .. This inequality is not sharp if n ≥ 2,
since the function
g(z) = z + n^{−1/2} z^{−n}
is not univalent. Indeed, its derivative
g′(z) = 1 − n^{1/2} z^{−n−1}
vanishes at certain points of |z| > 1 if n ≥ 2. However, the inequality |b1| ≤ 1 is sharp and has important
consequences.
Corollary 16.2.1. If g ∈ Σ, then |b1| ≤ 1, with equality if and only if g has the form
g(z) = z + b0 + b1/z, |b1| = 1.
This is a conformal mapping of the exterior region |z| > 1 onto the complement of a line segment of length 4.
From this result it is a short step to a theorem of Bieberbach estimating the second coefficient a2 of a
function of class S. This theorem was given in 1916 and was the main basis for the famous Bieberbach
conjecture.
Theorem 16.2.2. (Bieberbach’s Theorem). If f ∈ S, then |a2 | ≤ 2, with equality if and only if f is a rotation
of the Koebe function.
Proof. A square-root transformation followed by an inversion applied to f ∈ S produces the function
g(z) = 1/h(1/z) = z − (a2/2) z^{−1} + · · · ∈ Σ, where h(z) = √(f(z²)).
By Corollary 16.2.1, |a2/2| ≤ 1, that is, |a2| ≤ 2, with equality if and only if
g(z) = z − e^{iθ}/z
for some real θ; tracing back through the transformations, this happens precisely when f is a rotation of the
Koebe function.
As a first application of Bieberbach's theorem, we shall now prove a famous covering theorem due to
Koebe. Each function f ∈ S is an open mapping with f(0) = 0, so its range contains some disk centered
at the origin. As early as 1907, Koebe discovered that the ranges of all functions in S contain a common
disk |w| < ρ, where ρ is an absolute constant. The Koebe function shows that ρ ≤ 1/4, and Bieberbach later
established Koebe's conjecture that ρ may be taken to be 1/4.
Theorem 16.2.3. (Koebe One-Quarter Theorem) The range of every function of class S contains the disk
{w : |w| < 1/4}.
Proof. Suppose f ∈ S omits the value ω ∈ C. Then
g(z) = ωf(z)/(ω − f(z)) = z + (a2 + 1/ω) z² + · · ·
is analytic and univalent in D. This is the omitted-value transformation, which is the composition of f with a
linear fractional mapping. Since g ∈ S, Bieberbach's theorem gives
|a2 + 1/ω| ≤ 2.
Combined with the inequality |a2| ≤ 2, this shows that |1/ω| ≤ 4, or |ω| ≥ 1/4. Thus every omitted value must
lie outside the disk |w| < 1/4.
This proof actually shows that the Koebe function and its rotations are the only functions in S which omit
a value of modulus 1/4. Thus the range of every other function in S covers a disk of larger radius.
It should be observed that univalence is the key to Koebe’s theorem. For example, the analytic functions
fn(z) = (1/n)(e^{nz} − 1), n = 1, 2, . . . ,
have the properties fn(0) = 0 and fn′(0) = 1, yet fn omits the value −1/n, which may be chosen arbitrarily
close to the origin.
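Both extremal statements can be checked numerically for the Koebe function. The Python sketch below (our own check; r and N are arbitrary discretization choices) recovers the Taylor coefficients an = n of k(z) = z/(1 − z)² by a discretized Cauchy integral, so that |a2| = 2 is attained, and evaluates k near z = −1, where the image comes arbitrarily close to the omitted value −1/4.

import cmath

k = lambda z: z / (1 - z) ** 2       # the Koebe function, sum of n z^n

r, N = 0.5, 512
for n in (1, 2, 3, 4):
    # Discretized Cauchy formula: a_n = (1/2 pi i) * integral of k(z)/z^(n+1) dz
    a_n = sum(k(r * cmath.exp(2j * cmath.pi * m / N))
              * cmath.exp(-2j * cmath.pi * m * n / N)
              for m in range(N)) / (N * r ** n)
    print(n, round(a_n.real, 6))     # prints a_n = n; in particular a_2 = 2

print(k(-0.999))                     # about -0.25: approaching the point -1/4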
16.3 Growth and Distortion Theorems

| zf″(z)/f′(z) − 2r²/(1 − r²) | ≤ 4r/(1 − r²),  |z| = r < 1.   (16.3.1)
The distortion theorem will now be applied to obtain the sharp upper and lower bounds for |f (z)|. This
result is as follows.
Theorem 16.3.3. (Growth Theorem). For each f ∈ S,
r/(1 + r)² ≤ |f(z)| ≤ r/(1 − r)²,  |z| = r < 1.   (16.3.6)
For each z ∈ D, z 6= 0, equality occurs if and only if f is a suitable rotation of the Koebe function.
Proof. Let f ∈ S and fix z = r e^{iθ}, 0 < r < 1. Observe that
f(z) = ∫_0^r f′(ρ e^{iθ}) e^{iθ} dρ,
so that, by the upper bound in the distortion theorem (16.3.3),
|f(z)| ≤ ∫_0^r |f′(ρ e^{iθ})| dρ ≤ ∫_0^r (1 + ρ)/(1 − ρ)³ dρ = r/(1 − r)².
The lower estimate is more subtle. It holds trivially if |f(z)| ≥ 1/4, since r(1 + r)^{−2} < 1/4 for 0 < r < 1. If
|f(z)| < 1/4, the Koebe one-quarter theorem implies that the radial segment from 0 to f(z) lies entirely in the
range of f. Let C be the preimage of this segment. Then C is a simple arc from 0 to z, and
f(z) = ∫_C f′(ζ) dζ.
But f′(ζ) dζ has constant signum along C, by construction, so the distortion theorem gives
|f(z)| = ∫_C |f′(ζ)| |dζ| ≥ ∫_0^r (1 − ρ)/(1 + ρ)³ dρ = r/(1 + r)².
Equality in either part of (16.3.6) implies equality in the corresponding part of (16.3.3), which implies that f
is a rotation of the Koebe function.
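The bounds (16.3.6) are easy to test by sampling. In the Python sketch below (our own check on three of the functions in S listed earlier), the modulus of each function on the circle |z| = 0.8 stays between r(1 + r)^{−2} and r(1 − r)^{−2}, and the Koebe function attains both bounds.

import cmath

samples = {
    "koebe": lambda z: z / (1 - z) ** 2,      # extremal for both bounds
    "half-plane": lambda z: z / (1 - z),      # z(1 - z)^(-1) in S
    "cardioid": lambda z: z - z * z / 2,      # z - z^2/2 in S
}

r = 0.8
lo, hi = r / (1 + r) ** 2, r / (1 - r) ** 2
for name, f in samples.items():
    mods = [abs(f(r * cmath.exp(2j * cmath.pi * m / 360))) for m in range(360)]
    assert lo - 1e-9 <= min(mods) and max(mods) <= hi + 1e-9
    print(name, round(min(mods), 4), round(max(mods), 4))
# For the Koebe function, min = r/(1+r)^2 and max = r/(1-r)^2 exactly.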
All of this information was obtained by passing to the real part in the basic inequality (16.3.1). Taking the
imaginary part instead, one finds
−4r/(1 − r²) ≤ Im{ zf″(z)/f′(z) } ≤ 4r/(1 − r²),
and since (∂/∂r) arg f′(r e^{iθ}) = (1/r) Im{ zf″(z)/f′(z) },
−4/(1 − r²) ≤ (∂/∂r) arg f′(r e^{iθ}) ≤ 4/(1 − r²).
Radial integration now produces the inequality
| arg f′(z) | ≤ 2 log (1 + r)/(1 − r),  f ∈ S.   (16.3.7)
Here it is understood that arg f′(z) is the branch which vanishes at the origin. The quantity arg f′(z) can
be interpreted geometrically as the local rotation factor under the conformal mapping f. For this reason the
inequality (16.3.7) may be called a rotation theorem. Unfortunately, however, it is not sharp at any point
z ≠ 0 in the disk. The true rotation theorem
| arg f′(z) | ≤ 4 sin⁻¹ r for r ≤ 1/√2,
| arg f′(z) | ≤ π + log [ r²/(1 − r²) ] for r ≥ 1/√2,
lies much deeper. The splitting of the sharp bound at r = 1/√2 is one of the most remarkable phenomena in
univalent function theory.
One further inequality, a combined growth and distortion theorem, is sometimes useful.