Introduction to Probability Models Eleventh Edition

Sheldon M. Ross University of Southern California Los Angeles, California


Academic Press is an imprint of Elsevier
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands
225 Wyman Street, Waltham, MA 02451, USA
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA

Eleventh edition 2014
Tenth edition: 2010
Ninth edition: 2007
Eighth edition: 2003
Earlier editions: 2000, 1997, 1993, 1989, 1985, 1980, 1972

Copyright © 2014 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher. Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: [email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material.

Notice: No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

Library of Congress Cataloging-in-Publication Data
Ross, Sheldon M., author.
Introduction to probability models / by Sheldon Ross. – Eleventh edition.
pages cm
Includes bibliographical references and index.
ISBN 978-0-12-407948-9
1. Probabilities. I. Title.
QA273.R84 2014
519.2–dc23
2013035819

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-407948-9

For information on all Academic Press publications visit our web site at store.elsevier.com

Printed and bound in USA

Preface

This text is intended as an introduction to elementary probability theory and stochastic processes. It is particularly well suited for those wanting to see how probability theory can be applied to the study of phenomena in fields such as engineering, computer science, management science, the physical and social sciences, and operations research. It is generally felt that there are two approaches to the study of probability theory. One approach is heuristic and nonrigorous and attempts to develop in the student an intuitive feel for the subject that enables him or her to “think probabilistically.” The other approach attempts a rigorous development of probability by using the tools of measure theory. It is the first approach that is employed in this text. However, because it is extremely important in both understanding and applying probability theory to be able to “think probabilistically,” this text should also be useful to students interested primarily in the second approach.

New to This Edition

The eleventh edition includes new text material, examples, and exercises chosen not only for their inherent interest and applicability but also for their usefulness in strengthening the reader's probabilistic knowledge and intuition. The new text material includes Section 2.7, which builds on the inclusion/exclusion identity to find the distribution of the number of events that occur; and Section 3.6.6 on left skip free random walks, which can be used to model the fortunes of an investor (or gambler) who always invests 1 and then receives a nonnegative integral return. Section 4.2 has additional material on Markov chains that shows how to modify a given chain when trying to determine such things as the probability that the chain ever enters a given class of states by some time, or the conditional distribution of the state at some time given that the class has never been entered. A new remark in Section 7.2 shows that results from the classical insurance ruin model also hold in other important ruin models. There is new material on exponential queueing models, including, in Section 2.2, a determination of the mean and variance of the number of lost customers in a busy period of a finite capacity queue, as well as the new Section 8.3.3 on birth and death queueing models. Section 11.8.2 gives a new approach that can be used to simulate the exact stationary distribution of a Markov chain that satisfies a certain property. Among the newly added examples are 1.11, which is concerned with a multiple player gambling problem; 3.20, which finds the variance in the matching rounds problem; 3.30, which deals with the characteristics of a random selection from a population; and 4.25, which deals with the stationary distribution of a Markov chain.


Course

Ideally, this text would be used in a one-year course in probability models. Other possible courses would be a one-semester course in introductory probability theory (involving Chapters 1–3 and parts of others) or a course in elementary stochastic processes. The textbook is designed to be flexible enough to be used in a variety of possible courses. For example, I have used Chapters 5 and 8, with smatterings from Chapters 4 and 6, as the basis of an introductory course in queueing theory.

Examples and Exercises

Many examples are worked out throughout the text, and there are also a large number of exercises to be solved by students. More than 100 of these exercises have been starred and their solutions provided at the end of the text. These starred problems can be used for independent study and test preparation. An Instructor's Manual, containing solutions to all exercises, is available free to instructors who adopt the book for class.

Organization

Chapters 1 and 2 deal with basic ideas of probability theory. In Chapter 1 an axiomatic framework is presented, while in Chapter 2 the important concept of a random variable is introduced. Section 2.6.1 gives a simple derivation of the joint distribution of the sample mean and sample variance of a normal data sample.

Chapter 3 is concerned with the subject matter of conditional probability and conditional expectation. "Conditioning" is one of the key tools of probability theory, and it is stressed throughout the book. When properly used, conditioning often enables us to easily solve problems that at first glance seem quite difficult. The final section of this chapter presents applications to (1) a computer list problem, (2) a random graph, and (3) the Polya urn model and its relation to the Bose-Einstein distribution. Section 3.6.5 presents k-record values and the surprising Ignatov's theorem.

In Chapter 4 we come into contact with our first random, or stochastic, process, known as a Markov chain, which is widely applicable to the study of many real-world phenomena. Applications to genetics and production processes are presented. The concept of time reversibility is introduced and its usefulness illustrated. Section 4.5.3 presents an analysis, based on random walk theory, of a probabilistic algorithm for the satisfiability problem. Section 4.6 deals with the mean times spent in transient states by a Markov chain. Section 4.9 introduces Markov chain Monte Carlo methods. In the final section we consider a model for optimally making decisions known as a Markovian decision process.

In Chapter 5 we are concerned with a type of stochastic process known as a counting process. In particular, we study a kind of counting process known as a Poisson process. The intimate relationship between this process and the exponential distribution is discussed. New derivations for the Poisson and nonhomogeneous Poisson processes are discussed. Examples relating to analyzing greedy algorithms, minimizing highway encounters, collecting coupons, and tracking the AIDS virus, as well as material on compound Poisson processes, are included in this chapter. Section 5.2.4 gives a simple derivation of the convolution of exponential random variables.

Chapter 6 considers Markov chains in continuous time with an emphasis on birth and death models. Time reversibility is shown to be a useful concept, as it is in the study of discrete-time Markov chains. Section 6.7 presents the computationally important technique of uniformization.

Chapter 7, the renewal theory chapter, is concerned with a type of counting process more general than the Poisson. By making use of renewal reward processes, limiting results are obtained and applied to various fields. Section 7.9 presents new results concerning the distribution of time until a certain pattern occurs when a sequence of independent and identically distributed random variables is observed. In Section 7.9.1, we show how renewal theory can be used to derive both the mean and the variance of the length of time until a specified pattern appears, as well as the mean time until one of a finite number of specified patterns appears. In Section 7.9.2, we suppose that the random variables are equally likely to take on any of m possible values, and compute an expression for the mean time until a run of m distinct values occurs. In Section 7.9.3, we suppose the random variables are continuous and derive an expression for the mean time until a run of m consecutive increasing values occurs.

Chapter 8 deals with queueing, or waiting line, theory. After some preliminaries dealing with basic cost identities and types of limiting probabilities, we consider exponential queueing models and show how such models can be analyzed. Included in the models we study is the important class known as a network of queues. We then study models in which some of the distributions are allowed to be arbitrary. Included are Section 8.6.3, dealing with an optimization problem concerning a single server, general service time queue, and Section 8.8, concerned with a single server, general service time queue in which the arrival source is a finite number of potential users.

Chapter 9 is concerned with reliability theory. This chapter will probably be of greatest interest to the engineer and operations researcher. Section 9.6.1 illustrates a method for determining an upper bound for the expected life of a parallel system of not necessarily independent components, and Section 9.7.1 analyzes a series structure reliability model in which components enter a state of suspended animation when one of their cohorts fails.

Chapter 10 is concerned with Brownian motion and its applications. The theory of options pricing is discussed. Also, the arbitrage theorem is presented and its relationship to the duality theorem of linear programming is indicated. We show how the arbitrage theorem leads to the Black–Scholes option pricing formula.

Chapter 11 deals with simulation, a powerful tool for analyzing stochastic models that are analytically intractable. Methods for generating the values of arbitrarily distributed random variables are discussed, as are variance reduction methods for increasing the efficiency of the simulation. Section 11.6.4 introduces the valuable simulation technique of importance sampling, and indicates the usefulness of tilted distributions when applying this method.


Acknowledgments

We would like to acknowledge with thanks the helpful suggestions made by the many reviewers of the text. These comments have been essential in our attempt to continue to improve the book and we owe these reviewers, and others who wish to remain anonymous, many thanks:

Mark Brown, City University of New York
Zhiqin Ginny Chen, University of Southern California
Tapas Das, University of South Florida
Israel David, Ben-Gurion University
Jay Devore, California Polytechnic Institute
Eugene Feinberg, State University of New York, Stony Brook
Ramesh Gupta, University of Maine
Marianne Huebner, Michigan State University
Garth Isaak, Lehigh University
Jonathan Kane, University of Wisconsin Whitewater
Amarjot Kaur, Pennsylvania State University
Zohel Khalil, Concordia University
Eric Kolaczyk, Boston University
Melvin Lax, California State University, Long Beach
Jean Lemaire, University of Pennsylvania
Andrew Lim, University of California, Berkeley
George Michailidis, University of Michigan
Donald Minassian, Butler University
Joseph Mitchell, State University of New York, Stony Brook
Krzysztof Ostaszewski, University of Illinois
Erol Pekoz, Boston University
Evgeny Poletsky, Syracuse University
James Propp, University of Massachusetts, Lowell
Anthony Quas, University of Victoria
Charles H. Roumeliotis, Proofreader
David Scollnik, University of Calgary
Mary Shepherd, Northwest Missouri State University
Galen Shorack, University of Washington, Seattle
Marcus Sommereder, Vienna University of Technology
Osnat Stramer, University of Iowa
Gabor Szekely, Bowling Green State University
Marlin Thomas, Purdue University
Henk Tijms, Vrije University
Zhenyuan Wang, University of Binghamton
Ward Whitt, Columbia University
Bo Zhang, Georgia Institute of Technology
Julie Zhou, University of Victoria

Introduction to Probability Theory

1.1 Introduction

Any realistic model of a real-world phenomenon must take into account the possibility of randomness. That is, more often than not, the quantities we are interested in will not be predictable in advance but, rather, will exhibit an inherent variation that should be taken into account by the model. This is usually accomplished by allowing the model to be probabilistic in nature. Such a model is, naturally enough, referred to as a probability model. The majority of the chapters of this book will be concerned with different probability models of natural phenomena. Clearly, in order to master both the “model building” and the subsequent analysis of these models, we must have a certain knowledge of basic probability theory. The remainder of this chapter, as well as the next two chapters, will be concerned with a study of this subject.

1.2 Sample Space and Events

Suppose that we are about to perform an experiment whose outcome is not predictable in advance. However, while the outcome of the experiment will not be known in advance, let us suppose that the set of all possible outcomes is known. This set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.


Some examples are the following.

1. If the experiment consists of the flipping of a coin, then

S = {H, T}

where H means that the outcome of the toss is a head and T that it is a tail.

2. If the experiment consists of rolling a die, then the sample space is

S = {1, 2, 3, 4, 5, 6}

where the outcome i means that i appeared on the die, i = 1, 2, 3, 4, 5, 6.

3. If the experiment consists of flipping two coins, then the sample space consists of the following four points:

S = {(H, H), (H, T), (T, H), (T, T)}

The outcome will be (H, H) if both coins come up heads; it will be (H, T) if the first coin comes up heads and the second comes up tails; it will be (T, H) if the first comes up tails and the second heads; and it will be (T, T) if both coins come up tails.

4. If the experiment consists of rolling two dice, then the sample space consists of the following 36 points:

S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
     (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
     (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
     (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
     (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
     (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}

where the outcome (i, j) is said to occur if i appears on the first die and j on the second die.

5. If the experiment consists of measuring the lifetime of a car, then the sample space consists of all nonnegative real numbers. That is,∗

S = [0, ∞)

Any subset E of the sample space S is known as an event. Some examples of events are the following.

1′. In Example (1), if E = {H}, then E is the event that a head appears on the flip of the coin. Similarly, if E = {T}, then E would be the event that a tail appears.

2′. In Example (2), if E = {1}, then E is the event that one appears on the roll of the die. If E = {2, 4, 6}, then E would be the event that an even number appears on the roll.

∗ The set (a, b) is defined to consist of all points x such that a < x < b. The set [a, b] is defined to consist of all points x such that a ≤ x ≤ b. The sets (a, b] and [a, b) are defined, respectively, to consist of all points x such that a < x ≤ b and all points x such that a ≤ x < b.


3′. In Example (3), if E = {(H, H), (H, T)}, then E is the event that a head appears on the first coin.

4′. In Example (4), if E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, then E is the event that the sum of the dice equals seven.

5′. In Example (5), if E = (2, 6), then E is the event that the car lasts between two and six years.

We say that the event E occurs when the outcome of the experiment lies in E. For any two events E and F of a sample space S we define the new event E ∪ F to consist of all outcomes that are either in E or in F or in both E and F. That is, the event E ∪ F will occur if either E or F occurs. For example, in (1) if E = {H} and F = {T}, then

E ∪ F = {H, T}

That is, E ∪ F would be the whole sample space S. In (2) if E = {1, 3, 5} and F = {1, 2, 3}, then

E ∪ F = {1, 2, 3, 5}

and thus E ∪ F would occur if the outcome of the die is 1 or 2 or 3 or 5. The event E ∪ F is often referred to as the union of the event E and the event F.

For any two events E and F, we may also define the new event EF, sometimes written E ∩ F, and referred to as the intersection of E and F, as follows. EF consists of all outcomes which are both in E and in F. That is, the event EF will occur only if both E and F occur. For example, in (2) if E = {1, 3, 5} and F = {1, 2, 3}, then

EF = {1, 3}

and thus EF would occur if the outcome of the die is either 1 or 3. In Example (1) if E = {H} and F = {T}, then the event EF would not consist of any outcomes and hence could not occur. To give such an event a name, we shall refer to it as the null event and denote it by Ø. (That is, Ø refers to the event consisting of no outcomes.) If EF = Ø, then E and F are said to be mutually exclusive.

We also define unions and intersections of more than two events in a similar manner. If E_1, E_2, ... are events, then the union of these events, denoted by ∪_{n=1}^∞ E_n, is defined to be the event that consists of all outcomes that are in E_n for at least one value of n = 1, 2, .... Similarly, the intersection of the events E_n, denoted by ∩_{n=1}^∞ E_n, is defined to be the event consisting of those outcomes that are in all of the events E_n, n = 1, 2, ....

Finally, for any event E we define the new event E^c, referred to as the complement of E, to consist of all outcomes in the sample space S that are not in E. That is, E^c will occur if and only if E does not occur. In Example (4) if E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, then E^c will occur if the sum of the dice does not equal seven. Also note that since the experiment must result in some outcome, it follows that S^c = Ø.


1.3 Probabilities Defined on Events

Consider an experiment whose sample space is S. For each event E of the sample space S, we assume that a number P(E) is defined and satisfies the following three conditions:

(i) 0 ≤ P(E) ≤ 1.
(ii) P(S) = 1.
(iii) For any sequence of events E_1, E_2, ... that are mutually exclusive, that is, events for which E_nE_m = Ø when n ≠ m, then

P(∪_{n=1}^∞ E_n) = ∑_{n=1}^∞ P(E_n)

We refer to P(E) as the probability of the event E.

Example 1.1 In the coin tossing example, if we assume that a head is equally likely to appear as a tail, then we would have

P({H}) = P({T}) = 1/2

On the other hand, if we had a biased coin and felt that a head was twice as likely to appear as a tail, then we would have

P({H}) = 2/3,   P({T}) = 1/3

Example 1.2 In the die tossing example, if we supposed that all six numbers were equally likely to appear, then we would have

P({1}) = P({2}) = P({3}) = P({4}) = P({5}) = P({6}) = 1/6

From (iii) it would follow that the probability of getting an even number would equal

P({2, 4, 6}) = P({2}) + P({4}) + P({6}) = 1/2

Remark We have chosen to give a rather formal definition of probabilities as being functions defined on the events of a sample space. However, it turns out that these probabilities have a nice intuitive property. Namely, if our experiment is repeated over and over again then (with probability 1) the proportion of time that event E occurs will just be P(E).

Since the events E and E^c are always mutually exclusive and since E ∪ E^c = S we have by (ii) and (iii) that

1 = P(S) = P(E ∪ E^c) = P(E) + P(E^c)

or

P(E^c) = 1 − P(E)   (1.1)


In words, Equation (1.1) states that the probability that an event does not occur is one minus the probability that it does occur.

We shall now derive a formula for P(E ∪ F), the probability of all outcomes either in E or in F. To do so, consider P(E) + P(F), which is the probability of all outcomes in E plus the probability of all points in F. Since any outcome that is in both E and F will be counted twice in P(E) + P(F) and only once in P(E ∪ F), we must have

P(E) + P(F) = P(E ∪ F) + P(EF)

or equivalently

P(E ∪ F) = P(E) + P(F) − P(EF)   (1.2)

Note that when E and F are mutually exclusive (that is, when EF = Ø), then Equation (1.2) states that P(E ∪ F) = P(E) + P(F) − P(Ø) = P(E) + P(F)

a result which also follows from condition (iii). (Why is P(Ø) = 0?)
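Because of the long-run frequency interpretation noted in the Remark, Equation (1.2) can also be illustrated by simulation. The following Python sketch (the single-die events E and F are chosen purely for illustration) estimates each probability by its relative frequency and compares the two sides of the identity:

```python
import random

# Monte Carlo check of Equation (1.2): P(E ∪ F) = P(E) + P(F) − P(EF).
# Illustrative events for one die roll: E = {1, 3, 5}, F = {1, 2, 3}.
TRIALS = 100_000
count_e = count_f = count_ef = count_union = 0

for _ in range(TRIALS):
    roll = random.randint(1, 6)
    in_e = roll % 2 == 1            # odd outcome
    in_f = roll <= 3                # outcome at most 3
    count_e += in_e
    count_f += in_f
    count_ef += in_e and in_f
    count_union += in_e or in_f

p_e, p_f = count_e / TRIALS, count_f / TRIALS
p_ef, p_union = count_ef / TRIALS, count_union / TRIALS
print(p_union, p_e + p_f - p_ef)    # both estimates ≈ 2/3, the exact value
```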

Example 1.3 Suppose that we toss two coins, and suppose that we assume that each of the four outcomes in the sample space

S = {(H, H), (H, T), (T, H), (T, T)}

is equally likely and hence has probability 1/4. Let

E = {(H, H), (H, T)} and F = {(H, H), (T, H)}

That is, E is the event that the first coin falls heads, and F is the event that the second coin falls heads. By Equation (1.2) we have that P(E ∪ F), the probability that either the first or the second coin falls heads, is given by

P(E ∪ F) = P(E) + P(F) − P(EF)
         = 1/2 + 1/2 − P({(H, H)})
         = 1 − 1/4 = 3/4

This probability could, of course, have been computed directly since

P(E ∪ F) = P({(H, H), (H, T), (T, H)}) = 3/4

We may also calculate the probability that any one of the three events E or F or G occurs. This is done as follows:

P(E ∪ F ∪ G) = P((E ∪ F) ∪ G)

which by Equation (1.2) equals

P(E ∪ F) + P(G) − P((E ∪ F)G)

Now we leave it for you to show that the events (E ∪ F)G and EG ∪ FG are equivalent, and hence the preceding equals

P(E ∪ F ∪ G) = P(E) + P(F) − P(EF) + P(G) − P(EG ∪ FG)
             = P(E) + P(F) − P(EF) + P(G) − P(EG) − P(FG) + P(EGFG)
             = P(E) + P(F) + P(G) − P(EF) − P(EG) − P(FG) + P(EFG)   (1.3)

In fact, it can be shown by induction that, for any n events E_1, E_2, E_3, ..., E_n,

P(E_1 ∪ E_2 ∪ ··· ∪ E_n) = ∑_i P(E_i) − ∑_{i<j} P(E_iE_j) + ∑_{i<j<k} P(E_iE_jE_k) − ··· + (−1)^{n+1} P(E_1E_2 ··· E_n)
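The identity translates directly into a few lines of code. The sketch below is a minimal illustration, assuming a finite sample space of equally likely outcomes with events represented as Python sets; it checks the formula against a direct computation of the union:

```python
from itertools import combinations

def union_probability(events, sample_space):
    """Inclusion-exclusion over all nonempty groups of events.

    events: list of sets of outcomes; sample_space: set of equally
    likely outcomes (an assumption made for this illustration only).
    """
    total = 0.0
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        for group in combinations(events, r):
            total += sign * len(set.intersection(*group)) / len(sample_space)
    return total

# Sanity check on one die roll with three overlapping events.
space = {1, 2, 3, 4, 5, 6}
events = [{1, 3, 5}, {1, 2, 3}, {5, 6}]
print(union_probability(events, space))               # 0.8333... = 5/6
print(len(set.union(*events)) / len(space))           # same value, directly
```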

24. In an election, candidate A receives n votes and candidate B receives m votes, where n > m. Assume that in the count of the votes all possible orderings of the n + m votes are equally likely. Let P_{n,m} denote the probability that from the first vote on A is always in the lead. Find
(a) P_{2,1} (b) P_{3,1} (c) P_{n,1} (d) P_{3,2} (e) P_{4,2} (f) P_{n,2} (g) P_{4,3} (h) P_{5,3} (i) P_{5,4}
(j) Make a conjecture as to the value of P_{n,m}.

*25. Two cards are randomly selected from a deck of 52 playing cards.
(a) What is the probability they constitute a pair (that is, that they are of the same denomination)?
(b) What is the conditional probability they constitute a pair given that they are of different suits?

26. A deck of 52 playing cards, containing all 4 aces, is randomly divided into 4 piles of 13 cards each. Define events E_1, E_2, E_3, and E_4 as follows:
E_1 = {the first pile has exactly 1 ace},
E_2 = {the second pile has exactly 1 ace},
E_3 = {the third pile has exactly 1 ace},
E_4 = {the fourth pile has exactly 1 ace}

Use Exercise 23 to find P(E_1E_2E_3E_4), the probability that each pile has an ace.

*27. Suppose in Exercise 26 we had defined the events E_i, i = 1, 2, 3, 4, by
E_1 = {one of the piles contains the ace of spades},
E_2 = {the ace of spades and the ace of hearts are in different piles},
E_3 = {the ace of spades, the ace of hearts, and the ace of diamonds are in different piles},
E_4 = {all 4 aces are in different piles}

Now use Exercise 23 to find P(E_1E_2E_3E_4), the probability that each pile has an ace. Compare your answer with the one you obtained in Exercise 26.

28. If the occurrence of B makes A more likely, does the occurrence of A make B more likely?

29. Suppose that P(E) = 0.6. What can you say about P(E|F) when
(a) E and F are mutually exclusive?
(b) E ⊂ F?
(c) F ⊂ E?

*30. Bill and George go target shooting together. Both shoot at a target at the same time. Suppose Bill hits the target with probability 0.7, whereas George, independently, hits the target with probability 0.4.


(a) Given that exactly one shot hit the target, what is the probability that it was George's shot?
(b) Given that the target is hit, what is the probability that George hit it?

31. What is the conditional probability that the first die is six given that the sum of the dice is seven?

*32. Suppose all n men at a party throw their hats in the center of the room. Each man then randomly selects a hat. Show that the probability that none of the n men selects his own hat is

1/2! − 1/3! + 1/4! − ··· + (−1)^n/n!

Note that as n → ∞ this converges to e^{−1}. Is this surprising?

33. In a class there are four freshman boys, six freshman girls, and six sophomore boys. How many sophomore girls must be present if sex and class are to be independent when a student is selected at random?

34. Mr. Jones has devised a gambling system for winning at roulette. When he bets, he bets on red, and places a bet only when the ten previous spins of the roulette have landed on a black number. He reasons that his chance of winning is quite large since the probability of eleven consecutive spins resulting in black is quite small. What do you think of this system?

35. A fair coin is continually flipped. What is the probability that the first four flips are
(a) H, H, H, H?
(b) T, H, H, H?
(c) What is the probability that the pattern T, H, H, H occurs before the pattern H, H, H, H?

36. Consider two boxes, one containing one black and one white marble, the other, two black and one white marble. A box is selected at random and a marble is drawn at random from the selected box. What is the probability that the marble is black?

37. In Exercise 36, what is the probability that the first box was the one selected given that the marble is white?

38. Urn 1 contains two white balls and one black ball, while urn 2 contains one white ball and five black balls. One ball is drawn at random from urn 1 and placed in urn 2. A ball is then drawn from urn 2. It happens to be white. What is the probability that the transferred ball was white?

39. Stores A, B, and C have 50, 75, and 100 employees, and, respectively, 50, 60, and 70 percent of these are women. Resignations are equally likely among all employees, regardless of sex. One employee resigns and this is a woman. What is the probability that she works in store C?

*40. (a) A gambler has in his pocket a fair coin and a two-headed coin. He selects one of the coins at random, and when he flips it, it shows heads. What is the probability that it is the fair coin?
(b) Suppose that he flips the same coin a second time and again it shows heads. Now what is the probability that it is the fair coin?


(c) Suppose that he flips the same coin a third time and it shows tails. Now what is the probability that it is the fair coin?

41. In a certain species of rats, black dominates over brown. Suppose that a black rat with two black parents has a brown sibling.
(a) What is the probability that this rat is a pure black rat (as opposed to being a hybrid with one black and one brown gene)?
(b) Suppose that when the black rat is mated with a brown rat, all five of their offspring are black. Now, what is the probability that the rat is a pure black rat?

42. There are three coins in a box. One is a two-headed coin, another is a fair coin, and the third is a biased coin that comes up heads 75 percent of the time. When one of the three coins is selected at random and flipped, it shows heads. What is the probability that it was the two-headed coin?

43. The blue-eyed gene for eye color is recessive, meaning that both the eye genes of an individual must be blue for that individual to be blue eyed. Jo (F) and Joe (M) are both brown-eyed individuals whose mothers had blue eyes. Their daughter Flo, who has brown eyes, is expecting a child conceived with a blue-eyed man. What is the probability that this child will be blue eyed?

44. Urn 1 has five white and seven black balls. Urn 2 has three white and twelve black balls. We flip a fair coin. If the outcome is heads, then a ball from urn 1 is selected, while if the outcome is tails, then a ball from urn 2 is selected. Suppose that a white ball is selected. What is the probability that the coin landed tails?

*45. An urn contains b black balls and r red balls. One of the balls is drawn at random, but when it is put back in the urn c additional balls of the same color are put in with it. Now suppose that we draw another ball. Show that the probability that the first ball drawn was black given that the second ball drawn was red is b/(b + r + c).

46. Three prisoners are informed by their jailer that one of them has been chosen at random to be executed, and the other two are to be freed. Prisoner A asks the jailer to tell him privately which of his fellow prisoners will be set free, claiming that there would be no harm in divulging this information, since he already knows that at least one will go free. The jailer refuses to answer this question, pointing out that if A knew which of his fellows were to be set free, then his own probability of being executed would rise from 1/3 to 1/2, since he would then be one of two prisoners. What do you think of the jailer's reasoning?

47. For a fixed event B, show that the collection P(A|B), defined for all events A, satisfies the three conditions for a probability. Conclude from this that

P(A|B) = P(A|BC)P(C|B) + P(A|BC^c)P(C^c|B)

Then directly verify the preceding equation.

*48. Sixty percent of the families in a certain community own their own car, thirty percent own their own home, and twenty percent own both their own car and their own home. If a family is randomly chosen, what is the probability that this family owns a car or a house but not both?


References

Reference [2] provides a colorful introduction to some of the earliest developments in probability theory. References [3], [4], and [7] are all excellent introductory texts in modern probability theory. Reference [5] is the definitive work that established the axiomatic foundation of modern mathematical probability theory. Reference [6] is a nonmathematical introduction to probability theory and its applications, written by one of the greatest mathematicians of the eighteenth century.

[1] L. Breiman, "Probability," Addison-Wesley, Reading, Massachusetts, 1968.
[2] F.N. David, "Games, Gods, and Gambling," Hafner, New York, 1962.
[3] W. Feller, "An Introduction to Probability Theory and Its Applications," Vol. I, John Wiley, New York, 1957.
[4] B.V. Gnedenko, "Theory of Probability," Chelsea, New York, 1962.
[5] A.N. Kolmogorov, "Foundations of the Theory of Probability," Chelsea, New York, 1956.
[6] Marquis de Laplace, "A Philosophical Essay on Probabilities," 1825 (English translation), Dover, New York, 1951.
[7] S. Ross, "A First Course in Probability," Eighth Edition, Prentice Hall, New Jersey, 2010.

Random Variables

2.1 Random Variables

It frequently occurs that in performing an experiment we are mainly interested in some functions of the outcome as opposed to the outcome itself. For instance, in tossing dice we are often interested in the sum of the two dice and are not really concerned about the actual outcome. That is, we may be interested in knowing that the sum is seven and not be concerned over whether the actual outcome was (1, 6) or (2, 5) or (3, 4) or (4, 3) or (5, 2) or (6, 1). These quantities of interest, or more formally, these real-valued functions defined on the sample space, are known as random variables. Since the value of a random variable is determined by the outcome of the experiment, we may assign probabilities to the possible values of the random variable.

Example 2.1 Letting X denote the random variable that is defined as the sum of two fair dice, then

P{X = 2} = P{(1, 1)} = 1/36,
P{X = 3} = P{(1, 2), (2, 1)} = 2/36,
P{X = 4} = P{(1, 3), (2, 2), (3, 1)} = 3/36,
P{X = 5} = P{(1, 4), (2, 3), (3, 2), (4, 1)} = 4/36,
P{X = 6} = P{(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} = 5/36,
P{X = 7} = P{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)} = 6/36,
P{X = 8} = P{(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)} = 5/36,
P{X = 9} = P{(3, 6), (4, 5), (5, 4), (6, 3)} = 4/36,
P{X = 10} = P{(4, 6), (5, 5), (6, 4)} = 3/36,
P{X = 11} = P{(5, 6), (6, 5)} = 2/36,
P{X = 12} = P{(6, 6)} = 1/36   (2.1)

In other words, the random variable X can take on any integral value between two and twelve, and the probability that it takes on each value is given by Equation (2.1). Since X must take on one of the values two through twelve, we must have

1 = P(∪_{n=2}^{12} {X = n}) = ∑_{n=2}^{12} P{X = n}

which may be checked from Equation (2.1).

P{Y = 1} = P{(T, H ), (H, T )} = 24 ,

P{Y = 2} = P{(H, H )} =

1 4

!

Of course, P{Y = 0} + P{Y = 1} + P{Y = 2} = 1.

Example 2.3 Suppose that we toss a coin having a probability p of coming up heads, until the first head appears. Letting N denote the number of flips required, then assuming that the outcome of successive flips are independent, N is a random variable taking on one of the values 1, 2, 3, . . . , with respective probabilities P{N = 1} = P{H } = p, P{N = 2} = P{(T, H )} = (1 − p) p, P{N = 3} = P{(T, T, H )} = (1 − p)2 p, .. . P{N = n} = P{( T, T, . . . , T, H )} = (1 − p)n−1 p, % &' ( n−1

n≥1

As a check, note that )∞ * ∞ " $ P {N = n} = P{N = n} n=1

n=1 ∞ $

=p

n=1

(1 − p)n−1

p 1 − (1 − p) =1

=

!

Random Variables

23

Example 2.4 Suppose that our experiment consists of seeing how long a battery can operate before wearing down. Suppose also that we are not primarily interested in the actual lifetime of the battery but are concerned only about whether or not the battery lasts at least two years. In this case, we may define the random variable I by + 1, if the lifetime of battery is two or more years I = 0, otherwise If E denotes the event that the battery lasts two or more years, then the random variable I is known as the indicator random variable for event E. (Note that I equals 1 or 0 depending on whether or not E occurs.) ! Example 2.5 Suppose that independent trials, each of which ,m results in any of m pospi = 1, are continually sible outcomes with respective probabilities p1 , . . . , pm , i=1 performed. Let X denote the number of trials needed until each outcome has occurred at least once. Rather than directly considering P{X = n} we will first determine P{X > n}, the probability that at least one of the outcomes has not yet occurred after n trials. Letting Ai denote the event that outcome i has not yet occurred after the first n trials, i = 1, . . . , m, then )m * " Ai P{X > n} = P i=1

=

m $ i=1

P(Ai ) −

$$

P(Ai A j )

i< j

$$$ + P(Ai A j Ak ) − · · · + (−1)m+1 P(A1 · · · Am ) i< j< k

Now, P(Ai ) is the probability that each of the first n trials results in a non-i outcome, and so by independence P(Ai ) = (1 − pi )n Similarly, P(Ai A j ) is the probability that the first n trials all result in a non-i and non- j outcome, and so P(Ai A j ) = (1 − pi − p j )n As all of the other probabilities are similar, we see that P{X > n} =

m $ i=1

(1 − pi )n −

$$ (1 − pi − p j )n i< j

$$$ + (1 − pi − p j − pk )n − · · · i< j< k

24

Introduction to Probability Models

Since P{X = n} = P{X > n − 1} − P{X > n}, we see, upon using the algebraic identity (1 − a)n−1 − (1 − a)n = a(1 − a)n−1 , that P{X = n} =

m $ i=1

pi (1 − pi )n−1 −

$$ ( pi + p j )(1 − pi − p j )n−1 i< j

$$$ + ( pi + p j + pk )(1 − pi − p j − pk )n−1 − · · ·

!

i< j< k

In all of the preceding examples, the random variables of interest took on either a finite or a countable number of possible values.∗ Such random variables are called discrete. However, there also exist random variables that take on a continuum of possible values. These are known as continuous random variables. One example is the random variable denoting the lifetime of a car, when the car’s lifetime is assumed to take on any value in some interval (a, b). The cumulative distribution function (cdf) (or more simply the distribution function) F(·) of the random variable X is defined for any real number b, −∞ < b < ∞, by F(b) = P{X ≤ b}

In words, F(b) denotes the probability that the random variable X takes on a value that is less than or equal to b. Some properties of the cdf F are (i) F(b) is a nondecreasing function of b, (ii) limb→∞ F(b) = F(∞) = 1, (iii) limb→−∞ F(b) = F(−∞) = 0.

Property (i) follows since for a < b the event {X ≤ a} is contained in the event {X ≤ b}, and so it must have a smaller probability. Properties (ii) and (iii) follow since X must take on some finite value. All probability questions about X can be answered in terms of the cdf F(·). For example, P{a < X ≤ b} = F(b) − F(a)

for all a < b

This follows since we may calculate P{a < X ≤ b} by first computing the probability that X ≤ b (that is, F(b)) and then subtracting from this the probability that X ≤ a (that is, F(a)). If we desire the probability that X is strictly smaller than b, we may calculate this probability by P{X < b} = lim P{X ≤ b − h} h→0+

= lim F(b − h) h→0+

where lim h→0+ means that we are taking the limit as h decreases to 0. Note that P{X < b} does not necessarily equal F(b) since F(b) also includes the probability that X equals b. ∗ A set is countable if its elements can be put in a one-to-one correspondence with the sequence of positive integers.

Random Variables

2.2

25

Discrete Random Variables

As was previously mentioned, a random variable that can take on at most a countable number of possible values is said to be discrete. For a discrete random variable X , we define the probability mass function p(a) of X by p(a) = P{X = a} The probability mass function p(a) is positive for at most a countable number of values of a. That is, if X must assume one of the values x1 , x2 , . . . , then p(xi ) > 0, p(x) = 0,

i = 1, 2, . . . all other values of x

Since X must take on one of the values xi , we have ∞ $ i=1

p(xi ) = 1

The cumulative distribution function F can be expressed in terms of p(a) by $ p(xi ) F(a) = all xi ≤a

For instance, suppose X has a probability mass function given by p(1) = 21 ,

p(2) = 13 ,

p(3) =

1 6

then, the cumulative distribution function F of X is given by ⎧ 0, a < 1 ⎪ ⎪ ⎪ ⎪ ⎨1, 1 ≤ a < 2 2 F(a) = 5 ⎪ ⎪ ⎪6, 2 ≤ a < 3 ⎪ ⎩ 1, 3 ≤ a This is graphically presented in Figure 2.1.

Figure 2.1 Graph of F(x).

26

Introduction to Probability Models

Discrete random variables are often classified according to their probability mass functions. We now consider some of these random variables.

2.2.1

The Bernoulli Random Variable

Suppose that a trial, or an experiment, whose outcome can be classified as either a “success” or as a “failure” is performed. If we let X equal 1 if the outcome is a success and 0 if it is a failure, then the probability mass function of X is given by p(0) = P{X = 0} = 1 − p, p(1) = P{X = 1} = p

(2.2)

where p, 0 ≤ p ≤ 1, is the probability that the trial is a “success.” A random variable X is said to be a Bernoulli random variable if its probability mass function is given by Equation (2.2) for some p ∈ (0, 1).

2.2.2

The Binomial Random Variable

Suppose that n independent trials, each of which results in a “success” with probability p and in a “failure” with probability 1 − p, are to be performed. If X represents the number of successes that occur in the n trials, then X is said to be a binomial random variable with parameters (n, p). The probability mass function of a binomial random variable having parameters (n, p) is given by p(i) =

1 2 n pi (1 − p)n−i , i

i = 0, 1, . . . , n

(2.3)

where 1 2 n! n = i (n − i)! i! equals the number of different groups of i objects that can be chosen from a set of n objects. The validity of Equation (2.3) may be verified by first noting that the probability of any particular sequence of the n outcomes containing i successes and n − i i n−i failures is, by the assumed 3 4independence of trials, p (1 − p) . Equation (2.3) then i sucfollows since there are ni different sequences of the n outcomes leading 3to 4 cesses and n − i failures. For instance, if n = 3, i = 2, then there are 23 = 3 ways in which the three trials can result in two successes. Namely, any one of the three outcomes (s, s, f ), (s, f, s), ( f, s, s), where the outcome (s, s, f ) means that the first two trials are successes and the third a failure. Since each of the three outcomes (s, s, f ), (s,3 f,4s), ( f, s, s) has a probability p 2 (1 − p) of occurring the desired probability is thus 23 p 2 (1 − p).

Random Variables

27

Note that, by the binomial theorem, the probabilities sum to one, that is, ∞ $ i=0

p(i) =

n 1 2 $ n i=0

i

5 6n pi (1 − p)n−i = p + (1 − p) = 1

Example 2.6 Four fair coins are flipped. If the outcomes are assumed independent, what is the probability that two heads and two tails are obtained? Solution: Letting X equal the number of heads (“successes”) that appear, then X is a binomial random variable with parameters (n = 4, p = 21 ). Hence, by Equation (2.3), P{X = 2} =

1 2 1 22 1 22 1 3 1 4 = 2 2 2 8

!

Example 2.7 It is known that any item produced by a certain machine will be defective with probability 0.1, independently of any other item. What is the probability that in a sample of three items, at most one will be defective? Solution: If X is the number of defective items in the sample, then X is a binomial random variable with parameters (3, 0.1). Hence, the desired probability is given by 1 2 1 2 3 3 P{X = 0}+ P{X = 1} = (0.1)0 (0.9)3 + (0.1)1 (0.9)2 = 0.972 ! 0 1 Example 2.8 Suppose that an airplane engine will fail, when in flight, with probability 1 − p independently from engine to engine; suppose that the airplane will make a successful flight if at least 50 percent of its engines remain operative. For what values of p is a four-engine plane preferable to a two-engine plane? Solution: Because each engine is assumed to fail or function independently of what happens with the other engines, it follows that the number of engines remaining operative is a binomial random variable. Hence, the probability that a four-engine plane makes a successful flight is 1 2 1 2 1 2 4 3 4 4 4 2 2 p (1 − p) + p (1 − p)0 p (1 − p) + 3 4 2 = 6 p 2 (1 − p)2 + 4 p 3 (1 − p) + p 4

whereas the corresponding probability for a two-engine plane is 1 2 1 2 2 2 2 p(1 − p) + p = 2 p(1 − p) + p 2 1 2 Hence the four-engine plane is safer if 6 p 2 (1 − p)2 + 4 p 3 (1 − p) + p 4 ≥ 2 p(1 − p) + p 2

28

Introduction to Probability Models

or equivalently if 6 p(1 − p)2 + 4 p 2 (1 − p) + p 3 ≥ 2 − p which simplifies to 3 p 3 − 8 p 2 + 7 p − 2 ≥ 0 or ( p − 1)2 (3 p − 2) ≥ 0 which is equivalent to 3 p − 2 ≥ 0 or p ≥

2 3

Hence, the four-engine plane is safer when the engine success probability is at least as large as 23 ,whereas the two-engine plane is safer when this probability falls below 23 . ! Example 2.9 Suppose that a particular trait of a person (such as eye color or left handedness) is classified on the basis of one pair of genes and suppose that d represents a dominant gene and r a recessive gene. Thus a person with dd genes is pure dominance, one with rr is pure recessive, and one with r d is hybrid. The pure dominance and the hybrid are alike in appearance. Children receive one gene from each parent. If, with respect to a particular trait, two hybrid parents have a total of four children, what is the probability that exactly three of the four children have the outward appearance of the dominant gene? Solution: If we assume that each child is equally likely to inherit either of two genes from each parent, the probabilities that the child of two hybrid parents will have dd, rr , or r d pairs of genes are, respectively, 41 , 41 , 21 . Hence, because an offspring will have the outward appearance of the dominant gene if its gene pair is either dd or r d, it follows that the number of such children is binomially distributed with parameters (4, 43 ). Thus the desired probability is 1 2 1 23 1 21 3 1 27 4 ! = 3 4 4 64 Remark on Terminology If X is a binomial random variable with parameters (n, p), then we say that X has a binomial distribution with parameters (n, p).

2.2.3

The Geometric Random Variable

Suppose that independent trials, each having probability p of being a success, are performed until a success occurs. If we let X be the number of trials required until the first success, then X is said to be a geometric random variable with parameter p. Its probability mass function is given by p(n) = P{X = n} = (1 − p)n−1 p,

n = 1, 2, . . .

(2.4)

Equation (2.4) follows since in order for X to equal n it is necessary and sufficient that the first n − 1 trials be failures and the nth trial a success. Equation (2.4) follows since the outcomes of the successive trials are assumed to be independent.

Random Variables

29

To check that p(n) is a probability mass function, we note that ∞ $ n=1

2.2.4

p(n) = p

∞ $ n=1

(1 − p)n−1 = 1

The Poisson Random Variable

A random variable X , taking on one of the values 0, 1, 2, . . . , is said to be a Poisson random variable with parameter λ, if for some λ > 0, p(i) = P{X = i} = e−λ

λi , i!

i = 0, 1, . . .

(2.5)

Equation (2.5) defines a probability mass function since ∞ $ i=0

p(i) = e−λ

∞ $ λi i=0

i!

= e−λ eλ = 1

The Poisson random variable has a wide range of applications in a diverse number of areas, as will be seen in Chapter 5. An important property of the Poisson random variable is that it may be used to approximate a binomial random variable when the binomial parameter n is large and p is small. To see this, suppose that X is a binomial random variable with parameters (n, p), and let λ = np. Then n! pi (1 − p)n−i (n − i)! i! 1 2i 1 2 λ n−i λ n! 1− = (n − i)! i! n n

P{X = i} =

=

n(n − 1) · · · (n − i + 1) λi (1 − λ/n)n ni i! (1 − λ/n)i

Now, for n large and p small 1 2 λ n 1− ≈ e−λ , n

n(n − 1) · · · (n − i + 1) ≈ 1, ni

2 1 λ i 1− ≈1 n

Hence, for n large and p small, P{X = i} ≈ e−λ

λi i!

Example 2.10 Suppose that the number of typographical errors on a single page of this book has a Poisson distribution with parameter λ = 1. Calculate the probability that there is at least one error on this page.

30

Introduction to Probability Models

Solution: P{X ≥ 1} = 1 − P{X = 0} = 1 − e−1 ≈ 0.633

!

Example 2.11 If the number of accidents occurring on a highway each day is a Poisson random variable with parameter λ = 3, what is the probability that no accidents occur today? Solution: P{X = 0} = e−3 ≈ 0.05

!

Example 2.12 Consider an experiment that consists of counting the number of α-particles given off in a one-second interval by one gram of radioactive material. If we know from past experience that, on the average, 3.2 such α-particles are given off, what is a good approximation to the probability that no more than two α-particles will appear? Solution: If we think of the gram of radioactive material as consisting of a large number n of atoms each of which has probability 3.2/n of disintegrating and sending off an α-particle during the second considered, then we see that, to a very close approximation, the number of α-particles given off will be a Poisson random variable with parameter λ = 3.2. Hence the desired probability is P{X ≤ 2} = e−3.2 + 3.2e−3.2 +

2.3

(3.2)2 −3.2 e ≈ 0.382 2

!

Continuous Random Variables

In this section, we shall concern ourselves with random variables whose set of possible values is uncountable. Let X be such a random variable. We say that X is a continuous random variable if there exists a nonnegative function f (x), defined for all real x ∈ (−∞, ∞), having the property that for any set B of real numbers 7 f (x) d x (2.6) P{X ∈ B} = B

The function f (x) is called the probability density function of the random variable X . In words, Equation (2.6) states that the probability that X will be in B may be obtained by integrating the probability density function over the set B. Since X must assume some value, f (x) must satisfy 7 ∞ 1 = P{X ∈ (−∞, ∞)} = f (x) d x −∞

All probability statements about X can be answered in terms of f (x). For instance, letting B = [a, b], we obtain from Equation (2.6) that 7 b f (x) d x (2.7) P{a ≤ X ≤ b} = a

Random Variables

31

If we let a = b in the preceding, then 7 a f (x) d x = 0 P{X = a} = a

In words, this equation states that the probability that a continuous random variable will assume any particular value is zero. The relationship between the cumulative distribution F(·) and the probability density f (·) is expressed by 7 a f (x) d x F(a) = P{X ∈ (−∞, a]} = −∞

Differentiating both sides of the preceding yields d F(a) = f (a) da That is, the density is the derivative of the cumulative distribution function. A somewhat more intuitive interpretation of the density function may be obtained from Equation (2.7) as follows: 7 a+ε/2 8 ε9 ε = f (x) d x ≈ εf (a) P a− ≤ X ≤a+ 2 2 a−ε/2 when ε is small. In other words, the probability that X will be contained in an interval of length ε around the point a is approximately εf (a). From this, we see that f (a) is a measure of how likely it is that the random variable will be near a. There are several important continuous random variables that appear frequently in probability theory. The remainder of this section is devoted to a study of certain of these random variables.

2.3.1

The Uniform Random Variable

A random variable is said to be uniformly distributed over the interval (0, 1) if its probability density function is given by + 1, 0< x 0 only when x ∈ (0, 1), it follows that X must assume a value in (0, 1). Also, since f (x) is constant for x ∈ (0, 1), X is just as likely to be “near” any value in (0, 1) as any other value. To check this, note that, for any 0 < a < b < 1, 7 b P{a ≤ X ≤ b} = f (x) d x = b − a a

32

Introduction to Probability Models

In other words, the probability that X is in any particular subinterval of (0, 1) equals the length of that subinterval. In general, we say that X is a uniform random variable on the interval (α, β) if its probability density function is given by ⎧ ⎨ 1 , if α < x < β (2.8) f (x) = β − α ⎩ 0, otherwise Example 2.13 Calculate the cumulative distribution function of a random variable uniformly distributed over (α, β). :a Solution: Since F(a) = −∞ f (x) d x, we obtain from Equation (2.8) that ⎧ 0, a≤α ⎪ ⎪ ⎨a − α , α 7} = 7 10 10 :6 dx 1 = P{1 < X < 6} = 1 10 2 P{X < 3} =

2.3.2

0

!

Exponential Random Variables

A continuous random variable whose probability density function is given, for some λ > 0, by + −λx , if x ≥ 0 λe f (x) = 0, if x < 0 is said to be an exponential random variable with parameter λ. These random variables will be extensively studied in Chapter 5, so we will content ourselves here with just calculating the cumulative distribution function F: 7 a λe−λx d x = 1 − e−λa , a≥0 F(a) = 0

Note that F(∞) =

:∞ 0

λe−λx d x = 1, as, of course, it must.

Random Variables

2.3.3

33

Gamma Random Variables

A continuous random variable whose density is given by ⎧ −λx (λx)α−1 ⎨ λe , if x ≥ 0 f (x) = %(α) ⎩ 0, if x < 0

for some λ > 0, α > 0 is said to be a gamma random variable with parameters α, λ. The quantity %(α) is called the gamma function and is defined by 7 ∞ %(α) = e−x x α−1 d x 0

It is easy to show by induction that for integral α, say, α = n, %(n) = (n − 1)!

2.3.4

Normal Random Variables

We say that X is a normal random variable (or simply that X is normally distributed) with parameters µ and σ 2 if the density of X is given by f (x) = √

1 2π σ

e−(x−µ)

2 /2σ 2

,

−∞ < x < ∞

This density function is a bell-shaped curve that is symmetric around µ (see Figure 2.2). An important fact about normal random variables is that if X is normally distributed with parameters µ and σ 2 then Y = α X + β is normally distributed with parameters αµ + β and α 2 σ 2 . To prove this, suppose first that α > 0 and note that FY (·)∗, the cumulative distribution function of the random variable Y , is given by FY (a) = P{Y ≤ a}

= P{α X + β ≤ a}

Figure 2.2 Normal density function. ∗ When there is more than one random variable under consideration, we shall denote the cumulative distribution function of a random variable Z by Fz (·). Similarly, we shall denote the density of Z by f z (·).

34

Introduction to Probability Models

+ ; a−β =P X≤ α 1 2 a−β = FX α 7 (a−β)/α 1 2 2 = e−(x−µ) /2σ d x √ 2π σ −∞ ; + 7 a 1 −(v − (αµ + β))2 = dv exp √ 2α 2 σ 2 2π ασ −∞

(2.9)

where the last equality : a is obtained by the change in variables v = αx + β. However, since FY (a) = −∞ f Y (v) dv, it follows from Equation (2.9) that the probability density function f Y (·) is given by ; + 1 −(v − (αµ + β))2 f Y (v) = √ , −∞0

We say that X and Y are jointly continuous if there exists a function f (x, y), defined for all real x and y, having the property that for all sets A and B of real numbers 7 7 P{X ∈ A, Y ∈ B} = f (x, y) d x d y B

A

The function f (x, y) is called the joint probability density function of X and Y . The probability density of X can be obtained from a knowledge of f (x, y) by the following reasoning: P{X ∈ A} = P{X ∈ A, Y ∈ (−∞, ∞)} 7 ∞7 = f (x, y) d x d y 7−∞ A f X (x) d x = A

Random Variables

where f X (x) =

43

7

∞ −∞

f (x, y) dy

is thus the probability density function of X . Similarly, the probability density function of Y is given by 7 ∞ f Y (y) = f (x, y) d x Because

−∞

F(a, b) = P(X ≤ a, Y ≤ b) = differentiation yields

7

a

7

b

−∞ −∞

f (x, y) d y d x

d2 F(a, b) = f (a, b) da db Thus, as in the single variable case, differentiating the probability distribution function gives the probability density function. A variation of Proposition 2.1 states that if X and Y are random variables and g is a function of two variables, then $$ E[g(X, Y )] = g(x, y) p(x, y) in the discrete case =

7

y

x

7

−∞ −∞

g(x, y) f (x, y) d x d y

in the continuous case

For example, if g(X, Y ) = X + Y , then, in the continuous case, 7 ∞7 ∞ (x + y) f (x, y) d x d y E[X + Y ] = −∞ −∞ 7 ∞7 ∞ 7 ∞7 ∞ = x f (x, y) d x d y + y f (x, y) d x d y −∞ −∞

= E[X ] + E[Y ]

−∞ −∞

where the first integral is evaluated by using the variation of Proposition 2.1 with g(x, y) = x, and the second with g(x, y) = y. The same result holds in the discrete case and, combined with the corollary in Section 2.4.3, yields that for any constants a, b E[a X + bY ] = a E[X ] + bE[Y ]

(2.10)

Joint probability distributions may also be defined for n random variables. The details are exactly the same as when n = 2 and are left as an exercise. The corresponding result to Equation (2.10) states that if X 1 , X 2 , . . . , X n are n random variables, then for any n constants a1 , a2 , . . . , an , E[a1 X 1 + a2 X 2 + · · · + an X n ] = a1 E[X 1 ] + a2 E[X 2 ] + · · · + an E[X n ] (2.11)


Example 2.29  Calculate the expected sum obtained when three fair dice are rolled.

Solution: Let X denote the sum obtained. Then X = X_1 + X_2 + X_3 where X_i represents the value of the ith die. Thus,

E[X] = E[X_1] + E[X_2] + E[X_3] = 3(7/2) = 21/2  ■

Example 2.30  As another example of the usefulness of Equation (2.11), let us use it to obtain the expectation of a binomial random variable having parameters n and p. Recalling that such a random variable X represents the number of successes in n trials when each trial has probability p of being a success, we have

X = X_1 + X_2 + · · · + X_n

where

X_i = { 1, if the ith trial is a success
        0, if the ith trial is a failure

Hence, X_i is a Bernoulli random variable having expectation E[X_i] = 1(p) + 0(1 − p) = p. Thus,

E[X] = E[X_1] + E[X_2] + · · · + E[X_n] = np

This derivation should be compared with the one presented in Example 2.17.  ■

Example 2.31  At a party N men throw their hats into the center of a room. The hats are mixed up and each man randomly selects one. Find the expected number of men who select their own hats.

Solution: Letting X denote the number of men that select their own hats, we can best compute E[X] by noting that

X = X_1 + X_2 + · · · + X_N

where

X_i = { 1, if the ith man selects his own hat
        0, otherwise

Now, because the ith man is equally likely to select any of the N hats, it follows that

P{X_i = 1} = P{ith man selects his own hat} = 1/N

and so

E[X_i] = 1·P{X_i = 1} + 0·P{X_i = 0} = 1/N

Hence, from Equation (2.11) we obtain

E[X] = E[X_1] + · · · + E[X_N] = (1/N) N = 1

Hence, no matter how many people are at the party, on the average exactly one of the men will select his own hat.  ■
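This result is easy to check numerically. The following sketch is our own illustration, not part of the text, and the function name is ours; it estimates the expected number of matches by simulation:

    import random

    def expected_matches(N, trials=100_000):
        # Estimate E[number of men who draw their own hat].
        total = 0
        for _ in range(trials):
            hats = list(range(N))
            random.shuffle(hats)
            # a "match" occurs when man i draws hat i
            total += sum(1 for i, h in enumerate(hats) if i == h)
        return total / trials

    print(expected_matches(10))   # close to 1.0, whatever N is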


Example 2.32  Suppose there are 25 different types of coupons and suppose that each time one obtains a coupon, it is equally likely to be any one of the 25 types. Compute the expected number of different types that are contained in a set of 10 coupons.

Solution: Let X denote the number of different types in the set of 10 coupons. We compute E[X] by using the representation

X = X_1 + · · · + X_25

where

X_i = { 1, if at least one type i coupon is in the set of 10
        0, otherwise

Now,

E[X_i] = P{X_i = 1}
       = P{at least one type i coupon is in the set of 10}
       = 1 − P{no type i coupons are in the set of 10}
       = 1 − (24/25)^10

where the last equality follows since each of the 10 coupons will (independently) not be a type i with probability 24/25. Hence,

E[X] = E[X_1] + · · · + E[X_25] = 25[1 − (24/25)^10]  ■
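The closed form is easy to compare against a simulation; this sketch is our own illustration, not from the text:

    import random

    def distinct_types_sim(types=25, draws=10, trials=100_000):
        # Average number of distinct coupon types among `draws` coupons.
        total = 0
        for _ in range(trials):
            total += len({random.randrange(types) for _ in range(draws)})
        return total / trials

    formula = 25 * (1 - (24 / 25) ** 10)   # about 8.38
    print(formula, distinct_types_sim())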

2.5.2  Independent Random Variables

The random variables X and Y are said to be independent if, for all a, b,

P{X ≤ a, Y ≤ b} = P{X ≤ a}P{Y ≤ b}                                (2.12)

In other words, X and Y are independent if, for all a and b, the events E_a = {X ≤ a} and F_b = {Y ≤ b} are independent.

In terms of the joint distribution function F of X and Y, we have that X and Y are independent if

F(a, b) = F_X(a)F_Y(b)    for all a, b

When X and Y are discrete, the condition of independence reduces to

p(x, y) = p_X(x) p_Y(y)                                           (2.13)

while if X and Y are jointly continuous, independence reduces to

f(x, y) = f_X(x) f_Y(y)                                           (2.14)


To prove this statement, consider first the discrete version, and suppose that the joint probability mass function p(x, y) satisfies Equation (2.13). Then

P{X ≤ a, Y ≤ b} = Σ_{y≤b} Σ_{x≤a} p(x, y)
               = Σ_{y≤b} Σ_{x≤a} p_X(x) p_Y(y)
               = Σ_{y≤b} p_Y(y) Σ_{x≤a} p_X(x)
               = P{Y ≤ b}P{X ≤ a}

and so X and Y are independent. That Equation (2.14) implies independence in the continuous case is proven in the same manner and is left as an exercise.

An important result concerning independence is the following.

Proposition 2.3  If X and Y are independent, then for any functions h and g

E[g(X)h(Y)] = E[g(X)]E[h(Y)]

Proof.  Suppose that X and Y are jointly continuous. Then

E[g(X)h(Y)] = ∫_{−∞}^∞ ∫_{−∞}^∞ g(x)h(y) f(x, y) dx dy
            = ∫_{−∞}^∞ ∫_{−∞}^∞ g(x)h(y) f_X(x) f_Y(y) dx dy
            = ∫_{−∞}^∞ h(y) f_Y(y) dy ∫_{−∞}^∞ g(x) f_X(x) dx
            = E[h(Y)]E[g(X)]

The proof in the discrete case is similar.  ■

2.5.3  Covariance and Variance of Sums of Random Variables

The covariance of any two random variables X and Y, denoted by Cov(X, Y), is defined by

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
          = E[XY − Y E[X] − X E[Y] + E[X]E[Y]]
          = E[XY] − E[Y]E[X] − E[X]E[Y] + E[X]E[Y]
          = E[XY] − E[X]E[Y]

Note that if X and Y are independent, then by Proposition 2.3 it follows that Cov(X, Y) = 0.


Let us consider now the special case where X and Y are indicator variables for whether or not the events A and B occur. That is, for events A and B, define

X = { 1, if A occurs          Y = { 1, if B occurs
      0, otherwise                  0, otherwise

Then,

Cov(X, Y) = E[XY] − E[X]E[Y]

and, because XY will equal 1 or 0 depending on whether or not both X and Y equal 1, we see that

Cov(X, Y) = P{X = 1, Y = 1} − P{X = 1}P{Y = 1}

From this we see that

Cov(X, Y) > 0 ⇔ P{X = 1, Y = 1} > P{X = 1}P{Y = 1}
              ⇔ P{X = 1, Y = 1}/P{X = 1} > P{Y = 1}
              ⇔ P{Y = 1|X = 1} > P{Y = 1}

That is, the covariance of X and Y is positive if the outcome X = 1 makes it more likely that Y = 1 (which, as is easily seen by symmetry, also implies the reverse).

In general it can be shown that a positive value of Cov(X, Y) is an indication that Y tends to increase as X does, whereas a negative value indicates that Y tends to decrease as X increases.

Example 2.33  The joint density function of X, Y is

f(x, y) = (1/y) e^{−(y + x/y)},  0 < x, y < ∞

(a) Verify that the preceding is a joint density function.
(b) Find Cov(X, Y).

Solution: To show that f(x, y) is a joint density function we need to show it is nonnegative, which is immediate, and that ∫_{−∞}^∞ ∫_{−∞}^∞ f(x, y) dy dx = 1. We prove the latter as follows:

∫_{−∞}^∞ ∫_{−∞}^∞ f(x, y) dy dx = ∫_0^∞ ∫_0^∞ (1/y) e^{−(y + x/y)} dy dx
                                = ∫_0^∞ e^{−y} [∫_0^∞ (1/y) e^{−x/y} dx] dy
                                = ∫_0^∞ e^{−y} dy
                                = 1


To obtain Cov(X, Y), note that the density function of Y is

f_Y(y) = e^{−y} ∫_0^∞ (1/y) e^{−x/y} dx = e^{−y}

Thus, Y is an exponential random variable with parameter 1, showing (see Example 2.21) that

E[Y] = 1

We compute E[X] and E[XY] as follows:

E[X] = ∫_{−∞}^∞ ∫_{−∞}^∞ x f(x, y) dy dx
     = ∫_0^∞ e^{−y} [∫_0^∞ (x/y) e^{−x/y} dx] dy

Now, ∫_0^∞ (x/y) e^{−x/y} dx is the expected value of an exponential random variable with parameter 1/y, and thus is equal to y. Consequently,

E[X] = ∫_0^∞ y e^{−y} dy = 1

Also

E[XY] = ∫_{−∞}^∞ ∫_{−∞}^∞ xy f(x, y) dy dx
      = ∫_0^∞ y e^{−y} [∫_0^∞ (x/y) e^{−x/y} dx] dy
      = ∫_0^∞ y^2 e^{−y} dy

Integration by parts (dv = e^{−y} dy, u = y^2) gives

E[XY] = ∫_0^∞ y^2 e^{−y} dy = −y^2 e^{−y} |_0^∞ + ∫_0^∞ 2y e^{−y} dy = 2E[Y] = 2

Consequently,

Cov(X, Y) = E[XY] − E[X]E[Y] = 1  ■
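The decomposition used in the example (Y is exponential with rate 1, and given Y = y, X is exponential with mean y) also gives a quick simulation check of Cov(X, Y) = 1. This sketch is our own illustration, not part of the text:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    y = rng.exponential(scale=1.0, size=n)   # Y ~ Exponential(1)
    x = rng.exponential(scale=y)             # given Y = y, X has mean y
    print(np.cov(x, y)[0, 1])                # close to 1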

The following are important properties of covariance.

Properties of Covariance

For any random variables X, Y, Z and constant c,

1. Cov(X, X) = Var(X),
2. Cov(X, Y) = Cov(Y, X),
3. Cov(cX, Y) = c Cov(X, Y),
4. Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z).

Whereas the first three properties are immediate, the final one is easily proven as follows:

Cov(X, Y + Z) = E[X(Y + Z)] − E[X]E[Y + Z]
              = E[XY] − E[X]E[Y] + E[XZ] − E[X]E[Z]
              = Cov(X, Y) + Cov(X, Z)

The fourth property listed easily generalizes to give the following result:

Cov(Σ_{i=1}^n X_i, Σ_{j=1}^m Y_j) = Σ_{i=1}^n Σ_{j=1}^m Cov(X_i, Y_j)        (2.15)

A useful expression for the variance of the sum of random variables can be obtained from Equation (2.15) as follows:

Var(Σ_{i=1}^n X_i) = Cov(Σ_{i=1}^n X_i, Σ_{j=1}^n X_j)
                   = Σ_{i=1}^n Σ_{j=1}^n Cov(X_i, X_j)
                   = Σ_{i=1}^n Cov(X_i, X_i) + Σ_{i=1}^n Σ_{j≠i} Cov(X_i, X_j)
                   = Σ_{i=1}^n Var(X_i) + 2 Σ_{i=1}^n Σ_{j<i} Cov(X_i, X_j)        (2.16)

If X_i, i = 1, . . . , n are independent random variables, then Equation (2.16) reduces to

Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var(X_i)

Definition 2.1  If X_1, . . . , X_n are independent and identically distributed, then the random variable X̄ = Σ_{i=1}^n X_i / n is called the sample mean.

The following proposition shows that the covariance between the sample mean and a deviation from that sample mean is zero. It will be needed in Section 2.6.1.

Proposition 2.4  Suppose that X_1, . . . , X_n are independent and identically distributed with expected value µ and variance σ^2. Then,

(a) E[X̄] = µ.
(b) Var(X̄) = σ^2/n.
(c) Cov(X̄, X_i − X̄) = 0, i = 1, . . . , n.


Proof.  Parts (a) and (b) are easily established as follows:

E[X̄] = (1/n) Σ_{i=1}^n E[X_i] = µ,

Var(X̄) = (1/n)^2 Var(Σ_{i=1}^n X_i) = (1/n)^2 Σ_{i=1}^n Var(X_i) = σ^2/n

To establish part (c) we reason as follows:

Cov(X̄, X_i − X̄) = Cov(X̄, X_i) − Cov(X̄, X̄)
                 = (1/n) Cov(X_i + Σ_{j≠i} X_j, X_i) − Var(X̄)
                 = (1/n) Cov(X_i, X_i) + (1/n) Cov(Σ_{j≠i} X_j, X_i) − σ^2/n
                 = σ^2/n − σ^2/n
                 = 0

where the final equality used the fact that X_i and Σ_{j≠i} X_j are independent and thus have covariance 0.  ■

Equation (2.16) is often useful when computing variances.

Example 2.34 (Variance of a Binomial Random Variable)  Compute the variance of a binomial random variable X with parameters n and p.

Solution: Since such a random variable represents the number of successes in n independent trials when each trial has a common probability p of being a success, we may write

X = X_1 + · · · + X_n

where the X_i are independent Bernoulli random variables such that

X_i = { 1, if the ith trial is a success
        0, otherwise

Hence, from Equation (2.16) we obtain

Var(X) = Var(X_1) + · · · + Var(X_n)

But

Var(X_i) = E[X_i^2] − (E[X_i])^2
         = E[X_i] − (E[X_i])^2     since X_i^2 = X_i
         = p − p^2

and thus

Var(X) = np(1 − p)  ■
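A one-line numerical check of Var(X) = np(1 − p); this is our own sketch, not from the text:

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 20, 0.3
    samples = rng.binomial(n, p, size=1_000_000)
    print(samples.var(), n * p * (1 - p))   # both about 4.2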

Example 2.35 (Sampling from a Finite Population: The Hypergeometric)  Consider a population of N individuals, some of whom are in favor of a certain proposition. In particular suppose that Np of them are in favor and N − Np are opposed, where p is assumed to be unknown. We are interested in estimating p, the fraction of the population that is for the proposition, by randomly choosing and then determining the positions of n members of the population.

In such situations as described in the preceding, it is common to use the fraction of the sampled population that is in favor of the proposition as an estimator of p. Hence, if we let

X_i = { 1, if the ith person chosen is in favor
        0, otherwise

then the usual estimator of p is Σ_{i=1}^n X_i / n. Let us now compute its mean and variance. Now,

E[Σ_{i=1}^n X_i] = Σ_{i=1}^n E[X_i] = np

where the final equality follows since the ith person chosen is equally likely to be any of the N individuals in the population and so has probability Np/N of being in favor. Also,

Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var(X_i) + 2 Σ Σ_{i<j} Cov(X_i, X_j)

Now, since X_i is a Bernoulli random variable with mean p, it follows that

Var(X_i) = p(1 − p)

Also, for i ≠ j,

Cov(X_i, X_j) = E[X_i X_j] − E[X_i]E[X_j]
             = P{X_i = 1, X_j = 1} − p^2
             = P{X_i = 1}P{X_j = 1|X_i = 1} − p^2
             = (Np/N)((Np − 1)/(N − 1)) − p^2

where the last equality follows since if the ith person to be chosen is in favor, then the jth person chosen is equally likely to be any of the other N − 1 individuals, of which Np − 1 are in favor. Thus, we see that

Var(Σ_{i=1}^n X_i) = np(1 − p) + 2 (n choose 2)[p(Np − 1)/(N − 1) − p^2]
                   = np(1 − p) − n(n − 1) p(1 − p)/(N − 1)


and so the mean and variance of our estimator are given by

E[Σ_{i=1}^n X_i / n] = p,
Var(Σ_{i=1}^n X_i / n) = p(1 − p)/n − (n − 1) p(1 − p)/(n(N − 1))

Some remarks are in order: As the mean of the estimator is the unknown value p, we would like its variance to be as small as possible (why is this?), and we see by the preceding that, as a function of the population size N, the variance increases as N increases. The limiting value, as N → ∞, of the variance is p(1 − p)/n, which is not surprising since for N large each of the X_i will be (approximately) independent random variables, and thus Σ_1^n X_i will have an (approximately) binomial distribution with parameters n and p.

The random variable Σ_1^n X_i can be thought of as representing the number of white balls obtained when n balls are randomly selected from a population consisting of Np white and N − Np black balls. (Identify a person who favors the proposition with a white ball and one against with a black ball.) Such a random variable is called hypergeometric and has a probability mass function given by

P{Σ_1^n X_i = k} = (Np choose k)(N − Np choose n − k) / (N choose n)  ■
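The variance formula can be checked exactly against the hypergeometric probability mass function. This is our own sketch, not part of the text, and the function name is ours:

    from math import comb

    def hypergeom_var(N, n, p):
        # Exact variance of the number in favor among n sampled, via the pmf.
        Np = round(N * p)
        pmf = {k: comb(Np, k) * comb(N - Np, n - k) / comb(N, n)
               for k in range(max(0, n - (N - Np)), min(n, Np) + 1)}
        mean = sum(k * q for k, q in pmf.items())
        return sum((k - mean) ** 2 * q for k, q in pmf.items())

    N, n, p = 100, 10, 0.3
    formula = n * p * (1 - p) - n * (n - 1) * p * (1 - p) / (N - 1)
    print(hypergeom_var(N, n, p), formula)   # both about 1.909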

It is often important to be able to calculate the distribution of X + Y from the distributions of X and Y when X and Y are independent. Suppose first that X and Y are continuous, X having probability density f and Y having probability density g. Then, letting F_{X+Y}(a) be the cumulative distribution function of X + Y, we have

F_{X+Y}(a) = P{X + Y ≤ a}
           = ∫∫_{x+y≤a} f(x)g(y) dx dy
           = ∫_{−∞}^∞ ∫_{−∞}^{a−y} f(x)g(y) dx dy
           = ∫_{−∞}^∞ [∫_{−∞}^{a−y} f(x) dx] g(y) dy
           = ∫_{−∞}^∞ F_X(a − y)g(y) dy                                  (2.17)

The cumulative distribution function F_{X+Y} is called the convolution of the distributions F_X and F_Y (the cumulative distribution functions of X and Y, respectively).


By differentiating Equation (2.17), we obtain that the probability density function f_{X+Y}(a) of X + Y is given by

f_{X+Y}(a) = (d/da) ∫_{−∞}^∞ F_X(a − y)g(y) dy
           = ∫_{−∞}^∞ (d/da)(F_X(a − y)) g(y) dy
           = ∫_{−∞}^∞ f(a − y)g(y) dy                                   (2.18)

Example 2.36 (Sum of Two Independent Uniform Random Variables)  If X and Y are independent random variables both uniformly distributed on (0, 1), then calculate the probability density of X + Y.

Solution: From Equation (2.18), since

f(a) = g(a) = { 1, 0 < a < 1
                0, otherwise

we obtain

f_{X+Y}(a) = ∫_0^1 f(a − y) dy

For 0 ≤ a ≤ 1, this yields

f_{X+Y}(a) = ∫_0^a dy = a

For 1 < a < 2, we get

f_{X+Y}(a) = ∫_{a−1}^1 dy = 2 − a

Hence,

f_{X+Y}(a) = { a,      0 ≤ a ≤ 1
               2 − a,  1 < a < 2
               0,      otherwise  ■
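A histogram of simulated sums makes the triangular shape easy to see; this is our own illustrative sketch, not from the text:

    import numpy as np

    rng = np.random.default_rng(2)
    s = rng.random(1_000_000) + rng.random(1_000_000)   # X + Y
    hist, edges = np.histogram(s, bins=20, range=(0, 2), density=True)
    mids = (edges[:-1] + edges[1:]) / 2
    tri = np.where(mids <= 1, mids, 2 - mids)           # f(a) = a or 2 - a
    print(np.abs(hist - tri).max())                     # small, near 0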

Rather than deriving a general expression for the distribution of X + Y in the discrete case, we shall consider an example.

Example 2.37 (Sums of Independent Poisson Random Variables)  Let X and Y be independent Poisson random variables with respective means λ_1 and λ_2. Calculate the distribution of X + Y.


Solution: Since the event {X + Y = n} may be written as the union of the disjoint events {X = k, Y = n − k}, 0 ≤ k ≤ n, we have

P{X + Y = n} = Σ_{k=0}^n P{X = k, Y = n − k}
             = Σ_{k=0}^n P{X = k}P{Y = n − k}
             = Σ_{k=0}^n e^{−λ_1} (λ_1^k / k!) e^{−λ_2} (λ_2^{n−k} / (n − k)!)
             = e^{−(λ_1+λ_2)} Σ_{k=0}^n λ_1^k λ_2^{n−k} / (k!(n − k)!)
             = (e^{−(λ_1+λ_2)} / n!) Σ_{k=0}^n (n!/(k!(n − k)!)) λ_1^k λ_2^{n−k}
             = (e^{−(λ_1+λ_2)} / n!) (λ_1 + λ_2)^n

In words, X + Y has a Poisson distribution with mean λ_1 + λ_2.  ■
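A quick check of the closed form (our own sketch): convolve two Poisson pmfs numerically and compare with the Poisson(λ_1 + λ_2) pmf:

    from math import exp, factorial

    def pois(lam, k):
        return exp(-lam) * lam ** k / factorial(k)

    lam1, lam2, n = 1.5, 2.5, 6
    conv = sum(pois(lam1, k) * pois(lam2, n - k) for k in range(n + 1))
    print(conv, pois(lam1 + lam2, n))   # equal up to rounding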

The concept of independence may, of course, be defined for more than two random variables. In general, the n random variables X_1, X_2, . . . , X_n are said to be independent if, for all values a_1, a_2, . . . , a_n,

P{X_1 ≤ a_1, X_2 ≤ a_2, . . . , X_n ≤ a_n} = P{X_1 ≤ a_1}P{X_2 ≤ a_2} · · · P{X_n ≤ a_n}

Example 2.38  Let X_1, . . . , X_n be independent and identically distributed continuous random variables with probability distribution F and density function F′ = f. If we let X_(i) denote the ith smallest of these random variables, then X_(1), . . . , X_(n) are called the order statistics. To obtain the distribution of X_(i), note that X_(i) will be less than or equal to x if and only if at least i of the n random variables X_1, . . . , X_n are less than or equal to x. Hence,

P{X_(i) ≤ x} = Σ_{k=i}^n (n choose k) (F(x))^k (1 − F(x))^{n−k}


Differentiation yields that the density function of X_(i) is as follows:

f_{X_(i)}(x) = f(x) Σ_{k=i}^n (n choose k) k (F(x))^{k−1} (1 − F(x))^{n−k}
             − f(x) Σ_{k=i}^n (n choose k) (n − k)(F(x))^k (1 − F(x))^{n−k−1}
             = f(x) Σ_{k=i}^n (n!/((n − k)!(k − 1)!)) (F(x))^{k−1} (1 − F(x))^{n−k}
             − f(x) Σ_{k=i}^{n−1} (n!/((n − k − 1)!k!)) (F(x))^k (1 − F(x))^{n−k−1}
             = f(x) Σ_{k=i}^n (n!/((n − k)!(k − 1)!)) (F(x))^{k−1} (1 − F(x))^{n−k}
             − f(x) Σ_{j=i+1}^n (n!/((n − j)!(j − 1)!)) (F(x))^{j−1} (1 − F(x))^{n−j}
             = (n!/((n − i)!(i − 1)!)) f(x)(F(x))^{i−1} (1 − F(x))^{n−i}

The preceding density is quite intuitive, since in order for X_(i) to equal x, i − 1 of the n values X_1, . . . , X_n must be less than x; n − i of them must be greater than x; and one must be equal to x. Now, the probability density that every member of a specified set of i − 1 of the X_j is less than x, every member of another specified set of n − i is greater than x, and the remaining value is equal to x is (F(x))^{i−1} (1 − F(x))^{n−i} f(x). Therefore, since there are n!/[(i − 1)!(n − i)!] different partitions of the n random variables into the three groups, we obtain the preceding density function.  ■
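For a concrete check (our own sketch, not from the text), compare a simulated density of the ith order statistic of uniforms with the formula above, for which F(x) = x and f(x) = 1 on (0, 1):

    import numpy as np
    from math import factorial

    rng = np.random.default_rng(3)
    n, i, x = 5, 2, 0.4
    # empirical density: P(X_(i) in a small window around x) / window width
    samples = np.sort(rng.random((200_000, n)), axis=1)[:, i - 1]
    h = 0.01
    empirical = np.mean(np.abs(samples - x) < h) / (2 * h)
    formula = (factorial(n) / (factorial(n - i) * factorial(i - 1))
               * x ** (i - 1) * (1 - x) ** (n - i))
    print(empirical, formula)   # both about 1.73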

2.5.4

Joint Probability Distribution of Functions of Random Variables

Let X_1 and X_2 be jointly continuous random variables with joint probability density function f(x_1, x_2). It is sometimes necessary to obtain the joint distribution of the random variables Y_1 and Y_2 that arise as functions of X_1 and X_2. Specifically, suppose that Y_1 = g_1(X_1, X_2) and Y_2 = g_2(X_1, X_2) for some functions g_1 and g_2. Assume that the functions g_1 and g_2 satisfy the following conditions:

1. The equations y_1 = g_1(x_1, x_2) and y_2 = g_2(x_1, x_2) can be uniquely solved for x_1 and x_2 in terms of y_1 and y_2 with solutions given by, say, x_1 = h_1(y_1, y_2), x_2 = h_2(y_1, y_2).


2. The functions g_1 and g_2 have continuous partial derivatives at all points (x_1, x_2) and are such that the 2 × 2 determinant

J(x_1, x_2) = | ∂g_1/∂x_1  ∂g_1/∂x_2 |
              | ∂g_2/∂x_1  ∂g_2/∂x_2 | = (∂g_1/∂x_1)(∂g_2/∂x_2) − (∂g_1/∂x_2)(∂g_2/∂x_1) ≠ 0

at all points (x_1, x_2).

If i > k, then the best prize will be selected if the best of the first k prizes is also the best of the first i − 1 prizes (for then none of the prizes in positions k + 1, k + 2, . . . , i − 1 would be selected). Hence, we see that

P_k(best|X = i) = 0,  if i ≤ k

P_k(best|X = i) = P{best of first i − 1 is among the first k} = k/(i − 1),  if i > k

From the preceding, we obtain

P_k(best) = (k/n) Σ_{i=k+1}^n 1/(i − 1)
          ≈ (k/n) ∫_k^{n−1} (1/x) dx
          = (k/n) log((n − 1)/k)
          ≈ (k/n) log(n/k)

Now, if we consider the function

g(x) = (x/n) log(n/x)

then

g′(x) = (1/n) log(n/x) − 1/n

and so

g′(x) = 0 ⇒ log(n/x) = 1 ⇒ x = n/e

Thus, since P_k(best) ≈ g(k), we see that the best strategy of the type considered is to let the first n/e prizes go by and then accept the first one to appear that is better than all of those. In addition, since g(n/e) = 1/e, the probability that this strategy selects the best prize is approximately 1/e ≈ 0.36788.
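The 1/e answer is easy to confirm by simulation; the sketch below is our own (the function name is ours) and tries the cutoff k = n/e on random orderings:

    import math
    import random

    def best_prize_prob(n, k, trials=100_000):
        # Probability the "let k go by" strategy selects the best of n prizes.
        wins = 0
        for _ in range(trials):
            ranks = list(range(n))          # 0 denotes the best prize
            random.shuffle(ranks)
            best_seen = min(ranks[:k])      # best among the first k
            for r in ranks[k:]:
                if r < best_seen:           # first prize better than all of the first k
                    wins += (r == 0)
                    break
        return wins / trials

    n = 100
    print(best_prize_prob(n, round(n / math.e)), 1 / math.e)   # both near 0.37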

Remark Most students are quite surprised by the size of the probability of obtaining the best prize, thinking that this probability would be close to 0 when n is large. However, even without going through the calculations, a little thought reveals that the probability of obtaining the best prize can be made to be reasonably large. Consider the strategy of letting half of the prizes go by, and then selecting the first one to appear that is better than all of those. The probability that a prize is actually selected is the probability that the overall best is among the second half and this is 1/2. In addition,


given that a prize is selected, at the time of selection that prize would have been the best of more than n/2 prizes to have appeared, and would thus have probability of at least 1/2 of being the overall best. Hence, the strategy of letting the first half of all prizes go by and then accepting the first one that is better than all of those prizes results in a probability greater than 1/4 of obtaining the best prize.  ■

Example 3.26  At a party n men take off their hats. The hats are then mixed up and each man randomly selects one. We say that a match occurs if a man selects his own hat. What is the probability of no matches? What is the probability of exactly k matches?

Solution: Let E denote the event that no matches occur, and to make explicit the dependence on n, write P_n = P(E). We start by conditioning on whether or not the first man selects his own hat—call these events M and M^c. Then

P_n = P(E) = P(E|M)P(M) + P(E|M^c)P(M^c)

Clearly, P(E|M) = 0, and so

P_n = P(E|M^c)(n − 1)/n                                           (3.8)

Now, P(E|M^c) is the probability of no matches when n − 1 men select from a set of n − 1 hats that does not contain the hat of one of these men. This can happen in either of two mutually exclusive ways. Either there are no matches and the extra man does not select the extra hat (this being the hat of the man that chose first), or there are no matches and the extra man does select the extra hat. The probability of the first of these events is just P_{n−1}, which is seen by regarding the extra hat as “belonging” to the extra man. Because the second event has probability [1/(n − 1)]P_{n−2}, we have

P(E|M^c) = P_{n−1} + (1/(n − 1)) P_{n−2}

and thus, from Equation (3.8),

P_n = ((n − 1)/n) P_{n−1} + (1/n) P_{n−2}

or, equivalently,

P_n − P_{n−1} = −(1/n)(P_{n−1} − P_{n−2})                          (3.9)

However, because P_n is the probability of no matches when n men select among their own hats, we have

P_1 = 0,  P_2 = 1/2

and so, from Equation (3.9),

P_3 − P_2 = −(P_2 − P_1)/3 = −1/3!  or  P_3 = 1/2! − 1/3!,
P_4 − P_3 = −(P_3 − P_2)/4 = 1/4!   or  P_4 = 1/2! − 1/3! + 1/4!


and, in general, we see that

P_n = 1/2! − 1/3! + 1/4! − · · · + (−1)^n/n!

To obtain the probability of exactly k matches, we consider any fixed group of k men. The probability that they, and only they, select their own hats is

(1/n)(1/(n − 1)) · · · (1/(n − (k − 1))) P_{n−k} = ((n − k)!/n!) P_{n−k}

where P_{n−k} is the conditional probability that the other n − k men, selecting among their own hats, have no matches. Because there are (n choose k) choices of a set of k men, the desired probability of exactly k matches is

P_{n−k}/k! = [1/2! − 1/3! + · · · + (−1)^{n−k}/(n − k)!]/k!

which, for n large, is approximately equal to e^{−1}/k!.  ■

Remark  The recursive equation, Equation (3.9), could also have been obtained by using the concept of a cycle, where we say that the sequence of distinct individuals i_1, i_2, . . . , i_k constitutes a cycle if i_1 chooses i_2's hat, i_2 chooses i_3's hat, . . . , i_{k−1} chooses i_k's hat, and i_k chooses i_1's hat. Note that every individual is part of a cycle, and that a cycle of size k = 1 occurs when someone chooses his or her own hat. With E being, as before, the event that no matches occur, it follows upon conditioning on the size of the cycle containing a specified person, say person 1, that

P_n = P(E) = Σ_{k=1}^n P(E|C = k)P(C = k)                          (3.10)

where C is the size of the cycle that contains person 1. Now, call person 1 the first person, and note that C = k if the first person does not choose 1's hat; the person whose hat was chosen by the first person—call this person the second person—does not choose 1's hat; the person whose hat was chosen by the second person—call this person the third person—does not choose 1's hat; . . . ; the person whose hat was chosen by the (k − 1)st person does choose 1's hat. Consequently,

P(C = k) = ((n − 1)/n)((n − 2)/(n − 1)) · · · ((n − k + 1)/(n − k + 2))(1/(n − k + 1)) = 1/n     (3.11)

That is, the size of the cycle that contains a specified person is equally likely to be any of the values 1, 2, . . . , n. Moreover, since C = 1 means that 1 chooses his or her own hat, it follows that

P(E|C = 1) = 0

On the other hand, if C = k, then the set of hats chosen by the k individuals in this cycle is exactly the set of hats of these individuals. Hence, conditional on C = k, the problem reduces to determining the probability of no matches when n − k people randomly choose among their own n − k hats. Therefore, for k > 1

P(E|C = k) = P_{n−k}                                               (3.12)

Substituting (3.11) and (3.12) back into Equation (3.10) gives

P_n = (1/n) Σ_{k=2}^n P_{n−k}                                      (3.13)

which is easily shown to be equivalent to Equation (3.9).  ■

Example 3.27 (The Ballot Problem)  In an election, candidate A receives n votes, and candidate B receives m votes where n > m. Assuming that all orderings are equally likely, show that the probability that A is always ahead in the count of votes is (n − m)/(n + m).

Solution: Let P_{n,m} denote the desired probability. By conditioning on which candidate receives the last vote counted we have

P_{n,m} = P{A always ahead|A receives last vote} n/(n + m)
        + P{A always ahead|B receives last vote} m/(n + m)

Now, given that A receives the last vote, we can see that the probability that A is always ahead is the same as if A had received a total of n − 1 and B a total of m votes. Because a similar result is true when we are given that B receives the last vote, we see from the preceding that

P_{n,m} = (n/(n + m)) P_{n−1,m} + (m/(m + n)) P_{n,m−1}            (3.14)

We can now prove that P_{n,m} = (n − m)/(n + m) by induction on n + m. As it is true when n + m = 1, that is, P_{1,0} = 1, assume it whenever n + m = k. Then when n + m = k + 1, we have by Equation (3.14) and the induction hypothesis that

P_{n,m} = (n/(n + m))(n − 1 − m)/(n − 1 + m) + (m/(m + n))(n − m + 1)/(n + m − 1)
        = (n − m)/(n + m)

and the result is proven.  ■

The ballot problem has some interesting applications. For example, consider successive flips of a coin that always land on “heads” with probability p, and let us determine the probability distribution of the first time, after beginning, that the total number of heads is equal to the total number of tails. The probability that the first time this occurs


is at time 2n can be obtained by first conditioning on the total number of heads in the first 2n trials. This yields

P{first time equal = 2n} = P{first time equal = 2n|n heads in first 2n} (2n choose n) p^n (1 − p)^n

Now, given a total of n heads in the first 2n flips we can see that all possible orderings of the n heads and n tails are equally likely, and thus the preceding conditional probability is equivalent to the probability that in an election, in which each candidate receives n votes, one of the candidates is always ahead in the counting until the last vote (which ties them). But by conditioning on whomever receives the last vote, we see that this is just the probability in the ballot problem when m = n − 1. Hence,

P{first time equal = 2n} = P_{n,n−1} (2n choose n) p^n (1 − p)^n
                         = (2n choose n) p^n (1 − p)^n / (2n − 1)

Suppose now that we wanted to determine the probability that the first time there are i more heads than tails occurs after the (2n + i)th flip. Now, in order for this to be the case, the following two events must occur:

(a) The first 2n + i tosses result in n + i heads and n tails; and
(b) The order in which the n + i heads and n tails occur is such that the number of heads is never i more than the number of tails until after the final flip.

Now, it is easy to see that event (b) will occur if and only if the order of appearance of the n + i heads and n tails is such that starting from the final flip and working backwards heads is always in the lead. For instance, if there are 4 heads and 2 tails (n = 2, i = 2), then the outcome _ _ _ _ TH would not suffice because there would have been 2 more heads than tails sometime before the sixth flip (since the first 4 flips resulted in 2 more heads than tails).

Now, the probability of the event specified in (a) is just the binomial probability of getting n + i heads and n tails in 2n + i flips of the coin. We must now determine the conditional probability of the event specified in (b) given that there are n + i heads and n tails in the first 2n + i flips. To do so, note first that given that there are a total of n + i heads and n tails in the first 2n + i flips, all possible orderings of these flips are equally likely. As a result, the conditional probability of (b) given (a) is just the probability that a random ordering of n + i heads and n tails will, when counted in reverse order, always have more heads than tails. Since all reverse orderings are also equally likely, it follows from the ballot problem that this conditional probability is i/(2n + i).


That is, we have shown that

P{a} = (2n + i choose n) p^{n+i} (1 − p)^n,
P{b|a} = i/(2n + i)

and so

P{first time heads leads by i is after flip 2n + i} = (i/(2n + i)) (2n + i choose n) p^{n+i} (1 − p)^n

Example 3.28  Let U_1, U_2, . . . be a sequence of independent uniform (0, 1) random variables, and let

N = min{n ≥ 2: U_n > U_{n−1}}

and

M = min{n ≥ 1: U_1 + · · · + U_n > 1}

That is, N is the index of the first uniform random variable that is larger than its immediate predecessor, and M is the number of uniform random variables we need sum to exceed 1. Surprisingly, N and M have the same probability distribution, and their common mean is e!

Solution: It is easy to find the distribution of N. Since all n! possible orderings of U_1, . . . , U_n are equally likely, we have

P{N > n} = P{U_1 > U_2 > · · · > U_n} = 1/n!

To show that P{M > n} = 1/n!, we will use mathematical induction. However, to give ourselves a stronger result to use as the induction hypothesis, we will prove the stronger result that for 0 < x ≤ 1, P{M(x) > n} = x^n/n!, n ≥ 1, where

M(x) = min{n ≥ 1: U_1 + · · · + U_n > x}

is the minimum number of uniforms that need be summed to exceed x. To prove that P{M(x) > n} = x^n/n!, note first that it is true for n = 1 since

P{M(x) > 1} = P{U_1 ≤ x} = x


So assume that for all 0 < x ≤ 1, P{M(x) > n} = x^n/n!. To determine P{M(x) > n + 1}, condition on U_1 to obtain

P{M(x) > n + 1} = ∫_0^1 P{M(x) > n + 1|U_1 = y} dy
                = ∫_0^x P{M(x) > n + 1|U_1 = y} dy
                = ∫_0^x P{M(x − y) > n} dy
                = ∫_0^x ((x − y)^n/n!) dy     by the induction hypothesis
                = ∫_0^x (u^n/n!) du
                = x^{n+1}/(n + 1)!

where the third equality of the preceding follows from the fact that given U_1 = y, M(x) is distributed as 1 plus the number of uniforms that need be summed to exceed x − y. Thus, the induction is complete and we have shown that for 0 < x ≤ 1, n ≥ 1,

P{M(x) > n} = x^n/n!

Letting x = 1 shows that N and M have the same distribution. Finally, we have

E[M] = E[N] = Σ_{n=0}^∞ P{N > n} = Σ_{n=0}^∞ 1/n! = e  ■

Example 3.29  Let X_1, X_2, . . . be independent continuous random variables with a common distribution function F and density f = F′, and suppose that they are to be observed one at a time in sequence. Let

N = min{n ≥ 2: X_n = second largest of X_1, . . . , X_n}

and let

M = min{n ≥ 2: X_n = second smallest of X_1, . . . , X_n}

Which random variable—X_N, the first random variable which when observed is the second largest of those that have been seen, or X_M, the first one that on observation is the second smallest to have been seen—tends to be larger?

Solution: To calculate the probability density function of X_N, it is natural to condition on the value of N; so let us start by determining its probability mass function. Now, if we let

A_i = {X_i ≠ second largest of X_1, . . . , X_i},  i ≥ 2


then, for n ≥ 2,

P{N = n} = P(A_2 A_3 · · · A_{n−1} A_n^c)

Since the X_i are independent and identically distributed it follows that, for any m ≥ 1, knowing the rank ordering of the variables X_1, . . . , X_m yields no information about the set of m values {X_1, . . . , X_m}. That is, for instance, knowing that X_1 < X_2 gives us no information about the values of min(X_1, X_2) or max(X_1, X_2). It follows from this that the events A_i, i ≥ 2 are independent. Also, since X_i is equally likely to be the largest, or the second largest, . . . , or the ith largest of X_1, . . . , X_i it follows that P{A_i} = (i − 1)/i, i ≥ 2. Therefore, we see that

P{N = n} = (1/2)(2/3)(3/4) · · · ((n − 2)/(n − 1))(1/n) = 1/(n(n − 1))

Hence, conditioning on N yields that the probability density function of X_N is as follows:

f_{X_N}(x) = Σ_{n=2}^∞ (1/(n(n − 1))) f_{X_N|N}(x|n)

Now, since the ordering of the variables X_1, . . . , X_n is independent of the set of values {X_1, . . . , X_n}, it follows that the event {N = n} is independent of {X_1, . . . , X_n}. From this, it follows that the conditional distribution of X_N given that N = n is equal to the distribution of the second largest from a set of n random variables having distribution F. Thus, using the results of Example 2.38 concerning the density function of such a random variable, we obtain

f_{X_N}(x) = Σ_{n=2}^∞ (1/(n(n − 1))) (n!/((n − 2)!1!)) (F(x))^{n−2} f(x)(1 − F(x))
           = f(x)(1 − F(x)) Σ_{n=2}^∞ (F(x))^{n−2}
           = f(x)(1 − F(x)) Σ_{i=0}^∞ (F(x))^i
           = f(x)

Thus, rather surprisingly, X_N has the same distribution as X_1, namely, F. Also, if we now let W_i = −X_i, i ≥ 1, then W_M will be the value of the first W_i, which on observation is the second largest of all those that have been seen. Hence, by the preceding, it follows that W_M has the same distribution as W_1. That is, −X_M has the same distribution as −X_1, and so X_M also has distribution F! In other words, whether we stop at the first random variable that is the second largest of all those presently observed, or we stop at the first one that is the second smallest of all those presently observed, we will end up with a random variable having distribution F.

Whereas the preceding result is quite surprising, it is a special case of a general result known as Ignatov's theorem, which yields even more surprises. For instance, for k ≥ 1, let

N_k = min{n ≥ k: X_n = kth largest of X_1, . . . , X_n}


Therefore, N_2 is what we previously called N, and X_{N_k} is the first random variable that upon observation is the kth largest of all those observed up to this point. It can then be shown by a similar argument as used in the preceding that X_{N_k} has distribution function F for all k (see Exercise 82 at the end of this chapter). In addition, it can be shown that the random variables X_{N_k}, k ≥ 1 are independent. (A statement and proof of Ignatov's theorem in the case of discrete random variables are given in Section 3.6.6.)  ■

Example 3.30  A population consists of m families. Let X_j denote the size of family j, and suppose that X_1, . . . , X_m are independent random variables having the common probability mass function

p_k = P(X_j = k),  Σ_{k=1}^∞ p_k = 1

with mean µ = Σ_k k p_k. Suppose a member of the population is randomly chosen, in that the selection is equally likely to be any of the members of the population, and let S_i be the event that the selected individual is from a family of size i. Argue that

P(S_i) → i p_i/µ  as m → ∞

Solution: A heuristic argument for the preceding formula is that because each family is of size i with probability p_i, it follows that there are approximately m p_i families of size i when m is large. Thus, i m p_i members of the population come from a family of size i, implying that the probability that the selected individual is from a family of size i is approximately i m p_i / Σ_j j m p_j = i p_i/µ.

For a more formal argument, let N_i denote the number of families that are of size i. That is,

N_i = number{k: k = 1, . . . , m: X_k = i}

Then, conditional on X = (X_1, . . . , X_m),

P(S_i|X) = i N_i / Σ_{k=1}^m X_k

Hence,

P(S_i) = E[P(S_i|X)]
       = E[i N_i / Σ_{k=1}^m X_k]
       = E[(i N_i/m) / (Σ_{k=1}^m X_k/m)]

Because each family is independently of size i with probability p_i, it follows by the strong law of large numbers that N_i/m, the fraction of families that are of size i,


converges to p_i as m → ∞. Also, by the strong law of large numbers, Σ_{k=1}^m X_k/m → E[X] = µ as m → ∞. Consequently, with probability 1,

(i N_i/m) / (Σ_{k=1}^m X_k/m) → i p_i/µ  as m → ∞

Because the random variable i N_i / Σ_{k=1}^m X_k converges to i p_i/µ, so does its expectation, which proves the result. (While it is not always the case that lim_{m→∞} Y_m = c implies that lim_{m→∞} E[Y_m] = c, the implication is true when the Y_m are uniformly bounded random variables, and the random variables i N_i / Σ_{k=1}^m X_k are all between 0 and 1.)  ■
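The size-biased limit i p_i/µ shows up clearly in simulation. This sketch is our own illustration, using an arbitrarily chosen family-size distribution:

    import random

    # family sizes 1, 2, 3 with probabilities 0.5, 0.3, 0.2 (mean mu = 1.7)
    sizes, probs = [1, 2, 3], [0.5, 0.3, 0.2]
    mu = sum(s * p for s, p in zip(sizes, probs))

    m, trials = 10_000, 1_000
    hits = {s: 0 for s in sizes}
    for _ in range(trials):
        fam = random.choices(sizes, probs, k=m)            # the m family sizes
        person_fam = random.choices(fam, weights=fam)[0]   # size-biased pick
        hits[person_fam] += 1

    for s, p in zip(sizes, probs):
        print(s, hits[s] / trials, s * p / mu)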

The use of conditioning can also result in a more computationally efficient solution than a direct calculation. This is illustrated by our next example.

Example 3.31  Consider n independent trials in which each trial results in one of the outcomes 1, . . . , k with respective probabilities p_1, . . . , p_k, Σ_{i=1}^k p_i = 1. Suppose further that n > k, and that we are interested in determining the probability that each outcome occurs at least once. If we let A_i denote the event that outcome i does not occur in any of the n trials, then the desired probability is 1 − P(∪_{i=1}^k A_i), and it can be obtained by using the inclusion–exclusion theorem as follows:

P(∪_{i=1}^k A_i) = Σ_{i=1}^k P(A_i) − Σ_i Σ_{j>i} P(A_i A_j)
                 + Σ_i Σ_{j>i} Σ_{k>j} P(A_i A_j A_k) − · · · + (−1)^{k+1} P(A_1 · · · A_k)

where

P(A_i) = (1 − p_i)^n
P(A_i A_j) = (1 − p_i − p_j)^n,  i < j
P(A_i A_j A_k) = (1 − p_i − p_j − p_k)^n,  i < j < k

The difficulty with the preceding solution is that its computation requires the calculation of 2^k − 1 terms, each of which is a quantity raised to the power n. The preceding solution is thus computationally inefficient when k is large. Let us now see how to make use of conditioning to obtain an efficient solution. To begin, note that if we start by conditioning on N_k (the number of times that outcome k occurs) then when N_k > 0 the resulting conditional probability will equal the probability that all of the outcomes 1, . . . , k − 1 occur at least once when n − N_k trials are performed, and each results in outcome i with probability p_i / Σ_{j=1}^{k−1} p_j, i = 1, . . . , k − 1. We could then use a similar conditioning step on these terms.

To follow through on the preceding idea, let A_{m,r}, for m ≤ n, r ≤ k, denote the event that each of the outcomes 1, . . . , r occurs at least once when m independent trials are performed, where each trial results in one of the outcomes 1, . . . , r with respective probabilities p_1/P_r, . . . , p_r/P_r, where P_r = Σ_{j=1}^r p_j. Let

P(m, r) = P(A_{m,r})


and note that P(n, k) is the desired probability. To obtain an expression for P(m, r), condition on the number of times that outcome r occurs. This gives

P(m, r) = Σ_{j=0}^m P{A_{m,r}|r occurs j times} (m choose j) (p_r/P_r)^j (1 − p_r/P_r)^{m−j}
        = Σ_{j=1}^{m−r+1} P(m − j, r − 1) (m choose j) (p_r/P_r)^j (1 − p_r/P_r)^{m−j}

Starting with

P(m, 1) = 1, if m ≥ 1
P(m, 1) = 0, if m = 0

we can use the preceding recursion to obtain the quantities P(m, 2), m = 2, . . . , n − (k − 2), and then the quantities P(m, 3), m = 3, . . . , n − (k − 3), and so on, up to P(m, k − 1), m = k − 1, . . . , n − 1. At this point we can then use the recursion to compute P(n, k). It is not difficult to check that the amount of computation needed is a polynomial function of k, which will be much smaller than 2^k when k is large.  ■
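Here is a direct implementation of this recursion, written as our own sketch (the function name is ours). It computes P(n, k) with memoization, in time polynomial in n and k:

    from functools import lru_cache
    from math import comb

    def all_outcomes_prob(n, p):
        # P(n, k): probability every outcome 1..k appears in n trials, probs p[0..k-1].
        prefix = [0.0]
        for q in p:
            prefix.append(prefix[-1] + q)      # prefix[r] = P_r = p_1 + ... + p_r

        @lru_cache(maxsize=None)
        def P(m, r):
            if r == 1:
                return 1.0 if m >= 1 else 0.0
            pr = p[r - 1] / prefix[r]          # P(outcome r) among outcomes 1..r
            # condition on j, the number of times outcome r occurs (j >= 1,
            # since the j = 0 term contributes nothing)
            return sum(P(m - j, r - 1) * comb(m, j) * pr ** j * (1 - pr) ** (m - j)
                       for j in range(1, m - r + 2))

        return P(n, len(p))

    print(all_outcomes_prob(10, [0.2, 0.3, 0.5]))   # compare with inclusion-exclusion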

P(F = (0, M)) = qa qbM−1

To determine the other final score probabilities, imagine that A and B continue to play even after the competition is decided. Define the concept of a “round" by letting the initial serve of A start the first round and letting a new round begin each time A serves. Let Bi denote the number of points won by B in round i. Note that if the first point of a round is won by A, then that round ends with B winning 0 points in it. On the other

130

Introduction to Probability Models

hand, if B wins the first point in a round then B will continue serving until A wins a point, showing that the number of points won by B in a round is equal to the number of times that B serves. Because the number of consecutive serves of B before A wins a point is geometric with parameter pb , we see that ( 0, with probability pa Bi = Geometric( pb ), with probability qa That is, P(Bi = 0) = pa P(Bi = k|Bi > 0) = qbk−1 pb , k > 0 Because a new round begins each time A wins a point, it follows that Bi is the number of points that B wins between the 2ntime that A has won i − 1 points until A has won i Bi is the number of points that B has won at the points. Consequently, B(n) ≡ i=1 moment that A wins its nth point. Noting that the final score will be (N , m), m < M, if B(N ) = m, let us determine P(B(n) = m) for m > 0. To do so, we condition on the number of B1 , . . . , Bn that are positive. Calling this number Y , that is, Y = number of i ! n such that Bi > 0 we obtain P(B(n) = m) = =

n ! r =0 n ! r =1

P(B(n) = m|Y = r )P(Y = r ) P(B(n) = m|Y = r )P(Y = r )

where the last equality followed since m > 0 and so P(B(n) = m|Y = 0) = 0. Because each of B1 , . . . , Bn is independently positive with probability qa , it follows that Y , the number of them that are positive, is binomial with parameters n, qa . Consequently, P(B(n) = m) =

n ! r =1

P(B(n) = m|Y = r )

" # n r n−r qa pa r

Now, if r of the variables B1 , . . . , Bn are positive, then B(n) is distributed as the sum of r independent geometric random variables with parameter pb , which is the negative binomial distribution of the number of trials until there have been r successes when each trial is independently a success with probability pb . Hence, P(B(n) = m|Y = r ) =

" # m − 1 r m−r pb q b r −1

Conditional Probability and Conditional Expectation

where we are using the convention that (a choose b) = 0 if b > a. This gives

P(B(n) = m) = Σ_{r=1}^n (m − 1 choose r − 1) p_b^r q_b^{m−r} (n choose r) q_a^r p_a^{n−r}
            = q_b^m p_a^n Σ_{r=1}^n (m − 1 choose r − 1)(n choose r)(p_b q_a/(q_b p_a))^r
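Before reading off the final score probabilities, here is a quick numerical check, our own sketch, of the P(B(n) = m) formula against a direct simulation of rounds:

    import random
    from math import comb

    def b_n_pmf(n, m, pa, pb):
        # P(B(n) = m) from the formula derived above (m > 0).
        qa, qb = 1 - pa, 1 - pb
        return sum(comb(m - 1, r - 1) * pb ** r * qb ** (m - r)
                   * comb(n, r) * qa ** r * pa ** (n - r)
                   for r in range(1, n + 1))

    def b_n_sim(n, pa, pb, trials=200_000):
        counts = {}
        for _ in range(trials):
            b = 0
            for _ in range(n):              # n rounds, each ending with a point by A
                if random.random() >= pa:   # B wins the first rally of the round
                    b += 1                  # ...and keeps serving until A scores
                    while random.random() >= pb:
                        b += 1
            counts[b] = counts.get(b, 0) + 1
        return {m: c / trials for m, c in counts.items()}

    n, pa, pb = 5, 0.55, 0.45
    sim = b_n_sim(n, pa, pb)
    for m in range(1, 4):
        print(m, sim.get(m, 0), b_n_pmf(n, m, pa, pb))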

Thus, we have shown that

P(F = (N, m)) = P(B(N) = m)
              = q_b^m p_a^N Σ_{r=1}^N (m − 1 choose r − 1)(N choose r)(p_b q_a/(q_b p_a))^r,  0 < m < M  ■

State j is said to be accessible from state i if P_ij^n > 0 for some n ≥ 0. Note that this implies that state j is accessible from state i if and only if, starting in i, it is possible that the process will ever enter state j. This is true since if j is not accessible


from i, then

P{ever be in j|start in i} = P(∪_{n=0}^∞ {X_n = j} | X_0 = i)
                           ≤ Σ_{n=0}^∞ P{X_n = j|X_0 = i}
                           = Σ_{n=0}^∞ P_ij^n
                           = 0

Two states i and j that are accessible to each other are said to communicate, and we write i ↔ j.

Note that any state communicates with itself since, by definition,

P_ii^0 = P{X_0 = i|X_0 = i} = 1

The relation of communication satisfies the following three properties:

(i) State i communicates with state i, all i ≥ 0.
(ii) If state i communicates with state j, then state j communicates with state i.
(iii) If state i communicates with state j, and state j communicates with state k, then state i communicates with state k.

Properties (i) and (ii) follow immediately from the definition of communication. To prove (iii) suppose that i communicates with j, and j communicates with k. Thus, there exist integers n and m such that P_ij^n > 0, P_jk^m > 0. Now by the Chapman–Kolmogorov equations, we have

P_ik^{n+m} = Σ_{r=0}^∞ P_ir^n P_rk^m ≥ P_ij^n P_jk^m > 0

Hence, state k is accessible from state i. Similarly, we can show that state i is accessible from state k. Hence, states i and k communicate.

Two states that communicate are said to be in the same class. It is an easy consequence of (i), (ii), and (iii) that any two classes of states are either identical or disjoint. In other words, the concept of communication divides the state space up into a number of separate classes. The Markov chain is said to be irreducible if there is only one class, that is, if all states communicate with each other.

Example 4.14  Consider the Markov chain consisting of the three states 0, 1, 2 and having transition probability matrix

        | 1/2  1/2   0  |
    P = | 1/2  1/4  1/4 |
        |  0   1/3  2/3 |


It is easy to verify that this Markov chain is irreducible. For example, it is possible to go from state 0 to state 2 since

0 → 1 → 2

That is, one way of getting from state 0 to state 2 is to go from state 0 to state 1 (with probability 1/2) and then go from state 1 to state 2 (with probability 1/4).  ■

Example 4.15  Consider a Markov chain consisting of the four states 0, 1, 2, 3 and having transition probability matrix

        | 1/2  1/2   0    0  |
    P = | 1/2  1/2   0    0  |
        | 1/4  1/4  1/4  1/4 |
        |  0    0    0    1  |

The classes of this Markov chain are {0, 1}, {2}, and {3}. Note that while state 0 (or 1) is accessible from state 2, the reverse is not true. Since state 3 is an absorbing state, that is, P_33 = 1, no other state is accessible from it.  ■
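Communicating classes can also be computed mechanically: i and j communicate exactly when each is reachable from the other. The following sketch is our own, not from the text, and finds the classes of the chain in Example 4.15:

    def communicating_classes(P):
        # Partition the states of a transition matrix into communicating classes.
        n = len(P)
        # reach[i] = states accessible from i (reflexive transitive closure)
        reach = [{i} | {j for j in range(n) if P[i][j] > 0} for i in range(n)]
        changed = True
        while changed:
            changed = False
            for i in range(n):
                new = set().union(*(reach[j] for j in reach[i]))
                if not new <= reach[i]:
                    reach[i] |= new
                    changed = True
        classes = []
        for i in range(n):
            cls = {j for j in reach[i] if i in reach[j]}   # mutual accessibility
            if cls not in classes:
                classes.append(cls)
        return classes

    P = [[1/2, 1/2, 0, 0],
         [1/2, 1/2, 0, 0],
         [1/4, 1/4, 1/4, 1/4],
         [0, 0, 0, 1]]
    print(communicating_classes(P))   # [{0, 1}, {2}, {3}]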

For any state i we let f_i denote the probability that, starting in state i, the process will ever reenter state i. State i is said to be recurrent if f_i = 1 and transient if f_i < 1.

Suppose that the process starts in state i and i is recurrent. Hence, with probability 1, the process will eventually reenter state i. However, by the definition of a Markov chain, it follows that the process will be starting over again when it reenters state i and, therefore, state i will eventually be visited again. Continual repetition of this argument leads to the conclusion that if state i is recurrent then, starting in state i, the process will reenter state i again and again and again—in fact, infinitely often.

On the other hand, suppose that state i is transient. Hence, each time the process enters state i there will be a positive probability, namely, 1 − f_i, that it will never again enter that state. Therefore, starting in state i, the probability that the process will be in state i for exactly n time periods equals f_i^{n−1}(1 − f_i), n ≥ 1. In other words, if state i is transient then, starting in state i, the number of time periods that the process will be in state i has a geometric distribution with finite mean 1/(1 − f_i).

From the preceding two paragraphs, it follows that state i is recurrent if and only if, starting in state i, the expected number of time periods that the process is in state i is infinite. But, letting

I_n = { 1, if X_n = i
        0, if X_n ≠ i

we have that Σ_{n=0}^∞ I_n represents the number of periods that the process is in state i. Also,

E[Σ_{n=0}^∞ I_n | X_0 = i] = Σ_{n=0}^∞ E[I_n|X_0 = i]


                          = Σ_{n=0}^∞ P{X_n = i|X_0 = i}
                          = Σ_{n=0}^∞ P_ii^n

We have thus proven the following.

Proposition 4.1  State i is

recurrent if Σ_{n=1}^∞ P_ii^n = ∞,
transient if Σ_{n=1}^∞ P_ii^n < ∞

The argument leading to the preceding proposition is doubly important because it also shows that a transient state will only be visited a finite number of times (hence the name transient). This leads to the conclusion that in a finite-state Markov chain not all states can be transient. To see this, suppose the states are 0, 1, . . . , M and suppose that they are all transient. Then after a finite amount of time (say, after time T_0) state 0 will never be visited, and after a time (say, T_1) state 1 will never be visited, and after a time (say, T_2) state 2 will never be visited, and so on. Thus, after a finite time T = max{T_0, T_1, . . . , T_M} no states will be visited. But as the process must be in some state after time T we arrive at a contradiction, which shows that at least one of the states must be recurrent.

Another use of Proposition 4.1 is that it enables us to show that recurrence is a class property.

Corollary 4.2  If state i is recurrent, and state i communicates with state j, then state j is recurrent.

Proof.  To prove this we first note that, since state i communicates with state j, there exist integers k and m such that P_ij^k > 0, P_ji^m > 0. Now, for any integer n

P_jj^{m+n+k} ≥ P_ji^m P_ii^n P_ij^k

This follows since the left side of the preceding is the probability of going from j to j in m + n + k steps, while the right side is the probability of going from j to j in m + n + k steps via a path that goes from j to i in m steps, then from i to i in an additional n steps, then from i to j in an additional k steps. From the preceding we obtain, by summing over n, that

Σ_{n=1}^∞ P_jj^{m+n+k} ≥ P_ji^m P_ij^k Σ_{n=1}^∞ P_ii^n = ∞

since P_ji^m P_ij^k > 0 and Σ_{n=1}^∞ P_ii^n is infinite since state i is recurrent. Thus, by Proposition 4.1 it follows that state j is also recurrent.  ■


Remarks
(i) Corollary 4.2 also implies that transience is a class property. For if state i is transient and communicates with state j, then state j must also be transient. For if j were recurrent then, by Corollary 4.2, i would also be recurrent and hence could not be transient.
(ii) Corollary 4.2 along with our previous result that not all states in a finite Markov chain can be transient leads to the conclusion that all states of a finite irreducible Markov chain are recurrent.

Example 4.16  Let the Markov chain consisting of the states 0, 1, 2, 3 have the transition probability matrix

        | 0  0  1/2  1/2 |
    P = | 1  0   0    0  |
        | 0  1   0    0  |
        | 0  1   0    0  |

Determine which states are transient and which are recurrent.

Consider the Markov chain having states 0, 1, 2, 3, 4 and 1 2 1 2

0

0

0

0

0 0

1 2 1 2

1 2 1 2

1 4

0

0

" 0" " " 0" " " 0" " 0" " " 1" 2

Determine the recurrent state.

Solution: This chain consists of the three classes {0, 1}, {2, 3}, and {4}. The first two classes are recurrent and the third transient. " Example 4.18 (A Random Walk) Consider a Markov chain whose state space consists of the integers i = 0, ±1, ±2, . . . , and has transition probabilities given by Pi,i+1 = p = 1 − Pi,i−1 , i = 0, ±1, ±2, . . . where 0 < p < 1. In other words, on each transition the process either moves one step to the right (with probability p) or one step to the left (with probability 1 − p). One colorful interpretation of this process is that it represents the wanderings of a drunken man as he walks along a straight line. Another is that it represents the winnings of a gambler who on each play of the game either wins or loses one dollar.

Markov Chains

199

Since all states clearly communicate, it follows from Corollary 4.2 that they are either or all recurrent. So let us consider state 0 and attempt to determine # all transient n is finite or infinite. P if ∞ n=1 00 Since it is impossible to be even (using the gambling model interpretation) after an odd number of plays we must, of course, have that 2n−1 = 0, n = 1, 2, . . . P00

On the other hand, we would be even after 2n trials if and only if we won n of these and lost n of these. Because each play of the game results in a win with probability p and a loss with probability 1− p, the desired probability is thus the binomial probability 3 4 (2n)! 2n 2n = P00 ( p(1 − p))n , n = 1, 2, 3, . . . p n (1 − p)n = n n!n! By using an approximation, due to Stirling, which asserts that √ n! ∼ n n+1/2 e−n 2π

(4.3)

where we say that an ∼ bn when limn→∞ an /bn = 1, we obtain 2n ∼ P00

(4 p(1 − p))n √ πn

# Now it # is easy to verify, for positive an , bn , that if an ∼ bn , then n an < ∞ if and #∞ n will converge if and only if only if n bn < ∞. Hence, n=1 P00 ∞ ! (4 p(1 − p))n √ πn n=1

does. However, 4 p(1 − p) # 1 with equality holding if and only if p = 21 . Hence, #∞ n 1 1 n=1 P00 = ∞ if and only if p = 2 . Thus, the chain is recurrent when p = 2 and transient if p ̸= 21 . When p = 21 , the preceding process is called a symmetric random walk. We could also look at symmetric random walks in more than one dimension. For instance, in the two-dimensional symmetric random walk the process would, at each transition, either take one step to the left, right, up, or down, each having probability 41 . That is, the state is the pair of integers (i, j) and the transition probabilities are given by P(i, j),(i+1, j) = P(i, j),(i−1, j) = P(i, j),(i, j+1) = P(i, j),(i, j−1) =

1 4

By using the same method as in the one-dimensional case, we now show that this Markov chain is also recurrent. Since the preceding chain is irreducible, it follows that all states will be recurrent 2n . Now after 2n steps, the chain will be if state 0 = (0, 0) is recurrent. So consider P00 back in its original location if for some i, 0 # i # n, the 2n steps consist of i steps to the left, i to the right, n − i up, and n − i down. Since each step will be either of these


four types with probability 1/4, it follows that the desired probability is a multinomial probability. That is,

P_00^{2n} = Σ_{i=0}^n ((2n)!/(i!i!(n − i)!(n − i)!)) (1/4)^{2n}
          = (1/4)^{2n} Σ_{i=0}^n ((2n)!/(n!n!)) (n!/((n − i)!i!)) (n!/((n − i)!i!))
          = (1/4)^{2n} (2n choose n) Σ_{i=0}^n (n choose i)(n choose n − i)
          = (1/4)^{2n} (2n choose n)(2n choose n)                   (4.4)

where the last equality uses the combinatorial identity

(2n choose n) = Σ_{i=0}^n (n choose i)(n choose n − i)

which follows upon noting that both sides represent the number of subgroups of size n one can select from a set of n white and n black objects. Now,

(2n choose n) = (2n)!/(n!n!)
             ∼ ((2n)^{2n+1/2} e^{−2n} √(2π)) / (n^{2n+1} e^{−2n} (2π))     by Stirling's approximation
             = 4^n/√(πn)

Hence, from Equation (4.4) we see that

P_00^{2n} ∼ 1/(πn)

which shows that Σ_n P_00^{2n} = ∞, and thus all states are recurrent. Interestingly enough, whereas the symmetric random walks in one and two dimensions are both recurrent, all higher-dimensional symmetric random walks turn out to be transient. (For instance, the three-dimensional symmetric random walk is at each transition equally likely to move in any of six ways—either to the left, right, up, down, in, or out.)  ■

Remark  For the one-dimensional random walk of Example 4.18 here is a direct argument for establishing recurrence in the symmetric case, and for determining the probability that it ever returns to state 0 in the nonsymmetric case. Let

β = P{ever return to 0}


To determine β, start by conditioning on the initial transition to obtain

β = P{ever return to 0|X_1 = 1} p + P{ever return to 0|X_1 = −1}(1 − p)     (4.5)

Now, let α denote the probability that the Markov chain will ever return to state 0 given that it is currently in state 1. Because the Markov chain will always increase by 1 with probability p or decrease by 1 with probability 1 − p no matter what its current state, note that α is also the probability that the Markov chain currently in state i will ever enter state i − 1, for any i. To obtain an equation for α, condition on the next transition to obtain

α = P{ever return|X_1 = 1, X_2 = 0}(1 − p) + P{ever return|X_1 = 1, X_2 = 2} p
  = 1 − p + P{ever return|X_1 = 1, X_2 = 2} p
  = 1 − p + pα^2

where the final equation follows by noting that in order for the chain to ever go from state 2 to state 0 it must first go to state 1—and the probability of that ever happening is α—and if it does eventually go to state 1 then it must still go to state 0—and the conditional probability of that ever happening is also α. Therefore,

α = 1 − p + pα^2

The two roots of this equation are α = 1 and α = (1 − p)/p. Consequently, in the case of the symmetric random walk where p = 1/2 we can conclude that α = 1. By symmetry, the probability that the symmetric random walk will ever enter state 0 given that it is currently in state −1 is also 1, proving that the symmetric random walk is recurrent.

Suppose now that p > 1/2. In this case, it can be shown (see Exercise 17 at the end of this chapter) that P{ever return to 0|X_1 = −1} = 1. Consequently, Equation (4.5) reduces to

β = αp + 1 − p

Because the random walk is transient in this case we know that β < 1, showing that α ≠ 1. Therefore, α = (1 − p)/p, yielding that

β = 2(1 − p),  p > 1/2

Similarly, when p < 1/2 we can show that β = 2p. Thus, in general

P{ever return to 0} = 2 min(p, 1 − p)  ■

Example 4.19 (On the Ultimate Instability of the Aloha Protocol) Consider a communications facility in which the numbers of messages arriving during each of the time periods n = 1, 2, . . . are independent and identically distributed random variables. Let ai = P{i arrivals}, and suppose that a0 + a1 < 1. Each arriving message will transmit at the end of the period in which it arrives. If exactly one message is transmitted,


then the transmission is successful and the message leaves the system. However, if at any time two or more messages simultaneously transmit, then a collision is deemed to occur and these messages remain in the system. Once a message is involved in a collision it will, independently of all else, transmit at the end of each additional period with probability p (the so-called Aloha protocol, because it was first instituted at the University of Hawaii). We will show that such a system is asymptotically unstable in the sense that the number of successful transmissions will, with probability 1, be finite.

To begin let X_n denote the number of messages in the facility at the beginning of the nth period, and note that {X_n, n ≥ 0} is a Markov chain. Now for k ≥ 0 define the indicator variables I_k by

I_k = 1, if the first time that the chain departs state k it directly goes to state k − 1
I_k = 0, otherwise

and let it be 0 if the system is never in state k, k ≥ 0. (For instance, if the successive states are 0, 1, 3, 4, ..., then I_3 = 0 since when the chain first departs state 3 it goes to state 4; whereas, if they are 0, 3, 3, 2, ..., then I_3 = 1 since this time it goes to state 2.) Now,

E[Σ_{k=0}^∞ I_k] = Σ_{k=0}^∞ E[I_k]
= Σ_{k=0}^∞ P{I_k = 1}
≤ Σ_{k=0}^∞ P{I_k = 1 | k is ever visited}     (4.6)

Now, P{I_k = 1 | k is ever visited} is the probability that when state k is departed the next state is k − 1. That is, it is the conditional probability that a transition from k is to k − 1 given that it is not back into k, and so

P{I_k = 1 | k is ever visited} = P_{k,k−1}/(1 − P_{k,k})

Because

P_{k,k−1} = a_0 kp(1 − p)^{k−1},
P_{k,k} = a_0[1 − kp(1 − p)^{k−1}] + a_1(1 − p)^k

which is seen by noting that if there are k messages present at the beginning of a period, then (a) there will be k − 1 at the beginning of the next period if there are no new messages that period and exactly one of the k messages transmits; and (b) there will be k at the beginning of the next period if either (i) there are no new messages and it is not the case that exactly one of the existing k messages transmits, or


(ii) there is exactly one new message (which automatically transmits) and none of the other k messages transmits. Substitution of the preceding into Equation (4.6) yields

E[Σ_{k=0}^∞ I_k] ≤ Σ_{k=0}^∞ a_0 kp(1 − p)^{k−1} / (1 − a_0[1 − kp(1 − p)^{k−1}] − a_1(1 − p)^k)

With N_j = min{n > 0 : X_n = j} equal to the number of transitions until the Markov chain makes a transition into state j, let

m_j = E[N_j | X_0 = j]

Definition: Say that the recurrent state j is positive recurrent if m_j < ∞ and say that it is null recurrent if m_j = ∞.

Now suppose that the Markov chain is irreducible and recurrent. In this case we now show that the long-run proportion of time that the chain spends in state j is equal to 1/m_j. That is, letting π_j denote the long-run proportion of time that the Markov chain is in state j, we have the following proposition.

Proposition 4.4 If the Markov chain is irreducible and recurrent, then for any initial state

π_j = 1/m_j

Proof. Suppose that the Markov chain starts in state i, and let T_1 denote the number of transitions until the chain enters state j; then let T_2 denote the additional number of


transitions from time T_1 until the Markov chain next enters state j; then let T_3 denote the additional number of transitions from time T_1 + T_2 until the Markov chain next enters state j, and so on. Note that T_1 is finite because Proposition 4.3 tells us that with probability 1 a transition into j will eventually occur. Also, for n ≥ 2, because T_n is the number of transitions between the (n − 1)th and the nth transition into state j, it follows from the Markovian property that T_2, T_3, ... are independent and identically distributed with mean m_j. Because the nth transition into state j occurs at time T_1 + ... + T_n, we obtain that π_j, the long-run proportion of time that the chain is in state j, is

π_j = lim_{n→∞} n / Σ_{i=1}^{n} T_i
= lim_{n→∞} 1 / [(1/n) Σ_{i=1}^{n} T_i]
= lim_{n→∞} 1 / [T_1/n + (T_2 + ... + T_n)/n]
= 1/m_j

where the last equality follows because lim_{n→∞} T_1/n = 0 and, from the strong law of large numbers, lim_{n→∞} (T_2 + ... + T_n)/n = lim_{n→∞} [(T_2 + ... + T_n)/(n − 1)] · [(n − 1)/n] = m_j. ■

Because m_j < ∞ is equivalent to 1/m_j > 0, it follows from the preceding that state j is positive recurrent if and only if π_j > 0. We now exploit this to show that positive recurrence is a class property.

Proposition 4.5

If i is positive recurrent and i ↔ j then j is positive recurrent.

Proof. Suppose that i is positive recurrent and that i ↔ j. Now, let n be such that P_{i,j}^n > 0. Because π_i is the long-run proportion of time that the chain is in state i, and P_{i,j}^n is the long-run proportion of time when the Markov chain is in state i that it will be in state j after n transitions,

π_i P_{i,j}^n = long-run proportion of time the chain is in i and will be in j after n transitions
= long-run proportion of time the chain is in j and was in i n transitions ago
≤ long-run proportion of time the chain is in j

Hence, π_j ≥ π_i P_{i,j}^n > 0, showing that j is positive recurrent.

"

Remarks (i) It follows from the preceding result that null recurrence is also a class property. For suppose that i is null recurrent and i ↔ j. Because i is recurrent and i ↔ j we can conclude that j is recurrent. But if j were positive recurrent then by the preceding proposition i would also be positive recurrent. Because i is not positive recurrent, neither is j.


(ii) An irreducible finite state Markov chain must be positive recurrent. For we know that such a chain must be recurrent; hence, all its states are either positive recurrent or null recurrent. If they were null recurrent then all the long-run proportions would equal 0, which is impossible when there are only a finite number of states. Consequently, we can conclude that the chain is positive recurrent. ■

To determine the long-run proportions {π_j, j ≥ 1}, note, because π_i is the long-run proportion of transitions that come from state i, that

π_i P_{i,j} = long-run proportion of transitions that go from state i to state j

Summing the preceding over all i now yields that

π_j = Σ_i π_i P_{i,j}

Indeed, the following important theorem can be proven.

Theorem 4.1 Consider an irreducible Markov chain. If the chain is positive recurrent then the long-run proportions are the unique solution of the equations

π_j = Σ_i π_i P_{i,j},   j ≥ 1
Σ_j π_j = 1

Moreover, if there is no solution of the preceding linear equations, then the Markov chain is either transient or null recurrent and all π j = 0.

Example 4.20 Consider Example 4.1, in which we assume that if it rains today, then it will rain tomorrow with probability α; and if it does not rain today, then it will rain tomorrow with probability β. If we say that the state is 0 when it rains and 1 when it does not rain, then by Theorem 4.1 the long-run proportions π0 and π1 are given by π0 = απ0 + βπ1 , π1 = (1 − α)π0 + (1 − β)π1 , π0 + π1 = 1

which yields that

π_0 = β/(1 + β − α),   π_1 = (1 − α)/(1 + β − α)

For example, if α = 0.7 and β = 0.4, then the long-run proportion of rain is π_0 = 4/7 ≈ 0.571. ■

Example 4.21 Consider Example 4.3 in which the mood of an individual is considered as a three-state Markov chain having a transition probability matrix

    | 0.5  0.4  0.1 |
P = | 0.3  0.4  0.3 |
    | 0.2  0.3  0.5 |


In the long run, what proportion of time is the process in each of the three states?

Solution: The long-run proportions π_i, i = 0, 1, 2, are obtained by solving the set of equations given in Theorem 4.1. In this case these equations are

π_0 = 0.5π_0 + 0.3π_1 + 0.2π_2,
π_1 = 0.4π_0 + 0.4π_1 + 0.3π_2,

π_2 = 0.1π_0 + 0.3π_1 + 0.5π_2,
π_0 + π_1 + π_2 = 1

Solving yields

π_0 = 21/62,   π_1 = 23/62,   π_2 = 18/62   ■

Example 4.22 (A Model of Class Mobility) A problem of interest to sociologists is to determine the proportion of society that has an upper- or lower-class occupation. One possible mathematical model would be to assume that transitions between social classes of the successive generations in a family can be regarded as transitions of a Markov chain. That is, we assume that the occupation of a child depends only on his or her parent's occupation. Let us suppose that such a model is appropriate and that the transition probability matrix is given by

    | 0.45  0.48  0.07 |
P = | 0.05  0.70  0.25 |     (4.8)
    | 0.01  0.50  0.49 |

That is, for instance, we suppose that the child of a middle-class worker will attain an upper-, middle-, or lower-class occupation with respective probabilities 0.05, 0.70, 0.25. The long-run proportions πi thus satisfy π0 = 0.45π0 + 0.05π1 + 0.01π2 , π1 = 0.48π0 + 0.70π1 + 0.50π2 ,

π2 = 0.07π0 + 0.25π1 + 0.49π2 , π0 + π1 + π2 = 1 Hence, π0 = 0.07, π1 = 0.62, π2 = 0.31

In other words, a society in which social mobility between classes can be described by a Markov chain with transition probability matrix given by Equation (4.8) has, in the long run, 7 percent of its people in upper-class jobs, 62 percent of its people in middle-class jobs, and 31 percent in lower-class jobs. ■


Example 4.23 (The Hardy–Weinberg Law and a Markov Chain in Genetics) Consider a large population of individuals, each of whom possesses a particular pair of genes, of which each individual gene is classified as being of type A or type a. Assume that the proportions of individuals whose gene pairs are A A, aa, or Aa are, respectively, p0 , q0 , and r0 ( p0 + q0 + r0 = 1). When two individuals mate, each contributes one of his or her genes, chosen at random, to the resultant offspring. Assuming that the mating occurs at random, in that each individual is equally likely to mate with any other individual, we are interested in determining the proportions of individuals in the next generation whose genes are A A, aa, or Aa. Calling these proportions p, q, and r , they are easily obtained by focusing attention on an individual of the next generation and then determining the probabilities for the gene pair of that individual. To begin, note that randomly choosing a parent and then randomly choosing one of its genes is equivalent to just randomly choosing a gene from the total gene population. By conditioning on the gene pair of the parent, we see that a randomly chosen gene will be type A with probability P{A} = P{A|A A} p0 + P{A|aa}q0 + P{A|Aa}r0 = p0 + r0 /2

Similarly, it will be type a with probability

P{a} = q_0 + r_0/2

Thus, under random mating a randomly chosen member of the next generation will be type AA with probability p, where

p = P{A}P{A} = (p_0 + r_0/2)²

Similarly, the randomly chosen member will be type aa with probability

q = P{a}P{a} = (q_0 + r_0/2)²

and will be type Aa with probability

r = 2P{A}P{a} = 2(p_0 + r_0/2)(q_0 + r_0/2)

Since each member of the next generation will independently be of each of the three gene types with probabilities p, q, r, it follows that the percentages of the members of the next generation that are of type AA, aa, or Aa are respectively p, q, and r. If we now consider the total gene pool of this next generation, then p + r/2, the fraction of its genes that are A, will be unchanged from the previous generation. This follows either by arguing that the total gene pool has not changed from generation to generation or by the following simple algebra:

p + r/2 = (p_0 + r_0/2)² + (p_0 + r_0/2)(q_0 + r_0/2)
= (p_0 + r_0/2)[p_0 + r_0/2 + q_0 + r_0/2]
= p_0 + r_0/2     since p_0 + r_0 + q_0 = 1     (4.9)
= P{A}


Thus, the fractions of the gene pool that are A and a are the same as in the initial generation. From this it follows that, under random mating, in all successive generations after the initial one the percentages of the population having gene pairs A A, aa, and Aa will remain fixed at the values p, q, and r . This is known as the Hardy–Weinberg law. Suppose now that the gene pair population has stabilized in the percentages p, q, r , and let us follow the genetic history of a single individual and her descendants. (For simplicity, assume that each individual has exactly one offspring.) So, for a given individual, let X n denote the genetic state of her descendant in the nth generation. The transition probability matrix of this Markov chain, namely,

           AA           aa             Aa
AA      p + r/2          0          q + r/2
aa         0          q + r/2       p + r/2
Aa     p/2 + r/4     q/2 + r/4   p/2 + q/2 + r/2

is easily verified by conditioning on the state of the randomly chosen mate. It is quite intuitive (why?) that the limiting probabilities for this Markov chain (which also equal the fractions of the individual's descendants that are in each of the three genetic states) should just be p, q, and r. To verify this we must show that they satisfy Theorem 4.1. Because one of the equations in Theorem 4.1 is redundant, it suffices to show that

p = p(p + r/2) + r(p/2 + r/4) = (p + r/2)²
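The next-generation proportions, and the invariance of the gene-pool fraction p + r/2, can be checked with a few lines of arithmetic; the starting proportions below are illustrative, not taken from the text:

# Hypothetical initial proportions of AA, aa, Aa (they must sum to 1)
p0, q0, r0 = 0.3, 0.2, 0.5

PA = p0 + r0 / 2      # fraction of genes of type A
Pa = q0 + r0 / 2      # fraction of genes of type a

p = PA * PA           # next-generation proportion of AA
q = Pa * Pa           # next-generation proportion of aa
r = 2 * PA * Pa       # next-generation proportion of Aa

print(p, q, r, p + q + r)   # the three proportions sum to 1
print(PA, p + r / 2)        # gene-pool fraction of A is unchanged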

Thus, if p > 1/2, there is a positive probability that the gambler's fortune will increase indefinitely; while if p ≤ 1/2, the gambler will, with probability 1, go broke against an infinitely rich adversary.

Example 4.28 Suppose Max and Patty decide to flip pennies; the one coming closest to the wall wins. Patty, being the better player, has a probability 0.6 of winning on each flip. (a) If Patty starts with five pennies and Max with ten, what is the probability that Patty will wipe Max out? (b) What if Patty starts with 10 and Max with 20?

Solution: (a) The desired probability is obtained from Equation (4.14) by letting i = 5, N = 15, and p = 0.6. Hence, the desired probability is

[1 − (2/3)^5] / [1 − (2/3)^15] ≈ 0.87

(b) The desired probability is

[1 − (2/3)^10] / [1 − (2/3)^30] ≈ 0.98

"

For an application of the gambler's ruin problem to drug testing, suppose that two new drugs have been developed for treating a certain disease. Drug i has a cure rate P_i, i = 1, 2, in the sense that each patient treated with drug i will be cured with probability P_i. These cure rates, however, are not known, and suppose we are interested in a method for deciding whether P_1 > P_2 or P_2 > P_1. To decide upon one of these alternatives, consider the following test: Pairs of patients are treated sequentially with one member of the pair receiving drug 1 and the other drug 2. The results for each pair are determined, and the testing stops when the cumulative number of cures using one of the drugs exceeds the cumulative number of cures when using the other by some fixed predetermined number. More formally, let

X_j = 1, if the patient in the jth pair to receive drug number 1 is cured; 0, otherwise
Y_j = 1, if the patient in the jth pair to receive drug number 2 is cured; 0, otherwise


For a predetermined positive integer M the test stops after pair N where N is the first value of n such that either

X_1 + ... + X_n − (Y_1 + ... + Y_n) = M

or

X_1 + ... + X_n − (Y_1 + ... + Y_n) = −M

In the former case we then assert that P_1 > P_2, and in the latter that P_2 > P_1. In order to help ascertain whether the preceding is a good test, one thing we would like to know is the probability of it leading to an incorrect decision. That is, for given P_1 and P_2 where P_1 > P_2, what is the probability that the test will incorrectly assert that P_2 > P_1? To determine this probability, note that after each pair is checked the cumulative difference of cures using drug 1 versus drug 2 will either go up by 1 with probability P_1(1 − P_2), since this is the probability that drug 1 leads to a cure and drug 2 does not, or go down by 1 with probability (1 − P_1)P_2, or remain the same with probability P_1 P_2 + (1 − P_1)(1 − P_2). Hence, if we only consider those pairs in which the cumulative difference changes, then the difference will go up 1 with probability

p = P{up 1 | up 1 or down 1} = P_1(1 − P_2) / [P_1(1 − P_2) + (1 − P_1)P_2]

and down 1 with probability

q = 1 − p = P_2(1 − P_1) / [P_1(1 − P_2) + (1 − P_1)P_2]

Hence, the probability that the test will assert that P_2 > P_1 is equal to the probability that a gambler who wins each (one unit) bet with probability p will go down M before going up M. But Equation (4.14) with i = M, N = 2M, shows that this probability is given by

P{test asserts that P_2 > P_1} = 1 − [1 − (q/p)^M] / [1 − (q/p)^{2M}] = 1 / [1 + (p/q)^M]

Thus, for instance, if P1 = 0.6 and P2 = 0.4 then the probability of an incorrect decision is 0.017 when M = 5 and reduces to 0.0003 when M = 10.
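A sketch that tabulates the error probability 1/[1 + (p/q)^M] directly from the cure rates:

def error_prob(P1, P2, M):
    # P{test asserts P2 > P1} when in fact P1 > P2.
    p = P1 * (1 - P2) / (P1 * (1 - P2) + (1 - P1) * P2)
    return 1 / (1 + (p / (1 - p)) ** M)

for M in (5, 10):
    print(M, error_prob(0.6, 0.4, M))   # ~0.017 and ~0.0003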

4.5.2 A Model for Algorithmic Efficiency

The following optimization problem is called a linear program:

minimize cx,
subject to Ax = b,
           x ≥ 0


where A is an m × n matrix of fixed constants; c = (c_1, ..., c_n) and b = (b_1, ..., b_m) are vectors of fixed constants; and x = (x_1, ..., x_n) is the n-vector of nonnegative values that is to be chosen to minimize cx ≡ Σ_{i=1}^{n} c_i x_i. Supposing that n > m, it can be shown that the optimal x can always be chosen to have at least n − m components equal to 0; that is, it can always be taken to be one of the so-called extreme points of the feasibility region. The simplex algorithm solves this linear program by moving from an extreme point of the feasibility region to a better (in terms of the objective function cx) extreme point (via the pivot operation) until the optimal is reached. Because there can be as many as N ≡ C(n, m) such extreme points, it would seem that this method might take many

iterations, but, surprisingly to some, this does not appear to be the case in practice. To obtain a feel for whether or not the preceding statement is surprising, let us consider a simple probabilistic (Markov chain) model as to how the algorithm moves along the extreme points. Specifically, we will suppose that if at any time the algorithm is at the jth best extreme point then after the next pivot the resulting extreme point is equally likely to be any of the j − 1 best. Under this assumption, we show that the time to get from the Nth best to the best extreme point has approximately, for large N, a normal distribution with mean and variance equal to the logarithm (base e) of N.

Consider a Markov chain for which P_{11} = 1 and

P_{ij} = 1/(i − 1),   j = 1, ..., i − 1, i > 1

and let T_i denote the number of transitions needed to go from state i to state 1. A recursive formula for E[T_i] can be obtained by conditioning on the initial transition:

E[T_i] = 1 + [1/(i − 1)] Σ_{j=1}^{i−1} E[T_j]

Starting with E[T_1] = 0, we successively see that

E[T_2] = 1,
E[T_3] = 1 + 1/2,
E[T_4] = 1 + (1/3)(1 + 1 + 1/2) = 1 + 1/2 + 1/3

and it is not difficult to guess and then prove inductively that

E[T_i] = Σ_{j=1}^{i−1} 1/j

However, to obtain a more complete description of T_N, we will use the representation

T_N = Σ_{j=1}^{N−1} I_j


where

I_j = 1, if the process ever enters j; 0, otherwise

The importance of the preceding representation stems from the following:

Proposition 4.7 I_1, ..., I_{N−1} are independent and

P{I_j = 1} = 1/j,   1 ≤ j ≤ N − 1

Proof. Given I_{j+1}, ..., I_N, let n = min{i: i > j, I_i = 1} denote the lowest numbered state, greater than j, that is entered. Thus we know that the process enters state n and the next state entered is one of the states 1, 2, ..., j. Hence, as the next state from state n is equally likely to be any of the lower numbered states 1, 2, ..., n − 1, we see that

P{I_j = 1 | I_{j+1}, ..., I_N} = [1/(n − 1)] / [j/(n − 1)] = 1/j

Hence, P{I_j = 1} = 1/j, and independence follows since the preceding conditional probability does not depend on I_{j+1}, ..., I_N. ■

Corollary 4.8
(i) E[T_N] = Σ_{j=1}^{N−1} 1/j.
(ii) Var(T_N) = Σ_{j=1}^{N−1} (1/j)(1 − 1/j).
(iii) For N large, T_N has approximately a normal distribution with mean log N and variance log N.

Proof. Parts (i) and (ii) follow from Proposition 4.7 and the representation T_N = Σ_{j=1}^{N−1} I_j. Part (iii) follows from the central limit theorem since

∫_1^N dx/x < Σ_{j=1}^{N−1} 1/j < 1 + ∫_1^{N−1} dx/x

or

log N < Σ_{j=1}^{N−1} 1/j < 1 + log(N − 1)
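Corollary 4.8 is easy to probe by simulation. The sketch below (the sizes N and trials are arbitrary) compares the sample mean and variance of T_N with the exact sums of parts (i) and (ii), both of which grow like log N:

import random

def sample_T(N):
    # Transitions to go from state N to state 1 when each transition
    # from state i is uniform over the i - 1 better states.
    state, steps = N, 0
    while state > 1:
        state = random.randint(1, state - 1)
        steps += 1
    return steps

N, trials = 10_000, 20_000
samples = [sample_T(N) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((t - mean) ** 2 for t in samples) / trials
print(mean, sum(1 / j for j in range(1, N)))                 # part (i)
print(var, sum((1 / j) * (1 - 1 / j) for j in range(1, N)))  # part (ii)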
If P_0 > 0, all other states are transient. This follows since P_{i0} = P_0^i, which implies that starting with i individuals there is a positive probability of at least P_0^i that no later generation will ever consist of i individuals. Moreover, since any finite set of transient states {1, 2, ..., n} will be visited only finitely often, this leads to the important conclusion that, if P_0 > 0, then the population will either die out or its size will converge to infinity. Let

µ = Σ_{j=0}^∞ j P_j

denote the mean number of offspring of a single individual, and let

σ² = Σ_{j=0}^∞ (j − µ)² P_j

be the variance of the number of offspring produced by a single individual. Let us suppose that X_0 = 1, that is, initially there is a single individual present. We calculate E[X_n] and Var(X_n) by first noting that we may write

X_n = Σ_{i=1}^{X_{n−1}} Z_i


where Z_i represents the number of offspring of the ith individual of the (n − 1)st generation. By conditioning on X_{n−1}, we obtain

E[X_n] = E[E[X_n | X_{n−1}]]
= E[ E[ Σ_{i=1}^{X_{n−1}} Z_i | X_{n−1} ] ]
= E[X_{n−1}µ]
= µE[X_{n−1}]

where we have used the fact that E[Z_i] = µ. Since E[X_0] = 1, the preceding yields

E[X_1] = µ,
E[X_2] = µE[X_1] = µ²,
...
E[X_n] = µE[X_{n−1}] = µ^n

Similarly, Var(X_n) may be obtained by using the conditional variance formula

Var(X_n) = E[Var(X_n | X_{n−1})] + Var(E[X_n | X_{n−1}])

Now, given X_{n−1}, X_n is just the sum of X_{n−1} independent random variables each having the distribution {P_j, j ≥ 0}. Hence,

E[X_n | X_{n−1}] = X_{n−1}µ,   Var(X_n | X_{n−1}) = X_{n−1}σ²

The conditional variance formula now yields

Var(X_n) = E[X_{n−1}σ²] + Var(X_{n−1}µ)
= σ²µ^{n−1} + µ²Var(X_{n−1})
= σ²µ^{n−1} + µ²(σ²µ^{n−2} + µ²Var(X_{n−2}))
= σ²(µ^{n−1} + µ^n) + µ⁴Var(X_{n−2})
= σ²(µ^{n−1} + µ^n) + µ⁴(σ²µ^{n−3} + µ²Var(X_{n−3}))
= σ²(µ^{n−1} + µ^n + µ^{n+1}) + µ⁶Var(X_{n−3})
= ...
= σ²(µ^{n−1} + µ^n + ... + µ^{2n−2}) + µ^{2n}Var(X_0)
= σ²(µ^{n−1} + µ^n + ... + µ^{2n−2})

Therefore,

Var(X_n) = σ²µ^{n−1}(1 − µ^n)/(1 − µ),   if µ ≠ 1
Var(X_n) = nσ²,   if µ = 1     (4.19)
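A simulation sketch checking E[X_n] = µ^n and Equation (4.19); the offspring distribution used here is illustrative (it happens to be the one in Example 4.33 below):

import random

P = [0.25, 0.25, 0.5]     # offspring pmf P_0, P_1, P_2
mu = sum(j * pj for j, pj in enumerate(P))
var = sum((j - mu) ** 2 * pj for j, pj in enumerate(P))

def generation_size(n):
    # Simulate X_n starting from X_0 = 1.
    x = 1
    for _ in range(n):
        x = sum(random.choices(range(len(P)), weights=P, k=x))
    return x

n, trials = 5, 50_000
samples = [generation_size(n) for _ in range(trials)]
m = sum(samples) / trials
v = sum((s - m) ** 2 for s in samples) / trials
print(m, mu ** n)                                          # ~ mu^n
print(v, var * mu ** (n - 1) * (1 - mu ** n) / (1 - mu))   # Equation (4.19)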


Let π_0 denote the probability that the population will eventually die out (under the assumption that X_0 = 1). More formally,

π_0 = lim_{n→∞} P{X_n = 0 | X_0 = 1}

The problem of determining the value of π_0 was first raised in connection with the extinction of family surnames by Galton in 1889. We first note that π_0 = 1 if µ < 1. This follows since

µ^n = E[X_n] = Σ_{j=1}^∞ j P{X_n = j} ≥ Σ_{j=1}^∞ 1 · P{X_n = j} = P{X_n ≥ 1}

Since µ^n → 0 when µ < 1, it follows that P{X_n ≥ 1} → 0, and hence P{X_n = 0} → 1. In fact, it can be shown that π_0 = 1 even when µ = 1. When µ > 1, it turns out that π_0 < 1, and an equation determining π_0 may be derived by conditioning on the number of offspring of the initial individual, as follows:

π_0 = P{population dies out} = Σ_{j=0}^∞ P{population dies out | X_1 = j} P_j

Now, given that X_1 = j, the population will eventually die out if and only if each of the j families started by the members of the first generation eventually dies out. Since each family is assumed to act independently, and since the probability that any particular family dies out is just π_0, this yields

P{population dies out | X_1 = j} = π_0^j

and thus π_0 satisfies

π_0 = Σ_{j=0}^∞ π_0^j P_j     (4.20)

In fact when µ > 1, it can be shown that π_0 is the smallest positive number satisfying Equation (4.20).

Example 4.32 If P_0 = 1/2, P_1 = 1/4, P_2 = 1/4, then determine π_0.

Solution: Since µ = 3/4 ≤ 1, it follows that π_0 = 1. ■

Example 4.33 If P_0 = 1/4, P_1 = 1/4, P_2 = 1/2, then determine π_0.


Solution: π_0 satisfies

π_0 = 1/4 + (1/4)π_0 + (1/2)π_0²

or

2π_0² − 3π_0 + 1 = 0

The smallest positive solution of this quadratic equation is π_0 = 1/2. ■

Example 4.34 In Examples 4.32 and 4.33, what is the probability that the population will die out if it initially consists of n individuals?

Solution: Since the population will die out if and only if the families of each of the members of the initial generation die out, the desired probability is π_0^n. For Example 4.32 this yields π_0^n = 1, and for Example 4.33, π_0^n = (1/2)^n. ■
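The smallest positive root of Equation (4.20) can be found by iterating π ← Σ_j π^j P_j starting from π = 0; the iterates increase monotonically to the smallest nonnegative solution. A sketch:

def extinction_prob(P, iterations=200):
    # Smallest positive solution of pi = sum_j pi**j * P[j], Equation (4.20),
    # found by fixed-point iteration from 0.
    pi = 0.0
    for _ in range(iterations):
        pi = sum(pj * pi**j for j, pj in enumerate(P))
    return pi

print(extinction_prob([0.25, 0.25, 0.5]))  # Example 4.33: 1/2
print(extinction_prob([0.5, 0.25, 0.25]))  # Example 4.32: 1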

4.8 Time Reversible Markov Chains

Consider a stationary ergodic Markov chain (that is, an ergodic Markov chain that has been in operation for a long time) having transition probabilities P_{ij} and stationary probabilities π_i, and suppose that starting at some time we trace the sequence of states going backward in time. That is, starting at time n, consider the sequence of states X_n, X_{n−1}, X_{n−2}, .... It turns out that this sequence of states is itself a Markov chain with transition probabilities Q_{ij} defined by

Q_{ij} = P{X_m = j | X_{m+1} = i}
= P{X_m = j, X_{m+1} = i} / P{X_{m+1} = i}
= P{X_m = j} P{X_{m+1} = i | X_m = j} / P{X_{m+1} = i}
= π_j P_{ji} / π_i

To prove that the reversed process is indeed a Markov chain, we must verify that

P{X_m = j | X_{m+1} = i, X_{m+2}, X_{m+3}, ...} = P{X_m = j | X_{m+1} = i}

To see that this is so, suppose that the present time is m + 1. Now, since X_0, X_1, X_2, ... is a Markov chain, it follows that the conditional distribution of the future X_{m+2}, X_{m+3}, ... given the present state X_{m+1} is independent of the past state X_m. However, independence is a symmetric relationship (that is, if A is independent of B, then B is independent of A), and so this means that given X_{m+1}, X_m is independent of X_{m+2}, X_{m+3}, .... But this is exactly what we had to verify.


Thus, the reversed process is also a Markov chain with transition probabilities given by

Q_{ij} = π_j P_{ji} / π_i

If Q_{ij} = P_{ij} for all i, j, then the Markov chain is said to be time reversible. The condition for time reversibility, namely, Q_{ij} = P_{ij}, can also be expressed as

π_i P_{ij} = π_j P_{ji}   for all i, j     (4.21)

The condition in Equation (4.21) can be stated as follows: for all states i and j, the rate at which the process goes from i to j (namely, π_i P_{ij}) is equal to the rate at which it goes from j to i (namely, π_j P_{ji}). It is worth noting that this is an obvious necessary condition for time reversibility since a transition from i to j going backward in time is equivalent to a transition from j to i going forward in time; that is, if X_m = i and X_{m−1} = j, then a transition from i to j is observed if we are looking backward, and one from j to i if we are looking forward in time. Thus, the rate at which the forward process makes a transition from j to i is always equal to the rate at which the reverse process makes a transition from i to j; if time reversible, this must equal the rate at which the forward process makes a transition from i to j. If we can find nonnegative numbers, summing to one, that satisfy Equation (4.21), then it follows that the Markov chain is time reversible and the numbers represent the limiting probabilities. This is so since if

x_i P_{ij} = x_j P_{ji} for all i, j,   Σ_i x_i = 1     (4.22)

then summing over i yields

Σ_i x_i P_{ij} = x_j Σ_i P_{ji} = x_j,   Σ_i x_i = 1

and, because the limiting probabilities π_i are the unique solution of the preceding, it follows that x_i = π_i for all i.

Example 4.35 Consider a random walk with states 0, 1, ..., M and transition probabilities

P_{i,i+1} = α_i = 1 − P_{i,i−1},   i = 1, ..., M − 1,
P_{0,1} = α_0 = 1 − P_{0,0},
P_{M,M} = α_M = 1 − P_{M,M−1}

Without the need for any computations, it is possible to argue that this Markov chain, which can only make transitions from a state to one of its two nearest neighbors, is time reversible. This follows by noting that the number of transitions from i to i + 1 must at all times be within 1 of the number from i + 1 to i. This is so because between any two transitions from i to i + 1 there must be one from i + 1 to i (and conversely)


since the only way to reenter i from a higher state is via state i + 1. Hence, it follows that the rate of transitions from i to i + 1 equals the rate from i + 1 to i, and so the process is time reversible. We can easily obtain the limiting probabilities by equating for each state i = 0, 1, ..., M − 1 the rate at which the process goes from i to i + 1 with the rate at which it goes from i + 1 to i. This yields

π_0 α_0 = π_1(1 − α_1),
π_1 α_1 = π_2(1 − α_2),
...
π_i α_i = π_{i+1}(1 − α_{i+1}),   i = 0, 1, ..., M − 1

Solving in terms of π_0 yields

π_1 = [α_0/(1 − α_1)] π_0,
π_2 = [α_1/(1 − α_2)] π_1 = [α_1 α_0 / ((1 − α_2)(1 − α_1))] π_0

and, in general,

π_i = [α_{i−1} ··· α_0 / ((1 − α_i) ··· (1 − α_1))] π_0,   i = 1, 2, ..., M

Since Σ_{i=0}^{M} π_i = 1, we obtain

π_0 [1 + Σ_{j=1}^{M} α_{j−1} ··· α_0 / ((1 − α_j) ··· (1 − α_1))] = 1

or

π_0 = [1 + Σ_{j=1}^{M} α_{j−1} ··· α_0 / ((1 − α_j) ··· (1 − α_1))]^{−1}     (4.23)

and

π_i = [α_{i−1} ··· α_0 / ((1 − α_i) ··· (1 − α_1))] π_0,   i = 1, ..., M     (4.24)

For instance, if α_i ≡ α, then

π_0 = [1 + Σ_{j=1}^{M} (α/(1 − α))^j]^{−1} = (1 − β)/(1 − β^{M+1})


and, in general,

π_i = β^i (1 − β)/(1 − β^{M+1}),   i = 0, 1, ..., M

where β = α/(1 − α). ■

Another special case of Example 4.35 is the following urn model, proposed by the physicists P. and T. Ehrenfest to describe the movements of molecules. Suppose that M molecules are distributed among two urns; and at each time point one of the molecules is chosen at random, removed from its urn, and placed in the other one. The number of molecules in urn I is a special case of the Markov chain of Example 4.35 having

α_i = (M − i)/M,   i = 0, 1, ..., M

Hence, using Equations (4.23) and (4.24) the limiting probabilities in this case are

π_0 = [1 + Σ_{j=1}^{M} M(M − 1) ··· (M − j + 1) / (j(j − 1) ··· 1)]^{−1}
= [Σ_{j=0}^{M} C(M, j)]^{−1}
= (1/2)^M

where we have used the identity

1 = (1/2 + 1/2)^M = Σ_{j=0}^{M} C(M, j)(1/2)^M

Hence, from Equation (4.24),

π_i = C(M, i)(1/2)^M,   i = 0, 1, ..., M

Because the preceding are just the binomial probabilities, it follows that in the long run, the positions of each of the M balls are independent and each one is equally likely to be in either urn. This, however, is quite intuitive, for if we focus on any one ball, it becomes quite clear that its position will be independent of the positions of the other balls (since no matter where the other M − 1 balls are, the ball under consideration at each stage will be moved with probability 1/M) and by symmetry, it is equally likely to be in either urn.
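Both the binomial stationary distribution and the time reversibility equations can be confirmed mechanically; a numpy sketch with an arbitrary choice of M:

import math
import numpy as np

M = 6
P = np.zeros((M + 1, M + 1))
for i in range(M + 1):
    if i < M:
        P[i, i + 1] = (M - i) / M   # a molecule moves into urn I
    if i > 0:
        P[i, i - 1] = i / M         # a molecule moves out of urn I

pi = np.array([math.comb(M, i) for i in range(M + 1)]) / 2**M
print(np.allclose(pi @ P, pi))      # binomial(M, 1/2) is stationary
print(all(np.isclose(pi[i] * P[i, i + 1], pi[i + 1] * P[i + 1, i])
          for i in range(M)))       # pi_i P_{i,i+1} = pi_{i+1} P_{i+1,i}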


Figure 4.1 A connected graph with arc weights.

Example 4.36 Consider an arbitrary connected graph (see Section 3.6 for definitions) having a number w_{ij} associated with arc (i, j) for each arc. One instance of such a graph is given by Figure 4.1. Now consider a particle moving from node to node in this manner: If at any time the particle resides at node i, then it will next move to node j with probability P_{ij} where

P_{ij} = w_{ij} / Σ_j w_{ij}

and where w_{ij} is 0 if (i, j) is not an arc. For instance, for the graph of Figure 4.1, P_{12} = 3/(3 + 1 + 2) = 1/2. The time reversibility equations π_i P_{ij} = π_j P_{ji} reduce to

π_i w_{ij} / Σ_j w_{ij} = π_j w_{ji} / Σ_i w_{ji}

or, equivalently, since w_{ij} = w_{ji},

π_i / Σ_j w_{ij} = π_j / Σ_i w_{ji}

which is equivalent to

π_i / Σ_j w_{ij} = c

or

π_i = c Σ_j w_{ij}

or, since Σ_i π_i = 1,

π_i = Σ_j w_{ij} / Σ_i Σ_j w_{ij}


Because the π_i's given by this equation satisfy the time reversibility equations, it follows that the process is time reversible with these limiting probabilities. For the graph of Figure 4.1 we have that

π_1 = 6/32,   π_2 = 3/32,   π_3 = 6/32,   π_4 = 5/32,   π_5 = 12/32

"

If we try to solve Equation (4.22) for an arbitrary Markov chain with states 0, 1, ..., M, it will usually turn out that no solution exists. For example, from Equation (4.22),

x_i P_{ij} = x_j P_{ji},
x_k P_{kj} = x_j P_{jk}

implying (if P_{ij} P_{jk} > 0) that

x_i = x_k P_{ji} P_{kj} / (P_{ij} P_{jk})

which in general need not equal P_{ki}/P_{ik}. Thus, we see that a necessary condition for time reversibility is that

P_{ik} P_{kj} P_{ji} = P_{ij} P_{jk} P_{ki}   for all i, j, k     (4.25)

which is equivalent to the statement that, starting in state i, the path i → k → j → i has the same probability as the reversed path i → j → k → i. To understand the necessity of this, note that time reversibility implies that the rate at which a sequence of transitions from i to k to j to i occurs must equal the rate of ones from i to j to k to i (why?), and so we must have

π_i P_{ik} P_{kj} P_{ji} = π_i P_{ij} P_{jk} P_{ki}

implying Equation (4.25) when π_i > 0. In fact, we can show the following.

Theorem 4.2 An ergodic Markov chain for which P_{ij} = 0 whenever P_{ji} = 0 is time reversible if and only if starting in state i, any path back to i has the same probability as the reversed path. That is, if

P_{i,i_1} P_{i_1,i_2} ··· P_{i_k,i} = P_{i,i_k} P_{i_k,i_{k−1}} ··· P_{i_1,i}     (4.26)

for all states i, i_1, ..., i_k.

Proof. We have already proven necessity. To prove sufficiency, fix states i and j and rewrite (4.26) as

P_{i,i_1} P_{i_1,i_2} ··· P_{i_k,j} P_{j,i} = P_{i,j} P_{j,i_k} ··· P_{i_1,i}

Summing the preceding over all states i_1, ..., i_k yields

P_{i,j}^{k+1} P_{j,i} = P_{i,j} P_{j,i}^{k+1}

Letting k → ∞ yields

π_j P_{j,i} = P_{i,j} π_i

which proves the theorem.

"


Example 4.37 Suppose we are given a set of n elements, numbered 1 through n, which are to be arranged in some ordered list. At each unit of time a request is made to retrieve one of these elements, element i being requested (independently of the past) with probability P_i. After being requested, the element then is put back but not necessarily in the same position. In fact, let us suppose that the element requested is moved one closer to the front of the list; for instance, if the present list ordering is 1, 3, 4, 2, 5 and element 2 is requested, then the new ordering becomes 1, 3, 2, 4, 5. We are interested in the long-run average position of the element requested.

For any given probability vector P = (P_1, ..., P_n), the preceding can be modeled as a Markov chain with n! states, with the state at any time being the list order at that time. We shall show that this Markov chain is time reversible and then use this to show that the average position of the element requested when this one-closer rule is in effect is less than when the rule of always moving the requested element to the front of the line is used. The time reversibility of the resulting Markov chain when the one-closer reordering rule is in effect easily follows from Theorem 4.2. For instance, suppose n = 3 and consider the following path from state (1, 2, 3) to itself:

(1, 2, 3) → (2, 1, 3) → (2, 3, 1) → (3, 2, 1) → (3, 1, 2) → (1, 3, 2) → (1, 2, 3)

The product of the transition probabilities in the forward direction is

P_2 P_3 P_3 P_1 P_1 P_2 = P_1² P_2² P_3²

whereas in the reverse direction, it is

P_3 P_3 P_2 P_2 P_1 P_1 = P_1² P_2² P_3²

Because the general result follows in much the same manner, the Markov chain is indeed time reversible. (For a formal argument note that if f_i denotes the number of times element i moves forward in the path, then as the path goes from a fixed state back to itself, it follows that element i will also move backward f_i times. Therefore, since the backward moves of element i are precisely the times that it moves forward in the reverse path, it follows that the product of the transition probabilities for both the path and its reversal will equal

Π_i P_i^{f_i + r_i}

where r_i is equal to the number of times that element i is in the first position and the path (or the reverse path) does not change states.)

For any permutation i_1, i_2, ..., i_n of 1, 2, ..., n, let π(i_1, i_2, ..., i_n) denote the limiting probability under the one-closer rule. By time reversibility we have

P_{i_{j+1}} π(i_1, ..., i_j, i_{j+1}, ..., i_n) = P_{i_j} π(i_1, ..., i_{j+1}, i_j, ..., i_n)     (4.27)

for all permutations.


Now the average position of the element requested can be expressed (as in Section 3.6.1) as

Average position = Σ_i P_i E[Position of element i]
= Σ_i P_i [1 + Σ_{j≠i} P{element j precedes element i}]
= 1 + Σ_i Σ_{j≠i} P_i P{e_j precedes e_i}
= 1 + Σ_{i<j} [P_i P{e_j precedes e_i} + P_j P{e_i precedes e_j}]
= 1 + Σ_{i<j} [P_i P{e_j precedes e_i} + P_j(1 − P{e_j precedes e_i})]
= 1 + Σ_{i<j} (P_i − P_j) P{e_j precedes e_i} + Σ_{i<j} P_j

Hence, to minimize the average position of the element requested, we would want to make P{e_j precedes e_i} as large as possible when P_j > P_i and as small as possible when P_i > P_j. Under the front-of-the-line rule we showed in Section 3.6.1,

P{e_j precedes e_i} = P_j/(P_j + P_i)

(since under the front-of-the-line rule element j will precede element i if and only if the last request for either i or j was for j). Therefore, to show that the one-closer rule is better than the front-of-the-line rule, it suffices to show that under the one-closer rule

P{e_j precedes e_i} > P_j/(P_j + P_i)   when P_j > P_i

Now consider any state where element i precedes element j, say, (..., i, i_1, ..., i_k, j, ...). By successive transpositions using Equation (4.27), we have

π(..., i, i_1, ..., i_k, j, ...) = (P_i/P_j)^{k+1} π(..., j, i_1, ..., i_k, i, ...)     (4.28)

For instance,

π(1, 2, 3) = (P_2/P_3) π(1, 3, 2)
= (P_2/P_3)(P_1/P_3) π(3, 1, 2)
= (P_2/P_3)(P_1/P_3)(P_1/P_2) π(3, 2, 1) = (P_1/P_3)² π(3, 2, 1)


Now when P_j > P_i, Equation (4.28) implies that

π(..., i, i_1, ..., i_k, j, ...) < π(..., j, i_1, ..., i_k, i, ...)
0 for some i.)

Example 4.39 Suppose that we want to generate a uniformly distributed element in S, the set of all permutations (x_1, ..., x_n) of the numbers (1, ..., n) for which Σ_{j=1}^{n} j x_j > a for a given constant a. To utilize the Hastings–Metropolis algorithm we need to define an irreducible Markov transition probability matrix on the state space S. To accomplish this, we first define a concept of "neighboring" elements of S, and then construct a graph whose vertex set is S. We start by putting an arc between each pair of neighboring elements in S, where any two permutations in S are said


to be neighbors if one results from an interchange of two of the positions of the other. That is, (1, 2, 3, 4) and (1, 2, 4, 3) are neighbors whereas (1, 2, 3, 4) and (1, 3, 4, 2) are not. Now, define the q transition probability function as follows. With N(s) defined as the set of neighbors of s, and |N(s)| equal to the number of elements in the set N(s), let

q(s, t) = 1/|N(s)|,   if t ∈ N(s)

That is, the candidate next state from s is equally likely to be any of its neighbors. Since the desired limiting probabilities of the Markov chain are π(s) = C, it follows that π(s) = π(t), and so

α(s, t) = min(|N(s)|/|N(t)|, 1)

That is, if the present state of the Markov chain is s then one of its neighbors is randomly chosen, say, t. If t is a state with fewer neighbors than s (in graph theory language, if the degree of vertex t is less than that of vertex s), then the next state is t. If not, a uniform (0,1) random number U is generated and the next state is t if U < |N(s)|/|N(t)| and is s otherwise. The limiting probabilities of this Markov chain are π(s) = 1/|S|, where |S| is the (unknown) number of permutations in S. ■
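A sketch of this sampler; the values of n and a are hypothetical, and the helper neighbors enumerates the transpositions of s that remain in S:

import random
from itertools import combinations

n, a = 5, 45   # illustrative size and threshold; S = {x : sum(j * x_j) > a}

def in_S(perm):
    return sum((j + 1) * x for j, x in enumerate(perm)) > a

def neighbors(perm):
    result = []
    for i, j in combinations(range(n), 2):
        t = list(perm)
        t[i], t[j] = t[j], t[i]
        if in_S(t):
            result.append(tuple(t))
    return result

state = tuple(range(1, n + 1))   # the identity permutation lies in S here
for _ in range(10_000):
    nbrs = neighbors(state)
    candidate = random.choice(nbrs)            # q(s, t) = 1/|N(s)|
    # Accept with probability min(|N(s)|/|N(t)|, 1).
    if random.random() < len(nbrs) / len(neighbors(candidate)):
        state = candidate
print(state)   # approximately uniform over S after many steps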

The most widely used version of the Hastings–Metropolis algorithm is the Gibbs sampler. Let X = (X_1, ..., X_n) be a discrete random vector with probability mass function p(x) that is only specified up to a multiplicative constant, and suppose that we want to generate a random vector whose distribution is that of X. That is, we want to generate a random vector having mass function

p(x) = Cg(x)

where g(x) is known, but C is not. Utilization of the Gibbs sampler assumes that for any i and values x_j, j ≠ i, we can generate a random variable X having the probability mass function

P{X = x} = P{X_i = x | X_j = x_j, j ≠ i}

It operates by using the Hastings–Metropolis algorithm on a Markov chain with states x = (x_1, ..., x_n), and with transition probabilities defined as follows. Whenever the present state is x, a coordinate that is equally likely to be any of 1, ..., n is chosen. If coordinate i is chosen, then a random variable X with probability mass function P{X = x} = P{X_i = x | X_j = x_j, j ≠ i} is generated. If X = x, then the state y = (x_1, ..., x_{i−1}, x, x_{i+1}, ..., x_n) is considered as the candidate next state. In other words, with x and y as given, the Gibbs sampler uses the Hastings–Metropolis algorithm with

q(x, y) = (1/n) P{X_i = x | X_j = x_j, j ≠ i} = p(y) / (n P{X_j = x_j, j ≠ i})


Because we want the limiting mass function to be p, we see from Equation (4.32) that the vector y is then accepted as the new state with probability

α(x, y) = min(p(y)q(y, x) / (p(x)q(x, y)), 1) = min(p(y)p(x) / (p(x)p(y)), 1) = 1

Hence, when utilizing the Gibbs sampler, the candidate state is always accepted as the next state of the chain.

Example 4.40 Suppose that we want to generate n uniformly distributed points in the circle of radius 1 centered at the origin, conditional on the event that no two points are within a distance d of each other, when the probability of this conditioning event is small. This can be accomplished by using the Gibbs sampler as follows. Start with any n points x_1, ..., x_n in the circle that have the property that no two of them are within d of the other; then generate the value of I, equally likely to be any of the values 1, ..., n. Then continually generate a random point in the circle until you obtain one that is not within d of any of the other n − 1 points excluding x_I. At this point, replace x_I by the generated point and then repeat the operation. After a large number of iterations of this algorithm, the set of n points will approximately have the desired distribution. ■

Example 4.41 Let X_i, i = 1, ..., n, be independent exponential random variables with respective rates λ_i, i = 1, ..., n. Let S = Σ_{i=1}^{n} X_i, and suppose that we want to generate the random vector X = (X_1, ..., X_n), conditional on the event that S > c for some large positive constant c. That is, we want to generate the value of a random vector whose density function is

f(x_1, ..., x_n) = (1/P{S > c}) Π_{i=1}^{n} λ_i e^{−λ_i x_i},   x_i ≥ 0,  Σ_{i=1}^{n} x_i > c

This is easily accomplished by starting with an initial vector x = (x_1, ..., x_n) satisfying x_i > 0, i = 1, ..., n, and Σ_{i=1}^{n} x_i > c. Then generate a random variable I that is equally likely to be any of 1, ..., n. Next, generate an exponential random variable X with rate λ_I conditional on the event that X + Σ_{j≠I} x_j > c. This latter step, which calls for generating the value of an exponential random variable given that it exceeds c − Σ_{j≠I} x_j, is easily accomplished by using the fact that an exponential conditioned to be greater than a positive constant is distributed as the constant plus the exponential. Consequently, to obtain X, first generate an exponential random variable Y with rate λ_I, and then set

X = Y + (c − Σ_{j≠I} x_j)⁺

where a⁺ = max(a, 0).


The value of x_I should then be reset as X and a new iteration of the algorithm begun. ■

Remark As can be seen by Examples 4.40 and 4.41, although the theory for the Gibbs sampler was represented under the assumption that the distribution to be generated was discrete, it also holds when this distribution is continuous.
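A sketch of the Gibbs sampler of Example 4.41; the rates and the constant c are illustrative choices, not values from the text:

import random

lam = [1.0, 2.0, 3.0]   # hypothetical exponential rates lambda_i
c = 10.0                # conditioning constant: S > c
n = len(lam)

x = [c, c, c]           # any start with all x_i > 0 and sum(x) > c
sums = []
for step in range(100_000):
    I = random.randrange(n)               # coordinate to resample
    rest = sum(x[j] for j in range(n) if j != I)
    y = random.expovariate(lam[I])        # Exp(lambda_I)
    x[I] = y + max(c - rest, 0.0)         # memoryless shift: (c - rest)+ plus Exp
    sums.append(sum(x))
print(min(sums))         # every visited vector satisfies S > c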

4.10 Markov Decision Processes

Consider a process that is observed at discrete time points to be in any one of M possible states, which we number by 1, 2, ..., M. After observing the state of the process, an action must be chosen, and we let A, assumed finite, denote the set of all possible actions. If the process is in state i at time n and action a is chosen, then the next state of the system is determined according to the transition probabilities P_{ij}(a). If we let X_n denote the state of the process at time n and a_n the action chosen at time n, then the preceding is equivalent to stating that

P{X_{n+1} = j | X_0, a_0, X_1, a_1, ..., X_n = i, a_n = a} = P_{ij}(a)

Thus, the transition probabilities are functions only of the present state and the subsequent action. By a policy, we mean a rule for choosing actions. We shall restrict ourselves to policies that are of the form that the action they prescribe at any time depends only on the state of the process at that time (and not on any information concerning prior states and actions). However, we shall allow the policy to be "randomized" in that its instructions may be to choose actions according to a probability distribution. In other words, a policy β is a set of numbers β = {β_i(a), a ∈ A, i = 1, ..., M} with the interpretation that if the process is in state i, then action a is to be chosen with probability β_i(a). Of course, we need have

0 ≤ β_i(a) ≤ 1, for all i, a,
Σ_a β_i(a) = 1, for all i

Under any given policy β, the sequence of states {X_n, n = 0, 1, ...} constitutes a Markov chain with transition probabilities P_{ij}(β) given by

P_{ij}(β) = P_β{X_{n+1} = j | X_n = i}*
= Σ_a P_{ij}(a) β_i(a)

where the last equality follows by conditioning on the action chosen when in state i. Let us suppose that for every choice of a policy β, the resultant Markov chain {X_n, n = 0, 1, ...} is ergodic.

* We use the notation P_β to signify that the probability is conditional on the fact that policy β is used.


For any policy β, let π_{ia} denote the limiting (or steady-state) probability that the process will be in state i and action a will be chosen if policy β is employed. That is,

π_{ia} = lim_{n→∞} P_β{X_n = i, a_n = a}

The vector π = (π_{ia}) must satisfy

(i) π_{ia} ≥ 0 for all i, a,
(ii) Σ_i Σ_a π_{ia} = 1,
(iii) Σ_a π_{ja} = Σ_i Σ_a π_{ia} P_{ij}(a) for all j     (4.33)

Equations (i) and (ii) are obvious, and Equation (iii), which is an analogue of Theorem 4.1, follows as the left-hand side equals the steady-state probability of being in state j and the right-hand side is the same probability computed by conditioning on the state and action chosen one stage earlier. Thus for any policy β, there is a vector π = (π_{ia}) that satisfies (i)–(iii) and with the interpretation that π_{ia} is equal to the steady-state probability of being in state i and choosing action a when policy β is employed. Moreover, it turns out that the reverse is also true. Namely, for any vector π = (π_{ia}) that satisfies (i)–(iii), there exists a policy β such that if β is used, then the steady-state probability of being in i and choosing action a equals π_{ia}. To verify this last statement, suppose that π = (π_{ia}) is a vector that satisfies (i)–(iii). Then, let the policy β = (β_i(a)) be

β_i(a) = P{β chooses a | state is i} = π_{ia} / Σ_a π_{ia}

Now let P_{ia} denote the limiting probability of being in i and choosing a when policy β is employed. We need to show that P_{ia} = π_{ia}. To do so, first note that {P_{ia}, i = 1, ..., M, a ∈ A} are the limiting probabilities of the two-dimensional Markov chain {(X_n, a_n), n ≥ 0}. Hence, by the fundamental Theorem 4.1, they are the unique solution of

(i′) P_{ia} ≥ 0,
(ii′) Σ_i Σ_a P_{ia} = 1,
(iii′) P_{ja} = Σ_i Σ_{a′} P_{ia′} P_{ij}(a′) β_j(a)

where (iii′ ) follows since

P{X_{n+1} = j, a_{n+1} = a | X_n = i, a_n = a′} = P_{ij}(a′) β_j(a)

Because

β_j(a) = π_{ja} / Σ_a π_{ja}


we see that (P_{ia}) is the unique solution of

P_{ia} ≥ 0,
Σ_a Σ_i P_{ia} = 1,
P_{ja} = Σ_i Σ_{a′} P_{ia′} P_{ij}(a′) [π_{ja} / Σ_a π_{ja}]

Hence, to show that P_{ia} = π_{ia}, we need show that

π_{ia} ≥ 0,
Σ_a Σ_i π_{ia} = 1,
π_{ja} = Σ_i Σ_{a′} π_{ia′} P_{ij}(a′) [π_{ja} / Σ_a π_{ja}]

The top two equations follow from (i) and (ii) of Equation (4.33), and the third, which is equivalent to

Σ_a π_{ja} = Σ_i Σ_{a′} π_{ia′} P_{ij}(a′)

follows from condition (iii) of Equation (4.33). Thus we have shown that a vector π = (π_{ia}) will satisfy (i), (ii), and (iii) of Equation (4.33) if and only if there exists a policy β such that π_{ia} is equal to the steady-state probability of being in state i and choosing action a when β is used. In fact, the policy β is defined by β_i(a) = π_{ia} / Σ_a π_{ia}.

The preceding is quite important in the determination of "optimal" policies. For instance, suppose that a reward R(i, a) is earned whenever action a is chosen in state i. Since R(X_i, a_i) would then represent the reward earned at time i, the expected average reward per unit time under policy β can be expressed as

expected average reward under β = lim_{n→∞} E_β[ Σ_{i=1}^{n} R(X_i, a_i) / n ]

Now, if π_{ia} denotes the steady-state probability of being in state i and choosing action a, it follows that the limiting expected reward at time n equals

lim_{n→∞} E[R(X_n, a_n)] = Σ_i Σ_a π_{ia} R(i, a)

which implies that

expected average reward under β = Σ_i Σ_a π_{ia} R(i, a)


Hence, the problem of determining the policy that maximizes the expected average reward is

maximize_{π=(π_{ia})} Σ_i Σ_a π_{ia} R(i, a)
subject to π_{ia} ≥ 0, for all i, a,
           Σ_i Σ_a π_{ia} = 1,
           Σ_a π_{ja} = Σ_i Σ_a π_{ia} P_{ij}(a), for all j     (4.34)

However, the preceding maximization problem is a special case of what is known as a linear program and can be solved by a standard linear programming algorithm known as the simplex algorithm.* If π* = (π*_{ia}) maximizes the preceding, then the optimal policy will be given by β*, where

β*_i(a) = π*_{ia} / Σ_a π*_{ia}
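The program (4.34) can be handed directly to an LP solver. The sketch below uses scipy's linprog on invented two-state, two-action data: the arrays P[i][a][j] and R[i][a] are hypothetical, and linprog minimizes, so the rewards are negated.

import numpy as np
from scipy.optimize import linprog

M, A = 2, 2
P = np.array([[[0.9, 0.1], [0.4, 0.6]],   # P[i][a][j], hypothetical
              [[0.2, 0.8], [0.7, 0.3]]])
R = np.array([[1.0, 0.5],                 # R[i][a], hypothetical
              [0.0, 2.0]])

c = -R.flatten()                          # variable pi_{ia} at index i*A + a
A_eq = [np.ones(M * A)]                   # sum_{i,a} pi_{ia} = 1
b_eq = [1.0]
for j in range(M):                        # balance equations of (4.34)
    row = np.zeros(M * A)
    for i in range(M):
        for a in range(A):
            row[i * A + a] += P[i, a, j]
            if i == j:
                row[i * A + a] -= 1.0
    A_eq.append(row)
    b_eq.append(0.0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
pi = res.x.reshape(M, A)
beta = pi / np.maximum(pi.sum(axis=1, keepdims=True), 1e-12)
print(pi, beta, -res.fun)   # occupation measure, policy, optimal average reward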

Remarks

(i) It can be shown that there is a π* maximizing Equation (4.34) that has the property that for each i, π*_{ia} is zero for all but one value of a, which implies that the optimal policy is nonrandomized. That is, the action it prescribes when in state i is a deterministic function of i.
(ii) The linear programming formulation also often works when there are restrictions placed on the class of allowable policies. For instance, suppose there is a restriction on the fraction of time the process spends in some state, say, state 1. Specifically, suppose that we are allowed to consider only policies having the property that their use results in the process being in state 1 less than 100α percent of time. To determine the optimal policy subject to this requirement, we add to the linear programming problem the additional constraint

Σ_a π_{1a} ≤ α

since Σ_a π_{1a} represents the proportion of time that the process is in state 1.

* It is called a linear program since the objective function Σ_i Σ_a R(i, a)π_{ia} and the constraints are all linear functions of the π_{ia}. For a heuristic analysis of the simplex algorithm, see Section 4.5.2.

4.11 Hidden Markov Chains

Let {X_n, n = 1, 2, ...} be a Markov chain with transition probabilities P_{i,j} and initial state probabilities p_i = P{X_1 = i}, i ≥ 0. Suppose that there is a finite set S


of signals, and that a signal from S is emitted each time the Markov chain enters a state. Further, suppose that when the Markov chain enters state j then, independently of previous Markov chain states and signals, the signal emitted is s with probability p(s|j), where Σ_{s∈S} p(s|j) = 1. That is, if S_n represents the nth signal emitted, then

P{S_1 = s | X_1 = j} = p(s|j),

P{Sn = s|X 1 , S1 , . . . , X n−1 , Sn−1 , X n = j} = p(s| j)

A model of the preceding type in which the sequence of signals S1 , S2 , . . . is observed, while the sequence of underlying Markov chain states X 1 , X 2 , . . . is unobserved, is called a hidden Markov chain model. Example 4.42 Consider a production process that in each period is either in a good state (state 1) or in a poor state (state 2). If the process is in state 1 during a period then, independent of the past, with probability 0.9 it will be in state 1 during the next period and with probability 0.1 it will be in state 2. Once in state 2, it remains in that state forever. Suppose that a single item is produced each period and that each item produced when the process is in state 1 is of acceptable quality with probability 0.99, while each item produced when the process is in state 2 is of acceptable quality with probability 0.96. If the status, either acceptable or unacceptable, of each successive item is observed, while the process states are unobservable, then the preceding is a hidden Markov chain model. The signal is the status of the item produced, and has value either a or u, depending on whether the item is acceptable or unacceptable. The signal probabilities are p(u|1) = 0.01, p(u|2) = 0.04,

p(a|1) = 0.99, p(a|2) = 0.96

while the transition probabilities of the underlying Markov chain are P1,1 = 0.9 = 1 − P1,2 ,

P2,2 = 1

"

Although {S_n, n ≥ 1} is not a Markov chain, it should be noted that, conditional on the current state X_n, the sequence S_n, X_{n+1}, S_{n+1}, ... of future signals and states is independent of the sequence X_1, S_1, ..., X_{n−1}, S_{n−1} of past states and signals.

Let S_n = (S_1, ..., S_n) be the random vector of the first n signals. For a fixed sequence of signals s_1, ..., s_n, let s_k = (s_1, ..., s_k), k ≤ n. To begin, let us determine the conditional probability of the Markov chain state at time n given that S_n = s_n. To obtain this probability, let

F_n(j) = P{S_n = s_n, X_n = j}

and note that

P{X_n = j | S_n = s_n} = P{S_n = s_n, X_n = j} / P{S_n = s_n} = F_n(j) / Σ_i F_n(i)


Now,

F_n(j) = P{S_{n−1} = s_{n−1}, S_n = s_n, X_n = j}
= Σ_i P{S_{n−1} = s_{n−1}, X_{n−1} = i, X_n = j, S_n = s_n}
= Σ_i F_{n−1}(i) P{X_n = j, S_n = s_n | S_{n−1} = s_{n−1}, X_{n−1} = i}
= Σ_i F_{n−1}(i) P{X_n = j, S_n = s_n | X_{n−1} = i}
= Σ_i F_{n−1}(i) P_{i,j} p(s_n|j)
= p(s_n|j) Σ_i F_{n−1}(i) P_{i,j}     (4.35)

where the preceding used that

P{X_n = j, S_n = s_n | X_{n−1} = i} = P{X_n = j | X_{n−1} = i} P{S_n = s_n | X_n = j, X_{n−1} = i}
= P_{i,j} P{S_n = s_n | X_n = j}
= P_{i,j} p(s_n|j)

Starting with

F_1(i) = P{X_1 = i, S_1 = s_1} = p_i p(s_1|i)

we can use Equation (4.35) to recursively determine the functions F_2(i), F_3(i), ..., up to F_n(i).

Example 4.43 Suppose in Example 4.42 that P{X_1 = 1} = 0.8. It is given that the successive conditions of the first three items produced are a, u, a.

(a) What is the probability that the process was in its good state when the third item was produced?
(b) What is the probability that X_4 is 1?
(c) What is the probability that the next item produced is acceptable?

Solution: With s_3 = (a, u, a), we have

F_1(1) = (0.8)(0.99) = 0.792,   F_1(2) = (0.2)(0.96) = 0.192
F_2(1) = 0.01[0.792(0.9) + 0.192(0)] = 0.007128,
F_2(2) = 0.04[0.792(0.1) + 0.192(1)] = 0.010848
F_3(1) = 0.99[(0.007128)(0.9)] ≈ 0.006351,
F_3(2) = 0.96[(0.007128)(0.1) + 0.010848] ≈ 0.011098


Therefore, the answer to part (a) is

P{X_3 = 1 | s_3} ≈ 0.006351 / (0.006351 + 0.011098) ≈ 0.364

To compute P{X_4 = 1 | s_3}, condition on X_3 to obtain

P{X_4 = 1 | s_3} = P{X_4 = 1 | X_3 = 1, s_3} P{X_3 = 1 | s_3} + P{X_4 = 1 | X_3 = 2, s_3} P{X_3 = 2 | s_3}
= 0.364 P_{1,1} + 0.636 P_{2,1}
= 0.3276

To compute P{S_4 = a | s_3}, condition on X_4 to obtain

P{S_4 = a | s_3} = P{S_4 = a | X_4 = 1} P{X_4 = 1 | s_3} + P{S_4 = a | X_4 = 2} P{X_4 = 2 | s_3}
= (0.99)(0.3276) + (0.96)(1 − 0.3276)
= 0.9698

"

To compute P{S_n = s_n}, use the identity P{S_n = s_n} = Σ_i F_n(i) along with Equation (4.35). If there are N states of the Markov chain, this requires computing nN quantities F_n(i), with each computation requiring a summation over N terms. This can be compared with a computation of P{S_n = s_n} based on conditioning on the first n states of the Markov chain to obtain

P{S_n = s_n} = Σ_{i_1,...,i_n} P{S_n = s_n | X_1 = i_1, ..., X_n = i_n} P{X_1 = i_1, ..., X_n = i_n}
= Σ_{i_1,...,i_n} p(s_1|i_1) ··· p(s_n|i_n) p_{i_1} P_{i_1,i_2} P_{i_2,i_3} ··· P_{i_{n−1},i_n}

The use of the preceding identity to compute P{S_n = s_n} would thus require a summation over N^n terms, with each term being a product of 2n values, indicating that it is not competitive with the previous approach. The computation of P{S_n = s_n} by recursively determining the functions F_k(i) is known as the forward approach. There also is a backward approach, which is based on the quantities B_k(i), defined by

B_k(i) = P{S_{k+1} = s_{k+1}, ..., S_n = s_n | X_k = i}


A recursive formula for B_k(i) can be obtained by conditioning on X_{k+1}:

B_k(i) = Σ_j P{S_{k+1} = s_{k+1}, ..., S_n = s_n | X_k = i, X_{k+1} = j} P{X_{k+1} = j | X_k = i}
= Σ_j P{S_{k+1} = s_{k+1}, ..., S_n = s_n | X_{k+1} = j} P_{i,j}
= Σ_j P{S_{k+1} = s_{k+1} | X_{k+1} = j} P{S_{k+2} = s_{k+2}, ..., S_n = s_n | S_{k+1} = s_{k+1}, X_{k+1} = j} P_{i,j}
= Σ_j p(s_{k+1}|j) P{S_{k+2} = s_{k+2}, ..., S_n = s_n | X_{k+1} = j} P_{i,j}
= Σ_j p(s_{k+1}|j) B_{k+1}(j) P_{i,j}     (4.36)

Starting with

B_{n−1}(i) = P{S_n = s_n | X_{n−1} = i} = Σ_j P_{i,j} p(s_n|j)

we would then use Equation (4.36) to determine the function B_{n−2}(i), then B_{n−3}(i), and so on, down to B_1(i). This would then yield P{S_n = s_n} via

P{S_n = s_n} = Σ_i P{S_1 = s_1, ..., S_n = s_n | X_1 = i} p_i
= Σ_i P{S_1 = s_1 | X_1 = i} P{S_2 = s_2, ..., S_n = s_n | S_1 = s_1, X_1 = i} p_i
= Σ_i p(s_1|i) P{S_2 = s_2, ..., S_n = s_n | X_1 = i} p_i
= Σ_i p(s_1|i) B_1(i) p_i

Another approach to obtaining P{S_n = s_n} is to combine both the forward and backward approaches. Suppose that for some k we have computed both functions F_k(j) and B_k(j). Because

P{S_n = s_n, X_k = j} = P{S_k = s_k, X_k = j} P{S_{k+1} = s_{k+1}, ..., S_n = s_n | S_k = s_k, X_k = j}
= P{S_k = s_k, X_k = j} P{S_{k+1} = s_{k+1}, ..., S_n = s_n | X_k = j}
= F_k(j) B_k(j)

we see that

P{S_n = s_n} = Σ_j F_k(j) B_k(j)


The beauty of using the preceding identity to determine P{Sn = sn } is that we may simultaneously compute the sequence of forward functions, starting with F1 , as well as the sequence of backward functions, starting at Bn−1 . The parallel computations can then be stopped once we have computed both Fk and Bk for some k.
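To make the forward–backward combination concrete, here is a Python sketch (ours, not from the text) for the same two-state model. It computes the forward functions, the backward functions (taking Bn(i) ≡ 1, which starts the recursion one step before the text's Bn−1), and checks that Σj Fk(j)Bk(j) gives the same value of P{Sn = sn} for every k.

    P = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.0, 2: 1.0}}
    p = {1: {'a': 0.99, 'u': 0.01}, 2: {'a': 0.96, 'u': 0.04}}
    start = {1: 0.8, 2: 0.2}
    signals = ['a', 'u', 'a']
    states = (1, 2)

    # forward pass: F[k][i] is F_{k+1}(i) in the text's indexing
    F = [{i: start[i] * p[i][signals[0]] for i in states}]
    for s in signals[1:]:
        F.append({i: p[i][s] * sum(F[-1][j] * P[j][i] for j in states) for i in states})

    # backward pass: B[k][i] is B_{k+1}(i), with B_n(i) = 1
    B = [dict.fromkeys(states, 1.0)]
    for s in reversed(signals[1:]):
        B.insert(0, {i: sum(p[j][s] * B[0][j] * P[i][j] for j in states) for i in states})

    # every k yields the same value of P{S_n = s_n}
    for k in range(len(signals)):
        print(sum(F[k][i] * B[k][i] for i in states))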

4.11.1 Predicting the States

Suppose the first n observed signals are sn = (s1, . . . , sn), and that given this data we want to predict the first n states of the Markov chain. The best predictor depends on what we are trying to accomplish. If our objective is to maximize the expected number of states that are correctly predicted, then for each k = 1, . . . , n we need to compute P{Xk = j | Sn = sn} and then let the value of j that maximizes this quantity be the predictor of Xk. (That is, we take the mode of the conditional probability mass function of Xk, given the sequence of signals, as the predictor of Xk.) To do so, we must first compute this conditional probability mass function, which is accomplished as follows. For k ≤ n,

P{Xk = j | Sn = sn} = P{Sn = sn, Xk = j}/P{Sn = sn} = Fk(j)Bk(j) / Σj Fk(j)Bk(j)

Thus, given that Sn = sn, the optimal predictor of Xk is the value of j that maximizes Fk(j)Bk(j).

A different variant of the prediction problem arises when we regard the sequence of states as a single entity. In this situation, our objective is to choose the sequence of states whose conditional probability, given the sequence of signals, is maximal. For instance, in signal processing, while X1, . . . , Xn might be the actual message sent, S1, . . . , Sn would be what is received, and so the objective would be to predict the actual message in its entirety. Letting Xk = (X1, . . . , Xk) be the vector of the first k states, the problem of interest is to find the sequence of states i1, . . . , in that maximizes P{Xn = (i1, . . . , in) | Sn = sn}. Because

P{Xn = (i1, . . . , in) | Sn = sn} = P{Xn = (i1, . . . , in), Sn = sn} / P{Sn = sn}

this is equivalent to finding the sequence of states i1, . . . , in that maximizes P{Xn = (i1, . . . , in), Sn = sn}. To solve the preceding problem, let, for k ≤ n,

Vk(j) = max_{i1,...,ik−1} P{Xk−1 = (i1, . . . , ik−1), Xk = j, Sk = sk}


To recursively solve for Vk(j), use

Vk(j) = max_i max_{i1,...,ik−2} P{Xk−2 = (i1, . . . , ik−2), Xk−1 = i, Xk = j, Sk = sk}
= max_i max_{i1,...,ik−2} P{Xk−2 = (i1, . . . , ik−2), Xk−1 = i, Sk−1 = sk−1, Xk = j, Sk = sk}
= max_i max_{i1,...,ik−2} P{Xk−2 = (i1, . . . , ik−2), Xk−1 = i, Sk−1 = sk−1} P{Xk = j, Sk = sk | Xk−2 = (i1, . . . , ik−2), Xk−1 = i, Sk−1 = sk−1}
= max_i max_{i1,...,ik−2} P{Xk−2 = (i1, . . . , ik−2), Xk−1 = i, Sk−1 = sk−1} P{Xk = j, Sk = sk | Xk−1 = i}
= max_i P{Xk = j, Sk = sk | Xk−1 = i} max_{i1,...,ik−2} P{Xk−2 = (i1, . . . , ik−2), Xk−1 = i, Sk−1 = sk−1}
= max_i Pi,j p(sk|j)Vk−1(i)
= p(sk|j) max_i Pi,j Vk−1(i)    (4.37)

Starting with

V1(j) = P{X1 = j, S1 = s1} = pj p(s1|j)

we now use the recursive identity (4.37) to determine V2(j) for each j; then V3(j) for each j; and so on, up to Vn(j) for each j.

To obtain the maximizing sequence of states, we work in the reverse direction. Let jn be the value (or any of the values if there are more than one) of j that maximizes Vn(j). Thus jn is the final state of a maximizing state sequence. Also, for k < n, let ik(j) be a value of i that maximizes Pi,j Vk(i). Then

max_{i1,...,in} P{Xn = (i1, . . . , in), Sn = sn} = max_j Vn(j) = Vn(jn)
= max_{i1,...,in−1} P{Xn = (i1, . . . , in−1, jn), Sn = sn}
= p(sn|jn) max_i Pi,jn Vn−1(i)
= p(sn|jn) P_{in−1(jn), jn} Vn−1(in−1(jn))

Thus, in−1(jn) is the next-to-last state of the maximizing sequence. Continuing in this manner, the second from the last state of the maximizing sequence is in−2(in−1(jn)), and so on.

The preceding approach to finding the most likely sequence of states given a prescribed sequence of signals is known as the Viterbi Algorithm.
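The following Python sketch (ours, not from the text) carries out the Viterbi recursion (4.37) with backpointers, again for the two-state model of Examples 4.42–4.43.

    def viterbi(start, P, p, signals, states):
        # V1(j) = p_j p(s1|j); Vk(j) = p(sk|j) max_i P[i][j] V_{k-1}(i)
        V = {j: start[j] * p[j][signals[0]] for j in states}
        back = []
        for s in signals[1:]:
            prev = {j: max(states, key=lambda i: P[i][j] * V[i]) for j in states}
            V = {j: p[j][s] * P[prev[j]][j] * V[prev[j]] for j in states}
            back.append(prev)
        path = [max(states, key=lambda j: V[j])]   # j_n, the maximizing final state
        for prev in reversed(back):                # i_{n-1}(j_n), i_{n-2}(...), ...
            path.insert(0, prev[path[0]])
        return path

    P = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.0, 2: 1.0}}
    p = {1: {'a': 0.99, 'u': 0.01}, 2: {'a': 0.96, 'u': 0.04}}
    print(viterbi({1: 0.8, 2: 0.2}, P, p, ['a', 'u', 'a'], (1, 2)))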


Exercises

*1. Three white and three black balls are distributed in two urns in such a way that each contains three balls. We say that the system is in state i, i = 0, 1, 2, 3, if the first urn contains i white balls. At each step, we draw one ball from each urn and place the ball drawn from the first urn into the second, and conversely with the ball from the second urn. Let Xn denote the state of the system after the nth step. Explain why {Xn, n = 0, 1, 2, . . .} is a Markov chain and calculate its transition probability matrix.
2. Suppose that whether or not it rains today depends on previous weather conditions through the last three days. Show how this system may be analyzed by using a Markov chain. How many states are needed?
3. In Exercise 2, suppose that if it has rained for the past three days, then it will rain today with probability 0.8; if it did not rain for any of the past three days, then it will rain today with probability 0.2; and in any other case the weather today will, with probability 0.6, be the same as the weather yesterday. Determine P for this Markov chain.
*4. Consider a process {Xn, n = 0, 1, . . .}, which takes on the values 0, 1, or 2. Suppose

P{Xn+1 = j | Xn = i, Xn−1 = in−1, . . . , X0 = i0} = P^I_{ij} when n is even, and = P^II_{ij} when n is odd,

where Σ_{j=0}^{2} P^I_{ij} = Σ_{j=0}^{2} P^II_{ij} = 1, i = 0, 1, 2. Is {Xn, n ≥ 0} a Markov chain? If not, then show how, by enlarging the state space, we may transform it into a Markov chain.
5. A Markov chain {Xn, n ≥ 0} with states 0, 1, 2, has the transition probability matrix

    | 1/2  1/3  1/6 |
    |  0   1/3  2/3 |
    | 1/2   0   1/2 |

If P{X0 = 0} = P{X0 = 1} = 1/4, find E[X3].
6. Let the transition probability matrix of a two-state Markov chain be given, as in Example 4.2, by

    P = |  p    1 − p |
        | 1 − p   p   |

Show by mathematical induction that

    P^(n) = | 1/2 + (1/2)(2p − 1)^n   1/2 − (1/2)(2p − 1)^n |
            | 1/2 − (1/2)(2p − 1)^n   1/2 + (1/2)(2p − 1)^n |


7. In Example 4.4 suppose that it has rained neither yesterday nor the day before yesterday. What is the probability that it will rain tomorrow?
8. Suppose that coin 1 has probability 0.7 of coming up heads, and coin 2 has probability 0.6 of coming up heads. If the coin flipped today comes up heads, then we select coin 1 to flip tomorrow, and if it comes up tails, then we select coin 2 to flip tomorrow. If the coin initially flipped is equally likely to be coin 1 or coin 2, then what is the probability that the coin flipped on the third day after the initial flip is coin 1? Suppose that the coin flipped on Monday comes up heads. What is the probability that the coin flipped on Friday of the same week also comes up heads?
9. In a sequence of independent flips of a coin that comes up heads with probability .6, what is the probability that there is a run of three consecutive heads within the first 10 flips?
10. In Example 4.3, Gary is currently in a cheerful mood. What is the probability that he is not in a glum mood on any of the following three days?
11. In Example 4.3, Gary was in a glum mood four days ago. Given that he hasn't felt cheerful in a week, what is the probability he is feeling glum today?
12. For a Markov chain {Xn, n ≥ 0} with transition probabilities Pi,j, consider the conditional probability that Xn = m given that the chain started at time 0 in state i and has not yet entered state r by time n, where r is a specified state not equal to either i or m. We are interested in whether this conditional probability is equal to the n-stage transition probability of a Markov chain whose state space does not include state r and whose transition probabilities are

Qi,j = Pi,j/(1 − Pi,r), i, j ≠ r

Either prove the equality

P{Xn = m | X0 = i, Xk ≠ r, k = 1, . . . , n} = Q^n_{i,m}

or construct a counterexample.
13. Let P be the transition probability matrix of a Markov chain. Argue that if for some positive integer r, P^r has all positive entries, then so does P^n, for all integers n ≥ r.
14. Specify the classes of the following Markov chains, and determine whether they are transient or recurrent:

    P1 = |  0   1/2  1/2 |      P2 = |  0    0   0   1 |
         | 1/2   0   1/2 |           |  0    0   0   1 |
         | 1/2  1/2   0  |           | 1/2  1/2  0   0 |
                                     |  0    0   1   0 |

    P3 = | 1/2   0   1/2   0    0  |      P4 = | 1/4  3/4   0    0   0 |
         | 1/4  1/2  1/4   0    0  |           | 1/2  1/2   0    0   0 |
         | 1/2   0   1/2   0    0  |           |  0    0    1    0   0 |
         |  0    0    0   1/2  1/2 |           |  0    0   1/3  2/3  0 |
         |  0    0    0   1/2  1/2 |           |  1    0    0    0   0 |


15. Prove that if the number of states in a Markov chain is M, and if state j can be reached from state i, then it can be reached in M steps or less.
*16. Show that if state i is recurrent and state i does not communicate with state j, then Pij = 0. This implies that once a process enters a recurrent class of states it can never leave that class. For this reason, a recurrent class is often referred to as a closed class.
17. For the random walk of Example 4.18 use the strong law of large numbers to give another proof that the Markov chain is transient when p ≠ 1/2.
Hint: Note that the state at time n can be written as Σ_{i=1}^{n} Yi where the Yi s are independent and P{Yi = 1} = p = 1 − P{Yi = −1}. Argue that if p > 1/2, then, by the strong law of large numbers, Σ_{i=1}^{n} Yi → ∞ as n → ∞ and hence the initial state 0 can be visited only finitely often, and hence must be transient. A similar argument holds when p < 1/2.
18. Coin 1 comes up heads with probability 0.6 and coin 2 with probability 0.5. A coin is continually flipped until it comes up tails, at which time that coin is put aside and we start flipping the other one.
(a) What proportion of flips use coin 1?
(b) If we start the process with coin 1 what is the probability that coin 2 is used on the fifth flip?
19. For Example 4.4, calculate the proportion of days that it rains.
20. A transition probability matrix P is said to be doubly stochastic if the sum over each column equals one; that is,

Σi Pij = 1, for all j

If such a chain is irreducible and aperiodic and consists of M + 1 states 0, 1, . . . , M, show that the long-run proportions are given by

πj = 1/(M + 1), j = 0, 1, . . . , M

*21. A DNA nucleotide has any of four values. A standard model for a mutational change of the nucleotide at a specific location is a Markov chain model that supposes that in going from period to period the nucleotide does not change with probability 1 − 3α, and if it does change then it is equally likely to change to any of the other three values, for some 0 < α < 1/3.
(a) Show that P^n_{1,1} = 1/4 + (3/4)(1 − 4α)^n.
(b) What is the long-run proportion of time the chain is in each state?

22. Let Yn be the sum of n independent rolls of a fair die. Find

lim_{n→∞} P{Yn is a multiple of 13}

Hint: Define an appropriate Markov chain and apply the results of Exercise 20.


23. In a good weather year the number of storms is Poisson distributed with mean 1; in a bad year it is Poisson distributed with mean 3. Suppose that any year's weather conditions depend on past years only through the previous year's condition. Suppose that a good year is equally likely to be followed by either a good or a bad year, and that a bad year is twice as likely to be followed by a bad year as by a good year. Suppose that last year—call it year 0—was a good year.
(a) Find the expected total number of storms in the next two years (that is, in years 1 and 2).
(b) Find the probability there are no storms in year 3.
(c) Find the long-run average number of storms per year.
24. Consider three urns, one colored red, one white, and one blue. The red urn contains 1 red and 4 blue balls; the white urn contains 3 white balls, 2 red balls, and 2 blue balls; the blue urn contains 4 white balls, 3 red balls, and 2 blue balls. At the initial stage, a ball is randomly selected from the red urn and then returned to that urn. At every subsequent stage, a ball is randomly selected from the urn whose color is the same as that of the ball previously selected and is then returned to that urn. In the long run, what proportion of the selected balls are red? What proportion are white? What proportion are blue?
25. Each morning an individual leaves his house and goes for a run. He is equally likely to leave either from his front or back door. Upon leaving the house, he chooses a pair of running shoes (or goes running barefoot if there are no shoes at the door from which he departed). On his return he is equally likely to enter, and leave his running shoes, either by the front or back door. If he owns a total of k pairs of running shoes, what proportion of the time does he run barefooted?
26. Consider the following approach to shuffling a deck of n cards. Starting with any initial ordering of the cards, one of the numbers 1, 2, . . . , n is randomly chosen in such a manner that each one is equally likely to be selected. If number i is chosen, then we take the card that is in position i and put it on top of the deck—that is, we put that card in position 1. We then repeatedly perform the same operation. Show that, in the limit, the deck is perfectly shuffled in the sense that the resultant ordering is equally likely to be any of the n! possible orderings.
*27. Each individual in a population of size N is, in each period, either active or inactive. If an individual is active in a period then, independent of all else, that individual will be active in the next period with probability α. Similarly, if an individual is inactive in a period then, independent of all else, that individual will be inactive in the next period with probability β. Let Xn denote the number of individuals that are active in period n.
(a) Argue that {Xn, n ≥ 0} is a Markov chain.
(b) Find E[Xn | X0 = i].
(c) Derive an expression for its transition probabilities.
(d) Find the long-run proportion of time that exactly j people are active.
Hint for (d): Consider first the case where N = 1.


28. Every time that the team wins a game, it wins its next game with probability 0.8; every time it loses a game, it wins its next game with probability 0.3. If the team wins a game, then it has dinner together with probability 0.7, whereas if the team loses then it has dinner together with probability 0.2. What proportion of games result in a team dinner?
29. An organization has N employees where N is a large number. Each employee has one of three possible job classifications and changes classifications (independently) according to a Markov chain with transition probabilities

    | 0.7  0.2  0.1 |
    | 0.2  0.6  0.2 |
    | 0.1  0.4  0.5 |

What percentage of employees are in each classification?
30. Three out of every four trucks on the road are followed by a car, while only one out of every five cars is followed by a truck. What fraction of vehicles on the road are trucks?
31. A certain town never has two sunny days in a row. Each day is classified as being either sunny, cloudy (but dry), or rainy. If it is sunny one day, then it is equally likely to be either cloudy or rainy the next day. If it is rainy or cloudy one day, then there is one chance in two that it will be the same the next day, and if it changes then it is equally likely to be either of the other two possibilities. In the long run, what proportion of days are sunny? What proportion are cloudy?
*32. Each of two switches is either on or off during a day. On day n, each switch will independently be on with probability

[1 + number of on switches during day n − 1]/4

For instance, if both switches are on during day n − 1, then each will independently be on during day n with probability 3/4. What fraction of days are both switches on? What fraction are both off?
33. A professor continually gives exams to her students. She can give three possible types of exams, and her class is graded as either having done well or badly. Let pi denote the probability that the class does well on a type i exam, and suppose that p1 = 0.3, p2 = 0.6, and p3 = 0.9. If the class does well on an exam, then the next exam is equally likely to be any of the three types. If the class does badly, then the next exam is always type 1. What proportion of exams are type i, i = 1, 2, 3?
34. A flea moves around the vertices of a triangle in the following manner: Whenever it is at vertex i it moves to its clockwise neighbor vertex with probability pi and to the counterclockwise neighbor with probability qi = 1 − pi, i = 1, 2, 3.
(a) Find the proportion of time that the flea is at each of the vertices.
(b) How often does the flea make a counterclockwise move that is then followed by five consecutive clockwise moves?
35. Consider a Markov chain with states 0, 1, 2, 3, 4. Suppose P0,4 = 1; and suppose that when the chain is in state i, i > 0, the next state is equally likely to be any of the states 0, 1, . . . , i − 1. Find the limiting probabilities of this Markov chain.


36. The state of a process changes daily according to a two-state Markov chain. If the process is in state i during one day, then it is in state j the following day with probability Pi,j, where

P0,0 = 0.4, P0,1 = 0.6, P1,0 = 0.2, P1,1 = 0.8

Every day a message is sent. If the state of the Markov chain that day is i then the message sent is "good" with probability pi and is "bad" with probability qi = 1 − pi, i = 0, 1.
(a) If the process is in state 0 on Monday, what is the probability that a good message is sent on Tuesday?
(b) If the process is in state 0 on Monday, what is the probability that a good message is sent on Friday?
(c) In the long run, what proportion of messages are good?
(d) Let Yn equal 1 if a good message is sent on day n and let it equal 2 otherwise. Is {Yn, n ≥ 1} a Markov chain? If so, give its transition probability matrix. If not, briefly explain why not.
37. Show that the stationary probabilities for the Markov chain having transition probabilities Pi,j are also the stationary probabilities for the Markov chain whose transition probabilities Qi,j are given by

Qi,j = P^k_{i,j}

for any specified positive integer k.
38. Capa plays either one or two chess games every day, with the number of games that she plays on successive days being a Markov chain with transition probabilities

P1,1 = .2, P1,2 = .8, P2,1 = .4, P2,2 = .6

Capa wins each game with probability p. Suppose she plays two games on Monday.
(a) What is the probability that she wins all the games she plays on Tuesday?
(b) What is the expected number of games that she plays on Wednesday?
(c) In the long run, on what proportion of days does Capa win all her games?
39. Consider the one-dimensional symmetric random walk of Example 4.18, which was shown in that example to be recurrent. Let πi denote the long-run proportion of time that the chain is in state i.
(a) Argue that πi = π0 for all i.
(b) Show that Σi πi ≠ 1.
(c) Conclude that this Markov chain is null recurrent, and thus all πi = 0.
40. A particle moves on 12 points situated on a circle. At each step it is equally likely to move one step in the clockwise or in the counterclockwise direction. Find the mean number of steps for the particle to return to its starting position.


*41. Consider a Markov chain with states equal to the nonnegative integers, and suppose its transition probabilities satisfy Pi,j = 0 if j ≤ i. Assume X0 = 0, and let ej be the probability that the Markov chain is ever in state j. (Note that e0 = 1 because X0 = 0.) Argue that for j > 0

ej = Σ_{i=0}^{j−1} ei Pi,j

If Pi,i+k = 1/3, k = 1, 2, 3, find ei for i = 1, . . . , 10.
42. Let A be a set of states, and let Ac be the remaining states.
(a) What is the interpretation of Σ_{i∈A} Σ_{j∈Ac} πi Pij?
(b) What is the interpretation of Σ_{i∈Ac} Σ_{j∈A} πi Pij?
(c) Explain the identity

Σ_{i∈A} Σ_{j∈Ac} πi Pij = Σ_{i∈Ac} Σ_{j∈A} πi Pij

43. Each day, one of n possible elements is requested, the ith one with probability Pi, i ≥ 1, Σ_{i=1}^{n} Pi = 1. These elements are at all times arranged in an ordered list that is revised as follows: The element selected is moved to the front of the list with the relative positions of all the other elements remaining unchanged. Define the state at any time to be the list ordering at that time and note that there are n! possible states.
(a) Argue that the preceding is a Markov chain.
(b) For any state i1, . . . , in (which is a permutation of 1, 2, . . . , n), let π(i1, . . . , in) denote the limiting probability. In order for the state to be i1, . . . , in, it is necessary for the last request to be for i1, the last non-i1 request for i2, the last non-i1 or i2 request for i3, and so on. Hence, it appears intuitive that

π(i1, . . . , in) = Pi1 · Pi2/(1 − Pi1) · Pi3/(1 − Pi1 − Pi2) · · · Pin−1/(1 − Pi1 − · · · − Pin−2)

Verify when n = 3 that the preceding are indeed the limiting probabilities.

44. Suppose that a population consists of a fixed number, say, m, of genes in any generation. Each gene is one of two possible genetic types. If exactly i (of the m) genes of any generation are of type 1, then the next generation will have j type 1 (and m − j type 2) genes with probability

(m choose j) (i/m)^j ((m − i)/m)^{m−j}, j = 0, 1, . . . , m

Let Xn denote the number of type 1 genes in the nth generation, and assume that X0 = i.
(a) Find E[Xn].
(b) What is the probability that eventually all the genes will be type 1?
45. Consider an irreducible finite Markov chain with states 0, 1, . . . , N.
(a) Starting in state i, what is the probability the process will ever visit state j? Explain!
(b) Let xi = P{visit state N before state 0 | start in i}. Compute a set of linear equations that the xi satisfy, i = 0, 1, . . . , N.
(c) If Σj j Pij = i for i = 1, . . . , N − 1, show that xi = i/N is a solution to the equations in part (b).

46. An individual possesses r umbrellas that he employs in going from his home to office, and vice versa. If he is at home (the office) at the beginning (end) of a day and it is raining, then he will take an umbrella with him to the office (home), provided there is one to be taken. If it is not raining, then he never takes an umbrella. Assume that, independent of the past, it rains at the beginning (end) of a day with probability p.
(a) Define a Markov chain with r + 1 states, which will help us to determine the proportion of time that our man gets wet. (Note: He gets wet if it is raining, and all umbrellas are at his other location.)
(b) Show that the limiting probabilities are given by

πi = q/(r + q), if i = 0
πi = 1/(r + q), if i = 1, . . . , r

where q = 1 − p.
(c) What fraction of time does our man get wet?
(d) When r = 3, what value of p maximizes the fraction of time he gets wet?
*47. Let {Xn, n ≥ 0} denote an ergodic Markov chain with limiting probabilities πi. Define the process {Yn, n ≥ 1} by Yn = (Xn−1, Xn). That is, Yn keeps track of the last two states of the original chain. Is {Yn, n ≥ 1} a Markov chain? If so, determine its transition probabilities and find

lim_{n→∞} P{Yn = (i, j)}


48. Consider a Markov chain in steady state. Say that a k length run of zeroes ends at time m if

Xm−k−1 ≠ 0, Xm−k = Xm−k+1 = · · · = Xm−1 = 0, Xm ≠ 0

Show that the probability of this event is π0(P0,0)^{k−1}(1 − P0,0)^2, where π0 is the limiting probability of state 0.
49. Let P(1) and P(2) denote transition probability matrices for ergodic Markov chains having the same state space. Let π1 and π2 denote the stationary (limiting) probability vectors for the two chains. Consider a process defined as follows:
(a) X0 = 1. A coin is then flipped and if it comes up heads, then the remaining states X1, . . . are obtained from the transition probability matrix P(1) and if tails from the matrix P(2). Is {Xn, n ≥ 0} a Markov chain? If p = P{coin comes up heads}, what is limn→∞ P(Xn = i)?
(b) X0 = 1. At each stage the coin is flipped and if it comes up heads, then the next state is chosen according to P(1) and if tails comes up, then it is chosen according to P(2). In this case do the successive states constitute a Markov chain? If so, determine the transition probabilities. Show by a counterexample that the limiting probabilities are not the same as in part (a).
50. In Exercise 8, if today's flip lands heads, what is the expected number of additional flips needed until the pattern t, t, h, t, h, t, t occurs?
51. In Example 4.3, Gary is in a cheerful mood today. Find the expected number of days until he has been glum for three consecutive days.
52. A taxi driver provides service in two zones of a city. Fares picked up in zone A will have destinations in zone A with probability 0.6 or in zone B with probability 0.4. Fares picked up in zone B will have destinations in zone A with probability 0.3 or in zone B with probability 0.7. The driver's expected profit for a trip entirely in zone A is 6; for a trip entirely in zone B is 8; and for a trip that involves both zones is 12. Find the taxi driver's average profit per trip.
53. Find the average premium received per policyholder of the insurance company of Example 4.27 if λ = 1/4 for one-third of its clients, and λ = 1/2 for two-thirds of its clients.
54. Consider the Ehrenfest urn model in which M molecules are distributed between two urns, and at each time point one of the molecules is chosen at random and is then removed from its urn and placed in the other one. Let Xn denote the number of molecules in urn 1 after the nth switch and let µn = E[Xn]. Show that
(a) µn+1 = 1 + (1 − 2/M)µn.
(b) Use (a) to prove that

µn = M/2 + ((M − 2)/M)^n (E[X0] − M/2)

55. Consider a population of individuals each of whom possesses two genes that can be either type A or type a. Suppose that in outward appearance type A is dominant and type a is recessive. (That is, an individual will have only the outward characteristics of the recessive gene if its pair is aa.) Suppose that the population has stabilized, and the percentages of individuals having respective gene pairs AA, aa, and Aa are p, q, and r. Call an individual dominant or recessive depending on the outward characteristics it exhibits. Let S11 denote the probability that an offspring of two dominant parents will be recessive; and let S10 denote the probability that the offspring of one dominant and one recessive parent will be recessive. Compute S11 and S10 to show that S11 = S10^2. (The quantities S10 and S11 are known in the genetics literature as Snyder's ratios.)
56. Suppose that on each play of the game a gambler either wins 1 with probability p or loses 1 with probability 1 − p. The gambler continues betting until she or he is either up n or down m. What is the probability that the gambler quits a winner?
57. A particle moves among n + 1 vertices that are situated on a circle in the following manner. At each step it moves one step either in the clockwise direction with probability p or the counterclockwise direction with probability q = 1 − p. Starting at a specified state, call it state 0, let T be the time of the first return to state 0. Find the probability that all states have been visited by time T.
Hint: Condition on the initial transition and then use results from the gambler's ruin problem.
58. In the gambler's ruin problem of Section 4.5.1, suppose the gambler's fortune is presently i, and suppose that we know that the gambler's fortune will eventually reach N (before it goes to 0). Given this information, show that the probability he wins the next gamble is

p[1 − (q/p)^{i+1}] / [1 − (q/p)^i], if p ≠ 1/2
(i + 1)/(2i), if p = 1/2

Hint: The probability we want is

P{Xn+1 = i + 1 | Xn = i, lim_{m→∞} Xm = N}
= P{Xn+1 = i + 1, lim_m Xm = N | Xn = i} / P{lim_m Xm = N | Xn = i}

59. For the gambler's ruin model of Section 4.5.1, let Mi denote the mean number of games that must be played until the gambler either goes broke or reaches a fortune of N, given that he starts with i, i = 0, 1, . . . , N. Show that Mi satisfies

M0 = MN = 0; Mi = 1 + pMi+1 + qMi−1, i = 1, . . . , N − 1

Solve these equations to obtain

Mi = i(N − i), if p = 1/2
Mi = i/(q − p) − (N/(q − p)) · (1 − (q/p)^i)/(1 − (q/p)^N), if p ≠ 1/2


60. The following is the transition probability matrix of a Markov chain with states 1, 2, 3, 4:

    P = | .4   .3   .2  .1 |
        | .2   .2   .2  .4 |
        | .25  .25  .5  0  |
        | .2   .1   .4  .3 |

If X0 = 1,
(a) find the probability that state 3 is entered before state 4;
(b) find the mean number of transitions until either state 3 or state 4 is entered.
61. Suppose in the gambler's ruin problem that the probability of winning a bet depends on the gambler's present fortune. Specifically, suppose that αi is the probability that the gambler wins a bet when his or her fortune is i. Given that the gambler's initial fortune is i, let P(i) denote the probability that the gambler's fortune reaches N before 0.
(a) Derive a formula that relates P(i) to P(i − 1) and P(i + 1).
(b) Using the same approach as in the gambler's ruin problem, solve the equation of part (a) for P(i).
(c) Suppose that i balls are initially in urn 1 and N − i are in urn 2, and suppose that at each stage one of the N balls is randomly chosen, taken from whichever urn it is in, and placed in the other urn. Find the probability that the first urn becomes empty before the second.
*62. Consider the particle from Exercise 57. What is the expected number of steps the particle takes to return to the starting position? What is the probability that all other positions are visited before the particle returns to its starting state?
63. For the Markov chain with states 1, 2, 3, 4 whose transition probability matrix P is as specified below, find fi3 and si3 for i = 1, 2, 3.

    P = | 0.4  0.2  0.1  0.3 |
        | 0.1  0.5  0.2  0.2 |
        | 0.3  0.4  0.2  0.1 |
        |  0    0    0    1  |

64. Consider a branching process having µ < 1. Show that if X0 = 1, then the expected number of individuals that ever exist in this population is given by 1/(1 − µ). What if X0 = n?
65. In a branching process having X0 = 1 and µ > 1, prove that π0 is the smallest positive number satisfying Equation (4.20).
Hint: Let π be any solution of π = Σ_{j=0}^{∞} π^j Pj. Show by mathematical induction that π ≥ P{Xn = 0} for all n, and let n → ∞. In using the induction argue that

P{Xn = 0} = Σ_{j=0}^{∞} (P{Xn−1 = 0})^j Pj


66. For a branching process, calculate π0 when
(a) P0 = 1/4, P2 = 3/4.
(b) P0 = 1/4, P1 = 1/2, P2 = 1/4.
(c) P0 = 1/6, P1 = 1/2, P3 = 1/3.
67. At all times, an urn contains N balls—some white balls and some black balls. At each stage, a coin having probability p, 0 < p < 1, of landing heads is flipped. If heads appears, then a ball is chosen at random from the urn and is replaced by a white ball; if tails appears, then a ball is chosen from the urn and is replaced by a black ball. Let Xn denote the number of white balls in the urn after the nth stage.
(a) Is {Xn, n ≥ 0} a Markov chain? If so, explain why.
(b) What are its classes? What are their periods? Are they transient or recurrent?
(c) Compute the transition probabilities Pij.
(d) Let N = 2. Find the proportion of time in each state.
(e) Based on your answer in part (d) and your intuition, guess the answer for the limiting probability in the general case.
(f) Prove your guess in part (e) either by showing that Theorem (4.1) is satisfied or by using the results of Example 4.35.
(g) If p = 1, what is the expected time until there are only white balls in the urn if initially there are i white and N − i black?

*68. (a) Show that the limiting probabilities of the reversed Markov chain are the same as for the forward chain by showing that they satisfy the equations

πj = Σi πi Qij

(b) Give an intuitive explanation for the result of part (a).
69. M balls are initially distributed among m urns. At each stage one of the balls is selected at random, taken from whichever urn it is in, and then placed, at random, in one of the other m − 1 urns. Consider the Markov chain whose state at any time is the vector (n1, . . . , nm) where ni denotes the number of balls in urn i. Guess at the limiting probabilities for this Markov chain and then verify your guess and show at the same time that the Markov chain is time reversible.
70. A total of m white and m black balls are distributed among two urns, with each urn containing m balls. At each stage, a ball is randomly selected from each urn and the two selected balls are interchanged. Let Xn denote the number of black balls in urn 1 after the nth interchange.
(a) Give the transition probabilities of the Markov chain {Xn, n ≥ 0}.
(b) Without any computations, what do you think are the limiting probabilities of this chain?
(c) Find the limiting probabilities and show that the stationary chain is time reversible.


71. It follows from Theorem 4.2 that for a time reversible Markov chain

Pij Pjk Pki = Pik Pkj Pji, for all i, j, k

It turns out that if the state space is finite and Pij > 0 for all i, j, then the preceding is also a sufficient condition for time reversibility. (That is, in this case, we need only check Equation (4.26) for paths from i to i that have only two intermediate states.) Prove this.
Hint: Fix i and show that the equations

πj Pjk = πk Pkj

are satisfied by πj = cPij/Pji, where c is chosen so that Σj πj = 1.
72. For a time reversible Markov chain, argue that the rate at which transitions from i to j to k occur must equal the rate at which transitions from k to j to i occur.
73. Show that the Markov chain of Exercise 31 is time reversible.
74. A group of n processors is arranged in an ordered list. When a job arrives, the first processor in line attempts it; if it is unsuccessful, then the next in line tries it; if it too is unsuccessful, then the next in line tries it, and so on. When the job is successfully processed or after all processors have been unsuccessful, the job leaves the system. At this point we are allowed to reorder the processors, and a new job appears. Suppose that we use the one-closer reordering rule, which moves the processor that was successful one closer to the front of the line by interchanging its position with the one in front of it. If all processors were unsuccessful (or if the processor in the first position was successful), then the ordering remains the same. Suppose that each time processor i attempts a job then, independently of anything else, it is successful with probability pi.
(a) Define an appropriate Markov chain to analyze this model.
(b) Show that this Markov chain is time reversible.
(c) Find the long-run probabilities.
75. A Markov chain is said to be a tree process if
(i) Pij > 0 whenever Pji > 0,
(ii) for every pair of states i and j, i ≠ j, there is a unique sequence of distinct states i = i0, i1, . . . , in−1, in = j such that

Pik,ik+1 > 0, k = 0, 1, . . . , n − 1

In other words, a Markov chain is a tree process if for every pair of distinct states i and j there is a unique way for the process to go from i to j without reentering a state (and this path is the reverse of the unique path from j to i). Argue that an ergodic tree process is time reversible.
76. On a chessboard compute the expected number of plays it takes a knight, starting in one of the four corners of the chessboard, to return to its initial position if we assume that at each play it is equally likely to choose any of its legal moves. (No other pieces are on the board.)
Hint: Make use of Example 4.36.


77. In a Markov decision problem, another criterion often used, different from the expected average return per unit time, is that of the expected discounted return. In this criterion we choose a number α, 0 < α < 1, and try to choose a policy so as to maximize E[Σ_{i=0}^{∞} α^i R(Xi, ai)] (that is, rewards at time n are discounted at rate α^n). Suppose that the initial state is chosen according to the probabilities bi. That is,

P{X0 = i} = bi, i = 1, . . . , n

For a given policy β let yja denote the expected discounted time that the process is in state j and action a is chosen. That is,

yja = Eβ[Σ_{n=0}^{∞} α^n I{Xn = j, an = a}]

where for any event A the indicator variable IA is defined by

IA = 1, if A occurs; IA = 0, otherwise

(a) Show that

Σa yja = E[Σ_{n=0}^{∞} α^n I{Xn = j}]

or, in other words, Σa yja is the expected discounted time in state j under β.
(b) Show that

Σj Σa yja = 1/(1 − α),
Σa yja = bj + α Σi Σa yia Pij(a)

Hint: For the second equation, use the identity

I{Xn+1 = j} = Σi Σa I{Xn = i, an = a} I{Xn+1 = j}

Take expectations of the preceding to obtain

E[I{Xn+1 = j}] = Σi Σa E[I{Xn = i, an = a}] Pij(a)

(c) Let {yja} be a set of numbers satisfying

Σj Σa yja = 1/(1 − α),
Σa yja = bj + α Σi Σa yia Pij(a)    (4.38)


Argue that yja can be interpreted as the expected discounted time that the process is in state j and action a is chosen when the initial state is chosen according to the probabilities bj and the policy β, given by

βi(a) = yia / Σa yia

is employed.
Hint: Derive a set of equations for the expected discounted times when policy β is used and show that they are equivalent to Equation (4.38).
(d) Argue that an optimal policy with respect to the expected discounted return criterion can be obtained by first solving the linear program

maximize Σj Σa yja R(j, a),
such that Σj Σa yja = 1/(1 − α),
Σa yja = bj + α Σi Σa yia Pij(a),
yja ≥ 0, all j, a;

and then defining the policy β* by

βi*(a) = y*ia / Σa y*ia

where the y*ja are the solutions of the linear program.
78. For the Markov chain of Exercise 5, suppose that p(s|j) is the probability that signal s is emitted when the underlying Markov chain state is j, j = 0, 1, 2.
(a) What proportion of emissions are signal s?
(b) What proportion of the times at which signal s is emitted is the underlying state 0?
79. In Example 4.43, what is the probability that the first 4 items produced are all acceptable?


The Exponential Distribution and the Poisson Process

5.1 Introduction

In making a mathematical model for a real-world phenomenon it is always necessary to make certain simplifying assumptions so as to render the mathematics tractable. On the other hand, however, we cannot make too many simplifying assumptions, for then our conclusions, obtained from the mathematical model, would not be applicable to the real-world situation. Thus, in short, we must make enough simplifying assumptions to enable us to handle the mathematics but not so many that the mathematical model no longer resembles the real-world phenomenon.

One simplifying assumption that is often made is to assume that certain random variables are exponentially distributed. The reason for this is that the exponential distribution is both relatively easy to work with and is often a good approximation to the actual distribution. The property of the exponential distribution that makes it easy to analyze is that it does not deteriorate with time. By this we mean that if the lifetime of an item is exponentially distributed, then an item that has been in use for ten (or any number of) hours is as good as a new item with regard to the amount of time remaining until the item fails. This will be formally defined in Section 5.2, where it will be shown that the exponential is the only distribution that possesses this property.

In Section 5.3 we shall study counting processes with an emphasis on a kind of counting process known as the Poisson process. Among other things we shall discover about this process is its intimate connection with the exponential distribution.



5.2 The Exponential Distribution

5.2.1 Definition

A continuous random variable X is said to have an exponential distribution with parameter λ, λ > 0, if its probability density function is given by

f(x) = λe^{−λx}, x ≥ 0
f(x) = 0, x < 0

Consider, for instance, an insurance policy that pays the amount by which the damage X exceeds a deductible of 400, where X is exponentially distributed with mean 1000. With I indicating whether the damage exceeds the deductible, so that

I = 1, if X > 400
I = 0, if X ≤ 400

let Y = (X − 400)+ be the amount paid. By the lack of memory property of the exponential, it follows that if a damage amount exceeds 400, then the amount by


which it exceeds it is exponential with mean 1000. Therefore,

E[Y | I = 1] = 1000, E[Y | I = 0] = 0
Var(Y | I = 1) = (1000)^2, Var(Y | I = 0) = 0

which can be conveniently written as

E[Y | I] = 10^3 I, Var(Y | I) = 10^6 I

Because I is a Bernoulli random variable that is equal to 1 with probability e^{−0.4}, we obtain

E[Y] = E[E[Y | I]] = 10^3 E[I] = 10^3 e^{−0.4} ≈ 670.32

and, by the conditional variance formula,

Var(Y) = E[Var(Y | I)] + Var(E[Y | I]) = 10^6 e^{−0.4} + 10^6 e^{−0.4}(1 − e^{−0.4})

where the final equality used that the variance of a Bernoulli random variable with parameter p is p(1 − p). Consequently,

√Var(Y) ≈ 944.09 ■

It turns out that not only is the exponential distribution "memoryless," but it is the unique distribution possessing this property. To see this, suppose that X is memoryless and let F̄(x) = P{X > x}. Then by Equation (5.3) it follows that

F̄(s + t) = F̄(s)F̄(t)

That is, F̄(x) satisfies the functional equation

g(s + t) = g(s)g(t)

However, it turns out that the only right continuous solution of this functional equation is

g(x) = e^{−λx}

This is proven as follows: if g(s + t) = g(s)g(t), then

g(2/n) = g(1/n + 1/n) = g^2(1/n)

and repeating this yields g(m/n) = g^m(1/n). Also,

g(1) = g(1/n + 1/n + · · · + 1/n) = g^n(1/n), or g(1/n) = (g(1))^{1/n}

Hence g(m/n) = (g(1))^{m/n}, which implies, since g is right continuous, that g(x) = (g(1))^x. Since g(1) = (g(1/2))^2 ≥ 0, we obtain g(x) = e^{−λx}, where λ = −log(g(1)).


and since a distribution function is always right continuous we must have

F̄(x) = e^{−λx} or F(x) = P{X ≤ x} = 1 − e^{−λx}

which shows that X is exponentially distributed.

Example 5.5 A store must decide how much of a certain commodity to order so as to meet next month's demand, where that demand is assumed to have an exponential distribution with rate λ. If the commodity costs the store c per pound, and can be sold at a price of s > c per pound, how much should be ordered so as to maximize the store's expected profit? Assume that any inventory left over at the end of the month is worthless and that there is no penalty if the store cannot meet all the demand.

Solution: Let X equal the demand. If the store orders the amount t, then the profit, call it P, is given by

P = s min(X, t) − ct

Writing

min(X, t) = X − (X − t)+

we obtain, upon conditioning on whether X > t and then using the lack of memory property of the exponential, that

E[(X − t)+] = E[(X − t)+ | X > t]P(X > t) + E[(X − t)+ | X ≤ t]P(X ≤ t)
= E[(X − t)+ | X > t]e^{−λt}
= (1/λ)e^{−λt}

where the final equality used the lack of memory property of exponential random variables to conclude that, conditional on X exceeding t, the amount by which it exceeds t is an exponential random variable with rate λ. Hence,

E[min(X, t)] = 1/λ − (1/λ)e^{−λt}

giving

E[P] = s/λ − (s/λ)e^{−λt} − ct

Differentiation now yields that the maximal profit is attained when se^{−λt} − c = 0; that is, when

t = (1/λ) log(s/c)


Now, suppose that all unsold inventory can be returned for the amount r < min(s, c) per pound; and also that there is a penalty cost p per pound of unmet demand. In this case, using our previously derived expression for E[P], we have

E[P] = s/λ − (s/λ)e^{−λt} − ct + rE[(t − X)+] − pE[(X − t)+]

Using that min(X, t) = t − (t − X)+, we see that

E[(t − X)+] = t − E[min(X, t)] = t − 1/λ + (1/λ)e^{−λt}

Hence,

E[P] = s/λ − (s/λ)e^{−λt} − ct + rt − r/λ + (r/λ)e^{−λt} − (p/λ)e^{−λt}
= (s − r)/λ + ((r − s − p)/λ)e^{−λt} − (c − r)t

Calculus now yields that the optimal amount to order is

t = (1/λ) log((s + p − r)/(c − r))

It is worth noting that the optimal amount to order increases in s, p, and r and decreases in λ and c. (Are these monotonicity properties intuitive?) ■
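As a quick numerical check (ours, not from the text; the parameter values below are illustrative only), one can evaluate the closed-form E[P] on a grid of order quantities and confirm that the maximum occurs near t = (1/λ) log((s + p − r)/(c − r)).

    import math

    lam, s, c, r, p = 0.01, 5.0, 2.0, 1.0, 3.0   # illustrative values only

    def EP(t):
        # E[P] = (s - r)/lam + ((r - s - p)/lam) e^{-lam t} - (c - r) t
        return (s - r)/lam + (r - s - p)/lam * math.exp(-lam*t) - (c - r)*t

    t_star = math.log((s + p - r)/(c - r)) / lam
    print(round(t_star, 1), max(range(1000), key=EP))   # grid maximum lies within 1 of t_star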

The memoryless property is further illustrated by the failure rate function (also called the hazard rate function) of the exponential distribution. Consider a continuous positive random variable X having distribution function F and density f. The failure (or hazard) rate function r(t) is defined by

r(t) = f(t)/(1 − F(t))    (5.4)

To interpret r(t), suppose that an item, having lifetime X, has survived for t hours, and we desire the probability that it does not survive for an additional time dt. That is, consider P{X ∈ (t, t + dt) | X > t}. Now,

P{X ∈ (t, t + dt) | X > t} = P{X ∈ (t, t + dt), X > t} / P{X > t}
= P{X ∈ (t, t + dt)} / P{X > t}
≈ [f(t)/(1 − F(t))] dt = r(t) dt


That is, r(t) represents the conditional probability density that a t-year-old item will fail. Suppose now that the lifetime distribution is exponential. Then, by the memoryless property, it follows that the distribution of remaining life for a t-year-old item is the same as for a new item. Hence, r(t) should be constant. This checks out since

r(t) = f(t)/(1 − F(t)) = λe^{−λt}/e^{−λt} = λ

Thus, the failure rate function for the exponential distribution is constant. The parameter λ is often referred to as the rate of the distribution. (Note that the rate is the reciprocal of the mean, and vice versa.)

It turns out that the failure rate function r(t) uniquely determines the distribution F. To prove this, we note by Equation (5.4) that

r(t) = (d/dt F(t)) / (1 − F(t))

Integrating both sides yields

log(1 − F(t)) = −∫₀ᵗ r(u) du + k

or

1 − F(t) = e^k exp{−∫₀ᵗ r(u) du}

Letting t = 0 shows that k = 0 and thus

F(t) = 1 − exp{−∫₀ᵗ r(u) du}
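A short numerical check (ours, not from the text): integrating the failure rate r(u) = 2u, which corresponds to the distribution function F(t) = 1 − e^{−t²}, and applying the preceding identity recovers that distribution function.

    import math

    def F_from_rate(t, steps=10_000):
        h = t / steps
        integral = sum(2 * (k + 0.5) * h * h for k in range(steps))  # midpoint rule for r(u) = 2u
        return 1 - math.exp(-integral)

    print(F_from_rate(1.5), 1 - math.exp(-1.5**2))   # the two values should nearly agree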

The preceding identity can also be used to show that exponential random variables are the only ones that are memoryless. For if X is memoryless, then its failure rate function must be constant. But if r(t) = c, then by the preceding equation

1 − F(t) = e^{−ct}

showing that the random variable is exponential.

Example 5.6 Let X1, . . . , Xn be independent exponential random variables with respective rates λ1, . . . , λn, where λi ≠ λj when i ≠ j. Let T be independent of these random variables and suppose that

Σ_{j=1}^{n} Pj = 1, where Pj = P{T = j}


The random variable XT is said to be a hyperexponential random variable. To see how such a random variable might originate, imagine that a bin contains n different types of batteries, with a type j battery lasting for an exponentially distributed time with rate λj, j = 1, . . . , n. Suppose further that Pj is the proportion of batteries in the bin that are type j for each j = 1, . . . , n. If a battery is randomly chosen, in the sense that it is equally likely to be any of the batteries in the bin, then the lifetime of the battery selected will have the hyperexponential distribution specified in the preceding. To obtain the distribution function F of X = XT, condition on T. This yields

1 − F(t) = P{X > t} = Σ_{i=1}^{n} P{X > t | T = i}P{T = i} = Σ_{i=1}^{n} Pi e^{−λi t}

Differentiation of the preceding yields f, the density function of X:

f(t) = Σ_{i=1}^{n} λi Pi e^{−λi t}

Consequently, the failure rate function of a hyperexponential random variable is

r(t) = Σ_{j=1}^{n} Pj λj e^{−λj t} / Σ_{i=1}^{n} Pi e^{−λi t}

By noting that

P{T = j | X > t} = P{X > t | T = j}P{T = j} / P{X > t} = Pj e^{−λj t} / Σ_{i=1}^{n} Pi e^{−λi t}

we see that the failure rate function r(t) can also be written as

r(t) = Σ_{j=1}^{n} λj P{T = j | X > t}

If λ1 < λi for all i > 1, then

P{T = 1 | X > t} = P1 e^{−λ1 t} / (P1 e^{−λ1 t} + Σ_{i=2}^{n} Pi e^{−λi t})
= P1 / (P1 + Σ_{i=2}^{n} Pi e^{−(λi − λ1)t})
→ 1 as t → ∞


Similarly, P{T = i | X > t} → 0 when i ≠ 1, thus showing that

lim_{t→∞} r(t) = min_i λi

That is, as a randomly chosen battery ages, its failure rate converges to the failure rate of the exponential type having the smallest failure rate, which is intuitive since the longer the battery lasts, the more likely it is a battery type with the smallest failure rate. ■

5.2.3 Further Properties of the Exponential Distribution

Let X1, . . . , Xn be independent and identically distributed exponential random variables having mean 1/λ. It follows from the results of Example 2.39 that X1 + · · · + Xn has a gamma distribution with parameters n and λ. Let us now give a second verification of this result by using mathematical induction. Because there is nothing to prove when n = 1, let us start by assuming that X1 + · · · + Xn−1 has density given by

f_{X1+···+Xn−1}(t) = λe^{−λt} (λt)^{n−2}/(n − 2)!

Hence,

f_{X1+···+Xn}(t) = ∫₀^∞ f_{Xn}(t − s) f_{X1+···+Xn−1}(s) ds
= ∫₀ᵗ λe^{−λ(t−s)} λe^{−λs} (λs)^{n−2}/(n − 2)! ds
= λe^{−λt} (λt)^{n−1}/(n − 1)!

which proves the result.
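A simulation sketch (ours, not from the text) of the gamma result: the sample mean and variance of X1 + · · · + Xn should be close to the gamma values n/λ and n/λ².

    import random

    rng = random.Random(0)
    n, lam, trials = 5, 2.0, 100_000
    sums = [sum(rng.expovariate(lam) for _ in range(n)) for _ in range(trials)]
    mean = sum(sums) / trials
    var = sum((x - mean)**2 for x in sums) / trials
    print(mean, var)   # near n/lam = 2.5 and n/lam**2 = 1.25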

Another useful calculation is to determine the probability that one exponential random variable is smaller than another. That is, suppose that X1 and X2 are independent exponential random variables with respective means 1/λ1 and 1/λ2; what is P{X1 < X2}? This probability is easily calculated by conditioning on X1:

P{X1 < X2} = ∫₀^∞ P{X1 < X2 | X1 = x} λ1 e^{−λ1 x} dx
= ∫₀^∞ P{x < X2} λ1 e^{−λ1 x} dx
= ∫₀^∞ e^{−λ2 x} λ1 e^{−λ1 x} dx
= ∫₀^∞ λ1 e^{−(λ1+λ2)x} dx
= λ1/(λ1 + λ2)    (5.5)


Suppose that X1, X2, . . . , Xn are independent exponential random variables, with Xi having rate µi, i = 1, . . . , n. It turns out that the smallest of the Xi is exponential with a rate equal to the sum of the µi. This is shown as follows:

P{minimum(X1, . . . , Xn) > x} = P{Xi > x for each i = 1, . . . , n}
= Π_{i=1}^{n} P{Xi > x}    (by independence)
= Π_{i=1}^{n} e^{−µi x}
= exp{−(Σ_{i=1}^{n} µi) x}    (5.6)
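Both Equation (5.5) and Equation (5.6) are easy to confirm by simulation; in the sketch below (ours, not from the text), with rates 1 and 3, the fraction of trials with X1 < X2 and the sample mean of min(X1, X2) should both be near 1/4.

    import random

    rng = random.Random(1)
    l1, l2, trials = 1.0, 3.0, 200_000
    wins = mins = 0.0
    for _ in range(trials):
        x1, x2 = rng.expovariate(l1), rng.expovariate(l2)
        wins += x1 < x2            # indicator of X1 < X2
        mins += min(x1, x2)
    print(wins / trials, mins / trials)   # both near 0.25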

Example 5.7 (Analyzing Greedy Algorithms for the Assignment Problem) A group of n people is to be assigned to a set of n jobs, with one person assigned to each job. For a given set of n² values Cij, i, j = 1, . . . , n, a cost Cij is incurred when person i is assigned to job j. The classical assignment problem is to determine the set of assignments that minimizes the sum of the n costs incurred. Rather than trying to determine the optimal assignment, let us consider two heuristic algorithms for solving this problem. The first heuristic is as follows. Assign person 1 to the job that results in the least cost. That is, person 1 is assigned to job j1 where C(1, j1) = minimum_j C(1, j). Now eliminate that job from consideration and assign person 2 to the job that results in the least cost. That is, person 2 is assigned to job j2 where C(2, j2) = minimum_{j≠j1} C(2, j). This procedure is then continued until all n persons are assigned. Since this procedure always selects the best job for the person under consideration, we will call it Greedy Algorithm A. The second algorithm, which we call Greedy Algorithm B, is a more "global" version of the first greedy algorithm. It considers all n² cost values and chooses the pair i1, j1 for which C(i, j) is minimal. It then assigns person i1 to job j1. It then eliminates all cost values involving either person i1 or job j1 (so that (n − 1)² values remain) and continues in the same fashion. That is, at each stage it chooses the person and job that have the smallest cost among all the unassigned people and jobs. Under the assumption that the Cij constitute a set of n² independent exponential random variables each having mean 1, which of the two algorithms results in a smaller expected total cost?

Solution: Suppose first that Greedy Algorithm A is employed. Let Ci denote the cost associated with person i, i = 1, . . . , n. Now C1 is the minimum of n independent exponentials each having rate 1; so by Equation (5.6) it will be exponential with rate n. Similarly, C2 is the minimum of n − 1 independent exponentials with rate 1, and so is exponential with rate n − 1. Indeed, by the same reasoning Ci will be exponential with rate n − i + 1, i = 1, . . . , n. Thus, the expected total cost under Greedy Algorithm A is

E_A[total cost] = E[C1 + · · · + Cn] = Σ_{i=1}^{n} 1/i

Let us now analyze Greedy Algorithm B. Let Ci be the cost of the ith person–job pair assigned by this algorithm. Since C1 is the minimum of all the n² values Cij, it follows from Equation (5.6) that C1 is exponential with rate n². Now, it follows from the lack of memory property of the exponential that the amounts by which the other Cij exceed C1 will be independent exponentials with rates 1. As a result, C2 is equal to C1 plus the minimum of (n − 1)² independent exponentials with rate 1. Similarly, C3 is equal to C2 plus the minimum of (n − 2)² independent exponentials with rate 1, and so on. Therefore, we see that

E[C1] = 1/n²,
E[C2] = E[C1] + 1/(n − 1)²,
E[C3] = E[C2] + 1/(n − 2)²,
...
E[Cj] = E[Cj−1] + 1/(n − j + 1)²,
...
E[Cn] = E[Cn−1] + 1

Therefore,

E[C1] = 1/n²,
E[C2] = 1/n² + 1/(n − 1)²,
E[C3] = 1/n² + 1/(n − 1)² + 1/(n − 2)²,
...
E[Cn] = 1/n² + 1/(n − 1)² + 1/(n − 2)² + · · · + 1

Adding up all the E[Ci] yields

E_B[total cost] = n/n² + (n − 1)/(n − 1)² + (n − 2)/(n − 2)² + · · · + 1 = Σ_{i=1}^{n} 1/i

The expected cost is thus the same for both greedy algorithms. ■
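A simulation sketch (ours, not from the text) comparing the two algorithms on exponential(1) cost matrices; both sample averages should be near Σ_{i=1}^{n} 1/i.

    import random

    def greedy_A(C):   # each person in turn takes the cheapest still-free job
        free, total = set(range(len(C))), 0.0
        for i in range(len(C)):
            j = min(free, key=lambda j: C[i][j])
            total += C[i][j]
            free.remove(j)
        return total

    def greedy_B(C):   # repeatedly assign the globally cheapest person-job pair
        n = len(C)
        people, jobs, total = set(range(n)), set(range(n)), 0.0
        while people:
            i, j = min(((i, j) for i in people for j in jobs),
                       key=lambda ij: C[ij[0]][ij[1]])
            total += C[i][j]
            people.remove(i)
            jobs.remove(j)
        return total

    rng = random.Random(2)
    n, trials = 6, 20_000
    tot_a = tot_b = 0.0
    for _ in range(trials):
        tot_a += greedy_A([[rng.expovariate(1.0) for _ in range(n)] for _ in range(n)])
        tot_b += greedy_B([[rng.expovariate(1.0) for _ in range(n)] for _ in range(n)])
    print(tot_a / trials, tot_b / trials, sum(1/i for i in range(1, n + 1)))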

Let X1, . . . , Xn be independent exponential random variables, with respective rates λ1, . . . , λn. A useful result, generalizing Equation (5.5), is that Xi is the smallest of these with probability λi/Σj λj. This is shown as follows:

P{Xi = min_j Xj} = P{Xi < min_{j≠i} Xj} = λi / Σ_{j=1}^{n} λj

where the final equality uses Equation (5.5) along with the fact that min_{j≠i} Xj is exponential with rate Σ_{j≠i} λj.

Another important fact is that mini Xi and the rank ordering of the Xi are independent. To see why this is true, consider the conditional probability that Xi1 < Xi2 < · · · < Xin given that the minimal value is greater than t. Because mini Xi > t means that all the Xi are greater than t, it follows from the lack of memory property of exponential random variables that their remaining lives beyond t remain independent exponential random variables with their original rates. Consequently,

P{Xi1 < · · · < Xin | min_i Xi > t} = P{Xi1 − t < · · · < Xin − t | min_i Xi > t} = P{Xi1 < · · · < Xin}

That is, we have proven the following.

Proposition If X1, . . . , Xn are independent exponential random variables with respective rates λ1, . . . , λn, then mini Xi is exponential with rate Σ_{i=1}^{n} λi. Further, mini Xi and the rank order of the variables X1, . . . , Xn are independent.

Example 5.8 Suppose you arrive at a post office having two clerks at a moment when both are busy but there is no one else waiting in line. You will enter service when either clerk becomes free. If service times for clerk i are exponential with rate λi, i = 1, 2, find E[T], where T is the amount of time that you spend in the post office.

Solution: Let Ri denote the remaining service time of the customer with clerk i, i = 1, 2, and note, by the lack of memory property of exponentials, that R1 and R2 are independent exponential random variables with respective rates λ1 and λ2. Conditioning on which of R1 or R2 is the smallest yields

E[T] = E[T | R1 < R2]P{R1 < R2} + E[T | R2 ≤ R1]P{R2 ≤ R1}
= E[T | R1 < R2] λ1/(λ1 + λ2) + E[T | R2 ≤ R1] λ2/(λ1 + λ2)

Now, with S denoting your service time,

E[T | R1 < R2] = E[R1 + S | R1 < R2]
= E[R1 | R1 < R2] + E[S | R1 < R2]
= E[R1 | R1 < R2] + 1/λ1
= 1/(λ1 + λ2) + 1/λ1

291

The final equation used that conditional on R1 < R2 the random variable R1 is the minimum of R1 and R2 and is thus exponential with rate λ1 + λ2 ; and also that conditional on R1 < R2 you are served by server 1. As we can show in a similar fashion that E[T |R2 " R1 ] = we obtain the result E[T ] =

1 1 + λ1 + λ2 λ2

3 λ1 + λ2

Another way to obtain E[T ] is to write T as a sum, take expectations, and then condition where needed. This approach yields E[T ] = E[min(R1 , R2 ) + S] = E[min(R1 , R2 )] + E[S] 1 = + E[S] λ1 + λ2

To compute E[S], we condition on which of R1 and R2 is smallest. E[S] = E[S|R1 < R2 ] =

2 λ1 + λ2

λ1 λ2 + E[S|R2 " R1 ] λ1 + λ2 λ1 + λ2

#
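A quick simulation sketch of Example 5.8 (the rate values below are arbitrary): the sample mean of the time spent in the post office should approach 3/(\lambda_1 + \lambda_2).

```python
import random

def time_in_post_office(lam1, lam2):
    # Remaining service times of the two customers currently in service
    r1, r2 = random.expovariate(lam1), random.expovariate(lam2)
    wait = min(r1, r2)
    # You are served by whichever clerk frees up first
    my_rate = lam1 if r1 < r2 else lam2
    return wait + random.expovariate(my_rate)

lam1, lam2, reps = 1.0, 2.0, 200000
avg = sum(time_in_post_office(lam1, lam2) for _ in range(reps)) / reps
print(avg, 3 / (lam1 + lam2))   # both should be close to 1.0
```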

Example 5.9 There are n cells in the body, of which cells 1, \ldots, k are target cells. Associated with each cell is a weight, with w_i being the weight associated with cell i, i = 1, \ldots, n. The cells are destroyed one at a time in a random order, which is such that if S is the current set of surviving cells then, independent of the order in which the cells not in S have been destroyed, the next cell killed is i, i \in S, with probability w_i / \sum_{j \in S} w_j. In other words, the probability that a given surviving cell is the next one to be killed is the weight of that cell divided by the sum of the weights of all still surviving cells. Let A denote the total number of cells that are still alive at the moment when all the cells 1, 2, \ldots, k have been killed, and find E[A].

Solution: Although it would be quite difficult to solve this problem by a direct combinatorial argument, a nice solution can be obtained by relating the order in which cells are killed to a ranking of independent exponential random variables. To do so, let X_1, \ldots, X_n be independent exponential random variables, with X_i having rate w_i, i = 1, \ldots, n. Note that X_i will be the smallest of these exponentials with probability w_i / \sum_j w_j; further, given that X_i is the smallest, X_r will be the next smallest with probability w_r / \sum_{j \ne i} w_j; further, given that X_i and X_r are, respectively, the first and second smallest, X_s, s \ne i, r, will be the third smallest with probability w_s / \sum_{j \ne i,r} w_j; and so on. Consequently, if we let I_j be the index of the jth smallest of X_1, \ldots, X_n—so that X_{I_1} < X_{I_2} < \cdots < X_{I_n}—then the order in which the cells are destroyed has the same distribution as I_1, \ldots, I_n. So, let us suppose that the order in which the cells are killed is determined by the ordering of X_1, \ldots, X_n. (Equivalently, we can suppose that all cells will eventually be killed, with cell i being killed at time X_i, i = 1, \ldots, n.)

If we let A_j equal 1 if cell j is still alive at the moment when all the cells 1, \ldots, k have been killed, and let it equal 0 otherwise, then

A = \sum_{j=k+1}^{n} A_j

Because cell j will be alive at the moment when all the cells 1, \ldots, k have been killed if X_j is larger than all the values X_1, \ldots, X_k, we see that for j > k

E[A_j] = P\{A_j = 1\} = P\{X_j > \max_{i=1,\ldots,k} X_i\}
 = \int_0^{\infty} P\{X_j > \max_{i=1,\ldots,k} X_i \mid X_j = x\} \, w_j e^{-w_j x} \, dx
 = \int_0^{\infty} P\{X_i < x \text{ for all } i = 1, \ldots, k\} \, w_j e^{-w_j x} \, dx
 = \int_0^{\infty} \prod_{i=1}^{k} (1 - e^{-w_i x}) \, w_j e^{-w_j x} \, dx
 = \int_0^{1} \prod_{i=1}^{k} (1 - y^{w_i/w_j}) \, dy

where the final equality follows from the substitution y = e^{-w_j x}. Thus, we obtain the result

E[A] = \sum_{j=k+1}^{n} \int_0^{1} \prod_{i=1}^{k} (1 - y^{w_i/w_j}) \, dy = \int_0^{1} \sum_{j=k+1}^{n} \prod_{i=1}^{k} (1 - y^{w_i/w_j}) \, dy

#
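The formula for E[A] can be checked numerically. The following sketch (with arbitrarily chosen weights) simulates the killing order via independent exponential lifetimes and compares the sample mean of A with a midpoint-rule evaluation of the integral.

```python
import math, random

def alive_count(w, k):
    # Cell i dies at time Exp(w_i); count non-target cells outliving all targets.
    times = [random.expovariate(wi) for wi in w]
    last_target_death = max(times[:k])
    return sum(t > last_target_death for t in times[k:])

w = [1.0, 2.0, 0.5, 1.5, 3.0]   # arbitrary weights; cells 1..k are the targets
k, reps = 2, 100000
sim = sum(alive_count(w, k) for _ in range(reps)) / reps

m = 20000   # midpoint rule on (0, 1)
integral = sum(
    sum(math.prod(1 - ((s + 0.5) / m) ** (w[i] / w[j]) for i in range(k))
        for j in range(k, len(w)))
    for s in range(m)
) / m
print(sim, integral)   # the two values should agree closely
```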

Example 5.10 Suppose that customers are in line to receive service that is provided sequentially by a server; whenever a service is completed, the next person in line enters the service facility. However, each waiting customer will only wait an exponentially distributed time with rate \theta; if its service has not yet begun by this time then it will immediately depart the system. These exponential times, one for each waiting customer, are independent. In addition, the service times are independent exponential random variables with rate \mu. Suppose that someone is presently being served and consider the person who is nth in line.

(a) Find P_n, the probability that this customer is eventually served.
(b) Find W_n, the conditional expected amount of time this person spends waiting in line given that she is eventually served.


Solution: Consider the n + 1 random variables consisting of the remaining service time of the person in service along with the n additional exponential departure times with rate \theta of the first n in line.

(a) Given that the smallest of these n + 1 independent exponentials is the departure time of the nth person in line, the conditional probability that this person will be served is 0; on the other hand, given that this person's departure time is not the smallest, the conditional probability that this person will be served is the same as if it were initially in position n - 1. Since the probability that a given departure time is the smallest of the n + 1 exponentials is \theta/(n\theta + \mu), we obtain

P_n = \frac{(n-1)\theta + \mu}{n\theta + \mu} P_{n-1}

Using the preceding with n - 1 replacing n gives

P_n = \frac{(n-1)\theta + \mu}{n\theta + \mu} \cdot \frac{(n-2)\theta + \mu}{(n-1)\theta + \mu} P_{n-2} = \frac{(n-2)\theta + \mu}{n\theta + \mu} P_{n-2}

Continuing in this fashion yields the result

P_n = \frac{\theta + \mu}{n\theta + \mu} P_1 = \frac{\mu}{n\theta + \mu}

(b) To determine an expression for W_n, we use the fact that the minimum of independent exponentials is, independent of their rank ordering, exponential with a rate equal to the sum of the rates. Since the time until the nth person in line enters service is the minimum of these n + 1 random variables plus the additional time thereafter, we see, upon using the lack of memory property of exponential random variables, that

W_n = \frac{1}{n\theta + \mu} + W_{n-1}

Repeating the preceding argument with successively smaller values of n yields the solution

W_n = \sum_{i=1}^{n} \frac{1}{i\theta + \mu}

#
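Both answers are easy to check with a stage-by-stage simulation (rates below arbitrary): at each stage the n impatience clocks and the service clock race, our customer's own clock wins with probability \theta/(n\theta + \mu), and otherwise the customer effectively moves up one position.

```python
import random

def simulate(n, theta, mu):
    # Returns (served?, waiting time) for the person initially nth in line.
    t = 0.0
    while n > 0:
        t += random.expovariate(n * theta + mu)    # time of next event
        if random.random() < theta / (n * theta + mu):
            return False, t                        # our impatience clock rang
        n -= 1                                     # we move up one position
    return True, t                                 # we enter service

theta, mu, n, reps = 0.5, 1.0, 3, 200000
served, wait_sum = 0, 0.0
for _ in range(reps):
    ok, t = simulate(n, theta, mu)
    if ok:
        served += 1
        wait_sum += t
print(served / reps, mu / (n * theta + mu))
print(wait_sum / served, sum(1 / (i * theta + mu) for i in range(1, n + 1)))
```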

5.2.4 Convolutions of Exponential Random Variables

Let X_i, i = 1, \ldots, n, be independent exponential random variables with respective rates \lambda_i, i = 1, \ldots, n, and suppose that \lambda_i \ne \lambda_j for i \ne j. The random variable \sum_{i=1}^{n} X_i is said to be a hypoexponential random variable. To compute its probability density function, let us start with the case n = 2. Now,

f_{X_1 + X_2}(t) = \int_0^t f_{X_1}(s) f_{X_2}(t - s) \, ds
 = \int_0^t \lambda_1 e^{-\lambda_1 s} \lambda_2 e^{-\lambda_2 (t-s)} \, ds
 = \lambda_1 \lambda_2 e^{-\lambda_2 t} \int_0^t e^{-(\lambda_1 - \lambda_2)s} \, ds
 = \frac{\lambda_1}{\lambda_1 - \lambda_2} \lambda_2 e^{-\lambda_2 t} (1 - e^{-(\lambda_1 - \lambda_2)t})
 = \frac{\lambda_1}{\lambda_1 - \lambda_2} \lambda_2 e^{-\lambda_2 t} + \frac{\lambda_2}{\lambda_2 - \lambda_1} \lambda_1 e^{-\lambda_1 t}

Using the preceding, a similar computation yields, when n = 3,

f_{X_1 + X_2 + X_3}(t) = \sum_{i=1}^{3} \Big( \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i} \Big) \lambda_i e^{-\lambda_i t}

which suggests the general result

f_{X_1 + \cdots + X_n}(t) = \sum_{i=1}^{n} C_{i,n} \lambda_i e^{-\lambda_i t}

where

C_{i,n} = \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i}

We will now prove the preceding formula by induction on n. Since we have already established it for n = 2, assume it for n and consider n + 1 arbitrary independent exponentials X_i with distinct rates \lambda_i, i = 1, \ldots, n + 1. If necessary, renumber X_1 and X_{n+1} so that \lambda_{n+1} < \lambda_1. Now,

f_{X_1 + \cdots + X_{n+1}}(t) = \int_0^t f_{X_1 + \cdots + X_n}(s) \lambda_{n+1} e^{-\lambda_{n+1}(t-s)} \, ds
 = \sum_{i=1}^{n} C_{i,n} \int_0^t \lambda_i e^{-\lambda_i s} \lambda_{n+1} e^{-\lambda_{n+1}(t-s)} \, ds
 = \sum_{i=1}^{n} C_{i,n} \Big( \frac{\lambda_i}{\lambda_i - \lambda_{n+1}} \lambda_{n+1} e^{-\lambda_{n+1} t} + \frac{\lambda_{n+1}}{\lambda_{n+1} - \lambda_i} \lambda_i e^{-\lambda_i t} \Big)
 = K_{n+1} \lambda_{n+1} e^{-\lambda_{n+1} t} + \sum_{i=1}^{n} C_{i,n+1} \lambda_i e^{-\lambda_i t}   (5.7)

where K_{n+1} = \sum_{i=1}^{n} C_{i,n} \lambda_i/(\lambda_i - \lambda_{n+1}) is a constant that does not depend on t. But we also have that

f_{X_1 + \cdots + X_{n+1}}(t) = \int_0^t f_{X_2 + \cdots + X_{n+1}}(s) \lambda_1 e^{-\lambda_1 (t-s)} \, ds


which implies, by the same argument that resulted in Equation (5.7), that for a constant K_1

f_{X_1 + \cdots + X_{n+1}}(t) = K_1 \lambda_1 e^{-\lambda_1 t} + \sum_{i=2}^{n+1} C_{i,n+1} \lambda_i e^{-\lambda_i t}

Equating these two expressions for f_{X_1 + \cdots + X_{n+1}}(t) yields

K_{n+1} \lambda_{n+1} e^{-\lambda_{n+1} t} + C_{1,n+1} \lambda_1 e^{-\lambda_1 t} = K_1 \lambda_1 e^{-\lambda_1 t} + C_{n+1,n+1} \lambda_{n+1} e^{-\lambda_{n+1} t}

Multiplying both sides of the preceding equation by e^{\lambda_{n+1} t} and then letting t \to \infty yields [since e^{-(\lambda_1 - \lambda_{n+1})t} \to 0 as t \to \infty]

K_{n+1} = C_{n+1,n+1}

and this, using Equation (5.7), completes the induction proof. Thus, we have shown that if S = \sum_{i=1}^{n} X_i, then

f_S(t) = \sum_{i=1}^{n} C_{i,n} \lambda_i e^{-\lambda_i t}   (5.8)

where

C_{i,n} = \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i}

Integrating both sides of the expression for f_S from t to \infty yields that the tail distribution function of S is given by

P\{S > t\} = \sum_{i=1}^{n} C_{i,n} e^{-\lambda_i t}   (5.9)

Hence, we obtain from Equations (5.8) and (5.9) that r_S(t), the failure rate function of S, is as follows:

r_S(t) = \frac{\sum_{i=1}^{n} C_{i,n} \lambda_i e^{-\lambda_i t}}{\sum_{i=1}^{n} C_{i,n} e^{-\lambda_i t}}

If we let \lambda_j = \min(\lambda_1, \ldots, \lambda_n), then it follows, upon multiplying the numerator and denominator of r_S(t) by e^{\lambda_j t}, that

\lim_{t \to \infty} r_S(t) = \lambda_j

From the preceding, we can conclude that the remaining lifetime of a hypoexponentially distributed item that has survived to age t is, for t large, approximately that of an exponentially distributed random variable with a rate equal to the minimum of the rates of the random variables whose sums make up the hypoexponential.
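The tail formula (5.9) is easy to test numerically; this sketch compares a simulated tail probability of S = X_1 + X_2 + X_3 with \sum_i C_{i,n} e^{-\lambda_i t} for arbitrarily chosen distinct rates.

```python
import math, random

def hypo_tail(t, rates):
    # P{S > t} from Equation (5.9): sum of C_{i,n} e^{-lambda_i t}
    n = len(rates)
    total = 0.0
    for i in range(n):
        c = math.prod(rates[j] / (rates[j] - rates[i])
                      for j in range(n) if j != i)
        total += c * math.exp(-rates[i] * t)
    return total

rates, t, reps = [1.0, 2.0, 3.5], 1.2, 200000
sim = sum(sum(random.expovariate(r) for r in rates) > t
          for _ in range(reps)) / reps
print(sim, hypo_tail(t, rates))
```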


Remark Although

1 = \int_0^{\infty} f_S(t) \, dt = \sum_{i=1}^{n} C_{i,n} = \sum_{i=1}^{n} \prod_{j \ne i} \frac{\lambda_j}{\lambda_j - \lambda_i}

it should not be thought that the C_{i,n}, i = 1, \ldots, n are probabilities, because some of them will be negative. Thus, while the form of the hypoexponential density is similar to that of the hyperexponential density (see Example 5.6) these two random variables are very different.

Example 5.11 Let X_1, \ldots, X_m be independent exponential random variables with respective rates \lambda_1, \ldots, \lambda_m, where \lambda_i \ne \lambda_j when i \ne j. Let N be independent of these random variables and suppose that \sum_{n=1}^{m} P_n = 1, where P_n = P\{N = n\}. The random variable

Y = \sum_{j=1}^{N} X_j

is said to be a Coxian random variable. Conditioning on N gives its density function:

f_Y(t) = \sum_{n=1}^{m} f_Y(t \mid N = n) P_n
 = \sum_{n=1}^{m} f_{X_1 + \cdots + X_n}(t \mid N = n) P_n
 = \sum_{n=1}^{m} f_{X_1 + \cdots + X_n}(t) P_n
 = \sum_{n=1}^{m} P_n \sum_{i=1}^{n} C_{i,n} \lambda_i e^{-\lambda_i t}

Let

r(n) = P\{N = n \mid N \ge n\}

If we interpret N as a lifetime measured in discrete time periods, then r(n) denotes the probability that an item will die in its nth period of use given that it has survived up to that time. Thus, r(n) is the discrete time analog of the failure rate function r(t), and is correspondingly referred to as the discrete time failure (or hazard) rate function.


Coxian random variables often arise in the following manner. Suppose that an item must go through m stages of treatment to be cured. However, suppose that after each stage there is a probability that the item will quit the program. If we suppose that the amounts of time that it takes the item to pass through the successive stages are independent exponential random variables, and that the probability that an item that has just completed stage n quits the program is (independent of how long it took to go through the n stages) equal to r (n), then the total time that an item spends in the program is a Coxian random variable. #

5.3 The Poisson Process

5.3.1 Counting Processes

A stochastic process \{N(t), t \ge 0\} is said to be a counting process if N(t) represents the total number of "events" that occur by time t. Some examples of counting processes are the following:

(a) If we let N(t) equal the number of persons who enter a particular store at or prior to time t, then \{N(t), t \ge 0\} is a counting process in which an event corresponds to a person entering the store. Note that if we had let N(t) equal the number of persons in the store at time t, then \{N(t), t \ge 0\} would not be a counting process (why not?).
(b) If we say that an event occurs whenever a child is born, then \{N(t), t \ge 0\} is a counting process when N(t) equals the total number of people who were born by time t. (Does N(t) include persons who have died by time t? Explain why it must.)
(c) If N(t) equals the number of goals that a given soccer player scores by time t, then \{N(t), t \ge 0\} is a counting process. An event of this process will occur whenever the soccer player scores a goal.

From its definition we see that for a counting process N(t) must satisfy:

(i) N(t) \ge 0.
(ii) N(t) is integer valued.
(iii) If s < t, then N(s) \le N(t).
(iv) For s < t, N(t) - N(s) equals the number of events that occur in the interval (s, t].

A counting process is said to possess independent increments if the numbers of events that occur in disjoint time intervals are independent. For example, this means that the number of events that occur by time 10 (that is, N(10)) must be independent of the number of events that occur between times 10 and 15 (that is, N(15) - N(10)). The assumption of independent increments might be reasonable for example (a), but it probably would be unreasonable for example (b). The reason for this is that if in example (b) N(t) is very large, then it is probable that there are many people alive at time t; this would lead us to believe that the number of new births between time t and time t + s would also tend to be large (that is, it does not seem reasonable that N(t) is independent of N(t + s) - N(t), and so \{N(t), t \ge 0\} would not have independent increments in example (b)). The assumption of independent increments in example (c) would be justified if we believed that the soccer player's chances of scoring a goal today do not depend on "how he's been going." It would not be justified if we believed in "hot streaks" or "slumps."

A counting process is said to possess stationary increments if the distribution of the number of events that occur in any interval of time depends only on the length of the time interval. In other words, the process has stationary increments if the number of events in the interval (s, s + t) has the same distribution for all s.

The assumption of stationary increments would only be reasonable in example (a) if there were no times of day at which people were more likely to enter the store. Thus, for instance, if there was a rush hour (say, between 12 P.M. and 1 P.M.) each day, then the stationarity assumption would not be justified. If we believed that the earth's population is basically constant (a belief not held at present by most scientists), then the assumption of stationary increments might be reasonable in example (b). Stationary increments do not seem to be a reasonable assumption in example (c) since, for one thing, most people would agree that the soccer player would probably score more goals while in the age bracket 25–30 than he would while in the age bracket 35–40. It may, however, be reasonable over a smaller time horizon, such as one year.

5.3.2 Definition of the Poisson Process

One of the most important types of counting process is the Poisson process. As a prelude to giving its definition, we define the concept of a function f(\cdot) being o(h).

Definition 5.1 The function f(\cdot) is said to be o(h) if

\lim_{h \to 0} \frac{f(h)}{h} = 0

Example 5.12
(a) The function f(x) = x^2 is o(h) since

\lim_{h \to 0} \frac{f(h)}{h} = \lim_{h \to 0} \frac{h^2}{h} = \lim_{h \to 0} h = 0

(b) The function f(x) = x is not o(h) since

\lim_{h \to 0} \frac{f(h)}{h} = \lim_{h \to 0} \frac{h}{h} = \lim_{h \to 0} 1 = 1 \ne 0

(c) If f(\cdot) is o(h) and g(\cdot) is o(h), then so is f(\cdot) + g(\cdot). This follows since

\lim_{h \to 0} \frac{f(h) + g(h)}{h} = \lim_{h \to 0} \frac{f(h)}{h} + \lim_{h \to 0} \frac{g(h)}{h} = 0 + 0 = 0

(d) If f(\cdot) is o(h), then so is g(\cdot) = c f(\cdot). This follows since

\lim_{h \to 0} \frac{c f(h)}{h} = c \lim_{h \to 0} \frac{f(h)}{h} = c \cdot 0 = 0

(e) From (c) and (d) it follows that any finite linear combination of functions, each of which is o(h), is o(h). #

In order for the function f(\cdot) to be o(h) it is necessary that f(h)/h go to zero as h goes to zero. But if h goes to zero, the only way for f(h)/h to go to zero is for f(h) to go to zero faster than h does. That is, for h small, f(h) must be small compared with h.

The o(h) notation can be used to make statements more precise. For instance, if X is continuous with density f and failure rate function \lambda(t), then the approximate statements

P(t < X < t + h) \approx f(t) h
P(t < X < t + h \mid X > t) \approx \lambda(t) h

can be precisely expressed as

P(t < X < t + h) = f(t) h + o(h)
P(t < X < t + h \mid X > t) = \lambda(t) h + o(h)

We are now in position to define the Poisson process.

Definition 5.2 The counting process \{N(t), t \ge 0\} is said to be a Poisson process with rate \lambda > 0 if the following axioms hold:

(i) N(0) = 0
(ii) \{N(t), t \ge 0\} has independent increments
(iii) P(N(t + h) - N(t) = 1) = \lambda h + o(h)
(iv) P(N(t + h) - N(t) \ge 2) = o(h)

The preceding is called a Poisson process because the number of events in any interval of length t is Poisson distributed with mean \lambda t, as is shown by the following important theorem.

Theorem 5.1 If \{N(t), t \ge 0\} is a Poisson process with rate \lambda > 0, then for all s > 0, t > 0, N(s + t) - N(s) is a Poisson random variable with mean \lambda t. That is, the number of events in any interval of length t is a Poisson random variable with mean \lambda t.

Proof. We begin by deriving E[e^{-u N(t)}], the Laplace transform of N(t). To do so, fix u > 0 and define

g(t) = E[e^{-u N(t)}]

We will obtain g(t) by deriving a differential equation as follows.

g(t + h) = E[e^{-u N(t+h)}]
 = E[e^{-u(N(t) + N(t+h) - N(t))}]
 = E[e^{-u N(t)} e^{-u(N(t+h) - N(t))}]
 = E[e^{-u N(t)}] E[e^{-u(N(t+h) - N(t))}]   (by independent increments)
 = g(t) E[e^{-u(N(t+h) - N(t))}]   (5.10)


Now, from Axioms (iii) and (iv),

P\{N(t + h) - N(t) = 0\} = 1 - \lambda h + o(h)
P\{N(t + h) - N(t) = 1\} = \lambda h + o(h)
P\{N(t + h) - N(t) \ge 2\} = o(h)

Conditioning on which of these three possibilities occurs gives that

E[e^{-u[N(t+h) - N(t)]}] = 1 - \lambda h + o(h) + e^{-u}(\lambda h + o(h)) + o(h)
 = 1 - \lambda h + e^{-u} \lambda h + o(h)   (5.11)

Therefore, from Equations (5.10) and (5.11) we obtain

g(t + h) = g(t)(1 + \lambda h (e^{-u} - 1) + o(h))

which can be written as

\frac{g(t + h) - g(t)}{h} = g(t) \lambda (e^{-u} - 1) + \frac{o(h)}{h}

Letting h \to 0 yields the differential equation

g'(t) = g(t) \lambda (e^{-u} - 1)

or

\frac{g'(t)}{g(t)} = \lambda (e^{-u} - 1)

Noting that the left side is the derivative of \log(g(t)) yields, upon integration, that

\log(g(t)) = \lambda (e^{-u} - 1) t + C

Because g(0) = E[e^{-u N(0)}] = 1 it follows that C = 0, and so the Laplace transform of N(t) is

E[e^{-u N(t)}] = g(t) = e^{\lambda t (e^{-u} - 1)}

However, if X is a Poisson random variable with mean \lambda t, then its Laplace transform is

E[e^{-u X}] = \sum_i e^{-u i} e^{-\lambda t} (\lambda t)^i / i! = e^{-\lambda t} \sum_i (\lambda t e^{-u})^i / i! = e^{-\lambda t} e^{\lambda t e^{-u}} = e^{\lambda t (e^{-u} - 1)}

Because the Laplace transform uniquely determines the distribution, we can thus conclude that N(t) is Poisson with mean \lambda t.

To show that N(s + t) - N(s) is also Poisson with mean \lambda t, fix s and let N_s(t) = N(s + t) - N(s) equal the number of events in the first t time units when we start our count at time s. It is now straightforward to verify that the counting process \{N_s(t), t \ge 0\} satisfies all the axioms for being a Poisson process with rate \lambda. Consequently, by our preceding result, we can conclude that N_s(t) is Poisson distributed with mean \lambda t. #


Remarks (i) The result that N(t), or more generally N(t + s) - N(s), has a Poisson distribution is a consequence of the Poisson approximation to the binomial distribution (see Section 2.2.4). To see this, subdivide the interval [0, t] into k equal parts where k is very large (Figure 5.1). Now it can be shown using axiom (iv) of Definition 5.2 that as k increases to \infty the probability of having two or more events in any of the k subintervals goes to 0. Hence, N(t) will (with a probability going to 1) just equal the number of subintervals in which an event occurs. However, by stationary and independent increments this number will have a binomial distribution with parameters k and p = \lambda t/k + o(t/k). Hence, by the Poisson approximation to the binomial we see by letting k approach \infty that N(t) will have a Poisson distribution with mean equal to

\lim_{k \to \infty} k \Big[ \lambda \frac{t}{k} + o\Big(\frac{t}{k}\Big) \Big] = \lambda t + \lim_{k \to \infty} t \, \frac{o(t/k)}{t/k} = \lambda t

by using the definition of o(h) and the fact that t/k \to 0 as k \to \infty.

(ii) Because the distribution of N(t + s) - N(s) is the same for all s, it follows that the Poisson process has stationary increments.
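Remark (i) can be illustrated numerically: as the number of subintervals k grows, the binomial probability of observing n events converges to the Poisson probability with mean \lambda t. (The parameter values below are arbitrary, and the o(t/k) term is dropped.)

```python
import math

lam, t, n = 2.0, 3.0, 4          # estimate P{N(t) = 4} with lambda*t = 6
for k in (10, 100, 1000, 10000):
    p = lam * t / k              # per-subinterval event probability
    binom = math.comb(k, n) * p**n * (1 - p)**(k - n)
    print(k, binom)
print("Poisson:", math.exp(-lam * t) * (lam * t)**n / math.factorial(n))
```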

5.3.3 Interarrival and Waiting Time Distributions

Consider a Poisson process, and let us denote the time of the first event by T_1. Further, for n > 1, let T_n denote the elapsed time between the (n - 1)st and the nth event. The sequence \{T_n, n = 1, 2, \ldots\} is called the sequence of interarrival times. For instance, if T_1 = 5 and T_2 = 10, then the first event of the Poisson process would have occurred at time 5 and the second at time 15.

We shall now determine the distribution of the T_n. To do so, we first note that the event \{T_1 > t\} takes place if and only if no events of the Poisson process occur in the interval [0, t] and thus,

P\{T_1 > t\} = P\{N(t) = 0\} = e^{-\lambda t}

Hence, T_1 has an exponential distribution with mean 1/\lambda. Now,

P\{T_2 > t\} = E[P\{T_2 > t \mid T_1\}]   (5.12)

However,

P\{T_2 > t \mid T_1 = s\} = P\{0 \text{ events in } (s, s + t] \mid T_1 = s\}
 = P\{0 \text{ events in } (s, s + t]\}
 = e^{-\lambda t}


where the last two equations followed from independent and stationary increments. Therefore, from Equation (5.12) we conclude that T_2 is also an exponential random variable with mean 1/\lambda and, furthermore, that T_2 is independent of T_1. Repeating the same argument yields the following.

Proposition 5.1 T_n, n = 1, 2, \ldots, are independent identically distributed exponential random variables having mean 1/\lambda.

Remark The proposition should not surprise us. The assumption of stationary and independent increments is basically equivalent to asserting that, at any point in time, the process probabilistically restarts itself. That is, the process from any point on is independent of all that has previously occurred (by independent increments), and also has the same distribution as the original process (by stationary increments). In other words, the process has no memory, and hence exponential interarrival times are to be expected.

Another quantity of interest is S_n, the arrival time of the nth event, also called the waiting time until the nth event. It is easily seen that

S_n = \sum_{i=1}^{n} T_i,   n \ge 1

and hence from Proposition 5.1 and the results of Section 2.2 it follows that S_n has a gamma distribution with parameters n and \lambda. That is, the probability density of S_n is given by

f_{S_n}(t) = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n-1)!},   t \ge 0   (5.13)

Equation (5.13) may also be derived by noting that the nth event will occur prior to or at time t if and only if the number of events occurring by time t is at least n. That is,

N(t) \ge n \iff S_n \le t

Hence,

F_{S_n}(t) = P\{S_n \le t\} = P\{N(t) \ge n\} = \sum_{j=n}^{\infty} e^{-\lambda t} \frac{(\lambda t)^j}{j!}

which, upon differentiation, yields

f_{S_n}(t) = -\sum_{j=n}^{\infty} \lambda e^{-\lambda t} \frac{(\lambda t)^j}{j!} + \sum_{j=n}^{\infty} \lambda e^{-\lambda t} \frac{(\lambda t)^{j-1}}{(j-1)!}
 = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n-1)!} + \sum_{j=n+1}^{\infty} \lambda e^{-\lambda t} \frac{(\lambda t)^{j-1}}{(j-1)!} - \sum_{j=n}^{\infty} \lambda e^{-\lambda t} \frac{(\lambda t)^j}{j!}
 = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n-1)!}
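Since S_n is a sum of n independent exponential interarrival times, the density (5.13) can be checked by simulation; the sketch below estimates f_{S_n}(t) from the fraction of samples falling in a small interval around t (all parameter values arbitrary).

```python
import math, random

lam, n, t, dt, reps = 1.5, 4, 2.0, 0.05, 400000
hits = sum(t <= sum(random.expovariate(lam) for _ in range(n)) < t + dt
           for _ in range(reps))
estimate = hits / reps / dt
exact = lam * math.exp(-lam * t) * (lam * t) ** (n - 1) / math.factorial(n - 1)
print(estimate, exact)
```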


Example 5.13 Suppose that people immigrate into a territory at a Poisson rate λ = 1 per day. (a) What is the expected time until the tenth immigrant arrives? (b) What is the probability that the elapsed time between the tenth and the eleventh arrival exceeds two days? Solution: (a) E[S10 ] = 10/λ = 10 days. (b) P{T11 > 2} = e−2λ = e−2 ≈ 0.133.

#

Proposition 5.1 also gives us another way of defining a Poisson process. Suppose we start with a sequence \{T_n, n \ge 1\} of independent identically distributed exponential random variables each having mean 1/\lambda. Now let us define a counting process by saying that the nth event of this process occurs at time

S_n \equiv T_1 + T_2 + \cdots + T_n

The resultant counting process \{N(t), t \ge 0\} will be Poisson with rate \lambda. (A formal definition of N(t) is given by N(t) \equiv \max\{n: S_n \le t\}, where S_0 \equiv 0.)

Remark Another way of obtaining the density function of S_n is to note that because S_n is the time of the nth event,

P\{t < S_n < t + h\} = P\{N(t) = n - 1, \text{ one event in } (t, t + h)\} + o(h)
 = P\{N(t) = n - 1\} P\{\text{one event in } (t, t + h)\} + o(h)
 = e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n-1)!} [\lambda h + o(h)] + o(h)
 = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n-1)!} h + o(h)

where the first equality uses the fact that the probability of 2 or more events in (t, t + h) is o(h). If we now divide both sides of the preceding equation by h and then let h \to 0, we obtain

f_{S_n}(t) = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n-1)!}

5.3.4 Further Properties of Poisson Processes

Consider a Poisson process \{N(t), t \ge 0\} having rate \lambda, and suppose that each time an event occurs it is classified as either a type I or a type II event. Suppose further that each event is classified as a type I event with probability p or a type II event with probability 1 - p, independently of all other events. For example, suppose that customers arrive at a store in accordance with a Poisson process having rate \lambda; and suppose that each arrival is male with probability 1/2 and female with probability 1/2. Then a type I event would correspond to a male arrival and a type II event to a female arrival.


Let N1 (t) and N2 (t) denote respectively the number of type I and type II events occurring in [0, t]. Note that N (t) = N1 (t) + N2 (t).

Proposition 5.2 \{N_1(t), t \ge 0\} and \{N_2(t), t \ge 0\} are both Poisson processes having respective rates \lambda p and \lambda(1 - p). Furthermore, the two processes are independent.

Proof. It is easy to verify that \{N_1(t), t \ge 0\} is a Poisson process with rate \lambda p by verifying that it satisfies the axioms of Definition 5.2.

• N_1(0) = 0 follows from the fact that N(0) = 0.
• It is easy to see that \{N_1(t), t \ge 0\} inherits the stationary and independent increment properties of the process \{N(t), t \ge 0\}. This is true because the distribution of the number of type I events in an interval can be obtained by conditioning on the number of events in that interval, and the distribution of this latter quantity depends only on the length of the interval and is independent of what has occurred in any nonoverlapping interval.
• P\{N_1(h) = 1\} = P\{N_1(h) = 1 \mid N(h) = 1\} P\{N(h) = 1\} + P\{N_1(h) = 1 \mid N(h) \ge 2\} P\{N(h) \ge 2\} = p(\lambda h + o(h)) + o(h) = \lambda p h + o(h)
• P\{N_1(h) \ge 2\} \le P\{N(h) \ge 2\} = o(h)

Thus we see that \{N_1(t), t \ge 0\} is a Poisson process with rate \lambda p and, by a similar argument, that \{N_2(t), t \ge 0\} is a Poisson process with rate \lambda(1 - p). Because the probability of a type I event in the interval from t to t + h is independent of all that occurs in intervals that do not overlap (t, t + h), it is independent of knowledge of when type II events occur, showing that the two Poisson processes are independent. (For another way of proving independence, see Example 3.23.) #

Example 5.14 If immigrants to area A arrive at a Poisson rate of ten per week, and if each immigrant is of English descent with probability 1/12, then what is the probability that no people of English descent will immigrate to area A during the month of February?

Solution: By the previous proposition it follows that the number of English immigrants arriving in area A during the month of February is Poisson distributed with mean 4 \cdot 10 \cdot \frac{1}{12} = \frac{10}{3}. Hence, the desired probability is e^{-10/3}. #
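Proposition 5.2 and the computation of Example 5.14 can both be checked with a short simulation (four weeks at rate ten per week, English descent with probability 1/12):

```python
import math, random

lam, p, T, reps = 10.0, 1 / 12, 4.0, 100000
type1_counts = []
for _ in range(reps):
    t, n1 = 0.0, 0
    while True:
        t += random.expovariate(lam)     # next arrival of the full process
        if t > T:
            break
        if random.random() < p:          # classified as type I
            n1 += 1
    type1_counts.append(n1)

print(sum(type1_counts) / reps, lam * p * T)                 # mean ~ 10/3
print(type1_counts.count(0) / reps, math.exp(-lam * p * T))  # P{none} ~ e^{-10/3}
```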

Example 5.15 Suppose nonnegative offers to buy an item that you want to sell arrive according to a Poisson process with rate λ. Assume that each offer is the value of a continuous random variable having density function f (x). Once the offer is presented to you, you must either accept it or reject it and wait for the next offer. We suppose that you incur costs at a rate c per unit time until the item is sold, and that your objective is to maximize your expected total return, where the total return is equal to the amount received minus the total cost incurred. Suppose you employ the policy of accepting the first offer that is greater than some specified value y. (Such a type of policy, which we call a y-policy, can be shown to be optimal.) What is the best value of y? What is the maximal expected net return?


Solution: Let us compute the expected total return when you use the y-policy, and then choose the value of y that maximizes this quantity. Let X denote the value of a random offer, and let \bar{F}(x) = P\{X > x\} = \int_x^{\infty} f(u) \, du be its tail distribution function. Because each offer will be greater than y with probability \bar{F}(y), it follows that such offers occur according to a Poisson process with rate \lambda \bar{F}(y). Hence, the time until an offer is accepted is an exponential random variable with rate \lambda \bar{F}(y). Letting R(y) denote the total return from the policy that accepts the first offer that is greater than y, we have

E[R(y)] = E[\text{accepted offer}] - c E[\text{time to accept}]
 = E[X \mid X > y] - \frac{c}{\lambda \bar{F}(y)}
 = \int_0^{\infty} x f_{X|X>y}(x) \, dx - \frac{c}{\lambda \bar{F}(y)}
 = \int_y^{\infty} x \frac{f(x)}{\bar{F}(y)} \, dx - \frac{c}{\lambda \bar{F}(y)}
 = \frac{\int_y^{\infty} x f(x) \, dx - c/\lambda}{\bar{F}(y)}   (5.14)

Differentiation yields

\frac{d}{dy} E[R(y)] = 0 \iff -\bar{F}(y) \, y f(y) + \Big( \int_y^{\infty} x f(x) \, dx - \frac{c}{\lambda} \Big) f(y) = 0

Therefore, the optimal value of y satisfies

y \bar{F}(y) = \int_y^{\infty} x f(x) \, dx - \frac{c}{\lambda}

or

y \int_y^{\infty} f(x) \, dx = \int_y^{\infty} x f(x) \, dx - \frac{c}{\lambda}

or

\int_y^{\infty} (x - y) f(x) \, dx = \frac{c}{\lambda}

It is not difficult to show that there is a unique value of y that satisfies the preceding. Hence, the optimal policy is the one that accepts the first offer that is greater than y^*, where y^* is such that

\int_{y^*}^{\infty} (x - y^*) f(x) \, dx = c/\lambda
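For a concrete instance, suppose offers are exponential with rate \mu (an assumption made only for this sketch). Then \int_y^{\infty} (x - y) f(x) \, dx = e^{-\mu y}/\mu, so y^* = \ln(\lambda/(c\mu))/\mu, and simulated returns should peak near y^*:

```python
import math, random

lam, mu, c = 2.0, 1.0, 0.5
y_star = math.log(lam / (c * mu)) / mu      # solves e^{-mu*y}/mu = c/lam

def net_return(y):
    # Wait for the first offer above y, paying cost c per unit time.
    t = 0.0
    while True:
        t += random.expovariate(lam)        # time of next offer
        offer = random.expovariate(mu)      # offer value
        if offer > y:
            return offer - c * t

reps = 100000
for y in (0.5 * y_star, y_star, 1.5 * y_star):
    avg = sum(net_return(y) for _ in range(reps)) / reps
    print(round(y, 3), round(avg, 3))       # maximum attained near y_star
```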


Putting y = y^* in Equation (5.14) shows that the maximal expected net return is

E[R(y^*)] = \frac{1}{\bar{F}(y^*)} \Big( \int_{y^*}^{\infty} (x - y^* + y^*) f(x) \, dx - c/\lambda \Big)
 = \frac{1}{\bar{F}(y^*)} \Big( \int_{y^*}^{\infty} (x - y^*) f(x) \, dx + y^* \int_{y^*}^{\infty} f(x) \, dx - c/\lambda \Big)
 = \frac{1}{\bar{F}(y^*)} \big( c/\lambda + y^* \bar{F}(y^*) - c/\lambda \big)
 = y^*

Thus, the optimal critical value is also the maximal expected net return. To understand why this is so, let m be the maximal expected net return, and note that when an offer is rejected the problem basically starts anew and so the maximal expected additional net return from then on is m. But this implies that it is optimal to accept an offer if and only if it is at least as large as m, showing that m is the optimal critical value. #

It follows from Proposition 5.2 that if each of a Poisson number of individuals is independently classified into one of two possible groups with respective probabilities p and 1 - p, then the number of individuals in each of the two groups will be independent Poisson random variables. Because this result easily generalizes to the case where the classification is into any one of r possible groups, we have the following application to a model of employees moving about in an organization.

Example 5.16 Consider a system in which individuals at any time are classified as being in one of r possible states, and assume that an individual changes states in accordance with a Markov chain having transition probabilities P_{ij}, i, j = 1, \ldots, r. That is, if an individual is in state i during a time period then, independently of its previous states, it will be in state j during the next time period with probability P_{ij}. The individuals are assumed to move through the system independently of each other. Suppose that the numbers of people initially in states 1, 2, \ldots, r are independent Poisson random variables with respective means \lambda_1, \lambda_2, \ldots, \lambda_r. We are interested in determining the joint distribution of the numbers of individuals in states 1, 2, \ldots, r at some time n.

Solution: For fixed i, let N_j(i), j = 1, \ldots, r denote the number of those individuals, initially in state i, that are in state j at time n. Now each of the (Poisson distributed) number of people initially in state i will, independently of each other, be in state j at time n with probability P^n_{ij}, where P^n_{ij} is the n-stage transition probability for the Markov chain having transition probabilities P_{ij}. Hence, the N_j(i), j = 1, \ldots, r will be independent Poisson random variables with respective means \lambda_i P^n_{ij}, j = 1, \ldots, r. Because the sum of independent Poisson random variables is itself a Poisson random variable, it follows that the number of individuals in state j at time n—namely \sum_{i=1}^{r} N_j(i)—will be independent Poisson random variables with respective means \sum_i \lambda_i P^n_{ij}, for j = 1, \ldots, r. #

Example 5.17 (The Coupon Collecting Problem) There are m different types of coupons. Each time a person collects a coupon it is, independently of ones previously obtained, a type j coupon with probability p_j, \sum_{j=1}^{m} p_j = 1. Let N denote the number of coupons one needs to collect in order to have a complete collection of at least one of each type. Find E[N].

Solution: If we let N_j denote the number one must collect to obtain a type j coupon, then we can express N as

N = \max_{1 \le j \le m} N_j

However, even though each N_j is geometric with parameter p_j, the foregoing representation of N is not that useful, because the random variables N_j are not independent.

We can, however, transform the problem into one of determining the expected value of the maximum of independent random variables. To do so, suppose that coupons are collected at times chosen according to a Poisson process with rate \lambda = 1. Say that an event of this Poisson process is of type j, 1 \le j \le m, if the coupon obtained at that time is a type j coupon. If we now let N_j(t) denote the number of type j coupons collected by time t, then it follows from Proposition 5.2 that \{N_j(t), t \ge 0\}, j = 1, \ldots, m are independent Poisson processes with respective rates \lambda p_j = p_j. Let X_j denote the time of the first event of the jth process, and let

X = \max_{1 \le j \le m} X_j

denote the time at which a complete collection is amassed. Since the X_j are independent exponential random variables with respective rates p_j, it follows that

P\{X < t\} = P\{\max_{1 \le j \le m} X_j < t\} = P\{X_j < t, \text{ for } j = 1, \ldots, m\} = \prod_{j=1}^{m} (1 - e^{-p_j t})

Therefore,

E[X] = \int_0^{\infty} P\{X > t\} \, dt = \int_0^{\infty} \Big( 1 - \prod_{j=1}^{m} (1 - e^{-p_j t}) \Big) dt   (5.15)

It remains to relate E[X], the expected time until one has a complete set, to E[N], the expected number of coupons it takes. This can be done by letting T_i denote the ith interarrival time of the Poisson process that counts the number of coupons obtained. Then it is easy to see that

X = \sum_{i=1}^{N} T_i


Since the T_i are independent exponentials with rate 1, and N is independent of the T_i, we see that

E[X \mid N] = N E[T_i] = N

Therefore,

E[X] = E[N]

and so E[N] is as given in Equation (5.15).

Let us now compute the expected number of types that appear only once in the complete collection. Letting I_i equal 1 if there is only a single type i coupon in the final set, and letting it equal 0 otherwise, we thus want

E\Big[ \sum_{i=1}^{m} I_i \Big] = \sum_{i=1}^{m} E[I_i] = \sum_{i=1}^{m} P\{I_i = 1\}

Now there will be a single type i coupon in the final set if a coupon of each type has appeared before the second coupon of type i is obtained. Thus, letting S_i denote the time at which the second type i coupon is obtained, we have

P\{I_i = 1\} = P\{X_j < S_i, \text{ for all } j \ne i\}

Using that S_i has a gamma distribution with parameters (2, p_i), this yields

P\{I_i = 1\} = \int_0^{\infty} P\{X_j < S_i \text{ for all } j \ne i \mid S_i = x\} \, p_i e^{-p_i x} p_i x \, dx
 = \int_0^{\infty} P\{X_j < x, \text{ for all } j \ne i\} \, p_i^2 x e^{-p_i x} \, dx
 = \int_0^{\infty} \prod_{j \ne i} (1 - e^{-p_j x}) \, p_i^2 x e^{-p_i x} \, dx

Therefore, we have the result

E\Big[ \sum_{i=1}^{m} I_i \Big] = \int_0^{\infty} \sum_{i=1}^{m} \prod_{j \ne i} (1 - e^{-p_j x}) \, p_i^2 x e^{-p_i x} \, dx
 = \int_0^{\infty} x \prod_{j=1}^{m} (1 - e^{-p_j x}) \sum_{i=1}^{m} p_i^2 \frac{e^{-p_i x}}{1 - e^{-p_i x}} \, dx   #
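The Poissonization identity E[N] = E[X] in (5.15) can be checked numerically; the sketch below simulates the coupon count directly and compares it with a truncated numerical integration of (5.15) (the probabilities are chosen arbitrarily).

```python
import math, random

p = [0.5, 0.3, 0.2]
reps, total = 100000, 0
for _ in range(reps):
    seen, n = set(), 0
    while len(seen) < len(p):
        n += 1
        u, acc = random.random(), 0.0
        for j, pj in enumerate(p):   # sample a coupon type
            acc += pj
            if u < acc:
                seen.add(j)
                break
    total += n
print(total / reps)

dt, T = 0.01, 200.0   # truncation point where the integrand is negligible
integral = sum(
    (1 - math.prod(1 - math.exp(-pj * (s + 0.5) * dt) for pj in p)) * dt
    for s in range(int(T / dt))
)
print(integral)
```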

The next probability calculation related to Poisson processes that we shall determine is the probability that n events occur in one Poisson process before m events have occurred in a second and independent Poisson process. More formally, let \{N_1(t), t \ge 0\} and \{N_2(t), t \ge 0\} be two independent Poisson processes having respective rates \lambda_1 and \lambda_2. Also, let S^1_n denote the time of the nth event of the first process, and S^2_m the time of the mth event of the second process. We seek

P\{S^1_n < S^2_m\}

Before attempting to calculate this for general n and m, let us consider the special case n = m = 1. Since S^1_1, the time of the first event of the N_1(t) process, and S^2_1, the time of the first event of the N_2(t) process, are both exponentially distributed random variables (by Proposition 5.1) with respective means 1/\lambda_1 and 1/\lambda_2, it follows from Section 5.2.3 that

P\{S^1_1 < S^2_1\} = \frac{\lambda_1}{\lambda_1 + \lambda_2}   (5.16)

Let us now consider the probability that two events occur in the N_1(t) process before a single event has occurred in the N_2(t) process. That is, P\{S^1_2 < S^2_1\}. To calculate this we reason as follows: In order for the N_1(t) process to have two events before a single event occurs in the N_2(t) process, it is first necessary for the initial event that occurs to be an event of the N_1(t) process (and this occurs, by Equation (5.16), with probability \lambda_1/(\lambda_1 + \lambda_2)). Now, given that the initial event is from the N_1(t) process, the next thing that must occur for S^1_2 to be less than S^2_1 is for the second event also to be an event of the N_1(t) process. However, when the first event occurs both processes start all over again (by the memoryless property of Poisson processes) and hence this conditional probability is also \lambda_1/(\lambda_1 + \lambda_2); thus, the desired probability is given by

P\{S^1_2 < S^2_1\} = \Big( \frac{\lambda_1}{\lambda_1 + \lambda_2} \Big)^2

In fact, this reasoning shows that each event that occurs is an event of the N_1(t) process with probability \lambda_1/(\lambda_1 + \lambda_2), independently of all that has previously occurred. Hence P\{S^1_n < S^2_m\} is the probability that n heads appear before m tails when flipping a coin having probability p = \lambda_1/(\lambda_1 + \lambda_2) of a head; since this happens if and only if the first n + m - 1 flips result in n or more heads,

P\{S^1_n < S^2_m\} = \sum_{k=n}^{n+m-1} \binom{n+m-1}{k} \Big( \frac{\lambda_1}{\lambda_1 + \lambda_2} \Big)^k \Big( \frac{\lambda_2}{\lambda_1 + \lambda_2} \Big)^{n+m-1-k}
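The coin-flipping formula can be verified by simulating the two processes directly, since S^1_n is a sum of n Exp(\lambda_1) interarrival times (parameters below arbitrary):

```python
import math, random

lam1, lam2, n, m, reps = 1.0, 2.0, 3, 2, 200000
wins = sum(
    sum(random.expovariate(lam1) for _ in range(n))
    < sum(random.expovariate(lam2) for _ in range(m))
    for _ in range(reps)
)
p = lam1 / (lam1 + lam2)
exact = sum(math.comb(n + m - 1, k) * p**k * (1 - p)**(n + m - 1 - k)
            for k in range(n, n + m))
print(wins / reps, exact)
```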

For n > 0 we will condition on when the first type 1 event occurs. With X equal to the time of the first type 1 event (or to \infty if there are no type 1 events), its distribution function is obtained by noting that

X \le y \iff N_1(y) > 0

thus showing that

F_X(y) = P(X \le y) = P(N_1(y) > 0) = 1 - e^{-\lambda \int_0^y \bar{G}(t-s) \, ds},   y \le t

Differentiating gives the density function of X:

f_X(y) = \lambda \bar{G}(t - y) \, e^{-\lambda \int_0^y \bar{G}(t-s) \, ds},   y \le t

To use the identity

P(R(t) = n) = \int_0^t P(R(t) = n \mid X = y) f_X(y) \, dy   (5.19)

note that if X = y \le t then the leading car that is on the road at time t entered at time y. Because all other cars that arrive between y and t will also be on the road at time t, it follows that, conditional on X = y, the number of cars on the road at time t will be distributed as 1 plus a Poisson random variable with mean \lambda(t - y). Therefore, for n > 0

P(R(t) = n \mid X = y) = \begin{cases} e^{-\lambda(t-y)} \dfrac{(\lambda(t-y))^{n-1}}{(n-1)!}, & \text{if } y \le t \\ 0, & \text{if } y = \infty \end{cases}


Substituting this into Equation (5.19) yields

P(R(t) = n) = \int_0^t \lambda \bar{G}(t - y) \, e^{-\lambda \int_0^y \bar{G}(t-s) \, ds} \, e^{-\lambda(t-y)} \frac{(\lambda(t - y))^{n-1}}{(n-1)!} \, dy

(b) Let T be the free travel time of the car that enters the road at time y, and let A(y) be its actual travel time. To determine P(A(y) < x), let t = y + x and note that A(y) will be less than x if and only if both T < x and there have been no type 1 events (using t = y + x) before time y. That is,

A(y) < x \iff T < x, \; N_1(y) = 0

Because T is independent of what has occurred prior to time y, the preceding gives

P(A(y) < x) = P(T < x) P(N_1(y) = 0)
 = G(x) \, e^{-\lambda \int_0^y \bar{G}(y+x-s) \, ds}
 = G(x) \, e^{-\lambda \int_x^{y+x} \bar{G}(u) \, du}

#

Example 5.20 (Tracking the Number of HIV Infections) There is a relatively long incubation period from the time when an individual becomes infected with the HIV virus, which causes AIDS, until the symptoms of the disease appear. As a result, it is difficult for public health officials to be certain of the number of members of the population that are infected at any given time. We will now present a first approximation model for this phenomenon, which can be used to obtain a rough estimate of the number of infected individuals.

Let us suppose that individuals contract the HIV virus in accordance with a Poisson process whose rate \lambda is unknown. Suppose that the time from when an individual becomes infected until symptoms of the disease appear is a random variable having a known distribution G. Suppose also that the incubation times of different infected individuals are independent.

Let N_1(t) denote the number of individuals who have shown symptoms of the disease by time t. Also, let N_2(t) denote the number who are HIV positive but have not yet shown any symptoms by time t. Now, since an individual who contracts the virus at time s will have symptoms by time t with probability G(t - s) and will not with probability \bar{G}(t - s), it follows from Proposition 5.3 that N_1(t) and N_2(t) are independent Poisson random variables with respective means

E[N_1(t)] = \lambda \int_0^t G(t - s) \, ds = \lambda \int_0^t G(y) \, dy

and

E[N_2(t)] = \lambda \int_0^t \bar{G}(t - s) \, ds = \lambda \int_0^t \bar{G}(y) \, dy

Now, if we knew \lambda, then we could use it to estimate N_2(t), the number of individuals infected but without any outward symptoms at time t, by its mean value E[N_2(t)].


However, since \lambda is unknown, we must first estimate it. Now, we will presumably know the value of N_1(t), and so we can use its known value as an estimate of its mean E[N_1(t)]. That is, if the number of individuals who have exhibited symptoms by time t is n_1, then we can estimate that

n_1 \approx E[N_1(t)] = \lambda \int_0^t G(y) \, dy

Therefore, we can estimate \lambda by the quantity \hat{\lambda} given by

\hat{\lambda} = n_1 \Big/ \int_0^t G(y) \, dy

Using this estimate of \lambda, we can estimate the number of infected but symptomless individuals at time t by

\text{estimate of } N_2(t) = \hat{\lambda} \int_0^t \bar{G}(y) \, dy = \frac{n_1 \int_0^t \bar{G}(y) \, dy}{\int_0^t G(y) \, dy}

For example, suppose that G is exponential with mean \mu. Then \bar{G}(y) = e^{-y/\mu}, and a simple integration gives that

\text{estimate of } N_2(t) = \frac{n_1 \mu (1 - e^{-t/\mu})}{t - \mu(1 - e^{-t/\mu})}

If we suppose that t = 16 years, \mu = 10 years, and n_1 = 220 thousand, then the estimate of the number of infected but symptomless individuals at time 16 is

\text{estimate} = \frac{2{,}200(1 - e^{-1.6})}{16 - 10(1 - e^{-1.6})} = 218.96

That is, if we suppose that the foregoing model is approximately correct (and we should be aware that the assumption of a constant infection rate \lambda that is unchanging over time is almost certainly a weak point of the model), then if the incubation period is exponential with mean 10 years and if the total number of individuals who have exhibited AIDS symptoms during the first 16 years of the epidemic is 220 thousand, then we can expect that approximately 219 thousand individuals are HIV positive though symptomless at time 16. #
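The closed-form estimate is a one-liner to reproduce (all quantities in thousands of individuals and years):

```python
import math

def symptomless_estimate(n1, t, mu):
    # n1 * integral of G-bar over [0,t] / integral of G over [0,t],
    # with G exponential with mean mu
    tail_integral = mu * (1 - math.exp(-t / mu))   # integral of e^{-y/mu}
    cdf_integral = t - tail_integral               # integral of 1 - e^{-y/mu}
    return n1 * tail_integral / cdf_integral

print(symptomless_estimate(220, 16, 10))           # ~ 218.96
```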

Proof of Proposition 5.3 Let us compute the joint probability P\{N_i(t) = n_i, i = 1, \ldots, k\}. To do so note first that in order for there to have been n_i type i events for i = 1, \ldots, k there must have been a total of \sum_{i=1}^{k} n_i events. Hence, conditioning on N(t) yields

P\{N_1(t) = n_1, \ldots, N_k(t) = n_k\}
 = P\Big\{N_1(t) = n_1, \ldots, N_k(t) = n_k \,\Big|\, N(t) = \sum_{i=1}^{k} n_i\Big\} P\Big\{N(t) = \sum_{i=1}^{k} n_i\Big\}

Now consider an arbitrary event that occurred in the interval [0, t]. If it had occurred at time s, then the probability that it would be a type i event would be P_i(s). Hence, since by Theorem 5.2 this event will have occurred at some time uniformly distributed on [0, t], it follows that the probability that this event will be a type i event is

P_i = \frac{1}{t} \int_0^t P_i(s) \, ds

independently of the other events. Hence,

P\Big\{N_i(t) = n_i, i = 1, \ldots, k \,\Big|\, N(t) = \sum_{i=1}^{k} n_i\Big\}

will just equal the multinomial probability of n_i type i outcomes for i = 1, \ldots, k when each of \sum_{i=1}^{k} n_i independent trials results in outcome i with probability P_i, i = 1, \ldots, k. That is,

P\Big\{N_1(t) = n_1, \ldots, N_k(t) = n_k \,\Big|\, N(t) = \sum_{i=1}^{k} n_i\Big\} = \frac{(\sum_{i=1}^{k} n_i)!}{n_1! \cdots n_k!} P_1^{n_1} \cdots P_k^{n_k}

Consequently,

P\{N_1(t) = n_1, \ldots, N_k(t) = n_k\} = \frac{(\sum_i n_i)!}{n_1! \cdots n_k!} P_1^{n_1} \cdots P_k^{n_k} \, e^{-\lambda t} \frac{(\lambda t)^{\sum_i n_i}}{(\sum_i n_i)!} = \prod_{i=1}^{k} e^{-\lambda t P_i} (\lambda t P_i)^{n_i} / n_i!

and the proof is complete. #

We now present some additional examples of the usefulness of Theorem 5.2. Example 5.21 Insurance claims are made at times distributed according to a Poisson process with rate λ; the successive claim amounts are independent random variables having distribution G with mean µ, and are independent of the claim arrival times.


Let S_i and C_i denote, respectively, the time and the amount of the ith claim. The total discounted cost of all claims made up to time t, call it D(t), is defined by

D(t) = \sum_{i=1}^{N(t)} e^{-\alpha S_i} C_i

where \alpha is the discount rate and N(t) is the number of claims made by time t. To determine the expected value of D(t), we condition on N(t) to obtain

E[D(t)] = \sum_{n=0}^{\infty} E[D(t) \mid N(t) = n] \, e^{-\lambda t} \frac{(\lambda t)^n}{n!}

Now, conditional on N(t) = n, the claim arrival times S_1, \ldots, S_n are distributed as the ordered values U_{(1)}, \ldots, U_{(n)} of n independent uniform (0, t) random variables U_1, \ldots, U_n. Therefore,

E[D(t) \mid N(t) = n] = E\Big[ \sum_{i=1}^{n} C_i e^{-\alpha U_{(i)}} \Big]
 = \sum_{i=1}^{n} E[C_i e^{-\alpha U_{(i)}}]
 = \sum_{i=1}^{n} E[C_i] E[e^{-\alpha U_{(i)}}]

where the final equality used the independence of the claim amounts and their arrival times. Because E[C_i] = \mu, continuing the preceding gives

E[D(t) \mid N(t) = n] = \mu \sum_{i=1}^{n} E[e^{-\alpha U_{(i)}}]
 = \mu E\Big[ \sum_{i=1}^{n} e^{-\alpha U_{(i)}} \Big]
 = \mu E\Big[ \sum_{i=1}^{n} e^{-\alpha U_i} \Big]

The last equality follows because U_{(1)}, \ldots, U_{(n)} are the values U_1, \ldots, U_n in increasing order, and so \sum_{i=1}^{n} e^{-\alpha U_{(i)}} = \sum_{i=1}^{n} e^{-\alpha U_i}. Continuing the string of equalities yields

E[D(t) \mid N(t) = n] = n \mu E[e^{-\alpha U}]
 = n \frac{\mu}{t} \int_0^t e^{-\alpha x} \, dx
 = n \frac{\mu}{\alpha t} (1 - e^{-\alpha t})


Therefore,

E[D(t) \mid N(t)] = N(t) \frac{\mu}{\alpha t} (1 - e^{-\alpha t})

Taking expectations yields the result

E[D(t)] = \frac{\lambda \mu}{\alpha} (1 - e^{-\alpha t})   #
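The result holds for any claim-size distribution with mean \mu; the sketch below checks it with exponentially distributed claim amounts (an arbitrary choice).

```python
import math, random

lam, alpha, t, mu, reps = 3.0, 0.5, 2.0, 4.0, 100000
total = 0.0
for _ in range(reps):
    s = random.expovariate(lam)               # first claim time
    while s < t:
        claim = random.expovariate(1 / mu)    # claim amount, mean mu
        total += math.exp(-alpha * s) * claim
        s += random.expovariate(lam)
print(total / reps, lam * mu * (1 - math.exp(-alpha * t)) / alpha)
```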

Example 5.22 (An Optimization Example) Suppose that items arrive at a processing plant in accordance with a Poisson process with rate \lambda. At a fixed time T, all items are dispatched from the system. The problem is to choose an intermediate time, t \in (0, T), at which all items in the system are dispatched, so as to minimize the total expected wait of all items.

If we dispatch at time t, 0 < t < T, then the expected total wait of all items will be

\frac{\lambda t^2}{2} + \frac{\lambda (T - t)^2}{2}

To see why this is true, we reason as follows: The expected number of arrivals in (0, t) is \lambda t, and each arrival is uniformly distributed on (0, t), and hence has expected wait t/2. Thus, the expected total wait of items arriving in (0, t) is \lambda t^2/2. Similar reasoning holds for arrivals in (t, T), and the preceding follows. To minimize this quantity, we differentiate with respect to t to obtain

\frac{d}{dt} \Big[ \lambda \frac{t^2}{2} + \lambda \frac{(T - t)^2}{2} \Big] = \lambda t - \lambda (T - t)

and equating to 0 shows that the dispatch time that minimizes the expected total wait is t = T/2. #
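A direct simulation of the two dispatches confirms the expression \lambda t^2/2 + \lambda(T-t)^2/2 for the expected total wait (parameter values arbitrary):

```python
import random

lam, T, t, reps = 5.0, 2.0, 0.7, 50000
total_wait = 0.0
for _ in range(reps):
    s = random.expovariate(lam)
    while s < T:
        total_wait += (t - s) if s < t else (T - s)   # wait until next dispatch
        s += random.expovariate(lam)
print(total_wait / reps, lam * t**2 / 2 + lam * (T - t)**2 / 2)
```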

We end this section with a result, quite similar in spirit to Theorem 5.2, which states that given S_n, the time of the nth event, the first n - 1 event times are distributed as the ordered values of a set of n - 1 random variables uniformly distributed on (0, S_n).

Proposition 5.4 Given that S_n = t, the set S_1, \ldots, S_{n-1} has the distribution of a set of n - 1 independent uniform (0, t) random variables.

Proof. We can prove the preceding in the same manner as we did Theorem 5.2, or we can argue more loosely as follows:

S_1, \ldots, S_{n-1} \mid S_n = t \sim S_1, \ldots, S_{n-1} \mid S_n = t, N(t^-) = n - 1
 \sim S_1, \ldots, S_{n-1} \mid N(t^-) = n - 1

where \sim means "has the same distribution as" and t^- is infinitesimally smaller than t. The result now follows from Theorem 5.2. #

5.3.6 Estimating Software Reliability

When a new computer software package is developed, a testing procedure is often put into effect to eliminate the faults, or bugs, in the package. One common procedure is to try the package on a set of well-known problems to see if any errors result. This goes on for some fixed time, with all resulting errors being noted. Then the testing stops and the package is carefully checked to determine the specific bugs that were responsible for the observed errors. The package is then altered to remove these bugs. Because we cannot be certain that all the bugs in the package have been eliminated, however, a problem of great importance is the estimation of the error rate of the revised software package.

To model the preceding, let us suppose that initially the package contains an unknown number, m, of bugs, which we will refer to as bug 1, bug 2, \ldots, bug m. Suppose also that bug i will cause errors to occur in accordance with a Poisson process having an unknown rate \lambda_i, i = 1, \ldots, m. Then, for instance, the number of errors due to bug i that occurs in any s units of operating time is Poisson distributed with mean \lambda_i s. Also suppose that these Poisson processes caused by bugs i, i = 1, \ldots, m are independent. In addition, suppose that the package is to be run for t time units with all resulting errors being noted. At the end of this time a careful check of the package is made to determine the specific bugs that caused the errors (that is, a debugging takes place). These bugs are removed, and the problem is then to determine the error rate for the revised package.

If we let

\psi_i(t) = \begin{cases} 1, & \text{if bug } i \text{ has not caused an error by } t \\ 0, & \text{otherwise} \end{cases}

then the quantity we wish to estimate is

\Lambda(t) = \sum_i \lambda_i \psi_i(t)

the error rate of the final package. To start, note that

E[\Lambda(t)] = \sum_i \lambda_i E[\psi_i(t)] = \sum_i \lambda_i e^{-\lambda_i t}   (5.20)

Now each of the bugs that is discovered would have been responsible for a certain number of errors. Let us denote by M_j(t) the number of bugs that were responsible for j errors, j \ge 1. That is, M_1(t) is the number of bugs that caused exactly one error, M_2(t) is the number that caused two errors, and so on, with \sum_j j M_j(t) equaling the total number of errors that resulted. To compute E[M_1(t)], let us define the indicator variables, I_i(t), i \ge 1, by

I_i(t) = \begin{cases} 1, & \text{bug } i \text{ causes exactly 1 error} \\ 0, & \text{otherwise} \end{cases}


Then,

M_1(t) = \sum_i I_i(t)

and so

E[M_1(t)] = \sum_i E[I_i(t)] = \sum_i \lambda_i t e^{-\lambda_i t}   (5.21)

Thus, from (5.20) and (5.21) we obtain the intriguing result that

E\Big[ \Lambda(t) - \frac{M_1(t)}{t} \Big] = 0   (5.22)

This suggests the possible use of M_1(t)/t as an estimate of \Lambda(t). To determine whether or not M_1(t)/t constitutes a "good" estimate of \Lambda(t) we shall look at how far apart these two quantities tend to be. That is, we will compute

E\Big[ \Big( \Lambda(t) - \frac{M_1(t)}{t} \Big)^2 \Big] = \text{Var}\Big( \Lambda(t) - \frac{M_1(t)}{t} \Big)   \text{[from (5.22)]}
 = \text{Var}(\Lambda(t)) - \frac{2}{t} \text{Cov}(\Lambda(t), M_1(t)) + \frac{1}{t^2} \text{Var}(M_1(t))

Now,

\text{Var}(\Lambda(t)) = \sum_i \lambda_i^2 \text{Var}(\psi_i(t)) = \sum_i \lambda_i^2 e^{-\lambda_i t} (1 - e^{-\lambda_i t}),
\text{Var}(M_1(t)) = \sum_i \text{Var}(I_i(t)) = \sum_i \lambda_i t e^{-\lambda_i t} (1 - \lambda_i t e^{-\lambda_i t}),
\text{Cov}(\Lambda(t), M_1(t)) = \text{Cov}\Big( \sum_i \lambda_i \psi_i(t), \sum_j I_j(t) \Big)
 = \sum_i \sum_j \text{Cov}(\lambda_i \psi_i(t), I_j(t))
 = \sum_i \lambda_i \text{Cov}(\psi_i(t), I_i(t))
 = -\sum_i \lambda_i e^{-\lambda_i t} \lambda_i t e^{-\lambda_i t}

where the last two equalities follow since \psi_i(t) and I_j(t) are independent when i \ne j because they refer to different Poisson processes and \psi_i(t) I_i(t) = 0. Hence, we obtain

E\Big[ \Big( \Lambda(t) - \frac{M_1(t)}{t} \Big)^2 \Big] = \sum_i \lambda_i^2 e^{-\lambda_i t} + \frac{1}{t} \sum_i \lambda_i e^{-\lambda_i t} = \frac{E[M_1(t) + 2M_2(t)]}{t^2}


where the last equality follows from (5.21) and the identity (which we leave as an exercise)

E[M_2(t)] = \frac{1}{2} \sum_i (\lambda_i t)^2 e^{-\lambda_i t}

Thus, we can estimate the average square of the difference between \Lambda(t) and M_1(t)/t by the observed value of M_1(t) + 2M_2(t) divided by t^2.

Example 5.23 Suppose that in 100 units of operating time 20 bugs are discovered, of which two resulted in exactly one error and three resulted in exactly two errors. Then we would estimate that \Lambda(100) is something akin to the value of a random variable whose mean is equal to M_1(100)/100 = 2/100 = 1/50 and whose variance is equal to (M_1(100) + 2M_2(100))/100^2 = 8/10{,}000. #
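The claim that E[(\Lambda(t) - M_1(t)/t)^2] equals E[M_1(t) + 2M_2(t)]/t^2 can be tested by simulation; the sketch below draws arbitrary bug rates, simulates the testing period, and compares the two averages.

```python
import random

random.seed(1)
m, t, reps = 50, 100.0, 2000
lams = [random.uniform(0.001, 0.05) for _ in range(m)]   # arbitrary bug rates

def one_test_run():
    leftover_rate, m1, m2 = 0.0, 0, 0
    for lam in lams:
        errors, s = 0, random.expovariate(lam)
        while s < t:                      # errors from this bug during testing
            errors += 1
            s += random.expovariate(lam)
        if errors == 0:
            leftover_rate += lam          # undiscovered bug remains
        elif errors == 1:
            m1 += 1
        elif errors == 2:
            m2 += 1
    return (leftover_rate - m1 / t) ** 2, (m1 + 2 * m2) / t**2

sq_err = var_est = 0.0
for _ in range(reps):
    a, b = one_test_run()
    sq_err += a
    var_est += b
print(sq_err / reps, var_est / reps)      # the two averages should be close
```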

5.4 Generalizations of the Poisson Process

5.4.1 Nonhomogeneous Poisson Process

In this section we consider two generalizations of the Poisson process. The first of these is the nonhomogeneous, also called the nonstationary, Poisson process, which is obtained by allowing the arrival rate at time t to be a function of t.

Definition 5.3 The counting process \{N(t), t \ge 0\} is said to be a nonhomogeneous Poisson process with intensity function \lambda(t), t \ge 0, if

(i) N(0) = 0.
(ii) \{N(t), t \ge 0\} has independent increments.
(iii) P\{N(t + h) - N(t) \ge 2\} = o(h).
(iv) P\{N(t + h) - N(t) = 1\} = \lambda(t) h + o(h).

The function m(t) defined by

m(t) = \int_0^t \lambda(y) \, dy

is called the mean value function of the nonhomogeneous Poisson process, for reasons indicated in the following important theorem.

Theorem 5.3 If \{N(t), t \ge 0\} is a nonstationary Poisson process with intensity function \lambda(t), t \ge 0, then N(t + s) - N(s) is a Poisson random variable with mean m(t + s) - m(s) = \int_s^{s+t} \lambda(y) \, dy.

Proof. We first show that N(t) is Poisson with mean m(t) by mimicking the proof of Theorem 5.1 for the stationary Poisson process. Letting g(t) = E[e^{-u N(t)}] and following the exact steps of that proof leads us to the equation

g(t + h) = g(t) E[e^{-u N_t(h)}]

where N_t(h) = N(t + h) - N(t). Using that P(N_t(h) = 0) = 1 - \lambda(t) h + o(h), we obtain from Axioms (iii) and (iv), upon conditioning on whether N_t(h) is 0, 1, or \ge 2, that

g(t + h) = g(t)(1 - \lambda(t) h + e^{-u} \lambda(t) h + o(h))

Hence,

g(t + h) - g(t) = g(t) \lambda(t) (e^{-u} - 1) h + o(h)

Dividing by h and letting h \to 0 yields the differential equation

g'(t) = g(t) \lambda(t) (e^{-u} - 1)

which can be written as

\frac{g'(t)}{g(t)} = \lambda(t) (e^{-u} - 1)

Integrating both sides from 0 to t gives

\log(g(t)) - \log(g(0)) = (e^{-u} - 1) \int_0^t \lambda(s) \, ds

Using that g(0) = 1 and that \int_0^t \lambda(s) \, ds = m(t), the preceding shows that

g(t) = \exp\{m(t)(e^{-u} - 1)\}

Thus E[e^{-u N(t)}], the Laplace transform of N(t), is \exp\{m(t)(e^{-u} - 1)\}. Because the latter is the Laplace transform of a Poisson random variable with mean m(t), it follows that N(t) is Poisson with mean m(t).

The theorem now follows by noting that, with N_s(t) = N(s + t) - N(s), the counting process \{N_s(t), t \ge 0\} is a nonstationary Poisson process with intensity function \lambda_s(t) = \lambda(s + t), t > 0. Consequently, N_s(t) is Poisson with mean

\int_0^t \lambda_s(y) \, dy = \int_0^t \lambda(s + y) \, dy = \int_s^{s+t} \lambda(x) \, dx

and the result is proven. #

Remark That N(s + t) - N(s) has a Poisson distribution with mean \int_s^{s+t} \lambda(y) \, dy is a consequence of the Poisson limit of the sum of independent Bernoulli random variables (see Example 2.47). To see why, subdivide the interval [s, s + t] into n subintervals of length t/n, where subinterval i goes from s + (i - 1)\frac{t}{n} to s + i\frac{t}{n}, i = 1, \ldots, n. Let N_i = N(s + i\frac{t}{n}) - N(s + (i - 1)\frac{t}{n}) be the number of events that occur in subinterval i, and note that

P\{\ge 2 \text{ events in some subinterval}\} = P\Big( \bigcup_{i=1}^{n} \{N_i \ge 2\} \Big) \le \sum_{i=1}^{n} P\{N_i \ge 2\} = n \, o(t/n)   \text{by Axiom (iii)}

Because

\lim_{n \to \infty} n \, o(t/n) = \lim_{n \to \infty} t \, \frac{o(t/n)}{t/n} = 0

it follows that, as n increases to \infty, the probability of having two or more events in any of the n subintervals goes to 0. Consequently, with a probability going to 1, N(t) will equal the number of subintervals in which an event occurs. Because the probability of an event in subinterval i is \lambda(s + i\frac{t}{n})\frac{t}{n} + o(\frac{t}{n}), it follows, because the number of events in different subintervals are independent, that when n is large the number of subintervals that contain an event is approximately a Poisson random variable with mean

\sum_{i=1}^{n} \lambda\Big(s + i\frac{t}{n}\Big)\frac{t}{n} + n \, o(t/n)

But,

\lim_{n \to \infty} \Big[ \sum_{i=1}^{n} \lambda\Big(s + i\frac{t}{n}\Big)\frac{t}{n} + n \, o(t/n) \Big] = \int_s^{s+t} \lambda(y) \, dy

and the result follows. #

Time sampling an ordinary Poisson process generates a nonhomogeneous Poisson process. That is, let \{N(t), t \ge 0\} be a Poisson process with rate \lambda, and suppose that an event occurring at time t is, independently of what has occurred prior to t, counted with probability p(t). With N_c(t) denoting the number of counted events by time t, the counting process \{N_c(t), t \ge 0\} is a nonhomogeneous Poisson process with intensity function \lambda(t) = \lambda p(t). This is verified by noting that \{N_c(t), t \ge 0\} satisfies the nonhomogeneous Poisson process axioms.

1. N_c(0) = 0.
2. The number of counted events in (s, s + t) depends solely on the number of events of the Poisson process in (s, s + t), which is independent of what has occurred prior to time s. Consequently, the number of counted events in (s, s + t) is independent of the process of counted events prior to s, thus establishing the independent increment property.
3. Let N_c(t, t + h) = N_c(t + h) - N_c(t), with a similar definition of N(t, t + h). Then

P\{N_c(t, t + h) \ge 2\} \le P\{N(t, t + h) \ge 2\} = o(h)

4. To compute P\{N_c(t, t + h) = 1\}, condition on N(t, t + h):

P\{N_c(t, t + h) = 1\} = P\{N_c(t, t + h) = 1 \mid N(t, t + h) = 1\} P\{N(t, t + h) = 1\}
 + P\{N_c(t, t + h) = 1 \mid N(t, t + h) \ge 2\} P\{N(t, t + h) \ge 2\}
 = P\{N_c(t, t + h) = 1 \mid N(t, t + h) = 1\} \lambda h + o(h)
 = p(t) \lambda h + o(h)
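For instance, counting each event at time s with probability p(s) = e^{-s} (an arbitrary choice) gives E[N_c(T)] = \lambda \int_0^T e^{-s} \, ds, which a simulation reproduces:

```python
import math, random

lam, T, reps = 5.0, 4.0, 50000
total = 0
for _ in range(reps):
    s = random.expovariate(lam)
    while s < T:
        if random.random() < math.exp(-s):   # count with probability p(s)
            total += 1
        s += random.expovariate(lam)
print(total / reps, lam * (1 - math.exp(-T)))
```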


The importance of the nonhomogeneous Poisson process resides in the fact that we no longer require the condition of stationary increments. Thus we now allow for the possibility that events may be more likely to occur at certain times than during other times.

Example 5.24 Siegbert runs a hot dog stand that opens at 8 A.M. From 8 until 11 A.M. customers seem to arrive, on the average, at a steadily increasing rate that starts with an initial rate of 5 customers per hour at 8 A.M. and reaches a maximum of 20 customers per hour at 11 A.M. From 11 A.M. until 1 P.M. the (average) rate seems to remain constant at 20 customers per hour. However, the (average) arrival rate then drops steadily from 1 P.M. until closing time at 5 P.M., at which time it has the value of 12 customers per hour. If we assume that the numbers of customers arriving at Siegbert's stand during disjoint time periods are independent, then what is a good probability model for the preceding? What is the probability that no customers arrive between 8:30 A.M. and 9:30 A.M. on Monday morning? What is the expected number of arrivals in this period?

Solution: A good model for the preceding would be to assume that arrivals constitute a nonhomogeneous Poisson process with intensity function λ(t) given by

λ(t) = 5 + 5t,          0 ≤ t ≤ 3
λ(t) = 20,              3 ≤ t ≤ 5
λ(t) = 20 − 2(t − 5),   5 ≤ t ≤ 9

and

λ(t) = λ(t − 9)   for t > 9

Note that N(t) represents the number of arrivals during the first t hours that the store is open. That is, we do not count the hours between 5 P.M. and 8 A.M. If for some reason we wanted N(t) to represent the number of arrivals during the first t hours regardless of whether the store was open or not, then, assuming that the process begins at midnight, we would let

λ(t) = 0,               0 ≤ t ≤ 8
λ(t) = 5 + 5(t − 8),    8 ≤ t ≤ 11
λ(t) = 20,              11 ≤ t ≤ 13
λ(t) = 20 − 2(t − 13),  13 ≤ t ≤ 17
λ(t) = 0,               17 < t ≤ 24

and

λ(t) = λ(t − 24)   for t > 24

As the number of arrivals between 8:30 A.M. and 9:30 A.M. will be Poisson with mean m(3/2) − m(1/2) in the first representation (and m(19/2) − m(17/2) in the second representation), we have that the probability that this number is zero is

exp{ −∫_{1/2}^{3/2} (5 + 5t) dt } = e^{−10}


and the mean number of arrivals is

∫_{1/2}^{3/2} (5 + 5t) dt = 10   #
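A reader who wants to reproduce the numbers in Example 5.24 can do so with a short numerical integration; the sketch below is illustrative and not from the text. It evaluates m(3/2) − m(1/2) for the first representation and the corresponding probability of no arrivals.

```python
from math import exp

# Sketch of the Example 5.24 computation: integrate the intensity lambda(t)
# numerically to get m(t); then P{no arrivals in (a, b)} = exp(-(m(b)-m(a)))
# and the expected number of arrivals in (a, b) is m(b) - m(a).
def lam(t):                   # first representation: t = hours after 8 A.M.
    t = t % 9                 # the pattern repeats each 9-hour business day
    if t <= 3:
        return 5 + 5 * t
    if t <= 5:
        return 20
    return 20 - 2 * (t - 5)

def m(t, steps=100000):       # m(t) = integral of lambda from 0 to t
    h = t / steps
    return sum(lam((i + 0.5) * h) for i in range(steps)) * h

mean = m(1.5) - m(0.5)        # arrivals between 8:30 and 9:30 A.M.
print(mean, exp(-mean))       # about 10 and e**-10
```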

Suppose that events occur according to a Poisson process with rate λ, and suppose that, independent of what has previously occurred, an event at time s is a type 1 event with probability P1(s) or a type 2 event with probability P2(s) = 1 − P1(s). If Ni(t), t ≥ 0, denotes the number of type i events by time t, then it easily follows from Definition 5.3 that {N1(t), t ≥ 0} and {N2(t), t ≥ 0} are independent nonhomogeneous Poisson processes with respective intensity functions λi(t) = λPi(t), i = 1, 2. (The proof mimics that of Proposition 5.2.) This result gives us another way of understanding (or of proving) the time sampling Poisson process result of Proposition 5.3, which states that N1(t) and N2(t) are independent Poisson random variables with means E[Ni(t)] = λ ∫_0^t Pi(s) ds, i = 1, 2.

Example 5.25 (The Output Process of an Infinite Server Poisson Queue) It turns out that the output process of the M/G/∞ queue—that is, of the infinite server queue having Poisson arrivals and general service distribution G—is a nonhomogeneous Poisson process having intensity function λ(t) = λG(t). To verify this claim, let us first argue that the departure process has independent increments. Towards this end, consider nonoverlapping intervals O1, . . . , Ok; now say that an arrival is type i, i = 1, . . . , k, if that arrival departs in the interval Oi. By Proposition 5.3, it follows that the numbers of departures in these intervals are independent, thus establishing independent increments. Now, suppose that an arrival is “counted” if that arrival departs between t and t + h. Because an arrival at time s, s < t + h, will be counted with probability P(s), where

P(s) = G(t + h − s) − G(t − s),   if s < t
P(s) = G(t + h − s),              if t < s < t + h

it follows from Proposition 5.3 that the number of departures in (t, t + h) is a Poisson random variable with mean

λ ∫_0^{t+h} P(s) ds = λ ∫_0^{t+h} G(t + h − s) ds − λ ∫_0^t G(t − s) ds
                    = λ ∫_0^{t+h} G(y) dy − λ ∫_0^t G(y) dy
                    = λ ∫_t^{t+h} G(y) dy
                    = λG(t)h + o(h)

Therefore,

P{1 departure in (t, t + h)} = λG(t)h e^{−λG(t)h} + o(h) = λG(t)h + o(h)

and

P{≥ 2 departures in (t, t + h)} = o(h)

which completes the verification.

#
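The claim of Example 5.25 can also be checked by simulation. The following sketch assumes, purely for illustration, arrival rate λ = 2 and exponential service with rate 1; the number of departures by time t should then have mean λ ∫_0^t G(y) dy.

```python
import random

# Simulation sketch of the M/G/infinity output process (assumed parameters:
# lam = 2, exponential service with rate mu = 1). The number of departures
# by time t is Poisson with mean lam * integral of G over [0, t], which for
# this G equals 2 * (5 - (1 - e**-5)) when t = 5, roughly 8.01.
def departures_by(t, lam=2.0, mu=1.0, rng=random):
    count, s = 0, 0.0
    while True:
        s += rng.expovariate(lam)            # next arrival time
        if s > t:
            return count
        if s + rng.expovariate(mu) <= t:     # departure = arrival + service
            count += 1

t = 5.0
samples = [departures_by(t) for _ in range(20000)]
print(sum(samples) / len(samples))           # compare with about 8.01
```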


If we let Sn denote the time of the nth event of the nonhomogeneous Poisson process, then we can obtain its density as follows:

P{t < Sn < t + h} = P{N(t) = n − 1, one event in (t, t + h)} + o(h)
                  = P{N(t) = n − 1}P{one event in (t, t + h)} + o(h)
                  = e^{−m(t)} [m(t)]^{n−1}/(n − 1)! [λ(t)h + o(h)] + o(h)
                  = λ(t)e^{−m(t)} [m(t)]^{n−1}/(n − 1)! h + o(h)

which implies that

f_{Sn}(t) = λ(t)e^{−m(t)} [m(t)]^{n−1}/(n − 1)!

where

m(t) = ∫_0^t λ(s) ds
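One way to sanity-check this density is to simulate Sn directly. The sketch below assumes, for illustration only, λ(t) = 2t, so that m(t) = t², and uses the fact (see Exercise 83) that the event times of the nonhomogeneous process can be obtained by applying m^{-1} to the event times of a rate-1 Poisson process.

```python
import random, math

# Quick sketch checking the density just derived: for lambda(t) = 2t
# (so m(t) = t**2), S_n has density
# f(t) = lambda(t) * exp(-m(t)) * m(t)**(n-1) / (n-1)!.
def f_Sn(t, n):
    lam_t, m_t = 2 * t, t * t
    return lam_t * math.exp(-m_t) * m_t ** (n - 1) / math.factorial(n - 1)

def sample_Sn(n, rng=random):
    # the n-th event time of the nonhomogeneous process is m^{-1} of the
    # n-th event time of a rate-1 Poisson process; here m^{-1}(x) = sqrt(x)
    total = sum(rng.expovariate(1.0) for _ in range(n))
    return math.sqrt(total)

n = 3
samples = sorted(sample_Sn(n) for _ in range(100000))
print(samples[len(samples) // 2])   # empirical median of S_n; compare with
                                    # the median of f_Sn found numerically
```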

5.4.2 Compound Poisson Process

A stochastic process {X(t), t ≥ 0} is said to be a compound Poisson process if it can be represented as

X(t) = Σ_{i=1}^{N(t)} Yi,   t ≥ 0    (5.23)

where {N(t), t ≥ 0} is a Poisson process, and {Yi, i ≥ 1} is a family of independent and identically distributed random variables that is also independent of {N(t), t ≥ 0}. As noted in Chapter 3, the random variable X(t) is said to be a compound Poisson random variable.

Examples of Compound Poisson Processes

(i) If Yi ≡ 1, then X(t) = N(t), and so we have the usual Poisson process.
(ii) Suppose that buses arrive at a sporting event in accordance with a Poisson process, and suppose that the numbers of fans in each bus are assumed to be independent and identically distributed. Then {X(t), t ≥ 0} is a compound Poisson process where X(t) denotes the number of fans who have arrived by t. In Equation (5.23), Yi represents the number of fans in the ith bus.
(iii) Suppose customers leave a supermarket in accordance with a Poisson process. If the Yi, the amount spent by the ith customer, i = 1, 2, . . . , are independent and identically distributed, then {X(t), t ≥ 0} is a compound Poisson process when X(t) denotes the total amount of money spent by time t. #


Because X(t) is a compound Poisson random variable with Poisson parameter λt, we have from Examples 3.10 and 3.17 that

E[X(t)] = λtE[Y1]    (5.24)

and

Var(X(t)) = λtE[Y1²]    (5.25)

Example 5.26 Suppose that families migrate to an area at a Poisson rate λ = 2 per week. If the number of people in each family is independent and takes on the values 1, 2, 3, 4 with respective probabilities 1/6, 1/3, 1/3, 1/6, then what is the expected value and variance of the number of individuals migrating to this area during a fixed five-week period?

Solution: Letting Yi denote the number of people in the ith family, we have

E[Yi] = 1 · 1/6 + 2 · 1/3 + 3 · 1/3 + 4 · 1/6 = 5/2
E[Yi²] = 1² · 1/6 + 2² · 1/3 + 3² · 1/3 + 4² · 1/6 = 43/6

Hence, letting X(5) denote the number of immigrants during a five-week period, we obtain from Equations (5.24) and (5.25) that

E[X(5)] = 2 · 5 · 5/2 = 25

and

Var(X(5)) = 2 · 5 · 43/6 = 215/3

#
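Example 5.26 is easy to confirm by simulation. In the sketch below the family-size distribution and rates are those of the example; the sample mean and variance should be close to 25 and 215/3 ≈ 71.67.

```python
import random

# Simulation sketch of Example 5.26: families arrive at Poisson rate 2 per
# week for 5 weeks, each of size 1,2,3,4 with probabilities 1/6,1/3,1/3,1/6.
def migrants(weeks=5, lam=2.0, rng=random):
    t, total = 0.0, 0
    while True:
        t += rng.expovariate(lam)              # next family arrival
        if t > weeks:
            return total
        total += rng.choices([1, 2, 3, 4], weights=[1, 2, 2, 1])[0]

xs = [migrants() for _ in range(50000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)   # close to 25 and 71.67
```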

Example 5.27 (Busy Periods in Single-Server Poisson Arrival Queues) Consider a single-server service station in which customers arrive according to a Poisson process having rate λ. An arriving customer is immediately served if the server is free; if not, the customer waits in line (that is, he or she joins the queue). The successive service times are independent with a common distribution. Such a system will alternate between idle periods when there are no customers in the system, so the server is idle, and busy periods when there are customers in the system, so the server is busy. A busy period will begin when an arrival finds the system empty, and because of the memoryless property of the Poisson arrivals it follows that the distribution of the length of a busy period will be the same for each such period. Let B denote the length of a busy period. We will compute its mean and variance. To begin, let S denote the service time of the first customer in the busy period and let N (S) denote the number of arrivals during that time. Now, if N (S) = 0 then the busy period will end when the initial customer completes his service, and so B will equal S in this case. Now, suppose that one customer arrives during the service time of the initial customer. Then, at time S there will be a single customer in the system who is just about to enter service. Because the arrival stream from time S on will still be a Poisson process with rate λ, it thus follows that the additional time from S until


the system becomes empty will have the same distribution as a busy period. That is, if N(S) = 1 then

B = S + B1

where B1 is independent of S and has the same distribution as B. Now, consider the general case where N(S) = n, so there will be n customers waiting when the server finishes his initial service. To determine the distribution of the remaining time in the busy period note that the order in which customers are served will not affect the remaining time. Hence, let us suppose that the n arrivals, call them C1, . . . , Cn, during the initial service period are served as follows. Customer C1 is served first, but C2 is not served until the only customers in the system are C2, . . . , Cn. For instance, any customers arriving during C1's service time will be served before C2. Similarly, C3 is not served until the system is free of all customers but C3, . . . , Cn, and so on. A little thought reveals that the times between the beginnings of service of customers Ci and Ci+1, i = 1, . . . , n − 1, and the time from the beginning of service of Cn until there are no customers in the system, are independent random variables, each distributed as a busy period. It follows from the preceding that if we let B1, B2, . . . be a sequence of independent random variables, each distributed as a busy period, then we can express B as

B = S + Σ_{i=1}^{N(S)} Bi

Hence,

E[B|S] = S + E[ Σ_{i=1}^{N(S)} Bi | S ]

and

Var(B|S) = Var( Σ_{i=1}^{N(S)} Bi | S )

However, given S, Σ_{i=1}^{N(S)} Bi is a compound Poisson random variable, and thus from Equations (5.24) and (5.25) we obtain

E[B|S] = S + λSE[B] = (1 + λE[B])S
Var(B|S) = λSE[B²]

Hence,

E[B] = E[E[B|S]] = (1 + λE[B])E[S]

implying, provided that λE[S] < 1, that

E[B] = E[S]/(1 − λE[S])


Also, by the conditional variance formula,

Var(B) = Var(E[B|S]) + E[Var(B|S)]
       = (1 + λE[B])² Var(S) + λE[S]E[B²]
       = (1 + λE[B])² Var(S) + λE[S](Var(B) + E²[B])

yielding

Var(B) = [ Var(S)(1 + λE[B])² + λE[S]E²[B] ] / (1 − λE[S])

Using E[B] = E[S]/(1 − λE[S]), we obtain

Var(B) = [ Var(S) + λE³[S] ] / (1 − λE[S])³

#
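The busy period mean can be checked by simulation. The sketch below assumes, for the simulation only, exponential service with rate µ = 2 and arrival rate λ = 1, so that E[S] = 1/2 and the formula gives E[B] = (1/2)/(1 − 1/2) = 1.

```python
import random

# Sketch verifying E[B] = E[S]/(1 - lam*E[S]) for the busy period of
# Example 5.27 (assumed: exponential service, mu = 2, arrivals at lam = 1).
def busy_period(lam=1.0, mu=2.0, rng=random):
    t, in_system = 0.0, 1                    # the period starts with one customer
    next_arrival = rng.expovariate(lam)
    while in_system > 0:
        service_end = t + rng.expovariate(mu)
        while next_arrival < service_end:    # arrivals during this service
            in_system += 1
            next_arrival += rng.expovariate(lam)
        t = service_end
        in_system -= 1
    return t                                 # time at which the system empties

samples = [busy_period() for _ in range(20000)]
print(sum(samples) / len(samples))           # close to 1
```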

There is a very nice representation of the compound Poisson process when the set of possible values of the Yi is finite or countably infinite. So let us suppose that there are numbers αj, j ≥ 1, such that

P{Yi = αj} = pj,   Σ_j pj = 1

Now, a compound Poisson process arises when events occur according to a Poisson process and each event results in a random amount Y being added to the cumulative sum. Let us say that the event is a type j event whenever it results in adding the amount αj, j ≥ 1. That is, the ith event of the Poisson process is a type j event if Yi = αj. If we let Nj(t) denote the number of type j events by time t, then it follows from Proposition 5.2 that the random variables Nj(t), j ≥ 1, are independent Poisson random variables with respective means

E[Nj(t)] = λpj t

Since, for each j, the amount αj is added to the cumulative sum a total of Nj(t) times by time t, it follows that the cumulative sum at time t can be expressed as

X(t) = Σ_j αj Nj(t)    (5.26)

As a check of Equation (5.26), let us use it to compute the mean and variance of X(t). This yields

E[X(t)] = E[ Σ_j αj Nj(t) ]
        = Σ_j αj E[Nj(t)]
        = Σ_j αj λpj t
        = λtE[Y1]


Also,

Var(X(t)) = Var( Σ_j αj Nj(t) )
          = Σ_j αj² Var(Nj(t))    by the independence of the Nj(t), j ≥ 1
          = Σ_j αj² λpj t
          = λtE[Y1²]

where the next to last equality follows since the variance of the Poisson random variable Nj(t) is equal to its mean.

Thus, we see that the representation (5.26) results in the same expressions for the mean and variance of X(t) as were previously derived.

One of the uses of the representation (5.26) is that it enables us to conclude that as t grows large, the distribution of X(t) converges to the normal distribution. To see why, note first that it follows by the central limit theorem that the distribution of a Poisson random variable converges to a normal distribution as its mean increases. (Why is this?) Therefore, each of the random variables Nj(t) converges to a normal random variable as t increases. Because they are independent, and because the sum of independent normal random variables is also normal, it follows that X(t) also approaches a normal distribution as t increases.

Example 5.28 In Example 5.26, find the approximate probability that at least 240 people migrate to the area within the next 50 weeks.

Solution: Since λ = 2, E[Yi] = 5/2, E[Yi²] = 43/6, we see that

E[X(50)] = 250,   Var(X(50)) = 4300/6

Now, the desired probability is

P{X(50) ≥ 240} = P{X(50) ≥ 239.5}
               = P{ (X(50) − 250)/√(4300/6) ≥ (239.5 − 250)/√(4300/6) }
               = 1 − φ(−0.3922)
               = φ(0.3922)
               = 0.6525

where Table 2.3 was used to determine φ(0.3922), the probability that a standard normal is less than 0.3922. #

Another useful result is that if {X(t), t ≥ 0} and {Y(t), t ≥ 0} are independent compound Poisson processes with respective Poisson parameters and distributions λ1, F1 and λ2, F2, then {X(t) + Y(t), t ≥ 0} is also a compound Poisson process. This is true because in this combined process events will occur according to a Poisson process with


rate λ1 + λ2, and each event independently will be from the first compound Poisson process with probability λ1/(λ1 + λ2). Consequently, the combined process will be a compound Poisson process with Poisson parameter λ1 + λ2, and with distribution function F given by

F(x) = λ1/(λ1 + λ2) F1(x) + λ2/(λ1 + λ2) F2(x)

5.4.3 Conditional or Mixed Poisson Processes

Let {N(t), t ≥ 0} be a counting process whose probabilities are defined as follows. There is a positive random variable L such that, conditional on L = λ, the counting process is a Poisson process with rate λ. Such a counting process is called a conditional or a mixed Poisson process. Suppose that L is continuous with density function g. Because

P{N(t + s) − N(s) = n} = ∫_0^∞ P{N(t + s) − N(s) = n | L = λ} g(λ) dλ
                       = ∫_0^∞ e^{−λt} (λt)^n/n! g(λ) dλ    (5.27)

we see that a conditional Poisson process has stationary increments. However, because knowing how many events occur in an interval gives information about the possible value of L, which affects the distribution of the number of events in any other interval, it follows that a conditional Poisson process does not generally have independent increments. Consequently, a conditional Poisson process is not generally a Poisson process.

Example 5.29 If g is the gamma density with parameters m and θ,

g(λ) = θe^{−θλ} (θλ)^{m−1}/(m − 1)!,   λ > 0

then

P{N(t) = n} = ∫_0^∞ e^{−λt} (λt)^n/n! · θe^{−θλ} (θλ)^{m−1}/(m − 1)! dλ
            = t^n θ^m/(n!(m − 1)!) ∫_0^∞ e^{−(t+θ)λ} λ^{n+m−1} dλ

Multiplying and dividing by (n + m − 1)!/(t + θ)^{n+m} gives

P{N(t) = n} = t^n θ^m (n + m − 1)! / (n!(m − 1)!(t + θ)^{n+m}) ∫_0^∞ (t + θ)e^{−(t+θ)λ} ((t + θ)λ)^{n+m−1}/(n + m − 1)! dλ

Because (t + θ)e^{−(t+θ)λ} ((t + θ)λ)^{n+m−1}/(n + m − 1)! is the density function of a gamma (n + m, t + θ) random variable, its integral is 1, giving the result

P{N(t) = n} = (n+m−1 choose n) (t/(t + θ))^n (θ/(t + θ))^m


Therefore, the number of events in an interval of length t has the same distribution as the number of failures that occur before a total of m successes are amassed, when each trial is a success with probability θ/(t + θ). #

To compute the mean and variance of N(t), condition on L. Because, conditional on L, N(t) is Poisson with mean Lt, we obtain

E[N(t)|L] = Lt
Var(N(t)|L) = Lt

where the final equality used that the variance of a Poisson random variable is equal to its mean. Consequently, the conditional variance formula yields

Var(N(t)) = E[Lt] + Var(Lt) = tE[L] + t²Var(L)

We can compute the conditional distribution function of L, given that N(t) = n, as follows:

P{L ≤ x | N(t) = n} = P{L ≤ x, N(t) = n} / P{N(t) = n}
                    = ∫_0^∞ P{L ≤ x, N(t) = n | L = λ} g(λ) dλ / P{N(t) = n}
                    = ∫_0^x P{N(t) = n | L = λ} g(λ) dλ / P{N(t) = n}
                    = ∫_0^x e^{−λt}(λt)^n g(λ) dλ / ∫_0^∞ e^{−λt}(λt)^n g(λ) dλ

where the final equality used Equation (5.27). In other words, the conditional density function of L given that N(t) = n is

f_{L|N(t)}(λ | n) = e^{−λt} λ^n g(λ) / ∫_0^∞ e^{−λt} λ^n g(λ) dλ,   λ ≥ 0    (5.28)
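The closed form obtained in Example 5.29 can be verified by Monte Carlo: draw L from the gamma density and then N(t) given L. The parameter values in the sketch below are illustrative assumptions.

```python
import random
from math import comb

# Sketch of Example 5.29: draw L from a gamma(m, theta) density (rate theta),
# then N(t) given L is Poisson(L*t); the count frequencies should match
# P{N(t)=n} = C(n+m-1, n) * (t/(t+theta))**n * (theta/(t+theta))**m.
m, theta, t, n = 3, 2.0, 1.0, 4

def sample_N(rng=random):
    L = rng.gammavariate(m, 1 / theta)   # gammavariate takes shape and scale
    count, s = 0, 0.0
    while True:                          # Poisson(L*t) via exponential gaps
        s += rng.expovariate(L)
        if s > t:
            return count
        count += 1

trials = 100000
hits = sum(sample_N() == n for _ in range(trials))
exact = comb(n + m - 1, n) * (t / (t + theta)) ** n * (theta / (t + theta)) ** m
print(hits / trials, exact)
```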

Example 5.30 An insurance company feels that each of its policyholders has a rating value and that a policyholder having rating value λ will make claims at times distributed according to a Poisson process with rate λ, when time is measured in years. The firm also believes that rating values vary from policyholder to policyholder, with the probability distribution of the value of a new policyholder being uniformly distributed over (0, 1). Given that a policyholder has made n claims in his or her first t years, what is the conditional distribution of the time until the policyholder’s next claim? Solution: If T is the time until the next claim, then we want to compute P{T > x | N (t) = n}. Conditioning on the policyholder’s rating value gives, upon using


Equation (5.28),

P{T > x | N(t) = n} = ∫_0^1 P{T > x | L = λ, N(t) = n} f_{L|N(t)}(λ | n) dλ
                    = ∫_0^1 e^{−λx} e^{−λt} λ^n dλ / ∫_0^1 e^{−λt} λ^n dλ

#
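The conditional probability in Example 5.30 reduces to a ratio of two one-dimensional integrals, which is easy to evaluate numerically; the values of t, n, and x in the sketch below are illustrative choices, not part of the example.

```python
from math import exp

# Numeric sketch of Example 5.30: evaluate
# P{T > x | N(t)=n} = (integral of exp(-lam*(x+t)) * lam**n over (0,1))
#                     / (integral of exp(-lam*t) * lam**n over (0,1))
# with a simple midpoint rule.
def integral(f, a=0.0, b=1.0, steps=100000):
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

t, n, x = 2.0, 3, 0.5
num = integral(lambda lam: exp(-lam * (x + t)) * lam ** n)
den = integral(lambda lam: exp(-lam * t) * lam ** n)
print(num / den)
```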

There is a nice formula for the probability that more than n events occur in an interval of length t. In deriving it we will use the identity

Σ_{j=n+1}^∞ e^{−λt} (λt)^j/j! = ∫_0^t λe^{−λx} (λx)^n/n! dx    (5.29)

which follows by noting that it equates the probability that the number of events by time t of a Poisson process with rate λ is greater than n with the probability that the time of the (n + 1)st event of this process (which has a gamma (n + 1, λ) distribution) is less than t. Interchanging λ and t in Equation (5.29) yields the equivalent identity

Σ_{j=n+1}^∞ e^{−λt} (λt)^j/j! = ∫_0^λ te^{−tx} (tx)^n/n! dx    (5.30)

Using Equation (5.27) we now have

P{N(t) > n} = Σ_{j=n+1}^∞ ∫_0^∞ e^{−λt} (λt)^j/j! g(λ) dλ
            = ∫_0^∞ Σ_{j=n+1}^∞ e^{−λt} (λt)^j/j! g(λ) dλ    (by interchanging)
            = ∫_0^∞ ∫_0^λ te^{−tx} (tx)^n/n! dx g(λ) dλ    (using (5.30))
            = ∫_0^∞ ∫_x^∞ g(λ) dλ te^{−tx} (tx)^n/n! dx    (by interchanging)
            = ∫_0^∞ Ḡ(x) te^{−tx} (tx)^n/n! dx

5.5 Random Intensity Functions and Hawkes Processes

Whereas the intensity function λ(t) of a nonhomogeneous Poisson process is a deterministic function, there are counting processes {N(t), t ≥ 0} whose intensity function value at time t, call it R(t), is a random variable whose value depends on the history


of the process up to time t. That is, if we let Ht denote the “history” of the process up to time t, then R(t), the intensity rate at time t, is a random variable whose value is determined by Ht and which is such that

P(N(t + h) − N(t) = 1 | Ht) = R(t)h + o(h)

and

P(N(t + h) − N(t) ≥ 2 | Ht) = o(h)

The Hawkes process is an example of a counting process having a random intensity function. This counting process assumes that there is a base intensity value λ > 0, and that associated with each event is a nonnegative random variable, called a mark, whose value is independent of all that has previously occurred and has distribution F. Whenever an event occurs, it is supposed that the current value of the random intensity function increases by the amount of the event's mark, with this increase decreasing over time at an exponential rate α. More specifically, if there have been a total of N(t) events by time t, with S1 < S2 < · · · < S_{N(t)} being the event times and Mi being the mark of event i, i = 1, . . . , N(t), then

R(t) = λ + Σ_{i=1}^{N(t)} Mi e^{−α(t−Si)}

In other words, a Hawkes process is a counting process in which

1. R(0) = λ;
2. whenever an event occurs, the random intensity increases by the value of the event's mark;
3. if there are no events between s and s + t, then R(s + t) = λ + (R(s) − λ)e^{−αt}.

Because the intensity increases each time an event occurs, the Hawkes process is said to be a self-exciting process. We will derive E[N(t)], the expected number of events of a Hawkes process that occur by time t. To do so, we will need the following lemma, which is valid for all counting processes.

Lemma Let R(t), t ≥ 0, be the random intensity function of the counting process {N(t), t ≥ 0} having N(0) = 0. Then, with m(t) = E[N(t)],

m(t) = ∫_0^t E[R(s)] ds

Proof.

E[N(t + h) | N(t), R(t)] = N(t) + R(t)h + o(h)

Taking expectations gives

E[N(t + h)] = E[N(t)] + E[R(t)]h + o(h)


That is,

m(t + h) = m(t) + hE[R(t)] + o(h)

or

(m(t + h) − m(t))/h = E[R(t)] + o(h)/h

Letting h go to 0 gives

m′(t) = E[R(t)]

Integrating both sides from 0 to t now gives the result:

m(t) = ∫_0^t E[R(s)] ds

#

Using the preceding, we can now prove the following proposition.

Proposition 5.5 If µ is the expected value of a mark in a Hawkes process, then for this process

E[N(t)] = λt + λµ/(µ − α)² (e^{(µ−α)t} − 1 − (µ − α)t)

Proof. To determine the mean value function m(t) it suffices, by the preceding lemma, to determine E[R(t)], which will be accomplished by deriving and then solving a differential equation. To begin note that, with Mt(h) equal to the sum of the marks of all events occurring between t and t + h,

R(t + h) = λ + (R(t) − λ)e^{−αh} + Mt(h) + o(h)

Letting g(t) = E[R(t)] and taking expectations of the preceding gives

g(t + h) = λ + (g(t) − λ)e^{−αh} + E[Mt(h)] + o(h)

Using the identity e^{−αh} = 1 − αh + o(h) shows that

g(t + h) = λ + (g(t) − λ)(1 − αh) + E[Mt(h)] + o(h)
         = g(t) − αhg(t) + λαh + E[Mt(h)] + o(h)    (5.31)

Now, given R(t), there will be 1 event between t and t + h with probability R(t)h + o(h), and there will be 2 or more with probability o(h). Hence, conditioning on the number of events between t and t + h yields, upon using that µ is the expected value of a mark, that

E[Mt(h) | R(t)] = µR(t)h + o(h)

Taking expectations of both sides of the preceding gives that

E[Mt(h)] = µg(t)h + o(h)


Substituting back into Equation (5.31) gives

g(t + h) = g(t) − αhg(t) + λαh + µg(t)h + o(h)

or, equivalently,

(g(t + h) − g(t))/h = (µ − α)g(t) + λα + o(h)/h

Letting h go to 0 gives that

g′(t) = (µ − α)g(t) + λα

Letting f(t) = (µ − α)g(t) + λα, the preceding can be written as

f′(t)/(µ − α) = f(t)

or

f′(t)/f(t) = µ − α

Integration now yields

log(f(t)) = (µ − α)t + C

Now, g(0) = E[R(0)] = λ and so f(0) = µλ, showing that C = log(µλ) and giving the result

f(t) = µλe^{(µ−α)t}

Using that g(t) = (f(t) − λα)/(µ − α) = f(t)/(µ − α) + λ − λµ/(µ − α) gives

g(t) = λ + λµ/(µ − α) (e^{(µ−α)t} − 1)

Hence, from Lemma 5.1,

E[N(t)] = λt + ∫_0^t λµ/(µ − α) (e^{(µ−α)s} − 1) ds
        = λt + λµ/(µ − α)² (e^{(µ−α)t} − 1 − (µ − α)t)

and the result is proved.

#
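Proposition 5.5 can be checked by simulating the Hawkes process. The sketch below uses Ogata-style thinning — a standard simulation method that is not described in the text — with assumed parameters λ = 1, α = 2, and exponential marks with mean µ = 1. Since the intensity only decays between events, the intensity computed at the current time bounds the true intensity until the next accepted event, which is what makes the thinning step valid.

```python
import random, math

# Simulation sketch of a Hawkes process (assumed: lam = 1, alpha = 2,
# exponential marks with mean mu = 1) via thinning: propose candidate
# events at the current intensity bound and accept a candidate at time t
# with probability R(t)/bound. Compare E[N(T)] with Proposition 5.5.
def hawkes_count(T, lam=1.0, alpha=2.0, mu=1.0, rng=random):
    t, count, events = 0.0, 0, []            # events holds (time, mark) pairs
    while True:
        bound = lam + sum(m * math.exp(-alpha * (t - s)) for s, m in events)
        t += rng.expovariate(bound)          # R only decays between events,
        if t > T:                            # so bound dominates R until then
            return count
        R_t = lam + sum(m * math.exp(-alpha * (t - s)) for s, m in events)
        if rng.random() < R_t / bound:       # thinning acceptance step
            events.append((t, rng.expovariate(1 / mu)))
            count += 1

T, lam, alpha, mu = 5.0, 1.0, 2.0, 1.0
est = sum(hawkes_count(T) for _ in range(5000)) / 5000
exact = lam * T + lam * mu / (mu - alpha) ** 2 * (
    math.exp((mu - alpha) * T) - 1 - (mu - alpha) * T)
print(est, exact)
```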


Exercises

1. The time T required to repair a machine is an exponentially distributed random variable with mean 1/2 (hour).
(a) What is the probability that a repair time exceeds 1/2 hour?
(b) What is the probability that a repair takes at least 12 1/2 hours given that its duration exceeds 12 hours?
2. Suppose that you arrive at a single-teller bank to find five other customers in the bank, one being served and the other four waiting in line. You join the end of the line. If the service times are all exponential with rate µ, what is the expected amount of time you will spend in the bank?
3. Let X be an exponential random variable. Without any computations, tell which one of the following is correct. Explain your answer.
(a) E[X²|X > 1] = E[(X + 1)²]
(b) E[X²|X > 1] = E[X²] + 1
(c) E[X²|X > 1] = (1 + E[X])²

4. Consider a post office with two clerks. Three people, A, B, and C, enter simultaneously. A and B go directly to the clerks, and C waits until either A or B leaves before he begins service. What is the probability that A is still in the post office after the other two have left when
(a) the service time for each clerk is exactly (nonrandom) ten minutes?
(b) the service times are i with probability 1/3, i = 1, 2, 3?
(c) the service times are exponential with mean 1/µ?
5. If X is exponential with rate λ, show that Y = [X] + 1 is geometric with parameter p = 1 − e^{−λ}, where [x] is the largest integer less than or equal to x.
6. In Example 5.3 if server i serves at an exponential rate λi, i = 1, 2, show that

P{Smith is not last} = (λ1/(λ1 + λ2))² + (λ2/(λ1 + λ2))²


*7. If X1 and X2 are independent nonnegative continuous random variables, show that

P{X1 < X2 | min(X1, X2) = t} = r1(t)/(r1(t) + r2(t))

where ri (t) is the failure rate function of X i . 8. If X and Y are independent exponential random variables with respective rates λ and µ, what is the conditional distribution of X given that X < Y ? 9. Machine 1 is currently working. Machine 2 will be put in use at a time t from now. If the lifetime of machine i is exponential with rate λi , i = 1, 2, what is the probability that machine 1 is the first machine to fail?


*10. Let X and Y be independent exponential random variables with respective rates λ and µ. Let M = min(X, Y). Find
(a) E[MX|M = X],
(b) E[MX|M = Y],
(c) Cov(X, M).
11. Let X, Y1, . . . , Yn be independent exponential random variables; X having rate λ, and Yi having rate µ. Let Aj be the event that the jth smallest of these n + 1 random variables is one of the Yi. Find p = P{X > max_i Yi}, by using the identity

p = P(A1 · · · An) = P(A1)P(A2|A1) · · · P(An|A1 · · · An−1)

Verify your answer when n = 2 by conditioning on X to obtain p.
12. If Xi, i = 1, 2, 3, are independent exponential random variables with rates λi, i = 1, 2, 3, find
(a) P{X1 < X2 < X3},
(b) P{X1 < X2 | max(X1, X2, X3) = X3},
(c) E[max Xi | X1 < X2 < X3],
(d) E[max Xi].

13. Find, in Example 5.10, the expected time until the nth person on line leaves the line (either by entering service or departing without service). 14. I am waiting for two friends to arrive at my house. The time until A arrives is exponentially distributed with rate λa , and the time until B arrives is exponentially distributed with rate λb . Once they arrive, both will spend exponentially distributed times, with respective rates µa and µb at my home before departing. The four exponential random variables are independent. (a) What is the probability that A arrives before and departs after B? (b) What is the expected time of the last departure? 15. One hundred items are simultaneously put on a life test. Suppose the lifetimes of the individual items are independent exponential random variables with mean 200 hours. The test will end when there have been a total of 5 failures. If T is the time at which the test ends, find E[T ] and Var(T ). 16. There are three jobs that need to be processed, with the processing time of job i being exponential with rate µi . There are two processors available, so processing on two of the jobs can immediately start, with processing on the final job to start when one of the initial ones is finished. (a) Let Ti denote the time at which the processing of job i is completed. If the objective is to minimize E[T1 + T2 + T3 ], which jobs should be initially processed if µ1 < µ2 < µ3 ? (b) Let M, called the makespan, be the time until all three jobs have been processed. With S equal to the time that there is only a single processor working,


show that

2E[M] = E[S] + Σ_{i=1}^3 1/µi

For the rest of this problem, suppose that µ1 = µ2 = µ, µ3 = λ. Also, let P(µ) be the probability that the last job to finish is either job 1 or job 2, and let P(λ) = 1 − P(µ) be the probability that the last job to finish is job 3.
(c) Express E[S] in terms of P(µ) and P(λ).
(d) Let Pi,j(µ) be the value of P(µ) when i and j are the jobs that are initially started. Show that P1,2(µ) ≤ P1,3(µ).
(e) If µ > λ show that E[M] is minimized when job 3 is one of the jobs that is initially started.
(f) If µ < λ show that E[M] is minimized when processing is initially started on jobs 1 and 2.

17. A set of n cities is to be connected via communication links. The cost to construct a link between cities i and j is Cij, i ≠ j. Enough links should be constructed so that for each pair of cities there is a path of links that connects them. As a result, only n − 1 links need be constructed. A minimal cost algorithm for solving this problem (known as the minimal spanning tree problem) first constructs the cheapest of all the (n choose 2) links. Then, at each additional stage it chooses the cheapest link that connects a city without any links to one with links. That is, if the first link is between cities 1 and 2, then the second link will either be between 1 and one of the cities 3, . . . , n or between 2 and one of the cities 3, . . . , n. Suppose that all of the (n choose 2) costs Cij are independent exponential random variables with mean 1. Find the expected cost of the preceding algorithm if

(a) n = 3, (b) n = 4.

*18. Let X1 and X2 be independent exponential random variables, each having rate µ. Let

X(1) = minimum(X1, X2) and X(2) = maximum(X1, X2)

Find
(a) E[X(1)],
(b) Var(X(1)),
(c) E[X(2)],
(d) Var(X(2)).

19. In a mile race between A and B, the time it takes A to complete the mile is an exponential random variable with rate λa and is independent of the time it takes B to complete the mile, which is an exponential random variable with rate λb .


The one who finishes earliest is declared the winner and receives Re^{−αt} if the winning time is t, where R and α are constants. If the loser receives 0, find the expected amount that runner A wins.
20. Consider a two-server system in which a customer is served first by server 1, then by server 2, and then departs. The service times at server i are exponential random variables with rates µi, i = 1, 2. When you arrive, you find server 1 free and two customers at server 2—customer A in service and customer B waiting in line.
(a) Find PA, the probability that A is still in service when you move over to server 2.
(b) Find PB, the probability that B is still in the system when you move over to server 2.
(c) Find E[T], where T is the time that you spend in the system.
Hint: Write

T = S1 + S2 + WA + WB

where Si is your service time at server i, WA is the amount of time you wait in queue while A is being served, and WB is the amount of time you wait in queue while B is being served.
21. In a certain system, a customer must first be served by server 1 and then by server 2. The service times at server i are exponential with rate µi, i = 1, 2. An arrival finding server 1 busy waits in line for that server. Upon completion of service at server 1, a customer either enters service with server 2 if that server is free or else remains with server 1 (blocking any other customer from entering service) until server 2 is free. Customers depart the system after being served by server 2. Suppose that when you arrive there is one customer in the system and that customer is being served by server 1. What is the expected total time you spend in the system?
22. Suppose in Exercise 21 you arrive to find two others in the system, one being served by server 1 and one by server 2. What is the expected time you spend in the system? Recall that if server 1 finishes before server 2, then server 1's customer will remain with him (thus blocking your entrance) until server 2 becomes free.
*23. A flashlight needs two batteries to be operational. Consider such a flashlight along with a set of n functional batteries—battery 1, battery 2, . . . , battery n. Initially, battery 1 and 2 are installed. Whenever a battery fails, it is immediately replaced by the lowest numbered functional battery that has not yet been put in use. Suppose that the lifetimes of the different batteries are independent exponential random variables each having rate µ. At a random time, call it T, a battery will fail and our stockpile will be empty. At that moment exactly one of the batteries—which we call battery X—will not yet have failed.
(a) What is P{X = n}?
(b) What is P{X = 1}?
(c) What is P{X = i}?
(d) Find E[T].
(e) What is the distribution of T?


24. There are two servers available to process n jobs. Initially, each server begins work on a job. Whenever a server completes work on a job, that job leaves the system and the server begins processing a new job (provided there are still jobs waiting to be processed). Let T denote the time until all jobs have been processed. If the time that it takes server i to process a job is exponentially distributed with rate µi, i = 1, 2, find E[T] and Var(T).
25. Customers can be served by any of three servers, where the service times of server i are exponentially distributed with rate µi, i = 1, 2, 3. Whenever a server becomes free, the customer who has been waiting the longest begins service with that server.
(a) If you arrive to find all three servers busy and no one waiting, find the expected time until you depart the system.
(b) If you arrive to find all three servers busy and one person waiting, find the expected time until you depart the system.
26. Each entering customer must be served first by server 1, then by server 2, and finally by server 3. The amount of time it takes to be served by server i is an exponential random variable with rate µi, i = 1, 2, 3. Suppose you enter the system when it contains a single customer who is being served by server 3.
(a) Find the probability that server 3 will still be busy when you move over to server 2.
(b) Find the probability that server 3 will still be busy when you move over to server 3.
(c) Find the expected amount of time that you spend in the system. (Whenever you encounter a busy server, you must wait for the service in progress to end before you can enter service.)
(d) Suppose that you enter the system when it contains a single customer who is being served by server 2. Find the expected amount of time that you spend in the system.
27. Show, in Example 5.7, that the distributions of the total cost are the same for the two algorithms.
28. Consider n components with independent lifetimes, which are such that component i functions for an exponential time with rate λi. Suppose that all components are initially in use and remain so until they fail.
(a) Find the probability that component 1 is the second component to fail.
(b) Find the expected time of the second failure.
Hint: Do not make use of part (a).
29. Let X and Y be independent exponential random variables with respective rates λ and µ, where λ > µ. Let c > 0.
(a) Show that the conditional density function of X, given that X + Y = c, is

f_{X|X+Y}(x|c) = (λ − µ)e^{−(λ−µ)x} / (1 − e^{−(λ−µ)c}),   0 < x < c

. . . | X > Y + c] = E[min(X, Y)|X > Y] = E[min(X, Y)] = 1/(λ + µ)
(c) Give a verbal explanation of why min(X, Y) and X − Y are (unconditionally) independent.
34. Two individuals, A and B, both require kidney transplants. If she does not receive a new kidney, then A will die after an exponential time with rate µA, and B after an exponential time with rate µB. New kidneys arrive in accordance with a Poisson process having rate λ. It has been decided that the first kidney will go to A (or to B if B is alive and A is not at that time) and the next one to B (if still living).
(a) What is the probability that A obtains a new kidney?
(b) What is the probability that B obtains a new kidney?
(c) What is the probability that neither A nor B obtains a new kidney?
(d) What is the probability that both A and B obtain new kidneys?

35. If {N(t), t ≥ 0} is a Poisson process with rate λ, verify that {Ns(t), t ≥ 0} satisfies the axioms for being a Poisson process with rate λ, where Ns(t) = N(s + t) − N(s).
*36. Let S(t) denote the price of a security at time t. A popular model for the process {S(t), t ≥ 0} supposes that the price remains unchanged until a “shock” occurs, at which time the price is multiplied by a random factor. If we let N(t) denote the number of shocks by time t, and let Xi denote the ith multiplicative factor, then


this model supposes that

S(t) = S(0) Π_{i=1}^{N(t)} Xi

where Π_{i=1}^{N(t)} Xi is equal to 1 when N(t) = 0. Suppose that the Xi are independent exponential random variables with rate µ; that {N(t), t ≥ 0} is a Poisson process with rate λ; that {N(t), t ≥ 0} is independent of the Xi; and that S(0) = s.

(a) Find E[S(t)].
(b) Find E[S²(t)].

37. A machine works for an exponentially distributed time with rate µ and then fails. A repair crew checks the machine at times distributed according to a Poisson process with rate λ; if the machine is found to have failed then it is immediately replaced. Find the expected time between replacements of machines.
38. Let {Mi(t), t ≥ 0}, i = 1, 2, 3, be independent Poisson processes with respective rates λi, i = 1, 2, 3, and set

N1(t) = M1(t) + M2(t),   N2(t) = M2(t) + M3(t)

The stochastic process {(N1(t), N2(t)), t ≥ 0} is called a bivariate Poisson process.
(a) Find P{N1(t) = n, N2(t) = m}.
(b) Find Cov(N1(t), N2(t)).

39. A certain scientific theory supposes that mistakes in cell division occur according to a Poisson process with rate 2.5 per year, and that an individual dies when 196 such mistakes have occurred. Assuming this theory, find
(a) the mean lifetime of an individual,
(b) the variance of the lifetime of an individual.
Also approximate
(c) the probability that an individual dies before age 67.2,
(d) the probability that an individual reaches age 90,
(e) the probability that an individual reaches age 100.
*40. Show that if {Ni(t), t ≥ 0} are independent Poisson processes with rate λi, i = 1, 2, then {N(t), t ≥ 0} is a Poisson process with rate λ1 + λ2 where N(t) = N1(t) + N2(t).
41. In Exercise 40 what is the probability that the first event of the combined process is from the N1 process?
42. Let {N(t), t ≥ 0} be a Poisson process with rate λ. Let Sn denote the time of the nth event. Find
(a) E[S4],


(b) E[S4 |N (1) = 2], (c) E[N (4) − N (2)|N (1) = 3].

43. Customers arrive at a two-server service station according to a Poisson process with rate λ. Whenever a new customer arrives, any customer that is in the system immediately departs. A new arrival enters service first with server 1 and then with server 2. If the service times at the servers are independent exponentials with respective rates µ1 and µ2, what proportion of entering customers completes their service with server 2?
44. Cars pass a certain street location according to a Poisson process with rate λ. A woman who wants to cross the street at that location waits until she can see that no cars will come by in the next T time units.
(a) Find the probability that her waiting time is 0.
(b) Find her expected waiting time.
Hint: Condition on the time of the first car.
45. Let {N(t), t ≥ 0} be a Poisson process with rate λ that is independent of the nonnegative random variable T with mean µ and variance σ². Find
(a) Cov(T, N(T)),
(b) Var(N(T)).
46. Let {N(t), t ≥ 0} be a Poisson process with rate λ that is independent of the sequence X1, X2, . . . of independent and identically distributed random variables with mean µ and variance σ². Find

Cov( N(t), Σ_{i=1}^{N(t)} Xi )

47. Consider a two-server parallel queuing system where customers arrive according to a Poisson process with rate λ, and where the service times are exponential with rate µ. Moreover, suppose that arrivals finding both servers busy immediately depart without receiving any service (such a customer is said to be lost), whereas those finding at least one free server immediately enter service and then depart when their service is completed. (a) If both servers are presently busy, find the expected time until the next customer enters the system. (b) Starting empty, find the expected time until both servers are busy. (c) Find the expected time between two successive lost customers. 48. Consider an n-server parallel queuing system where customers arrive according to a Poisson process with rate λ, where the service times are exponential random variables with rate µ, and where any arrival finding all servers busy immediately departs without receiving any service. If an arrival finds all servers busy, find (a) the expected number of busy servers found by the next arrival, (b) the probability that the next arrival finds all servers free,


(c) the probability that the next arrival finds exactly i of the servers free.
49. Events occur according to a Poisson process with rate λ. Each time an event occurs, we must decide whether or not to stop, with our objective being to stop at the last event to occur prior to some specified time T, where T > 1/λ. That is, if an event occurs at time t, 0 ≤ t ≤ T, and we decide to stop, then we win if there are no additional events by time T, and we lose otherwise. If we do not stop when an event occurs and no additional events occur by time T, then we lose. Also, if no events occur by time T, then we lose. Consider the strategy that stops at the first event to occur after some fixed time s, 0 ≤ s ≤ T.
(a) Using this strategy, what is the probability of winning?
(b) What value of s maximizes the probability of winning?
(c) Show that one's probability of winning when using the preceding strategy with the value of s specified in part (b) is 1/e.
50. The number of hours between successive train arrivals at the station is uniformly distributed on (0, 1). Passengers arrive according to a Poisson process with rate 7 per hour. Suppose a train has just left the station. Let X denote the number of people who get on the next train. Find
(a) E[X],
(b) Var(X).
51. If an individual has never had a previous automobile accident, then the probability he or she has an accident in the next h time units is βh + o(h); on the other hand, if he or she has ever had a previous accident, then the probability is αh + o(h). Find the expected number of accidents an individual has by time t.
52. Teams 1 and 2 are playing a match. The teams score points according to independent Poisson processes with respective rates λ1 and λ2. If the match ends when one of the teams has scored k more points than the other, find the probability that team 1 wins.
Hint: Relate this to the gambler's ruin problem.
53. The water level of a certain reservoir is depleted at a constant rate of 1000 units daily. The reservoir is refilled by randomly occurring rainfalls. Rainfalls occur according to a Poisson process with rate 0.2 per day. The amount of water added to the reservoir by a rainfall is 5000 units with probability 0.8 or 8000 units with probability 0.2. The present water level is just slightly below 5000 units.
(a) What is the probability the reservoir will be empty after five days?
(b) What is the probability the reservoir will be empty sometime within the next ten days?
54. A viral linear DNA molecule of length, say, 1 is often known to contain a certain “marked position,” with the exact location of this mark being unknown. One approach to locating the marked position is to cut the molecule by agents that break it at points chosen according to a Poisson process with rate λ. It is then possible to determine the fragment that contains the marked position. For instance,


letting m denote the location on the line of the marked position, then if L1 denotes the last Poisson event time before m (or 0 if there are no Poisson events in [0, m]), and R1 denotes the first Poisson event time after m (or 1 if there are no Poisson events in [m, 1]), then it would be learned that the marked position lies between L1 and R1. Find
(a) P{L1 = 0},
(b) P{L1 < x}, 0 < x < m,
(c) P{R1 = 1},
(d) P{R1 > x}, m < x < 1.

By repeating the preceding process on identical copies of the DNA molecule, we are able to zero in on the location of the marked position. If the cutting procedure is utilized on n identical copies of the molecule, yielding the data Li, Ri, i = 1, . . . , n, then it follows that the marked position lies between L and R, where

L = max_i Li,   R = min_i Ri

(e) Find E[R − L], and in doing so, show that E[R − L] ∼ 2/(nλ).

55. Consider a single server queuing system where customers arrive according to a Poisson process with rate λ, service times are exponential with rate µ, and customers are served in the order of their arrival. Suppose that a customer arrives and finds n − 1 others in the system. Let X denote the number in the system at the moment that customer departs. Find the probability mass function of X.
56. An event independently occurs on each day with probability p. Let N(n) denote the total number of events that occur on the first n days, and let Tr denote the day on which the rth event occurs.
(a) What is the distribution of N(n)?
(b) What is the distribution of T1?
(c) What is the distribution of Tr?
(d) Given that N(n) = r, show that the set of r days on which events occurred has the same distribution as a random selection (without replacement) of r of the values 1, 2, . . . , n.

*57. Events occur according to a Poisson process with rate λ = 2 per hour.

(a) What is the probability that no event occurs between 8 P.M. and 9 P.M.? (b) Starting at noon, what is the expected time at which the fourth event occurs? (c) What is the probability that two or more events occur between 6 P.M. and 8 P.M.?

58. Each round played by a contestant is either a success with probability p or a failure with probability 1 − p. If the round is a success, then a random amount of money having an exponential distribution with rate λ is won. If the round is a failure, then the contestant loses everything that had been accumulated up to that time and cannot play any additional rounds. After a successful round, the contestant can either elect to quit playing and keep whatever has already been won or can


elect to play another round. Suppose that a newly starting contestant plans on continuing to play until either her total winnings exceeds t or a failure occurs.
(a) What is the distribution of N, equal to the number of successful rounds that it would take until her fortune exceeds t?
(b) What is the probability the contestant will be successful in reaching a fortune of at least t?
(c) Given the contestant is successful, what is her expected winnings?
(d) What is the expected value of the contestant's winnings?
59. There are two types of claims that are made to an insurance company. Let Ni(t) denote the number of type i claims made by time t, and suppose that {N1(t), t ≥ 0} and {N2(t), t ≥ 0} are independent Poisson processes with rates λ1 = 10 and λ2 = 1. The amounts of successive type 1 claims are independent exponential random variables with mean $1000 whereas the amounts from type 2 claims are independent exponential random variables with mean $5000. A claim for $4000 has just been received; what is the probability it is a type 1 claim?
*60. Customers arrive at a bank at a Poisson rate λ. Suppose two customers arrived during the first hour. What is the probability that
(a) both arrived during the first 20 minutes?
(b) at least one arrived during the first 20 minutes?
61. A system has a random number of flaws that we will suppose is Poisson distributed with mean c. Each of these flaws will, independently, cause the system to fail at a random time having distribution G. When a system failure occurs, suppose that the flaw causing the failure is immediately located and fixed.
(a) What is the distribution of the number of failures by time t?
(b) What is the distribution of the number of flaws that remain in the system at time t?
(c) Are the random variables in parts (a) and (b) dependent or independent?
62. Suppose that the number of typographical errors in a new text is Poisson distributed with mean λ. Two proofreaders independently read the text. Suppose that each error is independently found by proofreader i with probability pi, i = 1, 2. Let X1 denote the number of errors that are found by proofreader 1 but not by proofreader 2. Let X2 denote the number of errors that are found by proofreader 2 but not by proofreader 1. Let X3 denote the number of errors that are found by both proofreaders. Finally, let X4 denote the number of errors found by neither proofreader.
(a) Describe the joint probability distribution of X1, X2, X3, X4.
(b) Show that

E[X1]/E[X3] = (1 − p2)/p2   and   E[X2]/E[X3] = (1 − p1)/p1

Suppose now that λ, p1, and p2 are all unknown.


(c) By using X i as an estimator of E[X i ], i = 1, 2, 3, present estimators of p1 , p2 , and λ. (d) Give an estimator of X 4 , the number of errors not found by either proofreader. 63. Consider an infinite server queuing system in which customers arrive in accordance with a Poisson process with rate λ, and where the service distribution is exponential with rate µ. Let X (t) denote the number of customers in the system at time t. Find (a) E[X (t + s)|X (s) = n]; (b) Var[X (t + s)|X (s) = n].

Hint: Divide the customers in the system at time t + s into two groups, one consisting of “old” customers and the other of “new” customers.

(c) Consider an infinite server queuing system in which customers arrive according to a Poisson process with rate λ, and where the service times are all exponential random variables with rate µ. If there is currently a single customer in the system, find the probability that the system becomes empty when that customer departs. *64. Suppose that people arrive at a bus stop in accordance with a Poisson process with rate λ. The bus departs at time t. Let X denote the total amount of waiting time of all those who get on the bus at time t. We want to determine Var(X ). Let N (t) denote the number of arrivals by time t. (a) What is E[X |N (t)]? (b) Argue that Var[X |N (t)] = N (t)t 2 /12. (c) What is Var(X )? 65. An average of 500 people pass the California bar exam each year. A California lawyer practices law, on average, for 30 years. Assuming these numbers remain steady, how many lawyers would you expect California to have in 2050? 66. Policyholders of a certain insurance company have accidents at times distributed according to a Poisson process with rate λ. The amount of time from when the accident occurs until a claim is made has distribution G. (a) Find the probability there are exactly n incurred but as yet unreported claims at time t. (b) Suppose that each claim amount has distribution F, and that the claim amount is independent of the time that it takes to report the claim. Find the expected value of the sum of all incurred but as yet unreported claims at time t. 67. Satellites are launched into space at times distributed according to a Poisson process with rate λ. Each satellite independently spends a random time (having distribution G) in space before falling to the ground. Find the probability that none of the satellites in the air at time t was launched before time s, where s < t. 68. Suppose that electrical shocks having random amplitudes occur at times distributed according to a Poisson process {N (t), t ! 0} with rate λ. Suppose that the amplitudes of the successive shocks are independent both of other amplitudes


and of the arrival times of shocks, and also that the amplitudes have distribution F with mean µ. Suppose also that the amplitude of a shock decreases with time at an exponential rate α, meaning that an initial amplitude A will have value Ae^{−αx} after an additional time x has elapsed. Let A(t) denote the sum of all amplitudes at time t. That is,

A(t) = Σ_{i=1}^{N(t)} Ai e^{−α(t−Si)}

where Ai and Si are the initial amplitude and the arrival time of shock i.
(a) Find E[A(t)] by conditioning on N(t).
(b) Without any computations, explain why A(t) has the same distribution as does D(t) of Example 5.21.
69. Suppose in Example 5.19 that a car can overtake a slower moving car without any loss of speed. Suppose a car that enters the road at time s has a free travel time equal to t0. Find the distribution of the total number of other cars that it encounters on the road (either by passing or by being passed).
70. For the infinite server queue with Poisson arrivals and general service distribution G, find the probability that
(a) the first customer to arrive is also the first to depart.
Let S(t) equal the sum of the remaining service times of all customers in the system at time t.
(b) Argue that S(t) is a compound Poisson random variable.
(c) Find E[S(t)].
(d) Find Var(S(t)).
71. Let Sn denote the time of the nth event of the Poisson process {N(t), t ≥ 0} having rate λ. Show, for an arbitrary function g, that the random variable Σ_{i=1}^{N(t)} g(Si) has the same distribution as the compound Poisson random variable Σ_{i=1}^{N(t)} g(Ui), where U1, U2, . . . is a sequence of independent and identically distributed uniform (0, t) random variables that is independent of N, a Poisson random variable with mean λt. Consequently, conclude that

E[ Σ_{i=1}^{N(t)} g(Si) ] = λ ∫_0^t g(x) dx,   Var( Σ_{i=1}^{N(t)} g(Si) ) = λ ∫_0^t g²(x) dx

72. A cable car starts off with n riders. The times between successive stops of the car are independent exponential random variables with rate λ. At each stop one rider gets off. This takes no time, and no additional riders get on. After a rider gets off the car, he or she walks home. Independently of all else, the walk takes an exponential time with rate µ. (a) What is the distribution of the time at which the last rider departs the car?


(b) Suppose the last rider departs the car at time t. What is the probability that all the other riders are home at that time? 73. Shocks occur according to a Poisson process with rate λ, and each shock independently causes a certain system to fail with probability p. Let T denote the time at which the system fails and let N denote the number of shocks that it takes. (a) Find the conditional distribution of T given that N = n. (b) Calculate the conditional distribution of N , given that T = t, and notice that it is distributed as 1 plus a Poisson random variable with mean λ(1 − p)t. (c) Explain how the result in part (b) could have been obtained without any calculations. 74. The number of missing items in a certain location, call it X , is a Poisson random variable with mean λ. When searching the location, each item will independently be found after an exponentially distributed time with rate µ. A reward of R is received for each item found, and a searching cost of C per unit of search time is incurred. Suppose that you search for a fixed time t and then stop. (a) Find your total expected return. (b) Find the value of t that maximizes the total expected return. (c) The policy of searching for a fixed time is a static policy. Would a dynamic policy, which allows the decision as to whether to stop at each time t, depend on the number already found by t be beneficial? Hint: How does the distribution of the number of items not yet found by time t depend on the number already found by that time? 75. Suppose that the times between successive arrivals of customers at a single-server station are independent random variables having a common distribution F. Suppose that when a customer arrives, he or she either immediately enters service if the server is free or else joins the end of the waiting line if the server is busy with another customer. When the server completes work on a customer, that customer leaves the system and the next waiting customer, if there are any, enters service. Let X n denote the number of customers in the system immediately before the nth arrival, and let Yn denote the number of customers that remain in the system when the nth customer departs. The successive service times of customers are independent random variables (which are also independent of the interarrival times) having a common distribution G. (a) If F is the exponential distribution with rate λ, which, if any, of the processes {X n }, {Yn } is a Markov chain? (b) If G is the exponential distribution with rate µ, which, if any, of the processes {X n }, {Yn } is a Markov chain? (c) Give the transition probabilities of any Markov chains in parts (a) and (b). 76. For the model of Example 5.27, find the mean and variance of the number of customers served in a busy period. 77. Suppose that customers arrive to a system according to a Poisson process with rate λ. There are an infinite number of servers in this system so a customer begins

352

Introduction to Probability Models

service upon arrival. The service times of the arrivals are independent exponential random variables with rate µ, and are independent of the arrival process. Customers depart the system when their service ends. Let N be the number of arrivals before the first departure. (a) (b) (c) (d) (e)

Find P(N = 1). Find P(N = 2). Find P(N = j). Find the probability that the first to arrive is the first to depart. Find the expected time of the first departure.

78. A store opens at 8 A.M. From 8 until 10 A.M. customers arrive at a Poisson rate of four an hour. Between 10 A.M. and 12 P.M. they arrive at a Poisson rate of eight an hour. From 12 P.M. to 2 P.M. the arrival rate increases steadily from eight per hour at 12 P.M. to ten per hour at 2 P.M.; and from 2 to 5 P.M. the arrival rate drops steadily from ten per hour at 2 P.M. to four per hour at 5 P.M. Determine the probability distribution of the number of customers that enter the store on a given day.

*79. Suppose that events occur according to a nonhomogeneous Poisson process with intensity function λ(t), t > 0. Further, suppose that an event that occurs at time s is a type 1 event with probability p(s), s > 0. If N1(t) is the number of type 1 events by time t, what type of process is {N1(t), t ≥ 0}?

80. Let T1, T2, . . . denote the interarrival times of events of a nonhomogeneous Poisson process having intensity function λ(t).
(a) Are the Ti independent?
(b) Are the Ti identically distributed?
(c) Find the distribution of T1.

81. (a) Let {N(t), t ≥ 0} be a nonhomogeneous Poisson process with mean value function m(t). Given N(t) = n, show that the unordered set of arrival times has the same distribution as n independent and identically distributed random variables having distribution function

F(x) = m(x)/m(t), x ≤ t
     = 1,          x ≥ t

(b) Suppose that workmen incur accidents in accordance with a nonhomogeneous Poisson process with mean value function m(t). Suppose further that each injured man is out of work for a random amount of time having distribution F. Let X(t) be the number of workers who are out of work at time t. By using part (a), find E[X(t)].

82. Let X1, X2, . . . be independent positive continuous random variables with a common density function f, and suppose this sequence is independent of N, a Poisson random variable with mean λ. Define

N(t) = number of i ≤ N : Xi ≤ t

Show that {N(t), t ≥ 0} is a nonhomogeneous Poisson process with intensity function λ(t) = λf(t).


83. Suppose that {N0(t), t ≥ 0} is a Poisson process with rate λ = 1. Let λ(t) denote a nonnegative function of t, and let

m(t) = ∫_0^t λ(s) ds

Define N(t) by

N(t) = N0(m(t))

Argue that {N(t), t ≥ 0} is a nonhomogeneous Poisson process with intensity function λ(t), t ≥ 0.
Hint: Make use of the identity

m(t + h) − m(t) = m′(t)h + o(h)

*84. Let X1, X2, . . . be independent and identically distributed nonnegative continuous random variables having density function f(x). We say that a record occurs at time n if Xn is larger than each of the previous values X1, . . . , Xn−1. (A record automatically occurs at time 1.) If a record occurs at time n, then Xn is called a record value. In other words, a record occurs whenever a new high is reached, and that new high is called the record value. Let N(t) denote the number of record values that are less than or equal to t. Characterize the process {N(t), t ≥ 0} when
(a) f is an arbitrary continuous density function.
(b) f(x) = λe^{−λx}.

Hint: Finish the following sentence: There will be a record whose value is between t and t + dt if the first Xi that is greater than t lies between . . .

85. An insurance company pays out claims on its life insurance policies in accordance with a Poisson process having rate λ = 5 per week. If the amount of money paid on each policy is exponentially distributed with mean $2000, what are the mean and variance of the amount of money paid by the insurance company in a four-week span?

86. In good years, storms occur according to a Poisson process with rate 3 per unit time, while in other years they occur according to a Poisson process with rate 5 per unit time. Suppose next year will be a good year with probability 0.3. Let N(t) denote the number of storms during the first t time units of next year.
(a) Find P{N(t) = n}.
(b) Is {N(t)} a Poisson process?
(c) Does {N(t)} have stationary increments? Why or why not?
(d) Does it have independent increments? Why or why not?
(e) If next year starts off with three storms by time t = 1, what is the conditional probability it is a good year?

87. Determine Cov[X(t), X(t + s)] when {X(t), t ≥ 0} is a compound Poisson process.


88. Customers arrive at the automatic teller machine in accordance with a Poisson process with rate 12 per hour. The amount of money withdrawn on each transaction is a random variable with mean $30 and standard deviation $50. (A negative withdrawal means that money was deposited.) The machine is in use for 15 hours daily. Approximate the probability that the total daily withdrawal is less than $6000.

89. Some components of a two-component system fail after receiving a shock. Shocks of three types arrive independently and in accordance with Poisson processes. Shocks of the first type arrive at a Poisson rate λ1 and cause the first component to fail. Those of the second type arrive at a Poisson rate λ2 and cause the second component to fail. The third type of shock arrives at a Poisson rate λ3 and causes both components to fail. Let X1 and X2 denote the survival times for the two components. Show that the joint distribution of X1 and X2 is given by

P{X1 > s, X2 > t} = exp{−λ1 s − λ2 t − λ3 max(s, t)}

This distribution is known as the bivariate exponential distribution.

90. In Exercise 89 show that X1 and X2 both have exponential distributions.

*91. Let X1, X2, . . . , Xn be independent and identically distributed exponential random variables. Show that the probability that the largest of them is greater than the sum of the others is n/2^{n−1}. That is, if M = max_j Xj, then show that

P{ M > Σ_{i=1}^{n} Xi − M } = n/2^{n−1}

Hint: What is P{X1 > Σ_{i=2}^{n} Xi}?

92. Prove Equation (5.22).

93. Prove that
(a) max(X1, X2) = X1 + X2 − min(X1, X2) and, in general,
(b) max(X1, . . . , Xn) = Σ_{i=1}^{n} Xi − ΣΣ_{i<j} min(Xi, Xj) + ΣΣΣ_{i<j<k} min(Xi, Xj, Xk) + · · · + (−1)^{n−1} min(X1, . . . , Xn)

*94. Let X denote the distance from a fixed point to its nearest event for a two-dimensional Poisson process with rate λ. Show that
(a) P{X > t} = e^{−λπt²},
(b) E[X] = 1/(2√λ).

95. Let {N(t), t ≥ 0} be a conditional Poisson process with a random rate L.
(a) Derive an expression for E[L | N(t) = n].
(b) Find, for s > t, E[N(s) | N(t) = n].
(c) Find, for s < t, E[N(s) | N(t) = n].

96. For the conditional Poisson process, let m1 = E[L], m2 = E[L²]. In terms of m1 and m2, find Cov(N(s), N(t)) for s ≤ t.

97. Consider a conditional Poisson process in which the rate L is, as in Example 5.29, gamma distributed with parameters m and p. Find the conditional density function of L given that N(t) = n.

98. Let M(t) = E[D(t)] in Example 5.21.
(a) Show that M(t + h) = M(t) + e^{−αt}λhµ + o(h)
(b) Use (a) to show that M′(t) = λµe^{−αt}
(c) Show that M(t) = (λµ/α)(1 − e^{−αt})

99. Let X be the time between the first and the second event of a Hawkes process with mark distribution F. Find P(X > t).



Continuous-Time Markov Chains

6.1 Introduction

In this chapter we consider a class of probability models that has a wide variety of applications in the real world. The members of this class are the continuous-time analogs of the Markov chains of Chapter 4 and as such are characterized by the Markovian property that, given the present state, the future is independent of the past.

One example of a continuous-time Markov chain has already been met. This is the Poisson process of Chapter 5. For if we let the total number of arrivals by time t (that is, N(t)) be the state of the process at time t, then the Poisson process is a continuous-time Markov chain having states 0, 1, 2, . . . that always proceeds from state n to state n + 1, where n ≥ 0. Such a process is known as a pure birth process since when a transition occurs the state of the system is always increased by one. More generally, an exponential model that can go (in one transition) only from state n to either state n − 1 or state n + 1 is called a birth and death model. For such a model, transitions from state n to state n + 1 are designated as births, and those from n to n − 1 as deaths. Birth and death models have wide applicability in the study of biological systems and in the study of waiting line systems in which the state represents the number of customers in the system. These models will be studied extensively in this chapter.

In Section 6.2 we define continuous-time Markov chains and then relate them to the discrete-time Markov chains of Chapter 4. In Section 6.3 we consider birth and death processes and in Section 6.4 we derive two sets of differential equations—the forward and backward equations—that describe the probability laws for the system. The material in Section 6.5 is concerned with determining the limiting (or long-run) probabilities connected with a continuous-time Markov chain. In Section 6.6 we consider the topic of time reversibility. We show that all birth and death processes are time reversible, and then illustrate the importance of this observation to queueing systems. In the final section we show how to “uniformize” Markov chains, a technique useful for numerical computations.

6.2 Continuous-Time Markov Chains

Suppose we have a continuous-time stochastic process {X(t), t ≥ 0} taking on values in the set of nonnegative integers. In analogy with the definition of a discrete-time Markov chain, given in Chapter 4, we say that the process {X(t), t ≥ 0} is a continuous-time Markov chain if for all s, t ≥ 0 and nonnegative integers i, j, x(u), 0 ≤ u < s,

P{X(t + s) = j | X(s) = i, X(u) = x(u), 0 ≤ u < s} = P{X(t + s) = j | X(s) = i}

In other words, a continuous-time Markov chain is a stochastic process having the Markovian property that the conditional distribution of the future X(t + s), given the present X(s) and the past X(u), 0 ≤ u < s, depends only on the present and is independent of the past. If, in addition,

P{X(t + s) = j | X(s) = i}

is independent of s, then the continuous-time Markov chain is said to have stationary or homogeneous transition probabilities. All Markov chains considered in this text will be assumed to have stationary transition probabilities. Suppose that a continuous-time Markov chain enters state i at some time, say, time 0, and suppose that the process does not leave state i (that is, a transition does not occur) during the next ten minutes. What is the probability that the process will not leave state i during the following five minutes? Since the process is in state i at time 10 it follows, by the Markovian property, that the probability that it remains in that state during the interval [10,15] is just the (unconditional) probability that it stays in state i for at least five minutes. That is, if we let Ti denote the amount of time that the process stays in state i before making a transition into a different state, then P{Ti > 15|Ti > 10} = P{Ti > 5}

or, in general, by the same reasoning,

P{Ti > s + t|Ti > s} = P{Ti > t}

for all s, t ≥ 0. Hence, the random variable Ti is memoryless and must thus (see Section 5.2.2) be exponentially distributed. In fact, the preceding gives us another way of defining a continuous-time Markov chain. Namely, it is a stochastic process having the properties that each time it enters state i

(i) the amount of time it spends in that state before making a transition into a different state is exponentially distributed with mean, say, 1/vi, and


(ii) when the process leaves state i, it next enters state j with some probability, say, Pij. Of course, the Pij must satisfy

Pii = 0, all i
Σ_j Pij = 1, all i

In other words, a continuous-time Markov chain is a stochastic process that moves from state to state in accordance with a (discrete-time) Markov chain, but is such that the amount of time it spends in each state, before proceeding to the next state, is exponentially distributed. In addition, the amount of time the process spends in state i, and the next state visited, must be independent random variables. For if the next state visited were dependent on Ti, then information as to how long the process has already been in state i would be relevant to the prediction of the next state—and this contradicts the Markovian assumption.

Example 6.1 (A Shoe Shine Shop) Consider a shoe shine establishment consisting of two chairs—chair 1 and chair 2. A customer upon arrival goes initially to chair 1 where his shoes are cleaned and polish is applied. After this is done the customer moves on to chair 2 where the polish is buffed. The service times at the two chairs are assumed to be independent random variables that are exponentially distributed with respective rates µ1 and µ2. Suppose that potential customers arrive in accordance with a Poisson process having rate λ, and that a potential customer will enter the system only if both chairs are empty.

The preceding model can be analyzed as a continuous-time Markov chain, but first we must decide upon an appropriate state space. Since a potential customer will enter the system only if there are no other customers present, it follows that there will always either be 0 or 1 customers in the system. However, if there is 1 customer in the system, then we would also need to know which chair he was presently in. Hence, an appropriate state space might consist of the three states 0, 1, and 2, where the states have the following interpretation:

State  Interpretation
0      system is empty
1      a customer is in chair 1
2      a customer is in chair 2

We leave it as an exercise for you to verify that

v0 = λ, v1 = µ1, v2 = µ2,
P01 = P12 = P20 = 1    ■
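For readers who like to experiment, the chain of Example 6.1 is easy to simulate: from state 0 it jumps to 1, from 1 to 2, and from 2 back to 0, with exponential holding times. The following is a minimal Python sketch (the rate values and the function name time_in_states are our own illustrative choices, not from the text) that estimates the long-run proportion of time spent in each state:

import random

# Sketch of the shoe shine chain of Example 6.1:
# states 0, 1, 2 with v0 = lam, v1 = mu1, v2 = mu2 and P01 = P12 = P20 = 1.
# The rates below are illustrative choices, not values from the text.
lam, mu1, mu2 = 2.0, 3.0, 5.0
rate = {0: lam, 1: mu1, 2: mu2}
nxt = {0: 1, 1: 2, 2: 0}              # deterministic next state

def time_in_states(horizon, seed=1):
    """Return the fraction of [0, horizon] spent in each state."""
    rng = random.Random(seed)
    t, state = 0.0, 0
    occ = {0: 0.0, 1: 0.0, 2: 0.0}
    while t < horizon:
        stay = rng.expovariate(rate[state])   # exponential holding time
        occ[state] += min(stay, horizon - t)
        t += stay
        state = nxt[state]
    return {s: occ[s] / horizon for s in occ}

print(time_in_states(100000.0))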

6.3 Birth and Death Processes

Consider a system whose state at any time is represented by the number of people in the system at that time. Suppose that whenever there are n people in the system, then (i) new


arrivals enter the system at an exponential rate λn, and (ii) people leave the system at an exponential rate µn. That is, whenever there are n persons in the system, then the time until the next arrival is exponentially distributed with mean 1/λn and is independent of the time until the next departure, which is itself exponentially distributed with mean 1/µn. Such a system is called a birth and death process. The parameters {λn}_{n=0}^{∞} and {µn}_{n=1}^{∞} are called, respectively, the arrival (or birth) and departure (or death) rates.

Thus, a birth and death process is a continuous-time Markov chain with states {0, 1, . . .} for which transitions from state n may go only to either state n − 1 or state n + 1. The relationships between the birth and death rates and the state transition rates and probabilities are

v0 = λ0,
vi = λi + µi,  i > 0
P01 = 1,
P_{i,i+1} = λi/(λi + µi),  i > 0
P_{i,i−1} = µi/(λi + µi),  i > 0

The preceding follows, because if there are i in the system, then the next state will be i + 1 if a birth occurs before a death, and the probability that an exponential random variable with rate λi will occur earlier than an (independent) exponential with rate µi is λi/(λi + µi). Moreover, the time until either a birth or a death occurs is exponentially distributed with rate λi + µi (and so, vi = λi + µi).

Example 6.2 (The Poisson Process) Consider a birth and death process for which

µn = 0, for all n ≥ 0
λn = λ, for all n ≥ 0

This is a process in which departures never occur, and the time between successive arrivals is exponential with mean 1/λ. Hence, this is just the Poisson process. ■

A birth and death process for which µn = 0 for all n is called a pure birth process. Another pure birth process is given by the next example.

Example 6.3 (A Birth Process with Linear Birth Rate) Consider a population whose members can give birth to new members but cannot die. If each member acts independently of the others and takes an exponentially distributed amount of time, with mean 1/λ, to give birth, and if X(t) is the population size at time t, then {X(t), t ≥ 0} is a pure birth process with λn = nλ, n ≥ 0. This follows since if the population consists of n persons and each gives birth at an exponential rate λ, then the total rate at which births occur is nλ. This pure birth process is known as a Yule process after G. Yule, who used it in his mathematical theory of evolution. ■


Example 6.4 (A Linear Growth Model with Immigration) A model in which

µn = nµ,  n ≥ 1
λn = nλ + θ,  n ≥ 0

is called a linear growth process with immigration. Such processes occur naturally in the study of biological reproduction and population growth. Each individual in the population is assumed to give birth at an exponential rate λ; in addition, there is an exponential rate of increase θ of the population due to an external source such as immigration. Hence, the total birth rate when there are n persons in the system is nλ + θ. Deaths are assumed to occur at an exponential rate µ for each member of the population, so µn = nµ.

Let X(t) denote the population size at time t. Suppose that X(0) = i and let

M(t) = E[X(t)]

We will determine M(t) by deriving and then solving a differential equation that it satisfies. We start by deriving an equation for M(t + h) by conditioning on X(t). This yields

M(t + h) = E[X(t + h)] = E[E[X(t + h) | X(t)]]

Now, given the size of the population at time t then, ignoring events whose probability is o(h), the population at time t + h will either increase in size by 1 if a birth or an immigration occurs in (t, t + h), or decrease by 1 if a death occurs in this interval, or remain the same if neither of these two possibilities occurs. That is, given X(t),

X(t + h) = X(t) + 1, with probability [θ + X(t)λ]h + o(h)
         = X(t) − 1, with probability X(t)µh + o(h)
         = X(t),     with probability 1 − [θ + X(t)λ + X(t)µ]h + o(h)

Therefore,

E[X(t + h) | X(t)] = X(t) + [θ + X(t)λ − X(t)µ]h + o(h)

Taking expectations yields

M(t + h) = M(t) + (λ − µ)M(t)h + θh + o(h)

or, equivalently,

[M(t + h) − M(t)]/h = (λ − µ)M(t) + θ + o(h)/h

Taking the limit as h → 0 yields the differential equation

M′(t) = (λ − µ)M(t) + θ    (6.1)


If we now define the function h(t) by

h(t) = (λ − µ)M(t) + θ

then

h′(t) = (λ − µ)M′(t)

Therefore, Differential Equation (6.1) can be rewritten as

h′(t)/(λ − µ) = h(t)

or

h′(t)/h(t) = λ − µ

Integration yields

log[h(t)] = (λ − µ)t + c

or

h(t) = Ke^{(λ−µ)t}

Putting this back in terms of M(t) gives

θ + (λ − µ)M(t) = Ke^{(λ−µ)t}

To determine the value of the constant K, we use the fact that M(0) = i and evaluate the preceding at t = 0. This gives

θ + (λ − µ)i = K

Substituting this back in the preceding equation for M(t) yields the following solution for M(t):

M(t) = [θ/(λ − µ)][e^{(λ−µ)t} − 1] + ie^{(λ−µ)t}

Note that we have implicitly assumed that λ ≠ µ. If λ = µ, then Differential Equation (6.1) reduces to

M′(t) = θ    (6.2)

Integrating (6.2) and using that M(0) = i gives the solution

M(t) = θt + i    ■
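The formula for M(t) is easy to check by simulation. Below is a rough Monte Carlo sketch, assuming illustrative parameter values (all names and values are our own choices); it simulates the linear growth model with immigration up to a fixed time and compares the sample mean of X(t) with the formula:

import math
import random

# Monte Carlo check of M(t) = (theta/(lam-mu))(e^{(lam-mu)t} - 1) + i e^{(lam-mu)t}
# for the linear growth model with immigration. Parameters are illustrative.
lam, mu, theta, i0, t_end = 1.0, 0.5, 0.4, 3, 2.0

def simulate(seed):
    rng = random.Random(seed)
    t, n = 0.0, i0
    while True:
        birth, death = n * lam + theta, n * mu
        t += rng.expovariate(birth + death)   # time of next event
        if t > t_end:
            return n
        # the next event is a birth with probability birth/(birth+death)
        n += 1 if rng.random() < birth / (birth + death) else -1

est = sum(simulate(s) for s in range(20000)) / 20000
exact = theta / (lam - mu) * (math.exp((lam - mu) * t_end) - 1) \
        + i0 * math.exp((lam - mu) * t_end)
print(est, exact)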


Example 6.5 (The Queueing System M/M/1) Suppose that customers arrive at a single-server service station in accordance with a Poisson process having rate λ. That is, the times between successive arrivals are independent exponential random variables having mean 1/λ. Upon arrival, each customer goes directly into service if the server is free; if not, then the customer joins the queue (that is, he waits in line). When the server finishes serving a customer, the customer leaves the system and the next customer in line, if there are any waiting, enters the service. The successive service times are assumed to be independent exponential random variables having mean 1/µ.

The preceding is known as the M/M/1 queueing system. The first M refers to the fact that the interarrival process is Markovian (since it is a Poisson process) and the second to the fact that the service distribution is exponential (and, hence, Markovian). The 1 refers to the fact that there is a single server. If we let X(t) denote the number in the system at time t then {X(t), t ≥ 0} is a birth and death process with

µn = µ,  n ≥ 1
λn = λ,  n ≥ 0    ■

Example 6.6 (A Multiserver Exponential Queueing System) Consider an exponential queueing system in which there are s servers available, each serving at rate µ. An entering customer first waits in line and then goes to the first free server. This is a birth and death process with parameters

µn = nµ, 1 ≤ n ≤ s
   = sµ,  n > s
λn = λ,  n ≥ 0

To see why this is true, reason as follows: If there are n customers in the system, where n ≤ s, then n servers will be busy. Since each of these servers works at rate µ, the total departure rate will be nµ. On the other hand, if there are n customers in the system, where n > s, then all s of the servers will be busy, and thus the total departure rate will be sµ. This is known as an M/M/s queueing model. ■

Consider now a general birth and death process with birth rates {λn} and death rates {µn}, where µ0 = 0, and let Ti denote the time, starting from state i, it takes for the process to enter state i + 1, i ≥ 0. We will recursively compute E[Ti], i ≥ 0, by starting with i = 0. Since T0 is exponential with rate λ0, we have

E[T0] = 1/λ0

For i > 0, we condition on whether the first transition takes the process into state i − 1 or i + 1. That is, let

Ii = 1, if the first transition from i is to i + 1
   = 0, if the first transition from i is to i − 1


and note that

E[Ti | Ii = 1] = 1/(λi + µi),
E[Ti | Ii = 0] = 1/(λi + µi) + E[T_{i−1}] + E[Ti]    (6.3)

This follows since, independent of whether the first transition is from a birth or death, the time until it occurs is exponential with rate λi + µi; if this first transition is a birth, then the population size is at i + 1, so no additional time is needed; whereas if it is a death, then the population size becomes i − 1 and the additional time needed to reach i + 1 is equal to the time it takes to return to state i (this has mean E[T_{i−1}]) plus the additional time it then takes to reach i + 1 (this has mean E[Ti]). Hence, since the probability that the first transition is a birth is λi/(λi + µi), we see that

E[Ti] = 1/(λi + µi) + [µi/(λi + µi)](E[T_{i−1}] + E[Ti])

or, equivalently,

E[Ti] = 1/λi + (µi/λi)E[T_{i−1}],  i ≥ 1

Starting with E[T0] = 1/λ0, the preceding yields an efficient method to successively compute E[T1], E[T2], and so on.

Suppose now that we wanted to determine the expected time to go from state i to state j where i < j. This can be accomplished using the preceding by noting that this quantity will equal E[Ti] + E[T_{i+1}] + · · · + E[T_{j−1}].

Example 6.7 For the birth and death process having parameters λi ≡ λ, µi ≡ µ,

E[Ti] = 1/λ + (µ/λ)E[T_{i−1}]
      = (1/λ)(1 + µE[T_{i−1}])

Starting with E[T0] = 1/λ, we see that

E[T1] = (1/λ)(1 + µ/λ),
E[T2] = (1/λ)[1 + µ/λ + (µ/λ)²]

and, in general,

E[Ti] = (1/λ)[1 + µ/λ + (µ/λ)² + · · · + (µ/λ)^i]
      = [1 − (µ/λ)^{i+1}]/(λ − µ),  i ≥ 0


The expected time to reach state j, starting at state k, k < j, is

E[time to go from k to j] = Σ_{i=k}^{j−1} E[Ti]
 = (j − k)/(λ − µ) − [(µ/λ)^{k+1}/(λ − µ)] · [1 − (µ/λ)^{j−k}]/(1 − µ/λ)

The foregoing assumes that λ ≠ µ. If λ = µ, then

E[Ti] = (i + 1)/λ,
E[time to go from k to j] = [j(j + 1) − k(k + 1)]/(2λ)

■

We can also compute the variance of the time to go from 0 to i + 1 by utilizing the conditional variance formula. First note that Equation (6.3) can be written as

E[Ti | Ii] = 1/(λi + µi) + (1 − Ii)(E[T_{i−1}] + E[Ti])

Thus,

Var(E[Ti | Ii]) = (E[T_{i−1}] + E[Ti])² Var(Ii)
               = (E[T_{i−1}] + E[Ti])² µiλi/(µi + λi)²    (6.4)

where Var(Ii) is as shown since Ii is a Bernoulli random variable with parameter p = λi/(λi + µi). Also, note that if we let Xi denote the time until the transition from i occurs, then

Var(Ti | Ii = 1) = Var(Xi | Ii = 1) = Var(Xi) = 1/(λi + µi)²    (6.5)

where the preceding uses the fact that the time until transition is independent of the next state visited. Also,

Var(Ti | Ii = 0) = Var(Xi + time to get back to i + time to then reach i + 1)
                = Var(Xi) + Var(T_{i−1}) + Var(Ti)    (6.6)

where the foregoing uses the fact that the three random variables are independent. We can rewrite Equations (6.5) and (6.6) as

Var(Ti | Ii) = Var(Xi) + (1 − Ii)[Var(T_{i−1}) + Var(Ti)]

so

E[Var(Ti | Ii)] = 1/(µi + λi)² + [µi/(µi + λi)][Var(T_{i−1}) + Var(Ti)]    (6.7)


Hence, using the conditional variance formula, which states that Var(Ti) is the sum of Equations (6.7) and (6.4), we obtain

Var(Ti) = 1/(µi + λi)² + [µi/(µi + λi)][Var(T_{i−1}) + Var(Ti)] + [µiλi/(µi + λi)²](E[T_{i−1}] + E[Ti])²

or, equivalently,

Var(Ti) = 1/[λi(λi + µi)] + (µi/λi)Var(T_{i−1}) + [µi/(µi + λi)](E[T_{i−1}] + E[Ti])²

Starting with Var(T0) = 1/λ0² and using the former recursion to obtain the expectations, we can recursively compute Var(Ti). In addition, if we want the variance of the time to reach state j, starting from state k, k < j, then this can be expressed as the time to go from k to k + 1 plus the additional time to go from k + 1 to k + 2, and so on. Since, by the Markovian property, these successive random variables are independent, it follows that

Var(time to go from k to j) = Σ_{i=k}^{j−1} Var(Ti)
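Both recursions are straightforward to implement. The following Python sketch (hitting_time_moments is our own function name; the example rates are illustrative choices) computes E[Ti] and Var(Ti) and, as a check, compares E[Ti] with the closed form of Example 6.7:

# Recursions for E[T_i] and Var(T_i) derived above, for birth rates
# lam[i] and death rates mu[i] (with mu[0] = 0).
def hitting_time_moments(lam, mu):
    """Return lists ET, VT with ET[i] = E[T_i] and VT[i] = Var(T_i)."""
    n = len(lam)
    ET, VT = [0.0] * n, [0.0] * n
    ET[0] = 1.0 / lam[0]
    VT[0] = 1.0 / lam[0] ** 2
    for i in range(1, n):
        ET[i] = 1.0 / lam[i] + mu[i] / lam[i] * ET[i - 1]
        s = ET[i - 1] + ET[i]
        VT[i] = (1.0 / (lam[i] * (lam[i] + mu[i]))
                 + mu[i] / lam[i] * VT[i - 1]
                 + mu[i] / (mu[i] + lam[i]) * s * s)
    return ET, VT

# Illustrative check: constant rates lam_i = 2, mu_i = 1, where Example 6.7
# gives E[T_i] = (1 - (mu/lam)^{i+1})/(lam - mu).
lam = [2.0] * 6
mu = [0.0] + [1.0] * 5
ET, VT = hitting_time_moments(lam, mu)
print(ET)
print([(1 - 0.5 ** (i + 1)) / (2.0 - 1.0) for i in range(6)])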

6.4 The Transition Probability Function Pij(t)

Let

Pij(t) = P{X(t + s) = j | X(s) = i}

denote the probability that a process presently in state i will be in state j a time t later. These quantities are often called the transition probabilities of the continuous-time Markov chain.

We can explicitly determine Pij(t) in the case of a pure birth process having distinct birth rates. For such a process, let Xk denote the time the process spends in state k before making a transition into state k + 1, k ≥ 1. Suppose that the process is presently in state i, and let j > i. Then, as Xi is the time it spends in state i before moving to state i + 1, and X_{i+1} is the time it then spends in state i + 1 before moving to state i + 2, and so on, it follows that Σ_{k=i}^{j−1} Xk is the time it takes until the process enters state j. Now, if the process has not yet entered state j by time t, then its state at time t is smaller than j, and vice versa. That is,

X(t) < j ⇔ Xi + · · · + X_{j−1} > t

Therefore, for i < j, we have for a pure birth process that

P{X(t) < j | X(0) = i} = P{ Σ_{k=i}^{j−1} Xk > t }


However, since Xi, . . . , X_{j−1} are independent exponential random variables with respective rates λi, . . . , λ_{j−1}, we obtain from the preceding and Equation (5.9), which gives the tail distribution function of Σ_{k=i}^{j−1} Xk, that

P{X(t) < j | X(0) = i} = Σ_{k=i}^{j−1} e^{−λ_k t} Π_{r=i, r≠k}^{j−1} λ_r/(λ_r − λ_k)

Replacing j by j + 1 in the preceding gives

P{X(t) < j + 1 | X(0) = i} = Σ_{k=i}^{j} e^{−λ_k t} Π_{r=i, r≠k}^{j} λ_r/(λ_r − λ_k)

Since

P{X(t) = j | X(0) = i} = P{X(t) < j + 1 | X(0) = i} − P{X(t) < j | X(0) = i}

and since Pii(t) = P{Xi > t} = e^{−λ_i t}, we have shown the following.

Proposition 6.1 For a pure birth process having λi ≠ λj when i ≠ j,

Pij(t) = Σ_{k=i}^{j} e^{−λ_k t} Π_{r=i, r≠k}^{j} λ_r/(λ_r − λ_k) − Σ_{k=i}^{j−1} e^{−λ_k t} Π_{r=i, r≠k}^{j−1} λ_r/(λ_r − λ_k),  i < j

Pii(t) = e^{−λ_i t}

Example 6.8 Consider the Yule process, which is a pure birth process in which each individual in the population independently gives birth at rate λ, and so λn = nλ, n ≥ 1. Letting i = 1, we obtain from Proposition 6.1

P1j(t) = Σ_{k=1}^{j} e^{−kλt} Π_{r=1, r≠k}^{j} r/(r − k) − Σ_{k=1}^{j−1} e^{−kλt} Π_{r=1, r≠k}^{j−1} r/(r − k)
 = e^{−jλt} Π_{r=1}^{j−1} r/(r − j) + Σ_{k=1}^{j−1} e^{−kλt} [ Π_{r=1, r≠k}^{j} r/(r − k) − Π_{r=1, r≠k}^{j−1} r/(r − k) ]
 = e^{−jλt}(−1)^{j−1} + Σ_{k=1}^{j−1} e^{−kλt} [ j/(j − k) − 1 ] Π_{r=1, r≠k}^{j−1} r/(r − k)

Now,

[k/(j − k)] Π_{r=1, r≠k}^{j−1} r/(r − k) = (j − 1)!/[(1 − k)(2 − k) · · · (k − 1 − k)(j − k)!]
 = (−1)^{k−1} ( j−1 choose k−1 )

so

P1j(t) = Σ_{k=1}^{j} ( j−1 choose k−1 ) e^{−kλt} (−1)^{k−1}
 = e^{−λt} Σ_{i=0}^{j−1} ( j−1 choose i ) e^{−iλt} (−1)^{i}
 = e^{−λt}(1 − e^{−λt})^{j−1}

Thus, starting with a single individual, the population size at time t has a geometric distribution with mean e^{λt}. If the population starts with i individuals, then we can regard each of these individuals as starting her own independent Yule process, and so the population at time t will be the sum of i independent and identically distributed geometric random variables with parameter e^{−λt}. But this means that the conditional distribution of X(t), given that X(0) = i, is the same as the distribution of the number of times that a coin that lands heads on each flip with probability e^{−λt} must be flipped to amass a total of i heads. Hence, the population size at time t has a negative binomial distribution with parameters i and e^{−λt}, so

Pij(t) = ( j−1 choose i−1 ) e^{−iλt} (1 − e^{−λt})^{j−i},  j ≥ i ≥ 1

(We could, of course, have used Proposition 6.1 to immediately obtain an equation for Pij(t), rather than just using it for P1j(t), but the algebra that would have then been needed to show the equivalence of the resulting expression to the preceding result is somewhat involved.) ■
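As a quick numerical sanity check (ours, not the text's), one can evaluate Proposition 6.1 directly for the Yule rates λn = nλ and compare with the closed form P1j(t) = e^{−λt}(1 − e^{−λt})^{j−1} just derived. A small Python sketch with illustrative values of λ and t:

import math

# Check that Proposition 6.1, applied to the Yule process lam_n = n*lam,
# reproduces P_{1j}(t) = e^{-lam t}(1 - e^{-lam t})^{j-1}.
lam, t = 0.7, 1.3

def p1j_prop61(j):
    """P_{1j}(t) for the pure birth process with rates k*lam via Prop. 6.1."""
    def term(k, top):
        prod = 1.0
        for r in range(1, top + 1):
            if r != k:
                prod *= (r * lam) / (r * lam - k * lam)
        return math.exp(-k * lam * t) * prod
    if j == 1:
        return math.exp(-lam * t)          # P_ii(t) = e^{-lam_i t}
    return sum(term(k, j) for k in range(1, j + 1)) \
         - sum(term(k, j - 1) for k in range(1, j))

for j in range(1, 6):
    closed = math.exp(-lam * t) * (1 - math.exp(-lam * t)) ** (j - 1)
    print(j, p1j_prop61(j), closed)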

We shall now derive a set of differential equations that the transition probabilities Pij(t) satisfy in a general continuous-time Markov chain. However, first we need a definition and a pair of lemmas. For any pair of states i and j, let

qij = vi Pij

Since vi is the rate at which the process makes a transition when in state i and Pij is the probability that this transition is into state j, it follows that qij is the rate, when in state i, at which the process makes a transition into state j. The quantities qij are called the instantaneous transition rates. Since

vi = Σ_j vi Pij = Σ_j qij

and

Pij = qij/vi = qij / Σ_j qij

it follows that specifying the instantaneous transition rates determines the parameters of the continuous-time Markov chain.


Lemma 6.2

(a) lim_{h→0} [1 − Pii(h)]/h = vi
(b) lim_{h→0} Pij(h)/h = qij,  when i ≠ j

Proof. We first note that since the amount of time until a transition occurs is exponentially distributed it follows that the probability of two or more transitions in a time h is o(h). Thus, 1 − Pii(h), the probability that a process in state i at time 0 will not be in state i at time h, equals the probability that a transition occurs within time h plus something small compared to h. Therefore,

1 − Pii(h) = vi h + o(h)

and part (a) is proven. To prove part (b), we note that Pij(h), the probability that the process goes from state i to state j in a time h, equals the probability that a transition occurs in this time multiplied by the probability that the transition is into state j, plus something small compared to h. That is,

Pij(h) = h vi Pij + o(h)

and part (b) is proven. ■

Lemma 6.3 For all s ≥ 0, t ≥ 0,

Pij(t + s) = Σ_{k=0}^{∞} Pik(t) Pkj(s)    (6.8)

Proof. In order for the process to go from state i to state j in time t + s, it must be somewhere at time t and thus

Pij(t + s) = P{X(t + s) = j | X(0) = i}
 = Σ_{k=0}^{∞} P{X(t + s) = j, X(t) = k | X(0) = i}
 = Σ_{k=0}^{∞} P{X(t + s) = j | X(t) = k, X(0) = i} · P{X(t) = k | X(0) = i}
 = Σ_{k=0}^{∞} P{X(t + s) = j | X(t) = k} · P{X(t) = k | X(0) = i}
 = Σ_{k=0}^{∞} Pkj(s) Pik(t)

and the proof is completed. ■


The set of Equations (6.8) is known as the Chapman–Kolmogorov equations. From Lemma 6.3, we obtain

Pij(h + t) − Pij(t) = Σ_{k=0}^{∞} Pik(h) Pkj(t) − Pij(t)
 = Σ_{k≠i} Pik(h) Pkj(t) − [1 − Pii(h)] Pij(t)

and thus

lim_{h→0} [Pij(t + h) − Pij(t)]/h = lim_{h→0} { Σ_{k≠i} [Pik(h)/h] Pkj(t) − [(1 − Pii(h))/h] Pij(t) }

Now, assuming that we can interchange the limit and the summation in the preceding and applying Lemma 6.2, we obtain

P′ij(t) = Σ_{k≠i} qik Pkj(t) − vi Pij(t)

It turns out that this interchange can indeed be justified and, hence, we have the following theorem.

Theorem 6.1 (Kolmogorov's Backward Equations) For all states i, j, and times t ≥ 0,

P′ij(t) = Σ_{k≠i} qik Pkj(t) − vi Pij(t)

Example 6.9 The backward equations for the pure birth process become

P′ij(t) = λi P_{i+1,j}(t) − λi Pij(t)    ■

Example 6.10 The backward equations for the birth and death process become

P′0j(t) = λ0 P1j(t) − λ0 P0j(t),
P′ij(t) = (λi + µi)[ (λi/(λi + µi)) P_{i+1,j}(t) + (µi/(λi + µi)) P_{i−1,j}(t) ] − (λi + µi) Pij(t)

or equivalently,

P′0j(t) = λ0[P1j(t) − P0j(t)],
P′ij(t) = λi P_{i+1,j}(t) + µi P_{i−1,j}(t) − (λi + µi) Pij(t),  i > 0    (6.9)  ■


Example 6.11 (A Continuous-Time Markov Chain Consisting of Two States) Consider a machine that works for an exponential amount of time having mean 1/λ before breaking down; and suppose that it takes an exponential amount of time having mean 1/µ to repair the machine. If the machine is in working condition at time 0, then what is the probability that it will be working at time t = 10?

To answer this question, we note that the process is a birth and death process (with state 0 meaning that the machine is working and state 1 that it is being repaired) having parameters

λ0 = λ, λi = 0, i ≠ 0,
µ1 = µ, µi = 0, i ≠ 1

We shall derive the desired probability, namely, P00(10), by solving the set of differential equations given in Example 6.10. From Equation (6.9), we obtain

P′00(t) = λ[P10(t) − P00(t)]    (6.10)
P′10(t) = µP00(t) − µP10(t)    (6.11)

Multiplying Equation (6.10) by µ and Equation (6.11) by λ and then adding the two equations yields

µP′00(t) + λP′10(t) = 0

By integrating, we obtain

µP00(t) + λP10(t) = c

However, since P00(0) = 1 and P10(0) = 0, we obtain c = µ and hence,

µP00(t) + λP10(t) = µ    (6.12)

or equivalently,

λP10(t) = µ[1 − P00(t)]

By substituting this result in Equation (6.10), we obtain

P′00(t) = µ[1 − P00(t)] − λP00(t)
 = µ − (µ + λ)P00(t)

Letting

h(t) = P00(t) − µ/(µ + λ)

we have

h′(t) = µ − (µ + λ)[h(t) + µ/(µ + λ)]
 = −(µ + λ)h(t)

By integrating both sides, we obtain

log h(t) = −(µ + λ)t + c

or

h(t) = Ke^{−(µ+λ)t}

and thus

P00(t) = Ke^{−(µ+λ)t} + µ/(µ + λ)

which finally yields, by setting t = 0 and using the fact that P00(0) = 1,

P00(t) = [λ/(µ + λ)]e^{−(µ+λ)t} + µ/(µ + λ)

From Equation (6.12), this also implies that

P10(t) = µ/(µ + λ) − [µ/(µ + λ)]e^{−(µ+λ)t}

Hence, our desired probability is as follows:

P00(10) = [λ/(µ + λ)]e^{−10(µ+λ)} + µ/(µ + λ)    ■
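Example 6.11 can also be checked numerically: the matrix of transition probabilities satisfies P(t) = e^{Qt}, where Q is the generator with rows (−λ, λ) and (µ, −µ). A sketch, assuming NumPy and SciPy are available and using illustrative rates of our own choosing:

import numpy as np
from scipy.linalg import expm

# Cross-check of Example 6.11: P00(t) from the matrix exponential of the
# generator should match the closed form derived above.
lam, mu, t = 0.25, 1.5, 10.0
Q = np.array([[-lam, lam],
              [mu, -mu]])            # state 0 = working, 1 = under repair
P = expm(Q * t)
closed = lam / (lam + mu) * np.exp(-(lam + mu) * t) + mu / (lam + mu)
print(P[0, 0], closed)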

Another set of differential equations, different from the backward equations, may also be derived. This set of equations, known as Kolmogorov's forward equations, is derived as follows. From the Chapman–Kolmogorov equations (Lemma 6.3), we have

Pij(t + h) − Pij(t) = Σ_{k=0}^{∞} Pik(t) Pkj(h) − Pij(t)
 = Σ_{k≠j} Pik(t) Pkj(h) − [1 − Pjj(h)] Pij(t)

and thus

lim_{h→0} [Pij(t + h) − Pij(t)]/h = lim_{h→0} { Σ_{k≠j} Pik(t) [Pkj(h)/h] − [(1 − Pjj(h))/h] Pij(t) }

and, assuming that we can interchange limit with summation, we obtain from Lemma 6.2

P′ij(t) = Σ_{k≠j} qkj Pik(t) − vj Pij(t)

Unfortunately, we cannot always justify the interchange of limit and summation and thus the preceding is not always valid. However, they do hold in most models, including all birth and death processes and all finite state models. We thus have the following.


Theorem 6.2 (Kolmogorov's Forward Equations) Under suitable regularity conditions,

P′ij(t) = Σ_{k≠j} qkj Pik(t) − vj Pij(t)    (6.13)

We shall now solve the forward equations for the pure birth process. For this process, Equation (6.13) reduces to

P′ij(t) = λ_{j−1} P_{i,j−1}(t) − λj Pij(t)

However, by noting that Pij(t) = 0 whenever j < i (since no deaths can occur), we can rewrite the preceding equation to obtain

P′ii(t) = −λi Pii(t),
P′ij(t) = λ_{j−1} P_{i,j−1}(t) − λj Pij(t),  j ≥ i + 1    (6.14)

Proposition 6.4 For a pure birth process,

Pii(t) = e^{−λ_i t},  i ≥ 0
Pij(t) = λ_{j−1} e^{−λ_j t} ∫_0^t e^{λ_j s} P_{i,j−1}(s) ds,  j ≥ i + 1

Proof. The fact that Pii(t) = e^{−λ_i t} follows from Equation (6.14) by integrating and using the fact that Pii(0) = 1. To prove the corresponding result for Pij(t), we note by Equation (6.14) that

e^{λ_j t}[P′ij(t) + λj Pij(t)] = e^{λ_j t} λ_{j−1} P_{i,j−1}(t)

or

d/dt [e^{λ_j t} Pij(t)] = λ_{j−1} e^{λ_j t} P_{i,j−1}(t)

Hence, since Pij(0) = 0, we obtain the desired results. ■

Example 6.12 (Forward Equations for Birth and Death Process) The forward equations (Equation (6.13)) for the general birth and death process become

P′i0(t) = Σ_{k≠0} qk0 Pik(t) − λ0 Pi0(t)
 = µ1 Pi1(t) − λ0 Pi0(t)    (6.15)

P′ij(t) = Σ_{k≠j} qkj Pik(t) − (λj + µj) Pij(t)
 = λ_{j−1} P_{i,j−1}(t) + µ_{j+1} P_{i,j+1}(t) − (λj + µj) Pij(t)    (6.16)  ■


6.5 Limiting Probabilities

In analogy with a basic result in discrete-time Markov chains, the probability that a continuous-time Markov chain will be in state j at time t often converges to a limiting value that is independent of the initial state. That is, if we call this value Pj, then

Pj ≡ lim_{t→∞} Pij(t)

where we are assuming that the limit exists and is independent of the initial state i. To derive a set of equations for the Pj, consider first the set of forward equations

P′ij(t) = Σ_{k≠j} qkj Pik(t) − vj Pij(t)    (6.17)

Now, if we let t approach ∞, then, assuming that we can interchange limit and summation, we obtain

lim_{t→∞} P′ij(t) = lim_{t→∞} [ Σ_{k≠j} qkj Pik(t) − vj Pij(t) ]
 = Σ_{k≠j} qkj Pk − vj Pj

However, as Pij(t) is a bounded function (being a probability it is always between 0 and 1), it follows that if P′ij(t) converges, then it must converge to 0 (why is this?). Hence, we must have

0 = Σ_{k≠j} qkj Pk − vj Pj

or

vj Pj = Σ_{k≠j} qkj Pk,  all states j    (6.18)

The preceding set of equations, along with the equation

Σ_j Pj = 1    (6.19)

can be used to solve for the limiting probabilities.

Remark (i) We have assumed that the limiting probabilities Pj exist. A sufficient condition for this is that

(a) all states of the Markov chain communicate in the sense that starting in state i there is a positive probability of ever being in state j, for all i, j, and


(b) the Markov chain is positive recurrent in the sense that, starting in any state, the mean time to return to that state is finite.

If conditions (a) and (b) hold, then the limiting probabilities will exist and satisfy Equations (6.18) and (6.19). In addition, Pj also will have the interpretation of being the long-run proportion of time that the process is in state j.

(ii) Equations (6.18) and (6.19) have a nice interpretation: In any interval (0, t) the number of transitions into state j must equal to within 1 the number of transitions out of state j (why?). Hence, in the long run, the rate at which transitions into state j occur must equal the rate at which transitions out of state j occur. When the process is in state j, it leaves at rate vj, and, as Pj is the proportion of time it is in state j, it thus follows that

vj Pj = rate at which the process leaves state j

Similarly, when the process is in state k, it enters j at a rate qkj. Hence, as Pk is the proportion of time in state k, we see that the rate at which transitions from k to j occur is just qkj Pk; thus

Σ_{k≠j} qkj Pk = rate at which the process enters state j

So, Equation (6.18) is just a statement of the equality of the rates at which the process enters and leaves state j. Because it balances (that is, equates) these rates, Equation (6.18) is sometimes referred to as a set of “balance equations.”

(iii) When the limiting probabilities Pj exist, we say that the chain is ergodic. The Pj are sometimes called stationary probabilities since it can be shown that (as in the discrete-time case) if the initial state is chosen according to the distribution {Pj}, then the probability of being in state j at time t is Pj, for all t.

Let us now determine the limiting probabilities for a birth and death process. From Equation (6.18), or, equivalently, by equating the rate at which the process leaves a state with the rate at which it enters that state, we obtain

State     Rate at which leave = rate at which enter
0         λ0 P0 = µ1 P1
1         (λ1 + µ1)P1 = µ2 P2 + λ0 P0
2         (λ2 + µ2)P2 = µ3 P3 + λ1 P1
n, n ≥ 1  (λn + µn)Pn = µ_{n+1} P_{n+1} + λ_{n−1} P_{n−1}

By adding to each equation the equation preceding it, we obtain

λ0 P0 = µ1 P1,
λ1 P1 = µ2 P2,
λ2 P2 = µ3 P3,
 ⋮
λn Pn = µ_{n+1} P_{n+1},  n ≥ 0


Solving in terms of P0 yields

P1 = (λ0/µ1)P0,
P2 = (λ1/µ2)P1 = [λ1λ0/(µ2µ1)]P0,
P3 = (λ2/µ3)P2 = [λ2λ1λ0/(µ3µ2µ1)]P0,
 ⋮
Pn = (λ_{n−1}/µn)P_{n−1} = [λ_{n−1}λ_{n−2} · · · λ1λ0/(µnµ_{n−1} · · · µ2µ1)]P0

And by using the fact that Σ_{n=0}^{∞} Pn = 1, we obtain

1 = P0 + P0 Σ_{n=1}^{∞} (λ_{n−1} · · · λ1λ0)/(µn · · · µ2µ1)

or

P0 = 1/[1 + Σ_{n=1}^{∞} (λ0λ1 · · · λ_{n−1})/(µ1µ2 · · · µn)]

and so

Pn = (λ0λ1 · · · λ_{n−1})/(µ1µ2 · · · µn) · P0,  n ≥ 1    (6.20)

The foregoing equations also show us what condition is necessary for these limiting probabilities to exist. Namely, it is necessary that

Σ_{n=1}^{∞} (λ0λ1 · · · λ_{n−1})/(µ1µ2 · · · µn) < ∞
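Equation (6.20) is easy to evaluate numerically by truncating the state space at a large level N. A minimal Python sketch (the function name and the truncation level are our own illustrative choices):

# Limiting probabilities (6.20) of a birth and death process, computed by
# truncating the state space at level N (chosen so the tail is negligible).
def bd_limiting_probs(lam_fn, mu_fn, N):
    """P_n, 0 <= n <= N, for birth rates lam_fn(n) and death rates mu_fn(n)."""
    prods = [1.0]
    for n in range(1, N + 1):
        prods.append(prods[-1] * lam_fn(n - 1) / mu_fn(n))
    total = sum(prods)
    return [p / total for p in prods]

# Illustrative check: M/M/1 with lam = 1, mu = 2, where P_n = (1/2)^n (1/2).
probs = bd_limiting_probs(lambda n: 1.0, lambda n: 2.0, 60)
print(probs[:5])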

k in the system, then only k of the n will be in service, and so the departure rate in this case is kµ. Hence, the M/M/k is a birth and death queueing model with arrival rates λn = λ, n ≥ 0, and departure rates

µn = nµ, if 1 ≤ n ≤ k
   = kµ, if n > k    ■


To analyze the general birth and death queueing model, let Pn denote the long-run proportion of time there are n in the system. Then, either as a consequence of the balance equations

State   Rate at which process leaves = rate at which process enters
n = 0   λ0 P0 = µ1 P1
n ≥ 1   (λn + µn)Pn = λ_{n−1} P_{n−1} + µ_{n+1} P_{n+1}

or by directly using the result that the rate at which arrivals find n in the system is equal to the rate at which departures leave behind n, we obtain

λn Pn = µ_{n+1} P_{n+1},  n ≥ 0

or, equivalently, that

P_{n+1} = (λn/µ_{n+1}) Pn,  n ≥ 0

Thus,

P0 = P0,
P1 = (λ0/µ1)P0,
P2 = (λ1/µ2)P1 = [λ1λ0/(µ2µ1)]P0,
P3 = (λ2/µ3)P2 = [λ2λ1λ0/(µ3µ2µ1)]P0

and, in general,

Pn = (λ0λ1 · · · λ_{n−1})/(µ1µ2 · · · µn) P0,  n ≥ 1

Using that Σ_{n=0}^{∞} Pn = 1 shows that

1 = P0 [1 + Σ_{n=1}^{∞} (λ0λ1 · · · λ_{n−1})/(µ1µ2 · · · µn)]

Hence,

P0 = 1/[1 + Σ_{n=1}^{∞} (λ0λ1 · · · λ_{n−1})/(µ1µ2 · · · µn)]

and

Pn = [(λ0λ1 · · · λ_{n−1})/(µ1µ2 · · · µn)] / [1 + Σ_{n=1}^{∞} (λ0λ1 · · · λ_{n−1})/(µ1µ2 · · · µn)],  n ≥ 1


The necessary and sufficient condition for the long-run probabilities to exist is that the denominator in the preceding is finite. That is, we need to have that

Σ_{n=1}^{∞} (λ0λ1 · · · λ_{n−1})/(µ1µ2 · · · µn) < ∞

For the M/M/k model the preceding product equals (λ/µ)^n/n! when n ≤ k, while for n > k it equals λ^n/(µ^n k! k^{n−k}). Hence, using that λ^n/(µ^n k! k^{n−k}) = (λ/kµ)^n k^k/k!, we see that

P0 = 1/[1 + Σ_{n=1}^{k} (λ/µ)^n/n! + Σ_{n=k+1}^{∞} (λ/kµ)^n k^k/k!]

Pn = P0 (λ/µ)^n/n!,  if n ≤ k
Pn = P0 (λ/kµ)^n k^k/k!,  if n > k

It follows from the preceding that the condition needed for the limiting probabilities to exist is λ < kµ. Because kµ is the service rate when all servers are busy, the preceding is just the intuitive condition that for limiting probabilities to exist the service rate needs to be larger than the arrival rate when there are many customers in the system. ■

Example 8.7 (M/M/1 Queue with Impatient Customers) Consider a single-server queue where customers arrive according to a Poisson process with rate λ and where the service distribution is exponential with rate µ, but now suppose that each customer will only spend an exponential time with rate α in queue before quitting the system. Assume that the impatient times are independent of all else, and that a customer who enters service always remains until its service is completed. This system can be modeled as a birth and death process with birth and death rates

λn = λ,  n ≥ 0
µn = µ + (n − 1)α,  n ≥ 1

Using the previously obtained limiting probabilities enables us to answer a variety of questions about this system. For instance, suppose we wanted to determine the proportion of arrivals that receive service. Calling this quantity πs, it can be obtained by letting λs be the average rate at which customers are served and noting that

πs = λs/λ

To verify the preceding equation, let Na(t) and Ns(t) denote, respectively, the number of arrivals and the number of services by time t. Then,

πs = lim_{t→∞} Ns(t)/Na(t) = lim_{t→∞} [Ns(t)/t]/[Na(t)/t] = λs/λ


Because the service departure rate is 0 when the system is empty and is µ when the system is nonempty, it follows that λs = µ(1 − P0), yielding that

πs = µ(1 − P0)/λ    ■
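Since the death rates µn = µ + (n − 1)α grow without bound, the sums defining P0 converge and can be truncated numerically. A short Python sketch for πs, with illustrative parameter values and a truncation level N of our own choosing:

# pi_s = mu(1 - P0)/lam for the M/M/1 queue with impatient customers,
# with P0 obtained by truncating the birth and death sums at level N.
lam, mu, alpha, N = 5.0, 1.0, 0.5, 500

prods, p = [1.0], 1.0
for n in range(1, N + 1):
    p *= lam / (mu + (n - 1) * alpha)   # lam_{n-1}/mu_n
    prods.append(p)
P0 = 1.0 / sum(prods)
pi_s = mu * (1 - P0) / lam
print(P0, pi_s)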

To determine W, the average time that a customer spends in the system, for the birth and death queueing system, we employ the fundamental queueing identity L = λa W. Because L is the average number of customers in the system,

L = Σ_{n=0}^{∞} n Pn

Also, because the arrival rate when there are n in the system is λn and the proportion of time in which there are n in the system is Pn, we see that the average arrival rate of customers is

λa = Σ_{n=0}^{∞} λn Pn

Consequently,

W = Σ_{n=0}^{∞} n Pn / Σ_{n=0}^{∞} λn Pn

Now consider an, equal to the proportion of arrivals that find n in the system. Since arrivals are at rate λn whenever there are n in the system, it follows that the rate at which arrivals find n is λn Pn. Hence, in a large time T approximately λn Pn T of the approximately λa T arrivals will encounter n. Letting T go to infinity shows that the long-run proportion of arrivals finding n in the system is

an = λn Pn/λa

Let us now consider the average length of a busy period, where we say that the system alternates between idle periods, when there are no customers in the system, and busy periods, in which there is at least one customer in the system. Now, an idle period begins when the system is empty and ends when the next customer arrives. Because the arrival rate when the system is empty is λ0, it thus follows that, independent of all that previously occurred, the length of an idle period is exponential with rate λ0. Because a busy period always begins when there is one in the system and ends when the system is empty, it is easy to see that the lengths of successive busy periods are independent and identically distributed. Let Ij and Bj denote, respectively, the lengths of the jth idle and the jth busy period, j ≥ 1. Now, in the first Σ_{j=1}^{n} (Ij + Bj) time units the system will be empty for a time Σ_{j=1}^{n} Ij. Consequently, P0, the long-run proportion of time in which the system is empty, can be expressed as

P0 = long-run proportion of time empty
 = lim_{n→∞} (I1 + . . . + In)/(I1 + . . . + In + B1 + . . . + Bn)
 = lim_{n→∞} [(I1 + . . . + In)/n] / [(I1 + . . . + In)/n + (B1 + . . . + Bn)/n]
 = E[I]/(E[I] + E[B])    (8.11)

where I and B represent, respectively, the lengths of an idle and of a busy period, and where the final equality follows from the strong law of large numbers. Hence, using that E[I] = 1/λ0, we see that

P0 = 1/(1 + λ0 E[B])

or

E[B] = (1 − P0)/(λ0 P0)    (8.12)

For instance, in the M/M/1 queue, this yields E[B] = (λ/µ)/[λ(1 − λ/µ)] = 1/(µ − λ).

Another quantity of interest is Tn, the amount of time during a busy period that there are n in the system. To determine its mean, note that E[Tn] is the average amount of time there are n in the system in intervals between successive busy periods. Because the average time between successive busy periods is E[B] + E[I], it follows that

Pn = long-run proportion of time there are n in system
 = E[Tn]/(E[I] + E[B])
 = E[Tn] P0/E[I]    from (8.11)

Hence,

E[Tn] = Pn/(λ0 P0) = (λ1 · · · λ_{n−1})/(µ1µ2 · · · µn)

As a check, note that

B = Σ_{n=1}^{∞} Tn

and thus,

E[B] = Σ_{n=1}^{∞} E[Tn] = [1/(λ0 P0)] Σ_{n=1}^{∞} Pn = (1 − P0)/(λ0 P0)

which is in agreement with (8.12).


For the M/M/1 system, the preceding gives E[Tn] = λ^{n−1}/µ^n.

Whereas in exponential birth and death queueing models the state of the system is just the number of customers in the system, there are other exponential models in which a more detailed state space is needed. To illustrate, we consider some examples.

8.3.4

A Shoe Shine Shop

Consider a shoe shine shop consisting of two chairs. Suppose that an entering customer first will go to chair 1. When his work is completed in chair 1, he will go either to chair 2 if that chair is empty or else wait in chair 1 until chair 2 becomes empty. Suppose that a potential customer will enter this shop as long as chair 1 is empty. (Thus, for instance, a potential customer might enter even if there is a customer in chair 2.) If we suppose that potential customers arrive in accordance with a Poisson process at rate λ, and that the service times for the two chairs are independent and have respective exponential rates of µ1 and µ2 , then (a) (b) (c) (d)

what proportion of potential customers enters the system? what is the mean number of customers in the system? what is the average amount of time that an entering customer spends in the system? Find πb , equal to the fraction of entering customers that are blockers? That is, find the fraction of entering customers that will have to wait after completing service with server 1 before they can enter chair 2.

To begin we must first decide upon an appropriate state space. It is clear that the state of the system must include more information than merely the number of customers in the system. For instance, it would not be enough to specify that there is one customer in the system as we would also have to know which chair he was in. Further, if we only know that there are two customers in the system, then we would not know if the man in chair 1 is still being served or if he is just waiting for the person in chair 2 to finish. To account for these points, the following state space, consisting of the five states (0, 0), (1, 0), (0, 1), (1, 1), and (b, 1), will be used. The states have the following interpretation: State Interpretation (0, 0) (1, 0) (0, 1) (1, 1) (b, 1)

There are no customers in the system. There is one customer in the system, and he is in chair 1. There is one customer in the system, and he is in chair 2. There are two customers in the system, and both are presently being served. There are two customers in the system, but the customer in the first chair has completed his work in that chair and is waiting for the second chair to become free.

It should be noted that when the system is in state (b, 1), the person in chair 1, though not being served, is nevertheless “blocking” potential arrivals from entering the system. As a prelude to writing down the balance equations, it is usually worthwhile to make a transition diagram. This is done by first drawing a circle for each state and

506

Introduction to Probability Models

Figure 8.1 A transition diagram.

then drawing an arrow labeled by the rate at which the process goes from one state to another. The transition diagram for this model is shown in Figure 8.1. The explanation for the diagram is as follows: The arrow from state (0, 0) to state (1, 0) that is labeled λ means that when the process is in state (0, 0), that is, when the system is empty, then it goes to state (1, 0) at a rate λ, that is, via an arrival. The arrow from (0, 1) to (1, 1) is similarly explained. When the process is in state (1, 0), it will go to state (0, 1) when the customer in chair 1 is finished and this occurs at a rate µ1 ; hence the arrow from (1, 0) to (0, 1) labeled µ1 . The arrow from (1, 1) to (b, 1) is similarly explained. When in state (b, 1) the process will go to state (0, 1) when the customer in chair 2 completes his service (which occurs at rate µ2 ); hence the arrow from (b, 1) to (0, 1) labeled µ2 . Also, when in state (1, 1) the process will go to state (1, 0) when the man in chair 2 finishes; hence the arrow from (1, 1) to (1, 0) labeled µ2 . Finally, if the process is in state (0, 1), then it will go to state (0, 0) when the man in chair 2 completes his service; hence the arrow from (0, 1) to (0, 0) labeled µ2 . Because there are no other possible transitions, this completes the transition diagram. To write the balance equations we equate the sum of the arrows (multiplied by the probability of the states where they originate) coming into a state with the sum of the arrows (multiplied by the probability of the state) going out of that state. This gives State (0, 0) (1, 0) (0, 1) (1, 1) (b, 1)

Rate that the process leaves = rate that it enters λP00 = µ2 P01 µ1 P10 = λP00 + µ2 P11 (λ + µ2 )P01 = µ1 P10 + µ2 Pb1 (µ1 + µ2 )P11 = λP01 µ2 Pb1 = µ1 P11

These along with the equation P00 + P10 + P01 + P11 + Pb1 = 1

Queueing Theory

507

may be solved to determine the limiting probabilities. Though it is easy to solve the preceding equations, the resulting solutions are quite involved and hence will not be explicitly presented. However, it is easy to answer our questions in terms of these limiting probabilities. First, since a potential customer will enter the system when the state is either (0, 0) or (0, 1), it follows that the proportion of customers entering the system is P00 + P01 . Secondly, since there is one customer in the system whenever the state is (0, 1) or (1, 0) and two customers in the system whenever the state is (1, 1) or (b, 1), it follows that L, the average number in the system, is given by L = P01 + P10 + 2(P11 + Pb1 ) To derive the average amount of time that an entering customer spends in the system, we use the relationship W = L/λa . Since a potential customer will enter the system when the state is either (0, 0) or (0, 1), it follows that λa = λ(P00 + P01 ) and hence W =

P01 + P10 + 2(P11 + Pb1 ) λ(P00 + P01 )

One way to determine the proportion of entering customers that are blockers is to condition on the state seen by the customer. Because the state seen by an entering customer is either (0, 0) or (0, 1), the probability that an entering customers finds the 01 . As an entering customer will be system in state (0, 1) is P(01 | 00 or 01) = P0,0P+P 0,1 a blocker if he or she enters the system when the state is (0, 1) and then completes service at 1 before server 2 has finished its service, we see that πb =

µ1 P01 P00 + P01 µ1 + µ2

Another way to obtain the proportion of entering customers that are blockers is to let λb be the rate at which customers become blockers, and then use that the proportion of entering customers that are blockers is λb /λa . Because blockers originate when the state is (1, 1) and a service at 1 occurs, it follows that λb = µ1 P11 , and so πb =

µ1 P11 λ(P00 + P01 )

That the two solutions agree follows from the balance equation for state (1, 1).

8.3.5

!

A Queueing System with Bulk Service

In this model, we consider a single-server exponential queueing system in which the server is able to serve two customers at the same time. Whenever the server completes a service, she then serves the next two customers at the same time. However, if there is only one customer in line, then she serves that customer by herself. We shall assume that her service time is exponential at rate µ whether she is serving one or two customers. As usual, we suppose that customers arrive at an exponential rate λ. One example of

508

Introduction to Probability Models

Figure 8.2 A transition diagram.

such a system might be an elevator or a cable car that can take at most two passengers at any time. It would seem that the state of the system would have to tell us not only how many customers there are in the system, but also whether one or two are presently being served. However, it turns out that we can more easily solve the problem not by concentrating on the number of customers in the system, but rather on the number in queue. So let us define the state as the number of customers waiting in queue, with two states when there is no one in queue. That is, let us have as a state space 0′ , 0, 1, 2, . . ., with the interpretation State 0′ 0 n, n > 0

I nter pr etation No one in service Server busy; no one waiting n customers waiting

The transition diagram is shown in Figure 8.2 and the balance equations are State Rate at which the pr ocess leaves = rate at which it enter s 0′ λP0′ = µP0 0 (λ + µ)P0 = λP0′ + µP1 + µP2 n, n " 1 (λ + µ)Pn = λPn−1 + µPn+2 Now the set of equations (λ + µ)Pn = λPn−1 + µPn+2 ,

n = 1, 2, . . .

has a solution of the form Pn = α n P0 To see this, substitute the preceding in Equation (8.13) to obtain (λ + µ)α n P0 = λα n−1 P0 + µα n+2 P0 or (λ + µ)α = λ + µα 3

(8.13)


Solving this for $\alpha$ yields the following three roots:

$$\alpha = 1, \qquad \alpha = \frac{-1 - \sqrt{1 + 4\lambda/\mu}}{2}, \qquad \text{and} \qquad \alpha = \frac{-1 + \sqrt{1 + 4\lambda/\mu}}{2}$$

As the first two are clearly not possible, it follows that

$$\alpha = \frac{\sqrt{1 + 4\lambda/\mu} - 1}{2}$$

Hence,

$$P_n = \alpha^n P_0, \qquad P_{0'} = \frac{\mu}{\lambda} P_0$$

where the bottom equation follows from the first balance equation. (We can ignore the second balance equation as one of these equations is always redundant.) To obtain $P_0$, we use

$$P_0 + P_{0'} + \sum_{n=1}^{\infty} P_n = 1$$

or

$$P_0 \left[ 1 + \frac{\mu}{\lambda} + \sum_{n=1}^{\infty} \alpha^n \right] = 1$$

or

$$P_0 \left[ \frac{1}{1 - \alpha} + \frac{\mu}{\lambda} \right] = 1$$

or

$$P_0 = \frac{\lambda(1-\alpha)}{\lambda + \mu(1-\alpha)}$$

and, thus,

$$P_n = \frac{\alpha^n \lambda(1-\alpha)}{\lambda + \mu(1-\alpha)}, \qquad n \geq 0 \tag{8.14}$$

$$P_{0'} = \frac{\mu(1-\alpha)}{\lambda + \mu(1-\alpha)}$$

where

$$\alpha = \frac{\sqrt{1 + 4\lambda/\mu} - 1}{2}$$


Note that for the preceding to be valid we need $\alpha < 1$, or equivalently $\lambda/\mu < 2$, which is intuitive since the maximum service rate is $2\mu$, which must be larger than the arrival rate $\lambda$ to avoid overloading the system.

All the relevant quantities of interest now can be determined. For instance, to determine the proportion of customers that are served alone, we first note that the rate at which customers are served alone is $\lambda P_{0'} + \mu P_1$, since when the system is empty a customer will be served alone upon the next arrival and when there is one customer in queue he will be served alone upon a departure. As the rate at which customers are served is $\lambda$, it follows that

$$\text{proportion of customers that are served alone} = \frac{\lambda P_{0'} + \mu P_1}{\lambda} = P_{0'} + \frac{\mu}{\lambda} P_1$$

Also,

$$L_Q = \sum_{n=1}^{\infty} n P_n = \frac{\lambda(1-\alpha)}{\lambda + \mu(1-\alpha)} \sum_{n=1}^{\infty} n\alpha^n \qquad \text{from Equation (8.14)}$$

and, using the algebraic identity $\sum_{n=1}^{\infty} n\alpha^n = \alpha/(1-\alpha)^2$,

$$L_Q = \frac{\lambda\alpha}{(1-\alpha)[\lambda + \mu(1-\alpha)]}$$

and

$$W_Q = \frac{L_Q}{\lambda}, \qquad W = W_Q + \frac{1}{\mu}, \qquad L = \lambda W$$
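Since every quantity above is an explicit function of $\lambda$ and $\mu$, the model is easy to evaluate numerically. The following Python sketch is ours, not the book's; the function and parameter names (bulk_service_queue, lam, mu) are illustrative.

```python
from math import sqrt

def bulk_service_queue(lam, mu):
    """Performance measures for the bulk-service model (server takes customers
    two at a time); requires lam/mu < 2 so the system is not overloaded."""
    assert lam / mu < 2, "need lam/mu < 2"
    alpha = (sqrt(1 + 4 * lam / mu) - 1) / 2   # root in (0,1) of (lam+mu)a = lam + mu*a**3
    P0 = lam * (1 - alpha) / (lam + mu * (1 - alpha))
    P0_prime = (mu / lam) * P0                 # P_{0'} = (mu/lam) P_0
    LQ = lam * alpha / ((1 - alpha) * (lam + mu * (1 - alpha)))
    WQ = LQ / lam
    W = WQ + 1 / mu
    L = lam * W
    served_alone = P0_prime + (mu / lam) * (alpha * P0)   # P_1 = alpha * P_0
    return {"alpha": alpha, "P0": P0, "P0'": P0_prime,
            "LQ": LQ, "WQ": WQ, "W": W, "L": L,
            "served_alone": served_alone}
```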

8.4 Network of Queues

8.4.1 Open Systems

Consider a two-server system in which customers arrive at a Poisson rate $\lambda$ at server 1. After being served by server 1 they then join the queue in front of server 2. We suppose there is infinite waiting space at both servers. Each server serves one customer at a time with server $i$ taking an exponential time with rate $\mu_i$ for a service, $i = 1, 2$. Such a system is called a tandem or sequential system (see Figure 8.3).

[Figure 8.3: A tandem queue.]

To analyze this system we need to keep track of the number of customers at server 1 and the number at server 2. So let us define the state by the pair $(n, m)$—meaning that there are $n$ customers at server 1 and $m$ at server 2. The balance equations are

State             Rate that the process leaves = rate that it enters
$0, 0$            $\lambda P_{0,0} = \mu_2 P_{0,1}$
$n, 0;\ n > 0$    $(\lambda + \mu_1)P_{n,0} = \mu_2 P_{n,1} + \lambda P_{n-1,0}$
$0, m;\ m > 0$    $(\lambda + \mu_2)P_{0,m} = \mu_2 P_{0,m+1} + \mu_1 P_{1,m-1}$
$n, m;\ nm > 0$   $(\lambda + \mu_1 + \mu_2)P_{n,m} = \mu_2 P_{n,m+1} + \mu_1 P_{n+1,m-1} + \lambda P_{n-1,m}$
                                                                                        (8.15)

Rather than directly attempting to solve these (along with the equation $\sum_{n,m} P_{n,m} = 1$) we shall guess at a solution and then verify that it indeed satisfies the preceding. We first note that the situation at server 1 is just as in an M/M/1 model. Similarly, as it was shown in Section 6.6 that the departure process of an M/M/1 queue is a Poisson process with rate $\lambda$, it follows that what server 2 faces is also an M/M/1 queue. Hence, the probability that there are $n$ customers at server 1 is

$$P\{n \text{ at server 1}\} = \left(\frac{\lambda}{\mu_1}\right)^n \left(1 - \frac{\lambda}{\mu_1}\right)$$

and, similarly,

$$P\{m \text{ at server 2}\} = \left(\frac{\lambda}{\mu_2}\right)^m \left(1 - \frac{\lambda}{\mu_2}\right)$$

Now, if the numbers of customers at servers 1 and 2 were independent random variables, then it would follow that

$$P_{n,m} = \left(\frac{\lambda}{\mu_1}\right)^n \left(1 - \frac{\lambda}{\mu_1}\right) \left(\frac{\lambda}{\mu_2}\right)^m \left(1 - \frac{\lambda}{\mu_2}\right) \tag{8.16}$$

To verify that $P_{n,m}$ is indeed equal to the preceding (and thus that the number of customers at server 1 is independent of the number at server 2), all we need do is verify that the preceding satisfies Equations (8.15)—this suffices since we know that the $P_{n,m}$ are the unique solution of Equations (8.15). Now, for instance, if we consider the first equation of (8.15), we need to show that

$$\lambda \left(1 - \frac{\lambda}{\mu_1}\right)\left(1 - \frac{\lambda}{\mu_2}\right) = \mu_2 \left(1 - \frac{\lambda}{\mu_1}\right) \frac{\lambda}{\mu_2} \left(1 - \frac{\lambda}{\mu_2}\right)$$


which is easily verified. We leave it as an exercise to show that the $P_{n,m}$, as given by Equation (8.16), satisfy all of the equations of (8.15), and are thus the limiting probabilities.

From the preceding we see that $L$, the average number of customers in the system, is given by

$$L = \sum_{n,m} (n+m) P_{n,m} = \sum_n n \left(\frac{\lambda}{\mu_1}\right)^n \left(1 - \frac{\lambda}{\mu_1}\right) + \sum_m m \left(\frac{\lambda}{\mu_2}\right)^m \left(1 - \frac{\lambda}{\mu_2}\right) = \frac{\lambda}{\mu_1 - \lambda} + \frac{\lambda}{\mu_2 - \lambda}$$

and from this we see that the average time a customer spends in the system is

$$W = \frac{L}{\lambda} = \frac{1}{\mu_1 - \lambda} + \frac{1}{\mu_2 - \lambda}$$
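In code (a minimal sketch of ours, with illustrative names), the tandem queue's $L$ and $W$ are just sums of two M/M/1 terms:

```python
def tandem_queue(lam, mu1, mu2):
    """L and W for the two-station tandem queue; needs lam < min(mu1, mu2)."""
    assert lam < mu1 and lam < mu2, "both stations must be stable"
    L = lam / (mu1 - lam) + lam / (mu2 - lam)   # sum of independent M/M/1 queue lengths
    return L, L / lam                           # W = L / lambda
```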

Remarks (i) The result (Equation (8.16)) could have been obtained as a direct consequence of the time reversibility of an M/M/1 (see Section 6.6). For not only does time reversibility imply that the output from server 1 is a Poisson process, but it also implies (Exercise 26 of Chapter 6) that the number of customers at server 1 is independent of the past departure times from server 1. As these past departure times constitute the arrival process to server 2, the independence of the numbers of customers in the two systems follows.

(ii) Since a Poisson arrival sees time averages, it follows that in a tandem queue the numbers of customers an arrival (to server 1) sees at the two servers are independent random variables. However, it should be noted that this does not imply that the waiting times of a given customer at the two servers are independent. For a counterexample suppose that $\lambda$ is very small with respect to $\mu_1 = \mu_2$, and thus almost all customers have zero wait in queue at both servers. However, given that the wait in queue of a customer at server 1 is positive, his wait in queue at server 2 also will be positive with probability at least as large as $\frac{1}{2}$ (why?). Hence, the waiting times in queue are not independent. Remarkably enough, however, it turns out that the total times (that is, service time plus wait in queue) that an arrival spends at the two servers are indeed independent random variables.

The preceding result can be substantially generalized. To do so, consider a system of $k$ servers. Customers arrive from outside the system to server $i$, $i = 1, \ldots, k$, in accordance with independent Poisson processes at rate $r_i$; they then join the queue at $i$ until their turn at service comes. Once a customer is served by server $i$, he then joins the queue in front of server $j$, $j = 1, \ldots, k$, with probability $P_{ij}$. Hence, $\sum_{j=1}^{k} P_{ij} \leq 1$, and $1 - \sum_{j=1}^{k} P_{ij}$ represents the probability that a customer departs the system after being served by server $i$.


If we let $\lambda_j$ denote the total arrival rate of customers to server $j$, then the $\lambda_j$ can be obtained as the solution of

$$\lambda_j = r_j + \sum_{i=1}^{k} \lambda_i P_{ij}, \qquad j = 1, \ldots, k \tag{8.17}$$

Equation (8.17) follows since $r_j$ is the arrival rate of customers to $j$ coming from outside the system and, as $\lambda_i$ is the rate at which customers depart server $i$ (rate in must equal rate out), $\lambda_i P_{ij}$ is the arrival rate to $j$ of those coming from server $i$.

It turns out that the number of customers at each of the servers is independent and of the form

$$P\{n \text{ customers at server } j\} = \left(\frac{\lambda_j}{\mu_j}\right)^n \left(1 - \frac{\lambda_j}{\mu_j}\right), \qquad n \geq 1$$

where $\mu_j$ is the exponential service rate at server $j$ and the $\lambda_j$ are the solution to Equation (8.17). Of course, it is necessary that $\lambda_j/\mu_j < 1$ for all $j$. To prove this, we first note that it is equivalent to asserting that the limiting probabilities $P(n_1, n_2, \ldots, n_k) = P\{n_j \text{ at server } j,\ j = 1, \ldots, k\}$ are given by

$$P(n_1, n_2, \ldots, n_k) = \prod_{j=1}^{k} \left(\frac{\lambda_j}{\mu_j}\right)^{n_j} \left(1 - \frac{\lambda_j}{\mu_j}\right) \tag{8.18}$$

which can be verified by showing that it satisfies the balance equations for this model.

The average number of customers in the system is

$$L = \sum_{j=1}^{k} \text{average number at server } j = \sum_{j=1}^{k} \frac{\lambda_j}{\mu_j - \lambda_j}$$

The average time a customer spends in the system can be obtained from $L = \lambda W$ with $\lambda = \sum_{j=1}^{k} r_j$. (Why not $\lambda = \sum_{j=1}^{k} \lambda_j$?) This yields

$$W = \frac{\sum_{j=1}^{k} \lambda_j/(\mu_j - \lambda_j)}{\sum_{j=1}^{k} r_j}$$
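Because the traffic equations (8.17) are linear, they can be solved with a single linear-system call. The sketch below is ours, not the book's; the function and argument names are illustrative.

```python
import numpy as np

def open_network(r, P, mu):
    """Open network: external rates r[i], routing matrix P[i][j], service
    rates mu[j]. Solves the traffic equations (8.17), then returns
    (lambda vector, L, W)."""
    r, P, mu = np.asarray(r, float), np.asarray(P, float), np.asarray(mu, float)
    k = len(r)
    # lambda_j = r_j + sum_i lambda_i P_ij  <=>  (I - P^T) lambda = r
    lam = np.linalg.solve(np.eye(k) - P.T, r)
    assert np.all(lam < mu), "need lambda_j < mu_j for every j"
    L = np.sum(lam / (mu - lam))
    W = L / np.sum(r)
    return lam, L, W
```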

Remark The result embodied in Equation (8.18) is rather remarkable in that it says that the distribution of the number of customers at server i is the same as in an M/M/1 system with rates λi and µi . What is remarkable is that in the network model the arrival process at node i need not be a Poisson process. For if there is a possibility that a customer may visit a server more than once (a situation called feedback), then the arrival process will not be Poisson. An easy example illustrating this is to suppose that there is a


single server whose service rate is very large with respect to the arrival rate from outside. Suppose also that with probability $p = 0.9$ a customer upon completion of service is fed back into the system. Hence, at an arrival time epoch there is a large probability of another arrival in a short time (namely, the feedback arrival); whereas at an arbitrary time point there will be only a very slight chance of an arrival occurring shortly (since $\lambda$ is so very small). Hence, the arrival process does not possess independent increments and so cannot be Poisson.

Thus, we see that when feedback is allowed the steady-state probabilities of the number of customers at any given station have the same distribution as in an M/M/1 model even though the model is not M/M/1. (Presumably such quantities as the joint distribution of the number at the station at two different time points will not be the same as for an M/M/1.)

Example 8.8 Consider a system of two servers where customers from outside the system arrive at server 1 at a Poisson rate 4 and at server 2 at a Poisson rate 5. The service rates of 1 and 2 are respectively 8 and 10. A customer upon completion of service at server 1 is equally likely to go to server 2 or to leave the system (i.e., $P_{11} = 0$, $P_{12} = \frac{1}{2}$); whereas a departure from server 2 will go 25 percent of the time to server 1 and will depart the system otherwise (i.e., $P_{21} = \frac{1}{4}$, $P_{22} = 0$). Determine the limiting probabilities, $L$, and $W$.

Solution: The total arrival rates to servers 1 and 2—call them $\lambda_1$ and $\lambda_2$—can be obtained from Equation (8.17). That is, we have

$$\lambda_1 = 4 + \tfrac{1}{4}\lambda_2, \qquad \lambda_2 = 5 + \tfrac{1}{2}\lambda_1$$

implying that $\lambda_1 = 6$, $\lambda_2 = 8$.

Hence,

$$P\{n \text{ at server 1},\ m \text{ at server 2}\} = \left(\tfrac{3}{4}\right)^n \tfrac{1}{4} \left(\tfrac{4}{5}\right)^m \tfrac{1}{5} = \tfrac{1}{20}\left(\tfrac{3}{4}\right)^n \left(\tfrac{4}{5}\right)^m$$

and

$$L = \frac{6}{8-6} + \frac{8}{10-8} = 7, \qquad W = \frac{L}{9} = \frac{7}{9}$$
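For instance, feeding the data of Example 8.8 into the open_network sketch above reproduces the numbers just computed:

```python
lam, L, W = open_network(r=[4, 5], P=[[0, 0.5], [0.25, 0]], mu=[8, 10])
# lam -> array([6., 8.]),  L -> 7.0,  W -> 0.777... = 7/9
```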

8.4.2 Closed Systems

The queueing systems described in Section 8.4.1 are called open systems since customers are able to enter and depart the system. A system in which new customers never enter and existing ones never depart is called a closed system.


Let us suppose that we have $m$ customers moving among a system of $k$ servers, where the service times at server $i$ are exponential with rate $\mu_i$, $i = 1, \ldots, k$. When a customer completes service at server $i$, she then joins the queue in front of server $j$, $j = 1, \ldots, k$, with probability $P_{ij}$, where we now suppose that $\sum_{j=1}^{k} P_{ij} = 1$ for all $i = 1, \ldots, k$. That is, $P = [P_{ij}]$ is a Markov transition probability matrix, which we shall assume is irreducible. Let $\pi = (\pi_1, \ldots, \pi_k)$ denote the stationary probabilities for this Markov chain; that is, $\pi$ is the unique positive solution of

$$\pi_j = \sum_{i=1}^{k} \pi_i P_{ij}, \qquad \sum_{j=1}^{k} \pi_j = 1 \tag{8.19}$$

If we denote the average arrival rate (or equivalently the average service completion rate) at server $j$ by $\lambda_m(j)$, $j = 1, \ldots, k$, then, analogous to Equation (8.17), the $\lambda_m(j)$ satisfy

$$\lambda_m(j) = \sum_{i=1}^{k} \lambda_m(i) P_{ij}$$

Hence, from (8.19) we can conclude that

$$\lambda_m(j) = \lambda_m \pi_j, \qquad j = 1, 2, \ldots, k \tag{8.20}$$

where

$$\lambda_m = \sum_{j=1}^{k} \lambda_m(j) \tag{8.21}$$

From Equation (8.21), we see that $\lambda_m$ is the average service completion rate of the entire system, that is, it is the system throughput rate.∗

If we let $P_m(n_1, n_2, \ldots, n_k)$ denote the limiting probabilities

$$P_m(n_1, n_2, \ldots, n_k) = P\{n_j \text{ customers at server } j,\ j = 1, \ldots, k\}$$

then, by verifying that they satisfy the balance equations, it can be shown that

$$P_m(n_1, n_2, \ldots, n_k) = \begin{cases} K_m \prod_{j=1}^{k} (\lambda_m(j)/\mu_j)^{n_j}, & \text{if } \sum_{j=1}^{k} n_j = m \\ 0, & \text{otherwise} \end{cases}$$

∗ We are just using the notation $\lambda_m(j)$ and $\lambda_m$ to indicate the dependence on the number of customers in the closed system. This will be used in recursive relations we will develop.


But from Equation (8.20) we thus obtain

$$P_m(n_1, n_2, \ldots, n_k) = \begin{cases} C_m \prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j}, & \text{if } \sum_{j=1}^{k} n_j = m \\ 0, & \text{otherwise} \end{cases} \tag{8.22}$$

where

$$C_m = \left[ \sum_{n_1, \ldots, n_k :\, \sum n_j = m} \; \prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j} \right]^{-1} \tag{8.23}$$

Equation (8.22) is not as useful as we might suppose, for in order to utilize it we must determine the normalizing constant $C_m$ given by Equation (8.23), which requires summing the products $\prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j}$ over all the feasible vectors $(n_1, \ldots, n_k)$: $\sum_{j=1}^{k} n_j = m$. Hence, since there are $\binom{m+k-1}{m}$ vectors this is only computationally feasible for relatively small values of $m$ and $k$.

We will now present an approach that will enable us to determine recursively many of the quantities of interest in this model without first computing the normalizing constants. To begin, consider a customer who has just left server $i$ and is headed to server $j$, and let us determine the probability of the system as seen by this customer. In particular, let us determine the probability that this customer observes, at that moment, $n_l$ customers at server $l$, $l = 1, \ldots, k$, $\sum_{l=1}^{k} n_l = m - 1$. This is done as follows:

$$P\{\text{customer observes } n_l \text{ at server } l,\ l = 1, \ldots, k \mid \text{customer goes from } i \text{ to } j\}$$
$$= \frac{P\{\text{state is } (n_1, \ldots, n_i + 1, \ldots, n_j, \ldots, n_k),\ \text{customer goes from } i \text{ to } j\}}{P\{\text{customer goes from } i \text{ to } j\}}$$
$$= \frac{P_m(n_1, \ldots, n_i + 1, \ldots, n_j, \ldots, n_k)\, \mu_i P_{ij}}{\sum_{n :\, \sum n_j = m-1} P_m(n_1, \ldots, n_i + 1, \ldots, n_k)\, \mu_i P_{ij}}$$
$$= \frac{(\pi_i/\mu_i) \prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j}}{K} \qquad \text{from (8.22)}$$
$$= C \prod_{j=1}^{k} (\pi_j/\mu_j)^{n_j}$$

where $C$ does not depend on $n_1, \ldots, n_k$. But because the preceding is a probability density on the set of vectors $(n_1, \ldots, n_k)$, $\sum_{j=1}^{k} n_j = m - 1$, it follows from (8.22) that it must equal $P_{m-1}(n_1, \ldots, n_k)$. Hence,

$$P\{\text{customer observes } n_l \text{ at server } l,\ l = 1, \ldots, k \mid \text{customer goes from } i \text{ to } j\} = P_{m-1}(n_1, \ldots, n_k), \qquad \sum_{i=1}^{k} n_i = m - 1 \tag{8.24}$$

As (8.24) is true for all i, we thus have proven the following proposition, known as the arrival theorem.


Proposition 8.3 (The Arrival Theorem) In the closed network system with m customers, the system as seen by arrivals to server j is distributed as the stationary distribution in the same network system when there are only m − 1 customers.

Denote by $L_m(j)$ and $W_m(j)$ the average number of customers and the average time a customer spends at server $j$ when there are $m$ customers in the network. Upon conditioning on the number of customers found at server $j$ by an arrival to that server, it follows that

$$W_m(j) = \frac{1 + E_m[\text{number at server } j \text{ as seen by an arrival}]}{\mu_j} = \frac{1 + L_{m-1}(j)}{\mu_j} \tag{8.25}$$

where the last equality follows from the arrival theorem. Now when there are $m - 1$ customers in the system, then, from Equation (8.20), $\lambda_{m-1}(j)$, the average arrival rate to server $j$, satisfies

$$\lambda_{m-1}(j) = \lambda_{m-1} \pi_j$$

Now, applying the basic cost identity Equation (8.1) with the cost rule being that each customer in the network system of $m - 1$ customers pays one per unit time while at server $j$, we obtain

$$L_{m-1}(j) = \lambda_{m-1} \pi_j W_{m-1}(j) \tag{8.26}$$

Using Equation (8.25), this yields

$$W_m(j) = \frac{1 + \lambda_{m-1} \pi_j W_{m-1}(j)}{\mu_j} \tag{8.27}$$

Also using the fact that $\sum_{j=1}^{k} L_{m-1}(j) = m - 1$ (why?) we obtain, from Equation (8.26), the following:

$$m - 1 = \lambda_{m-1} \sum_{j=1}^{k} \pi_j W_{m-1}(j)$$

or

$$\lambda_{m-1} = \frac{m-1}{\sum_{i=1}^{k} \pi_i W_{m-1}(i)} \tag{8.28}$$

Hence, from Equation (8.27), we obtain the recursion

$$W_m(j) = \frac{1}{\mu_j} + \frac{(m-1)\pi_j W_{m-1}(j)}{\mu_j \sum_{i=1}^{k} \pi_i W_{m-1}(i)} \tag{8.29}$$

Starting with the stationary probabilities $\pi_j$, $j = 1, \ldots, k$, and $W_1(j) = 1/\mu_j$, we can now use Equation (8.29) to determine recursively $W_2(j), W_3(j), \ldots, W_m(j)$. We can then determine the throughput rate $\lambda_m$ by using Equation (8.28), and this will determine $L_m(j)$ by Equation (8.26). This recursive approach is called mean value analysis.
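The recursion translates directly into a short program. The following Python sketch is ours, not the book's; the function and argument names are illustrative.

```python
def mean_value_analysis(pi, mu, m):
    """W_m(j), throughput lambda_m, and L_m(j) for a closed network with m
    customers, via the recursion (8.29)."""
    k = len(mu)
    W = [1.0 / mu[j] for j in range(k)]                 # W_1(j) = 1/mu_j
    for n in range(2, m + 1):                           # build W_2, ..., W_m
        denom = sum(pi[i] * W[i] for i in range(k))
        W = [1.0 / mu[j] + (n - 1) * pi[j] * W[j] / (mu[j] * denom)
             for j in range(k)]                         # Equation (8.29)
    lam = m / sum(pi[i] * W[i] for i in range(k))       # Equation (8.28) at level m
    L = [lam * pi[j] * W[j] for j in range(k)]          # Equation (8.26)
    return W, lam, L
```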


Example 8.9 Consider a $k$-server network in which the customers move in a cyclic permutation. That is,

$$P_{i,i+1} = 1, \quad i = 1, 2, \ldots, k-1, \qquad P_{k,1} = 1$$

Let us determine the average number of customers at server $j$ when there are two customers in the system. Now, for this network,

$$\pi_i = 1/k, \qquad i = 1, \ldots, k$$

and as

$$W_1(j) = \frac{1}{\mu_j}$$

we obtain from Equation (8.29) that

$$W_2(j) = \frac{1}{\mu_j} + \frac{(1/k)(1/\mu_j)}{\mu_j \sum_{i=1}^{k} (1/k)(1/\mu_i)} = \frac{1}{\mu_j} + \frac{1}{\mu_j^2 \sum_{i=1}^{k} 1/\mu_i}$$

Hence, from Equation (8.28),

$$\lambda_2 = \frac{2}{\sum_{l=1}^{k} \frac{1}{k} W_2(l)} = \frac{2k}{\sum_{l=1}^{k} \left[ \dfrac{1}{\mu_l} + \dfrac{1}{\mu_l^2 \sum_{i=1}^{k} 1/\mu_i} \right]}$$

and finally, using Equation (8.26),

$$L_2(j) = \lambda_2 \frac{1}{k} W_2(j) = \frac{2\left[ \dfrac{1}{\mu_j} + \dfrac{1}{\mu_j^2 \sum_{i=1}^{k} 1/\mu_i} \right]}{\sum_{l=1}^{k} \left[ \dfrac{1}{\mu_l} + \dfrac{1}{\mu_l^2 \sum_{i=1}^{k} 1/\mu_i} \right]}$$
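As a numerical sanity check (ours, with made-up service rates), running the mean_value_analysis sketch on a three-server cycle agrees with the closed-form $L_2(j)$ above:

```python
mu = [1.0, 2.0, 4.0]
pi = [1.0 / 3] * 3                      # cyclic routing gives uniform pi_i = 1/k
_, _, L2 = mean_value_analysis(pi, mu, m=2)

s = sum(1 / m_i for m_i in mu)          # sum_i 1/mu_i
w2 = [1 / m_j + 1 / (m_j**2 * s) for m_j in mu]
closed_form = [2 * w / sum(w2) for w in w2]
# L2 and closed_form agree
```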

Another approach to learning about the stationary probabilities specified by Equation (8.22), which finesses the computational difficulties of computing the constant $C_m$, is to use the Gibbs sampler of Section 4.9 to generate a Markov chain having these stationary probabilities. To begin, note that since there are always a total of $m$ customers in the system, Equation (8.22) may equivalently be written as a joint mass function of the numbers of customers at each of the servers $1, \ldots, k-1$, as follows:

$$P_m(n_1, \ldots, n_{k-1}) = C_m (\pi_k/\mu_k)^{m - \sum n_j} \prod_{j=1}^{k-1} (\pi_j/\mu_j)^{n_j} = K \prod_{j=1}^{k-1} (a_j)^{n_j}, \qquad \sum_{j=1}^{k-1} n_j \leq m$$


where $a_j = (\pi_j \mu_k)/(\pi_k \mu_j)$, $j = 1, \ldots, k-1$. Now, if $N = (N_1, \ldots, N_{k-1})$ has the preceding joint mass function then

$$P\{N_i = n \mid N_1 = n_1, \ldots, N_{i-1} = n_{i-1}, N_{i+1} = n_{i+1}, \ldots, N_{k-1} = n_{k-1}\}$$
$$= \frac{P_m(n_1, \ldots, n_{i-1}, n, n_{i+1}, \ldots, n_{k-1})}{\sum_r P_m(n_1, \ldots, n_{i-1}, r, n_{i+1}, \ldots, n_{k-1})} = C a_i^n, \qquad n \leq m - \sum_{j \neq i} n_j$$

It follows from the preceding that we may use the Gibbs sampler to generate the values of a Markov chain whose limiting probability mass function is $P_m(n_1, \ldots, n_{k-1})$ as follows:

1. Let $(n_1, \ldots, n_{k-1})$ be arbitrary nonnegative integers satisfying $\sum_{j=1}^{k-1} n_j \leq m$.
2. Generate a random variable $I$ that is equally likely to be any of $1, \ldots, k-1$.
3. If $I = i$, set $s = m - \sum_{j \neq i} n_j$, and generate the value of a random variable $X$ having probability mass function

$$P\{X = n\} = C a_i^n, \qquad n = 0, \ldots, s$$

4. Let $n_I = X$ and go to step 2.

The successive values of the state vector $(n_1, \ldots, n_{k-1}, m - \sum_{j=1}^{k-1} n_j)$ constitute the sequence of states of a Markov chain with the limiting distribution $P_m$. All quantities of interest can be estimated from this sequence. For instance, the average of the values of the $j$th coordinate of these vectors will converge to the mean number of individuals at station $j$, the proportion of vectors whose $j$th coordinate is less than $r$ will converge to the limiting probability that the number of individuals at station $j$ is less than $r$, and so on.

Other quantities of interest can also be obtained from the simulation. For instance, suppose we want to estimate $W_j$, the average amount of time a customer spends at server $j$ on each visit. Then, as noted in the preceding, $L_j$, the average number of customers at server $j$, can be estimated. To estimate $W_j$, we use the identity

$$L_j = \lambda_j W_j$$

where $\lambda_j$ is the rate at which customers arrive at server $j$. Setting $\lambda_j$ equal to the service completion rate at server $j$ shows that

$$\lambda_j = P\{j \text{ is busy}\} \mu_j$$

Using the Gibbs sampler simulation to estimate $P\{j \text{ is busy}\}$ then leads to an estimator of $W_j$.
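A straightforward Python rendering of steps 1–4 (our sketch; the truncated-geometric sampling in step 3 is done by direct inversion, and all names are illustrative):

```python
import random

def gibbs_closed_network(pi, mu, m, num_steps):
    """Gibbs sampler for P_m(n_1, ..., n_{k-1}) in the closed network
    (steps 1-4 above); yields the full state (n_1, ..., n_k) each iteration."""
    k = len(mu)
    a = [pi[j] * mu[k - 1] / (pi[k - 1] * mu[j]) for j in range(k - 1)]
    n = [0] * (k - 1)                         # step 1: the all-zero vector is feasible
    for _ in range(num_steps):
        i = random.randrange(k - 1)           # step 2
        s = m - (sum(n) - n[i])               # step 3: s = m - sum_{j != i} n_j
        weights = [a[i] ** x for x in range(s + 1)]   # P{X = x} proportional to a_i^x
        u = random.random() * sum(weights)
        x, acc = 0, weights[0]
        while u > acc:                        # inverse-transform sampling
            x += 1
            acc += weights[x]
        n[i] = x                              # step 4
        yield tuple(n) + (m - sum(n),)
```

Averaging the $j$th coordinate of the yielded states (after a burn-in) estimates $L_j$, while the fraction of states with $n_j \geq 1$ estimates $P\{j \text{ is busy}\}$, from which $W_j$ follows as above.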


8.5 The System M/G/1

8.5.1 Preliminaries: Work and Another Cost Identity

For an arbitrary queueing system, let us define the work in the system at any time $t$ to be the sum of the remaining service times of all customers in the system at time $t$. For instance, suppose there are three customers in the system—the one in service having been there for three of his required five units of service time, and both people in queue having service times of six units. Then the work at that time is $2 + 6 + 6 = 14$. Let $V$ denote the (time) average work in the system.

Now recall the fundamental cost equation (8.1), which states that the

average rate at which the system earns $= \lambda_a \times$ average amount a customer pays

and consider the following cost rule: Each customer pays at a rate of $y$/unit time when his remaining service time is $y$, whether he is in queue or in service. Thus, the rate at which the system earns is just the work in the system; so the basic identity yields

$$V = \lambda_a E[\text{amount paid by a customer}]$$

Now, let $S$ and $W_Q^*$ denote respectively the service time and the time a given customer spends waiting in queue. Then, since the customer pays at a constant rate of $S$ per unit time while he waits in queue and at a rate of $S - x$ after spending an amount of time $x$ in service, we have

$$E[\text{amount paid by a customer}] = E\left[ S W_Q^* + \int_0^S (S - x)\, dx \right]$$

and thus

$$V = \lambda_a E[S W_Q^*] + \frac{\lambda_a E[S^2]}{2} \tag{8.30}$$

It should be noted that the preceding is a basic queueing identity (like Equations (8.2)–(8.4)) and as such is valid in almost all models. In addition, if a customer's service time is independent of his wait in queue (as is usually, but not always, the case),∗ then we have from Equation (8.30) that

$$V = \lambda_a E[S] W_Q + \frac{\lambda_a E[S^2]}{2} \tag{8.31}$$

8.5.2 Application of Work to M/G/1

The M/G/1 model assumes (i) Poisson arrivals at rate $\lambda$; (ii) a general service distribution; and (iii) a single server. In addition, we will suppose that customers are served in the order of their arrival.

∗ For an example where it is not true, see Section 8.6.2.


Now, for an arbitrary customer in an M/G/1 system,

$$\text{customer's wait in queue} = \text{work in the system when he arrives} \tag{8.32}$$

This follows since there is only a single server (think about it!). Taking expectations of both sides of Equation (8.32) yields

$$W_Q = \text{average work as seen by an arrival}$$

But, due to Poisson arrivals, the average work as seen by an arrival will equal $V$, the time average work in the system. Hence, for the model M/G/1,

$$W_Q = V$$

The preceding in conjunction with the identity

$$V = \lambda E[S] W_Q + \frac{\lambda E[S^2]}{2}$$

yields the so-called Pollaczek–Khintchine formula,

$$W_Q = \frac{\lambda E[S^2]}{2(1 - \lambda E[S])} \tag{8.33}$$

where $E[S]$ and $E[S^2]$ are the first two moments of the service distribution.

The quantities $L$, $L_Q$, and $W$ can be obtained from Equation (8.33) as

$$L_Q = \lambda W_Q = \frac{\lambda^2 E[S^2]}{2(1 - \lambda E[S])},$$
$$W = W_Q + E[S] = \frac{\lambda E[S^2]}{2(1 - \lambda E[S])} + E[S], \tag{8.34}$$
$$L = \lambda W = \frac{\lambda^2 E[S^2]}{2(1 - \lambda E[S])} + \lambda E[S]$$

Remarks (i) For the preceding quantities to be finite, we need $\lambda E[S] < 1$. This condition is intuitive since we know from renewal theory that if the server was always busy, then the departure rate would be $1/E[S]$ (see Section 7.3), which must be larger than the arrival rate $\lambda$ to keep things finite.

(ii) Since $E[S^2] = \text{Var}(S) + (E[S])^2$, we see from Equations (8.33) and (8.34) that, for fixed mean service time, $L$, $L_Q$, $W$, and $W_Q$ all increase as the variance of the service distribution increases.

(iii) Another approach to obtain $W_Q$ is presented in Exercise 38.
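A small Python sketch (ours, with illustrative names) of the Pollaczek–Khintchine quantities; comparing exponential service ($E[S^2] = 2E[S]^2$) with deterministic service ($E[S^2] = E[S]^2$) at the same mean illustrates remark (ii):

```python
def pollaczek_khintchine(lam, ES, ES2):
    """W_Q, L_Q, W, L for an M/G/1 queue, via Equations (8.33)-(8.34)."""
    assert lam * ES < 1, "need lambda*E[S] < 1"
    WQ = lam * ES2 / (2 * (1 - lam * ES))
    return WQ, lam * WQ, WQ + ES, lam * (WQ + ES)

# Same mean service time 0.5, arrival rate 1:
mm1 = pollaczek_khintchine(1.0, 0.5, 2 * 0.5**2)   # exponential: Var(S) = E[S]^2
md1 = pollaczek_khintchine(1.0, 0.5, 0.5**2)       # deterministic: Var(S) = 0
# every M/D/1 quantity is smaller than its M/M/1 counterpart
```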


8.5.3 Busy Periods

The system alternates between idle periods (when there are no customers in the system, and so the server is idle) and busy periods (when there is at least one customer in the system, and so the server is busy). Let $I$ and $B$ represent, respectively, the length of an idle and of a busy period. Because $I$ represents the time from when a customer departs and leaves the system empty until the next arrival, it follows, since arrivals are according to a Poisson process with rate $\lambda$, that $I$ is exponential with rate $\lambda$ and thus

$$E[I] = \frac{1}{\lambda} \tag{8.35}$$

To determine $E[B]$ we argue, as in Section 8.3.3, that the long-run proportion of time the system is empty is equal to the ratio of $E[I]$ to $E[I] + E[B]$. That is,

$$P_0 = \frac{E[I]}{E[I] + E[B]} \tag{8.36}$$

To compute $P_0$, we note from Equation (8.4) (obtained from the fundamental cost equation by supposing that a customer pays at a rate of one per unit time while in service) that

$$\text{average number of busy servers} = \lambda E[S]$$

However, as the left-hand side of the preceding equals $1 - P_0$ (why?), we have

$$P_0 = 1 - \lambda E[S] \tag{8.37}$$

and, from Equations (8.35)–(8.37),

$$1 - \lambda E[S] = \frac{1/\lambda}{1/\lambda + E[B]}$$

or

$$E[B] = \frac{E[S]}{1 - \lambda E[S]}$$

Another quantity of interest is $C$, the number of customers served in a busy period. The mean of $C$ can be computed by noting that, on the average, for every $E[C]$ arrivals exactly one will find the system empty (namely, the first customer in the busy period). Hence,

$$a_0 = \frac{1}{E[C]}$$

and, as $a_0 = P_0 = 1 - \lambda E[S]$ because of Poisson arrivals, we see that

$$E[C] = \frac{1}{1 - \lambda E[S]}$$

8.6 Variations on the M/G/1

8.6.1 The M/G/1 with Random-Sized Batch Arrivals

Suppose that, as in the M/G/1, arrivals occur in accordance with a Poisson process having rate $\lambda$. But now suppose that each arrival consists not of a single customer but of a random number of customers. As before there is a single server whose service times have distribution $G$. Let us denote by $\alpha_j$, $j \geq 1$, the probability that an arbitrary batch consists of $j$ customers; and let $N$ denote a random variable representing the size of a batch and so $P\{N = j\} = \alpha_j$. Since $\lambda_a = \lambda E(N)$, the basic formula for work (Equation (8.31)) becomes

$$V = \lambda E[N] \left[ E(S) W_Q + \frac{E(S^2)}{2} \right] \tag{8.38}$$

To obtain a second equation relating $V$ to $W_Q$, consider an average customer. We have that

his wait in queue = work in system when he arrives + his waiting time due to those in his batch

Taking expectations and using the fact that Poisson arrivals see time averages yields

$$W_Q = V + E[\text{waiting time due to those in his batch}] = V + E[W_B] \tag{8.39}$$

Now, $E(W_B)$ can be computed by conditioning on the number in the batch, but we must be careful because the probability that our average customer comes from a batch of size $j$ is not $\alpha_j$. For $\alpha_j$ is the proportion of batches that are of size $j$, and if we pick a customer at random, it is more likely that he comes from a larger rather than a smaller batch. (For instance, suppose $\alpha_1 = \alpha_{100} = \frac{1}{2}$; then half the batches are of size 1 but 100/101 of the customers will come from a batch of size 100!)

To determine the probability that our average customer came from a batch of size $j$ we reason as follows: Let $M$ be a large number. Then of the first $M$ batches approximately $M\alpha_j$ will be of size $j$, $j \geq 1$, and thus there would have been approximately $jM\alpha_j$ customers that arrived in a batch of size $j$. Hence, the proportion of arrivals in the first $M$ batches that were from batches of size $j$ is approximately $jM\alpha_j / \sum_j jM\alpha_j$. This proportion becomes exact as $M \to \infty$, and so we see that

$$\text{proportion of customers from batches of size } j = \frac{j\alpha_j}{\sum_j j\alpha_j} = \frac{j\alpha_j}{E[N]}$$

We are now ready to compute $E(W_B)$, the expected wait in queue due to others in the batch:

$$E[W_B] = \sum_j E[W_B \mid \text{batch of size } j] \frac{j\alpha_j}{E[N]} \tag{8.40}$$


Now if there are $j$ customers in his batch, then our customer would have to wait for $i - 1$ of them to be served if he was $i$th in line among his batch members. As he is equally likely to be either 1st, 2nd, $\ldots$, or $j$th in line we see that

$$E[W_B \mid \text{batch is of size } j] = \sum_{i=1}^{j} (i-1) E(S) \frac{1}{j} = \frac{j-1}{2} E[S]$$

Substituting this in Equation (8.40) yields

$$E[W_B] = \frac{E[S]}{2E[N]} \sum_j (j-1) j\alpha_j = \frac{E[S](E[N^2] - E[N])}{2E[N]}$$

and from Equations (8.38) and (8.39) we obtain

$$W_Q = \frac{E[S](E[N^2] - E[N])/2E[N] + \lambda E[N] E[S^2]/2}{1 - \lambda E[N] E[S]}$$
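The batch-arrival formula in code (our sketch; names are illustrative); as a sanity check, a batch size that is identically 1 reduces it to the Pollaczek–Khintchine formula:

```python
def batch_mg1_WQ(lam, EN, EN2, ES, ES2):
    """W_Q for the M/G/1 queue with random-sized batch arrivals."""
    assert lam * EN * ES < 1, "need lambda*E[N]*E[S] < 1"
    batch_wait = ES * (EN2 - EN) / (2 * EN)        # E[W_B]
    return (batch_wait + lam * EN * ES2 / 2) / (1 - lam * EN * ES)

# With N identically 1 (E[N] = E[N^2] = 1) this equals the W_Q returned by
# pollaczek_khintchine above.
```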

Remarks (i) Note that the condition for $W_Q$ to be finite is that $\lambda E(N) E(S) < 1$.
Exercises

4. Suppose that a customer of the M/M/1 system spends the amount of time $x > 0$ waiting in queue before entering service.
(a) Show that, conditional on the preceding, the number of other customers that were in the system when the customer arrived is distributed as $1 + P$, where $P$ is a Poisson random variable with mean $\lambda x$.
(b) Let $W_Q^*$ denote the amount of time that an M/M/1 customer spends in queue. As a by-product of your analysis in part (a), show that

$$P\{W_Q^* \leq x\} = \begin{cases} 1 - \dfrac{\lambda}{\mu}, & \text{if } x = 0 \\[6pt] 1 - \dfrac{\lambda}{\mu} + \dfrac{\lambda}{\mu}\left(1 - e^{-(\mu - \lambda)x}\right), & \text{if } x > 0 \end{cases}$$


5. It follows from Exercise 4 that if, in the M/M/1 model, $W_Q^*$ is the amount of time that a customer spends waiting in queue, then

$$W_Q^* = \begin{cases} 0, & \text{with probability } 1 - \lambda/\mu \\ \text{Exp}(\mu - \lambda), & \text{with probability } \lambda/\mu \end{cases}$$

where $\text{Exp}(\mu - \lambda)$ is an exponential random variable with rate $\mu - \lambda$. Using this, find $\text{Var}(W_Q^*)$.

*6. Show that $W$ is smaller in an M/M/1 model having arrivals at rate $\lambda$ and service at rate $2\mu$ than it is in a two-server M/M/2 model with arrivals at rate $\lambda$ and with each server at rate $\mu$. Can you give an intuitive explanation for this result? Would it also be true for $W_Q$?

7. Consider the M/M/1 queue with impatient customers model as presented in Example 8.7. Give your answers in terms of the limiting probabilities $P_n$, $n \geq 0$.
(a) What is the average amount of time that a customer spends in queue?
(b) If $e_n$ denotes the probability that a customer who finds $n$ others in the system upon arrival will be served, find $e_n$, $n \geq 0$.
(c) Find the conditional probability that a served customer found $n$ in the system upon arrival. That is, find $P(\text{arrival finds } n \mid \text{arrival is served})$.
(d) Find the average amount of time spent in queue by a customer that is served.
(e) Find the average amount of time spent in queue by a customer that departs before entering service.

8. A facility produces items according to a Poisson process with rate $\lambda$. However, it has shelf space for only $k$ items and so it shuts down production whenever $k$ items are present. Customers arrive at the facility according to a Poisson process with rate $\mu$. Each customer wants one item and will immediately depart either with the item or empty handed if there is no item available.
(a) Find the proportion of customers that go away empty handed.
(b) Find the average time that an item is on the shelf.
(c) Find the average number of items on the shelf.

9. A group of $n$ customers moves around among two servers. Upon completion of service, the served customer then joins the queue (or enters service if the server is free) at the other server. All service times are exponential with rate $\mu$. Find the proportion of time that there are $j$ customers at server 1, $j = 0, \ldots, n$.

10. A group of $m$ customers frequents a single-server station in the following manner. When a customer arrives, he or she either enters service if the server is free or joins the queue otherwise. Upon completing service the customer departs the system, but then returns after an exponential time with rate $\theta$. All service times are exponentially distributed with rate $\mu$.
(a) Find the average rate at which customers enter the station.
(b) Find the average time that a customer spends in the station per visit.

11. Families arrive at a taxi stand according to a Poisson process with rate $\lambda$. An arriving family finding $N$ other families waiting for a taxi does not wait. Taxis


arrive at the taxi stand according to a Poisson process with rate $\mu$. A taxi finding $M$ other taxis waiting does not wait. Derive expressions for the following quantities.
(a) The proportion of time that there are no families waiting.
(b) The proportion of time that there are no taxis waiting.
(c) The average amount of time that a family waits.
(d) The average amount of time that a taxi waits.
(e) The fraction of families that take taxis.

Now redo the problem if we assume that $N = M = \infty$ and that each family will only wait for an exponential time with rate $\alpha$ before seeking other transportation, and each taxi will only wait for an exponential time with rate $\beta$ before departing without a fare.

*12. A supermarket has two exponential checkout counters, each operating at rate $\mu$. Arrivals are Poisson at rate $\lambda$. The counters operate in the following way:
(i) One queue feeds both counters.
(ii) One counter is operated by a permanent checker and the other by a stock clerk who instantaneously begins checking whenever there are two or more customers in the system. The clerk returns to stocking whenever he completes a service, and there are fewer than two customers in the system.
(a) Find $P_n$, the proportion of time there are $n$ in the system.
(b) At what rate does the number in the system go from 0 to 1? From 2 to 1?
(c) What proportion of time is the stock clerk checking?

Hint: Be a little careful when there is one in the system.

13. Two customers move about among three servers. Upon completion of service at server $i$, the customer leaves that server and enters service at whichever of the other two servers is free. (Therefore, there are always two busy servers.) If the service times at server $i$ are exponential with rate $\mu_i$, $i = 1, 2, 3$, what proportion of time is server $i$ idle?

14. Consider a queueing system having two servers and no queue. There are two types of customers. Type 1 customers arrive according to a Poisson process having rate $\lambda_1$, and will enter the system if either server is free. The service time of a type 1 customer is exponential with rate $\mu_1$. Type 2 customers arrive according to a Poisson process having rate $\lambda_2$. A type 2 customer requires the simultaneous use of both servers; hence, a type 2 arrival will only enter the system if both servers are free. The time that it takes (the two servers) to serve a type 2 customer is exponential with rate $\mu_2$. Once a service is completed on a customer, that customer departs the system.
(a) Define states to analyze the preceding model.
(b) Give the balance equations.
In terms of the solution of the balance equations, find
(c) the average amount of time an entering customer spends in the system;
(d) the fraction of served customers that are type 1.


15. Consider a sequential-service system consisting of two servers, A and B. Arriving customers will enter this system only if server A is free. If a customer does enter, then he is immediately served by server A. When his service by A is completed, he then goes to B if B is free, or if B is busy, he leaves the system. Upon completion of service at server B, the customer departs. Assume that the (Poisson) arrival rate is two customers an hour, and that A and B serve at respective (exponential) rates of four and two customers an hour.
(a) What proportion of customers enter the system?
(b) What proportion of entering customers receive service from B?
(c) What is the average number of customers in the system?
(d) What is the average amount of time that an entering customer spends in the system?

16. Customers arrive at a two-server system according to a Poisson process having rate $\lambda = 5$. An arrival finding server 1 free will begin service with that server. An arrival finding server 1 busy and server 2 free will enter service with server 2. An arrival finding both servers busy goes away. Once a customer is served by either server, he departs the system. The service times at server $i$ are exponential with rates $\mu_i$, where $\mu_1 = 4$, $\mu_2 = 2$.
(a) What is the average time an entering customer spends in the system?
(b) What proportion of time is server 2 busy?

17. Customers arrive at a two-server station in accordance with a Poisson process with a rate of two per hour. Arrivals finding server 1 free begin service with that server. Arrivals finding server 1 busy and server 2 free begin service with server 2. Arrivals finding both servers busy are lost. When a customer is served by server 1, she then either enters service with server 2 if 2 is free or departs the system if 2 is busy. A customer completing service at server 2 departs the system. The service times at server 1 and server 2 are exponential random variables with respective rates of four and six per hour.
(a) What fraction of customers do not enter the system?
(b) What is the average amount of time that an entering customer spends in the system?
(c) What fraction of entering customers receives service from server 1?

18. Arrivals to a three-server system are according to a Poisson process with rate $\lambda$. Arrivals finding server 1 free enter service with 1. Arrivals finding 1 busy but 2 free enter service with 2. Arrivals finding both 1 and 2 busy do not join the system. After completion of service at either 1 or 2 the customer will then either go to server 3 if 3 is free or depart the system if 3 is busy. After service at 3 customers depart the system. The service times at $i$ are exponential with rate $\mu_i$, $i = 1, 2, 3$.
(a) Define states to analyze the above system.
(b) Give the balance equations.


(c) In terms of the solution of the balance equations, what is the average time that an entering customer spends in the system?
(d) Find the probability that a customer who arrives when the system is empty is served by server 3.

19. The economy alternates between good and bad periods. During good times customers arrive at a certain single-server queueing system in accordance with a Poisson process with rate $\lambda_1$, and during bad times they arrive in accordance with a Poisson process with rate $\lambda_2$. A good time period lasts for an exponentially distributed time with rate $\alpha_1$, and a bad time period lasts for an exponential time with rate $\alpha_2$. An arriving customer will only enter the queueing system if the server is free; an arrival finding the server busy goes away. All service times are exponential with rate $\mu$.
(a) Define states so as to be able to analyze this system.
(b) Give a set of linear equations whose solution will yield the long-run proportion of time the system is in each state.
In terms of the solutions of the equations in part (b),
(c) what proportion of time is the system empty?
(d) what is the average rate at which customers enter the system?

20. There are two types of customers. Type 1 and 2 customers arrive in accordance with independent Poisson processes with respective rates $\lambda_1$ and $\lambda_2$. There are two servers. A type 1 arrival will enter service with server 1 if that server is free; if server 1 is busy and server 2 is free, then the type 1 arrival will enter service with server 2. If both servers are busy, then the type 1 arrival will go away. A type 2 customer can only be served by server 2; if server 2 is free when a type 2 customer arrives, then the customer enters service with that server. If server 2 is busy when a type 2 arrives, then that customer goes away. Once a customer is served by either server, he departs the system. Service times at server $i$ are exponential with rate $\mu_i$, $i = 1, 2$. Suppose we want to find the average number of customers in the system.
(a) Define states.
(b) Give the balance equations. Do not attempt to solve them.
In terms of the long-run probabilities, what is
(c) the average number of customers in the system?
(d) the average time a customer spends in the system?

*21. Suppose in Exercise 20 we want to find out the proportion of time there is a type 1 customer with server 2. In terms of the long-run probabilities given in Exercise 20, what is
(a) the rate at which a type 1 customer enters service with server 2?
(b) the rate at which a type 2 customer enters service with server 2?
(c) the fraction of server 2's customers that are type 1?
(d) the proportion of time that a type 1 customer is with server 2?


22. Customers arrive at a single-server station in accordance with a Poisson process with rate $\lambda$. All arrivals that find the server free immediately enter service. All service times are exponentially distributed with rate $\mu$. An arrival that finds the server busy will leave the system and roam around "in orbit" for an exponential time with rate $\theta$ at which time it will then return. If the server is busy when an orbiting customer returns, then that customer returns to orbit for another exponential time with rate $\theta$ before returning again. An arrival that finds the server busy and $N$ other customers in orbit will depart and not return. That is, $N$ is the maximum number of customers in orbit.
(a) Define states.
(b) Give the balance equations.
In terms of the solution of the balance equations, find
(c) the proportion of all customers that are eventually served;
(d) the average time that a served customer spends waiting in orbit.

23. Consider the M/M/1 system in which customers arrive at rate $\lambda$ and the server serves at rate $\mu$. However, suppose that in any interval of length $h$ in which the server is busy there is a probability $\alpha h + o(h)$ that the server will experience a breakdown, which causes the system to shut down. All customers that are in the system depart, and no additional arrivals are allowed to enter until the breakdown is fixed. The time to fix a breakdown is exponentially distributed with rate $\beta$.
(a) Define appropriate states.
(b) Give the balance equations.
In terms of the long-run probabilities,
(c) what is the average amount of time that an entering customer spends in the system?
(d) what proportion of entering customers complete their service?
(e) what proportion of customers arrive during a breakdown?

*24. Reconsider Exercise 23, but this time suppose that a customer that is in the system when a breakdown occurs remains there while the server is being fixed. In addition, suppose that new arrivals during a breakdown period are allowed to enter the system. What is the average time a customer spends in the system?

25. Poisson ($\lambda$) arrivals join a queue in front of two parallel servers A and B, having exponential service rates $\mu_A$ and $\mu_B$ (see Figure 8.4). When the system is empty, arrivals go into server A with probability $\alpha$ and into B with probability $1 - \alpha$. Otherwise, the head of the queue takes the first free server.

[Figure 8.4]

(a) Define states and set up the balance equations. Do not solve.


(b) In terms of the probabilities in part (a), what is the average number in the system? Average number of servers idle?
(c) In terms of the probabilities in part (a), what is the probability that an arbitrary arrival will get serviced in A?

26. In a queue with unlimited waiting space, arrivals are Poisson (parameter $\lambda$) and service times are exponentially distributed (parameter $\mu$). However, the server waits until $K$ people are present before beginning service on the first customer; thereafter, he services one at a time until all $K$ units, and all subsequent arrivals, are serviced. The server is then "idle" until $K$ new arrivals have occurred.
(a) Define an appropriate state space, draw the transition diagram, and set up the balance equations.
(b) In terms of the limiting probabilities, what is the average time a customer spends in queue?
(c) What conditions on $\lambda$ and $\mu$ are necessary?

27. Consider a single-server exponential system in which ordinary customers arrive at a rate $\lambda$ and have service rate $\mu$. In addition, there is a special customer who has a service rate $\mu_1$. Whenever this special customer arrives, she goes directly into service (if anyone else is in service, then this person is bumped back into queue). When the special customer is not being serviced, she spends an exponential amount of time (with mean $1/\theta$) out of the system.
(a) What is the average arrival rate of the special customer?
(b) Define an appropriate state space and set up balance equations.
(c) Find the probability that an ordinary customer is bumped $n$ times.

*28. Let $D$ denote the time between successive departures in a stationary M/M/1 queue with $\lambda < \mu$. Show, by conditioning on whether or not a departure has left the system empty, that $D$ is exponential with rate $\lambda$.

Hint: By conditioning on whether or not the departure has left the system empty we see that

$$D = \begin{cases} \text{Exponential}(\mu), & \text{with probability } \lambda/\mu \\ \text{Exponential}(\lambda) * \text{Exponential}(\mu), & \text{with probability } 1 - \lambda/\mu \end{cases}$$

where $\text{Exponential}(\lambda) * \text{Exponential}(\mu)$ represents the sum of two independent exponential random variables having rates $\mu$ and $\lambda$. Now use moment-generating functions to show that $D$ has the required distribution. Note that the preceding does not prove that the departure process is Poisson. To prove this we need show not only that the interdeparture times are all exponential with rate $\lambda$, but also that they are independent.

29. Potential customers arrive to a single-server hair salon according to a Poisson process with rate $\lambda$. A potential customer who finds the server free enters the system; a potential customer who finds the server busy goes away. Each potential customer is type $i$ with probability $p_i$, where $p_1 + p_2 + p_3 = 1$. Type 1 customers have their hair washed by the server; type 2 customers have their hair cut by the


server; and type 3 customers have their hair first washed and then cut by the server. The time that it takes the server to wash hair is exponentially distributed with rate $\mu_1$, and the time that it takes the server to cut hair is exponentially distributed with rate $\mu_2$.
(a) Explain how this system can be analyzed with four states.
(b) Give the equations whose solution yields the proportion of time the system is in each state.
In terms of the solution of the equations of (b), find
(c) the proportion of time the server is cutting hair;
(d) the average arrival rate of entering customers.

30. For the tandem queue model verify that

$$P_{n,m} = (\lambda/\mu_1)^n (1 - \lambda/\mu_1)(\lambda/\mu_2)^m (1 - \lambda/\mu_2)$$

satisfies the balance equations (8.15).

31. Consider a network of three stations with a single server at each station. Customers arrive at stations 1, 2, 3 in accordance with Poisson processes having respective rates 5, 10, and 15. The service times at the three stations are exponential with respective rates 10, 50, and 100. A customer completing service at station 1 is equally likely to (i) go to station 2, (ii) go to station 3, or (iii) leave the system. A customer departing service at station 2 always goes to station 3. A departure from service at station 3 is equally likely to either go to station 2 or leave the system.
(a) What is the average number of customers in the system (consisting of all three stations)?
(b) What is the average time a customer spends in the system?

32. Consider a closed queueing network consisting of two customers moving among two servers, and suppose that after each service completion the customer is equally likely to go to either server—that is, $P_{1,2} = P_{2,1} = \frac{1}{2}$. Let $\mu_i$ denote the exponential service rate at server $i$, $i = 1, 2$.
(a) Determine the average number of customers at each server.
(b) Determine the service completion rate for each server.

33. Explain how a Markov chain Monte Carlo simulation using the Gibbs sampler can be utilized to estimate
(a) the distribution of the amount of time spent at server $j$ on a visit.
Hint: Use the arrival theorem.
(b) the proportion of time a customer is with server $j$ (i.e., either in server $j$'s queue or in service with $j$).


34. For open queueing networks
(a) state and prove the equivalent of the arrival theorem;
(b) derive an expression for the average amount of time a customer spends waiting in queues.

35. Customers arrive at a single-server station in accordance with a Poisson process having rate $\lambda$. Each customer has a value. The successive values of customers are independent and come from a uniform distribution on $(0, 1)$. The service time of a customer having value $x$ is a random variable with mean $3 + 4x$ and variance 5.
(a) What is the average time a customer spends in the system?
(b) What is the average time a customer having value $x$ spends in the system?

*36. Compare the M/G/1 system for first-come, first-served queue discipline with one of last-come, first-served (for instance, in which units for service are taken from the top of a stack). Would you think that the queue size, waiting time, and busy-period distribution differ? What about their means? What if the queue discipline was always to choose at random among those waiting? Intuitively, which discipline would result in the smallest variance in the waiting time distribution?

37. In an M/G/1 queue,
(a) what proportion of departures leave behind 0 work?
(b) what is the average work in the system as seen by a departure?

38. For the M/G/1 queue, let $X_n$ denote the number in the system left behind by the $n$th departure.
(a) If

$$X_{n+1} = \begin{cases} X_n - 1 + Y_n, & \text{if } X_n \geq 1 \\ Y_n, & \text{if } X_n = 0 \end{cases}$$

what does $Y_n$ represent?
(b) Rewrite the preceding as

$$X_{n+1} = X_n - 1 + Y_n + \delta_n \tag{8.64}$$

where

$$\delta_n = \begin{cases} 1, & \text{if } X_n = 0 \\ 0, & \text{if } X_n \geq 1 \end{cases}$$

Take expectations and let $n \to \infty$ in Equation (8.64) to obtain

$$E[\delta_\infty] = 1 - \lambda E[S]$$

(c) Square both sides of Equation (8.64), take expectations, and then let $n \to \infty$ to obtain

$$E[X_\infty] = \frac{\lambda^2 E[S^2]}{2(1 - \lambda E[S])} + \lambda E[S]$$


(d) Argue that $E[X_\infty]$, the average number as seen by a departure, is equal to $L$.

*39. Consider an M/G/1 system in which the first customer in a busy period has the service distribution $G_1$ and all others have distribution $G_2$. Let $C$ denote the number of customers in a busy period, and let $S$ denote the service time of a customer chosen at random. Argue that
(a) $a_0 = P_0 = 1 - \lambda E[S]$.
(b) $E[S] = a_0 E[S_1] + (1 - a_0) E[S_2]$ where $S_i$ has distribution $G_i$.
(c) Use (a) and (b) to show that $E[B]$, the expected length of a busy period, is given by

$$E[B] = \frac{E[S_1]}{1 - \lambda E[S_2]}$$

(d) Find $E[C]$.

40. Consider an M/G/1 system with $\lambda E[S] < 1$.
(a) Suppose that service is about to begin at a moment when there are $n$ customers in the system.
(i) Argue that the additional time until there are only $n - 1$ customers in the system has the same distribution as a busy period.
(ii) What is the expected additional time until the system is empty?
(b) Suppose that the work in the system at some moment is $A$. We are interested in the expected additional time until the system is empty—call it $E[T]$. Let $N$ denote the number of arrivals during the first $A$ units of time.
(i) Compute $E[T \mid N]$.
(ii) Compute $E[T]$.

41. Carloads of customers arrive at a single-server station in accordance with a Poisson process with rate 4 per hour. The service times are exponentially distributed with rate 20 per hour. If each carload contains either 1, 2, or 3 customers with respective probabilities $\frac{1}{4}$, $\frac{1}{2}$, and $\frac{1}{4}$, compute the average customer delay in queue.

42. In the two-class priority queueing model of Section 8.6.2, what is $W_Q$? Show that $W_Q$ is less than it would be under FIFO if $E[S_1] < E[S_2]$ and greater than under FIFO if $E[S_1] > E[S_2]$.

43. In a two-class priority queueing model suppose that a cost of $C_i$ per unit time is incurred for each type $i$ customer that waits in queue, $i = 1, 2$. Show that type 1 customers should be given priority over type 2 (as opposed to the reverse) if

$$\frac{E[S_1]}{C_1} < \frac{E[S_2]}{C_2}$$

44. Consider the priority queueing model of Section 8.6.2 but now suppose that if a type 2 customer is being served when a type 1 arrives then the type 2 customer is bumped out of service. This is called the preemptive case. Suppose that when


a bumped type 2 customer goes back in service his service begins at the point where it left off when he was bumped.
(a) Argue that the work in the system at any time is the same as in the nonpreemptive case.
(b) Derive $W_Q^1$.

Hint: How do type 2 customers affect type 1s?

(c) Why is it not true that

$$V_Q^2 = \lambda_2 E[S_2] W_Q^2$$

(d) Argue that the work seen by a type 2 arrival is the same as in the nonpreemptive case, and so

$$W_Q^2 = W_Q^2(\text{nonpreemptive}) + E[\text{extra time}]$$

where the extra time is due to the fact that he may be bumped.
(e) Let $N$ denote the number of times a type 2 customer is bumped. Why is

$$E[\text{extra time} \mid N] = \frac{N E[S_1]}{1 - \lambda_1 E[S_1]}$$

Hint: When a type 2 is bumped, relate the time until he gets back in service to a "busy period."

(f) Let $S_2$ denote the service time of a type 2. What is $E[N \mid S_2]$?
(g) Combine the preceding to obtain

$$W_Q^2 = W_Q^2(\text{nonpreemptive}) + \frac{\lambda_1 E[S_1] E[S_2]}{1 - \lambda_1 E[S_1]}$$

*45. Calculate explicitly (not in terms of limiting probabilities) the average time a customer spends in the system in Exercise 24.

46. In the G/M/1 model if $G$ is exponential with rate $\lambda$ show that $\beta = \lambda/\mu$.

47. In the $k$-server Erlang loss model, suppose that $\lambda = 1$ and $E[S] = 4$. Find $L$ if $P_k = .2$.

48. Verify the formula given for the $P_i$ of the M/M/k.

49. In the Erlang loss system suppose the Poisson arrival rate is $\lambda = 2$, and suppose there are three servers, each of whom has a service distribution that is uniformly distributed over $(0, 2)$. What proportion of potential customers is lost?

50. In the M/M/k system,
(a) what is the probability that a customer will have to wait in queue?
(b) determine $L$ and $W$.

51. Verify the formula for the distribution of $W_Q^*$ given for the G/M/k model.


*52. Consider a system where the interarrival times have an arbitrary distribution $F$, and there is a single server whose service distribution is $G$. Let $D_n$ denote the amount of time the $n$th customer spends waiting in queue. Interpret $S_n$, $T_n$ so that

$$D_{n+1} = \begin{cases} D_n + S_n - T_n, & \text{if } D_n + S_n - T_n \geq 0 \\ 0, & \text{if } D_n + S_n - T_n < 0 \end{cases}$$

53. Consider a model in which the interarrival times have an arbitrary distribution $F$, and there are $k$ servers each having service distribution $G$. What condition on $F$ and $G$ do you think would be necessary for there to exist limiting probabilities?


Reliability Theory

9.1 Introduction

Reliability theory is concerned with determining the probability that a system, possibly consisting of many components, will function. We shall suppose that whether or not the system functions is determined solely from a knowledge of which components are functioning. For instance, a series system will function if and only if all of its components are functioning, while a parallel system will function if and only if at least one of its components is functioning. In Section 9.2, we explore the possible ways in which the functioning of the system may depend upon the functioning of its components. In Section 9.3, we suppose that each component will function with some known probability (independently of each other) and show how to obtain the probability that the system will function. As this probability often is difficult to explicitly compute, we also present useful upper and lower bounds in Section 9.4. In Section 9.5 we look at a system dynamically over time by supposing that each component initially functions and does so for a random length of time, at which point it fails. We then discuss the relationship between the distribution of the amount of time that a system functions and the distributions of the component lifetimes. In particular, it turns out that if the amount of time that a component functions has an increasing failure rate on the average (IFRA) distribution, then so does the distribution of system lifetime. In Section 9.6 we consider the problem of obtaining the mean lifetime of a system. In the final section we analyze the system when failed components are subjected to repair.



9.2 Structure Functions

Consider a system consisting of $n$ components, and suppose that each component is either functioning or has failed. To indicate whether or not the $i$th component is functioning, we define the indicator variable $x_i$ by

$$x_i = \begin{cases} 1, & \text{if the } i\text{th component is functioning} \\ 0, & \text{if the } i\text{th component has failed} \end{cases}$$

The vector $\mathbf{x} = (x_1, \ldots, x_n)$ is called the state vector. It indicates which of the components are functioning and which have failed.

We further suppose that whether or not the system as a whole is functioning is completely determined by the state vector $\mathbf{x}$. Specifically, it is supposed that there exists a function $\phi(\mathbf{x})$ such that

$$\phi(\mathbf{x}) = \begin{cases} 1, & \text{if the system is functioning when the state vector is } \mathbf{x} \\ 0, & \text{if the system has failed when the state vector is } \mathbf{x} \end{cases}$$

The function $\phi(\mathbf{x})$ is called the structure function of the system.

Example 9.1 (The Series Structure) A series system functions if and only if all of its components are functioning. Hence, its structure function is given by

$$\phi(\mathbf{x}) = \min(x_1, \ldots, x_n) = \prod_{i=1}^{n} x_i$$

We shall find it useful to represent the structure of a system in terms of a diagram. The relevant diagram for the series structure is shown in Figure 9.1. The idea is that if a signal is initiated at the left end of the diagram then in order for it to successfully reach the right end, it must pass through all of the components; hence, they must all be functioning. ■

Example 9.2 (The Parallel Structure) A parallel system functions if and only if at least one of its components is functioning. Hence, its structure function is given by
$$\phi(\mathbf{x}) = \max(x_1, \ldots, x_n)$$
A parallel structure may be pictorially illustrated by Figure 9.2. This follows since a signal at the left end can successfully reach the right end as long as at least one component is functioning. ■

Example 9.3 (The k-out-of-n Structure) The series and parallel systems are both special cases of a k-out-of-n system.


Figure 9.2 A parallel system.

Figure 9.3 A two-out-of-three system.

Such a system functions if and only if at least k of the n components are functioning. As $\sum_{i=1}^n x_i$ equals the number of functioning components, the structure function of a k-out-of-n system is given by
$$\phi(\mathbf{x}) = \begin{cases} 1, & \text{if } \sum_{i=1}^n x_i \ge k\\ 0, & \text{if } \sum_{i=1}^n x_i < k\end{cases}$$

Series and parallel systems are respectively n-out-of-n and 1-out-of-n systems. The two-out-of-three system may be diagrammed as shown in Figure 9.3.

■

Example 9.4 (A Four-Component Structure) Consider a system consisting of four components, and suppose that the system functions if and only if components 1 and 2 both function and at least one of components 3 and 4 functions. Its structure function is given by
$$\phi(\mathbf{x}) = x_1 x_2 \max(x_3, x_4)$$


Figure 9.4

Pictorially, the system is as shown in Figure 9.4. A useful identity, easily checked, is that for binary variables,* $x_i$, $i = 1, \ldots, n$,
$$\max(x_1, \ldots, x_n) = 1 - \prod_{i=1}^n (1 - x_i)$$
When n = 2, this yields
$$\max(x_1, x_2) = 1 - (1 - x_1)(1 - x_2) = x_1 + x_2 - x_1 x_2$$
Hence, the structure function in the example may be written as
$$\phi(\mathbf{x}) = x_1 x_2 (x_3 + x_4 - x_3 x_4) \qquad ■$$
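The structure functions introduced so far are easy to experiment with directly. The following is a minimal Python sketch, not from the text (the function names are ours), of the structure functions of Examples 9.1–9.4, together with a brute-force check of the identity $\max(x_1,\ldots,x_n) = 1 - \prod_i(1-x_i)$ over all binary state vectors.

```python
# A sketch of the structure functions of Examples 9.1-9.4 (names are ours).
from itertools import product

def phi_series(x):          # Example 9.1: min(x) equals the product of the x_i
    return min(x)

def phi_parallel(x):        # Example 9.2: max(x) equals 1 - prod(1 - x_i)
    return max(x)

def phi_k_of_n(x, k):       # Example 9.3
    return 1 if sum(x) >= k else 0

def phi_example_9_4(x):     # components 1, 2 in series with (3 parallel 4)
    x1, x2, x3, x4 = x
    return x1 * x2 * max(x3, x4)

for x in product([0, 1], repeat=4):
    prod_all, prod_comp = 1, 1
    for xi in x:
        prod_all *= xi
        prod_comp *= 1 - xi
    assert phi_series(x) == prod_all                  # series = product
    assert phi_parallel(x) == 1 - prod_comp           # the max identity
    assert phi_series(x) == phi_k_of_n(x, 4)          # series is n-out-of-n
    assert phi_parallel(x) == phi_k_of_n(x, 1)        # parallel is 1-out-of-n
    x1, x2, x3, x4 = x
    assert phi_example_9_4(x) == x1 * x2 * (x3 + x4 - x3 * x4)
print("all identities hold on the 16 state vectors")
```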

It is natural to assume that replacing a failed component by a functioning one will never lead to a deterioration of the system. In other words, it is natural to assume that the structure function $\phi(\mathbf{x})$ is an increasing function of x, that is, if $x_i \le y_i$, $i = 1, \ldots, n$, then $\phi(\mathbf{x}) \le \phi(\mathbf{y})$. Such an assumption shall be made in this chapter and the system will be called monotone.

9.2.1 Minimal Path and Minimal Cut Sets

In this section we show how any system can be represented both as a series arrangement of parallel structures and as a parallel arrangement of series structures. As a preliminary, we need the following concepts.
A state vector x is called a path vector if $\phi(\mathbf{x}) = 1$. If, in addition, $\phi(\mathbf{y}) = 0$ for all $\mathbf{y} < \mathbf{x}$, then x is said to be a minimal path vector.** If x is a minimal path vector, then the set $A = \{i : x_i = 1\}$ is called a minimal path set. In other words, a minimal path set is a minimal set of components whose functioning ensures the functioning of the system.

Example 9.5 Consider a five-component system whose structure is illustrated by Figure 9.5. Its structure function equals
$$\phi(\mathbf{x}) = \max(x_1, x_2)\,\max(x_3 x_4, x_5) = (x_1 + x_2 - x_1 x_2)(x_3 x_4 + x_5 - x_3 x_4 x_5)$$

* A binary variable is one that assumes either the value 0 or 1.
** We say that $\mathbf{y} < \mathbf{x}$ if $y_i \le x_i$, $i = 1, \ldots, n$, with $y_i < x_i$ for some i.


Figure 9.5

There are four minimal path sets, namely, {1, 3, 4}, {2, 3, 4}, {1, 5}, {2, 5}. ■

Example 9.6 In a k-out-of-n system, there are $\binom{n}{k}$ minimal path sets, namely, all of the sets consisting of exactly k components. ■

Let $A_1, \ldots, A_s$ denote the minimal path sets of a given system. We define $\alpha_j(\mathbf{x})$, the indicator function of the jth minimal path set, by
$$\alpha_j(\mathbf{x}) = \begin{cases} 1, & \text{if all the components of } A_j \text{ are functioning}\\ 0, & \text{otherwise}\end{cases} = \prod_{i \in A_j} x_i$$
By definition, it follows that the system will function if all the components of at least one minimal path set are functioning; that is, if $\alpha_j(\mathbf{x}) = 1$ for some j. On the other hand, if the system functions, then the set of functioning components must include a minimal path set. Therefore, a system will function if and only if all the components of at least one minimal path set are functioning. Hence,
$$\phi(\mathbf{x}) = \begin{cases} 1, & \text{if } \alpha_j(\mathbf{x}) = 1 \text{ for some } j\\ 0, & \text{if } \alpha_j(\mathbf{x}) = 0 \text{ for all } j\end{cases}$$
or equivalently,
$$\phi(\mathbf{x}) = \max_j \alpha_j(\mathbf{x}) = \max_j \prod_{i \in A_j} x_i \qquad (9.1)$$

Since $\alpha_j(\mathbf{x})$ is a series structure function of the components of the jth minimal path set, Equation (9.1) expresses an arbitrary system as a parallel arrangement of series systems.

Example 9.7 Consider the system of Example 9.5. Because its minimal path sets are $A_1 = \{1, 3, 4\}$, $A_2 = \{2, 3, 4\}$, $A_3 = \{1, 5\}$, and $A_4 = \{2, 5\}$, we have by Equation (9.1) that
$$\phi(\mathbf{x}) = \max\{x_1 x_3 x_4,\; x_2 x_3 x_4,\; x_1 x_5,\; x_2 x_5\}$$
$$= 1 - (1 - x_1 x_3 x_4)(1 - x_2 x_3 x_4)(1 - x_1 x_5)(1 - x_2 x_5)$$


Figure 9.6

Figure 9.7 The bridge system.

Figure 9.8

You should verify that this equals the value of $\phi(\mathbf{x})$ given in Example 9.5. (Make use of the fact that, since $x_i$ equals 0 or 1, $x_i^2 = x_i$.) This representation may be pictured as shown in Figure 9.6. ■

Example 9.8 The system whose structure is as pictured in Figure 9.7 is called the bridge system. Its minimal path sets are {1, 4}, {1, 3, 5}, {2, 5}, and {2, 3, 4}. Hence, by Equation (9.1), its structure function may be expressed as
$$\phi(\mathbf{x}) = \max\{x_1 x_4,\; x_1 x_3 x_5,\; x_2 x_5,\; x_2 x_3 x_4\}$$
$$= 1 - (1 - x_1 x_4)(1 - x_1 x_3 x_5)(1 - x_2 x_5)(1 - x_2 x_3 x_4)$$
This representation of $\phi(\mathbf{x})$ is diagrammed as shown in Figure 9.8. ■

A state vector x is called a cut vector if φ(x) = 0. If, in addition, φ(y) = 1 for all y > x, then x is said to be a minimal cut vector. If x is a minimal cut vector, then the set C = {i : xi = 0} is called a minimal cut set. In other words, a minimal cut set is a minimal set of components whose failure ensures the failure of the system.


Figure 9.9

Let $C_1, \ldots, C_k$ denote the minimal cut sets of a given system. We define $\beta_j(\mathbf{x})$, the indicator function of the jth minimal cut set, by
$$\beta_j(\mathbf{x}) = \begin{cases} 1, & \text{if at least one component of the } j\text{th minimal cut set is functioning}\\ 0, & \text{if all of the components of the } j\text{th minimal cut set are not functioning}\end{cases} = \max_{i \in C_j} x_i$$
Since a system is not functioning if and only if all the components of at least one minimal cut set are not functioning, it follows that
$$\phi(\mathbf{x}) = \prod_{j=1}^k \beta_j(\mathbf{x}) = \prod_{j=1}^k \max_{i \in C_j} x_i \qquad (9.2)$$

Since $\beta_j(\mathbf{x})$ is a parallel structure function of the components of the jth minimal cut set, Equation (9.2) represents an arbitrary system as a series arrangement of parallel systems.

Example 9.9 The minimal cut sets of the bridge structure shown in Figure 9.9 are {1, 2}, {1, 3, 5}, {2, 3, 4}, and {4, 5}. Hence, from Equation (9.2), we may express $\phi(\mathbf{x})$ by
$$\phi(\mathbf{x}) = \max(x_1, x_2)\,\max(x_1, x_3, x_5)\,\max(x_2, x_3, x_4)\,\max(x_4, x_5)$$
$$= [1 - (1-x_1)(1-x_2)][1 - (1-x_1)(1-x_3)(1-x_5)][1 - (1-x_2)(1-x_3)(1-x_4)][1 - (1-x_4)(1-x_5)]$$
This representation of $\phi(\mathbf{x})$ is pictorially expressed as Figure 9.10. ■
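As a quick sanity check of the two representations, the sketch below (ours, not from the text) enumerates all $2^5$ state vectors of the bridge system and confirms that the minimal path form (9.1) and the minimal cut form (9.2) define the same structure function.

```python
# Verify that (9.1) and (9.2) agree for the bridge system of Examples 9.8-9.9.
from itertools import product

PATH_SETS = [{1, 4}, {1, 3, 5}, {2, 5}, {2, 3, 4}]
CUT_SETS = [{1, 2}, {1, 3, 5}, {2, 3, 4}, {4, 5}]

def phi_from_paths(x):
    # max over j of the product of x_i for i in A_j -- Equation (9.1)
    return max(min(x[i - 1] for i in A) for A in PATH_SETS)

def phi_from_cuts(x):
    # product over j of the max of x_i for i in C_j -- Equation (9.2)
    val = 1
    for C in CUT_SETS:
        val *= max(x[i - 1] for i in C)
    return val

for x in product([0, 1], repeat=5):
    assert phi_from_paths(x) == phi_from_cuts(x)
print("representations (9.1) and (9.2) agree on all 32 state vectors")
```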

9.3 Reliability of Systems of Independent Components

In this section, we suppose that $X_i$, the state of the ith component, is a random variable such that
$$P\{X_i = 1\} = p_i = 1 - P\{X_i = 0\}$$


Figure 9.10 Minimal cut representation of the bridge system.

The value $p_i$, which equals the probability that the ith component is functioning, is called the reliability of the ith component. If we define r by
$$r = P\{\phi(\mathbf{X}) = 1\}, \quad\text{where } \mathbf{X} = (X_1, \ldots, X_n)$$
then r is called the reliability of the system. When the components, that is, the random variables $X_i$, $i = 1, \ldots, n$, are independent, we may express r as a function of the component reliabilities. That is,
$$r = r(\mathbf{p}), \quad\text{where } \mathbf{p} = (p_1, \ldots, p_n)$$
The function $r(\mathbf{p})$ is called the reliability function. We shall assume throughout the remainder of this chapter that the components are independent.

Example 9.10 (The Series System) The reliability function of the series system of n independent components is given by
$$r(\mathbf{p}) = P\{\phi(\mathbf{X}) = 1\} = P\{X_i = 1 \text{ for all } i = 1, \ldots, n\} = \prod_{i=1}^n p_i \qquad ■$$

Example 9.11 (The Parallel System) The reliability function of the parallel system of n independent components is given by
$$r(\mathbf{p}) = P\{\phi(\mathbf{X}) = 1\} = P\{X_i = 1 \text{ for some } i = 1, \ldots, n\}$$
$$= 1 - P\{X_i = 0 \text{ for all } i = 1, \ldots, n\} = 1 - \prod_{i=1}^n (1 - p_i) \qquad ■$$


Example 9.12 (The k-out-of-n System with Equal Probabilities) Consider a k-out-of-n system. If $p_i = p$ for all $i = 1, \ldots, n$, then the reliability function is given by
$$r(p, \ldots, p) = P\{\phi(\mathbf{X}) = 1\} = P\left\{\sum_{i=1}^n X_i \ge k\right\} = \sum_{i=k}^n \binom{n}{i} p^i (1-p)^{n-i} \qquad ■$$

Example 9.13 (The Two-out-of-Three System) The reliability function of a two-out-of-three system is given by
$$r(\mathbf{p}) = P\{\phi(\mathbf{X}) = 1\}$$
$$= P\{X = (1,1,1)\} + P\{X = (1,1,0)\} + P\{X = (1,0,1)\} + P\{X = (0,1,1)\}$$
$$= p_1 p_2 p_3 + p_1 p_2 (1 - p_3) + p_1 (1 - p_2) p_3 + (1 - p_1) p_2 p_3$$
$$= p_1 p_2 + p_1 p_3 + p_2 p_3 - 2 p_1 p_2 p_3 \qquad ■$$

Example 9.14 (The Three-out-of-Four System) The reliability function of a three-out-of-four system is given by
$$r(\mathbf{p}) = P\{X = (1,1,1,1)\} + P\{X = (1,1,1,0)\} + P\{X = (1,1,0,1)\} + P\{X = (1,0,1,1)\} + P\{X = (0,1,1,1)\}$$
$$= p_1 p_2 p_3 p_4 + p_1 p_2 p_3 (1-p_4) + p_1 p_2 (1-p_3) p_4 + p_1 (1-p_2) p_3 p_4 + (1-p_1) p_2 p_3 p_4$$
$$= p_1 p_2 p_3 + p_1 p_2 p_4 + p_1 p_3 p_4 + p_2 p_3 p_4 - 3 p_1 p_2 p_3 p_4 \qquad ■$$

Example 9.15 (A Five-Component System) Consider a five-component system that functions if and only if component 1, component 2, and at least one of the remaining components function. Its reliability function is given by
$$r(\mathbf{p}) = P\{X_1 = 1, X_2 = 1, \max(X_3, X_4, X_5) = 1\}$$
$$= P\{X_1 = 1\} P\{X_2 = 1\} P\{\max(X_3, X_4, X_5) = 1\} = p_1 p_2 [1 - (1-p_3)(1-p_4)(1-p_5)] \qquad ■$$

Since $\phi(\mathbf{X})$ is a 0–1 (that is, a Bernoulli) random variable, we may also compute $r(\mathbf{p})$ by taking its expectation. That is,
$$r(\mathbf{p}) = P\{\phi(\mathbf{X}) = 1\} = E[\phi(\mathbf{X})]$$

Example 9.16 (A Four-Component System) A four-component system that functions when both components 1 and 4, and at least one of the other components, function has its structure function given by
$$\phi(\mathbf{x}) = x_1 x_4 \max(x_2, x_3)$$


Hence,
$$r(\mathbf{p}) = E[\phi(\mathbf{X})] = E[X_1 X_4 (1 - (1 - X_2)(1 - X_3))] = p_1 p_4 [1 - (1-p_2)(1-p_3)] \qquad ■$$
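Because $r(\mathbf{p}) = E[\phi(\mathbf{X})]$, the reliability of any small system can be computed exactly by summing over its $2^n$ states. The sketch below is ours (the helper names are assumptions); it checks the closed form of Example 9.13 for an arbitrary choice of p.

```python
# Exact reliability r(p) = E[phi(X)] by enumerating all states under independence.
from itertools import product

def reliability(phi, p):
    r = 0.0
    for x in product([0, 1], repeat=len(p)):
        weight = 1.0
        for xi, pi in zip(x, p):
            weight *= pi if xi == 1 else 1.0 - pi
        r += weight * phi(x)
    return r

def two_of_three(x):
    return 1 if sum(x) >= 2 else 0

p1, p2, p3 = 0.9, 0.8, 0.7
r = reliability(two_of_three, (p1, p2, p3))
# closed form from Example 9.13
assert abs(r - (p1*p2 + p1*p3 + p2*p3 - 2*p1*p2*p3)) < 1e-12
print("r =", r)
```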

An important and intuitive property of the reliability function $r(\mathbf{p})$ is given by the following proposition.

Proposition 9.1 If $r(\mathbf{p})$ is the reliability function of a system of independent components, then $r(\mathbf{p})$ is an increasing function of p.

Proof. By conditioning on $X_i$ and using the independence of the components, we obtain
$$r(\mathbf{p}) = E[\phi(\mathbf{X})] = p_i E[\phi(\mathbf{X}) \mid X_i = 1] + (1 - p_i) E[\phi(\mathbf{X}) \mid X_i = 0] = p_i E[\phi(1_i, \mathbf{X})] + (1 - p_i) E[\phi(0_i, \mathbf{X})]$$
where
$$(1_i, \mathbf{X}) = (X_1, \ldots, X_{i-1}, 1, X_{i+1}, \ldots, X_n), \qquad (0_i, \mathbf{X}) = (X_1, \ldots, X_{i-1}, 0, X_{i+1}, \ldots, X_n)$$
Thus,
$$r(\mathbf{p}) = p_i E[\phi(1_i, \mathbf{X}) - \phi(0_i, \mathbf{X})] + E[\phi(0_i, \mathbf{X})]$$
However, since $\phi$ is an increasing function, it follows that
$$E[\phi(1_i, \mathbf{X}) - \phi(0_i, \mathbf{X})] \ge 0$$
and so the preceding is increasing in $p_i$ for all i. Hence, the result is proven. ■

Let us now consider the following situation: A system consisting of n different components is to be built from a stockpile containing exactly two of each type of component. How should we use the stockpile so as to maximize our probability of attaining a functioning system? In particular, should we build two separate systems, in which case the probability of attaining a functioning one would be P{at least one of the two systems function} = 1 − P{neither of the systems function} = 1 − [(1 − r (p))(1 − r (p′ ))]

where pi ( pi′ ) is the probability that the first (second) number i component functions; or should we build a single system whose ith component functions if at least one of the number i components functions? In this latter case, the probability that the system will function equals r [1 − (1 − p)(1 − p′ )]


since $1 - (1 - p_i)(1 - p_i')$ equals the probability that the ith component in the single system will function.* We now show that replication at the component level is more effective than replication at the system level.

Theorem 9.1 For any reliability function r and vectors p, p′,
$$r[1 - (1 - \mathbf{p})(1 - \mathbf{p}')] \ge 1 - [1 - r(\mathbf{p})][1 - r(\mathbf{p}')]$$

Proof. Let $X_1, \ldots, X_n, X_1', \ldots, X_n'$ be mutually independent 0–1 random variables with
$$p_i = P\{X_i = 1\}, \qquad p_i' = P\{X_i' = 1\}$$
Since $P\{\max(X_i, X_i') = 1\} = 1 - (1 - p_i)(1 - p_i')$, it follows that
$$r[1 - (1 - \mathbf{p})(1 - \mathbf{p}')] = E\big[\phi(\max(\mathbf{X}, \mathbf{X}'))\big]$$
However, by the monotonicity of $\phi$, we have that $\phi(\max(\mathbf{X}, \mathbf{X}'))$ is greater than or equal to both $\phi(\mathbf{X})$ and $\phi(\mathbf{X}')$ and hence is at least as large as $\max(\phi(\mathbf{X}), \phi(\mathbf{X}'))$. Hence, from the preceding we have
$$r[1 - (1 - \mathbf{p})(1 - \mathbf{p}')] \ge E[\max(\phi(\mathbf{X}), \phi(\mathbf{X}'))] = P\{\max(\phi(\mathbf{X}), \phi(\mathbf{X}')) = 1\}$$
$$= 1 - P\{\phi(\mathbf{X}) = 0, \phi(\mathbf{X}') = 0\} = 1 - [1 - r(\mathbf{p})][1 - r(\mathbf{p}')]$$
where the first equality follows from the fact that $\max(\phi(\mathbf{X}), \phi(\mathbf{X}'))$ is a 0–1 random variable and hence its expectation equals the probability that it equals 1. ■

As an illustration of the preceding theorem, suppose that we want to build a series system of two different types of components from a stockpile consisting of two of each of the kinds of components. Suppose that the reliability of each component is 1/2. If we use the stockpile to build two separate systems, then the probability of attaining a working system is
$$1 - \left(\tfrac{3}{4}\right)^2 = \tfrac{7}{16}$$
while if we build a single system, replicating components, then the probability of attaining a working system is
$$\left(\tfrac{3}{4}\right)^2 = \tfrac{9}{16}$$
Hence, replicating components leads to a higher reliability than replicating systems (as, of course, it must by Theorem 9.1).

* Notation: If $\mathbf{x} = (x_1, \ldots, x_n)$, $\mathbf{y} = (y_1, \ldots, y_n)$, then $\mathbf{x}\mathbf{y} = (x_1 y_1, \ldots, x_n y_n)$. Also, $\max(\mathbf{x}, \mathbf{y}) = (\max(x_1, y_1), \ldots, \max(x_n, y_n))$ and $\min(\mathbf{x}, \mathbf{y}) = (\min(x_1, y_1), \ldots, \min(x_n, y_n))$.
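Theorem 9.1 is easy to illustrate numerically. The following sketch (ours; the component reliabilities are made-up values) compares component-level replication with system-level replication for a two-component series system.

```python
# Component-level replication versus system-level replication (Theorem 9.1).
def r_series(p):
    out = 1.0
    for pi in p:
        out *= pi
    return out

p = (0.5, 0.7)     # reliabilities of the first set of components (illustrative)
pp = (0.6, 0.4)    # reliabilities of the duplicate set (illustrative)

# single system with each component duplicated in parallel
component_level = r_series(tuple(1 - (1 - a) * (1 - b) for a, b in zip(p, pp)))
# two separate systems, at least one of which must work
system_level = 1 - (1 - r_series(p)) * (1 - r_series(pp))

assert component_level >= system_level
print(component_level, ">=", system_level)
```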


Figure 9.11

9.4 Bounds on the Reliability Function

Consider the bridge system of Example 9.8, which is represented by Figure 9.11. Using the minimal path representation, we have
$$\phi(\mathbf{x}) = 1 - (1 - x_1 x_4)(1 - x_1 x_3 x_5)(1 - x_2 x_5)(1 - x_2 x_3 x_4)$$
Hence,
$$r(\mathbf{p}) = 1 - E[(1 - X_1 X_4)(1 - X_1 X_3 X_5)(1 - X_2 X_5)(1 - X_2 X_3 X_4)]$$
However, since the minimal path sets overlap (that is, they have components in common), the random variables $(1 - X_1 X_4)$, $(1 - X_1 X_3 X_5)$, $(1 - X_2 X_5)$, and $(1 - X_2 X_3 X_4)$ are not independent, and thus the expected value of their product is not equal to the product of their expected values. Therefore, in order to compute $r(\mathbf{p})$, we must first multiply the four random variables and then take the expected value. Doing so, using that $X_i^2 = X_i$, we obtain
$$r(\mathbf{p}) = E[X_1 X_4 + X_2 X_5 + X_1 X_3 X_5 + X_2 X_3 X_4 - X_1 X_2 X_3 X_4 - X_1 X_2 X_3 X_5 - X_1 X_2 X_4 X_5 - X_1 X_3 X_4 X_5 - X_2 X_3 X_4 X_5 + 2 X_1 X_2 X_3 X_4 X_5]$$
$$= p_1 p_4 + p_2 p_5 + p_1 p_3 p_5 + p_2 p_3 p_4 - p_1 p_2 p_3 p_4 - p_1 p_2 p_3 p_5 - p_1 p_2 p_4 p_5 - p_1 p_3 p_4 p_5 - p_2 p_3 p_4 p_5 + 2 p_1 p_2 p_3 p_4 p_5$$

As can be seen by the preceding example, it is often quite tedious to evaluate r (p), and thus it would be useful if we had a simple way of obtaining bounds. We now consider two methods for this.

9.4.1 Method of Inclusion and Exclusion

The following is a well-known formula for the probability of the union of the events $E_1, E_2, \ldots, E_n$:
$$P\left(\bigcup_{i=1}^n E_i\right) = \sum_{i=1}^n P(E_i) - \sum_{i<j} P(E_i E_j) + \sum_{i<j<k} P(E_i E_j E_k) - \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n)$$

$$P\{\text{system life} > t\} = r(\bar{\mathbf{F}}(t))$$
where $\bar{\mathbf{F}}(t) = (\bar F_1(t), \ldots, \bar F_n(t))$. Hence, by a well-known formula that states that for any nonnegative random variable X,
$$E[X] = \int_0^\infty P\{X > x\}\,dx,$$
we see that*
$$E[\text{system life}] = \int_0^\infty r(\bar{\mathbf{F}}(t))\,dt \qquad (9.21)$$

Example 9.26 (A Series System of Uniformly Distributed Components) Consider a series system of three independent components each of which functions for an amount of time (in hours) uniformly distributed over (0, 10). Hence, $r(\mathbf{p}) = p_1 p_2 p_3$ and
$$F_i(t) = \begin{cases} t/10, & 0 \le t \le 10\\ 1, & t > 10\end{cases} \qquad i = 1, 2, 3$$
Therefore,
$$r(\bar{\mathbf{F}}(t)) = \begin{cases} \left(\dfrac{10 - t}{10}\right)^3, & 0 \le t \le 10\\ 0, & t > 10\end{cases}$$
and so from Equation (9.21) we obtain
$$E[\text{system life}] = \int_0^{10} \left(\frac{10 - t}{10}\right)^3 dt = 10 \int_0^1 y^3\,dy = \frac{5}{2} \qquad ■$$
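Equation (9.21) is easy to check numerically. The sketch below (ours) approximates the integral for Example 9.26 with a simple midpoint Riemann sum rather than a closed-form integration.

```python
# Numerical check of Example 9.26 via Equation (9.21).
def r_bar_F(t):
    # r(F_bar(t)) for a series system of three uniform (0, 10) components
    return ((10 - t) / 10) ** 3 if 0 <= t <= 10 else 0.0

N = 100_000
dt = 10 / N
mean_life = sum(r_bar_F((i + 0.5) * dt) for i in range(N)) * dt
print(mean_life)   # approximately 2.5, matching E[system life] = 5/2
```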

Example 9.27 (A Two-out-of-Three System) Consider a two-out-of-three system of independent components, in which each component’s lifetime is (in months) uniformly distributed over (0, 1). As was shown in Example 9.13, the reliability of such a system is given by r (p) = p1 p2 + p1 p3 + p2 p3 − 2 p1 p2 p3

* That $E[X] = \int_0^\infty P\{X > x\}\,dx$ can be shown as follows when X has density f:
$$\int_0^\infty P\{X > x\}\,dx = \int_0^\infty \int_x^\infty f(y)\,dy\,dx = \int_0^\infty \int_0^y f(y)\,dx\,dy = \int_0^\infty y f(y)\,dy = E[X]$$


Since
$$F_i(t) = \begin{cases} t, & 0 \le t \le 1\\ 1, & t > 1\end{cases}$$
we see from Equation (9.21) that
$$E[\text{system life}] = \int_0^1 \big[3(1-t)^2 - 2(1-t)^3\big]\,dt = \int_0^1 (3y^2 - 2y^3)\,dy = 1 - \tfrac{1}{2} = \tfrac{1}{2} \qquad ■$$

Example 9.28 (A Four-Component System) Consider the four-component system that functions when components 1 and 2 and at least one of components 3 and 4 functions. Its structure function is given by
$$\phi(\mathbf{x}) = x_1 x_2 (x_3 + x_4 - x_3 x_4)$$
and thus its reliability function equals
$$r(\mathbf{p}) = p_1 p_2 (p_3 + p_4 - p_3 p_4)$$
Let us compute the mean system lifetime when the ith component is uniformly distributed over (0, i), i = 1, 2, 3, 4. Now,
$$\bar F_1(t) = \begin{cases} 1 - t, & 0 \le t \le 1\\ 0, & t > 1\end{cases} \qquad \bar F_2(t) = \begin{cases} 1 - t/2, & 0 \le t \le 2\\ 0, & t > 2\end{cases}$$
$$\bar F_3(t) = \begin{cases} 1 - t/3, & 0 \le t \le 3\\ 0, & t > 3\end{cases} \qquad \bar F_4(t) = \begin{cases} 1 - t/4, & 0 \le t \le 4\\ 0, & t > 4\end{cases}$$
Hence,
$$r(\bar{\mathbf{F}}(t)) = \begin{cases} (1-t)\dfrac{2-t}{2}\left[\dfrac{3-t}{3} + \dfrac{4-t}{4} - \dfrac{(3-t)(4-t)}{12}\right], & 0 \le t \le 1\\ 0, & t > 1\end{cases}$$
Therefore,
$$E[\text{system life}] = \frac{1}{24}\int_0^1 (1-t)(2-t)(12 - t^2)\,dt = \frac{593}{(24)(60)} \approx 0.41 \qquad ■$$


We end this section by obtaining the mean lifetime of a k-out-of-n system of independent identically distributed exponential components. If θ is the mean lifetime of each component, then
$$\bar F_i(t) = e^{-t/\theta}$$
Hence, since for a k-out-of-n system,
$$r(p, p, \ldots, p) = \sum_{i=k}^n \binom{n}{i} p^i (1-p)^{n-i}$$
we obtain from Equation (9.21)
$$E[\text{system life}] = \int_0^\infty \sum_{i=k}^n \binom{n}{i} (e^{-t/\theta})^i (1 - e^{-t/\theta})^{n-i}\,dt$$
Making the substitution
$$y = e^{-t/\theta}, \qquad dy = -\frac{1}{\theta} e^{-t/\theta}\,dt = -\frac{y}{\theta}\,dt$$
yields
$$E[\text{system life}] = \theta \sum_{i=k}^n \binom{n}{i} \int_0^1 y^{i-1} (1-y)^{n-i}\,dy$$
Now, it is not difficult to show that*
$$\int_0^1 y^n (1-y)^m\,dy = \frac{m!\,n!}{(m+n+1)!} \qquad (9.22)$$
Thus, the foregoing equals
$$E[\text{system life}] = \theta \sum_{i=k}^n \frac{(i-1)!\,(n-i)!}{n!}\,\frac{n!}{(n-i)!\,i!} = \theta \sum_{i=k}^n \frac{1}{i} \qquad (9.23)$$

* Let $C(n, m) = \int_0^1 y^n (1-y)^m\,dy$. Integration by parts yields $C(n, m) = [m/(n+1)]\,C(n+1, m-1)$. Starting with $C(n, 0) = 1/(n+1)$, Equation (9.22) follows by mathematical induction.


Remark Equation (9.23) could have been proven directly by making use of special properties of the exponential distribution. First note that the lifetime of a k-out-of-n system can be written as $T_1 + \cdots + T_{n-k+1}$, where $T_i$ represents the time between the (i − 1)st and ith failure. This is true since $T_1 + \cdots + T_{n-k+1}$ equals the time at which the (n − k + 1)st component fails, which is also the first time that the number of functioning components is less than k. Now, when all n components are functioning, the rate at which failures occur is n/θ. That is, $T_1$ is exponentially distributed with mean θ/n. Similarly, since $T_i$ represents the time until the next failure when there are n − (i − 1) functioning components, it follows that $T_i$ is exponentially distributed with mean θ/(n − i + 1). Hence, the mean system lifetime equals
$$E[T_1 + \cdots + T_{n-k+1}] = \theta\left[\frac{1}{n} + \cdots + \frac{1}{k}\right]$$
Note also that it follows, from the lack of memory of the exponential, that the $T_i$, $i = 1, \ldots, n-k+1$, are independent random variables.
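Equation (9.23) can also be checked by simulation. The following sketch (ours; n, k, and θ are illustrative) estimates the mean lifetime of a k-out-of-n exponential system and compares it with $\theta\sum_{i=k}^n 1/i$.

```python
# Monte Carlo check of Equation (9.23).
import random

def simulate_k_of_n_life(n, k, theta, reps=100_000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        lifetimes = sorted(rng.expovariate(1 / theta) for _ in range(n))
        total += lifetimes[n - k]   # the (n-k+1)st failure ends the system
    return total / reps

n, k, theta = 5, 3, 2.0
exact = theta * sum(1 / i for i in range(k, n + 1))
print(simulate_k_of_n_life(n, k, theta), "vs exact", exact)
```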

9.6.1 An Upper Bound on the Expected Life of a Parallel System

Consider a parallel system of n components, whose lifetimes are not necessarily independent. The system lifetime can be expressed as
$$\text{system life} = \max_i X_i$$
where $X_i$ is the lifetime of component i, $i = 1, \ldots, n$. We can bound the expected system lifetime by making use of the following inequality. Namely, for any constant c,
$$\max_i X_i \le c + \sum_{i=1}^n (X_i - c)^+ \qquad (9.24)$$

where $x^+$, the positive part of x, is equal to x if x > 0 and is equal to 0 if $x \le 0$. The validity of Inequality (9.24) is immediate since if $\max X_i < c$ then the left side is equal to $\max X_i$ and the right side is equal to c. On the other hand, if $X_{(n)} = \max X_i > c$ then the right side is at least as large as $c + (X_{(n)} - c) = X_{(n)}$. It follows from Inequality (9.24), upon taking expectations, that
$$E\big[\max_i X_i\big] \le c + \sum_{i=1}^n E[(X_i - c)^+]$$
Now, $(X_i - c)^+$ is a nonnegative random variable and so
$$E[(X_i - c)^+] = \int_0^\infty P\{(X_i - c)^+ > x\}\,dx = \int_0^\infty P\{X_i - c > x\}\,dx = \int_c^\infty P\{X_i > y\}\,dy \qquad (9.25)$$


Thus, we obtain
$$E\big[\max_i X_i\big] \le c + \sum_{i=1}^n \int_c^\infty P\{X_i > y\}\,dy \qquad (9.26)$$

Because the preceding is true for all c, it follows that we obtain the best bound by letting c equal the value that minimizes the right side of the preceding. To determine that value, differentiate the right side of the preceding and set the result equal to 0, to obtain
$$1 - \sum_{i=1}^n P\{X_i > c\} = 0$$
That is, the minimizing value of c is that value $c^*$ for which
$$\sum_{i=1}^n P\{X_i > c^*\} = 1$$

Since $\sum_{i=1}^n P\{X_i > c\}$ is a decreasing function of c, the value of $c^*$ can be easily approximated and then utilized in Inequality (9.26). Also, it is interesting to note that $c^*$ is such that the expected number of the $X_i$ that exceed $c^*$ is equal to 1 (see Exercise 32). That the optimal value of c has this property is interesting and somewhat intuitive inasmuch as Inequality (9.24) is an equality when exactly one of the $X_i$ exceeds c.

Example 9.29 Suppose the lifetime of component i is exponentially distributed with rate $\lambda_i$, $i = 1, \ldots, n$. Then the minimizing value of c is such that
$$1 = \sum_{i=1}^n P\{X_i > c^*\} = \sum_{i=1}^n e^{-\lambda_i c^*}$$

and the resulting bound of the mean system life is
$$E\big[\max_i X_i\big] \le c^* + \sum_{i=1}^n E[(X_i - c^*)^+]$$
$$= c^* + \sum_{i=1}^n \big(E[(X_i - c^*)^+ \mid X_i > c^*]\,P\{X_i > c^*\} + E[(X_i - c^*)^+ \mid X_i \le c^*]\,P\{X_i \le c^*\}\big)$$
$$= c^* + \sum_{i=1}^n \frac{1}{\lambda_i} e^{-\lambda_i c^*}$$

In the special case where all the rates are equal, say, $\lambda_i = \lambda$, $i = 1, \ldots, n$, then
$$1 = n e^{-\lambda c^*} \quad\text{or}\quad c^* = \frac{1}{\lambda}\log(n)$$


and the bound is
$$E\big[\max_i X_i\big] \le \frac{1}{\lambda}(\log(n) + 1)$$
That is, if $X_1, \ldots, X_n$ are identically distributed exponential random variables with rate λ, then the preceding gives a bound on the expected value of their maximum. In the special case where these random variables are also independent, the following exact expression, given by Equation (9.25), is not much less than the preceding upper bound:
$$E\big[\max_i X_i\big] = \frac{1}{\lambda}\sum_{i=1}^n 1/i \approx \frac{1}{\lambda}\int_1^n \frac{1}{x}\,dx \approx \frac{1}{\lambda}\log(n) \qquad ■$$
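For the equal-rates case, the sketch below (ours; λ and n are illustrative values) compares the bound $(\log n + 1)/\lambda$ with the exact value $(1/\lambda)\sum_{i=1}^n 1/i$ for independent exponentials.

```python
# Example 9.29 with equal rates: upper bound versus exact expected maximum.
import math

lam, n = 0.5, 20
c_star = math.log(n) / lam                            # solves n * exp(-lam c) = 1
bound = c_star + n * math.exp(-lam * c_star) / lam    # equals (log n + 1) / lam
exact = sum(1 / i for i in range(1, n + 1)) / lam     # independent case
print("bound:", bound, "exact:", exact)
```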

9.7 Systems with Repair

Consider an n-component system having reliability function $r(\mathbf{p})$. Suppose that component i functions for an exponentially distributed time with rate $\lambda_i$ and then fails; once failed it takes an exponential time with rate $\mu_i$ to be repaired, $i = 1, \ldots, n$. All components act independently. Let us suppose that all components are initially working, and let
$$A(t) = P\{\text{system is working at } t\}$$
A(t) is called the availability at time t. Since the components act independently, A(t) can be expressed in terms of the reliability function as follows:
$$A(t) = r(A_1(t), \ldots, A_n(t)) \qquad (9.27)$$
where
$$A_i(t) = P\{\text{component } i \text{ is functioning at } t\}$$
Now the state of component i (either on or off) changes in accordance with a two-state continuous time Markov chain. Hence, from the results of Example 6.12 we have
$$A_i(t) = P_{00}(t) = \frac{\mu_i}{\mu_i + \lambda_i} + \frac{\lambda_i}{\mu_i + \lambda_i} e^{-(\lambda_i + \mu_i)t}$$
Thus, we obtain
$$A(t) = r\left(\frac{\boldsymbol{\mu}}{\boldsymbol{\mu} + \boldsymbol{\lambda}} + \frac{\boldsymbol{\lambda}}{\boldsymbol{\mu} + \boldsymbol{\lambda}} e^{-(\boldsymbol{\lambda} + \boldsymbol{\mu})t}\right)$$
If we let t approach ∞, then we obtain the limiting availability, call it A, which is given by
$$A = \lim_{t \to \infty} A(t) = r\left(\frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)$$


Remarks (i) If the on and off distributions for component i are arbitrary continuous distributions with respective means $1/\lambda_i$ and $1/\mu_i$, $i = 1, \ldots, n$, then it follows from the theory of alternating renewal processes (see Section 7.5.1) that
$$A_i(t) \to \frac{1/\lambda_i}{1/\lambda_i + 1/\mu_i} = \frac{\mu_i}{\mu_i + \lambda_i}$$
and thus, using the continuity of the reliability function, it follows from (9.27) that the limiting availability is
$$A = \lim_{t \to \infty} A(t) = r\left(\frac{\boldsymbol{\mu}}{\boldsymbol{\mu} + \boldsymbol{\lambda}}\right)$$

Hence, A depends only on the on and off distributions through their means.
(ii) It can be shown (using the theory of regenerative processes as presented in Section 7.5) that A will also equal the long-run proportion of time that the system will be functioning.

Example 9.30 For a series system, $r(\mathbf{p}) = \prod_{i=1}^n p_i$ and so
$$A(t) = \prod_{i=1}^n \left[\frac{\mu_i}{\mu_i + \lambda_i} + \frac{\lambda_i}{\mu_i + \lambda_i} e^{-(\lambda_i + \mu_i)t}\right]$$
and
$$A = \prod_{i=1}^n \frac{\mu_i}{\mu_i + \lambda_i} \qquad ■$$

Example 9.31 For a parallel system, $r(\mathbf{p}) = 1 - \prod_{i=1}^n (1 - p_i)$ and thus
$$A(t) = 1 - \prod_{i=1}^n \left[\frac{\lambda_i}{\mu_i + \lambda_i}\big(1 - e^{-(\lambda_i + \mu_i)t}\big)\right]$$
and
$$A = 1 - \prod_{i=1}^n \frac{\lambda_i}{\mu_i + \lambda_i} \qquad ■$$
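The limiting availabilities of Examples 9.30 and 9.31 are straightforward to evaluate; the sketch below (ours, with made-up failure and repair rates) does so.

```python
# Limiting availability A = r(mu / (lambda + mu)) for series and parallel systems.
lam = [0.1, 0.2, 0.4]   # failure rates (illustrative)
mu = [1.0, 2.0, 1.5]    # repair rates (illustrative)

avail = [m / (m + l) for l, m in zip(lam, mu)]   # limiting A_i for each component

A_series = 1.0
for a in avail:
    A_series *= a                 # Example 9.30

A_parallel = 1.0
for a in avail:
    A_parallel *= (1 - a)
A_parallel = 1 - A_parallel       # Example 9.31

print("series:", A_series, "parallel:", A_parallel)
```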

The preceding system will alternate between periods when it is up and periods when it is down. Let us denote by $U_i$ and $D_i$, $i \ge 1$, the lengths of the ith up and down period respectively. For instance, in a two-out-of-three system, $U_1$ will be the time until two components are down; $D_1$, the additional time until two are up; $U_2$ the additional time until two are down, and so on. Let
$$\bar U = \lim_{n \to \infty} \frac{U_1 + \cdots + U_n}{n}, \qquad \bar D = \lim_{n \to \infty} \frac{D_1 + \cdots + D_n}{n}$$


denote the average length of an up and down period respectively.*
To determine $\bar U$ and $\bar D$, note first that in the first n up–down cycles, that is, in time $\sum_{i=1}^n (U_i + D_i)$, the system will be up for a time $\sum_{i=1}^n U_i$. Hence, the proportion of time the system will be up in the first n up–down cycles is
$$\frac{U_1 + \cdots + U_n}{U_1 + \cdots + U_n + D_1 + \cdots + D_n} = \frac{\sum_{i=1}^n U_i/n}{\sum_{i=1}^n U_i/n + \sum_{i=1}^n D_i/n}$$
As $n \to \infty$, this must converge to A, the long-run proportion of time the system is up. Hence,
$$\frac{\bar U}{\bar U + \bar D} = A = r\left(\frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right) \qquad (9.28)$$
However, to solve for $\bar U$ and $\bar D$ we need a second equation. To obtain one, consider the rate at which the system fails. As there will be n failures in time $\sum_{i=1}^n (U_i + D_i)$, it follows that the rate at which the system fails is
$$\text{rate at which system fails} = \lim_{n \to \infty} \frac{n}{\sum_1^n U_i + \sum_1^n D_i} = \lim_{n \to \infty} \frac{1}{\sum_1^n U_i/n + \sum_1^n D_i/n} = \frac{1}{\bar U + \bar D} \qquad (9.29)$$

That is, the foregoing yields the intuitive result that, on average, there is one failure every $\bar U + \bar D$ time units. To utilize this, let us determine the rate at which a failure of component i causes the system to go from up to down. Now, the system will go from up to down when component i fails if the states of the other components $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$ are such that $\phi(1_i, \mathbf{x}) = 1$, $\phi(0_i, \mathbf{x}) = 0$. That is, the states of the other components must be such that
$$\phi(1_i, \mathbf{x}) - \phi(0_i, \mathbf{x}) = 1 \qquad (9.30)$$
Since component i will, on average, have one failure every $1/\lambda_i + 1/\mu_i$ time units, it follows that the rate at which component i fails is equal to $(1/\lambda_i + 1/\mu_i)^{-1} = \lambda_i \mu_i/(\lambda_i + \mu_i)$. In addition, the states of the other components will be such that (9.30) holds with probability
$$P\{\phi(1_i, \mathbf{X}(\infty)) - \phi(0_i, \mathbf{X}(\infty)) = 1\} = E[\phi(1_i, \mathbf{X}(\infty)) - \phi(0_i, \mathbf{X}(\infty))]$$
$$= r\left(1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right) - r\left(0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)$$
since $\phi(1_i, \mathbf{X}(\infty)) - \phi(0_i, \mathbf{X}(\infty))$ is a Bernoulli random variable. Hence, putting the preceding together we see that
$$\text{rate at which component } i \text{ causes the system to fail} = \frac{\lambda_i \mu_i}{\lambda_i + \mu_i}\left[r\left(1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right) - r\left(0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)\right]$$

∗ It can be shown using the theory of regenerative processes that, with probability 1, the preceding limits will exist and will be constants.


Summing this over all components i thus gives
$$\text{rate at which system fails} = \sum_i \frac{\lambda_i \mu_i}{\lambda_i + \mu_i}\left[r\left(1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right) - r\left(0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)\right]$$
Finally, equating the preceding with (9.29) yields
$$\frac{1}{\bar U + \bar D} = \sum_i \frac{\lambda_i \mu_i}{\lambda_i + \mu_i}\left[r\left(1_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right) - r\left(0_i, \frac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)\right] \qquad (9.31)$$
Solving (9.28) and (9.31), we obtain
$$\bar U = \frac{r\left(\dfrac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)}{\displaystyle\sum_{i=1}^n \frac{\lambda_i \mu_i}{\lambda_i + \mu_i}\left[r\left(1_i, \dfrac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right) - r\left(0_i, \dfrac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)\right]} \qquad (9.32)$$
$$\bar D = \frac{\left[1 - r\left(\dfrac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)\right] \bar U}{r\left(\dfrac{\boldsymbol{\mu}}{\boldsymbol{\lambda} + \boldsymbol{\mu}}\right)} \qquad (9.33)$$

Also, (9.31) yields the rate at which the system fails.

Remark In establishing the formulas for $\bar U$ and $\bar D$, we did not make use of the assumption of exponential on and off times; in fact, our derivation is valid and Equations (9.32) and (9.33) hold whenever $\bar U$ and $\bar D$ are well defined (a sufficient condition is that all on and off distributions are continuous). The quantities $\lambda_i, \mu_i$, $i = 1, \ldots, n$, will represent, respectively, the reciprocals of the mean lifetimes and mean repair times.

Example 9.32 For a series system,
$$\bar U = \frac{\prod_i \dfrac{\mu_i}{\mu_i + \lambda_i}}{\displaystyle\sum_i \frac{\lambda_i \mu_i}{\lambda_i + \mu_i} \prod_{j \ne i} \frac{\mu_j}{\mu_j + \lambda_j}} = \frac{1}{\sum_i \lambda_i},$$
$$\bar D = \frac{1 - \prod_i \dfrac{\mu_i}{\mu_i + \lambda_i}}{\prod_i \dfrac{\mu_i}{\mu_i + \lambda_i}} \times \frac{1}{\sum_i \lambda_i}$$


whereas for a parallel system,
$$\bar U = \frac{1 - \prod_i \dfrac{\lambda_i}{\mu_i + \lambda_i}}{\displaystyle\sum_i \frac{\lambda_i \mu_i}{\lambda_i + \mu_i} \prod_{j \ne i} \frac{\lambda_j}{\mu_j + \lambda_j}} = \frac{1 - \prod_j \dfrac{\lambda_j}{\mu_j + \lambda_j}}{\prod_j \dfrac{\lambda_j}{\mu_j + \lambda_j}} \times \frac{1}{\sum_i \mu_i},$$
$$\bar D = \frac{\prod_i \dfrac{\lambda_i}{\mu_i + \lambda_i}}{1 - \prod_i \dfrac{\lambda_i}{\mu_i + \lambda_i}}\,\bar U = \frac{1}{\sum_i \mu_i}$$
The preceding formulas hold for arbitrary continuous up and down distributions with $1/\lambda_i$ and $1/\mu_i$ denoting respectively the mean up and down times of component i, $i = 1, \ldots, n$. ■
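Equations (9.32) and (9.33) can be evaluated for any small monotone system once $r(\mathbf{p})$ can be computed. The sketch below (ours; the rates are illustrative) does this by state enumeration for a two-out-of-three system.

```python
# Mean up and down periods for a two-out-of-three system via (9.32) and (9.33).
from itertools import product

def r(phi, p):
    total = 0.0
    for x in product([0, 1], repeat=len(p)):
        w = 1.0
        for xi, pi in zip(x, p):
            w *= pi if xi else 1 - pi
        total += w * phi(x)
    return total

phi = lambda x: 1 if sum(x) >= 2 else 0
lam, mu = [0.2, 0.3, 0.25], [2.0, 1.5, 1.8]       # illustrative rates
p = [m / (l + m) for l, m in zip(lam, mu)]        # limiting availabilities

denom = 0.0
for i in range(3):
    p1 = p[:i] + [1.0] + p[i+1:]                  # r(1_i, p)
    p0 = p[:i] + [0.0] + p[i+1:]                  # r(0_i, p)
    denom += lam[i] * mu[i] / (lam[i] + mu[i]) * (r(phi, p1) - r(phi, p0))

U_bar = r(phi, p) / denom                         # Equation (9.32)
D_bar = (1 - r(phi, p)) * U_bar / r(phi, p)       # Equation (9.33)
print("mean up period:", U_bar, "mean down period:", D_bar)
```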

9.7.1 A Series Model with Suspended Animation

Consider a series system consisting of n components, and suppose that whenever a component (and thus the system) goes down, repair begins on that component and each of the other components enters a state of suspended animation. That is, after the down component is repaired, the other components resume operation in exactly the same condition as when the failure occurred. If two or more components go down simultaneously, one of them is arbitrarily chosen as being the failed component and repair on that component begins; the others that went down at the same time are considered to be in a state of suspended animation, and they will instantaneously go down when the repair is completed. We suppose that (not counting any time in suspended animation) the distribution of time that component i functions is $F_i$ with mean $u_i$, whereas its repair distribution is $G_i$ with mean $d_i$, $i = 1, \ldots, n$.
To determine the long-run proportion of time this system is working, we reason as follows. To begin, consider the time, call it T, at which the system has been up for a time t. Now, when the system is up, the failure times of component i constitute a renewal process with mean interarrival time $u_i$. Therefore, it follows that
$$\text{number of failures of } i \text{ in time } T \approx \frac{t}{u_i}$$

As the average repair time of i is $d_i$, the preceding implies that
$$\text{total repair time of } i \text{ in time } T \approx \frac{t\,d_i}{u_i}$$

Therefore, in the period of time in which the system has been up for a time t, the total system downtime has approximately been
$$t \sum_{i=1}^n d_i/u_i$$


Hence, the proportion of time that the system has been up is approximately
$$\frac{t}{t + t\sum_{i=1}^n d_i/u_i}$$
Because this approximation should become exact as we let t become larger, it follows that
$$\text{proportion of time the system is up} = \frac{1}{1 + \sum_i d_i/u_i} \qquad (9.34)$$
which also shows that
$$\text{proportion of time the system is down} = 1 - \text{proportion of time the system is up} = \frac{\sum_i d_i/u_i}{1 + \sum_i d_i/u_i}$$

Moreover, in the time interval from 0 to T, the proportion of the repair time that has been devoted to component i is approximately
$$\frac{t\,d_i/u_i}{\sum_i t\,d_i/u_i}$$
Thus, in the long run,
$$\text{proportion of down time that is due to component } i = \frac{d_i/u_i}{\sum_i d_i/u_i}$$

Multiplying the preceding by the proportion of time the system is down gives
$$\text{proportion of time component } i \text{ is being repaired} = \frac{d_i/u_i}{1 + \sum_i d_i/u_i}$$
Also, since component j will be in suspended animation whenever any of the other components is in repair, we see that
$$\text{proportion of time component } j \text{ is in suspended animation} = \frac{\sum_{i \ne j} d_i/u_i}{1 + \sum_i d_i/u_i}$$

Another quantity of interest is the long-run rate at which the system fails. Since component i fails at rate $1/u_i$ when the system is up, and does not fail when the system is down, it follows that
$$\text{rate at which } i \text{ fails} = \frac{\text{proportion of time system is up}}{u_i} = \frac{1/u_i}{1 + \sum_i d_i/u_i}$$
Since the system fails when any of its components fail, the preceding yields
$$\text{rate at which the system fails} = \frac{\sum_i 1/u_i}{1 + \sum_i d_i/u_i} \qquad (9.35)$$


If we partition the time axis into periods when the system is up and those when it is down, we can determine the average length of an up period by noting that if U(t) is the total amount of time that the system is up in the interval [0, t], and if N(t) is the number of failures by time t, then
$$\text{average length of an up period} = \lim_{t \to \infty} \frac{U(t)}{N(t)} = \lim_{t \to \infty} \frac{U(t)/t}{N(t)/t} = \frac{1}{\sum_i 1/u_i}$$
where the final equality used Equations (9.34) and (9.35). Also, in a similar manner it can be shown that
$$\text{average length of a down period} = \frac{\sum_i d_i/u_i}{\sum_i 1/u_i} \qquad (9.36)$$
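The long-run quantities (9.34)–(9.36) are simple functions of the ratios $d_i/u_i$; the following sketch (ours, with illustrative mean up and repair times) evaluates them for a three-component series system.

```python
# Suspended-animation model: evaluating (9.34), (9.35), and (9.36).
u = [10.0, 8.0, 12.0]   # mean functioning times u_i (illustrative)
d = [1.0, 0.5, 2.0]     # mean repair times d_i (illustrative)

s = sum(di / ui for di, ui in zip(d, u))          # sum of d_i / u_i
prop_up = 1 / (1 + s)                             # (9.34)
fail_rate = sum(1 / ui for ui in u) / (1 + s)     # (9.35)
avg_up = 1 / sum(1 / ui for ui in u)
avg_down = s / sum(1 / ui for ui in u)            # (9.36)
print(prop_up, fail_rate, avg_up, avg_down)
```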

Exercises

1. Prove that, for any structure function φ,
$$\phi(\mathbf{x}) = x_i\,\phi(1_i, \mathbf{x}) + (1 - x_i)\,\phi(0_i, \mathbf{x})$$
where
$$(1_i, \mathbf{x}) = (x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n), \qquad (0_i, \mathbf{x}) = (x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n)$$
2. Show that
(a) if $\phi(0, 0, \ldots, 0) = 0$ and $\phi(1, 1, \ldots, 1) = 1$, then $\min x_i \le \phi(\mathbf{x}) \le \max x_i$
(b) $\phi(\max(\mathbf{x}, \mathbf{y})) \ge \max(\phi(\mathbf{x}), \phi(\mathbf{y}))$
(c) $\phi(\min(\mathbf{x}, \mathbf{y})) \le \min(\phi(\mathbf{x}), \phi(\mathbf{y}))$
3. For any structure function, we define the dual structure $\phi^D$ by
$$\phi^D(\mathbf{x}) = 1 - \phi(1 - \mathbf{x})$$
(a) Show that the dual of a parallel (series) system is a series (parallel) system.
(b) Show that the dual of a dual structure is the original structure.
(c) What is the dual of a k-out-of-n structure?
(d) Show that a minimal path (cut) set of the dual system is a minimal cut (path) set of the original structure.


*4. Write the structure function corresponding to the following: (a) See Figure 9.16:

Figure 9.16

(b) See Figure 9.17:

Figure 9.17

(c) See Figure 9.18:

Figure 9.18

5. Find the minimal path and minimal cut sets for: (a) See Figure 9.19:

Figure 9.19


(b) See Figure 9.20:

Figure 9.20

*6. The minimal path sets are {1, 2, 4}, {1, 3, 5}, and {5, 6}. Give the minimal cut sets.
7. The minimal cut sets are {1, 2, 3}, {2, 3, 4}, and {3, 5}. What are the minimal path sets?
8. Give the minimal path sets and the minimal cut sets for the structure given by Figure 9.21.
9. Component i is said to be relevant to the system if for some state vector x,
$$\phi(1_i, \mathbf{x}) = 1, \qquad \phi(0_i, \mathbf{x}) = 0$$
Otherwise, it is said to be irrelevant.

Figure 9.21

(a) Explain in words what it means for a component to be irrelevant.
(b) Let $A_1, \ldots, A_s$ be the minimal path sets of a system, and let S denote the set of components. Show that $S = \bigcup_{i=1}^s A_i$ if and only if all components are relevant.
(c) Let $C_1, \ldots, C_k$ denote the minimal cut sets. Show that $S = \bigcup_{i=1}^k C_i$ if and only if all components are relevant.

10. Let $t_i$ denote the time of failure of the ith component; let $\tau_\phi(\mathbf{t})$ denote the time to failure of the system φ as a function of the vector $\mathbf{t} = (t_1, \ldots, t_n)$. Show that
$$\max_{1 \le j \le s} \min_{i \in A_j} t_i = \tau_\phi(\mathbf{t}) = \min_{1 \le j \le k} \max_{i \in C_j} t_i$$
where $C_1, \ldots, C_k$ are the minimal cut sets, and $A_1, \ldots, A_s$ the minimal path sets.
11. Give the reliability function of the structure of Exercise 8.


*12. Give the minimal path sets and the reliability function for the structure in Figure 9.22.

Figure 9.22

13. Let $r(\mathbf{p})$ be the reliability function. Show that
$$r(\mathbf{p}) = p_i\,r(1_i, \mathbf{p}) + (1 - p_i)\,r(0_i, \mathbf{p})$$
14. Compute the reliability function of the bridge system (see Figure 9.11) by conditioning upon whether or not component 3 is working.
15. Compute upper and lower bounds of the reliability function (using Method 2) for the systems given in Exercise 4, and compare them with the exact values when $p_i \equiv \tfrac{1}{2}$.
16. Compute the upper and lower bounds of r(p) using both methods for the
(a) two-out-of-three system and
(b) two-out-of-four system.
(c) Compare these bounds with the exact reliability when
(i) $p_i \equiv 0.5$ (ii) $p_i \equiv 0.8$ (iii) $p_i \equiv 0.2$

*17. Let N be a nonnegative, integer-valued random variable. Show that
$$P\{N > 0\} \ge \frac{(E[N])^2}{E[N^2]}$$
and explain how this inequality can be used to derive additional bounds on a reliability function.
Hint:
$$E[N^2] = E[N^2 \mid N > 0]\,P\{N > 0\} \quad\text{(Why?)}$$
$$\ge (E[N \mid N > 0])^2\,P\{N > 0\} \quad\text{(Why?)}$$

Now multiply both sides by $P\{N > 0\}$.
18. Consider a structure in which the minimal path sets are {1, 2, 3} and {3, 4, 5}.
(a) What are the minimal cut sets?


(b) If the component lifetimes are independent uniform (0, 1) random variables, determine the probability that the system life will be less than 1/2.
19. Let $X_1, X_2, \ldots, X_n$ denote independent and identically distributed random variables and define the order statistics $X_{(1)}, \ldots, X_{(n)}$ by
$$X_{(i)} \equiv i\text{th smallest of } X_1, \ldots, X_n$$
Show that if the distribution of $X_j$ is IFR, then so is the distribution of $X_{(i)}$.
Hint: Relate this to one of the examples of this chapter.
20. Let F be a continuous distribution function. For some positive α, define the distribution function G by
$$\bar G(t) = (\bar F(t))^\alpha$$
Find the relationship between $\lambda_G(t)$ and $\lambda_F(t)$, the respective failure rate functions of G and F.
21. Consider the following four structures:
(i) See Figure 9.23:

Figure 9.23

(ii) See Figure 9.24:

Figure 9.24

(iii) See Figure 9.25:

Figure 9.25


(iv) See Figure 9.26:

Figure 9.26

Let $F_1$, $F_2$, and $F_3$ be the corresponding component failure distributions, each of which is assumed to be IFR (increasing failure rate). Let F be the system failure distribution. All components are independent.
(a) For which structures is F necessarily IFR if $F_1 = F_2 = F_3$? Give reasons.
(b) For which structures is F necessarily IFR if $F_2 = F_3$? Give reasons.
(c) For which structures is F necessarily IFR if $F_1 \ne F_2 \ne F_3$? Give reasons.

*22. Let X denote the lifetime of an item. Suppose the item has reached the age of t. Let $X_t$ denote its remaining life and define
$$\bar F_t(a) = P\{X_t > a\}$$
In words, $\bar F_t(a)$ is the probability that a t-year-old item survives an additional time a. Show that
(a) $\bar F_t(a) = \bar F(t + a)/\bar F(t)$, where F is the distribution function of X.
(b) Another definition of IFR is to say that F is IFR if $\bar F_t(a)$ decreases in t, for all a. Show that this definition is equivalent to the one given in the text when F has a density.
23. Show that if each (independent) component of a series system has an IFR distribution, then the system lifetime is itself IFR by
(a) showing that
$$\lambda_F(t) = \sum_i \lambda_i(t)$$
where $\lambda_F(t)$ is the failure rate function of the system, and $\lambda_i(t)$ the failure rate function of the lifetime of component i.
(b) using the definition of IFR given in Exercise 22.
24. Show that if F is IFR, then it is also IFRA, and show by counterexample that the reverse is not true.
*25. We say that ζ is a p-percentile of the distribution F if F(ζ) = p. Show that if ζ is a p-percentile of the IFRA distribution F, then
$$\bar F(x) \le e^{-\theta x}, \quad x \ge \zeta$$
$$\bar F(x) \ge e^{-\theta x}, \quad x \le \zeta$$


where
$$\theta = \frac{-\log(1 - p)}{\zeta}$$

26. Prove Lemma 9.3.
Hint: Let x = y + δ. Note that $f(t) = t^\alpha$ is a concave function when $0 \le \alpha \le 1$, and use the fact that for a concave function $f(t + h) - f(t)$ is decreasing in t.
27. Let $r(p) = r(p, p, \ldots, p)$. Show that if $r(p_0) = p_0$, then
$$r(p) \ge p \quad\text{for } p \ge p_0$$
$$r(p) \le p \quad\text{for } p \le p_0$$

Hint: Use Proposition 9.2.
28. Find the mean lifetime of a series system of two components when the component lifetimes are respectively uniform on (0, 1) and uniform on (0, 2). Repeat for a parallel system.
29. Show that the mean lifetime of a parallel system of two components is
$$\frac{1}{\mu_1 + \mu_2} + \frac{\mu_1}{(\mu_1 + \mu_2)\mu_2} + \frac{\mu_2}{(\mu_1 + \mu_2)\mu_1}$$
when the first component is exponentially distributed with mean $1/\mu_1$ and the second is exponential with mean $1/\mu_2$.
*30. Compute the expected system lifetime of a three-out-of-four system when the first two component lifetimes are uniform on (0, 1) and the second two are uniform on (0, 2).
31. Show that the variance of the lifetime of a k-out-of-n system of components, each of whose lifetimes is exponential with mean θ, is given by
$$\theta^2 \sum_{i=k}^n \frac{1}{i^2}$$

32. In Section 9.6.1 show that the expected number of $X_i$ that exceed $c^*$ is equal to 1.
33. Let $X_i$ be an exponential random variable with mean 8 + 2i, for i = 1, 2, 3. Use the results of Section 9.6.1 to obtain an upper bound on $E[\max X_i]$, and then compare this with the exact result when the $X_i$ are independent.
34. For the model of Section 9.7, compute for a k-out-of-n structure (i) the average up time, (ii) the average down time, and (iii) the system failure rate.
35. Prove the combinatorial identity
$$\binom{n-1}{i-1} = \binom{n}{i} - \binom{n}{i+1} + \cdots \pm \binom{n}{n}, \qquad i \le n$$
(a) by induction on i
(b) by a backwards induction argument on i; that is, prove it first for i = n, then assume it for i = k and show that this implies that it is true for i = k − 1.

36. Verify Equation (9.36).



Brownian Motion and Stationary Processes

10.1 Brownian Motion

Let us start by considering the symmetric random walk, which in each time unit is equally likely to take a unit step either to the left or to the right. That is, it is a Markov chain with $P_{i,i+1} = \frac{1}{2} = P_{i,i-1}$, $i = 0, \pm 1, \ldots$. Now suppose that we speed up this process by taking smaller and smaller steps in smaller and smaller time intervals. If we now go to the limit in the right manner what we obtain is Brownian motion.
More precisely, suppose that each $\Delta t$ time unit we take a step of size $\Delta x$ either to the left or the right with equal probabilities. If we let X(t) denote the position at time t then
$$X(t) = \Delta x\,(X_1 + \cdots + X_{[t/\Delta t]}) \qquad (10.1)$$
where
$$X_i = \begin{cases} +1, & \text{if the ith step of length } \Delta x \text{ is to the right}\\ -1, & \text{if it is to the left}\end{cases}$$
$[t/\Delta t]$ is the largest integer less than or equal to $t/\Delta t$, and the $X_i$ are assumed independent with
$$P\{X_i = 1\} = P\{X_i = -1\} = \tfrac{1}{2}$$
As $E[X_i] = 0$, $\mathrm{Var}(X_i) = E[X_i^2] = 1$, we see from Equation (10.1) that
$$E[X(t)] = 0, \qquad \mathrm{Var}(X(t)) = (\Delta x)^2 \left[\frac{t}{\Delta t}\right] \qquad (10.2)$$


We shall now let $\Delta x$ and $\Delta t$ go to 0. However, we must do it in a way such that the resulting limiting process is nontrivial (for instance, if we let $\Delta x = \Delta t$ and let $\Delta t \to 0$, then from the preceding we see that E[X(t)] and Var(X(t)) would both converge to 0 and thus X(t) would equal 0 with probability 1). If we let $\Delta x = \sigma\sqrt{\Delta t}$ for some positive constant σ then from Equation (10.2) we see that as $\Delta t \to 0$
$$E[X(t)] = 0, \qquad \mathrm{Var}(X(t)) \to \sigma^2 t$$
We now list some intuitive properties of this limiting process obtained by taking $\Delta x = \sigma\sqrt{\Delta t}$ and then letting $\Delta t \to 0$. From Equation (10.1) and the central limit theorem the following seems reasonable:
(i) X(t) is normal with mean 0 and variance $\sigma^2 t$.
In addition, because the changes of value of the random walk in nonoverlapping time intervals are independent,
(ii) $\{X(t), t \ge 0\}$ has independent increments, in that for all $t_1 < t_2 < \cdots < t_n$
$$X(t_n) - X(t_{n-1}),\; X(t_{n-1}) - X(t_{n-2}),\; \ldots,\; X(t_2) - X(t_1),\; X(t_1)$$
are independent.
Finally, because the distribution of the change in position of the random walk over any time interval depends only on the length of that interval, it would appear that
(iii) $\{X(t), t \ge 0\}$ has stationary increments, in that the distribution of $X(t+s) - X(t)$ does not depend on t.
We are now ready for the following formal definition.

Definition 10.1 A stochastic process $\{X(t), t \ge 0\}$ is said to be a Brownian motion process if
(i) X(0) = 0;
(ii) $\{X(t), t \ge 0\}$ has stationary and independent increments;
(iii) for every t > 0, X(t) is normally distributed with mean 0 and variance $\sigma^2 t$.

The Brownian motion process, sometimes called the Wiener process, is one of the most useful stochastic processes in applied probability theory. It originated in physics as a description of Brownian motion. This phenomenon, named after the English botanist Robert Brown who discovered it, is the motion exhibited by a small particle that is totally immersed in a liquid or gas. Since then, the process has been used beneficially in such areas as statistical testing of goodness of fit, analyzing the price levels on the stock market, and quantum mechanics.
The first explanation of the phenomenon of Brownian motion was given by Einstein in 1905. He showed that Brownian motion could be explained by assuming that the immersed particle was continually being subjected to bombardment by the molecules of the surrounding medium. However, the preceding concise definition of this stochastic process underlying Brownian motion was given by Wiener in a series of papers originating in 1918.
When σ = 1, the process is called standard Brownian motion. Because any Brownian motion can be converted to the standard process by letting $B(t) = X(t)/\sigma$ we shall, unless otherwise stated, suppose throughout this chapter that σ = 1.
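The random-walk construction can be simulated directly. The sketch below (ours; the step and sample counts are arbitrary) builds X(t) from steps of size $\sigma\sqrt{\Delta t}$ and checks that the terminal value has approximately mean 0 and variance $\sigma^2 t$.

```python
# Simulating Brownian motion as the limit of a scaled symmetric random walk.
import math
import random

def brownian_endpoint(t=1.0, sigma=1.0, dt=1e-3, rng=random.Random(0)):
    steps = int(t / dt)
    dx = sigma * math.sqrt(dt)       # step size from Delta x = sigma * sqrt(Delta t)
    return dx * sum(1 if rng.random() < 0.5 else -1 for _ in range(steps))

samples = [brownian_endpoint() for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print("sample mean:", mean, "sample variance:", var)   # near 0 and 1
```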


The interpretation of Brownian motion as the limit of the random walks (Equation (10.1)) suggests that X(t) should be a continuous function of t, which turns out to be true. To prove this, we must show that with probability 1
$$\lim_{h \to 0}\,(X(t+h) - X(t)) = 0$$
Although a rigorous proof of the preceding is beyond the scope of this text, a plausibility argument is obtained by noting that the random variable X(t+h) − X(t) has mean 0 and variance h, and so would seem to converge to a random variable with mean 0 and variance 0 as h → 0. That is, it seems reasonable that X(t+h) − X(t) converges to 0, thus yielding continuity.
Although X(t) will, with probability 1, be a continuous function of t, it possesses the interesting property of being nowhere differentiable. To see why this might be the case, note that $\frac{X(t+h) - X(t)}{h}$ has mean 0 and variance 1/h. Because the variance of $\frac{X(t+h) - X(t)}{h}$ converges to ∞ as h → 0, it is not surprising that the ratio does not converge.
As X(t) is normal with mean 0 and variance t, its density function is given by
$$f_t(x) = \frac{1}{\sqrt{2\pi t}}\,e^{-x^2/2t}$$
To obtain the joint density function of $X(t_1), X(t_2), \ldots, X(t_n)$ for $t_1 < \cdots < t_n$, note first that the set of equalities
$$X(t_1) = x_1,\; X(t_2) = x_2,\; \ldots,\; X(t_n) = x_n$$
is equivalent to
$$X(t_1) = x_1,\; X(t_2) - X(t_1) = x_2 - x_1,\; \ldots,\; X(t_n) - X(t_{n-1}) = x_n - x_{n-1}$$
However, by the independent increment assumption it follows that $X(t_1), X(t_2) - X(t_1), \ldots, X(t_n) - X(t_{n-1})$ are independent and, by the stationary increment assumption, that $X(t_k) - X(t_{k-1})$ is normal with mean 0 and variance $t_k - t_{k-1}$. Hence, the joint density of $X(t_1), \ldots, X(t_n)$ is given by
$$f(x_1, x_2, \ldots, x_n) = f_{t_1}(x_1)\,f_{t_2 - t_1}(x_2 - x_1)\cdots f_{t_n - t_{n-1}}(x_n - x_{n-1})$$
$$= \frac{\exp\left\{-\dfrac{1}{2}\left[\dfrac{x_1^2}{t_1} + \dfrac{(x_2 - x_1)^2}{t_2 - t_1} + \cdots + \dfrac{(x_n - x_{n-1})^2}{t_n - t_{n-1}}\right]\right\}}{(2\pi)^{n/2}\,[t_1(t_2 - t_1)\cdots(t_n - t_{n-1})]^{1/2}} \qquad (10.3)$$


From this equation, we can compute in principle any desired probabilities. For instance, suppose we require the conditional distribution of X(s) given that X(t) = B where s < t. The conditional density is
$$f_{s|t}(x \mid B) = \frac{f_s(x)\,f_{t-s}(B - x)}{f_t(B)}$$
$$= K_1 \exp\{-x^2/2s - (B - x)^2/2(t - s)\}$$
$$= K_2 \exp\left\{-x^2\left(\frac{1}{2s} + \frac{1}{2(t-s)}\right) + \frac{Bx}{t-s}\right\}$$
$$= K_2 \exp\left\{-\frac{t}{2s(t-s)}\left(x^2 - 2\frac{sB}{t}x\right)\right\}$$
$$= K_3 \exp\left\{-\frac{(x - Bs/t)^2}{2s(t-s)/t}\right\}$$
where $K_1$, $K_2$, and $K_3$ do not depend on x. Hence, we see from the preceding that the conditional distribution of X(s) given that X(t) = B is, for s < t, normal with mean and variance given by
$$E[X(s) \mid X(t) = B] = \frac{s}{t}B, \qquad \mathrm{Var}[X(s) \mid X(t) = B] = \frac{s}{t}(t - s) \qquad (10.4)$$

Example 10.1 In a bicycle race between two competitors, let Y(t) denote the amount of time (in seconds) by which the racer that started in the inside position is ahead when 100t percent of the race has been completed, and suppose that $\{Y(t), 0 \le t \le 1\}$ can be effectively modeled as a Brownian motion process with variance parameter $\sigma^2$.
(a) If the inside racer is leading by σ seconds at the midpoint of the race, what is the probability that she is the winner?
(b) If the inside racer wins the race by a margin of σ seconds, what is the probability that she was ahead at the midpoint?

Solution: (a)
$$P\{Y(1) > 0 \mid Y(1/2) = \sigma\}$$
$$= P\{Y(1) - Y(1/2) > -\sigma \mid Y(1/2) = \sigma\}$$
$$= P\{Y(1) - Y(1/2) > -\sigma\} \quad\text{by independent increments}$$
$$= P\{Y(1/2) > -\sigma\} \quad\text{by stationary increments}$$
$$= P\left\{\frac{Y(1/2)}{\sigma/\sqrt{2}} > -\sqrt{2}\right\} = \Phi(\sqrt{2}) \approx 0.9213$$
where $\Phi(x) = P\{N(0, 1) \le x\}$ is the standard normal distribution function.


(b) Because we must compute $P\{Y(1/2) > 0 \mid Y(1) = \sigma\}$, let us first determine the conditional distribution of Y(s) given that Y(t) = C, when s < t. Now, since $\{X(t), t \ge 0\}$ is standard Brownian motion when $X(t) = Y(t)/\sigma$, we obtain from Equation (10.4) that the conditional distribution of X(s), given that $X(t) = C/\sigma$, is normal with mean $sC/t\sigma$ and variance $s(t-s)/t$. Hence, the conditional distribution of $Y(s) = \sigma X(s)$ given that Y(t) = C is normal with mean sC/t and variance $\sigma^2 s(t-s)/t$. Hence,
$$P\{Y(1/2) > 0 \mid Y(1) = \sigma\} = P\{N(\sigma/2, \sigma^2/4) > 0\} = \Phi(1) \approx 0.8413 \qquad ■$$
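Both answers reduce to evaluations of the standard normal distribution function; the sketch below (ours) computes them using math.erf.

```python
# Evaluating the two probabilities of Example 10.1.
import math

def Phi(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# (a) P{Y(1) > 0 | Y(1/2) = sigma} = Phi(sqrt(2))
print(Phi(math.sqrt(2)))   # about 0.9213
# (b) P{Y(1/2) > 0 | Y(1) = sigma} = P{N(sigma/2, sigma^2/4) > 0} = Phi(1)
print(Phi(1.0))            # about 0.8413
```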

10.2 Hitting Times, Maximum Variable, and the Gambler's Ruin Problem

Let $T_a$ denote the first time the Brownian motion process hits a. When a > 0 we will compute $P\{T_a \le t\}$ by considering $P\{X(t) \ge a\}$ and conditioning on whether or not $T_a \le t$. This gives
$$P\{X(t) \ge a\} = P\{X(t) \ge a \mid T_a \le t\}P\{T_a \le t\} + P\{X(t) \ge a \mid T_a > t\}P\{T_a > t\} \qquad (10.5)$$
Now if $T_a \le t$, then the process hits a at some point in [0, t] and, by symmetry, it is just as likely to be above a or below a at time t. That is,
$$P\{X(t) \ge a \mid T_a \le t\} = \tfrac{1}{2}$$
As the second right-hand term of Equation (10.5) is clearly equal to 0 (since, by continuity, the process value cannot be greater than a without having yet hit a), we see that
$$P\{T_a \le t\} = 2P\{X(t) \ge a\} = \frac{2}{\sqrt{2\pi t}}\int_a^\infty e^{-x^2/2t}\,dx = \frac{2}{\sqrt{2\pi}}\int_{a/\sqrt{t}}^\infty e^{-y^2/2}\,dy, \qquad a > 0 \qquad (10.6)$$
For a < 0, the distribution of $T_a$ is, by symmetry, the same as that of $T_{-a}$. Hence, from Equation (10.6) we obtain
$$P\{T_a \le t\} = \frac{2}{\sqrt{2\pi}}\int_{|a|/\sqrt{t}}^\infty e^{-y^2/2}\,dy \qquad (10.7)$$
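Equation (10.6) can be checked against a discretized simulation. The sketch below (ours; the grid and sample sizes are arbitrary) estimates $P\{T_a \le t\}$; note that the discretization makes the estimate slightly low, since a path can cross level a between grid points.

```python
# Monte Carlo check of Equation (10.6) for standard Brownian motion.
import math
import random

def hit_prob(a=1.0, t=1.0, n_steps=500, reps=5000, seed=3):
    rng = random.Random(seed)
    dt = t / n_steps
    hits = 0
    for _ in range(reps):
        x = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, math.sqrt(dt))   # Brownian increment
            if x >= a:
                hits += 1
                break
    return hits / reps

a, t = 1.0, 1.0
Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
formula = 2 * (1 - Phi(a / math.sqrt(t)))        # Equation (10.6)
print(hit_prob(), "vs", formula)
```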


Another random variable of interest is the maximum value the process attains in [0, t]. Its distribution is obtained as follows: For a > 0,
$$P\left\{\max_{0 \le s \le t} X(s) \ge a\right\} = P\{T_a \le t\} \quad\text{by continuity}$$
$$= 2P\{X(t) \ge a\} \quad\text{from (10.6)}$$
$$= \frac{2}{\sqrt{2\pi}}\int_{a/\sqrt{t}}^\infty e^{-y^2/2}\,dy$$
Let us now consider the probability that Brownian motion hits A before −B where A > 0, B > 0. To compute this we shall make use of the interpretation of Brownian motion as being a limit of the symmetric random walk. To start let us recall from the results of the gambler's ruin problem (see Section 4.5.1) that the probability that the symmetric random walk goes up A before going down B when each step is equally likely to be either up or down a distance $\Delta x$ is (by Equation (4.14) with $N = (A + B)/\Delta x$, $i = B/\Delta x$) equal to $B\Delta x/(A + B)\Delta x = B/(A + B)$. Hence, upon letting $\Delta x \to 0$, we see that
$$P\{\text{up } A \text{ before down } B\} = \frac{B}{A + B}$$

10.3 Variations on Brownian Motion

10.3.1 Brownian Motion with Drift

We say that $\{X(t), t \ge 0\}$ is a Brownian motion process with drift coefficient μ and variance parameter $\sigma^2$ if
(i) X(0) = 0;
(ii) $\{X(t), t \ge 0\}$ has stationary and independent increments;
(iii) X(t) is normally distributed with mean μt and variance $t\sigma^2$.
An equivalent definition is to let $\{B(t), t \ge 0\}$ be standard Brownian motion and then define
$$X(t) = \sigma B(t) + \mu t$$
It follows from this representation that X(t) will also be a continuous function of t.

10.3.2 Geometric Brownian Motion

If $\{Y(t), t \ge 0\}$ is a Brownian motion process with drift coefficient μ and variance parameter $\sigma^2$, then the process $\{X(t), t \ge 0\}$ defined by
$$X(t) = e^{Y(t)}$$
is called geometric Brownian motion.


For a geometric Brownian motion process {X(t)}, let us compute the expected value of the process at time t given the history of the process up to time s. That is, for s < t, consider $E[X(t) \mid X(u), 0 \le u \le s]$. Now,
$$E[X(t) \mid X(u), 0 \le u \le s] = E[e^{Y(t)} \mid Y(u), 0 \le u \le s]$$
$$= E[e^{Y(s) + Y(t) - Y(s)} \mid Y(u), 0 \le u \le s]$$
$$= e^{Y(s)}\,E[e^{Y(t) - Y(s)} \mid Y(u), 0 \le u \le s]$$
$$= X(s)\,E[e^{Y(t) - Y(s)}]$$
where the next to last equality follows from the fact that Y(s) is given, and the last equality from the independent increment property of Brownian motion. Now, the moment generating function of a normal random variable W is given by
$$E[e^{aW}] = e^{aE[W] + a^2 \mathrm{Var}(W)/2}$$
Hence, since Y(t) − Y(s) is normal with mean μ(t − s) and variance $(t - s)\sigma^2$, it follows by setting a = 1 that
$$E[e^{Y(t) - Y(s)}] = e^{\mu(t-s) + (t-s)\sigma^2/2}$$
Thus, we obtain
$$E[X(t) \mid X(u), 0 \le u \le s] = X(s)\,e^{(t-s)(\mu + \sigma^2/2)} \qquad (10.8)$$
Geometric Brownian motion is useful in the modeling of stock prices over time when you feel that the percentage changes are independent and identically distributed. For instance, suppose that $X_n$ is the price of some stock at time n. Then, it might be reasonable to suppose that $X_n/X_{n-1}$, $n \ge 1$, are independent and identically distributed. Let
$$Y_n = X_n/X_{n-1}$$
and so
$$X_n = Y_n X_{n-1}$$
Iterating this equality gives
$$X_n = Y_n Y_{n-1} X_{n-2} = Y_n Y_{n-1} Y_{n-2} X_{n-3} = \cdots = Y_n Y_{n-1} \cdots Y_1 X_0$$
Thus,
$$\log(X_n) = \sum_{i=1}^n \log(Y_i) + \log(X_0)$$
Since $\log(Y_i)$, $i \ge 1$, are independent and identically distributed, $\{\log(X_n)\}$ will, when suitably normalized, approximately be Brownian motion with a drift, and so $\{X_n\}$ will be approximately geometric Brownian motion.
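Setting s = 0 in Equation (10.8) gives $E[X(t)] = X(0)\,e^{t(\mu + \sigma^2/2)}$, which the following Monte Carlo sketch (ours; the parameter values are arbitrary) confirms by sampling Y(t) directly from its normal distribution.

```python
# Monte Carlo check of Equation (10.8) with s = 0 and X(0) = 1.
import math
import random

mu, sigma, t = 0.1, 0.3, 2.0
rng = random.Random(7)
reps = 400_000
total = 0.0
for _ in range(reps):
    y_t = rng.gauss(mu * t, sigma * math.sqrt(t))   # Y(t) ~ N(mu t, sigma^2 t)
    total += math.exp(y_t)                          # X(t) = exp(Y(t))
est = total / reps
exact = math.exp(t * (mu + sigma ** 2 / 2))
print(est, "vs", exact)
```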

10.4 Pricing Stock Options

10.4.1 An Example in Options Pricing

In situations in which money is to be received or paid out in differing time periods, we must take into account the time value of money. That is, to be given the amount v a time t in the future is not worth as much as being given v immediately. The reason for this is that if we were immediately given v, then it could be loaned out with interest and so be worth more than v at time t. To take this into account, we will suppose that the time 0 value, also called the present value, of the amount v to be earned at time t is $v e^{-\alpha t}$. The quantity α is often called the discount factor. In economic terms, the assumption of the discount function $e^{-\alpha t}$ is equivalent to the assumption that we can earn interest at a continuously compounded rate of 100α percent per unit time.
We will now consider a simple model for pricing an option to purchase a stock at a future time at a fixed price. Suppose the present price of a stock is $100 per unit share, and suppose we know that after one time period it will be, in present value dollars, either $200 or $50 (see Figure 10.1). It should be noted that the prices at time 1 are the present value (or time 0) prices. That is, if the discount factor is α, then the actual possible prices at time 1 are either $200e^{\alpha}$ or $50e^{\alpha}$. To keep the notation simple, we will suppose that all prices given are time 0 prices.
Suppose that for any y, at a cost of cy, you can purchase at time 0 the option to buy y shares of the stock at time 1 at a (time 0) cost of $150 per share. Thus, for instance, if you do purchase this option and the stock rises to $200, then you would exercise the option at time 1 and realize a gain of $200 − 150 = $50 for each of the y option units purchased. On the other hand, if the price at time 1 was $50, then the option would be worthless at time 1. In addition, at a cost of 100x you can purchase x units of the stock at time 0, and this will be worth either 200x or 50x at time 1.
We will suppose that both x and y can be either positive or negative (or zero). That is, you can either buy or sell both the stock and the option. For instance, if x were negative then you would be selling −x shares of the stock, yielding you a return of −100x, and you would then be responsible for buying −x shares of the stock at time 1 at a cost of either $200 or $50 per share.
We are interested in determining the appropriate value of c, the unit cost of an option. Specifically, we will show that unless c = 50/3 there will be a combination of purchases that will always result in a positive gain.

Figure 10.1


To show this, suppose that at time 0 we buy x units of stock and y units of options, where x and y (which can be either positive or negative) are to be determined. The value of our holding at time 1 depends on the price of the stock at that time, and is given by

$$\text{value} = \begin{cases} 200x + 50y, & \text{if price is } 200 \\ 50x, & \text{if price is } 50 \end{cases}$$

The preceding formula follows by noting that if the price is 200 then the x units of the stock are worth 200x, and the y units of the option to buy the stock at a unit price of 150 are worth (200 − 150)y. On the other hand, if the stock price is 50, then the x units are worth 50x and the y units of the option are worthless. Now, suppose we choose y so that the preceding value is the same no matter what the price at time 1. That is, we choose y so that

$$200x + 50y = 50x$$

or

$$y = -3x$$

(Note that y has the opposite sign of x, and so if x is positive and as a result x units of the stock are purchased at time 0, then 3x units of stock options are also sold at that time. Similarly, if x is negative, then −x units of stock are sold and −3x units of stock options are purchased at time 0.) Thus, with y = −3x, the value of our holding at time 1 is

$$\text{value} = 50x$$

Since the original cost of purchasing x units of the stock and −3x units of options is

$$\text{original cost} = 100x - 3xc,$$

we see that our gain on the transaction is

$$\text{gain} = 50x - (100x - 3xc) = x(3c - 50)$$

Thus, if 3c = 50, then the gain is 0; on the other hand, if 3c ≠ 50, we can guarantee a positive gain (no matter what the price of the stock at time 1) by letting x be positive when 3c > 50 and letting it be negative when 3c < 50. For instance, if the unit cost per option is c = 20, then purchasing 1 unit of the stock (x = 1) and simultaneously selling 3 units of the option (y = −3) initially costs us 100 − 60 = 40. However, the value of this holding at time 1 is 50 whether the stock goes up to 200 or down to 50. Thus, a guaranteed profit of 10 is attained. Similarly, if the unit cost per option is c = 15, then selling 1 unit of the stock (x = −1) and buying 3 units of the option (y = 3) leads to an initial gain of 100 − 45 = 55. On the other hand, the value of this holding at time 1 is −50. Thus, a guaranteed profit of 5 is attained. A sure win betting scheme is called an arbitrage. Thus, as we have just seen, the only option cost c that does not result in an arbitrage is c = 50/3.
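The preceding argument is easy to check numerically. Below is a minimal Python sketch (the helper `gain` and the specific test costs are ours, not from the text) that reproduces the sure gain x(3c − 50):

```python
def gain(c, x, price):
    """Time-1 gain from holding x shares and y = -3x options bought at time 0."""
    y = -3 * x                                    # the choice that equalizes outcomes
    cost = 100 * x + c * y                        # time-0 cost of the position
    value = price * x + max(price - 150, 0) * y   # time-1 (present value) worth
    return value - cost                           # equals x*(3c - 50) for both prices

for c in (20, 15, 50 / 3):
    x = 1 if 3 * c > 50 else -1                   # direction that makes the gain positive
    print(c, [round(gain(c, x, p), 6) for p in (200, 50)])
```

Running this prints a gain of 10 for c = 20, a gain of 5 for c = 15, and a gain of 0 (up to rounding) for c = 50/3, whichever price occurs at time 1.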

10.4.2 The Arbitrage Theorem

Consider an experiment whose set of possible outcomes is S = {1, 2, ..., m}. Suppose that n wagers are available. If the amount x is bet on wager i, then the return x r_i(j) is earned if the outcome of the experiment is j. In other words, r_i(·) is the return function for a unit bet on wager i. The amount bet on a wager is allowed to be positive, negative, or zero.

A betting scheme is a vector x = (x_1, ..., x_n) with the interpretation that x_1 is bet on wager 1, x_2 on wager 2, ..., and x_n on wager n. If the outcome of the experiment is j, then the return from the betting scheme x is

$$\text{return from } \mathbf{x} = \sum_{i=1}^{n} x_i r_i(j)$$

The following theorem states that either there exists a probability vector p = (p_1, ..., p_m) on the set of possible outcomes of the experiment under which each of the wagers has expected return 0, or else there is a betting scheme that guarantees a positive win.

Theorem 10.1 (The Arbitrage Theorem) Exactly one of the following is true. Either

(i) there exists a probability vector p = (p_1, ..., p_m) for which

$$\sum_{j=1}^{m} p_j r_i(j) = 0, \quad \text{for all } i = 1, \ldots, n$$

or (ii) there exists a betting scheme x = (x_1, ..., x_n) for which

$$\sum_{i=1}^{n} x_i r_i(j) > 0, \quad \text{for all } j = 1, \ldots, m$$

In other words, if X is the outcome of the experiment, then the arbitrage theorem states that either there is a probability vector p for X such that

$$E_{\mathbf{p}}[r_i(X)] = 0, \quad \text{for all } i = 1, \ldots, n$$

or else there is a betting scheme that leads to a sure win.

Remark This theorem is a consequence of the (linear algebra) theorem of the separating hyperplane, which is often used as a mechanism to prove the duality theorem of linear programming.

The theory of linear programming can be used to determine a betting strategy that guarantees the greatest return. Suppose that the absolute value of the amount bet on each wager must be less than or equal to 1. To determine the vector x that yields the greatest guaranteed win—call this win v—we need to choose x and v so as to maximize v, subject to the constraints

$$\sum_{i=1}^{n} x_i r_i(j) \ge v, \quad \text{for } j = 1, \ldots, m$$

$$-1 \le x_i \le 1, \quad i = 1, \ldots, n$$


This optimization problem is a linear program and can be solved by standard techniques (such as the simplex algorithm). The arbitrage theorem yields that the optimal v will be positive unless there is a probability vector p for which $\sum_{j=1}^{m} p_j r_i(j) = 0$ for all i = 1, ..., n.
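As a concrete illustration, the following sketch sets up this linear program with scipy's `linprog` (an assumed, standard LP solver; any other would do). The return matrix uses the posted odds of the example that follows, so the optimal v comes out positive:

```python
import numpy as np
from scipy.optimize import linprog

# r[i][j] = return of a unit bet on wager i when outcome j occurs; these are
# the "bet on outcome i at odds o_i" wagers, here with odds 1, 2, 3.
r = np.array([[1.0, -1.0, -1.0],
              [-1.0, 2.0, -1.0],
              [-1.0, -1.0, 3.0]])
n, m = r.shape
# Decision variables (x_1, ..., x_n, v); each row encodes -sum_i x_i r_i(j) + v <= 0.
A_ub = np.hstack([-r.T, np.ones((m, 1))])
b_ub = np.zeros(m)
c = np.zeros(n + 1)
c[-1] = -1.0                           # minimize -v, i.e., maximize v
bounds = [(-1, 1)] * n + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("guaranteed win v =", -res.fun)  # positive: no probability vector works here
print("bets x =", res.x[:-1])
```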

Example 10.2 In some situations, the only types of wagers allowed are to choose one of the outcomes i, i = 1, ..., m, and bet that i is the outcome of the experiment. The return from such a bet is often quoted in terms of "odds." If the odds for outcome i are o_i (often written as "o_i to 1"), then a 1-unit bet will return o_i if the outcome of the experiment is i and will return −1 otherwise. That is,

$$r_i(j) = \begin{cases} o_i, & \text{if } j = i \\ -1, & \text{otherwise} \end{cases}$$

Suppose the odds o_1, ..., o_m are posted. In order for there not to be a sure win there must be a probability vector p = (p_1, ..., p_m) such that

$$0 \equiv E_{\mathbf{p}}[r_i(X)] = o_i p_i - (1 - p_i)$$

That is, we must have

$$p_i = \frac{1}{1 + o_i}$$

Since the p_i must sum to 1, the condition for there not to be an arbitrage is

$$\sum_{i=1}^{m} (1 + o_i)^{-1} = 1$$

Thus, if the posted odds are such that $\sum_i (1 + o_i)^{-1} \neq 1$, then a sure win is possible. For instance, suppose there are three possible outcomes and the odds are as follows:

Outcome    Odds
   1         1
   2         2
   3         3

That is, the odds for outcome 1 are 1 to 1, the odds for outcome 2 are 2 to 1, and those for outcome 3 are 3 to 1. Since

$$\frac{1}{2} + \frac{1}{3} + \frac{1}{4} > 1$$

a sure win is possible. One possibility is to bet −1 on outcome 1 (so you win 1 if the outcome is not 1 and lose 1 if the outcome is 1), bet −0.7 on outcome 2, and bet −0.5 on outcome 3. If the experiment results in outcome 1, then we win −1 + 0.7 + 0.5 = 0.2; if it results in outcome 2, then we win 1 − 1.4 + 0.5 = 0.1; if it results in outcome 3, then we win 1 + 0.7 − 1.5 = 0.2. Hence, in all cases we win a positive amount. ∎


Remark If $\sum_i (1 + o_i)^{-1} \neq 1$, then the betting scheme

$$x_i = \frac{(1 + o_i)^{-1}}{1 - \sum_i (1 + o_i)^{-1}}, \quad i = 1, \ldots, n$$

will always yield a gain of exactly 1.

Example 10.3 Let us reconsider the option pricing example of the previous section, where the initial price of a stock is 100 and the present value of the price at time 1 is either 200 or 50. At a cost of c per share we can purchase at time 0 the option to buy the stock at time 1 at a present value price of 150 per share. The problem is to set the value of c so that no sure win is possible.

In the context of this section, the outcome of the experiment is the value of the stock at time 1. Thus, there are two possible outcomes. There are also two different wagers: to buy (or sell) the stock, and to buy (or sell) the option. By the arbitrage theorem, there will be no sure win if there is a probability vector (p, 1 − p) that makes the expected return under both wagers equal to 0. Now, the return from purchasing 1 unit of the stock is

$$\text{return} = \begin{cases} 200 - 100 = 100, & \text{if the price is 200 at time 1} \\ 50 - 100 = -50, & \text{if the price is 50 at time 1} \end{cases}$$

Hence, if p is the probability that the price is 200 at time 1, then

$$E[\text{return}] = 100p - 50(1 - p)$$

Setting this equal to 0 yields

$$p = \tfrac{1}{3}$$

That is, the only probability vector (p, 1 − p) for which wager 1 yields an expected return 0 is the vector (1/3, 2/3). Now, the return from purchasing one share of the option is

$$\text{return} = \begin{cases} 50 - c, & \text{if price is 200} \\ -c, & \text{if price is 50} \end{cases}$$

Hence, the expected return when p = 1/3 is

$$E[\text{return}] = (50 - c)\tfrac{1}{3} - c\,\tfrac{2}{3} = \tfrac{50}{3} - c$$

Thus, it follows from the arbitrage theorem that the only value of c for which there will not be a sure win is c = 50/3, which verifies the result of Section 10.4.1. ∎

10.4.3 The Black-Scholes Option Pricing Formula

Suppose the present price of a stock is X(0) = x_0, and let X(t) denote its price at time t. Suppose we are interested in the stock over the time interval 0 to T. Assume that the discount factor is α (equivalently, the interest rate is 100α percent compounded continuously), and so the present value of the stock price at time t is e^{−αt}X(t). We can regard the evolution of the price of the stock over time as our experiment, and thus the outcome of the experiment is the value of the function X(t), 0 ≤ t ≤ T. The types of wagers available are that for any s < t we can observe the process for a time s and then buy (or sell) shares of the stock at price X(s) and then sell (or buy) these shares at time t for the price X(t). In addition, we will suppose that we may purchase any of N different options at time 0. Option i, costing c_i per share, gives us the option of purchasing shares of the stock at time t_i for the fixed price of K_i per share, i = 1, ..., N.

Suppose we want to determine values of the c_i for which there is no betting strategy that leads to a sure win. Assuming that the arbitrage theorem can be generalized (to handle the preceding situation, where the outcome of the experiment is a function), it follows that there will be no sure win if and only if there exists a probability measure over the set of outcomes under which all of the wagers have expected return 0. Let P be a probability measure on the set of outcomes. Consider first the wager of observing the stock for a time s and then purchasing (or selling) one share with the intention of selling (or purchasing) it at time t, 0 ≤ s < t ≤ T. The present value of the amount paid for the stock is e^{−αs}X(s), whereas the present value of the amount received is e^{−αt}X(t). Hence, in order for the expected return of this wager to be 0 when P is the probability measure on X(t), 0 ≤ t ≤ T, we must have

$$E_P[e^{-\alpha t}X(t) \mid X(u),\, 0 \le u \le s] = e^{-\alpha s}X(s) \tag{10.9}$$

Consider now the wager of purchasing an option. Suppose the option gives us the right to buy one share of the stock at time t for a price K. At time t, the worth of this option will be as follows:

$$\text{worth of option at time } t = \begin{cases} X(t) - K, & \text{if } X(t) \ge K \\ 0, & \text{if } X(t) < K \end{cases}$$

That is, the time t worth of the option is (X(t) − K)^+. Hence, the present value of the worth of the option is e^{−αt}(X(t) − K)^+. If c is the (time 0) cost of the option, we see that, in order for purchasing the option to have expected (present value) return 0, we must have

$$E_P[e^{-\alpha t}(X(t) - K)^+] = c \tag{10.10}$$

By the arbitrage theorem, if we can find a probability measure P on the set of outcomes that satisfies Equation (10.9), then if c, the cost of an option to purchase one share at time t at the fixed price K, is as given in Equation (10.10), no arbitrage is possible. On the other hand, if for given prices c_i, i = 1, ..., N, there is no probability measure P that satisfies both (10.9) and the equality

$$c_i = E_P[e^{-\alpha t_i}(X(t_i) - K_i)^+], \quad i = 1, \ldots, N$$


then a sure win is possible.

We will now present a probability measure P on the outcome X(t), 0 ≤ t ≤ T, that satisfies Equation (10.9). Suppose that

$$X(t) = x_0 e^{Y(t)}$$

where {Y(t), t ≥ 0} is a Brownian motion process with drift coefficient μ and variance parameter σ². That is, {X(t), t ≥ 0} is a geometric Brownian motion process (see Section 10.3.2). From Equation (10.8) we have that, for s < t,

$$E[X(t) \mid X(u),\, 0 \le u \le s] = X(s)\, e^{(t-s)(\mu + \sigma^2/2)}$$

Hence, if we choose μ and σ² so that

$$\mu + \sigma^2/2 = \alpha$$

then Equation (10.9) will be satisfied. That is, by letting P be the probability measure governing the stochastic process {x_0 e^{Y(t)}, 0 ≤ t ≤ T}, where {Y(t)} is Brownian motion with drift parameter μ and variance parameter σ², and where μ + σ²/2 = α, Equation (10.9) is satisfied.

It follows from the preceding that if we price an option to purchase a share of the stock at time t for a fixed price K by

$$c = E_P[e^{-\alpha t}(X(t) - K)^+]$$

then no arbitrage is possible. Since X(t) = x_0 e^{Y(t)}, where Y(t) is normal with mean μt and variance tσ², we see that

$$ce^{\alpha t} = \int_{-\infty}^{\infty} (x_0 e^{y} - K)^+ \frac{1}{\sqrt{2\pi t\sigma^2}}\, e^{-(y - \mu t)^2/2t\sigma^2}\, dy = \int_{\log(K/x_0)}^{\infty} (x_0 e^{y} - K)\, \frac{1}{\sqrt{2\pi t\sigma^2}}\, e^{-(y - \mu t)^2/2t\sigma^2}\, dy$$

Making the change of variable w = (y − μt)/(σ t^{1/2}) yields

$$ce^{\alpha t} = x_0 e^{\mu t}\, \frac{1}{\sqrt{2\pi}} \int_{a}^{\infty} e^{\sigma w\sqrt{t}}\, e^{-w^2/2}\, dw \;-\; K\, \frac{1}{\sqrt{2\pi}} \int_{a}^{\infty} e^{-w^2/2}\, dw \tag{10.11}$$

where

$$a = \frac{\log(K/x_0) - \mu t}{\sigma\sqrt{t}}$$


Now,

$$\frac{1}{\sqrt{2\pi}} \int_a^{\infty} e^{\sigma w\sqrt{t}}\, e^{-w^2/2}\, dw = e^{t\sigma^2/2}\, \frac{1}{\sqrt{2\pi}} \int_a^{\infty} e^{-(w - \sigma\sqrt{t})^2/2}\, dw$$

$$= e^{t\sigma^2/2}\, P\{N(\sigma\sqrt{t}, 1) \ge a\}$$

$$= e^{t\sigma^2/2}\, P\{N(0, 1) \ge a - \sigma\sqrt{t}\}$$

$$= e^{t\sigma^2/2}\, P\{N(0, 1) \le -(a - \sigma\sqrt{t})\}$$

$$= e^{t\sigma^2/2}\, \phi(\sigma\sqrt{t} - a)$$

where N(m, v) is a normal random variable with mean m and variance v, and φ is the standard normal distribution function. Thus, we see from Equation (10.11) that

$$ce^{\alpha t} = x_0 e^{\mu t + \sigma^2 t/2}\, \phi(\sigma\sqrt{t} - a) - K\phi(-a)$$

Using that μ + σ²/2 = α and letting b = −a, we can write this as follows:

$$c = x_0\, \phi(\sigma\sqrt{t} + b) - K e^{-\alpha t}\, \phi(b) \tag{10.12}$$

where

$$b = \frac{\alpha t - \sigma^2 t/2 - \log(K/x_0)}{\sigma\sqrt{t}}$$

The option price formula given by Equation (10.12) depends on the initial price of the stock x_0, the option exercise time t, the option exercise price K, the discount factor (or interest rate) α, and the value σ². Note that for any value of σ², if options are priced according to the formula of Equation (10.12), then no arbitrage is possible. However, because many people believe that the price of a stock actually follows a geometric Brownian motion—that is, X(t) = x_0 e^{Y(t)}, where Y(t) is Brownian motion with parameters μ and σ²—it has been suggested that it is natural to price the option according to the formula of Equation (10.12) with the parameter σ² taken equal to the estimated value (see the remark that follows) of the variance parameter under the assumption of a geometric Brownian motion model. When this is done, the formula of Equation (10.12) is known as the Black–Scholes option cost valuation. It is interesting that this valuation does not depend on the value of the drift parameter μ but only on the variance parameter σ².

If the option itself can be traded, then the formula of Equation (10.12) can be used to set its price in such a way that no arbitrage is possible. If at time s the price of the stock is X(s) = x_s, then the price of a (t, K) option—that is, an option to purchase one unit of the stock at time t for a price K—should be set by replacing t by t − s and x_0 by x_s in Equation (10.12).
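For reference, here is a direct transcription of Equation (10.12) in Python; `std_normal_cdf` plays the role of φ, and the sample parameters are illustrative only:

```python
from math import erf, exp, log, sqrt

def std_normal_cdf(x):
    """The standard normal distribution function (phi in the text)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def option_cost(x0, K, t, alpha, sigma2):
    """Equation (10.12): the no-arbitrage cost of a (t, K) option."""
    sigma = sqrt(sigma2)
    b = (alpha * t - sigma2 * t / 2 - log(K / x0)) / (sigma * sqrt(t))
    return x0 * std_normal_cdf(sigma * sqrt(t) + b) - K * exp(-alpha * t) * std_normal_cdf(b)

print(option_cost(x0=100.0, K=150.0, t=1.0, alpha=0.05, sigma2=0.09))
```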


Remark If we observe a Brownian motion process with variance parameter σ² over any time interval, then we could theoretically obtain an arbitrarily precise estimate of σ². For suppose we observe such a process {Y(s)} for a time t. Then, for fixed h, let N = [t/h] and set

$$W_1 = Y(h) - Y(0),\quad W_2 = Y(2h) - Y(h),\quad \ldots,\quad W_N = Y(Nh) - Y(Nh - h)$$

Then the random variables W_1, ..., W_N are independent and identically distributed normal random variables having variance hσ². We now use the fact (see Section 3.6.4) that (N − 1)S²/(σ²h) has a chi-squared distribution with N − 1 degrees of freedom, where S² is the sample variance defined by

$$S^2 = \sum_{i=1}^{N} (W_i - \bar{W})^2 / (N - 1)$$

Since the expected value and variance of a chi-squared random variable with k degrees of freedom are equal to k and 2k, respectively, we see that

$$E[(N - 1)S^2/(\sigma^2 h)] = N - 1 \quad \text{and} \quad \operatorname{Var}[(N - 1)S^2/(\sigma^2 h)] = 2(N - 1)$$

From this, we see that

$$E[S^2/h] = \sigma^2 \quad \text{and} \quad \operatorname{Var}[S^2/h] = 2\sigma^4/(N - 1)$$

Hence, as we let h become smaller (and so N = [t/h] becomes larger), the variance of the unbiased estimator S²/h of σ² becomes arbitrarily small. ∎
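The following simulation sketch illustrates the remark: it generates the increments W_1, ..., W_N directly (rather than a whole Brownian path, an implementation shortcut) and shows the estimator S²/h settling near σ² as h shrinks:

```python
import random

def estimate_sigma2(t, h, mu, sigma):
    """Return S^2/h computed from N = [t/h] increments of a simulated path."""
    n = int(t / h)
    # Each W_i = Y(ih) - Y((i-1)h) is normal with mean mu*h and variance sigma^2*h.
    w = [random.gauss(mu * h, sigma * h ** 0.5) for _ in range(n)]
    wbar = sum(w) / n
    s2 = sum((x - wbar) ** 2 for x in w) / (n - 1)   # the sample variance S^2
    return s2 / h

random.seed(1)
for h in (0.1, 0.01, 0.001):
    print(h, estimate_sigma2(t=10.0, h=h, mu=2.0, sigma=1.5))  # target sigma^2 = 2.25
```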

Equation (10.12) is not the only way in which options can be priced so that no arbitrage is possible. Let {X(t), 0 ≤ t ≤ T} be any stochastic process satisfying, for s < t,

$$E[e^{-\alpha t}X(t) \mid X(u),\, 0 \le u \le s] = e^{-\alpha s}X(s) \tag{10.13}$$

(that is, Equation (10.9) is satisfied). By setting c, the cost of an option to purchase one share of the stock at time t for price K, equal to

$$c = E[e^{-\alpha t}(X(t) - K)^+] \tag{10.14}$$

it follows that no arbitrage is possible.


Another type of stochastic process, aside from geometric Brownian motion, that satisfies Equation (10.13) is obtained as follows. Let Y_1, Y_2, ... be a sequence of independent random variables having a common mean μ, and suppose that this sequence is independent of {N(t), t ≥ 0}, a Poisson process with rate λ. Let

$$X(t) = x_0 \prod_{i=1}^{N(t)} Y_i$$

Using the identity

$$X(t) = x_0 \prod_{i=1}^{N(s)} Y_i \prod_{j=N(s)+1}^{N(t)} Y_j$$

and the independent increment assumption of the Poisson process, we see that, for s < t,

$$E[X(t) \mid X(u),\, 0 \le u \le s] = X(s)\, E\!\left[\prod_{j=N(s)+1}^{N(t)} Y_j\right]$$

Conditioning on the number of events between s and t yields

$$E\!\left[\prod_{j=N(s)+1}^{N(t)} Y_j\right] = \sum_{n=0}^{\infty} \mu^n e^{-\lambda(t-s)}\, [\lambda(t-s)]^n/n! = e^{-\lambda(t-s)(1-\mu)}$$

Hence,

$$E[X(t) \mid X(u),\, 0 \le u \le s] = X(s)\, e^{-\lambda(t-s)(1-\mu)}$$

Thus, if we choose λ and μ to satisfy

$$\lambda(1 - \mu) = -\alpha$$

then Equation (10.13) is satisfied. Therefore, if for any value of λ we let the Y_i have any distributions with a common mean equal to μ = 1 + α/λ and then price the options according to Equation (10.14), then no arbitrage is possible.

Remark If {X(t), t ≥ 0} satisfies Equation (10.13), then the process {e^{−αt}X(t), t ≥ 0} is called a Martingale. Thus, any pricing of options for which the expected gain on the option is equal to 0 when {e^{−αt}X(t)} follows the probability law of some Martingale will result in no arbitrage possibilities. That is, if we choose any Martingale process {Z(t)} and let the cost of a (t, K) option be

$$c = E[e^{-\alpha t}(e^{\alpha t}Z(t) - K)^+] = E[(Z(t) - Ke^{-\alpha t})^+]$$

then there is no sure win.


In addition, while we did not consider the type of wager in which a stock purchased at time s is sold not at a fixed time t but rather at some random time that depends on the movement of the stock, it can be shown using results about Martingales that the expected return of such wagers is also equal to 0.

Remark A variation of the arbitrage theorem was first noted by de Finetti in 1937. A more general version of de Finetti's result, of which the arbitrage theorem is a special case, is given in Reference 3.

10.5 The Maximum of Brownian Motion with Drift

For {X(y), y ≥ 0} being a Brownian motion process with drift coefficient μ and variance parameter σ², define

$$M(t) = \max_{0 \le y \le t} X(y)$$

to be the maximal value of the process up to time t. We will determine the distribution of M(t) by deriving the conditional distribution of M(t) given the value of X(t). To do so, we first show that the conditional distribution of X(y), 0 ≤ y ≤ t, given the value of X(t), does not depend on μ. That is, given the value of the process at time t, the distribution of its history up to time t does not depend on μ. We start with a lemma.

Lemma 10.1 If Y_1, ..., Y_n are independent and identically distributed normal random variables with mean θ and variance v², then the conditional distribution of Y_1, ..., Y_n given that $\sum_{i=1}^{n} Y_i = x$ does not depend on θ.

Proof. Because, given $\sum_{i=1}^{n} Y_i = x$, the value of Y_n is determined by knowledge of those of Y_1, ..., Y_{n−1}, it suffices to consider the conditional density of Y_1, ..., Y_{n−1} given that $\sum_{i=1}^{n} Y_i = x$. Letting $X = \sum_{i=1}^{n} Y_i$, this is obtained as follows:

$$f_{Y_1,\ldots,Y_{n-1} \mid X}(y_1, \ldots, y_{n-1} \mid x) = \frac{f_{Y_1,\ldots,Y_{n-1},X}(y_1, \ldots, y_{n-1}, x)}{f_X(x)}$$

Now, because

$$Y_1 = y_1, \ldots, Y_{n-1} = y_{n-1}, X = x \iff Y_1 = y_1, \ldots, Y_{n-1} = y_{n-1}, Y_n = x - \sum_{i=1}^{n-1} y_i$$

it follows that

$$f_{Y_1,\ldots,Y_{n-1},X}(y_1, \ldots, y_{n-1}, x) = f_{Y_1,\ldots,Y_n}\Big(y_1, \ldots, y_{n-1}, x - \sum_{i=1}^{n-1} y_i\Big) = f_{Y_1}(y_1) \cdots f_{Y_{n-1}}(y_{n-1})\, f_{Y_n}\Big(x - \sum_{i=1}^{n-1} y_i\Big)$$


where the last equality used that Y_1, ..., Y_n are independent. Hence, using that $X = \sum_{i=1}^{n} Y_i$ is normal with mean nθ and variance nv², we obtain

$$f_{Y_1,\ldots,Y_{n-1} \mid X}(y_1, \ldots, y_{n-1} \mid x) = \frac{f_{Y_1}(y_1) \cdots f_{Y_{n-1}}(y_{n-1})\, f_{Y_n}(x - \sum_{i=1}^{n-1} y_i)}{f_X(x)} = K\, \frac{\prod_{i=1}^{n-1} e^{-(y_i - \theta)^2/2v^2}\; e^{-(x - \sum_{i=1}^{n-1} y_i - \theta)^2/2v^2}}{e^{-(x - n\theta)^2/2nv^2}}$$

$$= K \exp\Big\{-\frac{1}{2v^2}\Big[\Big(x - \sum_{i=1}^{n-1} y_i - \theta\Big)^2 + \sum_{i=1}^{n-1} (y_i - \theta)^2 - (x - n\theta)^2/n\Big]\Big\}$$

where K does not depend on θ. Expanding the squares in the preceding, and treating everything that does not depend on θ as a constant, shows that

$$f_{Y_1,\ldots,Y_{n-1} \mid X}(y_1, \ldots, y_{n-1} \mid x) = K' \exp\Big\{-\frac{1}{2v^2}\Big[-2\theta\Big(x - \sum_{i=1}^{n-1} y_i\Big) + \theta^2 - 2\theta \sum_{i=1}^{n-1} y_i + (n-1)\theta^2 + 2\theta x - n\theta^2\Big]\Big\} = K'$$

where K′ = K′(v, y_1, ..., y_{n−1}, x) is a function that does not depend on θ. Thus the result is proven. ∎

Remark Suppose that the distribution of random variables Y_1, ..., Y_n depends on some parameter θ. Further, suppose that there is some function D(Y_1, ..., Y_n) of Y_1, ..., Y_n such that the conditional distribution of Y_1, ..., Y_n given the value of D(Y_1, ..., Y_n) does not depend on θ. Then it is said in statistical theory that D(Y_1, ..., Y_n) is a sufficient statistic for θ. For suppose we wanted to use the data Y_1, ..., Y_n to estimate the value of θ. Because, given the value of D(Y_1, ..., Y_n), the conditional distribution of the data Y_1, ..., Y_n does not depend on θ, it follows that if the value of D(Y_1, ..., Y_n) is known then no additional information about θ can be obtained from knowing all the data values Y_1, ..., Y_n. Thus our preceding lemma proves that the sum of the data values of independent and identically distributed normal random variables is a sufficient statistic for their mean. (Because knowing the sum is equivalent to knowing the value of $\sum_{i=1}^{n} Y_i/n$, called the sample mean, the common terminology in statistics is that the sample mean is a sufficient statistic for the mean of a normal population.) ∎

Theorem 10.2 Let X(t), t ≥ 0 be a Brownian motion process with drift coefficient μ and variance parameter σ². Given that X(t) = x, the conditional distribution of X(y), 0 ≤ y ≤ t, is the same for all values of μ.

Proof. Fix n and set t_i = it/n, i = 1, ..., n. To prove the theorem we will show for any n that the conditional distribution of X(t_1), ..., X(t_n) given the value of X(t) does


not depend on μ. To do so, let Y_1 = X(t_1), Y_i = X(t_i) − X(t_{i−1}), i = 2, ..., n, and note that Y_1, ..., Y_n are independent and identically distributed normal random variables with mean θ = μt/n. Because $\sum_{i=1}^{n} Y_i = X(t)$, it follows from Lemma 10.1 that the conditional distribution of Y_1, ..., Y_n given X(t) does not depend on μ. Because knowing Y_1, ..., Y_n is equivalent to knowing X(t_1), ..., X(t_n), the result follows. ∎

We now derive the conditional distribution of M(t) given the value of X(t).

Theorem 10.3 For y > x,

$$P(M(t) \ge y \mid X(t) = x) = e^{-2y(y-x)/t\sigma^2}, \quad y \ge 0$$

Proof. Because X(0) = 0 it follows that M(t) ≥ 0, and so the result is true when y = 0 (since both sides are equal to 1 in this case). So suppose that y > 0. Because it follows from Theorem 10.2 that P(M(t) ≥ y | X(t) = x) does not depend on the value of μ, let us suppose that μ = 0. Now, let T_y denote the first time that the Brownian motion reaches the value y, and note that it follows from the continuity property of Brownian motion that the event that M(t) ≥ y is equivalent to the event that T_y ≤ t. This is true because before the process can exceed the positive value y it must, by continuity, first pass through that value. Now, let h be a small positive number for which y > x + h. Then

$$P(M(t) \ge y,\, x \le X(t) \le x + h) = P(T_y \le t,\, x \le X(t) \le x + h) = P(x \le X(t) \le x + h \mid T_y \le t)\, P(T_y \le t)$$

Now, given T_y ≤ t, the event x ≤ X(t) ≤ x + h will occur if, after hitting y, the process decreases by an amount between y − x − h and y − x in the time between T_y and t. But because μ = 0, in any period of time the process is just as likely to increase as it is to decrease by an amount between y − x − h and y − x. Consequently,

$$P(x \le X(t) \le x + h \mid T_y \le t) = P(2y - x - h \le X(t) \le 2y - x \mid T_y \le t)$$

which gives

$$P(M(t) \ge y,\, x \le X(t) \le x + h) = P(2y - x - h \le X(t) \le 2y - x \mid T_y \le t)\, P(T_y \le t)$$
$$= P(2y - x - h \le X(t) \le 2y - x,\, T_y \le t)$$
$$= P(2y - x - h \le X(t) \le 2y - x)$$

where the final equation follows because the assumption y > x + h implies that 2y − x − h > y and so, by the continuity of Brownian motion, if 2y − x − h ≤ X(t) then T_y ≤ t. Hence,

$$P(M(t) \ge y \mid x \le X(t) \le x + h) = \frac{P(2y - x - h \le X(t) \le 2y - x)}{P(x \le X(t) \le x + h)} = \frac{f_{X(t)}(2y - x)\, h + o(h)}{f_{X(t)}(x)\, h + o(h)} = \frac{f_{X(t)}(2y - x) + o(h)/h}{f_{X(t)}(x) + o(h)/h}$$


where f_{X(t)}, the density function of X(t), is the density function of a normal random variable with mean 0 and variance tσ². Letting h → 0 in the preceding gives

$$P(M(t) \ge y \mid X(t) = x) = \frac{f_{X(t)}(2y - x)}{f_{X(t)}(x)} = \frac{e^{-(2y-x)^2/2t\sigma^2}}{e^{-x^2/2t\sigma^2}} = e^{-2y(y-x)/t\sigma^2} \qquad \blacksquare$$

With Z being a standard normal random variable, and Φ its distribution function, let

$$\bar{\Phi}(x) = 1 - \Phi(x) = P(Z > x)$$

We now have

Corollary 10.1

$$P(M(t) \ge y) = e^{2y\mu/\sigma^2}\, \bar{\Phi}\left(\frac{\mu t + y}{\sigma\sqrt{t}}\right) + \bar{\Phi}\left(\frac{y - \mu t}{\sigma\sqrt{t}}\right)$$

Proof. Conditioning on X(t) and using Theorem 10.3 yields

$$P(M(t) \ge y) = \int_{-\infty}^{\infty} P(M(t) \ge y \mid X(t) = x)\, f_{X(t)}(x)\, dx$$

$$= \int_{-\infty}^{y} P(M(t) \ge y \mid X(t) = x)\, f_{X(t)}(x)\, dx + \int_{y}^{\infty} f_{X(t)}(x)\, dx$$

$$= \int_{-\infty}^{y} e^{-2y(y-x)/t\sigma^2}\, \frac{1}{\sqrt{2\pi t}\,\sigma}\, e^{-(x - \mu t)^2/2t\sigma^2}\, dx + P(X(t) > y)$$

$$= \frac{1}{\sqrt{2\pi t}\,\sigma}\, e^{-2y^2/t\sigma^2}\, e^{-\mu^2 t^2/2t\sigma^2} \int_{-\infty}^{y} \exp\Big\{-\frac{1}{2t\sigma^2}\big(x^2 - 2\mu t x - 4yx\big)\Big\}\, dx + P(X(t) > y)$$

$$= \frac{1}{\sqrt{2\pi t}\,\sigma}\, e^{-(4y^2 + \mu^2 t^2)/2t\sigma^2} \int_{-\infty}^{y} \exp\Big\{-\frac{1}{2t\sigma^2}\big(x^2 - 2x(\mu t + 2y)\big)\Big\}\, dx + P(X(t) > y)$$

Now,

$$x^2 - 2x(\mu t + 2y) = \big(x - (\mu t + 2y)\big)^2 - (\mu t + 2y)^2$$

giving that

$$P(M(t) \ge y) = e^{-(4y^2 + \mu^2 t^2 - (\mu t + 2y)^2)/2t\sigma^2}\, \frac{1}{\sqrt{2\pi t}\,\sigma} \int_{-\infty}^{y} e^{-(x - \mu t - 2y)^2/2t\sigma^2}\, dx + P(X(t) > y)$$


Making the change of variable

$$w = \frac{x - \mu t - 2y}{\sigma\sqrt{t}}, \qquad dx = \sigma\sqrt{t}\, dw$$

gives

$$P(M(t) \ge y) = e^{2y\mu/\sigma^2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\frac{-\mu t - y}{\sigma\sqrt{t}}} e^{-w^2/2}\, dw + P(X(t) > y)$$

$$= e^{2y\mu/\sigma^2}\, \Phi\left(\frac{-\mu t - y}{\sigma\sqrt{t}}\right) + P(X(t) > y)$$

$$= e^{2y\mu/\sigma^2}\, \bar{\Phi}\left(\frac{\mu t + y}{\sigma\sqrt{t}}\right) + \bar{\Phi}\left(\frac{y - \mu t}{\sigma\sqrt{t}}\right)$$

and the proof is complete. ∎

In the proof of Theorem 10.3 we let T_y denote the first time the Brownian motion is equal to y. In addition, as previously noted, the continuity of Brownian motion implies that, for y > 0, the process would have hit y by time t if and only if the maximum of the process by time t was at least y. Consequently, for y > 0,

$$T_y \le t \iff M(t) \ge y$$

which, using Corollary 10.1, gives

$$P(T_y \le t) = e^{2y\mu/\sigma^2}\, \bar{\Phi}\left(\frac{y + \mu t}{\sigma\sqrt{t}}\right) + \bar{\Phi}\left(\frac{y - \mu t}{\sigma\sqrt{t}}\right), \quad y > 0$$
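A Monte Carlo sketch can be used to sanity-check Corollary 10.1; the grid size and run counts below are arbitrary choices, and the discretized maximum slightly undershoots the true one:

```python
import random
from math import erf, exp, sqrt

def phi_bar(x):
    """1 - Phi(x) for a standard normal."""
    return 0.5 * (1.0 - erf(x / sqrt(2.0)))

mu, sigma, t, y = 1.0, 2.0, 1.0, 2.0
n, runs = 1000, 5000                   # grid steps per path, number of paths
random.seed(7)
dt = t / n
hits = 0
for _ in range(runs):
    x = m = 0.0
    for _ in range(n):
        x += mu * dt + sigma * sqrt(dt) * random.gauss(0.0, 1.0)
        m = max(m, x)                  # running maximum M(t)
    hits += m >= y
exact = exp(2 * y * mu / sigma ** 2) * phi_bar((mu * t + y) / (sigma * sqrt(t))) \
        + phi_bar((y - mu * t) / (sigma * sqrt(t)))
print(hits / runs, exact)              # the simulated value is a slight underestimate
```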

10.6 White Noise

Let {X(t), t ≥ 0} denote a standard Brownian motion process and let f be a function having a continuous derivative in the region [a, b]. The stochastic integral $\int_a^b f(t)\, dX(t)$ is defined as follows:

$$\int_a^b f(t)\, dX(t) \equiv \lim_{\substack{n \to \infty \\ \max(t_i - t_{i-1}) \to 0}} \sum_{i=1}^{n} f(t_{i-1})\big[X(t_i) - X(t_{i-1})\big] \tag{10.15}$$

where a = t_0 < t_1 < ⋯ < t_n = b is a partition of the region [a, b]. Using the identity (the integration by parts formula applied to sums)

$$\sum_{i=1}^{n} f(t_{i-1})\big[X(t_i) - X(t_{i-1})\big] = f(b)X(b) - f(a)X(a) - \sum_{i=1}^{n} X(t_i)\big[f(t_i) - f(t_{i-1})\big]$$


we see that

$$\int_a^b f(t)\, dX(t) = f(b)X(b) - f(a)X(a) - \int_a^b X(t)\, df(t) \tag{10.16}$$

Equation (10.16) is usually taken as the definition of $\int_a^b f(t)\, dX(t)$. By using the right side of Equation (10.16) we obtain, upon assuming the interchangeability of expectation and limit, that

$$E\left[\int_a^b f(t)\, dX(t)\right] = 0$$

Also,

$$\operatorname{Var}\left[\sum_{i=1}^{n} f(t_{i-1})\big[X(t_i) - X(t_{i-1})\big]\right] = \sum_{i=1}^{n} f^2(t_{i-1})\operatorname{Var}\big[X(t_i) - X(t_{i-1})\big] = \sum_{i=1}^{n} f^2(t_{i-1})(t_i - t_{i-1})$$

where the first equality follows from the independent increments of Brownian motion. Hence, we obtain from Equation (10.15), upon taking limits of the preceding, that

$$\operatorname{Var}\left[\int_a^b f(t)\, dX(t)\right] = \int_a^b f^2(t)\, dt$$

Remark The preceding gives operational meaning to the family of quantities {dX(t), 0 ≤ t < ∞} by viewing it as an operator that carries functions f into the values $\int_a^b f(t)\, dX(t)$. This is called a white noise transformation, or more loosely {dX(t), 0 ≤ t < ∞} is called white noise, since it can be imagined that a time-varying function f travels through a white noise medium to yield the output (at time b) $\int_a^b f(t)\, dX(t)$.
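As an illustration of Equation (10.15) and the variance formula just derived, the following sketch approximates ∫₀¹ t dX(t) by its defining sum; the sample mean should be near 0 and the sample variance near ∫₀¹ t² dt = 1/3:

```python
import random

random.seed(3)
n, runs = 1000, 5000
dt = 1.0 / n
vals = []
for _ in range(runs):
    integral, t = 0.0, 0.0
    for _ in range(n):
        dX = random.gauss(0.0, dt ** 0.5)   # increment X(t_i) - X(t_{i-1})
        integral += t * dX                   # f(t_{i-1}) = t_{i-1}
        t += dt
    vals.append(integral)
mean = sum(vals) / runs
var = sum((v - mean) ** 2 for v in vals) / (runs - 1)
print(mean, var)   # approximately 0 and 1/3
```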

Example 10.4 Consider a particle of unit mass that is suspended in a liquid, and suppose that, due to the liquid, there is a viscous force that retards the velocity of the particle at a rate proportional to its present velocity. In addition, let us suppose that the velocity instantaneously changes according to a constant multiple of white noise. That is, if V(t) denotes the particle's velocity at t, suppose that

$$V'(t) = -\beta V(t) + \alpha X'(t)$$

where {X(t), t ≥ 0} is standard Brownian motion. This can be written as follows:

$$e^{\beta t}\big[V'(t) + \beta V(t)\big] = \alpha e^{\beta t} X'(t)$$

or

$$\frac{d}{dt}\big[e^{\beta t} V(t)\big] = \alpha e^{\beta t} X'(t)$$


Hence, upon integration, we obtain

$$e^{\beta t} V(t) = V(0) + \alpha \int_0^t e^{\beta s} X'(s)\, ds$$

or

$$V(t) = V(0)e^{-\beta t} + \alpha \int_0^t e^{-\beta(t-s)}\, dX(s)$$

Hence, from Equation (10.16),

$$V(t) = V(0)e^{-\beta t} + \alpha\left[X(t) - \int_0^t X(s)\, \beta e^{-\beta(t-s)}\, ds\right] \qquad \blacksquare$$
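A crude Euler discretization (step size and parameters are arbitrary choices of ours) gives a way to simulate the velocity process of Example 10.4:

```python
import random

random.seed(5)
beta, alpha, v0 = 1.0, 0.5, 2.0
dt, steps = 0.001, 5000                # simulate up to time t = 5
v = v0
for _ in range(steps):
    # V(t + dt) ~= V(t) - beta*V(t)*dt + alpha*(X(t + dt) - X(t))
    v += -beta * v * dt + alpha * random.gauss(0.0, dt ** 0.5)
print(v)   # one sample of V(5); E[V(5)] = v0 * exp(-5 * beta), about 0.013
```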

10.7 Gaussian Processes

We start with the following definition.

Definition 10.2 A stochastic process X(t), t ≥ 0 is called a Gaussian, or a normal, process if X(t_1), ..., X(t_n) has a multivariate normal distribution for all t_1, ..., t_n.

If {X(t), t ≥ 0} is a Brownian motion process, then because each of X(t_1), X(t_2), ..., X(t_n) can be expressed as a linear combination of the independent normal random variables X(t_1), X(t_2) − X(t_1), X(t_3) − X(t_2), ..., X(t_n) − X(t_{n−1}), it follows that Brownian motion is a Gaussian process.

Because a multivariate normal distribution is completely determined by the marginal mean values and the covariance values (see Section 2.6), it follows that standard Brownian motion could also be defined as a Gaussian process having E[X(t)] = 0 and, for s ≤ t,

$$\operatorname{Cov}(X(s), X(t)) = \operatorname{Cov}(X(s),\, X(s) + X(t) - X(s))$$
$$= \operatorname{Cov}(X(s), X(s)) + \operatorname{Cov}(X(s),\, X(t) - X(s))$$
$$= \operatorname{Cov}(X(s), X(s)) \quad \text{by independent increments}$$
$$= s \quad \text{since } \operatorname{Var}(X(s)) = s \tag{10.17}$$

Let {X(t), t ≥ 0} be a standard Brownian motion process and consider the process values between 0 and 1 conditional on X(1) = 0. That is, consider the conditional stochastic process {X(t), 0 ≤ t ≤ 1 | X(1) = 0}. Since the conditional distribution of X(t_1), ..., X(t_n) is multivariate normal, it follows that this conditional process, known as the Brownian bridge (as it is tied down both at 0 and at 1), is a Gaussian process. Let us compute its covariance function. As, from Equation (10.4),

$$E[X(s) \mid X(1) = 0] = 0, \quad \text{for } s < 1$$


we have that, for s < t < 1,

$$\operatorname{Cov}[(X(s), X(t)) \mid X(1) = 0] = E[X(s)X(t) \mid X(1) = 0]$$
$$= E\big[E[X(s)X(t) \mid X(t), X(1) = 0] \mid X(1) = 0\big]$$
$$= E\big[X(t)\, E[X(s) \mid X(t)] \mid X(1) = 0\big]$$
$$= E\Big[X(t)\, \frac{s}{t}X(t)\, \Big|\, X(1) = 0\Big] \quad \text{by (10.4)}$$
$$= \frac{s}{t}\, E[X^2(t) \mid X(1) = 0]$$
$$= \frac{s}{t}\, t(1 - t) \quad \text{by (10.4)}$$
$$= s(1 - t)$$

Thus, the Brownian bridge can be defined as a Gaussian process with mean value 0 and covariance function s(1 − t), s ≤ t. This leads to an alternative approach to obtaining such a process.

Proposition 10.1 If {X(t), t ≥ 0} is standard Brownian motion, then {Z(t), 0 ≤ t ≤ 1} is a Brownian bridge process when Z(t) = X(t) − tX(1).

Proof. As it is immediate that {Z(t), t ≥ 0} is a Gaussian process, all we need verify is that E[Z(t)] = 0 and Cov(Z(s), Z(t)) = s(1 − t), when s ≤ t. The former is immediate and the latter follows from

$$\operatorname{Cov}(Z(s), Z(t)) = \operatorname{Cov}(X(s) - sX(1),\, X(t) - tX(1))$$
$$= \operatorname{Cov}(X(s), X(t)) - t\operatorname{Cov}(X(s), X(1)) - s\operatorname{Cov}(X(1), X(t)) + st\operatorname{Cov}(X(1), X(1))$$
$$= s - st - st + st$$
$$= s(1 - t)$$

and the proof is complete. ∎

If {X(t), t ≥ 0} is Brownian motion, then the process {Z(t), t ≥ 0} defined by

$$Z(t) = \int_0^t X(s)\, ds \tag{10.18}$$

is called integrated Brownian motion. As an illustration of how such a process may arise in practice, suppose we are interested in modeling the price of a commodity throughout time. Letting Z (t) denote the price at t then, rather than assuming that {Z (t)} is Brownian motion (or that log Z (t) is Brownian motion), we might want to assume that the rate of change of Z (t) follows a Brownian motion. For instance, we might suppose that the rate of change of the commodity’s price is the current inflation


rate, which is imagined to vary as Brownian motion. Hence,

$$\frac{d}{dt}Z(t) = X(t), \qquad Z(t) = Z(0) + \int_0^t X(s)\, ds$$

It follows from the fact that Brownian motion is a Gaussian process that {Z(t), t ≥ 0} is also Gaussian. To prove this, first recall that W_1, ..., W_n is said to have a multivariate normal distribution if they can be represented as

$$W_i = \sum_{j=1}^{m} a_{ij} U_j, \quad i = 1, \ldots, n$$

where U_j, j = 1, ..., m, are independent normal random variables. From this it follows that any set of partial sums of W_1, ..., W_n is also jointly normal. The fact that Z(t_1), ..., Z(t_n) is multivariate normal can now be shown by writing the integral in Equation (10.18) as a limit of approximating sums.

As {Z(t), t ≥ 0} is Gaussian, it follows that its distribution is characterized by its mean value and covariance function. We now compute these when {X(t), t ≥ 0} is standard Brownian motion:

$$E[Z(t)] = E\left[\int_0^t X(s)\, ds\right] = \int_0^t E[X(s)]\, ds = 0$$

For s ≤ t,

$$\operatorname{Cov}[Z(s), Z(t)] = E[Z(s)Z(t)] = E\left[\int_0^s X(y)\, dy \int_0^t X(u)\, du\right] = \int_0^s \int_0^t E[X(y)X(u)]\, dy\, du$$

$$= \int_0^s \int_0^t \min(y, u)\, dy\, du \quad \text{by (10.17)}$$

$$= \int_0^s \left(\int_0^u y\, dy + \int_u^t u\, dy\right) du = s^2\left(\frac{t}{2} - \frac{s}{6}\right)$$
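A short simulation (grid and run counts arbitrary) can be used to check this covariance formula for integrated Brownian motion:

```python
import random

random.seed(11)
s, t, n, runs = 1.0, 2.0, 400, 10000   # evaluate Z at s and t on an n-step grid
dt = t / n
pairs = []
for _ in range(runs):
    x = z = zs = 0.0
    for i in range(1, n + 1):
        x += random.gauss(0.0, dt ** 0.5)    # standard Brownian motion X
        z += x * dt                           # Z accumulates the integral of X
        if i == int(s / dt):
            zs = z                            # Z(s)
    pairs.append((zs, z))                     # the pair (Z(s), Z(t))
ms = sum(a for a, _ in pairs) / runs
mt = sum(b for _, b in pairs) / runs
cov = sum((a - ms) * (b - mt) for a, b in pairs) / (runs - 1)
print(cov, s * s * (t / 2 - s / 6))           # both approximately 0.833
```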

10.8 Stationary and Weakly Stationary Processes

A stochastic process {X(t), t ≥ 0} is said to be a stationary process if for all n, s, t_1, ..., t_n the random vectors X(t_1), ..., X(t_n) and X(t_1 + s), ..., X(t_n + s) have the same joint distribution. In other words, a process is stationary if, in choosing any fixed point s as the origin, the ensuing process has the same probability law. Two examples of stationary processes are:

(i) An ergodic continuous-time Markov chain {X(t), t ≥ 0} when

$$P\{X(0) = j\} = P_j, \quad j \ge 0$$

where {P_j, j ≥ 0} are the limiting probabilities.

(ii) {X(t), t ≥ 0} when X(t) = N(t + L) − N(t), t ≥ 0, where L > 0 is a fixed constant and {N(t), t ≥ 0} is a Poisson process having rate λ.

The first one of these processes is stationary because it is a Markov chain whose initial state is chosen according to the limiting probabilities, and it can thus be regarded as an ergodic Markov chain that we start observing at time ∞. Hence, the continuation of this process at time s after observation begins is just the continuation of the chain starting at time ∞ + s, which clearly has the same probability law for all s. That the second example—where X(t) represents the number of events of a Poisson process that occur between t and t + L—is stationary follows from the stationary and independent increment assumption of the Poisson process, which implies that the continuation of a Poisson process at any time s remains a Poisson process.

Example 10.5 (The Random Telegraph Signal Process) Let {N(t), t ≥ 0} denote a Poisson process, and let X_0 be independent of this process and be such that P{X_0 = 1} = P{X_0 = −1} = 1/2. Defining X(t) = X_0(−1)^{N(t)}, then {X(t), t ≥ 0} is called a random telegraph signal process. To see that it is stationary, note first that starting at any time t, no matter what the value of N(t), as X_0 is equally likely to be either plus or minus 1, it follows that X(t) is equally likely to be either plus or minus 1. Hence, because the continuation of a Poisson process beyond any time remains a Poisson process, it follows that {X(t), t ≥ 0} is a stationary process.

Let us compute the mean and covariance function of the random telegraph signal:

$$E[X(t)] = E[X_0(-1)^{N(t)}] = E[X_0]\, E[(-1)^{N(t)}] \quad \text{by independence}$$
$$= 0 \quad \text{since } E[X_0] = 0$$

and

$$\operatorname{Cov}[X(t), X(t+s)] = E[X(t)X(t+s)]$$
$$= E[X_0^2\, (-1)^{N(t) + N(t+s)}]$$
$$= E[(-1)^{2N(t)}\, (-1)^{N(t+s) - N(t)}]$$
$$= E[(-1)^{N(t+s) - N(t)}]$$
$$= E[(-1)^{N(s)}]$$


$$= \sum_{i=0}^{\infty} (-1)^i e^{-\lambda s}\, \frac{(\lambda s)^i}{i!} = e^{-2\lambda s} \tag{10.19}$$

For an application of the random telegraph signal, consider a particle moving at a constant unit velocity along a straight line, and suppose that collisions involving this particle occur at a Poisson rate λ. Also suppose that each time the particle suffers a collision it reverses direction. Therefore, if X_0 represents the initial velocity of the particle, then its velocity at time t—call it X(t)—is given by X(t) = X_0(−1)^{N(t)}, where N(t) denotes the number of collisions involving the particle by time t. Hence, if X_0 is equally likely to be plus or minus 1, and is independent of {N(t), t ≥ 0}, then {X(t), t ≥ 0} is a random telegraph signal process. If we now let

$$D(t) = \int_0^t X(s)\, ds$$

then D(t) represents the displacement of the particle at time t from its position at time 0. The mean and variance of D(t) are obtained as follows:

$$E[D(t)] = \int_0^t E[X(s)]\, ds = 0,$$

$$\operatorname{Var}[D(t)] = E[D^2(t)] = E\left[\int_0^t X(y)\, dy \int_0^t X(u)\, du\right] = \int_0^t \int_0^t E[X(y)X(u)]\, dy\, du = 2\iint_{0 < y < u < t} E[X(y)X(u)]\, dy\, du$$
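Equation (10.19) is easy to check by simulation; since the standard library has no Poisson sampler, the sketch below builds one from exponential interarrival times:

```python
import random
from math import exp

def poisson(mean):
    """Sample a Poisson variable by counting exponential interarrivals in [0, mean]."""
    count, total = 0, random.expovariate(1.0)
    while total < mean:
        count += 1
        total += random.expovariate(1.0)
    return count

random.seed(13)
lam, t, s, runs = 1.0, 2.0, 0.5, 100000
acc = 0.0
for _ in range(runs):
    x0 = random.choice((-1, 1))
    xt = x0 * (-1) ** poisson(lam * t)          # X(t)
    xts = xt * (-1) ** poisson(lam * s)         # X(t + s), via an independent increment
    acc += xt * xts
print(acc / runs, exp(-2 * lam * s))            # both approximately 0.368
```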

Exercises

8. Consider the random walk that in each Δt time unit either goes up or down the amount √Δt with respective probabilities p and 1 − p, where p = ½(1 + μ√Δt).
(a) Argue that as Δt → 0 the resulting limiting process is a Brownian motion process with drift rate μ.
(b) Using part (a) and the results of the gambler's ruin problem (Section 4.5.1), compute the probability that a Brownian motion process with drift rate μ goes up A before going down B, A > 0, B > 0.

9. Let {X(t), t ≥ 0} be a Brownian motion process with drift coefficient μ and variance parameter σ². What is the joint density function of X(s) and X(t), s < t?


*10. Let {X(t), t ≥ 0} be a Brownian motion process with drift coefficient μ and variance parameter σ². What is the conditional distribution of X(t) given that X(s) = c when (a) s < t? (b) t < s?

11. Consider a process whose value changes every h time units: its new value is its old value multiplied either by the factor e^{σ√h} with probability p = ½(1 + (μ/σ)√h), or by the factor e^{−σ√h} with probability 1 − p. As h goes to zero, show that this process converges to geometric Brownian motion with drift coefficient μ and variance parameter σ².

12. A stock is presently selling at a price of $50 per share. After one time period, its selling price will (in present value dollars) be either $150 or $25. An option to purchase y units of the stock at time 1 can be purchased at cost cy.
(a) What should c be in order for there to be no sure win?
(b) If c = 4, explain how you could guarantee a sure win.
(c) If c = 10, explain how you could guarantee a sure win.
(d) Use the arbitrage theorem to verify your answer to part (a).

13. Verify the statement made in the remark following Example 10.2.

14. The present price of a stock is 100. The price at time 1 will be either 50, 100, or 200. An option to purchase y shares of the stock at time 1 for the (present value) price ky costs cy.
(a) If k = 120, show that an arbitrage opportunity occurs if and only if c > 80/3.
(b) If k = 80, show that there is not an arbitrage opportunity if and only if 20 ≤ c ≤ 40.

15. The current price of a stock is 100. Suppose that the logarithm of the price of the stock changes according to a Brownian motion process with drift coefficient μ = 2 and variance parameter σ² = 1. Give the Black-Scholes cost of an option to buy the stock at time 10 for a cost of
(a) 100 per unit.
(b) 120 per unit.
(c) 80 per unit.
Assume that the continuously compounded interest rate is 5 percent.

A stochastic process {Y(t), t ≥ 0} is said to be a Martingale process if, for s < t,

$$E[Y(t) \mid Y(u),\, 0 \le u \le s] = Y(s)$$

16. If {Y(t), t ≥ 0} is a Martingale, show that E[Y(t)] = E[Y(0)].

17. Show that standard Brownian motion is a Martingale.


18. Show that {Y(t), t ≥ 0} is a Martingale when

Y(t) = B²(t) − t

What is E[Y(t)]? Hint: First compute E[Y(t) | B(u), 0 ≤ u ≤ s].

*19. Show that {Y(t), t ≥ 0} is a Martingale when

Y(t) = exp{cB(t) − c²t/2}

where c is an arbitrary constant. What is E[Y(t)]?

An important property of a Martingale is that if you continually observe the process and then stop at some time T, then, subject to some technical conditions (which will hold in the problems to be considered),

E[Y(T)] = E[Y(0)]

The time T usually depends on the values of the process and is known as a stopping time for the Martingale. This result, that the expected value of the stopped Martingale is equal to its fixed time expectation, is known as the Martingale stopping theorem.

*20. Let

T = Min{t: B(t) = 2 − 4t}

That is, T is the first time that standard Brownian motion hits the line 2 − 4t. Use the Martingale stopping theorem to find E[T].

21. Let {X(t), t ≥ 0} be Brownian motion with drift coefficient μ and variance parameter σ². That is,

X(t) = σB(t) + μt

Let μ > 0, and for a positive constant x let

T = Min{t: X(t) = x} = Min{t: B(t) = (x − μt)/σ}

That is, T is the first time the process {X(t), t ≥ 0} hits x. Use the Martingale stopping theorem to show that

E[T] = x/μ

22. Let X(t) = σB(t) + μt, and for given positive constants A and B, let p denote the probability that {X(t), t ≥ 0} hits A before it hits −B.


(a) Define the stopping time T to be the first time the process hits either A or −B. Use this stopping time and the Martingale defined in Exercise 19 to show that

E[exp{c(X(T) − μT)/σ − c²T/2}] = 1

(b) Let c = −2μ/σ, and show that

E[exp{−2μX(T)/σ²}] = 1

(c) Use part (b) and the definition of T to find p. Hint: What are the possible values of exp{−2μX(T)/σ²}?

23. Let X(t) = σB(t) + μt, and define T to be the first time the process {X(t), t ≥ 0} hits either A or −B, where A and B are given positive numbers. Use the Martingale stopping theorem and part (c) of Exercise 22 to find E[T].

*24. Let {X(t), t ≥ 0} be Brownian motion with drift coefficient μ and variance parameter σ². Suppose that μ > 0. Let x > 0 and define the stopping time T (as in Exercise 21) by

T = Min{t: X(t) = x}

Use the Martingale defined in Exercise 18, along with the result of Exercise 21, to show that

Var(T) = xσ²/μ³

In Exercises 25 to 27, {X(t), t ≥ 0} is a Brownian motion process with drift parameter μ and variance parameter σ².

25. Suppose every Δ time units a process either increases by the amount σ√Δ with probability p or decreases by the amount σ√Δ with probability 1 − p, where

p = ½(1 + (μ/σ)√Δ)

Show that as Δ goes to 0, this process converges to a Brownian motion process with drift parameter μ and variance parameter σ².

26. Let T_y be the first time that the process is equal to y. For y > 0, show that

P(T_y < ∞) = 1 if μ ≥ 0, and P(T_y < ∞) = e^{2yμ/σ²} if μ < 0

Let M = max_{0≤t<∞} X(t) […]

[…] t > 0 and Y(0) = 0.
(a) What is the distribution of Y(t)?
(b) Compute Cov(Y(s), Y(t)).
(c) Argue that {Y(t), t ≥ 0} is a standard Brownian motion process.

30. Let Y(t) = B(a²t)/a for a > 0. Argue that {Y(t)} is a standard Brownian motion process.

31. For s < t, argue that B(s) − (s/t)B(t) and B(t) are independent.

32. Let {Z(t), t ≥ 0} denote a Brownian bridge process. Show that if

Y(t) = (t + 1)Z(t/(t + 1))

then {Y(t), t ≥ 0} is a standard Brownian motion process.

33. Let X(t) = N(t + 1) − N(t) where {N(t), t ≥ 0} is a Poisson process with rate λ. Compute Cov[X(t), X(t + s)].

34. Let {N(t), t ≥ 0} denote a Poisson process with rate λ and define Y(t) to be the time from t until the next Poisson event.
(a) Argue that {Y(t), t ≥ 0} is a stationary process.
(b) Compute Cov[Y(t), Y(t + s)].

35. Let {X(t), −∞ < t < ∞} be a weakly stationary process having covariance function R_X(s) = Cov[X(t), X(t + s)].
(a) Show that

Var(X(t + s) − X(t)) = 2R_X(0) − 2R_X(s)

(b) If Y(t) = X(t + 1) − X(t), show that {Y(t), −∞ < t < ∞} is also weakly stationary, having a covariance function R_Y(s) = Cov[Y(t), Y(t + s)] that satisfies

R_Y(s) = 2R_X(s) − R_X(s − 1) − R_X(s + 1)

36. Let Y_1 and Y_2 be independent unit normal random variables and for some constant w set

X(t) = Y_1 cos wt + Y_2 sin wt, −∞ < t < ∞

(a) Show that {X(t)} is a weakly stationary process.
(b) Argue that {X(t)} is a stationary process.


37. Let {X(t), −∞ < t < ∞} be weakly stationary with covariance function R(s) = Cov(X(t), X(t + s)) and let R̃(w) denote the power spectral density of the process.

(i) Show that R̃(w) = R̃(−w). It can be shown that

$$R(s) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \tilde{R}(w)\, e^{iws}\, dw$$

(ii) Use the preceding to show that

$$\int_{-\infty}^{\infty} \tilde{R}(w)\, dw = 2\pi E[X^2(t)]$$

References

[1] M. S. Bartlett, "An Introduction to Stochastic Processes," Cambridge University Press, London, 1954.
[2] U. Grenander and M. Rosenblatt, "Statistical Analysis of Stationary Time Series," John Wiley, New York, 1957.
[3] D. Heath and W. Sudderth, "On a Theorem of De Finetti, Oddsmaking, and Game Theory," Ann. Math. Stat. 43, 2072–2077 (1972).
[4] S. Karlin and H. Taylor, "A Second Course in Stochastic Processes," Academic Press, Orlando, FL, 1981.
[5] L. H. Koopmans, "The Spectral Analysis of Time Series," Academic Press, Orlando, FL, 1974.
[6] S. Ross, "Stochastic Processes," Second Edition, John Wiley, New York, 1996.

Simulation

11.1 Introduction

Let X = (X_1, ..., X_n) denote a random vector having a given density function f(x_1, ..., x_n), and suppose we are interested in computing

$$E[g(\mathbf{X})] = \int \int \cdots \int g(x_1, \ldots, x_n)\, f(x_1, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n$$

for some n-dimensional function g. For instance, g could represent the total delay in queue of the first [n/2] customers when the X values represent the first [n/2] interarrival and service times.* In many situations, it is not analytically possible either to compute the preceding multiple integral exactly or even to numerically approximate it within a given accuracy. One possibility that remains is to approximate E[g(X)] by means of simulation.

To approximate E[g(X)], start by generating a random vector X^{(1)} = (X_1^{(1)}, ..., X_n^{(1)}) having the joint density f(x_1, ..., x_n) and then compute Y^{(1)} = g(X^{(1)}). Now generate a second random vector (independent of the first), X^{(2)}, and compute Y^{(2)} = g(X^{(2)}). Keep on doing this until r, a fixed number, of independent and identically distributed random variables Y^{(i)} = g(X^{(i)}), i = 1, ..., r, have been generated. Now by the strong law of large numbers, we know that

$$\lim_{r \to \infty} \frac{Y^{(1)} + \cdots + Y^{(r)}}{r} = E[Y^{(i)}] = E[g(\mathbf{X})]$$

*We are using the notation [a] to represent the largest integer less than or equal to a.



and so we can use the average of the generated Ys as an estimate of E[g(X)]. This approach to estimating E[g(X)] is called the Monte Carlo simulation approach.

Clearly there remains the problem of how to generate, or simulate, random vectors having a specified joint distribution. The first step in doing this is to be able to generate random variables from a uniform distribution on (0, 1). One way to do this would be to take 10 identical slips of paper, numbered 0, 1, ..., 9, place them in a hat, and then successively select n slips, with replacement, from the hat. The sequence of digits obtained (with a decimal point in front) can be regarded as the value of a uniform (0, 1) random variable rounded off to the nearest (1/10)^n. For instance, if the sequence of digits selected is 3, 8, 7, 2, 1, then the value of the uniform (0, 1) random variable is 0.38721 (to the nearest 0.00001). Tables of the values of uniform (0, 1) random variables, known as random number tables, have been extensively published (for instance, see The RAND Corporation, A Million Random Digits with 100,000 Normal Deviates (New York: The Free Press, 1955)). Table 11.1 is such a table.

However, this is not the way in which digital computers simulate uniform (0, 1) random variables. In practice, they use pseudo random numbers instead of truly random ones. Most random number generators start with an initial value X_0, called the seed, and then recursively compute values by specifying positive integers a, c, and m, and then letting

$$X_{n+1} = (aX_n + c)\ \text{modulo}\ m, \quad n \ge 0$$

where the preceding means that aX_n + c is divided by m and the remainder is taken as the value of X_{n+1}. Thus each X_n is either 0, 1, ..., or m − 1, and the quantity X_n/m is taken as an approximation to a uniform (0, 1) random variable. It can be shown that, subject to suitable choices for a, c, m, the preceding gives rise to a sequence of numbers that looks as if it were generated from independent uniform (0, 1) random variables.
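Below is a sketch of such a congruential generator; the particular constants (a = 7⁵, c = 0, m = 2³¹ − 1, a classic choice) are illustrative and not prescribed by the text:

```python
def lcg(seed, a=16807, c=0, m=2 ** 31 - 1):
    """Yield X_{n+1} = (a*X_n + c) modulo m, scaled to (0, 1)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m                    # X_n / m approximates a uniform (0, 1) value

gen = lcg(seed=12345)
print([round(next(gen), 5) for _ in range(5)])
```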


Table 11.1 A Random Number Table

04839 68086 39064 25669 64117 87917 62797 95876 29888 73577 27958 90999 18845 94824 35605 33362 88720 39475 06990 40980 83974 33339 31662 93526 20492 04153 05520 47498 23167 23792 85900 42559 14349 17403 23632

96423 26432 66432 26422 94305 77341 56170 55293 88604 12908 30134 49127 49618 78171 81263 64270 82765 46473 67245 07391 29992 31926 25388 70765 38391 53381 91962 87637 49323 14422 98275 78985 82674 53363 27889

24878 46901 84673 44407 26766 42206 86324 18988 67917 30883 04024 20044 02304 84610 39667 01638 34476 23219 68350 58745 65381 14883 61642 10592 91132 79401 04739 99016 45021 15059 32388 05300 66523 44167 47914

82651 20848 40027 44048 25940 35126 88072 27354 48708 18317 86385 59931 51038 82834 47358 92477 17032 53416 82948 25774 38857 24413 34072 04542 21999 21438 13092 71060 33132 45799 52390 22164 44133 64486 02584

66566 89768 32832 37937 39972 74087 76222 26575 18912 28290 29880 06115 20655 09922 56873 66969 87589 94970 11398 22987 50490 59744 81249 76463 59516 83035 97662 88824 12544 22716 16815 24369 00697 64758 37680

14778 81536 61362 63904 22209 99547 36086 08625 82271 35797 99730 20542 58727 25417 56307 98420 40836 25832 42878 80059 83765 92351 35648 54328 81652 92350 24822 71013 41035 19792 69298 54224 35552 75366 20801

76797 86645 98947 45766 71500 81817 84637 40801 65424 05998 55536 18059 28168 44137 61607 04880 32427 69975 80287 39911 55657 97473 56891 02349 27195 36693 94730 18735 80780 09983 82732 35083 35970 76554 72152

14780 12659 96067 66134 64568 42607 93161 59920 69774 41688 84855 02008 15475 48413 49518 45585 70002 94884 88267 96189 14361 89286 69352 17247 48223 31238 06496 20286 45393 74353 38480 19687 19124 31601 39339

13300 92259 64760 75470 91402 43808 76038 29841 33611 34952 29080 73708 56942 25555 89356 46565 70663 19661 47363 41151 31720 35931 48373 28865 46751 59649 35090 23153 44812 68668 73817 11062 63318 12614 34806

87074 57102 64584 66520 42416 76655 65855 80150 54262 37888 09250 83517 53389 21246 20103 04102 88863 72828 46634 14222 57375 04110 45578 14777 22923 91754 04822 72924 12515 30429 32523 91491 29686 33072 08930

As our starting point in the simulation of random variables from an arbitrary distribution, we shall suppose that we can simulate from the uniform (0, 1) distribution, and we shall use the term "random numbers" to mean independent random variables from this distribution.

In Sections 11.2 and 11.3 we present both general and special techniques for simulating continuous random variables, and in Section 11.4 we do the same for discrete random variables. In Section 11.5 we discuss the simulation both of jointly distributed random variables and of stochastic processes. Particular attention is given to the simulation of nonhomogeneous Poisson processes; in fact, three different approaches for this are discussed. Simulation of two-dimensional Poisson processes is discussed in Section 11.5.2. In Section 11.6 we discuss various methods for increasing the precision of the simulation estimates by reducing their variance, and in Section 11.7 we consider the problem of choosing the number of simulation runs needed to attain a desired level of precision. Before beginning this program, however, let us consider two applications of simulation to combinatorial problems.

Example 11.1 (Generating a Random Permutation) Suppose we are interested in generating a permutation of the numbers 1, 2, ..., n such that all n! possible orderings are equally likely. The following algorithm will accomplish this by first choosing one of the numbers 1, ..., n at random and then putting that number in position n; it then chooses at random one of the remaining n − 1 numbers and puts that number in position n − 1; it then chooses at random one of the remaining n − 2 numbers and puts it in position n − 2, and so on (where choosing a number at random means that each of the remaining numbers is equally likely to be chosen). However, so that we do not have to consider exactly which of the numbers remain to be positioned, it is convenient and efficient to keep the numbers in an ordered list and then randomly choose the position of the number rather than the number itself. That is, starting with any initial ordering p_1, p_2, ..., p_n, we pick one of the positions 1, ..., n at random and then interchange the number in that position with the one in position n. Now we


randomly choose one of the positions 1, ..., n − 1 and interchange the number in this position with the one in position n − 1, and so on.

To implement the preceding, we need to be able to generate a random variable that is equally likely to take on any of the values 1, 2, ..., k. To accomplish this, let U denote a random number—that is, U is uniformly distributed over (0, 1)—and note that kU is uniform on (0, k), and so

$$P\{i - 1 < kU < i\} = \frac{1}{k}, \quad i = 1, \ldots, k$$

Hence, the random variable I = [kU] + 1 will be such that

$$P\{I = i\} = P\{[kU] = i - 1\} = P\{i - 1 < kU < i\} = \frac{1}{k}$$

The preceding algorithm for generating a random permutation can now be written as follows:

Step 1: Let p_1, p_2, ..., p_n be any permutation of 1, 2, ..., n (for instance, we can choose p_j = j, j = 1, ..., n).
Step 2: Set k = n.
Step 3: Generate a random number U and let I = [kU] + 1.
Step 4: Interchange the values of p_I and p_k.
Step 5: Let k = k − 1 and if k > 1 go to Step 3.
Step 6: p_1, ..., p_n is the desired random permutation.

For instance, suppose n = 4 and the initial permutation is 1, 2, 3, 4. If the first value of I (which is equally likely to be 1, 2, 3, or 4) is I = 3, then the new permutation is 1, 2, 4, 3. If the next value of I is I = 2, then the new permutation is 1, 4, 2, 3. If the final value of I is I = 2, then the final permutation is 1, 4, 2, 3, and this is the value of the random permutation.

One very important property of the preceding algorithm is that it can also be used to generate a random subset, say of size r, of the integers 1, ..., n. Namely, just follow the algorithm until the positions n, n − 1, ..., n − r + 1 are filled. The elements in these positions constitute the random subset. ∎
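The algorithm of Example 11.1 translates directly into code; the function name below is our own:

```python
import random

def random_permutation(n):
    """Steps 1-6 above: a uniformly random permutation of 1, ..., n."""
    p = list(range(1, n + 1))          # Step 1: any initial ordering
    k = n                              # Step 2
    while k > 1:
        U = random.random()            # Step 3: a random number
        I = int(k * U) + 1             # I = [kU] + 1 is uniform on 1, ..., k
        p[I - 1], p[k - 1] = p[k - 1], p[I - 1]   # Step 4: swap p_I and p_k
        k -= 1                         # Step 5: repeat until k = 1
    return p                           # Step 6

random.seed(2)
print(random_permutation(10))
# At any stage, the last r filled positions form a random subset of size r.
```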

Example 11.2 (Estimating the Number of Distinct Entries in a Large List) Consider a list of n entries, where n is very large, and suppose we are interested in estimating d, the number of distinct elements in the list. If we let m_i denote the number of times that the element in position i appears on the list, then we can express d by

$$d = \sum_{i=1}^{n} \frac{1}{m_i}$$

To estimate d, suppose that we generate a random value X equally likely to be 1, 2, ..., n (that is, we take X = [nU] + 1) and then let m(X) denote the number of times the element in position X appears on the list. Then

$$E\left[\frac{1}{m(X)}\right] = \sum_{i=1}^{n} \frac{1}{m_i}\, \frac{1}{n} = \frac{d}{n}$$


Hence, if we generate k such random variables X_1, ..., X_k we can estimate d by

$$d \approx \frac{n \sum_{i=1}^{k} 1/m(X_i)}{k}$$

Suppose now that each item in the list has a value attached to it—v(i) being the value of the ith element. The sum of the values of the distinct items—call it v—can be expressed as

$$v = \sum_{i=1}^{n} \frac{v(i)}{m(i)}$$

Now if X = [nU] + 1, where U is a random number, then

$$E\left[\frac{v(X)}{m(X)}\right] = \sum_{i=1}^{n} \frac{v(i)}{m(i)}\, \frac{1}{n} = \frac{v}{n}$$

Hence, we can estimate v by generating X_1, ..., X_k and then estimating v by

$$v \approx \frac{n}{k} \sum_{i=1}^{k} \frac{v(X_i)}{m(X_i)}$$

For an important application of the preceding, let $A_i = \{a_{i,1}, \ldots, a_{i,n_i}\}$, i = 1, ..., s, denote events, and suppose we are interested in estimating $P\big(\bigcup_{i=1}^{s} A_i\big)$. Since

$$P\left(\bigcup_{i=1}^{s} A_i\right) = \sum_{a \in \cup A_i} P(a) = \sum_{i=1}^{s} \sum_{j=1}^{n_i} \frac{P(a_{i,j})}{m(a_{i,j})}$$

where m(a_{i,j}) is the number of events to which the point a_{i,j} belongs, the preceding method can be used to estimate $P\big(\bigcup_{1}^{s} A_i\big)$.

Note that the preceding procedure for estimating v can be effected without prior knowledge of the set of values {v_1, ..., v_n}. That is, it suffices that we can determine the value of an element in a specific place and the number of times that element appears on the list. When the set of values is known a priori, there is another approach available, as will be shown in Example 11.11. ∎
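Here is a sketch of the estimator of Example 11.2 on synthetic data (the list and sample sizes are arbitrary choices of ours):

```python
import random

random.seed(17)
lst = [random.randint(1, 500) for _ in range(2000)]   # a stand-in "large list"
n = len(lst)

counts = {}
for e in lst:                          # m(e): number of appearances of e
    counts[e] = counts.get(e, 0) + 1

k = 400                                # number of sampled positions
total = 0.0
for _ in range(k):
    X = int(n * random.random())       # position [nU] (0-indexed here)
    total += 1.0 / counts[lst[X]]
print(n * total / k, len(set(lst)))    # the estimate vs. the true d
```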

11.2 General Techniques for Simulating Continuous Random Variables

In this section we present three methods for simulating continuous random variables.

11.2.1 The Inverse Transformation Method

A general method for simulating a random variable having a continuous distribution— called the inverse transformation method—is based on the following proposition.


Proposition 11.1 Let U be a uniform (0, 1) random variable. For any continuous distribution function F, if we define the random variable X by

$$X = F^{-1}(U)$$

then the random variable X has distribution function F. (Here F^{-1}(u) is defined to equal that value x for which F(x) = u.)

Proof.

$$F_X(a) = P\{X \le a\} = P\{F^{-1}(U) \le a\} \tag{11.1}$$

Now, since F(x) is a monotone function, it follows that F^{-1}(U) ≤ a if and only if U ≤ F(a). Hence, from Equation (11.1), we see that

$$F_X(a) = P\{U \le F(a)\} = F(a)$$

■

Hence, we can simulate a random variable X from the continuous distribution F, when F^{-1} is computable, by simulating a random number U and then setting X = F^{-1}(U).

Example 11.3 (Simulating an Exponential Random Variable) If F(x) = 1 − e^{−x}, then F^{-1}(u) is that value of x such that

$$1 - e^{-x} = u \quad \text{or} \quad x = -\log(1 - u)$$

Hence, if U is a uniform (0, 1) variable, then F^{-1}(U) = −log(1 − U) is exponentially distributed with mean 1. Since 1 − U is also uniformly distributed on (0, 1), it follows that −log U is exponential with mean 1. Since cX is exponential with mean c when X is exponential with mean 1, it follows that −c log U is exponential with mean c. ■
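In code, the inverse transform method is a one-liner once F^{-1} is available. A minimal Python sketch (function names are ours):

```python
import math
import random

def inverse_transform(F_inv):
    """Simulate X with distribution F by setting X = F^{-1}(U)."""
    return F_inv(random.random())

def exponential(mean):
    """Example 11.3: -mean * log(U) is exponential with the given mean."""
    return -mean * math.log(random.random())
```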

11.2.2 The Rejection Method

Suppose that we have a method for simulating a random variable having density function g(x). We can use this as the basis for simulating from the continuous distribution having density f(x) by simulating Y from g and then accepting this simulated value with a probability proportional to f(Y)/g(Y). Specifically, let c be a constant such that

$$\frac{f(y)}{g(y)} \le c \quad \text{for all } y$$


We then have the following technique for simulating a random variable having density f.

Rejection Method
Step 1: Simulate Y having density g and simulate a random number U.
Step 2: If U ≤ f(Y)/cg(Y) set X = Y. Otherwise return to step 1.
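Stated as code, the two steps become a short loop. Below is a minimal Python sketch, assuming the caller supplies the target density f, the proposal density, a sampler for the proposal, and the bound c; all names are ours.

```python
import random

def rejection_sample(f, g_density, g_sampler, c):
    """Rejection method sketch: sample from density f via proposals from g.
    Assumes f(y) <= c * g_density(y) for all y."""
    while True:
        Y = g_sampler()                        # step 1: simulate Y from g
        U = random.random()                    # and a random number U
        if U <= f(Y) / (c * g_density(Y)):     # step 2: accept with prob f(Y)/cg(Y)
            return Y
```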

Proposition 11.2 The random variable X generated by the rejection method has density function f.

Proof. Let X be the value obtained, and let N denote the number of necessary iterations. Then

$$\begin{aligned}
P\{X \le x\} &= P\{Y_N \le x\} \\
&= P\{Y \le x \mid U \le f(Y)/cg(Y)\} \\
&= \frac{P\{Y \le x,\; U \le f(Y)/cg(Y)\}}{K} \\
&= \frac{\int P\{Y \le x,\; U \le f(Y)/cg(Y) \mid Y = y\}\, g(y)\,dy}{K} \\
&= \frac{\int_{-\infty}^{x} \big(f(y)/cg(y)\big) g(y)\,dy}{K} \\
&= \frac{\int_{-\infty}^{x} f(y)\,dy}{Kc}
\end{aligned}$$

where K = P{U ≤ f(Y)/cg(Y)}. Letting x → ∞ shows that K = 1/c and the proof is complete. ■

Remarks
(i) The preceding method was originally presented by Von Neumann in the special case where g was positive only in some finite interval (a, b), and Y was chosen to be uniform over (a, b) (that is, Y = a + (b − a)U).
(ii) Note that the way in which we "accept the value Y with probability f(Y)/cg(Y)" is by generating a uniform (0, 1) random variable U and then accepting Y if U ≤ f(Y)/cg(Y).
(iii) Since each iteration of the method will, independently, result in an accepted value with probability P{U ≤ f(Y)/cg(Y)} = 1/c, it follows that the number of iterations is geometric with mean c.
(iv) Actually, it is not necessary to generate a new uniform random number when deciding whether or not to accept, since, at a cost of some additional computation, a single random number, suitably modified at each iteration, can be used throughout. To see how, note that the actual value of U is not used—only whether or not U < f(Y)/cg(Y). Hence, if Y is rejected—that is, if U > f(Y)/cg(Y)—we can use the fact that, given Y,

$$\frac{U - f(Y)/cg(Y)}{1 - f(Y)/cg(Y)} = \frac{cUg(Y) - f(Y)}{cg(Y) - f(Y)}$$


is uniform on (0, 1). Hence, this may be used as a uniform random number in the next iteration. As this saves the generation of a random number at the cost of the preceding computation, whether it is a net savings depends greatly upon the method being used to generate random numbers. ■

Example 11.4 Let us use the rejection method to generate a random variable having density function

$$f(x) = 20x(1 - x)^3, \quad 0 < x < 1$$

Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the rejection method with

$$g(x) = 1, \quad 0 < x < 1$$

To determine the constant c such that f(x)/g(x) ≤ c, we use calculus to determine the maximum value of

$$\frac{f(x)}{g(x)} = 20x(1 - x)^3$$

Differentiation of this quantity yields

$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = 20\left[(1 - x)^3 - 3x(1 - x)^2\right]$$

Setting this equal to 0 shows that the maximal value is attained when x = 1/4, and thus

$$\frac{f(x)}{g(x)} \le 20\left(\frac{1}{4}\right)\left(\frac{3}{4}\right)^3 = \frac{135}{64} \equiv c$$

Hence,

$$\frac{f(x)}{cg(x)} = \frac{256}{27}\,x(1 - x)^3$$

and thus the rejection procedure is as follows:

Step 1: Generate random numbers U1 and U2.
Step 2: If U2 ≤ (256/27)U1(1 − U1)³, stop and set X = U1. Otherwise return to step 1.

The average number of times that step 1 will be performed is c = 135/64. ■

Example 11.5 (Simulating a Normal Random Variable) To simulate a standard normal random variable Z (that is, one with mean 0 and variance 1), note first that the absolute value of Z has density function

$$f(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad 0 < x < \infty \tag{11.2}$$


We will start by simulating from the preceding density by using the rejection method with

$$g(x) = e^{-x}, \quad 0 < x < \infty$$

Now, note that

$$\frac{f(x)}{g(x)} = \sqrt{2e/\pi}\, \exp\{-(x - 1)^2/2\} \le \sqrt{2e/\pi}$$

Hence, using the rejection method we can simulate from Equation (11.2) as follows:

(a) Generate independent random variables Y and U, Y being exponential with rate 1 and U being uniform on (0, 1).
(b) If U ≤ exp{−(Y − 1)²/2}, or equivalently, if −log U ≥ (Y − 1)²/2, set X = Y. Otherwise return to step (a).

Once we have simulated a random variable X having density function (11.2), we can then generate a standard normal random variable Z by letting Z be equally likely to be either X or −X.

To improve upon the foregoing, note first that from Example 11.3 it follows that −log U will also be exponential with rate 1. Hence, steps (a) and (b) are equivalent to the following:

(a′) Generate independent exponentials with rate 1, Y1 and Y2.
(b′) Set X = Y1 if Y2 ≥ (Y1 − 1)²/2. Otherwise return to step (a′).

Now suppose that we accept step (b′). It then follows by the lack of memory property of the exponential that the amount by which Y2 exceeds (Y1 − 1)²/2 will also be exponential with rate 1. Hence, summing up, we have the following algorithm, which generates an exponential with rate 1 and an independent standard normal random variable:

Step 1: Generate Y1, an exponential random variable with rate 1.
Step 2: Generate Y2, an exponential with rate 1.
Step 3: If Y2 − (Y1 − 1)²/2 > 0, set Y = Y2 − (Y1 − 1)²/2 and go to step 4. Otherwise go to step 1.
Step 4: Generate a random number U and set

$$Z = \begin{cases} Y_1, & \text{if } U \le \tfrac{1}{2} \\ -Y_1, & \text{if } U > \tfrac{1}{2} \end{cases}$$

The random variables Z and Y generated by the preceding are independent, with Z being normal with mean 0 and variance 1 and Y being exponential with rate 1. (If we want the normal random variable to have mean µ and variance σ², just take µ + σZ.) ■


Remarks
(i) Since c = √(2e/π) ≈ 1.32, the preceding requires a geometrically distributed number of iterations of step 2 with mean 1.32.
(ii) The final random number of step 4 need not be separately simulated but rather can be obtained from the first digit of any random number used earlier. That is, suppose we generate a random number to simulate an exponential; then we can strip off the initial digit of this random number and just use the remaining digits (with the decimal point moved one step to the right) as the random number. If this initial digit is 0, 1, 2, 3, or 4 (or 0 if the computer is generating binary digits), then we take the sign of Z to be positive and take it to be negative otherwise.
(iii) If we are generating a sequence of standard normal random variables, then we can use the exponential obtained in step 3 as the initial exponential needed in step 1 for the next normal to be generated. Hence, on the average, we can simulate a unit normal by generating 1.64 exponentials and computing 1.32 squares.
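A minimal Python sketch of the algorithm of Example 11.5 (the function name is ours; for simplicity it does not carry the leftover exponential across calls, as remark (iii) suggests doing):

```python
import math
import random

def normal_and_exponential():
    """Example 11.5 sketch: returns (Z, Y), an independent standard normal
    and an exponential with rate 1, via rejection from exponentials."""
    while True:
        Y1 = -math.log(random.random())        # exponential with rate 1
        Y2 = -math.log(random.random())        # exponential with rate 1
        excess = Y2 - (Y1 - 1) ** 2 / 2
        if excess > 0:                         # accept; the excess is again exponential
            Z = Y1 if random.random() <= 0.5 else -Y1
            return Z, excess
```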

11.2.3 The Hazard Rate Method

Let F be a continuous distribution function with F̄(0) = 1. Recall that λ(t), the hazard rate function of F, is defined by

$$\lambda(t) = \frac{f(t)}{\bar{F}(t)}, \quad t \ge 0$$

(where f(t) = F′(t) is the density function). Recall also that λ(t) represents the instantaneous probability intensity that an item having life distribution F will fail at time t given that it has survived to that time.

Suppose now that we are given a bounded function λ(t), such that ∫₀^∞ λ(t) dt = ∞, and we desire to simulate a random variable S having λ(t) as its hazard rate function. To do so, let λ be such that

λ(t) ≤ λ for all t ≥ 0

To simulate from λ(t), t ≥ 0, we will (a) simulate a Poisson process having rate λ, and then only "accept" or "count" certain of these Poisson events; specifically, we will (b) count an event that occurs at time t, independently of all else, with probability λ(t)/λ. We now have the following proposition.

Proposition 11.3 The time of the first counted event—call it S—is a random variable whose distribution has hazard rate function λ(t), t ≥ 0.


Proof.

$$\begin{aligned}
P\{t < S < t + dt \mid S > t\}
&= P\{\text{first counted event in } (t, t + dt) \mid \text{no counted events prior to } t\} \\
&= P\{\text{Poisson event in } (t, t + dt), \text{ it is counted} \mid \text{no counted events prior to } t\} \\
&= P\{\text{Poisson event in } (t, t + dt), \text{ it is counted}\} \\
&= [\lambda\, dt + o(dt)]\, \frac{\lambda(t)}{\lambda} = \lambda(t)\, dt + o(dt)
\end{aligned}$$

which completes the proof. Note that the next-to-last equality follows from the independent increment property of Poisson processes. ■

Because the interarrival times of a Poisson process having rate λ are exponential with rate λ, it thus follows from Example 11.3 and the previous proposition that the following algorithm will generate a random variable having hazard rate function λ(t), t ≥ 0.

Hazard Rate Method for Generating S with λ_S(t) = λ(t):

Let λ be such that λ(t) ≤ λ for all t ≥ 0. Generate pairs of random variables Ui, Xi, i ≥ 1, with Xi being exponential with rate λ and Ui being uniform (0, 1), stopping at

$$N = \min\left\{n : U_n \le \lambda\left(\sum_{i=1}^{n} X_i\right) \Big/ \lambda\right\}$$

Set

$$S = \sum_{i=1}^{N} X_i \qquad \blacksquare$$
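In code, the hazard rate method is the short loop below; a minimal Python sketch assuming λ(t) ≤ lam_bound for all t ≥ 0, with names ours. The running sum of the exponential interarrival times is kept in t.

```python
import math
import random

def hazard_rate_method(lam_fn, lam_bound):
    """Hazard rate method sketch: simulate S with hazard rate lam_fn(t),
    assuming lam_fn(t) <= lam_bound for all t >= 0."""
    t = 0.0
    while True:
        t += -math.log(random.random()) / lam_bound   # next event of the rate-lam_bound process
        if random.random() <= lam_fn(t) / lam_bound:  # count it with probability lam(t)/lam_bound
            return t
```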

To compute E[N] we need the result, known as Wald's equation, which states that if X1, X2, . . . are independent and identically distributed random variables that are observed in sequence up to some random time N, then

$$E\left[\sum_{i=1}^{N} X_i\right] = E[N]E[X]$$

More precisely, let X1, X2, . . . denote a sequence of independent random variables and consider the following definition.

Definition 11.1 An integer-valued random variable N is said to be a stopping time for the sequence X1, X2, . . . if the event {N = n} is independent of Xn+1, Xn+2, . . . for all n = 1, 2, . . . .

Intuitively, we observe the Xn s in sequential order and N denotes the number observed before stopping. If N = n, then we have stopped after observing X1, . . . , Xn and before observing Xn+1, Xn+2, . . . .


Example 11.6 Let Xn, n = 1, 2, . . . , be independent and such that

$$P\{X_n = 0\} = P\{X_n = 1\} = \tfrac{1}{2}, \quad n = 1, 2, \ldots$$

If we let N = min{n : X1 + · · · + Xn = 10}, then N is a stopping time. We may regard N as being the stopping time of an experiment that successively flips a fair coin and then stops when the number of heads reaches 10. ■

Proposition 11.4 (Wald's Equation) If X1, X2, . . . are independent and identically distributed random variables having finite expectations, and if N is a stopping time for X1, X2, . . . such that E[N] < ∞, then

$$E\left[\sum_{n=1}^{N} X_n\right] = E[N]E[X]$$

Proof. Letting

$$I_n = \begin{cases} 1, & \text{if } N \ge n \\ 0, & \text{if } N < n \end{cases}$$

we have

$$\sum_{n=1}^{N} X_n = \sum_{n=1}^{\infty} X_n I_n$$

Hence,

$$E\left[\sum_{n=1}^{N} X_n\right] = E\left[\sum_{n=1}^{\infty} X_n I_n\right] = \sum_{n=1}^{\infty} E[X_n I_n] \tag{11.3}$$

However, In = 1 if and only if we have not stopped after successively observing X1, . . . , Xn−1. Therefore, In is determined by X1, . . . , Xn−1 and is thus independent of Xn. From Equation (11.3) we thus obtain

$$\begin{aligned}
E\left[\sum_{n=1}^{N} X_n\right] &= \sum_{n=1}^{\infty} E[X_n]E[I_n] \\
&= E[X] \sum_{n=1}^{\infty} E[I_n] \\
&= E[X]\, E\left[\sum_{n=1}^{\infty} I_n\right] \\
&= E[X]E[N]
\end{aligned}$$

■


Returning to the hazard rate method, we have

$$S = \sum_{i=1}^{N} X_i$$

As N = min{n : Un ≤ λ(∑_{i=1}^{n} Xi)/λ}, it follows that the event that N = n is independent of Xn+1, Xn+2, . . . . Hence, by Wald's equation,

$$E[S] = E[N]E[X_i] = \frac{E[N]}{\lambda}$$

or

$$E[N] = \lambda E[S]$$

where E[S] is the mean of the desired random variable.

11.3 Special Techniques for Simulating Continuous Random Variables

Special techniques have been devised to simulate from most of the common continuous distributions. We now present certain of these.

11.3.1 The Normal Distribution

Let X and Y denote independent standard normal random variables; their joint density function is

$$f(x, y) = \frac{1}{2\pi}\, e^{-(x^2 + y^2)/2}, \quad -\infty < x < \infty, \; -\infty < y < \infty$$

Consider now the polar coordinates of the point (X, Y). As shown in Figure 11.1,

$$R^2 = X^2 + Y^2, \quad \Theta = \tan^{-1}(Y/X)$$

To obtain the joint density of R² and Θ, consider the transformation

$$d = x^2 + y^2, \quad \theta = \tan^{-1}(y/x)$$

The Jacobian of this transformation is

$$J = \begin{vmatrix} \dfrac{\partial d}{\partial x} & \dfrac{\partial d}{\partial y} \\[6pt] \dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial y} \end{vmatrix}
= \begin{vmatrix} 2x & 2y \\[4pt] \dfrac{1}{1 + y^2/x^2}\left(-\dfrac{y}{x^2}\right) & \dfrac{1}{1 + y^2/x^2}\,\dfrac{1}{x} \end{vmatrix}
= 2\begin{vmatrix} x & y \\[4pt] -\dfrac{y}{x^2 + y^2} & \dfrac{x}{x^2 + y^2} \end{vmatrix} = 2$$


Figure 11.1

Hence, from Section 2.5.3 the joint density of R² and Θ is given by

$$f_{R^2, \Theta}(d, \theta) = \frac{1}{2\pi}\, e^{-d/2}\, \frac{1}{2} = \frac{1}{2}\, e^{-d/2}\, \frac{1}{2\pi}, \quad 0 < d < \infty, \; 0 < \theta < 2\pi$$

Thus, we can conclude that R² and Θ are independent, with R² having an exponential distribution with rate 1/2 and Θ being uniform on (0, 2π).

Let us now go in reverse from the polar to the rectangular coordinates. From the preceding, if we start with W, an exponential random variable with rate 1/2 (W plays the role of R²), and with V, independent of W and uniformly distributed over (0, 2π) (V plays the role of Θ), then X = √W cos V, Y = √W sin V will be independent standard normals. Hence, using the results of Example 11.3 we see that if U1 and U2 are independent uniform (0, 1) random numbers, then

$$X = (-2 \log U_1)^{1/2} \cos(2\pi U_2), \quad Y = (-2 \log U_1)^{1/2} \sin(2\pi U_2) \tag{11.4}$$

are independent standard normal random variables.

Remark The fact that X² + Y² has an exponential distribution with rate 1/2 is quite interesting for, by the definition of the chi-square distribution, X² + Y² has a chi-squared distribution with two degrees of freedom. Hence, these two distributions are identical.

The preceding approach to generating standard normal random variables is called the Box–Muller approach. Its efficiency suffers somewhat from its need to compute the preceding sine and cosine values. There is, however, a way to get around this potentially time-consuming difficulty. To begin, note that if U is uniform on (0, 1), then 2U is uniform on (0, 2), and so 2U − 1 is uniform on (−1, 1). Thus, if we generate random


numbers U1 and U2 and set

$$V_1 = 2U_1 - 1, \quad V_2 = 2U_2 - 1$$

then (V1, V2) is uniformly distributed in the square of area 4 centered at (0, 0) (see Figure 11.2). Suppose now that we continually generate such pairs (V1, V2) until we obtain one that is contained in the circle of radius 1 centered at (0, 0)—that is, until (V1, V2) is such that V1² + V2² ≤ 1. It now follows that such a pair (V1, V2) is uniformly distributed in the circle. If we let R̄, Θ̄ denote the polar coordinates of this pair, then it is easy to verify that R̄ and Θ̄ are independent, with R̄² being uniformly distributed on (0, 1) and Θ̄ uniformly distributed on (0, 2π). Since

$$\sin \bar{\Theta} = V_2/\bar{R} = \frac{V_2}{\sqrt{V_1^2 + V_2^2}}, \quad \cos \bar{\Theta} = V_1/\bar{R} = \frac{V_1}{\sqrt{V_1^2 + V_2^2}}$$

it follows from Equation (11.4) that we can generate independent standard normals X and Y by generating another random number U and setting

$$X = (-2 \log U)^{1/2}\, V_1/\bar{R}, \quad Y = (-2 \log U)^{1/2}\, V_2/\bar{R}$$

In fact, since (conditional on V1² + V2² ≤ 1) R̄² is uniform on (0, 1) and is independent of Θ̄, we can use it instead of generating a new random number U, thus showing that

$$X = (-2 \log \bar{R}^2)^{1/2}\, V_1/\bar{R} = \sqrt{\frac{-2 \log S}{S}}\, V_1, \quad
Y = (-2 \log \bar{R}^2)^{1/2}\, V_2/\bar{R} = \sqrt{\frac{-2 \log S}{S}}\, V_2$$


are independent standard normals, where

$$S = \bar{R}^2 = V_1^2 + V_2^2$$

Summing up, we thus have the following approach to generating a pair of independent standard normals:

Step 1: Generate random numbers U1 and U2.
Step 2: Set V1 = 2U1 − 1, V2 = 2U2 − 1, S = V1² + V2².
Step 3: If S > 1, return to step 1.
Step 4: Return the independent standard normals

$$X = \sqrt{\frac{-2 \log S}{S}}\, V_1, \quad Y = \sqrt{\frac{-2 \log S}{S}}\, V_2$$

The preceding is called the polar method. Since the probability that a random point in the square will fall within the circle is equal to π/4 (the area of the circle divided by the area of the square), it follows that, on average, the polar method will require 4/π ≈ 1.273 iterations of step 1. Hence, it will, on average, require 2.546 random numbers, 1 logarithm, 1 square root, 1 division, and 4.546 multiplications to generate 2 independent standard normals.
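The polar method is a direct transcription into code; a minimal Python sketch (the function name is ours, and the guard S > 0 merely avoids log 0 in the probability-zero corner case):

```python
import math
import random

def polar_method():
    """Polar method sketch: returns two independent standard normals."""
    while True:
        V1 = 2 * random.random() - 1        # uniform on (-1, 1)
        V2 = 2 * random.random() - 1
        S = V1 * V1 + V2 * V2
        if 0 < S <= 1:                      # the point fell inside the unit circle
            factor = math.sqrt(-2 * math.log(S) / S)
            return factor * V1, factor * V2
```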

11.3.2 The Gamma Distribution

To simulate from a gamma distribution with parameters (n, λ), where n is an integer, we use the fact that the sum of n independent exponential random variables, each having rate λ, has this distribution. Hence, if U1, . . . , Un are independent uniform (0, 1) random variables,

$$X = \frac{1}{\lambda} \sum_{i=1}^{n} -\log U_i = -\frac{1}{\lambda} \log\left(\prod_{i=1}^{n} U_i\right)$$

has the desired distribution. When n is large, there are other techniques available that do not require so many random numbers. One possibility is to use the rejection procedure with g(x) taken to be the density of an exponential random variable with mean n/λ (as this is the mean of the gamma). It can be shown that for large n the average number of iterations needed by the rejection algorithm is e[(n − 1)/2π]^{1/2}. In addition, if we wanted to generate a series of gammas, then, just as in Example 11.5, we can arrange things so that upon acceptance we obtain not only a gamma random variable but also, for free, an exponential random variable that can then be used in obtaining the next gamma (see Exercise 8).
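A sketch of the integer-shape gamma generator described above (the function name is ours):

```python
import math
import random

def gamma_integer_shape(n, lam):
    """Gamma (n, lam) with integer n as a sum of n exponentials:
    X = -(1/lam) * log(U1 * ... * Un)."""
    prod = 1.0
    for _ in range(n):
        prod *= random.random()
    return -math.log(prod) / lam
```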

11.3.3 The Chi-Squared Distribution

The chi-squared distribution with n degrees of freedom is the distribution of χn² = Z1² + · · · + Zn², where Zi, i = 1, . . . , n, are independent standard normals. Using the fact noted in the remark at the end of Section 11.3.1, we see that Z1² + Z2² has an exponential distribution with rate 1/2. Hence, when n is even—say, n = 2k—χ²₂ₖ has a gamma distribution with parameters (k, 1/2). Hence, −2 log(∏_{i=1}^{k} Ui) has a chi-squared distribution with 2k degrees of freedom. We can simulate a chi-squared random variable with 2k + 1 degrees of freedom by first simulating a standard normal random variable Z and then adding Z² to the preceding. That is,

$$\chi^2_{2k+1} = Z^2 - 2 \log\left(\prod_{i=1}^{k} U_i\right)$$

where Z, U1, . . . , Uk are independent, with Z being a standard normal and the others being uniform (0, 1) random variables.

11.3.4 The Beta (n, m) Distribution

The random variable X is said to have a beta distribution with parameters n, m if its density is given by

$$f(x) = \frac{(n + m - 1)!}{(n - 1)!(m - 1)!}\, x^{n-1} (1 - x)^{m-1}, \quad 0 < x < 1$$

One approach to simulating from the preceding distribution is to let U1, . . . , U_{n+m−1} be independent uniform (0, 1) random variables and consider the nth smallest value of this set—call it U_{(n)}. Now U_{(n)} will equal x if, of the n + m − 1 variables,

(i) n − 1 are smaller than x,
(ii) one equals x,
(iii) m − 1 are greater than x.

Hence, if the n + m − 1 uniform random variables are partitioned into three subsets of sizes n − 1, 1, and m − 1, the probability (density) that each of the variables in the first set is less than x, the variable in the second set equals x, and all the variables in the third set are greater than x is given by

$$(P\{U < x\})^{n-1}\, f_U(x)\, (P\{U > x\})^{m-1} = x^{n-1}(1 - x)^{m-1}$$

Hence, as there are (n + m − 1)!/[(n − 1)!(m − 1)!] possible partitions, it follows that U_{(n)} is beta with parameters (n, m).

Thus, one way to simulate from the beta distribution is to find the nth smallest of a set of n + m − 1 random numbers. However, when n and m are large, this procedure is not particularly efficient. For another approach consider a Poisson process with rate 1, and recall that given S_{n+m}, the time of the (n + m)th event, the set of the first n + m − 1 event times is distributed independently and uniformly on (0, S_{n+m}). Hence, given S_{n+m}, the nth smallest of the first n + m − 1 event times—that is, S_n—is distributed as the nth smallest of a set of n + m − 1 uniform (0, S_{n+m}) random variables. But from the


preceding we can thus conclude that S_n/S_{n+m} has a beta distribution with parameters (n, m). Therefore, if U1, . . . , U_{n+m} are random numbers, then

$$\frac{-\log \prod_{i=1}^{n} U_i}{-\log \prod_{i=1}^{n+m} U_i} \quad \text{is beta with parameters } (n, m)$$

By writing the preceding as

$$\frac{-\log \prod_{i=1}^{n} U_i}{-\log \prod_{i=1}^{n} U_i - \log \prod_{i=n+1}^{n+m} U_i}$$

we see that it has the same distribution as X/(X + Y), where X and Y are independent gamma random variables with respective parameters (n, 1) and (m, 1). Hence, when n and m are large, we can efficiently simulate a beta by first simulating two gamma random variables.
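A sketch of the two-gamma approach, reusing gamma_integer_shape from the gamma sketch above:

```python
def beta_nm(n, m):
    """Beta (n, m) as X/(X + Y) with X ~ gamma(n, 1) and Y ~ gamma(m, 1)."""
    X = gamma_integer_shape(n, 1.0)
    Y = gamma_integer_shape(m, 1.0)
    return X / (X + Y)
```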

11.3.5 The Exponential Distribution—The Von Neumann Algorithm

As we have seen, an exponential random variable with rate 1 can be simulated by computing the negative of the logarithm of a random number. Most computer programs for computing a logarithm, however, involve a power series expansion, and so it might be useful to have at hand a second method that is computationally easier. We now present such a method due to Von Neumann.

To begin, let U1, U2, . . . be independent uniform (0, 1) random variables and define N, N ≥ 2, by

$$N = \min\{n : U_1 \ge U_2 \ge \cdots \ge U_{n-1} < U_n\}$$

That is, N is the index of the first random number that is greater than its predecessor. Let us now compute the joint distribution of N and U1:

$$P\{N > n, U_1 \le y\} = \int_0^1 P\{N > n, U_1 \le y \mid U_1 = x\}\,dx = \int_0^y P\{N > n \mid U_1 = x\}\,dx$$

Now, given that U1 = x, N will be greater than n if x ≥ U2 ≥ · · · ≥ Un or, equivalently, if

(a) Ui ≤ x, i = 2, . . . , n, and
(b) U2 ≥ · · · ≥ Un

Now, (a) has probability x^{n−1} of occurring and, given (a), since all of the (n − 1)! possible rankings of U2, . . . , Un are equally likely, (b) has probability 1/(n − 1)! of occurring. Hence,

$$P\{N > n \mid U_1 = x\} = \frac{x^{n-1}}{(n-1)!}$$


and so

$$P\{N > n, U_1 \le y\} = \int_0^y \frac{x^{n-1}}{(n-1)!}\,dx = \frac{y^n}{n!}$$

which yields

$$P\{N = n, U_1 \le y\} = P\{N > n - 1, U_1 \le y\} - P\{N > n, U_1 \le y\} = \frac{y^{n-1}}{(n-1)!} - \frac{y^n}{n!}$$

Upon summing over all the even integers, we see that

$$P\{N \text{ is even}, U_1 \le y\} = y - \frac{y^2}{2!} + \frac{y^3}{3!} - \frac{y^4}{4!} + \cdots = 1 - e^{-y} \tag{11.5}$$

We are now ready for the following algorithm for generating an exponential random variable with rate 1.

Step 1: Generate uniform random numbers U1, U2, . . . , stopping at N = min{n : U1 ≥ · · · ≥ U_{n−1} < U_n}.
Step 2: If N is even accept that run, and go to step 3. If N is odd reject the run, and return to step 1.
Step 3: Set X equal to the number of failed runs plus the first random number in the successful run.

To show that X is exponential with rate 1, first note that the probability of a successful run is, from Equation (11.5) with y = 1,

$$P\{N \text{ is even}\} = 1 - e^{-1}$$

Now, in order for X to exceed x, the first [x] runs must all be unsuccessful, and the next run must either be unsuccessful or be successful but have U1 > x − [x] (where [x] is the largest integer not exceeding x). As

$$P\{N \text{ even}, U_1 > y\} = P\{N \text{ even}\} - P\{N \text{ even}, U_1 \le y\} = 1 - e^{-1} - (1 - e^{-y}) = e^{-y} - e^{-1}$$

we see that

$$P\{X > x\} = e^{-[x]}\left[e^{-1} + e^{-(x - [x])} - e^{-1}\right] = e^{-x}$$

which yields the result.

Let T denote the number of trials needed to generate a successful run. As each trial is a success with probability 1 − e^{−1}, it follows that T is geometric with mean 1/(1 − e^{−1}). If we let Ni denote the number of uniform random variables used on the ith run, i ≥ 1, then T (being the first run i for which Ni is even) is a stopping time for this sequence.


Hence, by Wald's equation, the mean number of uniform random variables needed by this algorithm is given by

$$E\left[\sum_{i=1}^{T} N_i\right] = E[N]E[T]$$

Now,

$$E[N] = \sum_{n=0}^{\infty} P\{N > n\} = 1 + \sum_{n=1}^{\infty} P\{U_1 \ge \cdots \ge U_n\} = 1 + \sum_{n=1}^{\infty} 1/n! = e$$

and so

$$E\left[\sum_{i=1}^{T} N_i\right] = \frac{e}{1 - e^{-1}} \approx 4.3$$

Hence, this algorithm, which computationally speaking is quite easy to perform, requires on average about 4.3 random numbers to execute.
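A minimal Python sketch of the Von Neumann algorithm (the function name is ours); note that it uses no logarithm, only comparisons:

```python
import random

def von_neumann_exponential():
    """Von Neumann algorithm sketch: exponential with rate 1, no logarithms."""
    failed_runs = 0
    while True:
        first = previous = random.random()   # U1 of the current run
        n = 1
        while True:
            current = random.random()
            n += 1
            if current > previous:           # first rise: N = n
                break
            previous = current
        if n % 2 == 0:                       # N even: successful run
            return failed_runs + first
        failed_runs += 1                     # N odd: reject and start a new run
```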

11.4 Simulating from Discrete Distributions

All of the general methods for simulating from continuous distributions have analogs in the discrete case. For instance, if we want to simulate a random variable X having probability mass function

$$P\{X = x_j\} = P_j, \quad j = 1, 2, \ldots, \qquad \sum_j P_j = 1$$

we can use the following discrete analog of the inverse transform technique: let U be uniformly distributed over (0, 1), and set

$$X = \begin{cases}
x_1, & \text{if } U < P_1 \\
x_2, & \text{if } P_1 < U < P_1 + P_2 \\
\;\vdots \\
x_j, & \text{if } \sum_{i=1}^{j-1} P_i < U < \sum_{i=1}^{j} P_i \\
\;\vdots
\end{cases}$$


As

$$P\{X = x_j\} = P\left\{\sum_{i=1}^{j-1} P_i < U < \sum_{i=1}^{j} P_i\right\} = P_j$$

we see that X has the desired distribution.

Example 11.7 (The Geometric Distribution) Suppose we want to simulate X such that P{X = i} = p(1 − p)^{i−1}, i ≥ 1. As

$$\sum_{i=1}^{j-1} P\{X = i\} = 1 - P\{X > j - 1\} = 1 - (1 - p)^{j-1}$$

we can simulate such a random variable by generating a random number U and then setting X equal to that value j for which

$$1 - (1 - p)^{j-1} < U < 1 - (1 - p)^j$$

or, equivalently, for which

$$(1 - p)^j < 1 - U < (1 - p)^{j-1}$$

As 1 − U has the same distribution as U, we can thus define X by

$$X = \min\{j : (1 - p)^j < U\} = \min\left\{j : j > \frac{\log U}{\log(1 - p)}\right\} = 1 + \left[\frac{\log U}{\log(1 - p)}\right]$$

■

As in the continuous case, special simulation techniques have been developed for the more common discrete distributions. We now present certain of these.

Example 11.8 (Simulating a Binomial Random Variable) A binomial (n, p) random variable can be most easily simulated by recalling that it can be expressed as the sum of n independent Bernoulli random variables. That is, if U1, . . . , Un are independent uniform (0, 1) variables, then letting

$$X_i = \begin{cases} 1, & \text{if } U_i < p \\ 0, & \text{otherwise} \end{cases}$$

it follows that X ≡ ∑_{i=1}^{n} Xi is a binomial random variable with parameters n and p.

One difficulty with this procedure is that it requires the generation of n random numbers. To show how to reduce the number of random numbers needed, note first that this procedure does not use the actual value of a random number U but only whether or not it exceeds p. Using this and the result that the conditional distribution of U given that U < p is uniform on (0, p) and the conditional distribution of U given that U > p is uniform on (p, 1), we now show how we can simulate a binomial (n, p) random variable using only a single random number:


Step 1: Let α = 1/p, β = 1/(1 − p).
Step 2: Set k = 0.
Step 3: Generate a uniform random number U.
Step 4: If k = n stop. Otherwise reset k to equal k + 1.
Step 5: If U ≤ p set Xk = 1 and reset U to equal αU. If U > p set Xk = 0 and reset U to equal β(U − p). Return to step 4.

This procedure generates X1, . . . , Xn, and X = ∑_{i=1}^{n} Xi is the desired random variable. It works by noting whether Uk ≤ p or Uk > p; in the former case it takes Uk+1 to equal Uk/p, and in the latter case it takes Uk+1 to equal (Uk − p)/(1 − p).† ■

Example 11.9 (Simulating a Poisson Random Variable) To simulate a Poisson random variable with mean λ, generate independent uniform (0, 1) random variables U1, U2, . . . , stopping at

$$N + 1 = \min\left\{n : \prod_{i=1}^{n} U_i < e^{-\lambda}\right\}$$

The random variable N has the desired distribution, which can be seen by noting that

$$N = \max\left\{n : \sum_{i=1}^{n} -\log U_i < \lambda\right\}$$

But −log Ui is exponential with rate 1, and so if we interpret −log Ui, i ≥ 1, as the interarrival times of a Poisson process having rate 1, we see that N = N(λ) would equal the number of events by time λ. Hence N is Poisson with mean λ.

When λ is large we can reduce the amount of computation in the preceding simulation of N(λ), the number of events by time λ of a Poisson process having rate 1, by first choosing an integer m and simulating Sm, the time of the mth event of the Poisson process, and then simulating N(λ) according to the conditional distribution of N(λ) given Sm. Now the conditional distribution of N(λ) given Sm is as follows:

$$N(\lambda) \mid S_m = s \sim m + \mathrm{Poisson}(\lambda - s), \quad \text{if } s < \lambda$$
$$N(\lambda) \mid S_m = s \sim \mathrm{Binomial}\left(m - 1, \frac{\lambda}{s}\right), \quad \text{if } s > \lambda$$

where ∼ means "has the distribution of." This follows since if the mth event occurs at time s, where s < λ, then the number of events by time λ is m plus the number of events in (s, λ). On the other hand, given that Sm = s, the set of times at which the first m − 1 events occur has the same distribution as a set of m − 1 uniform (0, s) random variables (see Section 5.3.5). Hence, when λ < s, the number of these that occur by time λ is binomial with parameters m − 1 and λ/s. Hence, we can simulate N(λ) by first simulating Sm and then simulating either P(λ − Sm), a Poisson random variable

† Because of computer round-off errors, a single random number should not be continuously used when n is large.


with mean λ − Sm, when Sm < λ, or simulating Bin(m − 1, λ/Sm), a binomial random variable with parameters m − 1 and λ/Sm, when Sm > λ; and then setting

$$N(\lambda) = \begin{cases} m + P(\lambda - S_m), & \text{if } S_m < \lambda \\ \mathrm{Bin}(m - 1, \lambda/S_m), & \text{if } S_m > \lambda \end{cases}$$

In the preceding it has been found computationally effective to let m be approximately (7/8)λ. Of course, Sm is simulated by simulating from a gamma (m, λ) distribution via an approach that is computationally fast when m is large (see Section 11.3.2). ■

There are also rejection and hazard rate methods for discrete distributions, but we leave their development as exercises. However, there is a technique available for simulating finite discrete random variables—called the alias method—which, though requiring some setup time, is very fast to implement. (A code sketch of the binomial and Poisson generators of Examples 11.8 and 11.9 follows; the alias method is developed next.)
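Minimal Python sketches of the two preceding generators (function names are ours; per the text's footnote, round-off limits the single-random-number trick for large n):

```python
import math
import random

def binomial_single_u(n, p):
    """Example 11.8 sketch: binomial (n, p) from a single random number."""
    U, X = random.random(), 0
    for _ in range(n):
        if U <= p:
            X += 1
            U = U / p                  # given U < p, U/p is uniform (0, 1)
        else:
            U = (U - p) / (1 - p)      # given U > p, this is uniform (0, 1)
    return X

def poisson(lam):
    """Example 11.9 sketch: N + 1 = min{n : U1 * ... * Un < exp(-lam)}."""
    threshold, prod, n = math.exp(-lam), 1.0, 0
    while prod >= threshold:
        prod *= random.random()
        n += 1
    return n - 1
```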

11.4.1 The Alias Method

In what follows, the quantities P, P^{(k)}, Q^{(k)}, k ≤ n − 1, will represent probability mass functions on the integers 1, 2, . . . , n—that is, they will be n-vectors of nonnegative numbers summing to 1. In addition, the vector P^{(k)} will have at most k nonzero components, and each of the Q^{(k)} will have at most two nonzero components. We show that any probability mass function P can be represented as an equally weighted mixture of n − 1 probability mass functions Q (each having at most two nonzero components). That is, we show that for suitably defined Q^{(1)}, . . . , Q^{(n−1)}, P can be expressed as

$$\mathbf{P} = \frac{1}{n-1} \sum_{k=1}^{n-1} \mathbf{Q}^{(k)} \tag{11.6}$$

As a prelude to presenting the method for obtaining this representation, we will need the following simple lemma, whose proof is left as an exercise.

Lemma 11.5 Let P = {Pi, i = 1, . . . , n} denote a probability mass function. Then

(a) there exists an i, 1 ≤ i ≤ n, such that Pi < 1/(n − 1), and
(b) for this i, there exists a j, j ≠ i, such that Pi + Pj ≥ 1/(n − 1).

Before presenting the general technique for obtaining the representation of Equation (11.6), let us illustrate it by an example.

Example 11.10 Consider the three-point distribution P with P1 = 7/16, P2 = 1/2, P3 = 1/16. We start by choosing i and j such that they satisfy the conditions of Lemma 11.5. As P3 < 1/2 and P3 + P2 > 1/2, we can work with i = 3 and j = 2. We will now define a two-point mass function Q^{(1)} putting all of its weight on 3 and 2 and such that P will be expressible as an equally weighted mixture between Q^{(1)} and a second two-point mass function Q^{(2)}. In addition, all of the mass of point 3 will be contained in Q^{(1)}. As we will have

$$P_j = \frac{1}{2}\left(Q_j^{(1)} + Q_j^{(2)}\right), \quad j = 1, 2, 3 \tag{11.7}$$


and, by the preceding, Q_3^{(2)} is supposed to equal 0, we must therefore take

$$Q_3^{(1)} = 2P_3 = \frac{1}{8}, \quad Q_2^{(1)} = 1 - Q_3^{(1)} = \frac{7}{8}, \quad Q_1^{(1)} = 0$$

To satisfy Equation (11.7), we must then set

$$Q_3^{(2)} = 0, \quad Q_2^{(2)} = 2P_2 - \frac{7}{8} = \frac{1}{8}, \quad Q_1^{(2)} = 2P_1 = \frac{7}{8}$$

Hence, we have the desired representation in this case. Suppose now that the original distribution was the following four-point mass function:

$$P_1 = \frac{7}{16}, \quad P_2 = \frac{1}{4}, \quad P_3 = \frac{1}{8}, \quad P_4 = \frac{3}{16}$$

Now, P3 < 1/3 and P3 + P1 > 1/3. Hence our initial two-point mass function—Q^{(1)}—will concentrate on points 3 and 1 (giving no weight to 2 and 4). As the final representation will give weight 1/3 to Q^{(1)} and, in addition, the other Q^{(j)}, j = 2, 3, will not give any mass to the value 3, we must have

$$\frac{1}{3}\, Q_3^{(1)} = P_3 = \frac{1}{8}$$

Hence,

$$Q_3^{(1)} = \frac{3}{8}, \quad Q_1^{(1)} = 1 - \frac{3}{8} = \frac{5}{8}$$

Also, we can write

$$\mathbf{P} = \frac{1}{3}\, \mathbf{Q}^{(1)} + \frac{2}{3}\, \mathbf{P}^{(3)}$$

where P^{(3)}, to satisfy the preceding, must be the vector

$$\begin{aligned}
P_1^{(3)} &= \frac{3}{2}\left(P_1 - \frac{1}{3} Q_1^{(1)}\right) = \frac{11}{32}, \\
P_2^{(3)} &= \frac{3}{2}\, P_2 = \frac{3}{8}, \\
P_3^{(3)} &= 0, \\
P_4^{(3)} &= \frac{3}{2}\, P_4 = \frac{9}{32}
\end{aligned}$$

Note that P^{(3)} gives no mass to the value 3. We can now express the mass function P^{(3)} as an equally weighted mixture of two-point mass functions Q^{(2)} and Q^{(3)}, and we will end up with

$$\mathbf{P} = \frac{1}{3}\, \mathbf{Q}^{(1)} + \frac{2}{3}\left(\frac{1}{2}\, \mathbf{Q}^{(2)} + \frac{1}{2}\, \mathbf{Q}^{(3)}\right) = \frac{1}{3}\left(\mathbf{Q}^{(1)} + \mathbf{Q}^{(2)} + \mathbf{Q}^{(3)}\right)$$

(We leave it as an exercise for you to fill in the details.) ■

669

The preceding example outlines the following general procedure for writing the n-point mass function P in the form of Equation (11.6) where each of the Q(i) are mass functions giving all their mass to at most two points. To start, we choose i and j satisfying the conditions of Lemma 11.5. We now define the mass function Q(1) concentrating on the points i and j and which will contain all of the mass for point i by noting that, in the representation of Equation (11.6), Q i(k) = 0 for k = 2, . . . , n − 1, implying that (1)

Qi

(1)

= (n − 1)Pi , and so Q j = 1 − (n − 1)Pi

Writing P=

n − 2 (n−1) 1 Q(1) + P n−1 n−1

(11.8)

where P(n−1) represents the remaining mass, we see that (n−1)

Pi

(n−1)

Pj

(n−1)

Pk

= 0, 2 1 1 2 n−1 n−1 1 1 (1) = Pj − Qj Pi + P j − , = n−2 n−1 n−2 n−1 n−1 Pk , k ̸= i or j = n−2

That the foregoing is indeed a probability mass function is easily checked—for instance, the nonnegativity of P j(n−1) follows from the fact that j was chosen so that Pi + P j ! 1/(n − 1). We may now repeat the foregoing procedure on the (n − 1)-point probability mass function P(n−1) to obtain P(n−1) =

1 n − 3 (n−2) Q(2) + P n−2 n−2

and thus from Equation (11.8) we have P=

1 n − 3 (n−2) 1 Q(1) + Q(2) + P n−1 n−1 n−1

We now repeat the procedure on P(n−2) and so on until we finally obtain P=

1 (Q(1) + · · · + Q(n−1) ) n−1

In this way we are able to represent P as an equally weighted mixture of n − 1 twopoint mass functions. We can now easily simulate from P by first generating a random integer N equally likely to be either 1, 2, . . ., or n − 1. If the resulting value N is such that Q(N ) puts positive weight only on the points i N and j N , then we can set X (N ) equal to i N if a second random number is less than Q i N and equal to j N otherwise. The random variable X will have probability mass function P. That is, we have the following procedure for simulating from P:

670

Introduction to Probability Models

Step 1: Step 2:

and set N = 1 + [(n − 1)U1 ]. and set

Generate U1 Generate U2 4 i , X= N jN ,

(N )

if U2 < Q i N otherwise

Remarks (i) The preceding is called the alias method because by a renumbering of the Qs we (k) can always arrange things so that for each k, Q k > 0. (That is, we can arrange things so that the kth two-point mass function gives positive weight to the value k.) Hence, the procedure calls for simulating N , equally likely to be 1, 2, . . ., or n − 1, and then if N = k it either accepts k as the value of X , or it accepts for the value of X the “alias” of k (namely, the other value that Q(k) gives positive weight). (ii) Actually, it is not necessary to generate a new random number in step 2. Because N −1 is the integer part of (n−1)U1 , it follows that the remainder (n−1)U1 −(N −1) is independent of U1 and is uniformly distributed in (0, 1). Hence, rather than generating a new random number U2 in step 2, we can use (n − 1)U1 − (N − 1) = (n − 1)U1 − [(n − 1)U1 ]. Example 11.11 Let us return to the problem of Example 11.1, which considers a list of n, not necessarily distinct, items. Each item has a value—v(i) being the value of the item in position i—and we are interested in estimating v=

n &

v(i)/m(i)

i=1

where m(i) is the number of times the item in position i appears on the list. In words, v is the sum of the values of the (distinct) items on the list. To estimate v, note that if X is a random variable such that P{X = i} = v(i)

n G& 1

v( j), i = 1, . . . , n

then ) n G& i v(i)/m(i) E[1/m(X )] = ) =v v( j) j v( j) j=1

Hence, we can estimate v by using the alias (or any other) method to generate independent random variables X 1 , . . . , X k having the same distribution as X and then estimating v by n

k

j=1

i=1

& 1& v≈ v( j) 1/m(X i ) k

"

Simulation

11.5

671

Stochastic Processes

We can easily simulate a stochastic process by simulating a sequence of random variables. For instance, to simulate the first t time units of a renewal process having interarrival distribution F we can simulate independent random variables X 1 , X 2 , . . . having distribution F, stopping at N = min{n: X 1 + · · · + X n > t} The X i , i ! 1, represent the interarrival times of the renewal process and so the preceding simulation yields N − 1 events by time t—the events occurring at times X 1 , X 1 + X 2 , . . . , X 1 + · · · + X N −1 . Actually there is another approach for simulating a Poisson process that is quite efficient. Suppose we want to simulate the first t time units of a Poisson process having rate λ. To do so, we can first simulate N (t), the number of events by t, and then use the result that given the value of N (t), the set of N (t) event times is distributed as a set of n independent uniform (0, t) random variables. Hence, we start by simulating N (t), a Poisson random variable with mean λt (by one of the methods given in Example 11.9). Then, if N (t) = n, generate a new set of n random numbers—call them U1 , . . . , Un — and {tU1 , . . . , tUn } will represent the set of N (t) event times. If we could stop here this would be much more efficient than simulating the exponentially distributed interarrival times. However, we usually desire the event times in increasing order—for instance, for s < t, N (s) = number of Ui : tUi # s and so to compute the function N (s), s # t, it is best to first order the values Ui , i = 1, . . . , n before multiplying by t. However, in doing so you should not use an allpurpose sorting algorithm, such as quick sort (see Example 3.14), but rather one that takes into account that the elements to be sorted come from a uniform (0, 1) population. Such a sorting algorithm of n uniform (0, 1) variables is as follows: Rather than a single list to be sorted of length n we will consider n ordered, or linked, lists of random size. The value U will be put in list i if its value is between (i − 1)/n and i/n—that is, U is put in list [nU ] + 1. The individual lists are then ordered, and the total linkage of all the lists is the desired ordering. As almost all of the n lists will be of relatively small size (for instance, if n = 1000 the mean number of lists of size greater than 4 is (using the 65 −1 e ) ≃ 4) Poisson approximation to the binomial) approximately equal to 1000(1 − 24 the sorting of individual lists will be quite quick, and so the running time of such an algorithm will be proportional to n (rather than to n log n as in the best all-purpose sorting algorithms). An extremely important counting process for modeling purposes is the nonhomogeneous Poisson process, which relaxes the Poisson process assumption of stationary increments. Thus it allows for the possibility that the arrival rate need not be constant but can vary with time. However, there are few analytical studies that assume a nonhomogeneous Poisson arrival process for the simple reason that such models are not usually mathematically tractable. (For example, there is no known expression for

672

Introduction to Probability Models

the average customer delay in the single-server exponential service distribution queueing model that assumes a nonhomogeneous arrival process.)‡ Clearly such models are strong candidates for simulation studies.

11.5.1

Simulating a Nonhomogeneous Poisson Process

We now present three methods for simulating a nonhomogeneous Poisson process having intensity function λ(t), 0 # t < ∞.

Method 1. Sampling a Poisson Process To simulate the first T time units of a nonhomogeneous Poisson process with intensity function λ(t), let λ be such that λ(t) # λ for all t # T Now, as shown in Chapter 5, such a nonhomogeneous Poisson process can be generated by a random selection of the event times of a Poisson process having rate λ. That is, if an event of a Poisson process with rate λ that occurs at time t is counted (independently of what has transpired previously) with probability λ(t)/λ then the process of counted events is a nonhomogeneous Poisson process with intensity function λ(t), 0 # t # T . Hence, by simulating a Poisson process and then randomly counting its events, we can generate the desired nonhomogeneous Poisson process. We thus have the following procedure: Generate independent random variables X 1 , U1 , X 2 , U2 , . . . where the X i are exponential with rate λ and the Ui are random numbers, stopping at 6 4 n & Xi > T N = min n: i=1

Now let, for j = 1, . . . , N − 1, # ") 4 j 1, if U j # λ /λ X i i=1 Ij = 0, otherwise and set J = { j: I j = 1}

)j Thus, the counting process having events at the set of times { i=1 X i : j ∈ J } constitutes the desired process. The foregoing procedure, referred to as the thinning algorithm (because it “thins” the homogeneous Poisson points) will clearly be most efficient, in the sense of having the fewest number of rejected event times, when λ(t) is near λ throughout the interval. ‡ One queueing model that assumes a nonhomogeneous Poisson arrival process and is mathematically tractable is the infinite server model.

Simulation

673

Thus, an obvious improvement is to break up the interval into subintervals and then use the procedure over each subinterval. That is, determine appropriate values k, 0 < t1 < t2 < · · · < tk < T, λ1 , . . . , λk+1 , such that λ(s) # λi when ti−1 # s < ti , i = 1, . . . , k + 1 (where t0 = 0, tk+1 = T )

(11.9)

Now simulate the nonhomogeneous Poisson process over the interval (ti−1 , ti ) by generating exponential random variables with rate λi and accepting the generated event occurring at time s, s ∈ (ti−1 , ti ), with probability λ(s)/λi . Because of the memoryless property of the exponential and the fact that the rate of an exponential can be changed upon multiplication by a constant, it follows that there is no loss of efficiency in going from one subinterval to the next. In other words, if we are at t ∈ [ti−1 , ti ) and generate X , an exponential with rate λi , which is such that t + X > ti then we can use λi [X −(ti − t)]/λi+1 as the next exponential with rate λi+1 . Thus, we have the following algorithm for generating the first t time units of a nonhomogeneous Poisson process with intensity function λ(s) when the relations (11.9) are satisfied. In the algorithm, t will represent the present time and I the present interval (that is, I = i when ti−1 # t < ti ). Step 1: t = 0, I = 1. Step 2: Generate an exponential random variable X having rate λ I . Step 3: If t + X < t I , reset t = t + X , generate a random number U , and accept the event time t if U # λ(t)/λ I . Return to step 2. Step 4: (Step reached if t + X ! t I ). Stop if I = k + 1. Otherwise, reset X = (X − t I + t)λ I /λ I +1 . Also reset t = t I and I = I + 1, and go to step 3. Suppose now that over some subinterval (ti−1 , ti ) it follows that λi > 0 where λi ≡ infimum {λ(s): ti−1 # s < ti } In such a situation, we should not use the thinning algorithm directly but rather should first simulate a Poisson process with rate λi over the desired interval and then simulate a nonhomogeneous Poisson process with the intensity function λ(s) = λ(s) − λi when s ∈ (ti−1 , ti ). (The final exponential generated for the Poisson process, which carries one beyond the desired boundary, need not be wasted but can be suitably transformed so as to be reusable.) The superposition (or, merging) of the two processes yields the desired process over the interval. The reason for doing it this way is that it saves the need to generate uniform random variables for a Poisson distributed number, with mean λi (ti − ti−1 ) of the event times. For instance, consider the case where λ(s) = 10 + s, 0 < s < 1 Using the thinning method with λ = 11 would generate an expected number of 11 events each of which would require a random number to determine whether or not to accept it. On the other hand, to generate a Poisson process with rate 10 and then merge it with a generated nonhomogeneous Poisson process with rate λ(s) = s, 0 < s < 1, would yield an equally distributed number of event times but with the expected number needing to be checked to determine acceptance being equal to 1.

674

Introduction to Probability Models

Figure 11.3

Another way to make the simulation of nonhomogeneous Poisson processes more efficient is to make use of superpositions. For instance, consider the process where ⎧ 0 < t < 1.5 ⎨ exp{t 2 }, 1.5 < t < 2.5 λ(t) = exp{2.25}, ⎩ exp{(4 − t)2 }, 2.5 < t < 4

A plot of this intensity function is given in Figure 11.3. One way of simulating this process up to time 4 is to first generate a Poisson process with rate 1 over this interval; then generate a Poisson process with rate e − 1 over this interval, accept all events in (1, 3), and only accept an event at time t that is not contained in (1, 3) with probability [λ(t) − 1]/(e − 1); then generate a Poisson process with rate e2.25 − e over the interval (1, 3), accepting all event times between 1.5 and 2.5 and any event time t outside this interval with probability [λ(t) − e]/(e2.25 − e). The superposition of these processes is the desired nonhomogeneous Poisson process. In other words, what we have done is to break up λ(t) into the following nonnegative parts: λ(t) = λ1 (t) + λ2 (t) + λ3 (t), 0 < t < 4

where

λ1 (t) ≡ 1, ⎧ ⎨λ(t) − 1, 0 < t < 1 1 λ 0

and determine X 1 , . . . , X N −1 by λ λ

λ

!

!

!

X1

0 X2 X1

X N −1 X N −2

g(x) d x = ϵ1 , g(x) d x = ϵ2 , .. . g(x) d x = ϵ N −1

If we now simulate U1 , . . . , U N −1 —independent uniform (0, 1) random numbers— then as the projection on the y-axis of the Poisson point whose x-coordinate is X i is uniform on (0, g(X i )), it follows that the simulated Poisson points in the interval are (X i , Ui g(X i )), i = 1, . . . , N − 1.

680

Introduction to Probability Models

Of course, the preceding technique is most useful when g is regular enough so that the foregoing equations can be solved for the X i . For instance, if g(x) = y (and so the region of interest is a rectangle), then Xi =

ϵ1 + · · · + ϵi , i = 1, . . . , N − 1 λy

and the Poisson points are (X i , yUi ), i = 1, . . . , N − 1

11.6

Variance Reduction Techniques

Let X 1 , . . . , X n have a given joint distribution, and suppose we are interested in computing θ ≡ E[g(X 1 , . . . , X n )]

where g is some specified function. It is often the case that it is not possible to analytically compute the preceding, and when such is the case we can attempt to use simulation (1) (1) to estimate θ . This is done as follows: Generate X 1 , . . . , X n having the same joint distribution as X 1 , . . . , X n and set # " (1) Y1 = g X 1 , . . . , X n(1)

Now, simulate a second set of random variables (independent of the first set) (2) (2) X 1 , . . . , X n having the distribution of X 1 , . . . , X n and set # " (2) Y2 = g X 1 , . . . , X n(2) Continue this until you have generated k (some predetermined number) sets, and so have also computed Y1 , Y2 , . . . , Yk . Now, Y1 , . . . , Yk are independent and identically distributed random variables each having the same distribution of g(X 1 , . . . , X n ). Thus, if we let Y¯ denote the average of these k random variables—that is, Y¯ = then

k &

Yi /k

i=1

E[Y¯ ] = θ , 0 / E (Y¯ − θ )2 = Var(Y¯ )

Hence, we can use Y¯ as an estimate of θ . As the expected square of the difference between Y¯ and θ is equal to the variance of Y¯ , we would like this quantity to be as small as possible. In the preceding situation, Var(Y¯ ) = Var(Yi )/k, which is usually not known in advance but must be estimated from the generated values Y1 , . . . , Yn . We now present three general techniques for reducing the variance of our estimator.

Simulation

11.6.1

681

Use of Antithetic Variables

In the preceding situation, suppose that we have generated Y1 and Y2 , identically distributed random variables having mean θ . Now, 2 1 1 Y1 + Y2 = [Var(Y1 ) + Var(Y2 ) + 2Cov(Y1 , Y2 )] Var 2 4 Var(Y1 ) Cov(Y1 , Y2 ) = + 2 2 Hence, it would be advantageous (in the sense that the variance would be reduced) if Y1 and Y2 rather than being independent were negatively correlated. To see how we could arrange this, let us suppose that the random variables X 1 , . . . , X n are independent and, in addition, that each is simulated via the inverse transform technique. That is, X i is simulated from Fi−1 (Ui ) where Ui is a random number and Fi is the distribution of X i . Hence, Y1 can be expressed as " # Y1 = g F1−1 (U1 ), . . . , Fn−1 (Un )

Now, since 1 − U is also uniform over (0, 1) whenever U is a random number (and is negatively correlated with U ) it follows that Y2 defined by " # Y2 = g F1−1 (1 − U1 ), . . . , Fn−1 (1 − Un )

will have the same distribution as Y1 . Hence, if Y1 and Y2 were negatively correlated, then generating Y2 by this means would lead to a smaller variance than if it were generated by a new set of random numbers. (In addition, there is a computational savings since rather than having to generate n additional random numbers, we need only subtract each of the previous n from 1.) The following theorem will be the key to showing that this technique—known as the use of antithetic variables—will lead to a reduction in variance whenever g is a monotone function. Theorem 11.1 If X 1 , . . . , X n are independent, then, for any increasing functions f and g of n variables, E[ f (X)g(X)] ! E[ f (X)]E[g(X)]

(11.11)

where X = (X 1 , . . . , X n ).

Proof. The proof is by induction on n. To prove it when n = 1, let f and g be increasing functions of a single variable. Then, for any x and y, ( f (x) − f (y))(g(x) − g(y)) ! 0 since if x ! y (x # y) then both factors are nonnegative (nonpositive). Hence, for any random variables X and Y , ( f (X ) − f (Y ))(g(X ) − g(Y )) ! 0

682

Introduction to Probability Models

implying that E[( f (X ) − f (Y ))(g(X ) − g(Y ))] ! 0 or, equivalently, E[ f (X )g(X )] + E[ f (Y )g(Y )] ! E[ f (X )g(Y )] + E[ f (Y )g(X )] If we suppose that X and Y are independent and identically distributed, as in this case, then E[ f (X )g(X )] = E[ f (Y )g(Y )], E[ f (X )g(Y )] = E[ f (Y )g(X )] = E[ f (X )]E[g(X )] and so we obtain the result when n = 1. So assume that (11.11) holds for n − 1 variables, and now suppose that X 1 , . . . , X n are independent and f and g are increasing functions. Then E[ f (X)g(X)|X n = xn ]

= E[ f (X 1 , . . . , X n−1 , xn )g(X 1 , . . . , X n−1 , xn )|X n = x] = E[ f (X 1 , . . . , X n−1 , xn )g(X 1 , . . . , X n−1 , xn )] by independence ! E[ f (X 1 , . . . , X n−1 , xn )]E[g(X 1 , . . . , X n−1 , xn )] by the induction hypothesis = E[ f (X)|X n = xn ]E[g(X)|X n = xn ]

Hence, E[ f (X)g(X)|X n ] ! E[ f (X)|X n ]E[g(X)|X n ] and, upon taking expectations of both sides, J K E[ f (X)g(X)] ! E E[ f (X)|X n ]E[g(X)|X n ] ! E[ f (X)]E[g(X)] The last inequality follows because E[ f (X)|X n ] and E[g(X)|X n ] are both increasing functions of X n , and so, by the result for n = 1, K J K J K J E E[ f (X)|X n ]E[g(X)|X n ] ! E E[ f (X)|X n ] E E[g(X)|X n ] = E[ f (X)]E[g(X)]

"

Corollary 11.7 If U1, . . . , Un are independent, and k is either an increasing or decreasing function, then

$$\mathrm{Cov}\big(k(U_1, \ldots, U_n),\, k(1 - U_1, \ldots, 1 - U_n)\big) \le 0$$


Proof. Suppose k is increasing. As −k(1 − U1, . . . , 1 − Un) is increasing in U1, . . . , Un, it follows from Theorem 11.1 that

$$\mathrm{Cov}\big(k(U_1, \ldots, U_n),\, -k(1 - U_1, \ldots, 1 - U_n)\big) \ge 0$$

When k is decreasing just replace k by its negative. ■

Since F_i^{-1}(Ui) is increasing in Ui (as Fi, being a distribution function, is increasing), it follows that g(F_1^{-1}(U1), . . . , F_n^{-1}(Un)) is a monotone function of U1, . . . , Un whenever g is monotone. Hence, if g is monotone, the antithetic variable approach of twice using each set of random numbers U1, . . . , Un—by first computing g(F_1^{-1}(U1), . . . , F_n^{-1}(Un)) and then g(F_1^{-1}(1 − U1), . . . , F_n^{-1}(1 − Un))—will reduce the variance of the estimate of E[g(X1, . . . , Xn)]. That is, rather than generating k sets of n random numbers, we should generate k/2 sets and use each set twice.

Example 11.14 (Simulating the Reliability Function) Consider a system of n components in which component i, independently of other components, works with probability pi, i = 1, . . . , n. Letting

$$X_i = \begin{cases} 1, & \text{if component } i \text{ works} \\ 0, & \text{otherwise} \end{cases}$$

suppose there is a monotone structure function φ such that

$$\phi(X_1, \ldots, X_n) = \begin{cases} 1, & \text{if the system works under } X_1, \ldots, X_n \\ 0, & \text{otherwise} \end{cases}$$

We are interested in using simulation to estimate

$$r(p_1, \ldots, p_n) \equiv E[\phi(X_1, \ldots, X_n)] = P\{\phi(X_1, \ldots, X_n) = 1\}$$

Now, we can simulate the Xi by generating uniform random numbers U1, . . . , Un and then setting

$$X_i = \begin{cases} 1, & \text{if } U_i < p_i \\ 0, & \text{otherwise} \end{cases}$$

Hence, we see that

$$\phi(X_1, \ldots, X_n) = k(U_1, \ldots, U_n)$$

where k is a decreasing function of U1, . . . , Un. Hence,

$$\mathrm{Cov}(k(\mathbf{U}), k(1 - \mathbf{U})) \le 0$$

and so the antithetic variable approach of using U1, . . . , Un to generate both k(U1, . . . , Un) and k(1 − U1, . . . , 1 − Un) results in a smaller variance than if an independent set of random numbers was used to generate the second k. ■
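A minimal Python sketch of the antithetic-pair estimator (names are ours; it is written for a monotone g of the underlying uniforms—for general Fi one would first map each Ui through F_i^{-1} inside g):

```python
import random

def antithetic_estimate(g, n, k):
    """Estimate E[g(U1, ..., Un)] for monotone g, using each set of
    random numbers twice: once as U and once as 1 - U."""
    pairs = k // 2
    total = 0.0
    for _ in range(pairs):
        us = [random.random() for _ in range(n)]
        total += g(us) + g([1 - u for u in us])   # the antithetic pair
    return total / (2 * pairs)
```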


Example 11.15 (Simulating a Queueing System) Consider a given queueing system, let Di denote the delay in queue of the ith arriving customer, and suppose we are interested in simulating the system so as to estimate

$$\theta = E[D_1 + \cdots + D_n]$$

Let X1, . . . , Xn denote the first n interarrival times and S1, . . . , Sn the first n service times of this system, and suppose these random variables are all independent. Now in most systems D1 + · · · + Dn will be a function of X1, . . . , Xn, S1, . . . , Sn—say,

$$D_1 + \cdots + D_n = g(X_1, \ldots, X_n, S_1, \ldots, S_n)$$

Also, g will usually be increasing in Si and decreasing in Xi, i = 1, . . . , n. If we use the inverse transform method to simulate Xi, Si, i = 1, . . . , n—say, Xi = F_i^{-1}(1 − Ui), Si = G_i^{-1}(Ūi), where U1, . . . , Un, Ū1, . . . , Ūn are independent uniform random numbers—then we may write

$$D_1 + \cdots + D_n = k(U_1, \ldots, U_n, \bar{U}_1, \ldots, \bar{U}_n)$$

where k is increasing in its variates. Hence, the antithetic variable approach will reduce the variance of the estimator of θ. (Thus, we would generate Ui, Ūi, i = 1, . . . , n and set Xi = F_i^{-1}(1 − Ui) and Si = G_i^{-1}(Ūi) for the first run, and Xi = F_i^{-1}(Ui) and Si = G_i^{-1}(1 − Ūi) for the second.) As all the Ui and Ūi are independent, however, this is equivalent to setting Xi = F_i^{-1}(Ui), Si = G_i^{-1}(Ūi) in the first run and using 1 − Ui for Ui and 1 − Ūi for Ūi in the second. ■

11.6.2 Variance Reduction by Conditioning

Let us start by recalling (see Proposition 3.1) the conditional variance formula

$$\mathrm{Var}(Y) = E[\mathrm{Var}(Y \mid Z)] + \mathrm{Var}(E[Y \mid Z]) \tag{11.12}$$

Now suppose we are interested in estimating E[g(X1, . . . , Xn)] by simulating X = (X1, . . . , Xn) and then computing Y = g(X1, . . . , Xn). Now, if for some random variable Z we can compute E[Y | Z], then, as Var(Y | Z) ≥ 0, it follows from the conditional variance formula that

$$\mathrm{Var}(E[Y \mid Z]) \le \mathrm{Var}(Y)$$

implying, since E[E[Y | Z]] = E[Y], that E[Y | Z] is a better estimator of E[Y] than is Y.

In many situations, there are a variety of Zi that can be conditioned on to obtain an improved estimator. Each of these estimators E[Y | Zi] will have mean E[Y] and smaller variance than does the raw estimator Y. We now show that for any choice of weights λi, λi ≥ 0, ∑_i λi = 1, ∑_i λi E[Y | Zi] is also an improvement over Y.

Proposition 11.8 For any λi ≥ 0, ∑_{i=1}^{∞} λi = 1,

(a) E[∑_i λi E[Y | Zi]] = E[Y],
(b) Var(∑_i λi E[Y | Zi]) ≤ Var(Y).

Proof. The proof of (a) is immediate. To prove (b), let N denote an integer-valued random variable, independent of all the other random variables under consideration, such that

$$P\{N = i\} = \lambda_i, \quad i \ge 1$$

Applying the conditional variance formula twice yields

$$\mathrm{Var}(Y) \ge \mathrm{Var}(E[Y \mid N, Z_N]) \ge \mathrm{Var}\big(E[E[Y \mid N, Z_N] \mid Z_1, \ldots]\big) = \mathrm{Var}\left(\sum_i \lambda_i E[Y \mid Z_i]\right)$$

■

Example 11.16 Consider a queueing system having Poisson arrivals and suppose that any customer arriving when there are already N others in the system is lost. Suppose that we are interested in using simulation to estimate the expected number of lost customers by time t. The raw simulation approach would be to simulate the system up to time t and determine L, the number of lost customers for that run. A better estimate, however, can be obtained by conditioning on the total time in [0, t] that the system is at capacity. Indeed, if we let T denote the time in [0, t] that there are N in the system, then E[L|T ] = λT where λ is the Poisson arrival rate. Hence, a better estimate for E[L] than the average value of L over all simulation runs can be obtained by multiplying the average value of T per simulation run by λ. If the arrival process were a nonhomogeneous Poisson process, then we could improve over the raw estimator L by keeping track of those time periods for which the system is at capacity. If we let I1 , . . . , IC denote the time intervals in [0, t] in which there are N in the system, then E[L|I1 , . . . , IC ] =

C ! & i=1

λ(s) ds Ii

where λ(s) is the intensity function of the nonhomogeneous Poisson arrival process. The use of the right side of the preceding would thus lead to a better estimate of E[L] than the raw estimator L. " Example 11.17 Suppose that we wanted to estimate the expected sum of the times in the system of the first n customers in a queueing system. That is, if Wi is the time that the ith customer spends in the system, then we are interested in estimating 7 n 8 & θ=E Wi i=1

686

Introduction to Probability Models

Let Yi denote the “state of the system” at the moment at which ) the ith customer arrives. n It can be shown§ that for a wide class of models the estimator i=1 E[Wi |Yi ] has (the )n same mean and) a smaller variance than the estimator i=1 Wi . (It should be noted that whereas it is immediate that E[Wi |Yi ] has smaller variance than )n Wi , because of the covariance terms involved it is not immediately apparent that i=1 E[Wi |Yi ] has )n Wi .) For instance, in the model G/M/1 smaller variance than i=1 E[Wi |Yi ] = (Ni + 1)/µ

where Ni is the number in the system encountered by the ith arrival and 1/µ is the )n (Ni + 1)/µ is a better estimate of the mean service time; the result implies that i=1 expected total time in the system of the first n customers than is the raw estimator )n " i=1 Wi .

Example 11.18 (Estimating the Renewal Function by Simulation) Consider a queueing model in which customers arrive daily in accordance with a renewal process having interarrival distribution F. However, suppose that at some fixed time T , for instance 5 P.M., no additional arrivals are permitted and those customers that are still in the system are serviced. At the start of the next and each succeeding day customers again begin to arrive in accordance with the renewal process. Suppose we are interested in determining the average time that a customer spends in the system. Upon using the theory of renewal reward processes (with a cycle starting every T time units), it can be shown that average time that a customer spends in the system E[sum of the times in the system of arrivals in (0, T )] = m(T ) where m(T ) is the expected number of renewals in (0, T ). If we were to use simulation to estimate the preceding quantity, a run would consist of simulating a single day, and as part of a simulation run, we would observe the quantity N (T ), the number of arrivals by time T . Since E[N (T )] = m(T ), the natural simulation estimator of m(T ) would be the average (over all simulated days) value of N (T ) obtained. However, Var(N (T )) is, for large T , proportional to T (its asymptotic form being T σ 2 /µ3 , where σ 2 is the variance and µ the mean of the interarrival distribution F), and so, for large T , the variance of our estimator would be large. A considerable improvement can be obtained by using the analytic formula (see Section 7.3) E[Y (T )] T (11.13) m(T ) = − 1 + µ µ where Y (T ) denotes the time from T until the next renewal—that is, it is the excess life at T . Since the variance of Y (T ) does not grow with T (indeed, it converges to a finite value provided the moments of F are finite), it follows that for T large, we would do much better by using the simulation to estimate E[Y (T )] and then using Equation (11.13) to estimate m(T ). § S. M. Ross, “Simulating Average Delay—Variance Reduction by Conditioning,” Probability in the Engineering and Informational Sciences 2(3), (1988), pp. 309–312.

Simulation

687

Figure 11.5 A(T ) = x.

However, by employing conditioning, we can improve further on our estimate of m(T ). To do so, let A(T ) denote the age of the renewal process at time T —that is, it is the time at T since the last renewal. Then, rather than using the value of Y (T ), we can reduce the variance by considering E[Y (T )|A(T )]. Now, knowing that the age at T is equal to x is equivalent to knowing that there was a renewal at time T − x and the next interarrival time X is greater than x. Since the excess at T will equal X − x (see Figure 11.5), it follows that E[Y (T )|A(T ) = x] = E[X − x|X > x] ! ∞ P{X − x > t} dt = P{X > x} 0 ! ∞ [1 − F(t + x)] dt = 1 − F(x) 0

which can be numerically evaluated if necessary. As an illustration of the preceding note that if the renewal process is a Poisson process with rate λ, then the raw simulation estimator N (T ) will have variance λT ; since Y (T ) will be exponential with rate λ, the estimator based on (11.13) will have variance λ2 Var {Y (T )} = 1. On the other hand, since Y (T ) will be independent of A(T ) (and E[Y (T )|A(T )] = 1/λ), it follows that the variance of the improved estimator E[Y (T )|A(T )] is 0. That is, conditioning on the age at time T yields, in this case, the exact answer. " Example 11.19 Consider the M/G/1 queueing system where customers arrive in accordance with a Poisson process with rate λ to a single server having service distribution G with mean E[S]. Suppose that, for a specified time t0 , the server will take a break at the first time t ! t0 at which the system is empty. That is, if X (t) is the number of customers in the system at time t, then the server will take a break at time T = min{t ! t0 : X (t) = 0} To efficiently use simulation to estimate E[T ], generate the system to time t0 ; let R denote the remaining service time of the customer in service at time t0 , and let X Q equal the number of customers waiting in queue at time t0 . (Note that R is equal to 0 if X (t0 ) = 0, and X Q = (X (t0 ) − 1)+ .) Now, with N equal to the number of customers that arrive in the remaining service time R, it follows that if N = n and X Q = n Q , then the additional amount of time from t0 + R until the server can take a break is equal to the amount of time that it takes until the system, starting with n + n Q customers, becomes empty. Because this is equal to the sum of n + nQ busy periods, it follows from Section 8.5.3 that E[T |R, N , X Q ] = t0 + R + (N + X Q )

E[S] 1 − λE[S]

688

Introduction to Probability Models

Consequently, K J E[T |R, X Q ] = E E[T |R, N , X Q ]|R, X Q

= t0 + R + (E[N |R, X Q ] + X Q )

= t0 + R + (λR + X Q )

E[S] 1 − λE[S]

E[S] 1 − λE[S]

Thus, rather than using the generated value of T as the estimator from a simulation run, it E[S] is better to stop the simulation at time t0 and use the estimator t0 + (λR + X Q ) 1−λE[S] . "

11.6.3

Control Variates

Again suppose we want to use simulation to estimate E[g(X)] where X = (X 1 , . . . , X n ). But now suppose that for some function f the expected value of f (X) is known—say, E[ f (X)] = µ. Then for any constant a we can also use W = g(X) + a( f (X) − µ) as an estimator of E[g(X)]. Now, Var(W ) = Var(g(X)) + a 2 Var( f (X)) + 2a Cov(g(X), f (X)) Simple calculus shows that the preceding is minimized when a=

−Cov( f (X), g(X)) Var( f (X))

and, for this value of a, Var(W ) = Var(g(X)) −

[Cov( f (X), g(X))]2 Var( f (X))

Because Var( f (X)) and Cov( f (X), g(X)) are usually unknown, the simulated data should be used to estimate these quantities. Dividing the preceding equation by Var(g(X)) shows that Var(W ) = 1 − Corr2 ( f (X), g(X)) Var(g(X)) where Corr(X, Y ) is the correlation between X and Y . Consequently, the use of a control variate will greatly reduce the variance of the simulation estimator whenever f (X) and g(X) are strongly correlated. Example 11.20 Consider a continuous-time Markov chain that, upon entering state i, spends an exponential time with rate vi in that state before making a transition into some other state, with the transition being into state j with probability Pi, j , i ! 0, j ̸= i. Suppose that costs are incurred at rate C(i) ! 0 per unit time whenever the chain is in

Simulation

689

state i, i ! 0. With X (t) equal to the state at time t, and α being a constant such that 0 < α < 1, the quantity ! ∞ W = e−αt C(X (t)) dt 0

represents the total discounted cost. For a given initial state, suppose we want to use simulation to estimate E[W ]. Whereas at first it might seem that we cannot obtain an unbiased estimator without simulating the continuous-time Markov chain for an infinite amount of time (which is clearly impossible), we can make use of the results of Example 5.1, which gives the equivalent expression for E[W ]: '! T ( E[W ] = E C(X (t)) dt 0

where T is an exponential random variable with rate α that is independent of the continuous-time Markov chain. Therefore, we can first generate the value of T , then generate the states of the continuous-time Markov chain up to time T , to obtain the .T unbiased estimator 0 C(X (t)) dt. Because all the cost rates are nonnegative this estimator is strongly positively correlated with T , which will thus make an effective control variate. "

Example 11.21 (A Queueing System) Let Dn+1 denote the delay in queue of the n+1 customer in a queueing system in which the interarrival times are independent and identically distributed (i.i.d.) with distribution F having mean µ F and are independent of the service times, which are i.i.d. with distribution G having mean µG . If X i is the interarrival time between arrival i and i + 1, and if Si is the service time of customer i, i ! 1, we may write Dn+1 = g(X 1 , . . . , X n , S1 , . . . , Sn )

To take into account the possibility that the simulated variables X i , Si may by chance be quite different from what might be expected we can let f (X 1 , . . . , X n , S1 , . . . , Sn ) =

n & i=1

(Si − X i )

As E[ f (X, S)] = n(µG − µ F ) we could use g(X, S) + a[ f (X, S) − n(µG − µ F )]

as an estimator of E[Dn+1 ]. Since Dn+1 and f are both increasing functions of Si , −X i , i = 1, . . . , n it follows from Theorem 11.1 that f (X, S) and Dn+1 are positively correlated, and so the simulated estimate of a should turn out to be negative. If we wanted to estimate the expected sum of the delays in queue of the first N (T ) ) N (T ) arrivals, then we could use i=1 Si as our control variable. Indeed as the arrival process is usually assumed independent of the service times, it follows that ⎤ ⎡ N (T ) & Si ⎦ = E[S]E[N (T )] E⎣ i=1

690

Introduction to Probability Models

where E[N (T )] can either be computed by the method suggested in Section 7.8 or estimated from the simulation as in Example 11.18. This control variable could also be used if the arrival process were a nonhomogeneous Poisson with rate λ(t); in this case, E[N (T )] =

11.6.4

!

0

T

λ(t) dt

"

Importance Sampling

Let X = (X 1 , . . . , X n ) denote a vector of random variables having a joint density function (or joint mass function in the discrete case) f (x) = f (x1 , . . . , xn ), and suppose that we are interested in estimating ! θ = E[h(X)] = h(x) f (x) dx where the preceding is an n-dimensional integral. (If the X i are discrete, then interpret the integral as an n-fold summation.) Suppose that a direct simulation of the random vector X, so as to compute values of h(X), is inefficient, possibly because (a) it is difficult to simulate a random vector having density function f (x), or (b) the variance of h(X) is large, or (c) a combination of (a) and (b). Another way in which we can use simulation to estimate θ is to note that if g(x) is another probability density such that f (x) = 0 whenever g(x) = 0, then we can express θ as ! h(x) f (x) g(x) dx θ= g(x) ' ( h(X) f (X) = Eg (11.14) g(X) where we have written E g to emphasize that the random vector X has joint density g(x). It follows from Equation (11.14) that θ can be estimated by successively generating values of a random vector X having density function g(x) and then using as the estimator the average of the values of h(X) f (X)/g(X). If a density function g(x) can be chosen so that the random variable h(X) f (X)/g(X) has a small variance then this approach— referred to as importance sampling—can result in an efficient estimator of θ . Let us now try to obtain a feel for why importance sampling can be useful. To begin, note that f (X) and g(X) represent the respective likelihoods of obtaining the vector X when X is a random vector with respective densities f and g. Hence, if X is distributed according to g, then it will usually be the case that f (X) will be small in relation to g(X) and thus when X is simulated according to g the likelihood ratio f (X)/g(X) will usually be small in comparison to 1. However, it is easy to check that its mean is 1: ' ( ! ! f (X) f (x) = g(x) d x = f (x) dx = 1 Eg g(X) g(x)

Simulation

691

Thus we see that even though f (X)/g(X) is usually smaller than 1, its mean is equal to 1; thus implying that it is occasionally large and so will tend to have a large variance. So how can h(X) f (X)/g(X) have a small variance? The answer is that we can sometimes arrange to choose a density g such that those values of x for which f (x)/g(x) is large are precisely the values for which h(x) is exceedingly small, and thus the ratio h(X) f (X)/g(X) is always small. Since this will require that h(x) sometimes be small, importance sampling seems to work best when estimating a small probability; for in this case the function h(x) is equal to 1 when x lies in some set and is equal to 0 otherwise. We will now consider how to select an appropriate density g. . We will find that the so-called tilted densities are useful. Let M(t) = E f [et X ] = et x f (x) d x be the moment generating function corresponding to a one-dimensional density f . Definition 11.2 f t (x) =

A density function

et x f (x) M(t)

is called a tilted density of f, −∞ < t < ∞.

A random variable with density f t tends to be larger than one with density f when t > 0 and tends to be smaller when t < 0. In certain cases the tilted distributions f t have the same parametric form as does f . Example 11.22

If f is the exponential density with rate λ then

f t (x) = Cet x λe−λx = λCe−(λ−t)x where C = 1/M(t) does not depend on x. Therefore, for t # λ, f t is an exponential density with rate λ − t. If f is a Bernoulli probability mass function with parameter p, then f (x) = p x (1 − p)1−x , x = 0, 1 Hence, M(t) = E f [et X ] = pet + 1 − p and so 1 ( pet )x (1 − p)1−x M(t) 2x 1 21−x 1 1− p pet = pet + 1 − p pet + 1 − p

f t (x) =

(11.15)

That is, f t is the probability mass function of a Bernoulli random variable with parameter pt =

pet pet + 1 − p

We leave it as an exercise to show that if f is a normal density with parameters µ and " σ 2 then f t is a normal density with mean µ + σ 2 t and variance σ 2 .

692

Introduction to Probability Models

In certain situations the quantity of interest is the sum of the independent random variables X 1 , . . . , X n . In this case the joint density f is the product of one-dimensional densities. That is, f (x1 , . . . , xn ) = f 1 (x1 ) · · · f n (xn ) where f i is the density function of X i . In this situation it is often useful to generate the X i according to their tilted densities, with a common choice of t employed. Example 11.23 Let X 1 , . . . , X n be independent random variables having respective probability density (or mass) functions f i , for i=1, . . . , n. Suppose we are interested in approximating the probability that their sum is at least as large as a, where a is much larger than the mean of the sum. That is, we are interested in θ = P{S ! a}

)n )n X i , and where a > i=1 E[X i ]. Letting I {S ! a} equal 1 if S ! a where S = i=1 and letting it be 0 otherwise, we have that θ = E f [I {S ! a}] where f = ( f 1 , . . . , f n ). Suppose now that we simulate X i according to the tilted mass function f i,t , i = 1, . . . , n, with the value of t, t > 0 left to be determined. The importance sampling estimator of θ would then be θˆ = I {S ! a} Now,

< f i (X i ) f i,t (X i )

f i (X i ) = Mi (t)e−t X i f i,t (X i ) and so θˆ = I {S ! a}M(t)e−t S

= where M(t) = Mi (t) is the moment generating function of S. Since t > 0 and I {S ! a} is equal to 0 when S < a, it follows that I {S ! a}e−t S # e−ta and so θˆ # M(t)e−ta To make the bound on the estimator as small as possible we thus choose t, t > 0, to minimize M(t)e−ta . In doing so, we will obtain an estimator whose value on each

Simulation

693

iteration is between 0 and mint M(t)e−ta . It can be shown that the minimizing t, call it t ∗ , is such that 7 n 8 & E t ∗ [S] = E t ∗ Xi = a i=1

where, in the preceding, we mean that the expected value is to be taken under the assumption that the distribution of X i is f i,t ∗ for i = 1, . . . , n. For instance, suppose that X 1 , . . . , X n are independent Bernoulli random variables having respective parameters pi , for i = 1, . . . , n. Then, if we generate the X i according to their tilted mass functions pi,t , i = 1, . . . , n, the importance sampling estimator of θ = P{S ! a} is θˆ = I {S ! a}e−t S

n < $ i=1

pi et + 1 − pi

%

Since pi,t is the mass function of a Bernoulli random variable with parameter pi et / ( pi et + 1 − pi ) it follows that 7 n 8 n & & pi et Xi = Et pi et + 1 − pi i=1

i=1

The value of t that makes the preceding equal to a can be numerically approximated and then utilized in the simulation. As an illustration, suppose that n = 20, pi = 0.4, and a = 16. Then E t [S] = 20

0.4et 0.4et + 0.6

Setting this equal to 16 yields, after a little algebra, ∗

et = 6 Thus, if we generate the Bernoullis using the parameter ∗

0.4et = 0.8 ∗ 0.4et + 0.6

then because

M(t ∗ ) = (0.4et + 0.6)20 and e−t

∗S

= (1/6) S

we see that the importance sampling estimator is θˆ = I {S ! 16}(1/6) S 320 It follows from the preceding that θˆ # (1/6)16 320 = 81/216 = 0.001236

694

Introduction to Probability Models

That is, on each iteration the value of the estimator is between 0 and 0.001236. Since, in this case, θ is the probability that a binomial random variable with parameters 20, 0.4 is at least 16, it can be explicitly computed with the result θ = 0.000317. Hence, the raw simulation estimator I , which on each iteration takes the value 0 if the sum of the Bernoullis with parameter 0.4 is less than 16 and takes the value 1 otherwise, will have variance Var(I ) = θ (1 − θ ) = 3.169 × 10−4 On the other hand, it follows from the fact that 0 # θˆ # 0.001236 that (see Exercise 33) Var(θˆ ) # 2.9131 × 10−7

"

Example 11.24 Consider a single-server queue in which the times between successive customer arrivals have density function f and the service times have density g. Let Dn denote the amount of time that the nth arrival spends waiting in queue and suppose we are interested in estimating α = P{Dn ! a} when a is much larger than E[Dn ]. Rather than generating the successive interarrival and service times according to f and g, respectively, they should be generated according to the densities f −t and gt , where t is a positive number to be determined. Note that using these distributions as opposed to f and g will result in smaller interarrival times (since −t < 0) and larger service times. Hence, there will be a greater chance that Dn > a than if we had simulated using the densities f and g. The importance sampling estimator of α would then be αˆ = I {Dn > a}et(Sn −Yn ) [M f (−t)Mg (t)]n where Sn is the sum of the first n interarrival times, Yn is the sum of the first n service times, and M f and Mg are the moment generating functions of the densities f and g, respectively. The value of t used should be determined by experimenting with a variety of different choices. "

11.7

Determining the Number of Runs

Suppose that we are going to use simulation to generate r independent and identically distributed random variables Y (1) , . . . , Y (r ) having mean µ and variance σ 2 . We are then going to use Y (1) + · · · + Y (r ) Y¯r = r as an estimate of µ. The precision of this estimate can be measured by its variance Var(Y¯r ) = E[(Y¯r − µ)2 ] = σ 2 /r

Simulation

695

Hence, we would want to choose r , the number of necessary runs, large enough so that σ 2 /r is acceptably small. However, the difficulty is that σ 2 is not known in advance. To get around this, you should initially simulate k runs (where k ! 30) and then use the simulated values Y (1) , . . . , Y (k) to estimate σ 2 by the sample variance k & i=1

(Y (i) − Y¯k )2 /(k − 1)

Based on this estimate of σ 2 the value of r that attains the desired level of precision can now be determined and an additional r − k runs can be generated.

11.8 11.8.1

Generating from the Stationary Distribution of a Markov Chain Coupling from the Past

Consider an irreducible Markov chain with states 1, . . . , m and transition probabilities Pi, j and suppose we want to generate the value of a random variable whose distribution is that of the stationary distribution of this Markov chain. Whereas we could approximately generate such a random variable by arbitrarily choosing an initial state, simulating the resulting Markov chain for a large fixed number of time periods, and then choosing the final state as the value of the random variable, we will now present a procedure that generates a random variable whose distribution is exactly that of the stationary distribution. If, in theory, we generated the Markov chain starting at time −∞ in any arbitrary state, then the state at time 0 would have the stationary distribution. So imagine that we do this, and suppose that a different person is to generate the next state at each of these times. Thus, if X (−n), the state at time −n, is i, then person −n would generate a random variable that is equal to j with probability Pi, j , j = 1, . . . , m, and the value generated would be the state at time −(n −1). Now suppose that person −1 wants to do his random variable generation early. Because he does not know what the state at time −1 will be, he generates a sequence of random variables N−1 (i), i = 1, . . . , m, where N−1 (i), the next state if X (−1) = i, is equal to j with probability Pi, j , j = 1, . . . , m. If it results that X (−1) = i, then person −1 would report that the state at time 0 is S−1 (i) = N−1 (i), i = 1, . . . , m (That is, S−1 (i) is the simulated state at time 0 when the simulated state at time −1 is i.) Now suppose that person −2, hearing that person −1 is doing his simulation early, decides to do the same thing. She generates a sequence of random variables N−2 (i), i = 1, . . . , m, where N−2 (i) is equal to j with probability Pi, j , j = 1, . . . , m. Consequently, if it is reported to her that X (−2) = i, then she will report that X (−1) = N−2 (i). Combining this with the early generation of person −1 shows

696

Introduction to Probability Models

that if X (−2) = i, then the simulated state at time 0 is S−2 (i) = S−1 (N−2 (i)), i = 1, . . . , m Continuing in the preceding manner, suppose that person −3 generates a sequence of random variables N−3 (i), i = 1, . . . , m, where N−3 (i) is to be the generated value of the next state when X (−3) = i. Consequently, if X (−3) = i then the simulated state at time 0 would be S−3 (i) = S−2 (N−3 (i)), i = 1, . . . , m Now suppose we continue the preceding, and so obtain the simulated functions S−1 (i), S−2 (i), S−3 (i), . . . , i = 1, . . . , m Going backward in time in this manner, we will at some time, say −r , have a simulated function S−r (i) that is a constant function. That is, for some state j, S−r (i) will equal j for all states i = 1, . . . , m. But this means that no matter what the simulated values from time −∞ to −r , we can be certain that the simulated value at time 0 is j. Consequently, j can be taken as the value of a generated random variable whose distribution is exactly that of the stationary distribution of the Markov chain. Example 11.25 Consider a Markov chain with states 1, 2, 3 and suppose that simulation yielded the values ⎧ ⎨3, if i = 1 N−1 (i) = 2, if i = 2 ⎩ 2, if i = 3

and

Then

If

then

⎧ ⎨1, N−2 (i) = 3, ⎩ 1,

if i = 1 if i = 2 if i = 3

⎧ ⎨3, N−3 (i) = 1, ⎩ 1,

if i = 1 if i = 2 if i = 3

⎧ ⎨3, if i = 1 S−2 (i) = 2, if i = 2 ⎩ 3, if i = 3

⎧ ⎨3, S−3 (i) = 3, ⎩ 3,

if i = 1 if i = 2 if i = 3

Therefore, no matter what the state is at time −3, the state at time 0 will be 3.

"

Simulation

697

Remark The procedure developed in this section for generating a random variable whose distribution is the stationary distribution of the Markov chain is called coupling from the past.

11.8.2

Another Approach

Consider a Markov chain whose state space is the nonnegative integers. Suppose the chain has stationary probabilities, and denote them by πi , i ! 0. We now present another way of simulating a random variable whose distribution is given by the πi , i ! 0, which can be utilized if the chain satisfies the following property. Namely, that for some state, which we will call state 0, and some positive number α Pi,0 ! α > 0 for all states i. That is, whatever the current state, the probability that the next state will be 0 is at least some positive value α. To simulate a random variable distributed according to the stationary probabilities, start by simulating the Markov chain in the obvious manner. Namely, whenever the chain is in state i, generate a random variable that is equal to j with probability Pi, j , j ! 0, and then set the next state equal to the generated value of this random variable. In addition, however, whenever a transition into state 0 occurs a coin, whose probability of coming up heads depends on the state from which the transition occurred, is flipped. Specifically, if the transition into state 0 was from state i, then the coin flipped has probability α/Pi,0 of coming up heads. Call such a coin an i-coin, i ! 0. If the coin comes up heads then we say that an event has occurred. Consequently, each transition of the Markov chain results in an event with probability α, implying that events occur at rate α. Now say that an event is an i-event if it resulted from a transition out of state i; that is, an event is an i-event if it resulted from the flip of an i-coin. Because πi is the proportion of transitions that are out of state i, and each such transition will result in an i-event with probability α, it follows that the rate at which i-events occur is απi . Therefore, the proportion of all events that are i-events is απi /α = πi , i ! 0. Now, suppose that X 0 = 0. Fix i, and let I j equal 1 if the j th event that occurs is an i-event, and let I j equal 0 otherwise. Because an event always leaves the chain in state 0 it follows that I j , j ! 1, are independent and identically distributed random variables. Because the proportion of the I j that are equal to 1 is πi , we see that πi = lim

n→∞

I1 + . . . + In n

= E[I1 ] = P(I1 = 1)

where the second equality follows from the strong law of large numbers. Hence, if we let T = min{n > 0 : an event occurs at time n}

denote the time of the first event, then it follows from the preceding that πi = P(I1 = 1) = P(X T −1 = i)

698

Introduction to Probability Models

As the preceding is true for all states i, it follows that X T −1 , the state of the Markov chain at time T − 1, has the stationary distribution.

Exercises *1. Suppose it is relatively easy to simulate from the distributions Fi , i = 1, 2, . . . , n. If n is small, how can we simulate from F(x) =

n &

Pi Fi (x),

Pi ! 0,

i=1

& i

Pi = 1?

Give a method for simulating from ⎧ −2x + 2x ⎪ ⎪1 − e , 0 k

(c) Pk = p 0 (1 − p)Pk + p k

720

Introduction to Probability Models

(d) P3 = .62 = .36

P4 = .4P4 + .24P3 ⇒ P4 = .144 P5 = .4P5 + .24P4 + .144P3 ⇒ P5 = .144 P6 = .4P6 + .24P5 + .144P4 ⇒ P6 = .64(.144)

P7 = .4P7 + .24P6 + .144P5 ⇒ P7 = .4(.144) P8 = .4P8 + .24P7 + .144P6 ⇒ P8 = (1.96)(.144)(.16) = 0.0451584

55. Conditioning on the result of the trial following the first time that there have been k − 1 successes in a row gives Mk = Mk−1 + p(1) + (1 − p)Mk Hence, Mk = 1 +

1 p Mk−1 ,

yielding that, with α = 1/ p

Mk = 1 + α(1 + α Mk−2 )

= 1 + α + α 2 Mk−2 = 1 + α + α 2 + α 3 Mk−3 =

k−1 %

αi

i=0

58. (a) r /λ; (b) E[Var(N |Y )] + Var(E[N |Y ]) = E[Y ] + Var(Y ) = λr + λr2 λ (c) With p = λ+1 * P(N = n) = P(N = n|Y = y) f Y (y) dy * y n λe−λy (λy)r −1 dy = e−y n! (r − 1)! * λr = e−(λ+1)y y n+r −1 dy n!(r − 1)! * λr e−x x n+r −1 d x = n!(r − 1)!(λ + 1)n+r λr (n + r − 1)! = n!(r − 1)!(λ + 1)n+r # $ n +r −1 r = p (1 − p)n r −1 (d) The total number of failures before the r th success when each trial is independently a success with probability p is distributed as X − r where X , equal to the number of trials until the r th success, is negative binomial. Hence, # $ n +r −1 r P(X − r = n) = P(X = n + r ) = p (1 − p)n r −1

Solutions to Starred Exercises

60.

721

(a) Intuitive that f ( p) is increasing in p, since the larger p is the greater is the advantage of going first. (b) 1. (c) 21 since the advantage of going first becomes nil. (d) Condition on the outcome of the first flip: f ( p) = P{I wins | h} p + P{I wins | t}(1 − p) = p + [1 − f ( p)](1 − p) Therefore, f ( p) =

1 2− p

67. Part (a) is proven by noting that a run of j successive heads can occur within the first n flips in two mutually exclusive ways. Either there is a run of j successive heads within the first n − 1 flips; or there is no run of j successive heads within the first n − j − 1 flips, flip n − j is not a head, and flips n − j + 1 through n are all heads. Let A be the event that a run of j successive heads occurs within the first n, (n " j), flips. Conditioning on X , the trial number of the first non-head, gives the following % P(A|X = k) p k−1 (1 − p) P j (n) = k

= = =

j % k=1

j % i=1

j % i=1

P(A|X = k) p k−1 (1 − p) + P j (n − k) p k−1 (1 − p) +

∞ %

k= j+1

∞ %

k= j+1

P(A|X = k) p k−1 (1 − p)

p k−1 (1 − p)

P j (n − k) p k−1 (1 − p) + p j

73. Condition on the value of the sum prior to going over 100. In all cases the most likely value is 101. (For instance, if this sum is 98 then the final sum is equally likely to be either 101, 102, 103, or 104. If the sum prior to going over is 95, then the final sum is 101 with certainty.) 84. Suppose in Example 3.32 that a point is only won if the winner of the rally was the server of that rally. (a) If A is currently serving, what is the probability that A wins the next point? (b) Explain how to obtain the final score probabilities. 93. (a) By symmetry, for any value of (T1 , . . . , Tm ), the random vector (I1 , . . . , Im ) is equally likely to be any of the m! permutations.

722

Introduction to Probability Models

(b) E[N ] =

m %

E[N |X = i]P{X = i}

i=1

m 1 % E[N |X = i] m i=1 : 9m−1 ( 1 %' E[Ti ] + E[N ] + E[Tm−1 ] = m

=

i=1

where the final equality used the independence of X and Ti . Therefore, E[N ] = E[Tm−1 ] + (c) E[Ti ] =

i %

(d) E[N ] =

m−1 %

= = =

j=1

m−1 % j=1

m−1 %

E[Ti ]

m−1 i

%% m m + m+1− j m+1− j i=1 j=1

m−1 % m−1 % m m + m+1− j m+1− j j=1 i= j

m + m+1− j

m−1 %# j=1

i=1

m m+1− j

j=1

j=1

m−1 %

m−1 % j=1

m(m − j) m+1− j

m m(m − j) + m+1− j m+1− j

$

= m(m − 1) 97. Let X be geometric with parameter p. To compute Var (X ), we will use the conditional variance formula, conditioning on the outcome of the first trial. Let I equal 1 if the first trial is a success, and let it equal 0 otherwise. If I = 1, then X = 1; since the variance of a constant is 0, this gives Var(X |I = 1) = 0 On the other hand, if I = 0 then the conditional distribution of X given that I = 0 is the same as the unconditional distribution of 1 (the first trial) plus a geometric with parameter p (the number of additional trials needed for a success). Therefore, Var(X |I = 0) = Var(X ) yielding E[Var(X |I )] = Var(X |I = 1)P(I = 1) + Var(X |I = 0)P(I = 0) = (1 − p)Var(X )

Solutions to Starred Exercises

723

Similarly, E[X |I = 1] = 1,

E[X |I = 0] = 1 + E[X ] = 1 +

1 p

which can be written as 1 (1 − I ) p

E[X |I ] = 1 + yielding Var(E[X |I ]) =

1 1 1− p Var(I ) = 2 p(1 − p) = 2 p p p

The conditional variance formula now gives Var(X ) = E[Var(X |I )] + Var(E[X |I ]) 1− p = (1 − p)Var(X ) + p or Var(X ) =

1− p p2

98. E[N S] = E[E[N S|N ]] = E[N E[S|N ]] = E[N 2 E[X ]] = E[X ]E[N 2 ]. Hence, Cov(N , S) = E[X ]E[N 2 ] − (E[N ])2 E[X ] = E[X ]Var(N ) 99.

(a) p k , (b) In order for N = k + r the pattern must not have occurred in the first r − 1 trials, trial r must be a failure, and trials r +1, . . . , r +k must all be successes. ∞ ∞ % % (c) 1 − P(N = k) = P(N = k +r ) = P(N > r − 1)qp k = E[N ]qp k r =1

r =1

Chapter 4 1. P01 = 1, P10 = 19 , P11 =

P12 =

4 9, 4 9,

P21 = 49 , P22 = P23 =

4 9 1 9

P32 = 1

¯ signifies that the ¯ 1, ¯ 2}, ¯ where state i(i) 4. Let the state space be S = {0, 1, 2, 0, present value is i, and the present day is even (odd). 10 = .5078. 9. P0,3 In a sequence of independent flips of a fair coin that comes up heads with probability .6, what is the probability that there is a run of 3 consecutive heads within the first 10 flips?

724

Introduction to Probability Models

16. If Pi j were (strictly) positive, then P jin would be 0 for all n (otherwise, i and j would communicate). But then the process, starting in i, has a positive probability of at least Pi j of never returning to i. This contradicts the recurrence of i. Hence Pi j = 0. 21. The transition probabilities are Pi, j

+ 1 − 3α, = α,

if j = i if j ̸= i

By symmetry, 1 (1 − Piin ), 3

Pinj =

j ̸= i

So, let us prove by induction that Pi,n j =

,1 4 1 4

+ 34 (1 − 4α)n −

1 n 4 (1 − 4α)

if j = i

if j ̸= i

As the preceding is true for n = 1, assume it for n. To complete the induction proof, we need to show that Pi,n+1 j =

,1 4 1 4

+ 43 (1 − 4α)n+1 −

1 n+1 4 (1 − 4α)

if j = i

if j ̸= i

Now, n+1 n Pi,i = Pi,i Pi,i +

%

Pi,n j P j,i

j̸=i

$ # $ 1 1 1 3 n n + (1 − 4α) (1 − 3α) + 3 − (1 − 4α) α = 4 4 4 4 1 3 = + (1 − 4α)n (1 − 3α − α) 4 4 1 3 = + (1 − 4α)n+1 4 4 #

By symmetry, for j ̸= i Pin+1 = j

1 1 1 (1 − Piin+1 ) = − (1 − 4α)n+1 3 4 4

and the induction is complete. By letting n → ∞ in the preceding, or by using that the transition probability matrix is doubly stochastic, or by just using a symmetry argument, we obtain that πi = 1/4, i = 1, 2, 3, 4.

Solutions to Starred Exercises

725

27. (a) It is a Markov chain because each individual’s state the next period depends only on its current state and not on any information about earlier times. (b) If i of the N individuals are currently active, then the number of actives in the next period is the sum of two independent random variables; Ri , the number of the i currently active who remain active in the next period; and Bi , the number of the N − i inactives who become active in the next period. Because Ri is binomial (i, α), and Bi is binomial (N − i, b), where b = 1 − β, we see that E[X n |X n−1 = i] = iα + (N − i)(1 − β) = N (1 − β) + (α + β − 1)i Hence, E[X n |X n−1 ] = N (1 − β) + (α + β − 1)X n−1 giving that E[X n ] = N (1 − β) + (α + β − 1)E[X n−1 ] Letting a = N (1 − β), b = α + β − 1, the preceding gives E[X n ] = a + bE[X n−1 ] = a + b(a + bE[X n−2 ]) = a + ba + b2 E[X n−2 ] = a + ba + b2 a + b3 E[X n−3 ]

Continuing this, we arrive at / . ; < E[X n ] = a 1 + b + · · · + bn−1 + bn E X 0 Thus,

/ . E[X n |X 0 = i] = a 1 + b + · · · + bn−1 + bn i

Note that

lim E[X n ] =

n→∞

1−β a =N 1−b 2−α−β

(c) With Ri , Bi as previously defined ' ( Pi, j = P Ri + Bi = j # $ % ' ( i = P Ri + Bi = j|Ri = k α i (1 − α)i−k k k # $ % #N − i $ i (1 − β) j−k β N −i− j+k α i (1 − α)i−k = j −k k k

where

'm ( r

= 0 if r < 0 or r > m.

726

Introduction to Probability Models

(d) Suppose N = 1. Then, with 1 standing for active and 0 for inactive, the limiting probabilities are such that ' ( π0 = π0 β + π1 1 − α ' ( π1 = π0 1 − β + π1 α π0 + π1 = 1 Solving yields π1 =

1−β 1−α , π0 = 2−α−β 2−α−β

Now consider the case of population size N . Because each member will, in steady state, be active with probability π1 and because each of the members changes states independently of each other it follows that the steady state number of actives has a binomial (N , π1 ) distribution. Hence, the long-run proportion of time that exactly j people are active is $j # $N − j # $# 1−α 1−β N π j (N ) = j 2−α−β 2−α−β 1−α Note that the steady state expected number of actives is N 2−α−β , in accord with what we saw in part (b).

32. With the state being the number of on switches this is a three-state Markov chain. The equations for the long-run proportions are 9 1 1 π0 + π1 + π2 , 16 4 16 3 1 3 π1 = π0 + π1 + π2 , 8 2 8 π0 + π1 + π2 = 1

π0 =

This gives the solution π0 =

2 , 7

π1 =

3 , 7

π2 =

2 7

38. (a) .4 p + .6 p 2 ; 2 + 2P 2 = .32 + 1.36 = 1.68, (b) P2,1 2.2 (c) π1 p + π2 p 2 = (1/3) p + (2/3) p 2

Capa plays either one or two chess games every day, with the number of games that she plays on successive days being a Markov chain with transition probabilities P1,1 = .2,

P1,2 = .8

P2,1 = .4,

P2,2 = .6

Capa wins each game with probability p. Suppose she plays 2 games on Monday.

Solutions to Starred Exercises

727

(a) What is the probability that she wins all the games she plays on Tuesday? (b) What is the expected number of games that she plays on Wednesday? (c) In the long run, on what proportion of days does Capa win all her games. 39. (a) Follows by symmetry because in any state there are an infinite number of states that are smaller and an infinite number that are larger and at each stage one & likely to go either to the next higher or next lower state. & is equally π = (b) 1 = 0 or is ∞ if π1 > 0. i i i π1 which is 0 if π& (c) Because there is no solution of i πi = 1, we can conclude that πi = π1 = 0 and so the chain is null recurrent. 40. The chain is doubly stochastic and so π1 = 1/12. Hence, 1/πi = 12. j−1 j−1 % % = P(enters j directly from i) = ei Pi, j e j 41. i=0

i=0

e1 = 1/3 e2 = 1/3 + 1/3(1/3) = 4/9

e3 = 1/3 + 1/3(1/3) + 4/9(1/3) = 16/27 e4 = 1/3(1/3) + 4/9(1/3) + 16/27(1/3) = 37/81 e5 = 4/9(1/3) + 16/27(1/3) + 37/81(1/3) = 158/243

47. {Yn , n " 1} is a Markov chain with states (i, j). + 0, if j ̸= k P(i, j),(k,l) = if j = k P jl , where P jl is the transition probability for {X n }. lim P{Yn = (i, j)} = lim P{X n = i, X n+1 = j}

n→∞

n

= lim[P{X n = i}Pi j ] n

= πi Pi j

60. (a) Let Pi be the probability that state 3 is entered before state 4 given the initial state is i, i = 1, 2. Then, conditioning yields P1 = .4P1 + .3P2 + .2 P2 = .2P1 + .2P2 + .2 yielding that P1 = 11/21. (b) Letting m i denote the mean number of transitions until either state 3 or state 4 is entered, starting in state i. Then m 1 = 1 + .4m 1 + .3m 2

m 2 = 1 + .2m 1 + .2m 2 yielding that m 1 = 55/21.

728

Introduction to Probability Models

62. It is easy to verify that the stationary probabilities are πi = time to return to the initial position is n + 1. % % 68. (a) % πi Q i j = π j P ji = π j P ji = π j i

i

1 n+1 . Hence, the mean

i

(b) Whether persuing the sequence of states in the forward direction of time or in the reverse direction, the proportion of time the state is i will be the same.

Chapter 5 5. P(Y = n) = P(n − 1 < X < n) = e−λ(n−1) − e−λn = (e−λ )n−1 (1 − e−λ ) 7. P{X 1 < X 2 | min(X 1 , X 2 ) = t} = = =

P{X 1 < X 2 , min(X 1 , X 2 ) = t} P{min(X 1 , X 2 ) = t}

P{X 1 = t, X 2 > t} P{X 1 = t, X 2 > t} + P{X 2 = t, X 1 > t} f 1 (t)[1 − F2 (t)] f 1 (t)[1 − F2 (t)] + f 2 (t)[1 − F1 (t)]

Dividing through by [1 − F1 (t)][1 − F2 (t)] yields the result. (Of course, f i and Fi are the density and distribution function of X i , i = 1, 2.) To make the preceding derivation rigorous, we should replace “= t” by ∈ (t, t + ε) throughout and then let ε → 0. 8. Exponential with rate λ + µ. 10. (a) E[M X |M = X ] = E[M 2 |M = X ] = E[M 2 ] 2 = (λ + µ)2 (b) By the memoryless property of exponentials, given that M = Y, X is distributed as M + X ′ where X ′ is an exponential with rate λ that is independent of M. Therefore, E[M X |M = Y ] = E[M(M + X ′ )]

= E[M 2 ] + E[M]E[X ′ ] 1 2 + = (λ + µ)2 λ(λ + µ) µ λ (c) E[M X ] = E[M X |M = X ] + E[M X |M = Y ] λ+µ λ+µ 2λ + µ = λ(λ + µ)2

Solutions to Starred Exercises

729

Therefore, Cov(X, M) =

λ λ(λ + µ)2

µb λb a 14. (a) λaλ+λ b µa +λb µa +µb (b) Let F be the time of the first departure. Write F = T + A where T is the time of the first arrival and A is the additional time from then until the first departure. First take expectations and then condition on who arrives first to obtain

E[F] =

1 λa λb + E[A|a] + E[A|b] λa + λb λa + λb λa + λb

Now use E[A|a] =

1 1 λb + µa + λb µa + λb µa + µb

E[A|b] =

1 1 λa + µb + λa µb + λa µa + µb

and

18. (a) 1/(2µ). (b) 1/(4µ2 ), since the variance of an exponential is its mean squared. (c) and (d). By the lack of memory property of the exponential it follows that A, the amount by which X (2) exceeds X (1) , is exponentially distributed with rate µ and is independent of X (1) . Therefore, 1 1 + 2µ µ 1 1 5 + 2 = Var(X (2) ) = Var(X (1) + A) = 4µ2 µ 4µ2 E[X (2) ] = E[X (1) + A] =

19 Using that the winning time is exponential with rate r ≡ λa + λb , independent of who wins, gives that with X equal to amount that runner A wins * λa r λa E[X ] = R e−αt r e−r t dt = R λa + λb λa + λb r + α

23. (a) 21 . (b) ( 21 )n−1 . Whenever battery 1 is in use and a failure occurs the probability is 21 that it is not battery 1 that has failed. (c) ( 21 )n−i+1 , i > 1. (d) T is the sum of n − 1 independent exponentials with rate 2µ (since each time a failure occurs the time until the next failure is exponential with rate 2µ). (e) Gamma with parameters n − 1 and 2µ.

730

Introduction to Probability Models

34. (c) (d)

µA µB λ+µ A +µ B λ+µ B

+

λ λ λ+µ A +µ B λ+µ B

µB µA λ+µ A +µ B λ+µ A

35. The axioms are immediate. For instance, P(Ns (t + h) − Ns (t) = 1) = P(N (s + t + h) − N (s + t) = 1) = λh + o(h) ⎡ ⎤ N (t) ) E[S(t)|N (t) = n] = sE ⎣ X i |N (t) = n ⎦

36.

i=1

= sE = sE

2 n )

i=1 2 n )

X i |N (t) = n Xi

i=1

3

3

= s(E[X ])n = s(1/µ)n

Thus, E[S(t)] = s

%

= se

(1/µ)n e−λt (λt)n /n!

n

−λt

%

(λt/µ)n /n!

n

= se−λt+λt/µ

By the same reasoning and

E[S 2 (t)|N (t) = n] = s 2 (E[X 2 ])n = s 2 (2/µ2 )n 2

E[S 2 (t)] = s 2 e−λt+2λt/µ

40. The easiest way is to use Definition 5.3. It is easy to see that {N (t), t " 0} will also possess stationary and independent increments. Since the sum of two independent Poisson random variables is also Poisson, it follows that N (t) is a Poisson random variable with mean (λ1 + λ2 )t. −2 57. (a) e . (b) 2 P.M. (c) 1 − 5e−4 . 58. (a) It has the distribution of N (t) + 1 where N (t), t > 0 is a Poisson process with rate λ. Hence, it is distributed as 1 plus a Poisson with mean λt. & n+1 e−λt (λt)n /n! = pe−λt(1− p) (b) E[ p N ] = ∞ n=0 p 1 λ pe−λt(1− p) (t

(c) t +

(d)

+ λ1 )

Solutions to Starred Exercises

60. (a) (b)

731

1 9. 5 9.

64. (a) Since, given N (t), each arrival is uniformly distributed on (0, t) it follows that * t t ds = N (t) E[X |N (t)] = N (t) (t − s) t 2 0 (b) Let U1 , U2 , . . . be independent uniform (0, t) random variables. Then 2 n 3 % Var(X |N (t) = n) = Var (t − Ui ) i=1

= n Var(Ui ) = n

t2 12

(c) By parts (a) and (b) and the conditional variance formula, $ ! " # N (t)t 2 N (t)t +E Var(X ) = Var 2 12

λtt 2 λtt 2 λt 3 + = 4 12 3 8 s+t0 8t ¯ 69. Poisson with mean λ t0 F(y)dy + λ 00 F(y)dy. *79. It is a nonhomogeneous Poisson process with intensity function p(t)λ(t), t > 0. 84. There is a record whose value is between t and t + dt if the first X larger than t lies between t and t + dt. From this we see that, independent of all record values less than t, there will be one between t and t + dt with probability λ(t) dt where λ(t) is the failure rate function given by =

λ(t) =

f (t) 1 − F(t)

Since the counting process of record values has, by the preceding, independent increments we can conclude (since there cannot be multiple record values because the X i are continuous) that it is a nonhomogeneous Poisson process with intensity function λ(t). When f is the exponential density, λ(t) = λ and so the counting process of record values becomes an ordinary Poisson process with rate λ. 91. To begin, note that , n % P X1 > X i = P{X 1 > X 2 }P{X 1 − X 2 > X 3 |X 1 > X 2 } 2

×P{X 1 − X 2 − X 3 > X 4 |X 1 > X 2 + X 3 } · · · ×P{X 1 − X 2 · · · − X n−1 > X n |X 1 > X 2 + · · · + X n−1 } # $n−1 1 = by lack of memory 2

732

Introduction to Probability Models

Hence, P

,

M>

n % i=1

Xi − M

-

+ n % 7 % = P Xi > Xj = i=1

j̸=i

n 2n−1

98. (a) Start with D(t + h) = D(t) + D(t, t + h)

where D(t, t + h) is the discounted value of claims that occur between times t and t + h. Take expectations, and then condition on X , the number of claims made between times t and t + h to obtain M(t + h) = M(t) + E[D(t, t + h)|X = 1]λh + o(h) = M(t) + µe−αt λh + o(h)

o(h) (b) M(t + h) − M(t) = µe−αt λ + h h and let h go to 0. (c) This is immediate upon integration. * t y 99. P(X > t|M1 = y) = exp{− (λ + ye−αs )ds} = exp{−λt − (1 − e−αt )} α 0 Hence,

P(X > t) =

*

∞ 0

exp{−λt −

y (1 − e−αt )} f (y) dy α

where f is the density function of the value of a mark.

Chapter 6 2. Let N A (t) be the number of organisms in state A and let N B (t) be the number of organisms in state B. Then {N A (t), N B (t)} is a continuous-Markov chain with ν{n,m} = αn + βm αn P{n,m},{n−1,m+1} = αn + βm βm P{n,m},{n+2,m−1} = αn + βm

4. Let N (t) denote the number of customers in the station at time t. Then {N (t)} is a birth and death process with

7.

λn = λαn ,

µn = µ

(a) Yes! (b) For n = (n 1 , . . . , n i , n i+1 , . . . , n k−1 ) let

Si (n) = (n 1 , . . . , n i − 1, n i+1 + 1, . . . , n k−1 ),

i = 1, . . . , k − 2

Solutions to Starred Exercises

733

Sk−1 (n) = (n 1 , . . . , n i , n i+1 , . . . , n k−1 − 1), S0 (n) = (n 1 + 1, . . . , n i , n i+1 , . . . , n k−1 ).

Then qn,Si (n) = n i µ, qn,S0 (n) = λ 11.

i = 1, . . . , k − 1

(b) Follows from the hint about using the lack of memory property and the fact that εi , the minimum of j − (i − 1) independent exponentials with rate λ, is exponential with rate ( j − i − 1)λ. (c) From parts (a) and (b) 0 1 P{T1 + · · · + T j # t} = P max X i # t = (1 − e−λt ) j 1!i! j

(d) With all probabilities conditional on X (0) = 1, P1 j (t) = P{X (t) = j} = P{X (t) " j} − P{X (t) " j + 1} = P{T1 + · · · + T j # t} − P{T1 + · · · + T j+1 # t} (e) The sum of i independent geometrics, each having parameter p = e−λt , is a negative binomial with parameters i, p. The result follows since starting with an initial population of i is equivalent to having i independent Yule processes, each starting with a single individual. 16. Let the state be 2: an acceptable molecule is attached 0: no molecule attached 1: an unacceptable molecule is attached. Then, this is a birth and death process with balance equations µ1 P1 = λ(1 − α)P0

µ2 P2 = λα P0 & Since 20 Pi = 1, we get

" ! 1 − α µ2 −1 µ2 λαµ1 P2 = 1 + + = λα α µ1 λαµ1 + µ1 µ2 + λ(1 − α)µ2

where P2 is the percentage of time the site is occupied by an acceptable molecule. The percentage of time the site is occupied by an unacceptable molecule is P1 =

1 − α µ2 λ(1 − α)µ2 P1 = α µ1 λαµ1 + µ1 µ2 + λ(1 − α)µ2

734

Introduction to Probability Models

19. There are four states. Let state 0 mean that no machines are down, state 1 that machine 1 is down and 2 is up, state 2 that machine 1 is up and 2 is down, and state 3 that both machines are down. The balance equations are as follows: (λ1 + λ2 )P0 = µ1 P1 + µ2 P2 (µ1 + λ2 )P1 = λ1 P0

(λ1 + µ2 )P2 = λ2 P0 + µ1 P3 µ1 P3 = λ2 P1 + λ1 P2 P0 + P1 + P2 + P3 = 1

The equations are easily solved and the proportion of time machine 2 is down is P2 + P3 . 24. We will let the state be the number of taxis waiting. Then, we get a birth and death process with λn = 1, µn = 2. This is an M/M/1. Therefore:

1 1 = = 1. µ−λ 2−1 (b) The proportion of arriving customers that gets taxis is the proportion of arriving customers that find at least one taxi waiting. The rate of arrival of such customers is 2(1 − P0 ). The proportion of such arrivals is therefore $ # λ 1 λ 2(1 − P0 ) = 1 − P0 = 1 − 1 − = = 2 µ µ 2 (a) Average number of taxis waiting =

y

y

28. Let Pixj , vix denote the parameters of the X (t) and Pi j , vi of the Y (t) process; and y let the limiting probabilities be Pix , Pi , respectively. By independence we have that for the Markov chain {X (t), Y (t)} its parameters are y

v(i,l) = vix + vl , vx P(i,l)( j,l) = x i y Pixj , vi + vl y

P(i,l)(i,k) = and

vl

y

vix + vl

y

Plk ,

y

lim P{(X (t), Y (t)) = (i, j)} = Pix P j

t→∞

Hence, we need to show that y

y

Pix Pl vix Pixj = P jx Pl v xj P jix

(That is, the rate from (i, l) to ( j, l) equals the rate from ( j, l) to (i, l).) But this follows from the fact that the rate from i to j in X (t) equals the rate from j to i; that is, Pix vix Pixj = P jx v xj P jix

The analysis is similar in looking at pairs (i, l) and (i, k).

Solutions to Starred Exercises

735

33. Suppose first that the waiting room is of infinite size. Let X i (t) denote the number of customers at server i, i = 1, 2. Then since each of the M/M/1 processes {X 1 (t)} is time reversible, it follows from Exercise 28 that the vector process {(X 1 (t), (X (t)), t " 0} is a time reversible Markov chain. Now the process of interest is just the truncation of this vector process to the set of states A where A = {(0, m): m # 4} ∪ {(n, 0): n # 4} ∪ {(n, m): nm > 0, n + m # 5} Hence, the probability that there are n with server 1 and m with server 2 is $ # $m # $ # $n # λ1 λ2 λ2 λ1 Pn,m = k 1− 1− µ1 µ1 µ2 µ2 # $n # $m λ2 λ1 =C , (n, m) ∈ A µ1 µ2 The constant C is determined from % Pn,m = 1

37.

where the sum is over all (n, m) in A. (a) The state is (n 1 , . . . , n k ) if there are n i type i patients in the hospital, for all i = 1, . . . , k. (b) It is a M/M/∞ birth and death process, and thus time reversible. (c) Because Ni (t), t " 0 are independent processes for i = 1, . . . , k, the vector process is a time reversible continuous time Markov chain. k ) e−λi /µi (λi /µi )n i /n i ! P(n 1 , . . . , n k ) = (d) i=1

(e) As a truncation of a time reversible continuous time Markov chain, it has stationary probabilites P(n 1 , . . . , n k ) = K where A = {(n 1 , . . . , n k ) : K

%

k )

(n 1 ,...,n k )∈A i=1

k ) i=1

&k

e−λi /µi (λi /µi )n i /n i !, (n 1 , . . . , n k ) ∈ A

i=1 n i wi

# C}, and K is such that

e−λi /µi (λi /µi )n i /n i ! = 1

(f) With ri equal to the rate at which type i patients are admitted, % λi P(n 1 , . . . , n k ) ri = (g)

&k

i=1 ri /;

(n 1 ,...,n i +1,...,n k )∈A

&k

i=1 λi

736

38.

Introduction to Probability Models

(a) The state is the set of idle servers. (b) For i ∈ S, j ∈ / S, the infinitesimal rates of the chain are q S,S−i = λ/|S|, q S,S+ j = µ j where |S| is the number of elements in S. The time reversibility equations are P(S)λ/|S| = P(S − i)µi which has a solution P(S) = P0 |S|!

)

(µk /λ)

k∈S

where P0 , the probability there are no idle servers, is found from % ) P0 [1 + |S|! (µk /λ)] = 1 S

39.

k∈S

where the preceding sum is over all nonempty subsets of {1, . . . , n}.

(a) The state is (i 1 , . . . , i k ) if {i 1 , . . . , i k } is the set of idle servers, with i 1 having been idle the longest, i 2 the second longest, and so on. (b) and (c) For j ∈ / {i 1 , . . . , i k }, the infinitesimal rates of the chain are q(i1 ,...,ik ),(i1 ,...,ik−1 ) = λ, q(i1 ,...,ik ),(i1 ,...,ik , j) = µ j The time reversibility equations are P(i 1 , . . . , i k )λ = P(i 1 , . . . , i k−1 )µik giving the solution P(i 1 , . . . , i k ) =

µi1 · · · µik P(0) λk

where P(0) is the probability there are no idle servers. 40. The time reversible equations are P(i)

vj vi = P( j) n−1 n−1

yielding the solution

1/v j P( j) = &n i=1 1/vi

Hence, the chain is time reversible with long run proportions given by the preceding. 41. Show in Example 6.22 that the limiting probabilities satisfy Equations (6.33), (6.34), and (6.35).

Solutions to Starred Exercises

737

42. Because the stationary departure process from an M/M/1 queue is a Poisson process it follows that the number of customers with server 2 is the stationary probability of an M/M/1 system. 43. We make the conjecture that the reverse chain is a system of same type, except that the Poisson arrivals at rate λ arrive at server 3, then go to server 2, then to server 1, and then depart the system. Let ek be the 3-vector with 1 in position k and 0 elsewhere. With the state being i = (i 1 , i 2 , i 3 ) when that there are i j customers at server j for j = 1, 2, 3, the instantaneous transition rates of the chain are q(i, j,k),(i+1, j,k) = λ

q(i, j,k),(i−1, j+1,k) = µ1 , i > 0 q(i, j,k),(i, j−1,k+1) = µ2 , j > 0 q(i, j,k),(i, j,k−1) = µ3 , k > 0

whereas the conjectured instantaneous rates for the reversed chain are ∗ q(i, j,k),(i, j,k+1) = λ

∗ q(i, j,k),(i, j+1,k−1) = µ3 , k > 0

∗ q(i, j,k),(i+1, j−1,k) = µ2 , j > 0 q(i, j,k),(i−1, j,k) = µ1 , i > 0

The conjecture is correct if we can find probabilities P(i, j, k) that satisfy the reverse time equations when the preceding are the instantaneous rates for the reversed chain, and it is easy to check that P(i, j, k) = K 49.

#

λ µ1

$i #

λ µ2

$j #

λ µ3

$k

satisfy. (a) The matrix P∗ can be written as P∗ = I + R/v n and so Pi∗n j can be obtained by taking the i, j element of (I + R/v) , which gives the result when v = n/t. (b) Uniformization shows that Pi j (t) = E[Pi∗N j ], where N is independent of the Markov chain with transition probabilities Pi∗j and is Poisson distributed with mean vt. Since a Poisson random variable with mean vt has standard deviation (vt)1/2 , it follows that for large values of vt it should be near vt. (For instance, a Poisson random variable with mean 106 has standard deviation 103 and thus will, with high probability, be within 3000 of 106 .) Hence, since for fixed i and j, Pi∗m j should not vary much for values of m about vt where vt is large, it follows that, for large vt, ∗n E[Pi∗N j ] ≈ Pi j

where n = vt

738

Introduction to Probability Models

Chapter 7 3.

+ 1 − F(t − y), if y # t (a) P(N (t) = n|Sn = y) = 0, if y > t * ∞ (b) P(N (t) = n) = P(N (t) = n|Sn = y) f Sn (y) dy 0 * t = e−λ(t−y) λe−λy (λy)n−1 /(n − 1)! dy 0

* t e−λt λn = y n−1 dy (n − 1)! 0 e−λt (λt)n = n!!

6.

(a) Consider a Poisson process having rate λ and say that an event of the renewal process occurs whenever one of the events numbered r, 2r, 3r, . . . of the Poisson process occurs. Then P{N (t) " n} = P{nr or more Poisson events by t} ∞ % e−λt (λt)i /i! = (b) E[N (t)] = =

8.

∞ % n=1

i=nr

P{N (t) " n} =

∞ [i/r % %] i=r n=1

∞ ∞ % %

e−λt (λt)i /i!

n=1 i=nr

e−λt (λt)i /i! =

∞ % [i/r ]e−λt (λt)i /i! i=r

(a) The number of replaced machines by time t constitutes a renewal process. The time between replacements equals T , if the lifetime of new machine is " T ; x, if the lifetime of new machine is x, x < T . Hence, * T E[time between replacements] = x f (x) d x + T [1 − F(T )] 0

and the result follows by Proposition 3.1. (b) The number of machines that have failed in use by time t constitutes a renewal process. The mean time between in-use failures, E[F], can be calculated by conditioning on the lifetime of the initial machine as E[F] = E[E[F| lifetime of initial machine]]. Now + x, if x # T E[F| lifetime of machine is x] = T + E[F], if x > T Hence, E[F] =

*

T 0

x f (x) d x + (T + E[F])[1 − F(T )]

Solutions to Starred Exercises

739

or

E[F] = (∫_0^T x f(x) dx + T[1 − F(T)]) / F(T)

and the result follows from Proposition 3.1.

13. With Wi equal to your winnings in game i, i ≥ 1, and N the number of games played, Wald's equation yields

E[X] = E[N]E[W] = 0

With pi = P(X = i), p1 = 1/2 + 1/8 = 5/8, p−1 = 1/4, p−3 = 1/8, verifying that E[X] = 0.

18. We can imagine that a renewal corresponds to a machine failure, and each time a new machine is put in use its life distribution will be exponential with rate µ1 with probability p, and exponential with rate µ2 otherwise. Hence, if our state is the index of the exponential life distribution of the machine presently in use, then this is a two-state continuous-time Markov chain with intensity rates

q1,2 = µ1(1 − p),  q2,1 = µ2 p

Hence,

P11(t) = [µ2 p + µ1(1 − p) exp{−[µ1(1 − p) + µ2 p]t}] / [µ1(1 − p) + µ2 p]

with similar expressions for the other transition probabilities (P12(t) = 1 − P11(t), and P22(t) is the same with µ2 p and µ1(1 − p) switching places). Conditioning on the initial machine now gives

E[Y(t)] = p E[Y(t) | X(0) = 1] + (1 − p) E[Y(t) | X(0) = 2]
= p [P11(t)/µ1 + P12(t)/µ2] + (1 − p) [P21(t)/µ1 + P22(t)/µ2]

Finally, we can obtain m(t) from

µ[m(t) + 1] = t + E[Y(t)]

where µ = p/µ1 + (1 − p)/µ2 is the mean interarrival time.

22. (a) Let X denote the length of time that J keeps a car. Let I equal 1 if there is a breakdown by time T and equal 0 otherwise. Then

E[X] = E[X | I = 1](1 − e^{−λT}) + E[X | I = 0]e^{−λT}
= (T + 1/µ)(1 − e^{−λT}) + (T + 1/λ)e^{−λT}
= T + (1 − e^{−λT})/µ + e^{−λT}/λ

1/E[X] is the rate at which J buys a new car.
(b) Let W equal the total cost involved with purchasing a car. Then, with Y equal to the time of the first breakdown,

E[W] = ∫_0^∞ E[W | Y = y] λe^{−λy} dy
= C + ∫_0^T r(1 + µ(T − y) + 1) λe^{−λy} dy + ∫_T^∞ r λe^{−λy} dy
= C + r(2 − e^{−λT}) + r ∫_0^T µ(T − y) λe^{−λy} dy

J's long run average cost is E[W]/E[X].

23. (a) Say that a new cycle begins each time A wins a point. With N equal to the number of points in a cycle,

E[N] = 1 + qa/pb

where the preceding used that, starting with B serving, the number of points played until A wins a point is geometric with parameter pb. Hence, by renewal reward, the proportion of points won by A is 1/E[N] = pb/(pb + qa).
(b) (pa + pb)/2
(c) pb/(pb + qa) > (pa + pb)/2 is equivalent to pa qa > pb qb.
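The renewal-reward answer in (a) is easy to corroborate by simulating the scheme the solution describes — the winner of each point serves the next, with A winning with probability pa on its own serve and pb on B's serve. The numerical values below are arbitrary illustrative choices:

    import random

    pa, pb, points = 0.6, 0.5, 1_000_000
    a_serving, a_points = True, 0
    for _ in range(points):
        a_wins = random.random() < (pa if a_serving else pb)
        a_points += a_wins
        a_serving = a_wins            # the point winner serves next
    print(a_points / points, pb / (pb + (1 - pa)))   # empirical vs pb/(pb + qa)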

30. A(t)/t = (t − S_{N(t)})/t = 1 − S_{N(t)}/t = 1 − (S_{N(t)}/N(t)) (N(t)/t)

The result follows since S_{N(t)}/N(t) → µ (by the strong law of large numbers) and N(t)/t → 1/µ.

35. (a) We can view this as an M/G/∞ system where a satellite launching corresponds to an arrival and F is the service distribution. Hence,

P{X(t) = k} = e^{−λ(t)} [λ(t)]^k / k!

where λ(t) = λ ∫_0^t (1 − F(s)) ds.


(b) By viewing the system as an alternating renewal process that is on when there is at least one satellite orbiting, we obtain

lim P{X(t) = 0} = (1/λ) / (1/λ + E[T])

where T, the on time in a cycle, is the quantity of interest. From part (a),

lim P{X(t) = 0} = e^{−λµ}

where µ = ∫_0^∞ (1 − F(s)) ds is the mean time that a satellite orbits. Hence,

e^{−λµ} = (1/λ) / (1/λ + E[T])

so

E[T] = (1 − e^{−λµ}) / (λe^{−λµ})
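Part (a) is also easy to probe by simulation. In the sketch below the lifetimes are uniform(0, 2) — an arbitrary stand-in for F — so that λ ∫_0^t (1 − F(s)) ds = λ · 1 for t ≥ 2, and the empirical P{X(t) = 0} should be near e^{−λ}:

    import math, random

    lam, t = 1.0, 5.0
    runs, zeros = 100_000, 0
    for _ in range(runs):
        up, s = 0, random.expovariate(lam)
        while s < t:                              # Poisson launch times
            if s + random.uniform(0, 2) > t:      # still orbiting at time t?
                up += 1
            s += random.expovariate(lam)
        zeros += (up == 0)

    mean = lam * 1.0    # ∫_0^t (1 − F(s)) ds = ∫_0^2 (1 − s/2) ds = 1 for t ≥ 2
    print(zeros / runs, math.exp(-mean))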

42.

(a) Fe(x) = (1/µ) ∫_0^x e^{−y/µ} dy = 1 − e^{−x/µ}.
(b) Fe(x) = (1/c) ∫_0^x dy = x/c, 0 ≤ x ≤ c.
(c) You will receive a ticket if, starting when you park, an official appears within one hour. From Example 7.23 the time until the official appears has the distribution Fe which, by part (b), is the uniform distribution on (0, 2). Thus, the probability is equal to 1/2.

44. (a) Let Ni denote the number of passengers that get on bus i. If we interpret Xi as the reward incurred at time i then we have a renewal reward process whose ith cycle is of length Ni and has reward X_{N1+···+Ni−1+1} + · · · + X_{N1+···+Ni}. Hence, part (a) follows because N is the time and X1 + · · · + XN is the cost of the first cycle.
(b) Condition on N(t) and use that, conditional on N(t) = n, the n arrival times are independently and uniformly distributed on (0, t). As S ≡ X1 + · · · + XN is the number of these n passengers whose waiting time is less than x, this gives

E[S | T = t, N(t) = n] = nx/t if x < t, and n if x > t

That is, E[S | T = t, N(t)] = N(t) min(x, t)/t. Taking expectations yields

E[S | T = t] = λ min(x, t)

(c) From (b), E[S | T] = λ min(x, T) and (c) follows upon taking expectations.
(d) This follows from parts (a) and (c) using that

E[min(x, T)] = ∫_0^∞ P(min(x, T) > t) dt = ∫_0^x P(T > t) dt

along with the identity E[N] = λE[T].


(e) Because the waiting time for an arrival is the time until the next bus, the preceding result yields the PASTA result that the proportion of arrivals who find the excess life of the renewal process of bus arrivals to be less than x is equal to the proportion of time it is less than x.

49. Think of each interarrival time as consisting of n independent phases—each of which is exponentially distributed with rate λ—and consider the semi-Markov process whose state at any time is the phase of the present interarrival time. Hence, this semi-Markov process goes from state 1 to 2 to 3 to . . . to n to 1, and so on. Also, the time spent in each state has the same distribution. Thus, clearly the limiting probability of this semi-Markov chain is Pi = 1/n, i = 1, . . . , n. To compute lim P{Y(t) < x}, we condition on the phase at time t and note that if it is n − i + 1, which will be the case with probability 1/n, then the time until a renewal occurs will be a sum of i exponential phases, which will thus have a gamma distribution with parameters i and λ.

52. (a) If T is exponential then E[T²]/E[T] = 2E[T]. Hence,

λE[T²]/(2E[T]) = λE[T] = E[N]

(b) Because we are averaging over all time, we are giving more weight to those cycles (times between bus arrivals) that are large.
(c) Because buses arrive according to a Poisson process, the average number of waiting people seen by a bus must, by PASTA, be equal to the average number waiting when averaged over all time.

53. Letting Xi = 1 if flip i comes up heads, and 0 if it comes up tails, we want E[Σ_{i=1}^N Xi], where N is the number of flips until the pattern appears. With q = 1 − p,

E[N] = 1/(p^4 q^3) + E[N_{HTH}]
= 1/(p^4 q^3) + 1/(p^2 q) + E[N_H]
= 1/(p^4 q^3) + 1/(p^2 q) + 1/p

Because N is a stopping time for the sequence Xi, i ≥ 1, it follows from Wald's equation that

E[Σ_{i=1}^N Xi] = E[N] p = 1/(p^3 q^3) + 1/(pq) + 1

Chapter 8

2. This problem can be modeled by an M/M/1 queue in which λ = 6, µ = 8. The average cost rate will be $10 per hour per machine × average number of broken machines


The average number of broken machines is just L, which can be computed from Equation (3.2):

L = λ/(µ − λ) = 6/2 = 3

Hence, the average cost rate = $30/hour.

6. To compute W for the M/M/2, set up balance equations as follows (each server has rate µ):

λP0 = µP1
(λ + µ)P1 = λP0 + 2µP2
(λ + 2µ)Pn = λPn−1 + 2µPn+1, n ≥ 2

These have solutions Pn = ρ^n/2^{n−1} P0 where ρ = λ/µ. The boundary condition Σ_{n=0}^∞ Pn = 1 implies

P0 = (1 − ρ/2)/(1 + ρ/2) = (2 − ρ)/(2 + ρ)

Now we have Pn, so we can compute L, and hence W from L = λW:

L = Σ_{n=0}^∞ n Pn = ρP0 Σ_{n=0}^∞ n(ρ/2)^{n−1}
= 2P0 Σ_{n=0}^∞ n(ρ/2)^n
= 2 [(2 − ρ)/(2 + ρ)] (ρ/2)/(1 − ρ/2)²
= 4ρ/((2 + ρ)(2 − ρ))
= 4µλ/((2µ + λ)(2µ − λ))

From L = λW we have

W = W(M/M/2) = 4µ/((2µ + λ)(2µ − λ))

(See the derivation of Equation (8.7).)

The M/M/1 queue with service rate 2µ has

W(M/M/1) = 1/(2µ − λ)

from Equation (8.8). We assume that in the M/M/1 queue 2µ > λ so that the queue is stable. But then 4µ > 2µ + λ, or 4µ/(2µ + λ) > 1, which implies


W(M/M/2) > W(M/M/1). The intuitive explanation is that if one finds the queue empty in the M/M/2 case, it would do no good to have two servers: one would be better off with one faster server. Now let WQ1 = WQ(M/M/1) and WQ2 = WQ(M/M/2). Then,

WQ1 = W(M/M/1) − 1/(2µ)
WQ2 = W(M/M/2) − 1/µ

So, from Equation (8.8),

WQ1 = λ/(2µ(2µ − λ))

and

WQ2 = λ²/(µ(2µ − λ)(2µ + λ))

Then,

WQ1 > WQ2 ⇔ 1/2 > λ/(2µ + λ) ⇔ λ < 2µ

Since we assume λ < 2µ for stability in the M/M/1 case, WQ2 < WQ1 whenever this comparison is possible, that is, whenever λ < 2µ.
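A small numerical cross-check of the M/M/2 formulas derived above, with arbitrarily chosen stable rates (λ < 2µ): sum the series for L directly and compare it with the closed form and with W = L/λ.

    lam, mu = 3.0, 2.0                      # requires lam < 2*mu for stability
    rho = lam / mu
    P0 = (2 - rho) / (2 + rho)
    L_series = sum(n * 2 * P0 * (rho / 2)**n for n in range(1, 500))
    L_closed = 4 * mu * lam / ((2 * mu + lam) * (2 * mu - lam))
    print(L_series, L_closed, L_closed / lam)   # the last value is W(M/M/2)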

7. (a) Σ_n n Pn / λ
(b) e0 = 1, en = Π_{i=0}^{n−1} (µ + iα)/(µ + (i + 1)α), n > 0
(c) P(n | served) = en Pn / Σ_n en Pn
(d) Σ_{n=1}^∞ P(n | served) Σ_{i=0}^{n−1} 1/(µ + (i + 1)α)
(e) Let WQ(s) and WQ(n) be the averages of the time spent in queue by those that are served and by those that are not. Then

Σ_{n>0} (n − 1)Pn/λ = WQ = (Σ_n en Pn) WQ(s) + (1 − Σ_n en Pn) WQ(n)

where WQ(s) is given in part (d), and where the right-hand equality is obtained by conditioning on whether an arrival was served.

11. Let the state be (n, m) if there are n families and m taxis waiting, nm = 0. The time reversibility equations are

Pn−1,0 λ = Pn,0 µ, n = 1, . . . , N

P0,m−1 µ = P0,m λ, m = 1, . . . , M


Solving yields

Pn,0 = (λ/µ)^n P0,0, n = 0, 1, . . . , N
P0,m = (µ/λ)^m P0,0, m = 0, 1, . . . , M

where

1 = [Σ_{n=0}^N (λ/µ)^n + Σ_{m=1}^M (µ/λ)^m] P0,0

(a) Σ_{m=0}^M P0,m
(b) Σ_{n=0}^N Pn,0
(c) Σ_{n=0}^N n Pn,0 / [λ(1 − PN,0)]
(d) Σ_{m=0}^M m P0,m / [µ(1 − P0,M)]
(e) 1 − PN,0

When N = M = ∞ the time reversibility equations become

Pn−1,0 λ = Pn,0 (µ + nα), n ≥ 1
P0,m−1 µ = P0,m (λ + mβ), m ≥ 1

which yields

Pn,0 = P0,0 Π_{i=1}^n λ/(µ + iα), n ≥ 1
P0,m = P0,0 Π_{i=1}^m µ/(λ + iβ), m ≥ 1

The rest is similar to the preceding.

13. (a) The balance equations are

λP0 = µP1
(λ + µ)P1 = λP0 + 2µP2

(λ + 2µ)Pn = λPn−1 + 2µPn+1, n ≥ 2

These are the same balance equations as for the M/M/2 queue and have solution

Pn = λ^n/(2^{n−1}µ^n) P0,  P0 = (2µ − λ)/(2µ + λ)

(b) The system goes from 0 to 1 at rate

λP0 = λ(2µ − λ)/(2µ + λ)

The system goes from 2 to 1 at rate

2µP2 = λ²(2µ − λ)/(µ(2µ + λ))


(c) Introduce a new state cl to indicate that the stock clerk is checking by himself. The balance equation for Pcl is

(λ + µ)Pcl = µP2

Hence,

Pcl = µP2/(λ + µ) = λ²(2µ − λ)/(2µ(λ + µ)(2µ + λ))

Finally, the proportion of time the stock clerk is checking is

Pcl + Σ_{n=2}^∞ Pn = Pcl + λ²/(µ(2µ + λ))

21. (a) λ1 P10.
(b) λ2(P0 + P10).
(c) λ1 P10/[λ1 P10 + λ2(P0 + P10)].
(d) This is equal to the fraction of server 2's customers that are type 1 multiplied by the proportion of time server 2 is busy. (This is true since the amount of time server 2 spends with a customer does not depend on which type of customer it is.) By (c) the answer is thus

(P01 + P11) λ1 P10 / [λ1 P10 + λ2(P0 + P10)]

24. The states are now n, n ≥ 0, and n′, n ≥ 1, where the state is n when there are n in the system and no breakdown, and n′ when there are n in the system and a breakdown is in progress. The balance equations are

λP0 = µP1
(λ + µ + α)Pn = λPn−1 + µPn+1 + βPn′, n ≥ 1
(β + λ)P1′ = αP1
(β + λ)Pn′ = αPn + λP(n−1)′, n ≥ 2
Σ_{n=0}^∞ Pn + Σ_{n=1}^∞ Pn′ = 1

In terms of the solution to the preceding,

L = Σ_{n=1}^∞ n(Pn + Pn′)

and so

W = L/λa = L/λ


28. If a customer leaves the system busy, the time until the next departure is the time of a service. If a customer leaves the system empty, the time until the next departure is the time until an arrival plus the time of a service. Using moment generating functions we get

E[e^{sD}] = (λ/µ) E{e^{sD} | system left busy} + (1 − λ/µ) E{e^{sD} | system left empty}
= (λ/µ)(µ/(µ − s)) + (1 − λ/µ) E{e^{s(X+Y)}}

where X has the distribution of interarrival times, Y has the distribution of service times, and X and Y are independent. Then

E[e^{s(X+Y)}] = E[e^{sX} e^{sY}]
= E[e^{sX}] E[e^{sY}]  (by independence)
= (λ/(λ − s))(µ/(µ − s))

So,

E[e^{sD}] = (λ/µ)(µ/(µ − s)) + (1 − λ/µ)(λ/(λ − s))(µ/(µ − s)) = λ/(λ − s)

By the uniqueness of generating functions, it follows that D has an exponential distribution with parameter λ.

36. The distributions of the queue size and busy period are the same for all three disciplines; that of the waiting time is different. However, the means are identical. This can be seen by using W = L/λ, since L is the same for all. The smallest variance in the waiting time occurs under first-come, first-served and the largest under last-come, first-served.

39. (a) a0 = P0 due to Poisson arrivals. Assuming that each customer pays 1 per unit time while in service, the cost identity of Equation (8.1) states that

average number in service = λE[S]

or

1 − P0 = λE[S]

(b) Since a0 is the proportion of arrivals that have service distribution G1 and 1 − a0 the proportion having service distribution G2, the result follows.


(c) We have

P0 = E[I]/(E[I] + E[B])

and E[I] = 1/λ, and thus

E[B] = (1 − P0)/(λP0) = E[S]/(1 − λE[S])

Now from parts (a) and (b) we have

E[S] = (1 − λE[S])E[S1] + λE[S]E[S2]

or

E[S] = E[S1]/(1 + λE[S1] − λE[S2])

Substituting into E[B] = E[S]/(1 − λE[S]) now yields the result.
(d) a0 = 1/E[C], implying that

E[C] = (E[S1] + 1/λ − E[S2])/(1/λ − E[S2])

45. By regarding any breakdowns that occur during a service as being part of that service, we see that this is an M/G/1 model. We need to calculate the first two moments of a service time. Now the time of a service is the time T until something happens (either a service completion or a breakdown) plus any additional time A. Thus,

E[S] = E[T + A] = E[T] + E[A]

To compute E[A], we condition upon whether the happening is a service or a breakdown. This gives

E[A] = E[A | service] µ/(µ + α) + E[A | breakdown] α/(µ + α)
= E[A | breakdown] α/(µ + α)
= (1/β + E[S]) α/(µ + α)

Since E[T] = 1/(α + µ) we obtain

E[S] = 1/(α + µ) + (α/(µ + α))(1/β + E[S])


or

E[S] = 1/µ + α/(µβ)

We also need E[S²], which is obtained as follows:

E[S²] = E[(T + A)²]
= E[T²] + 2E[AT] + E[A²]
= E[T²] + 2E[A]E[T] + E[A²]

The independence of A and T follows because the time of the first happening is independent of whether the happening was a service or a breakdown. Now,

E[A²] = E[A² | breakdown] α/(µ + α)
= (α/(µ + α)) E[(downtime + S*)²]
= (α/(µ + α)) {E[down²] + 2E[down]E[S] + E[S²]}
= (α/(µ + α)) {2/β² + (2/β)(1/µ + α/(µβ)) + E[S²]}

Hence,

E[S²] = 2/(µ + α)² + (2α/(µ + α)²)(1/β + 1/µ + α/(µβ)) + (α/(µ + α)){2/β² + (2/β)(1/µ + α/(µβ)) + E[S²]}

Now solve for E[S²]. The desired answer is

WQ = λE[S²] / (2(1 − λE[S]))

In the preceding, S* is the additional service needed after the breakdown is over, and S* has the same distribution as S. The preceding also uses the fact that the expected square of an exponential is twice the square of its mean.

Another way of calculating the moments of S is to use the representation

S = Σ_{i=1}^N (Ti + Bi) + T_{N+1}

where N is the number of breakdowns while a customer is in service, Ti is the time starting when service commences for the ith time until a happening occurs, and Bi is the length of the ith breakdown. We now use the fact that, given N, all of


the random variables in the representation are independent exponentials, with the Ti having rate µ + α and the Bi having rate β. This yields

E[S | N] = (N + 1)/(µ + α) + N/β
Var(S | N) = (N + 1)/(µ + α)² + N/β²

Therefore, since 1 + N is geometric with mean (µ + α)/µ (and variance α(α + µ)/µ²) we obtain

E[S] = 1/µ + α/(µβ)

and, using the conditional variance formula,

Var(S) = (1/(µ + α) + 1/β)² α(α + µ)/µ² + 1/(µ(µ + α)) + α/(µβ²)
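Both routes to E[S] are easy to sanity-check by simulating the service mechanism directly; the rates below are arbitrary illustrative choices:

    import random

    mu, alpha, beta = 1.0, 0.5, 2.0
    runs, total = 200_000, 0.0
    for _ in range(runs):
        s = 0.0
        while True:
            s += random.expovariate(mu + alpha)        # time until next happening
            if random.random() < mu / (mu + alpha):    # it was a service completion
                break
            s += random.expovariate(beta)              # it was a breakdown: add repair
        total += s
    print(total / runs, 1 / mu + alpha / (mu * beta))  # empirical vs 1/µ + α/(µβ)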

47. The identity L = λaW gives L = 0.8 · 4 = 3.2.

52. Sn is the service time of the nth customer; Tn is the time between the arrival of the nth and (n + 1)st customer.

Chapter 9

4. (a) φ(x) = x1 max(x2 , x3 , x4 )x5 . (b) φ(x) = x1 max(x2 x4 , x3 x5 )x6 . (c) φ(x) = max(x1 , x2 x3 )x4 .

6. A minimal cut set has to contain at least one component of each minimal path set. There are six minimal cut sets: {1, 5}, {1, 6}, {2, 5}, {2, 3, 6}, {3, 4, 6}, {4, 5}.

12. The minimal path sets are {1, 4}, {1, 5}, {2, 4}, {2, 5}, {3, 4}, {3, 5}. With qi = 1 − pi, the reliability function is

r(p) = P{either of 1, 2, or 3 works} P{either of 4 or 5 works} = (1 − q1q2q3)(1 − q4q5)

17. E[N²] = E[N² | N > 0] P{N > 0} ≥ (E[N | N > 0])² P{N > 0}, since E[X²] ≥ (E[X])². Thus,

E[N²] P{N > 0} ≥ (E[N | N > 0] P{N > 0})² = (E[N])²

Let N denote the number of minimal path sets having all of their components functioning. Then r(p) = P{N > 0}. Similarly, if we define N as the number of minimal cut sets having all of their components failed, then 1 − r(p) = P{N > 0}. In both cases we can compute expressions for E[N] and E[N²] by writing N as the sum of indicator (i.e., Bernoulli) random variables. Then we can use the inequality to derive bounds on r(p).

22. (a) F̄_t(a) = P{X > t + a | X > t} = P{X > t + a}/P{X > t} = F̄(t + a)/F̄(t)

(b) Suppose λ(t) is increasing. Recall that

F̄(t) = e^{−∫_0^t λ(s) ds}

Hence,

F̄(t + a)/F̄(t) = exp{−∫_t^{t+a} λ(s) ds}

which decreases in t since λ(t) is increasing. To go the other way, suppose F̄(t + a)/F̄(t) decreases in t. Now when a is small,

F̄(t + a)/F̄(t) ≈ e^{−aλ(t)}

Hence, e^{−aλ(t)} must decrease in t and thus λ(t) increases.

25. For x ≥ ξ,

1 − p = F̄(ξ) = F̄(x(ξ/x)) ≥ [F̄(x)]^{ξ/x}

since IFRA. Hence, F̄(x) ≤ (1 − p)^{x/ξ} = e^{−θx}. For x ≤ ξ,

F̄(x) = F̄(ξ(x/ξ)) ≥ [F̄(ξ)]^{x/ξ}

since IFRA. Hence, F̄(x) ≥ (1 − p)^{x/ξ} = e^{−θx}.

30. r(p) = p1p2p3 + p1p2p4 + p1p3p4 + p2p3p4 − 3p1p2p3p4

r(1 − F(t)) = 2(1 − t)²(1 − t/2) + 2(1 − t)(1 − t/2)² − 3(1 − t)²(1 − t/2)², 0 ≤ t ≤ 1
            = 0, 1 ≤ t ≤ 2

E[lifetime] = ∫_0^1 [2(1 − t)²(1 − t/2) + 2(1 − t)(1 − t/2)² − 3(1 − t)²(1 − t/2)²] dt = 31/60
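A midpoint Riemann sum confirms the 31/60 value of the integral above:

    N = 1_000_000
    h = 1.0 / N
    total = 0.0
    for k in range(N):
        t = (k + 0.5) * h
        a, b = 1 - t, 1 - t / 2
        total += (2 * a * a * b + 2 * a * b * b - 3 * a * a * b * b) * h
    print(total, 31 / 60)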

Chapter 10

1. B(s) + B(t) = 2B(s) + B(t) − B(s). Now 2B(s) is normal with mean 0 and variance 4s, and B(t) − B(s) is normal with mean 0 and variance t − s. Because B(s) and B(t) − B(s) are independent, it follows that B(s) + B(t) is normal with mean 0 and variance 4s + t − s = 3s + t.

3. E[B(t1)B(t2)B(t3)] = E[E[B(t1)B(t2)B(t3) | B(t1), B(t2)]]
= E[B(t1)B(t2)E[B(t3) | B(t1), B(t2)]]
= E[B(t1)B(t2)B(t2)]
= E[E[B(t1)B²(t2) | B(t1)]]
= E[B(t1)E[B²(t2) | B(t1)]]
= E[B(t1){(t2 − t1) + B²(t1)}]   (∗)
= E[B³(t1)] + (t2 − t1)E[B(t1)]
= 0

where the equality (∗) follows since, given B(t1), B(t2) is normal with mean B(t1) and variance t2 − t1. Also, E[B³(t)] = 0 since B(t) is normal with mean 0.

5. P{T1 < T−1 < T2} = P{hit 1 before −1 before 2}
= P{hit 1 before −1} × P{hit −1 before 2 | hit 1 before −1}
= (1/2) P{down 2 before up 1}
= (1/2)(1/3) = 1/6

The next to last equality follows by looking at the Brownian motion when it first hits 1.

10. (a) Writing X(t) = X(s) + X(t) − X(s) and using independent increments, we see that given X(s) = c, X(t) is distributed as c + X(t) − X(s). By stationary increments this has the same distribution as c + X(t − s), and is thus normal with mean c + µ(t − s) and variance (t − s)σ².
(b) Use the representation X(t) = σB(t) + µt, where {B(t)} is standard Brownian motion. Using Equation (10.4), but reversing s and t, we see that the conditional distribution of B(t) given that B(s) = (c − µs)/σ is normal with mean t(c − µs)/(σs) and variance t(s − t)/s. Thus, the conditional distribution of X(t) given that X(s) = c, s > t, is normal with mean

σ · t(c − µs)/(σs) + µt = (c − µs)t/s + µt

and variance

σ² t(s − t)/s


19. Since knowing the value of Y(t) is equivalent to knowing B(t), we have

E[Y(t) | Y(u), 0 ≤ u ≤ s] = e^{−c²t/2} E[e^{cB(t)} | B(u), 0 ≤ u ≤ s]
= e^{−c²t/2} E[e^{cB(t)} | B(s)]

Now, given B(s), the conditional distribution of B(t) is normal with mean B(s) and variance t − s. Using the formula for the moment generating function of a normal random variable, we see that

e^{−c²t/2} E[e^{cB(t)} | B(s)] = e^{−c²t/2} e^{cB(s) + (t−s)c²/2}
= e^{−c²s/2} e^{cB(s)}
= Y(s)

Thus {Y(t)} is a Martingale, and E[Y(t)] = E[Y(0)] = 1.

20. By the Martingale stopping theorem,

E[B(T)] = E[B(0)] = 0

However, B(T) = 2 − 4T and so 2 − 4E[T] = 0, or E[T] = 1/2.

24. It follows from the Martingale stopping theorem and the result of Exercise 18 that

E[B²(T) − T] = 0

where T is the stopping time given in this problem and

B(t) = (X(t) − µt)/σ

Therefore,

E[(X(T) − µT)²/σ² − T] = 0

However, X(T) = x and so the preceding gives that

E[(x − µT)²] = σ²E[T]

But, from Exercise 21, E[T] = x/µ and so the preceding is equivalent to

Var(µT) = σ² x/µ, or Var(T) = σ² x/µ³
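Exercise 24's conclusion can be probed with a crude Euler discretization of X(t) = µt + σB(t), run until it first reaches x. The step size and parameters below are arbitrary, and the discretization introduces a small bias, so this is only a rough check:

    import random

    mu, sigma, x, dt, runs = 1.0, 1.0, 1.0, 1e-3, 20_000
    times = []
    for _ in range(runs):
        t = pos = 0.0
        while pos < x:                                   # run until X(t) hits x
            pos += mu * dt + random.gauss(0.0, sigma * dt**0.5)
            t += dt
        times.append(t)
    m = sum(times) / runs
    v = sum((s - m)**2 for s in times) / runs
    print(m, x / mu, v, sigma**2 * x / mu**3)            # E[T], Var(T) vs theory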

25. Let X_Δ(t) be the value of the process at time t. With Ii = 1 if the ith change is an increase and −1 if it is a decrease, then

X_Δ(t) = σ√Δ Σ_{i=1}^{[t/Δ]} Ii

Because the Ii are independent, it is clear that this process has independent increments, and the limiting process (as Δ → 0) will have stationary increments. Also, by the central limit theorem, it is clear that the limiting distribution of X_Δ(t) will be normal. The result now follows because

E[X_Δ(t)] = σ√Δ [t/Δ](2p − 1) = µΔ[t/Δ] → µt

and

Var(X_Δ(t)) = σ²Δ[t/Δ](1 − (2p − 1)²) → σ²t

where the preceding used that p → 1/2 as Δ → 0.

26. With M(t) = max_{0≤y≤t} X(y),

P(T_y < ∞) = lim_t P(M(t) ≥ y)

and the result follows from Corollary 10.1 since lim_{s→∞} Φ̄(s) = 0 and lim_{s→∞} Φ̄(−s) = 1. Hence, for µ < 0,

P(M ≥ y) = lim_t P(M(t) ≥ y) = e^{2yµ/σ²}

27. Using that {−X(y), y ≥ 0} is Brownian motion with drift parameter −µ and variance parameter σ², we obtain from Corollary 10.1 that for s < 0,

P(min_{0≤y≤t} X(y) ≤ s) = P(max_{0≤y≤t} −X(y) ≥ −s)
= Φ̄((−s + µt)/(σ√t)) + e^{2sµ/σ²} Φ̄((−µt − s)/(σ√t))

30. E[X(a²t)/a] = (1/a)E[X(a²t)] = 0. For s < t,

Cov(Y(s), Y(t)) = (1/a²) Cov(X(a²s), X(a²t)) = (1/a²) a²s = s

Because {Y(t)} is clearly Gaussian, the result follows.

33. (a) Starting at any time t, the continuation of the Poisson process remains a Poisson process with rate λ.
(b) E[Y(t)Y(t + s)] = ∫_0^∞ E[Y(t)Y(t + s) | Y(t) = y] λe^{−λy} dy
= ∫_0^s y E[Y(t + s) | Y(t) = y] λe^{−λy} dy + ∫_s^∞ y(y − s) λe^{−λy} dy
= ∫_0^s y (1/λ) λe^{−λy} dy + ∫_s^∞ y(y − s) λe^{−λy} dy

where the preceding used that

E[Y(t)Y(t + s) | Y(t) = y] = y E[Y(t + s)] = y/λ if y < s, and y(y − s) if y > s

Hence,

Cov(Y(t), Y(t + s)) = ∫_0^s y e^{−λy} dy + ∫_s^∞ y(y − s) λe^{−λy} dy − 1/λ²

Chapter 11

1. (a) Let U be a random number. If Σ_{j=1}^{i−1} Pj < U ≤ Σ_{j=1}^{i} Pj, then simulate from Fi. (In the preceding, Σ_{j=1}^{i−1} Pj ≡ 0 when i = 1.)
(b) Note that

F(x) = (1/3)F1(x) + (2/3)F2(x)

where

F1(x) = 1 − e^{−2x}, 0 < x < ∞
F2(x) = x, 0 < x < 1; and 1, 1 < x

Hence, using part (a), let U1, U2, U3 be random numbers and set

X = −(log U2)/2 if U1 < 1/3
X = U3 if U1 > 1/3

The preceding uses the fact that −(log U2)/2 is exponential with rate 2.
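A direct implementation of this composition sampler (a minimal sketch; the function name is ours):

    import math, random

    def sample_F():
        if random.random() < 1 / 3:                    # mixture weight of F1
            return -math.log(random.random()) / 2      # exponential with rate 2
        return random.random()                         # uniform on (0, 1), i.e. F2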

3. If a random sample of size n is chosen from a set of N + M items of which N are acceptable, then X, the number of acceptable items in the sample, is such that

P{X = k} = (N choose k)(M choose n − k) / (N + M choose n)

To simulate X, note that if

Ij = 1 if the jth selection is acceptable, and 0 otherwise

then

P{Ij = 1 | I1, . . . , Ij−1} = (N − Σ_{i=1}^{j−1} Ii) / (N + M − (j − 1))


Hence, we can simulate I1, . . . , In by generating random numbers U1, . . . , Un and then setting

Ij = 1 if Uj < (N − Σ_{i=1}^{j−1} Ii) / (N + M − (j − 1)), and 0 otherwise

and X = Σ_{j=1}^n Ij has the desired distribution. Another way is to let

Xj = 1 if the jth acceptable item is in the sample, and 0 otherwise

and then simulate X1, . . . , XN by generating random numbers U1, . . . , UN and then setting

Xj = 1 if Uj < (n − Σ_{i=1}^{j−1} Xi) / (N + M − (j − 1)), and 0 otherwise

and X = Σ_{j=1}^N Xj then has the desired distribution. The former method is preferable when n ≤ N and the latter when N ≤ n.
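The first sequential scheme translates directly into code (a sketch; the function name and the example arguments are illustrative):

    import random

    def sim_hypergeometric(N, M, n):
        x = 0                                            # acceptable items drawn so far
        for j in range(1, n + 1):
            if random.random() < (N - x) / (N + M - (j - 1)):
                x += 1
        return x

    # e.g., sim_hypergeometric(10, 15, 8)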

6. Let

c(λ) = max_x [f(x) / (λe^{−λx})] = (2/(λ√(2π))) max_x exp{−x²/2 + λx} = (2/(λ√(2π))) exp{λ²/2}

Hence,

(d/dλ)c(λ) = √(2/π) exp{λ²/2} (1 − 1/λ²)

Hence, (d/dλ)c(λ) = 0 when λ = 1, and it is easy to check that this yields the minimal value of c(λ).
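With the optimal λ = 1 the acceptance probability works out to f(x)/(c g(x)) = exp{−(x − 1)²/2}, which gives the usual rejection sampler for the half-normal density (a sketch, not from the text):

    import math, random

    def half_normal():
        while True:
            x = random.expovariate(1.0)                  # proposal from λe^{−λx}, λ = 1
            if random.random() < math.exp(-(x - 1)**2 / 2):
                return x                                 # accept w.p. f(x)/(c g(x))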

j=1 w j

,

i = 1, . . . , n

Then, if I1 = k, generate the value of I2 where P{I2 = i} = &

wi j̸=k

wj

,

i ̸= k

and so on. However, the approach given in part (b) is more efficient.


(b) Let Ij denote the index of the jth smallest Xi.

23. Let m(t) = ∫_0^t λ(s) ds, and let m⁻¹(t) be the inverse function. That is, m(m⁻¹(t)) = t.
(a) P{m(X1) > x} = P{X1 > m⁻¹(x)} = P{N(m⁻¹(x)) = 0} = e^{−m(m⁻¹(x))} = e^{−x}
(b) P{m(Xi) − m(Xi−1) > x | m(X1), . . . , m(Xi−1) − m(Xi−2)}
= P{m(Xi) − m(Xi−1) > x | X1, . . . , Xi−1}
= P{m(Xi) − m(Xi−1) > x | Xi−1}
= P{m(Xi) − m(Xi−1) > x | m(Xi−1)}

Now,

P{m(Xi) − m(Xi−1) > x | Xi−1 = y}
= P{∫_y^{Xi} λ(t) dt > x | Xi−1 = y}
= P{Xi > c | Xi−1 = y}, where ∫_y^c λ(t) dt = x
= P{N(c) − N(y) = 0 | Xi−1 = y}
= P{N(c) − N(y) = 0}
= exp{−∫_y^c λ(t) dt}
= e^{−x}
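This result gives a time-change recipe for generating a nonhomogeneous Poisson process: the values m(Xi) form a rate-1 Poisson process, so one can take standard exponential partial sums and apply m⁻¹. The sketch below uses the arbitrary illustrative intensity λ(t) = 2t, for which m(t) = t² and m⁻¹(x) = √x:

    import random

    def nhpp_times(horizon):
        times, s = [], 0.0
        while True:
            s += random.expovariate(1.0)   # next point of the rate-1 process
            x = s ** 0.5                   # X_i = m^{-1}(s) for m(t) = t^2
            if x > horizon:
                return times
            times.append(x)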

32. Var[(X + Y)/2] = (1/4)[Var(X) + Var(Y) + 2 Cov(X, Y)] = [Var(X) + Cov(X, Y)]/2

Now it is always true that

Cov(V, W) / √(Var(V)Var(W)) ≤ 1

and so when X and Y have the same distribution, Cov(X, Y) ≤ Var(X).
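The variance reduction is easy to see numerically; the sketch below estimates E[e^U], with U uniform on (0, 1) (an arbitrary illustrative integrand), using antithetic pairs (U, 1 − U) versus independent pairs:

    import math, random

    n = 100_000
    anti = [(math.exp(u) + math.exp(1 - u)) / 2
            for u in (random.random() for _ in range(n))]
    indep = [(math.exp(random.random()) + math.exp(random.random())) / 2
             for _ in range(n)]

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m)**2 for x in xs) / len(xs)

    print(var(anti), var(indep))    # antithetic pairing gives the smaller variance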
