Probability density function vs cumulative distribution function

Contents

  • Definition of the Cumulative Distribution Function
  • Functions of a Continuous Random Variable

For any random variable $X$, the cumulative distribution function $F_X$ is defined as

$$F_X(x) = P(X \leq x),$$

which is the probability that $X$ is less than or equal to $x$.

Using this definition, one can write the probability that $X$ takes a value in a certain interval $[a,b]$ without using an integral. Recall that previously this probability was defined in terms of a PDF:

$$P(a \leq X \leq b) = \int_a^b f_X(x) \,dx.$$

Now, the probability is rewritten as the difference in values of the CDF:

$$P(a \leq X \leq b) = F_X(b) - F_X(a).$$
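
The identity above can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes a standard normal $X$ as the example distribution, and the helper names (`normal_pdf`, `normal_cdf`, `integrate_midpoint`) are hypothetical. It compares a midpoint-rule integral of the PDF with the difference of CDF values.

```python
import math

def normal_pdf(x):
    """PDF of a standard normal random variable (assumed example distribution)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """CDF of a standard normal, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def integrate_midpoint(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

a, b = -1.0, 2.0
prob_from_pdf = integrate_midpoint(normal_pdf, a, b)
prob_from_cdf = normal_cdf(b) - normal_cdf(a)
print(prob_from_pdf, prob_from_cdf)  # the two values should agree closely
```

Once the CDF is available, the interval probability costs two function evaluations, whereas the PDF route needs a full numerical integration.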

So a difference of CDF values gives the amount of area underneath the PDF between two points. The CDF itself increases from zero (for very low values of $x$) to one (for very high values of $x$). As $x \to -\infty$, the probability that $X$ lies below $x$ shrinks to zero, because a normalized PDF has no probability left that far out. As $x \to \infty$, $F_X(x)$ tends to $P(X < \infty)$, which is one because it is certain that $X$ takes some finite value.

In the case of discrete random variables, the value of $F_X$ makes a discrete jump at each possible value $x$ of $X$; the size of the jump corresponds to the probability $P(X = x)$ of that value. In the case of a continuous random variable, the function increases continuously; it is not useful to speak of the probability that $X = x$ because this probability is always zero. Instead one considers the probability that the value of $X$ lies in a given interval:

$$P(X \in [a,b]) = P(a \leq X \leq b) = F_X(b) - F_X(a).$$

Note that it does not matter if the inequalities are strict (if the interval is $[a,b]$ or $(a,b)$, for example): since the probability of any given value is zero, the endpoints can be included or not without changing any probabilities.
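
To illustrate the discrete case, here is a small sketch that assumes a fair six-sided die (an example not taken from the text). The names `pmf`, `cdf`, and `jump_at` are hypothetical. The CDF is flat between faces and jumps by $P(X = x)$ at each face.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, each face with probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    """F_X(x) = P(X <= x): add up the PMF over all faces not exceeding x."""
    return sum(p for face, p in pmf.items() if face <= x)

def jump_at(x, eps=Fraction(1, 1000)):
    """Size of the jump of the CDF at x, i.e. F_X(x) - F_X(x - eps)."""
    return cdf(x) - cdf(x - eps)

print(cdf(0), cdf(3), cdf(3.5), cdf(6))  # 0 1/2 1/2 1
print(jump_at(3), jump_at(3.5))          # 1/6 0
```

The jump at 3 equals $P(X = 3) = \tfrac16$, while at 3.5, which is not a possible value, the CDF does not move.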

Still, one frequently wants to make use of the probability density function $f_X(x)$ rather than the CDF. Since the CDF corresponds to the integral of the PDF, the PDF corresponds to the derivative of the CDF:

$$f_X(x) = F_X'(x) = \frac{dF_X}{dx}.$$

A fly lands on a $30\text{ cm}$ long ruler at a random position chosen uniformly along the ruler. Let $X$ be the position of the fly in centimeters, and let $f_X(x)$ be the probability density function for $X$. What is $f_X(5)$?


Solution:

This probability distribution is uniform, meaning that the probability density is constant on the entire interval $[0, 30]$. This means that $F_X$ is a linear function on that interval:

$$F_X(x) = \left\{\begin{array}{ll} 0 & x < 0 \\ \frac{x}{30} & 0 \leq x \leq 30 \\ 1 & x > 30. \end{array}\right.$$

The probability density function is the derivative:

$$f_X(x) = \left\{\begin{array}{ll} 0 & x < 0 \\ \frac{1}{30} & 0 \leq x \leq 30 \\ 0 & x > 30. \end{array}\right.$$

Therefore the probability density function at $x = 5$ is equal to $\frac{1}{30}$.
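
The result can be reproduced by differentiating the CDF numerically. This is only a sketch: `ruler_cdf` and `numerical_pdf` are hypothetical helper names, and the central difference is one of several reasonable ways to estimate a derivative.

```python
def ruler_cdf(x, length=30.0):
    """CDF of a position chosen uniformly on [0, length] (the fly example)."""
    if x <= 0.0:
        return 0.0
    if x >= length:
        return 1.0
    return x / length

def numerical_pdf(cdf, x, h=1e-6):
    """Central-difference estimate of the derivative of cdf at x."""
    return (cdf(x + h) - cdf(x - h)) / (2.0 * h)

density_at_5 = numerical_pdf(ruler_cdf, 5.0)
print(density_at_5, 1 / 30)  # both approximately 0.0333
```

Outside the ruler the CDF is constant, so the same estimate returns zero there.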

A dart player always hits the dartboard (with a radius of $20\text{ cm}$), but has such a poor aim that the distribution of darts is uniform across the entire board. Let $R$ be the distance in cm between the dart and the center. Evaluate the probability density function for $R$ at $0$, $10$, and $20$.


Solution:

The probability $P(R < r)$ is directly proportional to the area of a circle with radius $r$:

$$F_R(r) = P(R < r) = \frac{\text{area of circle with radius}\ r}{\text{area of dartboard}} = \frac{\pi r^2}{\pi\times 20^2} = \left(\frac r{20}\right)^2.$$

The probability density function is the derivative:

$$f_R(r) = \frac r{200}.$$

Thus one obtains:

$$f_R(0) = 0,\ \ f_R(10) = \tfrac1{20},\ \ f_R(20) = \tfrac1{10}.$$
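
The CDF used in this solution can be compared against a simulation. The sketch below is illustrative only. It assumes rejection sampling from the bounding square as the way to draw uniform points on the board, and the names `throw_dart` and `empirical_cdf` are hypothetical.

```python
import math
import random

BOARD_RADIUS = 20.0  # cm, from the problem statement

def throw_dart(rng):
    """Return the distance from the center of a point uniform on the board.

    Uses rejection sampling: draw points uniformly in the bounding square
    and keep the first one that falls inside the circle.
    """
    while True:
        x = rng.uniform(-BOARD_RADIUS, BOARD_RADIUS)
        y = rng.uniform(-BOARD_RADIUS, BOARD_RADIUS)
        r = math.hypot(x, y)
        if r <= BOARD_RADIUS:
            return r

rng = random.Random(0)  # fixed seed so the run is repeatable
distances = [throw_dart(rng) for _ in range(100_000)]

def empirical_cdf(r):
    """Fraction of simulated darts that landed within distance r."""
    return sum(d <= r for d in distances) / len(distances)

# Compare the simulated CDF with (r / 20)^2 at a few radii.
for r in (5.0, 10.0, 15.0):
    print(r, empirical_cdf(r), (r / BOARD_RADIUS) ** 2)
```

The simulated fractions should land close to $(r/20)^2$. Few darts fall near the center because small circles have little area, which is why the density $f_R(r) = r/200$ grows with $r$.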

The probability density function of a certain random variable $X$ is:

$$f_X(x) = \lambda e^{-\lambda x},$$

where $x$ takes values in $[0,\infty)$.

Find the probability that $X < 100$.

Answer choices:

  • $1-e^{-100 \lambda}$
  • $\lambda e^{-100 \lambda}$
  • $e^{100 \lambda}$
  • $-e^{-100 \lambda}$
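
For reference, integrating the density from $0$ to $100$ gives $P(X < 100) = \int_0^{100} \lambda e^{-\lambda x}\,dx = 1 - e^{-100\lambda}$. The sketch below checks this closed form numerically. The rate `lam = 0.02` is a hypothetical value chosen only for illustration, and the helper names are assumptions.

```python
import math

def exponential_pdf(x, lam):
    """f_X(x) = lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0.0 else 0.0

def prob_below(t, lam, n=100_000):
    """Midpoint-rule approximation of P(X < t), the integral of f_X over [0, t]."""
    h = t / n
    return h * sum(exponential_pdf((i + 0.5) * h, lam) for i in range(n))

lam = 0.02  # hypothetical rate, chosen only for illustration
numeric = prob_below(100.0, lam)
closed_form = 1.0 - math.exp(-100.0 * lam)
print(numeric, closed_form)  # both approximately 0.8647
```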

One question that often comes up in applications of continuous probability is the following: given the PDF of a random variable, is it possible to find the PDF of an arbitrary function of that random variable?

The answer is yes, and the easiest method uses the CDF of the random variable. The general case goes as follows: consider the CDF $F_X(x)$ of the random variable $X$, and let $Z = g(X)$ be a function of $X$. It's important to note the distinction between upper and lower case: $X$ is a random variable while $x$ is a real number. Recall that the PDF is given by the derivative of the CDF:

$$f_X(x) = \frac{d}{dx} F_X(x) = \frac{d}{dx} P(X \leq x).$$

Now write the PDF of $Z$ as the derivative of its CDF. If $g$ is invertible and increasing, then $g(X) \leq z$ exactly when $X \leq g^{-1}(z)$, so

$$f_Z(z) = \frac{d}{dz} P(Z \leq z) = \frac{d}{dz} P(g(X) \leq z) = \frac{d}{dz} P(X \leq g^{-1}(z)) = \frac{d}{dz} F_X(g^{-1}(z)).$$

By the chain rule:

$$f_Z(z) = f_X(g^{-1}(z)) \frac{dg^{-1}(z)}{dz}.$$

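This formula can be generalized to cases where $g$ is not increasing or not invertible. For example, if $g$ is invertible and decreasing, the same argument gives $f_Z(z) = f_X(g^{-1}(z)) \left| \frac{dg^{-1}(z)}{dz} \right|$.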
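
The change-of-variables formula for an invertible, increasing $g$ can be sketched in code. Everything here is an assumption made for illustration: the helper `transformed_pdf`, the choice of an exponential $X$ with rate 1, and the map $g(x) = 2x + 1$ do not come from the text.

```python
import math

def transformed_pdf(f_x, g_inverse, z, h=1e-6):
    """Density of Z = g(X) at z, for g invertible and increasing.

    Implements f_Z(z) = f_X(g^{-1}(z)) * d g^{-1}(z) / dz, estimating the
    derivative of the inverse with a central difference.
    """
    d_inverse = (g_inverse(z + h) - g_inverse(z - h)) / (2.0 * h)
    return f_x(g_inverse(z)) * d_inverse

# Assumed example: X exponential with rate 1 and Z = g(X) = 2X + 1,
# so g^{-1}(z) = (z - 1) / 2 and f_Z(z) = 0.5 * exp(-(z - 1) / 2) for z >= 1.
def exp_pdf(x):
    """PDF of an exponential random variable with rate 1."""
    return math.exp(-x) if x >= 0.0 else 0.0

def g_inverse(z):
    """Inverse of g(x) = 2x + 1."""
    return (z - 1.0) / 2.0

value = transformed_pdf(exp_pdf, g_inverse, 3.0)
print(value, 0.5 * math.exp(-1.0))  # both approximately 0.1839
```

Passing the inverse function rather than $g$ itself keeps the helper simple. Inverting $g$ numerically would be a separate problem.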

Consider a random variable $X$ that is uniform on the interval $[0,1]$. Find the distribution (i.e., PDF) of $Z = X^3$.


Solution:

Note that $Z = g(X)$ where $g(x) = x^3$ is an invertible and increasing function, so the discussion above will apply. The CDF of $X$ on $[0,1]$ is:

$$F_X(x) = x.$$

So, with $g^{-1}(z) = z^{1/3}$:

$$f_Z(z) = \frac{d}{dz} F_X(g^{-1}(z)) = \frac{d}{dz} z^{1/3} = \frac13 z^{-2/3}$$

for $0 < z \leq 1$. This is consistent with the formula derived above, since $f_X(g^{-1}(z)) = 1$ and $\frac{dg^{-1}(z)}{dz} = \frac13 z^{-2/3}$.
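
As a sanity check on this example, one can simulate $Z = X^3$ and compare the empirical CDF with $F_Z(z) = z^{1/3}$, whose derivative is the density found above. This is a rough sketch with a fixed seed, and the function names are hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
# Z = X^3 with X drawn uniformly from [0, 1).
samples = [rng.random() ** 3 for _ in range(100_000)]

def empirical_cdf(z):
    """Fraction of simulated values of Z that are at most z."""
    return sum(s <= z for s in samples) / len(samples)

def exact_cdf(z):
    """F_Z(z) = z^(1/3) on [0, 1], the antiderivative of (1/3) z^(-2/3)."""
    return z ** (1.0 / 3.0)

# Compare simulated and exact CDF values at a few points.
for z in (0.001, 0.125, 0.5):
    print(z, empirical_cdf(z), exact_cdf(z))
```

About half of the samples fall below $z = 0.125$, reflecting how cubing pushes values toward zero and why the density $\frac13 z^{-2/3}$ is large near $z = 0$.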

What is the relationship between the probability density function and the cumulative distribution function?

Probability and Random Variables (1.7): $p(x) = F'(x)$. Thus, the probability density is the derivative of the cumulative distribution function. This in turn implies that the probability density is always nonnegative, $p(x) \geq 0$, because $F$ is monotone nondecreasing.

Is the cumulative distribution function the same as the probability distribution function?

The cumulative distribution function is one way to describe the probability distribution of a random variable. It can be used for a discrete, continuous, or mixed variable. It is obtained by accumulating probability up to a given value: summing the probability mass function for a discrete variable, or integrating the probability density function for a continuous one.

What is the difference between probability and cumulative probability?

Probability is the measure of the possibility that a given event will occur. Cumulative probability, in the sense used for a CDF, is the probability that a random variable takes a value less than or equal to a given threshold, $P(X \leq x)$; it accumulates the probabilities of all outcomes up to that threshold.

What is the major difference between the CDF and the PMF (or PDF)?

The PMF is one way to describe the distribution of a discrete random variable. A PMF cannot be defined for continuous random variables, which are described by a PDF instead. The cumulative distribution function (CDF) is another way to describe the distribution of a random variable, and it is defined for discrete and continuous random variables alike.