readna.blogspot.com

Monday, December 23, 2013

For those who want to learn calculus

Preliminaries
Consider the function f(x) = cos(x), its derivative f'(x) = −sin(x), and its antiderivative F(x) = sin(x) + C. These formulas were studied in calculus. The former is used to determine the slope m = f'(x0) of the curve y = f(x) at a point (x0, f(x0)), and the latter is used to compute the area under the curve for a ≤ x ≤ b.
The slope at the point (π/2, 0) is m = f'(π/2) = −1 and can be used to find the tangent line at this point (see Figure 1.1(a)):

    y_tan = m(x − π/2) + 0 = f'(π/2)(x − π/2) = −x + π/2.
Figure 1.1 (a) The tangent line to the curve y = cos(x) at the point (π/2, 0).
Figure 1.1 (b) The area under the curve y = cos(x) over the interval [0, π/2].
The area under the curve for 0 ≤ x ≤ π/2 is computed using an integral (see Figure 1.1(b)):

    area = ∫_0^{π/2} cos(x) dx = F(π/2) − F(0) = sin(π/2) − 0 = 1.
These are some of the results that we will need to use from calculus.
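As a quick numerical check of both results, here is a short Python sketch (NumPy assumed; this is illustrative, not part of the book):

    import numpy as np

    x0 = np.pi / 2
    h = 1e-6
    # central-difference estimate of the slope f'(pi/2); should be close to -1
    slope = (np.cos(x0 + h) - np.cos(x0 - h)) / (2 * h)
    print(slope)

    # trapezoid-rule estimate of the area over [0, pi/2]; should be close to 1
    x = np.linspace(0, np.pi / 2, 10001)
    print(np.trapz(np.cos(x), x))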
1.1 Review of Calculus
It is assumed that the reader is familiar with the notation and subject matter covered in
the undergraduate calculus sequence. This should have included the topics of limits,
continuity, differentiation, integration, sequences, and series. Throughout the book we
refer to the following results.
Limits and Continuity
Definition 1.1. Assume that f (x) is defined on an open interval containing x = x0,
except possibly at x = x0 itself. Then f is said to have the limit L at x = x0, and we
write
(1)    lim_{x→x0} f(x) = L,

if given any ε > 0 there exists a δ > 0 such that |f(x) − L| < ε whenever 0 < |x − x0| < δ. When the h-increment notation x = x0 + h is used, equation (1) becomes

(2)    lim_{h→0} f(x0 + h) = L.
Definition 1.2. Assume that f (x) is defined on an open interval containing x = x0.
Then f is said to be continuous at x = x0 if
(3)    lim_{x→x0} f(x) = f(x0).
The function f is said to be continuous on a set S if it is continuous at each point
x ∈ S. The notation C^n(S) stands for the set of all functions f such that f and its first n derivatives are continuous on S. When S is an interval, say [a, b], then the notation C^n[a, b] is used. As an example, consider the function f(x) = x^{4/3} on the interval [−1, 1]. Clearly, f(x) and f'(x) = (4/3)x^{1/3} are continuous on [−1, 1], while f''(x) = (4/9)x^{−2/3} is not continuous at x = 0.
Definition 1.3. Suppose that {x_n}_{n=1}^∞ is an infinite sequence. Then the sequence is said to have the limit L, and we write

(4)    lim_{n→∞} x_n = L,

if given any ε > 0, there exists a positive integer N = N(ε) such that n > N implies that |x_n − L| < ε.
When a sequence has a limit, we say that it is a convergent sequence. Another commonly used notation is “x_n → L as n → ∞.” Equation (4) is equivalent to

(5)    lim_{n→∞} (x_n − L) = 0.

Thus we can view the sequence {ε_n}_{n=1}^∞ = {x_n − L}_{n=1}^∞ as an error sequence. The following theorem relates the concepts of continuity and convergent sequence.
Theorem 1.1. Assume that f (x) is defined on the set S and x0 ∈ S. The following
statements are equivalent:
(a) The function f is continuous at x0.
(b) If lim_{n→∞} x_n = x0, then

(6)    lim_{n→∞} f(x_n) = f(x0).
Theorem 1.2 (Intermediate Value Theorem). Assume that f ∈ C[a, b] and L is
any number between f (a) and f (b). Then there exists a number c, with c ∈ (a, b),
such that f (c) = L.
Example 1.1. The function f (x) = cos(x −1) is continuous over [0, 1], and the constant
L = 0.8 ∈ (cos(0), cos(1)). The solution to f (x) = 0.8 over [0, 1] is c1 = 0.356499.
Similarly, f (x) is continuous over [1, 2.5], and L = 0.8 ∈ (cos(2.5), cos(1)). The solution
to f (x) = 0.8 over [1, 2.5] is c2 = 1.643502. These two cases are shown in Figure 1.2.
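Both roots can be checked with a simple bisection search, which is itself a direct application of the intermediate value theorem. A minimal Python sketch (not from the book):

    import math

    def bisect(f, a, b, tol=1e-8):
        """Assumes f(a) and f(b) have opposite signs; halves the bracket until tol."""
        fa = f(a)
        while b - a > tol:
            m = (a + b) / 2
            fm = f(m)
            if fa * fm <= 0:
                b = m          # a sign change occurs in [a, m]
            else:
                a, fa = m, fm  # otherwise it occurs in [m, b]
        return (a + b) / 2

    f = lambda x: math.cos(x - 1) - 0.8
    print(bisect(f, 0.0, 1.0))   # approx 0.356499, the value c1 above
    print(bisect(f, 1.0, 2.5))   # approx 1.643502, the value c2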
Figure 1.2 The intermediate value theorem applied to the function f(x) = cos(x − 1) over [0, 1] and over the interval [1, 2.5].
Figure 1.3 The extreme value theorem applied to the function f(x) = 35 + 59.5x − 66.5x^2 + 15x^3 over the interval [0, 3].
Theorem 1.3 (Extreme Value Theorem for a Continuous Function). Assume that
f ∈ C[a, b]. Then there exists a lower bound M1, an upper bound M2, and two
numbers x1, x2 ∈ [a, b] such that
(7)    M1 = f(x1) ≤ f(x) ≤ f(x2) = M2 whenever x ∈ [a, b].

We sometimes express this by writing

(8)    M1 = f(x1) = min_{a≤x≤b} {f(x)} and M2 = f(x2) = max_{a≤x≤b} {f(x)}.
Differentiable Functions
Definition 1.4. Assume that f (x) is defined on an open interval containing x0. Then
f is said to be differentiable at x0 if
(9)    lim_{x→x0} (f(x) − f(x0)) / (x − x0)

exists. When this limit exists, it is denoted by f'(x0) and is called the derivative of f at x0. An equivalent way to express this limit is to use the h-increment notation:

(10)    lim_{h→0} (f(x0 + h) − f(x0)) / h = f'(x0).

A function that has a derivative at each point in a set S is said to be differentiable on S. Note that the number m = f'(x0) is the slope of the tangent line to the graph of the function y = f(x) at the point (x0, f(x0)).
Theorem 1.4. If f (x) is differentiable at x = x0, then f (x) is continuous at x = x0.
It follows from Theorem 1.3 that if a function f is differentiable on a closed interval [a, b], then its extreme values occur at the endpoints of the interval or at the critical points (solutions of f'(x) = 0) in the open interval (a, b).
Example 1.2. The function f(x) = 15x^3 − 66.5x^2 + 59.5x + 35 is differentiable on [0, 3]. The solutions to f'(x) = 45x^2 − 133x + 59.5 = 0 are x1 = 0.54955 and x2 = 2.40601. The maximum and minimum values of f on [0, 3] are:

    min{f(0), f(3), f(x1), f(x2)} = min{35, 20, 50.10438, 2.11850} = 2.11850

and

    max{f(0), f(3), f(x1), f(x2)} = max{35, 20, 50.10438, 2.11850} = 50.10438

(see Figure 1.3).
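The computation in Example 1.2 is easy to reproduce with NumPy (a sketch; the book itself works in MATLAB):

    import numpy as np

    f = np.poly1d([15, -66.5, 59.5, 35])      # 15x^3 - 66.5x^2 + 59.5x + 35
    crit = f.deriv().roots                    # roots of f'(x) = 45x^2 - 133x + 59.5
    candidates = np.concatenate(([0.0, 3.0], crit))
    values = f(candidates)                    # evaluate f at endpoints and critical points
    print(values.min(), values.max())         # approx 2.11850 and 50.10438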
Theorem 1.5 (Rolle’s Theorem). Assume that f ∈ C[a, b] and that f'(x) exists for all x ∈ (a, b). If f(a) = f(b) = 0, then there exists a number c, with c ∈ (a, b), such that f'(c) = 0.
Theorem 1.6 (Mean Value Theorem). Assume that f ∈ C[a, b] and that f'(x) exists for all x ∈ (a, b). Then there exists a number c, with c ∈ (a, b), such that

(11)    f'(c) = (f(b) − f(a)) / (b − a).
Geometrically, the mean value theorem says that there is at least one number c ∈
(a, b) such that the slope of the tangent line to the graph of y = f (x) at the point
(c, f (c)) equals the slope of the secant line through the points (a, f (a)) and (b, f (b)).
Example 1.3. The function f(x) = sin(x) is continuous on the closed interval [0.1, 2.1] and differentiable on the open interval (0.1, 2.1). Thus, by the mean value theorem, there is a number c such that

    f'(c) = (f(2.1) − f(0.1)) / (2.1 − 0.1) = (0.863209 − 0.099833) / (2.1 − 0.1) = 0.381688.

The solution to f'(c) = cos(c) = 0.381688 in the interval (0.1, 2.1) is c = 1.179174. The graphs of f(x), the secant line y = 0.381688(x − 0.1) + 0.099833, and the tangent line y = 0.381688x + 0.474215 are shown in Figure 1.4.
Figure 1.4 The mean value theorem applied to f(x) = sin(x) over the interval [0.1, 2.1].
Theorem 1.7 (Generalized Rolle’s Theorem). Assume that f ∈ C[a, b] and that f'(x), f''(x), . . . , f^(n)(x) exist over (a, b) and x0, x1, . . . , xn ∈ [a, b]. If f(x_j) = 0 for j = 0, 1, . . . , n, then there exists a number c, with c ∈ (a, b), such that f^(n)(c) = 0.
Integrals
Theorem 1.8 (First Fundamental Theorem). If f is continuous over [a, b] and F is any antiderivative of f on [a, b], then

(12)    ∫_a^b f(x) dx = F(b) − F(a), where F'(x) = f(x).
Theorem 1.9 (Second Fundamental Theorem). If f is continuous over [a, b] and x ∈ (a, b), then

(13)    d/dx ∫_a^x f(t) dt = f(x).
Example 1.4. The function f(x) = cos(x) satisfies the hypotheses of Theorem 1.9 over the interval [0, π/2]; thus by the chain rule

    d/dx ∫_0^{x^2} cos(t) dt = cos(x^2) · (x^2)' = 2x cos(x^2).
Theorem 1.10 (Mean Value Theorem for Integrals). Assume that f ∈ C[a, b]. Then there exists a number c, with c ∈ (a, b), such that

    (1 / (b − a)) ∫_a^b f(x) dx = f(c).

The value f(c) is the average value of f over the interval [a, b].
Figure 1.5 The mean value theorem for integrals applied to f(x) = sin(x) + (1/3) sin(3x) over the interval [0, 2.5].
Example 1.5. The function f(x) = sin(x) + (1/3) sin(3x) satisfies the hypotheses of Theorem 1.10 over the interval [0, 2.5]. An antiderivative of f(x) is F(x) = −cos(x) − (1/9) cos(3x). The average value of the function f(x) over the interval [0, 2.5] is

    (1 / (2.5 − 0)) ∫_0^{2.5} f(x) dx = (F(2.5) − F(0)) / 2.5 = (0.762629 − (−1.111111)) / 2.5 = 1.873740 / 2.5 = 0.749496.
There are three solutions to the equation f(c) = 0.749496 over the interval [0, 2.5]: c1 = 0.440566, c2 = 1.268010, and c3 = 1.873583. The area of the rectangle with base b − a = 2.5 and height f(c_j) = 0.749496 is f(c_j)(b − a) = 1.873740. The area of the rectangle has the same numerical value as the integral of f(x) taken over the interval [0, 2.5]. A comparison of the area under the curve y = f(x) and that of the rectangle can be seen in Figure 1.5.
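A numerical check of the average value (a Python sketch with SciPy assumed; illustrative only):

    import numpy as np
    from scipy.integrate import quad

    f = lambda x: np.sin(x) + np.sin(3 * x) / 3
    integral, _ = quad(f, 0, 2.5)    # adaptive quadrature over [0, 2.5]
    print(integral / 2.5)            # approx 0.749496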
Theorem 1.11 (Weighted Integral Mean Value Theorem). Assume that f, g ∈ C[a, b] and g(x) ≥ 0 for x ∈ [a, b]. Then there exists a number c, with c ∈ (a, b), such that

(14)    ∫_a^b f(x) g(x) dx = f(c) ∫_a^b g(x) dx.
Example 1.6. The functions f(x) = sin(x) and g(x) = x^2 satisfy the hypotheses of Theorem 1.11 over the interval [0, π/2]. Thus there exists a number c such that

    sin(c) = (∫_0^{π/2} x^2 sin(x) dx) / (∫_0^{π/2} x^2 dx) = 1.14159 / 1.29193 = 0.883631,

or c = sin^{−1}(0.883631) = 1.08356.
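The two integrals in Example 1.6 can be verified the same way (SciPy assumed; a sketch):

    import numpy as np
    from scipy.integrate import quad

    num, _ = quad(lambda x: x**2 * np.sin(x), 0, np.pi / 2)   # pi - 2 = 1.14159...
    den, _ = quad(lambda x: x**2, 0, np.pi / 2)               # pi^3/24 = 1.29193...
    print(num / den, np.arcsin(num / den))                    # 0.883631 and c = 1.08356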
Series
Definition 1.5. Let {a_n}_{n=1}^∞ be a sequence. Then Σ_{n=1}^∞ a_n is an infinite series. The nth partial sum is S_n = Σ_{k=1}^n a_k. The infinite series converges if and only if the sequence {S_n}_{n=1}^∞ converges to a limit S, that is,

(15)    lim_{n→∞} S_n = lim_{n→∞} Σ_{k=1}^n a_k = S.
If a series does not converge, we say that it diverges.
Example 1.7. Consider the infinite sequence {a_n}_{n=1}^∞ = {1 / (n(n + 1))}_{n=1}^∞. Then the nth partial sum is

    S_n = Σ_{k=1}^n 1 / (k(k + 1)) = Σ_{k=1}^n (1/k − 1/(k + 1)) = 1 − 1/(n + 1).

Therefore, the sum of the infinite series is

    S = lim_{n→∞} S_n = lim_{n→∞} (1 − 1/(n + 1)) = 1.
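The telescoping behaviour is easy to see numerically (a small Python sketch, not book code):

    for n in [10, 100, 1000]:
        S = sum(1 / (k * (k + 1)) for k in range(1, n + 1))
        print(n, S, 1 - 1 / (n + 1))   # the last two columns agree; both approach 1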
Theorem 1.12 (Taylor’s Theorem). Assume that f ∈ C^{n+1}[a, b] and let x0 ∈ [a, b]. Then, for every x ∈ (a, b), there exists a number c = c(x) (the value of c depends on the value of x) that lies between x0 and x such that

(16)    f(x) = P_n(x) + R_n(x),

where

(17)    P_n(x) = Σ_{k=0}^n (f^(k)(x0) / k!) (x − x0)^k

and

(18)    R_n(x) = (f^(n+1)(c) / (n + 1)!) (x − x0)^{n+1}.
Example 1.8. The function f (x) = sin(x) satisfies the hypotheses of Theorem 1.12. The
Taylor polynomial Pn(x) of degree n = 9 expanded about x0 = 0 is obtained by evaluating
Figure 1.6 The graph of f(x) = sin(x) and the Taylor polynomial P9(x) = x − x^3/3! + x^5/5! − x^7/7! + x^9/9!.
the following derivatives at x = 0 and substituting the numerical values into formula (17).

    f(x) = sin(x),         f(0) = 0,
    f'(x) = cos(x),        f'(0) = 1,
    f''(x) = −sin(x),      f''(0) = 0,
    f^(3)(x) = −cos(x),    f^(3)(0) = −1,
    ...
    f^(9)(x) = cos(x),     f^(9)(0) = 1,

    P9(x) = x − x^3/3! + x^5/5! − x^7/7! + x^9/9!.
A graph of both f and P9 over the interval [0, 2π] is shown in Figure 1.6.
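A short Python sketch (not book code) comparing sin(x) with P9(x) shows the approximation is excellent near x0 = 0 and degrades toward the end of [0, 2π]:

    import math

    def P9(x):
        # x - x^3/3! + x^5/5! - x^7/7! + x^9/9!
        return sum((-1)**m * x**(2*m + 1) / math.factorial(2*m + 1) for m in range(5))

    for x in [0.5, 1.5, 3.0, 6.0]:
        print(x, math.sin(x), P9(x))   # values agree closely until x nears 2*pi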
Corollary 1.1. If P_n(x) is the Taylor polynomial of degree n given in Theorem 1.12, then

(19)    P_n^(k)(x0) = f^(k)(x0) for k = 0, 1, . . . , n.
Evaluation of a Polynomial
Let the polynomial P(x) of degree n have the form

(20)    P(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_2 x^2 + a_1 x + a_0.
Horner’s method, or synthetic division, is a technique for evaluating polynomials. It can be thought of as nested multiplication. For example, a fifth-degree polynomial can be written in the nested multiplication form

    P_5(x) = ((((a_5 x + a_4)x + a_3)x + a_2)x + a_1)x + a_0.
Theorem 1.13 (Horner’s Method for Polynomial Evaluation). Assume that P(x) is the polynomial given in equation (20) and x = c is a number for which P(c) is to be evaluated.
Set b_n = a_n and compute

(21)    b_k = a_k + c b_{k+1}    for k = n − 1, n − 2, . . . , 1, 0;

then b_0 = P(c). Moreover, if

(22)    Q_0(x) = b_n x^{n−1} + b_{n−1} x^{n−2} + · · · + b_3 x^2 + b_2 x + b_1,

then

(23)    P(x) = (x − c) Q_0(x) + R_0,

where Q_0(x) is the quotient polynomial of degree n − 1 and R_0 = b_0 = P(c) is the remainder.
Proof. Substituting the right side of equation (22) for Q_0(x) and b_0 for R_0 in equation (23) yields

(24)    P(x) = (x − c)(b_n x^{n−1} + b_{n−1} x^{n−2} + · · · + b_3 x^2 + b_2 x + b_1) + b_0
             = b_n x^n + (b_{n−1} − c b_n) x^{n−1} + · · · + (b_2 − c b_3) x^2 + (b_1 − c b_2) x + (b_0 − c b_1).

The numbers b_k are determined by comparing the coefficients of x^k in equations (20) and (24), as shown in Table 1.1.
The value P(c) = b_0 is easily obtained by substituting x = c into equation (23) and using the fact that R_0 = b_0:

(25)    P(c) = (c − c) Q_0(c) + R_0 = b_0. •
The recursive formula for b_k given in (21) is easy to implement with a computer. A simple algorithm is

    b(n) = a(n);
    for k = n-1:-1:0
        b(k) = a(k) + c*b(k+1);
    end
Table 1.1 Coefficients b_k for Horner’s Method

    x^k        Comparing (20) and (24)         Solving for b_k
    x^n        a_n = b_n                       b_n = a_n
    x^{n−1}    a_{n−1} = b_{n−1} − c b_n       b_{n−1} = a_{n−1} + c b_n
    ...        ...                             ...
    x^k        a_k = b_k − c b_{k+1}           b_k = a_k + c b_{k+1}
    ...        ...                             ...
    x^0        a_0 = b_0 − c b_1               b_0 = a_0 + c b_1
Table 1.2 Horner’s Table for the Synthetic Division Process

    Input     a_n    a_{n−1}    a_{n−2}      · · ·   a_k          · · ·   a_2      a_1      a_0
    c                c b_n      c b_{n−1}    · · ·   c b_{k+1}    · · ·   c b_3    c b_2    c b_1
    Output    b_n    b_{n−1}    b_{n−2}      · · ·   b_k          · · ·   b_2      b_1      b_0 = P(c)
When Horner’s method is performed by hand, it is easier to write the coefficients of P(x) on a line and perform the calculation b_k = a_k + c b_{k+1} below a_k in a column. The format for this procedure is illustrated in Table 1.2.
Example 1.9. Use synthetic division (Horner’s method) to find P(3) for the polynomial

    P(x) = x^5 − 6x^4 + 8x^3 + 8x^2 + 4x − 40.

             a_5    a_4    a_3    a_2    a_1    a_0
    Input      1     −6      8      8      4    −40
    c = 3             3     −9     −3     15     57
               1     −3     −1      5     19     17 = P(3) = b_0
             b_5    b_4    b_3    b_2    b_1    Output

Therefore, P(3) = 17.
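The same recurrence as a runnable Python version of the loop above (the book's snippet is MATLAB-style pseudocode; this sketch stores the coefficients highest degree first):

    def horner(a, c):
        """Evaluate P(c) given coefficients a = [a_n, ..., a_1, a_0]."""
        b = a[0]              # b_n = a_n
        for ak in a[1:]:
            b = ak + c * b    # b_k = a_k + c*b_{k+1}
        return b

    print(horner([1, -6, 8, 8, 4, -40], 3))   # 17, matching Example 1.9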
Numerical Methods Using Matlab, 4th Edition, 2004
John H. Mathews and Kurtis K. Fink
ISBN: 0-13-065248-2
Prentice-Hall Inc.
Upper Saddle River, New Jersey, USA
http://vig.prenhall.com/

Saturday, December 7, 2013


Correlation and regression

Abstract

The present review introduces methods of analyzing the relationship between two quantitative variables. The calculation and interpretation of the sample product moment correlation coefficient and the linear regression equation are discussed and illustrated. Common misuses of the techniques are considered. Tests and confidence intervals for the population parameters are described, and failures of the underlying assumptions are highlighted.
Keywords: coefficient of determination, correlation coefficient, least squares regression line

Introduction

The most commonly used techniques for investigating the relationship between two quantitative variables are correlation and linear regression. Correlation quantifies the strength of the linear relationship between a pair of variables, whereas regression expresses the relationship in the form of an equation. For example, in patients attending an accident and emergency unit (A&E), we could use correlation and regression to determine whether there is a relationship between age and urea level, and whether the level of urea can be predicted for a given age.

Scatter diagram

When investigating a relationship between two variables, the first step is to show the data values graphically on a scatter diagram. Consider the data given in Table 1. These are the ages (years) and the logarithmically transformed admission serum urea (natural logarithm [ln] urea) for 20 patients attending an A&E. The reason for transforming the urea levels was to obtain a more Normal distribution [1]. The scatter diagram for ln urea and age (Fig. 1) suggests there is a positive linear relationship between these variables.
Figure 1
Scatter diagram for ln urea and age
Table 1
Age and ln urea for 20 patients attending an accident and emergency unit

Correlation

On a scatter diagram, the closer the points lie to a straight line, the stronger the linear relationship between two variables. To quantify the strength of the relationship, we can calculate the correlation coefficient. In algebraic notation, if we have two variables x and y, and the data take the form of n pairs (i.e. [x1, y1], [x2, y2], [x3, y3] ... [xn, yn]), then the correlation coefficient is given by the following equation:

    r = Σ(x_i − x̄)(y_i − ȳ) / √[Σ(x_i − x̄)² Σ(y_i − ȳ)²]

where x̄ is the mean of the x values, and ȳ is the mean of the y values.
This is the product moment correlation coefficient (or Pearson correlation coefficient). The value of r always lies between -1 and +1. A value of the correlation coefficient close to +1 indicates a strong positive linear relationship (i.e. one variable increases with the other; Fig. 2). A value close to -1 indicates a strong negative linear relationship (i.e. one variable decreases as the other increases; Fig. 3). A value close to 0 indicates no linear relationship (Fig. 4); however, there could be a nonlinear relationship between the variables (Fig. 5).
Figure 2
Correlation coefficient (r) = +0.9. Positive linear relationship.
Figure 3
Correlation coefficient (r) = -0.9. Negative linear relationship.
Figure 4
Correlation coefficient (r) = 0.04. No relationship.
Figure 5
Correlation coefficient (r) = -0.03. Nonlinear relationship.
For the A&E data, the correlation coefficient is 0.62, indicating a moderate positive linear relationship between the two variables.
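In practice r is computed by software. As an illustrative Python sketch of the formula above (NumPy assumed; the arrays here are hypothetical, not the Table 1 data):

    import numpy as np

    def pearson_r(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        dx, dy = x - x.mean(), y - y.mean()
        return (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

    age = [25, 40, 55, 60, 75]             # hypothetical ages
    ln_urea = [1.2, 1.4, 1.7, 1.6, 2.0]    # hypothetical ln urea values
    print(pearson_r(age, ln_urea))         # agrees with np.corrcoef(age, ln_urea)[0, 1]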

Hypothesis test of correlation

We can use the correlation coefficient to test whether there is a linear relationship between the variables in the population as a whole. The null hypothesis is that the population correlation coefficient equals 0. The value of r can be compared with those given in Table 2, or alternatively exact P values can be obtained from most statistical packages. For the A&E data, r = 0.62 with a sample size of 20 is greater than the value highlighted bold in Table 2 for P = 0.01, indicating a P value of less than 0.01. Therefore, there is sufficient evidence to suggest that the true population correlation coefficient is not 0 and that there is a linear relationship between ln urea and age.
Table 2
5% and 1% points for the distribution of the correlation coefficient under the null hypothesis that the population correlation is 0 in a two-tailed test

Confidence interval for the population correlation coefficient

Although the hypothesis test indicates whether there is a linear relationship, it gives no indication of the strength of that relationship. This additional information can be obtained from a confidence interval for the population correlation coefficient.
To calculate a confidence interval, r must be transformed to give a Normal distribution making use of Fisher's z transformation [2]:

    z_r = 0.5 × ln[(1 + r)/(1 − r)]

The standard error [3] of z_r is approximately:

    SE(z_r) = 1/√(n − 3)

and hence a 95% confidence interval for the true population value for the transformed correlation coefficient z_r is given by z_r - (1.96 × standard error) to z_r + (1.96 × standard error). Because z_r is Normally distributed, 1.96 standard deviations from the statistic will give a 95% confidence interval.
For the A&E data the transformed correlation coefficient z_r between ln urea and age is:

    z_r = 0.5 × ln[(1 + 0.62)/(1 − 0.62)] = 0.725

The standard error of z_r is:

    SE(z_r) = 1/√(20 − 3) = 0.242

The 95% confidence interval for z_r is therefore 0.725 - (1.96 × 0.242) to 0.725 + (1.96 × 0.242), giving 0.251 to 1.199.
We must use the inverse of Fisher's transformation on the lower and upper limits of this confidence interval to obtain the 95% confidence interval for the correlation coefficient. The lower limit is:

    r = (e^(2 × 0.251) − 1)/(e^(2 × 0.251) + 1)

giving 0.25, and the upper limit is:

    r = (e^(2 × 1.199) − 1)/(e^(2 × 1.199) + 1)

giving 0.83. Therefore, we are 95% confident that the population correlation coefficient is between 0.25 and 0.83.
The width of the confidence interval clearly depends on the sample size, and therefore it is possible to calculate the sample size required for a given level of accuracy. For an example, see Bland [4].
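The whole calculation takes a few lines of Python (a sketch using r = 0.62 and n = 20 from the A&E data; math.atanh and math.tanh are exactly Fisher's transformation and its inverse):

    import math

    r, n = 0.62, 20
    z_r = math.atanh(r)                  # 0.5 * ln((1 + r)/(1 - r)) = 0.725
    se = 1 / math.sqrt(n - 3)            # approximate standard error, 0.242
    lo, hi = z_r - 1.96 * se, z_r + 1.96 * se
    print(math.tanh(lo), math.tanh(hi))  # about 0.245 and 0.834, i.e. the 0.25 to 0.83 interval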

Misuse of correlation

There are a number of common situations in which the correlation coefficient can be misinterpreted.
One of the most common errors in interpreting the correlation coefficient is failure to consider that there may be a third variable related to both of the variables being investigated, which is responsible for the apparent correlation. Correlation does not imply causation. To strengthen the case for causality, consideration must be given to other possible underlying variables and to whether the relationship holds in other populations.
A nonlinear relationship may exist between two variables that would be inadequately described, or possibly even undetected, by the correlation coefficient.
A data set may sometimes comprise distinct subgroups, for example males and females. This could result in clusters of points leading to an inflated correlation coefficient (Fig. 6). A single outlier may produce the same sort of effect.
Figure 6
Subgroups in the data resulting in a misleading correlation. All data: r = 0.57; males: r = -0.41; females: r = -0.26.
It is important that the values of one variable are not determined in advance or restricted to a certain range. This may lead to an invalid estimate of the true correlation coefficient because the subjects are not a random sample.
Another situation in which a correlation coefficient is sometimes misinterpreted is when comparing two methods of measurement. A high correlation can be incorrectly taken to mean that there is agreement between the two methods. An analysis that investigates the differences between pairs of observations, such as that formulated by Bland and Altman [5], is more appropriate.

Regression

In the A&E example we are interested in the effect of age (the predictor or x variable) on ln urea (the response or y variable). We want to estimate the underlying linear relationship so that we can predict ln urea (and hence urea) for a given age. Regression can be used to find the equation of this line. This line is usually referred to as the regression line.
Note that in a scatter diagram the response variable is always plotted on the vertical (y) axis.

Equation of a straight line

The equation of a straight line is given by y = a + bx, where the coefficients a and b are the intercept of the line on the y axis and the gradient, respectively. The equation of the regression line for the A&E data (Fig. 7) is as follows: ln urea = 0.72 + (0.017 × age) (calculated using the method of least squares, which is described below). The gradient of this line is 0.017, which indicates that for an increase of 1 year in age the expected increase in ln urea is 0.017 units (and hence urea increases by a factor of e^0.017 = 1.02). The predicted ln urea of a patient aged 60 years, for example, is 0.72 + (0.017 × 60) = 1.74 units. This transforms to a urea level of e^1.74 = 5.70 mmol/l. The y intercept is 0.72, meaning that if the line were projected back to age = 0, then the ln urea value would be 0.72. However, this is not a meaningful value because age = 0 is a long way outside the range of the data and therefore there is no reason to believe that the straight line would still be appropriate.
Figure 7
Regression line for ln urea and age: ln urea = 0.72 + (0.017 × age).

Method of least squares

The regression line is obtained using the method of least squares. Any line y = a + bx that we draw through the points gives a predicted or fitted value of y for each value of x in the data set. For a particular value of x the vertical difference between the observed and fitted value of y is known as the deviation, or residual (Fig. 8). The method of least squares finds the values of a and b that minimise the sum of the squares of all the deviations. This gives the following formulae for calculating a and b:

    b = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²    and    a = ȳ − b x̄

Figure 8
Regression line obtained by minimizing the sums of squares of all of the deviations.
Usually, these values would be calculated using a statistical package or the statistical functions on a calculator.
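As an illustrative sketch of these formulae in Python (NumPy assumed; hypothetical data rather than Table 1):

    import numpy as np

    x = np.array([25, 40, 55, 60, 75], float)       # hypothetical ages
    y = np.array([1.2, 1.4, 1.7, 1.6, 2.0], float)  # hypothetical ln urea values

    b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()  # gradient
    a = y.mean() - b * x.mean()                                              # intercept
    print(a, b)   # np.polyfit(x, y, 1) returns the same two values as [b, a]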

Hypothesis tests and confidence intervals

We can test the null hypotheses that the population intercept and gradient are each equal to 0 using test statistics given by the estimate of the coefficient divided by its standard error; that is (writing SE(a) and SE(b) for the standard errors of the estimated intercept and gradient),

    T = a / SE(a)    and    T = b / SE(b)
The test statistics are compared with the t distribution on n - 2 (sample size - number of regression coefficients) degrees of freedom [4].
The 95% confidence interval for each of the population coefficients is calculated as follows: coefficient ± (t_{n-2} × the standard error), where t_{n-2} is the 5% point for a t distribution with n - 2 degrees of freedom.
For the A&E data, the output (Table 3) was obtained from a statistical package. The P value for the coefficient of age (0.004) gives strong evidence against the null hypothesis, indicating that the population coefficient is not 0 and that there is a linear relationship between ln urea and age. The coefficient of age is the gradient of the regression line and its hypothesis test is equivalent to the test of the population correlation coefficient discussed above. The P value for the constant of 0.054 provides insufficient evidence to indicate that the population coefficient is different from 0. Although the intercept is not significant, it is still appropriate to keep it in the equation. There are some situations in which a straight line passing through the origin is known to be appropriate for the data, and in this case a special regression analysis can be carried out that omits the constant [6].
Table 3
Regression parameter estimates, P values and confidence intervals for the accident and emergency unit data

Analysis of variance

As stated above, the method of least squares minimizes the sum of squares of the deviations of the points about the regression line. Consider the small data set illustrated in Fig. 9. This figure shows that, for a particular value of x, the distance of y from the mean of y (the total deviation) is the sum of the distance of the fitted y value from the mean (the deviation explained by the regression) and the distance from y to the line (the deviation not explained by the regression).
Figure 9
Total, explained and unexplained deviations for a point.
The regression line for these data is given by y = 6 + 2x. The observed, fitted values and deviations are given in Table 4. The sum of squared deviations can be compared with the total variation in y, which is measured by the sum of squares of the deviations of y from the mean of y. Table 4 illustrates the relationship between the sums of squares. Total sum of squares = sum of squares explained by the regression line + sum of squares not explained by the regression line. The explained sum of squares is referred to as the 'regression sum of squares' and the unexplained sum of squares is referred to as the 'residual sum of squares'.
Table 4
Small data set with the fitted values from the regression, the deviations and their sums of squares
This partitioning of the total sum of squares can be presented in an analysis of variance table (Table 5). The total degrees of freedom = n - 1, the regression degrees of freedom = 1, and the residual degrees of freedom = n - 2 (total - regression degrees of freedom). The mean squares are the sums of squares divided by their degrees of freedom.
Table 5
Analysis of variance for a small data set
If there were no linear relationship between the variables then the regression mean squares would be approximately the same as the residual mean squares. We can test the null hypothesis that there is no linear relationship using an F test. The test statistic is calculated as the regression mean square divided by the residual mean square, and a P value may be obtained by comparison of the test statistic with the F distribution with 1 and n - 2 degrees of freedom [2]. Usually, this analysis is carried out using a statistical package that will produce an exact P value. In fact, the F test from the analysis of variance is equivalent to the t test of the gradient for regression with only one predictor. This is not the case with more than one predictor, but this will be the subject of a future review. As discussed above, the test for gradient is also equivalent to that for the correlation, giving three tests with identical P values. Therefore, when there is only one predictor variable it does not matter which of these tests is used.
The analysis of variance for the A&E data (Table 6) gives a P value of 0.006 (the same P value as obtained previously), again indicating a linear relationship between ln urea and age.
Table 6
Analysis of variance for the accident and emergency unit data

Coefficient of determination

Another useful quantity that can be obtained from the analysis of variance is the coefficient of determination (R2):

    R2 = regression sum of squares / total sum of squares
It is the proportion of the total variation in y accounted for by the regression model. Values of R2 close to 1 imply that most of the variability in y is explained by the regression model. R2 is the same as r2 in regression when there is only one predictor variable.
For the A&E data, R2 = 1.462/3.804 = 0.38 (i.e. the same as 0.62^2), and therefore age accounts for 38% of the total variation in ln urea. This means that 62% of the variation in ln urea is not accounted for by age differences. This may be due to inherent variability in ln urea or to other unknown factors that affect the level of ln urea.

Prediction

The fitted value of y for a given value of x is an estimate of the population mean of y for that particular value of x. As such it can be used to provide a confidence interval for the population mean [3]. The fitted values change as x changes, and therefore the confidence intervals will also change.
The 95% confidence interval for the fitted value of y for a particular value of x, say x_p, is again calculated as fitted y ± (t_{n-2} × the standard error). The standard error is given by:

    SE(fitted y) = s √[1/n + (x_p − x̄)² / Σ(x_i − x̄)²]

where s is the standard deviation of the residuals. Fig. 10 shows the range of confidence intervals for the A&E data. For example, the 95% confidence interval for the population mean ln urea for a patient aged 60 years is 1.56 to 1.92 units. This transforms to urea values of 4.76 to 6.82 mmol/l.
Figure 10
Regression line, its 95% confidence interval and the 95% prediction interval for individual patients.
The fitted value for y also provides a predicted value for an individual, and a prediction interval or reference range [3] can be obtained (Fig. 10). The prediction interval is calculated in the same way as the confidence interval but the standard error is given by:

    SE(predicted y) = s √[1 + 1/n + (x_p − x̄)² / Σ(x_i − x̄)²]

For example, the 95% prediction interval for the ln urea for a patient aged 60 years is 0.97 to 2.52 units. This transforms to urea values of 2.64 to 12.43 mmol/l.
Both confidence intervals and prediction intervals become wider for values of the predictor variable further from the mean.

Assumptions and limitations

The use of correlation and regression depends on some underlying assumptions. The observations are assumed to be independent. For correlation both variables should be random variables, but for regression only the response variable y must be random. In carrying out hypothesis tests or calculating confidence intervals for the regression parameters, the response variable should have a Normal distribution and the variability of y should be the same for each value of the predictor variable. The same assumptions are needed in testing the null hypothesis that the correlation is 0, but in order to interpret confidence intervals for the correlation coefficient both variables must be Normally distributed. Both correlation and regression assume that the relationship between the two variables is linear.
A scatter diagram of the data provides an initial check of the assumptions for regression. The assumptions can be assessed in more detail by looking at plots of the residuals [4,7]. Commonly, the residuals are plotted against the fitted values. If the relationship is linear and the variability constant, then the residuals should be evenly scattered around 0 along the range of fitted values (Fig. 11).
Figure 11
(a) Scatter diagram of y against x suggests that the relationship is nonlinear. (b) Plot of residuals against fitted values in panel a; the curvature of the relationship is shown more clearly. (c) Scatter diagram of y against x suggests that the variability ...
In addition, a Normal plot of residuals can be produced. This is a plot of the residuals against the values they would be expected to take if they came from a standard Normal distribution (Normal scores). If the residuals are Normally distributed, then this plot will show a straight line. (A standard Normal distribution is a Normal distribution with mean = 0 and standard deviation = 1.) Normal plots are usually available in statistical packages.
Figs 12 and 13 show the residual plots for the A&E data. The plot of fitted values against residuals suggests that the assumptions of linearity and constant variance are satisfied. The Normal plot suggests that the distribution of the residuals is Normal.
Figure 12
Plot of residuals against fitted values for the accident and emergency unit data.
Figure 13
Normal plot of residuals for the accident and emergency unit data.
When using a regression equation for prediction, errors in prediction may not be just random but also be due to inadequacies in the model. In particular, extrapolating beyond the range of the data is very risky.
A phenomenon to be aware of that may arise with repeated measurements on individuals is regression to the mean. For example, if repeat measures of blood pressure are taken, then patients with higher than average values on their first reading will tend to have lower readings on their second measurement. Therefore, the difference between their second and first measurements will tend to be negative. The converse is true for patients with lower than average readings on their first measurement, resulting in an apparent rise in blood pressure. This could lead to misleading interpretations, for example that there may be an apparent negative correlation between change in blood pressure and initial blood pressure.

Conclusion

Both correlation and simple linear regression can be used to examine the presence of a linear relationship between two variables providing certain assumptions about the data are satisfied. The results of the analysis, however, need to be interpreted with care, particularly when looking for a causal relationship or when using the regression equation for prediction. Multiple and logistic regression will be the subject of future reviews.

Competing interests

None declared.


Random Variables

A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon. There are two types of random variables, discrete and continuous.

Discrete Random Variables

A discrete random variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, ... Discrete random variables are usually (but not necessarily) counts. If a random variable can take only a finite number of distinct values, then it must be discrete. Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor's surgery, the number of defective light bulbs in a box of ten. The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values. It is also sometimes called the probability function or the probability mass function.
(Definitions taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
Suppose a random variable X may take k different values, with the probability that X = xi defined to be P(X = xi) = pi. The probabilities pi must satisfy the following:
1: 0 < pi < 1 for each i
2: p1 + p2 + ... + pk = 1.

Example

Suppose a variable X can take the values 1, 2, 3, or 4.
The probabilities associated with each outcome are described by the following table:
 Outcome  1 2 3 4
 Probability 0.1 0.3 0.4 0.2
The probability that X is equal to 2 or 3 is the sum of the two probabilities: P(X = 2 or X = 3) = P(X = 2) + P(X = 3) = 0.3 + 0.4 = 0.7. Similarly, the probability that X is greater than 1 is equal to 1 - P(X = 1) = 1 - 0.1 = 0.9, by the complement rule. This distribution may also be described by the probability histogram shown to the right:
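A tiny Python sketch of the same calculations (illustrative only):

    pmf = {1: 0.1, 2: 0.3, 3: 0.4, 4: 0.2}
    print(pmf[2] + pmf[3])   # P(X = 2 or X = 3) = 0.7
    print(1 - pmf[1])        # P(X > 1) = 0.9, by the complement rule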


All random variables (discrete and continuous) have a cumulative distribution function. It is a function giving the probability that the random variable X is less than or equal to x, for every value x. For a discrete random variable, the cumulative distribution function is found by summing up the probabilities. (Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)

Example

The cumulative distribution function for the above probability distribution is calculated as follows:
The probability that X is less than or equal to 1 is 0.1,
the probability that X is less than or equal to 2 is 0.1+0.3 = 0.4,
the probability that X is less than or equal to 3 is 0.1+0.3+0.4 = 0.8, and
the probability that X is less than or equal to 4 is 0.1+0.3+0.4+0.2 = 1. The probability histogram for the cumulative distribution of this random variable is shown to the right:
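In code, the cumulative distribution is just a running sum of the probabilities (a sketch):

    from itertools import accumulate

    print(list(accumulate([0.1, 0.3, 0.4, 0.2])))   # [0.1, 0.4, 0.8, 1.0], up to float rounding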

Continuous Random Variables

A continuous random variable is one which takes an infinite number of possible values. Continuous random variables are usually measurements. Examples include height, weight, the amount of sugar in an orange, the time required to run a mile. (Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
A continuous random variable is not defined at specific values. Instead, it is defined over an interval of values, and is represented by the area under a curve (in advanced mathematics, this is known as an integral). The probability of observing any single value is equal to 0, since the number of values which may be assumed by the random variable is infinite.
Suppose a random variable X may take all values over an interval of real numbers. Then the probability that X is in the set of outcomes A, P(A), is defined to be the area above A and under a curve. The curve, which represents a function p(x), must satisfy the following:
1: The curve has no negative values (p(x) ≥ 0 for all x)
2: The total area under the curve is equal to 1.
A curve meeting these requirements is known as a density curve.

The Uniform Distribution

A random number generator acting over an interval of numbers (a,b) has a continuous distribution. Since any interval of numbers of equal width has an equal probability of being observed, the curve describing the distribution is a rectangle, with constant height across the interval and 0 height elsewhere. Since the area under the curve must be equal to 1, the length of the interval determines the height of the curve. The following graphs plot the density curves for random number generators over the intervals (4,5) (top left), (2,6) (top right), (5,5.5) (lower left), and (3,5) (lower right). The distributions corresponding to these curves are known as uniform distributions.
Consider the uniform random variable X defined on the interval (2,6). Since the interval has width = 4, the curve has height = 0.25 over the interval and 0 elsewhere. The probability that X is less than or equal to 5 is the area between 2 and 5, or (5-2)*0.25 = 0.75. The probability that X is greater than 3 but less than 4 is the area between 3 and 4, (4-3)*0.25 = 0.25. To find the probability that X is less than 3 or greater than 5, add the two probabilities:
P(X < 3 or X > 5) = P(X < 3) + P(X > 5) = (3-2)*0.25 + (6-5)*0.25 = 0.25 + 0.25 = 0.5. The uniform distribution is often used to simulate data. Suppose you would like to simulate data for 10 rolls of a regular 6-sided die. Using the MINITAB "RAND" command with the "UNIF" subcommand generates 10 numbers in the interval (0,6):
MTB > RAND 10 c2;
SUBC> unif 0 6.
Assign the discrete random variable X to the values 1, 2, 3, 4, 5, or 6 as follows:
if 0<X<1, X=1
if 1<X<2, X=2
if 2<X<3, X=3
if 3<X<4, X=4
if 4<X<5, X=5
if X>5, X=6.
Use the generated MINITAB data to assign X to a value for each roll of the die:
Uniform Data  X Value
4.53786   5
5.77474   6
3.69518   4
1.03929   2
4.23835   5
0.37096   1
0.75272   1
5.56563   6
0.89045   1
3.18086   4
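The same simulation in Python (NumPy assumed; the seed is arbitrary and only makes the run reproducible):

    import numpy as np

    rng = np.random.default_rng(seed=1)
    u = rng.uniform(0, 6, size=10)     # like MINITAB's RAND with the UNIF subcommand
    faces = np.ceil(u).astype(int)     # 0 < u <= 1 -> 1, ..., 5 < u <= 6 -> 6
    print(u.round(5))                  # (u == 0.0 exactly has negligible probability)
    print(faces)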

Another type of continuous density curve is the normal distribution. The area under the curve is not easy to calculate for a normal random variable X with mean µ and standard deviation σ. However, tables (and computer functions) are available for the standard random variable Z, which is computed from X by subtracting µ and dividing by σ. All of the rules of probability apply to the normal distribution.

Random variables and probability distributions

Random Variable
Expected Value
Variance
Probability Distribution
Cumulative Distribution Function
Probability Density Function
Discrete Random Variable
Continuous Random Variable
Independent Random Variables
Probability-Probability (PP) Plot
Quantile-Quantile (QQ) Plot
Normal Distribution
Poisson Distribution
Binomial Distribution
Geometric Distribution
Uniform Distribution
Central Limit Theorem



Random Variable The outcome of an experiment need not be a number, for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we often want to represent outcomes as numbers. A random variable is a function that associates a unique numerical value with every outcome of an experiment. The value of the random variable will vary from trial to trial as the experiment is repeated.
There are two types of random variable - discrete and continuous.
A random variable has either an associated probability distribution (discrete random variable) or probability density function (continuous random variable).
Examples
  1. A coin is tossed ten times. The random variable X is the number of tails that are noted. X can only take the values 0, 1, ..., 10, so X is a discrete random variable.
  2. A light bulb is burned until it burns out. The random variable Y is its lifetime in hours. Y can take any positive real value, so Y is a continuous random variable.


Expected Value The expected value (or population mean) of a random variable indicates its average or central value. It is a useful summary value (a number) of the variable's distribution.
Stating the expected value gives a general impression of the behaviour of some random variable without giving full details of its probability distribution (if it is discrete) or its probability density function (if it is continuous).
Two random variables with the same expected value can have very different distributions. There are other useful descriptive measures which affect the shape of the distribution, for example variance.
The expected value of a random variable X is symbolised by E(X) or µ.
If X is a discrete random variable with possible values x1, x2, x3, ..., xn, and p(xi) denotes P(X = xi), then the expected value of X is defined by:
E(X) = sum of xi.p(xi)
where the elements are summed over all values of the random variable X.
If X is a continuous random variable with probability density function f(x), then the expected value of X is defined by:
E(X) = integral of x.f(x)dx
Example
Discrete case : When a die is thrown, each of the possible faces 1, 2, 3, 4, 5, 6 (the xi's) has a probability of 1/6 (the p(xi)'s) of showing. The expected value of the face showing is therefore:
µ = E(X) = (1 × 1/6) + (2 × 1/6) + (3 × 1/6) + (4 × 1/6) + (5 × 1/6) + (6 × 1/6) = 3.5
Notice that, in this case, E(X) is 3.5, which is not a possible value of X.
See also sample mean.


Variance The (population) variance of a random variable is a non-negative number which gives an idea of how widely spread the values of the random variable are likely to be; the larger the variance, the more scattered the observations on average.
Stating the variance gives an impression of how closely concentrated round the expected value the distribution is; it is a measure of the 'spread' of a distribution about its average value.
Variance is symbolised by V(X) or Var(X) or sigma^2
The variance of the random variable X is defined to be:
V(X)=E(X^2)-E(X)^2
where E(X) is the expected value of the random variable X.
Notes
  1. the larger the variance, the further that individual values of the random variable (observations) tend to be from the mean, on average;
  2. the smaller the variance, the closer that individual values of the random variable (observations) tend to be to the mean, on average;
  3. taking the square root of the variance gives the standard deviation, i.e.:
    sqrt(V(X))=sigma
  4. the variance and standard deviation of a random variable are always non-negative.
See also sample variance.
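Continuing the die example from the Expected Value entry, a short Python check of the formula V(X) = E(X^2) - E(X)^2 (illustrative):

    values = [1, 2, 3, 4, 5, 6]
    EX = sum(x / 6 for x in values)        # 3.5
    EX2 = sum(x * x / 6 for x in values)   # 91/6
    print(EX2 - EX**2)                     # about 2.9167; sd = sqrt(2.9167) = 1.708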


Probability Distribution The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values. It is also sometimes called the probability function or the probability mass function.
More formally, the probability distribution of a discrete random variable X is a function which gives the probability p(xi) that the random variable equals xi, for each value xi:
p(xi) = P(X=xi)
It satisfies the following conditions:
  1. 0 <= p(xi) <= 1
  2. sum of all p(xi) is 1


Cumulative Distribution Function All random variables (discrete and continuous) have a cumulative distribution function. It is a function giving the probability that the random variable X is less than or equal to x, for every value x.
Formally, the cumulative distribution function F(x) is defined to be:
F(x) = P(X<=x)
for
-infinity < x < infinity
For a discrete random variable, the cumulative distribution function is found by summing up the probabilities as in the example below.
For a continuous random variable, the cumulative distribution function is the integral of its probability density function.
Example
Discrete case : Suppose a random variable X has the following probability distribution p(xi):
xi 0 1 2 3 4 5
p(xi) 1/32 5/32 10/32 10/32 5/32 1/32
This is actually a binomial distribution: Bi(5, 0.5) or B(5, 0.5). The cumulative distribution function F(x) is then:
xi 0 1 2 3 4 5
F(xi) 1/32 6/32 16/32 26/32 31/32 32/32
F(x) does not change at intermediate values. For example:
F(1.3) = F(1) = 6/32
F(2.86) = F(2) = 16/32


Probability Density Function The probability density function of a continuous random variable is a function which can be integrated to obtain the probability that the random variable takes a value in a given interval.
More formally, the probability density function, f(x), of a continuous random variable X is the derivative of the cumulative distribution function F(x):
f(x) = d/dx F(x)
Since F(x) = P(X<=x) it follows that:
integral of f(x)dx = F(b)-F(a) = P(a<X<b)
If f(x) is a probability density function then it must obey two conditions:
  1. that the total probability for all possible values of the continuous random variable X is 1:
    integral of f(x)dx = 1
  2. that the probability density function can never be negative: f(x) ≥ 0 for all x.


Discrete Random Variable A discrete random variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, ... Discrete random variables are usually (but not necessarily) counts. If a random variable can take only a finite number of distinct values, then it must be discrete. Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor's surgery, the number of defective light bulbs in a box of ten.
Compare continuous random variable.


Continuous Random Variable A continuous random variable is one which takes an infinite number of possible values. Continuous random variables are usually measurements. Examples include height, weight, the amount of sugar in an orange, the time required to run a mile.
Compare discrete random variable.


Independent Random Variables Two random variables X and Y say, are said to be independent if and only if the value of X has no influence on the value of Y and vice versa.
The cumulative distribution functions of two independent random variables X and Y are related by
F(x,y) = G(x).H(y)
where
G(x) and H(y) are the marginal distribution functions of X and Y for all pairs (x,y).
Knowledge of the value of X does not affect the probability distribution of Y and vice versa. Thus there is no relationship between the values of independent random variables.
For continuous independent random variables, their probability density functions are related by
f(x,y) = g(x).h(y)
where
g(x) and h(y) are the marginal density functions of the random variables X and Y respectively, for all pairs (x,y).
For discrete independent random variables, their probabilities are related by
P(X = xi ; Y = yj) = P(X = xi).P(Y=yj)
for each pair (xi,yj).


Probability-Probability (P-P) Plot A probability-probability (P-P) plot is used to see if a given set of data follows some specified distribution. It should be approximately linear if the specified distribution is the correct model.
The probability-probability (P-P) plot is constructed using the theoretical cumulative distribution function, F(x), of the specified model. The values in the sample of data, in order from smallest to largest, are denoted x(1), x(2), ..., x(n). For i = 1, 2, ..., n, F(x(i)) is plotted against (i-0.5)/n.
Compare quantile-quantile (Q-Q) plot.


Quantile-Quantile (QQ) Plot A quantile-quantile (Q-Q) plot is used to see if a given set of data follows some specified distribution. It should be approximately linear if the specified distribution is the correct model.
The quantile-quantile (Q-Q) plot is constructed using the theoretical cumulative distribution function, F(x), of the specified model. The values in the sample of data, in order from smallest to largest, are denoted x(1), x(2), ..., x(n). For i = 1, 2, ..., n, x(i) is plotted against F^(-1)((i-0.5)/n).
Compare probability-probability (P-P) plot.


Normal Distribution Normal distributions model (some) continuous random variables. Strictly, a Normal random variable should be capable of assuming any value on the real line, though this requirement is often waived in practice. For example, height at a given age for a given gender in a given racial group is adequately described by a Normal random variable even though heights must be positive.
A continuous random variable X, taking all real values in the range minus infinity to infinity, is said to follow a Normal distribution with parameters µ and sigma^2 if it has probability density function
f(x) = {1/sqrt(2.pi.sigma^2)}.exp[-.5{(x-mu)/sigma}^2]
We write
X ~ N(mu, sigma^2)
This probability density function (p.d.f.) is a symmetrical, bell-shaped curve, centred at its expected value µ. The variance is sigma^2.
Many distributions arising in practice can be approximated by a Normal distribution. Other random variables may be transformed to normality.
The simplest case of the normal distribution, known as the Standard Normal Distribution, has expected value zero and variance one. This is written as N(0,1).
Examples

N(0,1) pdf N(2,1) pdf N(0,2) pdf


Poisson Distribution Poisson distributions model (some) discrete random variables. Typically, a Poisson random variable is a count of the number of events that occur in a certain time interval or spatial area. For example, the number of cars passing a fixed point in a 5 minute interval, or the number of calls received by a switchboard during a given period of time.
A discrete random variable X is said to follow a Poisson distribution with parameter m, written X ~ Po(m), if it has probability distribution
P(X=x) = (m^x/x!).e^(-m)
where
x = 0, 1, 2, ...
m > 0.
The following requirements must be met:
  1. the length of the observation period is fixed in advance;
  2. the events occur at a constant average rate;
  3. the number of events occurring in disjoint intervals are statistically independent.
The Poisson distribution has expected value E(X) = m and variance V(X) = m; i.e. E(X) = V(X) = m.
The Poisson distribution can sometimes be used to approximate the Binomial distribution with parameters n and p. When the number of observations n is large, and the success probability p is small, the Bi(n,p) distribution approaches the Poisson distribution with the parameter given by m = np. This is useful since the computations involved in calculating binomial probabilities are greatly reduced.
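This approximation is easy to see with scipy.stats (assumed available); with n = 1000 and p = 0.002, the Poisson parameter is m = np = 2:

    from scipy.stats import binom, poisson

    n, p = 1000, 0.002
    for x in range(6):
        print(x, binom.pmf(x, n, p), poisson.pmf(x, n * p))   # the two columns nearly agree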
Examples

Po(3) pdf Po(5) pdf


Binomial Distribution Binomial distributions model (some) discrete random variables.
Typically, a binomial random variable is the number of successes in a series of trials, for example, the number of 'heads' occurring when a coin is tossed 50 times.
A discrete random variable X is said to follow a Binomial distribution with parameters n and p, written X ~ Bi(n,p) or X ~ B(n,p), if it has probability distribution
P(X=x) = (n choose x).p^x.(1-p)^(n-x)
where
x = 0, 1, 2, ..., n
n = 1, 2, 3, ...
p = success probability; 0 < p < 1
(n choose x) = n! / {x!(n-x)!}
The trials must meet the following requirements:
  1. the total number of trials is fixed in advance;
  2. there are just two outcomes of each trial; success and failure;
  3. the outcomes of all the trials are statistically independent;
  4. all the trials have the same probability of success.
The Binomial distribution has expected value E(X) = np and variance V(X) = np(1-p).
Examples

Bi(10,0.5) pdf Bi(10,0.25) pdf


Geometric Distribution Geometric distributions model (some) discrete random variables. Typically, a Geometric random variable is the number of trials required to obtain the first failure, for example, the number of tosses of a coin until the first 'tail' is obtained, or a process where components from a production line are tested, in turn, until the first defective item is found.
A discrete random variable X is said to follow a Geometric distribution with parameter p, written X ~ Ge(p), if it has probability distribution
P(X=x) = p^(x-1).(1-p)
where
x = 1, 2, 3, ...
p = success probability; 0 < p < 1
The trials must meet the following requirements:
  1. the total number of trials is potentially infinite;
  2. there are just two outcomes of each trial; success and failure;
  3. the outcomes of all the trials are statistically independent;
  4. all the trials have the same probability of success.
The Geometric distribution has expected value E(X) = 1/(1-p) and variance V(X) = p/{(1-p)^2}.
The Geometric distribution is related to the Binomial distribution in that both are based on independent trials in which the probability of success is constant and equal to p. However, a Geometric random variable is the number of trials until the first failure, whereas a Binomial random variable is the number of successes in n trials.
Examples

Ge(0.5) pdf Ge(0.75) pdf


Uniform Distribution Uniform distributions model (some) continuous random variables and (some) discrete random variables. The values of a uniform random variable are uniformly distributed over an interval. For example, if buses arrive at a given bus stop every 15 minutes, and you arrive at the bus stop at a random time, the time you wait for the next bus to arrive could be described by a uniform distribution over the interval from 0 to 15.
A discrete random variable X is said to follow a Uniform distribution with parameters a and b, written X ~ Un(a,b), if it has probability distribution
P(X=x) = 1/(b-a)
where
x = 1, 2, 3, ..., n.
A discrete uniform distribution has equal probability at each of its n values.
A continuous random variable X is said to follow a Uniform distribution with parameters a and b, written X ~ Un(a,b), if its probability density function is constant within a finite interval [a,b], and zero outside this interval (with a less than or equal to b).
The Uniform distribution has expected value E(X) = (a+b)/2 and variance {(b-a)^2}/12.
Example

Un(10,20) pdf


Central Limit Theorem The Central Limit Theorem states that whenever a random sample of size n is taken from any distribution with mean µ and variance sigma^2, then the sample mean x_bar will be approximately normally distributed with mean µ and variance sigma^2/n. The larger the value of the sample size n, the better the approximation to the normal.
This is very useful when it comes to inference. For example, it allows us (if the sample size is fairly large) to use hypothesis tests which assume normality even if our data appear non-normal. This is because the tests use the sample mean x_bar, which the Central Limit Theorem tells us will be approximately normally distributed.
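A quick simulation illustrates this (Python with NumPy assumed): even for a skewed parent distribution such as the exponential with µ = sigma = 1, the means of samples of size n = 50 have mean close to µ and standard deviation close to sigma/sqrt(n).

    import numpy as np

    rng = np.random.default_rng(seed=0)
    n = 50
    # 10,000 independent samples of size n from an exponential distribution
    means = rng.exponential(scale=1.0, size=(10000, n)).mean(axis=1)
    print(means.mean(), means.std())   # close to 1 and 1/sqrt(50) = 0.1414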


