The binomial distribution is a discrete probability distribution that describes the number of successes in a sequence of n independent experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment.
A typical example is the following: 5% of the population is HIV-positive. You pick 500 people at random. How likely is it that 30 or more of them are HIV-positive?
The number of HIV-positive people you pick is a random variable X which follows a binomial distribution with n = 500 and p = 0.05. We are interested in the probability Pr[X ≥ 30].
In general, if the random variable X follows the binomial distribution with parameters n and p, we write X ~ B(n, p). The probability of getting exactly k successes is given by
- Pr[X = k] = C(n, k) p^{k} (1-p)^{n-k} for k = 0, 1, 2, ..., n
Here, C(n, k) denotes the binomial coefficient of n and k, whence the name of the distribution. The formula can be understood as follows: we want k successes (p^{k}) and n-k failures ((1-p)^{n-k}). However, the k successes can occur anywhere among the n trials, and there are C(n, k) different ways of distributing k successes in a sequence of n trials.
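As a rough illustration, the formula can be evaluated directly to answer the question from the example above. The following is a minimal sketch using only the Python standard library; the function names binomial_pmf and binomial_tail are chosen here for illustration and do not come from any particular library.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Pr[X = k] for X ~ B(n, p): C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_tail(k, n, p):
    """Pr[X >= k], obtained by summing the pmf over k, k+1, ..., n."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

# The example from the text: n = 500 people, p = 0.05, at least 30 positives.
# Terms very far out in the tail underflow to 0.0 in floating point;
# they are negligibly small, so the sum is not noticeably affected.
prob = binomial_tail(30, 500, 0.05)
print(f"Pr[X >= 30] = {prob:.4f}")
```

Summing the pmf term by term is affordable for n = 500, but for much larger n a log-space computation or a statistics library would likely be a better choice.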
If X ~ B(n, p), then the expected value of X is
- E[X] = np
and the variance is
- Var(X) = np(1-p).
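These two formulas can be checked numerically by computing the mean and variance straight from the probabilities Pr[X = k]. The sketch below does this for one arbitrarily chosen pair of parameters (n = 20, p = 0.3, picked here purely for illustration).

```python
from math import comb

def binomial_pmf(k, n, p):
    """Pr[X = k] for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.3  # illustrative parameters, not taken from the text

# Mean and variance computed from the definition, as sums over k = 0..n.
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))

print(f"mean from pmf     = {mean:.6f}, np      = {n * p:.6f}")
print(f"variance from pmf = {variance:.6f}, np(1-p) = {n * p * (1 - p):.6f}")
```

Up to floating-point rounding, the values computed from the pmf should agree with np and np(1-p).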
The most likely value or mode of X is given by the largest integer less than or equal to (n+1)p; if m = (n+1)p is itself an integer, then m-1 and m are both modes.
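The rule for the mode can be compared against a brute-force search over all values of k. This is a small sketch with hypothetical helper names (mode_by_formula, modes_by_search); it uses floating-point arithmetic, so the test for whether (n+1)p is an integer is only reliable when that product is exactly representable, as in the second example.

```python
from math import comb, floor, isclose

def binomial_pmf(k, n, p):
    """Pr[X = k] for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mode_by_formula(n, p):
    """Largest integer less than or equal to (n+1)p."""
    return floor((n + 1) * p)

def modes_by_search(n, p):
    """All k whose probability equals the maximum (up to rounding)."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    best = max(probs)
    return [k for k, q in enumerate(probs) if isclose(q, best, rel_tol=1e-12)]

# The example from the text: (n+1)p = 25.05, so the mode is 25.
print(mode_by_formula(500, 0.05), modes_by_search(500, 0.05))

# A case where (n+1)p = 5 is an integer: both 4 and 5 are modes.
print(mode_by_formula(9, 0.5), modes_by_search(9, 0.5))
```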
If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables, then X + Y is again a binomial variable; its distribution is B(n+m, p).
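One way to see this numerically: the distribution of a sum of independent variables is the convolution of their distributions, so convolving the probabilities of B(n, p) and B(m, p) should reproduce those of B(n+m, p). The sketch below checks this for small, arbitrarily chosen parameters; the convolve helper is written out by hand for illustration.

```python
from math import comb

def binomial_pmf_list(n, p):
    """List of Pr[X = k] for k = 0..n, where X ~ B(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(a, b):
    """Distribution of X + Y for independent X, Y with pmf lists a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

n, m, p = 4, 6, 0.3  # illustrative parameters, not taken from the text

sum_pmf = convolve(binomial_pmf_list(n, p), binomial_pmf_list(m, p))
direct_pmf = binomial_pmf_list(n + m, p)

max_diff = max(abs(s - d) for s, d in zip(sum_pmf, direct_pmf))
print(f"largest difference between the two pmfs: {max_diff:.2e}")
```

Note that the result relies on both variables sharing the same p; with different success probabilities the sum is in general not binomial.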
Two other important distributions arise as approximations of binomial distributions:
- If both np and n(1-p) are greater than about 5, then B(n, p) is well approximated by the normal distribution N(np, np(1-p)). This approximation saves a great deal of computation; historically, it was the first use of the normal distribution. Today it can be seen as a consequence of the central limit theorem, since a B(n, p) variable is a sum of n independent, identically distributed 0-1 indicator variables.
- If n is large and p is small, so that np is of moderate size, then the Poisson distribution with parameter λ = np is a good approximation to B(n, p).
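To get a feel for how close these approximations are, the sketch below compares the exact value of Pr[X ≥ 30] for n = 500 and p = 0.05 with its normal and Poisson approximations. Two choices here are assumptions rather than part of the text above: the normal tail uses a continuity correction (evaluating at 29.5 instead of 30), which is a common refinement, and the normal CDF is computed from the error function in the standard library.

```python
from math import comb, erf, exp, lgamma, log, sqrt

n, p = 500, 0.05
threshold = 30  # we want Pr[X >= 30]

def binomial_pmf(k, n, p):
    """Pr[X = k] for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

exact = sum(binomial_pmf(k, n, p) for k in range(threshold, n + 1))

# Normal approximation N(np, np(1-p)).
mu = n * p
sigma = sqrt(n * p * (1 - p))

def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution with mean mu, standard deviation sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Assumption: continuity correction, so the cut is placed at threshold - 0.5.
normal_tail = 1 - normal_cdf(threshold - 0.5, mu, sigma)

# Poisson approximation with parameter lambda = np.
lam = n * p

def poisson_pmf(k, lam):
    """Pr[Y = k] for Y ~ Poisson(lam), computed in log space for stability."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))

poisson_tail = 1 - sum(poisson_pmf(k, lam) for k in range(threshold))

print(f"exact binomial : {exact:.4f}")
print(f"normal approx. : {normal_tail:.4f}")
print(f"Poisson approx.: {poisson_tail:.4f}")
```

In this example np = 25 and n(1-p) = 475, so the rule of thumb for the normal approximation is satisfied, and p is small enough for the Poisson approximation to be reasonable as well.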
The formula for Bézier curves was inspired by the binomial distribution.