In statistics, the method of maximum likelihood, pioneered by the geneticist and statistician Sir Ronald A. Fisher, is a method of point estimation that uses, as the estimate of an unobservable population parameter, the member of the parameter space that maximizes the likelihood function.
For the moment let p denote the unobservable population parameter to be estimated, and let X denote the random variable observed (which in general will not be scalar-valued, but will often be a vector of probabilistically independent scalar-valued random variables). The probability of an observed outcome X = x (note that the capitalization is significant; see below), or the value at (lower-case) x of the probability density function of the random variable (capital) X, regarded as a function of p with x held fixed, is the likelihood function
- <math>L(p)=P(X=x\mid p).</math>
 
For example, in a large population of voters, the proportion p who will vote "yes" is unobservable and is to be estimated based on a political opinion poll. A sample of n voters is chosen randomly, and it is observed that x of those n voters will vote "yes". Then the likelihood function is
- <math>L(p)={n \choose x}p^x(1-p)^{n-x}.</math>
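As a quick numerical illustration, the sketch below (with hypothetical poll numbers n = 100 and x = 61, which are not from the text) evaluates L(p) on a grid of candidate values and locates its maximum directly; the value found agrees with the closed-form result x/n derived in the next paragraph.

```python
# Minimal sketch of the poll example, with hypothetical data n = 100, x = 61.
# It evaluates L(p) on a grid of candidate values and reports the maximizer.
from math import comb

n, x = 100, 61  # hypothetical poll data

def likelihood(p):
    """Binomial likelihood L(p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Grid search over p in (0, 1); the maximum is attained near x / n.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)
print(p_hat, x / n)  # 0.61 and 0.61
```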
 
The value of p that maximizes L(p) is the maximum-likelihood estimate of p. Setting the first derivative of L(p) equal to zero and solving yields x/n as the maximum-likelihood estimate. In this case, as in many others, it is much easier to take the logarithm of the likelihood function before differentiating; setting the derivative of the log-likelihood to zero gives
- <math>\frac{x}{p}-\frac{n-x}{1-p}=0</math>
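Spelled out for the poll example, the equation above is the derivative of the log-likelihood set to zero; the binomial coefficient does not depend on p and drops out, and solving recovers the estimate x/n quoted above:
- <math>\log L(p)=\log{n \choose x}+x\log p+(n-x)\log(1-p),</math>
- <math>\frac{d}{dp}\log L(p)=\frac{x}{p}-\frac{n-x}{1-p}=0\quad\Longrightarrow\quad x(1-p)=(n-x)p\quad\Longrightarrow\quad p=\frac{x}{n}.</math>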
 
Taking the logarithm of the likelihood is so common that the term log-likelihood is commonplace among statisticians. The log-likelihood is closely related to information entropy.
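One practical reason for working with the log-likelihood (a sketch with hypothetical numbers, not a claim made in the text above) is numerical: for large samples the likelihood itself can underflow to zero in floating-point arithmetic, while the log-likelihood remains finite and has the same maximizer.

```python
# Hypothetical large sample: n = 10000, x = 6100. The factor p^x underflows
# to 0.0 in double precision, but the log-likelihood stays finite, and its
# maximizer is still x / n.
from math import log, lgamma

n, x = 10_000, 6_100  # hypothetical data

def log_likelihood(p):
    """log L(p) = log C(n, x) + x*log(p) + (n - x)*log(1 - p)."""
    log_binom = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return log_binom + x * log(p) + (n - x) * log(1 - p)

# Grid search over p in (0, 1); any one-dimensional optimizer would also work.
grid = [i / 10_000 for i in range(1, 10_000)]
p_hat = max(grid, key=log_likelihood)
print(p_hat)  # 0.61, i.e. x / n
```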
  
If we replace the lower-case x with the capital X, then we have not the observed value in a particular case but a random variable, which, like all random variables, has a probability distribution. The value (lower-case) x/n observed in a particular case is an estimate; the random variable (capital) X/n is an estimator. The statistician judges how good an estimator is by the nature of its probability distribution; in particular, it is desirable that the probability that the estimator is far from the parameter p be small.

Maximum-likelihood estimators are often better (for example, in mean squared error) than unbiased estimators. They also have a property called "functional invariance" that unbiased estimators lack: for any function f, the maximum-likelihood estimator of f(p) is f(T), where T is the maximum-likelihood estimator of p.
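To illustrate the distinction between estimate and estimator, the simulation sketch below (with hypothetical settings, true p = 0.6 and n = 1000, not taken from the text) repeats the poll many times: each repetition yields one estimate x/n, and the collection of such values traces out the sampling distribution of the estimator X/n. The final lines illustrate functional invariance for the hypothetical function f(p) = p/(1 - p), the odds of a "yes" vote.

```python
# Simulation sketch: the estimator X/n is a random variable; repeating the
# poll many times shows its sampling distribution concentrating near p.
# Hypothetical settings: true p = 0.6, n = 1000 voters per poll.
import random

random.seed(0)
p_true, n, trials = 0.6, 1000, 2000

estimates = []
for _ in range(trials):
    x = sum(random.random() < p_true for _ in range(n))  # one simulated poll
    estimates.append(x / n)                              # one estimate of p

mean_est = sum(estimates) / trials
std_est = (sum((e - mean_est) ** 2 for e in estimates) / trials) ** 0.5
print(mean_est, std_est)  # mean close to 0.6, small spread

# Functional invariance: the maximum-likelihood estimate of the odds
# f(p) = p / (1 - p) is f applied to the maximum-likelihood estimate of p.
p_hat = estimates[0]
print(p_hat / (1 - p_hat))
```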
 