Lagrange multipliers are a method for solving certain
optimization problems.
The class of problems to be solved deals with finding local
extrema of multidimensional
functions with one-dimensional contraints in the form of a curve.
The method reduces the constrained problem to an unconstrained problem which can be solved by the usual
gradient method.
Let f (x1, x2, … xn) be a function defined on an n-dimensional open set.
f has to have continuous partial derivatives on that set.
The constraint is defined by the curve which satisfies g (x1, x2, … xn) = C which is contained in the open set (C is a constant).
Additionally, g ≠ 0 everywhere on the curve (where is the gradient).
The task is now to find a local maximum (or minimum) of f(x) on the set {x ∈ Rn | g (x) = C}.
By setting the derivative of f along the curve g (x) = C to zero, one can show that there exists a real number λ (the Lagrange multiplier) such that
- <math>\nabla (f+\lambda g)=0</math>
or - written differently
- <math>\frac{\partial f}{\partial x_i} + \lambda \frac{\partial g}{\partial x_i} = 0 \mbox{ for all } i = 1,\cdots,n</math>
for each
x that is such a local extremum.
Because the solution of this equation still gives values outside of the curve, the constraint
g (
x) =
C has to be used again.
This results in all possible solutions.
Suppose we wish to find the discrete probability distribution with maximal information entropy. Then
- <math>f(p_1,p_2,\cdots,p_n) = -\sum_{k=1}^n p_k\log_2 p_k</math>
Of course, the sum of these probabilities equals 1, so our constraint function is
- <math>g(p_1,p_2,\cdots,p_n)=\sum_{k=1}^n p_k=1.</math>
To find the point of maximum entropy (depending on the probabilities), we can use the Lagrange multiplier.
We set:
- <math>\mbox{For all } i = 1,\cdots,n</math>
- <math>\frac{\partial}{\partial p_i}\left(-\sum_{k=1}^n p_k \log_2 p_k + \lambda\sum_{k=1}^n p_k\right) = 0</math>
Carrying out the differentation of these
n equations, we get
- <math>-\left(\frac{1}{\ln 2}+\log_2 p_i \right) + \lambda = 0</math>
This shows that all
pi are equal (because they depend on λ only).
By using the constraint ∑
k pk = 1 again, we get the uniform distribution:
- <math>p_i = \frac{1}{n}</math>
For another example, see also Derivation of the partition function.
All Wikipedia text
is available under the
terms of the GNU Free Documentation License