Encyclopedia > Lagrange Multiplier

Article Content

Lagrange multiplier

Redirected from Lagrange Multiplier

Lagrange multipliers are a method for solving certain optimization problems. The class of problems to be solved deals with finding local extrema of multidimensional functions with one-dimensional contraints in the form of a curve. The method reduces the constrained problem to an unconstrained problem which can be solved by the usual gradient method.

Formalism of the method of Lagrange multipliers

Let f (x₁, x₂, … x_n) be a function defined on an n-dimensional open set. f has to have continuous partial derivatives on that set. The constraint is defined by the curve which satisfies g (x₁, x₂, … x_n) = C which is contained in the open set (C is a constant). Additionally, g ≠ 0 everywhere on the curve (where is the gradient). The task is now to find a local maximum (or minimum) of f(x) on the set {x ∈ Rⁿ | g (x) = C}. By setting the derivative of f along the curve g (x) = C to zero, one can show that there exists a real number λ (the Lagrange multiplier) such that

<math>\nabla (f+\lambda g)=0</math>

or - written differently

<math>\frac{\partial f}{\partial x_i} + \lambda \frac{\partial g}{\partial x_i} = 0 \mbox{ for all } i = 1,\cdots,n</math>

for each x that is such a local extremum. Because the solution of this equation still gives values outside of the curve, the constraint g (x) = C has to be used again. This results in all possible solutions.

Example

Suppose we wish to find the discrete probability distribution with maximal information entropy. Then

<math>f(p_1,p_2,\cdots,p_n) = -\sum_{k=1}^n p_k\log_2 p_k</math>

Of course, the sum of these probabilities equals 1, so our constraint function is

<math>g(p_1,p_2,\cdots,p_n)=\sum_{k=1}^n p_k=1.</math>

To find the point of maximum entropy (depending on the probabilities), we can use the Lagrange multiplier. We set:

<math>\mbox{For all } i = 1,\cdots,n</math>

<math>\frac{\partial}{\partial p_i}\left(-\sum_{k=1}^n p_k \log_2 p_k + \lambda\sum_{k=1}^n p_k\right) = 0</math>

Carrying out the differentation of these n equations, we get

<math>-\left(\frac{1}{\ln 2}+\log_2 p_i \right) + \lambda = 0</math>

This shows that all p_i are equal (because they depend on λ only). By using the constraint ∑_k p_k = 1 again, we get the uniform distribution:

For another example, see also Derivation of the partition function.

All Wikipedia text is available under the terms of the GNU Free Documentation License

Search Encyclopedia

Search over one million articles, find something about almost anything!