# Machine Learning Notes III

*This note is still open

Machine Learning Notes I

Machine Learning Notes II

## The Primal Question of Optimization

For a general optimization problem, it usually could be rewritten as maximizing or minimizing a certain function with several constrictions. For example, maybe you want the optimized value non-negative. The most basic form of such is called the primal question which looks like this:

\begin{matrix} \underset{x}{min}f(x), x \in \mathbb{R}^n \\ s.t. \\ g_i(x)\leq 0, i=1,2,…,m \\ h_i(x)= 0, i=1,2,…,p \end{matrix}

And we call f(x) the objective function and the rest two the constriction function.

## The Lagrangian Function

To solve such primal question with constrictions, we can use the Lagrange multiplier to encode the three functions into one:

L(x,u,v)=f(x)+\sum_{i=1}^{m}u_ig_i(x)+\sum_{i=1}^{m}v_ih_i(x)

As such, we call u_i,v_i the lagrange multipliers.

And we make u_i \geq 0, v_i \in \mathbb{R} so that:

Because g_i(x) \leq 0, so that the maximum of \sum_{i=1}^{m}u_ig_i(x) is 0. And since h_i(x)=0, so \sum_{i=1}^{m}v_ih_i(x) is also equal to 0. Thus:

\underset{u,v}{max}L(x,u,v)=f(x)+0+0=f(x)

In this way, we find:

\underset{x}{min}f(x)=\underset{x}{min}\underset{u,v}{max}L(x,u,v)

## The Lagrangian Dual Function

But we find the expression above is hard to solve, so we need to transfer it to a dual function as such:

Define D the feasible domain:

g(u,v)=\underset{x\in D}{inf}L(x,u,v)=\underset{x\in D}{inf}[f(x)+\sum_{i=1}^{m}u_ig_i(x)+\sum_{i=1}^{m}v_ih_i(x)]

while the function does not have a lower bound, we define the value as -\infty, so that the dual is a concave function, and since we are trying to find the maximum, it could be treated as a convex optimization problem.

Because:

\underset{x}{min}f(x)=\underset{x}{min}\underset{u,v}{max}L(x,u,v)\geq \underset{u,v}{max}\underset{x}{min}L(x,u,v)= \underset{u,v}{max}g(x)

### Proof of the Minimax Theorem

euclid.kmj_.1138038812

## Dual gap

we use p^* to represent the optimized solution of the primal problem:

p^*=\underset{x}{min}f(x)

And d^* to represent the optimized solution of the lagrangian dual:

d^*=\underset{u,v}{max}g(u,v)

And because the minimax theorem:

p^* \geq d^*

We are using d^* to approach p^* and calling p^*-d^* the dual gap

When the dual gap is zero, we call the situation as strong dual, otherwise the weak dual.

I am not going to write about KKT&Slater for now…*

$latex \int_{-\infty}^\infty e^{-x^2}\,dx=\sqrt\pi$