Machine Learning Notes I

Machine Learning Notes II

The Primal Question of Optimization

For a general optimization problem, it usually could be rewritten as maximizing or minimizing a certain function with several constrictions. For example, maybe you want the optimized value non-negative. The most basic form of such is called the primal question which looks like this:

$$ \begin{matrix} \underset{x}{min}f(x), x \in \mathbb{R}^n \newline s.t. \newline g_i(x)\leq 0, i=1,2,…,m \newline h_i(x)= 0, i=1,2,…,p \end{matrix} $$

And we call $f(x)$ the objective function and the rest two the constriction function.

The Lagrangian Function

To solve such primal question with constrictions, we can use the Lagrange multiplier to encode the three functions into one:

$$ L(x,u,v)=f(x)+\sum_{i=1}^{m}u_ig_i(x)+\sum_{i=1}^{m}v_ih_i(x) $$

As such, we call $u_i,v_i$ the lagrange multipliers.

And we make $u_i \geq 0, v_i \in \mathbb{R}$ so that:

Because $g_i(x) \leq 0$, so that the maximum of $\sum_{i=1}^{m}u_ig_i(x)$ is 0. And since $h_i(x)=0$, so $\sum_{i=1}^{m}v_ih_i(x)$ is also equal to 0. Thus:

$$\underset{u,v}{max}L(x,u,v)=f(x)+0+0=f(x)$$

In this way, we find:

$$\underset{x}{min}f(x)=\underset{x}{min}\underset{u,v}{max}L(x,u,v)$$

The Lagrangian Dual Function

But we find the expression above is hard to solve, so we need to transfer it to a dual function as such:

Define $D$ the feasible domain:

$$g(u,v)=\underset{x\in D}{inf}L(x,u,v)=\underset{x\in D}{inf}[f(x)+\sum_{i=1}^{m}u_ig_i(x)+\sum_{i=1}^{m}v_ih_i(x)]$$

while the function does not have a lower bound, we define the value as $-\infty$, so that the dual is a concave function, and since we are trying to find the maximum, it could be treated as a convex optimization problem.

Because:

$$\underset{x}{min}f(x)=\underset{x}{min}\underset{u,v}{max}L(x,u,v)\geq \underset{u,v}{max}\underset{x}{min}L(x,u,v)= \underset{u,v}{max}g(x)$$

Proof of the Minimax Theorem

    / [pdf]

Dual gap

we use $p^*$ to represent the optimized solution of the primal problem:

$$p^*=\underset{x}{min}f(x)$$

And $d^*$ to represent the optimized solution of the lagrangian dual:

$$d^*=\underset{u,v}{max}g(u,v)$$

And because the minimax theorem:

$$p^* \geq d^*$$

We are using $ d^* $ to approach $ p^* $ and calling $ p^* - d^* $ the dual gap

When the dual gap is zero, we call the situation as strong dual, otherwise the weak dual.

I am not going to write about KKT&Slater for now…*