Deemo(Yizhou) Chen's Observatory

Machine Learning Notes III


*This note is still open

Machine Learning Notes I

Machine Learning Notes II

The Primal Question of Optimization

For a general optimization problem, it usually could be rewritten as maximizing or minimizing a certain function with several constrictions. For example, maybe you want the optimized value non-negative. The most basic form of such is called the primal question which looks like this:

\begin{matrix} \underset{x}{min}f(x), x \in \mathbb{R}^n \\ s.t. \\ g_i(x)\leq 0, i=1,2,…,m \\ h_i(x)= 0, i=1,2,…,p \end{matrix}

And we call f(x) the objective function and the rest two the constriction function.

The Lagrangian Function

To solve such primal question with constrictions, we can use the Lagrange multiplier to encode the three functions into one:


As such, we call u_i,v_i the lagrange multipliers.

And we make u_i \geq 0, v_i \in \mathbb{R} so that:

Because g_i(x) \leq 0, so that the maximum of \sum_{i=1}^{m}u_ig_i(x) is 0. And since h_i(x)=0, so \sum_{i=1}^{m}v_ih_i(x) is also equal to 0. Thus:


In this way, we find:


The Lagrangian Dual Function

But we find the expression above is hard to solve, so we need to transfer it to a dual function as such:

Define D the feasible domain:

g(u,v)=\underset{x\in D}{inf}L(x,u,v)=\underset{x\in D}{inf}[f(x)+\sum_{i=1}^{m}u_ig_i(x)+\sum_{i=1}^{m}v_ih_i(x)]
the lagrangian dual function(g(x))

while the function does not have a lower bound, we define the value as -\infty, so that the dual is a concave function, and since we are trying to find the maximum, it could be treated as a convex optimization problem.


\underset{x}{min}f(x)=\underset{x}{min}\underset{u,v}{max}L(x,u,v)\geq \underset{u,v}{max}\underset{x}{min}L(x,u,v)= \underset{u,v}{max}g(x)

Proof of the Minimax Theorem


Dual gap

we use p^* to represent the optimized solution of the primal problem:


And d^* to represent the optimized solution of the lagrangian dual:


And because the minimax theorem:

p^* \geq d^*

We are using d^* to approach p^* and calling p^*-d^* the dual gap

When the dual gap is zero, we call the situation as strong dual, otherwise the weak dual.

I am not going to write about KKT&Slater for now…*

About the author

1 comment


By Deemo
Deemo(Yizhou) Chen's Observatory