 ## Machine Learning Notes III

*This note is still open.*

Machine Learning Notes I

Machine Learning Notes II

## The Primal Problem of Optimization

A general optimization problem can usually be written as minimizing or maximizing a certain function subject to several constraints. For example, you may want the optimal value to be non-negative. The most basic form is called the primal problem, and it looks like this:

\begin{matrix} \underset{x}{\min}\ f(x),\ x \in \mathbb{R}^n \\ \text{s.t.} \\ g_i(x)\leq 0,\ i=1,2,\dots,m \\ h_i(x)= 0,\ i=1,2,\dots,p \end{matrix}

We call f(x) the objective function, and g_i(x) and h_i(x) the constraint functions.

## The Lagrangian Function

To solve such a primal problem with constraints, we can use Lagrange multipliers to combine the three functions into one:

L(x,u,v)=f(x)+\sum_{i=1}^{m}u_ig_i(x)+\sum_{i=1}^{p}v_ih_i(x)

Here, u_i and v_i are called the Lagrange multipliers.

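We require u_i \geq 0 and v_i \in \mathbb{R}. Now take a feasible x:

Because g_i(x) \leq 0 and u_i \geq 0, every term u_ig_i(x) is at most 0, so the maximum of \sum_{i=1}^{m}u_ig_i(x) over u is 0 (attained at u=0). And since h_i(x)=0, \sum_{i=1}^{p}v_ih_i(x) is also equal to 0. Thus:

\underset{u,v}{\max}L(x,u,v)=f(x)+0+0=f(x)

If x is infeasible, some g_i(x)>0 or h_i(x)\neq 0, and letting the matching multiplier grow without bound makes L(x,u,v)\to+\infty. So infeasible points never attain the outer minimum below, and \min_x f(x) is effectively taken over feasible x.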

In this way, we find:

\underset{x}{min}f(x)=\underset{x}{min}\underset{u,v}{max}L(x,u,v)
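To see the inner maximum at work, here is a minimal MATLAB sketch on a hypothetical one-variable problem, minimize x^2 subject to g_1(x) = 1 - x \leq 0, chosen only for illustration. The names `g1`, `x_feasible` and `x_infeasible` are made up for this example, and the finite grid for u stands in for u \geq 0, so the infeasible case shows a large number rather than a literal +\infty.

```matlab
% Hypothetical toy problem: minimize f(x) = x^2 subject to g1(x) = 1 - x <= 0
f  = @(x) x.^2;
g1 = @(x) 1 - x;
L  = @(x, u) f(x) + u .* g1(x);           % Lagrangian with a single multiplier u

u = linspace(0, 1000, 10001);             % finite grid standing in for u >= 0

x_feasible   = 2;                         % g1(2) = -1 <= 0
x_infeasible = 0.5;                       % g1(0.5) = 0.5 > 0

max_feasible   = max(L(x_feasible, u));   % attained at u = 0, equals f(2) = 4
max_infeasible = max(L(x_infeasible, u)); % keeps growing as the u grid is extended

fprintf('feasible:   max over u of L = %g, f(x) = %g\n', max_feasible, f(x_feasible));
fprintf('infeasible: max over u of L = %g (unbounded as u grows)\n', max_infeasible);
```

Extending the upper end of the u grid leaves the feasible result at 4 but pushes the infeasible one higher, which is exactly why infeasible points drop out of the outer minimum.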

## The Lagrangian Dual Function

But the expression above is hard to solve directly, so we turn it into a dual function:

Let D be the domain of the problem (the set of points where f, g_i and h_i are all defined). The Lagrangian dual function is:

g(u,v)=\underset{x\in D}{\inf}L(x,u,v)=\underset{x\in D}{\inf}\left[f(x)+\sum_{i=1}^{m}u_ig_i(x)+\sum_{i=1}^{p}v_ih_i(x)\right]

When L(x,u,v) has no lower bound in x, we define g(u,v) as -\infty. Because g is a pointwise infimum of functions that are affine in (u,v), it is always concave, even when the primal problem is not convex; so maximizing g is a convex optimization problem.

The dual also gives a lower bound on the primal, because:

\underset{x}{\min}f(x)=\underset{x}{\min}\underset{u,v}{\max}L(x,u,v)\geq \underset{u,v}{\max}\underset{x}{\min}L(x,u,v)= \underset{u,v}{\max}g(u,v)

### Proof of the Minimax Theorem

euclid.kmj_.1138038812

## Duality Gap

We use p^* to denote the optimal value of the primal problem:

p^*=\underset{x}{min}f(x)

And d^* to denote the optimal value of the Lagrangian dual problem:

d^*=\underset{u,v}{max}g(u,v)

And because of the inequality above (the max-min inequality, also called weak duality):

p^* \geq d^*

We use d^* to approach p^* from below, and call p^*-d^* the duality gap.

Weak duality, p^* \geq d^*, always holds. When the duality gap is zero, we say that strong duality holds.
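A minimal MATLAB sketch of p^*, d^* and the gap on a hypothetical toy problem (minimize x^2 subject to 1 - x \leq 0). Approximating the infimum over x and the maximum over u on finite grids is an assumption of this sketch, not a general method. For this problem the dual works out analytically to g(u) = u - u^2/4, maximized at u = 2, so both values should come out close to 1. The variable `dual_g` holds g(u); it is renamed only to avoid clashing with the constraint functions g_i.

```matlab
% Hypothetical toy problem: minimize x^2 subject to 1 - x <= 0
L = @(x, u) x.^2 + u .* (1 - x);          % Lagrangian

x = linspace(-5, 5, 10001);               % grid approximating the domain D
u = linspace(0, 10, 1001);                % grid approximating u >= 0

% Dual function g(u) = inf over x of L(x, u), approximated by a grid minimum
dual_g = zeros(size(u));
for k = 1:numel(u)
    dual_g(k) = min(L(x, u(k)));
end

d_star = max(dual_g);                     % best lower bound the dual provides
p_star = min(x(x >= 1).^2);               % primal optimum over the feasible part of the grid
gap = p_star - d_star;                    % duality gap, about 0 here up to grid error

fprintf('p* = %.4f, d* = %.4f, gap = %.2e\n', p_star, d_star, gap);
```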

*I am not going to write about the KKT conditions and Slater’s condition for now…*


## Machine Learning Notes II

Link to Machine Learning Notes I

### The least squares estimates of α and β

For simple linear regression:

E\left ( Y|X=x \right ) = \alpha +\beta x

we have:

\hat{\beta } = \frac{cov\left ( X, Y \right )}{var\left ( X \right )}

\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}
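A minimal MATLAB sketch of the two formulas above, on hypothetical data (the line y = 2 + 3x plus a small deterministic wiggle). The sample covariance and variance are computed by hand so the (n-1) factors visibly cancel in the ratio; the result is cross-checked against MATLAB's `polyfit`, which for degree 1 returns [slope, intercept].

```matlab
x = (1:20)';
y = 2 + 3*x + 0.5*sin(x);                 % hypothetical data: a line plus a small wiggle
n = numel(x);

cov_xy = sum((x - mean(x)) .* (y - mean(y))) / (n - 1);  % sample covariance of X and Y
var_x  = sum((x - mean(x)).^2) / (n - 1);                 % sample variance of X

beta_hat  = cov_xy / var_x;               % slope estimate
alpha_hat = mean(y) - beta_hat * mean(x); % intercept estimate

p = polyfit(x, y, 1);                     % built-in least-squares line: p(1) slope, p(2) intercept
fprintf('beta = %.4f (polyfit %.4f), alpha = %.4f (polyfit %.4f)\n', ...
        beta_hat, p(1), alpha_hat, p(2));
```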

### Linear Regression way

We could use a neural network (NN) to solve the regression problem, but then it is nearly impossible to tell which layer captures which feature of the data. So a better way may be to expand the dimension of linear regression: instead of using only x, we use x, x^{2}, x^{1/2}, ... to approximate the true curve. The model is still linear in its parameters, so ordinary least squares still applies.
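A minimal MATLAB sketch of that idea: build a design matrix whose columns are the hypothetical basis functions 1, x, x^2 and x^{1/2}, then solve an ordinary least-squares problem with the backslash operator. The target curve is made up so that it lies exactly in the span of these basis functions, which is why the recovered coefficients match it.

```matlab
% Hypothetical target curve sampled at 30 points in (0, 4]
x = linspace(0.1, 4, 30)';
y = 1 + 0.5*x - 0.3*x.^2 + 2*sqrt(x);

% Expanded features: the model is nonlinear in x but linear in theta
X = [ones(size(x)), x, x.^2, sqrt(x)];

theta = X \ y;                            % least-squares fit of the expanded linear model
y_fit = X * theta;
fprintf('theta = [%.3f %.3f %.3f %.3f], max abs error = %.2e\n', theta, max(abs(y_fit - y)));
```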

## Classification method

### Parametric methods:

Direct way: E\left ( Y|X=x \right ) = \text{sigmoid}(\beta ^{T}x)

Bayes way *TODO: needs elaboration*

### Nonparametric methods:

KNN *TODO: needs elaboration*

Elastic Net *TODO: needs elaboration*

PCA *TODO: needs elaboration*

## Convex Analysis & Optimization

For a convex function, it is very easy to find the (global) minimum, but for a non-convex function the situation is more complex. To analyze such a situation, we need to discuss it under different assumptions.

### Lipschitz continuous

Definition: for all x_{0}, x_{1},

\left | f(x_{0})-f(x_{1}) \right |\leq L\left | x_{0}-x_{1} \right |

For clarity, we will work in 2-dimensional space (a function of one variable).

If the function is only Lipschitz continuous, then even though its variation is bounded, it may not be smooth (differentiable), so we cannot apply gradient descent.

Let’s take the domain [0,1] and divide it into k cuts, so the cutting points are 0, \frac{1}{k}, \frac{2}{k},...,\frac{k}{k}, and suppose the target (the minimizer) \chi lies in [\frac{i}{k}, \frac{i+1}{k}].

Now, how far is \chi from the grid point \frac{i}{k}? Each cut has length \frac{1}{k}, so |\chi -\frac{i}{k}| \leq \frac{1}{k}, and since f is Lipschitz continuous, we have |f(\chi) -f(\frac{i}{k})|\leq L|\chi -\frac{i}{k}| \leq \frac{L}{k}

We also have a concept called the tolerance, \epsilon, which is the precision we want for the result, for example 10^{-6}.
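Combining the two ideas: if we choose k \geq L/\epsilon, the bound above guarantees that the best grid value is within \epsilon of the true minimum, without ever using a gradient. A minimal MATLAB sketch, assuming a hypothetical non-smooth function with a kink at its minimizer 1/\pi; `Lc` stands for the Lipschitz constant L (renamed only to keep it distinct from other uses of L in these notes).

```matlab
Lc = 2;                                   % Lipschitz constant of f below
f = @(x) Lc * abs(x - 1/pi);              % hypothetical non-smooth f, minimum 0 at x = 1/pi
epsilon = 1e-3;                           % tolerance

k = ceil(Lc / epsilon);                   % number of cuts so that L/k <= epsilon
grid_pts = (0:k) / k;                     % 0, 1/k, 2/k, ..., 1
[f_best, idx] = min(f(grid_pts));         % best value over the grid points

fprintf('k = %d, best x = %.4f, f = %.2e (bound L/k = %.2e)\n', ...
        k, grid_pts(idx), f_best, Lc / k);
```

The price of not needing smoothness is that the number of function evaluations grows like L/\epsilon.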

*TODO: a question remains here.*

### L-Smooth

Definition:

\left \| \nabla f(x_{0})-\nabla f(x_{1}) \right \|\leq L\left \| x_{0}-x_{1} \right \|

which implies:

f(y)\leq f(x)+\nabla f(x)^{T}(y-x)+\frac{L}{2}\left \| y-x \right \|_{2}^{2}

So this assumption gives a quadratic upper bound on f(y) in terms of f(x) and the gradient at x.
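A minimal MATLAB sketch that checks the quadratic upper bound numerically, assuming a hypothetical quadratic f(x) = \frac{1}{2}x^{T}Ax with a symmetric positive definite A. For such an f the gradient is Ax, which is Lipschitz with L equal to the largest eigenvalue of A. The test points are arbitrary.

```matlab
A = [3 1; 1 2];                           % hypothetical symmetric positive definite matrix
f    = @(x) 0.5 * x' * A * x;
grad = @(x) A * x;
L = max(eig(A));                          % the gradient A*x is L-Lipschitz with this L

% Check f(y) <= f(x) + grad(x)'*(y - x) + L/2*||y - x||^2 on every pair of points
pts = [1 0; 0 1; -2 3; 0.5 -1.5]';        % columns are the test points
ok = true;
for i = 1:size(pts, 2)
    for j = 1:size(pts, 2)
        x = pts(:, i);
        y = pts(:, j);
        bound = f(x) + grad(x)' * (y - x) + L/2 * norm(y - x)^2;
        ok = ok && (f(y) <= bound + 1e-12);   % small slack for rounding
    end
end
fprintf('quadratic upper bound holds on all pairs: %d\n', ok);
```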

### Convexity

Set Convexity

Function Convexity

“A function is convex if and only if its epigraph, the region (in green) above its graph (in blue), is a convex set.”

So for a convex function, every local minimum is also a global minimum.

### The Convergence rate of Gradient Descent

Suppose the function f:\mathbb{R}^{n}\rightarrow \mathbb{R} is convex and differentiable, and that its gradient is Lipschitz continuous with constant L > 0, i.e. we have that \left \| \nabla f(x)-\nabla f(y) \right \|\leq L\left \| x-y \right \| for any x, y. Then if we run gradient descent for k iterations with a fixed step size t \leq \frac{1}{L}, it will yield a solution x^{(k)} which satisfies:

f(x^{(k)})-f(x^{*})\leq \frac{\left \| x^{(0)}-x^{*} \right \|_{2}^{2}}{2tk}

where x^{*} is a minimizer of f. This guarantees that gradient descent converges, with the error shrinking at a rate of O(1/k).
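A minimal MATLAB sketch that runs gradient descent with t = 1/L and checks the bound above after every iteration. It assumes a hypothetical convex quadratic f(x) = \frac{1}{2}x^{T}Ax - b^{T}x, whose minimizer A^{-1}b is known exactly, so f(x^{(k)}) - f(x^{*}) can be measured directly; A, b, x0 and the 50 iterations are illustrative choices.

```matlab
A = [3 1; 1 2];  b = [1; -1];             % hypothetical convex quadratic f(x) = 0.5*x'*A*x - b'*x
f    = @(x) 0.5 * x' * A * x - b' * x;
grad = @(x) A * x - b;

L = max(eig(A));                          % Lipschitz constant of the gradient
t = 1 / L;                                % fixed step size t <= 1/L
x_star = A \ b;                           % exact minimizer, used only to measure the error
x0 = [5; 5];

x = x0;
bound_ok = true;
for k = 1:50
    x = x - t * grad(x);                  % one gradient descent step
    bound = norm(x0 - x_star)^2 / (2 * t * k);
    bound_ok = bound_ok && (f(x) - f(x_star) <= bound);
end
fprintf('final f(x) - f(x*) = %.3e, bound held at every step: %d\n', ...
        f(x) - f(x_star), bound_ok);
```

On this strongly convex example the actual error falls much faster than the O(1/k) bound; the bound is a worst case over all L-smooth convex functions.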

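Epochs (iterations) to run: requiring the bound above to be at most \epsilon gives

k \geq \frac{\left \| x^{(0)}-x^{*} \right \|_{2}^{2}}{2t\epsilon}, \quad \text{i.e. } O\left(\frac{1}{\epsilon}\right)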

TODO: USE LOG?


## Bayes’ Rule

When I was in high school learning AP Statistics, I learned the formula:

P(A|B)=\frac{P(A\cap B)}{P(B)}, P(B|A)=\frac{P(A\cap B)}{P(A)}

which can be transformed into:

P(A\cap B)=P(A)\cdot P(B|A)=P(B)\cdot P(A|B)

P(A|B) is called the “conditional probability”, which is fairly self-explanatory. Back then I only knew the meaning of each element but not the whole idea; all I did was plug in numbers, because the formula is rather abstract to understand on its own: “The probability that event A happens given that event B happened equals the probability that both A and B happen divided by the probability that B happens.”

Before we talk about Bayes’ Rule, I want to discuss why conditional probability satisfies this relationship:

In the Venn diagram shown above, the outer region represents the whole sample space. When we want to know P(A|B), which by definition is the probability of A given that B is true, we naturally take the part of circle B where A is also true (A\cap B) and divide it by B (\frac{P(A\cap B)}{P(B)}). So conditioning on B restricts the space to the blue circle B, and what we do is just find the part of it in which A is true.

Those who dig deeper might object: “That can’t be right, A and B are events, not probabilities, so the formula above does not match!” Well, the fact is that we omitted the sample space, which we call S. What we call “probability” is actually, using A as an example, P(A)=\frac{|A|}{|S|}, the size of A relative to the whole sample space (for equally likely outcomes). The S cancels out, so I omitted it and directly used the A and B regions in the diagram.
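The cancellation of S can be checked by plain counting. A minimal MATLAB sketch, assuming a hypothetical fair die with event A = "the roll is even" and event B = "the roll is greater than 3": the conditional probability formula and the ratio of counts |A\cap B|/|B| give the same answer, 2/3.

```matlab
S = 1:6;                                  % sample space of one fair die roll
A = mod(S, 2) == 0;                       % event A: the roll is even
B = S > 3;                                % event B: the roll is greater than 3

P = @(event) sum(event) / numel(S);       % probability = |event| / |S| for equally likely outcomes

p_A_given_B = P(A & B) / P(B);            % conditional probability formula
p_count     = sum(A & B) / sum(B);        % same ratio with |S| cancelled out

fprintf('P(A|B) = %.4f, by counting = %.4f\n', p_A_given_B, p_count);
```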

P(A|B)=\frac{ P(B|A) }{P(B)}\cdot P(A)

So, what does Bayes’ Rule mean?

Based on the formula, it gives the probability of event A happening under the restriction of event B. The formula itself can easily be derived from the conditional probability formula (divide both sides of P(B)\cdot P(A|B)=P(A)\cdot P(B|A) by P(B)), so here we focus on how to interpret it.

The basic idea of Bayes’ Rule is to adjust the general probability, here P(A), by the factor \frac{P(B|A)}{P(B)} to get a better idea of A under a restriction, or a new piece of evidence, B; the adjusted value is P(A|B).

A related equation is the law of total probability, where A_1, …, A_n partition the sample space:

P(B)=\sum_{i=1}^{n}P(B|A_{i})\cdot P(A_{i})
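A minimal MATLAB sketch that combines the law of total probability with Bayes’ Rule. The screening-test numbers are hypothetical, chosen only to illustrate the arithmetic: A_1 = "has the condition", A_2 = "does not", and B = "the test is positive".

```matlab
prior      = [0.01, 0.99];                % P(A_1), P(A_2): a partition of the sample space
likelihood = [0.95, 0.05];                % P(B | A_1), P(B | A_2)

p_B = sum(likelihood .* prior);           % law of total probability
posterior = likelihood .* prior / p_B;    % Bayes' Rule applied to each A_i

fprintf('P(B) = %.4f, P(A_1 | B) = %.4f\n', p_B, posterior(1));
```

Here the adjustment factor \frac{P(B|A_1)}{P(B)} is 0.95/0.059, about 16, so the prior of 0.01 is lifted only to about 0.16: new evidence adjusts the general probability rather than replacing it.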

My dear teachers and parents, my lovely fellow class of 2020, it’s my great honor to speak here as the student representative, and thank you for joining today’s graduation ceremony.

You know, this might be the last time you see most of the people around you, your classmates, your teachers. I mean, you may not even keep in touch with them anymore. Maybe in the near future you will call them, you will WeChat them, but as long as your life tracks do not cross theirs, you will eventually drift apart.

20,000 days: this is roughly the number of days you will live in this world. Have you figured out who you are yet? And what does “you” even mean? Are you still you if you lose your arms, or your legs? If you suffer third-degree burns and lose your original skin and looks, are you still you? And actually, the cells in your body are fully replaced over about seven years; are you just the physical body? Well, yes, but no. You see, if we begin to remove your body parts, when do you stop being you? If I replaced your brain with someone else’s, would this body still be considered you? I don’t think so. OK, the brain: if we say the brain is you, then what is it? Can two random babies raised with exactly the same experiences have the same consciousness? Can two babies with exactly the same brain structure, raised with different experiences, have the same consciousness? Obviously not. So how about two babies with exactly the same brain structure raised with exactly the same experiences? Yes, they definitely would. You are your current consciousness, which is the combination of your brain structure and your memory.

According to the second law of thermodynamics, the entropy of an isolated system never decreases. This implies that one day our universe will be completely homogeneous, the same everywhere. It also implies that no one could reverse the process, because the information has already been lost. Our entire existence will come to nothing and cannot even be traced. What do we live for? What do we want? What is the goal? What is the fuel that drives your existence?

Our life, or our consciousness, is about making choices. It is a very general idea, covering everything from which eye you are going to blink in the next second to the college experience you choose to have. There is a standard that helps us make decisions: a feeling that we expect to get, which we can call happiness. Happiness here is not about emotional feelings like “fun”, “chill”, or “relaxed” that are typically positive. It is a general thing we expect to get after choosing to do something. We want happiness; this is the only goal, and the only fuel that keeps us burning. Take an example: if you choose to save another person by sacrificing yourself, you choose to do so simply because you expect to get more happiness from it, and that’s it.

Happiness is the only reason. At any moment we are choosing the option we expect will bring us more happiness, so you cannot blame your past self: you were always making what you thought was the best decision.

Our universe might have started from a simple set of parameters and could theoretically be calculated. Does that mean we do not have free will? Is our destiny a sure thing? Yes, but no. It is true that our universe could be calculated, but only the universe itself has enough computing power to do so, so when we talk about prediction, it is actually real-time calculation.

Shamefully, even though every time we make the choice we expect will bring us more happiness, it still does not come. The problem is that many people do not have a clear and clean standard for choosing. Do you understand who you are? What do you want? The so-called feelings of guilt, confusion, and regret all arise because there is no long-term goal. But if you have one, then it is OK to offend people, or to be offended. You will not regret short-term failures, and you will not be confused right after making choices. Once you have the standard, every choice you make is a step in the trajectory regressing toward that target. Maybe the instant happiness does not come, but every adjustment is pulling you out of the comfort zone that is dragging you down.

Above are my answers to: “Who am I?”, “Where am I?”, and “Where am I going?”

So, my schoolmates, my speech is over, and your new life is beginning. Please answer these questions; I hope you will all make the right choices.


## Machine Learning Notes I

Lately I have been studying machine learning, and outputting (taking notes) is a vital step of it. Here I am using Andrew Ng’s Stanford Machine Learning course on Coursera, with MATLAB as the language.

So, by default, the rest of the code in this post is written in MATLAB.

## What is ML?

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Tom Mitchell

Supervised Learning & Unsupervised Learning

SL: with labels; direct feedback; predict

Under SL, there are regression and classification

USL: without labels; no feedback; finding the hidden structure

Under USL, there are clustering and non-clustering

For now, I will focus on these two rather than on reinforcement learning.

## The Basic Model & Notation

We use x^{(i)} to denote the “input” value, where the superscript i is the index of the example in the data set (stored as a matrix, or a vector most of the time). And y^{(i)} is the actual “output” for the input x^{(i)}. A pair (x^{(i)}, y^{(i)}) is called a training sample, and a list of such samples with i=1,...,m is called a training set. The purpose of ML is to find a “good” hypothesis function h(x) that can predict the output knowing only the input x. If we only want a simple linear form of h(x), it looks like h(x)=\theta_0 + \theta_1x, where \theta_0 and \theta_1 are the parameters we want to find so that h(x) predicts “better”.

### Linear Algebra Review

Matrix-Vector Multiplication:\begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix} *\begin{bmatrix} x\\y \end{bmatrix} =\begin{bmatrix} a*x + b*y \\ c*x + d*y \\ e*x + f*y \end{bmatrix}

Matrix-Matrix Multiplication: \begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix} * \begin{bmatrix} w & x \\ y & z \\ \end{bmatrix}=\begin{bmatrix} a*w + b*y & a*x + b*z \\ c*w + d*y & c*x + d*z \\e*w + f*y & e*x + f*z \end{bmatrix}
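The same two products with concrete numbers, as a minimal MATLAB sketch; the entries are arbitrary stand-ins for the letters above.

```matlab
M = [1 2; 3 4; 5 6];                      % 3x2 matrix in place of [a b; c d; e f]
v = [7; 8];                               % 2x1 vector in place of [x; y]
N = [1 0; 2 1];                           % 2x2 matrix in place of [w x; y z]

Mv = M * v;                               % 3x1: [1*7+2*8; 3*7+4*8; 5*7+6*8]
MN = M * N;                               % 3x2: each entry is a row of M times a column of N

disp(Mv); disp(MN);
```

The inner dimensions must agree (the 2 columns of M against the 2 rows of v or N); otherwise MATLAB raises a dimension error.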

The identity matrix has 1s on the diagonal and zeros everywhere else, for example:  \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}

eye(3)

#### Multiplication Properties

Matrix multiplication is not commutative: in general A*B \neq B*A

Matrix multiplication is associative: (A*B)*C = A*(B*C)
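A minimal MATLAB sketch checking both properties on small hypothetical matrices. Integer entries keep the associativity check exact; with general floating-point data, compare with a tolerance instead of `isequal`.

```matlab
A = [1 2; 3 4];  B = [0 1; 1 0];  C = [2 0; 0 3];   % small hypothetical matrices

commutes    = isequal(A*B, B*A);          % false here: A*B = [2 1; 4 3], B*A = [3 4; 1 2]
associative = isequal((A*B)*C, A*(B*C));  % true (exact with integer entries)

fprintf('A*B == B*A: %d, (A*B)*C == A*(B*C): %d\n', commutes, associative);
```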

#### Inverse and Transpose

Inverse: a square, non-singular matrix A multiplied by its inverse A^{-1} (inv(A) in MATLAB) gives the identity matrix I:

I = A*inv(A)

Transposition flips the matrix over its main diagonal (it is not a rotation): for a matrix A with dimension m * n, its transpose has dimension n * m:

A = \begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix}, A^T = \begin{bmatrix} a & c & e \\ b & d & f \\ \end{bmatrix}

Also, entry by entry:

(A^{T})_{ji}=A_{ij}
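A minimal MATLAB sketch of the inverse and transpose facts above, using arbitrary example matrices. `A*inv(A)` reproduces the identity only up to rounding error, so it is compared with a tolerance; in MATLAB, `'` is the conjugate transpose and `.'` the plain transpose, which coincide for real matrices.

```matlab
A = [4 7; 2 6];                           % square and invertible (det = 4*6 - 7*2 = 10)
I2 = A * inv(A);                          % equals eye(2) up to rounding error
err = max(max(abs(I2 - eye(2))));

M = [1 2; 3 4; 5 6];                      % 3x2 matrix
Mt = M';                                  % 2x3 transpose (for complex data, M.' avoids conjugation)
same = (Mt(1, 3) == M(3, 1));             % (M^T)_{13} = M_{31}

fprintf('max |A*inv(A) - I| = %.1e, size of M'' = %dx%d\n', err, size(Mt, 1), size(Mt, 2));
```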

### Cost Function

A cost function measures how accurately our hypothesis function predicts, by outputting the error (the deviation between the actual output y and h(x)). It looks like this:

J(\theta_0, \theta_1) = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left (h_\theta (x^{(i)}) - y^{(i)} \right)^2

For those familiar with statistics, this is the “squared error function”: squaring makes each error non-negative, and the \frac{1}{2} simplifies the expression later when we take the derivative during gradient descent. Now the question becomes: “How do we find the \theta_0 and \theta_1 that minimize J(\theta_0, \theta_1)?”
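A minimal MATLAB sketch of J(\theta_0, \theta_1) on a hypothetical three-point training set that lies exactly on y = x, so the cost is zero for \theta = (0, 1) and positive elsewhere.

```matlab
x = [1; 2; 3];  y = [1; 2; 3];            % hypothetical training set, m = 3
m = numel(y);

h = @(theta0, theta1) theta0 + theta1 * x;                  % hypothesis on all inputs at once
J = @(theta0, theta1) sum((h(theta0, theta1) - y).^2) / (2*m);

fprintf('J(0, 1) = %g, J(0, 0.5) = %g\n', J(0, 1), J(0, 0.5));
```

For \theta = (0, 0.5) the errors are -0.5, -1 and -1.5, so J = (0.25 + 1 + 2.25)/6 = 3.5/6.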

#### Contour Plot

A contour plot is a way to show a 3D graph in 2D, in which blue represents low values and red represents high values. So the (\theta_0, \theta_1) at the lowest point of J(\theta_0, \theta_1), in the innermost blue region, is the set of parameters that gives h(x) the lowest error relative to the actual output y.

Gradient descent is one of the most basic ML tools. The basic idea is to “take small steps that lead to a minimum of the cost function J(\theta)”. It looks like this:

Repeat until convergence:
{
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
}

Here, the operator := means assigning the right-hand side to the left-hand side, which corresponds to = in many programming languages. The \theta_j on the left is the “next step” and the one on the right is the “current position”. \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) gives the direction in which a move increases J(\theta) the fastest, so adding a negative sign turns it into the direction of fastest decrease. \alpha (the learning rate) sets the length of each step. It is important to update all \theta_j simultaneously.

If we expand the derivative for linear regression, we get:

repeat until convergence:
{
\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})
\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}((h_\theta(x^{(i)}) - y^{(i)}) x^{(i)})
}

The factor x^{(i)} comes from the derivative (the chain rule); there is no x^{(i)} in the \theta_0 update because we define x^{(i)}_0 = 1.
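A minimal MATLAB sketch of the two update rules above (batch gradient descent for one-feature linear regression). The data lie exactly on the hypothetical line y = 1 + 2x, and \alpha = 0.05 with 5000 iterations are hand-picked for this data. Both new values are computed from the current parameters before either is assigned, which is what the simultaneous update means.

```matlab
x = [1; 2; 3; 4];  y = [3; 5; 7; 9];      % hypothetical data lying on y = 1 + 2x
m = numel(y);
alpha = 0.05;                             % learning rate (hand-picked for this data)
theta0 = 0;  theta1 = 0;

for iter = 1:5000
    err = (theta0 + theta1 * x) - y;      % h_theta(x^(i)) - y^(i) for all i
    % compute both updates from the current values, then assign (simultaneous update)
    temp0 = theta0 - alpha * (1/m) * sum(err);
    temp1 = theta1 - alpha * (1/m) * sum(err .* x);
    theta0 = temp0;
    theta1 = temp1;
end
fprintf('theta0 = %.4f, theta1 = %.4f\n', theta0, theta1);
```

Updating `theta0` first and then using the new value inside the `theta1` update would be a different (non-simultaneous) algorithm.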

Here is the full derivation of the partial derivative of the cost function J(\theta), for a single training example:

\begin{aligned}\frac{\partial }{\partial \theta_j}J(\theta) &= \frac{\partial }{\partial \theta_j}\frac{1}{2}(h_\theta(x)-y)^{2}\\&=2 \cdot \frac{1}{2}(h_\theta(x)-y) \cdot \frac{\partial }{\partial \theta_j}(h_\theta(x)-y)\\&= (h_\theta(x)-y)\frac{\partial }{\partial \theta_j}\left ( \sum\limits_{k=0}^n\theta_k x_k-y \right )\\&=(h_\theta(x)-y)x_j\end{aligned}

This basic method is called batch gradient descent, because it uses the whole training set at every step. For future reference, the J(\theta) of linear regression is convex, so it has no local minima other than the global one and cannot get stuck in a bad local minimum.
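A minimal MATLAB sketch that checks the derivative above numerically. It averages the per-example result over m hypothetical examples, giving \frac{1}{m}X^{T}(X\theta - y), and compares that with central finite differences of J(\theta) at an arbitrary point; the data, `theta`, and step `delta` are all illustrative.

```matlab
X = [ones(4, 1), [1; 2; 3; 4]];           % hypothetical design matrix with x_0 = 1
y = [2; 3; 5; 4];
m = numel(y);
theta = [0.5; -0.2];                      % arbitrary point at which to check

J = @(th) sum((X*th - y).^2) / (2*m);
grad_analytic = (1/m) * X' * (X*theta - y);

delta = 1e-5;                             % finite-difference step
grad_numeric = zeros(size(theta));
for j = 1:numel(theta)
    step = zeros(size(theta));
    step(j) = delta;
    grad_numeric(j) = (J(theta + step) - J(theta - step)) / (2*delta);
end
fprintf('max difference = %.2e\n', max(abs(grad_analytic - grad_numeric)));
```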

### Multivariate Linear Regression

Suppose we have not just one input variable but many. We use j in x_j, from 1 to n, as the feature index, just like we use i, from 1 to m, as the index of the training example.

x_{j}^{(i)} = value of feature j in the i^{th} training example

For convenience of notation, we define x_0 = 1, to match \theta_0 in the hypothesis function and to make the matrix multiplication work:

x = \begin{bmatrix} x_1\\x_2 \\\vdots\\x_n \end{bmatrix} \in\mathbb{R}^{n} , \theta = \begin{bmatrix}\theta_0\\\theta_1\\\theta_2\\\vdots\\\theta_n \end{bmatrix}\in\mathbb{R}^{n+1} \rightarrow x = \begin{bmatrix}x_0\\x_1\\x_2\\\vdots\\x_n\end{bmatrix}\in\mathbb{R}^{n+1}, \theta = \begin{bmatrix}\theta_0\\\theta_1\\\theta_2\\\vdots\\\theta_n \end{bmatrix}\in\mathbb{R}^{n+1}

The nice thing we can do now is use vectorization to write the long multivariable hypothesis function compactly:

h_\theta (x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + ... + \theta_n x_n = \begin{bmatrix}\theta_0&\theta_1&\cdots&\theta_n\end{bmatrix}\begin{bmatrix}x_0 \\ x_1 \\ \vdots \\ x_n\end{bmatrix}=\theta^T x
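A minimal MATLAB sketch showing that the long sum and the vectorized form \theta^{T}x give the same value once x_0 = 1 is prepended; `theta` and `x` are hypothetical numbers.

```matlab
theta = [1; 2; 3];                        % hypothetical [theta_0; theta_1; theta_2]
x = [4; 5];                               % one example with n = 2 features
x_aug = [1; x];                           % prepend x_0 = 1

h_loop = theta(1);                        % theta_0 * x_0 with x_0 = 1
for j = 1:numel(x)
    h_loop = h_loop + theta(j+1) * x(j);  % add theta_j * x_j
end
h_vec = theta' * x_aug;                   % the same value as a single inner product

fprintf('loop: %g, vectorized: %g\n', h_loop, h_vec);
```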

#### Feature Scaling

If the features in the input x have very different ranges, the process of finding \theta can oscillate, be slow, or even fail. Feature scaling, or mean normalization, is a technique that makes the ranges of the features more even, and the process is very familiar if you know statistics:

x_j:=\frac{x_j-\mu_j}{s_j}

That is, subtract the mean \mu_j of feature j from x_j, then divide by its standard deviation s_j (or by its range, max minus min, in some cases).
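A minimal MATLAB sketch of mean normalization applied column by column, on hypothetical features with very different ranges (a size in square feet and a bedroom count). It relies on implicit expansion (MATLAB R2016b or later, or Octave) to subtract the row of means from every row.

```matlab
% Hypothetical features: size (sq ft) and number of bedrooms
X = [2104 3; 1600 3; 2400 4; 1416 2; 3000 4];

mu = mean(X);                             % 1xn row of feature means
s  = std(X);                              % 1xn row of standard deviations (max-min for range scaling)
X_norm = (X - mu) ./ s;                   % implicit expansion applies mu and s to every row

disp(X_norm);
```

The same `mu` and `s` must be reused to scale any new input before making a prediction.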

### Normal Equation

Besides gradient descent, there is another way to minimize the cost function J(\theta). We first construct the matrix X (the design matrix), which is another way to represent the input data set:

x = \begin{bmatrix}x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \rightarrow X = \begin{bmatrix}x^{(1)}_0 & x^{(1)}_1 & \cdots&x^{(1)}_n \\ x^{(2)}_0&x^{(2)}_1 & \cdots & x^{(2)}_n \\ \vdots & \vdots & \ddots & \vdots \\x^{(m)}_0&x^{(m)}_1 & \cdots & x^{(m)}_n \end{bmatrix} = \begin{bmatrix} 1&x^{(1)}_1&\cdots&x^{(1)}_n \\ 1&x^{(2)}_1&\cdots&x^{(2)}_n \\ \vdots &\vdots &\ddots &\vdots \\ 1&x^{(m)}_1 &\cdots &x^{(m)}_n \end{bmatrix}

Each row of X is the transpose of one training example x^{(i)}, containing the values of all features for that example. The normal equation itself looks like:

\theta = (X^{T}X)^{-1}X^{T}y

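I am not going to derive it, but compared to gradient descent, the normal equation: 1. does not need a choice of \alpha; 2. does not need to iterate; 3. but is slow when the number of features n is large, since it has to invert the (n+1) \times (n+1) matrix X^{T}X.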
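A minimal MATLAB sketch of the normal equation on hypothetical data near y = 2x. Instead of forming `inv(X'*X)`, it solves (X^{T}X)\theta = X^{T}y with the backslash operator, and compares the result with `X \ y`, MATLAB's direct least-squares solve for a rectangular X.

```matlab
x = (1:5)';
y = [2.1; 3.9; 6.2; 7.8; 10.1];           % hypothetical noisy data near y = 2x
X = [ones(size(x)), x];                   % design matrix with the x_0 = 1 column

theta_normal = (X' * X) \ (X' * y);       % normal equation, solved without forming inv()
theta_ls = X \ y;                         % least-squares solve directly on X

fprintf('normal equation: [%.4f %.4f], backslash: [%.4f %.4f]\n', theta_normal, theta_ls);
```

If X^{T}X is singular (for example, with redundant features), `pinv(X' * X) * X' * y` is a common fallback.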

### Classification

We need to solve not only continuous problems (linear regression), but also many discrete problems, like predicting whether someone has cancer (YES/NO) based on the size of their tumor. Normally we use 1 and 0 to represent the two outcomes. The new form of hypothesis function that better fits classification uses the sigmoid function:

h(x) = \dfrac{1}{1 + e^{-\theta^{T}x}}

What we did here is put the original hypothesis function \theta^{T}x into the standard sigmoid (logistic) function:

g(z) = \dfrac{1}{1 + e^{-z}}

so the new hypothesis function outputs a value strictly between 0 and 1, which we interpret as the probability that the output is 1.
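A minimal MATLAB sketch of the sigmoid hypothesis. The function is written element-wise so it works on vectors, and `theta` is a hypothetical parameter vector chosen so that \theta^{T}x = 0 at x_1 = 3.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));        % element-wise standard logistic function

z = [-10, -1, 0, 1, 10];
p = sigmoid(z);                           % values lie strictly between 0 and 1
fprintf('%.4f ', p); fprintf('\n');

theta = [-3; 1];                          % hypothetical parameters, x = [1; x_1]
h = @(x) sigmoid(theta' * x);
fprintf('h([1; 3]) = %.2f\n', h([1; 3])); % theta'*x = 0 here, so h = 0.5
```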

## Decision Boundary

We consider:

h(x) \geq 0.5 \rightarrow y = 1 \\ h(x) < 0.5 \rightarrow y = 0

Because of the behavior of the logistic function:

\theta^{T}x=0, e^{0}=1 \Rightarrow h(x)=1/2 \\ \theta^{T}x \to \infty, e^{-\theta^{T}x} \to 0 \Rightarrow h(x)\to 1 \\ \theta^{T}x \to -\infty, e^{-\theta^{T}x}\to \infty \Rightarrow h(x)\to 0

So that:

\theta^T x \geq 0 \Rightarrow h(x) \geq 0.5 \Rightarrow y = 1 \\ \theta^T x < 0 \Rightarrow h(x) < 0.5 \Rightarrow y = 0

Then you can set \theta^{T}x = 0 (where h(x) = 0.5) to get the decision boundary. For example:

\theta = \begin{bmatrix}5 \\ -1 \\ 0\end{bmatrix} \\ y = 1 \; \mathbf{if} \; 5 + (-1) x_1 + 0 x_2 \geq 0 \; \Leftrightarrow \; x_1 \leq 5 \\ \text{Decision boundary: } x_1 = 5
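A minimal MATLAB sketch of the example above: with \theta = [5; -1; 0], thresholding the sigmoid at 0.5 predicts y = 1 exactly for points with x_1 \leq 5. The three test points are hypothetical.

```matlab
theta = [5; -1; 0];                       % parameters from the example above
sigmoid = @(z) 1 ./ (1 + exp(-z));

pts = [2 1; 5 0; 7 3]';                   % hypothetical points as columns [x_1; x_2]
X = [ones(1, size(pts, 2)); pts];         % prepend x_0 = 1 to each column
pred = sigmoid(theta' * X) >= 0.5;        % theta'*X = [3 0 -2], so pred = [1 1 0]

disp(pred);
```

The point with x_1 = 5 sits exactly on the boundary (\theta^{T}x = 0, h = 0.5) and is assigned y = 1 by the \geq convention.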

The plot should look like this:

## Where We Come From and Where Humans Will Go

There was a big explosion; time was created, the world was created; particles appeared, they acted on each other through forces, they interacted with each other, and then the future was determined. It is just like beginning a game of billiards: at the moment the cue hits the first ball, the whole future is determined. If the initial conditions of the universe were typed into a supercomputer, the computer could precisely show (but not predict) the future. Or there is no future anyway…

By the second law of thermodynamics, entropy increases as a by-product of time. Everything will become the same in every direction.

Then our “fate” is predetermined, and everything will come to nothing.

Why don’t we just give up our lives immediately?

Because human emotion is preventing it.

Trying to give a fake funny cover.

## The Green Night

It’s a summer night, hot, but cold inside.

The castle is full of people, ladies and knights. At that time, I may have been the youngest one there. “What is a real knight?” King Arthur suddenly asked, looking at me with his deep green eyes. I did not expect that, but, quick as I am by nature, I answered with the words I had called out thousands of times as an initiate knight: “Honor, wisdom, humility, never afraid!” He looked at me with satisfaction. I firmly believed those words; I had already engraved them on my heart. It was my motto, but until that man came, I had no clear understanding of it.

It was supposed to be another banquet celebrating another triumphant return of King Arthur: he had conquered the East and the West and had never failed. But that day, the real star was the Green Knight. He was like a green sword thrust into the crowd; the ladies were shocked and ran around like sheep invaded by a green wolf. He claimed to be a knight, but the Green Knight did not bring any equipment, not even a sword. That was not the most striking part: he wanted someone to cut his head off! While the other knights looked at King Arthur, I thought it was my chance: just chop off the head! After my request, the Green Knight gave me a strange look.

The Green Knight’s head was off, but he was still alive. He grasped his head by the hair, looked at me, and then asked me to come to his place one year later so that he could cut my head off. It was unfair! I raged in my mind: he had some kind of magic, and I didn’t! But I could not say that out loud, with King Arthur looking at me, deeply, with a smile.

Time in the castle passed quickly. I tried not to think about the promise, but the days passed, and the day finally came. I had to go, not only because this was a promise to King Arthur and the Green Knight, but also for myself, for my chivalry.

The Green Knight lived at the top of a snowy mountain, but I decided to rest in a castle at the foot of the mountain. The master of the castle was a hunter; he and his wife, who had crystal blue eyes, long smooth black hair, and a lithe body, welcomed me. That late night, the wife made an unspeakably delicious borsch for me; she even put shaped basil on top of the soup as decoration. The hunter was out all day, so naturally the lady and I had a great time. It was a joy for both of us, though I do consider it a violation of the motto. The night before I went to meet the Green Knight, I got a special gift: a sash that could keep its wearer from being hurt. I did not think much about it.

I climbed to the top without the slightest fear; I could almost see the banquet of my triumphant return, held only for me. I saw him, took a big step in front of him, and let him chop my head off. He looked at me and slowly transformed into the hunter. He looked at me; I could not make a sound. With just a slight wave of his hand, my sash was off. Then he turned back and walked away slowly.

I know that I may not be the smartest or the strongest knight, or the humblest, and I may never be. But here, now, at this moment, I cannot turn around; I cannot return to King Arthur. I can only either live the rest of my life in shame or die in honor, die without fear.

“Wait! There is a head you forgot to take away.”

He stopped and turned around, still with that smile.

I returned, and I also realized that the Green Knight could pass his power on to others.

Years later, there was another Green Knight living in that castle.

## AI Face Change Full Tutorial: Based on DeepFaceLab

Introduction

AI face change is a product of neural network and machine learning technologies. This article aims to be a tutorial that lets people without machine learning experience replace the face in one video with another face.

Key Words: #deepfakes #faceswap #face-swap #deep-learning #deeplearning #deep-neural-networks #deepface #deep-face-swap #fakeapp #fake-app #neural-networks #neural-nets

What is DeepFaceLab?

DeepFaceLab is a tool that uses machine learning to replace faces in videos; the project is hosted on GitHub. DeepFaceLab does not have a GUI, but it does not require much RAM (at least 2 GB).

1. Environment

Hardware requirement:

I. Graphics card: an NVIDIA graphics card with CC (Compute Capability) > 3.0 (see NVIDIA’s CC chart), or at least a GTX 1060 6G. AMD is not supported 🙁

II. CPU: Intel or AMD CPU recommended, with RAM > 8 GB

III. OS: Windows 10 x64

Software requirement:

In the custom install options, select Visual C++

Open the downloaded cuDNN zip, and copy the three items in it to CUDA’s install location

The Environment in this tutorial:

Graphics Card: NVIDIA GeForce GTX 1070 8G

RAM: 16.0GB

CPU: Intel(R) Core(TM) i7-7700k 4.2GHz

OS: Windows 10 Pro 1903 – 18362.10024

2. DeepFaceLab

I. Install:

II. Preparation

1. Prepare two videos, and name them:

The video containing the face to be replaced: data_dst

The video containing the face that will replace the face in the other video: data_src

2. Put both videos into the workspace

III. Processing

There are many .bat files; run them in the order shown below:

(1) clear workspace

When making a new face change model, clear the previous data first

(2) extract images from video data_src

Press Enter to skip (use the default settings)

(3.2) extract images from video data_dst FULL FPS

Press Enter to skip (use the default settings)

(4) data_src extract faces MT all GPU

Detect the faces in data_src and extract them

(4.2.2) data_src sort by similar histogram

(4.1) data_src check result

Delete unpleasant faces

(4.2.1) data_src sort by blur

(4.1) data_src check result

Delete unpleasant faces

(5) data_dst extract faces MT all GPU

(5.2) data_dst sort by similar histogram

(5.1) data_dst check results

Delete unpleasant faces

(5.3) data_dst sort by blur

(5.1) data_dst check results

Delete unpleasant faces

(6) train H128

Press Enter to skip (use the default settings)

Press P to update the preview; press Enter to save and quit

Wait until the loss is close to 0.02

(7) convert H128

Factors:

Press Enter to convert all frames

(8) converted to mp4

The whole video will be uploaded later.


## Introduction

Walter Lee may be the character on whom the most ink is spilled: not only is he the only main male character in the play, but many values and struggles are also placed on him. He makes mistakes because of his flaws, and he also does the right thing by following his heart.

## Summary of Walter Lee

He is the son of Walter Senior, and also the only man in the Younger family (Travis counts as a boy). At the beginning of the play, the author shows Walter Lee giving all his money to his son Travis, and this sets the fundamental tone of the character: he is a person who actually cares about his family. In the play there is both bad and good; it is a play that describes a part of society.

However, the characteristic that makes him stand out from the others is his desire to show his manhood. Walter Lee is not at all satisfied with his current job as a driver for rich people, mainly white. He thinks being a man means having his own business. Walter Lee always blames the family because he thinks they do not care about his dream. The most significant scene may be the moment when, while Walter Lee is talking about his dreams and ambitions, Ruth simply says: “Your eggs are getting cold.” Here, Ruth not only reminds Walter Lee that the food is getting cold and he should eat it, but also drags him back down to something realistic. The eggs could also be a metaphor for Walter Lee’s ambition. This introduces the idea that Walter Lee never actually explains how he would achieve his dream; all he does is express his indignation that no one understands him.
But the point is, why should the people around him “understand” him anyway? He does not show anything that could guarantee, or at least indicate, his future success.

At this point, his family really cannot be blamed, since there is hardly any chance or hope of him actually starting his business. But then the change comes: the check, the money Walter Senior left behind, becomes something beyond the meaning of a number; it is a possible key to each person’s dream. For Walter Lee, it is the chance to get into business, to be a truly successful man, to live like rich white people, to be the one who has a driver for his car, to be the one who can let his son pick the university, “wherever you want to go”. Admittedly, Walter Lee’s dream is kind of lovely, but he is too overconfident; his eyes are blinded by the imaginings in his head.
After hearing the news that his mother has used the money to buy a house, he quickly becomes depressed. From the perspective of the whole family, surely buying a big house with “a lot of sunlight”, a place Ruth is willing to work many different jobs a day to get, is a good thing. However, his mother still keeps nearly half of the money, and decides to give half of that to Walter Lee’s liquor store (the only reason it could be called a “good idea” is that people like to drink, which is obviously a weak argument), and the other half to support Beneatha’s dream of becoming a doctor.

Then Walter Lee makes his first mistake: he uses not just his own part of the money to invest in the liquor store, but also Beneatha’s part. Needless to say, that money could buy another big house with a little left over; it is big money. He does not think it through and does not talk about it after doing it; he just gives the money to another man.

Now Walter Lee makes his second mistake: he does not think it through and does not check anything; he just lets another man take away his money, his dream, the dead body of his bloodied father, and the dreams of Beneatha and the family as well. He is too careless. Not only that, Walter Lee is clearly not good at business: he never learned about it, never talks about how the liquor store will work, and never introduces the two men to the family; he really knows nothing about any of it. Judging from what he has done, he is surely a careless and somewhat foolish man.

The later parts are less dramatic: he falls into depression, and then says no to Lindner, saying he will not give in to this oppression.

## Advanced Discussion of Walter Lee

A dream is one kind of motivation, and surely a powerful one, since it can be the thing a person is dying to get, or something else important. But a dream is called a dream, not a plan or a purpose; a dream is something that might never be achieved in one’s entire life. This leads to a very interesting question, using Walter Lee as an example: would Walter Lee feel better if the check had never come at all?
We know that the check is a surprise: before Walter Senior’s death, no one knew that such a large sum of money would come. And they lived, well, fairly normally; at least they were living. After the check arrives, the family gets a big, nice, dream-level house. The original apartment could be sold for extra money, but they also need to pay much more for the new house than for the previous one, and they will endure hostile stares from living in a white neighborhood. For Walter Lee himself, not only does he need to endure the pressure the family will put on him for ruining half or more of the money, but more importantly, what is called hope, what is called a dream, what is called self-belief is simply gone, like a raisin in the sun. He will have no so-called “male power” in front of his family. He could always chatter about how the family stopped him from succeeding, but now the facts prove he cannot do it. His dream, at this point, is dead.