Ref: https://www.quora.com/What-is-the-intuitive-explanation-for-the-duality-in-optimization-Why-are-the-primal-problem-and-the-dual-problem-equivalent

Let us a consider the problem

minimizef(x) subject to g(x)≤0minimizef(x) subject to g(x)≤0

where *f* and *g* are convex. Let x∗x∗ denote the constrained minimizer and p∗=f(x∗)p∗=f(x∗).

The dual function is defined as h(λ)=minxf(x)+λg(x)h(λ)=minxf(x)+λg(x), and given appropriate conditions, we have maxλh(λ)=p∗maxλh(λ)=p∗. Why is that?

Let’s plot all the possible values that (*g*(*x*), *f*(*x*)) can take for every point in the domain. This gives us a 2-dimensional shape in ℝ2R2. But obviously, for optimization purposes we can ignore the bulk of the points in this curve and focus on the lower boundary, or*envelope*.

Now, the problem is to find the minimum value that *f*(*x*) can possible take given that *g*(*x*) is less than or equal to zero. As you can see, this corresponds to the point circled in the graph, which is located at (g(x∗),f(x∗))=(0,p∗)(g(x∗),f(x∗))=(0,p∗).

Now consider adding λg(x)λg(x) to f(x)f(x). To do this, we add a line λg(x)λg(x) to the above graph.

The black line is the envelope of (*g*(x), *f*(x)) and the blue line is the plot of g(x),λg(x)g(x),λg(x). Now, plot (g(x),f(x)+λg(x))(g(x),f(x)+λg(x)) for a particular λλ.

As you can see, adding λg(x)λg(x) has the effect of “tilting” the curve. Let us plot f(x)+λg(x)f(x)+λg(x)for *several* values of λλ:

Each of the colored lines is f(x)+λg(x)f(x)+λg(x) for a different value of λλ. But what do we notice?* *The circled red point, corresponding to the constrained minimum, remains unaffected! In hindsight, this is obvious because g(x∗)=0g(x∗)=0, therefore f(x∗)+λg(x∗)=p∗f(x∗)+λg(x∗)=p∗ is independent of λλ.

Now, because every curve passes through the circled red point (0,p∗)(0,p∗), the minimum of that curve must always be equal to or less than p∗p∗.

In order to get the *correct* answer, we have to find λλ so that the colored curve is minimized at the circled red point. In order to do this, we have to tilt the function just right, so that the curve is flat at the circled red point. This also maximizes the dual function.