Note to reader: this article assumes knowledge of statistical mechanics. I highly recommend the thermal physics lecture notes of the Oxford Physics undergraduate programme, a beautiful exposition of the subject written by the brilliant Alexander Schekochihin. The notes logically construct statistical mechanics with great clarity, without sparing the reader the necessary complexities. I draw most of the mathematical material in this article from Chapter 5 of Convex Optimization by Boyd and Vandenberghe, which is an accurate but accessible introduction to the mathematics of optimisation. Both of these materials are freely available online as PDFs.

The heart of statistical mechanics is the maximum entropy optimisation problem

$$ \min_{p} \; f(p), $$

where the objective function is the negative entropy (we work in units where $k_B = 1$)

$$ f(p) = \sum_\alpha p_\alpha \ln p_\alpha $$

and is subject to the equality constraints

  • normalised probability: $\sum_\alpha p_\alpha = 1$
  • fixed average energy: $\sum_\alpha p_\alpha E_\alpha = U$
  • fixed average number of particles of each species $i$: $\sum_\alpha p_\alpha N_{i\alpha} = \bar{N}_i$.

This is the starting point for obtaining the probability $p_\alpha$ of finding a state $\alpha$ with energy $E_\alpha$ and number of particles $N_{i\alpha}$ of the $i$th species in a grand canonical ensemble. Systems in the grand canonical ensemble have a fixed average internal energy $U$ and numbers of particles $\bar{N}_i$.
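Before we develop the theory, it may help to see the problem as a concrete computation. Below is a minimal numerical sketch in Python: the six states, their energies and (single-species) particle numbers, and the target averages are all made-up toy values, and the choice of solver and starting point is mine rather than anything canonical.

```python
import numpy as np
from scipy.optimize import minimize

# Toy system: six states with made-up energies and (single-species)
# particle numbers; U and N_bar are likewise made-up feasible targets.
E = np.array([0.0, 1.0, 1.0, 2.0, 2.0, 3.0])   # E_alpha for each state
N = np.array([0.0, 1.0, 0.0, 1.0, 2.0, 2.0])   # N_alpha for each state
U, N_bar = 1.2, 0.8

def neg_entropy(p):
    # the objective f(p) = sum_a p_a ln p_a  (k_B = 1)
    return np.sum(p * np.log(p))

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},  # normalisation
    {"type": "eq", "fun": lambda p: p @ E - U},        # fixed average energy
    {"type": "eq", "fun": lambda p: p @ N - N_bar},    # fixed average N
]

p0 = np.full(len(E), 1.0 / len(E))  # start from the uniform distribution
res = minimize(neg_entropy, p0, method="SLSQP", constraints=constraints,
               bounds=[(1e-12, 1.0)] * len(E))
print(res.x)                  # the maximum entropy distribution p_alpha
print(res.x @ E, res.x @ N)   # constraint check: ~1.2 and ~0.8
```

As we will derive below, the resulting distribution has the Gibbs form $p_\alpha \propto e^{-\beta(E_\alpha - \mu N_\alpha)}$.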

Most undergraduate physics textbooks instruct students to find $p_\alpha$ using the method of Lagrange multipliers. Often the proof that the method works is briskly sketched out or motivated via geometrically intuitive arguments, and the student is hurried along to apply the method. But it is an intellectual loss to hurry along, because the mathematics behind the method of Lagrange multipliers - the theory of constrained optimisation - is beautiful and fascinating. As we will demonstrate, this formalism informs the physics too.

Theory of Constrained Optimisation

Constrained optimisation is a vast field; here we limit our focus to the mathematics specific to the problem of entropy maximisation. We focus on equality constraints and leave the discussion of inequality constraints to other texts, such as Boyd and Vandenberghe. Moreover we focus on the optimisation of convex functions, which are much simpler to deal with: a convex function has no local minima other than its global ones, and a strictly convex function such as negative entropy has a unique global minimiser. Many neat theoretical results hold in this setting, and we will exploit several of them below.

The general class of problems we discuss here are of the form

$$ \min_{x \in \mathbb{R}^n} f(x), $$

where $f : \mathbb{R}^n \to \mathbb{R}$ is convex and the allowed values of $x$ are subject to affine equality constraints of the form

$$ Ax = b, $$

where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. We say $x$ is feasible if it satisfies the constraints, and optimal if it minimises $f$ over the feasible subset of $\mathbb{R}^n$.
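For concreteness, here is the max entropy problem cast in this affine form with $x = p$; the stacking of the constraints into rows is my own bookkeeping, with one $N$-row (and one entry of $b$) per species $i$:

```latex
\begin{equation*}
\underbrace{\begin{pmatrix}
1 & 1 & \cdots & 1 \\
E_1 & E_2 & \cdots & E_n \\
N_{i1} & N_{i2} & \cdots & N_{in}
\end{pmatrix}}_{A}
\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{pmatrix}
=
\underbrace{\begin{pmatrix} 1 \\ U \\ \bar{N}_i \end{pmatrix}}_{b}.
\end{equation*}
```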

The Lagrangian

When we use the method of Lagrange multipliers, we write down a Lagrangian such as the following for the entropy maximisation problem:

$$ L(p, \lambda) = \sum_\alpha p_\alpha \ln p_\alpha + \lambda_1 \Big( \sum_\alpha p_\alpha - 1 \Big) + \lambda_2 \Big( \sum_\alpha p_\alpha E_\alpha - U \Big) + \sum_i \lambda_{2+i} \Big( \sum_\alpha p_\alpha N_{i\alpha} - \bar{N}_i \Big). $$

For notational purposes, we collect the Lagrange multipliers, or dual variables, $\lambda_1$, $\lambda_2$ and $\lambda_{2+i}$ into entries of the vector $\lambda$. Then

$$ L(x, \lambda) = f(x) + \lambda^T (Ax - b). $$

Observe that $L(x, \lambda)$ reduces to $f(x)$ if $x$ satisfies the primal constraints $Ax = b$. We enforce the constraints on $x$ in the primal problem by taking the supremum of the Lagrangian over the dual variables; if the constraints are not satisfied, $\sup_\lambda L(x, \lambda) = +\infty$. This is because when $Ax - b \neq 0$ we can make $\lambda^T (Ax - b)$ in the Lagrangian arbitrarily large when we take the supremum over $\lambda$. Minimising $\sup_\lambda L(x, \lambda)$ can therefore only find minimisers that satisfy the constraints; in other words, it is equivalent to finding the minimisers of the primal problem. Supposing feasible points that satisfy the constraints exist, a minimiser of $\sup_\lambda L(x, \lambda)$ is also a minimiser of the primal problem, and vice versa:

$$ \inf_x \sup_\lambda L(x, \lambda) = \inf_{x \,:\, Ax = b} f(x) \equiv p^*. $$

We call this the primal problem.
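To see how taking the supremum enforces the constraints, consider a toy example of my own: minimise $f(x) = x^2$ subject to the single affine constraint $x = 1$, with Lagrangian $L(x, \lambda) = x^2 + \lambda(x - 1)$.

```latex
\begin{equation*}
\sup_\lambda L(x, \lambda) =
\begin{cases}
x^2 & \text{if } x = 1, \\
+\infty & \text{otherwise},
\end{cases}
\qquad\text{so}\qquad
p^* = \inf_x \sup_\lambda L(x, \lambda) = 1.
\end{equation*}
```

The supremum acts as an infinite penalty off the constraint set, and the surviving values reproduce the constrained minimum.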

The Dual Problem and Duality

Instead of taking first the supremum and then the infimum of the Lagrangian, we can take the infimum first and then maximise the resulting function of the dual variables, $g(\lambda) \equiv \inf_x L(x, \lambda)$, known as the dual function. We call this problem the dual problem of our primal problem:

$$ d^* = \sup_\lambda g(\lambda) = \sup_\lambda \inf_x L(x, \lambda). $$

How do the primal and dual problems relate to each other? We make use of the Max-min inequality:

Theorem (Max-min inequality). For any function $f : X \times Y \to \mathbb{R}$,

$$ \sup_{y \in Y} \inf_{x \in X} f(x, y) \le \inf_{x \in X} \sup_{y \in Y} f(x, y). $$

We direct the reader to the Wikipedia page on the inequality for its very simple proof. Applying the inequality to the primal and dual problems, we have

$$ \sup_\lambda \inf_x L(x, \lambda) \le \inf_x \sup_\lambda L(x, \lambda). $$

In other words, the optimal value of the dual problem sets a lower bound for the optimal value $p^*$ of the primal problem:

$$ d^* \le p^*. $$

We call this feature of optimisation problems weak duality. We say that whenever

$$ d^* = p^* $$

the problem satisfies strong duality. Since weak duality always holds, strong duality means the dual lower bound is tight. Strong duality does not hold for every problem, but there is a simple sufficient condition for it:

Theorem (Slater’s theorem for Strong Duality). Suppose $f$ is convex and there exists $x \in \operatorname{relint} \mathcal{D}$ (the relative interior of the problem domain) satisfying the strict versions of any convex inequality constraints and satisfying all affine equality constraints. Then strong duality holds.

The reader can delve into the technical details of the theorem and its proof in pages 226 and 234-236 of Boyd and Vandenberghe.

Because negative entropy is convex and our constraints are affine equalities, Slater's condition reduces to the existence of a feasible point, and we in fact have strong duality in our max entropy problem. This encourages us to approach our optimisation problem from its dual.
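As a quick illustration, here is a hedged numerical sketch of weak and strong duality on the toy system from the first snippet. The closed form for the dual function below uses the minimisation of the Lagrangian over $p$, which we carry out analytically when we derive statistical mechanics; the rest is a spot check that every $g(\lambda)$ sits below $p^*$ and that the maximum of $g$ attains it.

```python
import numpy as np
from scipy.optimize import minimize

# Same toy system as in the first snippet (one species).
E = np.array([0.0, 1.0, 1.0, 2.0, 2.0, 3.0])
N = np.array([0.0, 1.0, 0.0, 1.0, 2.0, 2.0])
A = np.vstack([np.ones_like(E), E, N])   # constraint rows: 1, E, N
b = np.array([1.0, 1.2, 0.8])            # right-hand sides: 1, U, N_bar

def g(lam):
    # Dual function g(lam) = inf_p L(p, lam). The infimum is attained at
    # p_a = exp(-1 - (A^T lam)_a), which gives the closed form
    # g(lam) = -sum_a exp(-1 - (A^T lam)_a) - lam . b
    return -np.sum(np.exp(-1.0 - A.T @ lam)) - lam @ b

# Weak duality: every choice of lam yields a lower bound on p*.
rng = np.random.default_rng(0)
for lam in rng.normal(size=(5, 3)):
    print(g(lam))   # all of these lie below the primal optimum

# Strong duality: maximising g attains the value found by the
# constrained primal solve in the first snippet.
res = minimize(lambda lam: -g(lam), np.zeros(3))
print(-res.fun)     # d* = p*
```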

Before we proceed to solve our maximum entropy problem, we take a detour to look at saddle points and their relation to strong duality.

Saddle Points and Strong Duality

Definition: $(\tilde{x}, \tilde{\lambda})$ is a saddle point for $L$ iff

$$ L(\tilde{x}, \lambda) \le L(\tilde{x}, \tilde{\lambda}) \le L(x, \tilde{\lambda}) \quad \text{for all } x, \lambda. $$

Proposition: $(x^*, \lambda^*)$ is a saddle point of $L$ iff $x^*$ and $\lambda^*$ are primal and dual optimal respectively, and strong duality holds for the problem.

Proof. “⇐” Suppose $x^*$, $\lambda^*$ are primal and dual optimal and strong duality holds. Strong duality implies $f(x^*) = g(\lambda^*) = \inf_x L(x, \lambda^*)$. Since $x^*$ is primal feasible, we can put $x = x^*$ into the Lagrangian and get $L(x^*, \lambda^*) = f(x^*)$. Therefore

$$ L(x^*, \lambda^*) = \inf_x L(x, \lambda^*) \le L(x, \lambda^*) \quad \text{for all } x. \qquad \text{(i)} $$

However, since $x^*$ must be primal feasible, with only equality constraints we have $L(x^*, \lambda) = f(x^*)$ for all $\lambda$. In particular,

$$ L(x^*, \lambda) = L(x^*, \lambda^*) \quad \text{for all } \lambda. \qquad \text{(ii)} $$

Combining (i) and (ii), we obtain the saddle point condition

$$ L(x^*, \lambda) \le L(x^*, \lambda^*) \le L(x, \lambda^*) \quad \text{for all } x, \lambda. $$

“⇒” Suppose we have the saddle point condition for $(\tilde{x}, \tilde{\lambda})$. We first prove that $\tilde{x}$ is indeed primal feasible. Writing out the Lagrangian in the left-hand relation of the saddle point condition, $L(\tilde{x}, \lambda) \le L(\tilde{x}, \tilde{\lambda})$ implies

$$ f(\tilde{x}) + \lambda^T (A\tilde{x} - b) \le f(\tilde{x}) + \tilde{\lambda}^T (A\tilde{x} - b) \quad \text{for all } \lambda, $$

or $\lambda^T (A\tilde{x} - b) \le \tilde{\lambda}^T (A\tilde{x} - b)$ for all $\lambda$. This can only be true if $A\tilde{x} - b = 0$, i.e. $\tilde{x}$ is primal feasible, and consequently $L(\tilde{x}, \tilde{\lambda}) = f(\tilde{x})$.

We proceed to show that $\tilde{x}$ is indeed primal optimal. The right-hand side of the saddle point condition states that $L(\tilde{x}, \tilde{\lambda}) \le L(x, \tilde{\lambda})$ for all $x$. In particular, the inequality must also hold if we restrict $x$ to the subset of $\mathbb{R}^n$ which is primal feasible. But in the feasible subset, $L(x, \tilde{\lambda}) = f(x)$, and we have just shown $L(\tilde{x}, \tilde{\lambda}) = f(\tilde{x})$. Therefore $f(\tilde{x}) \le f(x)$ for all feasible $x$; in other words, $\tilde{x}$ is primal optimal.

To show that $\tilde{\lambda}$ is dual optimal, we observe that by definition $g(\tilde{\lambda}) = \inf_x L(x, \tilde{\lambda})$. But from the right-hand side of the saddle point condition, $\inf_x L(x, \tilde{\lambda}) = L(\tilde{x}, \tilde{\lambda})$, and hence $g(\tilde{\lambda}) = L(\tilde{x}, \tilde{\lambda}) = f(\tilde{x}) = p^*$. Since weak duality holds generally, $d^* \le p^*$. But $d^* = \sup_\lambda g(\lambda) \ge g(\tilde{\lambda}) = p^*$, therefore $d^* = p^*$. We have therefore shown that $\tilde{\lambda}$ is dual optimal and strong duality holds. ∎

Using Duality to Solve Optimisation Problems

We are now in a position to prove that the method of Lagrange multipliers indeed works!

Theorem: if strong duality holds for the optimisation problem, and $L(x, \lambda^*)$ has a unique minimiser $\tilde{x}$ over $x$ (where $\lambda^*$ is dual optimal), then $\tilde{x} = x^*$, the primal optimal point.

Proof. Since strong duality holds, there exists a feasible minimiser $x^*$ of the primal problem and a maximiser $\lambda^*$ of the dual function $g$. As a consequence of strong duality, the saddle point of the Lagrangian is primal and dual optimal, satisfying $L(x^*, \lambda) \le L(x^*, \lambda^*) \le L(x, \lambda^*)$ for all $x, \lambda$. Choosing $x = \tilde{x}$, we get $L(x^*, \lambda^*) \le L(\tilde{x}, \lambda^*)$. Since $\tilde{x}$ minimises $L(x, \lambda^*)$, we also have $L(\tilde{x}, \lambda^*) \le L(x^*, \lambda^*)$. To satisfy both inequalities, we conclude that $L(\tilde{x}, \lambda^*) = L(x^*, \lambda^*)$. Now $\tilde{x}$ is the unique minimiser of $L(x, \lambda^*)$ over all $x$, and $x^*$ attains the same minimal value, so $x^* = \tilde{x}$. Therefore $\tilde{x}$ must be feasible and therefore optimal. ∎

We will use this theorem to derive statistical mechanics.

Deriving Statistical Mechanics

We return to solving the max-entropy problem for the grand canonical ensemble. Since our objective function is convex and there are only affine equality constraints placed on $p$, strong duality holds. Therefore we only need to find the maximiser $\lambda^*$ of the dual function $g(\lambda) = \inf_p L(p, \lambda)$ to find the minimiser of $f$ in our primal problem. Differentiating $L$ w.r.t. $p_\alpha$ and setting it to 0, we find

$$ \frac{\partial L}{\partial p_\alpha} = \ln p_\alpha + 1 + \lambda_1 + \lambda_2 E_\alpha + \sum_i \lambda_{2+i} N_{i\alpha} = 0 \quad \Longrightarrow \quad p_\alpha = \exp\Big( -1 - \lambda_1 - \lambda_2 E_\alpha - \sum_i \lambda_{2+i} N_{i\alpha} \Big). $$

What remains is to work out the dual optimal point $\lambda^*$. Since the minimiser of $L(p, \lambda^*)$ must be primal feasible, it must satisfy the primal constraints:

$$ \sum_\alpha p_\alpha = 1, \qquad \sum_\alpha p_\alpha E_\alpha = U, \qquad \sum_\alpha p_\alpha N_{i\alpha} = \bar{N}_i. $$

The constraints implicitly fix the dual variables $\lambda^*$. Writing out the Lagrangian at the saddle point, i.e. the primal and dual optimum, and letting $\lambda_2^* \equiv \beta = 1/T$, $\lambda_{2+i}^* = -\mu_i / T$, and $Z \equiv e^{1 + \lambda_1^*}$, we have

$$ p_\alpha = \frac{1}{Z} e^{-\beta ( E_\alpha - \sum_i \mu_i N_{i\alpha} )}, \qquad L(p^*, \lambda^*) = -S = -\ln Z - \beta U + \beta \sum_i \mu_i \bar{N}_i. $$

Definition. The Grand Potential:

$$ \Phi \equiv -T \ln Z = U - TS - \sum_i \mu_i \bar{N}_i, $$

where the second equality is the saddle-point value above multiplied by $-T$.

If we write out the differential

$$ d\Phi = dU - T\, dS - S\, dT - \sum_i \big( \mu_i\, d\bar{N}_i + \bar{N}_i\, d\mu_i \big) $$

and substitute the definition of the Grand Potential $\Phi = -T \ln Z$ into the differential (also observing that $Z$ depends on $T$, the $\mu_i$, and the volume $V$ through the energies $E_\alpha$), a page of tedious algebraic simplifications leads us to

Theorem. The First Law of Thermodynamics:

$$ dU = T\, dS - P\, dV + \sum_i \mu_i\, d\bar{N}_i, $$

where we have assumed the generalised pressure $P$ is defined as

$$ P \equiv -\sum_\alpha p_\alpha \frac{\partial E_\alpha}{\partial V}. $$
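The page of algebra can be compressed considerably; what follows is my own condensed sketch of one route rather than a full proof. Differentiate $\Phi = -T \ln Z$ directly, with the volume dependence entering only through the energies $E_\alpha$:

```latex
% With Z = \sum_\alpha e^{-\beta(E_\alpha - \sum_i \mu_i N_{i\alpha})} and \beta = 1/T:
\begin{align*}
d\Phi &= -\ln Z \, dT - T \, d(\ln Z) \\
      &= -\ln Z \, dT - T \left[ \frac{\partial \ln Z}{\partial T}\, dT
         + \sum_i \frac{\partial \ln Z}{\partial \mu_i}\, d\mu_i
         + \frac{\partial \ln Z}{\partial V}\, dV \right] \\
      &= -S \, dT - \sum_i \bar{N}_i \, d\mu_i - P \, dV .
\end{align*}
% Equating this with the differential of \Phi = U - TS - \sum_i \mu_i \bar{N}_i,
%   d\Phi = dU - T dS - S dT - \sum_i (\mu_i d\bar{N}_i + \bar{N}_i d\mu_i),
% and cancelling terms yields dU = T dS - P dV + \sum_i \mu_i d\bar{N}_i.
```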

Naturally, $\Phi = \Phi(T, V, \mu_i)$: it is really only dependent on the constraints and on the particular physics of the states encoded in $E_\alpha$ and $N_{i\alpha}$.
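To watch the dual machinery produce thermodynamics on actual numbers, here is a sketch continuing the toy system from earlier (one species, made-up values): maximise the dual, read off $T$ and $\mu$ from the optimal multipliers, and check the grand potential identity $\Phi = -T \ln Z = U - TS - \mu \bar{N}$.

```python
import numpy as np
from scipy.optimize import minimize

E = np.array([0.0, 1.0, 1.0, 2.0, 2.0, 3.0])   # toy energies
N = np.array([0.0, 1.0, 0.0, 1.0, 2.0, 2.0])   # toy particle numbers
A = np.vstack([np.ones_like(E), E, N])
b = np.array([1.0, 1.2, 0.8])                   # (1, U, N_bar)
U, N_bar = b[1], b[2]

def g(lam):
    # closed-form dual function, as before
    return -np.sum(np.exp(-1.0 - A.T @ lam)) - lam @ b

lam = minimize(lambda l: -g(l), np.zeros(3)).x   # dual optimal multipliers
p = np.exp(-1.0 - A.T @ lam)                     # the Gibbs distribution

T = 1.0 / lam[1]                 # lambda_2 = beta = 1/T
mu = -lam[2] * T                 # lambda_3 = -mu/T
S = -np.sum(p * np.log(p))       # entropy at the optimum
Z = np.exp(1.0 + lam[0])         # grand partition function

print(A @ p - b)                               # ~0: p is primal feasible
print(-T * np.log(Z), U - T * S - mu * N_bar)  # both equal Phi
```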

Short cut via Dual Variables

The First Law can be used to derive the identities

$$ \frac{1}{T} = \left( \frac{\partial S}{\partial U} \right)_{V, \bar{N}_i}, \qquad \frac{\mu_i}{T} = -\left( \frac{\partial S}{\partial \bar{N}_i} \right)_{U, V, \bar{N}_{j \neq i}}. $$

As it turns out, these relations are naturally borne out of the properties of the dual variables - they are more fundamental than the First Law! Consider again

$$ \min_x f(x) \quad \text{subject to} \quad Ax = b. $$

Suppose we shift the constraints s.t. $Ax = b + \delta$ and find the new minimum of $f$; we call this the perturbed problem. Denote $p^*(0)$ the minimum of the original problem and $p^*(\delta)$ the minimum of the perturbed problem. If we have strong duality, our previous proposition implies

$$ p^*(0) = g(\lambda^*) = \inf_x \big[ f(x) + \lambda^{*T} (Ax - b) \big]. $$

Choose $x$ to be those which satisfy $Ax = b + \delta$; then

$$ p^*(0) \le f(x) + \lambda^{*T} (Ax - b) = f(x) + \lambda^{*T} \delta. $$

Furthermore, choose $x$ to be the minimiser of the perturbed problem, i.e. $f(x) = p^*(\delta)$:

$$ p^*(0) \le p^*(\delta) + \lambda^{*T} \delta. $$

Suppose $p^*(\delta)$ is differentiable at $\delta = 0$. For $\delta = t e_j$ with $t > 0$, where $e_j$ is the $j$th unit vector, we can rearrange the inequality to

$$ \frac{p^*(t e_j) - p^*(0)}{t} \ge -\lambda_j^* $$

and take $t \to 0^+$, yielding

$$ \frac{\partial p^*}{\partial \delta_j} \bigg|_{\delta = 0} \ge -\lambda_j^*. $$

We rearrange again for $t < 0$, which flips the inequality:

$$ \frac{p^*(t e_j) - p^*(0)}{t} \le -\lambda_j^*. $$

Taking $t \to 0^-$,

$$ \frac{\partial p^*}{\partial \delta_j} \bigg|_{\delta = 0} \le -\lambda_j^*. $$

Combining the two inequalities, we conclude that

$$ \frac{\partial p^*}{\partial \delta_j} \bigg|_{\delta = 0} = -\lambda_j^*. $$

The gradient of the objective function's minimum with respect to constraint shifts is given by (minus) the optimal dual variables. This lends a natural interpretation to the dual variables: they represent the local sensitivity of the optimum objective function with respect to changes in the constraints. In the context of the grand canonical ensemble, this is simply our definition of $T$ and $\mu_i$: shifting the energy constraint changes the minimum of $-S$ at the rate $-\lambda_2^* = -1/T$, recovering $\partial S / \partial U = 1/T$, and shifting $\bar{N}_i$ likewise recovers $\partial S / \partial \bar{N}_i = -\mu_i / T$. As a consequence, rather than slogging through tedious algebraic expressions of differentials, we can use this sensitivity result and the definition of the Grand Potential to derive the First Law in one line! A numerical check is sketched below.
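Here is the promised numerical check, again on the toy system and again a sketch of my own construction: shift each constraint by a small amount, re-solve via the dual, and compare the finite-difference slope of $p^*(\delta)$ with $-\lambda_j^*$.

```python
import numpy as np
from scipy.optimize import minimize

E = np.array([0.0, 1.0, 1.0, 2.0, 2.0, 3.0])
N = np.array([0.0, 1.0, 0.0, 1.0, 2.0, 2.0])
A = np.vstack([np.ones_like(E), E, N])
b = np.array([1.0, 1.2, 0.8])

def solve_dual(b_vec):
    # Maximise the dual of the (perturbed) problem with constraints A p = b_vec;
    # by strong duality the optimal value equals the primal optimum p*.
    g = lambda lam: -np.sum(np.exp(-1.0 - A.T @ lam)) - lam @ b_vec
    res = minimize(lambda lam: -g(lam), np.zeros(3), tol=1e-12)
    return -res.fun, res.x

p_opt, lam = solve_dual(b)   # unperturbed optimum and multipliers
h = 1e-4
for j in range(3):
    db = np.zeros(3)
    db[j] = h
    slope = (solve_dual(b + db)[0] - p_opt) / h  # finite difference of p*(delta)
    print(slope, -lam[j])                        # the two columns should agree
```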

So, what’s new?

So what have we gained by using this complex machinery of duality? Obviously we have learnt nothing particularly new per se; after all, statistical mechanics is a mature field of physics! What duality does offer is a new perspective and aesthetic. I put forward two points as to why what we have done here has aesthetic value.

Our Understanding of Temperature and Chemical Potentials

Asked why temperature is the measure of the change in entropy induced by a change in internal energy (see the identity $1/T = (\partial S / \partial U)_{V, \bar{N}_i}$ above), a traditional thermodynamicist might appeal to the First Law of Thermodynamics and write out a one-line derivation of the identity. Yet the First Law does not explain how temperature and chemical potentials came into being in the first place: they are already there in the Law! As a statement, the First Law only contains marginally more information than these identities; apart from the pressure term, they more or less restate the First Law.

Going deeper, we know full well that the First Law is not a fundamental law; rather, it is a consequence of a much deeper philosophy, the Maximum Entropy Principle. Though we could attempt to explain the genesis of these identities by tracing the origin of the First Law back to the Maximum Entropy Principle, this endeavour is not only more difficult but also too convoluted, when we could simply cut out the First Law middleman and define temperature and chemical potentials as local measures of the optimisation objective's sensitivity to changes in the physical constraints. With this definition, temperature and chemical potentials have a direct connection to the fundamental principle behind statistical mechanics.

The Grand Potential, begotten not made?

Most thermodynamics literature pulls the Grand Potential out of thin air and reveres it as a miraculous object; magically, all of the useful quantities in thermodynamics are obtained by taking partial derivatives of the Grand Potential. Textbooks just state its properties and tell students to take them away for good use. It is as if physicists stumbled upon the Grand Potential by chance, or carried it down from Mount Sinai on a stone tablet.

Yet armed with the mathematics, we now know better. The definition of the Grand Potential is in fact an application of the equivalence of the Lagrangian saddle point with the primal and dual optima, a most elegant theorem! The Grand Potential is the end result of the optimisation problem, not simply an ad hoc utility borne out of convenience. This revelation can only serve to elevate the status of this holy object in statistical physics.