(Taken primarily from Ch. 2 of Frenkel and Smit [1] and Ch. 3 of Introduction to Modern Statistical Mechanics, by David Chandler [5].)
This course is centered upon one mathematical statement: \begin{equation} \label {eq:ensemble_average} \left \langle G\right \rangle = \sum _\nu P_\nu G_\nu \end{equation} That is, the expectation value, \(\left \langle G\right \rangle \), of some observable property \(G\) is an average over all possible microstates available to a system, indexed by \(\nu \), where \(P_\nu \) is the probability of observing the system in microstate \(\nu \), and \(G_\nu \) is the value of the measured property G when the system is in microstate \(\nu \). Eq. 1 illustrates the operation of performing an ensemble average.
Before even considering how to use computer simulation to make such a measurement of a particular property for a particular system, there are three main issues to consider:
In the following subsections, we give a cursory treatment of elementary statistical mechanics aimed at answering these questions. The aim is to give the student an appreciation (not a mastery) of the basic physics that underlies a majority of current molecular simulation.
A microstate is a full specification of all degrees of freedom of a system. A system may be conveniently defined as having \(N\) degrees of freedom confined to a volume \(V\). In general, microscopic degrees of freedom are quantum numbers. The index \(\nu \) in Eq. 1 runs over all unique combination of quantum number values. There is a spectrum of energy eigenstates for any system given by \begin{equation} \label {eq:energy_eigenstates} \mathscr {H}\left |\nu \right \rangle = E_\nu \left |\nu \right \rangle , \end{equation} where \(\mathscr {H}\) is the Hamiltonian, \(\left |\nu \right \rangle \) is shorthand (“ket” notation) for the system wavefunction in state \(\nu \), and \(E_\nu \) is the energy of state \(\nu \). In contrast to model systems usually considered in elementary quantum mechanics, the number of distinct microstates of systems of \(\sim 10^{23}\) particles that have the same energy \(E\) is very large, and this set of eigenstates is in practice impossible to obtain explicitly. This is indeed why we must instead treat this set statistically. We refer to the number of states that satisfy a given energy as the degeneracy of the energy level, denoted \(\Omega (E,N,V)\): \begin{equation} \label {eq:microcanonical} \Omega (N,V,E) = \begin {array}{l} \mbox {number of microstates with $N$ and $V$ and}\\ \mbox {energy between $E$ and $E+\delta E$.} \end {array} \end{equation} The many “equivalent” states numbering \(\Omega (N,V,E)\) is called a microcanonical ensemble.
The Ising spin lattice is a simple statistical mechanical model with discrete energy levels which we can now introduce to gain some understanding of what it means to say \(\Omega \) is “large.” Imagine a linear array of \(N\) spins, each pointing either “up” or “down.”
| A 1-D Ising system. |
Let us suppose that the Hamiltonian of this system is given by \begin{equation} \label {eq:ising_fieldhamil} \mathscr {H} = h\sum _{i=1}^{N}s_i \end{equation} where \(s_i\) is -1 if spin \(i\) is “down” and +1 if spin \(i\) is “up,” and \(h\) is some unit of energy. The ground state, the state with the lowest energy, has all spins down, so \(\Omega _0 = 1\). The next state up has one spin up, but there are \(N\) possible microstates that have this energy: \(\Omega _1 = N\). The next state up has two spins, and there are \(N(N-1)/2\) such microstates: \(\Omega _2 = N(N-1)/2\). For \(m\) spins flipped, there are \(\Omega _m = N!/(N-m)!m!\) distinct microstates. Thus we see that working with \(\Omega \) for statistical mechanical systems means working with enormous numbers.
Although quantum mechanics tells us that atomic systems have discrete energy levels, when systems contain very large numbers of atoms, these energy levels become so closely spaced relative to their span that they may effectively be considered a continuum. We can thus pass into a classical (as opposed to quantum mechanical) representation, where the microstate for a system of \(N\) particles is specified by a point in a \(6N\)-dimensional phase space: \begin{equation} \left (r^N,p^N\right ) \equiv \left ({\bf r}_1,{\bf r}_2,...,{\bf r}_N;{\bf p}_1,...,{\bf p}_N\right ). \end{equation} We can denote the number of states in a microcanonical ensemble for a classical system using the Dirac delta function: \begin{equation} \label {eq:microcanon_class} \Omega (N,V,E) = \frac {1}{h^{3N}N!}\int \int d {\bf r}^N d {\bf p}^N \delta \left [\mathscr {H}\left ({\bf r}^N,{\bf p}^N\right )-E\right ] \end{equation} The microcanonical ensemble represents a hyperdimensional surface in the phase space dimensioned by \(N\) particles with positions limited by the extent of \(V\). The factorial in Eq. 6, \(N!\), takes into account that the particles are indistingishable; that is, ordering particle labels is not important. \(h\) is Planck’s constant; note that it has units of (length)(momentum). Think of it as a quantum-mechanically-required “mesh discretization” for continuous space (it arises due to the Heisenberg uncertainty relation). It also nondimensionalizes the partition function. We will encounter it again in the next section, but we will also see why these “prefactors” are not essential ingredients of most molecular simulations.
You may wonder why there seem to be two viewpoints of statistical mechanics, quantum and classical. First, there really aren’t two viewpoints: the classical picture is an approximation of the more general quantum mechanical picture. But statistical mechanics as a discipline was first formalized by Gibbs and Boltzmann before quantum mechanics was widely accepted, so it dealt necessarily with systems of classical particles obeying Newtonian equations of motion; that is, on classical mechanics. There appears to be a general concensus that it is easier to introduce statistical mechanical concepts using the “sum-over-states” notation of quantum statistical mechanics, rather than the apparently more cumbersome (and anyway approximate) “integral-over-phase-space” notation of classical statistical mechanics.
Scientists are taught early on that when conducting measurements, one must perform repeated experiments and average the results. If one makes \(\mathscr {N}\) independent measurements of some observable \(G\), one computes the mean value as \begin{equation} G_{\rm obs} = \frac {1}{\mathscr {N}}\sum _{i=1}^{\mathscr {N}}G_i. \end{equation}
We now imagine that the time of a measurement is so short that we know that the system is in only one of its many possible microstates. This means we can write \begin{equation} \label {eq:exp_obs} G_{\rm obs} = \sum _{\nu }\left [\frac {1}{\mathscr {N}}\left (\begin {array}{c} \mbox {number of times state $\nu $ is} \\ \mbox {observed in the $\mathscr {N}$ observations} \end {array}\right )\right ]G_\nu , \end{equation} where \(G_\nu \) was introduced previously (Eq. 1) as the value of observable \(G\) when the system is in state \(\nu \).
Now we have to imagine that our system is evolving in time. As it evolves, its degrees of freedom change values, and the system is thought of as tracing out a trajectory in state space. (“State space” is a Hilbert space spanned by all states \(\left |\nu \right \rangle \) in the quantum mechanical case, or phase space in the classical case.) How is the system evolving? The system wavefunction evolves according to Schrödinger’s equation, while particles in a classical system follow Newtonian mechanics. (We will consider Newtonian mechanics in much greater detail later in Sec. 4.) As the experimenters, we control the system by specifying a handful of variables, such as its total energy, \(E\), the number of particles, \(N\), and the volume, \(V\). These contraints force the system’s trajectory to remain on a hyperdimensional surface in state space.
The key assumption we make at this point is that, if we wait long enough, our system will visit every possible state; that is, the trajectory will eventually pass through every available point in state space consistent with our constraints. If this is true, and we make \(\mathscr {N}\) independent observations, then the number of times we observe the system in state \(\nu \) divided by the number of observations, \(\mathscr {N}\), is the probability of observing state \(\nu \), \(P_\nu \), if we happen to make a random observation. So, Eq. 8 above becomes the ensemble average first presented in Eq. 1: \begin{equation} G_{\rm obs} = \sum _\nu P_\nu G_\nu = \left \langle G\right \rangle \end{equation} We see that a time average is the same as an ensemble average.
This assumption is important: it is referred to as the ergodic hypothesis. A system is “ergodic” if, after a sufficiently long time, it visits all possible state space points consistent with whatever constraints are put on it. We cannot in general prove that any system is ergodic; it is something we are comfortable assuming for most systems based on our physical intuition. There are, however, many systems which are non-ergodic. For the most part, we will not concern ourselves with such systems in this course.
Another important consideration is the following: How far apart must the \(\mathscr {N}\) independent measurements be from one another in time to be considered truly “independent”? To answer this question, we must introduce the notion of a relaxation time, \(\tau _{\rm relax}\), which arises naturally due to the presumably chaotic nature of the microscopic system. Given some initial conditions, after a time \(\tau _{\rm relax}\) has elapsed, the system has “lost memory” of the initial condition. We measure this loss of memory in terms of correlation functions, which will be discussed in more detail later. If we wait at least \(\tau _{\rm relax}\) between successive observations, we know they are independent. It turns out that one can use simulation methods to estimate relaxation times (and their spectra; many systems display a broad spectrum of relaxation times, each element cooresponding to a particular type of molecular motion). We will pay particularly close attention to \(\tau _{\rm relax}\) in upcoming sections.
Eq. 3 introduced the quantity \(\Omega (N,V,E)\) as the number of states available to a system under the constraints of constant number of particles, \(N\), volume, \(V\), and energy \(E\). The fundamental postulate of statistical mechanics, also called the “rational basis” by Chandler, is the following:
In statistical equilibrium, all states consistent with the constraints of \(N\), \(V\), and \(E\) are equally probable.
or \begin{equation} P_\nu = 1/\Omega (N,V,E). \end{equation} This relation is often referred to as a statement of the “equal a priori probabilities in state space.” Another way of saying the same thing: The probability distribution for states in the microcanonical ensemble is uniform.
The link between statistical mechanics and classical thermodynamics is given by a definition of entropy: \begin{equation} \label {eq:entropy} S \equiv k_B\ln \Omega \end{equation} Note two important properties of \(S\). First, it is extensive: if we consider a compound system made of subsystems \(A\) and \(B\) with \(\Omega _A\) and \(\Omega _B\) as the respective number of states, the total number of states is \(\Omega _A\Omega _B\), and therefore \(S = S_A + S_B\). Second, it is consistent with the second law of thermodynamics: putting any constraint on the system lowers its entropy because the constraint lowers the number of accessible states.
Temperature is defined using entropy: \(1/T \equiv \left (\partial S/\partial E\right )_{N,V}\), or \begin{equation} \beta = \left (k_BT\right )^{-1} = \left (\partial \ln \Omega /\partial E\right ) \end{equation}
Now we will consider constraining our system not with constant \(E\), but with constant \(T\). The set of all possible states satisfying constraints of \(N\), \(V\), and \(T\) is called the canonical ensemble. Contrary to appearances based on their names, the canonical ensemble can be envisioned as a subsystem in a larger, microcanonical system. Consider such a system divided into a small subsystem \(A\), surrounded by a large “bath” \(B\). We imagine that these two subsystems are “weakly coupled,” meaning they exchange only thermal energy, but no particles, and their volumes remain fixed. We seek to compute the probability of finding the total system in a state such that subsystem \(A\) has energy \(E_A\). The entire system is microcanonical, so the total energy, \(E\) is constant, as is the total number of states available to the system, \(\Omega (E)\) (we omit the \(N\) and \(V\) for simplicity).
When \(A\) has energy \(E_A\), the total system energy is \(E = E_A + E_B\), where \(E_B\) is the energy of the bath. By constraining system \(A\)’s energy, we have reduced the number of states available to the whole system to \(\Omega (E-E_A)\). So, the probability of observing the system in a state in which subsystem \(A\) has energy \(E_A\) looks like \begin{equation} P_A = \frac {\left [\begin {array}{l} \mbox {Number of states for which}\\ \mbox {subsystem $A$ has energy $E_A$} \end {array}\right ]} {\left [\begin {array}{l} \mbox {Number of states available}\\ \mbox {to entire system} \end {array}\right ]} = \frac {\Omega \left (E - E_A\right )}{\Omega \left (E\right )} \end{equation} We can expand \(\Omega \left (E - E_A\right )\) in a Taylor series around \(E_A = 0\):
where the partial derivative implies we are holding \(N\) and \(V\) fixed. We can truncate the Taylor expansion at the first-order term, because higher order terms become less and less important as the size of subsystem \(B\) becomes larger and larger. What results is the Boltzmann distribution law for energies of a system at constant temperature: \begin{equation} P_A \propto \exp \left (-\beta E_A\right ) \end{equation} The normalization condition requires that for all energies of subsystem \(A\), \(E_A\), \begin{equation} \sum _A P_A = 1 = \sum _A \exp \left (-\beta E_A\right ) \equiv Q\left (N,V,T\right ), \end{equation} which defines the canonical partition function, \(Q\). Therefore, \begin{equation} P_A = Q^{-1}\exp \left (-\beta E_A\right ) \end{equation}
Because some energies can correspond to more than one microstate, we should distinguish between “states” and “energy levels.” We can express the canonical partition function as \begin{equation} \label {eq:canonical_1} Q = \sum _{\begin {array}{cc}\nu \\\mbox {states}\end {array}} e^{-\beta E_\nu } = \sum _{\begin {array}{cc}l\\\mbox {levels}\end {array}} \Omega \left (E_l\right ) e^{-\beta E_l} \end{equation} where, as we have seen, \(\Omega \left (E\right )\) is the number of microstates with energy \(E\). Moving to the continuum limit, and assuming a reference energy of \(E_{ref} = 0\), \begin{equation} \label {eq:dos} Q \rightarrow \int _0^\infty dE \overline \Omega \left (E\right )e^{-\beta E} \end{equation} where \(\overline \Omega \left (E\right )\) is the density of states. What is this equation telling us? It is telling us that \(Q\) is the Laplace transform of \(\overline \Omega \). We know that transform pairs are unique, and hence, both \(Q\) and \(\overline \Omega \) contain the same information.
We recognize that for a system described by a canonical ensemble, the energy is a fluctuating quantity. And we now have the probability of observing a state with a given energy, so we can use Eq. 1 to compute the average energy, \(\left \langle E\right \rangle \). Consider
Notice that \begin{equation} \sum _\nu E_\nu \exp \left (-\beta E_\nu \right ) = -\left (\partial Q \left / \partial \beta \right .\right ) \end{equation} Recalling that \(d\ln f(x) / dx = 1/f df/dx\), we see that
Now, let us consider the average magnitude of the fluctuations in energy in the canonical ensemble.
Now, noting that the definition of heat capacity at constant volume, \(C_v\), is \begin{equation} C_v = \left (\frac {\partial E}{\partial T}\right ) \end{equation} we see that \begin{equation} \left \langle \left (\delta E\right )^2\right \rangle = k_B T^2 C_v \end{equation} This is an interesting statement. It relates the magnitude of spontaneous fluctuations in the total energy of a system to that system’s capacity to store or release energy due to changing its temperature.
The fact (Eq. ??) that the average energy in the canonical ensemble is related to a derivative of the log of the partition function implies that \(\ln Q\) is an important thermodynamic quantity. So, let’s go back to our undergraduate thermodynamics course(s) and recall the following statement of the 1st and 2nd Law: \begin{equation} dA = -SdT - pdV + \mu dN \end{equation} where \(A\) is the Helmholtz free energy, defined in terms of internal energy and entropy as \begin{equation} \label {eq:helmholtz} A = \left \langle E\right \rangle - TS \end{equation} Now, consider the following derivative of \(A\):
Therefore, \begin{equation} \label {eq:dAdBeta_E} \left (\frac {\partial \left (\beta A\right )}{\partial \beta }\right )_{N,V} = \left \langle E\right \rangle . \end{equation} Considering Eq. ??, we see that \begin{equation} \label {eq:C} \ln Q + C = -\beta A \end{equation} which does indeed suggest an important link between \(\ln Q\) and the important thermodynamic quantity, the Helmholtz free energy. But what is the constant \(C\)? To evaluate it, consider the “boundary condition” as \(T \rightarrow 0\): \begin{equation} Q = \sum _\nu e^{-\beta E_\nu } \stackrel {T\rightarrow 0}{\longrightarrow } e^{-\beta E_{\rm ground}}. \end{equation} Here, we have assumed that the degeneracy of the ground state, \(\Omega \left (E_{\rm ground}\right )\) is 1. This tells us that \begin{equation} \lim _{T\rightarrow 0} \ln Q = -\beta E_{\rm ground} \end{equation} Using this fact, and combining Eqs. 23 and 25, as \(T \rightarrow 0\), we see that \begin{equation} -\beta E_{\rm ground} + C = -\beta \underbrace {\left \langle E\right \rangle }_{E_{\rm ground}} - \underbrace {\frac {S}{k_B}}_{\rightarrow 0\ (\Omega = 1)}, \end{equation} Hence, \(C\) = 0. So, \begin{equation} \ln Q = -\beta A. \end{equation} The quantity \(-\beta ^{-1}\ln Q\) is the Helmholtz free energy, \(A\). This is denoted \(F\) in Frenkel & Smit [1].
Section 2.2 of Frenkel & Smit [1] discusses a derivation of the “quasi-classical” representation of the canonical partition function, \(Q_{\rm classical}\): \begin{equation} \label {eq:q_classical} Q_{\rm classical} = \frac {1}{h^{dN}N!}\int \int d{\bf r}^N d{\bf p}^N \exp \left [-\beta \mathscr {H}\left ({\bf r}^N,{\bf p}^N\right )\right ] \end{equation} \(\mathscr {H}\left ({\bf r}^N,{\bf p}^N\right )\) is the Hamiltonian function which computes the energy of a point in phase space. The derivation of Eq. 30 is not repeated here. What is important is that the probability of a point in phase space is represented as \begin{equation} P\left ({\bf r}^N,{\bf p}^N\right ) = \left (Q_{\rm classical}\right )^{-1} \exp \left [-\beta \mathscr {H}\left ({\bf r}^N,{\bf p}^N\right )\right ]. \end{equation} So, the general “sum-over-states’ ensemble average of quantum statistical mechanics, first presented in Eq. 1, becomes an integral over phase space in classical statistical mechanics: \begin{equation} \label {eq:classical_ensemble_average} \left \langle G\right \rangle = \frac {\int \int d{\bf r}^N d{\bf p}^N \exp \left [-\beta \mathscr {H}\left ({\bf r}^N,{\bf p}^N\right )\right ] G\left ({\bf r}^N,{\bf p}^N\right )}{\int \int d{\bf r}^N d{\bf p}^N \exp \left [-\beta \mathscr {H}\left ({\bf r}^N,{\bf p}^N\right )\right ]}, \end{equation} where \(G\left ({\bf r}^N,{\bf p}^N\right )\) is the value of the observable \(G\) at phase space point \(\left ({\bf r}^N,{\bf p}^N\right )\). Before moving on, it is useful to recognize that we normall simplify this ensemble average by noting that, for a system of classical particles, the usual choice for the Hamiltonian has the form \begin{equation} \label {eq:classical_hamiltonian} \mathscr {H}\left ({\bf r}^N, {\bf p}^N\right ) = \mathscr {K}\left ({\bf p}^N\right ) + \mathscr {U}\left ({\bf r}^N\right ) \end{equation} where \(\mathscr {K}\) is the kinetic energy, which is only a function of momenta, and \(\mathscr {U}\) is the potential energy, which is only a function of position. The canonical partition function, \(Q\), can in this case be factorized:
The quantity in the left-hand braces is the ideal gas partition function, because it corresponds to the case when the potential \(\mathscr {U}\) is 0. (Note that we have multiplied and divided by \(V^N\); this is the equivalent of scaling the positions in the integration over positions.) The quantity in the right-hand braces is called the configurational partition function, \(Z\).
Because the kinetic energy \(\mathscr {K}\) has the simple form, \begin{equation} \mathscr {K}\left ({\bf p}^N\right ) = \sum _i \frac {{\bf p}_i^2}{2m_i}, \end{equation} where \(m_i\) is the mass of particle \(i\), the integral over particle momenta can be evaluated analytically:
(We have assumed all particles have the same mass, \(m\); in the case of distinct masses, this is just a product of similar factors.)
\(Q_{ideal}\) becomes
where \(\Lambda \) is the de Broglie wavelength.
So, when the observable \(G\) is a function of positions only, the ensemble average becomes a configurational average: \begin{equation} \label {eq:config_avg} \left \langle G\right \rangle = Z^{-1}\int d{\bf r}^N \exp \left [-\beta \mathscr {U}\left ({\bf r}^N\right )\right ] G\left ({\bf r}^N\right ). \end{equation} Note that the integation over momentum yields a factor \(Q_{ideal}\) in both the numerator and denominator, and thus divides out. We can write this configurational average using a probability distribution, \(\rho _{NVT}\), as \begin{equation} \left \langle G\right \rangle \int d{\bf r}^N G\left ({\bf r}^N\right )\rho _{NVT}\left ({\bf r}^N\right ) \end{equation} where \begin{equation} \rho _{NVT}\left ({\bf r}^N\right ) \equiv Z^{-1}e^{-\beta U\left ({\bf r}^N\right )} \end{equation} is called the “canonical probability distribution.” As pointed out on p. 15 of Frenkel & Smit [1], Eq. 35 is “the starting point for virtually all classical simulations of many-body systems”; that is, it is the starting point for all simulations discussed in this course.