No one should have the impression that people who do molecular simulation only care about the Lennard-Jones fluid. It has been and continues to be an important test-bed for theories of the liquid state and phase-equilibria. Nevertheless, molecular simulation has been performed on a wide variety of materials.
As a single example, consider silicon. Perhaps the earliest attempt to use molecular simulation to study a realistic atomic-scale model of silicon was due to Stillinger and Weber [16]. I “cut my teeth” on the Stillinger-Weber potential coding up my first real research code in graduate school. I now make a version of that code available to the students in this course, available at mdswsi.c. This code computes in reduced units as well; \(\sigma \) = 0.20951 nm, \(\epsilon \) = 2.1678 eV, and \(m\) = 28.085 amu, which are appropriate for a system of pure silicon. One reduced unit of temperature, \(T [=] \epsilon /k_B\), corresponds to 25156.73798 K.
The main reason to introduce Stillinger-Weber silicon here is to give you an example of a three-body potential. Silicon forms 4-coordinated tetrahedral bonded structures. The object of the three-body component of the potential is to enforce the tetrahedral bond angle (109.47\(^\circ \)) among triplets of bonded atoms.
The total potential is expressed as two sums, one for unique pair interactions, and another for unique triplet interactions:
The two-body models the bonds: \begin{equation} v_2 \left (r\right ) = \left \{\begin {array}{ll} \epsilon A \left (Br^{-p}-r^{-q}\right )\exp \left [\left (r-a\right )^{-1}\right ], & \mbox {$r < a$}\\ 0 & \mbox {$r\ge a$} \end {array} \right . \end{equation} It is very much like a Lennard-Jones potential, only with different exponents and a “smooth cutoff” as the interatom separation distance, \(r\), approaches some cutoff, \(a\), given by the factor \(\exp \left [\left (r-a\right )^{-1}\right ]\).
The three body models the angles, and is the sum of functions of each of the three angles of a triplet, \(ijk\):
Here I have employed the shorthand notation \(\displaystyle h_{jik} \equiv h\left (r_{ij}, r_{ik}, \theta _{jik}\right )\). Note that, in the notation of this potential, \(\theta _{jik}\) is subtended at \({\bf r}_i\), and \(\cos \theta _{jik}^c = -\frac {1}{3}\):
One computes the angle-\(j\) term, \(h_{ijk}\), and the angle-\(k\) term, \(h_{ikj}\), by permuting the indices appropriately.
The parameters used in the original study by Stillinger and Weber are:
As in any MD simulation, one computes the force on any particle \(i\) from the negative gradient of the total potential:
The two-body term for the \(i\)-\(j\) interaction is only slightly more complicated than Lennard-Jones, due to the smooth cutoff. Here, assuming \(i\) and \(j\) are within interaction range (\(r_{ij} < a\)), we have for the force on \(i\) due to \(j\):
Note that it is still true that \({\bf f}_{ij} = -{\bf f}_{ji}\). Exercise: Show that the right-hand-side of Eq. ?? is well-behaved (that is, it does not diverge, and in fact vanishes) as \(r_{ij}\rightarrow a^-\).
It is comparatively much more tedious to evaluate the three-body gradients.
The triplet forces on the other members of the triplet, \({\bf f}_{j\leftarrow ik}\) and \({\bf f}_{k\leftarrow ij}\), are defined analogously.
Each of the partials in Eq. ?? is unique: \begin{equation} \label {eq:sw3jik} \begin {array}{lll} \mbox {\raisebox {-0.5cm}{\includegraphics [width=1cm]{figpng/hjik.png}}\ \ } \displaystyle {\frac {\partial h_{jik}}{\partial {\bf r}_i}} & \displaystyle = &\displaystyle - \gamma h_{jik}\left [ {{{\bf r}_{ij}}\over {r_{ij}}}{{1}\over {\left (r_{ij}-a\right )^2}} + \frac {{\bf r}_{ik}}{r_{ik}}\frac {1}{\left (r_{ik}-a\right )^2} \right ] \\ & \displaystyle + & \displaystyle 2\lambda \exp \left ( \frac {\gamma }{r_{ij}-a} + \frac {\gamma }{r_{ik}-a} \right ) \left (\cos \theta _{jik}-\cos \theta _{jik}^c\right ) \\ & \displaystyle \times & \displaystyle \left [ \frac {{\bf r}_{ij}}{r_{ij}}\frac {1}{r_{ik}} +\frac {{\bf r}_{ik}}{r_{ik}}\frac {1}{r_{ij}} -\left (\frac {{\bf r}_{ij}}{r_{ij}}\frac {1}{r_{ik}} +\frac {{\bf r}_{ik}}{r_{ik}}\frac {1}{r_{ij}}\right ) \cos \theta _{jik}\right ] \end {array} \end{equation} \begin{equation} \label {eq:sw3ijk} \begin {array}{lll} \mbox {\raisebox {-0.5cm}{\includegraphics [width=1cm]{figpng/hijk.png}}\ \ } \displaystyle \frac {\partial h_{ijk}}{\partial {\bf r}_i} & \displaystyle = & \displaystyle - \gamma _j h_{ijk}\left [ \frac {{\bf r}_{ij}}{r_{ij}}\frac {1}{\left (r_{ij}-a\right )^2} \right ] \\ & \displaystyle + & \displaystyle 2\lambda _j\exp \left ( \frac {\gamma _j}{r_{ij}-a} + \frac {\gamma _j}{r_{jk}-a} \right ) \left (\cos \theta _{ijk}-\cos \theta _{ijk}^c\right ) \\ & \displaystyle \times & \displaystyle \left [ \frac {{\bf r}_{jk}}{r_{jk}}\frac {1}{r_{ij}} +\frac {{\bf r}_{ij}}{r_{ij}}\frac {1}{r_{ij}} \cos \theta _{ijk}\right ] \end {array} \end{equation} \begin{equation} \label {eq:sw3ikj} \begin {array}{lll} \mbox {\raisebox {-0.5cm}{\includegraphics [width=1cm]{figpng/hikj.png}}\ \ } \displaystyle \frac {\partial h_{ikj}}{\partial {\bf r}_i} & \displaystyle = & - \displaystyle \gamma _k h_{ikj}\left [ \frac {{\bf r}_{ik}}{r_{ik}}\frac {1}{\left (r_{ik}-a\right )^2} \right ] \\ & \displaystyle + & \displaystyle 2\lambda _k\exp \left ( \frac {\gamma _k}{r_{ik}-a} + \frac {\gamma _k}{r_{jk}-a} \right ) \left (\cos \theta _{ikj}-\cos \theta _{ikj}^c\right ) \\ & \displaystyle \times & \displaystyle \left [ -\frac {{\bf r}_{jk}}{r_{jk}}\frac {1}{r_{ik}} +\frac {{\bf r}_{ik}}{r_{ik}}\frac {1}{r_{ik}} \cos \theta _{ikj}\right ] \end {array} \end{equation} Note that the right hand sides of each of Eqns. 152, 153, and 154 vanish when the appropriate \(r\)’s are greater than \(a\)’s, as in Eq. 212. Exercise: Show that \({\bf f}_{i\leftarrow jk} + {\bf f}_{j\leftarrow ik} + {\bf f}_{k\leftarrow ij} = 0\).
We can investigate what happens when we willfully ignore the three-body terms. Let us initialize atoms on a diamond-cubic lattice (the minimal energy lattice of silicon); a snapshot appears below.
|
| Snapshot of a 64-atom chuck of Si on a diamond-cubic lattice. |
Let us now run two MD simulations for 10,000 time steps; one with and and the other without three body forces, at reduced initial temperature \(T\) = 0.12 and reduced density \(\rho \) = 0.46 for a small system of 64 atoms. You can download small 64-atom system configuration here, already equilibrated at \(T\)=0.12 (pictured above). Note that you must compile the code mdswsi.c with the flag -DTHREEBODY to include three-body forces. You can create an executable version of mdswsi that ignores three-body forces by compiling without the flag -DTHREEBODY. For example:
% gcc -o mdswsi_no3 mdswsi.c -lm -lgsl % gcc -o mdswsi mdswsi.c -DTHREEBODY -lm -lgsl
Note also that this code uses the Andersen thermostat by default, with a default value of \(\nu \) = 0.1. This can be overridden (or turned off) by specifying a value of \(\nu \) on the command line. For example,
% mdswsi -nu 0 -ns 10001 -icf init.xyz
turns off the thermostat, which is what I have done for this little demonstration.
Below we plot the instantaneous temperature vs. time-step (0.001 \(\tau \)) for the two runs. Notice that the system with the 3-body forces intact remains steady at \(T \approx \) 0.12, while the system without 3-body forces simply “melts,” with \(T\) approaching 20,000 K A quick look at a configuration (not shown) reveals that there is no longer any crystalline order; the system is now an amorphous blob of silicon atoms. The lesson of this little demonstration is that one must have three-body forces to stabilize a diamond-cubic lattice.
|
| Instantaneous temperature \(T\) vs time-step in MD simulations of
silicon using the Stillinger-Weber potential with and without
three-body forces. The system is 64 atoms at an initial
temperature of 0.12. |
As a suggested project, you can try one or more of several things with this code.
In this section, we will consider the technique of transition path sampling Monte Carlo simulation for calculating rate constants [17, 18]. This technique is one of many designed to overcome time-scale limitations in gathering statistics on processes driven by rare events, or thermally-activated transitions. When performed correctly, transition path sampling provides a means to gain a statistically significant understanding of reaction mechanisms. One of the most significant example uses of this technique is a recent prediction of rate constants for folding-like transitions in simulations of atomically specific protein fragments in explicit solvent [19].
The code used in this section was downloaded from Berend Smit’s Molecular Simulation course webpage.
Consider a unimolecular isomerization reaction: \begin{equation} A~^{\stackrel {\textstyle k_{AB}}{\textstyle \rightharpoonup }}_{\stackrel {\textstyle \leftharpoondown }{\textstyle k_{BA}}}~B \end{equation} \(k_{AB}\) is the forward rate constant for the reaction, and \(k_{BA}\) is the backward rate constant. They are required to express the concentration balances for species \(A\) and \(B\) in the traditional way:
At equilibrium, the time derivatives vanish, and \(C_A \rightarrow \left \langle C_A\right \rangle \), and \begin{equation} K = \frac {\left \langle C_A\right \rangle }{\left \langle C_B\right \rangle } = \frac {k_{BA}}{k_{AB}} \end{equation} We now ask, how does an initial perturbation in the concentration of \(A\), written \(\Delta A\), decay with time, assuming \(C_A + C_B = C\) = const.?
We can easily solve this to yield: \begin{equation} \label {eq:tps_macro1} \Delta C_A\left (t\right ) = \Delta C_A\left (0\right )\exp \left [-\left (k_{AB}+k_{BA}\right )t\right ] \equiv \Delta C_A\left (0\right )e^{-t/\tau _R} \end{equation} which defines the “decay time”:
(The last equality assumes the concentrations are normalized such that \(C_A + C_B = 1\); that is, we can consider \(C_A\) as the probability of observing “state” \(A\).)
Now we make a connection to the microscopic. Imagine that the two states labelled \(A\) and \(B\) are separated from each other along some general reaction coordinate \(q\) which has a large free energy barrier at position \(q^*\):
|
This is a free energy barrier because the system presumably has many more degrees of freedom other than (which may or may not contribute to) \(q\). Now, we enter the realm of linear response theory (See Appendix C in F&S), and ask, what is the behavior or the system with a finitely small, static perturbation toward state \(A\)? We express this as a perturbation Hamiltonian: \begin{equation} \mathscr {H} = \mathscr {H}_0 -\epsilon g_A\left (q-q^*\right ) \end{equation} where \(\mathscr {H}_0\) is the reference Hamiltonian of the unperturbed system. Here, \(g_A\) is an indicator function which is 1 if we are in state \(A\) and 0 otherwise: \begin{equation} g_A\left (x\right ) = \left \{\begin {array}{ll} 1 & \mbox {$x<0$}\\ 0 & \mbox {$x>0$} \end {array}\right . \end{equation} Notice that the peturbation lowers the energy by a little bit (\(\epsilon \)) when the system chooses to be in state \(A\). Here, we think of \(\epsilon \) as a switch we can flip in order to perturb the system. We measure the static pertubation as the difference between the average concentration of \(A\) in the perturbed state to that in the unperturbed state: \begin{equation} \Delta C_A = \left \langle C_A\right \rangle _\epsilon - \left \langle C_A\right \rangle _0 = \left \langle g_A\right \rangle _\epsilon - \left \langle g_A\right \rangle _0 \end{equation} Notice that because of our choice in magnitude for \(g_A\), we can consider its ensemble average, \(\langle g_A\rangle \), to be the probability of observing the system in state A, which is, due to our normalization of concentration, equivalent to the concentration of A.
Our first step is to find the linear response of \(\Delta C_A\) to \(\epsilon \), defined as \begin{equation} \left (\frac {\partial \Delta C_A}{\partial \epsilon }\right )_{\epsilon =0} = \left (\frac {\partial \left \langle g_A\right \rangle }{\partial \epsilon }\right )_{\epsilon =0} \end{equation} So let us first compute \(\langle g_A\rangle _\epsilon \) in the canonical ensemble: \begin{equation} \left \langle g_A\right \rangle _\epsilon = \frac { \int d{\bf q}^Nd{\bf p}^N \exp \left [-\beta \mathscr {H}_0\left ({\bf q},{\bf p}\right )+\beta \epsilon g_A\left (q-q^*\right )\right ]g_A\left (q-q^*\right )}{ \underbrace {\int d{\bf q}^Nd{\bf p}^N \exp \left [-\beta \mathscr {H}_0\left ({\bf q},{\bf p}\right )+\beta \epsilon g_A\left (q-q^*\right )\right ]}_{Z_\epsilon }} \end{equation} Here, we have defined \(Z_\epsilon \) as the unnormalized canonical partition function, merely to make the following gymnastics a little more transparent. Differentiating with respect to \(\epsilon \): \begin{equation} \frac {\partial \left \langle g_A\right \rangle _\epsilon }{\partial \epsilon } = \frac {1}{Z_\epsilon }\int d{\bf q}^Nd{\bf p}^N \exp \left (-\beta \mathscr {H}_0+\beta \epsilon g_A\right )\beta g_A^2 - \frac {1}{Z_\epsilon ^2}\left [\int d{\bf q}^Nd{\bf p}^N \exp \left (-\beta \mathscr {H}_0+\beta \epsilon g_A\right )g_A\right ]\left [\int d{\bf q}^Nd{\bf p}^N \exp \left (-\beta \mathscr {H}_0+\beta \epsilon g_A\right )\beta g_A\right ], \end{equation} which simplifies to \begin{equation} \frac {\partial \left \langle g_A\right \rangle _\epsilon }{\partial \epsilon } = \beta \left \langle g_A^2\right \rangle _\epsilon - \beta \left \langle g_A\right \rangle ^2_\epsilon . \end{equation} At \(\epsilon =0\) this gives the remarkable result \begin{equation} \left (\frac {\partial \Delta C_A}{\partial \epsilon }\right )_{\epsilon =0} = \beta \left (\left \langle g_A^2\right \rangle _0 - \left \langle g_A\right \rangle _0^2\right ). \end{equation}
Now, we haven’t said anything in detail about the structure of \(g_A\), but in fact, to perform this analysis, we require that it be continuous through the origin. But, we can make it vary from 1 to 0 across as narrow a region of \(q\) around the origin as we like. Then it is true that, at all values of \(q\) except perhaps right at the barrier position \(q^*\), \begin{equation} g_A^2 = g_A \ \ \ \mbox {so\ \ \ } \left \langle g_A^2\right \rangle = \left \langle g_A\right \rangle \end{equation} This implies \begin{equation} \label {eq:tps_2} \left (\frac {\partial \Delta C_A}{\partial \epsilon }\right )_{\epsilon =0} = \beta \left [\left \langle g_A\right \rangle _0\left (1 - \left \langle g_A\right \rangle _0\right )\right ] = \beta \left \langle C_A\right \rangle \left \langle C_B\right \rangle \end{equation}
The next step is to introduce a clock that begins ticking the moment we set \(\epsilon = 0\). As long as \(\epsilon \) was small, we can compute the decay of the static perturbation \(\Delta C_A\left (t\right )\) to first order in \(\epsilon \) as2 \begin{equation} \label {eq:tps:lrt1} \Delta C_A\left (t\right ) = \beta \epsilon \frac {\int d{\bf q}^Nd{\bf p}^Ne^{-\beta \mathscr {H}_0}\left [g_A\left (0\right )-\left \langle g_A\right \rangle \right ]e^{iL_0t}\left [g_A\left (0\right )-\left \langle g_A\right \rangle \right ]}{\int d{\bf q}^Nd{\bf p}^Ne^{-\beta \mathscr {H}_0}} \end{equation} \(iL_0\) is the Liouville operator, which we first encountered in the Liouville operator formalism for deriving MD integrators (Sec. 4.1.2), and \(e^{iL_0t}\) is the classical propagator: \begin{equation} e^{iL_0t}f\left (0\right ) = f\left (t\right ) \end{equation} This yields: \begin{equation} \label {eq:tps_2.1} \Delta C_A\left (t\right ) = \beta \epsilon \left \langle \Delta g_A\left (0\right )\Delta g_A\left (t\right )\right \rangle ,\ \ \ \ \Delta g_A \equiv g_A - \left \langle g_A\right \rangle \end{equation}
Now, Eq. 167, when integrated implies \begin{equation} \Delta C_A\left (0\right ) = \beta \epsilon \left \langle C_A\right \rangle \left \langle C_B\right \rangle \end{equation} which we can use to eliminate \(\epsilon \) in Eq. 171, yielding \begin{equation} \label {eq:tps_3} \Delta C_A\left (t\right ) = \Delta C_A\left (0\right ) \frac {\left \langle \Delta g_A\left (0\right )\Delta g_A\left (t\right )\right \rangle }{\left \langle C_A\right \rangle \left \langle C_B\right \rangle } \end{equation}
Now, recalling Eq. 157, we can make a connection to the macroscopic: \begin{equation} \label {eq:tps_4} e^{-t/\tau _R} = \frac {\left \langle \Delta g_A\left (0\right )\Delta g_A\left (t\right )\right \rangle }{\left \langle C_A\right \rangle \left \langle C_B\right \rangle } \end{equation}
This result states that the decay behavior is governed by the autocorrelation function of concentration fluctuations. Eq. 174 is itself a remarkable result of modern nonequilibrium statistical mechanics, attributable to Onsager.
We have implicitly assumed that the system spends all of its time in either \(A\) or \(B\); the system spends virtually no time at the barrier crossing itself. Eq. 174 is therefore valid as long as the decay time, \(\tau _R\), is much greater than the “barrier crossing time”; usually, this is true (we assume).
Now, we differentiate Eq. 174 with respect to time: \begin{equation} -\frac {1}{\tau _R}e^{-t/\tau _R} = \frac {\left \langle g_A\left (0\right )\dot {g}_A\left (t\right )\right \rangle }{\left \langle C_A\right \rangle \left \langle C_B\right \rangle } \end{equation} where \(\dot {g_A} \equiv d g_A / dt\). Note that the \(\Delta \)’s are gone thanks to the fact that \(d\left \langle g_A\right \rangle /dt = 0\).
For time much less than \(\tau _R\), but still much greater than the barrier crossing time, \((t \ll \tau _R)\),3 \begin{equation} \label {eq:tps_5} -\left (\tau _R\right )^{-1} = \frac {\left \langle g_A\left (0\right )\dot {g}_A\left (t\right )\right \rangle }{\left \langle C_A\right \rangle \left \langle C_B\right \rangle } = -\frac {\left \langle \dot {g}_A\left (0\right )g_A\left (t\right )\right \rangle }{\left \langle C_A\right \rangle \left \langle C_B\right \rangle } \end{equation}
Recalling from way back from Eq. ?? how \(\tau _R\) and the rate constant \(k_{AB}\) are related, we find finally that \begin{equation} k_{AB}\left (t\right ) = \frac {\left \langle \dot {g}_A\left (0\right )g_A\left (t\right )\right \rangle }{\left \langle C_A\right \rangle } \end{equation}
So: A rate constant is the time-derivative of a concentration autocorrelation function.
The idea of path sampling Monte Carlo is that, because \(k_{AB}\) is an ensemble average, we can compute it using MC, if we determine first what ensemble we need to sample.
In this section, we briefly recapitulate the presentation of the transition path sampling method given in Ref. [18]. Accordingly, we will switch notation a bit. We will call the indicator function \(h_A\), and call a state point at time \(t\) \(x_t\): \begin{equation} h_{A,B}\left (x_t\right ) = \left \{\begin {array}{ll} 1 & \mbox {if $x_t\in A,B$}\\ 0 & \mbox {if $x_t \notin A,B$} \end {array}\right . \end{equation} We consider the correlation function which measures the likelihood of finding the system in state \(B\) at time \(t\) provided that it was in state \(A\) at time 0. Now, for extremely long times, \(C\) approaches the probability to find the system in state B at equilibrium, regardless of the system starting point (i.e., ergodicity is realized). These must correspond to times much longer that the reaction time \(\tau _R\). For times approaching \(\tau _R\), \(C\) approaches \(\left \langle h_B\right \rangle \) exponentially: \begin{equation} C\left (t\right ) \approx \left \langle h_B\right \rangle \left (1-\exp ^{-t/\tau _R}\right ) \end{equation} Recall that \(\tau _R = \left (k_{AB}+k_{BA}\right )^{-1}\). When this time is greater than the short-time-scale molecular relaxation time, \(\tau _{\rm mol}\), \(C(t)\) is a linear function of time: \begin{equation} C\left (t\right ) \approx k_{AB}t \end{equation} The reactive flux, \(dC/dt\) displays a time-independent plateau in this regime which is equal to \(k_{AB}\).
One should realize that \(C(t)\) can be computed from a single molecular dynamics simulation, in principle. However, if the system dynamics is subject to rare event transitions, it may not be possible in practice to simulate long enough to achieve a statistically relevant value of \(C\). Transition path sampling is meant to overcome this limitation.
Let’s consider writing \(C(t)\) as an explicit ensemble average over the equilibrium phase space probability distribution, \(\rho \left (x_0\right )\): \begin{equation} \label {eq:tps_6} C\left (t\right ) = \frac {\int dx_0 \rho \left (x_0\right )h_A\left (x_0\right )h_B\left (x_t\right )}{\int dx_0 \rho \left (x_0\right )h_A\left (x_0\right )} \end{equation} Now, an insight of Dellago and Chandler is that, because both the numerator and denominator of Eq. 181 are partition functions, the log of \(C\) can be interpreted as a free energy difference between two systems: \begin{equation} \Delta F = -\ln C\left (t\right ) \end{equation} So, the log of \(C\) is the free energy price one must pay to constrain the endpoint of a dynamical path of length \(t\) which starts at time 0 in region \(A\) inside state \(B\). This means that we can use (in principle) any free energy method to compute \(C(t)\). Dellago and Chandler chose umbrella sampling.
To see why it is advantageous to use umbrella sampling, we must first imagine an order parameter \(\lambda \left (x\right )\) which indicates when we are in region \(B\) in the following manner: \begin{equation} x \in B\ \ \mbox {if}\ \ \lambda _{\rm min} \le \lambda \left (x\right ) \le \lambda _{\rm max} \end{equation} We now ask, how probable is it to find the system with a particular value of the order parameter, \(\lambda \), at time \(t\)? We can express this probability distribution as an ensemble average by visiting each phase space point at time 0, \(x_0\), and asking does this point initiate a dynamical trajectory that lands at order parameter \(\lambda \) at time \(t\)? \begin{equation} P\left (\lambda ,t\right ) = \frac {\int dx_0 \rho \left (x_0\right )h_A\left (x_0\right )\delta \left [\lambda -\lambda \left (x_t\right )\right ]}{\int dx_0\rho \left (x_0\right )h_A\left (x_0\right )} \end{equation} Here, \(\delta \left (x\right )\) is the Dirac delta function. Because region \(B\) corresponds to an interval of \(\lambda \), \(C\left (t\right )\) is an integral of \(P\left (\lambda ,t\right )\): \begin{equation} \label {eq:tps_cint} C\left (t\right ) = \int _{\lambda _{\rm min}}^{\lambda _{\rm max}}d\lambda P\left (\lambda ,t\right ) \end{equation} Because transitions from \(A\) to \(B\) are rare, \(P\left (\lambda ,t\right )\) in region \(B\) is small for relevant values of time, and we can’t compute it directly. So, we divide phase space into \(N+1\) neighboring overlapping regions \(B[i]\): \begin{equation} \mbox {(whole phase space)} = \bigcup _{i=0}^{N}B[i] \end{equation} Each region is defined by \begin{equation} x \in B[i] \ \ \ \mbox {if}\ \ \ \lambda _{\rm min}[i] \le \lambda \left (x\right ) \le \lambda _{\rm max}[i] \end{equation} Neighboring regions must overlap “a little”; i.e., \(\lambda _{\rm min}[i] < \lambda _{\rm max}[i-1]\). The size of the overlap will be considered in a bit. Now, the distribution of \(\lambda \) in each window \(B[i]\) is \begin{equation}\label {eq:tps_8} P\left (\lambda ,t;i\right ) = \frac {\int dx_0 \rho \left (x_0\right )h_A\left (x_0\right )h_{B[i]}\left (x_t\right )\delta \left [\lambda -\lambda \left (x_t\right )\right ]}{\int dx_0\rho \left (x_0\right )h_A\left (x_0\right )h_{B[i]}\left (x_t\right )} \end{equation} Notice that \(h_{B[i]}\left (x_t\right )\) acts like a Boltzmann factor for an umbrella potential, \(w_i\): \begin{equation} w_i = \left \{\begin {array}{ll} \infty & \mbox {if $\lambda < \lambda _{\rm min}[i]$}\\ 0 & \mbox {if $\lambda _{\rm min}[i] < \lambda < \lambda _{\rm max}[i]$}\\ \infty & \mbox {if $\lambda > \lambda _{\rm max}[i]$} \end {array}\right . \end{equation} And we are computing this probability distribution in window \(i\) using a phase space distribution whose Hamiltonian is modified by \(w_i\); e.g., \(\rho _i\left (x\right ) \propto e^{-\beta \left (\mathscr {H}+w_i\right )}\). This means when we conduct a particular MC run, we sample only within one window \(B[i]\).
The key aspect of umbrella sampling is that, inside window \(B[i]\): \begin{equation} P\left (\lambda ,t\right ) \propto P\left (\lambda ,t;i\right )\ \ \ \mbox {for $\lambda _{\rm min}[i] < \lambda < \lambda _{\rm max}[i]$} \end{equation} This is because the denominator of \(P\left (\lambda ,t;i\right )\) counts only those paths that end at \(t\) in \(B[i]\), while the denominator of \(P\left (\lambda ,t\right )\) counts all paths that end in state \(B\).
This proportionality is important, because it means that one can compute \(P\left (\lambda ,t;i\right )\) for each window \(B[i]\) separately using MC simulations, and then match the resulting distributions (tabulated as histograms over \(\lambda \)) in the overlapping regions, and then renormalize the entire distribution to produce \(P\left (\lambda ,t\right )\). Each MC simulation performs the appropriate random walk focused in its window, and thus maximizes the statistical significance of the results obtained in each window.
Finally, when we have \(P\left (\lambda ,t\right )\), one must only integrate over the appropriate values of \(\lambda \) to obtain \(C(t)\).
The notion of a “path ensemble” comes from interpreting Eq. 188 as an average of the quantity \(\delta \left [\lambda -\lambda \left (x_t\right )\right ]\) over a distribution \(f_{AB[i]}\left (x_0,t\right ) = \rho \left (x_0\right )h_A\left (x_0\right )h_{B[i]}\left (x_t\right )\). \(f_{AB[i]}\left (x_0,t\right )\) is the distribution function of all initial states \(x_0\) whose trajectories lead exactly to state \(B[i]\) in time \(t\). \(P\left (\lambda ,t;i\right )\) is a weighted average over these “paths.” \(f_{AB[i]}\left (x_0,t\right )\) is therefore called a “path ensemble.” The average of any quantity \(A\left (x_{t^\prime }\right )\) in this ensemble, \begin{equation} \left \langle A\left (x_{t^\prime }\right )\right \rangle = \frac {\int dx_0 f_{AB[i]}\left (x_0,t\right ) A\left [x_{t^\prime }\left (x_0\right )\right ]}{\int dx_0 f_{AB[i]}\left (x_0,t\right )}, \end{equation} is called a “path average.”
|
| A schematic of the first transition path ensemble, \(f_{AB[i]}\left (x_0,t\right ) = \rho \left (x_0\right )h_A\left (x_0\right )h_{B[i]}\left (x_t\right )\). |
Let’s take stock: We know that \(k = dC/dt\). In order for \(k\) to be a constant, we need to ensure that \(C(t)\) is linear in time, meaning we must evaluate \(C(t)\) for many values of \(t\). Each evaluation of \(C(t)\) is a “free energy” calculation, so getting at \(k\) this way may be prohibitively expensive. Another insight of Dellago and Chandler [18] was to recognize that a simple factorization of \(C(t)\) leads to an algorithm in which one need only do a single free energy calculation. Consider: \begin{equation} C\left (t\right ) = \frac {\left \langle h_Ah_B\left (t\right )\right \rangle }{\left \langle h_Ah_B\left (t^\prime \right )\right \rangle } \times \frac {\left \langle h_Ah_B\left (t^\prime \right )\right \rangle }{h_A} = \frac {\left \langle h_Ah_B\left (t\right )\right \rangle }{\left \langle h_Ah_B\left (t^\prime \right )\right \rangle } \times C\left (t^\prime \right ), \end{equation} where both \(t\) and \(t^\prime \) are in an interval denoted \([0,T]\). For notational convenience: \(h_A\equiv h_A\left (x_0\right )\) and \(h_B(t)\equiv h_B\left (x_t\right )\).
Next, we define a new indicator function as a property of the interval \([0,T]\): \begin{equation} H_B\left (x_0,T\right ) \equiv \max _{0\le t\le T} h_B\left (x_t\right ) \end{equation} which tells us if the trajectory begun at \(x_0\) visits state \(B\) at least once during the interval \([0,T]\). Since \(H_B = 0\) if \(h_B(t) = 0\) for all \(t \in \left [0,T\right ]\), and \(H_B = 1\) otherwise, we can insert it into our factorized expression for \(C(t)\): \begin{equation}\label {eq:tps_9} C\left (t\right ) = \frac {\left \langle h_Ah_B\left (t\right )H_B\left (T\right )\right \rangle }{\left \langle h_AH_B\left (T\right )\right \rangle }\times \frac {\left \langle h_AH_B\left (T\right )\right \rangle }{\left \langle h_Ah_B\left (t^\prime \right )H_B\left (T\right )\right \rangle } \times C\left (t^\prime \right ), \end{equation} (We’ve also multiplied and divided by \(\left \langle h_AH_B\left (T\right )\right \rangle \).)
If we stare at Eq. 194 long enough, we see that the quantity
is an average of \(h_B(t)\) over the distribution function \begin{equation} F_{AB}\left (x_0,T\right ) \equiv \rho \left (x_0\right )h_A\left (x_0\right )H_B\left (x_0,T\right ) \end{equation} This is the ensemble of all paths that begin in \(A\) and visit \(B\) at least once in the interval \([0,T]\). (It therefore differs from \(f_{AB}\left (x_0,T\right )\).)
|
| A schematic of the second transition path ensemble, \(F_{AB}\left (x_0,T\right ) \equiv \rho \left (x_0\right )h_A\left (x_0\right )H_B\left (x_0,T\right )\). |
Using this notation to denote averaging over this ensemble, \(\left \langle \cdots \right \rangle _{AB}\), \begin{equation}\label {eq:tps_10} C\left (t\right ) = \frac {\left \langle h_B\left (t\right )\right \rangle _{AB}}{\left \langle h_B\left (t^\prime \right )\right \rangle _{AB}}C\left (t^\prime \right ) \end{equation} We can efficiently calculate \(\left \langle h_B\left (t\right )\right \rangle _{AB}\) by sampling \(F_{AB}\left (x_0,T\right )\) in a single simulation. \(C(t^\prime )\) can be calculated from a single free-energy umbrella-sampling calculation. This provides a recipe for obtaining \(k\):
So, we see now that, using the factorization trick, we have two ensembles to average over: \[\begin {array}{rcl} f_{AB}\left (x_0,t\right ) & : & \text {distn. fcn. of all paths starting at } A \text { and ending in } B \text { at time } t \text {, used to compute } P\left (\lambda ,t\right )\\[4pt] F_{AB}\left (x_0,T\right ) & : & \text {distn. fcn. of all paths starting at } A \text { and visiting } B \text { at least once in the interval } [0,T] \text {, used to compute } \left \langle \dot {h}_B(t)\right \rangle _{AB} \text { and } \left \langle h_B(t^\prime )\right \rangle _{AB}. \end {array}\]
We can compute averages in these two ensembles by MC sampling. A new path is generated from an old path, and is accepted or rejected based on a detailed balance criterion. Now, we sample these two path ensembles in two different simulations, but the acceptance rules and trial moves are the same. Let’s generall call \(\mathscr {N}\left (i\right )\) the probability of path \(i\). We start with an old path “\(o\)” and attempt to generate a new path “\(n\)”. The acceptance rule obeying detailed balance is \begin{equation} \frac {{\rm acc}\left (o\rightarrow n\right )}{{\rm acc}\left (n\rightarrow o\right )} = \frac {\mathscr {N}\left (n\right )\alpha \left (n\rightarrow o\right )}{\mathscr {N}\left (o\right )\alpha \left (o\rightarrow n\right )} \end{equation}
\(\alpha \left (o\rightarrow n\right )\) is the a priori probabilit of attemptying to generate \(n\). The moves used in transition path sampling MC guarantee \(\alpha \left (o\rightarrow n\right ) = \alpha \left (n\rightarrow o\right )\). So,
So, what are the moves? There are two basic moves we can consider here.
It seems the real trick is generating an initial path of length \(T\) that successfully connects region \(A\) with region \(B\). This can be done with traditional MD by beginning a trajectory in state \(A\) and simply waiting long enough for it to cross into \(B\). This might be possible, but could easily be prohibitively expensive. A better technique is to guess a path by creating a configuration which you hypothesize to be a transition state, and then integrating forward and backward in time to generate a path. It is accepted as an initial path if the beginning lands in \(A\) and the end in \(B\).
To show the power of transition path sampling, it is perhaps best to consider a simple example of a two-state system where the states are separated by an energy barrier. This presentation is based on an exercise from the molecular simulation course taught by Berend Smit and Daan Frankel in 2001, and uses a code called tps.
Here, we consider dimer isomerization in a simple 2D liquid sample of \(N\) 15 particles confined to a circle of radius \(R\). Particles 1 and 2 form the dimer. The total potential energy is given by \begin{equation} \mathscr {U} = {\sum _{i<j}}^\prime U_{\rm WCA}\left (r_{ij}\right ) + \sum _i U_{\rm wall}\left (r_i\right ) + U_{12}\left (r_{12}\right ) \end{equation} The prime indicates that the 1-2 pair is excluded from this sum. All particles interact with each other via a modified Lennard-Jones pairwise interaction known as the Weeks-Chandler-Andersen (WCA) potential: \begin{equation} U_{\rm WCA}\left (r\right ) = \left \{\begin {array}{ll} 4\left (r^{-12}-r^{-6}\right ) + 1 & \mbox {$r < 2^{1/6}$}\\ 0 & \mbox {$r > 2^{1/6}$} \end {array}\right . \end{equation} The WCA potential is fully repulsive, and cutoff where the force vanishes. The particles interact with the circular wall via another WCA potential: \begin{equation} U_{\rm wall}\left ({\bf r}\right ) = U_{\rm WCA}\left (R+2^{1/6}-r\right ) \end{equation} where \(\bf r\) is the vector position and \(r\) is the radial position (i.e., the circle is centered on the origin). The dimer bond (between particles 1 and 2) is described by a two-well potential: \begin{equation} U_{12}\left (r_{12}\right ) = h\left [1-\frac {\left (r_{12}-w-2^{1/6}\right )^2}{w^2}\right ]^2 \end{equation} This potential has stable minima at \(r_{12} = 2^{1/6}\) and \(r_{12} = 2^{1/6}+2w\). Here, \(h\) defines the height of the potential energy barrier between two stable states as defined by this potential, and \(w\) defines the width of this barrier:
|
| The WCA and dimer potentials used in this case study. |
For now, we fix \(w\) at 0.25, and \(R\) at 3.0. This gives an areal particle number density \(\rho = 0.53~\sigma ^{-2}\).
What is a reasonable order parameter for this system? The bond length, naturally. All configurations for which \(r_{12} < 1.30\) are defined to be in region \(A\), and all configurations for which \(r_{12} > 1.45\) are defined be in region \(B\). It is important to note here that \(h\) is not a measure of the free energy barrier which determines the kinetic rate in this process. To determine the free energy barrier, we need to construct the free energy profile, \(F(r_{12}) = -k_BT\ln Q(r_{12})\). This type of restricted free energy is often termed a Landau free energy.
Let’s consider first a small potential barrier height, \(h = 2\). For this height, we can compute \(P(\lambda ,t\rightarrow \infty )\) very accurately from a single long MD simulation (10,000,000 steps; \(\Delta t\) = 0.001; total energy 15 \(\epsilon \)). We see from the linear region of \(C(t)\) that the rate constant appears to be \(k_{AB} = 4.04\times 10^{-2}~\tau ^{-1}\). Now, let us perform transition path sampling MC on this system.
|
|
\(C(t)\) computed from a long MD simulation for barrier height \(h\) = 2. The
total energy is set at 15 \(\epsilon \). |
First, let’s decide how long an interval \(T\) we need. Clearly, \(C(t)\) appears to be linear up to \(t = 2\); following Dellago, we’ll choose \(T\) = 2.0. 4 Now, we need to conduct two types of MC simulations:
A set of umbrella sampling MC simulations on the following intervals for \(r_{12}\):
| \(i\) | min | max |
| 1 | 0.00 | 1.22 |
| 2 | 1.20 | 1.26 |
| 3 | 1.24 | 1.30 |
| 4 | 1.28 | 1.45 |
| 5 | 1.40 | 2.45 |
When matching the histograms in each of these intervals and renormalizing, we obtain \(P\left (r_{12},t=2\right )\). When we then integrate from \(r_{12} = 1.45\) to \(r_{12} = 2.45\) (region \(B\)).
We’ll use the other default parameter values provided in tps. These include the maximum angle by which velocity vectors are rotated in a shooting move (\(\Delta \phi = 0.6\) rad), and the number of shifting moves per shooting move (95:5), and the number of \(r_{12}\)-slices per window (200).
We see that \(\left \langle \dot {h}_B(t)\right \rangle = 0.22\) and \(\left \langle h_B(2)\right \rangle = 0.55\).
|
|
\(\left \langle h_B(t)\right \rangle \) computed from a transition path sampling MC simulation for
barrier height \(h\) = 2. The total energy is set at 15 \(\epsilon \). |
Now, the histogram:
We see that \(\left \langle \dot {h}_B(t)\right \rangle = 0.22\) and \(\left \langle h_B(2)\right \rangle = 0.55\).
|
|
\(P(r_{12},2)\) computed from a transition path sampling MC simulation and
umbrella sampling, for barrier height \(h\) = 2. The total energy is set
at 15 \(\epsilon \). |
Integrating over region \(B\) yields \(C(2)\) = 0.09037. Performing the requisite operations yields \(k_{AB} = 0.0360~\tau ^{-1}\), which compares well to the MD-calculated value \(k_{AB} = 4.04\times 10^{-2}~\tau ^{-1}\).
Repeating this entire procedure for \(h\) = 6 yields \(k_{AB} = 5.05\times 10^{-4}~\tau ^{-1}\).
[to be completed; runs currently executing, June 25, 2026]
So far, we have considered interparticle interactions that are short-ranged by construction. Because the Lennard-Jones potential decays so strongly with distance (as \(r^{-6}\)), it is acceptable to cut off this interaction at moderate distances and, if desired, add a correction factor which is the result of integrating the potential over a uniform particle density out to \(r = \infty \). However, Coulomb interactions, common in molecular simulation, decay relatively much more slowly (as \(r^{-1}\)) and as a consequence, we cannot compute a correction factor; the integral diverges. There are several ways to handle long-ranged interactions, but the most popular is the Ewald summation [20], which we discuss here. This discussion is drawn primarily from F&S chapter 12, and chapter 3 of the Ph.D. thesis of Marcus Deserno [21]
First, assume we have a collection of charged particles in a cubic box with side length \(L\), with periodic boundary conditions. The collection is assumed neutral; there is an equal number of positive and negative charges. The total Coulombic energy in this system is given by:
\(\phi (r_i)\) is the electrostatic potential at position \(r_i\): \begin{equation} \phi \left (r_i\right ) = {\sum _{j=1}^{N}}^\prime {\sum _{{\bf n}\in \mathbb {Z}^3}} \frac {q_j}{\left |{\bf r}_{ij} + {\bf n}L\right |} \end{equation} \(\bf n\) is a three dimensional integer vector. The prime on the first summation indicates that we do not admit the term for which \(j=i\) if \({\bf n} = (0,0,0)\). That is, we allow each particle to interact with its periodic images, but not with itself.
To evaluate \(\mathscr {U}\) efficiently, we break it into two parts:
How can we do this? The idea of Ewald is to do two things: first, screen each point charge using a diffuse cloud of opposite charge around each point charge, and then compensate for these screening charges using a smoothly varying, periodic charge density. The screening charge is constructed to make the electrostatic potential due to a charge at position \(r_j\) decays rapidly to near zero at a prescribed distance. These interactions are treated in real space. The compensating charge density, which is the sum of all screening densities except with opposite charges, is treated using a Fourier series.
The standard choice for a screening potential is Gaussian: \begin{equation} \rho _{\rm s}(r) = -q_i(\alpha /\pi )^{3/2} e^{-\alpha r^2} \end{equation}
So for each charge, we add such a screening potential. Now, to evaluate \(\mathscr {U}_{\rm Coul}\), we have to evaluate the potential of a charge density that compensates for the screening charge densities at each particle. This is done in Fourier space.
The potential of a given charge distribution is given by Poisson’s equation: \begin{equation} \label {eq:ft_pois} -\nabla ^2\phi (r) = 4\pi \rho (r) \end{equation}
Now, the compensating charge distribution, denoted \(\rho _1\), can be written: \begin{equation} \rho _1(r) = \sum _{j=1}^{N}\sum _{{\bf n}\in \mathbb {Z}^3} q_j (\alpha /\pi )^{3/2} \exp \left [-\alpha \left |{\bf r} - \left ({\bf r}_j + {\bf n}L\right )\right |\right ] \end{equation} Notice that the sum over \(j\) includes the self-interaction when we include the potential due to this charge density in the calcuation of the total Coulombic energy.
Now, consider the Fourier transform of Poisson’s equation: \begin{equation} k^2 \tilde {\phi }(k) = 4\pi \tilde {\rho }(k) \end{equation} The Fourier transform of \(\rho _1(r)\) is given by
(The math required to show this involves noting that the the Fourier transform of a Gaussian is another Gaussian, and that the integral over all space of a normalized Gaussian is unity.) The \(\bf k\)-vectors are given by \begin{equation} {\bf k} = \frac {2\pi }{L} {\bf l} \ \ \ \ {\bf l} \in \mathbb {Z}^3 \end{equation}
We can use Eq. 207 to solve for \(\tilde {\phi }(k)\): \begin{equation} \label {eq:ft_phi1} \tilde {\phi }(k) = \frac {4\pi }{k^2} \sum _{j=1}^{N}q_j e^{-i{\bf k}\cdot {\bf r}_j}e^{-k^2/4\alpha } \end{equation} Note that this solution is not defined for \(k = 0\). In fact, we have to assume that \(\tilde \phi (0) = 0\), which is consistent with the notion that our system and all its periodic images is embedded in a medium of infinite dielectric constant (a perfect conductor; the “tinfoil” boundary condition).
Fourier inverting \(\tilde \phi (k)\) gives \begin{equation} \phi _i(r) = \frac {1}{V}\sum _{{\bf k}\ne 0}\tilde \phi (k)e^{i{\bf k}\cdot {\bf r}} \end{equation} which, when we substitute for \(\tilde {\phi }(k)\) from Eq. 211 yields \begin{equation} \phi _i(r) = \sum _{{\bf k}\ne 0}\sum _{j=1}^{N} \frac {4\pi q_j}{Vk^2} e^{i{\bf k}\cdot \left ({\bf r}-{\bf r}_j\right )}e^{-k^2/4\alpha } \end{equation}
So, the total Coulombic energy due to the compensating charge distribution is
where \begin{equation} \rho (k) = \sum _{i=1}^{N}q_ie^{i{\bf k}\cdot {\bf r}} \end{equation}
Notice that this does indeed include a spurious self-self interaction, because the point charge at \({\bf r}_i\) interacts with the compensating charge cloud also at \({\bf r}_i\). This self-interaction is the potential at the center of a Gaussian charge distribution. First, we solve Poission’s equation for the potential due to a Gaussian charge distribution (details in F&S): \begin{equation} -\frac {1}{r}\frac {\partial ^2r\phi _{\rm Gauss}}{\partial r^2} = 4\pi \rho _{\rm Gauss}(r) \end{equation} yielding \begin{equation} \phi _{\rm Gauss}(r) = \frac {q_i}{r}{\rm erf}(\sqrt {a}r) \end{equation} where \(\rm erf\) is the error function: \begin{equation} {\rm erf}(x) = \frac {2}{\sqrt {\pi }}\int _0^x e^{-r^2}dr \end{equation} At \(r=0\), we have \begin{equation} \phi _{\rm self} = \phi _{\rm Gauss}(0) = 2\left (\frac {\alpha }{\pi }\right )^{1/2}q_i \end{equation}
So the total self-interaction energy becomes \begin{equation} \mathscr {U}_{\rm self} = \left (\frac {\alpha }{\pi }\right )^{1/2}\sum _{i=1}^{N}q_i^2 \end{equation} which must be subtracted from the total Coulombic energy.
Finally, the real-space contribution of the point charge at \({\bf r}_i\) is the screened potential: \begin{equation} \phi _{\rm short}(r) = \frac {q_i}{r} - \frac {q_i}{r}{\rm erf}\left (\sqrt {a}r\right ) \equiv \frac {q_i}{r}{\rm erfc}\left (\sqrt {a}r\right ) \end{equation} where \(\rm erfc\) is the complementary error function. The total real-space Coulombic potential energy is therefore \begin{equation} \mathscr {U}_{\rm short} = \frac {1}{2}\sum _{i\ne j}^{N} \frac {q_i q_j}{r_{ij}} {\rm erfc}\left (\sqrt {a}r_{ij}\right ) \end{equation}
Putting it all together:
where \begin{equation} \rho (k) = \sum _{i=1}^{N}q_ie^{i{\bf k}\cdot {\bf r}}. \end{equation}
Now, the arbitrariness left to us at this point is in a choice for the parameter \(\alpha \). Clearly, very small alphas make the Gaussians tighter and therefore the compensating charge distribution less smoothly varying. This means a Fourier series representation of \(\mathscr {U}_1\) with a given number of terms is more accurate for larger \(\alpha \). We’ll evaluate choice of \(\alpha \) in Sec. 7.3.3.
Now, we can use Eq. ?? in a Monte Carlo simulation of a system of charges, provided that periodic boundary conditions are used and the domain is cubic. (Extensions to non-cubic boxes and slab geometries are discussed to a limited extent in F&S.) We can also use the Ewald technique to calculate forces for use in molecular dynamics simulations.
The force on particle \(i\) due to the charges in the system is given by \begin{equation} {\bf F}_i = -\frac {\partial }{\partial {\bf r}_i}\mathscr {U}_{\rm Coul} \end{equation}
For our purposes, the two contributions to \({\bf F}_i\) are due to the \(k\)-space energy and the short-ranged, real-space energy: \begin{equation} {\bf F}_i = {\bf F}_i^{(k)} + {\bf F}_i^{(r)} \end{equation} Notice that there is no change in \(\mathscr {U}_{\rm self}\) when \({\bf r}_i\) changes, so no forces arise from \(\mathscr {U}_{\rm self}\).
The \(k\)-space contribution is given by \begin{equation} {\bf F}_i^{(k)} = q_i\sum _j q_j \frac {1}{V} \sum _{{\bf k}\ne {\bf 0}} \frac {4\pi {\bf k}}{k^2}e^{-k^2/4\alpha }\sin \left ({\bf k}\cdot {\bf r}_{ij}\right ) \end{equation}
The real-space contribution is given by \begin{equation} {\bf F}_i^{(r)} = q_i\sum _j q_j \left [2\sqrt {\frac {\alpha }{\pi }}e^{-\alpha r_{ij}^2}+\frac {1}{r_{ij}}{\rm erfc}(\sqrt {\alpha } r_{ij})\right ]\frac {{\bf r}_{ij}}{r_{ij}^2} \end{equation}
We will consider an Ewald implementation which is a modified version of the ewald code written for Berend Smit’s Molecular Simulation course webpage. This code simply computes the Ewald energy for a cubic lattice, given an appropriate number of particles, and a value for \(\sqrt {\alpha }\) (which is called \(\alpha \) in the code), and a value for \(k_{\rm max}\), the maximum integer index for enumerating \(k\)-vectors. 5
The units used in a system with electrostatics differ depending on community. So far, we have assumed that the units of electrostatic potential are charge \(\mathscr {C}\), divided by length \(\mathscr {L}\), because we write potential as \(\phi = q/|{\bf r}|\), where \(q\) is measured in units of \(\mathscr {C}\) and distance in units of \(\mathscr {L}\). Energy is therefore written in units of \(\mathscr {C}^2\) over \(\mathscr {L}\), and force in units of \(\mathscr {C}^2\) over \(\mathscr {L}^2\). If we want the final energy in more familiar units, we can choose \(\mathscr {C}\) and \(\mathscr {L}\), and use the standard prefactor \(1/4\pi \epsilon _0\) to convert from “charge squared per length” to “energy”. For example, in SI units, \(\epsilon _0 = 8.85419\times 10^{-12}\) (C\(^2\)/m)/J. In this implementation, we use a length of \(\mathscr {L} = 1\) and \(\mathscr {C} = 1\) and measure energy such that \(1/4\pi \epsilon _0 = 1\).
We will examine two configurations, both with \(N\) = 8\(^3\) = 512 particles, with alternating + and - charges. One configuration has the particle on a cubic lattice with lattice spacing \(r_0 = 1\), which is the standard NaCl crystal structure. We will call this the “crystal” configuration. The other is like the crystal, only each particle is displaced by a random amount from its lattice position with a maximum displacement of 0.3. We will call this the “liquid” configuration. We compute the total electrostatic energy via the Ewald sum technique for various values of \(\sqrt {\alpha }\) and maximum \(k\)-vector index. As we increase the number of \(k\)-vectors taken in the sum, we would like to show that the total energy converges to a certain value. We will measure this in terms of the Madelung constant, \(M\): \begin{equation} \mathscr {U}_{\rm Coul} = -\frac {N q^2 M}{4\pi \epsilon _0r_0} \end{equation}
The following results were obtained for the crystal when \(\alpha = 0.8\):
alpha kmax u_real u_fourier u_self u_total M 1.20 2 -.206695 0.297020E-33 -0.67703 -0.88372 1.7674 1.20 3 -.206695 0.509667E-33 -0.67703 -0.88372 1.7674 1.20 4 -.206695 0.994016E-02 -0.67703 -0.87378 1.7476 1.20 5 -.206695 0.994016E-02 -0.67703 -0.87378 1.7476 1.20 6 -.206695 0.994016E-02 -0.67703 -0.87378 1.7476
This shows that, for the perfectly periodic crystal, very few \(k\)-vectors are needed to reach a converged energy.
For the liquid, the results are somewhat different, again for \(\alpha = 1.2 \):
alpha kmax u_real u_fourier u_self u_total M 1.20 2 -.244210 0.210535E-01 -0.67703 -0.90018 1.8004 1.20 3 -.244210 0.349887E-01 -0.67703 -0.88625 1.7725 1.20 4 -.244210 0.482931E-01 -0.67703 -0.87294 1.7459 1.20 5 -.244210 0.510164E-01 -0.67703 -0.87022 1.7404 1.20 6 -.244210 0.519170E-01 -0.67703 -0.86932 1.7386 1.20 7 -.244210 0.521388E-01 -0.67703 -0.86910 1.7382 1.20 8 -.244210 0.521741E-01 -0.67703 -0.86906 1.7381 1.20 9 -.244210 0.521802E-01 -0.67703 -0.86906 1.7381 1.20 10 -.244210 0.521808E-01 -0.67703 -0.86906 1.7381
Considering the converged results for various values of \(\alpha \), we see that \(M\) is not too sensitive to \(\alpha \), once \(\alpha > 0.6\):
|
| Madelung constant \(M\) vs. Ewald parameter \(\alpha \) for a system of 512ions. |
Note that, in principle, \(\mathscr {U}_{\rm Coul}\) doesn’t depend on \(\alpha \). However, errors due to too low a choice for \(\alpha \) indicate that the Fourier sum is poorly converged, because of the factor \(\exp [-k^2/4\alpha ]\).
One of the most interesting recent developments in molecular simulation are the so-called “density of states” methods, first implemented by Wang and Landau [22, 23]. In principle, these techniques provide a route to calculating the density of states of a system of interest, \(\Omega (E)\) (Eq. 18). 6 Determining \(\Omega \) gives a full understanding of the thermodynamics of a system, because the entropy is calculable directly as \begin{equation} S = k_BT\ln \Omega \end{equation} So, it is not undesirable at all to be able to compute \(\Omega \).
Now, we know that \(\Omega \) is buried in the probability distribution of energy that arises from standard NVT Monte Carlo simulation: \begin{equation} P(E) = \Omega (E)\exp \left (-\beta E\right )\left /\sum _{i}\Omega (E_i)\exp \left (-\beta E_i\right )\right . \end{equation} If we run a simple NVT MC simulation and populate a histogram of total energy, \(H(E)\), we can in principle compute \(\Omega (E)\) by first normalizing \(H(E)\) \(\rightarrow P(E)\) and multiplying each entry \(i\) by the factor \(\exp \left (+\beta E_i\right )\). The problem in practice is that states for which the Boltzmann factor (\(\exp \left (-\beta E_i\right )\)) is low are rarely if ever visited in reasonable time, and the statistical strength of \(H(E)\) in such regions of energy is therefore poor. Indeed, conventional MC is designed to not cover all of energy space, but to perform importance sampling of configurational space. Furthermore, if the Markov process is trapped in a local minimum on the potential energy hypersurface, barrier states with vanishingly small Boltzmann factors effectively prevent the escape of the process, preventing an adequate sampling of the potential energy hypersurface. For these reasons, it is desirable to perform a random walk in energy rather than configurational space.
In this section, we’ll review the two papers by Wang and Landau which introduced their technique by demonstrating how to compute \(\Omega \) for Ising and Potts systems by conducting random walks in energy space. These are lattice systems; more recent work has been focused on developing efficient continuous-space versions of the technique (E.g., [24, 25]).
The Wang-Landau algorithm is a method which converges upon \(\Omega (E)\) to within a given tolerance. First, we must understand what energy levels are accessible to our system; let us call this set \(\left \{E_i: i=1,\dots ,N_e\right \}\), where \(N_e\) is the number of levels. Let us imagine conducting a random walk in energy space such that the observed histogram of visited levels, \(H(E)\), is uniform, or “flat.” This random walk must be guided by some underlying probability distribution, \(\mathscr {N}\), which tells us whether to accept or reject a proposed step, \(o\rightarrow n\), in the walk. Generally, \(\mathscr {N}\) is incorporated in a Metropolis acceptance rule: \begin{equation} \label {eq:acc1} {\rm acc}(o\rightarrow n) = \min \left (1,\frac {\mathscr {N}(n)}{\mathscr {N}(o)}\right ) \end{equation} By conditionally accepting trials based on Eq. 230, we guarantee that \(\mathscr {N}\) is a static equilibrium distribution (assuming our trial moves obey detailed balance, or the balance condition is met in some other way). Now, if \(\mathscr {N}\) is the canonical distribution, the probability of observing a particular energy level, \(E_i\), is proportional to \(\exp \left (-E_i/k_BT\right )\).
Now consider that each energy level has \(\Omega (E_i)\) allowable microstates, each with the same probability, \(1/\Omega (E_i)\), by the fundamental hypothesis. The probability of observing the system in energy level \(E_i\) is the composite probability of observing it in one of the microstates with energy \(E_i\). If we want the probability of observing the system in energy level \(E_i\) to be uniform, then the probability of observing any particular microstate should be proportional to \(1/\Omega (E_i)\). So, the probability distribution we should use is \begin{equation} \mathscr {N} \propto 1/\Omega (E_i), \end{equation} and the acceptance rule becomes: \begin{equation} \label {eq:acc2} {\rm acc}(o\rightarrow n) = \min \left (1,\frac {\Omega (o)}{\Omega (n)}\right ) \end{equation}
Now, here is why we want to construct a flat histogram. If we can, then we know that have the “correct” \(\Omega \). But we don’t know \(\Omega \) to begin with. So now we need a way to begin with a guess for \(\Omega \) and then systematically refine it until we achieve a flat histogram in energy space. An efficient technique for successively refining a guess for \(\Omega \) from a sequence of Monte-Carlo simulations was the major contribution of Wang and Landau. [22, 23]
In this algorithm, we start with the initial guess \(\Omega (E) = 1\), and the histogram is initialized as \(H(E) = 0\). Then, we begin making trial moves and accepting or rejecting based on the acceptance rule in Eq. 232. After a move, we find ourselves in energy state \(E_i\), and we update both the histogram and the density of states:
Note that this is done regardless of whether we wind up in state \(E_i\) due to an accepted move or a rejected move. The density of states update factor, \(f\), begins at a relatively large value (\(f = e^1 = 2.718281828\)). The updates continue until the histogram is “sufficiently flat.” At this point, the histogram is reset to 0, and \(f\) is reduced according to \begin{equation} \label {eq:wl:f_update} f \rightarrow \sqrt {f} \end{equation} Then, the simulation continues again until the histogram is sufficiently flat. After several iterations, \(f\) gets closer and closer to 1, and changes to \(\Omega \) become smaller and smaller. The series of runs is terminated after \(\ln f\) is about 10\(^{-8}\).
By “sufficiently flat,” Wang and Landau suggest that the histogram is such that the bin with the minimum number of hits is within 95% of the mean number of hits per bin. This is a particularly strong criterion, and more recent work suggests that it is adequate to require merely that each bin has a minimum number of hits, regardless of the mean value [26].
The factor update relation (Eq. 233) is also somewhat arbitrary. What is required is that \(f\) approach unity in a scheduled manner. A more general rule might be \begin{equation} f \rightarrow \sqrt [n]{f}\ \ \ \ n \ge 2 \end{equation}
It is important to note that, because the acceptance rule is changing during the simulation, detailed balance is not strictly obeyed. However, it is very close to being obeyed when \(f\approx 1\). How does one choose a proper initial value and update scheme for \(f\)? There is unfortunately no general guidance on this in the current literature, other than the original realization of Wang and Landau that too large a value of \(f\) introduces large statistical fluctuations in \(\Omega \) which might become “frozen in,” while too small a value for \(f\) will require much too many visits to produce a flat histogram.
Also, the magnitude of \(\Omega \) is normally such that we can’t tabulate it directly; it is advantageous instead to tabulate its log. So the array used to store \(\Omega \) actually stores \(\ln \Omega \), and the update of a bin becomes: \begin{equation} \ln \Omega \rightarrow \ln \Omega + \ln f \end{equation} The acceptance rule does not change, of course; it is simply reformulated as \begin{equation} {\rm acc}(o\rightarrow n) = \min \left [1,\exp \left (\ln \Omega (o)-\ln \Omega (n)\right )\right ] \end{equation}
Finally, because the acceptance rule involves a ratio of probabilities, \(\Omega \) is not determined absolutely, but rather to within a constant factor. Unless we know the exact \(\Omega \) for some reference state (as we do for certain lattice models, such at the Ising magnet and the Potts glass), we cannot obtain absolute thermal quantities (entropy and free energies) from the \(\Omega \) obtained using this method. This is usually not a problem, either because we can “pin down” \(\Omega \) at a known reference, or because we don’t care about absolutes, but only differences.
An impressive example of the power of the WL method appears in the latter of the two original WL papers [23]. Here, the object of study is the Potts model [27]. This is an \(L\times L\) lattice of \(N\) sites (i.e., \(N = L^2\)) where each site can have a spin value, \(q_i\), between 1 and \(Q\). WL considered the ten-state model; \(Q\) = 10. The Hamiltonian of the Potts model is \begin{equation} \mathscr {H} = -\sum _{\left \langle ij\right \rangle }\delta \left (q_i,q_j\right ) \end{equation} where the sum is over all nearest neighbor pairs.
Under periodic boundary conditions, the ground state energy of the Potts model is given by \begin{equation} E_0 = -2L^2 \end{equation} and the density of states at \(E_0\) (i.e., the degeneracy of the ground state) is easily seen to be \begin{equation} \Omega (E_0) = Q. \end{equation} The \(N\) energy levels are \begin{equation} -E_0,-E_0+8,-E_0+12,\dots ,-4,0,4,\dots ,E_0-12,E_0-8,E_0 \end{equation} The ten-state Potts model displays a first-order phase transition with a critical temperature of \(T_c = 0.701232\) in the thermodynamic limit (\(N\rightarrow \infty \)). The two coexisting phases are characterized by an average energy per particle of -0.965 and -1.671, respectively.
First, let’s conduct normal NVT sampling of a ten-state, \(L\) = 12 Potts model at temperatures well below, near, and well-above the critical temperature. (The dual purpose code wl-w.c can accomplish this.) At precisely the critical temperature, we should see two peaks of equal height in the energy histogram. Our trial move will consist of randomly selecting one of the 144 spins and then randomly selecting a spin value between 1 and 10. Below, I show the histograms of energy per spin after 10\(^6\) cycles (one cycle = 144 attempted flips) for various temperatures, all begun from the same initial (randomly assigned) configuration which is initially in the high-energy regime (E/N \(\approx \) -0.25). (\(T\) = 0.70991 is the critical temperature, \(T_c\), for the \(L\) = 12, \(Q\) = 10 Potts model as calculated by Wang and Landau.)
|
|
Histograms of energy per spin, \(H(E/N)\), for the \(L\) = 12 ten-state Potts
model at temperatures 0.5, 0.7, 0.70991, and 1.0, 5.0, and 10.0,
after 10\(^6\) cycles using conventional Monte Carlo. |
Notice that precisely pinpointing \(T_c\) using a series of NVT MC simulations (if we assume that 10\(^6\) cycles is enough – I have not yet claimed that it is) would be extremely difficult. For \(T\) = 0.7, the peaks are not of equal height, but an increase of a mere 0.0991 brings them to the same height (the signature of a critical temperature). How would we know how to zoom in on \(T\) = 0.70991? It is conceivable that we could embed NVT MC inside some nonlinear optimization routine whose objective function is a measure of relative peak heights, which must be minimized by allowing \(T\) to vary. This could be automated and could in principle provide a very accurate \(T_c\) after a finite number of runs. But we don’t know ahead of time how many runs would be required, nor if our optimization scheme is efficient enough to allow us to find \(T_c\) to some tolerable level of precision in a reasonable time.
Now, imagine that we could in some way compute the density of states, \(\Omega (E)\), from a single simulation. We could then easily arrive at an estimate for \(T_c\) by simply evaluating canonical energy histograms at various \(T\), each constructed from \(\Omega (E)\), until we find one for which the peak heights are the same. Consider: \begin{equation} \label {eq:canonical_histogram} H(E/N) \propto \Omega (E)\exp \left (-E/k_BT\right ). \end{equation}
Can we compute \(\Omega (E)\) from an NVT MC simulation? In principle, yes. We could rearrange Eq. 241: \begin{equation} \label {eq:anti_boltz} H(E/N)\exp \left (+\beta E\right ) \propto \Omega (E) \end{equation} This certainly suggests that if we run NVT MC, tabulate a histogram of energy states, and the postmultiply it by the “anti-Boltzmann” factor \(\exp \left (+\beta E\right )\), we will recover \(\Omega (E)\). But the data from the 10\(^6\)-cycle MC runs shown in the above figure kills any hope of being able to do this in practice. You can see that none of the histograms cover the entire domain of accessible energy levels, so we cannot use any single histogram to produce \(\Omega (E)\). You can also see that the very highest energy levels are not accessed even at extremely high temperatures. (Interestingly, a negative temperature resolves this part of the energy spectrum quite well; this indicates that the entropy of the Potts model for \(E > -0.25\) decreases with increasing energy.) The histogram taken at the lowest temperature appears to cover a broad part of the domain, but most of it is covered with very poor statistical accuracy, as evidenced by the large fluctuations. The histograms taken near the critical temperature of \(T\) = 0.70991 cover a relatively broad domain relatively well, so it is at least conceivable that we can produce part of \(\Omega (E)\) from these histograms.
The figure below shows the application of Eq. 242 to determine partial densities of states for each histogram shown in the previous figure.
|
|
Partial densities of state computed from \(\Omega (E) = H(E/N)exp(+E/k_BT)\) and scaled such that \(\Omega (E_0) = 10\)
for the \(L\) = 12 ten-state Potts model at temperatures 0.5, 0.7,
0.70991, 1.0, 5.0, 10.0, and -1.0 after 10\(^6\) cycles using conventional
Monte Carlo. \(\Omega (E)\) for \(T\) = 1.0 is scaled to match \(\Omega (E)\) for \(T\) = 0.70991,
and \(\Omega (E)\)’s at higher \(T\)’s (and \(T\) = 1.0) are matched to those at
neighboring lower \(T\). |
From this data, we can see the trace of the curve defining the true density of states. It appears to have a maximum at \(E/M\approx -0.25\) of a whopping 10\(^{145}\) states. We also see that the density of states decreases with energy for \(E/N > -0.25\); energy levels higher than this are apparently sampled well in an MC run with negative temperature. Now, in order to piece the true \(\Omega (E)\) together from these many runs, we took advantage of the fact that \(\Omega (E_0) = Q\), and scaled the densities to have them match in the overlapping regions. Notice that the two partial densities of states for \(T\) = 0.7 and 0.70991 agree (not surprising), but that the density of states for \(T\) = 0.5 appears much too large. All told, it appears that the total curve is adequately represented by the anti-Boltzmann-treated histograms from just three 10\(^6\)-cycle NVT MC runs: \(T \in \left \{0.7,1.0,-1.0\right \}\).
Now, I have to admit that I have cheated somewhat. I knew ahead of time that the density of states has a maximum, so I knew that negative temperature MC would resolve some part of it. I also knew that the critical temperature is 0.70991, but it was nice to see that number supported by standard MC. Let us now learn how to compute \(\Omega \) from a single MC run using the Wang-Landau technique.
Now we will use the WL algorithm to compute \(\Omega (E)\) for the \(L\) = 12 ten-state Potts model. (I used the code wl-w.c to obtain these results.) The relevant run parameters we have to specify are the exponent for the update rule of the factor \(f\), the initial and final values of \(f\), and the flatness criterion for the histogram. The data presented here are from a single run for which \(\ln f_{\rm initial}\) = 1, \(\ln f_{\rm final}\) = 10\(^{-8}\), and \(\ln f_{i+1} = \frac {1}{2}\ln f_{i}\); this prescribes a total of 26 independent values of \(f\) for which a random walk in energy space must be executed. A histogram is considered sufficiently flat if the minimum value if the histogram is greater than or equal to 80% of the mean value.
The figure below shows the resulting final \(\Omega (E)\), compared to the piecemeal \(\Omega (E)\) computed from three NVT MC simulations. We see that the two methods give essentially the same function.
|
|
Density of states, \(\Omega (E)\), for the \(L\) = 12 ten-state Potts model, computed
by the WL technique, and pieced together from NVT MC. |
The close-up shown in the inset reveals some significant differences at the lowest energy levels between the WL and NVT results. This is because the visits to the lowest energy levels were quite rare in the MC case, making the statistical accuracy of the energy histogram in this subdomain poor.
The next figure shows the cumulative cost (in number of cycles) of the WL method for this system, and the relative error in neighboring estimates of \(\Omega \), as functions of the number of updates to the factor \(f\). The relative error is defined as \begin{equation} \epsilon \equiv \int dE \left (\ln \Omega _{i}-\ln \Omega _{i-1}\right )^2 \end{equation}
|
|
Cumulative number of cycles and relative error vs. \(f\)-update
number in WL sampling of the \(L\) = 12 ten-state Potts model. |
We see from this data that it requires roughly \(10^6\) cycles to converge \(\Omega \) to a tolerance of about 10\(^{-4}\), and that an additional \(3\times 10^6\) cycles does not improve this convergence. So, WL for this system is somewhat less expensive than the \(3\times 10^6\) required for the canonical MC runs.
Of course, the canonical MC runs would have been much more expensive had I not known ahead of time what the critical temperature was. For the WL method, no such knowledge is required a priori.
It should be pointed out that \(L\) = 12 is a very small system. Wang and Landau broke major new ground in studying the thermodynamics of the Potts model by considering \(L\) up to 200.
Although WL does arrive at an accurate estimate of \(\Omega \), it is not dramatically more efficient than conventional NVT MC. One of the reasons is due to the nature of the random walk: “steps” in energy space are necessarily local. Changing one spin changes energy by at most \(\pm \)4, and on average the change in energy per flip is much lower in absolute value than that. So the system can spend a lot of time trying to get away from a large subdomain of energy space for which the histogram is already sufficiently flat, in order to explore neighboring subdomains. WL showed that this deficiency can be partially avoided by conducting several parallel runs in different energy windows, which overlap a little bit to allow matching of the resulting densities of states.
In this section, we report results of a small case study which investigates the required flatness criterion to achieve a converged density of states. We will consider the converged \(\Omega \) computed using the 80% criterion as the “exact” result, and compare this to \(\Omega \)’s computed using the criterion that every bin in the energy histogram has at least \(M\) hits. We considered \(M \in \left \{2,20,200,2000,20000,200000\right \}\). In order to minimize the number of cycles, we will sample the histogram (that is, query its flatness) once per cycle. We also used the same termination criterion for the update of \(f\). In the figure below, I show both the relative error between \(\Omega \) computed with the \(M\)-criterion and \(\Omega _{\rm exact}\), as well as the number of cycles required for the \(M\)-criterion computation.
|
|
Relative error (\(\Box \)) and number of cycles (\(\bigcirc \)) vs. minimum hit level \(M\)
for flatness in WL sampling of the \(L\) = 12 ten-state Potts model. |
Interestingly, we see that even out to \(>10^6\) cycles, our result has only marginally converged to the exact result; an error of \(\approx 0.1\) is larger than the iteration-to-iteration convergence tolerance of \(\approx 10^{-4}\) observed in the original run. Nevertheless, this level of error is very small; the curves are indistinguishable on a plot. Another interesting fact is that the \(M\) = 2 run is apparently more expensive than the \(M\) = 20 run. This is likely due to fact that the initial configuration for each run segment (each value of \(f\)) is different; it is the final configuration from the previous segment. It just so happens that the 23rd iteration for \(M\) = 2 begins in a configuration that initiates a walk that more slowly covers energy space than the configurations in the \(M\) = 20 runs. Beyond \(M\) = 20, the behavior is as expected: increasing \(M\) makes the simulation more expensive.