We saw that the Metropolis Monte Carlo simulation technique generates a sequence of states with appropriate probabilities for computing ensemble averages (Eq. 1). Generating states probabilitistically is not the only way to explore phase space. The idea behind the Molecular Dynamics (MD) technique is that we can observe our dynamical system explore phase space by solving all particle equations of motion. We treat the particles as classical objects that, at least at this stage of the course, obey Newtonian mechanics. Not only does this in principle provide us with a properly weighted sequence of states over which we can compute ensemble averages, it additionally gives us time-resolved information, something that Metropolis Monte Carlo cannot provide. The “ensemble averages” computed in traditional MD simulations are in practice time averages: \begin{equation} \label {eq:md:tmavg} \left <G\right > = \bar {G} = \frac {1}{N_{\rm samp}\Delta t}\sum _{i=1}^{N_{\rm samp}} G\left [r\left (t\right )\right ] \end{equation} The ergodic hypothesis partially requires that the measurement time, \(\tau _{\rm meas} = N_{\rm samp}\Delta t\), is greater than the longest relaxation time, \(\tau _r\), in the system. The price we pay for this extra information is that we must at least access if not store particle velocities in addition to positions, and we must compute interparticle forces in addition to potential energy. We will introduce and explore MD in this section.
The Newtonian equations of motion can be expressed as \begin{equation} m \ddot {\bf r}_i + \nabla _i\mathscr {U} = 0 \end{equation} where \(\ddot {\bf r}_i\) is the acceleration of particle \(i\), and the force acting on particle \(i\) is given by the negative gradient of the total potential, \(\mathscr {U}\), with respect to its position: \begin{equation} {\bf f}_i = -\nabla _i\mathscr {U} = -\frac {\partial \mathscr {U}}{\partial {\bf r}_i} \end{equation} Whereas in a typical MC simulation, in which all we really need is the ability to evaluate the potential energy of a configuration, in MD we actually need to evaluate all interparticle forces for a configuration.
We first encountered interparticle forces in Sec. 4.6 in a discussion of the virial in computing pressure in a standard Metropolis Monte Carlo simulation of the Lennard-Jones liquid. At this point, it suffices to consider a system with generic pairwise interactions, for which the total potential is given by: \begin{equation} \label {eq:md:tot} \mathscr {U} = \sum _j\sum _{k>j} u_{jk}\left (r_{jk}\right ) \end{equation} where \(r_{jk}\) is the scalar distance between particles \(j\) and \(k\), and \(u_{jk}\) is the pair potential specific to pair \((j,k)\). For a system of \(N\) identical particles, Eq. 87 is a summation of \(\frac {1}{2}N\left (N-1\right )\) terms. So, the force on any particular particle, \(i\), selects \(N\) terms from the above summation; that is, those terms involving particle \(i\): \begin{equation} \label {eq:md:frci} {\bf f}_i = - \sum _{j=1}^{N} \frac {\partial u_{ij}\left (r_{ij}\right )}{\partial {\bf r}_i} = \sum _{j=1}^{N} {\bf f}_{ij} \end{equation} where we can define the quantity \({\bf f}_{ij}\) is the force exerted on particle \(i\) by virtue of the fact that it interacts with particle \(j\). Because \(u_{ij}\) is a function of a scalar quantity, we can break the derivative up:
Eq. ?? illustrates that, because \({\bf r}_{ij} = -{\bf r}_{ji}\), \begin{equation} {\bf f}_{ij} = -{\bf f}_{ji} \end{equation} This leads us to the comforting result that \begin{equation} {\bf F} = \sum _i {\bf f}_i = 0 \end{equation} That is, the total force on the collection of particles is zero. (The same result holds identically for all potentials which are functions of relative interatomic positions only.) But the practical advantage of this result is that, when we visit the pair \((i,j)\) and compute the force on \(i\) due to its interaction with \(j\), \({\bf f}_{ij}\), we automatically have the force on \(j\) due to its interaction with \(i\), \(-{\bf f}_{ij}\). Some refer to this as “Newton’s Third Law.”
The other key aspect of a simple MD program is a means of numerical integration of the equations of motion of each particle. We first consider the “simple Verlet” algorithm, which is an explicit integration scheme. Let us consider a Taylor-expanded version of one coordinate of the position of a particular particle, \(r(t)\): \begin{equation} r\left (t+\Delta t\right ) = r\left (t\right ) + v\left (t\right )\Delta t +\frac {f\left (t\right )}{2m}\left (\Delta t\right )^2 + \frac {\left (\Delta t\right )^3}{3!}\stackrel {\dots }{r} + \mathscr {O}\left [\left (\Delta t\right )^4\right ], \end{equation} and, letting \(\Delta t \rightarrow -\Delta t\), \begin{equation} r\left (t-\Delta t\right ) = r\left (t\right ) - v\left (t\right )\Delta t + \frac {f\left (t\right )}{2m}\left (\Delta t\right )^2 - \frac {\left (\Delta t\right )^3}{3!}\stackrel {\dots }{r} + \mathscr {O}\left [\left (\Delta t\right )^4\right ]. \end{equation} When we add these together, we obtain: \begin{equation} \label {eq:verlet1} r\left (t+\Delta t\right ) \approx 2r\left (t\right ) - r\left (t-\Delta t\right ) + \frac {f\left (t\right )}{2m}\left (\Delta t\right )^2. \end{equation} Eq. 93 is termed the “Verlet” algorithm (going back to Verlet’s simulations of liquid argon [4]). Notice that, when one chooses a small \(\Delta t\), one can predict the position of a particle at time \(t + \Delta t\) given its position at time \(t\) and the force acting on it at time \(t\). We see that the new position coordinate has an error of order \(\left (\Delta t\right )^4\). \(\Delta t\) is called the “time step” in a molecular dynamics simulation.
A system obeying Newtonian mechanics conserves total energy. For a dynamical system (i.e., a system of interacting particles) obeying Newtonian mechanics, the configurations generated by integration are members of the microcanonical ensemble; that is, the ensemble of configurations for which \(NVE\) is constant, constrained to a subvolume \(\Omega \) in phase space. The “natural” ensemble for Metropolis Monte Carlo, you will recall, is canonical; for MD, it is microcanonical. Later, we will consider techniques for conducting MD simulations in other ensembles (at constant temperature and/or pressure, for example).
When the Verlet algorithm is used to integrate Newtonian equations of motion, the total energy of the system is conserved to within a finite error, so long as \(\Delta t\) is “small enough.” How does one determine a reasonable value for \(\Delta t\)? Basically the same way we determined reasonable maximum displacements in continuous-space MC simulation: trial and error. We will play with time-step values in the next section, in which we consider MD simulation of the Lennard-Jones liquid.
In saying that the total energy is conserved, we realize that total energy is the sum of potential and kinetic energy. To integrate the equations of motion, we need to compute neither the potential or kinetic energy, so we have to take extra steps in an MD program to make sure total energy is being conserved. Potential energy is easily accumulated during the calculation of forces, but kinetic energy has to be computed using particle velocities: \begin{equation} \mathscr {K} = \frac {1}{2}\sum _i m_i\left |{\bf v}_i\right |^2 \end{equation} But where are velocities in the Verlet algorithm? They are not necessary for updating positions, but can be easily “generated” provided one stores previous, current, and next-time-step positions in implementing the algorithm: \begin{equation} \label {eq:verlet_vel} v\left (t\right ) = \frac {r\left (t+\Delta t\right ) - r\left (t-\Delta t\right )}{2\Delta t} + \mathscr {O}\left [\left (\Delta t\right )^2\right ] \end{equation} Eq. 95 is used in Algorithm 6 in F&S (Integration of the Equations of Motion) simply in order to compute \(\mathscr {K}\). Energy consservation can be checked by tracking the total energy, \(\mathscr {U}+\mathscr {K}\).
While we are considering the instantaneous kinetic energy, \(\mathscr {K}\), it is useful to recognize a working definition of instantaneous temperature, \(T\): \begin{equation} \frac {3}{2}Nk_BT = \mathscr {K} = \frac {1}{2}\sum _{i=1}^{N}m_i\left |{\bf v}_i\right |^2 \end{equation} Because \(\mathscr {K}\) fluctuates in time as the system evolves, so does the temperature. So, the actual temperature of a system in a microcanonical MD simulation is a time-average.
Perhaps the most popular integrator is the “velocity Verlet” algorithm [5]. Every MD code I have ever written or used (this totals a dozen or so) has used the velocity Verlet algorithm, so I feel at least it is worth explaining here. The velocity Verlet algorithm requires updates of both positions and velocities:
The update of velocities uses an arithmetic average of the force at time \(t\) and \(t+\Delta t\). This results in a slightly more stable integrator compared to the simple Verlet algorithm, in that one may use slightly larger time-steps to achieve the same level of energy conservation. (Aside: a nice project idea is to quantify this statement.) This might imply that one has to maintain two parallel force arrays. In practice, this is not necessary, because the velocity update can be split to either side of the force computation, forming a so-called “leapfrog” algorithm:
In the next section (Sec. 5.2), we will consider the velocity Verlet algorithm in the context of an MD simulation of the Lennard-Jones fluid.
As a final tidbit, we must consider periodic boundaries applied in a molecular dynamics simulation. Consider modes of a system. Think of a mode as a concerted vibration of collections of particles with a characteristic wavelength. A dense system will have short wavelength (local) modes, and long wavelength modes, like large-scale concerted “sloshing” of the particles in the system. These modes exist naturally in matter, and the partitioning of energy among these various modes is important to understand in describing some transport properties. The key caveat is that modes with wavelengths that are incommensurate with the box size are not permitted in a periodic system because they cancel themselves. A mode is commensurate with the box so long as an integer multiple of its wavelength is the box length. This can actually be very restrictive in systems with a wide span of wavelengths, like amorphous unstructured solids, but is not that important for amorphous liquids.
In this section, we present an elegant formalism for deriving MD integrators, as discussed by Tuckerman et al. [6]. What we present here is essentially the first two parts of the second section of Reference [6], including some of my own elaboration and some of that presented in section 4.3 of F&S.
Imagine a quantity \(f\) which is a function of particle positions \({\bf r}^N\) and momenta \({\bf p}^N\). Its time derivative is given by \begin{equation} \label {eq:liouville_1} \dot {f} = \dot {\bf r}\frac {\partial f}{\partial {\bf r}} + \dot {\bf p}\frac {\partial f}{\partial {\bf p}} \end{equation} We can write down a formal solution to this equation. First, define the Liouville operator as \begin{equation} iL = \dot {\bf r}\frac {\partial }{\partial {\bf r}} + \dot {\bf p}\frac {\partial }{\partial {\bf p}} \end{equation} As Tuckerman points out, the \(i\) is there by convention and ensures that the operator is Hermitian. We can re-express Eq. 97 as \begin{equation} \dot {f} = iLf \end{equation} which we solve directly to yield \begin{equation} f\left (t\right ) = \exp \left (iLt\right )f\left (0\right ). \end{equation} If \(f\) is itself a vector quantity identical to the set of positions and momenta, \(\Gamma \), we have a way to express, formally, the evolution of the system: \begin{equation} \label {eq:propagator} \Gamma \left (t\right ) = U\left (t\right )\Gamma \left (0\right ) \end{equation} where \(U(t) = \exp \left (iLt\right )\) is the classical propagator. The idea with numerical integration is that we find a way to represent the propagator as a discrete algorithm for constructing the system at some time \(t+ \Delta t\) given the system at time \(t\).
Let’s build our discrete integrator by decomposing the operator: \begin{equation} iL = iL_1 + iL_2 \end{equation} This does not necessarily lead to two independent propagators, because the two components do not commute; that is: \begin{equation} \exp \left [\left ( iL_1 + iL_2\right )t\right ] \ne \exp \left (iL_1t\right )\exp \left (iL_2t\right ) \end{equation}
Consider the action of the partial Liouville operator \begin{equation} iL_1 \equiv \dot {\bf r}\left (0\right )\frac {\partial }{\partial {\bf r}}, \end{equation} which gives
The last line is the collapse of the Taylor expansion of the line immediately above it. So, the effect of this operator fragment is a simple shift of coordinates given some initial velocities. This is an interesting fact: we can consider first-order integration as a Taylor expansion.
The next step of Tuckerman was to apply the Trotter identity: \begin{equation} \exp \left [\left ( iL_1 + iL_2\right )t\right ] = \lim _{P\rightarrow \infty } \left [\exp \left (iL_1t/2P\right )\exp \left (iL_t2/P\right )\exp \left (iL_1t/2P\right )\right ]^P \end{equation} When \(P\) is large but finite: \begin{equation} \exp \left [\left (iL_1 + iL_2\right )t\right ] = \left [\exp \left (iL_1t/2P\right )\exp \left (iL_2t/P\right )\exp \left (iL_1t/2P\right )\right ]^P\exp \left [\mathscr {O}\left (1/P^2\right )\right ] \end{equation} Now, we define a finite timestep as \(\Delta t = t/P\) and we have then a discrete operator that, when applied to a configuration at time \(t\), will produce the configuration at time \(t + \Delta t\): \begin{equation} \exp \left (iL_1\Delta t/2\right )\exp \left (iL_2\Delta t\right )\exp \left (iL_1\Delta t/2\right )\Gamma \left (t\right ) = \Gamma \left (t+\Delta t\right ) \end{equation} By performing this operation sequentially \(P\) times, we recover a discretized version of the formal solution to generate \(\Gamma \left (t\right )\) given \(\Gamma \left (0\right )\).
Now we explicitly consider the decomposition:
We can perform one \(\Delta t\)’s worth of update using the following operation on \(f\): \[ \exp \left (\dot {\bf p}\left (0\right )\left (\frac {\partial }{\partial {\bf p}}\right )\frac {\Delta t}{2}\right ) \exp \left (\dot {\bf r}\left (0\right )\left (\frac {\partial }{\partial {\bf r}}\right )\Delta t\right ) \exp \left (\dot {\bf p}\left (0\right )\left (\frac {\partial }{\partial {\bf p}}\right )\frac {\Delta t}{2}\right ) f\left [{\bf p}^N\left (t\right ),{\bf r}^N\left (t\right )\right ] \] The action of the rightmost operator, \(\exp \left (\dot {\bf p}\left (0\right )\left (\frac {\partial }{\partial {\bf p}}\right )\frac {\Delta t}{2}\right )\): \[ f\left [{\bf p}^N\left (t\right ),{\bf r}^N\left (t\right )\right ] \rightarrow f\left \{\left [{\bf p}\left (t\right )+\frac {\Delta t}{2}\dot {\bf p}\left (\Delta t\right )\right ]^N,\left [{\bf r}\left (t\right )\right ]^N\right \} \] The action of the next rightmost, \(\exp \left (\dot {\bf r}\left (0\right )\left (\frac {\partial }{\partial {\bf r}}\right )\Delta t\right )\): \[ f\left \{\left [{\bf p}\left (t\right )+\frac {\Delta t}{2}\dot {\bf p}\left (\Delta t\right )\right ]^N,\left [{\bf r}\left (t\right )\right ]^N\right \} \rightarrow f\left \{\left [{\bf p}\left (t\right )+\frac {\Delta t}{2}\dot {\bf p}\left (0\right )\right ]^N, \left [{\bf r}\left (t\right )+ \Delta t \dot {\bf r}\left (\frac {\Delta t}{2}\right )\right ]^N\right \} \] Then, the action of the final operator: \[ f\left \{\left [{\bf p}\left (t\right )+\frac {\Delta t}{2}\dot {\bf p}\left (0\right )\right ]^N, \left [{\bf r}\left (t\right )+ \Delta t \dot {\bf r}\left (\frac {\Delta t}{2}\right )\right ]^N\right \} \rightarrow f\left \{\left [{\bf p}\left (t\right )+\frac {\Delta t}{2}\dot {\bf p}\left (0\right )+ \frac {\Delta t}{2}\dot {\bf p}\left (\Delta t\right )\right ]^N, \left [{\bf r}\left (t\right )+ \Delta t \dot {\bf r}\left (\frac {\Delta t}{2}\right )\right ]^N\right \}\\ \]
Noting that \({\bf p} = m\dot {\bf r}\) and \({\bf F} = m{\bf a} = \dot {\bf p}\), we can summarize the effect of this three-step update of the positions and velocities as
This is the velocity-Verlet algorithm, seen previously in Eqs ??-??. Interestingly, we can also reverse the order of the decomposition; i.e.,
The update algorithm that arises is
This is termed the position Verlet algorithm [6]. Tuckerman et al. showed that this new algorithm results in a slightly lower drift in total energy in MD simulation of a simple Lennard-Jones fluid, when the time-step is greater than about 0.004.
Let us consider a few of the important elements of any MD program:
We will understand these elements by manipulating an existing simulation program that implements the
Lennard-Jones fluid (which you may recall was analyzed using Metropolis Monte-Carlo simulation in Sec. 4).
The program is called mdlj.c in the instructional codes repository.
Recall the Lennard-Jones pair potential: \begin{equation} \label {eq:md:lj} u\left (r\right ) = 4\epsilon \left [\left (\frac {\sigma }{r}\right )^{12}- \left (\frac {\sigma }{r}\right )^{6}\right ] \end{equation} When implementing this in an MD code, similar to its implementation in MC, we adopt a reduced unit system, in which we measure length in \(\sigma \), and energy in \(\epsilon \). Additionally, for MD, we need to measure mass, and we adopt units of particle mass, \(m\). This convention makes the force on a particle numerically equivalent to its acceleration. With these conventions, time is a derived unit: \begin{equation} t [=] \sigma \sqrt {m/\epsilon } \end{equation} We also measure reduced temperature in units of \(\epsilon /k_B\); so for a system of identical Lennard-Jones particles: \begin{equation} \label {eq:md:T} \frac {3}{2}NT = \mathscr {K} = \frac {1}{2}\sum _{i=1}^{N}\left |{\bf v}_i\right |^2 \end{equation} (Recall that the mass \(m\) is 1 in Lennard-Jones reduced units.) These conventions obviate the need to perform unit conversions in the code.
We have already encountered interparticle forces for the Lennard-Jones pair potential in the context of computing the pressure from the virial in the MC simulation of the LJ fluid (Sec. 4.6). Briefly, the force exerted on particle \(i\) by virtue of its Lennard-Jones interaction with particle \(j\), \({\bf f}_{ij}\), is given by: \begin{equation} {\bf f}_{ij}\left (r_{ij}\right ) = \frac {{\bf r}_{ij}}{r_{ij}^2}\left \{48\epsilon \left [\left (\frac {\sigma }{r_{ij}}\right )^{12} - \frac {1}{2}\left (\frac {\sigma }{r_{ij}}\right )^6\right ]\right \} \equiv {\bf r}_{ij}f. \end{equation} And, as shown in the previous section, once we have computed the vector \({\bf f}_{ij}\), we automatically have \({\bf f}_{ji}\), because \({\bf f}_{ij} = -{\bf f}_{ji}\). The scalar \(f\) is called a “force factor.” If \(f\) is negative, the force vector \({\bf f}_{ij}\) points from \(i\) to \(j\), meaning \(i\) is attracted to \(j\). Likewise, if \(f\) is positive, the force vector \({\bf f}_{ij}\) points from \(j\) to \(i\), meaning that \(i\) is being forced away from \(j\).
Below is a C-code fragment for computing both the total potential energy and interparticle forces:
double forces ( double * rx, double * ry, double * rz,
double * fx, double * fy, double * fz,
int n ) {
int i,j;
double dx, dy, dz, r2, r6, r12;
double e = 0.0, f = 0.0;
for (i=0;i<n;i++) {
fx[i] = 0.0;
fy[i] = 0.0;
fz[i] = 0.0;
}
for (i=0;i<(n-1);i++) {
for (j=i+1;j<n;j++) {
dx = rx[i]-rx[j];
dy = ry[i]-ry[j];
dz = rz[i]-rz[j];
r2 = dx*dx + dy*dy + dz*dz;
r6i = 1.0/(r2*r2*r2);
r12i = r6i*r6i;
e += 4*(r12i - r6i);
f = 48/r2*(r6i*r6i-0.5*r6i);
fx[i] += dx*f;
fx[j] -= dx*f;
fy[i] += dy*f;
fy[j] -= dy*f;
fz[i] += dz*f;
fz[j] -= dz*f;
}
}
return e;
}
Notice that the argument list now includes arrays for the forces, and because force is a vector quantity, we have three parallel arrays for a three-dimensional system. These forces must of course be initialized, shown in lines 6-10. The \(N^2\) loop for visiting all unique pairs of particles is opened on lines 11-12. The inside of this loop is very similar to the evaluation of potential first presented in the MC simulation of the Lennard-Jones fluid; the only real difference is the computation of the “force factor,” \(f\), on line 20, and the subsequent increment of force vector components on lines 21-26. Notice as well that there is no implementation of periodic boundary conditions in this code fragment; it was left out for simplicity. What would this “missing” code do? (Hint: look at the code mdlj.c for the answer.)
The second major aspect of MD is the integrator. As discussed in class, we will primarily use Verlet-style (explicit) integrators. The most common version is the velocity-Verlet algorithm [5], first presented in Sec. 5.1.1. Below is a fragment of C-code for executing one time step of integration for a system of \(N\) particles:
for (i=0;i<N;i++) {
rx[i]+=vx[i]*dt+0.5*dt2*fx[i];
ry[i]+=vy[i]*dt+0.5*dt2*fy[i];
rz[i]+=vz[i]*dt+0.5*dt2*fz[i];
vx[i]+=0.5*dt*fx[i];
vy[i]+=0.5*dt*fy[i];
vz[i]+=0.5*dt*fz[i];
}
PE = total_e(rx,ry,rz,fx,fy,fz,N,L,rc2,ecor,ecut,&vir);
KE = 0.0;
for (i=0;i<N;i++) {
vx[i]+=0.5*dt*fx[i];
vy[i]+=0.5*dt*fy[i];
vz[i]+=0.5*dt*fz[i];
KE+=vx[i]*vx[i]+vy[i]*vy[i]+vz[i]*vz[i];
}
KE*=0.5;
Notice the update of positions (Eq. ??), where vx[i] is the \(x\)-component of velocity, fx[i] is the
\(x\)-component of force, dt and dt2 are the time-step and squared time-step, respectively. Notice that there is no
implementation of periodic boundaries in this code fragment; what would this “missing code” look like? (Hint:
see mdlj.c for the answer!) Lines 5-7 are the first half-update of velocities (Eq. ??). The force routine computes
the new forces on the currently updated configuration on line 10. Then, lines 12-18 perform the
second-half of the velocity update (Eq. ??). Also note that the kinetic energy, \(\mathscr {K}\), is computed in this
loop.
The complete C program, mdlj.c, contains a complete implementation of the Lennard-Jones force routine and
the velocity-Verlet integrator. Compilation instructions appear in the header comments. Let us now consider
some sample results from mdlj.c. First, mdlj.c includes an option that outputs a brief summary of the command
line options available:
$ mdlj -h mdlj usage: mdlj [options] Options: -N [integer] Number of particles -rho [real] Number density -dt [real] Time step -rc [real] Cutoff radius -ns [real] Number of integration steps -T0 [real] Initial temperature -fs [integer] Sample frequency -traj [string] Trajectory file name -prog [integer] Interval with which logging output is generated -icf [string] Initial configuration file -seed [integer] Random number generator seed -uf Print unfolded coordinates in trajectory file -h Print this info
Let us run mdlj.c for 512 particles and 1000 time-steps at a density of 0.85 and an initial temperature of 2.5. We will pick a relatively conservative (small) time-step of 0.001. We will not specify an input configuration, instead allowing the code to create initial positions on a cubic lattice. Here is what we see in the terminal:
$ ./mdlj -N 512 -fs 10 -ns 1000 -traj traj.xyz -T0 2.5 -rho 0.85 -rc 2.5 # NVE MD Simulation of a Lennard-Jones fluid # L = 8.44534; rho = 0.85000; N = 512; rc = 2.50000 # nSteps 1000, seed 23410981, dt 0.00100 #LABELS step time PE KE TE drift T P 0 0.00000 -2429.99610 1919.39622 -510.59988 2.58118e-07 2.49921 4.28896 1 0.00100 -2428.18306 1917.58237 -510.60069 1.84111e-06 2.49685 4.30232 2 0.00200 -2425.15191 1914.54975 -510.60216 4.73104e-06 2.49290 4.32466 3 0.00300 -2420.88660 1910.28230 -510.60430 8.92431e-06 2.48735 4.35604 4 0.00400 -2415.36384 1904.75673 -510.60711 1.44276e-05 2.48015 4.39659 5 0.00500 -2408.55314 1897.94254 -510.61060 2.12521e-05 2.47128 4.44649 6 0.00600 -2400.41688 1889.80212 -510.61476 2.94064e-05 2.46068 4.50593 7 0.00700 -2390.91048 1880.29088 -510.61960 3.88852e-05 2.44830 4.57515 8 0.00800 -2379.98339 1869.35814 -510.62525 4.99443e-05 2.43406 4.65427 9 0.00900 -2367.57807 1856.94670 -510.63137 6.19292e-05 2.41790 4.74385 10 0.01000 -2353.63308 1842.99526 -510.63782 7.45638e-05 2.39973 4.84412 11 0.01100 -2338.08390 1827.43905 -510.64484 8.83209e-05 2.37948 4.95492 12 0.01200 -2320.86308 1810.21130 -510.65178 1.01908e-04 2.35705 5.07698 ...
Each line of output after the header information corresponds to one time-step. (The header line that begins with
the special word #LABELS is used by the Python program plot_mdlj_log.py.) The first column is the
time-step, the second is the time value, the third is the potential energy, the fourth is the kinetic energy, the fifth
is the total energy, the sixth is the “drift,” the seventh is the instantaneous temperature (Eq. 110), and the
eighth is the instantaneous pressure (Eq. 78).
The drift is output to assess the stability of the explicit integration. As a rule of thumb, we would like to keep the drift to below 0.01% of the total energy. The drift reported by mdlj.c is computed as \begin{equation}\label {eq:md:edrift} \Delta \mathscr {T}\left (t\right ) = \frac {\mathscr {T}\left (t\right )-\mathscr {T}\left (0\right )}{\mathscr {T}\left (0\right )} \end{equation} where \(\mathscr {T}\) is the total energy. The plots below show the output trace for the full 1,000 time-steps. We note that with this time-step value (0.001) keeps the total energy conserved to about one part in 10\(^4\).
Now, this invocation of mdlj.c produces an XYZ-format trajectory file (just like mclj did). Here, I have
expanded my XYZ-format convention to allow inclusion of velocities in the output file. Inclusion of velocities is
signified by a 1 on the same line as the number of particles (the first line in each frame). If there is a
1 in that position, then each particle line includes three position coordinates and three velocity
components:
$ more traj.xyz 512 1 BOX 8.44534 8.44534 8.44534 16 0.527833 0.527833 0.527833 -0.982352 -0.823300 -1.530590 16 1.583500 0.527833 0.527833 -0.549726 -0.942381 1.363996 16 2.639167 0.527833 0.527833 1.145014 -0.616762 -1.172725 16 3.694834 0.527833 0.527833 1.520337 3.057595 -1.522653 16 4.750501 0.527833 0.527833 -0.541350 -0.144665 -0.919845 16 5.806168 0.527833 0.527833 -0.221483 1.143893 -1.015784 16 6.861835 0.527833 0.527833 5.752205 -1.045603 1.066189 16 7.917502 0.527833 0.527833 0.819532 0.167275 -1.230297 16 0.527833 1.583500 0.527833 -1.062591 2.169692 -1.318921 16 1.583500 1.583500 0.527833 -1.394993 0.055193 1.411530 ...
The number “16” at the beginning designates the “type” as an atomic number; for simplicity, I have decided to
call all of my particles sulfur. (XYZ format is often used for atomically-specific configurations.) The functions
xyz_out() and xyz_in() write and read this format, respectively, in mdlj.c. We will use these functions in
other programs as well, typically those which analyze configuration data. Examples of such analysis codes are
the subjects of the next two sections.
At this point, you can’t do much with all this data, except appreciate just how much data an MD code can produce. In this example, we generated 100 frames of configuration data (positions and velocities) for a 512-atoms system, and the resulting trajectory file is about 3.5 megabytes in size. Of course, that file size scales with both the number of particles and the number of frames it contains. It is not unusual nowadays for researchers to use MD to produce hundreds of gigabytes of configuration data in order to write a single paper. It leads one to think that perhaps a lesson on handling large amounts of data is appropriate for a course on Molecular Simulation; however, I’ll forego that for now by trying to keep our sample exercises small.
One thing we can do with this data is make nice pictures using VMD. Below are two renderings, one of the initial snapshot, and the other at time \(t\) = 1 (1000 time steps). Notice how the initially perfect crystalline lattice has been wiped out.
The code mdlj.c was run on 108 particles with positions initialized on a simple cubic lattice at a density \(\rho \) = 0.8442 and temperature (specified by velocity assignment and scaling) initially set at \(T\) = 0.728. A single run of 600,000 time steps was performed, which took about 2 minutes on my laptop. (This is almost 10\(^6\) particle-updates per second; not bad for a laptop running a silly \(N^2\) pair search, but it’s only for \(\sim \) 100 particles...the same algorithm applied to 10,000 would be slower.) The commands I issued looked like:
$ mkdir md_cs2 $ cd md_cs2 $ ../mdlj -N 108 -fs 1000 -ns 600000 -rho 0.8442 -T0 0.728 -rc 2.5 -traj traj1.xyz -prog 100 >& 1.out & $ tail -f 1.out 468900 468.90000 -475.45590 242.36056 -233.09534 -2.79034e-04 1.49605 5.43064 469000 469.00000 -475.67663 242.58353 -233.09310 -2.88639e-04 1.49743 5.11914 469100 469.10000 -477.58731 244.49557 -233.09175 -2.94452e-04 1.50923 4.88396 469200 469.20000 -457.41152 224.32176 -233.08975 -3.02989e-04 1.38470 5.82717 469300 469.30000 -492.03036 258.93556 -233.09481 -2.81326e-04 1.59837 4.82093 469400 469.40000 -465.44072 232.34350 -233.09723 -2.70947e-04 1.43422 5.69688 469500 469.50000 -478.52166 245.42514 -233.09652 -2.73957e-04 1.51497 5.01491 469600 469.60000 -469.00769 235.90985 -233.09784 -2.68307e-04 1.45623 5.64466 469700 469.70000 -476.27992 243.18354 -233.09637 -2.74603e-04 1.50113 5.34172 469800 469.80000 -461.91910 228.82734 -233.09176 -2.94382e-04 1.41251 5.68907 469900 469.90000 -469.00227 235.90429 -233.09798 -2.677^C
The final \& puts the simulation “in the background,” and returns the shell prompt to you. The tail -f
command allows you to “watch” as the log file is written to. You can also verify that the job is running by issuing
the “top” command, which displays in the terminal the a listing of processes using CPU, ranked by how
intensively they are using the CPU. This command “takes over” your terminal to display continually updated
information, until you hit Ctrl-C.
top - 13:43:27 up 4 days, 6:19, 0 users, load average: 0.16, 0.45, 0.28 Tasks: 29 total, 2 running, 27 sleeping, 0 stopped, 0 zombie %Cpu(s): 12.7 us, 0.0 sy, 0.0 ni, 83.3 id, 0.0 wa, 0.0 hi, 4.0 si, 0.0 st MiB Mem : 12608.9 total, 11169.1 free, 739.2 used, 700.7 buff/cache MiB Swap: 4096.0 total, 4096.0 free, 0.0 used. 11616.8 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 14668 cfa 20 0 6652 1268 1116 R 100.0 0.0 0:02.83 mdlj 13073 cfa 20 0 863448 47256 28728 S 6.7 0.4 0:19.24 node 1 root 20 0 1020 648 520 S 0.0 0.0 1:29.57 init 8 root 20 0 888 76 16 S 0.0 0.0 0:00.00 init 9 root 20 0 888 76 16 S 0.0 0.0 1:04.20 init 10 cfa 20 0 10164 4980 3188 S 0.0 0.0 0:00.11 bash 168 root 20 0 984 172 16 S 0.0 0.0 0:00.00 init
From the command line arguments shown above, we can see that this simulation run will produce 601
snapshots, beginning with \(t=0\) and outputting every 1000 steps. Each frame has 108 lines of 69 bytes each, plus
another 30 for the header in each frame, so basically every frame is 7,482 bytes. With 601 of them, that is
4,496,682 bytes, or 4.2884 megabytes (1 megabyte is 1,048,576 bytes). We can confirm this calculation using
the du (disk usage) command:
$ du -sh traj1.xyz 4.3M traj1.xyz
Thirty years ago, one might have raised an eyebrow at this; nowadays, this is very nearly an insignificant amount of storage.
An important purpose of this case study is to quantify the notion of “equilibration” of the system by assessing correlations in (apparently) randomly fluctuating quantities like potential energy. Remember, in order to perform accurate ensemble averaging over an MD trajectory, we have to be sure that correlations in the properties we are measuring have “died out.” This is another way of saying that the length of the time interval over which we conduct the time average must be much longer than the correlation time. In this case study, we illustrate the “block averaging” technique of Flyvbjerg and Petersen [7] to determine the equilibration time-scale of the potential energy.
First, compute the variance of the \(L\) samples of \(\mathscr {U}\): \begin{equation} \sigma ^2_0\left ({\mathscr {U}}\right ) \approx \frac {1}{L}\sum _{i=1}^{L}\left [\mathscr {U}_i - \bar {\mathscr {U}}\right ]^2 \end{equation} This is approximate because of so far undetermined
time-correlations in \(\mathscr {U}\); that is, not all \(L\) samples are uncorrelated. For example, two samples one time-step apart
will likely be very close to one another. Now, we block the data by averaging \(L/2\) pairs of adjacent values: \begin{equation} \mathscr {U}_i^{(1)} = \frac {\mathscr {U}_{2i-1} + \mathscr {U}_{2i}}{2} \end{equation} The \((1)\)
superscript indicates that this is a “first-generation” coarsening of the potential energy trace. The variance of
this new set, \(\sigma ^2_1\) is computed. Then the process is recursively carried out through many subsequent
generations. After many blocking operations, the coarsened samples, \(\mathscr {U}_i^{(j)}\), become uncorrelated, and the
variance saturates (temporarily). This means we should observe a plateau in a plot of variance vs
generation. The Python program flyvberg.py implements this calculation using outputs of mdlj.c as
inputs.
According to the discussion of this blocking technique, the standard deviation shows a dependence on the blocking degree, \(M\), when the blocked averages are correlated, and plateaus at a blocking degree for which the averages become uncorrelated. This blocking degree corresponds to a time interval of length \(2^M\). This data indicates that potential energy decorrelates after a time of approximately \(2^{10} \approx 1000\) steps. This is a reassuring result, as we could have guessed that 1000 steps are required based on the initial transience in the energy traces seen in the previous figure. I ran three independent simulations, each differing only in the random number seed used. The results are shown in Table 1.
| Run | \(\left <\mathscr {U}\right >/N\) | \(\left <\mathscr {K}\right >/N\) | \(\left <\mathscr {T}\right >\) | \(\left <P\right >\) |
| 1 | \(-4.4165 \pm 0.0012\) | \(2.2577 \pm 0.0012\) | \(1.5051 \pm 0.0008\) | \(5.1960 \pm 0.0057\) |
| 2 | \(-4.4188 \pm 0.0010\) | \(2.2597 \pm 0.0010\) | \(1.5064 \pm 0.0007\) | \(5.1948 \pm 0.0050\) |
| 3 | \(-4.4157 \pm 0.0011\) | \(2.2564 \pm 0.0011\) | \(1.5043 \pm 0.0008\) | \(5.2022 \pm 0.0055\) |
| Avg. | \(-4.4170 \pm 0.0011\) | \(2.2579 \pm 0.0011\) | \(1.5053 \pm 0.0008\) | \(5.1977 \pm 0.0054\) |
A second major objective of this case study is to demonstrate how to compute the radial distribution function, \(g(r)\). The radial distribution function is an important statistical mechanical function that captures the structure of liquids and amorphous solids. We can express \(g(r)\) using the following statement: \begin{equation} \rho g(r) = \text {average density of particles at } {\bf r} \text { given that a tagged particle is at the origin} \end{equation}
The procedure we will follow will be to write a second program (a “postprocessing code”) which will read in the trajectory output produced by the simulation, mdlj.c. The general structure of a \(g(r)\) post-processing code could look like this:
We will consider a code, rdf.c, that implements this algorithm for computing \(g(r)\), but first we
present a brief argument for post-processing vs on-the-fly processing for computing quantities such
as \(g(r)\). For demonstration purposes, it is arguably simpler to drop in a \(g(r)\)-histogram update sampling
function into an existing MD simulation program to enable computation of \(g(r)\) during a simulation,
compared to writing a wholly separate program. After all, it nominally involves less coding. The
counterargument is that, once you have a working (and eventually optimized) MD simulation code,
one should be wary of modifying it. The purpose of the MD simulation is to produce samples.
One can produce samples once, and use any number of post-processing codes to extract useful
information. The counterargument becomes stronger when one considers that, for particularly
large-scale simulations, it is simply not convenient to re-run an entire simulation when one discovers a
slight bug in the sampling routines. The price one pays is that one needs the disk space to store
configurations.
As shown earlier, one MD simulation of 108 particles out to 600,000 time steps, storing configurations every 1,000 time steps, requires less than 5 MB. This is an insignificant price. Given that we know that the MD simulation works correctly, it is sensible to leave it alone and write a quick, simple post-processing code to read in these samples and compute \(g(r)\).
The code rdf.c is a C-code implementation of just such a post-processing code. This program illustrates a
different way to abstractify the trajectory, namely as a list of frames. A frame is an instance of an abstract data
type called frametype that we define (for now) in rdf.c, along with a special function NewFrame() to
allocate memory for a frame, and another read_xyz_frame() to read a frame in from an XYZ-format
trajectory:
typedef struct FRAME {
double * rx, * ry, * rz; // coordinates
double * vx, * vy, * vz; // velocities
int * typ; // array of particle types 0, 1, ...
int N; // number of particles
double Lx, Ly, Lz; // box dimensions
} frametype;
/* Create and return an empty frame */
frametype * NewFrame ( int N, int hv ) {
frametype * f = (frametype*)malloc(sizeof(frametype));
f->N=N;
f->rx=(double*)malloc(sizeof(double)*N);
f->ry=(double*)malloc(sizeof(double)*N);
f->rz=(double*)malloc(sizeof(double)*N);
if (hv) {
// caller has requested a frame with space for velocities
f->vx=(double*)malloc(sizeof(double)*N);
f->vy=(double*)malloc(sizeof(double)*N);
f->vz=(double*)malloc(sizeof(double)*N);
} else {
f->vz=NULL;
f->vy=NULL;
f->vx=NULL;
}
f->typ=(int*)malloc(sizeof(int)*N);
return f;
}
/* Read an XYZ-format frame from stream fp; returns the new frame.
Note the non-conventional use of the first line to indicate
whether or not the frame contains velocities and the comment
line to hold boxsize information. */
frametype * read_xyz_frame ( FILE * fp ) {
int N,i,j,hasvel=0;
double x, y, z, Lx, Ly, Lz, vx, vy, vz;
char typ[3], dummy[5];
char ln[255];
frametype * f = NULL;
if (fgets(ln,255,fp)){
sscanf(ln,"%i %i\n",&N,&hasvel);
f = NewFrame(N,hasvel);
fgets(ln,255,fp);
sscanf(ln,"%s %lf %lf %lf\n",dummy,&f->Lx,&f->Ly,&f->Lz);
for (i=0;i<N;i++) {
fgets(ln,255,fp);
sscanf(ln,"%s %lf %lf %lf %lf %lf %lf\n",
typ,&f->rx[i],&f->ry[i],&f->rz[i],&vx,&vy,&vz);
if (hasvel) {
f->vx[i]=vx;
f->vy[i]=vy;
f->vz[i]=vz;
}
j=0;
while(strcmp(elem[j],"NULL")&&strcmp(elem[j],typ)) j++;
if (strcmp(elem[j],"NULL")) f->typ[i]=j;
else f->typ[i]=-1;
}
}
return f;
}
With this abstract data type, we no longer need to pass all parallel arrays as separate parameters; we can
instead just pass a pointer frametype*. For example, below is the function rij that computes the
minimum-image convention distance between particles i and j in a particular frame:
/* Compute scalar distance between particles i and j in frame f;
note the use of the minimum image convention */
double rij ( frametype * f, int i, int j ) {
double dx, dy, dz;
double hLx=0.5*f->Lx,hLy=0.5*f->Ly,hLz=0.5*f->Lz;
dx=f->rx[i]-f->rx[j];
dy=f->ry[i]-f->ry[j];
dz=f->rz[i]-f->rz[j];
if (dx<-hLx) dx+=f->Lx;
if (dx> hLx) dx-=f->Lx;
if (dy<-hLy) dy+=f->Ly;
if (dy> hLy) dy-=f->Ly;
if (dz<-hLz) dz+=f->Lz;
if (dz> hLz) dz-=f->Lz;
return sqrt(dx*dx+dy*dy+dz*dz);
}
Using rij() it is then easy to update the RDF histogram:
/* An N^2 algorithm for computing interparticle separations
and updating the radial distribution function histogram. */
void update_hist ( frametype * f, double rcut,
double dr, int * H, int nbins ) {
int i,j;
double r;
int bin;
for (i=0;i<f->N-1;i++) {
for (j=i+1;j<f->N;j++) {
r=rij(f,i,j);
if (r<rcut) {
bin=(int)(r/dr);
if (bin<0||bin>=nbins) {
fprintf(stderr,
"Warning: %.3lf not on [0.0,%.3lf]\n",
r,rcut);
} else {
H[bin]+=2;
}
}
}
}
}
H is the histogram. One can see that the bin value is computed by first dividing the actual distance between
members of the pair by the resolution of the histogram, \(\delta r\), and casting the result as an integer. This resolution
can be specified on the command-line when rdf.c is executed. Also notice that the histogram is updated by 2,
which reflects the fact that either of the two particles in the pair can be placed at the origin. Also notice the
implementation of the minimum image convention.
In the main() function of rdf.c, a simple block of code can read in a whole trajectory from a file named by
trajfile and store it in an array of frametype* pointers:
i=0;
fprintf(stdout,"Reading %s\n",trajfile);
fp=fopen(trajfile,"r");
while (Traj[i++]=read_xyz_frame(fp));
nFrames=i-1;
fclose(fp);
if (!nFrames) {
fprintf(stdout,"Error: %s has no data.\n",trajfile);
exit(-1);
}
fprintf(stdout,"Read %i frames from %s.\n",nFrames,trajfile);
Then a second block of code allocates, initializes, and computes the pair correlation histogram:
/* Adjust cutoff and compute histogram */ L2min=min(Traj[0]->Lx/2,min(Traj[0]->Ly/2,Traj[0]->Lz/2)); if (rcut>L2min) rcut=L2min; nbins=(int)(rcut/dr)+1; H=(int*)malloc(sizeof(int)*nbins); for (i=0;i<nbins;i++) H[i]=0; for (i=begin_frame;i<nFrames;i++) update_hist(Traj[i],rcut,dr,H,nbins); nFramesAnalyzed=nFrames-begin_frame;
Note that the cutoff rcut may not exceed half a box length in any dimension. The number of histogram bins is
simply one plus the cutoff divided by the resolution dr. The variable begin_frame allows the caller to gather
statistics only after a certain number of frames in the trajectory in order to “ignore” initial frames where the initial
configuration is still “remembered”.
Finally, once the trajectory has been traversed and the histrogram computed, it is then normalized by compute \(g(r)\) and save the result to a designated output file:
/* Normalize and output g(r) to the terminal */
/* Compute density, assuming NVT ensemble */
fp=fopen(outfile,"w");
fprintf(fp,"# RDF from %s\n",trajfile);
fprintf(fp,"#LABEL r g(r)\n");
fprintf(fp,"#UNITS %s *\n",length_units);
/* Ideal-gas global density; assumes V is constant */
rho=Traj[0]->N/(Traj[0]->Lx*Traj[0]->Ly*Traj[0]->Lz);
for (i=0;i<nbins-1;i++) {
/* bin volume */
vb=4./3.*M_PI*((i+1)*(i+1)*(i+1)-i*i*i)*dr*dr*dr;
/* number of particles in this shell if this were
an ideal gas */
nid=vb*rho;
fprintf(fp,"%.5lf %.5lf\n",i*dr,
(double)(H[i])/(nFramesAnalyzed*Traj[0]->N*nid));
}
fclose(fp);
fprintf(stdout,"%s created.\n",outfile);
(The variable length_units is a string that just labels the length units; by default, this is "sigma", indicating LJ
\(\sigma \).)
Now, let’s execute rdf from one of our 600,000-time-step simulations’ 6,001-frame trajectory, ignoring the
first 1,000 frames (100,000 time steps):
$ cd ~/dxu/chet580/instructional-codes/my_work/mdlj/set1 $ gcc -O5 -o rdf ../../../originals/rdf.c -lm $ ./rdf -t traj-rho0.90-rep0.xyz -dr 0.02 -rcut 3.5 \ -o rdf-rho0.90-rep0.dat -begin-frame 1000 Reading traj-rho0.90-rep0.xyz Read 6001 frames from traj-rho0.90-rep0.xyz. rdf-rho0.90-rep0.dat created. $
Using the python program plot_rdf.py, we can generate a plot of this RDF (Fig. 17:
$ python ../../../originals/plot_rdf.py -i rdf-rho0.90-rep0.dat \ -o rdf-rho0.90-rep0.png
This \(g(r)\) shows a peak at about 2\(^{1/6}\sigma \) that corresponds to the LJ well, indicating that there is a dense
nearest-neighbor shell out to about 1.5 \(\sigma \). How dense? We can use \(g(r)\) to count particles within a
distance \(r\) from a central atom: \begin{equation} \label {eq:md:ungr} n\left (r\right ) = \rho \int _0^r\int _0^{\pi }\int _0^{2\pi } g\left (r^\prime \right ) \left (r^\prime \right )^2\sin \theta dr^\prime d\theta d\phi = 4\pi \rho \int _0^r \left (r^\prime \right )^2g\left (r^\prime \right )dr^\prime \end{equation} This integration is enabled in plot_rdf.py via the -rho and -R
flags:
$ python ../../../originals/plot_rdf.py -i rdf-rho0.90-rep0.dat \ -o rdf-rho0.90-rep0.png -rho 0.9 -R 1.5 n=12.335
This indicates the nearest neighbor shell is pretty well packed; spherical close-packing would be exactly 12.
This Case Study combines elements of Case Studies 5 and 6 in F&S, which are unfortunately incomplete in their description. The purpose of this Case Study is to demonstrate how one computes a self-diffusion coefficient, \(\mathscr {D}\), from an MD simulation of a simple Lennard-Jones liquid. There are two means to computing \(\mathscr {D}\): (1) the mean-squared displacement (“MSD”) \(\left <r^2\right >(t)\), and (2) the velocity autocorrelation function, \({\rm VACF}(t)\). The approaches are equivalent in the sense that the MSD is the integral representation of the VACF; the former is termed the “Einstein” approach, while the latter is the “Green-Kubo” approach [8].
The self-diffusion coefficient governs the evolution of concentration, \(c\), (or number density) according to a generalized transport equation: \begin{equation} \frac {\partial c}{\partial t} = \mathscr {D}\nabla ^2c \end{equation} Einstein showed (details in text) that \(\mathscr {D}\) is related to the mean-squared displacement, \(\left <r^2\right >\): \begin{equation} \label {eq:md:ein} \frac {\partial \left <r^2\right >}{\partial t} = 6\mathscr {D} \end{equation} At long times, \(\mathscr {D}\) should be independent of time; hence \begin{equation} \left <r^2\right > = \lim _{t\rightarrow \infty }6\mathscr {D}t \end{equation} We can compute \(\left <r^2\right >\), and therefore estimate \(\mathscr {D}\), easily using MD simulation. There is, however, a very important consideration concerning periodic boundary conditions. Recall that, during integration, immediately after the position update, we test to see if the update has taken the particle outside of the primary box. If it has, we simply shift the particle’s position by a box length in the appropriate dimension and direction. The displacement of the particle during this step is not a box length, but if you consider just the coordinates as they appear in the output, you would think that it is. It is therefore important that we work with unfolded coordinates when computing mean-squared displacement. This is not adequately explained in the text, so we cover it in some detail here.
“Unfolding” coordinates in a simulation with periodic boundaries requires that we keep track of how many
times each particle has crossed a boundary. The code mdlj.c allows output of unfolded coordinates in the
trajectory output using the -uf switch on the command line. Now, generally the array rx[] always contains the
periodically shifted coordinates, but we can easily generate the unfolded coordinates at any time (say, upon
output) by performing the following operation:
rxu = rx[i]+ix[i]*L;
This is because ix[] contains a tally of the number of times periodic crossings in the \(x\) direction have occurred:
+1 is added to the tally every time a particle’s \(x\) position exceeds \(L\) and is wrapped back in by subtracting \(L\), and -1
is added to the tally every time a particle’s \(x\) position is below 0 and is wrapped back in by adding \(L\). Here, \(L\) is the
box length (assumed cubic).
The program msd.c computes the MSD from a trajectory with unfolded coordinates using a
conventional, straightforward algorithm. The C-code for this algorithm appears below. \(M\) is the number of
“frames” in the trajectory, and \(N\) is the number of particles. \(\left <r^2\right >(t)\) is computed by considering the change in
particle position over an interval of size \(t\). Any frame in the trajectory can be considered an origin
for any interval size, provided enough frames come after it in the trajectory. This means that we
additionally average over all possible time origins. dt is a variable that loops over allowed time
intervals. cnt[] counts the number of time origins for a given interval. sd[] is the array in which we
accumulate squared displacement at each time interval, and has \(M\) elements, one for each allowed
interval.
/* Compute the mean-squared displacement using
the straightforward algorithm */
fprintf(stdout,"# computing MSD...\n");fflush(stdout);
for (t=begin_frame;t<M;t++) {
for (dt=1;(t+dt)<M;dt++) {
cnt[dt]++; /* number of origins for interval length dt */
for (i=0;i<Traj[0]->N;i++) {
sd[dt] += rij2_unwrapped(Traj[t+dt],i,Traj[t],i,1);
}
}
}
The function rij2_unwrapped(fi,i,fj,j,1) very simply computes the squared displacement between
particle i in frame fi and particle j in frame fj:
double rij2_unwrapped ( frametype * fi, int i,
frametype * fj, int j, int com_corr ) {
double dx, dy, dz;
dx=fi->rx[i]-(com_corr?fi->cx:0)-fj->rx[j]+(com_corr?fj->cx:0);
dy=fi->ry[i]-(com_corr?fi->cy:0)-fj->ry[j]+(com_corr?fj->cy:0);
dz=fi->rz[i]-(com_corr?fi->cz:0)-fj->rz[j]+(com_corr?fj->cz:0);
return dx*dx+dy*dy+dz*dz;
}
The parameter com_corr removes the center of mass drift from the displacement; the center of mass should
not move in NVE, but we will use this code for trajectories in which the COM does diffuse. The center of mass
of a frame is part of the frametype data type used in msd.c, and it’s computed when the frame is read
in.
The code fragment below completes the averaging, and outputs the total mean-squared displacement.
fp=fopen(outfile,"w");
fprintf(fp,"# MSD from %s\n",trajfile);
fprintf(fp,"#LABEL time msd\n");
fprintf(fp,"#UNITS %s %s^2\n",time_units,length_units);
for (t=0;t<M-begin_frame;t++) {
sd[t] /= cnt[t]?(Traj[0]->N*cnt[t]):1;
fprintf(fp,"% .5lf % .8lf\n",
t*traj_interval*md_time_step,sd[t]);
}
fclose(fp);
Fig. 18 shows MSD vs time from 60,000-step MD simulations at various densities in which frames are
saved at intervals of 10 time-steps. In these simulations, the density was constant and there were 216 particles,
with velocities initialized at 0.7. The figure shows the data plotted two ways: MSD vs. \(t\) on the left, and MSD/(6\(t\))
vs 1/\(t\) on the right. The program msd can post-process an unwrapped trajectory file to generate the
MSD:
$ ./msd -t traj-rho0.90-rep0.xyz -traj-interval 10 \ -o msd-rho0.90-rep0.dat -begin-frame 1000
Repeating this process for several densities and several replicas per density builds a nice dataset. MSD at
each density was averaged over replicas and plotted using plot_msd.py:
$ python ../../../originals/plot_msd.py -i msd-rho0.50-mean.dat \ -i msd-rho0.60-mean.dat -i msd-rho0.70-mean.dat \ -i msd-rho0.80-mean.dat -i msd-rho0.90-mean.dat \ -o msd-rho-T0.70.png -lowt 1 msd-rho0.50-mean.dat 0.18750243355919102 msd-rho0.60-mean.dat 0.17188252756031633 msd-rho0.70-mean.dat 0.11801146782479009 msd-rho0.80-mean.dat 0.08188412712076683 msd-rho0.90-mean.dat 0.0645241901384243
The parameter -lowt is the lower time limit beyond which the data is fit to calculate \(\mathscr {D}\). Values of \(\mathscr {D}\) are output
here.
You can see that the MSD transitions from a short-time regime where MSD \(\propto \) t\(^2\) to a long-time regime where MSD \(\propto \) t. That short-time region displays “ballistic” behavior, and on those time scales particles move ballistically (with constant velocity) between collisions with other particles; you can see by the value of MSD of about 0.02 that they are moving only about 0.1 particle diameters or so before colliding. On the longer, “diffusive” timescales, we can see the expected behavior.
The velocity autocorrelation function route to the diffusion constant begins with the realization that one can reconstruct the displacement of a particle over a time interval \(t\) by simply integrating its velocity: \begin{equation} \Delta {\bf r} = \int _0^t {\bf v}(t^\prime )dt^\prime \end{equation} So, the mean squared displacement can be expressed
The third equality arises because we can swap \(t^\prime \) and \(t^{\prime \prime }\). The quantity \( \left <{\bf v}(t^\prime )\cdot {\bf v}(t^{\prime \prime })\right >\) is the velocity autocorrelation function. This is an example of a Green-Kubo relation; that is, a relation between a transport coefficient, and an autocorrelation function of a dynamical variable. Eq. 118 then leads to \begin{equation}\label {eq:gkD} \mathscr {D} = \frac {1}{3}\int _0^\infty \left <{\bf v}(t^\prime )\cdot {\bf v}(t^{\prime \prime })\right > dt \end{equation} So, the second route to computing \(\mathscr {D}\) requires that we numerically integrate \(\left <{\bf v}(t^\prime )\cdot {\bf v}(t^{\prime \prime })\right >\) out to very large times. How large? First, let’s try to understand the behavior of \(\left <{\bf v}(t^\prime )\cdot {\bf v}(t^{\prime \prime })\right >\).
In three dimensions, we compute this by computing the components and adding them together, as we did for mean-squared displacement: \begin{equation} \left <{\bf v}(t^\prime )\cdot {\bf v}(t^{\prime \prime })\right > = \left <v_x(t^\prime )v_x(t^{\prime \prime })\right >+ \left <v_y(t^\prime )v_y(t^{\prime \prime })\right >+ \left <v_z(t^\prime )v_z(t^{\prime \prime })\right > \end{equation} The code to compute the VACF is essentially identical to that for the MSD, with the exception that the quantity we accumulate is the dot product of velocity vectors:
/* Compute velocity dot product between particles i and j in
frame fi and fj, respectively; com_corr removes center of
mass motion */
double vij2 ( frametype * fi, int i, frametype * fj, int j,
int com_corr ) {
double dx, dy, dz;
dx=(fi->vx[i]-(com_corr?fi->cvx:0))*(fj->vx[j]-(com_corr?fj->cvx:0));
dy=(fi->vy[i]-(com_corr?fi->cvy:0))*(fj->vy[j]-(com_corr?fj->cvy:0));
dz=(fi->vz[i]-(com_corr?fi->cvz:0))*(fj->vz[j]-(com_corr?fj->cvz:0));
return dx+dy+dz;
}
Fig. 19 shows the VACF for the same simulation we showed for the MSD above. The right panel is a zoom in
by a factor of 20, which allows us to resolve the part of the VACF that dips below zero at short times; this is the
same time scale on which we have ballistic motion. The negative VACF indicates “bounce-back” from
collisions. That figure was generated using plot_vacf.py, which also applies Eq. 121 using
scipy.integrate.simpson() to compute \(\mathscr {D}\)’s:
$ python ../../../originals/plot_vacf.py -i vacf-rho0.50-mean.dat \ -i vacf-rho0.60-mean.dat -i vacf-rho0.70-mean.dat \ -i vacf-rho0.80-mean.dat -i vacf-rho0.90-mean.dat \ -o vacf-rho-T0.70.png -z 20 vacf-rho0.50-mean.dat 0.16226338888888892 vacf-rho0.60-mean.dat 0.16724698888888884 vacf-rho0.70-mean.dat 0.11846137777777778 vacf-rho0.80-mean.dat 0.08435750000000004 vacf-rho0.90-mean.dat 0.0705403444444444
These values agree only weakly with the values computed via fitting to MSD, but either is considered “correct”.