In this section, I’ll illustrate a few examples for using two of the most popular MD packages for academic research: Gromacs [39] and NAMD [40]. There are many other packages in use, and I in no way mean to say these are any better or worse, just that they are popular.
The Gromacs source code is available officially from www.gromacs.org, though some Linux distributions offer pre-compiled versions. In most high-performance computing settings, Gromacs is compiled from source code in order to link in hardware-specific libraries for things like internode communication and compiler-specific math libraries. For this introductory survey though, we can just run the precompiled version on a laptop. (If you have macOS, you might have to compile Gromacs from source.) In Ubuntu under WSL:
$ sudo apt install gromacs
Gromacs is a suite of tools that include an MD engine along with tools for system preparation and simulation analysis. All tools are invoked using the pre-command ‘gmx‘. ‘gmx -help‘ will give a lot of information.
Before proceeding with a couple of practical examples, I must convey the importance of reading the documentation if you want to use Gromacs in your own research. The official Gromacs documentation is extensive, but accessible to beginners. The official tutorials by Justin Lemkul are also a must if you want to learn how to use Gromacs.
Like any MD simulation, using Gromacs breaks down into three main steps:
As may be inferred, the first step is often the most difficult. It usually requires a lot of care and thought to generate an initial condition for MD. Here we’ll consider just two test cases for which this is not so difficult, but which illustrate the workflow.
Here I illustrate a workflow for generating and simulating a small box of water molecules.
First, we can use ‘gmx insert-molecules‘ to generate a system with waters randomly positioned inside a box:
$ gmx insert-molecules -ci /usr/share/gromacs/top/spc216.gro \ -nmol 100 -box 3 3 3 -o water_box.gro
The -box switch allows us to specify a box that is 3x3x3 nm\(^3\), and -ci refers to a special file containing
template atomic coordinates for a single water molecule. The -nmol switch asks Gromacs to try to insert more
molecules than a simple volume/density calculation would suggest (not many more can be put in). This creates
the file water_box.gro, which for me contains 432 water molecules, as depicted using VMD in
Fig. 32.
Next, we need to use pdb2gmx to generate the Gromacs topology file.
$ gmx pdb2gmx -f water_box.gro -o new_water_box.gro -p topol.top
In running this command, I selected the CHARMM27 force field (which won’t matter since we have nothing but water here) and the TIP3P water model. We now have the topology file and a new gromacs coordinate file (which just renames the water molecules to HOH to conform to the TIP3P naming scheme).
Now, using the minim.mdp parameter file given, we can build and run the energy minimization:
$ gmx grompp -f minim.mdp -c new_water_box.gro -p topol.top -o min.tpr $ gmx mdrun -v -deffnm min
This will create a lot of output files that all begin with min. One of them contains all the energy-like data:
min.edr. This file is binary, so the tool gmx energy is used to extract data from it:
$ gmx energy -f min.edr
This will create energy.xvg with column-oriented time-series of whatever data is selected interactively.
Fig. 33 shows what the potential energy looks like for this minimization.
(This is a very, very minimized system; the initial water box had no overlaps really at all.) Now we can run
the NVT simulation using the nvt.mdp parameter file:
$ gmx grompp -f nvt.mdp -c min.gro -p topol.top -o nvt.tpr $ gmx mdrun -v -deffnm nvt
Fig. 34 shows a plot of the energies vs time for this short, short simulation.
Now let’s try using the output configuration from this NVT simulation as an input for an NPT simulation.
$ gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -p topol.top -o npt.tpr $ gmx mdrun -v -deffnm npt
Fig. 35 shows a plot of the density vs time for this short, short simulation.
As a quick example of how to build a protein simulation system, we can consider a very recent example from the Protein Data Bank. The SARS-CoV-2 spike glycoprotein complex is an enormous protein, but a really important part of it are the domains that bind to the ACE2 receptors on the surfaces of epithelial cells. These are called Receptor Binding Domains (RBDs). A lot of recent structural biology has gone into understanding the details of the RBD-ACE2 interface. As an example, take a look at the PDB entry 7c8j, which is the X-ray crystallographic structure of a recombinant construct of the SARS-CoV-2 RBD and the ACE2 ectodomain of the bat [41]. We can use simulations to answer a lot of interesting questions about this structure, but let’s just use it now as a source of coordinates to run a simulation of just the RBD alone.
The first thing to do is to download the PDB file for this entry; there are several ways to do this, but I like to use an interactive VMD session and just put 7c8j in the new molecule file browser. Once VMD has it loaded, the following TcL command in the terminal will create the stripped-down PDB file for just the RBD:
[atomselect top "chain B"] writepdb my_7c8j.pdb
Now we can pretty much follow Justin Lemkul’s lysozyme tutorial here:
# generate the topology; use OPLS-AA force field (sel. 15) $ gmx pdb2gmx -f my_7c8j.pdb -o my_7c8j_processed.pdb -water spce # enlarge box $ gmx editconf -f my_7c8j_processed.pdb -o my_7c8j_newbox.gro \ -c -d 1.0 -bt cubic # solvate $ gmx solvate -cp my_7c8j_newbox.gro # copy JK’s ions.mdp; add ions (replace group 13) $ gmx grompp -f ions.mdp -c my_7c8j_solv.gro -p topol.top \ -o ions.tpr $ gmx genion -s ions.tpr -o my_7c8j_solv.gro -p topol.top \ -pname NA -nname CL -neutral # minimize potential energy using JK’s minim.mdp $ gmx grompp -f minim.mdp -c my_7c8j_solv.gro -p topol.top -o em.tpr $ gmx mdrun -v -deffnm em # have a look at the potential energy $ gmx energy -f em.edr -o potential.xvg # copy JK’s nvt.mdp; run NVT MD for 50,000 steps -- this will take a few hours... $ gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr $ gmx mdrun -v -deffnm nvt
This system has about 60,000 atoms. I show a couple of views from VMD in Fig. 36. For such a large system, we won’t be able to run very long on a laptop; 100 ps takes about 3 hours on mine. Note that we’d typically want to simulate for hundreds of nanoseconds, which would be several thousand hours on my laptop, or a a day or so on a supercomputer. After about 30 ps (an hour), I went ahead and made a plot of the energies (Fig. 37).
The NAMD source code and many precompiled binaries are available from the official download site at the
University of Illinois. Like Gromacs, NAMD is also available in many supercomputing centers. Compiling NAMD
from source is not for the faint-hearted, so I won’t cover that here; fortunately for us, the standard
‘Linux-multicore‘ executable will work just fine in Ubuntu on WSL. (If you have a mac, you’ll have to download a
mac-specific executable.) For now, just download the tarball, extract it, and copy the namd2 executable to
/usr/local/bin/:
$ tar zxf NAMD_2.14_Linux-x86_64-multicore.tar.gz $ cd NAMD_2.14_Linux-x86_64-multicore/ $ sudo cp namd2 /usr/local/bin/
If you have not done so already, download the Linux version of VMD and install it inside WSL as well – we need to use that rather than the native Windows version here.
To build a system to simulation with NAMD with the CHARMM force field, one must use the psfgen utility of
VMD. This will generate an input PDB file along with a PSF file that contains all the topological information. I
cannot stress enough how important it is to use the psfgen manual; I’m only showing a little bit of
here.
Let’s revisit the SARS-CoV-2 S RBD for this example. Starting with the same coordinates extracted
from the PDB entry and put into my_7c8j.pdb, we use a series of VMD scripts to build a solvated
system.
First is mkpsf.tcl (it’s name is not important; what’s important is what is inside it). This script
contains TcL commands that implement PSF generation using VMD as an interpreter. To run
it:
$ vmd -dispdev text -e mkpsf.tcl
This generates my_7c8j.psf which is an X-PLOR Protein Sturcture Format (PSF) file, along with the PDB
file my_78cj_processed.pdb, which contains all the missing H’s. It is customary to run a short energy
minimization in vacuum just in case some of those H’s are a little too close to each other by accident. As with
Gromacs, this is done by invoking the main MD program with an input configuration file requesting a
minimization. Here, this is vac.namd. This can be run:
$ namd2 vac.namd >& vac.log & $ tail -f vac.log
The tail command allows you to watch the simulation write to vac.log as it runs. Once this finishes, the
next step is to solvate and neutralize. This is accomplished with another script, solv.tcl, which uses both the
solvate and autoionize VMD packages:
$ vmd -dispdev text -e solv.tcl
Similar to the editconf in Gromacs, solv.tcl computes and saves the resulting box dimensions, here in
a file called cell.inp. Once this is done, we see the new PSF/PDB pair my_7c8j_i.psf and
my_7c8j_i.pdb that contain the protein and all the solvent. These files, along with cell.inp
and the CHARMM parameter files are now inputs for a 100-ps NPT MD simulation described by
prod.namd.
$ namd2 +p8 prod.namd >& prod.log & $ tail -f prod.log
This takes about 1.5 h on my Dell Latitude 7400 2-in-1 Intel Core-I7 laptop on WSL2 running Ubuntu. I ran
this in a directory called instructional-codes/my_work/namd, so the directory originals is two up from
the run directory. Fig. 38 shows plots of energies, temperature, and pressure, vs time-step from this MD
simulation, generated by the following four python commands:
$ python3 ../../originals/plot_mdlj_log.py -fmt NAMD \ -logs prod.log -d 11 14 -ylabel "Temperature (K)" -o temp.png $ python3 ../../originals/plot_mdlj_log.py -fmt NAMD \ -logs prod.log -d 18 -ylabel "Pressure (bar)" -o pres.png \ -ylim -500 500 -every 100 $ python3 ../../originals/plot_mdlj_log.py -fmt NAMD \ -logs prod.log -d 9 10 12 -ylabel "Energy (kcal/mol)" \ -ylim -150000 100000 -o kin-pot-tot.png $ python3 ../../originals/plot_mdlj_log.py -fmt NAMD \ -logs prod.log -d 1 2 3 4 -ylabel "Potential energy (kcal/mol)" \ -o bonded.png
It should be stressed that system generation in both Gromacs and NAMD requires surmounting a significant learning curve if one wants to venture from only the simplest systems shown here. The Gromacs workflow is (to me) a bit more uniform than NAMD, which requires essentially a bunch of script-writing. (VMD offers an automated PSF builder that also works for simple systems.)