Thermodynamic ensembles¶
Different macroscopic constraints lead to different types of ensembles, with particular statistical characteristics:

Image by Nzjacobmartin - Own work, CC BY-SA 4.0, https://
\textbf{Microcanonical ensemble (or NVE ensemble):} The total energy of the system, its volume and the number of particles in the system are each fixed to particular values.

\textbf{Canonical ensemble (or NVT ensemble):} The energy is not known exactly but the number of particles is fixed. In place of the energy, the temperature is specified. The system is in weak thermal contact with other systems that are described by ensembles with the same temperature (heat bath).

\textbf{Isobaric-isothermal ensemble (or NPT ensemble):} The energy and volume are not known exactly but the number of particles is fixed. In place of the energy, the temperature is specified; in place of the volume, the pressure is specified. The system is in contact with a heat bath, as well as a “pressure bath”.
Thermodynamic results can be transformed between ensembles only for infinitely large systems (in the thermodynamic limit). For finite systems, it is therefore desirably to directly sample in the ensemble of interest:
Constant temperature: Explore behavior of system as a function of temperature (unfolding of a protein, glass formation), simulated annealing (gradual reduction of the temperature), etc
Constant temperature and pressure: Modeling of ambient conditions, where the pressure is also constant (which is true for most lab experiments)
What does constant temperature (or pressure) mean?¶
Temperature: On the molecular level, the temperature of a system is defined by the average kinetic energy of all the particles (atoms, molecules) that make up the system.
Statistical mechanical point of view: Impose a temperature on a system by bringing it into thermal contact with a large heat bath.
Then, the probability of finding the system in a given energy state is given by the Boltzmann distribution and the velocities are described by the Maxwell-Boltzmann distribution. Thus, the particles won’t all have the same velocity but the velocities will follow a distribution that depends on their mass and the temperature of the system.

Important! Constant temperature kinetic energy per particle is constant
I.e. the relative variance of the kinetic energy per particle in a canonical ensemble is NOT zero, and the instantaneous temperature (obtained via the kinetic energy per particle) fluctuates with a magnitude given by the heat capacity.
Velocity-rescaling (1971)¶
The temperature is related to the total kinetic energy:
To alter the temperature at timestep , we can therefore scale the velocities by a factor , which yields the change in temperature :
To change the current temperature to a desired temperature, we can therefore multiply the velocities at each time step by \[0.5cm]
Advantages
Simple, easy to understand
Straightforward to implement
Disadvantages
Discontinuities in the momentum part of the phase space trajectory
cannot fluctuate
Does not generate rigorous canonical averages
Artificially prolongs any temperature differences among the components of the system (since only the average temperature is controlled)
Berendsen thermostat (1984)¶
Couple the system to an external heat bath that is fixed to a desired temperature, which supplies or removes heat from the system. At each timestep, scale the velocities such that the rate of temperature change is proportional to the difference in temperature between the bath and the system:
with the coupling parameter (strong coupling for small ). For one timestep, we therefore get
with the scaling factor
Small : Berendsen = velocity-rescaling! Usually: 0.4ps for 1fs timestep ()
Advantages
Simple, easy to understand
Straightforward to implement
Much smaller discontinuities in the momentum part of the phase space trajectory than velocity-rescaling
can fluctuate (depending on the value of )
Disadvantages
Does not generate rigorous canonical averages
Artificially prolongs any temperature differences among the components of the system (since only the average temperature is controlled)
Stochastic rescaling (2007)¶
Extension of velocity-rescaling: Reformulate velocity-rescaling in terms of kinetic energy (instead of temperature):
Instead of forcing to be exactly , the target value is selected stochastically from the canonical equilibrium distribution of the kinetic energy
with being the nummber of degrees of freedom. The same procedure can be applied to the Berendsen thermostat which yields a smoother trajectory than the formulation above.
Advantages
Simple, easy to understand
Straightforward to implement
Generates correct canonical averages
can fluctuate
Disadvantages
Stochastic, i.e. trajectories cannot be reproduced exactly
Andersen thermostat (1980)¶
Stochastic collisions method where a particle is randomly chosen at intervals and its velocity reassigned by random selection from the Maxwell-Boltzmann distribution.
This is equivalent to a heat bath that randomly emits particles that collide with atoms of the system. Between each collision, the system is simulated with constant energy, so that we obtain a series of short NVE simulations (actually, a Markov chain).\[0.5cm]
The mean rate of collisions determines the strength of the coupling to the heat bath, where successive collisions occur at time intervals of the Poisson distribution
where is the probability that the next collision will take place between and . For one timestep, the probability of a collision is therefore . A too low causes the system to not sample from a canonical distribution. A too high leads to too little fluctuations in .
Advantages
Generates correct canonical averages
Disadvantages
Dynamic properties cannot be computed correctly, since the dynamics is not physical (randomly decorrelates velocities)
Nos'e-Hoover thermostat (1985)¶
Make the thermal reservoir an integral part of the system by introducing an additional degree of freedom, , with additional, artificial coordinates and velocities. The states of the simulated extended system (coordinates , momenta , time ) correspond to the states of the real system (, , time ):
Total energy then gets two additional terms:
Potential energy of the reservoir: where is the degrees of freedom of the physical system
Kinetic energy of the reservoir: with being the fictitious mass of the extra degree of freedom ( therefore influences the strength of the coupling).
We then simulate in the microcanonical ensemble for the extended system (which produces the canonical ensemble in the physical system). The real velocity is . We can thus think of the transformation as a timescaling by
Advantages
Generates correct canonical averages
One of the most accurate and efficient methods for constant-temperature MD
Disadvantages
Difficult to understand
More difficult to implement than other thermostats
can change, so that regular time intervals in the extended system correspond to a trajectory of the real system with unevenly spaced time intervals (but there is a method to arrive at evenly spaced intervals). The time evolution of depends on .
Only correct if there is no external force, and the center of mass remains fixed. Possible remedy: A chain of thermostats (Nos'e-Hoover chains with different masses)
Can exhibit non-ergodic or oscillatory behavior. Possible remedy: A chain of thermostats (Nos'e-Hoover chains with different masses)
Langevin dynamics¶
Temperature is roughly regulated through the addition of friction and random forces (simulates interaction with an artificial solvent) to the equations of motion. The friction coefficient determines how strong the system is coupled to the heat bath.
Friction term removes energy (like the frictional drag of the system moving through a solvent)
Random term adds energy (like collisions/interactions with solvent molecules)
Advantages
Allows for large timesteps
Disadvantages
Not deterministic
Changes dynamics (usually slows down dynamics)
Overview of thermostats¶
The optimal choice of thermostats depends on the properties of interest, as well as the details of the system:

Let us simulate a box of water using the program OpenMM, once with the Langevin thermostat, and once with the Nose-Hover thermostat to showcase this behavior:
from openmm.app import *
from openmm import *
from openmm.unit import *
import MDAnalysis as md
import MDAnalysis.transformations
import MDAnalysis.analysis
import MDAnalysis.analysis.msd as msd
from sys import stdout
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import linregress
import subprocessname = "water_tip3p_25_0"
subprocess.run(["wget", "https://raw.githubusercontent.com/hesther/teaching/main/scm_exercise3/"+name+".pdb"],stdout = subprocess.DEVNULL,stderr = subprocess.DEVNULL)
frictionCoeff = 1 / picosecond
time_step = 0.002 * picosecond
total_time = 50 * picosecond
equ_time = 10 * picosecond
total_steps = total_time / time_step
total_steps_eq = equ_time / time_step
temperature = 300 * kelvin
pdb = PDBFile("water_tip3p_25_0.pdb")
ff = ForceField('amber14/tip3p.xml')
system = ff.createSystem(pdb.topology, nonbondedMethod=PME, nonbondedCutoff=1*nanometer, constraints=HBonds)
integrator = NoseHooverIntegrator(temperature, frictionCoeff, time_step)
simulation = Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)
#Print progress to screen
simulation.reporters.append(StateDataReporter(stdout, 1000, step=True, potentialEnergy=True, kineticEnergy=True, totalEnergy=True,\
temperature=True, progress=True, volume=True, totalSteps = total_steps+total_steps_eq))
print('Equilibrating')
simulation.step(total_steps_eq)
print('Running NVT')
#Save trajectory to file
simulation.reporters.append(DCDReporter('test.dcd', 100))
simulation.step(total_steps) Equilibrating
#"Progress (%)","Step","Potential Energy (kJ/mole)","Kinetic Energy (kJ/mole)","Total Energy (kJ/mole)","Temperature (K)","Box Volume (nm^3)"
3.3%,1000,-21856.424644286493,2895.4010735348693,-18961.023570751626,231.92587806012386,14.987010811375008
6.7%,2000,-21146.143130833963,3436.9990605936628,-17709.1440702403,275.30867219263195,14.987010811375008
10.0%,3000,-20261.0069528173,3689.691113866752,-16571.315838950548,295.549676753229,14.987010811375008
13.3%,4000,-19894.61027198539,3891.1024081347537,-16003.507863850638,311.68301720865094,14.987010811375008
16.7%,5000,-20241.03531199203,3870.1722892011708,-16370.863022790858,310.0064839449371,14.987010811375008
Running NVT
20.0%,6000,-20158.366571957926,3578.107825626092,-16580.258746331834,286.6116914984068,14.987010811375008
23.3%,7000,-19989.221570546488,3724.5226900457906,-16264.698880500697,298.33973715742684,14.987010811375008
26.7%,8000,-19949.113346631388,3863.109629117341,-16086.003717514046,309.44075450028276,14.987010811375008
30.0%,9000,-20270.30658965812,3814.4194163307175,-16455.887173327403,305.54059695157906,14.987010811375008
33.3%,10000,-20135.437633092264,3666.108266977777,-16469.329366114485,293.66065608459826,14.987010811375008
36.7%,11000,-20105.28460031257,3928.3980454826587,-16176.88655482991,314.6704525310965,14.987010811375008
40.0%,12000,-20005.126865918497,3650.7929502138804,-16354.333915704618,292.4338767203497,14.987010811375008
43.3%,13000,-20133.169081265787,3689.64411794444,-16443.524963321346,295.5459123108355,14.987010811375008
46.7%,14000,-19975.323803479532,3784.03309172162,-16191.290711757913,303.10660772625175,14.987010811375008
50.0%,15000,-20088.734097297052,3775.8174525893505,-16312.916644707702,302.4485229665986,14.987010811375008
53.3%,16000,-20269.58948903785,3626.9193034224986,-16642.670185615352,290.5215625524791,14.987010811375008
56.7%,17000,-20158.688612515787,3722.929699121193,-16435.758913394595,298.212136244964,14.987010811375008
60.0%,18000,-19966.522676522593,3685.4969963569242,-16281.025680165669,295.21372178137887,14.987010811375008
63.3%,19000,-20108.304526860575,3862.9508534527536,-16245.35367340782,309.4280363363777,14.987010811375008
66.7%,20000,-20354.81327634559,3770.3583907853026,-16584.454885560288,302.011244099135,14.987010811375008
70.0%,21000,-20154.828950936655,3844.0561224792614,-16310.772828457393,307.91454063735637,14.987010811375008
73.3%,22000,-20152.49752813087,3789.728694230308,-16362.768833900562,303.56283385153125,14.987010811375008
76.7%,23000,-20134.804652268747,3647.294743576878,-16487.50990869187,292.1536652313997,14.987010811375008
80.0%,24000,-19880.754648740152,3770.1736711380727,-16110.58097760208,301.99644778412,14.987010811375008
83.3%,25000,-20046.44679656134,3699.073268196369,-16347.373528364971,296.30120109328715,14.987010811375008
86.7%,26000,-20135.94130163894,3667.777393633094,-16468.163908005845,293.79435558090165,14.987010811375008
90.0%,27000,-20014.16271978126,3915.5063742040525,-16098.656345577207,313.6378107294883,14.987010811375008
93.3%,28000,-20167.007158810953,3794.541639773972,-16372.465519036981,303.94835785767765,14.987010811375008
96.7%,29000,-20173.26461749778,3652.560923600787,-16520.703693896994,292.5754939850158,14.987010811375008
100.0%,30000,-19936.76081901298,3805.3661582262475,-16131.394660786733,304.81541768215897,14.987010811375008
u = md.Universe("water_tip3p_25_0.pdb", "test.dcd")
u_unwrapped = md.Universe('water_tip3p_25_0.pdb','test.dcd')
transformation = md.transformations.nojump.NoJump()
u_unwrapped.trajectory.add_transformations(transformation)
MSD = msd.EinsteinMSD(u_unwrapped, select='type O', msd_type='xyz', fft=True)
MSD.run()
#Fit to linear model and obtain the diffusion coefficient in 10^-9 m^2/s from the slope
start_index = int(0.1*len(MSD.times))
end_index = int(0.8*len(MSD.times))
linear_model = linregress(MSD.times[start_index:end_index], MSD.results.timeseries[start_index:end_index])
D = linear_model.slope * 1/(2*MSD.dim_fac) * 10
#Plot MSD and fit function
plt.plot(MSD.times,linear_model.slope*MSD.times+linear_model.intercept, label='linear fit')
plt.plot(MSD.times,MSD.results.timeseries, label='mean square displacement')
plt.axvline(MSD.times[start_index], linestyle='--',color='gray')
plt.axvline(MSD.times[end_index], linestyle='--',color='gray')
plt.xlabel("t (ps)")
plt.ylabel("MSD ($A^2$)")
plt.title("Diffusion coefficient = "+str(np.round(D, 2))+"$\cdot 10^{-9}$ m$^2$/s")
plt.legend()<>:22: SyntaxWarning: invalid escape sequence '\c'
<>:22: SyntaxWarning: invalid escape sequence '\c'
/tmp/ipykernel_88/1112675260.py:22: SyntaxWarning: invalid escape sequence '\c'
plt.title("Diffusion coefficient = "+str(np.round(D, 2))+"$\cdot 10^{-9}$ m$^2$/s")
/opt/conda/lib/python3.12/site-packages/MDAnalysis/coordinates/DCD.py:171: DeprecationWarning: DCDReader currently makes independent timesteps by copying self.ts while other readers update self.ts inplace. This behavior will be changed in 3.0 to be the same as other readers. Read more at https://github.com/MDAnalysis/mdanalysis/issues/3889 to learn if this change in behavior might affect you.
warnings.warn("DCDReader currently makes independent timesteps"
0%| | 0/501 [00:00<?, ?it/s]100%|██████████| 501/501 [00:00<00:00, 5876.02it/s]

Exercise: Repeat the simulation but choose the “LangevinIntegrator” instead of the “NoseHooverIntegrator”. Use the same settings for temperature, frictionCoeff, and time_step as above. Compute the diffusion coefficient. Do you still get the correct value? Why/why not?
Constant pressure¶
A macroscopic system maintains constant pressure by changing its volume. Similarly, a simulation maintains constant pressure by changing the volume of the simulation cell.
The magnitude of volume fluctuations is related to the isothermal compressibility , where an easily compressible substance undergoes larger pressure fluctuations (similar to the heat capacity being related to energy fluctuations).
Important: The magnitude of volume fluctuations magnitude of temperature fluctuations:
Ideal gas (atm), simulation in a cube with nm (nm) at 300K, the fluctuation in the volume is 18.1nm (fluctuation in the cell length of ~nm)! This is therefore not an appropriate simulation size!
Water (atm), simulation in a cube with nm (~nm) at 300K, the fluctuation in the volume is 0.121nm (fluctuation in the cell length of nm)
Many barostats work very similar to the thermostats already discussed:
Volume-scaling (not exactly NPT)
Berendsen-barostat 1984 (not exactly NPT): Coupling to “pressure bath”,
with the coupling parameter (strong coupling for small ), which determines the scaling factor scaling the volume and thus the atomic coordinates as
Stochastic-cell rescaling 2020: Like Berendsen, but additional stochastic term for the rescaling factor that causes the local pressure fluctuations to be correct for the NPT ensemble.
Andersen 1980/Nos'e-Hoover 1986 barostat: Add extra degree of freedom corresponding to the volume of the box (similar to Nos'e-Hoover thermostat), with potential energy and kinetic energy of the piston of with being the fictitious mass of the piston (small mass yields rapid oscillations). The coordinates in the extended system are then related to the real coordinates by
Monte-Carlo barostat 2002: Generate a random volume change from a uniform distribution followed by evaluation of the potential energy of the trial system. The volume move is then accepted with the standard Monte-Carlo probability.
Let us use the MonteCarloBarostat in OpenMM and inspect the volume fluctuations:
frictionCoeff = 1 / picosecond
time_step = 0.002 * picosecond
total_time = 50 * picosecond
total_steps = total_time / time_step
temperature = 300 * kelvin
pressure = 1*bar
pdb = PDBFile(name+".pdb")
ff = ForceField('amber14/tip3p.xml')
system = ff.createSystem(pdb.topology, nonbondedMethod=PME, nonbondedCutoff=1*nanometer, constraints=HBonds)
integrator = LangevinIntegrator(temperature, frictionCoeff, time_step)
simulation = Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)
barostat = MonteCarloBarostat(1*bar, temperature)
system.addForce(barostat)
simulation.context.reinitialize(preserveState=True)
print("Running NPT")
#Print volume to file
simulation.reporters.append(StateDataReporter('volume.dat', 50, volume=True))
#Print progress to screen
simulation.reporters.append(StateDataReporter(stdout, 1000, step=True, potentialEnergy=True, kineticEnergy=True, totalEnergy=True,\
temperature=True, progress=True, volume=True, totalSteps = total_steps))
simulation.step(total_steps)
volumes = np.loadtxt("volume.dat")
#Get volume from last 25% of data
L = (np.mean(volumes[int(len(volumes)*3/4):]))**(1/3)
N = pdb.topology.getNumResidues()
density = N * 18.015 / (6.022 * L**3 * 100)
plt.plot(volumes)
plt.axhline(L**3)
plt.xlabel("timestep (multiples of 0.1ps)")
plt.ylabel("Volume [nm^3]")
plt.title("Density = "+str(np.round(density,2))+"g/cm^3")Running NPT
#"Progress (%)","Step","Potential Energy (kJ/mole)","Kinetic Energy (kJ/mole)","Total Energy (kJ/mole)","Temperature (K)","Box Volume (nm^3)"
4.0%,1000,-20976.49071388003,3557.796674833462,-17418.694039046568,284.9847384917751,14.889543861077373
8.0%,2000,-20572.797009331574,3713.7273190147325,-16859.069690316843,297.47501208418987,14.990222929329795
12.0%,3000,-19959.2762203814,3698.66091431858,-16260.615306062822,296.26817094210986,15.157909589337772
16.0%,4000,-20121.62617528876,3671.60372013867,-16450.022455150087,294.100849953188,15.353527118811876
20.0%,5000,-20313.527235514328,3606.6774723517483,-16706.84976316258,288.9001621573701,15.338963449042733
24.0%,6000,-20246.974416434514,3714.4251176310195,-16532.549298803493,297.5309067781135,15.056768137295073
28.0%,7000,-20052.834073822534,3741.8413311700165,-16310.992742652517,299.72698574493967,15.13729478307794
32.0%,8000,-19997.55350858996,3616.616401375936,-16380.937107214024,289.69628496812044,14.925200511373337
36.0%,9000,-20009.76221139189,3677.6782512745895,-16332.083960117303,294.5874288179882,15.231744008131662
40.0%,10000,-20111.82541724439,3740.2808485353376,-16371.54456870905,299.6019887942396,15.53605086606831
44.0%,11000,-19963.22922607664,3692.475762140519,-16270.753463936122,295.77273116938056,15.51898949881053
48.0%,12000,-19849.523694441927,3634.503948914906,-16215.01974552702,291.12910379492735,15.263784222108496
52.0%,13000,-20063.364380818926,3811.845068289853,-16251.519312529073,305.33438789292137,15.142583734203644
56.0%,14000,-20227.556086682333,3622.3534020013626,-16605.20268468097,290.15582714334624,15.224511879678289
60.0%,15000,-20162.16205431321,3746.2159166650013,-16415.946137648207,300.07739646742885,14.924064861873626
64.0%,16000,-20216.6934605092,3708.094221256579,-16508.59923925262,297.02379268122013,15.104960781191435
68.0%,17000,-19959.920874538744,3877.4413865381966,-16082.479488000547,310.5887493167379,15.201708449569407
72.0%,18000,-19767.98920094813,3735.872386079661,-16032.11681486847,299.2488644774486,15.420782981060865
76.0%,19000,-19909.91083814908,3876.9707679048156,-16032.940070244265,310.5510520730724,15.433838415723763
80.0%,20000,-19644.91729890352,3625.7576003943122,-16019.15969850921,290.4285084338909,15.445554573657391
84.0%,21000,-20029.790395930857,3789.7953592284503,-16239.995036702407,303.568173815784,15.747807236295671
88.0%,22000,-19986.148885117524,3715.6030788695525,-16270.54580624797,297.62526320322667,15.58790969819987
92.0%,23000,-19793.34566112021,3738.2045344902813,-16055.141126629927,299.435672990039,15.556896818278029
96.0%,24000,-19895.46727241807,3932.5726315814363,-15962.894640836636,315.0048430082384,15.333583607407597
100.0%,25000,-19773.733144910075,3608.2623278111473,-16165.470817098927,289.0271114071232,15.319915445947444
