GROMACS Tutorial - TMD with NeqPCA

Introduction
System Building
- Generate Topology
- from Solvation to Equilibration
Create trajectories
PCA for TMD

Introduction

首先简单介绍一下TMD模拟，类似于SMD模拟（可以参考这篇教程），TMD 通过pull_coord1_type = constraint的设置，使用恒定拉力强制配体与蛋白质发生解结合，最终效果与SMD类似。

本教程基于 Multisecond ligand dissociation dynamics from atomistic simulations ¹ 第二个模拟体系 benzamidine-trypsin 编写，意在指导读者运行蛋白质-配体的模拟体系，以及 TMD（牵引动力学模拟）和后续 PCA 分析的进行。

[1] Wolf, S., Lickert, B., Bray, S. et al. Multisecond ligand dissociation dynamics from atomistic simulations. Nat Commun 11, 2918 (2020). https://doi.org/10.1038/s41467-020-16655-1

System Building

Generate Topology

首先我们需要获取 benzamidine-trypsin 晶体结构，前往RCSB下载3PTB的结构。随后将 trypsin 蛋白单独储存为 protein.pdb，将配体加氢后单独储存为 ligand.mol2（但是因为3PTB里面给的配体结构比较奇葩，加氢后氢的数量不一定对，因此建议前往PubChem下载3D结构的 .sdf 文件；然后手动对其到原配体位置后转存为 .mol2 文件，这一步可以在 PyMOL 里面完成，主要是为了后续合并文件时方便）。

对于配体文件：你可以通过你喜欢的方式获得它的参数，也可以按照这篇博文的方法完成，本文默认使用 Gaussain+Multiwfn+sobtop 先计算RESP后创建GAFF力场参数的方法创建配体参数。

计算RESP电荷，运行（<path_to_tut>是教程解压后的文件夹）：

cd <path_to_tut>/lig_para

chmod +x RESP.sh 
./RESP.sh ligand.mol2 0

单核运行大概需要50分钟，随后就会得到 ligand.chg，随后利用 sobtop 创建力场参数（<path_to_sobtop>是你安装sobtop的文件夹）：

cd <path_to_sobtop>
./sobtop

# 启动后输入：
<path_to_tut>/lig_para/ligand.mol2
2
<path_to_tut>/lig_para/ligand.gro
7
10
<path_to_tut>/lig_para/ligand.chg
0
1
2
4
<path_to_tut>/lig_para/ligand.top
<path_to_tut>/lig_para/ligand.itp

本教程使用的是 amber99sb-ildn 力场（当然你可以使用你喜欢而且合理的力场），但是由于我们新生成了配体的参数，需要将 ligand.itp 中 [ atomtypes ] 字段整合到力场的 ffnonbonded.itp 中（你只需要将他们剪切到 ffnonbonded.itp 中就行，但是本教程提供的文件中已经帮你整合好了，所以你只需要删除ligand.itp 中的 [ atomtypes ] 内容）。并且修改 [ moleculetype ] 字段下的名称为 MOL（为了方便后续操作），最终ligand.itp应该长这样：

; Created by Sobtop (http://sobereva.com/soft/sobtop) Version 1.0(dev3.1) on 2023-08-24

[ moleculetype ]
; name          nrexcl
MOL       3

[ atoms ]
;  Index   type   residue  resname   atom        cgnr     charge       mass
     1     nh         1      MOL     N             1   -0.98041913   14.006703
     2     n2         1      MOL     N             2   -0.94207571   14.006703
     3     ca         1      MOL     C             3   -0.00867452   12.010736
     4     ca         1      MOL     C             4   -0.15695061   12.010736
···

对于蛋白文件：进入相应文件夹：

cd <path_to_tut>/md
gmx pdb2gmx -f protein.pdb -ignh -o complex.gro
1  #选择当前文件夹内的那个力场，是我们整合之后的
1  #TIP3P 水模型

cp ../lig_para/ligand.gro ./
cp ../lig_para/ligand.itp ./

随后是整合结构文件：用文本编辑器打开 complex.gro 和 ligand.gro，将 ligand.gro 中所有记录原子坐标的行复制到 complex.gro 的末尾、盒子参数的上方，并且修改 complex.gro 第二行的总原子数；整合之后的 complex.gro 应该是（修改完之后最好查看一下结构是不是和晶体结构一致）：

Gravel Rubs Often Many Awfully Cauterized Sores
 3237
   16ILE      N    1  -0.810   0.960   2.031
   16ILE     H1    2  -0.807   1.055   2.001
   16ILE     H2    3  -0.893   0.943   2.084
···
  245ASN    OC1 3219   1.844   0.488   3.901
  245ASN    OC2 3220   1.733   0.669   3.934
    1MOL      N    1  -0.274   1.409   1.443
    1MOL      N    2  -0.186   1.219   1.547
    1MOL      C    3  -0.180   1.425   1.672
    1MOL      C    4  -0.210   1.561   1.678
    1MOL      C    5  -0.120   1.361   1.780
    1MOL      C    6  -0.179   1.634   1.793
    1MOL      C    7  -0.088   1.434   1.895
    1MOL      C    8  -0.213   1.348   1.552
    1MOL      C    9  -0.118   1.570   1.901
    1MOL      H   10  -0.257   1.615   1.597
    1MOL      H   11  -0.095   1.255   1.779
    1MOL      H   12  -0.202   1.740   1.798
    1MOL      H   13  -0.041   1.385   1.980
    1MOL      H   14  -0.093   1.627   1.991
    1MOL      H   15  -0.296   1.354   1.360
    1MOL      H   16  -0.299   1.507   1.438
    1MOL      H   17  -0.216   1.184   1.456
   5.48900   5.85200   6.76300

接着再修改拓扑文件：打开 topol.top ，在蛋白位置限制引用之后添加配体参数的引用

; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif

#include "ligand.itp"

#ifdef POSRES
#include "posre_ligand.itp"
#endif

; Include water topology
#include "./amber99sb-ildn_with_new_atoms.ff/tip3p.itp"

在末尾的 [ molecules ] 字段处添加 MOL 1

[ molecules ]
; Compound        #mols
Protein_chain_A     1
MOL   1

from Solvation to Equilibration

搭建盒子（这个盒子的尺寸和形状与原文献不同，所以后面加的离子数目也不同）、加水、加0.1M的NaCl：

gmx editconf -f complex.gro -o newbox.gro -center 3 3 3 -box 6.0 7.5 6.0
gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr
gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral -conc 0.1
13 #选 SOL

最小化：

gmx grompp -f em.mdp -c solv_ions.gro -p topol.top -o em.tpr
gmx mdrun -v -deffnm em

生成索引文件和配体的位置限制：

gmx make_ndx -f ligand.gro -o index_lig.ndx
...
 > 0 & ! a H*
 > q

gmx genrestr -f ligand.gro -n index_lig.ndx -o posre_ligand.itp -fc 1000 1000 1000

gmx make_ndx -f em.gro -o index.ndx
...
 > 1 | 13 #蛋白和配体和为一组
 > q

NPT预平衡：

gmx grompp -f npt.mdp -c em.gro -r em.gro -p topol.top -n index.ndx -o npt.tpr
gmx mdrun -v -deffnm npt

Create trajectories

因为我们要进行PCA分析，所以要获得大量的 TMD 轨迹（通常是100个以上），使用如下脚本进行50次重复的TMD模拟，其中的 TMD.mdp 文件的牵引设置如下

; Pull code
pull                    = yes
pull_ncoords            = 1         ; only one reaction coordinate
pull_ngroups            = 2         ; two groups defining one reaction coordinate
pull_group1_name        = MOL
pull_group2_name        = Pull2
pull_group2_pbcatom     = 1708
pull-pbc-ref-prev-step-com = yes
pull_coord1_type        = constraint  ; tmd
pull_coord1_geometry    = distance  ; simple distance increase
pull_coord1_dim         = N Y N
pull_coord1_groups      = 1 2
pull_coord1_start       = yes       ; define initial COM distance > 0
pull_coord1_rate        = 0.001       ; 0.001 nm per ps = 1 nm per ns 
pull_coord1_k           = 100      ; kJ mol^-1 nm^-2

批量TMD运行的脚本tmd.sh内容如下：

mkdir tmd

for i in {000..049}
do
gmx grompp -f TMD.mdp -c npt.gro -p topol.top -r npt.gro -n index.ndx -t npt.cpt -o tmd/tmd_"$i".tpr -maxwarn 1 
gmx mdrun -v -deffnm tmd/tmd_"$i"
done

将所有TMD轨迹文件提取 Protein_MOL 整合进一个 .xtc 文件（为了方便后续计算和分析）：

cd tmd/
mkdir PCA

gmx trjcat -f *.xtc -o ./PCA/tmd.xtc -cat -n ../index.ndx

PCA for TMD

随后生成一个只包含 Protein_MOL 的.tpr 文件，这是为了下一步原子数匹配。然后消除轨迹平移和旋转：

cd PCA/
gmx convert-tpr -s ../tmd_0.tpr -n ../index.ndx -o tmd_k100.tpr

gmx trjconv -s tmd.tpr -f tmd.xtc -o tmd_nopbc.xtc -pbc mol
gmx trjconv -s tmd_k100.tpr -f tmd_k100_nopbc.xtc -o tmd_k100_done.xtc -fit rot+trans

接着利用gmx covar对轨迹进行协方差矩阵和本征向量的计算：

gmx covar -s tmd_k100.tpr -f tmd_k100_done.xtc -o tmd_k100_eigenvalues.xvg -v tmd_k100_eigenvectors.trr -xpma tmd_k100_covapic.xpm

gmx anaeig -s tmd_k100.tpr -f tmd_k100_done.xtc -v tmd_k100_eigenvectors.trr -first 1 -last 1 -proj tmd_k100_pc1.xvg  
gmx anaeig -s tmd_k100.tpr -f tmd_k100_done.xtc -v tmd_k100_eigenvectors.trr -first 2 -last 2 -proj tmd_k100_pc2.xvg
gmx anaeig -s tmd_k100.tpr -f tmd_k100_done.xtc -v tmd_k100_eigenvectors.trr -first 1 -last 2 -2d tmd_k100_2d12.xvg

其中eigenvalues.xvg里面记录了各个本征值的占比，eigenvectors.trr储存了投影到本征向量之后的坐标。然后使用gmx anaeig对轨迹进行投影，下面三个命令依次获得 PC1-time, PC2-time, PC1-PC2 的数据

gmx anaeig -s tmd_k100.tpr -f tmd_k100_done.xtc -v tmd_k100_eigenvectors.trr -first 1 -last 1 -proj tmd_k100_pc1.xvg  
gmx anaeig -s tmd_k100.tpr -f tmd_k100_done.xtc -v tmd_k100_eigenvectors.trr -first 2 -last 2 -proj tmd_k100_pc2.xvg
gmx anaeig -s tmd_k100.tpr -f tmd_k100_done.xtc -v tmd_k100_eigenvectors.trr -first 1 -last 2 -2d tmd_k100_2d12.xvg

（GMX 2023 的gmx anaeig似乎有点问题，我这边一用就Segmentation fault，有空再用低版本跑一个（懒了（最终效果图应当是这样的：

[1] Wolf, S., Lickert, B., Bray, S. et al. Multisecond ligand dissociation dynamics from atomistic simulations. Nat Commun 11, 2918 (2020). https://doi.org/10.1038/s41467-020-16655-1

[2] Yang Z, Zhou N, Jiang X, Wang L. Loop Evolutionary Patterns Shape Catalytic Efficiency of TRI101/201 for Trichothecenes: Insights into Protein-Substrate Interactions. J Chem Inf Model. 2023;63(20):6316-6331. doi:10.1021/acs.jcim.3c00787