Investigating Protein Structure Populations from Simulation Data using Unsupervised Learning
Date
2022-02
Publisher
IEEE
Abstract
Data obtained from molecular dynamics simulations
provide important insight into the dynamical interactions of
biological molecules. A simulation records the sequential,
time-dependent atomic motions of a molecule's configurations, and
the properties derived from the trajectory are determined by
this sequence. Efficiently extracting representative structures
from simulation data is therefore important, because we often
want to identify changes in the conformation of a protein over
the course of a simulation. We use unsupervised machine learning
techniques to cluster such data and investigate several protein
structural properties. The algorithms implemented in this paper
produce clusters that tend to group frames from adjacent blocks
of time together, even when sampling at 10 ps intervals. We found
that a shorter simulation may not sample conformational space
well enough to visit all structures that belong to a specific
cluster, whereas in a sufficiently long simulation the system
revisits previous clusters repeatedly.
Cluster populations change rapidly in the initial stage of the
simulations but level off as they approach their terminal
values, indicating that equilibrium has been attained. Investigation
of protein structure properties also attests to the correspondence
between the clusters of protein structures obtained from the
clustering algorithms.
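
The two trajectory-level observations in the abstract (populations that level off, and clusters that are revisited) can be illustrated with a minimal sketch. It assumes each frame has already been assigned an integer cluster label by some clustering method; the abstract does not name the algorithms used, so the labels below are hypothetical and are not data from the paper. The helper names `running_populations` and `visit_counts` are likewise illustrative.

```python
from collections import Counter
from itertools import groupby


def running_populations(labels, n_clusters):
    """Return the fraction of frames in each cluster after every frame."""
    counts = [0] * n_clusters
    history = []
    for n_seen, label in enumerate(labels, start=1):
        counts[label] += 1
        history.append([c / n_seen for c in counts])
    return history


def visit_counts(labels):
    """Count separate contiguous visits the trajectory makes to each cluster."""
    # groupby collapses each run of identical consecutive labels into one item.
    return Counter(label for label, _ in groupby(labels))


# Hypothetical cluster labels for 12 consecutive frames (assumed, not from the paper).
labels = [0, 0, 0, 1, 1, 0, 0, 2, 2, 1, 1, 0]

history = running_populations(labels, n_clusters=3)
visits = visit_counts(labels)

print(history[-1])   # final fractional population of each cluster
print(dict(visits))  # number of distinct visits to each cluster
```

A running population that stops changing over later frames is one simple indicator of the steady behavior described above, and a visit count greater than one shows that the trajectory returned to a cluster after leaving it.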
Description
This article was published by IEEE in 2022 and is also available at DOI 10.1109/CSCI58124.2022.00199
Citation
2022 International Conference on Computational Science and Computational Intelligence (CSCI)