Towards Solving the Inverse Protein Folding Problem
Accurately assigning folds to divergent protein sequences remains a major
obstacle to structural studies and underlies the inverse protein folding
problem. Here, we outline our theories of fold recognition in the
"twilight zone" of sequence similarity (<25% identity). Our analyses
demonstrate that structural sequence profiles built using Position-Specific
Scoring Matrices (PSSMs) significantly outperform multiple popular
homology-modeling algorithms for relating and predicting structures given only
their amino acid sequences. Importantly, structural sequence profiles
reconstitute SCOP fold classifications in control and test datasets. Results
from our experiments suggest that structural sequence profiles can be used to
rapidly annotate protein folds at proteomic scales. We propose that encoding
the entire Protein Data Bank (~1070 folds) into structural sequence profiles
would extract interoperable information capable of improving most, if not all,
methods of structural modeling.
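
To make the notion of a PSSM-based profile concrete, the following is a minimal sketch of how a position-specific scoring matrix can be built from a small gap-free alignment and used to score a query sequence. It is not the method used in this work: the toy alignment, the uniform background frequency, the pseudocount value, and the function names (`build_pssm`, `score_window`, `best_match`) are all assumptions chosen for illustration.

```python
import math

# The 20 standard amino acids; any other symbol (e.g. 'X' or a gap)
# is not handled by this sketch and would raise a KeyError.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, gap-free aligned sequences.

    Assumes a uniform background frequency (1/20 per residue), which is a
    simplification; real profiles usually use empirical backgrounds.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Start every residue at the pseudocount so unseen residues
        # receive a finite (negative) score instead of minus infinity.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned_seqs:
            counts[seq[col]] += 1.0
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS}
        )
    return pssm


def score_window(pssm, window):
    """Sum the per-position log-odds scores for a window as wide as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def best_match(pssm, query):
    """Slide the PSSM along the query; return (best_score, best_offset)."""
    width = len(pssm)
    if len(query) < width:
        raise ValueError("query is shorter than the profile")
    return max(
        (score_window(pssm, query[i:i + width]), i)
        for i in range(len(query) - width + 1)
    )


# Hypothetical toy alignment, not data from the paper.
toy_alignment = ["ACDE", "ACDE", "ACDF"]
toy_pssm = build_pssm(toy_alignment)
best_score, best_offset = best_match(toy_pssm, "GGACDEGG")
```

In this sketch, a query window that matches the conserved columns of the alignment receives a positive log-odds score, while unrelated windows score negatively. Comparing such scores across many profiles is one simple way a sequence could be related to a fold, although a practical implementation would need gapped alignment, sequence weighting, and a calibrated background model.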