Structure comparisons of all representative proteins have been carried out using a dynamic-programming-based structure alignment program. Use of the relative root mean square deviation (RMSD) (Betancourt & Skolnick) makes it possible to assess the statistical significance of structure alignments of different lengths in terms of a Z-score. Two conclusions emerge from this analysis. First, proteins of the native fold can be distinguished by their Z-score, and the accuracy improves when the Z-score gap between natively folded proteins and the rest is also taken into account. Second, and somewhat surprisingly, all small proteins up to 100 residues in length have significant structure alignments to other proteins in a different fold class; on average, the best alignment as assessed by the Z-score has an RMSD of 3.8 Å and covers 86.4% of the query protein's length. In this sense, the current PDB is almost a covering set of small protein structures. The length of the aligned region (relative to the whole protein length) does not differ among the top hit proteins, indicating that protein structure space is highly dense. For larger proteins, non-related proteins can cover a significant portion of the structure. Moreover, these top hit proteins are aligned to different parts of the query protein, so that almost the entire molecule can be covered when they are combined. The number of proteins required to cover a query protein is very small; e.g., the top 10 hit proteins can give 90% coverage for proteins up to 320 residues long. These results give a new view of the nature of protein structure space, and their implications for protein structure prediction will be discussed.
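
The combined-coverage figure quoted above (the fraction of a query protein aligned by at least one of its top hits) can be illustrated with a minimal sketch. This is not the authors' program: the function name `combined_coverage`, the 0-based residue indexing, and the toy alignments are assumptions made purely for illustration, and the hits are assumed to be already ranked by Z-score.

```python
def combined_coverage(query_length, hit_alignments, top_k=10):
    """Fraction of query residues aligned by at least one of the first top_k hits.

    hit_alignments is a hypothetical input: one iterable of 0-based query
    residue indices per hit protein, assumed to be sorted best hit first.
    """
    covered = set()
    for aligned_positions in hit_alignments[:top_k]:
        # Keep only indices that fall inside the query protein.
        covered.update(p for p in aligned_positions if 0 <= p < query_length)
    return len(covered) / query_length


# Toy data: three hits aligned to different parts of a 100-residue query.
toy_hits = [range(0, 60), range(40, 90), range(85, 100)]
print(combined_coverage(100, toy_hits, top_k=1))  # 0.6
print(combined_coverage(100, toy_hits, top_k=2))  # 0.9
print(combined_coverage(100, toy_hits))           # 1.0
```

Taking the union of aligned positions, rather than summing alignment lengths, keeps overlapping hits from being counted twice; this matters here because the hits are reported to align to different, partly overlapping parts of the query.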