Validation and Integration of Protein 3D Structure Information

Haruki Nakamura
PDBj, Institute for Protein Research
Osaka University

 Many international Structural Genomics/Structural Proteomics projects have made it possible to determine protein three-dimensional (3D) structures much more rapidly than before. The number of structures deposited in the Protein Data Bank (PDB) has increased significantly since 1990, and as of August 2007 the PDB held more than 45,000 experimentally determined protein 3D structures. At the Institute for Protein Research, we founded a new organization, PDB Japan (PDBj), in 2001, and organized the worldwide PDB (wwPDB) in collaboration with the RCSB (Research Collaboratory for Structural Bioinformatics) PDB, run by Rutgers, the State University of New Jersey, and UCSD, and with the MSD (Macromolecular Structure Database) at the EBI (European Bioinformatics Institute) [1, 2]. PDBj curates, edits, and processes about 25 to 30% of the data deposited worldwide.
 Because the wwPDB, as a databank, builds its archive from voluntary depositions, quality control of the deposited data is one of its most important issues. To maintain the precision and correctness of the experimental data, the wwPDB will require all depositors to submit the structure factors for crystallographic data and the distance restraints for NMR data, both of which are obtained during structure determination. In addition, we are attempting to refine the deposited structures independently of the original structures. To control the quality of the data description, on the other hand, it is effective to validate the descriptions in the new XML database, which uses the canonical XML format PDBML [3].
 On August 1, 2007, the wwPDB switched all of its data formats (PDB flat format, mmCIF, and PDBML) to new versions, which included changing the atom nomenclature to one based on IUPAC. On this occasion, the validation procedures worked well thanks to the collaboration among the wwPDB members, and this major change was made smoothly.
 In this lecture, the actual quality control procedures will be introduced together with our secondary databases and analysis tools, which supplement the structures with results from structural bioinformatics and computational chemistry analyses and with other biological data from the literature and other databases [4-8].

References
[1] H. Berman, K. Henrick, H. Nakamura, Announcing the worldwide Protein Data Bank. Nature Struct. Biol. 10, 980-980 (2003).
[2] H. Berman, K. Henrick, H. Nakamura, J. L. Markley, The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucl. Acids Res. 35, D301-D303 (2007).
[3] J. Westbrook, N. Ito, H. Nakamura, K. Henrick, H. M. Berman, PDBML: The representation of archival macromolecular structure data in XML. Bioinformatics, 21, 988-992 (2005).
[4] H. Wako, M. Kato, S. Endo, ProMode: a database of normal mode analyses on protein molecules with a full-atom model. Bioinformatics, 20, 2035 (2004).
[5] K. Kinoshita, H. Nakamura, eF-site and PDBjViewer: database and viewer for protein functional sites. Bioinformatics, 20, 1329-1330 (2004).
[6] D. M. Standley, H. Toh, H. Nakamura, Detecting local structural similarity in proteins by maximizing number of equivalent residues. PROTEINS, 57, 381-391 (2004).
[7] D. M. Standley, H. Toh, H. Nakamura, GASH: An improved algorithm for maximizing the number of equivalent residues between two protein structures. BMC Bioinformatics, 6, 221 (2005).
[8] K. Kinoshita, Y. Murakami, H. Nakamura, eF-seek: prediction of the functional sites of proteins by searching for similar electrostatic potential and molecular surface shape. Nucl. Acids Res. 35, W398-W402 (2007).