November 17, 2008
Dr. Tatiana Tatusova, NCBI/NLM/NIH
New Protein Clusters database at the National Center for Biotechnology Information
Rapid increases in DNA sequencing capabilities have led to a vast increase in the data generated from prokaryotic genomic studies, which
has been a boon to scientists studying microorganism evolution and to those who wish to understand the biological underpinnings of microbial
systems. The NCBI Protein Clusters Database (ProtClustDB) was created to manage this deluge of data efficiently and keep it up to date.
ProtClustDB contains both curated and uncurated clusters of proteins grouped by sequence similarity. The May 2008 release contains a total
of 285386 clusters derived from over 1.7 million proteins encoded by 3806 nucleotide sequences from the RefSeq collection of complete
chromosomes and plasmids from four major groups: prokaryotes, bacteriophages, and the mitochondrial and chloroplast organelles.
Of these, 7180 clusters containing 376513 proteins have curated gene and protein functional annotation. PubMed identifiers and external
cross references are collected for all clusters and provide additional information resources. A suite of web tools is available to explore
more detailed information, such as multiple alignments, phylogenetic trees, and genomic neighborhoods. ProtClustDB provides an efficient
method to aggregate gene and protein annotation for researchers and is available at the following URL.