- Data file: a list of (PDB ID, UniProt AC) pairs
- Obtained from:
  - the PDBbind "v2007 refined set", and
  - PDB annotations of UniProt ACs, used to obtain the proteins' full sequences.
- Note:
1,300 protein-ligand structures in the PDB are carefully selected as the "refined set (version 2007)" in the PDBbind database. This set is designed to serve as a high-quality standard data set for theoretical studies of protein-ligand binding. Because PDB entries often contain only the sequence of the structurally determined part of a protein, we also need the full amino-acid sequences of the targets so that their bit patterns are comparable. We manually checked the corresponding UniProt IDs in the PDB annotations and retrieved UniProt sequences for 1,274 of the 1,300 pairs. Excluding 22 pairs found among the DrugBank drug-target pairs left 1,252 pairs.
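A minimal sketch of loading the pair list and applying an exclusion step, for readers who want to reproduce the filtering. It assumes the data file has one whitespace-separated (PDB ID, UniProt AC) pair per line. The function names `parse_pairs` and `exclude_pairs`, the case normalization, and the comment-line handling are all hypothetical. The source does not describe how the DrugBank overlap was matched, so the sketch simply assumes the pairs to exclude are already expressed as the same (PDB ID, UniProt AC) keys.

```python
from typing import Iterable, List, Set, Tuple

# One record: (PDB ID, UniProt AC)
Pair = Tuple[str, str]


def parse_pairs(lines: Iterable[str]) -> List[Pair]:
    """Parse pairs, assuming one whitespace-separated pair per line (hypothetical format)."""
    pairs: List[Pair] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"expected 'PDB_ID UniProt_AC', got: {line!r}")
        pdb_id, uniprot_ac = fields[0], fields[1]
        # Assumed normalization: PDB IDs lowercase, UniProt ACs uppercase
        pairs.append((pdb_id.lower(), uniprot_ac.upper()))
    return pairs


def exclude_pairs(pairs: List[Pair], excluded: Set[Pair]) -> List[Pair]:
    """Drop pairs present in `excluded`, keeping the original order."""
    return [p for p in pairs if p not in excluded]
```

With the counts in the note, this step would take the 1,274 pairs that have UniProt sequences and, after removing the 22 DrugBank overlaps, leave 1,274 - 22 = 1,252 pairs.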