Education and Research Organization for Genome Information Science: Abstract
Date July 23, 2010
Speaker Professor Ernst-Walter Knapp, Institute of Chemistry & Biochemistry, Free University of Berlin, Germany
Title SPARROW, a protein secondary structure predictor: variations on an old theme
Abstract Little is happening today on this problem, in part certainly because of the tremendous success of the neural network ‘PSIPRED’ from David Jones. The main innovation of PSIPRED was to use sequence profiles of a polypeptide sequence rather than the sequence itself. PSIPRED was first established in 1999 [Jones DT ‘Protein secondary structure prediction based on position-specific scoring matrices’ J. Mol. Biol. 292 (1999) 195-202]. It now achieves an average accuracy of 82% for three secondary structure classes. This performance is difficult to reach, let alone to surpass. Hence, many attempts by other researchers may have remained unknown because they were not successful enough to be published. Another reason that may have kept researchers from reconsidering the problem could be the belief that one is already close to the limit of what can be predicted from protein sequences alone. Although very little theoretical development is visible, the demand for reliable protein secondary structure predictors among researchers working in structural biology (crystallographers and NMR spectroscopists) has increased enormously. Hence, we considered it worthwhile to tackle the problem one more time.

I start with a survey of what has been done so far in predicting protein secondary structure. Then I will introduce our machine learning method ‘SPARROW’, which is based on scoring functions that are linear in parameter space and quadratic in sequence or structure space, respectively. We used two different approaches. The first approach considers the three-class problem (α-helix, β-strand, and coil or anything else). It is based on a three-level hierarchy with three scalar scoring functions per level. Each scoring function discriminates one class from the other two. At the first hierarchy level the scoring functions use sequence profiles as input, thus establishing sequence-structure correlations. The scoring functions of the second level take as input the results of the first-level scoring functions within a sequence window, thus describing structure-structure correlations. As a final step we used a neural network similar to the one in the PSIPRED approach. The second approach uses a vector-valued scoring function, which can tackle an arbitrary multi-class problem directly. We apply it not only to the three-class problem, but also to the eight secondary structure classes of the DSSP program from Kabsch and Sander. Again we use three hierarchy levels as in the first approach. With these methods we succeeded in being as good as PSIPRED in predicting secondary structures, with the added benefit that the more general multi-class problem is solved as well.
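
To make the phrase "linear in parameter space and quadratic in sequence space" concrete, the following is a minimal sketch of one such scalar scoring function and of a one-against-the-rest decision among three classes. It is an illustration under stated assumptions, not the SPARROW implementation: the feature expansion (constant, linear and all pairwise product terms), the toy window with two profile values per position (real profiles have about 20 columns per residue), the class labels and the names quadratic_features, score and predict_class are all hypothetical. How the weights are trained is not covered here.

```python
from itertools import combinations_with_replacement

# Hypothetical labels for the three-class problem: helix, strand, coil/other.
CLASSES = ("H", "E", "C")


def quadratic_features(window):
    """Expand a profile window into constant, linear and pairwise product terms.

    `window` is a list of profile columns (one list of numbers per residue).
    The expansion is an assumption made for illustration only.
    """
    x = [value for column in window for value in column]
    features = [1.0]                      # constant term
    features.extend(x)                    # linear terms
    features.extend(                      # quadratic terms x_i * x_j, i <= j
        x[i] * x[j]
        for i, j in combinations_with_replacement(range(len(x)), 2)
    )
    return features


def score(weights, window):
    """Scalar score: a dot product, hence linear in the weights
    and quadratic in the profile entries of the window."""
    features = quadratic_features(window)
    if len(weights) != len(features):
        raise ValueError("weight vector does not match feature vector length")
    return sum(w * f for w, f in zip(weights, features))


def predict_class(weights_by_class, window):
    """One scoring function per class; the class with the highest score wins."""
    scores = {c: score(weights_by_class[c], window) for c in CLASSES}
    best = max(scores, key=scores.get)
    return best, scores


# Toy usage: a window of 2 residues with 2 profile values each.
toy_window = [[0.5, 0.5], [1.0, 0.0]]
n_features = len(quadratic_features(toy_window))
toy_weights = {
    "H": [1.0] * n_features,
    "E": [0.0] * n_features,
    "C": [-1.0] * n_features,
}
predicted, class_scores = predict_class(toy_weights, toy_window)
```

Because the score is a plain dot product between weights and expanded features, doubling the weights doubles the score, which is what linearity in parameter space means here. A second hierarchy level in the sense of the abstract would apply the same construction to a window of first-level scores instead of profile columns.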