Rank-Scaled Metric Clustering of Amino-Acid Sequences
Strijov V.V., Kuznetsov M.P., Rudakov K.V.
Dorodnicyn Computing Centre of the Russian Academy of Sciences, Moscow, 19333, Russia
Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141700, Russia
strijov@ccas.ru
mikhail.kuznecov@phystech.edu
Abstract. To address the problem of protein secondary structure recognition, an algorithm for clustering amino-acid subsequences is developed. The algorithm reveals clusters using the pairwise distances between the subsequences. Its main distinction is that it does not require the complete pairwise distance matrix, which reduces the computational complexity. To run the clustering, it needs only the ranks of the distances between subsequences. The algorithm is illustrated on synthetic data and on amino-acid sequences from the UniProt KB Database.
Key words: clustering, distance function, pairwise distance matrix, metric configuration, rank scale.