Search for Megasatellite Tandem Repeats in Eukaryotic Genomes by Estimation of GC-content Curve Oscillations
Tetuev R.K., Nazipova N.N., Pankratov A.N., Dedus F.F.
Institute of Mathematical Problems of Biology RAS
M. V. Lomonosov Moscow State University, Faculty of Computational Mathematics and Cybernetics
Abstract. An efficient method has been developed for recognizing sites of extended approximate tandem segmental duplications (over 1000 bp long) in the genomes of higher eukaryotes. The method scans a genome in multiple passes with a sliding window whose length takes successive powers of 2, starting with 256. For each window the percentage GC-content is calculated, and the sequence of these values defines the GC-profile. Software has been developed that identifies regions of stable oscillation in the GC-profile and determines the basic characteristics of the significant periodicity underlying these oscillations. The new method combines numerical and analytical approaches and can yield interesting findings. Some results of the ongoing work are presented.
Key words: fuzzy tandem repeats, tandem segmental duplications, genomes of eukaryotes, GC-content curve oscillations, megasatellite sequences in the mouse genome.
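
The multi-pass GC-profile computation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software. The function names gc_profile and multiscale_gc_profiles, the step parameter, and the default of non-overlapping windows are assumptions made for illustration only; the abstract specifies only the window lengths (powers of 2 starting with 256) and the per-window GC percentage.

```python
from typing import Dict, List, Optional


def gc_profile(seq: str, window: int, step: Optional[int] = None) -> List[float]:
    """Return the GC percentage of each full window of length `window`.

    `step` is a hypothetical parameter; by default windows do not overlap.
    """
    if step is None:
        step = window  # assumption: the source does not state the step size
    seq = seq.upper()
    # Prefix sums of G/C counts, so each window is evaluated in O(1).
    prefix = [0]
    for base in seq:
        prefix.append(prefix[-1] + (base in "GC"))
    return [
        100.0 * (prefix[i + window] - prefix[i]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


def multiscale_gc_profiles(seq: str, start: int = 256, passes: int = 3) -> Dict[int, List[float]]:
    """One GC-profile per pass, with window lengths start, 2*start, 4*start, ..."""
    return {start << k: gc_profile(seq, start << k) for k in range(passes)}


if __name__ == "__main__":
    # Toy sequence: 256 bases of pure GC followed by 256 bases of pure AT.
    toy = "GC" * 128 + "AT" * 128
    print(multiscale_gc_profiles(toy, passes=2))
    # {256: [100.0, 0.0], 512: [50.0]}
```

Detecting stable oscillations and estimating their period would operate on the resulting profiles; that stage is not sketched here because the abstract does not describe how it is done.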