Application of the Monte Carlo Method for Searching for the Possible Reading Frameshifts in Genes
Rudenko V.M., Korotkov E.V.
Bioengineering Center of Russian Academy of Sciences, Moscow, 117312, Russia
NRNU MEPhI, Moscow, 115409, Russia
v.m.rudenko@gmail.com
Abstract. In this article, we present a method for searching for possible reading frameshifts in genes based on detecting change points in the triplet frequency distribution. Statistical significance was estimated by the Monte Carlo method. The correctness of the method was demonstrated by applying it to DNA sequences with artificially introduced indels. The method was then applied to search for change points in DNA sequences from the KEGG GENES databank. More than 140,000 genes with change points were revealed at a significance level of 6%. We classified the sequences containing change points according to their description field in KEGG GENES. Many of them turned out to be pseudogenes or had previously been annotated as containing frameshifts. In addition, change points were detected in many genes encoding PE-PGRS proteins, cation channel family proteins, PPE family proteins and others. The relationship between change points and reading frameshifts in genes is discussed.
Key words: DNA sequence, reading frame, reading frameshift, change point, Monte Carlo method.
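The abstract names only the ingredients of the method: a change point in the triplet frequency distribution and a significance estimate obtained by Monte Carlo simulation. The following is a minimal sketch of that general idea under assumptions that are not necessarily the authors' own. The sequence is split into codons in a single reading frame, every candidate split is scored with a likelihood-ratio (G) statistic comparing the triplet counts on either side, and a p-value is estimated by recomputing the maximum score on codon-shuffled copies of the sequence. The function names, the G statistic, the `min_len` guard and the codon-shuffling null model are all illustrative choices.

```python
import math
import random
from collections import Counter


def codons(seq):
    """Split seq into non-overlapping triplets in frame 0; trailing bases are dropped."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def g_statistic(left, right):
    """Likelihood-ratio (G) statistic for 'left and right share one triplet distribution'."""
    cl, cr = Counter(left), Counter(right)
    nl, nr = len(left), len(right)
    n = nl + nr
    g = 0.0
    for t in set(cl) | set(cr):
        total = cl[t] + cr[t]
        for observed, size in ((cl[t], nl), (cr[t], nr)):
            if observed:  # 0 * log(0) is taken as 0
                expected = total * size / n
                g += observed * math.log(observed / expected)
    return 2.0 * g


def best_split(cods, min_len):
    """Return (codon index, G) of the split that maximises G (illustrative helper)."""
    best_k, best_g = None, -1.0
    for k in range(min_len, len(cods) - min_len + 1):
        g = g_statistic(cods[:k], cods[k:])
        if g > best_g:
            best_k, best_g = k, g
    return best_k, best_g


def change_point_test(seq, n_trials=200, min_len=10, seed=0):
    """Return (split position in nucleotides, observed max G, Monte Carlo p-value)."""
    cods = codons(seq)
    if len(cods) < 2 * min_len:
        raise ValueError("sequence too short for the requested min_len")
    k, g_obs = best_split(cods, min_len)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_trials):
        shuffled = cods[:]
        rng.shuffle(shuffled)  # keeps triplet composition, destroys positional order
        if best_split(shuffled, min_len)[1] >= g_obs:
            exceed += 1
    # Add-one estimate so the Monte Carlo p-value is never exactly zero.
    p_value = (exceed + 1) / (n_trials + 1)
    return 3 * k, g_obs, p_value


if __name__ == "__main__":
    rng = random.Random(1)
    left = "".join(rng.choice(["ATG", "GCT", "GAA"]) for _ in range(60))
    right = "".join(rng.choice(["TTT", "CCC", "GGG"]) for _ in range(60))
    print(change_point_test(left + right, n_trials=100))
```

In the usage example, the two halves use disjoint codon sets, so the strongest split falls at nucleotide 180 and almost no shuffled copy reaches the observed score. Shuffling whole codons, rather than single nucleotides, is used here so that the null model keeps the overall triplet composition and removes only positional order. A model aimed specifically at frameshifts might instead compare reading-frame phases on either side of the split.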