Volume 11   Issue 1   Year 2016
Number of Overlaps in Patterns

Furletova E.I., Roytberg M.A.

Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
Higher School of Economics, Moscow, Russia
Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia

Abstract. The aim of the paper is to estimate the number of overlaps in a given pattern. A pattern is a set of words of the same length m over an alphabet A. We present theoretical and experimental bounds on the number of overlaps for two types of patterns. First, we considered random patterns under the uniform probability model, i.e. all letters of the alphabet and, correspondingly, all words of the same length are equiprobable. We proved that the average number of overlaps P for random patterns consisting of n words of length m depends linearly on the pattern size n and is independent of the length m of the pattern words. In the computer experiments performed, the ratio P/n ranged from 0.33 to 1.06; the theoretical estimates of this ratio for such patterns do not exceed 1.67. Second, we studied patterns described by position weight matrices (PWMs) from the HOCOMOCO database with various cut-offs. For these patterns the ratio P/n in the experiments ranged from 0.004 to 1, and for most of the patterns it is smaller than 0.1.
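
The abstract does not spell out what counts as an overlap. As a minimal sketch, assume the usual definition: an overlap of a pattern is a non-empty word that is a proper prefix of some pattern word and a proper suffix of some (possibly the same) pattern word. Under that assumption, the ratio P/n for a random pattern in the uniform model could be computed roughly as follows. The function names `count_overlaps` and `random_pattern` are hypothetical and are not taken from the paper.

```python
import random


def count_overlaps(pattern):
    """Count distinct overlaps of a pattern (a set of words of equal length).

    Assumed definition: an overlap is a non-empty word that is a proper
    prefix of some pattern word and a proper suffix of some pattern word.
    """
    words = set(pattern)
    m = len(next(iter(words)))
    if any(len(w) != m for w in words):
        raise ValueError("all pattern words must have the same length")
    # Proper prefixes and suffixes have lengths 1 .. m-1.
    prefixes = {w[:k] for w in words for k in range(1, m)}
    suffixes = {w[m - k:] for w in words for k in range(1, m)}
    return len(prefixes & suffixes)


def random_pattern(n, m, alphabet="ACGT", seed=0):
    """Draw n distinct words of length m with equiprobable letters."""
    if n > len(alphabet) ** m:
        raise ValueError("n exceeds the number of words of length m")
    rng = random.Random(seed)
    words = set()
    while len(words) < n:
        words.add("".join(rng.choice(alphabet) for _ in range(m)))
    return sorted(words)


if __name__ == "__main__":
    pattern = random_pattern(n=200, m=12)
    p = count_overlaps(pattern)
    print(f"n = {len(pattern)}, P = {p}, P/n = {p / len(pattern):.2f}")
```

For example, the one-word pattern `["ACA"]` has the single overlap `A`, while `["ACG", "GAC"]` has two overlaps, `G` and `AC`.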

 

Key words: overlap, pattern, pattern occurrence in a sequence.

 

 

Original Article
Math. Biol. Bioinf.
2016;11(1):14-23
doi: 10.17537/2016.11.14
published in Russian


 

  Copyright IMPB RAS © 2005-2024