Electronic journal | Volume 15, Issue 2, 2020

Gusev V.D., Miroshnichenko L.A.

Complexity of DNA Sequences. Various Approaches and Definitions

Mathematical Biology and Bioinformatics. 2020;15(2):313–337.

doi: 10.17537/2020.15.313.

References

Knuth D.E. The Art of Computer Programming: Vol. 2. Seminumerical Algorithms. Addison-Wesley Publishing Company, 1969.
Jermann W.H. Redundancy in deterministic sequences. IEEE Trans. on Syst. Sci. and Cybernetics. 1970;6(4). doi: 10.1109/TSSC.1970.300313
Shannon C. A mathematical theory of communication. Bell System Techn. J. 1948;27(3):379–423. doi: 10.1002/j.1538-7305.1948.tb01338.x
Shannon C. A mathematical theory of communication. Bell System Techn. J. 1948;27(4):623–656. doi: 10.1002/j.1538-7305.1948.tb00917.x
Reznikova Zh.I., Ryabko B.Ya. Analysis of the Language of Ants by Information-Theoretical Methods. Problems of Information Transmission. 1986;22(3):245–249.
Kolmogorov A.N. Three approaches to the definition of the concept “quantity of information”. Probl. Peredachi Inf. 1965;1(1):3–11.
Solomonoff R. A Preliminary Report on a General Theory of Inductive Inference. Cambridge, Ma.: Zator Co., 1960.
Solomonoff R.A. Formal theory of inductive inference. Part I. Information and Control. 1964;7(1):1–22. doi: 10.1016/S0019-9958(64)90223-2
Chaitin G. Information-theoretic limitations of formal systems. Journal of the ACM. 1974;21(3):403–424. doi: 10.1145/321832.321839
Levin L.A. Various measures of complexity for finite objects (axiomatic description). Dokl. Akad. Nauk SSSR. 1976;227(4):804–807.
Salamon P., Konopka A.K. A maximum entropy principle for the distribution of local complexity in naturally occurring nucleotide sequences. Computers Chem. 1992;16(2):117–124. doi: 10.1016/0097-8485(92)80038-2
Román-Roldán R., Bernaola-Galván P., Oliver J.L. Sequence compositional complexity of DNA through an entropic segmentation method. Physical Review Letters. 1998;80:1344–1347. doi: 10.1103/PhysRevLett.80.1344
Trifonov E.N. Making sense of the human genome. In: Structure & Methods. Eds. Sarma R.H., Sarma M.H. Adenine Press, 1990;1:69–77.
Crochemore M., Verin R. Zones of low entropy in genomic sequences. Computers and Chemistry. 1999;23:275–282. doi: 10.1016/S0097-8485(99)00009-1
Gabrielian A.E., Bolshoy A. Sequence complexity and DNA curvature. Comput. Chem. 1999;23:263–274. doi: 10.1016/S0097-8485(99)00007-8
Troyanskaya O.G., Arbell O., Koren Y., Landau G.M., Bolshoy A. Sequence complexity profiles of prokaryotic genomic sequences: a fast algorithm for calculating linguistic complexity. Bioinformatics. 2002;18(5):679–688. doi: 10.1093/bioinformatics/18.5.679
Grumbach S., Tahi F. Compression of DNA sequences. Proc. IEEE Symp. on Data Compression. 1993:340–350. doi: 10.1109/DCC.1993.253115
Grumbach S., Tahi F. A new challenge for compression algorithms: genetic sequences. J. Information Processing and Management. 1994;30(6):875–886. doi: 10.1016/0306-4573(94)90014-0
Pratas D., Hosseini M., Silva J.M., Pinho A.J. A reference-free lossless compression algorithm for DNA sequences using a competitive prediction of two classes of weighted models. Entropy. 2019;21(11):1074. doi: 10.3390/e21111074
Brandon M.C., Wallace D.C., Baldi P. Data structures and compression algorithms for genomic sequence data. Bioinformatics. 2009;25(14):1731–1738. doi: 10.1093/bioinformatics/btp319
Deorowicz S., Grabowski S. Robust relative compression of genomes with random access. Bioinformatics. 2011;27(21):2979–2986. doi: 10.1093/bioinformatics/btr505
Pavlichin D.S., Weissman T., Yona G. The human genome contracts again. Bioinformatics. 2013;29(17):2199–2202. doi: 10.1093/bioinformatics/btt362
Bakr N.S., Sharawi A.A. DNA lossless compression algorithms: Review. American Journal of Bioinformatics Research. 2013;3(3):72–81. doi: 10.5923/j.bioinformatics.20130303.04
Zhu Z., Zhang Y., Ji Z., He S., Yang X. High-throughput DNA sequence data compression. Briefings in Bioinformatics. 2015;16(1):1–15. doi: 10.1093/bib/bbt087
Hosseini M., Pratas D., Pinho A. A survey on data compression methods for biological sequences. Information. 2016;7(4):56. doi: 10.3390/info7040056
Smetanin Y.G., Ulyanov M.V., Pestova A.S. Entropy Approach to the Construction of a Measure of Word Symbolic Diverseness and its Application to Clustering of Plant Genomes. Mathematical Biology and Bioinformatics. 2016;11(1):114–126. doi: 10.17537/2016.11.114
Shannon C. Prediction and entropy of printed English. Bell System Techn. J. 1951;30(1):50–64. doi: 10.1002/j.1538-7305.1951.tb01366.x
Herzel H. Complexity of symbol sequences. Systems Analysis Modelling Simulation. 1988;5(5):435–444.
Ebeling W., Nicolis G. Word frequency and entropy of symbolic sequences: a dynamical perspective. Chaos, Solitons and Fractals. 1992;2(6):635–650. doi: 10.1016/0960-0779(92)90058-U
Schmitt A.O., Herzel H. Estimating the entropy of DNA sequences. J. Theor. Biol. 1997;188:369–377. doi: 10.1006/jtbi.1997.0493
Weiss O., Jiménez-Montaño M.A., Herzel H. Information content of protein sequences. J. Theor. Biol. 2000;206:379–386. doi: 10.1006/jtbi.2000.2138
Farach M., Noordewier M., Savari S., Shepp L., Wyner A., Ziv J. On the entropy of DNA: algorithms and measurements based on memory and rapid convergence. In: Proceedings of the 6th ACM-SIAM Symposium on Discrete Algorithms. New York: ACM, Inc., 1995. P. 48–57.
Loewenstern D., Yianilos P.N. Significantly lower entropy estimates for natural DNA sequences. J. Comput. Biol. 1999;6:125–142. doi: 10.1089/cmb.1999.6.125
Kisliuk O.S., Borovina T.A., Nazipova N.N. Estimation of redundancy of genetic texts by the high frequency component of the L-gram graph. Biophysics. 1999;44(4):621–630.
Fano R.M. Transmission of Information: A Statistical Theory of Communication. The MIT Press, 1961.
Huffman D. A method for the construction of minimum-redundancy codes. Proceedings of the IRE. 1952;40(9):1098–1101. doi: 10.1109/JRPROC.1952.273898
Knuth D.E. Dynamic Huffman Coding. Journal of Algorithms. 1985;6(2):163–180. doi: 10.1016/0196-6774(85)90036-7
Ryabko B.Ya. Fast Adaptive Coding Algorithm. Problems Inform. Transmission. 1990;26(4):305–317.
Gilbert E.N., Moore E.F. Variable-length binary encodings. Bell System Technical Journal. 1959;38(4):933–967. doi: 10.1002/j.1538-7305.1959.tb01583.x
Ryabko B.Ya. Data Compression by Means of a “Book Stack”. Problems Inform. Transmission. 1980;16(4):265–269.
Martin G.N.N. Range encoding: An algorithm for removing redundancy from a digitized message. Video & Data Recording Conference. Southampton, UK, 1979.
Said A. Introduction to Arithmetic Coding Theory and Practice. In: Lossless Compression Handbook. Ed. Sayood K. Elsevier Inc., 2003. P. 101–152. doi: 10.1016/B978-012620861-0/50006-1
Barron A., Rissanen J., Yu B. The minimum description length principle in coding and modeling. IEEE Transactions on Information Theory. 1998;44(6). doi: 10.1109/18.720554
Orlov Y.L., Filippov V.P., Potapov V.N., Kolchanov N.A. Construction of stochastic context trees for genetic texts. In Silico Biology. 2002;2(3):233–247.
Konopka A.K. Sequences and codes: fundamentals of biomolecular cryptology. In: Biocomputing: Informatics and Genome Projects. Ed. Smith D.W. New York: Academic Press, 1994. P. 119–174. doi: 10.1016/B978-0-08-092596-7.50008-3
Wan H., Wootton J.C. A global compositional complexity measure for biological sequences: AT-rich and GC-rich genomes encode less complex proteins. Computers and Chem. 2000;24(1):71–94. doi: 10.1016/S0097-8485(00)80008-X
Hartley R.V.L. Transmission of Information. Bell Syst Techn J. 1928;7(3):535–563. doi: 10.1002/j.1538-7305.1928.tb01236.x
Wootton J.C., Federhen S. Statistics of local complexity in amino acid sequences and sequence databases. Computers & Chemistry. 1993;17(2):149–163. doi: 10.1016/0097-8485(93)85006-X
Wootton J.C., Federhen S. Analysis of compositionally biased regions in sequence databases. Methods in Enzymology. 1996;266:554–571. doi: 10.1016/S0076-6879(96)66035-2
Bernaola-Galvan P., Román-Roldán R., Oliver J.L. Compositional segmentation and long-range fractal correlation in DNA sequences. Phys. Rev. E. 1996;53(5):5181–5189. doi: 10.1103/PhysRevE.53.5181
Li W. The complexity of DNA: the measure of compositional heterogeneity in DNA sequences and measures of complexity. Complexity. 1997;3(2):33–37. doi: 10.1002/(SICI)1099-0526(199711/12)3:2<33::AID-CPLX7>3.0.CO;2-N
Oliver J.L., Román-Roldán R., Pérez J., Bernaola-Galván P. SEGMENT: identifying compositional domains in DNA sequences. Bioinformatics. 1999;15(12):974–979. doi: 10.1093/bioinformatics/15.12.974
Lin J. Divergence measure based on the Shannon entropy. IEEE Transactions on Information Theory. 1991;37:145–151. doi: 10.1109/18.61115
Tautz D., Trick M., Dover G.A. Cryptic simplicity in DNA is major source of genetic variation. Nature. 1986;322:652–656. doi: 10.1038/322652a0
Hancock J.M., Armstrong J.S. SIMPLE34: an improved and enhanced implementation for VAX and Sun computers of the SIMPLE algorithm for analysis of clustered repetitive motifs in nucleotide sequences. Comput. Appl. Biosci. 1994;10:67–70.
Alba M.M., Laskowski R.A., Hancock J.M. Detecting cryptically simple protein sequences using the SIMPLE algorithm. Bioinformatics. 2002;18(5):672–678. doi: 10.1093/bioinformatics/18.5.672
Promponas V.J., Enright A.J., Tsoka S., Kreil D.P., Leroy C., Hamodrakas S., Sander C., Ouzounis C.A. CAST: an iterative algorithm for the complexity analysis of sequence tracts. Bioinformatics. 2000;16(10):915–922. doi: 10.1093/bioinformatics/16.10.915
Benson G. Tandem repeats finder: a program to analyze DNA sequences. NAR. 1999;27(2):573–580. doi: 10.1093/nar/27.2.573
Chaley M.B., Kutyrkin V.A., Tyulbasheva G.E., Teplukhina E.I., Nazipova N.N. Investigation of Latent Periodicity Phenomenon in the Genomes of Eukaryotic Organisms. Mathematical Biology and Bioinformatics. 2013;8(2):480–501. doi: 10.17537/2013.8.480
Lothaire M. Combinatorics on Words. Reading, MA: Addison-Wesley, 1983.
Ferenczi S. Complexity of sequences and dynamical systems. Discrete Mathematics. 1999;206(1–3):145–154. doi: 10.1016/S0012-365X(98)00400-2
Bolshoy A. DNA sequence analysis linguistic tools: contrast vocabularies, compositional spectra and linguistic complexity. Applied Bioinformatics. 2003;2(2):103–112.
Bolshoy A., Shapiro K., Trifonov E.N., Ioshikhes I. Enhancement of the nucleosomal pattern in sequences of lower complexity. Nucl. Acids Res. 1997;25:3248–3254. doi: 10.1093/nar/25.16.3248
Ukkonen E. On-line constructing of suffix trees. Algorithmica. 1995;14:249–260. doi: 10.1007/BF01206331
Blumer A., Blumer J., Ehrenfeucht A., Haussler D., McConnell R. Building the minimal DFA for the set of all subwords of a word on-line in linear time. Lect. Notes in Comput. Sci. 1984;172:109–118. doi: 10.1007/3-540-13345-3_9
Lempel A., Ziv J. On the complexity of finite sequences. IEEE Trans. Inform. Theory. 1976;IT-22(1):75–81. doi: 10.1109/TIT.1976.1055501
Ziv J., Lempel A. A universal algorithm for sequential data compression. IEEE Trans. Inform. Theory. 1977;IT-23(3):337–343. doi: 10.1109/TIT.1977.1055714
Ziv J., Lempel A. Compression of individual sequences via variable-rate coding. IEEE Trans. Inform. Theory. 1978;IT-24(5):530–536. doi: 10.1109/TIT.1978.1055934
Chen X., Li M., Ma B., Tromp J. DNACompress: fast and effective DNA sequence compression. Bioinformatics. 2002;18(12):1696–1698. doi: 10.1093/bioinformatics/18.12.1696
Mishra K.N., Aaggarwal A., Abdelhadi E., Srivastava D. An efficient horizontal and vertical method for online DNA sequence compression. Int. J. Comput. Appl. 2010;3(1):39–46. doi: 10.5120/757-954
Gusev V.D., Miroshnichenko L.A., Chuzhanova N.A. Revealing fractal-like structures in DNA sequences. In: Information Science & Computing. International Book Series, No. 8. Classification, Forecasting, Data Mining. Sofia: ITHEA, 2009. P. 117–123 (in Russ.).
Gusev V.D., Kulichkov V.A., Chupakhina O.M. Slozhnostnoi analiz geneticheskikh tekstov (na primere faga λ) (Complexity analysis of genetic texts (on the example of phage λ)): preprint. Novosibirsk, 1989. 20. 49 p. (in Russ.).
Gusev V.D., Kulichkov V.A., Chupakhina O.M. Complexity analysis of genomes. I. Complexity and classification methods of detected structural regularities. Molecular Biology (Moscow). 1991;25(3):825–833 (in Russ.).
Gusev V.D., Kulichkov V.A., Chupakhina O.M. Complexity analysis of a genome. II. Extensive homology zones in bacteriophage lambda. Mol. Biol. (Mosk.). 1991;25(4):1080–1089 (in Russ.).
Gusev V.D., Kulichkov V.A., Chupakhina O.M. The Lempel–Ziv complexity and local structure analysis of genomes. Biosystems. 1993;30(1–3):183–200. doi: 10.1016/0303-2647(93)90070-S
Gusev V.D., Nemytikova L.A., Chuzhanova N.A. On the complexity measures of genetic sequences. Bioinformatics. 1999;15(12):994–999. doi: 10.1093/bioinformatics/15.12.994
Gusev V.D., Miroshnichenko L.A. In: Doklady 8 Mezhdunarodnoi konferentsii "Intellektualizatsiia obrabotki informatsii" (Reports of the 8th International Conference "Intellectualization of Information Processing" (IOI-2010)) (Cyprus, Paphos, October 17–24, 2010). 2010. P. 469–472 (in Russ.).
Gusev V.D., Miroshnichenko L.A. In: Doklady vserossiiskoi konferentsii MMRO-13 «Matematicheskie metody raspoznavaniia obrazov» (Reports of the all-Russian conference MMRO-13 "Mathematical methods of pattern recognition", Leningrad region, Zelenogorsk. September 30–October 6, 2007). Moscow, 2007. P. 473–476 (in Russ.).
Gusev V.D. In: Vychislitel'nye sistemy (Computing systems). Iss. 132. Novosibirsk, 1989. P. 35–63 (in Russ.).
Orlov Yu.L., Gusev V.D., Miroshnichenko L.A. LZcomposer: decomposition of genomic sequences by repeat fragments. Biophysics. 2003;48(Suppl. 1):S7–S16.
Gusev V.D., Nemytikova L.A., Chuzhanova N.A. Rapid method for identification of interconnections between functionally and/or evolutionarily related biological sequences. Molecular Biology (Mosc.). 2001;35(6):867–873.
Chuzhanova N.A., Krawczak M., Nemytikova L.A., Gusev V.D., Cooper D.N. Promoter shuffling has occurred during the evolution of the vertebrate growth hormone gene. Gene. 2000;254:9–18. doi: 10.1016/S0378-1119(00)00308-5
Surguchov A. Migration of promoter elements between genes: a role in transcriptional regulation and evolution. Biomed. Sci. 1991;2:22–28.
Chuzhanova N.A., Krawczak M., Thomas N., Nemytikova L.A., Gusev V.D., Cooper D.N. The evolution of the vertebrate beta–globin gene promoter. Evolution. 2002;56(2):224–232. doi: 10.1111/j.0014-3820.2002.tb01333.x
Orlov Yu.L., Potapov V.N. Estimation of stochastic complexity of genetical texts. Computational technologies (Novosibirsk). 2000;5(Special issue):5–15.
Kiknadze I.I., Gunderina L.I., Istomina A.G., Gusev V.D., Nemytikova L.A. Similarity analysis of inversion banding sequences in chromosomes of Chironomus species (breakpoint phylogeny). In: Bioinformatics of Genome Regulation and Structure. Eds. N. Kolchanov, R. Hofestaedt. Boston, MA: Springer, 2004. P. 245–254. doi: 10.1007/978-1-4419-7152-4_26
Grigor'eva A.N. Zapiski nauchnykh seminarov LOMI AN SSSR (Notes of Scientific Seminars of the Leningrad Branch of the Mathematical Institute of the USSR Academy of Sciences). 1981;105:18–24 (in Russ.).
Allison L., Edgoose T., Dix T.I. Compression of strings with approximate repeats. In: Intelligent Systems in Molecular Biology (ISMB'98) (Montreal, 28 June-1 July 1998). 1998. P. 8–16.
Chen X., Kwong S., Li M. A compression algorithm for DNA sequences and its applications in genome comparison. Genome informatics. International Conference on Genome Informatics. 1999;10:51–61.
Ma B., Tromp J., Li M. PatternHunter: Faster and more sensitive homology search. Bioinformatics. 2002;18(3):440–445. doi: 10.1093/bioinformatics/18.3.440
Merekin Yu.V. A lower bound on complexity for schemes of the concatenation of words. Diskretn. Anal. Issled. Oper. 1996;3(1):52–56.
Evdokimov A.A. Vestnik TGU (Tomsk State University Bulletin). 2005;14:4–12 (in Russ.).
Ebeling W., Jiménez-Montaño M.A. On grammars, complexity, and information measures of biological macromolecules. Math. Biosci. 1980;52:53–71. doi: 10.1016/0025-5564(80)90004-8
Jiménez-Montaño M.A. On syntactic structure of protein sequences and the concept of grammar complexity. Bull. Math. Biol. 1984;46:641–659. doi: 10.1007/BF02459508
Jiménez-Montaño M.A., Pöschel T., Rapp P.E. A measure of the information content of neural spike trains. Proc. Symp. on Complexity in Biology. Eds. Mizraji E., Acerenza L., Alvares F., Pomi A. Montevideo, Uruguay: D.I.R.A.C., 1997. P. 113–142.
Charikar M., Lehman E., Liu D., Panigrahy R., Prabhakaran M., Sahai A., Shelat A. The smallest grammar problem. IEEE Transactions on Information Theory. 2005;51(7):2554–2576. doi: 10.1109/TIT.2005.850116
Nevill-Manning C.G., Witten I. H. Identifying hierarchical structure in sequences: a linear-time algorithm. Journal of Artificial Intelligence Research. 1997;7:67–82. doi: 10.1613/jair.374
Witten I.H. Adaptive text mining: inferring structure from sequences. Journal of Discrete Algorithms. 2004;2(2):137–159. doi: 10.1016/S1570-8667(03)00084-4
Carrascosa R., Coste F., Gallé M., Infante-Lopes G. Searching for smallest grammars on large sequences and application to DNA. Journal of Discrete Algorithms. 2012;11:62–72. doi: 10.1016/j.jda.2011.04.006
Nevill-Manning C.G., Witten I. H. Online and offline heuristics for inferring hierarchies of repetitions in sequences. Proc IEEE. 2000;88(11):1745–1755. doi: 10.1109/5.892710
Cherniavsky N., Ladner R. Grammar-based compression of DNA sequences. In: Proceedings of the DIMACS Working Group on the Burrows-Wheeler Transform. New Jersey, 2004.
Liu Q., Yang Yu., Chen C., Bu J., Zhang Y., Ye X. RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure. BMC Bioinformatics. 2008;9:176. doi: 10.1186/1471-2105-9-176
Trifonov E.N. Genetic sequences as product of compression by inclusive superposition of many codes. Molecular Biology (Mosc.). 1997;31(4):647–654.
Bennett C.H., Gács P., Li M., Vitányi P., Zurek W.H. Information Distance. IEEE Trans. on Inf. Th. 1998;44(4):1407–1423. doi: 10.1109/18.681318
Li M., Chen X., Li X., Ma B., Vitanyi P.M.B. The similarity metric. IEEE Trans. on Inf. Th. 2004;50(12):3250–3264. doi: 10.1109/TIT.2004.838101
Varré J.-S., Delahaye J.-P., Rivals E. Transformation distances: a family of dissimilarity measures based on movements of segments. Bioinformatics. 1999;15(3):194–202. doi: 10.1093/bioinformatics/15.3.194
Vinga S., Almeida J.S. Alignment-free sequence comparison – a review. Bioinformatics. 2003;19(4):513–523. doi: 10.1093/bioinformatics/btg005
Wallace C.S., Boulton D.M. An information measure for classification. Computer J. 1968;11(2):185–194. doi: 10.1093/comjnl/11.2.185
Sankoff D., Leduc G., Antoine N., Paquin B., Lang B.F., Cedergren R. Gene order comparison for phylogenetic inference: Evolution of the mitochondrial genome. PNAS USA. 1992;89:6575–6579. doi: 10.1073/pnas.89.14.6575
Sankoff D., Nadeau J.H. Conserved synteny as a measure of genomic distance. Discrete Appl. Math. 1996;71:247–257. doi: 10.1016/S0166-218X(96)00067-4
Bafna V., Pevzner P.A. Sorting by reversals: genome rearrangements in plant organelles and evolutionary history of X chromosome. Molecular Biology and Evolution. 1995;12(2):239–246. doi: 10.1093/oxfordjournals.molbev.a040208
Li M., Badger J.H., Chen X., Kwong S., Kearney P., Zhang H. An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics. 2001;17(2):149–154. doi: 10.1093/bioinformatics/17.2.149
Salomaa A. Jewels of formal language theory. Rockville: Computer Science Press, 1981.
Iványi A. On the d-complexity of words. Ann. Univ. Sci. Budapest. Sect. Comput. 1987;8:69–90.
Nakashima I., Tamura J., Yasutomi S. Modified complexity and *-Sturmian word. Proc. Japan Acad. Ser. A Math. Sci. 1999;75(3):26–28. doi: 10.3792/pjaa.75.26
Kamae T., Zamboni L. Sequence entropy and the maximal pattern complexity of infinite words. Ergodic Theory Dynamical Systems. 2002;22(4):1191–1199. doi: 10.1017/S014338570200055X
Restivo A., Salemi S. Binary patterns in infinite binary words. In: Formal and Natural Computing. Lecture Notes in Computer Science. Eds. Brauer W., Ehrig H., Karhumäki J., Salomaa A. Berlin, Heidelberg: Springer, 2002. V. 2300. P. 107–116. doi: 10.1007/3-540-45711-9_8
Frid A.E. Arithmetical complexity of symmetric DOL words. Theoretical Computer Science. 2003;306:535–542. doi: 10.1016/S0304-3975(03)00345-1
Herzel H., Grobe I. Measuring correlations in symbol sequences. Physica A. 1995;216:518–542. doi: 10.1016/0378-4371(95)00104-F
Buldyrev S.V., Goldberger A.L., Havlin S., Mantegna R.N., Matsa M.E., Peng C.-K., Simons M., Stanley H.E. Long-range correlation properties of coding and non-coding DNA-sequences – GenBank analysis. Physical Review E. 1995;51:5084–5091. doi: 10.1103/PhysRevE.51.5084
Havlin S., Buldyrev S.V., Goldberger A.L., Mantegna R.N., Peng C.-K., Simons M., Stanley H.E. Statistical and linguistic features of DNA sequences. Fractals. 1995;3(2):269–284. doi: 10.1142/S0218348X95000229
Karlin S., Brendel V. Patchiness and correlations in DNA sequences. Science. 1993;259(5095):677–680. doi: 10.1126/science.8430316
Voss R.F. Long-range fractal correlations in DNA introns and exons. Fractals. 1994;2:1–6. doi: 10.1142/S0218348X94000831
Li W. The study of correlation structures of DNA sequences: a critical review. Computers & Chem. 1997;21(4):257–271. doi: 10.1016/S0097-8485(97)00022-3
Cormode G., Paterson M., Sahinalp S.C., Vishkin U. Communication complexity of document exchange. In: Proc. Eleventh ACM-SIAM Symposium on Discrete Algorithms (SODA). 2000. P. 197–206.