Entropy Approach to the Construction of a Measure of Word Symbolic Diverseness and Its Application to Clustering of Plant Genomes
Smetanin Y.G., Ulyanov M.V., Pestova A.S.
Federal Research Center “Informatics and Control” of the Russian Academy of Sciences, Moscow
Faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University, Moscow
Faculty of Computer Science of Higher School of Economics, Moscow
Abstract. An approach to information analysis is considered for the case in which the information is represented by words of finite length over a finite alphabet. A method is proposed for constructing a measure of the symbolic diverseness of words based on peak characteristics of a shift entropy function. The shift entropy function is formally defined using a unit translation operator and the entropy of discrete distributions. A model example is presented, together with some results of applying the proposed measure to the clustering of plant families through analysis of the genomes of their representatives.
Key words: shift entropy, measure of symbolic diverseness, clustering of plant genomes.