Coding Structure for the ORF1ab, S, M and N Coronavirus Genes
Chaley M.B.1, Tyulko Zh.S.2,3, Kutyrkin V.A.4
1Institute of Mathematical Problems of Biology, Keldysh Institute of Applied Mathematics of RAS, Pushchino, Russia
2Omsk State Medical University of Ministry of Healthcare of the Russian Federation, Omsk, Russia
3Federal Service for Surveillance on Consumer Rights Protection and Human Wellbeing, Omsk Research Institute of Natural Focal Infections, Omsk, Russia
4Moscow State Technical University n.a. N.E. Bauman, Moscow, Russia
Abstract. A spectral-statistical approach was applied to the comparative analysis of coronavirus genomes from the four genera Alphacoronavirus, Betacoronavirus (including the novel SARS-CoV-2 virus), Gammacoronavirus and Deltacoronavirus. The analysis examined 3-regularity and the existence of latent triplet profile periodicity in the coding sequences (CDSs) of four genes: ORF1ab, encoding transcriptase; the S-gene of the glycoprotein forming spikes; the M-gene of the membrane protein; and the N-gene of the nucleoprotein. In total, 3410 genomes were analyzed, and each of the four gene groups in the study contained the same number of genes. As a result, latent profile triplet periodicity was revealed in the CDSs of practically all analyzed ORF1ab, S and N genes, together with a high value of the 3-regularity index, which serves as a quality estimate of how well the coding triplet structure is conserved. In contrast, the coding structure of the M-genes showed a tendency to diffuse, up to homogeneity, in 60 % of the genes in the alphacoronavirus genomes analyzed and in 67 % of the genes of the gammacoronaviruses. A tendency toward such diffusion of the structure, accompanied by a decrease in the average value of the 3-regularity index compared with the other genes while the triplet profile periodicity is preserved, was also noted for the M-genes of SARS-CoV-2 viruses. This tendency probably reflects the significance of M-gene variability in coronavirus adaptation to the novel hosts of a genus. Analysis of the 3-profile periodicity matrices of the four groups of SARS-CoV-2 genes considered in this work, for viruses isolated in Europe, Asia and the USA, did not reveal significant differences among them, which suggests a single source of propagation of this virus.
Key words: property of 3-regularity in CDS, latent profile triplet periodicity, coronavirus genome, SARS-CoV-2 virus genome, ORF1ab, S-gene, M-gene, N-gene.