Volume 16   Issue 2   Year 2021
Efimov V.M.1,2,3,4, Efimov K.V.5, Kovaleva V.Yu.2, Matushkin Yu.G.1

Principal Components of Genetic Sequences: Correlations and Significance

Mathematical Biology & Bioinformatics. 2021;16(2):299-316.

doi: 10.17537/2021.16.299.


  1. Efimov V.M., Efimov K.V., Kovaleva V.Y. Principal component analysis and its generalizations for any type of sequence (PCA-Seq). Vavilov Journal of Genetics and Breeding. 2019;23(8):1032–1036. doi: 10.18699/VJ19.584
  2. Duras T. The fixed effects PCA model in a common principal component environment. Communications in Statistics-Theory and Methods. 2020:1–21. doi: 10.1080/03610926.2020.1765255
  3. Efron B. Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics. 1979;7:1–26. doi: 10.1214/aos/1176344552
  4. Timmerman M.E., Kiers H.A., Smilde A.K. Estimating confidence intervals for principal component loadings: a comparison between the bootstrap and asymptotic results. British Journal of Mathematical and Statistical Psychology. 2007;60(2):295–314. doi: 10.1348/000711006X109636
  5. Linting M., Meulman J.J., Groenen P.J., Van der Kooij A.J. Stability of nonlinear principal components analysis: An empirical study using the balanced bootstrap. Psychological Methods. 2007;12(3):359. doi: 10.1037/1082-989X.12.3.359
  6. Efimov V., Efimov K., Kovaleva V. Anchored Bootstrap. In: 2020 Cognitive Sciences, Genomics and Bioinformatics (CSGB). IEEE. 2020:32–35. doi: 10.1109/CSGB51356.2020.9214598
  7. Hendus-Altenburger R., Vogensen J., Pedersen E.S., Luchini A., Araya-Secchi R., Bendsoe A.H., Prasad N.S., Prestel A., Cardenas M., ... Kragelund B.B. The intracellular lipid-binding domain of human Na+/H+ exchanger 1 forms a lipid-protein co-structure essential for activity. Communications Biology. 2020;3(1):1–18. doi: 10.1038/s42003-020-01455-6
  8. Koch A., Schwab A. Cutaneous pH landscape as a facilitator of melanoma initiation and progression. Acta Physiologica. 2019;225(1):e13105. doi: 10.1111/apha.13105
  9. Böhme I., Schönherr R., Eberle J., Bosserhoff A.K. Membrane Transporters and Channels in Melanoma. In: Reviews of Physiology, Biochemistry and Pharmacology. 2020. P. 1–106. doi: 10.1007/112_2020_17
  10. Pethő Z., Najder K., Carvalho T., McMorrow R., Todesca L.M., Rugi M., Bulk E., Chan A., Löwik C.W.G.M., Reshkin S.J., Schwab A. pH-channeling in cancer: How pH-dependence of cation channels shapes cancer pathophysiology. Cancers. 2020;12(9):2484. doi: 10.3390/cancers12092484
  11. Polunin D., Shtaiger I., Efimov V. JACOBI4 software for multivariate analysis of biological data. bioRxiv. 2019:803684. doi: 10.1101/803684
  12. Hammer Ø., Harper D.A., Ryan P.D. PAST: Paleontological statistics software package for education and data analysis. Palaeontologia Electronica. 2001;4(1). (accessed 05 September 2021).
  13. Hill T., Lewicki P. Statistics: methods and applications: a comprehensive reference for science, industry, and data mining. Tulsa, Okla., UK: StatSoft Ltd. 2006. 719 p. ISBN: 9781884233593.
  14. NCBI. (accessed 05 September 2021).
  15. Gower J.C. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika. 1966;53(3–4):325–338. doi: 10.1093/biomet/53.3-4.325
  16. Nei M., Kumar S. Molekuliarnaia evoliutsiia i filogenetika. Kiev, 2004. ISBN: 966-7192-53-9. (Translation of: Nei M., Kumar S. Molecular Evolution and Phylogenetics. Oxford University Press, 2000. 348 p. ISBN: 9780195135855).
  17. Efimov V.M., Melchakova M.A., Kovaleva V.Yu. Geometric properties of evolutionary distances. Vavilov Journal of Genetics and Breeding. 2013;17(4/1):714–723 (in Russ.).
  18. AAindex (v.9.2 of 13.02.2017). (accessed 05 September 2021).
  19. Kawashima S., Pokarowski P., Pokarowska M., Kolinski A., Katayama T., Kanehisa M. AAindex: amino acid index database, progress report 2008. Nucleic Acids Research. 2008;36(1):D202–D205. doi: 10.1093/nar/gkm998
  20. Sneath P.H.A. Relations between chemical structure and biological activity in peptides. Journal of Theoretical Biology. 1966;12(2):157–195. doi: 10.1016/0022-5193(66)90112-3
  21. Hellberg S., Sjoestroem M., Skagerberg B., Wold S. Peptide quantitative structure-activity relationships, a multivariate approach. Journal of Medicinal Chemistry. 1987;30(7):1126–1135. doi: 10.1021/jm00390a003
  22. Sandberg M., Eriksson L., Jonsson J., Sjöström M., Wold S. New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids. Journal of Medicinal Chemistry. 1998;41(14):2481–2491. doi: 10.1021/jm9700575
  23. Kosky A.A., Dharmavaram V., Ratnaswamy G., Manning M.C. Multivariate analysis of the sequence dependence of asparagine deamidation rates in peptides. Pharmaceutical Research. 2009;26(11):2417–2428. doi: 10.1007/s11095-009-9953-8
  24. Zbacnik N.J., Henry C.S., Manning M.C. A Chemometric Approach Toward Predicting the Relative Aggregation Propensity: Aβ (1‒42). Journal of Pharmaceutical Sciences. 2020;109(1):624–632. doi: 10.1016/j.xphs.2019.10.014
  25. MPI Bioinformatics Toolkit. (accessed 05 September 2021).
  26. Zimmermann L., Stephens A., Nam S.Z., Rau D., Kübler J., Lozajic M., Gabler F., Söding J., Lupas A.N., Alva V. A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core. Journal of Molecular Biology. 2018;430(15):2237–2243. doi: 10.1016/j.jmb.2017.12.007
  27. Jones D.T. Protein secondary structure prediction based on position-specific scoring matrices. Journal of Molecular Biology. 1999;292(2):195–202. doi: 10.1006/jmbi.1999.3091
  28. Heffernan R., Yang Y., Paliwal K., Zhou Y. Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics. 2017;33(18):2842–2849. doi: 10.1093/bioinformatics/btx218
  29. Yan R., Xu D., Yang J., Walker S., Zhang Y. A comparative assessment and analysis of 20 representative sequence alignment methods for protein structure prediction. Scientific Reports. 2013;3:2619. doi: 10.1038/srep02619
  30. Wang S., Peng J., Ma J., Xu J. Protein secondary structure prediction using deep convolutional neural fields. Scientific Reports. 2016;6:18962. doi: 10.1038/srep18962
  31. Klausen M.S., Jespersen M.C., Nielsen H., Jensen K.K., Jurtz V.I., Soenderby C.K., Sommer M.O.A., Winther O., Nielsen M., Petersen B., Marcatili P. NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning. Proteins: Structure, Function, and Bioinformatics. 2019;87(6):520–527. doi: 10.1002/prot.25674
  32. Krogh A., Larsson B., Von Heijne G., Sonnhammer E.L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology. 2001;305(3):567–580. doi: 10.1006/jmbi.2000.4315
  33. Käll L., Krogh A., Sonnhammer E.L. A combined transmembrane topology and signal peptide prediction method. Journal of Molecular Biology. 2004;338(5):1027–1036. doi: 10.1016/j.jmb.2004.03.016
  34. Käll L., Krogh A., Sonnhammer E.L. An HMM posterior decoder for sequence feature prediction that includes homology information. Bioinformatics. 2005;21(1):i251–i257. doi: 10.1093/bioinformatics/bti1014
  35. Kruskal J.B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika. 1964;29(1):1–27. doi: 10.1007/BF02289565
  36. Kel' E.A., Kolchanov N.A., Solov'ev V.V. Zh. Obshch. Biol. 1988;49(3):343–354 (in Russ.).
  37. Kolchanov N.A., Kel' E.A., Solov'ev V.V. Zh. Obshch. Biol. 1988;49(6):723–728 (in Russ.).
  38. Chen C.P., Kernytsky A., Rost B. Transmembrane helix predictions revisited. Protein Science. 2002;11(12):2774–2791. doi: 10.1110/ps.0214502
  39. Lesnik T., Reiss C. Detection of transmembrane helical segments at the nucleotide level in eukaryotic membrane protein genes. Biochem. Mol. Biol. Int. 1998;44(3):471–479. doi: 10.1080/15216549800201492
  40. Nakashima H., Yoshihara A., Kitamura K.I. Favorable and unfavorable amino acid residues in water-soluble and transmembrane proteins. J. Biomedical Science and Engineering. 2013;6(1):36–44. doi: 10.4236/jbise.2013.61006
  41. Vakirlis N., Acar O., Hsu B., Coelho N.C., Van Oss S.B., Wacholder A., Medetgul-Ernar K., Bowman II R.W., Hines C.P., Iannotta J. et al. De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences. Nature Communications. 2020;11(1):1–18. doi: 10.1038/s41467-020-14500-z
published in Russian

  Copyright IMPB RAS © 2005-2022