Volume 18, Issue 1, Year 2023
Abhigyan Nath¹, Sudama Rathore¹, Pangambam Sendash Singh²

Exploiting ensemble learning and negative sample space for predicting extracellular matrix receptor interactions

Mathematical Biology & Bioinformatics. 2023;18(1):113-127.

doi: 10.17537/2023.18.113.


  1. Gullberg D., Heldin P., Liliana S., Ruggero T., Achilleas T., Jan-Olof W. Extracellular matrix: pathobiology and signaling. Walter de Gruyter, 2012.
  2. Manou D., Caon I., Bouris P., Triantaphyllidou I.-E., Giaroni C., Passi A., Karamanos N.K., Vigetti D., Theocharis A.D. The Complex Interplay Between Extracellular Matrix and Cells in Tissues. 2019. P. 1-20. doi: 10.1007/978-1-4939-9133-4_1
  3. Jinka R., Kapoor R., Sistla P.G., Raj T.A., Pande G. Alterations in Cell-Extracellular Matrix Interactions during Progression of Cancers. International Journal of Cell Biology. 2012;2012:1-8. doi: 10.1155/2012/219196
  4. Bosman F.T., Stamenkovic I. Functional structure and composition of the extracellular matrix. The Journal of Pathology. 2003;200(4):423-428. doi: 10.1002/path.1437
  5. Kim S.-H., Turnbull J., Guimond S. Extracellular matrix and cell signalling: the dynamic cooperation of integrin, proteoglycan and growth factor receptor. Journal of Endocrinology. 2011;209(2):139-151. doi: 10.1530/JOE-10-0377
  6. van der Flier A., Sonnenberg A. Function and interactions of integrins. Cell and Tissue Research. 2001;305(3):285-298. doi: 10.1007/s004410100417
  7. David G., Lories V., Decock B., Marynen P., Cassiman J.J., Van den Berghe H. Molecular cloning of a phosphatidylinositol-anchored membrane heparan sulfate proteoglycan from human lung fibroblasts. Journal of Cell Biology. 1990;111(6):3165-3176. doi: 10.1083/jcb.111.6.3165
  8. Stipp C.S., Litwack E.D., Lander A.D. Cerebroglycan: an integral membrane heparan sulfate proteoglycan that is unique to the developing nervous system and expressed specifically during neuronal differentiation. Journal of Cell Biology. 1994;124(1):149-160. doi: 10.1083/jcb.124.1.149
  9. Elenius K., Jalkanen M. Function of the syndecans - a family of cell surface proteoglycans. Journal of Cell Science. 1994;107(11):2975-2982. doi: 10.1242/jcs.107.11.2975
  10. Yan S., Zhang Y., Lu D.-F., Dong F., Lian Y. ECM-receptor interaction as a prognostic indicator for clinical outcome of primary osteoporosis. 2016.
  11. Buttner P., Ueberham L., Shoemaker M.B., Roden D.M., Dinov B., Hindricks G., Bollmann A., Husser D. Identification of Central Regulators of Calcium Signaling and ECM–Receptor Interaction Genetically Associated With the Progression and Recurrence of Atrial Fibrillation. Frontiers in Genetics. 2018;9. doi: 10.3389/fgene.2018.00162
  12. Karamanos N.K. Extracellular matrix: key structural and functional meshwork in health and disease. The FEBS Journal. 2019;286(15):2826-2829. doi: 10.1111/febs.14992
  13. Mavrogonatou E., Pratsinis H., Papadopoulou A., Karamanos N.K., Kletsas D. Extracellular matrix alterations in senescent cells and their significance in tissue homeostasis. Matrix Biology. 2019;75-76:27-42. doi: 10.1016/j.matbio.2017.10.004
  14. Theocharis A.D., Manou D., Karamanos N.K. The extracellular matrix as a multitasking player in disease. The FEBS Journal. 2019;286(15):2830-2869. doi: 10.1111/febs.14818
  15. Urbanczyk M., Layland S.L., Schenke-Layland K. The role of extracellular matrix in biomechanics and its impact on bioengineering of cells and 3D tissues. Matrix Biology. 2020;85-86:1-14. doi: 10.1016/j.matbio.2019.11.005
  16. Pupa S.M., Menard S., Forti S., Tagliabue E. New insights into the role of extracellular matrix during tumor onset and progression. Journal of Cellular Physiology. 2002;192(3):259-267. doi: 10.1002/jcp.10142
  17. Jung J., Ryu T., Hwang Y., Lee E., Lee D. Prediction of Extracellular Matrix Proteins Based on Distinctive Sequence and Domain Characteristics. Journal of Computational Biology. 2010;17(1):97-105. doi: 10.1089/cmb.2008.0236
  18. Hanna E., Quick J., Libutti S.K. The tumour microenvironment: a novel target for cancer therapy. Oral Diseases. 2009;15(1):8-17. doi: 10.1111/j.1601-0825.2008.01471.x
  19. Desgrosellier J.S., Cheresh D.A. Integrins in cancer: biological implications and therapeutic opportunities. Nature Reviews Cancer. 2010;10(1):9-22. doi: 10.1038/nrc2748
  20. Launay G., Salza R., Multedo D., Thierry-Mieg N., Ricard-Blum S. MatrixDB, the extracellular matrix interaction database: updated content, a new navigator and expanded functionalities. Nucleic Acids Research. 2015;43(D1):D321-D327. doi: 10.1093/nar/gku1091
  21. Nath A., Leier A. Improved cytokine–receptor interaction prediction by exploiting the negative sample space. BMC Bioinformatics. 2020;21(1):493. doi: 10.1186/s12859-020-03835-5
  22. Kanehisa M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research. 2000;28(1):27-30. doi: 10.1093/nar/28.1.27
  23. Roy S., Martinez D., Platero H., Lane T., Werner-Washburne M. Exploiting Amino Acid Composition for Predicting Protein-Protein Interactions. PLoS ONE. 2009;4(11). Article No. e7813. doi: 10.1371/journal.pone.0007813
  24. Nath A., Chaube R., Subbiah K. An insight into the molecular basis for convergent evolution in fish antifreeze proteins. Computers in Biology and Medicine. 2013;43(7):817-821. doi: 10.1016/j.compbiomed.2013.04.013
  25. Nath A. Insights into the sequence parameters for halophilic adaptation. Amino Acids. 2016;48(3):751-762. doi: 10.1007/s00726-015-2123-x
  26. Nath A., Subbiah K. The role of pertinently diversified and balanced training as well as testing data sets in achieving the true performance of classifiers in predicting the antifreeze proteins. Neurocomputing. 2018;272:294-305. doi: 10.1016/j.neucom.2017.07.004
  27. Atchley W.R., Zhao J., Fernandes A.D., Drüke T. Solving the protein sequence metric problem. Proceedings of the National Academy of Sciences. 2005;102(18):6395-6400. doi: 10.1073/pnas.0408677102
  28. Chen Z., Zhao P., Li F., Leier A., Marquez-Lago T.T., Wang Y., Webb G.I., Smith A.I., Daly R.J., Chou K.-C., Song J. iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics. 2018;34(14):2499-2502. doi: 10.1093/bioinformatics/bty140
  29. Wang J., Yang B., Revote J., Leier A., Marquez-Lago T.T., Webb G., Song J., Chou K.-C., Lithgow T. POSSUM: a bioinformatics toolkit for generating numerical sequence feature descriptors based on PSSM profiles. Bioinformatics. 2017;33(17):2756-2758. doi: 10.1093/bioinformatics/btx302
  30. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. Journal of Molecular Biology. 1990;215(3):403-410. doi: 10.1016/S0022-2836(05)80360-2
  31. Polikar R. Ensemble based systems in decision making. IEEE Circuits and Systems Magazine. 2006;6(3):21-45. doi: 10.1109/MCAS.2006.1688199
  32. Freund Y., Schapire R.E. Experiments with a New Boosting Algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, 1996. P. 148-156.
  33. Schapire R.E. The Boosting Approach to Machine Learning: An Overview. 2003. P. 149-171. doi: 10.1007/978-0-387-21579-2_9
  34. Breiman L. Bagging predictors. Machine Learning. 1996;24(2):123-140. doi: 10.1007/BF00058655
  35. Breiman L. Random Forests. Machine Learning. 2001;45(1):5-32. doi: 10.1023/A:1010933404324
  36. Nath A., Subbiah K. Maximizing lipocalin prediction through balanced and diversified training set and decision fusion. Computational Biology and Chemistry. 2015;59:101-110. doi: 10.1016/j.compbiolchem.2015.09.011
  37. de Groot P.J., Postma G.J., Melssen W.J., Buydens L.M.C. Selecting a representative training set for the classification of demolition waste using remote NIR sensing. Analytica Chimica Acta. 1999;392(1):67-75. doi: 10.1016/S0003-2670(99)00193-2
  38. Li D.-C., Hu S.C., Lin L.-S., Yeh C.-W. Detecting representative data and generating synthetic samples to improve learning accuracy with imbalanced data sets. PLoS ONE. 2017;12(8). Article No. e0181853. doi: 10.1371/journal.pone.0181853
  39. Jain A.K., Murty M.N., Flynn P.J. Data clustering. ACM Computing Surveys. 1999;31(3):264-323. doi: 10.1145/331499.331504
  40. Larose D.T., Larose C.D. Discovering Knowledge in Data. Hoboken, NJ, USA: John Wiley and Sons, Inc., 2014. doi: 10.1002/9781118874059
  41. Daszykowski M., Walczak B., Massart D.L. Representative subset selection. Analytica Chimica Acta. 2002;468(1):91-103. doi: 10.1016/S0003-2670(02)00651-7
  42. Kira K., Rendell L.A. A Practical Approach to Feature Selection. In: Proceedings of the Ninth International Workshop on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1992. P. 249-256. doi: 10.1016/B978-1-55860-247-2.50037-1
  43. Urbanowicz R.J., Meeker M., La Cava W., Olson R.S., Moore J.H. Relief-based feature selection: Introduction and review. Journal of Biomedical Informatics. 2018;85:189-203. doi: 10.1016/j.jbi.2018.07.014
  44. Hall M., Frank E., Holmes G., Pfahringer B., Reutemann P., Witten I.H. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter. 2009;11(1):10-18. doi: 10.1145/1656274.1656278
  45. Ling C.X., Huang J., Zhang H. AUC: A Better Measure than Accuracy in Comparing Learning Algorithms. In: Lecture Notes in Computer Science. 2003;2671:329-341. doi: 10.1007/3-540-44886-1_25
  46. Jin H., Ling C.X. Using AUC and accuracy in evaluating learning algorithms. IEEE Transactions on Knowledge and Data Engineering. 2005;17(3):299-310. doi: 10.1109/TKDE.2005.50
  47. Murakami Y., Mizuguchi K. PSOPIA: Toward more reliable protein-protein interaction prediction from sequence information. In: 2017 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS). IEEE, 2017. P. 255-261. doi: 10.1109/ICIIBMS.2017.8279749
  48. Perovic V., Sumonja N., Gemovic B., Toska E., Roberts S.G., Veljkovic N. TRI_tool: a web-tool for prediction of protein–protein interactions in human transcriptional regulation. Bioinformatics. 2017;33(2):289-291. doi: 10.1093/bioinformatics/btw590
  49. Planas-Iglesias J., Marin-Lopez M.A., Bonet J., Garcia-Garcia J., Oliva B. iLoops: a protein–protein interaction prediction server based on structural features. Bioinformatics. 2013;29(18):2360-2362. doi: 10.1093/bioinformatics/btt401
  50. Wolpert D., Macready W. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation. 1997;1:67-82. doi: 10.1109/4235.585893
  51. Murphy K. Naive Bayes classifiers. University of British Columbia. 2006;18:1-8.
  52. Breiman L. Random forests. Machine Learning. 2001;45(1):5-32. doi: 10.1023/A:1010933404324
  53. Breiman L. Bagging predictors. Machine Learning. 1996;24(2):123-140. doi: 10.1007/BF00058655
  54. Peterson L. K-nearest neighbor. Scholarpedia. 2009;4. Article No. 1883. doi: 10.4249/scholarpedia.1883
  55. Cortes C., Vapnik V. Support-vector networks. Machine Learning. 1995;20:273-297. doi: 10.1007/BF00994018
  56. Platt J. Sequential minimal optimization: A fast algorithm for training support vector machines. Microsoft Research, 1998. Technical Report No. MSR-TR-98-14.
  57. Keerthi S., Shevade S., Bhattacharyya C., Murthy K. Improvements to Platt's SMO algorithm for SVM classifier design. Neural Computation. 2001;13:637-649. doi: 10.1162/089976601300014493
  58. Rodriguez J., Kuncheva L., Alonso C. Rotation forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2006;28:1619-1630. doi: 10.1109/TPAMI.2006.211
Original Article, published in English.


  Copyright IMPB RAS © 2005-