Volume 15   Issue 1   Year 2020
Boyko I.Y., Anisimov D.S., Smolyakova L.L., Ryazanov M.A.

Approach to the Selection of Significant Features in Solving Biomedical Problems of Binary Classification of Microarray Data

Mathematical Biology & Bioinformatics. 2020;15(1):4-19.

doi: 10.17537/2020.15.4


  1. Renard B.Y., Löwer M., Kühne Y., Reimer U., Rothermel A., Türeci O., Castle J.C., Sahin U. Rapmad: Robust analysis of peptide microarray data. BMC Bioinformatics. 2011;12. doi: 10.1186/1471-2105-12-324
  2. Önskog J., Freyhult E., Landfors M., Rydén P., Hvidsten T.R. Classification of microarrays; synergistic effects between normalization, gene selection and machine learning. BMC Bioinformatics. 2011;12. doi: 10.1186/1471-2105-12-390
  3. Mohammed A., Biegert G., Adamec J., Helikar T. CancerDiscover: An integrative pipeline for cancer biomarker and cancer class prediction from high-throughput sequencing data. Oncotarget. 2018;9(2):2565–2573. doi: 10.18632/oncotarget.23511
  4. Alanni R., Hou J., Azzawi H., Xiang Y. A novel gene selection algorithm for cancer classification using microarray datasets. BMC Med Genomics. 2019;12. doi: 10.1186/s12920-018-0447-6
  5. Xi M., Sun J., Liu L., Fan F., Wu X. Cancer Feature Selection and Classification Using a Binary Quantum-Behaved Particle Swarm Optimization and Support Vector Machine. Computational and Mathematical Methods in Medicine. 2016;2016:1–9. doi: 10.1155/2016/3572705
  6. Hira Z., Gillies D. A review of feature selection and feature extraction methods applied on microarray data. Advances in Bioinformatics. 2015;2015:1–13. doi: 10.1155/2015/198363
  7. Saeys Y., Inza I., Larranaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007;23(19):2507–2517. doi: 10.1093/bioinformatics/btm344
  8. Lazar C., Taminau J., Meganck S., Steenhoff D., Coletta A., Molter C., de Schaetzen V., Duque R., Bersini H., Nowe A. A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2012;9(4):1106–1119. doi: 10.1109/TCBB.2012.33
  9. Jafari P., Azuaje F. An assessment of recently published gene expression data analyses: reporting experimental design and statistical factors. BMC Medical Informatics and Decision Making. 2006;6. doi: 10.1186/1472-6947-6-27
  10. Nguyen T., Khosravi A., Creighton D., Nahavandi S. Hierarchical Gene Selection and Genetic Fuzzy System for Cancer Microarray Data Classification. PLoS ONE. 2015;10(3). doi: 10.1371/journal.pone.0120364
  11. Shahjaman M., Rahman M., Islam S., Mollah M. A Robust Approach for Identification of Cancer Biomarkers and Candidate Drugs. Medicina. 2019;55(6). doi: 10.3390/medicina55060269
  12. Maniruzzaman M., Rahman J., Ahammed B., Abedin M., Suri H., Biswas M., El-Baz A., Bangeas P., Tsoulfas G., Suri J. Statistical characterization and classification of colon microarray gene expression data using multiple machine learning paradigms. Computer Methods and Programs in Biomedicine. 2019;176:173–193. doi: 10.1016/j.cmpb.2019.04.008
  13. Momenzadeh M., Sehhati M., Rabbani H. A novel feature selection method for microarray data classification based on hidden Markov model. Journal of Biomedical Informatics. 2019;95. doi: 10.1016/j.jbi.2019.103213
  14. Boareto M., Caticha N. t-Test at the Probe Level: An Alternative Method to Identify Statistically Significant Genes for Microarray Data. Microarrays. 2014;3(4):340–351. doi: 10.3390/microarrays3040340
  15. Fox R., Dimmic M. A two-sample Bayesian t-test for microarray data. BMC Bioinformatics. 2006;7. doi: 10.1186/1471-2105-7-126
  16. Shukla A., Tripathi D. Identification of potential biomarkers on microarray data using distributed gene selection approach. Mathematical Biosciences. 2019;315. doi: 10.1016/j.mbs.2019.108230
  17. Bolon-Canedo V., Sanchez-Marono N., Alonso-Betanzos A., Benitez J., Herrera F. A review of microarray datasets and applied feature selection methods. Information Sciences. 2014;282:111–135. doi: 10.1016/j.ins.2014.05.042
  18. Aboudi N., Benhlima L. Review on wrapper feature selection approaches. In: 2016 International Conference on Engineering & MIS (ICEMIS). IEEE, 2016. P. 1–5. doi: 10.1109/ICEMIS.2016.7745366
  19. Sanz H., Valim C., Vegas E., Oller J., Reverter F. SVM-RFE: selection and visualization of the most relevant features through non-linear kernels. BMC Bioinformatics. 2018;19. doi: 10.1186/s12859-018-2451-4
  20. Li Z., Xie W., Liu T. Efficient feature selection and classification for microarray data. PLoS ONE. 2018;13(8). doi: 10.1371/journal.pone.0202167
  21. Anaissi A., Kennedy P., Goyal M. Feature selection of imbalanced gene expression microarray data. In: 2011 12th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). IEEE, 2011. P. 73–78. doi: 10.1109/SNPD.2011.12
  22. Kang C., Huo Y., Xin L., Tian B., Yu B. Feature selection and tumor classification for microarray data using relaxed Lasso and generalized multi-class support vector machine. Journal of Theoretical Biology. 2019;463:77–91. doi: 10.1016/j.jtbi.2018.12.010
  23. Chuang L., Yang C., Wu K., Yang C. A hybrid feature selection method for DNA microarray data. Computers in Biology and Medicine. 2011;41(4):228–237. doi: 10.1016/j.compbiomed.2011.02.004
  24. Huijuan L., Junying C., Ke Y., Qun J., Yu X., Zhigang G. A hybrid feature selection algorithm for gene expression data classification. Neurocomputing. 2017;256:56–62. doi: 10.1016/j.neucom.2016.07.080
  25. Shukla A., Singh P., Vardhan V. A hybrid gene selection method for microarray recognition. Biocybernetics and Biomedical Engineering. 2018;38(4):975–991. doi: 10.1016/j.bbe.2018.08.004
  26. Sun Y., Lu C., Li X. The Cross-Entropy Based Multi-Filter Ensemble Method for Gene Selection. Genes. 2018;9(5). doi: 10.3390/genes9050258
  27. Bolon-Canedo V., Sanchez-Marono N., Alonso-Betanzos A. An ensemble of filters and classifiers for microarray data classification. Pattern Recognition. 2012;45(1):531–539. doi: 10.1016/j.patcog.2011.06.006
  28. Bolon-Canedo V., Sanchez-Marono N., Alonso-Betanzos A. Data classification using an ensemble of filters. Neurocomputing. 2014;135:13–20. doi: 10.1016/j.neucom.2013.03.067
  29. Strimbu K., Tavel J.A. What are Biomarkers? Current Opinion in HIV and AIDS. 2010;5(6):463–466. doi: 10.1097/COH.0b013e32833ed177
  30. Dronov S.V., Petukhova R.V. Izvestiia AltGU (Izvestiya of Altai State University). 2010;65(1/2):34–36 (in Russ.).
  31. Dronov S.V., Boyko I.Yu. Method for estimating connection power of binary and nominal variables. Prikl. Diskr. Mat. 2015;30(4):109–119. doi: 10.17223/20710410/30/11
  32. Anisimov D.S., Podlesnykh S.V., Kolosova E.A., Shcherbakov D.N., Petrova V.D., Johnston S.A., Lazarev A.F., Oskorbin N.N., Shapoval A.I., Ryazanov M.A. Projection to Latent Structures as a Strategy for Peptides Microarray Data Analysis. Mathematical Biology and Bioinformatics. 2017;12(2):435–445. doi: 10.17537/2017.12.435
  33. Gravier E. A prognostic DNA signature for T1T2 node-negative breast cancer patients. Genes, Chromosomes and Cancer. 2010;49(12):1125–1134. doi: 10.1002/gcc.20820
  34. Student. The probable error of a mean. Biometrika. 1908;6(1):1–25. doi: 10.2307/2331554
  35. Mann H.B., Whitney D.R. On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics. 1947;18:50–60. doi: 10.1214/aoms/1177730491
  36. Anisimov D.S., Riazanov M.A., Shapoval A.I. In: Sbornik trudov vserossiiskoi konferentsii po matematike "MAK-2016" (Collection of proceedings of the All-Russian Conference on Mathematics "MAK-2016"). Barnaul, 2016. P. 92 (in Russ.).
  37. Esbensen K.H. Analiz mnogomernykh dannykh. Izbrannye glavy (Multivariate Data Analysis. Selected Chapters). Barnaul, 2003. 157 p. (in Russ.).
  38. Cox D.R. The regression analysis of binary sequences. Journal of the Royal Statistical Society. 1958;20(2):215–242. doi: 10.1111/j.2517-6161.1958.tb00292.x
  39. Vapnik V.N. Estimation of Dependences Based on Empirical Data. New York: Springer, 1982.
  40. Cover T.M., Hart P.E. Nearest neighbor pattern classification. IEEE Transactions on Information Theory. 1967;13(1):21–27. doi: 10.1109/TIT.1967.1053964
  41. Breiman L. Random Forests. Machine Learning. 2001;45(1):5–32. doi: 10.1023/A:1010933404324
  42. Boser B.E., Guyon I.M., Vapnik V.N. A Training Algorithm for Optimal Margin Classifiers. In: Proceedings of the 5th Annual Workshop on Computational Learning Theory – COLT’92 (Pittsburgh, 27–29 July 1992). New York, 1992. P. 144–152. doi: 10.1145/130385.130401
  43. Hyperopt: Distributed Asynchronous Hyper-parameter Optimization. (accessed 16 April 2019).
  44. Youden W.J. Index for rating diagnostic tests. Cancer. 1950;3(1):32–35. doi: 10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3


published in Russian
