Volume 15   Issue 2   Year 2020
Mukhin A.M.1,2, Genaev M.A.1,2,3, Rasskazov D.A.1,2, Lashin S.A.1,2,3, Afonnikov D.A.1,2,3

RDBMS and NOSQL Based Hybrid Technology for Transcriptome Data Structuring and Processing

Mathematical Biology & Bioinformatics. 2020;15(2):455-470.

doi: 10.17537/2020.15.455.

References

  1. Martin L.B.B., Fei Z., Giovannoni J.J., Rose J.K.C. Catalyzing plant science research with RNA-seq. Frontiers in Plant Science. 2013;4:66. doi: 10.3389/fpls.2013.00066
  2. Usadel B., Fernie A.R. The plant transcriptome-from integrating observations to models. Frontiers in Plant Science. 2013;4:48.
  3. Klepikova A.V., Kasianov A.S., Gerasimov E.S., Logacheva M.D., Penin A.A. A high resolution map of the Arabidopsis thaliana developmental transcriptome based on RNA-seq profiling. Plant Journal. 2016;88(6):1058–1070. doi: 10.1111/tpj.13312
  4. Strickler S.R., Bombarely A., Mueller L.A. Designing a transcriptome next-generation sequencing project for a nonmodel plant species. American Journal of Botany. 2012;99(2):257–266. doi: 10.3732/ajb.1100292
  5. Haas B.J., Papanicolaou A., Yassour M., Grabherr M., Blood P.D., Bowden J., Couger M.B., Eccles D., Li B., Lieber M. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols. 2013;8(8):1494–1512. doi: 10.1038/nprot.2013.084
  6. Kim D., Langmead B., Salzberg S.L. HISAT: A fast spliced aligner with low memory requirements. Nature Methods. 2015;12(4):357–360. doi: 10.1038/nmeth.3317
  7. Bryant D.M., Johnson K., DiTommaso T., Tickle T., Couger M.B., Payzin-Dogru D., Lee T.J., Leigh N.D., Kuo T.H., Davis F.G. et al. A Tissue-Mapped Axolotl De Novo Transcriptome Enables Identification of Limb Regeneration Factors. Cell Reports. 2017;18(3):762–776. doi: 10.1016/j.celrep.2016.12.063
  8. Bolger M.E., Arsova B., Usadel B. Plant genome and transcriptome annotations: From misconceptions to simple solutions. Briefings in Bioinformatics. 2018;19(3):437–449.
  9. Glagoleva A.Y., Shmakov N.A., Shoeva O.Y., Vasiliev G.V., Shatskaya N.V., Börner A., Afonnikov D.A., Khlestkina E.K. Metabolic pathways and genes identified by RNA-seq analysis of barley near-isogenic lines differing by allelic state of the Black lemma and pericarp (Blp) gene. BMC Plant Biology. 2017;17(S1):182. doi: 10.1186/s12870-017-1124-1
  10. Shmakov N.A., Vasiliev G.V., Shatskaya N.V., Doroshkov A.V., Gordeeva E.I., Afonnikov D.A., Khlestkina E.K. Identification of nuclear genes controlling chlorophyll synthesis in barley by RNA-seq. BMC Plant Biology. 2016;16(3):119–138. doi: 10.1186/s12870-016-0926-x
  11. Papatheodorou I., Moreno P., Manning J., Fuentes A.M.P., George N., Fexova S., Fonseca N.A., Füllgrabe A., Green M., Huang N. et al. Expression Atlas update: From tissues to single cells. Nucleic Acids Research. 2020;48(D1):D77–D83. doi: 10.1093/nar/gkz947
  12. Masoudi-Nejad A., Goto S., Jauregui R., Ito M., Kawashima S., Moriya Y., Endo T.R., Kanehisa M. EGENES: Transcriptome-based plant database of genes with metabolic pathway information and expressed sequence tag indices in KEGG. Plant Physiology. 2007;144(2):857–866. doi: 10.1104/pp.106.095059
  13. Ueno S., Nakamura Y., Kobayashi M., Terashima S., Ishizuka W., Uchiyama K., Tsumura Y., Yano K., Goto S. TodoFirGene: Developing transcriptome resources for genetic analysis of Abies sachalinensis. Plant and Cell Physiology. 2018;59(6):1276–1284. doi: 10.1093/pcp/pcy058
  14. Dubois A., Carrere S., Raymond O., Pouvreau B., Cottret L., Roccia A., Onesto J.P., Sakr S., Atanassova R., Baudino S. et al. Transcriptome database resource and gene expression atlas for the rose. BMC Genomics. 2012;13(1):638. doi: 10.1186/1471-2164-13-638
  15. Fernández-Pozo N., Canales J., Guerrero-Fernández D., Villalobos D.P., Díaz-Moreno S.M., Bautista R., Flores-Monterroso A., Guevara M.Á., Perdiguero P., Collada C. et al. EuroPineDB: A high-coverage web database for maritime pine transcriptome. BMC Genomics. 2011;12(1):366. doi: 10.1186/1471-2164-12-366
  16. Barnett D.W., Garrison E.K., Quinlan A.R., Strömberg M.P., Marth G.T. Bamtools: A C++ API and toolkit for analyzing and managing BAM files. Bioinformatics. 2011;27(12):1691–1692. doi: 10.1093/bioinformatics/btr174
  17. Quinlan A.R., Hall I.M. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–842. doi: 10.1093/bioinformatics/btq033
  18. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352
  19. Pertea G., Pertea M. GFF Utilities: GffRead and GffCompare. F1000Research. 2020;9:304. doi: 10.12688/f1000research.23297.1
  20. Anders S., Huber W. Differential expression of RNA-Seq data at the gene level-the DESeq package. Heidelberg, Germany: European Molecular Biology Laboratory (EMBL). 2012;10:f1000research.
  21. Bray N.L., Pimentel H., Melsted P., Pachter L. Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology. 2016;34(5):525–527. doi: 10.1038/nbt.3519
  22. Gunbin K.V., Suslov V.V., Genaev M.A., Afonnikov D.A. Computer System for Analysis of Molecular Evolution Modes (SAMEM): Analysis of molecular evolution modes at deep inner branches of the phylogenetic tree. In Silico Biology. 2011;11(3):109–123.
  23. Han J., Haihong E., Le G., Du J. Survey on NoSQL database. In: ICPCA 2011: 6th International Conference on Pervasive Computing and Applications. 2011. P. 363–366.
  24. Gabetta M., Limongelli I., Rizzo E., Riva A., Segagni D., Bellazzi R. BigQ: A NoSQL based framework to handle genomic variants in i2b2. BMC Bioinformatics. 2015;16(1):415. doi: 10.1186/s12859-015-0861-0
  25. ENA Portal. https://www.ebi.ac.uk/ena/portal/api/ (accessed: 23.10.2020).
  26. Harrison P.W., Alako B., Amid C., Cerdeño-Tárraga A., Cleland I., Holt S., Hussein A., Jayathilaka S., Kay S., Keane T. et al. The European Nucleotide Archive in 2018. Nucleic Acids Research. 2019;47(D1):D84–D88. doi: 10.1093/nar/gky1078
  27. Submit your project and biological samples. https://www.ncbi.nlm.nih.gov/sra/docs/submitbio/ (accessed: 23.10.2020).
  28. Staff S.R.A.S. Using the SRA Toolkit to convert .sra files into other formats. National Center for Biotechnology Information. 2011.
  29. Chen S., Zhou Y., Chen Y., Gu J. Fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–i890. doi: 10.1093/bioinformatics/bty560
  30. Bushmanova E., Antipov D., Lapidus A., Suvorov V., Prjibelski A.D. RnaQUAST: A quality assessment tool for de novo transcriptome assemblies. Bioinformatics. 2016;32(14):2210–2212. doi: 10.1093/bioinformatics/btw218
  31. Wu T.D., Watanabe C.K. GMAP: A genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics. 2005;21(9):1859–1875. doi: 10.1093/bioinformatics/bti310
  32. Ensembl Plants. https://plants.ensembl.org/index.html (accessed: 23.10.2020).
  33. Kersey P.J., Allen J.E., Allot A., Barba M., Boddu S., Bolt B.J., Carvalho-Silva D., Christensen M., Davis P., Grabmueller C. et al. Ensembl Genomes 2018: An integrated omics infrastructure for non-vertebrate species. Nucleic Acids Research. 2018;46(D1):D802–D808. doi: 10.1093/nar/gkx1011
  34. Jones P., Binns D., Chang H.Y., Fraser M., Li W., McAnulla C., McWilliam H., Maslen J., Mitchell A., Nuka G. et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics. 2014;30(9):1236–1240. doi: 10.1093/bioinformatics/btu031
  35. PostgreSQL: The world’s most advanced open source database. https://www.postgresql.org/ (accessed: 23.10.2020).
  36. Schönig H.-J. Mastering PostgreSQL 11: Expert techniques to build scalable, reliable, and fault-tolerant database applications. Birmingham: Packt Publishing Ltd., 2018. 448 p.
  37. SQLAlchemy - The Database Toolkit for Python. https://www.sqlalchemy.org/ (accessed: 23.10.2020).
  38. PostgreSQL: Documentation: 12: 11.2. Index Types. https://www.postgresql.org/docs/12/indexes-types.html (accessed: 23.10.2020).
  39. Carbon S., Douglass E., Dunn N., Good B., Harris N.L., Lewis S.E., Mungall C.J., Basu S., Chisholm R.L., Dodson R.J. et al. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research. 2019;47(D1):D330–D338. doi: 10.1093/nar/gky1055
  40. Petković D. JSON integration in relational database systems. Int. J. Comput. Appl. 2017;168(5):14–19. doi: 10.5120/ijca2017914389
  41. Kaur M., Shaik B. PostgreSQL Development Essentials. Birmingham: Packt Publishing Ltd., 2016. 210 p.
  42. DataGrip: cross-platform development environment for databases and SQL. https://www.jetbrains.com/ru-ru/datagrip/ (accessed: 23.10.2020).
  43. pgAdmin - PostgreSQL Tools. https://www.pgadmin.org/ (accessed: 23.10.2020).

 

Published in Russian.
