Nazipova N.N., Isaev E.A., Kornilov V.V., Pervukhin D.V., Morozova A.A., Gorbunov A.A., Ustinin M.N.
Big Data in Bioinformatics
Mathematical Biology and Bioinformatics. 2017;12(1):102-119.
doi: 10.17537/2017.12.102.
References
- Manyika J., Chui M., Brown B., Bughin J., Dobbs R., Roxburgh C., Byers A.H. Big Data: The Next Frontier for Innovation, Competition, and Productivity. San Francisco: McKinsey Global Institute; 2011. http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation (accessed 17 February 2017).
- Jacobs A. The Pathologies of Big Data. Communications of the ACM. 2009;52(8). doi: 10.1145/1536616.1536632
- What’s New in Gartner’s Hype Cycle for Emerging Technologies, 2015. Gartner. http://www.gartner.com/smarterwithgartner/whats-new-in-gartners-hype-cycle-for-emerging-technologies-2015/ (accessed 17 February 2017).
- Chui M., Löffler M., Roberts R. The Internet of Things. McKinsey Quarterly. 2010. http://www.mckinsey.com/industries/high-tech/our-insights/the-internet-of-things (accessed 17 February 2017).
- Hogeweg P. The Roots of Bioinformatics in Theoretical Biology. PLOS Computational Biology. 2011;7(3). Article No. e1002021. doi: 10.1371/journal.pcbi.1002021
- Winkler H. Verbreitung und Ursache der Parthenogenesis im Pflanzen- und Tierreiche. Jena: Verlag Fischer; 1920. doi: 10.5962/bhl.title.1460
- Baker M. The ’Omes Puzzle. Nature. 2013;494:416-419. doi: 10.1038/494416a
- Ohashi H., Hasegawa M., Wakimoto K., Miyamoto-Sato E. Next-generation technologies for multiomics approaches including interactome sequencing. BioMed Research International. 2015;2015. Article No. 104209.
- International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409:860-921.
- Venter J.C., Adams M.D., Myers E.W., Li P.W., Mural R.J., Sutton G.G., Smith H.O., Yandell M., Evans C.A., Holt R.A., et al. The sequence of the human genome. Science. 2001;291(5507):1304-1351. doi: 10.1126/science.1058040
- Buermans H.P.J., den Dunnen J.T. Next generation sequencing technology. Advances and applications. BBA – Molecular Basis of Disease. 2014;1842(10):1932-1941. doi: 10.1016/j.bbadis.2014.06.015
- Bioinforx Inc. Next Generation Sequencing Software. http://bioinfo.wisc.edu/knowledge_base/next-gen-seq_software.php (accessed 17 February 2017).
- BaseSpace Sequence Hub. https://www.illumina.com/content/dam/illumina-marketing/documents/products/datasheets/datasheet_basespace.pdf (accessed 17 February 2017).
- CLCBio. http://www.clcbio.com (accessed 17 February 2017).
- DNASTAR Lasergene. https://www.dnastar.com/t-allproducts.aspx (accessed 17 February 2017).
- Kearse M., Moir R., Wilson A., Stones-Havas S., Cheung M., Sturrock S., Buxton S., Cooper A., Markowitz S., Duran C., et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28(12):1647-1649. doi: 10.1093/bioinformatics/bts199
- Giardine B., Riemer C., Hardison R.C., Burhans R., Elnitski L., Shah P., Zhang Y., Blankenberg D., Albert I., Taylor J., et al. Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 2005;15(10):1451-1455. doi: 10.1101/gr.4086505
- Goecks J., Nekrutenko A., Taylor J., Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11(8). Article No. R86. doi: 10.1186/gb-2010-11-8-r86
- Madduri R.K., Sulakhe D., Lacinski L., Liu B., Rodriguez A., Chard K., Dave U.J., Foster I.T. Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web Services. Concurr. Comput. 2014;26(13):2266-2279. doi: 10.1002/cpe.3274
- Wattam A.R., Abraham D., Dalay O., Disz T.L., Driscoll T., Gabbard J.L., Gillespie J.J., Gough R., Hix D., Kenyon R., et al. PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res. 2014;42:D581-D591. doi: 10.1093/nar/gkt1099
- Golosova O., Henderson R., Vaskin Y., Gabrielian A., Grekhov G., Nagarajan V., Oler A.J., Quinones M., Hurt D., Fursov M., Huyen Y. Unipro UGENE NGS pipelines and components for variant calling, RNA-seq and ChIP-seq data analyses. PeerJ. 2014;2. Article No. e644. doi: 10.7717/peerj.644
- Okonechnikov K., Golosova O., Fursov M., UGENE Team. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. 2012;28(8):1166-1167. doi: 10.1093/bioinformatics/bts091
- Jagla B., Wiswedel B., Coppée J.-Y. Extending KNIME for next-generation sequencing data analysis. Bioinformatics. 2011;27(20):2907-2909. doi: 10.1093/bioinformatics/btr478
- Warr W.A. Scientific workflow systems: Pipeline Pilot and KNIME. Journal of Computer-Aided Molecular Design. 2012;26(7):801-804. doi: 10.1007/s10822-012-9577-7
- Oinn T., Addis M., Ferris J., Marvin D., Senger M., Greenwood M., Carver T., Glover K., Pocock M.R., Wipat A., Li P. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics. 2004;20(17):3045-3054. doi: 10.1093/bioinformatics/bth361
- Barnett D.W., Garrison E.K., Quinlan A.R., Stromberg M.P., Marth G.T. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics. 2011;27(12):1691-1692. doi: 10.1093/bioinformatics/btr174
- Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-2079. doi: 10.1093/bioinformatics/btp352
- Nordell Markovits A., Joly Beauparlant C., Toupin D., Wang S., Droit A., Gevry N. NGS++: a library for rapid prototyping of epigenomics software tools. Bioinformatics. 2013;29(15):1893-1894. doi: 10.1093/bioinformatics/btt312
- Plieskatt J., Rinaldi G., Brindley P.J., Jia X., Potriquet J., Bethony J., Mulvenna J. Bioclojure: a functional library for the manipulation of biological sequences. Bioinformatics. 2014;30(17):2537-2539. doi: 10.1093/bioinformatics/btu311
- libStatGen. https://github.com/statgen/libStatGen/ (accessed 17 February 2017).
- Pitt W.R., Williams M.A., Steven M., Sweeney B., Bleasby A.J., Moss D.S. The Bioinformatics Template Library – generic components for biocomputing. Bioinformatics. 2001;17(8):729-737. doi: 10.1093/bioinformatics/17.8.729
- Stajich J.E., Block D., Boulez K., Brenner S.E., Chervitz S.A., Dagdigian C., Fuellen G., Gilbert J.G., Korf I., Lapp H., et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002;12(10):1611-1618. doi: 10.1101/gr.361602
- Goto N., Prins P., Nakao M., Bonnal R., Aerts J., Katayama T. BioRuby: bioinformatics software for the Ruby programming language. Bioinformatics. 2010;26(20):2617-2619. doi: 10.1093/bioinformatics/btq475
- Holland R.C., Down T.A., Pocock M., Prlic A., Huen D., James K., Foisy S., Drager A., Yates A., Heuer M., et al. BioJava: an open-source framework for bioinformatics. Bioinformatics. 2008;24(18):2096-2097. doi: 10.1093/bioinformatics/btn397
- Cock P.J., Antao T., Chang J.T., Chapman B.A., Cox C.J., Dalke A., Friedberg I., Hamelryck T., Kauff F., Wilczynski B., et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422-1423. doi: 10.1093/bioinformatics/btp163
- Open Bioinformatics Foundation. https://www.open-bio.org/wiki/Main_Page (accessed 17 February 2017).
- Huber W., Carey V.J., Gentleman R., Anders S., Carlson M., Carvalho B.S., Bravo H.C., Davis S., Gatto L., Girke T., et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods. 2015;12(2):115-121. doi: 10.1038/nmeth.3252
- Gentleman R.C., Carey V.J., Bates D.M., Bolstad B., Dettling M., Dudoit S., Ellis B., Gautier L., Ge Y., Gentry J., et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5(10). Article No. R80. doi: 10.1186/gb-2004-5-10-r80
- Milicchio F., Rose R., Bian J., Min J., Prosperi M. Visual programming for next-generation data analytics. BioData Mining. 2016;9. Article No. 16. doi: 10.1186/s13040-016-0095-3
- Bernstein F.C., Koetzle T.F., Williams G.J., Meyer E.F.Jr., Brice M.D., Rodgers J.R., Kennard O., Shimanouchi T., Tasumi M. The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 1977;112(3):535-542. doi: 10.1016/S0022-2836(77)80200-3
- Bourne P. E., Berman H.M., McMahon B., Watenpaugh K.D., Westbrook J.D., Fitzgerald P.M.D. Macromolecular crystallographic information file. Methods in Enzymology. 1997;277:571-590. doi: 10.1016/S0076-6879(97)77032-0
- Galperin M.Y., Fernández-Suárez X.M., Rigden D.J. The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes. Nucleic Acids Res. 2017;45:D1-D11. doi: 10.1093/nar/gkw1188
- Benson D., Lipman D.J., Ostell J. GenBank. Nucleic Acids Res. 1994;22:3441-3444. doi: 10.1093/nar/22.17.3441
- Rice C.M., Fuchs R., Higgins D.G., Stoehr P.J., Cameron G.N. The EMBL Data Library. Nucleic Acids Res. 1993;21:2967-2971. doi: 10.1093/nar/21.13.2967
- Tateno Y., Gojobori T. DNA Data Bank of Japan in the age of information biology. Nucleic Acids Res. 1997;25(1):14-17. doi: 10.1093/nar/25.1.14
- de Brevern A.G., Meyniel J.-P., Fairhead C., Neuvéglise C., Malpertuy A. Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies. BioMed Research International. 2015. Article No. 904541. doi: 10.1155/2015/904541
- Lith A., Mattsson J. Investigating Storage Solutions for Large Data. A comparison of well performing and scalable data storage solutions for real time extraction and batch insertion of data: Master of Science Thesis. 2010. http://publications.lib.chalmers.se/records/fulltext/123839.pdf (accessed 17 February 2017).
- Svensson J. Relational vs. graph databases: Which to use and when? SD Times. 2016. http://sdtimes.com/guest-view-relational-vs-graph-databases-use/#sthash.yHI6aoDv.dpuf (accessed 17 February 2017).
- Have C.T., Jensen L.J. Are graph databases ready for bioinformatics? Bioinformatics. 2013;29(24):3107-3108. doi: 10.1093/bioinformatics/btt549
- Taylor R.C. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. BMC Bioinformatics. 2010;11. Article No. S1. doi: 10.1186/1471-2105-11-S12-S1
- Chang F., Dean J., Ghemawat S., Hsieh W.C., Wallach D.A., Burrows M., Chandra T., Fikes A., Gruber R.E. Bigtable: A Distributed Storage System for Structured Data. In: The 7th Symposium on Operating System Design and Implementation. Seattle, WA: USENIX Association; 2006. 14 p. https://static.googleusercontent.com/media/research.google.com/ru//archive/bigtable-osdi06.pdf (accessed 17 February 2017).
- Shen L., Shao N., Liu X., Nestler E. Ngs.plot: quick mining and visualization of next-generation sequencing data by integrating genomic databases. BMC Genomics. 2014;15(1). Article No. 284. doi: 10.1186/1471-2164-15-284
- Robinson J.T., Thorvaldsdóttir H., Winckler W., Guttman M., Lander E.S., Getz G., Mesirov J.P. Integrative genomics viewer. Nature Biotechnology. 2011;29(1):24-26. doi: 10.1038/nbt.1754
- Toedling J., Ciaudo C., Voinnet O., Heard E., Barillot E. Girafe – an R/Bioconductor package for functional exploration of aligned next-generation sequencing reads. Bioinformatics. 2010;26(22):2902-2903. doi: 10.1093/bioinformatics/btq531
- Nolan D., Lang D.T. Interactive and animated scalable vector graphics and R data displays. Journal of Statistical Software. 2012;46(1):1-88. doi: 10.18637/jss.v046.i01
- TIBCO Spotfire Homepage. http://spotfire.tibco.com/ (accessed 17 February 2017).
- Wexler J., Thompson W., Aponte K. Time Is Precious, So Are Your Models. SAS provides solutions to streamline deployment. In: SAS Global Forum 2013. Paper No. 086-2013. https://support.sas.com/resources/papers/proceedings13/086-2013.pdf (accessed 17 February 2017).
- Tanenbaum A.S., van Steen M. Raspredelennye sistemy. Printsipy i paradigmy. Saint Petersburg; 2003. 877 p. (Translation of: Tanenbaum A.S., van Steen M. Distributed Systems: Principles and Paradigms. Prentice Hall; 2002).
- Dean J., Ghemawat S. MapReduce: simplified data processing on large clusters. Commun. ACM. 2008;51(1):107-113. doi: 10.1145/1327452.1327492
- White T. Hadoop: The Definitive Guide. O’Reilly Media, Inc., 2015. 756 p.
- The Apache Software Foundation Home page. http://www.apache.org/ (accessed 17 February 2017).
- IBM z Systems – z13s. http://www-03.ibm.com/systems/z/hardware/z13s.html/ (accessed 17 February 2017).
- Rustici G., Kolesnikov N., Brandizi M., Burdett T., Dylag M., Emam I., Farne A., Hastings E., Ison J., Keays M., et al. ArrayExpress update – trends in database growth and links to data analysis tools. Nucleic Acids Res. 2013;41:D987-D990. doi: 10.1093/nar/gks1174
- Greene A.C., Giffin K.A., Greene C.S., Moore J.H. Adapting bioinformatics curricula for big data. Briefings in Bioinformatics. 2016;17(1):43-50. doi: 10.1093/bib/bbv018
- Margolis R., Derr L., Dunn M., Huerta M., Larkin J., Sheehan J., Guyer M., Green E.D. The National Institutes of Health’s Big Data to Knowledge (BD2K) initiative: capitalizing on biomedical big data. J. Am. Med. Inform. Assoc. 2014;21:957-958. doi: 10.1136/amiajnl-2014-002974
- Luo J., Wu M., Gopukumar D., Zhao Y. Big Data Application in Biomedical Research and Health Care: A Literature Review. Biomed. Inform. Insights. 2016;8:1-10. doi: 10.4137/BII.S31559