A New Procedure for Filtering Raw Illumina MiSeq Data from Amplicon Metagenomic Libraries
Bukin Yu.S., Buzoleva L.S., Golozubova Y.S., Galachyants Yu.P.
Limnological Institute Siberian Branch of the Russian Academy of Sciences, Irkutsk, Russia
Irkutsk National Research Technical University, Irkutsk, Russia
Irkutsk Scientific Center Siberian Branch of the Russian Academy of Sciences, Irkutsk, Russia
Far Eastern Federal University, Vladivostok, Russia
Somov Institute of Epidemiology and Microbiology, Vladivostok, Russia
Abstract. In this paper we present an algorithm for filtering raw paired-end amplicon NGS data, which are used to capture the genetic and taxonomic diversity of communities of unicellular microorganisms. The suggested approach overcomes the massive data loss that occurs during filtration of raw sequences and increases the statistical representativeness of the analyzed amplicons. Furthermore, we show that standard trimming methods based on quality filtering of raw reads, for instance the sliding-window approach, eliminate sequences belonging to different taxonomic groups unequally. This bias may result in skewed taxon counts and depletion of the taxonomic diversity of the analyzed communities. The suggested method does not introduce errors of this kind. An implementation of the algorithm in the R programming language, along with a number of example files for analysis, is available at https://github.com/barnsys/metagenomic_analysis.
Key words: amplicon metagenomics, Next-Generation Sequencing, meta-barcoding, quality control, software for filtering reads.
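For orientation, the standard sliding-window quality trimming that the abstract contrasts with can be sketched as follows. This is a simplified illustration of the conventional approach, not the algorithm proposed in this paper. The function names, the window size of 4, the mean-quality threshold of 20, the minimum length of 8, and the Phred+33 quality encoding are all assumptions chosen for the example.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def sliding_window_trim(seq, quality_line, window=4, min_mean_q=20):
    """Cut the read at the start of the first window whose mean quality
    falls below min_mean_q; return the read unchanged if no window fails."""
    scores = phred_scores(quality_line)
    if len(seq) != len(scores):
        raise ValueError("sequence and quality string differ in length")
    for start in range(len(scores) - window + 1):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_mean_q:
            return seq[:start], quality_line[:start]
    return seq, quality_line


def filter_reads(reads, min_len=8, window=4, min_mean_q=20):
    """Trim every (sequence, quality) pair and keep only reads that
    are still at least min_len bases long after trimming."""
    kept = []
    for seq, qual in reads:
        trimmed = sliding_window_trim(seq, qual, window, min_mean_q)
        if len(trimmed[0]) >= min_len:
            kept.append(trimmed)
    return kept


# Hypothetical toy reads: 'I' encodes Phred 40, '#' encodes Phred 2.
reads = [
    ("ACGTACGTACGT", "IIIIIIIIIIII"),  # high quality throughout
    ("ACGTACGTACGT", "IIII####IIII"),  # low-quality stretch in the middle
    ("ACGTACGTACGT", "############"),  # low quality throughout
]
kept = filter_reads(reads)
lost_fraction = 1 - len(kept) / len(reads)
print(f"kept {len(kept)} of {len(reads)} reads, lost {lost_fraction:.0%}")
```

In this toy input a short low-quality stretch in the middle of a read is enough to truncate it below the length cutoff, so the whole read is discarded. This is the kind of data loss the abstract refers to.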