A multitude of metabarcoding data analysis tools and pipelines have also been developed. Often, several workflows are designed to process the same amplicon sequencing data, which makes it difficult to choose among the many existing pipelines. However, each pipeline has its own philosophy, strengths and limitations, which should be weighed against the aims of the study and the bioinformatics expertise of the user. In this review, we outline the input data requirements, supported operating systems and particular attributes of thirty-two amplicon processing pipelines, with the goal of helping users select a pipeline for their metabarcoding projects.
Based on the established MGnify resource, we developed metaGOflow. metaGOflow supports the fast inference of taxonomic profiles from GO-derived data based on ribosomal RNA genes, as well as their functional annotation using the raw reads. Thanks to Research Object Crate packaging, relevant metadata about the sample under study, and the details of the bioinformatics analysis it has been subjected to, are inherited by the data product, while the modular implementation allows the workflow to be run partially. The analysis of 2 EMO BON samples and 1 Tara Oceans sample was performed as a use case.
metaGOflow is an efficient and robust workflow that scales to the needs of projects producing big metagenomic data, such as EMO BON. It highlights how containerization technologies, together with modern workflow languages and metadata packaging approaches, can support researchers dealing with ever-increasing volumes of biological data. Although initially designed to address the needs of EMO BON, metaGOflow is a flexible and easy-to-use workflow that can be broadly used for one-sample-at-a-time analysis of shotgun metagenomics data.
Constraint-based approaches have been widely used for the analysis of metabolic models and have led to intriguing geometry-oriented challenges. In this setting, sampling points uniformly from polytopes derived from metabolic models (flux sampling) provides a representation of the solution space of the model under various conditions. However, the polytopes that result from such models are of high dimension (in the order of thousands) and usually considerably skinny. Sampling uniformly at random from such polytopes therefore calls for a novel algorithmic and computational framework specially tailored to the properties of metabolic models. We present a Multiphase Monte Carlo Sampling (MMCS) algorithm that unifies rounding and sampling in one pass, yielding both upon termination. It exploits an optimized variant of the Billiard Walk that has lower arithmetic complexity per step than the original. Sampling on the most complicated human metabolic network accessible today, Recon3D, corresponding to a polytope of dimension 5335, took less than 30 hours.
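To make the sampling idea concrete, the following is a minimal sketch of a plain Billiard Walk on a polytope given as {x : Ax ≤ b}. It is not the optimized variant described above and it omits the rounding phases of MMCS. The function names, the step parameter `tau`, the reflection cap `max_reflections` and the unit-square example are all assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def billiard_walk_step(A, b, x, tau, max_reflections, rng):
    """One Billiard Walk step inside {y : A y <= b}, from interior point x."""
    # Uniform random direction on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(len(x))]
    norm = math.sqrt(dot(v, v))
    v = [vi / norm for vi in v]
    # Trajectory length L = -tau * ln(eta), with eta uniform in (0, 1].
    remaining = -tau * math.log(1.0 - rng.random())
    p = list(x)
    for _ in range(max_reflections + 1):
        # Distance along v to the nearest facet the ray is heading towards.
        t_hit, hit = math.inf, None
        for i, (row, bi) in enumerate(zip(A, b)):
            av = dot(row, v)
            if av > 1e-12:
                t = (bi - dot(row, p)) / av
                if t < t_hit:
                    t_hit, hit = t, i
        if remaining <= t_hit:
            # The trajectory ends before reaching the boundary.
            return [pi + remaining * vi for pi, vi in zip(p, v)]
        # Stop just short of the facet, then reflect the direction on it.
        step = max(t_hit, 0.0) * (1.0 - 1e-9)
        p = [pi + step * vi for pi, vi in zip(p, v)]
        remaining -= t_hit
        a = A[hit]
        scale = 2.0 * dot(a, v) / dot(a, a)
        v = [vi - scale * ai for vi, ai in zip(v, a)]
    # Too many reflections: reject the move and stay at x.
    return list(x)


def sample_polytope(A, b, start, n_samples, tau=1.0, max_reflections=100, seed=0):
    """Run the walk from an interior start point and collect every visited point."""
    rng = random.Random(seed)
    x, samples = list(start), []
    for _ in range(n_samples):
        x = billiard_walk_step(A, b, x, tau, max_reflections, rng)
        samples.append(x)
    return samples


# Hypothetical toy polytope: the unit square 0 <= x, y <= 1.
SQUARE_A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
SQUARE_B = [1.0, 0.0, 1.0, 0.0]
points = sample_polytope(SQUARE_A, SQUARE_B, [0.5, 0.5], 1000)
```

On a skinny polytope a walk like this needs many reflections per step and mixes slowly, which is why rounding matters for the polytopes described above.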
The present study combines 16S rRNA amplicon sequencing and shotgun metagenomics on a hypersaline marsh in Tristomo bay (Karpathos, Greece). Samples were collected in July 2018 and November 2019 from microbial mats, deeper sediment, aggregates observed in the water overlying the sediment, as well as sediment samples with no apparent layering. Co-assembly and binning of the metagenomic samples revealed 250 bacterial and 39 archaeal metagenome-assembled genomes (MAGs), with completeness estimates higher than 70% and contamination less than 5%. All MAGs had KEGG Orthology terms related to osmoadaptation, with those of the ‘salt in’ strategy being prominent. Halobacteria and Bacteroidetes were the most abundant taxa in the mats. Photosynthesis was most likely performed by purple sulphur and nonsulphur bacteria. All samples had the capacity for sulphate reduction, dissimilatory arsenic reduction, and conversion of pyruvate to oxaloacetate.
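The quality thresholds quoted above (completeness above 70%, contamination below 5%) amount to a simple filter over the bins. The sketch below illustrates that filter only. The `Bin` record, the bin names and the numbers are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    name: str
    completeness: float   # percent
    contamination: float  # percent


def passes_quality(bin_, min_completeness=70.0, max_contamination=5.0):
    """Keep bins strictly above the completeness and strictly below the contamination threshold."""
    return (bin_.completeness > min_completeness
            and bin_.contamination < max_contamination)


# Hypothetical bins, for illustration only.
bins = [
    Bin("bin_001", 92.4, 1.1),
    Bin("bin_002", 65.0, 0.8),  # fails: completeness too low
    Bin("bin_003", 88.0, 7.5),  # fails: contamination too high
]
kept = [b.name for b in bins if passes_quality(b)]
```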
This work focuses on Information Extraction (IE) from the perspective of marine historical biodiversity data. It orchestrates IE tools and provides curators with a unified view of the methodology; as a result, documentation of the strengths, limitations and dependencies of several tools was drafted. Additionally, the classification of tools into Graphical User Interface (web and standalone) applications and Command Line Interface ones enables data curators to select the most suitable tool for their needs, according to their specific features.
The high volume of already digitised marine documents that await curation is amassed, and a demonstration of the methodology with a new scalable, extendable and containerised tool, DECO (bioDivErsity data Curation programming wOrkflow), is presented. DECO’s usage will provide a solid basis for future curation initiatives and increase the reliability of high-value data products that connect the past and the present in marine biodiversity research.