Arsenal


This is my page of favorite resources.



Metabolic modeling

Have a look at this e-book I’ve been working on.

Genomics and Metagenomics

| Tool | Description | Architecture | Repo | DOI |
|---|---|---|---|---|
| LocalHGT | ultrafast horizontal gene transfer detection from large microbial communities | stand-alone | GitHub | OA |

Databases

| Resource | Description | Link | DOI |
|---|---|---|---|
| tree | asd | ads | asd |

Statistics for microbiome analysis

My dictionary

In the following table, I define some terms I often read and write about.

| Term | Description |
|---|---|
| richness | number of different taxa in a community |
| evenness | how evenly individuals are distributed among the taxa of a community, i.e. how common or rare each species is relative to the others |
| mucin | a family of high-molecular-weight, heavily glycosylated proteins (glycoconjugates) produced by epithelial tissues in most animals |
| effective number | the number of equally abundant species needed to obtain the same mean proportional species abundance as that observed in the dataset of interest (where all species may not be equally abundant) |
| copiotrophs | taxa living in nutrient-rich environments |
| succession | changes in the presence, relative abundance or absolute abundance of one or more organisms within a microbial community. Its processes can be deterministic or stochastic. Factors that drive deterministic succession fall into three categories: abiotic factors (pH, redox potential), environmental factors (cross-feeding, diet or travel) and biological factors (innate and adaptive immunity). Stochastic succession is defined as microbial community changes that are not the consequence of environmentally determined fitness (ecological drift). Whether microbial succession is more deterministic or stochastic depends on several factors during the formation of the community, including birth mode, travel, diet (for example, human breast milk) and antibiotics. |
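The richness, evenness and effective-number entries can be made concrete with a minimal sketch that computes all three from a vector of taxon counts. It assumes Pielou's index \(J = H'/\ln S\) as the evenness measure (one common choice among several) and Shannon entropy as the basis of the effective number; `summarize_community` is a hypothetical helper, not part of any library.

```python
import math


def summarize_community(counts):
    """Return (richness, Shannon H', Pielou evenness J, effective number).

    counts: iterable of non-negative taxon counts, one entry per taxon.
    Evenness uses Pielou's J = H' / ln(S), an assumed choice of index.
    """
    present = [c for c in counts if c > 0]           # taxa actually observed
    richness = len(present)                           # number of different taxa
    total = sum(present)
    proportions = [c / total for c in present]        # relative abundances p_i
    shannon = -sum(p * math.log(p) for p in proportions)
    # Pielou's evenness: H' divided by its maximum, ln(S); undefined for S < 2
    evenness = shannon / math.log(richness) if richness > 1 else float("nan")
    effective_number = math.exp(shannon)              # equally abundant taxa giving the same H'
    return richness, shannon, evenness, effective_number


even = summarize_community([25, 25, 25, 25, 0])
skewed = summarize_community([97, 1, 1, 1])
print(even)
print(skewed)
```

For the perfectly even community the evenness is 1 and the effective number equals the richness (4). The skewed community has the same richness, but its effective number drops to roughly 1.18, because a single taxon dominates.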

Visualizations

| Tool | Description | Architecture | Repo | Documentation | DOI |
|---|---|---|---|---|---|
| clinker | pipeline for easily generating publication-quality gene cluster comparison figures | Python package | GitHub | wiki page | OA |

Microbiome basics


Microbiome analysis has many aspects, yet it always comes down to some basic concepts! Here, I will try to cover a thing or two and gather some links that describe them more thoroughly. To save some (of my) time, I will copy-paste a lot, always referencing the original resource.



Microbiome data idiosyncrasy

Although microbiome data has some of the attributes of compositional data, it is not perfectly compositional. Classic compositional data vectors represent portions of a whole. The total sum of the components is not meaningful, and only the relative difference between components matters [36]. For truly compositional data, the vectors (2, 1) and (2000, 1000) represent the same information: only that the first and second components are present in the ratio 2 : 1. For microbiome data, the size of the counts also contains information about the reliability of the ratio. Larger counts are more likely to closely match the true ratio in the sample [44].
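To make the (2, 1) versus (2000, 1000) example concrete, the sketch below compares the two vectors. Both give identical proportions, but, assuming each read is an independent draw (simple binomial sampling, a simplification not stated in the text above), the standard error of each estimated proportion shrinks as the total count grows. `proportion_with_se` is a hypothetical helper.

```python
import math


def proportion_with_se(counts):
    """Relative abundance and binomial standard error for each component.

    Assumes binomial sampling of reads, which simplifies real sequencing data.
    """
    total = sum(counts)
    result = []
    for c in counts:
        p = c / total
        se = math.sqrt(p * (1 - p) / total)  # shrinks as the total count grows
        result.append((p, se))
    return result


small = proportion_with_se([2, 1])
large = proportion_with_se([2000, 1000])
for (p_s, se_s), (p_l, se_l) in zip(small, large):
    print(f"p={p_s:.3f}  SE(small)={se_s:.3f}  SE(large)={se_l:.4f}")
```

A purely compositional view keeps only the proportions (identical here); the counts add the information that the second ratio is about \(\sqrt{1000} \approx 31.6\) times more precise under this sampling model.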

Covariate Adjustment

Regression analysis, potentially with penalization for variable selection, has been used to analyze an outcome of interest modeled as a function of microbiome features.

[Figure: covariate challenge]
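As an illustration of regression with penalization for variable selection, the sketch below fits a lasso (one common penalty) by cyclic coordinate descent in NumPy. It assumes the microbiome features have already been transformed (for example, with a log-ratio transform) and roughly standardized, and it leaves a hypothetical covariate, `age`, unpenalized so that it is always adjusted for. The data are simulated, and `lasso_cd` is a hypothetical helper rather than a library function.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the lasso proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_cd(X, y, lam, unpenalized=(), n_iter=200):
    """Lasso via cyclic coordinate descent.

    Minimizes (1/2n)||y - Xb||^2 + lam * sum_{j not in unpenalized} |b_j|.
    Columns listed in `unpenalized` (e.g. covariates) are never shrunk.
    X and y are centered internally, so no intercept is fitted.
    """
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * beta[j]      # remove column j's current contribution
            rho = X[:, j] @ residual / n
            if j in unpenalized:
                beta[j] = rho / col_sq[j]     # plain least-squares update
            else:
                beta[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * beta[j]      # add the updated contribution back
    return beta


rng = np.random.default_rng(0)
n = 200
age = rng.normal(size=n)                       # hypothetical covariate to adjust for
features = rng.normal(size=(n, 10))            # hypothetical transformed features
y = 2.0 * age + 1.5 * features[:, 0] + rng.normal(scale=0.5, size=n)
X = np.column_stack([age, features])           # column 0 = covariate, 1..10 = features
beta = lasso_cd(X, y, lam=0.2, unpenalized=(0,))
print(np.round(beta, 2))
```

Only the truly associated feature keeps a non-zero (slightly shrunk) coefficient, while the covariate is estimated without shrinkage. Dedicated microbiome packages handle compositionality and sparsity far more carefully than this sketch.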

Microbiome diversity

Diversity within a community

α-diversity

https://docs.onecodex.com/en/articles/4136553-alpha-diversity

Shannon entropy (\(H'\)): \[H' = -\sum_{i=1}^{S} p_i \ln(p_i)\]

where:

  • \(p_i\) is the proportional abundance of species \(i\),
  • \(S\) is the total number of species.

The effective number of species based on Shannon entropy is: \[\text{Effective Number of Species} = e^{H'}\]
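A short worked example of why \(e^{H'}\) reads as a number of species: for \(S\) equally abundant species, \(p_i = 1/S\), so

\[H' = -\sum_{i=1}^{S} \frac{1}{S} \ln\left(\frac{1}{S}\right) = \ln S \quad\Rightarrow\quad e^{H'} = S\]

For an uneven community with \(S = 3\) and proportions \((0.5, 0.25, 0.25)\):

\[H' = -\left(0.5 \ln 0.5 + 2 \cdot 0.25 \ln 0.25\right) = 1.5 \ln 2 \approx 1.04, \qquad e^{H'} = 2^{1.5} \approx 2.83\]

So this community is as diverse as about 2.83 equally abundant species, fewer than its 3 observed species.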


Diversity between communities

β-diversity

Feature Identification

This section paraphrases and summarizes the Liu et al. (2021) chapter in the book Statistical Analysis of Microbiome Data.

Questions to be addressed:

  • Which microbiome features are impacted by treatments or environmental conditions? \(\rightarrow\) identify features whose abundances change across treatments or conditions \(\rightarrow\) differential abundance analysis
  • Which microbiome features mediate treatment effects on an outcome? \(\rightarrow\) identify taxa that are affected by the treatment and whose change, in turn, influences the outcome \(\rightarrow\) mediation analysis
  • Which microbiome features have an effect on an outcome, after adjusting for confounders? \(\rightarrow\) identify microbiome features with an effect on an outcome when there are no particular treatments of interest, but potentially complex confounding arising from relationships between microbes, host, and environment \(\rightarrow\) Feature Identification Adjusting for Confounding

Differential Abundance Analysis

The null hypothesis of a differential abundance test is that treatments do not affect the mean abundance level.

Differential abundance analysis (solid lines) selects microbiome features whose abundance levels change across treatments or conditions. It only examines the relationship between treatments/conditions and microbiome features, not relationships involving other outcomes.

[Figure: differential abundance analysis]
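The null hypothesis above can be illustrated with a minimal per-feature test. This sketch assumes two groups, uses a permutation test on the difference in mean relative abundance (under the null, group labels are exchangeable), and then applies the Benjamini-Hochberg correction across features. It is a deliberately simple stand-in, not one of the dedicated DA methods discussed by Liu et al.; all function names and the simulated data are hypothetical.

```python
import numpy as np


def permutation_pvalue(a, b, n_perm, rng):
    """Two-sided permutation p-value for a difference in group means."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # relabel samples at random
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        hits += int(diff >= observed)
    return (hits + 1) / (n_perm + 1)        # add-one correction avoids p = 0


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    p = np.asarray(pvalues)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, moving from the largest p-value downwards
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def differential_abundance(counts_a, counts_b, n_perm=999, seed=0):
    """Test every feature (column) for a shift in mean relative abundance."""
    rng = np.random.default_rng(seed)
    rel_a = counts_a / counts_a.sum(axis=1, keepdims=True)  # per-sample proportions
    rel_b = counts_b / counts_b.sum(axis=1, keepdims=True)
    pvals = [permutation_pvalue(rel_a[:, j], rel_b[:, j], n_perm, rng)
             for j in range(rel_a.shape[1])]
    return np.array(pvals), benjamini_hochberg(pvals)


rng = np.random.default_rng(1)
# hypothetical data: features 0 and 1 shift in opposite directions, 2 and 3 do not
control = rng.poisson(lam=[60, 60, 40, 40], size=(10, 4))
treated = rng.poisson(lam=[100, 20, 40, 40], size=(10, 4))
raw, adjusted = differential_abundance(control, treated)
print(np.round(adjusted, 3))
```

The simulated shifts were chosen to keep the expected total per sample constant. Had only feature 0 increased, the relative abundance of every other feature would drop as well, which is exactly the compositional issue described in the "Microbiome data idiosyncrasy" section.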

Mediation analysis

Main challenges: the high dimensionality and sparsity of microbiome data.

In a treatment-microbiome-outcome study, you should not rely on differential abundance analysis, as the relationships you are looking for are out of scope for DA!

Mediation analysis examines the indirect effects of treatment on the outcome through the microbiome. To determine whether a feature has a mediation effect, a method must consider both the effect of the treatment on the feature and the effect of the feature on the outcome.

[Figure: mediation analysis]
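For a single mediator, the two ingredients named above (the effect of the treatment on the feature and the effect of the feature on the outcome) can be combined with the classic product-of-coefficients approach: estimate \(a\) from a regression of the feature on the treatment, estimate \(b\) from a regression of the outcome on both treatment and feature, and take \(a \cdot b\) as the indirect effect. The sketch below uses ordinary least squares on simulated data; it deliberately ignores the high dimensionality and sparsity that dedicated microbiome mediation methods must handle, and the helper names are hypothetical.

```python
import numpy as np


def ols(design, response):
    """Least-squares coefficients for response ~ design (design includes an intercept)."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


def indirect_effect(treatment, mediator, outcome):
    """Product-of-coefficients estimate of the mediated effect for one feature."""
    ones = np.ones_like(treatment, dtype=float)
    # a: effect of the treatment on the feature (mediator ~ treatment)
    a = ols(np.column_stack([ones, treatment]), mediator)[1]
    # b: effect of the feature on the outcome, adjusting for the treatment
    coefs = ols(np.column_stack([ones, treatment, mediator]), outcome)
    direct, b = coefs[1], coefs[2]
    return a * b, direct


rng = np.random.default_rng(42)
n = 500
treatment = rng.integers(0, 2, size=n).astype(float)         # 0 = control, 1 = treated
mediator = 0.8 * treatment + rng.normal(scale=0.5, size=n)    # e.g. a transformed taxon abundance
outcome = 1.5 * mediator + 0.3 * treatment + rng.normal(scale=0.5, size=n)
indirect, direct = indirect_effect(treatment, mediator, outcome)
print(f"indirect = {indirect:.2f} (simulated truth 1.2), direct = {direct:.2f} (simulated truth 0.3)")
```

Looking at either regression alone would miss the point: a feature with a large \(a\) but \(b = 0\) is differentially abundant yet does not mediate anything.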

Feature Identification Adjusting for Confounding


Microbial phenomena that can drive you nuts!

A strain dips to very low abundance in a microbial community and then recovers to thrive

This is quite common and reflects the complex interplay of ecological, evolutionary, and environmental factors. On this page, you can find some of the ways a strain might survive these low-abundance periods without going completely extinct.

Literature

Datta, Somnath, and Subharup Guha, eds. Statistical Analysis of Microbiome Data. Springer International Publishing, 2021.

A visit at ETH


Last month, I had the chance to visit ETH and the Institute of Food, Nutrition and Health.

I was lucky to see the awesome bioreactors they have set up over the years in the Laboratory of Food Biotechnology, but most importantly, a group of people who struggle to keep doing what they love.

I was able to see how the data are produced, and also to spread the word about microbetag, our co-occurrence network annotator.

Two weeks were certainly not enough, so I hope we will meet again, even somewhere not as beautiful as snowy Zurich. Many thanks to Dr. Annelies Geirnaert and Professor for this opportunity, and to all the lab members for making my stay so pleasant. Of course, special thanks to Dr. Andi Erega for all the explanations and the patience with me, as I knew only the basics of lab work, but mostly for the burek!

See you soon I hope! :)

A pile of pipelines: An overview of the bioinformatics software for metabarcoding data analyses


A multitude of metabarcoding data analysis tools and pipelines have also been developed. Often, several workflows are designed to process the same amplicon sequencing data, making it somewhat puzzling to choose one among the plethora of existing pipelines. However, each pipeline has its own specific philosophy, strengths and limitations, which should be considered depending on the aims of any specific study, as well as the bioinformatics expertise of the user. In this review, we outline the input data requirements, supported operating systems and particular attributes of 32 amplicon processing pipelines with the goal of helping users to select a pipeline for their metabarcoding projects.

metaGOflow: a workflow for the analysis of marine Genomic Observatories shotgun metagenomics data


Based on the established MGnify resource, we developed metaGOflow. metaGOflow supports the fast inference of taxonomic profiles from GO-derived data based on ribosomal RNA genes and their functional annotation using the raw reads. Thanks to the Research Object Crate packaging, relevant metadata about the sample under study, and the details of the bioinformatics analysis it has been subjected to, are inherited by the data product, while its modular implementation allows running the workflow partially. The analysis of 2 EMO BON samples and 1 Tara Oceans sample was performed as a use case.

metaGOflow is an efficient and robust workflow that scales to the needs of projects producing big metagenomic data such as EMO BON. It highlights how containerization technologies along with modern workflow languages and metadata package approaches can support the needs of researchers when dealing with ever-increasing volumes of biological data. Despite being initially oriented to address the needs of EMO BON, metaGOflow is a flexible and easy-to-use workflow that can be broadly used for one-sample-at-a-time analysis of shotgun metagenomics data.

Read more about the metaGOflow software



© 2022. All rights reserved.