PEMA investigating metabarcoding

PEMA output

In the parameters.tsv file, you are asked to give your analysis a name. PEMA uses this name to create the directory in which all of its output is placed.

PEMA always returns 7 subdirectories, regardless of the parameters you choose. If a phyloseq analysis has been requested, PEMA builds an extra subdirectory for it.
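A quick way to confirm that a finished run produced this layout is to check for the expected subdirectories. The Python sketch below is not part of PEMA; it only uses the folder names given in this section (plus phyloseq_output when a phyloseq analysis was requested), and the function name missing_subdirs is hypothetical.

```python
from pathlib import Path

# Subdirectory names as listed in this section of the documentation.
EXPECTED_SUBDIRS = [
    "1.quality_control",
    "2.trimmomatic_output",
    "3.correct_by_BayesHammer",
    "4.merged_by_SPAdes",
    "5.dereplicate_by_obiuniq",
    "6.linearized_files",
    "7.gene_dependent",
]
# Extra subdirectory created only when a phyloseq analysis was requested.
PHYLOSEQ_SUBDIR = "phyloseq_output"


def missing_subdirs(output_dir, phyloseq_requested=False):
    """Return the expected subdirectory names that are absent from output_dir."""
    root = Path(output_dir)
    expected = list(EXPECTED_SUBDIRS)
    if phyloseq_requested:
        expected.append(PHYLOSEQ_SUBDIR)
    return [name for name in expected if not (root / name).is_dir()]
```

For example, missing_subdirs("my_analysis", phyloseq_requested=True) returns an empty list when every expected folder exists under my_analysis, and otherwise lists the ones that are missing.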

Here is a short description of the output files PEMA returns.

Pre-processing steps output

The folders 1.quality_control, 2.trimmomatic_output, 3.correct_by_BayesHammer, 4.merged_by_SPAdes, 5.dereplicate_by_obiuniq and 6.linearized_files hold the output of each tool used in the pre-processing steps.

The first folder contains the sequence quality control results.

The second one contains your trimmed sequences, and the third one the corrected sequences.

In the fourth subdirectory, you will see that you now have only one file per sample, because the reads of each sample have been merged.

The fifth one holds the dereplicated sequences. Finally, the sixth one contains only the sequences that remained after quality control and the pre-processing steps.
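Dereplication collapses identical sequences into a single entry while keeping track of how many times each one occurred. The minimal Python sketch below illustrates only that idea; it is not how obiuniq is implemented, it ignores any header annotations a real tool would carry over, and the function name dereplicate is hypothetical.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical sequences into (sequence, count) pairs.

    Pairs are ordered by decreasing count; sequences with equal counts
    keep the order in which they were first seen.
    """
    counts = Counter(sequences)
    return counts.most_common()
```

For example, dereplicate(["ACGT", "ACGA", "ACGT"]) returns [("ACGT", 2), ("ACGA", 1)].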

These last .fasta files are combined into a single .fasta file, final_all_samples.fasta, which is used from this point onwards for the clustering and taxonomy assignment steps.
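The step that produces final_all_samples.fasta can be pictured as reading each linearized per-sample file and writing every record into one output file, with each sequence on a single line. The Python sketch below assumes simple concatenation of the files ending in .fasta, taken in alphabetical order; PEMA's actual implementation may differ, and the function names read_fasta and combine_fasta are hypothetical.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs, joining multi-line sequences into one string."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line, []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def combine_fasta(input_dir, output_path):
    """Write all records from input_dir/*.fasta into output_path, one line per sequence.

    Returns the number of records written. Keep output_path outside input_dir
    so the combined file is not picked up as an input on a later run.
    """
    inputs = sorted(Path(input_dir).glob("*.fasta"))  # list inputs before creating the output
    written = 0
    with open(output_path, "w") as out:
        for fasta in inputs:
            for header, seq in read_fasta(fasta):
                out.write(f"{header}\n{seq}\n")
                written += 1
    return written
```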

All of these can be considered intermediate files, and the same set is produced regardless of your marker gene.

7.gene_dependent subdirectory

This subdirectory holds all output from the clustering and taxonomy assignment steps.

Its content differs depending on the user's parameters.

It contains a number of files related to OTU clustering or ASV inference.

The most important file, though, is final_table.tsv, which holds the taxonomy assignment of your OTUs/ASVs.
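If you want to inspect final_table.tsv programmatically, Python's standard csv module can read it as tab-separated text. This sketch assumes only that the first line is a header row; the exact columns depend on your parameters, so none are hard-coded, and load_final_table is a hypothetical helper.

```python
import csv


def load_final_table(path):
    """Return (column_names, rows) for a tab-separated file whose first line is a header.

    Each row is a dict mapping a column name to the cell text of that row.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        return list(reader.fieldnames or []), rows
```

Pass it the path to final_table.tsv inside your 7.gene_dependent output; the returned column names let you see which fields your run produced before filtering the rows.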

phyloseq_output subdirectory

If the user has requested a phyloseq analysis, an extra subdirectory with this name is created, containing all the related files.

Finally, a copy of the parameters.tsv file used for your analysis is placed in the main output directory.