PEMA investigating metabarcoding

Running on HPC

PEMA has been developed in a High Performance Computing (HPC)-oriented way.

That is because it is common to have large datasets of dozens, or even hundreds, of samples to analyze.

Many of the tools and algorithms encapsulated in PEMA require substantial computational power, memory (RAM), or sometimes both. Thus, PEMA was originally developed for HPC systems.

The best way to run PEMA is on an HPC system using the Singularity container.

Assuming you have access to an HPC system, you need to check whether it supports Singularity in a version later than v3.0.

To do so, you may run

singularity --version
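
The check above can also be scripted. The following is a minimal sketch, not part of PEMA: the function name `singularity_version_ok` is hypothetical, and it assumes the version output ends with a dotted version number (for example `singularity version 3.5.2`). It treats any major version of 3 or higher as acceptable, so adjust the threshold if your site needs something stricter.

```shell
#!/bin/sh
# Hypothetical helper (not part of PEMA): succeed if the reported
# Singularity major version is 3 or higher.
# Assumes output shaped like "singularity version 3.5.2".
singularity_version_ok() {
    # Take the last whitespace-separated field, e.g. "3.5.2" or "v3.5.2"
    version=$(printf '%s\n' "$1" | awk '{print $NF}')
    version=${version#v}      # drop a leading "v", if any
    major=${version%%.*}      # keep the part before the first dot
    case "$major" in
        ''|*[!0-9]*) return 2 ;;   # version could not be parsed
    esac
    [ "$major" -ge 3 ]
}

if command -v singularity >/dev/null 2>&1; then
    if singularity_version_ok "$(singularity --version)"; then
        echo "Singularity version looks recent enough"
    else
        echo "Singularity is too old or its version could not be parsed"
    fi
else
    echo "Singularity not found on this system"
fi
```

The function only inspects the string it is given, so it can be tried with sample strings on a machine where Singularity is not installed.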

If Singularity is not available, you need to ask your system administrator to install it. Singularity is itself HPC-oriented, so adding it should not cause problems for the HPC system.

In case you need to install Singularity on your own, you may follow the steps described on the official Singularity site.

Once Singularity is available, run the following command to download the PEMA container to your account:

singularity pull shub://hariszaf/pema:<tag>

This will take a few moments, as the container is approximately 3 GB.

If you leave the :<tag> part empty, then you will get the latest PEMA version.

The latest version changes from time to time; it is always the most recent version of PEMA, with the newest features included.

You may have a look at the PEMA versioning section for more details on that.
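
As an illustration of the optional tag, here is a small sketch that builds the container URI with or without one. The helper name `pema_uri` and the tag `some-tag` are hypothetical placeholders; the actual tag names are those listed in the PEMA versioning section.

```shell
#!/bin/sh
# Hypothetical helper: print the URI of the PEMA container.
# With no argument the tag is omitted, which requests the latest version.
pema_uri() {
    if [ -n "${1:-}" ]; then
        printf 'shub://hariszaf/pema:%s\n' "$1"
    else
        printf 'shub://hariszaf/pema\n'
    fi
}

# Show the commands that would be run, without downloading anything.
# "some-tag" is a placeholder, not a real PEMA tag.
echo "singularity pull $(pema_uri)"
echo "singularity pull $(pema_uri some-tag)"
```

Pinning a specific tag in scripts makes an analysis easier to reproduce, since the untagged URI resolves to whatever the latest version is at the time of the pull.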

Once the container has been downloaded completely, you are ready to go!

Now, follow the instructions on how to run PEMA.

Assuming your HPC system uses SLURM as its job scheduler, here is an example sbatch script for running PEMA.


#!/bin/bash
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --nodelist=
#SBATCH --ntasks-per-node=20
#SBATCH --mem=
#SBATCH --job-name="testPema"
#SBATCH --output=test.output
#SBATCH --mail-type=ALL
#SBATCH --requeue

singularity run \
-B /<path_to_your>/analysis_directory/:/mnt/analysis \