phap - Phage Host Analysis Pipeline

A snakemake workflow that wraps various phage-host prediction tools.

  • Uses Singularity containers to run all tools. When possible (i.e. when the image is no larger than a few GB), tools and their dependencies are bundled in the same container, so you do not need to download models or other external databases separately.
  • Calculates the last common ancestor (LCA) of the predictions from all tools, per contig.

Current tools

Tool (source)      | Publication/Preprint           | Comments
HTP                | Gałan W. et al., 2019          | ok
RaFAh              | Coutinho F. H. et al., 2020    | ok
vHuLK              | Amgarten D. et al., 2020       | ok
VirHostMatcher-Net | Wang W. et al., 2020           | ok
WIsH               | Galiez G. et al., 2017         | ok (unnecessary?)

Installation

Dependencies

To run the workflow you will need:

  • snakemake > 5.x (developed with 5.30.1)
  • singularity >= 3.6 (developed with 3.6.3)

The following Python packages must also be installed and available in the execution environment:

  • biopython >= 1.78 (developed with 1.78)
  • ete3 >= 3.1.2 (developed with 3.1.2)
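
As a quick sanity check, the minimum versions listed above can be verified from inside the execution environment. The following is a minimal sketch, not part of the workflow itself: the helper names `parse_version` and `check_requirements` are hypothetical, and the version parsing is deliberately simple (it only compares the leading numeric components).

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above.
REQUIRED = {"biopython": "1.78", "ete3": "3.1.2"}


def parse_version(text):
    """Turn a version string such as '3.1.2' into a tuple of ints, (3, 1, 2).

    Parsing stops at the first component that does not start with a digit.
    """
    parts = []
    for chunk in text.split("."):
        match = re.match(r"\d+", chunk)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_requirements(required=REQUIRED):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            report[name] = (None, False)
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "ok" if ok else "MISSING or too old"
        print(f"{name}: installed={installed} -> {status}")
```

Running the script inside the activated environment prints one line per package; a package that is missing or older than the listed minimum is flagged.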

The ete3.NCBITaxa class is used to get taxonomy information and, when possible, to calculate the LCA of all predictions. This requires a taxa.sqlite file to be available either in its default location ( ~/.ete3toolkit/taxa.sqlite ) or at a path provided in the config. For more details, see http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html
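
To illustrate the idea behind the LCA step, here is a minimal sketch that finds the deepest taxon shared by several lineages. It is not necessarily how the workflow implements it: the function name `lowest_common_ancestor` is hypothetical, and the two lineages are hand-written, abbreviated lists of NCBI taxids used only for illustration. In practice, a root-to-leaf lineage for a taxid can be obtained with `NCBITaxa.get_lineage`, which needs the taxa.sqlite database described above.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxid shared by all lineages, or None.

    Each lineage is a list of taxids ordered from the root down to the
    predicted taxon.
    """
    if not lineages:
        return None
    lca = None
    # Walk all lineages in parallel, one level at a time.
    for level in zip(*lineages):
        if all(taxid == level[0] for taxid in level):
            lca = level[0]  # still on the shared path
        else:
            break  # lineages diverge here
    return lca


# Illustrative, abbreviated lineages (assumption: hand-written, not queried
# from taxa.sqlite). Real lineages may contain additional intermediate nodes.
ECOLI = [1, 131567, 2, 1224, 1236, 91347, 543, 561, 562]
SALMONELLA = [1, 131567, 2, 1224, 1236, 91347, 543, 590, 28901]

if __name__ == "__main__":
    # Two tools predicting different genera within the same family.
    print(lowest_common_ancestor([ECOLI, SALMONELLA]))
```

With these two example lineages the predictions diverge at the genus level, so the shared ancestor is the family-level taxid, 543.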

Conda environment

It is recommended to use a conda environment. The file environment.txt can be used to recreate the complete environment used during development.

The provided environment.txt contains an explicit list of all packages, produced with

conda list -n phap --explicit > environment.txt

This ensures that all packages have exactly the same versions and builds, which minimizes the risk of running into dependency issues.

To get a working environment:

# Clone this repo and get in there
$ git clone https://git.science.uu.nl/papanikos/phap.git
$ cd phap

# Note the long notation --file flag; -f will not work.
$ conda create -n phap --file=environment.txt

# Activate it - use the name you gave above, if it is different
$ conda activate phap

# The (phap) prefix shows we have activated it
# Check the snakemake version
(phap) $ snakemake --version
5.30.1

Configuration

Input data