From a06d66f8171a7fc0ac8e00782c88ed5df50dca82 Mon Sep 17 00:00:00 2001
From: nikos <n.pappas@uu.nl>
Date: Tue, 12 Jan 2021 11:19:04 +0100
Subject: [PATCH] fix tree output formatting

---
 README.md | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index 2c5dfad..e510c3a 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,8 @@ tou do not need to worry about getting models or any other external databases.
 
 |Tool (source) | Publication/Preprint |
 |:------|:------|
-[RaFAh](https://sourceforge.net/projects/rafah/)|[Coutinho F. H. et al. 2020](https://www.biorxiv.org/content/10.1101/2020.09.25.313155v1?rss=1)
+[HTP](https://github.com/wojciech-galan/viruses_classifier)|[Gałan W. et al., 2019](https://www.nature.com/articles/s41598-019-39847-2)
+[RaFAh](https://sourceforge.net/projects/rafah/)|[Coutinho F. H. et al., 2020](https://www.biorxiv.org/content/10.1101/2020.09.25.313155v1?rss=1)
 [vHuLK](https://github.com/LaboratorioBioinformatica/vHULK)|[Amgarten D. et al., 2020](https://www.biorxiv.org/content/10.1101/2020.12.06.413476v1)
 [VirHostMatcher-Net](https://github.com/WeiliWw/VirHostMatcher-Net)|[Wang W. et al., 2020](https://doi.org/10.1093/nargab/lqaa044])
 [WIsH](https://github.com/soedinglab/WIsH)|[Galiez G. et al., 2017](https://academic.oup.com/bioinformatics/article/33/19/3113/3964377)
@@ -36,7 +37,7 @@ The file `environment.txt` can be used to recreate the complete environment
 used during development.
 
 > The provided `environment.txt` contains an explicit list of all packages,
-> produced with `conda list -n hp --explicit > environment.txt` .
+> produced with `conda list -n phap --explicit > environment.txt` .
 > This ensures all packages are exactly the same versions/builds, so we 
 > minimize the risk of running into dependencies issues
 
@@ -158,8 +159,12 @@ For each sample, results for each tool are stored in directories named after
 the tool. An example looks like this:
 
 ```
-results/A/
+$ tree -L 2 results/A
+results/A
 ├── all_predictions.tsv
+├── htp
+│   ├── predictions.tsv
+│   └── raw.txt
 ├── rafah
 │   ├── A_CDS.faa
 │   ├── A_CDS.fna
@@ -183,47 +188,53 @@ results/A/
 │   └── results
 └── wish
     ├── llikelihood.matrix
-	├── prediction.list
-	└── predictions.tsv
+    ├── prediction.list
+    └── predictions.tsv
 ```
 
 ### Per sample 
 
 * `all_predictions.tsv`: Contains the best prediction per contig (rows) for 
-each tool along with its confidence/p-value/whatever single value each tool 
+each tool, along with the confidence score, p-value or other single value that tool 
 uses to evaluate its confidence in the prediction.
 
 An example for three genomes:
 
 ```
-contig	vhulk_pred	vhulk_score	rafah_pred	rafah_score	vhmnet_pred	vhmnet_score	wish_pred	wish_score
-NC_005964.2	None	4.068828	Mycoplasma	0.461	Mycoplasma fermentans	0.9953	Bacteria;Tenericutes;Mollicutes;Mycoplasmatales;Mycoplasmataceae;Mycoplasma;Mycoplasma fermentans;Mycoplasma fermentans MF-I2	-1.2085700000000001
-NC_015271.1	Escherichia_coli	1.0301523	Salmonella	0.495	Muricauda pacifica	0.9968	Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Raoultella;Raoultella sp. NCTC 9187;Raoultella sp. NCTC 9187	-1.3869200000000002
-NC_023719.1	Bacillus	0.0012575098	Bacillus	0.55	Clostridium sp. LS	1.0000	Bacteria;Firmicutes;Clostridia;Clostridiales;Clostridiaceae;Clostridium;Clostridium beijerinckii;Clostridium beijerinckii	-1.29454
+contig  htp_proba       vhulk_pred      vhulk_score     rafah_pred      rafah_score     vhmnet_pred     vhmnet_score    wish_pred       wish_score
+NC_005964.2     0.8464285626352002      None    4.068828        Mycoplasma      0.461   Mycoplasma fermentans   0.9953  Bacteria;Tenericutes;Mollicutes;Mycoplasmatales;Mycoplasmataceae;Mycoplasma;Mycoplasma fermentans;Mycoplasma fermentans MF-I2   -1.2085700000000001
+NC_015271.1     0.995161392517451       Escherichia_coli        1.0301523       Salmonella      0.495   Muricauda pacifica      0.9968  Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Raoultella;Raoultella sp. NCTC 9187;Raoultella sp. NCTC 9187       -1.3869200000000002
+NC_023719.1     0.9999957241187084      Bacillus        0.0012575098    Bacillus        0.55    Clostridium sp. LS      1.0000  Bacteria;Firmicutes;Clostridia;Clostridiales;Clostridiaceae;Clostridium;Clostridium beijerinckii;Clostridium beijerinckii       -1.29454
 ```
 
 * `tmp` directory
-  * Contains one fasta file per input genome, along with other intermediate 
-files necessary for a smooth execution of the workflow.
+  * Directory `genomes`: Contains one fasta file per input genome.
+  * File `reflist.txt`: An intermediate file that lists the paths to all produced 
+genome fasta files, kept to ensure smooth execution of the workflow.
 
 ### Per tool
 
+* `htp`
+  * File `raw.txt`: The raw output of `htp` per contig
+  * File `predictions.tsv`: A **two**-column tsv with the contig id and
+the probability of the virus being a phage.
+
 * `rafah`
-  * All files prefixed with `<sample_id>_` are the rafah's raw output
+  * Files prefixed with `<sample_id>_` are rafah's raw output
   * `predictions.tsv`: A selection of the 1st (`Contig`) , 6th 
 (`Predicted_Host`) and 7th (`Predicted_Host_Score`) columns from file 
 `<sample_id>_Seq_Info.tsv`
 
 * `vhulk`
-  * `results.csv`: Copy of the `results/sample/tmp/genomes/results/results.csv`
-  * `predictions.tsv`: A selection of the 1st (`BIN/genome`), 10th (`final_prediction`) 
+  * File `results.csv`: Copy of `results/sample/tmp/genomes/results/results.csv`
+  * File `predictions.tsv`: A selection of the 1st (`BIN/genome`), 10th (`final_prediction`) and 
 11th (`entropy`) columns from file `results.csv`.
 
 * `vhmnet`
   * Directories `feature_values` and `predictions` are the raw output
   * Directory `tmp` is a temporary dir written by `VirHostMatcher-Net` for 
 doing its magic.
-  * `predictions.tsv` contain contig, host taxonomy and scores.
+  * File `predictions.tsv` contains contig, host taxonomy and scores.
 
 * `wish`
   * Files `llikelihood.matrix` and `prediction.list` are the raw output
-- 
GitLab