Once `results/RF/best_model.pkl` has been written, you can save your changes and quit the server
(see the [Snakemake documentation on Jupyter notebook integration](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#jupyter-notebook-integration) for more information, or
[watch this demo](https://snakemake.readthedocs.io/en/stable/_images/snakemake-notebook-demo.gif)).
This will trigger the execution of the rest of the workflow.
The resulting notebook will be saved as `results/logs/processed_notebook.py.ipynb`.
Note that, depending on the changes you make, your results may differ from those of the default, non-interactive run.
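
If you want to confirm that the model file exists and can be read back before quitting the server, a minimal sketch along these lines may help. It assumes that `results/RF/best_model.pkl` was written with Python's standard `pickle` module and that the libraries used to train the model are installed in the active environment. The helper name `load_best_model` is hypothetical and is not part of the workflow.

```python
import pickle
from pathlib import Path


def load_best_model(path="results/RF/best_model.pkl"):
    """Return the object stored in the pickle file at `path`.

    Assumption: the file was written with the standard `pickle` module.
    Raises FileNotFoundError if the file has not been written yet.
    """
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(f"{model_path} has not been written yet")
    # Only unpickle files you trust: unpickling can execute arbitrary code.
    with model_path.open("rb") as handle:
        return pickle.load(handle)
```

Calling `load_best_model()` from the repository root and printing `type(model).__name__` on the returned object should be enough to see what kind of object was stored.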
### **Option 2.** Archived workflow from Zenodo (TO DO).
---
This option will follow the [Snakemake guidelines for sustainable and reproducible archiving](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#sustainable-and-reproducible-archiving).
## Output
---
The output of the whole workflow is written to the `results` directory. Its layout is shown below, with several directories
and files omitted for legibility. The most prominent entries are marked with an asterisk and a short description:
```
# Skipping several thousands of intermediate files with the -I option
$ tree -n -I '*NC*.fasta|*_genes.*|*.gff|*.log' results
results
├── annotations.tsv
├── filtered_scores.tsv -------------------- * Table containing feature values for all interactions passing filtering
├── final_training_set.tsv
├── interaction_datasets
│ ├── 01_filter_intact
│ ├── 02_summarize_intact
│ ├── 03_uniprot
│ ├── 04_process_uniprot
│ ├── 05_genomes
│ ├── 05_interaction_datasets
│ ├── 06_map_proteins_to_pvogs
│ ├── N1 --------------------------------
.... | * Features, interactions, proteins, and pvogs are stored per dataset
│ └── positives --------------------------
│ ├── positives.features.tsv
│ ├── positives.interactions.tsv
│ ├── positives.proteins.faa
│ └── positives.pvogs_interactions.tsv
├── logs
├── predictions.tsv ------------------------- * Final predictions made
├── pre_process
│ ├── all_genomes
│ ├── comparem --------------------------- * Directory with the final AAI matrix used
...
│ ├── fastani ---------------------------- * Directory with the final ANI matrix used
│ ├── hmmsearch -------------------------- * HMMER search results for all pvogs profiles against the translated genomes
│ ├── reflist.txt
│ └── transeq
│ └── transeq.genomes.fasta
├── RF
│ ├── best_model_id.txt ------------------- * Contains the id of the negative dataset
│ ├── best_model.pkl ---------------------- * The best model obtained.
│ ├── features_stats.tsv ------------------ * Mean, max, min, std for feature importances
│ ├── features.tsv ------------------------ * Exact values of feature importances for each combination of training/validation
│ ├── figures ----------------------------- * Figures used in the manuscript.
│ │ ├── Figure_1a.svg
....
....
│ ├── metrics.pkl
│ ├── metrics.stats.tsv ------------------- * Mean, max, min, std across all models
│ ├── metrics.tsv ------------------------- * Exact values of metrics for each combination of training/validation
│ └── models
│ ├── N10.RF.pkl ---------------------- * Best model obtained when optimizing with each negative set
.....
.....
└── scores.tsv ----------------------------- * Master table with feature values for all possible pVOGs combinations