Automatic Basic Preprocessing
automatic_script.py is a command-line driver that runs the standard
verkko-fillet preprocessing and QC pipeline end-to-end on a finished Verkko
assembly. It is intended for users who do not want to step through the
notebook tutorials interactively and just want the figures, stats, and
chromosome-assignment files generated in one go.
The script is shipped with the package at:
verkko-fillet/src/verkkofillet/bin/automatic_script.py
What it does
Given a Verkko output directory, a reference FASTA, and a chromosome name mapping file, the script:
1. Loads the Verkko assembly into a VerkkoFillet object (vf.pp.read_Verkko).
2. Detects T2T contigs (vf.tl.getT2T).
3. Assigns reference chromosomes via mashmap (vf.tl.chrAssign).
4. Reads the chromosome mapping (vf.pp.readChr).
5. Detects broken contigs (vf.pp.detectBrokenContigs).
6. Generates QC plots:
   - vf.pl.showMashmapOri: chromosome coverage from mashmap
   - vf.pl.completePlot: completeness per chromosome/haplotype
   - vf.pl.contigLenPlot: contig length per haplotype
   - vf.pl.contigPlot: T2T status heatmap
   - vf.pl.n50Plot: N50 line plot
7. Detects internal telomeres and computes telomere percentages (vf.tl.detect_internal_telomere, vf.pp.find_intra_telo, vf.pl.percTel).
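Step 7 flags telomeric repeats that sit away from the contig ends, using the distance given by --internal_tel_threshold. The following is a minimal sketch of how such a cutoff could be applied. TelomereHit, classify_hit, and internal_hits are hypothetical names. The rule is also an assumption: a hit within the threshold of either end counts as terminal, and anything else counts as internal. It is not the actual logic of vf.tl.detect_internal_telomere or vf.pp.find_intra_telo.

```python
from dataclasses import dataclass


@dataclass
class TelomereHit:
    contig: str
    start: int  # 0-based start of the telomeric-repeat hit
    end: int    # exclusive end of the hit


def classify_hit(hit: TelomereHit, contig_len: int, threshold: int) -> str:
    """Return 'terminal' if the hit lies within `threshold` bp of either
    contig end, otherwise 'internal' (assumed rule, hypothetical helper)."""
    dist_to_start = hit.start
    dist_to_end = contig_len - hit.end
    return "terminal" if min(dist_to_start, dist_to_end) <= threshold else "internal"


def internal_hits(hits, contig_lengths, threshold):
    """Keep only the hits classified as internal."""
    return [
        h for h in hits
        if classify_hit(h, contig_lengths[h.contig], threshold) == "internal"
    ]


if __name__ == "__main__":
    lengths = {"contig_1": 1_000_000}
    hits = [
        TelomereHit("contig_1", 0, 6_000),            # at the contig start
        TelomereHit("contig_1", 500_000, 506_000),    # in the middle
        TelomereHit("contig_1", 990_000, 1_000_000),  # at the contig end
    ]
    for h in internal_hits(hits, lengths, threshold=15_000):
        print(h)
```

With a 15,000 bp threshold, only the mid-contig hit is reported as internal. A larger threshold makes the check more tolerant of telomeric sequence slightly inset from the ends.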
All outputs are written under the Verkko-fillet working directory
(obj.verkko_fillet_dir):
| Output | Location |
|---|---|
| Figures (PDF) | figs/ |
| Stats tables | stats/ |
| Chromosome assignment files | chromosome_assignment/ |
Requirements
- verkkofillet installed (and importable as verkkofillet)
- External tools used by the underlying functions: mashmap, samtools, bgzip, and seqtk (required by chrAssign and FASTA renaming)
- A completed Verkko assembly directory
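Missing tools usually surface only partway through a run. A quick preflight check can catch them before the pipeline starts. The sketch below uses the standard library's shutil.which and importlib.util.find_spec. The tool list is copied from the requirements above. The helper names (missing_tools, is_importable, preflight) are hypothetical and not part of verkkofillet.

```python
import importlib.util
import shutil

# External tools named in the Requirements list above.
REQUIRED_TOOLS = ("mashmap", "samtools", "bgzip", "seqtk")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that shutil.which cannot find on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def is_importable(module_name: str) -> bool:
    """True if `module_name` can be located by the import system."""
    return importlib.util.find_spec(module_name) is not None


def preflight():
    """Collect human-readable problems; an empty list means every check passed."""
    problems = [f"not on PATH: {t}" for t in missing_tools()]
    if not is_importable("verkkofillet"):
        problems.append("Python package 'verkkofillet' is not importable")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

The script prints nothing when everything is found. It does not check tool versions.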
Arguments
| Argument | Required | Default | Description |
|---|---|---|---|
| --verkkoDir | yes | n/a | Path to the Verkko output directory. |
| --ref | yes | n/a | Reference FASTA file used for chromosome assignment. |
| --map_file | yes | n/a | Tab-separated mapping from contig names in the reference FASTA to the desired chromosome names. |
| --internal_tel_threshold | no | | Distance from contig end (bp) used to flag internal telomeres. |
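The argument table can be mirrored with Python's argparse. The sketch below reproduces the interface described above; it is not the script's actual source. build_parser is a hypothetical name. This page does not state the real default for --internal_tel_threshold, so the sketch uses None as a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line interface."""
    p = argparse.ArgumentParser(
        description="Run the basic verkko-fillet preprocessing and QC pipeline."
    )
    p.add_argument("--verkkoDir", required=True,
                   help="Path to the Verkko output directory.")
    p.add_argument("--ref", required=True,
                   help="Reference FASTA file used for chromosome assignment.")
    p.add_argument("--map_file", required=True,
                   help="Tab-separated contig-to-chromosome mapping file.")
    # Placeholder default: the script's real default is not given here.
    p.add_argument("--internal_tel_threshold", type=int, default=None,
                   help="Distance from contig end (bp) used to flag internal telomeres.")
    return p
```

argparse keeps the option spelling as the attribute name, so the values are available as args.verkkoDir, args.ref, args.map_file, and args.internal_tel_threshold. Omitting any required option makes parse_args exit with an error message.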
--map_file format
A two-column, tab-separated file:
<contig_name_in_fasta>\t<desired_chromosome_name>
Examples:
# contig is already named as desired
chr1 chr1
chr2 chr2
# rename contig GCF00001 to chr1
GCF00001 chr1
GCF00002 chr2
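A common mistake is separating the columns with spaces instead of a tab. The sketch below validates a map file before a run; parse_chr_map and read_chr_map are hypothetical helpers, not vf.pp.readChr. It rejects lines that do not split into exactly two tab-separated columns and rejects duplicate contig names. Skipping blank lines and "#" lines is an assumption of this sketch. The "#" lines in the examples above are explanatory, and this page does not say whether the script tolerates them in a real file.

```python
from typing import Dict, Iterable


def parse_chr_map(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'contig_name_in_fasta<TAB>desired_chromosome_name' lines.

    Blank lines and lines starting with '#' are skipped (assumption).
    Raises ValueError on malformed or duplicate entries.
    """
    mapping: Dict[str, str] = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated columns, got {len(fields)}"
            )
        contig, chrom = fields[0].strip(), fields[1].strip()
        if contig in mapping:
            raise ValueError(f"line {lineno}: duplicate contig name {contig!r}")
        mapping[contig] = chrom
    return mapping


def read_chr_map(path: str) -> Dict[str, str]:
    """Read a map file from disk and parse it with parse_chr_map."""
    with open(path, encoding="utf-8") as fh:
        return parse_chr_map(fh)
```

For example, read_chr_map("chrMap.tsv") returns a dict such as {"GCF00001": "chr1", "GCF00002": "chr2"}. A line like "chr1 chr1" separated by spaces raises a ValueError instead of being silently misread.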
Usage
python /path/to/verkko-fillet/src/verkkofillet/bin/automatic_script.py \
--verkkoDir /path/to/verkko_output \
--ref /path/to/reference.fasta \
--map_file /path/to/chrMap.tsv \
--internal_tel_threshold 15000
Example
python automatic_script.py \
--verkkoDir ./verkko_asm \
--ref ./ref/CHM13v2.fa \
--map_file ./ref/chrMap.tsv
The verkko-fillet output directory is created next to the input Verkko directory and is named {verkko_directory}_verkko_fillet, so in this example it is verkko_asm_verkko_fillet. After the run completes, inspect the results:
ls verkko_asm_verkko_fillet/figs/
ls verkko_asm_verkko_fillet/stats/
ls verkko_asm_verkko_fillet/chromosome_assignment/
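For downstream scripts, the naming rule above can also be applied programmatically. This is a minimal pathlib sketch that assumes the sibling {verkko_directory}_verkko_fillet layout and the figs/, stats/, and chromosome_assignment/ subdirectories shown in the listing. fillet_output_dir and list_outputs are hypothetical helpers. Pass the Verkko directory by name (for example ./verkko_asm), not as ".", because the rule uses the last path component.

```python
from pathlib import Path

# Subdirectories listed after the example run.
OUTPUT_SUBDIRS = ("figs", "stats", "chromosome_assignment")


def fillet_output_dir(verkko_dir: str) -> Path:
    """Return the sibling '<name>_verkko_fillet' directory for `verkko_dir`."""
    p = Path(verkko_dir)  # a trailing slash or leading './' is normalized away
    return p.parent / f"{p.name}_verkko_fillet"


def list_outputs(verkko_dir: str) -> dict:
    """Map each output subdirectory to the sorted names of its entries
    (an empty list if the subdirectory does not exist yet)."""
    out = fillet_output_dir(verkko_dir)
    result = {}
    for sub in OUTPUT_SUBDIRS:
        d = out / sub
        result[sub] = sorted(e.name for e in d.iterdir()) if d.is_dir() else []
    return result


if __name__ == "__main__":
    for sub, names in list_outputs("./verkko_asm").items():
        print(f"{sub}: {len(names)} entries")
```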