Preprocessing : pp

Recipes for extracting gaps from paths and for other preprocessing steps.

Reading and writing

pp.read_Verkko

Prepares the Verkko environment by creating necessary directories, locking the original directory, and loading the paths file for further processing.

pp.readChr

Read the chromosome assignment results and store them in the object.

pp.readNode

pp.readScfmap

pp.writeFixedPaths

Writes the fixed paths to a file.

pp.save_Verkko

Save the VerkkoFillet object to a file using pickle.

pp.load_Verkko

Load the VerkkoFillet object from a file using pickle.
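
As a minimal sketch of what a pickle-based save and load round trip generally looks like, using only the standard library. This is not the package's actual implementation: `save_obj`, `load_obj`, and the stand-in dictionary are hypothetical names chosen for illustration.

```python
import os
import pickle
import tempfile


def save_obj(obj, path):
    """Serialize obj to path with pickle (hypothetical helper)."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def load_obj(path):
    """Load a pickled object back from path (hypothetical helper)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for the real object; its actual attributes are not reproduced here.
state = {"paths": ["utig4-1+", "utig4-2-"], "gaps": []}

with tempfile.TemporaryDirectory() as tmp:
    pkl = os.path.join(tmp, "fillet.pkl")
    save_obj(state, pkl)
    restored = load_obj(pkl)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.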

pp.mkCNSdir

Creates a new CNS directory by creating symbolic links to the original verkko directory.

pp.updateCNSdir_missingEdges

Updates the CNS directory by handling missing edges and creating necessary symbolic links or files.

pp.loadGiraffe

Load the Giraffe genome object from a file using pickle.

pp.readGaf

Reads a GAF file and stores it as a pandas DataFrame in the provided object.
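
To illustrate the kind of table involved, here is a rough sketch that parses the 12 mandatory GAF columns with plain Python. It is an assumption-laden illustration, not the code behind `pp.readGaf`: the function name `parse_gaf`, the column labels, and the example record are all made up, and optional tag fields are simply dropped.

```python
import io

# Labels for the 12 mandatory GAF columns (names chosen for this sketch).
GAF_COLUMNS = [
    "query_name", "query_length", "query_start", "query_end", "strand",
    "path", "path_length", "path_start", "path_end",
    "residue_matches", "block_length", "mapq",
]
INT_COLUMNS = [c for c in GAF_COLUMNS if c not in ("query_name", "strand", "path")]


def parse_gaf(handle):
    """Return one dict per alignment line, keeping only the mandatory columns."""
    records = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")[:12]  # drop optional tags such as NM:i:10
        rec = dict(zip(GAF_COLUMNS, fields))
        for key in INT_COLUMNS:
            rec[key] = int(rec[key])
        records.append(rec)
    return records


# A single invented alignment line, with one optional tag at the end.
example = io.StringIO(
    "read1\t1000\t0\t1000\t+\t>utig4-1<utig4-2\t5000\t100\t1100\t990\t1000\t60\tNM:i:10\n"
)
gaf_records = parse_gaf(example)
```

Passing a list of dictionaries like `gaf_records` to `pandas.DataFrame` yields a table with one row per alignment.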

pp.getQV

Reads a QV (Quality Value) file, parses it, and attaches the resulting DataFrame to the provided object.

Preprocessing

pp.find_intra_telo

Find the internal telomeres in the assembly.

pp.find_reads_intra_telo

Find read support for the additional artificial sequences outside the telomere.

pp.findGaps

Find gaps in the 'path' column of the DataFrame and store the result in the 'gaps' column.
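
The idea of locating gaps in a path can be sketched as follows. This assumes a path is a comma-separated string of oriented nodes and that a gap appears as a bracketed entry such as `[N5000N:label]`. That format, the helper name `find_gaps`, and the example path are assumptions for illustration, not the behaviour of `pp.findGaps` itself.

```python
import re

# Assumed gap notation: "[N<length>N]" with an optional ":label" part.
GAP_PATTERN = re.compile(r"^\[N\d+N(?::[^\]]*)?\]$")


def find_gaps(path):
    """Return (left_node, gap, right_node) triples for each gap entry in path."""
    entries = [e.strip() for e in path.split(",") if e.strip()]
    gaps = []
    for i, entry in enumerate(entries):
        if GAP_PATTERN.match(entry):
            left = entries[i - 1] if i > 0 else None
            right = entries[i + 1] if i + 1 < len(entries) else None
            gaps.append((left, entry, right))
    return gaps


# Invented path with a single gap between two nodes.
example_path = "utig4-1+,[N5000N:ambig_path],utig4-2-,utig4-3+"
gaps = find_gaps(example_path)
```

Keeping the flanking nodes alongside each gap makes it easy to look for alignments that span both sides later on.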

pp.searchSplit

Searches for paths containing all specified nodes with a minimum mapping quality and length.
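
A filter of this kind might look like the sketch below, which works on plain dictionaries rather than the package's DataFrame. The function names, the record keys, and the default thresholds are hypothetical, and the real `pp.searchSplit` may differ in both signature and behaviour.

```python
import re


def nodes_in_path(gaf_path):
    """Split a GAF path such as '>utig4-1<utig4-2' into a set of node names."""
    return set(re.findall(r"[<>]([^<>]+)", gaf_path))


def search_split(records, nodes, min_mapq=1, min_len=5000):
    """Keep alignments that traverse every requested node and pass both thresholds."""
    wanted = set(nodes)
    return [
        rec for rec in records
        if rec["mapq"] >= min_mapq
        and rec["block_length"] >= min_len
        and wanted <= nodes_in_path(rec["path"])
    ]


# Invented alignments; only the keys used by the filter are included.
records = [
    {"query_name": "read1", "path": ">utig4-1>utig4-2", "mapq": 60, "block_length": 12000},
    {"query_name": "read2", "path": ">utig4-1", "mapq": 60, "block_length": 15000},
    {"query_name": "read3", "path": "<utig4-2<utig4-1", "mapq": 0, "block_length": 20000},
]
hits = search_split(records, ["utig4-1", "utig4-2"])
```

Here only `read1` survives: `read2` misses one of the nodes and `read3` fails the mapping-quality threshold.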

pp.searchNodes

Extracts and filters paths containing specific nodes from the graph alignment file (GAF).

pp.highlight_nodes

Highlight the nodes in the obj.paths DataFrame.

pp.fillGaps

Fills the gap for a specific gapId and updates the 'fixedPath', 'startMatch', 'endMatch', and 'finalGaf' columns.

pp.estLoops

Estimate the number of loops between two nodes in the graph.

pp.checkGapFilling

Checks and prints the number of filled gaps in the 'gap' DataFrame and shows a progress bar for gap filling.

pp.connectContigs

Connects two contigs by adding a gap between them.

pp.naming_contigs

Rename the contigs based on the provided chromosome map file.

pp.find_multi_used_node

Find nodes that are used in more than one path.
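
One simple way to detect shared nodes is to record, for each node, the set of paths that use it. The sketch below assumes comma-separated paths whose nodes end in an orientation sign and whose gaps are bracketed. The helper name and the example paths are hypothetical and do not come from the package.

```python
from collections import defaultdict


def find_multi_used_nodes(paths):
    """Map each node that occurs in more than one path to the sorted path names."""
    used_by = defaultdict(set)
    for name, path in paths.items():
        for entry in path.split(","):
            entry = entry.strip()
            if not entry or entry.startswith("["):
                continue  # skip empty entries and bracketed gap entries
            # Drop a trailing orientation sign, if present.
            node = entry[:-1] if entry[-1] in "+-" else entry
            used_by[node].add(name)
    return {node: sorted(names) for node, names in used_by.items() if len(names) > 1}


# Invented paths: utig4-2 appears in both, in opposite orientations.
paths = {
    "hap1_1": "utig4-1+,utig4-2-",
    "hap2_1": "utig4-2+,[N1000N:gap],utig4-3+",
}
shared = find_multi_used_nodes(paths)
```

Orientation is ignored here, so a node counts as shared even when the two paths traverse it in opposite directions.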

pp.find_hic_support

Find Hi-C support for a specific node.

pp.deleteGap

Deletes a gap from the 'gap' DataFrame for a specific gapId.

pp.impute_depth

pp.calNodeDepth

Plot the depth of nodes in the graph.

pp.detectBrokenContigs

Find contigs that are assigned to the same chromosome and haplotype.
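
The underlying check can be sketched as a grouping step: contigs sharing a (chromosome, haplotype) pair are candidates for a contig that was split in two. The function name, the input layout, and the example assignments below are assumptions for illustration only.

```python
from collections import defaultdict


def detect_broken_contigs(assignments):
    """Group contigs by (chromosome, haplotype) and keep groups with more than one."""
    groups = defaultdict(list)
    for contig, chrom, hap in assignments:
        groups[(chrom, hap)].append(contig)
    return {key: contigs for key, contigs in groups.items() if len(contigs) > 1}


# Invented assignments: two contigs share chr1 / hap1.
assignments = [
    ("contig1", "chr1", "hap1"),
    ("contig2", "chr1", "hap1"),
    ("contig3", "chr1", "hap2"),
    ("contig4", "chr2", "hap1"),
]
broken = detect_broken_contigs(assignments)
```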

Last touch

pp.get_NodeChr

Get the node and chromosome mapping from the VerkkoFillet object.