About PyPanGraph
PyPanGraph is a Python package for exploring and analyzing the JSON files produced by PanGraph.
Installation
PyPanGraph can be installed from PyPI using pip:

pip install pypangraph
Loading and exploring a graph
We start the tutorial by loading a pangenome graph object and exploring its properties. For this tutorial we will use the plasmids.json file, a pangraph of 15 plasmid sequences. You can download it from the pypangraph repository at packages/pypangraph/tests/data/plasmids.json.
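PyPanGraph parses the file for you. To make the structure of a pangraph concrete, here is a minimal stdlib-only sketch that reads the JSON and turns every path into an ordered list of (block id, strand) pairs. It assumes top-level `paths` and `nodes` objects keyed by id, with each path listing its node ids in order and each node carrying `block_id` and `strand`. These key names are an assumption about the file layout, not something this tutorial states, so check them against your own `plasmids.json`.

```python
import json


def load_paths(text):
    """Map each path name to its ordered list of (block_id, strand) tuples.

    Assumed (hypothetical) schema: "paths" and "nodes" are JSON objects keyed
    by id; each path lists node ids in order; each node has "block_id" and
    "strand". Verify against a real pangraph JSON before relying on this.
    """
    data = json.loads(text)
    nodes = data["nodes"]  # JSON object keys are always strings
    paths = {}
    for path in data["paths"].values():
        name = path.get("name") or str(path["id"])
        paths[name] = [
            (str(nodes[str(nid)]["block_id"]), nodes[str(nid)]["strand"])
            for nid in path["nodes"]
        ]
    return paths


# A tiny hand-written example in the assumed layout.
example = json.dumps({
    "paths": {"0": {"id": 0, "name": "p1", "nodes": [10, 11]}},
    "nodes": {"10": {"block_id": 1, "strand": "+"},
              "11": {"block_id": 2, "strand": "-"}},
})
```

With the real file you would call `load_paths(open("plasmids.json").read())` instead of passing the hand-written example.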
A look at the pangenome
In this section of the tutorial we use PyPanGraph to explore the blocks that make up the pangenome graph and to extract information about the pangenome of our dataset.
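A common first summary of the blocks is how often each one occurs in each genome. The plain-Python sketch below illustrates the idea; it is not PyPanGraph's own implementation. It takes paths as lists of block ids (an input format chosen for this example) and uses one common convention: a block is core if it occurs exactly once in every path, duplicated if it occurs more than once in some path, and accessory otherwise.

```python
from collections import Counter


def classify_blocks(paths):
    """Label each block as 'core', 'duplicated' or 'accessory'.

    paths: {path_name: [block_id, ...]}.
    core: exactly one copy in every path.
    duplicated: more than one copy in at least one path.
    accessory: everything else (absent from at least one path).
    """
    per_path = [Counter(blocks) for blocks in paths.values()]
    all_blocks = set().union(*per_path)
    labels = {}
    for block in sorted(all_blocks):
        counts = [c[block] for c in per_path]
        if any(n > 1 for n in counts):
            labels[block] = "duplicated"
        elif all(n == 1 for n in counts):
            labels[block] = "core"
        else:
            labels[block] = "accessory"
    return labels


paths = {
    "g1": ["A", "B", "C"],
    "g2": ["A", "C", "C"],
    "g3": ["A", "D", "C"],
}
```

Here `classify_blocks(paths)` labels `A` as core, `C` as duplicated (two copies in `g2`) and `B` and `D` as accessory.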
Exploring block alignments
In this section of the tutorial we explore block alignments in more detail.
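One simple quantity to read off a block alignment is the set of polymorphic columns. The sketch below assumes you already have a block's aligned sequences as equal-length strings with `-` marking gaps; how to obtain them through PyPanGraph is not shown here. A column counts as polymorphic when it contains more than one distinct nucleotide, ignoring gaps.

```python
def polymorphic_columns(aligned):
    """Return indices of alignment columns with more than one distinct base.

    aligned: list of equal-length strings; '-' marks a gap and is ignored.
    """
    if aligned and len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    columns = []
    for i, column in enumerate(zip(*aligned)):
        bases = {c.upper() for c in column if c != "-"}
        if len(bases) > 1:
            columns.append(i)
    return columns


aln = ["ACGT-A", "ACCTTA", "ACGTTA"]
```

For `aln`, only column 2 (G/C) is polymorphic; column 4 contains a gap and a single base, so it is not counted.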
Paths and core genome synteny
In this tutorial we will learn how to visualize paths in a pangraph and to survey changes in core-genome synteny.
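To compare core-genome synteny, one can reduce each path to its ordered sequence of core blocks and check whether two genomes share the same order. Plasmids are circular and may be stored in either orientation, so the sketch below normalizes each core-block sequence over all rotations and both reading directions before comparing. The (block_id, strand) input format and this normalization rule are choices made for the illustration; for linear chromosomes you would skip the rotations.

```python
FLIP = {"+": "-", "-": "+"}


def core_order(path, core):
    """Keep only the core blocks of a path given as (block_id, strand) pairs."""
    return [(b, s) for b, s in path if b in core]


def reverse(order):
    """Read a block sequence on the opposite strand."""
    return [(b, FLIP[s]) for b, s in reversed(order)]


def canonical(order):
    """Smallest rotation over both orientations of a circular sequence."""
    candidates = []
    for seq in (order, reverse(order)):
        for i in range(len(seq)):
            candidates.append(tuple(seq[i:] + seq[:i]))
    return min(candidates) if candidates else ()


def same_synteny(path_a, path_b, core):
    """True if the two paths have the same circular core-block order."""
    return canonical(core_order(path_a, core)) == canonical(core_order(path_b, core))


core = {"A", "B", "C"}
p1 = [("A", "+"), ("X", "+"), ("B", "+"), ("C", "+")]
p2 = [("B", "-"), ("A", "-"), ("C", "-")]  # p1's core order, other strand
p3 = [("A", "+"), ("C", "+"), ("B", "+")]  # B and C swapped
```

Here `p1` and `p2` share core synteny (the accessory block `X` is ignored), while `p3` carries a rearrangement.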
Comparing two genomes in a dotplot
Decomposing genomes into blocks provides a natural starting point for pairwise comparisons between genomes. The pangenome graph can be used to draw a dotplot between two paths, in which each line represents a block shared by both.
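The geometry of such a dotplot can be sketched without any plotting library. For every block shared by two paths, its coordinates in the first genome are paired with its coordinates in the second, and the segment is flipped when the two copies have opposite strands. The input format, per-path lists of (block_id, start, end, strand) tuples, is assumed for this example; the resulting segments could then be drawn as lines with any plotting library.

```python
from itertools import product


def dotplot_segments(path_x, path_y):
    """Return ((x0, x1), (y0, y1)) line segments for blocks shared by two paths.

    Each path is a list of (block_id, start, end, strand) tuples. A block with
    several copies yields one segment per pair of copies.
    """
    segments = []
    for (bx, sx, ex, strand_x), (by, sy, ey, strand_y) in product(path_x, path_y):
        if bx != by:
            continue
        if strand_x == strand_y:
            segments.append(((sx, ex), (sy, ey)))  # same orientation: diagonal
        else:
            segments.append(((sx, ex), (ey, sy)))  # inverted: anti-diagonal
    return segments


px = [("A", 0, 100, "+"), ("B", 100, 250, "+")]
py = [("B", 0, 150, "-"), ("A", 150, 250, "+")]
```

Block `A` gives a diagonal segment, while block `B`, inverted in the second genome, gives an anti-diagonal one.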
Junctions: comparing local accessory variation
When comparing closely related bacterial genomes, the core genome is often largely syntenic: long stretches of conserved blocks appear in the same order across isolates. Between these conserved blocks, segments of accessory DNA (insertions, deletions, mobile elements) vary from one genome to another.
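To make the idea concrete, the sketch below cuts a circular path at its core blocks and returns, for each pair of consecutive core blocks, the accessory blocks found between them. Each junction is stored in a canonical orientation (the smaller of its two reading directions), so the same locus gets the same key in genomes stored on opposite strands. This is an illustration of the concept under those assumptions, not necessarily how PyPanGraph's BackboneJunctions defines or keys junctions.

```python
FLIP = {"+": "-", "-": "+"}


def flip_run(run):
    """Read a run of (block_id, strand) nodes on the opposite strand."""
    return tuple((b, FLIP[s]) for b, s in reversed(run))


def junctions(path, core):
    """Split a circular path into junctions flanked by consecutive core blocks.

    path: [(block_id, strand), ...]; core: set of core block ids.
    Returns {(left_core, right_core): accessory_nodes}, canonically oriented.
    """
    idx = [i for i, (b, _) in enumerate(path) if b in core]
    result = {}
    for k, i in enumerate(idx):
        j = idx[(k + 1) % len(idx)]  # next core node, wrapping around
        if j > i:
            middle = path[i + 1 : j]
        else:
            middle = path[i + 1 :] + path[:j]
        full = (path[i],) + tuple(middle) + (path[j],)
        full = min(full, flip_run(full))  # orientation-independent form
        result[(full[0], full[-1])] = full[1:-1]
    return result


core = {"A", "B"}
g1 = [("A", "+"), ("x", "+"), ("B", "+"), ("y", "-")]
g2 = [("y", "+"), ("B", "-"), ("x", "-"), ("A", "-")]  # g1 on the other strand
```

Because of the canonical orientation, `junctions(g1, core)` and `junctions(g2, core)` return the same two junctions.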
Calculating summary junction statistics
Bacterial genomes can harbor hundreds of loci of accessory genome variability. Manual inspection of every locus is often impractical.
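Summary statistics make it possible to rank junctions instead of inspecting them one by one. The sketch below takes, for each genome, a mapping from a junction key to the tuple of accessory blocks found in it, and computes per junction the number of genomes, the number of distinct accessory contents, and the number of genomes with a non-empty junction. These particular statistics, and the ranking by number of distinct variants, are choices made for this illustration.

```python
from collections import defaultdict


def junction_stats(per_genome):
    """Summarize accessory variation per junction.

    per_genome: {genome: {junction_key: tuple_of_accessory_blocks}}.
    Returns {junction_key: {"n_genomes", "n_variants", "n_nonempty"}};
    n_variants counts distinct accessory contents, the empty one included.
    """
    contents = defaultdict(list)
    for genome_junctions in per_genome.values():
        for key, middle in genome_junctions.items():
            contents[key].append(tuple(middle))
    return {
        key: {
            "n_genomes": len(mids),
            "n_variants": len(set(mids)),
            "n_nonempty": sum(1 for m in mids if m),
        }
        for key, mids in contents.items()
    }


def most_variable(stats, top=5):
    """Junction keys sorted by decreasing number of distinct variants."""
    return sorted(stats, key=lambda k: stats[k]["n_variants"], reverse=True)[:top]


per_genome = {
    "g1": {"AB": ("x",), "BC": ()},
    "g2": {"AB": ("y", "z"), "BC": ()},
    "g3": {"AB": (), "BC": ()},
}
```

In this toy example junction `AB` shows three different contents across three genomes and ranks first, while `BC` is empty everywhere.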
Investigating junctions further: positions and sequences
In the previous part of the tutorial we used summary statistics to single out a junction we want to investigate further. Here we showcase two BackboneJunctions methods for this:
Bonus: visually untangling graph complexity using junctions
When a whole pangenome graph is exported to GFA and opened in Bandage, the result is usually a tangle. As we saw in the build tutorial, the same accessory or duplicated block can occur in many different genomic contexts, so a single segment ends up linked to many distant parts of the graph. These long-range links are what make the layout look like a hairball. This can be mitigated by filtering out duplicated blocks or even all accessory blocks, but at the cost of losing visualization of the accessory diversity.
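The source of the tangle can be quantified before opening Bandage by counting, for each block, how many distinct neighboring blocks it has across all paths. Blocks that occur in many genomic contexts, such as mobile elements, collect many neighbors and become the long-range links described above. The sketch assumes circular paths given as lists of block ids and ignores strand; the function name and input format are illustrative.

```python
from collections import defaultdict


def neighbor_counts(paths):
    """Count distinct neighboring blocks of each block across all paths.

    paths: {name: [block_id, ...]}, each treated as circular; strand ignored.
    Blocks seen in many different contexts get many neighbors.
    """
    neighbors = defaultdict(set)
    for blocks in paths.values():
        n = len(blocks)
        if n < 2:
            continue  # a single block has no neighbors
        for i, block in enumerate(blocks):
            neighbors[block].add(blocks[i - 1])  # i - 1 == -1 wraps to the end
            neighbors[block].add(blocks[(i + 1) % n])
    return {block: len(nb) for block, nb in neighbors.items()}


paths = {
    "g1": ["A", "B", "C", "D", "E", "F"],
    "g2": ["A", "T", "B", "C", "D", "E", "F"],  # element T inserted after A
    "g3": ["A", "B", "C", "D", "T", "E", "F"],  # same element, another locus
}
```

Here the mobile block `T` has four distinct neighbors, more than any core block, because it sits in two different loci.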