Introduction to Pangraph
Pangraph is currently under heavy development. Bugs and crashes are to be expected.
PanGraph is a command-line tool for the analysis of bacterial genomes. It compresses multiple genomes into a compact graph representation that can be queried to extract information about the evolution of the genomes. It is developed and maintained by the Neher lab.
Why Pangraph?
The content and structure of bacterial genomes evolve very rapidly: parts of the genome can be cut out, duplicated, or inverted. In addition, genomic material can be gained from the outside, for example through phage infection or DNA uptake and integration. As a result, comparing bacterial genomes is more complicated than analyzing differences in an alignment of homologous sequences. Instead, one would like to understand how diversity in content and structure has arisen through insertions, deletions, and transpositions over the course of evolution. To address such questions, we have developed a scalable multiple-genome alignment tool, PanGraph, that identifies regions of mutual homology between large sets of closely related genomes and represents them in a graph.
This representation is expected to be useful for parsimoniously inferring horizontal gene transfer events within a community; performing comparative studies of genome gain, loss, and rearrangement dynamics; or simply compressing many related genomes.
The resultant graph represents contiguous intervals of homologous DNA as vertices and every genome as an ordered walk across such vertices. Edges of the graph are unordered and only exist if at least one genome was found to connect both vertices in either the forward or reverse strand. For a more detailed description of the graph structure, see what is a pangraph.
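The structure described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the dictionary layout, the block names, and the helper `edges_from_walks` are hypothetical and are not the PanGraph file format or the PyPangraph API. Each genome is an ordered walk of `(block_id, strand)` steps, and an edge is recorded as an unordered pair whenever two blocks are adjacent in at least one walk, regardless of strand.

```python
# Hypothetical toy model of the graph described above.
# It does NOT reflect the PanGraph output format or the PyPangraph API.

def edges_from_walks(walks, circular=False):
    """Collect unordered edges between blocks that are adjacent in some walk.

    walks: dict mapping a genome name to an ordered list of
           (block_id, strand) steps, with strand "+" or "-".
    circular: if True, also connect the last block of each walk to the first.
    """
    edges = set()
    for steps in walks.values():
        block_ids = [block_id for block_id, _strand in steps]
        pairs = list(zip(block_ids, block_ids[1:]))
        if circular and len(block_ids) > 1:
            pairs.append((block_ids[-1], block_ids[0]))
        for a, b in pairs:
            # frozenset makes the edge unordered: {A, B} == {B, A}
            edges.add(frozenset((a, b)))
    return edges


# Three toy genomes sharing homologous blocks A, B, and C.
walks = {
    "genome_1": [("A", "+"), ("B", "+"), ("C", "+")],
    "genome_2": [("A", "+"), ("C", "-"), ("B", "-")],  # B-C segment inverted
    "genome_3": [("A", "+"), ("C", "+")],              # block B absent
}

edges = edges_from_walks(walks)
for edge in sorted(sorted(e) for e in edges):
    print(edge)
```

In this toy example, the inverted segment in `genome_2` contributes the same B-C edge as `genome_1`, because adjacency is recorded irrespective of strand, while the A-C edge exists only because `genome_2` and `genome_3` place those blocks next to each other.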
Documentation outline
This documentation contains:
- a set of tutorials that explain the essential steps to build and manipulate a graph.
- reference documentation for the available commands.
- documentation for PyPangraph, a Python library for analyzing the graph data structure.
This documentation refers to the latest version of pangraph. Code for the previous v0 version is available on the v0 branch of the repository, and the legacy documentation is hosted at https://v0.docs.pangraph.org/.