Genome Browser Group
The CompGenomics2015 class was charged with extracting useful information from unassembled reads (53 genomes in total) provided by the Centers for Disease Control and Prevention (CDC). These reads belonged to Neisseria meningitidis, Haemophilus influenzae, or Haemophilus haemolyticus. The class was also asked to develop a tool capable of correctly classifying any input sequence belonging to one of these species, down to its serogroup or serotype. The work was carried out by five groups, each focused on a specific aspect of the project. Genome assembly, gene prediction, functional annotation, and comparisons between species were each carried out by one of these groups.
The Browser group was tasked with presenting these results in an accessible, easy-to-understand, and user-friendly manner. Its approach, considerations, and decisions are summarized here.
Genome Browser Background
A genome browser is a web-based or desktop graphical tool for rapid and reliable display of any requested portion of a genome at any scale, integrated with a large collection of annotations. A browser can be configured to display genome sequences (contigs, assemblies), mRNAs, ESTs, poly(A) sites, splice boundaries, SNPs, and much more. Text and sequence-based searches provide quick and precise access to any region of interest. Secondary links from individual features lead to sequence details and supplementary off-site databases. The defining feature of a genome browser is that all of these data are laid out along a single coordinate axis.
Created for biologists, GMOD (Generic Model Organism Database) is a collection of databases, applications, and adapter software used to display genomic information. Many applications and database schemas are available, and the choice depends on what information will be provided and what output and user interface are desired. The applications we used to build our genome browser are described below. (For more information, visit http://gmod.org/wiki/Main_Page.)
JBrowse was chosen over its predecessor, GBrowse, as the genome browser for the following reasons:
(1) JBrowse renders on the client side, which makes it very fast. GBrowse is slower because it renders on the web server.
(2) JBrowse does not need a database; it organizes the data directly from GFF files. GBrowse requires setting up a database.
(3) JBrowse is easier to install and run than GBrowse.
To display genome sequences and multiple annotation files in JBrowse, we used the automated scripts that ship with JBrowse. We also edited the jbrowse.conf configuration file to display more contigs and to sort them by size.
Command to load reference sequences into JBrowse (bracketed values are placeholders for the sample being loaded):
./bin/prepare-refseqs.pl --fasta [M9261.fasta] --out ./data/[M9261]
Command to load GFF annotation files into JBrowse:
./bin/flatfile-to-json.pl --gff [M9261_CDS.gff] --trackLabel CDS --out ./data/[M9261]
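With 53 assemblies and several annotation tracks each, the two commands above have to be repeated many times. The following is a minimal Python sketch of how those command lines could be generated per sample. It only builds the argument lists and does not run anything. The function name jbrowse_load_commands, the directory names, and the file naming pattern (<sample>.fasta, <sample>_<track>.gff) are assumptions modeled on the M9261 example, not the group's actual layout.

```python
from pathlib import Path


def jbrowse_load_commands(sample, fasta_dir, gff_dir, tracks=("CDS",)):
    """Build the argv lists that would load one assembly into JBrowse.

    Assumed (hypothetical) layout, mirroring the M9261 example:
      <fasta_dir>/<sample>.fasta
      <gff_dir>/<sample>_<track>.gff
    Only flags shown in the commands above are used.
    """
    out_dir = f"./data/{sample}"

    # One prepare-refseqs.pl call per assembly.
    commands = [[
        "./bin/prepare-refseqs.pl",
        "--fasta", str(Path(fasta_dir) / f"{sample}.fasta"),
        "--out", out_dir,
    ]]

    # One flatfile-to-json.pl call per annotation track.
    for track in tracks:
        commands.append([
            "./bin/flatfile-to-json.pl",
            "--gff", str(Path(gff_dir) / f"{sample}_{track}.gff"),
            "--trackLabel", track,
            "--out", out_dir,
        ])
    return commands


if __name__ == "__main__":
    # "assemblies" and "annotations" are placeholder directory names.
    for argv in jbrowse_load_commands("M9261", "assemblies", "annotations",
                                      tracks=("CDS", "tRNA")):
        print(" ".join(argv))
```

Each returned list could be passed to subprocess.run from the JBrowse installation directory. Keeping command construction separate from execution makes it easy to inspect what will be run for all samples first.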
For more information about JBrowse please visit: http://jbrowse.org
Because JBrowse is a database-free genome browser, we did not need to build a relational database to store the sequences and annotation files. Instead, JBrowse creates JSON data files on the server that hold all of the data shown in our browser.
The reads for all 53 genomes provided by the CDC were identified as Neisseria meningitidis, Haemophilus influenzae, or Haemophilus haemolyticus. There are 5 common serogroups of N. meningitidis known to be pathogenic (A, B, C, W, and Y) and 6 common pathogenic serotypes of H. influenzae (a, b, c, d, e, and f). H. haemolyticus cannot be classified into serogroups or serotypes because it lacks a capsule.
The Typer tool determines the species, serogroup or serotype, and phenotypic traits of a whole genome sequence provided as input by the user. The tool uses an alignment-free methodology to identify the species of the input FASTA or FASTQ file. Species identification is currently limited to Neisseria meningitidis, Haemophilus influenzae, and Haemophilus haemolyticus. The tool also identifies the MLST profile using the run_MLST[] script from Leighton Pritchard's bioinformatics toolkit.
Differences between the enzymatic pathways of the genera Neisseria and Haemophilus are used to determine the species of the input sequence: the tool looks for genes associated with pathways that are unique to each of the species analyzed. This search uses a k-mer-based, alignment-free method. Analysis of the complete capsule (cap) locus of the input then determines the serogroup if the input was identified as N. meningitidis, or the serotype if the input was identified as H. influenzae.
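To make the k-mer-based, alignment-free idea concrete, here is a minimal Python sketch. It is not the Typer tool's implementation. It assumes that each species is represented by one or more species-specific marker gene sequences, and it scores a query by the fraction of each species' marker k-mers found in it. The function names, the species labels, the toy sequences, and the choice of k are all hypothetical, and reverse-complement strands are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of all overlapping substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(markers, k):
    """Map each species to the union of k-mers from its marker sequences.

    markers: {species_name: [marker_sequence, ...]}
    """
    index = {}
    for species, seqs in markers.items():
        species_kmers = set()
        for s in seqs:
            species_kmers |= kmers(s, k)
        index[species] = species_kmers
    return index


def classify(query, index, k):
    """Score each species by the fraction of its marker k-mers in the query.

    Returns (best_species, scores). No alignment is computed; only exact
    k-mer membership is tested.
    """
    query_kmers = kmers(query, k)
    scores = {
        species: (len(query_kmers & ks) / len(ks) if ks else 0.0)
        for species, ks in index.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy marker sequences, invented for illustration only.
    markers = {
        "species_A": ["ATGGCGTACGTTAGC"],
        "species_B": ["TTGACCGGATATCCA"],
    }
    index = build_index(markers, k=5)
    best, scores = classify("CCCATGGCGTACGTTAGCGGG", index, k=5)
    print(best, scores)
```

In practice a threshold on the best score would be needed so that an input from an unsupported species is reported as unclassified rather than assigned to the nearest match.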
Please visit our website at http://gbrowse2015.biology.gatech.edu/Home.html
This page contains a brief overview of the three species whose genomes were the focus of this project. The toolbar at the top links the home page to the other pages.
A brief project description can be found on this page, along with links to the compgenomics2015 wiki page and the GitHub repositories developed by the various groups during this project. All collaborators are also acknowledged. An email address is provided at the bottom of the page for contact purposes.
This page implements the Typer tool developed by the Comparative group. The tool uses an alignment-free methodology to identify the species of the input FASTA or FASTQ file. It determines the species, MLST profile, and phenotypic traits of a whole genome sequence supplied by the user, and it builds a phylogenetic tree.
1: Phylogeny Tree shows how the samples cluster together.
2: Allelic Profile.
3: Sequence type.
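The relationship between the allelic profile and the sequence type in the list above can be sketched as a table lookup: in MLST, each housekeeping locus is assigned an allele number, and the combination of allele numbers maps to a sequence type. The Python sketch below illustrates only that lookup. It is not the run_MLST script, and the profile table and ST labels are invented placeholders, not real database entries. The locus names are those of the N. meningitidis MLST scheme.

```python
# Housekeeping loci of the N. meningitidis MLST scheme.
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")

# Hypothetical profile table: allele numbers (in LOCI order) -> sequence type.
# These entries are made up for illustration and are NOT real MLST profiles.
EXAMPLE_PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST-example-1",
    (1, 2, 1, 3, 1, 1, 2): "ST-example-2",
}


def sequence_type(allelic_profile, profiles=EXAMPLE_PROFILES, loci=LOCI):
    """Look up the sequence type for an allelic profile.

    allelic_profile: {locus_name: allele_number}
    Returns None if a locus is missing or the combination is not in the table
    (for example, a novel profile).
    """
    key = tuple(allelic_profile.get(locus) for locus in loci)
    if None in key:
        return None
    return profiles.get(key)


if __name__ == "__main__":
    profile = {"abcZ": 1, "adk": 2, "aroE": 1, "fumC": 3,
               "gdh": 1, "pdhC": 1, "pgm": 2}
    print(sequence_type(profile))
```

A real lookup would load the profile table from the scheme's published definitions instead of a hard-coded dictionary.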
The Browser page links to JBrowse, which enables easy visualization of the annotated regions of the 53 genome sequences obtained from the CDC. These annotations include CDS, tRNA, rRNA, miscRNA, and more (2-6 in the figure titled screenshot of the browser).
The navigation panel (1) provides an overview of the genome and the current location, navigation buttons for panning and zooming, and a text box for navigating directly to coordinates or named features.
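Besides the text box, JBrowse can also be pointed at a location through URL query parameters (data for the dataset directory, loc for the region, tracks for the visible tracks). The Python sketch below builds such a link. The base URL, the contig name, and the assumption that each assembly lives under data/<sample> are placeholders chosen to match the loading commands shown earlier, not the actual site layout.

```python
from urllib.parse import urlencode


def jbrowse_url(base, sample, contig, start, end, tracks=()):
    """Build a JBrowse link that opens one region of one assembly.

    base:   URL of the JBrowse index page (hypothetical in the example below)
    sample: assembly name, assumed to be stored under data/<sample>
    """
    params = {
        "data": f"data/{sample}",
        "loc": f"{contig}:{start}..{end}",
    }
    if tracks:
        params["tracks"] = ",".join(tracks)
    # Keep '/', ':' and ',' unescaped so the link stays readable.
    return f"{base}?{urlencode(params, safe='/:,')}"


if __name__ == "__main__":
    # example.org and contig_1 are placeholders, not the project's real values.
    print(jbrowse_url("http://example.org/jbrowse/index.html",
                      "M9261", "contig_1", 100, 2000, tracks=("CDS", "tRNA")))
```

Links of this form are one way a download or results page could send the user straight to the browser view for a given assembly.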
The BLAST page allows users to search a sequence of interest against our database of 53 genome sequences.
All results from the different groups are available for download here. This page also links to the browser, so the user can go directly from this page to the browser view for an assembly of interest.
W. James Kent, Charles W. Sugnet, Terrence S. Furey, et al. The Human Genome Browser at UCSC. Genome Res. 2002 12: 996-1006.