BIN-GENOME-Genome Browsers

From "A B C"
Revision as of 02:49, 3 October 2017 by Boris (talk | contribs)
Jump to navigation Jump to search

Genome Browsers


 

Keywords:  UCSC, GMod; in practice


 



 


Sorry!

This page is only a stub; it is here as a placeholder to establish the logical framework of the site but there is no significant content as yet. Do not work with this material until it is updated to "live" status.


 


Abstract

...


 


This unit ...

Prerequisites

You need to complete the following units before beginning this one:


 


Objectives

...


 


Outcomes

...


 


Deliverables

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.


 


Evaluation

Evaluation: Integrated Unit

This unit should be submitted for evaluation for a maximum of 10 marks. Details TBD.


 


Contents

  • Read:
Tyner et al. (2017) The UCSC Genome Browser database: 2017 update. Nucleic Acids Res 45:D626-D634. (pmid: 27899642)

PubMed ] [ DOI ] Since its 2001 debut, the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/) team has provided continuous support to the international genomics and biomedical communities through a web-based, open source platform designed for the fast, scalable display of sequence alignments and annotations landscaped against a vast collection of quality reference genome assemblies. The browser's publicly accessible databases are the backbone of a rich, integrated bioinformatics tool suite that includes a graphical interface for data queries and downloads, alignment programs, command-line utilities and more. This year's highlights include newly designed home and gateway pages; a new 'multi-region' track display configuration for exon-only, gene-only and custom regions visualization; new genome browsers for three species (brown kiwi, crab-eating macaque and Malayan flying lemur); eight updated genome assemblies; extended support for new data types such as CRAM, RNA-seq expression data and long-range chromatin interaction pairs; and the unveiling of a new supported mirror site in Japan.


Introduction

 

Large scale genome sequencing and annotation has made a wealth of information available that is all related to the same biological objects: the DNA. The information however can be of very different types, it includes:

  • the actual sequence
  • sequence variants (SNPs and CNVs)
  • conservation between related species
  • genes (with introns and exons)
  • mRNAs
  • expression levels
  • regulatory features such as transcription factor bindings sites

and much more.

Since all of this information relates to specific positions or ranges on the chromosome, displaying it alongside the chromosomal coordinates is a useful way to integrate and visualize it. We call such strips of annotation tracts and display them in genome browsers. Quite a number of such browsers exist and most work on the same principle: server hosted databases are queried through a Web interface; the resulting data is displayed graphically in a Web browser window. The large data centres each have their own browsers, but arguably the best engineered, most informative and mostly widely used one is provided by the University of California Santa Cruz (UCSC) Genome Browser Project.

Compiling the data requires a massive annotation effort, which has not been completed for all genome-sequenced species. In particular, not all of our YFOs have been included in the major model-organism annotation efforts. The general strategy for analysis of a gene in YFO is thus to map it to homologous genes in model organisms. In this assignment you will explore the UCSC genome browser and we will go through an exercise that relates fungal replication genes to human genes. We have previously focused a lot on Mbp1 homologs, but these have no clear equivalences in "higher" eukaryotes. However one of the key target genes of Mbp1 is the cell cycle protein Cdc6, which is well conserved in fungi and other eukaryotes eukaryotes and has a human homolog. Since generally speaking the annotation level for human genes is the highest, we will have a closer look at that gene.


 

The UCSC genome browser

 

The University of California Santa Cruz (UCSC) Genome Browser Project has the largest offering of annotation information. However it is strictly model-organism oriented and you will probably not find YFO among its curated genomes. Nevertheless, if you are studying eg. human genes, or yeast, the UCSC browser will probably be your first choice.

Task:
In this task you will access the UCSC genome browser view of the human Cdc6 gene. You will explore some of the very large number of tracks that are available and study the transcription factor binding region.

  1. Navigate to the UCSC Genome Bioinformatics entry page and follow the link to the Genome Browser in the "Our tools" section.
  2. Click on the link to humans. Note that this is the hg38 assembly.
  3. Enter CDC6 into the "Position/Search Term" field and click "Go". You should get a list of entries, click on the top link, the gene on chromosome 17: CDC6 (uc002huj.2) at chr17:40287633-40304657
  1. Zoom out 1.5x to view the upstream regulatory region: the end of the adjacent WIPF2 gene should have just come into view on the left.
  2. Study the Genome Browser view of the human CDC6 homolog.
    1. In particular, note the extensive functional annotations of DNA and the alignments of vertebrate syntenic regions that allow detailed genomic comparisons.
    2. Distinguish between exon and intron sequence.
    3. Note that the mammal Conservation track has high values for all of the exons, but not only for exons.
    4. Find more information on the "Layered H3K27Ac" tract.
  1. Note the large number of available tracks that have been integrated into this view. Most of them are switched off. Find the Regulation section, and follow the link to the "ORegAnno" information to see what that is about. Note that you can switch individual annotations on or off on this page, as well as set the display format for all of the results. Select the check-box only for "transcription factor binding site" to be on, select the "Display mode" to full and click submit.
  2. Study this information and note:
    1. There is a cluster of TFBS just upstream of the transcription initiation site.
    2. This cluster coincides with the highest H3K27Ac density.
    3. If you <control>-click (right-click?) on the top orange bar of this cluster, a contextual menu opens from which you can access the details page for OREG1791811 in a new window. Follow the link to the RBL2 transcription factor via ENST00000379935 ... from where you can access transcript and gene and expression and protein family and GO and all other information.
  3. Go back to the Genome Browser and set the ORegAnno tract to "pack" and click "refresh".
  4. Slide the SNP track to just beneath the RefSeq genes track that contains the introns and exons. You will notice that one of the SNPs is green, and two are red. Why? Set the "Common SNPs" track display mode to "pack" and click "refresh".


Based on this kind of information, it should be straightforward to identify human transcription factors that potentially regulate human Cdc6 and determine - via sequence comparisons - whether any of them are homologous to any of the yeast transcription factors or factors in YFO. Through a detailed analysis of existing systems, their regulatory components and the conservation of regulation, one can in principle establish functional equivalences across large evolutionary distances.


Task:
Finally:

  1. Print this page, but print the first page only.
  2. With a red pen, mark and label the following four items on your print-out:
    1. The first exon of CDC6.
    2. The chromosomal coordinates of the current view.
    3. The binding sites for the transcription factors that bind to the CDC6 promoter.
    4. The locations of the missense-variant SNPs.
  3. Write your name and Student number on this page and bring it to class to hand it in on Tuesday.




 


Further reading, links and resources

Wang et al. (2013) A brief introduction to web-based genome browsers. Brief Bioinformatics 14:131-43. (pmid: 22764121)

PubMed ] [ DOI ] Genome browser provides a graphical interface for users to browse, search, retrieve and analyze genomic sequence and annotation data. Web-based genome browsers can be classified into general genome browsers with multiple species and species-specific genome browsers. In this review, we attempt to give an overview for the main functions and features of web-based genome browsers, covering data visualization, retrieval, analysis and customization. To give a brief introduction to the multiple-species genome browser, we describe the user interface and main functions of the Ensembl and UCSC genome browsers using the human alpha-globin gene cluster as an example. We further use the MSU and the Rice-Map genome browsers to show some special features of species-specific genome browser, taking a rice transcription factor gene OsSPL14 as an example.


 


Notes


 


Self-evaluation

 



 




 

If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.



 

About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-08-05

Version:

0.1

Version history:

  • 0.1 First stub

CreativeCommonsBy.png This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.