Expected Preparations:

  [BIN-Genome] Annotation
 
  The units listed above are part of this course and contain important preparatory material.  

Keywords: UCSC genome browser

Objectives:

This unit will …

  • … introduce work with the UCSC genome browser;

Outcomes:

After working through this unit you …

  • … can use the UCSC genome browser for genome analysis queries;


Deliverables:

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.

Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.

Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.


Evaluation:

NA: This unit is not evaluated for course marks.


Exploring genomes with the UCSC genome browser

Task…

  • Read:

    Lee, Christopher M et al. (2020). “UCSC Genome Browser enters 20th year”. Nucleic Acids Research 48(D1):D756–D761.
    [PMID: 31691824] [DOI: 10.1093/nar/gkz1012]

    The University of California Santa Cruz Genome Browser website (https://genome.ucsc.edu) enters its 20th year of providing high-quality genomics data visualization and genome annotations to the research community. In the past year, we have added a new option to our web BLAT tool that allows search against all genomes, a single-cell expression viewer (https://cells.ucsc.edu), a ‘lollipop’ plot display mode for high-density variation data, a RESTful API for data extraction and a custom-track backup feature. New datasets include Tabula Muris single-cell expression data, GeneHancer regulatory annotations, The Cancer Genome Atlas Pan-Cancer variants, Genome Reference Consortium Patch sequences, new ENCODE transcription factor binding site peaks and clusters, the Database of Genomic Variants Gold Standard Variants, Genomenon Mastermind variants and three new multi-species alignment tracks.

Introduction

 

Large-scale genome sequencing and annotation has made a wealth of information available that all relates to the same biological object: the DNA. This information can, however, be of very different types, including:

  • the actual sequence;
  • sequence variants (SNPs and CNVs);
  • conservation between related species;
  • genes (with introns and exons);
  • mRNAs;
  • expression levels;
  • regulatory features such as transcription factor binding sites, and much more.

Since all of this information relates to specific positions or ranges on the chromosome, displaying it alongside the chromosomal coordinates is a useful way to integrate and visualize it. We call such strips of annotation tracks and display them in genome browsers. Quite a number of such browsers exist, and most work on the same principle: server-hosted databases are queried through a Web interface, and the resulting data is displayed graphically in a Web browser window. The large data centres each have their own browsers, but arguably the best engineered, most informative and most widely used one is provided by the University of California Santa Cruz (UCSC) Genome Browser Project.
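To make the idea of a track concrete, here is a minimal Python sketch, assuming a BED-like convention of 0-based, half-open coordinates: each track is a list of features, and drawing a window means selecting, for every track, the features that overlap it. The `Feature` class, the function names and all coordinates are hypothetical illustrations, not data or code from the UCSC browser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One annotation item on a chromosome (e.g. a gene, SNP or binding site)."""
    chrom: str
    start: int  # 0-based, inclusive (BED-like convention, assumed here)
    end: int    # exclusive
    name: str


def features_in_window(track, chrom, start, end):
    """Return the features of one track that overlap the window [start, end) on chrom."""
    return [f for f in track
            if f.chrom == chrom and f.start < end and f.end > start]


def render_window(tracks, chrom, start, end):
    """Collect, for each track, the names of the features visible in the window."""
    return {track_name: [f.name for f in features_in_window(features, chrom, start, end)]
            for track_name, features in tracks.items()}


# Hypothetical toy tracks; coordinates are invented for illustration only.
tracks = {
    "genes": [Feature("chrX", 1000, 5000, "geneA"),
              Feature("chrX", 8000, 9000, "geneB")],
    "snps": [Feature("chrX", 1500, 1501, "snp1"),
             Feature("chrX", 7000, 7001, "snp2")],
}
```

For example, `render_window(tracks, "chrX", 0, 2000)` returns `{"genes": ["geneA"], "snps": ["snp1"]}`. A real browser would rely on indexed database queries rather than scanning every feature, but the overlap test is the same.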

Compiling the data requires a massive annotation effort, which has not been completed for all species whose genomes have been sequenced. In particular, not all of our MYSPEs have been included in the major model-organism annotation efforts. The general strategy for analysis of a gene in MYSPE is thus to map it to homologous genes in model organisms(W). In this assignment you will explore the UCSC genome browser and work through an exercise that relates fungal replication genes to human genes. We have previously focused a lot on Mbp1 homologs, but these have no clear equivalents in “higher” eukaryotes. However, one of the key target genes of Mbp1 is the cell cycle protein Cdc6(W), which is well conserved in fungi and other eukaryotes and has a human homolog(W). Since, generally speaking, human genes are the most thoroughly annotated, we will take a closer look at that gene.

 

The UCSC genome browser

 

The University of California Santa Cruz (UCSC) Genome Browser Project has the largest offering of annotation information. However, it is strictly model-organism oriented, and you will probably not find MYSPE among its curated genomes. Nevertheless, if you are studying, e.g., human or yeast genes, the UCSC browser will probably be your first choice.
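The same data can also be retrieved programmatically through UCSC's REST API at api.genome.ucsc.edu, the RESTful API mentioned in the Lee et al. abstract above. The sketch below only builds query URLs for the `getData/track` and `getData/sequence` endpoints and adds a small download helper. It assumes the API parameters `genome`, `track`, `chrom`, `start` and `end` and UCSC's 0-based, half-open coordinates; the function names are hypothetical, so please check the API documentation before relying on it.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu"


def _check_range(start, end):
    """Reject windows that cannot be valid 0-based, half-open intervals."""
    if start < 0 or end <= start:
        raise ValueError(f"invalid window: start={start}, end={end}")


def track_url(genome, track, chrom, start, end):
    """Build a getData/track query URL for one track in a chromosomal window."""
    _check_range(start, end)
    params = {"genome": genome, "track": track, "chrom": chrom,
              "start": start, "end": end}
    return f"{API_BASE}/getData/track?{urlencode(params)}"


def sequence_url(genome, chrom, start, end):
    """Build a getData/sequence query URL for a chromosomal window."""
    _check_range(start, end)
    params = {"genome": genome, "chrom": chrom, "start": start, "end": end}
    return f"{API_BASE}/getData/sequence?{urlencode(params)}"


def fetch_json(url, timeout=30):
    """Download a URL and parse the JSON reply (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(sequence_url("hg38", "chrM", 4321, 5678))` would request a short, arbitrarily chosen stretch of the human mitochondrial sequence, provided network access is available. Keeping URL construction separate from downloading makes the queries easy to inspect before sending them.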

Task…

In this task you will access the UCSC genome browser view of the yeast Cdc6 gene and its human orthologue. You will explore some of the very large number of tracks that are available and study the transcription factor binding region upstream of the gene.

These tracks show you the ChIP-chip validated TF-binding sites in the upstream regulatory region of yeast Cdc6. Note that there are several Mbp1 binding sites. Curiously, Swi6 is also listed there, but you know that Swi6 does not actually bind DNA directly: it forms a complex with either of the APSES-domain transcription factors Mbp1 or Swi4, giving the MBF (Mbp1/Swi6) and SBF (Swi4/Swi6) complexes respectively. However, crosslinking the complex to DNA and immunoprecipitating it with an anti-Swi6 antibody would still recover this region. Be aware that detecting a protein at a site in a ChIP-chip experiment is not the same as demonstrating that the protein physically binds the DNA.
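Which binding sites count as "upstream" depends on the strand the gene is on. A minimal sketch, assuming 0-based, half-open coordinates and a fixed-length promoter window (real regulatory regions have no such sharp boundary), selects the binding-site features that overlap the region upstream of a gene. The function names, the window length and the site list are hypothetical.

```python
def upstream_region(gene_start, gene_end, strand, length):
    """Return the [start, end) window of `length` bp upstream of a gene.

    Coordinates are 0-based and half-open. Upstream lies before gene_start
    on the '+' strand and after gene_end on the '-' strand. The window is
    clipped at position 0; clipping at the chromosome end is not handled.
    """
    if strand == "+":
        return max(0, gene_start - length), gene_start
    if strand == "-":
        return gene_end, gene_end + length
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def sites_upstream(sites, gene_start, gene_end, strand, length):
    """Return the sorted, de-duplicated names of sites overlapping the upstream window.

    `sites` is a list of (start, end, name) tuples in the same coordinates.
    """
    win_start, win_end = upstream_region(gene_start, gene_end, strand, length)
    return sorted({name for (start, end, name) in sites
                   if start < win_end and end > win_start})


# Hypothetical binding sites; names and coordinates are invented.
example_sites = [(1200, 1210, "TF_A"), (1500, 1510, "TF_B"),
                 (1800, 1810, "TF_A"), (4100, 4110, "TF_C")]
```

For a gene spanning 2000 to 4000 with a 1000 bp window, the `+` strand window is 1000 to 2000 (so `TF_A` and `TF_B` are reported), while on the `-` strand it is 4000 to 5000 (only `TF_C`).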


Based on this kind of information, it should be straightforward to identify human transcription factors that potentially regulate human Cdc6 and to determine, via sequence comparisons, whether any of them are homologous to any of the yeast transcription factors or to factors in MYSPE. Through a detailed analysis of existing systems, their regulatory components and the conservation of regulation, one can in principle establish functional equivalences across large evolutionary distances.
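One common heuristic for the sequence-comparison step is "reciprocal best hits": two genes from different species are treated as putative orthologues if each is the other's highest-scoring match. This is a swapped-in illustration, not a method prescribed by this unit, and it can be misled by gene duplications. The sketch assumes the similarity scores (e.g. BLAST bit scores) have already been computed; the gene names and scores are invented.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score,
    e.g. a BLAST bit score. On ties, the first pair seen wins.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Invented scores for two hypothetical yeast genes (y1, y2)
# and two hypothetical human genes (h1, h2).
yeast_vs_human = {("y1", "h1"): 200, ("y1", "h2"): 50,
                  ("y2", "h1"): 90, ("y2", "h2"): 80}
human_vs_yeast = {("h1", "y1"): 210, ("h1", "y2"): 85,
                  ("h2", "y1"): 40, ("h2", "y2"): 75}
```

Here `reciprocal_best_hits(yeast_vs_human, human_vs_yeast)` returns `[("y1", "h1")]`: the best hit of y2 is h1, but h1 prefers y1, so y2 gets no reciprocal partner even though the best hit of h2 is y2.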

Alternatives

Task…

Visit the following three alternatives to UCSC:

  • JBrowse (at SGD);
  • NCBI;
  • Ensembl.

Further Reading

Wang, Jun et al. (2013). “A brief introduction to web-based genome browsers”. Briefings in Bioinformatics 14(2):131–43.
[PMID: 22764121] [DOI: 10.1093/bib/bbs029]

Genome browser provides a graphical interface for users to browse, search, retrieve and analyze genomic sequence and annotation data. Web-based genome browsers can be classified into general genome browsers with multiple species and species-specific genome browsers. In this review, we attempt to give an overview for the main functions and features of web-based genome browsers, covering data visualization, retrieval, analysis and customization. To give a brief introduction to the multiple-species genome browser, we describe the user interface and main functions of the Ensembl and UCSC genome browsers using the human alpha-globin gene cluster as an example. We further use the MSU and the Rice-Map genome browsers to show some special features of species-specific genome browser, taking a rice transcription factor gene OsSPL14 as an example.

Sloan, Cricket A et al. (2016). “ENCODE data at the ENCODE portal”. Nucleic Acids Research 44(D1):D726–32.
[PMID: 26527727] [DOI: 10.1093/nar/gkv1160]

The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.

Pazin, Michael J. (2015). “Using the ENCODE Resource for Functional Annotation of Genetic Variants”. Cold Spring Harbor Protocols 2015(6):522–36.
[PMID: 25762420] [DOI: 10.1101/pdb.top084988]

This article illustrates the use of the Encyclopedia of DNA Elements (ENCODE) resource to generate or refine hypotheses from genomic data on disease and other phenotypic traits. First, the goals and history of ENCODE and related epigenomics projects are reviewed. Second, the rationale for ENCODE and the major data types used by ENCODE are briefly described, as are some standard heuristics for their interpretation. Third, the use of the ENCODE resource is examined. Standard use cases for ENCODE, accessing the ENCODE resource, and accessing data from related projects are discussed. Although the focus of this article is the use of ENCODE data, some of the same approaches can be used with data from other projects.

Questions, comments

If in doubt, ask! If anything about this content is not clear to you, do not proceed but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.

Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.


Page ID: BIN-GENOME-Genome_Browsers

Author:
Boris Steipe ( <boris.steipe@utoronto.ca> )
Created:
2017-08-05
Last modified:
2022-09-14
Version:
1.1
Version History:
–  1.1 2020 Updates; re-added JBrowse (SGD), NCBI and Ensembl as visit-tasks
–  1.0 First live version
–  0.1 First stub
Tagged with:
–  Unit
–  Live
–  Has further reading

 

[END]