Difference between revisions of "BIN-GENOME-Genome Browsers"

From "A B C"
m (Created page with "<div id="BIO"> <div class="b1"> Genome Browsers </div> {{Vspace}} <div class="keywords"> <b>Keywords:</b>  UCSC, GMod; in practice </div> {{Vspace}} __TOC__ ...")
 
 
(16 intermediate revisions by the same user not shown)
Line 1: Line 1:
<div id="BIO">
+
<div id="ABC">
  <div class="b1">
+
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce; font-size:300%; font-weight:400; color: #000000; width:100%;">
 
Genome Browsers
 
Genome Browsers
  </div>
+
<div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:#b3dbce; font-size:30%; font-weight:200; color: #000000; ">
 +
(UCSC genome browser)
 +
</div>
 +
</div>
  
  {{Vspace}}
+
{{Smallvspace}}
 
+
 
<div class="keywords">
+
 
<b>Keywords:</b>&nbsp;
+
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce33; font-size:85%;">
UCSC, GMod; in practice
+
<div style="font-size:118%;">
 +
<b>Abstract:</b><br />
 +
<section begin=abstract />
 +
Exploring genomes with the UCSC genome browser
 +
<section end=abstract />
 +
</div>
 +
<!-- ============================  -->
 +
<hr>
 +
<table>
 +
<tr>
 +
<td style="padding:10px;">
 +
<b>Objectives:</b><br />
 +
This unit will ...
 +
* ... introduce work with the UCSC genome browser;
 +
</td>
 +
<td style="padding:10px;">
 +
<b>Outcomes:</b><br />
 +
After working through this unit you ...
 +
* ... can use the UCSC genome browser for genome analysis queries;
 +
</td>
 +
</tr>
 +
</table>
 +
<!-- ============================  -->
 +
<hr>
 +
<b>Deliverables:</b><br />
 +
<section begin=deliverables />
 +
<li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li>
 +
<li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li>
 +
<li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li>
 +
<section end=deliverables />
 +
<!-- ============================  -->
 +
<hr>
 +
<section begin=prerequisites />
 +
<b>Prerequisites:</b><br />
 +
This unit builds on material covered in the following prerequisite units:<br />
 +
*[[BIN-Genome-Annotation|BIN-Genome-Annotation (Genome annotation)]]
 +
<section end=prerequisites />
 +
<!-- ============================  -->
 
</div>
 
</div>
  
{{Vspace}}
+
{{Smallvspace}}
 +
 
 +
 
 +
 
 +
{{Smallvspace}}
  
  
Line 19: Line 63:
  
  
{{STUB}}
+
=== Evaluation ===
 +
<b>Evaluation: NA</b><br />
 +
<div style="margin-left: 2rem;">This unit is not evaluated for course marks.</div>
 +
== Contents ==
 +
 
 +
{{Task|1=
 +
*;Read:
 +
{{#pmid: 31691824}}
 +
 
 +
}}
 +
 
 +
 
 +
==Introduction==
  
 
{{Vspace}}
 
{{Vspace}}
  
 +
Large-scale genome sequencing and annotation have made a wealth of information available that all relates to the same biological object: the DNA. This information, however, can be of very different types. It includes:
 +
* the actual sequence
 +
* sequence variants (SNPs and CNVs)
 +
* conservation between related species
 +
* genes (with introns and exons)
 +
* mRNAs
 +
* expression levels
 +
* regulatory features such as transcription factor binding sites
 +
and much more.
 +
 +
Since all of this information relates to specific positions or ranges on the chromosome, displaying it alongside the chromosomal coordinates is a useful way to integrate and visualize it. We call such strips of annotation ''tracks'' and display them in ''genome browsers''. Quite a number of such browsers exist and most work on the same principle: server-hosted databases are queried through a Web interface; the resulting data is displayed graphically in a Web browser window. The large data centres each have their own browsers, but arguably the best engineered, most informative and most widely used one is provided by the University of California Santa Cruz (UCSC) Genome Browser Project.
  
</div>
+
Compiling the data requires a massive annotation effort, which has not been completed for all genome-sequenced species. In particular, not all of our MYSPEs have been included in the major model-organism annotation efforts. The general strategy for analysis of a gene in MYSPE is thus to map it to homologous genes in {{WP|Model organism|model organisms}}. In this assignment you will explore the UCSC genome browser and we will go through an exercise that relates fungal replication genes to human genes. We have previously focused a lot on Mbp1 homologs, but these have no clear equivalents in "higher" eukaryotes. However, one of the '''key target genes of Mbp1''' is the cell cycle protein {{WP|Cdc6}}, which is well conserved in fungi and other eukaryotes and has a {{WP|CDC6|human homolog}}. Since, generally speaking, the annotation level for human genes is the highest, we will have a closer look at that gene.
<div id="ABC-unit-framework">
+
 
== Abstract ==
+
{{vspace}}
<!-- included from "../components/BIN-Genome-Browsers.components.wtxt", section: "abstract" -->
+
<!--
...
+
==GBrowse==
 +
{{smallvspace}}
 +
 
 +
[http://gmod.org/wiki/GBrowse '''GBrowse'''] - the Generic genome Browser - is the browser developed by the [http://gmod.org/wiki/Main_Page Generic Model Organism Database] project that aims to make industry-strength bioinformatics tools and software available for the model organism community. One of the many databases that uses GMod tools is [http://www.yeastgenome.org/ the Saccharomyces Genome Database] but you will find the browser in use on many different sites.
 +
 
 +
{{task|1=
 +
In this task you will access the SGD GBrowse page for Cdc6 and explore some of the options.
 +
# Navigate to [http://www.yeastgenome.org/ the Saccharomyces Genome Database], enter Cdc6 into the site search field and on the result page, in the '''Sequence''' / '''Location''' box click on the [http://browse.yeastgenome.org/fgb2/gbrowse/scgenome/?name=YJL194W '''View in GBrowse'''] link.
 +
# Locate CDC6 (YJL194W) as a red bar in the graph. Note that the triangle at the end points in the direction of transcription.
 +
# Note how the shape of the cursor changes over different regions of the window. For example, you can click/hold the graph and slide it left and right (this changes the overview indicator that shows where on the chromosome the currently displayed window of sequence is located). You can click on and follow annotation information. You can also select a stretch of nucleotides and dump it as FASTA (hover over the ruler in the ''Details'' pane). It should be obvious how this could e.g. be useful to study untranslated regions upstream of the start codon to validate translation start sites.
 +
# Zoom in by selecting '''Show 5 kbp''' at the scroll/zoom controls.
 +
# Click on the '''Select Tracks''' tab at the top (next to the '''Browser''' tab). This gives you access to a fine-grained selection of all tracks that have been created as genome annotations.
 +
# Find the section for '''Transcription Factors''' (a subsection of '''Transcription Regulation'''). Click on the star next to '''TF ChIP chip''' to mark this experiment as a "favorite". Then click on '''Show Favorites Only''' at the top of the page. Finally check '''All on''' for the '''Transcription Factors''' track and '''Back to browser'''.
 +
}}
 +
 
 +
 
 +
This view shows you the ChIP-chip validated TF-binding sites in the upstream regulatory region of yeast Cdc6. Note that Mbp1 is among them. Curiously, Swi6 is also listed there - but you know that [http://www.yeastgenome.org/cgi-bin/locus.fpl?locus=YLR182W Swi6] does not actually bind DNA directly, but forms a complex with the APSES domain transcription factors Mbp1/Swi4 which form the [http://www.yeastgenome.org/cgi-bin/GO/goTerm.pl?goid=0030907 MBF] complex. However, crosslinking of the complex and immunoprecipitation with anti-Swi6 would certainly identify this region. You should be aware that an annotation of a protein in a ChIP-chip experiment is not the same as demonstrating a protein's physical interaction with DNA.
  
 
{{Vspace}}
 
{{Vspace}}
 +
-->
 +
<!--
 +
==NCBI Map Viewer==
 +
{{smallvspace}}
  
 +
{{task|1=
  
== This unit ... ==
+
In this task you will locate and display a map view at the NCBI for the yeast Cdc6 gene.
=== Prerequisites ===
 
<!-- included from "../components/BIN-Genome-Browsers.components.wtxt", section: "prerequisites" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
 
You need to complete the following units before beginning this one:
 
*[[BIN-Genome-Annotation]]
 
*[[BIN-Genome-Databases]]
 
  
{{Vspace}}
+
# Navigate to the [http://www.ncbi.nlm.nih.gov/ '''NCBI''' home page] and follow the link to '''Genomes & maps''' in the left-hand menu.
 +
# Click on the '''Tools''' tab and find the link to the [http://www.ncbi.nlm.nih.gov/mapview/ '''Map Viewer''']
 +
# In the '''Fungi''' section, click on the latest "build" of the ''Saccharomyces cerevisiae'' genome. This takes you to an overview page of the status of the Genome project. Each chromosome is linked to its map. If you did not know which chromosome to look for, you would need to search by keyword or gene name in the nucleotide database. Regarding Cdc6, you remember from the task above that it is located on [http://www.ncbi.nlm.nih.gov/projects/mapview/maps.cgi?taxid=4932&chr=X Chromosome X] (''i.e.'' the {{WP|Roman numerals|roman numeral}} ten, not the "X-Chromosome"). You will arrive at the actual mapview of the entire Chromosome with the RefSeq accession number <code>NC_001142.9</code>. This large nucleotide record containing the entire chromosomal sequence underlies the display.
 +
# Enter '''Cdc6''' into the Search field and click the '''Find in This View''' button. Then zoom in a few levels.
 +
}}
 +
 
 +
 
 +
The [http://www.ncbi.nlm.nih.gov/projects/mapview/maps.cgi?TAXID=4932&CHR=X&MAPS=cntg-r,genes%5B36220.54%3A43678.04%5D&QUERY=Cdc6&zoom=10 resulting view] shows you the location and orientation of the gene on the chromosome. A number of links to various NCBI databases are given for each gene. Note that this is primarily a tool for database crossreferencing, not for integrating and displaying annotations.
 +
 
 +
{{vspace}}
 +
-->
 +
<!--
 +
==Ensembl==
 +
{{smallvspace}}
 +
 
 +
The EBI offers its own version of genome browsers through the Ensembl project. A large number of genomes have been annotated, cross-referenced and made available for viewing. The EBI has spent a lot of effort on automated curation of their genome offerings. '''The Ensembl offerings are therefore more comprehensive and complete than those of other sources'''. In particular, you may find a genome view for MYSPE. Use any other fungus if MYSPE is not present.
 +
 
 +
{{task|1=
 +
 
 +
In this task you will review the Ensembl view of the MYSPE ortholog to yeast CDC6.
 +
 
 +
# Navigate to the [http://fungi.ensembl.org/index.html '''EnsemblFungi'''] page (easy to find via Google).
 +
 
 +
# Select ''Saccharomyces cerevisiae'' from the species list.
 +
# '''Search''' for  Cdc6 as a search term in the ''Search Saccharomyces cerevisiae ...'' field.
 +
# Click on [http://fungi.ensembl.org/Saccharomyces_cerevisiae/Gene/Summary?g=YJL194W;r=X:69338-70879;t=YJL194W;db=core CDC6 (YJL194W)]
  
 +
You will be taken to a browser view of the genome. Tracks can be switched on and off through the menu on the left-hand side.
  
=== Objectives ===
+
# Find the link to [http://fungi.ensembl.org/Saccharomyces_cerevisiae/Gene/Compara_Ortholog?db=core;g=YJL194W;r=X:69338-70879;t=YJL194W '''Orthologues'''] under the '''Fungal Compara''' section in the menu.
<!-- included from "../components/BIN-Genome-Browsers.components.wtxt", section: "objectives" -->
+
# In the resulting page, find the MYSPE orthologue and click on the link in the '''Location''' column.
...
+
# On the Browser page, click on the cogwheel icon in the bottom left bar of the lower pane to configure tracks.
 +
# On the configuration page, in the '''Configure Region Image''' tab, click on '''Sequence and Assembly''' in the left-hand menu and click the (check)-boxes to turn '''Contigs''' off and '''Translated sequence''' on. Leave '''Sequence''' on. Click the checkmark in the top-right corner of the configuration window to close it and return to the browser view.
 +
# Zoom in until you see the display of the actual nucleotides and the six reading frames. This is a genome view of MYSPE at the actual nucleotide level.
  
{{Vspace}}
+
}}
  
  
=== Outcomes ===
+
Ensembl provides a very comprehensive offering in terms of sequences, and it has a well thought-out and maintained [http://rest.ensemblgenomes.org/ REST API]. However, Ensembl too offers little in terms of annotations of DNA elements, expression levels and the like. Nevertheless, since it is the database with the largest number of species annotated, it would be the tool to go to if you were to compare syntenic regions or genomic context between different species.
<!-- included from "../components/BIN-Genome-Browsers.components.wtxt", section: "outcomes" -->
 
...
 
  
{{Vspace}}
+
{{vspace}}
  
 +
-->
  
=== Deliverables ===
+
==The UCSC genome browser==
<!-- included from "../components/BIN-Genome-Browsers.components.wtxt", section: "deliverables" -->
+
{{smallvspace}}
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-time_management" -->
 
*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" -->
 
*<b>Journal</b>: Document your progress in your [[FND-Journal|course journal]].
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" -->
 
*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|insights! page]].
 
  
{{Vspace}}
+
The University of California Santa Cruz (UCSC) Genome Browser Project has the largest offering of annotation information. However, it is strictly model-organism oriented and you will probably not find MYSPE among its curated genomes. Nevertheless, if you are studying e.g. human genes, or yeast, the UCSC browser will probably be your first choice.
  
 +
{{task|1=
  
=== Evaluation ===
+
In this task you will access the UCSC genome browser view of the yeast Cdc6 gene and its human orthologue, the human Cdc6 gene. You will explore some of the very large number of tracks that are available and study the transcription factor binding region.
<!-- included from "../components/BIN-Genome-Browsers.components.wtxt", section: "evaluation" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
 
<b>Evaluation: NA</b><br />
 
:This unit is not evaluated for course marks.
 
  
{{Vspace}}
+
* Navigate to the [http://genome.ucsc.edu/ '''UCSC''' Genome Bioinformatics entry page] and follow the link to the '''Genome Browser''' in the "Our tools" section.
  
 +
* To view genomes, you need to select a species. Note the scrollable species overview on the left hand side. The species are arranged by their position in the universal Tree of Life for eukaryotes.
 +
** Find and click on "Saccharomyces cerevisiae". This loads the genome data overview for the sequence. Enter Cdc6 as the '''Position search term'''. Click on '''Go'''.
 +
* Click on the link to the [http://genome.ucsc.edu/cgi-bin/hgTracks?db=sacCer3&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chrX%3A69338%2D70879&hgsid=905699123_xtytFDSaNg4p4toSAxAmgiclnbrr Cdc6 gene] on chromosome X.
 +
* Click on the button to zoom out '''3x''' - we want to see the upstream regulatory region.
 +
* In the subsection for '''Expression and Regulation''', find the menu for '''Regulatory Code''' and select '''full'''; select '''hide''' for all other expression tracks. Click '''refresh'''.
  
</div>
+
This track shows you the ChIP-chip validated TF-binding sites in the upstream regulatory region of yeast Cdc6. Note that there are several Mbp1 binding sites. Curiously, Swi6 is also listed there - but you know that [https://www.yeastgenome.org/locus/S000004172 Swi6] does not actually bind DNA directly, but forms a complex with either of the APSES domain transcription factors Mbp1 and Swi4; with Mbp1 it forms the [https://www.yeastgenome.org/go/GO:0030907 MBF] complex. However, crosslinking of the complex, and immunoprecipitation with anti-Swi6, would certainly identify this region. You should be aware that an annotation of a protein in a ChIP-chip experiment is not the same as demonstrating a protein's physical interaction with DNA.
<div id="BIO">
 
== Contents ==
 
<!-- included from "../components/BIN-Genome-Browsers.components.wtxt", section: "contents" -->
 
...
 
  
{{Vspace}}
+
* Zoom in, for better resolution, and shift-drag the view to keep the regulatory region of the Cdc6 gene centred. Then note that both stretches of DNA that have demonstrated TF binding sites are also listed as conserved regions (dark red bars). As with all track elements, clicking on the bars will expand the display density, and clicking again will take you to an information page about this experimental track, with further download options.
  
 +
* Zoom in until you can see the individual nucleotides for the Mbp1 binding sites. Then click on one of the Mbp1 bars to get information about the specific binding site. '''Note that the canonical binding sequence corresponds to a regular expression of <code>[AT]CGCG[AT]</code> ... a pattern with a probability of occurrence of about 1/1000 in random sequence.''' This is not very stringent - but here we have three such motifs within 200 bp - two of them adjacent.
  
== Further reading, links and resources ==
+
----
<!-- {{#pmid: 19957275}} -->
 
<!-- {{WWW|WWW_GMOD}} -->
 
<!-- <div class="reference-box">[http://www.ncbi.nlm.nih.gov]</div> -->
 
  
{{Vspace}}
+
* Return to the genome browser entry page to access the genome browser for the '''human genome'''.
 +
* Click on the link to humans. Note that this is the hg38 assembly.
 +
* Enter CDC6 into the "Position/Search Term" field and click "Go". You should get a list of entries; click on the top link, the <code>Homo sapiens cell division cycle 6 (CDC6), mRNA</code> gene on chromosome 17.
  
 +
* Zoom out '''1.5x''' to view the upstream regulatory region: the end of the adjacent WIPF2 gene should have just come into view on the left.
 +
* Study the Genome Browser view of the human CDC6 homolog.
 +
** In particular, note the extensive functional annotations of DNA and the alignments of vertebrate syntenic regions that allow detailed genomic comparisons.
 +
** Distinguish between exon and intron sequence.
 +
** Note that the mammal Conservation track has high values for all of the exons, but not only for exons.
 +
** Find more information on the "Layered H3K27Ac" track.
  
== Notes ==
+
* Note the '''large''' number of available tracks that have been integrated into this view. Most of them are switched off. Find the '''Regulation''' section, and follow the link to the "ORegAnno" information to see what that is about. Note that you can switch individual annotations on or off on this page, as well as set the display format for all of the results. Select the check-box '''only''' for "transcription factor binding site" to be on, select the "Display mode" to '''full''' and click '''submit'''.
<!-- included from "../components/BIN-Genome-Browsers.components.wtxt", section: "notes" -->
+
* Study this information and note:
<!-- included from "ABC-unit_components.wtxt", section: "notes" -->
+
** There is a cluster of TFBS just upstream of the transcription initiation site.
<references />
+
** This cluster coincides with the highest H3K27Ac density.
 +
** If you &lt;control&gt;-click (right-click?) on the top orange bar of this cluster, a contextual menu opens from which you can access the details page for OREG1791811 in a new window. Follow the link to the RBL2 transcription factor via [http://useast.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000103479;r=16:53445781-53491648;t=ENST00000379935 ENST00000379935] ... from where you can access transcript and gene and expression and protein family and GO and all other information.
 +
* Go back to the Genome Browser and set the ORegAnno track to "pack" and click "refresh".
 +
* Slide the SNP track to just beneath the RefSeq genes track that contains the introns and exons. You will notice that one of the SNPs is green, and two are red. Why? Set the "Common SNPs" track display mode to "pack" and click "refresh".
 +
}}
  
{{Vspace}}
 
  
 +
Based on this kind of information, it should be straightforward to identify human transcription factors that potentially regulate human Cdc6 and determine - via sequence comparisons - whether any of them are homologous to any of the yeast transcription factors or factors in MYSPE. Through a detailed analysis of existing systems, their regulatory components and the conservation of regulation, one can in principle establish functional equivalences across large evolutionary distances.
  
</div>
 
<div id="ABC-unit-framework">
 
== Self-evaluation ==
 
<!-- included from "../components/BIN-Genome-Browsers.components.wtxt", section: "self-evaluation" -->
 
 
<!--
 
<!--
=== Question 1===
+
The UCSC browser has a sometimes bewildering amount of information available. But its curators are aware of the need for educating users regarding the utility of their tools.
  
Question ...
+
{{task|1=
  
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
+
In this task you will access some of the tutorial information that UCSC provides.
Answer ...
+
# Return to the [http://genome.ucsc.edu/ '''UCSC''' Genome Bioinformatics entry page] and follow the link to '''Training''' in the left-hand menu.
<div class="mw-collapsible-content">
+
# Follow the link to the [http://www.openhelix.com/ucsc '''OpenHelix UCSC tutorials'''].
Answer ...
+
# Download the Hands-on exercise PDF file and work through '''Exercise 2''' (the rat leptin exercise).
 +
}}
  
</div>
+
This exercise includes a number of interesting options to work with the UCSC data - the BLAT tool for genomic region alignment and the selective display of SNP annotations.
  </div>
 
 
 
  {{Vspace}}
 
  
 +
; Optional
 +
* Work through exercises one and three of the OpenHelix UCSC introduction.
 +
* Access the [http://www.openhelix.com/ENCODE2 OpenHelix '''ENCODE''' tutorial], download the '''Hands-on Exercises''' pdf and work through the exercises. Exercise 3 is particularly valuable, as it teaches you how to create results from complex intersections of queries.
 +
* You can also work through the [http://www.nature.com/scitable/ebooks/guide-to-the-ucsc-genome-browser-16569863 Guide to the UCSC Genome Browser at "nature"] which gives an excellent, in-depth overview.
 +
* Study the ''User's guide to ENCODE'' paper linked below.
 
-->
 
-->
  
{{Vspace}}
+
==Alternatives==
  
 +
{{Task|1=
  
 +
Visit the following three alternatives to UCSC:
  
{{Vspace}}
+
* [https://browse.yeastgenome.org/?loc=chrX%3A69029..71186&tracks=DNA%2CAll%20Annotated%20Sequence%20Features%2CDoube_strand_break_hotspots%2CXrn1-sensitive_unstable%20transcripts_XUTs%2CScGlycerolMedia%2C3%27UTRs%2CPolII_occupancy_WT%2CProtein-Coding-Genes&highlight= '''JBrowse''' at SGD (yeast CDC6)]
 +
* [https://www.ncbi.nlm.nih.gov/genome/gdv/browser/genome/?cfg=NCID_1_14780446_130.14.18.128_9146_1601098490_12150196 '''NCBI''' (yeast CDC6)]
 +
*[http://fungi.ensembl.org/Saccharomyces_cerevisiae/Gene/Summary?g=YJL194W;r=X:69338-70879;t=YJL194W;db=core '''Ensembl'''  (yeast CDC6 - YJL194W)]
  
  
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_ask" -->
+
}}
  
----
+
== Further reading, links and resources ==
  
{{Vspace}}
+
{{#pmid: 22764121}}<!-- Genome browser introduction -->
 +
{{#pmid: 26527727}}<!-- UCSC Genome browser update 2017 -->
 +
{{#pmid: 25762420}}<!-- Ensembl 2017 -->
  
<b>If in doubt, ask!</b> If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
+
== Notes ==
 +
<references />
  
----
+
{{Vspace}}
  
{{Vspace}}
 
  
 
<div class="about">
 
<div class="about">
Line 152: Line 268:
 
:2017-08-05
 
:2017-08-05
 
<b>Modified:</b><br />
 
<b>Modified:</b><br />
:2017-08-05
+
:2020-09-25
 
<b>Version:</b><br />
 
<b>Version:</b><br />
:0.1
+
:1.1
 
<b>Version history:</b><br />
 
<b>Version history:</b><br />
 +
*1.1 2020 Updates; re-added JBrowse (SGD), NCBI and Ensembl as visit-tasks
 +
*1.0 First live version
 
*0.1 First stub
 
*0.1 First stub
 
</div>
 
</div>
[[Category:ABC-units]]
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_footer" -->
 
  
 
{{CC-BY}}
 
{{CC-BY}}
  
 +
[[Category:ABC-units]]
 +
{{UNIT}}
 +
{{LIVE}}
 
</div>
 
</div>
 
<!-- [END] -->
 
<!-- [END] -->

Latest revision as of 01:33, 6 September 2021

Genome Browsers

(UCSC genome browser)


 


Abstract:

Exploring genomes with the UCSC genome browser


Objectives:
This unit will ...

  • ... introduce work with the UCSC genome browser;

Outcomes:
After working through this unit you ...

  • ... can use the UCSC genome browser for genome analysis queries;

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

  • Prerequisites:
    This unit builds on material covered in the following prerequisite units:


     



     



     


    Evaluation

    Evaluation: NA

    This unit is not evaluated for course marks.

    Contents

    Task:

    • Read
    Lee et al. (2020) UCSC Genome Browser enters 20th year. Nucleic Acids Res 48:D756-D761. (pmid: 31691824)

    The University of California Santa Cruz Genome Browser website (https://genome.ucsc.edu) enters its 20th year of providing high-quality genomics data visualization and genome annotations to the research community. In the past year, we have added a new option to our web BLAT tool that allows search against all genomes, a single-cell expression viewer (https://cells.ucsc.edu), a 'lollipop' plot display mode for high-density variation data, a RESTful API for data extraction and a custom-track backup feature. New datasets include Tabula Muris single-cell expression data, GeneHancer regulatory annotations, The Cancer Genome Atlas Pan-Cancer variants, Genome Reference Consortium Patch sequences, new ENCODE transcription factor binding site peaks and clusters, the Database of Genomic Variants Gold Standard Variants, Genomenon Mastermind variants and three new multi-species alignment tracks.


    Introduction

     

    Large-scale genome sequencing and annotation have made a wealth of information available that all relates to the same biological object: the DNA. This information, however, can be of very different types. It includes:

    • the actual sequence
    • sequence variants (SNPs and CNVs)
    • conservation between related species
    • genes (with introns and exons)
    • mRNAs
    • expression levels
    • regulatory features such as transcription factor binding sites

    and much more.

    Since all of this information relates to specific positions or ranges on the chromosome, displaying it alongside the chromosomal coordinates is a useful way to integrate and visualize it. We call such strips of annotation tracks and display them in genome browsers. Quite a number of such browsers exist and most work on the same principle: server-hosted databases are queried through a Web interface; the resulting data is displayed graphically in a Web browser window. The large data centres each have their own browsers, but arguably the best engineered, most informative and most widely used one is provided by the University of California Santa Cruz (UCSC) Genome Browser Project.

    Compiling the data requires a massive annotation effort, which has not been completed for all genome-sequenced species. In particular, not all of our MYSPEs have been included in the major model-organism annotation efforts. The general strategy for analysis of a gene in MYSPE is thus to map it to homologous genes in model organisms. In this assignment you will explore the UCSC genome browser and we will go through an exercise that relates fungal replication genes to human genes. We have previously focused a lot on Mbp1 homologs, but these have no clear equivalents in "higher" eukaryotes. However, one of the key target genes of Mbp1 is the cell cycle protein Cdc6, which is well conserved in fungi and other eukaryotes and has a human homolog. Since, generally speaking, the annotation level for human genes is the highest, we will have a closer look at that gene.


     

    The UCSC genome browser

     

    The University of California Santa Cruz (UCSC) Genome Browser Project has the largest offering of annotation information. However, it is strictly model-organism oriented and you will probably not find MYSPE among its curated genomes. Nevertheless, if you are studying e.g. human genes, or yeast, the UCSC browser will probably be your first choice.
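
    The article in the reading task notes that the UCSC site also offers a RESTful API for data extraction. As a minimal sketch, assuming the public endpoint https://api.genome.ucsc.edu/getData/sequence with the parameters genome, chrom, start and end, and assuming a 0-based, half-open coordinate convention (verify both against the current UCSC API help page), the following Python builds a request for the yeast Cdc6 position chrX:69338-70879 that appears in this unit's browser link. The function names are illustrative and not part of any UCSC library.

```python
import json
from urllib.request import urlopen

# Assumed public endpoint of the UCSC REST API.
API_BASE = "https://api.genome.ucsc.edu"


def sequence_url(genome: str, chrom: str, first: int, last: int) -> str:
    """Build a getData/sequence URL from 1-based, inclusive browser coordinates."""
    # Assumption: the API expects a 0-based start and an exclusive end,
    # so only the start coordinate is shifted by one.
    params = [("genome", genome), ("chrom", chrom),
              ("start", first - 1), ("end", last)]
    query = ";".join(f"{key}={value}" for key, value in params)
    return f"{API_BASE}/getData/sequence?{query}"


def fetch_json(url: str) -> dict:
    """Download a URL and parse the response body as JSON (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


# Yeast Cdc6 on chromosome X in the sacCer3 assembly.
url = sequence_url("sacCer3", "chrX", 69338, 70879)
print(url)
```

    Calling fetch_json(url) requires network access. Inspect the keys of the returned dictionary before relying on any particular field name.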

    Task:
    In this task you will access the UCSC genome browser view of the yeast Cdc6 gene and its human orthologue, the human Cdc6 gene. You will explore some of the very large number of tracks that are available and study the transcription factor binding region.

    • To view genomes, you need to select a species. Note the scrollable species overview on the left hand side. The species are arranged by their position in the universal Tree of Life for eukaryotes.
      • Find and click on "Saccharomyces cerevisiae". This loads the genome data overview for the sequence. Enter Cdc6 as the Position search term. Click on Go.
    • Click on the link to the Cdc6 gene on chromosome X.
    • Click on the button to zoom out 3x - we want to see the upstream regulatory region.
    • In the subsection for Expression and Regulation, find the menu for Regulatory Code and select full; select hide for all other expression tracks. Click refresh.

    This track shows you the ChIP-chip validated TF-binding sites in the upstream regulatory region of yeast Cdc6. Note that there are several Mbp1 binding sites. Curiously, Swi6 is also listed there - but you know that Swi6 does not actually bind DNA directly, but forms a complex with either of the APSES domain transcription factors Mbp1 and Swi4; with Mbp1 it forms the MBF complex. However, crosslinking of the complex, and immunoprecipitation with anti-Swi6, would certainly identify this region. You should be aware that an annotation of a protein in a ChIP-chip experiment is not the same as demonstrating a protein's physical interaction with DNA.

    • Zoom in, for better resolution, and shift-drag the view to keep the regulatory region of the Cdc6 gene centred. Then note that both stretches of DNA that have demonstrated TF binding sites are also listed as conserved regions (dark red bars). As with all track elements, clicking on the bars will expand the display density, and clicking again will take you to an information page about this experimental track, with further download options.
    • Zoom in until you can see the individual nucleotides for the Mbp1 binding sites. Then click on one of the Mbp1 bars to get information about the specific binding site. Note that the canonical binding sequence corresponds to a regular expression of [AT]CGCG[AT] ... a pattern with a probability of occurrence of about 1/1000 in random sequence. This is not very stringent - but here we have three such motifs within 200 bp - two of them adjacent.
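
    The 1/1000 estimate for [AT]CGCG[AT] can be checked with a short calculation. The Python sketch below is an illustration, not part of the original unit. It assumes that all four bases are equally likely and independent, which AT-rich yeast sequence does not satisfy exactly, so treat the numbers as rough guides. The function names and the example sequence are hypothetical.

```python
import re

# Each motif position lists how many of the four bases it accepts:
# [AT] C G C G [AT]  ->  2, 1, 1, 1, 1, 2
MOTIF_CHOICES = [2, 1, 1, 1, 1, 2]

# A lookahead lets the scan report sites that overlap by one base,
# e.g. ACGCGTCGCGA contains both ACGCGT and TCGCGA.
MOTIF_SCAN = re.compile(r"(?=([AT]CGCG[AT]))")


def motif_probability(choices, alphabet_size=4):
    """Probability that a random position matches, assuming uniform bases."""
    p = 1.0
    for n in choices:
        p *= n / alphabet_size
    return p


def expected_sites(window_length, choices=MOTIF_CHOICES):
    """Expected number of matches among all start positions in a window."""
    positions = window_length - len(choices) + 1
    return max(positions, 0) * motif_probability(choices)


def find_sites(sequence):
    """Return (0-based start, matched hexamer) for every site in the sequence."""
    return [(m.start(), m.group(1)) for m in MOTIF_SCAN.finditer(sequence.upper())]


if __name__ == "__main__":
    print(motif_probability(MOTIF_CHOICES))   # 1/1024
    print(expected_sites(200))                # 195/1024
    print(find_sites("GGACGCGTCGCGAAATTT"))   # made-up test sequence
```

    Under these assumptions the per-position probability is 1/1024, and a 200 bp window is expected to contain about 0.19 matches. Because the pattern is its own reverse complement, scanning one strand is sufficient. Three matches in such a window is therefore well above the chance expectation, which is the point the task makes.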

    • Return to the genome browser entry page to access the genome browser for the human genome.
    • Click on the link to humans. Note that this is the hg38 assembly.
    • Enter CDC6 into the "Position/Search Term" field and click "Go". You should get a list of entries; click on the top link, the Homo sapiens cell division cycle 6 (CDC6), mRNA gene on chromosome 17.
    • Zoom out 1.5x to view the upstream regulatory region: the end of the adjacent WIPF2 gene should have just come into view on the left.
    • Study the Genome Browser view of the human CDC6 homolog.
      • In particular, note the extensive functional annotations of DNA and the alignments of vertebrate syntenic regions that allow detailed genomic comparisons.
      • Distinguish between exon and intron sequence.
      • Note that the mammal Conservation track has high values for all of the exons, but not only for exons.
      • Find more information on the "Layered H3K27Ac" track.
    • Note the large number of available tracks that have been integrated into this view. Most of them are switched off. Find the Regulation section, and follow the link to the "ORegAnno" information to see what that is about. Note that you can switch individual annotations on or off on this page, as well as set the display format for all of the results. Select the check-box only for "transcription factor binding site" to be on, select the "Display mode" to full and click submit.
    • Study this information and note:
      • There is a cluster of TFBS just upstream of the transcription initiation site.
      • This cluster coincides with the highest H3K27Ac density.
      • If you <control>-click (or right-click) on the top orange bar of this cluster, a contextual menu opens from which you can access the details page for OREG1791811 in a new window. Follow the link to the RBL2 transcription factor via ENST00000379935 ... from where you can access transcript, gene, expression, protein family, GO and all other information.
    • Go back to the Genome Browser and set the ORegAnno track to "pack" and click "refresh".
    • Slide the SNP track to just beneath the RefSeq genes track that contains the introns and exons. You will notice that one of the SNPs is green, and two are red. Why? Set the "Common SNPs" track display mode to "pack" and click "refresh".
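
    Two of the observations in this list are interval questions: introns are the gaps between consecutive exons of a transcript, and a TFBS cluster "coincides" with an H3K27Ac peak when the two regions overlap. The following Python sketch shows both operations on made-up coordinates. It is an illustration only: the function names are hypothetical and none of the numbers are real CDC6 annotations.

```python
def introns_from_exons(exons):
    """Return the gaps between consecutive exons (1-based, inclusive)."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Only report a gap if the exons do not touch or overlap.
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


def overlaps(a, b):
    """True if the closed intervals a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Invented exon coordinates for a three-exon transcript.
    exons = [(100, 200), (300, 400), (500, 650)]
    print(introns_from_exons(exons))

    # Invented TFBS cluster and acetylation peak upstream of the first exon.
    tfbs_cluster = (40, 95)
    h3k27ac_peak = (20, 110)
    print(overlaps(tfbs_cluster, h3k27ac_peak))
```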


    Based on this kind of information, it should be straightforward to identify human transcription factors that potentially regulate human Cdc6 and determine - via sequence comparisons - whether any of them are homologous to any of the yeast transcription factors or factors in MYSPE. Through a detailed analysis of existing systems, their regulatory components and the conservation of regulation, one can in principle establish functional equivalences across large evolutionary distances.


    Alternatives

    Task:
    Visit the following three alternatives to UCSC:

    Further reading, links and resources

    Wang et al. (2013) A brief introduction to web-based genome browsers. Brief Bioinformatics 14:131-43. (pmid: 22764121)

    Genome browser provides a graphical interface for users to browse, search, retrieve and analyze genomic sequence and annotation data. Web-based genome browsers can be classified into general genome browsers with multiple species and species-specific genome browsers. In this review, we attempt to give an overview for the main functions and features of web-based genome browsers, covering data visualization, retrieval, analysis and customization. To give a brief introduction to the multiple-species genome browser, we describe the user interface and main functions of the Ensembl and UCSC genome browsers using the human alpha-globin gene cluster as an example. We further use the MSU and the Rice-Map genome browsers to show some special features of species-specific genome browser, taking a rice transcription factor gene OsSPL14 as an example.

    Sloan et al. (2016) ENCODE data at the ENCODE portal. Nucleic Acids Res 44:D726-32. (pmid: 26527727)

    The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.

    Pazin (2015) Using the ENCODE Resource for Functional Annotation of Genetic Variants. Cold Spring Harb Protoc 2015:522-36. (pmid: 25762420)

    This article illustrates the use of the Encyclopedia of DNA Elements (ENCODE) resource to generate or refine hypotheses from genomic data on disease and other phenotypic traits. First, the goals and history of ENCODE and related epigenomics projects are reviewed. Second, the rationale for ENCODE and the major data types used by ENCODE are briefly described, as are some standard heuristics for their interpretation. Third, the use of the ENCODE resource is examined. Standard use cases for ENCODE, accessing the ENCODE resource, and accessing data from related projects are discussed. Although the focus of this article is the use of ENCODE data, some of the same approaches can be used with data from other projects.

    Notes


     


    About ...
     
    Author:

    Boris Steipe <boris.steipe@utoronto.ca>

    Created:

    2017-08-05

    Modified:

    2020-09-25

    Version:

    1.1

    Version history:

    • 1.1 2020 Updates; re-added JBrowse (SGD), NCBI and Ensembl as visit-tasks
    • 1.0 First live version
    • 0.1 First stub

    This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.