Expected Preparations:
Keywords: UCSC genome browser
Objectives:
This unit will …
Outcomes:
After working through this unit you …
Deliverables:
* Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
* Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.
* Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Evaluation: NA: This unit is not evaluated for course marks.
Exploring genomes with the UCSC genome browser
Task…
Lee, Christopher M et al. (2020). “UCSC Genome Browser enters 20th year”. Nucleic Acids Research 48(D1):D756–D761. [PMID: 31691824] [DOI: 10.1093/nar/gkz1012]
Large-scale genome sequencing and annotation have made a wealth of information available, all of it related to the same biological object: the DNA. The information, however, can be of very different types. It includes:
* the actual sequence
* sequence variants (SNPs and CNVs)
* conservation between related species
* genes (with introns and exons)
* mRNAs
* expression levels
* regulatory features such as transcription factor binding sites

and much more.
Since all of this information relates to specific positions or ranges on the chromosome, displaying it alongside the chromosomal coordinates is a useful way to integrate and visualize it. We call such strips of annotation tracks and display them in genome browsers. Quite a number of such browsers exist and most work on the same principle: server-hosted databases are queried through a Web interface, and the resulting data is displayed graphically in a Web browser window. The large data centres each have their own browsers, but arguably the best engineered, most informative and most widely used one is provided by the University of California Santa Cruz (UCSC) Genome Browser Project.
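To make the idea of a track concrete, here is a minimal sketch that models each track as a list of named ranges on a chromosome and collects what is visible in a given window. The `Feature` class, the `overlapping` and `view` helpers, the track names and all coordinates are hypothetical and chosen only for illustration; real browsers use indexed databases rather than a linear scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One annotated range on a chromosome (0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int
    name: str


def overlapping(track, chrom, start, end):
    """Return the features of a track that overlap the window [start, end)."""
    return [f for f in track
            if f.chrom == chrom and f.start < end and f.end > start]


def view(tracks, chrom, start, end):
    """Collect, per track, the names of the features visible in a window."""
    return {name: [f.name for f in overlapping(feats, chrom, start, end)]
            for name, feats in tracks.items()}


# Hypothetical tracks with made-up coordinates, for illustration only.
tracks = {
    "genes": [Feature("chrI", 1000, 2500, "geneA"),
              Feature("chrI", 4000, 5200, "geneB")],
    "binding_sites": [Feature("chrI", 850, 856, "siteA1"),
                      Feature("chrI", 3900, 3906, "siteB1")],
}

print(view(tracks, "chrI", 800, 2600))
```

Because every track shares the same coordinate system, one window query returns the gene and the binding site that lie next to each other, which is exactly what the stacked display in a genome browser shows.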
Compiling the data requires a massive annotation effort, which has not been completed for all genome-sequenced species. In particular, not all of our MYSPEs have been included in the major model-organism annotation efforts. The general strategy for analysing a gene in MYSPE is thus to map it to homologous genes in model organisms. In this assignment you will explore the UCSC genome browser and we will go through an exercise that relates fungal replication genes to human genes. We have previously focused a lot on Mbp1 homologs, but these have no clear equivalents in “higher” eukaryotes. However, one of the key target genes of Mbp1 is the cell cycle protein Cdc6, which is well conserved in fungi and other eukaryotes and has a human homolog. Since, generally speaking, the annotation level for human genes is the highest, we will have a closer look at that gene.
The University of California Santa Cruz (UCSC) Genome Browser Project has the largest offering of annotation information. However, it is strictly model-organism oriented and you will probably not find MYSPE among its curated genomes. Nevertheless, if you are studying, e.g., human or yeast genes, the UCSC browser will probably be your first choice.
Task…
In this task you will access the UCSC genome browser view of the yeast Cdc6 gene and of its human orthologue. You will explore some of the very large number of tracks that are available and study the transcription factor binding region.
Navigate to the UCSC Genome Bioinformatics entry page and follow the link to the Genome Browser in the “Our tools” section.
To view genomes, you need to select a species. Note the scrollable species overview on the left-hand side. The species are arranged by their position in the universal Tree of Life for eukaryotes.
Click on the link to the Cdc6 gene on chromosome X.
Click on the button to zoom out 3x - we want to see the upstream regulatory region.
In the subsection for Expression and Regulation, find the menu for Regulatory Code and select full; select hide for all other expression tracks. Click refresh.
This track shows you the ChIP-chip validated TF-binding sites in the upstream regulatory region of yeast Cdc6. Note that there are several Mbp1 binding sites. Curiously, Swi6 is also listed there, but you know that Swi6 does not actually bind DNA directly. Instead, it forms a complex with either of the APSES domain transcription factors: with Mbp1 it forms the MBF complex, and with Swi4 the SBF complex. However, crosslinking of the complex, and immunoprecipitation with anti-Swi6, would certainly identify this region. You should be aware that an annotation of a protein in a ChIP-chip experiment is not the same as demonstrating a protein’s physical interaction with DNA.
Zoom in, for better resolution, and shift-drag the view to keep the regulatory region of the Cdc6 gene centred. Then note that both stretches of DNA that have demonstrated TF binding sites are also listed as conserved regions (dark red bars). As with all track elements, clicking on the bars will expand the display density, and clicking again will take you to an information page about this experimental track, with further download options.
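Zooming in and out amounts to simple arithmetic on the displayed window. The sketch below assumes that a zoom rescales the window width around its midpoint and that positions are written as `chrom:start-end` with optional thousands separators. The `parse_position` and `zoom` helpers and the coordinates are hypothetical, and the code ignores details such as clamping to the chromosome end.

```python
import re


def parse_position(pos):
    """Split a 'chrom:start-end' string (commas allowed in numbers) into its parts."""
    m = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", pos)
    if m is None:
        raise ValueError(f"not a position string: {pos!r}")
    chrom, start, end = m.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def zoom(pos, factor):
    """Scale the window width by `factor` around its midpoint (factor > 1 zooms out)."""
    chrom, start, end = parse_position(pos)
    centre = (start + end) / 2
    half = (end - start) * factor / 2
    new_start = max(1, round(centre - half))  # never run off the left end
    new_end = round(centre + half)            # assumption: right end is not clamped
    return f"{chrom}:{new_start}-{new_end}"


# Made-up window, for illustration only.
print(zoom("chrX:10,000-11,000", 3))
```

A factor of 3 widens the made-up 1,000 bp window by 1,000 bp on each side, which is why zooming out brings the upstream regulatory region into view without losing the gene itself.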
Zoom in until you can see the individual nucleotides for the Mbp1 binding sites. Then click on one of the Mbp1 bars to get information about the specific binding site. Note that the canonical binding sequence corresponds to the regular expression
[AT]CGCG[AT]
This pattern has a probability of occurrence of about 1/1000 in random sequence: with equiprobable bases, (1/2) × (1/4)^4 × (1/2) = 1/1024. This is not very stringent, but here we have three such motifs within 200 bp, two of them adjacent.
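The estimate for the motif can be checked with a short sketch. It assumes independent, equiprobable bases, and the example sequence is made up for illustration (it is not the yeast Cdc6 upstream region); the function names are likewise hypothetical.

```python
import re

# Canonical Mbp1 binding motif as given in the text.
MOTIF = "[AT]CGCG[AT]"
MOTIF_LENGTH = 6

# A lookahead finds matches that share bases with a neighbouring match.
MOTIF_SCAN = re.compile(f"(?={MOTIF})")


def motif_probability():
    """Chance that a random 6-mer matches, assuming independent, equiprobable bases."""
    p_a_or_t = 2 / 4   # position may be A or T
    p_fixed = 1 / 4    # position must be one specific base
    return p_a_or_t * p_fixed ** 4 * p_a_or_t


def expected_count(length):
    """Expected number of matches in a random sequence of the given length."""
    return (length - MOTIF_LENGTH + 1) * motif_probability()


def find_motifs(seq):
    """Return the 0-based start positions of all matches, including overlapping ones."""
    return [m.start() for m in MOTIF_SCAN.finditer(seq.upper())]


# Made-up sequence, for illustration only.
example = "GGACGCGTCGCGAACCTTTCGCGTGG"

print(motif_probability())   # 1/1024
print(expected_count(200))   # about 0.19 matches expected by chance in 200 bp
print(find_motifs(example))  # [2, 7, 18]
```

Under these assumptions a random 200 bp stretch is expected to contain only about 0.19 matches, so three motifs in such a window are far more than chance would suggest. Since the pattern is its own reverse complement, scanning one strand covers both.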
Return to the genome browser entry page to access the genome browser for the human genome.
Click on the link to humans. Note that this is the hg38 assembly.
Enter CDC6 into the “Position/Search Term” field and click “Go”.
You should get a list of entries; click on the top link, the Homo sapiens cell division cycle 6 (CDC6), mRNA gene on chromosome 17.
Zoom out 1.5x to view the upstream regulatory region: the end of the adjacent WIPF2 gene should have just come into view on the left.
Study the Genome Browser view of the human CDC6 homolog.
Note the large number of available tracks that have been integrated into this view. Most of them are switched off. Find the Regulation section, and follow the link to the “ORegAnno” information to see what that is about. Note that you can switch individual annotations on or off on this page, as well as set the display format for all of the results. Select the check-box only for “transcription factor binding site” to be on, select the “Display mode” to full and click submit.
Study this information and note:
Go back to the Genome Browser and set the ORegAnno track to “pack” and click “refresh”.
Slide the SNP track to just beneath the RefSeq genes track that contains the introns and exons. You will notice that one of the SNPs is green, and two are red. Why? Set the “Common SNPs” track display mode to “pack” and click “refresh”.
Based on this kind of information, it should be straightforward to identify human transcription factors that potentially regulate human Cdc6 and determine - via sequence comparisons - whether any of them are homologous to any of the yeast transcription factors or factors in MYSPE. Through a detailed analysis of existing systems, their regulatory components and the conservation of regulation, one can in principle establish functional equivalences across large evolutionary distances.
Task…
Visit the following three alternatives to UCSC:
Wang, Jun et al. (2013). “A brief introduction to web-based genome browsers”. Briefings in Bioinformatics 14(2):131–43. [PMID: 22764121] [DOI: 10.1093/bib/bbs029]
Sloan, Cricket A et al. (2016). “ENCODE data at the ENCODE portal”. Nucleic Acids Research 44(D1):D726–32. [PMID: 26527727] [DOI: 10.1093/nar/gkv1160]
Pazin, Michael J. (2015). “Using the ENCODE Resource for Functional Annotation of Genetic Variants”. Cold Spring Harbor Protocols 2015(6):522–36. [PMID: 25762420] [DOI: 10.1101/pdb.top084988]
If in doubt, ask! If anything about this content is not clear to you, do not proceed but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.
Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.
[END]