BIN-GENOME-Genome Annotation
Genome annotation
Keywords: Genome contents; ENCODE; Genome annotation methods.
Abstract
Introduction to genome annotation: the contents of genomes and what to look for, identifying genes, and keeping up to date on methods.
This unit ...
Prerequisites
You need to complete the following units before beginning this one:
Objectives
This unit will ...
- ... introduce categories of genome contents, as defined, e.g., through the ENCODE project, and discuss annotation methods.
Outcomes
After working through this unit you ...
- ... are familiar with the contents of genomes, some methods to annotate protein genes, and sources for genomes;
- ... know how to get up-to-date information on genome annotation workflows.
Deliverables
- Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
- Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
- Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Evaluation
Evaluation: NA
- This unit is not evaluated for course marks.
Contents
Task:
- Read the introductory notes on the annotation of genome sequences.
Further reading, links and resources
ENCODE Project Consortium (2011) A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 9:e1001046. (pmid: 21526222)
The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.
Zarrei et al. (2015) A copy number variation map of the human genome. Nat Rev Genet 16:172-83. (pmid: 25645873)
A major contribution to the genome variability among individuals comes from deletions and duplications - collectively termed copy number variations (CNVs) - which alter the diploid status of DNA. These alterations may have no phenotypic effect, account for adaptive traits or can underlie disease. We have compiled published high-quality data on healthy individuals of various ethnicities to construct an updated CNV map of the human genome. Depending on the level of stringency of the map, we estimated that 4.8-9.5% of the genome contributes to CNV and found approximately 100 genes that can be completely deleted without producing apparent phenotypic consequences. This map will aid the interpretation of new CNV findings for both clinical and research applications.
Tyner et al. (2017) The UCSC Genome Browser database: 2017 update. Nucleic Acids Res 45:D626-D634. (pmid: 27899642)
Since its 2001 debut, the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/) team has provided continuous support to the international genomics and biomedical communities through a web-based, open source platform designed for the fast, scalable display of sequence alignments and annotations landscaped against a vast collection of quality reference genome assemblies. The browser's publicly accessible databases are the backbone of a rich, integrated bioinformatics tool suite that includes a graphical interface for data queries and downloads, alignment programs, command-line utilities and more. This year's highlights include newly designed home and gateway pages; a new 'multi-region' track display configuration for exon-only, gene-only and custom regions visualization; new genome browsers for three species (brown kiwi, crab-eating macaque and Malayan flying lemur); eight updated genome assemblies; extended support for new data types such as CRAM, RNA-seq expression data and long-range chromatin interaction pairs; and the unveiling of a new supported mirror site in Japan.
Aken et al. (2017) Ensembl 2017. Nucleic Acids Res 45:D635-D642. (pmid: 27899575)
Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.
Bracken et al. (2016) A network-biology perspective of microRNA function and dysfunction in cancer. Nat Rev Genet 17:719-732. (pmid: 27795564)
MicroRNAs (miRNAs) participate in most aspects of cellular differentiation and homeostasis, and consequently have roles in many pathologies, including cancer. These small non-coding RNAs exert their effects in the context of complex regulatory networks, often made all the more extensive by the inclusion of transcription factors as their direct targets. In recent years, the increased availability of gene expression data and the development of methodologies that profile miRNA targets en masse have fuelled our understanding of miRNA functions, and of the sources and consequences of miRNA dysregulation. Advances in experimental and computational approaches are revealing not just cancer pathways controlled by single miRNAs but also intermeshed regulatory networks controlled by multiple miRNAs, which often engage in reciprocal feedback interactions with the targets that they regulate.
Stricker et al. (2017) From profiles to function in epigenomics. Nat Rev Genet 18:51-66. (pmid: 27867193)
Myriads of epigenomic features have been comprehensively profiled in health and disease across cell types, tissues and individuals. Although current epigenomic approaches can infer function for chromatin marks through correlation, it remains challenging to establish which marks actually have causative roles in gene regulation and other processes. After revisiting how classical approaches have addressed this question in the past, we discuss the current state of epigenomic profiling and how functional information can be indirectly inferred. We also present new approaches that promise definitive functional answers, which are collectively referred to as 'epigenome editing'. In particular, we explore CRISPR-based technologies for single-locus and multi-locus manipulation. Finally, we discuss which level of function can be achieved with each approach and introduce emerging strategies for high-throughput progression from profiles to function.
Notes
Self-evaluation
If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2017-08-05
Version:
- 1.0
Version history:
- 1.0 First live version
- 0.1 First stub
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.