BIN-GENOME-Genome Annotation

From "A B C"
Revision as of 20:55, 19 November 2017 by Boris (talk | contribs)
Jump to navigation Jump to search

Abstract

Introduction to genome annotation: the content of genomes - what to look for; identifying genes, and keeping up-to-date on methods.


 


This unit ...

Prerequisites

You need to complete the following units before beginning this one:


 


Objectives

This unit will ...

  • ... introduce categories of genome contents, as defined eg. through the ENCODE project, and discuss annotation methods.


 


Outcomes

After working through this unit you ...

  • ... are familar with the contents of genomes, some methods to annotate protein genes, and sources for genomes;
  • ... know how to get up-to-date information on genome annotation workflows.


 


Deliverables

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.


 


Evaluation

Evaluation: NA

This unit is not evaluated for course marks.


 


Contents

Task:

  • Read:
Tyner et al. (2017) The UCSC Genome Browser database: 2017 update. Nucleic Acids Res 45:D626-D634. (pmid: 27899642)

PubMed ] [ DOI ] Since its 2001 debut, the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/) team has provided continuous support to the international genomics and biomedical communities through a web-based, open source platform designed for the fast, scalable display of sequence alignments and annotations landscaped against a vast collection of quality reference genome assemblies. The browser's publicly accessible databases are the backbone of a rich, integrated bioinformatics tool suite that includes a graphical interface for data queries and downloads, alignment programs, command-line utilities and more. This year's highlights include newly designed home and gateway pages; a new 'multi-region' track display configuration for exon-only, gene-only and custom regions visualization; new genome browsers for three species (brown kiwi, crab-eating macaque and Malayan flying lemur); eight updated genome assemblies; extended support for new data types such as CRAM, RNA-seq expression data and long-range chromatin interaction pairs; and the unveiling of a new supported mirror site in Japan.

Aken et al. (2017) Ensembl 2017. Nucleic Acids Res 45:D635-D642. (pmid: 27899575)

PubMed ] [ DOI ] Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.

Bracken et al. (2016) A network-biology perspective of microRNA function and dysfunction in cancer. Nat Rev Genet 17:719-732. (pmid: 27795564)

PubMed ] [ DOI ] MicroRNAs (miRNAs) participate in most aspects of cellular differentiation and homeostasis, and consequently have roles in many pathologies, including cancer. These small non-coding RNAs exert their effects in the context of complex regulatory networks, often made all the more extensive by the inclusion of transcription factors as their direct targets. In recent years, the increased availability of gene expression data and the development of methodologies that profile miRNA targets en masse have fuelled our understanding of miRNA functions, and of the sources and consequences of miRNA dysregulation. Advances in experimental and computational approaches are revealing not just cancer pathways controlled by single miRNAs but also intermeshed regulatory networks controlled by multiple miRNAs, which often engage in reciprocal feedback interactions with the targets that they regulate.

Stricker et al. (2017) From profiles to function in epigenomics. Nat Rev Genet 18:51-66. (pmid: 27867193)

PubMed ] [ DOI ] Myriads of epigenomic features have been comprehensively profiled in health and disease across cell types, tissues and individuals. Although current epigenomic approaches can infer function for chromatin marks through correlation, it remains challenging to establish which marks actually have causative roles in gene regulation and other processes. After revisiting how classical approaches have addressed this question in the past, we discuss the current state of epigenomic profiling and how functional information can be indirectly inferred. We also present new approaches that promise definitive functional answers, which are collectively referred to as 'epigenome editing'. In particular, we explore CRISPR-based technologies for single-locus and multi-locus manipulation. Finally, we discuss which level of function can be achieved with each approach and introduce emerging strategies for high-throughput progression from profiles to function.

UCSC


ENCODE

miRNAs

Task:
Read this review:


Epigenomics

Task:
Read this review:



 


Further reading, links and resources

Sloan et al. (2016) ENCODE data at the ENCODE portal. Nucleic Acids Res 44:D726-32. (pmid: 26527727)

PubMed ] [ DOI ] The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.

Pazin (2015) Using the ENCODE Resource for Functional Annotation of Genetic Variants. Cold Spring Harb Protoc 2015:522-36. (pmid: 25762420)

PubMed ] [ DOI ] This article illustrates the use of the Encyclopedia of DNA Elements (ENCODE) resource to generate or refine hypotheses from genomic data on disease and other phenotypic traits. First, the goals and history of ENCODE and related epigenomics projects are reviewed. Second, the rationale for ENCODE and the major data types used by ENCODE are briefly described, as are some standard heuristics for their interpretation. Third, the use of the ENCODE resource is examined. Standard use cases for ENCODE, accessing the ENCODE resource, and accessing data from related projects are discussed. Although the focus of this article is the use of ENCODE data, some of the same approaches can be used with data from other projects.

ENCODE Project Consortium (2011) A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 9:e1001046. (pmid: 21526222)

PubMed ] [ DOI ] The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

 
Zarrei et al. (2015) A copy number variation map of the human genome. Nat Rev Genet 16:172-83. (pmid: 25645873)

PubMed ] [ DOI ] A major contribution to the genome variability among individuals comes from deletions and duplications - collectively termed copy number variations (CNVs) - which alter the diploid status of DNA. These alterations may have no phenotypic effect, account for adaptive traits or can underlie disease. We have compiled published high-quality data on healthy individuals of various ethnicities to construct an updated CNV map of the human genome. Depending on the level of stringency of the map, we estimated that 4.8-9.5% of the genome contributes to CNV and found approximately 100 genes that can be completely deleted without producing apparent phenotypic consequences. This map will aid the interpretation of new CNV findings for both clinical and research applications.


 


Notes


 


Self-evaluation

 



 




 

If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.



 

About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-08-05

Version:

1.0

Version history:

  • 1.0 First live version
  • 0.1 First stub

CreativeCommonsBy.png This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.