Difference between revisions of "BIN-FUNC-GO"

Revision as of 21:54, 12 November 2017

Gene Ontology

Keywords: Ontologies in knowledge engineering, GO and GOA

This unit is under development. There is some content here, but it is incomplete and/or may change significantly: links may lead nowhere, the content is likely to be rearranged, and objectives, deliverables, etc. may be incomplete or missing. Do not work with this material until it is updated to "live" status.

Abstract

Introduction to the Gene Ontology (GO) and Gene Ontology Annotations (GOA).

This unit ...

Prerequisites

You need to complete the following units before beginning this one:

BIN-FUNC-Databases (Molecular Function Databases)

Objectives

This unit will ...

... introduce GO and associated data and services;

Outcomes

After working through this unit you ...

... are familiar with the concept of an ontology and the terms and ontologies of the GO project;
... can search for a gene of interest, identify associations, evaluate the term graph, find relevant ancestor nodes in the inferred tree, and discover proteins with related function.

Deliverables

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Evaluation

Evaluation: NA

This unit is not evaluated for course marks.

Introduction

The Gene Ontology project is the most influential contributor to the definition of function in computational biology, and the use of GO terms and GO annotations is ubiquitous.

Task:

Read the introductory notes on the Gene Ontology project to define and annotate gene function.

Browse through the paper describing the 2017 update of the GO database and tools; in particular, take note of the LEGO initiative, which aims to build systems and pathway models by combining suitable GO terms.

The Gene Ontology Consortium (2017) Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res 45:D331-D338. (pmid: 27899567)

[ PubMed ] [ DOI ] The Gene Ontology (GO) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. As such, it is extensively used by the biomedical research community for the analysis of -omics and related data. Our continued focus is on improving the quality and utility of the GO resources, and we welcome and encourage input from researchers in all areas of biology. In this update, we summarize the current contents of the GO knowledgebase, and present several new features and improvements that have been made to the ontology, the annotations and the tools. Among the highlights are 1) developments that facilitate access to, and application of, the GO knowledgebase, and 2) extensions to the resource as well as increasing support for descriptions of causal models of biological systems and network biology. To learn more, visit http://geneontology.org/.

Remember the three separate component ontologies of GO:

Molecular function
Biological process
Cellular component

GO evidence codes

Annotations can be based on literature data or on computational inference. To judge how much trust an annotation deserves, it is important to know how the curator justified it, and GO uses evidence codes to make this process transparent. When computing with the ontology, we may want to filter out (exclude) annotations with particular evidence codes in order to avoid circular reasoning: for example, if we were to infer functional relationships between homologous genes, we should exclude annotations that were themselves based on the same or a similar inference, and compute only with the actual experimental data.

The following evidence codes are in current use. If you want to exclude inferred annotations, restrict the codes you use to the experimental ones: EXP, IDA, IPI, IMP, IEP, and perhaps IGI, although the interpretation of genetic interactions can require assumptions. The codes are ubiquitous and important: you need to know what they mean and imply when working with GOA data.

Automatically-assigned Evidence Codes

IEA: Inferred from Electronic Annotation

Curator-assigned Evidence Codes

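Experimental Evidence Codes
- EXP: Inferred from Experiment
- IDA: Inferred from Direct Assay
- IPI: Inferred from Physical Interaction
- IMP: Inferred from Mutant Phenotype
- IEP: Inferred from Expression Pattern
- IGI: Inferred from Genetic Interaction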

Computational Analysis Evidence Codes
- ISS: Inferred from Sequence or Structural Similarity
- ISO: Inferred from Sequence Orthology
- ISA: Inferred from Sequence Alignment
- ISM: Inferred from Sequence Model
- IGC: Inferred from Genomic Context
- IBA: Inferred from Biological aspect of Ancestor
- IBD: Inferred from Biological aspect of Descendant
- IKR: Inferred from Key Residues
- IRD: Inferred from Rapid Divergence
- RCA: Inferred from Reviewed Computational Analysis

Author Statement Evidence Codes
- TAS: Traceable Author Statement
- NAS: Non-traceable Author Statement

Curator Statement Evidence Codes
- IC: Inferred by Curator
- ND: No biological Data available

For further details, see the Guide to GO Evidence Codes and the GO Evidence Code Decision Tree.
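As a minimal sketch of the filtering described above, the Python below keeps only annotations whose evidence code is one of the experimental codes, with IGI as an opt-in because genetic interactions can require extra assumptions. It assumes that annotations have already been reduced to `(gene, go_id, evidence_code)` tuples; the gene names and their pairing with GO IDs are made up for illustration and are not real GOA data.

```python
# Sketch: restrict annotations to experimental evidence codes.
# Assumption: annotations are (gene, go_id, evidence_code) tuples; the
# example rows below are hypothetical, not real GOA data.

EXPERIMENTAL_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IEP"})


def filter_experimental(annotations, include_igi=False):
    """Return only the annotations backed by experimental evidence.

    IGI is opt-in because interpreting genetic interactions can require
    additional assumptions.
    """
    allowed = (EXPERIMENTAL_CODES | {"IGI"}) if include_igi else EXPERIMENTAL_CODES
    return [a for a in annotations if a[2] in allowed]


annotations = [
    ("geneA", "GO:0071931", "IDA"),  # direct assay: kept
    ("geneA", "GO:0000082", "IEA"),  # electronic annotation: dropped
    ("geneB", "GO:0071931", "ISS"),  # sequence similarity: dropped
    ("geneB", "GO:0000082", "IGI"),  # genetic interaction: kept only on request
]

print(filter_experimental(annotations))
print(filter_experimental(annotations, include_igi=True))
```

Author and curator statements (TAS, NAS, IC) are excluded by this whitelist as well; whether that is appropriate depends on the question you are asking.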

GO tools

For many projects, the simplest approach will be to download the GO ontology itself. It is a well-constructed, easily parseable file that is well suited for computation. Bioconductor has a large number of packages that supply and analyze GO and GOA data.
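To illustrate why the ontology file is easy to compute with, here is a minimal sketch of a reader for the OBO format in which GO is distributed (e.g. the `go-basic.obo` release file). It is not a complete OBO parser: it assumes that only `[Term]` stanzas matter, reads only the `id`, `name`, `namespace`, `is_a` and `relationship: part_of` tags, and assumes no `!` appears inside the values it keeps. The toy ontology uses made-up `EX:` identifiers, not real GO terms.

```python
# Sketch of a minimal OBO reader. Assumptions: only [Term] stanzas are kept;
# only id, name, namespace, is_a and "relationship: part_of" are read; the
# toy ontology below uses hypothetical EX: identifiers.

SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: toy root process
namespace: biological_process

[Term]
id: EX:0000002
name: toy child process
namespace: biological_process
is_a: EX:0000001 ! toy root process

[Term]
id: EX:0000003
name: toy grandchild process
namespace: biological_process
is_a: EX:0000002 ! toy child process
relationship: part_of EX:0000001 ! toy root process

[Typedef]
id: part_of
name: part of
"""


def parse_obo(lines):
    """Return {term_id: term_dict} for every [Term] stanza in `lines`."""
    terms = {}
    current = None  # dict for the [Term] stanza being read, else None
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # New stanza: collect [Term]s, skip [Typedef] and others.
            current = {"is_a": [], "part_of": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, non-Term stanzas
        tag, value = line.split(":", 1)
        # Drop the trailing "! comment" that OBO files append to references.
        tag, value = tag.strip(), value.split("!", 1)[0].strip()
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag in ("name", "namespace"):
            current[tag] = value
        elif tag == "is_a":
            current["is_a"].append(value)
        elif tag == "relationship":
            rel_type, target = value.split()[:2]
            if rel_type == "part_of":
                current["part_of"].append(target)
    return terms


terms = parse_obo(SAMPLE_OBO.splitlines())
for term_id, term in sorted(terms.items()):
    print(term_id, term["name"], term["is_a"], term["part_of"])
```

For real work, an established parser (for example one of the Bioconductor packages) will handle obsolete terms, synonyms and the other relationship types that this sketch ignores.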

AmiGO

AmiGO is a convenient online GO browser developed by the Gene Ontology consortium and hosted on their website.

AmiGO - Gene products

Task:

Navigate to the GO homepage.
Enter Mbp1 into the search box to initiate a search for the yeast Mbp1 transcription factor (as gene or protein name).
There are three categories of hits: Ontology terms directly associated with the search string, Genes and gene products annotated to terms in GOA, and Annotations of terms to any of the genes. As usual, we need to be wary of keyword searches, since they rarely identify a unique gene, so we check the Genes... category first. Follow the link.
In the resulting table, you can easily identify the correct gene. Follow its link to the associated Gene Information page. Study the information on that page.
Note that this page lists Associations, i.e. GO terms that have been associated with Mbp1 in GOA.

AmiGO - Associations

GO annotations for a protein are called associations.
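When associations are downloaded rather than browsed, they usually come as tab-separated GAF files. The sketch below reads GAF lines into simple records, keeps the aspect (F, P or C) and flags rows carrying the NOT qualifier, so that negative results are not mistaken for positive evidence. It assumes the GAF 2.x column order; the `Association` record, its `negated` field and the `gaf_row` helper are hypothetical names, and the two rows describe a made-up gene "geneA", not real GOA data.

```python
# Sketch: read GO associations from GAF (tab-separated) lines, keeping the
# NOT qualifier and the aspect. Assumptions: GAF 2.x column order; the
# example rows describe a hypothetical gene "geneA", not real GOA data.
from collections import namedtuple

Association = namedtuple(
    "Association", ["object_id", "symbol", "negated", "go_id", "evidence", "aspect"]
)

# GAF aspect codes for the three component ontologies.
ASPECTS = {"F": "molecular_function", "P": "biological_process", "C": "cellular_component"}


def parse_gaf(lines):
    """Yield one Association per non-comment GAF line."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and "!" header/comment lines
        cols = line.rstrip("\n").split("\t")
        qualifiers = cols[3].split("|") if cols[3] else []
        yield Association(
            object_id=cols[0] + ":" + cols[1],
            symbol=cols[2],
            negated="NOT" in qualifiers,  # NOT: evidence *against* the term
            go_id=cols[4],
            evidence=cols[6],
            aspect=ASPECTS[cols[8]],
        )


def gaf_row(qualifier, go_id, evidence, aspect):
    """Build a 17-column GAF row for the hypothetical gene 'geneA'."""
    cols = ["EX", "0001", "geneA", qualifier, go_id, "REF:example", evidence,
            "", aspect, "gene A", "", "protein", "taxon:0", "20171112", "EX", "", ""]
    return "\t".join(cols)


SAMPLE_GAF = [
    "!gaf-version: 2.2",
    gaf_row("involved_in", "GO:0071931", "IDA", "P"),
    gaf_row("NOT|involved_in", "GO:0000082", "IMP", "P"),
]

for assoc in parse_gaf(SAMPLE_GAF):
    print(assoc.go_id, assoc.evidence, assoc.aspect, "NOT" if assoc.negated else "")
```

Setting the negated rows aside (or analysing them separately) before counting genes per term is the computational counterpart of noticing the red NOT qualifier in AmiGO.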

Task:

Expand the result count to show all Associations on the page. Note the evidence codes. Also note that two associations refer to a negative experiment and carry the red annotation qualifier NOT. This means an experiment has demonstrated that the gene does not have that activity.
Note that you can expand the left-hand menu for detailed filtering. Click on Ontology (aspect) to show or hide the terms of the three component ontologies, the GO "aspects": F, C, and P (what were these again?).
The most specific annotation on the page seems to be "positive regulation of transcription involved in G1/S transition of mitotic cell cycle". Follow the link.
Note that you can now filter for organisms. Restrict the organism to Saccharomyces cerevisiae S288C by clicking on the green (+) sign. Note that you now see all yeast genes that are annotated to this term! This is an effective way to build system membership information from the bottom up.
There are a number of tabs available for different views of the data: Annotations, Graph Views, Inferred Tree View, Neighborhood, and Mappings. Visit them.
1. The link to QuickGO from the Graph Views tab gives you the entire ancestor chart of the term, with clickable term nodes. You need to consider the ancestor terms to expand searches for related, collaborating genes. For example, if a gene is annotated with "positive regulation ...", you will need to consider genes associated with the cognate "negative regulation ..." or just "regulation ..." terms as well, to get a complete picture of the gene's activities.
2. Neighborhood refers to the ancestors and children of a term.
Study the information available on that page and through the tabs on the page, especially the graph view.
In the Inferred Tree View tab, find the ancestor node "GO:0000082 G1/S transition of mitotic cell cycle" and follow it. On the associations page, go again to Annotations and filter the list to S. cerevisiae genes. At the time of writing (November 2017), there were 137 annotated genes. Are these genes specifically annotated to that term, or does the list include genes that are annotated to descendants of the term?
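The question above turns on the "true path" idea: a gene annotated to a term is implicitly annotated to all of that term's ancestors. The sketch below contrasts the genes annotated directly to a term with the genes annotated to it or to any of its descendants; comparing such counts is one way to reason about what a term's gene list contains. It assumes that `parents` maps each term to its direct parents along is_a/part_of edges; the term IDs `T1`...`T4` and genes `g1`...`g3` are hypothetical.

```python
# Sketch of annotation propagation ("true path" idea). Assumptions:
# `parents` maps each term to its direct is_a/part_of parents; the toy
# term IDs and gene names are hypothetical.
from collections import defaultdict


def ancestors(term, parents):
    """Return every term reachable from `term` via parent links (not itself)."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct, parents):
    """Map each term to genes annotated to it *or* to any of its descendants."""
    genes_by_term = defaultdict(set)
    for gene, term in direct:
        for t in {term} | ancestors(term, parents):
            genes_by_term[t].add(gene)
    return genes_by_term


# Toy DAG: T3 and T4 are children of T2; T2 is a child of T1.
parents = {"T2": ["T1"], "T3": ["T2"], "T4": ["T2"]}
direct = [("g1", "T3"), ("g2", "T2"), ("g3", "T4")]

direct_t2 = {g for g, t in direct if t == "T2"}
propagated = propagate(direct, parents)
print("T2 direct:", sorted(direct_t2))
print("T2 incl. descendants:", sorted(propagated["T2"]))
```

Whether to also follow "regulates" edges when collecting ancestors is a design choice; it widens the gene set in the way the "positive regulation ..." example in the task suggests.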

GO Slims

GO is large and very detailed, and the need for somewhat higher-level descriptions in model organisms is met by the GO slim datasets that are curated by some of the main model-organism databases and consortia. Read more about what GO slims are.
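Conceptually, mapping a detailed term onto a GO slim means walking up the term graph until slim terms are reached. The sketch below does this for a toy graph and, by default, keeps only the most specific slim terms reached; returning all of them is the alternative. It assumes `parents` holds direct is_a/part_of parents and `slim` is a set of slim term IDs; the IDs `T1`...`T4` and the function `map_to_slim` are hypothetical, not part of any GO tool.

```python
# Sketch: map detailed GO terms onto a GO slim. Assumptions: `parents`
# holds direct is_a/part_of parents, `slim` is the set of slim term IDs,
# and all IDs are toy placeholders rather than real GO terms.


def ancestors(term, parents):
    """Return every term reachable from `term` via parent links (not itself)."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(term, parents, slim, most_specific=True):
    """Return the slim terms that `term` rolls up to."""
    hits = ({term} | ancestors(term, parents)) & slim
    if most_specific:
        # Drop slim terms that are ancestors of another slim hit.
        hits -= set().union(*(ancestors(h, parents) for h in hits))
    return hits


# Toy DAG: T3 -> T2 -> T1 and T4 -> T1; the slim contains T1 and T2.
parents = {"T2": ["T1"], "T3": ["T2"], "T4": ["T1"]}
slim = {"T1", "T2"}

for term in ("T2", "T3", "T4"):
    print(term, "->", sorted(map_to_slim(term, parents, slim)))
```

Keeping only the most specific slim terms avoids counting the same gene under a slim term and its slim ancestor; with `most_specific=False` every slim ancestor is reported.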

Further reading, links and resources

Gene Ontology Consortium (2012) The Gene Ontology: enhancements for 2011. Nucleic Acids Res 40:D559-64. (pmid: 22102568)

[ PubMed ] [ DOI ] The Gene Ontology (GO) (http://www.geneontology.org) is a community bioinformatics resource that represents gene product function through the use of structured, controlled vocabularies. The number of GO annotations of gene products has increased due to curation efforts among GO Consortium (GOC) groups, including focused literature-based annotation and ortholog-based functional inference. The GO ontologies continue to expand and improve as a result of targeted ontology development, including the introduction of computable logical definitions and development of new tools for the streamlined addition of terms to the ontology. The GOC continues to support its user community through the use of e-mail lists, social media and web-based resources.

Bastos et al. (2011) Application of gene ontology to gene identification. Methods Mol Biol 760:141-57. (pmid: 21779995)

[ PubMed ] [ DOI ] Candidate gene identification deals with associating genes to underlying biological phenomena, such as diseases and specific disorders. It has been shown that classes of diseases with similar phenotypes are caused by functionally related genes. Currently, a fair amount of knowledge about the functional characterization can be found across several public databases; however, functional descriptors can be ambiguous, domain specific, and context dependent. In order to cope with these issues, the Gene Ontology (GO) project developed a bio-ontology of broad scope and wide applicability. Thus, the structured and controlled vocabulary of terms provided by the GO project describing the biological roles of gene products can be very helpful in candidate gene identification approaches. The method presented here uses GO annotation data in order to identify the most meaningful functional aspects occurring in a given set of related gene products. The method measures this meaningfulness by calculating an e-value based on the frequency of annotation of each GO term in the set of gene products versus the total frequency of annotation. Then after selecting a GO term related to the underlying biological phenomena being studied, the method uses semantic similarity to rank the given gene products that are annotated to the term. This enables the user to further narrow down the list of gene products and identify those that are more likely of interest.

du Plessis et al. (2011) The what, where, how and why of gene ontology--a primer for bioinformaticians. Brief Bioinformatics 12:723-35. (pmid: 21330331)

[ PubMed ] [ DOI ] With high-throughput technologies providing vast amounts of data, it has become more important to provide systematic, quality annotations. The Gene Ontology (GO) project is the largest resource for cataloguing gene function. Nonetheless, its use is not yet ubiquitous and is still fraught with pitfalls. In this review, we provide a short primer to the GO for bioinformaticians. We summarize important aspects of the structure of the ontology, describe sources and types of functional annotations, survey measures of GO annotation similarity, review typical uses of GO and discuss other important considerations pertaining to the use of GO in bioinformatics applications.

Hackenberg & Matthiesen (2010) Algorithms and methods for correlating experimental results with annotation databases. Methods Mol Biol 593:315-40. (pmid: 19957156)

[ PubMed ] [ DOI ] An important procedure in biomedical research is the detection of genes that are differentially expressed under pathologic conditions. These genes, or at least a subset of them, are key biomarkers and are thought to be important to describe and understand the analyzed biological system (the pathology) at a molecular level. To obtain this understanding, it is indispensable to link those genes to biological knowledge stored in databases. Ontological analysis is nowadays a standard procedure to analyze large gene lists. By detecting enriched and depleted gene properties and functions, important insights on the biological system can be obtained. In this chapter, we will give a brief survey of the general layout of the methods used in an ontological analysis and of the most important tools that have been developed.

Gene Ontology Consortium (2010) The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res 38:D331-5. (pmid: 19920128)

[ PubMed ] [ DOI ] The Gene Ontology (GO) Consortium (http://www.geneontology.org) (GOC) continues to develop, maintain and use a set of structured, controlled vocabularies for the annotation of genes, gene products and sequences. The GO ontologies are expanding both in content and in structure. Several new relationship types have been introduced and used, along with existing relationships, to create links between and within the GO domains. These improve the representation of biology, facilitate querying, and allow GO developers to systematically check for and correct inconsistencies within the GO. Gene product annotation using GO continues to increase both in the number of total annotations and in species coverage. GO tools, such as OBO-Edit, an ontology-editing tool, and AmiGO, the GOC ontology browser, have seen major improvements in functionality, speed and ease of use.

Carol Goble on the tension between purists and pragmatists in life-science ontology construction. Plenary talk at SOFG2...

Goble & Wroe (2004) The Montagues and the Capulets. Comp Funct Genomics 5:623-32. (pmid: 18629186)

[ PubMed ] [ DOI ] Two households, both alike in dignity, In fair Genomics, where we lay our scene, (One, comforted by its logic's rigour, Claims ontology for the realm of pure, The other, with blessed scientist's vigour, Acts hastily on models that endure), From ancient grudge break to new mutiny, When 'being' drives a fly-man to blaspheme. From forth the fatal loins of these two foes, Researchers to unlock the book of life; Whole misadventured piteous overthrows, Can with their work bury their clans' strife. The fruitful passage of their GO-mark'd love, And the continuance of their studies sage, Which, united, yield ontologies undreamed-of, Is now the hour's traffic of our stage; The which if you with patient ears attend, What here shall miss, our toil shall strive to mend.

Harris (2008) Developing an ontology. Methods Mol Biol 452:111-24. (pmid: 18563371)

[ PubMed ] [ DOI ] In recent years, biological ontologies have emerged as a means of representing and organizing biological concepts, enabling biologists, bioinformaticians, and others to derive meaning from large datasets.This chapter provides an overview of formal principles and practical considerations of ontology construction and application. Ontology development concepts are illustrated using examples drawn from the Gene Ontology (GO) and other OBO ontologies.

Dimmer et al. (2007) Methods for gene ontology annotation. Methods Mol Biol 406:495-520. (pmid: 18287709)

[ PubMed ] [ DOI ] The Gene Ontology (GO) is an established dynamic and structured vocabulary that has been successfully used in gene and protein annotation. Designed by biologists to improve data integration, GO attempts to replace the multiple nomenclatures used by specialised and large biological knowledgebases. This chapter describes the methods used by groups to create new GO annotations and how users can apply publicly available GO annotations to enhance their datasets.

Aravind et al. (2005) The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol Rev 29:231-62. (pmid: 15808743)

[ PubMed ] [ DOI ] The helix-turn-helix (HTH) domain is a common denominator in basal and specific transcription factors from the three super-kingdoms of life. At its core, the domain comprises of an open tri-helical bundle, which typically binds DNA with the 3rd helix. Drawing on the wealth of data that has accumulated over two decades since the discovery of the domain, we present an overview of the natural history of the HTH domain from the viewpoint of structural analysis and comparative genomics. In structural terms, the HTH domains have developed several elaborations on the basic 3-helical core, such as the tetra-helical bundle, the winged-helix and the ribbon-helix-helix type configurations. In functional terms, the HTH domains are present in the most prevalent transcription factors of all prokaryotic genomes and some eukaryotic genomes. They have been recruited to a wide range of functions beyond transcription regulation, which include DNA repair and replication, RNA metabolism and protein-protein interactions in diverse signaling contexts. Beyond their basic role in mediating macromolecular interactions, the HTH domains have also been incorporated into the catalytic domains of diverse enzymes. We discuss the general domain architectural themes that have arisen amongst the HTH domains as a result of their recruitment to these diverse functions. We present a natural classification, higher-order relationships and phyletic pattern analysis of all the major families of HTH domains. This reconstruction suggests that there were at least 6-11 different HTH domains in the last universal common ancestor of all life forms, which covered much of the structural diversity and part of the functional versatility of the extant representatives of this domain. In prokaryotes the total number of HTH domains per genome shows a strong power-equation type scaling with the gene number per genome. 
However, the HTH domains in two-component signaling pathways show a linear scaling with gene number, in contrast to the non-linear scaling of HTH domains in single-component systems and sigma factors. These observations point to distinct evolutionary forces in the emergence of different signaling systems with HTH transcription factors. The archaea and bacteria share a number of ancient families of specific HTH transcription factors. However, they do not share any orthologous HTH proteins in the basal transcription apparatus. This differential relationship of their basal and specific transcriptional machinery poses an apparent conundrum regarding the origins of their transcription apparatus.

Gajiwala & Burley (2000) Winged helix proteins. Curr Opin Struct Biol 10:110-6. (pmid: 10679470)

[ PubMed ] [ DOI ] The winged helix proteins constitute a subfamily within the large ensemble of helix-turn-helix proteins. Since the discovery of the winged helix/fork head motif in 1993, a large number of topologically related proteins with diverse biological functions have been characterized by X-ray crystallography and solution NMR spectroscopy. Recently, a winged helix transcription factor (RFX1) was shown to bind DNA using unprecedented interactions between one of its eponymous wings and the major groove. This surprising observation suggests that the winged helix proteins can be subdivided into at least two classes with radically different modes of DNA recognition.

Notes

Self-evaluation

If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.

About ...

Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-11-12

Version:

1.0

Version history:

1.0 First live
0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.


@@ Line 29: / Line 29: @@
 <section begin=abstract />
 <!-- included from "../components/BIN-FUNC-GO.components.wtxt", section: "abstract" -->
-...
+Introduction to the Gene Ontology (GO) and Gene Ontology Annotations (GOA).
 <section end=abstract />
@@ Line 47: / Line 47: @@
 === Objectives ===
 <!-- included from "../components/BIN-FUNC-GO.components.wtxt", section: "objectives" -->
-...
+This unit will ...
+* ... introduce Go and associated data and services;
 {{Vspace}}
@@ Line 54: / Line 55: @@
 === Outcomes ===
 <!-- included from "../components/BIN-FUNC-GO.components.wtxt", section: "outcomes" -->
-...
+After working through this unit you ...
+* ... are familar with the concept of an ontology and the terms and ontologies of the GO project;
+* ... can search for a gene of interest, identify associations, evaluate the term graph, find relevant ancestor nodes in the inferred tree, and discover proteins with related function.
 {{Vspace}}
@@ Line 85: / Line 88: @@
 <!-- included from "../components/BIN-FUNC-GO.components.wtxt", section: "contents" -->
+{{Vspace}}
+==Introduction==
+{{Smallvspace}}
+The Gene Ontology project is the most influential contributor to the definition of function in computational biology and the use of GO terms and GO annotations is ubiquitous.
+{{Smallvspace}}
 {{Task|1=
 *Read the introductory notes on {{ABC-PDF|BIN-FUNC-GO|the Gene Ontology project to define and annotate gene function}}.
-}}
+* Browse through the papar describing the 2017 update on the GO database and tools; in particular take note of the LEGO initiative that aims to build systems and pathway models by combining suitable GO terms.
-==GO==
-==Introduction==
-{{#pmid: 18563371}}
-{{#pmid: 19957156}}
 {{#pmid: 27899567}}
+}}
-==GO==
-The Gene Ontology project is the most influential contributor to the definition of function in computational biology and the use of GO terms and GO annotations is ubiquitous.
-{{WWW|WWW_GO}}
-{{#pmid: 21330331}}
-The GO actually comprises three separate ontologies:
+Remember the three separate component ontologies of GO:
-;Molecular function
+* '''Molecular function'''
-:...
+* '''Biological Process'''
+* '''Cellular component'''
+{{Vspace}}
-;Biological Process
+==GO evidence codes==
-:...
+{{Smallvspace}}
-;Cellular component:
-: ...
-===GO terms===
-GO terms comprise the core of the information in the ontology: a carefully crafted definition of a term in any of GO's separate ontologies.
-===GO relationships===
-The nature of the relationships is as much a part of the ontology as the terms themselves. GO uses three categories of relationships:
-* is a
-* part of
-* regulates
-===GO annotations===
-The GO terms are conceptual in nature, and while they represent our interpretation of biological phenomena, they do not intrinsically represent biological objects, such a specific genes or proteins. In order to link molecules with these concepts, the ontology is used to '''annotate''' genes. The annotation project is referred to as GOA.
-{{#pmid:18287709}}
-===GO evidence codes===
 Annotations can be made according to literature data or computational inference and it is important to note how an annotation has been justified by the curator to evaluate the level of trust we should have in the annotation. GO uses evidence codes to make this process transparent. When computing with the ontology, we may want to filter (exclude) particular terms in order to avoid tautologies: for example if we were to infer functional relationships between homologous genes, we should exclude annotations that have been based on the same inference or similar, and compute only with the actual experimental data.
-The following evidence codes are in current use; if you want to exclude inferred anotations you would restrict the codes you use to the ones shown in bold: EXP, IDA, IPI, IMP, IEP, and perhaps IGI, although the interpretation of genetic interactions can require assumptions.
+The following evidence codes are in current use; if you want to exclude inferred anotations you would restrict the codes you use to the ones shown in bold below: EXP, IDA, IPI, IMP, IEP, and perhaps IGI, although the interpretation of genetic interactions can require assumptions. The codes are '''ubiquitous and important''', you need to know what they mean and imply when working with GOA data.
 ;Automatically-assigned Evidence Codes
 *IEA: Inferred from Electronic Annotation
 ;Curator-assigned Evidence Codes
 *'''Experimental Evidence Codes'''
-**EXP: Inferred from Experiment
+**<b>EXP: Inferred from Experiment
 **IDA: Inferred from Direct Assay
 **IPI: Inferred from Physical Interaction
 **IMP: Inferred from Mutant Phenotype
-**IGI: Inferred from Genetic Interaction
+**IEP: Inferred from Expression Pattern
-**IEP: Inferred from Expression Pattern</b>
+**IGI: Inferred from Genetic Interaction</b>
 *'''Computational Analysis Evidence Codes'''
 **ISS: Inferred from Sequence or Structural Similarity
@@ Line 166: / Line 140: @@
 **IRD: Inferred from Rapid Divergence
 **RCA: inferred from Reviewed Computational Analysis
 *'''Author Statement Evidence Codes'''
 **TAS: Traceable Author Statement
 **NAS: Non-traceable Author Statement
 *'''Curator Statement Evidence Codes'''
 **IC: Inferred by Curator
@@ Line 178: / Line 154: @@
 {{Vspace}}
-===GO tools===
+==GO tools==
-For many projects, the simplest approach will be to download the GO ontology itself. It is a well constructed, easily parseable file that is well suited for computation. For details, see [[Computing with GO]] on this wiki.
+For many projects, the simplest approach will be to download the GO ontology itself. It is a well constructed, easily parseable file that is well suited for computation. Bioconducter has a [https://www.bioconductor.org/packages/release/BiocViews.html#___GO large number of packages] that supply and analyze GO and GOA data.
+{{Vspace}}
+===AmiGO===
+{{Smallvspace}}
+[http://amigo.geneontology.org/cgi-bin/amigo/go.cgi '''AmiGO'''] is a convenient online [http://www.geneontology.org/ '''GO'''] browser developed by the Gene Ontology consortium and hosted on their website.
+{{Vspace}}
-===AmiGO===
-practical work with GO: at first via the AmiGO browser
-[http://amigo.geneontology.org/cgi-bin/amigo/go.cgi '''AmiGO'''] is a [http://www.geneontology.org/ '''GO'''] browser developed by the Gene Ontology consortium and hosted on their website.
 ====AmiGO - Gene products====
+{{Smallvspace}}
 {{task|1=
 # Navigate to the [http://www.geneontology.org/ '''GO'''] homepage.
-# Enter <code>SOX2</code> into the search box to initiate a search for the human SOX2 transcription factor ({{WP|SOX2|WP}}, [http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=11195 HUGO]) (as ''gene or protein name'').
+# Enter <code>Mbp1</code> into the search box to initiate a search for the yeast Mbp1 transcription factor (as ''gene or protein name'').
-# There are a number of hits in various organisms: ''sulfhydryl oxidases'' and ''(sex determining region Y)-box'' genes. Check to see the various ways by which you could filter and restrict the results.
+# There are a three catgories of hits - '''Ontology''' terms directly associated with the search string, '''Genes and gene products''' annoted to terms in GOA, and '''Annotations''' of terms to any of the genes. As usual, we need to be wary of keyword searches since they rarely identify a unique gene, so we check the [http://amigo.geneontology.org/amigo/search/bioentity?q=Mbp1 '''Genes...''']  category first. Follow the link.
-# Select ''Homo sapiens'' as the '''species''' filter and set the filter. Note that this still does not give you a unique hit, but ...
+# From the table you find you can easily identify the correct gene. Follow [http://amigo.geneontology.org/amigo/gene_product/SGD:S000002214 its link to the associated '''Gene Information''' page]. Study the information on that page.
-# ... you can identify the '''[http://amigo.geneontology.org/cgi-bin/amigo/gp-details.cgi?gp=UniProtKB:P48431 Transcription factor SOX-2]''' and follow its gene product information link. Study the information on that page.
+# Note that this page lists '''Associations''' - i.e. GO terms that haven been associated with Mbp1 in GOA.
-# Later, we will need Entrez Gene IDs. The GOA information page provides these as '''GeneID''' in the '''External references''' section. Note it down.  With the same approach, find and record the Gene IDs (''a'') of the functionally related [http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=9221 Oct4 (POU5F1)] protein, (''b'') the human cell-cycle transcription factor [http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=3113 E2F1], (''c'') the human bone morphogenetic protein-4 transforming growth factor [http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=1071 BMP4], (''d'') the human UDP glucuronosyltransferase 1 family protein 1, an enzyme that is differentially expressed in some cancers, [http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=12530 UGT1A1], and (''d'') as a positive control, SOX2's interaction partner [http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=20857 NANOG], which we would expect to be annotated as functionally similar to both Oct4 and SOX2.
 }}
+{{Vspace}}
-<!--
-SOX2: 6657
-POU5F1: 5460
-E2F1: 1869
-BMP4: 652
-UGT1A1: 54658
-NANOG: 79923
-mgeneSim(c("6657", "5460", "1869", "652", "54658", "79923"), ont="BP", organism="human", measure="Wang")
--->
 ====AmiGO - Associations====
+{{Smallvspace}}
 GO annotations for a protein are called ''associations''.
 {{task|1=
-# Open the ''associations'' information page for the human SOX2 protein via the [http://amigo.geneontology.org/cgi-bin/amigo/gp-assoc.cgi?gp=UniProtKB:P48431 link in the right column] in a separate tab. Study the information on that page.
+# Expand the reuslt count to show all '''Associations''' on the page. Note the evidence codes. Also note that there are two associations referring to a negative experiment, with the red annotation qualifier <span style="border:solid 1px #FF0000; color=#FF0000;"><b>NOT</b></span>. This means an experiment has demonstrated the gene '''not''' to have that activity.
-# Note that you can filter the associations by ontology and evidence code. You have read about the three GO ontologies in your previous assignment, but you should also be familiar with the evidence codes. Click on any of the evidence links to access the Evidence code definition page and study the [http://www.geneontology.org/GO.evidence.shtml definitions of the codes]. '''Make sure you understand which codes point to experimental observation, and which codes denote computational inference, or say that the evidence is someone's opinion (TAS, IC ''etc''.).''' <small>Note: it is good practice - but regrettably not universally implemented standard - to clearly document database semantics and keep definitions associated with database entries easily accessible, as GO is doing here. You won't find this everywhere, but as a user please feel encouraged to complain to the database providers if you come across a database where the semantics are not clear. Seriously: opaque semantics make database annotations useless.</small>
+# Note that you can expand the left-hand menu for detailed filtering. Click on '''Ontology (aspect)''' to show or hide the terms of the three component ontologies, the GO "aspects" F, C, and P (what were these again?).
-# There are many associations (around 60) and a good way to select which ones to pursue is to follow the '''most specific''' ones. Set <code>IDA</code> as a filter and among the returned terms select <code>GO:0035019</code> &ndash; [http://amigo.geneontology.org/cgi-bin/amigo/term_details?term=GO:0035019 ''somatic stem cell maintenance''] in the '''Biological Process''' ontology. Follow that link.
+# The most specific annotation on the page appears to be [http://amigo.geneontology.org/amigo/term/GO:0071931 ''"positive regulation of transcription involved in G1/S transition of mitotic cell cycle"'']. Follow the link.
+# Note that you can now filter by organism. Restrict the organism to ''Saccharomyces cerevisiae S288C'' by clicking on the green (+) sign. You now see all yeast genes that are annotated to this term! This is an effective way to build up information about system membership from the bottom up.
+# A number of tabs offer different views of the data: '''Annotations''', '''Graph Views''', '''Inferred Tree View''', '''Neighborhood''', and '''Mappings'''. Visit each of them.
+## The link to [https://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0071931 QuickGO] from the Graph Views tab shows the entire ancestor chart of the term, with clickable term nodes. You need to consider the ancestor terms to expand searches for related, collaborating genes. For example, if a gene is annotated with a "positive regulation of ..." term, you should also consider genes associated with the cognate "negative regulation of ..." or plain "regulation of ..." terms to get a complete picture of the gene's activities.
+## '''Neighborhood''' refers to the ancestors and children of a term.
 # Study the information available on that page and through the tabs on the page, especially the graph view.
-# In the '''Inferred Tree View''' tab, find the genes annotated to this GO term for ''Homo sapiens''. There should be about 55. Click on [http://amigo.geneontology.org/cgi-bin/amigo/term-assoc.cgi?term=GO:0035019&speciesdb=all&taxid=9606 the number next to the term]. The resulting page will give you all human proteins that have been annotated with this particular term. Note that the great majority of these annotations are made via the <code>IEA</code> evidence code.
+# In the '''Inferred Tree View''' tab, find the ancestor node "GO:0000082 G1/S transition of mitotic cell cycle" and follow it. On the associations page, go again to '''Annotations''' and filter the list to ''S. cerevisiae'' genes. At the time of writing (November 2017) there were 137 annotated genes. '''Are these genes specifically annotated to that term, or does the list include genes that are annotated to descendants of the term?'''
 }}
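The difference between annotations made directly to a term and annotations inherited from its descendants can be sketched on a small, made-up term graph. In this hypothetical Python example, `PARENTS` maps each term to its parents (is_a, part_of and regulates edges are not distinguished, which is a simplification), `DIRECT` holds each gene's most specific annotations, and `genes_inferred()` propagates annotations upward along the graph, in the spirit of the GO "true path rule". All term and gene names are invented for illustration.

```python
from collections import deque

# Hypothetical toy fragment of a GO-like DAG: child term -> set of parent terms.
PARENTS = {
    "positive regulation of X": {"regulation of X"},
    "negative regulation of X": {"regulation of X"},
    "regulation of X": {"X"},
    "X": {"biological_process"},
    "biological_process": set(),
}

# Hypothetical direct (most specific) annotations: gene -> set of terms.
DIRECT = {
    "geneA": {"positive regulation of X"},
    "geneB": {"negative regulation of X"},
    "geneC": {"X"},
    "geneD": set(),
}


def ancestors(term, parents=PARENTS):
    """All terms reachable by following parent links upward (term itself excluded)."""
    seen = set()
    queue = deque(parents.get(term, ()))
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            queue.extend(parents.get(t, ()))
    return seen


def genes_direct(term, direct=DIRECT):
    """Genes annotated to exactly this term."""
    return {g for g, terms in direct.items() if term in terms}


def genes_inferred(term, direct=DIRECT, parents=PARENTS):
    """Genes annotated to this term or to any of its descendants."""
    return {g for g, terms in direct.items()
            if any(t == term or term in ancestors(t, parents) for t in terms)}


if __name__ == "__main__":
    print(sorted(ancestors("positive regulation of X")))
    print(sorted(genes_direct("regulation of X")))
    print(sorted(genes_inferred("regulation of X")))
```

Querying the parent term "regulation of X" directly returns no genes, while the propagated query returns both the positive and the negative regulator. This is the kind of expansion to related genes described in the QuickGO step above.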
-==Quick GO==
 {{Vspace}}
 ==GO Slims==
+{{Smallvspace}}
-http://www.geneontology.org/page/go-slim-and-subset-guide
+GO is large and very detailed, and the need for somewhat higher-level descriptions in model organisms is met by the [http://www.geneontology.org/page/go-slim-and-subset-guide '''GO slim'''] datasets that are curated by some of the main model-organism databases and consortia. Read up on what GO slims are and how they are used.
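Mapping annotations to a GO slim amounts to replacing each specific term by the slim terms it rolls up to in the graph. The following Python sketch assumes a hypothetical toy graph (`PARENTS`), a hypothetical slim (`SLIM`) and made-up gene annotations (`DIRECT`); it reports every slim term that is either the annotated term itself or one of its ancestors, and counts genes per slim term. Dedicated GO slim mapping tools may behave differently, for example by reporting only the most specific slim ancestors.

```python
from collections import Counter, deque

# Hypothetical toy DAG: child term -> parent terms (edge types not distinguished).
PARENTS = {
    "term_leaf1": {"term_mid1"},
    "term_leaf2": {"term_mid1", "term_mid2"},
    "term_mid1": {"slim_A"},
    "term_mid2": {"slim_B"},
    "slim_A": {"root"},
    "slim_B": {"root"},
    "root": set(),
}

# Hypothetical slim: a small set of high-level terms chosen from the graph.
SLIM = {"slim_A", "slim_B"}

# Hypothetical direct annotations: gene -> set of specific terms.
DIRECT = {
    "g1": {"term_leaf1"},
    "g2": {"term_leaf2"},
    "g3": {"term_mid2"},
}


def ancestors_or_self(term, parents):
    """The term plus every term reachable by following parent links."""
    seen = {term}
    queue = deque([term])
    while queue:
        for p in parents.get(queue.popleft(), ()):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def map_to_slim(terms, slim, parents):
    """Replace specific terms by all slim terms they roll up to."""
    mapped = set()
    for t in terms:
        mapped |= ancestors_or_self(t, parents) & slim
    return mapped


def slim_counts(direct, slim, parents):
    """Number of genes mapped to each slim term (each gene counted once per term)."""
    counts = Counter()
    for terms in direct.values():
        for s in map_to_slim(terms, slim, parents):
            counts[s] += 1
    return counts


if __name__ == "__main__":
    print(dict(slim_counts(DIRECT, SLIM, PARENTS)))
```

Because a single gene can roll up to several slim terms (as `g2` does here), the per-term counts of such a summary need not add up to the number of genes.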
 {{Vspace}}
@@ Line 238: / Line 208: @@
 == Further reading, links and resources ==
 {{#pmid: 22102568}}
 {{#pmid: 21779995}}
+{{#pmid: 21330331}}
+{{#pmid: 19957156}}
 {{#pmid: 19920128}}
 Carol Goble on the tension between purists and pragmatists in life-science ontology construction. Plenary talk at SOFG2...
 {{#pmid: 18629186}}
+{{#pmid: 18563371}}
+{{#pmid: 18287709}}
 {{Smallvspace}}
+{{#pmid: 15808743}}
 {{#pmid: 10679470}}
-{{#pmid: 15808743}}
@@ Line 309: / Line 284: @@
 :2017-08-05
 <b>Modified:</b><br />
-:2017-08-05
+:2017-11-12
 <b>Version:</b><br />
-:0.1
+:1.0
 <b>Version history:</b><br />
+*1.0 First live
 *0.1 First stub
 </div>