ABC-INT-Mutation impact
Integration Unit: Mutation Impact
Keywords: Integration unit: assess the impact of mutations in a gene
Contents
This page is only a stub; it is here as a placeholder to establish the logical framework of the site but there is no significant content as yet. Do not work with this material until it is updated to "live" status.
Abstract
This page assesses the learning units for working with sequence data.
- Write code to access a mutated RNA sequence and a reference sequence.
- Estimate the distribution of missense and nonsense mutations.
- Assess the significance of observed changes.
This unit ...
Prerequisites
You need to complete the following units before beginning this one:
- FND-STA-Significance (Significance)
- RPR-Genetic_code_optimality (Optimality of the Genetic Code: an R Exploration)
- RPR-Unit_testing (Testing R code)
Deliverables
- No separate deliverables: This unit collects other units and has no deliverables on its own.
Evaluation
- This "Integrator Unit" should be submitted for evaluation for a maximum of 10 marks.
- Please note the evaluation types that are available as options for this unit. Choose one evaluation type that you have not chosen for another Integrator Unit. (Each submitted Integrator Unit must be evaluated in a different way and one of your evaluations - but not your first one - must be an oral exam).
- Report option
- Work through the tasks described in the scenario.
- Document your results in a short report on a subpage of your User page on the Student Wiki. Describe your methods (R code!) in an appendix.
- When you are done with everything, add the following category tag to the page:
[[Category:EVAL-INT-Mutation_impact]]
- Do not change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
- Oral exam option
- Work through the tasks described in the scenario. Remember to document your work in your journal.
- Part of your task will involve writing an R script. Place that code in a subpage of your User page on the Student Wiki and link to it from your Journal. (Do not add an evaluation category tag to that code.)
- Your work must be complete before 21:00 on the day of your exam.
- Schedule an oral exam by editing the signup page on the Student Wiki. Enter the unit that you are signing up for, and your name. You must have signed up for an exam slot before 21:00 on the day before your exam.
- R code option
- Work through the tasks described in the scenario and develop code as required.
- Put your code on a subpage of your User page on the Student Wiki.
- When you are done with everything, add the following category tag to the page:
[[Category:EVAL-INT-Mutation_impact]]
- Do not change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
Scenario background
Cancer is a genetic disease and one aspect that makes cancer hard to treat is that cancer cells progress through their own micro-evolution and become progressively more aggressive and treatment-resistant. But since the cancer phenotype is ultimately based on genetic alterations, it is important to understand which genes contribute. Unfortunately this is not as simple as just sequencing a few cancers: one of the hallmarks of the disease is genome instability (this contributes to the accelerated evolution), and it is very difficult to distinguish causal mutations from incidental mutations, or, driver genes from passenger genes.
However, an analysis of the distribution of mutations may help. Passenger mutations are expected to be randomly distributed throughout the genome, whereas driver mutations are expected to have either a gain of function or a loss of function effect. Gain of function mutations are expected to be very specific, targeting only a small number of amino acids in a defined region of the protein. We actually expect purifying selection against mutations elsewhere. Loss of function mutations are expected to include nonsense mutations and frameshifts; above all, they should be enriched in missense and nonsense mutations relative to silent mutations.
The task of this unit is to analyze the relative frequencies of neutral, missense and nonsense mutations in a gene, and contrast that with the frequencies one would expect if the distribution of mutations was purely due to chance. This analysis should work on an actual sequence, and consider actually observed mutations. We will develop it to evaluate mutations of the KRas gene, a known cancer driver, an olfactory receptor (OR1A1), most likely not involved in cancer, and the PTPN11 phosphatase, a gene of interest whose role in cancer we would like to understand better.
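To make the three categories concrete, here is a minimal sketch of how a single codon change could be classified. It assumes the sequences use the DNA alphabet (T rather than U) and builds the standard genetic code in base R; `geneticCode` and `classifyChange()` are hypothetical names chosen for illustration and are not part of the course project.

```r
# Standard genetic code in base R (DNA alphabet; "*" marks a stop codon).
# Codons are enumerated in the conventional T, C, A, G order.
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
geneticCode <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
  "")[[1]]
names(geneticCode) <- paste0(grid$first, grid$second, grid$third)

# classifyChange() is a hypothetical helper for illustration only.
classifyChange <- function(refCodon, mutCodon, code = geneticCode) {
  refAA <- code[[refCodon]]   # amino acid encoded by the reference codon
  mutAA <- code[[mutCodon]]   # amino acid encoded by the mutated codon
  if (mutAA == refAA) {
    "silent"                  # same amino acid (or stop to stop)
  } else if (mutAA == "*") {
    "nonsense"                # new stop codon truncates the protein
  } else {
    "missense"                # different amino acid
  }
}

classifyChange("GGT", "GGC")  # glycine to glycine
classifyChange("GGT", "GTT")  # glycine to valine
classifyChange("TGG", "TGA")  # tryptophan to stop
```

Note that this sketch counts a change from a stop codon to an amino acid as "missense"; you may prefer to treat that case separately.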
KRas and cancer
Nucleotide binding domains are among the oldest known protein families and one family in particular, the G-proteins, has diverse roles in all domains of life. These are collectively called GTP hydrolases, or GTPases – a misnomer, since even though they do catalyze the hydrolysis of GTP to GDP, their role in the cell has nothing to do with GTP metabolism, but comes from a conformational change that accompanies binding to either GTP or GDP. As far as enzymes go, GTPases are rather slow.
A large family among these G-proteins are the Ras proteins: these act as molecular switches. In humans, there are three isoforms of Ras called HRas, KRas and NRas. These are differentially expressed in tissues and have slightly different C-termini through which they are localized to different membrane subdomains. When Ras binds GTP, it adopts a stable, active ON conformation through which it activates effector proteins. But then the Ras protein slowly hydrolyses GTP to GDP, it undergoes a conformational change and enters the OFF state. Then GDP dissociates from the binding site, Ras can re-bind GTP and is once again switched on. This cycle is modified by interactors: GEF proteins (Guanine Nucleotide Exchange factors such as Sos) catalyze the dissociation of GDP and thus speed up the re-uptake of GTP and re-activation of Ras. Thus they shift the cycle to an active state. GAP proteins (GTPase activating proteins such as P120GAP) speed up the conversion of GTP to GDP. This shifts the cycle towards its inactive state.
One of the most important pathways for cell proliferation is the EGFR pathway that feeds into the MAPK cascade. Under physiological conditions, the active EGFR activates the Sos protein, which shifts a pool of Ras molecules into their active state. Active Ras then turns on its effectors – among them Raf1 – which activates a signalling cascade that induces cell proliferation. This is limited by GAPs that speed up Ras GTPase activity which turns the protein off. Deactivation of Sos when the EGFR is inactive ensures that GDP remains bound and the Ras protein pool remains off. This matches our expectations about the roles of these proteins well.
The problem is that this system can go terribly wrong if Ras gets mutated in a way that damages its catalytic activity and prevents GTP hydrolysis. Activating GAPs no longer works to switch Ras off, because if the Ras active site is dead, GAPs have no way of inducing it. And inhibiting GEFs does not switch Ras off either, because GTP does not get hydrolyzed to GDP and there is no need for GEFs to clear the active site of GDP. The switch is ON and stays ON. The EGFR pathway is on and stays on. The cell proliferates out of control. This can be the first step of transforming a cell into a cancer cell and this exact mutation in the KRas protein is the third-most frequent mutation seen in cancer genome studies and possibly the most powerful cancer driver mutation of all. The big issue about all this is that mutant Ras is generally considered "undruggable": we can't imagine small molecule drugs that would restore Ras' catalytic activity, and the affinity of GTP to the molecule is so high that we haven't found competitive antagonists that don't have dramatic side effects. An interesting new development therefore was the recent discovery that a phosphatase - PTPN11 - somehow works synergistically with Ras to facilitate its activation of effectors: inhibition of PTPN11 suppressed oncogenesis[1]. If this is a pathophysiologically relevant effect, we expect cancer mutations to spare PTPN11. Do they?
Cancer gene data
Knowledge about the mutations of cancer comes from large-scale genome sequencing efforts of cancer tissue samples, and is collected and curated by a small number of databases. These databases sift through the massive volumes of sequence changes, distinguish natural variation from novel somatic mutations, and map the nucleotide changes to individual genes. One of these resources is the IntOGen database in Barcelona.
Task:
- visit IntOGen.
- find the KRas information page and briefly explore the information that is available.
For the Report Option...
Task:
- Open the RStudio course project.
- Begin a new R script to explore KRas, PTPN11 and OR1A1 mutations.
- Load the data file of mRNAs I have prepared for you. This will create the three R objects, KRascodons, PTPN11codons, and OR1A1codons:
load(file = "./data/ABC-INT-Mutation_impact.RData")
- Write code that executes a loop N times (for N <- 100000) to create a point mutation randomly in each of the three genes. Keep track of the number of missense, silent ("synonymous"), and nonsense ("truncating") mutations you find.
- Contrast that with the number of mutations in each category reported on the IntOGen Web page for each gene.
- Establish if there is a significant difference between the expected categories of mutations (i.e. the stochastic background that you simulated), and categories of mutations that were observed in cancer genomes.
- Write a short report that interprets your results against the context outlined above: what would you expect if any of these genes were cancer drivers, what do you observe, what can you conclude from your observation?
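As a starting point for the simulation step, the loop could be sketched as follows. This is only one possible approach, under the assumptions that every codon position is equally likely to be hit, that each of the three alternative nucleotides is equally likely, and that the codons use the DNA alphabet. `simulateMutations()` and `toyCodons` are hypothetical names; the toy sequence is arbitrary and stands in for the codon vectors loaded from the course data file.

```r
# Standard genetic code in base R (DNA alphabet; "*" marks a stop codon).
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
geneticCode <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
  "")[[1]]
names(geneticCode) <- paste0(grid$first, grid$second, grid$third)

# simulateMutations() is a hypothetical helper: it applies N independent
# single-nucleotide changes to randomly chosen codons and tallies the effect.
simulateMutations <- function(codons, N, code = geneticCode) {
  nucleotides <- c("A", "C", "G", "T")
  counts <- c(silent = 0, missense = 0, nonsense = 0)
  for (i in seq_len(N)) {
    ref <- codons[sample.int(length(codons), 1)]  # pick a random codon
    pos <- sample.int(3, 1)                       # pick a position in it
    oldNuc <- substr(ref, pos, pos)
    mut <- ref
    # Replace the nucleotide with one of the three alternatives.
    substr(mut, pos, pos) <- sample(setdiff(nucleotides, oldNuc), 1)
    if (code[[mut]] == code[[ref]]) {
      category <- "silent"
    } else if (code[[mut]] == "*") {
      category <- "nonsense"
    } else {
      category <- "missense"
    }
    counts[category] <- counts[category] + 1
  }
  counts
}

# Arbitrary toy input (assumption): eight codons without a stop codon.
toyCodons <- c("ATG", "GCT", "TGG", "AAA", "CTG", "GGT", "TAT", "GAA")
set.seed(1)
toyCounts <- simulateMutations(toyCodons, N = 1000)
toyCounts / sum(toyCounts)   # relative frequencies of the three categories
```

Each trial mutates the original sequence, not the result of the previous trial, so the counts estimate the background distribution for single point mutations.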
For the Oral Exam Option...
Task:
- Open the RStudio course project.
- Begin a new R script to explore KRas, PTPN11 and OR1A1 mutations.
- Load the data file of mRNAs I have prepared for you. This will create the three R objects, KRAscodons, PTPN11codons, and OR1A1codons:
load(file = "./data/ABC-INT-Mutation_impact.RData")
- Write code that executes a loop N times (for N <- 100000) to create a point mutation randomly in each of the three genes. Keep track of the number of missense, silent ("synonymous"), and nonsense ("truncating") mutations you find.
- Contrast that with the number of mutations in each category reported on the IntOGen Web page for each gene.
- Establish if there is a significant difference between the expected categories of mutations (i.e. the stochastic background that you simulated), and categories of mutations that were observed in cancer genomes.
- Document your activities and results in your Journal. Add a brief conclusion / interpretation.
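One way to compare observed and simulated categories is a chi-squared goodness-of-fit test, sketched below with `chisq.test()` from base R. This is a suggestion, not the only valid test. All counts in the example are invented placeholders, not IntOGen values or real simulation results; replace them with your own numbers.

```r
# Placeholder counts (assumption): simulated background from N trials.
expected <- c(silent = 23000, missense = 72000, nonsense = 5000)

# Placeholder counts (assumption): point mutations observed for one gene.
observed <- c(silent = 10, missense = 180, nonsense = 4)

# Goodness-of-fit test: do the observed counts follow the proportions of
# the simulated background? p must sum to 1, so we normalise the counts.
test <- chisq.test(x = observed, p = expected / sum(expected))

test$statistic   # chi-squared statistic
test$parameter   # degrees of freedom (number of categories minus one)
test$p.value     # probability of a deviation at least this large by chance
```

If some expected counts are very small, the chi-squared approximation becomes unreliable; `chisq.test()` then issues a warning, and `simulate.p.value = TRUE` is one alternative to consider.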
For the R Code Option...
Task:
- Open the RStudio course project.
- Begin a new R script to develop a function that explores mutation effects, given mRNA and mutation data. You will find the following three mRNA files in the course project's ./data directory. Use them to develop your function:
./data/KRAS_HSa_coding.fa
./data/PTPN11_HSa_coding.fa
./data/OR1A1_HSa_coding.fa
- Here is a header that specifies the function, its parameters and its value.
evalMut <- function(FA, N) {
  # Purpose: evaluate the distribution of silent, missense and nonsense
  #          codon changes in "mRNA" for N random mutation trials.
  # Parameters:
  #   FA  chr      Filename of a FASTA formatted sequence file of mRNA
  #                beginning with a start codon.
  #   N   integer  The number of point mutation trials to perform
  # Value: list  List with the following elements:
  #   FA         chr  the input file
  #   N          num  same as the input parameter
  #   nSilent    num  the mean number of silent mutations
  #   nMissense  num  the mean number of missense mutations
  #   nNonsense  num  the mean number of nonsense mutations
}
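The first step inside such a function is turning the FASTA file into a vector of codons. A minimal base R sketch of that step follows; `readCodons()` is a hypothetical helper name, and the example writes its own small toy file so that it does not depend on the course data. It assumes a single sequence per file whose length is a multiple of three, and it converts U to T in case the mRNA is written in the RNA alphabet.

```r
# readCodons() is a hypothetical helper: read one FASTA sequence and
# return it as a character vector of codons.
readCodons <- function(FA) {
  lines <- readLines(FA)
  seqLines <- lines[!grepl("^>", lines)]          # drop the header line(s)
  s <- paste(gsub("\\s", "", seqLines), collapse = "")
  s <- chartr("U", "T", toupper(s))               # normalise to DNA alphabet
  stopifnot(nchar(s) > 0, nchar(s) %% 3 == 0)     # must be whole codons
  starts <- seq(1, nchar(s), by = 3)
  substring(s, starts, starts + 2)                # split into triplets
}

# Toy FASTA file (assumption): written to a temporary location for the demo.
toyFasta <- tempfile(fileext = ".fa")
writeLines(c(">toy sequence (hypothetical)", "ATGGCT", "GGTTAA"), toyFasta)

toyCodons <- readCodons(toyFasta)
toyCodons
```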
- The IntOGen Website lists the counts and frequencies of silent, missense, and nonsense mutations, but that includes point mutations, splice-site mutations, insertions and deletions. However, your method above only simulates the frequency of point mutations; thus, for a correct comparison of observation and expectation we need to distinguish point mutations from the other types. IntOGen provides data downloads that list the exact mutation and categorize it. Write a second function that reads an IntOGen mutation-distribution file and returns counts for the three categories of point mutations that you are simulating. You will find three files in the course project's ./data directory that you can use to develop your function:
./data/intogen-KRAS-distribution-data.fa
./data/intogen-PTPN11-distribution-data.fa
./data/intogen-OR1A1-distribution-data.fa
- Here is a header that specifies the function, its parameters and its value.
readIntOGen <- function(IN) {
  # Purpose: read and parse an IntOGen mutation data file. Return only the
  #          number of silent, missense, and nonsense point mutations.
  #          All indels are ignored.
  # Parameters:
  #   IN  chr  Filename of an IntOGen mutation data file.
  # Value: list  List with the following elements:
  #   nSilent    num  the number of silent point mutations
  #   nMissense  num  the number of missense point mutations
  #   nNonsense  num  the number of nonsense point mutations
}
- You may find the function read.delim(), or read_tsv() from the readr package useful.
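The general pattern of reading a tab-separated file and tallying categories might look like the sketch below. The file layout here is entirely hypothetical: the column names `consequence` and `samples`, the category labels, and the numbers are invented for illustration. Inspect the actual IntOGen files to see which columns and labels they really use before adapting this.

```r
# Toy tab-separated file (assumption): layout and values are invented and
# may not match the real IntOGen distribution-data files.
toyFile <- tempfile(fileext = ".tsv")
writeLines(c("consequence\tsamples",
             "missense_variant\t12",
             "synonymous_variant\t3",
             "stop_gained\t1",
             "frameshift_variant\t2",
             "missense_variant\t5"),
           toyFile)

dat <- read.delim(toyFile, stringsAsFactors = FALSE)

# countCategory() is a hypothetical helper: sum the sample counts of all
# rows that carry a given consequence label.
countCategory <- function(dat, term) {
  sum(dat$samples[dat$consequence == term])
}

# Frameshifts and other non-point mutations are simply not counted.
toyResult <- list(nSilent   = countCategory(dat, "synonymous_variant"),
                  nMissense = countCategory(dat, "missense_variant"),
                  nNonsense = countCategory(dat, "stop_gained"))
str(toyResult)
```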
- Ensure that the script is "clean" in the sense that source()'ing the file has no effects other than loading the functions and any packages they need.
- Write tests for your function. Place them in a protected block of code that will not get executed when the file gets sourced, like so:
if (FALSE) {
# Code that won't get executed goes here....
}
- Write a brief script that simulates 10000 point mutations of PTPN11 and compares them with the values reported in the distribution-data file. Note whether the resulting differences are significant. Place your script too in a protected block of code that will not get executed.
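To convince yourself that a protected block really is skipped, you can source a small script into a separate environment and check what was defined. The sketch below does this with a throwaway file; `addOne()` and `sideEffect` are hypothetical names used only for the demonstration.

```r
# Write a small demo script (assumption: stands in for your own file).
scriptFile <- tempfile(fileext = ".R")
writeLines(c(
  "addOne <- function(x) {",
  "  x + 1",
  "}",
  "",
  "if (FALSE) {",
  "  # Tests and demo code: skipped when the file is sourced.",
  "  stopifnot(addOne(1) == 2)",
  "  sideEffect <- \"this should never be created\"",
  "}"),
  scriptFile)

# Source into a fresh environment so the check does not depend on
# whatever already exists in the global workspace.
env <- new.env()
source(scriptFile, local = env)

exists("addOne", envir = env, inherits = FALSE)      # function was loaded
exists("sideEffect", envir = env, inherits = FALSE)  # block was not run
```

To run the tests interactively, select the lines inside the `if (FALSE) { ... }` block in RStudio and execute them directly.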
Further reading, links and resources
Notes
- [1] Bunda et al. (2015) Inhibition of SHP2-mediated dephosphorylation of Ras suppresses oncogenesis. Nat Commun 6:8859. (pmid: 26617336) Ras is phosphorylated on a conserved tyrosine at position 32 within the switch I region via Src kinase. This phosphorylation inhibits the binding of effector Raf while promoting the engagement of GTPase-activating protein (GAP) and GTP hydrolysis. Here we identify SHP2 as the ubiquitously expressed tyrosine phosphatase that preferentially binds to and dephosphorylates Ras to increase its association with Raf and activate downstream proliferative Ras/ERK/MAPK signalling. In comparison to normal astrocytes, SHP2 activity is elevated in astrocytes isolated from glioblastoma multiforme (GBM)-prone H-Ras(12V) knock-in mice as well as in glioma cell lines and patient-derived GBM specimens exhibiting hyperactive Ras. Pharmacologic inhibition of SHP2 activity attenuates cell proliferation, soft-agar colony formation and orthotopic GBM growth in NOD/SCID mice and decelerates the progression of low-grade astrocytoma to GBM in a spontaneous transgenic glioma mouse model. These results identify SHP2 as a direct activator of Ras and a potential therapeutic target for cancers driven by a previously 'undruggable' oncogenic or hyperactive Ras.
Self-evaluation
If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2017-08-09
Version:
- 1.0
Version history:
- 1.0 New unit
- 0.1 First stub
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.