Difference between revisions of "RPR-GEO2R"

From "A B C"
Line 19: Line 19:
  
  
{{STUB}}
+
{{LIVE}}
  
 
{{Vspace}}
 
{{Vspace}}
Line 29: Line 29:
 
<section begin=abstract />
 
<section begin=abstract />
 
<!-- included from "../components/RPR-GEO2R.components.wtxt", section: "abstract" -->
 
<!-- included from "../components/RPR-GEO2R.components.wtxt", section: "abstract" -->
...
+
This unit demonstrates accessing and working with datasets downloaded from NCBI GEO.
 
<section end=abstract />
 
<section end=abstract />
  
Line 51: Line 51:
 
=== Objectives ===
 
=== Objectives ===
 
<!-- included from "../components/RPR-GEO2R.components.wtxt", section: "objectives" -->
 
<!-- included from "../components/RPR-GEO2R.components.wtxt", section: "objectives" -->
...
+
This unit will ...
 +
* ... teach downloading and annotating GEO data, and performing differential expression analysis.
  
 
{{Vspace}}
 
{{Vspace}}
Line 58: Line 59:
 
=== Outcomes ===
 
=== Outcomes ===
 
<!-- included from "../components/RPR-GEO2R.components.wtxt", section: "outcomes" -->
 
<!-- included from "../components/RPR-GEO2R.components.wtxt", section: "outcomes" -->
...
+
After working through this unit you ...
 +
* ... can access GEO data;
 +
* ... are familiar with the structure of GEO expression sets;
 +
* ... can annotate the data, perform differential expression analysis, and critically evaluate the results.
  
 
{{Vspace}}
 
{{Vspace}}
Line 77: Line 81:
 
=== Evaluation ===
 
=== Evaluation ===
 
<!-- included from "../components/RPR-GEO2R.components.wtxt", section: "evaluation" -->
 
<!-- included from "../components/RPR-GEO2R.components.wtxt", section: "evaluation" -->
<!-- included from "ABC-unit_components.wtxt", section: "eval-TBD" -->
+
This learning unit can be evaluated for a maximum of 6 marks. If you want to submit the tasks for this unit for credit:
<b>Evaluation: TBD</b><br />
+
# Create a new page on the student Wiki as a subpage of your User Page.
:This unit can be submitted for evaluation for a maximum of 6 marks. Details TBD.
+
# There are a number of tasks in which you are explicitly asked to submit code or other text for credit. Put all of these submissions on this one page.
 +
# When you are done with everything, add the following category tag to the page:
 +
::<code><nowiki>[[Category:EVAL-RPR-GEO2R]]</nowiki></code>
 +
'''Do not''' change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
  
 
{{Vspace}}
 
{{Vspace}}
Line 89: Line 96:
 
<!-- included from "../components/RPR-GEO2R.components.wtxt", section: "contents" -->
 
<!-- included from "../components/RPR-GEO2R.components.wtxt", section: "contents" -->
  
 
+
{{ABC-unit|RPR-GEO2R.R}}
GEO regex example
 
 
 
    ===Labeling===
 
    <div class="mw-collapsible mw-collapsed  exercise-box" data-expandtext="Hint" data-collapsetext="Collapse">
 
    Write an '''R''' script that creates ''meaningful'' labels for data elements from metadata and shows them in a plot. Use the sample data below - or any other data you are interested in.
 
 
 
 
 
 
 
  <div class="mw-collapsible mw-collapsed exercise-box" data-expandtext="Expand" data-collapsetext="Collapse" style="background-color:#EEEEF9;">
 
    Sample input data from GEO, and task description ...
 
  <div class="mw-collapsible-content">
 
    These data were downloaded from the NCBI GEO database using the GEO2R tool. They come from a microarray expression study that compares tumor and metastasis tissue. You can access the dataset [http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE42952 '''here''']. Grouping primary PDAC (pancreatic ductal adenocarcinoma) as "tumor" and liver/peritoneal metastasis as "metastasis", an '''R''' script on the server calculates significantly differentially expressed genes using the [http://www.bioconductor.org/packages/2.12/bioc/html/limma.html Bioconductor limma package]. I have selected the top 100 genes and would now like to plot significance (adjusted P value) vs. level of differential expression (logFC). Moreover, I would like to roughly identify the function of each gene where that is discernible from the "Gene title".
 
 
 
      <source lang="text">
 
        "ID" "adj.P.Val" "P.Value" "t" "B" "logFC" "Gene.symbol" "Gene.title"
 
      "238376_at" "3.69e-19" "4.53e-23" "-49.138515" "42.43328" "-2.202043" "LOC100505564///DEXI" "uncharacterized LOC100505564///Dexi homolog (mouse)"
 
      "214041_x_at" "2.36e-17" "8.74e-21" "38.089228" "37.60995" "4.541989" "RPL37A" "ribosomal protein L37a"
 
      "241662_x_at" "2.36e-17" "1.03e-20" "-37.793765" "37.45851" "-2.105123" "" ""
 
      "231628_s_at" "2.36e-17" "1.16e-20" "-37.574182" "37.34507" "-1.97516" "SERPINB6" "serpin peptidase inhibitor, clade B (ovalbumin), member 6"
 
      "224760_at" "3.23e-17" "2.10e-20" "36.500909" "36.77932" "3.798724" "SP1" "Sp1 transcription factor"
 
      "214149_s_at" "3.23e-17" "2.38e-20" "36.282193" "36.66167" "4.246787" "ATP6V0E1" "ATPase, H+ transporting, lysosomal 9kDa, V0 subunit e1"
 
      "243177_at" "4.15e-17" "3.57e-20" "-35.573827" "36.275" "-1.801709" "" ""
 
      "243800_at" "5.63e-17" "5.52e-20" "-34.825113" "35.85663" "-2.018088" "NR1H4" "nuclear receptor subfamily 1, group H, member 4"
 
      "238398_s_at" "1.10e-16" "1.21e-19" "-33.519208" "35.10201" "-2.245806" "" ""
 
      "1569856_at" "1.48e-16" "1.82e-19" "-32.860752" "34.70891" "-1.810438" "TPP2" "tripeptidyl peptidase II"
 
      "1555116_s_at" "1.51e-16" "2.14e-19" "-32.598656" "34.55" "-1.990665" "SLC11A1" "solute carrier family 11 (proton-coupled divalent metal ion transporters), member 1"
 
      "218733_at" "1.51e-16" "2.23e-19" "32.535823" "34.51169" "2.764663" "MSL2" "male-specific lethal 2 homolog (Drosophila)"
 
      "201225_s_at" "2.72e-16" "4.33e-19" "31.497695" "33.86667" "3.447828" "SRRM1" "serine/arginine repetitive matrix 1"
 
      "217052_x_at" "4.45e-16" "7.64e-19" "30.636232" "33.31345" "1.601527" "" ""
 
      "1569348_at" "5.24e-16" "9.65e-19" "-30.289176" "33.08577" "-1.793925" "TPTEP1" "transmembrane phosphatase with tensin homology pseudogene 1"
 
      "219492_at" "6.96e-16" "1.37e-18" "29.777415" "32.74483" "3.586919" "CHIC2" "cysteine-rich hydrophobic domain 2"
 
      "215047_at" "7.51e-16" "1.58e-18" "-29.567379" "32.60307" "-2.033635" "TRIM58" "tripartite motif containing 58"
 
      "232877_at" "7.51e-16" "1.66e-18" "-29.491388" "32.55151" "-1.65225" "" ""
 
      "229265_at" "7.51e-16" "1.75e-18" "29.419139" "32.50236" "3.933071" "SKI" "v-ski sarcoma viral oncogene homolog (avian)"
 
      "1553842_at" "8.16e-16" "2.00e-18" "-29.226409" "32.37061" "-1.832581" "BEND2" "BEN domain containing 2"
 
      "220791_x_at" "1.11e-15" "2.87e-18" "-28.71601" "32.01715" "-1.969381" "SCN11A" "sodium channel, voltage-gated, type XI, alpha subunit"
 
      "212911_at" "1.17e-15" "3.15e-18" "28.584094" "31.92471" "2.143175" "DNAJC16" "DnaJ (Hsp40) homolog, subfamily C, member 16"
 
      "243464_at" "1.22e-15" "3.43e-18" "-28.463254" "31.83963" "-1.675747" "" ""
 
      "243823_at" "1.30e-15" "3.81e-18" "-28.316669" "31.7359" "-1.499823" "" ""
 
      "201533_at" "1.56e-15" "4.80e-18" "27.999089" "31.5092" "4.054743" "CTNNB1" "catenin (cadherin-associated protein), beta 1, 88kDa"
 
      "210878_s_at" "1.59e-15" "5.06e-18" "27.927536" "31.45775" "2.982033" "KDM3B" "lysine (K)-specific demethylase 3B"
 
      "227712_at" "3.18e-15" "1.05e-17" "26.938855" "30.73223" "2.426311" "LYRM2" "LYR motif containing 2"
 
      "228520_s_at" "3.56e-15" "1.22e-17" "26.742683" "30.58495" "3.744881" "APLP2" "amyloid beta (A4) precursor-like protein 2"
 
      "210242_x_at" "3.80e-15" "1.36e-17" "26.605262" "30.48111" "1.815311" "ST20" "suppressor of tumorigenicity 20"
 
      "217301_x_at" "3.80e-15" "1.40e-17" "26.565414" "30.45089" "3.275566" "RBBP4" "retinoblastoma binding protein 4"
 
      "1557551_at" "6.17e-15" "2.35e-17" "-25.892664" "29.93351" "-1.78824" "" ""
 
      "201392_s_at" "6.17e-15" "2.42e-17" "25.856344" "29.90519" "3.283483" "IGF2R" "insulin-like growth factor 2 receptor"
 
      "210371_s_at" "7.18e-15" "2.91e-17" "25.62344" "29.72255" "3.463431" "RBBP4" "retinoblastoma binding protein 4"
 
      "204252_at" "9.08e-15" "3.79e-17" "25.291186" "29.45902" "2.789842" "CDK2" "cyclin-dependent kinase 2"
 
      "243200_at" "1.04e-14" "4.48e-17" "-25.082134" "29.29138" "-1.539093" "" ""
 
      "201140_s_at" "1.16e-14" "5.13e-17" "24.916407" "29.15746" "2.834707" "RAB5C" "RAB5C, member RAS oncogene family"
 
      "1559066_at" "1.23e-14" "5.57e-17" "-24.813534" "29.07387" "-1.595061" "" ""
 
      "201123_s_at" "1.27e-14" "5.91e-17" "24.741268" "29.01494" "4.870779" "EIF5A" "eukaryotic translation initiation factor 5A"
 
      "218291_at" "1.41e-14" "6.83e-17" "24.565645" "28.87099" "2.605328" "LAMTOR2" "late endosomal/lysosomal adaptor, MAPK and MTOR activator 2"
 
      "217704_x_at" "1.41e-14" "6.91e-17" "-24.550405" "28.85845" "-1.711476" "SUZ12P1" "suppressor of zeste 12 homolog pseudogene 1"
 
      "227338_at" "1.44e-14" "7.22e-17" "-24.498114" "28.81536" "-2.927581" "LOC440983" "hypothetical gene supported by BC066916"
 
      "210231_x_at" "1.64e-14" "8.47e-17" "24.305184" "28.65556" "4.548338" "SET" "SET nuclear oncogene"
 
      "225289_at" "1.86e-14" "9.82e-17" "24.127523" "28.50726" "3.062123" "STAT3" "signal transducer and activator of transcription 3 (acute-phase response factor)"
 
      "204658_at" "1.93e-14" "1.04e-16" "24.056703" "28.44783" "2.868797" "TRA2A" "transformer 2 alpha homolog (Drosophila)"
 
      "208819_at" "2.54e-14" "1.40e-16" "23.705016" "28.15009" "2.593365" "RAB8A" "RAB8A, member RAS oncogene family"
 
      "210011_s_at" "2.58e-14" "1.46e-16" "23.660126" "28.11176" "2.309763" "EWSR1" "EWS RNA-binding protein 1"
 
      "202397_at" "2.58e-14" "1.48e-16" "23.638422" "28.0932" "4.332132" "NUTF2" "nuclear transport factor 2"
 
      "1552628_a_at" "2.86e-14" "1.68e-16" "23.492249" "27.96778" "2.892763" "HERPUD2" "HERPUD family member 2"
 
      "233757_x_at" "3.85e-14" "2.31e-16" "23.123802" "27.64812" "2.430056" "" ""
 
      "201545_s_at" "5.07e-14" "3.16e-16" "22.767216" "27.33385" "2.568005" "PABPN1" "poly(A) binding protein, nuclear 1"
 
      "1562463_at" "5.07e-14" "3.17e-16" "-22.763883" "27.33089" "-1.119718" "" ""
 
      "219859_at" "5.41e-14" "3.45e-16" "-22.669239" "27.24664" "-1.787549" "CLEC4E" "C-type lectin domain family 4, member E"
 
      "1569136_at" "6.91e-14" "4.50e-16" "-22.372385" "26.98011" "-1.95396" "MGAT4A" "mannosyl (alpha-1,3-)-glycoprotein beta-1,4-N-acetylglucosaminyltransferase, isozyme A"
 
      "208601_s_at" "7.15e-14" "4.74e-16" "-22.314594" "26.92781" "-1.323653" "TUBB1" "tubulin, beta 1 class VI"
 
      "226194_at" "1.11e-13" "7.47e-16" "21.813583" "26.46872" "2.331245" "CHAMP1" "chromosome alignment maintaining phosphoprotein 1"
 
      "217877_s_at" "1.15e-13" "7.93e-16" "21.748093" "26.40795" "2.862688" "GPBP1L1" "GC-rich promoter binding protein 1-like 1"
 
      "225371_at" "1.25e-13" "8.73e-16" "21.644444" "26.31139" "2.518013" "GLE1" "GLE1 RNA export mediator homolog (yeast)"
 
      "1563431_x_at" "1.44e-13" "1.02e-15" "21.472848" "26.15053" "1.874743" "CALM3" "calmodulin 3 (phosphorylase kinase, delta)"
 
      "211505_s_at" "1.45e-13" "1.06e-15" "21.437744" "26.11746" "2.642609" "STAU1" "staufen double-stranded RNA binding protein 1"
 
      "201585_s_at" "1.45e-13" "1.07e-15" "21.430113" "26.11027" "2.787833" "SFPQ" "splicing factor proline/glutamine-rich"
 
      "225197_at" "1.75e-13" "1.31e-15" "21.212989" "25.90451" "2.845005" "" ""
 
      "220336_s_at" "1.83e-13" "1.41e-15" "-21.132294" "25.82752" "-1.848273" "GP6" "glycoprotein VI (platelet)"
 
      "216515_x_at" "1.83e-13" "1.42e-15" "21.128023" "25.82343" "2.877477" "MIR1244-2///MIR1244-3///MIR1244-1///PTMAP5///PTMA" "microRNA 1244-2///microRNA 1244-3///microRNA 1244-1///prothymosin, alpha pseudogene 5///prothymosin, alpha"
 
      "241773_at" "3.49e-13" "2.74e-15" "-20.441442" "25.15639" "-1.835223" "" ""
 
      "1558011_at" "3.89e-13" "3.15e-15" "-20.297118" "25.01342" "-1.577874" "LOC100510697" "putative POM121-like protein 1-like"
 
      "215240_at" "3.89e-13" "3.15e-15" "-20.29699" "25.01329" "-1.613308" "ITGB3" "integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61)"
 
      "233746_x_at" "3.95e-13" "3.25e-15" "20.265986" "24.98245" "2.364699" "HYPK///SERF2" "huntingtin interacting protein K///small EDRK-rich factor 2"
 
      "1555338_s_at" "4.10e-13" "3.42e-15" "-20.214797" "24.93143" "-1.280803" "AQP10" "aquaporin 10"
 
      "217714_x_at" "4.12e-13" "3.48e-15" "20.195128" "24.91179" "2.247023" "STMN1" "stathmin 1"
 
      "202276_at" "4.75e-13" "4.08e-15" "20.035595" "24.75183" "2.654202" "SHFM1" "split hand/foot malformation (ectrodactyly) type 1"
 
      "225414_at" "6.34e-13" "5.52e-15" "19.733786" "24.44585" "3.287225" "RNF149" "ring finger protein 149"
 
      "243930_x_at" "7.43e-13" "6.64e-15" "-19.55046" "24.2578" "-1.219467" "" ""
 
      "1569263_at" "7.43e-13" "6.66e-15" "-19.548534" "24.25581" "-1.662363" "" ""
 
      "1554876_a_at" "8.55e-13" "7.77e-15" "-19.397142" "24.09923" "-1.388081" "S100Z" "S100 calcium binding protein Z"
 
      "220001_at" "1.08e-12" "9.97e-15" "-19.15375" "23.84505" "-1.412727" "PADI4" "peptidyl arginine deiminase, type IV"
 
      "228170_at" "1.12e-12" "1.05e-14" "-19.106672" "23.79554" "-1.840114" "OLIG1" "oligodendrocyte transcription factor 1"
 
      "211445_x_at" "1.29e-12" "1.22e-14" "-18.959325" "23.63981" "-1.134266" "NACAP1" "nascent-polypeptide-associated complex alpha polypeptide pseudogene 1"
 
      "1555311_at" "1.33e-12" "1.27e-14" "-18.91869" "23.59666" "-1.45603" "" ""
 
      "201643_x_at" "1.47e-12" "1.43e-14" "18.808994" "23.47974" "1.867155" "KDM3B" "lysine (K)-specific demethylase 3B"
 
      "216449_x_at" "1.51e-12" "1.48e-14" "18.773094" "23.44134" "3.178009" "HSP90B1" "heat shock protein 90kDa beta (Grp94), member 1"
 
      "218680_x_at" "1.51e-12" "1.50e-14" "18.763896" "23.43149" "2.262739" "HYPK///SERF2" "huntingtin interacting protein K///small EDRK-rich factor 2"
 
      "225954_s_at" "1.65e-12" "1.67e-14" "18.662853" "23.32298" "2.405388" "MIDN" "midnolin"
 
      "203102_s_at" "1.65e-12" "1.68e-14" "18.658192" "23.31796" "2.476697" "MGAT2" "mannosyl (alpha-1,6-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase"
 
      "1569345_at" "1.69e-12" "1.74e-14" "18.624203" "23.28133" "1.236884" "" ""
 
      "214001_x_at" "1.71e-12" "1.78e-14" "18.598496" "23.25358" "2.570012" "" ""
 
      "231812_x_at" "1.72e-12" "1.81e-14" "18.583236" "23.2371" "1.678685" "PHAX" "phosphorylated adaptor for RNA export"
 
      "232075_at" "1.93e-12" "2.06e-14" "-18.462717" "23.10643" "-2.150701" "WDR61" "WD repeat domain 61"
 
      "200669_s_at" "1.96e-12" "2.12e-14" "18.438729" "23.08033" "1.891968" "UBE2D3" "ubiquitin-conjugating enzyme E2D 3"
 
      "236995_x_at" "2.04e-12" "2.23e-14" "-18.389604" "23.02677" "-1.879369" "TFEC" "transcription factor EC"
 
      "218008_at" "2.24e-12" "2.48e-14" "18.291537" "22.91946" "2.445428" "TMEM248" "transmembrane protein 248"
 
      "217140_s_at" "2.30e-12" "2.56e-14" "18.260017" "22.88485" "3.983721" "VDAC1" "voltage-dependent anion channel 1"
 
      "210183_x_at" "2.46e-12" "2.79e-14" "18.183339" "22.80044" "1.79105" "PNN" "pinin, desmosome associated protein"
 
      "216954_x_at" "2.46e-12" "2.80e-14" "-18.177967" "22.79451" "-1.090193" "ATP5O" "ATP synthase, H+ transporting, mitochondrial F1 complex, O subunit"
 
      "207688_s_at" "2.53e-12" "2.92e-14" "18.141153" "22.75385" "2.492309" "INHBC" "inhibin, beta C"
 
      "218020_s_at" "2.63e-12" "3.06e-14" "18.095669" "22.70351" "1.772689" "ZFAND3" "zinc finger, AN1-type domain 3"
 
      "217756_x_at" "3.12e-12" "3.67e-14" "17.930201" "22.51939" "1.914366" "SERF2" "small EDRK-rich factor 2"
 
      "214150_x_at" "3.42e-12" "4.07e-14" "-17.835551" "22.41336" "-1.177963" "ATP6V0E1" "ATPase, H+ transporting, lysosomal 9kDa, V0 subunit e1"
 
      "208750_s_at" "3.48e-12" "4.18e-14" "17.812279" "22.38721" "2.649599" "ARF1" "ADP-ribosylation factor 1"
 
      "201749_at" "3.59e-12" "4.42e-14" "17.761415" "22.32994" "1.917794" "ECE1" "endothelin converting enzyme 1"
 
      </source>
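
As a minimal sketch of how a table in this format can be read, the snippet below parses three rows copied from the sample data directly from a string. The variable names <code>txt</code> and <code>dat</code> are assumptions made for this illustration only. Note that the quoted numbers are still converted to numeric columns, and that probes without annotation carry empty strings in the <code>Gene.symbol</code> and <code>Gene.title</code> columns.

```r
# Header plus three rows copied from the sample data above.
# All fields are quoted and separated by single spaces.
txt <- '"ID" "adj.P.Val" "P.Value" "t" "B" "logFC" "Gene.symbol" "Gene.title"
"214041_x_at" "2.36e-17" "8.74e-21" "38.089228" "37.60995" "4.541989" "RPL37A" "ribosomal protein L37a"
"241662_x_at" "2.36e-17" "1.03e-20" "-37.793765" "37.45851" "-2.105123" "" ""
"204252_at" "9.08e-15" "3.79e-17" "25.291186" "29.45902" "2.789842" "CDK2" "cyclin-dependent kinase 2"'

# read.table() splits on whitespace by default and strips the quotes.
# stringsAsFactors = FALSE keeps text columns as character vectors.
dat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

str(dat)                          # inspect column types
nUp <- sum(dat[, "logFC"] > 0)    # number of probes with positive logFC
nUp
```

To read the complete table, the same call would take a file name in place of the <code>text</code> argument.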
 
        </div>
 
        </div>
 
 
 
 
 
 
 
 
 
        <div class="mw-collapsible-content  exercise-box">
 
        <div class="mw-collapsible mw-collapsed" data-expandtext="Solution" data-collapsetext="Collapse">
 
        Read the data into '''R'''. Plot the negative log of the adjusted P value against logFC. Define some regular expressions that identify keywords in the gene title: things like "X-ase", "Y factor", "Z gene" etc. Apply these to the gene titles using {{R|regex()||regexpr()}} and extract the matched text with {{R|regmatches()}}. Then use {{R|graphics|text()}} to plot the extracted strings next to their points.
 
 
 
 
 
      <div class="mw-collapsible-content  exercise-box">
 
 
 
        <source lang="R">
 
# GEO-hits.R
# bs - Sept. 2013

dat <- read.table("GEO-hits_100.txt", header = TRUE) # this is a file of GEO
                                                     # differential expression data
head(dat)

plot(-log(dat[,"adj.P.Val"]), dat[,"logFC"], cex=0.7, pch=16, col="#BB0000")
# Note that all these genes have at least one log of
# differential expression - up or down. As a trend,
# higher significance is found for higher levels of
# differential expression.

# Before R 4.0, the data frame produced by read.table()
# defined all character-containing columns as _factors_.
# To process them as strings, we need to convert them to
# characters. (This is harmless if they already are.)

dat[,"Gene.title"] <- as.character(dat[,"Gene.title"])

# First, let's define some regexes for keywords to guess
# a function ...

# (Note the need for doubled escape characters in R!)

r <- c(   "\\b(\\w+ase)\\b")           # peptidase, kinase ...
r <- c(r, "\\b(?!factor)(\\w+or)\\b")  # suppressor, adaptor ... The trailing \\b
                                       # avoids partial words like "phosphor"
r <- c(r, "\\b(\\w+)\\b\\s(factor|protein|homolog)") # the preceding word ...

# Now iterate over the Gene.title column and for each row try all regular
# expressions.

dat[,"Function.guess"] <- ""     # create the column, initially empty

for (i in seq_len(nrow(dat))) {  # for all rows ...
  for (j in seq_along(r)) {      # for all regular expressions
    M <- regexpr(r[j], dat[i, "Gene.title"], perl = TRUE)
    if (M[1] > 0) {
      dat[i,"Function.guess"] <- regmatches(dat[i,"Gene.title"], M)
      break  # stop regexing if something was found
    }
  }
}

dat[,"Function.guess"] # check what we found ...
# ... and plot each string to the right of its point.
text(-log(dat[,"adj.P.Val"]), dat[,"logFC"], dat[,"Function.guess"], cex=0.4, pos=4)

# I'm not sure we are actually learning anything important from this.
# But the code was merely meant to illustrate how
# to work with regular expressions in R (and introduce you to GEO
# differential expression data on the side). Mission accomplished.
 
      </source>
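
Because <code>regexpr()</code> and <code>regmatches()</code> are vectorized, the row-by-row loop can also be written as one pass per pattern. The following is only a sketch under stated assumptions: the vectors <code>titles</code>, <code>patterns</code> and <code>guess</code> are hypothetical names, the four titles are copied from the sample data, and the second pattern carries a trailing <code>\\b</code> so that a partial word such as "phosphor" in "phosphorylated" is not reported.

```r
# Four gene titles copied from the sample data (illustration only)
titles <- c("tripeptidyl peptidase II",
            "phosphorylated adaptor for RNA export",
            "Sp1 transcription factor",
            "midnolin")

# Patterns are tried in this order; the first one that matches wins
patterns <- c("\\b(\\w+ase)\\b",                           # peptidase, kinase ...
              "\\b(?!factor)(\\w+or)\\b",                  # suppressor, adaptor ...
              "\\b(\\w+)\\b\\s(factor|protein|homolog)")   # the preceding word ...

guess <- rep("", length(titles))   # one guess per title, initially empty

for (p in patterns) {
  todo <- guess == ""                          # titles without a match so far
  m <- regexpr(p, titles[todo], perl = TRUE)   # -1 where there is no match
  hit <- m > 0
  # regmatches() returns only the matched substrings, in order
  guess[todo][hit] <- regmatches(titles[todo], m)
}

guess
```

Titles that match none of the patterns, like "midnolin" here, keep an empty string and therefore add no label to the plot.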
 
 
 
 
 
 
 
 
 
  
  
Line 345: Line 165:
 
:2017-08-05
 
:2017-08-05
 
<b>Modified:</b><br />
 
<b>Modified:</b><br />
:2017-08-05
+
:2017-11-11
 
<b>Version:</b><br />
 
<b>Version:</b><br />
:0.1
+
:1.0
 
<b>Version history:</b><br />
 
<b>Version history:</b><br />
 +
*1.0 First live version
 
*0.1 First stub
 
*0.1 First stub
 
</div>
 
</div>

Revision as of 09:36, 12 November 2017

Abstract

This unit demonstrates accessing and working with datasets downloaded from NCBI GEO.


 


This unit ...

Prerequisites

You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:

  • The Central Dogma: Regulation of transcription and translation; protein biosynthesis and degradation; quality control.

You need to complete the following units before beginning this one:


 


Objectives

This unit will ...

  • ... teach downloading and annotating GEO data, and performing differential expression analysis.


 


Outcomes

After working through this unit you ...

  • ... can access GEO data;
  • ... are familiar with the structure of GEO expression sets;
  • ... can annotate the data, perform differential expression analysis, and critically evaluate the results.


 


Deliverables

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.


 


Evaluation

This learning unit can be evaluated for a maximum of 6 marks. If you want to submit the tasks for this unit for credit:

  1. Create a new page on the student Wiki as a subpage of your User Page.
  2. There are a number of tasks in which you are explicitly asked to submit code or other text for credit. Put all of these submissions on this one page.
  3. When you are done with everything, add the following category tag to the page:
[[Category:EVAL-RPR-GEO2R]]

Do not change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.


 


Contents

Task:

 
  • Open RStudio and load the ABC-units R project. If you have loaded it before, choose File → Recent projects → ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
  • Choose Tools → Version Control → Pull Branches to fetch the most recent version of the project from its GitHub repository with all changes and bug fixes included.
  • Type init() if requested.
  • Open the file RPR-GEO2R.R and follow the instructions.


 

Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of the code.


 


 


Further reading, links and resources

This unit has focussed on microarray analysis with GEO2R. For RNAseq experiments, refer to the excellent Bioconductor RNAseq analysis tutorial.


 


Notes


 


Self-evaluation

 



 




 

If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.



 

About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-11-11

Version:

1.0

Version history:

  • 1.0 First live version
  • 0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.