Difference between revisions of "Workshops/Saskatoon 2015-Exploratory Data Analysis"
Line 140: | Line 207: | ||
;Slides | ;Slides | ||
+ | *[[Media:EDA_DimensionReduction.pdf|EDA_DimensionReduction.pdf (pdf of slides)]] | ||
Line 159: | Line 227: | ||
;Slides | ;Slides | ||
+ | *[[Media:EDA_Clustering.pdf|EDA_Clustering.pdf (pdf of slides)]] | ||
;Scripts | ;Scripts | ||
+ | *[[Media:EDA_ClusteringExpressionData.R|EDA_ClusteringExpressionData.R]] | ||
− | ; | + | ;Data |
+ | *[[Media:GSE26922.dat|GSE26922.dat (Fallback data)]] | ||
+ | |||
Line 175: | Line 247: | ||
;Slides | ;Slides | ||
+ | *[[Media:EDA_HypothesisTesting.pdf|EDA_HypothesisTesting.pdf (pdf of slides)]] | ||
;Scripts | ;Scripts | ||
+ | *[[Media:EDA_HypothesisTesting.R|EDA_HypothesisTesting.R]] | ||
;Resources | ;Resources | ||
− | + | *[[Media:Tan_2015-NGSdifferentialTranscription.pdf|Tan_2015-NGSdifferentialTranscription.pdf]] | |
+ | *[[Media:ErroneusAnalysesOfSignificance-NatureNeuroscience2011.pdf|ErroneusAnalysesOfSignificance-NatureNeuroscience2011.pdf]] | ||
| |
Revision as of 14:36, 21 August 2015
Introduction to Exploratory Data Analysis with R
Schedule
Please note: this schedule is a rough guideline only; we will adapt it flexibly to class needs as we proceed.
Time | Thursday | Friday |
09:00 – 10:30 | Lecture and practicals: EDA | Lecture and practicals: Dimension reduction |
10:30 – 11:00 | Coffee break | |
11:00 – 12:30 | Lecture and practicals: EDA | Lecture and practicals: Clustering |
12:30 – 13:30 | Lunch break | |
13:30 – 15:00 | Lecture and practicals: Software development | Lecture and practicals: Clustering |
15:00 – 15:30 | Coffee break | |
15:30 – 17:00 | Lecture and practicals: Regression | Lecture and practicals: Hypothesis testing |
General Resources
Progress Notes Thursday
Selected objectives we covered during the workshop:
- subsetting
  - selecting rows and columns by "index"
  - ... by rowname or columnname as string or vector of strings
  - ... using the $ sign for individual columns of a dataframe
  - using order() to get values sorted by some property
- filtering
  - finding elements that contain a string with grep() (and using that to select rows)
  - finding elements that match a logical expression, such as ==, <, > etc.
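As a reminder of the subsetting and filtering idioms listed above, here is a minimal sketch. The data frame `df`, its row names, and its values are invented for illustration and are not taken from the workshop scripts.

```r
# Hypothetical example data; gene names and values are made up for illustration.
df <- data.frame(
  gene = c("BRCA1", "TP53", "MYC", "BRCA2"),
  expr = c(5.2, 8.1, 3.3, 6.7),
  stringsAsFactors = FALSE
)
rownames(df) <- c("r1", "r2", "r3", "r4")

# Subsetting by index
df[2, ]                      # second row, all columns
df[, 2]                      # second column, all rows

# Subsetting by row name / column name (a string or a vector of strings)
df[c("r1", "r3"), "expr"]

# Using $ for an individual column of a data frame
df$expr

# order() returns the row indices that sort the data by some property
sorted <- df[order(df$expr), ]

# Filtering: grep() returns the indices of elements that contain a pattern
brca <- df[grep("BRCA", df$gene), ]

# Filtering: a logical expression yields a TRUE/FALSE vector that selects rows
high <- df[df$expr > 6, ]
```

Note that `order()` returns indices rather than sorted values, which is why it is used inside the square brackets.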
- simple descriptive statistics
  - mean() / median()
  - using as.numeric(), as.logical() etc. to force evaluation as a particular type
  - sd() / IQR() / summary()
  - theoretical and empirical quantiles; quantile()
- random numbers and seeded random numbers; set.seed()
- normally distributed random numbers; rnorm()
- simple plots
  - abline() to draw lines on plots with parameters h= ... or v = ...
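The descriptive statistics and random-number functions above can be tried on synthetic data. This is a sketch only: the seed, sample size, and distribution parameters are arbitrary assumptions.

```r
# Seeded random numbers: the same seed reproduces the same values.
set.seed(112358)                      # arbitrary seed, chosen for illustration
x <- rnorm(1000, mean = 10, sd = 2)   # normally distributed random numbers

# Simple descriptive statistics
mean(x)
median(x)
sd(x)
IQR(x)
summary(x)

# Empirical quantiles of the sample vs. theoretical quantiles of the distribution
emp <- quantile(x, probs = c(0.025, 0.975))
theo <- qnorm(c(0.025, 0.975), mean = 10, sd = 2)

# Forcing evaluation as a particular type
num <- as.numeric("3.14")
lgl <- as.logical("TRUE")

# Simple plots with reference lines
hist(x)
abline(v = mean(x), col = "red")      # vertical line at the sample mean
plot(x)
abline(h = 10, col = "blue")          # horizontal line at the true mean
```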
- scatterplot
  - empty plots and overplotting with
    - points()
    - lines()
    - segments()
    - text()
- boxplot
- barplot
- colors
  - color names
  - colors as hexcodes
  - color palettes
  - transparency
- lines
  - linetypes (lty=) / line width (lwd=)
- plotting characters (pch=)
- hexbin package
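The base-graphics topics above combine naturally in one small example. The sketch below uses invented synthetic data and arbitrary colors; it leaves out the hexbin package, which has to be installed separately.

```r
set.seed(42)                          # arbitrary seed
x <- rnorm(50)
y <- x + rnorm(50, sd = 0.5)          # synthetic data for illustration

# Empty plot (type = "n") that sets up the axes, then overplotting
plot(x, y, type = "n", xlab = "x", ylab = "y")
segments(x0 = x, y0 = x, x1 = x, y1 = y, col = "grey70")   # offsets from y = x
points(x, y, pch = 21, col = "firebrick", bg = "#FF000044") # hexcode with alpha
lines(range(x), range(x), lty = 2, lwd = 2, col = "steelblue")
text(min(x), max(y), labels = "synthetic data", pos = 4)

# A color palette interpolated between two named colors
pal <- colorRampPalette(c("white", "navy"))(5)

# Transparency applied to a named color
semi <- adjustcolor("firebrick", alpha.f = 0.3)

# Boxplot and barplot using palette colors
boxplot(list(x = x, y = y), col = pal[2:3])
barplot(table(cut(x, breaks = 3)), col = pal[3:5])
```

In a hexcode such as `#FF000044`, the last two digits are the alpha channel, which is one way to get transparency without a helper function.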
- synthetic data is useful
- linear regression
  - retrieving parameters; lm()
  - analyzing quality; resid()
  - plotting prediction and confidence intervals
- non-linear regression
  - set up a formula
  - initiate with starting values
  - plot results
- MIC as alternative to Pearson Correlation
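The regression steps above can be sketched on synthetic data, where the true parameters are known and the fit can be checked against them. The models, parameter values, and starting values below are assumptions chosen for illustration, not the ones used in the workshop scripts.

```r
set.seed(2015)                              # arbitrary seed
x <- seq(1, 10, length.out = 40)

# --- Linear regression on synthetic data: y = 2 + 0.5 x + noise ---
y <- 2 + 0.5 * x + rnorm(40, sd = 0.3)
fit <- lm(y ~ x)
coef(fit)                                   # retrieve intercept and slope

# Analyzing quality: residuals should scatter around zero without a trend
plot(fitted(fit), resid(fit))
abline(h = 0, lty = 2)

# Confidence and prediction intervals on a grid of new x values
newx <- data.frame(x = seq(1, 10, length.out = 100))
pc <- predict(fit, newx, interval = "confidence")
pp <- predict(fit, newx, interval = "prediction")
plot(x, y)
matlines(newx$x, pc, lty = c(1, 2, 2), col = "black")
matlines(newx$x, pp[, c("lwr", "upr")], lty = 3, col = "red")

# --- Non-linear regression: yn = a * exp(-b x) + noise ---
yn <- 5 * exp(-0.4 * x) + rnorm(40, sd = 0.05)
nfit <- nls(yn ~ a * exp(-b * x),           # set up a formula
            start = list(a = 4, b = 0.5))   # initiate with starting values
coef(nfit)
plot(x, yn)                                 # plot results
lines(x, predict(nfit), col = "firebrick")
```

The prediction interval is wider than the confidence interval because it includes the scatter of individual observations, not only the uncertainty of the fitted line.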
EDA
- Slides
- Scripts
- EDA.R (the main script for this session)
- SubsettingQuizAnswers.R (Answers, in case you didn't already write them in your script)
- PlottingReference.R
- Data
- Resources
Software
- Links
- Resources
Regression
- Slides
- Scripts
Dimension Reduction
- Slides
- Scripts
- Data
- Resources
Clustering
- Slides
- Scripts
- Data
Hypothesis Testing
- Slides
- Scripts
- Resources
Generally useful links
- Help and Information
- The R help mailing list: https://stat.ethz.ch/mailman/listinfo/r-help
- Rseek: the specialized search engine for R topics: http://rseek.org/
- R questions on stackoverflow: http://stackoverflow.com/questions/tagged/r
- The Comprehensive R Archive Network CRAN: http://cran.r-project.org/
- The CRAN task-view collection: http://cran.r-project.org/web/views/
- Bioconductor task views: http://www.bioconductor.org/packages/release/BiocViews.html
- Resources
Notes