Workshops/Saskatoon 2015-Exploratory Data Analysis
Introduction to Exploratory Data Analysis with R
Schedule
Please note: this schedule is a rough guideline only. We will adapt it to the needs of the class as we proceed.
Time | Thursday | Friday |
09:00 – 10:30 | Lecture and practicals: EDA | Lecture and practicals: Dimension reduction |
10:30 – 11:00 | Coffee break | |
11:00 – 12:30 | Lecture and practicals: EDA | Lecture and practicals: Clustering |
12:30 – 13:30 | Lunch break | |
13:30 – 15:00 | Lecture and practicals: Software development | Lecture and practicals: Clustering |
15:00 – 15:30 | Coffee break | |
15:30 – 17:00 | Lecture and practicals: Regression | Lecture and practicals: Hypothesis testing |
General Resources
Progress Notes Thursday
Selected objectives we covered during the workshop:
- subsetting
- selecting rows and columns by "index"
- ... by rowname or columnname as string or vector of strings
- ... using the $ sign for individual columns of a dataframe
- using order() to get values sorted by some property
- filtering
- finding elements that contain a string with grep() (and using that to select rows)
- finding elements that match a logical expression, such as ==, <, > etc.
- simple descriptive statistics
- mean() / median()
- using as.numeric(), as.logical() etc. to coerce values to a particular type
- sd() / IQR() / summary()
- theoretical and empirical quantiles; quantile()
- random numbers and seeded random numbers; set.seed()
- normally distributed random numbers; rnorm()
- simple plots
- abline() to draw lines on plots with parameters h= ... or v = ...
- scatterplot
- empty plots and overplotting with
- points()
- lines()
- segments()
- text()
- boxplot
- barplot
- colors
- color names
- colors as hexcodes
- color palettes
- transparency
- lines
- linetypes (lty=) / line width (lwd=)
- plotting characters (pch=)
- hexbin package
- synthetic data is useful
- linear regression
- retrieving parameters; lm()
- analyzing quality; resid()
- plotting prediction and confidence intervals
- non-linear regression
- set up a formula
- initiate with starting values
- plot results
- MIC (Maximal Information Coefficient) as an alternative to Pearson correlation
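As a reminder of how several of these objectives fit together, here is a minimal sketch in R. It is not taken from the workshop scripts: the data frame `dat`, its columns `name`, `x` and `y`, and the seed value are all made up for illustration, and the data are synthetic. The sketch covers subsetting, filtering, simple descriptive statistics and a linear regression with confidence and prediction intervals.

```r
# Sketch only: 'dat' and its columns are hypothetical synthetic data,
# not part of the workshop material.
set.seed(112358)                         # seeded random numbers are reproducible

n   <- 50
dat <- data.frame(
  name = paste0("gene", 1:n),            # made-up identifiers
  x    = rnorm(n, mean = 10, sd = 2),    # normally distributed random numbers
  stringsAsFactors = FALSE
)
dat$y <- 3 + 0.5 * dat$x + rnorm(n, sd = 0.5)   # linear relationship plus noise
rownames(dat) <- dat$name

# Subsetting
dat[1:3, 2]                              # rows and columns by index
dat["gene7", c("x", "y")]                # by rowname and vector of column names
head(dat$x)                              # a single column with $
sorted <- dat[order(dat$x), ]            # rows sorted by x, ascending

# Filtering
teens <- dat[grep("^gene1", dat$name), ] # gene1 and gene10 to gene19
highX <- dat[dat$x > 12, ]               # rows matching a logical expression

# Simple descriptive statistics
mean(dat$x); median(dat$x)
sd(dat$x); IQR(dat$x)
summary(dat$x)
quantile(dat$x, probs = c(0.05, 0.95))   # empirical quantiles
qnorm(c(0.05, 0.95), mean = 10, sd = 2)  # theoretical quantiles
as.numeric("3.14") + 1                   # coerce a string to a number

# Linear regression
fit <- lm(y ~ x, data = dat)
coef(fit)                                # intercept and slope
res <- resid(fit)                        # residuals, to judge the quality of the fit

# Confidence and prediction intervals over the observed range of x
newX <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 100))
pc <- predict(fit, newX, interval = "confidence")
pp <- predict(fit, newX, interval = "prediction")

# Scatterplot with transparent points, then overplot the fit and intervals
plot(dat$x, dat$y, pch = 16, col = "#00000066", xlab = "x", ylab = "y")
abline(fit, col = "firebrick", lwd = 2)
lines(newX$x, pc[, "lwr"], lty = 2); lines(newX$x, pc[, "upr"], lty = 2)
lines(newX$x, pp[, "lwr"], lty = 3); lines(newX$x, pp[, "upr"], lty = 3)
```

Because the data are synthetic, the true intercept (3) and slope (0.5) are known, so the output of coef(fit) can be checked against them. This is one reason synthetic data is useful when learning a method.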
EDA
- Slides
- Scripts
- EDA.R (the main script for this session)
- SubsettingQuizAnswers.R (Answers, in case you didn't already write them in your script)
- PlottingReference.R
- Data
- Resources
Software
- Links
- Resources
Regression
- Slides
- Scripts
Dimension Reduction
- Slides
- Scripts
- Data
- Resources
Clustering
- Slides
- Scripts
- Data
Hypothesis Testing
- Slides
- Scripts
- Resources
Generally useful links
- Help and Information
- The R help mailing list: https://stat.ethz.ch/mailman/listinfo/r-help
- Rseek: the specialized search engine for R topics: http://rseek.org/
- R questions on stackoverflow: http://stackoverflow.com/questions/tagged/r
- The Comprehensive R Archive Network CRAN: http://cran.r-project.org/
- The CRAN task-view collection: http://cran.r-project.org/web/views/
- Bioconductor task views: http://www.bioconductor.org/packages/release/BiocViews.html
- Resources
Notes