ABC-INT-Expression data

Integration Unit: Expression Data

(Integrator unit: select and normalize expression data)

Abstract:

This page integrates material from the learning units and defines a task for selecting and normalizing human expression data.

Deliverables:

Integrator unit: Deliverables can be submitted for course marks. See below for details.

Prerequisites:
This unit builds on material covered in the following prerequisite units:

RPR-GEO2R (GEO2R)

Work through the tasks described below. Remember to document your work in your journal.
Part of your task involves writing an R script. Place that code in a subpage of your User page on the Student Wiki and link to it from your Journal.
Your work must be complete before 21:00 on the day before your exam.
Schedule an oral exam by editing the signup page on the Student Wiki. You must have signed up for an exam slot before 20:00 on the day before your exam.

Your task is to select an expression dataset that is suitable for use as features for human genes in machine learning. Currently, expression data are collected with microarrays and from RNAseq experiments. If we want to combine data from different experiments in a single computational analysis, we need to consider very carefully how to prepare comparable values.

To begin, please read the following paper:

Taroni & Greene (2017) Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously (BioRχiv doi: https://doi.org/10.1101/118349)

We need expression datasets:

with good coverage;
not much older than ten years (quality!);
with sufficient numbers of replicates;
collected under interesting conditions;
mapped to unique human gene identifiers.
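The last requirement often means collapsing several probes or transcripts that map to the same gene into a single row. The following is a minimal sketch of that step, not the unit's prescribed method. It is written in Python purely for illustration (the submitted script is expected to be in R), and it assumes that averaging is an acceptable way to combine probes. The function name, probe names, and gene labels are hypothetical.

```python
from collections import defaultdict

def collapse_to_genes(probe_values, probe_to_gene):
    """Average all probes that map to the same gene (an assumed strategy).

    probe_values:  dict probe ID -> list of expression values, one per sample
    probe_to_gene: dict probe ID -> gene identifier
    Probes without a mapping are dropped.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # unmapped probe: cannot be assigned to a unique gene
        grouped[gene].append(values)

    collapsed = {}
    for gene, rows in grouped.items():
        # zip(*rows) iterates sample by sample across the probes of one gene
        collapsed[gene] = [sum(col) / len(col) for col in zip(*rows)]
    return collapsed

# Hypothetical toy data: two probes for GENE_A, one for GENE_B, one unmapped.
probe_values = {
    "probe_1": [2.0, 4.0],
    "probe_2": [4.0, 6.0],
    "probe_3": [1.0, 1.0],
    "probe_4": [9.0, 9.0],
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

gene_values = collapse_to_genes(probe_values, probe_to_gene)
print(gene_values)
```

Other choices, such as keeping the probe with the highest variance, are equally defensible. Whichever rule you use, be ready to justify it.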

As the result of this task, you should prepare a script that will produce one reference and one experimental data set for human genes (from the same experiment).

To avoid mistakes in preparing the dataset, discuss with your team members, or post questions on the mailing list. You are encouraged to discuss strategies with anyone. However, the script you submit must be entirely your own, and you must not copy code (apart from the script template) from elsewhere.

Select an Expression Data Set

Task:

Clean it and impute missing data

Task:
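This page does not prescribe a cleaning or imputation method, so the following is only a sketch of one simple possibility: drop genes with too many missing values, then fill the remaining gaps with the gene's own mean. It is written in Python purely for illustration (the submitted script is expected to be in R). The function name, the 50% threshold, and the toy data are assumptions.

```python
from statistics import mean

def clean_and_impute(rows, max_missing_fraction=0.5):
    """Drop sparse genes, then impute missing values with the row mean.

    rows: dict gene -> list of values, with None marking a missing value.
    The threshold is an illustrative assumption, not a course requirement.
    """
    cleaned = {}
    for gene, values in rows.items():
        observed = [v for v in values if v is not None]
        if not observed:
            continue  # nothing measured for this gene
        missing_fraction = (len(values) - len(observed)) / len(values)
        if missing_fraction > max_missing_fraction:
            continue  # too sparse to impute credibly
        fill = mean(observed)
        cleaned[gene] = [fill if v is None else v for v in values]
    return cleaned

# Hypothetical toy data.
example = {
    "GENE_A": [1.0, None, 3.0, 2.0],     # one gap: imputed
    "GENE_B": [None, None, None, 4.0],   # 75% missing: dropped
    "GENE_C": [5.0, 6.0, 7.0, 8.0],      # complete: unchanged
}
result = clean_and_impute(example)
print(result)
```

Row-mean imputation is easy to explain but ignores structure across samples. More elaborate approaches exist, and choosing between them is part of the task.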

Apply Quantile Normalization (QN)

Task:

Post a script that will download the dataset and perform all required operations.
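To make the normalization step concrete, here is a minimal sketch of basic quantile normalization on a small genes-by-samples matrix. It is written in Python purely for illustration (the submitted script is expected to be in R). It assumes a complete matrix with no missing values and breaks ties by order of appearance rather than averaging them. The function name and the toy values are hypothetical.

```python
def quantile_normalize(matrix):
    """Basic quantile normalization of a genes x samples matrix.

    Each sample (column) is mapped onto the same reference distribution:
    the mean of the r-th smallest values taken across all samples.
    Assumes no missing values; ties are broken by row order.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # orders[j][r] is the row index of the r-th smallest value in column j
    orders = [
        sorted(range(n_rows), key=lambda i, col=col: col[i])
        for col in columns
    ]

    # reference distribution: mean across samples at each rank
    rank_means = [
        sum(columns[j][orders[j][r]] for j in range(n_cols)) / n_cols
        for r in range(n_rows)
    ]

    # replace every value with the reference value for its rank
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for r, i in enumerate(orders[j]):
            normalized[i][j] = rank_means[r]
    return normalized

# Hypothetical toy matrix: 4 genes (rows) x 3 samples (columns).
expr = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.5, 6.0],
    [4.0, 2.0, 8.0],
]
qn = quantile_normalize(expr)
for row in qn:
    print(row)
```

After normalization every sample contains exactly the same set of values, only assigned to different genes. That shared distribution is what makes the samples comparable.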

Interpret

Be prepared to answer the following questions:

Task:

What is the coverage of your dataset?
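One way to put a number on this question is sketched below. It assumes that "coverage" means the fraction of a reference set of human gene symbols for which the dataset has a row, which is only one possible reading. It is written in Python purely for illustration (the submitted script is expected to be in R). The function name and the short symbol lists are placeholders for illustration.

```python
def coverage(dataset_genes, reference_genes):
    """Fraction of reference genes that appear in the dataset.

    This definition of coverage is an assumption made for illustration.
    """
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference gene set is empty")
    covered = reference & set(dataset_genes)
    return len(covered) / len(reference)

# Placeholder lists; a real reference would hold all gene symbols of interest.
reference_symbols = ["TP53", "BRCA1", "MYC", "EGFR"]
dataset_symbols = ["TP53", "MYC", "EGFR", "NOT_A_REFERENCE_GENE"]

fraction = coverage(dataset_symbols, reference_symbols)
print(f"coverage: {fraction:.2f}")
```

Symbols in the dataset that are missing from the reference do not count toward coverage, but they are worth inspecting because they often point to outdated or ambiguous identifiers.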

Notes

Note: oral exams will focus on the content of Integrator Units, but will also cover material that leads up to them. All exams in this course are cumulative.
