CSB Assignment Week 2

Assignments for Week 2


Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

 
 

Assigned material will be reflected on next week's quiz. Please remember to contribute to quiz questions by Tuesday, 20:00.



Special dates
  • Post your workflow sketch by Monday.
  • Do the R tasks when they are announced.
  • Post your quiz questions by Tuesday, 20:00.
  • All other tasks are due by next week's class.


Warm up

You go to the Toronto Zoo. You see giraffes, ostriches and a green tree python. Altogether they have 30 eyes and 44 legs.

How many necks do these animals have? [I don't know...]

Seriously?
This is not so hard.
Maybe you are wondering whether snakes have necks? (TL;DR: It's complicated. But: yes.)
Or do you need a hint ... [Ok. A hint please...]

Maybe you are just confused by some irrelevant information. [No. I still don't get it...]

It's really quite simple. Thirty eyes are in fifteen heads. Fifteen heads attached to fifteen necks. Fifteeeen. No more. No less.
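How many of each? From the legs alone you can't tell: no fewer than two giraffes, no more than ten, and no more than eighteen ostriches. But fifteen heads minus one snake leaves fourteen, and fourteen animals on 44 legs means eight giraffes and six ostriches. But that wasn't the question.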
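
The counting in the warm-up can be checked by brute force. The following is only a minimal sketch in Python. It assumes two eyes per animal, four legs per giraffe, two legs per ostrich, none for the python, exactly one python, and at least two giraffes and two ostriches (because of the plurals). The function names are made up for this illustration. Under those assumptions, the leg count alone leaves several combinations open, while adding the eye count narrows them down.

```python
def leg_solutions(legs=44, min_each=2):
    """All (giraffes, ostriches) pairs that match the leg count alone."""
    pairs = []
    for giraffes in range(min_each, legs // 4 + 1):
        remaining = legs - 4 * giraffes      # legs left over for ostriches
        if remaining % 2 == 0 and remaining // 2 >= min_each:
            pairs.append((giraffes, remaining // 2))
    return pairs


def full_solutions(eyes=30, legs=44, pythons=1, min_each=2):
    """Pairs that match the leg count AND the head count implied by the eyes."""
    heads = eyes // 2                        # assumption: two eyes per head
    return [(g, o) for g, o in leg_solutions(legs, min_each)
            if g + o + pythons == heads]


if __name__ == "__main__":
    print("necks:", 30 // 2)                 # one neck per head
    print("legs only:", leg_solutions())
    print("legs and eyes:", full_solutions())
```

The neck count needs only the first line of the output; the rest is there to show which part of the information is actually irrelevant to the question.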


Towards systems discovery

In class, we have discussed a number of data sources, the exemplar workflows of the papers you have posted, and some strategies to determine whether genes could be functionally interacting, or "collaborating" with each other. I have distilled the data sources and the strategies into tables that I have posted on the Student Wiki's project resource section.

Task:

Existing databases and strategies
  1. Study the Data Sources page on the student Wiki. Navigate to the linked databases. Browse around. Get a sense of what data is available and how it can be accessed.
  2. For one of the databases, fill in the data access information.
  3. Study the System Discovery Strategies page on the student Wiki.
    1. Think about the listed strategies.
    2. See if you can add information.
    3. See if you can add a strategy.
    4. See if you can add a comment.

 

New workflows

I have put a Workflow Collection page on the student Wiki.

  1. Create a "Project" subpage on your User page (follow the instructions from Assignment 1). On that page, draft a workflow for data-driven systems discovery using data and strategies of your choice. Keep this as brief as possible (no more than three or four sentences). But be specific: make sure that the data you need is actually available, the algorithms are defined, and the computations are tractable. Discuss this on the list if you wish, or simply ask for feedback on your idea.
  2. Transclude your paragraph to the Workflow Collection (instructions are there).


 


 

Software Development

  • Habits (Projects, IDE and debugging, Version control)
  • Collaboration (Wiki, Git, Etherpad)
  • Development (TDD, Literate Development)
  • Testing (Unit testing, Integration testing)


  • Work through the RStudio development tutorial (TBD)
  • Work through the GitHub tutorial (TBD)


 


 

Pre-reading

In week 3, we will discuss various aspects of working with genome-scale data sets. For many experimental approaches, the ultimate outcome is a list of genes and the challenge is how to infer information from what such lists have in common:

Kim (2012) Chapter 8: Biological knowledge assembly and interpretation. PLoS Comput Biol 8:e1002858. (pmid: 23300429)

Most methods for large-scale gene expression microarray and RNA-Seq data analysis are designed to determine the lists of genes or gene products that show distinct patterns and/or significant differences. The most challenging and rate-limiting step, however, is to determine what the resulting lists of genes and/or transcripts biologically mean. Biomedical ontology and pathway-based functional enrichment analysis is widely used to interpret the functional role of tightly correlated or differentially expressed genes. The groups of genes are assigned to the associated biological annotations using Gene Ontology terms or biological pathways and then tested if they are significantly enriched with the corresponding annotations. Unlike previous approaches, Gene Set Enrichment Analysis takes quite the reverse approach by using pre-defined gene sets. Differential co-expression analysis determines the degree of co-expression difference of paired gene sets across different conditions. Outcomes in DNA microarray and RNA-Seq data can be transformed into the graphical structure that represents biological semantics. A number of biomedical annotation and external repositories including clinical resources can be systematically integrated by biological semantics within the framework of concept lattice analysis. This array of methods for biological knowledge assembly and interpretation has been developed during the past decade and clearly improved our biological understanding of large-scale genomic data from the high-throughput technologies.
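
To make the idea of testing a gene list for enrichment more concrete, here is a minimal sketch of an over-representation test based on the hypergeometric distribution. This is one common way to ask whether an annotation appears in a gene list more often than expected by chance; it is not necessarily the exact procedure the chapter describes. The function name and all numbers in the usage example are hypothetical, and a real analysis would also need a correction for testing many annotations at once.

```python
from math import comb


def enrichment_p_value(population, annotated, list_size, overlap):
    """Probability of seeing at least `overlap` annotated genes in a list of
    `list_size` genes drawn at random, without replacement, from `population`
    genes of which `annotated` carry the annotation (hypergeometric upper tail).
    """
    total = comb(population, list_size)      # all possible gene lists
    upper = min(annotated, list_size)        # largest possible overlap
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical numbers: 6000 genes in total, 120 of them carry the
    # annotation, and 15 of the 200 genes in our list carry it too.
    p = enrichment_p_value(population=6000, annotated=120,
                           list_size=200, overlap=15)
    print(f"p-value for the annotation: {p:.3g}")
```

A small p-value suggests that the annotation is over-represented in the list relative to the background population of genes.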