BIO Assignment Week 2
Revision as of 19:33, 25 September 2015
Assignment for Week 2
Scenario, Labnotes, R-functions, Databases, Data Modeling
Note! This assignment is currently active. All significant changes will be announced on the mailing list.
- Parts labelled as "TBC" are in progress and will be made available as they are completed.
Concepts and activities (and reading, if applicable) for this assignment will be topics on next week's quiz.
The Scenario
I have introduced the concept of "cargo cult science" in class. The "cargo" in bioinformatics is to understand biology. This includes understanding how things came to be the way they are, and how they work. Both relate to the function of biomolecules and the systems they contribute to. But "function" is a rather poorly defined concept, and exploring ways to make it rigorous and computable will be the major objective of this course. The realm of bioinformatics contains many kingdoms and duchies and shires and hidden glades. To find out how they contribute to the whole, we will proceed on a quest. We will take a relatively well-characterized protein that is part of a relatively well-characterized process, and ask what its function is. We will examine the protein's sequence, its structure, its domain composition, and its relationships to and interactions with other proteins, and through that paint a picture of a "system" that it contributes to.
Our quest will revolve around a transcription factor that plays an important role in the regulation of the cell cycle: Mbp1 is a key component of the MBF complex (Mbp1/Swi6) in yeast. This complex regulates gene expression at the crucial G1/S-phase transition of the mitotic cell cycle and has been shown to bind to the regulatory regions of more than a hundred target genes. It is therefore a DNA-binding protein that acts as a control switch for a key cellular process.
We will start our quest with information about the Mbp1 protein of Baker's yeast, Saccharomyces cerevisiae, one of the most important model organisms. Baker's yeast is a eukaryote that has been studied genetically and biochemically in great detail for many decades, and it is easily manipulated with high-throughput experimental methods. But each of you will use this information to study not Baker's yeast, but a related organism. You will explore the function of the Mbp1 protein in some other species from the kingdom of fungi, whose genome has been completely sequenced; thus our quest is also an exercise in model-organism reasoning: the transfer of knowledge from one, well-studied organism to others.
It's reasonable to hypothesize that such central control machinery is conserved in most if not all fungi. But we don't know. Many of the species that we will be working with have not been characterized in great detail, and some of them are new to our class this year. And while we know a fair bit about Mbp1, we probably don't know very much at all about the related genes in other organisms: whether they exist, whether they have similar functional features and whether they might contribute to the G1/S checkpoint system in a similar way. Thus we might discover things that are new and interesting. This is a quest of discovery.
Here are the steps of the assignment for this week:
- We'll need to explore what data is available for the Mbp1 protein.
- We'll need to pick a species to adopt for exploration.
- We'll need to define what data we want to store and design a data model.
However, before we head off into the Internet: have you thought about how to document such a "quest"? How will you keep notes? Obviously, computational research follows the same best-practice principles as any wet-lab experiment. We have to keep notes, ensure that our work is reproducible, and ensure that our conclusions are supported by data. I think it's pretty obvious that paper notes are not very useful for bioinformatics work. Ideally, you should be able to save results and link to files and Web pages.
Keeping Labnotes
Consider it a part of your assignment to document your activities in electronic form. Here are some applications you might consider. Disclaimer: I don't use any of these myself (yet), except the Wiki of course.
- Evernote - a web hosted, automatically syncing e-notebook.
- Nevernote - the Open Source alternative to Evernote.
- Google Keep - if you have a Gmail account, you can simply log in here. Grid-based. Seems a bit awkward for longer notes. But of course you can also use Google Docs.
- Microsoft OneNote - this sounds interesting, and even though I have had my share of problems with Microsoft products, I'll probably give it a try. Cross-platform syncing and the ability to format and organize content sound great.
- The Student Wiki - of course. You can keep your course notes with your User pages.
Are you aware of any other solutions? Let us know!
Keeping such a journal will be helpful, because the assignments are integrated over the entire term and later assignments will make use of earlier results. But it is also excellent practice for "real" research.
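If you keep your journal as plain files rather than in one of the applications above, a few lines of code can enforce the basics: every entry is dated, and results are recorded as links to files or Web pages rather than retyped. The Python sketch below is only an illustration; the helper name `add_entry` and the Markdown-style entry layout are assumptions, not a course requirement.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def add_entry(notebook: Path, text: str,
              links: Optional[dict[str, str]] = None,
              when: Optional[datetime] = None) -> str:
    """Append a dated entry to a plain-text lab notebook and return it.

    Hypothetical helper: `links` maps a short label to a file path or URL,
    so that results are referenced instead of copied into the notes.
    """
    when = when or datetime.now()
    lines = [f"## {when:%Y-%m-%d %H:%M}", text]
    for label, target in (links or {}).items():
        lines.append(f"- [{label}]({target})")
    entry = "\n".join(lines) + "\n\n"
    # Append mode: earlier entries are never overwritten.
    with notebook.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

For example, `add_entry(Path("labnotes.md"), "Browsed the SGD summary page for Mbp1.", {"my notes": "sgd_mbp1_notes.txt"})` appends a timestamped entry that links to a (hypothetical) local results file. Opening the file in append mode keeps the journal chronological and prevents accidental loss of earlier entries.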
Data Sources
SGD - a Yeast Model Organism Database
Yeast happens to have a very well-maintained model organism database: a Web resource dedicated to Saccharomyces cerevisiae. Where such resources are available, they are very useful for the community. In the general case, however, we need to work with one of the large, general data providers: the NCBI and the EBI. But to get a sense of the type of data that is available, let's visit the SGD database first.
Task:
Access the information page on Mbp1 at the Saccharomyces Genome Database.
- Browse through the Summary page and note the available information. You should see:
    - Information about the gene and the protein;
    - Information about its roles in the cell, curated at the Gene Ontology database;
    - Information about knock-out phenotypes (Amazing. Would you have imagined that this is a non-essential gene?);
    - Information about protein-protein interactions;
    - Regulation and expression;
    - A curators' summary of our understanding of the protein. Mandatory reading.
    - And key references.
- Access the Protein tab and note the much more detailed information:
    - Domains and their classification;
    - Sequence;
    - Shared domains;
    - and much more...
You will notice that some of this information relates to the molecule itself, and some of it relates to its relationship with other molecules. Some of it is stored at SGD, and some of it is cross-referenced from other databases. And we have textual data, numeric data, and images.
How would you store such data to use it in your project? We will work on this question at the end of the assignment.
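One way to start thinking about this question is to sort the SGD information along the distinctions just made: properties of the molecule itself versus its relationships with other molecules, and data stored locally versus data cross-referenced from other databases. The Python sketch below is only a preliminary illustration under those assumptions; all class and field names are hypothetical and are neither SGD's schema nor the data model we will design later.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CrossReference:
    """A pointer to a record held in another database (not copied locally)."""
    database: str      # name of the external database
    accession: str     # identifier of the record in that database


@dataclass
class Interaction:
    """A relationship between two molecules, e.g. a protein-protein interaction."""
    partner: str                      # name of the interacting molecule
    kind: str                         # e.g. "physical" or "genetic"
    reference: Optional[str] = None   # literature reference supporting it


@dataclass
class Protein:
    # Intrinsic data: properties of the molecule itself.
    name: str
    organism: str
    sequence: str = ""
    domains: list[str] = field(default_factory=list)
    # Relational data: how the molecule relates to other molecules.
    interactions: list[Interaction] = field(default_factory=list)
    # Data held elsewhere: referenced rather than stored.
    xrefs: list[CrossReference] = field(default_factory=list)

    def partners(self) -> set[str]:
        """Names of all molecules this protein is recorded to interact with."""
        return {i.partner for i in self.interactions}
```

For example, recording the MBF complex partner mentioned above:

```python
mbp1 = Protein(name="Mbp1", organism="Saccharomyces cerevisiae")
mbp1.interactions.append(Interaction(partner="Swi6", kind="physical"))
print(mbp1.partners())  # {'Swi6'}
```

Using `default_factory` gives every `Protein` its own empty lists, and keeping relationships and cross-references apart from intrinsic properties makes explicit which data you hold yourself and which you depend on other databases for.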
If we were working on yeast, most of the data we need would be right here: curated, kept current and consistent, referenced to the literature and ready to use. But you'll be working on a different species, and for this we'll explore the much, much larger databases at the NCBI. The upside is that most information of this kind is available for your species. The downside is that we'll have to integrate information from many different sources "by hand".
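Integrating "by hand" still benefits from some discipline: every value you collect should carry a note of where it came from, so that conflicting values from different sources stay visible instead of silently overwriting each other. The minimal Python sketch below assumes that each source yields a flat dictionary of fields for the same gene; the function names `merge_sources` and `conflicts` are hypothetical.

```python
from collections import defaultdict


def merge_sources(records: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Merge per-source records for one gene into field -> value -> sources.

    `records` maps a source name to the fields that source reports.
    Identical values from different sources are grouped together; distinct
    values for the same field are all kept, so conflicts remain visible.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for source, fields in records.items():
        for key, value in fields.items():
            merged[key][value].append(source)
    # Convert the nested defaultdicts to plain dicts for predictable output.
    return {key: dict(values) for key, values in merged.items()}


def conflicts(merged: dict[str, dict[str, list[str]]]) -> list[str]:
    """Return, sorted, the fields for which the sources report different values."""
    return sorted(key for key, values in merged.items() if len(values) > 1)
```

For example, with two placeholder sources that agree on a gene symbol but not on a description, `merge_sources({"source_A": {"symbol": "MBP1", "description": "value A"}, "source_B": {"symbol": "MBP1", "description": "value B"}})` groups both sources under `"MBP1"`, and `conflicts(...)` on the result returns `["description"]`, which flags the field you need to resolve by reading the underlying evidence.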
NCBI
TBC
Choosing YFO (Your Favourite Organism)
TBC
Data modelling
TBC
- That is all.
Ask if things don't work for you!
- If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.
- Do consider how to ask your questions so that a meaningful answer is possible:
    - "How to create a Minimal, Complete, and Verifiable example" on Stack Overflow and "How to make a great R reproducible example" are required reading.