Difference between revisions of "BIO Assignment Week 1"
Revision as of 20:26, 14 September 2016
Assignment for Week 1
Preparations: Wiki editing, Chimera and R
Assignment 2 > |
Note! This assignment is currently active. All significant changes will be announced on the mailing list.
Concepts and activities (and reading, if applicable) for this assignment will be topics on next week's quiz.
The Assignment
In this assignment you will:
- familiarize yourself with basic Wiki editing on the Student Wiki;
- install the statistics workbench R, and the RStudio user interface, and work through parts of an introductory tutorial.
Caution: this is a lengthy assignment and can't be done in one day. Work on it every day, or better, every morning and evening. A lot of this has to do with the first steps of learning the R programming language, and you need constant repetition to bring this material into active memory. Cramming everything into a single, desperate effort makes you forget things quickly and is a waste of your time.
Wiki
Collaboration is a common theme for modern lab work and a Wiki is a great way to share and seamlessly update information in groups - or just for yourself. Probably the most sophisticated Wiki software is MediaWiki, a set of PHP scripts that is under continuous development by the Wikimedia Foundation; it is the same software that runs Wikipedia. This is open source, free software that is easy to install, is well documented and requires very few resources other than a machine that runs a MySQL database server and an Apache Webserver. Numerous extensions exist (and extensions are not hard to write); they enhance the already rich functionality. But let's start with small steps. I will create an account for you on the Student Wiki, and I have configured the Wiki so that
- only logged in users can view the pages;
- all logged in users can create and edit pages at will.
This means you could edit pages that don't "belong" to you. Respect the "House Rules" and don't edit others' pages without permission, even if you can think of a particularly witty comment or hilarious prank. If you want to comment on a page: every page has an associated "Discussion" page that you can freely edit. Remember to "sign your name" to discussion entries.
Task:
- Access the Student Wiki;
- log in and navigate to your user page;
- open the "Help" link in the left-hand sidebar in a separate tab;
- follow the link to the "Editing" page on the Student Wiki;
- try and learn basic editing syntax by editing your User Page:
- enter your name,
- your major(s), specialist program, year of study - or your lab and thesis theme if you are a graduate student;
- and your eMail address. I use this information a lot when I need to contact students, so make sure it is correct and current.
- Add a category tag to your User page for this year's BCH441 course. All pages with this tag are accessible via the link in the sidebar.
Feel free to look at my User Page for code examples: clicking on the edit link will show you the source text. How do you find my User Page? Good question ...
- Create a subpage to your User Page; call it "Resources" or something similar. Note: the link MUST be in your "User space". If you don't add the prefix User:yourname/... before your page name, the new page will end up in the main "namespace". I'll then have to delete it. That's not good because you have then failed this part of the assignment. Make sure you know what you are doing, for example by looking at the code on my User Page, asking someone who knows, or asking on the mailing list.
- Put some text on your new page - perhaps a link to a Wikipedia article, or to PubMed, or to the NCBI. Make sure you understand the difference between an internal link and an external link (they have slightly different formats), and you understand the concept of namespace and categories. Also add a category link to that page.
- Play around some more. Feel free to ask how to go about achieving a particular effect that you may have seen elsewhere.
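As a rough illustration of the links and tags described in the task above, the wikitext on a user page might look like the sketch below. The subpage name "Resources" follows the suggestion in the task; the category name BCH441_2016 is only a placeholder assumption, because the actual tag is the one behind the sidebar link, and yourname stands for your own login name.

```wikitext
<!-- Internal link to a subpage in your User space.
     Following the link to a page that does not exist yet lets you create it. -->
[[User:yourname/Resources]]

<!-- The same internal link, shown with a custom label after the pipe -->
[[User:yourname/Resources|My resources]]

<!-- External links: single brackets, a full URL, a space, then the label -->
[https://www.ncbi.nlm.nih.gov/ NCBI]
[https://en.wikipedia.org/wiki/Bioinformatics Wikipedia article on bioinformatics]

<!-- Category tag (placeholder name): adds this page to the category -->
[[Category:BCH441_2016]]
```

Internal links use double square brackets around a page title, including its namespace prefix; external links use single square brackets around a URL. Leaving out the User:yourname/ prefix in the first link would point to a page in the main namespace instead.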
For next week, you should be comfortable with the following mark-up conventions and concepts:
- Login and accessing your user page;
- viewing a page's history;
- basic text formatting;
- "signing" your name;
- creating internal and external links;
- creating section headers on a page on multiple levels;
- reverting a changed page to an earlier version;
- creating a new page (as a subpage of an existing page);
- the concept of namespaces - especially the default ("main") and User: namespace;
- the concept of categories.
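Some of the mark-up conventions in this list can be sketched in wikitext as follows. This is a minimal sample that assumes standard MediaWiki markup; the header texts and sentences are made up for illustration and are not taken from any page on the Student Wiki.

```wikitext
== Section header ==
=== Subsection header ===

Plain text with ''italic'' and '''bold''' words.

* a bulleted item
* another bulleted item

A comment on a Discussion page, signed with four tildes: ~~~~
```

When the page is saved, the four tildes are replaced by your user name and a timestamp. Viewing a page's history and reverting to an earlier version are not done through markup; in a default MediaWiki setup they are reached through the history view of the page.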
I expect that there may be aspects of the Wiki you find puzzling; it is, after all, a complex piece of software that supports the world's largest collaborative project and one of the busiest sites on the Internet. Do ask about these things on the mailing list. My first encounter with Wikis was a while back and I can't remember everything I was initially confused about.
R
The R statistics environment and programming language is an exceptionally well-engineered, free (as in free speech) and free (as in free beer) platform for data manipulation and analysis. The number of functions that are included by default is large. There is also a very large number of additional, community-generated analysis modules that can simply be imported from dedicated sites (e.g. the Bioconductor project for molecular biology data) or via the CRAN network, and whatever function is not available can be easily programmed. The ability to filter and manipulate data to prepare it for analysis is an absolute requirement in research-centric fields such as ours, where the strategies for analysis are constantly shifting and prepackaged solutions become obsolete almost faster than they can be developed. Besides numerical analysis, R has very powerful and flexible functions for plotting graphical output.
Learning to work with R code and an introduction to programming in R is one focus of the course.
If any of this material is confusing, discuss it on the mailing list. At the end of this assignment you should have a working installation of R and RStudio, be able to check out material from GitHub, be competent to read expressions in basic R syntax, be able to predict their result and spot syntax errors, be familiar with the concepts in the tutorial, and be able to write your own expressions.
Links and resources
Footnotes and references
Ask, if things don't work for you!
- If anything about the assignment is not clear to you, please ask on the mailing list. You can be certain that others will have had similar problems. Success comes from joining the conversation.
- Do consider how to ask your questions so that a meaningful answer is possible:
- "How to create a Minimal, Complete, and Verifiable example" on Stack Overflow and "How to make a great R reproducible example" are required reading.