Difference between revisions of "FND-MAT-Graphs and networks"
m |
m |
||
Line 44: | Line 44: | ||
<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" --> | <!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" --> | ||
You need to complete the following units before beginning this one: | You need to complete the following units before beginning this one: | ||
− | *[[RPR-Introduction]] | + | *[[RPR-Introduction|RPR-Introduction (Introduction to R)]] |
{{Vspace}} | {{Vspace}} | ||
Line 51: | Line 51: | ||
=== Objectives === | === Objectives === | ||
<!-- included from "../components/FND-MAT-Graphs_and_networks.components.wtxt", section: "objectives" --> | <!-- included from "../components/FND-MAT-Graphs_and_networks.components.wtxt", section: "objectives" --> | ||
− | ... | + | This unit will ... |
+ | * ... introduce ; | ||
+ | * ... demonstrate ; | ||
+ | * ... teach ; | ||
{{Vspace}} | {{Vspace}} | ||
Line 58: | Line 61: | ||
=== Outcomes === | === Outcomes === | ||
<!-- included from "../components/FND-MAT-Graphs_and_networks.components.wtxt", section: "outcomes" --> | <!-- included from "../components/FND-MAT-Graphs_and_networks.components.wtxt", section: "outcomes" --> | ||
− | ... | + | After working through this unit you ... |
+ | * ... can ; | ||
+ | * ... are familar with ; | ||
+ | * ... have begun to. | ||
{{Vspace}} | {{Vspace}} | ||
Line 77: | Line 83: | ||
=== Evaluation === | === Evaluation === | ||
<!-- included from "../components/FND-MAT-Graphs_and_networks.components.wtxt", section: "evaluation" --> | <!-- included from "../components/FND-MAT-Graphs_and_networks.components.wtxt", section: "evaluation" --> | ||
− | <!-- included from "ABC-unit_components.wtxt", section: " | + | This learning unit can be evaluated for a maximum of 6 marks. If you want to submit tasks for this unit for credit you have the following options. If you have any questions about these options, discuss on the mailing list. |
− | <b>Evaluation: | + | |
− | :This unit | + | ; Context |
+ | :The R project for the learning units contains a network that I have prepared from functional interaction data from the STRING database, subset with GO annotations for yeast. STRING curates networks of genes that have functional interaction, based on a variety of data such as experimental evidence, literature mining and more. GO (Gene Ontology) categorizes genes with functional annotations. The network I have prepared contains all yeast genes that have been annotated to the <code>"mitotic cell cycle"</code> and that have high-confidence functional interactions as determined by STRING. It contains 261 genes and 1280 interactions. You can load the edge list with the command ... | ||
+ | |||
+ | <source> | ||
+ | load("./data/scCCnet.RData") | ||
+ | </source> | ||
+ | ... which will place the "''tibble''" <code>scCCnet</code> (''S. cerevisiae'' Cell Cycle network) into your workspace. ''tibbles'' are modern versions of data frames and can be used just like data frames for most practical purposes<ref>Post o the mailing list in case you come into a situation where you are trying to do something that ought to have worked with a data frame, but does not work in the same way with a ''tibble''.</ref>. The code I wrote to create this ''tibble'' is in <code>./scripts/ABC-makeScCCnet.R</code> - it may be worthwhile for you to look at the script to get a better understanding what this data is. | ||
+ | |||
+ | ; Short Report option | ||
+ | :'''1.''' Create a new page on the student Wiki as a subpage of your User Page. | ||
+ | :'''2.''' Write a short report on '''one of the three following topics - A, B, or C'''. The goal of this short report is to connect graph measures to the biology of the nodes. (All reports must have the R code you wrote in an appendix.) | ||
+ | |||
+ | ::'''A.1''' Create an informative overview plot of the network to highlight the degree distribution. I expect that <code>layout_with_graphopt()</code> will work reasonably well, but you may need to play with the parameters. Color and scale the nodes according to their degree. | ||
+ | ::'''A.2''' Plot a log-frequency against log-rank plot of the degree distribution. Does this look like a scale-free network? Interpret this result. What are the highest-degree nodes? What are the lowest-degree nodes? Do genes in these categories have anything in common? | ||
+ | |||
+ | {{Smallvspace}} | ||
+ | |||
+ | ::'''B.1''' Create an informative overview plot of the network to highlight the centrality scores of the nodes. Choose one of: betweenness centrality, closeness centrality, or eigenvector centrality for your analysis. I expect that <code>layout_with_graphopt()</code> will work reasonably well for the plot, but you may need to play with the parameters. Color and scale the nodes according to their centrality score. | ||
+ | ::'''B.2''' Order the nodes according to their score. Choose the nodes with the 10 highest centrality scores. Interpret the results. What are the highest-centrality nodes? Do genes with high centrality have anything in common? | ||
+ | |||
+ | {{Smallvspace}} | ||
+ | |||
+ | ::'''C.1''' Create an informative overview plot of the network to highlight the community structure of the network. Choose <code>cluster_infomap()</code> and one of the eight other clustering algorithms of igraph. I expect that <code>layout_with_graphopt()</code> will work reasonably well for the plot, but you may need to play with the parameters. Color and scale the nodes according to their community membership. | ||
+ | ::'''C.2''' Compare the two methods. Do the clusters overlap? Are they very different? Considering the biology of the nodes, which clustering method was more successful? | ||
+ | ::'''C.3''' For the "better" of the two methods, analyse the top 4 communities. Why do these genes have more functional interactions with each other than with other genes? Do the memebrs of these clusters have anything in common? | ||
+ | |||
+ | {{Smallvspace}} | ||
+ | |||
+ | ::Hint: The Saccharomycs Genome Database hosts [https://yeastmine.yeastgenome.org/yeastmine/bag.do '''YeastMine'''] - a Web tool into which you can paste a number of yeast gene IDs, and retrieve summary information including symbol, '''name''', and sequence length. | ||
+ | |||
+ | {{Smallvspace}} | ||
+ | |||
+ | :'''3.''' When you are done with your report, add the following category tag to the page: | ||
+ | ::<code><nowiki>[[Category:Eval-FND-Graphs]]</nowiki></code> | ||
+ | :'''Do not''' change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor. | ||
+ | |||
+ | {{Smallvspace}} | ||
+ | |||
+ | <!-- | ||
+ | ; Tasks submission option | ||
+ | :# Create a new page on the student Wiki as a subpage of your User Page. | ||
+ | :# There are a number of tasks in which you are explicitly asked you to submit code or other text for credit. Put all of these submission on this one page. | ||
+ | :# When you are done with everything, add the following category tag to the page: | ||
+ | ::<code><nowiki>[[Category:EVAL-FND-Graphs]]</nowiki></code> | ||
+ | :'''Do not''' change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor. | ||
+ | --> | ||
+ | |||
+ | ; Quiz option | ||
+ | : Open the [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/Signup-FND-Graphs_Quiz '''signup-page for the quiz for this unit (linked from here)'''] and add your name. Your name must be signed up by 12:00 of the day of the Quiz to ensure copies of the quiz are available for all participants. | ||
+ | <!-- included from "ABC-unit_components.wtxt", section: "quiz-mechanics" --> | ||
+ | <b>Evaluation: Quiz</b><br /> | ||
+ | :Quizzes will be written in class, back-to-back if there is more than one quiz scheduled. We may begin at any time. We will have an open-ended Q&A session before the quiz. You can't take the quiz if you are not present in class when the question sheets are handed out, so don't be late. Once all scheduled quizzes are written, we will discuss and mark them. You will mark your own quiz. All marking must be done with a red pen - so you '''must''' bring a red pen to class in order to participate. The mark you give yourself may be revised by the instructor after spot-checking quizzes. If this is necessary, you will be notified. You must mark your quiz correctly and honestly - don't get into trouble with academic integrity rules: it will be an academic offence if you mark questions as correct that were discussed in class and should have been marked incorrect. When in doubt, ask. | ||
+ | |||
+ | {{Smallvspace}} | ||
+ | |||
+ | ; R-code option | ||
+ | :The graph defined by the <code>scCCnet</code> edgelist contains a pronounced community structure with several large modules of similar size. This has an obvious biological interpretation of systems of functionally interacting genes. But is this an independent feature that has evolved due to functional requirements in these nodes? Or is it simply a consequence of the degree distribution? Your task is to assess this question by comparing the <code>scCCnet</code> graph to random graphs with the same degree distribution. | ||
+ | :Submit code according to the following requirements. Make sure your code is documented and that you have tested your functions to give correct output. | ||
+ | :*Write a function that takes an igraph graph as input and outputs the size difference between the largest and the second largest community, as defined by <code>cluster_infomap()</code>. | ||
+ | :*Write a function that takes an igraph graph as input and returns a random graph with the same degree distribution. Use the <code>sample_degseq()</code> function internally, with the <code>"vl"</code> method. | ||
+ | :*Write a script that produces a vector of 1,000 results from your first function, calculated for a graph produced by your second function using the <code>scCCnet</code> graph as input. | ||
+ | :*Plot and interpret the result. | ||
+ | :#Create a new page on the student Wiki as a subpage of your User Page. Put your code there. | ||
+ | :#When you are done with developing this contents, add the following category tag to the page: | ||
+ | ::<code><nowiki>[[Category:EVAL-FND-Graphs]]</nowiki></code> | ||
+ | :'''Do not''' change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor. | ||
+ | |||
+ | {{Smallvspace}} | ||
+ | |||
+ | ; Option to write a "Self-Evaluation Question" | ||
+ | :# Write a "Self-evaluation Question" (with a model solution) that explores the actual use of centrality measures in the recent biomedical literature. Find a recent article (not older than 2 years) that uses a centrality score in the analysis of biological networks. Summarize the use of this measure, and then quote a passage from the article that describes how the measure was used. Then ask the learner to paraphrase what this passage means, and how the measure relates to the biology that is being described. Ensure that the answer is feasible, given what was discussed in the unit, and what was summarized and quoted by you. The goal is for the learner to think about the biological interpretation of graph-theoretical measures obtained from biological networks. Apply the [[ABC-Rubrics| '''marking rubrics''']] in spirit to satisfy yourself of the quality of your question. Use the format and code templates that you find on the [[Self_evaluation_questions|'''Self evaluation questions page''']] - but don't assume those examples are already models of excellent contributions. This will be a short-answer format question. | ||
+ | :#Create a new page on the student Wiki as a subpage of your User Page. Develop your question there. | ||
+ | :#When you are done with developing this contents, add the following category tag to the page: | ||
+ | ::<code><nowiki>[[Category:EVAL-FND-Graphs]]</nowiki></code> | ||
+ | :'''Do not''' change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor. | ||
{{Vspace}} | {{Vspace}} | ||
Line 91: | Line 171: | ||
{{Task|1= | {{Task|1= | ||
* Read the introductory notes on {{ABC-PDF|FND-MAT-Graphs_and_networks|graph theory and network science}}. | * Read the introductory notes on {{ABC-PDF|FND-MAT-Graphs_and_networks|graph theory and network science}}. | ||
+ | * For a useful overview of graph-theory concepts you could additionally have a look at: | ||
+ | {{#pmid: 21527005}} | ||
+ | |||
}} | }} | ||
+ | {{Smallvspace}} | ||
+ | {{ABC-unit|FND-MAT-Graphs_and_networks.R}} | ||
− | {{ | + | {{Vspace}} |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
Line 177: | Line 254: | ||
:2017-08-05 | :2017-08-05 | ||
<b>Version:</b><br /> | <b>Version:</b><br /> | ||
− | :0 | + | :1.0 |
<b>Version history:</b><br /> | <b>Version history:</b><br /> | ||
− | *0.1 | + | *1.0 First live version |
+ | *0.1 Stub only | ||
</div> | </div> | ||
[[Category:ABC-units]] | [[Category:ABC-units]] |
Revision as of 12:40, 7 October 2017
Graphs and Networks
Keywords: Intoduction to graph theory and network science; iGraph
Contents
This unit is under development. There is some contents here but it is incomplete and/or may change significantly: links may lead to nowhere, the contents is likely going to be rearranged, and objectives, deliverables etc. may be incomplete or missing. Do not work with this material until it is updated to "live" status.
Abstract
...
This unit ...
Prerequisites
You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:
- Calculus: functions and equations; polynomial functions, logarithms, trigonometric functions; integrals and derivatives; theorem and proof.
You need to complete the following units before beginning this one:
Objectives
This unit will ...
- ... introduce ;
- ... demonstrate ;
- ... teach ;
Outcomes
After working through this unit you ...
- ... can ;
- ... are familar with ;
- ... have begun to.
Deliverables
- Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
- Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
- Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Evaluation
This learning unit can be evaluated for a maximum of 6 marks. If you want to submit tasks for this unit for credit you have the following options. If you have any questions about these options, discuss on the mailing list.
- Context
- The R project for the learning units contains a network that I have prepared from functional interaction data from the STRING database, subset with GO annotations for yeast. STRING curates networks of genes that have functional interaction, based on a variety of data such as experimental evidence, literature mining and more. GO (Gene Ontology) categorizes genes with functional annotations. The network I have prepared contains all yeast genes that have been annotated to the
"mitotic cell cycle"
and that have high-confidence functional interactions as determined by STRING. It contains 261 genes and 1280 interactions. You can load the edge list with the command ...
load("./data/scCCnet.RData")
... which will place the "tibble" scCCnet
(S. cerevisiae Cell Cycle network) into your workspace. tibbles are modern versions of data frames and can be used just like data frames for most practical purposes[1]. The code I wrote to create this tibble is in ./scripts/ABC-makeScCCnet.R
- it may be worthwhile for you to look at the script to get a better understanding what this data is.
- Short Report option
- 1. Create a new page on the student Wiki as a subpage of your User Page.
- 2. Write a short report on one of the three following topics - A, B, or C. The goal of this short report is to connect graph measures to the biology of the nodes. (All reports must have the R code you wrote in an appendix.)
- A.1 Create an informative overview plot of the network to highlight the degree distribution. I expect that
layout_with_graphopt()
will work reasonably well, but you may need to play with the parameters. Color and scale the nodes according to their degree. - A.2 Plot a log-frequency against log-rank plot of the degree distribution. Does this look like a scale-free network? Interpret this result. What are the highest-degree nodes? What are the lowest-degree nodes? Do genes in these categories have anything in common?
- A.1 Create an informative overview plot of the network to highlight the degree distribution. I expect that
- B.1 Create an informative overview plot of the network to highlight the centrality scores of the nodes. Choose one of: betweenness centrality, closeness centrality, or eigenvector centrality for your analysis. I expect that
layout_with_graphopt()
will work reasonably well for the plot, but you may need to play with the parameters. Color and scale the nodes according to their centrality score. - B.2 Order the nodes according to their score. Choose the nodes with the 10 highest centrality scores. Interpret the results. What are the highest-centrality nodes? Do genes with high centrality have anything in common?
- B.1 Create an informative overview plot of the network to highlight the centrality scores of the nodes. Choose one of: betweenness centrality, closeness centrality, or eigenvector centrality for your analysis. I expect that
- C.1 Create an informative overview plot of the network to highlight the community structure of the network. Choose
cluster_infomap()
and one of the eight other clustering algorithms of igraph. I expect thatlayout_with_graphopt()
will work reasonably well for the plot, but you may need to play with the parameters. Color and scale the nodes according to their community membership. - C.2 Compare the two methods. Do the clusters overlap? Are they very different? Considering the biology of the nodes, which clustering method was more successful?
- C.3 For the "better" of the two methods, analyse the top 4 communities. Why do these genes have more functional interactions with each other than with other genes? Do the memebrs of these clusters have anything in common?
- C.1 Create an informative overview plot of the network to highlight the community structure of the network. Choose
- Hint: The Saccharomycs Genome Database hosts YeastMine - a Web tool into which you can paste a number of yeast gene IDs, and retrieve summary information including symbol, name, and sequence length.
- 3. When you are done with your report, add the following category tag to the page:
[[Category:Eval-FND-Graphs]]
- Do not change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
- Quiz option
- Open the signup-page for the quiz for this unit (linked from here) and add your name. Your name must be signed up by 12:00 of the day of the Quiz to ensure copies of the quiz are available for all participants.
Evaluation: Quiz
- Quizzes will be written in class, back-to-back if there is more than one quiz scheduled. We may begin at any time. We will have an open-ended Q&A session before the quiz. You can't take the quiz if you are not present in class when the question sheets are handed out, so don't be late. Once all scheduled quizzes are written, we will discuss and mark them. You will mark your own quiz. All marking must be done with a red pen - so you must bring a red pen to class in order to participate. The mark you give yourself may be revised by the instructor after spot-checking quizzes. If this is necessary, you will be notified. You must mark your quiz correctly and honestly - don't get into trouble with academic integrity rules: it will be an academic offence if you mark questions as correct that were discussed in class and should have been marked incorrect. When in doubt, ask.
- R-code option
- The graph defined by the
scCCnet
edgelist contains a pronounced community structure with several large modules of similar size. This has an obvious biological interpretation of systems of functionally interacting genes. But is this an independent feature that has evolved due to functional requirements in these nodes? Or is it simply a consequence of the degree distribution? Your task is to assess this question by comparing thescCCnet
graph to random graphs with the same degree distribution. - Submit code according to the following requirements. Make sure your code is documented and that you have tested your functions to give correct output.
- Write a function that takes an igraph graph as input and outputs the size difference between the largest and the second largest community, as defined by
cluster_infomap()
. - Write a function that takes an igraph graph as input and returns a random graph with the same degree distribution. Use the
sample_degseq()
function internally, with the"vl"
method. - Write a script that produces a vector of 1,000 results from your first function, calculated for a graph produced by your second function using the
scCCnet
graph as input. - Plot and interpret the result.
- Create a new page on the student Wiki as a subpage of your User Page. Put your code there.
- When you are done with developing this contents, add the following category tag to the page:
[[Category:EVAL-FND-Graphs]]
- Write a function that takes an igraph graph as input and outputs the size difference between the largest and the second largest community, as defined by
- Do not change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
- Option to write a "Self-Evaluation Question"
-
- Write a "Self-evaluation Question" (with a model solution) that explores the actual use of centrality measures in the recent biomedical literature. Find a recent article (not older than 2 years) that uses a centrality score in the analysis of biological networks. Summarize the use of this measure, and then quote a passage from the article that describes how the measure was used. Then ask the learner to paraphrase what this passage means, and how the measure relates to the biology that is being described. Ensure that the answer is feasible, given what was discussed in the unit, and what was summarized and quoted by you. The goal is for the learner to think about the biological interpretation of graph-theoretical measures obtained from biological networks. Apply the marking rubrics in spirit to satisfy yourself of the quality of your question. Use the format and code templates that you find on the Self evaluation questions page - but don't assume those examples are already models of excellent contributions. This will be a short-answer format question.
- Create a new page on the student Wiki as a subpage of your User Page. Develop your question there.
- When you are done with developing this contents, add the following category tag to the page:
[[Category:EVAL-FND-Graphs]]
- Do not change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
Contents
Task:
- Read the introductory notes on graph theory and network science.
- For a useful overview of graph-theory concepts you could additionally have a look at:
Pavlopoulos et al. (2011) Using graph theory to analyze biological networks. BioData Min 4:10. (pmid: 21527005) |
[ PubMed ] [ DOI ] Understanding complex systems often requires a bottom-up analysis towards a systems biology approach. The need to investigate a system, not only as individual components but as a whole, emerges. This can be done by examining the elementary constituents individually and then how these are connected. The myriad components of a system and their interactions are best characterized as networks and they are mainly represented as graphs where thousands of nodes are connected with thousands of vertices. In this article we demonstrate approaches, models and methods from the graph theory universe and we discuss ways in which they can be used to reveal hidden properties and features of a network. This network profiling combined with knowledge extraction will help us to better understand the biological significance of the system. |
Task:
- Open RStudio and load the
ABC-units
R project. If you have loaded it before, choose File → Recent projects → ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit. - Choose Tools → Version Control → Pull Branches to fetch the most recent version of the project from its GitHub repository with all changes and bug fixes included.
- Type
init()
if requested. - Open the file
FND-MAT-Graphs_and_networks.R
and follow the instructions.
Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of code.
Further reading, links and resources
Notes
- ↑ Post o the mailing list in case you come into a situation where you are trying to do something that ought to have worked with a data frame, but does not work in the same way with a tibble.
Self-evaluation
If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2017-08-05
Version:
- 1.0
Version history:
- 1.0 First live version
- 0.1 Stub only
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.