Difference between revisions of "FND-MAT-Graphs and networks"
m |
m |
||
(8 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
<div id="ABC"> | <div id="ABC"> | ||
− | <div style="padding:5px; border: | + | <div style="padding:5px; border:4px solid #000000; background-color:#b3dbce; font-size:300%; font-weight:400; color: #000000; width:100%;"> |
Graphs and Networks | Graphs and Networks | ||
− | <div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:# | + | <div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:#b3dbce; font-size:30%; font-weight:200; color: #000000; "> |
(Intoduction to graph theory and network science; iGraph) | (Intoduction to graph theory and network science; iGraph) | ||
</div> | </div> | ||
Line 10: | Line 10: | ||
− | <div style="padding:5px; border:1px solid #000000; background-color:# | + | <div style="padding:5px; border:1px solid #000000; background-color:#b3dbce33; font-size:85%;"> |
<div style="font-size:118%;"> | <div style="font-size:118%;"> | ||
<b>Abstract:</b><br /> | <b>Abstract:</b><br /> | ||
<section begin=abstract /> | <section begin=abstract /> | ||
− | "Traditional" bioinformatics has focussed on the properties of individual genes, but we are developing a much more functionally oriented view that investigates gene '''relationships'''. Analyzing relationships is the domain of graph theory, where relationships are represented as edges between nodes. This unit introduces key concepts and terminology and puts working with graphs into practice with the igraph package in R. | + | "Traditional" bioinformatics has focussed on the properties of individual genes, but we are developing a much more functionally oriented view that investigates gene '''relationships'''. Analyzing relationships is the domain of graph theory, where relationships are represented as edges between nodes. This unit introduces key concepts and terminology and puts working with graphs into practice with the <tt>igraph::</tt> package in R. |
<section end=abstract /> | <section end=abstract /> | ||
</div> | </div> | ||
Line 25: | Line 25: | ||
This unit will ... | This unit will ... | ||
* ... introduce concepts and definitions of graph theory; | * ... introduce concepts and definitions of graph theory; | ||
− | * ... demonstrate how to work with graphs in R, using the igraph package. | + | * ... demonstrate how to work with graphs in R, using the <tt>igraph::</tt> package. |
</td> | </td> | ||
<td style="padding:10px;"> | <td style="padding:10px;"> | ||
Line 32: | Line 32: | ||
* ... know what the most important terms of graph theory mean; | * ... know what the most important terms of graph theory mean; | ||
* ... are familar with a number of quantitiative measures on graphs; | * ... are familar with a number of quantitiative measures on graphs; | ||
− | * ... can use igraph to construct graphs, plot graphs with a variety of layout algorithms, and analyse their degree distribution, centrality distribution, and community structure; | + | * ... can use <tt>igraph::</tt> to construct graphs, plot graphs with a variety of layout algorithms, and analyse their degree distribution, centrality distribution, and community structure; |
* ... can interpret such measures in a biological context. | * ... can interpret such measures in a biological context. | ||
</td> | </td> | ||
Line 41: | Line 41: | ||
<b>Deliverables:</b><br /> | <b>Deliverables:</b><br /> | ||
<section begin=deliverables /> | <section begin=deliverables /> | ||
− | |||
<li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li> | <li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li> | ||
− | |||
<li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li> | <li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li> | ||
− | |||
<li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li> | <li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li> | ||
<section end=deliverables /> | <section end=deliverables /> | ||
Line 52: | Line 49: | ||
<section begin=prerequisites /> | <section begin=prerequisites /> | ||
<b>Prerequisites:</b><br /> | <b>Prerequisites:</b><br /> | ||
− | |||
You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:<br /> | You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:<br /> | ||
− | |||
*<b>Calculus</b>: functions and equations; polynomial functions, logarithms, trigonometric functions; integrals and derivatives; theorem and proof. | *<b>Calculus</b>: functions and equations; polynomial functions, logarithms, trigonometric functions; integrals and derivatives; theorem and proof. | ||
− | |||
This unit builds on material covered in the following prerequisite units:<br /> | This unit builds on material covered in the following prerequisite units:<br /> | ||
*[[RPR-Introduction|RPR-Introduction (Introduction to R)]] | *[[RPR-Introduction|RPR-Introduction (Introduction to R)]] | ||
Line 66: | Line 60: | ||
− | |||
{{Smallvspace}} | {{Smallvspace}} | ||
Line 77: | Line 70: | ||
=== Evaluation === | === Evaluation === | ||
− | + | This learning unit can be evaluated for a maximum of 5 marks. There are several options for submission. Choose one option, then ... | |
− | This learning unit can be evaluated for a maximum of | + | <ol> |
+ | <li>Create a new page on the student Wiki as a subpage of your User Page.</li> | ||
+ | <li>Put all of your writing to submit on this one page.</li> | ||
+ | <li>When you are done with everything, go to the [https://q.utoronto.ca/courses/180416/assignments Quercus '''Assignments''' page] and open the first Learning Unit that you have not submitted yet. Paste the URL of your Wiki page into the form, and click on '''Submit Assignment'''.</li> | ||
+ | </ol> | ||
− | + | Your link can be submitted only once and not edited. But you may change your Wiki page at any time. However only the last version before the due date will be marked. All later edits will be silently ignored. | |
− | |||
− | < | + | {{Smallvspace}} |
− | + | ||
− | </ | + | <div class="reference-box"> |
− | ... which will place the "''tibble''" <code>scCCnet</code> (''S. cerevisiae'' Cell Cycle network) into your workspace. ''tibbles'' are modern versions of data frames and can be used just like data frames for most practical purposes<ref>Post to the | + | ;Context |
+ | :The R project for the learning units contains a network that I have prepared from functional interaction data from the STRING database, subset with GO annotations for yeast. STRING curates networks of genes that have functional interactions, based on a variety of data such as experimental evidence, literature mining and more. GO (Gene Ontology) categorizes genes with functional annotations. The network I have prepared contains all yeast genes that have been annotated to the <code>"mitotic cell cycle"</code> and that have high-confidence functional interactions as determined by STRING. You can load the edge list with the command ... | ||
+ | |||
+ | <pre> | ||
+ | scCCnet <- readRDS("./data/scCCnet.rds") | ||
+ | </pre> | ||
+ | |||
+ | ... which will place the "''tibble''" <code>scCCnet</code> (''S. cerevisiae'' Cell Cycle network) into your workspace. ''tibbles'' are modern versions of data frames and can be used just like data frames for most practical purposes<ref>Post to the discussion board in case you come into a situation where you are trying to do something that ought to have worked with a data frame, but does not work in the same way with a ''tibble''. There '''are''' some differences ...</ref>. The code I wrote to create this ''tibble'' is in <code>./scripts/ABC-makeScCCnet.R</code> - it may be worthwhile for you to look at the script to get a better understanding what this data is. All options require you to plot the network, I expect that <code>layout_with_graphopt()</code> will work reasonably well to define the layout, but <code>igraph::</code> has many options for graph layout and its worthwhile to try them. In all cases you may need to play with the parameters to find the proper balance of node distance and node size. | ||
+ | </div> | ||
; Short Report option | ; Short Report option | ||
Line 93: | Line 97: | ||
::'''A - Degree distribution''' | ::'''A - Degree distribution''' | ||
− | ::'''A.1''' Create an informative overview plot of the network to highlight the degree distribution. Color and scale the nodes according to their degree. | + | ::'''A.1''' Create an informative overview plot of the <code>scCCnet</code> network to highlight the degree distribution. Color and scale the nodes according to their degree. How well you choose the layout algorithm, and the plotting parameters for node color, size, and label-size, will determine part of your mark. "Excellent" submissions will include a legend in the plot. Obviously, the full plotting command will be included on your submitted page. |
::'''A.2''' Plot a log-frequency against log-rank plot of the degree distribution. Does this look like a scale-free network? Interpret this result. What are the highest-degree nodes? What are the lowest-degree nodes? Do genes in these categories have anything in common? | ::'''A.2''' Plot a log-frequency against log-rank plot of the degree distribution. Does this look like a scale-free network? Interpret this result. What are the highest-degree nodes? What are the lowest-degree nodes? Do genes in these categories have anything in common? | ||
::'''B - Centrality''' | ::'''B - Centrality''' | ||
− | ::'''B.1''' Create an informative overview plot of the network to highlight the centrality scores of the nodes. Choose one of: betweenness centrality, closeness centrality, or eigenvector centrality for your analysis. I expect that <code>layout_with_graphopt()</code> will | + | ::'''B.1''' Create an informative overview plot of the <code>scCCnet</code> network to highlight the centrality scores of the nodes. Choose one of: betweenness centrality, closeness centrality, or eigenvector centrality for your analysis. I expect that <code>igraph::layout_with_graphopt()</code> will provide a good starting point, but you may need to adjust the parameters. Color and scale the nodes according to their centrality score. How well you choose the layout algorithm, and the plotting parameters for node color, size, and label-size, will determine part of your mark. "Excellent" submissions will include a legend in the plot. Obviously, the full plotting command is to be included on your submitted page. |
::'''B.2''' Order the nodes according to their score. Choose the nodes with the 10 highest centrality scores. Interpret the results. What are the highest-centrality nodes? Do genes with high centrality have anything in common? | ::'''B.2''' Order the nodes according to their score. Choose the nodes with the 10 highest centrality scores. Interpret the results. What are the highest-centrality nodes? Do genes with high centrality have anything in common? | ||
::'''C - Clusters''' | ::'''C - Clusters''' | ||
− | ::'''C.1''' Create an informative overview plot of the network to highlight the community structure of the network. Choose <code>cluster_infomap()</code> and one of the eight other clustering algorithms of igraph. Color and scale the nodes according to their community membership. | + | ::'''C.1''' Create an informative overview plot of the network to highlight the community structure of the network. Choose <code>igraph::cluster_infomap()</code> and '''one of''' the eight other clustering algorithms of <tt>igraph::</tt> for comparison. Color and scale the nodes according to their community membership. How well you choose the layout algorithm, and the plotting parameters for node color, size, and label-size, will determine part of your mark. "Excellent" submissions will include a legend in the plot. Obviously, the full plotting command is to be included on your submitted page. |
::'''C.2''' Compare the two methods. Do the clusters overlap? Are they very different? Considering the biology of the nodes, which clustering method was more successful? | ::'''C.2''' Compare the two methods. Do the clusters overlap? Are they very different? Considering the biology of the nodes, which clustering method was more successful? | ||
− | ::'''C.3''' For the "better" of the two methods, analyse the top 4 communities. Why do these genes have more functional interactions with each other than with other genes? Do the | + | ::'''C.3''' For the "better" of the two methods, analyse the top 4 communities. Why do these genes have more functional interactions with each other than with other genes? Do the members of these clusters have anything in common? |
::Hint: The Saccharomycs Genome Database hosts [https://yeastmine.yeastgenome.org/yeastmine/bag.do '''YeastMine'''] - a Web tool into which you can paste a number of yeast gene IDs, and retrieve summary information including symbol, '''name''', and sequence length. The name is a good indicator of a gene's biological role. | ::Hint: The Saccharomycs Genome Database hosts [https://yeastmine.yeastgenome.org/yeastmine/bag.do '''YeastMine'''] - a Web tool into which you can paste a number of yeast gene IDs, and retrieve summary information including symbol, '''name''', and sequence length. The name is a good indicator of a gene's biological role. | ||
+ | :* When you are done, submit the link to your page via Quercus as described above. | ||
− | |||
− | |||
− | |||
− | |||
− | |||
{{Smallvspace}} | {{Smallvspace}} | ||
Line 126: | Line 126: | ||
--> | --> | ||
+ | <!-- | ||
; Quiz option | ; Quiz option | ||
: Open the [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/Signup-FND-Graphs_Quiz '''signup-page for the quiz for this unit (linked from here)'''] and add your name. Your name must be signed up by 12:00 of the day of the Quiz to ensure copies of the quiz are available for all participants. | : Open the [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/Signup-FND-Graphs_Quiz '''signup-page for the quiz for this unit (linked from here)'''] and add your name. Your name must be signed up by 12:00 of the day of the Quiz to ensure copies of the quiz are available for all participants. | ||
− | |||
<div style="margin-left: 2rem;">Quizzes will be written in class, back-to-back if there is more than one quiz scheduled. We may begin at any time. We will have an open-ended Q&A session before the quiz. You can't take the quiz if you are not present in class when the question sheets are handed out, so don't be late. Once all scheduled quizzes are written, we will discuss and mark them. You will mark your own quiz. All marking must be done with a red pen - so you '''must''' bring a red pen to class in order to participate. The mark you give yourself may be revised by the instructor after spot-checking quizzes. If this is necessary, you will be notified. You must mark your quiz correctly and honestly - don't get into trouble with academic integrity rules: it will be an academic offence if you mark questions as correct that were discussed in class and should have been marked incorrect. When in doubt, ask.</div> | <div style="margin-left: 2rem;">Quizzes will be written in class, back-to-back if there is more than one quiz scheduled. We may begin at any time. We will have an open-ended Q&A session before the quiz. You can't take the quiz if you are not present in class when the question sheets are handed out, so don't be late. Once all scheduled quizzes are written, we will discuss and mark them. You will mark your own quiz. All marking must be done with a red pen - so you '''must''' bring a red pen to class in order to participate. The mark you give yourself may be revised by the instructor after spot-checking quizzes. If this is necessary, you will be notified. You must mark your quiz correctly and honestly - don't get into trouble with academic integrity rules: it will be an academic offence if you mark questions as correct that were discussed in class and should have been marked incorrect. When in doubt, ask.</div> | ||
+ | --> | ||
{{Smallvspace}} | {{Smallvspace}} | ||
; R-code option | ; R-code option | ||
− | :The graph defined by the <code>scCCnet</code> edgelist contains a pronounced community structure with several large modules of similar size. This has an obvious biological interpretation of systems of functionally interacting genes. But | + | :The graph defined by the <code>scCCnet</code> edgelist contains a pronounced community structure with several large modules of similar size. This has an obvious biological interpretation of systems of functionally interacting genes. But what mechanism could have produced this graph-feature that has evolved to support functional requirements of the yeast cell cycle? Your task is to assess degree distribution and community structure of the <code>scCCnet</code> graph to random graphs that are produced with different algorithms. <tt>igraph::</tt> has several stochastic network construction algorithms which it calls "games". You can find them in the [https://igraph.org/r/doc/ <tt>igraph::</tt> manual], searching for "game". Among them are preferential attachment, Watts-Strogatz, forest fire, and many more. Identify which algorithms are able to create a network from first principles (i.e. not according to a given degree distribution) that is similar to <code>scCCnet</code>. |
− | :Submit code according to the following requirements. Make sure your code is documented and that you have tested your functions to give correct output. | + | |
− | :*Write a function that takes an igraph graph as input and | + | |
− | + | :Submit code according to the following '''requirements'''. Make sure your code is documented and that you have tested your functions to give correct output. | |
− | + | :* Write a function that takes an <tt>igraph::</tt> graph as input and computes the Kullback-Leibler divergence (cf. [[FND-STA-Probability_distribution]]) between a simulated graph and <code>scCCnet</code>. Validate your function with <code>igraph::sample_degseq()</code>. | |
− | :*Plot and interpret the | + | :* For three <tt>igraph::</tt> "games", find parameters that minimize the KL-divergence between the produced graph and <code>scCCnet</code>. |
+ | :* Plot and interpret the results. | ||
:* Create a new page on the student Wiki as a subpage of your User Page. Put your code there. | :* Create a new page on the student Wiki as a subpage of your User Page. Put your code there. | ||
− | :* When you are done | + | :* When you are done, submit the link to your page via Quercus as described above. |
− | |||
− | |||
− | |||
− | |||
{{Smallvspace}} | {{Smallvspace}} | ||
;Option to write a "Self-Evaluation Question" | ;Option to write a "Self-Evaluation Question" | ||
+ | : You can submit a "Self-Evaluation Question" for at most '''one''' of your assignments. | ||
:Write a "Self-evaluation Question" (with a model solution) that explores the actual use of centrality measures in the recent biomedical literature. | :Write a "Self-evaluation Question" (with a model solution) that explores the actual use of centrality measures in the recent biomedical literature. | ||
− | :* Find a recent article (not older than 2 years) that uses a centrality score in the analysis of biological networks. Summarize the use of this measure, and then quote a passage from the article that describes how the measure was used. Then ask the learner to paraphrase what this passage means, and how the measure relates to the biology that is being described. Ensure that the answer is feasible, given what was discussed in the unit, and what was summarized and quoted by you. The goal is for the learner to think about the biological interpretation of graph-theoretical measures obtained from biological networks. Apply the [[ABC-Rubrics| '''marking rubrics''']] in spirit to satisfy yourself of the quality of your question. Use the format and code templates that you find on the [[Self_evaluation_questions|'''Self evaluation questions page''']] - but don't assume those examples are already models of excellent contributions. This will be a short-answer format question. Note: assume that approximately the same amount of work is expected for all evaluation options. Consequently, the standard of excellence for this option will be | + | :* Find a recent article (not older than 2 years) that uses a centrality score in the analysis of biological networks. Summarize the use of this measure, and then quote a passage from the article that describes how the measure was used. Then ask the learner to paraphrase what this passage means, and how the measure relates to the biology that is being described. Ensure that the answer is feasible, given what was discussed in the unit, and what was summarized and quoted by you. The goal is for the learner to think about the biological interpretation of graph-theoretical measures obtained from biological networks. Apply the [[ABC-Rubrics| '''marking rubrics''']] in spirit to satisfy yourself of the quality of your question. Use the format and code templates that you find on the [[Self_evaluation_questions|'''Self evaluation questions page''']] - but don't assume those examples are already models of excellent contributions. This will be a short-answer format question. Note: assume that approximately the same amount of work is expected for all evaluation options. Consequently, the standard of excellence for this option will be very high. |
:* Create a new page on the student Wiki as a subpage of your User Page. Develop your question there. | :* Create a new page on the student Wiki as a subpage of your User Page. Develop your question there. | ||
− | :* When you are done | + | :* When you are done, submit the link to your page via Quercus as described above. |
− | |||
− | |||
− | |||
− | |||
== Contents == | == Contents == | ||
− | |||
{{Task|1= | {{Task|1= | ||
* Read the introductory notes on {{ABC-PDF|FND-MAT-Graphs_and_networks|graph theory and network science}}. | * Read the introductory notes on {{ABC-PDF|FND-MAT-Graphs_and_networks|graph theory and network science}}. | ||
− | * For a useful overview of graph-theory concepts you could additionally have a look at: | + | * For a useful overview of graph-theory and network analysis concepts you could additionally have a look at: |
+ | {{#pmid: 32083072}} | ||
{{#pmid: 21527005}} | {{#pmid: 21527005}} | ||
Line 178: | Line 173: | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
== Notes == | == Notes == | ||
− | |||
− | |||
<references /> | <references /> | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
{{Vspace}} | {{Vspace}} | ||
− | |||
− | |||
− | |||
− | |||
− | |||
<div class="about"> | <div class="about"> | ||
Line 227: | Line 187: | ||
:2017-08-05 | :2017-08-05 | ||
<b>Modified:</b><br /> | <b>Modified:</b><br /> | ||
− | : | + | :2020-10-07 |
<b>Version:</b><br /> | <b>Version:</b><br /> | ||
− | :1. | + | :1.2 |
<b>Version history:</b><br /> | <b>Version history:</b><br /> | ||
+ | *1.2 Edit policy update | ||
+ | *1.1 2020 Updates and revisions to deliverables | ||
*1.0 First live version | *1.0 First live version | ||
*0.1 Stub only | *0.1 Stub only | ||
</div> | </div> | ||
− | |||
− | |||
{{CC-BY}} | {{CC-BY}} | ||
+ | [[Category:ABC-units]] | ||
+ | {{EVAL}} | ||
+ | {{LIVE}} | ||
+ | {{EVAL}} | ||
</div> | </div> | ||
<!-- [END] --> | <!-- [END] --> |
Latest revision as of 11:09, 24 October 2020
Graphs and Networks
(Intoduction to graph theory and network science; iGraph)
Abstract:
"Traditional" bioinformatics has focussed on the properties of individual genes, but we are developing a much more functionally oriented view that investigates gene relationships. Analyzing relationships is the domain of graph theory, where relationships are represented as edges between nodes. This unit introduces key concepts and terminology and puts working with graphs into practice with the igraph:: package in R.
Objectives:
|
Outcomes:
|
Deliverables:
Prerequisites:
You need the following preparation before beginning this unit. If you are not familiar with this material from courses you took previously, you need to prepare yourself from other information sources:
- Calculus: functions and equations; polynomial functions, logarithms, trigonometric functions; integrals and derivatives; theorem and proof.
This unit builds on material covered in the following prerequisite units:
Contents
Evaluation
This learning unit can be evaluated for a maximum of 5 marks. There are several options for submission. Choose one option, then ...
- Create a new page on the student Wiki as a subpage of your User Page.
- Put all of your writing to submit on this one page.
- When you are done with everything, go to the Quercus Assignments page and open the first Learning Unit that you have not submitted yet. Paste the URL of your Wiki page into the form, and click on Submit Assignment.
Your link can be submitted only once and not edited. But you may change your Wiki page at any time. However only the last version before the due date will be marked. All later edits will be silently ignored.
- Context
- The R project for the learning units contains a network that I have prepared from functional interaction data from the STRING database, subset with GO annotations for yeast. STRING curates networks of genes that have functional interactions, based on a variety of data such as experimental evidence, literature mining and more. GO (Gene Ontology) categorizes genes with functional annotations. The network I have prepared contains all yeast genes that have been annotated to the
"mitotic cell cycle"
and that have high-confidence functional interactions as determined by STRING. You can load the edge list with the command ...
scCCnet <- readRDS("./data/scCCnet.rds")
... which will place the "tibble" scCCnet
(S. cerevisiae Cell Cycle network) into your workspace. tibbles are modern versions of data frames and can be used just like data frames for most practical purposes[1]. The code I wrote to create this tibble is in ./scripts/ABC-makeScCCnet.R
- it may be worthwhile for you to look at the script to get a better understanding what this data is. All options require you to plot the network, I expect that layout_with_graphopt()
will work reasonably well to define the layout, but igraph::
has many options for graph layout and its worthwhile to try them. In all cases you may need to play with the parameters to find the proper balance of node distance and node size.
- Short Report option
- 1. Create a new page on the student Wiki as a subpage of your User Page.
- 2. Write a short report on one of the three following topics - A, B, or C. The goal of this short report is to connect graph measures to the biology of the nodes. (All reports must have the R code you wrote in an appendix.)
- A - Degree distribution
- A.1 Create an informative overview plot of the
scCCnet
network to highlight the degree distribution. Color and scale the nodes according to their degree. How well you choose the layout algorithm, and the plotting parameters for node color, size, and label-size, will determine part of your mark. "Excellent" submissions will include a legend in the plot. Obviously, the full plotting command will be included on your submitted page. - A.2 Plot a log-frequency against log-rank plot of the degree distribution. Does this look like a scale-free network? Interpret this result. What are the highest-degree nodes? What are the lowest-degree nodes? Do genes in these categories have anything in common?
- B - Centrality
- B.1 Create an informative overview plot of the
scCCnet
network to highlight the centrality scores of the nodes. Choose one of: betweenness centrality, closeness centrality, or eigenvector centrality for your analysis. I expect thatigraph::layout_with_graphopt()
will provide a good starting point, but you may need to adjust the parameters. Color and scale the nodes according to their centrality score. How well you choose the layout algorithm, and the plotting parameters for node color, size, and label-size, will determine part of your mark. "Excellent" submissions will include a legend in the plot. Obviously, the full plotting command is to be included on your submitted page. - B.2 Order the nodes according to their score. Choose the nodes with the 10 highest centrality scores. Interpret the results. What are the highest-centrality nodes? Do genes with high centrality have anything in common?
- C - Clusters
- C.1 Create an informative overview plot of the network to highlight the community structure of the network. Choose
igraph::cluster_infomap()
and one of the eight other clustering algorithms of igraph:: for comparison. Color and scale the nodes according to their community membership. How well you choose the layout algorithm, and the plotting parameters for node color, size, and label-size, will determine part of your mark. "Excellent" submissions will include a legend in the plot. Obviously, the full plotting command is to be included on your submitted page. - C.2 Compare the two methods. Do the clusters overlap? Are they very different? Considering the biology of the nodes, which clustering method was more successful?
- C.3 For the "better" of the two methods, analyse the top 4 communities. Why do these genes have more functional interactions with each other than with other genes? Do the members of these clusters have anything in common?
- Hint: The Saccharomycs Genome Database hosts YeastMine - a Web tool into which you can paste a number of yeast gene IDs, and retrieve summary information including symbol, name, and sequence length. The name is a good indicator of a gene's biological role.
- When you are done, submit the link to your page via Quercus as described above.
- R-code option
- The graph defined by the
scCCnet
edgelist contains a pronounced community structure with several large modules of similar size. This has an obvious biological interpretation of systems of functionally interacting genes. But what mechanism could have produced this graph-feature that has evolved to support functional requirements of the yeast cell cycle? Your task is to assess degree distribution and community structure of thescCCnet
graph to random graphs that are produced with different algorithms. igraph:: has several stochastic network construction algorithms which it calls "games". You can find them in the igraph:: manual, searching for "game". Among them are preferential attachment, Watts-Strogatz, forest fire, and many more. Identify which algorithms are able to create a network from first principles (i.e. not according to a given degree distribution) that is similar toscCCnet
.
- Submit code according to the following requirements. Make sure your code is documented and that you have tested your functions to give correct output.
- Write a function that takes an igraph:: graph as input and computes the Kullback-Leibler divergence (cf. FND-STA-Probability_distribution) between a simulated graph and
scCCnet
. Validate your function withigraph::sample_degseq()
. - For three igraph:: "games", find parameters that minimize the KL-divergence between the produced graph and
scCCnet
. - Plot and interpret the results.
- Create a new page on the student Wiki as a subpage of your User Page. Put your code there.
- When you are done, submit the link to your page via Quercus as described above.
- Write a function that takes an igraph:: graph as input and computes the Kullback-Leibler divergence (cf. FND-STA-Probability_distribution) between a simulated graph and
- Option to write a "Self-Evaluation Question"
- You can submit a "Self-Evaluation Question" for at most one of your assignments.
- Write a "Self-evaluation Question" (with a model solution) that explores the actual use of centrality measures in the recent biomedical literature.
- Find a recent article (not older than 2 years) that uses a centrality score in the analysis of biological networks. Summarize the use of this measure, and then quote a passage from the article that describes how the measure was used. Then ask the learner to paraphrase what this passage means, and how the measure relates to the biology that is being described. Ensure that the answer is feasible, given what was discussed in the unit, and what was summarized and quoted by you. The goal is for the learner to think about the biological interpretation of graph-theoretical measures obtained from biological networks. Apply the marking rubrics in spirit to satisfy yourself of the quality of your question. Use the format and code templates that you find on the Self evaluation questions page - but don't assume those examples are already models of excellent contributions. This will be a short-answer format question. Note: assume that approximately the same amount of work is expected for all evaluation options. Consequently, the standard of excellence for this option will be very high.
- Create a new page on the student Wiki as a subpage of your User Page. Develop your question there.
- When you are done, submit the link to your page via Quercus as described above.
Contents
Task:
- Read the introductory notes on graph theory and network science.
- For a useful overview of graph-theory and network analysis concepts you could additionally have a look at:
Koutrouli et al. (2020) A Guide to Conquer the Biological Network Era Using Graph Theory. Front Bioeng Biotechnol 8:34. (pmid: 32083072) |
[ PubMed ] [ DOI ] Networks are one of the most common ways to represent biological systems as complex sets of binary interactions or relations between different bioentities. In this article, we discuss the basic graph theory concepts and the various graph types, as well as the available data structures for storing and reading graphs. In addition, we describe several network properties and we highlight some of the widely used network topological features. We briefly mention the network patterns, motifs and models, and we further comment on the types of biological and biomedical networks along with their corresponding computer- and human-readable file formats. Finally, we discuss a variety of algorithms and metrics for network analyses regarding graph drawing, clustering, visualization, link prediction, perturbation, and network alignment as well as the current state-of-the-art tools. We expect this review to reach a very broad spectrum of readers varying from experts to beginners while encouraging them to enhance the field further. |
Pavlopoulos et al. (2011) Using graph theory to analyze biological networks. BioData Min 4:10. (pmid: 21527005) |
[ PubMed ] [ DOI ] Understanding complex systems often requires a bottom-up analysis towards a systems biology approach. The need to investigate a system, not only as individual components but as a whole, emerges. This can be done by examining the elementary constituents individually and then how these are connected. The myriad components of a system and their interactions are best characterized as networks and they are mainly represented as graphs where thousands of nodes are connected with thousands of vertices. In this article we demonstrate approaches, models and methods from the graph theory universe and we discuss ways in which they can be used to reveal hidden properties and features of a network. This network profiling combined with knowledge extraction will help us to better understand the biological significance of the system. |
Task:
- Open RStudio and load the
ABC-units
R project. If you have loaded it before, choose File → Recent projects → ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit. - Choose Tools → Version Control → Pull Branches to fetch the most recent version of the project from its GitHub repository with all changes and bug fixes included.
- Type
init()
if requested. - Open the file
FND-MAT-Graphs_and_networks.R
and follow the instructions.
Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of code.
Notes
- ↑ Post to the discussion board in case you come into a situation where you are trying to do something that ought to have worked with a data frame, but does not work in the same way with a tibble. There are some differences ...
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2020-10-07
Version:
- 1.2
Version history:
- 1.2 Edit policy update
- 1.1 2020 Updates and revisions to deliverables
- 1.0 First live version
- 0.1 Stub only
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.