Latest revision as of 16:49, 8 February 2018
BCB Hackathon 2018
(Topic Proposal: The Human Genome - 20 years later)
Abstract:
This topic proposal for the 2018 BCB hackathon explores new ways to provide a holistic overview of the contents of the human genome, on the occasion of the 20th anniversary of its sequence.
Background
The first two draft sequences of the human genome were published in February of 2001[1][2]. Three years from now will mark the twentieth anniversary of an accomplishment that, like no other, has shaped the landscape of bioinformatics, computational biology and molecular medicine.
In 2001, Celera - a private company founded three years earlier to commercialize genome information - published an iconic poster summarizing their version of the genome. It is still fascinating today.
This poster is significant, not so much for its interpretable content, but for the unique perspective it gives us on the entirety of information that constitutes our molecular identity.
The details are rich and, in fact, surprisingly "modern": the poster presents features like CpG islands and SNP density, and exon transcripts colour-coded by Gene Ontology functional category, for the forward and reverse strands, accurately plotted on the nucleotide backbone at about 500 kb per centimetre. This was computed from GFF records with Josep Abril's gff2ps software[3].
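To give a feel for that scale, here is a minimal sketch of how a GFF record could be mapped to a position along a poster backbone at 500 kb per centimetre. It is not the gff2ps implementation: the `Feature` class, the function names and the sample record are hypothetical, and the sketch assumes tab-separated GFF lines with at least eight columns and 1-based, inclusive coordinates.

```python
from dataclasses import dataclass

# Scale quoted in the text: about 500 kb of sequence per centimetre of poster.
BASES_PER_CM = 500_000


@dataclass
class Feature:
    """Hypothetical container for the GFF columns used in this sketch."""
    seqid: str
    kind: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str


def parse_gff_line(line):
    """Parse one tab-separated GFF record into a Feature."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 GFF columns, got {len(cols)}")
    return Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6])


def to_poster_cm(feature, bases_per_cm=BASES_PER_CM):
    """Return (offset_cm, width_cm) of a feature along the backbone."""
    offset_cm = (feature.start - 1) / bases_per_cm
    width_cm = (feature.end - feature.start + 1) / bases_per_cm
    return offset_cm, width_cm


if __name__ == "__main__":
    # Made-up record, for illustration only.
    record = "chr4\texample\texon\t1000001\t1500000\t.\t+\t.\tID=demo"
    print(to_poster_cm(parse_gff_line(record)))  # prints (2.0, 1.0)
```

At this scale a 500 kb feature is one centimetre wide, so a single exon is far below the width of a printed line. Any visualization at genome scale therefore has to decide what to aggregate and what to leave out.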
But we know so much more today. While the Celera map showed us the genome of one Caucasian male, the number of sequenced genomes has exploded: we envisioned the 1000 Genomes Project (2008, completed 2012), quickly set our sights on 100,000 genomes (2012, almost completed), and as of today more than 500,000 human genomes have been sequenced overall. We have sequenced cancers and genetic diseases. We have sequenced representatives of virtually all ethnicities on the planet. We have even sequenced Neanderthals and Denisovans, and we have sequenced other species far and wide to acquire a sense of where we humans fit into the landscape of evolution. We have annotated the contents of the genome in the ENCODE project. We have built databases, such as InterPro, that carefully dissect all proteins into their domains. We have started to outline how things work together in functional networks such as those in the STRING database, or in modules as published by KEGG, and we are beginning to translate our insights into actionable information for medicine, at the OICR and at Sick Kids' TCAG.
Our imagination of the genome has matured tremendously. Let's come together for a catalytic task:
Create the image that will define how we understand the Human Genome – 20 years on.
Goals
The goal of the hackathon contest is to define data-driven visualizations that broadly and intuitively represent key aspects of our current understanding of the human genome.
We will evaluate
- creativity and innovation;
- quality of information design;
- biological relevance;
- computational implementation; and
- documentation and presentation.
Process
This is a two-day hackathon for undergraduate student teams from any POSt, Faculty, or even University[4]. So prepare yourselves:
- Read recent papers on sequenced genomes to become familiar with the language and ideas in the field;
- Have a look at the code snippets we've prepared to get some technicalities out of the way; request more if you can't find what you think you'll need;
- Form teams with a mixed set of skills: writing clean, efficient code; statistics; algorithms; software engineering; understanding the biology; art and design; planning, coordination and documentation; public presentation. You'll need to find people beyond BCB: in CS, Stats, the humanities (philosophy would be useful), life sciences, medicine (for sure!), and art (you are welcome to reach out to OCAD);
- (logistics?)
- We'll start the day off with a backgrounder on the genome, genome-scale data sources and examples of current analysis and visualization;
- We'll do a special presentation on information design and user perspectives;
- Then we'll design, code, and refine;
- Mentors will be available for assistance;
- We'll have ad hoc tutorials on common issues;
- We'll supply sample code for common tasks;
- And we'll have a round of judges' feedback on concepts;
- Food. Yes, there will be food.
- And coffee.
- Because this will go all night (or until we're done).
- Code-freeze in the morning: the teams will present their progress.
- Judging will be done over lunch;
- And we'll finish off with awards and prizes.
And finally we'll talk about where we'll go from there. Because there are perspectives.
Perspectives
We don't expect to come up with polished, comprehensive solutions. But we hope for a rich showcase of possibilities: our collective intelligence creates approaches that we could not possibly have thought of alone. We will take these results, and coordinate refinement and integration. Once we are satisfied, the "Genome Anniversary" will be close ...
Let's think big.
- Make this a story for the Bulletin? The Star? Nature & Science?
- With resources and sponsorship from the CS Department? Compute Ontario? Amazon? Google?
- A poster in every biology department? In every school in Toronto? Canada? The Planet?
- Under the patronage of UofT's research institutes? CIHR? UNESCO?
It's up to you.
Notes
1. Lander et al. (2001) Initial sequencing and analysis of the human genome. Nature 409:860-921. (pmid: 11237011)
2. Venter et al. (2001) The sequence of the human genome. Science 291:1304-51. (pmid: 11181995)
3. Abril & Guigó (2000) gff2ps: visualizing genomic annotations. Bioinformatics 16:743-4. (pmid: 11099262)
4. Teams can include up to one-in-five graduate students or BCB alumni.
About ...
Last update:
- 2017-02-06
Version:
- 1.0
Version history:
- 1.0 First proposal
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.