Genome sequencing

Contents
Further Reading
Questions, comments
References

Expected Preparations:

Biomolecules: The molecules of life; The genetic code; Nucleic acids; Amino acids; Protein folding; Post-translational modifications and protein biochemistry; Membrane proteins; Biological function.
If you are not already familiar with the prior knowledge listed above, you need to prepare yourself from other information sources.

[BIN] Sequence
The units listed above are part of this course and contain important preparatory material.

Keywords: Sequencing technologies; highly parallel; single-molecule and single-cell

Objectives:

This unit will …

… introduce methods and concepts of “Next Generation Sequencing” and genome assembly.

Outcomes:

After working through this unit you …

… are familiar with the basic methods and concepts of “Next Generation Sequencing” and genome assembly.

Deliverables:

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.

Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.

Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Evaluation:

NA: This unit is not evaluated for course marks.

A basic introduction to “Next Generation Sequencing” concepts and technologies.

Task…

Read the introductory notes on genome-scale sequencing technologies (PDF).
Review the most basic information on Genome Sequencing technology and “The Cost of Sequencing a Human Genome” at the US National Human Genome Research Institute.

Further Reading

Here, we review single-cell sequencing techniques for individual and multiomics profiling in single cells. We mainly describe single-cell genomic, epigenomic, and transcriptomic methods, and examples of their applications. For the integration of multilayered data sets, such as the transcriptome data derived from single-cell RNA sequencing and chromatin accessibility data derived from single-cell ATAC-seq, there are several computational integration methods. We also describe single-cell experimental methods for the simultaneous measurement of two or more omics layers. We can achieve a detailed understanding of the basic molecular profiles and those associated with disease in each cell by utilizing a large number of single-cell sequencing techniques and the accumulated data sets.

Kempfer, Rieke and Ana Pombo. (2020). “Methods for mapping 3D chromosome architecture”. Nature Reviews. Genetics 21(4):207–226.
[PMID: 31848476] [DOI: 10.1038/s41576-019-0195-2]

Abstract …

Determining how chromosomes are positioned and folded within the nucleus is critical to understanding the role of chromatin topology in gene regulation. Several methods are available for studying chromosome architecture, each with different strengths and limitations. Established imaging approaches and proximity ligation-based chromosome conformation capture (3C) techniques (such as DNA-FISH and Hi-C, respectively) have revealed the existence of chromosome territories, functional nuclear landmarks (such as splicing speckles and the nuclear lamina) and topologically associating domains. Improvements to these methods and the recent development of ligation-free approaches, including GAM, SPRITE and ChIA-Drop, are now helping to uncover new aspects of 3D genome topology that confirm the nucleus to be a complex, highly organized organelle.

Ho, Steve S, Alexander E Urban, and Ryan E Mills. (2020). “Structural variation in the sequencing era”. Nature Reviews. Genetics 21(3):171–189.
[PMID: 31729472] [DOI: 10.1038/s41576-019-0180-9]

Abstract …

Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.

Stark, Rory, Marta Grzelak, and James Hadfield. (2019). “RNA sequencing: the teenage years”. Nature Reviews. Genetics 20(11):631–656.
[PMID: 31341269] [DOI: 10.1038/s41576-019-0150-2]

Abstract …

Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs. However, as next-generation sequencing technologies have developed, so too has RNA-seq. Now, RNA-seq methods are available for studying many different aspects of RNA biology, including single-cell gene expression, translation (the translatome) and RNA structure (the structurome). Exciting new applications are being explored, such as spatial transcriptomics (spatialomics). Together with new long-read and direct RNA-seq technologies and better computational tools for data analysis, innovations in RNA-seq are contributing to a fuller understanding of RNA biology, from questions such as when and where transcription occurs to the folding and intermolecular interactions that govern RNA function.

Sedlazeck, Fritz J et al. (2018). “Piercing the dark matter: bioinformatics of long-range sequencing and mapping”. Nature Reviews. Genetics 19(6):329–346.
[PMID: 29599501] [DOI: 10.1038/s41576-018-0003-4]

Abstract …

Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.

Langmead, Ben and Abhinav Nellore. (2018). “Cloud computing for genomic data analysis and collaboration”. Nature Reviews. Genetics 19(4):208–219.
[PMID: 29379135] [DOI: 10.1038/nrg.2017.113]

Abstract …

Next-generation sequencing has made major strides in the past decade. Studies based on large sequencing data sets are growing in number, and public archives for raw sequencing data have been doubling in size every 18 months. Leveraging these data requires researchers to use large-scale computational resources. Cloud computing, a model whereby users rent computers and storage from large data centres, is a solution that is gaining traction in genomics research. Here, we describe how cloud computing is used in genomics for research and large-scale collaborations, and argue that its elasticity, reproducibility and privacy features make it ideally suited for the large-scale reanalysis of publicly available archived data, including privacy-protected data.

A very informative and influential blog about the newest developments in the field is “Bits of DNA”, written by Lior Pachter, Berkeley. Check it out, and revisit it from time to time.

An EBI hands-on online course on NGS (2012)

Questions, comments

If in doubt, ask! If anything about this content is not clear to you, do not proceed but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.

Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.

References

[END]

Boris Steipe
