Informal programming

"Informal" programming


This page is a placeholder or is still under development; it exists principally to establish the logical framework of the site. The material on this page is correct but incomplete.


Much of our programming work is "informal" in the sense that, simply for reasons of practicality, it does not follow the well-established paradigms of software engineering. Some sources call this end-user programming, to contrast it with programming by developers. Here we discuss the parameters of such informal programming and how to avoid a number of potential problems.



Introductory reading

How scientists use computers (PDF at software-carpentry.org)


Parameters

  • (+) domain knowledge
  • (-) knowledge of tools, theory, and best practices
  • (-) infrequent tasks
  • (-) one-off tasks
  • (+) agile


Development

  • scripting vs. compiling
  • Perl
  • Python
  • PHP
  • The LAMP stack

Documentation

...


Testing

...


Exercises



References



Further reading and resources

Stajich (2007) An Introduction to BioPerl. Methods Mol Biol 406:535-48. (pmid: 18287711)

The BioPerl toolkit provides a library of hundreds of routines for processing sequence, annotation, alignment, and sequence analysis reports. It often serves as a bridge between different computational biology applications assisting the user to construct analysis pipelines. This chapter illustrates how BioPerl facilitates tasks such as writing scripts summarizing information from BLAST reports or extracting key annotation details from a GenBank sequence record.

Mühlberger et al. (2011) Computational analysis workflows for Omics data interpretation. Methods Mol Biol 719:379-97. (pmid: 21370093)

Progress in experimental procedures has led to rapid availability of Omics profiles. Various open-access as well as commercial tools have been developed for storage, analysis, and interpretation of transcriptomics, proteomics, and metabolomics data. Generally, major analysis steps include data storage, retrieval, preprocessing, and normalization, followed by identification of differentially expressed features, functional annotation on the level of biological processes and molecular pathways, as well as interpretation of gene lists in the context of protein-protein interaction networks. In this chapter, we discuss a sequential transcriptomics data analysis workflow utilizing open-source tools, specifically exemplified on a gene expression dataset on familial hypercholesterolemia.

The Software Carpentry project, based on the excellent course and other activities of Greg Wilson at UofT and elsewhere.