Difference between revisions of "Software Development"
m |
|||
Line 77: | Line 77: | ||
For an exercise in Test Driven Development, [[R_Test_Driven_Development|'''follow this link''']]. | For an exercise in Test Driven Development, [[R_Test_Driven_Development|'''follow this link''']]. | ||
− | + | Typically testing is done at several levels: | |
+ | * During the initial development phases {{WP|unit testing}} continuously checks the function of the software ''units'' of the system. | ||
+ | * As the code base progresses, code units are integrated and begin interacting via their interfaces. These interfaces can be specified as "contracts" that define the conditions and obligations of an interaction. Typically, a contract will define the precondition, postcondition and invariants of an interaction. These can be verified by tests. | ||
+ | * Final tests '''verify''' the code, and '''validate''' its correct execution, just like a positive control in a lab experiment. | ||
− | |||
+ | | ||
+ | ===Code=== | ||
+ | Here is a small list of miscellaneous best-practice items for the phase when actual code is being written: | ||
− | + | * Be organized. Keep your files in well-named folders and give your file names some thought. | |
− | + | * Use version control. | |
+ | * Use an IDE (Integrated Development Environment). Syntax highlighting and code autocompletion are nice, but good support for debugging is essential for development, especially stepping through code, examining variables, and setting breakpoints and conditional breakpoints. | ||
+ | * Design your code to be easily extensible and only loosely coupled. Your requirements will change frequently; make sure your code is modular and can change nimbly as well. | ||
+ | * Design reusable code. This may include standardized interface conventions and separating options and operands well. | ||
+ | * DRY (Don't repeat yourself): create functions or subroutines for tasks that need to be repeated. | ||
+ | * KISS (Keep it simple): resist the temptation for particularly "elegant" language idioms and terse code. | ||
+ | * Comment your code. I can't repeat that often enough. Code is read very much more often than it is written. Unfortunately (for you) the one most likely to have to read and understand your convoluted code is you yourself, half a year later. So do yourself the favour to explain what you are thinking. Not what the code does - that is readable from the code itself - but '''why''' you do something the way you do. | ||
+ | * Be consistent. | ||
− | + | | |
− | + | ==Deploy and Maintain== | |
These may not be distinct in the scenario we are considering here: validation may comprise the one run of discovery we are aiming for, deployment may not apply and maintenance may be foregone as the research agenda moves on. | These may not be distinct in the scenario we are considering here: validation may comprise the one run of discovery we are aiming for, deployment may not apply and maintenance may be foregone as the research agenda moves on. | ||
Line 148: | Line 117: | ||
<section begin=exercises /> | <section begin=exercises /> | ||
<section end=exercises /> | <section end=exercises /> | ||
+ | |||
+ | --> | ||
Line 154: | Line 125: | ||
<references /> | <references /> | ||
Line 161: | Line 130: | ||
<!-- {{#pmid:21627854}} --> | <!-- {{#pmid:21627854}} --> | ||
<!-- {{WWW|WWW_UniProt}} --> | <!-- {{WWW|WWW_UniProt}} --> | ||
+ | ;Concepts | ||
+ | *{{WP|Software design|Software '''design'''}} | ||
+ | *{{WP|Software design pattern|Software '''pattern'''}} | ||
+ | *{{WP|Software development process}} | ||
+ | *{{WP|Software architecture}} | ||
+ | *{{WP|Portal:Software_testing}} | ||
+ | |||
<div class="reference-box">[http://archive.eiffel.com/doc/manuals/technology/bmarticles/uml/page.html UML: The Positive Spin]</div> | <div class="reference-box">[http://archive.eiffel.com/doc/manuals/technology/bmarticles/uml/page.html UML: The Positive Spin]</div> | ||
<div class="reference-box">[http://msdn.microsoft.com/en-us/library/vstudio/dd490886.aspx Architecture modeling]. A quite useful overview of systems modeling, part of the Microsoft Visual Studio documentation.</div> | <div class="reference-box">[http://msdn.microsoft.com/en-us/library/vstudio/dd490886.aspx Architecture modeling]. A quite useful overview of systems modeling, part of the Microsoft Visual Studio documentation.</div> | ||
*Kim Waldén and Jean-Marc Nerson: Seamless Object-Oriented Software Architecture: Analysis and Design of Reliable Systems, Prentice Hall, 1995. | *Kim Waldén and Jean-Marc Nerson: Seamless Object-Oriented Software Architecture: Analysis and Design of Reliable Systems, Prentice Hall, 1995. | ||
*Article in '''Nature Biotechnology'''; note that ''successful'' here is meant to imply ''widely used''. David Baker's ''Rosetta'' package is not mentioned, for example. Nevertheless: good insights in this. | *Article in '''Nature Biotechnology'''; note that ''successful'' here is meant to imply ''widely used''. David Baker's ''Rosetta'' package is not mentioned, for example. Nevertheless: good insights in this. |
Revision as of 19:49, 17 January 2015
Software Development
(In a small-scale research context)
It is not hard to argue that the creation of software is the greatest human cultural achievement to date. But writing software well is not easy and much sophisticated methodology has been proposed for software development, primarily addressing the needs of large software companies and enterprise-scale systems. Certainly: once software development becomes the task of teams, and systems become larger than what one person can remember confidently, failure is virtually guaranteed if the task can't be organized in a structured way.
But our work often does not fit this paradigm, because in the bioinformatics lab the requirements change quickly. The reason is obvious: most of what we produce in science are one-off solutions. Once one analysis runs, we publish the results, and we move on. There is limited value in doing an analysis over and over again. However, this does not mean we can't profit from applying the basic principles of good development practice. Fortunately that is easy. There actually is only one principle.
Make implicit knowledge explicit.
Everything else follows.
Plan
The planning stage involves defining the goals and endpoints of the project. We usually start out with a vague idea of something we would like to achieve. We need to define:
- where we are;
- where we want to be;
- and how we will get there.
For an example of a plan, refer to the 2015 BCB420 Class Project. There, we lay out a plan in three phases: Preparation, Implementation and Results. This structure is generic: the preparation phase implies an analysis of the problem, which focusses on what will be accomplished, independent of how this will be done. The result of the analysis can be a requirements document (see the ABP Requirements template) or a less formal collection of goals.
The most important achievement of the plan is to break down the project into manageable parts and define the Milestones that characterize the completion of each part.
Design
In the design phase, we focus on the architecture of the system that fulfils the requirements. By architecture we mean the components, their interfaces and behaviour. Typically this will involve some modelling and there are different ways to model a system.
- Structural modelling describes the components and interfaces. The components are typically pieces of software, the interfaces are "contracts" that describe how information passes from one piece to another. Structural models include the Data model that captures how data reflects reality and how reality changes the data in our system;
- Behaviour modelling describes the state changes of our system, how it responds to input and how data flows through the system. In data-driven analysis, the data flow model may capture most of what is important about the system.
Typically, several different types of models contribute to understanding a system; in practice, dataflow diagrams are particularly well suited for the workflow-centric systems that we commonly encounter in bioinformatics.
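A dataflow model can be sketched directly in code as a chain of small functions, each consuming the output of the previous step. The following is a minimal sketch in Python; the stage names (`parse_fasta`, `gc_content`, `summarize`) and the toy input are hypothetical illustrations, not part of the course material.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Stage 1: raw FASTA text -> {identifier: sequence}."""
    records: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word of the header line.
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line.upper()
    return records


def gc_content(records: Dict[str, str]) -> Dict[str, float]:
    """Stage 2: sequences -> fraction of G and C per sequence."""
    return {
        name: (seq.count("G") + seq.count("C")) / len(seq)
        for name, seq in records.items()
        if seq  # skip empty records rather than divide by zero
    }


def summarize(gc: Dict[str, float]) -> List[str]:
    """Stage 3: numeric results -> report lines, sorted by identifier."""
    return [f"{name}\t{value:.2f}" for name, value in sorted(gc.items())]


# The data flow itself: input -> parse -> compute -> report.
toy_input = ">seqA demo\nGGCC\nAATT\n>seqB\nATAT\n"
report = summarize(gc_content(parse_fasta(toy_input)))
```

Each function corresponds to one node of the dataflow diagram, and the argument and return types play the role of the interfaces between components.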
Develop
In the development phase, we actually build our system. It is a misunderstanding if you believe most time will be spent in this phase. Designing a system well is hard. Building it, if it is well designed, is easy. Building it if it is poorly designed is probably impossible.
A number of development methodologies and philosophies have been proposed, and they go in and out of fashion. In this course we will work with a combination of TDD (Test Driven Development) and Literate programming.
Literate Programming
Literate programming is the idea that software is best described in a natural language, focussing on the logic of the program, i.e. the why of code, not the what. The goal is to ensure that model, code, and documentation become a single unit, and that all this information is stored in one and only one location. The product should be consistent between its described goals and its implementation, seamless in capturing the process from start (data input) to end (visualization, interpretation), and reversible (between analysis, design and implementation).
In literate programming, narrative and computer code are kept in the same file. This source document is typically written in Markdown or LaTeX syntax and includes the programming code as well as text annotations, tables, formulas etc. The supporting software can weave human-readable documentation from this, or tangle executable code. Literate programming with both Markdown and LaTeX is supported by RStudio, and this makes the RStudio interface a useful development environment for this paradigm. While it is easy to edit source files with a different editor and process them in base R with the Sweave() and Stangle() functions or the knitr package, in our context here we will use RStudio because it conveniently integrates the functionality we need.
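To make the distinction between weaving and tangling concrete, here is a minimal sketch of the tangle step, written in Python: it pulls the code chunks out of an R Markdown-style source and drops the narrative. The function name `tangle` and the toy document are hypothetical; this is not how knitr or `Stangle()` are implemented, only an illustration of the idea.

```python
import re

FENCE = "`" * 3  # built programmatically so this example can itself be fenced

# A code chunk opens with a fence plus a {r ...} header and closes with a bare fence.
CHUNK = re.compile(
    r"^" + FENCE + r"\{r[^}]*\}[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def tangle(source: str) -> str:
    """Return only the code from a literate source, in document order."""
    return "".join(match.group(1) for match in CHUNK.finditer(source))


toy_document = "\n".join([
    "We first define the input, because every later step depends on it.",
    "",
    FENCE + "{r}",
    "x <- c(1, 2, 3)",
    FENCE,
    "",
    "The mean is our summary statistic of choice.",
    "",
    FENCE + "{r, echo=FALSE}",
    "mean(x)",
    FENCE,
])

code_only = tangle(toy_document)
```

Weaving is the complementary operation: the narrative is kept and the chunks are replaced by their formatted code and results.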
For exercises on knitr, RMarkdown and LaTeX, follow this link.
Test Driven Development
TDD is meant to ensure that code actually does what it is meant to do. In practice, we define our software goals and devise a test (or battery of tests) for each. Initially, all tests fail. As we develop, the tests succeed. As we continue development:
- we think carefully about how to break the project into components and structure them;
- we discipline ourselves to watch out for unexpected input, edge- and corner cases and unwarranted assumptions;
- we can be confident that later changes do not break what we have done earlier, because our tests keep track of the behaviour.
For an exercise in Test Driven Development, follow this link.
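The test-first cycle can be sketched as follows, using Python's built-in `unittest` module. The function `reverse_complement` and its expected behaviour are hypothetical examples: the tests stand in for the specification, and the function body is a minimal implementation that makes them pass.

```python
import unittest

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    try:
        return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))
    except KeyError as err:
        # Unexpected input is rejected explicitly rather than passed through.
        raise ValueError(f"not a DNA base: {err.args[0]!r}") from None


class TestReverseComplement(unittest.TestCase):
    def test_typical_sequence(self):
        self.assertEqual(reverse_complement("ATGC"), "GCAT")

    def test_empty_sequence_is_an_edge_case(self):
        self.assertEqual(reverse_complement(""), "")

    def test_lower_case_is_accepted(self):
        self.assertEqual(reverse_complement("aacg"), "CGTT")

    def test_unexpected_input_is_rejected(self):
        with self.assertRaises(ValueError):
            reverse_complement("ATGX")


# Run the battery of tests and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestReverseComplement)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If a later change breaks any of these behaviours, the corresponding test fails and points to what was broken.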
Typically testing is done at several levels:
- During the initial development phases unit testing continuously checks the function of the software units of the system.
- As the code base progresses, code units are integrated and begin interacting via their interfaces. These interfaces can be specified as "contracts" that define the conditions and obligations of an interaction. Typically, a contract will define the precondition, postcondition and invariants of an interaction. These can be verified by tests.
- Final tests verify the code, and validate its correct execution, just like a positive control in a lab experiment.
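A contract of this kind can be written down as executable checks. The sketch below, in Python, states a precondition, a postcondition and an invariant for a hypothetical `normalize` function using plain `assert` statements; the function and the particular conditions are illustrative assumptions.

```python
import math
from typing import List


def normalize(weights: List[float]) -> List[float]:
    """Scale non-negative weights so that they sum to 1."""
    before = list(weights)  # copy, used only to check the invariant

    # Preconditions: what the caller must guarantee.
    assert len(weights) > 0, "precondition: at least one weight"
    assert all(w >= 0 for w in weights), "precondition: weights are non-negative"
    total = sum(weights)
    assert total > 0, "precondition: weights are not all zero"

    result = [w / total for w in weights]

    # Postconditions: what the function guarantees in return.
    assert len(result) == len(weights), "postcondition: one output per input"
    assert math.isclose(sum(result), 1.0), "postcondition: result sums to 1"

    # Invariant: something that must hold before and after the call.
    assert weights == before, "invariant: the input is not modified"
    return result
```

Note that Python skips `assert` statements when run with the `-O` option, so checks like these document and test the contract during development rather than guard production code.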
Code
Here is a small list of miscellaneous best-practice items for the phase when actual code is being written:
- Be organized. Keep your files in well-named folders and give your file names some thought.
- Use version control.
- Use an IDE (Integrated Development Environment). Syntax highlighting and code autocompletion are nice, but good support for debugging is essential for development, especially stepping through code, examining variables, and setting breakpoints and conditional breakpoints.
- Design your code to be easily extensible and only loosely coupled. Your requirements will change frequently; make sure your code is modular and can change nimbly as well.
- Design reusable code. This may include standardized interface conventions and separating options and operands well.
- DRY (Don't repeat yourself): create functions or subroutines for tasks that need to be repeated.
- KISS (Keep it simple): resist the temptation for particularly "elegant" language idioms and terse code.
- Comment your code. I can't repeat that often enough. Code is read very much more often than it is written. Unfortunately (for you) the one most likely to have to read and understand your convoluted code is you yourself, half a year later. So do yourself the favour to explain what you are thinking. Not what the code does - that is readable from the code itself - but why you do something the way you do.
- Be consistent.
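Two of the items above, DRY and commenting the why, can be illustrated together. In the Python sketch below, a repeated calculation is moved into one function, and the comment records the reason for a choice rather than restating the code. The names, the toy counts and the pseudocount value are hypothetical.

```python
import math
from typing import Dict, Tuple

PSEUDOCOUNT = 1.0  # hypothetical value, chosen only for illustration


def log2_ratio(treated: float, control: float) -> float:
    """Log2 fold change of two non-negative counts."""
    # Why a pseudocount: a count of zero would make the ratio undefined
    # or infinite, and zero counts are common in sparse data.
    return math.log2((treated + PSEUDOCOUNT) / (control + PSEUDOCOUNT))


# One function, reused for every gene, instead of the formula copied three times.
counts: Dict[str, Tuple[int, int]] = {
    "geneA": (7, 1),
    "geneB": (0, 3),
    "geneC": (3, 3),
}
fold_changes = {gene: log2_ratio(t, c) for gene, (t, c) in counts.items()}
```

If the pseudocount or the formula ever needs to change, there is exactly one place to edit.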
Deploy and Maintain
These may not be distinct in the scenario we are considering here: validation may comprise the one run of discovery we are aiming for, deployment may not apply and maintenance may be foregone as the research agenda moves on.
But this does not mean we can afford ignorance of best practice in scientific software development: simple, but essential aspects like using version control for your code, using IDEs, writing test cases for all code functions etc. These aspects are nowhere better explained than in Greg Wilson's excellent Software Carpentry initiative. Free, online, accessible and to the point. Go there and learn:
Sandve et al. (2013) Ten simple rules for reproducible computational research. PLoS Comput Biol 9:e1003285. (pmid: 24204232)
Notes
Further reading and resources
- Concepts
- Software design
- Software pattern
- Software development process
- Software architecture
- Portal:Software testing
- Kim Waldén and Jean-Marc Nerson: Seamless Object-Oriented Software Architecture: Analysis and Design of Reliable Systems, Prentice Hall, 1995.
- Article in Nature Biotechnology; note that successful here is meant to imply widely used. David Baker's Rosetta package is not mentioned, for example. Nevertheless: good insights in this.