Expected Preparations:
Keywords: Structured Process Notation (SPN) for workflow modelling
Objectives:
This unit will …
Outcomes:
After working through this unit you …
Deliverables:
Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.
Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Evaluation: NA: This unit is not evaluated for course marks.
SPN (Structured Process Notation) is a lightweight, expressive notation for the design of dataflow diagrams.
Engineering workflow-centric software for research laboratories requires transparent, high-level models to structure the design and to document the overall organisation of the code. However, current popular methodologies are not well suited for this task: they tend to focus on the implementation of objects rather than on the integration of functions, and they are burdened with poor information design decisions. SPN (Structured Process Notation) is an intuitive, lightweight alternative loosely based on SADT / IDEF0. SPN is intuitive by consistently applying a directional metaphor that is close to implementation, expressive by using icons and avoiding any superfluous elements, simple by minimizing the number of symbols and using explicit annotations, granular by supporting multiple levels of detail through nesting, and extensible.
Features include:
SPN Elements

| Icon | Name | Description |
|---|---|---|
| | Data | Any data item that is consumed or produced by an activity / procedure in the workflow. Typically this may be a file, but it can also be a stream, a series of files, an object in memory, or one or more tables in a datamodel. The nature of the data is clarified in the description. Labels are short mnemonics. Examples: ID, Tor, s3D, mapX. Data items have one incoming relationship line (a data item is produced by exactly one activity) and can have more than one outgoing relationship (data items can be used by more than one activity). The data icon can be used to mark the Begin and End state for the process. Begin icons are not produced by an activity. |
| | Multi data | Multiple data items with the same format and semantics are represented with the multi data icon. They are produced and consumed sequentially by an activity. For parallel processes, use braces (see below). |
| | Connecting arrow | Connecting arrows depict dataflow in the process and establish the relationship between input, activity and output. All lines are orthogonal and line crossings are avoided. Drawing connections against the flow is discouraged, and lines must never enter the output (lower) edges of icons. All connections have a direction: we never use plain lines, only arrows. Arrows at the page edge that do not connect to icons indicate that the workflow / diagram is continued on another page. |
| | Resource | A database, repository or service that supplies data to the process through some query mechanism but is not significantly modified by the process, i.e. it is not a target of process output – any “canned” result. Labels are either acronyms, abbreviations, or proper names. Examples: PDB, SP ( = SwissProt). This icon is also used for services, i.e. activities that reside outside of the system, such as BLAST or T-Coffee. |
| | Activity | Activities operate on input to generate output. These can be programs, scripts, modules or functions/methods within programs. Activities are given mnemonic labels; one-syllable action verbs are recommended. Examples: Hoard, Spin, Chew. Labels do not have to be the actual function name; their purpose is merely to be descriptive enough to support comprehension of the workflow. |
| | Parameter | One or more parameters for an activity. These would be values that would typically be passed through the command line or an initialization file. Parameters are named and default values can be given. Ellipsis {…} can be used if the list is not complete. Parameters can be Begin states. |
| | Function | A function, subroutine or algorithm that is a non-trivial part of an activity. This is optional and only used to highlight a particular functional aspect of the workflow, or choices between alternatives. Examples: BioConductor, Minerva, curl. |
| | Branch | Conditional branches fork dataflow according to a condition that is described in a right-hand side annotation. Labels are short, e.g. C1, C2. Use these sparingly: SPN diagrams are data flow models, not control models. It will usually be better to indicate iteration or recursion with a Note on the Activity, rather than drawing an explicit loop. But don’t be dogmatic about this; draw a loop if it improves clarity. |
| | Connector | The connector dot is used when dataflows are merged or split. In general, it should be avoided – data icons accept up to three outgoing lines and activities accept up to three lines from input, output and resources. However, connectors are required if more than three connections to the same object are needed. They make splits and joins explicit and distinguish them clearly from line crossings. |
| | Braces | Braces provide conceptual connections to parallel processes, such as map/reduce procedures. |
| | Note | A Note contains a comment about a component of the diagram or about its functions or purpose. The sense of the arrow always goes against the dataflow to avoid confusing a note with an activity. Short information is contained in the Note; longer information is annotated at the side and referenced to the Note with a label. |
| | Reference | A pale, dashed box is used to indicate a reference to a nested, more detailed model and description, such as a specification, fine-grained diagram, datamodel etc. The number of incoming and outgoing connections in the subordinate diagram must be the same as those entering or leaving the reference box. Through this, the SPN provides for multi-layered hierarchical descriptions of a system and is easily extensible with other notational conventions. |
| | Continuation | A pale dashed line is used to indicate that the process continues on a different page. Page numbers are labeled. |
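Some of the connection rules in the table can be checked mechanically once a diagram is written down as a list of arrows. The following is only a minimal sketch, not part of SPN itself: the `Diagram` class, its field names and the `MAX_LINES` constant are hypothetical, and it covers just two rules (a data item is produced by at most one activity, and a data icon with more than three outgoing lines calls for a connector).

```python
from collections import Counter
from dataclasses import dataclass, field

MAX_LINES = 3  # assumption: the "up to three outgoing lines" limit for data icons


@dataclass
class Diagram:
    """A toy SPN diagram: labelled nodes plus directed dataflow arrows."""
    kinds: dict = field(default_factory=dict)   # label -> "data", "activity", ...
    arrows: list = field(default_factory=list)  # (source label, target label)

    def add(self, label, kind):
        self.kinds[label] = kind

    def connect(self, source, target):
        self.arrows.append((source, target))

    def check(self):
        """Return a list of rule violations (empty if none were found)."""
        problems = []
        incoming = Counter(target for _, target in self.arrows)
        outgoing = Counter(source for source, _ in self.arrows)
        for label, kind in self.kinds.items():
            if kind != "data":
                continue
            if incoming[label] > 1:
                problems.append(f"{label}: data item produced by more than one activity")
            if outgoing[label] > MAX_LINES:
                problems.append(f"{label}: more than {MAX_LINES} outgoing lines, use a connector")
        return problems


d = Diagram()
d.add("ID", "data")         # Begin state: not produced by any activity
d.add("Hoard", "activity")
d.add("Tor", "data")
d.connect("ID", "Hoard")
d.connect("Hoard", "Tor")
print(d.check())            # prints [] because no rule is violated
```

Connecting a second activity to `Tor` would make `check()` report that the data item has more than one producer.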
If multiple instances of syntactically and semantically equivalent data items are produced, this is indicated with a multi data icon. These are connected to the activity that produces them with a single line.
If the multiple elements are connected to an activity, this indicates that they are processed sequentially. Parallel processing is indicated through braces.
If the multiple elements are merged in an activity, this is indicated with a brace.
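In code, the difference between the three cases might look like the sketch below. It is illustrative only: the activity `chew`, the helper names and the sample labels are hypothetical, and a thread pool is just one of several ways to run the parallel case.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chew(item):
    """Hypothetical activity: turn one data item into one result."""
    return len(item)


def run_sequential(items):
    # Multi data connected directly to an activity: one item after another.
    return [chew(item) for item in items]


def run_parallel(items, workers=4):
    # Opening brace: the same activity is applied to all items concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(chew, items))  # map() keeps the input order


def merge(results):
    # Closing brace: the individual results are merged in a single activity.
    return reduce(lambda a, b: a + b, results, 0)


items = ["mapA", "mapBB", "mapCCC"]
print(run_sequential(items) == run_parallel(items))  # True: same results either way
print(merge(run_parallel(items)))                     # 15
```

Both variants produce the same results; the notation only records whether the items are independent enough to be processed side by side.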
I start by drawing sketches by hand on a blackboard or whiteboard to get a first overview and to quickly move things around, reconnect and rearrange. A final, hand-drawn sketch is fine. If an electronic diagram is required, I find, after experimenting with many alternatives, that Google Slides is currently the best tool to develop SPN diagrams. Here is a link to an SPN template that you can copy and use for your own diagrams.
Best Practice in SPN Diagrams

| Topic | Example diagrams |
|---|---|
| Labels | Good, Bad, Worst |
| Connectors | Good, Bad, Worst |
| Color | Good, Bad, Worst, Even Worse than Worst |
If in doubt, ask! If anything about this content is not clear to you, do not proceed but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.
Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.
[END]