Difference between revisions of "SPN"

Revision as of 14:52, 2 February 2016

SPN
Structured Process Notation

SPN (Structured Process Notation) is a lightweight, expressive notation for the design of dataflow diagrams.

Introduction

Engineering workflow-centric software for research laboratories requires transparent, high-level models to structure the design and to document the overall organisation of the code. However, current methodologies are not well suited for this task: they focus on the implementation of objects rather than on the integration of functions, and they suffer from poor information design. SPN (Structured Process Notation) is an intuitive, lightweight alternative loosely based on SADT / IDEF0. SPN is intuitive because it consistently applies a directional metaphor that is close to implementation, expressive because it uses icons and avoids superfluous elements, simple because it minimizes the number of symbols and uses explicit annotations, granular because it supports multiple levels of detail through nesting, and extensible.

The core SPN elements and their general layout. Icons are drawn on the left of the page, labels and descriptions on the right. Fonts may be formatted to emphasize correspondence. Input data enters an activity from the top, parameters from the left, and databases, libraries, services, repositories and similar resources from the right; output data leaves the activity at the bottom. Connections are drawn with black lines; all other lines in the diagram are not black, so that the two are visually distinct. An activity can be annotated with functions, methods or algorithms. Notes and comments can be added for clarification. They are formatted in grey, their pointing arrows always run against the dataflow, and these arrows are not connected to icons, to avoid confusion. A pale dotted line groups elements that are described in greater detail elsewhere; a reference is included. All elements of the diagram are optional. A diagram has a name for the process and a versioned reference ID (process prefix, number, version, page); the page number is optional for a single-page diagram. A summary at the top may be useful.
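The directional convention in this layout can be written down as a small lookup table. The following is a minimal sketch in Python, not part of SPN itself; the names `Side`, `ATTACH_SIDE` and `side_for` are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Side(Enum):
    TOP = "top"
    LEFT = "left"
    RIGHT = "right"
    BOTTOM = "bottom"


# Which edge of an activity icon each kind of connection uses,
# following the layout described above. All names are illustrative.
ATTACH_SIDE = {
    "input": Side.TOP,        # input data enters from the top
    "parameter": Side.LEFT,   # parameters enter from the left
    "resource": Side.RIGHT,   # databases, libraries, services, repositories
    "output": Side.BOTTOM,    # output data leaves at the bottom
}


def side_for(role: str) -> Side:
    """Return the edge of an activity icon used by the given role."""
    try:
        return ATTACH_SIDE[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None
```

A drawing tool or checker could use such a table to reject a line that enters an activity on the wrong edge.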

Features include:

A structured, spatial organization of components intuitively depicts data flow.
Information is not implicit in a complex grammar of symbols but made explicit; the set of symbols that are used is small and they are iconic, i.e. their shape is related to their semantics.
SPN's process-centric view of data flow is close to implementation.
Icons are separated from text to allow uncluttered spatial representation of relationships, but explicit descriptions are available on the diagram by placing elements and descriptions at corresponding heights on the diagram and relating them through mnemonic labels.
Nesting of diagrams is encouraged. Expanding procedures at a higher level of granularity into more detail in a separate diagram structures the components of the workflow without overloading individual diagrams with too much detail.

Elements

SPN Elements

Icon	Name	Description
	Data	Any data item that is produced by the process. Typically this may be a file, but it can also be a stream, a series of files, an object in memory, or one or more tables in a datamodel. The nature of the data is clarified in the description. Labels are short mnemonics. Examples: ID, Tor, s3D, mapX. Data items have one incoming relationship line (a data item is produced by exactly one activity) and can have more than one outgoing relationship line (data items can be used by more than one activity). The data icon can be used to mark the Begin and End state for the process. Begin icons are not produced by an activity.
	Multi data	Multiple data items with the same format and semantics are represented with the multi data icon. They are produced and consumed sequentially by an activity. For parallel processes, use braces (see below).
	Connecting arrow	Connecting arrows depict dataflow in the process and establish the relationship between input, activity and output. All lines are orthogonal and line crossings are avoided. Drawing connections against the flow is discouraged, and lines must never enter the output (lower) edges of icons. All connections have a direction. We never use plain lines but arrows. Arrows at the page edge that do not connect to icons indicate that the workflow / diagram is continued on another page.
	Resource	A database, repository or service that supplies data to the process through some query mechanism but is not significantly modified by the process, i.e. it is not a target of process output – any "canned" result. Labels are either acronyms, abbreviations, or proper names. Examples: PDB, SP ( = SwissProt). This icon is also used for services, i.e. activities that reside outside of the system, such as BLAST or T-Coffee.
	Activity	Activities operate on input to generate output. These can be programs, scripts, modules or functions/methods within programs. Activities are given mnemonic labels; one-syllable action verbs are recommended. Examples: Hoard, Spin, Chew. Labels do not have to be the actual function name; their purpose is merely to be descriptive enough to support comprehension of the workflow.
	Parameter	One or more parameters for an activity. These are values that would typically be passed through the command line or an initialization file. Parameters are named and default values can be given. An ellipsis {...} can be used if the list is not complete. Parameters can be Begin states.
	Function	A function, subroutine or algorithm that is a non-trivial part of an activity. This is optional and only used to highlight a particular functional aspect of the workflow, or choices between alternatives. Examples: BioConductor, Minerva, curl.
	Branch	Conditional branches fork dataflow according to a condition that is described in a right-hand side annotation. Labels are short, e.g. C1, C2. Use these sparingly: SPN diagrams are data flow models, not control models. It will usually be better to indicate iteration or recursion with a Note on the Activity, rather than drawing an explicit loop. But don't be dogmatic about this; draw a loop if it improves clarity.
	Connector	The connector dot is used when dataflows are merged or split. In general, it should be avoided: data icons accept up to three outgoing lines and activities accept up to three lines from input, output and resources. However, connectors are required if more than three connections to the same object are needed. They make splits and joins explicit and distinguish them clearly from line crossings.
	Braces	Braces provide conceptual connections to parallel processes, such as map/reduce procedures.
	Note	A Note contains a comment about a component of the diagram or about its functions or purpose. The sense of the arrow always goes against the dataflow to avoid confusing a note with an activity. Short information is contained in the Note; longer information is annotated at the side and referenced to the Note with a label.
	Reference	A pale, dashed box is used to indicate a reference to a nested, more detailed model and description, such as a specification, fine-grained diagram, datamodel etc. The number of incoming and outgoing connections in the subordinate diagram must be the same as those entering or leaving the reference box. Through this, SPN provides for multi-layered hierarchical descriptions of a system and is easily extensible with other notational conventions.
	Continuation	A pale dashed line is used to indicate that the process continues on a different page. Page numbers are labeled.
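Two of the rules in the table above lend themselves to an automatic check: a data item is produced by exactly one activity (Begin items by none), and more than three outgoing lines from a data icon call for a connector. The following is a minimal sketch in Python under those two rules only; the `Diagram` class, its fields and the message wording are hypothetical, and the other element types are not modelled.

```python
from collections import Counter
from dataclasses import dataclass, field

MAX_DIRECT_LINES = 3  # limit stated in the Connector description


@dataclass
class Diagram:
    # Each edge is (source_label, target_label), drawn in flow direction.
    edges: list = field(default_factory=list)
    data: set = field(default_factory=set)    # labels of data items
    begin: set = field(default_factory=set)   # Begin items have no producer

    def check(self) -> list:
        """Return a list of human-readable rule violations."""
        problems = []
        incoming = Counter(dst for _, dst in self.edges)
        outgoing = Counter(src for src, _ in self.edges)
        for label in sorted(self.data):
            n_in, n_out = incoming[label], outgoing[label]
            if label in self.begin:
                if n_in != 0:
                    problems.append(f"{label}: Begin item must not have a producer")
            elif n_in != 1:
                problems.append(f"{label}: expected exactly 1 producer, found {n_in}")
            if n_out > MAX_DIRECT_LINES:
                problems.append(f"{label}: {n_out} outgoing lines, use a connector")
        return problems


# Usage: "Tor" is written by two activities, which breaks the one-producer rule.
diagram = Diagram(
    edges=[("ID", "Hoard"), ("Hoard", "Tor"), ("Chew", "Tor"), ("Tor", "Spin")],
    data={"ID", "Tor"},
    begin={"ID"},
)
for problem in diagram.check():
    print(problem)
```

The labels in the usage example are borrowed from the examples in the table; the wiring between them is invented for the demonstration.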

In Practice

I start by drawing sketches by hand on a blackboard or whiteboard to get a first overview and to quickly move things around, reconnect and rearrange. A final, hand-drawn sketch is fine. If an electronic diagram is required, I find, after experimenting with many alternatives, that Google Slides is currently the best tool to develop SPN diagrams. Here is a link to an SPN template that you can copy and use for your own diagrams.

Best Practice in SPN Diagrams
Labels
Good	Mnemonic labels; Verb labels for activities; Acronyms for data; Actual names for resources; Versioned references.
Bad	Wordy labels (Graph.clustering); Meaningless labels (A, B, C ...).
Worst	Wordy and redundant labels (Retrieve.sequence.from.NCBI); Duplicate labels; References without version numbers.
Connectors
Good	All connectors straight and orthogonal; Minimal number of crossings; No backtracking; Connectors attached to proper connection points.
Bad	Connectors at odd angles; Superfluous corners; Backtracking (breaks consistent direction of information flow).
Worst	Unnecessarily crossing lines; Connectors attached to corners or the wrong side of elements. Connectors that are slightly off horizontal/vertical (these stand out visually and give a sloppy impression of the diagram).
Color
Good	Maximally restrained use of color to separate elements from background and perhaps to distinguish functionally related modules; Consistent use of black lines and text for process components, and grey lines and text for annotations.
Bad	No use of color (elements blend into canvas); Obtrusive color (attention is drawn away from process contents).
Worst	Gratuitous use of color and decorations – shadows, outlines, gradients and similar fluff ...
Even Worse than Worst	Inconsistent use of decorations.
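Some of the label guidelines above can be approximated by a simple lint. The following is a rough sketch in Python; the function `lint_labels`, the length threshold and the heuristics (a dot or an overlong string counts as "wordy", a single letter counts as "meaningless") are assumptions made for illustration and are not defined by SPN.

```python
from collections import Counter

MAX_LABEL_LENGTH = 12  # assumed threshold, not specified by SPN


def lint_labels(labels):
    """Return (label, complaint) pairs for labels that look problematic."""
    findings = []
    counts = Counter(labels)
    for label in dict.fromkeys(labels):  # unique labels, original order
        if counts[label] > 1:
            findings.append((label, "duplicate"))
        if len(label) == 1 and label.isalpha():
            findings.append((label, "meaningless"))
        if "." in label or len(label) > MAX_LABEL_LENGTH:
            findings.append((label, "wordy"))
    return findings
```

A heuristic like this only flags candidates for review; whether a label is a good mnemonic remains a judgment call.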

Example

A sample process. Note that FETCH generates multiple files; PRY/MEND and TELL operate on these files sequentially. The files are merged by COUNT, which produces a single output list. Two nested diagrams, for TELL and COUNT, are referenced. There is no off-page continuation; the process terminates with FRA.
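The shape of this sample process (multiple items produced one at a time, processed sequentially, then merged into one list) can be sketched in code. The following is a minimal Python sketch; only the activity names come from the caption. The function bodies are placeholders, PRY/MEND is treated as a single step, and the exact wiring is an assumption, since the diagram itself is not reproduced here.

```python
from typing import Iterable, Iterator, List


def fetch(ids: Iterable[str]) -> Iterator[str]:
    """FETCH: produce one item per ID (multi data, produced one at a time)."""
    for item_id in ids:
        yield f"record:{item_id}"  # placeholder payload


def pry_mend(item: str) -> str:
    """PRY/MEND: placeholder per-item step."""
    return item.strip()


def tell(item: str) -> str:
    """TELL: placeholder per-item step."""
    return item.upper()


def count(items: Iterable[str]) -> List[str]:
    """COUNT: merge all items into a single output list (FRA in the caption)."""
    return list(items)


def run(ids: Iterable[str]) -> List[str]:
    # Items flow through PRY/MEND and TELL one at a time, then COUNT merges them.
    return count(tell(pry_mend(item)) for item in fetch(ids))
```

Using a generator for FETCH mirrors the multi data icon: items are consumed sequentially rather than in parallel, which would call for braces instead.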

@@ Line 43: / Line 43: @@
+<table cellpadding="10">
+<tr>
+<td colspan="3" style="font-size:120%; background-color:#cce0eb;" align="center">'''SPN Elements'''</td>
+</tr>
+<tr>
+<th style="font-size:110%; background-color:#daeaf2;">'''Icon'''</th>
+<th style="font-size:110%; background-color:#daeaf2;">'''Name'''</th>
+<th style="font-size:110%; background-color:#daeaf2;">'''Description'''</th>
+</tr>
+<tr>
+<td style="border-bottom:solid 1px #999999;">[[File:SPN_Data.png|50px]]</td>
+<td align="center" style="border-bottom:solid 1px #999999;">'''Data'''</td>
+<td style="border-bottom:solid 1px #999999;">
+Any data item that is produced by the process. Typically this may be a file, but it can also be a stream, a series of files, an object in memory, or one or more tables in a datamodel. The nature of the data is clarified in the description. Labels are short mnemonics. Examples: ID, Tor, s3D, mapX. Data items have one incoming relationship line (a data item is produced by exactly one activity) and can have more than one outgoing relationship line (data items can be used by more than one activity). The data icon can be used to mark the ''Begin'' and ''End'' state for the process. ''Begin'' icons are not produced by an activity.
+</td>
+</tr>
+<tr>
+<td style="border-bottom:solid 1px #999999;">[[File:SPN_MultiData.png|50px]]</td>
+<td align="center" style="border-bottom:solid 1px #999999;">'''Multi data'''</td>
+<td style="border-bottom:solid 1px #999999;">
+Multiple data items with the same format and semantics are represented with the multi data icon. They are produced and consumed sequentially by an activity. For parallel processes, use braces (see below).
+</td>
+</tr>
+<tr>
+<td style="border-bottom:solid 1px #999999;">[[File:SPN_ConnectingArrows.png|75px]]</td>
+<td align="center" style="border-bottom:solid 1px #999999;">'''Connecting arrow'''</td>
+<td style="border-bottom:solid 1px #999999;">
+Connecting arrows depict dataflow in the process and establish the relationship between input, activity and output. All lines are orthogonal and line crossings are avoided. Drawing connections against the flow is discouraged, and lines must never enter the output (lower) edges of icons.
+All connections have a direction. We never use plain lines but arrows.
+Arrows at the page edge that do not connect to icons indicate that the workflow / diagram is continued on another page.
+</td>
+</tr>
+<tr>
+<td style="border-bottom:solid 1px #999999;">[[File:SPN_Resource.png|100px]]</td>
+<td align="center" style="border-bottom:solid 1px #999999;">'''Resource'''</td>
+<td style="border-bottom:solid 1px #999999;">
+A database, repository or service that supplies data to the process through some query mechanism but is not significantly modified by the process, i.e. it is not a target of process output – any "canned" result. Labels are either acronyms, abbreviations, or proper names. Examples: PDB, SP ( = SwissProt). This icon is also used for services, i.e. activities that reside outside of the system, such as BLAST or T-Coffee.
+</td>
+</tr>
+<tr>
+<td style="border-bottom:solid 1px #999999;">[[File:SPN_Activity.png|100px]]</td>
+<td align="center" style="border-bottom:solid 1px #999999;">'''Activity'''</td>
+<td style="border-bottom:solid 1px #999999;">
+Activities operate on input to generate output. These can be programs, scripts, modules or functions/methods within programs. Activities are given mnemonic labels; one-syllable action verbs are recommended. Examples: Hoard, Spin, Chew. Labels do not have to be the actual function name; their purpose is merely to be descriptive enough to support comprehension of the workflow.
+</td>
+</tr>
+<tr>
+<td style="border-bottom:solid 1px #999999;">[[File:SPN_Parameter.png|75px]]</td>
+<td align="center" style="border-bottom:solid 1px #999999;">'''Parameter'''</td>
+<td style="border-bottom:solid 1px #999999;">
+One or more parameters for an activity. These are values that would typically be passed through the command line or an initialization file. Parameters are named and default values can be given. An ellipsis {...} can be used if the list is not complete. Parameters can be ''Begin'' states.
+</td>
+</tr>
+<tr>
+<td align="center" style="border-bottom:solid 1px #999999;">[[File:SPN_Function.png|50px]]</td>
+<td align="center" style="border-bottom:solid 1px #999999;">'''Function'''</td>
+<td style="border-bottom:solid 1px #999999;">
+A function, subroutine or algorithm that is a non-trivial part of an activity. This is optional and only used to highlight a particular functional aspect of the workflow, or choices between alternatives. Examples: BioConductor, Minerva, curl.
+</td>
+</tr>
+<tr>
+<td style="border-bottom:solid 1px #999999;">[[File:SPN_Branch.png|50px]]</td>
+<td align="center" style="border-bottom:solid 1px #999999;">'''Branch'''</td>
+<td style="border-bottom:solid 1px #999999;">
+Conditional branches fork dataflow according to a condition that is described in a right-hand side annotation. Labels are short, e.g. C1, C2. Use these sparingly: SPN diagrams are data flow models, not control models. It will usually be better to indicate iteration or recursion with a ''Note'' on the Activity, rather than drawing an explicit loop. But don't be dogmatic about this; draw a loop if it improves clarity.
+</td>
+</tr>
+<tr>
+<td style="border-bottom:solid 1px #999999;">[[File:SPN_Connector.png|75px]]</td>
+<td align="center" style="border-bottom:solid 1px #999999;">'''Connector'''</td>
+<td style="border-bottom:solid 1px #999999;">
+The connector dot is used when dataflows are merged or split. In general, it should be avoided: data icons accept up to three outgoing lines and activities accept up to three lines from input, output and resources. However, connectors are required if more than three connections to the same object are needed. They make splits and joins explicit and distinguish them clearly from line crossings.
+</td>
+</tr>
+<tr>
+<td style="border-bottom:solid 1px #999999;">[[File:SPN_Braces.png|100px]]</td>
+<td align="center" style="border-bottom:solid 1px #999999;">'''Braces'''</td>
+<td style="border-bottom:solid 1px #999999;">
+Braces provide conceptual connections to '''parallel processes''', such as map/reduce procedures.
+</td>
+</tr>
+<tr>
+<td style="border-bottom:solid 1px #999999;">[[File:SPN_Note.png|100px]]</td>
+<td align="center" style="border-bottom:solid 1px #999999;">'''Note'''</td>
+<td style="border-bottom:solid 1px #999999;">
+A Note contains a comment about a component of the diagram or about its functions or purpose. The sense of the arrow always goes against the dataflow to avoid confusing a note with an activity. Short information is contained in the Note; longer information is annotated at the side and referenced to the Note with a label.
+</td>
+</tr>
+<tr>
+<td style="border-bottom:solid 1px #999999;">[[File:SPN_Reference.png|100px]]</td>
+<td align="center" style="border-bottom:solid 1px #999999;">'''Reference'''</td>
+<td style="border-bottom:solid 1px #999999;">
+A pale, dashed box is used to indicate a reference to a nested, more detailed model and description, such as a specification, fine-grained diagram, datamodel etc.
+The number of incoming and outgoing connections in the subordinate diagram must be the same as those entering or leaving the reference box.
+Through this, the SPN provides for multi-layered hierarchical descriptions of a system and is easily extensible with other notational conventions.
+</td>
+</tr>
+<tr>
+<td style="border-bottom:solid 1px #999999;">[[File:SPN_Continuation.png|75px]]</td>
+<td align="center" style="border-bottom:solid 1px #999999;">'''Continuation'''</td>
+<td style="border-bottom:solid 1px #999999;">
+A pale dashed line is used to indicate that the process continues on a different page. Page numbers are labeled.
+</td>
+</tr>
+</table>
+{{Vspace}}
 ==In Practice==
