Analysis of Scientist Descriptions of Workflows
Workflow systems often provide graphical user interfaces for composing and editing workflows. In scientific publications, however, workflows are often described in natural language. We wanted to understand and quantify the gap between the formal workflow representation of an experiment and the natural language description of it written by a scientist. Quantifying this gap gives valuable insight into what users consider the most pertinent information when describing workflows in their own words.
In our work, we measure this gap by comparing natural language descriptions of bioinformatics workflows with their associated workflow representations, looking at the difference between the procedural information constructs each contains. For example, a formal workflow representation might specify six detailed steps, whereas the associated textual description may focus on only the two key steps within the workflow. Using this information, we identified key constructs that workflow systems can support to make them more natural to use.
We performed an analysis of workflows from myExperiment (http://www.myExperiment.org), a virtual research environment that facilitates collaboration and sharing of workflows through a social web approach. At the time of this study (September 30, 2008), myExperiment had 1181 users and 451 workflows. The workflows currently within myExperiment are edited and executed using the Taverna workflow system. Each workflow in myExperiment includes a title, a textual description, and a diagrammatic rendition of the formal representation of its components and their dataflow. In our analysis, we include the title of the workflow as part of the description.
It is well known that human instruction is often incomplete, erroneous, and out of order, and we found that workflow descriptions are no exception. But because the formal representations in myExperiment are executable, the dataflow diagram provides a complete specification of the workflow that the user has described in text. The descriptions therefore give us useful examples of how humans describe workflows naturally, while the formal representation serves as a gold standard for what the actual workflow is.
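The six-steps-versus-two example above can be made concrete with a minimal sketch. This is an illustration only, not the procedure used in the study: the step names, the `GapReport` type, and the `measure_gap` function are hypothetical, and the sketch assumes that a human annotator has already mapped phrases in the description to steps in the formal workflow.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GapReport:
    """Which formal workflow steps a textual description does and does not mention."""

    mentioned: list[str]
    omitted: list[str]

    @property
    def coverage(self) -> float:
        # Fraction of formal steps that the description mentions.
        total = len(self.mentioned) + len(self.omitted)
        return len(self.mentioned) / total if total else 0.0


def measure_gap(workflow_steps: list[str], described_steps: set[str]) -> GapReport:
    """Split the formal steps into those mentioned in the description and those left out.

    `described_steps` is assumed to come from manual annotation, because
    descriptions rarely reuse the component names found in the workflow.
    """
    mentioned = [s for s in workflow_steps if s in described_steps]
    omitted = [s for s in workflow_steps if s not in described_steps]
    return GapReport(mentioned, omitted)


# Hypothetical workflow with six formal steps, two of which the description mentions.
steps = ["fetch_sequence", "format_input", "run_blast",
         "parse_report", "filter_hits", "render_table"]
report = measure_gap(steps, {"run_blast", "filter_hits"})
print(report.omitted)
print(round(report.coverage, 2))
```

A coverage value well below 1.0 indicates that the description relies on the reader to infer the omitted steps.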
Our study resulted in the following conclusions:
- Users specify the key components used in a workflow. However, they often leave out components, assuming that their inclusion can be inferred. Likewise, users specify important bindings (inputs) and effects (outputs), including their types, while leaving many necessary inputs and outputs unspecified.
- Generally, the steps that do appear in a user description are given in the correct order. While many workflows contained sub-workflows, the descriptions did not describe these decompositions. Instead, users treated sub-workflows as just another component that could be referred to by name.
- Descriptions included advice on when to use the workflow, in both positive and negative forms.
- The vocabulary used in the textual descriptions differs dramatically from the component names used within the workflows.
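Two of the conclusions above, step ordering and vocabulary mismatch, lend themselves to simple checks. The sketch below is one possible way to express them, not the analysis performed in the study. The function names and example strings are hypothetical. Treating a workflow as a linear sequence is a simplification of a dataflow graph, and token-level Jaccard overlap is a stand-in measure chosen for illustration.

```python
import re


def is_order_consistent(workflow_order: list[str], described_order: list[str]) -> bool:
    """Return True if the described steps appear in the same relative order
    as in the formal workflow. Steps unknown to the workflow are ignored."""
    position = {step: i for i, step in enumerate(workflow_order)}
    indices = [position[s] for s in described_order if s in position]
    return all(a < b for a, b in zip(indices, indices[1:]))


def tokens(text: str) -> set[str]:
    """Lowercased word tokens, splitting camelCase and non-alphanumeric characters."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t}


def vocabulary_overlap(description: str, component_names: list[str]) -> float:
    """Jaccard overlap between description tokens and component-name tokens."""
    desc = tokens(description)
    comp = set().union(*(tokens(name) for name in component_names))
    union = desc | comp
    return len(desc & comp) / len(union) if union else 0.0


# Hypothetical examples.
print(is_order_consistent(["a", "b", "c", "d"], ["b", "d"]))
print(is_order_consistent(["a", "b", "c", "d"], ["d", "b"]))
print(vocabulary_overlap("Fetch a protein sequence", ["getProteinSequence"]))
```

A low overlap score is what the last conclusion would predict: matching descriptions to components by name alone is unlikely to work without some vocabulary mapping.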
This work is reported in the following publication:
- "Analyzing the Gap Between Workflows and their Natural Language Descriptions". Paul Groth and Yolanda Gil. Proceedings of the IEEE Third International Workshop on Scientific Workflows (SWF), Los Angeles, CA, July 10, 2009. Available as a preprint.