What Are Workflows?
Computational workflows have recently emerged as an effective paradigm to manage large-scope terascale scientific analyses and are a crucial technology to scale up to petascale levels.
Workflows have been used for decades to manage business processes in complex organizations. They provide a formalism for specifying tasks, their dependencies, their requirements, and their products, and for tracking task execution over time. Similarly, computational workflows represent computations, which are often executed in geographically distributed settings, together with their interdependencies, their requirements, and their data products.
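The description above can be made concrete with a minimal sketch. It assumes a workflow is modeled as a set of tasks whose dependencies are inferred from the data products they consume and produce. The `Task` class, the `execution_order` helper, and the file names are hypothetical illustrations, not the interface of any particular workflow system.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Task:
    """One workflow step: what it consumes, what it produces, what it needs."""
    name: str
    inputs: list[str] = field(default_factory=list)    # data products consumed
    outputs: list[str] = field(default_factory=list)   # data products generated
    requirements: dict[str, int] = field(default_factory=dict)  # e.g. {"cpus": 4}


def execution_order(tasks: list[Task]) -> list[str]:
    """Infer dependencies from data products and return a valid run order."""
    # Map each data product to the task that produces it.
    producer = {out: t.name for t in tasks for out in t.outputs}
    # A task depends on the producers of its inputs; inputs with no
    # producer (such as raw.dat) are treated as externally supplied.
    graph = {
        t.name: {producer[i] for i in t.inputs if i in producer}
        for t in tasks
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical three-step analysis, deliberately listed out of order.
tasks = [
    Task("plot", inputs=["stats.csv"], outputs=["figure.png"]),
    Task("analyze", inputs=["clean.dat"], outputs=["stats.csv"],
         requirements={"cpus": 4}),
    Task("clean", inputs=["raw.dat"], outputs=["clean.dat"]),
]
order = execution_order(tasks)
print(order)  # ['clean', 'analyze', 'plot']
```

Inferring dependencies from data products, rather than listing them by hand, is one possible design choice; it keeps each task description local to the task itself.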
In the last few years, research focused on the creation and execution of computational workflows has resulted in great gains in productivity, feasibility, and scalability of quite complex scientific analyses. Existing workflow systems have been demonstrated in a variety of scientific applications where workflow creation draws from catalogs of hundreds of distributed software components and data sources, where the generation of workflows of thousands of interrelated computing processes is automated, and where the execution of workflows takes place on high-end computing resources and often spans several months.
Some workflow systems have been deployed for routine use in scientific collaboratories in many scientific disciplines including astronomy, earthquake science, physics, and biology (e.g., National Virtual Observatory, the Southern California Earthquake Center, and the Laser Interferometer Gravitational-wave Observatory).
An Introductory Book
- "Workflows for e-Science: Scientific Workflows for Grids", Taylor, I. J., Deelman, E., Gannon, D. B., and Shields, M. (Eds.). Springer-Verlag, 2007.
Overview Articles About Workflows
- "Examining the Challenges of Scientific Workflows", Yolanda Gil, Ewa Deelman, Mark Ellisman, Thomas Fahringer, Geoffrey Fox, Dennis Gannon, Carole Goble, Miron Livny, Luc Moreau, and Jim Myers. IEEE Computer, vol. 40, no. 12, pp. 24-32, December 2007.
We have designed tutorial materials that have been presented at a variety of conferences. One tutorial focuses on workflow management and execution. Another focuses on the design of workflows and how they capture interesting data-rich or computationally intensive experiments.
Tutorial on Workflow Management and Distributed Execution
This tutorial examines the opportunities and challenges of designing and executing scientific workflows in distributed environments. It introduces scientific workflows, their usefulness in data analysis, and the challenges of running them in a variety of execution environments. It outlines the issues that any workflow system must address to run scientific workflows on the grid. It also covers workflow composition: how to design workflow components that are portable across many platforms and how to define workflows at an appropriate level of abstraction.
Tutorial on Computational Workflows for Large-Scale AI Research
This tutorial is designed to: 1) introduce computational workflows to AI researchers as a powerful paradigm they can use to manage large-scale experimentation, and 2) present interesting research problems for AI that arise in developing workflow systems. The tutorial begins with an introduction to the stages of workflow design and to the capabilities of current workflow systems, which are used in a variety of scientific domains to specify and manage thousands of distributed computations. The introduction includes examples of computational workflows in several science applications that specify the analysis steps to be executed and the data flow among them. It also describes how to design these steps as encapsulated software components so that the workflow system can automatically manage execution and tailor it to available computing resources. The second part of the tutorial presents recent work on applying computational workflows to artificial intelligence as a scientific domain, in particular for large-scale integrative machine learning and for natural language processing. The last part of the tutorial introduces a variety of artificial intelligence techniques that are relevant to current challenges in computational workflows, including dynamic self-configuration, interactive steering, continuous and robust operations, and performance optimization.
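As an illustration of what designing analysis steps as encapsulated software components might look like, the following sketch wraps each step behind a declared interface of named inputs and a named output, so that a simple runner can decide when each step is ready without knowing what the step does internally. The `Component` class, the `run_workflow` function, and the toy text-processing steps are assumptions made for this example; they are not taken from the tutorial or from any specific workflow system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Component:
    """A workflow step hidden behind a declared interface (hypothetical)."""
    name: str
    inputs: tuple[str, ...]      # names of the data products it consumes
    output: str                  # name of the data product it generates
    run: Callable[..., object]   # the encapsulated computation


def run_workflow(components, initial_data):
    """Run each component once all of its declared inputs are available."""
    data = dict(initial_data)
    pending = list(components)
    while pending:
        ready = [c for c in pending if all(i in data for i in c.inputs)]
        if not ready:
            stuck = [c.name for c in pending]
            raise ValueError(f"cannot satisfy inputs for: {stuck}")
        for c in ready:
            # The runner only moves named data products between steps.
            data[c.output] = c.run(*(data[i] for i in c.inputs))
            pending.remove(c)
    return data


# Toy two-step text analysis, listed out of order on purpose.
components = [
    Component("count", ("tokens",), "n_tokens", len),
    Component("tokenize", ("text",), "tokens", str.split),
]
result = run_workflow(
    components, {"text": "workflows manage distributed computations"}
)
print(result["n_tokens"])  # 4
```

Because the runner sees only declared inputs and outputs, a real system could in principle replace the loop body with submission to remote resources; that extension is not shown here.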