Workflow Design: Designing Scientific Applications One Workflow at a Time
Much of science today relies on software to make new discoveries. This software embodies scientific analyses that are frequently composed of several application components and created collaboratively by different researchers. Computational workflows have recently emerged as a paradigm for managing these large-scale and large-scope scientific analyses. A workflow represents a computation, which is often executed in geographically distributed settings, along with its interdependencies, its requirements, and its data products. In our view, workflow design is at the core of the scientific discovery process, and workflows must be treated as scientific products in their own right.
The focus of this research was to develop the foundations for a science of designing scientific processes, as embodied in a new kind of software artifact: the computational workflow. The work integrates best practices and lessons learned from existing workflow applications and extends them to define and formalize design principles for computational workflows. This research represents a fundamentally new approach to workflow design, one that aims to significantly improve scientific software design methodology by formalizing these design principles and by familiarizing the scientific community with effective workflow design processes.
Recent Results and Project Reports
We investigated several major research areas:
- Capabilities of current workflow systems and benefits of using semantics for workflow descriptions
- Repository of workflow exemplars that capture commonly occurring workflow structures
- Comparison of workflows with other programming paradigms
- Desirable properties of workflows
- Discovery of workflows
- Analysis of scientific descriptions of workflows
- Use of data collections in parallel constructs of workflows
- Execution of workflows on cloud environments
- Workflows across organizations
Learning About Workflows
We developed introductory materials and tutorials on workflows:
- "A Framework for Efficient Text Analytics through Automatic Configuration and Customization of Scientific Workflows". Matheus Hauder, Yolanda Gil, and Yan Liu. Proceedings of the Seventh IEEE International Conference on e-Science, Stockholm, Sweden, December 5-8, 2011. Available as a preprint.
- "A New Approach for Publishing Workflows: Abstractions, Standards, and Linked Data". Daniel Garijo and Yolanda Gil. Proceedings of the Sixth Workshop on Workflows in Support of Large-Scale Science, held in conjunction with SC11, Seattle, WA, November 12-18, 2011. Available as a preprint.
- "A Semantic Framework for Automatic Generation of Computational Workflows Using Distributed Data and Component Catalogs". Yolanda Gil, Pedro Antonio Gonzalez-Calero, Jihie Kim, Joshua Moody, and Varun Ratnakar. To appear in the Journal of Experimental and Theoretical Artificial Intelligence, 2012. Available as a preprint.
- "Wings: Intelligent Workflow-Based Design of Computational Experiments". Yolanda Gil, Varun Ratnakar, Jihie Kim, Pedro Antonio Gonzalez-Calero, Paul Groth, Joshua Moody, and Ewa Deelman. IEEE Intelligent Systems, 26(1), 2011. Available as a preprint.
- "Assisting Scientists with Complex Data Analysis Tasks through Semantic Workflows". Yolanda Gil, Varun Ratnakar and Christian Fritz. Proceedings of the AAAI Fall Symposium on Proactive Assistant Agents, Arlington, VA, November 2010. Available as a preprint.
- "Reasoning about the Appropriate Use of Private Data through Computational Workflows". Yolanda Gil and Christian Fritz. AAAI Spring Symposium on Privacy Management, Stanford, CA, March 23-25, 2010. Available as a preprint.
- "Principles for Interactive Acquisition and Validation of Workflows". Yolanda Gil, Jihie Kim, and Marc Spraragen. Journal of Experimental and Theoretical Artificial Intelligence, 22(2), 2010. Available as a preprint.
- "Workflows and e-Science: An Overview of Workflow System Features and Capabilities". Ewa Deelman, Dennis Gannon, Matthew Shields, and Ian Taylor. Future Generation Computer Systems, Volume 25, 2009. Available from the publisher.
- "Analyzing the Gap Between Workflows and their Natural Language Descriptions". Paul Groth and Yolanda Gil. Proceedings of the IEEE Third International Workshop on Scientific Workflows (SWF), Los Angeles, CA, July 10, 2009. Available as a preprint.
- "Workflow Matching Using Semantic Metadata". Yolanda Gil, Jihie Kim, Gonzalo Florez, Varun Ratnakar, and Pedro A. Gonzalez-Calero. Proceedings of the Fifth International Conference on Knowledge Capture (K-CAP), Redondo Beach, CA, September 1-4, 2009. Available as a preprint.
- "Expressive Reusable Workflow Templates". Yolanda Gil, Paul Groth, Varun Ratnakar, and Christian Fritz. Proceedings of the Fifth IEEE International Conference on e-Science, Oxford, UK, December 9-11, 2009. Available as a preprint.
- "From Data to Knowledge to Discoveries: Scientific Workflows and Artificial Intelligence". Yolanda Gil. Scientific Programming, Volume 17, Number 3, 2009. Available as a preprint and from the publisher.
- "Scientific Software as Workflows: From Discovery to Distribution". David Woollard, Nenad Medvidovic, Yolanda Gil, and Chris Mattmann. IEEE Software, Special Issue on Developing Scientific Software, July/August 2008. Available from the publisher.
- "Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques". Mary Hall, Yolanda Gil, and Robert Lucas. Proceedings of the IEEE, Special Issue on Cutting-Edge Computing: Using New Commodity Architectures, Volume 96, Issue 5, May 2008. Available from the publisher and as a preprint.
- "Characterization of Scientific Workflows". Shishir Bharathi, Ann Chervenak, Ewa Deelman, Gaurang Mehta, Mei-Hui Su, and Karan Vahi. 3rd Workshop on Workflows in Support of Large-Scale Science (WORKS08), Austin, TX, November 2008. Available as a preprint.
- "On the Use of Cloud Computing for Scientific Workflows". Christina Hoffa, Gaurang Mehta, Timothy Freeman, Ewa Deelman, Kate Keahey, Bruce Berriman, and John Good. 3rd International Workshop on Scientific Workflows and Business Workflow Standards in e-Science (SWBES), held in conjunction with the Fourth IEEE International Conference on e-Science (e-Science 2008), Indianapolis, IN, December 10, 2008. Available as a preprint.
- "Resource Provisioning Options for Large-Scale Scientific Workflows". Gideon Juve and Ewa Deelman. 3rd International Workshop on Scientific Workflows and Business Workflow Standards in e-Science (SWBES), held in conjunction with the Fourth IEEE International Conference on e-Science (e-Science 2008), Indianapolis, IN, December 10, 2008. Available as a preprint.
Points of Contact
- Yan Liu, University of Southern California
- David Woollard, Jet Propulsion Laboratory
- Chris Mattmann, Jet Propulsion Laboratory
- Nenad Medvidovic, University of Southern California
- Mary Hall, University of Utah
- Joel Saltz, Emory University
- Tahsin Kurc, Emory University
- Pedro Gonzalez, Universidad Complutense de Madrid (Spain)
- Michael Reich, MIT/Harvard Broad Institute
- Ann Chervenak, University of Southern California
- Jihie Kim, University of Southern California
- Daniel Garijo (PhD student), Universidad Politecnica de Madrid
- Matheus Hauder (Master's student), University of Augsburg
- Sara Alspaugh (Undergraduate student), University of Southern California
- Gonzalo Florez (PhD student), Universidad Complutense de Madrid
- Christian Fritz (Postdoctoral researcher), University of Southern California
- Paul Groth (Postdoctoral researcher), University of Southern California
This work was supported by the National Science Foundation under grant number CCF-0725332, "Designing Scientific Software One Workflow at a Time," from October 2007 to September 2011.