Karma Provenance Collection Tool
Overview
Provenance (also called lineage or trace) of digital scientific data is critical to broadening the sharing and reuse of that data. Provenance captures the information needed to attribute ownership and to determine, among other things, the quality of a particular data set. Provenance collection is often tightly coupled to a cyberinfrastructure system, but it is better served by a standalone tool. Karma is such a tool: it can be added to existing cyberinfrastructure to collect and represent provenance data. Its modular architecture supports multiple instrumentation plugins, which makes it usable in different architectural settings.
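To make the idea of collecting and representing provenance concrete, here is a minimal sketch of a provenance graph and a lineage query. It is not Karma's API or data model: the class `ProvenanceGraph`, its methods, and the file and process names are all hypothetical, and the two edge kinds (`used`, `was_generated_by`) are borrowed loosely from the Open Provenance Model cited under Publications.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy provenance store (hypothetical, not Karma's model).

    Nodes are artifact or process identifiers; each edge points from a
    node back to something it directly depends on.
    """

    def __init__(self):
        # node id -> set of node ids it directly depends on
        self._causes = defaultdict(set)

    def used(self, process, artifact):
        """Record that a process consumed an artifact."""
        self._causes[process].add(artifact)

    def was_generated_by(self, artifact, process):
        """Record that an artifact was produced by a process."""
        self._causes[artifact].add(process)

    def lineage(self, node):
        """Return every node the given node transitively depends on."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for cause in self._causes.get(current, ()):
                if cause not in seen:
                    seen.add(cause)
                    stack.append(cause)
        return seen


# Hypothetical two-step workflow: raw.dat -> calibrate -> clean.dat -> plot -> figure.png
graph = ProvenanceGraph()
graph.used("calibrate", "raw.dat")
graph.was_generated_by("clean.dat", "calibrate")
graph.used("plot", "clean.dat")
graph.was_generated_by("figure.png", "plot")

print(sorted(graph.lineage("figure.png")))
# prints ['calibrate', 'clean.dat', 'plot', 'raw.dat']
```

Walking the edges backwards from a data product recovers everything that contributed to it, which is the kind of information needed to attribute ownership or judge the quality of a data set.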
Visualization of provenance data is more useful when it supports manipulating very large structures, displaying different views, and interactivity. These capabilities help a user navigate experiment information with a mental map of what is going on in the experiment, compare different experiment runs quantitatively, and perform model selection through effective collaboration between the user and the discovery system. We developed two plugins for Cytoscape to aid the visual representation and navigation of provenance information.
The Karma Provenance Tool is licensed under the Apache License, Version 2.0 (the "License") (http://www.apache.org/licenses/LICENSE-2.0). The copyright for the code is owned by The Trustees of Indiana University. Karma is a product of the Data to Insight Center of the Pervasive Technology Institute (http://pti.iu.edu) at Indiana University. See Digital Data Provenance for more information.
Features of Latest Release (v3.2.1)
- Improved query performance through caching of provenance graphs.
- Implementation of several query API calls.
- Multiple bug fixes.
Contact Us
- We'd love to hear from you! Please feel free to contact us by submitting this form, or subscribe to the mailing list.
- Check out the latest code from the karmatool project on SourceForge.
- Karma Service Core
Requires at least JDK 1.5, MySQL 5.1, and XMLBeans 2.3.0. Requires Apache Ant v1.6 to build. If built as a web service, Karma requires Apache Axis2 v1.4/v1.5 and Apache Tomcat Server v5.5x/v6. If built as a standalone server, Karma requires a RabbitMQ server; Erlang is required to build RabbitMQ.
- Karma Client Core (for RabbitMQ Karma Service Core configuration)
Requires JDK 1.5. Requires Apache Ant v1.6 to build.
- Karma Client Core (for Axis2 Karma Service Core configuration)
- Visualization Plugin; Software dependency: Cytoscape v2.8.1
Documentation
Previous releases are here
Publications
- Bin Cao, Beth Plale, Girish Subramanian, Ed Robertson, Yogesh Simmhan, Provenance Information Model of Karma Version 3, IEEE 2009 Third International Workshop on Scientific Workflows (SWF'09), July 2009.
- Bin Cao, Girish Subramanian, Beth Plale, Poster: Provenance Collection in a Industry Biochemical Discovery Cyberinfrastructure, IEEE e-Science, Indianapolis, IN, December 2008.
- The Open Provenance Model (v1.01). Moreau, L. (Editor), B. Plale, S. Miles, C. Goble, P. Missier, R. Barga, Y. Simmhan, J. Futrelle, R. McGrath, J. Myers, P. Paulson, S. Bowers, B. Ludaescher, N. Kwasnikowska, J. Van den Bussche, T. Ellkvist, J. Freire, P. Groth, Technical Report, Electronics and Computer Science, University of Southampton, 2008. http://eprints.ecs.soton.ac.uk/16148
- Yogesh L. Simmhan, Beth Plale, Dennis Gannon, Query Capabilities of the Karma Provenance Framework, Concurrency and Computation: Practice and Experience, Vol 20, Issue 5, pp. 441-451, John Wiley and Sons, 2008.
- Yogesh Simmhan, Beth Plale, and Dennis Gannon, Karma2: Provenance Management for Data Driven Workflows, Extended and invited from ICWS 2006. International Journal of Web Services Research, IGI Publishing, Vol 5, No 2, 2008.
- Yogesh Simmhan, Beth Plale, Dennis Gannon, Towards a Quality Model for Effective Data Selection in Collaboratories, IEEE Workshop on Workflow and Data Flow for Scientific Applications (SciFlow06), held in conjunction with ICDE, Atlanta, GA, April 2006. [Slides]
- Yogesh Simmhan, Beth Plale, Dennis Gannon, A Performance Evaluation of the Karma Provenance Framework for Scientific Workflows, International Provenance and Annotation Workshop (IPAW'06), Lecture Notes in Computer Science 4145, L. Moreau and I. Foster (Eds), Springer-Verlag, Berlin Heidelberg, pp. 222-236, 2006. [Slides]
- Yogesh Simmhan, Beth Plale, and Dennis Gannon, A Framework for Collecting Provenance in Data-Centric Scientific Workflows, Proceedings of the IEEE International Conference on Web Services pp. 427-436, 2006.
- Yogesh L. Simmhan, Beth Plale, and Dennis Gannon, A Survey of Data Provenance in e-Science, ACM SIGMOD Record, Vol. 34, No. 3, September 2005.
- Yogesh L. Simmhan, Beth Plale, and Dennis Gannon, A Survey of Data Provenance Techniques, Technical Report TR-618, Computer Science Department, Indiana University, Bloomington, 2005.
Contact
- Beth Plale [plale at indiana dot edu]
- Yiming Sun [yimsun at indiana dot edu]
Project Contributors
Current:
- Beth Plale, Project Director
- Mehmet Aktas, Associated Faculty
- Scott Jensen, Research Associate
- Yiming Sun, Senior Software Developer
- You-Wei Cheah
- Peng Chen
- Devarshi Ghoshal
- Yuan Luo
Historical:
- Mehmet Aktas, Research Associate
- Bin Cao
- Dennis Gannon
- Prajakta Purohit
- Ed Robertson
- Yogesh Simmhan
- Girish Subramanian