International Lattice Data Grid
Project Leader: Paul Coddington, SAPAC
Description
The International Lattice Data Grid (ILDG) is an international collaboration that aims to allow high-energy physicists to publish, locate and access hundreds of terabytes of data generated by computationally intensive lattice Quantum Chromodynamics (QCD) simulations and stored in distributed data repositories around the world.
This project aims to implement an Australian node of the ILDG, to serve lattice QCD data generated by the Centre for the Subatomic Structure of Matter (CSSM). We will work with the ILDG collaboration to develop standards for data formats, metadata (defined using XML Schema) and middleware interfaces (defined using web services).
The project will develop programs to convert CSSM data files into the standard data format, and to generate standard XML metadata describing the data (this is a challenge, since existing CSSM data has very little associated metadata). We will also implement the ILDG middleware standards, which define web service interfaces for two catalogs: a metadata catalog, which maps metadata queries to logical file names, and a replica catalog, which maps a logical file name to one or more physical file names (more than one if the file is replicated). ILDG nodes must also provide a mechanism for data download, preferably using Storage Resource Manager (SRM). Finally, we will develop a web portal that allows researchers to search and access CSSM data, and extend it to provide distributed queries across all ILDG nodes. The project will therefore provide all of the infrastructure required for an application-specific distributed data repository.
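The two-step lookup described above (metadata query to logical file names, then logical file name to physical file names) can be sketched as follows. This is only a minimal in-memory illustration, not the ILDG web service interfaces: the class names, the flat key/value metadata, the `lfn://` names and the `example.org` hosts are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCatalog:
    """Maps metadata queries to logical file names (LFNs)."""
    # LFN -> metadata record. A flat dict stands in for the XML metadata.
    records: dict = field(default_factory=dict)

    def query(self, **criteria):
        """Return the LFNs whose metadata matches every criterion."""
        return sorted(
            lfn for lfn, meta in self.records.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )


@dataclass
class ReplicaCatalog:
    """Maps one LFN to one or more physical file names (PFNs)."""
    replicas: dict = field(default_factory=dict)

    def lookup(self, lfn):
        """Return every known physical location of the given LFN."""
        return list(self.replicas.get(lfn, []))


def locate(mdc, rc, **criteria):
    """Resolve a metadata query to physical locations via both catalogs."""
    return {lfn: rc.lookup(lfn) for lfn in mdc.query(**criteria)}


# Hypothetical catalog contents, for illustration only.
mdc = MetadataCatalog({
    "lfn://cssm/cfg_0001": {"action": "wilson", "beta": "6.0"},
    "lfn://cssm/cfg_0002": {"action": "wilson", "beta": "6.2"},
})
rc = ReplicaCatalog({
    "lfn://cssm/cfg_0001": [
        "gsiftp://a.example.org/data/cfg_0001",
        "gsiftp://b.example.org/data/cfg_0001",
    ],
    "lfn://cssm/cfg_0002": ["gsiftp://a.example.org/data/cfg_0002"],
})

result = locate(mdc, rc, action="wilson", beta="6.0")
print(result)
```

Keeping the two catalogs separate means a file can be replicated or moved by updating only the replica catalog, while the metadata records continue to refer to the same logical name.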
Achievements
| 2004 |
- We have been tracking developments in metadata standards and proposed interfaces for the International Lattice Data Grid (ILDG). We have also begun investigating possible mechanisms for interfacing to the data repository, including proposals from different ILDG members as well as some of our own ideas based on grid tools such as OGSA-DAI.
- The ILDG completed standardization of its metadata schema (QCDml 1.0) in mid-June, a few months later than expected. We are therefore a little behind schedule in developing a program that generates metadata in this standard schema from CSSM data. However, we now have a working program that generates most of the QCDml metadata from the lattice data for one particular QCD simulation program, and we are currently extending it to cover the other programs used by CSSM and the remaining metadata fields.
- The metadata generation program has been extended to cover all the main programs used by CSSM. It still requires a few modifications to handle some fields. In particular, the lattice QCD action types supported by the standard QCDml schema do not fully cover some of the novel actions used by CSSM, so this will require negotiating a change to the QCDml schema.
- Implementations of the metadata catalog and replica catalog are underway and should be completed on schedule in 1H2005. We plan to leverage existing software such as OGSA-DAI, the Replica Location Service (RLS) in Globus, and the Metadata Catalog Service (MCS) developed by Kesselman’s group at the University of Southern California. However, these are grid services rather than web services, the MCS does not support XML queries, and OGSA-DAI has limited support for XML databases.
|
| 1H2005 |
- Made available a prototype Australian ILDG node for CSSM data at SAPAC, providing the metadata catalog web service defined by ILDG and a web portal to query CSSM data (http://www.sapac.edu.au/ildg/). Details were presented at the 6th ILDG meeting. Data download is still not available.
- The software for generating metadata conforming to the QCDml XML Schema has been completed, and metadata has been generated for most of the CSSM data. There are still some issues with QCDml that need to be addressed – currently it does not support some of the lattice QCD models used by CSSM.
- Investigated possible solutions for deploying a metadata catalog. Because of the limitations of existing generic metadata catalog programs (mentioned above), we developed our own metadata catalog. It interfaces to an Xindice XML database, which stores the XML metadata and is queried using XPath. The catalog is accessible as a web service conforming to the current ILDG draft specification.
- Completed a web service wrapper to the Globus GT3 Replica Location Service in order to implement the replica service interface specified by ILDG (RLS for GT4 was not yet available when the work was done).
- Developed a web portal using JSP to enable querying of the CSSM data through the Australian ILDG node.
|
| 2H2005 |
- Provided input to a proposal for updating the QCDml metadata schema to support all CSSM data. ILDG is taking some time to decide upon updates to QCDml, which affect the schema for the CSSM metadata.
- Almost completed a program to convert CSSM data files to the new ILDG standard data format. Some of the required information must be obtained from QCDml metadata fields that have not yet been standardized, so completing the conversion is awaiting that decision.
|
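As an illustration of the XPath-based metadata catalog described under 1H2005, the sketch below matches XML metadata documents against an XPath expression and returns the corresponding logical file names. It rests on several assumptions: Python's standard `xml.etree.ElementTree` (which supports only a subset of XPath) stands in for the Xindice XML database, and the element names and `lfn://` identifiers are hypothetical and heavily simplified, not the real QCDml schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata documents keyed by logical file name.
# These are NOT valid QCDml; they only mimic the idea of XML metadata.
DOCS = {
    "lfn://cssm/ens_a": "<ensemble><action><name>wilson</name><beta>6.0</beta></action></ensemble>",
    "lfn://cssm/ens_b": "<ensemble><action><name>wilson</name><beta>6.2</beta></action></ensemble>",
    "lfn://cssm/ens_c": "<ensemble><action><name>clover</name><beta>6.0</beta></action></ensemble>",
}


def query(docs, xpath):
    """Return the LFNs whose metadata document contains a node matching xpath."""
    return sorted(
        lfn for lfn, xml_text in docs.items()
        if ET.fromstring(xml_text).find(xpath) is not None
    )


# All ensembles generated with a (hypothetical) "wilson" action.
wilson = query(DOCS, "./action[name='wilson']")

# Chained predicates narrow the match to one coupling value.
wilson_62 = query(DOCS, "./action[name='wilson'][beta='6.2']")

print(wilson)
print(wilson_62)
```

A real catalog would hand the XPath expression to the XML database rather than parse each document per query; the per-document loop here only keeps the example self-contained.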
Plan for 2006
By the end of 2006 we aim to have developed a full implementation of an ILDG node, populated with all the CSSM data that is to be shared with the international community. We will also have completed a web portal that provides an interface for querying and downloading CSSM data, as well as data from all other ILDG sites that implement the standard interfaces. This goal requires the completion of the following milestones.
Milestones for 2006
| 1H2006 |
- Develop proposal for updating QCDml metadata schema to support all CSSM data.
- Convert CSSM metadata to new version of QCDml.
- Convert CSSM data files to new ILDG standard format.
- Enable download of data files using FTP.
- Modify our codes to conform to revised standard web service interfaces.
- Update replica catalog implementation.
|
| 2H2006 |
- Enable download of CSSM data using Storage Resource Manager (SRM).
- Add capability for distributed data query across multiple ILDG nodes to our web portal.
|
Participating Organisations
- SAPAC
- University of Adelaide
Resources for 2006
- Total resources available to the project for 2006 are 0.60 efts (SAPAC: 0.60).
- APAC is providing funds to support 0.375 efts (SAPAC: 0.375).