Digital Libraries and Grid Computing
Project Leader: Glenn Hyland, University of Tasmania
Description
Earth systems science, with its heritage of global collaboration and long tradition of sharing data collected on ground, air and sea events, is well suited to the adoption of grid systems. The project is enabling Australian data sets related to oceans, atmospheres, Antarctica and the climate to be made transparently available to the Australian and international research communities using grid protocols and standards.
Additionally, the project will create compute grid applications that will improve access to, and the workflow within, these applications for the Australian research community, users of the APAC National Facility and APAC partner facilities. This will open the way for scientists to focus on the science, and undertake new experiments based on ensemble models, optimisation, and studies of future, current and past climates. This work meets the national priority for climate change and climate variability and smart information use.
Achievements
| 2004 |
- OpenDAP server established at National Facility. This server is now serving the TPAC 1/8 degree model results to its community of users (now numbering ~7 people). There are two numerical simulations and a third simulation to be undertaken in third and fourth quarters of 2005 (~2 TBytes).
- Ocean Colour OpenDAP server. The software for processing ocean colour data for serving to the OpenDAP server has been developed and implemented. The OpenDAP server is being established on new hardware at TPAC with a 6 Terabyte RAID array and will be fully installed in February 2005. This represents significant additional investment for CSIRO Marine Research.
- Trials have also been established for the use of Live Access Servers serving sea-surface temperature (www.marine.csiro.au/las/servlets/dataset). Live Access Servers allow data discovery through the visualisation of data across the web. This is a phase 3 milestone activity for 2005.
- Development of the Ocean Data Portal in Gridsphere. Work has commenced on the development of an Ocean Data Portal written in Gridsphere and is due for completion by the end of 2Q2005. The portal will work with the OpenDAP Earth Systems Science network that has been developed as part of this project. During the reporting period, the user interface was developed, along with search facilities within a pre-existing catalogue. The crawler to create catalogues of the entire OpenDAP holdings is being developed in 1Q2005.
- Intergovernmental Panel on Climate Change (IPCC) data. CSIRO Atmospheric Research has delivered the Australian IPCC scenarios to the IPCC data center in the USA, and will make all of this model output available to the Australian community from CSIRO HPCCC. This milestone will be completed in February 2005. Results from other nations’ model runs for the IPCC scenarios will also be held at the National Facility. These data are also important and central to the IPCC Fourth Assessment Report, due for publication in 2007.
|
| 1H2005 |
- An Oceans and Climate portal has been developed in Gridsphere to operate over the OpenDAP network. It supports search by metadata and file characteristics, displays data sets in a common navigable format, and crawls known OpenDAP installations to automatically find new datasets.
- The Earth Systems Science OpenDAP network is now complete. There are servers at 6 locations across Australia – CSIRO HPSC and BMRC in Melbourne, TPAC and CSIRO MR in Hobart, the National Facility in Canberra, and the ac3 facility in Sydney. These servers provide access to 11 different ocean and climate related datasets – a total of 14TBytes of data in over 2 million files (this will expand to over 50TBytes by the end of 2006).
- Now serving IPCC full model output results via OpenDAP service at the National Facility.
- Now serving IPCC CSIRO Mk3 results via OpenDAP service at CSIRO HPSC.
- CSIRO AR has a functioning Live Access Server (LAS) installed. Connecting the ocean colour product when it is more robust will be a trivial exercise. Installation of a LAS was delayed at the National Facility because authentication of IPCC data holdings was not completed within this reporting period.
- Digital Library Repositories: CSIRO Ocean Colour Data at CSIRO MR completed; CSIRO IPCC scenarios at CSIRO HPSC completed.
- Installed harvest methods for creating catalogues. Stage 1 development of the Digital Library Portal is done, and stage 2 is underway.
|
| 2H2005 |
- Completed development of Oceans and Climate Data Portal to operate over the OpenDAP network. It supports search by metadata and file characteristics, displays data sets in a common navigable format, and crawls known OpenDAP installations to automatically find new datasets.
- Ported CSIRO Mk3L climate model to run on the National Grid.
- Completed development of CSIRO Mk3L Portal.
- A specifications document for a web-services based computational toolkit has been developed. One module has been developed by Ian Smith, an APAC Intern at TPAC.
- CSIRO AR has a functioning LAS installed. A LAS has been installed at the APAC NF, but has yet to be fully configured. Delays have occurred because the next generation LAS (Java based) is about to be released, and this newest version will be more suitable for incorporation into the National Grid infrastructure.
- BMRC is waiting for additional features to be added to the OpenDAP server software (authentication, user rights, and resource management) before making these data sets available to external users.
- Several new datasets have been identified as suitable candidates for inclusion in TPAC’s digital data repository.
- A number of ESS models have been chosen as suitable candidates for porting to the APAC National Grid: CSIRO Mk3L, UVIC ESCM 2.7, CLIMBER3, NCAR CCSM 3.0. All models have been ported to at least one of the partner facilities. Portals have been, or are in the process of being, developed for several of these models.
|
| 1H2006 |
- Digital Library Repositories: added Argo float and MSLA datasets (CSIRO MAR), and ASPeCT sea-ice dataset (AAD)
- BMRC has contracted the OpenDAP developers to include additional features (authentication, user rights, and resource management) into the next version of the OpenDAP server software. They require these features before making BOM data sets available to external users.
- Digital Library Tools: Insight4 has completed development of a Gridsphere-based WMS client. The client allows users to connect to and browse any OGC-compliant web-service.
- CSIRO AR has a functioning LAS installed. LASs have been installed at the APAC NF and CSIRO HPSC, but have yet to be fully configured to serve up the IPCC dataset. Delays have occurred because the next generation LAS (Java based) is about to be released, and this newest version will be more suitable for incorporation into the National Grid infrastructure.
- The CSIRO Mk3L climate model is now running in both serial and parallel modes on several National Grid facilities (APAC, iVEC, TPAC) – previously there was a problem with parallel execution on Linux platforms (this has been resolved).
- The UVIC ESCM 2.7 climate model is running on the AC3 facility (Altix and Beowulf clusters), and is currently being ported to the APAC NF. It is being used extensively by Willem Sijp from UNSW maths.
|
Plan and Milestones for 2006
The TPAC OpenDAP network is now complete. Many of the datasets continue to expand, and several new datasets will be brought online throughout 2006. In particular, once BMRC has completed adding security features to the OpenDAP server software, many of its operational datasets will be made available to the wider research community.
Milestones for Digital Library Repositories
| December |
- Sea-surface & sub-surface temperature analysis products (BMRC)
- ACE CRC sea-ice simulations, 1980-2000 (ACE CRC)
|
To increase the utility of the digital repositories, we plan to support and develop server-side analysis and visualisation services. This will allow users to display the contents of the digital repositories, and allow very large datasets to be manipulated on the server before the results, which are normally much smaller, are sent across the web to client programs. For example, a user could average many satellite images or calculate the global temperature from a very long model simulation on the server hosting the data, rather than downloading large amounts of data over the network to perform the analysis on their client machine.
These server-side services will be based on standard open-source tools such as the Live Access Server (LAS), and GrADS Data Server (GDS). Additionally, we plan to include web-services (OGC compliant) portals to allow the OpenGIS (or XML) based communities to access these services using standard web clients.
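The satellite-image averaging example above can be illustrated with a minimal Python sketch. It is not project code and does not use the LAS or GDS interfaces; the function names `time_mean` and `transfer_bytes`, the tiny 2 x 2 grids, and the assumption of 4 bytes per value are all hypothetical, chosen only to show why sending the averaged result is cheaper than sending every input image.

```python
# Hypothetical illustration only: names, grid sizes and byte counts are assumptions.

def time_mean(fields):
    """Average a sequence of equally shaped 2-D grids, cell by cell."""
    count = len(fields)
    rows, cols = len(fields[0]), len(fields[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    for field in fields:
        for i in range(rows):
            for j in range(cols):
                total[i][j] += field[i][j]
    return [[value / count for value in row] for row in total]


def transfer_bytes(n_fields, rows, cols, bytes_per_value=4):
    """Bytes sent over the network for n_fields grids of rows x cols values."""
    return n_fields * rows * cols * bytes_per_value


# Three tiny 2 x 2 "images" standing in for a long satellite time series.
images = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 4.0], [5.0, 6.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
mean_image = time_mean(images)

# Client-side analysis ships every image; server-side analysis ships only the mean.
client_side = transfer_bytes(len(images), 2, 2)
server_side = transfer_bytes(1, 2, 2)
```

In this sketch the saving grows linearly with the number of images averaged, since the server returns a single grid regardless of how many inputs it reads.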
Milestones for Digital Library Tools
| Complete |
- Development of web-services specification documents
|
| October |
- Development of WMS-LAS wrapper (Insight4)
|
| December |
- Development of WCS-OpenDAP wrapper
|
Selected earth system models will be ported to all high performance computing platforms that will be participating in the National Grid. This single step will allow many new users to use these models immediately on the APAC and partner facilities. The selected models will then be grid-enabled so that they can be executed on the Grid, and will become complete portals in their own right.
Much of the scientific analysis of datasets produced in large-scale earth system simulations is routine, in principle. Standard discrete operations such as averaging of fields, flux calculations, etc., are repeated time and again – over different data, different geographic or temporal regions, and in different composite workflows. We plan to develop an earth systems science toolkit implementing some of these key operations. By grid-enabling the toolkit, users will be able to execute such routine workflows efficiently on the Grid.
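As one example of the kind of routine operation such a toolkit might contain, the following Python sketch computes an area-weighted mean of a field on a regular latitude-longitude grid, weighting each row by the cosine of its latitude. This is an assumption about what "averaging of fields" could look like in practice, not a description of the project's toolkit; the function name `area_weighted_mean` and its arguments are hypothetical.

```python
import math


def area_weighted_mean(field, lats_deg):
    """Mean of a lat-lon field, weighting each row by cos(latitude).

    field    -- list of rows, one row of values per latitude
    lats_deg -- latitude of each row, in degrees
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(field, lats_deg):
        weight = math.cos(math.radians(lat))  # grid cells shrink towards the poles
        for value in row:
            weighted_sum += weight * value
            weight_total += weight
    return weighted_sum / weight_total


# Two latitude rows: the row at 60 degrees counts half as much as the equatorial row.
example = area_weighted_mean([[1.0], [4.0]], [0.0, 60.0])
```

The same function could be applied to a different dataset, region or time slice without change, which is the sense in which such operations are repeated across workflows.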
Milestones for Compute Grid
| December |
- Implementation of an ESS analysis work-flow portal
|
| Ongoing |
- Porting of selected ESS models to APAC partner facilities
- Development of model portals
|
By the end of 2006 we will have the various portals up and running on the National Grid - data discovery, visualisation, analysis toolkit. We then plan to integrate these individual portals to form a complete earth systems science workflow portal. This will enable researchers to discover datasets of interest, to visualise the contents of those datasets, and to do simple analysis with the data - all within the one web-based portal environment.
Participating Organisations
- University of New South Wales
- University of Sydney
- Australian National University
- University of Tasmania
- Australian Antarctic Division
- Bureau of Meteorology Research Centre
- CSIRO Marine and Atmospheric Research
- Antarctic Climate and Ecosystems (ACE) CRC
Resources for 2006
- Total 2006 resources are 5.10 efts (ac3: 1.00, ANU: 0.10, CSIRO: 1.00, TPAC: 3.00)
- APAC is providing funds to support 2.50 efts (ac3: 0.50, ANU: 0.00, CSIRO: 0.50, TPAC: 1.50)