Genome Annotation
Project Leader: Matthew Bellgard, Murdoch University
Description
Bioinformatics is rapidly developing into a highly data- and compute-intensive research domain, driven by recent advances in experimental data-gathering techniques. Activities covered under bioinformatics include comparative genomic analysis (a result of the enormous quantities of data emanating from the various genomics and functional genomics projects), proteomic data analysis, phylogenetics and molecular evolution studies, protein modelling (including protein-ligand interactions), complex systems modelling, and plant breeding.
The main aims of the project are to:
- establish a Grid-enabled Blast system across the APAC partners;
- install a local Ensembl database and the relevant compute/data-intensive tools at iVEC/SAPAC (Ensembl is a joint project between EMBL-EBI and the Sanger Institute consisting of a database of annotated eukaryotic genomes; it is a trusted, well-known bioinformatics resource);
- develop a distributed Genome Annotation System using the Rice Genome as a model.
The collaborators are the APAC partners together with the CRC for Molecular Plant Breeding, the Institute of Molecular Biosciences, the Australian Centre for Plant Functional Genomics and the ARC Network for Parasitology. Several other groups will also be involved in this project, such as the Australian Genome Research Facility, the Victorian Bioinformatics Consortium, the Centre for Bioinformatics and Biological Computing and the WA State Government Centre of Excellence for Comparative Genomics.
Achievements
| 2004 |
- The project has developed a unified direction in consultation with the other projects and the Steering Committee.
- Some of the fundamental procedures for mirroring bioinformatics data across APAC nodes have been established and implemented.
- In addition, standard procedures have been established for the installation, configuration and maintenance of APAC nodes, and the Bioinformatics project has been aligned with them.
- SAPAC and iVEC have operational versions of coordinated data mirroring and a web-enabled Blast service. AC3 also has a prototype of a distributed framework for the BLAST bioinformatics application.
- SAPAC have been studying international bioinformatics grid projects, and familiarizing the project team with existing bioinformatics tools and grid software, including myGrid and a number of grid-enabled implementations of BLAST.
- Further discussions were held with the Australian Centre for Plant Functional Genomics and collaborators in the APAC bioinformatics project to identify requirements and modify the project plan.
- Work has continued on implementing support for a bioinformatics grid on SAPAC facilities, including bioinformatics software, database mirrors, and grid software.
- JCU has implemented a simple portal (based on CGI/Perl) that can call local and remote compute resources via PBS queues. At present this is not a grid resource, but work is underway to convert this interface to a GridSphere-based portal.
|
| 1H2005 |
- Developed a communication mechanism and understanding for expertise sharing between CBBC, iVEC and SAPAC.
- Datasets are now mirrored between ANU, SAPAC and CBBC/iVEC; experiments are underway to use SRB for this purpose.
- Blast services: a Blast system has been implemented at CBBC/iVEC and SAPAC. Portal-based access using Nimrod/G became available on SAPAC resources in May, and the portal will be rolled out to other resources after the APAC Grid gateways are operational.
- Genome Annotation System: Set up an operational version of Ensembl; integrated Ensembl with CBBC web service Bioinformatics tools, and designed integration between Ensembl and the APAC Grid.
- Deployed CLUSTAL XP on QCIF and APAC resources and further developed its parallel performance to production mode; developed a web portal accessible to a wider community of biochemists and health researchers.
- The myGrid system was evaluated: it offers some useful ideas but has a different model for service delivery.
|
| 2H2005 |
- A Bioinformatics Application Support Project Workshop was held in early August, with representatives from CBBC, SAPAC, iVEC and UTS. It clarified the roles of all groups in this project. A draft plan outlining how all groups and systems are to integrate was sent by the project leader to all group leaders on 18 Oct for confirmation. There have been no further updates, although requests were made.
- SAPAC is now able to submit test jobs to SAPAC machines through the APAC grid gateway. Some issues with security, firewalls, user accounts and data management still need to be addressed.
- A web service interface to SAPAC’s Nimrod BLAST service was completed in December 2005 and tested on a SAPAC development machine.
- Datasets are now mirrored between ANU, SAPAC and CBBC/iVEC; experiments are underway to use SRB for this purpose.
- Set up an operational version of Ensembl; integrated Ensembl with CBBC web service Bioinformatics tools, and designed integration between Ensembl and the APAC Grid. Ensembl pipeline installed and tested using CBBC internal resources.
|
Plan and Milestones for 2006
Bioinformatics tools
| March |
- Consistency of dataset updates and software versions
- Blast available as Web Service
|
| June |
- Blast executed across the APAC Grid
|
| August |
- Repeat above for RepeatMasker
- Gene prediction tool, possibly Glimmer
|
Distributed Genome annotation
| June |
- Install Ensembl on specific APAC nodes and trial
- Implement Web Service interfaces to Ensembl components
|
| December |
- Populate Ensembl with genomes from crop plants (rice & wheat) and production animals (beef).
- Parasite (to be selected) to enable annotation sharing work
|
Participating Organisations
- Murdoch University
- SAPAC
- Australian Centre for Plant Functional Genomics
Resources for 2006