Grand Challenge: Fast Start winners 2011
BlueFern and NeSI wish to congratulate the following winners of the Fast Start Challenge for 2011:
- Tim White from the Mathematics and Statistics Department, Otago University, with their project: Pushing the Phylogenetics Envelope
- Lutz R. Gehen from Justin M. O'Sullivan's Research Group, Massey University, Auckland, with their project: DNA sequence recognition by electrostatic DNA-DNA interactions
Pushing the Phylogenetics Envelope (Tim White)
In 1859, Charles Darwin proposed that all life on Earth arose through evolution, and the evidence for his theory is now overwhelming. Evolution is thought to be a mostly treelike process: over time, new species form by “branching off” existing species. Reconstructing this process is the subject of phylogenetics. Many of the phylogenetic questions that have fascinated scientists since the time of Darwin can now be answered by extracting DNA sequences for each species and looking at subtle variations in them.
One of the most straightforward methods for estimating the tree that describes the history of a group of species is to look for the tree that requires the fewest DNA mutations. This method, maximum parsimony, is based on the intuition that mutations in DNA occur rarely. Unfortunately, it is extremely computationally expensive for more than around 20 species (see “Benefits of Scaling Up”). Existing programs failed to take advantage of multiple CPUs, so in 2011 we released XMP, an open-source program for finding maximum parsimony phylogenetic trees that attacks the problem from several angles, foremost among them a strong focus on efficient parallelisation in the distributed-memory environment. Using computer time generously donated by BlueFern, we were able to achieve unprecedented speedups for this problem, with efficiency above 90% on 128 CPUs. XMP’s highly overlapped communications protocol allows it to drive the BlueGene supercomputer efficiently in Virtual Node mode, effectively doubling the number of CPUs available.
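To make the maximum parsimony criterion concrete, here is a minimal sketch that counts the fewest mutations a given tree requires, using the standard Fitch algorithm on a toy alignment. It is not XMP's implementation: the function names, the nested-tuple tree representation, and the four example sequences are all assumptions made for illustration. The last function counts the unrooted binary trees on n species, (2n-5)!!, which suggests why an exhaustive search becomes so expensive at around 20 species.

```python
from math import prod  # Python 3.8+

def fitch_site(tree, states):
    """Fitch pass for one alignment column on a rooted binary tree.

    `tree` is a leaf name (str) or a pair (left, right).
    Returns (possible_states, minimum_mutations) for the subtree.
    """
    if isinstance(tree, str):              # leaf: the observed nucleotide
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch_site(left, states)
    rset, rcost = fitch_site(right, states)
    common = lset & rset
    if common:                             # children can agree: no extra mutation
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1  # children disagree: one mutation

def parsimony_score(tree, alignment):
    """Sum the per-site Fitch costs over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total

def count_unrooted_trees(n_species):
    """Number of unrooted binary trees on n >= 3 species: (2n-5)!!."""
    return prod(range(3, 2 * n_species - 4, 2))

# Hypothetical toy data: four species, four sites.
alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCCA"}
candidates = [(("A", "B"), ("C", "D")),
              (("A", "C"), ("B", "D")),
              (("A", "D"), ("B", "C"))]
best = min(candidates, key=lambda t: parsimony_score(t, alignment))
```

With four species there are only three candidate trees to score, so `min` over all of them is an exact search. The same exhaustive approach at 20 species would have to consider more than 10^20 trees, which is why search strategy and parallelisation matter so much for this problem.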
As successful as XMP has been, we have identified two areas for improvement. The first is the centralised work-stealing scheme XMP uses to ensure that all CPUs remain productive. While this delivers excellent performance up to around 256 CPUs, we suspect that the centralised nature of this scheme is the cause of the falloff in efficiency above this point. We wish to investigate whether a distributed work-stealing scheme can unlock even more performance.
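The general idea of centralised work stealing can be sketched with a toy, round-based simulation. This is not XMP's protocol, which is not described here: the function name, the "steal half from the most loaded worker" policy, and the unit-cost tasks are all assumptions for illustration. The point is only that every idle worker's request passes through a single coordinator, which is the kind of serialisation that could plausibly limit scaling at high CPU counts.

```python
from collections import deque

def simulate_central_stealing(n_tasks, n_workers):
    """Toy simulation of centralised work stealing with unit-cost tasks.

    All tasks start on worker 0. In each round, every idle worker asks the
    single coordinator for work; the coordinator moves half of the most
    loaded worker's queue to the requester. Then each worker that has
    work processes one task.
    Returns (tasks_done_per_worker, rounds, coordinator_requests).
    """
    queues = [deque() for _ in range(n_workers)]
    queues[0].extend(range(n_tasks))
    done = [0] * n_workers
    coordinator_requests = 0
    rounds = 0
    while any(queues):
        # Phase 1: idle workers contact the coordinator, one at a time.
        for w in range(n_workers):
            if not queues[w]:
                coordinator_requests += 1
                victim = max(range(n_workers), key=lambda v: len(queues[v]))
                for _ in range(len(queues[victim]) // 2):
                    queues[w].append(queues[victim].pop())  # take from the tail
        # Phase 2: every worker holding work processes one task.
        for w in range(n_workers):
            if queues[w]:
                queues[w].popleft()
                done[w] += 1
        rounds += 1
    return done, rounds, coordinator_requests

done, rounds, requests = simulate_central_stealing(n_tasks=64, n_workers=4)
```

In a distributed scheme, an idle worker would instead contact a peer directly (for example, one chosen at random), so no single process has to handle every request. In a real distributed-memory program both variants would be built on message passing rather than shared Python lists.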
The second improvement concerns the maximum parsimony criterion itself. As is well known in the phylogenetics community, maximum parsimony disregards the possibility of a nucleotide mutating more than once in a single branch, which can lead to inaccurate inference of trees containing deep divergences. Less well known is that the method of corrected parsimony rectifies the problem, resulting in a statistically consistent estimator just like the popular maximum likelihood method. Unlike maximum likelihood search, however, which involves heuristic numerical optimisations, exact search using corrected parsimony offers the intriguing possibility of recovering trees in which the error in the optimality criterion is bounded. We wish to investigate the accuracy of trees inferred using corrected parsimony on realistic-size datasets.
Our proposed development of XMP presents, for us, an opportunity to advance the state of the art in phylogenetic computational analysis. For the team at BlueFern, it is an opportunity to gauge the performance of their systems on an application centred on an integer inner loop and unpredictable, highly overlapped communications, something quite different from the “traditional” lock-step floating-point demands of FEM or N-body simulations.
DNA sequence recognition by electrostatic DNA-DNA interactions (Lutz R. Gehen)
The exchange of genetic information between identical or highly similar DNA sequences in the cell nucleus is of great importance for essential cellular functions including the repair of DNA double-strand breaks (DSBs). DSBs are particularly challenging to repair because the continuity of the genetic information is lost. The most reliable process to repair such a break and maintain genetic stability is called homologous recombination: the broken site is aligned to another piece of DNA with the same sequence, and the lost information is restored according to the identical template. A key challenge of the alignment process is the rapid recognition of high sequence similarity in the presence of a huge amount of non-similar DNA.
Recent experimental evidence (Danilowicz et al., 2009) indicates that double-stranded DNA molecules with identical sequence can attract each other in the absence of proteins and at salt concentrations similar to those found in the nucleus. This suggests that sequence recognition can occur entirely at a basic physical level. To explain these results, two competing theoretical models have been proposed (McGavin, 1971; Kornyshev et al., 2001). However, neither model has been rigorously tested.
The goal of this project is to use all-atom molecular dynamics simulations of two DNA double helices to identify whether a thermally stable electrostatic attraction between two identical DNA molecules exists and to assess the proposed models. This will involve comparative analyses of the interactions between identical and non-identical 50 bp DNA molecules. Simulations will be performed using the AMBER package, which is the most commonly used software package for molecular dynamics simulations of DNA, together with the AMBER parmbsc0 force field.
The research proposed in this project will enable me to address a recalcitrant problem in DNA damage repair, namely how two identical DNA sequences recognize each other. The results will be of interest to a wide range of biologists and biophysicists.
Danilowicz C et al., Proc Natl Acad Sci U S A. 2009;106(47):19824–19829.
McGavin S, J Mol Biol. 1971;55:293–298.
Kornyshev AA et al., Phys Rev Lett. 2001;86:3666–3669.
New Zealand eScience Infrastructure (NeSI) is a consortium between Government and the sector, investing jointly in HPC facilities at the University of Canterbury (UC), The University of Auckland, and NIWA. More information is available at: http://www.nesi.org.nz/