MacGyver, Dinosaurs and Viruses
Why would someone who usually works on fossil reptiles delve into the molecular evolution of viruses? It turns out the two fields have a lot in common, and my diversion eventually led to a quirky study which has received extensive international interest, beginning with an interview request from Time Magazine and culminating in an invitation to write for PubChase! While much of the public interest obviously relates to the study taxon (dinosaurs!), I think the more important scientific legacy relates to methodological novelty. Essentially, we "did a MacGyver" - transferring techniques normally used by molecular epidemiologists across to the fossil record. But as is usually the case, this study had a long, exciting and occasionally frustrating gestation.
Palaeontologists have long held an embarrassing little secret when it comes to dating. The impressive phylogenetic trees we see in textbooks and scientific papers, showing how various fossil groups diverged through evolutionary time, are partly made up. This is because the fossil record only provides evidence for minimum divergence times. If lineages A and B both first appear 100 million years ago, all we can say is that they must have diverged sometime earlier - but how much earlier is largely "guesstimated".
A few years ago, it dawned on several people - including myself - that we could do better. If the first examples of lineages A and B were already vastly different, then presumably their respective lineages diverged much earlier. But if the first examples were nearly indistinguishable, then presumably their lineages diverged not long before. It turns out the relevant maths to do all this had already been developed by virologists. Given (for instance) DNA sequences from virus samples spanning 1980 to 2000, they could easily work out their phylogenetic relationships (the branching order of the tree), estimate divergence dates (when those divergences happened), and further estimate rates of DNA change in each lineage. In effect, the serially time-sampled viruses represented information exactly analogous to the fossil record: just re-read the previous sentence and replace "virus" with "reptile", "DNA" with "skeletal features", and "1980 to 2000" with "200 million years ago to the present".
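The intuition above can be put into a back-of-envelope calculation. What follows is only a minimal sketch, not the method used in the study: it assumes a strict clock (one constant, already-known rate of change shared by both lineages) and ignores repeated changes in the same trait, whereas the real analyses estimate the tree, the dates and the rates jointly. The function name `divergence_age` and all the numbers are hypothetical, chosen purely for illustration.

```python
def divergence_age(first_a, first_b, differences, n_traits, rate):
    """Rough divergence age (millions of years ago) for two lineages.

    first_a, first_b -- ages (Myr ago) of the oldest fossil of each lineage
    differences      -- number of scored traits in which those fossils differ
    n_traits         -- total number of traits scored (including invariant ones)
    rate             -- ASSUMED changes per trait per Myr along one lineage
    """
    if rate <= 0 or n_traits <= 0:
        raise ValueError("rate and n_traits must be positive")

    # Proportion of traits that differ between the two oldest fossils.
    distance = differences / n_traits

    # Total evolutionary time (Myr) needed to build up that distance,
    # summed along both branches leading back to the common ancestor.
    path_length = distance / rate

    # path_length = (T - first_a) + (T - first_b), solved for T.
    estimate = (path_length + first_a + first_b) / 2

    # The fossils themselves still set a hard minimum age.
    return max(estimate, first_a, first_b)


# Both lineages first appear 100 Myr ago; 100 traits scored;
# hypothetical rate of 0.001 changes per trait per Myr.
nearly_identical = divergence_age(100, 100, differences=2, n_traits=100, rate=0.001)
vastly_different = divergence_age(100, 100, differences=20, n_traits=100, rate=0.001)
print(nearly_identical, vastly_different)
```

With these made-up numbers, fossils differing in 2 traits out of 100 imply a split roughly 110 million years ago, while fossils differing in 20 traits imply a split roughly 200 million years ago. The division by `n_traits` is also why it matters that every trait gets scored, including the ones that never change.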
All that was required was to find an anatomical dataset, put it through the relevant software, and we'd have an instant citation classic. Or maybe not. There were hurdles at each stage.
First, these analyses require one to sample all traits in an unbiased fashion. Geneticists routinely do this, sampling all nucleotides in a certain gene or the entire genome. But virtually all anatomical datasets selectively exclude vital data. Palaeontologists seldom "score" all observable traits: rather, they ignore traits which are constant across all taxa being studied, and also ignore traits which are "parsimony-uninformative" (where a single taxon exhibits one state, and all other taxa exhibit another state). This is because such traits are irrelevant under the dominant method - parsimony or "cladistics" - used to reconstruct trees based on anatomical data. Imagine the issues with molecular analysis if people selectively failed to record invariant and parsimony-uninformative sites when sequencing genes and genomes! The thought of gathering a suitable dataset entirely from scratch was a bit daunting (though our current studies are now gathering data in the appropriate manner). Fortunately, it turns out that some colleagues in Italy and the UK had such a dataset; they had diligently been scoring traits across Dinosauria, including invariant and parsimony-uninformative traits. With Andrea, Gareth and Darren on board, we were in business.
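To make the three kinds of trait concrete, here is a small sketch that sorts the columns of a character matrix into constant, parsimony-uninformative and parsimony-informative. It uses the usual textbook criterion (a trait is informative only if at least two states each occur in at least two taxa); the toy matrix, the taxon names and the function name `classify_trait` are invented for illustration and are not taken from our dataset.

```python
from collections import Counter

MISSING = {"?", "-"}  # conventional codes for unknown / inapplicable


def classify_trait(states):
    """Classify one trait (one column of the matrix) from its observed states."""
    counts = Counter(s for s in states if s not in MISSING)
    if len(counts) <= 1:
        return "constant"
    # States that are shared by at least two taxa.
    shared = sum(1 for n in counts.values() if n >= 2)
    if shared >= 2:
        return "parsimony-informative"
    return "parsimony-uninformative"


# Hypothetical toy matrix: four taxa, four traits, one character per trait.
matrix = {
    "taxon_A": "010?",
    "taxon_B": "0001",
    "taxon_C": "0011",
    "taxon_D": "0010",
}

# zip(*rows) turns the rows (taxa) into columns (traits).
labels = [classify_trait(column) for column in zip(*matrix.values())]
for index, label in enumerate(labels):
    print(f"trait {index}: {label}")
```

In this toy matrix only the third trait would survive the filtering typical of a parsimony dataset; the other three are exactly the kind of data that a clock-style analysis still needs.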
The next hurdle was to get this anatomical data into the relevant software. But the program - a vast and complex collection of code appropriately named BEAST - was largely geared to analyse DNA. It was a bit of a struggle to get it to handle an odd new type of data (anatomy), and to tweak the evolutionary models. But, to my credit, I only swore once: starting in March 2012 and stopping in April 2014 (to paraphrase Woody Allen). And once this was all working fine, it still took months of supercomputer time to sample all the parameter space of the solution. Palaeontologists are not used to such complex and time-consuming analyses: I still get people asking me if I could quickly run their data using such methods one afternoon while they sit and watch (I reply they should not only bring coffee, but perhaps harvest it personally from a plantation in Brazil as there will be more than enough time)!
There is an amusing scene in The Hitchhiker's Guide to the Galaxy where humanity waits for seven and a half million years for the supercomputer Deep Thought to calculate the answer to "the ultimate question" - and when it finally spits out an answer they are incredulous because it doesn't make any sense ("forty-two"!). This happened a few times in our study, and each time I found the bug, fixed it and waited further (while wondering how often people would uncritically publish such "amazing" results). Eventually, the final results materialised and they were very exciting - and probably not artefactual. They showed that birds were evolving new adaptations four times faster than their dinosaurian cousins (in much the same way as virologists might demonstrate that a particular flu virus strain is evolving unusually fast). Birds were also the only dinosaurian lineage which continually shrank in size over an extended period of time. A video summary of these results is here.
There's always angst and uncertainty when you have novel results and aim high; journals at the very top of the pecking order have a 95% rejection rate, which means one lukewarm review is often the death knell. And authors tend to have unrealistic optimism: virtually everyone thinks their submission deserves to be in the lucky 5%, just like surveys show that almost all drivers reckon they are "better than average"! But we were fortunate: after promising assessments and an 11-page "Response to Reviewers" which included an Appendix 1 (much longer than the actual paper), we had formal acceptance. By the time of the proofs, I'd seen the manuscript so many times my eyes were glazing over: a PhD student we know complained that he still saw his paper as he drifted off to sleep, burned into his retinas! We submitted a killer cover proposal, but in the end our gigantic dinosaur and delightful hummingbird lost out to - a worm. Yes, I realise nematodes are the most abundant and ecologically important animals - but they are generally so nondescript. It was just our luck to come up against the Gwyneth Paltrow (or Johnny Depp) of nematodes that week; I was disappointed my parasitologist friends weren't very sympathetic to our plight!
The final remark I'd like to make about our study is that it demonstrates the value of maintaining a wide perspective. Time-series samples of viruses, and fossils collected across vast swathes of geological time, can be analysed using the same approach. So take your advice from MacGyver*. Whatever field you work in, there are theories, methods and algorithms developed in other fields that are potentially ideal for addressing your research problems. Darwin and Wallace independently came up with natural selection by applying economic theory (Malthus's An Essay on the Principle of Population) to the natural world.
*Disclaimer: I have never watched a full episode.