project . 2008 - 2011 . Closed

Bayesian methods for modelling and integrating metabolic data

UK Research and Innovation
Funder: UK Research and Innovation
Project code: BB/E020372/1
Funded under: BBSRC
Funder Contribution: 520,983 GBP
Status: Closed
07 Jan 2008 (Started) 05 Oct 2011 (Ended)

Recent advances in biological technology make it possible to take many different measurements of complex systems, from the cell to the whole organism. However, these technologies generate massive amounts of data, and processing these data robustly and efficiently is a major task. The aim of our multidisciplinary project is to devise methods to combine and analyze the different data measurements arising from experiments in modern biology. These methods will ultimately aid the understanding of the causes of common diseases and lead to the development of new treatments. It is now possible to investigate how complex organisms function by measuring in great detail the chemical composition of, for example, a sample of blood or urine, and also to measure how that composition changes over time, or in reaction to different treatments or experimental conditions. Perhaps most importantly, it is also possible to compare the composition across groups that may or may not have a particular disease, and to use this comparison to understand how treatments might be developed. This exciting prospect can only be achieved, however, if the experimental data are collected and analyzed as accurately as possible. This is the principal goal of our research. We will focus on so-called 'metabolic' analysis using two specific types of technology, nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS), that allow us to measure the amounts of a large number of different chemicals (or metabolites) present in the samples of blood or other body fluids being analyzed. Metabolites are small molecules, present in all organisms, that are essential to the functioning of living cells.
NMR and MS are both extremely sophisticated measurement procedures, and each produces a large amount of data (spectra). Although the measurements from the two technologies contain some information on the same metabolites, most of the information from the two sources is not identical, and an important statistical modelling task is to combine data from them in the most sensible fashion. We will separate this task into two components: first, the mathematical modelling of the NMR and MS metabolite spectra, and second, the combination of the data across the two measurement systems. Both components require major input from both the biologists and the statisticians involved in our research programme. The statistical analysis of the large amounts of data generated by NMR and MS technologies is an extremely challenging task. Some methods for data analysis already exist, but they do not use all the information at hand. An important advantage of our approach is that we will use physico-chemical information already available about typical metabolites to direct how we build our models and carry out our analysis. Such physico-chemical 'prior' information has only rarely been used in the analysis of metabolite data, but we feel that it provides an important guide as to how analysis should proceed. We will therefore adopt a Bayesian statistical approach, which combines data and prior information in a principled fashion. Although scientifically attractive, this modelling approach needs advanced computing methods before the analysis can be carried out, and a major component of our research will be to implement the most efficient computational strategies. Understanding and modelling the content of NMR and MS metabolite spectra is a complicated task that requires both highly specialized chemical knowledge and state-of-the-art statistical techniques.
The novelty of our project is that, by using a Bayesian analysis framework, we are able to harness and incorporate such specialist information. Our multidisciplinary research team, which combines expertise in modelling, statistics, chemical biology and bioinformatics, will ensure the success of our research programme and facilitate the dissemination of its results to a wide community.
