Probabilistic modelling of transcription dynamics in whole embryos and single cells
Abstract
We use probabilistic time-series models to gain insight into transcription dynamics in the early Drosophila embryo. We consider the very earliest stage of development, when maternal transcripts are progressively replaced by zygotic gene expression. We combine whole-embryo RNA-Seq and live-cell imaging time-series datasets to study the mechanisms that regulate RNA levels in the cell.
First, using a total RNA-Seq time course that captures both intronic and exonic reads, we model the production and degradation of RNA by combining a differential equation model of degradation with a Gaussian process model of transcription (1). We infer half-lives for a large set of zygotic genes and show how the degradation rate controls the delay between the peak levels of nascent and mature transcripts. mRNAs with short half-lives are more likely to be associated with P-bodies, and for a subset of mRNAs we find evidence of 5'-to-3' degradation occurring in P-bodies.
Second, we use live-cell imaging to model transcriptional bursting in single cells. A previously proposed compound-state hidden Markov model (cpHMM) provides an effective approach to inferring transcription dynamics from MS2 live imaging data. However, in the original formulation the number of compound states, and hence the time complexity, grows exponentially with gene length, which makes it impractical in our case. We have therefore developed more scalable inference approaches, including mean-field variational Bayes and a truncated state-space approximation (2). By comparing burst kinetics in cells receiving different levels of BMP signaling, we show that BMP signaling controls burst frequency by regulating the promoter activation rate. The rate of promoter activation depends on both the enhancer and the promoter sequence, and we show that the main determinant of the RNA polymerase II initiation rate is the enhancer.
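As a minimal sketch of the degradation part of model (1), assume mature mRNA m(t) follows dm/dt = β p(t) − δ m(t), where p(t) is the nascent (pre-mRNA) level and δ = ln 2 / t½ is the degradation rate. In place of the Gaussian process, the hypothetical helper `premrna` uses a fixed Gaussian pulse, and β = 1; both choices, and the function names, are illustrative assumptions rather than details of the published model. Because m peaks where β p(t) = δ m(t), its peak must fall after the peak of p, and the lag grows with half-life:

```python
import math


def premrna(t, peak=60.0, width=15.0):
    """Hypothetical nascent (pre-mRNA) profile: a Gaussian pulse (minutes).

    Stands in for the Gaussian process model of transcription; peak time
    and width are illustrative assumptions.
    """
    return math.exp(-0.5 * ((t - peak) / width) ** 2)


def mature_peak_time(half_life, t_end=400.0, dt=0.01, beta=1.0):
    """Integrate dm/dt = beta * p(t) - delta * m(t) with forward Euler and
    return the time at which mature mRNA m(t) is maximal."""
    delta = math.log(2) / half_life  # degradation rate from the half-life
    m, t = 0.0, 0.0
    best_t, best_m = 0.0, 0.0
    while t < t_end:
        m += dt * (beta * premrna(t) - delta * m)
        t += dt
        if m > best_m:
            best_t, best_m = t, m
    return best_t


for hl in (5.0, 20.0, 60.0):
    lag = mature_peak_time(hl) - 60.0  # the nascent pulse peaks at t = 60 min
    print(f"half-life {hl:>4} min -> mature peak lags nascent peak by {lag:.1f} min")
```

Running it prints a larger lag for longer half-lives, which is the qualitative relationship described above; in the actual model the transcription profile is modelled with a Gaussian process rather than fixed in advance.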
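The scaling problem with the cpHMM shows up even in a toy version. Assume a two-state promoter (OFF/ON) that switches as a Markov chain, and an MS2 signal at each time step equal to the summed loading rate over the last `w` steps (the time a polymerase takes to traverse the gene) plus Gaussian noise. The hidden state then has to be the whole window of the last `w` promoter states, giving 2**w compound states. The sketch below keeps only the `max_states` most probable compound states after each step of the forward filter. This beam-style pruning is an assumption made for illustration and may differ from the truncation used in (2); all names and parameter values are hypothetical, and the toy ignores details such as the partial fluorescence of polymerases still transcribing the MS2 loops.

```python
import math
import random


def simulate(T, w, P, r, sigma, seed=0):
    """Simulate a two-state promoter (0 = OFF, 1 = ON) and a toy MS2 signal.

    The signal at each step is the summed loading rate r[s] over the last w
    promoter states (the elongation window) plus Gaussian noise.
    """
    rng = random.Random(seed)
    s = 0  # promoter assumed OFF before the first step
    states, y = [], []
    for _ in range(T):
        s = 1 if rng.random() < P[s][1] else 0  # Markov switching
        states.append(s)
        mean = sum(r[k] for k in states[-w:])
        y.append(mean + rng.gauss(0.0, sigma))
    return states, y


def truncated_forward(y, w, P, r, sigma, max_states=50):
    """Forward filter over compound states (tuples of the last w promoter
    states), pruned to the max_states most probable tuples after each step.

    Returns an approximate log-likelihood and the filtered P(ON) per step.
    """
    beliefs = {(): 1.0}  # empty history: promoter assumed OFF at the start
    norm = sigma * math.sqrt(2.0 * math.pi)
    loglik, p_on = 0.0, []
    for obs in y:
        # Prediction: extend each history by one promoter state, keep last w.
        new = {}
        for hist, prob in beliefs.items():
            last = hist[-1] if hist else 0
            for s in (0, 1):
                h = (hist + (s,))[-w:]
                new[h] = new.get(h, 0.0) + prob * P[last][s]
        # Update: Gaussian emission centred on the summed loading rate.
        for h in new:
            mu = sum(r[k] for k in h)
            new[h] *= math.exp(-0.5 * ((obs - mu) / sigma) ** 2) / norm
        loglik += math.log(sum(new.values()))
        # Truncation: keep the most probable compound states and renormalise.
        top = sorted(new.items(), key=lambda kv: kv[1], reverse=True)[:max_states]
        z = sum(p for _, p in top)
        beliefs = {h: p / z for h, p in top}
        p_on.append(sum(p for h, p in beliefs.items() if h[-1] == 1))
    return loglik, p_on


# Hypothetical parameters: per-step switching matrix, loading rates, window.
P = [[0.9, 0.1], [0.2, 0.8]]
r = [0.0, 1.0]
w, sigma = 10, 0.2
states, y = simulate(300, w, P, r, sigma)
exact_size = 2 ** w  # compound states in the untruncated model
ll, p_on = truncated_forward(y, w, P, r, sigma, max_states=50)
acc = sum((p > 0.5) == (s == 1) for p, s in zip(p_on, states)) / len(states)
print(f"exact states: {exact_size}, approx log-lik: {ll:.1f}, accuracy: {acc:.2f}")
```

With a 10-step window the exact model already has 1024 compound states, while the pruned filter tracks at most 50 per step; the price is that the reported log-likelihood is only an approximation.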
Biosketch
My group works on how to learn models and make inferences given evidence from high-throughput biological datasets. The models that we develop range from mechanistic differential equation models of the cell to more abstract probabilistic latent variable models that can be used to uncover interesting structure in high-dimensional data. We are particularly interested in hybrid models that combine aspects of mechanistic and probabilistic models.
Models encode our hypotheses about how biological systems work. We use probabilistic inference to learn model parameters and to choose between competing models, so as to identify the hypotheses best supported by the available experimental evidence. Bayesian inference and non-parametric modelling are a particular focus, as they provide a principled framework for dealing with uncertainty in complex systems. We apply our methods to infer gene regulatory networks from time-series mRNA expression and DNA-protein binding data and to uncover changes in the transcriptome from RNA-Seq datasets, and we develop novel inference algorithms for time-series data analysis and systems biology modelling.
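Choosing between competing models in a Bayesian framework is commonly done through each model's marginal likelihood (evidence), in which the parameters are integrated out, so that more flexible models are not automatically favoured. The generic relations are given below, with $\mathcal{M}_i$ a candidate model, $\theta_i$ its parameters and $\mathcal{D}$ the data; this is the textbook formulation, not a description of any specific model from the group.

```latex
p(\mathcal{D} \mid \mathcal{M}_i)
  = \int p(\mathcal{D} \mid \theta_i, \mathcal{M}_i)\, p(\theta_i \mid \mathcal{M}_i)\, d\theta_i ,
\qquad
\frac{p(\mathcal{M}_1 \mid \mathcal{D})}{p(\mathcal{M}_2 \mid \mathcal{D})}
  = \frac{p(\mathcal{D} \mid \mathcal{M}_1)}{p(\mathcal{D} \mid \mathcal{M}_2)}
    \cdot \frac{p(\mathcal{M}_1)}{p(\mathcal{M}_2)}
```

The ratio of evidences is the Bayes factor. For realistic models the integral is rarely tractable and is usually approximated, for example by the evidence lower bound that variational methods maximise.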