Systems & Computational Modeling

As a bioengineer, I believe that building quantitative models is one of the most important components of systems biology. Unlike mathematical models of macroscopic systems such as some organs and tissues, models built from molecular components are very hard to construct. The challenges fall into three groups. In building biochemical models, they include (1) the complexity of proteomic states and interactions, (2) the integration of diverse data to infer biochemical interactions, and (3) the temporal state of biochemical models. In building mathematical models, they include (1) incorporating statistical and probabilistic information into analytical models, (2) incorporating qualitative constraints into mathematical models, and (3) incomplete knowledge and coarse-graining. In computational modeling, they include (1) the absence of knowledge about model parameters such as rate constants, (2) local versus global concentrations of species and multiple length and time scales, and (3) variation among cell types, sub-population variability, and variability among biological repeats.

We have developed strategies addressing each of these challenges. For parameter estimation and optimization in biochemical network modeling, we have developed new methods involving multi-parametric sensitivity analysis and optimization with particle-swarm genetic algorithms. To reduce the complexity of the networks, we have developed a method that uses mixed-integer nonlinear programming to derive reduced-order models. For temporal networks, we have built probabilistic methods that allow construction of time-series networks. We are also actively developing stochastic kinetic modeling strategies for small numbers of molecules whose concentrations vary spatially across cellular locations. For all these developments, excellent data-driven networks of the GPCR and calcium signaling pathways served as our model systems.

Our kinetic modeling of the GPCR network established the ternary complex involving the receptor, the G-protein and the RGS protein; this was subsequently verified with FRET experiments. Our calcium modeling predicted the calcium response in macrophages in which components were knocked down, and subsequent experiments verified these predictions. We also showed that single-cell calcium measurements from thousands of macrophage cells, made in the Meyer laboratory at Stanford, each fell into one of four types of curves, and that these types were reflected collectively in measurements of the whole ensemble of cells. This clearly showed that while cells differ microscopically in their behavior (concentrations of components), the collective phenotype reflects the “states” of cells, which are finite in number, at least in a coarse-grained description. We then asked how we can investigate local concentrations of components in cells, especially when only small numbers of molecules are present. We developed a novel hybrid method that uses classical kinetics for components with large concentrations and a stochastic method for components whose concentrations are low. We then examined the calcium measurements in the light of small concentrations.
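
The idea of partitioning a system into a deterministic part and a stochastic part can be illustrated with a toy sketch. The code below is not the hybrid method described above. It is a minimal illustration under stated assumptions: a hypothetical abundant species A decays by classical first-order kinetics, while a hypothetical low-copy species B is produced at a propensity proportional to A and degraded molecule by molecule, with exponentially distributed waiting times in the style of Gillespie's algorithm. The function name, rate constants and the `dt_max` cap (which treats A as constant over short intervals) are all illustrative choices.

```python
import math
import random


def hybrid_simulate(a0, b0, k_decay, k_prod, k_deg, t_end, dt_max=0.01, seed=0):
    """Toy hybrid simulation (illustrative only).

    A (abundant): deterministic decay dA/dt = -k_decay * A.
    B (low copy): produced with propensity k_prod * A and degraded with
    propensity k_deg * B, simulated as discrete random events.
    """
    rng = random.Random(seed)
    t, a, b = 0.0, float(a0), int(b0)
    trajectory = [(t, a, b)]
    while t < t_end:
        prod = k_prod * a          # propensity of B production
        deg = k_deg * b            # propensity of B degradation
        total = prod + deg
        # Waiting time to the next stochastic event, with A frozen for now.
        tau = rng.expovariate(total) if total > 0 else math.inf
        # Never step further than dt_max, so A is refreshed regularly.
        step = min(tau, dt_max, t_end - t)
        a *= math.exp(-k_decay * step)   # exact update for first-order decay
        t += step
        if tau <= step:                  # an event fired inside this step
            if b == 0 or rng.random() * total < prod:
                b += 1
            else:
                b -= 1
        trajectory.append((t, a, b))
    return trajectory


# Hypothetical parameters: 1000 units of A, no B initially.
trajectory = hybrid_simulate(1000.0, 0, k_decay=0.1, k_prod=0.01,
                             k_deg=0.5, t_end=5.0, seed=1)
final_time, final_a, final_b = trajectory[-1]
print(f"t={final_time:.2f}  A={final_a:.1f}  B={final_b}")
```

In this sketch the abundant species follows a smooth curve while the low-copy species fluctuates in integer steps, which is the qualitative behavior that motivates treating the two regimes differently.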

Much of the above falls under the category of “equation (physics)-driven” methods and appeared to be divorced from data-driven “omics” measurements and methods. Very early on, I realized that it is essential to find ways to integrate data-driven and physics-driven modeling to produce models at multiple scales and levels of granularity. Most importantly, while “physical modeling” is intrinsically dynamic in nature, data-driven models call for time-series measurements. The great merit of time-series measurements is that they have the potential to provide some notion of “causal mechanisms”. The macrophage once again served as an excellent system, since we have systematic time series of measurements of all components, be they transcripts, proteins or lipids. Our goal then became the reconstruction of networks from data: first canonical and static topologies, and subsequently dynamic ones.

A few years ago, I joined forces with mathematicians and computer scientists and we developed a Center for the Science of Information (funded by an NSF Science and Technology Center Initiative) involving Purdue, Stanford, MIT, Princeton, UIUC and UCSD. My main focus in this project was the development of both static and dynamic networks from data, and establishing notions of causality and modularity. To facilitate research in my laboratory, I divided the network reconstruction methods into algebraic, Bayesian and information-theoretic approaches. Our first work in this area analyzed how established methods such as PLS, PCR, LASSO and LMI behave when noise and missing data are introduced, both of which are commonplace in biology. We established the limits of applicability of these methods. We then combined these methods to produce a superior method called DP-LASSO. In parallel, we extended an information-theoretic method with a novel cut-off criterion to build macrophage networks. I was, however, keen to explore whether we can (a) decipher causal mechanisms through network reconstruction from dynamic data and (b) determine whether network topologies themselves are dynamic. We developed methods using “Granger causality” arguments to accomplish both, and these developments are still in their early days.
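
As a rough illustration of what a “Granger causality” argument looks like in its simplest form, the sketch below runs a textbook bivariate, lag-1 Granger test on synthetic data. It is not the network reconstruction method described above. It only shows the core comparison: does adding the past of one series reduce the prediction error of another beyond what that series' own past achieves? All names, the synthetic data and the choice of a single lag are assumptions made for the example.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rss(columns, target):
    """Residual sum of squares of a least-squares fit of target on columns."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(u * t for u, t in zip(columns[i], target)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return sum((t - sum(beta[i] * columns[i][n] for i in range(k))) ** 2
               for n, t in enumerate(target))


def granger_f(cause, effect):
    """F-statistic: does cause[t-1] help predict effect[t] beyond effect[t-1]?"""
    target = effect[1:]
    ones = [1.0] * len(target)
    rss_restricted = rss([ones, effect[:-1]], target)
    rss_full = rss([ones, effect[:-1], cause[:-1]], target)
    dof = len(target) - 3          # observations minus fitted parameters
    return (rss_restricted - rss_full) / (rss_full / dof)


# Synthetic data in which x drives y with a one-step delay (an assumption).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]

f_xy = granger_f(x, y)   # expected to be large: x helps predict y
f_yx = granger_f(y, x)   # expected to be small: y does not help predict x
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.2f}")
```

An asymmetric pair of statistics like this is what suggests a directed edge from x to y. Applying the idea to many components, several lags and time-varying topologies requires considerably more machinery than this sketch shows.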

In parallel, we began developing new methods for identifying modularity in large networks. Starting with protein-protein interaction networks and subsequently correlation networks from transcriptional data, we developed several exciting strategies. First, we developed a novel stopping criterion for the well-established Newman-Girvan betweenness algorithm. We then developed a variational Bayes method, followed by a simple but elegant Newton method. All of these are now published in peer-reviewed journals.
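
For readers unfamiliar with the betweenness algorithm, the sketch below shows its basic loop: repeatedly remove the edge with the highest shortest-path betweenness and track the resulting communities. The novel stopping criterion mentioned above is not reproduced here. As a stand-in, the sketch keeps the partition with the highest Newman modularity, which is the conventional choice. The toy graph and all function names are assumptions made for the example.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness (Brandes-style) for an undirected graph."""
    score = {}
    for s in adj:
        dist, sigma = {s: 0}, {s: 1.0}
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:                       # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):          # accumulate path credit backwards
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + credit
                delta[v] += credit
    return score  # every path is counted from both ends; the ranking is unaffected


def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        part, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def modularity(original, parts):
    """Newman modularity of a partition, measured on the original graph."""
    m = sum(len(nb) for nb in original.values()) / 2
    q = 0.0
    for part in parts:
        inside = sum(1 for v in part for w in original[v] if w in part) / 2
        degree = sum(len(original[v]) for v in part)
        q += inside / m - (degree / (2 * m)) ** 2
    return q


def girvan_newman(original):
    """Remove highest-betweenness edges; return the best-modularity partition."""
    adj = {v: set(nb) for v, nb in original.items()}
    best_parts = components(adj)
    best_q = modularity(original, best_parts)
    while any(adj.values()):               # until no edges remain
        scores = edge_betweenness(adj)
        v, w = max(scores, key=scores.get)
        adj[v].discard(w)
        adj[w].discard(v)
        parts = components(adj)
        q = modularity(original, parts)
        if q > best_q:
            best_q, best_parts = q, parts
    return best_parts, best_q


# Toy graph (an assumption): two triangles joined by a single bridge edge.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
parts, q = girvan_newman(graph)
print(sorted(sorted(p) for p in parts), round(q, 4))
```

Recomputing betweenness after every removal makes this loop expensive on large networks, which is one reason a good criterion for when to stop matters in practice.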

In recent work, we have developed exciting methods for inferring time-varying causal networks from time-series data. We have applied these methods to a mammalian cell cycle and predicted the phases of the cell cycle without a priori assumptions. This approach promises to be extremely useful in building causal networks.

RELATED PUBLICATIONS