Modeling a metabolic flow network with CVXPY

Every cell has a myriad of metabolic enzymes producing internal building blocks from nutrients found in the environment. These building blocks are then used as monomers for large polymer molecules, and these polymers constitutes most of the mass of our cells. So we have to make a lot of building blocks per second!

Construction of the building blocks themselves is done by a network of thousands of unique metabolic enzymes in our cells.

For this blog post, a good mental model of this network is actually a network of water pipes (reactions) and reservoirs (molecules). A chemical reaction is a pipe between some sets of reservoirs, and the flow is the reaction rate. Enzymes make the pipes bigger, enabling a faster flow than what would be possible normally.

A toy example

Let's say we have a set of reactions with the structure:

$$\begin{align*} \ce{\varnothing ->[R_1] A}& \\ \ce{2A <=>[R_2] B}& \\ \ce{B <=>[R_3] C}& \\ \ce{C ->[R_4] \varnothing}& \end{align*}$$ So we can create molecules of \(A\) with reaction \(R_1\), remove molecules of \(C\) with reaction \(R_4\), and \(A, B, C\) can be interchanged with reactions \(R_2, R_3\). We can represent this as a matrix as well, which is often called the stoichiometric matrix $\mathbf{S}$. I've annotated the rows and columns to make the reaction-matrix conversion clear. $$ \mathbf{S} = \mathbb{Z}^{m \ x \ n} = \quad \begin{array}{c c} & \begin{array}{c c c c} R_1 & R_2 & R_3 & R_4 \\ \end{array} \\ \begin{array}{c c c}A\\B\\C \end{array} & \left[ \begin{array}{c c c c} 1 & -2 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{array} \right] \end{array} $$ If we know a set of reaction rates $\mathbf{v}$, then we can calculate the change in molecule counts with a simple matrix-vector product: $$ \Delta \mathbf{m} = \mathbf{S} \mathbf{v} $$ While chemistry in living cells usually operates far from thermodynamic equilibrium, for steadily growing cells, an approximation (but a very useful one) is that most reactions are in a steady state, i.e. the net change in intermediate molecule counts is zero, so: $$ \Delta \mathbf{m} = \mathbf{S} \mathbf{v} = 0 \qquad \text{steady state assumption} $$ This means that given the structure of a reaction network, we can constrain what steady state solutions of $\mathbf{v}$ are allowed. In this case, the set of allowed steady state solutions is clearly: $$ \mathbf{v} = \begin{bmatrix} 2 \\ 1 \\ 1 \\ 1 \end{bmatrix} \cdot t \qquad t \in \mathbb{R}, \ t \geq 0 $$ Obviously, in the real cell, things are more complicated. Typically, the reaction network will be of the scale at least a couple of hundred reactions, and a couple of hundred metabolites. More importantly, the solution space of $\mathbf{v}$ will typically be a high-dimensional convex polytope instead of a simple line because of the flexibility of large, branching networks.

Implementing a metabolic flow network in CVXPY

Since solving the flow problem gets complicated for large networks, we can instead solve it as a linear programming problem. CVXPY is a Python library for convex optimization problems that allows us to easily express optimization problems in a mathematically intuitive way.

For a linear program, we typically want to maximize or minimize an objective function. Let's say in this case, we've measured the maximum rate of reaction \(R_1\) to be 10 molecules produced per time, and we want to maximize the rate of reaction \(R_4\). We can express this mathematically as:

$$ \begin{alignat*}{3} & \text{maximize} \quad \quad && v_4 && \\ & \text{subject to} && \Delta \mathbf{m} = \mathbf{S} \mathbf{v} = 0&& \\ & && 0 \leq v_1 \leq 10 && \\ & && 0 \leq v_4 && \\ & \text{where} && \mathbf{v} \in \mathbb{R}^4 && \\ \end{alignat*} $$

We can implement this in CVXPY in a very similar fashion:

import cvxpy as cp
import numpy as np

# Define the stoichiometric matrix S
S = np.array([
    [1, -2, 0, 0],
    [0, 1, -1, 0],
    [0, 0, 1, -1]
])

# Define the reaction rates vector v as a variable in CVXPY
v = cp.Variable(4)

# Define the steady state constraint S @ v = 0
constraints = [S @ v == 0]

# Append the constraint that the rate of reaction R1 is less than 10
constraints.append(v[0] <= 10)

# Define the objective function
objective = cp.Maximize(v[3])

# Formulate the problem, which has the objective and constraints
problem = cp.Problem(objective, constraints)

# Solve the problem, which returns the optimal vector v
problem.solve()

# Print the solution
print("Optimal v:", v.value)
# v = [10, 5, 5, 5]

The perks of understanding convex optimization theory for modeling metabolic networks

Many research groups use premade packages for solving metabolic flow problems, such as COBRApy or FBApy. These packages are great for quickly getting a solution for simple problem structures, but understanding the underlying theory of convex optimization can be incredibly useful if you want to include more complex constraints or objective functions in your model.

As an example, let's say that you've measured the rate of reaction \(R_2\) to be 5 molecules per time, and for \(R_3\) to be 3 molecules per time. You could add this as a constraint, but most measurements have some noise to them.

So you go back and look at your data, and you realize that the rate of reaction \(R_2\) is actually 5 ± 2 molecules per time. What do we do now?

It turns out that fortunately, L2-norms are convex functions, and actually correspond exactly to setting a Gaussian prior for the reaction rates. We can control the mean and variance of the prior by adjusting the L2-norm as follows:

$$ v_2 \sim \mathcal{N}(\mu, \sigma) \quad \rightarrow \quad \text{minimize} \quad \frac{1}{2\sigma^2}||v_2 - \mu||_2^2 $$

Adding this as a constraint in CVXPY is as simple as:

# Define the prior mean and variance
mu = 5
sigma = 2

# Add the L2-norm penalty to the objective function
objective += (v[1] - mu)**2 / (2 * sigma**2)

This is just an example of how understanding the underlying theory of convex optimization can help you model more complex biological systems. I hope this blog post has been helpful in understanding how to model metabolic networks in just a few lines of code!