Cell growth / viability assays are chemical indicators, such as metabolic activity or DNA / nuclei counts, that correlate with viable cell number and can be used to quantify the effect of media on cells. In chapter 5 we conducted many experiments with different assays and show the inter-assay correlations in Figure 1.3. Notice that no assay is perfectly correlated with any other, because they are collected with different methodologies and fundamentally measure different physical phenomena. For example, AlamarBlue measures the metabolic activity of the cell population, so optimizing a medium against this metric might simply increase the metabolic activity of the cells rather than their overall number. Because some of these measurements can be destructive or toxic to the cells, continuously measuring the change in growth can be tedious. High-quality growth curves may instead be collected over time using image segmentation and automatic counting techniques. With fluorescently stained cells and images, segmentation can be done using algorithms like those discussed earlier. Cells may even be classified dynamically based on their morphology if enough training data is collected to create a generalizable machine learning model. Successfully quantifying the ability of media to grow cells forms the backbone of the novelty of this dissertation.

The primary means by which this dissertation will improve cell culture media is through the application of various experimental optimization methods, often called design of experiments (DOE). The purpose of a DOE is to determine the best set of conditions x to optimize some output y by sampling a process at sets of conditions in an optimal manner. If experiments are time- or resource-inefficient, then optimizing the conditions of a system may prove tedious. For example, doing experiments at only the lower and upper bounds of a 30-dimensional medium like DMEM requires $2^{30} \approx 10^9$ experiments. This militates for methods that can optimize experimental conditions and explore the design space in as few experiments as possible. DOEs where samples are located throughout the design space to maximize their spread and diversity according to some distribution are called space-filling designs.
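As a concrete illustration, space-filling candidate media can be generated with off-the-shelf quasi-Monte Carlo tools. The sketch below is one plausible implementation using scipy.stats.qmc; the five-component medium and its concentration bounds are hypothetical.

```python
# A minimal sketch of drawing space-filling candidate media, assuming a
# hypothetical 5-component medium with per-component concentration bounds.
import numpy as np
from scipy.stats import qmc

n_components = 5                       # illustrative medium dimensionality
lower = np.zeros(n_components)         # lower concentration bounds
upper = np.array([4.5, 2.0, 1.0, 0.5, 0.1])  # hypothetical upper bounds (g/L)

# Latin hypercube: stratifies each dimension so every factor is covered evenly.
lhs = qmc.LatinHypercube(d=n_components, seed=0)
lhs_designs = qmc.scale(lhs.random(n=16), lower, upper)

# Sobol sequence: a low-discrepancy alternative with similar spread.
sobol = qmc.Sobol(d=n_components, seed=0)
sobol_designs = qmc.scale(sobol.random_base2(m=4), lower, upper)  # 2^4 = 16 points

print(lhs_designs.shape, sobol_designs.shape)  # (16, 5) (16, 5)
```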
The most popular method is the Latin hypercube, which is particularly useful for initializing training data for models and for sensitivity analysis. Maximin designs, where some minimum distance metric is maximized over a set of experiments, can also allow for diversity in samples, with the disadvantage that in high-dimensional systems the designs tend to be pushed to the upper and lower bounds. Thus, we may prefer a Latin hypercube design for culture media optimization because media design spaces may exceed 30 factors. Uniform random samples, Sobol sequences, and maximum-entropy filling designs, all with varying degrees of ease of implementation and space-filling properties, may also be used. It cannot be known a priori how many sampling points are needed to successfully model and optimize a design space, because that number depends on the number of components in the media system, the degree of non-linearity, and the amount of noise expected in the response. Because of these limitations, DOE methods that sequentially sample the design space have gained traction, and these are discussed next.

A more data-efficient DOE is to split individual designs into sequences and use old experiments to inform the new experiments in a campaign. One sequential approach is to use derivative-free optimizers (DFOs), where only function evaluations y are used to sample new designs x. DFOs are popular because they are easy to implement and understand, as they do not require gradients. They are also useful for global optimization problems because they usually have mechanisms to explore the design space and avoid getting stuck in local optima. The genetic algorithm (GA) is a common DFO where selection and mutation operators are used to find more fit combinations of genes. In Figure 1.7, notice that the GA was able to locate the optimal region of both problems regardless of the degree of multi-modality. [9] used a GA to optimize media for rifamycin B fermentation in bacteria, where the HPLC titer at the end of 9 days was used to select high-performing combinations of nine metabolites for the next batch of experiments. They allowed a 1% chance of mutation for each experiment and component to allow for global search; a sketch of such a generation step follows.
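Below is a minimal sketch of one GA generation in this spirit, assuming truncation selection, uniform crossover, and a 1% per-component mutation rate; the fitness function is a synthetic stand-in for a wet-lab growth measurement.

```python
# A minimal sketch of a GA generation step: truncation selection plus a 1%
# per-component mutation. The fitness function is a synthetic stand-in for a
# wet-lab growth measurement such as an end-of-batch titer.
import numpy as np

rng = np.random.default_rng(0)

def fitness(pop):                    # synthetic multi-modal test response
    return -np.sum((pop - 0.3) ** 2, axis=1) + 0.5 * np.sin(8 * pop).sum(axis=1)

def next_generation(pop, mutation_rate=0.01):
    scores = fitness(pop)
    parents = pop[np.argsort(scores)[-len(pop) // 2:]]   # keep the fitter half
    # uniform crossover: each child mixes components from two random parents
    idx = rng.integers(0, len(parents), size=(len(pop), 2))
    mask = rng.random((len(pop), pop.shape[1])) < 0.5
    children = np.where(mask, parents[idx[:, 0]], parents[idx[:, 1]])
    # mutation: each component has a small chance of being resampled uniformly
    mutate = rng.random(children.shape) < mutation_rate
    children[mutate] = rng.random(mutate.sum())
    return children

pop = rng.random((20, 9))            # 20 media, 9 components, scaled to [0, 1]
for _ in range(50):
    pop = next_generation(pop)
print(pop[fitness(pop).argmax()])    # best medium found
```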
The authors of [9] also discovered that the response space was multi-modal and had interactions between components, which confirmed the need for global optimization in fermentation and bio-processing problems. Further work discusses 17 cases in which GAs have improved media for different organisms in chemical fermentation, often by >50% in yield, for problems of >10 media components. Particle swarm optimization (PSO) is a population-based method that optimizes systems sequentially by varying x according to a velocity vector v. At the t-th iteration of the algorithm, a particle x has the velocity update rule $v_{t+1} = w v_t + c_1 r_1 (p_t - x_t) + c_2 r_2 (g_t - x_t)$ for random numbers $r_1, r_2$, coefficients $w, c_1, c_2$, the particle's best position so far $p_t$, and the swarm's best position so far $g_t$. $c_1$ and $c_2$ parameterize the exploration-exploitation trade-off, similar to the mutation rate in the GA, while $w$ represents the fraction of velocity saved for the next iteration $t+1$. To implement this, one merely computes $x_{t+1} = x_t + v_{t+1}$ for a large population of particles over time as the population gradually gravitates toward the optimal designs. The Nelder-Mead simplex method, wherein a group of points is moved closer to better values via expansion and contraction steps, is also a popular DFO. Nelder-Mead is a local optimizer and may be hybridized with other global DFO methods to improve convergence. While DFOs do not require gradient calculations and can usually optimize complex multi-modal problems, they require hundreds, if not thousands, of experiments, so they are limited to fast-growing culture systems or computer experiments where evaluations are nearly costless.

The most powerful experimental optimization technique is arguably the model-based sequential DOE, in which a response-surface model (RSM) of the relationship between the input x and output y data is trained, and new samples are constructed based on the predictions of the trained model. The newly collected data are then fed back into the model and used to generate another sequence of samples. Prior work discusses using combinations of screening DOEs and polynomial RSMs to optimize fermentation conditions for metabolites such as chitinase, γ-glutamic acid, polysaccharides, chlortetracycline, and tetracycline, among some 20 other metabolites from various organisms. This demonstrates the usefulness of RSMs for fermentation and culture optimization.
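The following is a minimal sketch of such a model-based sequential loop, assuming a cheap synthetic response in place of a real fermentation experiment and using scipy's RBFInterpolator as the response-surface model.

```python
# A minimal sketch of a model-based sequential DOE loop: fit an RSM, propose
# the design the model predicts is best, run it, and refit on the augmented
# data set. run_experiment is a noisy stand-in for a wet-lab measurement.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(1)

def run_experiment(x):
    return float(-np.sum((x - 0.6) ** 2) + 0.01 * rng.normal())

X = rng.random((8, 3))               # 8 initial space-filling designs, 3 factors
y = np.array([run_experiment(x) for x in X])

for _ in range(20):                  # 20 sequential batches of size 1
    model = RBFInterpolator(X, y, kernel="thin_plate_spline")
    candidates = rng.random((2000, 3))
    x_next = candidates[model(candidates).argmax()]   # exploit model prediction
    X = np.vstack([X, x_next])
    y = np.append(y, run_experiment(x_next))

print(X[y.argmax()], y.max())        # best design and response found
```

A real campaign would add an exploration mechanism, such as the GA search of Figure 1.8 or an explicit model-uncertainty term, rather than purely exploiting the model's predictions as this sketch does.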
The primary limitation of polynomial RSMs is their inability to accurately model many factors at a time or systems with significant nonlinearity. Due to their generalizability in modeling different response surfaces, neural networks have been used to optimize bioreactor cultures and multi-objective protein storage conditions. Radial basis functions have been used to optimize yeast and C2C12 mammalian muscle cell culture growth media. Decision trees and neighborhood analysis have been used to optimize media for antibiotic and bacterial fermentation. An example of an RSM can be seen in Figure 1.8, where a radial basis function maps the input / output relationship of a nonlinear system and a GA then finds new optimal experiments. Over time the predicted contour comes to resemble the true function. While these RSMs tend to be more generalizable than polynomial and linear models, the low-data experimental campaigns common in fermentation and cell culture often obscure the differences between modeling techniques. Additionally, many of these RSM approaches do not take into account prior information about the system to speed up optimization. Due to the noisiness of fermentation data, it may also be useful to consider noise explicitly in our process models; Gaussian process (GP) models do so naturally while also encoding prior information through their kernel. Known or unknown constraints can be incorporated into GPs as well. For example, a known constraint might be that growth must exceed some minimum value; an unknown constraint might be the existence of excessive foaming in bioreactors, which may be learned from data but is generally not known ahead of time. Multiple objectives, some of which may compete against one another, can be modeled and optimized using GPs, and correlations between tasks may be considered. By correlating measurements, fewer total experiments are often needed. Multi-objective versions of acquisition functions $\alpha$, such as max-value entropy search and hypervolume improvement, exist to turn these GP predictions into a score across a variety of objectives. Fermentation and cell culture systems are often subject to growth-versus-cost trade-offs, so multi-objective Bayesian methods are useful here. Because most bio-processing experiments can be done using multiple bioreactors or cell culture plates at once, designing multiple optimal experiments at a time is often necessary; prior work shows how, using Monte Carlo samples of the GP model, arbitrary numbers of experiments can be designed simultaneously. Knowledge that systems may exhibit separate but interacting local and global responses may militate for additive GPs. Experimenters with access to separate computer simulations or algebraic process models may pose their GPs as composites of deterministic or other modeled functions and speed up optimization. Bayesian models may even fuse historical data sets together to estimate optimal model parameters with constrained uncertainty, and could perhaps be used for optimization as well. More closely related to cell culture media optimization, GPs have been used in a Bayesian optimization scheme to optimize C2C12 growth media for proliferation maximization and cost minimization in chapter 5 of this dissertation.
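To make this concrete, below is a minimal sketch of a single-task GP Bayesian optimization loop with expected improvement, assuming a synthetic noisy response; this is a generic illustration, not the multi-information-source model developed later in this dissertation.

```python
# A minimal sketch of GP-based Bayesian optimization with expected improvement.
# run_experiment is a noisy synthetic stand-in for a growth assay.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(2)

def run_experiment(x):
    return float(np.exp(-8 * np.sum((x - 0.4) ** 2)) + 0.02 * rng.normal())

X = rng.random((6, 2))               # 6 initial designs, 2 factors
y = np.array([run_experiment(x) for x in X])

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-3,
                                  normalize_y=True)   # alpha absorbs assay noise
    gp.fit(X, y)
    cand = rng.random((3000, 2))
    mu, sigma = gp.predict(cand, return_std=True)
    # expected improvement over the best observation so far
    imp = mu - y.max()
    z = imp / np.maximum(sigma, 1e-9)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cand[ei.argmax()]
    X = np.vstack([X, x_next])
    y = np.append(y, run_experiment(x_next))

print(X[y.argmax()], y.max())
```

Unlike the pure-exploitation RSM loop above, the expected improvement term explicitly balances the model's predicted mean against its uncertainty, which is what allows GP-based schemes to remain sample-efficient in noisy, low-data campaigns.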
This dissertation is divided into roughly two equal parts. The first part comprises the development of a radial basis function / genetic algorithm sequential DOE scheme. It drew heavily on prior work in which a sequential DOE technique was developed on the principle of local random search in areas of high-performing media. That algorithm was also dynamic, converging on high-performing results and selectively searching the design space when good results were not forthcoming. Additionally, previous work in our lab provided the framework for a sequential DOE based on a truncated GA. This modified GA incorporates uncertainty in the optimal samples found by halting algorithm convergence in proportion to the amount of clustering around an optimum that the GA finds. By hybridizing these two methods, a DOE algorithm called NNGA-DYCORS was developed that solved various computational optimization problems better than either method alone. It was used to optimize a 30-dimensional medium for serum-containing C2C12 cell culture, with AlamarBlue reduction after 48 hours of growth in 96-well plates as the growth metric. Cells were seeded at the same time and concentration, and from the same frozen inoculum, so that all experiments were roughly the same. While the method succeeded in finding media that maximized this metric, the optimal medium did not grow as many cells over additional passages. To fix this underlying problem, multiple passages needed to be incorporated into the DOE process. This is very time-consuming: each passage takes multiple days, requires many more physical manipulations than simple chemical assays, which introduces opportunities for contamination, and is difficult to carry out manually. To solve this, chemical assays were supplemented with small amounts of manual multi-passage cell counts in a multi-information-source Bayesian GP model, which was used to successfully optimize a 14-dimensional serum-containing medium for C2C12 cells. Due to the presence of multi-passage data, the final optimal medium grew cells robustly over four passages, provided nearly twice the number of cells at the end of each passage relative to the DMEM + 10% FBS control and the traditional DOE method, and did so at nearly the same cost in terms of media components. In the final chapter, the multi-information-source GP model was extended to optimize a 26-dimensional serum-free medium based on the Essential 8 medium, using a multi-objective metric that improves cell growth while minimizing medium cost. Using this Bayesian metric, a broad set of media samples along the trade-off curve of media quality and cost was found, showing that a designer can be given options in media optimization. In particular, one medium produced higher growth over five passages while the control and Essential 8 lagged.

We identify two important future considerations for this work. First, the data collection process, which is the major innovation of this dissertation, needs to be made more robust by capturing the long-term growth dynamics of the cells. Fluorescent and bright-field imaging, used to quantify the temporal and spatial changes of the cells, may improve over whole-well AlamarBlue and LIVE/DEAD stains by counting individual cells and collecting more fine-grained growth curves.
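A minimal sketch of such image-based counting, assuming a hypothetical nuclei-stain image file and using standard scikit-image routines, follows.

```python
# A minimal sketch of fluorescence-based cell counting: threshold a nuclei
# stain, label connected regions, and count and size them. The image path is
# hypothetical; repeating this per time point would yield a growth curve.
import numpy as np
from skimage import io, filters, measure, morphology

img = io.imread("nuclei_stain.tif", as_gray=True)   # hypothetical image file

# Otsu thresholding separates stained nuclei from background
mask = img > filters.threshold_otsu(img)
mask = morphology.remove_small_objects(mask, min_size=30)  # drop debris

labels = measure.label(mask)                        # connected-component labels
props = measure.regionprops(labels)

print(f"{labels.max()} cells detected")
areas = np.array([p.area for p in props])           # per-cell morphology feature
print(f"median nucleus area: {np.median(areas):.1f} px")
```

Per-cell features such as area or eccentricity from regionprops could also feed the morphology-based classification discussed earlier, given sufficient training data.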