The opinion of the research community on rarefying microbiome data seems rather divided.

Cultures cluster close to one another, while liquid samples show large inter-sample variation both before and after rarefaction. Although rarefaction changed both the total and the coordinate-specific amounts of variation, it did not do so to a remarkable extent: total variation went from 97.2% to 97.5% and from 89.4% to 88.5% for the spiked and unspiked data, respectively, and the change for each individual coordinate axis was below 5% in all cases. On the other hand, removing the spike-in OTU from the read counts substantially changed both the clustering and the x/y positions of the liquids and cultures. Even a cursory comparison of the left column of Figure 15 with the right column shows that the E. coli OTU affected the apparent similarities among the samples in the PCoA. A few finer points should be made about removing this OTU. First, removing it pulled the liquids and cultures together: whereas the first coordinate spanned from -0.5 to 0.6 before removal, it spans from about -0.3 to 0.6 afterwards, indicating a tighter grouping of all samples. Second, as might be expected, removing this OTU allowed the compositional variation among the liquid samples to surface, whereas before removal the liquids clustered according to whether they had received the spike-in. The liquid samples were clearly more compositionally diverse than the cultures, as evidenced by the wider horizontal spread of the triangles in Figures 15b and 15d. Third, removing the E. coli reads did not greatly affect the grouping of the cultures: culture samples still fall within 0.2 units of one another on both coordinate axes, with the exception of one culture spiked with 100 µL of E. coli. Fourth, after removal of the E. coli OTU, neither the liquids nor the cultures cluster in visibly discernible patterns by spike volume, whereas grouping by spike volume was apparent before removal.
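For readers less familiar with PCoA, the following sketch shows how the "percentage of variation accounted for" by each coordinate is typically obtained, and how dropping a spike-in OTU column changes the input to the ordination. It is a minimal sketch under stated assumptions: it uses Bray-Curtis dissimilarities (the distance metric behind Figure 15 is not stated in this section), it expresses each axis's share relative to the sum of the positive eigenvalues (one common convention), and the count table, with column 0 standing in for the spike-in OTU, is entirely hypothetical.

```python
import numpy as np


def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarities between rows (samples) of a count table."""
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            total = (counts[i] + counts[j]).sum()
            d[i, j] = d[j, i] = np.abs(counts[i] - counts[j]).sum() / total if total > 0 else 0.0
    return d


def pcoa(dist):
    """Classical PCoA (metric MDS): returns sample coordinates and the
    fraction of variation carried by each retained axis."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Gower double-centering of the squared distance matrix.
    b = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-12                    # drop null and negative (non-Euclidean) axes
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep] / eigvals[keep].sum()
    return coords, explained


# Hypothetical toy table: rows are samples, columns are OTUs;
# column 0 stands in for the spike-in OTU.
counts = np.array([
    [900, 50, 30, 20],
    [100, 400, 300, 200],
    [850, 60, 40, 50],
    [50, 420, 280, 250],
])
for label, table in [("with spike-in", counts),
                     ("spike-in removed", np.delete(counts, 0, axis=1))]:
    coords, explained = pcoa(bray_curtis(table))
    print(label, "- first two axes:", round(100 * explained[:2].sum(), 1), "% of variation")
```

Because the spike-in column dominates some rows, deleting it before computing distances can reshuffle which samples look similar, which is the effect described for the liquids above.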

Overall, these clusters and their disappearance fall well within expectation, considering that the spike-ins were much higher in biomass than the liquids but much lower than the cultures. The difference in biomass between the liquids and cultures inevitably made the liquids more susceptible to spike-in-driven similarities and dissimilarities. For both liquids and cultures in the preliminary experiments, no distinct groupings based on inherent compositional dissimilarities can be observed, as expected. It is also unclear whether the variation among the liquid samples stems largely from the low read counts remaining after E. coli removal, as even a few reads in a low-read-count sample can produce seemingly large inter-sample differences. In any case, the total percentage of variation accounted for by PCoA here falls between 88.5% and 97.5%, indicating that two axes were sufficient for this set of samples. Interestingly, removing the E. coli OTU decreased the total percentage of variation accounted for by roughly 8 to 9 percentage points, once again underscoring the sway that the spike-in had on the liquid samples. From the compositional analyses, we see that the cultures in the preliminary experiments yielded sufficient biomass and contained dental plaque bacteria, without exhibiting unexpected similarities or dissimilarities among themselves or with the liquids above the sedimented cultures. As to what the principal coordinates represent, i.e., what underlying biological differences may have made two coordinates sufficient, we would need to adopt a different analytical approach, which we do in the next stage of the project.

In these preliminary experiments, we established a culturing procedure that minimizes external contamination while producing high numbers of viable cells from the human oral/dental bacterial community.
Compositional analysis of the cultures showed that the OTUs with the highest relative abundances belong to the Neisseria, Streptococcus, and Veillonella genera, two of which have been shown to be early and middle colonizers of the oral microbiome and all three of which have been shown to be core genera in the supragingival plaque community.

The prevalence of OTUs from commonly occurring oral bacterial genera confirms that the culturing conditions support the growth and proliferation of anaerobic oral microbes without resorting to traditional, closed-form anaerobic culturing techniques such as anaerobic agar. Few members of the group of later colonizers were present in the cultures in the preliminary experiments, though an Eikenella OTU was cultivated in the in vitro oral community to the point of reaching a visible relative abundance. Previous research has shown that members of this genus belong to the group of later colonizers, which also includes Actinomyces spp., Capnocytophaga ochracea, Propionibacterium acnes, and Haemophilus parainfluenzae. In this case, the absence or low abundance of later colonizers is not surprising, given that the cultures were incubated for less than 24 hours and were not replenished with fresh host plaque. The short incubation time and lack of re-inoculation illustrate a widely known limitation: in vitro conditions frequently select for organisms that can survive without the rich and complex environment of the original host. For bacteria that come from humans, this limitation weighs even more heavily because it is infeasible to replicate the human oral cavity; the complexity of host-microbe interactions simply defies reproduction in the lab. Another aspect to consider regarding the lack of later colonizers in these cultures is that the membership of the oral bacterial community can vary greatly across hosts. Kolenbrander and coworkers presented a larger picture of the organisms that generally colonize earlier or later, with results that indicated trends, in other words, approximate orders of succession of oral/dental bacteria rather than definite lines of succession, and their work is far from the only instance in which human microbiome compositions have shown such great inter-host variation.
The oral microbiome is no exception to such variation, but there exists a core community of major genera, and our methods have captured members of these major genera.

However, the factors mentioned above as possibly having reduced organismal diversity in the in vitro cultures can be mitigated in future experiments by periodic re-inoculation of the cultures, longer incubation times, and/or variable nutrient sources and concentrations. These changes may help meet the nutrient and signaling requirements of more fastidious bacteria, as well as raise the density of cells from certain OTUs beyond their threshold values, such that proliferation becomes possible. Several culturing conditions seemed appropriate for a proof of concept, which this set of experiments was intended to be, including the surface hydrophobicity of the scaffold, sampling with consideration of the growth phase of the bacteria, and bacterial attachment. The results seemed to indicate a promising protocol for establishing an in vitro plaque community.

An aspect that deserves special, detailed consideration is the formulation of the culturing medium, SHI, based on the work of Tian and coworkers. In the preliminary experiments, this medium provided adequate nourishment, particularly in terms of pH and ionic strength. A potential disadvantage of SHI is that it is considered an undefined culture medium, because its major carbon source is porcine stomach mucin. This glycoprotein is supplied in a partially purified form, and because the glycosyl modifications on glycoproteins can vary greatly depending on the conditions in the source organism, the composition of this protein cannot be guaranteed to be biochemically identical across batches. Interestingly, the undefined nature of this medium has not yet been reported to be a major obstacle. On the contrary, there is some evidence that this medium may outperform more defined media. A study that compared the effects of two media, DMM vs. BMM, on dental plaque microcosms grown in an artificial mouth system showed that plaque growth was slower in the chemically defined DMM, which contained higher concentrations of choline, citrate, uric acid, haemin, pyridoxine, biotin, and cyanocobalamin but lower concentrations of inositol, menadione, niacin, pantothenic acid, thiamine, and riboflavin. Furthermore, enzymatic activity in DMM was lower or, in some cases, undetectable. The results of our preliminary experiments indirectly affirm those of the comparative media experiment: we saw fast growth with the SHI medium, which contains major components from BMM as well as supplements such as menadione. However, we did not test the enzymatic activity of the cells in the cultures to ensure that it is at least detectable, and we may need to do so. Another potential improvement would be to make the medium more defined, for the sake of repeatability in our lab and reproducibility in the community. There is some evidence that an artificial saliva may substitute for human saliva in the growth of streptococcal species, and the composition of this artificial saliva may be a good starting point for a defined medium that would also be nutritionally sufficient.

With regard to the attempt at establishing an internal standard with a known E. coli strain, we found that it was not feasible to seek correlations between read counts and OD600 values or CFU/mL under the conditions of the preliminary experiments. Finding such correlations mathematically would require quantifying and optimizing additional steps in the sequencing process.
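The relationship that a spike-in internal standard ultimately aims at can be written as a simple proportionality: if the spike-in contributes a known number of cells and a measured number of reads, its reads per cell can be used to convert every other OTU's reads into an estimated cell number. The sketch below is a minimal illustration under strong assumptions, not the procedure used in the preliminary experiments. It assumes that every OTU yields the same number of reads per cell as the spike-in unless a per-OTU efficiency ratio is supplied. The function name, its parameters, and the numbers are hypothetical. The efficiency ratio is exactly where extraction, PCR, and library-preparation biases would enter, which is why those steps would need to be quantified before such an estimate could be trusted.

```python
def estimate_cells(otu_reads, spike_reads, spike_cells, efficiency_ratio=None):
    """Estimate cell numbers per OTU from a spike-in internal standard (hypothetical helper).

    otu_reads: dict mapping OTU name -> read count in the sample
    spike_reads: reads assigned to the spike-in OTU in the same sample
    spike_cells: number of spike-in cells added (e.g. CFU/mL multiplied by the spiked volume)
    efficiency_ratio: optional dict of (OTU reads per cell) / (spike-in reads per cell);
        OTUs missing from it are assumed to behave exactly like the spike-in (ratio 1).
    """
    if spike_reads <= 0 or spike_cells <= 0:
        raise ValueError("spike-in reads and cells must both be positive")
    spike_reads_per_cell = spike_reads / spike_cells
    ratios = efficiency_ratio or {}
    return {
        otu: reads / (spike_reads_per_cell * ratios.get(otu, 1.0))
        for otu, reads in otu_reads.items()
    }


# Hypothetical numbers for illustration only.
estimates = estimate_cells({"Streptococcus_OTU": 2000, "Neisseria_OTU": 500},
                           spike_reads=1000, spike_cells=1e6)
print(estimates)
```

If, for example, extraction favored one taxon twice as much as the spike-in, its ratio would be 2.0 and its estimated cell number would halve; without measuring such ratios, the proportionality is only as good as the equal-efficiency assumption.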

The key steps to optimize here would include setting a concentration of E. coli DNA to be spiked into the samples to be sequenced, rather than using cells as spike-ins; understanding the efficiency of DNA extraction and mitigating the somewhat common bias of extraction processes toward preferentially yielding more DNA from Gram-negative bacteria; quantifying and optimizing the efficiency of PCR for the 16S rRNA gene of the samples, which may involve some minor primer modifications; quantifying the composition of the library to be sequenced, potentially using genus-specific primers; quantifying how the fixed sequencing depths of HTS platforms affect the read counts and apparent compositions of samples, especially when samples do not have the same biomass; and so on. The quantification and optimization of these steps, including a detailed understanding of how systematic errors mathematically affect the results and how the number of discarded low-quality sequences affects the apparent compositions, may then enable us to find empirical relationships between the number of cells in the cultures and the read counts from sequencing. Gaining such a degree of control over the whole process was not feasible at the time but would be a worthwhile venture for a future project. If we can establish a facile and rapid approach to quantification and optimization, we may be able to extend it to developing many other numerical, analytical protocols.

The bio-informatics process used for the preliminary experiments seemed to have served its intended purposes. With this process, we were able to perform quality control on the reads and cluster them into OTUs at a reasonable level of sequence identity. More importantly, this procedure did not produce apparent artifacts that affected the processing or interpretation of the data.
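The idea of clustering reads into OTUs "at a level of sequence identity" can be illustrated with a toy, abundance-sorted greedy centroid clustering. This is a deliberately simplified sketch, not the tool used in this pipeline. It assumes that reads are already quality-filtered, aligned, and of equal length, and it measures identity as the fraction of matching positions. It uses 0.97 only as a commonly quoted default threshold, not necessarily the one applied here. The function names are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions; assumes pre-aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("toy identity assumes equal-length, aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Abundance-sorted greedy centroid clustering; returns {centroid: read count}."""
    abundance = Counter(reads)
    centroids = []
    otu_counts = {}
    # Most abundant unique sequences become centroids first (ties broken alphabetically).
    for seq, n in sorted(abundance.items(), key=lambda kv: (-kv[1], kv[0])):
        for c in centroids:
            if identity(seq, c) >= threshold:
                otu_counts[c] += n   # join the first sufficiently similar centroid
                break
        else:
            centroids.append(seq)    # no centroid is close enough: start a new OTU
            otu_counts[seq] = n
    return otu_counts


# Toy 10-base reads; at a 0.9 threshold one mismatch is tolerated.
reads = ["ACGTACGTAC"] * 5 + ["ACGTACGTAT"] * 2 + ["TTTTGGGGCC"] * 3
print(greedy_otus(reads, threshold=0.9))  # {'ACGTACGTAC': 7, 'TTTTGGGGCC': 3}
```

Sorting by abundance before assigning centroids is the design choice that matters most here: it makes the most frequent, and therefore most likely error-free, sequences the OTU representatives.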
The results obtained from this bio-informatics pipeline met expectations formed from existing research on the human oral microbiome, though one aspect that may merit further consideration is the standardization of sample size by rarefaction. While there is some evidence that rarefaction helps reduce false discovery rates, there is equally reasonable evidence that rarefaction omits valid data and may bias against rare OTUs. For the purposes of these preliminary experiments, we have shown that rarefaction to 20,000 reads does not produce obvious artifacts or detectably reduce features observed in non-rarefied samples. Given that one of the major goals of the bio-informatics analysis in these experiments was to establish a procedure capable of distinguishing between biologically distinct samples without introducing much bias, rarefaction was a defensible part of our approach. As for PCoA, the observation that rarefaction increases the percentage of variation accounted for is an expected result given the nature of rarefaction. Rarefaction is a standardization procedure that simultaneously equalizes sample sizes and reduces inter-sample variation, especially for samples with high numbers of rarer OTUs. To understand this point, we need to consider the two foundational components of diversity: richness and evenness. In terms of richness, adding a single OTU to a sample increases richness by one, which would only change diversity greatly in samples with low numbers of OTUs. As for evenness, the addition of one OTU to a sample may or may not lead to a dramatic change in diversity, depending on two major factors.
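Rarefaction to a fixed depth, and the richness and evenness components of diversity discussed in this paragraph, can be sketched as follows. This is a minimal illustration that assumes rarefaction means random subsampling of reads without replacement, done here with NumPy's multivariate hypergeometric sampler. It uses Shannon entropy and Pielou's evenness as example indices. This section does not state which indices or which software were used; the depth of 20,000 reads is the only parameter taken from the text, and the sample counts are hypothetical.

```python
import numpy as np


def rarefy(counts, depth=20_000, seed=0):
    """Randomly subsample a sample's OTU counts to a fixed depth without replacement."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    rng = np.random.default_rng(seed)
    # Each read is drawn at most once, so no OTU can exceed its original count.
    return rng.multivariate_hypergeometric(counts, depth)


def richness(counts):
    """Number of OTUs observed at least once."""
    return int(np.count_nonzero(counts))


def shannon(counts):
    """Shannon entropy H = -sum(p_i * ln p_i) over observed OTUs."""
    c = np.asarray(counts, dtype=float)
    p = c[c > 0] / c.sum()
    return float(-(p * np.log(p)).sum())


def pielou_evenness(counts):
    """Pielou's J = H / ln(S); 1.0 means all observed OTUs are equally abundant."""
    s = richness(counts)
    return float(shannon(counts) / np.log(s)) if s > 1 else 0.0


# Hypothetical sample with 45,005 reads, rarefied to 20,000 (result depends on the seed).
sample = np.array([30_000, 10_000, 5_000, 5])
print(rarefy(sample))

# Adding one singleton OTU to an even, deeply sequenced sample raises richness by one
# and barely moves Shannon entropy, while Pielou's J drops noticeably because ln(S)
# sits in its denominator.
base = np.array([5_000, 5_000, 5_000, 5_000])
plus_one = np.append(base, 1)
print(richness(base), richness(plus_one))  # 4 5
print(shannon(base), shannon(plus_one))
print(pielou_evenness(base), pielou_evenness(plus_one))
```

The singleton example shows why the effect of one added OTU depends on which aspect of diversity is measured: richness and an evenness ratio respond to it, whereas an abundance-weighted entropy hardly does.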