Different execution strategies also make it difficult to compare validation results between papers. Some problems with the RWR method also remain, particularly its reliance on the known gene distribution, which is unlikely to reflect the true distribution of genes associated with the trait. Even with the bidirectional extension of QTL, there is still a tendency to overrepresent the centers of chromosomes. Gene distribution is central to RWR, so addressing this failing should produce a method that is much more effective at identifying QTL of interest. That, in turn, should improve the rate at which breeding and fine mapping can be accomplished. The probability P of selecting a particular marker as the origin of a QTL is calculated by counting the locations in the genome at which the QTL can be placed. P is determined by the number of markers that can serve as the origin O of a QTL of length L. This count excludes every marker that would cause the QTL to extend beyond the end of the chromosome. This origin-based model is an approximation of the true process of QTL placement. In that process, QTL are roughly centered on a marker and end at a marker on either side, and those end markers represent the bounds of the 95% confidence interval for the QTL. Where the original QTL did not end at markers, a model in which the QTL is centered on the chosen marker is preferable. Models requiring that a QTL both start and end on a marker are not feasible because the distance between markers is not uniform. Under this constraint, a QTL of a given length might have only one possible genomic location. Markers are used in a direction-independent manner to avoid underrepresenting the ends of the chromosomes.
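The following is a minimal sketch of the origin-counting step. It assumes integer base-pair marker positions on a chromosome spanning 1 to C, and it assumes rightward extension for the unidirectional case. It contrasts the origin-based model with the centered model described above. The function names are hypothetical and are not part of SPQValidate.

```python
from typing import List


def usable_origins_unidirectional(markers: List[int], chrom_len: int, qtl_len: int) -> List[int]:
    """Markers that can serve as origin O when the QTL extends rightward by qtl_len.

    Coordinates are assumed to run from 1 to chrom_len (inclusive); a marker is
    excluded if the QTL would extend past the end of the chromosome.
    """
    return [m for m in markers if m + qtl_len <= chrom_len]


def usable_origins_centered(markers: List[int], chrom_len: int, qtl_len: int) -> List[int]:
    """Markers on which a QTL of length qtl_len can be centered without
    running off either end of the chromosome."""
    half = qtl_len / 2
    return [m for m in markers if m - half >= 1 and m + half <= chrom_len]


if __name__ == "__main__":
    markers = [1, 10, 40, 70, 95]
    print(usable_origins_unidirectional(markers, chrom_len=100, qtl_len=30))  # [1, 10, 40, 70]
    print(usable_origins_centered(markers, chrom_len=100, qtl_len=30))        # [40, 70]
```

The centered model rejects markers near both chromosome ends. The unidirectional model rejects only those near the end toward which the QTL extends. This asymmetry is why markers are used in a direction-independent manner.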
Under the null hypothesis, the probability of using any marker is {0, 1, 2}/M, a value between 0 and 1. The numerator is 0, 1, or 2 depending on whether the marker can serve as O for QTL extension in neither direction, in one direction only, or in both directions. M is the total number of markers from which the QTL can extend to the left plus the total number of markers from which the QTL can extend to the right. The denominator could be adjusted to reflect the fact that a single QTL can only be mapped to one chromosome. This is unnecessary, however, because a QTL is not equally likely to be mapped to every chromosome. To see why, consider the most obvious way to account for differences in chromosome length and in the number of usable markers: a weighting scheme. Under such a scheme, the number of usable markers on each chromosome is divided by the total number of usable markers in the genome. Suppose this weight is used to adjust the probability P of using any marker on that chromosome. P is already proportional to 1/M_C, where M_C is the number of usable markers on that chromosome. The chromosome-specific marker counts therefore cancel out, leaving only the whole-genome marker count. To best represent the topology of the genome, the SPQV simulates genetic loci by selecting genes at random from the whole-genome gene distribution to represent the genetic basis of the trait of interest. Using the whole-genome gene distribution as the source accounts for the topology of the genome, including the decrease in gene density at the centromere and telomeres. This strategy assumes that the true distribution of genes associated with a particular trait is approximately the same as the distribution of genes as a whole. The simplest instance of RWR makes a different assumption: that the distribution of known, previously associated genes reflects the true distribution of genes associated with that trait.
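The sketch below illustrates the null probability {0, 1, 2}/M and the cancellation argument. It assumes a hypothetical genome layout, with each chromosome mapped to its length and marker positions, and a fixed QTL length. Left extension is assumed valid when O - L >= 1, and right extension when O + L <= C. The check multiplies the chromosome weight M_C/M by the within-chromosome probability k/M_C. The result is identical to k/M, which is the point made above.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Hypothetical genome layout: chromosome name -> (length in bp, marker positions).
Genome = Dict[str, Tuple[int, List[int]]]


def direction_count(marker: int, chrom_len: int, qtl_len: int) -> int:
    """Numerator in {0, 1, 2}: number of directions in which a QTL of length
    qtl_len can extend from this marker without leaving the chromosome."""
    can_extend_left = marker - qtl_len >= 1
    can_extend_right = marker + qtl_len <= chrom_len
    return int(can_extend_left) + int(can_extend_right)


def _direction_counts(genome: Genome, qtl_len: int) -> Dict[Tuple[str, int], int]:
    return {
        (chrom, m): direction_count(m, chrom_len, qtl_len)
        for chrom, (chrom_len, markers) in genome.items()
        for m in markers
    }


def null_probabilities(genome: Genome, qtl_len: int) -> Dict[Tuple[str, int], Fraction]:
    """P(marker) = {0, 1, 2} / M, with M summed over the whole genome."""
    counts = _direction_counts(genome, qtl_len)
    total = sum(counts.values())  # M: left-usable markers + right-usable markers
    return {key: Fraction(k, total) for key, k in counts.items()}


def chromosome_weighted_probabilities(genome: Genome, qtl_len: int) -> Dict[Tuple[str, int], Fraction]:
    """Chromosome weight M_C / M multiplied by the within-chromosome probability k / M_C."""
    counts = _direction_counts(genome, qtl_len)
    total = sum(counts.values())
    per_chrom = {chrom: 0 for chrom in genome}
    for (chrom, _), k in counts.items():
        per_chrom[chrom] += k
    result = {}
    for (chrom, m), k in counts.items():
        m_c = per_chrom[chrom]
        # A chromosome with no usable origins contributes probability 0.
        result[(chrom, m)] = Fraction(m_c, total) * Fraction(k, m_c) if m_c else Fraction(0)
    return result


if __name__ == "__main__":
    genome = {"chr1": (100, [1, 10, 40, 70, 95]), "chr2": (50, [5, 25, 45])}
    p = null_probabilities(genome, qtl_len=30)
    print(p[("chr1", 40)])                                          # 2/9
    print(chromosome_weighted_probabilities(genome, qtl_len=30) == p)  # True
```

Exact fractions keep the arithmetic exact. The equality between the weighted and unweighted probabilities therefore reflects the algebraic cancellation, not a floating-point coincidence.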
We argue that this novel assumption is more likely to be accurate because trait-related genes can easily be discovered in a spatially biased manner: tandem arrays promote clustered discovery, some transposons involved in transposon-mediated mutagenesis preferentially target certain sequences, and genes in regions close to the centromere tend to be difficult to identify through methods that rely on recombination.
Additionally, the genome is interconnected; many traits rely on interactions between multiple, seemingly disparate biological processes. A random distribution could also be used to simulate genes with the SPQV, but it fails to capture the genomic topography. Genes within functional groups are not randomly arranged. Duplication events and gene clusters in the original set of known genes are therefore taken into account by treating genes without a marker between them as one genetic unit. The SPQV values are clearly most similar to those produced by the RWR experiment closest to biological reality: marker-only QTL origins, bidirectional mapping, and no bounce back. This is expected, as the SPQV method is designed as a smoothed version of an experiment with these characteristics. Restricting QTL origins to the markers used in mapping increases the EGN for RWR, and this effect occurs for all QTL lengths. This increase is likely attributable to the physical structure of chromosomes. The markers selected for QTL mapping have a distribution similar to the genome-wide distribution of genes, and are therefore relatively sparse in gene-poor regions such as the centromere. Similarly, the use of bounce back increases the number of genes identified by RWR for all QTL lengths, though this increase is particularly noticeable for some of the larger QTL. The relatively large number of genes situated close to the ends of chromosomes is likely the main contributor to the impact of bounce back on identified gene number. It is possible that this reduction is due to a smoothing of the distribution, as the occurrence of bounce back is effectively split in half over the two separate tails of the chromosome. The presence of long and short arms on chromosomes, and the corresponding lopsidedness of the gene distribution, might also contribute to this phenomenon.
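The rule of treating known genes without a marker between them as one genetic unit can be sketched for a single chromosome as follows. The sketch assumes that each gene is represented by one position, such as its start or midpoint, which the text does not specify. The function name is hypothetical. Genes in the same inter-marker interval receive the same `bisect_right` index. Sorting the genes makes those genes contiguous, so `itertools.groupby` collects each interval's genes into one unit.

```python
from bisect import bisect_right
from itertools import groupby
from typing import List


def collapse_to_genetic_units(gene_positions: List[int], markers: List[int]) -> List[List[int]]:
    """Group known genes on one chromosome so that genes with no marker between
    them form a single genetic unit.

    A gene lying exactly on a marker position is assigned to the interval to
    the right of that marker (a consequence of using bisect_right).
    """
    sorted_markers = sorted(markers)

    def interval_of(pos: int) -> int:
        # Index of the inter-marker interval that contains this gene position.
        return bisect_right(sorted_markers, pos)

    return [list(unit) for _, unit in groupby(sorted(gene_positions), key=interval_of)]


if __name__ == "__main__":
    units = collapse_to_genetic_units([12, 15, 30, 60, 95, 97], markers=[10, 50, 90])
    print(units)  # [[12, 15, 30], [60], [95, 97]]
```

In this toy example, six known genes collapse into three units. A tandem cluster therefore counts once rather than inflating the apparent gene density of its region.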
The use of bidirectional mapping appears to result in fewer genes identified by RWR, though this effect is relatively minor compared with the effects of origin restriction and bounce back. Dark colors are prominent on the left side of the heat map. Despite this, the confidence intervals identified for small and medium QTL by the SPQV and by RWR are fairly similar regardless of RWR method. The CIs in this range were consistently far below 1 regardless of method. Overall, the SPQV 95% confidence limit for small QTL tends to be slightly smaller than the limit produced by the RWR method that takes the same biological realities into account. This makes little difference in practice, however, because both are less than 1. Observed gene counts are integers, so EGNs from either method are effectively rounded up. For example, if an SPQV confidence limit is 1.2, the QTL of interest must contain 2 or more genes to be considered significant. For larger QTL, SPQV values tend to exceed those produced by RWR. These large QTL approach the size of a full chromosome, and can be larger than several chromosomes in the S. italica genome. In the authors’ opinion, this is not an overestimation of the true distribution of genes associated with the trait of interest, because the true distribution likely contains more than the known number of genes. Consequently, significance is unlikely for very large QTL, except where the true distribution of genes is extremely uneven at the chromosomal level.
A reduction in tiller number is a classical domestication trait in maize. Modern maize lines have been bred to grow as single-stalked plants to facilitate high-density planting, while the maize progenitor, teosinte, is highly tillered.
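The rounding argument from the discussion of small-QTL confidence limits above can be expressed as a threshold on integer gene counts. This is a minimal sketch with hypothetical function names. It assumes that a count equal to the rounded-up limit is treated as significant. For non-integer limits, this is the same as requiring the observed count to exceed the limit.

```python
import math


def min_significant_gene_count(confidence_limit: float) -> int:
    """Smallest integer gene count at or above the confidence limit.

    Observed gene counts are integers, so a fractional limit is effectively
    rounded up: a limit of 1.2 requires at least 2 genes, and any limit greater
    than 0 and at most 1 requires at least 1 gene.
    """
    return math.ceil(confidence_limit)


def is_significant(observed_genes: int, confidence_limit: float) -> bool:
    # Assumption: a count equal to the rounded-up limit counts as significant.
    return observed_genes >= min_significant_gene_count(confidence_limit)


if __name__ == "__main__":
    print(min_significant_gene_count(1.2), is_significant(1, 1.2), is_significant(2, 1.2))
```

Because of this rounding, two limits such as 0.3 and 0.8 lead to the same practical decision. This is why small differences between SPQV and RWR limits below 1 matter little.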
The genetic network associated with tiller suppression is controlled by the teosinte branched1 (tb1) gene, which also controls several other aspects of maize morphology, including inflorescence and floral architecture. Several mapping populations derived from crosses between the W22 maize inbred line and teosinte were recently described and used to map several domestication traits. As expected, several domestication traits associate tightly with the tb1 pathway. Here, we use the QTL reported by Chen et al. 2019 to illustrate the utility of the SPQV. Only QTL with the same direction of effect in both maize/teosinte mapping populations were assessed. Seven genes closely associated with the tb1 pathway in maize were located in the Zm-W22 NRGene 2.0 assembly and analyzed using the SPQV. Notably, these genes were selected for their strong, known associations with the branching pathway in maize. Only high-confidence genes can be used accurately with our method. Including any gene not truly associated with the trait of interest would therefore make the SPQV more stringent than necessary.
The QTL identified for the traits BARE, EB, GLUM, KRN, STAM and TILN have previously been connected with the tb1 pathway in maize. These QTL were therefore assessed in relation to the seven genes in Table 1. The markers found in the W22 x TIL01 RIL sub-population were used to determine the base-pair coordinates of the CIs of these QTL. Where the end points of a QTL did not exactly match a marker, the next closest marker was used to mimic ‘extension’-style mapping. The results of this analysis are reported in Table 2-2. Four of the assayed QTL identified a gene from the tb1 pathway, corresponding to four of the six represented traits. Applying the adjustments described in this paper to RWR-based assessments of QTL mapping experiments will yield a more appropriate confidence limit for the expected number of genes. These adjustments do not, however, account for all of the issues associated with applying RWR to this type of question. RWR still relies on the distribution of known genes. It also produces gene-count distributions that nearly always fail to meet the requirement of smoothness. In other words, these distributions tend to change abruptly, and are frequently binary for small and very large QTL. The ‘unsmoothness’ of a distribution is exaggerated by small QTL size and by short lists of known genes. A known-gene list with fewer than one gene per chromosome, for example, would produce a binary distribution even for large QTL. Additionally, even with the adjustment for bidirectional QTL extension, RWR still shows a reduced likelihood of the QTL falling in the regions [1, 1+L] and [C–L, C], where C is the chromosome length.
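The marker-snapping step used for the Chen et al. 2019 QTL can be sketched as follows. The text does not say in which direction "the next closest marker" is chosen. This sketch assumes outward snapping, which widens the interval in the way ‘extension’-style mapping would. The left end moves to the last marker at or before it, and the right end moves to the first marker at or after it. Function names and positions are hypothetical and illustrative only. They are not data from the W22 x TIL01 population or Table 1.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def snap_outward(start: int, end: int, markers: List[int]) -> Tuple[int, int]:
    """Move each QTL end point outward to the nearest marker.

    The left end goes to the last marker at or before `start`; the right end
    goes to the first marker at or after `end`. If no such marker exists, the
    outermost marker on that side is used instead. `markers` must be non-empty.
    """
    ms = sorted(markers)
    i = bisect_right(ms, start) - 1  # index of last marker <= start (or -1)
    j = bisect_left(ms, end)         # index of first marker >= end (or len(ms))
    return ms[max(i, 0)], ms[min(j, len(ms) - 1)]


def genes_in_qtl(gene_positions: List[int], left: int, right: int) -> List[int]:
    """Known genes whose representative position falls inside [left, right]."""
    return [g for g in gene_positions if left <= g <= right]


if __name__ == "__main__":
    markers = [100, 200, 300, 400]
    left, right = snap_outward(150, 320, markers)
    print(left, right)                                     # 100 400
    print(genes_in_qtl([50, 250, 390, 450], left, right))  # [250, 390]
```

End points that already coincide with markers are left unchanged. Only inexact end points are widened.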
Finally, a practical weakness of RWR with these adjustments is that the procedure requires a great deal of thought, effort, and expertise. There are also many points in the procedure at which simple errors can produce dramatic changes in the resulting confidence limits. Given the flaws of naive RWR and the complexity of the suggested adjustments, we recommend using the SPQV to assess the quality of QTL mapping experiments. The function provided, SPQValidate, requires only a few lists of data as input, and the function itself performs the analytic work that might be a stumbling block in RWR. The SPQV’s probabilistic nature overcomes many of the other problems inherent to RWR. This tool may, however, be overly conservative for short QTL. The SPQV is extremely unlikely to produce a confidence limit of 0, because even for the shortest identified QTL, any locus is likely to be within range of at least one marker. Consequently, the minimum confidence limit is, in practice, 1, which might be misleading for small QTL. Additionally, the total number of genes in a QTL is not necessarily an authoritative measure of its validity. A QTL located on a single high-impact gene, for instance, might be considered non-significant if the SPQV is the only method of validation used.