Six seedlings of each genotype were planted per pot for each replicate. The 76 IL’s were divided into four cohorts of 20 randomly assigned genotypes. These cohorts were placed across four temporal replicates in a Latin square design as described in . The seedlings were harvested 5 d after transplanting. Cotyledons and mature leaves >1 cm in total length were excluded, and the remaining tissues above the midpoint of the hypocotyl were pooled, for all individuals in a pot, into 2-mL microcentrifuge tubes and immediately frozen in liquid nitrogen. Two IL’s, IL7.4 and IL12.4.1, were not included in the final analysis due to seed contamination.

Seeds of the 76 IL’s along with the parents were sterilized using 70% ethanol, followed by 50% bleach, and finally rinsed with sterile water. This experiment was replicated three times each in 2011 and 2012. Ten to 12 seeds of each IL were sown into Phytatray II containers with 0.5× Murashige and Skoog minimal salt agar. Trays of each IL were randomly assigned to either a sun or shade treatment consisting of 110 μmol PAR with a red to far-red ratio of either 1.5 or 0.5 at 22°C with 16-h light/8-h dark cycles for 10 d. Three genotypes were excluded from the analyses due to poor germination or their necrotic dwarf phenotypes. After 10 d, seedlings were removed from the agar, placed onto transparency sheets containing a moistened Kimwipe to prevent dehydration, and scanned using an Epson V700 in 8-bit grayscale at 600 dpi. Image analysis was carried out using the software ImageJ.

For hypocotyl length analysis of backcross inbred lines between S. pennellii and S. lycopersicum cv M82, seeds were sterilized in 50% bleach and then rinsed with sterile water.
The seeds were then placed in Phytatrays in total dark at room temperature for 72 h and then moved to 16 h light/8 h dark for 4 d. Seedlings were transferred to soil using a randomized design and assigned to either a sun or shade treatment for 7 d. Images were taken with an HTC One M8 Dual 4MP camera, and hypocotyl lengths were measured in ImageJ using the Simple Neurite Tracer plugin.

RNA-seq libraries were prepared and the reads were preprocessed as described in Chitwood et al. and are outlined here. mRNA isolation and RNA-seq library preparation were performed from 80 samples at a time using a high-throughput RNA-seq protocol. The prepared libraries were sequenced in pools of 12 for replicates 1 and 2 and in pools of 80 for replicates 3 and 4 at the UC Davis Genome Centre Expression Analysis Core using the HiSeq 2000 platform. Preprocessing of reads involved removal of low-quality reads, trimming of low-quality bases from the 3′ ends of the reads, and removal of adapter contamination using custom Perl scripts. The quality-filtered reads were sorted into individual libraries based on barcodes, and then barcodes were trimmed using a custom Perl script.

Mapping and normalization were done on the iPLANT Atmosphere cloud server. S. lycopersicum reads were mapped to 34,727 tomato cDNA sequences predicted from the gene models from the ITAG2.4 genome build. A pseudo reference list was constructed for S. pennellii using the homologous regions between S. pennellii scaffolds v.1.9 and the S. lycopersicum cDNA references above. Using the defined boundaries of the IL’s, custom R scripts were used to prepare IL-specific references that had the S. pennellii sequences in the introgressed region and S. lycopersicum sequences outside the introgressed region. The reads were mapped with BWA using default parameters except for the following: bwa aln: -k 1 -l 25 -e 15 -i 10 and bwa samse: -n 0.
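The construction of an IL-specific reference described above can be illustrated with a minimal sketch. This is not the authors' R code: it is written in Python for illustration, and the function names, the gene identifiers, the coordinate format, and the rule that a gene must lie entirely inside the introgression are all assumptions made for this example.

```python
from typing import Dict, Tuple

# Hypothetical data layout (not taken from the original scripts):
# a genomic interval is (chromosome, start, end).
Interval = Tuple[str, int, int]


def in_introgression(gene_pos: Interval, introgression: Interval) -> bool:
    """Return True if the gene lies entirely within the introgressed interval.

    Requiring full containment is an assumption of this sketch.
    """
    chrom, start, end = gene_pos
    il_chrom, il_start, il_end = introgression
    return chrom == il_chrom and start >= il_start and end <= il_end


def build_il_reference(
    lyc_cdna: Dict[str, str],
    pen_cdna: Dict[str, str],
    gene_positions: Dict[str, Interval],
    introgression: Interval,
) -> Dict[str, str]:
    """Use the S. pennellii sequence inside the introgression, and the
    S. lycopersicum sequence everywhere else."""
    reference = {}
    for gene, lyc_seq in lyc_cdna.items():
        pos = gene_positions.get(gene)
        inside = pos is not None and in_introgression(pos, introgression)
        if inside and gene in pen_cdna:
            reference[gene] = pen_cdna[gene]
        else:
            # Outside the introgression, or no S. pennellii homolog available.
            reference[gene] = lyc_seq
    return reference


# Toy example with made-up gene names and sequences.
lyc = {"geneA": "ATGAAA", "geneB": "ATGCCC", "geneC": "ATGGGG"}
pen = {"geneA": "ATGAAT", "geneB": "ATGCCT"}
positions = {
    "geneA": ("ch01", 100, 500),
    "geneB": ("ch02", 100, 500),
    "geneC": ("ch01", 200, 400),
}
il_region = ("ch01", 50, 1000)
il_reference = build_il_reference(lyc, pen, positions, il_region)
```

In this toy case only geneA takes the S. pennellii sequence: geneB lies on a different chromosome, and geneC has no S. pennellii homolog in the pseudo reference, so both fall back to the S. lycopersicum sequence.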
The BAM alignment files were used as inputs for the eXpress software to account for reads mapped to multiple locations. The estimated read counts obtained for each gene for each sample from eXpress were treated as raw counts for differential expression (DE) analysis. The counts were then filtered in R using the Bioconductor package edgeR version 2.6.10 such that only genes that had more than two reads per million in at least three of the samples were kept. Normalization of read counts was performed using the trimmed mean of M-values method, and normalized read counts were used to identify genes that are differentially expressed at the transcript level in each IL compared to the cv M82 parent as well as between the two parents, S. pennellii and M82. The DE genes for each IL were compared to those between the two parents to identify genes that were differentially expressed for the IL but not for S. pennellii compared to cv M82. Those genes were considered to show a transgressive expression pattern at the transcript level for the specific IL, whereas other DE genes were considered to show transcript expression similar to S. pennellii.

Transcript level patterns were correlated with three phenotypes collected from the IL’s along with the parents. Normalized estimated read counts with 3 to 4 independent replicates per IL were log2 transformed prior to the analyses. Leaf number and complexity were collected from the IL’s as outlined in Chitwood et al.
under both sun and shade treatments. Hypocotyl lengths were measured as detailed above. To test whether the transcript level for a given gene was correlated with a particular phenotype, bootstrapping analyses were performed. Transcript levels and phenotype data were randomly permuted using the sample function against IL and then merged. For each analysis, 1,000 replications were performed and the P values were calculated from the Spearman’s rho value distributions. P values were adjusted for multiple comparisons using the Benjamini-Hochberg (BH) correction. Significant correlations were identified as those with an adjusted P value < 0.05, and the mean rho value was used to designate the correlation as either positive or negative. All analyses were implemented using the statistical software R and custom scripts.

eQTL mapping analyses were performed to determine whether the transcript level of a gene is correlated with the presence of a specific introgression from S. pennellii into S. lycopersicum cv M82. This correlation was examined at the level of “bin,” with a bin defined as a unique overlapping region between introgressions. Examining eQTL at the bin level enables those eQTL to be mapped to considerably smaller intervals than the IL’s themselves. eQTL mapping analyses were performed on the normalized estimated read counts with 3 to 4 independent replicates per IL, which were log2 transformed prior to the analyses. To test whether the transcript level for a given gene is correlated with the presence of a particular bin, a Spearman’s rank correlation test was used with ties resolved using the midrank method. P values were adjusted for multiple comparisons using the BH correction. Significant eQTL were identified as those with an adjusted P value < 0.05, and Spearman’s rho was used to designate the eQTL as up or down.
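As a rough illustration of the two statistical building blocks named above, the following Python sketch computes Spearman's rho with midranks for ties and applies the Benjamini-Hochberg adjustment to a list of P values. It is not the authors' R code; the function names and the toy data are assumptions, and the calculation of the P values themselves is deliberately left out.

```python
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the midranks."""
    rx, ry = midranks(x), midranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (vx * vy) ** 0.5


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # rank of this P value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: bin presence (0/1) across six samples and made-up
# log2 transcript levels for one gene.
bin_present = [0, 0, 0, 1, 1, 1]
log2_expr = [1.0, 1.2, 0.9, 2.0, 2.3, 2.1]
rho = spearman_rho(bin_present, log2_expr)
direction = "up" if rho > 0 else "down"

# Made-up raw P values for four gene-by-bin tests.
adjusted_p = bh_adjust([0.01, 0.04, 0.03, 0.20])
significant = [p < 0.05 for p in adjusted_p]
```

Because bin presence is a binary variable, nearly all of its values are tied, which is why the handling of ties matters for this test. The sign of rho then gives the direction of the eQTL.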
Significant eQTL were also designated as cis if the gene was located on the bin with which it is correlated; trans if the gene was correlated with a bin that is neither the bin it is on nor a bin that shares an overlapping IL with the correlated bin; or chromo0 if the gene lies in the unassembled part of the genome. When a gene has a designated cis-eQTL and a secondary correlation was found with a bin that shares an overlapping introgression, this secondary correlation was not designated as an eQTL. When a gene does not have a designated cis-eQTL and a correlation was found with a bin that shares an overlapping introgression, this correlation was designated as a trans-eQTL. All analyses were implemented using the statistical software R and custom scripts.

t-SNE, or t-distributed stochastic neighbor embedding, is a nonlinear dimensionality reduction method that faithfully maps objects in high-dimensional space into low-dimensional space. Crowding is avoided through the long-tailed t-distribution, which forces non-neighbor clusters farther away from each other in the low-dimensional V-space than those clusters actually are in the high-dimensional H-space. The exaggerated separation of non-neighboring clusters improves 2D resolution, allowing identification of novel groupings not readily apparent in other clustering methods. However, this method is resource intensive, and with higher dimensionality, the number of genes that can be analyzed is limited. For the present analysis, we used Barnes-Hut-SNE, a newer implementation of t-SNE that greatly increases the speed and the number of genes that can be analyzed. Barnes-Hut-SNE accomplishes this efficiency through the use of a vantage-point tree and a variant of the Barnes-Hut algorithm. For clustering, 2D maps were generated using a perplexity of 30 and without the initial PCA step from the Barnes-Hut-SNE R implementation.
Theta was set to 0.3 based on van der Maaten to maintain an accurate dimensionality reduction without sacrificing processing speed.

The DBSCAN algorithm was used to select modules from the Barnes-Hut-SNE results. This algorithm had the advantage of both selecting modules and removing any genes that fell between modules. The scanning range and minimum seed points were selected manually and used to determine whether any one point is a member of a cluster based on its physical position within the mapping relative to neighboring points. A minpts of 25 was used to capture smaller modules on the periphery, and an epsilon of 2.25 was used to avoid the overlapping of internal and closely spaced modules.

Box plots were generated from normalized transcript abundance values for each module. The ribbon plot was generated from correlated abundance values from leaf development and photosynthesis related modules. These plots were generated using ggplot from the ggplot2 R package. The median transcript levels of the genes mapped to a module were calculated for each IL, and this was repeated for all modules.
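To make the roles of minpts and epsilon in module selection concrete, the following is a minimal pure-Python sketch of the DBSCAN procedure applied to 2D coordinates. It is an illustration only, not the implementation used in the study: the function names are hypothetical, the neighbor search is a simple quadratic scan, and a point is counted among its own neighbors (a common convention, assumed here). The defaults reuse the values reported above, a minpts of 25 and an epsilon of 2.25.

```python
from math import dist
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1  # label for points that fall between modules


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float = 2.25, min_pts: int = 25) -> List[int]:
    """Label each point with a cluster index (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep expanding
    return [label if label is not None else NOISE for label in labels]


# Toy map: two dense groups of 30 points each and one isolated point.
group_a = [(0.01 * k, 0.0) for k in range(30)]
group_b = [(100.0 + 0.01 * k, 0.0) for k in range(30)]
toy_points = group_a + group_b + [(50.0, 50.0)]
toy_labels = dbscan(toy_points)
```

In this toy map the two dense groups become separate modules, and the isolated point is labeled as noise, which corresponds to a gene falling between modules and being removed.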