The transitory increase in the cellular gallic acid concentration may regulate the expression/activity of UGTs that convert gallic acid to the glucoside derivatives. Indeed, gallic acid 3-O- and 4-O-glucosides accumulated in the ugt84a23 ugt84a24 hairy roots. In addition, transcriptome and real-time qPCR analyses identified 11 UGTs with increased expression in ugt84a23 ugt84a24, and one of the candidate UGTs, PgUGT72BD1, exhibited regioselective glucosylation of gallic acid at the 4-OH position. However, none of the candidate UGTs produced gallic acid 3-O-glucoside, suggesting that the gallic acid 3-O-glucosylation activity may be regulated at a level other than transcription. Therefore, even though PgUGT72BD1 is expressed in the wild-type pomegranate roots and hairy roots, gallic acid is mainly used for the biosynthesis of β-glucogallin by PgUGT84A23 and PgUGT84A24 due to their much higher catalytic efficiencies than PgUGT72BD1. Indeed, our previous metabolite profiling analysis did not identify gallic acid 4-O-glucoside in any pomegranate tissues. These results also suggest that the primary role of PgUGT72BD1 in pomegranate roots could be glycosylating aglycones other than gallic acid. Intriguingly, HT production was not completely abolished in ugt84a23 ugt84a24, suggesting that there could be additional UGTs contributing to β-glucogallin formation in pomegranate. Of the 17 UGT phylogenetic groups defined so far in plants, PgUGT72BD1 belongs to group E, which contains the UGT71, UGT72, and UGT88 gene families.
Regioselective glycosylation of hydroxybenzoic acids was previously reported for members of group E UGTs, including AtUGT71B1, which only glycosylates the 4-OH position, and AtUGT71C1, AtUGT71C4, and AtUGT72B1, which specifically glycosylate the 3-OH position. Six amino acids are conserved in the hydroxybenzoic acid/gallic acid 3-O or 4-O UGTs but distinct between the two groups of regioselective UGTs. The function of these amino acids in determining the regioselectivity of the corresponding UGTs can be explored by site-directed mutagenesis and enzyme assays. In addition, once the gallic acid 3-O UGT is cloned in pomegranate, the protein sequences and structural features of the gallic acid 3-O and 4-O UGTs can be compared to identify the key amino acid for regioselectivity. Furthermore, it was proposed that the regioselectivity for hydroxycoumarins was switched among the UGT71, UGT72, and UGT88 families during the evolution of group E UGTs. It will be interesting to understand whether a regioselectivity switching event for gallic acid also occurred among these UGT gene families.
Haplotype phasing and navigating between allelic and nonallelic variation are the major challenges in assembling genomes of outcrossing species with high levels of heterozygosity, such as those found in members of the genus Juglans. Genome sequencing projects targeting outcrossing plants have employed inbred lines, haploids, and megagametophytes to avoid heterozygosity. Interspecific hybrids offer another strategy to avoid heterozygosity. Since the genome of an interspecific hybrid usually comprises haploid genomes of the parental species, interspecific hybrids have the same advantages for genome sequencing as haploids, but are usually easy to produce. Technical difficulties with allocating scaffolds to parental genomes have precluded the deployment of hybrids in genome sequencing.
Using an interspecific hybrid between cultivated walnut and its wild relative J. microcarpa, we describe here a novel approach to sequencing hybrid genomes which results in a cost-effective, high-quality genome assembly for both parents. The cultivated Persian/English walnut, Juglans regia, is native to Asia, whereas J. microcarpa is native to North America, where it occurs in riparian areas in the southwestern USA. Both species are wind-pollinated, highly heterozygous, and intolerant of inbreeding. Their hybrids are infertile. Both have a genome size of about 600 Mb with n = 16. English walnut is an important nut crop with 3.8 million tons harvested worldwide in 2017. Walnut production has been steadily increasing, in part due to health benefits derived from including walnuts in the human diet. In the USA, English walnut trees are grown commercially using rootstocks chosen for their ability to tolerate such soilborne pathogens as Phytophthora spp., lesion nematodes, and Agrobacterium tumefaciens, which are all serious pathogens of walnut. The commercial hybrid rootstock J. microcarpa × J. regia, which possesses tolerance to soilborne diseases, is extensively used in walnut production in California. The development of genomic resources for walnut and its wild relatives, including reference-quality genome sequences, will accelerate genetic improvement of walnut scions and rootstocks. Juglans and its relatives in the family Juglandaceae are members of order Fagales, which includes many important forest trees. Reference-quality genome sequences will facilitate comparative genomics and will advance the biology of this important group of woody perennials. Recent attempts to sequence the heterozygous genome of English walnut using traditional approaches resulted in assemblies with 4402 scaffolds with N50 = 640 kb and 25,670 scaffolds with N50 = 310 kb.
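For readers unfamiliar with the N50 statistic used to compare these assemblies, the following is a minimal sketch of how it is commonly computed: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly length. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from any of the assemblies discussed here.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Illustrative helper, not code from the study: N50 is the length L such
    that scaffolds of length >= L together cover at least half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example with made-up lengths (arbitrary units).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)
```

A larger N50 with fewer scaffolds indicates a more contiguous assembly, which is the sense in which the assemblies above are described as fragmented.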
An attempt to sequence the heterozygous genome of J. microcarpa resulted in an even more fragmented assembly with 329,873 scaffolds with N50 = 136 kb.
The novel sequencing approach described here exploits the synergy between long-read sequencing and optical genome mapping. The average length of reads produced with long-read platforms exceeds the lengths of a vast majority of plant long-terminal-repeat retrotransposons, which results in a dramatically improved sequence assembly. Contigs or scaffolds assembled from long reads are sufficiently long to be aligned on genome-wide optical maps, which can be assembled with very high accuracy, even for large or polyploid plant genomes. Alignments of the optical maps of a hybrid onto the optical maps of its parents will assign contigs to parental genomes, while also serving as an assembly quality control. We used the assembled J. regia and J. microcarpa genome sequences in conjunction with the Juglandoid whole-genome duplication (WGD) and the Juglandaceae fossil record to calibrate the molecular clock rate for woody perennials. We used the calibrated molecular clock to estimate the time of divergence of Juglans species and other woody perennials. Based on synteny within Juglans genomes, we allocated the 16 Juglans chromosomes produced by the Juglandoid WGD into eight homoeologous chromosome pairs and analyzed their evolution. Finally, we exploited the contiguity of the assemblies in the analyses of the structure and evolution of Juglans telomeres and centromeres and the distribution of disease resistance genes in the J. regia and J. microcarpa genomes.
Next, we constructed two optical maps for the hybrid and one for each of its parents. The N50 of the optical contigs ranged from 1.31 to 2.90 Mb. The parental maps consisted of ‘haploid’ regions, in which the haplotypes were similar enough to collapse into a single contig, and ‘diploid’ regions, in which the haplotypes were dissimilar enough to be assembled into separate contigs.
The haploid and diploid regions were identified by a map self-alignment. The mean length across the phased regions was 108 Mb in Serr and 33 Mb in 31.01. Due to the inclusion of phased haplotypes in a map, the sum of the total lengths of the Serr and 31.01 optical maps was 13.4% longer than the length of the MS1-56 optical map. To ascertain whether the genome of our hybrid was complete, we compared the total length of its optical map to the sum of the lengths of the optical maps of the parents, which we edited to disregard the redundant haplotypes from the phased regions. The total length of the edited maps of Serr and 31.01 differed from the length of the map of MS1-56 by only 1 Mb. We therefore concluded that the hybrid genome was a complete representation of the parental genomes. We aligned the sequence contigs on the optical map of the hybrid, stitched them into scaffolds, and allocated 40 of them, comprising 99.85% of the hybrid genome assembly, to the parental genomes with the aid of the optical maps of the parents. The remaining 224 scaffolds, representing 0.15% of the hybrid genome sequence, were too short to be aligned on an optical map and were aligned on the Illumina sequence of J. regia cv. Chandler for assignment to parental genomes. Finally, we ordered and oriented scaffolds on high-density genetic maps, producing 16 pseudomolecules for each of the two genomes. They had only five gaps of unknown lengths; the remaining gaps were estimated based on the optical map alignments. We then reduced gap lengths or closed them entirely with unassigned contigs or contigs produced with the 10X Genomics technology. We mapped 89X Illumina reads of the hybrid to the genome assemblies and detected 48,717 and 51,789 indels and 748 and 1,205 base substitutions in the Serr and 31.01 assemblies, respectively.
We subsequently corrected these errors and produced the final assemblies JrSerr_v1.0 and Jm31.01_v1.0.
Of the genes annotated on the Serr pseudomolecules, 26,403 were collinear with genes on the J. microcarpa pseudomolecules. The two genomes differed by 28 inversions involving >3 collinear genes, 21 segmental duplications, 3 intra- and 14 inter-chromosomal interstitial translocations, but no terminal translocation. Only 90 of the JrSerr_v1.0 genes were not detected in the Jm31.01_v1.0 assembly.
We computed Ks divergence among the J. regia and J. microcarpa genes to analyze gene duplication and divergence. The Ks plot showed three peaks. The first peak mostly consisted of Ks values between orthologous genes in the two genomes and reflected their divergence. The second peak coincided with the major Ks peak in self-searches within the Serr and 31.01 genomes and reflected the divergence of paralogous genes, which originated by the Juglandoid WGD. The third peak coincided with the major peak in self-searches within the grape genome and reflected divergence between genes duplicated by the whole-genome triplication first described in the grape genome. These inferences were confirmed by self-alignment plots, a plot of the JrSerr_v1.0 pseudomolecules against the grape pseudomolecules, and a plot against the Amborella trichopoda scaffolds.
We analyzed gene collinearity along homoeologous chromosomes within the Serr genome and detected 38 paracentric inversions involving >3 collinear genes and 20 intra- and 16 inter-chromosomal interstitial translocations. The homoeologues did not differ by any terminal translocations and retained a 1:1 relationship. Using the grape genome as an outgroup, we assigned rearrangements to phylogenetic branches and computed the rates of their accumulation. The rates ranged from 0.4 to 1.4 major rearrangements per MY. The collinearity analysis showed that the 16 J. regia chromosomes can be built from 142 major synteny blocks making up the 19 grape chromosomes. Of them, 122 were shared by the J. regia homoeologous chromosomes and were in the same order along them. These rearrangements must have taken place prior to the Juglandoid WGD. We found 43.7% of the J. regia genes collinear with genes on the grape pseudomolecules.
In each pair of homoeologous chromosomes within a Juglans genome, one chromosome contained more genes than the other.
We denoted the homoeologues with more genes as “dominant” and those with fewer genes as “subdominant,” allocated the 16 Juglans chromosomes into eight homoeologous chromosome pairs, which we arranged in descending order based on the number of genes in the dominant chromosome, and renamed the chromosomes. There were 18,179 and 17,093 genes in the dominant subgenome and 13,107 and 12,304 genes in the subdominant subgenome in JrSerr_v1.0 and Jm31.01_v1.0, respectively. Both dominant and subdominant subgenomes had fewer genes than are in the A. trichopoda genome, which was not subject to a recent WGD. This comparison indicates that both dominant and subdominant Juglans subgenomes have lost genes since the WGD. The difference in gene loss between dominant and subdominant homoeologues was nearly constant among the homoeologous pairs, suggesting that the rate of gene loss was intrinsic to a subgenome. We subdivided homoeologous chromosomes into sections delineated by successive pairs of collinear genes and counted the numbers of singleton genes in the intervening intervals. In the JrSerr_v1.0 pseudomolecules, 13,791 singletons were on the dominant chromosomes but only 8,261 were on the subdominant chromosomes. The numbers of singletons per interval varied little along each chromosome, except for the proximal regions, indicating that gene loss was uniform along the chromosomes and occurred by many deletions involving one or a few genes along a chromosome. In each chromosome, deletions were larger and more frequent in the proximal region than in the rest of the chromosome, as the numbers of singletons per interval were greatly elevated in proximal regions. We also expressed the numbers of singletons per 2-Mb non-overlapping window. Most of the windows in subdominant chromosomes contained fewer singletons than those in the dominant homoeologues. Since the divergence of the J. regia and J. microcarpa lineages 8 MYA, 28.7% of singletons were lost from the subdominant chromosomes in J. microcarpa but only 14.9% from the corresponding intervals in the dominant homoeologues. Thus, the factor that caused the asymmetry in fractionation has persisted in the subgenomes since their origin to the recent past. In 22 RNAseq datasets, genes on the dominant chromosomes were on average transcribed significantly more than their paralogues on the subdominant chromosomes.
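The subgenome gene counts reported above can be condensed into a single genome-wide share of genes carried by the dominant subgenome. The short sketch below does this arithmetic for both assemblies and for the JrSerr_v1.0 singletons. The counts are those given in the text; the function name `dominant_share` and the dictionary layout are illustrative assumptions, and this genome-wide summary is coarser than the per-pair comparison described above.

```python
# Gene counts per subgenome, as reported in the text.
gene_counts = {
    "JrSerr_v1.0": {"dominant": 18179, "subdominant": 13107},
    "Jm31.01_v1.0": {"dominant": 17093, "subdominant": 12304},
}

# Singleton counts on the JrSerr_v1.0 pseudomolecules, as reported in the text.
serr_singletons = {"dominant": 13791, "subdominant": 8261}


def dominant_share(dominant, subdominant):
    """Fraction of the combined count that lies in the dominant subgenome.

    Illustrative helper, not code from the study.
    """
    return dominant / (dominant + subdominant)


gene_shares = {
    assembly: dominant_share(c["dominant"], c["subdominant"])
    for assembly, c in gene_counts.items()
}
singleton_share = dominant_share(
    serr_singletons["dominant"], serr_singletons["subdominant"]
)

for assembly, share in gene_shares.items():
    print(f"{assembly}: {share:.3f} of subgenome genes are dominant")
print(f"JrSerr_v1.0 singletons: {singleton_share:.3f} are dominant")
```

Under this summary the dominant subgenome carries roughly 58% of the genes in both species, and a somewhat larger share of the singletons in J. regia, which agrees with the asymmetric fractionation described in the text.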