We have also seen that taking hyperbolic geometry into account produces better low-dimensional visualizations (cf. Figure 11 and Figure 12). Accurate representation of data across scales is a very active area of research. Special attention is being devoted to developing visualization methods that not only cluster data in a useful way but also preserve the relative positions between clusters. In particular, preserving global data structure was one of the driving factors behind the UMAP method. Knowing the underlying geometry helps to position clusters appropriately and to map them robustly across different runs of a visualization method. For example, t-SNE places clusters at random positions that change from run to run. This problem can be partly alleviated by imposing additional constraints on large distances. Here we find that combining a hyperbolic metric for large distances with a Euclidean metric for local distances offers strong improvements in this respect. It also outperforms the recent Poincaré map method, which applies the hyperbolic metric only to local distances. We note that although h-SNE is best suited to hyperbolic data, its accuracy in preserving distances is similar to that of g-SNE. Further optimizing the h-SNE algorithm is a direction for future work.

What could be the origin of hyperbolic geometry at large scales and Euclidean geometry at small scales? First, any curved geometry, including hyperbolic geometry, is locally flat, i.e. Euclidean. The scale at which non-Euclidean effects become important depends on the curvature of the space. From a biological perspective, the Euclidean aspects can arise from intrinsic noise in gene expression. This noise effectively smooths the underlying hierarchical process that generates the data. We find that hyperbolic effects in human gene expression can be detected from measurements on as few as ∼100 probes. Why do hyperbolic effects require measurements along multiple dimensions?
The reason is that hyperbolic geometry represents an underlying hierarchical process, which generates correlations between variables. These correlations become detectable above the noise once a sufficient number of measurements is made. As an example, consider the leaves of a tree-like network and how their activities become correlated when they are driven by switching branches of the network on and off. Intuitively, these correlations generate the outstretched branches of a hyperbola. We observe that these correlations can be detected by monitoring even a relatively small number of probes. This makes it possible to construct a global map of genes from partial measurements and opens new ways of combining data from different experiments.

Individual olfactory receptors (ORs) respond to many odor ligands, and each odor ligand evokes responses from many ORs. How the activities of ORs collectively encode natural odor mixtures remains an open question. In Chapter 1 we demonstrated that odor molecules can be mapped onto a three-dimensional hyperbolic space based on the statistics of their co-occurrence within natural mixtures, and that the principal perceptual properties of odorants, e.g. pleasantness, are well represented by the axes of this space. This suggests that the hyperbolic embedding space of natural odorants may serve as the stimulus space for olfactory receptors. To test this, we used the odorant concentration measurements from the strawberry and tomato datasets of Zhou et al., and the OR response datasets of Hallem et al. We combined the strawberry and tomato datasets on their overlapping odorants, and then selected the odorants present in both the natural odor datasets and the receptor response datasets. The OR-response similarity between odorants was quantified by the Euclidean distance between their receptor activity vectors (a smaller distance meaning more similar responses).
The co-occurrence similarities were defined as the absolute values of the correlation coefficients of odorant concentrations across samples. Geometric distances between odorants were computed in both hyperbolic and Euclidean representations of the natural odorants, obtained by hyperbolic multidimensional scaling and Euclidean multidimensional scaling, respectively. Figure 13 shows the correlations between OR response similarities and odorant stimulus similarities.
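The two similarity measures and their comparison can be sketched as follows. This is a minimal illustration, not the analysis code behind Figure 13: it assumes Pearson correlation between the pairwise entries of the two matrices, and it implements the shuffled baseline as a Mantel-style random relabeling of odorants, which the text does not specify. All function names and the toy data (odorants with hypothetical 2D latent positions) are illustrative assumptions. Because the OR measure is a distance while co-occurrence is a similarity, agreement between them appears as a negative correlation.

```python
import numpy as np


def or_response_distances(responses):
    """Pairwise Euclidean distances between odorants' receptor-activity vectors.

    responses: array of shape (n_odorants, n_receptors).
    """
    diff = responses[:, None, :] - responses[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def cooccurrence_similarity(conc):
    """|Pearson r| between odorant concentrations across samples.

    conc: array of shape (n_samples, n_odorants).
    """
    return np.abs(np.corrcoef(conc, rowvar=False))


def upper_triangle(m):
    """Off-diagonal upper-triangle entries: one value per odorant pair."""
    i, j = np.triu_indices_from(m, k=1)
    return m[i, j]


def matrix_correlation(a, b):
    """Pearson correlation between the pairwise entries of two square matrices."""
    return float(np.corrcoef(upper_triangle(a), upper_triangle(b))[0, 1])


def shuffled_null(a, b, n_perm=500, seed=0):
    """Correlations after randomly relabeling the odorants of b (Mantel-style)."""
    rng = np.random.default_rng(seed)
    n = b.shape[0]
    null = np.empty(n_perm)
    for k in range(n_perm):
        p = rng.permutation(n)
        null[k] = matrix_correlation(a, b[np.ix_(p, p)])
    return null


# Hypothetical toy data: odorants close in a latent space evoke similar
# receptor responses and also tend to co-occur across samples.
rng = np.random.default_rng(1)
n_odorants, n_receptors, n_samples = 20, 24, 500
latent = rng.uniform(0, 1, size=(n_odorants, 2))
weights = rng.standard_normal((2, n_receptors))
responses = latent @ weights + 0.05 * rng.standard_normal((n_odorants, n_receptors))

# Log-concentrations drawn from a Gaussian process over the latent space,
# so nearby odorants rise and fall together; exp() makes them positive.
sq = ((latent[:, None, :] - latent[None, :, :]) ** 2).sum(axis=-1)
cov = np.exp(-sq / (2 * 0.2 ** 2)) + 0.1 * np.eye(n_odorants)
chol = np.linalg.cholesky(cov)
conc = np.exp(rng.standard_normal((n_samples, n_odorants)) @ chol.T)

d_or = or_response_distances(responses)
s_co = cooccurrence_similarity(conc)
observed = matrix_correlation(d_or, s_co)
null = shuffled_null(d_or, s_co)
print(f"observed r = {observed:.3f}, shuffled 5th percentile = {np.percentile(null, 5):.3f}")
```

The observed correlation is judged against the spread of correlations obtained after relabeling, which preserves each matrix's internal structure while breaking the odorant correspondence between them.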
The correlation is significant when co-occurrence statistics in natural fruit samples are used as the stimuli, compared with the shuffled controls. In the geometric representations, stimulus similarities are given by the geometric distances between embedding points. The hyperbolic representation leads to a much larger increase in correlation than the Euclidean representation. These findings show that OR responses capture the co-occurrence statistics of natural odorants, and that a hyperbolic model is an appropriate representation of the odorant stimulus space for OR responses.

We showed in Chapter 2 that h-SNE outperforms other algorithms both qualitatively and quantitatively in 2D visualization of the Lukk data. The Lukk dataset, however, is relatively small and of limited complexity. In this section, we apply h-SNE to a highly complex dataset containing scRNA-seq measurements of a very large number of cells from nine mouse brain regions. The dataset comes from Saunders et al., who used Drop-seq to profile RNA expression in 69,000 cells from nine regions of the mouse brain. We analyzed the data both at a global scale, considering cells from the whole brain, and at a local scale, focusing on specific brain regions. We first applied t-SNE, UMAP and h-SNE to 4,500 cells sampled equally from the nine regions and visualized the global mouse brain atlas in 2D. Some brain regions, such as the striatum and hippocampus, are well separated from the other regions by all three algorithms, whereas other regions, such as the cerebellum and substantia nigra, are broken into disconnected sub-clusters in the t-SNE and UMAP embeddings and are represented continuously only in h-SNE. The global structure of the disconnected components in t-SNE and UMAP is hard to discern, but a branching structure is clearly visible in the h-SNE map. In the h-SNE map, there are two types of regions with distinct spatial organizations in the disk.
One type, which we call “centering regions”, comprises the entopeduncular nucleus, cerebellum, globus pallidus, substantia nigra and thalamus, whose cells are located around the center of the disk; the other type, “branching regions”, comprises the frontal cortex, posterior cortex, hippocampus and striatum, whose cells stretch out from the center like branches.
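Drawing an equal number of cells from each region, as done for the 4,500-cell global map (500 per region), can be sketched as below. This is a minimal sketch that assumes sampling without replacement; the text does not state how the cells were drawn. The function name and the `region_k` labels are hypothetical.

```python
import numpy as np


def sample_equal_per_group(labels, n_total, seed=0):
    """Indices of n_total items drawn equally, without replacement, from each group."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    per_group, remainder = divmod(n_total, len(groups))
    if remainder:
        raise ValueError("n_total must be divisible by the number of groups")
    picks = []
    for g in groups:
        members = np.flatnonzero(labels == g)
        if len(members) < per_group:
            raise ValueError(f"group {g!r} has only {len(members)} cells")
        picks.append(rng.choice(members, size=per_group, replace=False))
    return np.concatenate(picks)


# Hypothetical labels: nine regions with unequal cell counts.
rng = np.random.default_rng(42)
region_labels = rng.choice([f"region_{k}" for k in range(9)], size=20_000)
idx = sample_equal_per_group(region_labels, n_total=4500)
names, counts = np.unique(region_labels[idx], return_counts=True)
print(dict(zip(names, counts)))  # 500 cells per region
```

Balancing the sample this way keeps large regions from dominating the global map, so that every region contributes equally to the embedding.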
Next we examined each of the nine regions separately, performing h-SNE on 5,000 cells sampled from each region. In these regional embeddings, cells separate further into several sub-clusters, labeled by different colors. The cell distributions in the regional maps are consistent with those in the global map: in the five “centering regions”, the sub-clusters expand from the center in all directions, whereas in the four “branching regions”, the sub-clusters tend to branch out in a single direction. We then studied the relationship between cell structure and cell position in the disk. Cells with low gene expression tend to be located at the center of the disk; granule cells in the cerebellum (CB) and the dentate gyrus, which are small neurons, form circular shapes surrounding the center; cells in CA1 and CA3 of the hippocampus and in layer 2/3 of the frontal and posterior cortex, which are mostly pyramidal neurons, form “crabs” extending to the boundary of the disk; and polydendrocytes, which are glial cells, are distributed along a “narrow path” running from the center to the boundary. These spatial patterns of different cell types are hard to find in the t-SNE and UMAP embeddings. These findings show that structurally similar cell types across brain regions have similar spatial localization patterns in the 2D hyperbolic disk. The quality of the embeddings is validated by quantitatively evaluating how well data distances are preserved. We calculated the correlation coefficient between the pairwise distances of the low-dimensional embedding points and those of the original high-dimensional data points, and found that h-SNE preserves the data distances best, with the highest correlation coefficient in all nine brain regions of Saunders et al.
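The distance-preservation score described above can be sketched as follows. This is a minimal sketch under stated assumptions: high-dimensional distances are taken as Euclidean, embedding distances as geodesic distances in the Poincaré disk of curvature −1, and the score as the Pearson correlation over all point pairs. The text does not specify these choices. The toy "embeddings" are hypothetical stand-ins (a PCA projection rescaled into the disk and random points), not h-SNE outputs.

```python
import numpy as np


def euclidean_pdist(X):
    """Pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def poincare_pdist(Y):
    """Pairwise geodesic distances in the Poincaré disk (curvature -1).

    Every row of Y must have Euclidean norm < 1.
    """
    sq_norm = (Y ** 2).sum(axis=1)
    sq_diff = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    denom = np.outer(1.0 - sq_norm, 1.0 - sq_norm)
    # np.maximum guards against arguments marginally below 1 from round-off
    return np.arccosh(np.maximum(1.0 + 2.0 * sq_diff / denom, 1.0))


def distance_preservation(D_high, D_low):
    """Pearson correlation between high- and low-dimensional pairwise distances."""
    i, j = np.triu_indices_from(D_high, k=1)
    return float(np.corrcoef(D_high[i, j], D_low[i, j])[0, 1])


# Hypothetical toy data: three well-separated clusters in 10-D.
rng = np.random.default_rng(0)
centers = np.zeros((3, 10))
centers[1, 0] = 10.0
centers[2, 1] = 10.0
X = np.repeat(centers, 20, axis=0) + rng.standard_normal((60, 10))

# Stand-in embedding 1: top-2 principal components, rescaled to max norm 0.5.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Y_pca = Xc @ Vt[:2].T
Y_pca *= 0.5 / np.linalg.norm(Y_pca, axis=1).max()

# Stand-in embedding 2: random points in the disk, unrelated to X.
angles = rng.uniform(0, 2 * np.pi, 60)
radii = 0.5 * np.sqrt(rng.uniform(0, 1, 60))
Y_rand = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

D_high = euclidean_pdist(X)
r_pca = distance_preservation(D_high, poincare_pdist(Y_pca))
r_rand = distance_preservation(D_high, poincare_pdist(Y_rand))
print(f"PCA stand-in: R = {r_pca:.2f}; random points: R = {r_rand:.2f}")
```

Near the center of the disk the Poincaré distance is close to twice the Euclidean distance (the geometry is locally flat), which is why a layout confined to small radii still scores well. Distances grow much faster toward the boundary, and this extra room is what a hyperbolic embedding uses for large-scale structure.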
The distance plots show that h-SNE not only preserves distances with less noise but also preserves the intrinsic geometry of the data, as seen in the linear relationship between distances in the plots, compared with the other embeddings. In Section 3.2 we showed that h-SNE embeddings can characterize structure-specific cell types across brain regions; here we further ask whether the method can characterize region-specific cell types across brain regions during a dynamic process, such as cell differentiation. Marques et al. performed single-cell RNA-seq on 5,072 cells to study oligodendrocyte differentiation in 10 regions of the mouse central nervous system. They identified 13 cell types: vascular and leptomeningeal cells (VLMC), oligodendrocyte precursor cells (OPC), differentiation-committed oligodendrocyte precursors (COP), two sub-types of newly formed oligodendrocytes (NFOL1-2), two sub-types of myelin-forming oligodendrocytes (MFOL1-2), and six sub-types of mature oligodendrocytes (MOL1-6).
These cell types represent different stages of oligodendrocyte differentiation. The authors performed t-SNE and classified the 10 regions as immature, intermediate or mature based on the proportion of mature cells in each region of the juvenile brain. However, this classification was qualitative, and the differences among the “mature” regions were not explored. Here we performed t-SNE, UMAP and h-SNE on the whole dataset and found that all three methods show clear global differentiation trajectories, but in entirely different ways. The t-SNE map is similar to the published t-SNE result, in which the two sub-clusters MOL1-4 and MOL5-6 are mixed together. The UMAP map shows a branch between MFOL and MOL cells; however, this branch does not separate MOL1-4 from MOL5-6, since both appear in each branch. h-SNE shows a narrow path before MFOL that then “explodes” into a circular shape, with MOL1-4 located on the inner circle and MOL5-6 moving further out to the outer circle. Next we separated the cells by anatomical region and studied how the distributions of mature cells differ among the five regions where mature cells are abundant: SN-VTA, dorsal horn, hypothalamus, cortex S1 and corpus callosum. t-SNE does not show a clear structure in the distribution of mature cells in these five regions. In UMAP, the mature cells of SN-VTA and corpus callosum lie in different branches, but the cells of the dorsal horn, hypothalamus and cortex S1 are distributed over both branches and cannot be distinguished. In h-SNE, the mature cells of the five regions are located at different radii in the hyperbolic disk. The median embedding radius of MOL5-6 cells is larger than that of MOL1-4 cells in all five regions, and the median radii of both MOL1-4 and MOL5-6 cells increase from SN-VTA to corpus callosum.
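The per-region median radii can be computed along the lines below. This is a minimal sketch. The text does not say whether “radius” means the Euclidean norm in the disk or the hyperbolic distance from the center, so it assumes the latter, 2·artanh(‖y‖) for a Poincaré disk of curvature −1. Because this transform is monotone, both choices rank cells identically. The function names, region labels and coordinates are hypothetical.

```python
import numpy as np


def hyperbolic_radius(Y):
    """Geodesic distance from the disk center for each row of Y (Poincaré disk, curvature -1)."""
    return 2.0 * np.arctanh(np.linalg.norm(Y, axis=1))


def median_radius_by_group(Y, regions, cell_types):
    """Median hyperbolic radius for every (region, cell type) pair present in the data."""
    r = hyperbolic_radius(Y)
    regions = np.asarray(regions)
    cell_types = np.asarray(cell_types)
    medians = {}
    for reg in np.unique(regions):
        for ct in np.unique(cell_types):
            mask = (regions == reg) & (cell_types == ct)
            if mask.any():
                medians[(str(reg), str(ct))] = float(np.median(r[mask]))
    return medians


# Hypothetical toy coordinates in the disk.
Y = np.array([
    [0.2, 0.0], [0.0, 0.3], [-0.4, 0.0],   # region_A, MOL1-4
    [0.6, 0.0], [0.0, -0.7], [0.65, 0.0],  # region_A, MOL5-6
    [0.5, 0.0],                            # region_B, MOL1-4
    [0.0, 0.8],                            # region_B, MOL5-6
])
regions = ["region_A"] * 6 + ["region_B"] * 2
cell_types = ["MOL1-4"] * 3 + ["MOL5-6"] * 3 + ["MOL1-4", "MOL5-6"]

medians = median_radius_by_group(Y, regions, cell_types)
for key, value in sorted(medians.items()):
    print(key, round(value, 3))
```

Comparing the resulting medians across regions and cell types gives the kind of ordering described above: MOL5-6 versus MOL1-4 within a region, and each cell type across regions.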
These results show that the expression patterns of the same mature cell types are region-dependent in an organized way, and they may provide new insights into cell types across regions during differentiation. The quality of the embeddings is also evaluated by distance preservation: the h-SNE embedding performs best, with a correlation of R = 0.78 in the distance plots.

Nitrogen (N) is an essential element for plant growth. The application of N fertilizers has resulted in N losses from agricultural systems into groundwater, rivers, coastal waters and the atmosphere. Nitrate leaching and nitrous oxide (N2O) emissions from agricultural soils are recognized as significant environmental threats. Nitrate leaching into rivers and estuarine ecosystems is responsible for algal blooms and eutrophication and poses a public health risk. The greenhouse gas N2O is produced mainly during nitrification and denitrification. Nitrate leaching and N2O production from orchards have not been widely studied. If an orchard is located on light-textured, free-draining soils and receives a high N input, the potential for leaching can be high. Our objective was to evaluate the environmental impact, in terms of NO3⁻ leaching and N2O emissions, of upland blueberry cultivation with two different soil organic amendments. A 40 L container was equipped with a 5 cm thick quartz-grain drainage layer at the bottom and a 5 cm layer of pine bark mulch on top of the medium. A completely randomized block design with three replications was used for the following treatments: PM, Tateyama brown forest soil + peat moss; SC, soil + sawdust sewage sludge compost with 5 g ferrous sulfate per L of soil; and SO, soil only. This study used rabbiteye blueberry cv. 'Tifblue'. Ammonium sulfate was applied at a rate of 134 kg/ha, divided into two applications: 45 kg/ha in July 2008 and 89 kg/ha in March 2009. Soil and plant samples were collected during the third ten-day period of each month throughout the growing season.
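As a quick arithmetic check of the fertilizer schedule, the snippet below confirms that the two applications add up to the stated total and expresses each rate per square meter (1 ha = 10,000 m²). The text does not say whether the rates refer to the ammonium sulfate product or to elemental N, so the figures are used exactly as given. Converting them to a per-container dose would also require the container's surface area, which is not reported. The names in the snippet are illustrative.

```python
# Ammonium sulfate applications as reported in the text (kg/ha).
applications_kg_per_ha = {"July 2008": 45.0, "March 2009": 89.0}

M2_PER_HA = 10_000  # 1 ha = 10,000 m^2


def kg_per_ha_to_g_per_m2(rate_kg_per_ha):
    """Convert an application rate from kg/ha to g/m^2."""
    return rate_kg_per_ha * 1000.0 / M2_PER_HA


total = sum(applications_kg_per_ha.values())
for date, rate in applications_kg_per_ha.items():
    share = 100.0 * rate / total
    print(f"{date}: {rate:.0f} kg/ha = {kg_per_ha_to_g_per_m2(rate):.1f} g/m^2 ({share:.1f}% of total)")
print(f"total: {total:.0f} kg/ha = {kg_per_ha_to_g_per_m2(total):.1f} g/m^2")
```

The split places roughly one third of the total in the July application and two thirds in the March application, and the full 134 kg/ha corresponds to 13.4 g/m².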