{"id":1982,"date":"2025-02-21T07:02:31","date_gmt":"2025-02-21T07:02:31","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/02\/21\/unraveling-spatially-variable-genes-a-statistical-perspective-on-spatial-transcriptomics\/"},"modified":"2025-02-21T07:02:31","modified_gmt":"2025-02-21T07:02:31","slug":"unraveling-spatially-variable-genes-a-statistical-perspective-on-spatial-transcriptomics","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/02\/21\/unraveling-spatially-variable-genes-a-statistical-perspective-on-spatial-transcriptomics\/","title":{"rendered":"Unraveling Spatially Variable Genes: A Statistical Perspective on Spatial Transcriptomics"},"content":{"rendered":"<div>\n<p class=\"wp-block-shortcode\">[<\/p>\n<p class=\"wp-block-paragraph\"><em>The article was written by Guanao Yan, Ph.D. student of Statistics and Data Science at UCLA. Guanao is the first author of the Nature Communications review article [1].<\/em><\/p>\n<p class=\"wp-block-paragraph\">Spatially resolved transcriptomics (SRT) is revolutionizing <a href=\"https:\/\/towardsdatascience.com\/tag\/genomics\/\" title=\"Genomics\">Genomics<\/a> by enabling the high-throughput measurement of gene expression while preserving spatial context. Unlike single-cell RNA sequencing (scRNA-seq), which captures transcriptomes without spatial location information, SRT allows researchers to map gene expression to precise locations within a tissue, providing insights into tissue organization, cellular interactions, and spatially coordinated gene activity. 
The increasing volume and complexity of SRT data necessitate the development of robust statistical and computational methods, making this field highly relevant to data scientists, statisticians, and machine learning (ML) professionals. Techniques such as spatial statistics, graph-based models, and deep learning have been applied to extract meaningful biological insights from these data.<\/p>\n<p class=\"wp-block-paragraph\">A key step in SRT analysis is the detection of spatially variable genes (SVGs)\u2014genes whose expression varies non-randomly across spatial locations. Identifying SVGs is crucial for characterizing tissue architecture, functional gene modules, and cellular heterogeneity. However, despite the rapid development of computational methods for SVG detection, these methods vary widely in their definitions and statistical frameworks, leading to inconsistent results and challenges in interpretation.<\/p>\n<p class=\"wp-block-paragraph\">In our recent review published in <em>Nature Communications <\/em>[1], we systematically examined 34 peer-reviewed SVG detection methods and introduced a classification framework that clarifies the biological significance of different SVG types. 
This article provides an overview of our findings, focusing on the three major categories of SVGs and the statistical principles underlying their detection.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"dad8d8\" data-has-transparency=\"false\" style=\"--dominant-color: #dad8d8;\" fetchpriority=\"high\" decoding=\"async\" width=\"512\" height=\"225\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-13.png?resize=512%2C225&#038;ssl=1\" alt=\"\" class=\"wp-image-598262 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-13.png 512w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-13-300x132.png 300w\" sizes=\"(max-width: 512px) 100vw, 512px\"><\/figure>\n<p class=\"wp-block-paragraph\">SVG detection methods aim to uncover genes whose spatial expression reflects biological patterns rather than technical noise. Based on our review of 34 peer-reviewed methods, we categorize SVGs into three groups: Overall SVGs, Cell-Type-Specific SVGs, and Spatial-Domain-Marker SVGs (Figure 2).<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" data-dominant-color=\"f2f3f2\" data-has-transparency=\"false\" style=\"--dominant-color: #f2f3f2;\" decoding=\"async\" width=\"512\" height=\"236\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-14.png?resize=512%2C236&#038;ssl=1\" alt=\"\" class=\"wp-image-598263 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-14.png 512w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-14-300x138.png 300w\" sizes=\"(max-width: 512px) 100vw, 512px\"><figcaption class=\"wp-element-caption\">Image created by the authors, adapted from [1]. Publication timeline of 34 SVG detection methods. 
Colors represent three SVG categories: overall SVGs (green), cell-type-specific SVGs (red), and spatial-domain-marker SVGs (purple).<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Methods for detecting the three SVG categories serve different purposes (Figure 3). First, the detection of overall SVGs screens informative genes for downstream analyses, including the identification of spatial domains and functional gene modules. Second, detecting cell-type-specific SVGs aims to reveal spatial variation within a cell type and help identify distinct cell subpopulations or states within cell types. Third, spatial-domain-marker SVG detection is used to find marker genes to annotate and interpret previously detected spatial domains. These markers help explain the molecular mechanisms underlying spatial domains and assist in annotating tissue layers in other datasets.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" data-dominant-color=\"bbb57e\" data-has-transparency=\"true\" style=\"--dominant-color: #bbb57e;\" decoding=\"async\" width=\"512\" height=\"146\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-15.png?resize=512%2C146&#038;ssl=1\" alt=\"\" class=\"wp-image-598264 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-15.png 512w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-15-300x86.png 300w\" sizes=\"(max-width: 512px) 100vw, 512px\"><figcaption class=\"wp-element-caption\">Image created by the authors, adapted from [1]. Conceptual visualization of three SVG categories: overall SVGs, cell-type-specific SVGs, and spatial-domain-marker SVGs. The left column shows a tissue slice with two cell types and three spatial domains. 
The right column shows exemplar genes with colors representing the expression levels shown for an overall SVG, a cell-type-specific SVG, and a spatial-domain-marker SVG, respectively.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">The relationship among the three SVG categories depends on the detection methods, particularly the null and alternative hypotheses they employ. If an overall SVG detection method uses the null hypothesis that a non-SVG\u2019s expression is independent of spatial location and the alternative hypothesis that any deviation from this independence indicates an SVG, then its SVGs should theoretically include both cell-type-specific SVGs and spatial-domain-marker SVGs. For example, DESpace [2] is a method that detects both overall SVGs and spatial-domain-marker SVGs, and its detected overall SVGs must be marker genes for some spatial domains. This inclusion relationship holds true except in extreme scenarios, such as when a gene exhibits opposite cell-type-specific spatial patterns that effectively cancel each other out. 
However, if an overall SVG detection method\u2019s alternative hypothesis is defined for a specific spatial expression pattern, then its SVGs may not include some cell-type-specific SVGs or spatial-domain-marker SVGs.<\/p>\n<p class=\"wp-block-paragraph\">To understand how SVGs are detected, we categorized the statistical approaches into three major types of hypothesis tests:\u00a0<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Dependence Test \u2013 Examines the dependence between a gene\u2019s expression level and the spatial location.\u00a0<\/li>\n<li class=\"wp-block-list-item\">Regression Fixed-Effect Test \u2013 Examines whether some or all of the fixed-effect covariates, for instance, spatial location, contribute to the mean of the response variable, i.e., a gene\u2019s expression.\u00a0<\/li>\n<li class=\"wp-block-list-item\">Regression Random-Effect Test (Variance Component Test) \u2013 Examines whether the random-effect covariates, for instance, spatial location, contribute to the variance of the response variable, i.e., a gene\u2019s expression.<\/li>\n<\/ol>\n<p class=\"wp-block-paragraph\">To further explain how these tests are used for SVG detection, we denote a gene\u2019s expression level by <i>Y<\/i> and the spatial locations by <i>S<\/i>. The dependence test is the most general hypothesis test for SVG detection. 
For a given gene, it tests whether the gene\u2019s expression level <i>Y<\/i> is independent of the spatial location <i>S<\/i>, i.e., the null hypothesis is:<\/p>\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" data-dominant-color=\"000000\" data-has-transparency=\"true\" loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"114\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-16.png?resize=512%2C114&#038;ssl=1\" alt=\"\" class=\"wp-image-598265 has-transparency\" style=\"--dominant-color: #000000; width:195px;height:auto\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-16.png 512w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-16-300x67.png 300w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\"><\/figure>\n<p class=\"wp-block-paragraph\">There are two types of regression tests: fixed-effect tests, where the effect of the spatial location is assumed to be fixed, and random-effect tests, where the effect of the spatial location is assumed to be random. 
To explain these two types of tests, we use a linear mixed model for a given gene as an example:<\/p>\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" data-dominant-color=\"000000\" data-has-transparency=\"true\" loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"53\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-17.png?resize=512%2C53&#038;ssl=1\" alt=\"\" class=\"wp-image-598266 has-transparency\" style=\"--dominant-color: #000000; width:453px;height:auto\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-17.png 512w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-17-300x31.png 300w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\"><\/figure>\n<p class=\"wp-block-html\">\nwhere the response variable \\(Y_i\\) is the gene\u2019s expression level at spot \\(i\\),<br \/>\n\\(x_i \\in \\mathbb{R}^p\\) indicates the fixed-effect covariates of spot \\(i\\),<br \/>\n\\(z_i \\in \\mathbb{R}^q\\) denotes the random-effect covariates of spot \\(i\\),<br \/>\nand \\(\\epsilon_i\\) is the random measurement error at spot \\(i\\) with zero mean. 
<\/p>\n<p>Among the model parameters, \\(\\beta_0\\) is the (fixed) intercept, \\(\\beta \\in \\mathbb{R}^p\\) indicates the fixed effects, and \\(\\gamma \\in \\mathbb{R}^q\\) denotes the random effects with zero means and the covariance matrix:\n<\/p>\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" data-dominant-color=\"e8e8e8\" data-has-transparency=\"false\" loading=\"lazy\" decoding=\"async\" width=\"264\" height=\"50\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-18.png?resize=264%2C50&#038;ssl=1\" alt=\"\" class=\"wp-image-598269 not-transparent\" style=\"--dominant-color: #e8e8e8; width:174px;height:auto\"><\/figure>\n<p class=\"wp-block-paragraph\">In this linear mixed model, independence is assumed between the random effects and the random errors, and among the random errors.<\/p>\n<p class=\"wp-block-html\">Fixed-effect tests examine whether some or all of the fixed-effect covariates \\(x_i\\) (dependent on spatial locations <i>S<\/i>) contribute to the mean of the response variable. 
If all fixed-effect covariates make no contribution, then:<\/p>\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" data-dominant-color=\"000000\" data-has-transparency=\"true\" loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"79\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-19.png?resize=512%2C79&#038;ssl=1\" alt=\"\" class=\"wp-image-598270 has-transparency\" style=\"--dominant-color: #000000; width:222px;height:auto\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-19.png 512w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-19-300x46.png 300w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\"><\/figure>\n<p class=\"wp-block-paragraph\">The null hypothesis<\/p>\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" data-dominant-color=\"000000\" data-has-transparency=\"true\" loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"130\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-20.png?resize=512%2C130&#038;ssl=1\" alt=\"\" class=\"wp-image-598271 has-transparency\" style=\"--dominant-color: #000000; width:141px;height:auto\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-20.png 512w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-20-300x76.png 300w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\"><\/figure>\n<p class=\"wp-block-paragraph\">implies<\/p>\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" data-dominant-color=\"000000\" data-has-transparency=\"true\" loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"44\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-21.png?resize=512%2C44&#038;ssl=1\" alt=\"\" class=\"wp-image-598272 has-transparency\" style=\"--dominant-color: #000000; 
width:401px;height:auto\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-21.png 512w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-21-300x26.png 300w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\"><\/figure>\n<p class=\"wp-block-html\">Random-effect tests examine whether the random-effect covariates \\(z_i\\) (dependent on spatial locations <i>S<\/i>) contribute to the variance of the response variable \\(\\operatorname{Var}(Y_i)\\), focusing on the decomposition: <\/p>\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" data-dominant-color=\"e2e2e2\" data-has-transparency=\"false\" loading=\"lazy\" decoding=\"async\" width=\"389\" height=\"28\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-22.png?resize=389%2C28&#038;ssl=1\" alt=\"\" class=\"wp-image-598273 not-transparent\" style=\"--dominant-color: #e2e2e2; width:525px;height:auto\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-22.png 389w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-22-300x22.png 300w\" sizes=\"auto, (max-width: 389px) 100vw, 389px\"><\/figure>\n<p class=\"wp-block-paragraph\">and testing if the contribution of the random-effect covariates\u00a0is zero. 
The null hypothesis:<\/p>\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" data-dominant-color=\"ececec\" data-has-transparency=\"false\" loading=\"lazy\" decoding=\"async\" width=\"77\" height=\"28\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-23.png?resize=77%2C28&#038;ssl=1\" alt=\"\" class=\"wp-image-598274 not-transparent\" style=\"--dominant-color: #ececec; width:122px;height:auto\"><\/figure>\n<p class=\"wp-block-paragraph\">implies<\/p>\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" data-dominant-color=\"ececec\" data-has-transparency=\"false\" loading=\"lazy\" decoding=\"async\" width=\"212\" height=\"34\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/unnamed-24.png?resize=212%2C34&#038;ssl=1\" alt=\"\" class=\"wp-image-598275 not-transparent\" style=\"--dominant-color: #ececec; width:285px;height:auto\"><\/figure>\n<p class=\"wp-block-paragraph\">Among the 23 methods that use frequentist hypothesis tests, dependence tests and random-effect regression tests have been primarily applied to detect overall SVGs, whereas fixed-effect regression tests have been used across all three SVG categories. Understanding these distinctions is key to selecting the right method for specific research questions.<\/p>\n<p class=\"wp-block-paragraph\">Improving SVG detection methods requires balancing detection power, specificity, and scalability while addressing key challenges in spatial transcriptomics analysis. Future developments should focus on adapting methods to different SRT technologies and tissue types, as well as extending support for multi-sample SRT data to enhance biological insights. Additionally, strengthening statistical rigor and validation frameworks will be crucial for ensuring the reliability of SVG detection. 
Benchmarking studies also need refinement, with clearer evaluation metrics and standardized datasets to provide robust method comparisons.<\/p>\n<h3 class=\"wp-block-heading\"><strong>References<\/strong><\/h3>\n<p class=\"wp-block-paragraph\">[1] Yan, G., Hua, S.H. &amp; Li, J.J. (2025). Categorization of 34 computational methods to detect spatially variable genes from spatially resolved transcriptomics data. <em>Nature Communications<\/em>, 16, 1141. <a href=\"https:\/\/doi.org\/10.1038\/s41467-025-56080-w\">https:\/\/doi.org\/10.1038\/s41467-025-56080-w<\/a><\/p>\n<p class=\"wp-block-paragraph\">[2] Cai, P., Robinson, M.D. &amp; Tiberi, S. (2024). DESpace: spatially variable gene detection via differential expression testing of spatial clusters. <em>Bioinformatics<\/em>, 40(2). <a href=\"https:\/\/doi.org\/10.1093\/bioinformatics\/btae027\">https:\/\/doi.org\/10.1093\/bioinformatics\/btae027<\/a><\/p>\n<p class=\"wp-block-shortcode\">]<\/p>\n<p class=\"wp-block-paragraph\"><\/p>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/unraveling-spatially-variable-genes-a-statistical-perspective-on-spatial-transcriptomics\/\">Unraveling Spatially Variable Genes: A Statistical Perspective on Spatial Transcriptomics<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Jingyi Jessica Li<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/unraveling-spatially-variable-genes-a-statistical-perspective-on-spatial-transcriptomics\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Unraveling Spatially Variable Genes: A Statistical Perspective on Spatial Transcriptomics [ The article was written by Guanao Yan, Ph.D. student of Statistics and Data Science at UCLA. Guanao is the first author of the Nature Communications review article [1]. 
Spatially resolved transcriptomics (SRT) is revolutionizing Genomics by enabling the high-throughput measurement of gene expression while [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,506,1820,83,1821,533,238],"tags":[272,661,1822],"class_list":["post-1982","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-biology","category-computational-biology","category-data-science","category-genomics","category-science","category-statistics","tag-methods","tag-spatial","tag-svgs"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1982"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1982"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1982\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1982"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1982"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1982"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}