Find features with high spatial autocorrelation

This function can be used to find genes with high spatial autocorrelation in SRT data. A more detailed description of the algorithm is outlined below.

Usage

CorSpatialFeatures(object, ...)

# Default S3 method
CorSpatialFeatures(
  object,
  spatnet,
  across_all = FALSE,
  calculate_pvalue = FALSE,
  nCores = NULL,
  verbose = TRUE,
  ...
)

# S3 method for class 'Seurat'
CorSpatialFeatures(
  object,
  features = NULL,
  assay_use = NULL,
  slot_use = "data",
  across_all = FALSE,
  calculate_pvalue = FALSE,
  nCores = NULL,
  verbose = TRUE,
  ...
)

Arguments

object: An object (see details)
...: Arguments passed to other methods
spatnet: A list of spatial networks created with GetSpatialNetwork. The spots in these networks should match the spots in the feature matrix.
across_all: Boolean specifying if autocorrelation scores be calculated across all samples (default FALSE).
calculate_pvalue: Boolean specifying if p-values should be calculated for each correlation value (default FALSE). Raw and BH-adjusted p-values provided.
nCores: Number of cores to use for the spatial autocorrelation calculation
verbose: Print messages
features: A character vector with features present in Seurat object. These features need to be accessible with FetchData
assay_use: Select assay to use for computation. If not specified, the default assay will be used.
slot_use: Select slot to use from assay object.

Value

Either a list of tibbles or a tibble with feature names and correlation scores. See section Mode for details.

Spatial autocorrelation

Spatial autocorrelation is the term used to describe the presence of systematic spatial variation. Positive spatial autocorrelation of a feature is the tendency for regions that are close together in space to have similar values for that feature.

A simple example is when you have an anatomical structure or a tissue type that spans across multiple neighboring spots in an SRT experiment, for example a gland, an immune infiltrate or a region of the brain. Inside such structures, you might find that the expression levels of certain genes (or other features) are highly similar and hence these genes have a positive spatial autocorrelation.

The method provided in semla works as follows. For each feature and spot, the expression is averaged across all neighboring spots (typically the 6 closest neighbors) to produce a lag expression vector. Since this vector represents the average of the surrounding spots, we can use it to test if the expression in those spots is similar to the center spot. One simple strategy is to calculate the pearson correlation between a genes' lag vector and the original expression vector which typically captures the spatial autocorrelation well.

Method steps

Load a matrix with features in rows and spots in columns: \(X_{expr}\)
Convert the corresponding spatial network to wide format and construct a nearest neighbor matrix \(N_{neighbors}\) in which neighboring spots have a value of 1 and the remaining spots have a value of 0
\(N_{neighbors}\) is then multiplied with the \(X_{expr}\) to calculate a lag vector for each feature:

\(X_{lagexpr} = (N_{neighbors}*X_{expr})/n_{neighbors}\)

where \(n_{neighbors}\) is the number of neighbors for each spot.
The spatial autocorrelation score for a feature is the 'pearson' correlation of the lag vector and the initial expression vector:

\(spatcor_{feature} = cor(X_{lagexpr}[feature, ], X_{expr}[feature, ])\)

Default method

The default method expects a matrix-like object with features in columns and spots in rows and a list of spatial networks generated with GetSpatialNetwork.

Mode

If across_all is set to TRUE, the spatial autocorrelation scores will be computed across all samples. Otherwise, the scores will be calculated for each sample separately, and returns a list with one tibble per sample.

Author

Ludvig Larsson

Examples

# \donttest{
se_mbrain <- readRDS(system.file("extdata/mousebrain",
                                 "se_mbrain",
                                 package = "semla"))
featureMat <- FetchData(se_mbrain, vars = VariableFeatures(se_mbrain)[1:100])

coordfile <-
  system.file("extdata/mousebrain/spatial",
              "tissue_positions_list.csv",
              package = "semla")

# Load coordinate data into a tibble
xys <- setNames(read.csv(coordfile, header = FALSE),
                nm = c("barcode", "selection", "grid_y", "grid_x", "y", "x"))
xys$sampleID <- 1L
xys <- xys |>
  dplyr::filter(selection == 1) |>
  dplyr::select(barcode, x, y, sampleID) |>
  tibble::as_tibble()

# Create spatial networks
spatnet <- GetSpatialNetwork(xys)
spatgenes <- CorSpatialFeatures(featureMat, spatnet, nCores = 1)
#> 
#> ── Computing spatial autocorrelation ──
#> 
#> ℹ Sample 1:
#> →   Cleaned out spots with 0 adjacent neighbors
#> →   Computed feature lag expression
#> →   Computed feature spatial autocorrelation scores
#> ✔   Returning results

# Check genes with highest spatial autocorrelation
head(spatgenes[[1]])
#> # A tibble: 6 × 2
#>   gene     cor
#>   <chr>  <dbl>
#> 1 Mbp    0.870
#> 2 Nrgn   0.857
#> 3 Olfm1  0.798
#> 4 Slc6a3 0.798
#> 5 Trh    0.770
#> 6 Th     0.757
# }


se_mbrain <- readRDS(system.file("extdata/mousebrain",
                                 "se_mbrain",
                                 package = "semla"))
se_mbrain <- se_mbrain |>
  ScaleData() |>
  RunPCA()
#> Centering and scaling data matrix
#> PC_ 1 
#> Positive:  Nrgn, Olfm1, Cck, Nptxr, Rtn1, Snca, Nov, Tmsb4x, Lamp5, Egr1 
#> 	   Crym, Cpne6, Coro1a, Arc, Hpca, Sst, Nr4a1, Npy, Chgb, Neurod6 
#> 	   Snap25, Myh7, Uchl1, Eef1a2, Cort, Grp, Stmn2, Rprm, Spink8, Mfge8 
#> Negative:  Mbp, Plp1, Apod, Mobp, Ptgds, Mog, Mal, Mag, Cnp, Aldh1a1 
#> 	   Opalin, Tcf7l2, Ddc, Lhx1os, Slc6a3, Ret, Th, Slc18a2, Sncg, Drd2 
#> 	   Chrna6, Slc10a4, Spp1, En1, Dlk1, Calb2, Hbb-bs, Pvalb, Col1a2, Tnnt1 
#> PC_ 2 
#> Positive:  Myoc, Gfap, Col1a2, Fmod, Slc13a4, Hba-a1, Hbb-bt, Slc6a20a, Hba-a2, Hbb-bs 
#> 	   Acta2, Tagln, Mgp, Ogn, Vtn, H2-Aa, Lyz2, Cd74, Myl9, Dcn 
#> 	   Myh11, Ptgds, Cytl1, H2-Eb1, Hmgcs2, Clu, Mfge8, Emp1, Npy, Cnn1 
#> Negative:  Th, Uchl1, Slc18a2, En1, Slc10a4, Slc6a3, Chrna6, Stmn2, Ret, Snap25 
#> 	   Drd2, Dlk1, Ddc, Scg2, Sncg, Eef1a2, Rtn1, Chga, Pcp4, Calb2 
#> 	   Mobp, Mbp, Chgb, Fabp5, Plp1, Pvalb, Mog, Mal, Mag, Snca 
#> PC_ 3 
#> Positive:  Trbc2, Arc, Egr1, Myl4, Nr4a1, Mbp, Mobp, Pvalb, Plp1, Opalin 
#> 	   Mog, Mal, Cnp, Mag, Snap25, Lamp5, Ighm, Tcf7l2, Cplx3, Tgm3 
#> 	   Ighg2c, Pcp4, Hpca, Ly6d, Eef1a2, Tnnt1, Chga, Neurod6, Prph, Ctgf 
#> Negative:  Nnat, Slc18a2, Dlk1, Slc6a3, Slc10a4, En1, Th, Chrna6, Dcn, Sncg 
#> 	   Cpne7, Drd2, Ddc, Ret, Hpcal1, Ecel1, Cpne6, Col1a2, Trh, Calb2 
#> 	   Cd24a, Fmod, Mgp, Snca, Lypd1, Slc13a4, Fibcd1, Crym, Spink8, Slc6a20a 
#> PC_ 4 
#> Positive:  Nr4a1, Arc, Lamp5, Egr1, Myl4, Trbc2, Chrna6, Tagln, En1, Snap25 
#> 	   Th, Slc18a2, Acta2, Col1a2, Myh11, Fmod, Hba-a2, Slc10a4, Slc6a3, Slc13a4 
#> 	   Ret, Tgm3, Hbb-bt, Slc6a20a, Ighm, Hbb-bs, Hba-a1, Vtn, Mgp, Drd2 
#> Negative:  Spink8, Fibcd1, Tmsb4x, Nnat, Lefty1, Crym, Cpne7, Nos1, Dcn, Cpne6 
#> 	   Grp, Homer3, Htr3a, Tac1, Fabp5, C1ql2, Trh, Opalin, Mog, Plp1 
#> 	   Mag, Calb2, Ecel1, Gfap, Cnp, Tcf7l2, Lypd1, Vgll3, Hpcal1, Mal 
#> PC_ 5 
#> Positive:  Pcp4, Tcf7l2, Snap25, Uchl1, Eef1a2, Chga, Stmn2, Lhx1os, Calb2, Scg2 
#> 	   Fabp5, Bok, Prkcd, Pvalb, Cartpt, Chgb, Tnnt1, Gpx3, Slc20a1, Rtn1 
#> 	   C1ql2, Spp1, Hpcal1, Ptgds, Pitx2, Lypd1, Aldh1a1, Ecel1, Vtn, Nme7 
#> Negative:  En1, Chrna6, Th, Slc18a2, Slc10a4, Ddc, Lefty1, Arc, Slc6a3, Neurod6 
#> 	   Grp, Nov, Lamp5, Fibcd1, Nr4a1, Drd2, Spink8, Vip, Myl4, Ret 
#> 	   Egr1, Dlk1, Nrgn, Gfap, Trbc2, Tgm3, Myh7, Tac2, Sncg, Npy 

# Compute spatial autocorrelation for variable features
spatgenes <- CorSpatialFeatures(se_mbrain,
                                features = VariableFeatures(se_mbrain),
                                nCores = 1)
#> 
#> ── Computing spatial autocorrelation ──
#> 
#> ℹ Sample 1:
#> →   Cleaned out spots with 0 adjacent neighbors
#> →   Computed feature lag expression
#> →   Computed feature spatial autocorrelation scores
#> ✔   Returning results

# Check genes with highest spatial autocorrelation
head(spatgenes[[1]])
#> # A tibble: 6 × 2
#>   gene     cor
#>   <chr>  <dbl>
#> 1 Mbp    0.870
#> 2 Nrgn   0.857
#> 3 Cck    0.813
#> 4 Olfm1  0.798
#> 5 Slc6a3 0.798
#> 6 Trh    0.770

# Note that the top variable genes are blood related (hemoglobin genes)
# These genes have lower spatial autocorrelation since blood vessels
# typically only cover a few spots and more randomly dispersed throughput the tissue
head(VariableFeatures(se_mbrain))
#> [1] "Hbb-bs" "Hba-a1" "Hba-a2" "Hbb-bt" "Slc6a3" "Th"    

# \donttest{
# The same principle can be used to estimate spatial autocorrelation for other features,
# for example dimensionality reduction vectors
spatpcs <- CorSpatialFeatures(se_mbrain,
                             features = paste0("PC_", 1:10),
                             nCores = 1)
#> 
#> ── Computing spatial autocorrelation ──
#> 
#> ℹ Sample 1:
#> →   Cleaned out spots with 0 adjacent neighbors
#> →   Computed feature lag expression
#> →   Computed feature spatial autocorrelation scores
#> ✔   Returning results

# Calculate spatial autocorrelation scores for principal components
head(spatpcs[[1]])
#> # A tibble: 6 × 2
#>   gene    cor
#>   <chr> <dbl>
#> 1 PC_1  0.934
#> 2 PC_3  0.890
#> 3 PC_4  0.843
#> 4 PC_2  0.822
#> 5 PC_7  0.768
#> 6 PC_5  0.739

# Compute spatial autocorrelation scores for multiple datasets
se_mcolon <- readRDS(system.file("extdata/mousecolon",
                                 "se_mcolon",
                                 package = "semla"))
se_merged <- MergeSTData(se_mbrain, se_mcolon) |>
  FindVariableFeatures()

spatgenes <- CorSpatialFeatures(se_merged,
                                features = VariableFeatures(se_merged),
                                nCores = 1)
#> 
#> ── Computing spatial autocorrelation ──
#> 
#> ℹ Sample 1:
#> →   Cleaned out spots with 0 adjacent neighbors
#> →   Computed feature lag expression
#> →   Computed feature spatial autocorrelation scores
#> ℹ Sample 2:
#> →   Cleaned out spots with 0 adjacent neighbors
#> →   Computed feature lag expression
#> →   Computed feature spatial autocorrelation scores
#> ✔   Returning results

# Check spatial autocorrelation scores mouse brain data
head(spatgenes[[1]])
#> # A tibble: 6 × 2
#>   gene     cor
#>   <chr>  <dbl>
#> 1 Mbp    0.870
#> 2 Nrgn   0.857
#> 3 Cck    0.813
#> 4 Olfm1  0.798
#> 5 Slc6a3 0.798
#> 6 Trh    0.770
# Check spatial autocorrelation scores mouse colon data
head(spatgenes[[2]])
#> # A tibble: 6 × 2
#>   gene    cor
#>   <chr> <dbl>
#> 1 Prdx6 0.701
#> 2 Car1  0.616
#> 3 Cnn1  0.562
#> 4 Tgm3  0.556
#> 5 Acta2 0.536
#> 6 Tagln 0.530
# }