Find features with high spatial autocorrelation
cor-features.Rd
This function can be used to find genes with high spatial autocorrelation in SRT data. A more detailed description of the algorithm is outlined below.
Usage
CorSpatialFeatures(object, ...)
# Default S3 method
CorSpatialFeatures(
object,
spatnet,
across_all = FALSE,
calculate_pvalue = FALSE,
nCores = NULL,
verbose = TRUE,
...
)
# S3 method for class 'Seurat'
CorSpatialFeatures(
object,
features = NULL,
assay_use = NULL,
slot_use = "data",
across_all = FALSE,
calculate_pvalue = FALSE,
nCores = NULL,
verbose = TRUE,
...
)
Arguments
- object
An object (see details)
- ...
Arguments passed to other methods
- spatnet
A list of spatial networks created with
GetSpatialNetwork
. The spots in these networks should match the spots in the feature matrix.- across_all
Boolean specifying if autocorrelation scores be calculated across all samples (default FALSE).
- calculate_pvalue
Boolean specifying if p-values should be calculated for each correlation value (default FALSE). Raw and BH-adjusted p-values provided.
- nCores
Number of cores to use for the spatial autocorrelation calculation
- verbose
Print messages
- features
A character vector with features present in
Seurat
object. These features need to be accessible withFetchData
- assay_use
Select assay to use for computation. If not specified, the default assay will be used.
- slot_use
Select slot to use from assay object.
Value
Either a list of tibbles or a tibble with feature names and correlation scores. See section Mode for details.
Spatial autocorrelation
Spatial autocorrelation is the term used to describe the presence of systematic spatial variation. Positive spatial autocorrelation of a feature is the tendency for regions that are close together in space to have similar values for that feature.
A simple example is when you have an anatomical structure or a tissue type that spans across multiple neighboring spots in an SRT experiment, for example a gland, an immune infiltrate or a region of the brain. Inside such structures, you might find that the expression levels of certain genes (or other features) are highly similar and hence these genes have a positive spatial autocorrelation.
The method provided in semla
works as follows. For each feature and spot,
the expression is averaged across all neighboring spots (typically the 6 closest neighbors)
to produce a lag expression vector. Since this vector represents the average of the surrounding
spots, we can use it to test if the expression in those spots is similar to the center spot.
One simple strategy is to calculate the pearson correlation between a genes' lag vector and
the original expression vector which typically captures the spatial autocorrelation well.
Method steps
Load a matrix with features in rows and spots in columns: \(X_{expr}\)
Convert the corresponding spatial network to wide format and construct a nearest neighbor matrix \(N_{neighbors}\) in which neighboring spots have a value of 1 and the remaining spots have a value of 0
\(N_{neighbors}\) is then multiplied with the \(X_{expr}\) to calculate a lag vector for each feature:
\(X_{lagexpr} = (N_{neighbors}*X_{expr})/n_{neighbors}\)
where \(n_{neighbors}\) is the number of neighbors for each spot.The spatial autocorrelation score for a feature is the 'pearson' correlation of the lag vector and the initial expression vector:
\(spatcor_{feature} = cor(X_{lagexpr}[feature, ], X_{expr}[feature, ])\)
Default method
The default method expects a matrix-like object with features in columns and spots in rows
and a list of spatial networks generated with GetSpatialNetwork
.
Mode
If across_all
is set to TRUE
, the spatial autocorrelation scores will be computed
across all samples. Otherwise, the scores will be calculated for each sample separately, and returns
a list with one tibble
per sample.
See also
Other network-methods:
GetSpatialNetwork()
Other spatial-methods:
CutSpatialNetwork()
,
DisconnectRegions()
,
GetSpatialNetwork()
,
RadialDistance()
,
RegionNeighbors()
,
RunLabelAssortativityTest()
,
RunLocalG()
,
RunNeighborhoodEnrichmentTest()
Examples
# \donttest{
se_mbrain <- readRDS(system.file("extdata/mousebrain",
"se_mbrain",
package = "semla"))
featureMat <- FetchData(se_mbrain, vars = VariableFeatures(se_mbrain)[1:100])
coordfile <-
system.file("extdata/mousebrain/spatial",
"tissue_positions_list.csv",
package = "semla")
# Load coordinate data into a tibble
xys <- setNames(read.csv(coordfile, header = FALSE),
nm = c("barcode", "selection", "grid_y", "grid_x", "y", "x"))
xys$sampleID <- 1L
xys <- xys |>
dplyr::filter(selection == 1) |>
dplyr::select(barcode, x, y, sampleID) |>
tibble::as_tibble()
# Create spatial networks
spatnet <- GetSpatialNetwork(xys)
spatgenes <- CorSpatialFeatures(featureMat, spatnet, nCores = 1)
#>
#> ── Computing spatial autocorrelation ──
#>
#> ℹ Sample 1:
#> → Cleaned out spots with 0 adjacent neighbors
#> → Computed feature lag expression
#> → Computed feature spatial autocorrelation scores
#> ✔ Returning results
# Check genes with highest spatial autocorrelation
head(spatgenes[[1]])
#> # A tibble: 6 × 2
#> gene cor
#> <chr> <dbl>
#> 1 Mbp 0.870
#> 2 Nrgn 0.857
#> 3 Olfm1 0.798
#> 4 Slc6a3 0.798
#> 5 Trh 0.770
#> 6 Th 0.757
# }
se_mbrain <- readRDS(system.file("extdata/mousebrain",
"se_mbrain",
package = "semla"))
se_mbrain <- se_mbrain |>
ScaleData() |>
RunPCA()
#> Centering and scaling data matrix
#> PC_ 1
#> Positive: Nrgn, Olfm1, Cck, Nptxr, Rtn1, Snca, Nov, Tmsb4x, Lamp5, Egr1
#> Crym, Cpne6, Coro1a, Arc, Hpca, Sst, Nr4a1, Npy, Chgb, Neurod6
#> Snap25, Myh7, Uchl1, Eef1a2, Cort, Grp, Stmn2, Rprm, Spink8, Mfge8
#> Negative: Mbp, Plp1, Apod, Mobp, Ptgds, Mog, Mal, Mag, Cnp, Aldh1a1
#> Opalin, Tcf7l2, Ddc, Lhx1os, Slc6a3, Ret, Th, Slc18a2, Sncg, Drd2
#> Chrna6, Slc10a4, Spp1, En1, Dlk1, Calb2, Hbb-bs, Pvalb, Col1a2, Tnnt1
#> PC_ 2
#> Positive: Myoc, Gfap, Col1a2, Fmod, Slc13a4, Hba-a1, Hbb-bt, Slc6a20a, Hba-a2, Hbb-bs
#> Acta2, Tagln, Mgp, Ogn, Vtn, H2-Aa, Lyz2, Cd74, Myl9, Dcn
#> Myh11, Ptgds, Cytl1, H2-Eb1, Hmgcs2, Clu, Mfge8, Emp1, Npy, Cnn1
#> Negative: Th, Uchl1, Slc18a2, En1, Slc10a4, Slc6a3, Chrna6, Stmn2, Ret, Snap25
#> Drd2, Dlk1, Ddc, Scg2, Sncg, Eef1a2, Rtn1, Chga, Pcp4, Calb2
#> Mobp, Mbp, Chgb, Fabp5, Plp1, Pvalb, Mog, Mal, Mag, Snca
#> PC_ 3
#> Positive: Trbc2, Arc, Egr1, Myl4, Nr4a1, Mbp, Mobp, Pvalb, Plp1, Opalin
#> Mog, Mal, Cnp, Mag, Snap25, Lamp5, Ighm, Tcf7l2, Cplx3, Tgm3
#> Ighg2c, Pcp4, Hpca, Ly6d, Eef1a2, Tnnt1, Chga, Neurod6, Prph, Ctgf
#> Negative: Nnat, Slc18a2, Dlk1, Slc6a3, Slc10a4, En1, Th, Chrna6, Dcn, Sncg
#> Cpne7, Drd2, Ddc, Ret, Hpcal1, Ecel1, Cpne6, Col1a2, Trh, Calb2
#> Cd24a, Fmod, Mgp, Snca, Lypd1, Slc13a4, Fibcd1, Crym, Spink8, Slc6a20a
#> PC_ 4
#> Positive: Nr4a1, Arc, Lamp5, Egr1, Myl4, Trbc2, Chrna6, Tagln, En1, Snap25
#> Th, Slc18a2, Acta2, Col1a2, Myh11, Fmod, Hba-a2, Slc10a4, Slc6a3, Slc13a4
#> Ret, Tgm3, Hbb-bt, Slc6a20a, Ighm, Hbb-bs, Hba-a1, Vtn, Mgp, Drd2
#> Negative: Spink8, Fibcd1, Tmsb4x, Nnat, Lefty1, Crym, Cpne7, Nos1, Dcn, Cpne6
#> Grp, Homer3, Htr3a, Tac1, Fabp5, C1ql2, Trh, Opalin, Mog, Plp1
#> Mag, Calb2, Ecel1, Gfap, Cnp, Tcf7l2, Lypd1, Vgll3, Hpcal1, Mal
#> PC_ 5
#> Positive: Pcp4, Tcf7l2, Snap25, Uchl1, Eef1a2, Chga, Stmn2, Lhx1os, Calb2, Scg2
#> Fabp5, Bok, Prkcd, Pvalb, Cartpt, Chgb, Tnnt1, Gpx3, Slc20a1, Rtn1
#> C1ql2, Spp1, Hpcal1, Ptgds, Pitx2, Lypd1, Aldh1a1, Ecel1, Vtn, Nme7
#> Negative: En1, Chrna6, Th, Slc18a2, Slc10a4, Ddc, Lefty1, Arc, Slc6a3, Neurod6
#> Grp, Nov, Lamp5, Fibcd1, Nr4a1, Drd2, Spink8, Vip, Myl4, Ret
#> Egr1, Dlk1, Nrgn, Gfap, Trbc2, Tgm3, Myh7, Tac2, Sncg, Npy
# Compute spatial autocorrelation for variable features
spatgenes <- CorSpatialFeatures(se_mbrain,
features = VariableFeatures(se_mbrain),
nCores = 1)
#>
#> ── Computing spatial autocorrelation ──
#>
#> ℹ Sample 1:
#> → Cleaned out spots with 0 adjacent neighbors
#> → Computed feature lag expression
#> → Computed feature spatial autocorrelation scores
#> ✔ Returning results
# Check genes with highest spatial autocorrelation
head(spatgenes[[1]])
#> # A tibble: 6 × 2
#> gene cor
#> <chr> <dbl>
#> 1 Mbp 0.870
#> 2 Nrgn 0.857
#> 3 Cck 0.813
#> 4 Olfm1 0.798
#> 5 Slc6a3 0.798
#> 6 Trh 0.770
# Note that the top variable genes are blood related (hemoglobin genes)
# These genes have lower spatial autocorrelation since blood vessels
# typically only cover a few spots and more randomly dispersed throughput the tissue
head(VariableFeatures(se_mbrain))
#> [1] "Hbb-bs" "Hba-a1" "Hba-a2" "Hbb-bt" "Slc6a3" "Th"
# \donttest{
# The same principle can be used to estimate spatial autocorrelation for other features,
# for example dimensionality reduction vectors
spatpcs <- CorSpatialFeatures(se_mbrain,
features = paste0("PC_", 1:10),
nCores = 1)
#>
#> ── Computing spatial autocorrelation ──
#>
#> ℹ Sample 1:
#> → Cleaned out spots with 0 adjacent neighbors
#> → Computed feature lag expression
#> → Computed feature spatial autocorrelation scores
#> ✔ Returning results
# Calculate spatial autocorrelation scores for principal components
head(spatpcs[[1]])
#> # A tibble: 6 × 2
#> gene cor
#> <chr> <dbl>
#> 1 PC_1 0.934
#> 2 PC_3 0.890
#> 3 PC_4 0.843
#> 4 PC_2 0.822
#> 5 PC_7 0.768
#> 6 PC_5 0.739
# Compute spatial autocorrelation scores for multiple datasets
se_mcolon <- readRDS(system.file("extdata/mousecolon",
"se_mcolon",
package = "semla"))
se_merged <- MergeSTData(se_mbrain, se_mcolon) |>
FindVariableFeatures()
spatgenes <- CorSpatialFeatures(se_merged,
features = VariableFeatures(se_merged),
nCores = 1)
#>
#> ── Computing spatial autocorrelation ──
#>
#> ℹ Sample 1:
#> → Cleaned out spots with 0 adjacent neighbors
#> → Computed feature lag expression
#> → Computed feature spatial autocorrelation scores
#> ℹ Sample 2:
#> → Cleaned out spots with 0 adjacent neighbors
#> → Computed feature lag expression
#> → Computed feature spatial autocorrelation scores
#> ✔ Returning results
# Check spatial autocorrelation scores mouse brain data
head(spatgenes[[1]])
#> # A tibble: 6 × 2
#> gene cor
#> <chr> <dbl>
#> 1 Mbp 0.870
#> 2 Nrgn 0.857
#> 3 Cck 0.813
#> 4 Olfm1 0.798
#> 5 Slc6a3 0.798
#> 6 Trh 0.770
# Check spatial autocorrelation scores mouse colon data
head(spatgenes[[2]])
#> # A tibble: 6 × 2
#> gene cor
#> <chr> <dbl>
#> 1 Prdx6 0.701
#> 2 Car1 0.616
#> 3 Cnn1 0.562
#> 4 Tgm3 0.556
#> 5 Acta2 0.536
#> 6 Tagln 0.530
# }