Skip to contents

In this article, we will show how to load and work with VisiumHD data using semla.

We are going to be using the FFPE mouse brain test dataset that is available at 10x Genomics website. Specifically, we are going to use the binned_outputs folder. Mouse brain, FFPE, 6.5mm.

Overview

Introduction to VisiumHD

The high-resolution VisiumHD commercialized by 10x Genomics in spring 2024 follows the same philosophy as the traditional Visium platform we are used to: spatially barcoded spots capture the transcriptome (or most of the transcriptome since it is probe based but now there is FF protocol), and the location of the transcripts can be then mapped to the tissue by the spatial sequence that is specific to each spot. While similar in philosophy, both technologies have some fundamental differences:

  1. The capture area in Visium are rounded spots distributed in a honeycomb or hexagonal pattern, while in VisiumHD they are squares distributed in a grid pattern.
  2. In the traditional Visium, there is a distance of 100um between the spot’s center, whereas in VisiumHD there is no gaps in between. All the spots are immediately next to each-other, leaving no space.
  3. In Visium, the spots are 55um in diameter, meaning that the spatial resolution for the technology is 55um. In VisiumHD, the spots/squares are 2um in side, meaning that the spatial resolution is of 2um. In theory, this should allow for a sub-cellular resolution of the transcriptome. To put this into context, there is around (55/2)2(55/2)^2 VisiumHD spots per Visium spot.

So, 3 main things regarding VisiumHD: the spots are now squares instead of circles, the spots have no empty space in between them, and the size of the spots is 2um. This is a bit ugly, but maybe it can help put these numbers into context:

How many VisiumHD spots there is in one Visium spot!
How many VisiumHD spots there is in one Visium spot!

As a side note, maybe it is more accurate to refer to VisiumHD’s capture areas as squares, but due to old habit we will still refer to them as spots.

Data sparsity

An inherited problem of high-resolution spatial transcriptomics, be that sequencing-based (like BGI’s Stereo-seq, Curio’s Slide-seq, and other alternatives) or imaging-based (like 10x Genomics’s ISS platform Xenium, VIZGEN’s MERSCOPE, and others), is that the data tends to be quite sparse: few data points per spatial region. A common approach to circumvent this issue is to bin/group the spatial regions to increase the density of the data. Nevertheless, the cost of this binning is that you lose spatial resolution. There are different ways to bin the spatial data, but the most common approaches are via cell segmentation and grid binning. So, with high-resolution data, you have the following problem: HighSpatialResolution=SparseData=Binning=NotHighSpatialResolutionAnymoreHighSpatialResolution = SparseData = Binning = NotHighSpatialResolutionAnymore.

This is a problem the field is facing, and it will be interesting to see what creative approaches the community takes in order to maximize the information while minimizing the decrease in spatial resolution. As of the time this markdown was last knitted, the approach taken by 10x Genomics by default is grid binning. What this means is that all spots that are within a certain distance of each other will be grouped under the same spot, and their expression data summed up. The spots in VisiumHD are 2um in side, so if we talk about an 8um bin we mean that the new spots are going to be squares of 8um in side: we will be adding up the expression of 4*44*4 adjacent 2um spots to generate one 8um spot.

Grid binning

In VisiumHD, the “original” count data is the count data at the 2um spatial resolution, but 10x Genomics provides several binned_outputs by default when running spaceranger: 2, 8 and 16, with the option of specifying a custom resolution when calling spaceranger. At the 8um bin, each spot is a sum of 4*44*4 2um spots; at 16um bin, each spot is a sum of 8*88*8 2um spots; and so on.

For each one of these resolutions, spaceranger returns an entire folder with the structure we are used to from the normal Visium. This folder will contain the spatial position of each spot, the count matrix, the tissue HE image and the json file required to map the spots to the HE image.

Folder structure

We can think of each subfolder in the binned_outputs directory as it’s own sample. So for each tissue section, we will have at least “3 samples”: one for the 2um resolution, another for the 8um and another for the 16um. But do not forget that all the bin sizes above 2um are made up by adding up the “original” count data, which is the 2um data.

Load data

First we need to load some 10x VisiumHD data. You can run the code chunk below or follow this link to download it manually.

tmpdir <- "." # Set current wd or change to tmpdir()
dir.create(paste0(tmpdir, "/mousebrain"))
targetdir <- paste0(tmpdir, "/mousebrain/visium-hd")
dir.create(targetdir)
destfile <- paste0(targetdir, "/mouse_brain_HD.tar.gz")
download.file("https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Brain/Visium_HD_Mouse_Brain_binned_outputs.tar.gz", destfile = destfile)
# now we decompress it
untar(paste0(targetdir, "mouse_brain_HD.tar.gz")

As mentioned above, we can consider each resolution as its separate section, and load them as such into R using semla. Let’s construct the infoTable with information regarding all the resolutions.

# directory where the data is stored
st.dir <- file.path(getwd(), "mousebrain/visium-hd/binned_outputs")
# specify resolutions to load into memory
res <- paste(0, c(2, 8, 16), "um", sep = "")
all.res <- paste(res, collapse = "|")

# retrieve the directories for each resolution
res.dir <- list.dirs(st.dir, recursive = FALSE) |> 
  stringr::str_subset(pattern = all.res)

# build the infoTable for all resolutions
infoTable <- data.frame(
  samples = list.files(res.dir,
                       full.names = TRUE, recursive = TRUE,
                       pattern = paste0("^filtered_feature.+.h5$")
  ),
  spotfiles = list.files(res.dir,
                         full.names = TRUE, recursive = TRUE,
                         pattern = "parquet$|positions.csv$"
  ),
  imgs = list.files(res.dir,
                    recursive = TRUE,
                    full.names = TRUE, pattern = "hires"
  ) |>
    str_subset(all.res),
  json = list.files(st.dir,
                    recursive = TRUE,
                    full.names = TRUE, pattern = "^scalefactors"
  ) |>
    str_subset(all.res),
  resolution = res,
  sample_ID = "mousebrain"
)

Now that we have built the infoTable, we can load the VisiumHD data into memory. Be aware that the 2um data is very big, so due to RAM issues you might not be able to work with this data locally. For example, if you want to build the infoTable to store only the 8 and 16um resolutions, you could do the following:

st.dir <- file.path(getwd(), "mousebrain/visium-hd/binned_outputs")
res <- paste(0, c(8, 16), "um", sep = "")
all.res <- paste(res, collapse = "|")

# retrieve the directories for each resolution
res.dir <- list.dirs(st.dir, recursive = FALSE) |>
  str_subset(pattern = all.res)

# build the infoTable for all resolutions
infoTable <- data.frame(
  samples = list.files(res.dir,
                       full.names = TRUE, recursive = TRUE,
                       pattern = paste0("^filtered_feature.+.h5$")
  ),
  spotfiles = list.files(res.dir,
                         full.names = TRUE, recursive = TRUE,
                         pattern = "parquet$|positions.csv$"
  ),
  imgs = list.files(res.dir,
                    recursive = TRUE,
                    full.names = TRUE, pattern = "hires"
  ) |>
    str_subset(all.res),
  json = list.files(st.dir,
                    recursive = TRUE,
                    full.names = TRUE, pattern = "^scalefactors"
  ) |>
    str_subset(all.res),
  resolution = res,
  sample_ID = "mousebrain"
)

Regardless of which resolution you want to load, with the built infoTable now we can load the data using ReadVisiumData() like you would in your standard Visium sections. If you have the 2um resolution in the infoTable, this can take a bit of time.

se.hd <- ReadVisiumData(infoTable)
se.hd
## An object of class Seurat 
## 19059 features across 492460 samples within 1 assay 
## Active assay: Spatial (19059 features, 0 variable features)

As mentioned before, now each resolution of the VisiumHD experiment is stored in the Seurat object as an individual experiment. Let’s take a look at the HE images. You can note how, even though we are labeling each image by expressing from which resolution they come from, they are exactly the same image.

se.hd <- LoadImages(se.hd)
ImagePlot(se.hd, label_by = "resolution")

This object is fully compatible with all of semla’s functions, so you can proceed normally with your analysis!

Analysis

Two of the main issues when working with high-resolution spatial transcriptomics is data sparsity and the size of the datasets. For example, do not be surprised if your standard analysis does not run at all at the higher resolutions in VisiumHD data. For standard Seurat clustering (v4 and prior), we recommend working on a high performance computing cluster.

Thus, we are going to perform the standard analysis pipeline on the 8 and 16um bins.

Seurat clustering

# 16um
se.16 <- SubsetSTData(se.hd, expression = resolution == "016um")
se.16 <-  se.16 |>
  NormalizeData() |>
  ScaleData() |>
  FindVariableFeatures() |>
  RunPCA() |>
  FindNeighbors(reduction = "pca", dims = 1:10) |>
  FindClusters(resolution = 0.2)

# 8um
se.8 <- SubsetSTData(se.hd, expression = resolution == "08um")
se.8 <-  se.8 |>
  NormalizeData() |>
  ScaleData() |>
  FindVariableFeatures() |>
  RunPCA() |>
  FindNeighbors(reduction = "pca", dims = 1:10) |>
  FindClusters(resolution = 0.2)

Non-negative matrix factorization

In the same way we can perform the standard Seurat analysis pipeline, we can also follow our tutorial for performing non-Negative matrix factorization (NMF).

library(singlet)

# 16um
se.16 <- se.16[VariableFeatures(se.16), ]
## Set seed for reproducibility
set.seed(42)
se.16 <- RunNMF(se.16)

# 8um
se.8 <- se.8[VariableFeatures(se.8), ]
## Set seed for reproducibility
set.seed(42)
se.8 <- RunNMF(se.8)

Plotting

We have done a few changes to semla’s plotting functions to properly represent this new type of data. The capture areas of a Visium slide are circles, and we represent them as such when using MapFeatures(), MapLabels() or MapMultipleFeatures(). However, in VisiumHD the capture areas are no longer circles, but rather squares. Thus, we have added a new argument for all of the aforementioned functions named shape =. This new argument allows for 3 different options:

  • shape = "point". This would be the default argument that the functions will take. It will represent the capture areas as circles.

  • shape = "tile". One of the new options, it will represent the capture areas as squares. Unless specified otherwise in the argument spot_side, this shape will represent the exact dimensions of the capture areas in respect to the tissue.

  • shape = "raster". The other new plotting option. Like tile, it will represent the capture areas as squares. The main peculiarity of this argument is that it is a faster implementation of tile, which could be useful when plotting at high resolutions. However, this argument does not accept the HE image, so one should set image_use = NULL when using this argument.

These arguments are compatible with all of the other arguments already present in the plotting functions, like crop_area and such. As a side note, even though these options make sense with VisiumHD data, you can also use them with normal Visium data in order to represent the capture areas as squares instead of the real circles.

With this information in mind, let’s take a look at the results of our analysis, starting with Seurat’s clustering:

p <- MapLabels(se.16,
               column_name = "seurat_clusters", 
               override_plot_dims = TRUE,
               shape = "tile", 
               image_use = "raw", 
               label_by = "resolution"
)
p
1. Seurat clusters identified at 16ums.
1. Seurat clusters identified at 16ums.
p <- MapLabels(se.8,
               column_name = "seurat_clusters", 
               override_plot_dims = TRUE,
               shape = "tile", 
               image_use = "raw", 
               label_by = "resolution"
)
p
2. Seurat clusters identified at 8ums.
2. Seurat clusters identified at 8ums.

Next, let’s take a quick look at the spatial distribution of some of the NMF factors computed in each resolution:

p <- MapFeatures(se.16,
                 features = "NMF_3", 
                 label_by = "resolution",
                 image_use = "raw", 
                 override_plot_dims = TRUE,
                 shape = "tile", 
                 scale_alpha = TRUE,
                 colors = c("darkblue", "cyan", "yellow", "red", "darkred")
)
p
1. NMF factors identified at 16ums.
1. NMF factors identified at 16ums.
p <- MapFeatures(se.8,
                 features = "NMF_2", 
                 label_by = "resolution",
                 image_use = "raw", 
                 override_plot_dims = TRUE,
                 shape = "tile", 
                 scale_alpha = TRUE,
                 colors = c("darkblue", "cyan", "yellow", "red", "darkred")
)
p
2. NMF factors identified at 8ums.
2. NMF factors identified at 8ums.



Package versions
  • semla: 1.1.6
Session info
## R version 4.4.0 (2024-04-24)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Stockholm
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base     
## 
## other attached packages:
## [1] stringr_1.5.1      semla_1.1.6        ggplot2_3.5.1      dplyr_1.1.4       
## [5] SeuratObject_4.1.4 Seurat_4.3.0.1    
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3     rstudioapi_0.16.0      jsonlite_1.8.9        
##   [4] magrittr_2.0.3         magick_2.8.4           spatstat.utils_3.1-0  
##   [7] farver_2.1.2           rmarkdown_2.28         fs_1.6.4              
##  [10] ragg_1.3.3             vctrs_0.6.5            ROCR_1.0-11           
##  [13] spatstat.explore_3.3-2 forcats_1.0.0          htmltools_0.5.8.1     
##  [16] sass_0.4.9             sctransform_0.4.1      parallelly_1.38.0     
##  [19] KernSmooth_2.23-24     bslib_0.8.0            htmlwidgets_1.6.4     
##  [22] desc_1.4.3             ica_1.0-3              plyr_1.8.9            
##  [25] plotly_4.10.4          zoo_1.8-12             cachem_1.1.0          
##  [28] igraph_2.0.3           mime_0.12              lifecycle_1.0.4       
##  [31] pkgconfig_2.0.3        Matrix_1.7-0           R6_2.5.1              
##  [34] fastmap_1.2.0          fitdistrplus_1.2-1     future_1.34.0         
##  [37] shiny_1.9.1            digest_0.6.37          colorspace_2.1-1      
##  [40] patchwork_1.3.0        tensor_1.5             irlba_2.3.5.1         
##  [43] textshaping_0.4.0      progressr_0.14.0       fansi_1.0.6           
##  [46] spatstat.sparse_3.1-0  httr_1.4.7             polyclip_1.10-7       
##  [49] abind_1.4-8            compiler_4.4.0         bit64_4.0.5           
##  [52] withr_3.0.1            highr_0.11             MASS_7.3-60.2         
##  [55] tools_4.4.0            lmtest_0.9-40          httpuv_1.6.15         
##  [58] future.apply_1.11.2    goftest_1.2-3          glue_1.7.0            
##  [61] dbscan_1.2-0           nlme_3.1-164           promises_1.3.0        
##  [64] grid_4.4.0             Rtsne_0.17             cluster_2.1.6         
##  [67] reshape2_1.4.4         generics_0.1.3         hdf5r_1.3.11          
##  [70] gtable_0.3.5           spatstat.data_3.1-2    tidyr_1.3.1           
##  [73] data.table_1.16.0      sp_2.1-4               utf8_1.2.4            
##  [76] spatstat.geom_3.3-2    RcppAnnoy_0.0.22       ggrepel_0.9.6         
##  [79] RANN_2.6.2             pillar_1.9.0           spam_2.10-0           
##  [82] later_1.3.2            splines_4.4.0          lattice_0.22-6        
##  [85] bit_4.0.5              renv_1.0.2             survival_3.6-4        
##  [88] deldir_2.0-4           tidyselect_1.2.1       miniUI_0.1.1.1        
##  [91] pbapply_1.7-2          knitr_1.48             gridExtra_2.3         
##  [94] scattermore_1.2        xfun_0.47              matrixStats_1.4.1     
##  [97] stringi_1.8.4          lazyeval_0.2.2         yaml_2.3.10           
## [100] evaluate_0.24.0        codetools_0.2-20       tibble_3.2.1          
## [103] BiocManager_1.30.25    cli_3.6.3              uwot_0.2.2            
## [106] arrow_17.0.0.1         xtable_1.8-4           reticulate_1.39.0     
## [109] systemfonts_1.1.0      munsell_0.5.1          jquerylib_0.1.4       
## [112] Rcpp_1.0.13            globals_0.16.3         spatstat.random_3.3-1 
## [115] zeallot_0.1.0          png_0.1-8              spatstat.univar_3.0-1 
## [118] parallel_4.4.0         assertthat_0.2.1       pkgdown_2.1.0         
## [121] dotCall64_1.1-1        listenv_0.9.1          viridisLite_0.4.2     
## [124] scales_1.3.0           ggridges_0.5.6         leiden_0.4.3.1        
## [127] purrr_1.0.2            rlang_1.1.4            cowplot_1.1.3         
## [130] shinyjs_2.1.0