Non-negative matrix factorization
Last compiled: 18 September 2024
NNMF.Rmd
Non-negative matrix factorization (NNMF or NMF) can be a useful
technique to deconvolve 10x Visium data, in particular when a reference
single-cell RNA-seq data is not available to conduct cell type
deconvolution. In this vignette we will demonstrate how you can apply
this method with semla
on spatial data.
We recommend using the singlet R package for
NNMF which handles Seurat
objects. singlet
uses the ultra fast NNMF implementation from RcppML. Both of these
packages are developed by Zach DeBruines lab and you can find more
information about their work and tools at https://github.com/zdebruine and https://www.zachdebruine.com/.
RcppML
is available on CRAN
and GitHub, and
singlet can be installed from GitHub with
devtools::install_github("zdebruine/singlet")
.
Run NMF
We’ll use a mouse brain Visium data set provided in
SeuratData
for the NNMF. The method automatically runs
cross-validation to find the best rank and learns a model at that
rank.
The factorization can be computed on all genes, but below we have filtered the data to include only the top variable features to speed up the computation.
# Here we'll use an example data set provided in SeuratData
se_mbrain <- LoadData("stxBrain", type = "anterior1")
# Update se_mbrain for compatibility with semla
se_mbrain <- UpdateSeuratForSemla(se_mbrain)
# Normalize data and find top variable features
se_mbrain <- se_mbrain |>
NormalizeData() |>
FindVariableFeatures()
# OPTIONAL: subset data to improve computational speed
se_mbrain <- se_mbrain[VariableFeatures(se_mbrain), ]
# Set seed for reproducibility
set.seed(42)
se_mbrain <- RunNMF(se_mbrain)
k <- ncol(se_mbrain@reductions$nmf@feature.loadings)
We can plot the cross-validation results and find that the optimal rank decided by the method is 22, which determines the number of factors we obtain from the NMF run.
RankPlot(se_mbrain)
Spatial visualization
The results are stored as a DimReduc
object in our
Seurat
object and we can map the factors spatially with
MapFeatures()
.
MapFeatures(se_mbrain,
features = paste0("NMF_", 1:22),
override_plot_dims = TRUE,
colors = viridis::magma(n = 11, direction = -1)) &
theme(plot.title = element_blank())
Explore gene contributions
We can also investigate the gene loadings for each factor with
PlotFeatureLoadings()
which will give us an idea about what
the top contributing genes for each factor are. Below we can see the top
30 contributing genes for NMF_1 and NMF_2.
PlotFeatureLoadings(se_mbrain,
dims = 1:2,
reduction = "nmf",
nfeatures = 30,
mode = "dotplot",
fill = "darkmagenta",
pt_size = 3)
Make composite plot
We can also map multiple factors spatially in a single plot with
MapMultipleFeatures()
. Note that the visualization will
only display the most dominant factor for each spot. See
?MapMultipleFeatures()
for details.
se_mbrain <- LoadImages(se_mbrain)
factor_colors <- c('#e6194b', '#3cb44b', '#ffe119', '#4363d8', '#f58231',
'#911eb4', '#46f0f0', '#f032e6', '#bcf60c', '#fabebe',
'#008080', '#e6beff', '#9a6324', '#fffac8', '#800000', '#aaffc3')
# Select non-overlapping factors
selected_factors <- c(1, 2, 3, 4, 5, 6, 7, 9, 10, 12, 13, 15, 16, 17, 19, 20)
MapMultipleFeatures(se_mbrain,
features = paste0("NMF_", selected_factors),
colors = factor_colors,
image_use = "raw",
override_plot_dims = TRUE,
pt_size = 2)
Explore gene contributions (heatmap)
Similarly, we can also summarize the top feature loadings for each factor with a heatmap:
PlotFeatureLoadings(se_mbrain,
dims = selected_factors,
reduction = "nmf",
nfeatures = 30,
mode = "heatmap",
gradient_colors = viridis::magma(n = 11, direction = -1))
Functional Enrichment Analysis (FEA)
If you want to get the feature loading values for each factor, you
can extract them from the DimReduc
object. This can for
example be useful to run Functional Enrichment Analysis (FEA).
# fetch feature.loadings from DimReduc object
nmf_loadings <- se_mbrain[["nmf"]]@feature.loadings
# Convert to long format and group data by factor
gene_loadings_sorted <- nmf_loadings |>
as.data.frame() |>
tibble::rownames_to_column(var = "gene") |>
as_tibble() |>
tidyr::pivot_longer(all_of(colnames(nmf_loadings)), names_to = "fctr", values_to = "loading") |>
mutate(fctr = factor(fctr, colnames(nmf_loadings))) |>
group_by(fctr) |>
arrange(fctr, -loading)
# Extract top 10 genes per factor
gene_loadings_sorted |>
slice_head(n = 10)
## # A tibble: 220 × 3
## # Groups: fctr [22]
## gene fctr loading
## <chr> <fct> <dbl>
## 1 Penk NMF_1 0.00757
## 2 Pcp4 NMF_1 0.00753
## 3 Ppp1r1b NMF_1 0.00721
## 4 Gpr88 NMF_1 0.00650
## 5 Hpca NMF_1 0.00639
## 6 Ppp3ca NMF_1 0.00608
## 7 Pde1b NMF_1 0.00608
## 8 Arpp21 NMF_1 0.00598
## 9 Nrgn NMF_1 0.00585
## 10 Pde10a NMF_1 0.00582
## # ℹ 210 more rows
Run functional enrichment analysis on top 10 genes from factor 1:
library(gprofiler2)
# Get gene sets
gene_set_nmf_1 <- gene_loadings_sorted |>
filter(fctr == "NMF_1") |>
slice_head(n = 10)
# Run FEA
fea_results_nmf_1 <- gost(query = gene_set_nmf_1$gene, ordered_query = TRUE, organism = "mmusculus", sources = "GO:BP")$result |>
as_tibble()
# Look at results
fea_results_nmf_1 |> select(p_value, term_size, query_size, intersection_size, term_name)
## # A tibble: 23 × 5
## p_value term_size query_size intersection_size term_name
## <dbl> <int> <int> <int> <chr>
## 1 0.0000825 36 7 3 response to amphetamine
## 2 0.000265 250 7 4 locomotory behavior
## 3 0.000301 188 9 4 learning
## 4 0.000302 55 7 3 response to amine
## 5 0.000384 1329 8 6 response to abiotic stimulus
## 6 0.000885 417 6 4 response to salt
## 7 0.00214 760 9 5 behavior
## 8 0.00264 324 9 4 learning or memory
## 9 0.00342 475 7 4 response to radiation
## 10 0.00401 360 9 4 cognition
## # ℹ 13 more rows
Package versions
semla
: 1.1.6RcppML
: 0.5.6singlet
: 0.99.36
Session info
## R version 4.4.0 (2024-04-24)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Stockholm
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices datasets utils methods base
##
## other attached packages:
## [1] stxBrain.SeuratData_0.1.2 ssHippo.SeuratData_3.1.4
## [3] SeuratData_0.2.1 singlet_0.99.36
## [5] RcppML_0.5.6 semla_1.1.6
## [7] ggplot2_3.5.0 dplyr_1.1.4
## [9] SeuratObject_4.1.4 Seurat_4.3.0.1
##
## loaded via a namespace (and not attached):
## [1] RcppAnnoy_0.0.22 splines_4.4.0
## [3] later_1.3.2 tibble_3.2.1
## [5] polyclip_1.10-7 lifecycle_1.0.4
## [7] globals_0.16.3 lattice_0.22-6
## [9] MASS_7.3-60.2 magrittr_2.0.3
## [11] limma_3.60.4 plotly_4.10.4
## [13] sass_0.4.9 rmarkdown_2.28
## [15] jquerylib_0.1.4 yaml_2.3.10
## [17] httpuv_1.6.15 sctransform_0.4.1
## [19] spam_2.10-0 sp_2.1-4
## [21] spatstat.sparse_3.1-0 reticulate_1.39.0
## [23] cowplot_1.1.3 pbapply_1.7-2
## [25] RColorBrewer_1.1-3 abind_1.4-8
## [27] zlibbioc_1.50.0 Rtsne_0.17
## [29] GenomicRanges_1.56.1 purrr_1.0.2
## [31] BiocGenerics_0.50.0 msigdbr_7.5.1
## [33] rappdirs_0.3.3 GenomeInfoDbData_1.2.12
## [35] IRanges_2.38.1 S4Vectors_0.42.1
## [37] ggrepel_0.9.6 irlba_2.3.5.1
## [39] listenv_0.9.1 spatstat.utils_3.1-0
## [41] goftest_1.2-3 spatstat.random_3.3-1
## [43] fitdistrplus_1.2-1 parallelly_1.38.0
## [45] pkgdown_2.1.0 leiden_0.4.3.1
## [47] codetools_0.2-20 DelayedArray_0.30.1
## [49] tidyselect_1.2.1 UCSC.utils_1.0.0
## [51] farver_2.1.2 matrixStats_1.4.1
## [53] stats4_4.4.0 spatstat.explore_3.3-2
## [55] jsonlite_1.8.8 progressr_0.14.0
## [57] ggridges_0.5.6 survival_3.6-4
## [59] systemfonts_1.1.0 dbscan_1.2-0
## [61] tools_4.4.0 ragg_1.3.3
## [63] ica_1.0-3 Rcpp_1.0.13
## [65] glue_1.7.0 gridExtra_2.3
## [67] SparseArray_1.4.8 xfun_0.47
## [69] MatrixGenerics_1.16.0 GenomeInfoDb_1.40.1
## [71] withr_3.0.1 BiocManager_1.30.25
## [73] fastmap_1.2.0 fansi_1.0.6
## [75] shinyjs_2.1.0 digest_0.6.37
## [77] R6_2.5.1 mime_0.12
## [79] textshaping_0.4.0 colorspace_2.1-1
## [81] scattermore_1.2 tensor_1.5
## [83] spatstat.data_3.1-2 utf8_1.2.4
## [85] tidyr_1.3.1 generics_0.1.3
## [87] renv_1.0.2 data.table_1.16.0
## [89] httr_1.4.7 htmlwidgets_1.6.4
## [91] S4Arrays_1.4.1 uwot_0.2.2
## [93] pkgconfig_2.0.3 gtable_0.3.5
## [95] lmtest_0.9-40 SingleCellExperiment_1.26.0
## [97] XVector_0.44.0 htmltools_0.5.8.1
## [99] dotCall64_1.1-1 fgsea_1.30.0
## [101] scales_1.3.0 Biobase_2.64.0
## [103] png_0.1-8 spatstat.univar_3.0-1
## [105] knitr_1.48 rstudioapi_0.16.0
## [107] reshape2_1.4.4 nlme_3.1-164
## [109] cachem_1.1.0 zoo_1.8-12
## [111] stringr_1.5.1 KernSmooth_2.23-24
## [113] parallel_4.4.0 miniUI_0.1.1.1
## [115] desc_1.4.3 pillar_1.9.0
## [117] grid_4.4.0 vctrs_0.6.5
## [119] RANN_2.6.2 promises_1.3.0
## [121] xtable_1.8-4 cluster_2.1.6
## [123] evaluate_0.24.0 zeallot_0.1.0
## [125] magick_2.8.4 cli_3.6.3
## [127] compiler_4.4.0 rlang_1.1.4
## [129] crayon_1.5.3 future.apply_1.11.2
## [131] plyr_1.8.9 forcats_1.0.0
## [133] fs_1.6.4 stringi_1.8.4
## [135] viridisLite_0.4.2 deldir_2.0-4
## [137] BiocParallel_1.38.0 babelgene_22.9
## [139] munsell_0.5.1 lazyeval_0.2.2
## [141] spatstat.geom_3.3-2 Matrix_1.7-0
## [143] patchwork_1.3.0 future_1.34.0
## [145] statmod_1.5.0 shiny_1.9.1
## [147] SummarizedExperiment_1.34.0 ROCR_1.0-11
## [149] igraph_2.0.3 bslib_0.8.0
## [151] fastmatch_1.1-4