Load and merge multiple gene expression matrices

Gene expression matrices should have features in rows and spots in columns.

Usage

LoadAndMergeMatrices(samplefiles, verbose = TRUE)

Arguments

samplefiles: Character vector of file or directory paths. Each path should point to an .h5 or .tsv/.tsv.gz file. Alternatively, a path can point to a directory containing barcodes.tsv, features.tsv and matrix.mtx files.
verbose: Logical; if TRUE, print messages

Value

A sparse matrix of class dgCMatrix, or a list of sparse matrices of class dgCMatrix if the input files store antibody capture data (see the IF data section)

Details

The merging process makes sure that every gene detected in any dataset is present in the merged output. This means that if a gene is missing from a certain dataset, the spots in that dataset are assigned 0 expression for that gene.
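
The zero-filling described above can be illustrated with a minimal sketch on toy data. This is not the function's actual implementation: the helper pad_genes and the toy matrices are hypothetical, and only the Matrix package is assumed.

```r
library(Matrix)

# Two toy dgCMatrix objects with partially overlapping genes (hypothetical data)
m1 <- sparseMatrix(
  i = c(1, 1, 2), j = c(1, 2, 2), x = c(1, 2, 3), dims = c(2, 2),
  dimnames = list(c("GeneA", "GeneB"), c("s1-1", "s2-1"))
)
m2 <- sparseMatrix(
  i = c(1, 2, 2), j = c(1, 1, 2), x = c(4, 5, 6), dims = c(2, 2),
  dimnames = list(c("GeneB", "GeneC"), c("s1-2", "s2-2"))
)

# Union of all genes detected in any matrix
all_genes <- union(rownames(m1), rownames(m2))

# Hypothetical helper: re-index the rows of m onto the full gene set.
# Genes absent from m get no non-zero entries, i.e. 0 expression.
pad_genes <- function(m, genes) {
  trip <- summary(m) # triplet form (i, j, x) of the non-zero entries
  sparseMatrix(
    i = match(rownames(m), genes)[trip$i],
    j = trip$j,
    x = trip$x,
    dims = c(length(genes), ncol(m)),
    dimnames = list(genes, colnames(m))
  )
}

# Column-bind the padded matrices into one merged matrix
merged <- cbind(pad_genes(m1, all_genes), pad_genes(m2, all_genes))
```

Here merged has one row per gene in the union (GeneA, GeneB, GeneC), and GeneC is 0 for every spot from the first matrix.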

Spot IDs are renamed to be unique. Usually, the spots are named something similar to: "ACGCCTGACACGCGCT-1", "TACCGATCCAACACTT-1". If a "-N" suffix is missing from the barcode IDs, it will be added.

Since spot barcodes are shared across datasets, there is a risk that some of the spot IDs will be duplicated after merging. To avoid this, the suffix (e.g. "-1") is replaced by a unique suffix for each loaded matrix: "-1", "-2", "-3", ...
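
The renaming scheme can be sketched in a few lines of base R. The helper rename_barcodes is hypothetical and only mirrors the behavior described above; the function's internal code may differ.

```r
# Hypothetical helper: give every barcode the suffix "-<index>",
# where index is the position of the matrix in the input vector.
rename_barcodes <- function(barcodes, index) {
  # Drop an existing numeric suffix such as "-1", if there is one
  stripped <- sub("-[0-9]+$", "", barcodes)
  paste0(stripped, "-", index)
}

# Barcodes from the second matrix, one of them lacking a suffix
new_ids <- rename_barcodes(c("ACGCCTGACACGCGCT-1", "TACCGATCCAACACTT"), 2)
```

Because the suffix depends only on the matrix index, the same barcode loaded from two different matrices ends up with two distinct spot IDs.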

IF data

If the provided h5 files store antibody capture data, LoadAndMergeMatrices will return a list of matrices. If multiple samples are loaded, the RNA expression matrices and antibody capture matrices will be merged and returned as separate elements of the list. Note that if one or more samples only have RNA expression data, the function will add empty values for those samples in the merged antibody capture matrix.

Examples


# Load and merge two gene expression matrices
samples <-
  c(
    system.file(
      "extdata/mousebrain",
      "filtered_feature_bc_matrix.h5",
      package = "semla"
    ),
    system.file(
      "extdata/mousecolon",
      "filtered_feature_bc_matrix.h5",
      package = "semla"
    )
  )
mergedMatrix <- LoadAndMergeMatrices(samples)
#> ℹ Loading matrices:
#> →   Finished loading expression matrix 1
#> →   Finished loading expression matrix 2
#> ! There are only 188 gene shared across all matrices:
#> →   Are you sure that the matrices share the same gene IDs?
#> →   Are the datasets from the same species?
#> 
#> ℹ Merging expression matrices:
#> ✔ There are 188 features and 5164 spots in the merged expression matrix.