最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

空间转录组数据注释分析:RCTD反卷积(Nature Biotechnology IF: 33.1)

网站源码admin2浏览0评论

空间转录组数据注释分析:RCTD反卷积(Nature Biotechnology IF: 33.1)

RCTD(Robust Cell Type Decomposition),是一种用于将单细胞RNA测序数据中的细胞类型注释转移到空间转录组学数据上的方法,于2021年2月18号发表在Nature Biotechnology杂志上,标题为《Robust decomposition of cell type mixtures in spatial transcriptomics》。RCTD 通过整合单细胞和空间转录组学数据,能够较为精确地为空间 spots 分配细胞类型,以便更好地理解空间组织结构中的基因表达情况。

参考教程链接:

0.环境准备

代码语言:javascript代码运行次数:0运行复制
# 加载R包
library(spacexr)
library(Matrix)
library(doParallel)
library(ggplot2)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  cache = TRUE,
  out.width = "100%"
)

1.读取数据

本次使用数据为 hippocampus Visium 数据,首先,使用一个已经注释好的单细胞数据海马体数据集对空间数据用RCTD进行细胞注释,然后做区域差异表达分析

full mode可以对每个spots反卷积成任意细胞数,对于Visium空间数据比较合适。

1.1 读取空间数据

空间转录组数据需要的是counts矩阵与空间坐标

  • counts:矩阵的行名应为基因,列名应代表 barcode/pixel 名称。counts应为未经转换的原始counts数据。
  • coords:一个数值型数据框(或矩阵),用于表示空间像素的位置。行名是barcode/像素名称,且应包含两列,分别用于表示“x”坐标和“y”坐标。
  • nUMI:可选,每个空间位置的总umi数
代码语言:javascript代码运行次数:0运行复制
# 包中的示例数据
# 读取counts
file.path(datadir,"counts.csv")
counts <- as.data.frame(readr::read_csv(file.path(datadir,"counts.csv"))) # load in counts matrix
counts[1:10,1:5]
rownames(counts) <- counts[,1]; 
counts <- counts[,-1] # Move first column to rownames

# 读取坐标
file.path(datadir,"coords.csv")
coords <- read.csv(file.path(datadir,"coords.csv"))
head(coords)
rownames(coords) <- coords[,1]; 
coords <- coords[,-1] # Move first column to rownames

# 计算每个位置的umi
nUMI <- colSums(counts) # In this case, total counts per pixel is nUMI
head(nUMI)

1.2 构建对象

代码语言:javascript代码运行次数:0运行复制
# 构建一个对象
puck <- SpatialRNA(coords, counts, nUMI)
puck

barcodes <- colnames(puck@counts) # pixels to be used (a list of barcode names). 
plot_puck_continuous(puck, barcodes, puck@nUMI, 
                   ylimit = c(0,round(quantile(puck@nUMI,0.9))), 
                   size = 2,
                   title ='plot of nUMI') 

空间每个spot中表达的nUMI值:

看一下构建好的SpatialRNA对象

代码语言:javascript代码运行次数:0运行复制
str(puck)

Formal class 'SpatialRNA' [package "spacexr"] with 3 slots
  ..@ coords:'data.frame':      313 obs. of  2 variables:
  .. ..$ x: int [1:313] 4125412546043047412530474005412536464484...
  .. ..$ y: int [1:313] 5477300041013895602848585684396457523069...
  ..@ counts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. ..@ i       : int [1:85720] 0123456789...
  .. .. ..@ p       : int [1:314] 0275551827110113771650192622022478...
  .. .. ..@ Dim     : int [1:2] 276313
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : chr [1:276] "Rpl7""Cox5b""Rpl31""Rpl37a"...
  .. .. .. ..$ : chr [1:313] "AAAGGGATGTAGCAAG-1""AAAGGGCAGCTTGAAT-1""AACAACTGGTAGTTGC-1""AACCCAGAGACGGAGA-1"...
  .. .. ..@ x       : num [1:85720] 8178117864915...
  .. .. ..@ factors : list()
  ..@ nUMI  : Named num [1:313] 481813002715656125844...
  .. ..- attr(*, "names")= chr [1:313] "AAAGGGATGTAGCAAG-1""AAAGGGCAGCTTGAAT-1""AACAACTGGTAGTTGC-1""AACCCAGAGACGGAGA-1"...

1.3 读取单细胞参考数据

单细胞需要的是counts矩阵,以及做好的细胞注释信息,和nUMI数据。

代码语言:javascript代码运行次数:0运行复制
### 2.读取单细胞参考数据
refdir <- system.file("extdata",'Reference/Visium_Ref',package = 'spacexr') # directory for the reference

# count值
file.path(refdir,"counts.csv")
counts <- read.csv(file.path(refdir,"counts.csv")) # load in cell types
rownames(counts) <- counts[,1]; 
counts <-  counts[,-1] # Move first column to rownames
counts[1:5,1:5]

简单看一下数据:原始counts

细胞类型:

代码语言:javascript代码运行次数:0运行复制
# 细胞类型:
file.path(refdir,"cell_types.csv")
cell_types <- read.csv(file.path(refdir,"cell_types.csv")) # load in cell types
cell_types <- setNames(cell_types[[2]], cell_types[[1]])
cell_types <- as.factor(cell_types) # convert to factor data type
head(cell_types)

nUMI数据:

代码语言:javascript代码运行次数:0运行复制
# umi
file.path(refdir,"nUMI.csv")
nUMI <- read.csv(file.path(refdir,"nUMI.csv")) # load in cell types
nUMI <- setNames(nUMI[[2]], nUMI[[1]])
head(nUMI)

参考对象:

代码语言:javascript代码运行次数:0运行复制
# 构建对象
reference <- Reference(counts, cell_types, nUMI)
str(reference)

Formal class 'Reference' [package "spacexr"] with 3 slots
  ..@ cell_types: Factor w/ 17 levels "Astrocyte","CA1",..: 1111111111...
  .. ..- attr(*, "names")= chr [1:510] "P60HippoRep6P2_GACTTGAAATGC""P60HippoRep2P2_ACTGAAGAATTN""P60HippoRep2P2_CCTTAGCCCTGG""P60HippoRep5P3_GTCAAGGCGGGC"...
  ..@ counts    :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. ..@ i       : int [1:69901] 1249111216222326...
  .. .. ..@ p       : int [1:511] 011221538456666874688010321163...
  .. .. ..@ Dim     : int [1:2] 307510
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : chr [1:307] "Acta2""Actb""Ahi1""Aldoa"...
  .. .. .. ..$ : chr [1:510] "P60HippoRep6P2_GACTTGAAATGC""P60HippoRep2P2_ACTGAAGAATTN""P60HippoRep2P2_CCTTAGCCCTGG""P60HippoRep5P3_GTCAAGGCGGGC"...
  .. .. ..@ x       : num [1:69901] 416212410111...
  .. .. ..@ factors : list()
  ..@ nUMI      : Named int [1:510] 9961199231339298845861378176814861331...
  .. ..- attr(*, "names")= chr [1:510] "P60HippoRep6P2_GACTTGAAATGC""P60HippoRep2P2_ACTGAAGAATTN""P60HippoRep2P2_CCTTAGCCCTGG""P60HippoRep5P3_GTCAAGGCGGGC"...

2.运行RCTD

数据都准备好了之后开始运行RCTD:需要注意这里的mode选择full模式

代码语言:javascript代码运行次数:0运行复制
# 运行
myRCTD <- create.RCTD(puck, reference, max_cores = 10)
myRCTD <- run.RCTD(myRCTD, doublet_mode = 'full')
saveRDS(myRCTD,'myRCTD_visium_full.rds')

看看myRCTD的结构:

代码语言:javascript代码运行次数:0运行复制
str(myRCTD)

Formal class 'RCTD' [package "spacexr"] with 9 slots
  ..@ spatialRNA        :Formal class 'SpatialRNA' [package "spacexr"] with 3 slots
  .. .. ..@ coords:'data.frame':        313 obs. of  2 variables:
  .. .. .. ..$ x: int [1:313] 4125412546043047412530474005412536464484...
  .. .. .. ..$ y: int [1:313] 5477300041013895602848585684396457523069...
  .. .. ..@ counts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. .. ..@ i       : int [1:37939] 0123456789...
  .. .. .. .. ..@ p       : int [1:314] 01222443664886107308529741096...
  .. .. .. .. ..@ Dim     : int [1:2] 122313
  .. .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. .. ..$ : chr [1:122] "Aldoc""Apoe""Atp1a2""Atp5f1"...
  .. .. .. .. .. ..$ : chr [1:313] "AAAGGGATGTAGCAAG-1""AAAGGGCAGCTTGAAT-1""AACAACTGGTAGTTGC-1""AACCCAGAGACGGAGA-1"...
  .. .. .. .. ..@ x       : num [1:37939] 101887625725504...
  .. .. .. .. ..@ factors : list()
  .. .. ..@ nUMI  : Named num [1:313] 481813002715656125844...
  .. .. .. ..- attr(*, "names")= chr [1:313] "AAAGGGATGTAGCAAG-1""AAAGGGCAGCTTGAAT-1""AACAACTGGTAGTTGC-1""AACCCAGAGACGGAGA-1"...
  ..@ originalSpatialRNA:Formal class 'SpatialRNA' [package "spacexr"] with 3 slots
  .. .. ..@ coords:'data.frame':        313 obs. of  2 variables:
  .. .. .. ..$ x: int [1:313] 4125412546043047412530474005412536464484...
  .. .. .. ..$ y: int [1:313] 5477300041013895602848585684396457523069...
  .. .. ..@ counts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. .. ..@ i       : int [1:85720] 0123456789...
  .. .. .. .. ..@ p       : int [1:314] 0275551827110113771650192622022478...
  .. .. .. .. ..@ Dim     : int [1:2] 276313
  .. .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. .. ..$ : chr [1:276] "Rpl7""Cox5b""Rpl31""Rpl37a"...
  .. .. .. .. .. ..$ : chr [1:313] "AAAGGGATGTAGCAAG-1""AAAGGGCAGCTTGAAT-1""AACAACTGGTAGTTGC-1""AACCCAGAGACGGAGA-1"...
  .. .. .. .. ..@ x       : num [1:85720] 8178117864915...
  .. .. .. .. ..@ factors : list()
  .. .. ..@ nUMI  : Named num [1:313] 481813002715656125844...
  .. .. .. ..- attr(*, "names")= chr [1:313] "AAAGGGATGTAGCAAG-1""AAAGGGCAGCTTGAAT-1""AACAACTGGTAGTTGC-1""AACCCAGAGACGGAGA-1"...
  ..@ reference         :Formal class 'Reference' [package "spacexr"] with 3 slots
  .. .. ..@ cell_types: Factor w/ 17 levels "Astrocyte","CA1",..: 1111122222...
  .. .. .. ..- attr(*, "names")= chr [1:85] "P60HippoRep3P1_ATTATACTTGCG""P60HippoRep1P2_TCACCCTCAAGC""P60HippoRep2P2_CCTTAGCCCTGG""P60HippoRep4P3_CTCCCCTTAATC"...
  .. .. ..@ counts    :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. .. ..@ i       : int [1:12442] 12410121316212325...
  .. .. .. .. ..@ p       : int [1:86] 095262431653756905109712981551...
  .. .. .. .. ..@ Dim     : int [1:2] 30785
  .. .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. .. ..$ : chr [1:307] "Acta2""Actb""Ahi1""Aldoa"...
  .. .. .. .. .. ..$ : chr [1:85] "P60HippoRep3P1_ATTATACTTGCG""P60HippoRep1P2_TCACCCTCAAGC""P60HippoRep2P2_CCTTAGCCCTGG""P60HippoRep4P3_CTCCCCTTAATC"...
  .. .. .. .. ..@ x       : num [1:12442] 2110216111121...
  .. .. .. .. ..@ factors : list()
  .. .. ..@ nUMI      : Named int [1:85] 755193623133614119914142512400282535677...
  .. .. .. ..- attr(*, "names")= chr [1:85] "P60HippoRep3P1_ATTATACTTGCG""P60HippoRep1P2_TCACCCTCAAGC""P60HippoRep2P2_CCTTAGCCCTGG""P60HippoRep4P3_CTCCCCTTAATC"...
  ..@ config            :List of 22
  .. ..$ gene_cutoff         : num 0.000125
  .. ..$ fc_cutoff           : num 0.5
  .. ..$ gene_cutoff_reg     : num 2e-04
  .. ..$ fc_cutoff_reg       : num 0.75
  .. ..$ UMI_min             : num 100
  .. ..$ UMI_min_sigma       : num 300
  .. ..$ max_cores           : num 2
  .. ..$ N_epoch             : num 8
  .. ..$ N_X                 : num 50000
  .. ..$ K_val               : num 100
  .. ..$ N_fit               : num 1000
  .. ..$ N_epoch_bulk        : num 30
  .. ..$ MIN_CHANGE_BULK     : num 1e-04
  .. ..$ MIN_CHANGE_REG      : num 0.001
  .. ..$ UMI_max             : num 2e+07
  .. ..$ counts_MIN          : num 10
  .. ..$ MIN_OBS             : num 3
  .. ..$ MAX_MULTI_TYPES     : num 4
  .. ..$ CONFIDENCE_THRESHOLD: num 10
  .. ..$ DOUBLET_THRESHOLD   : num 25
  .. ..$ RCTDmode            : chr "full"
  .. ..$ doublet_mode        : chr "full"
  ..@ cell_type_info    :List of 2
  .. ..$ info  :List of 3
  .. .. ..$ :'data.frame':      307 obs. of  17 variables:
  .. .. .. ..$ Astrocyte            : num [1:307] 2.31e-052.53e-032.09e-047.12e-048.77e-03...
  .. .. .. ..$ CA1                  : num [1:307] 0.002.85e-039.55e-041.08e-038.91e-05...
  .. .. .. ..$ CA3                  : num [1:307] 00.0018910.0003620.0010630.0001...
  .. .. .. ..$ Cajal_Retzius        : num [1:307] 0.002.51e-037.78e-041.25e-033.78e-05...
  .. .. .. ..$ Choroid              : num [1:307] 1.96e-061.53e-037.45e-059.64e-043.55e-05...
  .. .. .. ..$ Denate               : num [1:307] 0.003.95e-031.06e-031.43e-039.47e-05...
  .. .. .. ..$ Endothelial_Stalk    : num [1:307] 0.006.15e-037.55e-052.48e-042.73e-04...
  .. .. .. ..$ Endothelial_Tip      : num [1:307] 00.0032040.0003120.0002230.000186...
  .. .. .. ..$ Entorihinal          : num [1:307] 00.0018530.0012460.0010980.000276...
  .. .. .. ..$ Ependymal            : num [1:307] 0.0009250.0033460.0002470.000410.000506...
  .. .. .. ..$ Interneuron          : num [1:307] 00.0017790.0012510.0010120.000114...
  .. .. .. ..$ Microglia_Macrophages: num [1:307] 00.0080270.0003610.0005020.000234...
  .. .. .. ..$ Mural                : num [1:307] 0.0104680.0098880.0001450.0010870.000281...
  .. .. .. ..$ Neurogenesis         : num [1:307] 00.0075950.0003140.0001040.000133...
  .. .. .. ..$ Neuron.Slc17a6       : num [1:307] 3.99e-052.01e-031.54e-031.60e-031.12e-04...
  .. .. .. ..$ Oligodendrocyte      : num [1:307] 0.005.35e-035.36e-056.72e-044.75e-04...
  .. .. .. ..$ Polydendrocyte       : num [1:307] 00.0042450.000230.0002650.000367...
  .. .. ..$ : chr [1:17] "Astrocyte""CA1""CA3""Cajal_Retzius"...
  .. .. ..$ : int 17
  .. ..$ renorm:List of 3
  .. .. ..$ :'data.frame':      122 obs. of  17 variables:
  .. .. .. ..$ Astrocyte            : num [1:122] 0.039460.061510.020670.002470.00384...
  .. .. .. ..$ CA1                  : num [1:122] 0.0004010.0004830.0003930.0007710.000516...
  .. .. .. ..$ CA3                  : num [1:122] 4.52e-042.87e-046.84e-057.67e-041.66e-04...
  .. .. .. ..$ Cajal_Retzius        : num [1:122] 0.000170.0003040.0009330.00190.001271...
  .. .. .. ..$ Choroid              : num [1:122] 0.000160.016030.001050.002760.00112...
  .. .. .. ..$ Denate               : num [1:122] 0.0004260.001250.0001610.0018550.000682...
  .. .. .. ..$ Endothelial_Stalk    : num [1:122] 0.001230.0057320.0006460.0014260.001425...
  .. .. .. ..$ Endothelial_Tip      : num [1:122] 0.0008380.0288240.0168660.0009220.002654...
  .. .. .. ..$ Entorihinal          : num [1:122] 1.24e-033.07e-047.69e-051.43e-035.16e-04...
  .. .. .. ..$ Ependymal            : num [1:122] 0.002280.027730.001430.001010.00151...
  .. .. .. ..$ Interneuron          : num [1:122] 0.0005150.000660.0001940.0011440.000779...
  .. .. .. ..$ Microglia_Macrophages: num [1:122] 0.0010520.0128690.0001310.0011450.003646...
  .. .. .. ..$ Mural                : num [1:122] 0.001260.002830.00840.00150.00267...
  .. .. .. ..$ Neurogenesis         : num [1:122] 0.0006010.0019410.0003440.0016370.000988...
  .. .. .. ..$ Neuron.Slc17a6       : num [1:122] 0.0005050.0003720.0001150.0015640.00067...
  .. .. .. ..$ Oligodendrocyte      : num [1:122] 0.0021390.0101660.0004550.0013870.003402...
  .. .. .. ..$ Polydendrocyte       : num [1:122] 0.001650.013030.002250.001530.00394...
  .. .. ..$ : chr [1:17] "Astrocyte""CA1""CA3""Cajal_Retzius"...
  .. .. ..$ : int 17
  ..@ internal_vars     :List of 8
  .. ..$ gene_list_reg      : chr [1:102] "Aldoc""Apoe""Atp1a2""Cd81"...
  .. ..$ gene_list_bulk     : chr [1:122] "Aldoc""Apoe""Atp1a2""Atp5f1"...
  .. ..$ proportions        : Named num [1:17] 1.21e-012.54e-072.60e-015.61e-019.39e-03...
  .. .. ..- attr(*, "names")= chr [1:17] "Astrocyte""CA1""CA3""Cajal_Retzius"...
  .. ..$ class_df           :'data.frame':      17 obs. of  1 variable:
  .. .. ..$ class: chr [1:17] "Astrocyte""CA1""CA3""Cajal_Retzius"...
  .. ..$ cell_types_assigned: logi TRUE
  .. ..$ sigma              : num 0.21
  .. ..$ Q_mat              : num [1:103, 1:2536] 1.001.43e-051.60e-069.72e-076.88e-07...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:103] "V1""V2""V3""V4"...
  .. .. .. ..$ : NULL
  .. ..$ X_vals             : num [1:2536] 1.00e-052.83e-055.20e-058.00e-051.12e-04...
  ..@ results           :List of 1
  .. ..$ weights:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:5321] 0123456789...
  .. .. .. ..@ p       : int [1:18] 0313626939125215651878219125042817...
  .. .. .. ..@ Dim     : int [1:2] 31317
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:313] "AAAGGGATGTAGCAAG-1""AAAGGGCAGCTTGAAT-1""AACAACTGGTAGTTGC-1""AACCCAGAGACGGAGA-1"...
  .. .. .. .. ..$ : chr [1:17] "Astrocyte""CA1""CA3""Cajal_Retzius"...
  .. .. .. ..@ x       : num [1:5321] 0.04360.01130.04730.03940.0574...
  .. .. .. ..@ factors : list()
  ..@ de_results        : list()
  ..@ internal_vars_de  : list()

3.探索一下RCTD结果

RCTD full mode的结果保存在@results$weights中。接下来使用normalize_weights函数标化这个权重,使得每个spots中的各种细胞类型比例加起来等于1

代码语言:javascript代码运行次数:0运行复制
## 探索一下RCTD结果
## result
barcodes <- colnames(myRCTD@spatialRNA@counts)
weights <- myRCTD@results$weights
norm_weights <- normalize_weights(weights)
dim(norm_weights)
norm_weights[1:5,1:6]
head(rowSums(norm_weights))

看一下norm_weights:

每一行代表一个spots,每一列代表一个细胞类型,值表示这个spot中细胞类型所占的比例,每一行的和为1

4.可视化反卷积结果

有了这个结果,再加上空间对象的rds文件,可以借助其他反卷积的绘图函数绘制出其他反卷积结果中一样的那种饼图了,比如SPOTlight、STdeconvolve 软件可以绘制的饼图:

教程中可以展示某一个细胞类型在空间数据中反卷积出来的比例可视化:

总共有17中细胞类型,这里展示Denate

代码语言:javascript代码运行次数:0运行复制
# plot Dentate weights
p <- plot_puck_continuous(myRCTD@spatialRNA, 
                          barcodes, norm_weights[,'Denate'], ylimit = c(0,0.5), 
                          title ='plot of Dentate weights', size=4.5, alpha=0.8) 
p
ggsave("Spaital_weights.png", width=8, height=6, plot=p,bg="white")

结果图:

借用STdeconvolve绘制一下饼图看看:

代码语言:javascript代码运行次数:0运行复制
## 借用STdeconvolve绘制一下饼图看看:
library(STdeconvolve)
library(ggplot2)
library(ggsci)
packageVersion("STdeconvolve")

m <- as.matrix(norm_weights)
p <- coords

plt <- vizAllTopics(theta = m,
                    pos = p,
                    topicOrder=seq(ncol(m)),
                    topicCols=rainbow(ncol(m)),
                    groups = NA,
                    group_cols = NA,
                    r = 56, # size of scatterpies; adjust depending on the coordinates of the pixels
                    lwd = 0.3,
                    showLegend = TRUE,
                    plotTitle = "scatterpies")

## function returns a `ggplot2` object, so other aesthetics can be added on:
plt <- plt + 
  ggplot2::guides(fill=ggplot2::guide_legend(ncol=2))
plt 
ggsave(Spaital_scatterpies.png, width=12, height=6, plot=plt, bg="white")

结果图:

这个反卷积结果可以看到,某些区域细胞类型占比比较大,具有区域特征。绘图的时候还有个小技巧:同种类型的细胞的不同亚型,可以设置同种色系,可以方便更清晰的看出来区域特征。这里就不展示了,毕竟还要去找色系来对应。

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。原始发表:2025-04-05,如有侵权请联系 cloudcommunity@tencent 删除cellint对象教程数据
发布评论

评论列表(0)

  1. 暂无评论