12个Cell杂志的疑难杂症空转读取与分析：GSE144240

前面在曾老板的帖子中提到了各种各样的空间转录组数据：《10x的空间单细胞文件格式详解》，这些数据都不是标准的10x spaceranger的输出，读取起来会有一定的难度，这个里面的数据已经有三个进行了解读啦：

4个NC杂志的空间转录组数据分析(GSE190811)
8个Cell杂志的空间转录组数据基础分析(GSE158328)
两个只有表达矩阵的空间转录组数据如何分析？（Nat Biotechnol（IF46.9/Q1））

今天的这个数据也是一篇非常经典的单细胞与空间转录组数据联合分析的经典文献，于2020年7月份发表在 CELL上，标题为《Multimodal Analysis of Composition and Spatial Architecture in Human Squamous Cell Carcinoma》：

数据背景

文章将数据放在了GEO上：/geo/query/acc.cgi?acc=GSE144240，共12例样本

代码语言：javascript代码运行次数：0运行复制

GSM4284316 P2_ST_rep1
GSM4284317 P2_ST_rep2
GSM4284318 P2_ST_rep3
GSM4284319 P5_ST_rep1
GSM4284320 P5_ST_rep2
GSM4284321 P5_ST_rep3
GSM4284322 P9_ST_rep1
GSM4284323 P9_ST_rep2
GSM4284324 P9_ST_rep3
GSM4284325 P10_ST_rep1
GSM4284326 P10_ST_rep2
GSM4284327 P10_ST_rep3

下载GEO附件 GSE144240_RAW.tar，数据列表如下：

每个样本提供了一个jpg格式的HE染色切片，一个表达矩阵，以及具有barcode的位点

代码语言：javascript代码运行次数：0运行复制

GSM4284316_P2_ST_rep1.jpg.gz 10.0 Mb
GSM4284316_P2_ST_rep1_stdata.tsv.gz 3.9 Mb
GSM4284316_spot_data-selection-P2_ST_rep1.tsv.gz 8.1 Kb
GSM4284317_P2_ST_rep2.jpg.gz 9.8 Mb
GSM4284317_P2_ST_rep2_stdata.tsv.gz 4.4 Mb
GSM4284317_spot_data-selection-P2_ST_rep2.tsv.gz 7.9 Kb
GSM4284318_P2_ST_rep3.jpg.gz 10.4 Mb
GSM4284318_P2_ST_rep3_stdata.tsv.gz 4.5 Mb
GSM4284318_spot_data-selection-P2_ST_rep3.tsv.gz 7.8 Kb
GSM4284319_P5_ST_rep1.jpg.gz 9.4 Mb
GSM4284319_P5_ST_rep1_stdata.tsv.gz 2.8 Mb
GSM4284319_spot_data-selection-P5_ST_rep1.tsv.gz 7.4 Kb
GSM4284320_P5_ST_rep2.jpg.gz 9.2 Mb
GSM4284320_P5_ST_rep2_stdata.tsv.gz 3.2 Mb
GSM4284320_spot_data-selection-P5_ST_rep2.tsv.gz 6.5 Kb
GSM4284321_P5_ST_rep3.jpg.gz 9.6 Mb
GSM4284321_P5_ST_rep3_stdata.tsv.gz 2.3 Mb
GSM4284321_spot_data-selection-P5_ST_rep3.tsv.gz 6.5 Kb
GSM4284322_P9_ST_rep1.jpg.gz 13.4 Mb
GSM4284322_P9_ST_rep1_stdata.tsv.gz 2.6 Mb
GSM4284322_spot_data-selection-P9_ST_rep1.tsv.gz 14.0 Kb
GSM4284323_P9_ST_rep2.jpg.gz 13.7 Mb
GSM4284323_P9_ST_rep2_stdata.tsv.gz 2.4 Mb
GSM4284323_spot_data-selection-P9_ST_rep2.tsv.gz 13.1 Kb
GSM4284324_P9_ST_rep3.jpg.gz 15.0 Mb
GSM4284324_P9_ST_rep3_stdata.tsv.gz 1.8 Mb
GSM4284324_spot_data-selection-P9_ST_rep3.tsv.gz 14.4 Kb
GSM4284325_P10_ST_rep1.jpg.gz 10.0 Mb
GSM4284325_P10_ST_rep1_stdata.tsv.gz 1.5 Mb
GSM4284325_spot_data-selection-P10_ST_rep1.tsv.gz 7.5 Kb
GSM4284326_P10_ST_rep2.jpg.gz 9.6 Mb
GSM4284326_P10_ST_rep2_stdata.tsv.gz 2.4 Mb
GSM4284326_spot_data-selection-P10_ST_rep2.tsv.gz 7.7 Kb
GSM4284327_P10_ST_rep3.jpg.gz 8.1 Mb
GSM4284327_P10_ST_rep3_stdata.tsv.gz 2.0 Mb
GSM4284327_spot_data-selection-P10_ST_rep3.tsv.gz 5.7 Kb

我们这次就以其中一个样本为例 GSM4284316_P2_ST_rep1，看看怎么读取以及绘图！

代码语言：javascript代码运行次数：0运行复制

GSM4284316_P2_ST_rep1.jpg.gz 10.0 Mb
GSM4284316_P2_ST_rep1_stdata.tsv.gz 3.9 Mb
GSM4284316_spot_data-selection-P2_ST_rep1.tsv.gz 8.1 Kb

文章中关于这个样本，做了基本的聚类分群（图A），以及单细胞中鉴定的一个上皮亚群sc-TSK打分分布（图B），特征基因MMP10的空间表达模式（图C），我们来做做看~

数据读取

1.首先读取表达矩阵：

代码语言：javascript代码运行次数：0运行复制

rm(list=ls())
library(data.table)
library(tidyverse)
library(ggpubr)
library(Seurat)

ct <- fread("GSE144240_RAW/GSM4284316_P2_ST_rep1_stdata.tsv.gz",header = T,data.table = F)
ct[1:5,1:5]
rownames(ct) <- ct[,1]
ct <- ct[,-1]
ct <- t(ct)
ct[1:5,1:5]
dim(ct)
# [1] 17138  1933

这样就得到了17138个基因，1933个spots的表达矩阵，矩阵的列名为空间坐标信息，但是不是具有切片位置的spots，是芯片上面全部的spots。

画一个图就知道啦：

代码语言：javascript代码运行次数：0运行复制

## 空间位置信息
library(stringr)
head(colnames(ct))
position <- str_split(colnames(ct),pattern = "x", n=2,simplify = T)
position <- as.data.frame(position)
position[,1] <- as.numeric(position[,1])
position[,2] <- as.numeric(position[,2])
head(position)
range(position[,1])
range(position[,2])
str(position)
# barcode: The sequence of the barcode associated to the spot.
# in_tissue: Binary, indicating if the spot falls inside (1) or outside (0) of tissue.
# array_row: The row coordinate of the spot in the array from 0 to 77. The array has 78 rows.
# array_col: The column coordinate of the spot in the array. In order to express the orange crate arrangement of the spots, this column index uses even numbers from 0 to 126 for even rows, and odd numbers from 1 to 127 for odd rows. Notice then that each row (even or odd) has 64 spots.
# pxl_row_in_fullres: The row pixel coordinate of the center of the spot in the full resolution image.
# pxl_col_in_fullres: The column pixel coordinate of the center of the spot in the full resolution image.
# 

colnames(position) <- c("row","col")
rownames(position) <- colnames(ct)
str(position)
head(position)

## 绘制一下切片图看看
ggplot(position, aes(x = row, y = col)) + 
  geom_point(data = position,color="blue",size=2)+
  theme_bw()

2.读取切片范围的坐标矩阵

这个信息保存在文件：GSM4284316_spot_data-selection-P2_ST_rep1.tsv.gz

这里绘图，注意翻转一下坐标：上下翻转180度才会跟文章中的图片方向一致：

代码语言：javascript代码运行次数：0运行复制

## 有坐标的 位置
position_select <- read.table("GSE144240_RAW/GSM4284316_spot_data-selection-P2_ST_rep1.tsv.gz",header = T)
head(position_select)
dim(position_select)

# 上下翻转180度才会跟文章中的图片方向一致
position_select$y1 <- -1*position_select$y 

p <- ggplot(position_select, aes(x = x, y = y1)) + 
  geom_point(data = position_select,color="blue",size=2)+
  theme_bw()
p

结果如下：

基本分群分析

现在表达矩阵，位置信息都有了，可以进行基本的分析：

代码语言：javascript代码运行次数：0运行复制

## 提取矩阵中的有spots的点
spots <- paste0(position_select$x,"x",position_select$y)
head(spots)
length(spots)

ct_select <- ct[, spots]
dim(ct_select)
ct_select[1:6,1:6]


#################################################
### 走st流程
ct_select[1:6,1:6]
dim(ct_select)
st <- CreateSeuratObject(ct_select,assay = "Spatial", min.cells = 1, project = "P2_rep1")
st
st <- NormalizeData(st) 
st <- FindVariableFeatures(st)
st <- ScaleData(st, features = rownames(st))
st <- RunPCA(st, features = VariableFeatures(st)) 

## 细胞聚类
st <- FindNeighbors(st, dims = 1:30) 
st <- FindClusters(st, resolution = 2)
# UMAP
st <- RunUMAP(st, dims = 1:30)
p <- DimPlot(st, reduction = "umap", label=T) 
p
# 保存数据
save(st,file = "P2_rep1_st.Rds")

总共得到9个cluster：

聚类图：

代码语言：javascript代码运行次数：0运行复制

## 绘图
head(st@meta.data)
dat <- FetchData(object=st, vars=c("umap_1","umap_2","seurat_clusters"))
head(dat)
head(position_select)

data_plot <- cbind(dat, position_select)
head(data_plot)
table(data_plot$seurat_clusters)

p3 <-ggplot(data_plot, aes(x = x, y = y1)) + 
  geom_point(data = data_plot, aes(color=seurat_clusters),size=2) +
  labs(color="Cluster") + 
  theme_void() 
p3
ggsave(filename = "P2_rep1.png",width = 6,height = 5.5,bg="white")

特征基因绘图

我们可以替换seurat中的坐标换为空间切片坐标进行绘图：

代码语言：javascript代码运行次数：0运行复制

###################################################
# 替换坐标，映射到seurat对象中的函数中，可以使用默认函数可以画图
# 保存原始坐标
embed_umap_raw <- st@reductions$umap@cell.embeddings
head(embed_umap_raw)

embed_umap <- data.frame(umap_1=position_select$x,
                         umap_2=position_select$y1,
                         row.names = paste0(position_select$x,"x",position_select$y)
                         )
head(embed_umap)

# 替换
st@reductions$umap@cell.embeddings <- as.matrix(embed_umap)
DimPlot(st)
p <- FeaturePlot(st,"MMP10",pt.size = 2) + 
  theme_void() + 
  theme(plot.title = element_text(hjust = 0.5))  # 居中对齐标题
p

结果如下：

学会了吗~

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。原始发表：2025-05-04，如有侵权请联系 cloudcommunity@tencent 删除celldatapositionselect数据

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

12个Cell杂志的疑难杂症空转读取与分析：GSE144240