Seurat -- Cell Annotation

2024/11/15 17:25:02

Contents

  • brief
  • Finding differentially expressed genes
  • Cell annotation
  • Detailed parameters

brief

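Cell annotation roughly breaks down into two steps. First, go from differentially expressed genes to marker genes. Second, map the cells to a reference: DE genes --> marker genes --> map reference.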

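A gene can be differentially expressed in two ways: through its expression level, or through the fraction of cells that express it. Usually both are considered together.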

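As I understand it, a marker gene is a gene whose presence is enough to identify a cell as a particular type, hence the name "marker". A marker gene is not the same thing as a differentially expressed gene; the two are distinct but related.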
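Here is a minimal base-R sketch that makes the difference concrete. The data are hypothetical: the gene names GENE_A and GENE_B and all values are made up. The sketch computes two things for each gene:

- the detection fraction in each group, analogous to the pct.1 and pct.2 columns of FindMarkers;
- an average log2 fold change with a pseudocount of 1.

The exact fold-change formula differs between Seurat versions, so treat it as an approximation. Both genes are higher in group 1. Only GENE_B is absent from group 2, and that absence is what makes it usable as a marker.

```r
# Hypothetical log-normalized expression: rows = genes, columns = cells.
# Cells 1-5 form group 1, cells 6-10 form group 2.
expr <- rbind(
  GENE_A = c(2.0, 2.2, 1.9, 2.1, 2.0, 1.0, 1.1, 0.9, 1.0, 1.2),  # expressed everywhere, higher in group 1
  GENE_B = c(1.5, 1.4, 1.6, 1.5, 1.3, 0, 0, 0, 0, 0)             # expressed only in group 1
)
group <- rep(c("g1", "g2"), each = 5)

# Fraction of cells in a group with non-zero expression (like pct.1 / pct.2)
pct <- function(m, cells) rowMeans(m[, cells, drop = FALSE] > 0)

# Average log2 fold change: average the un-logged values, add a pseudocount of 1,
# then take the log2 ratio (an approximation of what FindMarkers reports)
avg_log2fc <- function(m, c1, c2, pseudocount = 1) {
  mean1 <- rowMeans(expm1(m[, c1, drop = FALSE]))
  mean2 <- rowMeans(expm1(m[, c2, drop = FALSE]))
  log2(mean1 + pseudocount) - log2(mean2 + pseudocount)
}

c1 <- group == "g1"
c2 <- group == "g2"
res <- data.frame(
  pct.1 = pct(expr, c1),
  pct.2 = pct(expr, c2),
  avg_log2FC = avg_log2fc(expr, c1, c2)
)
print(res)
```

GENE_A is a DE gene by expression level only. GENE_B differs in both level and detection fraction (pct.2 = 0), which is closer to the marker-gene idea described above.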

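Mapping to a reference means comparing against the expression profiles of reference cells (e.g. cell lines): if two cells are similar, their expression profiles will also be similar. Alternatively, compare against the marker genes mentioned above: if certain marker genes are present, the cell can be assigned to a particular type. However, not "detecting" a marker gene does not mean the cell does not belong to that group; the gene may simply have gone undetected.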
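The correlation idea can be sketched in a few lines of base R. This is a simplified illustration, not SingleR's implementation. The sketch scores each cluster's mean profile against each reference profile with Spearman correlation and picks the best match. SingleR additionally restricts the comparison to marker genes and refines the result, and the sketch omits both steps. All expression values below are hypothetical. The genes are commonly used immune markers, chosen only for readability.

```r
# Hypothetical reference profiles: rows = genes, columns = reference cell types
ref <- cbind(
  T_cells  = c(CD3E = 5.0, CD14 = 0.1, MS4A1 = 0.2, NKG7 = 1.0),
  Monocyte = c(CD3E = 0.1, CD14 = 6.0, MS4A1 = 0.1, NKG7 = 0.3),
  B_cell   = c(CD3E = 0.2, CD14 = 0.1, MS4A1 = 5.0, NKG7 = 0.1)
)

# Hypothetical mean expression per query cluster (same genes, same order)
query <- cbind(
  cluster_0 = c(CD3E = 0.3, CD14 = 4.5, MS4A1 = 0.2, NKG7 = 0.4),
  cluster_1 = c(CD3E = 3.8, CD14 = 0.2, MS4A1 = 0.1, NKG7 = 1.5)
)

# Spearman correlation: rows = query clusters, columns = reference types
scores <- cor(query, ref, method = "spearman")

# Assign each cluster the best-correlated reference label
labels <- setNames(colnames(ref)[apply(scores, 1, which.max)], rownames(scores))
print(round(scores, 3))
print(labels)
```

Spearman correlation compares ranks rather than raw values. Scale differences between the query and the reference therefore matter less, which is one reason the rank-based approach works across platforms.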

SingleR link.
SingleR official link.

Finding differentially expressed genes

  • Data preparation
library(dplyr)
library(Seurat)
library(patchwork)
library(sctransform)
library(ggplot2)
# devtools::install_github('satijalab/seurat-data')
library(SeuratData)


rm(list=ls())


# Get the test dataset
# For convenience, we distribute this dataset through our SeuratData package.
# install dataset
InstallData("ifnb")
# load dataset (LoadData returns the Seurat object, so assign it)
ifnb <- LoadData("ifnb")


# split the dataset into a list of two seurat objects (stim and CTRL)
ifnb.list <- SplitObject(ifnb, split.by = "stim")

# Performing integration on datasets normalized with logNormalization
# normalize and identify variable features for each dataset independently
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
  x <- NormalizeData(x)
  x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})


features <- SelectIntegrationFeatures(object.list = ifnb.list)

immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features)

immune.combined <- IntegrateData(anchorset = immune.anchors)

str(immune.combined)
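NormalizeData() above uses the LogNormalize method by default. Each cell's counts are divided by that cell's total count, multiplied by a scale factor (10,000 by default), and natural-log transformed with log1p. Below is a base-R sketch of that arithmetic on a hypothetical 3-gene, 2-cell count matrix:

```r
# Hypothetical raw counts: cell1 = (10, 0, 90), cell2 = (5, 5, 0)
counts <- matrix(c(10, 0, 90, 5, 5, 0), nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cell1", "cell2")))

# LogNormalize: divide by each cell's library size, scale, then log1p
log_normalize <- function(counts, scale.factor = 10000) {
  lib_size <- colSums(counts)
  log1p(sweep(counts, 2, lib_size, "/") * scale.factor)
}

norm <- log_normalize(counts)
print(round(norm, 3))
```

GeneA has 10 of cell1's 100 counts but 5 of cell2's 10 counts. After normalization it is therefore higher in cell2 (log1p(5000)) than in cell1 (log1p(1000)), even though its raw count is lower.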
  • Finding differentially expressed genes
DefaultAssay(immune.combined) <- "integrated"

immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)

head(immune.combined)
levels(immune.combined)
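# Find differentially expressed genes
# Switch back to the RNA assay for DE testing: the integrated assay only holds
# the integration features (2000 here) and contains batch-corrected values
DefaultAssay(immune.combined) <- "RNA"
# Find differentially expressed features between seurat_clusters 0 and seurat_clusters 11
monocyte.de.markers <- FindMarkers(immune.combined, ident.1 = "0", ident.2 = "11", only.pos = TRUE)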

# view results
head(monocyte.de.markers)

# Same comparison between seurat_clusters 0 and 11, this time using the ROC test
monocyte.de.markers_2 <- FindMarkers(immune.combined, ident.1 = "0", ident.2 = "11",
                                     only.pos = TRUE, test.use = "roc")

# view results
head(monocyte.de.markers_2)
# Compare the gene lists returned by the two tests
x <- intersect(rownames(monocyte.de.markers), rownames(monocyte.de.markers_2))
nrow(monocyte.de.markers)
nrow(monocyte.de.markers_2)
length(x)
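The two tests tend to return overlapping gene lists because both are rank-based. The AUC behind test.use = "roc" equals the Wilcoxon/Mann-Whitney statistic divided by n1 * n2. The sketch below checks this identity on hypothetical values for a single gene. It also computes the "predictive power", abs(AUC - 0.5) * 2, described in the parameter list. It illustrates the statistics only and is not Seurat's internal code.

```r
# Hypothetical expression of one gene in two groups of cells
g1 <- c(2.1, 1.8, 2.5, 0.0, 1.9, 2.2)
g2 <- c(0.0, 0.4, 1.0, 0.0, 0.7)

# Wilcoxon rank-sum test (the default test.use = "wilcox");
# ties make R fall back to the normal approximation and warn, so silence that
wt <- suppressWarnings(wilcox.test(g1, g2))

# AUC = P(random cell from g1 > random cell from g2), counting ties as 1/2
auc <- mean(outer(g1, g2, ">") + 0.5 * outer(g1, g2, "=="))

# Predictive power as described for test.use = "roc"
power <- abs(auc - 0.5) * 2

cat("W =", unname(wt$statistic), " AUC =", round(auc, 4), " power =", round(power, 4), "\n")
```

Because W = AUC * n1 * n2, ranking genes by AUC and ranking them by the Wilcoxon statistic give the same order within one comparison. The two lists still differ in which genes pass the p-value or power cutoffs.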


Cell annotation

# Cell annotation
# cell cluster annotation using SingleR  <==================
library(SingleR)
library(SingleCellExperiment)
library(celldex)

# Get the reference (Human Primary Cell Atlas, from celldex)
hpca.se <- HumanPrimaryCellAtlasData()

# Use the counts of the RNA assay as the query to compare against the reference
counts <- GetAssayData(immune.combined[["RNA"]], slot="counts")
pred.sce <- SingleR(test = counts, ref = hpca.se, labels = hpca.se$label.main, 
                    clusters=immune.combined@meta.data$seurat_clusters)
write.table(pred.sce,"sce_singler_1.annotation.xls",sep="\t", quote=FALSE)

# Use the data slot of the integrated assay as the query to compare against the reference
counts <- GetAssayData(immune.combined[["integrated"]], slot="data")
pred.sce_2 <- SingleR(test = counts, ref = hpca.se, labels = hpca.se$label.main, 
                    clusters=immune.combined@meta.data$seurat_clusters)
write.table(pred.sce_2,"sce_singler_2.annotation.xls",sep="\t", quote=FALSE)

pred.sce$labels
pred.sce_2$labels

# Rename each cluster with the cell type predicted by SingleR
new.cluster.ids <- pred.sce$labels
names(new.cluster.ids) <- levels(immune.combined)
immune.combined <- RenameIdents(immune.combined, new.cluster.ids)
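RenameIdents() matches the names of the supplied vector against the current identity levels, so the lookup is keyed by cluster ID. Below is a base-R sketch of the same lookup on hypothetical cluster assignments. Clusters mapped to the same label (for example, two clusters both called T_cells) end up sharing one identity. That merging is what lets FindSubCluster() below treat all T cells as a single group.

```r
# Hypothetical SingleR-style output: one row per cluster, rownames = cluster IDs
pred <- data.frame(labels = c("Monocyte", "T_cells", "B_cell"),
                   row.names = c("0", "1", "2"),
                   stringsAsFactors = FALSE)

# Hypothetical per-cell cluster assignments
clusters <- factor(c("1", "0", "2", "1", "0"), levels = c("0", "1", "2"))

# Lookup table: names = old cluster IDs, values = new labels
new.ids <- setNames(pred$labels, rownames(pred))

# Translate every cell's cluster ID into its label
cell_labels <- unname(new.ids[as.character(clusters)])
print(cell_labels)
```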

##########################################################################################
# DE search and annotation for cell subtypes, using the T cell population as an example   <======================

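# Assign the result back so that later steps can see the new "T_sub" metadata column
immune.combined <- FindSubCluster(immune.combined, cluster = "T_cells", graph.name = "integrated_snn",
                                  subcluster.name = "T_sub", resolution = 0.5, algorithm = 1)

# Subcluster labels look like "T_cells_0", "T_cells_1", ...
T_sub_clusters <- unique(immune.combined$T_sub)[grep("T_cells", unique(immune.combined$T_sub))]

dir.create("T_sub", showWarnings = FALSE)  # output folder for the marker tables
for (i in T_sub_clusters){
  # ident.2 is NULL, so each subcluster is compared against all other cells
  markers <- FindMarkers(immune.combined, ident.1 = i, only.pos = TRUE, assay = "RNA", group.by = "T_sub")
  write.table(markers, file = paste0("T_sub/cluster_", i, "_pos.markers.xls"), sep = "\t", quote = FALSE)
}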

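# Manually assign cell type names (based on the marker tables exported above)
# Start from the current identities, then overwrite the T cell subclusters
immune.combined@meta.data$cell_cluster_rename <- as.character(Idents(immune.combined))
immune.combined@meta.data$cell_cluster_rename[immune.combined@meta.data$T_sub == "T_cells_0"] <- "CD8+ T cell"
immune.combined@meta.data$cell_cluster_rename[immune.combined@meta.data$T_sub == "T_cells_1"] <- "CD4+ T cell"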
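The two manual labels above would normally come from inspecting known markers, such as CD8A and CD4, in the exported marker tables. Below is a hypothetical base-R sketch that turns that inspection into an explicit rule. The subcluster averages, the 0.5 margin and the "unassigned" fallback are made-up assumptions for illustration and are not part of the original workflow.

```r
# Hypothetical mean expression of two markers per T cell subcluster
avg <- data.frame(
  CD8A = c(2.4, 0.1, 0.9),
  CD4  = c(0.2, 1.8, 0.8),
  row.names = c("T_cells_0", "T_cells_1", "T_cells_2")
)

# Hypothetical rule: call the subcluster by the clearly higher marker,
# otherwise leave it unassigned for manual review
label_subcluster <- function(cd8a, cd4, min_diff = 0.5) {
  ifelse(cd8a - cd4 >= min_diff, "CD8+ T cell",
         ifelse(cd4 - cd8a >= min_diff, "CD4+ T cell", "unassigned"))
}
avg$label <- label_subcluster(avg$CD8A, avg$CD4)
print(avg)

# Map labels back to cells (hypothetical per-cell T_sub values);
# cells outside the T cell subclusters keep their existing label
t_sub <- c("T_cells_1", "T_cells_0", "Monocyte", "T_cells_2")
cell_label <- ifelse(t_sub %in% rownames(avg), avg[t_sub, "label"], t_sub)
print(cell_label)
```

Leaving ambiguous subclusters as "unassigned", instead of forcing a label, keeps them visible for a closer look at their full marker tables.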

Detailed parameters (FindMarkers)


  • object
    An object

  • slot
    Slot to pull data from; note that if test.use is “negbinom”, “poisson”, or “DESeq2”, slot will be set to “counts”

  • counts
    Count matrix if using scale.data for DE tests. This is used for computing pct.1 and pct.2 and for filtering features based on fraction expressing

  • cells.1
    Vector of cell names belonging to group 1

  • cells.2
    Vector of cell names belonging to group 2

  • features
    Genes to test. Default is to use all genes

  • logfc.threshold
    Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells. Default is 0.25. Increasing logfc.threshold speeds up the function, but can miss weaker signals.

  • test.use
    Denotes which test to use. Available options are:

    • “wilcox” : Identifies differentially expressed genes between two groups of cells using a Wilcoxon Rank Sum test (default)

    • “bimod” : Likelihood-ratio test for single cell gene expression, (McDavid et al., Bioinformatics, 2013)

    • “roc” : Identifies ‘markers’ of gene expression using ROC analysis. For each gene, evaluates (using AUC) a classifier built on that gene alone, to classify between two groups of cells. An AUC value of 1 means that expression values for this gene alone can perfectly classify the two groupings (i.e. each of the cells in cells.1 exhibits a higher level than each of the cells in cells.2). An AUC value of 0 also means there is perfect classification, but in the other direction. A value of 0.5 implies that the gene has no predictive power to classify the two groups. Returns a ‘predictive power’ (abs(AUC-0.5) * 2) ranked matrix of putative differentially expressed genes.

    • “t” : Identify differentially expressed genes between two groups of cells using the Student’s t-test.

    • “negbinom” : Identifies differentially expressed genes between two groups of cells using a negative binomial generalized linear model. Use only for UMI-based datasets.

    • “poisson” : Identifies differentially expressed genes between two groups of cells using a Poisson generalized linear model. Use only for UMI-based datasets.

    • “LR” : Uses a logistic regression framework to determine differentially expressed genes. Constructs a logistic regression model predicting group membership based on each feature individually and compares this to a null model with a likelihood ratio test.

    • “MAST” : Identifies differentially expressed genes between two groups of cells using a hurdle model tailored to scRNA-seq data. Utilizes the MAST package to run the DE testing.

    • “DESeq2” : Identifies differentially expressed genes between two groups of cells based on a model using DESeq2, which uses a negative binomial distribution (Love et al, Genome Biology, 2014). This test does not support pre-filtering of genes based on average difference (or percent detection rate) between cell groups. However, genes may be pre-filtered based on their minimum detection rate (min.pct) across both cell groups. To use this method, please install DESeq2, using the instructions at https://bioconductor.org/packages/release/bioc/html/DESeq2.html

  • min.pct
    Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations. Meant to speed up the function by not testing genes that are very infrequently expressed. Default is 0.1.

  • min.diff.pct
    Only test genes that show a minimum difference in the fraction of detection between the two groups. Set to -Inf by default.

  • verbose
    Print a progress bar once expression testing begins

  • only.pos
    Only return positive markers (FALSE by default)

  • max.cells.per.ident
    Down-sample each identity class to a maximum number of cells. Default is Inf, i.e. no downsampling.

  • random.seed
    Random seed for downsampling

  • latent.vars
    Variables to test, used only when test.use is one of ‘LR’, ‘negbinom’, ‘poisson’, or ‘MAST’

  • min.cells.feature
    Minimum number of cells expressing the feature in at least one of the two groups, currently only used for poisson and negative binomial tests

  • min.cells.group
    Minimum number of cells in one of the groups

  • pseudocount.use
    Pseudocount to add to averaged expression values when calculating logFC. 1 by default.

  • fc.results
    data.frame from FoldChange

  • densify
    Convert the sparse matrix to a dense form before running the DE test. This can provide speedups but might require higher memory; default is FALSE

  • mean.fxn
    Function to use for fold change or average difference calculation. If NULL, the appropriate function will be chosen according to the slot used.

  • fc.name
    Name of the fold change, average difference, or custom function column in the output data.frame. If NULL, the fold change column will be named according to the logarithm base (eg, “avg_log2FC”), or if using the scale.data slot “avg_diff”.

  • base
    The base with respect to which logarithms are computed.

  • norm.method
    Normalization method for fold change calculation when slot is “data”

  • recorrect_umi
    Recalculate corrected UMI counts using minimum of the median UMIs when performing DE using multiple SCT objects; default is TRUE

  • ident.1
    Identity class to define markers for; pass an object of class phylo or ‘clustertree’ to find markers for a node in a cluster tree; passing ‘clustertree’ requires BuildClusterTree to have been run

  • ident.2
    A second identity class for comparison; if NULL, use all other cells for comparison; if an object of class phylo or ‘clustertree’ is passed to ident.1, must pass a node to find markers for

  • group.by
    Regroup cells into a different identity class prior to performing differential expression (see example)

  • subset.ident
    Subset a particular identity class prior to regrouping. Only relevant if group.by is set (see example)

  • assay
    Assay to use in differential expression testing

  • reduction
    Reduction to use in differential expression testing - will test for DE on cell embeddings
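The sketch below shows how min.pct, min.diff.pct, logfc.threshold and only.pos interact. It is an approximate base-R version of the pre-filtering, applied to a hypothetical per-gene summary table. It follows the parameter descriptions above rather than Seurat's source code, so boundary details (for example >= versus >) may differ.

```r
# Hypothetical per-gene summary, like FindMarkers' pct.1 / pct.2 / avg_log2FC columns
genes <- data.frame(
  pct.1      = c(0.90, 0.05, 0.60, 0.50),
  pct.2      = c(0.20, 0.02, 0.55, 0.10),
  avg_log2FC = c(1.50, 2.00, 0.10, -0.80),
  row.names  = c("G1", "G2", "G3", "G4")
)

# Approximate pre-filters using the FindMarkers defaults listed above
prefilter <- function(df, min.pct = 0.1, logfc.threshold = 0.25,
                      min.diff.pct = -Inf, only.pos = FALSE) {
  keep <- pmax(df$pct.1, df$pct.2) >= min.pct &           # detected often enough in either group
    abs(df$pct.1 - df$pct.2) >= min.diff.pct &            # detection-rate difference
    abs(df$avg_log2FC) >= logfc.threshold                 # large enough fold change
  if (only.pos) keep <- keep & df$avg_log2FC > 0         # keep only up-regulated genes
  rownames(df)[keep]
}

prefilter(genes)                   # G2 fails min.pct, G3 fails logfc.threshold
prefilter(genes, only.pos = TRUE)  # G4 is down-regulated, so it is dropped too
```

Raising min.pct or logfc.threshold removes genes before any statistical test is run. That is why these parameters speed up FindMarkers but can hide weak or rare signals.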
