seurat -- 关于DE gene的讨论

news2024/12/24 7:02:09

实例

# 加载演示数据集
library(Seurat)
library(SeuratData)
pbmc <- LoadData("pbmc3k", type = "pbmc3k.final")
# list options for groups to perform differential expression on
levels(pbmc)

## [1] "Naive CD4 T"  "Memory CD4 T" "CD14+ Mono"   "B"            "CD8 T"       
## [6] "FCGR3A+ Mono" "NK"           "DC"           "Platelet"
# 默认使用非参秩和检验做差异性分析
# Find differentially expressed features between CD14+ and FCGR3A+ Monocytes
monocyte.de.markers <- FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono")
# view results
head(monocyte.de.markers)
# Find differentially expressed features between CD14+ Monocytes and all other cells, only
# search for positive markers
monocyte.de.markers <- FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = NULL, only.pos = TRUE)
# view results
head(monocyte.de.markers)
# 过滤掉一些细胞加速运算,比如一些gene只在cluster的少部分细胞里表达
# Pre-filter features that are detected at <50% frequency in either CD14+ Monocytes or FCGR3A+
# Monocytes
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", min.pct = 0.5))
# Pre-filter features that have less than a two-fold change between the average expression of
# CD14+ Monocytes vs FCGR3A+ Monocytes
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", logfc.threshold = log(2)))
# 参数检验的时候如果数据量不一样计算的误差会较大
# 这里直接把数据量差不多的gene过滤掉了,难道非参数检验不关注数据量的问题?
# Pre-filter features whose detection percentages across the two groups are similar (within
# 0.25)
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", min.diff.pct = 0.25))
# 还可以对大的cluster进行抽样获取最多200个细胞进行DE分析
# 个人感觉cluster内部其实也不均一,抽样会更不均一,假阳性会增加吧
# Subsample each group to a maximum of 200 cells. Can be very useful for large clusters, or
# computationally-intensive DE tests
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", max.cells.per.ident = 200))

Perform DE analysis using alternative tests

在这里插入图片描述
其中 MAST and DESeq2需要自己额外安装。
t-test是个人最不推荐使用的方法。

# Test for DE features using the MAST package
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", test.use = "MAST"))
# Test for DE features using the DESeq2 package. Throws an error if DESeq2 has not already
# been installed Note that the DESeq2 workflows can be computationally intensive for large
# datasets, but are incompatible with some feature pre-filtering options We therefore suggest
# initially limiting the number of cells used for testing
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", test.use = "DESeq2", max.cells.per.ident = 50))

具体参数

在这里插入图片描述

  • object
    An object

  • slot
    Slot to pull data from; note that if test.use is “negbinom”, “poisson”, or “DESeq2”, slot will be set to “counts”。这里就很有意思,不同的normalization方法处理后的数据放在了不同的data下面,具体是哪个?对结果影响怎样呢?

  • counts
    Count matrix if using scale.data for DE tests. This is used for computing pct.1 and pct.2 and for filtering features based on fraction expressing

  • cells.1
    Vector of cell names belonging to group 1

  • cells.2
    Vector of cell names belonging to group 2

  • features
    Genes to test. Default is to use all genes

  • logfc.threshold
    Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells. Default is 0.25 Increasing logfc.threshold speeds up the function, but can miss weaker signals.

  • test.use
    Denotes which test to use. Available options are:

    • “wilcox” : Identifies differentially expressed genes between two groups of cells using a Wilcoxon Rank Sum test (default)

    • “bimod” : Likelihood-ratio test for single cell gene expression, (McDavid et al., Bioinformatics, 2013)

    • “roc” : Identifies ‘markers’ of gene expression using ROC analysis. For each gene, evaluates (using AUC) a classifier built on that gene alone, to classify between two groups of cells. An AUC value of 1 means that expression values for this gene alone can perfectly classify the two groupings (i.e. Each of the cells in cells.1 exhibit a higher level than each of the cells in cells.2). An AUC value of 0 also means there is perfect classification, but in the other direction. A value of 0.5 implies that the gene has no predictive power to classify the two groups. Returns a ‘predictive power’ (abs(AUC-0.5) * 2) ranked matrix of putative differentially expressed genes.

    • “t” : Identify differentially expressed genes between two groups of cells using the Student’s t-test.

    • “negbinom” : Identifies differentially expressed genes between two groups of cells using a negative binomial generalized linear model. Use only for UMI-based datasets

    • “poisson” : Identifies differentially expressed genes between two groups of cells using a poisson generalized linear model. Use only for UMI-based datasets

    • “LR” : Uses a logistic regression framework to determine differentially expressed genes. Constructs a logistic regression model predicting group membership based on each feature individually and compares this to a null model with a likelihood ratio test.

    • “MAST” : Identifies differentially expressed genes between two groups of cells using a hurdle model tailored to scRNA-seq data. Utilizes the MAST package to run the DE testing.

    • “DESeq2” : Identifies differentially expressed genes between two groups of cells based on a model using DESeq2 which uses a negative binomial distribution (Love et al, Genome Biology, 2014).This test does not support pre-filtering of genes based on average difference (or percent detection rate) between cell groups. However, genes may be pre-filtered based on their minimum detection rate (min.pct) across both cell groups. To use this method, please install DESeq2, using the instructions at https://bioconductor.org/packages/release/bioc/html/DESeq2.html

  • min.pct
    only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations. Meant to speed up the function by not testing genes that are very infrequently expressed. Default is 0.1

  • min.diff.pct
    only test genes that show a minimum difference in the fraction of detection between the two groups. Set to -Inf by default。

  • verbose
    Print a progress bar once expression testing begins

  • only.pos
    Only return positive markers (FALSE by default)

  • max.cells.per.ident
    Down sample each identity class to a max number. Default is no downsampling. Not activated by default (set to Inf)

  • random.seed
    Random seed for downsampling

  • latent.vars
    Variables to test, used only when test.use is one of ‘LR’, ‘negbinom’, ‘poisson’, or ‘MAST’

  • min.cells.feature
    Minimum number of cells expressing the feature in at least one of the two groups, currently only used for poisson and negative binomial tests

  • min.cells.group
    Minimum number of cells in one of the groups

  • pseudocount.use
    Pseudocount to add to averaged expression values when calculating logFC. 1 by default.

  • fc.results
    data.frame from FoldChange

  • densify
    Convert the sparse matrix to a dense form before running the DE test. This can provide speedups but might require higher memory; default is FALSE

  • mean.fxn
    Function to use for fold change or average difference calculation. If NULL, the appropriate function will be chose according to the slot used

  • fc.name
    Name of the fold change, average difference, or custom function column in the output data.frame. If NULL, the fold change column will be named according to the logarithm base (eg, “avg_log2FC”), or if using the scale.data slot “avg_diff”.

  • base
    The base with respect to which logarithms are computed.

  • norm.method
    Normalization method for fold change calculation when slot is “data”

  • recorrect_umi
    Recalculate corrected UMI counts using minimum of the median UMIs when performing DE using multiple SCT objects; default is TRUE

  • ident.1
    Identity class to define markers for; pass an object of class phylo or ‘clustertree’ to find markers for a node in a cluster tree; passing ‘clustertree’ requires BuildClusterTree to have been run

  • ident.2
    A second identity class for comparison; if NULL, use all other cells for comparison; if an object of class phylo or ‘clustertree’ is passed to ident.1, must pass a node to find markers for

  • group.by
    Regroup cells into a different identity class prior to performing differential expression (see example)

  • subset.ident
    Subset a particular identity class prior to regrouping. Only relevant if group.by is set (see example)

  • assay
    Assay to use in differential expression testing

  • reduction
    Reduction to use in differential expression testing - will test for DE on cell embeddings

在这里插入图片描述

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/481391.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

Orangepi Zero2 全志H616(DHT11温湿度检测)

最近在学习Linux应用和安卓开发过程中&#xff0c;打算把Linux实现的温湿度显示安卓app上&#xff0c;于是在此之前先基于Orangepi Zero2 全志H616下的wiringPi库对DHT11进行开发&#xff0c;本文主要记录开发过程的一些问题和细节&#xff0c;主要简单通过开启线程来接收温湿度…

LeetCode1376. 通知所有员工所需的时间

【LetMeFly】1376.通知所有员工所需的时间 力扣题目链接&#xff1a;https://leetcode.cn/problems/time-needed-to-inform-all-employees/ 公司里有 n 名员工&#xff0c;每个员工的 ID 都是独一无二的&#xff0c;编号从 0 到 n - 1。公司的总负责人通过 headID 进行标识。…

QML动画分组(Grouped Animations)

通常使用的动画比一个属性的动画更加复杂。例如你想同时运行几个动画并把他们连接起来&#xff0c;或者在一个一个的运行&#xff0c;或者在两个动画之间执行一个脚本。动画分组提供了很好的帮助&#xff0c;作为命名建议可以叫做一组动画。有两种方法来分组&#xff1a;平行与…

SNAP + StaMPS 处理Sentinel-1哨兵1 时间序列

SNAP StaMPS 处理Sentinel-1哨兵1 时间序列 Step0: 文件准备及路径设置 0.1 前往GitHub下载snap2stamps: Github snap2stamps 0.2 新建工作路径&#xff0c;用来进行数据处理&#xff0c;并将下载的snap2stamps解压到该文件夹下&#xff0c;并新建两个文件夹&#xff0c;ma…

二叉搜索树的最小绝对差

1题目 给你一个二叉搜索树的根节点 root &#xff0c;返回 树中任意两不同节点值之间的最小差值 。 差值是一个正数&#xff0c;其数值等于两值之差的绝对值。 示例 1&#xff1a; 输入&#xff1a;root [4,2,6,1,3] 输出&#xff1a;1示例 2&#xff1a; 输入&#xff1a;r…

证券从业资格证-考前复习

备考2023年6月证券从业资格证&#xff0c;每章思维导图及相关概念&#xff0c;用于考前复习 1. 金融市场基础知识 1.1 第一章 金融市场体系 #mermaid-svg-XEPZZTVBmo6nGm2Y {font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#333;}#merm…

【14.HTML-移动端适配】

移动端适配 1 布局视口和视觉视口1.1 设置移动端布局视口宽度 2 移动端适配方案2.1 rem单位动态html的font-size&#xff1b;2.2 vw单位2.3 rem和vw对比2.4 flex的弹性布局 1 布局视口和视觉视口 1.1 设置移动端布局视口宽度 避免布局视口宽度默认980px带了的缩放问题,并且禁止…

周赛343(模拟、网格图求最短路、贪心)

文章目录 周赛343[6341. 保龄球游戏的获胜者](https://leetcode.cn/problems/determine-the-winner-of-a-bowling-game/)模拟 技巧 [2661. 找出叠涂元素](https://leetcode.cn/problems/first-completely-painted-row-or-column/)模拟 [2662. 前往目标的最小代价](https://lee…

给jar包编写start和stop脚本

文章目录 前言一、编写脚本1.start.sh2.stop.sh3.restart.sh 二、展示总结 前言 springboot项目内置tomcat,一般都是以jar包形式对外发布服务,我们不能每次都去kill pid,抽到脚本里来做这个事会方便许多。 一、编写脚本 1.start.sh #!/bin/bash APP_NAME"springboot2.3…

基于深度学习的水果检测与识别系统(Python界面版,YOLOv5实现)

摘要&#xff1a;本博文介绍了一种基于深度学习的水果检测与识别系统&#xff0c;使用YOLOv5算法对常见水果进行检测和识别&#xff0c;实现对图片、视频和实时视频中的水果进行准确识别。博文详细阐述了算法原理&#xff0c;同时提供Python实现代码、训练数据集&#xff0c;以…

QT 中的多线程之 moveToThread

文章目录 1. 概述2. 方法描述3. 代码&#xff1a;4. 运行结果及说明5. 注意事项6. 结语 1. 概述 ​QThread 类提供了一个与平台无关的管理线程的方法。一个 QThread 对象管理一个线程。 QThread 的执行从 run() 函数的执行开始&#xff0c;在 Qt 自带的 QThread 类中&#xf…

GraalVM编译SpringBoot程序

GraalVM 提供了一个名为 “Native Image” 的工具&#xff0c;它能够将 Java 应用程序预编译成本机可执行文件。这种方法的优点是启动速度快&#xff0c;内存占用少&#xff0c;因为程序运行时不需要 JVM 和类加载。 然而这种方式也存在一些弊端&#xff0c;如预编译的 GraalV…

扩展ACL配置

扩展ACL配置 【实验目的】 掌握扩展ACL的配置。认识扩展ACL的作用。验证配置。 【实验拓扑】 实验拓扑如图1所示。 图1 实验拓扑 设备参数如表所示。 表1 设备参数表 设备 接口 IP地址 子网掩码 默认网关 R1 S0/3/0 192.168.1.1 255.255.255.252 N/A Fa0/0/0 …

操作系统:文件系统基础

文件系统基础 ​ 文件是以硬盘为载体的存储在计算机上的信息集合&#xff0c;文件可以是文档、图片、程序等。在系统运行时&#xff0c;计算机以进程为基本单位进行资源的调度和分配&#xff1b;而在用户的输入和输出中&#xff0c;则以文件为基本单位 文件控制块和索引节点 …

【Python基础练习100题--第二篇:文件篇】

前言 这些题都是在B站的练习题&#xff0c;链接在这 对于刚学python的新手来说十分的适合&#xff0c; 可以加强和巩固我们的基础。 嘿嘿 一起噶油吧&#xff01;&#x1f349; &#x1f349;1.对学生成绩排序 # 这里对字典进行排序&#xff0c;同事使用到了sorted函数 # 这…

stm32F103 WIFIESP8266模块连接阿里云平台

(43条消息) ESP8266 AT MQTT 透传指令接入阿里云物联网平台笔记&#xff1b;_安信可科技的博客-CSDN博客 这里这篇博客提到的固件是错误的 我用的固件是这个&#xff0c;刷固件之后&#xff0c;可以连上阿里云。 ATMQTTCLIENTID0,"ClienId"//clientId第二个参数注…

权限提升:烂土豆 || DLL 劫持.

权限提升&#xff1a;烂土豆 || DLL 劫持. 权限提升简称提权&#xff0c;由于操作系统都是多用户操作系统&#xff0c;用户之间都有权限控制&#xff0c;比如通过 Web 漏洞拿到的是 Web 进程的权限&#xff0c;往往 Web 服务都是以一个权限很低的账号启动的&#xff0c;因此通…

数字化转型:如何利用文件管理系统提高制造业的工作效率

对于制造行业的企业用户&#xff0c;在日常企业文件管理中会遇到怎样的问题呢&#xff1f;制造业又该该如何高效管理文件呢&#xff1f; 这些问题困扰着机械制造行业客户…… 1&#xff09;项目多、文件杂&#xff0c;统一管理不便 2&#xff09;文档资料在公司内部流转不畅 …