Study notes: featureCounts

news2025/2/28 23:21:51

Input

  • one or more files of aligned reads (short or long reads) in either SAM or BAM format
  • a list of genomic features in either Gene Transfer Format (GTF) or General Feature Format (GFF) or Simplified Annotation Format (SAF)
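  • The BAM or SAM files produced by alignment; they do not need to be processed with samtools sort or samtools index first
  • A genome annotation file, which must match the annotation used during alignment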

If the input contains location-sorted paired-end reads, featureCounts will automatically re-order the reads so that reads from the same pair are placed next to each other before counting. Name-sorted paired-end input is sometimes incompatible with featureCounts as well (for example, because of how multi-mapping results are reported); in this case featureCounts also re-orders the reads automatically.

If paired reads are used, then each pair of reads defines a DNA or RNA fragment bookended by the two reads. In this case, featureCounts can be instructed to count fragments rather than reads. If paired reads are not in consecutive positions in the SAM or BAM file, featureCounts automatically sorts them by name, at minimal cost. Users do not need to sort their paired reads before providing them to featureCounts.
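As a minimal sketch of how these inputs reach the tool, the snippet below assembles a featureCounts command line in Python without running it. The file names are hypothetical placeholders. `-a`, `-F`, `-o`, `-T` and `-p` are featureCounts options; `--countReadPairs` is assumed to be needed for fragment counting in recent Subread releases (older releases counted fragments with `-p` alone), so check the help text of the installed version.

```python
import shlex

def build_featurecounts_cmd(bam_files, annotation, output,
                            annot_format="GTF", threads=4, paired=False):
    """Assemble (but do not run) a featureCounts command line."""
    cmd = ["featureCounts",
           "-a", annotation,      # annotation file (GTF/GFF or SAF)
           "-F", annot_format,    # annotation format: GTF or SAF
           "-o", output,          # output count table
           "-T", str(threads)]    # number of threads
    if paired:
        # Assumption: recent releases need both flags to count fragments.
        cmd += ["-p", "--countReadPairs"]
    cmd.extend(bam_files)         # one or more SAM/BAM files; no pre-sorting needed
    return cmd

# Hypothetical file names, for illustration only.
cmd = build_featurecounts_cmd(["sample1.bam", "sample2.bam"],
                              "genes.gtf", "counts.txt", paired=True)
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` once featureCounts is on the PATH.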

A feature is an interval (range of positions) on one of the reference sequences. A meta-feature is a set of features that represents a biological construct of interest. For example, features often correspond to exons and meta-features to genes. Features sharing the same feature identifier in the GTF or SAF annotation are taken to belong to the same meta-feature.
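The grouping of features into meta-features can be sketched as follows. The rows are hypothetical and follow the SAF column order (GeneID, Chr, Start, End, Strand) with 1-based inclusive coordinates; this only illustrates the grouping rule and is not how featureCounts stores annotations internally.

```python
from collections import defaultdict

# Hypothetical SAF-style rows: (GeneID, Chr, Start, End, Strand).
saf_rows = [
    ("geneA", "chr1", 100, 200, "+"),
    ("geneA", "chr1", 300, 400, "+"),
    ("geneB", "chr1", 1000, 1200, "-"),
]

def group_into_meta_features(rows):
    """Group features (exons) that share an identifier into one meta-feature (gene)."""
    meta = defaultdict(list)
    for gene_id, chrom, start, end, strand in rows:
        meta[gene_id].append((chrom, start, end, strand))
    return dict(meta)

meta_features = group_into_meta_features(saf_rows)
for gene_id, features in meta_features.items():
    print(gene_id, len(features), "feature(s)")
```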

Users may use the ‘--minOverlap’ (minOverlap in R) and ‘--fracOverlap’ (fracOverlap in R) options to specify the minimum number of overlapping bases and the minimum fraction of overlapping bases required for assigning a read to a feature, respectively. The ‘--fracOverlap’ option might be particularly useful for counting reads with variable lengths.
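To make the two thresholds concrete, here is a simplified sketch that treats a read as one contiguous 1-based inclusive interval and takes the fraction relative to the read length. The function names are hypothetical, and the real tool also handles spliced reads and fragments, so this is an approximation of the rule rather than the actual implementation.

```python
def overlap_bases(read_start, read_end, feat_start, feat_end):
    """Number of bases shared by two 1-based inclusive intervals."""
    return max(0, min(read_end, feat_end) - max(read_start, feat_start) + 1)

def passes_overlap(read_start, read_end, feat_start, feat_end,
                   min_overlap=1, frac_overlap=0.0):
    """True if the read meets both the absolute and the fractional threshold."""
    shared = overlap_bases(read_start, read_end, feat_start, feat_end)
    read_len = read_end - read_start + 1
    return shared >= min_overlap and shared >= frac_overlap * read_len

exon = (100, 200)
# A 50-base read that hangs off the exon end shares only 10 bases with it.
print(passes_overlap(191, 240, *exon))                    # True
print(passes_overlap(191, 240, *exon, frac_overlap=0.5))  # False
# A 20-base read sharing 16 bases passes the same fractional threshold.
print(passes_overlap(185, 204, *exon, frac_overlap=0.5))  # True
```

A fixed base count treats short and long reads differently, which is why the fractional threshold suits variable-length reads.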

When counting reads at meta-feature level, a hit is called for a meta-feature if the read overlaps any component feature of the meta-feature. Note that if a read hits a meta-feature, it is always counted once, no matter how many features in the meta-feature the read overlaps. For instance, an exon-spanning read overlapping more than one exon within the same gene contributes only 1 count to the gene.
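The "counted once" rule can be illustrated with a small sketch. The gene model and the spliced read are hypothetical; the read is represented as a list of aligned blocks in 1-based inclusive coordinates.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end

def genes_hit(read_blocks, genes):
    """Return the set of genes with at least one exon overlapped by the read."""
    hit = set()
    for gene_id, exons in genes.items():
        if any(overlaps(rs, re_, es, ee)
               for rs, re_ in read_blocks
               for es, ee in exons):
            hit.add(gene_id)  # a set, so a gene is recorded once per read
    return hit

genes = {"geneA": [(100, 200), (300, 400)]}   # hypothetical two-exon gene
spliced_read = [(181, 200), (300, 329)]       # one read spanning both exons

counts = {gene_id: 0 for gene_id in genes}
for gene_id in genes_hit(spliced_read, genes):
    counts[gene_id] += 1
print(counts)  # {'geneA': 1}
```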

When assigning reads to genes or exons, most reads can be successfully assigned without ambiguity. However, if reads are to be assigned to transcripts, the high overlap between transcripts from the same gene means that many reads will overlap more than one transcript and therefore cannot be uniquely assigned. Specialized transcript-level quantification tools are recommended for counting reads to transcripts. Such tools use model-based approaches to deconvolve reads overlapping multiple transcripts.

Output

Number and percentage of successfully assigned alignments are also shown in featureCounts screen output.

  • Unassigned_Unmapped: unmapped reads cannot be assigned.
  • Unassigned_Read_Type: reads that have an unexpected read type (e.g. a single-end read included in a paired-end dataset) and also cannot be counted with confidence (e.g. due to stranded counting). Such reads are typically generated by a read trimming program.
  • Unassigned_Singleton: read pairs that have only one end mapped.
  • Unassigned_MappingQuality: alignments with a mapping quality score lower than the threshold.
  • Unassigned_Chimera: the two ends of a paired-end alignment are located on different chromosomes or have an unexpected orientation.
  • Unassigned_FragmentLength: the fragment length inferred from a paired-end alignment does not meet the length criteria.
  • Unassigned_Duplicate: alignments marked as duplicate (indicated in the FLAG field).
  • Unassigned_MultiMapping: alignments reported for multi-mapping reads (indicated by the ‘NH’ tag).
  • Unassigned_Secondary: alignments reported as secondary alignments (indicated in the FLAG field).
  • Unassigned_Split (or Unassigned_NonSplit): alignments that contain junctions (or do not contain junctions).
  • Unassigned_NoFeatures: alignments that do not overlap any feature.
  • Unassigned_Overlapping_Length: alignments that do not overlap any feature (or meta-feature) with the minimum required overlap length.
  • Unassigned_Ambiguity: alignments that overlap two or more features (feature-level summarization) or meta-features (meta-feature-level summarization).
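The same categories can be turned into an assignment rate per sample. The sketch below assumes a tab-delimited summary table with a `Status` column followed by one column per input file and underscore-joined status names such as `Unassigned_NoFeatures`. The numbers are made up for illustration.

```python
import csv
import io

# Hypothetical summary content; real values come from a featureCounts run.
summary_text = (
    "Status\tsample1.bam\n"
    "Assigned\t800\n"
    "Unassigned_NoFeatures\t150\n"
    "Unassigned_Ambiguity\t50\n"
)

def assigned_fraction(handle):
    """Return {sample: assigned / total} from a tab-delimited summary table."""
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[1:]          # header row: Status, then sample names
    totals = {name: 0 for name in samples}
    assigned = {name: 0 for name in samples}
    for row in reader:
        if not row:
            continue                    # skip blank lines
        status, values = row[0], row[1:]
        for name, value in zip(samples, values):
            totals[name] += int(value)
            if status == "Assigned":
                assigned[name] += int(value)
    return {name: (assigned[name] / totals[name] if totals[name] else 0.0)
            for name in samples}

fractions = assigned_fraction(io.StringIO(summary_text))
print(fractions)
```

For a real run, pass an open file handle instead of the `io.StringIO` wrapper.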

Notes

A multi-mapping read is a read that maps to more than one location in the reference genome. There are multiple options for counting such reads. Users can specify the ‘-M’ option (set countMultiMappingReads to TRUE in R) to fully count every alignment reported for a multi-mapping read (each alignment carries 1 count), or leave it unset so that such reads are not counted at all (this is the default behavior).
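A toy model of the two behaviors is shown below. Each alignment is a hypothetical dictionary carrying the gene it overlaps and its `NH` value (the number of reported alignments for that read); the function is a sketch of the counting rule, not of the tool's code.

```python
def count_alignments(alignments, count_multi_mapping=False):
    """Count alignments per gene, optionally including multi-mapping reads."""
    counts = {}
    for aln in alignments:
        if aln["NH"] > 1 and not count_multi_mapping:
            continue  # default behavior: multi-mapping reads are not counted
        counts[aln["gene"]] = counts.get(aln["gene"], 0) + 1
    return counts

alignments = [
    {"read": "r1", "gene": "geneA", "NH": 1},  # uniquely mapped
    {"read": "r2", "gene": "geneA", "NH": 2},  # r2 maps to two locations
    {"read": "r2", "gene": "geneB", "NH": 2},
]

print(count_alignments(alignments))                            # {'geneA': 1}
print(count_alignments(alignments, count_multi_mapping=True))  # {'geneA': 2, 'geneB': 1}
```

With multi-mapping counting enabled, read r2 contributes one count per reported alignment, so the total number of counts exceeds the number of reads.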

A multi-overlapping read is a read that overlaps more than one meta-feature when counting reads at meta-feature level or overlaps more than one feature when counting reads at feature level.
By default, featureCounts does not count multi-overlapping reads.
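The default handling of multi-overlapping reads can be sketched as follows. The gene coordinates are hypothetical and 1-based inclusive. The `allow_multi_overlap` parameter mimics the effect of the ‘-O’ option (allowMultiOverlap in R), which lets such reads count toward every overlapped meta-feature.

```python
def assign_read(read_start, read_end, genes, allow_multi_overlap=False):
    """Return the genes credited with the read, or an unassigned status string."""
    hits = sorted(
        gene_id for gene_id, exons in genes.items()
        if any(start <= read_end and read_start <= end for start, end in exons)
    )
    if not hits:
        return "Unassigned_NoFeatures"
    if len(hits) > 1 and not allow_multi_overlap:
        return "Unassigned_Ambiguity"   # default: multi-overlapping reads are dropped
    return hits

# Two hypothetical genes whose exons overlap between positions 180 and 200.
genes = {"geneA": [(100, 200)], "geneB": [(180, 300)]}

print(assign_read(110, 150, genes))                            # ['geneA']
print(assign_read(185, 195, genes))                            # Unassigned_Ambiguity
print(assign_read(185, 195, genes, allow_multi_overlap=True))  # ['geneA', 'geneB']
print(assign_read(500, 550, genes))                            # Unassigned_NoFeatures
```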

featureCounts implements a variety of read filters to facilitate flexible read counting, which should satisfy the requirements of most downstream analyses. The filters are applied in the following order (from first to last): unmapped > read type > singleton > mapping quality > chimeric fragment > fragment length > duplicate > multi-mapping > secondary alignment > split reads (or nonsplit reads) > no overlapping features > overlapping length > assignment ambiguity.
The ‘read type’ filter removes reads that have an unexpected read type and also cannot be counted with confidence. For example, if single-end reads are included in a paired-end dataset (such data can be produced by a read trimming program, for instance) and reads must be counted in a strand-specific manner, then all the single-end reads will be excluded from counting because their strandedness cannot be determined.
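One consequence of a fixed filter order is that a read failing several filters is reported under the earliest one. The sketch below encodes the order listed above, using the underscore-joined status names as an assumption about how they appear in the output. `first_failed_filter` is a hypothetical helper, not part of featureCounts.

```python
# Filter order from first to last, as listed in the text.
FILTER_ORDER = [
    "Unassigned_Unmapped",
    "Unassigned_Read_Type",
    "Unassigned_Singleton",
    "Unassigned_MappingQuality",
    "Unassigned_Chimera",
    "Unassigned_FragmentLength",
    "Unassigned_Duplicate",
    "Unassigned_MultiMapping",
    "Unassigned_Secondary",
    "Unassigned_Split",            # or Unassigned_NonSplit
    "Unassigned_NoFeatures",
    "Unassigned_Overlapping_Length",
    "Unassigned_Ambiguity",
]

def first_failed_filter(failed):
    """Return the status of the earliest filter in FILTER_ORDER that the read fails."""
    for status in FILTER_ORDER:
        if status in failed:
            return status
    return "Assigned"

# A low-quality alignment of a multi-mapping read is reported under mapping quality.
print(first_failed_filter({"Unassigned_MultiMapping", "Unassigned_MappingQuality"}))
print(first_failed_filter(set()))  # Assigned
```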

The ‘Length’ column always contains a single value: the total number of non-overlapping bases included in a meta-feature (or a feature).
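Counting non-overlapping bases amounts to merging overlapping intervals before summing their lengths. A minimal sketch, assuming all exons of the hypothetical gene lie on the same chromosome and use 1-based inclusive coordinates:

```python
def non_overlapping_length(intervals):
    """Total number of distinct bases covered by 1-based inclusive intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlaps the current block: extend it instead of double counting.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total

# Exons 100-200 and 150-250 overlap, so they merge into 100-250 (151 bases);
# exon 300-400 adds 101 more.
print(non_overlapping_length([(100, 200), (150, 250), (300, 400)]))  # 252
```

Simply summing the three exon lengths would give 303, counting the 51 shared bases twice.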

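  • What confused me most were the -f and -t options. -f controls the level at which read counts are reported, for example counts per gene (meta-feature level, the default) or counts per exon (feature level, with -f). -t selects which feature type in the annotation reads are counted against, such as exon, UTR or gene. For example, with -t exon, reads are counted against annotation records of type exon, and records of other types such as UTR are not used for counting.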
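The difference between the two options may be easier to see side by side. The helper below is hypothetical and only assembles option lists; `-t`, `-g` and `-f` are featureCounts options, and `exon` and `gene_id` are their usual defaults for GTF annotations.

```python
def level_args(feature_type="exon", group_by="gene_id", feature_level=False):
    """Build the featureCounts options that control what is counted and at which level."""
    args = ["-t", feature_type,   # which annotation records are used (e.g. exon)
            "-g", group_by]       # attribute that groups features into meta-features
    if feature_level:
        args.append("-f")         # report one row per feature instead of per meta-feature
    return args

gene_level = level_args()                    # one row per gene, using exon records
exon_level = level_args(feature_level=True)  # one row per exon
print(" ".join(gene_level))  # -t exon -g gene_id
print(" ".join(exon_level))  # -t exon -g gene_id -f
```

In both cases only exon records are used; -f changes the rows of the output table, not the set of annotation records that reads are compared against.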

Arguments

[Six images showing the featureCounts argument list appeared here; they are not preserved in this copy.]
