PhaseFinder: introduction, installation, and usage of a tool for detecting DNA inversion-mediated phase variation in bacterial genomes

news2025/1/23 17:32:48

PhaseFinder

## Overview
The PhaseFinder algorithm is designed to detect DNA inversion mediated phase variation in bacterial genomes using genomic or metagenomic sequencing data. It works by identifying regions flanked by inverted repeats, mimicking their inversion in silico, and identifying regions where sequencing reads support both orientations. Here, we define phase variation as "a process employed by bacteria to generate frequent and reversible changes within specific hypermutable loci, introducing phenotypic diversity into clonal populations". Not every region detected by PhaseFinder will directly result in phase variation, but the results should be highly enriched for regions that do.

github: https://github.com/XiaofangJ/PhaseFinder

## Prerequisites
+ [Biopython](https://biopython.org/)
+ [pandas](https://pandas.pydata.org)
+ [samtools](http://samtools.sourceforge.net/) (>=1.4)
+ [bowtie](https://github.com/BenLangmead/bowtie) (>=1.2.0)
+ [einverted](http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/einverted.html)
+ [bedops](https://bedops.readthedocs.io/en/latest/)
+ [bedtools](https://bedtools.readthedocs.io/en/latest/)

## Installation

To install PhaseFinder:

    git clone git@github.com:nlm-irp-jianglab/PhaseFinder.git
    cd PhaseFinder
    conda env create --file environment.yml
    conda activate PhaseFinder
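Before running the pipeline, it can help to confirm that the command-line prerequisites are visible on `PATH` inside the activated environment. The sketch below is not part of PhaseFinder. The executable names in `REQUIRED_TOOLS` are assumptions based on how these packages are usually installed (for example, `bowtie-build` normally ships with bowtie), so adjust them to your setup.

```python
import shutil

# Assumed executable names for the prerequisites listed above
# (a hypothetical list, not taken from PhaseFinder itself).
REQUIRED_TOOLS = ["samtools", "bowtie", "bowtie-build", "einverted", "bedops", "bedtools"]


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

The Python packages (Biopython, pandas) are not covered by this check; importing them in the activated environment is the simplest way to verify them.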

## Quick Start
All you need to get started is a genome (in fasta format) that you would like to search for invertible DNA regions, plus either genomic sequencing data from the same organism or metagenomic sequencing data from a sample containing the organism (in both cases preferably Illumina reads in fastq format).

To test PhaseFinder, you can use the example files (genome: test.fa; genomic data: p1.fq, p2.fq). Example:

    # Identify regions flanked by inverted repeats
    python PhaseFinder.py locate -f ./data/test.fa -t ./data/test.einverted.tab -g 15 85 -p

    # Mimic inversion
    python PhaseFinder.py create -f ./data/test.fa -t ./data/test.einverted.tab -s 1000 -i ./data/test.ID.fasta

    # Identify regions where sequencing reads support both orientations
    python PhaseFinder.py ratio -i ./data/test.ID.fasta -1 ./data/p1.fq -2 ./data/p2.fq -p 16 -o ./data/out

If successful, the output will be in data/out.ratio.txt.

In this example, there is one real invertible DNA region, "am_0171_0068_d5_0006:81079-81105-81368-81394", because only this region has reads supporting both the F and R orientations.
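The three quick-start commands can also be chained from a script. The following is a minimal sketch that only assembles the same command lines shown in the quick start and prints them. The helper names `quickstart_commands` and `run_pipeline` are hypothetical, and the sketch assumes it is launched from the PhaseFinder repository root with the conda environment active.

```python
import subprocess
import sys


def quickstart_commands(genome, tab, inv, fastq1, fastq2, out_prefix,
                        threads=16, flank=1000):
    """Assemble the locate, create, and ratio command lines from the quick start."""
    base = [sys.executable, "PhaseFinder.py"]
    return [
        base + ["locate", "-f", genome, "-t", tab, "-g", "15", "85", "-p"],
        base + ["create", "-f", genome, "-t", tab, "-s", str(flank), "-i", inv],
        base + ["ratio", "-i", inv, "-1", fastq1, "-2", fastq2,
                "-p", str(threads), "-o", out_prefix],
    ]


def run_pipeline(commands):
    """Run each step in order and stop at the first one that fails."""
    for command in commands:
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)


commands = quickstart_commands(
    "./data/test.fa", "./data/test.einverted.tab", "./data/test.ID.fasta",
    "./data/p1.fq", "./data/p2.fq", "./data/out",
)
for command in commands:
    print(" ".join(command))
# Call run_pipeline(commands) from the repository root to execute the steps.
```

Using `check=True` means a failure in `locate` or `create` stops the script before `ratio` runs on incomplete input.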

---

## Tutorial
1. Generate a position table of regions flanked by inverted repeats 
Users can identify inverted repeats using the "PhaseFinder.py locate" command, or generate their own table.

1.1. Generate the position table with the PhaseFinder script

Usage: PhaseFinder.py locate [OPTIONS]

  Locate putative inverted regions

Options:
  -f, --fasta PATH        Input genome sequence file in fasta format
                          [required]
  -t, --tab PATH          Output table with inverted repeats coordinates
                          [required]
  -e, --einv TEXT         Einverted parameters, if unspecified run with
                          PhaseFinder default pipeline
  -m, --mismatch INTEGER  Max number of mismatches allowed between IR pairs,
                          used with -einv (default:3)
  -r, --IRsize INTEGER    Max size of the inverted repeats, used with -einv
                          (default:50)
  -g, --gcRatio MIN MAX   The minimum and maximum value of GC ratio
  -p, --polymer           Remove homopolymer inverted repeats
  --help                  Show this message and exit.

Input: A fasta file containing the genome sequence
Output: A table file containing the position information of inverted repeats in the genome

Examples:
* Run the default PhaseFinder locate parameters

python PhaseFinder.py locate -f ./data/test.fa -t ./data/test.einverted.tab 

* Run the default PhaseFinder locate parameters and remove inverted repeats with GC content lower than 15% or higher than 85%, or with homopolymers

python PhaseFinder.py locate -f ./data/test.fa -t ./data/test.einverted.tab -g 15 85 -p 

* Run with the specified einverted parameters "-maxrepeat 750 -gap 100 -threshold 51 -match 5 -mismatch -9" 

python PhaseFinder.py locate -f ./data/test.fa -t ./data/test.einverted.tab -e "-maxrepeat 750 -gap 100 -threshold 51 -match 5 -mismatch -9" 
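To make the `-g` and `-p` filters more concrete, here is a rough sketch of what a GC-content window and a homopolymer check could look like for a single inverted-repeat sequence. It illustrates the idea only. PhaseFinder's own implementation may define these filters differently, and the function names as well as the reading of "homopolymer" as a repeat made of one repeated base are assumptions.

```python
def gc_percent(seq):
    """Percentage of G and C bases in `seq` (0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def is_homopolymer(seq):
    """True if `seq` consists of a single repeated base (assumed definition)."""
    return len(seq) > 0 and len(set(seq.upper())) == 1


def keep_repeat(seq, gc_min=15.0, gc_max=85.0, drop_homopolymers=True):
    """Keep a repeat only if it passes the homopolymer and GC-window checks."""
    if drop_homopolymers and is_homopolymer(seq):
        return False
    return gc_min <= gc_percent(seq) <= gc_max


# Toy sequences, made up for illustration.
for repeat in ["ATGCATGCAT", "AAAAAAAAAA", "GCGCGCGCGC"]:
    print(repeat, round(gc_percent(repeat), 1), keep_repeat(repeat))
```

The defaults of 15 and 85 mirror the `-g 15 85` example above; they are not claimed to be PhaseFinder defaults.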


1.2. Generate the position table with other tools
You can identify regions flanked by inverted repeats directly with tools such as [einverted](http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/einverted.html) and [palindrome](http://emboss.sourceforge.net/apps/cvs/emboss/apps/palindrome.html). 

Prepare the output into the following format:

A table file with five columns (tab delimited):

| Column name | Explanation |
|-------------|-------------|
| Scaffold | The scaffold or contig name where the inverted repeat is detected |
| pos A | The start coordinate of the first inverted repeat (0-based) |
| pos B | The end coordinate of the first inverted repeat (1-based) |
| pos C | The start coordinate of the second inverted repeat (0-based) |
| pos D | The end coordinate of the second inverted repeat (1-based) |

---
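If you build the position table yourself, the format described in section 1.2 is five tab-separated fields per line. A small sketch of writing such a table follows. The function name is hypothetical, the coordinate sanity check is an added assumption (ordered, non-overlapping repeats), and no header line is written because the text does not say whether one is expected. The example row reuses the region from the quick start.

```python
import csv
import io


def write_position_table(rows, handle):
    """Write (scaffold, posA, posB, posC, posD) tuples as tab-delimited lines."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for scaffold, pos_a, pos_b, pos_c, pos_d in rows:
        # Assumed sanity check: the two repeats are ordered and do not overlap.
        if not (pos_a < pos_b <= pos_c < pos_d):
            raise ValueError(f"inconsistent coordinates for {scaffold}")
        writer.writerow([scaffold, pos_a, pos_b, pos_c, pos_d])


buffer = io.StringIO()
write_position_table([("am_0171_0068_d5_0006", 81079, 81105, 81368, 81394)], buffer)
print(buffer.getvalue(), end="")
```

Writing to a file instead only requires passing an open file handle, for example `open("my.einverted.tab", "w", newline="")`, where the file name is a placeholder.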

2. Mimic inversion in silico to create a database of inverted sequences

Usage: PhaseFinder.py create [OPTIONS]

  Create inverted fasta file

Options:
  -f, --fasta PATH         Input genome sequence file in fasta format
                           [required]
  -t, --tab PATH           Table with inverted repeat coordinates  [required]
  -s, --flanksize INTEGER  Base pairs of flanking DNA on both sides of the
                           identified inverted repeats  [required]
  -i, --inv PATH           Output path of the inverted fasta file  [required]
  --help                   Show this message and exit.

Input
* The position table from step 1

Output
* A fasta file containing inverted (R) and non-inverted (F) putative invertible DNA regions flanked by sequences of specified length (bowtie indexed)
* A table file (with suffix ".info.tab") describing the location of inverted repeats in the above fasta file

---
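Conceptually, the inversion step replaces the invertible segment with its reverse complement while keeping a fixed amount of flanking sequence on each side. The sketch below illustrates that idea on a plain Python string. It is not PhaseFinder's code, the function names are hypothetical, and exactly which bases PhaseFinder inverts (for instance, whether the repeats themselves are included) is an assumption here.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def forward_and_inverted(genome, pos_a, pos_d, flank):
    """Return (F, R) sequences for the region genome[pos_a:pos_d].

    F is the region as found in the genome plus up to `flank` bases per side.
    R is the same window with the region replaced by its reverse complement.
    Assumption: the whole span from posA to posD, repeats included, is inverted.
    """
    left = genome[max(0, pos_a - flank):pos_a]
    region = genome[pos_a:pos_d]
    right = genome[pos_d:pos_d + flank]
    return left + region + right, left + reverse_complement(region) + right


# Toy genome: ACGG ... CCGT form an inverted repeat pair around ATCCAAA.
genome = "TTTTTACGGATCCAAACCGTGGGGG"
f_seq, r_seq = forward_and_inverted(genome, pos_a=5, pos_d=20, flank=3)
print(f_seq)
print(r_seq)
```

Because the repeats are reverse complements of each other, they read the same in F and R; only the segment between them changes, which is what lets reads distinguish the two orientations.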
3. Align sequence reads to the inverted sequence database and calculate the ratio of reads aligning to the F or R orientation

Usage: PhaseFinder.py ratio [OPTIONS]

  Align reads to inverted fasta file

Options:
  -i, --inv PATH         Input path of the inverted fasta file  [required]
  -1, --fastq1 PATH      First pair in fastq  [required]
  -2, --fastq2 PATH      Second pair in fastq  [required]
  -p, --threads INTEGER  Number of threads
  -o, --output TEXT      Output prefix  [required]
  --help                 Show this message and exit.

Input
* Output from step 2
* fastq files of genomic or metagenomic sequencing reads used to verify DNA inversion
* Number of threads used for bowtie alignment and samtools processing

Output
* A table file (with suffix ".ratio.txt") containing the number of reads supporting either the R or F orientation of each invertible DNA region

| Column name | Explanation |
|-------------|-------------|
| Sequence | Putative invertible region (format: Scaffold:posA-posB-posC-posD) |
| Pe_F | The number of reads supporting the F orientation with paired-end information |
| Pe_R | The number of reads supporting the R orientation with paired-end information |
| Pe_ratio | Pe_R/(Pe_F + Pe_R). The fraction of reads supporting the R orientation with the paired-end method |
| Span_F | The number of reads supporting the F orientation spanning the inverted repeat by at least 10 bp on either side |
| Span_R | The number of reads supporting the R orientation spanning the inverted repeat by at least 10 bp on either side |
| Span_ratio | Span_R/(Span_F + Span_R). The fraction of reads supporting the R orientation with the spanning method |
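As an illustration of the two ideas in this table, the sketch below shows one way to read "spanning the inverted repeat by at least 10 bp on either side" and how the ratio columns follow from the counts. Both functions are hypothetical helpers, not PhaseFinder code. The overhang test assumes half-open coordinates and requires the margin on both sides of the repeat, which is an interpretation of the wording above.

```python
def spans_repeat(read_start, read_end, ir_start, ir_end, min_overhang=10):
    """True if the aligned read covers the repeat plus `min_overhang` bp per side.

    Coordinates are assumed to be half-open intervals [start, end).
    """
    return (ir_start - read_start >= min_overhang
            and read_end - ir_end >= min_overhang)


def orientation_ratio(n_forward, n_reverse):
    """R / (F + R), or None when no reads support either orientation."""
    total = n_forward + n_reverse
    if total == 0:
        return None  # how PhaseFinder reports this case is not assumed here
    return n_reverse / total


# Made-up coordinates and counts for illustration.
print(spans_repeat(90, 140, 100, 126))
print(spans_repeat(95, 140, 100, 126))
print(orientation_ratio(95, 5))
```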

True invertible regions have reads supporting both the F and R orientation. We recommend combining the information from both the paired-end (Pe) and spanning (Span) methods to find valid invertible DNA regions. Our default is to classify a region as invertible if at least 1% of reads support the R orientation with a minimum Pe_R > 5 and Span_R > 3. 
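The recommended thresholds can be applied to the ".ratio.txt" table with a few lines of Python. This is a sketch under stated assumptions: the file is tab-delimited with one header line and the seven columns in the order listed above, and the 1% criterion is applied to both `Pe_ratio` and `Span_ratio` (the text does not say which ratio it refers to). The function name and the example rows are made up.

```python
import csv
import io


def invertible_regions(handle, min_ratio=0.01, min_pe_r=5, min_span_r=3):
    """Return IDs of regions passing the Pe_R, Span_R, and ratio thresholds."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header line
    hits = []
    for row in reader:
        if len(row) < 7:
            continue  # skip malformed lines
        try:
            pe_r, pe_ratio = float(row[2]), float(row[3])
            span_r, span_ratio = float(row[5]), float(row[6])
        except ValueError:
            continue  # skip rows with non-numeric values
        if (pe_r > min_pe_r and span_r > min_span_r
                and pe_ratio >= min_ratio and span_ratio >= min_ratio):
            hits.append(row[0])
    return hits


# Hypothetical table content, not real PhaseFinder output.
example = (
    "Sequence\tPe_F\tPe_R\tPe_ratio\tSpan_F\tSpan_R\tSpan_ratio\n"
    "contig_1:100-120-400-420\t200\t12\t0.06\t150\t8\t0.05\n"
    "contig_1:900-915-1300-1315\t300\t0\t0\t220\t0\t0\n"
)
print(invertible_regions(io.StringIO(example)))
```

Columns are read by position rather than by name, so the sketch does not depend on the exact header text.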

4. (Optional) Subset for intergenic invertible DNA regions 

If you are especially interested in intergenic regulatory regions, such as promoters, you can remove predicted invertible regions that overlap coding sequences (CDS). First, obtain an annotation for the genome of interest in GFF3 format, either from NCBI or generated yourself. Second, subset the annotation to CDS regions only. Third, use the following command to process the output of PhaseFinder step 3 and obtain a list of intergenic putative invertible DNA regions.

sed '1d' output_from_phasefinder.ratio.txt| awk '{print $1"\t"$0}'|sed 's/:/\t/;s/-[^\t]*-/\t/'|sortBed |closestBed  -a - -b annotation.gff  -d |awk '$20!=0{print $3}' > intergenic_IDR.txt
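The two `sed` substitutions in the command above turn each region ID of the form `Scaffold:posA-posB-posC-posD` into scaffold, start (posA), and end (posD) columns, so that bedtools can treat the region as an interval. The same parsing is sketched below in Python, with hypothetical function names. Unlike the `sed` expression, which splits on the first colon, this version splits on the last colon so that scaffold names containing a colon are tolerated; that is a design choice of the sketch, not PhaseFinder behavior.

```python
def parse_region_id(region_id):
    """Split 'Scaffold:posA-posB-posC-posD' into (scaffold, [posA, posB, posC, posD])."""
    scaffold, _, coords = region_id.rpartition(":")
    positions = [int(value) for value in coords.split("-")]
    if not scaffold or len(positions) != 4:
        raise ValueError(f"unexpected region ID: {region_id!r}")
    return scaffold, positions


def to_bed_fields(region_id):
    """Return (scaffold, start, end) covering the region from posA to posD."""
    scaffold, (pos_a, _pos_b, _pos_c, pos_d) = parse_region_id(region_id)
    return scaffold, pos_a, pos_d


print(to_bed_fields("am_0171_0068_d5_0006:81079-81105-81368-81394"))
```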

## Citation
Jiang X, Hall AB, et al. Invertible promoters mediate bacterial phase variation, antibiotic resistance, and host adaptation in the gut, *Science* (2019) [DOI: 10.1126/science.aau5238](http://science.sciencemag.org/content/363/6423/181)
