SRA Toolkit简单使用(prefetch和fastq-dump)

news2024/12/28 6:09:02

工具下载网址:

01. 下载 SRA Toolkit ·ncbi/sra-tools 维基icon-default.png?t=O83Ahttps://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit

 我下载的是linux 3.0.10版,目前最新版如下:https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/3.1.1/sratoolkit.3.1.1-centos_linux64.tar.gz

prefetch使用 

1.prefetch下载单个数据

prefetch SRR547531

2.prefetch批量下载数据
 

prefetch --option-file list.txt --output-directory /your/download/path

list.txt内容如下:

ERR274310  
SRR547531  
SRR548277  
SRR847503  
SRR847504  

运行结果如下:


fastq-dump使用

1.fastq-dump转换sra为fastq格式

fastq-dump SRR547531  --gzip --split-3

--split-3 :不知道sra是单端还是双端,默认使用

我发现下载的时候是一个文件夹,文件夹里有一个sra数据,转换的时候,直接输入SRR547531即可完成转换,而不用输入SRR547531/SRR547531.sra

结果如下:

2.fastq-dump批量转换

for id in `cat list.txt`
do
fastq-dump  /your/download/path/${id} --gzip --split-3 -O /your/path
done;

附加信息-完整参数

1.prefetch -h

Usage:
  prefetch [options] <SRA accession> [...]
  Download SRA files and their dependencies

  prefetch [options] --cart <kart file>
  Download cart file

  prefetch [options] <URL> --output-file <FILE>
  Download URL to FILE

  prefetch [options] <URL> [...] --output-directory <DIRECTORY>
  Download URL or URL-s to DIRECTORY

  prefetch [options] <SRA file> [...]
  Check SRA file for missed dependencies and download them


Options:
  -T|--type <value>                Specify file type to download. Default: sra 
  -t|--transport <http|fasp|both>  Transport: one of: fasp; http; both 
                                   [default]. (fasp only; http only; first try 
                                   fasp (ascp), use http if cannot download 
                                   using fasp). 
  --location <value>               Location of data. 

  -N|--min-size <size>             Minimum file size to download in KB 
                                   (inclusive). 
  -X|--max-size <size>             Maximum file size to download in KB 
                                   (exclusive). Default: 20G 
  -f|--force <yes|no|all|ALL>      Force object download: one of: no, yes, 
                                   all, ALL. no [default]: skip download if the 
                                   object if found and complete; yes: download 
                                   it even if it is found and is complete; all: 
                                   ignore lock files (stale locks or it is 
                                   being downloaded by another process use 
                                   at your own risk!); ALL: ignore lock files, 
                                   restart download from beginning. 
  -r|--resume <yes|no>             Resume partial downloads: one of: no, yes 
                                   [default]. 
  -C|--verify <yes|no>             Verify after download: one of: no, yes 
                                   [default]. 
  -p|--progress                    Show progress. 
  -H|--heartbeat <value>           Time period in minutes to display download 
                                   progress. (0: no progress), default: 1 

  --eliminate-quals                Don't download QUALITY column. 
  -c|--check-all                   Double-check all refseqs. 
  -S|--check-rs <yes|no|smart>     Check for refseqs in downloaded files: one 
                                   of: no, yes, smart [default]. Smart: skip 
                                   check for large encrypted non-sra files. 
  -o|--order <kart|size>           Kart prefetch order when downloading 
                                   kart: one of: kart, size. (in kart order, by 
                                   file size: smallest first), default: size. 
  -R|--rows <rows>                 Kart rows to download (default all). Row 
                                   list should be ordered. 
  --perm <PATH>                    PATH to jwt cart file. 
  --ngc <PATH>                     PATH to ngc file. 
  --cart <PATH>                    To read kart file. 

  -a|--ascp-path <ascp-binary|private-key-file>  Path to ascp program and 
                                   private key file (asperaweb_id_dsa.putty) 
  --ascp-options <value>           Arbitrary options to pass to ascp command 
                                   line. 

  -o|--output-file <FILE>          Write file to FILE when downloading 
                                   single file. 
  -O|--output-directory <DIRECTORY>  Save files to DIRECTORY/ 

  -h|--help                        Output brief explanation for the program. 
  -V|--version                     Display the version of the program then 
                                   quit. 
  -L|--log-level <level>           Logging level as number or enum string. One 
                                   of (fatal|sys|int|err|warn|info|debug) or 
                                   (0-6) Current/default is warn. 
  -v|--verbose                     Increase the verbosity of the program 
                                   status messages. Use multiple times for more 
                                   verbosity. Negates quiet. 
  -q|--quiet                       Turn off all status messages for the 
                                   program. Negated by verbose. 
  --option-file <file>             Read more options and parameters from the 
                                   file. 

sratoolkit.3.0.10-centos_linux64/bin/prefetch : 3.0.10

2.fastq-dump -h

Usage:
  sratoolkit.3.0.10-centos_linux64/bin/fastq-dump [options] <path> [<path>...]
  sratoolkit.3.0.10-centos_linux64/bin/fastq-dump [options] <accession>

INPUT
  -A|--accession <accession>       Replaces accession derived from <path> in 
                                   filename(s) and deflines (only for single 
                                   table dump) 
  --table <table-name>             Table name within cSRA object, default is 
                                   "SEQUENCE" 

PROCESSING

Read Splitting                     Sequence data may be used in raw form or
                                     split into individual reads
  --split-spot                     Split spots into individual reads 

Full Spot Filters                  Applied to the full spot independently
                                     of --split-spot
  -N|--minSpotId <rowid>           Minimum spot id 
  -X|--maxSpotId <rowid>           Maximum spot id 
  --spot-groups <[list]>           Filter by SPOT_GROUP (member): name[,...] 
  -W|--clip                        Remove adapter sequences from reads 

Common Filters                     Applied to spots when --split-spot is not
                                     set, otherwise - to individual reads
  -M|--minReadLen <len>            Filter by sequence length >= <len> 
  -R|--read-filter <[filter]>      Split into files by READ_FILTER value 
                                   optionally filter by value: 
                                   pass|reject|criteria|redacted 
  -E|--qual-filter                 Filter used in early 1000 Genomes data: no 
                                   sequences starting or ending with >= 10N 
  --qual-filter-1                  Filter used in current 1000 Genomes data 

Filters based on alignments        Filters are active when alignment
                                     data are present
  --aligned                        Dump only aligned sequences 
  --unaligned                      Dump only unaligned sequences 
  --aligned-region <name[:from-to]>  Filter by position on genome. Name can 
                                   either be accession.version (ex: 
                                   NC_000001.10) or file specific name (ex: 
                                   "chr1" or "1"). "from" and "to" are 1-based 
                                   coordinates 
  --matepair-distance <from-to|unknown>  Filter by distance between matepairs. 
                                   Use "unknown" to find matepairs split 
                                   between the references. Use from-to to limit 
                                   matepair distance on the same reference 

Filters for individual reads       Applied only with --split-spot set
  --skip-technical                 Dump only biological reads 

OUTPUT
  -O|--outdir <path>               Output directory, default is working 
                                   directory '.' ) 
  -Z|--stdout                      Output to stdout, all split data become 
                                   joined into single stream 
  --gzip                           Compress output using gzip: deprecated, not 
                                   recommended 
  --bzip2                          Compress output using bzip2: deprecated, 
                                   not recommended 

Multiple File Options              Setting these options will produce more
                                     than 1 file, each of which will be suffixed
                                     according to splitting criteria.
  --split-files                    Write reads into separate files. Read 
                                   number will be suffixed to the file name.  
                                   NOTE! The `--split-3` option is recommended. 
                                   In cases where not all spots have the same 
                                   number of reads, this option will produce 
                                   files that WILL CAUSE ERRORS in most programs 
                                   which process split pair fastq files. 
  --split-3                        3-way splitting for mate-pairs. For each 
                                   spot, if there are two biological reads 
                                   satisfying filter conditions, the first is 
                                   placed in the `*_1.fastq` file, and the 
                                   second is placed in the `*_2.fastq` file. If 
                                   there is only one biological read 
                                   satisfying the filter conditions, it is 
                                   placed in the `*.fastq` file.All other 
                                   reads in the spot are ignored. 
  -G|--spot-group                  Split into files by SPOT_GROUP (member name) 
  -R|--read-filter <[filter]>      Split into files by READ_FILTER value 
                                   optionally filter by value: 
                                   pass|reject|criteria|redacted 
  -T|--group-in-dirs               Split into subdirectories instead of files 
  -K|--keep-empty-files            Do not delete empty files 

FORMATTING

Sequence
  -C|--dumpcs <[cskey]>            Formats sequence using color space (default 
                                   for SOLiD),"cskey" may be specified for 
                                   translation 
  -B|--dumpbase                    Formats sequence using base space (default 
                                   for other than SOLiD). 

Quality
  -Q|--offset <integer>            Offset to use for quality conversion, 
                                   default is 33 
  --fasta <[line width]>           FASTA only, no qualities, optional line 
                                   wrap width (set to zero for no wrapping) 
  --suppress-qual-for-cskey        suppress quality-value for cskey 

Defline
  -F|--origfmt                     Defline contains only original sequence name 
  -I|--readids                     Append read id after spot id as 
                                   'accession.spot.readid' on defline 
  --helicos                        Helicos style defline 
  --defline-seq <fmt>              Defline format specification for sequence. 
  --defline-qual <fmt>             Defline format specification for quality. 
                                   <fmt> is string of characters and/or 
                                   variables. The variables can be one of: $ac 
                                   - accession, $si spot id, $sn spot 
                                   name, $sg spot group (barcode), $sl spot 
                                   length in bases, $ri read number, $rn 
                                   read name, $rl read length in bases. '[]' 
                                   could be used for an optional output: if 
                                   all vars in [] yield empty values whole 
                                   group is not printed. Empty value is empty 
                                   string or for numeric variables. Ex: 
                                   @$sn[_$rn]/$ri '_$rn' is omitted if name 
                                   is empty
 
OTHER:
  --ngc <path>                     <path> to ngc file 
  --disable-multithreading         disable multithreading 
  -h|--help                        Output brief explanation of program usage 
  -V|--version                     Display the version of the program 
  -L|--log-level <level>           Logging level as number or enum string One 
                                   of (fatal|sys|int|err|warn|info) or (0-5) 
                                   Current/default is warn 
  -v|--verbose                     Increase the verbosity level of the program 
                                   Use multiple times for more verbosity 
  --ncbi_error_report              Control program execution environment 
                                   report generation (if implemented). One of 
                                   (never|error|always). Default is error 
  --legacy-report                  use legacy style 'Written spots' for tool 

sratoolkit.3.0.10-centos_linux64/bin/fastq-dump : 3.0.10

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2266784.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

Spring Boot介绍、入门案例、环境准备、POM文件解读

文章目录 1.Spring Boot(脚手架)2.微服务3.环境准备3.1创建SpringBoot项目3.2导入SpringBoot相关依赖3.3编写一个主程序&#xff1b;启动Spring Boot应用3.4编写相关的Controller、Service3.5运行主程序测试3.6简化部署 4.Hello World探究4.1POM文件4.1.1父项目4.1.2父项目的父…

【开源框架】从零到一:AutoGen Studio开源框架-UI层环境安装与智能体操作全攻略

一、什么是AutoGen AutoGen是微软推出的一款工具&#xff0c;旨在帮助开发者轻松创建基于大语言模型的复杂应用程序。在传统上&#xff0c;开发者需要具备设计、实施和优化工作流程的专业知识&#xff0c;而AutoGen则通过自动化这些流程&#xff0c;简化了搭建和优化的过程。 …

【论文阅读】MedCLIP: Contrastive Learning from Unpaired Medical Images and Text

【论文阅读】MedCLIP: Contrastive Learning from Unpaired Medical Images and Text 1.论文背景与动机2.MedCLIP的贡献3.提出的方法4.构建语义相似矩阵的过程5. 实验6. 结论与局限性 论文地址&#xff1a; pdf github地址&#xff1a;项目地址 Zifeng Wang, Zhenbang Wu, Di…

雷电模拟器安装Lxposed

雷电模拟器最新版支持Lxposed。记录一下安装过程 首先到官网下载并安装最新版&#xff0c;我安装的时候最新版是9.1.34.0&#xff0c;64位 然后开启root和系统文件读写 然后下载magisk-delta-6并安装 ,这个是吾爱破解论坛提供的&#xff0c;号称适配安卓7以上所有机型&#x…

全解:Redis RDB持久化和AOF持久化

&#x1f9d1; 博主简介&#xff1a;CSDN博客专家&#xff0c;历代文学网&#xff08;PC端可以访问&#xff1a;https://literature.sinhy.com/#/literature?__c1000&#xff0c;移动端可微信小程序搜索“历代文学”&#xff09;总架构师&#xff0c;15年工作经验&#xff0c;…

VMware虚拟机安装银河麒麟操作系统KylinOS教程(超详细)

目录 引言1. 下载2. 安装 VMware2. 安装银河麒麟操作系统2.1 新建虚拟机2.2 安装操作系统2.3 网络配置 3. 安装VMTools 创作不易&#xff0c;禁止转载抄袭&#xff01;&#xff01;&#xff01;违者必究&#xff01;&#xff01;&#xff01; 创作不易&#xff0c;禁止转载抄袭…

HTML5实现喜庆的新年快乐网页源码

HTML5实现喜庆的新年快乐网页源码 前言一、设计来源1.1 主界面1.2 关于新年界面1.3 新年庆祝活动界面1.4 新年活动组织界面1.5 新年祝福订阅界面1.6 联系我们界面 二、效果和源码2.1 动态效果2.2 源代码 源码下载结束语 HTML5实现喜庆的新年快乐网页源码&#xff0c;春节新年网…

5G学习笔记之Non-Public Network

目录 0. NPN系列 1. 概述 2. SNPN 2.1 SNPN概述 2.2 SNPN架构 2.3 SNPN部署 2.3.1 完全独立 2.3.2 共享PLMN基站 2.3.3 共享PLMN基站和PLMN频谱 3. PNI-NPN 3.1 PNI-NPN概述 3.2 PNI-NPN部署 3.2.1 UPF独立 3.2.2 完全共享 0. NPN系列 1. NPN概述 2. NPN R18 3. 【SNPN系列】S…

大语言模型(LLM)中大数据的压缩存储及其重要性

在大型语言模型&#xff08;LLM&#xff09;中&#xff0c;KV Cache&#xff08;键值缓存&#xff09;的压缩方法及其重要性。 为什么要压缩KV Cache&#xff1f; 计算效率&#xff1a;在生成文本的过程中&#xff0c;每个生成的token都需要与之前所有的token的键值&#xff…

canvas之进度条

canvas之进度条 效果&#xff1a; 封装的组件 <template><div class"circle" :style"{ width: props.radius px, height: props.radius px }"><div class"circle-bg" :style"{ width: props.radius - 5 px, height: pr…

Boost之log日志使用

不讲理论&#xff0c;直接上在程序中可用代码&#xff1a; 一、引入Boost模块 开发环境&#xff1a;Visual Studio 2017 Boost库版本&#xff1a;1.68.0 安装方式&#xff1a;Nuget 安装命令&#xff1a; #只安装下面几个即可 Install-package boost -version 1.68.0 Install…

18.springcloud_openfeign之扩展组件二

文章目录 一、前言二、子容器默认组件FeignClientsConfigurationDecoder的注入Contract约定 对注解的支持对类上注解的支持对方法上注解的支持对参数上注解的支持MatrixVariablePathVariableRequestParamRequestHeaderSpringQueryMapRequestPartCookieValue FormattingConversi…

7-8 N皇后问题

目录 题目描述 输入格式: 输出格式: 输入样例: 输出样例: 解题思路&#xff1a; 详细代码&#xff08;dfs&#xff09;&#xff1a; 简单代码&#xff08;打表&#xff09;&#xff1a; 题目描述 在NN格的国际象棋盘上摆放N个皇后&#xff0c;使其不能互相攻击&#xff0c;即任…

现代网络负载均衡与代理导论

大家觉得有有参考意义和帮助记得及时关注和点赞&#xff01;&#xff01;&#xff01; Service mesh 是近两年网络、容器编排和微服务领域最火热的话题之一。Envoy 是目前 service mesh 数据平面的首选组件。Matt Klein 是 Envoy 的设计者和核心开发。 文章循序渐进&#xff0…

Kubernetes Gateway API-2-跨命名空间路由

1 跨命名空间路由 Gateway API 具有跨命名空间路由的核心支持。当多个用户或团队共享底层网络基础设施时,这很有用,但必须对控制和配置进行分段,以尽量减少访问和容错域。 Gateway 和 Route(HTTPRoute,TCPRoute,GRPCRoute) 可以部署到不同的命名空间中,路由可以跨命名空间…

Wend看源码-Java-集合学习(List)

摘要 本篇文章深入探讨了基于JDK 21版本的Java.util包中提供的多样化集合类型。在Java中集合共分类为三种数据结构&#xff1a;List、Set和Queue。本文将详细阐述这些数据类型的各自实现&#xff0c;并按照线程安全性进行分类&#xff0c;分别介绍非线程安全与线程安全的实现方…

集成方案 | Docusign + 蓝凌 EKP,打造一站式合同管理平台,实现无缝协作!

本文将详细介绍 Docusign 与蓝凌 EKP 的集成步骤及其效果&#xff0c;并通过实际应用场景来展示 Docusign 的强大集成能力&#xff0c;以证明 Docusign 集成功能的高效性和实用性。 在当今数字化办公环境中&#xff0c;企业对于提高工作效率和提升用户体验的需求日益迫切。蓝凌…

突围边缘:OpenAI开源实时嵌入式API,AI触角延伸至微观世界

当OpenAI宣布开源其名为openai-realtime-embedded-sdk的实时嵌入式API时&#xff0c;整个科技界都为之震惊。这一举动意味着&#xff0c;曾经遥不可及的强大AI能力&#xff0c;如今可以被嵌入到像ESP32这样的微型控制器中&#xff0c;真正地将AI的触角延伸到了物联网和边缘计算…

webrtc-internals调试工具

Google 的 Chrome&#xff08;87 或更高版本&#xff09;WebRTC 内部工具是一套内置于 Chrome 浏览器中的调试工具; webrtc-internals 能够查看有关视频和音频轨道、使用的编解码器以及流的一般质量的详细信息。这些知识对于解决音频和视频质量差的问题非常有帮助。 webrtc-int…

使用Webpack构建微前端应用

英文社区对 Webpack Module Federation 的响应非常热烈&#xff0c;甚至被誉为“A game-changer in JavaScript architecture”&#xff0c;相对而言国内对此热度并不高&#xff0c;这一方面是因为 MF 强依赖于 Webpack5&#xff0c;升级成本有点高&#xff1b;另一方面是国内已…