工具下载网址:
01. 下载 SRA Toolkit ·ncbi/sra-tools 维基https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit
我下载的是linux 3.0.10版,目前最新版如下:https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/3.1.1/sratoolkit.3.1.1-centos_linux64.tar.gz
prefetch使用
1.prefetch下载单个数据
prefetch SRR547531
2.prefetch批量下载数据
prefetch --option-file list.txt --output-directory /your/download/path
list.txt内容如下:
ERR274310
SRR547531
SRR548277
SRR847503
SRR847504
运行结果如下:
fastq-dump使用
1.fastq-dump转换sra为fastq格式
fastq-dump SRR547531 --gzip --split-3
--split-3 :不知道sra是单端还是双端,默认使用
我发现下载的时候是一个文件夹,文件夹里有一个sra数据,转换的时候,直接输入SRR547531即可完成转换,而不用输入SRR547531/SRR547531.sra
结果如下:
2.fastq-dump批量转换
for id in `cat list.txt`
do
fastq-dump /your/download/path/${id} --gzip --split-3 -O /your/path
done;
附加信息-完整参数
1.prefetch -h
Usage:
prefetch [options] <SRA accession> [...]
Download SRA files and their dependencies
prefetch [options] --cart <kart file>
Download cart file
prefetch [options] <URL> --output-file <FILE>
Download URL to FILE
prefetch [options] <URL> [...] --output-directory <DIRECTORY>
Download URL or URL-s to DIRECTORY
prefetch [options] <SRA file> [...]
Check SRA file for missed dependencies and download them
Options:
-T|--type <value> Specify file type to download. Default: sra
-t|--transport <http|fasp|both> Transport: one of: fasp; http; both
[default]. (fasp only; http only; first try
fasp (ascp), use http if cannot download
using fasp).
--location <value> Location of data.
-N|--min-size <size> Minimum file size to download in KB
(inclusive).
-X|--max-size <size> Maximum file size to download in KB
(exclusive). Default: 20G
-f|--force <yes|no|all|ALL> Force object download: one of: no, yes,
all, ALL. no [default]: skip download if the
object if found and complete; yes: download
it even if it is found and is complete; all:
ignore lock files (stale locks or it is
being downloaded by another process use
at your own risk!); ALL: ignore lock files,
restart download from beginning.
-r|--resume <yes|no> Resume partial downloads: one of: no, yes
[default].
-C|--verify <yes|no> Verify after download: one of: no, yes
[default].
-p|--progress Show progress.
-H|--heartbeat <value> Time period in minutes to display download
progress. (0: no progress), default: 1
--eliminate-quals Don't download QUALITY column.
-c|--check-all Double-check all refseqs.
-S|--check-rs <yes|no|smart> Check for refseqs in downloaded files: one
of: no, yes, smart [default]. Smart: skip
check for large encrypted non-sra files.
-o|--order <kart|size> Kart prefetch order when downloading
kart: one of: kart, size. (in kart order, by
file size: smallest first), default: size.
-R|--rows <rows> Kart rows to download (default all). Row
list should be ordered.
--perm <PATH> PATH to jwt cart file.
--ngc <PATH> PATH to ngc file.
--cart <PATH> To read kart file.
-a|--ascp-path <ascp-binary|private-key-file> Path to ascp program and
private key file (asperaweb_id_dsa.putty)
--ascp-options <value> Arbitrary options to pass to ascp command
line.
-o|--output-file <FILE> Write file to FILE when downloading
single file.
-O|--output-directory <DIRECTORY> Save files to DIRECTORY/
-h|--help Output brief explanation for the program.
-V|--version Display the version of the program then
quit.
-L|--log-level <level> Logging level as number or enum string. One
of (fatal|sys|int|err|warn|info|debug) or
(0-6) Current/default is warn.
-v|--verbose Increase the verbosity of the program
status messages. Use multiple times for more
verbosity. Negates quiet.
-q|--quiet Turn off all status messages for the
program. Negated by verbose.
--option-file <file> Read more options and parameters from the
file.
sratoolkit.3.0.10-centos_linux64/bin/prefetch : 3.0.10
2.fastq-dump -h
Usage:
sratoolkit.3.0.10-centos_linux64/bin/fastq-dump [options] <path> [<path>...]
sratoolkit.3.0.10-centos_linux64/bin/fastq-dump [options] <accession>
INPUT
-A|--accession <accession> Replaces accession derived from <path> in
filename(s) and deflines (only for single
table dump)
--table <table-name> Table name within cSRA object, default is
"SEQUENCE"
PROCESSING
Read Splitting Sequence data may be used in raw form or
split into individual reads
--split-spot Split spots into individual reads
Full Spot Filters Applied to the full spot independently
of --split-spot
-N|--minSpotId <rowid> Minimum spot id
-X|--maxSpotId <rowid> Maximum spot id
--spot-groups <[list]> Filter by SPOT_GROUP (member): name[,...]
-W|--clip Remove adapter sequences from reads
Common Filters Applied to spots when --split-spot is not
set, otherwise - to individual reads
-M|--minReadLen <len> Filter by sequence length >= <len>
-R|--read-filter <[filter]> Split into files by READ_FILTER value
optionally filter by value:
pass|reject|criteria|redacted
-E|--qual-filter Filter used in early 1000 Genomes data: no
sequences starting or ending with >= 10N
--qual-filter-1 Filter used in current 1000 Genomes data
Filters based on alignments Filters are active when alignment
data are present
--aligned Dump only aligned sequences
--unaligned Dump only unaligned sequences
--aligned-region <name[:from-to]> Filter by position on genome. Name can
either be accession.version (ex:
NC_000001.10) or file specific name (ex:
"chr1" or "1"). "from" and "to" are 1-based
coordinates
--matepair-distance <from-to|unknown> Filter by distance between matepairs.
Use "unknown" to find matepairs split
between the references. Use from-to to limit
matepair distance on the same reference
Filters for individual reads Applied only with --split-spot set
--skip-technical Dump only biological reads
OUTPUT
-O|--outdir <path> Output directory, default is working
directory '.' )
-Z|--stdout Output to stdout, all split data become
joined into single stream
--gzip Compress output using gzip: deprecated, not
recommended
--bzip2 Compress output using bzip2: deprecated,
not recommended
Multiple File Options Setting these options will produce more
than 1 file, each of which will be suffixed
according to splitting criteria.
--split-files Write reads into separate files. Read
number will be suffixed to the file name.
NOTE! The `--split-3` option is recommended.
In cases where not all spots have the same
number of reads, this option will produce
files that WILL CAUSE ERRORS in most programs
which process split pair fastq files.
--split-3 3-way splitting for mate-pairs. For each
spot, if there are two biological reads
satisfying filter conditions, the first is
placed in the `*_1.fastq` file, and the
second is placed in the `*_2.fastq` file. If
there is only one biological read
satisfying the filter conditions, it is
placed in the `*.fastq` file.All other
reads in the spot are ignored.
-G|--spot-group Split into files by SPOT_GROUP (member name)
-R|--read-filter <[filter]> Split into files by READ_FILTER value
optionally filter by value:
pass|reject|criteria|redacted
-T|--group-in-dirs Split into subdirectories instead of files
-K|--keep-empty-files Do not delete empty files
FORMATTING
Sequence
-C|--dumpcs <[cskey]> Formats sequence using color space (default
for SOLiD),"cskey" may be specified for
translation
-B|--dumpbase Formats sequence using base space (default
for other than SOLiD).
Quality
-Q|--offset <integer> Offset to use for quality conversion,
default is 33
--fasta <[line width]> FASTA only, no qualities, optional line
wrap width (set to zero for no wrapping)
--suppress-qual-for-cskey suppress quality-value for cskey
Defline
-F|--origfmt Defline contains only original sequence name
-I|--readids Append read id after spot id as
'accession.spot.readid' on defline
--helicos Helicos style defline
--defline-seq <fmt> Defline format specification for sequence.
--defline-qual <fmt> Defline format specification for quality.
<fmt> is string of characters and/or
variables. The variables can be one of: $ac
- accession, $si spot id, $sn spot
name, $sg spot group (barcode), $sl spot
length in bases, $ri read number, $rn
read name, $rl read length in bases. '[]'
could be used for an optional output: if
all vars in [] yield empty values whole
group is not printed. Empty value is empty
string or for numeric variables. Ex:
@$sn[_$rn]/$ri '_$rn' is omitted if name
is empty
OTHER:
--ngc <path> <path> to ngc file
--disable-multithreading disable multithreading
-h|--help Output brief explanation of program usage
-V|--version Display the version of the program
-L|--log-level <level> Logging level as number or enum string One
of (fatal|sys|int|err|warn|info) or (0-5)
Current/default is warn
-v|--verbose Increase the verbosity level of the program
Use multiple times for more verbosity
--ncbi_error_report Control program execution environment
report generation (if implemented). One of
(never|error|always). Default is error
--legacy-report use legacy style 'Written spots' for tool
sratoolkit.3.0.10-centos_linux64/bin/fastq-dump : 3.0.10