Basic Usage of SRA Toolkit (prefetch and fastq-dump)

I downloaded the Linux build of version 3.0.10.



Download a single run by its accession:

prefetch SRR547531


Download several runs in one go. list.txt contains one accession per line:

prefetch --option-file list.txt --output-directory /your/download/path
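Because --option-file simply reads extra parameters from list.txt, anything odd in that file (Windows line endings, blank lines, stray spaces, duplicates) can end up in what prefetch receives. The following is a minimal sketch, assuming bash with standard tr, sed and awk; the function name normalize_acc_list and the input file raw_list.txt are hypothetical and not part of SRA Toolkit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: clean an accession list before handing it to
# `prefetch --option-file`.
#  - tr removes Windows carriage returns (\r)
#  - sed trims leading and trailing whitespace on each line
#  - awk drops blank lines and repeated accessions, keeping first-seen order
normalize_acc_list() {
    local infile="$1" outfile="$2"
    tr -d '\r' < "$infile" \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | awk 'NF && !seen[$0]++' > "$outfile"
}
```

Usage, with the same placeholder paths as above: run normalize_acc_list raw_list.txt list.txt, then prefetch --option-file list.txt --output-directory /your/download/path.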






Convert a run to gzip-compressed FASTQ:

fastq-dump SRR547531 --gzip --split-3

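--split-3: use this by default when you don't know whether a run is single-end or paired-end. Mate pairs are written to *_1.fastq and *_2.fastq, and reads without a mate go to *.fastq.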
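Since --split-3 encodes the layout in the output file names (as the fastq-dump help text further down describes), you can check afterwards which kind of run you had. This is a rough sketch that assumes the default naming <accession>_1.fastq, <accession>_2.fastq and <accession>.fastq, each optionally ending in .gz when --gzip is used; split3_layout and have_fastq are hypothetical helper names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# True if "$1" exists either as-is or with a .gz suffix (--gzip output).
have_fastq() {
    [ -e "$1" ] || [ -e "$1.gz" ]
}

# Print the layout fastq-dump --split-3 produced for one accession:
#   <acc>_1.fastq and <acc>_2.fastq -> mate pairs
#   <acc>.fastq                     -> reads without a mate
# A run can yield both kinds of files (pairs plus orphan reads).
split3_layout() {
    local dir="$1" acc="$2" paired=0 single=0
    if have_fastq "$dir/${acc}_1.fastq" && have_fastq "$dir/${acc}_2.fastq"; then
        paired=1
    fi
    if have_fastq "$dir/${acc}.fastq"; then
        single=1
    fi
    case "$paired$single" in
        11) echo "paired+unpaired" ;;
        10) echo "paired" ;;
        01) echo "single" ;;
        *)  echo "none" ;;
    esac
}
```

For example, split3_layout /your/path SRR547531 prints one of paired, single, paired+unpaired or none.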




To convert every run listed in list.txt:

for id in $(cat list.txt); do
    fastq-dump "/your/download/path/${id}" --gzip --split-3 -O /your/path
done
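A for loop over $(cat list.txt) splits on any whitespace and carries a trailing \r from a Windows-edited list into the path. A slightly more defensive sketch reads the list line by line. SRA_DIR, OUT_DIR and DRY_RUN are hypothetical variables introduced here (DRY_RUN=1 only prints the commands; it is not a fastq-dump option), and the path layout is kept exactly as in the loop shown in this post.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical settings; replace the placeholder paths with your own.
SRA_DIR="${SRA_DIR:-/your/download/path}"
OUT_DIR="${OUT_DIR:-/your/path}"
DRY_RUN="${DRY_RUN:-0}"   # 1 = print commands instead of running them

dump_all() {
    local list="$1" id
    # Read one accession per line; also handle a last line without newline.
    while IFS= read -r id || [ -n "$id" ]; do
        id="${id%$'\r'}"                  # strip a Windows carriage return
        if [ -z "$id" ]; then continue; fi  # skip blank lines
        local cmd=(fastq-dump "$SRA_DIR/$id" --gzip --split-3 -O "$OUT_DIR")
        if [ "$DRY_RUN" = "1" ]; then
            echo "${cmd[*]}"
        else
            "${cmd[@]}"
        fi
    done < "$list"
}
```

After sourcing the script, DRY_RUN=1 dump_all list.txt prints the commands so you can check them; drop DRY_RUN=1 to actually run them.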


1. prefetch -h

  prefetch [options] <SRA accession> [...]
  Download SRA files and their dependencies

  prefetch [options] --cart <kart file>
  Download cart file

  prefetch [options] <URL> --output-file <FILE>
  Download URL to FILE

  prefetch [options] <URL> [...] --output-directory <DIRECTORY>
  Download URL or URL-s to DIRECTORY

  prefetch [options] <SRA file> [...]
  Check SRA file for missed dependencies and download them

  -T|--type <value>                Specify file type to download. Default: sra 
  -t|--transport <http|fasp|both>  Transport: one of: fasp; http; both 
                                   [default]. (fasp only; http only; first try 
                                   fasp (ascp), use http if cannot download 
                                   using fasp). 
  --location <value>               Location of data. 

  -N|--min-size <size>             Minimum file size to download in KB 
  -X|--max-size <size>             Maximum file size to download in KB 
                                   (exclusive). Default: 20G 
  -f|--force <yes|no|all|ALL>      Force object download: one of: no, yes, 
                                   all, ALL. no [default]: skip download if the 
                                   object if found and complete; yes: download 
                                   it even if it is found and is complete; all: 
                                   ignore lock files (stale locks or it is 
                                   being downloaded by another process use 
                                   at your own risk!); ALL: ignore lock files, 
                                   restart download from beginning. 
  -r|--resume <yes|no>             Resume partial downloads: one of: no, yes 
  -C|--verify <yes|no>             Verify after download: one of: no, yes 
  -p|--progress                    Show progress. 
  -H|--heartbeat <value>           Time period in minutes to display download 
                                   progress. (0: no progress), default: 1 

  --eliminate-quals                Don't download QUALITY column. 
  -c|--check-all                   Double-check all refseqs. 
  -S|--check-rs <yes|no|smart>     Check for refseqs in downloaded files: one 
                                   of: no, yes, smart [default]. Smart: skip 
                                   check for large encrypted non-sra files. 
  -o|--order <kart|size>           Kart prefetch order when downloading 
                                   kart: one of: kart, size. (in kart order, by 
                                   file size: smallest first), default: size. 
  -R|--rows <rows>                 Kart rows to download (default all). Row 
                                   list should be ordered. 
  --perm <PATH>                    PATH to jwt cart file. 
  --ngc <PATH>                     PATH to ngc file. 
  --cart <PATH>                    To read kart file. 

  -a|--ascp-path <ascp-binary|private-key-file>  Path to ascp program and 
                                   private key file (asperaweb_id_dsa.putty) 
  --ascp-options <value>           Arbitrary options to pass to ascp command 

  -o|--output-file <FILE>          Write file to FILE when downloading 
                                   single file. 
  -O|--output-directory <DIRECTORY>  Save files to DIRECTORY/ 

  -h|--help                        Output brief explanation for the program. 
  -V|--version                     Display the version of the program then 
                                   quit. 
  -L|--log-level <level>           Logging level as number or enum string. One 
                                   of (fatal|sys|int|err|warn|info|debug) or 
                                   (0-6) Current/default is warn. 
  -v|--verbose                     Increase the verbosity of the program 
                                   status messages. Use multiple times for more 
                                   verbosity. Negates quiet. 
  -q|--quiet                       Turn off all status messages for the 
                                   program. Negated by verbose. 
  --option-file <file>             Read more options and parameters from the 
                                   file. 

sratoolkit.3.0.10-centos_linux64/bin/prefetch : 3.0.10
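Per the help above, prefetch will not download files larger than -X|--max-size (default 20G), and -r|--resume and -C|--verify help on unreliable connections. Below is a sketch combining these options, assuming your prefetch build accepts a G suffix for --max-size (the "Default: 20G" in the help text suggests so, but check your version). prefetch_large and DRY_RUN are hypothetical names, and 100G is an arbitrary example limit.

```shell
#!/usr/bin/env bash
set -euo pipefail

DRY_RUN="${DRY_RUN:-0}"   # 1 = print the command instead of running it

# Hypothetical wrapper: download one large run with a raised size limit,
# resuming partial downloads and verifying the result afterwards.
prefetch_large() {
    local acc="$1" outdir="$2" maxsize="${3:-100G}"
    local cmd=(prefetch --max-size "$maxsize" --resume yes --verify yes
               --progress --output-directory "$outdir" "$acc")
    if [ "$DRY_RUN" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, prefetch_large SRR547531 /your/download/path 50G raises the limit to 50G for that run only.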

2. fastq-dump -h

  sratoolkit.3.0.10-centos_linux64/bin/fastq-dump [options] <path> [<path>...]
  sratoolkit.3.0.10-centos_linux64/bin/fastq-dump [options] <accession>

  -A|--accession <accession>       Replaces accession derived from <path> in 
                                   filename(s) and deflines (only for single 
                                   table dump) 
  --table <table-name>             Table name within cSRA object, default is 
                                   "SEQUENCE" 


Read Splitting                     Sequence data may be used in raw form or
                                     split into individual reads
  --split-spot                     Split spots into individual reads 

Full Spot Filters                  Applied to the full spot independently
                                     of --split-spot
  -N|--minSpotId <rowid>           Minimum spot id 
  -X|--maxSpotId <rowid>           Maximum spot id 
  --spot-groups <[list]>           Filter by SPOT_GROUP (member): name[,...] 
  -W|--clip                        Remove adapter sequences from reads 

Common Filters                     Applied to spots when --split-spot is not
                                     set, otherwise - to individual reads
  -M|--minReadLen <len>            Filter by sequence length >= <len> 
  -R|--read-filter <[filter]>      Split into files by READ_FILTER value 
                                   optionally filter by value: 
                                   pass|reject|criteria|redacted 
  -E|--qual-filter                 Filter used in early 1000 Genomes data: no 
                                   sequences starting or ending with >= 10N 
  --qual-filter-1                  Filter used in current 1000 Genomes data 

Filters based on alignments        Filters are active when alignment
                                     data are present
  --aligned                        Dump only aligned sequences 
  --unaligned                      Dump only unaligned sequences 
  --aligned-region <name[:from-to]>  Filter by position on genome. Name can 
                                   either be accession.version (ex: 
                                   NC_000001.10) or file specific name (ex: 
                                   "chr1" or "1"). "from" and "to" are 1-based 
                                   coordinates 
  --matepair-distance <from-to|unknown>  Filter by distance between matepairs. 
                                   Use "unknown" to find matepairs split 
                                   between the references. Use from-to to limit 
                                   matepair distance on the same reference 

Filters for individual reads       Applied only with --split-spot set
  --skip-technical                 Dump only biological reads 

  -O|--outdir <path>               Output directory, default is working 
                                   directory '.' ) 
  -Z|--stdout                      Output to stdout, all split data become 
                                   joined into single stream 
  --gzip                           Compress output using gzip: deprecated, not 
                                   recommended 
  --bzip2                          Compress output using bzip2: deprecated, 
                                   not recommended 

Multiple File Options              Setting these options will produce more
                                     than 1 file, each of which will be suffixed
                                     according to splitting criteria.
  --split-files                    Write reads into separate files. Read 
                                   number will be suffixed to the file name.  
                                   NOTE! The `--split-3` option is recommended. 
                                   In cases where not all spots have the same 
                                   number of reads, this option will produce 
                                   files that WILL CAUSE ERRORS in most programs 
                                   which process split pair fastq files. 
  --split-3                        3-way splitting for mate-pairs. For each 
                                   spot, if there are two biological reads 
                                   satisfying filter conditions, the first is 
                                   placed in the `*_1.fastq` file, and the 
                                   second is placed in the `*_2.fastq` file. If 
                                   there is only one biological read 
                                   satisfying the filter conditions, it is 
                                   placed in the `*.fastq` file.All other 
                                   reads in the spot are ignored. 
  -G|--spot-group                  Split into files by SPOT_GROUP (member name) 
  -R|--read-filter <[filter]>      Split into files by READ_FILTER value 
                                   optionally filter by value: 
                                   pass|reject|criteria|redacted 
  -T|--group-in-dirs               Split into subdirectories instead of files 
  -K|--keep-empty-files            Do not delete empty files 


  -C|--dumpcs <[cskey]>            Formats sequence using color space (default 
                                   for SOLiD),"cskey" may be specified for 
                                   translation 
  -B|--dumpbase                    Formats sequence using base space (default 
                                   for other than SOLiD). 

  -Q|--offset <integer>            Offset to use for quality conversion, 
                                   default is 33 
  --fasta <[line width]>           FASTA only, no qualities, optional line 
                                   wrap width (set to zero for no wrapping) 
  --suppress-qual-for-cskey        suppress quality-value for cskey 

  -F|--origfmt                     Defline contains only original sequence name 
  -I|--readids                     Append read id after spot id as 
                                   'accession.spot.readid' on defline 
  --helicos                        Helicos style defline 
  --defline-seq <fmt>              Defline format specification for sequence. 
  --defline-qual <fmt>             Defline format specification for quality. 
                                   <fmt> is string of characters and/or 
                                   variables. The variables can be one of: $ac 
                                   - accession, $si spot id, $sn spot 
                                   name, $sg spot group (barcode), $sl spot 
                                   length in bases, $ri read number, $rn 
                                   read name, $rl read length in bases. '[]' 
                                   could be used for an optional output: if 
                                   all vars in [] yield empty values whole 
                                   group is not printed. Empty value is empty 
                                   string or 0 for numeric variables. Ex: 
                                   @$sn[_$rn]/$ri '_$rn' is omitted if name 
                                   is empty
  --ngc <path>                     <path> to ngc file 
  --disable-multithreading         disable multithreading 
  -h|--help                        Output brief explanation of program usage 
  -V|--version                     Display the version of the program 
  -L|--log-level <level>           Logging level as number or enum string One 
                                   of (fatal|sys|int|err|warn|info) or (0-5) 
                                   Current/default is warn 
  -v|--verbose                     Increase the verbosity level of the program 
                                   Use multiple times for more verbosity 
  --ncbi_error_report              Control program execution environment 
                                   report generation (if implemented). One of 
                                   (never|error|always). Default is error 
  --legacy-report                  use legacy style 'Written spots' for tool 

sratoolkit.3.0.10-centos_linux64/bin/fastq-dump : 3.0.10
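The NOTE under --split-files warns that mate files with unequal read counts break most paired-end tools. With --split-3 the _1 and _2 files should stay in step, and a quick count can confirm it. This sketch assumes 4-line FASTQ records, which is how fastq-dump writes them, and uses only gzip and wc; fastq_records and check_pairs are hypothetical helpers.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count FASTQ records (4 lines each) in a plain or gzip-compressed file.
fastq_records() {
    local f="$1" lines
    case "$f" in
        *.gz) lines=$(gzip -dc -- "$f" | wc -l) ;;
        *)    lines=$(wc -l < "$f") ;;
    esac
    echo $(( lines / 4 ))
}

# Return 0 and print the pair count if both mate files hold the same
# number of records; otherwise report the mismatch on stderr and return 1.
check_pairs() {
    local r1="$1" r2="$2" n1 n2
    n1=$(fastq_records "$r1")
    n2=$(fastq_records "$r2")
    if [ "$n1" -ne "$n2" ]; then
        echo "mismatch: $r1=$n1 $r2=$n2" >&2
        return 1
    fi
    echo "ok: $n1 pairs"
}
```

For example: check_pairs /your/path/SRR547531_1.fastq.gz /your/path/SRR547531_2.fastq.gz.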




