gt-shredder - Shred sequence file(s) into consecutive pieces of random length.


gt shredder [option …] [sequence_file …]


-coverage [value]

set the number of times the sequence_file is shredded (default: 1)

-minlength [value]

set the minimum length of the shredded fragments (default: 300)

-maxlength [value]

set the maximum length of the shredded fragments (default: 700)

-overlap [value]

set the overlap between consecutive pieces (default: 0)

-sample [value]

take samples of the generated sequence pieces with the given probability (default: 1.000000)

-clipdesc [yes|no]

clip descriptions after the first space (fooled by \t, \n, etc.); adds offset and length to ensure a unique identifier (default: no)

-width [value]

set output width for FASTA sequence printing (0 disables formatting) (default: 0)

-o [filename]

redirect output to specified file (default: undefined)

-gzip [yes|no]

write gzip compressed output file (default: no)

-bzip2 [yes|no]

write bzip2 compressed output file (default: no)

-force [yes|no]

force writing to output file (default: no)


-help

display help and exit

-version

display version information and exit

Each sequence given in sequence_file is shredded into consecutive pieces of random length (between -minlength and -maxlength) until it is consumed. As a result, the last fragment of a given sequence can be shorter than the argument to option -minlength. To get rid of such fragments, use gt seqfilter (see example below).
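
The procedure described above can be sketched in a few lines of Python. This is an illustrative model only, not the gt implementation: the function name shred is hypothetical, lengths are assumed to be drawn uniformly, and -overlap is assumed to mean that each piece starts that many bases before the end of the previous one.

```python
import random


def shred(seq, minlength=300, maxlength=700, overlap=0, rng=None):
    """Split seq into consecutive pieces of random length.

    Returns a list of (offset, fragment) tuples. Hypothetical helper,
    not part of GenomeTools.
    """
    if not 0 < minlength <= maxlength:
        raise ValueError("need 0 < minlength <= maxlength")
    if not 0 <= overlap < minlength:
        raise ValueError("overlap must be non-negative and below minlength")
    rng = rng or random.Random()
    pieces = []
    start = 0
    while start < len(seq):
        # Assumption: piece lengths are uniform in [minlength, maxlength].
        length = rng.randint(minlength, maxlength)
        # Slicing truncates at the sequence end, so the last piece
        # may be shorter than minlength.
        pieces.append((start, seq[start:start + length]))
        if start + length >= len(seq):
            break  # sequence is consumed
        # Assumption: the next piece begins `overlap` bases before
        # the end of the current one.
        start += length - overlap
    return pieces
```

With overlap 0, concatenating the returned fragments reproduces the input sequence, and only the final fragment can fall below minlength.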


Shred a given BAC:

$ gt shredder U89959_genomic.fas > fragments.fas

Shred an EST collection into pieces between 50 and 100 bp and get rid of all (terminal) fragments shorter than 50 bp:

$ gt shredder -minlength 50 -maxlength 100 U89959_ests.fas \
  | gt seqfilter -minlength 50 - > fragments.fas
# 130 out of 1260 sequences have been removed (10.317%)
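
The filtering step and the percentage in the report line can be modelled as follows. This is a rough sketch of a minimum-length filter, not the gt seqfilter code; the function name filter_min_length is hypothetical, and fragments are plain strings here rather than FASTA records.

```python
def filter_min_length(fragments, minlength):
    """Drop fragments shorter than minlength.

    Returns (kept, removed_count, removed_percent). Hypothetical helper,
    not part of GenomeTools.
    """
    kept = [f for f in fragments if len(f) >= minlength]
    removed = len(fragments) - len(kept)
    # Guard against an empty input to avoid dividing by zero.
    percent = 100.0 * removed / len(fragments) if fragments else 0.0
    return kept, removed, percent
```

For the run shown above, 130 removed out of 1260 gives 100 * 130 / 1260, which rounds to the reported 10.317%.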

Shred an EST collection and show only a random 10% of the resulting fragments:

$ gt shredder -sample 0.1 U89959_ests.fas
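
One plausible reading of -sample is that every generated piece is kept independently with the given probability, so the output size is only approximately 10% of the total. The sketch below models that reading; the function name sample_fragments is hypothetical and the independence assumption is not confirmed by the option description.

```python
import random


def sample_fragments(fragments, probability, rng=None):
    """Keep each fragment with the given probability.

    Hypothetical helper, not part of GenomeTools.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    rng = rng or random.Random()
    # Assumption: fragments are selected independently of each other.
    # rng.random() is in [0, 1), so probability 1.0 keeps everything.
    return [f for f in fragments if rng.random() < probability]
```

Under this model, the default probability of 1.0 leaves the output unchanged.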