NAME

gt-encseq-encode - Encode sequence files (FASTA/FASTQ, GenBank, EMBL) efficiently.

SYNOPSIS

gt encseq encode sequence_file [sequence_file [sequence_file …]]

DESCRIPTION

-showstats [yes|no]

show compression results (default: no)

-ssp [yes|no]

output sequence separator positions to file (default: yes)

-des [yes|no]

output sequence descriptions to file (default: yes)

-sds [yes|no]

output sequence description separator positions to file (default: yes)

-md5 [yes|no]

output MD5 sums to file (default: yes)

-clipdesc [yes|no]

clip descriptions after first whitespace (default: no)

-sat [string]

specify kind of sequence representation by one of the keywords direct, bytecompress, eqlen, bit, uchar, ushort, uint32 (default: undefined)

-dna [yes|no]

input is DNA sequence (default: no)

-protein [yes|no]

input is protein sequence (default: no)

-plain [yes|no]

process as plain text (default: no)

-dust [yes|no]

mask low-complexity regions using the dust algorithm (default: no)

-dustwindow [value]

window size for the dust algorithm (default: 64)

-dustthreshold [value]

threshold for the dust algorithm (default: 2.000000)

-dustlink [value]

maximum distance between regions masked by dust for them to be merged (default: 1)

-indexname [string]

specify name for index to be generated (default: undefined)

-smap [string]

specify file containing a symbol mapping (default: undefined)

-lossless [yes|no]

allow lossless original sequence retrieval (default: no)

-v [yes|no]

be verbose (default: no)

-help

display help for basic options and exit

-help+

display help for all options and exit

-version

display version information and exit
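
EXAMPLES

The following is a minimal sketch of how the options above might be combined, not an official recipe. It assembles an invocation that declares the input as DNA, enables dust masking with the documented default window size and link distance, and names the resulting index. The file name reads.fasta, the index name reads_idx, and the helper build_encode_cmd are hypothetical and exist only for illustration. The script prints the command and runs it only if gt is installed and the input file is present.

```shell
#!/bin/sh
# Sketch only: the index name, file name, and helper function are hypothetical.

# Print an encode command for a DNA input with dust masking enabled.
# $1 = index name (passed to -indexname), $2 = sequence file.
build_encode_cmd() {
  printf 'gt encseq encode -dna yes -dust yes -dustwindow 64 -dustlink 1 -indexname %s %s\n' "$1" "$2"
}

cmd=$(build_encode_cmd reads_idx reads.fasta)
echo "$cmd"

# Run the command only when gt is installed and the input file exists.
if command -v gt >/dev/null 2>&1 && [ -f reads.fasta ]; then
  $cmd
fi
```

Building the command as a string first makes it easy to inspect or log the exact options before any index files are written. Adding -v yes or -showstats yes to the printed command should give more feedback during encoding.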

REPORTING BUGS