NAME

gt-genomediff - Calculates Kr: pairwise distances between genomes.

SYNOPSIS

gt genomediff [option …] (INDEX | -indexname NAME SEQFILE SEQFILE […])

DESCRIPTION

-indextype [esa|pck|encseq]

specify type of index, one of: esa|pck|encseq. With encseq, the index is an encoded sequence and an enhanced suffix array is constructed in memory only. (default: encseq)

-indexname [string]

Basename of encseq to construct. (default: undefined)

-unitfile [filename]

specifies genomic units; see below for a description of the file format. (default: undefined)

-mirrored [yes|no]

virtually append the reverse complement of each sequence (default: no)

-pl [value]

specify prefix length for bucket sort. Recommendation: use without argument; a reasonable prefix length is then determined automatically. (default: 0)

-dc [value]

specify difference cover value (default: 0)

-memlimit [string]

specify maximal amount of memory to be used during index construction (in bytes, the keywords MB and GB are allowed) (default: undefined)

-scan [yes|no]

do not load esa index but scan it sequentially. (default: yes)

-thr [value]

Threshold for difference (du, dl) in divergence calculation. (default: 1e-9)

-abs_err [value]

absolute error for expected shulen calculation. (default: 1e-5)

-rel_err [value]

relative error for expected shulen calculation. (default: 1e-3)

-M [value]

threshold for minimum logarithm. (default: DBL_MIN)

-v [yes|no]

be verbose (default: no)

-help

display help for basic options and exit

-help+

display help for all options and exit

-version

display version information and exit

The genomediff tool only accepts DNA input.

When used with sequence files or encseq, an enhanced suffix array (ESA) will be built in memory. The ESA is not built in full at once; instead, construction uses -memlimit as a threshold and builds the ESA part by part, calculating the Shu-length for each part.
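
As an illustration of the memory threshold described above, the following sketch assembles, but does not run, a call that builds the index from sequence files with a cap on construction memory. The index name myindex and the FASTA file names are placeholders, and the spelling 500MB (a number directly followed by the keyword) is an assumption about how the MB keyword is written.

```shell
# Assemble (without running) a genomediff call that limits index-construction memory.
# "myindex" and the FASTA names are placeholders, not files shipped with the tool.
memlimit="500MB"   # assumed spelling: a number directly followed by MB or GB
cmd="gt genomediff -indexname myindex -memlimit $memlimit genome1.fa genome2.fa"

# Print the command so it can be inspected before being run by hand.
echo "$cmd"
```

A smaller limit should lead to more, smaller ESA parts; the result is expected to be the same, only the construction is split differently.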

File format for option -unitfile (in Lua syntax):

units = {
 genome1 = { "path/file1.fa", "file2.fa" },
 genome2 = { "file3.fa", "path/file4.fa" }
}

Give the paths to the files exactly as they were given to the encseq tool. You can use

$ gt encseq info INDEXNAME

to get a list of files in an encoded sequence.

Comment lines in Lua start with -- and will be ignored.

See GTDIR/testdata/genomediff/unitfile1.lua for an example.
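
The unit file format described above can be tried out with a small sketch like the following, which writes a unit file from the shell and prints the matching call without running it. The genome names, the FASTA paths and the index name myindex are hypothetical; in real use the paths must match those stored in the encoded sequence.

```shell
# Write a hypothetical unit file; genome names and FASTA paths are placeholders.
unitfile="units.lua"
cat > "$unitfile" <<'EOF'
-- Lua comment lines like this one are ignored.
units = {
  genomeA = { "data/a1.fa", "data/a2.fa" },
  genomeB = { "data/b1.fa" }
}
EOF

# Print the command that would use the unit file (not executed here).
# "myindex" stands for an existing index that contains the files listed above.
echo "gt genomediff -unitfile $unitfile myindex"
```

Each key in the units table names one genomic unit, and the files listed under it are grouped together for the distance calculation.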

The options -pl, -dc and -memlimit influence ESA construction.

REPORTING BUGS

Report bugs to <willrodt@zbh.uni-hamburg.de>.