                  -aggressive  -moderate  -conservative
-hmin             3            3          3
-read-hmin        1            1          2
-altmax           1.00         0.99       0.80
-refmin           0.00         0.00       0.10
-mapqmin          0            10         21
-covmin           1            1          1
-clenmax          unlimited    unlimited  unlimited
-allow-multiple   yes          yes        no
NAME
gt-hop - Cognate sequence-based homopolymer error correction.
SYNOPSIS
gt hop -<mode> -c <encseq> -map <sam/bam> -reads <fastq> [options…]
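As an illustration of the synopsis, the following minimal Python sketch assembles a gt hop argument list without running it. The helper name build_hop_command and the file names are hypothetical; only the flags shown in the synopsis are used, and the list of modes is taken from the "Correction mode" section of this page.

```python
import shlex

# Mode flags named in this page; exactly one must be selected.
MODES = ("aggressive", "moderate", "conservative", "expert")


def build_hop_command(mode, encseq, mapping, reads):
    """Assemble (but do not run) a gt hop command line following the synopsis.

    This is a hypothetical helper for illustration, not part of GenomeTools.
    """
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if not reads:
        raise ValueError("at least one FastQ file is required for -reads")
    cmd = ["gt", "hop", f"-{mode}", "-c", encseq, "-map", mapping]
    cmd += ["-reads", *reads]
    return cmd


if __name__ == "__main__":
    # File names below are placeholders.
    argv = build_hop_command(
        "moderate", "cognate.fas", "mapping.sorted.bam", ["reads.fastq"]
    )
    print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run once the encoded cognate sequence and the coordinate-sorted mapping file exist.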
DESCRIPTION
- -c [string]
-
cognate sequence (encoded using gt encseq encode)
- -map [string]
-
mapping of reads to the cognate sequence; it must be in SAM/BAM format and sorted by coordinate (this can be prepared e.g. using samtools sort)
- -sam [yes|no]
-
mapping file is SAM (default: BAM)
- -aggressive [yes|no]
-
correct as much as possible
- -moderate [yes|no]
-
mediate between sensitivity and precision
- -conservative [yes|no]
-
correct only most likely errors
- -expert [yes|no]
-
manually select correction criteria
- -reads
-
uncorrected read file(s) in FastQ format; the corrected reads are output in the current working directory, in files named like the input files, each prepended by a prefix (see the -outprefix option). -reads allows one to output the reads in the same order as in the input and is mandatory if the SAM contains more than a single primary alignment for each read (e.g. output of bwasw); see also the -o option as an alternative
- -outprefix [string]
-
prefix for output filenames (corrected reads); when -reads is specified, the prefix is prepended to each input filename (default: hop_)
- -o [string]
-
output file for corrected reads (see also -reads/-outprefix); if -o is used, reads are output in a single file in the order in which they are found in the SAM file (which usually differs from the original order); this only works if the reads were aligned with software that includes only one alignment for each read (e.g. bwa) (default: undefined)
- -hmin [value]
-
minimal homopolymer length in cognate sequence (default: 3)
- -read-hmin [value]
-
minimal homopolymer length in reads (default: 2)
- -qmax [value]
-
maximal average quality of homopolymer in a read (default: 120)
- -altmax [value]
-
maximal support of an alternate homopolymer length; e.g. 0.8 means: do not correct any read if the homopolymer length has the same value, different from the cognate, in more than 80% of the reads; if altmax is set to 1.0, reads are always corrected (default: 0.800000)
- -cogmin [value]
-
minimal support of the cognate sequence homopolymer length; e.g. 0.1 means: do not correct any read if the cognate homopolymer length is not present in at least 10% of the reads; if cogmin is set to 0.0, reads are always corrected
- -mapqmin [value]
-
minimal mapping quality (default: 21)
- -covmin [value]
-
minimal coverage; e.g. 5 means: do not correct any read if the coverage (number of reads mapped over the whole homopolymer) is less than 5; if covmin is set to 1, reads are always corrected (default: 1)
- -allow-muliple [yes|no]
-
allow multiple corrections in a read (default: no)
- -clenmax [value]
-
maximal correction length (default: unlimited)
- -ann [string]
-
annotation of the cognate sequence in sorted GFF3 format; it must be sorted by coordinates on the cognate sequence (this can be done e.g. using gt gff3 -sort); if -ann is used, corrections are limited to homopolymers starting or ending inside the feature type indicated by the -ft option (default: undefined)
- -ft [string]
-
feature type to use when -ann option is specified (default: CDS)
- -v [yes|no]
-
be verbose (default: no)
- -help
-
display help for basic options and exit
- -help+
-
display help for all options and exit
- -version
-
display version information and exit
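The support and coverage criteria described for -altmax, -cogmin and -covmin can be read as three threshold checks on the reads covering one homopolymer. The Python sketch below follows only the wording of those option descriptions; it is not the gt hop implementation. The function name and input representation are hypothetical, and the cogmin default of 0.10 is an assumption based on the conservative preset value.

```python
from collections import Counter


def passes_support_criteria(cognate_len, read_lens,
                            altmax=0.80, cogmin=0.10, covmin=1):
    """Return True if the thresholds, as described, would permit correction.

    cognate_len: homopolymer length in the cognate sequence.
    read_lens:   homopolymer lengths observed in the reads that are
                 mapped over the whole homopolymer (hypothetical input).
    """
    coverage = len(read_lens)
    # -covmin: too few reads over the homopolymer (or none at all).
    if coverage == 0 or coverage < covmin:
        return False
    counts = Counter(read_lens)
    # -cogmin: the cognate length must be seen in enough reads.
    cognate_support = counts.get(cognate_len, 0) / coverage
    if cognate_support < cogmin:
        return False
    # -altmax: no single alternate length may dominate the reads.
    alt_support = max(
        (n for length, n in counts.items() if length != cognate_len),
        default=0,
    ) / coverage
    if alt_support > altmax:
        return False
    return True


if __name__ == "__main__":
    # 9 of 10 reads agree on length 4, the cognate has length 3:
    # the alternate support (0.9) exceeds altmax (0.8).
    print(passes_support_criteria(3, [3] + [4] * 9))
```

With altmax=1.0 and cogmin=0.0 neither support check can fail, which matches the statement that reads are then always corrected.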
Correction mode:
One of the options -aggressive, -moderate, -conservative or -expert must be selected.
The -aggressive, -moderate and -conservative modes are presets of the criteria by which it is decided whether an observed discrepancy in homopolymer length between the cognate sequence and a read shall be corrected or not. A description of the individual criteria is available via the -help+ option. The presets are equivalent to the following settings:
The aggressive mode tries to maximize the sensitivity, the conservative mode to minimize the false positives. An even more conservative set of corrections can be achieved using the -ann option (see -help+).
The -expert mode allows one to manually set each parameter; the default values are the same as in the -conservative mode.
(Finally, for evaluation purposes only, the -state-of-truth mode can be used: this mode assumes that the sequenced genome has been specified as cognate sequence and outputs an ideal list of corrections.)
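To see at a glance where the presets diverge, the preset values can be held in a small lookup table and compared. The Python sketch below copies the values and key names verbatim from the preset table of this page; the names PRESETS and preset_differences are hypothetical and not part of GenomeTools.

```python
# Preset values copied from the preset table of this page.
# Key names follow the table as printed there.
PRESETS = {
    "aggressive": {
        "-hmin": 3, "-read-hmin": 1, "-altmax": 1.00, "-refmin": 0.00,
        "-mapqmin": 0, "-covmin": 1, "-clenmax": "unlimited",
        "-allow-multiple": "yes",
    },
    "moderate": {
        "-hmin": 3, "-read-hmin": 1, "-altmax": 0.99, "-refmin": 0.00,
        "-mapqmin": 10, "-covmin": 1, "-clenmax": "unlimited",
        "-allow-multiple": "yes",
    },
    "conservative": {
        "-hmin": 3, "-read-hmin": 2, "-altmax": 0.80, "-refmin": 0.10,
        "-mapqmin": 21, "-covmin": 1, "-clenmax": "unlimited",
        "-allow-multiple": "no",
    },
}


def preset_differences(mode_a, mode_b):
    """Return {criterion: (value in mode_a, value in mode_b)} where they differ."""
    a, b = PRESETS[mode_a], PRESETS[mode_b]
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


if __name__ == "__main__":
    for key, (x, y) in preset_differences("moderate", "conservative").items():
        print(f"{key}: {x} -> {y}")
```

Such a comparison can serve as a starting point when choosing which criteria to adjust by hand in -expert mode.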
REPORTING BUGS
Report bugs to https://github.com/genometools/genometools/issues.