NAME
gt-fingerprint - Compute MD5 fingerprints for each sequence given in a set of sequence files.
SYNOPSIS
gt fingerprint [option …] sequence_file […]
DESCRIPTION
-check [filename]
    compare all fingerprints contained in the given checklist file with the fingerprints of the sequences in the given sequence_file(s). The comparison is successful if every fingerprint in the checklist file occurs in the sequence_file(s) exactly as often as it occurs in the checklist file, and vice versa. (default: undefined)
-duplicates [yes|no]
    show duplicate fingerprints in the given sequence_file(s). (default: no)
-extract [string]
    extract the sequence(s) with the given fingerprint from the sequence_file(s) and show them on stdout. (default: undefined)
-width [value]
    set output width for FASTA sequence printing (0 disables formatting) (default: 0)
-o [filename]
    redirect output to the specified file (default: undefined)
-gzip [yes|no]
    write gzip compressed output file (default: no)
-bzip2 [yes|no]
    write bzip2 compressed output file (default: no)
-force [yes|no]
    force writing to output file (default: no)
-help
    display help and exit
-version
    display version information and exit
If neither option -check nor option -duplicates is used, the fingerprints of all sequences are shown on stdout, one per line.
The fingerprint of a sequence is case-insensitive. Thus two sequences that differ only in letter case, for example a sequence and a soft-masked copy of it, have the same MD5 fingerprint.
Examples
Compute a (unified) list of fingerprints:
$ gt fingerprint U89959_ests.fas | sort | uniq > U89959_ests.checklist_uniq
Compare fingerprints:
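$ gt fingerprint -check U89959_ests.checklist_uniq U89959_ests.fas
950b7715ab6cc030a8c810a0dba2dd33 only in sequence_file(s)
The check fails because U89959_ests.fas contains the sequence with this fingerprint twice, while the unified checklist lists it only once.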
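The -check comparison can be pictured as a multiset comparison of fingerprints. The following Python sketch is illustrative only and is not GenomeTools code: the helper names `fingerprint` and `check` are hypothetical, and it assumes that case-insensitivity is achieved by upper-casing each sequence before hashing. If gt normalizes case differently, the hex digests would differ from gt's output, although the comparison logic would be the same.

```python
import hashlib
from collections import Counter


def fingerprint(seq: str) -> str:
    """MD5 hex digest of a sequence, ignoring letter case.

    Assumption: the sequence is normalized to upper case before hashing;
    the man page only states that fingerprints are case-insensitive.
    """
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()


def check(checklist: list[str], sequences: list[str]) -> tuple[list[str], list[str]]:
    """Compare checklist fingerprints with sequence fingerprints as multisets.

    Returns (only_in_checklist, only_in_sequences); the comparison succeeds
    when both lists are empty. A fingerprint listed once in the checklist
    but found twice in the sequences appears once in only_in_sequences.
    """
    expected = Counter(checklist)
    found = Counter(fingerprint(s) for s in sequences)
    # Counter subtraction keeps only positive counts, i.e. the surplus copies.
    only_in_checklist = sorted((expected - found).elements())
    only_in_sequences = sorted((found - expected).elements())
    return only_in_checklist, only_in_sequences
```

Building the checklist from a set of fingerprints mirrors `sort | uniq`, so a sequence that occurs twice in the input is reported once as surplus on the sequence side, which is the situation in the example above.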
Make sure a sequence file contains no duplicates (not the case here):
$ gt fingerprint -duplicates U89959_ests.fas
950b7715ab6cc030a8c810a0dba2dd33 2
gt fingerprint: error: duplicates found: 1 out of 200 (0.500%)
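A similar Python sketch for -duplicates counts how often each fingerprint occurs. It is not GenomeTools code: the helper names are hypothetical, case is again assumed to be normalized by upper-casing, and the summary string only imitates the message shown above. Whether gt counts surplus copies or distinct repeated fingerprints is an assumption here; with one fingerprint occurring twice, as in the example, both give 1.

```python
import hashlib
from collections import Counter


def fingerprint(seq: str) -> str:
    # Case-insensitive MD5 (assumption: upper-case normalization).
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()


def find_duplicates(sequences: list[str]) -> tuple[dict[str, int], str]:
    """Return {fingerprint: occurrences} for repeated fingerprints and a summary.

    Assumption: the "duplicates found" count is the number of surplus copies
    (occurrences - 1 per repeated fingerprint).
    """
    counts = Counter(fingerprint(s) for s in sequences)
    dups = {fp: n for fp, n in counts.items() if n > 1}
    surplus = sum(n - 1 for n in dups.values())
    total = len(sequences)
    percent = 100.0 * surplus / total if total else 0.0
    summary = f"duplicates found: {surplus} out of {total} ({percent:.3f}%)"
    return dups, summary
```

A caller could exit with status 1 whenever the returned dictionary is non-empty, in line with the return values documented below.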
Extract sequence with given fingerprint:
$ gt fingerprint -extract 6d3b4b9db4531cda588528f2c69c0a57 U89959_ests.fas
>SQ;8720010
TTTTTTTTTTTTTTTTTCCTGACAAAACCCCAAGACTCAATTTAATCAATCCTCAAATTTACATGATAC
CAACGTAATGGGAGCTTAAAAATA
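Extraction amounts to filtering FASTA records by fingerprint. This Python sketch is illustrative and not GenomeTools code: `parse_fasta` and `extract` are hypothetical helpers, the parser is deliberately minimal, case is assumed to be normalized by upper-casing before hashing, and `width` is treated like the -width option (0 writes each sequence on a single line). It writes matching sequences as read, so soft-masked (lower-case) letters are kept; whether gt preserves case on extraction is not stated in this page.

```python
import hashlib


def fingerprint(seq: str) -> str:
    # Case-insensitive MD5 (assumption: upper-case normalization).
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Minimal FASTA parser returning (description, sequence) pairs."""
    records = []
    desc = None
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if desc is not None:
                records.append((desc, "".join(parts)))
            desc, parts = line[1:], []
        elif line and desc is not None:  # ignore text before the first header
            parts.append(line)
    if desc is not None:
        records.append((desc, "".join(parts)))
    return records


def extract(text: str, wanted: str, width: int = 0) -> str:
    """Return FASTA text for all records whose fingerprint equals `wanted`."""
    out = []
    for desc, seq in parse_fasta(text):
        if fingerprint(seq) != wanted:
            continue
        out.append(">" + desc)
        if width > 0:
            # Wrap the sequence into lines of at most `width` characters.
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
        else:
            out.append(seq)  # width 0: no line wrapping
    return "\n".join(out) + ("\n" if out else "")
```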
Return values
0   everything went fine (-check: the comparison was successful; -duplicates: no duplicates found)
1   an error occurred (-check: the comparison was not successful; -duplicates: duplicates found)
REPORTING BUGS
Report bugs to https://github.com/genometools/genometools/issues.