DESCRIPTION

Usage:

  • cdbfasta <fastafile> [-o <index_file>] [-r <record_delimiter>]
    [-z <compressed_db>] [-i] [-m|-n <numkeys>|-f <LIST>|-c|-C]
    [-w <stopwords_list>] [-s <stripendchars>] [-v]

  • Creates an index file for records from a multi-fasta file. By default (without -m/-n/-c/-C option), only the first space-delimited token from the defline is used as a key.

  • <fastafile> is the multi-fasta file to index.

  • -o <index_file>: name the index file <index_file>; if not given, the index filename is the database name plus the suffix '.cidx'.

  • -r <record_delimiter>: a string of characters at the beginning of a line marking the start of a record (default: '>').

  • -Q: treat input as fastq format, i.e. with '@' as record delimiter and with records expected to have at least 4 lines.

  • -z <compressed_db>: the database is compressed into the file <compressed_db> before indexing (<fastafile> can be "-" or "stdin" in order to get the input records from stdin).

  • -s <stripendchars>: strip extraneous characters from *around* the space-delimited tokens, for the multi-key options below (-m, -n, -f). The default <stripendchars> set is: '",`.(){}/[]!:;~|><+-

  • -m: ("multi-key" option) create hash entries pointing to the same record for all tokens found in the defline.

  • -n <numkeys>: same as -m, but only takes the first <numkeys> tokens from the defline.

  • -f <LIST>: index *space*-delimited tokens (fields) in the defline as given by LIST of fields or field ranges (the same syntax as UNIX 'cut').

  • -w <stopwords_list>: exclude from indexing all the words found in the file <stopwords_list> (for options -m, -n and -f).

  • -i: do case-insensitive indexing (i.e. create additional keys for the all-lowercase versions of the tokens used for indexing from the defline).

  • -c: for deflines in the format db1|accession1|db2|accession2|..., only the first db-accession pair ('db1|accession1') is taken as key.

  • -C: like -c, but subsequent db|accession constructs are also indexed, along with the full (default) token; additionally, all nrdb concatenated accessions found in the defline are parsed and stored (assuming 0x01 or '^|^' as separators).

  • -a: accession mode: like the -C option, but indexes the 'accession' part for all 'db|accession' constructs found.

  • -A: like -a and -C together (both accessions and 'db|accession' constructs are used as keys).

  • -v: show program version and exit.
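
The key-selection rules described above can be illustrated with a short Python sketch. This is not cdbfasta's own code, only an approximation of the documented behaviour for the default mode, -m/-n, and -c; the function names and the sample defline are made up for illustration, and details such as <stripendchars> handling, stop words, and -i are left out.

```python
def defline_tokens(defline):
    """Split a defline (with or without the leading '>') into space-delimited tokens."""
    return defline.lstrip(">").split()


def default_key(defline):
    """Default mode: only the first space-delimited token is used as the key."""
    tokens = defline_tokens(defline)
    return tokens[0] if tokens else None


def multi_keys(defline, numkeys=None):
    """-m style (numkeys=None): all tokens; -n style: only the first <numkeys> tokens."""
    tokens = defline_tokens(defline)
    return tokens if numkeys is None else tokens[:numkeys]


def first_db_accession(defline):
    """-c style: keep only the first 'db1|accession1' pair of the first token."""
    token = default_key(defline)
    if token is None:
        return None
    return "|".join(token.split("|")[:2])


# Hypothetical defline, invented for this example.
defline = ">gi|12345|ref|NM_000001.1| Homo sapiens example mRNA"

print(default_key(defline))         # gi|12345|ref|NM_000001.1|
print(multi_keys(defline, 2))       # ['gi|12345|ref|NM_000001.1|', 'Homo']
print(first_db_accession(defline))  # gi|12345
```

In this sketch every key would point to the same record, which is how the multi-key options are described above: several hash entries, one record.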