rabema_evaluate (1)

Rabema evaluation

SYNOPSIS
    rabema_evaluate [options] --reference ref.fa --in-gsi in.gsi --in-sam mapping.sam
    rabema_evaluate [options] --reference ref.fa --in-gsi in.gsi --in-bam mapping.bam

DESCRIPTION
    Compare the SAM/BAM output mapping.sam/mapping.bam of any read mapper against
    the Rabema gold standard previously built with rabema_build_gold_standard.
    The input is a reference FASTA file, a gold standard interval (GSI) file, and
    the SAM/BAM file to evaluate.

    The input SAM/BAM file must be sorted by query name. The program will create
    a FASTA index file ref.fa.fai for fast random access to the reference.

    -h, --help
          Display this help message.
    --version
          Display version information.
    -v, --verbose
          Enable verbose output.
    -vv, --very-verbose
          Enable even more verbose output.

  Input / Output:
    -r, --reference FASTA
          Path to load the reference FASTA from. Valid file types are: fa and
          fasta.
    -g, --in-gsi GSI
          Path to load the gold standard intervals from. If compressed using
          gzip, the file will be decompressed on the fly. Valid file types are:
          gsi and gsi.gz.
    -s, --in-sam SAM
          Path to load the read mapper SAM output from. Valid file type is: sam.
    -b, --in-bam BAM
          Path to load the read mapper BAM output from. Valid file type is: bam.
    --out-tsv TSV
          Path to write the statistics to as TSV. Valid file type is: tsv.

  Benchmark Parameters:
    --oracle-mode
          Enable oracle mode. This is used for simulated data when the input GSI
          file gives exactly one position that is considered the true sample
          position.
    --only-unique-reads
          Consider only reads that have a single alignment in the mapping result
          file. Useful for precision computation.
    --match-n
          When set, N matches all characters without penalty.
    --distance-metric METRIC
          Set the distance metric. One of hamming and edit. Default: edit.
    -e, --max-error RATE
          Maximal error rate to build the gold standard for, in percent. This
          parameter is an integer and relative to the read length.
          The error rate is ignored in oracle mode; there, the distance of the
          read at the sample position is taken, individually for each read.
          Default: 0.
    -c, --benchmark-category CAT
          Set the benchmark category. One of all, all-best, and any-best.
          Default: all.
    --trust-nm
          When set, we trust the alignment and distance from the SAM/BAM file and
          no realignment is performed. Off by default.
    --ignore-paired-flags
          When set, we ignore all SAM/BAM flags related to pairing. This is
          necessary when analyzing SAM from SOAP's soap2sam.pl script.
    --dont-panic
          Do not stop program execution if an additional hit was found that
          indicates that the gold standard is incorrect.

  Logging:
    --show-missed-intervals
          Show details for each missed interval from the GSI.
    --show-invalid-hits
          Show details for invalid hits (with too high error rate).
    --show-additional-hits
          Show details for additional hits (low enough error rate but not in the
          gold standard).
    --show-hits
          Show details for hit intervals.
    --show-try-hit
          Show details for each alignment in the SAM/BAM input.

    The occurrence of "invalid" hits in the read mapper's output is not an
    error. If there are additional hits, however, this indicates an error in the
    gold standard.

RETURN VALUES
    A return value of 0 indicates success; any other value indicates an error.

MEMORY REQUIREMENTS
    Since version 1.1, great care has been taken to keep the memory requirements
    as low as possible. The evaluation step needs to store the whole reference
    sequence in memory, but little more. So, for the human genome, the memory
    requirements are below 4 GB, regardless of the size of the GSI or SAM/BAM
    file.

REFERENCES
    M. Holtgrewe, A.-K. Emde, D. Weese and K. Reinert. A novel and well-defined
    benchmarking method for second generation read mapping. BMC Bioinformatics
    2011, 12:210.

    http://www.seqan.de/rabema
          Rabema homepage
    http://www.seqan.de/mason
          Mason homepage

VERSION
    rabema_evaluate version: 1.2.0
    Last update: March 14, 2013
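The --distance-metric option chooses between Hamming distance and edit distance. The Python sketch below only illustrates how the two metrics differ on plain strings. The function names and the match_n parameter (a stand-in for --match-n, assuming an N on either side matches any character) are hypothetical and not part of Rabema, which aligns each read against a reference window rather than comparing two fixed strings.

```python
def _same(x: str, y: str, match_n: bool) -> bool:
    """Character equality; with match_n, an N on either side matches anything (assumption)."""
    if match_n and (x == "N" or y == "N"):
        return True
    return x == y


def hamming_distance(a: str, b: str, match_n: bool = False) -> int:
    """Count mismatching positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(1 for x, y in zip(a, b) if not _same(x, y, match_n))


def edit_distance(a: str, b: str, match_n: bool = False) -> int:
    """Levenshtein distance: fewest substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cost = 0 if _same(x, y, match_n) else 1
            curr[j] = min(prev[j] + 1,         # delete x from a
                          curr[j - 1] + 1,     # insert y into a
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[-1]


# A one-base shift mismatches at every position but costs only 2 edits.
print(hamming_distance("ACGTACGT", "CGTACGTA"), edit_distance("ACGTACGT", "CGTACGTA"))  # prints: 8 2
```

This is why edit distance is the more forgiving choice for mappers that report alignments with indels.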
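The -e/--max-error option is a percentage of the read length, and the logging options distinguish hits, invalid hits (too high error rate) and additional hits (low enough error rate but not in the gold standard). The hypothetical Python sketch below shows one way to turn the rate into an absolute error budget and classify a single alignment. It assumes the budget is rounded down and that the caller already knows the alignment's distance and whether it falls into a gold standard interval; it is a simplification, not Rabema's actual evaluation code.

```python
def max_errors(rate_percent: int, read_length: int) -> int:
    """Absolute error budget for one read, assuming the result is rounded down."""
    return rate_percent * read_length // 100


def classify_alignment(distance: int, read_length: int, rate_percent: int,
                       in_gold_standard: bool) -> str:
    """Hypothetical classification of one reported alignment."""
    if distance > max_errors(rate_percent, read_length):
        return "invalid"     # too high error rate; not an error in itself
    if in_gold_standard:
        return "hit"         # falls into a gold standard interval
    return "additional"      # low enough error rate but missing from the GSI


# With -e 5, a 100 bp read may have up to 5 errors; a 36 bp read only 1.
print(max_errors(5, 100), max_errors(5, 36))  # prints: 5 1
```

Under this sketch an "additional" result is the case that would point to a gap in the gold standard, matching the note on additional hits above.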
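Since the SAM/BAM input must be sorted by query name, it can help to check a SAM file before running the evaluation. The hypothetical Python helper below only checks the weaker property that all records of one read (same QNAME, the first SAM column) are contiguous, because the exact collation of query names can differ between sorting tools. It handles plain-text SAM only, not BAM.

```python
from typing import Iterable, Optional


def first_ungrouped_qname(sam_lines: Iterable[str]) -> Optional[str]:
    """Return the first QNAME whose records are not contiguous, or None if all are grouped."""
    seen = set()
    current = None
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        qname = line.split("\t", 1)[0]
        if qname != current:
            if qname in seen:
                return qname  # this read reappears after other reads
            seen.add(qname)
            current = qname
    return None
```

For example, calling first_ungrouped_qname on an open handle of mapping.sam returns None when the records are grouped by read.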