Filter duplicate reads depending on sequencing depth
filterdup <-t file> [-o outputfile] [-g genomesize] [options]
filterdup -- Filter duplicate reads as MACS does. This script can also be used to convert files in ELAND result, ELAND multi, ELAND export, SAM, BAM and BOWTIE map formats to BED format.
--version
show program's version number and exit
-h, --help
show this help message and exit
-t TFILE
Sequencing alignment file. REQUIRED.
-o OUTPUTFILE
Output BED file name. If not specified, output is written to standard output. DEFAULT: stdout
-f FORMAT, --format=FORMAT
Format of the tag file: "AUTO", "BED", "ELAND", "ELANDMULTI", "ELANDEXPORT", "SAM", "BAM" or "BOWTIE". The default AUTO option lets filterdup decide which format the file is in. If you choose ELAND/ELANDMULTI/ELANDEXPORT/SAM/BAM/BOWTIE, please check the format definitions in the 00README file. DEFAULT: "AUTO"
-g GSIZE, --gsize=GSIZE
Effective genome size. It can be given as 1.0e+9 or 1000000000, or as a shortcut: 'hs' for human (2.7e9), 'mm' for mouse (1.87e9), 'ce' for C. elegans (9e7) and 'dm' for fruit fly (1.2e8). DEFAULT: hs
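The -g value accepts either a number in plain or scientific notation or one of the species shortcuts listed above. A minimal sketch of that parsing, assuming a hypothetical helper `parse_gsize` (not filterdup's own code) and using the shortcut values from this help text:

```python
# Hypothetical helper: turn a -g/--gsize argument into a number.
# Shortcut values are the ones listed in the filterdup help text.
GSIZE_SHORTCUTS = {
    "hs": 2.7e9,   # human
    "mm": 1.87e9,  # mouse
    "ce": 9e7,     # C. elegans
    "dm": 1.2e8,   # fruit fly
}


def parse_gsize(value: str) -> float:
    """Return the effective genome size for a shortcut or a numeric string."""
    key = value.strip()
    if key in GSIZE_SHORTCUTS:
        return GSIZE_SHORTCUTS[key]
    # float() accepts both "1.0e+9" and "1000000000".
    return float(key)


print(parse_gsize("mm"))          # 1870000000.0
print(parse_gsize("1000000000"))  # 1000000000.0
```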
-s TSIZE, --tsize=TSIZE
Tag size. This overrides the auto-detected tag size. DEFAULT: Not set
-p PVALUE, --pvalue=PVALUE
P-value cutoff for the binomial distribution test. DEFAULT: 1e-5
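With --keep-dup=auto, this p-value sets how many tags may share one location. The sketch below assumes the limit is the smallest k whose binomial CDF reaches 1 - p, with n = total number of tags and per-position probability 1/genome size; the function name `max_dup_tags` is hypothetical and this is an illustration of the idea, not filterdup's source:

```python
import math


def max_dup_tags(genome_size: float, total_tags: int, pvalue: float = 1e-5) -> int:
    """Smallest k with P(X <= k) >= 1 - pvalue, X ~ Binomial(total_tags, 1/genome_size).

    If tags landed uniformly at random, seeing more than k tags at a single
    position would have probability below pvalue.
    Assumes the expected tags per position (total_tags / genome_size) is small.
    """
    n = int(total_tags)
    q = 1.0 / genome_size
    target = 1.0 - pvalue
    # P(X = 0) = (1 - q)^n, computed via log1p for accuracy with tiny q.
    pmf = math.exp(n * math.log1p(-q))
    cdf = pmf
    k = 0
    while cdf < target and k < n:
        # Recurrence: P(X = k+1) = P(X = k) * (n - k) / (k + 1) * q / (1 - q)
        pmf *= (n - k) / (k + 1) * q / (1.0 - q)
        k += 1
        cdf += pmf
    return k


print(max_dup_tags(2.7e9, 10_000_000))   # 1
print(max_dup_tags(2.7e9, 100_000_000))  # 2
```

The incremental recurrence avoids computing large binomial coefficients directly, which would overflow for sequencing-scale tag counts.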
--keep-dup=KEEPDUPLICATES
It controls how filterdup handles duplicate tags at the exact same location, i.e. the same coordinate and the same strand. The default 'auto' option makes filterdup calculate the maximum number of tags allowed at the exact same location from a binomial distribution, using the -p value as the cutoff; the 'all' option keeps every tag (useful if you only want to convert formats). If an integer is given, at most this number of tags will be kept at the same location. DEFAULT: auto
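The three --keep-dup modes reduce to a per-location cap (or no cap). A minimal sketch, assuming tags are hypothetical `(chrom, position, strand)` tuples and that the 'auto' limit has already been computed from the binomial test; `resolve_keep_dup` and `filter_dups` are illustrative names, not filterdup's API:

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

# Hypothetical tag representation: (chrom, position, strand).
Tag = Tuple[str, int, str]


def resolve_keep_dup(value: str, auto_limit: int) -> Optional[int]:
    """Map a --keep-dup value to a per-location cap (None means keep all)."""
    if value == "all":
        return None
    if value == "auto":
        return auto_limit  # assumed to come from the binomial test
    return int(value)


def filter_dups(tags: Iterable[Tag], max_per_site: Optional[int]) -> List[Tag]:
    """Keep at most max_per_site tags sharing chrom, position and strand."""
    if max_per_site is None:
        return list(tags)
    seen = defaultdict(int)  # tags seen so far at each exact location
    kept = []
    for tag in tags:
        seen[tag] += 1
        if seen[tag] <= max_per_site:
            kept.append(tag)
    return kept


tags = [("chr1", 100, "+")] * 3 + [("chr1", 100, "-"), ("chr2", 5, "+")]
print(filter_dups(tags, resolve_keep_dup("1", auto_limit=1)))
# [('chr1', 100, '+'), ('chr1', 100, '-'), ('chr2', 5, '+')]
```

Note that the '+' and '-' tags at chr1:100 count as different locations, matching the "same coordinate and same strand" rule.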
--verbose=VERBOSE
Set verbose level. 0: only show critical messages, 1: also show warning messages, 2: show process information, 3: show debug messages. To see where the duplicate reads are, use 3. DEFAULT: 2