Sequence alignment/map file format
Sequence Alignment/Map (SAM) format is TAB-delimited. Apart from the header lines, which start with the `@` symbol, each alignment line consists of the following fields:
| Col | Field | Description |
|---|---|---|
| 1 | QNAME | Query template/pair NAME |
| 2 | FLAG | bitwise FLAG |
| 3 | RNAME | Reference sequence NAME |
| 4 | POS | 1-based leftmost POSition/coordinate of clipped sequence |
| 5 | MAPQ | MAPping Quality (Phred-scaled) |
| 6 | CIGAR | extended CIGAR string |
| 7 | MRNM | Mate Reference sequence NaMe (`=` if same as RNAME) |
| 8 | MPOS | 1-based Mate POSition |
| 9 | TLEN | inferred Template LENgth (insert size) |
| 10 | SEQ | query SEQuence on the same strand as the reference |
| 11 | QUAL | query QUALity (ASCII-33 gives the Phred base quality) |
| 12+ | OPT | variable OPTional fields in the format TAG:VTYPE:VALUE |
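
As an illustration of the field layout above, the following is a minimal Python sketch that splits one alignment line on TABs and converts the numeric columns. The names `SamRecord`, `parse_sam_line` and `phred_qualities` are hypothetical helpers invented for this example and are not part of samtools; the sample line is made up as well.

```python
from typing import List, NamedTuple


class SamRecord(NamedTuple):
    """Hypothetical container for the 11 mandatory fields plus optional ones."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    mrnm: str
    mpos: int
    tlen: int
    seq: str
    qual: str
    opt: List[str]  # raw TAG:VTYPE:VALUE strings, left unparsed here


def parse_sam_line(line: str) -> SamRecord:
    """Split one TAB-delimited alignment line (not a header line) into fields."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment line")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 fields, got %d" % len(fields))
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        mrnm=fields[6],
        mpos=int(fields[7]),
        tlen=int(fields[8]),
        seq=fields[9],
        qual=fields[10],
        opt=fields[11:],
    )


def phred_qualities(qual: str) -> List[int]:
    """Convert the QUAL string to Phred scores (ASCII code minus 33)."""
    if qual == "*":  # a lone '*' means qualities are not stored
        return []
    return [ord(c) - 33 for c in qual]


# Made-up example line for demonstration only.
example = "r001\t99\tchr1\t7\t30\t8M\t=\t37\t39\tTTAGATAA\tIIIIIIII\tNM:i:0"
record = parse_sam_line(example)
print(record.qname, record.pos, phred_qualities(record.qual))
```

The optional fields are kept as raw strings here; splitting each one on the first two colons would give the TAG, VTYPE and VALUE parts.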
Each bit in the FLAG field is defined as:
| Flag | Chr | Description |
|---|---|---|
| 0x0001 | p | the read is paired in sequencing |
| 0x0002 | P | the read is mapped in a proper pair |
| 0x0004 | u | the query sequence itself is unmapped |
| 0x0008 | U | the mate is unmapped |
| 0x0010 | r | strand of the query (1 for reverse) |
| 0x0020 | R | strand of the mate |
| 0x0040 | 1 | the read is the first read in a pair |
| 0x0080 | 2 | the read is the second read in a pair |
| 0x0100 | s | the alignment is not primary |
| 0x0200 | f | the read fails platform/vendor quality checks |
| 0x0400 | d | the read is either a PCR or an optical duplicate |
| 0x0800 | S | the alignment is supplementary |
where the second column gives the string representation of the FLAG field.
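
The bit definitions above can be decoded with plain bitwise tests. The sketch below is one possible way to do it in Python; `FLAG_BITS` and `flag_to_string` are hypothetical names chosen for this example, and the output is simply the characters of the set bits concatenated in table order, which is an assumption about how the string representation is assembled.

```python
# (bit mask, one-character code) pairs copied from the FLAG table, in order.
FLAG_BITS = [
    (0x0001, "p"),  # paired in sequencing
    (0x0002, "P"),  # mapped in a proper pair
    (0x0004, "u"),  # query unmapped
    (0x0008, "U"),  # mate unmapped
    (0x0010, "r"),  # query on reverse strand
    (0x0020, "R"),  # mate on reverse strand
    (0x0040, "1"),  # first read in a pair
    (0x0080, "2"),  # second read in a pair
    (0x0100, "s"),  # not primary
    (0x0200, "f"),  # fails quality checks
    (0x0400, "d"),  # PCR or optical duplicate
    (0x0800, "S"),  # supplementary
]


def flag_to_string(flag: int) -> str:
    """Return the characters for every bit set in FLAG, in table order."""
    return "".join(char for mask, char in FLAG_BITS if flag & mask)


def is_reverse(flag: int) -> bool:
    """True if the query strand bit (0x0010) is set."""
    return bool(flag & 0x0010)


# 99 = 0x0001 + 0x0002 + 0x0020 + 0x0040
print(flag_to_string(99))  # pPR1
```

Testing a single property, as `is_reverse` does, needs only one mask and avoids building the full string.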
https://github.com/samtools/hts-specs
The full SAM/BAM file format specification