Sequence alignment/map file format
Sequence Alignment/Map (SAM) format is TAB-delimited. Apart from the header lines, which begin with the `@` symbol, each alignment line consists of the following fields:
| Col | Field | Description |
|---|---|---|
| 1 | QNAME | Query template/pair NAME |
| 2 | FLAG | bitwise FLAG |
| 3 | RNAME | Reference sequence NAME |
| 4 | POS | 1-based leftmost POSition/coordinate of clipped sequence |
| 5 | MAPQ | MAPping Quality (Phred-scaled) |
| 6 | CIGAR | extended CIGAR string |
| 7 | MRNM | Mate Reference sequence NaMe (`=` if same as RNAME) |
| 8 | MPOS | 1-based Mate POSition |
| 9 | TLEN | inferred Template LENgth (insert size) |
| 10 | SEQ | query SEQuence on the same strand as the reference |
| 11 | QUAL | query QUALity (ASCII-33 gives the Phred base quality) |
| 12+ | OPT | variable OPTional fields in the format TAG:VTYPE:VALUE |
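
The field layout above can be illustrated with a minimal Python sketch that splits one alignment line on TABs. The names `SamRecord`, `parse_alignment_line`, and `phred_qualities` are hypothetical helpers introduced here for illustration; they are not part of samtools or any library. The sketch assumes a well-formed line with all 11 mandatory fields, and it treats a QUAL of `*` as "no qualities stored", a convention taken from the full specification rather than from the table.

```python
from typing import Dict, List, NamedTuple, Tuple


class SamRecord(NamedTuple):
    """Hypothetical container for the 11 mandatory fields plus OPT."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    mrnm: str
    mpos: int
    tlen: int
    seq: str
    qual: str
    opt: Dict[str, Tuple[str, str]]


def parse_alignment_line(line: str) -> SamRecord:
    """Split one TAB-delimited alignment line (not a '@' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 TAB-delimited fields")
    opt = {}
    for item in fields[11:]:
        # TAG:VTYPE:VALUE; split at most twice because VALUE may contain ':'
        tag, vtype, value = item.split(":", 2)
        opt[tag] = (vtype, value)  # VALUE is kept as an unconverted string
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], opt,
    )


def phred_qualities(qual: str) -> List[int]:
    """Convert a QUAL string to Phred scores (ASCII code minus 33)."""
    if qual == "*":  # assumption: '*' means qualities are not stored
        return []
    return [ord(ch) - 33 for ch in qual]


# Made-up example line, for illustration only.
example = "r001\t99\tchr1\t7\t30\t8M\t=\t37\t39\tTTAGATAA\tIIIIIIII\tNM:i:0"
record = parse_alignment_line(example)
print(record.rname, record.pos, phred_qualities(record.qual))
```

Numeric columns are converted to `int`, while optional values are left as strings; converting them according to VTYPE is left out to keep the sketch short.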
Each bit in the FLAG field is defined as:
| Flag | Chr | Description |
|---|---|---|
| 0x0001 | p | the read is paired in sequencing |
| 0x0002 | P | the read is mapped in a proper pair |
| 0x0004 | u | the query sequence itself is unmapped |
| 0x0008 | U | the mate is unmapped |
| 0x0010 | r | strand of the query (1 for reverse) |
| 0x0020 | R | strand of the mate |
| 0x0040 | 1 | the read is the first read in a pair |
| 0x0080 | 2 | the read is the second read in a pair |
| 0x0100 | s | the alignment is not primary |
| 0x0200 | f | the read fails platform/vendor quality checks |
| 0x0400 | d | the read is either a PCR or an optical duplicate |
| 0x0800 | S | the alignment is supplementary |
where the second column gives the string representation of the FLAG field.
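
As a rough illustration of the FLAG bits listed above, the following Python sketch tests each bit of an integer FLAG and builds its string representation. `FLAG_BITS` and `flag_to_string` are hypothetical names chosen for this example. Concatenating the characters in ascending bit order is an assumption and may not match the output of any particular tool.

```python
# (bit mask, character) pairs copied from the FLAG definitions above.
FLAG_BITS = [
    (0x0001, "p"),  # paired in sequencing
    (0x0002, "P"),  # mapped in a proper pair
    (0x0004, "u"),  # query itself is unmapped
    (0x0008, "U"),  # mate is unmapped
    (0x0010, "r"),  # query on the reverse strand
    (0x0020, "R"),  # mate on the reverse strand
    (0x0040, "1"),  # first read in a pair
    (0x0080, "2"),  # second read in a pair
    (0x0100, "s"),  # not a primary alignment
    (0x0200, "f"),  # fails platform/vendor quality checks
    (0x0400, "d"),  # PCR or optical duplicate
    (0x0800, "S"),  # supplementary alignment
]


def flag_to_string(flag: int) -> str:
    """Return one character per set bit, in ascending bit order."""
    return "".join(ch for bit, ch in FLAG_BITS if flag & bit)


print(flag_to_string(99))   # 99 = 0x1 + 0x2 + 0x20 + 0x40 -> "pPR1"
print(flag_to_string(147))  # 147 = 0x1 + 0x2 + 0x10 + 0x80 -> "pPr2"
```

A single property can be checked the same way, for example `flag & 0x0004` is non-zero when the query is unmapped.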
https://github.com/samtools/hts-specs
The full SAM/BAM file format specification