4.2 Working with SAM/BAM files Flashcards
What is the first part of any SAM file?
Header: contains info related to reference sequence dictionary and program used to make the BAM file
What is in each of the 11 columns in SAM/BAM?
- QNAME: query template name
- FLAG: bitwise flag
- RNAME: reference sequence name
- POS: 1-based leftmost mapping position
- MAPQ: mapping quality
- CIGAR: CIGAR string
- RNEXT: reference name of the mate/next read
- PNEXT: position of the mate/next read
- TLEN: observed template length
- SEQ: segment sequence
- QUAL: ASCII of Phred-scaled base quality + 33
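A minimal sketch of how the 11 mandatory columns could be read from one tab-separated alignment line, and how a Phred+33 QUAL string turns back into numeric scores. The helper names (`parse_sam_line`, `phred_scores`) and the example record are made up for illustration; real projects would more likely use an existing SAM/BAM library.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line):
    """Split one alignment line into the 11 mandatory fields plus optional tags."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("an alignment line needs at least 11 columns")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    record["TAGS"] = cols[11:]  # optional TAG:TYPE:VALUE fields, left unparsed
    return record


def phred_scores(qual):
    """Decode a Phred+33 QUAL string into integer quality scores."""
    return [ord(ch) - 33 for ch in qual]


# Hypothetical record, not taken from a real dataset.
example = "read1\t99\tchr1\t100\t60\t5M\t=\t200\t105\tACGTA\tIIII#"
rec = parse_sam_line(example)
print(rec["RNAME"], rec["POS"], phred_scores(rec["QUAL"]))
# chr1 100 [40, 40, 40, 40, 2]
```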
What is a bit flag?
An integer in which each bit encodes one property of the alignment (e.g., paired, unmapped, reverse strand); can be used to filter reads downstream
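A small sketch of decoding a FLAG value by testing each bit. The bit values follow the SAM specification; the short labels and the function names are illustrative choices, not official names.

```python
# Bit value -> short label (labels are illustrative, bit values are from the SAM spec).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of every bit that is set in the FLAG integer."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def is_primary_mapped(flag):
    """Example filter: keep reads that are mapped and neither secondary nor supplementary."""
    return not (flag & (0x4 | 0x100 | 0x800))


print(decode_flag(99))
# ['paired', 'proper_pair', 'mate_reverse_strand', 'first_in_pair']
```

The same bit test is what a downstream filter does: a read passes or fails depending on whether particular bits are set.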
What is secondary alignment?
the same part of the sequence aligns to multiple locations
What is supplementary alignment?
where (mostly) non-overlapping parts of a sequence align to multiple locations
What is a cigar string?
- compresses info about alignments, such as matches, insertions, and deletions, in the order they occur
- encodes alignment segments
What does this cigar string mean?
3M1I3M1D5M
3 match, 1 insertion, 3 match, 1 deletion, 5 match
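A rough sketch of splitting a CIGAR string such as 3M1I3M1D5M into (length, operation) pairs and counting how many reference and read bases it covers. The function names are hypothetical. Note that M means an alignment match, which can be a sequence match or a mismatch, and that the placeholder value `*` (no CIGAR available) is not handled here.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REF = set("MDN=X")    # operations that advance along the reference
CONSUMES_QUERY = set("MIS=X")  # operations that advance along the read


def parse_cigar(cigar):
    """Return a list of (length, op) pairs; every op must carry a length."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def reference_length(cigar):
    """Number of reference bases spanned by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REF)


def query_length(cigar):
    """Number of read bases described by the CIGAR."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_QUERY)


print(parse_cigar("3M1I3M1D5M"))
# [(3, 'M'), (1, 'I'), (3, 'M'), (1, 'D'), (5, 'M')]
print(reference_length("3M1I3M1D5M"), query_length("3M1I3M1D5M"))
# 12 12
```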
What do +/- signs in the observed template length indicate?
+: leftmost segment (usually the forward read)
-: rightmost segment (usually the reverse read)
If all segments are mapped to the same reference, what does the unsigned observed template length equal?
The number of bases from the leftmost mapped base to the rightmost mapped base of the read pair
What does an observed template length of 0 mean?
The template is a single segment, or the information is unavailable
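A worked sketch of the template-length rule for a read pair on the same reference: span from the leftmost mapped base to the rightmost mapped base, plus sign for the leftmost segment and minus for the rightmost. The function name and the tie-break (first segment gets the plus sign when both start at the same position) are assumptions, and individual aligners may compute TLEN slightly differently.

```python
def template_length(pos1, reflen1, pos2, reflen2):
    """Return (TLEN of segment 1, TLEN of segment 2).

    pos*    : 1-based leftmost mapping position (POS)
    reflen* : number of reference bases the segment covers (from its CIGAR)
    """
    leftmost = min(pos1, pos2)
    rightmost = max(pos1 + reflen1, pos2 + reflen2) - 1
    span = rightmost - leftmost + 1
    # Assumption: on a tie, segment 1 is treated as the leftmost segment.
    if pos1 <= pos2:
        return span, -span
    return -span, span


# Read at 100 covering 5 bases, mate at 200 covering 5 bases:
# leftmost base = 100, rightmost base = 204, span = 105.
print(template_length(100, 5, 200, 5))
# (105, -105)
```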
What is Samtools?
A set of utilities that manipulate alignments in BAM format
Used for:
- sorting, merging, indexing, read retrieval
- when opening a remote BAM file, checks the working directory for the index file and downloads the index if absent
What 3 steps are needed to work effectively with SAM files?
- convert to binary format -> faster for the computer to access
- sort by reference position
- generate an index to speed up lookups
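The three steps above can be sketched as a sequence of samtools calls, built here as Python argument lists. The file names and the function names are hypothetical; the sketch assumes `samtools` is installed and uses `view -b -o`, `sort -o`, and `index`. Nothing is executed unless `run_pipeline` is called.

```python
import shutil
import subprocess


def samtools_pipeline(sam_path, prefix):
    """Build the convert -> sort -> index commands for one SAM file."""
    unsorted_bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        ["samtools", "view", "-b", "-o", unsorted_bam, sam_path],  # SAM -> BAM
        ["samtools", "sort", "-o", sorted_bam, unsorted_bam],      # sort by coordinate
        ["samtools", "index", sorted_bam],                         # writes the .bai index
    ]


def run_pipeline(sam_path, prefix):
    """Run the commands in order, stopping at the first failure."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    for cmd in samtools_pipeline(sam_path, prefix):
        subprocess.run(cmd, check=True)


for cmd in samtools_pipeline("aln.sam", "aln"):
    print(" ".join(cmd))
```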
What does the samtools sort cmd do?
- sorts alignments by leftmost coordinate
- may create temporary files when a whole-genome alignment does not fit in memory
What command produces a .bai file?
samtools index
What 3 files are needed to view the alignments in IGV?
- reference
- BAM file with alignments
- index file (bai)