Bioinformatics block test 2 Flashcards
What is variant calling?
Variant calling is the process of identifying, after read alignment, the positions in a sample that differ from the reference genome used in the alignment.
What must be done before variant calling can be carried out?
Duplicate reads must be marked and removed from the SAM/BAM files.
What are the 4 major types of duplicates?
PCR duplicates.
Clustering duplicates.
Optical duplicates.
Sister duplicates.
What are PCR duplicates?
PCR duplicates arise when PCR amplification during library preparation produces more than one copy of the same fragment and those copies anneal to the surface of the flow cell.
What are clustering duplicates?
Clustering duplicates are unique to Illumina and arise during cluster generation when a single library molecule occupies two adjacent wells of a patterned flow cell.
What are optical duplicates?
Optical duplicates also arise during cluster generation: the base-calling software reads a single cluster as two clusters across two adjacent tiles on the flow cell.
What are sister duplicates?
Sister duplicates occur when both strands of the same library fragment anneal to the surface of the flow cell.
Why is it important to mark and remove duplicates for variant calling?
Duplicates artificially inflate sequencing depth and distort allele counts. An error copied into many duplicates can make a homozygous position look heterozygous, and over-representation of one allele can make a heterozygous position look homozygous.
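A minimal sketch of why this matters, using made-up reads at a single site. It assumes duplicates can be recognised as reads sharing the same chromosome, start position and strand, which is a simplification of what real duplicate-marking tools do. The read tuples and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy reads covering one site: (chrom, start, strand, base_at_site).
# Reads with the same chrom, start and strand are treated as copies of one fragment.
reads = [
    ("chr1", 100, "+", "A"),
    ("chr1", 100, "+", "A"),  # duplicate of the first read
    ("chr1", 100, "+", "A"),  # duplicate of the first read
    ("chr1", 100, "+", "A"),  # duplicate of the first read
    ("chr1", 105, "+", "G"),
    ("chr1", 110, "-", "G"),
    ("chr1", 112, "+", "A"),
    ("chr1", 120, "-", "G"),
]

def remove_duplicates(reads):
    """Keep only the first read seen for each (chrom, start, strand) key."""
    seen = set()
    kept = []
    for chrom, start, strand, base in reads:
        key = (chrom, start, strand)
        if key not in seen:
            seen.add(key)
            kept.append((chrom, start, strand, base))
    return kept

def allele_fractions(reads):
    """Return the fraction of reads supporting each base at the site."""
    counts = Counter(base for _, _, _, base in reads)
    depth = sum(counts.values())
    return {base: n / depth for base, n in counts.items()}

raw = allele_fractions(reads)
dedup = allele_fractions(remove_duplicates(reads))
print("depth before:", len(reads), "fractions:", raw)
print("depth after:", len(remove_duplicates(reads)), "fractions:", dedup)
```

In this toy data the duplicates raise the apparent depth from 5 to 8 and shift the majority allele from G to A, which is the kind of distortion that can mislead a genotype call.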
What are the 4 types of variant callers?
- Copy Number Variant (CNV) callers.
- Structural Variant (SV) callers.
- Somatic callers (somatic mutations).
- Germline callers (inherited traits).
What are the two ways in which variant calling can be performed?
- Single sample calling: Per individual.
- Joint calling: Using information from multiple samples at a time.
What is the variant caller pipeline?
Variant caller pipeline:
- Take SAM/BAM file as input.
- Mark and remove duplicates.
- Apply statistical methods to identify variants and assign genotypes.
- Produce a VCF or gVCF file as output.
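The last two steps of the pipeline above can be sketched as follows. This is only an illustration: the allele-fraction cutoffs (`low`, `high`) are arbitrary assumptions standing in for the statistical models real callers use, and the output is a simplified VCF-style line rather than a complete file. All function names are hypothetical.

```python
from collections import Counter

def call_genotype(ref_base, bases, low=0.2, high=0.8):
    """Return (alt_base, genotype) for one site using naive allele-fraction cutoffs.

    `bases` holds the base each deduplicated read shows at the site.
    """
    counts = Counter(bases)
    alt_counts = {b: n for b, n in counts.items() if b != ref_base}
    if not alt_counts:
        return None, "0/0"  # every read matches the reference
    alt_base = max(alt_counts, key=alt_counts.get)
    fraction = alt_counts[alt_base] / len(bases)
    if fraction < low:
        return None, "0/0"  # too little support, treated as homozygous reference
    if fraction > high:
        return alt_base, "1/1"  # homozygous alternate
    return alt_base, "0/1"  # heterozygous

def to_vcf_line(chrom, pos, ref_base, alt_base, genotype):
    """Format one simplified, tab-separated VCF-style data line."""
    fields = [chrom, str(pos), ".", ref_base, alt_base, ".", ".", ".", "GT", genotype]
    return "\t".join(fields)

# Hypothetical deduplicated bases observed at chr1:112, where the reference base is A.
alt, gt = call_genotype("A", ["A", "G", "G", "A", "G"])
if alt is not None:
    print(to_vcf_line("chr1", 112, "A", alt, gt))
```

Here three of five reads support G, so the site falls between the two cutoffs and is reported as heterozygous (0/1).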
What is variant annotation?
Variant annotation is the process of describing the nature and effect of the DNA alterations produced by a variant.
In variant annotation, what does nature and effect refer to?
Nature: Type of sequence alteration (indel, substitution, CNV, etc.).
Effect: How the variant changes the annotated reference sequence it occurs in.
A variant always has a single nature but can have multiple effects, because the effect depends on the context of each transcript the variant falls in.
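One way to picture the one-nature, many-effects relationship is a small record type. This is a hypothetical sketch: the `Variant` class, the transcript IDs `TX_A` and `TX_B`, and the effects assigned to them are made up for illustration, and the nature is inferred from allele lengths only.

```python
from dataclasses import dataclass, field

@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    # Maps a transcript ID to the effect the variant has in that transcript.
    effects: dict = field(default_factory=dict)

    @property
    def nature(self):
        """Single nature of the variant, inferred from the allele lengths."""
        if len(self.ref) == len(self.alt):
            return "substitution"
        return "insertion" if len(self.alt) > len(self.ref) else "deletion"

variant = Variant("chr1", 112, "A", "G")
# The same variant can fall in an exon of one transcript and an intron of another.
variant.effects["TX_A"] = "missense_variant"
variant.effects["TX_B"] = "intron_variant"

print(variant.nature)
for transcript, effect in variant.effects.items():
    print(transcript, effect)
```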
Some common variant annotation tools are SnpEff, ANNOVAR, and Variant Effect Predictor (VEP). What do they all have in common?
All variant annotation tools take a VCF file as input and annotate each variant in the file, returning an annotated VCF file with the annotations in the INFO field.
What tools are used to predict variant impact on the structure and function of human proteins?
SIFT (Sorting Intolerant From Tolerant) creates a position-specific weight matrix that corresponds to the profiles of specific protein domains. Position-specific weight matrices can also be used to identify transcription factor binding sites.
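As a rough illustration of reading annotations back out, the sketch below splits the INFO column of one VCF data line into key/value pairs. The record and the `EFFECT` key are invented for this example; each real tool writes its own key and sub-field layout (SnpEff uses `ANN` and VEP uses `CSQ`, for instance), so a real parser would follow that tool's documentation.

```python
def parse_info(vcf_line):
    """Split the INFO column (8th field) of a VCF data line into a dict."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = {}
    for entry in columns[7].split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            info[key] = value
        else:
            info[entry] = True  # flag-style entry with no value
    return info

# Hypothetical annotated record; EFFECT is a made-up key holding effect|transcript pairs.
line = "chr1\t112\t.\tA\tG\t50\tPASS\tDP=5;EFFECT=missense_variant|TX_A,intron_variant|TX_B"

info = parse_info(line)
effects = [item.split("|") for item in info["EFFECT"].split(",")]
for effect, transcript in effects:
    print(transcript, effect)
```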
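To make the idea of a position-specific weight matrix concrete, here is a small sketch that builds one from aligned DNA sites and scores candidate sequences, as in the transcription factor binding site use case. It is not SIFT's algorithm: the uniform background, the pseudocount of 1 and the example sites are all assumptions chosen for illustration.

```python
import math

def build_pwm(sequences, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length aligned sequences."""
    length = len(sequences[0])
    background = 1.0 / len(alphabet)  # assumes a uniform background frequency
    pwm = []
    for i in range(length):
        column = [seq[i] for seq in sequences]
        total = len(column) + pseudocount * len(alphabet)
        weights = {}
        for letter in alphabet:
            freq = (column.count(letter) + pseudocount) / total
            weights[letter] = math.log2(freq / background)
        pwm.append(weights)
    return pwm

def score(pwm, sequence):
    """Sum the per-position weights for a sequence of the same length as the matrix."""
    return sum(column[letter] for column, letter in zip(pwm, sequence))

# Hypothetical aligned binding sites.
sites = ["TATA", "TATA", "TACA", "TATT"]
pwm = build_pwm(sites)

print(score(pwm, "TATA"))
print(score(pwm, "GCGC"))
```

A sequence that resembles the aligned sites scores higher than one that does not, which is how such a matrix can flag likely binding sites or, for protein alignments, substitutions that are rarely tolerated at a position.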