Sequencing and resequencing HG Flashcards
1
Q
How much of the human genome encodes for proteins?
A
- about 1.4%
2
Q
Exome capturing
A
Genomic DNA -> Construct shotgun library -> Fragments -> Hybridization -> Wash + Pulldown -> Captured DNA -> DNA sequencing -> Mapping, alignment, variant calling
3
Q
Alignment of reads
A
- align reads to the whole reference genome, not just to the target regions
- otherwise reads that do not come from the target regions may be forced to align there
- reference genome + FASTQ reads -> alignment program + parameters -> resulting SAM file
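To illustrate why whole-genome alignment is useful, here is a minimal sketch that parses one SAM alignment line and checks whether the read start falls inside a target region. The target interval and the example record are made up for illustration; a real pipeline would normally use a dedicated SAM/BAM library rather than manual parsing.

```python
def parse_sam_line(line):
    """Return a few of the mandatory SAM fields from one alignment line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),  # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "seq": fields[9],
    }


def is_on_target(record, targets):
    """True if the read start lies within any (chrom, start, end) interval."""
    if record["flag"] & 4:  # flag bit 0x4 means the read is unmapped
        return False
    return any(
        chrom == record["rname"] and start <= record["pos"] <= end
        for chrom, start, end in targets
    )


# Hypothetical exon coordinates and a made-up alignment record.
targets = [("chr1", 1000, 2000)]
sam_line = "read1\t0\tchr1\t1500\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
record = parse_sam_line(sam_line)
print(record["rname"], record["pos"], is_on_target(record, targets))
```

Reads that map confidently outside every target interval can then be counted as off-target instead of being forced onto a target region.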
4
Q
Variant callers procedure
A
- SAM file too big to analyze manually
- in the lab an exome produces about 40 million reads
- conversion of the SAM file into a VCF file, where each line is a variant
- GATK is the most commonly used program for variant calling
- analyzes NGS data
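The "each line is a variant" layout of a VCF file can be sketched with a small parser. This is only an illustration that assumes tab-separated lines with the eight fixed VCF columns; the example records are invented, and it is not a substitute for a dedicated VCF library.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed columns."""
    chrom, pos, var_id, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    for item in info.split(";"):
        if item in ("", "."):  # "." marks a missing INFO field
            continue
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True  # flags carry no value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),  # several alternative alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


def read_variants(vcf_text):
    """Parse all data lines, skipping header lines that start with '#'."""
    return [
        parse_vcf_line(line)
        for line in vcf_text.splitlines()
        if line and not line.startswith("#")
    ]


# Made-up example content: one SNV and one single-base deletion.
example_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t12345\t.\tA\tG\t50.0\tPASS\tDP=30;AF=0.5\n"
    "chr2\t500\t.\tCT\tC\t.\tPASS\t.\n"
)
variants = read_variants(example_vcf)
for v in variants:
    print(v["chrom"], v["pos"], v["ref"], v["alt"], v["qual"])
```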
5
Q
Prioritization in VCF file
A
- VCF file not easy to analyse without dedicated software
- manual inspection may be feasible for a few genes but not for large numbers of variants
- variant prioritization takes into consideration many parameters
- quality of the call
- frequency of the variant in the population
- effect of variant
- biological and medical features
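A minimal sketch of how these parameters might be combined follows. The dictionary keys, the thresholds (`min_qual`, `max_pop_freq`) and the effect ranking are all hypothetical choices made for this example, not values taken from any specific tool.

```python
# Hypothetical ranking of predicted effects: lower rank = higher priority.
EFFECT_RANK = {"stop_gained": 0, "frameshift": 0, "missense": 1, "synonymous": 2}


def prioritize(variants, min_qual=30.0, max_pop_freq=0.01):
    """Drop low-quality and common variants, then sort by effect and rarity."""
    kept = [
        v for v in variants
        if v["qual"] >= min_qual and v["pop_freq"] <= max_pop_freq
    ]
    return sorted(
        kept,
        key=lambda v: (EFFECT_RANK.get(v["effect"], 3), v["pop_freq"]),
    )


# Made-up candidate variants.
candidates = [
    {"id": "v1", "qual": 50.0, "pop_freq": 0.0001, "effect": "missense"},
    {"id": "v2", "qual": 10.0, "pop_freq": 0.0001, "effect": "stop_gained"},
    {"id": "v3", "qual": 60.0, "pop_freq": 0.2, "effect": "missense"},
    {"id": "v4", "qual": 45.0, "pop_freq": 0.001, "effect": "stop_gained"},
]
ranked = prioritize(candidates)
print([v["id"] for v in ranked])
```

Here `v2` is dropped for low call quality and `v3` for being common in the population; biological and medical features would be further criteria not modelled in this sketch.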
6
Q
Phenotypes and variants
A
- goal: identify the genetic variations that produce a phenotype
- mutations or polymorphisms
- phenotype: set of observable characteristics of an individual
- variant definitions:
- mutation: frequency <1%
- polymorphism: frequency >1%
- phenotypes caused by a variation in a single gene are easier to study (basic Mendelian inheritance)
- phenotypes caused by multiple genes are more difficult to characterize
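The frequency-based definitions above can be written as a small function. The cards do not say how a frequency of exactly 1% is classified; this sketch assumes it counts as a polymorphism.

```python
def classify_variant(frequency, threshold=0.01):
    """Label a variant by its population frequency (a fraction between 0 and 1)."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    # Assumption: exactly 1% is treated as a polymorphism.
    return "polymorphism" if frequency >= threshold else "mutation"


print(classify_variant(0.002))
print(classify_variant(0.15))
```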
7
Q
Types of variants
A
- SNP (single nucleotide polymorphism) or SNV (single nucleotide variant)
- generally biallelic
- MAF (minor allele frequency): frequency of the less abundant allele
- Indels
- very short insertions and deletions (one or few bases)
- less frequent than SNP
- Microsatellites (SSR, simple sequence repeats; STR, short tandem repeats)
- short sequences, few bases
- repeated many times (e.g. 20)
- stable within same family
- Structural variations
- large insertions/deletions/inversions
- can be polymorphic
- can have detectable phenotypic effects
- CNV (copy number variations)
- large duplications
- may be related to diseases
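As a worked example of the MAF of a biallelic SNP, the sketch below computes it from genotype counts in a diploid sample. The function name and the example counts are made up for illustration.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF of a biallelic SNP from counts of AA, AB and BB genotypes."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)  # each individual carries 2 alleles
    if total_alleles == 0:
        raise ValueError("no genotypes given")
    a_count = 2 * n_aa + n_ab
    b_count = 2 * n_bb + n_ab
    return min(a_count, b_count) / total_alleles


# 100 individuals: 50 AA, 40 AB, 10 BB -> 140 A alleles and 60 B alleles.
maf = minor_allele_frequency(50, 40, 10)
print(maf)  # 0.3
```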
8
Q
Variant analysis, why?
A
- individual and population characterization
- variant discovery
- population genetics
- ancestry
- genetically driven breeding
- gene hunting
- searching for associated loci
- linkage analysis
- searching for causative genes
9
Q
Inheritance of DNA over generations
A
- 50% from mother and 50% from father
- after n generations we carry on average (1/2)^n of the DNA of each ancestor
- after 32 generations this would be roughly one base from each ancestor
- two loci 1 centimorgan apart (1% chance of separation by crossing over per generation)
- probability that the two loci are not separated in one generation: 0.99
- after n generations: 0.99^n
- in practice we inherit very large regions of the genome from only a small number of our ancestors
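The two calculations on this card can be checked numerically. This is a minimal sketch; the haploid genome size of about 3.2 billion bases is an assumption introduced for the example and does not come from the cards.

```python
GENOME_SIZE = 3.2e9  # assumed approximate haploid human genome size, in bases


def expected_fraction(n_generations):
    """Average fraction of DNA contributed by one ancestor n generations back."""
    return 0.5 ** n_generations


def prob_still_linked(n_generations, recomb_fraction=0.01):
    """Probability that two loci are never separated over n generations."""
    return (1.0 - recomb_fraction) ** n_generations


# Expected number of bases from a single ancestor 32 generations back.
bases_per_ancestor = expected_fraction(32) * GENOME_SIZE
print(round(bases_per_ancestor, 2))

# Two loci 1 cM apart usually stay together for many generations.
for n in (1, 10, 32, 100):
    print(n, round(prob_still_linked(n), 3))
```

Under these assumptions the expected contribution per ancestor after 32 generations is on the order of a single base, while two loci 1 cM apart are still more likely than not to be together.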
10
Q
Linkage disequilibrium
A
- non-random occurrence of alleles at two or more loci in the general population
- haplotypes do not occur at the frequencies expected from random assortment
- haplotype: group of alleles inherited together from a single ancestor
- measures how far variants are from being randomly assorted among individuals
- level of linkage disequilibrium is generally high
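Linkage disequilibrium between two biallelic loci is commonly quantified with the coefficient D and the squared correlation r^2. The sketch below computes both from haplotype counts; the counts are invented for illustration.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) for two biallelic loci from the four haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes given")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at locus 2
    d = p_AB - p_A * p_B         # deviation from random assortment
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r2 is undefined when a locus is monomorphic")
    return d, d * d / denom


# Alleles A and B occur together more often than expected by chance.
d, r2 = linkage_disequilibrium(40, 10, 10, 40)
print(round(d, 4), round(r2, 4))

# Random assortment: every haplotype is equally frequent, so D = 0.
print(linkage_disequilibrium(25, 25, 25, 25))
```

A value of D equal to 0 corresponds to randomly assorted alleles; the further r^2 is from 0, the stronger the disequilibrium.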
11
Q
Genome Wide Association Studies (GWAS)
A
- high level linkage disequilibrium
- genomic sequence = haplotypes + 1/2 mil SNP
- no new mutations but phenotypes linked to polymorphic variants
- microarrays very useful and cost effective
- allele count of each measured SNP is evaluated
- density usually very high some highly probable
- Manhattan plot helps identify associated risk loci
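One common way to evaluate the allele counts of a single SNP is a chi-square test on a 2x2 table of alleles in cases versus controls; the value -log10(p) is what a Manhattan plot shows on its y-axis. The sketch below is only an illustration with made-up counts, and it assumes the simple allelic test rather than any specific GWAS software.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square statistic (1 degree of freedom) and p-value for a 2x2 allele table."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts: the alternative allele is enriched in cases.
chi2, p = allelic_chi_square(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60)
print(round(chi2, 2), round(p, 5), round(-math.log10(p), 2))
```

Repeating this for every measured SNP and plotting -log10(p) against genomic position gives the Manhattan plot, where associated loci stand out as peaks.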