Sequencing and resequencing HG Flashcards

1
Q

How much of the human genome encodes for proteins?

A
  • about 1.4%
2
Q

Exome capturing

A

Genomic DNA -> Construct shotgun library -> Fragments -> Hybridization -> Wash + Pulldown -> Captured DNA -> DNA sequencing -> Mapping, alignment, variant calling

3
Q

Alignment of reads

A
  • align reads against the whole reference genome, not just the target regions
    • otherwise reads that do not come from the target regions may be forced to align there
  • Reference genome + FASTQ reads -> alignment program + parameters -> resulting SAM file
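
Once reads are aligned to the whole genome, one can check afterwards which of them fall in the target regions. The following is a minimal sketch of that check, not the method used by any particular tool. The `TARGETS` intervals and the example SAM lines are made up for illustration, and only the read's start position is tested, which is a simplification.

```python
# Hypothetical target intervals (1-based, inclusive). In practice these would
# come from the capture kit's design file; the values here are invented.
TARGETS = {"chr1": [(1000, 1200), (5000, 5300)]}

def parse_sam_line(line):
    """Return (read name, flag, reference name, 1-based position) from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), fields[2], int(fields[3])

def is_on_target(line, targets=TARGETS):
    """True if the read is mapped and its start position lies inside a target interval."""
    _, flag, rname, pos = parse_sam_line(line)
    if flag & 0x4:  # SAM flag bit 0x4 means the read is unmapped
        return False
    return any(start <= pos <= end for start, end in targets.get(rname, []))

# Three invented alignment lines: on target, off target, unmapped.
sam_lines = [
    "read1\t0\tchr1\t1050\t60\t50M\t*\t0\t0\t*\t*",
    "read2\t0\tchr2\t1050\t60\t50M\t*\t0\t0\t*\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
on_target = [is_on_target(line) for line in sam_lines]
```

Here `read2` aligns to a chromosome with no targets, so it is reported as off target instead of being forced into a target region.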
4
Q

Variant callers procedure

A
  • SAM file is too big to analyze manually
    • in the lab, one exome produces about 40 million reads
  • the SAM file is converted into a VCF file, where each line is a variant
  • GATK is the most commonly used program for variant calling
    • analyzes NGS data
5
Q

Prioritization in VCF file

A
  • VCF file is not easy to analyse without dedicated software
    • manual inspection may be fine for a few genes, but not for large numbers of variants
  • variant prioritization takes many parameters into consideration
    • quality of the call
    • frequency of the variant in the population
    • effect of the variant
    • biological and medical features
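
The first three criteria above can be sketched as a simple filter over VCF data lines. This is an illustrative sketch only, not how any dedicated prioritization software works. The INFO keys `POP_AF` and `EFFECT`, the thresholds, and the example variants are all hypothetical; real annotations depend on the annotation tool used. It also assumes QUAL is numeric and treats a missing population frequency as 0.

```python
def parse_vcf_line(line):
    """Split one VCF data line into the fields used for filtering."""
    chrom, pos, _id, ref, alt, qual, _filter, info = line.rstrip("\n").split("\t")[:8]
    # INFO is a ';'-separated list of KEY=VALUE pairs (flags without '=' are skipped here).
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "qual": float(qual),                            # assumes QUAL is not "."
        "pop_af": float(info_map.get("POP_AF", "0")),   # hypothetical key; missing treated as 0
        "effect": info_map.get("EFFECT", "unknown"),    # hypothetical key
    }

def prioritize(lines, min_qual=30.0, max_pop_af=0.01,
               effects=("missense", "stop_gained")):
    """Keep well-called, rare variants with a predicted protein effect, best quality first."""
    kept = []
    for line in lines:
        if line.startswith("#"):  # header lines
            continue
        v = parse_vcf_line(line)
        if v["qual"] >= min_qual and v["pop_af"] <= max_pop_af and v["effect"] in effects:
            kept.append(v)
    return sorted(kept, key=lambda v: -v["qual"])

# Invented example records.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1050\t.\tA\tG\t99.0\tPASS\tPOP_AF=0.0005;EFFECT=missense",
    "chr1\t2000\t.\tC\tT\t12.0\tPASS\tPOP_AF=0.0001;EFFECT=stop_gained",
    "chr2\t3000\t.\tG\tA\t80.0\tPASS\tPOP_AF=0.25;EFFECT=missense",
    "chr3\t4000\t.\tT\tC\t60.0\tPASS\tPOP_AF=0.002;EFFECT=synonymous",
]
prioritized = prioritize(vcf_lines)
```

Only the first record survives: the second has a low call quality, the third is common in the population, and the fourth has no protein-level effect.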
6
Q

Phenotypes and variants

A
  • identify the genetic variations that produce a phenotype
    • variations: mutations or polymorphisms
    • phenotype: set of observable characteristics of an individual
  • variant definitions:
    • mutation, frequency <1%
    • polymorphism, frequency >1%
  • phenotypes caused by a variation in a single gene are easier to study (basic Mendel’s law)
    • phenotypes caused by multiple genes are more difficult to characterize
7
Q

Types of variants

A
  • SNP (single nucleotide polymorphism) or SNV (single nucleotide variant)
    • generally biallelic
    • MAF (minor allele frequency): frequency of the less abundant allele
  • Indels
    • very short insertions and deletions (one or few bases)
    • less frequent than SNP
  • Microsatellites (SSR, simple sequence repeats; STR, short tandem repeats)
    • short sequences, a few bases long
    • repeated many times (e.g. 20)
    • stable within same family
  • Structural variations
    • large insertions/deletions/inversions
    • can be polymorphic
    • can have detectable phenotypic effects
  • CNV (copy number variations)
    • large duplications
    • may be related to diseases
8
Q

Variant analysis, why?

A
  • individual and population characterization
    • variant discovery
    • population genetics
    • ancestry
    • genetically driven breeding
  • gene hunting
    • searching for associated loci
    • linkage analysis
    • searching for causative genes
9
Q

Inheritance of DNA over generations

A
  • 50% from mother and 50% from father
    • after n generations we carry on average (1/2)^n of our DNA from each ancestor
    • at 32 generations, that is about one base from each ancestor
  • 2 loci at 1 cM (centimorgan; 1% chance of separation by crossing over per generation)
    • probability that the two loci are not separated in one generation: 0.99
    • after n generations: 0.99^n
    • in practice we inherit very large regions of the genome from only a small number of our ancestors
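
The arithmetic behind both points can be checked with a few lines of Python. This is a small sketch; the genome size of 3.2 billion bases is an assumed round figure, and the function names are illustrative.

```python
GENOME_SIZE_BP = 3.2e9  # assumed approximate size of the haploid human genome

def ancestor_fraction(n):
    """Average fraction of DNA inherited from one ancestor n generations back."""
    return 0.5 ** n

def expected_bases_per_ancestor(n):
    """Average number of bases contributed by one ancestor n generations back."""
    return GENOME_SIZE_BP * ancestor_fraction(n)

def prob_loci_stay_together(n, recomb_per_generation=0.01):
    """Probability that two loci 1 cM apart are never separated in n generations."""
    return (1.0 - recomb_per_generation) ** n

bases_at_32 = expected_bases_per_ancestor(32)
together_at_100 = prob_loci_stay_together(100)
```

With these assumptions, `bases_at_32` comes out just under one base per ancestor, while `together_at_100` is still roughly 0.37: tightly linked loci stay together for many generations, which is why DNA is in practice inherited in large blocks.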
10
Q

Linkage disequilibrium

A
  • non-random association of alleles at two or more loci in the general population
    • haplotypes do not occur at the frequencies expected under random assortment
  • haplotype: group of alleles inherited together from a single ancestor
  • measures how far variants are from being randomly assorted in individuals
    • the level of linkage disequilibrium is generally high
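
One common way to quantify this, not stated on the card itself, is the coefficient D = p(AB) - p(A)p(B) and its normalized form r². The sketch below computes both from a list of two-locus haplotypes. The allele labels `A`/`a` and `B`/`b` and the example populations are invented, and both loci are assumed to be polymorphic (otherwise r² is undefined).

```python
from collections import Counter

def ld_stats(haplotypes):
    """Return (D, r2) for two biallelic loci.

    haplotypes: list of 2-character strings; the first character is the allele
    at locus 1 ('A' or 'a'), the second the allele at locus 2 ('B' or 'b').
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(c for h, c in counts.items() if h[0] == "A") / n
    p_b = sum(c for h, c in counts.items() if h[1] == "B") / n
    p_ab = counts["AB"] / n
    d = p_ab - p_a * p_b  # zero when alleles are randomly assorted
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

# Random assortment: all four haplotypes equally frequent.
d_eq, r2_eq = ld_stats(["AB", "Ab", "aB", "ab"] * 25)
# Complete linkage disequilibrium: only two haplotypes ever observed.
d_ld, r2_ld = ld_stats(["AB"] * 50 + ["ab"] * 50)
```

In the second population, knowing the allele at one locus fully predicts the allele at the other, so r² reaches its maximum of 1.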
11
Q

Genome Wide Association Studies (GWAS)

A
  • relies on the high level of linkage disequilibrium
    • genomic sequence = haplotypes + about half a million SNPs
    • no new mutations, but phenotypes linked to polymorphic variants
  • microarrays very useful and cost effective
  • allele count of each measured SNP is evaluated
    • density usually very high some highly probable
  • Manhattan plot helps identify associated risk loci
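
Evaluating the allele counts of one SNP can be sketched as a 2x2 chi-square test of cases against controls, with -log10(p) being the value usually drawn on the y axis of a Manhattan plot. This is a simplified illustration, not the procedure of any specific GWAS package. The SNP names and counts are invented, and real studies add quality control and corrections that are omitted here.

```python
import math

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square statistic (1 degree of freedom) for a 2x2 table of allele counts."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
                   * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    return numerator / denominator

def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Invented allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
snp_counts = {
    "snp_1": (300, 700, 200, 800),  # allele enriched in cases
    "snp_2": (250, 750, 250, 750),  # same frequency in both groups
}
scores = {name: -math.log10(p_value_1df(allelic_chi2(*counts)))
          for name, counts in snp_counts.items()}
```

Plotting `scores` against genomic position for every measured SNP gives the Manhattan plot; here `snp_1` would stand out as a peak while `snp_2` stays at the baseline.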