Lecture 4 Flashcards

1
Q

Give the route of traditional gene discovery

A
  • Determine mode of inheritance
  • Recombination mapping using markers
  • Haplotype analysis of recombinants
  • Rapid screening of candidate genes for mutation
  • Identify mutation using Sanger sequencing
2
Q

Give the state-of-the-art route of gene discovery

A
  • Whole exome next generation sequencing
  • Identifies lots of polymorphisms
  • Filter polymorphisms to get candidate genes
  • Confirm using Sanger sequencing
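The filtering step can be sketched in Python as below. This is a hypothetical illustration, not a specific pipeline from the lecture: the `Variant` fields, the gene names, the 1% population frequency cut-off and the rule of keeping only protein-altering variants shared by every affected individual are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record for one variant called from exome data."""
    gene: str
    position: int
    population_frequency: float  # allele frequency in a reference population
    protein_altering: bool       # e.g. missense, nonsense or frameshift


def candidate_genes(affected: list[set[Variant]], max_freq: float = 0.01) -> set[str]:
    """Genes with a rare, protein-altering variant shared by every affected individual."""
    # Keep only variants present in all affected individuals
    shared = set.intersection(*affected)
    # Drop common polymorphisms and variants unlikely to change the protein
    kept = {v for v in shared if v.population_frequency < max_freq and v.protein_altering}
    return {v.gene for v in kept}


# Toy example with hypothetical genes
common = Variant("GENE_A", 100, 0.25, True)      # common polymorphism
rare = Variant("GENE_B", 200, 0.0001, True)      # rare and protein-altering
silent = Variant("GENE_C", 300, 0.0001, False)   # rare but not protein-altering
private = Variant("GENE_D", 400, 0.0001, True)   # only in one affected individual
patient_1 = {common, rare, silent, private}
patient_2 = {common, rare, silent}
print(candidate_genes([patient_1, patient_2]))  # {'GENE_B'}
```

Each filter removes one class of polymorphism, so the list of candidate genes shrinks to something small enough to confirm by Sanger sequencing.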
3
Q

Explain the human genome project

A

Launched in 1990 and completed in 2003

Generated a representative human genome of ~3,000,000,000 bases

Performed using Sanger sequencing

Covered 92% of the human genome sequence

Used enough gel to fill a lecture theatre

4
Q

What are flow cells

A

Flow cells (used in next generation sequencing) can sequence 2 human genomes in 2-3 days

5
Q

What has happened to the cost of human genomes over time?

A

Decreased from almost 100 million dollars in 2001 to under 1,000 dollars by the early 2020s

6
Q

What caused the decrease in human genome cost

A

Next generation sequencing (NGS) technology. NGS also helped fill in the 8% gap left by the Human Genome Project:

Long read NGS helps resolve duplicated and repetitive regions

The complete human genome was described in 2022 by the Telomere-to-Telomere (T2T) consortium

7
Q

What are reference genomes

A
  • Form the foundation of medical, functional and diversity studies
  • Provide a common point of reference for genomic loci:
    Give genes addresses (genomic coordinates)
    Variants are reported relative to the reference genome
  • Provide a template to guide assembly of new genomes and enable assay design/data analysis
8
Q

How can genetic variations be characterised against reference genomes?

A
  • Single nucleotide polymorphisms
  • Structural variants e.g. deletions, insertions, duplications, inversions, translocations, copy number variation
9
Q

What is the most investigated variant type in genomes?

A

Single nucleotide polymorphisms (SNPs), due to ease of analysis:

  • Single nucleotide substitutions
  • Present in >1% of the population
  • 4-5 million SNPs in every individual (roughly one every 1,000 bp)
  • Over 600 million reported
  • Single nucleotide variants (SNVs) are similar to SNPs but don’t require >1% frequency in the population
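A minimal Python sketch of the SNP/SNV distinction on this card, assuming each variant is described by its reference base, alternative base and population allele frequency. The `classify_variant` helper is hypothetical.

```python
def classify_variant(ref: str, alt: str, population_frequency: float) -> str:
    """Label a variant using the definitions on this card (hypothetical helper)."""
    if len(ref) != 1 or len(alt) != 1:
        return "not a single nucleotide substitution"
    # SNP: single nucleotide substitution present in >1% of the population
    if population_frequency > 0.01:
        return "SNP"
    # Otherwise still a single nucleotide variant (SNV), just not common enough to be a SNP
    return "SNV"


print(classify_variant("A", "G", 0.20))   # SNP
print(classify_variant("C", "T", 0.001))  # SNV
print(classify_variant("AT", "A", 0.30))  # not a single nucleotide substitution
```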
10
Q

What factors need to be considered when exploring genetic variations

A

Cost - experimental, analysis, other

Time - Sample prep, run time, analysis time, sample transport

Information capture - Accuracy, feature length, complex variant detection

Appropriate tools should be selected

11
Q

What can be used as an input sample?

A

DNA (easy to manipulate)

RNA - can be useful for characterising disease subtypes, as RNA shows which genes cells are actively using

Selected DNA/RNA

Protein - not typically used as it can’t be easily manipulated

12
Q

How much information is required

A

3 billion bases in human genome

BRCA1/BRCA2:
110kb/85kb in length (introns and exons) = ~0.006% of genome
7.8kb/10.2kb in length (exons only) = ~0.0006% of genome

Is all genome info required?
- Microarray
- Enrichment/amplicon -> gene panel sequencing
- Enrichment/amplicon -> exome sequencing
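The BRCA percentages come from dividing region length by genome size. A small Python check, assuming a genome size of exactly 3 billion bases and treating the slash-separated lengths as BRCA1 and BRCA2 respectively:

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 billion bases


def percent_of_genome(length_bp: int) -> float:
    """Percentage of the genome taken up by a region of the given length."""
    return 100 * length_bp / GENOME_SIZE_BP


whole_genes = 110_000 + 85_000  # BRCA1 + BRCA2, introns and exons
exons_only = 7_800 + 10_200     # BRCA1 + BRCA2, exons only

print(f"{percent_of_genome(whole_genes):.4f}%")  # 0.0065%
print(f"{percent_of_genome(exons_only):.4f}%")   # 0.0006%
```

Such a tiny fraction of the genome is why targeted approaches (gene panels, exome sequencing) are attractive when only a few genes matter.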

13
Q

Examples of target enrichment and amplicon methods

A

Hybridisation capture:
- Genomic DNA fragmented
- Adapters ligated to DNA fragments to form a library
- Washing and elution of DNA

Amplification:
1. Library hybridisation
2. PCR amplification
3. Washing and elution

Amplicon sequencing:
- Genomic DNA undergoes multiplex PCR with specific primers
- Adaptors are then ligated to form a barcoded library
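Barcodes let many samples be pooled in one run; afterwards, reads are sorted back to their sample ("demultiplexed"). A minimal Python sketch, assuming each barcode sits at the very start of the read and must match exactly. Real sequencers often read barcodes separately and tolerate mismatches, and the barcode sequences and sample names here are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads: list[str], barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Assign reads to samples by an exact-match barcode at the start of each read.

    `barcodes` maps barcode sequence -> sample name (hypothetical layout).
    """
    by_sample: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Strip the barcode so only the amplicon sequence remains
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read
            by_sample["unassigned"].append(read)
    return dict(by_sample)


barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}
reads = ["ACGTGGATCC", "TGCAGGATCA", "CCCCGGATCC"]
print(demultiplex(reads, barcodes))
# {'patient_1': ['GGATCC'], 'patient_2': ['GGATCA'], 'unassigned': ['CCCCGGATCC']}
```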
14
Q

Explain in more detail amplicon sequencing

A

PCR primers designed to target the gene of interest

Region amplified using PCR

Amplified regions sequenced
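Which region a primer pair amplifies can be checked "in silico" before any lab work. A minimal Python sketch, assuming exact primer matches on a single template strand. The template and primer sequences are made up for illustration, and real primer-checking tools also allow mismatches.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the amplicon a primer pair would produce from the template, or None.

    The forward primer matches the top strand directly; the reverse primer
    binds the bottom strand, so its reverse complement is searched for on the
    top strand downstream of the forward primer. Exact matches only.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


template = "GGGGATGCGTACCGGAAATTTGCCAAGCCCC"
print(in_silico_pcr(template, "ATGCGT", "CTTGGC"))  # ATGCGTACCGGAAATTTGCCAAG
```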

15
Q

Explain target enrichment methods

A
  • Allow more targets to be enriched at once than amplicon sequencing

Exons account for 2% of the genome but 85% of known disease variants. Exome sequencing is therefore cheaper than whole genome sequencing, as it only sequences exons while still capturing most known disease variants
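The saving can be estimated as target size times sequencing depth. A rough Python sketch: the 30x whole-genome and 100x exome depths are assumptions chosen only for illustration, not values from the lecture.

```python
GENOME_SIZE_BP = 3_000_000_000
EXOME_FRACTION = 0.02  # exons are ~2% of the genome


def bases_to_sequence(target_bp: float, mean_depth: float) -> float:
    """Total sequenced bases needed to cover a target region at a given mean depth."""
    return target_bp * mean_depth


# Assumed depths, for illustration only
wgs_bases = bases_to_sequence(GENOME_SIZE_BP, 30)
exome_bases = bases_to_sequence(GENOME_SIZE_BP * EXOME_FRACTION, 100)
print(f"Whole genome: {wgs_bases / 1e9:.0f} Gb, exome: {exome_bases / 1e9:.0f} Gb")
# Whole genome: 90 Gb, exome: 6 Gb
```

Even when the exome is sequenced more deeply, far fewer bases are needed in total.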

16
Q

Selected inputs and sequencing

A

DNA -> Whole genome sequencing

DNA -> PCR -> Amplicon sequencing

DNA -> hybridisation capture -> target enrichment sequencing

17
Q

Read, map, and depth/coverage

A

Read - sequence of a DNA fragment

Map - determining where reads originated in the genome

Depth/coverage - depth is the number of times sequencing reads cover a region of the genome
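Depth can be computed by counting, for every reference position, how many mapped reads overlap it. A minimal Python sketch, assuming reads are already mapped and given as hypothetical (0-based start, length) pairs on a toy 10 bp reference:

```python
def depth_per_position(genome_length: int, reads: list[tuple[int, int]]) -> list[int]:
    """Count how many mapped reads cover each reference position.

    `reads` holds (start, length) pairs with 0-based start positions.
    """
    depth = [0] * genome_length
    for start, length in reads:
        for pos in range(start, min(start + length, genome_length)):
            depth[pos] += 1
    return depth


mapped = [(0, 4), (2, 4), (3, 5)]
depth = depth_per_position(10, mapped)
print(depth)                     # [1, 1, 2, 3, 2, 2, 1, 1, 0, 0]
print(sum(depth) / len(depth))   # 1.3 (mean depth)
```

Positions 8 and 9 have zero depth, so any variant there would be missed.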

18
Q

How are sequences of DNA fragments mapped onto reference genome

A

DNA fragmented by chemical, physical or enzymatic means

Individual fragments are sequenced

Map reads to reference genome to identify variants
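A brute-force Python sketch of mapping a read and reading off variants, assuming reads come from the same strand as the reference and contain no insertions or deletions. Real aligners such as BWA or Bowtie2 use genome indexes rather than trying every offset, and the sequences here are toy examples.

```python
def map_read(reference: str, read: str) -> tuple[int, int]:
    """Return (best_start, mismatches) by trying every offset on the reference."""
    best_start, best_mismatches = -1, len(read) + 1
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(window, read) if a != b)
        if mismatches < best_mismatches:
            best_start, best_mismatches = start, mismatches
    return best_start, best_mismatches


def call_variants(reference: str, read: str, start: int) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, read base) wherever the read differs."""
    return [
        (start + i + 1, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(reference[start:], read))
        if ref_base != read_base
    ]


reference = "ACGTACGTTAGCCGATAGGC"
read = "TAGCTGATA"
start, mismatches = map_read(reference, read)
print(start, mismatches)                      # 8 1
print(call_variants(reference, read, start))  # [(13, 'C', 'T')]
```

The variant is reported relative to the reference genome: position 13 carries C in the reference and T in the read.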

19
Q

What are the three different sequencing methods?

A

Sanger sequencing: Validation and disease diagnosis of known genomic regions
800-1000bp read length
Fast, cost-effective, high accuracy
Low sensitivity, more sample required, and might miss new variants

Short read NGS: genetic contribution to diseases, GWAS, gene panel testing
20-500bp
High sensitivity, low sample input, can discover new variants
Less cost effective and poor resolution for repetitive sequences

Long read NGS: identify difficult to detect de novo mutations
10,000-100,000bp
Low sample input and resolves structural variants
High error rates and more expensive
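Why short reads struggle with repeats, and long reads help, can be shown with exact matching. A toy Python sketch, assuming a hypothetical 7 bp element repeated twice in a small reference (real repeats are far longer than short reads):

```python
def count_placements(reference: str, read: str) -> int:
    """Number of positions where the read matches the reference exactly."""
    return sum(
        1
        for start in range(len(reference) - len(read) + 1)
        if reference[start:start + len(read)] == read
    )


repeat = "GATCCAT"                                   # toy repeated element
reference = "TTTT" + repeat + "AAAA" + repeat + "CCCC"
short_read = repeat                                  # lies entirely inside a repeat copy
long_read = "TTTT" + repeat + "AAAA"                 # spans the repeat plus unique flanks

print(count_placements(reference, short_read))  # 2 (ambiguous origin)
print(count_placements(reference, long_read))   # 1 (unique origin)
```

A read that fits inside a repeat matches every copy, so its origin is ambiguous; a read long enough to reach unique flanking sequence maps to one place.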

20
Q

Explain microarrays

A

Hybridisation approach (not sequencing)

Array of spots, each containing a short DNA sequence complementary to the sequence of interest:
- Array-based comparative genomic hybridisation (aCGH) - detects copy number variation
- Single-nucleotide polymorphism (SNP) array for GWAS
- Transcriptomics

Easier to analyse than sequencing

21
Q

Process of microarray example with tumour and normal RNA

A

Normal and tumour RNA are reverse transcribed to cDNA and labelled with different fluorescent dyes

Combine equal amounts of labelled cDNA and hybridise to the microarray

Scan
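After scanning, each spot gives one intensity per dye, and the tumour/normal ratio is usually compared on a log2 scale. A minimal Python sketch, assuming intensities are already normalised and using a hypothetical threshold of 1 (a two-fold change). The gene names and intensity values are made up.

```python
import math


def classify_spots(intensities: dict[str, tuple[float, float]],
                   threshold: float = 1.0) -> dict[str, str]:
    """Classify each spot from (tumour, normal) fluorescence intensities.

    Uses log2(tumour / normal): above +threshold means higher in tumour,
    below -threshold means lower in tumour, otherwise unchanged.
    """
    calls = {}
    for spot, (tumour, normal) in intensities.items():
        log_ratio = math.log2(tumour / normal)
        if log_ratio > threshold:
            calls[spot] = "higher in tumour"
        elif log_ratio < -threshold:
            calls[spot] = "lower in tumour"
        else:
            calls[spot] = "unchanged"
    return calls


spots = {"gene_A": (800.0, 200.0), "gene_B": (150.0, 600.0), "gene_C": (500.0, 450.0)}
print(classify_spots(spots))
# {'gene_A': 'higher in tumour', 'gene_B': 'lower in tumour', 'gene_C': 'unchanged'}
```

The same log-ratio idea applies to aCGH, where a raised or lowered ratio suggests a copy number gain or loss.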

22
Q

What might genetic tests be comparing

A

Cancer vs non-cancer samples

de novo mutations in children by testing mothers, fathers and children

Common genetic traits
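For trio testing, a site is a candidate de novo mutation when the child's genotype cannot be built from one maternal and one paternal allele. A minimal Python sketch, assuming genotypes are given as pairs of alleles at hypothetical sites; real pipelines also weigh read depth and genotype quality before calling a de novo mutation.

```python
def is_candidate_de_novo(child: tuple[str, str],
                         mother: tuple[str, str],
                         father: tuple[str, str]) -> bool:
    """True if the child's genotype can't be built from one maternal and one paternal allele."""
    a, b = child
    inherited = (a in mother and b in father) or (b in mother and a in father)
    return not inherited


# Genotypes at three hypothetical sites: (child, mother, father)
sites = {
    "site_1": (("A", "G"), ("A", "A"), ("A", "A")),  # G seen in neither parent
    "site_2": (("A", "G"), ("A", "G"), ("A", "A")),  # G inherited from mother
    "site_3": (("G", "G"), ("A", "G"), ("A", "G")),  # one G from each parent
}
for name, (child, mother, father) in sites.items():
    print(name, is_candidate_de_novo(child, mother, father))
# site_1 True
# site_2 False
# site_3 False
```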

23
Q

The HapMap project in 2007

A

269 individuals from a geographically diverse cohort, e.g. Japanese, Han Chinese, European, African

Described chromosome regions with sets of strongly associated SNPs
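How strongly two SNPs are associated (linkage disequilibrium) is commonly measured with r², which ranges from 0 (independent) to 1 (always inherited together). The card does not name a measure, so r² is an assumption here. A minimal Python sketch from phased two-SNP haplotypes, using toy data:

```python
def ld_r_squared(haplotypes: list[tuple[str, str]], allele_1: str, allele_2: str) -> float:
    """r^2 between allele_1 at SNP 1 and allele_2 at SNP 2 from phased two-SNP haplotypes."""
    n = len(haplotypes)
    p1 = sum(h[0] == allele_1 for h in haplotypes) / n
    p2 = sum(h[1] == allele_2 for h in haplotypes) / n
    p12 = sum(h[0] == allele_1 and h[1] == allele_2 for h in haplotypes) / n
    d = p12 - p1 * p2  # linkage disequilibrium coefficient D
    denominator = p1 * (1 - p1) * p2 * (1 - p2)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denominator


linked = [("A", "C"), ("A", "C"), ("G", "T"), ("G", "T")]
unlinked = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")]
print(ld_r_squared(linked, "A", "C"))    # 1.0
print(ld_r_squared(unlinked, "A", "C"))  # 0.0
```

Strongly associated SNPs mean that genotyping one SNP in a region predicts its neighbours, which is what makes SNP arrays practical for GWAS.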

24
Q

1000 Genomes Project

A

Whole genomes sequenced in ~2,500 people

100,000 Genomes Project completed in 2018

UK Biobank - genomic and clinical information

25
Q

23andMe

A

Microarray based (600,000 SNPs)

  • 12 million people have paid for the service to get information on their genetics
  • A few FDA-approved clinically relevant genetic reports
26
Q

Genotype-Tissue Expression (GTEx) project

A
  • Genomic and transcriptomic data from 54 tissue types
  • 948 deceased donors
27
Q

The Cancer Genome Atlas (TCGA) project

A

Genomic, epigenomic, transcriptomic, proteomic data from 33 cancer types

11,000 cancer patients

28
Q

What is a major issue regarding data collection in variant discovery?

A

Ethnicity:
Reference genome is of European ancestry
HapMap, 1000 Genomes - efforts to characterise variation across different ethnicities
T2T - constructed a Chinese genome - highlighted unique genes and exclusive sequences/variants compared to the European genome

Poorer ability to identify contributing genetic variants in non-European populations

  • Participation bias
29
Q

Pangenome reference

A

Improved variant detection using genomes from multiple ethnicities

First draft in May 2023
Reduces small variant discovery errors by 34%
Increases structural variants detected per haplotype by 104%