Week 4 Flashcards

1
Q

Are homozygotes informative when it comes to linkage analysis?

A

No. When a marker is homozygous, both marker alleles are identical, so we cannot tell which one is inherited together with the disease allele.

2
Q

What is the smallest change, in Mb or bp, that cytogenetic tests can detect?

A

5 Mb (5,000,000 bp)

3
Q

Define repetitive sequence

A

DNA fragments that are present in multiple copies in the genome.

4
Q

Types of variable number tandem repeats

A
  1. Minisatellites (repeat unit is 7-49 bases)
  2. Microsatellites (repeat unit is 2-6 bases)
5
Q

Characterise Minisatellites

A
  1. Hypermutable (very unstable)
  2. Encourage crossing over
  3. 90% are found in subtelomeric regions
6
Q

Characterise Microsatellites (short tandem repeats)

A
  1. Found in coding and non-coding regions of the genome.
  2. Highly polymorphic and extremely useful
  3. STRs are used in forensics, paternity testing, ancestry testing and diagnostics
7
Q

Size of copy number variants

A

The repeat unit may range from 50-1000 bases up to several megabases in size.

8
Q

Can copy number variants cause disease?

A
  1. Most are benign
  2. But CNVs in developmental genes can cause nervous system diseases (including Parkinson’s, autism and Alzheimer’s)
  3. Also very common in cancer cells
9
Q

Where are most repetitive regions found in the genome?

A

Repetitive DNA elements are often associated with heterochromatin.

10
Q

Heterochromatin dysfunction leads to ….

A

Heterochromatin dysfunction leads to genomic dysregulation by inducing aberrant repeat repair, chromosome segregation errors, transposon activation and replication stress.

11
Q

Which variants are harder to find with current technology?

A
  1. Intermediate-size structural variants (<2000 bp)
  2. Inversions
  3. Regions with DNA composition that is GC- or AT-rich
12
Q

What are recurrent CNVs?

A
  1. Similar size and recurrent breakpoints in segmental duplications.
  2. Enhances population diversity.
13
Q

What are non-recurrent CNVs?

A
  1. Random breakpoints scattered across genomic regions
  2. Usually more severe phenotypic consequences
  3. Dependent on size and location
14
Q

How to detect CNVs

A
  1. Karyotyping
  2. Fluorescence in situ hybridisation (FISH)
  3. MLPA
  4. Microarray
  5. Next generation sequencing
15
Q

Disadvantages of detecting CNVs through Karyotyping and FISH

A
  1. Large CNVs only
  2. Not able to detect small intragenic rearrangements
  3. Time consuming
  4. Low throughput
16
Q

Disadvantages of detecting CNVs through MLPA

A
  1. Limited number of loci
  2. Only known gene targets can be assessed = no discovery
  3. No breakpoint detection
17
Q

Types of NGS methods to analyse CNVs

A
  1. Paired-end mapping
  2. Split read
  3. Read depth
  4. Assembly based
  5. Combination approach
18
Q

Databases to visualise/analyse CNVs

A
  1. Database of Genomic Variants (DGV)
  2. gnomAD-SV
  3. DECIPHER
19
Q

The purpose of the human genome project

A

An international research project to map each human gene and to completely sequence the entire human DNA complement.

20
Q

Aims of the Human Genome Project (7)

A
  1. Determine the DNA sequence of the human genome
  2. Develop improved sequencing technologies
  3. Sequence model organisms
  4. Store information in a useful way
  5. Develop better tools for analysis
  6. Identify all genes and their function
  7. Consider ethical, legal and social implications
21
Q

What two approaches did the human genome project take?

A
  1. Segment assembly approach: aligning and merging fragments that have been obtained from a longer DNA sequence to try to reconstruct the original sequence.
  2. Whole genome shotgun sequencing: sequencing many overlapping DNA fragments in parallel and using a computer to assemble the small fragments into larger contigs and eventually chromosomes.
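
As a toy illustration of the shotgun idea in this answer, the sketch below greedily merges the two fragments with the longest suffix-to-prefix overlap until no overlaps remain. The function names (`overlap`, `greedy_assemble`), the `min_len` parameter and the example fragments are hypothetical. Real assemblers also deal with sequencing errors, repeats and both DNA strands, which this sketch ignores.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap; return the contigs."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

For example, `greedy_assemble(["ATGCGT", "CGTACC", "ACCTGA"])` joins the three overlapping fragments into the single contig `ATGCGTACCTGA`.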
22
Q

Ethical and legal considerations of sequencing someone’s genome.

A
  1. Fairness and privacy: who should have access to your genetic information?
  2. Psychological effect: how does knowing your predisposition to disease affect you as an individual?
  3. Genetic testing and genetic screening: issues around the commercialisation of data.
  4. Reproductive implications: The use of genetic information in decision making.
23
Q

What was the purpose of sequencing the genomes of smaller organisms in the Human Genome Project?

A
  1. Foster cooperation
  2. Smaller genomes serve as tests for developing sequencing methodologies.
  3. Serve as comparative genomes
  4. Develop mathematical, statistical and computational tools.
24
Q

What are primary sequence databases and secondary annotation databases?

A
  1. Primary sequence databases: databases that store genomic sequence data.
  2. Secondary annotation databases: either use algorithms to predict and store annotation, or just store the annotation provided for the raw sequence.
25
Q

Why would you need to access sequence data?

A
  1. Know what the sequence of a gene is
  2. Identify variants in the sequence
  3. Compare your sequence to others
  4. Identify similar sequences
  5. Find diseases associated with variation in your gene of interest.
26
Q

What is PubMed?

A

Extensive biomedical literature database.

27
Q

What is RefSeq?

A

Comprehensive, integrated, well-annotated set of reference sequences: genomic, transcript and protein.

28
Q

What is OMIM?

A

Online Mendelian Inheritance in Man: a database of human genes and genetic phenotypes.

29
Q

What is ClinVar?

A

Database of genomic variation and the relationship to human health.

30
Q

What is Ensembl?

A

Resource for high quality integrated annotation data

31
Q

What is UniProt?

A

Universal protein resource for protein sequence and functional annotation data

32
Q

What is PDBe?

A

Protein Data Bank in Europe: a collection of 3D structural data.

33
Q

What is InterPro?

A

Database of protein families, domains and conserved sites.

34
Q

The aim of gnomAD

A

Enable researchers to better understand the role of genomic DNA variation in both health and disease states.

35
Q

What is ExAC?

A

The Exome Aggregation Consortium: aggregates and harmonises exome sequencing data from a wide variety of large-scale sequencing projects.

36
Q

How to interpret z-scores on gnomAD

A
  1. Positive z-score: fewer variants observed than expected; the gene is highly constrained and intolerant to variation.
  2. Negative z-score: more variants observed than expected; the gene is tolerant to variation.
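
The sign convention can be sketched in a few lines. The formula used here, (expected - observed) / sqrt(expected), is a simplifying assumption for illustration only and is not gnomAD's exact model. The function names and the counts in the usage note are hypothetical.

```python
import math


def constraint_z(observed: int, expected: float) -> float:
    """Signed deviation of observed from expected variant counts (illustrative approximation)."""
    return (expected - observed) / math.sqrt(expected)


def interpret_z(z: float) -> str:
    """Map the sign of the z-score to the interpretation given on the card."""
    if z > 0:
        return "fewer variants than expected: constrained, intolerant to variation"
    if z < 0:
        return "more variants than expected: tolerant to variation"
    return "observed matches expected"
```

With 50 variants observed where 100 were expected, `constraint_z(50, 100.0)` is positive, so `interpret_z` reports a constrained gene.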
37
Q

How to interpret pLI values on gnomAD

A

pLI score close to 0: LoF is tolerated by natural selection.
pLI score close to 1: LoF is not tolerated (haploinsufficient genes).
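
A small sketch of how this reading might be applied in code. The cut-offs of 0.9 and 0.1 are a commonly used convention rather than something stated on the card, so treat them as assumptions; the function name `interpret_pli` is hypothetical.

```python
def interpret_pli(pli: float, high: float = 0.9, low: float = 0.1) -> str:
    """Classify a pLI value; the high/low cut-offs are assumed conventions."""
    if not 0.0 <= pli <= 1.0:
        raise ValueError("pLI is a probability and must lie between 0 and 1")
    if pli >= high:
        return "LoF not tolerated (likely haploinsufficient)"
    if pli <= low:
        return "LoF tolerated"
    return "intermediate"
```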

38
Q

What clinical indications result in array-CGH testing? (4)

A
  1. Multiple congenital malformations
  2. Single congenital malformation with a number of dysmorphic features.
  3. Intellectual disability/developmental delay of unexplained aetiology or in association with a congenital anomaly/dysmorphic features.
  4. Significant growth disturbances
39
Q

What is the value of molecular diagnostic testing in the case of monogenic disorders?

A
  1. Screening/surveillance: better outcome/prognosis. The patient and the patient’s family can be better prepared.
  2. Clarify recurrence risk: test at-risk family members and offer prenatal testing
  3. Social support: access to grants, support services and resources.
40
Q

Advantages of Sanger sequencing

A
  1. Cost-effective for few genomic targets (<20 per sample)
  2. Fast and reliable, with a low error rate; used to validate NGS findings.
  3. Targeted quick analysis
41
Q

Disadvantages of Sanger sequencing

A
  1. Can only sequence one gene region at a time, or hot-spot regions.
  2. Less cost-effective for high number of regions.
42
Q

Advantages of NGS.

A
  1. Multiple samples and many genomic regions can be sequenced at once.
  2. Higher discovery and variant resolution
  3. More data with less DNA/RNA input.
43
Q

What does library prep entail when doing NGS?

A
  1. Fragmentation: physical shearing, enzyme digestion or PCR-based amplification.
  2. Ligate fragments to adapter sequences.
  3. Adapter sequences have unique barcodes that are used to tag each sample.
  4. Barcoding is important for pooling of libraries.
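
To show why barcodes matter for pooling, here is a minimal demultiplexing sketch that sorts pooled reads back into samples. It assumes, for simplicity, that each read begins with its sample barcode; the barcodes, sample names and the function `demultiplex` are all hypothetical, and real pipelines typically handle barcode mismatches as well.

```python
from collections import defaultdict


def demultiplex(reads: list[str], barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Assign each read to a sample by its leading barcode and strip the barcode."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read
            by_sample["unassigned"].append(read)
    return dict(by_sample)
```

Calling `demultiplex(["ACGTGGGCCC", "TTAGAAATTT"], {"ACGT": "sample_A", "TTAG": "sample_B"})` places the first read under `sample_A` and the second under `sample_B`, each with its barcode removed.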
44
Q

Types of sequencing reads (NGS)

A
  1. Single-end reads: 5’ or 3’ (random)
  2. Paired-end reads: 5’ and 3’
  3. Mate-pair reads: 5’ and 3’
45
Q

Characterise targeted panel sequencing

A
  1. Categorical genetic disorder
  2. Up to thousands of genes
  3. High coverage and depth
  4. Lowest cost
  5. Highest accuracy amongst all the NGS categories.
46
Q

Characterise whole exome sequencing

A
  1. Whole exome
  2. Intermediate coverage and depth
  3. Good accuracy
47
Q

Characterise whole genome sequencing

A
  1. All genes and non-coding DNA
  2. Lower coverage
  3. Highest cost
  4. Lower accuracy
48
Q

Characterise short read sequencing

A
  1. 100-300bp fragments
  2. Sequencing by synthesis or ligation
  3. DNA polymerase or ligase enzymes extend numerous DNA strands in parallel.
  4. Short reads/fragments are assembled into contiguous sequence, then aligned to the reference.
  5. Most labs use short-read sequencing for SNV calling
  6. Not ideal for complex and repetitive areas of the genome.
49
Q

Characterise long read sequencing

A
  1. 5000-30000 base pairs in a single read.
  2. Sequence directly from DNA/RNA
  3. Sequence error rate is higher than short reads: variant calling unreliable
  4. Aligning and processing long read sequence data takes longer
  5. Most labs use LRS for CNVs (structural variants and large indels)
50
Q

What is the basic premise of ClinVar?

A

Processes submissions reporting variants found in patient samples, together with assertions made regarding their clinical significance. Data are mapped to reference sequences and reported according to the HGVS standard.

51
Q

Major goals of ClinVar

A
  1. To support computational re-evaluation, both of genotype and assertion.
  2. To enable the ongoing evolution and development of knowledge.
52
Q

What is ClinGen?

A

Central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.

53
Q

What is HGMD and its use?

A
  1. The Human Gene Mutation Database: an up-to-date and comprehensive collection of known and published pathogenic gene variants.
54
Q

Factors to consider for variant analysis (6)

A
  1. Design/platform, such as panel, WES or WGS, depending on intended use
  2. Disease inheritance, such as mode of inheritance: use OMIM, GeneReviews, VarSome, HGMD
  3. Functional effect: is the variant in a functional domain or hotspot region, and will it have a consequence? Use UniProt, VarSome and many others
  4. Population allele frequency: use gnomAD or 1000 Genomes
  5. Variant quality: use IGV
  6. Clinical relevance: de novo? Gene/allelic heterogeneity
55
Q

Targeted gene panels are useful if:

A
  1. Phenotype relatively distinct
  2. Multiple genes known to cause similar phenotype.
56
Q

When should whole exome sequencing be considered?

A
  1. Poorly defined phenotype
  2. Suspected new syndrome
57
Q

When should whole genome sequencing be considered?

A
  1. May detect deep intronic mutations
  2. May detect breakpoints
  3. May detect structural rearrangements
58
Q

What decisions need to be made when doing a genetic test?

A
  1. Ethnicity: high-risk ancestry group
  2. Who to test: the closer the relative, the better
  3. Family history: any known mutations?
  4. Limitations of genetic testing approaches
59
Q

Challenges and limitations of NGS testing (6)

A
  1. Variants of uncertain significance
  2. Incidental findings
  3. Less cost-effective when sequencing only a few targets
  4. Longer and more complex analysis
  5. Amplification bias and sequencing errors
  6. Missing heritability
60
Q

What can I do with Ensembl? (7)

A
  1. View genes with other annotation along the chromosome.
  2. View alternative transcripts for a given gene.
  3. Examine single nucleotide polymorphisms (SNPs) for a gene or chromosomal region.
  4. Upload your own data
  5. Use BLAST or BLAT against any Ensembl genome
  6. Export sequence or create a table of gene information with BioMart.
  7. Variant Effect Predictor (VEP): predicts the effect of a variant on a gene.
61
Q

Things that affect genomic architecture (8)

A
  1. Linkage disequilibrium
  2. Age of population
  3. Effective population size
  4. Admixture
  5. Selection
  6. Autozygosity
  7. Cultural norms and practices
  8. Population size
62
Q

Whole genome sequencing projects on Africans in Africa.

A
  1. 1000 Genomes Project (high coverage)
  2. African Genome Variation Project (low coverage on Yoruba, Baganda, Ethiopia, Luhya and Zulu)
  3. Southern African Human Genome Programme (>30x, on Sotho, Zulu, Xhosa, Coloured)
  4. Uganda GPC (low coverage on the general population of Uganda)
  5. H3Africa (>30x on 50 ethnolinguistic groups)
63
Q

Knowledge gaps in population genomics in Africa (6)

A
  1. Limited data on hunter-gatherer populations
  2. Many ethnic groups not yet included in genomic studies.
  3. Ancient genomes
  4. Functional interpretation of variants
  5. Phenotype to genotype links poorly understood
  6. Modest sample sizes
64
Q

Reasons to study African genomes

A
  1. Detect novel variation of potential functional impact
  2. Develop African-appropriate research tools
  3. Explore historical events
  4. Understand the molecular and biochemical basis of disease on the continent
  5. Ensure that personalised medicine has a role in Africa.
65
Q

What is the aim of the HapMap project?

A

To develop a haplotype map of the human genome, which will describe the common patterns of human genetic variation.

66
Q

What is the premise of the HapMap project?

A

The human HapMap is built on SNPs distributed approximately every 1000 base pairs throughout the genome. Analysis of the SNPs revealed regions that exhibit no recombination within one of the four test populations, flanked by short regions of high recombination frequency. This suggests that identifying only a few SNPs in each recombination-free region will be sufficient to predict the remaining SNP alleles in the same region.
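
The prediction step can be sketched with a toy block. Given the distinct haplotypes seen in one recombination-free region, the code finds positions whose allele alone identifies the haplotype (tag SNPs) and then uses one of them to recover the rest. The haplotype strings and function names are hypothetical. The example uses only two haplotypes, so a single SNP is enough; blocks with more haplotypes need a few tag SNPs in combination.

```python
def tag_positions(haplotypes: list[str]) -> list[int]:
    """Positions whose allele alone distinguishes every haplotype in the block."""
    n_sites = len(haplotypes[0])
    return [
        pos for pos in range(n_sites)
        if len({hap[pos] for hap in haplotypes}) == len(haplotypes)
    ]


def predict_haplotype(haplotypes: list[str], pos: int, allele: str) -> str:
    """Return the full haplotype implied by the allele seen at a tag position."""
    matches = [hap for hap in haplotypes if hap[pos] == allele]
    if len(matches) != 1:
        raise ValueError("allele at this position does not identify a single haplotype")
    return matches[0]
```

For the block `["ACGT", "GCTA"]`, positions 0, 2 and 3 are tag SNPs, and observing `G` at position 0 implies the whole haplotype `GCTA`.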

67
Q

What are some interesting findings of the HapMap project?

A
  1. Similarity of allele frequencies in Chinese and Japanese samples.
  2. Identification of recombination hotspots.
  3. Haplotype sizes vary across populations due to migration throughout history.
  4. LD correlates with genomic features.
68
Q

The aims of the 1000 genomes project

A
  1. Discover population level human genetic variations of all types
  2. Define haplotype structure and structural variation in the human genome.
  3. Develop sequence analysis methods, tools, and other reagents that can be transferred to other sequencing projects.
69
Q

Interesting outcomes of the 1000 genomes project.

A
  1. Confirmed that non-African diversity is largely a subset of African diversity.
  2. African samples provided a more complete discovery resource for variant sites in non-Africans than the converse.
  3. Newly discovered SNPs are mostly at low frequency and enriched for functional variants.
70
Q

How does a familial bell curve look compared to that of the general population?

A

The threshold remains the same but the curve moves to the right (mean liability increases). Recurrence risk therefore increases, because families share genes and environment.
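
A short numerical sketch of this shift, assuming liability is normally distributed with a standard deviation of 1. The means and threshold in the usage note are made-up illustration values, and `fraction_above_threshold` is a hypothetical helper name.

```python
import math


def fraction_above_threshold(mean: float, threshold: float, sd: float = 1.0) -> float:
    """Proportion of a normal liability distribution lying above the threshold."""
    z = (threshold - mean) / sd
    # Upper-tail probability of the standard normal distribution
    return 0.5 * math.erfc(z / math.sqrt(2))
```

With the threshold fixed at 2, a general population with mean liability 0 has about 2.3% of individuals above it, while a family with mean liability 1 has about 15.9%.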

71
Q

Explain the principle of twin studies.

A
  1. Compare the determinants of phenotype for many diseases in monozygotic (MZ) vs dizygotic (DZ) twins.
  2. MZ twins share 100% of genes + shared environment: therefore all differences between MZ twins are assumed to be due to unshared environmental factors.
  3. DZ twins share 50% of genes + shared environment: if co-occurrence of a condition in both twins is more common in MZ than DZ twins, this is assumed to reflect genetic differences.
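
The MZ versus DZ comparison can be sketched as a concordance calculation. Pairwise concordance is used here as one simple measure, and the pair counts in the usage note are invented for illustration; the function names are hypothetical.

```python
def pairwise_concordance(both_affected: int, one_affected: int) -> float:
    """Share of twin pairs with at least one affected twin in which both are affected."""
    total = both_affected + one_affected
    if total == 0:
        raise ValueError("need at least one pair with an affected twin")
    return both_affected / total


def suggests_genetic_contribution(mz: float, dz: float) -> bool:
    """Higher concordance in MZ than DZ pairs is read as evidence of a genetic contribution."""
    return mz > dz
```

If 40 of 100 MZ pairs and 10 of 100 DZ pairs are concordant, the concordance rates are 0.4 and 0.1, and the comparison points to a genetic contribution.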
72
Q

What is the common disease, common variant hypothesis?

A

Holds that the genetic component of most common non-communicable disorders is due to the combined effect of a relatively large number of disease causing alleles that occur relatively often in the population.

73
Q

What is GWAS and what is the study design?

A
  1. GWAS were designed to find loci associated with the occurrence of multifactorial disease and to interrogate the CD/CV hypothesis.
  2. Case-control study design: compare a large number of cases (with disease) to controls (without), using a genome-wide SNP array comprising polymorphic SNP markers throughout the genome.
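
For a single SNP, the case-control comparison reduces to a 2x2 table of allele counts. The sketch below computes the allelic odds ratio for that table; the counts in the usage note are hypothetical, and a real GWAS repeats a test like this for every SNP on the array and corrects for multiple testing.

```python
def allelic_odds_ratio(case_risk: int, case_other: int,
                       control_risk: int, control_other: int) -> float:
    """Odds of carrying the risk allele in cases relative to controls."""
    if case_other == 0 or control_risk == 0:
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    return (case_risk * control_other) / (case_other * control_risk)
```

With 300 risk and 700 other alleles in cases, and 200 risk and 800 other alleles in controls, the odds ratio is about 1.71, meaning the risk allele is more common among cases.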
74
Q

How do patterns of linkage disequilibrium form?

A
  1. LD patterns evolve over generations due to homologous recombination of chromosomes.
  2. SNPs on the same chromosome are inherited in blocks and the pattern of SNPs in a block is a haplotype.
75
Q

Impact of GWAS studies.

A
  1. Elucidating the biology of complex disease
  2. Identifying therapeutic targets
  3. Improving individual risk assessment
76
Q

Potential roles of polygenic risk scores.

A
  1. Improving prediction of disease occurrence
  2. Informing screening
  3. Aiding disease diagnosis
  4. Informing selection of therapeutic interventions
77
Q

How are polygenic risk scores calculated?

A

PRS = weighted sum of the risk alleles carried by an individual, where the risk alleles and their weights are defined by SNPs and their measured effects.

PRS = (weight x allele dosage) + (weight x allele dosage) + ...
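
The weighted sum translates directly into code. In this minimal sketch the SNP identifiers and weights are made up, the function name is hypothetical, and treating a missing genotype as 0 copies is an assumption made for simplicity.

```python
def polygenic_risk_score(weights: dict[str, float], dosages: dict[str, int]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP)."""
    score = 0.0
    for snp, weight in weights.items():
        dosage = dosages.get(snp, 0)  # assumption: a missing genotype counts as 0 copies
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {snp} must be 0, 1 or 2")
        score += weight * dosage
    return score
```

For weights `{"rs1": 0.5, "rs2": 0.25, "rs3": 1.0}` and dosages `{"rs1": 2, "rs2": 1, "rs3": 0}` the score is 0.5 x 2 + 0.25 x 1 + 1.0 x 0 = 1.25.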

78
Q

Explain how a receiver operating characteristic (ROC) curve works.

A
  1. The less overlap between true positives and false positives, the more concave the curve and the better the test.
    Interpretation of the area under the curve (AUC):
    0.5 < AUC < 0.7: less accurate
    0.7 < AUC < 0.9: moderately accurate
    0.9 < AUC < 1: highly accurate
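
One way to make the AUC concrete: it equals the probability that a randomly chosen case has a higher test score than a randomly chosen control. The sketch below computes it that way and applies the bands listed on the card. The function names and scores are hypothetical, and placing the boundary values 0.7 and 0.9 in the higher band is an assumption, since the card leaves them unassigned.

```python
def auc(case_scores: list[float], control_scores: list[float]) -> float:
    """Probability that a random case scores higher than a random control (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def interpret_auc(value: float) -> str:
    """Apply the accuracy bands from the card (boundary handling is an assumption)."""
    if value >= 0.9:
        return "highly accurate"
    if value >= 0.7:
        return "moderately accurate"
    if value > 0.5:
        return "less accurate"
    return "no better than chance"
```

Scores of `[0.9, 0.8, 0.4]` for cases and `[0.7, 0.3, 0.2]` for controls give an AUC of 8/9, which falls in the moderately accurate band.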
79
Q

Define polygenic

A

Reflects a trait that is influenced by more than one gene

80
Q

Define multifactorial

A

Reflects a trait that can be influenced by the environment.

81
Q

Define empiric recurrence risk

A

How we predict that a polygenic multifactorial disease will occur in an individual.

82
Q

Empiric recurrence risk influenced by..

A
  1. Severity of the disease
  2. Number of affected family members
  3. How closely related a person is to affected individual
83
Q

What could be causing the missing heritability from GWAS?

A
  1. Common vs rare mutations
  2. Structural variation
  3. Epistasis
  4. Environmental factors
  5. Epigenetics
84
Q

Does the number of affected family members influence risk in Mendelian, polygenic and multifactorial conditions?

A

Mendelian: the number of affected family members does not influence risk
Polygenic: recurrence risk varies
Multifactorial: more affected family members do influence risk

85
Q

What is genomic microarray analysis?

A
  1. Genome-wide analysis technology used to assess DNA copy number.
  2. Detection of genomic alterations such as copy number variations and copy-neutral changes
86
Q

CNVs vs CNC

A
  1. CNV: deletion or duplication > 50 bp
  2. CNC: runs of homozygosity (ROH), i.e. long contiguous stretches of homozygosity, e.g. uniparental disomy
87
Q

Two types of microarray

A
  1. One-colour
  2. Two-colour
88
Q

Characterise SNP array

A
  1. Allele-specific oligonucleotide
  2. Patient DNA is hybridised to the microarray and results analysed against a reference.
  3. Detects both CNVs and SNPs
  4. B allele frequency (BAF) = the B allele signal divided by the sum of the A and B signals.
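
The BAF definition in item 4 is a one-line calculation. The signal intensities in the usage note are made up, and `b_allele_frequency` is a hypothetical helper name.

```python
def b_allele_frequency(a_signal: float, b_signal: float) -> float:
    """B allele signal divided by the sum of the A and B signals."""
    total = a_signal + b_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return b_signal / total
```

Equal A and B signals give a BAF of 0.5, as expected for a heterozygous AB genotype, while AA and BB genotypes give values near 0 and 1.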
89
Q

Characterise CGH array

A
  1. Oligonucleotide probes
  2. Differentially labelled patient and control DNA are hybridised to the microarray
  3. True comparative hybridisation
  4. Relative fluorescence is converted to a Log2 ratio which indicates dosage.
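
The Log2 ratio in item 4 can be sketched as follows. The signals in the usage note stand in for copy numbers and are idealised values; real array data are noisy, and `log2_ratio` is a hypothetical helper name.

```python
import math


def log2_ratio(patient_signal: float, control_signal: float) -> float:
    """Log2 of patient over control fluorescence; 0 means equal dosage."""
    if patient_signal <= 0 or control_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(patient_signal / control_signal)
```

Equal dosage (2 copies vs 2) gives 0, a heterozygous deletion (1 vs 2) gives -1, and a single-copy duplication (3 vs 2) gives about 0.58.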
90
Q

Why are arrays useful in a diagnostic lab

A
  1. Untargeted analysis of constitutional errors.
  2. Yield is greater than that of karyotyping alone.
  3. Improved resolution
91
Q

Advantages of CGH array

A
  1. Analyse DNA from almost any tissue type, no culturing necessary
  2. High resolution, customisable
  3. Objective data analysis
  4. SNP arrays can detect copy-neutral abnormalities.
  5. Automation and enhanced software capability
92
Q

Disadvantage of CGH array

A
  1. Cannot detect genetic abnormalities that do not affect copy number
  2. Not useful for low-level mosaicism and ploidy.
  3. Chromosomal mechanism is not defined
  4. Does not detect changes in regions of the genome that are not covered by probes, so not all microdeletions/duplications are detected.