Week 4 Flashcards

Question

Why would you need to access sequence data?

Answer 1

1. Know what the sequence of a gene is 2. Identify variants in the sequence 3. Compare your sequence to others 4. Identify similar sequences 5. Find diseases associated with variation in your gene of interest.

Answer 2

Extensive biomedical literature database.

Answer 3

Comprehensive, integrated, well-annotated set of reference sequences (genomic, transcript and protein).

Answer 4

Online Mendelian Inheritance in Man (OMIM): database of human genes and genetic phenotypes.

Answer 5

Database of genomic variation and the relationship to human health.

Answer 6

Resource for high quality integrated annotation data

Answer 7

Universal protein resource for protein sequence and functional annotation data

Answer 8

Protein Data Bank in Europe: collection of 3D structural data.

Answer 9

Database of protein families, domains and conserved sites.

Answer 10

Enable researchers to better understand the role of genomic DNA variation in both health and disease states.

Answer 11

Aggregate and harmonise exome sequencing data from a wide variety of large-scale sequencing projects.

Answer 12

1. Positive z-score: fewer variants observed than expected; the gene is highly constrained (intolerant to variation). 2. Negative z-score: more variants observed than expected; the gene is tolerant to variation.
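To make the sign convention concrete, here is a minimal sketch that computes a simplified constraint z-score from observed and expected variant counts. The formula (expected minus observed, divided by the square root of expected) and the function name `constraint_z` are assumptions chosen for illustration; published constraint scores come from a more detailed mutational model.

```python
import math


def constraint_z(observed: int, expected: float) -> float:
    """Simplified signed z-score (illustrative assumption, not a published method).

    Positive when fewer variants are observed than expected (constrained gene),
    negative when more variants are observed than expected (tolerant gene).
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    return (expected - observed) / math.sqrt(expected)


# Hypothetical counts, for illustration only.
print(constraint_z(40, 100.0))   # fewer observed than expected: positive score
print(constraint_z(130, 100.0))  # more observed than expected: negative score
```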

Answer 13

pLI score close to 0: loss of function (LoF) is tolerated by natural selection. pLI score close to 1: LoF is not tolerated (haploinsufficient genes).
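The two ends of the pLI scale can be sketched as a simple lookup. The cut-offs used here (0.9 and 0.1) are commonly quoted conventions and are an assumption, not part of the flashcard; the function name `interpret_pli` is hypothetical.

```python
def interpret_pli(pli: float, high: float = 0.9, low: float = 0.1) -> str:
    """Label a pLI score; the default cut-offs are assumed conventions."""
    if not 0.0 <= pli <= 1.0:
        raise ValueError("pLI must lie between 0 and 1")
    if pli >= high:
        return "LoF intolerant"   # close to 1: loss of function not tolerated
    if pli <= low:
        return "LoF tolerant"     # close to 0: loss of function tolerated
    return "intermediate"


print(interpret_pli(0.99))
print(interpret_pli(0.02))
print(interpret_pli(0.50))
```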

Answer 14

1. Multiple congenital malformations 2. Single congenital malformation with a number of dysmorphic features 3. Intellectual disability/developmental delay of unexplained aetiology or in association with a congenital anomaly/dysmorphic features 4. Significant growth disturbances

Answer 15

1. Screening/surveillance: better outcome/prognosis; the patient and the patient's family can be better prepared. 2. Clarify recurrence risk: test at-risk family members and offer prenatal testing. 3. Social support: access to grants, support services and resources.

Answer 16

1. Few genomic targets (fewer than 20 per sample) 2. Fast and reliable, with a low error rate; used to validate NGS findings. 3. Targeted, quick analysis

Answer 17

1. Can only sequence one gene region, or hot-spot regions, at a time. 2. Less cost-effective for a high number of regions.

Answer 18

1. Sequences multiple samples and targets many genomic regions. 2. Higher discovery power and variant resolution. 3. More data with less DNA/RNA input.

Answer 19

1. Fragmentation: physical shearing, enzymatic digestion or PCR-based amplification. 2. Ligate fragments to adapter sequences. 3. Adapter sequences carry unique barcodes that are used to tag each sample. 4. Barcoding is important for pooling of libraries.
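The role of barcodes in pooled libraries can be illustrated with a toy demultiplexing step. This is a sketch under simplifying assumptions: the barcode is taken to be the first few bases of each read, matching is exact, and the names `demultiplex`, `sample_A` and `sample_B` are hypothetical. Real pipelines typically read the index separately and tolerate mismatches.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group pooled reads by sample using an exact-match barcode prefix."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        # Reads whose barcode is not recognised go to a catch-all bin.
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


pooled = ["ACGTTTGA", "TGCAGGCC", "NNNNAAAA"]  # made-up reads
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
print(demultiplex(pooled, barcodes, barcode_len=4))
```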

Answer 20

1. Single-end reads: 5’ or 3’ (random) 2. Paired-end reads: 5’ and 3’ 3. Mate-pair reads: 5’ and 3’

Answer 21

1. Categorical genetic disorder 2. Up to thousands of genes 3. High coverage and depth 4. Lowest cost 5. Highest accuracy amongst all the NGS categories.

Answer 22

1. Whole exome 2. Intermediate coverage and depth 3. Good accuracy

Answer 23

1. All genes and non-coding DNA 2. Lower coverage 3. Highest cost 4. Lower accuracy

Answer 24

1. 100-300 bp fragments 2. Sequencing by synthesis or ligation 3. DNA polymerase or ligase enzymes extend numerous DNA strands in parallel. 4. Short reads/fragments are assembled into a contiguous sequence and then aligned to the reference. 5. Most labs use short-read sequencing for SNV calling. 6. Not ideal for complex and repetitive areas of the genome.

Answer 25

1. 5,000-30,000 base pairs in a single read. 2. Sequences directly from DNA/RNA. 3. Sequencing error rate is higher than for short reads, so variant calling is unreliable. 4. Aligning and processing long-read sequence data takes longer. 5. Most labs use LRS for CNVs (structural variants and also large indels).

Answer 26

Processes submissions on reported variants in patient samples, with assertions made regarding their clinical significance. Data are mapped to reference sequences and reported according to the HGVS standard.

Answer 27

1. To support computational re-evaluation, both of genotype and assertion. 2. To enable the ongoing evolution and development of knowledge.

Answer 28

Central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.

Answer 29

1. It is an up-to-date and comprehensive collection of known and published pathogenic gene variants.

Answer 30

1. Design/platform: panel, WES or WGS, depending on intended use. 2. Disease inheritance, such as mode of inheritance: use OMIM, GeneReviews, VarSome, HGMD. 3. Functional effect: is the variant in a functional domain or hotspot region, and will it have a consequence? Use UniProt, VarSome and many others. 4. Population allele frequency: use gnomAD or 1000 Genomes. 5. Variant quality: IGV. 6. Clinical relevance: de novo? Gene/allelic heterogeneity.

Answer 31

1. Phenotype relatively distinct 2. Multiple genes known to cause similar phenotype.

Answer 32

1. Poorly defined phenotype 2. Suspected new syndrome

Answer 33

1. May detect deep intronic mutation 2. May detect breakpoints 3. May detect structural rearrangements

Answer 34

1. Ethnicity: high-risk ancestry group 2. Who to test: the closer the relative, the better 3. Family history: any known mutations? 4. Limitations of genetic testing approaches

Answer 35

2. Variants of uncertain significance 3. Incidental findings 4. Less cost-effective when sequencing only a few targets 5. Longer and more complex analysis 6. Amplification bias and sequencing errors 7. Missing heritability

Answer 36

1. View genes with other annotation along the chromosome. 2. View alternative transcripts for a given gene. 3. Examine single nucleotide polymorphisms (SNPs) for a gene or chromosomal region. 4. Upload your own data. 5. Use BLAST or BLAT against any Ensembl genome. 6. Export sequence or create a table of gene information with BioMart. 7. Variant Effect Predictor: effect of a variant on a gene.

Answer 37

1. Linkage disequilibrium 2. Age of population 3. Effective population size 4. Admixture 5. Selection 6. Autozygosity 7. Cultural norms and practices 8. Population size

Answer 38

1. 1000 Genomes Project (high coverage) 2. African Genome Variation Project (low coverage on Yoruba, Baganda, Ethiopia, Luhya and Zulu) 3. Southern African Human Genome Programme (>30x, on Sotho, Zulu, Xhosa, Coloured) 4. Uganda GPC (low coverage on the general population of Uganda) 5. H3Africa (>30x on 50 ethnolinguistic groups)

Answer 39

1. Limited data on hunter-gatherer populations 2. Many ethnic groups not yet included in genomic studies 3. Ancient genomes 4. Functional interpretation of variants 5. Phenotype-to-genotype links poorly understood 6. Modest sample sizes

Answer 40

1. Detect novel variation of potential functional impact 2. Develop African-appropriate research tools 3. Explore historical events 4. Understand the molecular and biochemical basis of disease on the continent 5. Ensure that personalised medicine has a role in Africa.

Answer 41

To develop a haplotype map of the human genome, which will describe the common patterns of human genetic variation.

Answer 42

The human HapMap is built on SNPs distributed approximately every 1000 base pairs throughout the genome. Analysis of the SNPs revealed regions that exhibit no recombination within one of the four test populations, flanked by short regions of high recombination frequency. This suggests that identifying only a few SNPs in each recombination-free region will be sufficient to predict the remaining SNP alleles in the same region.

Answer 43

1. Similarity of allele frequencies in Chinese and Japanese samples. 2. Identification of recombination hotspots. 3. Haplotype sizes vary across populations due to migration throughout history. 4. LD correlates with genomic features.

Answer 44

1. Discover population level human genetic variations of all types 2. Define haplotype structure and structural variation in the human genome. 3. Develop sequence analysis methods, tools, and other reagents that can be transferred to other sequencing projects.

Answer 45

1. Confirmed that non-African diversity is largely a subset of African diversity. 2. African samples provided a more complete discovery resource for variant sites in non-African populations than the converse. 3. Newly discovered SNPs are mostly at low frequency and enriched for functional variants.

Answer 46

The threshold remains the same but the curve moves to the right (mean liability increases); recurrence risk therefore increases because families share genes and environment.

Answer 47

1. Compare determinants of phenotype for many diseases in monozygotic (MZ) vs dizygotic (DZ) twins. 2. MZ twins share 100% of genes plus a shared environment, so all differences between MZ twins are assumed to be due to unshared environmental factors. 3. DZ twins share 50% of genes plus a shared environment; if a condition co-occurs in both twins more commonly in MZ than in DZ pairs, this is assumed to reflect genetic factors.

Answer 48

Holds that the genetic component of most common non-communicable disorders is due to the combined effect of a relatively large number of disease causing alleles that occur relatively often in the population.

Answer 49

1. GWAS were designed to find loci associated with the occurrence of multifactorial disease and to interrogate the CD/CV hypothesis. 2. Case-control study design: compare a large number of cases (with disease) to controls (without), using a genome-wide SNP array comprising polymorphic SNP markers throughout the genome.
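For a single SNP, the case-control comparison can be sketched as an allelic odds ratio computed from allele counts. This is only an illustration under assumed, made-up counts; the function name `allelic_odds_ratio` is hypothetical, and a real GWAS adds significance testing, multiple-testing correction and covariate adjustment.

```python
def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds of carrying the risk allele in cases relative to controls."""
    if min(case_other, control_risk) <= 0:
        raise ValueError("counts in the denominator must be positive")
    return (case_risk * control_other) / (case_other * control_risk)


# Made-up allele counts for one SNP.
print(allelic_odds_ratio(60, 40, 40, 60))  # above 1: allele more common in cases
```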

Answer 50

1. LD patterns evolve over generations due to homologous recombination of chromosomes. 2. SNPs on the same chromosome are inherited in blocks, and the pattern of SNPs in a block is a haplotype.

Answer 51

1. Elucidating the biology of complex disease 2. Identify therapeutic targets 3. Improving individual risk assessment

Answer 52

1. Improving prediction of disease occurrence 2. Informing screening 3. aiding disease diagnosis 4. Informing selection of therapeutic interventions

Answer 53

PRS = weighted sum of the risk alleles carried by an individual, where the risk alleles and their weights are defined by SNPs and their measured effects. PGS = (weight x allele dosage) + (weight x allele dosage) + ...
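The weighted sum can be sketched in a few lines. The SNP identifiers, weights and dosages are invented for illustration, and the function name `polygenic_score` is hypothetical; in practice the weights would come from GWAS effect-size estimates.

```python
def polygenic_score(weights, dosages):
    """Sum of weight x allele dosage over all SNPs in the score."""
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no dosage for: {sorted(missing)}")
    return sum(weight * dosages[snp] for snp, weight in weights.items())


# Hypothetical per-SNP weights and one individual's risk-allele dosages (0, 1 or 2).
weights = {"snp1": 0.2, "snp2": -0.1, "snp3": 0.5}
dosages = {"snp1": 2, "snp2": 1, "snp3": 0}
print(polygenic_score(weights, dosages))
```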

Answer 54

1. The less overlap between true positives and false positives, the more concave the curve and the better the test. Interpretation of the area under the curve: 0.5 < AUC < 0.7 less accurate 0.7
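One way to see what the area under the curve measures is to compute it directly as the fraction of case-control pairs in which the case scores higher (ties count as half). The scores are made up, and the function name `auc` is hypothetical; this pairwise form is a sketch for small inputs, not an efficient implementation.

```python
def auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up risk scores: little overlap between groups gives an AUC near 1.
print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
# Identical score distributions give 0.5, i.e. no discrimination.
print(auc([1, 2], [1, 2]))
```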

Answer 55

Reflects a trait that is influenced by more than one gene

Answer 56

Reflects a trait that can be influenced by the environment.

Answer 57

How we predict that a polygenic multifactorial disease will occur in an individual.

Answer 58

1. Severity of the disease 2. Number of affected family members 3. How closely related a person is to affected individual

Answer 59

1. Common vs rare mutations 2. Structural variation 3. Epistasis 4. Environmental factors 5. Epigenetics

Answer 60

Mendelian: the number of affected family members does not influence risk. Polygenic: recurrence risk varies. Multifactorial: more affected family members do influence risk.

Answer 61

1. Genome-wide analysis technology used to assess DNA copy number. 2. Detection of genomic alterations such as copy number variations and copy-neutral changes

Answer 62

1. CNV: deletion or duplication > 50 bp 2. CNC (copy-neutral change): runs of homozygosity (ROH)/long contiguous stretches of homozygosity, e.g. uniparental disomy

Answer 63

1. One-colour 2. Two-colour

Answer 64

1. Allele-specific oligonucleotide 2. Patient DNA is hybridised to the microarray and results analysed against a reference. 3. Detects both CNVs and SNPs 4. B allele frequency (BAF) = the B allele signal divided by the sum of the A and B signals.
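The B allele frequency definition can be written out directly. The signal intensities are invented and the function name `b_allele_frequency` is hypothetical; array software usually applies normalisation before this step, which is omitted here.

```python
def b_allele_frequency(a_signal: float, b_signal: float) -> float:
    """BAF = B signal / (A signal + B signal)."""
    total = a_signal + b_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return b_signal / total


# Made-up intensities for the three diploid genotypes.
print(b_allele_frequency(100.0, 0.0))  # AA
print(b_allele_frequency(50.0, 50.0))  # AB
print(b_allele_frequency(0.0, 80.0))   # BB
```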

Answer 65

1. Oligonucleotide probes 2. Differentially labelled patient and control DNA hybridised to the microarray 3. True comparative hybridisation 4. Relative fluorescence is converted to a log2 ratio, which indicates dosage.
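The conversion from relative fluorescence to a log2 ratio can be sketched as follows. The intensities are made up and the function name `log2_ratio` is hypothetical; the sketch assumes the signals are already normalised.

```python
import math


def log2_ratio(patient_signal: float, control_signal: float) -> float:
    """log2(patient / control): 0 for equal dosage, below 0 for loss, above 0 for gain."""
    if patient_signal <= 0 or control_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(patient_signal / control_signal)


# Made-up intensities.
print(log2_ratio(200.0, 200.0))  # equal dosage
print(log2_ratio(100.0, 200.0))  # half the control signal, as for a heterozygous deletion
print(log2_ratio(300.0, 200.0))  # 1.5x the control signal, as for a single-copy gain
```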

Answer 66

1. Untargeted analysis of constitutional errors. 2. Yield is greater than that of karyotyping alone. 3. Improved resolution

Answer 67

1. Analyse DNA from almost any tissue type; no culturing necessary 2. High resolution, customisable 3. Objective data analysis 4. SNP arrays can detect copy-neutral abnormalities. 5. Automation and enhanced software capability

Answer 68

1. Cannot detect genetic abnormalities that do not affect copy number. 2. Not useful for low-level mosaicism and ploidy. 3. Chromosomal mechanism is not defined. 4. Does not detect regions of the genome that are not covered by probes, and therefore not all micro-deletions/duplications.

(92 cards)