Investigating the Genome Flashcards

1
Q

What has happened to genome seq price?

A

Seq price has dec sig: a whole genome can now be seq for ~$1000

2
Q

Why is whole genome seq better than exome seq?

A

can get more data in 1 step

3
Q

What are benefits of collecting from whole genome?

A
  • whole genome is complete (can also tell where pieces have been missed)
  • indiv’s genome doesn’t change
  • pot to collect it once, store and refer to again for clinical care
  • only need to analyse each time for specific ques and not for every disease
4
Q

Where are C.F. clinical images stored?

A

in PACS (pic archiving & comm system)

5
Q

What makes cheap whole genome seq poss?

A

next gen tech (NGS)

6
Q

What is NGS based on?

A

seq bns of random fragments in parallel

7
Q

How long is length of a fragment?

A

150 bases x 2

8
Q

How much is each pos in genome (3bn letters) seq on av?

A

30x

9
Q

How much is each of the 2 copies in each cell seq on av?

A

15x
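
The coverage figures in cards 7 to 9 fit together with some simple arithmetic. A rough sketch, assuming reads are spread evenly over a 3bn-base genome and that each fragment yields 2 reads of 150 bases (all variable names are illustrative):

```python
GENOME_SIZE = 3_000_000_000   # bases in the genome (card 8)
READ_LENGTH = 150             # bases per read (card 7)
READS_PER_FRAGMENT = 2        # each fragment is read "x 2"
MEAN_COVERAGE = 30            # average times each position is sequenced

# Bases obtained from one fragment
bases_per_fragment = READ_LENGTH * READS_PER_FRAGMENT

# Fragments needed to reach the target average coverage
fragments_needed = MEAN_COVERAGE * GENOME_SIZE // bases_per_fragment

# Each cell carries 2 copies, so each copy gets half the coverage on av
coverage_per_copy = MEAN_COVERAGE / 2

print(fragments_needed)   # 300000000
print(coverage_per_copy)  # 15.0
```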

10
Q

What is adv of equal dis of sequence fragments?

A

easier to spot which side is wild type and which has mutation

11
Q

What is disadv of unequal dis of sequence fragments?

A
  • poss to get e.g. only 4 or 5 of 30 reads on 1 side but can still detect mutation
  • but only 2 reads on 1 side can make you think they’re just errors
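
How skewed read counts arise can be explored with a simple model. A sketch, assuming 30 reads cover the position and each read comes from either copy with equal chance (an idealised binomial model, not something stated on the cards; `prob_at_most` is a made-up helper name):

```python
from math import comb

def prob_at_most(k: int, n: int = 30, p: float = 0.5) -> float:
    """Probability that at most k of n reads come from one given copy."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Chance of 5 or fewer, then 2 or fewer, reads from one copy at 30x
print(f"{prob_at_most(5):.2e}")
print(f"{prob_at_most(2):.2e}")
```

Under this model very skewed counts are unlikely at any single position, which is part of why 2 reads out of 30 are easily mistaken for seq errors. Real data is less even than the model assumes.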
12
Q

List the diff types of variation

A
  • single nucleotide variation (SNV)
  • del (1 base/many)
  • ins (1 base/many) - special case: tandem dup
  • inv
  • translocation
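
The small variant types above can be told apart by comparing the ref and alt sequence at a position. A minimal sketch, assuming each variant is given as a pair of strings (the function name and the string representation are illustrative; inversions and translocations need positional info and are not covered):

```python
def classify_small_variant(ref: str, alt: str) -> str:
    """Classify a variant from its reference and alternative sequence."""
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"   # one base swapped for another
    if len(ref) > len(alt):
        return "del"   # 1 base or many removed
    if len(alt) > len(ref):
        return "ins"   # 1 base or many added
    return "other"     # same-length change spanning several bases

print(classify_small_variant("A", "G"))    # SNV
print(classify_small_variant("ATG", "A"))  # del
print(classify_small_variant("A", "ATT"))  # ins
```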
13
Q

What can WGS detect and not detect?

A

can detect small variants (SNV, del, ins, inv, trans); for large variations it can tell they’re there but can’t work out their structure

14
Q

What are limitations of current tech?

A
  • short reads of NGS make accurate characterisation of large variants hard; because most human genomes have been seq with NGS, knowledge of ‘normal’ structural variants is limited
  • short fragments - hard to reconstruct anything specifically diff about 1 genome compared to ref
  • NGS accuracy currently lower than older, more expensive seq tech, so variants detected by NGS are verified using ‘Sanger seq’ (involves use of primers to target variants)
15
Q

What % of genome is whole exome (protein-coding region)?

A

1.5%

16
Q

How many bases and variants does whole exome have?

A
  • bases: 30-50mn
  • variants: ~ 20 000

17
Q

How many bases and variants does whole genome have? What are the sig of the variants?

A
  • bases: 3bn
  • variants: 3mn (most not going to have any effect as looking for single variant that causes disease)
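
The no.s in cards 15 to 17 can be cross-checked against each other. A quick sketch using only the figures on the cards (variable names are illustrative):

```python
GENOME_BASES = 3_000_000_000
GENOME_VARIANTS = 3_000_000
EXOME_VARIANTS = 20_000

# 1.5% of the genome, using integer arithmetic to avoid rounding issues
exome_bases = GENOME_BASES * 15 // 1000

# On average, one variant per this many bases across the whole genome
bases_per_variant = GENOME_BASES // GENOME_VARIANTS

# How many times more variants WGS yields than exome seq
variant_ratio = GENOME_VARIANTS // EXOME_VARIANTS

print(exome_bases)        # 45000000, inside the 30-50mn range on card 16
print(bases_per_variant)  # 1000
print(variant_ratio)      # 150
```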

18
Q

What are the 2 sources of info about variants?

A
  1. functional annotation of ref genome
  2. occurrence between affected + unaffected indiv

19
Q

Give an e.g. of functional annotation of ref genome

A
  • annotation of SMURF2 gene, covering just 123,340 bases of genome
20
Q

How does exome seq compare with whole genome seq in functional annotation of genome?

A
  • exome seq - just seq fragments of genome, but whole genome seq - seq everything
  • can see how variant compares with known annotation e.g. if it’s in protein coding gene/another region of genome
21
Q

What is reg build?

A

annotation of regions of genome involved in controlling genes

22
Q

How many non-coding genes are there according to GENCODE 25 stats?

A

~ 20 000 non-coding genes

23
Q

What are the strategies to identify causal variants?

A
  • filter freq observed variants
  • look for variants identified as pathogenic
  • look for variants in genes linked to cond
  • look for variants that affect functional elements
  • look for variants normally conserved (across species)
24
Q

How can you filter freq observed variants?

A
  • ExAC for variants in coding regions (>60k exomes)
  • 1000 genomes data for variants outside coding regions
  • coloured bits = where variation has been seen, but if filter out common ones + keep only rare ones, no.s dec dramatically
  • if seen variant a lot in pop before - probs not the causal variant, as that will be rare in pop
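
The frequency filter above can be sketched in a few lines. This is an illustration only: the variant IDs and frequencies are made up, the 1% cut-off is an assumed threshold rather than one given on the cards, and `keep_rare` is a hypothetical helper, not part of ExAC or 1000 genomes tooling.

```python
# Made-up example variants with made-up population frequencies
variants = [
    {"id": "var_a", "pop_freq": 0.25},
    {"id": "var_b", "pop_freq": 0.0001},
    {"id": "var_c", "pop_freq": 0.03},
    {"id": "var_d", "pop_freq": 0.0},  # never seen in the reference pop
]

def keep_rare(variants, max_freq=0.01):
    """Drop variants seen at or above max_freq in the population."""
    return [v for v in variants if v["pop_freq"] < max_freq]

rare = keep_rare(variants)
print([v["id"] for v in rare])  # ['var_b', 'var_d']
```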
25
Q

What functional elements can be used to look for variants?

A
  • protein coding seq - does it change it?
  • splicing - does it change it?
  • reg element (but don’t know effects well enough + probs don’t cause disease if they are affected)
26
Q

How are variants labelled as pathogenic?

A
  • from rare disease diagnostic seq + added to databases
27
Q

What causes false +ve pathogenic variants?

A
  • freq of variants in normal pop only recently surveyed, so variants were added to database before it was known how common they are in pop (not much seq had been done at that time)
28
Q

What does ExAC database do?

A

aggregates protein coding regions from 60 000 indiv - gives idea of which variants have been seen before

29
Q

What has ExAC analysis shown about false +ves?

A
  • each person has av of 54 mutations labelled as pathogenic but 41 would be false +ves as they are common
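
As a quick check of what those no.s imply, using only the 2 figures on this card (variable names are illustrative):

```python
labelled_pathogenic = 54   # av per person
false_positives = 41       # of those, common in the population

# Variants still labelled pathogenic once the common ones are set aside
remaining = labelled_pathogenic - false_positives

# Share of the labelled variants that are false +ves, as a whole percent
false_positive_pct = round(100 * false_positives / labelled_pathogenic)

print(remaining)           # 13
print(false_positive_pct)  # 76
```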
30
Q

What does rare disease discovery science do?

A
  • seq groups of affected indiv
  • looking to identify genes sharing variants

31
Q

What does rare disease diagnostics do?

A
  • seq affected indiv + other family members (affected/unaffected)
  • data more restricted
32
Q

Why is WGS in clinical care restricted?

A

limits in understanding

33
Q

What is WGS application limited to?

A
  • monogenic diseases (looking for 1 variant)
  • patients with clear phenotype - can focus on genes known to be ass with cond
  • patients who are ill
34
Q

What is WGS reporting mainly limited to?

A
  • variants in protein coding sequence - easier to predict effect of mutation
35
Q

What can WGS not currently be usefully applied to?

A
  • diagnosis of complex diseases e.g. diabetes as involve many genes + variants
  • prediction of risk e.g. patients worried about 23andme result
  • patients with unexplained cond - can’t narrow down to subset of genes
36
Q

What is Strategic Priorities Working Group?

A
  • established for 100k genomes project
  • seq 100k patients in health service
  • recommended those affected by rare diseases, certain cancers + infections (seq pathogens genome instead)
  • areas where they believe intro of genomic tech will have greatest benefit for patient health
37
Q

What were steps in UK towards genomic med?

A
  • 2009: House of Lords report on gen med said how seq not cheap + accurate
  • 2010: creation of Human Genome Strategy Group (HGSG) - looked at ways to implement project in health service
  • 2011: UK Life Sciences Strategy
  • 2012: HGSG report, UK Life Sciences Strategy update, 100k genomes project launched
  • 2013: Genomics England launched - mapping DNA to better understand cancer, rare + infectious diseases
38
Q

What is 100k genomes project?

A
  • primarily treatment not research project
  • NHS transformation project
  • all clinical WGS (>30x)
  • seq rare disease (proband/patient trios) + cancer (normal/tumor pairs)
39
Q

What was mission of Genomics England?

A
  • seq 100k genomes
  • improve health of NHS patients (personalised strategy - stimulate wealth gen - can prod start-ups + dev algorithms which analyse data, make apps that interp data e.g. safe pharmacogen)
  • create legacy of infrastructure, human capacity + capability (data from all genomes v. large - want to build scalable structure to support it in long term)
  • enable large scale genomics research - as data is also for research purposes + improve indiv interp for patients
40
Q

What is the process overview?

A
  • procured seq:
    1. collect sample DNA
    2. Seq it (BAM)
    3. Work out variants (VCF)
  • procured annotation:
    4. identify candidate variants
    5. clinical interp
  • NHS:
    6. Clinical interp
    7. Seq validation
    8. Clinical action
  • everything fed into GEL database - reviews who’s best at the moment (could do this with Genomics Eng but diff experts on diff steps which changes over time)
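
Step 3 of the overview produces a VCF, a tab-separated text format whose data lines begin with the fixed columns CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO. A minimal sketch of reading one data line follows; the example line is made up, and `VcfRecord` and `parse_vcf_line` are illustrative names, not part of any GEL pipeline.

```python
from typing import List, NamedTuple

class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: List[str]

def parse_vcf_line(line: str) -> VcfRecord:
    """Pull position and alleles out of one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = fields[:5]
    # ALT can list several alternative alleles separated by commas
    return VcfRecord(chrom, int(pos), ref, alt.split(","))

# Made-up data line: an A>G change at position 12345 on chromosome 1
record = parse_vcf_line("1\t12345\t.\tA\tG\t50\tPASS\t.")
print(record.chrom, record.pos, record.ref, record.alt)
```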
41
Q

What makes up seq + anno assessment?

A
  • seq bake-off
  • annotation bake-off

42
Q

What does seq bake-off involve?

A
  • samples sent to participants; returned seq assessed
  • evaluation on quality + coverage
  • informed seq consent
43
Q

What are NHS Genomic Med Centres?

A
  • 13 established Dec 2014+15 by NHS Eng - lead way in delivering Project
  • eligible patients referred to GMCs by clinicians
44
Q

Describe how Genomics Eng works with the Data Centre

A
  • Gen Eng has contract with seq companies e.g. Illumina - sends samples to them + evaluates what comes back
  • NHS Eng (involved as lots of patients included) has contract with NHS GMCs, where consent is given for phenotypes to be stored in Data Centre + DNA to be sent to Illumina, which stores BAM/VCF in Data Centre (gov one - includes health service data)
  • NHS Firewall - data stays in NHS - analysis can occur + linked to records for clinical purposes
45
Q

How are data models dev?

A
  • consider which participants to recruit:
  • list of cond - currently ~ 70 (which bits of clinical info useful for interp genome?)
  • eligibility statements
  • consider what data you need:
  • metadata: demographics, sample, consent
  • clinical data: data models (everything known about cond)
  • ass genes: gene packages
46
Q

What’s the diff in dev data models for rare disease patients?

A
  • complex as need consultation with experts in field
47
Q

What is human phenotype ontology?

A
  • universal ontology for phenotypic features
  • chosen as standard for deep rep of phenotypic features
  • adopted by other projects familiar to many in rare diseases
  • being actively dev in collab with broader R&D
  • existing mapping from diseases to HPO terms
  • data models specific to each cond with diff levels
  • add terms not present in data model can be nat added
48
Q

What does anno bake-off involve?

A
  • seq sent to participants (BAM + VCF)
  • rare diseases: trio
  • cancer: germline + tumor
49
Q

What is disadv of anno bake-off?

A
  • harder than assessing seq
  • gold standard less well defined
  • lack of established data standards
50
Q

Who does Gen Eng have contracts with?

A
  • seq companies
  • Data Centre
  • genome interp service companies (also has grants with them)
51
Q

What is function of clinical interp services?

A
  • genome interp services linked to them
  • send clinical report to NHS GMCs

52
Q

What is role of Gen Eng Clin Interp Partnership/GENE Embassies?

A
  • other clinical NHS Data inc registry (PHE) + HES (HSCIC) sent to them
  • GENE Consortium also sends info to them
  • GeCIP also linked
  • receives info from Biobank sample
53
Q

What is role of Biobank Sample?

A
  • receives info from NHS GMCs and GeCIP
54
Q

What is Health Education Eng?

A
  • genomics ed prog
  • 9 uni providers of MSc in Genomic Med
  • aimed at NHS healthcare prof working in Eng
  • full/part time study
  • fully funded places available through HEE
  • indiv (CPPD) modules available for range of prof backgrounds + groups e.g. med, nursing, healthcare scientists + technologists
  • online training + resources
55
Q

What makes up feedback to the NHS?

A
  • info about patient’s main cond (all participants agree to receive results about main cond referred for)
  • info about add ‘serious + actionable’ cond (optional) - participants can opt in to receive feedback on selection of known gen alterations of high clinical sig
  • carrier status for non affected parents of children with rare disease (optional) - eligible adults can opt in to find out their carrier status for certain gen diseases
56
Q

What add findings are offered by 100k genomes project?

A
  • optional predis for:
  • bowel cancer
  • breast cancer
  • other cancer
  • familial hypercholesterolaemia
  • auto rec carrier status
57
Q

What are the requirements for add findings offered by 100k genomes project?

A
  • reliably detected by genome seq
  • curated list of high confidence + penetrance variants
  • treatable/preventable cond
  • other cond may be added if clinically appropriate + technically feasible
58
Q

What are patient and public participation groups?

A
  • contribute to project within each GMC
  • national participant panel - members will sit on committees inc Data Access Review Committee
  • panel ensures experiences of participants improved, respond to feedback + oversee who should have access to participant data
59
Q

Give e.g.s of rare disease pilot results

A
  • ~ 4800 participants for 170 diff cond
  • standardising eligibility & phenotyping using HPO
  • 12 966 +ve annotations - presence of a feature
  • 43 088 -ve anno - absence of a feature
  • findings on 1st 347 cases (670 genomes) returned to GMCs in pilot with predicted diagnostic rate: 20-25%
60
Q

Who are the 3 genome interp providers and what do they do?

A
  • currently contracting pilot phase for up to 8000 “reports” with:
    1. Omicia
    2. Congenica
    3. WuXiNextCode
  • Gen Eng will provide web-based tools to enable collab between GEL, GeCIPs + GMCs to analyse, assess, review + validate clinical interp of whole genomes
61
Q

Describe standardisation through tiering of reported variants

A
  • tier 1:
  • in gene panel
  • clear LOF (truncating, splicing etc)
  • known pathogenic variants
  • tier 2:
  • in gene panel
  • missense + other VUS
  • tier 3: not in panel
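
The tiering rules above amount to a small decision procedure. A sketch, assuming tier 1 means "in gene panel and either clear LOF or known pathogenic" (one reading of the card, not confirmed by it); `assign_tier` and its parameters are illustrative names:

```python
def assign_tier(in_gene_panel: bool,
                clear_lof: bool = False,
                known_pathogenic: bool = False) -> int:
    """Return the reporting tier (1, 2 or 3) for a variant."""
    if not in_gene_panel:
        return 3  # tier 3: not in panel
    if clear_lof or known_pathogenic:
        return 1  # tier 1: in panel with clear LOF or known pathogenic
    return 2      # tier 2: in panel, missense + other VUS

print(assign_tier(True, clear_lof=True))   # 1
print(assign_tier(True))                   # 2
print(assign_tier(False, clear_lof=True))  # 3
```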
62
Q

What is panel app?

A
  • new rare disease gene tool allowing knowledge of rare disease gen to be shared + evaluated
63
Q

What are aims of panel app?

A
  • use expert knowledge to establish final diagnostic grade gene panel (green list) for each disorder - used in classification of gen variants to aid clinical interp of rare disease genomes
  • engage scientific comm, encourage open debate + begin to establish consensus on gene panels for rare disease
  • standardisation of terms + collection of gene-disease related info, acc of reviews over time + updated resources
64
Q

Who and what access is there to panel app?

A
  • public access: view + download panels + view reviewer’s comments
  • register to be a reviewer: view + download panels + view reviewer’s comments + evaluate genes + make comments + add genes to gene panel
65
Q

What is the research protocol under new designation of “Bioresource”?

A
  • single project-wide approval: no need for site specific approvals
  • indep review committee grants data access to bona-fide research uses
  • consent for return of add findings (secondary: 17 genes + carrier status: 8 genes)
  • participants can be re-contacted up to 4x a yr
  • samples for various - omics tech collected
  • revision of diagnosis if underlying ev changes (e.g. when new gene is discovered)
66
Q

What is Gen Eng Clin Interp Partnership (GeCIP)?

A
  • working with research comm
  • launched at Wellcome Trust in June 2014
  • partnership between 2000+ researchers from academia + NHS trainees + international collaborators
  • designed to accelerate academic/industry partnership + dev of diagnostics + therapies
  • 35+ topics (domains) of research + most domains cover single disease/group of diseases + some are wider e.g. epigenomics, health ec + tech
  • all data gen contributes to Gen Eng Dataset
67
Q

What is GENE Consortium?

A
  • working with industry
  • 12 companies = Genomics Expert Network for Enterprise (GENE) Consortium to oversee yr long industry trial
  • aims to identify most effective + secure way to accelerate dev of new diagnostics + treatments for patients
68
Q

Describe generic model for data use?

A
  • patients + clinicians give consent for clinical data to be used in data centre (inc research + clinical apps)
  • they also provide decision support query to clinical apps
  • data passes on research reports to researchers through NHS firewall
  • researchers pass on research data to clinical apps
  • clinical apps pass on clinical reports to patients + clinicians
69
Q

What is impact of Gen Eng?

A
  • Genome Med Centre contracts: engine for NHS personalised med transformation
  • phenotype data models for rare diseases + cancer: driver of standardisation of secondary data capture
  • support for interp services: engine for ec activity around dev of genome interp apps for decision support
  • dual purpose data centre: pioneering single informatics env that supports clinical interp services + research analysis
70
Q

What is plan for Gen Eng by end of 2017?

A
  • 100 000 WGS of NHS patients
  • working with NHS, academics + industry to drive Gen Med in NHS
  • support with edu
  • leave legacy of NGS centres, sample pipeline + biorepository, large-scale data store that makes this usable by NHS
  • new diagnostics + therapies + opportunities for patients