Investigating the Genome Flashcards
What has happened to genome seq price?
Seq price has dec sig, to the point where a whole genome seq costs $1000
Why is whole genome seq better than exome seq?
can get more data in 1 step
What are benefits of collecting from whole genome?
- whole genome is complete (can also tell where pieces have been missed)
- indiv’s genome doesn’t change
- pot to collect it once, store and refer to again for clinical care
- only need to analyse each time for specific ques and not for every disease
Where are C.F. clinical images stored?
in PACS (pic archiving & comm system)
What makes cheap whole genome seq poss?
next gen tech (NGS)
What is NGS based on?
seq bns of random fragments in parallel
How long is a fragment?
150 bases x 2
How many times is each pos in genome (3bn letters) seq on av?
30x
How many times is each of the 2 copies in a cell seq on av?
15x
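A rough worked example of the coverage numbers above. It assumes "150 bases x 2" means 2 reads of 150 bases per fragment, and that every base read lands evenly across the genome (real runs are less tidy). The function names are made up for illustration.

```python
GENOME_SIZE = 3_000_000_000   # ~3bn letters, as on the card above
READ_LENGTH = 150             # bases per read
READS_PER_FRAGMENT = 2        # assumption: "150 bases x 2" = 2 reads per fragment
TARGET_COVERAGE = 30          # av no. of times each pos is seq


def fragments_needed(coverage, genome_size, read_length, reads_per_fragment):
    """Fragments required to reach the av coverage, assuming even spread."""
    bases_needed = coverage * genome_size
    bases_per_fragment = read_length * reads_per_fragment
    return bases_needed // bases_per_fragment


def per_copy_coverage(coverage, copies=2):
    """Av coverage of each copy when reads split evenly between copies."""
    return coverage / copies


print(fragments_needed(TARGET_COVERAGE, GENOME_SIZE, READ_LENGTH, READS_PER_FRAGMENT))
# 300000000
print(per_copy_coverage(TARGET_COVERAGE))
# 15.0
```

So 30x over 3bn letters needs about 300mn fragments under these assumptions, which is why the seq has to be done in parallel.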
What is adv of equal distribution of sequence fragments?
easier to spot which side is wild type and which has mutation
What is disadv of unequal distribution of sequence fragments?
- reads can split unevenly between the 2 copies, e.g. 4/5 of the 30 on 1 copy, but can still detect mutation
- but if only 2 reads come from 1 copy, can mistake them for seq errors
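To get a feel for how often a copy ends up with very few reads, here is a minimal sketch that treats each read as a fair coin flip between the 2 copies (a binomial model). That is a simplifying assumption, not something from the lecture: real unevenness also comes from other biases, so treat the numbers as illustrative only.

```python
from math import comb


def prob_at_most(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p): chance that at most k of n reads
    come from one particular copy."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Chance that one copy gets 2 reads or fewer at 30x and at a thinner 10x spot
print(f"{prob_at_most(2, 30):.2e}")  # 4.34e-07
print(prob_at_most(2, 10))           # 0.0546875
```

Under this model the problem is very rare at 30x from chance alone, but becomes noticeable where local coverage drops.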
List the diff types of variation
- single nucleotide variation (SNV)
- del (1 base/many)
- ins (1 base/many) - special case: tandem dup
- inv
- translocation
What can WGS detect and not detect?
can detect small rearrangements (SNV, del, ins, inv, trans); for large variations it can tell they’re there but can’t work out their structure
What are limitations of current tech?
- short reads of NGS make accurate characterisation of large variants hard
- most human genomes have been seq with NGS, so knowledge of ‘normal’ structural variants is limited
- short fragments - hard to reconstruct anything specifically diff about 1 genome compared to ref
- NGS accuracy currently lower than older, more expensive seq tech, so variants detected by NGS are verified using ‘Sanger seq’ (uses primers to target variants)
What % of genome is whole exome (protein-coding region)?
1.5%
How many bases and variants does whole exome have?
- bases: 30-50mn
- ~ 20 000 variants
How many bases and variants does whole genome have? What is the sig of the variants?
- bases: 3bn
- variants: 3mn (most will have no effect; looking for the single variant that causes disease)
What are the 2 sources of info about variants?
- functional annotation of ref genome
- occurrence in affected vs unaffected indiv
Give an e.g. of functional annotation of ref genome
- annotation of SMURF2 gene, covering just 123,340 bases of genome
How does exome seq compare with whole genome seq in functional annotation of genome?
- exome seq only seq selected fragments (protein-coding regions); whole genome seq covers everything
- can see where variant sits relative to known annotation e.g. if it’s in protein coding gene/another region of genome
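A minimal sketch of checking a variant position against known annotation. The features, names and coordinates below are all hypothetical (not real gene coordinates), and real annotation would come from a resource such as the ref genome annotation rather than a hand-written list.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    kind: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


# Hypothetical annotation for one stretch of a chromosome
ANNOTATION = [
    Feature("GENE_A exon 1", "protein-coding exon", 1_000, 1_200),
    Feature("GENE_A intron 1", "intron", 1_201, 4_999),
    Feature("GENE_A exon 2", "protein-coding exon", 5_000, 5_300),
    Feature("REG_1", "regulatory element", 9_000, 9_400),
]


def annotate(position, features=ANNOTATION):
    """Return the kinds of annotated feature that overlap a variant position."""
    kinds = [f.kind for f in features if f.start <= position <= f.end]
    return kinds or ["unannotated"]


print(annotate(5_100))  # ['protein-coding exon']
print(annotate(7_000))  # ['unannotated']
```

With exome seq only the positions inside targeted fragments would ever be looked up; with whole genome seq any pos can be checked this way.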
What is reg build?
annotation of regions of genome involved in controlling genes (reg elements)
How many non-coding genes are there according to GENCODE 25 stats?
~ 20 000 non-coding genes
What are the strategies to identify causal variants?
- filter out freq observed variants
- look for variants identified as pathogenic
- look for variants in genes linked to cond
- look for variants that affect functional elements
- look for variants at pos normally conserved (across species)
How can you filter freq observed variants?
- ExAC for variants in coding regions (>60k exomes)
- 1000 genomes data for variants outside coding regions
- coloured bits = where variation has been seen; if filter out common ones + keep only rare ones, no.s dec dramatically
- if variant seen a lot in pop before - probs not the causal variant, as a disease-causing variant will be rare in pop
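The filtering idea above, as a minimal sketch. The variant IDs, population freqs and the 0.1% cut-off are all made up for illustration; in practice the freqs would be looked up in ExAC (coding) or 1000 genomes (non-coding), and the cut-off depends on the disease.

```python
RARE_THRESHOLD = 0.001  # assumed cut-off: keep variants seen in <= 0.1% of pop

# Hypothetical variants from 1 indiv, with made-up pop freqs
variants = [
    {"id": "var1", "pop_freq": 0.25},    # common
    {"id": "var2", "pop_freq": 0.0},     # never seen before
    {"id": "var3", "pop_freq": 0.0004},  # rare
    {"id": "var4", "pop_freq": 0.03},    # common
]


def keep_rare(variants, max_freq=RARE_THRESHOLD):
    """Drop variants seen often in the pop; keep rare or unseen ones."""
    return [v for v in variants if v["pop_freq"] <= max_freq]


rare = keep_rare(variants)
print([v["id"] for v in rare])  # ['var2', 'var3']
```

On a real genome the same step is what takes the ~3mn variants down to a much smaller candidate list.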
What functional elements can be used to look for variants?
- protein coding seq - does it change it?
- splicing - does it change it?
- reg element (but don’t know effects well enough + probs don’t cause disease if they are affected)
How are variants labelled as pathogenic?
- from rare disease diagnostic seq + added to databases
What causes false +ve pathogenic variants?
- freq of variants in normal pop has only recently been surveyed, so variants were put in databases as pathogenic before it was known how common they are in pop (not much seq had been done at that time)
What does ExAC database do?
aggregates protein coding regions from 60 000 indiv - gives idea of which variants have been seen before