7: Investigating Cancer Genomes Flashcards
What is a main motivation for large-scale cancer genome sequencing studies?
Identify cancer driver genes
They provide cells with a growth advantage when mutated
Why are cancer biobanks paired with blood samples from the same patient so valuable to clinicians?
They show the progression/regression of a patient's disease and can reveal what is happening from diagnosis through to follow-up/relapse
At what points through the cancer patient’s journey are blood samples taken?
Diagnosis
Surgery
Chemo start
Chemo end
Follow-up
Relapse, if applicable
Outline the steps of the NGS data analysis workflow
- Assessment of Quality
- Aligning sequences
- Identifying variants
- Annotating variants
- Visualizing NGS data
What is NGS?
Next-generation sequencing
An emerging technology that determines DNA/RNA sequences for the whole genome or for specific regions of interest
What happens during assessment of quality in NGS data analysis?
- NGS reads are evaluated to remove, correct, or trim reads that don’t meet standards
- Errors include base calling errors, poor quality reads.
- This is mostly automated.
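A minimal sketch of the quality step, assuming reads come as FASTQ-style sequence/quality strings with Phred+33 encoding. The function names and the thresholds (minimum quality 20, minimum length 5) are illustrative assumptions, not values from the course.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20):
    """Trim bases below min_quality from the 3' end of a read."""
    scores = phred_scores(quality_string)
    end = len(sequence)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


def passes_filter(sequence, quality_string, min_length=5, min_mean_quality=20):
    """Keep a read only if it is long enough and its mean quality is acceptable."""
    if not sequence or len(sequence) < min_length:
        return False
    scores = phred_scores(quality_string)
    return sum(scores) / len(scores) >= min_mean_quality


# Hypothetical read: six high-quality bases followed by two poor ones.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIIII##")
keep = passes_filter(trimmed_seq, trimmed_qual)
```

Here the two low-quality bases at the 3' end are trimmed and the remaining read is kept. Dedicated QC tools do considerably more than this.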
What are NGS reads?
Short sequences produced by randomly fragmenting the genome and sequencing the fragments; the reads are then re-assembled
What happens during the aligning sequences phase of NGS data analysis?
Reads are aligned to a reference genome, e.g. one from the GRC (Genome Reference Consortium).
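A toy illustration of what aligning a read means, assuming a tiny reference and allowing only mismatches (no gaps). The function `align_read` and its `max_mismatches` parameter are hypothetical; real aligners use indexed references and handle gaps, so this only shows the idea.

```python
def align_read(read, reference, max_mismatches=1):
    """Return (start, mismatches) of the best ungapped placement, or None."""
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


reference = "GATTACAGATTTCA"   # made-up reference sequence
exact_hit = align_read("TTACA", reference)
near_hit = align_read("TTTGA", reference)   # differs from the reference at one base
no_hit = align_read("CCCCC", reference)
```

A read that matches with one mismatch is still placed; that mismatch is exactly the kind of difference the variant-identification step later examines.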
What happens during the identifying variants phase of NGS data analysis?
Compares the difference between patient tumour and reference.
Sequence coverage is important in this stage, as identified mutations should be supported by multiple reads.
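A rough sketch of the "supported by multiple reads" idea, assuming the aligned reads have already been summarised as a pileup (the bases observed at each reference position). The function name, the pileup layout, and the threshold of 3 supporting reads are assumptions for illustration; real variant callers use statistical models and usually a matched normal sample.

```python
from collections import Counter


def call_variants(reference, pileup, min_supporting_reads=3):
    """Report non-reference bases seen in at least min_supporting_reads reads.

    pileup maps a 0-based reference position to the list of bases observed there.
    Returns tuples of (position, ref_base, alt_base, supporting_reads, depth).
    """
    variants = []
    for pos in sorted(pileup):
        ref_base = reference[pos]
        counts = Counter(pileup[pos])
        depth = sum(counts.values())
        for base, n in sorted(counts.items()):
            if base != ref_base and n >= min_supporting_reads:
                variants.append((pos, ref_base, base, n, depth))
    return variants


# Hypothetical pileup: position 1 has three reads supporting T, position 2 only one supporting A.
calls = call_variants("ACGT", {1: ["C", "C", "T", "T", "T"], 2: ["G", "G", "A"]})
```

The single read showing A at position 2 is ignored, since one read could easily be a sequencing error.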
What is Coverage in NGS?
The average number of reads that align to/cover known reference bases
What is depth of coverage?
The number of reads of a given nucleotide in an experiment
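The two definitions above can be sketched as follows. Average coverage is estimated here with the common approximation (number of reads × read length) / genome length, and depth is counted per base from read placements. The function names and the example numbers are illustrative assumptions.

```python
def average_coverage(num_reads, read_length, genome_length):
    """Approximate average coverage: total sequenced bases / reference length."""
    return num_reads * read_length / genome_length


def depth_per_base(alignments, region_length):
    """Count how many reads cover each position of a region.

    alignments is a list of (start, length) pairs, 0-based, on the region.
    """
    depth = [0] * region_length
    for start, length in alignments:
        for pos in range(max(start, 0), min(start + length, region_length)):
            depth[pos] += 1
    return depth


mean_cov = average_coverage(num_reads=600, read_length=150, genome_length=3000)
depths = depth_per_base([(0, 4), (2, 4), (3, 2)], region_length=6)
```

In this made-up example the average coverage is 30x, while the depth at individual positions ranges from 1 to 3 reads, which is why an average alone can hide poorly covered bases.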
Why is coverage an important factor for variant detection?
- Determines whether a variant can be called with a certain degree of confidence at a particular base position
- The higher the depth of coverage, the more likely you are to find all mutations
What are the preferred depths of coverage for normal versus cancer DNA?
Normal: 30x
Cancer: 60x
Why is the preferred depth of coverage for Cancer DNA higher than for normal DNA?
Due to tumour heterogeneity.
Some parts of the tumour may carry a mutation whilst other areas may not. You therefore need more coverage for tumour DNA, as normal tissues tend to be more homogeneous.
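A hedged worked example of why heterogeneity pushes the depth up. It assumes each read independently carries the mutation with probability equal to the fraction of mutant DNA in the sample (a simple binomial model, ignoring sequencing error), and that a call needs at least 3 supporting reads. Both the model and the threshold are assumptions, not part of the notes.

```python
from math import comb


def prob_detect(depth, variant_fraction, min_supporting_reads=3):
    """Probability of seeing at least min_supporting_reads mutant reads at a site.

    Binomial model: each of `depth` reads is mutant with probability variant_fraction.
    """
    upper = min(min_supporting_reads, depth + 1)
    p_fewer = sum(
        comb(depth, k) * variant_fraction ** k * (1 - variant_fraction) ** (depth - k)
        for k in range(upper)
    )
    return 1 - p_fewer


# A subclonal mutation present in 10% of the sequenced DNA (illustrative figure).
p_at_30x = prob_detect(30, 0.10)
p_at_60x = prob_detect(60, 0.10)
```

Under these assumptions the chance of detecting the subclonal mutation is higher at 60x than at 30x, which matches the reasoning in the answer above.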
What are the 3 different groups of genomic changes in cancer?
- small variants (SNPs, indels; <50 bp change)
- copy number alterations (amplifications, deletions)
- structural variations (inversions, translocations)
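As a small illustration of the size cut-off in the first group, the sketch below labels a variant from its reference and alternate alleles. The function name and the returned labels are hypothetical; the 50 bp boundary is the one given above.

```python
def classify_small_variant(ref, alt, max_small_bp=50):
    """Label a variant as SNP, indel, or too large to count as a small variant."""
    size_change = abs(len(ref) - len(alt))
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"
    if 0 < size_change < max_small_bp:
        return "indel"
    if size_change >= max_small_bp:
        return "larger than a small variant"
    return "other"  # e.g. identical alleles or multi-base substitutions


labels = [
    classify_small_variant("A", "G"),            # single-base substitution
    classify_small_variant("A", "ATTG"),         # 3 bp insertion
    classify_small_variant("A" * 61, "A"),       # 60 bp deletion
]
```

Copy number alterations and structural variations are detected by other means (read depth and read-pair or split-read evidence), so they are not covered by this allele-based check.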
What happens during the annotating variants phase of NGS data analysis?
Identifies disease causing variants. Annotation of SNPs and INDELs provided via computational annotation tools.
What happens during the visualizing NGS data phase of NGS data analysis?
Use visualization tools and genome browsers to visualize variants.
Obtain information about variants.
What information can we obtain by visualizing variants?
- Mapping quality
- Aligned reads
- Annotation information (consequence, impact of the variant, scores from annotation tools)
How may a gene be visualized using the UCSC Genome Browser?
- Gene as a long horizontal line, and exons as small vertical lines along it.
- Arrows to denote direction of gene from promoter to 3’ end.
What is the overall idealized pipeline for cancer genome analysis?
- Sequence data prep and processing (sequencing of matched tumour/normal DNA, alignment to reference genome)
- Dissect and catalogue genomic changes (Nucleotide changes, copy number alterations, structural variation)
- Consequence analysis (recurrent changes, significantly altered genes, biological pathways)
What is the challenge of genome sequencing glioblastomas?
There is strong intra- and inter- tumoural heterogeneity
What % of the mutations in cancer occur in non-coding parts of the genome?
98.5%
Outline the idea of evolutionary conservation
- Genome positions that have been conserved for 100 million years are likely to be important and to have a specific function
- We can observe conserved invariant sequences across species and evolution
- We can use this to identify novel candidate driver genes
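A toy sketch of spotting invariant positions, assuming the sequences from different species have already been aligned to equal length (with "-" for gaps). The function name and the example sequences are made up; real conservation scores come from phylogenetic models over many species, so this only illustrates the idea.

```python
def conserved_positions(aligned_sequences):
    """Return 0-based columns where every aligned sequence has the same base."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in aligned_sequences}
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


# Three hypothetical aligned sequences from different species.
invariant = conserved_positions(["ACGTA", "ACCTA", "ACGTT"])
```

Mutations that fall on such invariant positions are candidates for having a functional effect, which is how conservation can point towards possible driver genes.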