Lecture 6: An Intro to Clinical Bioinformatics Flashcards
Clinical bioinformatics
- putting together a puzzle and seeing how that affects the patient
- sample small pieces of the genome
- some parts are easy to recover and some are not
- some parts of the genome are difficult to sequence
- hard to know where to look
sequencing bits of
- material that we can probably understand
- the coding portion, called the exome
sequencing
- cut the genome into fragments and find the coding sections
- align reads back to the genome (BAM file)
- reads undergo quality checks
- remove over-sequenced reads (duplicates)
- remove poor-quality reads
- complexity of sequence matters
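A minimal sketch of the checking steps above (drop duplicates, drop poor-quality reads). The `Read` record, the duplicate key (same chromosome, position and strand) and both thresholds are assumptions made for illustration; real pipelines run dedicated tools on the BAM file rather than code like this.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Read:
    """Hypothetical, simplified aligned read (not a real BAM record)."""
    name: str
    chrom: str
    pos: int            # leftmost aligned position
    strand: str         # "+" or "-"
    mapq: int           # mapping quality
    base_quals: tuple   # per-base Phred scores


def filter_reads(reads, min_mapq=20, min_mean_base_qual=20):
    """Drop poor-quality reads, then keep one read per duplicate group.

    Assumption: reads sharing (chrom, pos, strand) are duplicates, and the
    one with the highest mean base quality is the one worth keeping.
    """
    best = {}
    for r in reads:
        if r.mapq < min_mapq or mean(r.base_quals) < min_mean_base_qual:
            continue  # poor-quality read
        key = (r.chrom, r.pos, r.strand)
        if key not in best or mean(r.base_quals) > mean(best[key].base_quals):
            best[key] = r
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos, r.strand))


reads = [
    Read("a", "chr1", 100, "+", 60, (30, 30)),
    Read("b", "chr1", 100, "+", 60, (35, 35)),  # duplicate of "a", better quality
    Read("c", "chr1", 200, "+", 5, (30, 30)),   # low mapping quality
    Read("d", "chr1", 300, "-", 60, (10, 10)),  # low base quality
]
kept_names = [r.name for r in filter_reads(reads)]
```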
variants
- real deletion in DNA
- variation within a genome and across genomes
local realignment
- to see that there is a deletion
- picks up small deletions of about 18 to 20 base pairs
- insertions and deletions (indels)
Phred (quality) scores
- individual bases: look at the scores of reads across a sample
- mapping qualities
- base qualities
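A Phred score Q encodes an error probability P as Q = -10 * log10(P), so Q20 means a 1 in 100 chance the base (or mapping) is wrong. The sketch below shows the conversion; the function names are made up for illustration, and the FASTQ decoding assumes the common Phred+33 ASCII offset.

```python
import math


def phred_to_error_prob(q):
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_fastq_quals(qual_string, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in qual_string]


quals = decode_fastq_quals("I5!")
mean_qual = sum(quals) / len(quals)
```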
How do you know when variant is real?
- fewer reads leave a variant call more susceptible to error
- beginnings or ends of reads have little coverage
- blacklisted: the genomics community notes specific regions that perform badly
A gene may be read by the cellular machinery in multiple ways
- this gives rise to a new transcript
Effect on the protein for that transcript
- in silico: an algorithmic approach to figure out what might happen to the protein
- have these variants been seen in a particular gene
codons
- read in groups of three
- synonymous: amino acid is the same
- missense or non-synonymous
- frameshift (indel)
- codon for the stop gets introduced
- protein degraded and not used
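The codon categories above can be sketched as a small classifier. This is an illustrative simplification that assumes the standard genetic code and a single affected codon; the function names and labels are made up here, and real annotation tools also account for transcripts, splice sites and strand.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(ref_codon, alt_codon):
    """Classify a single-base change by its effect on one codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # amino acid is the same
    if alt_aa == "*":
        return "stop gained"  # a stop codon gets introduced
    if ref_aa == "*":
        return "stop lost"
    return "missense"         # non-synonymous


def classify_indel(length_change):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return "in-frame" if length_change % 3 == 0 else "frameshift"
```

For example, `classify_snv("GAA", "GAG")` is synonymous because both codons encode glutamate, while `classify_snv("TAC", "TAA")` introduces a stop.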
24,000 variants in genome
- filter to coding regions and genes likely for the disease, based on the person
- splice: a kind of variant we see
- deal with artifacts of the way we see things
- different per territory
- focus on genes relevant to patient’s disease
- deal with the current condition instead of secondary or incidental findings
gene list
- based on phenotype of that patient
- used for babies and children
want to get to the protein changes that are:
- more rare and damaging
take into account the quality of info
- ask:
- have we seen the variant in that family before
- is it seen before in pop. database
- or was that region covered in the population database, or did the person's ethnicity make them susceptible to it
- has the region never been seen to change?
- are there known mutations in a region that is not fully conserved in other species?
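A rough sketch of how these questions can be turned into a filter: keep variants in the patient's gene list that are rare or absent in a population database, and note when absence is uninformative because the region was not covered. The `Variant` fields, the `prioritise` function and the 1% frequency cut-off are all assumptions for illustration, not values from the lecture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    """Hypothetical annotated variant record."""
    gene: str
    pop_af: Optional[float]  # allele frequency in population database; None if never seen
    region_covered: bool     # was this region covered in the population database?
    seen_in_family: bool     # has the variant been seen in this family before?


def prioritise(variants, gene_list, max_af=0.01):
    """Return (variant, notes) pairs for rare variants in phenotype-relevant genes."""
    kept = []
    for v in variants:
        if v.gene not in gene_list:
            continue  # not relevant to the patient's disease
        if v.pop_af is not None and v.pop_af >= max_af:
            continue  # common in the population, so most likely tolerated
        notes = []
        if v.pop_af is None and not v.region_covered:
            notes.append("absent from database, but region was not covered")
        if v.seen_in_family:
            notes.append("seen in this family before")
        kept.append((v, notes))
    return kept


candidates = [
    Variant("GENE_A", 0.20, True, False),    # common: filtered out
    Variant("GENE_A", 0.0001, True, True),   # rare and seen in family
    Variant("GENE_B", None, False, False),   # absent, but region not covered
    Variant("GENE_C", None, True, False),    # gene not on the list
]
result = prioritise(candidates, gene_list={"GENE_A", "GENE_B"})
```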
All humans have:
- variation, but most of it is fine
big problem in computational methods: figuring out what a protein will do
- 65,000 individuals in database to figure out what variants are tolerated
find regions that are:
- restricted, and areas that are not
common variants
- predicted to have a highly damaging effect
in silico prediction methods are:
- not independent
length of sequence
- changes predictability