Part 2 - Lecture 5/6 - Substitution Scoring Matrices Flashcards by Gia Gupta

What gives us an alignment and a score for similarity for an entire sequence?

global pairwise alignment

What gives us the alignment and score for parts of sequences?

local pairwise alignment

What can precisely indicate interesting residues (nucleotides or amino acids)?

multiple sequence alignment (MSA)

What does MSA depend on?

the accuracy of the pairwise alignments, which in turn depends on the scoring

What are the desirable features of a scoring matrix?

-we can think in terms of mutation and selection
-should nucleotides and amino acids be treated differently, and which properties matter the most?

If you start with a high confidence alignment what do you get?

-no gaps or spaces
-hopefully very few mismatches
-sequences that can be called related sequences

How do you calculate the ways to choose k elements from a set of n elements?

nCk = n! / (k! (n-k)!)
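
The formula can be checked with a minimal Python sketch. The helper name `n_choose_k` is hypothetical; `math.comb` is the standard-library equivalent and is used here only as a cross-check.

```python
import math

def n_choose_k(n: int, k: int) -> int:
    """Ways to choose k elements from n, written out from the factorial formula."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Example: the number of distinct sequence pairs among 4 aligned sequences.
pairs = n_choose_k(4, 2)
print(pairs)  # 6

# Cross-check against the standard-library implementation.
assert pairs == math.comb(4, 2)
```

This count is what gives the number of pairs per column when substitutions are tallied from an alignment of several sequences.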

How do you move from substitution counts to probabilities?

take the number of pairs of nucleotides in a column, multiply by the number of columns, and divide each count by that product
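
A minimal sketch of that normalization, assuming a gap-free alignment given as equal-length strings and unordered pairs (A/G counted together with G/A). The function name and the toy alignment are hypothetical, not taken from the lecture.

```python
from collections import Counter
from itertools import combinations
from math import comb

def substitution_probabilities(alignment):
    """Turn pair counts from a gap-free alignment into probabilities."""
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    counts = Counter()
    # zip(*alignment) walks the alignment column by column.
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            # Sort so that (A, G) and (G, A) land in the same bin.
            counts[tuple(sorted((a, b)))] += 1
    # Pairs per column times number of columns = total pairs observed.
    total_pairs = comb(n_seqs, 2) * n_cols
    return {pair: c / total_pairs for pair, c in counts.items()}

toy_alignment = ["ACGT", "ACGA", "ACTT"]  # hypothetical high-confidence alignment
probs = substitution_probabilities(toy_alignment)
for pair, p in sorted(probs.items()):
    print(pair, round(p, 3))
```

With 3 sequences there are 3 pairs per column, so 4 columns give 12 pairs in total, and the resulting probabilities sum to 1.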

Which exons are important?

the first and last exons

What happens to the part of a gene after a stop codon?

it will be part of the mRNA after splicing but will not be translated

Why make a substitution scoring matrix?

had to do this because there was no way to make databases of genes down to a single nucleotide position of DNA

Why could you have a long untranslated part of a gene?

RNA polymerase not starting at some position

What happens more frequently, has less of an effect on function, and is less penalized in a scoring matrix than a transversion?

transition
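
For reference, a transition swaps a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T), and every other substitution is a transversion. A minimal sketch of that classification, with a hypothetical function name:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(a: str, b: str) -> str:
    """Label an aligned nucleotide pair as match, transition, or transversion."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "match"
    pair = {a, b}
    # Both bases in the same chemical class means a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

print(classify_substitution("A", "G"))  # transition
print(classify_substitution("A", "C"))  # transversion
```

A scoring matrix that follows the card above would give the "transition" cases a milder penalty than the "transversion" cases.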

In a real scoring matrix why are values scaled so that the highest entry is 100?

makes things easier to calculate

How do amino acids affect protein structure?

-hydrophobic residues go inside and hydrophilic residues outside, which affects shape
-not all parts of the protein are important
-need to pay attention to H bonding and to acidic, basic, polar, and nonpolar character
-can the amino acids play the same role in a chemical sense?

If two unrelated sequences are i.i.d., what is pN, and what does that mean for the expected number of matches?

pN = 0.25; 1/4 of the positions are expected to be matches
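
A small sketch of where the 1/4 comes from, assuming each position is drawn independently and all four nucleotides are equally likely. The function names are hypothetical.

```python
def match_probability(base_freqs):
    """Chance that two independent draws give the same base: sum of p squared."""
    return sum(p * p for p in base_freqs.values())

def expected_matches(length, base_freqs):
    """Expected number of matching positions between two unrelated sequences."""
    return length * match_probability(base_freqs)

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(match_probability(uniform))      # 0.25
print(expected_matches(100, uniform))  # 25.0
```

With unequal base frequencies the same sum of squares applies, and the match probability rises above 1/4.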

What is the null hypothesis for two sequences S1 and S2?

S1 and S2 have no more similarity than expected by chance

What is the alternative hypothesis for two sequences S1 and S2?

S1 and S2 are more similar than expected by chance, so they seem related

What is testing hypotheses equivalent to?

comparing models; allows us to compare two models which describe relationships between two factors

What is the probability for two sequences by chance under H0?

What is the probability for two sequences under H1, the alternative hypothesis?

What is the likelihood ratio and what does its value represent?

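it is how many times more likely the sequences are under one model than the other; a value of 5 means that the sequences are 5X more likely to have arisen from our related model than our unrelated model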
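
A minimal sketch of a likelihood ratio for two aligned, gap-free sequences. The two models here are assumptions for illustration, not the lecture's: under H0 every base is independent and uniform (each pair has probability 1/16), and under H1 a hypothetical fraction `p_match_h1` of positions match, spread evenly over the 4 matching pairs and the 12 mismatching pairs.

```python
def likelihood_ratio(s1: str, s2: str, p_match_h1: float = 0.8) -> float:
    """P(alignment | related model) / P(alignment | unrelated model)."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    p_h0 = 1.0
    p_h1 = 1.0
    for a, b in zip(s1, s2):
        # Unrelated model: two independent uniform draws.
        p_h0 *= 0.25 * 0.25
        # Related model (assumed): matches are favored over mismatches.
        if a == b:
            p_h1 *= p_match_h1 / 4
        else:
            p_h1 *= (1 - p_match_h1) / 12
    return p_h1 / p_h0

ratio = likelihood_ratio("ACGT", "ACGA")
print(ratio)  # a value above 1 favors the related model
```

A ratio above 1 supports the related model and a ratio below 1 supports the unrelated model.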

What does it mean if our starting data is symmetric?

-no species are ancestors of others
-substitutions are not all symmetric in their biological rates - dinucleotides are not in equilibrium

How did we use our original scoring scheme?

-add scores corresponding to different alignment positions

In the original scoring scheme what scores were good and bad positions given?

-good positions: positive score
-bad positions: negative score

What is mu (μ) in the original scoring scheme?

the relative weight of matches and mismatches
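
One way such a scheme could look, as a sketch only: the cards do not give the exact values, so the choice of +1 per match and -mu per mismatch is an assumption, and `simple_score` is a hypothetical name.

```python
def simple_score(s1: str, s2: str, mu: float = 1.0) -> float:
    """Add +1 for each matching position and -mu for each mismatch (assumed values)."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1.0 if a == b else -mu for a, b in zip(s1, s2))

# Three matches and one mismatch: 3 - mu.
print(simple_score("ACGT", "ACGA", mu=2.0))  # 1.0
```

Raising `mu` makes each mismatch cost more relative to the reward for a match.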

How do you factor the likelihood ratio to emphasize individual positions?
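
The answer to this card is missing from the deck. As a hedged sketch of the standard factorization, assume the n alignment positions are independent, and let q_{ab} be the probability of seeing bases a and b aligned under the related model H1 and p_a the background frequency of base a under H0 (these symbols are assumptions, not the lecture's notation):

```latex
\frac{P(S_1, S_2 \mid H_1)}{P(S_1, S_2 \mid H_0)}
  = \prod_{i=1}^{n} \frac{q_{a_i b_i}}{p_{a_i}\, p_{b_i}}
```

Each factor belongs to one alignment position, with a_i and b_i the bases of S1 and S2 at position i.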

Is any information lost by taking the log of likelihood ratios?

No information is lost by taking the log of likelihood ratios

Why is it better to add logs than multiply ratios of probabilities?

it is easier computationally for computers and humans

What is the log likelihood scoring scheme?

just take the log of each term in the matrix
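
A minimal sketch of that step, using hypothetical per-pair likelihood ratios and base-2 logs (the base is an assumption). It also shows that adding the log scores gives the log of the multiplied ratios.

```python
import math

def log_odds_matrix(ratios, base=2):
    """Take the log of each likelihood-ratio entry to get additive scores."""
    return {pair: math.log(r, base) for pair, r in ratios.items()}

# Hypothetical ratios: above 1 for a match, below 1 for substitutions.
ratios = {("A", "A"): 3.2, ("A", "G"): 0.5, ("A", "C"): 0.25}
scores = log_odds_matrix(ratios)

# Scoring an alignment: add the per-position log scores.
aligned_pairs = [("A", "A"), ("A", "G"), ("A", "A")]
total_score = sum(scores[p] for p in aligned_pairs)

# Same information as multiplying the ratios and taking the log once.
product = math.prod(ratios[p] for p in aligned_pairs)
assert math.isclose(total_score, math.log(product, 2))
```

Pairs with a ratio above 1 get a positive score and pairs with a ratio below 1 get a negative score.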

What does a score of more than zero mean for the log value?

is a high confidence alignment

Should match scores always be equal?

no they do not have to be

Should there only be one scoring matrix?

no; it depends on which species we compare, because the rates and types of changes vary between species over generations due to evolution

Can we make different scoring matrices for different situations?

yes; you can begin with high confidence alignments that correspond to different time periods

What can we use a scoring matrix to get?

pairwise alignment

What can we use a pairwise alignment to get?

a MSA or multiple sequence alignment

What do we use our MSA or multiple sequence alignment as the basis of?

a scoring matrix

What about function inference using sequence similarity?

(1) it works very well, (2) we can have a problem of drift in biases, and (3) if that problem is recognized and persists, it may be inherent to genomics

(39 cards)