Bioinformatics 1 Flashcards

1
Q

What are two types of pairwise alignments?

A

Global alignment and local alignment

2
Q

What is a global alignment?

A

Finds the best alignment across the entire length of both sequences - key for building an MSA.

3
Q

What are global alignments used for?

A

Similar proteins - same protein from different species.

4
Q

What is a local alignment?

A

Tries to identify any common domains/regions between sequences and aligns them - these will be surrounded by unaligned regions.

5
Q

What happens if you use a local alignment on very similar proteins?

A

It will effectively just do a global alignment.
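The difference between the two alignment types can be sketched with a small score-only dynamic programming routine. This is a minimal illustration, not how BLAST itself works: the function name `align_score`, the match/mismatch/gap values and the example sequences are all assumptions chosen for the sketch, and a simple linear gap penalty is used.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score by dynamic programming (linear gap penalty).

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring matching region only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment pays for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)   # and may end anywhere
            score[i][j] = cell
    return best if local else score[-1][-1]


# A shared region surrounded by unrelated flanks (hypothetical sequences).
long_seq = "AAAAGATTACACCCC"
short_seq = "GATTACA"
print(align_score(long_seq, short_seq, local=True))   # scores the shared region only
print(align_score(long_seq, short_seq, local=False))  # penalised for the flanks

# For identical sequences the two modes give the same score.
print(align_score(short_seq, short_seq, local=True) ==
      align_score(short_seq, short_seq, local=False))
```

With the flanked sequences the local score reflects only the common region, while the global score is dragged down by the forced gaps. With identical sequences both modes agree, which is the behaviour described in this card.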

6
Q

What are local alignments used for?

A

Protein sequences with common domains and aligning cDNA to the genomic sequence.

7
Q

Why do you need to use a local alignment for aligning cDNA to the genome and not a global alignment?

A

A global alignment would try to force a single end-to-end alignment, stretching the cDNA across the introns - the exons match, but the overall alignment would be wrong.
A local alignment gives a separate alignment for each exon match in the genome - this identifies where the exons are located in the genome.

8
Q

What database does blastn use?

A

nucleotide

9
Q

What database does blastp use?

A

amino acid

10
Q

What is the query type and database for tblastn?

A
query = amino acid
database = translated nucleotide
11
Q

What is the query type and database for blastx?

A
query = translated nucleotide
database = amino acid
12
Q

What is the query type and database for tblastx?

A
query = translated nucleotide
database = translated nucleotide
13
Q

Which one of all the BLAST programs does not search in a protein database?

A

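blastn - it is the only program that compares sequences at the nucleotide level. Every other program compares amino acid sequences (tblastn and tblastx search a nucleotide database, but only after translating it).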
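The query and database types from the preceding cards can be collected into one lookup table. This is only a study aid: the dictionary name `BLAST_PROGRAMS` and its layout are assumptions made for the sketch, not part of any BLAST tool.

```python
# program -> (query type, database type), as listed in the cards above
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("amino acid", "amino acid"),
    "blastx": ("translated nucleotide", "amino acid"),
    "tblastn": ("amino acid", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}

# Programs where neither side is protein or translated to protein.
nucleotide_only = [
    name
    for name, (query, database) in BLAST_PROGRAMS.items()
    if query == "nucleotide" and database == "nucleotide"
]
print(nucleotide_only)
```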

14
Q

Why can a non-exact match be as informative as a perfect match in an alignment?

A

Because amino acids are mutated during evolution - there is a higher probability of the mutated amino acid being maintained if the chemical/physical properties of the original amino acid are preserved by the mutation. This means that substitutions between sequences can be scored.

15
Q

How do you build a substitution matrix?

A

By choosing a group of similar proteins, aligning them, and scoring each amino acid pair based on how frequently that substitution is observed in the alignments.

16
Q

What was the first substitution matrix and when was it developed?

A

PAM (point accepted mutation), developed in the 1970s.

Note = by Margaret Dayhoff

17
Q

How was PAM developed?

A

Global alignments were made of closely related proteins and the sequence differences between them were examined - this was used to derive a scoring matrix for how often one amino acid mutated to another at a given position.

18
Q

What type of proteins does PAM only work with?

A

Only works with closely related proteins.

19
Q

What is the difference between PAM30 and PAM70?

A

The higher number in the naming scheme denotes lower sequence similarity and larger evolutionary distance.

20
Q

What does BLOSUM stand for?

A

Blocks substitution matrix

21
Q

What kind of proteins does BLOSUM work with?

A

Evolutionarily divergent proteins.

22
Q

Explain the basis of BLOSUM.

A

Performs an MSA of evolutionarily divergent proteins and looks at the conserved regions (blocks) of the proteins. Conserved regions are normally functionally important, so there is more pressure for any amino acids that mutate there to maintain similar properties.

23
Q

How does a BLOSUM62 matrix work?

A

It gives the likelihood of the two aligned amino acids mutating to one another. Adding these up attributes a score to every alignment, so you can see which alignment is the best.

24
Q

What does a positive BLOSUM62 score mean?

A

Conservative substitution - likely to happen

e.g. Lys/Arg = +2

25
Q

What does a negative BLOSUM62 score mean?

A

Unlikely to happen.

e.g. Gly/Leu = -4

26
Q

What is the log odds?

A

The logarithm of an odds ratio - here, how much more often two amino acids are observed aligned than would be expected by chance.

27
Q

What is the equation for log odds?

A

log(observed frequency of AA pairing / expected random frequency of pairing)

28
Q

What log base does BLAST use?

A

base 2

29
Q

If the frequency of Met = 1/100 and leucine = 1/10, what is the log odds if the observed frequency is 1/500?

A

random frequency = 1/100*1/10 = 1/1000
(1/500)/(1/1000) = 2
log2(2) = 1
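The same arithmetic can be checked in a few lines of Python. This is a minimal sketch of the calculation in this card; the function name `log_odds` is an assumption, and `Fraction` is used only so the intermediate ratio is exact.

```python
import math
from fractions import Fraction


def log_odds(observed, freq_a, freq_b):
    """log2(observed pair frequency / expected random pair frequency)."""
    expected = freq_a * freq_b      # chance of the pair arising at random
    ratio = observed / expected     # how much more often than chance
    return math.log2(float(ratio))


met = Fraction(1, 100)
leu = Fraction(1, 10)
observed = Fraction(1, 500)
print(log_odds(observed, met, leu))  # the pair is seen twice as often as chance
```

A ratio above 1 gives a positive log odds, and a ratio below 1 gives a negative one.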

30
Q

How are log odds simplified?

A

They are converted to integers.

31
Q

How is precision of the score maintained when converting the score to an integer?

A

It is multiplied by a scaling factor before being rounded.

32
Q

What are scores that have been scaled and converted to integers called?

A

Raw scores

33
Q

What scaling factor is used for BLOSUM matrices?

A

2
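Why scaling before rounding preserves precision can be seen in a short sketch. The function name `raw_score` and the example log-odds values are assumptions chosen for illustration; only the scaling factor of 2 comes from the card above.

```python
def raw_score(log_odds_bits, scale=2):
    """Scale a log-odds value, then round it to an integer raw score."""
    return round(scale * log_odds_bits)


# Two different (hypothetical) log-odds values.
low, high = 0.2, 0.4

# Rounded directly, both collapse to the same integer.
print(round(low), round(high))

# Scaled by 2 first, they remain distinguishable.
print(raw_score(low), raw_score(high))
```

With a scaling factor of 2 the integer scores are effectively in half-bit units, so a log odds of 1 bit becomes a raw score of 2.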

34
Q

How is the normalised score calculated from the raw score, and why?

A

Use the matrix-specific constant lambda.

Done because raw scores can be misleading as the scaling factors are arbitrary.

35
Q

What is lambda?

A

It is approximately the inverse of the original scaling factor - however, the value may differ slightly because of integer rounding errors.
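The normalisation is usually written as the Karlin-Altschul bit score, S' = (lambda * S - ln K) / ln 2, where K is a second matrix-specific constant not covered in these cards. The sketch below assumes lambda = 0.267 and K = 0.041, values commonly reported for BLOSUM62 with gap penalties of 11 and 1; treat them as illustrative assumptions rather than values to rely on.

```python
import math

# Assumed constants (commonly quoted for gapped BLOSUM62, gap open 11 / extend 1).
ASSUMED_LAMBDA = 0.267
ASSUMED_K = 0.041


def bit_score(raw, lam=ASSUMED_LAMBDA, k=ASSUMED_K):
    """Convert a raw alignment score to a normalised bit score."""
    return (lam * raw - math.log(k)) / math.log(2)


for raw in (50, 100, 200):
    print(raw, round(bit_score(raw), 1))
```

Because the bit score no longer depends on the arbitrary scaling of the matrix, scores from searches using different matrices can be compared.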

36
Q

What does BLAST also introduce to achieve the best possible alignment?

A

Gaps

37
Q

What is the penalty for opening a gap and why is it so high?

A

Penalty = -11

From an evolutionary perspective insertions and deletions are costly, so opening a gap carries a high penalty.

38
Q

What is the penalty for extending a gap?

A

-1

39
Q

What is the penalty for opening a gap and extending a gap in nucleotide alignments?

A
Opening = -5
Extending = -2
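The opening and extension penalties combine into an affine gap penalty. The sketch below assumes the convention that a gap of length k costs the opening penalty plus k times the extension penalty; other tools count the first gap position differently, so the function `gap_penalty` is an illustration rather than a definitive formula.

```python
def gap_penalty(length, gap_open=-11, gap_extend=-1):
    """Affine penalty for one gap: open once, extend per gap position."""
    return gap_open + gap_extend * length


# Protein defaults from the cards above: one long gap vs. three short ones.
print(gap_penalty(3))        # a single gap of length 3
print(3 * gap_penalty(1))    # three separate gaps of length 1

# Nucleotide penalties from this card.
print(gap_penalty(3, gap_open=-5, gap_extend=-2))
```

The high opening cost means one long gap is penalised far less than several separate short gaps covering the same number of positions.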
40
Q

Define BLAST score.

A

The score is calculated by incrementing for matches/similarities and decrementing for mismatches and gaps.
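A raw score of this kind can be sketched for a short, already-aligned pair of sequences. The matrix below is only a small excerpt of BLOSUM62 values, the example sequences are hypothetical, and the gap convention (opening plus one extension per gap position) is an assumption.

```python
# Small excerpt of BLOSUM62 (symmetric, so each pair is stored once).
BLOSUM62_EXCERPT = {
    ("K", "K"): 5, ("R", "R"): 5, ("L", "L"): 4, ("G", "G"): 6,
    ("K", "R"): 2, ("G", "L"): -4,
}


def pair_score(a, b):
    """Look up a substitution score regardless of residue order."""
    key = (a, b) if (a, b) in BLOSUM62_EXCERPT else (b, a)
    return BLOSUM62_EXCERPT[key]


def alignment_score(top, bottom, gap_open=-11, gap_extend=-1):
    """Sum substitution scores and affine gap penalties over aligned columns."""
    total = 0
    in_gap = False
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            # Pay the opening penalty only on the first position of a gap.
            total += gap_extend if in_gap else gap_open + gap_extend
            in_gap = True
        else:
            total += pair_score(a, b)
            in_gap = False
    return total


print(alignment_score("KGL", "RLL"))        # substitutions only
print(alignment_score("KLG-KR", "RLGGKK"))  # includes one gap position
```

For simplicity, adjacent gaps in opposite sequences are treated as one continuous gap here.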

41
Q

Define BLAST identity.

A

The number of residues that are identical in the alignment.

42
Q

Define BLAST gaps.

A

The number of gaps in the alignment.
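Identities and gaps can both be counted directly from the two aligned strings. This sketch assumes gaps are written as "-" and counts every gapped column, which may differ from tools that count gap openings; the function name and sequences are hypothetical.

```python
def identities_and_gaps(top, bottom):
    """Return (identical columns, gapped columns, alignment length)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(1 for a, b in zip(top, bottom) if a == b and a != "-")
    gaps = sum(1 for a, b in zip(top, bottom) if a == "-" or b == "-")
    return identities, gaps, len(top)


identities, gaps, length = identities_and_gaps("KLG-KR", "RLGGKK")
print(f"Identities = {identities}/{length}, Gaps = {gaps}/{length}")
```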

43
Q

Define BLAST e-value.

A

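It is the number of alignments with an equal or better score that would be expected by chance in a database of that size - a measure of how reliable an alignment is. The lower the E-value, the more significant the alignment.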

44
Q

What does the E-value take into account?

A

Size of the query sequence, length of the alignment, overall score and the size of the queried database.
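How these quantities interact can be sketched with the standard Karlin-Altschul form, E = m * n * 2^(-S'), where m is the query length, n is the total database length and S' is the bit score. This is a simplification that ignores the effective-length corrections BLAST applies; the function name and the numbers are assumptions for illustration.

```python
def e_value(bits, query_length, database_length):
    """Expected number of chance alignments scoring at least `bits`."""
    return query_length * database_length * 2 ** (-bits)


# Hypothetical search: 100-residue query against databases of different size.
for db_size in (1_000_000, 100_000_000):
    print(db_size, e_value(50, 100, db_size))
```

The same bit score gives a larger, less significant E-value in a bigger database, and each extra bit of score halves the E-value.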