Shane - Lecture 1 Flashcards

1
Q

What can you infer if two sequences are similar?

A

They probably share a common ancestor, have a similar structure and perform a similar biological function

2
Q

What qualifies as a homologue?
(2)

A

A sequence that is more than 100 amino acids or nucleotides long

With at least 25% identical amino acids or 70% identical nucleotides
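
The rule of thumb above can be sketched in code. This is a minimal illustration, not a definitive test for homology: it assumes the two sequences are already aligned (gaps written as "-"), it counts identity over gap-free columns only (other conventions exist), and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of gap-free alignment columns where both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def passes_rule_of_thumb(aligned_a: str, aligned_b: str, is_protein: bool) -> bool:
    """Apply the card's thresholds: >100 residues and >=25% (protein) or >=70% (DNA) identity."""
    length = sum(1 for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-")
    threshold = 25.0 if is_protein else 70.0
    return length > 100 and percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `percent_identity("ACGT-A", "ACCTTA")` compares five gap-free columns, four of which match.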

3
Q

What is the twilight zone?
(2)

A

Protein sequences with between 0 and 20% identical amino acids

A match in this range is not significant: it could have arisen by chance

4
Q

What is E-value?

A

Expectation value

5
Q

What does BLAST do?

A

Searches a database of your choice for sequences similar to the one you have entered; significant similarity suggests homology (shared ancestry)

6
Q

What does an E-value quantify?

A

How likely it is that the match arose by chance

7
Q

What does an E-value close to zero mean?

A

It is very unlikely that the similarity arose due to chance

8
Q

What does an E-value close to 1 mean?

A

It is very likely that the similarity arose due to chance

9
Q

Write a note on BLAST
(4)

A

Quick heuristic alignment algorithm

Hosted by the National Center for Biotechnology Information (NCBI)

Matches a query sequence against database sequences, e.g. DNA against DNA

Accepts either a nucleotide (gene) sequence or a protein sequence as the query

10
Q

How does BLAST work?
(4)

A

It tries to find a small match first, then expands on this match

It divides the query into short words, e.g. 11 nucleotides, and looks for exact matches to them (the heuristic step)

Some mismatches are tolerated as the match is extended

BLAST keeps extending the match until the mismatches lower the alignment score too far
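
The seed-and-extend idea described above can be sketched as follows. This is a simplified illustration under stated assumptions, not BLAST's actual implementation: it only does ungapped extension on exact word matches, and the scoring values (`match`, `mismatch`), the drop-off limit (`x_drop`) and the function names are illustrative choices.

```python
def _extend(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Walk outwards from a seed; return (best score gain, residues kept)."""
    running = best_gain = 0
    length = best_length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        running += match if query[qi] == subject[sj] else mismatch
        length += 1
        if running > best_gain:
            best_gain, best_length = running, length
        elif best_gain - running > x_drop:
            break  # mismatches have cost too much: stop extending
        qi += step
        sj += step
    return best_gain, best_length


def seed_and_extend(query, subject, word_size=11, match=1, mismatch=-3, x_drop=5):
    """Return (score, query_start, subject_start, length) of the best ungapped hit, or None."""
    # Index every word of the subject by its start position.
    words = {}
    for j in range(len(subject) - word_size + 1):
        words.setdefault(subject[j:j + word_size], []).append(j)

    best = None
    for i in range(len(query) - word_size + 1):
        for j in words.get(query[i:i + word_size], []):
            # The exact word match is the seed; extend it in both directions.
            right, right_len = _extend(query, subject, i + word_size, j + word_size,
                                       1, match, mismatch, x_drop)
            left, left_len = _extend(query, subject, i - 1, j - 1,
                                     -1, match, mismatch, x_drop)
            score = word_size * match + left + right
            if best is None or score > best[0]:
                best = (score, i - left_len, j - left_len,
                        left_len + word_size + right_len)
    return best
```

A small `word_size` such as 4 is convenient for toy inputs, e.g. `seed_and_extend("ACGTACGTAC", "GGACGTACGTACGG", word_size=4)`.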

11
Q

What does BLAST stand for?

A

Basic
Local
Alignment
Search
Tool

12
Q

List some uses of BLAST

A

To identify an unknown sequence by trying to match it to something known

Get clues about the function/structure of a protein by finding similar proteins

Map a sequence in a genome

13
Q

What are the two types of BLAST searches?

A

Nucleotide BLAST

Protein BLAST

14
Q

Which type of BLAST is better to use?

A

Protein BLAST searches are more sensitive and more biologically meaningful

15
Q

How do you decide what type of BLAST to use?
(2)

A

Do you have a nucleotide sequence or a peptide sequence?

Do you want a close match or an identical one?

16
Q

What tool can you use to translate a nucleotide sequence into a peptide sequence?

A

ExPASy Translate tool

17
Q

Why is it recommended to use a protein search instead of a gene search?

A

Gene sequences might be identical but have different functions

18
Q

Give examples of specialised BLAST searches

A

BlastX

tblastn

19
Q

Write a note on BlastX
(4)

A

Used when you have a nucleotide sequence but want to find matching proteins

BlastX translates your nucleotide sequence into protein and searches for matches to it

BlastX translates the sequence in all six reading frames

BlastX is often the first analysis performed on newly determined nucleotide sequences

20
Q

How many reading frames does any piece of DNA have?

A

Six reading frames

21
Q

Explain what is meant by DNA having six reading frames
(3)

A

Reading one strand, translation can start at any of three positions = 3 reading frames

Reading the complementary strand in the other direction, translation can again start at any of three positions = 3 reading frames

= 6 reading frames
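
To make the six frames concrete, here is a minimal sketch that translates a DNA string in all of them using the standard genetic code. It assumes an unambiguous A/C/G/T sequence; the frame labels ("+1" to "-3") and helper names are illustrative. The last helper reflects the idea that a stop codon ("*") appearing before the end of a translation hints at the wrong reading frame.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Return the complementary strand read in the opposite direction."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    """Translate three offsets on the given strand and three on its reverse complement."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def has_internal_stop(protein: str) -> bool:
    """True if a stop codon appears before the final position of the translation."""
    return "*" in protein[:-1]
```

For instance, `six_frames("ATGGCCTAA")["+1"]` reads the codons ATG, GCC and TAA.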

22
Q

What indicates a stop codon in a BLASTX search?

A

They will be highlighted in red

23
Q

How can you tell when BLASTX has used the wrong reading frame?

A

The stop codon will appear too early

24
Q

Write a note on tblastn
(5)

A

Searches a translated nucleotide database using a protein query

Rarely used

The reverse of BlastX

Compares the protein query against six-frame translations of the nucleotide sequences

Finds homologous protein coding regions in unannotated nucleotide sequences
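
The choice between the programs covered so far depends on the type of query and the type of database. A small lookup table can summarise this; the function and dictionary names are hypothetical, and only the four programs named in these cards are included.

```python
# (query type, database type) -> BLAST program
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",   # DNA query against DNA database
    ("protein", "protein"): "blastp",         # protein query against protein database
    ("nucleotide", "protein"): "blastx",      # translated DNA query against protein database
    ("protein", "nucleotide"): "tblastn",     # protein query against translated DNA database
}


def choose_program(query_type: str, database_type: str) -> str:
    """Look up the program for a query/database combination."""
    return PROGRAMS[(query_type, database_type)]
```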

25
Q

What does a sequence with refseq_protein in front of it indicate?

A

It indicates the sequence comes from the curated NCBI Reference Sequence (RefSeq) protein collection and is reliable

26
Q

What three things should you look out for in front of a sequence that indicate it is of good quality?

A

refseq_protein
swissprot
pdb

27
Q

Who allocates swissprot headers?

A

Swissprot protein sequences

A European protein database

28
Q

Who allocates ‘pdb’?

A

Protein Data Bank

Sequences from the RCSB Protein Data Bank with experimentally determined structures

29
Q

What should you consider when choosing an algorithm?

A

How close a match you are looking for: a similar sequence or a perfect match

30
Q

List the three protein-protein BLAST algorithms

A

Blastp
PSI-BLAST
PHI-BLAST

31
Q

What is Blastp used for?

A

For standard protein-protein comparison: finding similar proteins in a protein database

32
Q

What is PSI-Blast used for?

A

Finding distantly related proteins
It tolerates some mutations by building a position-specific scoring matrix over repeated searches

33
Q

What is PHI-Blast used for?
(2)

A

Used when you know the protein family has a signature pattern, active site, structural domain, etc.

Looking for another protein with a specific pattern
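
PHI-BLAST patterns are written in PROSITE-style syntax. The sketch below only illustrates the pattern-matching part: it converts a simple pattern to a regular expression and reports where it occurs in a protein sequence. It is an assumption-laden simplification (no anchors, no alignment step), and the function names are hypothetical.

```python
import re

_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern, e.g. 'C-x(2,4)-C', to a regex string."""
    parts = []
    for element in pattern.strip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."                           # x = any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"    # {P} = any residue except P
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(residue + repeat)
    return "".join(parts)


def find_pattern(sequence: str, pattern: str) -> list:
    """Return (0-based start, matched text) for each non-overlapping occurrence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence.upper())]
```

For example, `find_pattern("MKCAACDEFLAQ", "C-x(2)-C")` looks for two cysteines separated by any two residues.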

34
Q

What are the three nucleotide BLAST algorithms?

A

blastn
Megablast
Discontiguous megablast

35
Q

What is blastn used for?

A

Used to find somewhat similar nucleotide sequences

36
Q

What is megablast used for?
(3)

A

Used to find highly similar nucleotide sequences

Very fast

Used to identify nucleotide sequences

37
Q

What is a discontiguous megablast used for?

A

Used to find possible homologies

Finds more dissimilar (divergent) sequences

38
Q

What is the BLAST max score?

A

The score of the single best-aligned segment between the query and a database sequence

39
Q

What is the BLAST total score?

A

The sum of the scores of all aligned segments from the same database sequence

40
Q

What does it mean if total score and max score are the same?

A

Only a single alignment is present

41
Q

What is query coverage?

A

The percentage of the query sequence that is aligned
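
Max score, total score and query coverage can all be derived from the aligned segments reported for one database sequence. The sketch below assumes each segment is given as a hypothetical tuple of (score, query start, query end) with 1-based inclusive coordinates; it is an illustration of the definitions, not BLAST's own code.

```python
def summarise_hit(segments, query_length):
    """Return (max score, total score, query coverage %) for one database sequence."""
    max_score = max(score for score, _, _ in segments)
    total_score = sum(score for score, _, _ in segments)
    covered = set()
    for _, start, end in segments:
        covered.update(range(start, end + 1))  # overlapping positions count once
    coverage = 100.0 * len(covered) / query_length
    return max_score, total_score, coverage
```

With a single segment the max score and total score are equal, which matches the card above on what identical values mean.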

42
Q

What is the E-value?

A

Number of matches with the same score expected by chance
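
As a rough sketch of where this number comes from: in the standard BLAST statistics, the expected number of chance hits with bit score S' is E = m × n × 2^(−S'), where m is the query length and n the database length. The code below assumes raw lengths (BLAST itself uses corrected "effective" lengths), and the function names are illustrative.

```python
import math


def expected_hits(bit_score: float, query_length: int, database_length: int) -> float:
    """E-value: number of hits with at least this bit score expected by chance."""
    return query_length * database_length * 2.0 ** (-bit_score)


def chance_of_at_least_one(e_value: float) -> float:
    """Probability of seeing one or more such chance hits: P = 1 - exp(-E)."""
    return -math.expm1(-e_value)
```

A higher bit score or a smaller database both lower the E-value, and for small E the probability of a chance hit is approximately equal to E itself.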

43
Q

List the three protein databases

A

Reference proteins (refseq_protein)

Swissprot protein sequences (swissprot)

Protein Data Bank proteins (pdb)