L3 Flashcards

1
Q

Retrieval of biological sequences in databases is based on what?

A

Similarity

2
Q

Searching biological sequence databases involves?

A

Submitting a query sequence and performing a pairwise comparison of the query with each individual sequence in the database

3
Q

Requirements for implementing algorithms for sequence database searching include

A
  • sensitivity
  • selectivity
  • speed
4
Q

Sensitivity

A

Refers to the ability to find as many correct hits as possible. The correct hits are considered true positives

5
Q

Selectivity

A

Also called specificity; refers to the ability to exclude incorrect hits. These incorrect hits are considered “false positives.”
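
As a rough illustration of the two definitions above, sensitivity and specificity can be computed from hit counts. This is a minimal sketch: the function names and the counts are hypothetical, and the formulas are the standard true-positive-rate and true-negative-rate definitions rather than anything stated on the cards.

```python
def sensitivity(tp, fn):
    """Fraction of truly related sequences that the search reported."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of unrelated sequences that the search correctly left out."""
    return tn / (tn + fp)


# Hypothetical counts for one database search, invented for illustration.
tp, fn = 45, 5      # related sequences found / missed
tn, fp = 900, 100   # unrelated sequences excluded / wrongly reported

print(sensitivity(tp, fn))  # 0.9
print(specificity(tn, fp))  # 0.9
```

Loosening the reporting threshold typically turns some misses into true positives but also lets more false positives through, which is the trade-off the later cards describe.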

6
Q

Speed

A

The time it takes to get results from database searches

7
Q

An increase in sensitivity leads to

A

a decrease in selectivity

8
Q

An increase in speed leads to

A

a decrease in sensitivity and selectivity

9
Q

What are the types of algorithms in database searching?

A
  • exhaustive
  • heuristic
10
Q

Exhaustive algorithm

A

makes use of a rigorous algorithm to find the best or exact solution for a particular problem by examining all mathematical combinations

11
Q

Heuristic algorithm

A

a computational strategy to find the near optimal solution

12
Q

How do heuristic algorithms take shortcuts?

A

by reducing the search space according to some criteria

13
Q

What are the methods used to infer sequence similarity?

A

Global and Local alignment

14
Q

Local alignment

A

Finds domains and short regions of similarity between a pair of sequences, e.g.
- looking for domains within proteins
- looking for regions of genomic DNA that contain introns

15
Q

Global alignment

A

Finds the optimal alignment over the entire length of the two sequences under comparison, e.g.
- when the genes being aligned have sequences of comparable length
- when the entire gene is homologous
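
The difference between the two alignment types can be sketched with a small dynamic-programming scorer. This is only an illustration under stated assumptions: it uses the classic Needleman-Wunsch (global) and Smith-Waterman (local) recurrences, which the cards do not name, and the function name and the match/mismatch/gap values are hypothetical choices.

```python
def alignment_score(a, b, local, match=1, mismatch=-1, gap=-2):
    """Best alignment score of a and b with a linear gap penalty.

    local=False: global (Needleman-Wunsch style), every residue is aligned.
    local=True:  local (Smith-Waterman style), best-scoring sub-region only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment charges for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


print(alignment_score("ACGT", "TTACGTTT", local=True))   # 4
print(alignment_score("ACGT", "TTACGTTT", local=False))  # -4
```

The short sequence matches a region of the longer one perfectly, so the local score is high, while the global score is dragged down by the gaps needed to span the whole longer sequence.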

16
Q

What does BLAST stand for?

A

Basic Local Alignment Search Tool

17
Q

How does BLAST work

A

It uses heuristics to align a query sequence with all sequences in a database. Its objective is to find high-scoring segments among related sequences.

18
Q

How does BLAST perform sequence alignment

A
  1. Reads in the query sequence.
  2. Creates a list of words from the query sequence (seeding): 3 residues for protein, 11 for DNA sequences.
  3. Searches a sequence database for occurrences of these words.
  4. Scores the word matches using a given substitution matrix.
  5. Extends the matches into a pairwise alignment.
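
The steps above can be sketched as a toy seed-and-extend search. This is a simplified illustration, not BLAST itself: it only finds exact word matches, scores with a flat match/mismatch scheme instead of a substitution matrix, and stops extension with a simple drop-off rule. All function names and parameter values are hypothetical.

```python
def make_words(seq, w):
    """Step 2: map every word of length w in the query to its start positions."""
    words = {}
    for i in range(len(seq) - w + 1):
        words.setdefault(seq[i:i + w], []).append(i)
    return words


def find_seeds(query, subject, w):
    """Step 3: scan a database sequence for exact occurrences of query words."""
    words = make_words(query, w)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in words.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend(query, subject, i, j, w, match=1, mismatch=-1, drop=2):
    """Steps 4-5: extend a seed in both directions without gaps.

    Stops once the running score falls `drop` below the best score seen.
    Returns (best_score, query_start, query_end) with query_end exclusive.
    """
    score = best = w * match
    qi, sj, end = i + w, j + w, i + w
    while qi < len(query) and sj < len(subject) and best - score < drop:
        score += match if query[qi] == subject[sj] else mismatch
        qi += 1
        sj += 1
        if score > best:
            best, end = score, qi
    score = best
    qi, sj, start = i - 1, j - 1, i
    while qi >= 0 and sj >= 0 and best - score < drop:
        score += match if query[qi] == subject[sj] else mismatch
        if score > best:
            best, start = score, qi
        qi -= 1
        sj -= 1
    return best, start, end


query, subject = "GATTACA", "CCGATTACACC"
seeds = find_seeds(query, subject, w=3)
print(seeds[0])                                  # (0, 2)
print(extend(query, subject, *seeds[0], w=3))    # (7, 0, 7)
```

Only database positions that share a word with the query are ever extended, which is how the heuristic avoids aligning the query against every position of every sequence.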
19
Q

The resulting contiguous aligned segment pair without gaps is called what?

A

high-scoring segment pair

20
Q

Database search programs such as BLAST use

A

scoring/substitution matrices

21
Q

Scoring matrices are what

A

empirical weighting schemes

22
Q

Possible identities and substitutions are assigned a score based on the?

A

observed frequencies of such occurrences in alignments of related proteins
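
One common way to turn observed frequencies into scores is a log-odds ratio, the form used by matrices such as BLOSUM. The sketch below assumes that construction; the function name and the frequencies are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def log_odds_score(observed_pair_freq, freq_a, freq_b):
    """Score = log2(observed pair frequency / frequency expected by chance).

    Positive: the pair is seen more often in related proteins than chance predicts.
    Negative: the pair is seen less often than chance predicts.
    """
    expected = freq_a * freq_b
    return math.log2(observed_pair_freq / expected)


# Hypothetical frequencies, not taken from any real matrix.
print(round(log_odds_score(0.25, 0.25, 0.25), 3))     # 2.0  (favoured substitution)
print(round(log_odds_score(0.03125, 0.25, 0.25), 3))  # -1.0 (disfavoured substitution)
```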

23
Q

What does BLASTN do

A

Uses nucleotide sequences as queries to search against a nucleotide sequence database

24
Q

How does BLASTP work

A

Uses protein sequences as queries to search against a protein sequence database. The default word size is 3.

25
Q

How does BLASTX work

A

Uses nucleotide sequences, translated in all six reading frames, as queries to search against a protein sequence database.

26
Q

How does TBLASTN work

A

Uses protein sequences as queries to search against a nucleotide sequence database whose DNA sequences are translated in all six reading frames.

27
Q

How does TBLASTX work

A

Uses nucleotide sequences, which are translated, as queries to search against a nucleotide sequence database whose sequences are also translated.
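
The translated programs (BLASTX, TBLASTN, TBLASTX) compare sequences at the protein level by translating DNA in all six reading frames. A minimal sketch of that translation step follows, assuming the standard genetic code; the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Three forward frames followed by three reverse-strand frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(s[offset:]) for s in strands for offset in range(3)]


print(six_frames("ATGGCCTAA"))
# ['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']
```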

28
Q

What is BLAST used for?

A
  • to detect similarity between sequences of interest.
  • to determine whether there are other plausible alignments between query and target sequences
29
Q

What is the BLAST E-value

A

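It provides information about the likelihood that a given sequence match is purely by chance; formally, it is the number of matches with at least that score expected by chance in a database of that size. The lower the E-value, the less likely the database match is a result of random chance.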

30
Q

BLAST determines the significance of HSPs using the Karlin-Altschul equation, which is

A

$E = k\,m\,N\,e^{-\lambda S}$, where $S$ is the alignment score

31
Q

E stands for

A

the expectation value

32
Q

k and lambda (λ) are what?

A

Karlin-Altschul constants

33
Q

m stands for

A

the number of letters (amino acids/nucleotides) in the query

34
Q

N is the

A

total number of letters (amino acids/nucleotides) in the database
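
With E, k, lambda, m and N defined in the cards above, the Karlin-Altschul equation can be evaluated directly. This is a minimal sketch: the function name is hypothetical, and the values for k, lambda, the lengths and the scores are illustrative placeholders, not parameters of any particular scoring matrix or database.

```python
import math


def expect_value(score, m, n, k, lam):
    """E = k * m * N * exp(-lambda * S).

    score: alignment score S of the HSP
    m:     number of letters in the query
    n:     total number of letters in the database (N)
    k, lam: Karlin-Altschul constants (placeholder values below)
    """
    return k * m * n * math.exp(-lam * score)


# Placeholder inputs, chosen only to show how E responds to the score.
m, n, k, lam = 250, 5_000_000, 0.1, 0.3
for s in (40, 80, 160):
    print(s, expect_value(s, m, n, k, lam))
```

E falls exponentially as the score rises and grows linearly with query and database size, so the same score is less significant in a larger database.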

35
Q

If E < 1e−50 (or 1 × 10^−50),

A

there should be an extremely high confidence that the database match is a result of homologous relationships.

36
Q

If E is between 0.01 and 1e−50,

A

the match can be considered a result of homology

37
Q

If E is between 0.01 and 10,

A

the match is considered not significant, but may hint at a tentative remote homology relationship.

38
Q

If E > 10,

A

the sequences under consideration are either unrelated or related by extremely distant relationships that fall below the limit of detection with the current method.
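
The four E-value ranges from the cards above can be collected into one lookup. This is a sketch only: the function name is hypothetical, and how the exact boundary values (1e−50, 0.01 and 10) are assigned is an assumption, since the cards do not say which range they belong to.

```python
def interpret_e_value(e):
    """Map a BLAST E-value to the rule-of-thumb interpretation on the cards."""
    if e < 1e-50:
        return "extremely high confidence of homology"
    if e < 0.01:  # assumption: 1e-50 itself falls in this range
        return "match can be considered a result of homology"
    if e <= 10:   # assumption: 0.01 and 10 themselves fall in this range
        return "not significant, but may hint at remote homology"
    return "unrelated, or too distantly related to detect with this method"


for e in (1e-80, 1e-5, 0.5, 50):
    print(e, "->", interpret_e_value(e))
```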