Protein Function Flashcards

1
Q

What is protein function?

A

The specific role or task that a protein performs within a cell or organism.

2
Q

How is the functional aspect of a protein investigated?

A

Using Gene Ontology (GO), a standardized, machine-readable vocabulary.
It lets researchers specify which aspect of function is being investigated and annotate their findings in a computationally processable format.

3
Q

What are the three ontology trees described by GO that describe the different aspects of protein function?

A
  1. Molecular function
  2. Biological process
  3. Cellular component (cellular location)
4
Q

How does the standard vocabulary provided by GO help with functional predictions of proteins?

A
  • Allows annotations to be computationally processed
  • Provides a standard approach for programs to output their functional predictions.
    Makes it easier to compare and analyze the functions
5
Q

What is homology-based transfer?

A
  • Computational function-prediction method
  • Assigns unannotated proteins the function of their annotated homologs
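
A minimal sketch of homology-based transfer, assuming a similarity search has already been run and its hits are available as (protein ID, E-value) pairs. The function name, the protein IDs, the function labels, and the 1e-4 cutoff are illustrative assumptions, not the interface of any particular tool.

```python
def transfer_annotation(hits, annotations, max_evalue=1e-4):
    """Return the annotation of the most significant annotated homolog, or None."""
    # Keep only hits that are significant AND that actually carry an annotation.
    usable = [(evalue, hit_id) for hit_id, evalue in hits
              if evalue <= max_evalue and hit_id in annotations]
    if not usable:
        return None  # nothing trustworthy to transfer
    _, best_id = min(usable)  # lowest E-value wins
    return annotations[best_id]


# Hypothetical search results for one unannotated query protein.
hits = [("protA", 3e-40), ("protB", 2e-7), ("protC", 0.5)]
# protA is itself unannotated, so it cannot donate a function.
annotations = {"protB": "kinase activity", "protC": "transporter activity"}

result = transfer_annotation(hits, annotations)
print(result)
```

Here the best hit (protA) has no annotation and protC is not significant, so the function is taken from protB.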
6
Q

Why is homology-based transfer commonly used for protein function prediction?

A

It is based on the assumption that two sequences with a high degree of similarity most likely evolved from a common ancestor and are therefore likely to have similar functions.
The approach relies on this relationship between homologous proteins.

7
Q

What is BlastP and how does it work?

A
  • BLAST stands for Basic Local Alignment Search Tool; BlastP is its protein-protein variant.
  • A program used for comparing an unknown protein sequence against a sequence database.
  • It is the fastest and most widely used heuristic tool for pairwise protein sequence comparison.

8
Q

What is the difference between FASTA and BLAST and when would you use each one?

A

Two commonly used programs for comparing an unknown sequence against a sequence database.
FASTA
* Better at DNA searches
* Sometimes misses weak protein matches

BLAST
* Better at protein searches.
* The fastest and most widely used heuristic tool for pairwise protein sequence comparison.

9
Q

What is a graphic display in BLAST and what does the horizontal axis represent?

A

Provides an overview of the alignments.
The horizontal axis corresponds to the query sequence.

10
Q

What do the color codes in a BLAST graphic display indicate?

A

The colors indicate the quality of the match.
* Red = good
* Green = acceptable
* Black = bad

11
Q

What information is included in a Hit List?

A

Provides:
* The sequence accession number and name,
* Description of potential function based on annotation
* Bit score
* E-value
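
The same fields can be read programmatically when BLAST is run from the command line with tabular output (`-outfmt 6`), whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The sketch below parses such output; the two result lines are made-up illustration data, and note that this default tabular layout has no description column.

```python
import csv
import io

COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(text):
    """Parse tab-separated BLAST hits into dicts with numeric score fields."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        hits.append(hit)
    return hits


# Made-up example output: one strong hit and one weak hit.
example = (
    "query1\thitA\t85.0\t200\t30\t0\t1\t200\t5\t204\t1e-100\t350\n"
    "query1\thitB\t30.5\t180\t110\t5\t10\t189\t1\t175\t2e-05\t48.1\n"
)
hits = parse_hits(example)
best = min(hits, key=lambda h: h["evalue"])  # most significant hit
print(best["sseqid"], best["evalue"], best["bitscore"])
```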

12
Q

What does the bit score measure in BLAST search results?

A

Measures the statistical significance of the alignment.
High bit score = good match between the query sequence and database sequence.

13
Q

What is the E-value in BLAST and what does a low E value indicate?

A
  • The expected number of chance alignments.
    (A measure of how likely it is that the similarity occurred by chance)
  • The lower the E-value, the more significant the hit
  • E-value < 10^-4 = good match.
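
The link between bit score and E-value can be sketched with the standard relation E = m × n × 2^(-S'), where S' is the bit score, m the query length, and n the total database length. The lengths and scores below are made-up numbers, and real BLAST uses edge-corrected "effective" lengths, so its reported E-values will differ somewhat from this approximation.

```python
def expected_chance_hits(bit_score, query_length, database_length):
    """Approximate E-value: search-space size scaled down by 2^(-bit score)."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: 250-residue query against 100 million database residues.
QUERY_LEN = 250
DB_LEN = 100_000_000

e_strong = expected_chance_hits(50, QUERY_LEN, DB_LEN)
e_weak = expected_chance_hits(30, QUERY_LEN, DB_LEN)

for score, e in ((50, e_strong), (30, e_weak)):
    verdict = "good match" if e < 1e-4 else "not significant"
    print(f"bit score {score}: E = {e:.3g} ({verdict})")
```

Each extra bit of score halves the E-value, and the same bit score becomes less significant as the database grows.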
14
Q

What information is provided in the Alignments section of BLAST search results?

A
  • % identity
  • Positives
  • Gaps
  • Length of the alignment.

Top line is the query sequence
Bottom line is the subject sequence
The numbers to the right of the sequences indicate the coordinates of the match on the query and subject sequence.

15
Q

What does the % identity in BLAST represent?

A

The number of identical residues divided by the number of matched residues, ignoring gaps.
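
A small sketch of that calculation, following the definition on this card (gap columns are left out of the denominator). The two aligned rows are made-up, and the function name is an assumption for illustration.

```python
def alignment_stats(query_row, subject_row, gap="-"):
    """Count identical, matched (non-gap) and gap columns of a pairwise alignment."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    identical = matched = gaps = 0
    for q, s in zip(query_row, subject_row):
        if q == gap or s == gap:
            gaps += 1          # column where one residue has no partner
        else:
            matched += 1       # both rows have a residue here
            if q == s:
                identical += 1
    percent_identity = 100.0 * identical / matched if matched else 0.0
    return identical, matched, gaps, percent_identity


# Made-up alignment: 8 columns, 2 of them containing a gap.
identical, matched, gaps, pid = alignment_stats("MKT-LLVA", "MRTALL-A")
print(f"{identical}/{matched} identical = {pid:.1f}% identity, {gaps} gap columns")
```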

16
Q

What does the positives field measure in BLAST search results?

A

Gives a measure of the fraction of residues that are either identical or similar, represented by a +.

17
Q

What does the gaps field show in BLAST search results?

A

Shows residues that were not aligned.

18
Q

What are low-complexity segments in BLAST search results?

A

Regions containing many identical residues
masked by BLAST in the query sequence with the letter “X”.

19
Q

What is Identity in sequence alignment?

A

Identity is a measure made on an alignment
E.g. Sequence A can be “32% identical to” Sequence B.

20
Q

What is similarity in sequence alignment?

A

A measure of how close two amino acids are to being identical.
E.g. isoleucine and leucine are similar.

21
Q

What does homology imply in sequence alignment?

A

Implies an evolutionary relationship between sequences.
Homology either exists or does not exist.

For example, Sequence A IS or IS NOT homologous to Sequence B.

22
Q

What is the difference between Homology and Similarity in sequence alignment?

A
  • Homologous implies they share an evolutionary relationship, and usually a similar 3D structure and function.
  • Similar merely implies that their sequences are similar.
23
Q

What is the difference in the criteria used to determine homology between DNA and protein sequences?

A

DNA sequences
Two sequences with more than 70% identity (over 100 nucleotides) are considered homologs.

Protein sequences
Two sequences with more than 25% identity (over 100 amino acids) are considered homologs.

There is a higher bar for two genes to be considered homologous because the genetic code is degenerate: a codon can change and still code for the same amino acid.

24
Q

What is the “twilight zone” in sequence alignment?

A

When two proteins (at least 100 residues long) share less than 25% identity

25
Q

What are sequence motifs?

A

Short conserved regions of a protein. Different regions of a protein evolve at different rates, and motifs are the regions that have changed least.

26
Q

How are sequence motifs used to predict protein function?

A

By identifying short sequences
that are diagnostic of the active site
or binding region of a protein

27
Q

What is an MBD motif and what does its presence mean?

A

Metal Binding Domain
* if a protein contains a MBD motif, it is possible to predict that one of its functions might be to bind metals.
* The presence provides an experimentally testable hypothesis as to protein function.

28
Q

What are the two main techniques to search for sequence motifs?

A
  1. Patterns (motifs)
  2. Profiles
29
Q

What is a consensus sequence?

A

a sequence of nucleotides or amino acids
that represents the most commonly observed base or
amino acid at each position in a sequence alignment.
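
As a rough illustration, a consensus can be derived by taking the most frequent residue in each alignment column. The four aligned sequences are made-up; ties are broken by whichever residue appears first in the column, which is an arbitrary choice for this sketch.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Return the most common residue at each column of an alignment."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    residues = []
    for column in zip(*aligned_sequences):
        # most_common(1) gives the top (residue, count) pair for this column
        residues.append(Counter(column).most_common(1)[0][0])
    return "".join(residues)


# Made-up alignment of four short sequences.
aligned = ["ACDEF", "ACDQF", "GCDEF", "ACNEF"]
result = consensus(aligned)
print(result)
```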

30
Q

What is the advantage of using patterns (motifs) to search for sequence motifs?

A

It’s quick to run and the databases of such patterns are large and extensive.

31
Q

What is the disadvantage of using consensus sequences in pattern (motif) search?

A

Somewhat insensitive:
only patterns which exactly match the consensus sequence are reported as hits,
whereas sequences which almost match are completely ignored.

32
Q

What is a sequence profile?

A

A table that lists the frequencies of each amino acid
in each position of the conserved region
following a multiple sequence alignment
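
A minimal sketch of such a table, built from a made-up alignment, together with a naive way of scoring a candidate segment against it. Real profile methods typically use log-odds scores and pseudocounts; the plain frequency sum here is a simplifying assumption.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_sequences):
    """One dict per alignment column: residue -> observed frequency."""
    profile = []
    for column in zip(*aligned_sequences):
        counts = Counter(res for res in column if res in AMINO_ACIDS)  # ignore gaps
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


def score_segment(profile, segment):
    """Sum the frequency of each residue of the segment at its position."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, segment))


aligned = ["ACDEF", "ACDQF", "GCDEF", "ACNEF"]  # made-up alignment
profile = build_profile(aligned)

print(profile[0])                       # frequencies at the first position
print(score_segment(profile, "ACDEF"))  # the consensus scores highest
print(score_segment(profile, "GCNQF"))  # a near match still gets partial credit
```

Unlike an exact consensus match, the second segment is not rejected outright; it simply scores lower.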

33
Q

How is the sensitivity of sequence profiles compared to patterns in identifying distantly related sequences?

A

Much more sensitive than patterns at identifying distantly related sequences.

34
Q

What is PROSITE?

A

A well-known pattern database containing around 2,000 different families
Uses highly conserved regions to create a signature of multiple patterns (motifs) for each domain family

35
Q

<M-R-[DE]-x(2,4)-[ALT]-{AM}

A

‘x’ - position where any amino acid is accepted
‘[ ]’ - can be occupied by any of the contained amino acids
* [DE] means D or E

‘{ }’ - cannot be occupied by the contained amino acids
* {AM} means any amino acid other than A or M.

Repetition of a pattern component is indicated by following that component with a number or range in parentheses
* x(2,4) corresponds to x-x, x-x-x, or x-x-x-x

A pattern beginning with ‘<’ or ending with ‘>’ is restricted to the N- or C-terminal end of a sequence, respectively
* This one is anchored at the N-terminal end.
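
The notation maps closely onto regular expressions, which gives a quick way to try a pattern on a sequence. The converter below is a sketch that handles only the elements described on this card (single residues, x, [ ], { }, repeat counts, and the < and > anchors); the function name and test sequences are made up for illustration.

```python
import re

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    pattern = pattern.rstrip(".")
    regex = ""
    if pattern.startswith("<"):          # anchored at the N-terminal end
        regex, pattern = "^", pattern[1:]
    anchored_end = pattern.endswith(">")  # anchored at the C-terminal end
    if anchored_end:
        pattern = pattern[:-1]
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            regex += "."                       # any amino acid
        elif core.startswith("{"):
            regex += "[^" + core[1:-1] + "]"   # anything except these
        else:
            regex += core                      # literal residue or [..] set
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
    return regex + ("$" if anchored_end else "")


regex = prosite_to_regex("<M-R-[DE]-x(2,4)-[ALT]-{AM}")
print(regex)
for seq in ("MRDGGAK", "MREQQQQTW", "AMRDGGAK", "MRDGGAA"):
    print(seq, bool(re.search(regex, seq)))
```

The last two sequences fail: one does not start with M-R, and the other ends the motif with A, which {AM} forbids.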

36
Q

What is the limitation of using PROSITE for sequence analysis?

A
  • Patterns in PROSITE are small and have a high chance of appearing at random in proteins,
  • Can lead to over-prediction of short sequences and false positives.
  • Has a limited ability to find complex patterns that do not match the consensus sequence due to slight variations in the test protein.
37
Q

What are profile databases?

A

A more sensitive tool than pattern databases for searching for conserved motifs in proteins.

38
Q

What are some examples of profile databases?

A
  • PRINTS
  • BLOCKS
  • Pfam
  • PROSITE (holds profiles as well as patterns)
39
Q

What is InterPro?

A
  • A central compendium of family and domain descriptions that links different resources,
  • To provide access to a range of diagnostic opportunities for a given query through a single interface.
40
Q

What is phylogenomic profiling?

A

Predicts protein function based on the observation that:
The loci of functionally related proteins tend to be colocated on the chromosome
(in blocks or pedigrees).

A genomic context- and expression-based prediction method.

41
Q

What function-prediction algorithm has phylogenomic profiling given rise to?

A

Phydbac2

Phylogenomic display of bacterial genes

42
Q

How does the “guilt by association” approach contribute to protein function annotation?

A

Predicts protein function based on co-expression or co-location of functionally related proteins.
* It has given rise to algorithms such as Phydbac2
* Useful for annotation of the cellular aspect of protein function.

43
Q

How can protein-protein interaction (PPI) data facilitate protein function annotation?

A

By assuming that physically interacting proteins have similar overall cellular functions.
PPI databases:
* STRING
* DIP
* BioGRID.
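
The idea can be sketched as a simple vote among a protein's annotated interaction partners. The interaction list, protein names, and function labels are invented for illustration; this does not use the interface or data format of any of the databases listed above.

```python
from collections import Counter


def predict_from_partners(protein, interactions, known_functions):
    """Suggest the most common function among annotated interaction partners."""
    partners = set()
    for a, b in interactions:   # interactions are undirected pairs
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    # sorted() keeps the vote order deterministic if two functions tie
    votes = Counter(known_functions[p] for p in sorted(partners)
                    if p in known_functions)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical interaction data for an uncharacterized protein "X".
interactions = [("X", "P1"), ("P2", "X"), ("X", "P3"), ("X", "P4"), ("P1", "P2")]
known_functions = {"P1": "DNA repair", "P2": "DNA repair", "P3": "transport"}

prediction = predict_from_partners("X", interactions, known_functions)
print(prediction)
```

P4 has no annotation and is ignored, so two of the three votes go to the same function.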

44
Q

What is a hydrophobicity plot?

A

A quantitative analysis of the degree of hydrophobicity or hydrophilicity of amino acids of a protein.

45
Q

How do hydrophilic residues and hydrophobic residues interact with the bilayer?

A

Hydrophilic residues
* Favour the hydrophilic region of the bilayer
* External bathing solution or cell cytoplasm

Hydrophobic residues
* Favour the hydrophobic region of the bilayer
* Embedded in the membrane

46
Q

Aromatic Residues

A

Favour lipid-water interface
Y (Tyrosine) and W (Tryptophan)

47
Q

What are some amino acids that “snorkel” to the water surface, and what is their dual nature?

A

Amino acids with long side chains
R (arginine) and K (Lysine)
Dual Nature:
* Hydrophobic chain
* Charged terminus

48
Q

What is the Kyte-Doolittle scale used for?

A

Scoring amino acids for hydrophobicity, to find the hydrophobic regions of a protein.

49
Q

What is the Hopp-Woods scale used for?

A

Scoring amino acids for hydrophilicity, to find the hydrophilic regions of a protein.

50
Q

What can a hydrophobicity profile predict?

A

Structural features of a protein,
or its likely cellular location, i.e. membrane bound or cellular.

51
Q

What does a stretch of ~20 hydrophobic amino acids in a hydrophobicity plot indicate?

A

Likely part of alpha-helix spanning the lipid bilayer
suggesting that the protein is membrane-associated.
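
A sliding-window sketch of this kind of scan, using the Kyte-Doolittle hydropathy values. The window of 19 residues and the cutoff of 1.6 are commonly quoted settings for spotting membrane-spanning stretches and are treated here as assumptions; both test sequences are made-up.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Average hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def has_candidate_tm_segment(sequence, window=19, threshold=1.6):
    """True if any window is hydrophobic enough to suggest a membrane span."""
    return any(avg > threshold for avg in hydropathy_profile(sequence, window))


# Made-up sequences: one with a 20-residue hydrophobic core, one fully polar.
membrane_like = "DKRSE" + "LIVALLIVFAGLLIVALLIV" + "KRDEQ"
soluble_like = "DKRSEQNTDKRSEQNTDKRSEQ"

print(has_candidate_tm_segment(membrane_like))
print(has_candidate_tm_segment(soluble_like))
print(round(max(hydropathy_profile(membrane_like)), 2))
```

Plotting the values from `hydropathy_profile` against window position gives the hydrophobicity plot itself.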

52
Q

What are leader sequences and how do they relate to protein localization?

A

Signals within a protein’s sequence that assist in their processing within the cell and can target proteins to specific compartments within cells.

53
Q

What is SignalP used for?

A

Predicts leader sequences and cleavage sites in both prokaryotes and eukaryotes.

54
Q

What is PSORT used for?

A
  • Searches prokaryotic or eukaryotic sequences for protein sorting signals
  • Reports on the probability of the protein being localized to different compartments within the cell.
55
Q

What are coiled coils?

A

Stable structures formed by two helices winding around each other
Held together by hydrophobic interactions at their interface.

56
Q

Where can coiled coils be found in proteins?

A

Leucine zippers in transcription factors

57
Q

What programs can be used to identify coiled coils?

A
  • PairCoil
  • COILS