Lecture 7: Protein Domain Structure: Comparison and Classification Flashcards

1
Q

What is the general idea of a structural domain?

A

A segment of a protein that folds into a stable 3D structure on its own and can exist independently of the rest of the chain. Many proteins are made up of multiple domains.

2
Q

What is a protein domain superfamily?

A

A group of protein domains that share a common evolutionary ancestor

3
Q

Within superfamilies, how much of the structure is conserved?

A

At least 30-40% of the structure is conserved.

4
Q

What are the challenges when comparing protein structures?

A

Residue substitutions caused by mutations.

Insertions and deletions (indels) of residues.

The structural core is usually conserved, but regions outside the core can still differ significantly.

5
Q

What is the simplest way to score structural similarity?

A

Calculate the Root Mean Square Deviation (RMSD) between equivalent atoms (usually the Cα atoms) of the two superimposed structures.
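RMSD is the square root of the mean squared distance between matched atoms: RMSD = sqrt((1/N) * sum of d_i^2), where d_i is the distance between the i-th pair of equivalent atoms. A minimal sketch, assuming the two coordinate lists are already superimposed and matched one-to-one (the function name `rmsd` is illustrative):

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed and that
    coords_a[i] is equivalent to coords_b[i] (e.g. matched C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
print(rmsd(a, b))  # 1.0: every atom is offset by 1.0 Å
```

The example also shows why superposition matters: `b` is just `a` shifted by 1 Å, so after optimal superposition its RMSD would be 0, but compared in place it scores 1.0.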

6
Q

How do you get around insertions and deletions?

A

Compare only the secondary structure elements and ignore the variable loop regions.

Use alignment algorithms that explicitly allow for insertions and deletions (gaps).
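To illustrate how an algorithm can explicitly allow for indels, here is a simplified stand-in: a Needleman-Wunsch global alignment over per-residue secondary structure strings (H = helix, E = strand, C = coil, an assumed 3-state notation). Gaps ("-") absorb residues that one protein has and the other lacks. The scoring values are arbitrary assumptions, and real structure aligners score 3D geometry rather than letters.

```python
def align_global(s, t, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; '-' marks an insertion/deletion."""
    n, m = len(s), len(t)
    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if s[i - 1] == t[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # align two residues
                              score[i - 1][j] + gap,             # gap in t
                              score[i][j - 1] + gap)             # gap in s
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_s.append(s[i - 1]); out_t.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_s.append(s[i - 1]); out_t.append("-"); i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))

# The second protein has a loop that is two residues longer.
score, aligned_s, aligned_t = align_global("HHHHCCEEEE", "HHHHCCCCEEEE")
print(score)  # 16
print(aligned_s)
print(aligned_t)
```

All ten residues of the shorter chain are matched and the two extra loop residues fall into gaps, so the helix and strand still line up despite the insertion.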

7
Q

What methods can you use to find secondary structure similarity?

A

Graph Theory

Contact Maps

8
Q

How is graph theory used to score structural similarity?

A

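Secondary structure elements (helices and strands) are represented as nodes.

Edges between nodes represent the distances and angles between those elements.

Two proteins are compared by finding the largest common subgraph: the largest set of elements whose distances and angles match in both graphs.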
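A toy sketch of the graph idea, under several assumptions: each element is a hypothetical tuple (type, midpoint, axis direction), each edge is labelled with the midpoint distance and the angle between axes, and two edges "match" within arbitrary tolerances (2 Å, 20°). The search is brute force, which is exponential and only workable for a handful of elements; real tools use more efficient graph-matching algorithms.

```python
import itertools
import math

def edge(n1, n2):
    """Edge label between two SSE nodes: (midpoint distance, inter-axis angle in degrees)."""
    (_, p1, v1), (_, p2, v2) = n1, n2
    dist = math.dist(p1, p2)
    dot = sum(x * y for x, y in zip(v1, v2))
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    return dist, math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def compatible(g1, g2, pairs, d_tol, a_tol):
    """True if matched nodes have the same type and all their edges agree."""
    if any(g1[i][0] != g2[j][0] for i, j in pairs):
        return False
    for (i1, j1), (i2, j2) in itertools.combinations(pairs, 2):
        d1, a1 = edge(g1[i1], g1[i2])
        d2, a2 = edge(g2[j1], g2[j2])
        if abs(d1 - d2) > d_tol or abs(a1 - a2) > a_tol:
            return False
    return True

def largest_common_subgraph(g1, g2, d_tol=2.0, a_tol=20.0):
    """Brute-force search, largest size first; returns (g1 index, g2 index) pairs."""
    for k in range(min(len(g1), len(g2)), 0, -1):
        for sub1 in itertools.combinations(range(len(g1)), k):
            for sub2 in itertools.permutations(range(len(g2)), k):
                pairs = list(zip(sub1, sub2))
                if compatible(g1, g2, pairs, d_tol, a_tol):
                    return pairs
    return []

g1 = [("H", (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
      ("H", (10.0, 0.0, 0.0), (0.0, 0.0, -1.0)),
      ("E", (5.0, 8.0, 0.0), (1.0, 0.0, 0.0))]
# g2: the same arrangement shifted by 100 Å, but the third element is a helix.
g2 = [("H", (100.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
      ("H", (110.0, 0.0, 0.0), (0.0, 0.0, -1.0)),
      ("H", (105.0, 8.0, 0.0), (1.0, 0.0, 0.0))]
print(largest_common_subgraph(g1, g2))  # [(0, 0), (1, 1)]
```

Because edge labels are internal distances and angles, the 100 Å shift has no effect and no superposition is needed; only the strand/helix type mismatch keeps the third element out of the match.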

9
Q

What are contact maps and how are they used to score structural similarity?

A

A distance map records how far each residue is from every other residue in the protein's folded state; a contact map is derived from it by marking which residue pairs are close enough to be in contact (within a distance cutoff).

Sections of the maps of different proteins are then compared to find matching regions.

This comparison outputs an RMSD score, which is used as the similarity metric.
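A minimal sketch of building both maps and scoring matched sections, assuming C-alpha coordinates, an 8 Å contact cutoff (a common but arbitrary choice), and a known residue correspondence between the two sections. The score here is a distance-map RMSD (the RMSD of differences between equivalent internal distances); the function names are illustrative.

```python
import itertools
import math

def distance_map(coords):
    """All pairwise distances between residues (e.g. C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def contact_map(dmap, cutoff=8.0):
    """1 where two residues are within `cutoff` Å, else 0 (the diagonal is trivially 1)."""
    return [[1 if d <= cutoff else 0 for d in row] for row in dmap]

def section_drmsd(dmap_a, dmap_b, rows_a, rows_b):
    """RMSD between equivalent distances in two same-sized map sections.

    rows_a[k] in protein A is assumed to be equivalent to rows_b[k] in protein B.
    """
    sq = []
    for k, l in itertools.combinations(range(len(rows_a)), 2):
        da = dmap_a[rows_a[k]][rows_a[l]]
        db = dmap_b[rows_b[k]][rows_b[l]]
        sq.append((da - db) ** 2)
    return math.sqrt(sum(sq) / len(sq))

# Protein A: four residues on a straight line, 3.8 Å apart.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
# Protein B: the same segment along another axis, plus one inserted residue (index 3).
b = [(5.0, 0.0, 0.0), (5.0, 3.8, 0.0), (5.0, 7.6, 0.0), (9.0, 9.0, 9.0), (5.0, 11.4, 0.0)]
dm_a, dm_b = distance_map(a), distance_map(b)
print(contact_map(dm_a)[0])  # [1, 1, 1, 0]
print(section_drmsd(dm_a, dm_b, [0, 1, 2, 3], [0, 1, 2, 4]))
```

Skipping row 3 of protein B drops the inserted residue, so the matched sections agree and the score is essentially zero even though the two chains point in different directions.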

10
Q

What is Foldseek and what are its basic principles?

A

Foldseek reduces a 3D structure to a 1D sequence.

It does this using a structural alphabet (3Di) whose letters describe each residue's 3D interactions with nearby residues.

Once this sequence has been created, standard sequence comparison algorithms are used to search and align structures.
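The two-step principle (encode structure as letters, then align letters) can be sketched with a toy alphabet. This is not Foldseek's 3Di alphabet: the four letters, distance bins, and sequence-separation rule below are hypothetical choices for illustration, and the alignment is a plain Smith-Waterman local alignment with arbitrary scores.

```python
import math

def encode(coords, min_sep=3, bins=(5.0, 7.0, 9.0), letters="ABCD"):
    """Toy structural alphabet: one letter per residue, chosen from the distance
    to its closest residue that is at least `min_sep` positions away in sequence."""
    out = []
    for i, p in enumerate(coords):
        dists = [math.dist(p, q) for j, q in enumerate(coords) if abs(i - j) >= min_sep]
        d = min(dists) if dists else float("inf")
        out.append(letters[sum(d >= b for b in bins)])  # A: <5 Å ... D: >=9 Å
    return "".join(out)

def smith_waterman(s, t, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two strings (linear gap penalty)."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

line = [(3.8 * i, 0.0, 0.0) for i in range(5)]
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(encode(line))    # DDDDD
print(encode(square))  # ADDA
print(smith_waterman("XXABCDYY", "ABCD"))  # 8
```

Excluding near neighbours in sequence (`min_sep`) keeps the letters focused on tertiary contacts, since distances between adjacent residues are nearly fixed by the backbone.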

11
Q

What is CATH?

A

CATH is a hierarchical protein domain classification database.

CATH stands for Class, Architecture, Topology, Homologous Superfamily.

It has classified ~600,000 domains into ~5,000 domain superfamilies.
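Each level of the hierarchy adds one number to a dotted code, so a superfamily is identified as C.A.T.H. A small sketch that splits such a code into its levels; the class labels are CATH's own names for classes 1-4, and the function name is illustrative.

```python
CLASS_NAMES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}

def parse_cath_code(code):
    """Split a CATH superfamily code 'C.A.T.H' into its four hierarchy levels."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    c, a, t, h = (int(p) for p in parts)
    return {
        "class": CLASS_NAMES.get(c, f"class {c}"),
        "architecture": f"{c}.{a}",
        "topology": f"{c}.{a}.{t}",
        "homologous_superfamily": f"{c}.{a}.{t}.{h}",
    }

print(parse_cath_code("3.40.50.300"))
# {'class': 'Alpha Beta', 'architecture': '3.40', 'topology': '3.40.50', 'homologous_superfamily': '3.40.50.300'}
```

Each level's identifier includes all the levels above it, which is why two domains sharing "3.40.50" share class, architecture and topology but may belong to different homologous superfamilies.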

12
Q

What are the different classes in CATH?

A

1. Mainly alpha
2. Mainly beta
3. Mixed alpha-beta
4. Few secondary structures

13
Q

How many architectures exist in CATH?

A

40+ (~41)

14
Q

What are divergent evolution and convergent evolution?

A

Divergent evolution is when domains descend from a common ancestor and change over time while keeping a similar structure.

Convergent evolution is when unrelated domains/proteins independently arrive at the same shape because that shape is advantageous.
