Lecture 5 Flashcards

Question 1

Q

The SCOP database

Answer

A

Structural Classification of Proteins Database developed in Cambridge

Sorts proteins hierarchically: structural class -> fold -> superfamily -> family.

For example: Small proteins->Cysteine-knot cytokines->Cysteine-knot cytokines->Transforming growth factor beta - more of a functional than structural classification

α-helical

β-sheet

α+β - α-helices and β-sheets in different parts of the protein, no β-α-β motifs

α/β - Helices and sheets assembled from β-α-β motifs

α/β-linear - Line through centres of strands of sheet roughly linear

α/β-barrels - Line through centres of strands of sheet roughly circular

Proteins with little/no secondary structure e.g. soft proteins

Question 2

Q

What does SCOP struggle with?

Answer

A

Domains

Many proteins are multi-domain and SCOP assumes they are single-domain

For example: clotting factors like factor XII:
Fn2-EGF-EGF-Fn1-Kr-SerPr

Factor IX:
Gla-EGF-EGF-SerPr

Difficult to classify evolutionary origins due to domain shuffling.
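
To make the domain-shuffling point concrete, the two architectures above can be compared directly. This is a small sketch only: the domain strings are copied from this card, and the helper names `shared_domains` and `unique_domains` are made up for illustration.

```python
# Domain architectures exactly as written on this card (N- to C-terminus).
FACTOR_XII = "Fn2-EGF-EGF-Fn1-Kr-SerPr".split("-")
FACTOR_IX = "Gla-EGF-EGF-SerPr".split("-")

def shared_domains(a, b):
    """Domain types found in both architectures, in order of first appearance in a."""
    return [d for d in dict.fromkeys(a) if d in b]

def unique_domains(a, b):
    """Domain types found in a but not in b, in order of first appearance in a."""
    return [d for d in dict.fromkeys(a) if d not in b]

shared = shared_domains(FACTOR_XII, FACTOR_IX)
only_xii = unique_domains(FACTOR_XII, FACTOR_IX)
only_ix = unique_domains(FACTOR_IX, FACTOR_XII)
```

The two factors share the EGF and SerPr modules but differ in the rest, so a single whole-protein classification cannot describe both of them well.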

Question 3

Q

How do we define domains?

Answer

A

The Gō plot method

Defines domains from the distances between amino acids relative to the overall size of the protein: residues that sit far apart from one another, in separate clusters, are assigned to different domains.

Conducting a Gō plot:
1. Calculate the radius of the spherical volume of the protein
2. Calculate the distance from the α-carbon of each amino acid to the α-carbons of all the others
3. If the distance is greater than the spherical radius, score +.

Lines are drawn in a triangle to identify protein domains

Never get completely clean triangles in real proteins
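
The three steps can be sketched in Python as below. This is a minimal illustration, not the original method's code: it assumes the α-carbon coordinates are already available as (x, y, z) tuples, and it takes the radius to be the largest distance from the centroid to any α-carbon, which is an assumption made here. The names `protein_radius` and `go_plot` and the toy coordinates are hypothetical.

```python
import math

def protein_radius(ca_coords):
    """Radius of a sphere, centred on the centroid, that encloses every alpha-carbon.
    (Assumed definition of the 'spherical radius' for this sketch.)"""
    n = len(ca_coords)
    centroid = tuple(sum(c[i] for c in ca_coords) / n for i in range(3))
    return max(math.dist(c, centroid) for c in ca_coords)

def go_plot(ca_coords):
    """Return one text row per residue: '+' where the pairwise distance exceeds the radius."""
    radius = protein_radius(ca_coords)
    rows = []
    for a in ca_coords:
        rows.append("".join("+" if math.dist(a, b) > radius else "." for b in ca_coords))
    return rows

# Toy 'protein': two compact clusters of three residues, about 20 units apart.
toy_coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (20, 0, 0), (21, 0, 0), (20, 1, 0)]
plot = go_plot(toy_coords)
```

In this toy case residues 1-3 and 4-6 only score + against each other, so the two blocks without + along the diagonal are the candidate domains. The matrix is symmetric, so only one half needs to be drawn.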

Question 4

Q

What are the disadvantages of Gō plots?

Answer

A

Requires solved structures
Domain boundaries not always clear
Gō method now superseded by sequence-based algorithms

Question 5

Q

Explain modular evolution of proteins

Answer

A

Domain boundaries are usually at exon boundaries
Not all exon boundaries are domain boundaries
Genome rearrangements are important in evolution of new domain combinations

Question 6

Q

How does Pfam build domains?

Answer

A

Start with a high-quality protein structure (X-ray crystallography, good resolution, i.e. a low Å value).
BLAST PDB to find related protein structures.
Align these - maximise structural homology (adjust alignment so secondary structural element boundaries match).
Build a statistical profile (Hidden Markov Model - HMM) of ‘seed’ alignment.

Question 7

Q

Gaining a protein from PDB

Answer

A

Input sequence and protein sequence databank -> BLAST search

BLAST search -> Filter results (E<threshold) -> Multiple sequence alignment -> Position-specific scoring matrix
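
The last step of this pipeline, turning a multiple sequence alignment into a position-specific scoring matrix, can be sketched as follows. This is a simplified illustration under stated assumptions: uniform background frequencies (1/20 per amino acid), a pseudocount of 1, and gaps ignored. Real tools use more elaborate sequence weighting and background models. The names `build_pssm` and `score_sequence` and the toy alignment are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """One dict of log-odds scores (in bits) per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for pos in range(len(alignment[0])):
        # Collect the residues in this column, skipping gaps and unknown symbols.
        column = [seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score_sequence(pssm, seq):
    """Sum the per-position scores for an ungapped sequence of the same length."""
    return sum(col[aa] for col, aa in zip(pssm, seq))

# Toy alignment of filtered hits: conserved cysteines flanking a variable position.
toy_alignment = ["CKC", "CRC", "CKC", "C-C"]
pssm = build_pssm(toy_alignment)
```

Residues that are common in a column score above zero and rare ones score below zero, so `score_sequence(pssm, "CKC")` comes out higher than `score_sequence(pssm, "AKA")`.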

Question 8

Q

Hidden Markov modelling

Question 9

Q

Showing patterns using sequence logos

Answer

A

The height of each DNA base/amino acid letter shows how often it occurs at that position across a set of aligned sequences.
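
A common way to set the letter heights is by information content: the whole column is scaled by how conserved it is, and each letter takes a share proportional to its frequency. The sketch below assumes a DNA alphabet, ignores gaps, and applies no small-sample correction. The function name `logo_heights` and the toy alignment are made up for illustration.

```python
import math
from collections import Counter

def logo_heights(alignment, alphabet="ACGT"):
    """Per-column dict of letter -> height in bits (frequency x column information)."""
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    heights = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in alphabet)
        total = sum(counts.values())
        if total == 0:  # column made up entirely of gaps
            heights.append({})
            continue
        freqs = {c: counts[c] / total for c in alphabet if counts[c]}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        information = max_bits - entropy
        heights.append({c: f * information for c, f in freqs.items()})
    return heights

# Toy alignment: positions 1, 2 and 4 are fully conserved, position 3 varies.
toy_sites = ["TATA", "TATA", "TACA", "TAGA"]
heights = logo_heights(toy_sites)
```

Fully conserved columns give one letter at the full 2 bits, while the variable third column gives short letters, which is the pattern a logo makes visible at a glance.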

Question 10

Q

What software shows DNA/amino acid patterns using sequence logos?

Answer

A

Pfam on a larger scale

Question 11

Q

How does Pfam build domains?

Answer

A

Use the HMM to query GenPept – hmmsearch
Align the new hits to the HMM – hmmalign
Rebuild the HMM to include the new hits – hmmbuild
Repeat as desired, or until there are no new hits
“Structure, structure, structure” (Alex Bateman, founder of Pfam)
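
The search, align, rebuild loop can be sketched as a toy in Python. This does not call HMMER: the `search` argument is a hypothetical stand-in for one full round of hmmsearch, hmmalign and hmmbuild, and the toy database below is invented purely to show the stopping condition.

```python
def iterate_family(seed_members, search, max_rounds=10):
    """Grow a family until a search round returns no new hits (or max_rounds is reached).

    search(members) is a placeholder for: build a profile from the current members,
    query the sequence database with it, and return the identifiers of all hits.
    """
    members = set(seed_members)
    for round_number in range(1, max_rounds + 1):
        hits = set(search(members))
        new_hits = hits - members
        if not new_hits:  # converged: nothing new was found this round
            return members, round_number
        members |= new_hits  # 'rebuild' the profile to include the new hits
    return members, max_rounds

# Toy database: each sequence can only 'find' its listed relatives.
TOY_RELATIVES = {"seedA": {"hit1"}, "hit1": {"hit2"}, "hit2": set(), "unrelated": set()}

def toy_search(members):
    found = set()
    for member in members:
        found |= TOY_RELATIVES.get(member, set())
    return found

family, rounds = iterate_family({"seedA"}, toy_search)
```

Here `hit2` is only reachable once `hit1` has been added to the profile, which shows why the process is repeated rather than run once.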

Question 12

Q

Disadvantages of Pfam?

Answer

A

Domains are defined by an HMM, and the HMM is only as good as the ‘seed’ alignment used to construct it.
HMM building process is iterative, so errors can be magnified.
Curation is uneven because of the large number of domains in Pfam
Viruses under-represented

Question 13

Q

HOMSTRAD

Answer

A

Homologous Structure Alignment Database

Used to collect good seed alignments

Used in construction of globin molecules

Question 14

Q

Not enough structures

Answer

A

Structure determination is far harder than sequencing
Illumina and MinION sequencing have made sequencing ultra-high-throughput
No equivalent technological leap forward for structural biology

The Structural Genomics Consortium

Question 15

Q