block 2 Flashcards

1
Q

Bioinformatics

A

uses computational methods to study protein sequence, structure and function

2
Q

what is the point of studying sequences and identifying similarity?

A

-Similarity indicates conserved function
-Human and mouse genes are more than 80% similar at the sequence level
-But these genes are a small fraction of the genome
-Most sequences in the genome are not recognizably similar
-Comparing sequences helps us understand function
-Locate a similar gene in another species to understand your new gene

3
Q

protein domains

A

Definition: A distinct, independently folding unit of a protein chain.
Characteristics:
Can exist independently in structure and function.
Typically connected to other domains within the same protein.
Functions:
Examples include protein-protein interaction, nucleic acid binding, and catalytic activity.
Structure:
Tertiary structure: Includes the arrangement of units within domains and how domains fit together.
Quaternary structure: Refers to the association of separate polypeptide chains, not domains within the same chain.

4
Q

what does a high sequence similarity mean?

A

-likely structural similarity and functional similarity
-probably the same fold

5
Q

InterPro: protein family database

A

InterPro: Protein Family Database

Overview: A large database containing ~22,000 protein families, represented by multiple sequence alignments.
Purpose: Helps identify functional regions (domains) and homologous proteins, offering insights into protein functions.
Key Concept: Proteins are made of functional domains, and combinations of these domains create the diverse range of proteins in nature.
Application: Useful for studying protein family relationships and functional annotations.

6
Q

SCOP

A

Structural Classification of Proteins.
-The database contains different structural classes and describes the different ways structural elements combine
-SCOP classifies proteins into a hierarchy using the following categories:

Class: Proteins are grouped into classes based on their secondary structure content, including all-alpha proteins, all-beta proteins, alpha/beta proteins (a/b, with alternating alpha and beta elements), and alpha+beta proteins (a+b, with segregated alpha and beta regions).

Fold: Describes the overall shape and arrangement of secondary structures.

Superfamily: Proteins within a superfamily are thought to share a common evolutionary origin.

Family: Proteins within the same family are closely related and share higher sequence similarity.

The SCOP database, along with other databases like CATH, helps in understanding the hierarchy of protein structure, aiding in protein classification and the study of their structure and function. The classifications are based on experimentally determined protein structures and are used to infer possible functions based on the relationship between structure and function.

7
Q

why does sequence variation occur

A

-due to random mutations and natural selection
-On the organism level, mutations might lead to disease or death
A single mutation in a protein is usually harmless, unless it appears in the following:
An active site: Enzymes
A binding site: Receptors, antibodies, signaling proteins
A site promoting toxic aggregation: Sickle-cell anemia

8
Q

sequence formats

A

FASTA is the most commonly used
-a header line on top, starting with a greater-than symbol (>), identifies and describes the sequence; the protein sequence follows on the lines below
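
The FASTA layout described above can be read with a few lines of Python. This is a minimal sketch, not a full parser: the function name `parse_fasta` and the two example records (identifiers and sequences) are made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect them and join later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical example: identifiers and sequences are invented.
example = """>seq1 hypothetical protein A
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical protein B
MKTAYLAKQR
"""

for header, sequence in parse_fasta(example):
    print(header, len(sequence))
```

Wrapped sequence lines are joined into one string per record, so the line width of the file does not matter.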

9
Q

sequence alignment and pairwise identity

A

-To compare two (or more) sequences we need to align them.
-One way to quantify similarity between two aligned sequences is by their pairwise sequence identity (with a few caveats):
-Identical amino acids: Contribute to identity
-Similar amino acids: Contribute to similarity
-Gaps: Contribute to the overall alignment (its length and score), not to identity
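
Pairwise identity can be computed directly from two aligned sequences. The sketch below is illustrative only: the name `percent_identity` and the example strings are invented, and it assumes that columns containing a gap are left out of the denominator. That choice is one of the caveats, since other conventions (for example, dividing by the full alignment length) give different percentages.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned pair: 7 gap-free columns, 6 of them identical.
print(percent_identity("MKTAYIAK", "MKT-YLAK"))
```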

10
Q

substitution matrix

A

Used in pairwise alignment.

Definition: A scoring system to evaluate sequence alignments by quantifying the likelihood of character substitutions (e.g., amino acids or nucleotides).
Scores: Positive for likely substitutions; negative for rare ones.
Types:
PAM: Based on evolutionary changes in closely related sequences.
BLOSUM: Derived from conserved protein regions (e.g., BLOSUM62).
Uses: Sequence alignment, identifying homologs, and studying evolutionary relationships.
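
The "positive for likely, negative for rare" behaviour comes from a log-odds score: the observed frequency of an aligned pair is compared with the frequency expected if the two residues paired by chance. The sketch below is a simplified illustration in the style of BLOSUM scoring (half-bit units, rounded). The function name and all frequencies are made-up values, not real matrix data.

```python
import math


def log_odds_score(p_pair, q_a, q_b, scale=2.0):
    """Rounded log-odds score: scale * log2(observed / expected).

    p_pair: observed frequency of the aligned (ordered) pair
    q_a, q_b: background frequencies of the two residues
    scale=2.0 gives half-bit units (an assumption for this sketch).
    """
    expected = q_a * q_b
    return round(scale * math.log2(p_pair / expected))


# Hypothetical frequencies chosen for illustration only.
print(log_odds_score(0.012, 0.10, 0.06))    # pair seen more often than chance
print(log_odds_score(0.00125, 0.05, 0.10))  # pair seen less often than chance
```

A pair observed more often than chance gets a positive score, and a pair observed less often gets a negative one.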

11
Q

classes of pairwise alignment

A

Global alignment: tries to align the entire sequences
-Aligns all letters from query to target
-Suitable for closely related sequences of similar length
Local alignment: aligns the regions with the highest similarity
-Aligns a substring of the target with a substring of the query
-Suitable for more divergent sequences, sequences of different lengths, and sequences that share a conserved region
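
The difference between the two classes shows up clearly in the dynamic-programming recurrences. The sketch below computes only the best score (no traceback) in the style of Needleman-Wunsch (global) and Smith-Waterman (local). The scoring values (match +1, mismatch -1, linear gap -2) and the function name are assumptions for illustration; real tools use substitution matrices and affine gap penalties.

```python
def alignment_score(query, target, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score for two sequences, global by default."""
    rows, cols = len(query) + 1, len(target) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised, so the first row/column accumulate.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == target[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                # Local: a score never drops below zero, so an alignment
                # can start fresh anywhere; track the best cell seen.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    # Global reads the bottom-right corner; local reads the best cell.
    return best if local else score[-1][-1]


# A short query embedded in a longer target (made-up sequences).
print(alignment_score("ACGT", "TTTTACGTTTTT", local=True))
print(alignment_score("ACGT", "TTTTACGTTTTT"))
```

With these made-up sequences the local score rewards the shared substring, while the global score is dragged down by the gaps needed to span the whole target.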

12
Q

sequence identity and homology

A

Homology vs Sequence Similarity:

Sequence similarity refers to the comparison of two sequences (e.g., DNA, RNA, or protein) to quantify how similar they are. This can be measured using:

An alignment score or pairwise sequence identity (e.g., two sequences are 76% identical).

Homology refers to the evolutionary relationship between sequences, i.e., whether two genes or proteins share a common ancestor. This is a hypothesis based on sequence comparison, not something that can be directly measured.

Incorrect statement: “Two sequences are 76% homologous.” This is not meaningful. Homology isn’t quantified as a percentage.

13
Q

homologs

A

genes sharing a common origin

14
Q

orthologs

A

genes originating from a single ancestral gene in the last common ancestor of the compared genomes (speciation is the key event)
-orthologs are homologs

15
Q

paralogs

A

genes related via duplication (gene duplication is the key event)
-paralogs are homologs

16
Q

What is the difference between tertiary and quaternary structure?

A

Tertiary structure describes how units within domains associate and how domains fit together. Quaternary structure describes how separate polypeptide chains associate with each other.

17
Q

What is a substitution matrix used for?

A

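A substitution matrix is used in pairwise alignment to quantify similarity between amino acids. Its scores are derived from how often substitutions are observed in related sequences, which reflects the biochemical and biophysical properties of the amino acids. Examples include BLOSUM62 and PAM250.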

18
Q

What are the two main types of pairwise alignment?

A

The two main types of pairwise alignment are global alignment, which aligns the entire sequence, and local alignment, which aligns regions of highest similarity

19
Q

What is the main purpose of global alignment and local alignment?

A

Global alignment is used to identify folds and sequence homology, while local alignment is used to identify motifs and gene duplication events

20
Q

What is sequence identity?

A

Sequence identity is the fraction of aligned positions that hold identical amino acids. It is a main tool for inferring sequence homology and similarity in folds.

21
Q

What is BLAST?

A

BLAST (Basic Local Alignment Search Tool) is a heuristic pairwise alignment tool that searches for local similarities between sequences, approximating the Smith-Waterman algorithm

22
Q

What is the most common input format for BLAST?

A

The most common input format for BLAST is the FASTA format

23
Q

What does the e-value in BLAST output represent?

A

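The e-value in BLAST output represents the number of alignments with a score at least as good as the observed one that are expected by chance in a database of that size. The lower the e-value, the more significant the hit.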
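
For intuition, the expected number of chance hits can be written in terms of the bit score S' as E = m * n * 2^(-S'), where m is the query length and n is the database length. The sketch below applies that relation. The function name and the example numbers are made up, and BLAST itself uses effective (edge-corrected) lengths, so real output will differ somewhat.

```python
def expected_hits(bit_score, query_length, database_length):
    """Expected number of chance alignments with at least this bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: 300-residue query against 10**9 database residues.
for bits in (30, 40, 50):
    print(bits, expected_hits(bits, 300, 10**9))
```

Each extra bit of score halves the e-value, and a larger database raises it for the same score.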

24
Q

What is Multiple Sequence Alignment (MSA) used for?

A

MSA is used to align multiple sequences to identify conserved regions, motifs, and patterns. MSA can also be used to study sequence conservation, gene duplication events, and amino acids that are important for binding or catalysis.

25
What can a sequence logo depict?
Sequence logos are a graphical representation of sequence conservation. They depict sequence characteristics, such as protein-binding sites in DNA or functional units in proteins
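
The height of each stack in a sequence logo is the information content of that alignment column: the maximum possible entropy minus the observed entropy. The sketch below is a simplified version that assumes gap-free columns and leaves out the small-sample correction that logo tools usually apply. The function name and the example columns are invented.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    alphabet_size is 20 for proteins and 4 for DNA.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy


# Made-up columns: fully conserved vs. evenly split between two residues.
print(column_information("AAAA"))
print(column_information("AACC", alphabet_size=4))
```

A fully conserved column reaches the maximum (log2 of the alphabet size), while a variable column scores lower. Within a stack, each letter is drawn with a height proportional to its frequency.
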
26
What is Anfinsen's Hypothesis regarding protein folding?
The native structure of a protein is determined solely by its amino acid sequence
28
What is the Molten Globule Hypothesis?
The molten globule is an important intermediate in the protein folding pathway where a polypeptide chain transitions from an unfolded to a folded state. It has most of the secondary structure of the native state but is less compact with no proper packing interactions in the protein's interior
29
What is Levinthal's paradox?
It is the observation that a protein has far too many possible conformations to sample them all by random search in any realistic time, yet real proteins fold quickly. Even with advanced computing, a simple calculation of interatomic interactions over every conformation to find the lowest-energy one is impossible
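
The scale of the problem is easy to work out. The sketch below uses commonly quoted textbook-style numbers as assumptions (100 residues, 3 conformations per residue, 10^13 conformations sampled per second); none of these figures come from the cards themselves.

```python
def levinthal_search_time(residues=100, states_per_residue=3,
                          samples_per_second=1e13):
    """Return (number of conformations, years needed to sample them all)."""
    conformations = states_per_residue ** residues
    seconds = conformations / samples_per_second
    seconds_per_year = 365.25 * 24 * 3600
    return conformations, seconds / seconds_per_year


conformations, years = levinthal_search_time()
print(f"{conformations:.2e} conformations, {years:.2e} years")
```

Under these assumptions an exhaustive search would take on the order of 10^27 years, far longer than the age of the universe, which is why folding cannot be a random search.
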
30
What is the general idea of how proteins fold in practice?
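Proteins do not sample every possible conformation at random. They fold along pathways through partially folded intermediates (such as the molten globule), which narrows the search towards the native state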
31
What is the relationship between misfolded proteins and disease?
Disruption of protein structures can lead to neurodegenerative diseases
32
What is the role of chaperones in protein folding?
Chaperones help proteins fold correctly
33
What are intrinsically disordered proteins (IDPs)?
Proteins or protein regions that lack a stable, fixed three-dimensional structure. Disordered regions can adapt their conformation to many different partners, facilitating interaction with numerous different partner proteins
34
What is the purpose of finding homologs through sequence analysis in protein structure prediction?
To predict the structure of an unknown sequence and predict its function
36
What is homology modelling or threading (comparative modelling)?
If homologs of known structure are found through sequence analysis, then homology modelling can be used to predict the structure of the new sequence, using the homolog's structure as a template
37
What is de-novo structure prediction?
If no homologs can be identified, structures can be predicted de-novo using only sequence information. This was an extremely hard problem until recently
38
What are AlphaFold and Rosetta?
AlphaFold and Rosetta are examples of computational tools that have caused a paradigm shift in protein structure prediction. They have changed the landscape of prediction of protein structures