block 2 Flashcards by teniola fajuke

Bioinformatics

uses computational methods to study protein sequence, structure and function

what is the point of studying sequences and identifying similarity?

-Similarity indicates conserved function
-Human and mouse genes are more than 80% similar at the sequence level
-But these genes are a small fraction of the genome; most sequences in the genome are not recognizably similar
-Comparing sequences helps us understand function
-Locate a similar gene in another species to understand your new gene

protein domains

Definition: A distinct, independently folding unit of a protein chain.
Characteristics:
Can exist independently in structure and function.
Typically connected to other domains within the same protein.
Functions:
Examples include protein-protein interaction, nucleic acid binding, and catalytic activity.
Structure:
Tertiary structure: Includes the arrangement of secondary structure elements within domains and how domains fit together.
Quaternary structure: Refers to the association of separate polypeptide chains, not domains within the same chain.

what does a high sequence similarity mean?

-structural similarity and functional similarity
-probably the same fold

InterPro: protein family database

Overview: A large database containing ~22,000 protein families, represented by multiple sequence alignments.
Purpose: Helps identify functional regions (domains) and homologous proteins, offering insights into protein functions.
Key Concept: Proteins are made of functional domains, and combinations of these domains create the diverse range of proteins in nature.
Application: Useful for studying protein family relationships and functional annotations.

SCOP

Structural Classification of Proteins.
-sorts proteins into different classes and records the different combinations in which secondary structure elements come together
-SCOP classifies proteins into a hierarchy using the following categories:
◦ Class: Proteins are grouped into classes based on their secondary structure content: all alpha proteins, all beta proteins, alpha and beta proteins with helices and strands interspersed (a/b, mainly parallel beta sheets), and alpha and beta proteins with helices and strands largely segregated (a+b, mainly antiparallel beta sheets).
◦ Fold: Describes the overall shape and arrangement of secondary structures.
◦ Superfamily: Proteins within a superfamily are thought to share a common evolutionary origin.
◦ Family: Proteins within the same family are closely related and share higher sequence similarity.
The SCOP database, along with other databases like CATH, helps in understanding the hierarchy of protein structure, aiding in protein classification and the study of structure and function. The classifications are based on experimentally determined protein structures and are used to infer possible functions from the relationship between structure and function.

why does sequence variation occur?

-Due to random mutations and natural selection
-On the organism level, mutations might lead to disease or death
-A single mutation in a protein is usually harmless, unless it occurs in one of the following:
◦ An active site: enzymes
◦ A binding site: receptors, antibodies, signaling proteins
◦ A site that promotes toxic aggregation: e.g., sickle-cell anemia

sequence formats

FASTA is the most commonly used format
-A protein sequence preceded by a header line that starts with a greater-than symbol (>), followed by an identifier and a description of the protein
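A minimal Python sketch of reading FASTA text: header lines start with ">", and the sequence lines under each header are joined into one sequence. The function name `parse_fasta` and the example records are made up for illustration; in practice a library reader such as Biopython's `SeqIO` is typically used instead.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    A header line starts with '>'; the header text is everything after it.
    All following non-blank lines up to the next '>' are joined into one sequence.
    """
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                # finish the previous record before starting a new one
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))  # last record
    return records


# Hypothetical two-record example; identifiers and sequences are made up.
example = """>seq1 example protein A
MKTAYIAKQR
QISFVKSHFS
>seq2 example protein B
GSHMLEDPVD
"""

for header, seq in parse_fasta(example):
    print(header, len(seq))
```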

sequence alignment and pairwise identity

-To compare two (or more) sequences we first need to align them.
One way to quantify similarity between two aligned sequences is their pairwise sequence identity (with a few caveats):
-Identical amino acids: contribute to identity
-Similar amino acids: contribute to similarity
-Gaps: represent insertions or deletions; they are penalized in the alignment score, and whether gapped positions are counted in the denominator changes the identity value
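A small Python sketch of the denominator caveat: the same alignment gives different percent identities depending on what you divide by. The function `percent_identity` and its three denominator options are illustrative choices, not a specific tool's definition; "-" is assumed to mark a gap.

```python
def percent_identity(aligned_a, aligned_b, denominator="aligned"):
    """Percent identity of two sequences that are already aligned ('-' marks a gap).

    denominator:
      "aligned" - columns where neither sequence has a gap
      "length"  - all alignment columns, gaps included
      "shorter" - length of the shorter ungapped sequence
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_a, aligned_b))
    identical = sum(1 for x, y in columns if x == y and x != "-")
    if denominator == "aligned":
        n = sum(1 for x, y in columns if x != "-" and y != "-")
    elif denominator == "length":
        n = len(columns)
    elif denominator == "shorter":
        n = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    else:
        raise ValueError(f"unknown denominator: {denominator}")
    return 100.0 * identical / n if n else 0.0


a = "HEAGAWGHE-E"
b = "--P-AW-HEAE"
for choice in ("aligned", "length", "shorter"):
    print(choice, round(percent_identity(a, b, choice), 1))
```

Here 5 identical positions give 5/6, 5/11 or 5/7 depending on the denominator, which is why a bare "X% identical" should say how it was computed.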

substitution matrix

-Used in pairwise alignment

-Definition: A scoring system to evaluate sequence alignments by quantifying the likelihood of character substitutions (e.g., amino acids or nucleotides).
Scores: Log-odds values; positive for substitutions seen more often than expected by chance, negative for rare ones.
Types:
PAM: Based on evolutionary changes in closely related sequences.
BLOSUM: Derived from conserved, ungapped blocks of aligned protein sequences (e.g., BLOSUM62).
Uses: Sequence alignment, identifying homologs, and studying evolutionary relationships.
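A simplified Python sketch of where the positive and negative scores come from: a log-odds score compares how often a pair is observed aligned with how often it would pair by chance. The frequencies below are made up, and real PAM/BLOSUM construction involves further steps (sequence clustering, symmetric pair counting, extrapolation for PAM) not shown here.

```python
import math


def log_odds_score(pair_freq, freq_a, freq_b, scale=2.0):
    """Log-odds score for aligning residue a with residue b.

    pair_freq: how often a is observed aligned with b in trusted alignments
    freq_a, freq_b: background frequencies of a and b
    scale=2.0 expresses the score in half-bit units (BLOSUM62 uses half bits).
    """
    expected = freq_a * freq_b  # pair frequency expected by chance
    return scale * math.log2(pair_freq / expected)


# Made-up frequencies: both residues have background frequency 0.05,
# so a chance pairing is expected at 0.05 * 0.05 = 0.0025.
common_swap = log_odds_score(0.01, 0.05, 0.05)    # seen 4x more often than chance
rare_swap = log_odds_score(0.000625, 0.05, 0.05)  # seen 4x less often than chance
print(round(common_swap), round(rare_swap))
```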

classes of pairwise alignment

global alignment = tries to align the entire sequence
Aligns all letters of the query to the target
Suitable for closely related sequences of similar length
local alignment = aligns the regions with the highest similarity
Aligns a substring of the target with a substring of the query
Suitable for more divergent sequences, sequences of different lengths, and sequences that share only a conserved region
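A minimal Python sketch contrasting the two classes by dynamic programming: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. It returns only the optimal score (no traceback) and assumes a toy scoring scheme (match +2, mismatch -1, linear gap -2) instead of a substitution matrix with affine gap penalties.

```python
def alignment_score(a, b, mode="global", match=2, mismatch=-1, gap=-2):
    """Optimal alignment score by dynamic programming.

    mode="global": Needleman-Wunsch (every residue of both sequences is aligned).
    mode="local":  Smith-Waterman (best-scoring pair of substrings; scores floor at 0).
    """
    n, m = len(a), len(b)
    # F[i][j]: best score aligning prefixes a[:i] and b[:j] (global),
    # or best local alignment ending at a[i-1] and b[j-1] (local)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    if mode == "global":
        # leading gaps are penalized in global alignment
        for i in range(1, n + 1):
            F[i][0] = i * gap
        for j in range(1, m + 1):
            F[0][j] = j * gap
    elif mode != "local":
        raise ValueError("mode must be 'global' or 'local'")
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                       F[i - 1][j] + gap,    # gap in b
                       F[i][j - 1] + gap)    # gap in a
            if mode == "local":
                cell = max(cell, 0)  # start a fresh local alignment
                best = max(best, cell)
            F[i][j] = cell
    return F[n][m] if mode == "global" else best


print(alignment_score("WWWHEAGAW", "HEAGAW", "global"))
print(alignment_score("WWWHEAGAW", "HEAGAW", "local"))
```

The global score is lowered by the gaps needed to cover the extra "WWW", while the local score only counts the shared "HEAGAW" region.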

sequence identity and homology

Homology vs Sequence Similarity:

Sequence similarity refers to the comparison of two sequences (e.g., DNA, RNA, or protein) to quantify how similar they are. This can be measured using:

A score, or pairwise sequence identity (e.g., “the two sequences are 76% identical”).

Homology refers to the evolutionary relationship between sequences, i.e., whether two genes or proteins share a common ancestor. This is a hypothesis based on sequence comparison, not something that can be directly measured.

Incorrect statement: “Two sequences are 76% homologous.” This is not meaningful. Homology isn’t quantified as a percentage.

homologs

genes sharing a common origin

orthologs

genes originating from a single ancestral gene in the last common ancestor of the compared genomes (speciation is the key event)
- are homologs

paralogs

genes related via duplication (gene duplication is the key event)
are homologs

What is the difference between tertiary and quaternary structure?

Tertiary structure describes how units within domains associate and how domains fit together. Quaternary structure describes how separate polypeptide chains associate with each other.

What is a substitution matrix used for?

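A substitution matrix is used in pairwise alignment to score how likely one amino acid is to be replaced by another. The scores are derived from substitution frequencies observed in related proteins, which reflect the biochemical and biophysical similarity of the amino acids. Examples include BLOSUM62 and PAM250.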

What are the two main types of pairwise alignment?

The two main types of pairwise alignment are global alignment, which aligns the entire sequence, and local alignment, which aligns the regions of highest similarity.

What is the main purpose of global alignment and local alignment?

Global alignment is used to identify folds and sequence homology, while local alignment is used to identify motifs and gene duplication events.

What is sequence identity?

Sequence identity is the fraction (usually given as a percentage) of aligned positions that contain identical amino acids. It is a main tool for establishing sequence homology and inferring similarity in folds.

What is BLAST?

BLAST (Basic Local Alignment Search Tool) is a heuristic pairwise alignment tool that searches for local similarities between sequences, approximating the Smith-Waterman algorithm
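A simplified Python sketch of the heuristic idea behind BLAST: find short exact word matches ("seeds") between query and target, then extend each seed without gaps until the score drops too far below its best (an X-drop rule). Real protein BLAST differs: it also seeds on similar "neighborhood" words scored with a substitution matrix, uses a two-hit rule, and performs gapped extension. The function names, scoring values and sequences here are hypothetical.

```python
from collections import defaultdict


def word_hits(query, target, k=3):
    """Positions (i, j) where query[i:i+k] == target[j:j+k] (exact word matches)."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)  # every length-k word in the query
    hits = []
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], []):
            hits.append((i, j))
    return hits


def extend_hit(query, target, i, j, k=3, match=2, mismatch=-1, x_drop=3):
    """Ungapped extension of a word hit in both directions with an X-drop stop."""
    seed_score = k * match  # the word itself matches exactly
    best_right = score = 0
    qi, tj = i + k, j + k
    while qi < len(query) and tj < len(target):
        score += match if query[qi] == target[tj] else mismatch
        best_right = max(best_right, score)
        if best_right - score > x_drop:  # score fell too far below its best: stop
            break
        qi, tj = qi + 1, tj + 1
    best_left = score = 0
    qi, tj = i - 1, j - 1
    while qi >= 0 and tj >= 0:
        score += match if query[qi] == target[tj] else mismatch
        best_left = max(best_left, score)
        if best_left - score > x_drop:
            break
        qi, tj = qi - 1, tj - 1
    return seed_score + best_left + best_right


query = "MKHEAGAWLL"  # made-up sequences
target = "PPHEAGAWQQ"
hits = word_hits(query, target)
print(hits)
print([extend_hit(query, target, i, j) for i, j in hits])
```

Only seeded regions are examined, which is what makes this kind of search fast but approximate compared with full Smith-Waterman.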

What is the most common input format for BLAST?

The most common input format for BLAST is the FASTA format

What does the e-value in BLAST output represent?

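The e-value in BLAST output represents the number of alignments with a score at least as high as the observed one that would be expected by chance when searching a database of that size.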
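A Python sketch of the Karlin-Altschul relationship behind the e-value, E = K·m·n·e^(-λS), where S is the raw alignment score, m the query length and n the database length, together with the equivalent bit-score form E = m·n·2^(-S'). The λ and K values and the lengths below are illustrative assumptions; real values depend on the scoring matrix and gap penalties, and BLAST also applies length corrections not shown here.

```python
import math


def blast_evalue(raw_score, query_len, db_len, lam, K):
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return K * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, K):
    """Normalized score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * raw_score - math.log(K)) / math.log(2)


# Illustrative parameters only: real lambda and K depend on the scoring matrix
# and gap penalties, and real BLAST also corrects m and n for edge effects.
LAM, K = 0.267, 0.041
m, n = 250, 100_000_000  # query length and total database length (hypothetical)

for S in (40, 70, 100):
    print(S, round(bit_score(S, LAM, K), 1), f"{blast_evalue(S, m, n, LAM, K):.2e}")
```

Because E grows linearly with database size, the same alignment score is less significant in a bigger database.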

What is Multiple Sequence Alignment (MSA) used for?

MSA is used to align multiple sequences to identify conserved regions, motifs, and patterns. MSA can also be used to study sequence conservation, gene duplication events, and amino acids that are important for binding or catalysis.
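A minimal Python sketch of reading conservation off an MSA: for each column, take the most common residue and the fraction of sequences that carry it. The toy alignment is made up, and "fraction of sequences with the most common residue" is just one simple conservation measure among several.

```python
from collections import Counter


def column_conservation(msa):
    """Return (most common residue, fraction of sequences carrying it) per column."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa if seq[col] != "-")
        if not counts:
            result.append(("-", 0.0))  # column made only of gaps
            continue
        residue, count = counts.most_common(1)[0]
        result.append((residue, count / len(msa)))
    return result


# Made-up toy alignment of four short sequences.
msa = ["HEAGAW", "HDAGAW", "HEVG-W", "HEAGSW"]
conservation = column_conservation(msa)
consensus = "".join(res for res, _ in conservation)
fully_conserved = [i for i, (_, frac) in enumerate(conservation) if frac == 1.0]
print(consensus, fully_conserved)
```

Fully conserved columns are the natural candidates for residues important for binding or catalysis.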

What can a sequence logo depict?

Sequence logos are a graphical representation of sequence conservation. They depict sequence characteristics, such as protein-binding sites in DNA or functional units in proteins
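A Python sketch of the numbers behind a sequence logo, assuming the common information-content definition: a column's stack height is R = log2(N) - H, where N is the alphabet size (20 for proteins, 4 for DNA) and H is the Shannon entropy of the column, and each letter's height is its frequency times R. Gaps are ignored and no small-sample correction is applied, which is a simplification.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column and per-letter heights.

    Uses R = log2(alphabet_size) - H, where H is the Shannon entropy of the
    residue frequencies in the column. Gaps are ignored and no small-sample
    correction is applied (a simplification).
    """
    residues = [r for r in column if r != "-"]
    total = len(residues)
    if total == 0:
        return 0.0, {}
    counts = Counter(residues)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    info = math.log2(alphabet_size) - entropy
    # each letter's height in the logo is its frequency times the column's information
    heights = {r: (c / total) * info for r, c in counts.items()}
    return info, heights


msa = ["HEAGAW", "HDAGAW", "HEVG-W", "HEAGSW"]  # made-up toy alignment
for i in range(len(msa[0])):
    column = [seq[i] for seq in msa]
    info, heights = column_information(column)
    print(i, round(info, 2), {r: round(h, 2) for r, h in heights.items()})
```

A fully conserved protein column reaches the maximum of log2(20), about 4.32 bits, so conserved positions appear as tall single letters.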

What is Anfinsen's Hypothesis regarding protein folding?

The native structure of a protein is determined solely by its amino acid sequence

What is the Molten Globule Hypothesis?

The molten globule is an important intermediate in the protein folding pathway, as the polypeptide chain transitions from an unfolded to a folded state. It has most of the secondary structure of the native state but is less compact and lacks the tight, specific packing of side chains in the protein's interior.

What is Levinthal's paradox?

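A protein chain has an astronomically large number of possible conformations, so it could never find its native state by randomly trying them all, yet real proteins fold in milliseconds to seconds. For the same reason, simply calculating the interatomic interactions of every conformation to find the lowest-energy one is computationally impossible, even with advanced computing.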
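A back-of-the-envelope Python calculation of the paradox, under textbook-style assumptions that are not from these cards: a 100-residue chain, 3 possible conformations per residue, and 10^13 conformations sampled per second. The function name and the approximate age of the universe (about 1.38 × 10^10 years) are included only for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate


def levinthal_search_years(residues=100, states_per_residue=3,
                           conformations_per_second=1e13):
    """Years needed to visit every conformation once by exhaustive search (assumed rates)."""
    conformations = states_per_residue ** residues  # exact integer, e.g. 3**100
    seconds = conformations / conformations_per_second
    return conformations, seconds / SECONDS_PER_YEAR


conformations, years = levinthal_search_years()
print(f"{conformations:.2e} conformations, {years:.2e} years")
print(f"about {years / AGE_OF_UNIVERSE_YEARS:.0e} times the age of the universe")
```

Even with generous sampling speed, exhaustive search takes vastly longer than the age of the universe, so folding cannot be a random search.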

What is the general idea of how proteins fold in practice?

What is the relationship between misfolded proteins and disease?

Misfolding and disruption of protein structure can lead to disease, including neurodegenerative diseases such as Alzheimer's and Parkinson's.

What is the role of chaperones in protein folding?

Chaperones help proteins fold correctly

What are intrinsically disordered proteins (IDPs)?

IDPs lack a single stable three-dimensional structure. Their disordered regions can adopt many different conformations, which lets them interact with numerous different partner proteins.

What is the purpose of finding homologs through sequence analysis in protein structure prediction?

To predict the structure of an unknown sequence and predict its function

What is homology modelling or threading (comparative modelling)?

If homologs are found through sequence analysis, then homology modelling can be used to predict structure

What is de-novo structure prediction?

If no homologs can be identified, structures can be predicted de-novo using only sequence information. This was an extremely hard problem until recently

What are AlphaFold and Rosetta?

AlphaFold and Rosetta are examples of tools that have caused a paradigm shift in protein structure prediction, changing how the structures of proteins are predicted.