Struggle Flashcards by Max Roux

What is bioinformatics?

It is the analysis and conceptualisation of complex biological information.

What is BLOSUM62?

BLOSUM62 is a substitution matrix used for sequence alignment of proteins. BLOSUM matrices score alignments between evolutionarily divergent protein sequences and are derived from local alignments (blocks) of related proteins. BLOSUM62 is built from blocks in which sequences sharing 62% identity or more are clustered together.

Explain Affine Gap Penalties

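Penalises insertions and deletions using two components: a penalty for opening a gap and a smaller penalty for each position by which the gap is extended, so the total cost depends on the gap length. Gap openings have the higher cost.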
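
As a minimal sketch, the cost of a single gap under an affine scheme can be computed as below. The function name and the default penalties (10 to open, 1 to extend) are illustrative assumptions, not values from these cards, and tools differ on whether the first gapped position also pays an extension penalty.

```python
def affine_gap_cost(length, gap_open=10, gap_extend=1):
    """Cost of one gap spanning `length` positions under an affine scheme.

    gap_open and gap_extend are illustrative values (assumptions).
    The first gapped position pays gap_open; every further position
    pays gap_extend.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        return 0
    return gap_open + (length - 1) * gap_extend


# One gap of length 3 costs less than three separate gaps of length 1,
# which is why affine penalties favour a few long gaps over many short ones.
one_long_gap = affine_gap_cost(3)
three_short_gaps = 3 * affine_gap_cost(1)
```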

What is in silico?

Analysis performed on a computer or by computer simulation, for example ligand analysis.

Explain BLAST

BLAST (Basic Local Alignment Search Tool) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences. It uses a heuristic to speed up computation.

What is Dynamic Programming?

Dynamic programming solves a complex problem by breaking it into smaller subproblems (states) and combining their solutions. In sequence alignment it scores every cell of a matrix to find the optimal alignment. It is slow compared with heuristic methods. The steps are: 1. initialisation 2. scoring (filling) the matrix 3. traceback
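
The three steps can be sketched with a small global aligner in the style of Needleman-Wunsch. This is a minimal illustration, assuming a simple linear gap penalty and made-up scores (match 1, mismatch -1, gap -2); the function name is hypothetical.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming (linear gap penalty).

    Scoring values are illustrative assumptions.
    Returns (score, aligned_a, aligned_b).
    """
    def sub(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1

    # 1. Initialisation: first row and column hold cumulative gap penalties.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # 2. Scoring the matrix: each cell takes the best of three moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align pair
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # 3. Traceback: walk from the bottom-right cell back to the origin.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


result = global_alignment("GAT", "GT")
```

The matrix has (len(a) + 1) x (len(b) + 1) cells, so time and memory grow with the product of the two sequence lengths.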

Protein vs DNA

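Protein sequences use an alphabet of 20 characters rather than 4. The genetic code is degenerate (several codons can encode the same amino acid), so protein sequences change more slowly than the underlying DNA. Protein comparisons therefore offer a longer look back in evolutionary time.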

Paralogs

Homologous genes that arose from a duplication event.

Why is DNA used?

To identify cDNA and non-coding regions of DNA, and to identify DNA polymorphisms.

Types of Algorithms?

Uniformative
Ungapped
Gapped

Describe a hierarchical approach?

Different groups are each given a chromosome to sequence.
The groups generate bacterial artificial chromosome (BAC) clones.
Each BAC is divided and shotgun sequenced.
High-fidelity maps identify motifs and allow detection of overlapping sequences.

How many genes were found?

51k

How many genes are protein-coding?

20k

How many genes are non-coding?

20k

What are pseudogenes?

Genes that appear to be protein-coding but have been rendered non-coding by mutation. 18k were found.

How many mRNAs were found, and what does this mean?

98k. This means that roughly 5 mRNAs are made for every protein-coding gene.

Why are multiple sequence alignments (MSAs) done?

To elucidate functional information about proteins and to perform evolutionary analysis.

How are alignments scored?

The aim is to match the maximum number of residues across the sequences.
Scoring is done with the Sum of Pairs measure.
Each column is scored by summing all possible matches, gaps and mismatches.
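
A minimal sketch of Sum of Pairs scoring follows. The scores (match 1, mismatch -1, residue against gap -2, gap against gap 0) and the function names are illustrative assumptions; in practice a substitution matrix such as BLOSUM62 would supply the residue scores.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one pair of characters from a column (illustrative values)."""
    if x == "-" and y == "-":
        return 0  # assumption: two aligned gaps are neutral
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(alignment):
    """Sum the scores of every pair of rows in every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            total += pair_score(x, y)
    return total


# Column 1 (A, A, A) scores 3, column 2 (C, -, T) scores -5,
# column 3 (G, G, G) scores 3.
msa_score = sum_of_pairs(["ACG", "A-G", "ATG"])
```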

What is the E-value?

The Expect value (E) is a parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size. It decreases exponentially as the Score (S) of the match increases.
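
The exponential relationship can be sketched with the Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m is the query length and n the database length. The values of K and lambda below are placeholders chosen for illustration only; real values depend on the scoring system in use.

```python
import math


def expect_value(score, query_len, db_len, k=0.1, lam=0.3):
    """E = K * m * n * exp(-lambda * S).

    k and lam are placeholder values (assumptions), not parameters
    taken from any particular BLAST configuration.
    """
    return k * query_len * db_len * math.exp(-lam * score)


e_low_score = expect_value(50, query_len=100, db_len=1_000_000)
e_high_score = expect_value(60, query_len=100, db_len=1_000_000)
e_bigger_db = expect_value(50, query_len=100, db_len=2_000_000)
```

Raising the score by 10 divides E by exp(10 * lambda), while doubling the database size doubles E.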

What is the affine gap penalty?

A large penalty for opening a gap and a smaller penalty for extending it.

Neighbour Joining

Similar to UPGMA, but it corrects for unequal evolutionary rates between lineages. It creates an unrooted tree.

What are some ways to create a tree?

Distance Matrix Method
Maximum parsimony method
Maximum likelihood method

What is bootstrapping?

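A way of statistically validating a tree.

The data is resampled: alignment columns are drawn at random with replacement, and the tree is rebuilt for each resampled data set.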
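
The resampling step can be sketched as follows. This is a minimal illustration that only produces the resampled alignments; building a tree from each replicate and counting how often each branch recurs is not shown. The function name, the toy alignment and the seed are assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Draw alignment columns at random, with replacement, to build a
    pseudo-replicate with the same dimensions as the original."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]  # toy data
rng = random.Random(0)                      # fixed seed for repeatability
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
```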

How is an MSA measured?

ClustalW builds the alignment progressively, using a phylogenetic (guide) tree. The result is scored with the Sum of Pairs measure. The approach is heuristic.

How does CLUSTAL work?

1. Begin with pairwise alignments of the sequences 2. Build a phylogenetic (guide) tree from them 3. Take the most closely related sequences and align them 4. Repeat with the next most closely related sequence until all sequences are aligned

What is a characteristic of a fully resolved tree?

Every internal node splits into exactly two branches (the tree is bifurcating).

Describe the distance matrix method?

Distance-based methods must transform the sequence data into a pairwise distance matrix for use during tree inference. An example is UPGMA, which stands for Unweighted Pair Group Method with Arithmetic Mean.
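
A minimal sketch of the transformation step, assuming the simplest distance measure (the p-distance, the fraction of compared positions that differ) and skipping positions where either sequence has a gap. The function names and the toy alignment are illustrative.

```python
def p_distance(a, b):
    """Fraction of ungapped aligned positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: gapped positions are ignored
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 0.0


def distance_matrix(alignment):
    """Symmetric matrix of pairwise p-distances."""
    n = len(alignment)
    return [[p_distance(alignment[i], alignment[j]) for j in range(n)]
            for i in range(n)]


matrix = distance_matrix(["ACGT", "ACGA", "TCGA"])
```

A method such as UPGMA or neighbour joining would then cluster the sequences using only this matrix.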

Character based method

Use the aligned sequences directly during tree inference.

Describe the shotgun approach

1. DNA is isolated and chopped into fragments 2. Fragments are cloned into vectors and sequenced 3. Overlapping sequences are used to assemble the reads into contigs 4. Contigs are ordered and joined into scaffolds
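
The assembly step can be sketched with a toy greedy merger that repeatedly joins the two sequences with the longest suffix-to-prefix overlap. This is an illustration under strong assumptions (error-free reads, a single strand, no repeats); real assemblers are far more involved, and the function names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads into contigs by always joining the best-overlapping pair."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no overlaps left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"])
```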

Output of Sanger Sequencing

500-1k bases per read

Output of NGS

Billions of bases per run

Sanger sequencing?

The target DNA is copied many times, making fragments of different lengths. Fluorescent “chain terminator” nucleotides mark the ends of the fragments and allow the sequence to be determined.

What determines the conformations of a protein active site?

The conformations of the side chains in the protein's non-active-site regions.

Describe the difference between local and global alignment.

There are two approaches: global and local. The global approach compares one whole sequence with other entire sequences. The local method takes a subset of a sequence and attempts to align it to a subset of other sequences. Local alignments reveal regions that are highly similar, but do not necessarily provide a comparison across the entire length of the two sequences.
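
A minimal sketch of local alignment scoring in the style of Smith-Waterman: a cell is never allowed to fall below zero, so an alignment can start and end anywhere, and the best cell in the whole matrix is reported rather than the bottom-right corner used for global alignment. The scores and the function name are illustrative assumptions.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score (linear gap penalty, illustrative scores)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor of 0 lets a new local alignment start at any cell.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# The flanking regions differ, but the shared core GATTACA is still found.
core_score = local_alignment_score("TTTGATTACATTT", "CCCGATTACACCC")
```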

Explain the steps of BLAST

1. Break the query into short words of a specific length. 2. Compare these words against the sequences in a database. 3. Use the words that score above the threshold T (here, greater than 18) as seeds to extend the alignment.
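
The word and seed idea can be sketched as below. This simplified version only accepts exact word matches and extends them without gaps while residues are identical; BLAST itself also accepts similar words scoring above the threshold T and uses scored extension. The word length of 3, the toy sequences and the function names are assumptions.

```python
def words(seq, w=3):
    """All overlapping words of length w, with their start positions."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def find_seeds(query, subject, w=3):
    """Positions (query_pos, subject_pos) where a query word occurs exactly."""
    index = {}
    for pos, word in words(subject, w):
        index.setdefault(word, []).append(pos)
    return [(q_pos, s_pos)
            for q_pos, word in words(query, w)
            for s_pos in index.get(word, [])]


def extend_seed(query, subject, q_pos, s_pos, w=3):
    """Extend a seed left and right while the residues are identical."""
    start_q, start_s = q_pos, s_pos
    while start_q > 0 and start_s > 0 and query[start_q - 1] == subject[start_s - 1]:
        start_q -= 1
        start_s -= 1
    end_q, end_s = q_pos + w, s_pos + w
    while end_q < len(query) and end_s < len(subject) and query[end_q] == subject[end_s]:
        end_q += 1
        end_s += 1
    return query[start_q:end_q]


seeds = find_seeds("ACGTTGCA", "TTACGTTGAA")
hit = extend_seed("ACGTTGCA", "TTACGTTGAA", *seeds[0])
```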

What does an E value of 1 represent

An E-value of 1 means that one alignment with a score of at least S is expected to occur by chance when a query of this size is searched against a database of this size.

Struggle Flashcards

(36 cards)