Struggle Flashcards

1
Q

What is bioinformatics?

A

It is the analysis and conceptualisation of complex biological information.

2
Q

What is BLOSUM62?

A

BLOSUM62 is a substitution matrix used for sequence alignment of proteins. BLOSUM matrices are used to score alignments between evolutionarily divergent protein sequences. They are based on local alignments of conserved blocks. The 62 means that sequences sharing 62% identity or more were clustered together when the matrix was built.

3
Q

Explain Affine Gap Penalties

A

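Penalises insertions/deletions with two separate costs: a penalty for opening a gap and a penalty for each position by which the gap is extended, so the total grows with the length of the gap. Gap openings have a higher cost than extensions.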
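
A minimal sketch of the idea, assuming the common convention that the first gap position pays the opening cost and each further position pays the extension cost (some tools instead charge the extension on every position). The function name and the default values 10 and 0.5 are illustrative assumptions, not taken from the card.

```python
def affine_gap_penalty(length, gap_open=10.0, gap_extend=0.5):
    """Penalty for one gap spanning `length` positions.

    Assumed convention: open + (length - 1) * extend.
    """
    if length <= 0:
        return 0.0  # no gap, no penalty
    return gap_open + (length - 1) * gap_extend


# One long gap is cheaper than two short gaps of the same total length,
# because the expensive opening cost is paid only once.
one_long_gap = affine_gap_penalty(4)
two_short_gaps = 2 * affine_gap_penalty(2)
```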

4
Q

What is in silico?

A

Analysis performed on a computer or by computer simulation, for example ligand analysis.

5
Q

Explain BLAST

A

BLAST (Basic Local Alignment Search Tool) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences. It uses a heuristic to speed up computation.

6
Q

What is dynamic programming?

A

Dynamic programming solves a complex problem by breaking it into smaller subproblems (states) and reusing their solutions. In sequence alignment it scores every cell of a matrix to find the optimal alignment. This process is very slow for long sequences. The steps are: 1. initialisation, 2. scoring the matrix, 3. traceback.
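
The three steps can be sketched as a small global alignment in the style of Needleman-Wunsch. This is an illustration under assumptions: the function name, the simple match/mismatch scores and the linear gap cost are chosen for brevity and are not from the card.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b with a linear gap cost."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # 1. Initialisation: first row and column hold cumulative gap costs.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # 2. Scoring the matrix: each cell takes the best of three moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # 3. Traceback: walk from the bottom-right corner back to the origin.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
```

The matrix has (n + 1) x (m + 1) cells, which is why the method becomes slow for long sequences and why heuristic tools are used for database searches.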

7
Q

Protein vs DNA

A

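Protein sequences use an alphabet of 20 characters rather than 4. The genetic code is degenerate (several codons encode the same amino acid), so protein comparisons offer a longer look back in evolutionary time.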

8
Q

Paralogs

A

Homologous genes that arose from a duplication event.

9
Q

Why is DNA used?

A

To identify cDNA and non-coding regions of DNA, and to identify DNA polymorphisms.

10
Q

Types of Algorithms?

A
  1. Uninformative
  2. Ungapped
  3. Gapped
11
Q

Describe a hierarchical approach?

A
  1. Different groups are each given a chromosome to sequence
  2. The groups generate bacterial artificial chromosomes (BACs)
  3. Each BAC is divided and shotgun sequenced
  4. High-fidelity maps identify motifs and allow detection of overlapping sequences.
12
Q

How many genes were found?

A

51k

13
Q

How many genes are protein coding?

A

20k

14
Q

How many genes are non-coding?

A

20k

15
Q

What are pseudogenes?

A

Genes that seem to be protein coding but that mutations have rendered non-coding. 18k were found.

16
Q

How many mRNAs were found, and what does this mean?

A

98k, which means that for every gene, about 5 mRNAs are made.

17
Q

Why are MSAs done?

A

To elucidate functional information about proteins and to perform evolutionary analysis.

18
Q

How are alignments scored?

A
  1. Maximum number of sequences are matched.
  2. Scoring is done with Sum of Pairs
  3. Each column is scored by summing all possible matches, gaps and mismatches.
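
A minimal sketch of Sum of Pairs scoring, assuming simple match, mismatch and gap scores rather than a real substitution matrix. The function names and score values are illustrative assumptions, and a gap paired with a gap is assumed to score 0.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one pair of characters taken from the same column."""
    if x == "-" and y == "-":
        return 0  # assumption: gap against gap is neutral
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(alignment):
    """Sum the pair scores over every column of an aligned set of sequences."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        # Every possible pair of sequences contributes to the column score.
        total += sum(pair_score(x, y) for x, y in combinations(column, 2))
    return total


example_score = sum_of_pairs(["ACG", "ACG", "A-G"])
```

In the example, the first and last columns each contribute three matches (3), while the middle column contributes one match and two gaps (1 - 2 - 2 = -3).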
19
Q

What is the E-value?

A

The Expect value (E) is a parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size. It decreases exponentially as the Score (S) of the match increases.
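
The dependence on score and database size is usually written as E = K * m * n * exp(-lambda * S), where m is the query length and n is the database length. The sketch below assumes that formula; the function name and the default values for K and lambda are placeholders for illustration, since the real values depend on the scoring system in use.

```python
import math


def expect_value(score, query_len, db_len, k=0.134, lam=0.318):
    """Expected number of chance hits scoring at least `score`.

    k and lam are placeholder statistical parameters (assumed values).
    """
    return k * query_len * db_len * math.exp(-lam * score)


# A higher score gives an exponentially smaller E-value,
# and a larger database gives a proportionally larger one.
e_low_score = expect_value(40, query_len=250, db_len=1_000_000)
e_high_score = expect_value(50, query_len=250, db_len=1_000_000)
e_bigger_db = expect_value(40, query_len=250, db_len=2_000_000)
```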

20
Q

What is the affine gap penalty?

A

A large penalty for opening a gap and a smaller penalty for extending it.

21
Q

Neighbour Joining

A

Similar to UPGMA, but corrects for unequal evolutionary rates between lineages. Creates an unrooted tree.

22
Q

What are some ways to create a tree?

A
  1. Distance Matrix Method
  2. Maximum parsimony method
  3. Maximum likelihood method
23
Q

What is bootstrapping?

A

A way of statistically validating a tree

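The data (the columns of the alignment) are resampled with replacement, the tree is rebuilt for each resampled data set, and the proportion of trees containing a branch gives its support.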
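
A minimal sketch of the resampling step only, assuming the data are the columns of a multiple sequence alignment and that columns are drawn with replacement. The function names are illustrative assumptions, and the tree-building step that would follow each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo-alignment by sampling columns with replacement."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column choices are applied to every sequence,
    # so each sampled column stays intact.
    return ["".join(seq[c] for c in columns) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Return n_replicates resampled alignments (seeded for repeatability)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


original = ["ACGT", "ACGA", "TCGA"]
replicates = bootstrap_replicates(original, 5)
```

A tree would then be inferred from each replicate and compared with the tree from the original alignment.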

24
Q

How is MSA measured?

A

ClustalW is used to build the alignment, forming a phylogenetic tree to guide it. The alignment is scored with the Sum of Pairs (a heuristic).

25
Q

How does CLUSTAL work?

A
  1. Begin with pairwise alignments of the sequences
  2. Build a phylogenetic tree
  3. Take the two most closely related sequences and align them
  4. Repeat with the next most closely related sequence
26
Q

What is a characteristic of a fully resolved tree?

A

Each node splits into exactly two branches.

27
Q

Describe the distance matrix method.

A

Distance-based methods must transform the sequence data into a pairwise similarity matrix for use during tree inference. An example is UPGMA, which stands for Unweighted Pair Group Method with Arithmetic Mean.
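
A minimal sketch of the transformation step, assuming aligned sequences of equal length and using the proportion of differing sites (p-distance) as the distance. The function names are illustrative assumptions, and the tree-building step such as UPGMA is not shown.

```python
def p_distance(a, b):
    """Proportion of compared sites at which two aligned sequences differ."""
    # Assumption: sites where either sequence has a gap are skipped.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(sequences):
    """Symmetric matrix of pairwise distances between aligned sequences."""
    n = len(sequences)
    return [[p_distance(sequences[i], sequences[j]) for j in range(n)]
            for i in range(n)]


matrix = distance_matrix(["ACGT", "ACGA", "TCGA"])
```

A method such as UPGMA would start by joining the pair with the smallest distance in this matrix.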

28
Q

Character-based method

A

Uses the aligned sequences directly during tree inference.

29
Q

Describe the shotgun approach

A
  1. DNA is isolated and chopped into fragments
  2. Fragments are cloned into vectors and sequenced
  3. Overlapping sequences are used to assemble the genome into contigs
  4. Contigs are ordered and joined into scaffolds
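
The assembly step can be sketched as merging two reads that overlap. This is a toy illustration under assumptions: the function names and the minimum overlap of 3 are hypothetical, only exact suffix-to-prefix overlaps are considered, and real assemblers also handle sequencing errors and reverse complements.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two overlapping reads into one contig, or return None."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        return None
    return left + right[k:]


contig = merge_reads("ATGCGT", "CGTTAC")
```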
30
Q

Output of Sanger sequencing

A

500-1k bases per read

31
Q

Output of NGS

A

Billions

32
Q

Sanger sequencing?

A

The target DNA is copied many times, making fragments of different lengths. Fluorescent “chain terminator” nucleotides mark the ends of the fragments and allow the sequence to be determined.

33
Q

What determines the conformations of a protein active site?

A

The conformations of the side chains in the protein's non-active-site regions

34
Q

Describe the difference between local and global alignment.

A

There are two approaches: global and local. The global approach compares one whole sequence with other entire sequences. The local method uses a subset of a sequence and attempts to align it to a subset of other sequences. Local alignments reveal regions that are highly similar, but do not necessarily provide a comparison across the entire two sequences.
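
A minimal sketch of local alignment scoring in the style of Smith-Waterman, shown to illustrate how a local method finds the best-matching subsequences. The function name and the match, mismatch and gap scores are illustrative assumptions. The key difference from a global method is that a cell is never allowed to drop below 0, so an alignment can start and end anywhere.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (score only, no traceback)."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets the alignment restart instead of carrying a bad prefix.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


# Only the shared core "ACGT" contributes; the dissimilar flanks are ignored.
core_score = local_alignment_score("TTTTACGTTTTT", "GGGACGTGGG")
```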

35
Q

Explain the steps of BLAST

A
  1. Break the query into short words of a specific length.
  2. Compare these words against the sequences in a database.
  3. Use the words whose score exceeds the threshold T (here, a T value greater than 18) as seeds to extend the alignment.
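
The first two steps can be sketched as follows. This is a simplified illustration, not BLAST itself: the function names and the word size of 3 are assumptions, and only exact word matches are used as seeds, whereas the steps above score words against a threshold T.

```python
def query_words(query, word_size=3):
    """Break the query into overlapping words, keeping each start position."""
    return [(i, query[i:i + word_size])
            for i in range(len(query) - word_size + 1)]


def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where a query word occurs exactly."""
    # Index every word of the database sequence by its start positions.
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    return [(i, j)
            for i, word in query_words(query, word_size)
            for j in index.get(word, [])]


words = query_words("ACGTA")
seeds = find_seeds("ACGTA", "TTACGTT")
```

Each seed would then be extended in both directions for as long as the alignment score keeps improving.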
36
Q

What does an E-value of 1 represent?

A

An E-value of 1 means that, for a query of this size searched against a database of this size, one alignment with an S score of this value is expected to occur by chance.