Bio Background Flashcards

1
Q

DNA

A

Deoxyribonucleic acid; encodes the genetic program of prokaryotes and eukaryotes.

Long polymer made of nucleotides, each consisting of a sugar, a phosphate group, and a base.

2
Q

Four DNA bases

A

Cytosine (C)
Guanine (G)
Adenine (A)
Thymine (T)

Each base is attached to the sugar/phosphate unit to form a nucleotide

3
Q

Purines

A

Adenine and Guanine (A, G) - two connected (fused) rings

4
Q

Pyrimidines

A

Cytosine, Thymine and Uracil (RNA) - single ring

5
Q

Base pairing

A

Double helix stabilized by:
- Hydrogen bonds
- Base stacking interactions among nucleotides

A-T (2 hydrogen bonds)
C-G (3 hydrogen bonds)

6
Q

Structure

A
  • Backbone: alternating sugars and phosphates
  • Center: bases held together by hydrogen bonds
  • Grooves between the strands are binding sites for proteins such as transcription factors
  • Strands are antiparallel
  • 5’ end: phosphate group
  • 3’ end: hydroxyl group

Top strand: 5’ -> 3’ (Watson)
Bottom strand: 3’ -> 5’ (Crick)

7
Q

Replication

A

The new strand is synthesized in the 5’ to 3’ direction (the template strand is read from 3’ to 5’)

8
Q

Proteins

A

Linear polymer of amino acids linked by peptide bonds

20 different types of amino acids

9
Q

Amino acids

A

Alanine (A), Cysteine (C), Aspartic acid (D), Glutamic acid (E), Phenylalanine (F), Glycine (G), Histidine (H), Isoleucine (I), Lysine (K), Leucine (L), Methionine (M), Asparagine (N), Proline (P), Glutamine (Q), Arginine (R), Serine (S), Threonine (T), Valine (V), Tryptophan (W), Tyrosine (Y)

10
Q

Protein structure

A

Primary - sequence of amino acids
Secondary - local spatial arrangement due to backbone interactions; short stretches of alpha helices and beta sheets
Tertiary - overall 3D fold of one chain, from long-range side-chain interactions
Quaternary - arrangement of several folded chains around one another

11
Q

Ribbon diagrams

A

Depict the 3D structure adopted by the amino acid chain
Coiled ribbon = alpha helix
arrow ribbon = beta strand
thin string = loops

12
Q

Central Dogma Molecular Biology

A

DNA -(transcription)-> RNA -(translation)-> Protein

The amino acid sequence of a protein is determined by the nucleotide sequence of the DNA, via the mRNA

13
Q

Gene

A

Region of DNA that controls a hereditary characteristic.

Usually corresponds to a single mRNA, which is translated into a protein

Eukaryotic genes have exons interrupted by introns (non-coding sequences)

14
Q

RNA

A

Like DNA, but the sugar is ribose
Uracil replaces thymine
Usually single-stranded

mRNA: transcribed from DNA, then translated into protein
tRNA: carries amino acids to the ribosome during translation
rRNA: helps ribosomes assemble proteins

15
Q

Transcription

A

Initiation - RNA polymerase binds to the promoter site on the DNA and unzips the double helix
Elongation - free nucleotides pair with the template strand; uracil is used in place of thymine
Termination - a termination sequence signals the end; the RNA transcript is released and the DNA zips up again

TAA, TAG, TGA -> stop codons (coding-strand notation; UAA, UAG, UGA in mRNA)
ATG -> start codon (AUG in mRNA)
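
A minimal Python sketch of the start/stop signals above, assuming the input is the coding strand written 5' to 3'. The function names `transcribe` and `first_orf` are illustrative only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def transcribe(coding_strand):
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.replace("T", "U")


def first_orf(coding_strand):
    """Return the stretch from the first ATG to the first in-frame stop codon,
    or None if either signal is missing."""
    start = coding_strand.find(START_CODON)
    if start == -1:
        return None
    # Step through the sequence codon by codon, in frame with the start codon.
    for i in range(start, len(coding_strand) - 2, 3):
        if coding_strand[i:i + 3] in STOP_CODONS:
            return coding_strand[start:i + 3]
    return None
```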

16
Q

Pattern matching

A
  • Naive
  • Finite Automata
  • KMP
  • Boyer Moore
  • Suffix Tree
  • Suffix Array
  • Generalized suffix tree and array
17
Q

Pattern matching - naive

A

Brute-force algorithm
Slide the pattern over the text one position at a time and compare at each shift
n = len(T)
m = len(P)
Time complexity O((n-m+1) * m) worst case
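
A short Python sketch of the sliding comparison, using 0-based positions (the cards use 1-based positions elsewhere). The name `naive_search` is illustrative.

```python
def naive_search(text, pattern):
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    hits = []
    # Try each of the n-m+1 shifts; each comparison costs up to m steps.
    for shift in range(n - m + 1):
        if text[shift:shift + m] == pattern:
            hits.append(shift)
    return hits
```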

18
Q

Pattern matching - Finite automata

A

Sigma = alphabet
Matching: O(n)
Preprocessing: O(m|Sigma|) with an efficient construction
Total: O(n + m|Sigma|), which is O(n + m) for a constant-size alphabet

Finite-Automaton-Matcher(T, delta, m)
begin
  n := length(T);
  q := 0;
  for i := 1 to n do
  begin
    q := delta(q, T[i]);
    if q = m then
      print "P occurs from position " + (i-m+1)
  end;
end;

Worst-case time O(m^3 * |Sigma|)
build_delta(P, Sigma)
begin
  m := length(P);
  for q := 0 to m do
    for each a in Sigma do
    begin
      k := min(m+1, q+2);
      repeat
        k := k-1;
      until P[1..k] is a suffix of P[1..q]a;
      delta(q, a) := k;
    end;
end

Suffix function logic:
suffix(x) = length of the longest prefix of P that is also a suffix of the input x read so far, e.g.

P = at
suffix(atcat) = 2
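
The matcher and the cubic-time table construction can be sketched in Python as follows. This is a sketch under assumptions: positions are 0-based, the alphabet is passed as a string, and any text character outside the alphabet sends the automaton back to state 0.

```python
def build_delta(pattern, alphabet):
    """delta[q][a] = longest prefix of pattern that is a suffix of pattern[:q] + a."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            seen = pattern[:q] + a
            k = min(m, q + 1)
            # Shrink k until pattern[:k] is a suffix of what has been read.
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


def fa_match(text, pattern, alphabet):
    """Return the 0-based start positions of pattern in text."""
    delta = build_delta(pattern, alphabet)
    m = len(pattern)
    q, hits = 0, []
    for i, ch in enumerate(text):
        q = delta[q].get(ch, 0)  # assumption: unknown characters reset to state 0
        if q == m:
            hits.append(i - m + 1)
    return hits
```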

19
Q

Pattern matching - KMP

A

Based on the prefix function
Matching: O(n)
Prefix function: O(m)

Avoids testing useless shifts and avoids precomputing the full delta function

Runs in linear time, O(n + m) overall
Good for short patterns with repetitions
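
A compact Python sketch of KMP with its prefix function, using 0-based positions. The names `prefix_function` and `kmp_search` are illustrative.

```python
def prefix_function(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(text, pattern):
    """Return the 0-based start positions of pattern in text."""
    if not pattern:
        return []
    pi = prefix_function(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]  # skip shifts that cannot match
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = pi[k - 1]  # keep going to find overlapping matches
    return hits
```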

20
Q

Pattern matching - Boyer Moore

A

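bcr (bad character rule) function: O(|alphabet| + m)
gsr (good suffix rule) function: O(m)

Total: O((n-m+1) * m + |alphabet|) worst case

Bad character rule - on a mismatch, shift the pattern right so that the mismatched text character lines up with its rightmost occurrence in the pattern; if the character does not occur in the pattern, shift past it

Good suffix rule - use the knowledge of how many characters of the pattern suffix already matched
Case 1 - the matched suffix occurs again elsewhere in P: shift based on the delta array so that this occurrence lines up with the text
Case 2 - the matched suffix does not occur again in full: use the prefix function to line up the longest prefix of P that is a suffix of the matched part, which shifts by almost all remaining characters

Sublinear on average: many text characters are never examined
Usually faster for large P and T with repetitions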
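
The sketch below is a simplified Boyer-Moore that applies only the bad character rule; the good suffix rule is deliberately left out, so it is not the full algorithm. Positions are 0-based and the names are illustrative.

```python
def bad_char_table(pattern):
    """Map each character to the index of its rightmost occurrence in pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def bm_bad_char_search(text, pattern):
    """Return the 0-based start positions of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    last = bad_char_table(pattern)
    hits, s = [], 0
    while s <= n - m:
        j = m - 1
        # Compare right to left, as Boyer-Moore does.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            hits.append(s)
            s += 1  # conservative shift after a full match
        else:
            # Align the mismatched text character with its rightmost
            # occurrence in the pattern, moving at least one position.
            s += max(1, j - last.get(text[s + j], -1))
    return hits
```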

21
Q

Suffix tree

A

Preprocessing: O(n)
Search: O(m+k), where k is the number of occurrences
Total: O(n+m+k)

22
Q

Ukkonen

A

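Builds the suffix tree through a sequence of implicit suffix trees
- Suffix links: links between internal nodes that connect related substrings, so the next suffix is reached without restarting from the root
- Edge-label compression: store a pair of indices instead of the substring, O(1) space per edge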

23
Q

Generalized Suffix Tree

A

Time complexity: O(|alphabet of the two strings| * n)
Concatenate the two strings with unique separators and apply Ukkonen's algorithm to build the suffix tree T

Used for:
- common substrings
- text comparison
- palindromes
- finding the longest suffix of S1 that is also a prefix of S2, and vice versa

24
Q

Pattern matching - Suffix array

A

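Construction: O(n), by traversing the suffix tree depth-first in lexical order

Binary search to find match occurrences
Worst case: O(m log n)
Random strings: O(m + log n) expected
m = |P|

mlr accelerator: mlr = min(l, r), where l and r are the numbers of pattern characters already matched at the left and right ends of the search interval; comparisons can start after the first mlr characters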
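
A small Python sketch of the binary search over a suffix array. Note the assumptions: the array is built by plain sorting of suffixes (simple, but slower than the O(n) construction on the card), and the mlr accelerator is not implemented.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexical order (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted 0-based positions where pattern occurs in text."""
    m = len(pattern)

    def prefix(k):
        # First m characters of the k-th smallest suffix.
        return text[sa[k]:sa[k] + m]

    # Lower bound: first suffix whose prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```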

25
Q

Assumption of sequence alignment

A

Life is monophyletic
Biological entities share common ancestry

26
Q

Phylogenetic similarity

A

Homology: similarity due to evolution
Analogy: similarity due to analogous function

27
Q

Homology

A

A is homologous to B if they are related by divergence from a common ancestor

Paralogues -> different but related functions within one species
Orthologues -> same function in different species

28
Q

Analogy

A

A is analogous to B if they have a similar function but different origins
Convergent evolution -> different species in a similar environment adapt to the same function

29
Q

Types of Mutations

A

Transition

A <-> G between two purines
C <->T between two pyrimidines

Transversion
(mixed: one purine, one pyrimidine)
C <-> G
A <-> T
A <-> C
G <-> T

Deletions

Insertions

Inversions
A segment is reversed; read on the same strand it appears as its reverse complement:
A-T adenine turns to thymine
G-C guanine turns to cytosine
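
A tiny Python sketch that classifies a single-base substitution as a transition or a transversion, assuming uppercase DNA letters. The function name is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Classify the change from base a to base b."""
    if a == b:
        return "match"
    pair = {a, b}
    # Same chemical class on both sides means a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"
```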

30
Q

Maximum parsimony hypothesis

A

Among all explanations, the simplest one is preferred. Explain the absence/presence of nucleotides with the minimum number of evolutionary changes

A single deletion of n bases at one site is more likely than n separate one-base deletions at n sites

31
Q

Types of alignments

A

Global alignment
- comparative and evolutionary studies
Local alignment
- database searching and retrieval
- ignores distantly related biological regions and focuses on evolutionarily conserved signals of similarity

32
Q

Sequence alignment

A

Pairwise alignment - two sequences
- exact solution
Multiple sequence alignment - 3 or more
- approximated (heuristic) solutions

33
Q

Gaps

A

ends - missing data
internal - deletion or insertion

34
Q

Types of alignment

A

Manual alignment
Dot Matrix

35
Q

Dot Matrix

A

match -> diagonal step through dot
mismatch -> diagonal step through empty
gap on top seq -> vertical step
gap on left seq -> horizontal step

Parameters: window size, stringency, alphabet size
Stringency: a dot is drawn only if at least h characters in the window are identical

adv:
- simple
- trial and error to explore

disadv:
- expensive for large seq
- may not find best
- qualitative analysis
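
A minimal Python sketch of a dot matrix with a window size and a stringency threshold. The output format (`*` for a dot, `.` for empty) and the parameter names are assumptions made for illustration.

```python
def dot_matrix(top, left, window=1, stringency=1):
    """Return the matrix as a list of strings, one per row of the left sequence."""
    rows = []
    for i in range(len(left) - window + 1):
        row = ""
        for j in range(len(top) - window + 1):
            # Count identical characters inside the window starting at (i, j).
            same = sum(left[i + k] == top[j + k] for k in range(window))
            row += "*" if same >= stringency else "."
        rows.append(row)
    return rows
```

Calling `print("\n".join(dot_matrix("ACG", "ACG")))` draws the main diagonal, which is what a self-comparison should look like.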

36
Q

Scoring matrices and Gap penalties

A
  • Scoring system
    : gap penalty -> gap-opening penalty, gap-extension penalty
    : scoring matrix M(a,b) -> the total score is additive over aligned positions, which implies position independence
    match (a = b)
    mismatch (a != b)
37
Q

Gap penalty

A

Fixed gap-penalty system -> 0, 1, or a constant

Linear gap-penalty system -> gamma(g) = -g*d, gap length g times a constant d

Affine score ->
opening cost (d)
extension cost (e)
gamma(g) = -d - (g-1) * e, with e < d

Logarithmic gap-penalty system -> the gap-extension penalty grows with the logarithm of the gap length
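
The linear and affine formulas can be checked with a few lines of Python. The example costs used here (d = 5, e = 1) are arbitrary illustrative values, not recommended settings.

```python
def linear_gap(g, d):
    """gamma(g) = -g*d: every gap position costs the same."""
    return -g * d


def affine_gap(g, d, e):
    """gamma(g) = -d - (g-1)*e: opening costs d, each extension costs e."""
    return -d - (g - 1) * e


# A gap of length 3 with d = 5, e = 1:
# linear_gap(3, 5) gives -15, affine_gap(3, 5, 1) gives -7,
# so the affine scheme favors one long gap over several short ones.
```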

38
Q

Scoring matrices

A
  • Identity scoring -> match 1, mismatch 0
  • DNA scoring -> match 3, transition 2, transversion 0
  • Chemical Similarity Scoring, higher scores for amino acids based on chemical similarity: size, charge, hydrophobicity
  • Observed matrices: analyze substitution frequency
39
Q

Scoring matrices

A

DNA
M(a,b) > 0 if match, <= 0 if mismatch

Amino acids
PAM -> Percent/Point Accepted Mutation
Scores reflect how likely an aligned pair is due to homology rather than chance

BLOSUM -> Substitution matrices for aminoacids
direct observation of blocks of proteins having similar functions

40
Q

PAM

A

Derived from closely related proteins (at least 85% identical)
PAM1 - 1 accepted substitution per 100 amino acid residues (1%)
PAM-N matrix - PAM1 applied N times (N percent accepted mutations)
PAM250 -> 250 evolutionary steps
Positive score -> common replacement
Negative score -> unlikely replacement

Low N -> short sequences, strong local similarities
High N -> long sequences, weak similarities

PAM60 -> close relations, about 60% identity
PAM120 -> general use, about 40% identity
PAM250 -> distant relations, about 20% identity

41
Q

BLOSUM

A

Based on the Blocks database: substitutions observed in blocks (local alignments)
Families of proteins with identical function
Highly conserved protein domains
Identify motifs -> blocks of local alignments

BLOSUM62 is the default matrix in BLAST 2.0
BLOSUMn is built from sequences that are at most n percent identical
Higher n -> more closely related sequences

BLOSUM62 general use
BLOSUM80 close relations
BLOSUM45 distant relations

42
Q

PAM vs BLOSUM

A

top = closely related sequences

PAM100 ~= BLOSUM90
PAM120 ~= BLOSUM80
PAM160 ~= BLOSUM60
PAM200 ~= BLOSUM52
PAM250 ~= BLOSUM45

bottom = distantly related proteins

42
Q

PAM vs BLOSUM best ones

A

BLOSUM -> best for local alignments
BLOSUM62 -> majority of weak protein similarities
BLOSUM45 -> long and weak alignments

PAM250 -> seq 17-27% identity
BLOSUM62 -> moderately distant proteins
BLOSUM50 -> FASTA searches

43
Q

Hamming Distance

A

permits only substitutions (positive cost)
if |A| = |B|, 0<= d(A,B) <= |A|

X =aaaccd, Y=abcccd
d(X,Y) = 2 two substitutions necessary
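
A short Python sketch of the Hamming distance that reproduces the example above. Raising an error for unequal lengths is a design choice made here, not something the card specifies.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))
```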

44
Q

Levenshtein Distance (Edit distance)

A

permits insertions, deletions and substitutions (positive costs)
0 <= d(A,B) <= max(|A|, |B|)

X=aaaccd, Y=abccd
d(X,Y)=2 , one substitution and one deletion

45
Q

Episode distance

A

permits only insertions (positive cost)
d(A,B) = |B| - |A| if A is a subsequence of B, otherwise inf
X=aaccd,Y=abbaccd
d(X,Y) =2 two insertions
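
A Python sketch of the episode distance, assuming that A can be turned into B by insertions alone exactly when A is a subsequence of B. The function name is illustrative.

```python
import math


def episode_distance(a, b):
    """Insertions needed to turn a into b, or math.inf if impossible."""
    remaining = iter(b)
    # 'ch in remaining' consumes b up to the next match, so this checks
    # that the characters of a appear in b in the same order.
    if all(ch in remaining for ch in a):
        return len(b) - len(a)
    return math.inf
```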

46
Q

Dynamic programming - Pair alignment - Edit distance

A

D[i,0] = i
D[0,j] = j
D[i,j] = min(
D[i-1, j-1] + f(i,j),
D[i-1, j] + 1,
D[i, j-1] + 1
)
f(i,j) = 0 if match else 1
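
The recurrence above translates almost line by line into Python. This sketch fills the full table and returns D[n][m]; the name `edit_distance` is illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b via dynamic programming."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # delete all i characters of a
    for j in range(m + 1):
        D[0][j] = j  # insert all j characters of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j - 1] + f,  # match or substitution
                D[i - 1][j] + 1,      # deletion
                D[i][j - 1] + 1,      # insertion
            )
    return D[n][m]
```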

47
Q

Needleman-Wunsch - Pair alignment

A

Space complexity: O(nm)
Time to build the matrix: O(nm)
Backtracking: O(n+m)
D[i,0] = i
D[0,j] = j
D[i,j] = min(
D[i-1,j-1] + f(i,j),
D[i-1, j] + 1,
D[i, j-1] + 1
)
f(i,j) = 0 if match else 1

follow path back from D[n,m] to D[0,0]
vertical step -> align a symbol in A with gap in B
horizontal step -> align a symbol in B with gap in A
diagonal step -> match or mismatch
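
A Python sketch of the matrix fill followed by the traceback described above, using the unit-cost (distance) formulation from this card. When several paths are optimal, this version prefers the diagonal step; that tie-breaking order is an arbitrary choice.

```python
def nw_align(a, b):
    """Return (cost, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + f, D[i - 1][j] + 1, D[i][j - 1] + 1)

    # Follow the path back from D[n][m] to D[0][0].
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])  # diagonal step
            i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            out_a.append(a[i - 1]); out_b.append("-")       # vertical step
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])       # horizontal step
            j -= 1
    return D[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```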

48
Q

Similarity Score - Semiglobal alignment

A

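y = gap penalty (a negative value, since it is added inside max)
ro(i,j) = score from the scoring matrix

Global similarity score:
D[i,0] = i*y
D[0,j] = j*y
D[i,j] = max(
D[i-1,j-1] + ro(i,j),
D[i-1, j] + y,
D[i, j-1] + y
)

Semiglobal alignment (end gaps are free), with B on top (columns j) and A on the left (rows i):
D[0,j] = 0 -> the beginning of B may be left unaligned at no cost (first row)
D[n,j] -> the tail of B may be left unaligned at no cost: take the best score in the last row

D[i,0] = 0 -> the beginning of A may be left unaligned at no cost (first column)
D[i,m] -> the tail of A may be left unaligned at no cost: take the best score in the last column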

49
Q

Smith-Waterman - Local alignment

A

D[i,0] = 0
D[0,j] = 0
D[i,j] = max(
D[i-1,j-1] + ro(i,j),
D[i-1, j] + y,
D[i, j-1] + y,
0
)
ro(i,j) = score from the scoring matrix, e.g. match 1, mismatch -1
y = gap penalty, e.g. -1
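
A Python sketch of Smith-Waterman with the example scores above (match 1, mismatch -1, gap -1). It returns only one best local alignment, starting the traceback at the first highest-scoring cell found; that choice is an assumption of this sketch.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            ro = match if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + ro, D[i - 1][j] + gap, D[i][j - 1] + gap, 0)
            if D[i][j] > best:
                best, best_pos = D[i][j], (i, j)

    # Trace back from the best cell until a cell with score 0 is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and D[i][j] > 0:
        ro = match if a[i - 1] == b[j - 1] else mismatch
        if D[i][j] == D[i - 1][j - 1] + ro:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif D[i][j] == D[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```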

50
Q

Needleman-Wunsch vs Smith-Waterman

A

SW finds segments of the two sequences that are similar
First row and first column = 0
Negative scores are replaced by 0
Traceback begins at the highest-scoring cell and ends at a cell with score 0

NW aligns two complete sequences
First row and first column = cumulative gap penalties
Scores can be negative
Traceback begins at cell (n,m) and ends at (0,0)