Bio Background Flashcards
DNA
Deoxyribonucleic acid; encodes the genetic program of prokaryotes and eukaryotes.
Long polymer made of nucleotides; each nucleotide is a sugar, a phosphate and a base.
Four DNA bases
C ytosine
G uanine
A denine
T hymine
Each base is attached to a sugar/phosphate unit to form a nucleotide
Purines
Adenine + Guanine (AG) - pair of connected rings
Pyrimidines
Cytosine, Thymine and Uracil (RNA) - single ring
Base pairing
Double helix stabilized by:
- Hydrogen bonds
- Base stacking interactions among nucleotides
A-T (2 hydrogen bonds)
C-G (3 hydrogen bonds)
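A quick way to remember the 2-versus-3 rule is to count bonds for a short duplex. This is only a sketch: the names H_BONDS and count_hydrogen_bonds are made up for illustration, and it assumes every base on the strand is correctly paired with its partner.

```python
# Hydrogen bonds each base forms with its partner (A-T: 2, C-G: 3).
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def count_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds in a fully paired duplex built on this strand."""
    return sum(H_BONDS[base] for base in strand.upper())


print(count_hydrogen_bonds("ATGC"))  # 2 + 2 + 3 + 3 = 10
```

GC-rich duplexes therefore carry more hydrogen bonds per base pair than AT-rich ones.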
Structure
- Backbone is alternating sugars/phosphates
- Center holds the base pairs, joined by hydrogen bonds
- Grooves between the strands are binding sites for proteins such as transcription factors
- Strands are antiparallel
- 5’ end: phosphate group
- 3’ end: hydroxyl group
Top strand: 5’ -> 3’ (Watson)
Bottom strand: 3’ -> 5’ (Crick)
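Because the strands are antiparallel and complementary, the bottom strand read 5’ -> 3’ is the reverse complement of the top strand. A minimal sketch, assuming the strand contains only A, C, G and T (the helper name reverse_complement is an illustration, not from the cards):

```python
# Translation table mapping each base to its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, written 5' -> 3'."""
    # Complement every base, then reverse to restore 5' -> 3' order.
    return strand.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```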
Replication
New strand is synthesized in the 5’ to 3’ direction (the template strand is read 3’ to 5’)
Proteins
Linear polymer of amino acids linked by peptide bonds
20 different types of amino acids
Amino acids
Alanine, Cysteine, Aspartic Acid (Asp D), Glutamic Acid (Glu E), Phenylalanine, Glycine, Histidine, Isoleucine, Lysine, Leucine, Methionine, AsparagiNe, Proline, Glutamine, Arginine, Serine, Threonine, Valine, Tryptophan, Tyrosine
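The list above is ordered by the standard one-letter codes (A, C, D, E, F, ...). As a study aid, the mapping can be written out as a lookup table; the names ONE_LETTER and to_one_letter are hypothetical helpers for this sketch.

```python
# Standard one-letter codes for the 20 amino acids, in the card's order.
ONE_LETTER = {
    "Alanine": "A", "Cysteine": "C", "Aspartic Acid": "D",
    "Glutamic Acid": "E", "Phenylalanine": "F", "Glycine": "G",
    "Histidine": "H", "Isoleucine": "I", "Lysine": "K", "Leucine": "L",
    "Methionine": "M", "Asparagine": "N", "Proline": "P",
    "Glutamine": "Q", "Arginine": "R", "Serine": "S", "Threonine": "T",
    "Valine": "V", "Tryptophan": "W", "Tyrosine": "Y",
}


def to_one_letter(names):
    """Spell a chain of amino acid names as a one-letter sequence."""
    return "".join(ONE_LETTER[name] for name in names)


print(to_one_letter(["Methionine", "Alanine", "Tryptophan"]))  # MAW
```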
Protein structure
Primary - sequence of amino acids
Secondary - local spatial arrangement due to backbone interactions; short stretches of alpha helices and beta sheets
Tertiary - overall 3D fold of a single chain, from long-range side-chain interactions
Quaternary - arrangement of several folded chains (subunits) into one complex
Ribbon diagrams
Drawings of the 3D structure adopted by a chain of amino acids
Coiled ribbon = alpha helix
Arrow ribbon = beta strand
Thin string = loops
Central Dogma of Molecular Biology
DNA - (Transcription) > RNA - (Translation) > Protein
Amino acid seq in the protein is determined by the nucleotide seq in the mRNA, which is in turn determined by the nucleotide seq in the DNA
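The two arrows can be sketched in a few lines. This assumes the input is the coding strand written 5’ -> 3’ and uses the standard genetic code; the function names and the compact table layout (bases ordered U, C, A, G, with * marking a stop codon) are choices made for this illustration.

```python
# Standard genetic code, codons enumerated with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA: same sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, reading codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna)             # AUGGCCUAA
print(translate(mrna))  # MA
```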
Gene
Region of DNA that controls a hereditary characteristic.
Usually corresponds to a single mRNA, which will be translated into a protein
Eukaryotic genes have exons interrupted by introns (non-coding sequences)
RNA
Like DNA, but the sugar is ribose
Uracil replaces thymine
Usually single-stranded
mRNA transcribed from DNA -> translated into protein
tRNA used in translation
rRNA is a structural and catalytic component of ribosomes, which assemble proteins
Transcription
Initiation - RNA polymerase binds to the promoter site on the DNA and unzips the double helix
Elongation - free ribonucleotides pair with the template strand; uracil is used in place of thymine
Termination - a terminator sequence signals the end; the RNA transcript is released and the DNA zips up again
TAA, TAG, TGA -> stop codons (UAA, UAG, UGA in mRNA); they end translation
ATG -> start codon (AUG in mRNA); it begins translation
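Two small sketches tie this card together: building the transcript by pairing against the template strand, and locating the region between the begin and stop sequences on the coding strand. The assumptions are that the template is given in 3’ -> 5’ order, that the first ATG found is the start, and that the function names are illustrative only.

```python
# Pairing rule during elongation: template base -> RNA base (U pairs with A).
TEMPLATE_TO_RNA = str.maketrans("ACGT", "UGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def transcribe_template(template_3_to_5: str) -> str:
    """Template strand read 3' -> 5' gives the mRNA written 5' -> 3'."""
    return template_3_to_5.upper().translate(TEMPLATE_TO_RNA)


def first_orf(coding_strand: str) -> str:
    """Return the first ATG ... stop stretch in frame, or '' if none."""
    dna = coding_strand.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    # Step codon by codon, staying in the reading frame set by ATG.
    for i in range(start + 3, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return dna[start:i + 3]
    return ""


print(transcribe_template("TACCGGATT"))  # AUGGCCUAA
print(first_orf("CCATGGCCTAAGG"))        # ATGGCCTAA
```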
Pattern matching
- Naive
- Finite Automata
- KMP
- Boyer Moore
- Suffix Tree
- Suffix Array
- Generalized suffix tree and array
Pattern matching - naive
Brute force algorithm
sliding pattern over the text
n = len(T)
m = len(P)
Time complexity O((n-m+1) * m) worst case
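A minimal sketch of the brute-force matcher, assuming 0-based positions (the card's formulas are 1-based). The outer loop tries each of the n-m+1 shifts and the inner loop compares up to m characters, which is where the worst-case bound comes from.

```python
def naive_search(text: str, pattern: str) -> list:
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    hits = []
    for s in range(n - m + 1):      # n-m+1 candidate shifts
        j = 0
        while j < m and text[s + j] == pattern[j]:
            j += 1                  # up to m comparisons per shift
        if j == m:
            hits.append(s)
    return hits


print(naive_search("atcatat", "at"))  # [0, 3, 5]
```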
Pattern matching - Finite automata
Sigma = alphabet, delta = transition function
Matching: O(n)
Preprocessing: O(m * |Sigma|) with an efficient construction of delta
Total: O(n + m * |Sigma|), i.e. O(n+m) for a fixed alphabet
Finite-Automaton-Matcher(T, delta, m)
begin
  n := length(T);
  q := 0;
  for i := 1 to n do
  begin
    q := delta(q, T[i]);
    if q = m then
      print "P occurs from position " + (i-m+1)
  end;
end;
build_delta worst-case time: O(m^3 * |alphabet|)
build_delta(P, alphabet)
begin
  m := length(P);
  for q := 0 to m do
    for each a in alphabet do
    begin
      k := min(m+1, q+2);
      repeat
        k := k-1;
      until P[1..k] is a suffix of P[1..q]a;
      delta(q, a) := k;
    end;
end
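The two routines above can be sketched together in Python. This is an illustration under a few assumptions: positions are 0-based, delta is stored as a dictionary keyed by (state, character), and characters outside the alphabet send the automaton back to state 0. The construction is the simple cubic one, not the efficient O(m * |alphabet|) version.

```python
def build_delta(pattern: str, alphabet: str) -> dict:
    """Transition table: delta[(q, a)] = next state after reading a in state q."""
    m = len(pattern)
    delta = {}
    for q in range(m + 1):
        for a in alphabet:
            seen = pattern[:q] + a
            k = min(m, q + 1)
            # Shrink k until the prefix of length k is a suffix of what was read.
            while not seen.endswith(pattern[:k]):
                k -= 1
            delta[(q, a)] = k
    return delta


def fa_match(text: str, delta: dict, m: int) -> list:
    """Scan text once; state m means the whole pattern was just read."""
    q = 0
    hits = []
    for i, ch in enumerate(text):
        q = delta.get((q, ch), 0)   # unknown characters reset the automaton
        if q == m:
            hits.append(i - m + 1)  # 0-based start of the occurrence
    return hits


delta = build_delta("at", "acgt")
print(fa_match("atcatat", delta, 2))  # [0, 3, 5]
```

Once delta is built, each text character costs a single lookup, which gives the O(n) matching time.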
Suffix function logic:
suffix(x) = length of the longest prefix of P that is also a suffix of the input x read so far, e.g.
P = at
suffix(atcat) = 2
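The suffix function can be checked directly with a few lines. A sketch only, with suffix_function as a hypothetical name; it tries the longest candidate prefix first and shortens it until it fits.

```python
def suffix_function(pattern: str, x: str) -> int:
    """Length of the longest prefix of pattern that is a suffix of x."""
    for k in range(min(len(pattern), len(x)), -1, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0  # not reached: the empty prefix always matches


print(suffix_function("at", "atcat"))  # 2
print(suffix_function("at", "atca"))   # 1
```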
Pattern matching - KMP
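Uses the prefix function: pi[q] = length of the longest proper prefix of P[1..q] that is also a suffix of P[1..q]
Matching: O(n)
Prefix function: O(m)
Avoids testing useless shifts and avoids precomputing the full delta function
Total running time is linear: O(n+m)
Good for short patterns with repetitions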
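A compact sketch of KMP, assuming 0-based indexing (so pi[q] here describes the prefix ending at index q). The names prefix_function and kmp_search are illustrative. On a mismatch the matcher falls back through the prefix function instead of restarting, so the text pointer never moves backwards.

```python
def prefix_function(pattern: str) -> list:
    """pi[q] = longest proper prefix of pattern[:q+1] that is also its suffix."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]           # fall back to a shorter border
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_search(text: str, pattern: str) -> list:
    """Return every 0-based position where pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []
    pi = prefix_function(pattern)
    hits = []
    q = 0                           # number of pattern characters matched
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]           # skip shifts that cannot match
        if pattern[q] == ch:
            q += 1
        if q == m:
            hits.append(i - m + 1)
            q = pi[q - 1]           # keep going to find overlapping matches
    return hits


print(prefix_function("ababaca"))    # [0, 0, 1, 2, 3, 0, 1]
print(kmp_search("atcatat", "at"))   # [0, 3, 5]
```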
Pattern matching - Boyer Moore
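Bad character rule (bcr) preprocessing: O(|alphabet| + m)
Good suffix rule (gsr) preprocessing: O(m)
Total: O((n-m+1) * m + |alphabet|) worst case
Bad Character Rule - P is compared right to left; on a mismatch, shift P right so the mismatched text character lines up with its rightmost occurrence in P, or move P past it if it does not occur in P
Good Suffix Rule - use the suffix of P that already matched to decide the shift
Case 1 - the matched suffix occurs again elsewhere in P: shift to align that occurrence (precomputed shift array)
Case 2 - it does not occur again: shift so that the longest prefix of P that is a suffix of the matched part lines up (found with the prefix function); this shifts by almost all remaining characters
Often sublinear in practice because text characters can be skipped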
Usually faster for large P and T with repetitions
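The sketch below implements only the bad character rule, not full Boyer-Moore: the good suffix rule is left out to keep it short, so after a full match it simply shifts by one. Names are illustrative and positions are 0-based.

```python
def bad_character_table(pattern: str) -> dict:
    """Rightmost 0-based index of each character in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def bm_search_bad_char(text: str, pattern: str) -> list:
    """Boyer-Moore style search using the bad character rule only."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    last = bad_character_table(pattern)
    hits = []
    s = 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1                  # compare right to left
        if j < 0:
            hits.append(s)
            s += 1                  # simplification: no good suffix rule
        else:
            # Align the mismatched text character with its rightmost
            # occurrence in the pattern; always advance by at least 1.
            s += max(1, j - last.get(text[s + j], -1))
    return hits


print(bm_search_bad_char("atcatat", "at"))  # [0, 3, 5]
```

When the mismatched character does not occur in the pattern at all, the shift jumps past it, which is how whole stretches of text go unexamined.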