Lecture 1 - Biology for Computational Genetics Flashcards
What are the two forms of computational formulation?
- Numerically encoded input with a computable objective function
- Numerically encoded input with a computable statistical test for significance
What is an example of a computational form in which there is a numerical input and a computable objective function?
input: x, any real value
f(x) = x^2
output: x that maximizes f(x)
What is an example of a computational form in which there is a numerical input and a computable statistical test for significance?
input: the counts of each type of nucleotide in a genome
output: the probability that the nucleotides could be generated by a process that is independent and identically distributed
What is maximum parsimony?
if there are multiple solutions to a problem, the one involving the fewest steps is taken to be the biologically correct one
- meaning that if you want to figure out whether two sequences are related, the explanation with the fewest steps is preferred
What are synteny blocks?
regions of homologous genes shared between two organisms
- the direction (orientation) of each block matters
What are the three steps for the computational formulation of genome arrangements?
- Transform the problem into a numeric input
a. a list of genes [1…N] in one genome, and their order and orientation as a signed permutation in another genome
Example: A = [1,2,3,4,5,6,7], B = [1,-7,-6,-5,2,3,4]
- Define the model
a. let the evolutionary model be a reversal: define Rev(G, i, j) to be the reversal of the elements of G from position i to position j, which reverses the order and flips the orientation of each element
Example: Rev(B, 2, 4) = [1,5,6,7,2,3,4]
- Define the problem
a. given genome A and permutation B, find the minimal number of reversals that transforms B into A
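The reversal model can be sketched in Python as follows. This is a minimal illustration, not an efficient algorithm: `rev` uses 1-indexed, inclusive positions, and `min_reversals` is a hypothetical helper name for an exhaustive breadth-first search that is only practical for very short genomes.

```python
from collections import deque

def rev(genome, i, j):
    """Reverse elements i..j (1-indexed, inclusive) and flip each sign."""
    g = list(genome)
    g[i - 1:j] = [-x for x in reversed(g[i - 1:j])]
    return g

def min_reversals(source, target):
    """Fewest reversals turning source into target, found by breadth-first search."""
    start, goal = tuple(source), tuple(target)
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    n = len(start)
    while queue:
        state, dist = queue.popleft()
        # Try every possible reversal of the current arrangement.
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                nxt = tuple(rev(state, i, j))
                if nxt == goal:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None  # unreachable for signed permutations of the same genes

A = [1, 2, 3, 4, 5, 6, 7]
B = [1, -7, -6, -5, 2, 3, 4]
print(rev(B, 2, 4))         # [1, 5, 6, 7, 2, 3, 4]
print(min_reversals(B, A))  # 2, e.g. Rev(B, 2, 7) followed by Rev(., 2, 4)
```

Breadth-first search guarantees the first path found is a shortest one, but the number of signed permutations grows as N! * 2^N, so this only illustrates the problem statement.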
What is the central dogma of genetics?
DNA -> RNA -> protein
What is a plasmid?
a small circular genome that can encode a virus or some bacterial genes
What has a circular genome?
bacteria
What has a linear genome made up of chromosomes?
eukaryotes
How many pairs of autosomal chromosomes and sex chromosomes do humans have, and how many bases in total?
22 pairs of autosomal chromosomes
1 pair of sex chromosomes
3.2 billion bases in total
How long is one DNA base pair in angstroms and in m?
3.4 angstroms (on the order of 10^-10 m)
How many bases is 1 full DNA twist and how many angstroms is it?
10.5 bases
34 angstroms (on the order of 10^-9 m)
How many angstroms is one nucleosome?
340 angstroms (on the order of 10^-8 m)
How many nm is a virus?
20-300 nm (10^-8 to 10^-7 m)
How many m is a bacterium?
10^-6m
How many um is a nucleus?
6 um, on the order of 10^-6 m
How many um is a cell?
10 um, or 10^-5 m
What are the genome orders of magnitude?
What is the repeat structure in the human genome?
-the human genome is highly repetitive with repeats spanning many scales
-about 50% of the human genome is repeats when centromeres are excluded; including them brings the figure to about 55%
-the repeat structure of genomes affects computational problems such as genome assembly and read alignment
What is a short tandem repeat or STR?
-a monomer sequence of 2-7 bases repeated in tandem up to several hundred times
-STRs make up 4.3% of the genome, or 138 Mbp
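As a quick arithmetic check, the 4.3% and 138 Mbp figures agree with the 3.2 billion base genome size given earlier in these cards. The variable names below are illustrative only.

```python
GENOME_BASES = 3.2e9   # total bases in the human genome, from the card above
STR_FRACTION = 0.043   # 4.3% of the genome is short tandem repeats

str_bases = GENOME_BASES * STR_FRACTION
print(f"{str_bases / 1e6:.1f} Mbp")  # 137.6 Mbp, which rounds to 138 Mbp
```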
What is the mutation rate of STRs compared to ordinary DNA?
up to 10,000 times greater than ordinary DNA
What is the societal importance of STRs?
because the repeats are highly mutable, the odds of two people having the same sequence at a locus are low, so only 13 loci are enough to give a unique DNA "fingerprint" for forensic databases
What is the biological importance of STRs?
STR expansions are linked to diseases such as ALS or Huntington's
What is a mobile element?
mobile DNA elements are sequences that copy themselves or hijack a reverse transcription system; they are pieces of DNA that carry the information the cell needs to copy them and insert the copies back into the genome
What is an autonomous mobile element?
autonomous mobile elements are sequences that encode the proteins that copy that very sequence of DNA; this creates a positive feedback loop, so they are often hypermethylated to terminate the loop
What is a non-autonomous mobile element?
sequences that are copied into RNA and then inserted back into the genome, but that cannot do this themselves and rely on other cellular machinery to do so
What is a non-autonomous mobile element of the human genome?
Alu, a 280-350 base sequence with about 1 million copies in the human genome
What is an autonomous mobile element of the human genome?
LINE: up to 7,000 bases long, making up 15% of the human genome
What is a segmental duplication?
a sequence at least 1 kb in length, not a mobile element or tandem repeat, that is duplicated with at least 90% identity elsewhere in the human genome
How are segmental duplications drivers of evolution and disease?
segmental duplications drive nonallelic homologous recombination, which results in the duplication or deletion of a region
-e.g. 15% of chromosome 16 is novel with respect to the human-chimpanzee ancestor due to segmental duplication expansion
-1% of autism cases are linked to a deletion in a segmental duplication
What is the binary representation of DNA?
A = 00
T = 11
G = 10
C = 01
represented in two bits
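A minimal sketch of this encoding in Python, assuming the first base of a sequence goes into the highest bits of the packed integer. The names `ENCODE`, `DECODE`, `pack` and `unpack` are illustrative, not from the lecture.

```python
# 2-bit codes from the card: A = 00, C = 01, G = 10, T = 11
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = {code: base for base, code in ENCODE.items()}

def pack(seq):
    """Pack a DNA string into one integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | ENCODE[base]
    return value

def unpack(value, length):
    """Recover a DNA string of the given length from a packed integer."""
    bases = []
    for shift in range(2 * (length - 1), -1, -2):
        bases.append(DECODE[(value >> shift) & 0b11])
    return "".join(bases)

packed = pack("GATC")
print(format(packed, "08b"))  # 10001101
print(unpack(packed, 4))      # GATC
```

The length has to be stored alongside the integer, because leading A bases are encoded as zero bits and would otherwise be lost.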
How many bits make a byte?
8 bits
How can the reverse complement be computed?
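the bitwise NOT operator gives the complement of each base under this encoding (A = 00 <-> T = 11, G = 10 <-> C = 01); the order of the bases is then reversed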
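Under the encoding A = 00, C = 01, G = 10, T = 11, flipping both bits of a base gives its complement, and reversing the order of the bases then gives the reverse complement. The sketch below assumes Python integers, where `~` produces a negative number, so the result is masked back to 2 bits; the function name is illustrative.

```python
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = {code: base for base, code in ENCODE.items()}

def reverse_complement(seq):
    """Complement each base with bitwise NOT, then reverse the order."""
    codes = [ENCODE[base] for base in seq]
    # ~code flips every bit; & 0b11 keeps only the two bits of the base.
    complemented = [~code & 0b11 for code in codes]
    return "".join(DECODE[code] for code in reversed(complemented))

print(reverse_complement("GATTC"))  # GAATC
```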
How many combinations of nucleotides of length 8 are there?
4^8 = 65,536 combinations
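With 2 bits per base, a sequence of length 8 fits in 16 bits, so each of the 4^8 sequences maps to a distinct integer from 0 to 65,535. A small sketch, with `kmer_index` as a hypothetical helper name:

```python
from itertools import product

ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def kmer_index(kmer):
    """Map a DNA string to an integer using 2 bits per base."""
    index = 0
    for base in kmer:
        index = (index << 2) | ENCODE[base]
    return index

K = 8
# Enumerate every sequence of length K and collect its integer index.
all_indices = {kmer_index("".join(p)) for p in product("ACGT", repeat=K)}

print(4 ** K)                  # 65536
print(len(all_indices))        # 65536, so every sequence has its own index
print(kmer_index("TTTTTTTT"))  # 65535, the largest 16-bit value
```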
How much memory is required to store a 4G or 4 billion base genome with binary encoding?
one byte = 8 bits
one base = 2 bits
4 bases = one byte
1 billion bytes
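The same arithmetic written out in Python. `packed_size_bytes` is an illustrative helper that rounds up, so genomes whose length is not a multiple of 4 still get a whole number of bytes.

```python
BITS_PER_BASE = 2
BITS_PER_BYTE = 8
BASES_PER_BYTE = BITS_PER_BYTE // BITS_PER_BASE  # 4 bases fit in one byte

def packed_size_bytes(n_bases):
    """Bytes needed to store n_bases at 2 bits per base, rounded up."""
    return (n_bases + BASES_PER_BYTE - 1) // BASES_PER_BYTE

print(packed_size_bytes(4_000_000_000))  # 1000000000, i.e. 1 billion bytes
```

For comparison, storing one ASCII character per base would take 4 billion bytes, four times as much.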
What is included in the anatomy of a gene?
a 5’ untranslated region, followed by an ORF (open reading frame), followed by a 3’ untranslated region
What do enhancers and silencers do?
they increase or decrease the amount of RNA transcribed; they act through DNA-binding proteins called transcription factors, which bind to the DNA