Lecture 2 - Biology for Computational Genomics Flashcards
What are the two forms of computational formulation?
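-numerically encoded input with a computable objective function
e.g., find the input that maximizes or minimizes f(x)
-numerically encoded input with a computable test for significance
e.g., the probability of the observed sequence if its nucleotides were i.i.d. (independent and identically distributed)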
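As an illustration of the second formulation, the sketch below computes the log-probability of a sequence under an i.i.d. nucleotide model. That null-model value is what a significance test compares the observed data against. The function name `iid_log_prob` and the uniform base frequencies are assumptions for illustration only.

```python
import math

def iid_log_prob(seq: str, freqs: dict[str, float]) -> float:
    """Log-probability of seq if each base is drawn independently from freqs."""
    # Independence makes the joint probability a product of per-base
    # probabilities; summing logs avoids floating-point underflow on long sequences.
    return sum(math.log(freqs[base]) for base in seq)

# Hypothetical null model: all four bases equally likely.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

log_p = iid_log_prob("ACGT", UNIFORM)
print(math.exp(log_p))  # approximately 0.25 ** 4
```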
What is the rule of maximum parsimony?
-if there are multiple solutions to a problem, the one involving the fewest steps (e.g., the fewest mutations or rearrangements) is taken to be the most likely biological explanation
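One classic way to apply maximum parsimony is Fitch's algorithm. For a fixed binary tree, it counts the fewest substitutions needed to explain the bases observed at the leaves. The sketch below handles a single site. Representing the tree as nested tuples is an assumption made for brevity.

```python
from typing import Union

# A tree is either a leaf (a single base, e.g. "A") or a pair (left, right).
Tree = Union[str, tuple]

def fitch(tree: Tree) -> tuple[set[str], int]:
    """Return (candidate ancestral bases, minimum substitutions) for one site."""
    if isinstance(tree, str):
        return {tree}, 0  # a leaf's state is observed and costs nothing
    left_set, left_cost = fitch(tree[0])
    right_set, right_cost = fitch(tree[1])
    common = left_set & right_set
    if common:
        # The children can agree on a base, so no extra substitution is needed here.
        return common, left_cost + right_cost
    # No shared base: one substitution must occur on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1

states, cost = fitch((("A", "C"), ("A", "A")))
print(states, cost)  # {'A'} 1
```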
What are the three steps for the computational formulation of genome rearrangements?
- transform the problem into numeric input
- define the model
- define the problem
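The three steps can be made concrete with a toy example. The card does not name a model, so this sketch assumes reversals. Gene order is encoded as a permutation of integers, and the problem is the minimum number of reversals needed to sort it. The sketch counts breakpoints. A single reversal removes at most two breakpoints, so ceil(breakpoints / 2) is a lower bound on the reversal distance. The function names are hypothetical.

```python
import math

def breakpoints(perm: list[int]) -> int:
    """Count adjacent pairs that are not consecutive integers (unsigned model)."""
    # Step 1 (numeric input): pad with 0 and n+1 so the two ends are checked too.
    extended = [0] + perm + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if abs(a - b) != 1)

def reversal_lower_bound(perm: list[int]) -> int:
    # Step 2 (model): reversals. Step 3 (problem): fewest reversals to sort.
    # A reversal changes only the two adjacencies at its ends,
    # so it removes at most 2 breakpoints.
    return math.ceil(breakpoints(perm) / 2)

print(breakpoints([3, 1, 2]), reversal_lower_bound([3, 1, 2]))  # 3 2
```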
What is the repeat structure in the human genome?
the human genome is highly repetitive: about 50% of it is repeats when the centromeres are excluded, and about 55% when they are included
-the repeat structure of genomes affects computational problems such as genome assembly and read alignment
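Here is a small illustration of why repeats complicate read alignment. A read drawn from a repeated sequence matches the reference at several positions, so its true origin is ambiguous. The sketch uses exact string matching only. Real aligners tolerate mismatches and use indexes. The sequences are made up.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Return every start position where read occurs exactly (overlaps allowed)."""
    hits = []
    pos = reference.find(read)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(read, pos + 1)  # advance by one to catch overlapping hits
    return hits

# Hypothetical reference containing two copies of the repeat "ACGT".
print(find_all("ACGTACGTTT", "ACGT"))  # [0, 4]  (ambiguous: two possible origins)
print(find_all("ACGTACGTTT", "GTTT"))  # [6]     (unique placement)
```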
What are STRs or short tandem repeats?
a monomer (repeat unit) that tends to be 2-7 bases long, repeated back-to-back up to several hundred times; STRs make up 4.3% of the genome and have a very high mutation rate
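A naive scanner makes this definition concrete. At each position it tries repeat units of 2 to 7 bases and reports units that repeat back-to-back. The minimum of 3 copies and the function names are assumptions. Units that are themselves repeats (e.g. "AA" or "ACAC") are skipped, so each STR is reported by its shortest unit.

```python
def is_primitive(unit: str) -> bool:
    """True if unit is not itself a repeat of a shorter unit ("AC" yes, "ACAC" no)."""
    return unit not in (unit + unit)[1:-1]

def find_strs(seq: str, min_period: int = 2, max_period: int = 7,
              min_copies: int = 3) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for each tandem repeat, scanning left to right."""
    hits = []
    i = 0
    while i < len(seq):
        best = None  # (period, copies) covering the most bases starting at i
        for p in range(min_period, max_period + 1):
            unit = seq[i:i + p]
            if len(unit) < p:
                break  # not enough sequence left for this period
            if not is_primitive(unit):
                continue
            copies = 1
            while seq[i + copies * p:i + (copies + 1) * p] == unit:
                copies += 1
            if copies >= min_copies and (best is None or p * copies > best[0] * best[1]):
                best = (p, copies)
        if best:
            p, copies = best
            hits.append((i, seq[i:i + p], copies))
            i += p * copies  # skip past the repeat just reported
        else:
            i += 1
    return hits

print(find_strs("GG" + "CA" * 5 + "TT"))  # [(2, 'CA', 5)]
```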
What is the societal and biological importance of STRs?
-13 STR loci (the original core set of the FBI's CODIS database) are used for forensic DNA identification
-STR expansions are linked to diseases such as ALS
What are mobile elements?
mobile DNA consists of sequences that copy themselves or hijack reverse transcription machinery to insert new copies elsewhere
-pieces of DNA whose encoded information lets the cell copy them and insert the copies back into the genome
-autonomous elements encode the proteins that copy that very sequence, a positive feedback loop; they are often hypermethylated, which keeps the copying from getting out of control
-nonautonomous elements are transcribed into RNA and inserted back into the genome by the machinery of an autonomous element, so they cannot copy themselves on their own
What are segmental duplications?
a sequence at least 1 kb long, other than a mobile element or a tandem repeat, that is duplicated elsewhere in the genome with at least 90% identity
-is often mosaic, built from multiple overlapping duplicated regions of the genome
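The length and identity thresholds in this definition can be checked directly on a pair of copies that are already aligned. The sketch assumes an ungapped alignment of equal-length sequences. It does not test the "not a mobile element or tandem repeat" condition. The function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of aligned positions with identical bases (ungapped alignment)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def meets_sd_thresholds(a: str, b: str) -> bool:
    # Thresholds from the definition: at least 1 kb and at least 90% identity.
    return len(a) >= 1000 and percent_identity(a, b) >= 0.90

copy1 = "A" * 1000
copy2 = "C" * 50 + "A" * 950  # hypothetical copy with 50 mismatches (95% identity)
print(meets_sd_thresholds(copy1, copy2))  # True
```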
How are segmental duplications drivers of evolution and disease?
-they drive nonallelic homologous recombination (NAHR), recombination between two highly similar DNA sequences that are not alleles, which can duplicate or delete the region between them
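A toy model shows the deletion/duplication outcome. A region is flanked by two identical repeat copies. Two chromosomes misalign so that the second copy on one pairs with the first copy on the other. A crossover inside the paired copies gives one product with the region duplicated and one with it deleted. The sequences and the `nahr_products` helper are hypothetical.

```python
def nahr_products(left: str, repeat: str, middle: str, right: str,
                  offset: int = 0) -> tuple[str, str]:
    """Simulate one unequal crossover between two copies of left+repeat+middle+repeat+right."""
    if not 0 <= offset <= len(repeat):
        raise ValueError("crossover must fall inside the repeat copies")
    chrom = left + repeat + middle + repeat + right
    first = len(left)                                # start of the first repeat copy
    second = len(left) + len(repeat) + len(middle)   # start of the second repeat copy
    # Misaligned pairing: chromosome 1's second copy pairs with chromosome 2's
    # first copy, and the crossover happens `offset` bases into the paired copies.
    cut1, cut2 = second + offset, first + offset
    duplication = chrom[:cut1] + chrom[cut2:]  # start of chromosome 1 + end of chromosome 2
    deletion = chrom[:cut2] + chrom[cut1:]     # start of chromosome 2 + end of chromosome 1
    return duplication, deletion

dup_product, del_product = nahr_products("GGGG", "ACGTAC", "TTTT", "CCCC", offset=3)
print(dup_product)  # GGGGACGTACTTTTACGTACTTTTACGTACCCCC
print(del_product)  # GGGGACGTACCCCC
```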
What are the classes of repeats in human genome?
-STRs (2-6bp)
-variable number tandem repeats, VNTRs (7-500bp)
-mobile elements (300-8,000bp)
-segmental duplications (1,000bp-1,000,000bp)
What is the binary representation of DNA?
A=00
T=11
G=10
C=01
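This particular encoding has a convenient property: complementary bases are bitwise complements (A=00/T=11, C=01/G=10). Complementing a base is therefore an XOR with 11. The sketch below uses that property to compute a reverse complement. The names are illustrative.

```python
# 2-bit codes from the card.
ENCODE = {"A": 0b00, "T": 0b11, "G": 0b10, "C": 0b01}
DECODE = {code: base for base, code in ENCODE.items()}

def complement_code(code: int) -> int:
    """Flip both bits: 00 <-> 11 (A <-> T) and 01 <-> 10 (C <-> G)."""
    return code ^ 0b11

def reverse_complement(seq: str) -> str:
    # Walk the sequence backwards and complement each base via its 2-bit code.
    return "".join(DECODE[complement_code(ENCODE[base])] for base in reversed(seq))

print(reverse_complement("AACG"))  # CGTT
```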
What are some binary conversions?
one base = two bits
one byte = eight bits
one byte = 4 bases
memory size: n bases need 2n bits, or n/4 bytes, so a 3-billion-base genome needs about 750 MB
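These conversions translate directly into packing four bases per byte. The sketch below puts the first base in the two highest-order bits, which is a choice rather than a standard. It also keeps the original length separately. A final partial byte is padded with 00 bits, which would otherwise decode as extra A's.

```python
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}  # codes from the card
DECODE = "ACGT"  # index = 2-bit code

def pack(seq: str) -> bytes:
    """Pack 4 bases per byte, first base in the highest-order bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= ENCODE[base] << (6 - 2 * j)
        out.append(byte)
    return bytes(out)

def unpack(data: bytes, length: int) -> str:
    """Decode packed bytes, trimming padding bases beyond the original length."""
    bases = [DECODE[(byte >> (6 - 2 * j)) & 0b11] for byte in data for j in range(4)]
    return "".join(bases[:length])

def bytes_needed(n_bases: int) -> int:
    return (n_bases + 3) // 4  # round up to a whole byte

print(bytes_needed(3_000_000_000))  # 750000000
```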
What is the organization of DNA?
5' untranslated region (5' UTR), then the open reading frame (ORF), then the 3' untranslated region (3' UTR)
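A simple way to split an mRNA into these three parts assumes two things. The ORF starts at the first AUG, and it ends at the first in-frame stop codon (UAA, UAG or UGA). Real start-site selection is more complex, so treat this as a sketch. The sequence is made up.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_mrna(mrna: str):
    """Return (5' UTR, ORF, 3' UTR), or None if no complete ORF is found."""
    start = mrna.find("AUG")  # assumption: translation starts at the first AUG
    if start == -1:
        return None
    # Step through codons in the same reading frame as the start codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon is included in the ORF
            return mrna[:start], mrna[start:end], mrna[end:]
    return None  # no in-frame stop codon

print(split_mrna("GGCAUGAAAUGACCU"))  # ('GGC', 'AUGAAAUGA', 'CCU')
```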
Where does RNA splicing occur the most?
at the GU...AG boundaries of introns: most introns begin with GU (5' splice site) and end with AG (3' splice site)
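The GU-AG rule can be sketched as a check that runs before introns are removed from a pre-mRNA. The sketch assumes the intron coordinates are already given (0-based, half-open). It does not predict them. The sequence is made up.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove non-overlapping introns [start, end) after checking the GU...AG rule."""
    exons = []
    prev = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} does not follow the GU-AG rule")
        exons.append(pre_mrna[prev:start])  # keep the exon before this intron
        prev = end
    exons.append(pre_mrna[prev:])  # keep the final exon
    return "".join(exons)

print(splice("AUGCC" + "GUAAAG" + "GGUAA", [(5, 11)]))  # AUGCCGGUAA
```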