Introduction Flashcards
Phylogenetics
Evolutionary process over millions of years
Population Genetics
Evolution within a species focusing on the genetic variation among people
Evolution will be treated as a mathematical process (in this class)
mathematical process
Biological evolutionary processes operate on several scales
cellular (in the body) / somatic
Tens/hundreds of thousands of years
Millions of years
Addressing most real-world problems in data science requires using a mix of toolkits
tools like MATLAB, Python
Central Dogma in Biology
DNA → RNA → Protein
These are the major classes of polymers
Proteins are not used to go back to creating DNA or RNA.
RNA is reverse-transcribed to DNA in special cases
A genetic sequence is an ideal natural representation
Linking biology and data
DNA is represented by the “alphabet”
A, C, G, T
RNA is represented by the “alphabet”
A, C, G, U
Proteins have an alphabet of 20 letters (sometimes more) representing Amino Acids that make up proteins
Proteins
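As a minimal sketch of how these alphabets link biology to data, the sets below encode the DNA, RNA, and 20-letter protein alphabets from the cards above. The helper name `is_valid` is hypothetical and only checks that a sequence uses letters from a given alphabet.

```python
# Alphabets as listed on the cards above.
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
# One-letter codes for the 20 standard amino acids.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid(seq: str, alphabet: set) -> bool:
    """Return True if seq is non-empty and uses only letters from alphabet.

    Hypothetical helper for illustration; case is ignored.
    """
    return len(seq) > 0 and set(seq.upper()) <= alphabet


print(is_valid("GATTACA", DNA_ALPHABET))  # T is in the DNA alphabet
print(is_valid("GAUUACA", DNA_ALPHABET))  # U is not in the DNA alphabet
print(is_valid("GAUUACA", RNA_ALPHABET))  # U is in the RNA alphabet
```

Note that a short string such as "ACG" is valid under all three alphabets, so the letters alone do not always identify the molecule type.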
Transcription
DNA to RNA (1 to 1 mapping)
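The 1-to-1 mapping can be sketched in a few lines of Python. This assumes the input is the coding (sense) strand, so transcription amounts to replacing every T with U; the function name `transcribe` is chosen here for illustration.

```python
def transcribe(dna: str) -> str:
    """Map a DNA coding-strand sequence to RNA, one letter per letter.

    Assumption: the input is the coding (sense) strand, so the RNA is
    the same sequence with T replaced by U.
    """
    dna = dna.upper()
    unknown = set(dna) - set("ACGT")
    if unknown:
        raise ValueError(f"not a DNA sequence, unexpected letters: {sorted(unknown)}")
    return dna.replace("T", "U")


rna = transcribe("ATGGCT")
print(rna)  # same length as the input, with U in place of T
```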
Most of the DNA in most genomes (e.g., 75% in humans and 15% in bacteria)
is not part of “genes”. These regions are not transcribed.
Translation
Proteins are synthesized using the information in mRNA. Proteins are the building blocks of living cells.
Different cells “express” RNA/Protein of each gene
at various levels and in multiple forms (splicing)
How does the 4-letter DNA/RNA alphabet code for 20-letter (amino acid) proteins?
Three DNA/RNA letters in a row, called a codon, code for one amino acid
There are 4^3 = 64 codons but only 20 amino acids, so the code is redundant: several codons can map to the same amino acid
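The codon arithmetic above can be checked with a short sketch. It builds the standard genetic code from a compact 64-letter string (bases ordered T, C, A, G; `*` marks a stop codon) and translates a sequence three letters at a time. The names `CODON_TABLE` and `translate` are illustrative, and the sketch assumes the reading frame starts at the first letter.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Enumerate all 4^3 = 64 codons and pair each with its amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate DNA or RNA into one-letter amino acid codes.

    Assumption: the reading frame starts at position 0. Translation
    stops at the first stop codon; trailing letters that do not fill
    a whole codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TABLE))                 # number of codons
print(len(set(AMINO_ACIDS) - {"*"}))    # number of distinct amino acids
print(translate("ATGGCTTAA"))           # ATG -> M, GCT -> A, TAA -> stop
```

The redundancy shows up directly in the table: leucine (L), for example, is encoded by six different codons.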
Less than 2% of the genome is protein-coding
Exons: protein-coding regions
Stop codons: gene boundaries
Introns: regions between exons. Introns are transcribed but not translated
DNA Replication / Mutation
Parts of the molecule occasionally change in the new copy. These events are called mutations
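Treating replication as a mathematical process, one simple sketch is to copy a sequence letter by letter and, with a small probability, substitute a different base. Everything here is an assumption made for illustration: the per-site `mutation_rate`, the restriction to point substitutions (no insertions or deletions), and the function names `replicate` and `count_differences`.

```python
import random


def replicate(dna: str, mutation_rate: float, rng: random.Random) -> str:
    """Copy dna, replacing each base with a different one with
    probability mutation_rate (point substitutions only)."""
    bases = "ACGT"
    copy = []
    for base in dna:
        if rng.random() < mutation_rate:
            # Pick one of the three other bases uniformly at random.
            copy.append(rng.choice([b for b in bases if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def count_differences(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)          # seeded so the run is repeatable
parent = "ACGT" * 25            # toy 100-letter sequence
child = replicate(parent, mutation_rate=0.05, rng=rng)
print(count_differences(parent, child), "mutations in", len(parent), "sites")
```

The rate of 0.05 is chosen only so that a few changes are visible in a short toy sequence; it is not meant to reflect a measured biological rate.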