Genome Diversity in Space - Before Midterm Flashcards
1
Q
Comparative Genomics
A
- a field of biological research in which the genomes of different organisms are compared in order to understand phenotypic differences between them and infer evolutionary relationships and processes
- it starts with the phenotype, and then tries to find the genetic source of those phenotypic differences
2
Q
Key features of the human genome
A
- The human genome is only about 1.5% genes
- We have about 20,000 protein-coding genes and 22,000 genes for non-coding RNA, like rRNA, tRNA, and short non-coding RNA
- An important type of short, non-coding RNA is micro-RNA (miRNA)
- 24% of the genome is introns and other non-coding DNA
- 12% of the genome is pseudogenes: genes that used to code for something but no longer do because they are “broken”; they’re just relics
- 43% of the genome is interspersed repeats
- 20% of the genome is short tandem repeats and other things
- Collectively, the interspersed repeats and tandem repeats are considered “junk” DNA
3
Q
miRNA
A
- microRNA found in the human genome (and probably in most eukaryotes)
- a type of short, non-coding RNA (not an mRNA)
- miRNA silences genes by pairing with complementary mRNAs as part of RISC (the RNA-induced silencing complex); the Argonaute protein within RISC then cuts up that mRNA or blocks its translation
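As a rough illustration of what "complementary" means here, the sketch below checks whether an mRNA contains a site that would pair perfectly with a miRNA. The sequences and function names are made up for the example, and real miRNA targeting often needs only partial pairing, so treat this as a simplification.

```python
# Watson-Crick base pairing for RNA
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the sequence (5'->3') that would base-pair with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna))

def find_target_site(mirna, mrna):
    """Return the index of a perfectly complementary site in mrna, or -1."""
    return mrna.find(reverse_complement(mirna))

mirna = "UAGCUU"        # hypothetical miRNA
mrna = "GGCAAGCUAACC"   # hypothetical mRNA

print(reverse_complement(mirna))      # AAGCUA
print(find_target_site(mirna, mrna))  # 3
```

A result of -1 would mean no perfectly complementary site was found in this toy model.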
4
Q
Interspersed repeats
A
- One of the two types of repeats in the human genome
- Make up 43% of the human genome
- They are spread out throughout the chromosome
- Are mobile genetic elements
- Are copied at random throughout the genome
- The types of interspersed repeats are LINEs, SINEs, LTRs, and DNA transposons
5
Q
Tandem repeats
A
- One of the two types of repeats in the human genome
- Are tandem, meaning right after one another in the genome
- Typically are derived from copying errors
- They have different names depending on their size
- satellite = 5-200 bp repeats
- mini-satellite <= 25 bp repeats
- micro-satellites <= 13 bp repeats
- dinucleotide: “AT” repeats
- trinucleotide: e.g. the CAG repeats that are expanded in Huntington’s disease
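A small Python sketch of what "tandem" means in practice: it counts the longest run of back-to-back copies of a motif. The function name and the example sequences are invented for illustration, not taken from the course.

```python
def longest_tandem_run(seq, motif):
    """Return the largest number of back-to-back copies of motif in seq."""
    if not motif:
        raise ValueError("motif must not be empty")
    k = len(motif)
    best = 0
    for start in range(len(seq)):
        count = 0
        pos = start
        # keep stepping forward one motif length while the motif repeats
        while seq[pos:pos + k] == motif:
            count += 1
            pos += k
        best = max(best, count)
    return best

seq = "TTCAGCAGCAGCAGGA"  # hypothetical sequence with a CAG trinucleotide repeat
print(longest_tandem_run(seq, "CAG"))        # 4
print(longest_tandem_run("GATATATC", "AT"))  # 3
```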
6
Q
LINEs
A
- Long interspersed nuclear elements
- A type of interspersed repeat (as the name suggests)
- LINEs, like LTRs, code for a protein that helps them jump around the genome
- This protein is a “copy and paste” enzyme, meaning it copies the element that encodes it and inserts the copy somewhere else in the genome
- They have recognition elements on either side so that the enzyme they code for can recognize them
7
Q
LTRs
A
- Long terminal repeats
- A type of interspersed repeat
- Like LINEs, they code for a protein that helps them jump around the genome
- This protein is a “copy and paste” enzyme, meaning it copies the element that encodes it and inserts the copy somewhere else in the genome
- They have recognition elements on either side so that the enzyme they code for can recognize them
8
Q
SINEs
A
- Short interspersed nuclear elements
- A type of interspersed repeat
- They don’t code for their own protein to move about the genome; instead they borrow the “copy and paste” enzyme of LINEs
9
Q
DNA Transposons
A
- A type of interspersed repeat
- Code for a “cut and paste” enzyme, which cuts the element out of the DNA and inserts it at a different location in the genome; the element is moved, but no copies are made
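To contrast "copy and paste" with "cut and paste," here is a toy Python model that treats a genome as a list of labeled pieces. The labels ("geneA", "TE") and function names are hypothetical, and the model ignores all of the real biochemistry; it only shows the bookkeeping difference.

```python
def copy_and_paste(genome, element, target):
    """Copy-and-paste move: the original stays put and a copy is inserted."""
    new = list(genome)
    new.insert(target, element)
    return new

def cut_and_paste(genome, element, target):
    """Cut-and-paste move: the element is removed, then reinserted elsewhere."""
    new = list(genome)
    new.remove(element)          # cut out the first occurrence
    new.insert(target, element)  # target index refers to the genome after the cut
    return new

genome = ["geneA", "TE", "geneB", "geneC"]  # "TE" = transposable element

print(copy_and_paste(genome, "TE", 3))  # ['geneA', 'TE', 'geneB', 'TE', 'geneC']
print(cut_and_paste(genome, "TE", 2))   # ['geneA', 'geneB', 'TE', 'geneC']
```

After copy and paste the element count goes up by one; after cut and paste it stays the same, which is why copy-and-paste elements can pile up in a genome.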
10
Q
Eukaryotic DNA Structure Overview
A
- DNA takes the form of linear DNA during replication, but it is normally in chromosomal structures when not replicating
- To form a chromosome, linear DNA wraps around histones; this DNA-histone fiber then coils around itself, and coils again and again, getting denser at each level until it forms a chromosome
11
Q
Prokaryotic DNA Structure Overview
A
- Prokaryotes have their DNA in a circular, double stranded structure
- This circular structure condenses into a super-coil, and then condenses even more around a protein core to make a weird structure (see Notes on 10/14 for picture)
- These structures are called nucleoids (I think; or maybe the structures come together to make nucleoids?)
- This circular chromosome structure is a prokaryote’s primary genome, but many prokaryotes also have plasmids on top of this
- Plasmids are useful because they allow for horizontal gene transfer
- Since prokaryotes have their main circular genomes and also plasmids, there are different names for referring to the different “genomes”
- core genome: the set of essential genes ALL members of a particular prokaryotic species have
- accessory genome: the set of ‘extra’ genes that individual members of a particular prokaryotic species may or may not have in their genome
- pan genome: the core genome AND the ENTIRE accessory genome for a species
- NOTE: the core genome isn’t always just the big chromosome; it can also contain plasmids that ALL individuals in that species have, and then the accessory genome will be extra plasmids some individuals in that species may have that others don’t
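The core / accessory / pan genome definitions map neatly onto set operations. In this Python sketch the strains and their gene lists are invented for illustration (the gene names are just placeholders for "some gene").

```python
# Hypothetical gene content for three strains of one species
strains = {
    "strain1": {"dnaA", "rpoB", "gyrA", "blaTEM"},
    "strain2": {"dnaA", "rpoB", "gyrA", "tetA"},
    "strain3": {"dnaA", "rpoB", "gyrA"},
}

# pan genome: every gene seen in at least one strain
pan = set.union(*strains.values())
# core genome: genes present in ALL strains
core = set.intersection(*strains.values())
# accessory genome: genes present in some strains but not all
accessory = pan - core

print(sorted(core))       # ['dnaA', 'gyrA', 'rpoB']
print(sorted(accessory))  # ['blaTEM', 'tetA']
print(len(pan))           # 5
```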
12
Q
Horizontal Gene Transfer
A
- Passing DNA to contemporaries as opposed to offspring
- Plasmids allow bacteria to do horizontal gene transfer
13
Q
Key features of Prokaryotic Genome
A
- Over 90% of the genome is coding
- Prokaryotes hardly ever have introns
- Their genes sometimes overlap, because they sometimes use both strands in the same region
- Genes with related functions are often organized into operons
14
Q
Diversity between Domains
A
- Gene diversity: high in both bacteria and archaea, and low in eukaryotes
- Introns: not present in bacteria or archaea; present in eukaryotes
- Repeats: only about 1% in bacteria and archaea; around 60% in eukaryotes
- Structure: Bacteria and archaea both have a chromosomal structure and plasmids, with the chromosomal structure making up a nucleoid; eukaryotes have chromosomes that exist in the nucleus
- Organization: bacteria and archaea have operons, eukaryotes do not
- Comparative genomics data actually suggest that eukaryotes and archaea are more similar to each other than either is to bacteria
15
Q
Similarities between Eukaryotes and Archaea
A
- Their ribosomal proteins are more similar in sequence to each other’s than to those of bacteria
- Their RNA polymerases are more similar in sequence to each other’s than to those of bacteria
- Their DNA replication enzymes are more similar in sequence to each other’s than to those of bacteria
16
Q
Homologs
A
- Genes descended from a common ancestor
- Looking at homologs is a way to qualitatively compare two genes
- There are two types of homologs:
- orthologs: homologs in different species that evolved from a common ancestor via speciation
- paralogs: homologs related by gene duplication within a species
- When deciding if a pair of genes are orthologs or paralogs, ask if the event that resulted in these different versions of the gene was duplication or speciation
- Orthologs are genes separated by speciation
- Paralogs are genes separated by duplication
- I think (I may be wrong) that they’re usually orthologs if in different species and paralogs if in the same species, but the deciding question is still speciation vs. duplication
17
Q
Analogs
A
- Genes that appear similar, but are the result of parallel (convergent) evolution in different lineages
- An example is giant pandas and red pandas
- They both show pseudogenization (inactivation) of TAS1R1, the gene for the umami taste receptor
- This was achieved in different ways, however
18
Q
Phylogenetics
A
- A way to quantitatively compare two genes
- Involves looking at the similarities between genes, proteins, etc., and then mapping species out based on those similarities
- Molecular phylogenies are based on sequences of DNA or the proteins they encode
- Morphological phylogenies are based on physical characteristics
- We can make phylogenies quantitative by taking a gene that is a homolog in all the species we are looking at and comparing the sequences in an objective way
- We can look at identity, which is the percentage of bases or amino acids that are identical (opposite is “distance”; distance = number of differences)
- When looking at percent identity, we use Hamming distance, which is the number of positions where the bases are different
- We can look at similarity, which is the percentage of amino acids that are similar (opposite is “dissimilarity”)
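To make identity and distance concrete, here is a minimal Python sketch. The two sequences are invented and the function names are just for illustration; it assumes the sequences are already aligned and the same length.

```python
def hamming_distance(a, b):
    """Number of positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def percent_identity(a, b):
    """Percentage of positions where the two sequences are identical."""
    if not a:
        raise ValueError("sequences must not be empty")
    return 100 * (len(a) - hamming_distance(a, b)) / len(a)

a = "ACGTACGTAC"  # hypothetical homolog from species 1
b = "ACGTTCGTAA"  # hypothetical homolog from species 2

print(hamming_distance(a, b))  # 2
print(percent_identity(a, b))  # 80.0
```

Identity and distance are two views of the same count: 2 differences out of 10 positions is 80% identity.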
19
Q
Hamming Distance and Distance Matrix
A
- the number of positions where the bases are different when comparing gene homologs between species
- is used when making phylogenies
- The first step in measuring Hamming distance is to line up all the sequences for the different organisms, then measure the Hamming distance between pairs
- A good way to keep track of Hamming distances is to make a distance matrix, which contains all pairwise distances
- Organisms with shorter distances tend to have shared a common ancestor not too long ago, whereas those with longer distances probably shared a common ancestor further back in time, since more time could explain why there are more differences (mutations)
- Once you have a distance matrix, it is usually helpful to group the organisms via hierarchical clustering, joining the closest pairs first
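The distance matrix and clustering steps can be sketched in plain Python. Everything here is an assumption made for illustration: the four sequences are invented, the names (`distance_matrix`, `cluster`) are hypothetical, and average linkage is just one common way to do hierarchical clustering, not necessarily the one used in class.

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of positions where two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(seqs):
    """Map each pair of names (as a frozenset) to its Hamming distance."""
    return {frozenset((p, q)): hamming_distance(seqs[p], seqs[q])
            for p, q in combinations(seqs, 2)}

def cluster(seqs):
    """Average-linkage hierarchical clustering; returns a nested-tuple tree."""
    dist = distance_matrix(seqs)
    # each cluster is (tree, list of leaf names)
    clusters = [(name, [name]) for name in seqs]

    def avg(c1, c2):
        pairs = [dist[frozenset((p, q))] for p in c1[1] for q in c2[1]]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # join the two clusters with the smallest average distance
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]

# Hypothetical aligned sequences of one homolog in four species
seqs = {
    "spA": "ACGTACGTAC",
    "spB": "ACGTACGTAA",
    "spC": "ACGAACTTAA",
    "spD": "TCGAAGTTCA",
}

print(distance_matrix(seqs)[frozenset(("spA", "spB"))])  # 1
print(cluster(seqs))
```

spA and spB differ at only one position, so they are joined first; the nesting of the printed tuple shows the order in which the remaining species are added.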