Genomics Flashcards
What is the difference between complex and simple genomes?
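Simple genomes have no introns and little repetitive content, and are mostly protein coding. Complex genomes are the reverse: they have introns, abundant repetitive content and are mostly non-coding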
Why do prokaryotes have such small genomes?
They are limited by power: DNA replication costs energy, so genome size is capped by the power the organism can produce
When eukaryotes engulfed bacteria (the ancestors of mitochondria), energy supply was decoupled from genome size, allowing larger, messier genomes to develop
What is the C-value enigma?
Genome size doesn’t correlate with organism complexity (the C-value is the amount of DNA in a haploid nucleus). It is resolved by the fact that very little of the genome is protein coding in eukaryotes.
What is the structure of chromosomes?
Have a short arm (p = petite) and a long arm (q = the letter after p)
Have a centromere (where the kinetochore forms)
Have telomeres (for replication and stability, are conserved tandemly repeating sequences)
What are the different types of eukaryotic chromosomes?
Metacentric (centromere in the middle)
Submetacentric (centromere off centre)
Acrocentric (very short, satellite-bearing p arms)
Telocentric (no p arms)
What are the different types of tandem repeats?
Minisatellites (10-100bp units), found in telomeric regions in humans
Microsatellites (1-6bp units), found throughout the genome, make up the large majority of all repeats
Macrosatellites (>100bp units), difficult to analyse with PCR
Tandem repeats are useful for fingerprinting and population genetics
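The unit-length ranges on this card can be sketched as a small classifier. This is a minimal illustration only: the function name `classify_tandem_repeat` is hypothetical, and because the card gives no class for 7-9bp units, the sketch labels those as unclassified rather than guessing.

```python
def classify_tandem_repeat(unit_length_bp: int) -> str:
    """Classify a tandem repeat by the length of its repeat unit (in bp).

    Ranges follow the flashcard: micro 1-6, mini 10-100, macro >100.
    """
    if unit_length_bp < 1:
        raise ValueError("unit length must be at least 1 bp")
    if unit_length_bp <= 6:
        return "microsatellite"
    if 10 <= unit_length_bp <= 100:
        return "minisatellite"
    if unit_length_bp > 100:
        return "macrosatellite"
    # 7-9 bp falls between the ranges given on the card (assumption: leave unlabelled)
    return "unclassified"


for unit in (2, 8, 33, 2500):
    print(unit, classify_tandem_repeat(unit))
```

Other sources draw the boundaries slightly differently, so the cut-offs here should be treated as this card's convention rather than a fixed standard.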
What is satellite DNA?
Short, tandemly repeated sequences, including mini, micro and macro satellites. Named because they appear as a separate ‘satellite’ band when sheared DNA is centrifuged in a caesium chloride density gradient: their base composition (often AT rich) gives them a different buoyant density from the bulk DNA
What are pseudogenes?
Genes that are inactivated due to mutation (frame shift, nonsense) or regulation
Often occurs if the gene is non-essential, or if there are 2 copies (the second copy is free to accumulate mutations)
What are paralogs?
Homologous genes separated by gene duplication - genes within a genome that share a common ancestor gene that was duplicated
What are orthologs?
Homologous genes separated by speciation - e.g. the vitamin C synthesis gene, which works in pigs but is non-functional in humans
What are examples of transposed sequences?
Retroviruses
Transposable elements - DNA that can move around the genome
Processed pseudogenes - integration of cDNA back into a genome; has a poly A tail, no introns and no promoter
What are the characteristics of retroviral elements?
Long terminal repeats (LTRs) at both ends
3’ and 5’ target sites for integration
Could interrupt a gene
Duplicate a short stretch of host DNA at the insertion site as they insert (target site duplication), which can disrupt gene expression
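Target site duplication can be sketched with plain strings. This is a toy model, assuming the element inserts at a staggered cut and the short overhang is copied on both sides; the function name `insert_with_tsd`, the sequences and the 3bp duplication length are all hypothetical values chosen for illustration.

```python
def insert_with_tsd(genome: str, element: str, cut_pos: int, tsd_len: int) -> str:
    """Insert `element` so that the `tsd_len` bases starting at `cut_pos`
    end up duplicated on both sides of it (toy model of a staggered cut)."""
    if tsd_len < 0 or cut_pos < 0 or cut_pos + tsd_len > len(genome):
        raise ValueError("target site must lie within the genome")
    left = genome[: cut_pos + tsd_len]  # ends with the target site
    right = genome[cut_pos:]            # starts with the same target site
    return left + element + right


# Hypothetical host sequence; the target site is "CGT" (positions 3-5)
host = "AAACGTTTT"
print(insert_with_tsd(host, "[ELEMENT]", cut_pos=3, tsd_len=3))
# -> AAACGT[ELEMENT]CGTTTT
```

The repeated `CGT` flanking the insert is the footprint that marks a past insertion event.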
What are the characteristics of class I retrotransposons?
Copy and paste mechanism via an RNA intermediate
2 types
Type 1 are LTR: similar to retroviruses but without env, so don’t form infectious particles
Type 2 are non-LTR: LINEs (encode reverse transcriptase, make up 21% of the human genome, most are non-functional) and SINEs (encode no functional protein, need other mobile elements to move)
What are the characteristics of class II transposons?
Cut and paste mechanism
Encode a transposase enzyme
Most are inactive (e.g. due to deletions)
What are processed pseudogenes?
Mature mRNA is reverse transcribed and integrated into the genome. The copy lacks a promoter (so is dead on arrival) and introns, and has a poly A tail
Often have 5’ truncations due to low processivity of reverse transcriptase
Are dispersed throughout the genome (not near original gene)
Have target site duplication from insertion
What is the evolutionary story behind the IRGM gene family?
Immunity Related GTPase gene family
3 copies of the family in most mammals (humans only have 2)
50 million years ago, all but one copy were inactivated in the monkey/great ape ancestor
24 million years ago, a retrovirus inserted at the start of a gene and formed a new promoter
12 million years ago, the functional copy was fixed in the gorilla, chimp and human lineage
Today it is expressed in several tissues in humans
Where is variation in genomes seen?
Sequence
Base modification (e.g. methylation)
Histone modification
Chromosome structure (length, inversions, duplications, deletions)
How does variation arise?
Mistakes in replication, chromosomal recombination and segregation. A variant has to be inherited, i.e. occur in the germline, to persist
What are SNPs?
Single Nucleotide Polymorphisms (or Variants, SNVs). Can be a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine, or pyrimidine to purine). Could also be a single nucleotide deletion. Arise due to natural mutation or exposure to a carcinogen
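The transition/transversion rule can be written as a short check: a substitution is a transition if both bases are in the same chemical class, otherwise a transversion. This is a minimal sketch; the name `classify_substitution` is hypothetical and the function handles only single-base DNA substitutions, not deletions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical, so this is not a substitution")
    # Same class (purine to purine, or pyrimidine to pyrimidine) is a transition
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("C", "A"))  # transversion
```

Of the twelve possible substitutions, four are transitions and eight are transversions.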
What are the consequences of DNA variation?
Most are neutral and tolerated (a lot of DNA doesn’t encode protein; genetic code is degenerate so amino acid may not be altered; some amino acids can be interchanged). Some somatic mutations contribute to the changes seen in cancer. Occasionally there is positive or negative selection for a mutation
What are the most sensitive parts of the genome to mutation?
CpG dinucleotides that are subject to methylation. Methylated C can be deaminated to make a T, leaving a T:G mismatch. This can either be repaired (using the G on the other strand as a template) or fully converted to a T:A pair
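As a rough illustration, the sketch below finds CpG sites in a sequence and shows the top strand after one of them is deaminated. The function names and the example sequence are hypothetical, and the code assumes every CpG cytosine is methylated, which is a simplification.

```python
def cpg_positions(seq: str) -> list[int]:
    """Return the 0-based index of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i : i + 2] == "CG"]


def deaminate(seq: str, pos: int) -> str:
    """Return the sequence after the (assumed methylated) C at `pos` becomes T."""
    seq = seq.upper()
    if seq[pos : pos + 2] != "CG":
        raise ValueError("pos is not the C of a CpG dinucleotide")
    return seq[:pos] + "T" + seq[pos + 1 :]


dna = "ATCGGACGT"  # hypothetical sequence with two CpG sites
print(cpg_positions(dna))  # [2, 6]
print(deaminate(dna, 2))   # ATTGGACGT
```

The change turns a CpG into a TpG on this strand, and the site is lost as a CpG if the mismatch is not repaired.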
What are CNVs?
Larger regions of DNA subject to duplication or deletion. They are evolutionarily important as sequences can diverge after a duplication. They usually arise due to non-allelic homologous recombination (they are flanked by sequences with high homology).
What are the consequences of CNVs?
Pathways with tightly regulated gene expression are most commonly disrupted, such as control of foetal growth and brain development. A deletion can also reveal a mutation on the ‘normal’ allele (loss of heterozygosity)
What is an example of non-sequence/DNA related mutation?
Epigenetic mutation. Could either be nucleic acid (e.g. methylating DNA) or protein (e.g. histone) modification