Coding Life: Genomics and Population Genetics Flashcards
genome - numbers and complexity
Genome: the complete set of genetic material (all genes plus non-coding DNA) across all the chromosomes of a single species
About 3 billion base pairs in humans
More distantly related species have less genomic similarity
Two unrelated humans are 99.9% the same
Can be used to examine evolution
Number of genes does not correlate with complexity
Genome size also doesn’t correlate with complexity
genomics
Made possible by advances in DNA sequencing technologies
Started with model organisms and simple organisms
Draft human genome in 2000
Next generation sequencing has made this much faster (Sanger sequencing was used before)
next generation sequencing
Next generation sequencing: involves generating huge numbers of short DNA sequences in parallel, which then need to be assembled into long contiguous sequences.
Look for overlapping parts to work out how they fit together
There are tandem, dispersed and simple-sequence repeats, and we don’t know why they are there
Repeats make assembly hard
In humans, 75% of the genome is repeats or of unknown function; only 3% is coding sequence
Computers help identify genes by recognising:
Protein-coding regions, from their codons
RNA genes such as tRNA, from sequences that fold into hairpin structures
Transcription factor binding sites, which lie near protein-coding genes
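The "look for overlapping parts" step in this card can be sketched as a toy greedy assembler. This is a minimal illustration only: the function names, the example reads and the `min_overlap` threshold are hypothetical, and real assemblers use far more elaborate methods to cope with sequencing errors and repeats.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; keep the contigs separate
        # Join the two sequences, writing the shared part only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical short reads taken from one made-up sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

Repeats make this hard because a repeated stretch gives the same overlap with several different reads, so the greedy choice can join the wrong pieces.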
transcriptomics
Transcriptomics: uses mRNA to determine where functional genes lie in the genome.
gene duplications
Gene duplications: provide some of the raw material for evolution and diversification.
Duplication is often tolerated (the extra copy may produce more gene product without being lethal)
One copy might mutate and take on a novel function while the other keeps the original function
This is important in evolution
This is how we end up with multiple genes with complementary functions
genome structure and evolution - cystic fibrosis
Cystic fibrosis
Mendelian (single gene) trait
Most traits are multifactorial (several genes and environmental factors)
genome structure and evolution - identifying disease genes
Identifying disease genes
Traditional gene mapping, cloning and sequencing candidate disease genes
If mutations are found only in affected individuals, that’s probably the disease gene
Can improve genetic counselling: informs decisions about having more children, and family members can be screened
Provides information about disease pathways - helps with therapy
Increases understanding of related and more common diseases
challenges and opportunities of genomics
Pathogenic variants may be in non-coding areas
Not all affected individuals will have a particular risk factor
Some healthy people have the risk factor
Difficult as most diseases are multifactorial
Developing algorithms that sort through all the data
factors that contribute to a phenotype
Genes - alleles
Mutation and recombination result in new alleles
Biochemical and visible phenotypes produced by these alleles
Environmental factors
example population - equilibrium and natural selection
Thousands of red and white plants in an isolated field
Allele frequencies: 0.8 for red, 0.2 for white
If the frequencies remain the same over time, the population is at equilibrium
If they change, one allele may have an advantage and be increasing through natural selection
hardy-weinberg equilibrium
Conditions where we can translate between allele frequencies and genotype frequencies.
AA is p^2
Aa is 2pq
aa is q^2
A population stays at this given:
Large population
No gene flow
No natural selection
No mutation
Random mating
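The translation between allele and genotype frequencies can be sketched in a few lines. This is a minimal sketch, assuming p is the frequency of allele A and q = 1 - p is the frequency of allele a; the function names are hypothetical, and the value 0.8 is borrowed from the red and white plant example purely for illustration.

```python
def genotype_frequencies(p: float) -> dict[str, float]:
    """Expected genotype frequencies at Hardy-Weinberg equilibrium."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p  # assumption: only two alleles, so p + q = 1
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def allele_frequency_A(freq_AA: float, freq_Aa: float) -> float:
    """Frequency of allele A: all of AA plus half of the heterozygotes."""
    return freq_AA + 0.5 * freq_Aa


freqs = genotype_frequencies(0.8)
for genotype, freq in freqs.items():
    print(f"{genotype}: {freq:.2f}")
# AA: 0.64
# Aa: 0.32
# aa: 0.04

# Going back the other way recovers the allele frequency (about 0.8).
print(round(allele_frequency_A(freqs["AA"], freqs["Aa"]), 2))  # 0.8
```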
microevolution
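Microevolution: change in allele and genotype frequencies, usually from violating the first three Hardy-Weinberg conditions (small population size, gene flow or natural selection)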
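Why a small population violates equilibrium can be seen in a toy simulation of random sampling between generations. This is a sketch under stated assumptions: it uses a simple Wright-Fisher style model that is not part of the original cards (non-overlapping generations, random mating, no selection, mutation or gene flow), and the function name, population sizes and seed are all hypothetical.

```python
import random


def simulate_drift(p0: float, pop_size: int, generations: int, seed: int = 0) -> list[float]:
    """Track one allele's frequency when each generation is a random sample."""
    rng = random.Random(seed)
    n_copies = 2 * pop_size  # diploid: two allele copies per individual
    p = p0
    history = [p]
    for _generation in range(generations):
        # Each copy in the next generation is drawn at random from the current pool.
        count = sum(1 for _copy in range(n_copies) if rng.random() < p)
        p = count / n_copies
        history.append(p)
    return history


small = simulate_drift(p0=0.8, pop_size=10, generations=50)
large = simulate_drift(p0=0.8, pop_size=2000, generations=50)
print("small population, final frequency:", small[-1])
print("large population, final frequency:", large[-1])
```

In runs like this the small population typically wanders far from 0.8 and may lose one allele entirely, while the large population stays close to its starting frequency.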
stabilising NS
Selection against extremes in a population, favouring intermediate phenotypes
Example: birth weight in babies (mortality is lowest at intermediate weights)
balancing NS
Acts to maintain more than one allele at a particular locus
Heterozygote advantage (also known as overdominance) is an example: the heterozygote at a locus is fitter than either homozygote under certain conditions
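How heterozygote advantage keeps both alleles in a population can be sketched with the standard one-locus selection model. This is a minimal sketch: the fitness values (0.8, 1.0, 0.6) are made-up numbers for illustration, the function names are hypothetical, and the model assumes random mating in a very large population.

```python
def next_frequency(p: float, w_AA: float, w_Aa: float, w_aa: float) -> float:
    """Frequency of allele A after one generation of selection."""
    q = 1.0 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # A alleles come from all AA survivors and half of the Aa survivors.
    return (p * p * w_AA + p * q * w_Aa) / mean_fitness


def run_selection(p: float, generations: int, w_AA: float, w_Aa: float, w_aa: float) -> float:
    """Apply selection repeatedly and return the final frequency of A."""
    for _generation in range(generations):
        p = next_frequency(p, w_AA, w_Aa, w_aa)
    return p


# Hypothetical fitnesses: the heterozygote is fitter than either homozygote.
fitness = {"w_AA": 0.8, "w_Aa": 1.0, "w_aa": 0.6}
for start in (0.1, 0.9):
    final = run_selection(start, 200, **fitness)
    print(f"start {start:.1f} -> final {final:.3f}")
# start 0.1 -> final 0.667
# start 0.9 -> final 0.667
```

Whether A starts rare or common, it settles at the same intermediate frequency, so neither allele is lost.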
directional NS
Leads to a change in a trait over time
Example: Darwin’s finches
Drought killed off plants with small seeds
Seeds that remained were bigger
Average beak size increased in the population