Lecture 22 (why sequence the human genome?) Flashcards
Why was the human genome sequenced?
The human genome project, begun in 1990, aimed to:
Identify all human genes, and their roles
Analyse genetic variation between humans
Sequence the genomes of many model organisms used in genetics
Develop new sequencing techniques and computational analyses
Share genome information with scientists and the general public as fast as possible
The human genome project
The Human Genome Project (HGP) was an international scientific research project with the goal of determining the base pairs that make up human DNA, and of identifying and mapping all of the genes of the human genome from both a physical and a functional standpoint.
Begun in 1990
Human reference genome
Built from the DNA of about 10-13 anonymous individuals
Genome
Complete set of DNA of an organism, including all its genes
(complete set of DNA that you inherit from your biological parents)
(the genome encompasses all DNA: mitochondrial and nuclear)
Genomics
The study of genomes
Nuclear DNA
22 autosomes, X and Y
6 billion base pairs (diploid; about 3 billion per haploid set)
Half from each parent
<21,000 genes
Mitochondrial DNA
Single, circular DNA
16,569 base pairs
All from mother
37 genes
Key findings of the human genome
There are fewer genes than expected
Less than 2% of our genome encodes proteins
The genome is dynamic
We still don’t know what many of our protein-coding genes do (or exactly how many of them there are)
Most human genes are related to those of other animals (no genes are particularly uniquely human)
All humans are 99.9% similar at sequence level (difference at about one in every 1000 base pairs)
Key findings of the human genome - identify all human protein coding genes, and their roles
Define a gene and search for things that look like genes in sequences
1.5% coding (exons)
20% introns
Humans have approximately 21,000 genes. Many of them (about 25%) still have unknown function
Regulatory sequences (5%) sit around genes and determine whether they are turned on or off
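As a rough worked example of what these percentages mean in base pairs, the sketch below applies them to an assumed haploid genome length of about 3 billion base pairs (half of the 6 billion figure given for nuclear DNA). The variable names are illustrative only.

```python
# Assumption: one haploid copy of the genome is ~3 billion base pairs.
HAPLOID_GENOME_BP = 3_000_000_000

# Fractions quoted on the card above.
fractions = {"coding (exons)": 0.015, "introns": 0.20, "regulatory": 0.05}

# Convert each fraction into an approximate number of base pairs.
base_pairs = {
    name: round(fraction * HAPLOID_GENOME_BP)
    for name, fraction in fractions.items()
}

for name, bp in base_pairs.items():
    print(f"{name}: ~{bp:,} bp")
```

Under that assumption, the coding fraction comes to roughly 45 million base pairs, against about 600 million for introns.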
Variation in the human genome
99.9% similarity between genomes, regardless of race or ethnicity
Which genomes vary the most?
African genomes vary most
Why are our genomes 0.1% different?
Changes range from a single base to whole chromosome rearrangements. On average, about one letter in every 1000 differs between two people. Most differences come from inheriting different variants from each parent
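To put the "one in every 1000" figure into numbers, here is a minimal arithmetic sketch. It assumes a haploid genome of about 3 billion base pairs; the function name is hypothetical.

```python
def expected_differences(genome_length_bp: int, one_in: int = 1000) -> int:
    """Approximate number of positions that differ between two genomes."""
    return genome_length_bp // one_in


# Assumption: ~3 billion base pairs in one haploid copy of the genome.
print(expected_differences(3_000_000_000))  # 3000000
```

So a 0.1% difference still corresponds to roughly 3 million differing positions between two people.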
SNPs
Single Nucleotide Polymorphisms (SNPs) are sites in the DNA that commonly vary within populations
SNPs are locations within the human genome where the type of nucleotide present (A,T,G, or C) can differ between individuals. SNPs are the most common type of genetic variation found among people.
While some of these variants affect protein function, most do not
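A minimal sketch of the idea: comparing two already-aligned sequences of equal length and listing the positions where a single base differs. The sequences and the function name are made up for illustration. Strictly, a site only counts as a SNP if it varies commonly within a population, which a two-sequence comparison cannot show.

```python
def find_single_base_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (index + 1, base_a, base_b)
        for index, (base_a, base_b) in enumerate(zip(seq_a, seq_b))
        if base_a != base_b
    ]


# Hypothetical short stretches of DNA from two individuals.
person_1 = "ATGCCGTAAG"
person_2 = "ATGCTGTAAG"
print(find_single_base_differences(person_1, person_2))  # [(5, 'C', 'T')]
```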
SNPs stands for …
Single nucleotide polymorphisms (SNPs)
SNPs and how they relate to variation in the human genome
SNPs are common single base pair changes or variants
SNPs are common, around 1 in every 300 nucleotides. Most of your SNPs are inherited from your parents, although you also have a few that are unique to you
Each genome sequenced adds to the variation on record, and sequencing more diverse genomes adds to our knowledge of variation. Many SNPs don’t “do” anything; they are just inherited variations, but that does not mean they are not useful
Variation in the human genome - each new genome adds to the variation data
Analysing common variants (genotyping) can tell you…
Who you are related to
Where (some of) your ancestors came from
Disease risk/association (masked data outside USA)
If you will lose your hair
Your muscle type
How you might respond to drugs
This data can also be used in crime solving
Some of these SNPs reveal our species’ ancestral interbreeding with other hominins
Linked SNPs
Located outside of genes, so they have no effect on protein production or function.
Nevertheless, they are associated with a particular drug response or with the risk of getting a certain disease
Causative SNPs
These SNPs are located in the gene and include non-coding SNPs and coding SNPs
Non-coding SNP
A SNP located in the gene that changes the amount of protein produced
Coding SNP
Located within the coding region of a gene, changes the amino acid sequence of the gene’s protein product
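As a hedged illustration of how one base change in a codon can alter the amino acid, the sketch below builds the standard genetic code table and compares a codon before and after a substitution. The function name and example codons are chosen for illustration. Note that some single-base changes within a codon leave the amino acid unchanged.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon: str, position: int, new_base: str) -> tuple[str, str, str, str]:
    """Substitute one base (0-based position) and report the amino acid effect."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    effect = "no amino acid change" if before == after else "amino acid change"
    return mutated, before, after, effect


print(classify_coding_snp("GAG", 1, "T"))  # ('GTG', 'E', 'V', 'amino acid change')
print(classify_coding_snp("GAA", 2, "G"))  # ('GAG', 'E', 'E', 'no amino acid change')
```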
Variation in the human genome - STRs
Short Tandem Repeats (STRs) and DNA profiling
STRs are repeats of 2-5 nucleotides, found in specific regions of the genome
Each person inherits 2 alleles, one from each biological parent, which can be different lengths. They can be used to create genetic profiles or “DNA fingerprints”
STR example
For example, at one STR site a person could inherit 8 repeats of ‘CAG’ from the biological mother and 3 repeats of ‘CAG’ from the biological father. This person is therefore 3,8 at STR1 (STR1 being the name of this one locus)
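The 3,8 genotype in this example can be reproduced with a small sketch that counts the longest run of a repeated motif in each allele. The flanking bases, allele sequences, and function name are invented for illustration and do not represent a real locus.

```python
import re


def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Length, in repeat units, of the longest uninterrupted run of the motif."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical alleles at one locus: same flanking sequence, different repeat counts.
maternal_allele = "TTA" + "CAG" * 8 + "GGT"
paternal_allele = "TTA" + "CAG" * 3 + "GGT"

genotype = tuple(sorted(
    count_tandem_repeats(allele, "CAG")
    for allele in (maternal_allele, paternal_allele)
))
print(genotype)  # (3, 8)
```

Repeating this across several loci would give the kind of multi-locus profile used as a "DNA fingerprint".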
Variation in the human genome - InDels
An insertion/deletion polymorphism, commonly abbreviated “indel,” is a type of genetic variation in which a specific nucleotide sequence is present (insertion) or absent (deletion). While not as common as SNPs, indels are widely spread across the genome.
InDels = small insertions or deletions
Second most common variant type in the human genome
One of the most common human genetic diseases, cystic fibrosis, is most often caused by CFTR deltaF508, which is a 3 nucleotide deletion (it doesn’t change the reading frame, it just removes one amino acid from the protein)
In protein-coding regions, an indel whose length is not a multiple of three causes a frameshift, which changes the way that the DNA is read (if you lose the reading frame, the protein becomes garbage)
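The difference between an in-frame deletion and a frameshift can be seen by splitting a sequence into codons before and after a deletion. This is a minimal sketch using a made-up sequence, not the real CFTR gene.

```python
def codons(sequence: str) -> list[str]:
    """Split a sequence into complete three-base codons, dropping any remainder."""
    usable_length = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable_length, 3)]


# Hypothetical coding sequence, invented for illustration.
reference = "ATGGCCTTTGGAAAGTGA"

in_frame = reference[:6] + reference[9:]    # delete 3 bases (one whole codon)
frameshift = reference[:6] + reference[7:]  # delete 1 base

print(codons(reference))   # ['ATG', 'GCC', 'TTT', 'GGA', 'AAG', 'TGA']
print(codons(in_frame))    # ['ATG', 'GCC', 'GGA', 'AAG', 'TGA']
print(codons(frameshift))  # ['ATG', 'GCC', 'TTG', 'GAA', 'AGT']
```

After the 3-base deletion, only one codon is missing and the rest are unchanged. After the 1-base deletion, every codon downstream of the deletion is different.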
Variation in the human genome - structural variants
CNVs - copy number variations: segments of DNA (>500 bp) that are present at different amounts or “copy numbers” relative to a reference genome (chunks that are accidentally duplicated or deleted)
Can be deleted or duplicated
Can span multiple genes
Humans have 10,000 CNVs, found within and between genes
Many genes found in CNVs are associated with sensory perception (e.g. smell) and immunity
The human genome - where to from here?
Where did we come from? Variation is a key driver of evolution and a signature of descent (otherwise there would be nothing for natural selection to act on)
What about the genes we have still not analysed?
We still need to understand complex (polygenic) and rare diseases
Which drugs will work best for us, which ones should we avoid? (personalised medicine, or “pharmacogenomics”) (this could happen in the future)
Who does “our” data belong to, and who can access it? (these are the biggest questions that are yet to be answered)
Variation is the key driver of evolution …
Most variation is inherited, but each human also has their own small number of unique variants