Human Genome Flashcards
How much of the genome codes for proteins ?
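About 1.5% (the often-quoted 5% is closer to the share of the genome that is evolutionarily conserved, which includes non-coding sequence)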
What are Isochores ?
Large DNA segments (>300 kb) which are characterised by low internal variation in GC content, i.e. a relatively uniform GC content along the segment
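A rough way to see what "variation in GC content" means is to measure the GC fraction in consecutive windows along a sequence. The sketch below is only an illustration: the function names, the toy sequence and the tiny window size are assumptions (real isochore analyses use windows of many kilobases).

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int) -> list:
    """GC fraction of consecutive, non-overlapping windows of a fixed size."""
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


# Toy sequence: a GC-rich block followed by an AT-rich block.
toy = "GCGCGCGCGC" + "ATATATATAT"
print(windowed_gc(toy, 10))  # [1.0, 0.0]
```

Within one isochore the windowed values would stay close to each other; a sharp jump between windows suggests a boundary between two regions.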
Describe CpG Islands
- A cytosine base followed by a guanine base (CpG) is rare in vertebrate DNA
- Cytosines followed by guanines tend to be methylated - the methylation state of CpG islands can regulate the expression of nearby genes
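How rare CpG is can be expressed as an observed/expected ratio: the number of CG dinucleotides seen, divided by the number expected from the C and G counts alone. This is a minimal sketch of that ratio; the function name is an assumption, and it does not apply any length or GC thresholds used by real CpG-island finders.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c_count == 0 or g_count == 0:
        return 0.0
    return cpg_count * len(seq) / (c_count * g_count)


print(cpg_obs_exp("CCGG"))  # 1.0 -> CpG as frequent as expected
print(cpg_obs_exp("CATG"))  # 0.0 -> no CpG at all
```

A ratio well below 1 reflects the general CpG depletion of vertebrate DNA; CpG islands stand out because their ratio is much closer to 1.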
Describe a LINE retrotransposon
- repetitive element in genome
- retro = going through an RNA intermediate
- LINE = Long Interspersed Nuclear Element
- complete sequence is 6000-8000 bp long
Describe a SINE retrotransposon
- SINE = Short Interspersed Nuclear Element
- it's a parasite's parasite -> depends on LINE machinery for its propagation
- Alu elements are the most abundant SINEs in humans - about 300 bp long
How much of the genome is made up of interspersed repeats ?
46%
What are the 4 classes of repetitive element found in the human genome ?
- LINE retrotransposons
- SINE retrotransposons
- Retrovirus-like elements (LTR retrotransposons)
- DNA transposons
What are Alu Elements ?
- only found in primates
- can be sorted into distinct families according to shared patterns of variation
- only one or a few Alu “master copies” are capable of transposing
What are the flanking regions of a microsatellite ?
- consist of ‘unordered’ DNA
- occur on each side of the repeat unit
- critical because they allow for the development of locus-specific primers to amplify the microsatellites with PCR
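The idea of a repeat unit with flanking DNA on each side can be sketched with a regular expression. This is an illustration only: the function name, the toy sequence, the repeat unit "CA" and the 5-base flank length are all assumptions, and real primer design also considers melting temperature and uniqueness in the genome.

```python
import re


def find_microsatellite(seq, unit, min_repeats=3, flank=5):
    """Return (left flank, repeat tract, right flank) for the first tandem
    run of `unit` repeated at least `min_repeats` times, or None."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    match = pattern.search(seq)
    if match is None:
        return None
    left = seq[max(0, match.start() - flank):match.start()]
    right = seq[match.end():match.end() + flank]
    return left, match.group(), right


print(find_microsatellite("TTGACCACACACAGGTCA", "CA"))
# ('TTGAC', 'CACACACA', 'GGTCA')
```

The two flanks are where locus-specific primers would bind; the repeat tract between them is what varies in length between individuals.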
Why are repetitive elements bad?
- replicating repetitive elements wastes energy
- insertion of these repeats can be harmful
Why do repetitive elements still exist ?
- generation & deletion of repeats have reached an equilibrium
- mammalian genomes can tolerate them as they’ve developed mechanisms to control them - e.g. histone modification
Describe a duplicated pseudogene
- created by tandem duplication or unequal crossing-over
- segmental duplication is prevalent
What is a multigene family ?
groups of genes from the same organism that encode proteins with a similar sequence either over their full length or limited to a specific domain
What results in a multigene family ?
- DNA duplications that involve 1 or more genes generate gene pairs
- if both copies are maintained in subsequent generations then a multigene family will exist in the genome
What are histone proteins?
- globular in shape
- 5 members in the family = H1, H2A, H2B, H3 and H4
- the family members are closely related but not identical in amino acid sequences
Describe the structure of histone proteins
- octamer = 2 copies each of H2A, H2B, H3 & H4
- H1 holds the structure together
- the globular proteins have tails which are rich in amino acids with positively charged R-groups
Describe epigenetics
- phenotype changes in a cell/organism
- not the result of nucleotide changes
- brought about by chemical changes to DNA (CpG methylation) and to histones
What is the first distinct mechanism caused by histone modifications and how does it affect chromosome function ?
modifications may alter the electrostatic charge of the histone resulting in a structural change in histones or their binding to DNA
What is the second distinct mechanism caused by histone modifications and how does it affect chromosome function ?
modifications act as binding sites for protein recognition modules, such as bromodomains, which recognise acetylated lysines, and chromodomains, which recognise methylated lysines
What are 2 problems in genome sequencing ?
- genomes are incredibly large
- you need a lot of DNA to get a sequence
What are the solutions to the problems of genome sequencing ?
- cut the DNA into smaller pieces then put it back together
- make lots of copies of each bit
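"Putting it back together" relies on overlaps between neighbouring pieces. The sketch below merges two fragments when the end of one matches the start of the other. The function names and the minimum overlap of 3 bases are assumptions; real assemblers handle millions of reads, sequencing errors and repeats.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`
    (0 if it is shorter than `min_overlap`)."""
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge(left, right, min_overlap=3):
    """Join two fragments through their overlap, or return None if none is found."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


print(merge("ATGCGTAC", "GTACCTTA"))  # ATGCGTACCTTA
```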
Describe PCR
- Polymerase Chain Reaction
- enables large amounts of DNA to be produced from very small/very complex samples
- uses 2 primers & DNA polymerase
What does PCR allow?
- selecting the region of study
- producing millions of copies of it
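Both points can be sketched in a few lines: the two primers define the region that is copied, and each cycle at best doubles the number of copies. This is a simplified model under stated assumptions: primers must match exactly, amplification is perfectly efficient, and the function names and toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon(template, forward, reverse):
    """Region of `template` bounded by the forward primer and the binding
    site of the reverse primer (which anneals to the opposite strand)."""
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


def ideal_copies(start_copies, cycles):
    """Copies after `cycles` rounds, assuming perfect doubling every cycle."""
    return start_copies * 2 ** cycles


print(amplicon("TTACGGACCCCCTTGACAGG", "ACGGA", "TGTCA"))  # ACGGACCCCCTTGACA
print(ideal_copies(1, 30))  # 1073741824
```

Real reactions fall short of perfect doubling, but the exponential growth explains how a single template molecule can yield millions of copies.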
Describe Sanger sequencing
- similar to PCR
- uses only a single primer and a polymerase to make new single-stranded DNA pieces
What is used to reduce the size of DNA in order to sequence it ?
- restriction enzymes
- they recognise and cut specific sequences
- hundreds of restriction enzymes are available
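Cutting at a specific sequence can be sketched as a simple string search. The example uses EcoRI, which recognises GAATTC and cuts after the first G; the function name and toy sequence are assumptions, and only one strand is modelled.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut `seq` at every occurrence of `site`, `cut_offset` bases into the site.
    Defaults model EcoRI (G^AATTC) on a single strand."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # whatever remains after the last cut
    return fragments


print(digest("AAGAATTCCCGAATTCTT"))  # ['AAG', 'AATTCCCG', 'AATTCTT']
```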
What is a BAC ?
- Bacterial Artificial Chromosome
- after the use of restriction enzymes there is a soup of DNA fragments
- we use BACs to amplify and identify each fragment
What are the 2 purposes of BAC libraries ?
- they separate fragments so they can be sequenced individually
- they allow the production of lots of DNA by growing up lots of bacteria containing identical BACs or plasmids
Why is a BAC library not suitable for genome sequencing on a large scale ?
- 20,000 different BAC clones are needed to contain the 3 billion base pairs in the genome
- each insert is 150,000-200,000 base pairs
- minimum of 1.5 million sub-clones needed
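The numbers on this card follow from simple division. In the sketch below the 2,000 bp sub-clone insert size is an assumption, chosen because it reproduces the 1.5 million figure; it is not stated on the card. Both counts are minimums with no overlap between clones.

```python
GENOME_BP = 3_000_000_000      # genome size from the card
BAC_INSERT_BP = 150_000        # lower end of the BAC insert range from the card
SUBCLONE_INSERT_BP = 2_000     # assumed sub-clone insert size (not from the card)


def clones_needed(genome_bp, insert_bp):
    """Minimum number of clones to cover the genome once (rounded up)."""
    return -(-genome_bp // insert_bp)


print(clones_needed(GENOME_BP, BAC_INSERT_BP))       # 20000
print(clones_needed(GENOME_BP, SUBCLONE_INSERT_BP))  # 1500000
```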
What are the simplified steps of DNA sample preparation ?
- Extract DNA
- Randomly Shatter (sonication)
- Attach adapter sequence
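The last two steps can be mimicked on a string. This is a toy sketch: the function names, fragment sizes and the adapter sequences "GGG" and "CCC" are placeholders, not real adapter sequences.

```python
import random


def shatter(seq, min_len, max_len, rng):
    """Break `seq` into consecutive pieces of random length.
    The final piece may be shorter than `min_len`."""
    fragments = []
    pos = 0
    while pos < len(seq):
        size = rng.randint(min_len, max_len)
        fragments.append(seq[pos:pos + size])
        pos += size
    return fragments


def attach_adapters(fragments, left_adapter, right_adapter):
    """Add a known adapter sequence to both ends of every fragment."""
    return [left_adapter + frag + right_adapter for frag in fragments]


rng = random.Random(0)  # fixed seed so the run is repeatable
pieces = shatter("ACGT" * 25, 5, 15, rng)
library = attach_adapters(pieces, "GGG", "CCC")  # placeholder adapters
print(len(pieces), library[0])
```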
Describe Illumina Next-Generation Sequencing
- most popular method
- produces millions of reads
- reads can be paired-end
- very good for alignment
Describe DNA Barcoding
- barcoded = tagged with a short known sequence
- allows multiple samples to be put on the same run
- samples can be computationally isolated
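Computational isolation of samples (demultiplexing) can be sketched as sorting reads by the tag at their start. The function name, barcode sequences and sample names below are hypothetical, and the sketch requires an exact match, whereas real tools usually tolerate a mismatch or two.

```python
def demultiplex(reads, barcodes):
    """Group reads by the barcode they start with.
    `barcodes` maps barcode sequence -> sample name; the barcode is trimmed off."""
    samples = {name: [] for name in barcodes.values()}
    samples["unassigned"] = []
    for read in reads:
        for barcode, name in barcodes.items():
            if read.startswith(barcode):
                samples[name].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            samples["unassigned"].append(read)
    return samples


barcodes = {"ACGT": "sample_A", "TTAG": "sample_B"}  # hypothetical tags
reads = ["ACGTGGGG", "TTAGCCCC", "ACGTAAAA", "GGGGGGGG"]
print(demultiplex(reads, barcodes))
# {'sample_A': ['GGGG', 'AAAA'], 'sample_B': ['CCCC'], 'unassigned': ['GGGGGGGG']}
```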