human genome project Flashcards
requirements for DNA
must carry information
must replicate
must allow for info to change
must govern the expression of the phenotype
extreme accuracy of DNA replication is necessary to preserve the genome over generations
complementarity allows hereditary info to be copied digitally, accurately, quickly and efficiently + ensures mutation rates are low
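A minimal sketch of why complementarity makes copying "digital": each base on the template fixes exactly one base on the new strand. The function name complement_strand is made up for illustration, and the sketch ignores polymerases, proofreading and RNA bases.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(template: str) -> str:
    """Return the strand templated by `template`, written 5'->3'.

    The new strand is antiparallel to the template, so the template is
    read in reverse while each base is swapped for its partner.
    """
    return "".join(PAIRS[base] for base in reversed(template.upper()))


if __name__ == "__main__":
    template = "ATGCCGTA"
    copy = complement_strand(template)
    print(template, "->", copy)
    # Copying the copy regenerates the original sequence exactly.
    print(complement_strand(copy) == template)
```

Because the pairing rule is one-to-one, copying a strand twice gives back the starting sequence, which is the sense in which the info is preserved.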
HGP
goal was to identify the order of all the bases on every chromosome
develop new technology + resources, understand ethical, legal and social issues
in 2003, HGP produced a genome sequence that accounted for over 90% of the human genome
over 99.6% is done now
HGP process
DNA from 5 human subjects used
genomic fragmentation to DNA fragments using restriction enzymes
genome fragments placed in vectors (BAC or plasmids)
first generation sequencing
supercomputer put the pieces together, assembling each BAC and then the entire sequence
contigs assembled into scaffolds + scaffolds placed on BACs, then on chromosomes, to generate the whole genome assembly
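The fragmentation step can be sketched as cutting a sequence at every occurrence of a restriction site. This is a simplified single-strand model; the function name digest and its parameters are illustrative. The default site GAATTC, cut after the first G, corresponds to EcoRI; the notes above do not say which enzymes the HGP used.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut `sequence` at each occurrence of `site`.

    `cut_offset` is how many bases into the site the cut falls
    (1 means between the first and second base of the site).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments


if __name__ == "__main__":
    genome = "AAGAATTCTTGAATTCGG"
    pieces = digest(genome)
    print(pieces)
    # No bases are lost: joining the fragments restores the input.
    print("".join(pieces) == genome)
```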
types of sequencing
hierarchical shotgun
whole genome shotgun
hierarchical shotgun
generates lots of large-insert clones and keeps them separate
genome is broken down into larger segments
whole genome shotgun
whole genome is sequenced
sequences many overlapping DNA fragments
computer is used to assemble the DNA fragments
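How a computer can assemble overlapping fragments, as a toy sketch: repeatedly merge the two reads with the longest suffix-prefix overlap. This greedy approach is a teaching simplification, not the assembler used in any real project; it assumes error-free reads from one strand, and the names overlap, greedy_assemble and min_len are illustrative.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads pairwise, best overlap first, until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
    print(greedy_assemble(reads))
```

Reads that share no overlap of at least min_len bases stay as separate contigs, which is why repetitive regions are hard to assemble this way.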
genome assemblies hierarchical
genome assembly- the process of putting nucleotide sequences into the correct order.
shortest assembly components are contigs, which are contiguous sequences built from overlapping reads
contigs are assembled into longer scaffolds + scaffolds are assembled into chromosomes if there is sufficient mapping info
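A rough sketch of the contig-to-scaffold step: mapping info supplies the order, orientation and approximate gap size for each contig, and the gaps are written as runs of N. The layout format and the names build_scaffold and reverse_complement are assumptions made for this example, not a standard file format.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, written 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(seq))


def build_scaffold(contigs: dict[str, str],
                   layout: list[tuple[str, str, int]]) -> str:
    """Join contigs into one scaffold.

    `layout` lists (contig name, strand, gap after) in scaffold order;
    strand is "+" or "-", and the gap is an estimated number of unknown
    bases, written as N.
    """
    parts = []
    for name, strand, gap_after in layout:
        seq = contigs[name]
        if strand == "-":
            seq = reverse_complement(seq)
        parts.append(seq)
        parts.append("N" * gap_after)
    return "".join(parts)


if __name__ == "__main__":
    contigs = {"c1": "ATGC", "c2": "GGA"}
    layout = [("c1", "+", 3), ("c2", "-", 0)]
    print(build_scaffold(contigs, layout))
```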
types of scaffold
placed scaffold- position within a chromosome is known
unplaced scaffold- not known which chromosome the scaffold belongs to
unlocalised scaffold- chromosome is known but the scaffold's exact position or orientation is not
gene size
genes vary in size and exon content
inverse correlation between gene size and fraction of coding DNA
average of 8.8 exons + 7.8 introns per gene
natural selection favours short introns in highly expressed genes as transcription is costly in time and energy
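The inverse correlation can be checked with simple arithmetic: long genes are long mostly because of introns, so exons make up a smaller share of them. The gene lengths below are invented for illustration, and the exonic fraction is used as a stand-in for the coding fraction (it ignores untranslated regions).

```python
def exonic_fraction(exon_lengths: list[int], intron_lengths: list[int]) -> float:
    """Fraction of a gene's length that is exon sequence."""
    exonic = sum(exon_lengths)
    return exonic / (exonic + sum(intron_lengths))


if __name__ == "__main__":
    # Hypothetical genes: same exon size, very different intron sizes.
    short_gene = exonic_fraction([300, 300], [400])
    long_gene = exonic_fraction([300] * 9, [10_000] * 8)
    print(f"short gene: {short_gene:.2f}")
    print(f"long gene:  {long_gene:.3f}")
```

With these made-up numbers the long gene is far larger overall, yet a much smaller fraction of it is exon.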
reference genome
digital nucleic acid sequence database, representative example of the set of genes in one idealised individual organism
assembled from sequencing of DNA from a number of individual donors
improvements driven by technological advances
human pangenome
contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals (trying to sequence regions with lots of variation)
added 119 million base pairs to the existing reference GRCh38
set of reference human genome sequences
Y chromosome
complete sequencing of Y chromosome
uncovered important genetic features including factors in sperm production
loss of Y chromosome is observed in multiple cancer types, so the complete sequence can help explain why
Y chromosome is highly repetitive
acrocentric chromosomes
recombination
short arms of human acrocentric chromosomes (13, 14, 15, 21 + 22) share large homologous regions including ribosomal DNA repeats and extended segmental duplications
short arms become more like each other as they swap DNA
mediated via the pseudo-homologous regions (PHRs) because they contain genes for making ribosomal RNAs
robertsonian translocations
impact of lost segments of p arms
often seen in cancers
potential intranuclear mislocalisation of the q arms of the chromosomes
genome engineering- pluripotent stem cells
- isolate and culture donor cells
- transduce (introduce) stem cell-associated genes into the cells by viral vectors
- harvest and culture the cells under ES cell culture conditions
- a small subset of the transduced cells become iPS cells and generate ES-like colonies