Organisation of the human genome Flashcards
Outline the human genome.
The human genome contains all of the DNA content in human cells.
Nuclear genome:
- ~23,000 genes
- 19,599 protein coding genes
- Contains >99% of cellular DNA
- Arranged in 24 linear double-stranded DNA molecules (22 autosomes, 2 sex chromosomes)
- 3 billion bases
- Average gene is 3,000 bp long
Mitochondrial genome:
- 37 genes
Outline ENCODE
ENCODE: ENCyclopedia Of DNA Elements.
ENCODE is a public research consortium launched by the NHGRI in September 2003.
The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.
The first pilot phase, started in 2003, accrued such information on just 1% of the genome and determined which experimental techniques were likely to work best on the whole genome.
After the initial pilot phase, a second phase began in 2007 to develop the technology further and apply the methods to the entire genome. It closed successfully in September 2012, having promoted several new technologies for generating high-throughput data on functional elements.
The production of 1,640 datasets, covering 24 standard types of experiment in 147 different cell types, revealed that 80.4% of the human genome displays some functionality in at least one cell type.
What were the key findings of ENCODE?
- 80.4% of the human genome displays some functionality in at least one cell type.
This is more than was previously thought.
Give an overview of how the genome is organised in the context of the cell.
DNA is organised into different chromosomes within the nucleus, and during interphase these chromosomes are organised into different territories. The chromatin exists in two forms based on how condensed it is: euchromatin and heterochromatin.
Compare heterochromatin against euchromatin.
- Heterochromatin: darkly stained aggregates at the periphery of the nucleus. Constitutive heterochromatin is stable, conserved during development, makes up centromeres and telomeres, is gene poor, replicates late, and contains repetitive sequences. Facultative heterochromatin is reversible, and depends upon the stage of development or the cell type.
- Euchromatin: lightly stained material in interphase nuclei, gene rich, early replicating.
What is the composition of heterochromatin and euchromatin on average in a single chromosome?
- 3000 Mb of the genome is euchromatin and 200 Mb is heterochromatin.
- The average chromosome is 140 Mb.
- There is around 3 Mb of heterochromatin at each centromere.
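The proportions implied by these figures can be worked out with a few lines of arithmetic. This is only a sketch: it assumes the whole genome is the sum of the euchromatin and heterochromatin figures above (3,200 Mb), and the variable names are illustrative.

```python
# Figures taken from the card above, all in megabases (Mb).
EUCHROMATIN_MB = 3000
HETEROCHROMATIN_MB = 200
AVERAGE_CHROMOSOME_MB = 140
CENTROMERIC_HETEROCHROMATIN_MB = 3

# Assumption: total genome size is euchromatin plus heterochromatin.
genome_mb = EUCHROMATIN_MB + HETEROCHROMATIN_MB

# Share of the whole genome that is heterochromatin.
heterochromatin_share = HETEROCHROMATIN_MB / genome_mb

# Share of an average chromosome taken up by centromeric heterochromatin.
centromere_share = CENTROMERIC_HETEROCHROMATIN_MB / AVERAGE_CHROMOSOME_MB

print(f"Heterochromatin share of genome: {heterochromatin_share:.2%}")
print(f"Centromeric heterochromatin per average chromosome: {centromere_share:.2%}")
```

On these assumptions, heterochromatin is a small minority of the genome, and the centromeric block is a small part of an average chromosome.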
How are genes distributed throughout the genome?
From karyotyping and G-banding, it can be seen that genes are not evenly distributed: there are regions of clustered genes. Chromatin domains can be categorised based on their gene expression status. The genome contains gene-dense 'urban' areas that are actively transcribed, which are rich in G/C base pairs (including CpG islands) and produce chromosome bands. There are also transcriptionally inactive A/T rich regions which are gene-poor 'deserts'.
Overall, ~20% of the human genome consists of deserts that have no protein-coding genes.
What are CpG islands?
CpG islands are short DNA stretches (~200 bp - 1 kb) with a G+C content >50%, which is higher compared to the rest of the genome. There are about 29,000 islands in the human genome, and ~56% of human genes are estimated to be associated with such sequences. CpG islands tend to extend over promoters of expressed genes.
Cytosine methylation most commonly occurs at CpG to produce 5-methylcytosine. CpG is rare across the genome as a whole because of deamination. Deamination of unmethylated cytosine produces uracil, which is recognised as foreign to DNA, removed by a glycosylase and replaced with C. Deamination of 5-methylcytosine produces thymine, which leaves a T:G mismatch that must be corrected by mismatch repair. As this repair process is not perfect, over time there is a depletion of CpG.
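The two criteria on this card (length and G+C content) can be checked on a sequence with a short script. This is a minimal sketch, not a real CpG island caller: the function names, the 200 bp minimum and the toy sequences are assumptions for illustration, and published definitions typically add further criteria that are omitted here.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_count(seq: str) -> int:
    """Number of CpG dinucleotides (a C immediately followed by a G)."""
    return seq.upper().count("CG")


def looks_like_cpg_island(seq: str, min_len: int = 200, min_gc: float = 0.5) -> bool:
    """Apply only the two criteria from the card: length and G+C content > 50%."""
    return len(seq) >= min_len and gc_content(seq) > min_gc


# Toy sequences, invented for illustration only.
gc_rich = "CGGCGCCGTA" * 25   # 250 bp, G+C rich
at_rich = "ATATTA" * 50       # 300 bp, no G or C

print(gc_content(gc_rich), cpg_count(gc_rich), looks_like_cpg_island(gc_rich))
print(gc_content(at_rich), cpg_count(at_rich), looks_like_cpg_island(at_rich))
```

The G+C rich toy sequence passes both checks, while the A/T rich one fails on G+C content despite being long enough.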
What are the most gene-rich and -poor chromosomes?
- Gene poor chromosomes: 4, 18, X & Y
- Gene rich chromosomes: 19 & 22
- Gene rich regions of all chromosomes: subtelomeric regions
How do histones modify access to DNA?
Histones make up the largest protein component of the nucleus. Areas of the genome are organised differently in terms of transcriptional activity, with the denser regions being harder to read. The accessibility of DNA depends upon histone marks and is regulated by the histone code, which is related to epigenetics.
What is the histone code?
The histone code is a hypothesis that the transcription of genetic information encoded in DNA is in part regulated by chemical modifications to histone proteins, primarily on their unstructured ends. Together with similar modifications such as DNA methylation it is part of the epigenetic code.
The histone code hypothesis is based on the fact that distinct modifications manifested at specific histone tail residues serve as domains for interaction with specific proteins and such interactions compartmentalize chromatin into heterochromatin and euchromatin.
The genome contains at least two types of information: genetic information, that is, the nucleotide sequence in the DNA, and __________ information, which is encoded in a complex set of chemical modifications of histones and DNA. __________ gene control systems decide about the genetic repertoire of the different differentiation states of cells in an organism and are stably transmitted during mitosis.
The functional state of chromatin domains, such as the histone-modification state (i.e., histone p_______________, a___________, m___________, or u______________) are thought to alter the interaction of histones with DNA (for instance by changing the charge) and to change interactions of chromatin with chromatin-associated proteins. The combination of histone modifications and the degree of a histone modification (i.e., whether the modification occurs in a mono, di, or tri form or at a particular lysine) are related to _____ activity and __________ folding.
The genome contains at least two types of information: genetic information, that is, the nucleotide sequence in the DNA, and epigenetic information, which is encoded in a complex set of chemical modifications of histones and DNA. Epigenetic gene control systems decide about the genetic repertoire of the different differentiation states of cells in an organism and are stably transmitted during mitosis.
The functional state of chromatin domains, such as the histone-modification state (i.e., histone phosphorylation, acetylation, methylation, or ubiquitination) are thought to alter the interaction of histones with DNA (for instance by changing the charge) and to change interactions of chromatin with chromatin-associated proteins. The combination of histone modifications and the degree of a histone modification (i.e., whether the modification occurs in a mono, di, or tri form or at a particular lysine) are related to gene activity and chromatin folding.
What different modifications can be made to histones? What do they do? Give examples.
Histone modifications are critical for the higher order organization of the DNA.
- Histone phosphorylation has been associated with transcriptional regulation, DNA repair, and chromatin condensation. The best characterized histone phosphorylation event is the phosphorylation of H2AX (γ-H2AX), which is linked to the ability of cells to sense, respond to, and repair lesions in DNA. In mammals, the enzyme responsible for removing the phosphate group from H2AX is PP2A.
- The major purpose of acetylation is to induce decondensation of chromatin. Acetylation neutralizes the positive charge of lysine residues decreasing their interaction with negatively charged DNA. In humans, five different families of histone acetyltransferases (HATs) have been identified as being responsible for acetylation of lysines on histone H3 and H4.
- Histone methylation of lysine residues has been linked to the detection and repair of DNA double-strand breaks (DSBs). In mammalian cells, methylation of lysine 79 on histone H3 is involved in DSB processing.
- Ubiquitylation of histones plays a key role in transcriptional activation and repression, heterochromatic silencing, and DNA repair.
Outline the Human Epigenome Project (HEP).
The Human Epigenome Project (HEP) aims to identify, catalogue and interpret genome-wide DNA methylation patterns of all human genes in all major tissues. Methylation is the only flexible genomic parameter that can change genome function under exogenous influence. Hence it constitutes the main and so far missing link between genetics, disease and the environment that is widely thought to play a decisive role in the aetiology of virtually all human pathologies. Methylation occurs naturally on cytosine bases at CpG sequences and is involved in controlling the correct expression of genes. Differentially methylated cytosines give rise to distinct patterns specific for tissue type and disease state. Such methylation variable positions (MVPs) are common epigenetic markers. Like single nucleotide polymorphisms (SNPs), they promise to significantly advance our ability to understand and diagnose human disease.
What are the different classes of tandemly repeated DNA?
- Satellites: highly repetitive DNA consisting of short sequences repeated up to a million times in the genome. May have a different density to the rest of the genome due to the base pair composition of the repeated sequence. Most satellite DNA is found around the centromere, in subtelomeric regions, and in the heterochromatic short arms of acrocentric chromosomes and most of the Y chromosome.
- Minisatellites: a class of highly repetitive sequences that consist of sequences between 10 bp and 100 bp long repeated in tandem arrays that vary in size from 0.5 to 40 kb. They tend to occur near telomeres, although they have been found elsewhere. Also known as variable number tandem repeats (VNTRs).
- Microsatellites: tandem repeats of a short sequence, 2-4 nucleotides in length, found at many different locations in the genome. These are short, usually being <10 bp, making up ~60 Mb or 2% of the genome.
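To make the idea of a tandem repeat concrete, the sketch below scans a sequence for microsatellite-like runs: a unit of 2-4 nucleotides repeated back to back. The function name, the regular expression and the threshold of at least four copies are assumptions chosen for illustration, not a standard definition. A run of a single base would be reported with a two-base unit.

```python
import re

# A 2-4 nt unit (captured lazily, so the shortest unit is preferred)
# followed by at least three more copies of itself, i.e. four or more in total.
MICROSATELLITE = re.compile(r"([ACGT]{2,4}?)\1{3,}")


def find_microsatellites(seq: str) -> list[tuple[int, str, int]]:
    """Return (start position, repeat unit, copy number) for each tandem run."""
    hits = []
    for match in MICROSATELLITE.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Toy sequences, invented for illustration only.
print(find_microsatellites("TTCACACACACAGG"))    # a (CA)n dinucleotide repeat
print(find_microsatellites("GGCAGCAGCAGCAGTT"))  # a (GCA)n trinucleotide repeat
print(find_microsatellites("ACGTTGCA"))          # no tandem repeat
```

Minisatellites and satellites could be searched for in the same way with longer units, although real repeat arrays are often imperfect, so exact matching like this will miss many of them.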