Organisation of the human genome Flashcards

1
Q

Outline the human genome.

A

The human genome contains all of the DNA content in human cells.

Nuclear genome:

  • ~23,000 genes
  • 19,599 protein coding genes
  • Contains >99% of cellular DNA
  • Arranged in 24 linear double-stranded DNA molecules (22 autosomes, 2 sex chromosomes)
  • 3 billion bases
  • Average gene is 3,000 bp long

Mitochondrial genome:

  • 37 genes
2
Q

Outline ENCODE

A

ENCODE: ENCyclopedia Of DNA Elements.

ENCODE is a public research consortium launched by the NHGRI in September 2003.

The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.

The first pilot phase, started in 2003, accrued such information on just 1% of the genome and determined which experimental techniques were likely to work best on the whole genome.

After the initial pilot phase, a second phase of technology development began in 2007 to apply these methods to the entire genome. It closed successfully in September 2012, having promoted several new technologies for generating high-throughput data on functional elements.

The production of 1,640 datasets, covering 24 standard types of experiment in 147 different cell types, revealed that 80.4% of the human genome displays some functionality in at least one cell type.

3
Q

What were the key findings of ENCODE?

A
  • 80.4% of the human genome displays some functionality in at least one cell type.

This is more than was previously thought.

4
Q

Give an overview of how the genome is organised in the context of the cell.

A

DNA is organised into different chromosomes within the nucleus, and during interphase these chromosomes are organised into different territories. The chromatin exists in two forms based on how condensed it is: euchromatin and heterochromatin.

5
Q

Compare heterochromatin against euchromatin.

A
  • Heterochromatin: darkly stained aggregates at the periphery of the nucleus. Constitutive heterochromatin is stable, conserved during development, makes up centromeres and telomeres, is gene poor, replicates late, and contains repetitive sequences. Facultative heterochromatin is reversible, and depends upon the stage of development or the cell type.
  • Euchromatin: lightly stained material in interphase nuclei, gene rich, early replicating.
6
Q

What is the composition of heterochromatin and euchromatin on average in a single chromosome?

A
  • 3000 Mb of the genome is euchromatin and 200 Mb is heterochromatin.
  • The average chromosome is 140 Mb.
  • There is around 3 Mb of heterochromatin at each centromere.
7
Q

How are genes distributed throughout the genome?

A

From karyotyping and G-banding, it can be seen that genes are not evenly distributed: there are regions of clustered genes. Chromatin domains can be categorised based on their gene expression status. The genome contains gene-dense ‘urban’ areas that are actively transcribed, which are rich in G/C base pairs (CpG-rich islands) and produce chromosome bands. There are also transcriptionally inactive A/T-rich regions, which are gene-poor ‘deserts’.

Overall, ~20% of the human genome consists of deserts that have no protein-coding genes.

8
Q

What are CpG islands?

A

CpG islands are short DNA stretches (~200 bp - 1 kb) with a G+C content >50%, which is higher compared to the rest of the genome. There are about 29,000 islands in the human genome, and ~56% of human genes are estimated to be associated with such sequences. CpG islands tend to extend over promoters of expressed genes.

Cytosine methylation most commonly occurs at CpG, producing 5-methylcytosine. CpG is under-represented across the genome as a consequence of cytosine deamination followed by imperfect repair. Deamination of unmethylated cytosine produces uracil, which is removed by a glycosylase and replaced with a C. Deamination of 5-methylcytosine produces thymine, leaving a T:G mismatch that has to be corrected by mismatch repair. As this repair process is not perfect, CpG is depleted over time.
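
As a rough illustration of the G+C criterion above, the following Python sketch computes the G+C content of a sequence and its observed/expected CpG ratio. The function names and example sequences are hypothetical, and the observed/expected ratio is a commonly used additional measure of CpG depletion that this card does not itself mention.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq: str) -> float:
    """Return observed CpG count divided by the count expected from base composition."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    observed = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    expected = c_count * g_count / len(seq)
    return observed / expected


if __name__ == "__main__":
    candidate = "CGCGGCGCATCGCGTACGCG"  # made-up example sequence
    print("G+C content:", gc_content(candidate))
    print("CpG observed/expected:", cpg_observed_expected(candidate))
```

A ratio well below 1 indicates CpG depletion of the kind described above, whereas CpG islands keep a ratio closer to 1.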

9
Q

What are the most gene-rich and -poor chromosomes?

A
  • Gene poor chromosomes: 4, 18, X & Y
  • Gene rich chromosomes: 19 & 22
  • Gene rich regions of all chromosomes: subtelomeric regions
10
Q

How do histones modify access to DNA?

A

Histones make up the largest protein component of the nucleus. Areas of the genome are organised differently in terms of transcriptional activity, with the denser regions being harder to read. The accessibility of DNA depends upon histone marks and is regulated by the histone code, which is related to epigenetics.

11
Q

What is the histone code?

A

The histone code is a hypothesis that the transcription of genetic information encoded in DNA is in part regulated by chemical modifications to histone proteins, primarily on their unstructured ends. Together with similar modifications such as DNA methylation it is part of the epigenetic code.

The histone code hypothesis is based on the fact that distinct modifications manifested at specific histone tail residues serve as domains for interaction with specific proteins and such interactions compartmentalize chromatin into heterochromatin and euchromatin.

12
Q

The genome contains at least two types of information: genetic information, that is, the nucleotide sequence in the DNA, and __________ information, which is encoded in a complex set of chemical modifications of histones and DNA. __________ gene control systems decide about the genetic repertoire of the different differentiation states of cells in an organism and are stably transmitted during mitosis.

The functional state of chromatin domains, such as the histone-modification state (i.e., histone p_______________, a___________, m___________, or u______________) are thought to alter the interaction of histones with DNA (for instance by changing the charge) and to change interactions of chromatin with chromatin-associated proteins. The combination of histone modifications and the degree of a histone modification (i.e., whether the modification occurs in a mono, di, or tri form or at a particular lysine) are related to _____ activity and __________ folding.

A

The genome contains at least two types of information: genetic information, that is, the nucleotide sequence in the DNA, and epigenetic information, which is encoded in a complex set of chemical modifications of histones and DNA. Epigenetic gene control systems decide about the genetic repertoire of the different differentiation states of cells in an organism and are stably transmitted during mitosis.

The functional state of chromatin domains, such as the histone-modification state (i.e., histone phosphorylation, acetylation, methylation, or ubiquitination) are thought to alter the interaction of histones with DNA (for instance by changing the charge) and to change interactions of chromatin with chromatin-associated proteins. The combination of histone modifications and the degree of a histone modification (i.e., whether the modification occurs in a mono, di, or tri form or at a particular lysine) are related to gene activity and chromatin folding.

13
Q

What different modifications can be made to histones? What do they do? Give examples.

A

Histone modifications are critical for the higher order organization of the DNA.

  • Histone phosphorylation has been associated with transcriptional regulation, DNA repair, and chromatin condensation. The best characterized histone phosphorylation event is the phosphorylation of H2AX (γ-H2AX), which is linked to the ability of cells to sense, respond to, and repair lesions in DNA. In mammals, the enzyme responsible for removing the phosphate group from H2AX is PP2A.
  • The major purpose of acetylation is to induce decondensation of chromatin. Acetylation neutralizes the positive charge of lysine residues, decreasing their interaction with negatively charged DNA. In humans, five different families of histone acetyltransferases (HATs) have been identified as being responsible for acetylation of lysines on histones H3 and H4.
  • Histone methylation of lysine residues has been linked to the detection and repair of DNA double-strand breaks (DSBs). In mammalian cells, methylation of lysine 79 on histone H3 is involved in DSB processing.
  • Ubiquitylation of histones plays a key role in transcriptional activation and repression, heterochromatic silencing, and DNA repair.
14
Q

Outline the Human Epigenome Project (HEP).

A

The Human Epigenome Project (HEP) aims to identify, catalogue and interpret genome-wide DNA methylation patterns of all human genes in all major tissues. Methylation is the only flexible genomic parameter that can change genome function under exogenous influence. Hence it constitutes the main and so far missing link between genetics, disease and the environment that is widely thought to play a decisive role in the aetiology of virtually all human pathologies. Methylation occurs naturally on cytosine bases at CpG sequences and is involved in controlling the correct expression of genes. Differentially methylated cytosines give rise to distinct patterns specific for tissue type and disease state. Such methylation variable positions (MVPs) are common epigenetic markers. Like single nucleotide polymorphisms (SNPs), they promise to significantly advance our ability to understand and diagnose human disease.

15
Q

What are the different classes of tandemly repeated DNA?

A
  1. Satellites: highly repetitive DNA consisting of short sequences repeated up to a million times in the genome. May have a different density to the rest of the genome due to the base pair composition of the repeated sequence. Most satellite DNA is found around the centromere, in subtelomeric regions, and in the heterochromatic short arms of acrocentric chromosomes and most of the Y chromosome.
  2. Minisatellites: a class of highly repetitive sequences that consist of units between 10 bp and 100 bp long, repeated in tandem arrays that vary in size from 0.5 to 40 kb. They tend to occur near telomeres, although they have been found elsewhere. Also known as variable number tandem repeats (VNTRs).
  3. Microsatellites: tandem repeats of short sequences, 2-4 nucleotides in length, found at many different locations in the genome. These are short, usually being <10 bp, making up ~6 Mb or 2% of the genome.
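
To make the idea of a tandem repeat concrete, here is a minimal Python sketch that scans a sequence for microsatellite-like runs using a regular expression. The function name, the example sequence, and the threshold of four copies are assumptions chosen for illustration. This is not a standard repeat-finding tool.

```python
import re

# A repeat unit of 2-4 bases followed by at least 3 more copies of itself
# (4 or more copies in total). The copy threshold is an arbitrary choice.
MICROSATELLITE = re.compile(r"([ACGT]{2,4}?)\1{3,}")


def find_microsatellites(seq: str) -> list[tuple[int, str, int]]:
    """Return (start index, repeat unit, copy number) for each tandem run found."""
    hits = []
    for match in MICROSATELLITE.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip single-base runs such as AAAAAAAA
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    # A made-up sequence containing a (CA)5 run starting at index 3.
    print(find_microsatellites("GGTCACACACACATT"))  # [(3, 'CA', 5)]
```

The same approach extends to minisatellites by widening the unit length in the pattern, although real repeat annotation has to tolerate imperfect copies, which this sketch does not.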
16
Q

Define a transposable element.

A

Transposable elements (TEs), also known as “jumping genes” or transposons, are sequences of DNA that move (or jump) from one location in the genome to another.

17
Q

Outline transposons. What are the different types?

A

Transposons are mobile genetic elements that are copied into a new location. When this happens, one copy remains at the original site and one is found at the new location. Because the removal of duplicate copies is a slow process on an evolutionary time-scale, they increase in number. In humans they occupy 45% of the genome.

The signature of mobile genetic elements is a short direct repeat sequence either side of the point of insertion into the host chromosome. There are four classes of repeated DNA elements:

  • long interspersed elements (LINEs): make up around 20% of the human genome. Complete LINEs are ~6-8 kb in size and contain a promoter for RNA Pol II and two ORFs.
  • short interspersed elements (SINEs): present at high frequencies in various eukaryotic genomes. Three families are found in the human genome: Alu, MIR and MIR3. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates. They are 300-400 bp in size.
  • LTR retrovirus-like elements: flanked by direct long terminal repeats (LTRs) that range from ~100 bp to over 5 kb in size. Between the LTRs are gag and pol genes that encode a reverse transcriptase, protease, RNase H and integrase. These proteins are sufficient to programme the autonomous transposition of the element. Nearly all LTR elements in the mammalian genome are inactive.
  • DNA transposons: these resemble bacterial transposons. They have terminal inverted repeats, and an intact element encodes a transposase. Transposase can act in the nucleus on deleted and inactive copies of the element, leading to accumulation of inactive elements. They tend to be short lived. There are seven classes in human genomes (including MER1 and MER2), occupying <3% of the genome. All are inactive.
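
The short direct repeats that flank an inserted element are usually explained as target-site duplications: the target DNA is cut with a stagger, and filling in the resulting gaps duplicates a few bases on either side of the element. The Python sketch below models only that outcome on a single strand. The function name, sequences, and duplication length are hypothetical.

```python
def insert_with_tsd(target: str, element: str, pos: int, tsd_len: int) -> str:
    """Insert `element` into `target` at `pos`, duplicating `tsd_len` target bases.

    The bases target[pos:pos + tsd_len] end up on both sides of the element,
    which models the short direct repeats flanking an insertion.
    """
    if pos < 0 or tsd_len < 0 or pos + tsd_len > len(target):
        raise ValueError("insertion site and duplication must lie within the target")
    return target[: pos + tsd_len] + element + target[pos:]


if __name__ == "__main__":
    # Lower-case letters mark the inserted element so the flanking repeats stand out.
    result = insert_with_tsd("AAACCGGTTT", "nnnn", pos=3, tsd_len=4)
    print(result)  # AAACCGGnnnnCCGGTTT
```

In the printed result the four bases CCGG appear immediately before and after the inserted element, which is the signature described above.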
18
Q

What degree of genome similarity is there between individuals? What are types of variation?

A

The DNA sequence in the genome is more or less the same between individuals, apart from a very small amount. After the HGP (around 10 years ago) it was usually assumed that 99.9% of the DNA sequence is the same, but the discovery of more variation has led to a newer estimate of around 99.8%. There are many factors to take into account, so it is hard to establish the exact value. There are two types of variation: structural level (CNVs) and sequence level (SNPs).

19
Q

SNPs

Outline sequence level variation. What projects research this?

A

Sequence level variation is variation in the genome at the level of individual nucleotides. Single nucleotide polymorphisms (SNPs) are variations in a single nucleotide between individuals that occur at specific positions in the genome, and some are more common at certain positions than others. After the HGP, a project called HapMap explored the organisation of the human genome with the aim of identifying SNPs; we now know that there are 11 million SNPs in the human genome. This is important for medical geneticists, as SNPs can be associated with common diseases. Genome Wide Association Studies (GWAS) analyse all relevant SNPs in one experiment. SNPs are identified through genotyping (scanning the whole genome for sets of SNPs).
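
As a toy illustration of variation at single positions, the Python sketch below lists the positions at which two sequences differ. It assumes the sequences are already aligned and of equal length, and the function name and sequences are hypothetical. Real SNP identification also depends on how frequent each variant is in a population, which this sketch ignores.

```python
def find_single_base_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


if __name__ == "__main__":
    # Made-up sequences that differ only at position 5 (A in one, T in the other).
    print(find_single_base_differences("ACGTACGT", "ACGTTCGT"))  # [(5, 'A', 'T')]
```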

20
Q

CNVs

Outline structural level variation. How can structural level variation be detected?

A

Copy number variation is a phenomenon in which sections of the genome are repeated and the number of repeats in the genome varies between individuals in the human population. Copy number variants (CNVs) can vary from two bases to many thousands. When plans were made for the HGP, only a few individuals were used for sequencing, which meant that variation could not be assessed. As more genomes were sequenced, CNVs were discovered. These can be deletions, duplications, inversions, and large-scale (thousands) copy numbers. Some of these can be seen at the microscopic (karyotype) level. Sub-microscopic variations vary from 1 kb - 3 kb in size, and amount to between 4 and 24 Mb. These types of variation are more common than variation at the sequence level.

CNVs can be detected cytogenetically using G-banding, C-banding, and FISH. Comparative genomic hybridisation can be used to compare CNV profiles, e.g. between parents and children, to see if CNVs are inherited. Microarray technologies can also be used to detect copy number variations: dots of genes (a complete human gene set) are printed onto a microscope slide, and short sequences are hybridised to them to build up a profile. These profiles can be compared between individuals.
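
Array-based comparisons of this kind are commonly summarised as a log2 ratio of test signal to reference signal at each probe. The Python sketch below shows that calculation under stated assumptions: the function name, the signal values, and the 0.3 threshold are illustrative choices, not values taken from this card or from any particular platform.

```python
import math


def classify_probe(test_signal: float, reference_signal: float, threshold: float = 0.3) -> str:
    """Classify one probe as 'gain', 'loss' or 'normal' from its log2 signal ratio.

    The default threshold is an assumed illustrative cut-off.
    """
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    log_ratio = math.log2(test_signal / reference_signal)
    if log_ratio >= threshold:
        return "gain"
    if log_ratio <= -threshold:
        return "loss"
    return "normal"


if __name__ == "__main__":
    # Three copies versus two gives log2(1.5), about 0.58. One copy versus two gives -1.
    for test, reference in [(150.0, 100.0), (50.0, 100.0), (100.0, 100.0)]:
        print(test, reference, classify_probe(test, reference))
```

Working on a log scale makes gains and losses of the same fold change symmetric around zero, which is why a single threshold can be applied in both directions.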

21
Q

What are the different methods of CNV detection? What are their advantages and drawbacks?

A
22
Q

How do CNVs and SNPs compare in contribution to genome variation?

A

It is now recognized that the genomes of any two individuals in the human population differ more at the structural level than at the nucleotide sequence level. Conservative estimates suggest that CNVs between individuals amount to 4 Mb (1/800 bp) of genetic difference, and less conservative estimates put this figure in the range of 5–24 Mb. By either measure, CNVs account for more nucleotide variation on average than single nucleotide polymorphisms (SNPs), which account for approximately 2.5 Mb (1/1,200 bp).
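
The "1 in N bp" figures follow from dividing genome size by the amount of variable sequence. The Python sketch below reproduces that arithmetic. The two genome sizes are assumptions: 3.0 billion bases matches the figure given in the first card, while 3.2 billion bases is the size that yields the quoted 1/800 value for 4 Mb.

```python
def bases_per_difference(genome_bp: int, variant_bp: int) -> float:
    """Return N such that variation amounts to 1 difference per N bases."""
    if variant_bp <= 0:
        raise ValueError("variant_bp must be positive")
    return genome_bp / variant_bp


if __name__ == "__main__":
    cnv_bp = 4_000_000   # conservative CNV estimate quoted above
    snp_bp = 2_500_000   # SNP estimate quoted above
    for genome_bp in (3_000_000_000, 3_200_000_000):  # assumed genome sizes
        print(
            genome_bp,
            "CNV: 1 in", round(bases_per_difference(genome_bp, cnv_bp)),
            "SNP: 1 in", round(bases_per_difference(genome_bp, snp_bp)),
        )
```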

23
Q

What is the importance of SNPs?

A

Geneticists can use SNPs as markers to locate genes in DNA sequences. Some SNPs may be associated with human disease and therefore be of particular interest.

24
Q

What kinds of structural variation exist in the genome?

A
  • Insertions/Deletions
    • Insertion and deletion events represent the most frequent type of structural variation in the human genome, and also the best characterized. According to the Human Genome Mutation database, 5% of all mutations associated with simple Mendelian genetic diseases are currently attributed to submicroscopic insertions or deletions.
    • Insertion/deletion polymorphisms of several genes with functions in metabolism influence a variety of common phenotypes. A number of drug detoxification enzymes show this type of polymorphism, with some being homozygously deleted in as many as 30% of individuals of certain ethnicity. Copy number changes of cytochrome P450 drug-metabolizing enzymes, such as CYP2D6, are associated with variability in metabolism of tricyclic antidepressants and antipsychotic drugs.
  • Duplications
  • Inversions
    • Inversions represent a change in the order or orientation of a DNA segment, and can either be balanced or unbalanced. Although inversions can potentially affect gene expression, either by disrupting coding regions that span the breakpoints or by position effects acting on genes adjacent to the breakpoints, most inversions are not associated with alterations in gene copy number and thus may not cause an obvious phenotypic effect.
    • Example: a 900-kb inversion polymorphism. Two divergent European haplotypes (H1 and H2) correlate perfectly with the alternate orientations of this inversion and diverged ∼3 million years ago. Detailed studies showed a small but significant increase in fertility in female carriers of the inversion, explaining how its frequency increased rapidly in Europeans, despite emerging only relatively recently.
    • In Sotos syndrome, which is often caused by microdeletion of a 2.2-Mb region at 5q35, a heterozygous inversion of the critical region was detected in all fathers (37) of the children carrying a paternally derived deletion in one study. In each case, inversion of the region between the flanking duplications is thought to result in abnormal meiotic pairing, leading to an increased susceptibility to unequal nonallelic homologous recombination (NAHR).
  • Large scale copy-number variants
25
Q
A