Human Genome Flashcards

1
Q

What is the C-value paradox?

A

Genome size is not always proportional to the complexity of the organism.
Species complexity is linked to genome size to a certain extent, but the relationship is loose, so:
- similar organisms may have greatly differing genome sizes (due to differences in the amount of repetitive DNA)
- there is a correlation between the minimum genome size for a class of organism and its complexity

2
Q

What are retroelements?

A

Retroelements are DNA elements that arise by reverse transcription

3
Q

Which vertebrate has the smallest known genome?

A

Fugu (pufferfish) has the smallest known vertebrate genome, at 390 million base pairs.

4
Q

What are/were the aims of the human genome project and its successors?

A
  • Human Genome Project: generation of the complete sequence of the human genome
  • ENCODE project: identification of all functional human DNA sequences, e.g. genes, splice junctions, promoter and enhancer sequences
  • 1000 Genomes Project: mapping of sites where the genome sequence varies between individuals
  • Cancer Genome Atlas and Cancer Genome Project: identification of genomic alterations associated with cancer
5
Q

What are the differences between heterochromatin and euchromatin?

A

Heterochromatin is very tightly condensed DNA, so enzymes cannot access it. It is therefore assumed to be non-functional, non-coding and largely repetitive.

Euchromatin is less condensed DNA that is potentially transcribable.

6
Q

Where can repetitive sequences occur?

A

Repetitive sequences can occur within genes as well as intergenic DNA - it is very common to see repetitive sequences in the introns of genes

7
Q

What makes up intergenic DNA?

A

Intergenic DNA is made up of pseudogenes, structural DNA sequences and repetitive DNA

Pseudogenes - genes which were once active in our evolutionary past but have gone out of use

Structural DNA sequences - sequences needed to maintain chromosome integrity

8
Q

What percentage of the genome do exons, introns and repetitive DNA make up?

A

Exons make up 2.9% of the genome: 1.2% of the genome is coding exon sequence and the other 1.7% is non-coding exon sequence

Introns make up 36.6% of the genome

Repetitive DNA makes up 45% of the genome and is split into interspersed repetitive DNA and tandem repetitive DNA

9
Q

What is interspersed repetitive DNA?

A

Interspersed repetitive DNA is made up of:

  • DNA transposons
  • LINEs - long interspersed nuclear elements
  • SINEs - short interspersed nuclear elements
  • Endogenous retroviruses

Interspersed repetitive DNA is derived from transposons, which are mobile genetic elements that can move to new locations within the genome of the cell

10
Q

What are the two types of transposition?

A

Copy and paste - e.g. a transposon on chromosome 1 makes a copy of itself, which inserts into chromosome 2. The old or the newly inserted transposon can then acquire mutations that make it no longer able to transpose.

Cut and paste - e.g. a transposon on chromosome 1 excises itself and inserts into chromosome 2
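
The difference between the two modes can be sketched with a toy model in which each chromosome is a list of labelled elements. The function names and the label "TE" are invented for this illustration, and insertion-site choice and the enzymes involved are not modelled.

```python
def copy_and_paste(genome, source, target, element="TE"):
    """Return a new genome: the element stays at source and a copy is added to target."""
    if element not in genome[source]:
        raise ValueError(f"{element} not found on {source}")
    new_genome = {name: list(items) for name, items in genome.items()}
    new_genome[target].append(element)
    return new_genome


def cut_and_paste(genome, source, target, element="TE"):
    """Return a new genome: the element is removed from source and inserted in target."""
    if element not in genome[source]:
        raise ValueError(f"{element} not found on {source}")
    new_genome = {name: list(items) for name, items in genome.items()}
    new_genome[source].remove(element)
    new_genome[target].append(element)
    return new_genome


genome = {"chr1": ["geneA", "TE"], "chr2": ["geneB"]}
copied = copy_and_paste(genome, "chr1", "chr2")
moved = cut_and_paste(genome, "chr1", "chr2")
print(copied)  # {'chr1': ['geneA', 'TE'], 'chr2': ['geneB', 'TE']}
print(moved)   # {'chr1': ['geneA'], 'chr2': ['geneB', 'TE']}
```

Copy and paste increases the number of copies in the genome; cut and paste leaves it unchanged.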

11
Q

Which of the LINEs are active?

A

LINEs represent 21% of the genome (870,000 LINEs in the genome, split into 3 families)
LINE-1 elements are potentially active, with about 500,000 of these in our genomes

LINE-2 and LINE-3 elements are less abundant and are inactive

12
Q

What do LINEs encode?

A

LINE-1 encodes two separate proteins as it has two open reading frames.

  • One open reading frame encodes an RNA-binding protein involved in transposition.
  • The other open reading frame encodes a large protein that has both endonuclease activity and reverse transcriptase activity.
13
Q

What is the difference between direct repeat and inverted repeat flanking sequences?

A

Direct repeat - same sequence repeating itself on the same strand

Inverted repeat - a sequence reading 5’ to 3’ on one strand that is also present, reading 5’ to 3’, on the opposite strand.

SINEs and LINEs are flanked by direct repeats
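
A minimal sketch of the distinction, assuming both flanking sequences are written 5’ to 3’ as read from the top strand. The helper names are hypothetical, and a sequence that equals its own reverse complement would be reported as a direct repeat by this simple check.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand of seq, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_flanks(left, right):
    """Classify two flanking sequences, both read 5' to 3' on the top strand."""
    if left == right:
        return "direct repeat"
    if right == reverse_complement(left):
        return "inverted repeat"
    return "no repeat"


print(classify_flanks("AAGCT", "AAGCT"))  # direct repeat
print(classify_flanks("AAGCT", "AGCTT"))  # inverted repeat
print(classify_flanks("AAGCT", "GGGGG"))  # no repeat
```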

14
Q

How are LINEs transposed?

A
  • The LINE-1 DNA sequence is transcribed into LINE-1 mRNA
  • LINE-1 mRNA encodes an RNA binding protein and an endonuclease/reverse transcriptase protein
  • The RNA binding protein and the endonuclease/reverse transcriptase protein bind to LINE-1 mRNA
  • The LINE-1 RNA-protein complex enters the nucleus
  • LINE-1 endonuclease cuts the target site (a few Ts followed by a few As)
  • LINE-1 mRNA has a stretch of A residues at its 3’ end, which allows it to base pair with the target site
  • The reverse transcriptase associated with the mRNA reverse transcribes it, making a DNA copy
  • This leaves a DNA-RNA duplex; the RNA is then replaced with DNA
15
Q

What are SINEs?

A

SINEs are short interspersed nuclear elements which constitute approximately 13% of the human genome and are split into 3 families:

  • Alu - only SINE active in transposition
  • MIR
  • MIR3

SINEs have no capacity to make protein, so they transpose by the same mechanism as LINEs, using the LINE-1 proteins.

16
Q

By what mechanisms can SINEs and LINEs lead to mutations?

A

1) a transposition that leads to a LINE-1 or Alu sequence disrupting a gene - transposition interrupts the gene

2) deletions can be caused by recombination between two nearby Alu or LINE elements. They may arise from unequal crossing over between repeats
- the Alu sequence before a gene becomes aligned with the Alu sequence after the gene. The two chromosomes are slightly out of register, but because the sequences are virtually identical, recombination can still occur. This unequal crossing over means loss of the gene in a gamete
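
The deletion described above can be sketched as a toy model, with each chromosome written as a list of labelled segments. The function name and segment labels are invented for illustration, and the model ignores everything about recombination except which repeat copies pair up.

```python
def unequal_crossover(chrom_a, chrom_b, i, j):
    """Cross over between the repeat at index i on chrom_a and index j on chrom_b.

    Returns the two recombinant chromosomes.
    """
    if chrom_a[i] != chrom_b[j]:
        raise ValueError("crossover points must be copies of the same repeat")
    recombinant_1 = chrom_a[: i + 1] + chrom_b[j + 1 :]
    recombinant_2 = chrom_b[: j + 1] + chrom_a[i + 1 :]
    return recombinant_1, recombinant_2


chromosome = ["upstream", "Alu", "gene", "Alu", "downstream"]
# Misalign the homologues so the first Alu of one copy pairs with
# the second Alu of the other, then cross over.
deleted, duplicated = unequal_crossover(chromosome, chromosome, 1, 3)
print(deleted)     # ['upstream', 'Alu', 'downstream']
print(duplicated)  # ['upstream', 'Alu', 'gene', 'Alu', 'gene', 'Alu', 'downstream']
```

In this model one product has lost the gene, while the reciprocal product carries two copies of it.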

17
Q

What is a gene?

A

A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products - ENCODE project definition

The set of DNA sequences required to encode a particular functional product or a set of products which overlap in sequence - Glenn’s definition

18
Q

How many protein coding genes are there in the human genome?

A

There are approximately 20,000 protein coding genes in the human genome (between 19,000 and 21,000)

19
Q

What do introns start and end with?

A

Introns more or less always start with GT

Introns more or less always end in AG
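
A minimal sketch of this rule as a check on a candidate intron sequence written 5’ to 3’. The function name is hypothetical, and since the rule holds only "more or less always", passing the check does not prove that a sequence is an intron.

```python
def looks_like_intron(seq):
    """Return True if seq starts with GT and ends with AG."""
    seq = seq.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


print(looks_like_intron("GTAAGTCTTTCCAG"))  # True
print(looks_like_intron("GCAAGTCTTTCCAG"))  # False
```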

20
Q

Why are there more proteins in a human than there are genes in the human genome?

A
  • 20-90% of human genes show alternative splicing
  • each alternatively spliced gene produces an average of 4 different mRNAs
  • so the human genome may encode over 60,000 proteins
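
The arithmetic behind this estimate can be sketched as follows. The assumptions are not from the card: 20,000 protein coding genes, one mRNA for every gene that is not alternatively spliced, and one protein per mRNA. The function name is hypothetical.

```python
def estimated_proteins(n_genes, percent_spliced, mrnas_per_spliced_gene=4):
    """Estimate the protein count from the share of alternatively spliced genes."""
    spliced = n_genes * percent_spliced // 100
    unspliced = n_genes - spliced
    return unspliced + spliced * mrnas_per_spliced_gene


print(estimated_proteins(20000, 20))  # 32000
print(estimated_proteins(20000, 90))  # 74000
```

Under these assumptions the estimate exceeds 60,000 only when more than about two thirds of genes are alternatively spliced.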
21
Q

What is the average human gene? (Mean number of exons, mean exon size, mean final exon size, mean intron size, mean gene size)

A

Mean number of exons = 11 (range 1 - 365)
Mean exon size = 161bp (excluding final exon)
Mean final exon size = 1716bp
Mean intron size = 6,400bp
Mean gene size = 66,000bp (range <1,000bp to >2,000,000bp)
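
As a rough consistency check, the means above can be combined, assuming a gene with n exons has n - 1 introns. Multiplying means together is only an approximation, so the result is not expected to match the quoted mean gene size exactly.

```python
MEAN_EXONS = 11
MEAN_EXON_BP = 161          # excluding the final exon
MEAN_FINAL_EXON_BP = 1716
MEAN_INTRON_BP = 6400

# Ten ordinary exons plus one final exon
exon_total = (MEAN_EXONS - 1) * MEAN_EXON_BP + MEAN_FINAL_EXON_BP
# Assumption: one intron between each pair of neighbouring exons
intron_total = (MEAN_EXONS - 1) * MEAN_INTRON_BP
gene_total = exon_total + intron_total

print(exon_total)    # 3326
print(intron_total)  # 64000
print(gene_total)    # 67326
```

The total of about 67,000bp is close to the quoted mean of 66,000bp, and shows that introns account for most of the length of an average gene.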

22
Q

How are chromosomes ordered?

A
  • Chromosomes are numbered in approximate order of decreasing size
  • Chromosome size is not representative of number of genes on that chromosome

The chromosome with the highest gene density is chromosome 19, with 22.5 genes per Mb

The chromosome with the lowest gene density is the Y chromosome, with 0.76 genes per Mb

23
Q

What is the function of genes that do not encode proteins?

A

These are difficult to identify, but there may be approximately 23,000 genes in the human genome that encode functional RNA other than mRNA, e.g. genes encoding:

  • tRNA
  • rRNA
  • small-nuclear RNAs (snRNAs involved in splicing)
  • micro-RNAs (miRNAs - these regulate stability and translation of complementary mRNAs)
  • Long non-coding RNAs with regulatory, enzymic or structural roles
24
Q

What percentage of total RNA mass is each RNA type?

A
rRNA - 80-90%
tRNA - 10-15%
mRNA - 3-7%
snRNA & miRNA - <0.5%
Long ncRNA - <0.2%
25
Q

RNA facts

A

In most tissues >50% of all RNA originates from mitochondrial genes - cell has many mitochondria but only one nucleus

At least 4,000 protein coding genes appear to be housekeeping genes expressed in all human tissues

Any given tissue has fewer than 200 genes expressed only in that tissue

26
Q

Gene families and super families

A

Most genes are single copy - they occur once per haploid genome
Genes encoding very abundant products (e.g. histones, rRNA) may occur in multiple identical copies
Most human genes belong to a gene family, i.e. a family of related genes all derived from a common ancestor

27
Q

Nomenclature in gene families

A

Genes that share a common ancestor are homologues (e.g. all globin genes)

Two homologous genes arising as a result of species divergence are orthologues (e.g. human and cat beta-globin)

Two homologous genes arising as a result of gene duplication are paralogues (e.g. human beta-globin and human alpha-globin)

28
Q

Why do we see changes in genome when we compare two similar animals?

A

We see changes in genome when we compare two similar animals because of the expansion and contraction of gene families

  • approximately 80% of human genes have a direct equivalent (orthologue) in the mouse
  • <1% of human genes have no related genes in the mouse
  • most of the differences between the gene content of different vertebrate genomes are due to changes in the number of genes within families, e.g. mice have more olfactory receptor genes than humans (1035 compared to 396), and so mice have a better sense of smell
29
Q

How can new genes appear in a genome?

A
  • exon shuffling
  • gene duplication
  • insertion of reverse transcribed mRNA to generate a retrogene
  • origin of gene de novo
  • horizontal gene transfer (very rare in vertebrates)
30
Q

What is exon shuffling?

A

Exons from one gene become inserted into, or fused with, another gene. Often, but not always, a protein has a number of distinct domains, and often, but not always, each domain is encoded by a distinct exon.

This may happen by chromosomal rearrangement or by the gene being next to a transposon
Exon shuffling generates a new hybrid gene

31
Q

What is gene duplication and how does it occur?

A

Gene duplication is the generation of an additional copy of a gene.

Gene duplication occurs by:
  • Unequal crossing over between repetitive sequences - the gene is flanked by two copies of a repetitive sequence, and the copies on opposite sides of the gene on the two chromosomes cross over
  • Unequal crossing over between duplicated genes - when there is a cluster of related genes, new genes may be generated by unequal crossing over during meiosis (these genes are almost the same as each other)
32
Q

What is the fate of duplicated genes?

A

Becomes a pseudogene (non-functional) - gene duplication may be disadvantageous as the protein it encodes could be toxic in excess

One of the duplicates acquires mutations that give it a new beneficial function (neofunctionalization) - it produces a protein with altered properties, giving it a new function

The function of the gene is shared between the two duplicates (subfunctionalization)

Both duplicates are retained unchanged - the gene encodes a protein that is beneficial in excess

33
Q

What is neofunctionalization and what are examples of this?

A

Neofunctionalization occurs when one of a pair of duplicated genes gains mutations that give it a new beneficial function. The protein it produces has altered properties and so has a new function.
An example:
The antifreeze glycoprotein gene of Antarctic fish, which encodes a glycoprotein with 41 copies of the motif Thr-Ala-Ala, evolved from a duplicate of the trypsinogen gene

34
Q

How do retrogenes arise?

A

Retrogenes and processed pseudogenes arise from insertion into the genome of reverse transcribed mRNAs

A significant retrogene derived from the gene encoding fibroblast growth factor 4 (FGF-4) gives the dachshund its short-legged phenotype

35
Q

What is de novo generation?

A

De novo generation is the origin of a new gene from DNA that was previously non-coding.
This process requires:
- transcription of previously non-coding DNA
- presence or generation of an open reading frame
- acquisition of function

36
Q

Why is it difficult to prove the presence of human specific de novo genes?

A

They can’t be analysed for conservation between species (since they do not exist in other species)

They can’t have an essential function (since ancestors didn’t have them)

37
Q

What are pseudogenes?

A

Pseudogenes resemble genes (including having intron-exon like regions) but have mutations (usually lots) that make them unable to produce a functional product.

They can either be processed or unprocessed.

38
Q

What are the two types of pseudogenes?

A

Duplicated pseudogenes are generated when null mutations occur in a recently duplicated gene. Humans have about 3500 duplicated pseudogenes

Unitary pseudogenes are generated when null mutations arise in an existing gene which is no longer required. Humans have about 200 unitary pseudogenes (e.g. GULO, which encoded an enzyme required for vitamin C synthesis)

39
Q

Comparison between pseudogenes (unprocessed pseudogenes) and processed pseudogenes

A

Both usually contain many mutations rendering them non-functional

Unprocessed pseudogenes are derived from DNA (they resemble genes, e.g. they show introns, exons, promoter etc.)
Processed pseudogenes are derived from mRNA so resemble mRNA (introns are absent, no promoter, poly A rich region at 3’ end)

Unprocessed pseudogenes may be in a cluster that contains active, related genes
Processed pseudogenes are usually distant from active, related genes

Unprocessed pseudogenes are not flanked by direct repeats
Processed pseudogenes are flanked by direct repeats

40
Q

What was the aim of the 1000 genomes project?

A

The aim was to identify all common genetic variants within humans and to explore the link between genotype and phenotype

41
Q

What are the three main types of variation between humans?

A
  • single nucleotide polymorphisms (SNPs) - where one person has an A and another person has a C at the same position
  • insertions/deletions (INDELs)
  • copy number variants (CNVs)
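
As a small illustration of the first type, the sketch below lists single-base differences between two sequences. It assumes the sequences are already aligned and of equal length, so it cannot detect insertions, deletions or copy number variants. The function name is hypothetical and positions are counted from 0.

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for each mismatch between aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


print(find_snps("GATTACA", "GATTCCA"))  # [(4, 'A', 'C')]
```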
42
Q

What is the difference in genome between two typical individuals?

A

Approximately 4 million SNPs
Approximately 600,000 INDELs
Approximately 1,000 CNVs
Approximately 1,000 sites of LINE and SINE insertions

43
Q

On average how many gene alleles in humans have a mutation causing complete loss of function?

A

The average person has approximately 200 gene alleles that have a mutation causing complete loss of function

44
Q

In what way do different vertebrate species show variation?

A

Change in the number of genes in a gene family

Change in coding sequence of orthologous genes

Variation in regulatory sequences that lead to a change in the expression of orthologous genes