Week 1.1: Introduction, the landscape of the human genome Flashcards

1
Q

When was the human genome project published?

A

The human genome was first published in 2001

2
Q

Who led the private and the public HGP? Where were they published?

A

The private project, led by J. Craig Venter, was published in Science. Eric S. Lander led the public project, which was mainly funded by the US government and was published in Nature at around the same time.

3
Q

Human genomics timeline

1866

A

1866 Mendel publishes his laws of inheritance, based on experiments with pea plants. This is when we first began to understand how genetics works.

4
Q

Human genomics timeline

1869

A

1869 Miescher isolates “nuclein” from cells. At the time, the link (DNA) between the nuclein he discovered and the genetics Mendel described was not known.

5
Q

Human genomics timeline

1912

A

1912 Chromosome counts of 47 in males and 48 in females are reported. At the time chromosomes were not well understood. Today we know that the counts were wrong, because the male Y chromosome, which is smaller than the X chromosome, was too small to be seen with the microscopes of the day.

6
Q

Human genomics timeline

1944-1952

A

1944-1952 DNA shown to be genetic material

7
Q

Human genomics timeline

1953

A

1953 Crick and Watson structure of DNA

8
Q

Human genomics timeline

1961

A

1961 Nirenberg cracks genetic code

9
Q

Human genomics timeline

1996

A

1996 Yeast genome sequenced

10
Q

Human genomics timeline

2001

A

2001 Human genome sequence published

11
Q

Human genomics timeline

2012

A

2012 ENCODE project published, telling us a lot more about the functionality of the human genome

12
Q

What is the latest version of the human genome called? When was it released?

A

The latest version of the human genome is called GRCh38; it was released on 24 December 2013.

13
Q

How many bases does the human genome have? What percentage of bases are unknown? What percentage of bases are unplaced?

A

The current best human genome we have is 3.2 billion bases long. About 4.98% of those bases are unknown: we do not know whether they are A, G, C or T. To this day, we do not have a perfectly complete human genome. A further 0.14% of bases are unplaced, meaning we do not know where on the chromosomes they belong.

14
Q

How many bases long is the human genome? What is that equivalent to in Bibles or Qur’ans?

A

The human genome is 3.2 billion letters long, equivalent to 1,000 Bibles or 10,000 Qur’ans. We carry 2 copies in (almost) every cell of our bodies. We have about 100 trillion cells, so we carry about 6.5x10^23 bases of DNA.
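
The total can be checked with quick arithmetic. This is a minimal sketch using the rounded figures from this card (3.2 billion bases per copy, 2 copies per cell, 100 trillion cells); the variable names are illustrative only.

```python
# Rounded figures from the card; all three are approximations.
BASES_PER_HAPLOID_GENOME = 3.2e9  # letters in one copy of the genome
COPIES_PER_CELL = 2               # (almost) every cell carries two copies
CELLS_IN_BODY = 1e14              # about 100 trillion cells

total_bases = BASES_PER_HAPLOID_GENOME * COPIES_PER_CELL * CELLS_IN_BODY
print(f"{total_bases:.1e} bases of DNA in the body")
```

With these rounded inputs the product is about 6.4x10^23, which agrees with the card's figure of about 6.5x10^23 to within rounding of the genome length.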

15
Q

In the context of computers, one byte can represent four base pairs, meaning a haploid human genome can be encoded in 0.8 GB of information.

How much information does the human body contain?

A

Thus, the human body contains approximately 161 billion terabytes of information in DNA
= 161 zettabytes
= 322 billion average laptop hard drives
= 32 trillion DVDs at 5 GB
= 26 years of internet traffic
As you walk around, you are moving more information than all the traffic on the internet over 26 years!
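
The arithmetic behind these figures can be sketched as follows. The inputs are the rounded values from the cards (4 bases per byte, 3.2 billion bases, 2 copies per cell, 100 trillion cells). Decimal units are assumed, and the 500 GB laptop drive is an assumption inferred from the drive count, not a value stated on the card.

```python
BASES_PER_BYTE = 4      # 2 bits per base, so 4 bases fit in one byte
HAPLOID_BASES = 3.2e9   # rounded genome length
COPIES_PER_CELL = 2
CELLS_IN_BODY = 1e14    # about 100 trillion cells

bytes_per_haploid = HAPLOID_BASES / BASES_PER_BYTE  # 0.8e9 bytes = 0.8 GB
total_bytes = bytes_per_haploid * COPIES_PER_CELL * CELLS_IN_BODY

TERABYTE = 1e12
ZETTABYTE = 1e21
LAPTOP_DRIVE = 500e9    # assumed drive size in bytes
DVD = 5e9               # 5 GB per DVD, as on the card

print(f"{total_bytes / TERABYTE:.3g} terabytes")
print(f"{total_bytes / ZETTABYTE:.3g} zettabytes")
print(f"{total_bytes / LAPTOP_DRIVE:.3g} laptop drives")
print(f"{total_bytes / DVD:.3g} DVDs")
```

With rounded inputs this gives about 1.6x10^23 bytes, that is roughly 160 billion terabytes (160 zettabytes), 320 billion drives and 32 trillion DVDs. The card's 161 presumably comes from a slightly more precise genome length.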

16
Q

Is it cheaper to generate DNA sequence data than to store it? What could DNA be used as?

A

Generating big data by sequencing DNA is relatively cheap and easy, and we are getting to the point where generating the data is cheaper than storing it. As DNA is an incredibly efficient way of storing and moving information, researchers are considering storing scientific results as DNA in a tube at -80 °C rather than on a hard drive.

Nick Goldman demonstrated this.

17
Q

How many genes are there?

Sweepstake carried out in 2000-2003, where scientists were asked to guess how many protein-coding genes there are in the human genome

What was the median bid?

A

There was a median bid of 61,302 genes, lowest bid = 25,847, highest bid = 300,000.

The actual result was about 21,000 genes.

18
Q

What percentage of the genome is made up of protein-encoding genes?

What part of the DNA is protein-coding?

A

Genes – 1%

Exons are the part of the DNA that is protein coding.
The DNA is transcribed into RNA, which is translated into protein, with 3 bases coding for one amino acid. These genes code for structural proteins, enzymes and signalling proteins.
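
The "3 bases to one amino acid" rule can be illustrated with a short sketch. The codon table below is only a small subset of the standard genetic code, chosen for the example. The input sequence and the function names `transcribe` and `translate` are hypothetical.

```python
# Small subset of the standard genetic code (mRNA codons), for illustration only.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala", "AAA": "Lys",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

def transcribe(dna_coding_strand):
    """Return the mRNA with the same sequence as the coding strand (T becomes U)."""
    return dna_coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons outside the subset
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide

example_dna = "ATGTTTGGCGCTAAATAA"  # made-up coding sequence
example_peptide = translate(transcribe(example_dna))
print(example_peptide)  # ['Met', 'Phe', 'Gly', 'Ala', 'Lys']
```

Eighteen bases give six codons; the last is a stop codon, so the peptide has five amino acids.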

19
Q

What percentage of the genome is made up of gene-related sequences?

A

Gene-related – 36%
Gene-related sequences are parts of the genome that have something to do with genes: introns, which are interspersed between the exons; untranslated regions (UTRs), which are transcribed into RNA but not translated; promoter regions, which govern the expression of genes; RNA genes (ncRNAs), which are involved in the regulation of genes; pseudogenes, which appear to be ‘dead’ genes; and gene fragments, the left-over bits of genes.

20
Q

What does a typical protein-coding gene consist of?

A

A promoter with a TATA box is followed by a series of exons with introns between them. Splice sites mark the exon/intron boundaries (GT | intron | AG). A UAA stop codon lies at the end of the last exon. The gene is transcribed into pre-mRNA, which is then spliced to remove the introns; it is also capped at the 5’ end and given a poly-A tail at the 3’ end. This targets the mRNA for export to the cytoplasm, where it is translated into a polypeptide.

21
Q

What are RNA (ncRNAs) genes?

Give 4 examples?

A
RNA genes (ncRNAs)
RNA genes are something we did not know much about until recently. While the estimated number of protein-coding genes has fallen, the estimated number of RNA genes has increased; this is one of the biggest surprises in genomics over the last 10 years. Some occur in big clusters in the human genome, such as the ribosomal RNA, tRNA, U1 snRNA and piRNA genes. Others are dispersed among the protein-coding regions or found in introns.
22
Q

What is transfer RNA (tRNA)?

A

Transfer RNA is an example of an RNA that we have known about for a while. tRNA was proposed as the adaptor linking mRNA to protein when mRNA is translated. A tRNA binds an amino acid at one end and the mRNA at the other, and that is how the amino acids are added to the chain in the order defined by the mRNA.

tRNA genes are found in large clusters of up to 700 genes on multiple chromosomes. We need tRNAs all of the time, so it makes sense to have many genes coding for them.

23
Q

What are ribozymes and what do they do?

A

Ribozymes are enzymes made of RNA; they are not translated into proteins but function as RNAs. As RNAs are single-stranded nucleic acids, they can fold back on themselves and form many complex structures that enable them to interact with other nucleic acids. For this reason, they are often involved in cleaving other RNAs. They are often present as a single copy in the human genome.

24
Q

What are ribonucleoproteins (RNPs) and what do they do?

A

Ribonucleoproteins (RNPs) are ribozymes that form complexes with proteins, and they are involved in many functions in processing nucleic acids. Ribosomes (rRNA) are the best-known example of ribonucleoproteins. The spliceosome (snRNPs) is another, as is telomerase, which has a role in governing the length of the telomeres of chromosomes.
The RNA part of an RNP can bind to DNA, and the protein part can carry out functions that the RNA is unable to perform.

25
Q

Where are ribosomal genes found?

A
Ribosomal genes are found in clusters (see the diagram on page 5, which illustrates one such cluster).
This 40 Kbp (kilobase pairs) long module is tandemly repeated 30-40 times on the short arms of human chromosomes 13, 14, 15, 21 and 22. Together these repeats comprise about eight Mbp (megabase pairs) of the human genome.

Human 5S rDNA occurs within 2.2 kb repeating elements, organized in tandem arrays of 35 to 175 copies per haploid genome on chromosome 1.
These ribosomal clusters can be observed under a microscope with a suitable dye.

26
Q

Ribosomal clusters under a microscope with suitable dye can be observed.

What dye is this and what can we find?

A

Fluorescent in-situ hybridisation (FISH) of human metaphase chromosomes. Green = rDNA, yellow = centromeres, red = telomeres.
Recently we have found that, as well as mRNA, the DNA of a protein-coding gene encodes all sorts of other types of RNA, some short and some long:

Short ncRNAs (nc = non-coding)
Long ncRNAs (nc = non-coding)
snoRNA
miRNA (microRNAs)
Some run in the same direction as the gene (5’ to 3’); others run in the opposite direction.
27
Q

Many of these mini RNA genes are involved in anti-sense regulation of the messenger RNA. Because they are encoded in the same segment but read in the opposite direction, they are the reverse complement of the mRNA made from that region and can bind to it.

What protein do they bind to? What does this do?

A

They bind to a protein called argonaute. This is a clever way of controlling gene expression: the messenger RNA is chopped up if too much of the small RNA is transcribed.
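
The reverse-complement relationship that lets an anti-sense RNA pair with its mRNA can be sketched in a few lines. This is a simplification that assumes perfect complementarity over the whole small RNA. The sequences and the function names `reverse_complement_rna` and `can_pair` are made up for illustration.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement_rna(rna):
    """Complement each base and reverse the order, as on the opposite strand."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))

def can_pair(small_rna, mrna):
    """True if the small RNA is perfectly complementary to some stretch of the mRNA."""
    return reverse_complement_rna(small_rna) in mrna.upper()

mrna = "AUGGCUUACGGAUCCUAA"                     # made-up messenger RNA
antisense = reverse_complement_rna(mrna[3:12])  # read from the opposite strand of the same region
print(antisense)                                # UCCGUAAGC
print(can_pair(antisense, mrna))                # True
print(can_pair("AAAAAAAAA", mrna))              # False
```

Taking the reverse complement twice returns the original stretch, which is why an RNA read from the opposite strand of a region matches the mRNA made from that region.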

28
Q

What are argonaute proteins?

A

Argonaute proteins are the catalytic components of the RNA-induced silencing complex (RISC), the protein complex responsible for the gene silencing phenomenon known as RNA interference (RNAi). Argonaute proteins bind different classes of small non-coding RNAs, including microRNAs (miRNAs), small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs). Small RNAs guide Argonaute proteins to their specific targets through sequence complementarity, which typically leads to silencing of the target. Some of the Argonaute proteins have endonuclease activity directed against messenger RNA (mRNA) strands that display extensive complementarity to their bound small RNA, and this is known as Slicer activity.

29
Q

What are microRNA (miRNA)? What do they do?

A

MicroRNA (miRNA)
Small RNAs, about 22 nucleotides long. In the transcript they form a hairpin structure that is cleaved out, exported from the nucleus and cut in two. There are more than 1,000 types. They are often found in introns or UTRs, and occur in arrays.

30
Q

What are long regulatory ncRNA’s? What do they do?

A

They are often >1000 bp long and there are over 3000 types; they are also involved in the regulation of gene expression.
The HOX cluster, first discovered in Drosophila, contains 39 protein-coding genes but produces 231 different long ncRNAs. Some look like mRNA but act as anti-sense regulators, as they are transcribed in the opposite direction.
The function of the vast majority is unknown; they are usually present in the genome as individual copies.
Long dsRNA → cleaved by ‘dicer’ → siRNA (small interfering RNA) → these separate into single-stranded RNAs that bind to their reverse complement → they either guide argonaute complexes to cleave matching mRNA transcripts or guide argonaute complexes to methylate DNA near newly synthesised mRNA.

31
Q

What are Piwi protein-interacting RNAs (piRNAs)?

A

Piwi protein interacting RNA (piRNAs)
24-31 nucleotides long, with over 15,000 types, found in 89 clusters in the genome, each 10-75 Kb long.
They are often processed from long RNA precursors to form the mature piRNAs.

32
Q

What are small nucleolar RNA (snoRNA)?

A

Small nucleolar RNA (snoRNA)
60-300 nucleotides long and involved in the maturation of ribosomal RNA: they carry out site-specific methylation and change uridine into pseudouridine. They are usually found in introns of protein-coding genes; some are found in clusters.

33
Q

What are pseudogenes and gene fragments?

A

Pseudogenes and gene fragments
Partial gene sequences that do not appear to code for viable proteins. They are mutated copies of functional genes; “processed” pseudogenes lack introns and promoters. They sometimes have a role in regulating similar genes that do code for viable proteins, especially when they produce an anti-sense RNA or contain an inverted repeat.

34
Q

LINEs, SINEs and LTRs

A

LINEs, SINEs and LTRs
Together they make up over 40% of the human genome.
Retrotransposons are transcribed into RNA and then copied back into DNA: DNA → RNA → DNA
They are repeated elements, structured like RNA transcripts rather than like protein-coding genes.

35
Q

What are LINEs?

A

LINEs 20%
Long interspersed elements
We have about 0.5 million of them in the genome. Only ~80-100 in most human genomes can still “jump”; these encode two proteins:
• ORF1p: RNA-binding and acts as a nucleic acid chaperone
• ORF2p: reverse transcriptase and endonuclease
The LINE element is transcribed into RNA, which binds to ORF1p; the ORF2p protein then allows it to be copied back into DNA.
LINEs also promote the mobilisation of other parts of the genome, and can reshuffle parts of the genome. They also seem to be active in brain development
Nature 479: 534–537 (2011)
and are implicated in disease
Ann.Rev.Genom.Hum.Genet 12: 187-215 (2011)

36
Q

What are SINEs?

A

SINE 13%
Short interspersed elements, ~300 bp long
Most common: Alu elements; 11% of genome
They look like the 7SL RNA gene (part of an RNP involved in protein trafficking) after deletion of a central sequence. They may be involved in regulation of gene expression;
Nucleic Acids Research 34: 5491-5497 (2006)

37
Q

What are LTRs? What percentage do they make up?

A

LTRs – 8%
LTR elements are retrovirus-like elements, named for their flanking “long terminal repeats”; they include human endogenous retroviral sequences (HERVs).

Some are 6-11 Kbp long; these encode protease, reverse transcriptase, RNase H and integrase genes
Some are 1.5-3 Kbp long; Encode fewer or no genes
~240,000 copies in human genome

38
Q

What are DNA transposons? What percentage do they make up?

A

DNA transposons 3%
They transpose via a DNA intermediate through a “cut-and-paste” mechanism. They are not as common as retrotransposons, which use a “copy-and-paste” mechanism.
Large “autonomous” transposons, up to 40 Kbp long, encode transposase.
Small “non-autonomous” transposons are miniature inverted-repeat transposable elements; some encode miRNAs (PLoS ONE 2(2): e203).

39
Q

What are DNA Microsatellites? What percentage do they make up?

A
Microsatellites – 3%
Simple Sequence Repeats (SSRs)
ATATATATAT
GCGCGCGC
Length evolves rapidly due to slippage in replication – (often used in population genetics)
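
Simple sequence repeats like the two examples above can be found with a regular expression. This is a minimal sketch, not a standard tool: the function name `find_microsatellites`, the thresholds (a unit of 1-6 bases repeated at least 4 times) and the test sequence are all assumptions made for illustration.

```python
import re

def find_microsatellites(sequence, min_repeats=4, max_unit=6):
    """Return (start, unit, repeat_count) for each tandem repeat of a short unit."""
    # ([ACGT]{1,6}?) captures the shortest possible unit; \1{3,} requires
    # at least three further copies directly after it (four copies in total).
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

example = "CC" + "ATATATATAT" + "TTA" + "GCGCGCGC" + "AA"  # made-up sequence
print(find_microsatellites(example))  # [(2, 'AT', 5), (15, 'GC', 4)]
```

Comparing the repeat count at the same locus between individuals is what makes these markers useful in population genetics.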
40
Q

What are other intergenic sequences?

A

Other intergenic 16% - we don’t really know…
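
The percentages quoted across these cards can be tallied to check that they cover the whole genome. This is a minimal sketch; the category labels are paraphrased from the cards, and the conversion to bases assumes the rounded genome length of 3.2 billion.

```python
# Percentages as quoted on the individual cards.
genome_composition = {
    "Protein-coding genes": 1,
    "Gene-related sequences": 36,
    "LINEs": 20,
    "SINEs": 13,
    "LTRs": 8,
    "DNA transposons": 3,
    "Microsatellites": 3,
    "Other intergenic": 16,
}

GENOME_LENGTH = 3.2e9  # rounded haploid genome length in bases

total_percent = sum(genome_composition.values())
retrotransposon_percent = sum(genome_composition[k] for k in ("LINEs", "SINEs", "LTRs"))
approx_bases = {name: GENOME_LENGTH * pct / 100 for name, pct in genome_composition.items()}

print(total_percent)            # 100
print(retrotransposon_percent)  # 41
for name, bases in approx_bases.items():
    print(f"{name}: about {bases:.2e} bases")
```

The categories sum to 100%, and LINEs, SINEs and LTRs together give 41%, consistent with the "over 40%" stated for retrotransposons.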

41
Q

Mitochondrial genome;

How long is it?
What percentage is protein coding?
What percentage is RNA genes and regulatory sequences?
What percentage are other sequences?

A
Mitochondrial genome; 
•	16.6 Kb in length. Circular.
•	66% protein coding genes
•	32% RNA genes and regulatory sequences
•	2% other sequences
42
Q

What is the mitochondrial genome?

A

Mitochondrial genome;
It has 2 places where replication starts. Because it uses some different codons, it has to make its own tRNAs. It can be transcribed in both directions and is very information dense, coding for many RNAs and proteins. Some of its genes overlap and are translated in more than one reading frame; this is clever, as two different proteins can be coded for by the same stretch of DNA on a single strand. The MT-ATP6 and MT-ATP8 genes are translated in different reading frames from overlapping segments of the mitochondrial H strand.