Week 1.1: Introduction, the landscape of the human genome Flashcards
When was the Human Genome Project published?
The human genome was first published in 2001.
Who led the private and public human genome projects? Where were they published?
The private project, led by J. Craig Venter, was published in Science. The public project, led by Eric S. Lander and funded mainly by the US government, was published in Nature at the same time.
Human genomics timeline
1866
1866 Mendel publishes his laws of inheritance, based on experiments with pea plants. This is when we first began to understand how genetics works.
Human genomics timeline
1869
1869 Miescher isolates “nuclein” from cells. At the time, nobody knew that nuclein (DNA) was the link to the inheritance Mendel had described.
Human genomics timeline
1912
1912 Chromosomes are counted as 47 in males and 48 in females. At the time chromosomes were not well understood. Today we know these counts were wrong (the correct number is 46 in both sexes). The male count came out lower because the Y chromosome, which is smaller than the X chromosome, was too small to be seen with the microscopes of the day.
Human genomics timeline
1944-1952
1944-1952 DNA shown to be genetic material
Human genomics timeline
1953
1953 Crick and Watson describe the structure of DNA
Human genomics timeline
1961
1961 Nirenberg cracks genetic code
Human genomics timeline
1996
1996 Yeast genome sequenced
Human genomics timeline
2001
2001 Human genome sequence published
Human genomics timeline
2012
2012 ENCODE project published, telling us a lot more about the function of the human genome
What is the latest version of the human genome called? When was it released?
The latest version of the human genome is called GRCh38. It was released on 24 December 2013.
How many bases does the human genome have? What percentage of bases are unknown? What percentage of bases are unplaced?
The current best human genome is 3.2 billion bases long. About 4.98% of those bases are unknown, meaning we do not know whether they are A, G, C or T. To this day, we do not have a perfectly complete human genome. A further 0.14% of bases are unplaced, meaning we do not know where on the chromosomes they belong.
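As a rough check on what those percentages mean in absolute terms, the short sketch below converts them into base counts. It assumes the rounded genome length of 3.2 billion bases from the card, so the results are approximate.

```python
# Rounded haploid genome length from the card (an approximation).
GENOME_BASES = 3_200_000_000

# Integer arithmetic avoids floating-point noise:
# 4.98% = 498 / 10,000 and 0.14% = 14 / 10,000.
unknown_bases = GENOME_BASES * 498 // 10_000
unplaced_bases = GENOME_BASES * 14 // 10_000

print(f"unknown bases:  {unknown_bases:,}")   # 159,360,000
print(f"unplaced bases: {unplaced_bases:,}")  # 4,480,000
```

So roughly 160 million bases are of unknown identity and about 4.5 million have no assigned position.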
How many bases long is the human genome? What is that equivalent to in Bibles or Qu’rans?
The human genome is 3.2 billion letters long, the equivalent of 1,000 Bibles or 10,000 Qu’rans. We carry 2 copies in (almost) every cell in our bodies. We have 100 trillion cells, so we carry about 6.4 x 10^23 bases of DNA.
In computing terms, one byte can represent four base pairs (2 bits per base), so a haploid human genome can be coded in 0.8 GB of information.
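The "four bases per byte" idea can be sketched as a 2-bit encoding. The particular bit assignment below (A=00, C=01, G=10, T=11) and the function names `pack` and `unpack` are illustrative assumptions, not a standard file format; any fixed mapping of the four bases to two bits would work equally well.

```python
# Hypothetical 2-bit code for the four bases (an arbitrary but fixed choice).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"  # index = 2-bit value


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte.

    A final chunk shorter than four bases is padded with zero bits.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        byte <<= 2 * (4 - len(chunk))  # pad a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Recover the first n_bases bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


packed = pack("ACGTAC")
print(len(packed))        # 2 bytes for 6 bases
print(unpack(packed, 6))  # ACGTAC

# Storage for a haploid genome at four bases per byte.
haploid_bytes = 3_200_000_000 // 4
print(haploid_bytes / 10**9, "GB")  # 0.8 GB
```

The original length has to be stored alongside the packed bytes, because the padding in the last byte is otherwise indistinguishable from real "A" bases.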
How much information does the human body contain?
Thus, the human body contains approximately 161 billion terabytes of information in DNA
= 161 zettabytes
= 322 billion average laptop hard drives
= 32 trillion DVDs at 5 GB
= 26 years of internet traffic
As you walk around, more information is moving than all the information sent over the internet in 26 years!
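The arithmetic behind these equivalents can be reproduced with a few lines. This is a sketch using the rounded inputs from the cards (3.2 billion bases, 2 copies per cell, 100 trillion cells, 4 bases per byte); the 500 GB laptop drive size is an assumption chosen to match the comparison above. With these rounded inputs the total comes out at 160 rather than 161 billion terabytes, so the quoted figure presumably starts from a slightly less rounded genome length.

```python
GENOME_BASES = 3_200_000_000   # haploid genome length (rounded)
COPIES_PER_CELL = 2            # diploid cells carry two copies
CELLS = 100 * 10**12           # 100 trillion cells
BASES_PER_BYTE = 4             # 2 bits per base

total_bases = GENOME_BASES * COPIES_PER_CELL * CELLS
total_bytes = total_bases // BASES_PER_BYTE

TERABYTE = 10**12
ZETTABYTE = 10**21              # 1 zettabyte = 1 billion terabytes
LAPTOP_DRIVE = 500 * 10**9      # assumption: a 500 GB hard drive
DVD = 5 * 10**9                 # 5 GB per DVD, as on the card

print(f"total bases:      {total_bases:.1e}")                         # 6.4e+23
print(f"billion TB:       {total_bytes // TERABYTE // 10**9}")        # 160
print(f"zettabytes:       {total_bytes // ZETTABYTE}")                # 160
print(f"billion laptops:  {total_bytes // LAPTOP_DRIVE // 10**9}")    # 320
print(f"trillion DVDs:    {total_bytes // DVD // 10**12}")            # 32
```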
Is it cheaper to generate DNA sequence data than to store it? What could DNA be used as?
Generating big data by sequencing DNA is relatively cheap and easy, and we are reaching the point where generating the data is cheaper than storing it. Because DNA is an incredibly efficient way of storing and moving information, people are considering storing scientific results as DNA in a tube at -80 °C rather than on a hard drive.
Nick Goldman demonstrated this.
How many genes are there?
A sweepstake ran from 2000 to 2003 in which scientists were asked to guess how many protein-coding genes there are in the human genome.
What was the median bid?
The median bid was 61,302 genes; the lowest bid was 25,847 and the highest was 300,000.
The actual result was about 21,000 genes.
What percentage of the genome is made up of protein-coding genes?
Which parts of the DNA are protein coding?
Genes – 1%
Exons are the parts of the DNA that are protein coding.
DNA is transcribed into RNA, which is translated into protein. Three bases code for one amino acid. Genes code for structural proteins, enzymes and signalling proteins.
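The "three bases to one amino acid" rule can be illustrated with a small translation sketch. The table below is the standard genetic code written in the conventional U, C, A, G ordering, with `*` marking stop codons; the function name `translate` and the example sequences are made up for illustration, and real translation also depends on finding the correct start codon and reading frame.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon until a stop codon.

    Assumes the sequence starts in frame; incomplete trailing codons are ignored.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCUAA"))     # MA  (Met, Ala, then UAA stop)
print(translate("AUGUUUGGCUGA"))  # MFG (Met, Phe, Gly, then UGA stop)
```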
What percentage of the genome is gene-related?
Gene-related – 36%
Gene-related sequences are parts of the genome that have something to do with genes. Introns are interspersed between the exons. Untranslated regions (UTRs) are transcribed into RNA but not translated. Promoter regions govern the expression of genes. RNA genes (ncRNAs) are involved in the regulation of genes. Pseudogenes appear to be ‘dead’ genes. Gene fragments are left-over bits of genes.
What does a typical protein-coding gene consist of?
A promoter with a TATA box is followed by a series of exons with introns between them. Splice sites mark the boundaries between exons and introns (GT | intron | AG). A stop codon such as UAA lies at the end of the last exon. The gene is transcribed into pre-mRNA, which is then spliced so that the introns are removed. The mRNA is also capped at its 5’ end and given a poly-A tail at its 3’ end, which marks it for export to the cytoplasm. In the cytoplasm the mRNA is translated into a polypeptide (protein).
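The splicing step can be sketched on a toy sequence. This is a deliberately simplified illustration: the sequences, coordinates and the function name `splice` are invented, the exon positions are supplied by hand rather than predicted, and real splice-site recognition involves more than the GT...AG boundary rule.

```python
def splice(gene, exons):
    """Join exons after checking that each intron follows the GT...AG rule.

    gene  -- DNA sequence of the transcribed region
    exons -- list of (start, end) positions, 0-based and end-exclusive
    """
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = gene[prev_end:next_start]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(
                f"intron at {prev_end}-{next_start} lacks GT...AG boundaries"
            )
    return "".join(gene[start:end] for start, end in exons)


# Toy gene: two exons separated by one intron.
EXON1, INTRON, EXON2 = "ATGGCC", "GTAAGTCCAG", "GATTAA"
gene = EXON1 + INTRON + EXON2
exons = [(0, 6), (16, 22)]

mature = splice(gene, exons)
print(mature)                    # ATGGCCGATTAA
print(mature.replace("T", "U"))  # AUGGCCGAUUAA, ending in the UAA stop codon
```

The spliced sequence keeps the stop codon at the end of the last exon, matching the gene layout described on the card.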
What are RNA genes (ncRNAs)?
Give 4 examples.
RNA genes (ncRNAs) are something we did not know much about until recently. While the estimated number of protein-coding genes has fallen, the estimated number of RNA genes has increased. This is one of the biggest genomics surprises of the last 10 years. Some RNA genes occur in big clusters in the human genome, such as ribosomal RNA, tRNA, U1 snRNA and piRNA genes. Others are dispersed among the protein-coding regions or found in introns.
What is transfer RNA (tRNA)?
Transfer RNA is an example of an RNA we have known about for a while. tRNA was proposed as the adaptor linking mRNA to protein during translation. One end of the tRNA binds an amino acid and the other end binds the mRNA, which is how amino acids are added to the chain in the order defined by the mRNA.
tRNA genes are found in large clusters of up to 700 genes on multiple chromosomes. We need tRNAs all of the time, so it makes sense to have many genes coding for them.
What are ribozymes and what do they do?
Ribozymes are enzymes made of RNA. They are not translated into protein but function as RNA. Because RNA is a single-stranded nucleic acid, it can fold back on itself and form complex structures that let it interact with other nucleic acids. For this reason, ribozymes are often involved in cleaving other RNAs. They are often found as a single copy in the human genome.
What are ribonucleoproteins (RNPs) and what do they do?
Ribonucleoproteins (RNPs) are RNAs, such as ribozymes, that form complexes with proteins. They are involved in many functions in processing nucleic acids. Ribosomes (rRNA) are the best-known example of ribonucleoproteins. Others are the spliceosome (snRNPs) and telomerase, which has a role in governing the length of telomeres in chromosomes.
The RNA part of an RNP can bind to DNA, while the protein part carries out functions that the RNA cannot.