Lec 6 Flashcards by Dee Abeysuriya

Define genome

complete set of all genetic information

i.e. in humans:
- chromosomes 1-22
- mitochondrial DNA
- sex chromosomes (XX or XY)

Define transcriptome

All the mRNA molecules that can be expressed from the genes of an organism

What main methods do we use to study genomes?

Sequencing
Microarrays
Visualisations

What are some sequencing methods that can be used?

- Whole genome sequencing
- Exome sequencing
- RNA-seq
- ATAC-seq
- Targeted sequencing

What are some types of microarrays?

- SNP chips
- Expression arrays

What are some methods of Visualization?

FISH (fluorescence in situ hybridisation)
Northern and Southern blotting
qPCR

FISH stands for

fluorescence in situ hybridization

Out of the main methods of sequencing, microarrays and visualisation, which is the main method we use to study genomes now?

Sequencing

What do microarrays give us?

A microarray can be a SNP chip or an expression array.

A SNP chip tells you whether an individual has a particular SNP or deletion.

A microarray may have 100s or 1000s of probes that each ask a very specific question.

i.e. Does your genome have this specific change?
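
The "one probe, one specific question" idea can be sketched in a few lines. This is a toy model only: the probe names and sequences are made up, and real arrays detect hybridisation to complementary DNA rather than an exact text match.

```python
# Toy model of SNP-chip probes: each probe asks one yes/no question
# about a sample. Probe names and sequences are hypothetical.

def probe_hits(sample_dna: str, probe: str) -> bool:
    """Return True if the probe sequence occurs in the sample (exact match)."""
    return probe in sample_dna

# Two probes for the same made-up site, differing only at the SNP (G vs A).
probes = {
    "site1_ref": "ACGTGGATC",
    "site1_alt": "ACGTAGATC",
}

sample = "TTTACGTAGATCCCA"  # hypothetical sample carrying the alternative allele

calls = {name: probe_hits(sample, seq) for name, seq in probes.items()}
print(calls)
```

Only the probe matching the sample's allele gives a hit, which is the yes/no answer the chip reports for that site.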

What do expression arrays tell us?

An expression array tells us how much of a gene, e.g. DGAT, is expressed in different cell types or different organisms

If there is an unknown species, how do we describe it genetically and compare it to other species?

2 main questions:

- how do we describe it genetically?
- how would we compare it to other species?

How is its genome structured?
- how many chromosomes does it have?
- do the chromosomes have any strange or unusual structures?

What genes does it have?
- do any of these genes do interesting or unusual things?
- are the genes similar to those of other organisms?

What are the main challenges of genetically describing an unknown species?

- What size is the genome?
- How many chromosomes are there?
- What is the genome ploidy?
- How big are the repeats, and how many are there?
- How easy is it to extract the DNA?

Unknown species. Why is the genome size important?

Is the genome a few billion base pairs or tens of billions of base pairs?

This makes a huge difference in how much data needs to be generated in order to sequence the genome
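
A rough sketch of why size matters, assuming the data needed is simply genome size times average read depth. The 30x depth is an assumed illustrative value, not a figure from the lecture.

```python
def bases_to_sequence(genome_size_bp: int, depth: int) -> int:
    """Total bases of raw data needed to cover a genome `depth` times on average."""
    return genome_size_bp * depth

DEPTH = 30  # assumed average depth; not a figure from the notes

for label, size in [("3 billion bp", 3_000_000_000), ("30 billion bp", 30_000_000_000)]:
    gb = bases_to_sequence(size, DEPTH) / 1e9
    print(f"{label}: {gb:.0f} Gb of data at {DEPTH}x")
```

A genome ten times bigger needs ten times the data for the same depth.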

Unknown species. Why is the number of chromosomes important?

1? or 99?

99 chromosomes will be harder to sequence than 1

Unknown species. Why is the genome ploidy important?

Haploid species, e.g. microbes = very easy to work with as there is only 1 copy of the DNA

Diploid, e.g. humans = a bit harder to work with as we have 2 copies of the DNA which are similar

Tetraploid and higher = harder still, as there are 4 or more (e.g. 5 or 6) copies of the DNA that are basically identical except for a few changes

Unknown species. Why are the number and size of repeats important?

In most species, a large part of the genome is repetitive sequence.

E.g. in humans about 40% of the genome is repetitive.

These repetitive elements can come from simple repeats of DNA or from other things such as endogenous retroviruses (which copied themselves into host genomes millions of years ago, then replicated and expanded to take up more of the genome, and can modify things in the genome).

How big are the repeats? 10 bp or 1000s of bp? This influences how the genome is analysed.
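
A small sketch of why repeat size matters, under the assumption that a read can only be placed confidently if its sequence occurs once in the genome. The toy genome and repeat below are made up.

```python
from collections import Counter

def unique_window_fraction(genome: str, read_len: int) -> float:
    """Fraction of read-length windows whose sequence occurs only once in the genome."""
    windows = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(windows)
    unique = sum(1 for w in windows if counts[w] == 1)
    return unique / len(windows)

# Hypothetical toy genome: two copies of a 12 bp repeat separated by unique sequence.
repeat = "GATTACAGATTC"
genome = "ACGTTGCA" + repeat + "CCGTAAGT" + repeat + "TTGACGGA"

short = unique_window_fraction(genome, 5)    # reads shorter than the repeat
long_ = unique_window_fraction(genome, 15)   # reads longer than the repeat
print(f"5 bp reads: {short:.2f} unique, 15 bp reads: {long_:.2f} unique")
```

Reads shorter than the repeat can fall entirely inside it and match both copies; reads longer than the repeat reach into the unique sequence on either side.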

Unknown species. Why is it important to know how easily the DNA can be extracted?

Not all species have all their DNA in all their cells

e.g. mature human red blood cells have no nucleus or mitochondria, so they contain essentially no DNA.

Some species of fish have their full genome in their germ cells but their somatic cells contain a subset of their genome.

WGS stands for

Whole genome sequencing

the organism's entire genome is sequenced, not just its genes

What methods are used for WGS?

Short read methods - Illumina, IonTorrent

Long read methods - PacBio and Oxford Nanopore

Historic methods - Sanger sequencing

The choice depends on what is being sequenced: RNA or DNA.

WGS is generally thought of as a ___ sequencing method because?

WGS is generally thought of as a DNA sequencing method, because the platforms read DNA; RNA is usually converted to cDNA before it is sequenced.

Describe Illumina sequencing

Short read method of WGS

SBS (sequencing by synthesis) = highly accurate short read technology

short reads, 2 x 150 bp
generates huge amounts of data (up to 3000 Gb)
cheapest sequence (90gb

The human genome in its haploid state is about 3 gigabases (3 Gb)

Each chip of SBS sequencing technology can hold ____ human genomes

Each chip of SBS sequencing technology can hold 24 human genomes (clinical grade)
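
Taking the deck's own figures at face value (up to 3000 Gb per chip, 24 genomes per chip, 3 Gb haploid genome), the data per genome and the implied average depth can be worked out as below. Treating "depth" as data divided by haploid genome size is an assumption made for this sketch.

```python
CHIP_OUTPUT_GB = 3000      # "up to 3000 Gb", from the notes
GENOMES_PER_CHIP = 24      # clinical-grade human genomes per chip, from the notes
HAPLOID_GENOME_GB = 3      # haploid human genome, from the notes

data_per_genome_gb = CHIP_OUTPUT_GB / GENOMES_PER_CHIP    # Gb of reads per genome
implied_depth = data_per_genome_gb / HAPLOID_GENOME_GB    # average times each base is read

print(f"{data_per_genome_gb:.0f} Gb per genome, roughly {implied_depth:.0f}x depth")
```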

Describe how Illumina SBS sequencing works

Take the DNA and fragment it
Stick adapters on and flow the fragments across the flow cell
Each piece of DNA (the vertical line in the diagram) binds to a point on the flow cell; the DNA is loaded at such a level that the pieces end up quite widely separated over the flow cell

Make a second copy of the DNA, force it to bend over and attach at the other end, then split the double strand into 2 single strands

= 2 copies of the same piece of DNA quite close to each other

This is then repeated many times, which gives a cluster

e.g. a thousand copies of 1 piece of DNA, all located together on the slide in a very tight cluster

This is the prep step

Get rid of the dsDNA (leaving single strands) and add bases that each carry a big fluorescent marker, which stops any other base attaching after it.

The enzyme is added along with all four bases at once

Only the base that fits with the DNA (the complementary base) attaches.

Since the big marker is blocking, the next base cannot be incorporated

A laser is fired and a photo is taken with a microscope. Since there are a thousand copies, you now have a thousand molecules contributing to the same signal, because all the As = 1 colour, all the Ts = 1 colour, and so on

Therefore, this gives a bright spot of colour that corresponds to the base just added in that cluster

and you know that position on the slide is e.g. an A

easy to

Add an enzyme/acid that chops off the fluorophore (removing the block), then add the next set of bases

Repeat the process 150 times
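
The cycle described above (offer blocked bases, image the colour, cleave, repeat) can be sketched as a loop. This is a simplified model of one cluster: the colour assignment is hypothetical, and strand direction and chemistry details are ignored.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical colour assignment; real instruments use their own dye schemes.
COLOUR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}

def sequence_by_synthesis(template: str, cycles: int) -> tuple:
    """Read `cycles` bases off a template, one blocked base per cycle."""
    read, colours = [], []
    for position in range(min(cycles, len(template))):
        # 1. All four blocked, labelled bases are offered; only the one
        #    complementary to the template base is incorporated.
        base = COMPLEMENT[template[position]]
        # 2. Laser + photo: the cluster lights up in that base's colour.
        colours.append(COLOUR[base])
        read.append(base)
        # 3. Cleaving the fluorophore/blocker is modelled implicitly by
        #    moving on to the next position in the next cycle.
    return "".join(read), colours

read, colours = sequence_by_synthesis("TACGGA", cycles=4)
print(read, colours)
```

Each cycle yields exactly one base call per cluster, which is why the read length equals the number of cycles run.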

What is the error rate of SBS sequencing?

less than 0.5%

SBS sequencing has some issues with _____ regions

SBS sequencing has some issues with high A/T or G/C regions

The process is only repeated 150 times (150 base pairs) because, over time, the molecules in a cluster get out of sync. E.g. with 1 error in 1000 per cycle, after 150 cycles the errors start to accumulate in the clusters and the clusters get less accurate

150 bases are then read again, but from the other end (150 x 2) = you end up with 150 bases from each of the 2 ends of the same DNA molecule
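
A back-of-envelope sketch of the out-of-sync problem, assuming each molecule independently slips with a fixed probability per cycle. The "1 in 1000" rate is the illustrative figure from the card, not a measured value.

```python
def fraction_in_sync(per_cycle_slip: float, cycles: int) -> float:
    """Fraction of molecules in a cluster that have never slipped after `cycles` cycles."""
    return (1 - per_cycle_slip) ** cycles

SLIP = 1 / 1000  # "1 error in a 1000" per cycle, used as an illustrative rate

for n in (1, 50, 150, 500):
    print(n, round(fraction_in_sync(SLIP, n), 3))
```

Under this assumption about 86% of a cluster is still in step after 150 cycles, and the fraction keeps falling with every extra cycle, so the colour signal gets steadily less clean.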

Briefly describe IonTorrent

- short reads, 200-400 bp
- generates up to 10 Gb of data
- issues with homopolymers
- lower accuracy than Illumina

Describe the method that occurs in IonTorrent

A bead is used to capture 1 piece of DNA and replicate it, so that you end up with a bead carrying 100s/1000s of copies of the same piece of DNA. The beads are spread across a slide with little wells; each well can only fit 1 bead, so each well ends up holding one bead covered in copies of one DNA piece. Nucleotides are then flowed across the slide one type at a time. E.g. if 'A' is flowed across the entire slide, any bead whose DNA needs an 'A' next will grab the 'A' and release a hydrogen ion in the process.
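
The one-nucleotide-at-a-time flow can be sketched as follows. The flow order "TACG" and the example read are assumptions for illustration; the count per flow stands in for the number of hydrogen ions released per DNA copy.

```python
def flowgram(read: str, flow_order: str = "TACG", n_flows: int = 8) -> list:
    """For each nucleotide flow, count how many bases the bead's DNA incorporates."""
    pos, result = 0, []
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        incorporated = 0
        # A homopolymer run incorporates several bases in a single flow,
        # releasing one hydrogen ion per base added.
        while pos < len(read) and read[pos] == nucleotide:
            incorporated += 1
            pos += 1
        result.append((nucleotide, incorporated))
    return result

flows = flowgram("TTACCCG")  # hypothetical sequence of bases being added
print(flows)
```

Flows where nothing is incorporated give a count of zero, and runs of the same base (TT, CCC) show up as a single flow with a larger count.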

How do you measure the signal from the DNA copies in IonTorrent seq?

Since there are 1000s or 10,000s of copies of the DNA on a bead, 1000s or 10,000s of hydrogen ions are released when the bead's DNA incorporates a specific nucleotide. Each well of the slide contains a pH meter, so the change in pH around the bead, which indicates that a base has been added, produces a signal.

How do you estimate the intensity of the signal produced in IonTorrent seq?

For example, if 1 base is added you get, say, a thousand hydrogen ions released; if 4 bases are added one after another you get four thousand hydrogen ions released, which produces a stronger signal whose size you can try to estimate. The downside is that the signal is not linear: if 4x as many hydrogen ions are released, you do not get 4x as much signal but only a fraction of that, so this method has a high error rate. It is, however, reasonably priced and quick.
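
To see why a non-linear signal hurts, here is a sketch with a made-up sub-linear response curve. The power-law form and the exponent 0.8 are purely hypothetical; they only illustrate "4x the ions gives less than 4x the signal".

```python
def signal(n_bases: int, exponent: float = 0.8) -> float:
    """Hypothetical sub-linear response: signal grows more slowly than the ion count."""
    return n_bases ** exponent

for n in range(1, 6):
    step = signal(n + 1) - signal(n)
    print(f"{n} -> {n + 1} bases: signal rises by {step:.2f}")
```

In this toy model each extra base in a run adds less signal than the one before, so telling 4 bases from 5 is harder than telling 1 from 2, which fits the homopolymer issue noted for IonTorrent.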

What is PacBio seq?

Pacific Biosciences sequencing is a type of sequencing that is important for whole genome assembly or for trying to work out the genome of a new organism

Briefly describe PacBio

- more modern
- long reads, 1-20 kb
- moderate random error rate (very high consensus accuracy due to lack of systematic error)
- generates up to 20 Gb of data
- expensive
- errors are random, e.g. if you copy the same piece of DNA 2x you will get 2 different random errors, so if the DNA was copied a hundred times you could build up a very accurate copy
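
The "copy it a hundred times and build up an accurate copy" idea can be simulated with a majority vote. This sketch assumes substitution-only errors at a made-up 10% rate; real long-read errors also include insertions and deletions.

```python
import random
from collections import Counter

BASES = "ACGT"

def noisy_copy(truth: str, error_rate: float, rng: random.Random) -> str:
    """Copy `truth`, replacing each base with a different random base at `error_rate`."""
    out = []
    for base in truth:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def consensus(reads: list) -> str:
    """Majority vote at each position across equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

rng = random.Random(0)
truth = "".join(rng.choice(BASES) for _ in range(200))
reads = [noisy_copy(truth, error_rate=0.10, rng=rng) for _ in range(100)]

single_read_errors = sum(a != b for a, b in zip(reads[0], truth))
consensus_errors = sum(a != b for a, b in zip(consensus(reads), truth))
print(single_read_errors, consensus_errors)
```

Because each copy makes its mistakes in different places, the correct base wins the vote at every position even though any single read is noisy.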

Why are Illumina and other short-read technologies said to have a bias in their errors?

E.g. if they see CCCTC, the short-read tech would make an error on the Cs every time, so it doesn't matter how much DNA is put in: you'll still get this bias
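
A sketch of why extra DNA does not fix a biased error. The error rule here (the base after "CCC" is always miscalled as C) is invented for illustration and is not a description of any real platform.

```python
from collections import Counter

def biased_read(truth: str) -> str:
    """Hypothetical systematic error: the base after 'CCC' is always called as 'C'."""
    out = list(truth)
    for i in range(3, len(truth)):
        if truth[i - 3:i] == "CCC":
            out[i] = "C"
    return "".join(out)

truth = "GACCCTCAG"
reads = [biased_read(truth) for _ in range(100)]   # every read makes the same mistake
vote = "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))
print(truth, vote)
```

All hundred reads agree with each other and are all wrong at the same position, so a majority vote simply confirms the error.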

What kind of errors do long reads produce? Biased, random, etc.?

Random errors