Theme 2 Flashcards

1
Q

Define genome annotation

A

An overlay of biological information onto the genome sequence, used to predict and mark important features.

2
Q

What features does genome annotation look for

A

Protein coding genes (by location)

RNA features (by location and function)

Protein function (by similar sequences)

3
Q

Features of protein coding genes

A

Contained in an open reading frame (ORF)

Have an initiation (start) codon, usually ATG

Have a ribosome binding site upstream of the start codon
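As a rough illustration of the ORF and start-codon features, here is a minimal Python sketch that scans the forward strand for an ATG followed by the first in-frame stop codon (TAA, TAG or TGA). It is a toy under stated assumptions: it ignores the reverse strand, alternative start codons and ribosome binding sites, and the `find_orfs` name and `min_codons` cut-off are hypothetical. Real genefinders rely on statistical models of coding sequence rather than this simple rule.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) 0-based, end-exclusive coordinates of forward-strand ORFs.
    An ORF here runs from an ATG to the first in-frame stop codon (stop included)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG after this stop
    return sorted(orfs)

print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```

Real ORF cut-offs are far longer than 3 codons; the small default only keeps the example readable.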

4
Q

What technology do we use to find gene locations

A

Genefinders

e.g. GeneMarkS, GLIMMER and Prodigal

5
Q

How do we predict protein function from similar sequences

A

Compare the query sequence to databases of known sequences

Good similarity means similar amino acid (aa) content in a similar order (order matters)

<10% identical: similarity occurs by chance thus not related

10-35% identical: might have a related function

> 35% identical: probably have a related function
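To make the identity cut-offs concrete, the sketch below computes percent identity from two already-aligned sequences and maps it onto the three bands above. It assumes the alignment was produced elsewhere (e.g. by BLAST) and counts identity only over columns without gaps; tools differ in the denominator they use, so treat the numbers as illustrative. The function names and example fragments are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent of identical residues over aligned columns where neither sequence has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aln_a.upper(), aln_b.upper()) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)

def interpret_identity(identity: float) -> str:
    """Map percent identity onto the bands used on this card."""
    if identity < 10:
        return "similarity likely by chance: not related"
    if identity <= 35:
        return "might have a related function"
    return "probably has a related function"

identity = percent_identity("MKTAYIAK", "MKSAYLAK")  # hypothetical aligned protein fragments
print(identity, interpret_identity(identity))          # 75.0 probably has a related function
```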

6
Q

What is the most common tool for comparing sequences

A

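BLAST (Basic Local Alignment Search Tool)
BLASTp = compare a protein query to a protein database
BLASTn = compare a nucleotide query to a nucleotide database

The “expect” (E) value is the number of matches at least this good that you would expect to find by chance in a database of that size

  • Near 0 = good (the match is unlikely to be due to chance)
  • Above 0.1 = bad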
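BLAST+ can write its hits as a tab-separated table (`-outfmt 6`), whose default columns include percent identity and E-value. The sketch below parses that format and keeps hits with an E-value below 0.1, the cut-off given on this card. The two sample rows are made-up data for illustration only, and the helper names are hypothetical.

```python
# Default column order for BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def parse_hits(lines):
    """Turn tab-separated BLAST rows into dicts, converting the numeric columns used here."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits

def good_hits(hits, max_evalue=0.1):
    """Keep hits whose E-value is below the cut-off, best (smallest E) first."""
    return sorted((h for h in hits if h["evalue"] < max_evalue), key=lambda h: h["evalue"])

# Made-up example rows, for illustration only
SAMPLE = [
    "query1\tsubjA\t78.5\t200\t43\t0\t1\t200\t5\t204\t2e-50\t180\n",
    "query1\tsubjB\t22.2\t90\t70\t1\t10\t99\t3\t92\t0.5\t25.4\n",
]
for h in good_hits(parse_hits(SAMPLE)):
    print(h["sseqid"], h["pident"], h["evalue"])
```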
7
Q

Explain a genetic proof using virulence factors

A

Proving what a gene does

Compare 2 strains, one of which has a new virulence factor
- The virulent version of the gene is put into the avirulent strain. If the virulence factor is then observed, this proves that the specific gene (or gene change) is responsible for it.

8
Q

What is read depth (coverage)

A

The average number of times each base in the genome is covered by sequencing reads.

Depth = (no. of reads x length of each read in bases) / estimated genome size

30x to 100x is usually enough to avoid gaps
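The depth formula is simple arithmetic; the sketch below computes it and also rearranges it to estimate how many reads a target depth needs. The 5 Mb genome size and 150 bp reads in the usage lines are illustrative assumptions (roughly bacterial-scale with short reads), not values from these notes.

```python
import math

def read_depth(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Depth = (no. of reads x read length) / estimated genome size."""
    return n_reads * read_length_bp / genome_size_bp

def reads_needed(target_depth: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Rearranged formula: reads = depth x genome size / read length (rounded up)."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)

# Illustrative values: 1 million 150 bp reads over an assumed 5 Mb genome
print(read_depth(1_000_000, 150, 5_000_000))  # 30.0
print(reads_needed(100, 150, 5_000_000))      # 3333334 reads for 100x
```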

9
Q

Describe Sanger sequencing

A
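  • Dideoxy nucleotides (ddNTPs, with a radioactive or fluorescent marker)
  • Normal deoxy nucleotides (dNTPs)
  • Primer
  • ssDNA template
  • DNA polymerase

As the new strand is made, incorporation of a dideoxy nucleotide terminates it, because ddNTPs lack the 3′-OH group needed to add the next base. The terminated fragments are separated by size and the sequence is read using the marker.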
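The chain-termination idea can be simulated in a few lines, in the style of the classic four-reaction (one ddNTP per reaction) method. This is a sketch under simplifying assumptions: the template is written 3'→5' so the new strand is just its base-by-base complement, fragment lengths are counted from the end of the primer, and every possible termination product is assumed to appear. The function names are hypothetical.

```python
# Template written 3'->5', so the new strand (5'->3') is its base-by-base complement.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_lanes(template: str) -> dict[str, list[int]]:
    """Simulate four reactions, each spiked with one ddNTP.
    Returns, for each base, the lengths of fragments terminated by that ddNTP."""
    new_strand = "".join(COMPLEMENT[b] for b in template.upper())
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        # Across many molecules the ddNTP is incorporated at every position
        # where this base is needed, so a fragment of each such length appears.
        lanes[base].append(position)
    return lanes

def read_gel(lanes: dict[str, list[int]]) -> str:
    """Order all bands by fragment length (shortest first) and read the lane each came from."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)

lanes = sanger_lanes("TACGGA")
print(read_gel(lanes))  # ATGCCT
```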

10
Q

Describe Illumina sequencing technology and the 2 main machines

A

Sequencing by synthesis: similar to Sanger but uses reversibly “blocked” fluorescent nucleotides. A photo is taken each time a nucleotide is added; the nucleotide is then unblocked so the next one can join, and another photo is taken.

HiSeq X Ten: for human genomes; 3 days; many 150 bp reads

MiSeq: for other jobs; 56 hours; fewer reads than the HiSeq X Ten but longer (350 bp)

11
Q

What is FASTQ and a phred quality score

A

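FASTQ files store sequence reads (the fragments before mapping) together with a quality score for each base; FASTA files store sequences only, without quality scores (e.g. assembled or reference sequences).

Multi-FASTQ: a single file listing all the reads

Phred quality score: a measure of confidence in each base call, stored as one symbol per base; Q = -10 log10(P), where P is the probability that the base call is wrong (e.g. Q30 = 1 error in 1,000)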
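The sketch below shows how the quality symbols are turned back into numbers. It assumes the common Phred+33 encoding (quality = ASCII code minus 33) and the simple 4-line FASTQ layout with no line wrapping; the function names and the one-read example are hypothetical.

```python
import math

def parse_fastq(text: str) -> list[tuple[str, str, str]]:
    """Split FASTQ text into (read_id, sequence, quality) tuples (4-line records assumed)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    records = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record #{i // 4 + 1}")
        records.append((header[1:], seq, qual))
    return records

def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode each quality symbol: Q = ASCII code - offset (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]

def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to get the probability that the base call is wrong."""
    return 10 ** (-q / 10)

def phred_from_probability(p: float) -> float:
    """Q = -10 * log10(P)."""
    return -10 * math.log10(p)

EXAMPLE = "@read1\nACGT\n+\nII5!\n"
for read_id, seq, qual in parse_fastq(EXAMPLE):
    for base, q in zip(seq, phred_scores(qual)):
        print(read_id, base, q, error_probability(q))
```

Here `I` decodes to Q40 (1 error in 10,000), `5` to Q20 (1 in 100) and `!` to Q0.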

12
Q

Explain what a draft genome is

A

A genome sequence that has not been fully checked and annotated

Made up of contigs (unbroken consensus sequences built from overlapping reads)

A contig break occurs where reads do not overlap (>30x depth usually prevents this) or where repeats cannot be resolved
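The contig idea can be illustrated with a deliberately simplified sketch: overlapping reads are merged into one consensus stretch, and a missing overlap starts a new contig (a contig break). Real assemblers use overlap or de Bruijn graphs, handle sequencing errors and do not know the read order in advance; the function names, the 3-base minimum overlap and the reads below are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that matches a prefix of b, or 0 if shorter than min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def build_contigs(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Toy assembler that assumes reads are already in genome order.
    Each read extends the current contig if they overlap; otherwise a contig break occurs."""
    if not reads:
        return []
    contigs = [reads[0]]
    for read in reads[1:]:
        olen = overlap(contigs[-1], read, min_overlap)
        if olen:
            contigs[-1] += read[olen:]  # merge: append the non-overlapping tail
        else:
            contigs.append(read)        # no overlap: start a new contig
    return contigs

reads = ["ATGGCG", "GCGTAC", "TACCTT", "GGGAAA"]  # made-up reads
print(build_contigs(reads))  # ['ATGGCGTACCTT', 'GGGAAA']
```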

13
Q

How do you go from a draft to a closed genome

A

We combine the short-read draft genome from Illumina with long reads from another technology

The long reads are longer than the longest repeated elements, so they span each repeat and show where it is located in the genome.

This combines the read quality of short reads with the read length of long reads

14
Q

What are the technologies that can be used to make long reads

A

PacBio: Single molecule sequencing with fluorophores on nucleotides so that a fluorescent flash can be recorded when a base is added

Nanopore: Pulls ssDNA through a nanopore; the change in electrical current as bases pass the sensor is measured and used to identify them

PCR strategy: Design primers to amplify the gaps between contigs. Makes a ‘PCR amplicon’ which can be sequenced

15
Q

Outline the HGP

A

Aimed to determine the entire sequence of human DNA to identify all the genes.

1990-2003

Covered 99% of the (euchromatic) genome with an error rate of about 1 in 100,000 bases.

Did not cover telomeres or centromeres

Found that:

  • The human genome is about 3.2 billion bases (3,164.7 million) with 20,000-25,000 genes
  • Less than 2% encodes proteins
16
Q

Why are eukaryotic genomes more difficult for gene identification

A

Promoter sequences are not easily recognised and can be far from the start site

Most genes are interrupted by introns

17
Q

Explain CpG islands

A

The C in a CG dinucleotide (CpG) is frequently methylated; methylation of a promoter turns the gene off by altering chromatin structure.

Normal C, if deaminated, becomes U, which is recognised and corrected by repair.

Methylated C, if deaminated, becomes T, which is a normal DNA base and so is often not corrected.

Over evolutionary time this mutation has accumulated, so CpG is depleted across the genome.

Promoters of genes that are switched on are not methylated, so if deamination occurs there it is repaired.

Because of this, these promoters keep their CpGs and are usually CpG dense (CpG islands).
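CpG depletion is usually measured with the observed/expected CpG ratio, where expected = (C count x G count) / length. The sketch below computes it and applies the commonly cited Gardiner-Garden & Frommer style thresholds (length at least 200 bp, GC content above 50%, observed/expected above 0.6); those thresholds are an assumption brought in from outside these notes, and real tools scan sliding windows rather than one sequence. Function names are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CG sites cannot overlap, so str.count is exact here
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed island thresholds to a single sequence window."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp

print(looks_like_cpg_island("CG" * 100))  # True
print(looks_like_cpg_island("AT" * 100))  # False
```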

18
Q

What is junk DNA

A

Pseudogenes, mobile genetic elements, segmental duplications and small sequence repeats: sequences with no known function that may be evolutionary remnants or simply contribute to chromosomal bulk

19
Q

What are the 3 types of pseudogenes

A

Classical pseudogenes: arise by duplication of gene DNA. Contain introns.

Processed pseudogenes: processed mRNA integrating into DNA. Do not contain introns as the mRNA was already spliced.

Other pseudogenes may be transcribed and have biological roles, e.g. acting as microRNA decoys.

20
Q

What is a pseudogene

A

Pseudogenes are non-functional DNA segments that resemble genes. They became inactive when a mutation occurred.

21
Q

What is a mobile genetic element

A

DNA segments that can move or copy themselves to another position in the genome, which can alter gene expression or function.

Include transposons and retrotransposons

22
Q

What is a transposon

A

(cut and paste)

DNA segments that can move within or between chromosomes by encoding their own transposase

Only inactive transposon remnants are evident in the human genome.

23
Q

What is a retrotransposon

A

(copy and paste)

Move from one point in the genome to another via an RNA intermediate, which is reverse transcribed into ss cDNA; this is converted into dsDNA and inserted at the new site.

Can be retroviral-like or non-retroviral

24
Q

Explain retroviral like retrotransposons

A

Retroviral-like: endogenous retroviral elements (ERVs)

Derived from ancient retrovirus infections of the germ line

25
Q

Explain non retroviral retrotransposons

A

Long interspersed nuclear element, LINE: have a promoter and encode a protein with combined endonuclease and reverse transcriptase (RT) activity.

Short interspersed nuclear element, SINE: SINEs do not encode their own proteins; they move using enzymes produced by other mobile elements, e.g. LINEs.

26
Q

Retrotransposons in somatic tissue may:

A

Alter function

Contribute to disease

Change phenotype expressed

27
Q

What is ENCODE

A

Encyclopaedia of DNA elements

Aims to catalogue all of the functional elements in the human genome