Theme 2 Flashcards by Mercer Reid

Define genome annotation

An overlay of biological information onto the genome sequence to predict and mark important features.

What features does genome annotation look for

Protein coding genes (by location)

RNA features (by location and function)

Protein function (by similar sequences)

Features of protein coding genes

Contained in an ORF

Have an initiation codon, usually ATG

Have ribosome binding site
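
The first two features above can be sketched as a toy ORF scan. This is only an illustration, assuming forward-strand search, ATG as the only start codon and the standard stop codons TAA, TAG and TGA; the function name `find_orfs` and the `min_codons` parameter are hypothetical, and real genefinders use statistical models and also consider the ribosome binding site.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons

def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of ORFs on the forward strand.

    An ORF here runs from an ATG to the first in-frame stop codon
    (end index is exclusive and includes the stop codon).
    min_codons counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # remember the first ATG in this frame
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)

if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC"))
```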

What technology do we use to find gene locations

Genefinders

e.g. GeneMarkS, GLIMMER and Prodigal

How do we predict protein function from similar sequences

Compare the query sequence to a database

Good similarity means the same amino acids in the same order (similar amino acid content alone is not enough)

<10% identical: similarity is likely due to chance, so probably not related

10-35% identical: might have a related function

> 35% identical: probably have a related function
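
As a rough sketch of how these cut-offs could be applied, the snippet below computes percent identity for two sequences that are assumed to be already aligned (same length, `-` for gaps) and maps the result onto the three bands above. The function names are hypothetical, and ignoring gapped columns is a simplifying assumption; alignment tools may define identity differently.

```python
def percent_identity(a, b):
    """Percent of aligned (non-gap) positions where both residues match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns containing a gap in either sequence are skipped.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

def interpret(identity):
    """Map percent identity onto the bands from the card above."""
    if identity < 10:
        return "similarity likely due to chance"
    if identity <= 35:
        return "might have a related function"
    return "probably have a related function"

if __name__ == "__main__":
    pid = percent_identity("MKTAY", "MKSAY")
    print(pid, interpret(pid))
```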

What is the most common tool for comparing sequences

BLAST
blastp = compare protein query to protein database
blastn = compare nucleotide query to nucleotide database

The “expect” value (E-value) is the number of matches of similar quality expected to occur by chance

Near 0 = good
Above 0.1 = bad

Explain a genetic proof using virulence factors

Proving what a gene does

Compare 2 strains, one of which has a new virulence factor
- The virulent version of the gene is put into the avirulent strain. If the virulence factor can then be observed, that specific gene or gene change has been shown to be responsible for it.

What is read depth (coverage)

The average number of times each base of the genome is covered by reads.

Depth = (no. of reads x length of each read in bases)/estimated genome size

30x to 100x is enough to avoid gaps
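
A worked example of the depth formula, as a minimal sketch. The numbers (1,000,000 reads of 150 bp on a 5 Mb genome) are made up for illustration, and `reads_needed` is a hypothetical helper that simply rearranges the same formula.

```python
import math

def read_depth(num_reads, read_length, genome_size):
    """Depth = (number of reads x read length in bases) / estimated genome size."""
    return num_reads * read_length / genome_size

def reads_needed(target_depth, read_length, genome_size):
    """Rearranged formula: reads required to reach a target depth (rounded up)."""
    return math.ceil(target_depth * genome_size / read_length)

if __name__ == "__main__":
    # Illustrative values only: 1,000,000 reads of 150 bp, 5 Mb genome.
    print(read_depth(1_000_000, 150, 5_000_000))   # depth in x
    print(reads_needed(30, 150, 5_000_000))        # reads for 30x
```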

Describe Sanger sequencing

Dideoxy nucleotides (with radioactive marker)
Normal deoxy nucleotides
Primer
ssDNA template
DNA polymerase

As the new strand is made, an incorporated dideoxy nucleotide terminates it, and the terminating base can be read from its marker.

Describe Illumina sequencing technology and the 2 main machines

Same as Sanger but uses “blocked” nucleotides. A photo is taken when a nucleotide is added, then it is unblocked so the next can join, and another photo is taken.

HiSeqX10: for human genome, 3 days, many 150 bp reads

MiSeq: for other jobs, 56 hours, (less than above but) 350 bp reads

What is FASTQ and a phred quality score

A FASTQ file stores sequence reads together with a quality score for every base (the raw reads before mapping). A FASTA file stores sequences without quality scores.

Multi-FASTQ: list of all the reads

Phred quality score: a measure of the confidence of each base call, written as one symbol per base
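
A minimal sketch of how the quality symbols could be decoded, assuming the common Phred+33 encoding (symbol's ASCII code minus 33) and the standard relation Q = -10 log10(P), where P is the probability that the base call is wrong. The function names are hypothetical; older files may use a different offset.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    Assumption: Phred+33 encoding, i.e. score = ASCII code - 33.
    """
    return [ord(ch) - offset for ch in quality_line]

def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

if __name__ == "__main__":
    for symbol, q in zip("I5!", phred_scores("I5!")):
        print(symbol, q, error_probability(q))
```

A higher score means a more reliable base call: each step of 10 in the score corresponds to a tenfold drop in error probability.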

Explain what a draft genome is

A sequence that has not been fully checked and annotated

Made up of contigs (an unbroken consensus sequence)

A contig break is where there is no overlap (but >30x depth usually prevents this)

How do you go from a draft to a closed genome

We combine the short-read draft genome from Illumina with long reads from another technology

The long reads span more than the longest repeated elements so we can locate them in the genome.

Sum of read quality and read length

What are the technologies that can be used to make large reads

PacBio: Single molecule sequencing with fluorophores on nucleotides so that a fluorescent flash can be recorded when a base is added

Nanopore: Pulls ssDNA through a nanopore. An electrical signal is measured as each base passes the sensor

PCR strategy: Design primers to amplify the gaps between contigs. Makes a ‘PCR amplicon’ which can be sequenced

Outline the HGP

Aimed to determine the entire sequence of human DNA to identify all the genes.

1990-2003

Cover 99% of genome with error 1 in 100,000 bases.

Not telomeres, centromeres

Found that:

HG is about 3,164 million (3.16 billion) bases and 20,000-25,000 genes
Less than 2% encodes proteins

Why are eukaryotic genomes more difficult for gene identification

Promoter sequences not easily recognised and can be far from start site

Most genes are interrupted

Explain CpG islands

CG or CpG is frequently methylated in DNA which turns off genes by altering chromatin structure.

Normal C, if deaminated, becomes U, which is corrected by repair.

Methylated C, if deaminated, becomes T, which is not corrected.

Over time this mutation has accumulated, so there is less CpG in the genome than expected.

Promoters of genes that are switched on are not methylated, so if a deamination occurs there it is repaired and the CpG is kept.

Promoters are usually CpG dense because of this regulation ability.
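
One way to put a number on "CpG dense" is the observed/expected CpG ratio, sketched below. This ratio is not from the cards; it is a commonly used measure and is included here as an assumption, with expected CpG taken as (count of C x count of G) / sequence length. The function name `cpg_stats` is hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence.

    Assumption: expected CpG count = (number of C * number of G) / length.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CG dinucleotides read 5' to 3'
    gc_content = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return gc_content, ratio

if __name__ == "__main__":
    print(cpg_stats("CGCGCGCG"))  # CpG rich
    print(cpg_stats("GGCCAATT"))  # contains C and G but no CpG
```

A region depleted of CpG by deamination would give a ratio well below 1, while an unmethylated promoter region would be expected to give a higher ratio.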

What is junk DNA

Pseudogenes, mobile genetic elements, segmental duplications, small sequence repeats that may be remnants or contribute to chromosomal bulk

What are the 3 types of pseudogenes

Classical pseudogenes: arise when a gene's DNA is duplicated. Contain introns.

Processed pseudogenes: processed mRNA integrating into DNA. Do not contain introns as the mRNA was already spliced.

Other pseudogenes may be transcribed and have biological roles – microRNA decoy.

What is a pseudogene

Pseudogenes are non-functional DNA segments that resemble genes. They become inactive when a mutation occurs.

What is a mobile genetic element

DNA segments that can move or copy themselves to another position in the genome, which can alter expression/function.

Can be transposons and retrotransposons

What is a transposon

(cut and paste)

DNA segments which can move within/between chromosomes by encoding their own transposase

Only transposon remnants evident in human genome.

What is a retrotransposon

(copy and paste)

Move from one point to another in the genome via an RNA intermediate, which is reverse transcribed into ss cDNA, converted into dsDNA and inserted at a new site.

Can be retroviral like or non retroviral

Explain retroviral like retrotransposons

Retroviral-like: endogenous retroviral element, ERV

From retrovirus infection

Explain non retroviral retrotransposons

Long interspersed nuclear element, LINE: have a promoter and encode a protein with combined endonuclease and reverse transcriptase (RT) activity.

Short interspersed nuclear element, SINE: do not encode any proteins of their own; they move using enzymes produced by other mobile elements, e.g. LINEs.

Retrotransposons in somatic tissue may:

Alter function

Contribute to disease

Change phenotype expressed

What is ENCODE

Encyclopaedia of DNA Elements

Aims to catalogue all of the functional elements in the human genome

(27 cards)