Sequencing Flashcards

1
Q

Genome Sequencing Methodology

A

5-10 times as many anonymous participants as needed provided DNA samples
Samples taken at local sites; DNA extracted from blood
Sequence is a composite of the genomes of a fraction of the participants; nobody knows which ones

2
Q

BAC libraries

A

Bacterial Artificial Chromosomes
Sorted chromosomes from which DNA is isolated
Restriction Enzymes cut specific palindromic sequences
Restriction enzymes cut the isolated DNA into multiple fragments
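
A minimal sketch of the cutting step, assuming EcoRI as the example enzyme (recognition site GAATTC, cut after the first G). The function names and the input sequence are illustrative and not from the cards.

```python
def is_palindromic(site):
    """True if the site reads the same on both strands (equals its reverse complement)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return site == "".join(complement[base] for base in reversed(site))


def digest(dna, site="GAATTC", cut_offset=1):
    """Cut dna at every occurrence of site; cut_offset is the cut position inside the site."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # remainder after the last cut
    return fragments


fragments = digest("AAGAATTCGGGAATTCTT")
```

Joining the fragments back together gives the original sequence, which is a quick sanity check on any digest.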

3
Q

Creation of BAC libraries

A

DNA fragments are inserted into circular DNA and introduced into bacteria (BACs)
Single contiguous sequences are called contigs

4
Q

BAC clones

A

Dilute solution of bacteria can be cultured on agar plate and the colonies produced are clones
Single colony contains clones of DNA sequence
Clones then used for sequencing

5
Q

BAC automation

A

Automated, massively parallel creation of BACs
Copied DNA isolated and sequenced
Computational tools applied to obtain the physical map

6
Q

Production of physical map

A

Select clones for sequencing (overlapping)
Sequence to at least draft coverage
Merge data
Order and orient with mRNA, paired end reads and other data

7
Q

Genetic mapping

A

Produced using a physical map by assessing the location of the genes.
Genes on same chromosome are ‘linked’.
More recently, the position of genes is determined by the frequency with which recombination has occurred between them.
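
A small sketch of the recombination-frequency calculation, assuming the usual convention that 1% recombination corresponds to roughly 1 centimorgan over short distances. The offspring counts are made-up illustration values.

```python
def recombination_frequency(recombinants, total):
    """Fraction of offspring that show a recombinant combination of two genes."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinants / total


def map_distance_cm(recombinants, total):
    """Approximate genetic distance in centimorgans (1% recombination ~ 1 cM)."""
    return 100.0 * recombination_frequency(recombinants, total)


# Hypothetical cross: 12 recombinant offspring out of 100
frequency = recombination_frequency(12, 100)
distance = map_distance_cm(12, 100)
```

A lower frequency means the two genes sit closer together and are more tightly linked.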

8
Q

FISH mapping

A

Fluorescence in situ hybridization
Attach fluorescent labels to DNA sequences
Process chromosomes on glass so location of specific genes within the chromosome can be identified

9
Q

Sequencing developments

A

Can do 20kb with 99.5% accuracy
Can sequence mRNA directly
Only suitable for a single strand of DNA

10
Q

Current sequencing methods

A

PacBio HiFi - Mid length, Mid accuracy
Illumina - Low length, High accuracy
Oxford Nanopore - High length, Low accuracy

Not available during Human Genome Mapping Project

11
Q

PacBio HiFi

A

Polymerase enzyme, nano-sized hole
Single strand of DNA introduced
Fluorescent nucleotides emit light as they are ‘stitched’ into the complementary double strand
Colour of the light emission identifies each base, giving accurate sequencing

12
Q

Illumina Sequencing

A

Individual pieces of DNA attached to glass surface
Sequencing by synthesis
As each complementary nucleotide is attached, fluorescence is produced

13
Q

Oxford Nanopore

A

Double strand of DNA unzipped
Single strand inserted into protein nanopore
Flow of ions through the pore creates an electric current that depends on the base passing through
Current as a function of time provides sequence information

14
Q

Linkage distance

A

Distance in bp between genes on the same chromosome

Smaller linkage distance = more likely to be inherited together

15
Q

Make up of Human Genome

A

Only 2% is exons
26% is introns
The role of the remaining sequence (lots of repetitive sequences) has only recently begun to be understood

16
Q

Sequence reassembly - Reducing computational efforts

A

Sequencing a large array of overlapping short fragments (contigs) created from the BACs
Short sequences are called reads
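
A toy sketch of how overlapping reads can be merged back into one sequence. It assumes the reads are already in the correct order and error-free, which real assemblers cannot assume; the function names and reads are illustrative.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_reads(reads, min_len=3):
    """Merge ordered reads left to right, dropping the overlapping prefix of each new read."""
    contig = reads[0]
    for read in reads[1:]:
        k = overlap(contig, read, min_len)
        contig += read[k:]  # with k == 0 the read is simply appended
    return contig


contig = merge_reads(["ATGCGT", "CGTTAC", "TACGGA"])
```

Working within one BAC at a time keeps the number of read pairs to compare small, which is where the saving in computational effort comes from.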

17
Q

Gel electrophoresis

A

Comparing size of fragments/contigs
Fragments migrate in an applied electric field
Shortest move the fastest

18
Q

Digital Trees/Trie

A

Multiway tree often used for storing large sets of words
Trees with a possible branch for every letter of an alphabet
Words end with $
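
A minimal trie sketch using nested dictionaries, with `$` marking the end of a word as on the card. The class and method names are illustrative choices, not a standard API.

```python
class Trie:
    END = "$"  # end-of-word marker

    def __init__(self):
        self.root = {}

    def insert(self, word):
        """Walk down one branch per letter, creating nodes as needed."""
        node = self.root
        for letter in word:
            node = node.setdefault(letter, {})
        node[self.END] = {}

    def contains(self, word):
        """True only if the full word was inserted (a bare prefix does not count)."""
        node = self.root
        for letter in word:
            if letter not in node:
                return False
            node = node[letter]
        return self.END in node


trie = Trie()
trie.insert("cat")
trie.insert("car")
```

Here "cat" and "car" share the branch c -> a, so the common prefix is stored once.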

19
Q

Trie usage

A

Implementation of sets
Quicker insertion, deletion and find
Quicker than binary trees and hash tables
Spell checkers, completion algorithms, longest-prefix matching, hyphenation
Search finds longest match between words in set and query
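
The longest-match search can be sketched as below, assuming a dictionary-based trie with `$` as the end marker. `build_trie` and `longest_prefix_match` are hypothetical helper names.

```python
END = "$"


def build_trie(words):
    """Build a nested-dict trie; a node holding END marks a complete word."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node[END] = True
    return root


def longest_prefix_match(root, query):
    """Return the longest stored word that is a prefix of query, or None."""
    node = root
    best = None
    for i, letter in enumerate(query):
        if END in node:
            best = query[:i]  # a stored word ends here; remember it
        if letter not in node:
            return best
        node = node[letter]
    if END in node:
        best = query
    return best


words = build_trie(["a", "ab", "abcd"])
match = longest_prefix_match(words, "abcx")
```

The search takes one step per letter of the query, independent of how many words are stored.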

20
Q

Sequence analysis - Tries

A

Can store DNA/proteins
Finding next fitting section in DNA reconstruction
Useful for finding errors, only need to search a small sub-tree
DNA gives a 4-way tree (A, C, G, T), so the tree is deep but does not waste much memory per node
Searching for particular sequence motifs
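
A sketch of error-tolerant motif search in a 4-way DNA trie, assuming only substitution errors and stored sequences of the same length as the query. The helper names are illustrative. Branches are abandoned as soon as the mismatch budget is spent, which is why only a small sub-tree is visited.

```python
def build_dna_trie(sequences):
    """Nested-dict trie over the DNA alphabet; '$' marks the end of a stored sequence."""
    root = {}
    for seq in sequences:
        node = root
        for base in seq:
            node = node.setdefault(base, {})
        node["$"] = True
    return root


def search_with_mismatches(node, query, max_mismatches, prefix=""):
    """Yield stored sequences that differ from query by at most max_mismatches substitutions."""
    if not query:
        if "$" in node:
            yield prefix
        return
    for base, child in node.items():
        if base == "$":
            continue
        cost = 0 if base == query[0] else 1
        if cost <= max_mismatches:  # prune branches that exceed the budget
            yield from search_with_mismatches(
                child, query[1:], max_mismatches - cost, prefix + base
            )


dna_trie = build_dna_trie(["ACGT", "ACGA", "TTTT"])
hits = sorted(search_with_mismatches(dna_trie, "ACGT", 1))
```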

21
Q

Finding protein coding genes

A
Ab initio
Computer approaches
Finding common sequences (start and end of protein coding genes)
Promoter regions - protein binding
Start codons
Stop codons
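
A simplified ab initio sketch: scan the forward strand in all three reading frames for a start codon (ATG) followed in-frame by a stop codon (TAA, TAG or TGA). It ignores promoters, introns and the reverse strand, so it is an illustration of the idea and not a real gene finder; `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) slices of ATG...stop stretches on the forward strand."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # (i - start) // 3 counts the codons before the stop, ATG included
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


dna = "CCATGAAATGATT"
orfs = find_orfs(dna)
```

For the example sequence the only candidate is `dna[2:11]`, which reads ATG AAA TGA.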
22
Q

Regulatory Region

A

Promoter - TATA box - Start of 5’ UTR

23
Q

Transcription and Splicing

A

Removal of introns in transcribed regions

Results in mRNA

24
Q

Regulatory Region Function

A

RNA polymerase binds within this sequence to initiate the transcription of the DNA into RNA

25
Q

Promoter Sequence

A

First binds the RNA polymerase
Upstream / 5’ end of the transcription initiation site
100-1000 base pairs long
High occurrence of AA, AT, TA and TT dinucleotides (also A+T trinucleotides)

Over-representation of GC, GG, CG, AG, GA and TG downstream of the promoter

26
Q

TATA box

A

30% of human genes
Contains sequence TATAWAW
W = A or T
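
The TATAWAW pattern translates directly into a regular expression, with W written as the character class `[AT]`. A short sketch using Python's `re` module; the function name and test sequence are illustrative.

```python
import re

# W = A or T, so TATAWAW becomes TATA[AT]A[AT]
TATA_BOX = re.compile(r"TATA[AT]A[AT]")


def find_tata_boxes(dna):
    """Return the start index of every non-overlapping TATAWAW match."""
    return [match.start() for match in TATA_BOX.finditer(dna.upper())]


positions = find_tata_boxes("GGCTATAAAAGGC")
```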

27
Q

Benefits of sequencing the mRNA

A

Start codons, stop codons and exon sequences can be looked for in both the chromosomal DNA and the mRNA
Can find them with tries
Subsequent codons in mRNA are in groups of three for coding amino acids in sequence
Start codon unique
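
A short sketch of reading an mRNA in groups of three from the start codon. It assumes AUG as the start codon and UAA, UAG, UGA as stop codons (the standard genetic code); the function name and sequence are illustrative.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons_from_start(mrna):
    """Split mRNA into codons from the first AUG up to and including the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
    return codons


codons = codons_from_start("GGAUGGCCUAAUU")
```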

28
Q

Memory issues with tries/Time issues with tries

A

A regular trie can be used as a suffix tree, but would typically use far too much memory to be useful
Fix: store pointers into the original text instead of copying substrings
Can then build a suffix tree using O(n) memory, where n is the length of the text
There is also a linear time O(n) algorithm for suffix tree construction (non-trivial)
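
To see why a plain trie of all suffixes is too large, the sketch below counts its nodes. This is the naive structure, not a real suffix tree; a suffix tree avoids the blow-up by storing each edge as a (start, end) pair of indices into the text. The function name is illustrative.

```python
def suffix_trie_node_count(text):
    """Insert every suffix of text + '$' into a plain trie and count the nodes."""
    text += "$"
    root = {}
    count = 1  # the root
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            if ch not in node:
                node[ch] = {}
                count += 1
            node = node[ch]
    return count


small = suffix_trie_node_count("ab")
distinct = suffix_trie_node_count("abcd")
```

For a text of all-distinct characters the count grows as roughly n^2 / 2, against O(n) for the pointer-based suffix tree.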

29
Q

When to use suffix trees

A

Efficient when it is likely that you will need to do multiple searches
Exact word matching
Use with dynamic programming for inexact matching (match with smallest edit distance)
Bioinformatics, Advanced ML
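
The dynamic-programming part of inexact matching can be sketched as a standard edit-distance (Levenshtein) calculation. This shows only the distance computation and not its combination with a suffix tree.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


distance = edit_distance("ACGT", "AGT")
```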

30
Q

Suffix trees with genome sequences

A

Suffix trees are valuable given the number of repeats present in the genome sequences
With more unique reads in the genome, becomes less efficient

31
Q

Genome Homology

A

Human genomes are 99.9% homologous

32
Q

Variants - Removal of negative mutations

A

100s of new mutations in offspring for each generation
Most mutations neutral in phenotypical effect or removed by negative selection
Many mutations corrected by repair enzyme machinery of the cell

33
Q

Variants - Mutations causing an advantage

A

Occasionally a mutation gives offspring a survival or reproductive advantage (positive selection)

34
Q

Mutations occurring in the genome

A

Mutations don’t occur randomly.

Occur in particular regions in the genome known as hotspots

35
Q

Variant definition

A

Permanent change in the DNA sequence which makes up a gene

36
Q

Variant as opposed to gene mutation

A

Such changes do not always cause disease and can be present in non-coding regions

37
Q

Allele

A

Variation of a given gene at the same position (locus) on the chromosome
Can also be present in non-coding regions
Typically multiple alleles at locus between different individuals in population

38
Q

Polymorphism

A

Allelic variation determined as the number of alleles present

39
Q

Phenotypic traits

A

Derived from the transmission of genes and alleles to an organism’s offspring

40
Q

SNP

A

Single nucleotide polymorphism
Most common variation in human genomic DNA
Single nucleotide differs between members of the population/chromosome pairs
4-5 million in each person’s genome
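
A minimal sketch of spotting SNPs between two sequences, assuming they are already aligned and of equal length (no insertions or deletions). The sequences and function name are made up for illustration.

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_in_a, base_in_b) for every single-base difference."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


snps = find_snps("ACGTAC", "ACCTAG")
```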

41
Q

Other genomic polymorphisms

A

Deletions and insertions

42
Q

Chromosome synteny

A

Used to define genes which lie on the same chromosome

More recently, the term is used for the conservation of blocks of gene order between two compared chromosomes

43
Q

Repetitive Sequences

A

aka repetitive elements, repeating units, repeats

Make up approximately 50% of the human genome

44
Q

Dispersed repeats

A

Recognized as potential source of genetic variation and regulation

45
Q

Tandem repeat sequences (trinucleotide repeats)

A

Important in several human diseases
Expansion of repeats within an exon region causes protein misfolding when present in high numbers (>40 copies for Huntington’s disease)
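
A sketch of counting tandem repeats, assuming the CAG repeat unit associated with Huntington's disease and the >40 copy threshold from the card. The function name and sequences are illustrative.

```python
import re


def longest_repeat_run(dna, unit="CAG"):
    """Largest number of back-to-back copies of unit found anywhere in dna."""
    runs = re.findall(f"(?:{re.escape(unit)})+", dna)
    return max((len(run) // len(unit) for run in runs), default=0)


copies = longest_repeat_run("TT" + "CAG" * 5 + "A" + "CAG" * 2)
expanded = longest_repeat_run("CAG" * 41) > 40
```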

46
Q

CpG islands

A

Sequences containing repeats of CG closer to the 5’ end of the gene sequence (promoter)
At least 200bp long
C+G content > 50%
Observed/expected frequency >0.6
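
The three criteria can be checked with a few counts. The sketch assumes the commonly used observed/expected formula, (number of CpG x length) / (number of C x number of G); function names are illustrative.

```python
def cpg_stats(seq):
    """Return (C+G fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq):
    """Apply the card's thresholds: >= 200 bp, C+G > 50%, observed/expected > 0.6."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= 200 and gc_fraction > 0.5 and obs_exp > 0.6


stats = cpg_stats("CG" * 100)
```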

47
Q

Expected frequency of CpG islands

A

Human genome has 42% GC content
Expected frequency of a CpG = 0.21 ** 2 (about 4.4%)
Actual frequency is 1%
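
The arithmetic behind these numbers, assuming C and G are equally frequent (21% each) and that neighbouring bases are independent:

```python
gc_content = 0.42
p_c = p_g = gc_content / 2      # 0.21 each, assuming C and G are equally common

expected_cpg = p_c * p_g        # chance of a C followed by a G if bases were independent
observed_cpg = 0.01             # actual frequency from the card

ratio = observed_cpg / expected_cpg
print(f"expected {expected_cpg:.4f}, observed {observed_cpg:.4f}, ratio {ratio:.2f}")
```

So CpG occurs at under a quarter of the expected rate across the genome.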

48
Q

Location of alleles or genes in chromosomes

A

Defined by bands (historically created by Giemsa staining, G-banding)

49
Q

BRCA2

A

Breast/Prostate cancer
BRCA1 and BRCA2 are sequenced from blood samples
Can use suffix trees to detect which of the stable mutations are present
Short specific sequence motifs (mutations) within the flanking base pairs can be mined