Final Flashcards

Question

What is SOLiD?

Answer 1

Another type of NGS that also uses DNA fragments attached to beads. It uses fluorescently labelled probes with 2 binding nucleotides and a colour code based on each dinucleotide, each pair being assigned a colour. Sequencing is then repeated several times with primers slightly offset from each other to give full sequence coverage.
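
The dinucleotide colour code can be illustrated with a small sketch. It assumes the commonly described two-base encoding, in which each base gets a 2-bit code and the colour of a pair is the XOR of the two codes, so 16 dinucleotides share 4 colours. The numeric colour indices and the function names are illustrative assumptions, not part of the card.

```python
# Assumed 2-bit codes per base; the XOR of two codes gives a colour index 0-3.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def to_colour_space(seq):
    """Return one colour index per overlapping dinucleotide in seq."""
    seq = seq.upper()
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]

def from_colour_space(first_base, colours):
    """Decode colours back to bases, given the known first base."""
    code_base = {code: base for base, code in BASE_CODE.items()}
    bases = [first_base.upper()]
    for colour in colours:
        # XOR is its own inverse, so previous base + colour gives the next base.
        bases.append(code_base[BASE_CODE[bases[-1]] ^ colour])
    return "".join(bases)

colours = to_colour_space("ATGGC")
print(colours)                          # [3, 1, 0, 3]
print(from_colour_space("A", colours))  # ATGGC
```

Because every base is covered by two overlapping dinucleotides, a single wrong colour changes all downstream bases when decoded, which is one reason the offset primer rounds are useful for checking calls.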

Answer 2

Another NGS based on bead fragment attachment. The beads are placed in wells on a semiconductor chip. Every few seconds, a nucleotide solution floods the chip. If the nucleotide is incorporated, then the released H+ changes the chip voltage. The voltage change is then recorded as a nucleotide incorporation.
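
A toy simulation of the flow-by-flow idea, as a sketch only. It assumes one nucleotide type is flowed at a time in a fixed repeating order (the order "TACG" is an arbitrary choice here), and that the signal in each flow equals the number of bases incorporated. The `read` argument stands for the strand being synthesised; all names are hypothetical.

```python
def simulate_flows(read, flow_order="TACG", max_flows=100):
    """Return (nucleotide, count) per flow; count is the bases incorporated."""
    pos = 0
    signals = []
    for flow in range(max_flows):
        if pos >= len(read):
            break
        nuc = flow_order[flow % len(flow_order)]
        count = 0
        # A homopolymer run is incorporated within a single flow.
        while pos < len(read) and read[pos] == nuc:
            count += 1
            pos += 1
        signals.append((nuc, count))
    return signals

def call_bases(signals):
    """Rebuild the sequence from the per-flow incorporation counts."""
    return "".join(nuc * count for nuc, count in signals)

signals = simulate_flows("TTAGC")
print(signals)
print(call_bases(signals))  # TTAGC
```

A flow with count 0 corresponds to no voltage change; a count of 2 corresponds to a roughly doubled signal for a two-base homopolymer.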

Answer 3

Gigabytes of data of varying quality are produced, and these must be assessed to remove dubious data. Illumina typically outputs in FASTQ format.

Answer 4

A FASTQ file is composed of 4 lines per sequence: 1 - sequence identifier, 2 - raw sequence data, 3 - a (+) symbol, 4 - quality scores encoded as ASCII characters from ! to ~.
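
The four-line layout can be read with a few lines of code. This is a minimal sketch that assumes the Phred+33 encoding (the character ! means quality 0) and records that are exactly four lines each, with no wrapped sequence lines. The function name and the example record are made up for illustration.

```python
def parse_fastq(text, offset=33):
    """Yield (identifier, sequence, quality scores) for each 4-line record."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        # Each quality character maps to an integer score via its ASCII code.
        scores = [ord(ch) - offset for ch in qual]
        yield header[1:], seq, scores

example = "@read1\nACGT\n+\nII5!\n"
records = list(parse_fastq(example))
print(records)  # [('read1', 'ACGT', [40, 40, 20, 0])]
```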

Answer 5

Separate programs are used to remove low-quality reads. This removal prevents low-quality data from leading to incorrect reassembly of the genome and reduces the amount of data that needs processing.
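
One simple way such filtering could work is to drop reads whose mean quality falls below a cutoff. The sketch below assumes Phred+33 quality strings and a cutoff of 20; both are illustrative assumptions, and real trimming tools apply more elaborate rules.

```python
def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)

def filter_reads(reads, min_mean_q=20):
    """Keep (name, sequence, quality) reads whose mean quality meets the cutoff."""
    return [
        (name, seq, qual)
        for name, seq, qual in reads
        if qual and mean_quality(qual) >= min_mean_q
    ]

reads = [("good", "ACGT", "IIII"), ("bad", "ACGT", "!!#!")]
kept = filter_reads(reads)
print([name for name, _, _ in kept])  # ['good']
```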

Answer 6

Reference based | De Novo

Answer 7

Uses previously assembled genomes to provide the scaffolding to which sequence reads can be aligned. When enough reads are aligned to the reference, minor errors in the reads are filtered out.

Answer 8

Involves building a sequence from scratch when there is no known reference genome. This takes advantage of how DNA fragments can overlap to stitch reads into contigs (consensus regions of DNA). The contigs are collected into a possible scaffold and aligned to a similar genome.
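
The overlap idea can be sketched with a toy greedy assembler: repeatedly merge the pair of sequences with the longest suffix-prefix overlap. This is an illustration under strong assumptions (error-free reads, one strand, a minimum overlap of 3 chosen arbitrarily) and is not how production assemblers work.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
print(contigs)  # ['ATGGCGTACCTGA']
```

With short reads, repeats longer than the read length produce ambiguous overlaps, which is why assemblies often stop at several contigs rather than one sequence.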

Answer 9

High coverage, long reads, and good quality reads.

Answer 10

A measure of how many reads we have for a sequence. Ideally, sequencing will result in a high degree of coverage across the genome and a high depth of coverage for each base pair.
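
Both notions can be computed from read alignments. The sketch below assumes each aligned read is given as a 0-based, half-open (start, end) interval on the genome; the coordinates and genome length are made-up example values.

```python
def depth_per_base(genome_length, alignments):
    """Count how many aligned reads cover each genome position."""
    depth = [0] * genome_length
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    return depth

alignments = [(0, 5), (3, 8), (4, 10)]
depth = depth_per_base(12, alignments)

# Fraction of positions covered by at least one read (coverage across the genome).
breadth = sum(d > 0 for d in depth) / len(depth)
# Average number of reads per position (depth of coverage).
mean_depth = sum(depth) / len(depth)

print(depth)  # [1, 1, 1, 2, 3, 2, 2, 2, 1, 1, 0, 0]
print(breadth, mean_depth)
```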

Answer 11

Process of identifying the regions of your sequenced DNA that contain genes and coding regions. Many programs will annotate a genome, but most are organism specific.

Answer 12

PCR has issues with high GC content, resulting in poor coverage. De novo assembly has issues with short reads, resulting in contigs that are difficult to combine into a whole-genome assembly. Lots of man-hours. Resolution of gene location.

Answer 13

This is a series of sequencing techniques primarily characterized by the long reads they output. They use single-molecule sequencing, which reads individual pieces of DNA instead of amplifying short sequences, and real-time sequencing, which runs constantly.

Answer 14

A third-generation sequencing technique that uses single-molecule real-time sequencing with circular fragments, allowing multiple passes to be made over each segment. It has a high single-pass error rate (~13%).

Answer 15

TGS that uses ionic solutions separated by a membrane. DNA molecules pass through a pore in the membrane one base at a time, changing the current. These shifts are recorded and used to identify the sequence. This also has a high error rate (~15%) and requires high-quality DNA fragments.

Answer 16

To prevent the introduction and spread within Canada of plant pests of quarantine significance. To detect and control or eradicate designated plant pests in Canada. To certify plants and plant products for domestic and export trade.

Answer 17

Let heads be coded by 1 and tails by 0. Let h be the probability of heads and 1-h the probability of tails. Let x be the result of the toss. Then p(x=1) = h, and in general p(x|h) = h^x (1-h)^(1-x). In a set of tosses: SEE NOTES

Answer 18

The value that maximizes the likelihood function
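
For a coin with heads coded as 1 and tails as 0, this value can be found numerically. The sketch below assumes the tosses are independent, so the log-likelihood is a sum of per-toss terms; that is the textbook Bernoulli setup and may differ in notation from any course notes. The toss data are invented.

```python
import math

def log_likelihood(h, tosses):
    """Log-likelihood of heads-probability h for independent 0/1 tosses."""
    heads = sum(tosses)
    tails = len(tosses) - heads
    return heads * math.log(h) + tails * math.log(1 - h)

def mle_heads(tosses):
    """Closed-form maximizer for the Bernoulli model: the fraction of heads."""
    return sum(tosses) / len(tosses)

tosses = [1, 0, 1, 1, 0, 1, 1, 0]
h_hat = mle_heads(tosses)

# Cross-check: a grid search over h should peak at the same value.
grid = [i / 1000 for i in range(1, 1000)]
h_grid = max(grid, key=lambda h: log_likelihood(h, tosses))

print(h_hat, h_grid)  # 0.625 0.625
```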

Answer 19

(1925-1983) She pioneered bioinformatics with the creation of the Atlas of Protein Sequence and Structure in 1965. It was the beginning of collecting biological data into a single place.

Answer 20

(1924-1994) He was a population geneticist who studied amino acid sequences and the underlying nucleotide sequences. He proposed the neutral theory of molecular evolution while collecting and analyzing data.

Answer 21

Certain predictions can be made about population variation, and much more variation was observed than expected. This was attributed to multiple variants having the same fitness, which allows them to coexist. Many of these variants were neutral (neither good nor bad).

Answer 22

1969 - present | Invented Linux in 1991, which became the basis of much statistical computing

Answer 23

It is a method of estimating the parameters of a statistical model so that the observed data is most probable. In protein design, it is used to infer values given the parameters.

Answer 24

The same number of observed interactions between pairs as seen in a real database.

Answer 25

Assumes likelihood is maximized by gradient descent and is numerically estimated by thermodynamic integration.

Answer 26

Aimed to formulate the protein design problem using model-based statistical inference. They used maximum likelihood principles to estimate the unknown parameters of a statistical potential, called the inverse potential. The method was based on Markov chain Monte Carlo and applied to a simple pairwise contact potential.

Answer 27

Guaranteed an optimal predictive power of the resulting potential and was very general, able to be applied to any form of statistical potential.

(51 cards)