Bioinformatics Flashcards

Question

What is a dotplot?

Answer 1

a graphical method that allows the comparison of two biological sequences and identification of regions of close similarity between them
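
A minimal sketch of the idea in Python, assuming exact matching over a sliding window. The function names `dotplot` and `render` and the `window` parameter are illustrative, not from any particular tool. Runs of matches along a diagonal mark regions where the two sequences are similar.

```python
def dotplot(seq_a, seq_b, window=1):
    """Return a grid of booleans: cell [i][j] is True when the window
    starting at seq_a[i] equals the window starting at seq_b[j]."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [
        [seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(grid):
    """Draw the grid as text, one row per position in the first sequence."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in grid
    )


# Example sequences chosen for illustration only.
grid = dotplot("GATTACA", "GATCACA")
print(render(grid))
```

A larger `window` suppresses isolated chance matches, at the cost of missing short similar regions.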

Answer 2

Similarity matrices are used to align sequences of nucleic acids or amino acids

Answer 3

1 for a match; 0 for a mismatch

Answer 4

a more complicated matrix would give a higher score to transitions (pyrimidine to pyrimidine or purine to purine) than to transversions (pyrimidine to purine or vice versa); the match/mismatch ratio of the matrix sets the target evolutionary distance
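
The two nucleotide scoring schemes can be sketched as follows. The identity scheme uses the 1/0 values from the simple matrix; the numeric values in `ti_tv_score` (2, 1, 0) are assumptions chosen only to show transitions scoring above transversions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def identity_score(a, b):
    """Simplest scheme: 1 for a match, 0 for a mismatch."""
    return 1 if a == b else 0


def ti_tv_score(a, b, match=2, transition=1, transversion=0):
    """Score a pair of nucleotides, favouring transitions over transversions.

    The default values are illustrative assumptions, not a published matrix.
    """
    if a == b:
        return match
    # A transition stays within one chemical class (purine or pyrimidine).
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


print(ti_tv_score("A", "G"), ti_tv_score("A", "C"))
```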

Answer 5

an algorithm used to align protein or nucleotide sequences; one of the first applications of dynamic programming to compare biological sequences
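
As a rough illustration of dynamic programming applied to alignment, the sketch below computes a global alignment score in the Needleman-Wunsch style. It assumes a linear gap penalty and simple match/mismatch scores; the default values are arbitrary, and the traceback that recovers the alignment itself is left out.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the dynamic programming table and return the best global score."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATCACA"))
```

Each cell depends only on its three neighbours, so the table is filled in time proportional to the product of the two sequence lengths.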

Answer 6

by Dayhoff in 1978

Answer 7

by Henikoff and Henikoff in 1992, using BLOSUM matrices

Answer 8

introduced by Needleman and Wunsch (global) in 1970 and formalised by Smith and Waterman (local) in 1981

Answer 9

GenBank, EMBL, DDBJ, UniProtKB/SwissProt, PIR, PDB, ENZYME

Answer 10

a protein identified from genome data is hypothetical until verified by experiment

Answer 11

structural data

Answer 12

enzyme classifications (EC numbers)

Answer 13

PROSITE, PRINTS, BLOCKS, INTERPRO

Answer 14

A complete and precise set of steps that will solve a problem and achieve an identical result whenever given the same set of data to a defined level of accuracy.

Answer 15

PROSITE is a protein database consisting of entries describing the protein families, domains and functional sites, as well as amino acid patterns and profiles in them.

Answer 16

[ST]-x-[RK]

Answer 17

N-{P}-[ST]-{P}

Answer 18

[FY]-C-[RH]-[NS]-x(7,8)-[WY]-C
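
PROSITE-style patterns such as the ones on these cards can be read as regular expressions. The sketch below is a hypothetical helper, `prosite_to_regex`, that handles only the subset of the syntax seen here: `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, and `(n)` or `(n,m)` for repeats. Anchors and other parts of the full syntax are not covered.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        # Split off an optional repeat count such as (7,8) or (3).
        repeat = ""
        if element.endswith(")"):
            element, _, count = element[:-1].partition("(")
            repeat = "{" + count + "}"
        if element == "x":
            parts.append("." + repeat)                         # any residue
        elif element.startswith("{"):
            parts.append("[^" + element[1:-1] + "]" + repeat)  # excluded set
        else:
            parts.append(element + repeat)                     # literal or [set]
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex)
# The test sequence below is made up for illustration.
print(bool(re.search(regex, "AANGSAA")))
```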

Answer 19

a graphical method that allows the comparison of two biological sequences and identification of regions of close similarity between them

Answer 20

1 for a match; 0 for a mismatch

Answer 21

a subfield in the general field of genome analysis, which includes anything that can be done with genome sequences by computational means

Answer 22

- a coding region may be missed
- an incomplete protein may be reported
- splicing may be predicted incorrectly
- coding regions may overlap
- exon assembly (splicing) may be different in different tissues
- some apparent coding sequences may be defective or not expressed

Answer 23

introduced by Needleman and Wunsch (global) in 1970 and formalised by Smith and Waterman (local) in 1981

Answer 24

approximate fast methods

Answer 25

- index the database by finding locations of short 'words'
- take 'words' from the probe sequence and look them up in the index
- look for multiple matches and extend to find likely hits to full alignment
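
The first two steps can be sketched in Python as below. The function names, the word size of 3, and the toy database are assumptions for illustration; the extension of seed matches into full alignments is not shown.

```python
from collections import defaultdict


def build_word_index(database, word_size=3):
    """Map every word of length word_size to the (name, position) pairs
    where it occurs in the database sequences."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - word_size + 1):
            index[seq[pos:pos + word_size]].append((name, pos))
    return index


def seed_hits(probe, index, word_size=3):
    """Look up each word of the probe and collect, per database sequence,
    the (probe position, database position) pairs that match."""
    hits = defaultdict(list)
    for q_pos in range(len(probe) - word_size + 1):
        word = probe[q_pos:q_pos + word_size]
        for name, db_pos in index.get(word, []):
            hits[name].append((q_pos, db_pos))
    return hits


# Toy data, made up for this example.
database = {"seq1": "ACGTACGT", "seq2": "TTTTGGGG"}
index = build_word_index(database)
hits = seed_hits("CGTA", index)
print(dict(hits))
```

Sequences with several seed matches on the same diagonal (equal difference between probe and database positions) are the natural candidates for the extension step.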

Answer 26

by the Sanger method: dideoxy chain termination

Answer 27

each segment

Answer 28

approximate fast methods

Answer 29

- index the database by finding locations of short 'words'
- take 'words' from the probe sequence and look them up in the index
- look for multiple matches and extend to find likely hits to full alignment

Answer 30

by the Sanger method: dideoxy chain termination

Answer 31

each segment

Answer 32

aligning and merging fragments from a longer DNA sequence in order to reconstruct the original sequence
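
A toy version of this idea is a greedy merge of the pair of fragments with the longest exact suffix-prefix overlap. This is a sketch under strong assumptions: error-free reads, a single strand, and no repeats. The names `overlap` and `greedy_assemble` and the `min_len` threshold are illustrative, and real assemblers are far more involved.

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of left that is a prefix of right,
    or 0 if no overlap of at least min_len exists."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no remaining overlaps; leave the fragments unmerged
        merged = frags[i] + frags[j][size:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags


# Fragments made up for illustration.
print(greedy_assemble(["ATTAGACC", "GACCTGCC", "TGCCGGAA"]))
```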

Answer 33

A complete and precise set of steps that will solve a problem and achieve an identical result whenever given the same set of data to a defined level of accuracy.

Answer 34

1. detect similarity with known coding regions
2. ab initio methods; make predictions based on typical features

Answer 35

expressed sequence tags; short subsequences of a cDNA sequence, used to identify gene transcripts and instrumental in gene discovery and in gene-sequence determination

Answer 36

```
initial 5' exon (transcription start point with upstream promoter; ends immediately before a GT splice signal)
internal exons (begin after an AG splice signal; end before a GT splice signal)
final 3' exon (begins after an AG splice signal; ends with stop codon and poly-A tail)
```

Answer 37

machine learning methods; a general class of computer software which learns from examples and is then able to make predictions

Answer 38

- artificial neural networks
- support vector machines
- decision trees
- naive Bayesian classifiers

Answer 39

- a family of models inspired by biological neural networks
- used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown

Answer 40

- a coding region may be missed
- an incomplete protein may be reported
- splicing may be predicted incorrectly
- coding regions may overlap
- exon assembly (splicing) may be different in different tissues
- some apparent coding sequences may be defective or not expressed

Answer 41

the quality of raw data is only as good as the methods that produce it; the quality of annotations is only as good as the curators

Answer 42

the quality of raw data is only as good as the methods that produce it; the quality of annotations is only as good as the curators