Bioinformatics. Flashcards

Question

What are 6 factors about RNA that bioinformatics can help study?

Answer 1

The different products of RNA splicing. The expression of different RNAs in different tissues. The structure of different RNAs. The types of RNA that are produced by a single gene. The RNAs that are produced by thousands of genes at the same time. The creation of specific DNA chips and microarrays for RNA analysis.

Answer 2

To know all of the products that can be produced by a single gene. To know which RNAs are produced in response to an external stimulus. To build genetic probes and microarrays.

Answer 3

The identification of protein families. The identification of various protein domains and regions. The identification of various protein structures. The identification of various protein functions.

Answer 4

The results from electrophoresis and mass spectrometry are fed into a database for protein identification.

Answer 5

Databases that store information relating to nucleic acids. Databases that store information relating to proteins.

Answer 6

The NIH genetic sequence database, which is part of the International Nucleotide Sequence Database Collaboration (INSDC).

Answer 7

The DNA Data Bank of Japan (DDBJ). The European Molecular Biology Laboratory (EMBL). The GenBank at NCBI in the USA.

Answer 8

To identify a particular nucleotide sequence by searching through millions of different nucleotide sequences.

Answer 9

In a format that displays information about the nucleic acid, such as its function.

Answer 10

A database that combines all of the information from major international databases.

Answer 11

European Bioinformatics Institute (EBI). Protein Information Resource (PIR). Georgetown University Medical Centre (GUMC). National Biomedical Research Foundation (NBRF). The Swiss Institute of Bioinformatics (SIB).

Answer 12

A tool from the SIB that provides access to databases and software tools that cover all areas of life sciences.

Answer 13

The Expert Protein Analysis System.

Answer 14

A particular sequence that is chosen by the experimenter and inserted into a BLAST search.

Answer 15

Of amino acids or nucleotides.

Answer 16

When a researcher wants to discover more information that relates to their query sequence.

Answer 17

At least 15 nucleotides or amino acids.

Answer 18

They insert the query sequence into a database. This allows them to compare their sequence to all of the known sequences within the database.

Answer 19

The FASTA format. The identifier format.

Answer 20

Only the nucleotide or amino acid sequence.

Answer 21

It contains the sequence with an accession number or gene ID information.

Answer 22

The presentation of 2 sequences that can be compared to show the regions of similarity.
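A minimal sketch of how such a presentation can be drawn, assuming the two sequences have already been aligned to the same length with "-" for gaps. The function name `format_alignment` is illustrative and does not come from any BLAST tool.

```python
def format_alignment(seq_a: str, seq_b: str) -> str:
    """Return a three-line view of two pre-aligned, equal-length sequences.

    The middle line marks identical positions with '|'. Mismatches and
    gap positions (written as '-') are left blank.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    marks = "".join(
        "|" if a == b and a != "-" else " "
        for a, b in zip(seq_a, seq_b)
    )
    return "\n".join([seq_a, marks, seq_b])


# Example input: one mismatch and one gap between the two rows.
print(format_alignment("GATTACA-GT", "GACTACATGT"))
```

The runs of "|" in the middle line are the regions of similarity.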

Answer 23

When a researcher compares their query sequence with a known sequence.

Answer 24

A term that is used to measure the quality of alignment between a query sequence and the search results.

Answer 25

On the number of nucleotide or amino acid matches between the query sequence and the search results.

Answer 26

The higher the score the better the alignment.
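As a rough illustration of a match-based score, the sketch below counts identical positions between a query and each hit and picks the highest. It assumes the sequences are already aligned to equal length. Real BLAST scores also use substitution matrices and gap penalties, which are left out here, and the names `match_score` and `best_hit` are hypothetical.

```python
def match_score(aligned_query: str, aligned_hit: str) -> int:
    """Count identical, non-gap positions in two pre-aligned sequences."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have the same length")
    return sum(
        1
        for q, h in zip(aligned_query, aligned_hit)
        if q == h and q != "-"
    )


def best_hit(query: str, hits: dict) -> str:
    """Return the name of the hit with the highest match score."""
    return max(hits, key=lambda name: match_score(query, hits[name]))


# Made-up example hits, each already aligned to the query.
hits = {"hit_a": "GACTACA", "hit_b": "GATTACA", "hit_c": "CATTAGA"}
print(best_hit("GATTACA", hits))
```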

Answer 27

For the selection of the best match between the query sequence and the search results.

Answer 28

The expectation value. It estimates the number of alignments with a score at least as good that would be expected to occur by chance.

Answer 29

The significance of an alignment. Many alignments mean that there are many possible outcomes. A single alignment means there is only one possible outcome.

Answer 30

The lower the E-Value the better the match.
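The link between score and E-value can be sketched with the Karlin-Altschul relation E = K * m * n * exp(-lambda * S), where m is the query length, n the database length and S the raw score. The values of K and lambda depend on the scoring system. The defaults below are illustrative placeholders, not values taken from these cards.

```python
import math


def expect_value(raw_score: float, query_len: int, db_len: int,
                 k: float = 0.134, lam: float = 0.318) -> float:
    """Estimate the E-value for a raw alignment score.

    k and lam are assumed example parameters. Real searches take them
    from the scoring matrix and gap penalties in use.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# A higher score gives a lower E-value for the same search space.
for score in (40, 60, 80):
    print(score, expect_value(score, query_len=250, db_len=1_000_000))
```

Because the search space m * n appears in the formula, the same score is less significant in a larger database.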

Answer 31

The process of obtaining biological information from unprocessed sequence data.

Answer 32

To create a labelled genome, where biological information is linked to a particular genetic sequence.

Answer 33

Exactly what each gene does.

Answer 34

Structural and functional annotations.

Answer 35

Genomic elements such as the promoter sequence or the TATA box.

Answer 36

Researching the structure of genes. Researching the areas of the genome that code for certain products. Researching the location of regulatory motifs such as the TATA box.

Answer 37

The identification of the cis factors that are used in transcription and translation.

Answer 38

To identify the biological functions of genomic products such as proteins.

Answer 39

The BLAST program which allows us to identify similarities between genes and proteins.

Answer 40

To identify stretches within long DNA sequences that code for amino acids and contain no internal STOP-codons, as these are likely to be genes.

Answer 41

The DNA reading frame which allows for the strand to be read in triplets or codons.

Answer 42

The knowledge of the amino acid products that are provided by the DNA strand.

Answer 43

6 possible reading frames for a DNA molecule: 3 on each of its 2 strands.
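The six frames can be listed with a short sketch: three offsets on the given strand and three on its reverse complement. The labels "+1" to "-3" and the function names are illustrative choices, and only the bases A, C, G and T are assumed.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def six_frames(dna: str) -> dict:
    """Split a DNA sequence into codons for all six reading frames."""
    forward = dna.upper()
    frames = {}
    for sign, strand in (("+", forward), ("-", reverse_complement(forward))):
        for offset in range(3):
            # Only step through positions that start a complete triplet.
            codons = [
                strand[i:i + 3]
                for i in range(offset, len(strand) - 2, 3)
            ]
            frames[f"{sign}{offset + 1}"] = codons
    return frames


for label, codons in six_frames("ATGGCCTAA").items():
    print(label, codons)
```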

Answer 44

We can identify the protein that is created by a gene by using the list of amino acids that are created.

Answer 45

The Open Reading Frame Finder (ORF Finder) or GENSCAN.

Answer 46

The fact that most of a genome is made from non-coding DNA. This means gene prediction software must identify coding DNA from non-coding DNA.

Answer 47

The open reading frame. A start codon. A stop codon. A terminator sequence (prokaryotes). A TATA box (eukaryotes). A Shine-Dalgarno sequence (prokaryotes). A Kozak sequence (eukaryotes). A poly-A addition sequence (eukaryotes). Intron and exon boundaries. CpG islands.

Answer 48

The bacterial genome.

Answer 49

Because they contain introns, which are spliced out of the transcript, and exons, which are joined together to form the mRNAs that code for proteins.

Answer 50

It should begin with a START-codon (which codes for a methionine residue) and end with an in-frame STOP-codon.

Answer 51

It suggests that an ORF is not present.

Answer 52

Because the longer the ORF, the less likely it is to occur by chance.
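A minimal sketch of an ORF scan along these lines: it looks for ATG followed by an in-frame stop codon and keeps only ORFs above a length cut-off. It is not how ORF Finder is implemented. It scans a single strand only, and `min_codons` is an assumed parameter name with an arbitrary default.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 30) -> list:
    """Return (start, end, frame) tuples for ORFs on the given strand.

    start and end are 0-based and end-exclusive, and the span includes
    the stop codon. min_codons counts the codons from ATG up to, but
    not including, the stop codon.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first in-frame START-codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # in-frame STOP-codon closes the ORF
    return orfs


# Made-up sequence: ATG, five GCC codons, then TAA, in frame 2.
example = "CC" + "ATG" + "GCC" * 5 + "TAA" + "GG"
print(find_orfs(example, min_codons=3))
```

Raising `min_codons` discards short ORFs, which are the ones most likely to have arisen by chance.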

Answer 53

For a newly sequenced gene to be analysed to see if it is already known and stored in a database.

Answer 54

BLAST (Basic Local Alignment Search Tool). FASTA (Fast All).

Answer 55

The BLAST program.

Answer 56

It searches through a database to find sequences that match or are similar to the one that is being tested.

Answer 57

As high-scoring segment pairs (HSPs), where the score is the number of matches between the query sequence and the database sequence.

Answer 58

We can evaluate the matches.

Answer 59

To find regions of similarity between the sample sequence and the known sequences from the database.

Answer 60

Where both sequences have a region of similarity that is based in a single location.

Answer 61

Where the 2 sequences have regions of similarity all over the sequence.
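The difference between the two kinds of alignment can be sketched with two classic dynamic-programming methods: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. The cards do not name these algorithms, BLAST itself uses a faster heuristic, and the scoring values below are assumptions.

```python
def global_score(a: str, b: str, match: int = 1,
                 mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a: str, b: str, match: int = 1,
                mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman score: only the best-matching region counts."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Resetting to 0 lets an alignment start anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Two made-up sequences that share one region (GATTACA) and nothing else.
a = "TTTTGATTACATTTT"
b = "CCCCGATTACACCCC"
print("local:", local_score(a, b), "global:", global_score(a, b))
```

For sequences that share only one region, the local score stays high while the global score is dragged down by the unrelated flanks.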

Answer 62

A protein database. A protein query.

Answer 63

A protein database. A translated nucleotide query.

Answer 64

This method compares the 6-frame translations of DNA to a protein database.
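A 6-frame translation can be sketched as follows, using the standard genetic code. The function names are illustrative, codons containing anything other than A, C, G or T are written as "X", and "*" marks a STOP-codon.

```python
from itertools import product

# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate complete codons from the start of the sequence."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Translate all three offsets on both strands."""
    forward = dna.upper()
    reverse = forward.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    return {
        f"{sign}{offset + 1}": translate(strand[offset:])
        for sign, strand in (("+", forward), ("-", reverse))
        for offset in range(3)
    }


for label, protein in six_frame_translation("ATGGCCTAA").items():
    print(label, protein)
```

Each of the six protein strings would then be compared against the protein database.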

Answer 65

A translated nucleotide database. A protein query.

Answer 66

A translated nucleotide database. A translated nucleotide query.

Answer 67

This method compares the 6 frame translations of a DNA query to the 6 frame translations of a DNA database.

Answer 68

To BLAST-P sequences.

Answer 69

A DNA or protein database. A DNA or protein query.

Answer 70

A protein database. A translated DNA sequence.

Answer 71

A translated DNA database. A protein query.

Answer 72

A single line description that is followed by the lines of sequence data.

Answer 73

Because the description line has a greater-than (">") symbol in the first column.

Answer 74

It should not exceed 80 characters.
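The FASTA layout described in these cards can be sketched with a small reader and writer. This assumes the 80-character limit applies to each line of text, and the function names are illustrative rather than taken from any library.

```python
def parse_fasta(text: str) -> dict:
    """Map each definition line (without '>') to its joined sequence."""
    records = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is None:
            # Bare sequence has no definition line, so it is not FASTA.
            raise ValueError("sequence data found before a '>' line")
        else:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def format_fasta(header: str, sequence: str, width: int = 80) -> str:
    """Write one record, wrapping the sequence at `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + lines)


# Made-up record used only for illustration.
record = format_fasta("example_seq hypothetical description", "ACGT" * 50)
print(record)
print(parse_fasta(record))
```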

Answer 75

Lines of sequencing data without the FASTA definition line.

Answer 76

They are accession numbers, accession versions or GI numbers. These are sequence ID tags that the database has attached to a particular gene or protein.

Answer 77

The analysis of DNA sequences from different organisms to determine the evolutionary relationships.

Answer 78

Bacteria can exchange DNA sequences via horizontal gene transfer.

Answer 79

Multiple sequence alignment software such as CLUSTAL and COBALT.

Answer 80

Phylogenetic trees.
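Tree-building usually starts from a measure of how different the aligned sequences are. The sketch below computes a simple p-distance (the fraction of differing sites) for every pair. It assumes the sequences have already been aligned to equal length, and it is not the method used inside any particular alignment program.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of compared sites that differ, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned: dict) -> dict:
    """Return {(name_1, name_2): distance} for every pair of sequences."""
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m in aligned
        for n in aligned
    }


# Made-up aligned sequences for three hypothetical organisms.
aligned = {
    "species_a": "GATTACAGGT",
    "species_b": "GATTACAGCT",
    "species_c": "GACTTCAGCA",
}
for pair, distance in distance_matrix(aligned).items():
    print(pair, round(distance, 2))
```

Pairs with smaller distances would be placed closer together on the tree.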


(106 cards)