L2 Flashcards

1
Q

What is the major challenge of the genomics era?

A

To store and handle terabytes (TB) of sequence data through the establishment and use of computer databases

2
Q

What is a database?

A

A computerized archive used to store and organize data in such a way that information can be retrieved easily via a variety of search criteria

3
Q

What are databases made of?

A

Computer hardware and software for data management

4
Q

What should each record (entry) in a database contain?

A

A number of fields that hold the actual data items.

5
Q

What is the process of making a query?

A

The process by which a user has the computer retrieve a whole data record by specifying a particular piece of information to be found in a particular field.
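
As a rough illustration of this card, the sketch below retrieves whole records by matching one value in one field. The records, the field names and the `query` helper are all hypothetical and only meant to show the idea, not how any real database is implemented.

```python
# Hypothetical records: each record (entry) is a set of named fields.
RECORDS = [
    {"accession": "U12344", "species": "Homo sapiens", "length": 1520},
    {"accession": "AB123456", "species": "Mus musculus", "length": 980},
]


def query(records, field, value):
    """Return every whole record whose given field holds the given value."""
    return [record for record in records if record.get(field) == value]


# Specify one piece of information (the species) in one field,
# and get the complete matching record back.
hits = query(RECORDS, "species", "Mus musculus")
```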

6
Q

What is knowledge discovery?

A

A function of biological databases which refers to the identification of connections between pieces of information that were not known when the information was first entered

7
Q

Types of databases

A
  • flat file format
  • relational database management systems
  • object-oriented database management systems
8
Q

What is a flat file format?

A

A long text file that contains many entries separated by a delimiter, such as a vertical bar (|)
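
A minimal sketch of reading such a file, assuming entries are separated by "|" as on this card and, as a further assumption not stated on the card, that the fields inside each entry are separated by commas. The sample text and field names are hypothetical.

```python
# Hypothetical flat-file text: entries separated by "|",
# fields inside an entry separated by "," (an assumed layout).
FLAT_FILE = "U12344,Homo sapiens,1520|AB123456,Mus musculus,980"

FIELD_NAMES = ["accession", "species", "length"]  # assumed field order


def parse_flat_file(text, entry_delimiter="|", field_delimiter=","):
    """Split the text into entries, then label the fields of each entry."""
    records = []
    for entry in text.split(entry_delimiter):
        entry = entry.strip()
        if not entry:
            continue  # skip empty pieces, e.g. after a trailing delimiter
        values = entry.split(field_delimiter)
        records.append(dict(zip(FIELD_NAMES, values)))
    return records


records = parse_flat_file(FLAT_FILE)
```

Finding anything in a flat file means scanning the whole text like this, which is one reason larger collections move to database management systems.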

9
Q

What are database management systems?

A

Sophisticated computer software programs for organizing, searching and accessing data

10
Q

What are relational databases?

A

They make use of a set of tables to organize data. They are created using a programming language known as Structured Query Language (SQL)

11
Q

What is each table in a relational database also called?

A

A relation, which is made up of columns and rows. Columns represent individual fields. Rows represent individual records, each holding the values for those fields

12
Q

How is a query executed in a relational database?

A

The system selects linked data items from different tables and combines the information into one report
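
The sketch below shows this kind of query with Python's built-in `sqlite3` module: two linked tables are joined and the result comes back as one combined report. The table names, column names and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database with two hypothetical, linked tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    species     TEXT
);
CREATE TABLE sequence (
    accession   TEXT PRIMARY KEY,
    length      INTEGER,
    organism_id INTEGER REFERENCES organism(organism_id)
);
INSERT INTO organism VALUES (1, 'Homo sapiens');
INSERT INTO organism VALUES (2, 'Mus musculus');
INSERT INTO sequence VALUES ('U12344', 1520, 1);
INSERT INTO sequence VALUES ('AB123456', 980, 2);
""")

# The query selects linked items from both tables (via organism_id)
# and combines them into a single result set.
rows = conn.execute(
    """
    SELECT s.accession, s.length, o.species
    FROM sequence AS s
    JOIN organism AS o ON o.organism_id = s.organism_id
    WHERE o.species = ?
    """,
    ("Homo sapiens",),
).fetchall()
conn.close()
```

Storing the species once in its own table and linking to it by `organism_id` is the design choice that lets a relational database avoid repeating the same value in every sequence record.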

13
Q

What are primary databases?

A

They are archives of raw protein or DNA sequence data submitted by the scientific community

14
Q

Examples of primary databases

A

GenBank, Protein Data Bank (PDB)

15
Q

What databases make up the International Nucleotide Sequence Database Collaboration?

A

  • GenBank
  • European Molecular Biology Laboratory (EMBL)
  • DNA Data Bank of Japan (DDBJ)

16
Q

What is GenBank?

A

The most complete collection of annotated nucleic acid sequence data for almost every known organism

17
Q

GenBank consists of

A
  • DNA
  • mRNA
  • cDNA
  • ESTs
18
Q

What is the GenPept database for?

A

Protein sequences, the majority of which are conceptual translations from DNA sequences

19
Q

What are the two ways to search for sequences in GenBank?

A
  • using text-based keywords
  • using a molecular sequence as the query to search by sequence similarity with BLAST
20
Q

Functional divisions in GenBank

A
  • EST
  • GSS
  • WGS
  • ENV
21
Q

EST (expressed sequence tags)

A

Contains short, single-pass cDNA reads. These represent what is expressed in a given tissue at a particular developmental stage

22
Q

GSS (genome survey sequences)

A

Contains genomic sequences derived from random single-pass reads

23
Q

WGS (whole genome shotgun sequence)

A

Contains sequences produced by a whole genome shotgun approach, which gains large coverage with the caveat of large amounts of unassembled sequence

24
Q

ENV (environmental samples)

A

Contains sequences normally derived from a metagenomic sample

25
Q

Other divisions in GenBank (taxonomic)

A
  • PRI (primate sequences)
  • ROD (rodent sequences)
  • MAM (other mammalian sequences)
  • PLN (plant, fungal and algal sequences)
  • VRL (viral sequences)
26
Q

Similarity and difference in data contained in GenBank, EMBL, DDBJ

A

Similarity: the data entered are identical, and information such as species and sequence length is entered via structured fields

27
Q

What is an accession number?

A

An identifier that usually consists of one letter followed by five digits (e.g., U12344) or two letters followed by six digits
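
The pattern on this card (one letter plus five digits, or two letters plus six digits) can be checked with a regular expression. This is only a sketch of that stated pattern: the name `looks_like_accession` is hypothetical, and accession formats used in practice are broader than this card describes.

```python
import re

# One uppercase letter + five digits, or two uppercase letters + six digits.
# This encodes only the pattern described on the card, not every real format.
ACCESSION_RE = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})")


def looks_like_accession(text):
    """Return True if the whole string matches the card's accession pattern."""
    return ACCESSION_RE.fullmatch(text) is not None


checks = {
    "U12344": looks_like_accession("U12344"),      # 1 letter + 5 digits
    "AB123456": looks_like_accession("AB123456"),  # 2 letters + 6 digits
    "U1234": looks_like_accession("U1234"),        # too few digits
}
```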

28
Q

What is a GI number?

A

A GenInfo Identifier assigned to a sequence. If the sequence changes in any way, a new GI number will be assigned

29
Q

What is the FASTA file format?

A

A plain-text sequence format: each entry has a definition line beginning with ">" followed by lines of plain sequence information
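
FASTA entries conventionally begin with a definition line that starts with ">", followed by one or more lines of plain sequence. The sketch below reads that layout into (header, sequence) pairs. The function name `parse_fasta` and the two example entries are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    entries = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new definition line closes the previous entry, if any.
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


# Hypothetical two-entry FASTA text.
EXAMPLE = """\
>seq1 hypothetical example
ATGGCC
TTAA
>seq2 another hypothetical example
GGGCCC
"""

entries = parse_fasta(EXAMPLE)
```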

30
Q

What are secondary databases?

A

They contain computationally processed sequence information derived from primary databases

31
Q

Example of a secondary database

A

SWISS-PROT. It provides detailed sequence annotation that includes structure, function and protein family assignment

32
Q

Specialized databases

A

Serve a specific research community or focus on a particular organism

33
Q

Specialized databases include

A
  • FlyBase
  • WormBase
  • TAIR
  • EcoCyc
  • SGD
34
Q

What is Entrez?

A

A biological database retrieval system that integrates the NCBI databases through cross-referencing between them

35
Q

Disadvantages of biological databases

A
  • overreliance on sequence information and related annotations without understanding the reliability of the information
  • there can be many errors in sequence databases
  • there are high levels of redundancy in primary sequence databases
  • annotations of genes can also be false or incomplete