Databases (Week 2) Flashcards

1
Q
  1. Define “Bioinformatics”
A

Oxford: Management info system for molecular biology + practical applications

NCBI: Field of science in which biology, computer science + information technology
merge into 1 discipline. 3 sub-disciplines:
✓ Development of new algorithms + statistics to assess relationships among
data sets
✓ Analysis + interpretation of data types including nucleotide + amino acid
sequences
✓ Development + implementation of tools that enable efficient access +
management of different information

2
Q
  1. Difference between bioinformaticist and bioinformatician
A

Bioinformaticist: An expert who knows how to use bioinformatics tools + write
interfaces for effective use of the tools. Designs + implements tools + makes use
of complex algorithms

Bioinformatician: A trained individual who knows how to use bioinformatics tools
without a deeper understanding. Most biological scientists have a basic
understanding of underlying algorithms.

3
Q
  1. Difference between information storage with regards to single sequences,
    features and annotations, as well as collections. Know examples thereof
A
4
Q
  1. Define “database”
A

Database: A comprehensive collection of related data organized for convenient access, generally stored on a computer.

5
Q

5.1 Raw Data (8)

A
  1. Sequences = uploaded + maintained by those who submitted it
  2. RAW
  3. Redundant (the same sequence can appear in more than one record)
  4. Level of info = sparse
  5. Can represent incomplete records
  6. Quality of data = unknown
  7. Longevity can be indefinite
  8. No automatic linking with other databases
6
Q

5.2 Curated Data (6)

A
  1. 3rd party that maintains it, who did not necessarily generate data
  2. Originates from a primary database
  3. Non-redundant (duplicate records are merged)
  4. Created to extract extra information
  5. Represents complete records
  6. Linking with other databases
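
As a rough illustration of how a non-redundant set might be derived from raw submissions, the sketch below merges records that share an identical sequence. The record layout, the accession strings and the function name `merge_redundant` are all invented for this example; real curation involves far more than exact-match merging.

```python
from collections import defaultdict

# Toy "raw" submissions: (accession, sequence). All values are made up.
raw_records = [
    ("RAW0001", "ATGGCC"),
    ("RAW0002", "ATGGCC"),  # the same sequence submitted a second time
    ("RAW0003", "ATGTTT"),
]


def merge_redundant(records):
    """Group raw accessions by identical sequence (one entry per sequence)."""
    merged = defaultdict(list)
    for accession, sequence in records:
        merged[sequence].append(accession)
    return dict(merged)


curated = merge_redundant(raw_records)
for sequence, accessions in curated.items():
    print(sequence, "<-", ", ".join(accessions))
```

Each curated entry keeps the list of raw accessions it came from, which mirrors the idea that curated data originates from a primary database.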
7
Q

5.3 Specialist Data (4)

A
  1. Contains a mix of primary + derived data for only a select group or a single species; maintained + updated by unofficial collaborators
  2. Contains bulk Whole Genome Sequencing data which can be considered as primary
  3. Contains reference sequences derived from multiple sequences
  4. Restricted public access
8
Q
  1. Difference between level of scope and level of curation of databases
A

Level of scope: Single, collection + features and annotations [SCFA]

Level of curation: Raw, curated and specialist [RCS]

9
Q
  1. Examples of raw, curated and specialist databases
A
10
Q

8.1 Direct query (& PROBLEM)

A
  1. Know exactly what is wanted
  2. Each record in database has unique value = Accession number
  3. Returns results ONLY for that query, unless the database contains links to similar data

Problem: few databases share the same accession numbers + use the same format
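
A direct query can be sketched as a request for exactly one accession number. The example below only builds the request URL for NCBI's EFetch utility and does not send it. The helper name `efetch_url` is hypothetical, `NM_000546` is used purely as a sample identifier, and the default database and return type are assumptions chosen for illustration.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build (but do not send) an EFetch request for one accession number."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


# One accession in, one record requested: nothing else is searched.
print(efetch_url("NM_000546"))
```

Because the accession format differs between databases, the same identifier would generally not work against a database that uses its own numbering scheme.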

11
Q

8.2 Indirect query (4)

A
  1. Generally know what is wanted
  2. Can use different text queries such as gene names, organisms, products of genes
  3. If there are multiple entries for 1 sequence, you will receive multiple results
  4. Metadata such as authors + date of publication can also be used
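
The behaviour of an indirect query can be imitated with a plain text search over a handful of toy records. Everything in the sketch below (field names, accessions, gene and author names, and the `text_search` helper) is invented for illustration and is not how any real database implements searching.

```python
# Toy records; every field value here is invented for illustration.
records = [
    {"accession": "TOY001", "gene": "geneA", "organism": "Species one", "authors": "Author A"},
    {"accession": "TOY002", "gene": "geneA", "organism": "Species one", "authors": "Author B"},
    {"accession": "TOY003", "gene": "geneB", "organism": "Species two", "authors": "Author A"},
]


def text_search(records, term):
    """Return every record in which any field contains the term (case-insensitive)."""
    term = term.lower()
    return [r for r in records if any(term in str(v).lower() for v in r.values())]


# A gene name matches two entries, so two results come back.
hits = text_search(records, "geneA")
print([r["accession"] for r in hits])
```

Searching on metadata works the same way: `text_search(records, "Author A")` matches on the authors field rather than on the gene name.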
12
Q

8.3 Way to search with multiple queries:

A
• NCBI’s Entrez system: links out to other databases that might contain
different data for your match
  • Returns matches from multiple databases
• Boolean operators
  • AND, OR + NOT
  • Written in CAPITALS
  • Can be used multiple times in 1 query
  • ….. AND …. OR …. NOT …. AND …. OR
• NCBI’s Entrez system: certain info of records is individually indexed,
allowing you to search specifically for it
  • Text qualifiers or indexed terms (4)
    • Sequence length [SLEN]
    • Organism [ORGN]
    • Features [FKEY]
    • Properties [PROP]
      • gbdiv: GenBank division
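
To make the combination of Boolean operators and indexed terms concrete, here is a minimal sketch that assembles one Entrez query string and the matching ESearch URL without sending it. The search term itself is just an example, the helper name `esearch_url` is hypothetical, and the choice of the `nuccore` database is an assumption.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Boolean operators are written in capitals; field tags go in square brackets.
# gbdiv_est[PROP] refers to a GenBank division (here: EST), excluded with NOT.
term = '"Homo sapiens"[ORGN] AND 1000:2000[SLEN] AND cds[FKEY] NOT gbdiv_est[PROP]'


def esearch_url(term, db="nuccore"):
    """Build (but do not send) an ESearch request for a text query."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


print(esearch_url(term))
```

The same term can also be pasted directly into the search box of the NCBI website; the URL form is only needed for scripted access.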
13
Q

Data type refers to
AND
Single means

A

Data type: the different formats the data comes in

Single: one data entry that holds more than one data type

14
Q

9.1 FASTA Text format

A

FASTA (*.fa or *.fas or *.fasta)
• Simple text-based file format
• Edit with Notepad or any other basic text editor

• Used to download sequence info from records

  • NO OTHER DATA PRESENT
  • NO SEQUENCE FEATURES OR RECORD FEATURES
  • Only 2 lines per record in the simplest case
  • 1st line: preceded with ‘>’ to denote name of record
  • 2nd line: the sequence itself (longer sequences may be wrapped over several lines)
  • ONLY IUPAC nucleotide or amino acid letters
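
Because the format is so simple, a FASTA file can be read with a few lines of code. The sketch below is a minimal parser, not a validated one: the function name `parse_fasta` and both example records are made up, and it also joins sequence lines that have been wrapped, which some FASTA files do.

```python
def parse_fasta(text):
    """Return {record name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: everything after '>' is the record name.
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Sequence line: append to the current record.
            records[name] += line
    return records


example = """>toy_record_1 made-up nucleotide sequence
ATGGCCATTGTAATGGGCCGC
>toy_record_2 made-up protein sequence
MKTAYIAKQR
"""

for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```

Note that nothing besides the name and the sequence can be recovered, which is exactly the limitation listed above.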
15
Q

9.2 GenBank Text format

A

flat file (*.gb)

  • Complex file format that preserves ALL sequence information
  • Sequence features + Meta data
  • Not readily editable
  • Stored as plain text, but hard to read or edit in a basic text editor
  • Allows interactive views of sequences when using programs that can accommodate them
  • Multiple lines per record; the number of lines depends on how much data is available
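
For contrast with FASTA, the sketch below reads the top-level sections of a GenBank-style flat file. The record is a heavily shortened toy example invented for this purpose, and `parse_genbank` is a hypothetical helper that ignores most of the real format; for actual work a dedicated parser such as Biopython's `SeqIO` is the safer choice.

```python
# Invented, heavily shortened record in GenBank flat-file style.
example = """LOCUS       TOY0001                   21 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Made-up example record.
ACCESSION   TOY0001
FEATURES             Location/Qualifiers
     source          1..21
                     /organism="synthetic construct"
ORIGIN
        1 atggccattg taatgggccg c
//
"""


def parse_genbank(text):
    """Collect top-level sections (keywords starting in column 1) and the sequence."""
    sections, current = {}, None
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if line and not line[0].isspace():
            # A new keyword such as LOCUS, ACCESSION or FEATURES begins here.
            current, _, rest = line.partition(" ")
            sections[current] = [rest.strip()]
        elif current is not None:
            # Indented line: belongs to the most recent keyword.
            sections[current].append(line.strip())
    # Lines under ORIGIN hold a position number plus blocks of letters.
    origin_lines = sections.get("ORIGIN", [])[1:]
    sequence = "".join(ch for ch in "".join(origin_lines) if ch.isalpha())
    return sections, sequence.upper()


sections, sequence = parse_genbank(example)
print(sorted(sections))
print(sequence)
```

Unlike the FASTA case, the features and metadata survive alongside the sequence, at the cost of a record that spans many lines.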