Access to Sequenced Data and Related Information Flashcards

1
Q

Library of related information

  • collection and preservation, easy access, standardized data presentation, minimized redundancy, data independence, management, updating, and organizing data into knowledge
A

BIOLOGICAL DATABASES

2
Q

3 Main Nucleotide Sequence Databases

A

GenBank
European Nucleotide Archive
DNA Database of Japan

3
Q

National Center for Biotechnology Information (NCBI) of the National Institutes of Health (NIH) in Bethesda

A

GenBank

4
Q

European Molecular Biology Laboratory (EMBL)-Bank Nucleotide Sequence Database at the European Bioinformatics Institute (EBI) in Hinxton, England

A

European Nucleotide Archive

5
Q

National Institute of Genetics in Mishima

A

DNA Database of Japan

6
Q

Other Common Biological Databases

A

PubMed
UCSC Genome Browser
Ensembl
FlyBase
UniProt
WormBase
Gene Ontology
RCSB Protein Data Bank
The Arabidopsis Information Resource (TAIR)
Rice Genome Annotation Project
Kyoto Encyclopedia of Genes and Genomes (KEGG)

7
Q

Integration of Biological Databases

Challenges:
1. Database architecture (whether the databases share a similar structure)
2. Access: how to access the data and what can be accessed (data surfing)
3. Naming system (S. cerevisiae RAD24 = rad17 in S. pombe)
4. Clash of concepts: differing definitions of terms (e.g., the definition of a gene)

A
8
Q

Integration of Biological Databases
Approaches:

A

Link Integration
View Integration
Data Warehousing

9
Q

Integration of Biological Databases
Approach wherein:

▪ Researchers begin their query with one data source and then follow hypertext links to related information in other data sources
▪ Vulnerable to naming clashes, ambiguities, and updates; researcher-dependent

A

Link Integration

10
Q

Integration of Biological Databases
Approach wherein:

▪ Leaves the information in its source databases but builds an environment around the databases that makes them all seem to be part of one large system
▪ Did not perform as well as the source databases

A

View Integration
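
As a rough illustration of view integration, the sketch below leaves two toy in-memory "source databases" untouched and wraps them in a single query function. The source names, record fields, and the `federated_lookup` helper are all hypothetical and do not describe any real system.

```python
# Two toy "source databases", left exactly where they are (hypothetical data).
SEQUENCE_DB = {"geneA": {"length_bp": 1200}}
LITERATURE_DB = {"geneA": {"papers": 3}, "geneB": {"papers": 1}}

SOURCES = {"sequence": SEQUENCE_DB, "literature": LITERATURE_DB}


def federated_lookup(gene_id):
    """Query every source at request time and present one combined view."""
    view = {}
    for source_name, database in SOURCES.items():
        record = database.get(gene_id)
        if record is not None:
            view[source_name] = record
    return view


print(federated_lookup("geneA"))
```

Because every lookup visits each source in turn, a view like this can only be as fast as its slowest source.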

11
Q

Integration of Biological Databases
Approach wherein:

▪ Brings all the data under one roof in a single database
▪ Issue: keeping the data warehouse up to date

A

Data Warehousing

12
Q

Which technique transforms the contents of multiple source databases into a common data model and then integrates the source data into a single large database?

A

Data warehouse technique
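
A minimal sketch of the two steps named on this card (transform to a common data model, then load into one store), assuming two made-up sources with different field names. Every record, field name, and function here is hypothetical.

```python
# Hypothetical records from two sources that use different field names.
SOURCE_A = [{"acc": "X0001", "seq": "ATGC"}]
SOURCE_B = [{"accession": "Y0002", "sequence": "GGCC", "organism": "yeast"}]


def from_source_a(record):
    """Map a SOURCE_A record onto the common data model."""
    return {"accession": record["acc"], "sequence": record["seq"], "organism": None}


def from_source_b(record):
    """Map a SOURCE_B record onto the common data model."""
    return {
        "accession": record["accession"],
        "sequence": record["sequence"],
        "organism": record["organism"],
    }


def build_warehouse():
    """Transform every source record, then load all of them into one store."""
    warehouse = {}
    for record in SOURCE_A:
        row = from_source_a(record)
        warehouse[row["accession"]] = row
    for record in SOURCE_B:
        row = from_source_b(record)
        warehouse[row["accession"]] = row
    return warehouse


warehouse = build_warehouse()
print(sorted(warehouse))
```

The resulting dictionary is a snapshot: if either source changes, `build_warehouse` has to be run again, which is the update problem noted for this approach.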

13
Q

Types of Biological Data

A

Genomic Databases
RNA Databases
Protein Databases

14
Q

Genomic Databases (3)

A

Sequence-Tagged Sites (STSs)
Genome Survey Sequences (GSSs)
High-Throughput Genomic Sequence (HTGS)

15
Q

Which genomic database?

= short (typically 500 base pairs long) genomic landmark sequences

A

Sequence-Tagged Sites (STSs)

16
Q

Which genomic database?

= consist of sequences that are genomic in origin

A

Genome Survey Sequences (GSSs)

17
Q

Which genomic database?

= contains unfinished DNA sequences from sequencing centers

A

High-Throughput Genomic Sequence (HTGS)

18
Q

Which RNA database?

= contain sequence data on “single-pass” cDNA sequences

A

Expressed Sequence Tags (ESTs)

19
Q

Which RNA database?

= (unique gene) created for gene-oriented clusters by making nonredundant sets of ESTs

A

UniGene

20
Q

Which protein database?

= the most comprehensive, centralized protein sequence catalog

A

UniProt (aka Universal Protein Resource)

21
Q

RNA Databases (2)

A

Expressed Sequence Tags (ESTs)
UniGene

22
Q

Key databases of UniProt:

A

Swiss-Prot
TrEMBL
PIR

23
Q

Key database under UniProt

= considered the best-annotated protein database (structure and function)

A

Swiss-Prot

24
Q

Key database under UniProt

= Translated EMBL Nucleotide Sequence Data Library (EMBL-NSDL); provides automated annotations of proteins

A

TrEMBL

25
Q

Key database under UniProt

= maintains the Protein Sequence Database, which is also curated by experts

A

PIR (aka Protein Information Resource)

26
Q

Key database layers under UniProt (3)

A

UniProtKB
UniRef
UniParc

27
Q

Key database layer under UniProt

the central database of either manual or automated annotations

A

UniProtKB (UniProt Knowledgebase)

28
Q

Key database layer under UniProt

offers nonredundant reference clusters based on UniProtKB

A

UniRef (UniProt Reference Clusters)

29
Q

Key database layer under UniProt

consists of a stable, nonredundant archive of protein sequences

A

UniParc (UniProt Archive)

30
Q

Central Bioinformatics Resources (2)

A

National Center for Biotechnology Information (NCBI)
European Bioinformatics Institute (EBI)

31
Q

Central Bioinformatics Resource

  • creates public databases
  • conducts research in computational biology
  • develops software tools for analyzing genome data
  • disseminates biomedical information
A

National Center for Biotechnology Information (NCBI)

32
Q

Central Bioinformatics Resource

  • Comparable to NCBI in its scope and mission
  • Represents a complementary, independent resource
  • Has six (6) core molecular databases
A

European Bioinformatics Institute (EBI)

33
Q

A database providing information on the structure of assembled genomes, assembly names and other metadata, statistical reports, and links to genomic sequence data

A

Assembly

34
Q

Finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance.

A

BLAST

35
Q

a molecular biology database system that provides integrated access to databases

A

Entrez

36
Q

Boolean Operators:

A

AND, OR, NOT

37
Q

What punctuation should be used in a database search to find a specific phrase?

A

Quotation marks (“ ”)

38
Q

What punctuation should be used in a database search so that terms are processed as a unit rather than sequentially?

A

Parentheses ( )

39
Q

What punctuation should be used in a database search to truncate a query so that it matches terms beginning or ending with a particular text string?

A

Asterisk (*)
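
The Boolean operators and search punctuation from the preceding cards can be combined in one query string. The sketch below only builds the text; the `phrase`, `group`, and `prefix` helpers and the search terms are hypothetical, and the way a particular database interprets the resulting string may differ.

```python
def phrase(text):
    """Wrap text in quotation marks so it is searched as an exact phrase."""
    return f'"{text}"'


def group(expression):
    """Wrap an expression in parentheses so it is processed as a unit."""
    return f"({expression})"


def prefix(stem):
    """Append an asterisk to match any term beginning with the stem."""
    return f"{stem}*"


query = " ".join([
    group(f"{phrase('beta globin')} OR {prefix('hemoglob')}"),
    "AND",
    "human",
    "NOT",
    "mouse",
])
print(query)
```

The parentheses make the OR clause resolve first, before AND and NOT are applied to its result.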

40
Q

a string of about 4–12 numbers and/or alphabetic characters that is associated with a molecular sequence record, expression record, or structure

A

Accession Numbers
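
The loose "4 to 12 letters and/or digits" description on this card can be written as a quick sanity check. This is only a sketch of that description: real accession formats differ from database to database, so the pattern and the `looks_like_accession` helper are assumptions, and the test strings are purely illustrative.

```python
import re

# Loose rule taken from the card: 4 to 12 letters and/or digits.
# Real accession formats differ by database; this pattern is an assumption.
LOOSE_ACCESSION = re.compile(r"[A-Za-z0-9]{4,12}")


def looks_like_accession(text):
    """Return True when text fits the loose 4-12 alphanumeric rule."""
    return LOOSE_ACCESSION.fullmatch(text) is not None


print(looks_like_accession("U49845"))
print(looks_like_accession("abc"))
```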

41
Q

assigned consecutively to each sequence that is processed

A

GenInfo Number (GI:12345678)

42
Q
  • provide the best representative sequence for each normal (i.e., nonmutated) transcript produced by a gene and for each normal protein product
  • are curated by the staff at NCBI and are nearly nonredundant
A

Reference Sequence (RefSeq) Project

43
Q

define genomic sequences that can be used as reference standards for genes, representing a standard allele

A

Locus Reference Genomic (LRG) Project

44
Q

was established to identify a core set of protein-coding sequences that provide a basis for a standard set of gene annotations; “gold standards” of best-supported gene and protein annotations

A

Consensus Coding Sequence (CCDS) Project

45
Q

Offers high-quality, manual (expert) annotation of the human and mouse genomes, as well as selected other vertebrate genomes

A

Vertebrate Genome Annotation (VEGA) Project

46
Q

databases with a graphical interface representing sequence information and other data as a function of position across the chromosomes

A

Genome Browsers

47
Q

Principal genome browsers are:

A
  1. University of California, Santa Cruz (UCSC) Genome browser
  2. Ensembl Genome browser
  3. Map Viewer at NCBI
48
Q

Genome browser

  • supports the analysis of dozens of vertebrate and invertebrate genomes
  • provides graphical views of chromosomal locations at various levels of resolution
  • Each chromosomal view is accompanied by horizontally oriented annotation tracks
A

University of California, Santa Cruz (UCSC) Genome browser

49
Q

Genome browser

offers a series of comprehensive websites emphasizing a variety of eukaryotic organisms

A

Ensembl Genome browser

50
Q

Genome browser

includes chromosomal maps for a variety of organisms
(1) home page, (2) genome view, (3) map view, (4) sequence view

A

Map Viewer at NCBI