Introduction Flashcards

Week 1 Lecture 1 - Introduction

1
Q

How often does DNA data double?

A

Every 18-24 months

2
Q

How often does structure data double?

A

Every 6 years

3
Q

What is the rough composition of the human genome?

A

3.2 Gbp in total, of which 2-5% is coding and >50% is repeated sequence; ~35% of genes undergo alternative splicing

4
Q

What is bioinformatics?

A

The application of computers to biological problems:
- Aiding the biologist in creating, storing, and analysing biological data (mainly sequences/structures)
- Presenting it in a way biologists can use
- Applying the analysis to make predictions

5
Q

What is fragment assembly?

A

Searching sequence fragments for overlapping regions to join them in a continuous sequence

6
Q

How do we conduct fragment assembly?

A
  1. Enforce a minimum overlap size to reduce the probability of a chance match
  2. Fuzzy matches account for errors in sequencing
  3. Apply a confidence score
  4. Watch for sequence repeats: over 50% of the genome is repeated sequence, so repeats can produce false overlaps
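
The first two steps above can be sketched in Python. This is a minimal illustration, not a real assembler: the function names `overlap_length` and `merge`, the parameters `min_overlap` and `max_mismatches`, and the example fragments are all assumptions made for this card. It does not compute a confidence score or handle repeats.

```python
def overlap_length(left, right, min_overlap=3, max_mismatches=0):
    """Length of the longest suffix of `left` matching a prefix of `right`.

    Hypothetical helper: `min_overlap` (assumed >= 1) enforces a minimum
    overlap size, and `max_mismatches` allows a fuzzy match to tolerate
    sequencing errors. Returns 0 if no acceptable overlap exists.
    """
    longest = min(len(left), len(right))
    # Try the largest possible overlap first, then shrink.
    for size in range(longest, min_overlap - 1, -1):
        suffix, prefix = left[-size:], right[:size]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatches:
            return size
    return 0


def merge(left, right, **kwargs):
    """Join two fragments on their overlap, or return None if there is none."""
    size = overlap_length(left, right, **kwargs)
    if size == 0:
        return None
    return left + right[size:]


# Made-up fragments sharing the 4-base overlap "GTAC".
contig = merge("ATGCGTAC", "GTACCTGA", min_overlap=4)
print(contig)
```

Raising `min_overlap` makes chance matches less likely, while raising `max_mismatches` tolerates more sequencing errors at the cost of more false joins.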
7
Q

What is a moving window?

A

Take a window of an odd number of residues (typically 7-21) and calculate the average of some property over it. Slide the window along the sequence one residue at a time, calculating the average at each position.
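
A minimal Python sketch of a moving window, assuming the property has already been converted to one number per residue. The function name `moving_window_average` and the example scores are hypothetical; in this sketch, positions too close to either end to hold a full window are skipped.

```python
def moving_window_average(values, window=7):
    """Average `values` over a sliding window centred on each position."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    averages = []
    # Only centre the window where it fits entirely inside the sequence.
    for centre in range(half, len(values) - half):
        segment = values[centre - half:centre + half + 1]
        averages.append(sum(segment) / window)
    return averages


# Made-up per-residue scores, smoothed with a window of 3.
scores = [1, 2, 3, 4, 5]
smoothed = moving_window_average(scores, window=3)
print(smoothed)
```

An odd window size gives each window a single central residue to which the average can be assigned.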

8
Q

What information can we predict from a sequence (for example, a protein sequence translated from DNA)?

A
  1. Membrane regions
  2. Secondary structure
  3. Accessibility
  4. Flexibility
  5. Antigenicity
9
Q

What is an algorithm?

A

A complete and precise set of steps that will solve a problem and achieve an identical result when given the same set of data to a defined level of accuracy

10
Q

What does computer programming enable us to do?

A
  1. Automation of tasks
  2. Manipulation of data
  3. Advanced analysis of data
  4. Tools to make predictions
11
Q

What are machine learning methods?

A

A general class of computer software which learns from examples and is then able to make predictions

12
Q

How do MLMs work?

A
  1. Train a learning method with real examples of data
  2. The method learns features of real examples
  3. Apply the trained system to make predictions
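
The three steps above can be illustrated with a very small Python sketch. It uses a nearest-centroid classifier, chosen here only because it fits in a few lines; it is not one of the methods named in the lecture. The function names `train` and `predict`, the feature values, and the labels "A" and "B" are all made up for illustration.

```python
def train(examples):
    """Steps 1-2: learn one centroid (mean feature vector) per label."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [v / counts[label] for v in total]
        for label, total in sums.items()
    }


def predict(model, features):
    """Step 3: return the label whose centroid is closest to `features`."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical training examples: (feature vector, label).
training = [
    ([1.0, 1.2], "A"), ([0.8, 1.0], "A"),
    ([3.0, 3.1], "B"), ([3.2, 2.9], "B"),
]
model = train(training)
print(predict(model, [0.9, 1.1]), predict(model, [3.1, 3.0]))
```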
13
Q

What are examples of MLMs?

A
  1. Neural networks
  2. Decision trees
  3. Naive Bayesian classifiers
  4. Support vector machines
14
Q

What is a database?

A

A structured collection of data with some tool enabling it to be queried.
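
As a rough illustration of "structured data plus a query tool", the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `proteins`, its columns, and the three records are hypothetical and do not come from any real resource.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE proteins (accession TEXT PRIMARY KEY, name TEXT, length INTEGER)"
)
# Made-up records for illustration only.
conn.executemany(
    "INSERT INTO proteins VALUES (?, ?, ?)",
    [
        ("X0001", "example kinase", 350),
        ("X0002", "example peptide", 45),
        ("X0003", "example transporter", 612),
    ],
)

# The query tool: ask for all proteins longer than 200 residues.
rows = conn.execute(
    "SELECT accession FROM proteins WHERE length > ? ORDER BY accession",
    (200,),
).fetchall()
print(rows)
conn.close()
```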

15
Q

What is a databank?

A

A collection of data (normally in a simple text file) without an associated query tool. It allows you to use whatever software you like to analyse the data.
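
Because a databank is often just a text file, you can read it with your own code. The sketch below assumes the flat file is in FASTA format (a common choice, but an assumption here); the function name `parse_fasta` and the two records are made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word after '>'.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical flat-file contents.
flat_file = """>seq1 hypothetical example
ATGC
GGTA
>seq2 another made-up entry
TTAA
"""
sequences = parse_fasta(flat_file)
print(sequences)
```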

16
Q

What are the types of databanks?

A
  • Primary
  • Secondary
  • Composite
  • Gateways
17
Q

What is a primary databank? Give examples.

A

Raw data deposition and curation, e.g. GenBank, PDB, UniProtKB

18
Q

What is a secondary databank? Give examples.

A

Derived data, patterns, and annotations, e.g. PROSITE, Pfam, CATH

19
Q

What is a composite databank? Give examples.

A

Non-redundant sets of data derived from primary databanks, e.g. OWL, NRDB

20
Q

What is a gateway? Give examples.

A

Gateways give access to data. e.g. NCBI, Expasy, EBI

21
Q

What is a gene ontology? Give examples.

A

A controlled vocabulary for describing the attributes of genes and gene products, e.g. molecular function, biological process, cellular component.

22
Q

In what ways can you search databases?

A
  • Text searches
  • Sequence similarity searches
  • Structure similarity searches
23
Q

What are the different sequence alignment methods?

A
  • Automatic pairwise
  • Consensus
  • Profile
  • Structure prediction
24
Q

What does annotation include?

A
  • Authors
  • References
  • Methods
  • Cross-links to other databases
  • Feature tables
25
Q

What are some problems with databanks?

A

The data might be unreliable.
- Multiple names of the same gene
- Multiple proteins with the same name
- Spelling errors
- Changes in annotations

26
Q

What does bioinformatics enable us to do?

A
  • Create data
  • Make predictions
  • Provide tools to store and search data
  • Create 3D models
  • Transfer of annotations