Introduction Flashcards
Week 1 Lecture 1 - Introduction
How often does DNA data double?
Every 18-24 months
How often does structure data double?
Every 6 years
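To compare the two doubling times, here is a rough sketch that assumes steady exponential growth. The function name `growth_factor` is illustrative and not from the lecture.

```python
def growth_factor(years, doubling_time_years):
    """Factor by which data grows over `years`, assuming steady doubling."""
    return 2 ** (years / doubling_time_years)


# Growth over a 6-year span under each doubling time.
dna_fast = growth_factor(6, 1.5)   # DNA data doubling every 18 months
dna_slow = growth_factor(6, 2.0)   # DNA data doubling every 24 months
structure = growth_factor(6, 6.0)  # structure data doubling every 6 years

print(dna_fast, dna_slow, structure)  # 16.0 8.0 2.0
```

Under these assumptions, DNA data grows 8 to 16 times in the time it takes structure data to double once.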
What is the rough composition of the human genome?
3.2 Gbp; 2-5% coding; >50% repeated sequences; ~35% of genes undergo alternative splicing
What is bioinformatics?
The application of computers to biological problems:
- Aiding the biologist in creating, storing, and analysing biological data (mainly sequences/structures)
- Presenting it in a way biologists can use
- Applying the analysis to make predictions
What is fragment assembly?
Searching sequence fragments for overlapping regions and joining them into a continuous sequence
How do we conduct fragment assembly?
- Enforce a minimum overlap size to reduce the probability of a chance match
- Fuzzy matches account for errors in sequencing
- Apply a confidence score
- More than 50% of the genome is repeated sequence, so repeats can produce ambiguous or incorrect joins
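The steps above can be sketched for a single pair of fragments. This is a minimal illustration, not a real assembler: the function names, the default thresholds, and the example fragments are all assumptions, and the "confidence score" is simply the fraction of matching positions in the overlap.

```python
def best_overlap(left, right, min_overlap=4, max_mismatches=0):
    """Find the longest overlap between the end of `left` and the start of `right`.

    Returns (length, score), or (0, 0.0) if no overlap meets the thresholds.
    `score` is the fraction of matching positions (a crude confidence score).
    """
    longest = min(len(left), len(right))
    # Try the longest candidate first; stop at the minimum overlap size.
    for length in range(longest, min_overlap - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        mismatches = sum(1 for x, y in zip(suffix, prefix) if x != y)
        if mismatches <= max_mismatches:  # fuzzy match tolerates errors
            return length, (length - mismatches) / length
    return 0, 0.0


def join(left, right, min_overlap=4, max_mismatches=0):
    """Join two fragments if they overlap; otherwise return None."""
    length, _score = best_overlap(left, right, min_overlap, max_mismatches)
    if length == 0:
        return None
    return left + right[length:]


frag_a = "ATGCGTACGT"
frag_b = "TACGTTAGGC"  # starts with the last 5 bases of frag_a
print(best_overlap(frag_a, frag_b))  # (5, 1.0)
print(join(frag_a, frag_b))          # ATGCGTACGTTAGGC
```

Raising `max_mismatches` lets an overlap containing a sequencing error still be found, at the cost of a lower score. The sketch does nothing about repeats: a repeated region can give a valid-looking overlap between fragments that do not actually belong together.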
What is a moving window?
Take a window of an odd number of residues (typically 7-21) and calculate the average of some property over it. Slide the window along the sequence one residue at a time, recalculating the average at each position.
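A minimal sketch of a moving window follows. It assumes the property has already been converted to one number per residue; the function name and the example values are hypothetical and do not come from a real property scale.

```python
def moving_window_average(values, window=7):
    """Average `values` over a sliding window of odd size.

    Returns one average per full window; entry i belongs to the residue
    at the centre of that window, i.e. position i + window // 2.
    """
    if window % 2 == 0:
        raise ValueError("window size must be odd so it has a central residue")
    if window > len(values):
        return []
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


# Hypothetical per-residue property values (not a real scale).
scores = [1, 2, 3, 4, 5, 6, 7]
print(moving_window_average(scores, window=3))  # [2.0, 3.0, 4.0, 5.0, 6.0]
```

An odd window size means each average can be assigned to a single central residue. In this sketch the first and last `window // 2` residues get no value because no full window is centred on them.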
What information can we predict from a protein sequence?
- Membrane regions
- Secondary structure
- Accessibility
- Flexibility
- Antigenicity
What is an algorithm?
A complete and precise set of steps that solves a problem and, given the same data, produces an identical result to a defined level of accuracy
What does computer programming enable us to do?
- Automation of tasks
- Manipulation of data
- Advanced analysis of data
- Tools to make predictions
What are machine learning methods (MLMs)?
A general class of computer software that learns from examples and can then make predictions
How do MLMs work?
- Train a learning method with real examples of data
- The method learns features of real examples
- Apply the trained system to make predictions
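The train, learn, predict cycle above can be sketched with a nearest-centroid classifier. This method was chosen only for brevity and is not one of the methods named in these notes. The feature values and class labels are made up for illustration, and `math.dist` requires Python 3.8 or later.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Learn one centroid (mean feature vector) per label.

    `examples` is a list of (features, label) pairs.
    """
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Predict the label whose centroid is closest to `features`."""
    return min(model, key=lambda label: dist(model[label], features))


# Made-up training data: (feature_1, feature_2) -> class label.
training = [
    ((0.9, 0.1), "membrane"),
    ((0.8, 0.2), "membrane"),
    ((0.2, 0.7), "soluble"),
    ((0.1, 0.9), "soluble"),
]
model = train(training)               # the method learns from real examples
print(predict(model, (0.85, 0.15)))   # membrane
print(predict(model, (0.10, 0.80)))   # soluble
```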
What are examples of MLMs?
- Neural networks
- Decision trees
- Naive Bayesian classifiers
- Support vector machines
What is a database?
A structured collection of data with some tool enabling it to be queried.
What is a databank?
A collection of data (normally in a simple text file) without an associated query tool. It allows you to use whatever software you like to analyse the data.
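The difference can be sketched in a few lines. The tab-separated layout, the entry names, and the sequences below are made up for illustration. SQLite stands in for "some tool enabling it to be queried"; real biological databases use their own systems.

```python
import sqlite3

# Databank: plain text records, parsed with whatever code we choose.
flat_file = "P1\tATGCGT\nP2\tGGCTAA\nP3\tATGAAA\n"
records = [line.split("\t") for line in flat_file.splitlines()]
starts_with_atg = [name for name, seq in records if seq.startswith("ATG")]

# Database: the same data held in a structure with a query tool (SQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (name TEXT, seq TEXT)")
conn.executemany("INSERT INTO sequences VALUES (?, ?)", records)
rows = conn.execute(
    "SELECT name FROM sequences WHERE seq LIKE 'ATG%' ORDER BY name"
).fetchall()
queried = [name for (name,) in rows]
conn.close()

print(starts_with_atg)  # ['P1', 'P3']
print(queried)          # ['P1', 'P3']
```

Both routes give the same answer here. The databank leaves the analysis entirely to your own software, while the database answers the question through its query language.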