Bioinformatics Flashcards
Aspects of sequence analysis
- gene and promoter prediction
- RNA secondary structure and gene expression
- protein sequence analysis
- restriction mapping for cloning and primer design for PCR
Purposes of DNA sequencing in bioinformatics
- converting between sequence formats
- percentage nucleotide composition
- restriction analysis (looking for restriction sites)
- primer design
- finding coding and non-coding features
- removal of vector sequence
- gene prediction
- getting the final protein
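Two of the tasks above, percentage nucleotide composition and looking for restriction sites, can be sketched in a few lines. This is only an illustration: the function names and the example sequence are made up, and the site search matches one strand with an exact recognition sequence (here GAATTC, the EcoRI site).

```python
from collections import Counter


def nucleotide_composition(seq):
    """Return the percentage of each base (A, C, G, T) in a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {base: 100.0 * counts[base] / len(seq) for base in "ACGT"}


def find_sites(seq, site):
    """Return the 0-based start position of every occurrence of a site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        # Step forward by one so overlapping occurrences are also found.
        start = seq.find(site, start + 1)
    return positions


example = "ATGAATTCGGCCGAATTCTA"  # hypothetical sequence
composition = nucleotide_composition(example)
ecori_sites = find_sites(example, "GAATTC")
```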
Aspects of bioinformatics and cloning
- retrieving the sequence of interest
- identifying restriction enzyme sites
- engineering new sites using PCR
- sequencing an insert
What is an SNP?
- a DNA sequence variation occurring commonly within a population in which a single nucleotide in the genome differs between members of a species
What does FST measure?
The degree of genetic differentiation between populations due to genetic structure
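One common way to compute FST is (HT - HS) / HT, where HS is the mean expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled population. The sketch below assumes a single biallelic locus and equally sized subpopulations; the function name is illustrative.

```python
def fst(allele_freqs):
    """FST for one biallelic locus from per-subpopulation allele frequencies.

    Assumes equal subpopulation sizes (simple unweighted averages).
    """
    n = len(allele_freqs)
    # Mean expected heterozygosity within subpopulations (HS).
    hs = sum(2 * p * (1 - p) for p in allele_freqs) / n
    # Expected heterozygosity of the pooled population (HT).
    p_bar = sum(allele_freqs) / n
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        return 0.0  # locus is fixed everywhere, so there is no variation to partition
    return (ht - hs) / ht


identical = fst([0.5, 0.5])       # same frequencies: no differentiation
fixed_opposite = fst([1.0, 0.0])  # fixed for different alleles: complete differentiation
```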
Implications of genetic diversity
- could affect gene expression if in a TF site
- could affect splicing
- could affect protein abundance or function
- potential disease phenotype
Uses of a functional interaction network
- function prediction
- network analysis for important proteins
Applications of network/systems biology
- finding important genes
- viewing connections between genes and the effect of knocking one out
- overlaying high-throughput data
- GWAS analysis
- integrating data from different sources
- multi-disciplinary research
Types of signatures for functional regions
- position weight matrix
- protein signatures
- pattern
- matrix/profile
- hidden Markov model
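As an illustration of the first signature type, a position weight matrix can be built from a set of aligned sites as log-odds scores against a background. This is a minimal sketch assuming ungapped DNA sites of equal length, a uniform background of 0.25 per base, and a pseudocount of 1; the example sites and function names are made up.

```python
import math


def position_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from aligned DNA sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for column in zip(*sites):
        # One dict per alignment column: base -> log2(observed / background).
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds scores for a candidate sequence."""
    return sum(column[base] for column, base in zip(pwm, seq))


sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]  # hypothetical aligned sites
pwm = position_weight_matrix(sites)
good = score(pwm, "TATAAT")
poor = score(pwm, "GGGCCC")
```

A sequence resembling the aligned sites scores higher than an unrelated one, which is how the matrix is used to scan new sequences.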
Features of a genomic context
- which strand it is encoded on
- Exon/intron structure
- promoter region
- other features
- genes up and downstream
Models of evolution
- nucleotide substitution
- amino acid substitution
- demographic
- molecular clock
- phylogeographic
What does GWAS stand for?
Genome-wide association study
Workflow of experiment for a pharmacogenetic gene
- GWAS and data mining to identify candidate genes
- functional analysis and validation of candidate genes
- drug identification and population studies
- point of care and personalized medicine
Properties to assess in a systems network
- hubs
- degree
- betweenness
- closeness
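Degree, hubs and closeness can be illustrated on a small undirected network stored as an adjacency dictionary. This is a sketch with a made-up network: closeness is taken here as (number of reachable nodes) / (sum of shortest-path distances), and betweenness is left out to keep the example short.

```python
from collections import deque


def degree(graph, node):
    """Number of direct neighbours of a node."""
    return len(graph[node])


def closeness(graph, node):
    """Closeness centrality: reachable nodes / sum of shortest-path distances."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search gives shortest paths in an unweighted graph
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical interaction network: A is connected to B, C and D; D also to E.
network = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D"},
}
hub = max(network, key=lambda n: degree(network, n))  # highest-degree node
```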
What is a gene signature?
A group of genes whose combined expression pattern is uniquely characteristic of a biological phenotype
Why does BLAST work?
- similar sequences have similar functions and are evolutionarily related
BLASTn
Nucleotides
BLASTp
Amino acids
BLASTx
Nucleotide query translated in six frames vs amino acid database
tBLASTn
Amino acid query vs nucleotide database translated in six frames
tBLASTx
Nucleotide query translated in six frames vs nucleotide database translated in six frames
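The "six frames" used by the translated BLAST programs are the three reading frames on each strand. A minimal sketch of six-frame translation, assuming the standard genetic code and plain A/C/G/T input; the function names and frame labels (+1 to +3, -1 to -3) are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate all three frames on the forward and reverse strands."""
    frames = {}
    for strand, template in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
```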
MegaBLAST
Most commonly used because it is fast, but less sensitive
PSI BLAST
Slow but takes into account regions that are more evolutionarily conserved
Define E
The expected number of matches with an equal or better score that would occur by chance in a database of that size (the lower the E value, the more significant the match)
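A rough way to see how E behaves is the simplified relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the total database length. The sketch below assumes that relation and ignores the length corrections a real BLAST search applies; the function name is illustrative.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E value: search space size scaled by 2^(-bit score)."""
    search_space = query_length * database_length
    return search_space * 2.0 ** (-bit_score)


# Doubling the database doubles E; each extra bit of score halves it.
e_small_db = expect_value(40, 100, 1_000_000)
e_large_db = expect_value(40, 100, 2_000_000)
e_better_score = expect_value(41, 100, 1_000_000)
```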
Why would you use multiple alignment vs pairwise?
It reveals more subtle similarities
Evolutionary relationships become apparent when examining more than 3 sequences in alignment
Features of ecological/demographic histories of populations
- gene flow
- population size changes
- natural selection
- migration
Multiple alignment methods
- ClustalW
- MUSCLE
- MAUVE
Balancing selection
Some useful/conditionally useful mutations never reach fixation
What does UPGMA stand for?
Unweighted pair group method with arithmetic mean
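The name describes the algorithm: repeatedly join the closest pair of groups, then set the distance from the new group to every other group to the arithmetic mean over all of its members. A minimal sketch that returns only the tree topology as a nested string (no branch lengths); the input format and the example distances are assumptions made for illustration.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster taxa with UPGMA and return the topology as a nested string.

    distances maps a pair of names, e.g. ("A", "B"), to their distance.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa
    d = {frozenset(pair): value for pair, value in distances.items()}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Size-weighted mean equals the plain mean over all member taxa.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical distances: A and B are close, C is distant from both.
tree = upgma(["A", "B", "C"], {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0})
```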
What does BLAST stand for?
Basic local alignment search tool
Other methods of inferring evolutionary relationships
- neighbour joining
- least squares
- maximum parsimony
- maximum likelihood
- Bayesian MCMC (Markov chain Monte Carlo)
Mechanisms of recombination
- double stranded break and repair
- disintegration and repair
- template switching during reverse transcription
Why is recombination important?
- repairing DNA breaks
- repairing harmful mutations
- better exploration of sequence space
Define sequence space
Every possible combination of nucleotides for every length of DNA
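The size of sequence space grows exponentially with length: with 4 nucleotides there are 4^L possible sequences of length L. A small arithmetic sketch (the function name is illustrative):

```python
def sequence_space_size(length, alphabet_size=4):
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length


ten_mers = sequence_space_size(10)
# All sequences of length 1 up to 10 combined.
up_to_ten = sum(sequence_space_size(n) for n in range(1, 11))
```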
Problems with recombination
- if the parental sequences are too diverged, sequence-specific interactions are compromised
- a high rate of recombination breaks up beneficial combinations of mutations
- more sequence space is not necessarily beneficial
Process of basic functional analysis of sequences
- collect the sample
- sequence alignment
- find conserved regions
- generate signature
- run against other sequences
- functional/context analysis
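The middle steps (find conserved regions, generate a signature, run it against other sequences) can be sketched with a simple pattern-type signature. This assumes the sequences are already aligned and of equal length; the alignment, the target sequence and the function name are made up for illustration.

```python
import re


def pattern_from_alignment(aligned):
    """Turn aligned sequences into a regular-expression signature.

    Fully conserved columns become a literal residue, variable columns a
    character class, and columns containing gaps a wildcard.
    """
    parts = []
    for column in zip(*aligned):
        residues = sorted(set(column))
        if "-" in residues:
            parts.append(".")
        elif len(residues) == 1:
            parts.append(residues[0])
        else:
            parts.append("[" + "".join(residues) + "]")
    return "".join(parts)


alignment = ["CAHC", "CGHC", "CAHC"]  # hypothetical aligned motif
signature = pattern_from_alignment(alignment)
# Run the signature against another (hypothetical) sequence.
hit = re.search(signature, "MKTCGHCLL")
```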