Bioinformatics - Final Exam Content Flashcards

1
Q

What are the differences between substitution models?

A

Substitution models differ in which parameters they include. The simplest just count the observed differences (Hamming distance); others correct for unobserved mutations, treat transitions and transversions differently, or add a proportion of invariable sites and gamma-distributed rate variation among sites.

2
Q

What parameters are included in substitution models?

A
  • transitions vs transversions
  • Hamming distance
  • Jukes and Cantor distance (correcting for unobserved mutations)
  • equal/unequal base frequencies
  • proportion of invariable sites
  • gamma distributed rate variation among sites
3
Q

How do you find the best substitution model?

A
  • the best thing to do is test ALL candidate models and find the one that best fits your sequence data; this is done under the maximum likelihood framework, choosing the model with the lowest BIC (or lowest AIC) value
  • after the model is chosen you also want to run a bootstrap analysis
4
Q

What are the steps to finding the best Tree?

A
  1. do a tree search under each model
  2. calculate the maximum likelihood score of the best tree for each model
  3. compare them using BIC or AIC scores, which are estimators of relative quality of statistical models
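
A minimal sketch of step 3, assuming the standard definitions AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL, where k is the number of free parameters and n is the alignment length. The model names, log-likelihoods, and parameter counts are made-up illustration values, not results from any dataset.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: lower is better."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def rank_models(fits, n_sites):
    """fits maps model name -> (lnL of best tree, number of free parameters).

    Returns the model names sorted from best (lowest BIC) to worst.
    """
    scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in fits.items()}
    return sorted(scores, key=scores.get)


# Hypothetical values for illustration only.
fits = {"JC": (-2500.0, 1), "HKY": (-2450.0, 5), "GTR+G+I": (-2447.0, 11)}
ranking = rank_models(fits, n_sites=1000)
print(ranking)
```

With both criteria the lowest score wins, so the ranking is sorted in ascending order.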
5
Q

How do phylogenetic approaches provide insight on evolution?

A

phylogeny - compare phylogenies to biogeography and major paleoecological events
evolutionary processes - pattern heterogeneity and selection ratios (dN/dS)

6
Q

How do we use the Disparity Index (I) to estimate pattern heterogeneity?

A
  • a common WRONG assumption is that sequences evolve in homogeneity (same conditions and processes)
  • we know that sequences evolve differently based on locations and pressures
  • we measure pattern heterogeneity via the disparity index
  • the disparity index identifies pairs of sequences that evolved under substantially different evolutionary processes
7
Q

What is the basis for dN/dS ratio tests?

A

it is a means to test whether selection is occurring. Substitution-rate outliers tend to include sequences that affect an organism’s ability to survive and reproduce; substitution patterns reflect selection, and dN/dS is the best measure we have for this

8
Q

How do you interpret I (disparity index) statistics?

A

I = 0 means the sequences evolved under the same processes and pressures
I > 0 means the sequences evolved under different processes and pressures

9
Q

how do you interpret dN/dS statistics?

A

dN/dS = 1 : neutral, not undergoing selection
dN/dS > 1 : positive selection, amino acid changes are beneficial and are favored
dN/dS < 1 : purifying selection, amino acid changes are harmful and are removed, so sites stay fixed (conserved)

10
Q

Transition

A

a change between the two purines (A <-> G) or between the two pyrimidines (C <-> T)
- in other words these are substitutions which are more likely to happen because the base does not change from purine to pyrimidine or vice versa

11
Q

Transversion

A

a change between a purine and a pyrimidine: A <-> C, A <-> T, G <-> C, G <-> T
- these are substitutions which happen less frequently and are more serious because the base changes from purine to pyrimidine or vice versa
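
A small sketch of how the two classes can be told apart in code, assuming plain A/C/G/T characters in two aligned sequences. The function names are hypothetical, and sites containing gaps or ambiguity codes are simply skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Return 'identical', 'transition', or 'transversion' for two bases."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "identical"
    pair = {a, b}
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        kind = classify_substitution(a, b)
        if kind == "transition":
            ts += 1
        elif kind == "transversion":
            tv += 1
    return ts, tv


print(count_ts_tv("AACGT", "GATGA"))
```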

12
Q

Hamming Distance (Dh)

A
  • the simplest approach to modeling substitutions: it counts the number of differences between two aligned sequences and divides by the alignment length
  • Dh = n / N
  • n is the number of sites which are different
  • N is the length of the alignment
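
A minimal sketch of Dh = n / N for two aligned sequences of equal length. The function name is hypothetical, and gaps are treated like any other character.

```python
def hamming_distance(seq1, seq2):
    """Proportion of aligned sites that differ: Dh = n / N."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    n = sum(1 for a, b in zip(seq1, seq2) if a != b)  # differing sites
    return n / len(seq1)                               # N = alignment length


print(hamming_distance("ACGTACGT", "ACGTTCGA"))
```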
13
Q

Jukes and Cantor (1969)

A
  • a distance model for substitutions which corrects for unobserved mutations
  • Djc1969 = -(3/4) ln(1 - (4/3)p)
  • p = the proportion of sites which differ between sequences
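
A minimal sketch of the Jukes and Cantor correction, where p is the proportion of differing sites. The function name is hypothetical; the formula is only defined for p below 3/4, so larger values raise an error.

```python
import math


def jukes_cantor_distance(p):
    """Jukes and Cantor (1969) distance: -(3/4) * ln(1 - (4/3) * p)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


d = jukes_cantor_distance(0.25)
print(round(d, 4))
```

The corrected distance is always at least as large as p, because it adds back the substitutions that were hidden by repeated changes at the same site.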
14
Q

Distance (phylogenetic tree sense)

A

essentially it is a measure of how different the sequences in the alignment are, taking into account the substitutions which have occurred

15
Q

Proportion of invariable sites (I)

A
  • a parameter that can significantly improve a model's fit
  • (I) is the proportion of static, unchanging sites in the dataset
16
Q

gamma distribution (G)

A
  • a parameter that can significantly improve a model's fit
  • indicates gamma-distributed rate variation among sites
17
Q

BIC value

A

Bayesian information criterion (lowest-scoring model is best)

18
Q

AIC value

A

Akaike information criterion (lowest-scoring model is best)

19
Q

pattern heterogeneity

A

if two sequences evolved under the same processes their nucleotide composition will be similar, however if they evolved under separate pressures their nucleotide composition will reflect that

20
Q

dN/dS ratio

A
  • a highly important and common approach for testing whether selection has occurred
  • nonsynonymous substitutions per site / synonymous substitutions per site
  • = 1 : neutral, not undergoing selection
  • > 1 : positive selection, amino acid changes are beneficial and are favored
  • < 1 : purifying selection, amino acid changes are harmful and are removed, so sites stay fixed (conserved)
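
A small sketch of the interpretation rule, assuming dN and dS have already been estimated. The function name and the tolerance around 1 are illustrative assumptions; real analyses use statistical tests rather than a fixed cutoff.

```python
def interpret_dnds(dn, ds, tolerance=0.05):
    """Label a dN/dS ratio as neutral, positive, or purifying selection.

    tolerance is a hypothetical band around 1 treated as neutral.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if abs(omega - 1) <= tolerance:
        return "neutral"
    if omega > 1:
        return "positive selection"
    return "purifying selection"


print(interpret_dnds(0.02, 0.2))
```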
21
Q

disparity index (I)

A

the observed difference in evolutionary patterns for a pair of sequences based on nucleotide composition
- I = (1/2) * sum over i of (xi - yi)^2 - Nd
- xi = composition of the ith nucleotide in the first sequence
- yi = composition of the ith nucleotide in the second sequence
- Nd = composition distance expected under homogeneity
values associated w disparity index:
I = 0 -> same evolutionary pressures
I > 0 -> different evolutionary pressures
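
A rough sketch of the formula on this card. It assumes xi and yi are raw nucleotide counts in each sequence, and it takes Nd as an input because the card does not say how the homogeneity term is obtained. Both function names are hypothetical.

```python
from collections import Counter


def composition_distance(seq_x, seq_y):
    """(1/2) * sum over A, C, G, T of (x_i - y_i)^2, using nucleotide counts."""
    cx, cy = Counter(seq_x.upper()), Counter(seq_y.upper())
    return 0.5 * sum((cx[base] - cy[base]) ** 2 for base in "ACGT")


def disparity_index(seq_x, seq_y, n_d):
    """I = composition distance - Nd, with Nd supplied by the caller (assumption)."""
    return composition_distance(seq_x, seq_y) - n_d


print(composition_distance("AACG", "ACGT"))
```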

22
Q

neutral theory of molecular evolution (Kimura 1968)

A
  • most mutations are neutral or “nearly neutral”
  • it is a basic principle that differences in fecundity lead to natural selection and fixation of mutations
  • substitution patterns reflect selection
23
Q

synonymous

A
  • a substitution where the amino acid stays the same
  • more likely to be neutral
24
Q

nonsynonymous

A
  • a substitution where the amino acid changes
  • more likely to change phenotype
  • positive selection may result from a beneficial change in phenotype
25
Q

neutrality

A
  • dN/dS ratio = 1, where dN and dS are equal; indicates no selection is happening
26
Q

positive selection

A
  • when the dN/dS ratio > 1
  • a mutation is beneficial, so selection is occurring in favor of that mutation
27
Q

purifying selection

A
  • when the dN/dS ratio < 1
  • a mutation is detrimental, so selection removes it and keeps the site fixed in the population
28
Q

What is the perspective that molecular genetics uses to examine variation?

A

molecular evolution/genetics focuses on fixed differences between species

29
Q

What is the perspective that population genetics uses to examine variation?

A

population genetics focuses on the differences between populations of one species
- for example, how does a mountain range separating two populations of the same species affect how those populations have evolved

30
Q

What parameters are estimated in population genetics?

A

gene pool, allele frequency, genotype frequency
- these population parameters will affect the gene pool in a predicted way

31
Q

what are the basics of the hardy weinberg equilibrium (HWE)?

A
  • extending Mendel’s law of inheritance to populations yields HWE
  • when gametes containing either of two alleles, A or a, unite at random to form the next generation, the genotype frequencies in the offspring (zygotes) are AA : Aa : aa = p^2 : 2pq : q^2
  • genotype frequencies are determined by, and maintained through, the allele frequencies
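
A minimal sketch of the p^2 : 2pq : q^2 expectation for one locus with two alleles, where p is the frequency of A and q = 1 - p. The function name is hypothetical.

```python
def hwe_genotype_frequencies(p):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium."""
    if not 0 <= p <= 1:
        raise ValueError("allele frequency p must be between 0 and 1")
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


print(hwe_genotype_frequencies(0.5))
```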
32
Q

what are the assumptions of the HWE?

A

allele frequencies will remain constant over time if these assumptions are met:
- random mating
- infinite population size
- no migration
- no selection
- no mutation
violations to these assumptions have predicted effects on allele and genotype frequencies

33
Q

how does violating assumptions of HWE affect parameters?

A

inbreeding - decreases heterozygosity, so genotype frequencies change but allele frequencies do not; can lead to heritable diseases
genetic drift (small pop) - frequencies randomly drift towards one allele, so we converge on one allele type (fixation), but which allele becomes fixed is random
migration - may lead to admixture, combining two or more pops w different allele frequencies into one group
selection - may be recessive, dominant, or additive; a favored allele increases in frequency and can become fixed in a population
mutation - randomly changes genotypes
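
A toy sketch of genetic drift in a small population, using a simple Wright-Fisher style resampling of 2N allele copies each generation. This specific model, the function name, and the parameter values are assumptions for illustration, not something stated on the card.

```python
import random


def simulate_drift(p0, pop_size, generations, seed=0):
    """Track the frequency of allele A under random sampling alone."""
    rng = random.Random(seed)
    n_copies = 2 * pop_size          # diploid population: 2N allele copies
    p = p0
    trajectory = [p]
    for _generation in range(generations):
        # Each copy in the next generation is A with probability p.
        count_a = sum(1 for _copy in range(n_copies) if rng.random() < p)
        p = count_a / n_copies
        trajectory.append(p)
    return trajectory


trajectory = simulate_drift(p0=0.5, pop_size=10, generations=50, seed=1)
print(trajectory[-1])
```

Rerunning with different seeds shows the point of the card: small populations tend to end at 0 or 1, and which allele is fixed varies from run to run.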

34
Q

what can you estimate if you know allele and genotype frequencies?

A

we can go backwards and guess which assumptions were violated
- inbreeding rates
- population sizes
- effective population size (number of breeders)
- migration/dispersal
- population structure/gene flow
- recent changes in population sizes
- selection coefficients
- genotype-phenotype associations

35
Q

how do we estimate population structure with fixation index (Fst) values?

A
  • we look at how alleles are distributed among vs within populations
  • Fst is an estimate of the genetic divergence between populations
  • Fst = AP / (WI + AI + AP)
    AP = estimated variance in allele frequencies Among Populations
    WI = estimated variance in allele frequencies Within Individuals
    AI = estimated variance in allele frequencies Among Individuals
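
A minimal sketch of the ratio above, assuming the three variance components have already been estimated elsewhere. The function name and the example values are hypothetical.

```python
def fst_from_variance_components(ap, ai, wi):
    """Fst = AP / (WI + AI + AP).

    ap: variance among populations
    ai: variance among individuals
    wi: variance within individuals
    """
    total = ap + ai + wi
    if total <= 0:
        raise ValueError("total variance must be positive")
    return ap / total


print(fst_from_variance_components(2.0, 1.0, 5.0))
```

A value near 0 means almost none of the variation is among populations; larger values mean the populations are more differentiated.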
36
Q

what are microsatellites?

A

short repeats found within a species and certain populations may have varying numbers of these repeats
- short segment of DNA, usually one to six or more base pairs in length, that is repeated multiple times in succession at a particular genomic location

37
Q

how are microsatellites genotyped?

A
  • obtain primer for microsat
  • PCR
  • fragment analysis
  • see how big the pieces are to determine how many repeats they have
  • genotype
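
A rough sketch of the last two steps, turning fragment sizes into repeat counts. It assumes the amplified fragment is the flanking sequence (including primers) plus the repeat array; the flank length, motif length, and fragment sizes are made-up values.

```python
def repeat_count(fragment_bp, flank_bp, motif_bp):
    """Estimate the number of repeats from a PCR fragment length."""
    if motif_bp <= 0:
        raise ValueError("motif length must be positive")
    repeat_region = fragment_bp - flank_bp
    if repeat_region < 0:
        raise ValueError("fragment is shorter than the flanking sequence")
    return round(repeat_region / motif_bp)


def genotype(allele_fragments_bp, flank_bp, motif_bp):
    """Genotype as a sorted tuple of repeat counts, one per allele."""
    return tuple(sorted(repeat_count(f, flank_bp, motif_bp)
                        for f in allele_fragments_bp))


# Hypothetical dinucleotide locus with 110 bp of flanking sequence.
print(genotype((154, 150), flank_bp=110, motif_bp=2))
```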
38
Q

populations

A

group of individuals of one species living in the same geographical area

39
Q

subpopulations

A

local populations within which most individuals find their mates

40
Q

gene pool

A

all genetic variation within a population

41
Q

allele

A

variant at a locus, comes from a mutation

42
Q

locus

A

independent location on a chromosome, can be a gene

43
Q

allele frequency

A

proportion of any specific allele in a population

44
Q

genotype frequency

A

proportion of individuals in a population with a specific genotype
(in diploids, the genotype is the combination of the two alleles an individual carries: heterozygous or homozygous)
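
A minimal sketch of how genotype and allele frequencies are computed from genotype counts at one locus with two alleles in a diploid sample. The function name and the counts are hypothetical.

```python
def frequencies_from_counts(n_AA, n_Aa, n_aa):
    """Return (genotype frequencies, allele frequencies) from genotype counts."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("need at least one individual")
    genotype_freqs = {"AA": n_AA / n, "Aa": n_Aa / n, "aa": n_aa / n}
    # Each individual carries two allele copies; heterozygotes carry one A.
    p = (2 * n_AA + n_Aa) / (2 * n)
    allele_freqs = {"A": p, "a": 1 - p}
    return genotype_freqs, allele_freqs


genotype_freqs, allele_freqs = frequencies_from_counts(25, 50, 25)
print(genotype_freqs, allele_freqs)
```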

45
Q

Hardy Weinberg equilibrium

A

when gametes containing either of two alleles, A or a, unite at random to form the next generation, the genotype frequencies in the offspring (zygotes) are AA : Aa : aa (also p^2 : 2pq : q^2) and p + q = 1

46
Q

inbreeding

A

violates the random mating assumption, decreases heterozygosity and usually fitness

47
Q

genetic drift

A
48
Q

migration

A
  • movement of individuals between populations followed by breeding
49
Q

selection

A
50
Q

additive selection

A
51
Q

recessive selection

A
52
Q

dominant selection

A
53
Q

fixation index (Fst)

A
54
Q

microsatellites

A

a short segment of DNA, usually one to six or more base pairs in length, that is repeated multiple times in succession at a particular genomic location. These DNA sequences are typically non-coding

55
Q

how are phenotypes associated with genotypes?

A
56
Q

why are phenotypes associated with genotypes?

A
57
Q

how do we model gene-phenotype interactions?

A
58
Q

how does linkage disequilibrium lead to haplotype blocks?

A
61
Q

how are GWAS studies performed?

A
62
Q

how are GWAS studies interpreted?

A
63
Q

what are some ways to decrease error in GWAS studies?

A
64
Q

genome wide association studies (GWAS)

A
65
Q

quantitative traits

A
66
Q

genotype-phenotype association

A
67
Q

genotype-phenotype models

A
68
Q

multiplicative : genotype-phenotype model

A
69
Q

additive : genotype-phenotype model

A
71
Q

recessive : genotype-phenotype model

A
72
Q

common dominant : genotype-phenotype model

A
73
Q

polygenic : genotype-phenotype model

A
74
Q

linkage map

A
75
Q

cM

A
76
Q

linkage disequilibrium

A
77
Q

haplotype block

A
78
Q

coefficient of linkage disequilibrium (D)

A
79
Q

TAG SNP

A
80
Q

Bonferroni correction

A
81
Q

power

A
82
Q

odds ratio

A
83
Q

multi-stage approach

A
84
Q

permutation

A
85
Q

false positives

A
86
Q

population stratification

A
87
Q

admixture

A
88
Q

why do we need NGS? what did we hope to learn?

A

most phenotypes and diseases are complex
Health things to learn
- genetic factors affecting health
- predict, prevent, detect disease
- personalized effective treatment
- monitor disease progression
Wildlife/domestic animals things to learn
- genes that affect traits
- better management and conservation
- improve important traits

89
Q

what makes up our genome?

A
  • 45% of the genome is repetitive elements
  • 30% of the genome comes from genes; of that, only about 2% is coding exons, and there are also noncoding RNAs
  • 70% of the genome is intergenic (between genes); this includes repetitive elements (simple repeats, transposons, SINEs and LINEs), conserved noncoding regions, regulatory regions, and structural regions (centromeres and telomeres)
90
Q

what types of variation are present in genomes?

A
  • deletion
  • duplication
  • inversion
  • translocation
91
Q

why do we need next gen sequencing (NGS)?

A
92
Q

elaborate on the development of NGS

A
93
Q

how has NGS impacted genomics?

A
94
Q

what is illumina sequencing technology?

A
95
Q

what are the methods of illumina sequencing technology?

A
96
Q

how is sequence data presented and formatted in FASTQ files?

A
97
Q

how is a De Novo sequencing assembly constructed using NGS?

A
98
Q

how do you evaluate how good an assembly is?

A
99
Q

how do you deal with repeats when assembling contigs and scaffolds?

A
100
Q

how/why do we re-sequence genomes to characterize variation?

A
101
Q

repetitive elements

A
102
Q

Alu transposable element

A
103
Q

L1 transposon

A
104
Q

Hemophilia A

A
105
Q

indel

A
106
Q

SNP

A
107
Q

structural variation

A
108
Q

insertion : structural variation

A
109
Q

deletion : structural variation

A
110
Q

translocation : structural variation

A
111
Q

inversion : structural variation

A
112
Q

alternative splicing

A
113
Q

MAPT gene

A
114
Q

next generation sequencing (NGS)

A
115
Q

illumina

A
116
Q

adapter

A
117
Q

barcode

A
118
Q

flow cell

A
119
Q

cluster

A
120
Q

bridge amplification

A
121
Q

cycle

A
122
Q

paired reads

A
123
Q

FASTQ

A
124
Q

phred score

A
125
Q

vector

A
126
Q

De Novo assembly

A
127
Q

C (coverage)

A
128
Q

string graph

A
129
Q

consensus

A
130
Q

N50

A
131
Q

contigs

A
132
Q

scaffolds

A
133
Q

collapsed contig

A
134
Q

repeat region

A
135
Q

mate pair reads

A
136
Q

assembly programs

A
137
Q

velvet

A
138
Q

re-sequencing

A
139
Q

split mapping

A
140
Q

what are the strategies behind genome re-sequencing?

A
141
Q

what is the design of low coverage re-sequencing?

A
142
Q

what are different types of reduced-representation sequencing?

A
143
Q

what are ampliconic libraries?

A
144
Q

what are the different types of targeted enrichment libraries?

A
145
Q

elaborate on the methods for RadSeq libraries

A
146
Q

how does one interpret the results of RadSeq libraries?

A
147
Q

how can genomics be used to understand adaptation?

A
148
Q

genome re-sequencing

A
149
Q

low-coverage sequencing

A
150
Q

reduced-representation sequencing

A
151
Q

restriction enzyme digestion

A
152
Q

Plasmodium falciparum

A
153
Q

amylase

A
154
Q

targeted enrichment

A
155
Q

uniplex

A
156
Q

multiplex

A
157
Q

RainStorm

A
158
Q

hybridization

A
159
Q

oligo probes

A
160
Q

biotin

A
161
Q

streptavidin

A
162
Q

miller syndrome

A
163
Q

RadSeq

A
164
Q

Sbf

A
165
Q

ApeKI

A
166
Q

GBS

A
167
Q

RadTag

A
168
Q

sliding window analysis

A
169
Q

selective sweep

A
170
Q

Bobcat

A
171
Q

GPR158

A
172
Q

LECT2

A
173
Q

LECT

A
174
Q

TRPM

A
175
Q

what is meant when referring to the dynamic nature of gene expression?

A
176
Q

what are the pitfalls of gene expression analysis?

A
177
Q

what are some experimental approaches needed to understand gene expression?

A
178
Q

how are microarrays designed?

A
179
Q

how are microarrays analyzed?

A
180
Q

what are the 7 main steps to differential gene expression?

A
181
Q

how is RNAseq data analyzed?

A
182
Q

elaborate on microarray and RNAseq analysis?

A
183
Q

what are the main approaches to data analysis of gene expression?

A
184
Q

what is involved in the pre-processing to clean up data?

A
185
Q

elaborate a bit on inferential (t-tests and ANOVA) and descriptive statistics (scatter plots, volcano plots)

A
186
Q

what are inferential statistics?

A
187
Q

what are descriptive statistics?

A
188
Q

how do we interpret results for biological significance?

A
189
Q

how do we analyze clustering and heatmaps?

A
190
Q

how does gene ontology allow for understanding function?

A
191
Q

functional analysis

A
192
Q

gene expression differences

A
193
Q

microarrays

A
194
Q

RNAseq

A
195
Q

inferential statistics

A
196
Q

exploratory statistics

A
197
Q

oligos

A
198
Q

probes

A
199
Q

cDNA

A
200
Q

hybridization

A
201
Q

fluorescent tags

A
202
Q

Rett syndrome

A
203
Q

alpha B-crystallin

A
204
Q

clustering

A
205
Q

classification

A
206
Q

northern blots

A
207
Q

western blots

A
208
Q

RT-PCR

A
209
Q

in situ hybridization

A
210
Q

technical replicates

A
211
Q

biological replicates

A
212
Q

RNAseq pipeline

A
213
Q

gene expression omnibus (GEO) databases

A
214
Q

metadata

A
215
Q

MIAME

A
216
Q

annotated reference

A
217
Q

FPKM

A
218
Q

fragment count

A
219
Q

isoforms

A
220
Q

preprocessing

A
221
Q

systematic bias

A
222
Q

normalization

A
223
Q

scatter plot

A
224
Q

volcano plot

A
225
Q

heat map

A
226
Q

validation

A
227
Q

gene ontology

A
228
Q

cellular component

A
229
Q

biological process

A
230
Q

molecular function

A
231
Q

enrichment analysis

A
232
Q

pathways

A