Midterm 2 Flashcards
Signal Detection theory:
- hit
- miss
- false alarm
- correct reject
hit: correct answer >> signal is present and decision is yes
miss: wrong answer >> signal is present and decision is no
false alarm: wrong answer >> signal is absent and decision is yes
correct reject: correct answer >> signal is absent and decision is no
internal response
variable/value that forms basis of observer’s decision (x axis)
criterion on a signal present/absent graph
- false alarm? correct reject? hit? miss?
- left of the criterion line the decision is no; right of the criterion line it is yes
- any area under the signal-absent curve (left curve) that is right of the line is a false alarm, and any area left of the line is a correct reject
- any area under the signal-present curve (right curve) that is right of the line is a hit, and any area left of the line is a miss (see the sketch below).
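A minimal Python sketch (not from the original cards) of how the four outcomes fall out of two Gaussian internal-response curves; the curve means, spread, and criterion are assumed values for illustration:

```python
# Hypothetical sketch: four SDT outcome rates for a given criterion,
# assuming Gaussian internal-response curves (all parameter values made up).
from scipy.stats import norm

mu_absent, mu_present, sigma = 0.0, 1.5, 1.0  # assumed curve means and shared spread
criterion = 1.0                               # decision is "yes" right of this line

false_alarm = norm.sf(criterion, loc=mu_absent, scale=sigma)    # absent curve, right of line
correct_rej = norm.cdf(criterion, loc=mu_absent, scale=sigma)   # absent curve, left of line
hit = norm.sf(criterion, loc=mu_present, scale=sigma)           # present curve, right of line
miss = norm.cdf(criterion, loc=mu_present, scale=sigma)         # present curve, left of line

print(f"hit={hit:.3f} miss={miss:.3f} FA={false_alarm:.3f} CR={correct_rej:.3f}")
```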
accuracy equation
accuracy = (#present × %hits + #absent × %CR) / total**
- need the present/absent counts and the hit/CR percentages in order to calculate accuracy (worked sketch below).
**total = #present + #absent
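A quick worked sketch of the equation above; the trial counts and rates are made-up numbers, not from the cards:

```python
# Worked example of: accuracy = (#present x %hits + #absent x %CR) / total
# (all numbers assumed for illustration)
n_present, n_absent = 40, 60    # trial counts for each type
hit_rate, cr_rate = 0.90, 0.80  # %hits on present trials, %CR on absent trials

total = n_present + n_absent
accuracy = (n_present * hit_rate + n_absent * cr_rate) / total
print(f"accuracy = {accuracy:.2f}")  # (36 + 48) / 100 = 0.84
```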
how to increase accuracy (2)
- how good is your accuracy? comparison
- information acquisition: increases correct responses (hits and CRs)
- criterion change: leads to a trade-off btwn hits and CRs
- if your accuracy is worse than what would occur by chance, it is bad accuracy (guessing would do better)
Why could peak accuracy be greater in a 20% present, 80% absent case compared to a 50/50 case?
- since it’s 80% absent, it is good to maximize correct rejects, so you would move the criterion to the right (more conservative >> say no more often >> maximizing tumor-absent correctness). Since only 20% of trials are present, the extra misses cost little accuracy (see the sweep sketch below).
- for 50/50, moving it in either direction would have trade-offs
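A small simulation sketch of this point, sweeping the criterion under both base rates; the d' of 1.5 and the grid are assumed values:

```python
# Sketch: peak accuracy vs criterion under 50/50 and 20/80 base rates
# (assumed d' = 1.5, sigma = 1; purely illustrative).
import numpy as np
from scipy.stats import norm

mu_absent, mu_present = 0.0, 1.5
criteria = np.linspace(-2, 4, 601)
hit = norm.sf(criteria, loc=mu_present)  # P(yes | present) at each criterion
cr = norm.cdf(criteria, loc=mu_absent)   # P(no | absent) at each criterion

for p_present in (0.5, 0.2):
    acc = p_present * hit + (1 - p_present) * cr
    best = acc.argmax()
    print(f"P(present)={p_present}: peak accuracy {acc[best]:.3f} "
          f"at criterion {criteria[best]:.2f}")
# The 20/80 peak is higher, and its best criterion sits further right (conservative).
```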
why would you change the criterion? (3)
- when maximizing accuracy >> depends on the signal present/absent proportions
- special case: when 50/50 present/absent >> optimal criterion = where the graphs intersect
- when optimizing a parameter other than accuracy (eg. cost) >> balance (where they intersect) the cost of FAs vs the cost of misses to minimize total cost
calculating total cost (money wasted by incorrect responses)
total cost = #present × %miss × miss cost + #absent × %FA × FA cost
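A one-line worked sketch of the formula; the counts, error rates, and unit costs are assumed:

```python
# total cost = #present x %miss x miss cost + #absent x %FA x FA cost
# (all numbers assumed for illustration)
n_present, n_absent = 20, 80
miss_rate, fa_rate = 0.10, 0.15   # P(miss | present), P(FA | absent)
cost_miss, cost_fa = 100.0, 10.0  # cost per miss vs per false alarm

total_cost = n_present * miss_rate * cost_miss + n_absent * fa_rate * cost_fa
print(f"total cost = {total_cost}")  # 200.0 + 120.0 = 320.0
```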
Discriminability + reducing errors (2)
- being able to distinguish btwn stimuli >> errors due to overlap
2 ways to decrease the overlap
- increase separation
- reduce spread
Cohen’s d (d’)
- what does it represent?
- how to increase?
- what if you don’t have sigma?
- worst case?
- represents magnitude of effect of IV on DV (interval/ratio); expressed in units of SD
d’ = separation/spread = (u2-u1)/sigma
- if you don’t have sigma, you can use the pooled SD: sqrt((SD1^2 + SD2^2)/2)
- inc d’ by increasing separation or decreasing spread
worst case scenario: d' = 0 >> no separation = no information
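A small sketch of the d' formula with the pooled-SD fallback; the example means and SDs are invented:

```python
# Sketch of d' = (u2 - u1) / sigma, falling back to the pooled SD when
# the population sigma is unknown (example numbers invented).
import math

def d_prime(mu1, mu2, sd1, sd2=None):
    """Separation / spread; pools the two SDs if a second one is given."""
    sigma = sd1 if sd2 is None else math.sqrt((sd1**2 + sd2**2) / 2)
    return (mu2 - mu1) / sigma

print(d_prime(100, 115, 15))      # 1.0 >> one SD of separation
print(d_prime(100, 115, 10, 20))  # pooled SD = sqrt((100 + 400)/2) ~ 15.81 >> ~0.95
```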
parameter vs statistic
parameter: true value of quantity in popn
statistic: value of the same quantity based on a sample (statistic used to estimate parameter)
u vs M
- accuracy or precision?
u = population mean
M = sample mean >> unbiased estimator of u
- unbiased = accuracy, not precision
sigma^2 vs s^2
- why squared?
- SD relation?
sigma^2 = popn variance; s^2 = sample variance >> unbiased estimator of sigma^2
- s means standard deviation (SD = sqrt of variance)
- SD (s) >> estimates sigma (note: s^2 is unbiased for sigma^2, but s itself is slightly biased low)
Gaussian Distribution
- characteristics
- probability density + total area under curve
Characteristics:
- normal distribution/bell curve
- typically used for weight/height/IQ scores/exam scores
- unimodal
- symmetric
- goes from -inf to inf (No max/min)
- probability of any single value is zero if it is a probability density graph (probability = area under the curve)
- total area under the curve = 1
Gaussian Distribution:
- 1 SD, 2 SD, 3 SD >> chance of value occurring?
- SDT warning!
within:
1 SD: 68%
2 SD: 95%
3 SD: 99.7%
- when calculating SDT, the percentages exclude the tails >> beware!
- sampling distribution is based on the assumption that H0 is TRUE!
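A quick check of the 1/2/3-SD figures against the standard normal CDF (not from the cards):

```python
# Verify the 68/95/99.7 coverage figures from the standard normal CDF.
from scipy.stats import norm

for k in (1, 2, 3):
    within = norm.cdf(k) - norm.cdf(-k)    # area within +/- k SD of the mean
    print(f"within {k} SD: {within:.1%}")  # ~68.3%, ~95.4%, ~99.7%
```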
Uniform Distribution
- each event is equally likely (eg. throwing a fair die = 1/6 probability per face) >> discrete
Poisson Distribution
- few events vs many events
- usually positively skewed
- used when random events occur at a certain rate over a fixed time period (eg. hourly # of customers at a bank)
- if expecting few events, it will be positively skewed
- if there are more events, distribution will become more symmetric
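A brief sketch of the few-vs-many-events point: Poisson skewness is 1/sqrt(lambda), so it shrinks as the expected count grows. The lambda values below are arbitrary:

```python
# Sketch: Poisson skewness = 1/sqrt(lambda) >> more expected events, more symmetric.
from scipy.stats import poisson

for lam in (1, 5, 50):  # arbitrary expected event counts
    skew = float(poisson.stats(lam, moments='s'))
    print(f"lambda={lam}: skewness = {skew:.2f}")  # 1.00, 0.45, 0.14
```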
z-score
- difference in score as a proportion of the SD (with respect to the population!) >> units of SD (basically, how many SD units away from the mean you are)
- if you are ranking one score within the popn: z = (x - u)/sigma >> similar to d'
- if you are finding where a sample mean falls in the sampling distribution of the mean: z = (Xavg - u)/(sigma/sqrt(n)) (both forms sketched below)
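Both z-score forms from this card, sketched with made-up population values:

```python
# Sketch of both z-score forms (population values assumed for illustration).
import math

mu, sigma = 100, 15  # assumed population mean and SD

# ranking one score within the popn
x = 130
z_score = (x - mu) / sigma  # 2.0 >> two SDs above the mean

# locating a sample mean in the sampling distribution of the mean
x_bar, n = 104, 36
z_mean = (x_bar - mu) / (sigma / math.sqrt(n))  # 4 / 2.5 = 1.6

print(z_score, z_mean)
```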
“the standard normal”
distribution of z scores
- M = 0 and SD = 1 (the t-distribution is likewise centered at 0, but with slightly heavier tails)
percentile rank
- how is it similar to z-score?
- what if it's gaussian?
the percentage of measurements in the distribution that fall below that score's value (eg. a score in the 99th percentile >> 99% of all scores are below it)
- z-score and percentile rank both look at relative standing
- if it's gaussian we can calculate the percentile from the z-score (eg. z = 1 >> one SD above the mean, which sits at the 50th percentile >> add 34 to 50 = 84th percentile)
how to calculate percentile from standard normal distribution table
- what if z score is negative?
- what does percentile represent?
- find first 2 digits in the first column, find third digit in the first row
- if z score is negative, do 1 - (positive percentile)
- percentile represents area before (left of) the z-score
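Instead of the printed table, the standard normal CDF gives the percentile directly, and it already handles negative z the same way as the 1 - (positive percentile) rule:

```python
# Percentile from a z-score without the printed table (area left of z).
from scipy.stats import norm

print(norm.cdf(1.00))   # ~0.8413 >> 84th percentile (the z = 1 example above)
print(norm.cdf(-1.00))  # ~0.1587 = 1 - 0.8413, matching the negative-z rule
```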
Sir Francis Galton: CLT
- rule of thumb
central limit theorem: if x is the sum of many independent, identically distributed variables (eg. uniform) with a non-zero SD, then the distribution of x will approach gaussian
- rule of thumb: the distribution of the sample mean will be approximately gaussian if n > 30
- even if the original data is skewed, the avg will be gaussian
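A small simulation sketch of the rule of thumb: means of deliberately skewed (exponential) samples come out roughly gaussian at n = 30. All numbers are illustrative:

```python
# Sketch: CLT on skewed raw data (exponential), n = 30 per the rule of thumb.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 10_000
samples = rng.exponential(scale=1.0, size=(reps, n))  # heavily skewed raw data
means = samples.mean(axis=1)                          # 10,000 sample means

# CLT prediction: mean of means ~ 1, SD of means ~ 1/sqrt(n)
print(f"mean of means = {means.mean():.3f} (expect ~1.000)")
print(f"SD of means   = {means.std():.3f} (expect ~{1 / np.sqrt(n):.3f})")
```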
effect size
- small, med, large
describes relationship among variables in terms of size/amt/strength >> descriptive >> shows extent to which results are meaningful
effect size based on Cohen’s d:
small: 0.2
med: 0.5
large: 0.8
purposes of inferential stats (2)
parameter estimation: estimate value of population parameter based on random sample
hypothesis testing: whether effect occurred by chance or not (probability)