Learning curve Flashcards
rule of 10 000 hours of practice
idea that after 10 000 hours of practice you become an expert in the field
no empirical data supports this claim
it has actually been refuted
descriptive models
they only fit the data - the parameters tell us nothing about the underlying cognition
cognitive models
parameters actually mean something -> they reflect an underlying cognitive mechanism, derived from psychological theory
exponential model
P = 1 - exp(-u*t)
where
P - performance scaled between 0 and 1 (proportion correct)
t - trial number
u - learning rate
what are characteristics of exponential model?
performance improves very quickly at the beginning, then plateaus
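The fast-start-then-plateau shape is easy to check numerically; a minimal sketch of the exponential model (the learning rate u = 0.3 is an illustrative value, not from the source):

```python
import numpy as np

def exp_learning(t, u):
    """Exponential model of practice: P = 1 - exp(-u * t)."""
    return 1.0 - np.exp(-u * t)

trials = np.arange(21)
P = exp_learning(trials, u=0.3)
gains = np.diff(P)  # improvement from one trial to the next
# rapid early learning, then a plateau: each gain is smaller than the last
print(bool(np.all(gains[1:] < gains[:-1])))  # prints True
```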
fitting model to the data
estimate model parameters given the data
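One common way to estimate the parameters is least-squares fitting, e.g. with `scipy.optimize.curve_fit`; a sketch with simulated data (the true u = 0.3 and noise level are assumed for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(t, u):
    """Exponential learning model: P = 1 - exp(-u * t)."""
    return 1 - np.exp(-u * t)

# simulate noisy performance data from a known learning rate
rng = np.random.default_rng(0)
t = np.arange(1, 26)
observed = exp_model(t, 0.3) + rng.normal(0, 0.03, size=t.size)

# recover the learning rate from the noisy observations
(u_hat,), _ = curve_fit(exp_model, t, observed, p0=[0.1])
print(round(float(u_hat), 2))  # estimate should land near the true u
```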
concave models of the law of practice
both power and exponential functions are concave -> decelerating curves (gains shrink with practice)!
hyperbolic function
P = t/(t+d)
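The hyperbolic function is concave as well; a minimal sketch (d = 2 is an arbitrary illustrative value) showing that its trial-to-trial gains also shrink:

```python
def hyperbolic(t, d):
    """Hyperbolic learning curve: P = t / (t + d); d controls how fast P rises."""
    return t / (t + d)

# like the power and exponential curves, gains decrease trial by trial
gains = [hyperbolic(t + 1, 2) - hyperbolic(t, 2) for t in range(5)]
print([round(g, 3) for g in gains])
```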
What is Gaussian noise?
type of statistical noise whose probability density function (PDF) follows the normal distribution
can be simulated with scipy.stats.norm.rvs or numpy.random.normal
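A minimal sketch using numpy (mean 0 and sd 0.05 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# 10 000 draws of Gaussian noise with mean 0 and standard deviation 0.05
noise = rng.normal(loc=0.0, scale=0.05, size=10_000)
# sample statistics should be close to the requested parameters
print(round(float(noise.mean()), 3), round(float(noise.std()), 3))
```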
What did Estes assume about the exponential law of practice?
the change in performance over time depends on the performance yet to be achieved - the elements still to be learned
dP/dt = u(P max - P)
dP/dt = changing performance over time
P max - maximum performance
P - current performance
u - learning rate
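Estes' differential equation can be checked numerically: integrating dP/dt = u(Pmax - P) with a simple Euler step reproduces the closed-form exponential curve P(t) = Pmax(1 - exp(-ut)) for P(0) = 0 (parameter values are illustrative):

```python
import numpy as np

u, P_max, dt = 0.3, 1.0, 0.001
ts = np.arange(0, 10, dt)
P = np.zeros_like(ts)
# Euler integration of dP/dt = u * (P_max - P), starting from P(0) = 0
for i in range(1, len(ts)):
    P[i] = P[i - 1] + dt * u * (P_max - P[i - 1])

# closed-form solution of the same ODE
closed = P_max * (1 - np.exp(-u * ts))
print(float(np.max(np.abs(P - closed))) < 1e-3)  # prints True
```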
What is an alternative to concave models?
an s-shaped learning function!
when should you use concave exponential function?
P = 1 - exp(-ut)
while learning single words (items)
when should you use compound exponential function?
P = (1 - exp(-ut))**c
when one has to learn sets of c words (fragments)
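Raising the exponential to the power c is what produces the s-shape: early gains first accelerate, then decelerate. A sketch (u = 0.2 and c = 5 are illustrative values):

```python
import numpy as np

def compound_exp(t, u, c):
    """Compound exponential: P = (1 - exp(-u * t)) ** c, s-shaped for c > 1."""
    return (1.0 - np.exp(-u * t)) ** c

t = np.arange(30)
P = compound_exp(t, u=0.2, c=5)
gains = np.diff(P)
# s-shape: the biggest per-trial gain is NOT at the start (unlike concave models)
print(int(gains.argmax()))
```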
maximum likelihood function
used to estimate the parameters of a probability distribution by maximizing the likelihood function, so that under the assumed statistical model the observed data are most probable
in short: given the model, find parameters for which data are most probable
probability
Prob(data | model, parameters)
data - people who pick option x
given
model - number of people asked
parameters - probability of picking option x
likelihood
Likelihood(parameters | model, data)
parameters - probability of picking option x
given
model - number of people asked
data - people who pick option x
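The probability/likelihood distinction can be sketched with a binomial model (the numbers - 7 of 10 people picking option x - are made up for illustration):

```python
import math

def binom_pmf(k, n, p):
    """Probability of k successes in n trials, success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# probability: parameters fixed (p = 0.5), data varies
# Prob(data | model: n = 10 asked, parameter p)
prob = binom_pmf(7, 10, 0.5)

# likelihood: data fixed (7 of 10 picked x), parameter p varies
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=lambda p: binom_pmf(7, 10, p))
print(best_p)  # the maximum likelihood estimate, 7/10
```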
what is log likelihood? why is it preferred for maximum likelihood calculations?
natural logarithm of the likelihood function
-> it turns products into sums, making complex likelihood functions easier to deal with
-> it avoids numerical underflow: multiplying many small probabilities quickly rounds to zero, while summing their logs stays finite
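The underflow problem is easy to demonstrate: a product of 100 small probabilities collapses to exactly 0.0 in floating point, while the log-likelihood stays finite (the probability value 1e-5 is an arbitrary illustration):

```python
import math

probs = [1e-5] * 100  # 100 tiny independent probabilities

# naive product: underflows below the smallest representable float
product = 1.0
for p in probs:
    product *= p
print(product)  # prints 0.0

# log-likelihood: sum of logs remains a perfectly usable number
log_lik = sum(math.log(p) for p in probs)
print(log_lik)
```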
How to use optimization to find maximum likelihood estimate?
the idea is that you want to find the optimum (max or min) - similar to finding the deepest point in a lake
you can use
derivatives = give you local information about the slope or the direction of the function at a given point
positive derivative = function is increasing in this direction
negative derivative = function is decreasing in this direction
concavity (2nd derivative) = tells you whether you are in a convex region (bowl) or a concave region (hill)
- helps to get an idea how near you are to a minimum/maximum
What is a local optimum?
an illusory ‘‘deepest point of the lake’’ - lower than its surroundings, but not the lowest point in the lake
you can use special algorithms like simulated annealing or genetic algorithms to avoid getting stuck in one
What can we do instead of maximizing the likelihood?
You can minimize! -> then you minimize NEGATIVE log-likelihood
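A sketch tying the pieces together: fit the learning rate u of the exponential model by minimizing the negative log-likelihood with `scipy.optimize.minimize_scalar` (the simulated data, true u = 0.25, and Gaussian residuals with known sigma are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# simulate learning data from a known curve plus Gaussian noise
rng = np.random.default_rng(1)
true_u, sigma = 0.25, 0.05
t = np.arange(1, 31)
data = 1 - np.exp(-true_u * t) + rng.normal(0, sigma, size=t.size)

def nll(u):
    """Negative log-likelihood, assuming Gaussian residuals with sd sigma."""
    resid = data - (1 - np.exp(-u * t))
    return 0.5 * np.sum(resid**2) / sigma**2 \
        + t.size * np.log(sigma * np.sqrt(2 * np.pi))

# minimizing the NEGATIVE log-likelihood = maximizing the likelihood
res = minimize_scalar(nll, bounds=(0.01, 2.0), method="bounded")
print(round(float(res.x), 2))  # estimated learning rate, near true_u
```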
multiple regression
y = 3 - 0.2 x + 0.5 w
- what is B0?
3 = intercept!
baseline value of DV y when both x and w are zero
multiple regression y = 3 - 0.2 x + 0.5 w
- what is B1?
-0.2 = slope/effect of predictor x on y
how much y changes when x changes by one unit (here y decreases by 0.2)
multiple regression y = 3 - 0.2 x + 0.5 w
- what is B2?
0.5 = slope/effect of predictor w on y
quantifies how much y changes when w changes by one unit
what is sigma?
standard deviation of the residuals
the likelihood assumes that residuals (errors) follow a normal distribution with mean 0 and standard deviation sigma
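A sketch recovering the regression coefficients B0 = 3, B1 = -0.2, B2 = 0.5 and the residual sd sigma from simulated data (sample size and sigma = 1 are illustrative assumptions):

```python
import numpy as np

# simulate data from y = 3 - 0.2*x + 0.5*w + normal noise
rng = np.random.default_rng(0)
n, sigma = 500, 1.0
x = rng.normal(size=n)
w = rng.normal(size=n)
y = 3 - 0.2 * x + 0.5 * w + rng.normal(0, sigma, size=n)

# design matrix with an intercept column for B0
X = np.column_stack([np.ones(n), x, w])
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# residual standard deviation (3 parameters were estimated)
resid = y - X @ beta
sigma_hat = resid.std(ddof=3)
print(np.round(beta, 2), round(float(sigma_hat), 2))
```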