lecture 4 - model fitting
types of models
- model animal
- algorithmic model
- artificial neural network
- data-driven model
model animal
- model animals allow researchers to draw conclusions that may generalize across species
- e.g. mice are often used as models to study biological processes and behaviors relevant to humans
algorithmic model
- never touches data
- relies on algorithms or theoretical constructs
- they are abstract and typically focus on understanding or simulating processes in a hypothetical or idealized way
artificial neural network
- more of a tool than a scientific model
- typically applied in engineering contexts to process data, rather than to explain underlying biological mechanisms
- do mimic some properties of biological neural networks
data-driven model
- used by scientists to explain data
- explicitly created to analyze and interpret real data, helping scientists draw insights directly from empirical observations
George Box: ‘All models are wrong. But some models are useful’
- emphasizes that no model can perfectly represent reality, because models are simplifications of complex systems.
- they leave out details and assumptions, and therefore cannot fully capture the intricacies of real-world phenomena.
- however, despite their limitations, models can still be valuable tools.
descriptive models
- mathematical description of the data
- ‘fitting’ is important
- fitted parameters can be assessed, but they are properties of the data more than of any underlying process
process models
- mathematical description of the process that gave rise to the data
- ‘fitting’ is important
- fitted parameters have meaning because they tell us something about the generative process - i.e., how the data was produced
utility of descriptive models
- gaussian distribution (central limit theorem): models noise
- n-degree polynomial: describe the shape of data
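A minimal sketch of a descriptive fit, assuming NumPy and toy data (all names and values here are illustrative): the polynomial's coefficients summarize the shape of the data without saying anything about how it was generated.

```python
import numpy as np

# Hypothetical data: the polynomial will describe its shape, nothing more.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.size)  # noisy observations

# Fit a 3rd-degree polynomial; the coefficients describe the data,
# not the process that produced it.
coeffs = np.polyfit(x, y, deg=3)
y_hat = np.polyval(coeffs, x)

sse = np.sum((y - y_hat) ** 2)  # quality of the descriptive fit
print(coeffs, sse)
```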
utility of process models
- parameters have cognitive/neural meaning
E.G.,
- parameters quantify the process by which the brain reaches a decision
- modeler commits to latent variables (e.g., action value in RL)
why are process models harder to formulate
- because you have to think about the underlying process, not just the data
- i.e., they require a deep understanding of the cognitive or neural mechanisms, not just the ability to fit a dataset
process models or descriptive models
- many models are a bit of both descriptive and process categories
- e.g., signal detection theory (SDT)
occam’s razor/rule
- helps decide what the better model is
- lex parsimoniae: suggests that “entities are not to be multiplied without necessity.”
- if two models give an accurate description of the data, the simpler model is to be preferred
why choose the simpler model
- generalization: a good model is not dependent on the experiment: it generalizes.
- there is always a model with more parameters, giving a better fit
key questions for model selection
- does the data require more complexity: if a simpler model fits the data adequately, adding complexity might not be necessary.
- are you fitting the process or the noise: the goal is to model the actual process that generated the data, not the random fluctuations or noise within it.
overfitting
happens when a model has too many parameters for the data, so it fits the noise in a specific dataset rather than the true underlying pattern
cross-validation
- main method against overfitting
- splits the data into fit (train) and test datasets
- if you’re fitting noise that is unique to the training set, your model should fail to predict the test data: the cross-validated fit will scatter around zero, CV r² ~ N(0, σ)
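A minimal cross-validation sketch, assuming NumPy and a toy linear dataset (all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 100)
y = 2.0 * x + rng.normal(0, 0.3, size=x.size)  # toy data

# Split into fit (train) and test halves.
idx = rng.permutation(x.size)
train, test = idx[:50], idx[50:]

# Fit on the training set only.
coeffs = np.polyfit(x[train], y[train], deg=1)

# Evaluate on held-out data: if the model fit noise unique to the
# training set, predictions here should hover around chance.
y_pred = np.polyval(coeffs, x[test])
cv_r = np.corrcoef(y_pred, y[test])[0, 1]
print("cross-validated r:", cv_r, "r^2:", cv_r ** 2)
```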
comparing model fits
- cross validation: splits data into training and test sets to check if the model generalizes well to unseen data
- information criteria: downweight the quality of fit of a model by penalizing the number of parameters.
types of information criteria
- bayesian information criterion (BIC): -2ln(L) + k * ln(n)
- akaike information criterion (AIC): -2ln(L) + 2k
k = number of parameters, n = sample size, ln(L) = log-likelihood
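A small sketch of both criteria as plain functions, assuming the log-likelihoods and parameter counts come from fits you already ran (the numbers below are hypothetical):

```python
import numpy as np

def bic(log_likelihood, k, n):
    # BIC = -2 ln(L) + k * ln(n)
    return -2 * log_likelihood + k * np.log(n)

def aic(log_likelihood, k):
    # AIC = -2 ln(L) + 2k
    return -2 * log_likelihood + 2 * k

# Hypothetical fits: model B has one more parameter and a slightly
# better log-likelihood; lower IC values are better.
print(bic(-420.0, k=3, n=200), aic(-420.0, k=3))
print(bic(-419.0, k=4, n=200), aic(-419.0, k=4))
```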
how do information criteria work
- If two models fit the data equally well, the one with fewer parameters will be preferred because it incurs a smaller penalty.
- the criteria favor simpler models unless the more complex model demonstrates a significantly better fit to the data
BIC & AIC: quantity of parameters
- there is flexibility in the number of parameters in a model
- it is possible to formulate a model that decreases the number of parameters without losing predictive power
BIC and AIC: what does it mean to say that they are conservative metrics
- BIC and AIC are conservative: they favor simpler models that avoid overfitting by penalizing added complexity
- BIC is a more conservative criterion as it includes a stronger penalty for the number of parameters than AIC, especially when the sample size is large
- therefore, use cross-validation first; if that is not possible, fall back on information criteria
what does it mean to ‘fit a model to data’
- it means we found the parameters for our model that, when used to create a prediction (simulate), best explain our data
model fitting methods
- quantify explanation differently
- search the parameters differently
we want to find parameters that
- minimise the (euclidean) distance between the model and the data (SSE)
- maximise the likelihood of the data, given the model parameters (MLE)
likelihood
- p(y∣θ)
- tells us how likely the data is given a specific set of parameters
maximum likelihood
directly uses the likelihood function to find the set of “best” parameters that maximizes the likelihood of the observed data
How does a model account for variability or noise in data?
By assigning probabilities to data points instead of deterministic values: the model assumes the data is generated as y = f(θ) + ε, where the noise ε follows a normal distribution N(0, σ).
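A minimal sketch of evaluating p(y∣θ) under this assumption, using SciPy's normal density; the linear f(θ) and all values are hypothetical:

```python
import numpy as np
from scipy.stats import norm

# Toy model: f(theta) is a straight line, y = theta * x + noise.
theta, sigma = 2.0, 0.5             # hypothetical parameter values
x = np.array([0.1, 0.4, 0.7, 1.0])
y = np.array([0.3, 0.9, 1.3, 2.1])  # observed data

# p(y | theta): each data point gets a probability density under
# N(f(theta), sigma), rather than a deterministic prediction.
log_lik = norm.logpdf(y, loc=theta * x, scale=sigma).sum()
print("log p(y | theta) =", log_lik)
```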
what is f(θ)
- function of θ
- θ can be a scalar, vector, or matrix of parameters; f(θ) maps them to a prediction or a distribution
- each setting of θ defines a specific model instance
What does it mean for a model to be a probability density function (pdf)?
- each set of parameters defines a model instance and a probability distribution over the outcomes of y, rather than a deterministic value
- MLE identifies the parameter set (θ-hat) that maximizes p(y∣θ).
how are p(y∣θ) and p(θ|y) related to each other
- p(y∣θ) - likelihood: by varying θ, we can quantify the probability of the data given θ
- p(θ|y) - posterior: the probability distribution over θ given the data; by Bayes' rule, p(θ|y) ∝ p(y∣θ)p(θ). With a flat prior, the θ that maximizes the posterior is the same θ that maximizes the likelihood
MLE vs bayes
MLE
1. only considers the likelihood term
2. ignores the prior probability term, as MLE does not assume strong prior expectations
3. ignores the marginal probability term p(y), since the data is fixed
bayes
1. combines evidence, expectations, and hypotheses
MLE problems and solutions
- problem 1: works with probabilities of observations; their product over an entire dataset becomes extremely small very fast (e.g., p = 0.00001 per unlikely observation)
- solution 1: use logarithm of the likelihood (loglikelihood)
- problem 2: many optimization methods are built around distances (e.g., SSE), which are minimized, not maximized
- solution 2: minimize negative log-likelihood, as this is mathematically equivalent to maximizing log(L), but it aligns with standard minimization-based optimization frameworks
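A minimal MLE sketch combining both solutions, assuming SciPy is available; it minimizes the negative log-likelihood of a hypothetical linear model (the log-sigma reparameterization is one common way to keep σ positive):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 80)
y = 1.5 * x + rng.normal(0, 0.4, size=x.size)  # data from a known slope

def neg_log_likelihood(params):
    slope, log_sigma = params
    sigma = np.exp(log_sigma)  # keep sigma positive
    # Negative sum of log densities: minimizing this maximizes log(L).
    return -norm.logpdf(y, loc=slope * x, scale=sigma).sum()

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
slope_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(slope_hat, sigma_hat)
```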
smoothness of the likelihood landscape
- depends on the quality and amount of data
- with more data, the landscape becomes smoother, making optimization easier and more reliable.
- sparse or noisy data can lead to a rugged likelihood surface with many local minima.
relation of the parameters in the likelihood landscape
they are interdependent: a change in one parameter can often be compensated by a change in another, producing ridges in the landscape
grid search
- simplest optimiser
- exhaustively goes through all parameter combinations in a grid to find the best model
grid search: downside
- higher number of parameters leads to exponential growth of the number of evaluations, causing computational explosion
- therefore only feasible for models with very few parameters and small search spaces
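A minimal grid search sketch over a hypothetical two-parameter model, illustrating both the exhaustive evaluation and why the evaluation count explodes as parameters are added:

```python
import itertools
import numpy as np

# Hypothetical cost: SSE of a line y = a * x + b against toy data.
x = np.linspace(0, 1, 30)
y = 2.0 * x + 0.5

def sse(a, b):
    return np.sum((y - (a * x + b)) ** 2)

# Exhaustively evaluate every (a, b) combination on a grid.
a_grid = np.linspace(0, 4, 41)
b_grid = np.linspace(-1, 2, 31)
best = min(itertools.product(a_grid, b_grid), key=lambda p: sse(*p))
print("best (a, b):", best)  # 41 * 31 = 1271 evaluations already

# Note the explosion: a third parameter with 41 values would make
# 41 * 31 * 41 = 52,111 evaluations.
```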
gradient descent
- more complicated optimiser
- systematically searches for the minimum of the cost landscape (e.g., the negative log-likelihood) by following the gradient
- more efficient than grid search — can be done in fewer iterations
- efficient for larger parameter spaces
gradient descent: downside
- relies on a smooth likelihood landscape.
- if the landscape has multiple local minima, e.g., for complicated fits (non convex parameter space), gradient descent might converge to a suboptimal solution
gradient descent: how it works
- set up a cost function J(θ) based on distance (SSE) or negative log-likelihood (MLE)
- the objective is to find the parameters θ that minimize J(θ)
- on each iteration, determine the direction to step in based on the gradient at the present step
- update based on the gradient multiplied by the learning rate
- repeat until convergence
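A minimal sketch of this recipe for a hypothetical linear model with an SSE cost; the analytic gradient and the learning rate value are illustrative:

```python
import numpy as np

x = np.linspace(0, 1, 50)
y = 3.0 * x + 1.0  # noiseless toy data for clarity

def cost(theta):
    a, b = theta
    return np.sum((y - (a * x + b)) ** 2)  # J(theta) = SSE

def gradient(theta):
    a, b = theta
    residual = y - (a * x + b)
    # Analytic gradient of the SSE with respect to a and b.
    return np.array([-2 * np.sum(residual * x), -2 * np.sum(residual)])

theta = np.array([0.0, 0.0])  # starting values
lr = 0.01                     # learning rate
for _ in range(2000):
    step = lr * gradient(theta)      # gradient times learning rate
    theta = theta - step             # move downhill
    if np.linalg.norm(step) < 1e-8:  # convergence check
        break
print(theta, cost(theta))
```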
learning rate value: too high
potentially overshooting the minimum or diverging, and becoming oversensitive to noise
learning rate value: too low
slow convergence and potentially getting stuck in local minima if the parameter space is not smooth enough
gradient descent and reinforcement learning
the gradient descent update rule is mathematically analogous to learning rules in RL; in both cases, a good choice of learning rate balances speed and stability (a speed-accuracy trade-off)
gradient descent: problem & solution
- problem: complicated fits (non-convex) run the risk of getting stuck in a local minimum/maximum
- solution: it is difficult to verify that you really found the global solution, so choose starting values wisely and check whether multiple initializations converge to the same result
bayesian fitting: hierarchical bayesian modeling
- allows us to specify the same model for all participants and fit per-subject and group-level parameter estimates simultaneously
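A minimal hierarchical sketch, assuming the PyMC library is available (other probabilistic programming tools work similarly); the data, priors, and noise level are all hypothetical:

```python
import numpy as np
import pymc as pm

# Hypothetical data: 8 subjects, 40 noisy trials each.
rng = np.random.default_rng(3)
n_subjects, n_trials = 8, 40
true_theta = rng.normal(1.0, 0.3, size=n_subjects)       # per-subject truth
subject_idx = np.repeat(np.arange(n_subjects), n_trials)
y = rng.normal(true_theta[subject_idx], 0.5)

with pm.Model():
    # Group-level parameters shared across subjects.
    group_mu = pm.Normal("group_mu", mu=0.0, sigma=1.0)
    group_sigma = pm.HalfNormal("group_sigma", sigma=1.0)
    # Per-subject parameters drawn from the group distribution.
    theta = pm.Normal("theta", mu=group_mu, sigma=group_sigma,
                      shape=n_subjects)
    pm.Normal("obs", mu=theta[subject_idx], sigma=0.5, observed=y)
    trace = pm.sample()  # posterior over group- and subject-level params
```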
parameter recovery
- fitting comes with several pitfalls; e.g., sometimes different sets of parameters produce the same outputs (the model is not identifiable)
- we need to know if our problem is fittable
- recovery ensures your modeling process produces results that are interpretable and replicable
parameter recovery recipe
- simulate using fitted parameters
- add noise
- refit
- check if parameters can be recovered
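A minimal sketch of this recipe for a hypothetical one-parameter (slope) model; a tight correlation between true and recovered parameters suggests the problem is fittable:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 60)

def simulate(slope, noise_sd):
    # Simulate data from the model with known parameters plus noise.
    return slope * x + rng.normal(0, noise_sd, size=x.size)

def fit(y):
    # Refit with the same routine used on real data (least squares here).
    return np.polyfit(x, y, deg=1)[0]

true_slopes = rng.uniform(0.5, 3.0, size=100)
recovered = np.array([fit(simulate(s, noise_sd=0.3)) for s in true_slopes])

# Check recovery: generating and recovered parameters should correlate
# strongly; a weak correlation signals a poorly specified model.
r = np.corrcoef(true_slopes, recovered)[0, 1]
print("recovery correlation:", r)
```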
parameter recovery outcome
- if the fitted parameters capture the generative process, simulating with them should produce data close to the original dataset
- if your model cannot recover known parameters, it may be overfitting, underfitting, or poorly specified.