statistical modeling Flashcards

Question 1

Q

statistical model

Answer

A

a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population)

a statistical model represents, often in considerably idealized form, the data-generating process

a statistical model is usually specified as a mathematical relationship between one or more random variables and other non-random variables

Question 2

Q

what the standard error of a model estimate (eg a parameter) depends on

Answer

A

mnemonic: DMCS (data, model, collinearities, sample size)

quality of data (eg measurement errors)
quality of the model (ie fit / low bias)
collinearities (these can increase standard error)
sample size (standard error is asymptotically proportional to 1/sqrt(n))
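a minimal sketch of two of these dependencies (sample size and collinearity), assuming an ordinary least-squares fit and the usual analytic formula SE = sqrt(diag(sigma^2 (X'X)^-1)). The helper names `ols_standard_errors` and `simulate`, and all the simulated numbers, are illustrative assumptions and not part of the card.

```python
import numpy as np

def ols_standard_errors(X, y):
    """Return OLS coefficient standard errors (intercept first)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = len(y) - X1.shape[1]              # residual degrees of freedom
    sigma2 = resid @ resid / dof            # estimated noise variance
    cov = sigma2 * np.linalg.inv(X1.T @ X1)
    return np.sqrt(np.diag(cov))

def simulate(n, rho, rng):
    """Hypothetical data: two predictors with correlation about rho, unit noise."""
    a = rng.normal(size=n)
    b = rho * a + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 1.0 + 2.0 * a - 1.0 * b + rng.normal(size=n)
    return np.column_stack([a, b]), y

rng = np.random.default_rng(0)
se_small = ols_standard_errors(*simulate(100, 0.0, rng))[1]      # n = 100
se_large = ols_standard_errors(*simulate(10_000, 0.0, rng))[1]   # n = 10,000
se_collinear = ols_standard_errors(*simulate(100, 0.95, rng))[1] # n = 100, rho = 0.95

print(f"SE(a), n=100:            {se_small:.4f}")
print(f"SE(a), n=10000:          {se_large:.4f}")
print(f"SE(a), n=100, rho=0.95:  {se_collinear:.4f}")
```

with 100 times more data the standard error of the coefficient on `a` should shrink by roughly a factor of 10, while strong collinearity at the same sample size should inflate it.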

Question 3

Q

permutation test

Answer

A

a Monte Carlo method to create a sampling distribution of a test statistic, such as a model parameter, by permuting the outcome variable values relative to the predictor tuples
eg the model is fit on each permutation, and the test statistic is recomputed
has the advantage of retaining the exact predictor distributions and whatever collinearities exist between the predictors
(normally, the same sampling distribution would be estimated via analytic methods, such as t-distributions for linear regression parameters)
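a minimal sketch of the procedure, assuming a least-squares linear model and using a coefficient as the test statistic. The names `fit_coefs` and `permutation_pvalue` and the simulated data are hypothetical. Note that permuting the outcome this way tests against the null that the outcome is unrelated to all predictors; more refined schemes exist for testing one term in the presence of others.

```python
import numpy as np

def fit_coefs(X, y):
    """Least-squares coefficients, intercept first."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def permutation_pvalue(X, y, term, n_perm=2000, seed=0):
    """Two-sided permutation p-value for coefficient number `term`."""
    rng = np.random.default_rng(seed)
    observed = fit_coefs(X, y)[term]
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle only the outcome; predictor rows (tuples) stay intact,
        # so predictor distributions and collinearities are preserved.
        null[i] = fit_coefs(X, rng.permutation(y))[term]
    # The +1 terms count the observed statistic as one of the permutations.
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    return observed, p

# Hypothetical data: two correlated predictors, only `a` affects y.
rng = np.random.default_rng(42)
n = 80
a = rng.normal(size=n)
b = 0.6 * a + 0.8 * rng.normal(size=n)
y = 2.0 * a + rng.normal(size=n)
X = np.column_stack([a, b])

coef_a, p_a = permutation_pvalue(X, y, term=1)
print(f"coefficient on a = {coef_a:.3f}, permutation p = {p_a:.4f}")
```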

Question 4

Q

rank transformation

Answer

A

some parametric statistical models are amenable to the rank transform, rendering them non-parametric
eg,
* the linear regression model Y ~ A + B + C is parametric
* to make it non-parametric, use rank(Y) ~ A + B + C, where rank assigns an ordinal (position in sorted order) to each value of Y
rank transformations may be useful for eg reducing the influence of outliers, but the resulting coefficients may be difficult to interpret
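a minimal sketch of the outlier point, assuming a one-predictor least-squares fit. The `rank` helper here is a hypothetical stand-in that breaks ties by position instead of averaging them, and the data are simulated.

```python
import numpy as np

def rank(values):
    """Ordinal ranks 1..n (ties broken by position, not averaged)."""
    values = np.asarray(values)
    ranks = np.empty(len(values))
    ranks[np.argsort(values, kind="stable")] = np.arange(1, len(values) + 1)
    return ranks

def fit_coefs(X, y):
    """Least-squares coefficients, intercept first."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

# Hypothetical data: y increases with a, then one gross outlier is planted.
rng = np.random.default_rng(1)
n = 50
a = rng.normal(size=n)
y = 2.0 * a + rng.normal(scale=0.5, size=n)
y_outlier = y.copy()
y_outlier[np.argmin(a)] = 1000.0   # extreme value at the smallest a

raw_clean = fit_coefs(a, y)[1]               # Y ~ A
raw_out = fit_coefs(a, y_outlier)[1]         # Y ~ A, with outlier
rank_clean = fit_coefs(a, rank(y))[1]        # rank(Y) ~ A
rank_out = fit_coefs(a, rank(y_outlier))[1]  # rank(Y) ~ A, with outlier

print(f"raw slope:  clean {raw_clean:.2f}, with outlier {raw_out:.2f}")
print(f"rank slope: clean {rank_clean:.2f}, with outlier {rank_out:.2f}")
```

the rank slope is in units of "ranks per unit of a", which illustrates the interpretability cost mentioned above.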

Question 5

Q

variance partitioning property

Answer

A

for certain models (eg least-squares linear models with an intercept), the variance “partitions” into the part explained or accounted for by the model and the remaining (residual) variation
total variance of the sample's outcome variable = variance of the model output + variance of the residual(s)
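a minimal numerical check of the identity, assuming a least-squares linear fit with an intercept (one case where the partition holds exactly). Variable names and simulated data are illustrative only.

```python
import numpy as np

# Hypothetical data: one predictor plus noise.
rng = np.random.default_rng(3)
n = 200
a = rng.normal(size=n)
y = 1.5 + 0.8 * a + rng.normal(size=n)

# Least-squares fit with an intercept.
X1 = np.column_stack([np.ones(n), a])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
fitted = X1 @ beta
resid = y - fitted

total_var = np.var(y)        # variance of the outcome
model_var = np.var(fitted)   # variance of the model output
resid_var = np.var(resid)    # variance of the residuals
r_squared = model_var / total_var

print(f"total {total_var:.4f} = model {model_var:.4f} + residual {resid_var:.4f}")
print(f"R^2 = {r_squared:.3f}")
```

the same denominator (here `np.var` with its default `ddof=0`) must be used for all three variances for the sum to match.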

Question 6

Q

how statistical models work (Kaplan)

Answer

A

statistical models partition variation
individual case = model value + deviation = amount model can explain + what model cannot explain

Question 7

Q

three main types of statistical models

Answer

A

description: describe a range or typical values of a quantity
classification or prediction
anticipating consequences of intervention: eg will a gas tax cause reduced consumption; related to causal modeling

Question 8

Q

ANOVA (for models)

Answer

A

the same methods used for eg ANOVA of population means can be applied to model output and residuals

general:
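- SST = SSM + SSE, where SST is the total sum of squares of the outcome about its mean, SSM is the sum of squares of the fitted model output about that mean, and SSE is the sum of squares of the model residuals
- after dividing by degrees of freedom (giving the mean squares MSM and MSE) and making some normality assumptions, the ratio MSM/MSE gives an F value, and from it a p-value
- this is broadly applicable (just like R^2) across model types, though the exact SST = SSM + SSE identity is only guaranteed for certain models (eg least-squares fits with an intercept)
per-variable effects: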
- each variable (model term) gets its own SS, MS, F value, and p-value
- the type of ANOVA (Type I, II, III) affects how ANOVA apportions effects among model variables, by determining how SS is computed for each term
  - eg Type I (sequential sum of squares) goes in order of predictors fed to the model: SS(p_k | p_1,…,p_{k-1}) = SS(p_1,…,p_k) - SS(p_1,…,p_{k-1}), for k=1 to number of predictors
  - if the predictors are correlated then Type I will give different per-predictor results, depending on ordering
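a minimal sketch of the overall F value and of Type I (sequential) sums of squares, assuming least-squares fits with an intercept. The helpers `model_ss` and `type1_ss` and the simulated data are hypothetical; turning F into a p-value needs an F-distribution CDF (eg from a statistics library) and is left out here.

```python
import numpy as np

def model_ss(columns, y):
    """SS explained by an intercept plus the given predictor columns."""
    X1 = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    fitted = X1 @ beta
    return np.sum((fitted - y.mean()) ** 2)

def type1_ss(columns, y):
    """Sequential SS: SS(p_k | p_1..p_{k-1}), in the order given."""
    out, previous = [], 0.0
    for k in range(1, len(columns) + 1):
        current = model_ss(columns[:k], y)
        out.append(current - previous)
        previous = current
    return out

# Hypothetical data: b is correlated with a; both affect y.
rng = np.random.default_rng(2)
n = 200
a = rng.normal(size=n)
b = 0.8 * a + 0.6 * rng.normal(size=n)
y = 1.0 + a + b + rng.normal(size=n)

# Overall ANOVA for the model y ~ a + b.
sst = np.sum((y - y.mean()) ** 2)
ssm = model_ss([a, b], y)
sse = sst - ssm
msm = ssm / 2          # 2 model degrees of freedom (a and b)
mse = sse / (n - 3)    # n minus 3 fitted coefficients
f_value = msm / mse

# Type I SS depends on the order in which predictors enter.
ss_ab = type1_ss([a, b], y)   # [SS(a), SS(b | a)]
ss_ba = type1_ss([b, a], y)   # [SS(b), SS(a | b)]

print(f"F = {f_value:.1f}")
print(f"order a, b: SS(a) = {ss_ab[0]:.1f}, SS(b|a) = {ss_ab[1]:.1f}")
print(f"order b, a: SS(b) = {ss_ba[0]:.1f}, SS(a|b) = {ss_ba[1]:.1f}")
```

in both orderings the per-term values add up to the same SSM; only the apportionment between `a` and `b` changes.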

Question 9

Q

some properties of covariates (in models)

Answer

A

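sometimes called nuisance variables; a covariate that is related to both an explanatory variable and the outcome acts as a confounding variable
adding covariates to a least-squares model can never reduce (in-sample) R^2, only increase it or leave it unchanged; adjusted R^2 can decrease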
if covariates are correlated with explanatory variables, their inclusion will have an effect on model coefficients (of linear models)
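a minimal sketch of both properties, assuming least-squares linear models fit with an intercept. The variable names (`treatment`, `covariate`, `noise_cov`), the helper `fit`, and the simulated effect sizes are all hypothetical.

```python
import numpy as np

def fit(columns, y):
    """Return (coefficients with intercept first, in-sample R^2)."""
    X1 = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return beta, r2

# Hypothetical data: the covariate is correlated with the explanatory
# variable and also affects the outcome; noise_cov is unrelated to both.
rng = np.random.default_rng(5)
n = 300
covariate = rng.normal(size=n)
treatment = 0.7 * covariate + rng.normal(size=n)
noise_cov = rng.normal(size=n)
y = 1.0 * treatment + 2.0 * covariate + rng.normal(size=n)

beta_without, r2_without = fit([treatment], y)
beta_with, r2_with = fit([treatment, covariate], y)
_, r2_noise = fit([treatment, noise_cov], y)

print(f"treatment coef without covariate: {beta_without[1]:.2f}")
print(f"treatment coef with covariate:    {beta_with[1]:.2f}")
print(f"R^2: {r2_without:.3f} -> {r2_with:.3f} (covariate), {r2_noise:.3f} (noise)")
```

the simulated true treatment effect is 1.0; leaving out the correlated covariate should shift the estimated coefficient away from it, while even the unrelated covariate cannot lower R^2.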

(9 cards)