statistical modeling Flashcards

1
Q

statistical model

A

a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population)

a statistical model represents, often in considerably idealized form, the data-generating process

a statistical model is usually specified as a mathematical relationship between one or more random variables and other non-random variables

2
Q

what standard errors depend on (in models, eg for parameter estimates)

A

DMCS (mnemonic: Data, Model, Collinearities, Sample size)

  • quality of data (eg measurement errors)
  • quality of the model (ie fit / low bias)
  • collinearities (these can increase standard error)
  • sample size (standard error is asymptotically proportional to 1/sqrt(n))
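
A minimal sketch of the sample-size dependence, assuming a simple one-predictor least-squares regression on simulated data. The function `slope_standard_error`, the helper `simulate`, and all data-generating values are hypothetical and chosen only for illustration. With 16 times as many cases, the standard error of the slope should shrink by a factor of roughly sqrt(16) = 4.

```python
import math
import random

def slope_standard_error(x, y):
    """Standard error of the slope in the least-squares fit y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2) / sxx)

rng = random.Random(0)  # fixed seed so the run is repeatable

def simulate(n):
    """Draw n cases from an assumed model y = 3 + 0.5 x + noise; return SE of the slope."""
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [3.0 + 0.5 * xi + rng.gauss(0, 2.0) for xi in x]
    return slope_standard_error(x, y)

se_small = simulate(100)
se_large = simulate(1600)
ratio = se_small / se_large  # expected to be near 4, not exactly 4
print(se_small, se_large, ratio)
```

Increasing the noise standard deviation (data quality) in `simulate` raises both standard errors, which illustrates the first bullet as well.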
3
Q

permutation test

A
  • a Monte Carlo method that builds the sampling distribution of a test statistic (such as a model parameter) under the null hypothesis of no association, by permuting the outcome variable values relative to the predictor tuples
  • eg the model is refit on each permutation and the test statistic is recomputed; the p-value is the fraction of permuted statistics at least as extreme as the observed one
  • has the advantage of retaining the exact predictor distributions and whatever collinearities exist between the predictors
    (normally, the same sampling distribution would be estimated via analytic methods, such as t-distributions for linear regression parameters)
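
The procedure can be sketched as follows, assuming a one-predictor linear model whose slope is the test statistic. The names `slope` and `permutation_p_value`, the simulated data, and the add-one p-value correction are illustrative choices, not something the card specifies.

```python
import random

def slope(x, y):
    """Least-squares slope of y ~ x (the test statistic here)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the slope of y ~ x."""
    rng = random.Random(seed)
    observed = slope(x, y)
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)             # permute outcomes; predictors stay fixed
        if abs(slope(x, y_perm)) >= abs(observed):
            extreme += 1
    # Add-one correction counts the observed arrangement as one permutation.
    return observed, (extreme + 1) / (n_perm + 1)

# Simulated data with a real association (assumed true slope 0.5).
data_rng = random.Random(42)
x = [float(i) for i in range(30)]
y = [0.5 * xi + data_rng.gauss(0, 2.0) for xi in x]

observed, p = permutation_p_value(x, y)
print(observed, p)
```

Because only `y` is shuffled, the values of `x` (and, with several predictors, their collinearities) are identical in every refit.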
4
Q

rank transformation

A

some parametric statistical models are amenable to the rank transform, which renders them non-parametric; eg,
  • the linear regression model Y ~ A + B + C is parametric
  • to make it non-parametric, fit rank(Y) ~ A + B + C, where rank replaces each value of Y with its ordinal position in the sorted order
rank transformations may be useful for, eg, data with outliers, but the fitted model may be difficult to interpret
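
A small sketch of the idea, using a single hypothetical predictor `a` instead of A + B + C to keep it short. The `rank` helper averages tied ranks, which is one common convention and an assumption here; the data are made up, with one deliberate outlier in `y`.

```python
def rank(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1       # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def slope(x, y):
    """Least-squares slope of y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sum((xi - mx) ** 2 for xi in x)

a = [1, 2, 3, 4, 5, 6]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 60.0]    # last value is an outlier

slope_raw = slope(a, y)                 # pulled upward by the outlier
slope_rank = slope(a, rank(y))          # the outlier is just "rank 6"
print(slope_raw, slope_rank)
```

The rank-based slope is in units of "ranks of Y per unit of a", which shows why the transformed model is harder to interpret.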

5
Q

variance partitioning property

A
  • for certain models (eg least-squares linear models with an intercept), the variance “partitions” into the part explained or accounted for by the model and the remaining (residual) variation
  • total variance of the sample's outcome variable = variance of the model output + variance of the residual(s)
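
A quick numerical check of the partition, assuming a least-squares line with an intercept fitted to a small made-up data set (all values are illustrative):

```python
import statistics

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# Fit y ~ x by least squares.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

var_total = statistics.pvariance(y)
var_model = statistics.pvariance(fitted)
var_resid = statistics.pvariance(residuals)

# The two parts add up to the total (up to floating-point error).
print(var_total, var_model + var_resid)
r_squared = var_model / var_total       # fraction of variance explained
```

The identity relies on the residuals being uncorrelated with the fitted values, which least squares guarantees; a model fitted by some other criterion need not partition exactly.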
6
Q

how statistical models work (Kaplan)

A
  • statistical models partition variation
  • individual case = model value + deviation = amount model can explain + what model cannot explain
7
Q

three main types of statistical models

A
  • description: describe the range or typical values of a quantity
  • classification or prediction
  • anticipating the consequences of an intervention: eg will a gas tax reduce consumption? (related to causal modeling)
8
Q

ANOVA (for models)

A

the same methods used for, eg, ANOVA on population means can be applied to a model's fitted values and residuals

  • general:
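    • SST = SSM + SSE, where SST is the total sum of squares of the outcome about its mean, SSM is the sum of squares of the fitted model output about that mean, and SSE is the sum of squared model residuals (each is n times the corresponding variance)
    • after dividing each by its degrees of freedom (giving MSM and MSE) and making some normality assumptions, the ratio F = MSM/MSE follows an F distribution under the null hypothesis, which yields a p-value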
    • this is broadly applicable (just like R^2), regardless of the model type
  • per-variable effects
    • each variable (model term) gets its own SS, MS, F value, and p-value
    • the type of ANOVA (Type I, II, III) affects how ANOVA apportions effects among model variables, by determining how SS is computed for each term
      • eg Type I (sequential sum of squares) goes in order of predictors fed to the model: SS(p_k | p_1,…,p_{k-1}) = SS(p_1,…,p_k) - SS(p_1,…,p_{k-1}), for k=1 to number of predictors
      • if the predictors are correlated then Type I will give different per-predictor results, depending on ordering
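
A sketch of Type I sums of squares and the overall F value, assuming a least-squares linear model with an intercept and two correlated predictors. The helper `sequential_ss` is hypothetical: it orthogonalizes each predictor against the earlier ones (Gram-Schmidt), which gives the same increments as refitting the nested models. The data are simulated, and the p-value step is omitted because it needs an F-distribution tail function that the standard library lacks.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sequential_ss(y, predictors):
    """Type I SS for y ~ p_1 + ... + p_k (with intercept), in the given order."""
    yc = center(y)
    basis, ss = [], []
    for p in predictors:
        q = center(p)                   # centering removes the intercept
        for b in basis:                 # remove what earlier predictors explain
            coef = dot(q, b) / dot(b, b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        basis.append(q)
        ss.append(dot(yc, q) ** 2 / dot(q, q))   # SS(p_k | p_1..p_{k-1})
    return ss

rng = random.Random(0)
n = 50
a = [rng.gauss(0, 1) for _ in range(n)]
b = [0.8 * ai + 0.6 * rng.gauss(0, 1) for ai in a]          # correlated with a
y = [1.0 + 2.0 * ai + 1.0 * bi + rng.gauss(0, 1) for ai, bi in zip(a, b)]

ss_ab = sequential_ss(y, [a, b])        # a entered first
ss_ba = sequential_ss(y, [b, a])        # b entered first

sst = dot(center(y), center(y))
ssm = sum(ss_ab)                        # same total for either ordering
sse = sst - ssm
df_model, df_resid = 2, n - 3           # 2 predictors; intercept uses 1 more
f_value = (ssm / df_model) / (sse / df_resid)

print("a first:", ss_ab, " b first:", ss_ba, " F:", f_value)
```

The SS credited to `a` is much larger when `a` enters first than when it enters after `b`, while the model total SSM is unchanged; that is the ordering dependence described in the last bullet.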
9
Q

some properties of covariates (in models)

A
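  • sometimes called nuisance variables (or confounding variables, when they are related to both the explanatory and the outcome variables)
  • adding covariates to a model can never reduce R^2 (for a least-squares fit to the same data), only increase it or leave it unchanged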
  • if covariates are correlated with explanatory variables, their inclusion will have an effect on model coefficients (of linear models)
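
Both properties can be illustrated with a sketch that assumes least-squares linear models with an intercept and simulated data in which the covariate `z` is correlated with the explanatory variable `x`. The helper `residualize` and the data-generating coefficients are hypothetical. The adjusted coefficient is computed by first removing `z` from `x`, which is equivalent to fitting y ~ x + z.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residualize(v, w):
    """Centered v with its least-squares projection on centered w removed."""
    vc, wc = center(v), center(w)
    coef = dot(vc, wc) / dot(wc, wc)
    return [a - coef * b for a, b in zip(vc, wc)]

rng = random.Random(1)
n = 200
z = [rng.gauss(0, 1) for _ in range(n)]                     # covariate
x = [0.7 * zi + rng.gauss(0, 1) for zi in z]                # correlated with z
y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

yc, xc = center(y), center(x)
sst = dot(yc, yc)

# Model 1: y ~ x
coef_x_alone = dot(yc, xc) / dot(xc, xc)
r2_alone = coef_x_alone * dot(yc, xc) / sst                 # SSM / SST

# Model 2: y ~ x + z
x_perp = residualize(x, z)                                  # part of x not explained by z
coef_x_adjusted = dot(yc, x_perp) / dot(x_perp, x_perp)
z_perp = residualize(z, x)                                  # part of z not explained by x
r2_with_z = r2_alone + dot(yc, z_perp) ** 2 / dot(z_perp, z_perp) / sst

print(coef_x_alone, coef_x_adjusted, r2_alone, r2_with_z)
```

The R^2 increment is a squared quantity divided by positive terms, so it cannot be negative. The coefficient on `x` moves toward the simulated value of 1.0 once `z` is included, because `x` alone was absorbing part of the effect of `z`.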