Advanced Design and Data Analysis Flashcards

1
Q

Define and describe variance

A

For estimating population variance:
* Σ(score − population mean)² / N is unbiased
○ Makes fewer extreme estimates
○ Unrealistic though, because it's rare that we know the population mean
* Σ(score − sample mean)² / N is biased
* Σ(score − sample mean)² / (N − 1) is unbiased
○ The N − 1 correction is needed because the sample mean is not the same as the population mean (used in the first option): the sample scores will be closer to the sample mean than to the population mean, so the squared deviation terms tend to be underestimates. Dividing by (N − 1) corrects for that by making the result slightly bigger, which has a larger proportional effect when N is small (with a larger sample, the sample mean is a closer estimate of the population mean)

A sampling distribution shows the variance estimates produced by a formula across many different samples. If you know the actual population variance, you can see from it how biased or unbiased an estimator is; averaging over many estimates gives a more accurate picture of the estimator's behaviour
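A minimal simulation sketch of these sampling distributions (Python with NumPy assumed; the normal population with variance 4 and the sample size of 10 are illustrative choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
pop_var = 4.0                      # true population variance (sd = 2)
n, n_samples = 10, 100_000         # small samples make the bias visible

samples = rng.normal(loc=0, scale=np.sqrt(pop_var), size=(n_samples, n))

biased = samples.var(axis=1, ddof=0)     # divide by N
unbiased = samples.var(axis=1, ddof=1)   # divide by N - 1

print(f"True variance:              {pop_var:.3f}")
print(f"Mean of biased estimates:   {biased.mean():.3f}")   # tends to be too small
print(f"Mean of unbiased estimates: {unbiased.mean():.3f}")  # close to 4
```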

2
Q

What is power?

A

To say that a test has 80% power means it has 80% chance of rejecting the null hypothesis given that the null hypothesis is false, and given
* A particular sample size
* Particular effect size
* Particular alpha level (often .05 probability of rejecting the null hypothesis)
* Other considerations, including those related to whether the assumptions of the test are satisfied

Reflections on power:
* We don’t have great intuitions about sample size as it relates to power.
○ Our intuitions may have been warped by seeing psychology journals reporting findings with very small sample sizes but statistically significant results
§ An example of publication bias (non-significant studies tend to get discarded)
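A rough simulation sketch of what power means in practice (Python with NumPy/SciPy assumed; the effect size of 0.5 SD, n = 30 per group, and alpha = .05 are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, effect_size, alpha, n_sims = 30, 0.5, 0.05, 10_000

rejections = 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)            # control group
    b = rng.normal(effect_size, 1.0, n)    # treatment group: true difference = 0.5 SD
    _, p = stats.ttest_ind(a, b)
    rejections += p < alpha

print(f"Estimated power: {rejections / n_sims:.2f}")  # roughly 0.47, well below the usual .80 target
```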

3
Q

Explain Statistical inference issues

A

-Significance tests cannot prove or disprove theories
-They provide probabilistic information and at most can corroborate theories
-Significance tests do not allow probabilities to be assigned to any particular hypothesis
-With an alpha level of .05, we will reject a true null hypothesis 5% of the time (somewhat like a margin of error). However, that is a global error rate; it doesn't tell us the probability that any particular ('local') decision is a mistake

4
Q

Explain P-Values

A

p is the probability of getting our observed result, or a more extreme result, if the null hypothesis is true

5
Q

Explain confidence intervals

A

The general formula for a confidence interval for the population mean is M +/- margin of error
If you're randomly sampling from a normally distributed population and know the population standard deviation, the margin of error comes from the normal distribution (e.g. 1.96 standard errors for a 95% interval)
The issue is that it is VERY rare to know the population standard deviation
If you don't know the standard deviation, you base the interval on a t-distribution rather than a normal distribution
The cut-offs will be different, although with larger sample sizes they become more similar to the normal distribution's
But don't just use the cut-offs for a normal distribution, because they won't be the same
INTERPRETING CONFIDENCE INTERVALS
* If we ran many studies, 95% of the intervals would contain the population mean
* We don’t know whether this particular interval does or doesn’t
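A small sketch of a t-based 95% confidence interval (Python with SciPy assumed; the data values are made up):

```python
import numpy as np
from scipy import stats

data = np.array([4.2, 5.1, 6.3, 5.8, 4.9, 5.5, 6.1, 5.0])   # illustrative sample
m, se = data.mean(), stats.sem(data)                         # mean and standard error
t_crit = stats.t.ppf(0.975, df=len(data) - 1)                # t cut-off, not 1.96

print(f"95% CI: {m - t_crit * se:.2f} to {m + t_crit * se:.2f}")
```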

6
Q

What is the difference between p-values and confidence intervals?

A

Advantages of confidence intervals
○ They give the set of values that, had they been your null hypothesis, would not have been rejected
○ They tell you about the precision of your estimate
An advantage of p-values
○ They give a clearer indication of the strength of the evidence against the null hypothesis

7
Q

Describe the replication crisis and 5 suggested reasons for it

A

The ‘file drawer’ problem
* The bias introduced in the scientific literature by selective publication - chiefly by a tendency to publish positive results, but not to publish negative or nonconfirmatory results

Gelman (2016) mentions five reasons:
* Sophistication
○ Since psychology was focusing on more sophisticated concepts than other disciplines, it was more open to criticism
* Openness
○ Has culture in which data sharing is very common - easier to find mistakes
* Overconfidence deriving from research design
○ Researchers may feel that they can’t go wrong using simple textbook methods, and their p-values will be ok
* Involvement of some prominent academics
○ More of its leading figures have been dragged into the replication crisis than other disciplines
* The general interest of psychology
○ Methods are very accessible so more people are willing to find mistakes

8
Q

What are some routes to the replication crisis

A

Outright fraud (rare)
P-hacking, data dredging, data snooping, fishing expeditions (rarer than is commonly believed)
○ Looking for what people want to find
○ Sifting through the data
The garden of forking paths (more common than generally realised)
○ Running only the analysis that seems like it might be significant based on the data (looking at the data before deciding on the tests; if the data had been different, other tests might have been run)
○ A solution could be to require preregistration of hypotheses and methods

9
Q

What are some typical experimental designs?

A

Between-subjects design
* Different participants contribute data to each level of the IV
* But differences might be due to differences between participants
○ Random assignment can help reduce this
○ Or matching - balance out the conditions

Within-subjects design
* Each participant gets exposed to each level of the IV
* Major concern with this is sequencing effects (each previous level having an influence on the next level)

Single factor design
* Only one IV
* Can have two levels (e.g. placebo and treatment), or more than two levels (e.g. placebo, treatment level 1, treatment level 2)

Factorial designs
* More than one IV
* Analysed with two-way ANOVA
* Interaction effects

10
Q

Explain correlational research and its uses

A

Investigates the relationships between two (usually continuous) variables as they naturally occur - neither variable is manipulated to observe its effect on the other
This type of research is often linked with the concepts of correlation and regression
○ Regression
§ Predicting a variable from other variables in a regression model
Designs are useful when
○ Experiments cannot be carried out for ethical reasons
○ Ecological validity is a priority
While correlation and regression are associated with correlational research, ANOVA and t-tests are associated with experimental research
This distinction is bogus - any of these techniques can be used with any research design

11
Q

What is a quasi-experimental design?

A

○ Groups occur naturally in the world; cannot be random assignment to groups
§ Eg comparing men and women
○ Often used in program evaluation
§ Provides empirical data about effectiveness of government and other programs

12
Q

What are some issues with null hypothesis significance testing

A

If power is low, there may be a real difference, but you fail to detect it
If power is very high (e.g. with a very large sample size), even trivially small differences can come out as statistically significant

13
Q

Explain empiricism

A

Empiricism is finding out about the world through evidence
○ Could be considered as:
§ Observation = truth + error
§ Observation = theory + error
§ Observation = model + error
○ We need a theory of error to fit our models
○ Classical methods in statistics tend to assume errors are normally distributed
§ Gauss was the first to fully conceptualise the normal distribution

14
Q

Why do we use linear models

A

○ Easy to fit
○ Commonly used
○ Lots of practical application (prediction, description)
○ Provide a descriptive model that is very flexible
○ Have assumptions that are broadly reasonable

15
Q

What is the theory of error?

A

§ We often assume that the 'error' is normally distributed with zero mean
□ This is a theory of error
® It is the error term that requires statistical techniques
® The real question - e.g. the relationship between age and IQ - isn't statistical at all

16
Q

Why do we assume normal errors?

A

○ Two broad categories of justification for building models around the assumption of normal errors
§ Ontological
□ Study of nature of being
□ Normal distributions occur a lot in the world so let’s build model around them
□ Any process that sums together the result of random fluctuations has a good chance of somewhat resembling normal distributions
□ Sampling distributions and many statistics tend towards normality as sample size increases

§ Epistemological
□ Normal distributions represent a state of our knowledge (more like ignorance)
□ They don't contain any info about the underlying process, except its mean and variance
□ We should still be interested in the underlying process, but when we don't know anything about it, it's best to build in as few assumptions as possible
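A small simulation sketch of the ontological point that summing many random fluctuations tends towards a normal distribution (Python with NumPy assumed; the uniform 'fluctuations' are an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)

# Each observation is the sum of 50 small, non-normal (uniform) fluctuations.
sums = rng.uniform(-1, 1, size=(100_000, 50)).sum(axis=1)

# The distribution of the sums is approximately normal: symmetric, with skew near 0.
print(f"mean ≈ {sums.mean():.2f}, sd ≈ {sums.std():.2f}")
print(f"skew ≈ {((sums - sums.mean())**3).mean() / sums.std()**3:.3f}")
```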
17
Q

Explain common model assumptions

A

○ Validity
§ Ensure data is relevant to the research question
○ Representativeness
§ Want sample to represent population as well as possible
○ Additivity and linearity
§ Most important mathematical assumption
§ Want the non-error part of function to be a linear model
○ Independence of errors
○ Equal variance of errors
§ Also referred to as homogeneity/homoscedasticity
○ Normality of errors

18
Q

Explain moderation vs mediation

A

Moderation
○ Situations where the relationship between two variables depends on another variable
Mediation
○ One variable affects another indirectly, through a third (mediating) variable
○ Inherently causal
○ X causes M causes Y
○ Maybe X also causes Y, but it doesn’t have to

19
Q

Explain mediation effects

A

Mediating variables transmit the effect of an independent variable on a dependent variable
Mediation is important in many psychological studies
It is the process whereby one variable acts on another through an intervening (mediating) variable
○ Eg Theory of Reasoned Action
§ Attitudes cause intentions, which cause behaviour
Simplest mediation model contains three variables
○ Predictor variable X
○ Mediating variable M
○ Outcome variable Y
○ Causal model

20
Q

What is the causal steps approach to mediation

A

○ Based on the mediation regression equations; involves 4 requirements
§ X directly predicts Y (i.e. coefficient c is significant)
§ If c is significant, test whether X directly predicts M (i.e. coefficient a is significant)
§ M directly predicts Y (i.e. coefficient b is significant)
§ When both X and M predict Y, the effect of X is either:
□ Reduced (coefficient c' is smaller than c, though both remain significant) - then there is partial mediation, or
□ Eliminated (i.e. coefficient c' is not significant) - then there is full mediation
○ If any of the four requirements is not met, stop - mediation is not established

21
Q

What is the Baron and Kenny approach to mediation?

A

Independent regressions of the IV to DV, IV to MV, and IV + MV to DV. Mediation occurs if the effect of the IV is reduced when MV is introduced, but only if the IV was significant in the first place
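A hedged sketch of these steps as three regressions on simulated data (Python with pandas/statsmodels assumed; the variable names X, M, Y and the coefficients are illustrative, not from the lecture):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data in which X affects Y partly through M.
rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=n)
M = 0.6 * X + rng.normal(size=n)              # X -> M
Y = 0.4 * M + 0.2 * X + rng.normal(size=n)    # M -> Y, plus a direct X -> Y path
df = pd.DataFrame({"X": X, "M": M, "Y": Y})

c  = smf.ols("Y ~ X", data=df).fit()          # step 1: total effect c
a  = smf.ols("M ~ X", data=df).fit()          # step 2: path a
bc = smf.ols("Y ~ X + M", data=df).fit()      # steps 3-4: b and the direct effect c'

print(f"c = {c.params['X']:.2f}, a = {a.params['X']:.2f}, "
      f"b = {bc.params['M']:.2f}, c' = {bc.params['X']:.2f}")
# Partial mediation here: c' is smaller than c but still non-zero.
```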

22
Q

What is the basic idea behind Principal Component Analysis

A

○ We have multivariate data (we’ve measured participants on multiple variables)
○ We fit straight lines to our data and call the lines ‘Principal Components’ (PCs)
○ 1st PC is the best line we can fit
○ 2nd PC is second best line we can fit etc
○ Maximum number of PCs = number of variables in our dataset
○ We want to represent our data with fewer PCs
○ Correlated continuous variables are reduced to the smallest number of components possible while retaining as much of the information in the data as we can
○ Aims to fit straight lines to the data points
§ The second-best line fits the variation left over (the errors) after the first component
§ E.g. reducing dryness and alcohol content to one component while still describing the drink fully
□ The first line minimises the perpendicular (diagonal) distances between the data points and the principal component line
□ The next component will always be perpendicular/orthogonal to the previous principal component
* Can also be thought of in terms of dimensions
○ If you have n variables, you are in n dimensional space
○ Maybe there are new axes that make life simpler
○ Maybe you don’t need the full n components to describe your data well
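A minimal PCA sketch (Python with scikit-learn assumed; the two correlated variables are simulated for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

# Two correlated variables (e.g. dryness and alcohol content); values are made up.
rng = np.random.default_rng(4)
dryness = rng.normal(size=200)
content = 0.8 * dryness + 0.3 * rng.normal(size=200)
X = np.column_stack([dryness, content])

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)   # e.g. ~[0.97, 0.03]: one PC describes the data well
scores = pca.transform(X)[:, :1]       # keep only the first component
```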

23
Q

Explain the MAP test for determining how many components to extract in PCA

A

○ Velicer devised a test based on partial correlations known as the Minimum Average Partial (MAP) correlation test
§ After each component is extracted, it (and those extracted before it) gets partialled out of the correlation matrix of the original variables, and the average of the resulting partial correlations is calculated
§ As more components are partialled out, the average partial correlation approaches 0
§ But at some point, components that reflect 'noise' begin to be partialled out, so the average partial correlation begins to rise
§ Choose the number of components corresponding to the minimum average partial correlation

24
Q

Describe component rotation

A

○ With n variables, you can have n components
§ These are completely determined and follow directly from the matrix operations
○ But if we only use smaller number of components, there is some freedom in the final solution
§ In particular, we can rotate components to get a simple structure
§ Axes get rotated until the variables are tending to load on one component only, to as great an extent that is possible, and they are as close to 0 on the other components as possible
§ i.e. with large component loadings for some variables, and small component loadings for the others
§ Used more in FA than PCA

25
Q

What are different types of component rotations?

A

○ Orthogonal
§ Components/factors remain uncorrelated
§ Quartimax simplifies the variable pattern of the loadings
§ Varimax (most common method) simplifies factor patterns of loadings
§ Equamax: a compromise between variable and factor pattern simplification
○ Oblique
§ Components/factors correlated
§ Direct Oblimin
§ Promax
§ Both offer control of the degree of correlation of factors
□ Direct oblimin
® Delta (ranging from -0.8 to 0.8)
□ Promax
® Kappa (ranging from 1 upwards)

*Oblique rotation is recommended because it is more realistic

26
Q

What matrices should be interpreted for orthogonal rotation?

A

Rotated component/factor matrix

27
Q

What matrices should be interpreted for oblique rotation?

A

Pattern, structure, component/factor correlation matrix

28
Q

What are the assumptions of the common factor model in EFA?

A

-Common factors are standardised
-Common factors are uncorrelated
-Specific factors are uncorrelated
-Common factors are uncorrelated with specific factors

29
Q

Explain the rationale behind partial correlation

A

○ Suppose a correlation of 0.615 between items
1. ‘Don’t mind being the centre of attention’
2. ‘Feel comfortable around other people’
○ Correlation between item 1 and extraversion is 0.82
○ Correlation between item 2 and extraversion = 0.75
○ The aim is to find a latent or unobserved variable which, when correlated with the observed variables, leads to partial correlations between the observed variables that are as close to 0 as we can get

30
Q

What are some practical issues for EFA?

A

○ Interval or ratio data
§ If proceeding with ordinal data, acknowledge that this is problematic, but note that you are continuing anyway for the sake of the assignment
○ Adequate sample size
○ Any missing data dealt with
§ Either impute the missing data or delete the cases
○ Decently high correlations
○ Linearity
§ Misleading results for non-linear relationship
§ Look at scatterplots
§ Don’t bother converting data to linear relationships in assignment
○ Weak partial correlations
○ Absence of outliers
○ Absence of multicollinearity/singularity
○ Distribution appropriate to the method used to extract the factors

31
Q

What is the Guttman-Kaiser image approach?

A

-Used in EFA
○ Image analysis involves partitioning of the variance of an observed variable into common and unique parts, producing
§ Correlations due to common parts,
□ Image correlations
§ Correlations due to unique parts
□ Anti-image correlations (should be near 0)

32
Q

What is the difference between PCA and EFA?

A

○ Principal components are 'just' linear combinations of observed variables. Factors are theoretical entities (latent variables)
○ In FA, error is explicitly modelled; in PCA it isn't
○ In FA, if another factor is added (or removed), the factor loadings of the others change; in PCA, adding (or removing) a component leaves the other component loadings the same
○ Unlike PCA, FA is a theoretical modelling method, and we can test the fit of our model
○ FA partitions variability into common and unique parts; PCA doesn't
○ PCA runs using single canonical algorithm and it always works. FA has many algorithms (some may not work with your data)

33
Q

What are the similarities between EFA and PCA?

A

-Both have same general forms
-They deliver similar results especially if number of variables is large
-If you loosely define ‘factor analysis’ as a method for suggesting underlying traits, PCA can do that too

34
Q

How to know whether to use PCA or EFA?

A

-Run EFA if you wish to test a theoretical model of latent factors causing observed variables
-Run PCA if you want to simply reduce your correlated observed variables to a smaller set of important uncorrelated composite variables

35
Q

What is Widaman’s conclusion regarding using PCA vs EFA

A

‘the researcher should rarely, if ever, opt for a component analysis of empirical data if their goal was to interpret the patterns of covariation among variables as arising from latent variables or factors’

36
Q

Explain Structural Equation Modelling in a nutshell

A

An umbrella term for a set of statistical techniques that permit analysis of relationships between one or more IVs and DVs, in possibly complex ways
○ Also known as causal modelling, causal analysis, simultaneous equation modelling, analysis of covariance structures
○ Special types of SEM include confirmatory factor analysis and path analysis
SEM enables a combined analysis that otherwise requires multiple techniques
○ For example, factor analysis and regression analysis
The modelling of data by joining equations [1] and [2] is known as structural equation modelling
That aspect of the model concerned with equation [2] is often called the measurement model
That part focusing on equation [1] is known as the structural model
If the structural model contains observed variables but no latent factors, we are doing a path analysis

37
Q

Explain the difference between Confirmatory factor analysis and Exploratory Factor Analysis

A

Exploratory factor analysis can impose two kinds of restrictions
○ Could restrict the number of factors
○ Constrain the factors to be uncorrelated with an orthogonal rotation
Confirmatory factor analysis can restrict factor loadings (or factor correlations or variances) to take certain values
○ A common value: zero
○ If factor loading was set to zero, the hypothesis is that the observed variable score was not due to the factor
Moreover,
○ Using maximum likelihood and generalised least squares estimation, CFA has a test of fit
○ So, it’s possible to test the hypothesis that the factor loading is zero
○ If the data fit the model, hypothesis is supported
○ Hence name confirmatory factor analysis
-CFA provides us with a confirmatory analysis of our theory

38
Q

What are some issues with CFA?

A

Sample size
○ Wolf et al. (2013) show ‘one size fits all’ rules work poorly in this context
○ Jackson, (2003), provides support for the N:q rule
§ Ratio cases (N) to parameters being estimated (q)
§ >20:1 recommended
§ <10:1 is likely to cause problems
○ Absolute sample size harder to assess
§ N=200 is common but may be too small
§ Barrett (2007) suggests journal editors routinely reject any CFA with N<200
Significance testing
○ Kline (2016) reports a diminished emphasis on significance testing because
§ Growing emphasis on testing the whole model rather than individual effects
§ Large-sample requirement means even trivial effects may be statistically significant
§ P-value estimates could change if we used a different method to estimate model parameters
§ Greater general awareness of issues with significance testing
Distributional assumptions
○ The default estimation technique (maximum likelihood) assumes multivariate normality
§ Possible to transform variables to obtain normality
§ Widaman (2012): maximum likelihood estimation appears relatively robust to moderate violations of distributional assumptions
§ Some robust methods of estimations are available
○ CFA generally assumes continuous variables
§ Some programs have special methods for ordered categorical data
Identification
○ Necessary but insufficient requirements for identification
§ Model degrees of freedom must be greater than or equal to 0
§ All latent variables must be assigned a scale
§ Estimation is based on solving of a number of complex equations
○ Constraints need to be placed on the model (not the data) in order for these equations to be solved unambiguously
Model is identified if it’s theoretically possible for a unique estimate of every model parameter to be derived

39
Q

Explain the differences between underidentified, just-identified, and overidentified - which is ideal?

A

Underidentified:
-Not possible to uniquely estimate all the model's free parameters (usually because there are more free parameters than observations); an infinite series of possible answers exists. Need to respecify your model

Just-identified:
-Identified and has the same number of observations as free parameters (model df = 0). The model will reproduce your data exactly, so it won't test your theory.

Over-identified:
-Identified and has more observations than free parameters (df greater than or equal to 1). Permits discrepancies between model and data, and so permits a test of model fit, and of theory. This is the ideal.
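As a side note on how 'observations' are counted here (standard SEM counting, assuming no mean structure is modelled; not stated explicitly on the card): with p observed variables the data supply

```latex
\text{observations} = \frac{p(p+1)}{2}, \qquad
df_{\text{model}} = \frac{p(p+1)}{2} - \text{(number of free parameters)}
```

For example, with p = 4 observed variables there are 4(5)/2 = 10 observations, so a model with 8 free parameters would be over-identified with df = 2.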

40
Q

What are some methods for estimation of a CFA model?

A

○ As for EFA, the most commonly used are
§ Unweighted least squares
§ Generalised least squares, and
§ Maximum likelihood
○ ML is often preferred, but assumes normality
○ Some more exotic methods for handling special types of data are available (but not taught in this course)
○ If you’re picking between two methods and they yield substantially different results, report both

41
Q

What are the global fit statistics for a CFA model and their cut-offs?

A

Chi-square:
-0 with perfect model fit; increases as model misspecification increases
-p = 1 with perfect model fit; p decreases as model misspecification increases
-Take it with a grain of salt: a significant chi-square doesn't necessarily mean anything bad, because with a large sample size even trivial misspecification becomes significant, while a non-significant chi-square may simply reflect low power from a small sample size

Standardised Root Mean Square Residual (SRMR):
-should be less than .08
-Transforms the sample and model-estimated covariance matrices into correlation matrices

Comparative Fit Index (CFI):
-should be above .95
-Compares your model with a baseline model - typically the independence (null) model

Tucker-Lewis Index (TLI)
-Also known as the non-normed fit index (NNFI)
-Relatively harsher on complex models than the CFI
-Unlike the CFI, it isn’t normed to 0-1
-Highly correlated with CFI so don’t report both

Root Mean Square Error of Approximation (RMSEA):
-Less than .05, over .1 is unacceptable
-Acts to ‘reward’ models analysed with larger samples, and models with more degrees of freedom

42
Q

Discuss the components of CFA, Path Analysis, and ‘full’ SEM

A

CFA:
-not a structural model
-is a measurement model
-has latent variables
-has observed variables

Path analysis
-Is a structural model
-Is not a measurement model
-Does not have latent variables
-Has observed variables

‘full’ SEM:
-Is a structural model
-is also a measurement model
-has latent variables
-has observed variables

43
Q

When will correlation and regression be the same?

A

Regression and correlation will only be the same if the variance of x equals the product of the standard deviations of x and y (i.e. when x and y have the same standard deviation), because then the denominators of the two formulas are the same

44
Q

What are Path models?

A

Path models are expressed as diagrams
The drawing convention is the same as in confirmatory factor analysis
○ Observed variables are drawn as rectangles
○ Unobserved variables as circles/ellipses
○ Relations are expressed as arrows
§ Straight, single headed arrows are used to indicate causal or predictive relationships
§ Curved, double-headed arrows are used to represent a non-directional relationship such as correlation or covariance

45
Q

What are two types of path models?

A

○ Recursive
§ Simpler
§ Unidirectional
§ The residual error terms are independent
§ Such models can be tested with a standard multiple regression
○ Non-recursive
§ Can have
□ Bidirectional paths
□ Correlated errors
□ Feedback loops
§ Such models need structural equation software to fit them

46
Q

How can you model data for a Path analysis?

A

○ Can be done via
§ Multiple regression
§ Structural Equation Modelling

47
Q

Compare regressions and SEM

A

Regression weights agree perfectly, but
Standard errors differ
Standardised regression weights differ
The squared multiple correlation is rather less in SEM
And we did get a warning regarding the uncorrelated predictors
Multiple regression must model the correlations among the independent variables, although this is not shown
○ A path analytic representation is thus a much more accurate representation
§ And gives more information

48
Q

Describe MacCallum and Austin's (2000) idea behind assessing model fit

A

They essentially say that no model is ever going to be perfect, so the best you can ask for is a parsimonious, substantively meaningful model that fits the observed data adequately well. But at the same time you also need to realise that there will be other models that do just as good of a job. So finding a good fit does not imply that a model is correct or true, but plausible.

49
Q

What do model test statistics seek to find?

A

§ 'Is the variance-covariance matrix implied by your model sufficiently close to your observed variance-covariance matrix that the difference could plausibly be due to sampling error?'

50
Q

What are approximate fit indices?

A

○ Approximate fit indices ignore the issue of sampling error and take different perspectives on providing a continuous measure of model-data correspondence
○ Three main flavours available under the ML estimation
§ Absolute fit indices
□ Proportion of the observed variance-covariance matrix explained by the model
□ Eg SRMR
§ Comparative fit indices
□ Relative improvement in fit compared to a baseline
□ Eg CFI
§ Parsimony-adjusted indices
□ Compare model to observed data but penalise models with greater complexity
□ Eg RMSEA

51
Q

What are some limitations of global fit statistics?

A

○ Kline (2016) six main limitations of global fit statistics
§ They only test the average/overall fit of a model
§ Each statistic reflects only a specific aspect of fit
§ They don’t relate clearly to the degree/type of model misspecification
§ Well-fitting models do not necessarily have high explanatory power
§ They cannot indicate whether results are theoretically meaningful
§ Fit statistics say little about person-level fit

52
Q

Explain tests for local fit and why they are used

A

Growing recent acknowledgement that good global fit statistics can hide problems with local fit, ie poor fit in specific parts of your model
Various methods of testing local fit, some quite complex, described in Thoemmes et al. (2018)
Some simpler methods are also possible
○ Examining Modification Indices
○ Examining Residual Covariances

53
Q

When examining the residual covariances as a test for local fit, what matrices can we look at, and which one do we want to use?

A

-Sample covariances: our input variance-covariance matrix

-Implied covariances: the model-implied variance-covariance matrix

-Residual covariances: differences between sample and implied covariances

-Standardised residual covariances: ratios of covariance residuals over their standard errors. This is the one we want to use

54
Q

What happens when assumptions are not met?

A

The model can be incorrectly rejected as not fitting
Standard errors will be smaller than they really are (i.e. parameters may seem significant when they are not)
Solve these problems through bootstrapping
○ To assess overall fit: the Bollen-Stine test
○ To obtain accurate standard errors: naive bootstrapping

55
Q

Explain bollen-stine bootstrapping

A

The parent sample gets transformed so that its covariance matrix fits the model perfectly; the chi-square value for this would be 0. The bootstrapped samples drawn from it will fit pretty well because their parent sample has perfect fit, but they won't fit exactly the same, so a model is said to have good fit if it fits better than at least 5% of the bootstrapped samples.

56
Q

Explain naive bootstrapping

A

Take new samples from the observed dataset

57
Q

Explain linear vs non-linear models

A

Linear models
○ Changes in x produce the same changes in y regardless of the value of x
○ Eg
§ If someone’s height increases from 100 to 110, we predict an increase in weight from 31 to 37.6 (+6.6) kg
§ If their height increases from 200 to 210, we would predict a weight increase from 97 to 103.6kg (+6.6kg)
Non-linear models
○ Changes in x produce change in y that depends on the value of x
There are many cases where linear models are inappropriate
○ Not everything increases or decreases without bounds
§ Sometimes we have a lower bound of zero
§ Sometimes we might have an upper bound of some kind
○ Not everything changes by the same amount every time
§ Negatively accelerated functions: learning over time, forgetting over time, increase in muscle mass with training etc
§ Positively accelerated functions (eg exponential growth): spread of infections, population growth etc

58
Q

What is logistic regression?

A

Regression on binary outcomes
○ What has two outcomes
§ Predicting whether someone is alive or dead
§ Predicting whether or not a student is a member of a group
§ Predicting a participant’s two choice data
□ Accuracy! There are many cases where responses are scored either correct or incorrect
□ Yes vs no responses
□ Category A vs Category B (categorisation)
□ Recognise vs not-recognise (recognition memory)
-Instead of predicting Y = 0 or 1, we model the probability of Y = 1 occurring; this is a continuous function ranging between 0 and 1
-Specifically, we model the log odds of obtaining Y = 1
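A hedged sketch of fitting such a model to simulated binary data (Python with statsmodels assumed; the coefficients and variable names are illustrative):

```python
import numpy as np
import statsmodels.api as sm

# Simulated binary outcome data.
rng = np.random.default_rng(5)
n = 300
x = rng.normal(size=n)
log_odds = -0.5 + 1.2 * x               # linear model on the log-odds scale
p = 1 / (1 + np.exp(-log_odds))         # logistic function -> probabilities
y = rng.binomial(1, p)                  # observed 0/1 outcomes

model = sm.Logit(y, sm.add_constant(x)).fit(disp=False)
print(model.params)                              # intercept and slope on the log-odds scale
print(model.predict(sm.add_constant(x))[:5])     # predicted probabilities of Y = 1
```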

59
Q

What are log odds?

A

-We predict the logarithm of the odds as a regression

60
Q

Difference between log odds and regular odds

A

Odds:
-P(Y=1)/P(Y=0)
-Suppose P(Y=1) = 0.8, then P(Y=0) = 0.2
-Odds = 0.8/0.2 = 4
-If odds > 1 then Y=1 is a more probable outcome than Y=0. If odds = 1 then it's 50/50. If odds < 1, then Y=0 is more probable than Y=1.
-Bounded: odds can only take positive values

Log Odds:
-log[P(Y=1)/P(Y=0)]
-Log odds are unbounded
-If log odds > 0 then odds > 1, so Y=1 is more probable

61
Q

What is the generalised linear model?

A

Has the same form as a linear model, but the left-hand side is now written as a function of Y, e.g. f(Y) = a + b1x1 + b2x2 + ... + bnxn + e (linear model form).
This function is called the link, and the quantity being modelled is often written as mu (μ).
Choosing an appropriate function/link allows linear techniques to be employed even if the data are not linear.

62
Q

What are some links for the generalised linear model?

A

Identity link:
-μ = Y
-This gives the linear model

Logistic (logit) link:
-μ = log[P(Y=1)/P(Y=0)]
-Used for binary variables
-Gives the logistic regression model

Logarithm link:
-μ = log(Y)
-Used for counts or frequencies
-Gives the loglinear model
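A sketch of how the three links map onto one modelling interface (statsmodels' GLM families assumed; the simulated outcomes and coefficients are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=200)
X = sm.add_constant(x)

y_cont  = 2 + 1.5 * x + rng.normal(size=200)        # continuous outcome
y_bin   = rng.binomial(1, 1 / (1 + np.exp(-x)))     # binary outcome
y_count = rng.poisson(np.exp(0.5 + 0.7 * x))        # count outcome

linear   = sm.GLM(y_cont,  X, family=sm.families.Gaussian()).fit()  # identity link
logistic = sm.GLM(y_bin,   X, family=sm.families.Binomial()).fit()  # logit link
loglin   = sm.GLM(y_count, X, family=sm.families.Poisson()).fit()   # log link

for name, m in [("identity", linear), ("logit", logistic), ("log", loglin)]:
    print(name, np.round(m.params, 2))
```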

63
Q

Why do we have links for generalised linear models?

A

Standard linear regression assumes the outcome is unbounded (it can range from negative infinity to infinity), but datasets do not always meet this assumption. So the regression equation is substituted into the logistic function (for instance): the predictors can still range from negative infinity to positive infinity, but the logistic function means they can only predict values between 0 and 1.

64
Q

Explain logistic regression

A

The best way to predict a binary variable from other variables. (Make sure your dependent variable is coded 0/1; the coding is arbitrary, and you can reverse it.)
-no assumption of normality (binary data come from a binomial or bernoulli distribution)
-No assumption of linearity
-No assumption of homoscedasticity (equal variances) - because with binomially distributed data the variance depends on the probability or frequency. As the probability/expected frequency approaches 0 or 1, the variance approaches 0.

65
Q

What are the assumptions of logistic regression?

A

Binary outcomes which are mutually exclusive
-Independence of observations

66
Q

How do we fit a logistic model to the data?

A

Maximum likelihood estimation:
-maximise the log likelihood of the data under the model parameters.
-For each observation we have a predicted probability from the model. The closer the predicted probabilities are to the data, the higher the likelihood
-Contrasts with linear regression, which (commonly) uses ordinary least squares methods (minimising the squared deviations of the model from the data)
-Parameters cannot be ‘solved’ the way they can in linear regression

67
Q

What is the variance explained for logistic regression models?

A

The variance depends on the proportion (mean), and hence R^2 cannot be compared either with linear regression R^2 or with R^2 for other binary dependent variables that have a different mean
-It's easier to account for the variance of more extreme proportions, since there is very little variance to explain
-Proportions around .5 have very large variance (half the values are 1, and half are 0)
-Proportions around .99 do not have much variance
-In logistic regression, R^2 is not calculated from correlations or variance accounted for at all. It is calculated from likelihood ratios. The Cox and Snell approximation essentially says that the better your model does relative to the null model, the higher the R^2 value - but this R^2 does not have a maximum of 1. The Nagelkerke transformation does have a maximum of 1
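For reference, the two pseudo-R^2 formulas mentioned above, written in terms of the null-model likelihood L_0, the fitted-model likelihood L_1, and sample size n (standard definitions, not reproduced from the card):

```latex
R^2_{CS} = 1 - \left(\frac{L_0}{L_1}\right)^{2/n},
\qquad
R^2_{N} = \frac{R^2_{CS}}{1 - L_0^{\,2/n}}
```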

68
Q

What is Simpson’s paradox?

A

Conclusions drawn from the margins of a table are not necessarily the same as those drawn from the whole table. Essentially, when you collapse the data you can overlook important trends within it

69
Q

Explain Loglinear models

A

Statistical models for data based on counts, especially for 3 or more categorical variables (eg 3-way or higher contingency tables). Need to investigate the frequencies and proportions in each cell of the table. In a loglinear model, we want to test the interaction and see if the two variables are associated.
* Want a model with the smallest possible number of parameters
-Hierarchical loglinear modelling - start with a more complex model and shave off interactions to make it simpler

70
Q

Why would you want a simpler loglinear model?

A

-We want the most bang for our buck from a model's parameters.
-As we add more parameters to a model (predictors, interaction terms, etc.) we often get worse prediction or generalisation to new data.
-This doesn’t mean we should always accept a simple model - a simple model may fit badly!

71
Q

How do you simplify a loglinear model?

A

-Start with the saturated model (it fits perfectly!)
-Remove the highest order interaction
-if the change is non-significant, then the simpler, non-saturated model is a plausible description of the data
-If the change is significant, then the interaction is necessary for the model

72
Q

What is the measure of fit for a loglinear model?

A

The likelihood ratio statistic (G^2)
-Has an approximate chi-squared distribution
-The saturated model has zero degrees of freedom, so it does not have a probability level

73
Q

What are the assumptions of a loglinear model?

A

-Each case in only one cell
-Ratio of cases to variables: 5 times as many cases as there are cells
-Expected cell frequencies: all should be greater than one, and no more than 20% should be less than 5
-Standardised residuals should be normally distributed with no obvious pattern when plotted against observed values

74
Q

What are the differences between loglinear and logistic regression?

A

-Logistic regression is useful when you have a binary outcome (bounded between 0 and 1)
-Loglinear models are useful when you have counts or frequencies (eg lower bound of zero, upper bound of infinity)

75
Q

What are the similarities/differences between ANOVA and general linear model regression?

A

General Linear Model
○ Regression and ANOVA can do exactly the same thing
§ Different emphasis so in practice, statistical software will return different output
§ Regression only handles one dependent variable
§ GLM does ANOVA through a dummy-variable multiple regression-like procedure
○ Why would you do an ANOVA using GLM?
§ It’s not necessary
§ BUT it’s helpful to understand the relation between the two techniques

76
Q

What is ANCOVA?

A

Analysis of covariance
ANCOVA is an extension of ANOVA where you control for one of more covariates
A covariate is a continuous variable that is correlated with the dependent variable but is not the focus of the study
○ Usually covariates are the focus, so this is unusual
Why bother?
○ Covariates could be possible confounds
○ In an ANOVA, the variance due to covariates becomes error. If it's controlled in an ANCOVA, then the error variance is less, so you more accurately assess the variance due to the factor (IV)
§ Makes it easier to see the difference between the groups and improve the F ratio if the error is reduced

77
Q

Why use ANCOVA instead of ANOVA?

A

In ANOVA, despite differences between group means, it can sometimes be difficult to find an effect because of high error variance. With ANCOVA, the error variance is lower, so the differences between the group means are much clearer, making it much more likely that significant differences will be found. The differences between the groups remain the same, but the error variance is reduced, making the differences easier to see.

78
Q

What are the assumptions of an ANCOVA?

A

○ Normality
§ Same as ANOVA
○ Homogeneity of variance
§ Same as ANOVA
○ Linearity between pairs of covariates
§ In our example, we only have one so this does not apply
§ BUT if we have multiple covariates, they need to be linearly related
○ Linearity between the covariates and the DV
§ Because the relationship between them is modelled like a regression
○ Homogeneity of regression (very important)
§ The regression slope is the same for all cells in the ANOVA
§ In our case, the regression slope relating mathscrs to mathsach should be the same for males and females
§ Alternatively, there should be no interaction effect between mathscrs and gender in predicting maths achievement
□ In other words
® Both genders should benefit to the same extent from taking additional maths courses

79
Q

What are some practical issues with ANCOVA?

A

Indicates that the homogeneity of variance may not hold. However, because of equal large-ish numbers in cells, we can expect robustness of the ANCOVA.

80
Q

What does a significant Levene’s test imply for ANCOVA?

A

Indicates that the homogeneity of variance may not hold. However, because of equal large-ish numbers in cells, we can expect robustness of the ANCOVA.

81
Q

Explain the independence of covariate and factor assumption of ANCOVAs

A

This assumption is overlooked by many, and it is not captured by the homogeneity of regression slopes: the slopes could show that the DV is predicted to the same extent for men and women, yet the two groups could still sit at different levels on the covariate. Knowing that two groups benefit equally from x treatment doesn't mean they received as much of it as each other. People often mistakenly turn to ANCOVA in the hope of 'controlling for' group differences on the covariate; there is no statistical means of accomplishing this 'control'. If you are using an experimental design with randomly allocated groups, then it is reasonable to assume the covariates do not relate to the groups.

82
Q

What is MANOVA?

A

Multivariate Analysis of Variance
○ Generalises ANOVA to situations where there is more than one DV
§ ANOVA tests whether mean differences between groups on one DV likely to have occurred by chance
§ MANOVA does that same but on a combination of DVs
§ In effect, MANOVA creates a new DV - combination of all the DVs, that maximises group differences - and then runs an ANOVA
§ To do so, it takes into account the correlations among DVs

83
Q

What are the advantages of MANOVA?

A

Advantages
○ Improves chances of discovering what changes
○ May reveal differences not shown in separate ANOVAs
○ But MANOVA is a more complicated analysis

84
Q

Why would someone want to analyse multiple DVs (for MANOVA)?

A

○ In many cases in psychology you only have a single DV: a measure on some scale
○ But there are several other DVs you could consider simultaneously
§ Response latency
§ EEG: measurement from multiple electrodes
§ MANOVA is very popular in neuroscience: analysis of EEG and fMRI data
○ It may be the case that DVs individually do not show much of an effect
§ But the combination of them show differences

85
Q

Why wouldn’t someone always use MANOVA?

A

○ Key thing is that the different DVs should be correlated - they should measure the same thing
§ Eg depression and anxiety: could combine these due to correlations to get an overall sense of ‘mental unwellness’
○ But you can reduce power if the DVs are not correlated!
§ Uncorrelated variables likely not measuring the same thing

86
Q

Explain MANOVA vs ANOVA

A

ANOVA:
-compares the variation between groups to the variation within groups
-Sums of squares between cells compared with sums of squares within cells

MANOVA:
-not only sums of squares
-sums of squares AND crossproducts (ie correlations between the DVs)
-SSCP matrix

87
Q

What is the difference between a cross product and the sums of squares?

A

Sums of Squares:
-The sum of the squared differences between each score on the DV and the DV's mean

Cross Product:
-The sum of: (difference between DV1 and its mean) times (difference between DV2 and its mean)
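A tiny worked sketch of both quantities and the resulting SSCP matrix (Python with NumPy assumed; the scores are made up):

```python
import numpy as np

# Illustrative scores on two DVs.
dv1 = np.array([3.0, 4.0, 6.0, 7.0, 5.0])
dv2 = np.array([2.0, 5.0, 5.0, 8.0, 6.0])

d1, d2 = dv1 - dv1.mean(), dv2 - dv2.mean()
ss1 = np.sum(d1 ** 2)     # sum of squares for DV1
ss2 = np.sum(d2 ** 2)     # sum of squares for DV2
cp  = np.sum(d1 * d2)     # cross-product of DV1 and DV2 deviations

# SSCP matrix: sums of squares on the diagonal, cross-products off the diagonal
sscp = np.array([[ss1, cp], [cp, ss2]])
print(sscp)
```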

88
Q

What are the multivariate significance tests for MANOVA?

A

Pillai-Bartlett trace:
- most powerful if group differences are concentrated on more than one variate
-Also used when sample sizes are equal - most robust to violations of assumptions

Hotelling's trace

Wilks' lambda:
-This is the criterion of choice unless there is a reason to use Pillai's

Roy’s largest root:
-most powerful if group differences are focused on first variate

89
Q

What are the assumptions of MANOVA?

A

multivariate normality:
-difficult to test
-If your DVs are each normally distributed this should be ok

Homogeneity of variance-covariance matrices:
-Could use Box's test (you want it to be non-significant; it often is significant though, but robustness can be expected with equal sample sizes), Levene's test (only for the univariate ANOVAs), or Bartlett's sphericity test (useful in univariate repeated-measures designs). Instead of Box's test, if the log determinants of the different covariance matrices are in the same ballpark, it is safe to proceed

90
Q

Is it safe to do a univariate ANOVA once you do a MANOVA?

A

No. It is often assumed that you can follow up with univariate tests, but if you keep doing more and more tests you will get false positives due to chance

91
Q

What are contrasts for MANOVA?

A

Suppose we want to check the effect of father's education on two DVs (this is a variable faed with 3 levels: 1 = high school, 2 = some college, 3 = bachelor's degree or more). If the effect is significant, it is helpful to choose a contrast to see where the differences lie.

92
Q

What is multilevel modelling?

A

Models that permit constructs at more than one level.
-Individuals are 'nested' in groups.
-Predict individual outcomes from other individual-level variables as well as group-level variables, taking the grouping structure into account.
-What counts as macro and micro is context specific
-The grouping structure sets up dependence (non-independence) among observations
-Sampling is first of macro units, then of micro units
-Micro observations are not independent of each other - an issue for most statistical methods
-Builds on regression

93
Q

What is an example of micro-level propositions?

A

○ Holes in a person's jeans and Apple products owned by an individual - both individual-level variables, but the dotted line suggests the presence of a macro level that is not being measured (e.g. suburb)

94
Q

What is a macro-micro interaction?

A

The strength of the macro predictor will depend on the micro variable

95
Q

What is a causal macro-micro-macro chain?

A

Causal chains - macro variable 1 predicts micro variable 1, which predicts micro variable 2, which predicts macro variable 2

96
Q

What is aggregation?

A

Aggregation is used if you are only interested in macro-level propositions, but raises several issues

97
Q

What are some issues with aggregation?

A

Shift of meaning:
-variables aggregated to the macro level no longer tell us about the micro level - their meaning shifts

Issues related to neglecting part of the data structure:
-eg reduction of power
-might miss patterns within the macro level and are now overlooking important info

Prevents examination of cross-level interactions

Ecological fallacy:
-general term for mistaken attempts to interpret aggregated data at a lower level (eg at the micro level)
-when people infer that associations at macro level translate to associations at a micro level

98
Q

What is disaggregation?

A

Applying the macro-level variable to each individual at the micro level and conducting a regression at the micro level

99
Q

What are some issues with disaggregation?

A

-a measure of macro-level variable considered as micro-level
-miraculous multiplication of the number of units
-risks type 1 errors
-Does not take into account that observations within a macro-unit could be correlated

100
Q

What is a random effects ANOVA?

A

-For fixed effects ANOVA, we assume that the groups refer to categories, each with its own distinct interpretation (gender, religion, etc)
-But sometimes the groups are samples from a population (actual or hypothetical) of possible macro-units (eg three treatment groups based on different levels of drug intake)
-In this case the constant/intercept (the regression coefficient now written B0j) is not fixed but a random factor.

101
Q

What are the two sources of variance for a random effects ANOVA?

A

tau^2 is the variance due to the group structure, and delta^2 is the residual variance. The null hypothesis is that the group-structure variance (tau^2) is zero

102
Q

Explain random intercept multilevel modelling

A

The different linear models for the macro-level units have different y-intercepts. Multi-level models assume that these intercepts are normally distributed around a mean value.
Issue:
-some intercepts are based on very little data
-Treats the intercepts as random but keeps a fixed effect for the slope

103
Q

What is the two-stage strategy for Multi-level modelling?

A

Used to investigate variables at two levels of analysis:
1. relationships among level 1 variables estimated separately for each higher level (level 2) unit
2. These relationships are then used as outcome variables for the variables at level 2

104
Q

Explain intraclass correlation (ICC)

A

-For random effects ANOVA
-The proportion of variance explained by the group structure
-It is also the correlation between two randomly drawn individuals in one randomly drawn group
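In symbols, using the variance components from the random-effects ANOVA card (tau^2 for the group-level variance, delta^2 for the residual variance):

```latex
\mathrm{ICC} = \frac{\tau^2}{\tau^2 + \delta^2}
```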

105
Q

Describe the random slopes model

A

-We can fit multi-level models that assume the slopes are normally distributed around a mean value
-Still let intercepts vary by group
-More like ‘random intercept and random slope model’
-If a model has random slopes it will almost certainly have random intercepts too (because any data creating variability in the slopes will likely also create variability in the intercepts)
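A hedged sketch of a random intercept and random slope model on simulated nested data (Python with statsmodels' MixedLM assumed; the group structure and coefficients are illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated nested data: 30 groups, each with its own intercept and slope.
rng = np.random.default_rng(7)
groups = np.repeat(np.arange(30), 20)
x = rng.normal(size=groups.size)
u0 = rng.normal(0, 1.0, 30)[groups]     # random intercepts
u1 = rng.normal(0, 0.5, 30)[groups]     # random slopes
y = 2 + u0 + (0.8 + u1) * x + rng.normal(size=groups.size)
df = pd.DataFrame({"y": y, "x": x, "group": groups})

# Random intercept only:
m1 = smf.mixedlm("y ~ x", df, groups=df["group"]).fit()
# Random intercept and random slope:
m2 = smf.mixedlm("y ~ x", df, groups=df["group"], re_formula="~x").fit()
print(m2.summary())
```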

106
Q

What are some additional issues for Multi-level modelling?

A

Centring:
-It is often recommended that continuous predictors in MLMs be mean-centred, esp if they are going to appear in an interaction.
-Grand mean centring (compute a new variable by subtracting the overall mean from it)
-Group mean centring (compute a new variable by subtracting the group mean from it)
-Easier to interpret, a zero score is now the mean value

Assumptions:
-Pretty much all the assumptions of regression still apply (the exception is dependency, which is now included in the model)
-Additionally, the random effects are assumed to be normally distributed

Estimation:
-In most circumstances it won't matter whether you use restricted maximum likelihood estimation (REML) or just ML. But if you are going to compare fit across models you need to use ML

Covariance structures:
-In this and previous lectures, we have not looked closely at different covariance structures for the random effects
-In effect we have assumed that the random effects are uncorrelated with each other. This may not be the case.

107
Q

What are 4 types of covariance structures for MLM?

A

Variance components
○ Diagonal elements are variances
○ Off-diagonal elements are covariances
○ Essentially assumes the random effects don't covary with each other
○ Also assumes the variances are the same
Diagonal
○ Covariances are the same
○ Variances are different
Autoregressive
○ Particularly important for repeated measures data
○ Form of nesting but where variances are nested within the individual
○ The correlation between scores now and scores one time step later is rho
○ The correlation expected between scores now and scores two time steps later is rho squared
○ The correlation of scores with themselves (now) is 1
○ Useful when we think the correlations will get weaker over time
Unstructured
○ Want a structure that is simple but not so simple that it is oversimplified
○ Computers use these structures as starting points for analysis
○ Model parameters will be affected by what you choose here

108
Q

What are some other terms for MLM?

A

-Hierarchical linear modelling
-Linear mixed models
-Mixed models

109
Q

what is a mixed ANOVA?

A

Classical ANOVA with a mix of between-group and repeated-measures predictors (at least 1 of each). Relatively prone to running into problems with unbalanced designs and missing data. Includes fixed and random effects

110
Q

What does fixed effects mean?

A

Categorical variables whose levels are exhaustive (the levels in the study are the only ones you are concerned with).
-Modelling approach which treats group effects as fixed, in the sense that coefficients don’t vary, or in the sense that they vary but are not themselves modelled (eg ANOVA model with dummy variables to represent groups)

111
Q

What does random effects mean?

A

Categorical variables whose levels are chosen at random from a larger population (eg schools chosen at random from a list of all Australian schools)
-Modelling approach which treats coefficients representing levels of a group effect as randomly drawn from an underlying distribution (usually normal distribution)
-Subjects don’t have to be chosen randomly, but can be chosen as though they have been

112
Q

What is complete pooling?

A

Excluding the categorical predictor from the model.
-Model is learning too little from the data and is therefore underfitting
-It essentially assumes the variability between the groups is zero
-Also uses unbiased estimates
-But results will be relatively stable from sample to sample

113
Q

What is no pooling with separate regressions?

A

-Fitting separate regression for each separate group
-Parameter for one group might be way off the true value

114
Q

What is no pooling via fixed effects model?

A

-Single regression with group as categorical predictor
-Less extreme option of pooling or no pooling with separate regressions
-Estimates the intercept for each group separately but pools the slope estimates so they are the same for each group
-Multiple regression
-Fixed effects model
-Has high variance though so vulnerable to variance within the data, can lead to overfitting

115
Q

Explain partial pooling through random intercepts model

A
  • Tries to make a compromise between complete pooling and no pooling
  • Keeps overall average in complete pooling but also uses group effect
  • Uses shrinkage
    • Only random effects shrink, fixed effects don’t
  • Also biased estimates
116
Q

What is cluster analysis

A

○ Exploratory - you wouldn’t use it for hypothesis testing
§ Used for exploring if there are any subgroups in your data
○ If you’re lucky, your data will be normally distributed
§ And individual differences are just spoken about in terms of how different they are from the group
○ In many cases though, you’ll have substantial individual differences
§ Can be used to summarise the data in a number of discrete groups
§ E.g. noticing the regions in a map: most data analysis would just average the points and report somewhere in the middle
○ Cluster analysis is good at finding subgroups
-We need to understand the differences in our data
-One of the simplest means of looking for latent classes of participants

117
Q

What are the motives behind cluster analysis?

A

○ A simple approach to forming groups of variables or cases
§ Hierarchical cluster analysis
§ K-means cluster analysis
§ Two-step cluster analysis (SPSS)
○ Individuals or variables that are ‘similar’ to one another are grouped into the same cluster
§ Decisions need to be made about how many clusters you want
§ Except for two-step cluster analysis, which has inferential techniques to assist with decisions about the number of clusters
○ Essentially an exploratory technique
○ Can be used if there are qualitative differences between individuals
§ Eg do all subjects show the same effect, or are there subgroups of subjects that show different effects?
○ Very popular in consumer research
§ Often used to find subgroups with different purchasing behaviour
□ Eg purchasing primarily electronics etc

118
Q

What is hierarchical cluster analysis?

A

○ Can we put variables into groups so that variables within a group are more alike than variables in different groups?
○ Need measures of similarity between variables
○ We could use correlations
§ But correlations assess similar variation, not similar scores
§ So it depends on what your specific research question is

119
Q

What are some distance measures for hierarchical cluster analysis?

A

□ Euclidean distance
◊ Find the differences between each pair of corresponding variables, square them, add them up, and take the square root
◊ Squaring the Euclidean distance gives the squared Euclidean distance; in MOST cases you can use this, and it is the one we will mostly focus on
□ Block
◊ Find the differences between each pair of corresponding variables and add up their absolute values
◊ Unlike the Euclidean distance, it is the shortest route when it is not possible to cut straight through the middle (eg like a taxi driving through a city)
□ Minkowski-r
□ Squared Euclidean distance
□ Power
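
A small sketch of these distance measures using SciPy; the two score profiles are made-up examples on the same three variables.

```python
from scipy.spatial.distance import euclidean, sqeuclidean, cityblock, minkowski

a = [2.0, 5.0, 1.0]
b = [4.0, 1.0, 2.0]

print(euclidean(a, b))        # square root of the sum of squared differences
print(sqeuclidean(a, b))      # sum of squared differences (squared Euclidean)
print(cityblock(a, b))        # sum of absolute differences (block / Manhattan)
print(minkowski(a, b, p=3))   # Minkowski-r with r = 3
```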

120
Q

What is the difference between euclidean distance and block?

A

Euclidean distance is like the hypotenuse of a right-angled triangle, while block distance is like the sum of the two other sides. Even though the Euclidean distance is the shortest, sometimes that route isn’t possible (eg navigating through a city)

121
Q

What is the proximity matrix?

A

The rows and columns represent the variables (or cases), and each cell contains the distance between the corresponding pair

122
Q

What is the agglomerative hierarchical clustering method?

A

-Start with proximity matrix
-Combine the two closest variables into one cluster
-Recalculate the distance between the new cluster and all other variables/clusters
-Repeat until all variables are combined into one cluster
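
A minimal sketch of agglomerative hierarchical clustering with SciPy; the random data array, the Ward linkage choice and the three-cluster cut are all just illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                        # 20 cases measured on 3 variables

Z = linkage(X, method="ward", metric="euclidean")   # agglomeration schedule (merge history)
labels = fcluster(Z, t=3, criterion="maxclust")     # cut the tree into 3 clusters
print(labels)
# scipy.cluster.hierarchy.dendrogram(Z) would plot the full merge history
```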

123
Q

What is nearest neighbour method to combining clusters?

A

-Also called single link
-Distance between clusters A and B defined as the smallest distance between any element of A and any element of B
-but the merges can hinge on a single close pair of points, so the method makes fairly arbitrary choices and is sensitive to outliers

124
Q

What are some methods of combining clusters?

A

-nearest neighbour (single link; produces large, sometimes straggly clusters)
-Complete linkage (furthest neighbour; uses the furthest distance; produces tight clusters)
-Average linkage within groups (similar to complete linkage)
-Ward’s method (a variance-based method; tends to combine small clusters and produce clusters of roughly equal size)
-Centroid method (uses the distance between cluster centroids, ie the means of all variables)
-Median method (similar to centroid, but small clusters are weighted equally with large clusters)

125
Q

How to choose which cluster combining method to use

A

NO METHOD IS ALWAYS SUPERIOR
-Single linkage: theoretically elegant, but there have been recommendations against using it
-Complete linkage: more stable than single linkage - better in the presence of outliers
-Ward’s method and the average linkage method have generally been shown to perform well, but not with outliers

126
Q

how many clusters should be chosen in cluster analysis?

A

-Largely a matter of interpretation and choice
-Can look at the agglomeration schedule
-the agglomeration coefficient tells us how alike the two clusters being combined are
-choose a solution at the point where the increase in the coefficient becomes large (ie stop just before a big jump)

127
Q

What is k-means clustering

A

-Has a random element to it
-Sometimes fails to converge - cluster assignments can keep changing as you keep iterating
-Can be sensitive to the starting points
-Produces reasonable numbers of cases in each cluster
-Used in market research to ‘segment’ the population

128
Q

How do you perform k-means clustering?

A

-Define the number of clusters (k)
-Set initial cluster means
-Find the squared Euclidean distance from each case to each cluster mean
-Allocate each case to its closest cluster
-Recalculate the mean of each cluster
-Find the new distances
-Reallocate cases. If nothing changes, stop; otherwise repeat
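
A minimal NumPy sketch of these steps (Lloyd’s algorithm); the simulated data and the choice of k = 3 are hypothetical.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Set initial cluster means by picking k cases at random
    means = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # Squared Euclidean distance from each case to each cluster mean
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)              # allocate each case to its closest cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break                                   # no change in allocation: stop
        labels = new_labels
        for j in range(k):                          # recalculate the mean of each cluster
            if np.any(labels == j):
                means[j] = X[labels == j].mean(axis=0)
    return labels, means

labels, centres = kmeans(np.random.default_rng(1).normal(size=(50, 2)), k=3)
```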

129
Q

K-means clustering vs hierarchical clustering

A

k-means:
-the number of clusters is decided in advance
-the clusters depend on the initial cluster centres, which are chosen randomly

Hierarchical:
-everything is eventually combined into a single cluster
-the convention is to decide on the number of clusters at the point where the merging distance becomes large
-Always produces the same result for a given linkage method
-The algorithm is completely deterministic, with no random components

130
Q

What is two-step clustering?

A

-In the first step, clusters are grouped into a reasonably large number of small sub-clusters (technique involving cluster trees)
-In the second step, the sub-clusters are clustered using a standard hierarchical agglomerative procedure to produce the final clusters

131
Q

What are the advantages of two-step clustering?

A

-Combines both hierarchical and k-means clustering
-Handles outliers (stops them from forming ‘nuisance’ clusters)
-Allows for both categorical and continuous measures
-The researcher can either set the number of clusters, or allow the program to determine the number of clusters

132
Q

What are the disadvantages of two-step clustering?

A

Cluster membership can depend on the order of cases in the data file - a particular problem for small data sets

133
Q

Compare clustering and multidimensional scaling

A

-Both analyse the same kind of data: measures of association (or distance) between variables
-Differences:
*MDS is older
*Clustering has a weak model or no model at all
*MDS has an explicit model

134
Q

What is multidimensional scaling (MDS)?

A

-Analyses distances (dissimilarities), just like cluster analysis, and can also analyse either cases (individuals) or variables, just like cluster analysis
-Displays distance-like data as a geometrical picture
-Each object (ie case or variable) is represented as a point in multidimensional space, so that two similar objects appear close together; a 2-dimensional solution can be shown as a chart and a 1-dimensional solution as a line

135
Q

What are types of multidimensional scaling?

A

Classical MDS (metric scaling)
-The simplest kind
-One proximity matrix
-At least interval data, sometimes ratio
-Assumes a linear transformation of the proximities

Non-metric MDS
-most frequently used
-Assumes only an ordinal level of measurement
-One distance matrix

More than one dissimilarity matrix:
-each subject assesses n objects on m qualities. Can create a proximity matrix for each subject
-unweighted: replicated MDS
-weighted: individual differences MDS
-The innovation underlying non-metric MDS is to replace the linear regression function with a rank-order (monotonic) one: find a set of coordinates such that the distances between the points in the space are in the same rank order as the dissimilarities in the data, or as close to it as possible
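
A minimal sketch of non-metric MDS with scikit-learn on a precomputed dissimilarity matrix; the random matrix and the two-dimensional solution are purely illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
D = rng.random((8, 8))
D = (D + D.T) / 2            # make the dissimilarity matrix symmetric
np.fill_diagonal(D, 0.0)     # zero self-dissimilarity

mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)   # 2-D coordinates; only the relative positions are meaningful
print(mds.stress_)              # (raw) stress of the solution: lower = better fit
```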

136
Q

How many dimensions for multidimensional scaling?

A

A 1-dimensional or 2-dimensional solution is preferred because it can be visualised on a page, so the goal is dimension reduction

137
Q

Explain dimension reduction for multidimensional scaling

A

-All dimension reduction loses something (eg a map loses vertical distance)
-You can oversimplify
-How do we reduce dimensions? The distances in the reduced-dimensional space (the map) will inevitably differ somewhat from those in the original space (the globe)

138
Q

Explain kruskal’s stress index of fit

A

Kruskal devised an index he called stress to assess how well a solution fits the data.
-Measure of how well an MDS representation fits the data
-Low values = better fit

As a rule of thumb, values less than .15 indicate a good fit BUT
-higher dimensions result in lower stress
-More variables result in higher stress
-if our stress is less than that of random data, accept the solution
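
A small sketch of Kruskal’s Stress-1 formula: the square root of the sum of squared discrepancies between the solution’s distances and the (monotonically transformed) dissimilarities, divided by the sum of squared distances. The example numbers are made up.

```python
import numpy as np

def kruskal_stress1(disparities, solution_distances):
    d_hat = np.asarray(disparities, dtype=float)
    d = np.asarray(solution_distances, dtype=float)
    return np.sqrt(((d - d_hat) ** 2).sum() / (d ** 2).sum())

print(kruskal_stress1([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))   # low values = better fit
```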

139
Q

When to stop with your MDS

A

Three options:
-The stress value does not change by more than a preset criterion (ie it could do better with more time, but not much better)
-The stress value reaches a preset minimum value (not usually recommended, though - stress varies across datasets), or
-The program reaches a set number of iterations

140
Q

How to interpret Multidimensional scaling

A

The coordinate axes do not necessarily have meaning
-what is important is the position of each element relative to other elements

141
Q

What are the assumptions for multidimensional scaling?

A

MDS transforms ordinal proximities into distance data
-this assumes that the relationship between proximity data and derived distances is smooth
-Can check this with charts

Degeneracy: points of the representation are located in a few tight clusters
-These clusters may be only a small part of the structures of the data, but may swamp the interpretation
-And stress value may be close to 0
-Inspect the transformation plot: if it is reasonably smooth, then solution is ok; if it has obvious steps there may be problems

142
Q

When to use classical vs non-metric multi-dimensional scaling?

A

If the data are ratio (or at least interval), use classical (metric) MDS; otherwise use non-metric MDS

143
Q

What does a lack of statistical significance tell you?

A

The probability of data at least as extreme as ours under the null hypothesis is higher than our significance threshold
-does NOT tell you that the null hypothesis is true
-The other possibility is that you may not have had enough data to reject the null hypothesis
-You cannot distinguish between these two possibilities using Null Hypothesis Significance Testing
-The likelihood of the data under the alternative hypothesis is missing from null hypothesis significance testing

144
Q

What is conditional probability

A

The probability of B given A
-It is not symmetrical (ie p(B|A) does not equal p(A|B))

145
Q

What is Bayes rule?

A

provides a formal means for reversing a conditional probability using the given conditional probabilities and the probabilities of each of the events.

p(A|B) = (p(B|A) * p(A)) / p(B)

Can extrapolate to data:
p(model|data) = (p(data|model) * p(model)) / p(data)

Bayesian methods expand on maximum likelihood estimation by incorporating p(model) - the prior probability - and p(data) - the marginal likelihood.
p(Model|data) is the posterior probability
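
A tiny numeric sketch of Bayes’ rule with made-up numbers (eg A = “has the condition”, B = “positive test result”).

```python
p_A = 0.01             # prior probability of A
p_B_given_A = 0.95     # probability of B when A is true
p_B_given_notA = 0.05  # probability of B when A is false

p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)   # marginal probability of B
p_A_given_B = p_B_given_A * p_A / p_B                  # posterior, via Bayes' rule
print(round(p_A_given_B, 3))   # about 0.161, much smaller than p(B|A) = 0.95
```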

146
Q

What is the Bayes factor?

A

The ratio of evidence between two models (M1 and M2), although it can be generalised to any number of models.
In practice, we can specify M1 and M2 as the null and alternative hypotheses and test between them.
-The Bayes factor is a single ratio, but it is computed from distributions
-After you fit your model to the data, you also have the posterior distribution of the effect size, which reflects how much you have updated your prior beliefs
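
A small sketch of a Bayes factor as a ratio of evidence. With two simple point hypotheses it reduces to a likelihood ratio; for composite hypotheses the likelihood is averaged over each model’s prior. The coin-flip data and the two models here are made up.

```python
from scipy.stats import binom

heads, flips = 60, 100
p_data_M1 = binom.pmf(heads, flips, 0.5)   # M1: a fair coin (a null-like model)
p_data_M2 = binom.pmf(heads, flips, 0.6)   # M2: a coin biased towards heads
bayes_factor_21 = p_data_M2 / p_data_M1    # >1 favours M2, <1 favours M1, 1 = equal evidence
print(round(bayes_factor_21, 2))
```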

147
Q

What are some advantages of Bayesian hypothesis testing?

A

Being able to find evidence for the null hypothesis
-With Bayesian hypothesis testing you can find evidence for the null or the alternative hypothesis
-This ratio of evidence is quantified by the Bayes factor

148
Q

How to interpret Bayes factor

A

A Bayes factor of 1 means the two hypotheses are equally supported by the data
-when you don’t have enough data it doesn’t necessarily mean the null hypothesis is supported: you will often get a Bayes factor near 1, and as more data are collected the BF will move towards the better-supported hypothesis

149
Q

Why is it less problematic to inspect data in Bayesian Hypothesis testing?

A

-You can use the outcome of a prior Bayesian hypothesis test as the prior probability for the next test
-This allows you to update your results as data are being collected. You can also use these as priors for your next analysis in a sequence of experiments

150
Q

What are prior probabilities

A

-You can place prior probabilities on:
@Model parameters:
*effect size: can specify a prior distribution on the direction and magnitude of a particular effect
*coefficients of a regression model: you may already have a sense of both the direction and magnitude of a particular predictor

-Important because, when conducting Bayesian data analyses, we can use prior probabilities to capture our sense of which results or hypotheses are more or less likely (this can be based on intuition, or on a literature review)

-If there is a high probability of a positive effect in a particular experimental paradigm, this can be reflected as a high prior probability for the effect
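
A small sketch of expressing a prior on an effect size as a distribution; the choice of a normal prior centred on a small positive effect is a made-up illustration.

```python
from scipy.stats import norm

prior = norm(loc=0.3, scale=0.15)   # belief: the effect is probably positive and moderate in size
print(prior.cdf(0))                 # prior probability that the effect is actually negative
print(prior.interval(0.95))         # central 95% prior interval for the effect size
```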

151
Q

What is Bem’s result and how has it been criticised?

A

Bem produced a paper claiming evidence for ESP. It has been criticised on several counts:
-He conducted several experiments, and not all of them found evidence for ESP
-Conditions and groups were often analysed separately, without any form of statistical correction

152
Q

What is cross-validation?

A

-Instead of fitting ~all~ your data:
*You fit a subset of your data (training data), and evaluate the performance of the model on the remaining subset that it was not trained on (validation)
*The fit performance on the validation data is referred to as out-of-sample prediction

-An extremely effective way of comparing models
*The model that performs better on the validation data should be preferred
*That model exhibited better prediction/better generalisation to data it was not trained on
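
A minimal sketch of fitting on training data and scoring on held-out validation data with scikit-learn; the simulated data, the linear model and the 25% validation split are all just illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([0.5, 0.0, -0.3]) + rng.normal(size=100)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_train, y_train)   # fit only on the training subset
print(model.score(X_val, y_val))                   # out-of-sample performance (R²) on the validation data
```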

153
Q

What is the ‘Leave out one cross-validation’ method?

A

-Leave out one cross validation (LOOCV)
*most common
*the validation data is a single subject (or even a single data point) and the rest are training data. Repeat the process with each subject/data point left out in turn (if you have N subjects/data points, you repeat the process N times)
*Evaluate the performance of each model on the held-out data
*If you are performing parameter estimation you can average over the parameters for all N fits

-Downsides of this method
*time intensive
*not easy to perform
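
A minimal sketch of leave-one-out cross-validation with scikit-learn; the simulated data, the linear model and the mean-squared-error scoring are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.0, -0.5]) + rng.normal(size=30)

scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")   # one fit per left-out case
print(scores.mean())   # average out-of-sample error across all N fits
```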

154
Q

What is regularisation?

A

-A technique, related to cross-validation, for reducing the complexity of a regression model
-The most common technique is lasso regression: include an additional term in the error function that is the sum of the absolute values of the coefficients
-Having high values on all of the coefficients then makes the model perform worse (this pushes the estimates for small or weak predictors to zero)
-Requires specifying the regularisation term (penalty weight), which is a major downside of the method because it can be difficult to specify in some cases
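
A minimal sketch of lasso regression with scikit-learn; the data are simulated so that only the first two of five predictors actually matter, and the penalty weight alpha = 0.1 is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 1.0 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=200)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)   # alpha is the regularisation term

print(ols.coef_.round(2))     # all five coefficients end up non-zero
print(lasso.coef_.round(2))   # the weak predictors are pushed to (or towards) zero
```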

155
Q

Why would you want to use regularisation?

A

-In ordinary regression, predictors almost inevitably get non-zero coefficient estimates even if they aren’t doing anything
-Lasso regression naturally produces a simpler model in which fewer predictors have non-zero coefficients

156
Q

Can you combine cross-validation, bayesian methods, and lasso regression?

A

Yes. The prior distribution in Bayesian analyses can behave like regularisation - if the prior distribution for a parameter is centred on zero, this ‘pulls’ the parameter estimate toward zero, similar to the lasso.
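
A small sketch of a zero-centred prior acting like regularisation: Bayesian ridge regression places a normal prior centred on zero on the coefficients, which shrinks weak predictors (strictly this is ridge-like shrinkage; the lasso corresponds to a Laplace prior). The simulated data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = 1.0 * X[:, 0] + rng.normal(size=50)   # only the first predictor matters

print(LinearRegression().fit(X, y).coef_.round(2))   # unregularised estimates
print(BayesianRidge().fit(X, y).coef_.round(2))      # shrunk toward zero by the prior
```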