Advanced Design and Data Analysis Flashcards
Define and describe variance
For estimating population variance:
* Σ(person score − population mean)² / N is unbiased
○ Makes fewer extreme estimates
○ Unrealistic though, because it’s rare that we know the population mean
* Σ(person score − sample mean)² / N is biased
* Σ(person score − sample mean)² / (N − 1) is unbiased
○ This is unbiased because the sample mean is not the same as the population mean (used in the first option). The sample scores will be closer to the sample mean than to the population mean, so the squared deviation terms tend to be underestimates. Dividing by (N − 1) corrects for that by making the result slightly bigger, which has a larger proportional effect when N is small (with a larger sample, the sample mean is a closer estimate of the population mean, so less correction is needed)
A sampling distribution here is the distribution of variance estimates produced by a formula across many repeated samples. If you know the actual population variance, you can see from it how biased or unbiased an estimator is, because averaging many estimates gives an accurate picture of the estimator's long-run behaviour
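A minimal simulation sketch of this idea (the population values and sample size below are arbitrary choices, not from the notes): draw many samples from a population with known variance and compare the long-run average of the /N and /(N − 1) estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
pop_var = 25.0          # population variance (sigma = 5), chosen arbitrarily
n, reps = 10, 100_000   # small samples exaggerate the bias

samples = rng.normal(loc=100, scale=np.sqrt(pop_var), size=(reps, n))
biased = samples.var(axis=1, ddof=0)     # divide by N
unbiased = samples.var(axis=1, ddof=1)   # divide by N - 1

# The sampling distribution of each estimator: its mean reveals the bias.
print(biased.mean())    # tends to underestimate 25 (around 22.5 here)
print(unbiased.mean())  # close to 25 on average
```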
What is power?
To say that a test has 80% power means it has an 80% chance of rejecting the null hypothesis, given that the null hypothesis is false, and given:
* A particular sample size
* Particular effect size
* Particular alpha level (often .05 probability of rejecting the null hypothesis)
* Other considerations, including those related to whether the assumptions of the test are satisfied
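A rough simulation sketch of what 80% power means for a two-group t-test; the effect size (d = 0.5), per-group n (64), and alpha are illustrative choices, not values from the course.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
d, n, alpha, reps = 0.5, 64, 0.05, 5000   # medium effect, n per group

rejections = 0
for _ in range(reps):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(d, 1.0, n)      # true difference of d SDs
    p = stats.ttest_ind(treatment, control).pvalue
    rejections += p < alpha

print(rejections / reps)  # proportion of significant results ~ power (about 0.80 here)
```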
Reflections on power:
* We don’t have great intuitions about sample size as it relates to power.
○ Our intuitions may have been warped by seeing psychology journals reporting findings with very small sample sizes but statistically significant results
§ An example of publication bias (non-significant studies tend to get chucked out of the publication process)
Explain Statistical inference issues
-Significance tests cannot prove or disprove theories
-They provide probabilistic information and at most can corroborate theories
-significance tests do not allow probabilities to be assigned to any particular hypothesis
-With an alpha level of .05, we will reject a true null hypothesis 5% of the time (kind of like a margin of error). However, that is a global error rate, and doesn’t tell us the probability that any particular (local) decision is a mistake
Explain P-Values
p is the probability of getting our observed result, or a more extreme result, if the null hypothesis is true
Explain confidence intervals
General formula for confidence intervals for the population mean is M +/- Margin of error
If you’re randomly sampling from a normally distributed population and know the population standard deviation σ, the 95% CI is M ± 1.96 × σ/√N
The issue is that it is VERY rare to know the population standard deviation
If you don’t know the population standard deviation, you base the critical value on a t-distribution rather than a normal distribution
Cut-offs will be different, and with larger sample sizes it will be more similar to a normal distribution
But don’t just use the cut-offs for a normal distribution because they won’t be the same
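A small sketch of a 95% CI using the t-distribution when the population SD is unknown; the scores below are made up.

```python
import numpy as np
from scipy import stats

scores = np.array([12, 15, 11, 14, 13, 16, 12, 15])  # made-up sample
n = len(scores)
m, sd = scores.mean(), scores.std(ddof=1)
se = sd / np.sqrt(n)

t_crit = stats.t.ppf(0.975, df=n - 1)   # t cut-off, not the normal 1.96
ci = (m - t_crit * se, m + t_crit * se)
print(ci)
```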
INTERPRETING CONFIDENCE INTERVALS
* If we ran many studies, 95% of the intervals would contain the population mean
* We don’t know whether this particular interval does or doesn’t
What is the difference between p-values and confidence intervals?
Advantages of confidence intervals
○ They give the set of values that, had any of them been your null hypothesis, would not have been rejected
○ They tell you about the precision of your estimate
An advantage of p-values
○ They give a clearer indication of the evidence against the null hypothesis
Describe the replication crisis and 5 suggested reasons for it
The ‘file drawer’ problem
* The bias introduced in the scientific literature by selective publication - chiefly by a tendency to publish positive results, but not to publish negative or nonconfirmatory results
Gelman (2016) mentions five reasons:
* Sophistication
○ Because psychology was focussing on more sophisticated concepts than other disciplines, it was more open to criticism
* Openness
○ Has culture in which data sharing is very common - easier to find mistakes
* Overconfidence deriving from research design
○ Researchers may feel that they can’t go wrong using simple textbook methods, and their p-values will be ok
* Involvement of some prominent academics
○ More of its leading figures have been dragged into the replication crisis than other disciplines
* The general interest of psychology
○ Methods are very accessible so more people are willing to find mistakes
What are some routes to the replication crisis
Outright fraud (rare)
P-hacking, data dredging, data snooping, fishing expeditions (rarer than is commonly believed)
○ Looking for what people want to find
○ Sifting through the data
The garden of forking paths (more common than generally realised)
○ Running only the analyses that look like they might be significant, based on the data (looking at the data before choosing the tests; if the data had been different, other tests might have been run)
○ Solution could be to make requirements to preregister the hypotheses and methods
What are some typical experimental designs?
Between-subjects design
* Different participants contribute data to each level of the IV
* But differences might be due to differences between participants
○ Random assignment can help reduce this
○ Or matching - balance out the conditions
Within-subjects design
* Each participant gets exposed to each level of the IV
* Major concern with this is sequencing effects (each previous level having an influence on the next level)
Single factor design
* Only one IV
* Can have two levels (eg placebo and treatment), or more than two levels (eg placebo, treatment level 1, treatment level 2)
Factorial designs
* More than one IV
* Analysed with two-way ANOVA
* Interaction effects
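A sketch of how a 2×2 factorial design might be analysed with a two-way ANOVA in Python; the drug/therapy factors, the data frame, and all values are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(2)
# Invented 2x2 design: drug (placebo/treatment) x therapy (no/yes)
df = pd.DataFrame({
    "drug": np.repeat(["placebo", "treatment"], 40),
    "therapy": np.tile(np.repeat(["no", "yes"], 20), 2),
})
df["outcome"] = rng.normal(10, 2, len(df)) + (df["drug"] == "treatment") * 1.5

# Main effects plus the interaction term
model = smf.ols("outcome ~ drug * therapy", data=df).fit()
print(anova_lm(model, typ=2))   # F tests for each main effect and the interaction
```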
Explain correlational research and its uses
Investigates the relationship between two (usually continuous) variables; no variable is manipulated to observe an effect on the other
This type of research is often linked with the concepts of correlation and regression
○ Regression
§ Predicting a variable from other variables in a regression model
Designs are useful when
○ Experiments cannot be carried out for ethical reasons
○ Ecological validity is a priority
While correlation and regression are associated with correlational research, ANOVA and t-tests are associated with experimental research
This distinction is bogus; any of these analyses can be used with any research design
What is a quasi-experimental design?
○ Groups occur naturally in the world; cannot be random assignment to groups
§ Eg comparing men and women
○ Often used in program evaluation
§ Provides empirical data about effectiveness of government and other programs
What are some issues with null hypothesis significance testing
If power is low, there may be a difference, but you don’t see it
If power is very high (eg with a very large sample size), even trivially small differences can come out statistically significant
Explain empiricism
Finding out about the world through evidence
○ Could be considered as:
§ Observation = truth + error
§ Observation = theory + error
§ Observation = model + error
○ We need a theory of error to fit our models
○ Classical methods in statistics tend to assume errors are normally distributed
§ Gauss was the first to fully conceptualise the normal distribution
Why do we use linear models
○ Easy to fit
○ Commonly used
○ Lots of practical application (prediction, description)
○ Provide a descriptive model that is very flexible
○ Have assumptions that are broadly reasonable
What is the theory of error?
§ Often assume that the ‘error’ is normally distributed with zero mean
□ A theory of error
® It is the error term that requires statistical techniques
® The substantive question (eg the relationship between age and IQ) isn’t statistical at all; statistics comes in because of the error term
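A tiny sketch of the 'observation = model + error' idea using the age-IQ example; the data and the fitted relationship are fabricated, and the linear fit is just the non-error part of the model.

```python
import numpy as np

rng = np.random.default_rng(3)
age = rng.uniform(18, 80, 200)
iq = 100 + 0.05 * age + rng.normal(0, 15, 200)   # fabricated 'truth' plus noise

slope, intercept = np.polyfit(age, iq, deg=1)    # the model (non-error) part
residuals = iq - (intercept + slope * age)       # the error part

# The statistical machinery is all about these residuals
print(residuals.mean(), residuals.std(ddof=1))
```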
Why do we assume normal errors?
○ Two broad categories of justification for building models around the assumption of normal errors
§ Ontological
□ Study of nature of being
□ Normal distributions occur a lot in the world so let’s build model around them
□ Any process that sums together the result of random fluctuations has a good chance of somewhat resembling normal distributions
□ Sampling distributions and many statistics tend towards normality as sample size increases
§ Epistemological
□ Normal distributions represent a state of our knowledge (more like ignorance)
□ They don't contain any info about the underlying process, except its mean and variance
□ We should still be interested in the underlying process, but when we don't know anything about it, it's best to make as few assumptions as possible
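A quick sketch of the ontological point above: summing many small, decidedly non-normal random fluctuations already produces something close to a normal distribution (all quantities here are invented).

```python
import numpy as np

rng = np.random.default_rng(4)
# Each observation is the sum of 50 small uniform (non-normal) fluctuations
fluctuations = rng.uniform(-1, 1, size=(10_000, 50))
sums = fluctuations.sum(axis=1)

# The distribution of the sums is close to normal (check a histogram or skew/kurtosis)
print(sums.mean(), sums.std())
```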
Explain common model assumptions
○ Validity
§ Ensure data is relevant to the research question
○ Representativeness
§ Want sample to represent population as well as possible
○ Additivity and linearity
§ Most important mathematical assumption
§ Want the non-error part of function to be a linear model
○ Independence of errors
○ Equal variance of errors
§ Also referred to as homogeneity/homoscedasticity
○ Normality of errors
Explain moderation vs mediation
Moderation
○ Situations where the relationship between two variables depends on another variable
Mediation
○ One variable affects another indirectly, through a third variable
○ Inherently causal
○ X causes M causes Y
○ Maybe X also causes Y, but it doesn’t have to
Explain mediation effects
Mediating variables transmit the effect of an independent variable on a dependent variable
Mediation is important in many psychological studies
It is the process whereby one variable acts on another through an intervening (mediating) variable
○ Eg Theory of Reasoned Action
§ Attitudes cause intentions, which cause behaviour
Simplest mediation model contains three variables
○ Predictor variable X
○ Mediating variable M
○ Outcome variable Y
○ Causal model
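The 'regression equations' referred to in the causal steps card below are the standard simple-mediation equations; a sketch in LaTeX notation (the intercepts i and error terms e are generic labels, not symbols from the course notes):

```latex
Y = i_1 + cX + e_1            % total effect of X on Y
M = i_2 + aX + e_2            % X predicting the mediator M
Y = i_3 + c'X + bM + e_3      % direct effect c' and mediator effect b
```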
What is the causal steps approach to mediation
○ Based on the regression equations above, and involves 4 requirements
§ X directly predicts Y (ie coefficient c is significant)
§ If c is significant, X directly predicts M (ie coefficient a is significant)
§ M directly predicts Y (coefficient b is significant)
§ When both X and M predict Y, the effect of X is either:
□ Reduced (coefficient c’ is smaller than c, though both remain significant), and there is partial mediation, or
□ Eliminated (ie coefficient c’ is not significant) - then there is full mediation
○ If any of the four requirements is not met, stop the analysis - mediation is not established
What is the Baron and Kenny approach to mediation?
Independent regressions of the IV to DV, IV to MV, and IV + MV to DV. Mediation occurs if the effect of the IV is reduced when MV is introduced, but only if the IV was significant in the first place
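A sketch of these regressions in Python with statsmodels; the variables X, M, Y and the data-generating values are placeholders, not from the notes.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
X = rng.normal(size=300)
M = 0.5 * X + rng.normal(size=300)            # fabricated mediation structure
Y = 0.4 * M + 0.1 * X + rng.normal(size=300)
df = pd.DataFrame({"X": X, "M": M, "Y": Y})

total = smf.ols("Y ~ X", df).fit()            # step 1: total effect c
a_path = smf.ols("M ~ X", df).fit()           # step 2: path a
full = smf.ols("Y ~ X + M", df).fit()         # steps 3-4: b and reduced c'

print(total.params["X"], full.params["X"])    # compare c with c'
```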
What is the basic idea behind Principal Component Analysis
○ We have multivariate data (we’ve measured participants on multiple variables)
○ We fit straight lines to our data and call the lines ‘Principal Components’ (PCs)
○ 1st PC is the best line we can fit
○ 2nd PC is second best line we can fit etc
○ Maximum number of PCs = number of variables in our dataset
○ We want to represent our data with fewer PCs
○ Takes correlated continuous variables and reduces them to the smallest number of components while retaining as much of the information in the data as possible
○ Aims to fit straight lines to data points
§ The second-best line fits the errors left over from the first component
§ Eg reducing alcohol dryness and alcohol content into one component, while still describing the drinks reasonably fully
□ The first line minimises the perpendicular (diagonal) distances between the data points and the principal component line
□ The second (worst) line will always be perpendicular/orthogonal to the first principal component
* Can also be thought of in terms of dimensions
○ If you have n variables, you are in n dimensional space
○ Maybe there are new axes that make life simpler
○ Maybe you don’t need the full n components to describe your data well
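A minimal PCA sketch in Python; the two correlated variables (standing in for, say, dryness and alcohol content) are simulated here.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
dryness = rng.normal(0, 1, 200)
content = 0.8 * dryness + rng.normal(0, 0.5, 200)   # correlated with dryness
X = np.column_stack([dryness, content])

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)  # most variance sits on the first component
scores = pca.transform(X)             # data re-expressed on the new axes
```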
Explain the MAP test for determining how many components to extract in PCA
○ Velicer devised a test based on partial correlations known as the Minimum Average Partial Correlation test
§ After each component is extracted, it (and those extracted before it) gets partialled out of the correlation matrix of the original variables, and the average of the resulting partial correlations is calculated
§ As more components are partialled out, the average partial correlation approaches 0
§ But at some point, components that reflect ‘noise’ start being partialled out, so the average partial correlation begins to rise
§ Choose number of components corresponding to minimum average partial correlation
Describe component rotation
○ With n variables, you can have n components
§ These are completely determined and follow directly from the matrix operations
○ But if we only use a smaller number of components, there is some freedom in the final solution
§ In particular, we can rotate components to get a simple structure
§ Axes get rotated until the variables are tending to load on one component only, to as great an extent that is possible, and they are as close to 0 on the other components as possible
§ With large component loadings for some variables, and small component loadings for others
§ Used more in FA than PCA
What are different types of component rotations?
○ Orthogonal
§ Components/factors remain uncorrelated
§ Quartimax simplifies the variable pattern of the loadings
§ Varimax (most common method) simplifies factor patterns of loadings
§ Equamax is a compromise between variable and factor pattern simplification
○ Oblique
§ Components/factors correlated
§ Direct Oblimin
§ Promax
§ Both offer control of the degree of correlation of factors
□ Direct oblimin
® Delta (ranging from −0.8 to 0.8)
□ Promax
® Kappa (ranging from 1 upwards)
*Oblique rotation is recommended because correlated factors are more realistic
What matrices should be interpreted for orthogonal rotation?
Rotated component/factor matrix
What matrices should be interpreted for oblique rotation?
Pattern, structure, component/factor correlation matrix
What are the assumptions of the common factor model in EFA?
-Common factors are standardised
-Common factors are uncorrelated
-Specific factors are uncorrelated
-Common factors are uncorrelated with specific factors
Explain the rationale behind partial correlation
○ Suppose a correlation of 0.615 between items
1. ‘Don’t mind being the centre of attention’
2. ‘Feel comfortable around other people’
○ Correlation between item 1 and extraversion is 0.82
○ Correlation between item 2 and extraversion = 0.75
○ The aim is to find a latent or unobserved variable which, when correlated with the observed variables, leads to partial correlations between the observed variables that are as close to 0 as we can get
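The numbers in this card work out exactly; a quick check using the standard partial-correlation formula.

```python
import numpy as np

r12, r1f, r2f = 0.615, 0.82, 0.75   # item1-item2, item1-factor, item2-factor

# Partial correlation of item 1 and item 2, controlling for the latent factor
partial = (r12 - r1f * r2f) / np.sqrt((1 - r1f**2) * (1 - r2f**2))
print(partial)   # 0.82 * 0.75 = 0.615, so the partial correlation is exactly 0
```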
What are some practical issues for EFA?
○ Interval or ratio data
§ If proceeding with ordinal data, say that you are aware of it being problematic, but that you are continuing anyway for the sake of the assignment
○ Adequate sample size
○ Any missing data dealt with
§ Either impute the missing data or delete the cases
○ Decently high correlations
○ Linearity
§ Misleading results for non-linear relationship
§ Look at scatterplots
§ Don’t bother converting data to linear relationships in assignment
○ Weak partial correlations
○ Absence of outliers
○ Absence of multicollinearity/singularity
○ Distribution appropriate to the method used to extract the factors
What is the Guttman-Kaiser image approach?
-Used in EFA
○ Image analysis involves partitioning of the variance of an observed variable into common and unique parts, producing
§ Correlations due to common parts,
□ Image correlations
§ Correlations due to unique parts
□ Anti-image correlations (should be near 0)
What is the difference between PCA and EFA?
○ Principal components are ‘just’ linear combinations of observed variables. Factors are theoretical entities (latent variables)
○ In FA, error is explicitly modelled; in PCA it isn’t
○ If another factor is added (or removed), the factor loadings of the others will change
§ Not the case in PCA: if another component is added (or removed), the other component loadings stay the same
○ Unlike PCA, FA is a theoretical modelling method, and we can test the fit of our model
○ FA fragments variability into common and unique parts, PCA doesn’t
○ PCA runs using a single canonical algorithm and it always works. FA has many algorithms (some may not work with your data)
What are the similarities between EFA and PCA?
-Both have same general forms
-They deliver similar results especially if number of variables is large
-If you loosely define ‘factor analysis’ as a method for suggesting underlying traits, PCA can do that too
How to know whether to use PCA or EFA?
-Run EFA if you wish to test a theoretical model of latent factors causing observed variables
-Run PCA if you want to simply reduce your correlated observed variables to a smaller set of important uncorrelated composite variables
What is Widaman’s conclusion regarding using PCA vs EFA
‘the researcher should rarely, if ever, opt for a component analysis of empirical data if their goal was to interpret the patterns of covariation among variables as arising from latent variables or factors’
Explain Structural Equation Modelling in a nutshell
An umbrella term for a set of statistical techniques that permit analysis of relationships between one or more IVs and DVs, in possibly complex ways
○ Also known as causal modelling, causal analysis, simultaneous equation modelling, analysis of covariance structures
○ Special types of SEM include confirmatory factor analysis and path analysis
SEM enables a combined analysis that otherwise requires multiple techniques
○ For example, factor analysis and regression analysis
The modelling of data by joining equations [1] and [2] is known as structural equation modelling
That aspect of the model concerned with equation [2] is often called the measurement model
That part focusing on equation [1] is known as the structural model
If the structural model contains observed variables but no latent factors, we are doing a path analysis
Explain the difference between Confirmatory factor analysis and Exploratory Factor Analysis
Exploratory factor analysis can impose two kinds of restrictions
○ Could restrict the number of factors
○ Constrain the factors to be uncorrelated with an orthogonal rotation
Confirmatory factor analysis can restrict factor loadings (or factor correlations of variance) to take certain values
○ A common value: zero
○ If factor loading was set to zero, the hypothesis is that the observed variable score was not due to the factor
Moreover,
○ Using maximum likelihood and generalised least squares estimation, CFA has a test of fit
○ So, it’s possible to test the hypothesis that the factor loading is zero
○ If the data fit the model, hypothesis is supported
○ Hence name confirmatory factor analysis
-CFA provides us with a confirmatory analysis of our theory
What are some issues with CFA?
Sample size
○ Wolf et al. (2013) show ‘one size fits all’ rules work poorly in this context
○ Jackson, (2003), provides support for the N:q rule
§ Ratio of cases (N) to parameters being estimated (q)
§ >20:1 recommended
§ <10:1 likely to cause problems
○ Absolute sample size harder to assess
§ N=200 is common but may be too small
§ Barrett (2007) suggests journal editors routinely reject any CFA with N<200
Significance testing
○ Kline (2016) reports diminished emphasis on significance testing because
§ Growing emphasis on testing the whole model rather than individual effects
§ Large-sample requirement means even trivial effects may be statistically significant
§ P-value estimates could change if we used a different method to estimate model parameters
§ Greater general awareness of issues with significance testing
Distributional assumptions
○ The default estimation technique (maximum likelihood) assumes multivariate normality
§ Possible to transform variables to obtain normality
§ Widaman (2012): maximum likelihood estimation appears relatively robust to moderate violations of distributional assumptions
§ Some robust methods of estimations are available
○ CFA generally assumes continuous variables
§ Some programs have special methods for ordered categorical data
Identification
○ Necessary but insufficient requirements for identification
§ Model degrees of freedom must be greater than or equal to 0
§ All latent variables must be assigned a scale
§ Estimation is based on solving of a number of complex equations
○ Constraints need to be placed on the model (not the data) in order for these equations to be solved unambiguously
Model is identified if it’s theoretically possible for a unique estimate of every model parameter to be derived
Explain the differences between underidentified, just-identified, and overidentified - which is ideal?
Underidentified:
-Not possible to uniquely estimate all the model’s free parameters (usually because there are more free parameters than observations). Need to respecify your model
Just-identified:
-Identified and has the same number of observations as free parameters (model df = 0). There is a single solution, and the model will reproduce your data exactly, so it won’t test your theory
Over-identified:
Identified and has more observations than free parameters (df ≥ 1). Permits discrepancies between model and data, so it permits a test of model fit, and of theory. This is the ideal
What are some methods for estimation of a CFA model?
○ As for EFA, the most commonly used are
§ Unweighted least squares
§ Generalised least squares, and
§ Maximum likelihood
○ ML is often preferred, but assumes normality
○ Some more exotic methods for handling special types of data are available (but not taught in this course)
○ If you’re picking between two methods and they yield substantially different results, report both
What are the global fit statistics for a CFA model and their cut-offs?
Chi-square:
-If chi-square is significant, that could just be because you have a large sample size, so it doesn’t necessarily mean the model is bad; if it isn’t significant, that could be because of a small sample size (low power), so take it with a grain of salt
-Chi-square is 0 with perfect model fit and increases as model misspecification increases
-p = 1 with perfect model fit and decreases as model misspecification increases
Standardised Root Mean Square Residual (SRMSR):
-should be less than .08
-Transforms the sample and model-estimated covariance matrices into correlation matrices
Comparative Fit Index (CFI):
-should be above .95
-Compares your model with a baseline model - typically the independence (null) model
Tucker-Lewis Index (TLI)
-Also known as the non-normed fit index (NNFI)
-Relatively harsher on complex models than the CFI
-Unlike the CFI, it isn’t normed to 0-1
-Highly correlated with CFI so don’t report both
Root Mean Square Error of Approximation (RMSEA):
-Less than .05 indicates good fit; over .10 is unacceptable
-Acts to ‘reward’ models analysed with larger samples, and models with more degrees of freedom
Discuss the components of CFA, Path Analysis, and ‘full’ SEM
CFA:
-not a structural model
-is a measurement model
-has latent variables
-has observed variables
Path analysis
-Is a structural model
-Is not a measurement model
-Does not have latent variables
-Has observed variables
‘full’ SEM:
-Is a structural model
-is also a measurement model
-has latent variables
-has observed variables
When will correlation and regression be the same?
The regression slope and the correlation will only be the same if the standard deviation of x equals the standard deviation of y (since the slope is b = r × sy/sx, the ratio sy/sx is then 1), eg when both variables are standardised
What are Path models?
Path models are expressed as diagrams
The drawing convention is the same as in confirmatory factor analysis
○ Observed variables are drawn as rectangles
○ Unobserved variables as circles/ellipses
○ Relations are expressed as arrows
§ Straight, single headed arrows are used to indicate causal or predictive relationships
§ Curved, double-headed arrows are used to represent a non-directional relationship such as correlation or covariance
What are two types of path models?
○ Recursive
§ Simpler
§ Unidirectional
§ The residual error terms are independent
§ Such models can be tested with a standard multiple regression
○ Non-recursive
§ Can have
□ Bidirectional paths
□ Correlated errors
□ Feedback loops
§ Such models need structural equation software to fit them
How can you model data for a Path analysis?
○ Can be done via
§ Multiple regression
§ Structural Equation Modelling
Compare regressions and SEM
Regression weights agree perfectly, but
Standard errors differ
Standardised regression weights differ
The squared multiple correlation is rather less in SEM
And we did get a warning regarding the uncorrelated predictors
Multiple regression must model the correlations among the independent variables, although this is not shown
○ A path analytic representation is thus a much more accurate representation
§ And gives more information
Describe McCallum and Austin’s (2000) idea behind assessing model fit
They essentially say that no model is ever going to be perfect, so the best you can ask for is a parsimonious, substantively meaningful model that fits the observed data adequately well. But at the same time you also need to realise that there will be other models that do just as good of a job. So finding a good fit does not imply that a model is correct or true, but plausible.
What do model test statistics seek to find?
§ ‘Is the variance-covariance matrix implied by your model sufficiently close to your observed variance-covariance matrix that the difference could plausibly be due to sampling error?’
What are approximate fit indices?
○ Approximate fit indices ignore the issue of sampling error and take different perspectives on providing a continuous measure of model-data correspondence
○ Three main flavours available under the ML estimation
§ Absolute fit indices
□ Proportion of the observed variance-covariance matrix explained by the model
□ Eg SRMR
§ Comparative fit indices
□ Relative improvement in fit compared to a baseline
□ Eg CFI
§ Parsimony-adjusted indices
□ Compare model to observed data but penalise models with greater complexity
□ Eg RMSEA
What are some limitations of global fit statistics?
○ Kline (2016) six main limitations of global fit statistics
§ They only test the average/overall fit of a model
§ Each statistic reflects only a specific aspect of fit
§ They don’t relate clearly to the degree/type of model misspecification
§ Well-fitting models do not necessarily have high explanatory power
§ They cannot indicate whether results are theoretically meaningful
§ Fit statistics say little about person-level fit
Explain tests for local fit and why they are used
Growing recent acknowledgement that good global fit statistics can hide problems with local fit, ie poor fit in specific parts of your model
Various methods of testing local fit, some quite complex, described in Thoemmes et al. (2018)
Some simpler methods are also possible
○ Examining Modification Indices
○ Examining Residual Covariances
When examining the residual covariances as a test for local fit, what matrices can we look at, and which one do we want to use?
-Sample covariances: our input variance-covariance matrix
-Implied covariances: the model-implied variance-covariance matrix
-Residual covariances: differences between sample and implied covariances
-Standardised residual covariances: ratios of covariance residuals over their standard errors. This is the one we want to use
What happens when assumptions are not met?
Model can be incorrectly rejected as not fitting
Standard errors will be smaller than they really are (ie parameters may seem significant when they are not)
Solve these problems through bootstrapping
○ To assess overall fit: Bollen-Stine test
○ To obtain accurate standard errors: naive bootstrap
Explain bollen-stine bootstrapping
The parent sample gets transformed so that its covariance matrix fits the model perfectly (the chi-square for the transformed data would be 0). Bootstrapped samples drawn from it will fit pretty well because their parent sample has perfect fit, but they won’t fit exactly, so a model is said to have good fit if it performs better than at least 5% of the bootstrapped samples
Explain naive bootstrapping
Take new samples from the observed dataset
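A bare-bones sketch of naive bootstrapping, here just for the standard error of a mean rather than a full SEM; the data are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(50, 10, 120)   # stand-in for the observed dataset

# Resample cases with replacement and recompute the statistic each time
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(2000)
])
print(boot_means.std())   # bootstrap estimate of the standard error of the mean
```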
Explain linear vs non-linear models
Linear models
○ Changes in x produce the same changes in y regardless of the value of x
○ Eg
§ If someone’s height increases from 100 to 110, we predict an increase in weight from 31 to 37.6 (+6.6) kg
§ If their height increases from 200 to 210, we would predict a weight increase from 97 to 103.6kg (+6.6kg)
Non-linear models
○ Changes in x produce change in y that depends on the value of x
There are many cases where linear models are inappropriate
○ Not everything increases or decreases without bounds
§ Sometimes we have a lower bound of zero
§ Sometimes we might have an upper bound of some kind
○ Not everything changes by the same amount every time
§ Negatively accelerated functions: learning over time, forgetting over time, increase in muscle mass with training etc
§ Positively accelerated functions (eg exponential growth): spread of infections, population growth etc
What is logistic regression?
Regression on binary outcomes
○ What sorts of things have two outcomes?
§ Predicting whether someone is alive or dead
§ Predicting whether or not a student is a member of a group
§ Predicting a participant’s two choice data
□ Accuracy! There are many cases where responses are scored either correct or incorrect
□ Yes vs no responses
□ Category A vs Category B (categorisation)
□ Recognise vs not-recognise (recognition memory)
-Instead of predicting Y=0 or 1 directly, we model the probability of Y=1 occurring; this is a continuous quantity ranging between 0 and 1
-Specifically, we model the log odds of obtaining Y=1
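A small logistic regression sketch with statsmodels; the predictor ('hours') and the data-generating values are invented, and the fitted coefficients are on the log-odds scale.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
hours = rng.uniform(0, 10, 300)                       # invented predictor
p_correct = 1 / (1 + np.exp(-(-2 + 0.6 * hours)))     # true logistic relationship
correct = rng.binomial(1, p_correct)                  # binary outcome (0/1)

df = pd.DataFrame({"hours": hours, "correct": correct})
fit = smf.logit("correct ~ hours", df).fit()
print(fit.params)          # coefficients on the log-odds scale
print(np.exp(fit.params))  # exponentiating gives odds ratios
```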
What are log odds?
-We predict the logarithm of the odds as a regression
Difference between log odds and regular odds
Odds:
-P(Y=1)/P(Y=0)
-Suppose P(Y=1) = 0.8, then P(Y=0) = 0.2
Odds = 0.8/0.2 = 4
If odds >1 then Y=1 is a more probable outcome than Y=0. If the odds = 1 then it’s 50/50. If odds < 1, then Y=0 is more probable than y=1.
Bounded below at 0 - odds can only be positive
Log Odds:
Log P(Y=1)/P(Y=0)
Log odds are unbounded
If log odds>0 then odds > 1 so Y=1 is more probable
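A quick numeric check of the odds and log odds from the example above.

```python
import numpy as np

p1 = 0.8                      # P(Y = 1)
odds = p1 / (1 - p1)          # 0.8 / 0.2 = 4
log_odds = np.log(odds)       # about 1.386; unbounded, and > 0 so Y = 1 is more probable
print(odds, log_odds)
```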
What is the generalised linear model?
Has the same form as the linear model, but the left-hand side is now written as a function of Y: f(Y) = a + b1x1 + b2x2 + … + bnxn + e (linear model form).
This function is called the link function, and is sometimes written as μ (mu).
Choosing an appropriate link function allows linear techniques to be employed even when the data are not linear.
What are some links for the generalised linear model?
Identity link:
-μ = Y
-This gives the linear model
Logistic link:
-μ = log[P(Y=1)/P(Y=0)]
-Used for binary variables
-Gives logistic regression model
Logarithm link:
-μ = log(Y)
-used for counts or frequencies
-gives loglinear model
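As a sketch of the logarithm link in practice, here is a Poisson GLM for count data with statsmodels; the predictor, the counts, and the data-generating values are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
exposure_hours = rng.uniform(1, 20, 200)                   # invented predictor
counts = rng.poisson(np.exp(0.1 + 0.12 * exposure_hours))  # counts generated on the log scale

df = pd.DataFrame({"hours": exposure_hours, "count": counts})
fit = smf.glm("count ~ hours", df, family=sm.families.Poisson()).fit()
print(fit.params)   # coefficients are on the log (link) scale
```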