Statistics And Data Analysis 21061 Flashcards
What is the difference between categorical level data and continuous data
Week 1
Categorical data is nominal only (names/labels, e.g. gender), whereas continuous data can be placed on a continuous scale
What two descriptive statistics do we typically use
Week 1
Central tendency & spread
What is the difference between how independent variables and dependent variables are measured
Week 1
The IV is ALWAYS measured on a categorical scale
The DV is IDEALLY measured on a discrete/continuous scale
What is the benefit of measuring the DV on a continuous scale
Week 1
So that we can use parametric statistics
What is the difference between a true-experimental vs a quasi-experimental design
Week 1
We actively manipulate the IVs in a true experimental design whereas the IVs in a quasi experimental design reflect fixed characteristics
Is handedness a quasi or true experimental IV
Week 1
Quasi - it is a fixed characteristic
What are the 3 main types of subject design
Week 1
Between subjects, within subjects, mixed design
What is a (2 x 3) mixed design
Week 1
Has two IVs, one between, one within.
Between IV has two levels, within IV has 3 levels
(e.g. males' and females' preferences for horror, action and romance movies)
What does normally distributed data allow us to do
Week 1
Use parametric stats
What are the properties of normally distributed data
Week 1
Symmetrical about the mean
Bell shaped - mesokurtic
What is Platykurtic data
Week 1
Data which has more variation/spread than normally distributed data
(-ve kurtosis value)
What is leptokurtic data
Week 1
Data which has less variation/spread than normally distributed data (+ve kurtosis value)
What type of skew does normal data have
Week 1
normally distributed data has no skew
What is sampling error
Week 1
degree to which sample statistics differ from underlying population parameters
What are Z scores
Week 1
converted scores from normally distributed populations
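A quick sketch in Python (not part of the course materials; the numbers are made up) of how a raw score from a normally distributed population is converted to a z-score:

```python
# Hypothetical values: a population with mean 100 and SD 15 (e.g. IQ scores).
mu, sigma = 100, 15
x = 130                  # an individual's raw score
z = (x - mu) / sigma     # number of standard deviations from the mean
print(z)                 # 2.0: two SDs above the mean
```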
What is sampling distribution
Week 1
Distribution of a stat across an infinite number of samples
What is the sampling distribution of the mean
Week 1
Distribution of all possible sample means.
What are standard error (SE) and estimated standard error (ESE)
Week 1
Standard deviation of sampling distribution
ESE is simply an estimate of the standard error based on our sample
What do we use sample statistics for
Week 1
to estimate the population parameters
What is a T-test
Week 2
An inferential statistic used when we have 1 IV (with 2 levels) and 1 DV; it estimates whether the population means under the 2 IV levels are different
What contributes to variance between IV levels in an independent t-test
Week 2
- manipulation of IV (treatment effects)
- individual differences
- experimental error
* random error
* constant error
what contributes to variance within IV levels in an independent t-test
week 2
individual differences
random experimental error
What would happen if we continued to determine the mean of the difference for infinite samples
Week 2
it would essentially be like calculating the population mean difference
What is the null hypothesis when talking about sampling distribution of differences
Week 2
the sampling distribution of differences will have a mean of 0, as under the null hypothesis there is no difference between the population means under the two IV levels
Why do we use estimated standard error instead of standard deviation in T-distribution
Week 2
Because it is a sampling distribution, we use standard error rather than standard deviation: the standard error expresses the extent to which an individual sample mean difference deviates from 0.
As we do not have all possible samples with which to calculate the standard error exactly, we estimate it from our sample, hence the estimated standard error (ESE)
What is the equation for t in an independent design
Week 2
Xd/ESEd
AKA
Mean of the difference / estimated standard error of the difference
AKA
variance between IV levels/variance within IV levels
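As an illustration only (made-up scores, Python rather than SPSS), the independent-design t value computed from the formula above:

```python
import numpy as np

# Hypothetical scores under the two IV levels
level1 = np.array([5., 7., 6., 8., 9.])
level2 = np.array([3., 4., 5., 4., 6.])
n1, n2 = len(level1), len(level2)

# Pooled variance, then the estimated standard error of the difference
pooled_var = ((n1 - 1) * level1.var(ddof=1) + (n2 - 1) * level2.var(ddof=1)) / (n1 + n2 - 2)
ese = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t = mean of the difference / estimated standard error of the difference
t = (level1.mean() - level2.mean()) / ese
```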
What does the distance to 0 of the t value indicate?
Week 2
If t value is closer to 0, smaller variance between IV levels relative to within
If t value is further from 0 , large variance between IV levels relative to within IV levels
What does it mean if the null hypothesis is true for t-dist
Think CI
week 2
If the null hypothesis is true - 95% of sampled t-values will fall within the 95% bounds of the t-dist
If the null hypothesis is true, only 5% of sampled t-values will fall outside the 95% bounds
What are degrees of freedom and how are they calculated
Week 2
the difference between the number of measurements made (sample size) and the number of parameters estimated (usually one, the mean)
(Sample size - # of parameters)
N-2 for independent t-test
n-1 for paired t-test
What happens to the critical t-value as the degrees of freedom get larger
Week 2
It tends towards 1.96, the critical z-value for a normal distribution
What are some of the assumptions we make for an independent t-test
Week 2
- Normality: the DV should be normally distributed under each level of the IV
- Homogeneity of variance: The variance in the DV, under each level of the IV should be reasonably equivalent
- Equivalent sample size: sample size under each level of the IV should be roughly equal (matters more with smaller samples)
- Independence of observations: scores under each level of the IV should be independent
What test do we use when the assumptions for the independent t-test are violated
Week 2
We use the non-parametric equivalent: the Mann-Whitney U test
What is Levenes test
Week 2
A test for equality of variance (i.e. homogeneity of variance)
what does levenes test tell us and what does it not tell us
Week 2
Tells us: whether there is a difference in variances under the IV levels
Doesn’t tell us: whether our means are different, or anything about the IV manipulation
What is the null hypothesis of levenes test
Week 2
no diff between the variance under each level of the IV (i.e homogeneity in variance)
If we reject Levene’s test, what does this mean
Week 2
There is heterogeneity of variance: the way in which the data vary under the two IV levels is different
What assumptions do we want when it comes to variance between IV levels?
Week 2
Equal variance across IV levels, i.e. homogeneity of variance
What contributes to variance between IV levels in a paired t test
Week 2
- Manipulation of IV (treatment effects)
- Experimental error
what contributes to variance within IV levels in a paired t test
Experimental error
(In repeated-measures designs we can discount the variance due to individual differences, leaving only variance due to error)
What assumptions do we make during a paired t-test
- Normality: the distribution of difference scores between the IV levels should be approximately normal (assume OK if n > 30)
- Sample size: sample size under each IV level should be roughly equal
What do we do when our assumptions are violated during a paired t-test
Week 2
We use the non-parametric equivalent - Wilcoxon test
How do we interpret 95% Confidence intervals for repeated measure designs
Week 2
We can’t determine whether a result is likely to be significant by looking at a 95% CI plot, so we need to look at the influence of the IV in terms of the size and consistency of the effect
For a repeated measures design, what would happen if the confidence intervals cross 0 (lower value is negative and higher value is positive)
Week 2
you cannot reject the null hypothesis as you cannot conclude that the true population mean difference is different from 0
What is Cohen’s D
Week 2
The magnitude of difference between two IV level means, expressed in s.d units
I.e - a standardised value expressing the diff between the IV level means
What are the values for effect size of Cohen’s d
week 2
Effect size d
Small 0.2
Medium 0.5
Large 0.8
How does cohen’s d differ from T? Define both.
week 2
D = magnitude of difference between two IV level means, expressed in s.d units
T = magnitude of diff between two IV level means, expressed in ESE units
T takes sample size into account - qualifies the size of the effect in the context of the sample size .
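A sketch (made-up scores, for illustration) showing the two definitions side by side: d divides the mean difference by the pooled standard deviation, t divides it by the estimated standard error, which shrinks as n grows:

```python
import numpy as np

# Hypothetical scores under two IV levels
a = np.array([5., 7., 6., 8., 9.])
b = np.array([3., 4., 5., 4., 6.])
n1, n2 = len(a), len(b)

pooled_sd = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2))
d = (a.mean() - b.mean()) / pooled_sd                            # in SD units
t = (a.mean() - b.mean()) / (pooled_sd * np.sqrt(1/n1 + 1/n2))   # in ESE units
```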
When do we use a One way anova
Week 3
When we have 1 IV with more than 2 levels
What does a one way anova do?
Week 3
Estimates whether the population means under the different levels of the IV are different
What is an ANOVA like (think of t-tests)
Week 3
an extension of the t-test: if you conducted a one-way ANOVA on an IV with 2 levels, you'd obtain the same result (F = t^2)
Why do we use ANOVA instead of running multiple t-tests
Week 3
the more comparisons we run on the same data, the more likely we are to make a Type I error and reject the null hypothesis even when it is true
What is the familywise error rate and what does correcting for it provide
Week 3
Probability that at least one of a ‘family’ of comparisons run on the same data, will result in a type I error
Provides a corrected significance level (a) reducing the probability of making a type I error
How do calculate the familywise error rate ?
Week 3
a’ = 1 - (1- a)^c
where c is the number of comparisons
e.g. for 3 IV levels (3 comparisons: ab, ac, bc)
1 - (1 - 0.05)^3 = .143 = 14% chance of Type I error
for 4 IV levels (6 comparisons: ab, ac, ad, bc, bd, cd)
1 - (1 - 0.05)^6 = .265 = 26% chance of Type I error
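The calculation above is easy to check in code (Python, purely illustrative):

```python
def familywise_alpha(alpha, comparisons):
    """Probability of at least one Type I error across a family of
    comparisons, each run at significance level alpha."""
    return 1 - (1 - alpha) ** comparisons

print(round(familywise_alpha(0.05, 3), 3))  # 0.143 for 3 comparisons
print(round(familywise_alpha(0.05, 6), 3))  # 0.265 for 6 comparisons
```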
Why do we use omnibus tests?
Week 3
To control familywise error rate
What is the null hypothesis of the F ratio/ANOVA?
Week 3
there is no difference between populations means under different levels of IV
H0: μ1 = μ2 = μ3
what is the ratio for the F value.
Week 3
Variance between IV levels/ Variance within IV levels
What does the closeness of the F value to 0 indicate
Week 3
F value close to 0 = small variance between IV levels relative to within IV levels
F Value further from 0 = large variance between IV levels relative to within IV levels
What assumptions do we make for an independent one way ANOVA
Week 3
Same as those for independent T-test
Normality: DV should be normally distributed, under each level of the IV
Homogeneity of variance : Variance in the DV, under each level of the IV, should be (reasonably) equivalent
Equivalent sample size : sample size under each level of the IV should be roughly equal
Independence of observations : scores under each level of the IV should be independent
What do we do when the assumptions of the independent one-way anova aren’t met?
Week 3
We use the non-parametric equivalent, the Kruskal Wallis test
What is the model sum of squares?
Equation
Week 3
Model Sum of Squares (SSM): sum of squared differences between IV level means and grand mean (i.e. between IV level variance)
What is the residual sum of squares?
Week 3
Residual Sum of Squares (SSR): sum of squared differences between individual values and corresponding IV level mean (i.e. within IV level variance)
What is SSt and how is it calculated
Week 3
Sum of squares total
= SSm( Sum of squares model ) + SSr (Sum of squares residual)
What is the mean square value and how is it calculated? What are the two types?
Week 3
MS = SS/df (Sum of squares/ degrees of freedom)
MSm = model Mean square value
MSr = residual mean square value
What do we use mean square values for?
Week 3
To calculate the F statistic
How do we calculate the F statistic
mean square values
Week 3
MSm/MSr
aka
model mean square value / residual mean square value
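Putting the last few cards together, a sketch (made-up sums of squares) of how the F statistic falls out of SS, df and MS:

```python
# Hypothetical values for an independent one-way ANOVA
ss_model, ss_residual = 120.0, 300.0   # between- and within-level sums of squares
k, n_total = 3, 30                     # 3 IV levels, 30 participants in total

df_model = k - 1           # between IV level (model) df
df_residual = n_total - k  # within IV level (residual) df

ms_model = ss_model / df_model         # MS = SS / df
ms_residual = ss_residual / df_residual
f = ms_model / ms_residual             # F = MSm / MSr
print(round(f, 1))                     # 5.4 -> reported as F(2, 27) = 5.4
```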
What do we do when the assumption of homogeneity is violated in an independent 1-way ANOVA
Week 3
We report Welch’s F instead of ANOVA F
What happens to the degrees of freedom when we use Welch’s F?
Week 3
The degrees of freedom are adjusted (to make the test more conservative)
How is the ANOVA F value reported
Week 3
F(dfm,dfr)=F-value, p =p-value
How do we calculate degrees of freedom for an independent 1 way ANOVA
Week 3
find the difference between the number of measurements and the number of parameters estimated
i.e. no. of measurements – no. parameters estimated
How do we calculate df for between IV level (model) variance where N is total sample size and k is number of IV levels
Week 3
K-1
How do we calculate df for within IV level (residual) variance where N is total sample size and k is number of IV levels
Week 3
N-k
What are post hoc tests
Week 3
Secondary analyses used to assess which IV level mean pairs differ
When do we use post-hoc tests
Week 3
only when the F-value is significant
How do we run post-hoc tests?
Week 3
As t-tests, but we include correction for multiple comparisons
what are the 3 type of post-hoc test
Week 3
- Bonferroni
- least significant difference (LSD)
- Tukey honestly significant difference (HSD)
Which post hoc test has a very low Type I error risk, very high type II error risk and is classified as ‘very conservative’
week 3
Bonferroni
Which post-hoc test has a high type I error risk, a low type II error risk and is classified as ‘liberal’
Least significant difference (LSD)
Which post-hoc test has a low type I error risk , a high type II error risk and is classified as ‘reasonably conservative’
week 3
Tukey Honestly significant difference (HSD)
What are the three levels of effect size for partial eta^2 for ANOVA
week 3
0.01 = small
0.06 = medium
0.14 = large
what is effect size measured in for ANOVA
It can be calculated in two ways: Cohen's d and partial eta squared
How do you calculate partial eta squared
week 3
Model sum of squares/ (model sum of squares + residual sum of squares)
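For instance (made-up sums of squares, Python for illustration):

```python
# Hypothetical values
ss_model, ss_residual = 120.0, 300.0
partial_eta_sq = ss_model / (ss_model + ss_residual)
print(round(partial_eta_sq, 3))  # 0.286 -> a large effect by the benchmarks above
```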
In a repeated measures design for a one way ANOVA, what contributes to variance between IV levels
Week 4
- Manipulation of IV (treatment effects)
- Experimental error (random & potentially constant error)
In a repeated measures design for one way ANOVA, what contributes to variance within IV levels
Week 4
Experimental error (random error)
how do we calculate total variance?
Week 4
Model variance (variance between IV levels) + residual variance (variance within IV levels); in independent designs the residual also includes variance due to individual differences
what is the t/F ratio and how do we calculate it?
Week 4
variance between IV levels/ variance within IV levels (excluding variance due to individual diffs WHEN IN RM design)
how is the F ratio calculated in terms of mean square values
Week 4
Mean sum of squares model/ mean sum of squares residual
What are the 3 assumptions made in a repeated measures 1-way ANOVA
Week 4
- Normality: distribution of difference scores under each IV level pair should be normally distributed
- Sphericity (homogeneity of covariance): the variance in difference scores under each IV level pair should be reasonably equivalent (unique to the RM 1-way ANOVA)
- Equivalent sample size: sample size under each level of the IV should be roughly the same
What corrects for the sphericity assumption.
Week 4
The Greenhouse-Geisser correction
What test do we do to check for sphericity and what is its respective value?
Week 4
Mauchly’s test & the W value
What is the null hypothesis of the assumption of sphericity in the repeated measures ANOVA
Week 4
There is no difference between the covariances under each IV level pair (i.e homogeneity)
If p ≤ .05 we reject null hypothesis (i.e heterogeneity)
What do we do if our data seriously violates the assumptions of a repeated measures One-way ANOVA
Week 4
we should use the non-parametric equivalent - Friedman test
If Mauchlys is significant, what do we use in SPSS output
Week 4
The row labelled Greenhouse-Geisser, as sphericity cannot be assumed
If Mauchlys is not significant, which row do we use in SPSS output?
Week 4
The row labelled sphericity assumed
How do we report the F statistic in repeated measures ANOVA
Week 4
F(dfM, dfR) = F-value, p = p-value (Greenhouse-Geisser/sphericity assumed)
How do we calculate the degrees of freedom for RM 1-way anova
for model and residual
week 4
dfM = k - 1 (where k = number of IV levels/parameters)
dfR = dfM x (n-1) (where n = number of participants)
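A worked example of the two formulas above (hypothetical numbers):

```python
k, n = 4, 15                      # hypothetical: 4 IV levels, 15 participants
df_model = k - 1                  # dfM = k - 1
df_residual = df_model * (n - 1)  # dfR = dfM x (n - 1)
print(df_model, df_residual)      # 3 42 -> reported as F(3, 42) = ...
```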
Which post Hoc test do we use for RM 1 way anova
week 4
bonferroni
why is recruitment an advantage of repeated measures designs
Week 4
It needs fewer participants to gain the same number of measurements
How does the model of repeated measures design cause error variance to be reduced and why is this advantageous
Week 4
We remove variance due to individual differences from the error variance, leading to less variance within IV levels
Apart from recruitment and reduction in error variance, what is another advantage of repeated measures designs
There is more power with the same number of participants: it is easier to find a significant difference (and avoid a Type II error)
what are order effects and what effect can they have on repeated measure designs
Week 4
They are the effects of participants completing the same task under different conditions and changing systematically as a result (e.g. becoming habituated to it).
They introduce confounds: error introduced systematically between IV levels
What are the 4 types of order effects
Week 4
- Practice effects
- fatigue
- sensitisation
- carry-over effects
What are practice effects in terms of order effects
Week 4
Participants get better at the task, which inflates their performance in subsequent IV levels
What is fatigue in terms of order effects
Week 4
Participants get bored/tired of engaging, which depresses their performance in subsequent tasks
What is sensitisation in terms of order effects
Week 4
P’s start behaving in a particular way to please or annoy the experimenter due to understanding IV manipulation
What are carry over effects in terms of order effects
Week 4
- effect of taking part in one IV level effects how one acts on subsequent IV levels
What is counterbalancing and how is it used to minimise order effects
Week 4
Counterbalancing varies the order in which participants undergo the IV levels, making the order as random as possible. It does not remove order effects, but spreads their impact across IV levels
What are alternatives for each type of order effect when counterbalancing is not possible (4)
week 4
– Practice - extensive pre-study practice
– Fatigue - short experiments
– Sensitisation - intervals between exposure to IV levels
– Carry-over effects - include a control group
When do we use factorial ANOVAs
Week 5
to test for differences when we have more than one IV with at least 2 levels
What are the 3 broad factorial ANOVA designs
Week 5
- all IVs are between-subjects (independent)
- all IVs are within-subjects (repeated measures)
- a mixture of between-subjects and within-subjects IVs (mixed)
what would a 2 * 2 ANOVA mean
Week 5
2 IVs/factors, each with 2 levels
what would a 2 * 4 ANOVA mean
week 5
2 IVs/factors, one with 2 levels and one with 4 levels
What are the three effects we would be looking for in a 2 * 3 ANOVA design if the primary IV is gender (male, female) and the secondary IV is colour (red, white and blue)
- is there a significant main effect of gender
- is there a significant main effect of colour
- is there a significant interaction between gender and colour?
If we are doing a study to try and see whether there is a difference between how much men and women like chocolate, and we are also looking to see whether the texture of the chocolate (chunks vs tablets) has an effect, what is the primary IV, what is the secondary IV, why are they respectively so, and what do these terms mean?
Week 5
The primary IV is gender, the secondary IV is texture. Gender is the primary IV as it is the main IV we are looking for an effect of. Texture is the secondary IV as we are looking to see whether adding this variable also creates an effect; it is secondary because it is not the focus.
In a between subjects 2 * 3 ANOVA, how many possible conditions are there?
Week 5
6
What is the null hypothesis for factorial ANOVA and how many are there?
Week 5
There is one per IV and one for each possible interaction IV pair.
e.g in 2 * 2 ANOVA , there is a null hypothesis of no difference in means for IV one, one for IV two and one for the interaction between IV one and IV two
What does a significant interaction indicate in ANOVA?
Week 5
that the effect of manipulating one IV depends on the level of the other IV
What is an interaction in terms of ANOVA.
Week 5
The combined effects of multiple IVs/factors on the DV
What are Marginal means used for in ANOVA
Week 5
to determine if there is significant effect for either IV
In an ANOVA line chart, what does it mean if the lines for the IVs are parallel
Week 5
There is no interaction of the two IVs
What does it mean if the marginal mean of one of the IVs is at roughly the same level as the means for both populations
Week 5
there is no main effect
What are the assumptions made in an independent factorial (two way) ANOVA (5)
Week 5
Normality: DV should be normally distributed, under each level of the IV
Homogeneity of variance : Variance in the DV, under each level of the IV, should be (reasonably) equivalent
Levene's test: we DON'T want a significant result (there is no correction)
Equivalent sample size : sample size under each level of the IV should be roughly equal
Independence of observations : scores under each level of the IV should be independent
What is the non-parametric equivalent for the Independent factorial ANOVA
Week 5
There is no non-parametric equivalent for factorial ANOVA
If our data seriously violate these assumptions we can attempt a ‘fix’ or we can simplify the design
How many F statistics do we report in factorial ANOVA
Week 5
one for each IV i.e the main effect for each IV
What is the difference between classical eta squared and partial eta squared
Week 5
Classical eta^2 : proportion of total variance attributable to factor
Partial eta^2: Only takes into account variance from one IV at a time
(Proportion of total variance attributable to the factor, partialling out/excluding variance due to other factors)
when do we use Post Hoc tests
Week 5
If the main effect of at least one of the IVs is significant, we reject the null hypothesis.
Post hoc tests are only relevant when the main effect of an IV is significant and that IV has more than 2 levels
For one-way ANOVA what do we report alongside post hoc results
Week 5
Cohens D
For factorial ANOVA what do we report alongside post hoc results
Week 5
Nothing; we don't report Cohen's d
What are simple effects in terms of interaction effects and how do we check them
Week 5
The effect of an IV at a single level of another IV.
We check them with comparisons of cell means across conditions (i.e. t-tests)
For an IV with a between subjects design, how do we check for simple effects
Week 5
we do independent t-test for each comparison
What is the Bonferroni correction and what calculation does it perform?
Week 5
A correction that divides the required alpha level by the number of comparisons (e.g. for 6 comparisons, .05/6 = .008)
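The same calculation in code (illustrative only):

```python
alpha, comparisons = 0.05, 6           # e.g. 6 pairwise comparisons
corrected_alpha = alpha / comparisons  # Bonferroni-corrected significance level
print(round(corrected_alpha, 3))       # 0.008
```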
How can ANOVAs in general be described?
Week 7
a flexible and powerful technique appropriate for many experimental designs
What questions are necessary to ask before collecting any data and performing an ANOVA
Week 7
*Do I have a clear research question?
*Do I know what analyses I will need to conduct to answer this?
*Will I be able to carry out and interpret the results of these analyses?
*Have I considered and controlled for potential confounds?
*Will I understand the answer I get?
What does our choice of statistical test depend on
Week 1
- Scale of measurement
- Research aim
- Experimental design
- Number of IVs
- Properties of the dependent/outcome variable (normally distributed: parametric)
What do descriptive statistics not allow us to do
Week 1
Make predictions or infer causality
What does a 95% confidence interval mean
Week 1
95% of all sampled means will fall within the 95% bound of the population mean
When writing proportions (such as partial eta squared) what is the correct notation for them?
General
you drop the leading zero and report it to 3dp
What can relationships vary in
Week 8
Form, direction, magnitude/strength
What are the two types of form a relationship can take
Week 8
linear or curvilinear
What are the two directions a relationship can go in
week 8
positive or negative
What is the magnitude/strength of a relationship measured in
Week 8
The R value
What R values are indicative of a perfect positive relationship, a perfect negative relationship and no relationship
Week 8
1, -1 & 0 respectively
At 0, the dots are random and there is no systematic relationship
What are the values for weak, moderate and strong correlation
Week 8
± 0.1 - 0.39 = weak correlation
± 0.4 - 0.69 = moderate correlation
± 0.7 - 0.99 = strong correlation
What is meant by non-linear correlation?
Week 8
The idea that some DVs peak at a certain point of an IV
(e.g. confidence in ability to pass a course: too low = do worse, too high = do worse, at optimum = do best)
what does bivariate linear correlation involve
Week 8
Linear correlation involves measuring relationship between 2 variables measured in a sample
We use sample statistics to estimate population parameters (the whole logic of inferential statistical testing)
What is the null hypothesis when doing a bivariate linear correlation?
Week 8
no relationship between population variables
What parametric assumptions do we have when doing a bivariate linear correlation? (4)
Week 8
- Both variables should be continuous
- Related pairs: each P (or observation) should have a pair of values (one for each axis/IV)
- absence of outliers: outliers skew results, we can usually just remove them
- linearity: points in scatterplot should be best explained w/ a straight line
Apart from the parametric assumptions, what other things are important to consider in regards to Correlation and correlation coefficients
Week 8
They are sensitive to range restrictions, e.g. floor and ceiling effects (floor effect = clustering of scores at the bottom of the scale; ceiling effect = clustering at the top). These can make it hard to see the relationship between variables, as you don't see how far they stretch due to the cap.
There is also debate over Likert scales: with 6-7 points you can get away with parametric tests; with fewer, it is best to use non-parametric tests
What happens if our data seriously violates our parametric assumptions for a correlation coefficient test?
Week 8
We use the non-parametric equivalent: Spearman's rho (or Kendall's tau if fewer than 20 cases)
What does Pearson’s correlation coefficient do, and what does it’s outcome show?
Week 8
- Investigates relationship between 2 quantitative continuous variables
- The resulting correlation coefficient (r) is a measure of the strength of association between the two variables
What is covariance
Week 8
The variance shared between the X and Y variables
How do you calculate Covariance? (we will never have to do this by hand but good practice to know)
The process
Week 8
- For each datapoint, calculate diff from mean of X and difference from mean of Y
- Multiply the differences
- Sum the multiplied differences
- Divide by N-1
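The four steps above, sketched in Python on made-up paired scores (numpy's built-in covariance agrees):

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])   # hypothetical X scores
y = np.array([2., 4., 5., 4., 6.])   # hypothetical Y scores

dx = x - x.mean()                     # differences from the mean of X
dy = y - y.mean()                     # differences from the mean of Y
cov = (dx * dy).sum() / (len(x) - 1)  # multiply, sum, divide by N - 1
```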
What does the correlation coefficient of pearson’s provide us with and what actually is it?
Week 8
a measure of variance shared between our X and Y variables
it is a ratio of covariance (the shared variance) to separate variances
What does the distance of the r value in relation to 0 mean in regression?
Covariance and variances
Week 8
If covariance is large relative to separate variances - r will be further from 0
If covariance is small relative to the separate variances - r will be closer to 0
If the things (variables) tend to go up and down together a lot (large covariance), the correlation (r) will be far from 0, indicating a strong relationship.
If the things don’t move together much (small covariance), the correlation will be closer to 0, indicating a weaker relationship.
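Concretely (made-up data, illustrative Python): r is the covariance rescaled by the two separate standard deviations, which is why large shared movement pushes r away from 0:

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 6.])

cov = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)
r = cov / (x.std(ddof=1) * y.std(ddof=1))  # ratio of covariance to separate SDs
```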
What does R tell us in terms of a scatter graph? - How does the spread of the data points relate to R?
Week 8
how well a straight line fits the data points (i.e strength of correlation → strength is about how tightly your data points fit on the straight line )
If data points cluster closely around the line, r will be further from 0
If data points are scattered some distance from the line, r will be closer to 0
What difference reflects sampling error?
Week 8
The fact that if you took two samples from the same population, you're likely to get two different r values.
If we plotted the sampling distribution of correlation coefficients, what would it look like
Week 8
If we plotted the r values, the majority would cluster around a common point: the true population value.
What would the null hypothesis be for the sampling distribution of correlation coefficients
Week 8
The mean would be 0 thus most R values would cluster close to 0
What is the r-distribution, what does it tell us and what is its mean value?
Week 8
- It is the sampling distribution of r: the extent to which an individual sampled correlation coefficient (r) deviates from 0 can be expressed in standard error units
- we can determine the probability of obtaining an r-value of a given magnitude when the null hypothesis is true (p-value)
- the mean is 0
What is the relationship between the R-value and the population
Week 8
the obtained r-value is a point estimate of the underlying population r-value
When is linear regression used, and what is its purpose?
Week 9
- Similarly to linear correlation, it is used when the relationship between variables x & y can be described with a straight line
- by proposing a model of the relationship between x & y, regression allows us to estimate how much y will change as a result of given change in x
What is the Y variable in linear regression?
Week 9
The variable that is being predicted –> the outcome variable
What is variable X in linear regression and what is special about it
The variable that is being used to predict –> The predictor variable
It can have multiple predictor variables
What is regression used for? (3)
Week 9
- Investigating strength of effect x has on y
- Estimating how much y will change as a result of a given change in x
- Predicting a value of y, based on a known value of x
What assumption is made in regression that is not done in correlation and what does this mean in regards to what evidence can be obtained from regression?
Week 9
Regression assumes that Y (to some extent) is dependent on X, this dependence may or may not reflect causal dependency.
This therefore means regression does not provide direct evidence of causality
Does a significant regression infer causality?
Week 9
No; factors other than our predictor variables may be at play, so we can't infer causality.
What are the 3 stages of performing a linear regression?
Week 9
- analysing the relationship between variables
- proposing a model to explain the relationship
- evaluating the model
What does ‘ analysing the relationship between variables’ mean as a stage during linear regression?
Week 9
Determining the strength & direction of the relationship
What kind of model is being proposed in linear regression and what is expected of this model?
Week 9
a line of best fit where the distance between the line and the individual datapoints is minimised as much as possible
Ideally, for a line of best fit, where should the datapoints be relative to it
Week 9
- half above, half below line
- clustered as close as possible to line (signifies strong relationship)
- distance is minimised as much as possible
What are the 2 properties of a regression line?
Week 9
- The intercept: value of y when x is 0 (typically the baseline) (a value)
- The slope: how much y changes as a result of a 1 Unit increase in x (the gradient) (b value)
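These two properties define the prediction rule y = a + bx; a tiny sketch with hypothetical values:

```python
a = 1.8            # intercept: predicted y when x = 0 (the baseline)
b = 0.8            # slope: change in y per 1-unit increase in x

def predict(x):
    return a + b * x

print(predict(0))  # 1.8, the intercept
```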
When ‘evaluating the model’ , what are we doing and how do we do this
Week 9
Assessing the goodness of fit of our model (best model/line of best fit) vs the simplest model (b=0, comparing data points to the mean of y)
What is the simplest model?
Week 9
- Using the average Y value (mean) to estimate what Y might be
- Assumes no relationship between x and y (b = 0)
What is the ‘best model’? (What is it based on, what functions can it serve?)
Week 9
- based on the relationship between x & y
- uses regression line & line of best fit to determine what a value of Y would be at a particular value of X
- allows for better prediction
When calculating the goodness of fit your model, what is the first thing you do? What does this provide?
Week 9
First check how much variance remains when using the simplest model (the mean of y) to predict y.
This provides the sum of squares total (the difference between each data point and the mean value, squared and summed)
How do you calculate the variance not explained by the regression line and what does this give you?
Calculate the difference between each data point and the point on the line it matches up to (the score that would be predicted), square these differences and then add them together.
This gives you the sum of squares of the residuals
What does more clustering around the regression line indicate for the model?
Week 9
The model is providing a better fit, meaning there is smaller error variance and the model is more accurate (more of the variance is due to the variable in question).
What is the sum of squares total in relation to regression?
Week 9
*the difference between the observed values of y and the mean of y
i.e. the variance in y not explained by the simplest model (b = 0)*
What best matches the description ‘the difference between the observed values of y and those predicted by the regression line
i.e. the variance in y not explained by the regression model’
Week 9
Sum of squares residual
What is reflective of the improvement in prediction using the regression model when compared to the simplest model?
Week 9
The difference between sum of squares total and sum of squares residual, in other words **the model sum of squares**
**SST - SSR = SSM**
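The SST/SSR/SSM relationship can be checked in a few lines of numpy (a sketch with made-up numbers, not course code):

```python
import numpy as np

# Hypothetical data: predict y from x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 63.0, 70.0])

b, a = np.polyfit(x, y, 1)      # line of best fit
y_hat = a + b * x               # values predicted by the regression line

sst = np.sum((y - y.mean()) ** 2)   # simplest model (mean of y): total variance
ssr = np.sum((y - y_hat) ** 2)      # residual: variance the line leaves unexplained
ssm = sst - ssr                     # model sum of squares: the improvement
```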
What does a large model sum of squares (SSM) indicate in regression?
Week 9
a large(er) improvement in the prediction using the regression model over the simplest model
What can we use F-tests (what we call an ANOVA in cases of regression, to avoid confusion) to evaluate, and what is this reported as?
Week 9
the improvement due to the model (SSM) relative to the variance the model does not explain ( SSR)
It is reported as the F-ratio
What does the F ratio do in goodness of fit tests and how do you calculate it?
Week 9
- provides a measure of how much the model has improved the prediction of y, relative to level of inaccuracy of the model
- F = Model mean squares / residual mean squares
What would you expect to see in terms of model mean squares (MSM) and residual means squares (MSR) if the regression model is good at predicting y?
Week 9
the improvement in prediction due to the model (MSM) will be large, while the level of inaccuracy of the model (MSR ) will be small
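A minimal sketch of the F-ratio arithmetic on the same kind of made-up data (model df = number of predictors, residual df = n - predictors - 1):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 63.0, 70.0])
n, p = len(x), 1                    # 5 cases, 1 predictor

b, a = np.polyfit(x, y, 1)
y_hat = a + b * x
sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y - y_hat) ** 2)
ssm = sst - ssr

msm = ssm / p                       # model mean squares
msr = ssr / (n - p - 1)             # residual mean squares
f_ratio = msm / msr                 # large F -> model beats the simplest model
```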
What are the assumptions we make for simple linear regression? (5)
Week 9
- Linearity: x and y must be linearly related
- Absence of outliers
- Normality
- homoscedasticity
- Independence of residuals
How do we check for the assumption of normality in regression models and what would we expect to see (idk if this’ll be on the exam but just know it init)
Week 9
Using a normal P-P plot of regression standardised residual
* Ideally data points will lie in a reasonably straight diagonal line from bottom left to top right - this would suggest no major deviations from normality
How do we check for the assumption of homoscedasticity in regression models and what would we expect to see
Using the scatterplot of regression standardised residuals
* Ideally, residuals will be roughly rectangularly distributed, with most scores concentrated in the centre (0)
What do the values of R, R^2 and adjusted R^2 each tell you about regression in the SPSS output
Week 9
- R - strength of relationship between x and y
- R^2 - proportion of variance explained by the model
- Adjusted R^2 - R^2 adjusted to account for the degrees of freedom (number of participants and number of parameters being estimated)
Why would we use adjusted R^2
Week 9
- If we wanted to use the regression model to generalise the results of our sample to the population, R^2 is too optimistic
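The usual adjustment can be written out directly (a sketch; n, the number of predictors and R^2 are all invented here):

```python
# Hypothetical fit: R^2 from a regression, then the textbook adjustment
n, p = 30, 3            # 30 participants, 3 predictors (made-up numbers)
r_squared = 0.40

# Adjusted R^2 shrinks R^2 to account for degrees of freedom, giving a
# less optimistic estimate when generalising from sample to population
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
```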
What are the key values identifying when evaluating the regression model and what do they mean? (3 values)
Week 9
- a - constant, also the intercept where the line intersects Y
- b - gradient of slope
- beta - slope converted to a standardised score
If there is only one predictor variable, what does this mean for the beta coefficient?
Week 9
Beta coefficient and R are the same value
Why would we use a T-test in a regression model
Week 9
- t-value: equivalent to √F when we only have 1 predictor variable
- i.e. **it does the same job as the F-test when we have just one predictor variable**
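Assuming scipy is available, the t = √F equivalence can be checked directly (hypothetical data; `linregress` is one way to get the t for the single b coefficient):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 63.0, 70.0])

res = stats.linregress(x, y)
t_value = res.slope / res.stderr        # t-test on the single b coefficient

# F from the sums of squares, as in the goodness-of-fit cards
y_hat = res.intercept + res.slope * x
sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y - y_hat) ** 2)
f_ratio = (sst - ssr) / (ssr / (len(x) - 2))

# with one predictor, t**2 equals F, so t = sqrt(F)
```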
What additional info do we have regarding the b value in regression models?
Week 9
The b value has 95% confidence intervals
What else can R^2 be interpreted as
Week 9
the amount of variance in y explained by the model (SSM), relative to the total variance in y (SST)
In what ways can we express R^2
Week 9
as a proportion or as a percentage
What is the fundamental difference between correlation and regression
Week 9
Correlation shows what variance is shared, whereas regression explains the variance by showing that a certain amount of it can be explained by the model
What does multiple regression allow us to do
Week 9
to assess the influence of several predictor variables (e.g. x1, x2, x3 etc…) on the outcome variable (y)
How does multiple regression work (basic description)/what do you need to do in order to conduct it?
Week 9
Need to combine both predictor variables to see the joint effect on the outcome variable
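A sketch of the joint fit with numpy's least squares (all variable names and numbers are invented):

```python
import numpy as np

# Hypothetical: predict exam score (y) from hours studied (x1) and sleep (x2)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([8.0, 7.0, 7.0, 6.0, 5.0, 6.0])
y  = np.array([50.0, 54.0, 59.0, 62.0, 64.0, 70.0])

# Design matrix: column of 1s (intercept a) plus one column per predictor
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution: a, b1, b2 define the plane of best fit
(a, b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ np.array([a, b1, b2])
```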
Why do we have to use a plane of best fit when proposing a model in multiple regression
Week 9
Because you’re looking at 3 things (the outcome variable and predictor variables one and two), the best model sits in 3 dimensions instead of 2, so we look at a plane instead of a line
What are some of the assumptions being made in multiple regression? (4)
- Sufficient sample size
- Linearity - Predictor variables should be linearly related to the outcome variable
- Absence of outliers
- Absence of multicollinearity - ideally, predictor variables will be correlated with the outcome variable but not with one another
What does a violation of the assumption of multicollinearity mean? What is a way to tell if this has been violated?
Week 9
- There is some overlap in the variables you are measuring for (the predictor variables might be one thing in two different terms - e.g., confidence and self-esteem are basically the same)
- Predictor variables which are highly correlated with one another (r = .9 and above) are measuring much the same thing
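A quick way to screen for this is a correlation matrix of the predictors (a sketch with simulated data; `confidence` and `self_esteem` are the invented near-duplicate pair):

```python
import numpy as np

# Simulated predictors: 'confidence' and 'self_esteem' are almost the same variable
rng = np.random.default_rng(0)
confidence = rng.normal(size=100)
self_esteem = confidence + rng.normal(scale=0.1, size=100)   # near-duplicate
age = rng.normal(size=100)                                   # unrelated predictor

r_conf_esteem = np.corrcoef(confidence, self_esteem)[0, 1]
r_conf_age = np.corrcoef(confidence, age)[0, 1]

# rule of thumb from the card: r >= .9 between predictors flags multicollinearity
collinear = abs(r_conf_esteem) >= 0.9
```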
If a multiple regression model is significant, what does this mean
Week 9
- The regression model provides a better fit (explains more variance) than the simplest model
- i.e. at least one of the slopes is not 0 (without specifying which)
What does hierarchical regression involve and what does this allow us to see?
Week 10
Hierarchical regression involves entering predictor variables in a specified order of ‘steps’ based on theoretical grounds.
This allows us to see the relative contribution of each ‘step’ (set of predictor variables) in making the prediction stronger.
Why do we use hierarchical regression
Week 10
- Examine influence of predictor variable(s) on an outcome variable after ‘controlling for’ (i.e partialling out) the influence of other variables
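The R^2-change logic can be sketched with numpy (simulated data; `r_squared` is a helper defined here, not an SPSS function):

```python
import numpy as np

def r_squared(X, y):
    """R^2 for a least-squares fit of y on X (X includes the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
n = 50
self_esteem = rng.normal(size=n)              # Step 1: variable to partial out
optimism = rng.normal(size=n)                 # Step 2: variable of interest
y = self_esteem + 2 * optimism + rng.normal(size=n)

ones = np.ones(n)
r2_step1 = r_squared(np.column_stack([ones, self_esteem]), y)
r2_step2 = r_squared(np.column_stack([ones, self_esteem, optimism]), y)

# F change for adding k = 1 predictor in step 2 (p_full predictors in the full model)
k, p_full = 1, 2
f_change = ((r2_step2 - r2_step1) / k) / ((1 - r2_step2) / (n - p_full - 1))
```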
When doing a hierarchical regression, what is the difference between steps one and two?
Step 1: the variable(s) you want to partial out
Step 2: the variable(s) you want to measure (e.g. optimism)
When looking at hierarchical regression in SPSS, what are we looking at?
Week 11
The row labelled Model 2. Particularly the R square change, F Change and Sig F change values. (Check SPSS, this will make sense)
What does the sig f change column tell us in Hierarchical regression?
Week 11
Whether this predictor variable alone explains a significant proportion of the variance of the outcome variable
What type of non-parametric tests are there and what are their parametric equivalents
Week 11
- Between P’s - Independent T-test → Mann-whitney U Test
- Within P’s - Paired T-test → Wilcoxon test
- Between P’s - 1 way independent ANOVA → Kruskal Wallis test
- Within P’s - 1 way Repeated measures ANOVA → Friedman test
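Assuming scipy is available, each of these non-parametric tests is a one-line call (simulated data just to show the API):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1, g2, g3 = (rng.normal(loc=m, size=20) for m in (0.0, 0.5, 1.0))

u, p_u = stats.mannwhitneyu(g1, g2)             # between-P's: Mann-Whitney U
w, p_w = stats.wilcoxon(g1, g2)                 # within-P's: Wilcoxon (paired)
h, p_h = stats.kruskal(g1, g2, g3)              # between-P's, 3+ levels: Kruskal-Wallis
chi, p_f = stats.friedmanchisquare(g1, g2, g3)  # within-P's, 3+ levels: Friedman
```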
What are the non parametric tests for factorial Designs
Week 11
Factorial designs do not have a non parametric equivalent and either need to have a simplified design or have adjustments made
What is the non parametric equivalent of Pearson’s correlation coefficient when N>20
Week 11
Spearman’s rho
What is the non parametric equivalent of Pearson’s correlation coefficient when N<20?
Week 11
Kendall’s tau
What types of nonparametric test exist for tests of relationships? (2)
Week 11
Spearman’s rho and Kendall’s tau are both non-parametric equivalents of Pearson’s correlation coefficient
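Both are one-liners in scipy (made-up rank-order scores):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2, 1, 4, 3, 6, 5, 8, 7], dtype=float)   # mostly monotone, with swaps

rho, p_rho = stats.spearmanr(x, y)    # Spearman's rho
tau, p_tau = stats.kendalltau(x, y)   # Kendall's tau

# for these scores, tau = 5/7 and rho = 1 - 48/504 (no ties)
```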
What is the non parametric equivalent of partial correlation
week 11
Partial correlation has no non-parametric equivalent
What is the non parametric equivalent for regression
Regression has no non-parametric equivalent
What types of test do we use when analysing categorical data
Week 11
Chi-square (one variable or test of independence)
What type of test is a chi-square test
Week 11
non-parametric
What are the parametric equivalents of One-variable Chi-Square (a.k.a. Goodness of Fit Test) and
Chi-Square Test of Independence (two variables)
Neither of them have parametric equivalents; they are non-parametric only
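Both chi-square tests are available in scipy (hypothetical counts):

```python
import numpy as np
from scipy import stats

# One-variable chi-square (goodness of fit): is a 60/40 split of choices even?
observed = np.array([60, 40])
chi2_gof, p_gof = stats.chisquare(observed)   # expected defaults to equal counts

# Chi-square test of independence: 2x2 table of counts (made-up)
table = np.array([[30, 10],
                  [20, 40]])
chi2_ind, p_ind, dof, expected = stats.chi2_contingency(table)

# for the 60/40 split, chi-square = (10^2/50) + (10^2/50) = 4.0
```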
What is an example of an Omnibus test?
Week 3
An ANOVA (because it controls the familywise error rate)
How do you calculate the number of comparisons for an IV with n levels
Week 3
n x (n-1) / 2
e.g. n = 3
3 x (3-1) / 2 = 6/2 = 3
e.g. n = 6
6 x (6-1) / 2 = 30/2 = 15
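The formula as a tiny function:

```python
def n_comparisons(n_levels: int) -> int:
    """Pairwise comparisons for an IV with n levels: n(n-1)/2."""
    return n_levels * (n_levels - 1) // 2

print(n_comparisons(3))  # → 3
print(n_comparisons(6))  # → 15
```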