statistical analysis Flashcards

1
Q

basic results section format of report

A
  1. restate hypothesis
  2. assumption check
  3. descriptives analysis (including visualisations and tables)
  4. inferential analysis
  5. accept/reject/retain hypothesis with brief interpretation
2
Q

Descriptives

A

= used to compare conditions in relation to a hypothesis
- data is described in terms of point estimates (central tendency), spread, shape and outliers.

3
Q

central tendency types

A

mean = most sensitive measure, as its value is directly affected by every value in the dataset; most commonly used.
median = used when the data contain extreme values. position = (N+1)/2 -> the value at this position in the ordered data
mode = most frequent value; used with categorical data.
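
A minimal sketch of the three measures, using Python's standard `statistics` module. The scores are made up for illustration; 40 is included as an extreme value to show how it pulls the mean but not the median.

```python
import statistics

scores = [2, 3, 3, 5, 7, 9, 40]  # hypothetical data; 40 is an extreme value

mean_score = statistics.mean(scores)      # affected by every value, including 40
median_score = statistics.median(scores)  # middle value of the ordered data
mode_score = statistics.mode(scores)      # most frequent value

# 1-based position of the median in the ordered data: (N + 1) / 2
median_position = (len(scores) + 1) / 2

print(mean_score, median_score, mode_score, median_position)
```

Here the median stays at 5 while the mean is pulled up towards the extreme value.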

4
Q

shape of distribution

A
  • if mean, median and mode are all equal -> symmetrical distribution (e.g. normal distribution)
  • negative skew = tail runs towards the lower values, peaks to the right
  • positive skew = tail runs towards the larger values (peaks to the left)
5
Q

measures of spread

A

= the degree of dispersion or variability of values in a dataset
- range (subtract the lowest value from the highest value)
- interquartile range (distance between the 25th and 75th percentiles)
- variance (average squared deviation from the mean)
- standard deviation (square root of the variance)
- all of these refer to the sample
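
A rough sketch of the four measures using Python's standard `statistics` module, with made-up scores. One assumption to note: `statistics.variance` and `statistics.stdev` divide by N - 1 (the usual sample formulas), and `statistics.quantiles` uses one of several possible quartile conventions, so hand-calculated quartiles may differ slightly.

```python
import statistics

scores = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical sample

value_range = max(scores) - min(scores)         # highest minus lowest
q1, q2, q3 = statistics.quantiles(scores, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1                                   # interquartile range
variance = statistics.variance(scores)          # sample variance (N - 1 denominator)
sd = statistics.stdev(scores)                   # square root of the sample variance

print(value_range, iqr, variance, sd)
```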

6
Q

calculating outliers using IQR method

A

lower quartile - (IQR x 1.5) = lowest value that is not an outlier. Hence, anything below this will be an outlier.
- if lower quartile = 90 and IQR = 10 then 90 - (10 x 1.5) = 75, so any value less than 75 is an outlier. Repeat for the upper quartile.

  • upper quartile = 90 + 10 = 100, then 100 + (10 x 1.5) = 115, so any value greater than 115 is also an outlier.

IQR = length of the box (in a box plot)
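
The worked example above (lower quartile 90, IQR 10) can be sketched in Python as follows. The function names and the list of values are hypothetical, chosen only for illustration.

```python
def iqr_fences(lower_quartile, upper_quartile, multiplier=1.5):
    """Return the lowest and highest values that are NOT outliers."""
    iqr = upper_quartile - lower_quartile
    lower_fence = lower_quartile - multiplier * iqr
    upper_fence = upper_quartile + multiplier * iqr
    return lower_fence, upper_fence


def find_outliers(values, lower_quartile, upper_quartile):
    """Return the values that fall outside the fences."""
    low, high = iqr_fences(lower_quartile, upper_quartile)
    return [v for v in values if v < low or v > high]


low, high = iqr_fences(90, 100)  # 75.0 and 115.0
outliers = find_outliers([70, 80, 92, 95, 99, 114, 120], 90, 100)
print(low, high, outliers)
```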

7
Q

standard error of the mean

A

= an estimate of how far the sample mean is likely to deviate from the population mean.
- is calculated as the SD of the sample / the square root of the number of values (N)
- is represented by SE.
- as the sample size increases, SE decreases, because the uncertainty about the population mean decreases.
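
A small sketch of the SE calculation, assuming the sample SD (N - 1 denominator) is used. The data are made up; the larger sample simply repeats the smaller one four times to show SE shrinking as N grows.

```python
import math
import statistics


def standard_error(sample):
    """SE = SD of the sample / square root of the number of values."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


small_sample = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical scores, N = 8
large_sample = small_sample * 4          # same spread of scores, N = 32

print(standard_error(small_sample), standard_error(large_sample))
```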

8
Q

probability

A

= the likelihood of the occurrence of an event or outcome
p = number of ways the event could arise / number of possible outcomes
- expressed as a decimal between 0 and 1

9
Q

joint probability

A

= the probability of two unrelated (independent) events occurring together
- calculated by multiplying together the probability of each individual event

10
Q

replacement

A

= resetting the number of outcomes to the original value after an event occurs.
e.g. take one card out of a deck of 52 and the deck becomes 51
No replacement would mean you carry on with the remaining 51
Replacement would mean you put the card back to make it 52 again
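
As an illustration, here is a sketch of the probability of drawing two aces in a row from a 52-card deck, with and without replacement. The two-aces scenario and the function name are assumptions added for the example. Without replacement, the second probability is calculated from the reduced deck.

```python
from fractions import Fraction


def p_two_aces(replacement):
    """Probability of drawing an ace twice in a row from a 52-card deck."""
    deck_size, aces = 52, 4
    p_first = Fraction(aces, deck_size)
    if replacement:
        # card goes back: deck is 52 again, still 4 aces
        p_second = Fraction(aces, deck_size)
    else:
        # card stays out: deck is 51, only 3 aces left
        p_second = Fraction(aces - 1, deck_size - 1)
    return p_first * p_second  # multiply the individual probabilities


print(p_two_aces(replacement=True))   # 1/169
print(p_two_aces(replacement=False))  # 1/221
```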

11
Q

null hypothesis significance testing (NHST)

A

= assuming that the null hypothesis is true, what is the probability of obtaining the value that we did, or a larger one? Always test the null hypothesis, even if your own hypothesis is not the null.
- we construct two hypotheses
1. null hypothesis = no difference/relationship
2. alternative hypothesis = difference/relationship

12
Q

level of significance = alpha (α)

A

= the pre-determined level of significance at which we reject the null hypothesis, usually .05. It is the cut-off line and the false positive error rate.
rejection region = portion of a sampling distribution which includes samples with probabilities less than alpha (α).

13
Q

z-score

A

= the number of standard deviations any particular score is away from the mean.
- relies on the assumption that the data is not heavily skewed.

  • to calculate: subtract the mean from your value, and divide by SD of the dataset.

+ -> value is above the mean
- -> value is below the mean
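
The calculation can be sketched in a couple of lines of Python. The mean of 100 and SD of 15 are made-up values for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / sd


print(z_score(130, mean=100, sd=15))  # 2.0  -> two SDs above the mean
print(z_score(85, mean=100, sd=15))   # -1.0 -> one SD below the mean
```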

14
Q

confidence intervals - population mean

A

= range of values that is expected to capture the true value of a parameter (population mean) with a specified degree of confidence.
- they are an estimate
- 95% is most common

upper 95% CI value = mean + (1.96 x standard error)
lower 95% CI value = mean - (1.96 x standard error)
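
A minimal sketch of the two formulas above, assuming the standard error is the sample SD divided by the square root of N. The scores are hypothetical.

```python
import math
import statistics


def ci95(sample):
    """Return (lower, upper) 95% CI values for the mean: mean -/+ 1.96 x SE."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 1.96 * se, mean + 1.96 * se


scores = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical sample, mean = 6
lower, upper = ci95(scores)
print(lower, upper)
```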

15
Q

one sample chi-square test

A

= only interested in whether the observed categorical frequency distribution differs from what would be expected by chance.
- assumes that each participant contributes one observation and there are two or more categorical outcomes.
- the hypothesis has to be testable, so that a significant difference can be tested for.
- observed frequencies = the actual counts per category
- expected frequencies = calculated for each category as (1/k) x n
where k is the number of categories and n is the total number of observations.
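
A sketch of the calculation with made-up counts. The card gives the expected frequency; the statistic itself is assumed here to be the standard chi-square formula, the sum of (observed - expected) squared / expected, and df is taken as the number of categories minus 1.

```python
def one_sample_chi_square(observed):
    """Chi-square against equal expected frequencies in every category."""
    n = sum(observed)   # total number of observations
    k = len(observed)   # number of categories
    expected = n / k    # (1/k) x n, the same for each category
    chi_sq = sum((o - expected) ** 2 / expected for o in observed)
    df = k - 1
    return chi_sq, df, expected


# hypothetical counts for three categories, 100 participants in total
chi_sq, df, expected = one_sample_chi_square([20, 30, 50])
print(round(chi_sq, 2), df, round(expected, 2))
```

The resulting chi-square value would then be compared against the critical value for that df.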

16
Q

degrees of freedom

A

= the number of observations that are free to vary to produce a given outcome (known test statistic)

df = n - 1
n = number of conditions

17
Q

chi squared value and the standard write up

A

= if our chi-squared value is greater than the critical value, then we reject the null hypothesis and accept the alternative one.

  • we found… χ²(df, N) = chi-square value, p-value, effect size
18
Q

effect size

A

= a standardised measure of the difference of interest; it is comparable across experiments that use different numbers of observations (N).
.1 = small effect size
.5 = large effect size
phi = the square root of (the chi-squared value / N)

19
Q

p value

A

= the probability of finding a test statistic as large as, or larger than, the one found in your study, if the null hypothesis were true.

  • If p > α, then fail to reject the null hypothesis
  • If p < α, then reject the null hypothesis and accept the alternative hypothesis
20
Q

cross tabulation chi-square tests

A
  • more than one categorical variable
  • requires independent samples and categorical variables
  • does not explicitly distinguish between IV and DV
  • shows whether the variables are related/have an association.
  • same formula as for one sample.
21
Q

how to calculate expected values in a cross tabulation chi-squared and the df

A

expected value = (row total x column total) / N (grand total)

df = (rows - 1) x (columns - 1)
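
A short sketch of both calculations for a table of observed counts. The 2 x 2 table and the function name are hypothetical.

```python
def expected_counts(observed):
    """Expected value per cell = row total x column total / N, plus the df."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, df


observed = [[10, 20],
            [30, 40]]  # hypothetical 2 x 2 cross tabulation, N = 100
expected, df = expected_counts(observed)
print(expected, df)  # [[12.0, 18.0], [28.0, 42.0]] 1
```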

22
Q

effect size - Cramér’s V

A

= the square root of (the chi-square value / (the total number of observations x the minimum of c - 1 or R - 1))
c = number of columns
R = number of rows
use whichever of c - 1 and R - 1 is smaller
- Cramér’s V and phi are both standardised measures of the size of the relationship/effect
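
Both effect sizes can be sketched in a few lines of Python. The chi-square values and table sizes below are made up; note that when the smaller of c - 1 and R - 1 is 1, Cramér's V gives the same value as phi.

```python
import math


def phi(chi_sq, n):
    """phi = square root of (chi-square value / N)."""
    return math.sqrt(chi_sq / n)


def cramers_v(chi_sq, n, rows, cols):
    """V = square root of (chi-square / (N x min(rows - 1, cols - 1)))."""
    return math.sqrt(chi_sq / (n * min(rows - 1, cols - 1)))


print(phi(14, 100))                       # hypothetical chi-square of 14, N = 100
print(cramers_v(18, 100, rows=3, cols=3)) # hypothetical 3 x 3 table
```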

23
Q

notes on the write up of cross tabulation chi squares test

A
  • do not calculate new expected values for the follow-up test. You must use the expected values from the original test.
  • if there is a significant association, we run further analysis (one sample chi-square tests) based on the comparison you’re interested in.
24
Q

the t-test

A

= an inferential test used to compare differences between two samples/conditions, or between one sample and a criterion.
- based on NHST
- the degrees of freedom relate to the number of participants.

25
Q

the t-value

A

= a ratio between two aspects of data

t = difference between group means/variability about the group means

small difference + large variability = small t
big mean difference + small variability = big t

26
Q

the t distribution

A

= helps us figure out when our t value is large enough to say there is a significant difference
- the more people you test, the more the t-distribution looks like the normal distribution.
- the t value needs to be beyond the cut-off (critical value) to be significant. This cut-off becomes smaller the more participants there are, moving closer to that of the normal distribution.
- differs from the normal distribution, which is based on a whole population; the t distribution is based on your number of participants.

27
Q

two tailed test

A
  • looks to both sides of the distribution.
  • makes no reference to the direction of the difference, just states there is one.
  • most alternative hypotheses are non-directional.
  • “there will be a significant difference in Y if I change X”
28
Q

one tailed test

A
  • specifies the direction of the difference
  • “directional alternative hypothesis”
  • only looking at either the left OR the right of the distribution.
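  • t value is significant when it falls beyond the critical value in the predicted direction (e.g. lower than the critical value when looking at the left of the distribution)
  • can only look in this one direction; can’t look for results in the other direction.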
29
Q

confidence interval

A

= an estimate of the range of values that would contain the mean in a given proportion of samples (e.g. 95%), using the data you have collected, based on probability.
- If the lower and upper CI of a difference don’t cross zero at all, the conditions are going to be significantly different; if they do cross zero, the difference won’t be significant.

30
Q

power

A

= the long-run probability of your experimental design correctly rejecting the null hypothesis when there is a true effect to be found, for a given effect size and sample size.
- is the measure of the ability of a design to find an effect when there is an effect to be found.
- tests with power < 0.8 are underpowered.
- calculated as 1 - beta
- is an estimate

31
Q

beta, alpha and power

A

alpha -> stating that there is an effect when there isn’t (false positive)
beta -> stating that there isn’t an effect when there actually is (false negative)
power = 1 - beta, the reverse of beta: saying there is an effect when there actually is one.

32
Q

using power- APES

A

A= alpha
P = power
E = effect size
S = sample size
- we determine alpha and power in advance, so we can use these to establish E and S

33
Q

power analysis

A
  • helps to determine how many people you should have in your quantitative study.
  • determine what size of effect your study can reliably find
  • use your smallest effect size of interest (SESOI) -> what size of difference would you care about? A smaller effect size requires more participants.
  • α = .01 has lower power than α = .05 for a given sample size and effect size.
  • a small sample size can only reliably find large effect sizes.
34
Q

one sample t test

A

= compares one sample against a known test value.
e.g. did my test condition score significantly better than 50% overall?
- assumptions:
  - interval/ratio data on a continuous scale
  - scores are independent from each other
  - data is approximately normally distributed
- during write up, put the df in the brackets:
t(df) = t value, p < .05, d = effect size
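
A rough sketch of the values needed for the write up. The formulas are assumptions, as the card does not spell them out: t is taken as (sample mean - test value) / standard error, df as N - 1, and d as Cohen's d, (sample mean - test value) / SD. The scores and test value are made up.

```python
import math
import statistics


def one_sample_t(sample, test_value):
    """Return (t, df, d) for a one sample t test against test_value."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    t = (mean - test_value) / (sd / math.sqrt(n))  # difference / standard error
    df = n - 1
    d = (mean - test_value) / sd                   # assumed effect size: Cohen's d
    return t, df, d


scores = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical scores, mean = 6
t, df, d = one_sample_t(scores, test_value=5)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

The p value would still need to come from a look-up table or a statistics package.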

35
Q

within sample t test

A

= compares two conditions where the same participants are in each condition, or where the participants are different but closely matched on a criterion.
- is calculated as the mean of the difference in scores between conditions / (the SD of the difference in scores / the square root of N).
- assumptions:
  - interval/ratio data on a continuous scale
  - the differences between scores are approximately normally distributed
  - scores are NOT independent
- write up is the same as for the one sample t test
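
The calculation described above can be sketched as follows. The before/after scores are made up, and df is assumed to be the number of pairs minus 1.

```python
import math
import statistics


def within_sample_t(condition_a, condition_b):
    """t = mean of the differences / (SD of the differences / sqrt(N))."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)  # number of pairs
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# hypothetical scores from the same five participants in two conditions
after = [12, 15, 10, 14, 14]
before = [10, 12, 9, 11, 13]
t, df = within_sample_t(after, before)
print(f"t({df}) = {t:.2f}")
```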

36
Q

between samples t test

A

= compares two groups or conditions where the participants are different in each group and have not been matched on a criterion.
- two types: Student’s t-test or Welch’s t-test
- all t tests are parametric, meaning they rely on more assumptions about the data.

37
Q

residuals

A

= the difference between your estimate and the actual data point.
normality -> it is the residuals of the data that need to be normal, not necessarily the original data you collected.

38
Q

what to do if your degrees of freedom is not on the look up table

A

go to the next smallest df (this will have a higher critical value)

39
Q

matched pair approach

A

Is where participants in the two conditions are different participants but they have been matched on a variety of demographics so that the only change is the change of interest.

40
Q

what to include in t tests

A
  • the t value
  • df
  • effect size d
41
Q

Student’s t-test

A
  • used with groups with similar variance and sample size
  • sp = pooled SD (specific formula)
  • s2 = variance
  • assumptions:
    interval/ratio data
    continuous scale
    all scores are independent from each other
    data is approximately normally distributed
    homogeneity of variance (equal variance in each group)
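
The card only mentions a "specific formula" for the pooled SD. A sketch using the standard textbook version is given below; treat it as an assumption about which formula is meant. The two groups are made up and have equal variance.

```python
import math
import statistics


def students_t(group_a, group_b):
    """Student's t test with a pooled SD (assumes equal variances)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # s2 for group A
    var_b = statistics.variance(group_b)  # s2 for group B
    # pooled SD: variances weighted by each group's degrees of freedom
    sp = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    t = mean_diff / (sp * math.sqrt(1 / n_a + 1 / n_b))
    df = n_a + n_b - 2
    return t, df, sp


group_a = [5, 7, 9, 11, 13]  # hypothetical scores, mean 9, variance 10
group_b = [2, 4, 6, 8, 10]   # hypothetical scores, mean 6, variance 10
t, df, sp = students_t(group_a, group_b)
print(f"t({df}) = {t:.2f}, pooled SD = {sp:.2f}")
```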
42
Q

Welch’s test

A
  • does not make an assumption about the variance
  • rest of the assumptions are the same as for the Student’s t-test
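
A sketch of Welch's version, which is not spelled out on the card, so the formulas here are the standard textbook ones and should be read as an assumption: each group keeps its own variance, and the df are adjusted with the Welch-Satterthwaite approximation. The data are made up, with clearly unequal variances.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t test: no pooled SD, each group uses its own variance."""
    n_a, n_b = len(group_a), len(group_b)
    va = statistics.variance(group_a) / n_a  # variance of mean A
    vb = statistics.variance(group_b) / n_b  # variance of mean B
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


wide_group = [5, 7, 9, 11, 13]  # hypothetical scores, variance 10
narrow_group = [5, 6, 6, 6, 7]  # hypothetical scores, variance 0.5
t, df = welch_t(wide_group, narrow_group)
print(f"t({df:.2f}) = {t:.2f}")
```

With unequal variances the df come out lower than the n1 + n2 - 2 that Student's version would use, and are usually not a whole number.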
43
Q

polarity of the t-value

A
  • the polarity of a t value (+ or -) tells you which group has the larger mean. It doesn’t change the outcome.
  • it would matter if a one tailed test was being run
  • If A minus B gives a negative t-value then B has the larger mean
  • If A minus B gives a positive t-value then A has the larger mean
44
Q

purpose of a t test

A

to determine if there is a significant difference between the means of two groups and how they are related.