Stats only Flashcards

1
Q

What is important to consider when looking at sample size?

A
  • Size matters
  • Sampling error can result if your sample is not large enough
  • Trade off between size and time/cost
2
Q

Factors in deciding sample size?

A

o Design
o Response rate
o Heterogeneity of population

3
Q

What is a population parameter?

A
  • a quantity that describes some characteristic of a population with respect to a specific variable
  • E.g., population mean, population range etc.
  • Not usually possible to calculate
  • Might be given to you if available
4
Q

What is a sample statistic?

A

– a quantity that describes some characteristic of a sample with respect to a specific variable

  • E.g., sample mean, sample range etc.
  • We can always calculate these from a sample
  • Sample statistics provide an estimate of population parameters
5
Q

Why is it important to summarise data?

A
  • Data can be very complex and therefore it is useful to summarise it
  • Allows for interpretation
6
Q

What are measures of central tendency?

A

They provide an indication of a “typical” score in the data set

7
Q

What is the mean?

A

o Provides an estimate of the average score in the data set

o Is affected by extreme data points

8
Q

What is the median?

A

o Is insensitive to extreme scores in the data set

o Doesn’t reflect the shape of the scores e.g., doesn’t care how far away the extreme scores are

9
Q

What is the mode?

A

o Easy to calculate from a histogram and easy to understand – the most common value
o Data might have more than 1 mode or no mode at all
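The three measures of central tendency above can be compared on one small data set. This is a minimal sketch using Python's standard `statistics` module; the scores are made up for illustration, with one extreme value included to show how the mean is pulled towards it while the median and mode are not.

```python
import statistics

# Hypothetical scores (not from the flashcards); 40 is a deliberately extreme value.
scores = [2, 3, 3, 4, 5, 5, 5, 6, 40]

mean = statistics.mean(scores)        # affected by the extreme score
median = statistics.median(scores)    # middle value, insensitive to the extreme score
modes = statistics.multimode(scores)  # list of most common values (may hold more than one)

print(f"mean = {mean:.2f}, median = {median}, mode(s) = {modes}")
```

Here the mean lands above every score except the extreme one, which is why the median is often preferred when a data set contains outliers.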

10
Q

What is the range?

A

o Difference between min and max scores

o Range doesn’t always change for distributions with different shapes

11
Q

What is a deviation?

A

o The signed distance of a score from the mean

12
Q

How to calculate simple variance?

A
o	Calc mean
o	Calc deviations
o	Square deviations
o	Calc a slightly adjusted average squared deviation
        - You divide by n-1
13
Q

What is the issue with simple variance?

A

A potential issue is the units: if the deviations are in hours, squaring them gives units of hours squared, which is hard to interpret

14
Q

How to calculate sample standard deviation?

A
o	Calc mean
o	Calc deviations
o	Square deviations
o	Calc sample variance
o	Take square root of sample variance – now back in comprehensible units
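The steps above can be followed one at a time in a short script. This is a sketch with made-up data (hours are assumed as the unit purely for illustration); each variable corresponds to one step in the list.

```python
import math

# Hypothetical data in hours (not from the flashcards).
hours = [2.0, 4.0, 4.0, 5.0, 10.0]

n = len(hours)
mean = sum(hours) / n                    # step 1: calculate the mean
deviations = [x - mean for x in hours]   # step 2: signed distance of each score from the mean
squared = [d ** 2 for d in deviations]   # step 3: square the deviations
variance = sum(squared) / (n - 1)        # step 4: sample variance, dividing by n-1 (units: hours squared)
sd = math.sqrt(variance)                 # step 5: square root puts us back in hours

print(f"mean = {mean}, variance = {variance}, sd = {sd}")
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n-1 divisor, so they can be used to check a hand calculation.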
15
Q

What is a histogram?

A
-	Good way to inspect data
o	Can see if there’s any odd-looking scores
o	Can see the mode
o	Can see how spread out the scores are
o	Can see how the data is distributed
16
Q

What is a box plot?

A

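Summarises a distribution with a box spanning the lower and upper quartiles, a line at the median, and whiskers showing the spread of the remaining scores; it can be plotted vertically or horizontally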

17
Q

What is a scatter plot?

A

shows the relationship between variables

18
Q

What is a data summary plot?

A
  • Plot bar showing mean (categorical data) or line graph (numerical data)
  • Plot error bars showing +/- 1 s.d.
19
Q

What is distribution of data?

A

The manner in which data for a particular variable is spread over its range is commonly referred to as its distribution

20
Q

What is normally distributed data?

A
  • Many naturally occurring variables are Normal
  • E.g., height, IQ (not naturally occurring but has been defined as this)
  • If we don’t have much data then the normality can be difficult to see in a histogram
  • As sample size increases, the normality will emerge
21
Q

What is non-normal data?

A
  • Has a tail either to the right or left – skewed data
  • Positively skewed = long tail to the right, peaks at the left
  • Negatively skewed = long tail to left, peaks at the right
  • E.g., reaction time – tends to be positively skewed
22
Q

What is the danger of non-normal data?

A
  • Danger – mean is distorted by the tails which are the more extreme values
23
Q

What is the danger of bimodal data?

A
  • Danger – mean is not representative

- Tends to suggest an issue with your experiment – more than one underlying population

24
Q

What is bimodal data?

A

Data that has two modes

25
Q

What is the normal distribution?

A
  • Bell-shaped
  • Symmetric about the centre
  • Tails never reach 0 – they extend towards infinity in both directions
  • The area under the curve is always equal to 1
  • Very close to 0 by the time it gets to 3 SD from the mean – can use this to draw a rough idea of a normal distribution
26
Q

What is probability?

A

– a measure of how likely it is that an uncertain event will occur

27
Q

What is conditional probability?

A
  • Probability of an event given that something else is known/assumed e.g., A|B
28
Q

What is a z-score?

A
  • Z measures how far away a score x is from the population mean in multiples of the SD
  • If you were to find z-scores for all points on a normal distribution, you would find that it would form a normal distribution with mean 0 and SD 1 – N (0, 1)
  • The area underneath a normal distribution above/below some variable value of x EQUALS the area underneath N (0, 1) above/below z
29
Q

How do you obtain a z-score?

A
  • Obtained by subtracting the population mean from x and then dividing by the population SD – (x-µ)/σ
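As a quick illustration of the formula (x-µ)/σ, here is a minimal sketch. The function name `z_score` and the example values (µ = 100, σ = 15) are assumptions chosen for illustration, not values from the flashcards.

```python
def z_score(x, mu, sigma):
    """Return how many population SDs the score x lies from the population mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical population: mean 100, SD 15.
z_high = z_score(130, mu=100, sigma=15)  # two SDs above the mean
z_low = z_score(85, mu=100, sigma=15)    # one SD below the mean

print(z_high, z_low)
```

A positive z-score means the score is above the population mean and a negative one means it is below.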
30
Q

What is a SND table and how do you use it?

A
  • Table that provides values of areas underneath the SND in different ranges
  • Find z-score (first column) then decide if you want the area above or below this score
  • If z-score is negative, use the positive value in the table but be careful when choosing above or below because the scores will be flipped
    o E.g., z-score = -2 and you want the area below. On table you will use z-score 2 but use the area above
  • If you have a range that is bounded e.g., 70
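The negative z-score rule above can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library in place of a printed SND table; the helper names `area_below` and `area_above` are made up for illustration.

```python
from statistics import NormalDist

snd = NormalDist(mu=0, sigma=1)  # the standard normal distribution N(0, 1)

def area_below(z):
    """Area under N(0, 1) to the left of z."""
    return snd.cdf(z)

def area_above(z):
    """Area under N(0, 1) to the right of z."""
    return 1 - snd.cdf(z)

# The area below z = -2 equals the area above z = +2, by symmetry.
below_neg2 = area_below(-2)
above_pos2 = area_above(2)

print(f"{below_neg2:.4f} {above_pos2:.4f}")
```

Because the curve is symmetric about 0, a table that lists only positive z-scores is enough to cover negative ones as well.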
31
Q

What is a sampling error?

A

Sampling error – the error associated with examining statistics calculated from a sample rather than the population

32
Q

Why do sampling errors occur?

A
  • It occurs because in our sample we don’t have all the members of the population
33
Q

What does the magnitude of a sampling error depend on?

A

The sample size

  • Bigger sample = big sampling error less likely
  • Smaller sample = big sampling error more likely
34
Q

How do we generate a sampling distribution?

A
  • Take a sample (size N) from a population
  • Calculate a sample statistic (e.g., mean, SD etc.)
  • Add the new statistic to a frequency plot (a histogram) of the sample statistic
  • Repeat the above 3 steps multiple times
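The procedure above can be simulated. The following is a rough sketch, assuming a made-up population of the whole numbers 1 to 100 and a hypothetical helper named `sampling_distribution`; the returned list holds the values that would go into the histogram.

```python
import random
import statistics

def sampling_distribution(population, n, repeats, stat=statistics.mean, seed=0):
    """Draw `repeats` random samples of size n and collect the statistic from each."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    values = []
    for _ in range(repeats):
        sample = rng.sample(population, n)  # step 1: take a sample of size n
        values.append(stat(sample))         # steps 2 and 3: calculate and record the statistic
    return values

# Hypothetical population: the whole numbers 1 to 100 (population mean 50.5).
population = list(range(1, 101))
sample_means = sampling_distribution(population, n=10, repeats=2000)

print(f"mean of sample means = {statistics.mean(sample_means):.2f}")
```

Passing `stat=statistics.stdev` instead would build the sampling distribution of the SD rather than of the mean.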
35
Q

What does the sampling distribution tell us?

A
  • Tells us important info about how a statistic changes from sample to sample
  • What is the mean value of the statistic over all samples?
  • How variable is the statistic over all samples?
  • What shape is the distribution of the statistic over all samples?
36
Q

What are the properties of the sampling distribution of the mean (SDM)?

A
  • Mean which is the same as the parent population
  • SD is different to that of the parent population: it equals σ/√N, where σ is the SD of the parent population and N is the sample size
  • This SD is called the standard error of the mean (s.e.m.) or standard error (s.e.)
  • S.e.m. must be smaller than the SD of the parent population (for N > 1) because you are dividing by something that is bigger than one
37
Q

What is a parent population a distribution of?

A

Parent population is a distribution of individual scores x (e.g., from an individual person or thing)

38
Q

What is SDM a distribution of?

A

SDM is a distribution of sample means for samples of size N drawn at random from the parent population

39
Q

What is central limit theorem?

A
  • Given a population with mean µ and SD σ, the sampling distribution of the mean approaches a normal distribution with mean µ and SD σ/√N as N increases
  • This is true regardless of the underlying distribution – so even if your population is not normal, the distribution of means sampled from it will be approximately normal for large enough N
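A small simulation can make the theorem concrete. This sketch assumes a positively skewed parent population (an exponential distribution with mean 1 and SD 1, chosen only for illustration) and checks that the sample means centre on µ with SD close to σ/√N.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable
N = 30                   # size of each sample
repeats = 5000           # number of samples drawn

# Parent population (assumed): exponential with rate 1, so mu = 1 and sigma = 1.
# It is strongly positively skewed, i.e. not normal.
means = []
for _ in range(repeats):
    sample = [rng.expovariate(1.0) for _ in range(N)]
    means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
predicted_sem = 1.0 / math.sqrt(N)  # sigma / sqrt(N)

print(f"mean of means = {mean_of_means:.3f}")
print(f"sd of means = {sd_of_means:.3f}, predicted s.e.m. = {predicted_sem:.3f}")
```

Plotting `means` as a histogram should show a roughly bell-shaped curve even though the individual scores are heavily skewed.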
40
Q

How do you find a z-score for a SDM?

A

z-score = (m-µ)/(σ/√N), where m is the sample mean

41
Q

What is a point estimate?

A

– a single value estimate of a population parameter e.g., sample mean

42
Q

What is an interval estimate?

A

– a range of possible values of a population parameter e.g., confidence interval

43
Q

What is a confidence interval?

A

– describes an interval (e.g., a range) of values for our population parameter, together with a specified level of confidence that the parameter is in that range

44
Q

For a sample drawn at random from a normal population N (µ, σ) with known s.d. σ, the 95% CI for the population mean is centred on the sample mean m and goes from?

A

m – (1.96 x σ/√N) to m + (1.96 x σ/√N)
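The interval above can be computed directly. This is a minimal sketch, assuming hypothetical values for m, σ and N; the function name `ci_known_sigma` is made up, and the multiplier is taken from `statistics.NormalDist` rather than typed in as 1.96.

```python
import math
from statistics import NormalDist

def ci_known_sigma(m, sigma, n, level=0.95):
    """Confidence interval for the population mean when the population SD is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    half_width = z * sigma / math.sqrt(n)
    return m - half_width, m + half_width

# Hypothetical values: sample mean 70.7, known population SD 10, sample size 25.
low, high = ci_known_sigma(m=70.7, sigma=10, n=25)

print(f"95% CI: {low:.2f} to {high:.2f}")
```

A larger N shrinks σ/√N and therefore narrows the interval around the sample mean.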

45
Q

What does a 95% confidence interval mean?

A

A 95% confidence level means that if we repeated our sampling many times and worked out a new CI each time centred on our new sample mean we would expect the population mean to be in the interval on 95% of those repeats

46
Q

True or False, if centred on sample mean, there is a 95% chance that the population mean is also in the range and vice versa (if looking for a 95% confidence interval)?

47
Q

True or False, if centred on sample mean, there is a 5% chance that the population mean falls outside of this range and vice versa (for a 95% confidence interval)?

48
Q

What are the steps for null hypothesis testing?

A
- Formulate research hypothesis  	
o	Null hypothesis (H0)
o	Research hypothesis (H1)
- Collect data
- Evaluate inconsistency between H0 and the data
o	How inconsistent are the data with H0?
- Reject or fail to reject H0?
- Interpret in context
49
Q

True or false, If we were able to reject the null (H0) in favour of the research hypothesis (H1) then we can claim to have evidence for the research hypothesis?

50
Q

True or false, If we fail to reject the null (H0) then we can claim to have evidence for the null hypothesis?

51
Q

What do values of p > α suggest?

A

The data are not inconsistent with H0: fail to reject the null

53
Q

What do values of p < α suggest?

A

The data are inconsistent with H0: reject the null

54
Q

What is the value of α in stats?

55
Q

What is the p-value?

A

p-value = the conditional probability of obtaining a sample statistic at least as extreme as yours, given that H0 is true

56
Q

How do you conduct a z-test?

A
  • Use NHST framework
  • Calculate inconsistency with mean by calculating the z-score, use the table to find the associated p-value and compare this to 0.05 to decide whether to reject or fail to reject the null hypothesis
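The z-test procedure can be sketched as follows. The function name `z_test`, its `tail` argument, and the example numbers are all assumptions for illustration; `statistics.NormalDist` stands in for the SND table.

```python
import math
from statistics import NormalDist

def z_test(m, mu, sigma, n, tail="right", alpha=0.05):
    """Return (z, p, reject) for a sample mean m tested against population mean mu."""
    z = (m - mu) / (sigma / math.sqrt(n))  # z-score on the sampling distribution of the mean
    below = NormalDist().cdf(z)            # area under N(0, 1) below z
    if tail == "right":
        p = 1 - below
    elif tail == "left":
        p = below
    elif tail == "two":
        p = 2 * min(below, 1 - below)
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return z, p, p < alpha

# Hypothetical values: sample mean 104 from N = 36, population mean 100, population SD 15.
z, p, reject = z_test(m=104, mu=100, sigma=15, n=36, tail="right")

print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

With these made-up numbers p is just above 0.05, so the decision is to fail to reject the null hypothesis.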
57
Q

When is a z-test used?

A
  • To check if a sample mean that has been obtained is different from some population mean
58
Q

What is a 1 tailed hypothesis that is right hand tailed?

A
  • Something is better than the population
  • H1: sample mean > population mean
  • Looking for p-value above score
59
Q

What is a 1 tailed hypothesis that is left hand tailed?

A
  • Something is worse than the population
  • H1: sample mean < population mean
  • Looking for p-value below score
60
Q

What is a two tailed hypothesis?

A
  • Something is different than the population
  • H1: sample mean ≠ population mean
  • Looking for p-value both above and below the score: take the sample mean and also find the value the same distance from the population mean on the other side. E.g., population mean = 67.5, sample mean = 70.7, the difference is 3.2, so the other value to consider is 64.3 (the two z-scores have the same size but opposite signs)
  • Conditional probability = 2 x one-tailed p-value
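The worked example above (population mean 67.5, sample mean 70.7) can be checked numerically. The flashcard does not give σ or N, so the values σ = 8 and N = 25 below are assumptions made purely so that the arithmetic can run.

```python
import math
from statistics import NormalDist

mu = 67.5   # population mean (from the example)
m = 70.7    # sample mean (from the example)
sigma = 8   # ASSUMED population SD, not given in the flashcard
n = 25      # ASSUMED sample size, not given in the flashcard

sem = sigma / math.sqrt(n)
mirrored = mu - (m - mu)  # value the same distance below the population mean (about 64.3)

z_sample = (m - mu) / sem
z_mirrored = (mirrored - mu) / sem  # same size as z_sample, opposite sign

one_tail_p = 1 - NormalDist().cdf(abs(z_sample))
two_tail_p = 2 * one_tail_p

print(f"mirrored value = {mirrored:.1f}")
print(f"z = {z_sample:.2f} and {z_mirrored:.2f}, two-tailed p = {two_tail_p:.4f}")
```

Doubling the one-tailed area accounts for equally extreme results on both sides of the population mean.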
61
Q

When can you formulate a 1 tailed hypothesis?

A
  • There is previous research

- You can predict the effect

62
Q

What is a type I error? Why does it occur?

A
  • Rejecting the null hypothesis when it was correct – occurs due to sampling error
63
Q

What is a type II error? Why does it occur?

A
  • Failing to reject the null hypothesis when it was incorrect
  • Arises for a number of reasons, such as a biased sample, an error in the experimental task, or a sample size that was too small
64
Q

Why do we use α = 0.05?

A
  • It is small so it is difficult to reject the null hypothesis but not so small that it is impossible to do so
  • It is a compromise between type I and type II errors
65
Q

How is a student’s t distribution similar to SND?

A
  • Bell-shaped, symmetric, uni-modal
66
Q

How is a student’s t distribution different to SND?

A
  • Has a lower peak, heavier tails and more variance
67
Q

When is a student’s t distribution used?

A
  • When population s.d. is unknown
68
Q

Does student’s t distribution include a variety of t tests?

69
Q

How do you find the t statistic?

A

T(m) = (m-µ) / (s/√N)

70
Q

How do you find the estimated standard error?

A

(s/√N) – estimated standard error

71
Q

When using t distribution table, what value should you use for v?

A

When using t table – t (v = N-1) – subtract 1 off of sample size

72
Q

How do you find confidence intervals when population s.d. is unknown?

A
  • For 95% of repeat samples, the sample mean m would be within:
    o Some number c of e.s.e.s of µ
    o (µ - (c x s/√N) to µ + (c x s/√N))
  • So the 95% CI is m - (c x s/√N) to m + (c x s/√N)
  • To find c:
    o Find the t value (with v = N-1) for 0.025 (2.5%) in one tail, or 0.05 (5%) for 2 tails
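The calculation can be sketched with the standard library alone. The sample data are made up, the function name `ci_unknown_sigma` is hypothetical, and the small lookup `T_CRIT_TWO_TAILED_05` holds a few critical values copied from a standard t table (0.025 in each tail), so it only covers the listed degrees of freedom.

```python
import math
import statistics

# Critical t values for 0.025 in one tail (0.05 over two tails), keyed by v = N-1.
# Copied from a standard t table; only a few rows are included in this sketch.
T_CRIT_TWO_TAILED_05 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}

def ci_unknown_sigma(sample):
    """95% CI for the population mean when the population SD is unknown."""
    n = len(sample)
    m = statistics.mean(sample)
    s = statistics.stdev(sample)       # sample SD (divides by n-1)
    ese = s / math.sqrt(n)             # estimated standard error
    c = T_CRIT_TWO_TAILED_05[n - 1]    # look up c with v = N-1
    return m - c * ese, m + c * ese

# Hypothetical sample of N = 10 scores.
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0, 14.0, 13.0]
low, high = ci_unknown_sigma(sample)

print(f"95% CI: {low:.3f} to {high:.3f}")
```

The multiplier 2.262 for v = 9 is larger than the 1.96 used when σ is known, which reflects the extra uncertainty from estimating the SD.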
73
Q

How do you conduct a 1 sample t test?

A
  • Same as a z test except:
  • Work out e.s.e.
  • Find t statistic
  • Compare the t statistic with the critical value for the corresponding t(v = N-1) and significance level
  • Reject or fail to reject H0
  • Interpret in context
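The steps above can be sketched as a short function. The sample, the hypothesised population mean of 12, and the function name `one_sample_t` are assumptions for illustration; the critical value 2.262 is the standard t table entry for v = 9 with 0.05 split over two tails.

```python
import math
import statistics

def one_sample_t(sample, mu0, t_crit):
    """Return (t, reject) for a two-tailed one-sample t test against mu0."""
    n = len(sample)
    m = statistics.mean(sample)
    ese = statistics.stdev(sample) / math.sqrt(n)  # work out the e.s.e.
    t = (m - mu0) / ese                            # find the t statistic
    return t, abs(t) > t_crit                      # compare with the critical value

# Hypothetical sample of N = 10 scores, tested against a hypothetical mean of 12.
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0, 14.0, 13.0]
t, reject = one_sample_t(sample, mu0=12.0, t_crit=2.262)  # v = N-1 = 9

print(f"t = {t:.2f}, reject H0: {reject}")
```

Here the t statistic exceeds the critical value, so with these made-up numbers the null hypothesis would be rejected at the 0.05 level.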
74
Q

When do you use a 1 sample t test?

A
  • Use to test whether sample mean you have is different from some given or hypothetical population mean