Descriptive & Inferential Stats Flashcards

1
Q

What are the four main steps in the data collection process?

A
  1. Constructing a data collection form
  2. Establishing a coding strategy
  3. Collecting the data
  4. Entering data onto the collection form
2
Q

The data entry process is vulnerable to what?

A

Human error

3
Q

When coding data, what main things should you remember?

A

Use single digits
Use codes that are simple and unambiguous
Use codes that are explicit and discrete

4
Q

Name some data collection rules:

A
  1. Get permission from your institutional review board
  2. Decide what type of data you will need to collect
  3. Consider where the data will come from
  4. Make a duplicate of the original data and keep it separate
  5. Ensure whoever is collecting data is well trained
  6. Cultivate sources for finding participants
  7. Don’t throw away your original data!
5
Q

What are Qualtrics and SPSS used for?

A

Qualtrics - for creating online questionnaires
SPSS - used to enter data collected from data collection forms and analyse the data using a wide range of statistical methods

6
Q

What are descriptive statistics?

A

Descriptive statistics include measures of central tendency, variability and distribution, and association, presented both numerically and visually. They simplify the data so that you can analyse it.

7
Q

What are the three main statistics for central tendency?

A

Mean
Median
Mode

8
Q

What is central tendency?

A

Central tendency looks at the middle of the data and tries to capture a middle road picture of the data set.
3 main types: mean (the sum of a set of scores divided by the number of scores), median (the score or point in a distribution above which one-half of the scores lie) and mode (the score that occurs most frequently)

9
Q

What is one major issue with the mean value? (with respect to scores…)

A

It can be influenced by extreme scores or outliers
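
The pull of an extreme score on the mean, compared with the median, can be sketched in a few lines of Python. The scores are invented for illustration, and only the standard-library `statistics` module is assumed.

```python
from statistics import mean, median, mode

# Hypothetical scores, invented for illustration only.
scores = [2, 3, 3, 4, 5, 7, 11]

print(mean(scores), median(scores), mode(scores))

# Add a single extreme score (an outlier) and recompute.
scores_with_outlier = scores + [45]

mean_before, mean_after = mean(scores), mean(scores_with_outlier)
median_before, median_after = median(scores), median(scores_with_outlier)

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

One extra score doubles the mean in this example, while the median barely moves.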

10
Q

When should we use the Median?

A

When relatively few scores fall at the high or low end of the distribution (extreme scores), or when the distribution is not normal. The extreme scores are still included.
Use with ordinal data, e.g. rank in class, birth order, income

11
Q

When should we use the Mode?

A

When the data are measured on a nominal (sometimes ordinal) scale, e.g. eye colour, party affiliation

12
Q

When should we use the Mean?

A

For interval and ratio data, e.g. speed of response, age in years

13
Q

What is variability?

A

Variability refers to the dispersion or spread of scores in the data. Some measures of variability are: range, inter-quartile range, standard deviation

14
Q

What is range?

A

Range is the highest score minus the lowest score. It is the simplest and crudest measure of variability and is strongly affected by extreme scores

15
Q

What is interquartile range?

A

Quartiles divide a data set into four equal parts using three values (the middle value being the median). The interquartile range is the upper quartile minus the lower quartile.

16
Q

What is standard deviation?

A

The average amount by which individual scores deviate from the mean. SD is most useful when the data follow a normal distribution
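
As a rough sketch, the three measures of variability from the preceding cards (range, interquartile range, standard deviation) can be computed with Python's standard-library `statistics` module. The scores are hypothetical. Note two assumptions: `quantiles` uses its default "exclusive" method, so other software may report slightly different quartiles, and `pstdev` is the population form (dividing by N), whereas `stdev` divides by N - 1.

```python
from statistics import pstdev, quantiles

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Quartiles: three cut points that split the data into four parts.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1  # upper quartile minus lower quartile

# Population standard deviation: square root of the
# average squared deviation from the mean.
sd = pstdev(scores)

print(score_range, iqr, sd)
```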

17
Q

Researchers like to examine the entire set of scores at one time. How can they do this?

A

By looking at the data’s distribution

18
Q

What is a histogram?

A

Graph showing frequency distribution (Y = frequency, X = score)

19
Q

What is a normal distribution?

A

A normal distribution has the following properties: it is bell shaped, the mean, median and mode are equal, and about 68% of the data fall within 1 standard deviation of the mean.

20
Q

What is something (not average…) that the normal distribution can tell us?

A

It tells us a lot about people who deviate from the average cluster of people (whether low or high end).
You can accurately determine what percentage of the data is below or above any value.

21
Q

Name some ways data can deviate from a normal distribution

A

Skew
Kurtosis
Modality

22
Q

Explain skew

A

Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean.

23
Q

Explain the types of skew

A

Positive skew - scores are bunched at the low values with the tail pointing to the high values (i.e. the positive end)

Negative skew - scores are bunched at the high values with the tail pointing to the low values

24
Q

What is kurtosis?

A

The sharpness of the peak of a frequency-distribution curve.

25
Q

Explain the types of kurtosis

A

Positive kurtosis - distribution has more scores in the tails than the normal distribution (usually looks more peaked)
Negative kurtosis - distribution has fewer scores in the tails than the normal distribution (looks flatter)

26
Q

What is modality?

A

The modality of a distribution is determined by the number of peaks it contains

27
Q

Explain the types of modality

A

Unimodal - one prominent high point
Bimodal - two high peaks
Multimodal - multiple prominent high peaks

28
Q

What is a Z score?

A

A Z score measures how many standard deviations below or above the mean a raw score is. It is also known as a standard score and can be placed on a normal distribution curve.
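
The definition amounts to z = (raw score - mean) / standard deviation. A minimal sketch follows; the function name `z_score` and the test with mean 100 and standard deviation 15 are hypothetical, chosen only for illustration.

```python
def z_score(raw_score, mean, sd):
    """Standard deviations a raw score lies above (+) or below (-) the mean."""
    return (raw_score - mean) / sd

# Hypothetical test with mean 100 and standard deviation 15.
z_high = z_score(130, mean=100, sd=15)  # two SDs above the mean
z_low = z_score(85, mean=100, sd=15)    # one SD below the mean

print(z_high, z_low)
```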

29
Q

The normal probability distribution is illustrated by a ….

A

Bell curve

30
Q

Ideally, data would be distributed symmetrically around the centre of all the scores. Why?

A

Many inferential tests (i.e. tests that help us analyse the data and make inferences) require normally distributed data

31
Q

What is the x axis of the bell curve associated with?

A

Different Z scores lie along the x axis

The location of each Z score is associated with a percentage of the distribution

32
Q

Z scores are used to predict…?

A

The percentage of scores both above and below a particular score
The probability that a particular score will occur in a distribution
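
As an illustration, the percentage of scores below and above a given Z score can be read off the standard normal curve with `statistics.NormalDist` from the Python standard library. The helper name `share_below` and the choice of z = 1.0 are assumptions made for this sketch, and it presumes the scores really are normally distributed.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_below(z):
    """Proportion of a normal distribution lying below a given Z score."""
    return standard_normal.cdf(z)

z = 1.0
below = share_below(z)
above = 1 - below

print(f"below z={z}: {below:.1%}, above: {above:.1%}")
```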

33
Q

What can z scores determine?

A

Whether it is likely that an observed score would be seen within a normally distributed population.
This can help us identify anomalies in the data, such as outliers, and judge whether two samples are likely to come from the same population (or whether they differ significantly)

34
Q

What is the Empirical Rule of Standard Deviation?

A

That 68% of the data falls within 1 standard deviation of the mean
That 95% falls within 2 standard deviations of the mean
That 99.7% falls within 3 standard deviations of the mean
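
The three percentages can be checked against the standard normal curve. This is a small sketch using `statistics.NormalDist` from the Python standard library; the helper name `share_within` is hypothetical. The rule's figures are rounded: the share within 2 standard deviations is closer to 95.4%.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
```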

35
Q

What is the Central Limit Theorem?

A

States that the distribution of sample means approximates a normal distribution as the sample size gets larger.
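
A small simulation can make the theorem concrete. The sketch below is illustrative only: it assumes a skewed (exponential) population with mean 1, a fixed random seed, and 2000 repeated samples, none of which come from the original card. As the sample size grows, the sample means cluster more tightly around the population mean and the share lying within one standard deviation approaches the roughly 68% expected of a normal distribution.

```python
import random
from statistics import mean, pstdev

random.seed(42)  # fixed seed so the sketch is repeatable

def sample_means(sample_size, n_samples=2000):
    """Means of repeated samples from a skewed population with mean 1."""
    return [
        mean([random.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

def share_within_one_sd(values):
    """Proportion of values within one SD of their own mean."""
    centre, spread = mean(values), pstdev(values)
    inside = [v for v in values if abs(v - centre) <= spread]
    return len(inside) / len(values)

means_small = sample_means(sample_size=2)
means_large = sample_means(sample_size=50)

print("n=2: ", round(pstdev(means_small), 3), round(share_within_one_sd(means_small), 3))
print("n=50:", round(pstdev(means_large), 3), round(share_within_one_sd(means_large), 3))
```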

36
Q

What are inferential statistics?

A

Inferential statistics allow us to infer something from our data: conclusions that extend beyond the immediate data to the larger population.
We can make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study.

37
Q

Basically, how are inferences made from two groups?

A

Representative samples from the two groups are selected
Participants tested
Means of both groups are compared
We conclude that the measured differences are either:
a result of chance or
reflective of a true difference
Conclusions are drawn regarding the observed difference

38
Q

What is the first explanation for observed differences in a group?

A

Chance
However the goal of science is to control for sources of variability and therefore reduce the role of chance as an explanation

39
Q

The _____ distribution is the foundation for many inferential statistical tests.

A

Normal distribution

40
Q

Why is the central limit theorem a critical assumption in being able to generalise results to the population?

A

Because nothing about the population distribution of scores needs to be known to generalise results from a sample to the population.
We can use properties of the normal distribution for our inferences about the population

41
Q

Why is sample representativeness so important?

A
  1. Because the calculation of inferential statistics is based on the extent to which the sample is representative, i.e. does the sample mean = population mean?
  2. Because the hypothesis cannot be tested on the entire population, so a sample is the only option
  3. Because a sample is imperfect, there will be sampling error
42
Q

How can we limit sampling error?

A

Increasing sample size

Stratification

43
Q

Background to inferential statistics: what did Fisher say? What did Neyman and Pearson say?

A

Fisher (the lady tasting tea experiment) - calculate the probability of an event and evaluate this probability within the research context, i.e. only when there is a very small probability that the observed behaviour is due to chance alone would we conclude there was a genuine effect.
Neyman and Pearson - scientific statements should be split into testable hypotheses

44
Q

What are two types of hypothesis?

A

Null
Alternative (research)
Can be directional (predicts the direction of a difference) or non-directional (does not predict the direction of difference)

45
Q

Explain how we can prove the alternative hypothesis

A

We can’t.

We can reject the null only (or fail to reject it)

46
Q

What is statistical significance?

A

The degree of risk that you are willing to take that you will reject the null hypothesis when it is actually true (ie what percentage are you ok with being ‘wrong’?)

47
Q

What is NHST?

A

Null hypothesis significance testing

  1. Assume the null is true
  2. Compute the probability (p) of the observed data, or data more extreme, given the null is true
  3. If p is small enough (below the predetermined level of significance), the observed data are unlikely under the null, so we REJECT the null in favour of the alternative
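
The three steps can be sketched with an exact binomial test. Everything specific here is an assumption for illustration: a hypothetical coin that lands heads 16 times in 20 flips, a null hypothesis that the coin is fair, a one-sided test, and a significance level of 0.05.

```python
from math import comb

ALPHA = 0.05  # predetermined level of significance (a common convention)

def p_value_at_least(successes, trials, p_null=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p_null), i.e. under the null."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Step 1: assume the null is true (the coin is fair, p_null = 0.5).
# Step 2: probability of the observed data or more extreme, given the null.
p = p_value_at_least(16, 20)

# Step 3: reject the null only if p is below the chosen significance level.
reject_null = p < ALPHA

print(f"p = {p:.4f}, reject null: {reject_null}")
```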
48
Q

What are the two mistakes we can make when testing hypotheses?

A

Type 1 error - we reject the null when we shouldn’t (i.e. we believe there is an effect but there isn’t). Its probability is your predetermined level of alpha, usually 0.05.
Type 2 error - we fail to reject the null when it is false (i.e. we don’t think there is an effect when there is one)

49
Q

How do we decrease the likelihood of committing a Type 2 error?

A

Increase sample size

Type 2 error decreases with bigger effects

50
Q

What are degrees of freedom?

A

Degrees of freedom of an estimate is the number of independent pieces of information that went into calculating the estimate.

51
Q

To test for significance, what tests should we use for establishing whether there is a difference between the means of:

  1. two unrelated groups
  2. two related groups
  3. three or more groups
A
  1. t-test for independent means
  2. t-test for dependent means
  3. ANOVA (analysis of variance)
52
Q

What is the t-value?

A

The t-value measures the size of the difference relative to the variation in your sample data. Put another way, t is simply the calculated difference represented in units of standard error. The greater the magnitude of t, the greater the evidence against the null hypothesis.
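
To make "difference in units of standard error" concrete, here is a minimal sketch of a t-value for two independent groups. It assumes the pooled-variance (Student) form, which treats the two groups as having equal variances. The function name `independent_t` and both sets of scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Student's t for two independent groups, using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2  # degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # Difference between the means, expressed in standard-error units.
    return (mean(group_a) - mean(group_b)) / standard_error, df

# Hypothetical scores, invented for illustration only.
group_a = [5, 7, 8, 9, 11]
group_b = [2, 4, 5, 6, 8]

t_value, df = independent_t(group_a, group_b)
print(round(t_value, 3), df)
```

Here the means differ by 3 and the standard error is the square root of 2, so t is roughly 2.12 on 8 degrees of freedom.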

53
Q

What is a t-test?

A

A t-test is a type of inferential statistic used to determine whether there is a significant difference between the means of two groups.

54
Q

What is a critical value?

A

The value needed to reject the null, which depends on:

  • level of significance chosen
  • degrees of freedom
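
As a hedged illustration of how the critical value moves with the chosen level of significance, the sketch below uses the standard normal (z) distribution, where degrees of freedom do not enter. For a t-test the critical value also depends on the degrees of freedom and would come from a t table or a statistics package, which is not shown here. The function names are hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Critical z value for a given level of significance."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

def reject_null(test_statistic, alpha):
    """Two-tailed decision: reject when the statistic exceeds the critical value."""
    return abs(test_statistic) > z_critical(alpha)

print(round(z_critical(0.05), 3))  # stricter alpha gives a larger critical value
print(round(z_critical(0.01), 3))
print(reject_null(2.5, alpha=0.05), reject_null(2.5, alpha=0.01))
```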
55
Q

What should we do if we obtain a value more than the critical value?

A

We reject the null

56
Q

What should we do if we obtain a value less than the critical value?

A

Fail to reject the null (this does not prove that the null is true)