Statistics Flashcards

1
Q

The Scientific Method

A

A logical, systematic approach to the solution of a scientific problem

  1. Develop theory (observations, literature review, prior research)
  2. Construct a hypothesis
  3. Design a study
  4. Analyse data
  5. Draw conclusions
2
Q

What is a parameter?

A

A numerical summary of a population, such as its mean, median or range

3
Q

What are the types of data?

A
  • Quantitative: numeric, e.g. age, height, weight
  • Qualitative: descriptive, e.g. favourite colour, suburb, type of car

4
Q

What does discrete data include?

A

Only a limited set of values

  • Nominal: unordered categories where the order of values is arbitrary (e.g. gender, ethnicity); a nominal variable with only two categories is also known as a binary, dichotomous or indicator variable (qualitative)
  • Ordinal: ordered categories where ranking matters but the distances between ranks are not consistent (e.g. NYHA class, or level of education: high school, undergraduate, postgraduate) (qualitative OR quantitative)
5
Q

What does continuous data include?

A

An unlimited number of possible values

  • Interval: numeric scale with consistent differences between points but no true zero (e.g. temperature in degrees Celsius, standardised IQ) (quantitative)
  • Ratio: numeric scale with equal intervals and a meaningful (absolute) zero point (e.g. height, weight in kilos, time, length) (quantitative)
6
Q

When does experimental manipulation occur?

A

  • Between subjects: independent groups
  • Within subjects / repeated measures: related groups

7
Q

What is measurement error?

A

An error that occurs when there is a difference between the information desired by the researcher and the information provided by the measurement process

8
Q

What are extraneous and confounding variables?

A

Extraneous: any variable other than the IV or DV
Confounding: an extraneous variable that could potentially explain the relationship between the IV and DV
- Example: age, reading ability and year of school in children
- IV: age, DV: reading ability, Confound: year of school

9
Q

What is the measurement type of the variables?

A

Categorical data: discrete categories or groups
  • Frequency tables and bar chart / pie chart

Numeric data: a score on a scale
  • Numeric summary statistics (mean/median/mode, standard deviation) and a histogram
10
Q

Consider the following aspects when summarising data:

A
  • Typicality (mean, median and mode)
  • Variability (range, IQR, std dev, variance)
  • Shape (skew, kurtosis)
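
A minimal sketch of the typicality and variability summaries, using only Python's standard `statistics` module. The function name `summarise` and the scores are made up for illustration; shape (skew, kurtosis) is left out because the standard library has no function for it.

```python
import statistics

def summarise(scores):
    """Return typicality and variability summaries for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3)
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "range": max(scores) - min(scores),
        "iqr": q3 - q1,
        "variance": statistics.variance(scores),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(scores),      # sample standard deviation
    }

scores = [2, 4, 4, 5, 7, 8]  # hypothetical data
summary = summarise(scores)
print(summary)
```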
11
Q

What features does a normal distribution have?

A
  1. Variability
  2. Unimodality
  3. Central tendency
  4. Symmetrical
  5. Mesokurtic
12
Q

What is a z-score?

A

Z-scores are standardised scores, measuring the difference between a score and the mean, expressed in std dev units
z = (score - mean) / std dev
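
A small worked example of the z-score formula. The numbers (a score of 130 on a scale with mean 100 and std dev 15) are hypothetical.

```python
def z_score(score, mean, std_dev):
    """Distance of a score from the mean, in standard deviation units."""
    return (score - mean) / std_dev

z = z_score(130, 100, 15)  # (130 - 100) / 15
print(z)                   # 2.0, i.e. two std devs above the mean
```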

13
Q

What is the central limit theorem?

A
  • Distribution of sample means will be approximately normal when the sample size is large enough, whatever the shape of the population distribution
  • Mean of the sample means will be the same as the population mean
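
A rough simulation can illustrate both points. This is only a sketch: the skewed population, the sample size of 30 and the 1,000 repeated samples are arbitrary choices, and `sample_means` is a hypothetical helper.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# A clearly non-normal (right-skewed) population
population = [random.expovariate(1.0) for _ in range(20_000)]
population_mean = statistics.mean(population)

def sample_means(population, sample_size, n_samples):
    """Draw repeated random samples and return the mean of each one."""
    return [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(n_samples)
    ]

means = sample_means(population, sample_size=30, n_samples=1_000)
mean_of_means = statistics.mean(means)

print(round(population_mean, 2), round(mean_of_means, 2))
# The sample means vary much less than the individual scores do
print(statistics.stdev(means) < statistics.stdev(population))
```

Plotting a histogram of `means` would be the usual way to check the approximately normal shape.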

14
Q

What is standard error?

A
  • Standard deviation of sample means = Standard Error
  • Standard Error = std dev / square root of N
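
A minimal sketch of the standard error formula, estimating it from a single sample. The data are made up and the sample std dev (n - 1 denominator) is assumed.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample std dev / square root of N."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [4, 6, 8, 10]  # hypothetical scores
se = standard_error(sample)
print(round(se, 3))
```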

15
Q

What is null hypothesis significance testing?

A
  • Analysing data from a sample, to see whether it can make a contribution to a field of knowledge
  • Conservative approach: begin by assuming the null hypothesis is true, then
    test whether we have evidence against that
  • Summarise data and compute a test statistic
16
Q

What is the hypothesis testing procedure?

A
  1. Decide on alpha
  2. Calculate the test statistic
  3. Compare the obtained statistic with the critical statistic
    obtained >= critical -> reject H0
    obtained < critical -> don’t reject H0
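
The three steps can be sketched as below. Two assumptions are made here that the card does not state: the test is two-tailed (so the size of the obtained statistic is compared), and the critical value comes from the standard normal distribution rather than a t distribution. The obtained value is hypothetical.

```python
from statistics import NormalDist

def critical_z(alpha):
    """Two-tailed critical value of the standard normal for a given alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def decide(obtained, critical):
    """Step 3: compare the size of the obtained statistic with the critical one."""
    if abs(obtained) >= critical:
        return "reject H0"
    return "don't reject H0"

alpha = 0.05                  # step 1: decide on alpha
obtained = 2.31               # step 2: hypothetical test statistic
critical = critical_z(alpha)  # roughly 1.96 for alpha = .05
print(decide(obtained, critical))
```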
17
Q

What are t-tests?

A

Tests that inspect mean scores on a numeric variable

t-test = signal-to-noise ratio

18
Q

What are one-sample t-tests?

A

Is the average score on a variable, in the population from which the sample is drawn, significantly different from a known number?
- Is the population’s mean score different from that of another population?
t = (sample mean - test value) / SE, where SE = sd / root N
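
A minimal sketch of the one-sample t formula. The sample and the test value of 5 are made up; the resulting t would be compared against a t distribution with N - 1 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, test_value):
    """t = (sample mean - test value) / SE, with SE = sd / root N."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - test_value) / se

sample = [4, 6, 8, 10]  # hypothetical scores
t = one_sample_t(sample, test_value=5)
df = len(sample) - 1    # degrees of freedom
print(round(t, 3), df)
```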

19
Q

What are assumptions?

A

Conditions that need to be met for the test to be valid

20
Q

What are assumptions of a one-sample t-test?

A
  1. Variable is on a numeric scale (interval or ratio)
  2. Variable is normally distributed in the population
  3. Observations are independent
21
Q

What are t-statistics?

A

The ratio of signal (difference between means) to noise (variability around the mean)

  • The bigger the t-statistic, the stronger the signal relative to the noise; t = 1: signal is equivalent to noise
  • Null hypothesis significance testing: how likely is it that we obtained this t-statistic if the null hypothesis is true
22
Q

What is an independent-samples t-test?

A
  • Comparing 2 group means
  • Is there a difference between means of 2 (independent) groups?
    t = (sample mean of group 1 - sample mean of group 2)
    / square root of (sd1^2 / n1 + sd2^2 / n2)
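
A minimal sketch of this formula with made-up groups. It uses the unpooled standard error, square root of (sd1^2 / n1 + sd2^2 / n2); when both groups are the same size this gives the same t as the pooled-variance version.

```python
import math
import statistics

def independent_t(group1, group2):
    """Difference in group means divided by its standard error."""
    se = math.sqrt(
        statistics.variance(group1) / len(group1)
        + statistics.variance(group2) / len(group2)
    )
    return (statistics.mean(group1) - statistics.mean(group2)) / se

group1 = [5, 7, 9]  # hypothetical scores, group 1
group2 = [1, 3, 5]  # hypothetical scores, group 2
t = independent_t(group1, group2)
print(round(t, 3))
```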
23
Q

What are the assumptions of independent-samples t-test?

A
  1. DV is on a numeric scale
  2. DV is normally distributed in the population within groups
  3. Variance of DV is equal between groups
  4. Observations are independent (within and between groups)
24
Q

What are paired t-tests?

A

Is there a difference in average scores between 2 related groups?
- Same person over 2 time points or 2 conditions
- Related people
- The 2 observations (scores) are non-independent (related)
- One numeric DV, one categorical IV
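
One way to sketch a paired t-test is as a one-sample t-test on the difference scores, tested against zero. The before/after scores are made up, and `paired_t` is a hypothetical helper.

```python
import math
import statistics

def paired_t(first, second):
    """t for related scores: mean difference / SE of the differences."""
    if len(first) != len(second):
        raise ValueError("paired scores must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

before = [10, 12, 9, 11]  # hypothetical scores at time 1
after = [12, 15, 10, 14]  # same people at time 2
t = paired_t(before, after)
print(round(t, 2))
```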

25
Q

What are the assumptions of paired samples t-test?

A
  1. Outcome variable is on a numeric scale
  2. Difference scores are normally distributed in the population
  3. Observations are related across groups, independent across pairs
26
Q

What is the Shapiro-Wilk test of normality?

A
  • A statistical test of whether the population from which the sample is drawn is normally distributed
  • Applies to numeric (quantitative) variables only
  • Use to see if the assumption of normality is met in CONJUNCTION with a graph and/or numeric descriptive statistics
27
Q

What are confidence intervals?

A
  • Display the variability around the point estimate (mean)
  • Point estimates have a greater precision, interval estimates have greater accuracy
  • Range of values that estimate an unknown population parameter (95% confidence interval)
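
A minimal sketch of a 95% confidence interval around a sample mean, computed as mean ± critical value × standard error. It assumes a normal (z) critical value of about 1.96; with small samples a t critical value is normally used instead, which gives a wider interval. The data are made up.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) bounds around the sample mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = stdev(sample) / math.sqrt(len(sample))
    centre = mean(sample)
    return centre - z * se, centre + z * se

sample = [4, 6, 8, 10, 7, 5, 9, 7]  # hypothetical scores
lower, upper = confidence_interval(sample)
print(round(lower, 2), round(upper, 2))
```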
28
Q

Example of conclusion for Results section

A

Results showed that there was a statistically significant increase in the proportion of time that infants spent gazing at the singer of the familiar song, compared to the singer of the unfamiliar song, under the test condition (Mt = 0.59, SDt = 0.18) compared to the baseline condition (Mb = 0.52, SDb = 0.18), t(31) = -2.04, p = .022, which suggests that the infants preferred the familiar song melody.

29
Q

How are effect sizes indicated?

A
  • How big is the difference in means between 2 groups?
  • How big is the mean difference score?
  • Many different kinds of effect sizes
  • Standardised vs unstandardised
30
Q

What is Cohen’s d?

A

One effect size measure of group differences (expressed in SD units)
d = (mean group 1 - mean group 2) / pooled standard deviation
pooled standard deviation = root((sd1^2 + sd2^2) / 2)
d >= 0.2 = small effect
d >= 0.5 = medium effect
d >= 0.8 = large effect
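
A minimal sketch of Cohen's d with the pooled standard deviation root((sd1^2 + sd2^2) / 2) and the cut-offs listed above. The groups are made up, and the "negligible" label for values below 0.2 is an assumption, since the card gives no name for that range.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Difference in means expressed in pooled std dev units."""
    pooled_sd = math.sqrt(
        (statistics.variance(group1) + statistics.variance(group2)) / 2
    )
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

def effect_label(d):
    """Label the size of d using the 0.2 / 0.5 / 0.8 cut-offs."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # assumed label, not from the card

group1 = [5, 7, 9]  # hypothetical scores
group2 = [1, 3, 5]
d = cohens_d(group1, group2)
print(d, effect_label(d))
```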

31
Q

What is a correlation?

A
  • Linear (straight line) relationship between 2 numeric variables
32
Q

What is a positive correlation?

A

An increase in one variable is associated with an increase in the other variable
e.g. number of hours studying and exam performance

33
Q

What is a negative correlation?

A

An increase in one variable is associated with a decrease in the other variable, i.e. an inverse relationship
e.g. the greater the number of children, the less sleep parents get

34
Q

What is Pearson’s correlation coefficient?

A
  • Ranges from -1.00 to +1.00
    closer to 0, weaker; closer to +/- 1, stronger
    ― 0 to .10 = very weak-to-no relationship
    ― .10 to .30 = weak relationship
    ― .30 to .50 = moderate relationship
    ― .50 to 1 = strong relationship
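
A minimal sketch that computes Pearson's r from its definition and labels its strength with the bands above, applied to the size of r so that negative correlations are covered too. The study-hours data are made up.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Sum of cross-products divided by the root of the sums of squares."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Label the strength of a correlation from the size of r."""
    size = abs(r)
    if size < 0.10:
        return "very weak-to-no relationship"
    if size < 0.30:
        return "weak relationship"
    if size < 0.50:
        return "moderate relationship"
    return "strong relationship"

hours = [1, 2, 3, 4, 5]  # hypothetical hours studying
marks = [2, 4, 5, 4, 5]  # hypothetical exam scores
r = pearson_r(hours, marks)
print(round(r, 2), strength(r))
```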
35
Q

What are the assumptions of a correlation?

A
  1. Both variables are numeric (and approximately normally distributed)
  2. Any relationship is linear (i.e. it’s not non-linear)
  3. No substantial outliers or gaps in data
  4. Independence of observations
36
Q

Correlations and Scatterplots

A
  1. Is it monotonic?
  2. Is it linear?
  3. Is it positive or negative?
  4. Effect of X on Y?
  5. Strength of correlation?
  6. Any outliers?
  7. Any gaps?
37
Q

How we formulate conclusions

A
  1. Identify the hypothesis (hypotheses)
  2. General descriptive statistics (understanding the sample and data collected)
  3. Identify variables involved in the hypothesis (what they are and what type they are)
  4. Identify the appropriate statistical test for the hypothesis
  5. Check the assumptions for the test
  6. Run the test and make conclusions
38
Q

How can we validly interpret p-values?

A
  • When the assumptions of the hypothesis test are met
  • When the hypothesis we test has a valid theoretical basis
  • When we conduct exactly one hypothesis test on a sample
  • There is an implicit assumption that only one test is performed
  • If we have >1 DV this basis fails
39
Q

What is the trade-off between efficiency and type I error?

A

We need to ‘do something’ about 𝛼 because we have tried to be efficient in our research and test several hypotheses in one study, rather than conducting several independent studies.
• This is very efficient but increases the probability of type I error
• We counteract this risk by making it tougher to reject H0
• But it goes against the probability basis of hypothesis tests
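
A small sketch of why several tests inflate type I error, and of one common way to make it tougher to reject H0. The Bonferroni correction used here is an assumption for illustration, since the card does not name a specific method, and the familywise formula assumes the tests are independent.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one type I error across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Stricter per-test alpha: divide alpha by the number of tests."""
    return alpha / n_tests

alpha = 0.05
n_tests = 5  # hypothetical number of hypotheses tested in one study

inflated = familywise_error(alpha, n_tests)
corrected = bonferroni_alpha(alpha, n_tests)
print(round(inflated, 3))   # well above the nominal 0.05
print(round(corrected, 3))  # each test now uses this stricter alpha
```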