Lecture Week 3 Flashcards

1
Q

6 reasons for conducting exploratory data analysis

A
  1. Check for data entry errors
  2. Obtain a descriptive analysis of the data
  3. Find patterns that are not obvious
  4. Analyse and deal with missing data
  5. Check for outliers
  6. Check assumptions
2
Q

What is Central Tendency?

A

Tendency for the values of a random variable to cluster round its mean, mode, or median.

3
Q

Multiple Measures of Central Tendency

A
  • Summary statistic that represents the center point value of a dataset
  • Three most common measures of central tendency are the mean, median, and mode.
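As a rough illustration of the three measures named above, here is a minimal Python sketch using only the standard library. The `scores` list is made-up data, not from the lecture.

```python
import statistics

# Hypothetical sample, chosen only for illustration
scores = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(scores)      # arithmetic average
median_value = statistics.median(scores)  # middle of the sorted values
mode_value = statistics.mode(scores)      # most frequent value

print(mean_value, median_value, mode_value)
```

The three values differ here because the sample is pulled to the right by the score of 10, which raises the mean more than the median.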
4
Q

Multiple Measures of Variability

A
  • Define how far away the data points tend to fall from the center.
  • Range, Interquartile Range, Variance and Standard Deviation
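The four measures listed above could be computed along these lines. This is a sketch on made-up data; note that quartile conventions differ between packages, so the interquartile range from Python's default method may not match SPSS output exactly.

```python
import statistics

# Hypothetical sample, chosen only for illustration
scores = [2, 3, 3, 5, 7, 10]

data_range = max(scores) - min(scores)

# quantiles(n=4) returns the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # square root of the sample variance

print(data_range, iqr, variance, std_dev)
```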
5
Q

Quantitative measures of Shape

A
  • The shape of a distribution can be described for quantitative data because the values have a logical order, so the ‘low’ and ‘high’ ends of the histogram’s x-axis can be identified.
  • Histograms
  • Kurtosis
  • Skewness
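For intuition, skewness and kurtosis can be computed from the central moments of the data. The sketch below uses the simple moment-based definitions, which is an assumption on my part: SPSS reports bias-corrected versions, so its numbers will differ slightly for small samples. The function names are hypothetical.

```python
import statistics

def central_moment(values, order):
    """Average of (x - mean) ** order over the data."""
    mean = statistics.fmean(values)
    return sum((x - mean) ** order for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a Normal distribution scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]          # hypothetical data
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical data

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```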
6
Q

Confidence Intervals

A
  • For 95% confidence intervals, on average 19 out of 20 such intervals contain the population parameter.
  • Suppose you have a 95% confidence interval of [5, 10] for the mean.
  • You can be 95% confident that the population mean falls between 5 and 10
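The "19 out of 20" idea can be checked with a small simulation. This is only a sketch under stated assumptions: the population (Normal, mean 50, SD 10), the sample size, and the use of the 1.96 Normal critical value are all choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population and study design
TRUE_MEAN, TRUE_SD = 50.0, 10.0
SAMPLE_SIZE, TRIALS = 100, 2000
Z_95 = 1.96  # Normal critical value for a 95% interval

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / SAMPLE_SIZE ** 0.5
    lower, upper = mean - Z_95 * std_error, mean + Z_95 * std_error
    if lower <= TRUE_MEAN <= upper:
        covered += 1  # this interval captured the true mean

coverage = covered / TRIALS
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95, which is the sense in which the procedure is "95% confident".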
7
Q

Normality & Sample Size

A
  • Normality tests and the skewness ratio (skewness divided by the SE of skewness) are affected by sample size
  • With a really large sample, the tests become hypersensitive
  • Even trivial deviations from normality will be flagged as violating the assumption of normality
  • This can create a false positive
  • The skewness statistic itself doesn’t really change, though
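To see why the ratio grows with sample size while skewness stays put, the sketch below holds skewness fixed at a hypothetical 0.2 and recomputes the ratio for different n. It assumes the usual standard-error formula for sample skewness, sqrt(6n(n-1) / ((n-2)(n+1)(n+3))); the function name is made up.

```python
import math

def se_skewness(n):
    """Standard error of sample skewness for a sample of size n (assumed formula)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

skew = 0.2  # hypothetical, mild skewness held constant across sample sizes

for n in (30, 300, 3000):
    ratio = skew / se_skewness(n)
    print(f"n={n}: SE={se_skewness(n):.3f}, skewness / SE={ratio:.2f}")
```

The standard error shrinks as n grows, so the same mild skewness eventually pushes the ratio past 2 purely because the sample is large.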
8
Q

Monte Carlo Tests

A
  • Simulated studies that vary the characteristics of the data show that even gross deviations from Normality don’t impact the statistical significance of the tests.
  • Most tests can withstand even gross deviations from Normality - they are called robust
9
Q

Central Limit Theorem

A
  • If you have a population with mean μ and standard deviation σ and take large random samples from it
  • The distribution of the sample means will be approximately Normal, even if the population itself is not.
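A quick simulation can make this concrete. The sketch below draws samples from a strongly skewed population (an exponential distribution with mean 1 and SD 1, chosen purely as an example) and looks at how the sample means behave. The sample size of 50 and the 2000 repetitions are arbitrary.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

SAMPLE_SIZE, REPETITIONS = 50, 2000  # hypothetical settings

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, SD 1)."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

means = [sample_mean(SAMPLE_SIZE) for _ in range(REPETITIONS)]

# Theory: the means centre on mu = 1 with spread sigma / sqrt(n)
print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(1 / SAMPLE_SIZE ** 0.5, 3))
```

A histogram of `means` would look roughly bell-shaped even though the individual draws are heavily right-skewed.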
10
Q

What to do if you decide that data is not Normal?

A
  • No simple answer
  • You could do a transformation
  • Sometimes transforming the data can fix a problem of Normality
  • If the transformation succeeds, you can be confident the transformed data are approximately Normally distributed
11
Q

Transformation

A

Applying a simple mathematical operation to data to deal with violations of assumptions

12
Q

Common Transformations

A
  • Log10
  • Square Root
  • Reciprocal
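Outside SPSS, the same three transformations can be sketched in a few lines of Python. The `scores` list is made-up, strictly positive data, since the log and reciprocal are undefined at zero.

```python
import math

# Hypothetical, strictly positive, right-skewed scores
scores = [1, 10, 100, 1000]

log10_scores = [math.log10(x) for x in scores]    # base 10 logarithm
sqrt_scores = [math.sqrt(x) for x in scores]      # square root
reciprocal_scores = [1 / x for x in scores]       # reciprocal (reverses the order)

print(log10_scores)
print(sqrt_scores)
print(reciprocal_scores)
```

All three pull in large values more than small ones, which is why they can reduce positive skew. The reciprocal also flips the ordering of the scores.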
13
Q

Log10 Transformation

A
  • Base 10 Logarithm
  • Formula in SPSS: Transform/Compute Variable/rename Target Variable/Numeric Expression: lg10(Variable)
14
Q

Square Root Transformation

A

Formula in SPSS: Transform/Compute Variable/rename Target Variable/Numeric Expression: sqrt(variable)

15
Q

Reciprocal Transformation

A

Formula in SPSS: Transform/Compute Variable/rename Target Variable/Numeric Expression: 1/Variable

16
Q

Homogeneity of Variance

A
  • This is problematic when we have an unbalanced sample
  • Tested Using Levene’s Test
  • Assumption underlying both t tests and F tests
  • Population variances of two or more samples are considered equal.
  • Can be corrected using a transformation called a Power Transformation
17
Q

Levene’s Test

A
  • Tests Homogeneity of Variance
  • Must include a grouping variable - that is a variable which can be placed into groups (such as gender)
  • Formula in SPSS: Analyse/Descriptive Statistics/Explore/Dependent List: variable/Factor List: grouping variable/Plots/Spread vs Level with Levene Test/Power Estimation
  • In output look at Test of Homogeneity of Variance and then look at Based on Mean
  • And Standard Deviation
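To show what the "Based on Mean" version of the test is measuring, here is a sketch of the mean-based Levene statistic W in plain Python. It computes only the statistic, not the p-value (W would be compared against an F distribution with k - 1 and N - k degrees of freedom). The function name and the example groups are hypothetical.

```python
import statistics

def levene_w(*groups):
    """Mean-based Levene statistic: a one-way ANOVA F on absolute deviations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every score from its own group mean
    deviations = []
    for g in groups:
        group_mean = statistics.fmean(g)
        deviations.append([abs(x - group_mean) for x in g])

    dev_means = [statistics.fmean(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total

    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, dev_means))
    within = sum((v - m) ** 2
                 for d, m in zip(deviations, dev_means) for v in d)
    return (n_total - k) / (k - 1) * between / within

same_spread = levene_w([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
different_spread = levene_w([1, 2, 3, 4, 5], [1, 5, 10, 15, 20])
print(same_spread, different_spread)
```

Groups with identical spread give W = 0, and W grows as the group spreads diverge.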
18
Q

Power Transformation

A
  • Used to transform data when the assumption of Homogeneity of Variance has been violated
  • This is raising figures to a Power Value such as squared or cubed
  • In SPSS: Transform/Compute Variable/rename Target Variable/Numeric Expression: variable ** power
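Python uses the same `**` operator for exponentiation, so a power transformation can be sketched as below. The scores and the power of 0.5 are made up for illustration. In practice the power would come from the analysis rather than being picked by hand.

```python
# Hypothetical scores and a hypothetical power value
scores = [1.0, 4.0, 9.0]
power = 0.5

# Raise every score to the chosen power
transformed = [x ** power for x in scores]

print(transformed)
```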
19
Q

Spread vs Level Plot

A
  • Plot itself is kinda pointless
  • In the fine print it says: Power for Transformation
  • This is the number to use when doing a Power Transformation to fix a problem with Levene’s Test
20
Q

General comments about transformations

A
  • They are not a magic bullet
  • Some (such as log and reciprocal) don’t cope with zeros, so a constant has to be added to every score first (such as adding 10)
  • They are unpredictable and can affect Normality and Homogeneity of Variance even when you didn’t intend them to.
  • Some data is “untransformable”
  • Only use transformed figures to fix statistical tests
  • Only report from the true data
21
Q

Acceptable Skewness level to achieve Normality Assumption

A

Skewness statistic divided by its standard error should fall between -2 and +2. A value > +2 or < -2 indicates the assumption is violated.