Word Doc Week 2 Flashcards

1
Q

What should I do before conducting any data analysis?

A
  • “Screen & Clean” Data
  • SPSS cannot tell if data are ridiculous
  • Only the researcher knows this, so I need to make sure I screen the data for errors
2
Q

Six reasons for conducting exploratory data analysis

A
  1. Checking for data entry errors
  2. Obtaining a thorough descriptive analysis of your data.
  3. Examining patterns that are not otherwise obvious
  4. Analysing and dealing with missing data
  5. Checking for outliers
  6. Checking assumptions
3
Q

EXPLORE command in SPSS

A
  • The Explore command in SPSS covers all bases.
4
Q

Who is John Tukey?

A
  • A very practical statistician
  • He developed the language and techniques of exploratory data analysis (EDA)
5
Q

What does Screening and Cleaning involve?

A
  • Computing new variables from existing ones
  • Recoding variables
  • Dealing with missing data
6
Q

Checking Data Entry Errors

A
  • Use the frequencies command to check for data entry errors in categorical/nominal variables
  • Use the outliers option in the explore command to check for data entry errors in continuous/scale variables
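The frequencies check can be sketched in Python (not SPSS syntax); the categorical variable and its codes here are hypothetical — any code outside the valid set is flagged as a probable entry error:

```python
from collections import Counter

# Hypothetical categorical variable: "gender" coded 1 = male, 2 = female.
# A value outside {1, 2} is a likely data entry error.
gender = [1, 2, 2, 1, 2, 22, 1, 2]  # 22 is a typo for 2

freq = Counter(gender)  # frequency table, like SPSS FREQUENCIES
valid_codes = {1, 2}
errors = {code: n for code, n in freq.items() if code not in valid_codes}
print(freq)
print(errors)  # {22: 1} -> one out-of-range value to investigate
```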
7
Q

Options for Dealing with Data Entry Errors

A
  1. Remove data
  2. Make ‘educated guesses’ about what was intended
8
Q

Obtaining a thorough descriptive analysis of data

A
  • The explore command provides more descriptive stats than any other procedure
    • Multiple measures of central tendency
    • Multiple measures of variability
    • Quantitative measures of shape
    • Confidence intervals
    • Percentiles
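As a rough Python analogue of that output (standard library only; the scores are hypothetical), the main statistics Explore reports can be computed like this:

```python
import statistics

# Hypothetical scores; mirrors the kind of output Explore gives.
scores = [12, 15, 14, 10, 18, 13, 16, 14, 11, 17]

print("Mean:", statistics.mean(scores))          # central tendency
print("Median:", statistics.median(scores))      # central tendency
print("SD:", statistics.stdev(scores))           # variability
print("Variance:", statistics.variance(scores))  # variability
print("Quartiles:", statistics.quantiles(scores, n=4))  # percentiles
```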
9
Q

Examining Patterns that are not otherwise obvious

A
  • Stem and Leaf Plots
  • Box and Whisker Plots
10
Q

Analysing and dealing with missing data

A

Do you leave it out, or do you substitute the mean?
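Both options can be sketched in Python on a hypothetical toy variable, with missing values coded as None:

```python
import statistics

# Hypothetical variable with missing values coded as None.
raw = [4, 7, None, 5, None, 6]

# Option 1: listwise deletion (leave the missing cases out).
complete = [x for x in raw if x is not None]

# Option 2: mean substitution (replace missing values with the mean).
m = statistics.mean(complete)
imputed = [x if x is not None else m for x in raw]

print(complete)  # [4, 7, 5, 6]
print(imputed)   # [4, 7, 5.5, 5, 5.5, 6]
```

Mean substitution keeps the sample size but shrinks the variability, which is part of the balancing act the card describes.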

11
Q

Outliers

A
  • Are they legitimate, or are they error?
  • Do you keep them? Do you discard them?
  • Balancing act, as with so much in data analysis
12
Q

Assumptions

A
  • All parametric procedures (e.g., t-tests, ANOVAs, correlation) operate under certain assumptions
  • Main two:
    • Normality
      • Assumes that your data come from a population that is normally distributed
    • Homogeneity of variance
      • Assumes that, if your data are to be divided into groups, the level of variability in the groups will be approximately equal (i.e., not significantly different).
13
Q

Normality is tested in four ways:

A
  1. Visual inspection of histograms and stem and leaf plots
  2. Visual inspection of normality and detrended normality plots
  3. Normality tests
  4. Skewness divided by SE skewness
14
Q

Subjective Normality Tests

A
  1. Visual inspection of histograms and stem and leaf plots
  2. Visual inspection of normality and detrended normality plots
    • not influenced by sample size
15
Q

Objective Normality Tests

A
  1. Normality tests
  2. Skewness divided by SE skewness
  • Objective but influenced by sample size
  • With a large sample, even trivial deviations from normality will indicate a violation of the assumption
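Test 4 (skewness divided by the standard error of skewness) can be sketched in Python; the data are hypothetical, and the formulas are the standard adjusted skewness statistic and SE of skewness that SPSS reports:

```python
import math

# Hypothetical positively skewed sample.
data = [1, 2, 2, 3, 3, 3, 4, 5, 9, 15]
n = len(data)
mean = sum(data) / n

# Adjusted Fisher-Pearson skewness (the statistic SPSS reports).
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
g1 = m3 / m2 ** 1.5
skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)

# Standard error of skewness (SPSS formula).
se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

z = skew / se_skew
print(round(z, 2))  # about 2.82; |z| > 1.96 suggests significant skew
```

Because se_skew shrinks as n grows, the same small skew produces a larger z in a bigger sample — exactly the sample-size sensitivity the card describes.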
16
Q

The least important assumption

A
  • Normality
  • Statistical tests are very robust to even gross deviations from normality, particularly when you have a large sample
17
Q

Why is normality the LEAST important assumption?

A
  • Statistical tests are based on sampling distributions
  • Sampling distributions will be approximately normal regardless of the underlying population distribution
  • They become more normal as the sample size increases (the Central Limit Theorem)
18
Q

When is normality a problem?

A
  • When the deviation is strong and the sample is small
19
Q

Tests 3 and 4 are sensitive to sample size

A
  • Ironically, they indicate a violation of normality with a large sample
  • Even when the deviation is trivial
  • Yet normality is rarely a problem when the sample size is large
20
Q

Transformation

A
  • Applying a simple mathematical operation to the data to deal with violations of assumptions
21
Q

Common Transformations

A
  1. Log 10
  2. Square root
  3. Reciprocal
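A minimal Python sketch of the three transformations on hypothetical positively skewed scores (all values must be positive; zeroes need a constant added first, as noted below):

```python
import math

# Hypothetical positively skewed scores (all > 0).
scores = [1, 2, 4, 8, 100]

log10_t = [math.log10(x) for x in scores]  # log 10
sqrt_t = [math.sqrt(x) for x in scores]    # square root
recip_t = [1 / x for x in scores]          # reciprocal

print(log10_t)  # pulls in the long right tail most strongly
print(sqrt_t)
print(recip_t)
```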
22
Q

Homogeneity of variance assumption

A
  • More serious, particularly when the group sizes are unbalanced
  • Tested using a dedicated test: Levene’s Test
  • Dealt with by a power transformation
23
Q

Levene’s Test

A
  • Used to test if k samples have equal variances.
  • Equal variances across samples is called homogeneity of variance.
  • Some statistical tests, for example the analysis of variance, assume that variances are equal across groups or samples.
  • The Levene test can be used to verify that assumption.
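A minimal sketch of how the statistic is computed, assuming three hypothetical groups: the mean-centred version of Levene's test is a one-way ANOVA carried out on the absolute deviations of each score from its group mean.

```python
# Hypothetical data: k = 3 groups of 5 scores each.
groups = [
    [4, 5, 6, 5, 5],
    [2, 8, 3, 9, 4],
    [5, 6, 5, 6, 5],
]

k = len(groups)                  # number of groups
n = sum(len(g) for g in groups)  # total sample size

# Absolute deviations of each score from its group mean.
z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]

z_all = [v for g in z for v in g]
grand = sum(z_all) / n

# Between-groups and within-groups sums of squares on the deviations.
ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in z)
ss_within = sum((v - sum(g) / len(g)) ** 2 for g in z for v in g)

W = ((n - k) / (k - 1)) * (ss_between / ss_within)
print(round(W, 3))  # compare against F(k-1, n-k); large W -> unequal variances
```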
24
Q

Some general comments about transformations

A
  • They are no “magic bullet”
  • They can’t cope with zeroes (you need to add a constant first)
  • They are unpredictable (e.g., a transformation designed to deal with normality may end up fixing homogeneity of variance instead, and vice versa)
  • Some data are “untransformable”
25
Q

Recoding Data:

A
  • Often need to recode for a whole host of reasons:
    1. Reducing numbers of groups
    2. Reverse scoring
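Both recodes can be sketched in Python on hypothetical responses:

```python
# Reverse scoring a hypothetical 5-point Likert item:
# new = (scale max + 1) - old, so 1 <-> 5 and 2 <-> 4.
MAX_SCALE = 5
responses = [1, 4, 5, 2, 3]
reversed_scores = [(MAX_SCALE + 1) - x for x in responses]
print(reversed_scores)  # [5, 2, 1, 4, 3]

# Reducing the number of groups: recode age in years into two bands.
ages = [19, 34, 51, 23, 67]
age_group = [1 if a < 40 else 2 for a in ages]  # 1 = under 40, 2 = 40+
print(age_group)  # [1, 1, 2, 1, 2]
```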