Week 2 Flashcards
What should I do before conducting any data analysis?
- “Screen & Clean” Data
- SPSS cannot tell if data is ridiculous
- Only the researcher knows this, so I need to make sure I screen the data for errors
Six reasons for conducting exploratory data analysis
- Checking for data entry errors
- Obtaining a thorough descriptive analysis of your data.
- Examining patterns that are not otherwise obvious
- Analysing and dealing with missing data
- Checking for outliers
- Checking assumptions
EXPLORE command in SPSS
- The Explore command in SPSS covers all bases.
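A minimal sketch of the underlying syntax (the Explore dialog runs SPSS's EXAMINE command; the variable names score and group are hypothetical):

    * Descriptives, extreme values, plots, and normality tests in one pass.
    EXAMINE VARIABLES=score BY group
      /PLOT BOXPLOT STEMLEAF NPPLOT
      /STATISTICS DESCRIPTIVES EXTREME(5)
      /COMPARE GROUPS
      /MISSING LISTWISE
      /NOTOTAL.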
Who was John Tukey?
- A very practical statistician
- The language and techniques of EDA were developed by him
What does Screening and Cleaning involve?
- Computing new variables from existing ones (see the sketch after this list)
- Recoding variables
- Dealing with missing data
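For the first point, a hedged COMPUTE sketch (the item names stress1 to stress10 are hypothetical; recoding and missing data are covered in later cards):

    * Total scale score summed from ten questionnaire items.
    COMPUTE stress_total = SUM(stress1 TO stress10).
    EXECUTE.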
Checking Data Entry Errors
- Use the Frequencies command to check for data entry errors in categorical/nominal variables
- Use the outliers option in the Explore command to check for data entry errors in continuous/scale variables
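A sketch of both checks in syntax, assuming a nominal variable sex and a scale variable age (both hypothetical):

    * A frequency table exposes impossible category codes.
    FREQUENCIES VARIABLES=sex
      /ORDER=ANALYSIS.
    * The extreme-values table exposes impossible scale values.
    EXAMINE VARIABLES=age
      /PLOT NONE
      /STATISTICS EXTREME(5).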
Options for Dealing with Data Entry Errors
- Remove data
- Make ‘educated guesses’ about what was intended
Obtaining a thorough descriptive analysis of data
- The Explore command provides more descriptive statistics than any other SPSS procedure
- Multiple measures of central tendency
- Multiple measures of variability
- Quantitative measures of shape
- Confidence intervals
- Percentiles
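A sketch requesting all of these through EXAMINE (score is hypothetical; /CINTERVAL and /PERCENTILES are standard subcommands):

    EXAMINE VARIABLES=score
      /PLOT NONE
      /STATISTICS DESCRIPTIVES
      /CINTERVAL 95
      /PERCENTILES(5,10,25,50,75,90,95).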
Examining Patterns that are not otherwise obvious
- Stem and Leaf Plots
- Box and Whisker Plots
Analysing and dealing with missing data
- Do you leave it out, or do you substitute in the mean?
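If substituting the mean, one option is SPSS's Replace Missing Values command; a sketch (score is hypothetical):

    * Creates score_m, with missing cases replaced by the series mean.
    RMV /score_m=SMEAN(score).

Note that mean substitution shrinks the variable's variability, so neither option is cost-free.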
Outliers
- Are they legitimate, or are they error?
- Do you keep them? Do you discard them?
- Balancing act, as with so much in data analysis
Assumptions
- All parametric procedures (e.g., t-tests, ANOVAs, correlation) operate under certain assumptions
- Main two:
- Normality
- Assumes that your data comes from a population that is normally distributed
- Homogeneity of variance
- Assumes that, if your data is to be divided into groups, the level of variability in the groups will be approximately equal (i.e., not significantly different).
Normality is tested in four ways:
- Visual inspection of histograms and stem and leaf plots
- Visual inspection of normal and detrended normal Q-Q plots
- Normality tests
- Skewness divided by SE skewness
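One EXAMINE call produces material for all four checks (score hypothetical): HISTOGRAM and STEMLEAF feed the visual inspections, NPPLOT requests the normal and detrended normal Q-Q plots plus the Tests of Normality table (Kolmogorov-Smirnov and Shapiro-Wilk), and DESCRIPTIVES reports skewness with its standard error:

    EXAMINE VARIABLES=score
      /PLOT HISTOGRAM STEMLEAF NPPLOT
      /STATISTICS DESCRIPTIVES.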
Subjective Normality Tests
- Visual inspection of histograms and stem and leaf plots
- Visual inspection of normal and detrended normal Q-Q plots
- Not influenced by sample size
Objective Normality Tests
- Normality tests
- Skewness divided by SE skewness
- Objective but influenced by sample size
- With a large sample, even trivial deviations from normality will indicate a violation of the assumption
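A sketch of the ratio check (score hypothetical; the ±1.96 cut-off is the usual z-score convention, not from these notes):

    * Skewness and its standard error appear in the output table.
    DESCRIPTIVES VARIABLES=score
      /STATISTICS=MEAN STDDEV SKEWNESS KURTOSIS.
    * E.g., skewness .42 with SE .21 gives .42/.21 = 2.0 > 1.96: significant skew.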
The least important assumption
- Normality
- Statistical tests are very robust to even gross deviations from normality, particularly when you have a large sample
Why is normality the LEAST important assumption?
- Statistical tests are based on sampling distributions
- By the central limit theorem, sampling distributions tend towards normality regardless of the underlying population distribution
- They get more normal as the sample size increases
When is normality a problem?
- When the deviation is strong and the sample is small
Tests 3 and 4 are sensitive to sample size
- Ironically, they indicate a violation of normality with a large sample
- Even if the deviation is trivial
- Yet normality is rarely a problem when the sample size is large
Transformation
- Applying a simple mathematical operation to the data to deal with violations of assumptions
Common Transformations
- Log 10
- Square root
- Reciprocal
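Hedged COMPUTE sketches of all three (score hypothetical; constants added where zeroes would break the function):

    * Log10 transformation.
    COMPUTE score_log = LG10(score + 1).
    * Square root transformation.
    COMPUTE score_sqrt = SQRT(score).
    * Reciprocal transformation.
    COMPUTE score_recip = 1 / (score + 1).
    EXECUTE.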
Homogeneity of variance assumption
- More serious, particularly when the design is unbalanced (unequal group sizes)
- Tested using a dedicated test: Levene's test
- Dealt with by power transformation
Levene’s Test
- Used to test if k samples have equal variances.
- Equal variances across samples is called homogeneity of variance.
- Some statistical tests, for example the analysis of variance, assume that variances are equal across groups or samples.
- The Levene test can be used to verify that assumption.
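Levene's test is printed automatically in independent-samples t-test output, or can be requested via ONEWAY; a sketch with hypothetical names:

    ONEWAY score BY group
      /STATISTICS HOMOGENEITY.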
Some general comments about transformations
- They are no “magic bullet”.
- They can’t cope with zeroes (need to add a constant)
- They are unpredictable (e.g., a transformation designed to deal with normality can fix up homogeneity of variance and vice versa)
- Some data is “untransformable”
Recoding Data
- Often need to recode for a whole host of reasons:
- Reducing numbers of groups
- Reverse scoring
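Hedged RECODE sketches for both reasons (names, codes, and cut-offs all hypothetical):

    * Reducing numbers of groups: collapse age into three bands.
    RECODE age (LO THRU 35=1) (36 THRU 55=2) (56 THRU HI=3) INTO agegrp.
    * Reverse scoring negatively worded 5-point items.
    RECODE q2 q5 (1=5) (2=4) (3=3) (4=2) (5=1) INTO q2r q5r.
    EXECUTE.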