Exploratory Data Analysis Flashcards

1
Q

What is exploratory data analysis?

A

Using various tools to discover patterns in data

Not about testing or proving hypotheses

Inductive philosophy about data

2
Q

EDA according to Behrens (1997)

A

Inductive, bottom up & data driven process

1) understanding what’s going on with data
2) central use of graphic/visual representations
3) tentative model building & hypothesis generation (not hypothesis testing)
4) robust methods (less influenced by bias, outliers or the specific scales used) & subset analyses, e.g. post-hoc statistics or correlations (unplanned follow-ups)
5) skepticism & flexibility in methods
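
As a rough illustration of point 4, the sketch below compares a non-robust summary (the mean) with a robust one (the median) on a small set of made-up scores that contains one extreme value. The data and variable names are hypothetical and only serve the example.

```python
import statistics

# Hypothetical reaction times in ms; the last value is an obvious outlier.
scores = [410, 420, 430, 440, 450, 460, 2500]

mean = statistics.mean(scores)
median = statistics.median(scores)

# The same data with the outlier dropped, for comparison.
trimmed = scores[:-1]
mean_no_outlier = statistics.mean(trimmed)
median_no_outlier = statistics.median(trimmed)

print(f"mean with outlier: {mean:.1f}, without: {mean_no_outlier:.1f}")
print(f"median with outlier: {median:.1f}, without: {median_no_outlier:.1f}")
```

In this example the single outlier moves the mean by several hundred ms while the median barely changes, which is the sense in which the median is "robust".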

3
Q

Tukey: EDA, CDA & in-between

A

Researchers work in one of three mindsets: exploratory, rough confirmatory or confirmatory

In a strictly confirmatory (CDA) mindset you would never run post-hocs

4
Q

Potential problems with EDA

A

You can often find more or less anything you’re looking for in a dataset (especially if it is big/complex), so it is easy to claim you were looking for that all along

Is empiricism any use if we don’t have any underpinning theory or explanation of the data?
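
The first problem above can be sketched with a small simulation: generate variables that are pure noise, correlate every pair, and count how many pairs pass a conventional significance threshold. Everything here is an assumption made for illustration: the number of variables, the sample size, the seed, and the critical value of r (0.444, the two-tailed p < .05 value for n = 20 from standard tables).

```python
import math
import random
from itertools import combinations

random.seed(1)  # arbitrary seed so the run is repeatable

N_VARS = 20      # hypothetical number of measures in the dataset
N_PEOPLE = 20    # hypothetical number of participants
R_CRIT = 0.444   # two-tailed p < .05 critical r for n = 20 (df = 18)

# Every variable is independent random noise: no true relationships exist.
noise = [[random.gauss(0, 1) for _ in range(N_PEOPLE)] for _ in range(N_VARS)]


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    ss_a = sum((p - mean_a) ** 2 for p in a)
    ss_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(ss_a * ss_b)


pairs = list(combinations(range(N_VARS), 2))
hits = [(i, j) for i, j in pairs if abs(pearson_r(noise[i], noise[j])) > R_CRIT]

print(f"{len(hits)} of {len(pairs)} noise-only pairs look 'significant'")
```

With 190 pairs and a 5% threshold, roughly 5% of pairs would be expected to pass by chance alone, which is why unplanned findings need cautious interpretation.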

5
Q

Techniques & hallmarks of EDA

A

Representing data visually/graphically

Trying to avoid assumptions about your data

Paying attention to outliers

6
Q

How is EDA useful to us?

A

Core techniques are very good practice for looking at a dataset: checking for patterns to guide later analysis, examining distributions & qualities of variables

Easy to forget that the stats used are full of assumptions; stats are often applied blindly, and checking & visualising the data is a great way to make violated assumptions jump out at you

Useful when exploring new topics or new measures

7
Q

Visualising data for EDA: histograms & stem & leaf plots

A

Help to identify shape of distribution (skew, kurtosis, spread or variation in scores)

Help to identify unusual scores

Show you when something obvious is wrong
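
A minimal sketch of a text stem & leaf plot, assuming non-negative whole-number scores with the tens digit as the stem and the units digit as the leaf. The function name `stem_and_leaf` and the scores are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem & leaf rows for non-negative integers (stem = tens, leaf = units)."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    rows = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}")
    return rows


# Hypothetical test scores; 98 is the kind of unusual value worth checking.
scores = [12, 15, 21, 23, 23, 24, 27, 31, 32, 35, 38, 41, 44, 98]
for row in stem_and_leaf(scores):
    print(row)
```

The printed rows show the bulk of scores between the 10s and 40s, a run of empty stems, and a lone score in the 90s, which could be a genuine outlier or a data-entry error.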

8
Q

Visualising data for EDA: box & whisker plots

A

Box shows IQR & whiskers show full range of data (besides outliers)

Robust methods (quartiles/median)
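
The numbers behind a box & whisker plot can be sketched as below. Two assumptions are made that the card does not state: outliers are defined by the common 1.5 × IQR rule, and quartiles use the "inclusive" interpolation method (statistics packages differ on this, so values may not match other software exactly). The scores are made up.

```python
import statistics

# Hypothetical scores; 30 is far from the rest.
scores = [2, 4, 5, 5, 6, 7, 7, 8, 9, 30]

# Quartiles: the box runs from q1 to q3, with a line at the median.
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Assumed outlier rule: beyond 1.5 * IQR from the edges of the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

# Whiskers reach the most extreme scores that are not outliers.
inside = [x for x in scores if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"box: {q1} to {q3}, median {median}, IQR {iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

Because the box and median depend only on the ordering of the middle scores, the extreme value changes where the outlier is drawn but not the box itself.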

9
Q

Visualising data for EDA: error bar charts

A

Used to compare groups or samples, bar usually shows mean score

Error bar displays spread of the scores or precision of the mean, using one of:

1) standard deviation (rare, most sensible when exploring distribution of dataset rather than comparisons)
2) confidence interval (often used as it lines up well with significance testing; non-overlapping 95% CIs suggest a significant difference, but overlapping bars are only a rough guide and do not guarantee there is no difference)
3) standard error of the mean (used a lot because it gives the smallest error bars, which makes the error look smaller & a significant difference look more likely)
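
To see how the three choices differ in size, here is a small sketch computing each for one made-up group of nine scores. The 95% CI uses the t critical value for df = 8 (2.306, taken from standard t tables); the data and names are illustrative only.

```python
import math
import statistics

# Hypothetical scores for one group (n = 9).
scores = [4, 5, 6, 7, 8, 9, 10, 11, 12]

n = len(scores)
mean = statistics.mean(scores)
sd = statistics.stdev(scores)      # sample standard deviation
sem = sd / math.sqrt(n)            # standard error of the mean

# Half-width of the 95% confidence interval for the mean.
T_CRIT_DF8 = 2.306                 # two-tailed .05 critical t for df = 8
ci_half_width = T_CRIT_DF8 * sem

print(f"mean = {mean}")
print(f"SD bar:     +/- {sd:.2f}")
print(f"95% CI bar: +/- {ci_half_width:.2f}")
print(f"SEM bar:    +/- {sem:.2f}")
```

In this example the SEM bar is the shortest and the SD bar the longest, so the same data can look more or less "precise" depending on which bar is drawn. A chart should always state which one it uses.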

10
Q

Visualising data for EDA: scatterplots

A

Plot 2 or more variables against each other
More variables= more dimensions

Lets you see potential relationships between variables (potential correlations & whether data fulfil assumptions for linear analyses)

Simple scatterplot= 1 group of participants
Grouped scatterplot= different groups in data (one colour per group)
3D grouped scatterplot= 3 continuous variables & at least 2 groups
Matrix scatterplot= grid of scatterplots looking at paired relationships of multiple variables in dataset, useful when going in blind on new set of measures
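
A numeric companion to the matrix scatterplot is a table of pairwise correlations, sketched below on made-up data. It also shows why the plots matter: a perfectly systematic but curved relationship can give a Pearson correlation of zero, which only a scatterplot would reveal. The variable names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    ss_a = sum((p - mean_a) ** 2 for p in a)
    ss_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(ss_a * ss_b)


x = [-3, -2, -1, 0, 1, 2, 3]
data = {
    "x": x,
    "linear": [2 * v + 1 for v in x],   # straight-line function of x
    "curved": [v ** 2 for v in x],      # U-shaped function of x
}

# One correlation per pair of variables, as in a matrix scatterplot grid.
pairwise = {
    (a, b): pearson_r(data[a], data[b]) for a, b in combinations(data, 2)
}

for (a, b), r in pairwise.items():
    print(f"r({a}, {b}) = {r:.2f}")
```

Here "curved" is completely determined by "x", yet their linear correlation is zero, so a linear analysis would miss the relationship that a scatterplot makes obvious.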

11
Q

Presenting data clearly

A

Extremely important for conveying ideas outwardly
