Lec 19 and 20 Flashcards

1
Q

plenty of time, context known, looks unimportant?

A

exploring data

2
Q

can afford detail but looks vital?

A

results figure in a paper

3
Q

must appeal from afar but have close-up detail?

A

poster at conference

4
Q

must sell the idea in seconds?

A

presentation or talk at conference

5
Q

counts in categories?

A

a table or a bar chart (barplot() in R); these show frequencies more clearly than a pie chart
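A minimal Python sketch of the table-then-bar-chart idea, using hypothetical habitat data: the observations are counted into a frequency table, and each text bar's length is proportional to its count, so categories are compared by length rather than by angle. The function names are illustrative only.

```python
from collections import Counter


def frequency_table(observations):
    """Count how many observations fall in each category."""
    return dict(Counter(observations))


def text_barchart(counts, width=20):
    """Draw one text bar per category, with length proportional to its count."""
    largest = max(counts.values())
    lines = []
    for category, n in sorted(counts.items()):
        bar = "#" * round(width * n / largest)
        lines.append(f"{category:<8}{bar} {n}")
    return "\n".join(lines)


# Hypothetical observations of one categorical variable
habitats = ["wood", "grass", "wood", "marsh", "wood", "grass"]
counts = frequency_table(habitats)
print(counts)
print(text_barchart(counts))
```

In R, table() followed by barplot() gives the graphical equivalent.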

6
Q

why not use a pie chart?

A

Pie charts are usually a poor way of representing frequency (count) data because the human eye is far better at comparing lengths than angles

7
Q

Does a graph in a paper need colour and a title?

A

no

8
Q

stacking groups in a bar chart: helpful or not?

A

This might be useful if you are particularly interested in the sums across the primary x-axis categories, but it isn’t easy to compare the breakdown by the second categorical variable. For this, it is easier to have unstacked bars, side by side.

9
Q

when to use colour?

A

for talk or poster

10
Q

what is a mosaic plot for?

A

Mosaic plot for exploration of complex contingency tables

11
Q

how many colours does the default R palette have?

A

8

12
Q

how to overcome overlapping points?

A

use open symbols and jitter
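Jittering adds a small amount of random noise to each position so that tied values no longer sit on top of each other. A minimal Python sketch, assuming uniform noise of a chosen `amount` (the default 0.1 here is arbitrary); R's jitter() serves the same purpose.

```python
import random


def jitter(values, amount=0.1, seed=None):
    """Return a copy of `values` with uniform noise in [-amount, amount] added.

    Tied values (e.g. repeated integer scores) end up slightly apart, so
    overlapping points become visible when plotted with open symbols.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


scores = [1, 1, 1, 2, 2, 3]  # hypothetical tied scores
print(jitter(scores, seed=42))
```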

13
Q

What charts are used to represent the distribution in different groups?

A

stripchart and histogram

14
Q

when to represent the analysis and when to represent the data themselves?

A

analysis for a talk/poster
data themselves for a paper (although many articles present analysis summaries, it is best to show both so the reader can decide whether the right analysis has been used)

15
Q

repeated measures: what to plot and what not to plot

A

Don't plot the simple treatment medians (or means), as this conceals the paired design. Instead, plot the differences between treatments (but this is only really viable for two treatments).
For two or more treatments, remove subject differences first.
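Both fixes can be sketched in a few lines of Python. The data are hypothetical, and adding the grand mean back after removing each subject's mean is one common convention (an assumption here), chosen so the values stay on the original scale.

```python
from statistics import mean


def paired_differences(treatment_a, treatment_b):
    """Per-subject difference between two treatments (B minus A)."""
    return [b - a for a, b in zip(treatment_a, treatment_b)]


def remove_subject_effects(data):
    """Subtract each subject's own mean, then add back the grand mean.

    `data` maps subject -> responses, one per treatment, in the same
    treatment order for every subject. What remains is the treatment
    pattern with between-subject differences taken out.
    """
    grand = mean(v for responses in data.values() for v in responses)
    result = {}
    for subject, responses in data.items():
        subject_mean = mean(responses)
        result[subject] = [v - subject_mean + grand for v in responses]
    return result


# Hypothetical repeated-measures data: two subjects, three treatments
data = {"s1": [10, 12, 14], "s2": [20, 22, 24]}
print(paired_differences([5, 7, 6], [6, 9, 6]))  # [1, 2, 0]
print(remove_subject_effects(data))
```

After removing subject effects, both subjects show the same rising pattern across treatments, which the raw treatment means would have blurred.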

16
Q

plotting correlation or regression

A

plot y against x (y ~ x), as this matches the analysis

17
Q

regression line?

A

Ordinary Least Squares regression

When you are predicting y from x

18
Q

reduced major axis reg line?

A

only when no prediction is involved and no causation is implied

for correlation or allometry

19
Q

relationship between many variables

A

pairs() uses the whole dataset, plotting a scatterplot matrix of every pair of variables

20
Q

shingling

A

break down the second continuous variable into categories. By default, the ‘shingles’ overlap by 50%
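A simplified Python sketch of shingling, assuming equal-count shingles that each share a fraction `overlap` of their points with the next one (0.5 here, matching the 50% default above). The function name and the exact boundary rule are illustrative; R's own shingle functions may place the boundaries slightly differently.

```python
def shingles(values, number=3, overlap=0.5):
    """Split a continuous variable into `number` overlapping, equal-count ranges.

    Returns a list of (low, high) ranges. With overlap=0.5, each shingle
    shares half of its points with its neighbour.
    """
    data = sorted(values)
    n = len(data)
    width = n / (1 + (number - 1) * (1 - overlap))  # points per shingle
    step = width * (1 - overlap)  # how far each shingle starts after the last
    ranges = []
    for i in range(number):
        chunk = data[round(i * step): round(i * step + width)]
        ranges.append((chunk[0], chunk[-1]))
    return ranges


print(shingles(range(1, 13)))  # [(1, 6), (4, 9), (7, 12)]
```

A conditioning plot then draws y against x separately for the points falling in each range.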

21
Q

how to make a 3D plot easier to read?

A

add drop lines, or add the plane describing the regression of z on x and y
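The plane of z on x and y is an ordinary multiple regression, z = b0 + b1·x + b2·y. A minimal standard-library Python sketch that solves the least-squares normal equations directly (fine for three coefficients; a real analysis would use R's lm() or a numerical library). The points are hypothetical and chosen to lie exactly on a known plane.

```python
def solve3(a, b):
    """Solve a 3x3 linear system a @ coeffs = b by Gaussian elimination."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(3):
        # Partial pivoting: bring the largest remaining entry into place
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        known = sum(m[r][c] * coeffs[c] for c in range(r + 1, 3))
        coeffs[r] = (m[r][3] - known) / m[r][r]
    return coeffs


def fit_plane(x, y, z):
    """Least-squares plane z = b0 + b1*x + b2*y via the normal equations."""
    rows = [(1.0, xi, yi) for xi, yi in zip(x, y)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(3)]
    return solve3(xtx, xtz)


# Hypothetical points lying exactly on z = 1 + 2x + 3y
x = [0, 1, 0, 1, 2]
y = [0, 0, 1, 1, 3]
z = [1 + 2 * xi + 3 * yi for xi, yi in zip(x, y)]
print(fit_plane(x, y, z))  # approximately [1.0, 2.0, 3.0]
```

Drawing the fitted plane in the 3D plot lets the eye judge each point's height against it; drop lines are typically drawn vertically from each point to the base of the plot.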

22
Q

How is analysis represented? and how is data represented?

A

analysis represented using means and confidence intervals (CI)

data represented using boxplots
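A minimal Python sketch of the "analysis" summary: the mean with an approximate 95% confidence interval. As an assumption, it uses the normal critical value (about 1.96) for simplicity; for small samples a t-based interval, which is wider, is the usual choice. The sample is hypothetical. A boxplot, by contrast, draws the data's own quartiles and range.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Mean with a normal-approximation confidence interval: mean +/- z * SE."""
    m = mean(values)
    se = stdev(values) / math.sqrt(len(values))  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return m - z * se, m, m + z * se


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
low, centre, high = mean_ci(sample)
print(f"mean {centre:.2f}, 95% CI {low:.2f} to {high:.2f}")
```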

23
Q

what is data mining

A

True exploration when you don’t know what patterns might exist
May have a large number of candidate predictors and no strong theory to predict which should be more important
And as the number of variables goes up, the number of possible interactions goes up geometrically
** Before ‘data mining’ check for missing values and outliers and ‘clean up’ the data
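A minimal Python sketch of two of the points above, with hypothetical data. The screen flags missing values and outliers before mining, using the 1.5 × IQR rule that boxplot whiskers use (one common convention among several). The second function counts how fast possible interactions grow: with p variables there are 2^p - p - 1 possible interaction terms of all orders (every subset of two or more variables).

```python
from statistics import quantiles


def screen(values):
    """Return (indices of missing values, indices of outliers by the 1.5*IQR rule)."""
    missing = [i for i, v in enumerate(values) if v is None]
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    fence = 1.5 * (q3 - q1)
    low, high = q1 - fence, q3 + fence
    outliers = [i for i, v in enumerate(values)
                if v is not None and not low <= v <= high]
    return missing, outliers


def n_interactions(p):
    """Number of possible interaction terms (two-way and higher) among p variables."""
    return 2 ** p - p - 1


lengths = [5.1, 4.8, None, 5.3, 5.0, 12.0, 4.9]  # hypothetical: one gap, one odd value
print(screen(lengths))  # ([2], [5])
print([n_interactions(p) for p in (2, 5, 10)])  # [1, 26, 1013]
```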

24
Q

what does vertical spacing in CART represent

A

variance explained
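How a regression tree picks a split can be sketched in Python: try every threshold on a continuous predictor, splitting it into 'low' and 'high', and keep the one that most reduces the within-group sum of squares of the response. The fraction of the total sum of squares removed is the variance explained by that split. This is a minimal sketch for one predictor and one split with hypothetical data; classification trees use an impurity measure (such as the Gini index) instead of the sum of squares.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, fraction of total sum of squares explained) for the best split on x."""
    pairs = sorted(zip(x, y))
    total = sum_sq([yi for _, yi in pairs])
    best = (None, 0.0)
    if total == 0:
        return best  # nothing left to explain
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between tied x values
        low = [yi for _, yi in pairs[:i]]
        high = [yi for _, yi in pairs[i:]]
        explained = (total - sum_sq(low) - sum_sq(high)) / total
        if explained > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, explained)
    return best


x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 5.1, 4.8, 5.0]  # hypothetical response
print(best_split(x, y))  # threshold 3.5
```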

25
Q

when pruning a tree in CART, does error go up or down with the number of predictors?

A

error goes down as the number of predictors increases

26
Q

what is Cross-validation?

A

For any classification method, your ‘best model’ will always perform well on the sample you developed the model for
So a risk of over-fitting
So, build model using one set of data and test it on another (=cross-validation)
partitioning the original sample into a training set to train the model, and a test set to evaluate it

27
Q

Pros and cons of CART?

A

It’s crude: doesn’t make use of continuous information; simply splits such variables as ‘high’ or ‘low’
But it’s robust: variables can have any distribution
And powerful: examines all interactions
And gives results in an easy-to-use form: a decision tree

28
Q

two classes of clusters - definitions

A

‘Supervised’ learning: know the true identity of some clusters and use these to develop a predictive model for data where you don’t know group membership

‘Unsupervised’ learning: don’t know what’s ‘right’ or ‘wrong’, so try to find natural clustering patterns in the data (which can then be used in future prediction)

29
Q

examples of supervised and unsupervised

A

supervised:
Discriminant function analysis
Logistic regression (two groups)
Multinomial logistic regression (>2 groups and no order)
Support Vector Machines
Neural networks
Genetic algorithms

unsupervised:
k-means clustering: specify the number of clusters to find; the method allocates data to clusters to minimise the within-cluster sums of squares
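A minimal Python sketch of k-means using Lloyd's algorithm, with hypothetical 2-D points: pick k starting centres, assign every point to its nearest centre, move each centre to the mean of its points, and repeat until nothing changes. Neither step can increase the within-cluster sum of squares. R's kmeans() defaults to a different algorithm (Hartigan-Wong) aimed at the same criterion, and results can depend on the starting centres.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: returns (centres, clusters) for k clusters."""
    centres = random.Random(seed).sample(points, k)  # k distinct starting points
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its members
        new_centres = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centres[i]
            for i, members in enumerate(clusters)
        ]
        if new_centres == centres:
            break  # converged: assignments will not change again
        centres = new_centres
    return centres, clusters


def within_ss(centres, clusters):
    """Total within-cluster sum of squares, the quantity k-means minimises."""
    return sum(math.dist(p, c) ** 2 for c, members in zip(centres, clusters) for p in members)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # hypothetical
centres, clusters = kmeans(points, k=2)
print(clusters)
print(within_ss(centres, clusters))
```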

30
Q

when to use CART vs clustering?

A

Use robust methods like CART to explore predictive relationships
Use clustering to expose unexpected ‘structure’ (groupings) in your data

31
Q

what is multinomial logistic regression

A

logistic regression for an outcome with more than two groups that have no natural order
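How a fitted multinomial logistic regression turns a predictor into group probabilities can be sketched in Python: one group is the baseline (score 0), every other group gets its own linear score, and the scores are converted with the softmax function so the probabilities sum to 1. The three diet groups and their coefficients are hypothetical; fitting the coefficients (e.g. with multinom() from R's nnet package) is not shown.

```python
import math


def group_probabilities(scores):
    """Softmax: convert one linear score per group into probabilities summing to 1."""
    top = max(scores.values())  # subtracting the max avoids overflow in exp()
    weights = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Hypothetical fitted (intercept, slope) per group; "insects" is the baseline
coefficients = {"insects": (0.0, 0.0), "seeds": (1.0, -0.5), "fruit": (-2.0, 0.8)}
x = 3.0
scores = {g: b0 + b1 * x for g, (b0, b1) in coefficients.items()}
probs = group_probabilities(scores)
print(probs)
```

The log of each group's probability relative to the baseline equals its linear score, which is the model's defining assumption.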

32
Q

plotting raw data?

A

stripchart or dotplot

33
Q

a histogram represents...

A

partially summarised data

34
Q

plotting continuous data

A

stripchart, scatterplot, histogram (to explore whole distributions)

35
Q

notched boxplot

A

If the notches of 2 boxes don’t overlap, medians are likely to differ
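The notch is an approximate confidence interval for the median, with half-width 1.58 × IQR / √n. A minimal Python sketch, assuming the data's quartiles for the IQR (R's boxplot() uses Tukey's hinges, which can differ slightly), with hypothetical samples:

```python
import math
from statistics import quantiles


def notch(values):
    """Approximate notch around the median: median +/- 1.58 * IQR / sqrt(n)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    half_width = 1.58 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


def notches_overlap(a, b):
    """True if the notches of two samples overlap (medians not clearly different)."""
    low_a, high_a = notch(a)
    low_b, high_b = notch(b)
    return low_a <= high_b and low_b <= high_a


group_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical samples
group_b = [v + 10 for v in group_a]
print(notch(group_a))
print(notches_overlap(group_a, group_b))  # False: medians likely differ
```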

36
Q

how much of the data is represented in a boxplot?

and what is the interquartile range?

A

The box of a boxplot starts at the 25th percentile and ends at the 75th percentile, so it contains the middle 50% of the data.
The interquartile range is the difference between the 25th and 75th percentiles, which are also referred to as the lower and upper quartiles.
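A quick Python check of both answers on hypothetical data, using quartiles with linear interpolation (`method="inclusive"`, which should correspond to R's default quantile() method): the IQR is Q3 - Q1, and the box holds about half the points (exactly how many depends on n and ties).

```python
from statistics import quantiles


def box_summary(values):
    """Quartiles, interquartile range and the share of points inside the box."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    inside = sum(q1 <= v <= q3 for v in values) / len(values)
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1, "share_in_box": inside}


print(box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```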

37
Q

ellipse package

A

Simplify correlations to ellipses with hue & intensity indicating direction & strength

38
Q

Which test is this? The slopes are assumed to be the same (parallel) for each level of the categorical factor.

A

ANCOVA: can be represented in separate panels, or in the same plot with separate regression lines
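Under the parallel-slopes assumption, ANCOVA fits one shared slope and a separate intercept for each level of the factor. A minimal Python sketch of the least-squares estimates, with hypothetical groups: the common slope pools the within-group sums of products and squares, and each intercept makes its group's line pass through that group's means. In R this corresponds to a model such as lm(y ~ x + group).

```python
from statistics import mean


def parallel_lines(groups):
    """Common-slope fit y = a_g + b * x; `groups` maps name -> (xs, ys)."""
    sxy = sxx = 0.0
    for xs, ys in groups.values():
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # within-group products
        sxx += sum((x - mx) ** 2 for x in xs)  # within-group squares
    slope = sxy / sxx
    intercepts = {g: mean(ys) - slope * mean(xs) for g, (xs, ys) in groups.items()}
    return slope, intercepts


groups = {  # hypothetical data: both groups rise by 2 per unit of x
    "A": ([1, 2, 3], [3, 5, 7]),
    "B": ([2, 3, 4], [10, 12, 14]),
}
print(parallel_lines(groups))  # (2.0, {'A': 1.0, 'B': 6.0})
```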
USE SHINGLING to help visualisation

39
Q

what graph summarises the data?

A

boxplot

40
Q

what is inferential statistics?

A

testing hypotheses: making inferences about populations using data drawn from those populations

41
Q

what is a conditioning plot and when is it used?

A

for investigating the relationship between multiple variables: a plot of y against x conditioned upon (or broken down by) a third variable (or more, if you have them).
If the third variable is categorical (a factor with discrete levels/groups), you get a plot of y against x for each level of the third variable, each in a separate panel.
If the third variable is continuous, it is broken into similarly sized, somewhat overlapping groups, and you get a plot of y against x for each of those.

42
Q

what is k-means clustering?

A

allocates data to clusters to minimise within-cluster sum of squares

43
Q

how does hierarchical or agglomerative clustering work?

A

need a measure of distance between points; join the closest two points; when two or more are joined, they are treated as one item with a single ‘location’ … once all are joined up, draw the pattern of linkage as a tree
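A minimal Python sketch of the procedure with hypothetical 2-D points, using Euclidean distance and representing each merged group by its centroid (centroid linkage, an assumption; R's hclust() offers several linkage rules and defaults to complete linkage). The returned merge history, with the distance at each join, is what the linkage tree draws.

```python
import math


def agglomerate(points):
    """Merge the closest pair of clusters until one remains; return the merge history."""
    clusters = [([p], p) for p in points]  # (members, single 'location')
    history = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: math.dist(clusters[ij[0]][1], clusters[ij[1]][1]))
        (members_i, loc_i), (members_j, loc_j) = clusters[i], clusters[j]
        history.append((members_i, members_j, math.dist(loc_i, loc_j)))
        merged = members_i + members_j
        centroid = tuple(sum(axis) / len(merged) for axis in zip(*merged))
        # Replace the two clusters by the merged one, located at its centroid
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((merged, centroid))
    return history


points = [(0, 0), (0, 1), (5, 5), (5, 7)]  # hypothetical
for left, right, distance in agglomerate(points):
    print(left, "+", right, "at distance", round(distance, 3))
```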

44
Q

when to use CART vs hierarchical/agglomerative clustering?

A

CART for exploring predictive relationships; agglomerative/hierarchical clustering to expose unexpected structures.