Data Analysis Flashcards

1
Q

what is effect size

A

the magnitude of the difference in outcome associated with a determinant (exposure)

2
Q

what are the measures of average

A

mean - sum of the measurements divided by the number of measurements
median - middle value when the measurements are put in order
mode - the most common value
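
A minimal sketch of the three averages using Python's standard `statistics` module. The values in `heart_rates` are invented purely for illustration.

```python
import statistics

# Hypothetical resting heart rates (beats per minute), invented for this example.
heart_rates = [60, 62, 62, 65, 70, 74, 90]

mean_hr = statistics.mean(heart_rates)      # sum of values / number of values
median_hr = statistics.median(heart_rates)  # middle value of the sorted data
mode_hr = statistics.mode(heart_rates)      # most frequent value

print(mean_hr, median_hr, mode_hr)
```

The single high value (90) pulls the mean above the median, which is one reason the median is often preferred for skewed data.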

3
Q

what are the measures of spread

A

range - difference between the largest and smallest values
standard deviation - typical spread of values around the mean
IQR (interquartile range) - spread of the middle 50% of values around the median (25th to 75th percentile)
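
A short sketch of the three measures of spread, again with the standard `statistics` module. The data in `values` are hypothetical and include one deliberately extreme value.

```python
import statistics

# Hypothetical measurements with one extreme value (30), invented for this example.
values = [2, 4, 4, 5, 7, 9, 11, 30]

value_range = max(values) - min(values)          # largest minus smallest
sd = statistics.stdev(values)                    # sample standard deviation
q1, q2, q3 = statistics.quantiles(values, n=4)   # the three quartile cut points
iqr = q3 - q1                                    # width of the middle 50%

print(value_range, round(sd, 2), iqr)
```

Here the range is driven entirely by the extreme value, while the IQR ignores it, which is why the IQR pairs naturally with the median.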

4
Q

what are the directions of skew

A

positive skew - the long tail is to the right

negative skew - the long tail is to the left
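
As a rough illustration (a heuristic, not a formal test of skewness), comparing the mean with the median hints at the direction of skew. The `stay_days` values are made up.

```python
import statistics

# Hypothetical lengths of hospital stay in days: mostly short, a few very long.
stay_days = [1, 1, 2, 2, 2, 3, 3, 4, 10, 22]

mean_stay = statistics.mean(stay_days)
median_stay = statistics.median(stay_days)

# A long right tail pulls the mean above the median.
if mean_stay > median_stay:
    direction = "positive (right) skew"
elif mean_stay < median_stay:
    direction = "negative (left) skew"
else:
    direction = "no obvious skew"

print(mean_stay, median_stay, direction)
```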

5
Q

what summary measures do you report for normally distributed data

A

report mean and standard deviation

6
Q

what summary measures do you report for skewed data (discrete)

A

report median and interquartile range

7
Q

what is the relationship between mean and median in normally distributed data

A

roughly the same

8
Q

what values are dependent on distribution

A

mode and range

9
Q

what are the two ways of displaying data

A

skewed data - box plot - median, IQR and range

two continuous variables - scatterplot

10
Q

how do you measure associations between categorical variables

A

use risk, risk ratio, odds ratio
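
A small worked sketch of these three measures from a 2x2 table. The counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical 2x2 table (counts invented for illustration):
#               disease   no disease
# exposed          30         70
# unexposed        10         90
exposed_cases, exposed_noncases = 30, 70
unexposed_cases, unexposed_noncases = 10, 90

# Risk = cases / everyone in that exposure group.
risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)

# Risk ratio compares the two risks; odds ratio compares the two odds.
risk_ratio = risk_exposed / risk_unexposed
odds_ratio = (exposed_cases / exposed_noncases) / (unexposed_cases / unexposed_noncases)

print(round(risk_exposed, 2), round(risk_unexposed, 2))
print(round(risk_ratio, 2), round(odds_ratio, 2))
```

A ratio of 1 means no association; values above 1 mean the outcome is more common in the exposed group.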

11
Q

what test would you use for a continuous outcome and a categorical exposure

A

t test or non-parametric equivalent
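
To show what a t test compares, here is a sketch that computes a two-sample t statistic by hand. It assumes the unequal-variance (Welch) form, which the card does not specify, and the two groups are invented data. Turning the statistic into a p value needs the t distribution, which in practice would usually come from a statistics package rather than this sketch.

```python
import math
import statistics

# Hypothetical continuous outcome measured in two exposure groups.
treated = [5.1, 4.9, 5.6, 5.8, 6.0]
control = [4.2, 4.5, 4.1, 4.8, 4.4]

def welch_t(group_a, group_b):
    """Difference in means divided by its standard error (Welch form, an assumption)."""
    var_a = statistics.variance(group_a)  # sample variances
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

t_stat = welch_t(treated, control)
print(round(t_stat, 2))
```

The larger the statistic in absolute value, the bigger the difference in means relative to the noise in the data.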

12
Q

what measure of association would you use between continuous variables

A

correlation or linear regression

13
Q

what is the difference between correlation and linear regression

A

correlation - strength of association between two variables

linear regression - effect of one variable on the other

14
Q

what is the definition of correlation and the two types

A

measure of linear association between two continuous variables (r)
Pearson’s - both variables normally distributed
Spearman’s - either or both variables skewed

-1 = perfect negative linear relationship
0 = no linear relationship
+1 = perfect positive linear relationship
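
A minimal sketch of both coefficients in plain Python. Spearman's is computed here as Pearson's r applied to the ranks, and the helper `ranks` assumes there are no tied values to keep the example short. All function names and data are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance scaled by the spread of each variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 to n (assumes no ties, to keep the sketch simple)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y_linear = [2, 4, 6, 8, 10]      # perfect straight line
y_curved = [1, 8, 27, 64, 125]   # always increasing, but not a straight line

print(round(pearson_r(x, y_linear), 3))
print(round(pearson_r(x, y_curved), 3), round(spearman_rho(x, y_curved), 3))
```

For the curved data Pearson's r falls below 1 because the relationship is not a straight line, while Spearman's stays at 1 because the ordering is perfectly preserved.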
15
Q

what are the pros and cons of correlation

A

pros:
simple method of association
order of the variables doesn’t matter
cons:
calculated between two variables only
assesses straight-line association only
cannot describe an exposure/outcome relationship or make predictions

16
Q

what is Anscombe’s quartet

A

four datasets with completely different relationships that all give almost the same r value - a limitation of correlation, as r alone can’t describe the shape of the relationship
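
The point can be checked numerically. The sketch below uses the four published Anscombe datasets and a hand-written `pearson_r` helper (an illustrative name); all four correlations come out at roughly 0.82 even though plots of the datasets look nothing alike.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Datasets I to III share the same x values; dataset IV has its own.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

r_values = {name: pearson_r(x, y) for name, (x, y) in quartet.items()}
for name, r in r_values.items():
    print(name, round(r, 2))
```

This is why a scatterplot should always be looked at alongside the correlation coefficient.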

17
Q

describe the uses of linear regression

A

models the relationship between two or more variables

can describe an exposure/outcome relationship

18
Q

what does the R² value of linear regression mean

A

the proportion of the variability in the health outcome that is explained by the exposure variable - ranges from 0 to 1, the higher the better the fit

19
Q

what are the axes of linear regression

A

outcome on y

exposure on x

20
Q

in data interpretation what is the difference between correlation and regression

A

correlation - tells us whether the two variables change together, on a scale from -1 to 1

regression - quantifies how much the outcome changes per unit change in exposure - a gradient

21
Q

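what would a coefficient of 0.32 mean in regression

A

means that for every 1-unit increase in exposure the outcome increases by 0.32 units on average (e.g. if both are measured in %, a 1 percentage point increase in exposure goes with a 0.32 percentage point increase in outcome)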
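
A sketch of where such a coefficient comes from: an ordinary least-squares fit of a straight line. The function `fit_line` is an illustrative helper, and the data are constructed so that the gradient is exactly 0.32.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for outcome y against exposure x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data built as outcome = 5 + 0.32 * exposure.
exposure = [0, 10, 20, 30, 40]
outcome = [5.0, 8.2, 11.4, 14.6, 17.8]

slope, intercept = fit_line(exposure, outcome)

# Predicted outcome at two exposures that are one unit apart.
predicted_at_20 = intercept + slope * 20
predicted_at_21 = intercept + slope * 21

print(round(slope, 2), round(intercept, 2))
print(round(predicted_at_21 - predicted_at_20, 2))
```

The difference between the two predictions equals the slope, which is exactly what "0.32 per unit of exposure" means.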

22
Q

how do you interpret the R² value

A

the proportion of the outcome variance that the model explains
ranges from 0 to 1 - the larger the better
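
A sketch of how R² can be calculated for a straight-line fit, as 1 minus the ratio of unexplained to total variation. `r_squared` is an illustrative helper and both datasets are invented.

```python
def r_squared(x, y):
    """Proportion of the variance in y explained by a least-squares line on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    # Unexplained variation: squared distances from the fitted line.
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Total variation: squared distances from the mean outcome.
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_residual / ss_total

x = [1, 2, 3, 4, 5]
y_tight = [2.1, 3.9, 6.2, 7.8, 10.1]   # points close to a straight line
y_noisy = [5, 1, 6, 2, 7]              # points scattered widely

r2_tight = r_squared(x, y_tight)
r2_noisy = r_squared(x, y_noisy)
print(round(r2_tight, 3), round(r2_noisy, 3))
```

The tightly clustered data give an R² close to 1, while the scattered data give a value close to 0.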

23
Q

what are point estimates in data interpretation

A

single best-estimate values calculated from the sample, e.g. a mean, a regression coefficient, or the predicted outcome for one particular patient - usually reported with a confidence interval to show their uncertainty

24
Q

what is the difference between confidence level and interval width

A

confidence level - how often intervals built this way contain the true value (e.g. 95%)
interval width - the distance between the boundaries within which the truth lies at this level

a higher confidence level needs a wider interval, i.e. with a wider interval you can be more certain that the true value lies within it
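
A sketch of the trade-off between level and width, using a normal-approximation confidence interval for a mean. That approximation is an assumption made to keep the example short (a small sample would normally use the t distribution), and the sample values are invented.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, level):
    """Normal-approximation interval for the mean (an assumption of this sketch)."""
    centre = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return centre - z * standard_error, centre + z * standard_error

# Hypothetical systolic blood pressure readings (mmHg).
readings = [118, 122, 125, 130, 121, 127, 135, 119, 124, 129]

low_95, high_95 = mean_confidence_interval(readings, 0.95)
low_99, high_99 = mean_confidence_interval(readings, 0.99)

width_95 = high_95 - low_95
width_99 = high_99 - low_99
print(round(width_95, 2), round(width_99, 2))
```

Both intervals are centred on the same point estimate; only the width changes with the confidence level.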

25
Q

what is the p value definition

A

the probability of observing a coefficient at least as extreme as yours, assuming the true coefficient is actually zero
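
One way to make this definition concrete is a permutation test, an illustrative technique chosen here rather than one named in these cards. Shuffling the outcome breaks any real link with the exposure, so the shuffled slopes show what "coefficient is actually zero" looks like. All names and data below are hypothetical.

```python
import random

def slope(x, y):
    """Least-squares gradient of y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Share of shuffled datasets whose slope is at least as big as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(slope(x, y))
    shuffled = list(y)
    at_least_as_big = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(slope(x, shuffled)) >= observed:
            at_least_as_big += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (at_least_as_big + 1) / (n_permutations + 1)

x = list(range(1, 11))
y_related = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 8.2, 8.8, 10.1]
y_unrelated = [5, 3, 8, 1, 9, 2, 7, 4, 6, 5]

p_related = permutation_p_value(x, y_related)
p_unrelated = permutation_p_value(x, y_unrelated)
print(p_related, p_unrelated)
```

A strong relationship is almost never matched by shuffled data, giving a small p value; a weak one is matched often, giving a large p value.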

26
Q

describe a small vs large p value

A

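small - the data are unlikely under the zero assumption, so it is probably wrong - an effect is likely

large - the data are compatible with the zero assumption - little evidence of an effect (this does not prove there is no effect)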

27
Q

what would a p value below 0.05 mean

A

statistically significant at the conventional 5% level