data analysis Flashcards

1
Q

when is linear regression used?

A

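to go beyond correlation when measuring associations between continuous exposures and outcomes: it estimates how much the outcome is expected to change for each one-unit change in the exposure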

2
Q

how can you get a more representative sample?

A

collect more data (a larger sample), ideally sampled at random from the population

3
Q

what does statistics allow for?

A

it allows us to take in all of the data and summarise it in a way that is understandable and useful

4
Q

what are the two main properties of data that we want to capture through statistics?

A

for quantitative data, where the values sit in numerical space; for categorical data, which categories are more or less common. Together these describe what the values look like and help us understand the relationships between variables

5
Q

what does the choice of analysis depend on?

A

how the data is recorded, how the data is distributed, and the research question (does the analysis answer what it is meant to?)

6
Q

how is categorical data usually recorded?

A

as text or labels

7
Q

what is ordinal data?

A

categorical data whose categories have a natural order or ranking (e.g. mild, moderate, severe)

8
Q

how can you present categorical data?

A

counts, percentages, tables and graphs
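As a rough illustration of counts and percentages, here is a minimal Python sketch (the course uses STATA; Python is used here only for illustration). The smoking-status variable and its values are hypothetical.

```python
from collections import Counter

# Hypothetical categorical variable: smoking status of 8 participants
smoking = ["never", "current", "former", "never", "never", "former", "current", "never"]

counts = Counter(smoking)  # count of each category
total = len(smoking)

# Frequency table: category, count and percentage of the total
for category, n in counts.most_common():
    print(f"{category:<8} {n:>3} {100 * n / total:5.1f}%")
```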

9
Q

what alters how you present data?

A

who you are presenting the data to

10
Q

in what order is a STATA command written?

A

the command name first, then the arguments for the command, then any further options after a comma

11
Q

what are arguments?

A

they are the inputs that determine what the command acts on and how it is run, e.g. the variable names, or bar in graph bar

12
Q

when should you add graphics to a bar chart?

A

only if they add information and help the reader understand what is already shown

13
Q

what are the methods for testing relationships between variables?

A

t-tests and logistic regression when one variable is categorical and the other is continuous; chi-squared tests when both variables are categorical

14
Q

what is numerical data?

A

it is when the data is in numbers - values that can be counted or measured

15
Q

what is discrete data?

A

numerical data that can only take whole-number values, e.g. a count of hospital visits

16
Q

how can you summarise the size of numerical values?

A

mean and median

17
Q

how can you summarise the spread of numerical values?

A

variance, SD and IQR

18
Q

how can we report some sort of extreme in numerical values?

A

the minimum and the maximum, often reported alongside the modal value (the most common value)
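A minimal Python sketch of reporting the minimum, maximum and modal value, using a hypothetical list of GP visit counts. Python's statistics.mode returns the most common value.

```python
import statistics

# Hypothetical numerical data: GP visits in one year for 9 people
visits = [2, 0, 3, 2, 5, 2, 1, 12, 3]

lowest = min(visits)
highest = max(visits)
modal = statistics.mode(visits)  # the most common value

print("minimum:", lowest)
print("maximum:", highest)
print("mode:", modal)
```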

19
Q

what do you need to consider when analysing numerical data?

A

the specific reason for comparing groups or populations

20
Q

what can you get from a simple plot of the values?

A

the range and the mode, and a sense of how the data fits together

21
Q

what do histograms show?

A

how common the values are relative to each other - where the typical or most common values fall
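A histogram groups values into bins and counts how many fall in each. The Python sketch below draws a crude text histogram; the ages and the 10-year bin width are hypothetical choices, and real plotting software chooses bins in its own way.

```python
from collections import Counter

# Hypothetical ages; the 10-year bin width is an arbitrary choice
ages = [23, 35, 31, 47, 52, 38, 41, 29, 33, 36, 44, 61, 39, 34]
bin_width = 10

# Map each age to the lower edge of its bin, then count per bin
bins = Counter((age // bin_width) * bin_width for age in ages)

# One row per bin; the bar length shows how common values in that bin are
for start in sorted(bins):
    print(f"{start}-{start + bin_width - 1}: {'#' * bins[start]}")
```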

22
Q

what summary measures would you use for a normal distribution?

A

the mean and SD, because the distribution is symmetrical

23
Q

how do you calculate SD?

A

subtract the mean from each value and square each result. Add the squared values together, divide by one less than the total number of values (n - 1), and take the square root
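The steps above can be written out directly. This Python sketch uses a hypothetical list of values and checks the hand calculation against statistics.stdev, which also divides by n - 1.

```python
import math
import statistics

values = [4, 8, 6, 5, 3, 7]  # hypothetical data

mean = sum(values) / len(values)
# Subtract the mean from each value and square the result
squared_deviations = [(x - mean) ** 2 for x in values]
# Add them up, divide by n - 1, then take the square root
sd = math.sqrt(sum(squared_deviations) / (len(values) - 1))

print(round(sd, 3))
print(round(statistics.stdev(values), 3))  # library check, same n - 1 formula
```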

24
Q

what is the mean?

A

the sum of all values divided by the total number of values

25
Q

what is the SD?

A

a measure of how spread out the values are around the mean - roughly, the typical distance of a value from the mean

26
Q

what is left skew?

A

when low values are rare and the long tail extends to the left - the opposite is right skew

27
Q

what is the IQR?

A

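a measure of the spread of values around the median: the distance between the value one quarter of the way through the ordered data (the lower quartile) and the value three quarters of the way through (the upper quartile)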
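A Python sketch of the IQR using statistics.quantiles on hypothetical data. Different packages use slightly different rules for locating the quartiles, so results can differ a little on small datasets; Python's default here is method="exclusive".

```python
import statistics

# Hypothetical data (11 values)
data = [13, 2, 9, 21, 5, 14, 7, 18, 4, 11, 8]

# Returns the three cut points: lower quartile, median, upper quartile
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("lower quartile:", q1)
print("upper quartile:", q3)
print("IQR:", iqr)
```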

28
Q

when would you use the IQR and median?

A

when the data is skewed, because the median and IQR are less affected by extreme values than the mean and SD
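To see why the median is preferred for skewed data, compare the mean and median of some hypothetical right-skewed data: one very long hospital stay pulls the mean upwards, while the median stays near the typical value.

```python
import statistics

# Hypothetical right-skewed data: length of hospital stay (days)
stay = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]

mean = statistics.mean(stay)
median = statistics.median(stay)

print("mean:", mean)      # pulled up by the single 30-day stay
print("median:", median)  # stays close to the typical value
```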

29
Q

where is the median when there is an even number of values (a tie between two middle values)?

A

halfway between the two middle values (their average)
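Python's statistics.median follows this rule: with an even number of values it returns the mean of the two middle values. The data are hypothetical.

```python
import statistics

odd = [3, 7, 9]
even = [3, 7, 9, 12]  # two middle values: 7 and 9

print(statistics.median(odd))   # 7
print(statistics.median(even))  # 8.0, halfway between 7 and 9
```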

30
Q

what is true of a normal distribution?

A

median = mean

31
Q

what does a scatterplot show?

A

it shows the relationship between two numeric variables: how y changes as x changes, i.e. how they covary

32
Q

what is a perfect positive and negative correlation?

A

perfect positive correlation = 1; perfect negative correlation = -1

33
Q

what is the value for no correlation?

A

0

34
Q

how would you formalise a correlation?

A

use a correlation test

35
Q

what correlation test would you use for a) normally distributed data and b) skewed data?

A

a) Pearson

b) Spearman’s Rank
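A minimal Python sketch of both coefficients, written out by hand. Spearman's rank correlation is computed here as the Pearson correlation of the ranks, with tied values given the average of their ranks. The function names and example data are hypothetical, not from any library.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical skewed data: y always increases with x, but not linearly
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 1000]
print("Pearson:", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

Because Spearman works on ranks, the extreme value 1000 counts only as "the largest", so the perfectly consistent ordering gives a coefficient of 1, while Pearson is pulled down by the non-linear shape.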

36
Q

what are the limits of correlations?

A

cannot distinguish exposure from outcome (the correlation of x with y is the same as y with x), only looks at two variables at a time, says nothing about which variable influences the other, only captures linear (Pearson) or monotonic (Spearman) relationships, and can be an oversimplification - it may make relationships look similar when they are not

37
Q

what is Anscombe’s quartet?

A

it is a set of four pairs of variables that have almost the same correlation (as well as the same means and variances) but look very different when plotted - it shows how reducing the relationship between two variables to one number can miss important detail
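The quartet can be checked in Python. The values below are Anscombe's published data as commonly reproduced; the pearson helper is our own hand-written function, not a library call. All four pairs give a correlation of about 0.816, even though their plots look very different.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x for datasets I-III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Same single-number summary for four very different shapes
for name, (xs, ys) in quartet.items():
    print(name, round(pearson(xs, ys), 3))
```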

38
Q

why must many correlation tests be done?

A

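because each correlation only involves two variables, so correlations with other variables need to be checked to see the possible effect of confounders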

39
Q

what is regression analysis good for?

A

can specify multiple exposures, include non-linear relationships and specify an exposure and an outcome

40
Q

what is included in regression analysis?

A

for the relationship between two variables, an intercept value is given followed by an effect size - the effect size shows how much the outcome is expected to change for each one-unit change in the exposure. A line of best fit can be added to show this
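A minimal Python sketch of fitting the straight line by ordinary least squares, assuming a single continuous exposure. The fit_line name and the exercise/blood pressure data are hypothetical. STATA's regress command would also report standard errors, confidence intervals and p-values, which this sketch leaves out.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx                    # effect size: change in y per 1-unit change in x
    intercept = mean_y - slope * mean_x  # expected y when x = 0
    return intercept, slope

# Hypothetical data: exercise (hours/week) and systolic blood pressure (mmHg)
hours = [0, 1, 2, 3, 4, 5]
sbp = [130, 128, 125, 124, 121, 119]

intercept, slope = fit_line(hours, sbp)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f} mmHg per hour")
```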

41
Q

what is R2?

A

it is the proportion of the variation in the outcome that is explained by the exposure(s)
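R² can be computed from a straight-line fit as 1 minus the ratio of unexplained (residual) variation to total variation in the outcome. This Python sketch fits the line by least squares; the function name and data are hypothetical. With a single exposure, R² equals the square of Pearson's correlation.

```python
def r_squared(x, y):
    """Proportion of variation in y explained by a least-squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    ss_total = sum((b - mean_y) ** 2 for b in y)                # total variation
    ss_resid = sum((b - p) ** 2 for b, p in zip(y, predicted))  # unexplained variation
    return 1 - ss_resid / ss_total

# Hypothetical exposure and outcome values
exposure = [1, 2, 3, 4]
outcome = [2, 1, 4, 3]
print(round(r_squared(exposure, outcome), 2))
```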

42
Q

what must you include in regression analysis?

A

the 95% confidence interval (CI) and the p-value