basic data Flashcards

Question 1

Q

types of data

Answer

A

Nominal Data
This describes categorical data without an order. Examples include blood groups (O, A, B, AB), eye colour and marital status.

Ordinal Data
Ordinal data are also categorical, but in this case the categories have an order and can be ranked. Examples include stages of breast cancer. Importantly, the “distances” between the different groups can vary. For example, Likert responses may have the options “strongly agree”, “agree”, “neither agree nor disagree”, “disagree” and “strongly disagree”. These can clearly be ordered, so they are an example of ordinal data, but the difference in agreement between “agree” and “strongly agree” may not be the same as that between “agree” and “neither agree nor disagree”.

Binary data
Binary, or dichotomous, data have only two possible outcomes. Common examples are Yes/No or True/False responses, but they could also include other common epidemiological outcomes, such as “survived” and “not survived”.

Numeric data
Numeric data can be discrete or continuous. Discrete data have fixed values. Examples include shoe size or number of people. Continuous data can take any value, frequently within a given range. Examples include weight and length (where the range would be from zero to, theoretically, infinity).

Question 2

Q

types of data scales

Answer

A

(i) A nominal scale uses numbers purely as labels and there is no intrinsic order to the values, for example, ethnic group. A nominal variable is used for mutually exclusive, but not ordered, categories. For example, a study might compare five different countries. You can code the five countries with numbers, but the numerical order is arbitrary.

(ii) Ordinal scales are qualitative and ordered, but without any mathematical relationship between the points, for example, social class. An ordinal variable is one where the order matters but not the difference between values.

(iii) Interval scales are ordered and the intervals between consecutive points on the scale are equal. That is, interval scales are where the difference between two values is meaningful (e.g. temperature in centigrade or Fahrenheit).

(iv) Ratio scales are interval scales but with a true zero, e.g. weight. That is, ratio scales have all the properties of interval scales, and also have a clear definition of zero (e.g. height or weight).

https://www.fph.org.uk/media/1223/june-2011-final.pdf

Question 3

Q

how can you measure the spread of data

Answer

A

range
IQR (interquartile range)
variance / SD
coefficient of variation (SD / mean)
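
The measures listed above can be sketched with Python's standard library. This is only an illustration: the `data` values are made up, and `statistics.quantiles` uses its default "exclusive" method, so other software may give slightly different quartiles.

```python
import statistics

# Hypothetical sample (values invented for illustration).
data = [110, 115, 120, 125, 130, 135, 140]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartiles: n=4 returns the three cut points (Q1, median, Q3).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample variance (n - 1 denominator) and its square root, the SD.
variance = statistics.variance(data)
sd = statistics.stdev(data)

# Coefficient of variation: SD relative to the mean.
cv = sd / statistics.mean(data)

print(data_range, iqr, variance, sd, cv)
```

The coefficient of variation is unitless, which is why it can be used to compare spread between variables measured on different scales.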

Question 4

Q

what is variance and standard deviation?

Answer

A

in formula sheet

Question 5

Q

what is standard error of the mean?

Answer

A

standard deviation of the sampling distribution of the sample mean

in formula sheet

95% of sample means will fall within 1.96 SEM of the population mean, so the population mean lies within 1.96 SEM of the sample mean 95% of the time
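
A minimal sketch of the SEM and the resulting 95% confidence interval, assuming the usual estimate SEM = SD / √n. The `sample` values are hypothetical, and the 1.96 multiplier assumes a large sample (for a sample this small a t multiplier would normally be used).

```python
import math
import statistics

# Hypothetical measurements (invented for illustration).
sample = [4.1, 5.0, 5.6, 4.8, 5.3, 4.9, 5.2, 5.1]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)      # sample SD (n - 1 denominator)

# Standard error of the mean: SD divided by the square root of n.
sem = sd / math.sqrt(n)

# Approximate 95% confidence interval for the population mean.
ci_low = mean - 1.96 * sem
ci_high = mean + 1.96 * sem

print(mean, sem, (ci_low, ci_high))
```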

Question 6

Q

what is the normal distribution

Answer

A

symmetrical around the mean (which equals the median and mode), bell shaped

Question 7

Q

what is the p value

Answer

A

the probability of getting the observed value, or one that is more extreme, if the null hypothesis were correct.

Question 8

Q

unpaired z test

Answer

A

parametric test to see if there is a difference between 2 independent groups when n is large

1) The data must be normally distributed.
2) All data points must be independent.
3) For each sample the variances must be equal.

A z-score of 1.96 is equivalent to a two-tailed p-value of 0.05; therefore, a z-score > 1.96 can be considered statistically significant at the 5% level.
For proportions, the SE is calculated by:
SE ≈ √( p(1−p)/n1 + p(1−p)/n2 )

where p = average proportion for the two groups
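
The standard error for two proportions can be turned into a z statistic as sketched below. The function name `two_proportion_z` and the counts are hypothetical; `p` is taken here as the pooled proportion (all events / all participants), which equals the simple average of the two proportions when the groups are the same size.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between two independent proportions."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion across both groups (assumption: pooled, not simple average).
    p = (x1 + x2) / (n1 + n2)
    # Standard error: the square root covers BOTH terms.
    se = math.sqrt(p * (1 - p) / n1 + p * (1 - p) / n2)
    return (p1 - p2) / se

# Hypothetical example: 60/100 events in group 1 vs 40/100 in group 2.
z = two_proportion_z(60, 100, 40, 100)
significant_at_5_percent = abs(z) > 1.96
print(z, significant_at_5_percent)
```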

Question 9

Q

paired z test

Answer

A

parametric test for paired data when n is large

1) The data must be normally distributed.
2) All data points must be independent.
3) For each sample the variances must be equal.

z = (d − D) / √(σ²/n)

where d = mean of the differences between the samples,
D = hypothesised mean of the differences (usually this is zero),
n = the sample size and
σ² = the population variance of the differences.
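
A short sketch of the paired z statistic using the symbols defined above, assuming the standard form z = (d − D) / √(σ²/n). The function name `paired_z` and the numbers are hypothetical.

```python
import math

def paired_z(d_bar, D, sigma2, n):
    """Paired z statistic.

    d_bar  : mean of the paired differences
    D      : hypothesised mean difference (usually 0)
    sigma2 : population variance of the differences
    n      : number of pairs
    """
    return (d_bar - D) / math.sqrt(sigma2 / n)

# Hypothetical example: mean difference 2.0, variance 25, 100 pairs.
z = paired_z(d_bar=2.0, D=0.0, sigma2=25.0, n=100)
print(z)
```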

Question 10

Q

unpaired t test

Answer

A

used if n is small (normally < 30)

parametric test

1) The data must be normally distributed.
2) All data points must be independent.
3) For each sample the variances must be equal.

Question 11

Q

ANOVA

Answer

A

parametric test to compare the mean of an outcome between 2+ groups of one exposure

can do 2-way or multi-way ANOVA if there is more than one exposure

assumptions:
- outcome normally distributed,
- SD the same in each exposure group

Question 12

Q

linear regression

Answer

A

assumes normal distribution, linear relationship

can calculate a Pearson's correlation coefficient (parametric)

Question 13

Q

what is Bayes' theorem

Answer

A

P(A|B) = P(A ∩ B) / P(B)

P(A|B) = P(B|A) x P(A) / P(B)
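
A worked numerical sketch of the second formula. All probabilities are invented for illustration: A is "has the disease" and B is "tests positive", and P(B) is obtained from the law of total probability.

```python
# Hypothetical inputs (assumptions, not taken from the cards).
p_a = 0.01              # P(A): prevalence of disease
p_b_given_a = 0.90      # P(B|A): probability of a positive test if diseased
p_b_given_not_a = 0.05  # P(B|not A): probability of a positive test if healthy

# P(B): total probability of testing positive.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) x P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(p_b, p_a_given_b)
```

Even with a fairly accurate test, P(A|B) stays low here because the disease is rare.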

Question 14

Q

chi squared test

Answer

A

test for independence
needs a large sample size (expected count of at least 5 in each cell)

to test whether the row and column variables of an r x c table are independent or whether there is an association

H0: variable 1 and variable 2 are independent.
H1: not independent.
for 2 x 2 (1 df) chi squared > 3.84 for p < 0.05

how to calculate:
1. create 2x2 table
2. calculate expected for each cell ((row sum * column sum) / table sum)
3. use the chi squared formula to work out the test statistic
4. if it is > 3.84, reject H0: they are associated.

use Fisher's exact test if n is small
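
The calculation steps above can be sketched as follows. The `observed` counts are hypothetical, and the statistic is the plain sum of (observed − expected)² / expected with no continuity correction, which is an assumption about which formula is intended.

```python
# Hypothetical 2x2 table: rows = exposure (yes/no), columns = outcome (yes/no).
observed = [[30, 20],
            [10, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count for each cell: (row sum * column sum) / table sum.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum over cells of (O - E)^2 / E.
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

# 3.84 is the 5% critical value for 1 degree of freedom.
reject_h0 = chi2 > 3.84
print(expected, chi2, reject_h0)
```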

Question 15

Q

chi squared test for trend

Answer

A

used for ordered categorical exposure variables. It tests the null hypothesis that there is no linear increase in the log odds per exposure group.

(e.g. menarche and small/medium/large fold test)

Question 16

Q

McNemar's Test

Answer

A

used when you have paired data
to see if the outcome and exposure are independent

look at discordant!!

o Assumption 1: You have one categorical dependent variable with two categories (i.e., a dichotomous variable) and one categorical independent variable with two related groups.
o Assumption 2: The two groups of your dependent variable must be mutually exclusive.
o Assumption 3: The cases are a random sample from the population of interest.

for 1 df (X² distribution)

X² > 3.84 means p < 0.05!!

create 2 x 2 table

      +    -
+     a    r
-     s    b

in formula sheet
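
A sketch of the calculation, using the cell labels from the table above, where r and s are the discordant pairs. It assumes the uncorrected form X² = (r − s)² / (r + s); the formula sheet may instead use a continuity-corrected version, and the counts below are invented.

```python
def mcnemar_chi2(r, s):
    """Uncorrected McNemar statistic from the two discordant cell counts."""
    return (r - s) ** 2 / (r + s)

# Hypothetical discordant counts; the concordant cells a and b are not used.
r, s = 15, 5
chi2 = mcnemar_chi2(r, s)

# Compare with 3.84, the 5% critical value for 1 degree of freedom.
reject_h0 = chi2 > 3.84
print(chi2, reject_h0)
```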

Question 17

Q

Direct standardisation

Answer

A

way to adjust for age if you have age-specific rates for the study population

procedure:
1. identify the standard population
2. multiply each age-specific rate from the study population by the standard population number for that stratum
3. sum all of these up
4. sum (age-specific rate from study pop x standard pop number) / total standard population = age-standardised rate

check that the pattern of change in rates across the strata is the same

if there are 2 populations, can calculate the comparative mortality ratio: just divide one age-standardised rate by the other
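
The procedure above can be sketched as below. The age bands, rates and standard population counts are all hypothetical.

```python
# Hypothetical age-specific rates in the STUDY population (events per person).
study_rates = {"0-39": 0.001, "40-64": 0.005, "65+": 0.040}

# Hypothetical STANDARD population counts for the same age bands.
standard_pop = {"0-39": 50_000, "40-64": 30_000, "65+": 20_000}

# Step 2: events expected in the standard population at the study rates.
expected_events = {
    age: study_rates[age] * standard_pop[age] for age in study_rates
}

# Steps 3 and 4: sum across strata and divide by the total standard population.
standardised_rate = sum(expected_events.values()) / sum(standard_pop.values())

print(expected_events, standardised_rate)
```

Repeating this for a second study population with the same standard population and dividing the two results gives the comparative mortality ratio.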

Question 18

Q

Indirect standardisation

Answer

A

for when you DON'T have age-specific rates for the study population

procedure:
1. identify the standard population
2. apply the standard population age-specific rates to the study population to get the EXPECTED number of deaths
3. SMR = observed / expected

tells you how much more/less likely someone is to (die) compared to someone of the same age/sex in the standard population

(if SMR = 1, the same)

don't compare different SMRs as they may have different underlying populations
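
A minimal sketch of the SMR calculation described above. The rates, population counts and observed deaths are hypothetical.

```python
# Hypothetical age-specific death rates in the STANDARD population.
standard_rates = {"0-39": 0.001, "40-64": 0.005, "65+": 0.040}

# Hypothetical numbers of people in each age band of the STUDY population.
study_pop = {"0-39": 2_000, "40-64": 3_000, "65+": 1_000}

# Total deaths actually observed in the study population (hypothetical).
observed_deaths = 76

# Expected deaths if the study population had the standard rates.
expected_deaths = sum(
    standard_rates[age] * study_pop[age] for age in study_pop
)

# Standardised mortality ratio (often also quoted multiplied by 100).
smr = observed_deaths / expected_deaths
print(expected_deaths, smr)
```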

Question 19

Q

Wilcoxon signed rank

Answer

A

non parametric

similar to paired t test

null: median of differences between paired observations = 0

W > critical value = fail to reject the null

opposite to most other tests, where a value bigger than the critical value would mean p is even lower than the threshold

Question 20

Q

Wilcoxon rank sum/ Mann-Whitney U

Answer

A

non parametric
similar to unpaired t test
H0: difference between the medians will be 0

U > critical value = fail to reject the null; opposite to most other tests, where a value bigger than the critical value would mean p is even lower than the threshold

Question 21

Q

bootstrapping

Answer

A

take repeated samples from the original sample, with replacement

if you do this 1000s of times you can create a CI
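
A minimal sketch of a bootstrap confidence interval for a mean, assuming the simple percentile method (other bootstrap intervals exist). The `sample` values, the seed and the number of resamples are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical observed sample.
sample = [12, 15, 9, 20, 14, 11, 18, 16, 13, 17]

n_boot = 2000
boot_means = []
for _ in range(n_boot):
    # Resample WITH replacement, same size as the original sample.
    resample = random.choices(sample, k=len(sample))
    boot_means.append(statistics.mean(resample))

# Percentile method: take the 2.5th and 97.5th percentiles of the bootstrap means.
boot_means.sort()
ci_low = boot_means[n_boot * 25 // 1000]
ci_high = boot_means[n_boot * 975 // 1000 - 1]

print(statistics.mean(sample), (ci_low, ci_high))
```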

Question 22

Q

systematic review

Answer

A

the application of scientific strategies that limit bias by the systematic assembly, critical appraisal and synthesis of all relevant studies on a specific topic.

Question 23

Q

Likelihood ratio (+ve)

Answer

A

sensitivity / (1 - specificity)

P(test positive | have disease) / P(test positive | don't have disease)
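
A short sketch of the positive likelihood ratio computed from a 2x2 diagnostic table. The counts (`tp`, `fn`, `fp`, `tn`) are hypothetical.

```python
# Hypothetical 2x2 diagnostic table counts.
tp, fn = 90, 10    # diseased: test positive / test negative
fp, tn = 50, 850   # not diseased: test positive / test negative

sensitivity = tp / (tp + fn)   # P(test positive | disease)
specificity = tn / (tn + fp)   # P(test negative | no disease)

# Positive likelihood ratio: sensitivity / (1 - specificity).
lr_positive = sensitivity / (1 - specificity)

print(sensitivity, specificity, lr_positive)
```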

Question 24

Q

post test probability

Answer

A

Post-test probability = post-test odds / (post test odds+1)

Post-test odds = pre-test odds * LR

Pre-test odds = pre-test probability / (1 - pre-test probability) (for population screening the pre-test probability is the PREVALENCE OF DISEASE)
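
The three formulas above chain together as in this sketch. The function name `post_test_probability` and the input values are hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (post_test_odds + 1)

# Hypothetical example: 10% prevalence and a positive likelihood ratio of 10.
post_prob = post_test_probability(0.10, 10)
print(post_prob)
```

A likelihood ratio of 1 leaves the probability unchanged, which is a quick sanity check on the arithmetic.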

Question 25

Q

ROC curve

axis and uses

Answer

A

x: 1 - specificity (false positive rate)

y: sensitivity (true positive rate)

Uses:
- to set a cut-off value for a test result (for continuous diagnostic variables)
- to compare the performance of different tests measuring the same outcome (test validation)

Area under ROC (AUROC): larger = better test
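
A sketch of how the ROC points and the AUROC could be computed by hand, assuming a higher score means "more likely diseased" and using the trapezium rule for the area. The scores and the helper names `roc_points` and `auc` are hypothetical.

```python
# Hypothetical test scores for people with and without the disease.
diseased = [0.9, 0.8, 0.7, 0.4]
healthy = [0.6, 0.3, 0.2, 0.1]

def roc_points(diseased, healthy):
    """(1 - specificity, sensitivity) at each possible cut-off, from strict to lax."""
    thresholds = sorted(set(diseased + healthy), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        sensitivity = sum(s >= t for s in diseased) / len(diseased)
        false_positive_rate = sum(s >= t for s in healthy) / len(healthy)
        points.append((false_positive_rate, sensitivity))
    return points

def auc(points):
    """Area under the curve by the trapezium rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

points = roc_points(diseased, healthy)
auroc = auc(points)
print(points, auroc)
```

An AUROC of 0.5 corresponds to a test no better than chance and 1.0 to perfect discrimination.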

Question 26

Q

What type of regression analysis should be used to assess the difference in survival time?

Answer

A

Cox regression (proportional hazards)

Question 27

Q

Kruskal-Wallis

Answer

A

It is a non-parametric test
It is a rank-based test
It is used to test whether two or more independent groups differ.
It is the non-parametric version of one-way independent ANOVA


part 2 stats