1 Statistics Basics Flashcards

1
Q

Simple random sampling

A

Randomly selecting from everybody in the population, so that each individual has the same chance of being picked.

2
Q

Stratified sampling

A

Dividing the population into different groups (strata) and randomly picking from each stratum in proportion to its share of the overall group. The strata are usually large.
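A minimal sketch of proportional allocation, assuming made-up strata and a hypothetical helper `stratified_sample` (neither comes from the deck). Each stratum contributes a share of the sample equal to its share of the population.

```python
import random

def stratified_sample(strata, total_n, seed=0):
    """Draw a proportionally allocated random sample from each stratum.

    `strata` maps a stratum name to the list of its members (illustrative data).
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Allocate in proportion to the stratum's share of the population.
        k = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population of 1000 people split into three strata.
strata = {
    "undergrad": list(range(0, 600)),
    "postgrad": list(range(600, 900)),
    "staff": list(range(900, 1000)),
}
picked = stratified_sample(strata, total_n=50)
sizes = {name: len(members) for name, members in picked.items()}
```

With awkward proportions the rounded allocations may not add up to exactly `total_n`, so a real design would need a rule for the remainder.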

3
Q

Systematic sampling

A

Selects every nth item from an ordered list. The attribute being studied should be randomly distributed through the list, with no repeating pattern that lines up with n.
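A small sketch of every-nth selection, assuming a made-up roster and a random starting point inside the first interval (a common convention, not something the card states). The helper name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(items, n, seed=0):
    """Pick every nth item, starting from a random position in the first n items."""
    rng = random.Random(seed)
    start = rng.randrange(n)  # random start so the first item is not always chosen
    return items[start::n]

roster = list(range(1, 101))   # illustrative list of 100 ID numbers
chosen = systematic_sample(roster, n=10)
```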

4
Q

Convenience sampling

A

Based on ease of selection. E.g. people physically close to you are more likely to be picked than someone in the back row whom you can barely see.

5
Q

Cluster random sampling

A

Divides the population into different coherent areas (clusters), then randomly selects whole areas to assess.
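A rough sketch of the idea, assuming invented area names and a hypothetical helper `cluster_sample`. Whole areas are chosen at random, and everyone in a chosen area is assessed.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly select whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}

# Hypothetical areas, each with its own residents.
clusters = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1", "w2"],
}
selected = cluster_sample(clusters, n_clusters=2)
```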

6
Q

Snowball sampling

A

Finding people who are suitable for the study and then asking them to refer others they know who would also be suitable for the study.

7
Q

What is probability sampling

A

Sampling that is representative of the population because every individual has a known, non-zero probability of being selected (the same probability for everyone in simple random sampling).

8
Q

For symmetric data we use…

A

mean and SD

9
Q

For asymmetric data we use…

A

median and IQR

10
Q

When are z scores used

A

When the values in question do not fall exactly on the reference points of the 68-95-99.7 rule (1, 2 or 3 SDs from the mean).

11
Q

Steps of a basic z score

A
  1. Calculate the z score for each value: z = (value - mean) / SD.
  2. Look each z score up in the table to find the corresponding area above that value.
  3. Add or subtract the areas to leave only the desired area.
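
The three steps can be sketched as follows, using Python's `statistics.NormalDist` in place of a printed table. The mean, SD and cut-off values are made up for illustration.

```python
from statistics import NormalDist

# Illustrative values (assumptions, not from the deck).
mean, sd = 100.0, 15.0
low, high = 90.0, 120.0

# Step 1: convert each value to a z score.
z_low = (low - mean) / sd
z_high = (high - mean) / sd

# Step 2: find the area above each z score (what an upper-tail table gives).
std_normal = NormalDist()  # mean 0, SD 1
area_above_low = 1 - std_normal.cdf(z_low)
area_above_high = 1 - std_normal.cdf(z_high)

# Step 3: subtract the overlapping area to leave only the area between the values.
area_between = area_above_low - area_above_high
```

Here `area_between` is the proportion of the distribution lying between 90 and 120.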
12
Q

What is a t distribution

A

Like the normal distribution, but its shape depends on the degrees of freedom.

It has a flatter peak and longer (heavier) tails than the normal distribution.

As

  • degrees of freedom increase
  • sample size increases

the t distribution becomes more like the normal distribution.

13
Q

What is degrees of freedom

A

The number of data values that are free to change once a summary such as the mean has been fixed (n - 1 for a single sample).

14
Q

What is the central limit theorem

A

As n, the size of each sample, increases, the sample mean is less affected by skew or outliers in the underlying data.

The larger the samples behind the distribution of sample means, the more that distribution will look normally distributed, even if the initial data is skewed.
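
A quick simulation sketch of this idea, assuming right-skewed (exponential) data and using the gap between mean and median as a crude skew indicator. Both are illustrative choices rather than part of the card.

```python
import random
from statistics import mean, median

rng = random.Random(42)

def sample_means(sample_size, n_samples=2000):
    """Means of many samples drawn from a right-skewed (exponential) population."""
    return [
        mean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

def skew_gap(values):
    """Mean minus median: close to 0 for a symmetric distribution."""
    return mean(values) - median(values)

gap_small = skew_gap(sample_means(sample_size=2))    # means of tiny samples
gap_large = skew_gap(sample_means(sample_size=50))   # means of larger samples
```

The gap should shrink towards zero as the sample size grows, which is the distribution of means becoming more symmetric.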

15
Q

What is standard error

A

the standard deviation of the sampling distribution
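
For the sample mean, the standard error is usually estimated as SD / sqrt(n). A minimal sketch with made-up measurements:

```python
from math import sqrt
from statistics import stdev

# Illustrative sample (assumed data).
sample = [4.1, 5.0, 5.6, 4.8, 5.3, 4.9, 5.2, 5.1]

sample_sd = stdev(sample)                      # sample standard deviation
standard_error = sample_sd / sqrt(len(sample)) # standard error of the mean
```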

16
Q

Why is hypothesis testing used

A

To analyse whether the results seen in a sample could be due to chance alone, or whether they reflect the total population the sample came from.

17
Q

WE ONLY TALK ABOUT THE

A

NULL NOT THE ALTERNATE.

18
Q

What is a type 1 error

A

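Rejecting the null hypothesis even though it is true.

It happens when a result that arose by chance under the null hypothesis looks like a real effect.

The significance level (alpha, usually 0.05) is the probability of a type 1 error we are willing to accept; the p value is compared against it.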

19
Q

What is a type 2 error

A

Keeping (failing to reject) the null hypothesis even though it is false.

20
Q

What does a p value represent

A

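The probability of getting a result at least as extreme as the one observed, by chance alone, if the null hypothesis is true.

It is not itself the probability of a type 1 error occurring; that probability is set by the significance level (alpha) chosen before the test.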
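
As a sketch of where a p value comes from in the simplest case, the snippet below computes a two-sided p value from a z statistic. The z value and the helper name `two_sided_p_value` are assumptions made for illustration.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a z statistic at least this extreme if H0 is true."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail

p = two_sided_p_value(1.96)   # illustrative test statistic
reject_h0 = p < 0.05
```

A z of 1.96 sits almost exactly on the conventional 0.05 threshold.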

21
Q

p>0.05 =

A

the result could plausibly be due to chance,

rejecting H0 would carry too high a risk of a type 1 error,

insufficient evidence to reject H0.

22
Q

p<0.05 =

A

the result is unlikely to be due to chance alone,

type 1 error unlikely,

statistically significant, so reject H0.

23
Q

What does a 95% confidence interval represent

A

The interval of values that we are 95% confident contains the true value for the whole population (not just the statistic from the sample used).

If the confidence interval contains the value that represents the null hypothesis (e.g. 0 for a difference, 1 for a ratio), then we cannot reject the null hypothesis.

24
Q

How do you find a t multiplier

A

From the t table, using the degrees of freedom and the confidence level.

25
Q

Ways to test equal variance

A

Graph and compare dispersion, similar = equal

(Larger SD)^2/(smaller SD)^2, ratio>=2 then unequal

Use Levene's test (H0: variances are equal; H1: variances are unequal), p<0.05 = unequal
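
The variance-ratio rule of thumb can be sketched like this, with two made-up groups. The helper name `variance_ratio` is hypothetical.

```python
from statistics import variance

def variance_ratio(group_a, group_b):
    """(Larger SD)^2 / (smaller SD)^2, i.e. larger variance over smaller variance."""
    var_a, var_b = variance(group_a), variance(group_b)
    return max(var_a, var_b) / min(var_a, var_b)

# Illustrative data (assumed).
group_a = [10, 12, 11, 13, 12, 11]
group_b = [5, 15, 9, 18, 2, 14]

ratio = variance_ratio(group_a, group_b)
unequal = ratio >= 2   # rule of thumb from the card
```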

26
Q

Relative Risk (RR)

A

THE RISK OF GETTING THE DISEASE

Comparing people getting disease in exposed to unexposed

= Cumulative incidence (exposed) / Cumulative incidence (unexposed)

= Incidence rate (exposed) / Incidence rate (unexposed)
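
A minimal sketch of the calculation from cohort counts. The counts and the helper name `relative_risk` are invented for illustration.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Cumulative incidence in the exposed divided by that in the unexposed."""
    ci_exposed = exposed_cases / exposed_total
    ci_unexposed = unexposed_cases / unexposed_total
    return ci_exposed / ci_unexposed

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed get the disease.
rr = relative_risk(30, 100, 10, 100)
```

Here the exposed group has about three times the risk of the unexposed group.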

27
Q

Convert RR to %

A

Increase = (RR - 1) x 100

Decrease = (1 - RR) x 100
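
The conversion as a small sketch; the helper name `rr_to_percent_change` and the example RR values are assumptions.

```python
def rr_to_percent_change(rr):
    """Express a relative risk as a percentage increase or decrease in risk."""
    if rr >= 1:
        return "increase", (rr - 1) * 100
    return "decrease", (1 - rr) * 100

direction_up, pct_up = rr_to_percent_change(1.5)      # RR above 1
direction_down, pct_down = rr_to_percent_change(0.75) # RR below 1
```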

28
Q

Attributable Risk AR

A

Shows amount of disease due to just exposure

Cumulative incidence (exposed) – Cumulative incidence (unexposed)

% = (risk in exposed group - risk in unexposed group) / risk in exposed group x 100
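
Both versions sketched with made-up incidences; the function names are hypothetical.

```python
def attributable_risk(ci_exposed, ci_unexposed):
    """Excess risk in the exposed group: the risk difference."""
    return ci_exposed - ci_unexposed

def attributable_risk_percent(ci_exposed, ci_unexposed):
    """Share of the exposed group's risk that is due to the exposure."""
    return (ci_exposed - ci_unexposed) / ci_exposed * 100

# Illustrative cumulative incidences: 30% in exposed, 10% in unexposed.
ar = attributable_risk(0.30, 0.10)
ar_pct = attributable_risk_percent(0.30, 0.10)
```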

29
Q

Population Attributable Risk (PAR)

A

incidence in general population – incidence in unexposed group

30
Q

Odds Ratio (OR)

A

THE ODDS OF BEING EXPOSED

We speak about OR in terms of exposure vs non-exposure

= (exposed cases/non-exposed cases) / (exposed controls/non-exposed controls)
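
A minimal sketch from case-control counts. The counts and the helper name `odds_ratio` are assumptions.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls

# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed.
or_value = odds_ratio(40, 60, 20, 80)
```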

31
Q

Explain talking about OR

A

The odds of developing DISEASE among EXPOSURE is OR times more/less than NON-EXPOSURE.

32
Q

What are the assumptions when doing linear regression

A

y follows a normal distribution (check by histogram or box plot)

Relationship between y and x is linear (check with scatterplot)

There is constant variance of the outcome across different values of the x (check with residual plot)

33
Q

What is the beta coefficient

A

represents the amount of change in y for every unit change in x
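
To make the "change in y per unit change in x" idea concrete, here is a least-squares sketch for a single predictor. The data points and the helper name `fit_line` are made up.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, beta)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                  # change in y for each unit change in x
    intercept = y_bar - beta * x_bar  # predicted y when x is 0
    return intercept, beta

# Illustrative data where y rises by 2 for each unit of x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
intercept, beta = fit_line(xs, ys)
```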

34
Q

How do we select significant variables

A

run the regression model with all variables first,

identify the insignificant covariate (could be a p>0.05) and then drop that covariate.

Repeat until all remaining covariates are significant.

35
Q

What does a chi square test do

A

compares two categorical variables to see if the variation in data is due to chance, or due to the variables being tested.

compare the data of observed frequencies with what we would expect to occur if the null hypothesis was true.
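
A sketch of the observed-versus-expected comparison for a 2 x 2 table. The counts are invented and the helper name `chi_square_statistic` is hypothetical. Expected counts are what each cell would hold if the two variables were unrelated.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the null hypothesis (no association) were true.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Hypothetical table: rows = exposed / unexposed, columns = disease / no disease.
observed = [[30, 70],
            [10, 90]]
chi2 = chi_square_statistic(observed)
```

For a 2 x 2 table (1 degree of freedom) the statistic is compared against the critical value 3.84 at the 0.05 level.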

36
Q

Why is logistic regression different to linear regression

A

the response variable is binomial rather than continuous.

aim of logistic regression is to obtain an odds ratio.

37
Q

Evaluating logistic regression model: chi square

A

The Hosmer-Lemeshow "goodness of fit" statistic gives a chi square statistic

low chi square = p>0.05 = good fit

high chi square = p<0.05 = poor fit

38
Q

Evaluating logistic regression model: ROC

A

Receiver Operating Characteristic (ROC) curve and c-index

How well does the model discriminate between patients who develop / do not develop the outcome (according to prediction)

Use the c-index (the area under the ROC curve) to measure this: 0.5 = no better than chance, 1 = perfect discrimination.
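
One way to see what the c-index measures is to take every pair of one patient with the outcome and one without, and count how often the model gave the higher predicted risk to the patient with the outcome. The predictions below are made up and `c_index` is a hypothetical helper.

```python
def c_index(predicted, outcomes):
    """Share of (event, non-event) pairs ranked correctly; ties count as half."""
    events = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted, outcomes) if y == 0]
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Illustrative predicted probabilities and true outcomes (1 = developed outcome).
predicted = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
outcomes = [1, 1, 1, 0, 0, 0]
c = c_index(predicted, outcomes)
```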

39
Q

What is external validity

A

use a creation data set, then a validation data set to ensure the model works.

Sometimes models are good in creation dataset but not valid in the other dataset.

In that case the model is not appropriate.

40
Q

What is a Hazards Ratio (HR)

A

Helps measure the effect of an intervention on an outcome of interest over time.

The ratio of the hazard (the instantaneous risk of the event) for an individual in the intervention group to that for an individual in the control group at a particular time point.

= Hazard in intervention / hazard in control

  • HR > 1: Factor increases risk of event.

  • HR < 1: Factor decreases risk of event (i.e. protective).
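
A deliberately simplified sketch, assuming the hazard in each group is constant over follow-up so that it can be estimated as events per unit of person-time. Real analyses usually estimate the HR with a survival model instead. The numbers and the helper name `hazard_rate` are invented.

```python
def hazard_rate(events, person_time):
    """Events per unit of person-time (valid only under a constant-hazard assumption)."""
    return events / person_time

# Hypothetical trial: 12 events in 400 person-years on the intervention,
# 24 events in 400 person-years on control.
hazard_intervention = hazard_rate(12, 400.0)
hazard_control = hazard_rate(24, 400.0)

hr = hazard_intervention / hazard_control
protective = hr < 1
```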