1 Statistics Basics Flashcards by Mercer Reid

Simple random sampling

Randomly selecting from everybody in the population, so that each individual has an equal chance of being picked.

Stratified sampling

Dividing the population into different groups (strata) and randomly picking from each stratum in proportion to its share of the overall group. Usually used when the strata are large.

Systematic sampling

Selects every nth member of a list, ideally from a random starting point. The attribute being studied should be randomly distributed through the list (no repeating pattern that lines up with n).

Convenience sampling

Based on ease of selection. E.g. people physically closer to you are more likely to be picked than someone in the back row whom you can’t even really see.

Cluster random sampling

Divides the population into different coherent areas (clusters), then randomly selects whole clusters to assess.

Snowball sampling

Finding people who are suitable for the study and then asking them to refer others they know who would also be suitable for the study.

What is probability sampling

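Sampling in which every individual has a known, non-zero probability of being selected (the same probability for everyone in simple random sampling), so the sample tends to be representative of the population.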

For symmetric data we use…

mean and SD

For asymmetric data we use…

median and IQR

When is a z score used

When the values in question do not fall exactly on the reference points of the 68-95-99.7 rule (1, 2 or 3 SDs from the mean).

Steps of a basic z score

Calculate the z score for each value: z = (value - mean) / SD.
Look each z score up in the table to find the corresponding area above that value.
Subtract the overlapping area so that only the desired area is left.
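
The three steps above can be sketched in Python. This is only an illustration: the mean, SD and cut-off values are made up, and `statistics.NormalDist` stands in for the printed z table.

```python
from statistics import NormalDist

# Hypothetical values (not from the cards): mean 100, SD 15.
mean, sd = 100.0, 15.0
lower, upper = 90.0, 120.0

# Step 1: convert each raw value to a z score.
z_lower = (lower - mean) / sd
z_upper = (upper - mean) / sd

# Step 2: area above each z score (NormalDist replaces the table lookup).
standard_normal = NormalDist()  # mean 0, SD 1
area_above_lower = 1 - standard_normal.cdf(z_lower)
area_above_upper = 1 - standard_normal.cdf(z_upper)

# Step 3: subtract the overlapping area to keep only the region between the values.
area_between = area_above_lower - area_above_upper
print(f"z = {z_lower:.2f} and {z_upper:.2f}; area between = {area_between:.4f}")
```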

What is a t distribution

Like the normal distribution, but takes degrees of freedom into consideration.

Has a flatter peak and longer (heavier) tails than the normal distribution.

As degrees of freedom increase (i.e. as sample size increases), the t distribution becomes more like the normal distribution.

What is degrees of freedom

The number of data values that are free to change once a statistic has been fixed (e.g. n - 1 when estimating the SD of one sample).

What is the central limit theorem

As n, the size of each sample, increases, the distribution of the sample means becomes less skewed and closer to normal, because extreme values get averaged out within each sample.

If we plot the means of many samples on a graph, it will look approximately normally distributed (provided n is large enough), even if the initial data is skewed.
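
A small simulation can make this visible. The sketch below is an assumption-laden illustration, not part of the cards: it uses an exponential distribution as the skewed "initial data", samples of size 50, and a hand-written `skewness` helper.

```python
import random
from statistics import mean, pstdev

def skewness(values):
    """Skewness as the average cubed z score (0 means symmetric)."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Skewed initial data: 20,000 draws from an exponential distribution.
raw = [rng.expovariate(1.0) for _ in range(20_000)]

# 2,000 samples of n = 50 each; keep only the mean of each sample.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(50)]) for _ in range(2_000)
]

print(f"skewness of raw data:     {skewness(raw):.2f}")
print(f"skewness of sample means: {skewness(sample_means):.2f}")
```

The raw data should come out strongly right-skewed, while the sample means should be much closer to symmetric.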

What is standard error

The standard deviation of the sampling distribution of a statistic. For the mean, SE = SD / √n.
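
For a sample mean, the standard error is usually estimated as SD / √n. A minimal sketch with made-up measurements:

```python
from math import sqrt
from statistics import stdev

sample = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0, 9.0, 6.0]  # hypothetical measurements
n = len(sample)

sd = stdev(sample)  # sample SD (divides by n - 1)
se = sd / sqrt(n)   # estimated standard error of the mean
print(f"n = {n}, SD = {sd:.3f}, SE = {se:.3f}")
```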

Why is hypothesis testing used

To analyse whether the results seen in a sample could be due to chance alone, or whether they reflect something real in the total population the sample came from.

WE ONLY TALK ABOUT THE

NULL, NOT THE ALTERNATIVE.

What is a type 1 error

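Rejecting the null hypothesis even though it is true (a false positive).

The observed result arose by chance even though the null hypothesis holds.

The significance level (usually 0.05) is the risk of this error we accept; we compare the p value against it.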

What is a type 2 error

Failing to reject (keeping) the null hypothesis even though it is false (a false negative).

What does a p value represent

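The probability of getting a result at least as extreme as the one observed, by chance alone, if the null hypothesis were true.

It is not the probability that the null hypothesis is true. The risk of a type 1 error is set by the significance level (e.g. 0.05) that the p value is compared against.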

p>0.05 =

the result could easily be due to chance under H0,

rejecting H0 would carry too high a risk of a type 1 error,

insufficient evidence to reject H0.

p<0.05 =

the result would be unlikely to occur by chance if H0 were true,

risk of a type 1 error is within the accepted 5% level,

statistically significant, so reject H0.

What does a 95% confidence interval represent

Represents the interval of values that we are 95% confident will contain the population parameter, i.e. the true value for the whole population (not just the sample used).

If the confidence interval contains the value that represents the null hypothesis (e.g. 0 for a difference, 1 for a ratio), then we cannot reject the null hypothesis.

How do you find a t multiplier

From the t table, using the degrees of freedom (n - 1 for one sample) and the confidence level.
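
As a rough illustration of how the t multiplier feeds into a 95% confidence interval for a mean, here is a sketch with made-up data. The multiplier 2.262 is the usual table value for 95% confidence with 9 degrees of freedom; the null value of 12 is purely hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0, 14.0, 13.0]  # hypothetical
n = len(sample)
t_multiplier = 2.262  # from a t table: 95% two-sided, df = n - 1 = 9

sample_mean = mean(sample)
se = stdev(sample) / sqrt(n)  # standard error of the mean
ci_lower = sample_mean - t_multiplier * se
ci_upper = sample_mean + t_multiplier * se

null_value = 12.0  # hypothetical value stated by H0
reject_h0 = not (ci_lower <= null_value <= ci_upper)
print(f"95% CI: ({ci_lower:.3f}, {ci_upper:.3f}); reject H0: {reject_h0}")
```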

Ways to test equal variance

Graph and compare dispersion: similar spread = equal.
(Larger SD)^2 / (smaller SD)^2: if the ratio >= 2, then unequal.
Use Levene's test, with H0 = equal variances and H1 = unequal variances: p < 0.05 = unequal.
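
The variance-ratio rule of thumb is easy to check by hand or in code. A minimal sketch with two made-up groups (Levene's test is not shown here; it would normally come from a statistics package):

```python
from statistics import variance

group_a = [5.0, 7.0, 6.0, 8.0, 4.0]    # hypothetical data
group_b = [2.0, 10.0, 6.0, 12.0, 0.0]  # hypothetical data, more spread out

var_a, var_b = variance(group_a), variance(group_b)  # variance = SD squared

# (Larger SD)^2 / (smaller SD)^2
ratio = max(var_a, var_b) / min(var_a, var_b)
treat_as_unequal = ratio >= 2
print(f"variance ratio = {ratio:.1f}; treat as unequal: {treat_as_unequal}")
```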

Relative Risk (RR)

THE RISK OF GETTING THE DISEASE
Compares people getting the disease in the exposed group to the unexposed group.
RR = cumulative incidence (exposed) ÷ cumulative incidence (unexposed)
   = incidence rate (exposed) / incidence rate (unexposed)

Convert RR to %

Increase (RR > 1) = (RR - 1) x 100
Decrease (RR < 1) = (1 - RR) x 100
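
A worked example may help here. The sketch below uses invented cohort counts to compute RR from the two cumulative incidences and then converts it to a percentage change.

```python
# Hypothetical cohort counts (not from the cards).
exposed_cases, exposed_total = 30, 100
unexposed_cases, unexposed_total = 10, 100

ci_exposed = exposed_cases / exposed_total        # cumulative incidence, exposed
ci_unexposed = unexposed_cases / unexposed_total  # cumulative incidence, unexposed
rr = ci_exposed / ci_unexposed

# Convert RR to a percentage increase or decrease in risk.
if rr >= 1:
    percent_change = (rr - 1) * 100
    direction = "increase"
else:
    percent_change = (1 - rr) * 100
    direction = "decrease"

print(f"RR = {rr:.2f}: a {percent_change:.0f}% {direction} in risk")
```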

Attributable Risk AR

Shows the amount of disease due to the exposure alone.
AR = cumulative incidence (exposed) - cumulative incidence (unexposed)
AR % = (risk in exposed group - risk in unexposed group) / risk in exposed group x 100

Population Attributable Risk (PAR)

incidence in general population – incidence in unexposed group
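
The attributable risk measures can be worked through with the same kind of made-up counts. In this sketch the population is assumed to be just the exposed and unexposed groups combined.

```python
# Hypothetical risks (not from the cards).
risk_exposed = 30 / 100      # 30 cases among 100 exposed people
risk_unexposed = 10 / 100    # 10 cases among 100 unexposed people
risk_population = 40 / 200   # all cases over everyone (assumed population)

ar = risk_exposed - risk_unexposed                                # attributable risk
ar_percent = (risk_exposed - risk_unexposed) / risk_exposed * 100 # AR as a percentage
par = risk_population - risk_unexposed                            # population attributable risk

print(f"AR = {ar:.2f}, AR% = {ar_percent:.1f}%, PAR = {par:.2f}")
```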

Odds Ratio (OR)

THE ODDS OF BEING EXPOSED
We speak about OR in terms of exposure vs non-exposure.
OR = (exposed cases / non-exposed cases) / (exposed controls / non-exposed controls)

Explain talking about OR

The odds of developing DISEASE among the EXPOSED are OR times higher/lower than among the NON-EXPOSED.
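
To see the odds ratio and its wording together, here is a sketch using an invented case-control table. The disease and exposure labels are placeholders.

```python
# Hypothetical case-control counts (not from the cards).
exposed_cases, unexposed_cases = 40, 60
exposed_controls, unexposed_controls = 20, 80

odds_cases = exposed_cases / unexposed_cases           # odds of exposure among cases
odds_controls = exposed_controls / unexposed_controls  # odds of exposure among controls
odds_ratio = odds_cases / odds_controls

comparison = "higher" if odds_ratio > 1 else "lower"
print(
    f"The odds of developing DISEASE among the EXPOSED are "
    f"{odds_ratio:.2f} times {comparison} than among the NON-EXPOSED."
)
```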

What are the assumptions when doing linear regression

y (more precisely, the residuals) follows a normal distribution (check with a histogram or box plot).
The relationship between y and x is linear (check with a scatterplot).
There is constant variance of the outcome across different values of x (check with a residual plot).

What is the beta coefficient

Represents the amount of change in y for every one-unit increase in x (holding any other variables constant).
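
For a single predictor, the beta coefficient is the slope of the least-squares line. A minimal sketch with made-up data in which y rises by exactly 2 for each unit of x:

```python
from statistics import mean

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical predictor values
y = [3.0, 5.0, 7.0, 9.0, 11.0]  # hypothetical outcomes (y = 1 + 2x)

mean_x, mean_y = mean(x), mean(y)

# Least-squares slope: sum of cross-deviations over sum of squared x deviations.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
beta = s_xy / s_xx
intercept = mean_y - beta * mean_x

print(f"y = {intercept:.1f} + {beta:.1f}x: y changes by {beta:.1f} per unit of x")
```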

How do we select significant variables

Run the regression model with all variables first, identify the least significant covariate (e.g. the one with the largest p > 0.05) and drop it. Refit and repeat until all remaining covariates are significant.

What does a chi square test do

Compares two categorical variables to see whether the variation in the data is due to chance or due to an association between the variables being tested. It compares the observed frequencies with the frequencies we would expect if the null hypothesis were true.
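
The observed-versus-expected comparison can be sketched for a 2 x 2 table. The counts below are invented; 3.841 is the standard table critical value for 1 degree of freedom at the 0.05 level.

```python
# Hypothetical 2 x 2 table of observed frequencies (rows: group, columns: outcome).
observed = [[20, 30], [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected frequency if the null hypothesis (no association) were true.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

critical_value = 3.841  # chi square table: df = 1, significance level 0.05
significant = chi_square > critical_value
print(f"chi square = {chi_square:.2f}; significant at 0.05: {significant}")
```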

Why is logistic regression different to linear regression

The response variable is binary (binomial) rather than continuous. The aim of logistic regression is to obtain an odds ratio.

Evaluating logistic regression model: chi square

Hosmer-Lemeshow “goodness of fit” statistic: finds a chi square statistic.
Low statistic = p > 0.05 = good fit.
High statistic = p < 0.05 = poor fit.

Evaluating logistic regression model: ROC

Receiver Operating Characteristic (ROC) and c-index.
How well does the model discriminate between patients who develop / do not develop the outcome (according to prediction)?
Use the c-index (the area under the ROC curve) to test how accurate the model is: 0.5 = no better than chance, 1 = perfect discrimination.
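
One way to picture the c-index: take every pair of one patient who developed the outcome and one who did not, and count how often the model gave the higher predicted probability to the patient with the outcome. The sketch below does this by hand with invented predictions; ties count as half.

```python
# Hypothetical predicted probabilities and true outcomes (1 = developed outcome).
predicted = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
outcome = [1, 1, 1, 0, 0, 0]

events = [p for p, y in zip(predicted, outcome) if y == 1]
non_events = [p for p, y in zip(predicted, outcome) if y == 0]

# Score each (event, non-event) pair: 1 if ranked correctly, 0.5 if tied, 0 otherwise.
score = 0.0
for p_event in events:
    for p_non_event in non_events:
        if p_event > p_non_event:
            score += 1.0
        elif p_event == p_non_event:
            score += 0.5

c_index = score / (len(events) * len(non_events))
print(f"c-index = {c_index:.3f}")
```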

What is external validity

Use a creation data set to build the model, then a validation data set to ensure the model works. Sometimes a model is good in the creation data set but not valid in the other data set, and is therefore not appropriate.

What is a Hazards Ratio (HR)

Helps measure the effect of an intervention on an outcome of interest over time.
It is the ratio of the hazard (an individual's risk of the event at a particular time point) following the intervention to the hazard in the control group.
HR = hazard in intervention / hazard in control
HR > 1: factor increases risk of event.
HR < 1: factor decreases risk of event (i.e. protective).

(40 cards)