Definitions from scratch Flashcards

1
Q

Two types of variable

A

Metric variables

Categorical variables

2
Q

Categorical variables can be

A

Nominal: relates to named things, i.e. it is NOT numeric
It is categorical because each data point is allocated to a category, e.g. male or female

Ordinal: categories can be ordered in a meaningful way, e.g. Glasgow Coma Scale

3
Q

Nominal data

A

= categorical nominal variable

Nominal: relates to named things
Category: each data point is placed in a category

Properties of nominal data:

  • They do not have units of measurement
  • The ordering of categories is arbitrary i.e. does not matter

Example:
Males: 45
Females: 72

4
Q

Properties of nominal data

A

Categorical nominal variable

  • They do not have units
  • The ordering of categories is arbitrary

Example:
Males: 43
Females: 52

OR

Females: 52
Males: 43

5
Q

Ordinal data

A

= categorical ordinal data

Ordinal data is categorical, but the categories can be ordered in a meaningful way, i.e. from smallest to largest

e.g. Glasgow Coma Scale (GCS)
If person A has a GCS of 5 and person B has a GCS of 10, we can conclude that person A’s consciousness is lower, BUT we CANNOT conclude by how much, i.e. we CANNOT say it is half as much

The difference between adjacent scores is not constant

The seemingly numeric values are NOT numbers, but labels

6
Q

Properties of ordinal data

A

Categorical ordinal data

  • Does not have units
  • CAN be ordered in a meaningful way
  • Nearly always integers
  • Assessed rather than measured

NOTE: they do not have a numeric value. They appear to be numbers, but these are actually labels, i.e. a GCS of 3 says that the patient fits into a category called GCS 3

7
Q

What you shouldn’t do with ordinal data

A

YOU SHOULD NOT TREAT THEM AS NUMBERS

i.e. for ordinal data you should not add, divide, or average it

Ordinal data = number labels
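To illustrate why ordinal labels should not be averaged, here is a minimal Python sketch using made-up GCS scores. `median_low` from the standard `statistics` module always returns one of the observed categories. The mean produces a value that is not a GCS category at all.

```python
from statistics import mean, median_low

# Hypothetical GCS scores for five patients: ordinal labels, not quantities
gcs_scores = [3, 5, 8, 10, 15]

# median_low always returns one of the observed categories
print(median_low(gcs_scores))  # 8

# The mean is not a GCS category at all, so it has no clinical meaning here
print(mean(gcs_scores))  # 8.2
```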

8
Q

Metric variables can be

A

Discrete: values occur in discrete steps, e.g. 1, 2, 3, 4, 5

  • come from counting, e.g. number of operations
  • the difference between each count is constant (in contrast to ordinal data)
  • 4 operations is twice as many as 2 operations

Continuous: values can lie anywhere on a continuum within a range, e.g. height or weight

9
Q

Properties of discrete data

A

Discrete metric data

  • Has units
  • Discrete variables come from counting, so they take whole-number (integer) values
10
Q

Continuous data

A

Continuous metric data

  • Values form a continuum
  • Real numbers
  • Has units
11
Q

Frequency table

A

Tabulates the number of observations in each category (or value); used to present descriptive statistics

12
Q

Frequency distribution

A

Illustrates the number of events in each category

13
Q

Relative frequency

A

= frequency of each category divided by the total number of observations, usually expressed as a percentage
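A frequency table, its frequency distribution, and the relative frequencies can be sketched in a few lines of Python. The blood-group data below are hypothetical. `collections.Counter` does the counting, and each count is divided by the total to give a percentage.

```python
from collections import Counter

# Hypothetical nominal data: blood group of 10 patients
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "B"]

freq = Counter(blood_groups)   # frequency distribution: count per category
n = len(blood_groups)

print(f"{'Group':<6}{'Freq':>6}{'Rel. freq (%)':>16}")
for group, count in freq.most_common():
    rel = 100 * count / n      # relative frequency as a percentage
    print(f"{group:<6}{count:>6}{rel:>16.1f}")
```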

14
Q

Contingency table

A

= cross-tabulations

Illustrate the association between two categorical variables in a single population

Rows show the categories of one variable and columns show the categories of the other, e.g. a 2 × 2 table
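As a minimal sketch with hypothetical records, a 2 × 2 contingency table can be built by counting each (row, column) pair. Here the rows are smoking status and the columns are the presence of a cough.

```python
from collections import Counter

# Hypothetical records: (smoking status, cough present) for 8 people
records = [
    ("smoker", "cough"), ("smoker", "cough"), ("smoker", "no cough"),
    ("non-smoker", "cough"), ("non-smoker", "no cough"),
    ("non-smoker", "no cough"), ("non-smoker", "no cough"), ("smoker", "cough"),
]

table = Counter(records)   # count of each (row, column) combination
rows = ["smoker", "non-smoker"]
cols = ["cough", "no cough"]

print(f"{'':<12}" + "".join(f"{c:>10}" for c in cols) + f"{'Total':>8}")
for r in rows:
    counts = [table[(r, c)] for c in cols]
    print(f"{r:<12}" + "".join(f"{x:>10}" for x in counts) + f"{sum(counts):>8}")
```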

15
Q

Ranking data

A

Allows assessment of non-parametric data

Order the data by size

Starting with the largest value, give it a rank of 1
Give the next value a rank of 2, and so on

Tied values are each given the average of the ranks they would have occupied, e.g. 7 8 5 5 5 3 1

8: 1
7: 2
5: =4
5: =4
5: =4 (ranks 3, 4, 5; average = 4)
3: 6
1: 7
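The ranking rule above (largest value = rank 1, ties share the average of their ranks) can be sketched as a small Python function. The function name is ours, for illustration only.

```python
def rank_descending(values):
    """Rank from largest (rank 1) to smallest; tied values get the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of values tied with ordered[i]
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        # 0-based positions i..j correspond to ranks i+1..j+1; average them
        ranks[ordered[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    return [ranks[v] for v in values]

print(rank_descending([7, 8, 5, 5, 5, 3, 1]))
# [2.0, 1.0, 4.0, 4.0, 4.0, 6.0, 7.0]
```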

16
Q

Ogive

A

Pronounced ojive

Cumulative frequency curve for continuous metric data

A smooth curve rather than a step chart
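The y-values of an ogive are running totals of the class frequencies, conventionally plotted against each class's upper boundary and joined with a smooth curve. Here is a minimal sketch with hypothetical blood-pressure classes.

```python
from itertools import accumulate

# Hypothetical grouped continuous data: systolic BP classes (mmHg)
upper_bounds = [110, 120, 130, 140, 150]   # upper boundary of each class
frequencies = [4, 10, 15, 8, 3]

cumulative = list(accumulate(frequencies))   # running totals = ogive y-values
for bound, cum in zip(upper_bounds, cumulative):
    print(f"< {bound} mmHg: {cum}")
```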

17
Q

Measures of shape (skew)

A

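Skewness:
-skewness coefficient = 0 for a symmetric distribution, negative for left skew and positive for right skew (values beyond about ±1 usually indicate marked skew)

Kurtosis:
-measures how heavy the tails of a distribution are compared with the normal distribution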
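One common skewness coefficient is the moment coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. The sketch below uses the population form with no small-sample correction; software packages may apply a correction, so their values can differ slightly. The example data are made up.

```python
from statistics import fmean

def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population form, no bias correction)."""
    m = fmean(data)
    m2 = fmean((x - m) ** 2 for x in data)   # second central moment (variance)
    m3 = fmean((x - m) ** 3 for x in data)   # third central moment
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))              # 0.0 (symmetric)
print(skewness([1, 1, 1, 2, 2, 3, 9]) > 0)    # True (long right tail: positive skew)
print(skewness([1, 7, 8, 8, 9, 9, 9]) < 0)    # True (long left tail: negative skew)
```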

18
Q

Left skew

A

= negative skew

Lots of large values, with a long tail to the left

Negative -> the peak is further away from the y-axis

19
Q

Right skew

A

=positive skew

Lots of small values, with a long tail to the right

“Right skew, close to you”

20
Q

Distributions

A

Symmetric: classic one humped distribution

Bimodal: two peaks

Multimodal: multiple peaks

21
Q

Kurtosis

A

Measure of the "tailedness" of a distribution

Distributions with large kurtosis exhibit tail data exceeding the tails of the normal distribution (e.g. five or more standard deviations from the mean).

Whereas skewness differentiates extreme values in one tail versus the other, kurtosis measures extreme values in either tail.

Traditionally, for the same variance, higher kurtosis (leptokurtic) is pictured as a sharper peak with heavier tails, and lower kurtosis (platykurtic) as a flatter, broader peak with lighter tails. Strictly, kurtosis reflects the tails rather than the peak

22
Q

Kurtosis value of normal distribution

A

= 3

(excess kurtosis = 0, i.e. excess kurtosis subtracts 3 from the calculation)

(uniform distribution = 1.8, i.e. excess kurtosis = -1.2)
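A minimal sketch of the moment coefficient of kurtosis (m4 / m2^2, population form, no bias correction), applied to evenly spaced values that approximate a uniform distribution and to a seeded pseudo-random normal sample. The sample size and seed are arbitrary choices for illustration.

```python
from statistics import NormalDist, fmean

def kurtosis(data):
    """Moment coefficient of kurtosis m4 / m2**2 (population form, no bias correction)."""
    m = fmean(data)
    m2 = fmean((x - m) ** 2 for x in data)
    m4 = fmean((x - m) ** 4 for x in data)
    return m4 / m2 ** 2

def excess_kurtosis(data):
    """Kurtosis minus 3, so that a normal distribution scores about 0."""
    return kurtosis(data) - 3

# Evenly spaced values in [0, 1] approximate a continuous uniform distribution
uniform_like = [i / 10_000 for i in range(10_001)]
print(round(kurtosis(uniform_like), 2))         # 1.8
print(round(excess_kurtosis(uniform_like), 2))  # -1.2

# A large seeded normal sample should give kurtosis close to 3
normal_sample = NormalDist(0, 1).samples(100_000, seed=1)
print(round(kurtosis(normal_sample), 2))
```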

23
Q

Mode

A

The most frequently occurring value

Useful for categorical data

Of little use for continuous data, where no two values are likely to be the same

24
Q

Median

A

CAN BE USED FOR ORDINAL AND CONTINUOUS DATA

Discards a lot of information

Less affected by skew than the mean

Less affected by outliers than the mean

Therefore the median is a stable (robust) measure
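Replacing one value with an outlier shows why the median is more stable than the mean. The length-of-stay figures below are hypothetical.

```python
from statistics import mean, median

# Hypothetical length-of-stay data (days), then the same data with one outlier
stays = [2, 3, 3, 4, 5]
stays_with_outlier = [2, 3, 3, 4, 50]

print(mean(stays), median(stays))                            # 3.4 3
print(mean(stays_with_outlier), median(stays_with_outlier))  # 12.4 3
```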

25
Mean
Uses all the data: every value is included. Therefore it is affected by outliers and skew. Cannot be used on ordinal data
26
Percentiles
Values that divide a data set into 100 equal-sized groups. To find the position of a percentile in the ordered data, multiply the percentage (as a decimal) by (n + 1), where n is the number of data points
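A sketch of the (n + 1) percentile rule, assuming linear interpolation when the position falls between two ordered values. Positions outside 1..n are clamped to the ends, which is one common convention and our own choice here. As a cross-check, the standard library's `statistics.quantiles` with its default `method="exclusive"` also uses the (n + 1) convention. The IQR then follows as Q3 minus Q1.

```python
import math
from statistics import quantiles

def percentile(data, p):
    """p-th percentile (p as a decimal) at position p * (n + 1) of the ordered data.
    Non-integer positions are linearly interpolated; positions outside 1..n are clamped.
    """
    ordered = sorted(data)
    n = len(ordered)
    pos = min(max(p * (n + 1), 1), n)   # 1-based position, clamped to the data
    lower = math.floor(pos)
    frac = pos - lower
    if lower == n:
        return ordered[-1]
    # Interpolate between the lower-th and (lower + 1)-th ordered values
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(percentile(data, 0.25), percentile(data, 0.5), percentile(data, 0.75))  # 2.5 5.0 7.5
print(quantiles(data, n=4))                                                   # [2.5, 5.0, 7.5]

iqr = percentile(data, 0.75) - percentile(data, 0.25)
print(iqr)  # 5.0
```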
27
Properties of using range
The difference between the lowest and highest values. Not affected by skew. Sensitive to outliers, which may make the range misleading
28
Interquartile range
The range of the middle 50% of the data (upper quartile minus lower quartile), i.e. removes 25% from each end. Reduces the effect of outliers. Affected by skewed distributions
29
Limitations of interquartile range
Discards 50% of the data!
30
Definition of standard deviation
The average distance of the data values from their collective mean (strictly, the square root of the variance). Uses every data point, i.e. all the data (unlike the IQR)
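A minimal sketch of the sample standard deviation, which divides the sum of squared deviations by n − 1 before taking the square root. The result is checked against `statistics.stdev`. The data are made up.

```python
import math
from statistics import stdev

def sample_sd(data):
    """Sample standard deviation: sqrt(sum((x - mean)**2) / (n - 1))."""
    n = len(data)
    m = sum(data) / n
    return math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))

values = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_sd(values), 3))  # 2.138
print(round(stdev(values), 3))      # 2.138
```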
31
Measuring spread with ordinal data
Use the range and IQR. Not the standard deviation, which should only be used on continuous data
32
Why not use median and SD
If you have continuous data, the SD is the measure of choice. However, reporting a median suggests the data are skewed, and the SD (which describes spread around the mean) is then inappropriate; report the IQR instead
33
Mean +/- 1 SD
68% of values in range *for normally distributed data
34
Mean +/- 2 SD
95% of values in range *for normally distributed data (strictly 95.4%; exactly 95% lies within mean ± 1.96 SD)
35
Mean +/- 3 SD
99.7% of values in range *for normally distributed data
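The proportions for mean ± 1, 2 and 3 SD can be checked with the standard library's `statistics.NormalDist`. The sketch takes the difference of the cumulative distribution function at +k and −k SD.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"mean +/- {k} SD: {coverage(k):.1%}")
# mean +/- 1 SD: 68.3%
# mean +/- 2 SD: 95.4%
# mean +/- 3 SD: 99.7%

# The multiplier that captures exactly 95% is about 1.96
print(round(std_normal.inv_cdf(0.975), 2))  # 1.96
```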
36
Testing for normal distribution
Shapiro-Wilk test: used for fewer than 2000 values; gives a p-value where the null hypothesis is that the data are normally distributed. Kolmogorov-Smirnov test: used for more than 2000 values; gives a p-value with the same null hypothesis (normal distribution)
37
Transforming data
Used to make skewed data approximately normally distributed. Common transformations: log (base 10), the most common; square root; reciprocal (1 / value)
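As a minimal sketch, a log10 transform can remove right skew from data that span several orders of magnitude. The data are hypothetical and chosen so that the logged values are exactly symmetric. Skewness is measured with the population moment coefficient.

```python
import math
from statistics import fmean

def skewness(data):
    """Moment coefficient of skewness (population form, no bias correction)."""
    m = fmean(data)
    m2 = fmean((x - m) ** 2 for x in data)
    m3 = fmean((x - m) ** 3 for x in data)
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data spanning several orders of magnitude
raw = [1, 10, 10, 100, 100, 100, 1000, 1000, 10000]
logged = [math.log10(x) for x in raw]   # [0, 1, 1, 2, 2, 2, 3, 3, 4]

print(round(skewness(raw), 2), round(skewness(logged), 2))
```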
38
Incidence rate
Strictly, this is the crude incidence rate: the number of new cases of a disease or event in a defined population over a given time period. = number of new cases / number at risk (over the same time period)
39
Incidence rate ratio
ratio of two incidence rates
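A worked sketch of crude incidence and the incidence rate ratio, using hypothetical one-year counts for an exposed and an unexposed group of equal size.

```python
def incidence_rate(new_cases, at_risk):
    """Crude incidence: new cases / number at risk over the same period."""
    return new_cases / at_risk

# Hypothetical one-year data for two groups
rate_exposed = incidence_rate(new_cases=30, at_risk=1_000)
rate_unexposed = incidence_rate(new_cases=10, at_risk=1_000)

irr = rate_exposed / rate_unexposed   # incidence rate ratio
print(f"{rate_exposed * 1000:.0f} vs {rate_unexposed * 1000:.0f} per 1000 per year; IRR = {irr:.2f}")
# 30 vs 10 per 1000 per year; IRR = 3.00
```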
40
Prevalence
Number of existing cases in a given population at a given point in time (often expressed as a proportion of that population)
41
Crude mortality rate
= number of deaths over a period of time (usually 1 year) divided by the population at the mid-point of that period. Multiplying by 1000 gives the crude mortality rate per 1000 per year
42
Case fatality rate
Number of deaths from a disease in a given time period divided by the total number with the disease over that time period
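The crude mortality and case fatality formulas can be sketched as two small functions. The function names and figures are hypothetical.

```python
def crude_mortality_per_1000(deaths, midyear_population):
    """Deaths in the period (usually 1 year) / mid-period population, x 1000."""
    return 1000 * deaths / midyear_population

def case_fatality_rate(deaths_from_disease, people_with_disease):
    """Deaths from the disease / total with the disease, over the same period."""
    return deaths_from_disease / people_with_disease

# Hypothetical figures
print(crude_mortality_per_1000(deaths=450, midyear_population=50_000))  # 9.0
print(f"{case_fatality_rate(12, 400):.1%}")                              # 3.0%
```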
43
Standardised mortality rate
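Usually expressed as the standardised mortality ratio (SMR): the observed number of deaths in the study population divided by the number expected if it had the age-specific death rates of a standard population (often multiplied by 100)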
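One common way to obtain an SMR is indirect standardisation. A standard population's age-specific death rates are applied to the study population's age structure to get the expected deaths, and the observed deaths are divided by that figure. All rates, age bands and counts below are hypothetical.

```python
# Hypothetical standard age-specific death rates (deaths per person per year)
standard_rates = {"<45": 0.001, "45-64": 0.005, "65+": 0.04}
# Hypothetical study population by age band, and its observed deaths
study_population = {"<45": 20_000, "45-64": 10_000, "65+": 5_000}
observed_deaths = 300

# Expected deaths if the study population experienced the standard rates
expected_deaths = sum(study_population[age] * standard_rates[age] for age in standard_rates)
smr = observed_deaths / expected_deaths

print(f"Expected: {expected_deaths:.0f}, SMR: {smr:.2f} (x100 = {100 * smr:.0f})")
# Expected: 270, SMR: 1.11 (x100 = 111)
```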
44
Properties of a confounding variable:
A confounding variable:
  • is associated (causally or not) with the exposure
  • is causally related to the outcome
  • must not be part of the exposure-outcome pathway
45
Positive confounding
Leads to the effect of the exposure being inflated (overestimated)
46
Negative confounding
Leads to the effect of the exposure being reduced (underestimated)
47
Confounding by indication
Occurs when the clinical indication for selecting a particular treatment (e.g. severity of the illness) also affects the outcome: the indication for the exposure, not the exposure itself, leads to the disease outcome
48
Residual confounding
Residual confounding is the distortion that remains after controlling for confounding in the design and/or analysis of a study. There are three causes of residual confounding:
  1. There were additional confounding factors that were not considered, or there was no attempt to adjust for them because data on these factors were not collected.
  2. Control of confounding was not tight enough. For example, a study of the association between physical activity and a disease outcome might control for confounding by age by a) restricting the study population to subjects aged 30-80 or b) matching subjects by age within 20-year categories. In either case, there might be persistent differences in age between the groups being compared. Residual confounding might also occur in a randomised controlled trial if the sample size was small. In a stratified analysis or a regression analysis, there could be residual confounding because data on the confounding variable were not precise enough, e.g. age was simply classified as "young" or "old".
  3. There were errors in the classification of subjects with respect to confounding variables.
49
Controlling for confounding at design stage
1. Restriction
  • exclude everyone with the confounding exposure (e.g. exclude all smokers)
  • limits generalisation of evidence, e.g. if you exclude all smokers, the results are unlikely to generalise to populations that include smokers
2. Matching
  • method of choice in case-control studies
  • e.g. frequency matching (same proportions)
  • e.g. propensity score matching
3. Randomisation
  • method of choice in RCTs
  • controls for known and unknown confounding
50
Controlling for confounding at analysis
1. Stratification
  • divides the data into strata of the confounder (e.g. with and without the confounding exposure)
  • essentially restriction, but after the event
2. Adjustment
  • regression
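Stratified results have to be combined into one adjusted estimate. One common method, not named on the card, is the Mantel-Haenszel pooled risk ratio. In this hypothetical example, the exposure has no effect within either smoking stratum, but smokers are over-represented among the exposed. The crude risk ratio is therefore inflated (positive confounding), while the stratified estimate is 1.

```python
# Each stratum: (cases_exposed, n_exposed, cases_unexposed, n_unexposed)
# Hypothetical data stratified by a confounder (smoking status)
strata = [
    (40, 80, 10, 20),   # smokers
    (2, 20, 8, 80),     # non-smokers
]

def crude_risk_ratio(strata):
    """Risk ratio after collapsing all strata together (ignores the confounder)."""
    a, n1, c, n0 = (sum(col) for col in zip(*strata))
    return (a / n1) / (c / n0)

def mantel_haenszel_risk_ratio(strata):
    """Pooled risk ratio across strata using Mantel-Haenszel weights."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

print(round(crude_risk_ratio(strata), 2))            # 2.33
print(round(mantel_haenszel_risk_ratio(strata), 2))  # 1.0
```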
51
Reverse causality
The exposure-disease process is reversed: in other words, the outcome causes the exposure (risk factor) rather than the other way round. E.g. lower employment status is linked to causing depression, but it may well be that depression leads to lower employment status
52
Descriptive cross-sectional studies
Do not infer any causality. Usually measure one variable (e.g. prevalence) but can measure several. Generally not subject to confounding if only prevalence is measured. If several things are measured, potential confounding will need to be adjusted for
53
Analytical cross-sectional studies
Attempt to assess potential links between two or more variables at a given time point. Do not infer causality. Need to be adjusted for confounding variables
54
Cross-sectional studies
  • Take one set of measurements from each participant at a SINGLE point in time
  • Used to investigate associations between variables, but NOT causality or direction
  • Not useful if the condition is rare
  • If used to assess opinions or attitudes, referred to as surveys
55
Cohort studies
Pros:
  • Main purpose is to identify whether exposures or risk factors cause a certain disease
  • Several outcomes can be studied for a single exposure
  • A temporal relationship can be established, which adds to the case for causality
  • Suited to rare exposures
  • Less subject to bias and confounding than case-control studies
Cons:
  • Sampling bias
  • Not suited to rare diseases
  • Long follow-up leads to attrition and bias
  • Recall bias in retrospective studies
  • Data quality in retrospective studies
56
Problems with case-control studies
  • Recall bias
  • Suitable cases can be difficult to find
  • Difficult to match patients for each variable
  • Sampling bias of cases +/- controls
  • Definition of a case, e.g. defining cases as GOLD 1 COPD is unlikely to clarify much
57
Ecological studies
Make large-scale comparisons between groups of people (e.g. populations or countries), using group-level rather than individual-level data
58
Statistical inference
Data from the sample inform conclusions about the target population: using sample statistics, we make inferences about the population
59
Sample statistics
A variable measured in a sample = a sample statistic. Sample statistics are used to make inferences about population parameters
60
Sample error
Deviation of a sample statistic from the true value of the parameter in the sampled population; usually unknown
61
Rules of probability
  • The probability of an event occurring lies between 0 and 1
  • 1 is absolute certainty of an event, e.g. everyone will die some day
  • 0 is an impossible outcome, e.g. rolling an 8 on a die labelled 1-6
  • If an event is equally likely to happen as not to happen, its probability is 0.5
  • If p is the probability of an event happening, the probability of the event not happening is 1 - p
62
Proportional frequency
Used to calculate probability in clinical settings where outcomes do not all have an equal chance of occurring, i.e. almost any clinical setting. Proportional frequency states that the probability of an event occurring is equal to the proportion of times that outcome would occur if we repeated the experiment a large number of times
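The idea of proportional frequency can be sketched by simulation. If a fair die is rolled many times, the proportion of sixes settles close to the true probability of 1/6. The seed and number of rolls are arbitrary choices that make the run reproducible.

```python
import random

rng = random.Random(42)   # fixed seed so the run is reproducible
n_rolls = 100_000
sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)

proportion = sixes / n_rolls
print(proportion)   # close to 1/6 (about 0.167)
```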
63
Techniques for randomisation
  • Simple randomisation
  • Block randomisation: ensures that at any given point there are roughly equal numbers in each group
  • Stratification: ensures that key variables (strata) are balanced across the groups
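Block randomisation can be sketched as permuted blocks. Each block contains equal numbers of each arm in a random order, so group sizes never drift far apart. The block size of 4, the arm labels and the function name are hypothetical choices.

```python
import random

def block_randomise(n_participants, block_size=4, arms=("A", "B"), seed=None):
    """Permuted-block allocation: each block holds equal numbers of each arm, shuffled."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)            # random order within the block
        allocation.extend(block)
    return allocation[:n_participants]

seq = block_randomise(12, seed=1)
print(seq)
```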
64
Reducing placebo or response bias
Blinding of participants
65
Problems with cross-over trials
  • Participants may have changed between treatment 1 and treatment 2
  • Does not work for treatments that take a long time to take effect
  • Does not work for self-resolving or acute illness that responds to therapy immediately
  • "Carry-over" effect despite washout periods
66
Hawthorne effect
Change in behaviour resulting from the knowledge of being observed. Some trials do not recruit controls if data collection will not differ, because the Hawthorne effect changes outcomes
67
Intention to treat
The process of analysing the data as if participants are still in their original allocated group, despite loss or changeover of participants
Pros:
  • Maintains baseline characteristics
  • Prevents attrition bias
  • Reflects real-world practice
  • Keeps sample size and power the same
Cons:
  • Requires imputation
  • Can sometimes underestimate effect size
68
Per protocol analysis
Analysis performed according to the treatments received by participants
  • a protocol deviation has taken place
  • balanced baseline characteristics are lost
  • attrition bias is now in action
  • subject to confounding
  • loss of power
  • likely to overestimate effect size, e.g. those most unwell are least likely to tolerate the side effects of a new drug, hence only moderate disease is analysed in the treatment group vs the full spectrum of severity in the control group
69
Cluster sampling
Overcomes the need for a sampling frame. A common sampling technique in randomised controlled trials
  • Units represent GP surgeries, hospitals, schools, clinics etc.
  • Sampling units are a likely place to find a spectrum of participants
  • However, they are not a sampling frame: they do not include everyone eligible, so sampling bias and then selection bias will be introduced
Example: 75 GP surgeries are identified as eligible sampling units, and 25 are randomly selected as the cluster sample. People who aren't registered with a GP have no chance of being included, so sampling probability is not equal
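The GP surgery example can be sketched as a simple random sample of 25 clusters, drawn without replacement from the 75 eligible units. The surgery labels and seed are hypothetical. Only people registered at the selected surgeries can then be recruited.

```python
import random

# Hypothetical sampling units: 75 eligible GP surgeries
surgeries = [f"surgery_{i:02d}" for i in range(1, 76)]

rng = random.Random(7)                          # seeded for reproducibility
cluster_sample = rng.sample(surgeries, k=25)    # 25 clusters, without replacement
print(sorted(cluster_sample))
```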
70
Probability density function
Used to calculate probabilities for continuous variables. The area under the pdf between two values gives the probability that a continuous random variable lies between those values. THIS IS BECAUSE a continuous variable has an infinite number of possible outcomes, so the probability of any single exact value = 0
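To show that probability is the area under the pdf, this sketch numerically integrates the standard normal pdf between −1 and 1 with the trapezoidal rule. The result is compared with the exact answer from the CDF. The standard normal is an assumed example distribution.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)

def area_under_pdf(pdf, a, b, steps=10_000):
    """Trapezoidal approximation of the integral of pdf from a to b."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    total += sum(pdf(a + i * h) for i in range(1, steps))
    return total * h

print(round(area_under_pdf(dist.pdf, -1, 1), 4))   # 0.6827
print(round(dist.cdf(1) - dist.cdf(-1), 4))        # 0.6827
```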
71
Absolute risk
Probability of an outcome occurring in a population with exposure
72
Relative risk
Risk in exposed divided by risk in unexposed. Same as the risk ratio (expressed as a decimal)
73
Risk ratios
Risk in exposed / risk in unexposed. Expressed as a decimal. Can over-inflate the perceived risk, especially when absolute risks are small
74
Relative risk reduction
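Absolute risk reduction / risk in the unexposed (control) group, i.e. (risk unexposed - risk exposed) / risk unexposed = 1 - relative risk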
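A worked sketch that ties the last few cards together: absolute risks, relative risk, absolute risk reduction and relative risk reduction. The counts are hypothetical trial data, with the treated group playing the role of "exposed".

```python
# Hypothetical trial: events / participants in each group
risk_exposed = 20 / 200     # treated group: 10%
risk_unexposed = 40 / 200   # control group: 20%

relative_risk = risk_exposed / risk_unexposed                       # risk ratio
absolute_risk_reduction = risk_unexposed - risk_exposed
relative_risk_reduction = absolute_risk_reduction / risk_unexposed  # = 1 - RR

print(f"RR = {relative_risk:.2f}, ARR = {absolute_risk_reduction:.2f}, "
      f"RRR = {relative_risk_reduction:.0%}")
# RR = 0.50, ARR = 0.10, RRR = 50%
```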