Statistics Flashcards

1
Q

What is nominal data?

A
  • Categories without order
    • eye colour
    • marital status
  • Discrete data
  • Qualitative
  • Non parametric
2
Q

What is ordinal data?

A
  • Ordered categories
    • e.g. Ficat grades
  • Discrete data
  • Qualitative
  • non parametric
3
Q

What is integer data?

A
  • Number of counts
    • papers published
  • Discrete
  • Quantitative
  • parametric or non parametric
4
Q

What is Ratio data?

A
  • Zero at origin
  • value dependent on units
    • e.g. age, distance
  • Continuous data
  • Quantitative
  • Parametric/non parametric
5
Q

What is interval data?

A
  • Distances between units are of known size
    • e.g. hours spent revising
  • Continuous
  • Quantitative
  • Parametric
6
Q

What are the different types of distribution curves?

A
  • Normal distribution: bell-shaped curve
  • Skewed distribution
    • positive
    • negative
  • Leptokurtic distribution
  • Platykurtic distribution
7
Q

What is a skewed distribution?

A
  • Asymmetrical
  • Has a longer tail on one side
  • Positive (tail to the right) or negative (tail to the left)
8
Q

What is used to measure central tendency in a skewed distribution?

A
  • Median or mode
9
Q

What is kurtosis?

A
  • A measure of the relative peakedness or flatness of a distribution compared with the normal distribution
10
Q

What is leptokurtosis?

A
  • Positive kurtosis
  • indicates a relatively peaked distribution
11
Q

What is platykurtosis?

A
  • Negative kurtosis
  • indicates a relatively flat distribution
12
Q

What is the name given to the process by which non-normal data can be normalised in order to allow parametric testing?

A
  • transformation
13
Q

What is mean?

A
  • The average of the data
  • calculated by dividing the sum of all observations by the number of observations
14
Q

What is median?

A
  • The central value of the data
  • used for ordinal data
15
Q

What is the mode?

A
  • The most frequently occurring data value
  • used for nominal data
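The three measures of central tendency from the preceding cards can be checked on a small sample. This is a minimal sketch using Python's standard `statistics` module; the `papers` list is made-up illustrative data, not taken from the deck.

```python
import statistics

# Hypothetical sample: number of papers published by seven trainees.
papers = [1, 2, 2, 3, 4, 5, 11]

mean = statistics.mean(papers)      # sum of observations / number of observations
median = statistics.median(papers)  # central value once the data are ordered
mode = statistics.mode(papers)      # most frequently occurring value

print(mean, median, mode)
```

In this positively skewed sample the single large value pulls the mean above the median and mode, which is why the median or mode is preferred for skewed data.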
16
Q

In perfectly normally distributed data, what is significant about the mean, median and mode?

A
  • They are the same
17
Q

What is the range?

A
  • The interval between the lowest and highest values of the data
  • the range gives little information about how the data are spread between those extremes
18
Q

What are percentiles?

A
  • grouping of data into brackets of 1% (percentiles), 10% (deciles) or, more commonly, 25% (quartiles)
19
Q

What is variance?

A
  • the measure of spread used when the mean is the measure of central tendency
  • variance is the corrected sum of squares about the mean, ie the sum of squared deviations from the mean divided by n - 1
  • variance = Σ(x - mean)² / (n - 1)
20
Q

What is the standard deviation?

A
  • The square root of the variance
  • for reasonably symmetrical, bell-shaped data, the mean ± 1 SD contains roughly 68% of the data, ± 2 SD roughly 95% and ± 3 SD roughly 99.7%
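The variance and standard deviation formulas can be worked through step by step. The sketch below uses a made-up data set and compares the hand calculation with the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import statistics

# Hypothetical observations (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)

# Sum of squared deviations about the mean.
squared_deviations = sum((x - mean) ** 2 for x in data)

# Sample variance: divide by n - 1, not n.
variance = squared_deviations / (len(data) - 1)

# Standard deviation: square root of the variance.
sd = variance ** 0.5

print(variance, sd)
```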
21
Q

What is a normal distribution defined by?

A
  • 2 parameters
  • the mean
  • the standard deviation
  • it is symmetrical, so mode = median = mean
22
Q

What is the coefficient of variation?

A
  • (SD / mean) × 100
  • indicates how big the SD is in comparison with the mean
  • if the SD is high relative to the mean then the data are highly variable
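A short sketch of the coefficient of variation follows. The function name `coefficient_of_variation` and both samples are hypothetical, chosen so that the two samples share a mean but differ in spread.

```python
import statistics

def coefficient_of_variation(values):
    """Return (SD / mean) x 100, ie the SD as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Two made-up samples with the same mean (100) but different spread.
tight = [98, 100, 102]
wide = [50, 100, 150]

print(coefficient_of_variation(tight))
print(coefficient_of_variation(wide))
```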
23
Q

What is the standard error of the mean?

A
  • the SD divided by the square root of the sample size
  • used in relation to a sample rather than the population as a whole
  • the formula does not assume a normal distribution
  • it measures how closely the sample mean approximates the population mean
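The SEM formula is easy to try out. In this sketch the helper name `sem` and the data are illustrative assumptions; repeating the same values four times mimics a larger sample with a similar spread.

```python
import math
import statistics

def sem(values):
    """Standard error of the mean: sample SD / sqrt(sample size)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical sample, then a four-times-larger sample with the same values.
small_sample = [2, 4, 4, 4, 5, 5, 7, 9]
large_sample = small_sample * 4

print(sem(small_sample))
print(sem(large_sample))
```

The larger sample gives a smaller SEM, reflecting a sample mean that is expected to sit closer to the population mean.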
24
Q

What are confidence intervals?

A
  • ranges on either side of a sample mean giving a rapid visual impression of significance
  • a CI comprises the values between the confidence limits, which are set a number of standard errors either side of the estimate
  • for a large sample size, the 95% CI is approx 2 SEMs either side of the mean
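The "approx 2 SEMs" rule for a large sample can be sketched as follows. The function name `ci95` and the data are assumptions for illustration, and the normal approximation used here (a multiplier of about 1.96) is only appropriate for large samples; small samples would need a t-based multiplier instead.

```python
import math
import statistics

def ci95(values):
    """Approximate 95% CI for the mean using the large-sample normal multiplier."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    return mean - z * sem, mean + z * sem

# Hypothetical sample of 100 measurements.
sample = list(range(1, 101))
lower, upper = ci95(sample)
print(lower, upper)
```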
25
Q

Why are CIs preferred to p values?

A
  • CIs relate to sample size
  • a range of values is provided
  • CIs provide a rapid visual impression
  • CIs have the same units as the variables
26
Q

What is a null hypothesis?

A
  • the primary assumption that any difference seen occurred purely by chance
  • collected data are then tested to try to disprove the null hypothesis
  • if the result is significant, the null hypothesis is rejected on the basis that it is unlikely to be true
27
Q

What is the p value?

A
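  • the probability of seeing a difference at least as large as the one observed if it were due to chance alone (ie if the null hypothesis were true)
  • the significance threshold is often set at p = 0.05
  • a p value < 0.05 suggests that the probability of the difference seen being due to chance is less than 5%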
28
Q

What is a type 1 error?

A
  • **When a difference is found**
  • but in reality there is no difference
  • ie a false positive
  • the null hypothesis is incorrectly rejected
  • this is the 5% of cases where the difference occurred by chance
  • ie convicting an innocent man
29
Q

How can we protect against type 1 errors?

A
  • Reducing the significance level (although this increases type 2 errors)
  • a lower p value threshold reduces type 1 errors
  • but bigger sample sizes are then required to protect against type 2 errors
30
Q

What is a type 2 error?

A
  • When no difference is found but in reality a difference does exist
  • ie a false negative
  • therefore the null hypothesis is falsely accepted
  • ie failing to convict a person guilty of the crime
31
Q

What is a type 2 error the result of?

A
  • Small sample size
  • NB it is important to perform a power analysis before undertaking the study
  • adequate statistical power protects against type 2 errors
  • type 2 errors are common in orthopaedic studies
32
Q

What is a type 3 error?

A
  • occurs rarely
  • when the researcher correctly rejects the null hypothesis but incorrectly attributes the cause
  • ie the researcher misinterprets the cause and effect
33
Q

What is a statistical power analysis?

A
  • A method for determining the number of subjects needed in a study in order to have a reasonable chance of showing a difference if one exists
34
Q

What is statistical power?

A
  • the probability of demonstrating a true effect or statistically significant difference
  • 1 - β, where β is the probability of a type 2 error
  • expressed as a %
  • ie the probability that the test will correctly reject the null hypothesis
  • if the power of the experiment is low then there is a good chance that it will be inconclusive or give a type 2 error
35
Q

What are the factors affecting power analysis?

A
  1. Size of the difference between the means
    • the larger the difference, the easier it is to detect and the greater the power
  2. Spread of the data
    • the larger the spread, the less likely a difference will be detected
  3. Acceptable level of significance
    • ie the p value threshold that is set
  4. Sample size
    • power increases with increasing sample size
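The four factors above can be seen working together in a sample size calculation. The sketch below is an assumption on my part rather than a method given in the deck: it uses the common normal-approximation formula for comparing two means, n per group = 2 × ((z for α/2 + z for power) × SD / difference)², and the function name `sample_size_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect `difference` between two means.

    Normal approximation, two-sided test, equal group sizes and equal SD assumed.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return math.ceil(n)

# Illustrative values only: SD of 10 units, differences of 5 and 10 units.
print(sample_size_per_group(difference=5, sd=10))
print(sample_size_per_group(difference=10, sd=10))
print(sample_size_per_group(difference=5, sd=10, power=0.90))
```

A bigger difference needs fewer subjects, while a larger spread, a stricter significance level or a higher target power all need more.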
36
Q

What is an observational study?

A
  • The investigator observes rather than alters events.
  • e.g. review of PE after THR
37
Q

What is an experimental study?

A
  • the investigator applies a manoeuvre and then observes the outcome
  • e.g. a surgeon may conduct an RCT comparing the effect of warfarin and heparin on the prevalence of DVT in pts with THR
38
Q

What are the different study timelines?

A
  • Retrospective study
  • Prospective study
  • Cross sectional study
39
Q

What is a retrospective study?

A
  • The outcome of interest has already occurred and the pt or cohort (group) is followed forward in time from a point in the past
40
Q

What is a prospective study?

A
  • follows pt or cohort forward in time
  • stronger than a retrospective study
41
Q

What is a cross sectional study?

A
  • examines pts or events at 1 point in time without follow-up
  • used when looking at the prevalence of a condition or describing the distribution of variables
42
Q

What are the type 1 and type 2 error rates set at when conducting a study?

A
  • type 1 (α) = 0.05
  • type 2 (β) = 0.20
  • ie power of 80%
43
Q

When performing multiple tests there is a risk of what?

How can this be corrected?

A
  • **increased risk of type 1 error**
  • so the p value threshold may have to be decreased
  • or perform a Bonferroni correction
44
Q

What is the Bonferroni correction?

A
  • if one is testing n independent hypotheses, then one should use a significance level of 0.05/n.
  • e.g. with 2 independent hypotheses a result would be declared significant if p is less than 0.025
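The correction is a one-line calculation. In this sketch the function names and the p values are hypothetical, chosen to mirror the two-hypothesis example on the card.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Significance level to use for each of n independent tests."""
    return alpha / n_tests

def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which p values remain significant at the corrected level."""
    threshold = bonferroni_threshold(len(p_values), alpha)
    return [p < threshold for p in p_values]

# Two independent hypotheses: the corrected threshold is 0.05 / 2.
print(bonferroni_threshold(2))
print(significant_after_bonferroni([0.01, 0.03]))
```

Here p = 0.03 would have passed the uncorrected 0.05 level but fails the corrected one.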
45
Q

What are the features of a parametric test?

A
  • Assumes data were sampled from normal population
  • observations must be independent
  • populations must have the same variance
  • can use absolute difference between data points
  • increased power for a given sample size n
  • rarely exists in orthopaedics
46
Q

What are the features of non parametric tests?

A
  • No assumptions are made about origins of data
  • no limitations on types of data
  • rank order of values
  • less likely to be significant
  • decreased power for a given n
  • cannot relate back to parametric properties of data
47
Q

When is a paired t test used?

A
  • When there is a **pair of observations on a single subject**
  • e.g. blood pressure before and after application of a tourniquet
  • a form of Student's t-test
48
Q

Where there are multiple observations in a normal distribution what test is used?

A
  • One-way analysis of variance (ANOVA)
  • it determines the probability that 2 or more samples were drawn from the same parent population
49
Q

When is an unpaired t test used?

A
  • used to compare 2 random samples provided they both follow a normal distribution
  • samples can be differing size but should be independent
  • ie no chance that a subject could appear in both of the groups being tested
50
Q

When is a chi squared test used?

A
  • for qualitative data
  • used only on actual numbers of occurrences (frequencies), not on proportions, percentages, means or other derived statistics
  • the test compares the distribution of a categorical variable in one sample with its distribution in another sample
  • it assesses whether the observed data differ from those expected by chance
51
Q

What does correlation measure?

A
  • the degree of association between 2 parameters, with the correlation coefficient r lying anywhere between -1 and +1
52
Q

What is Pearson’s coefficient?

A
  • a measure of linear, ie parametric, association
  • if one parameter increases as the other does, then the correlation coefficient is positive
53
Q

On what are the data for a correlation coefficient always plotted?

A
  • A scatter plot
  • if a curved line is needed to express the relationship, then a more complicated measure of correlation must be used, e.g. Spearman’s rank correlation for non-parametric data
54
Q

What is the regression coefficient?

A
  • regression is a straight line drawn over the scatter plot using the equation y = a + bx
  • the regression coefficient b is the slope of the regression line
55
Q

What does regression show?

A
  • shows how one variable changes on average with another
  • it can be used to find out what one variable is likely to be when the other is known
  • regression relationships may be linear, multiple or logistic
56
Q

What does the regression statistic r² show?

A
  • Indicates how much of the variance in the dependent variable is related to variance in the independent variable.
  • ie if knee pain correlates with walking distance with r² = 0.6, then 60% of the variation in walking distance can be explained by variation in knee pain. The remaining 40% of the variability is not explained
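The regression line y = a + bx, Pearson's r and r² can all be computed from the same sums of squares. This is a minimal least-squares sketch; the function name `simple_regression` and the data are made up, chosen so that r² works out to 0.6 as in the example above.

```python
def simple_regression(xs, ys):
    """Return intercept a, slope b and Pearson's r for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # regression coefficient (slope)
    a = mean_y - b * mean_x       # intercept
    r = sxy / (sxx * syy) ** 0.5  # Pearson's correlation coefficient
    return a, b, r

# Hypothetical paired measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r = simple_regression(xs, ys)
print(a, b, r, r ** 2)
```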
57
Q

List the types of study and their levels of evidence.

A
  • Level 1
    • **Meta-analysis/systematic reviews of RCTs**
    • **Randomised controlled trials**
  • Level 2
    • **Prospective cohort study**
    • systematic review of level 2 trials
  • Level 3
    • Case control study
    • Retrospective study
    • Systematic review of level 3 trials
  • Level 4
    • Case series
  • Level 5
    • Expert opinion
58
Q

What are expert opinions?

A
  • What experts in their field have to say on a given subject
  • level 5 evidence
59
Q

What are case series?

A
  • Level 4 evidence
  • the outcomes of the group are reported, but there is no comparison group/ control group
  • weak in relation to causation
  • should act as a stimulus for more powerful studies
60
Q

What are case-control studies?

A
  • retrospective studies where cases with a certain outcome are gathered and then compared with controls that did not have that outcome, in order to look back at the effects of interventions/tx
  • quick and cheap to perform
  • limited by methodological bias
61
Q

What are cohort studies?

A
  • 2 groups, one of which has undergone an intervention or tx, are followed up over time in order to compare outcomes such as onset of disease or adverse effects
  • useful in identifying incidence and establishing relative risk
62
Q

What are the disadvantages of cohort studies?

A
  • Diagnostic access bias- due to preselection
  • expense
  • decreased validity - due to loss to follow up
63
Q

What are RCT?

A
  • Gold standard
  • groups of patients are randomised to either receive or not receive an intervention or tx, and the outcomes are compared in a prospective manner
64
Q

What are the criteria of an RCT?

A
  • Randomisation
  • Generalisability
  • Sample selection
  • Outcome selection
  • Bias
  • Confounding factors
  • Masking/blinding
  • Ethics
  • Publication
  • Sequential analysis
  • Equivalence study
65
Q

What is randomisation?

A
  • it ensures that all prognostic variables both known and unknown will probably be distributed equally amongst the tx groups
  • this avoids bias in treatment assignment
66
Q

What should the outcome measures be?

A
  • Valid
  • Reproducible
  • responsive to change
  • choice of outcome clinically relevant
67
Q

What is intention to treat?

A
  • ie if a subject drops out during the study/tx, the subject should still be included in the analysis in the group to which they were allocated
  • the opposite is per-protocol analysis
68
Q

What is bias?

How can it be reduced?

A
  • Refers to flaws in impartiality that introduce systematic error into the methodology and results of a study
  • Reduced by
    • Randomization
    • masking ( blinding)
    • meticulous attention to study protocol
69
Q

Name the types of bias?

A
  • experimental
    • during either selection or tx
    • reduced by randomisation
  • Observational
    • errors in measurement or classification of disease
    • use of hip and knee scoring systems
  • Patient bias
  • Publication bias
70
Q

What are the confounding factors?

A
  • Independent variables that interfere with the drawing of statistically valid conclusions from a study
  • these factors may not be distributed equally between groups => confounding bias
71
Q

How can confounding bias be reduced?

A
  • Matching e.g. age
  • Stratification
72
Q

What is an equivalence study?

A
  • An RCT in which 2 treatments are expected to have the same outcome
  • the usual hypotheses are reversed: the null hypothesis is that there is a difference between the 2 groups, and the research (alternative) hypothesis is that there is no difference
73
Q

What is the difference between a meta-analysis and a systematic review?

A
  • Meta-analysis: the aim is to find relevant evidence from several studies in an unbiased manner and to appraise each RCT paper for methodological quality. Results are reported as a common estimate with confidence intervals
  • Systematic review: differs in that no common estimate with confidence intervals is reported
  • The Cochrane Collaboration organises and publishes highly detailed systematic reviews in its database.
74
Q

What are the screening criteria?

A
  • “IATROGENIC”
  • Important condition with known Incidence
  • Accepted and effective tx
  • Treatment and diagnostic facilities available
  • Recognizable latent and early symptomatic stages, with consideration given as to whether early pick-up at the latent stage leads to intervention and whether intervention improves outcome
  • Opinions on who to tx are agreed
  • Guaranteed safety, sensitivity and specificity of test
  • Examination &/or tx are acceptable to pt
  • Natural history of condition known
  • Inexpensive tests, simple to perform
  • Cost-effective screening, with a policy drawn up on whom to tx; it should be Continuously rolled out and repeated at intervals
75
Q

What is epidemiology?

A
  • Is the study of frequency and cause of diseases in human populations
76
Q

What is incidence?

A
  • Is the rate of occurrence of new disease in a population previously free of disease
  • the rate is found by dividing the number of new cases in the study period by the number of individuals at risk at the beginning of the study period
77
Q

What is prevalence?

A
  • Is the frequency of a disease at a given time
  • found by dividing the no of patients with the disease by the sum of the number of patients with the disease and the number of patients at risk
78
Q

What is sensitivity?

A
  • The ability of the test to exclude false negatives
  • ie the ability of the test to pick up all cases of disease
  • no of true positives / (true positives + false negatives)
79
Q

What is specificity?

A
  • Ability of the test to exclude false positives
  • ie ability to exclude the disease
  • no of true negatives / (true negatives + false positives)
80
Q

What is the positive predictive value?

A
  • Is the probability that a subject who tests positive is truly positive
  • ie the PPV indicates the significance of a positive test
  • PPV = true positives / (true positives + false positives)
81
Q

What is the negative predictive value?

A
  • is the probability that a subject who tests negative is truly negative
  • ie the NPV indicates the significance of a negative test
  • NPV = true negatives / (true negatives + false negatives)
82
Q

What is accuracy?

A
  • Gives an idea of how often a test is correct
  • (true positives + true negatives) / (true positives + false positives + true negatives + false negatives)
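Sensitivity, specificity, PPV, NPV and accuracy all come from the same 2 × 2 table of test result against true disease status. The sketch below is illustrative: the function name `diagnostic_metrics` and the counts are assumptions, not data from the deck.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute test performance measures from the four cells of a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased subjects correctly detected
        "specificity": tn / (tn + fp),  # healthy subjects correctly excluded
        "ppv": tp / (tp + fp),          # positive tests that are truly positive
        "npv": tn / (tn + fn),          # negative tests that are truly negative
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical screening results for 300 subjects, 100 of whom have the disease.
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=170)
for name, value in metrics.items():
    print(name, round(value, 3))
```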
83
Q

What is odds ratio?

A
  • Used in case control studies
  • is the ratio of the odds that an event will occur in one group to the odds that the event will occur in the other group.
  • OR = (c/d)/( a/b)
  • = cb/ad
84
Q

How is relative risk reduction measured?

A
  • (success rate of tx group - success rate of control group) / success rate of control group
  • Success rate of tx group = c/(c+d)
  • Success rate of control group = a/(a+b)
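The odds ratio and the relative risk reduction formula on these cards share the letters a, b, c and d. The sketch below assumes, from the success rate formulas, that a and b are the control group's successes and failures and that c and d are the treatment group's successes and failures; the counts and function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR = (c/d) / (a/b) = cb/ad: odds of success on treatment vs control."""
    return (c / d) / (a / b)

def relative_risk_reduction(a, b, c, d):
    """(success rate of tx - success rate of control) / success rate of control."""
    control_rate = a / (a + b)
    treatment_rate = c / (c + d)
    return (treatment_rate - control_rate) / control_rate

# Made-up trial: control 20 successes / 80 failures, treatment 40 / 60.
a, b, c, d = 20, 80, 40, 60
print(odds_ratio(a, b, c, d))
print(relative_risk_reduction(a, b, c, d))
```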
85
Q

What is validity?

A
  • is the extent to which a test or outcome measure actually measures what it purports to measure
  • tests have to be precise (consistent on repeated measurement) and accurate (represent what they are meant to represent)
86
Q

What are the different types of validity?

A
  • Construct validity
    • the extent to which a measure corresponds to theoretical concepts or constructs concerning the phenomenon of interest
  • Content validity
    • the extent to which a measure represents the domain of interest
  • Criterion or concurrent validity
    • correlating scores on a new instrument or test with external criteria known or believed to measure the attribute
87
Q

What does reliability assess?

A
  • The random error of a measure
  • important to consider reliability both within the same assessor (intra-observer) and between different assessors (inter-observer)
88
Q

What is kappa analysis?

A
  • Involves adjusting the observed proportion of agreement in relation to the proportion of agreement expected by chance
  • Used for categorical data
  • a value of 1.0= complete agreement
  • a value of 0 = agreement can be explained purely by chance
  • a negative value= systematic disagreement
  • can be weighted or unweighted
  • weighted kappa statistics allow observer agreement to be measured on rank scales, taking into account agreement by chance and bringing the magnitude of disagreement into the calculation
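The chance adjustment in an unweighted kappa can be sketched as (observed agreement - expected agreement) / (1 - expected agreement). The function name `cohen_kappa` and the two observers' ratings below are illustrative assumptions; the sketch covers two observers only and does not implement the weighted version.

```python
from collections import Counter

def cohen_kappa(ratings_1, ratings_2):
    """Unweighted kappa for two observers rating the same subjects.

    Undefined (division by zero) if chance agreement is already 100%.
    """
    n = len(ratings_1)
    observed = sum(r1 == r2 for r1, r2 in zip(ratings_1, ratings_2)) / n
    counts_1 = Counter(ratings_1)
    counts_2 = Counter(ratings_2)
    # Agreement expected by chance, from each observer's category frequencies.
    expected = sum(counts_1[k] * counts_2[k] for k in counts_1) / n ** 2
    return (observed - expected) / (1 - expected)

# Two hypothetical observers classifying ten radiographs.
observer_1 = ["yes"] * 5 + ["no"] * 5
observer_2 = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]

print(cohen_kappa(observer_1, observer_2))
print(cohen_kappa(observer_1, observer_1))
```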
89
Q

What is survival analysis?

A
  • Is the study in which the outcome of an intervention is plotted over time which allows for variable dates of entry and for patients to be followed up for different lengths of time
  • analysed continuously = actuarial method
  • times at failure = Kaplan-Meier product limit method
  • combo of both= life table analysis
90
Q

How do you construct a life table for joint replacements?

A
  • Define end point / outcomes
    • success
    • failure
    • death
    • revision
  • For each joint replacement the number of joints being followed and the no of failures are determined for each year after operation
  • at each time point the no of pts at risk, the no of failures and the no of pts withdrawn (death/LTFU) are recorded.
  • pts who complete the trial and deaths = successful withdrawals / censored data / non-endpoints
  • these don’t count as failures and only affect the no of pts at risk
  • each year the no of pts at risk is calculated as the no of pts at the beginning of the year - 1/2 the no of withdrawals
91
Q

How is the percentage failure rate for each year calculated?

A
  • no of failures/ no of pts at risk during period
92
Q

How is the cumulative estimated survival calculated?

A
  • Cumulative estimated survival = 100% - cumulative probability of failure
93
Q

How is the annual survival rate calculated?

A
  • By cumulating the success rate for all previous years and year in question
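The life-table steps from the preceding cards can be put together in a short sketch. It assumes that "cumulating" the success rates means multiplying each year's success rate by those of all previous years; the function name `life_table` and the yearly counts are made up for illustration.

```python
def life_table(rows):
    """Build a life table from (number at start of year, failures, withdrawals) rows."""
    table = []
    cumulative_survival = 1.0
    for year, (n_start, failures, withdrawals) in enumerate(rows, start=1):
        # Withdrawals are assumed to be at risk for half the year on average.
        at_risk = n_start - withdrawals / 2
        failure_rate = failures / at_risk
        # Assumption: cumulative survival is the product of yearly success rates.
        cumulative_survival *= 1 - failure_rate
        table.append({
            "year": year,
            "at_risk": at_risk,
            "failure_rate": failure_rate,
            "cumulative_survival": cumulative_survival,
        })
    return table

# Hypothetical joint replacement series followed for three years.
rows = [(100, 2, 4), (94, 3, 6), (85, 1, 10)]
table = life_table(rows)
for row in table:
    print(row)
```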
94
Q

When are survival rates measured in a life-table analysis cf a Kaplan-Meier analysis?

A
  • Survival rates are calculated annually in a life-table analysis
  • Recalculated every time a failure occurs in a Kaplan-Meier analysis
  • the steps in the graph represent failures at each time point
95
Q

What is the survivorship curve?

A
  • Is the cumulative estimate of survival plotted with 95% CI
  • upward blips or solid circles are used to represent censored data on graphs
  • when reporting survival analysis one must include 95% CI, best and worst case scenarios and the no of pts left at longer follow-up