Exam 2018 Flashcards
What level in the NHMRC evidence hierarchy are cross sectional studies?
IV
What are some other names for a cross sectional study?
Also known as a cross-sectional analysis, transversal study, or prevalence study
What kind of study is a cross sectional study and what does it do?
- Is observational
• Is descriptive
• Collects data from a population at one specific point in time
How are the groups determined in a cross sectional study?
• Groups determined by existing differences, not random allocation
What are the advantages of a cross sectional study?
- ‘Snapshot’ of a population at one point in time
- Can draw inferences from existing relationships or differences
- Can use large numbers of subjects
- Relatively inexpensive
- Can generate odds ratio, absolute risk, relative risk, and prevalence
- Could combine findings with other research to develop a hypothesis about why the prevalence of a certain disease increases with a given “factor”
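A minimal sketch of how prevalence and an odds ratio could be read off a 2x2 table from a cross sectional survey. The counts and the function name are hypothetical, chosen only for illustration.

```python
def cross_sectional_measures(a, b, c, d):
    """Summarise a 2x2 table from a single survey.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    total = a + b + c + d
    return {
        # Proportion of the whole sample that has the disease right now
        "prevalence": (a + c) / total,
        # Prevalence within each exposure group
        "prevalence_exposed": a / (a + b),
        "prevalence_unexposed": c / (c + d),
        # Odds of disease in the exposed divided by odds in the unexposed
        "odds_ratio": (a * d) / (b * c),
    }

# Hypothetical counts, not taken from any real study
measures = cross_sectional_measures(a=30, b=70, c=10, d=90)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Because exposure and disease are measured at the same time, these figures describe association only, not which came first.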
What are the disadvantages of a cross sectional study?
- Results are static (time bound). No indication of a sequence of events or historical or temporal contexts
- Does not randomly sample
- Cannot establish cause and effect relationships
What are the ethical considerations of cross sectional research?
Must promote
• aims of research (knowledge, truth, and avoidance of error)
• values that are essential to collaborative work (trust, accountability, mutual respect, and fairness)
• public support for research
• moral and social values (social responsibility, human rights, animal welfare, compliance with the law, and public health and safety)
What are the 14 CASP questions for a cross sectional study?
- Did the study address a clearly focused issue?
- Did the authors use an appropriate study design to answer their question?
- Were the subjects recruited in an acceptable way?
- Were the measures accurately measured to reduce bias?
- Were the data collected in a way that addressed the research aims?
- Did the study have enough participants to minimize the play of chance?
- Have the correct statistical methods been selected? Are they clearly described with rationales?
- Was the data analysis sufficiently rigorous?
- Have the authors taken account of the confounding factors in the design and/or analysis phase?
- How are the results presented and what was the main result?
- How precise was the result?
- Is there a clear statement of findings?
- Can the results be applied to the local population?
- How valuable is the research?
What are the five P’s?
Population, Problem, Prevalence, Pos/Neg clinical implications, Proposal
What does Pearson’s correlation coefficient [Rho-ρ] measure?
The linear relationship between two variables, with ρ=0 suggesting no linear relationship (a non-linear relationship may still exist)
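As a rough illustration of why a coefficient of zero only rules out a linear relationship, the sketch below computes Pearson's coefficient by hand for a perfectly linear pattern and for a perfectly curved one. The data and the function name `pearson_r` are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # straight line
curved = [x ** 2 for x in xs]      # U-shaped, clearly related to x

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(f"linear pattern: r = {r_linear:.3f}")
print(f"curved pattern: r = {r_curved:.3f}")
```

The curved data depend entirely on x, yet the coefficient comes out at zero because the relationship is not a straight line.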
What do Pearson’s product–moment correlation analyses measure?
Whether the continuous outcome variables were associated with the set of independent variables
What is the problem with Pearson’s correlation coefficient and Pearson’s product-moment correlation?
They offer only crude linear associations and are unable to adjust for other variables, so multiple linear regression is needed
What is the purpose of regression modelling?
• To investigate whether an association exists between the variables of interest
• To measure the strength (as well as direction) of an association between the variables
• To study the form of relationships
For a continuous outcome, relationships can be examined by linear or non-linear regression models
For a categorical outcome, logistic regression is usually used to examine possible relationships
Consider a linear model with a positive slope: Y=3+2X
►If X=0, then Y = 3 + 2(0) = 3, so the line crosses the Y axis at 3 (the intercept)
►1 unit increase in X results in 2 units increase in Y
►A positive slope (+2) implies an upward sloping line and a positive association
Consider another linear model with a negative slope: Y=3-2X
►If X=0, then Y = 3 - 2(0) = 3, so the line crosses the Y axis at 3 (the intercept)
►1 unit increase in X results in 2 units decrease in Y
►A negative slope (-2) implies a downward sloping line and a negative association
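The two models can be checked numerically with a short sketch. The helper name `predict` is an assumption for illustration.

```python
def predict(intercept, slope, x):
    """Value of Y on the line Y = intercept + slope * X."""
    return intercept + slope * x

# Y = 3 + 2X: intercept 3, each extra unit of X adds 2 to Y
rise = predict(3, 2, 1) - predict(3, 2, 0)

# Y = 3 - 2X: same intercept, each extra unit of X removes 2 from Y
fall = predict(3, -2, 1) - predict(3, -2, 0)

print(predict(3, 2, 0), rise)    # intercept and change per unit, positive slope
print(predict(3, -2, 0), fall)   # intercept and change per unit, negative slope
```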
What are the considerations for Linear regression?
- The response or outcome variable [DV] must be continuous (e.g. weight, balance measure)
- The independent variables can be categorical or continuous, or a combination of both
• In linear regression, we test the null hypothesis of no relationship between the DV and the IV. If β represents the regression coefficient of the IV in the model for the DV:
H0: β = 0
Ha: β ≠ 0 [two-sided]
• Alternative hypothesis can be one-sided (Ha: β>0 or Ha: β<0), depending on the research question.
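A minimal sketch of the two-sided test of H0: β = 0 for a single predictor, assuming the usual t statistic (estimated slope divided by its standard error, with n - 2 degrees of freedom). The data are invented, and the critical value 3.182 is the standard two-sided 5% cut-off for 3 degrees of freedom.

```python
import math

def slope_t_statistic(xs, ys):
    """Return the least-squares slope and its t statistic (slope / SE)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope uses the residual variance on n - 2 df
    std_error = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_error

# Hypothetical data with 5 observations, so df = 3
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, t_stat = slope_t_statistic(xs, ys)
T_CRITICAL_DF3 = 3.182  # two-sided, alpha = 0.05, 3 degrees of freedom
reject_h0 = abs(t_stat) > T_CRITICAL_DF3
print(f"slope = {slope:.2f}, t = {t_stat:.1f}, reject H0: {reject_h0}")
```

In practice a statistics package would report the exact p-value; the sketch only compares the statistic with a tabulated critical value.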
What are the assumptions for Linear Regression?
- The relationship between DV and IVs is linear
- The observations are independent and randomly selected
- Homogeneity of variances – constant variance
- The residuals (differences between observed and predicted observations) are independent and normally distributed
- The effects are additive
- Absence of outliers and multi-collinearity
What are some steps to take before examining the data for relationships?
- Descriptive statistics of all variables
- Distribution of outcome variables using histogram, quantile-quantile (QQ) plot, normality tests
- Scatter plot to examine linearity
- Collinearity diagnostics
- Appropriate transformation for normality if an outcome variable is not normally distributed
- Use median, range, inter-quartile range to summarize non-normal data
If a variable had skewness=0 & kurtosis=0, what would its distribution be?
Normal
• The further the value is from zero, the more likely it is that the variable is not normally distributed.
How will the data be skewed if the mean > median?
positively skewed
How will the data be skewed if the mean < median?
negatively skewed
What would it mean if mean, median and mode were all equal?
Normally distributed variable
What does a positive value of skewness indicate?
A pile-up of scores on the left side of the distribution (positively skewed).
What does a positive value for kurtosis indicate?
A pointy and heavy tailed distribution.
What is leptokurtic?
- pointy, heavy tailed
What is Mesokurtic?
normal
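The sketch below shows one way skewness and kurtosis could be computed from the sample moments. It assumes the cards use excess kurtosis (kurtosis minus 3), which is what makes zero the value for a normal distribution. The data sets are hypothetical.

```python
import statistics

def central_moment(values, k):
    """k-th central moment using the population (divide by n) convention."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    # Subtracting 3 puts the normal distribution at zero
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]  # long tail to the right

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
print(statistics.mean(right_skewed), statistics.median(right_skewed))
```

The right-skewed sample has scores piled up on the left, a positive skewness, and a mean above the median. The small symmetric sample has zero skewness but negative excess kurtosis, so it is flatter than a normal curve.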
What plots can you use to check normality?
Histogram, Box-Whisker, and QQ plots
What are two tests of normality?
Kolmogorov-Smirnov test and Shapiro-Wilk test
What do Kolmogorov-Smirnov test and Shapiro-Wilk test do?
Compare the shape of the sample distribution to the shape of a normally distributed curve
Which test of normality is better for a large sample size?
Kolmogorov-Smirnov test
Which test of normality is better for a small sample size?
Shapiro-Wilk test
What does a non-significant (p>0.05) test suggest in a test of normality?
Suggests that the distribution of the sample is not significantly different from a normal distribution: NORMAL
What does a significant (p≤0.05) test suggest in a test for normality?
That the distribution in question is significantly different from a normal distribution: NOT normal
When is a test of normality particularly important?
Testing normality is especially important in small samples
What is multicollinearity?
- In regression analysis, “multicollinearity” refers to IVs that are correlated with other IVs.
- In presence of multi-collinearity, regression models may not give valid estimates of the individual predictors.
What is Variance inflation factor (VIF)?
Variance inflation factor (VIF) is a measure of how much the variance of the estimated regression coefficient is “inflated” by the existence of correlation among the IVs in the model.
What are the VIF cut off values?
VIF = 1 No correlation among the predictors
• VIF > 4 warrants further investigation
• VIF > 10 are signs of serious multicollinearity
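A small sketch of the standard VIF formula, VIF = 1 / (1 - R²), where R² comes from regressing one IV on the other IVs in the model. The helper names and the R² values are hypothetical. The labels follow the cut-offs on this card.

```python
def vif(r_squared):
    """Variance inflation factor for one IV, given the R-squared obtained
    when that IV is regressed on all the other IVs."""
    return 1.0 / (1.0 - r_squared)

def interpret_vif(value):
    """Apply the cut-offs from the card above."""
    if value > 10:
        return "serious multicollinearity"
    if value > 4:
        return "warrants further investigation"
    return "below the cut-offs"

# Hypothetical R-squared values for three predictors
for r2 in (0.0, 0.8, 0.95):
    value = vif(r2)
    print(f"R2 = {r2:.2f} -> VIF = {value:.1f} ({interpret_vif(value)})")
```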
When is multicollinearity more of a problem?
The larger the correlation among the IVs, the more serious the multi-collinearity problem
Small samples are especially vulnerable to multi-collinearity
How would evidence of homoscedasticity look on a scatter plot?
If the residuals, plotted against the fitted values, are scattered randomly around zero, this is evidence of homoscedasticity
How else can you test homoscedasticity?
In addition, we can use statistical tests to examine the constant variance (homoscedasticity) assumption
What does an insignificant p-value [p>0.05] of the homoscedasticity test tell us?
The constant variance assumption is supported
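One common homoscedasticity test is the studentized (Koenker) form of the Breusch-Pagan test. The sketch below is a hand-rolled, single-predictor version, not a substitute for a statistics package. It regresses the squared residuals on the predictor, uses LM = n x R² as the statistic, and compares it with a chi-square distribution on 1 degree of freedom. All data and function names are invented, and the chi-square approximation is rough for samples this small.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope

def r_squared(xs, ys):
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

def koenker_test(xs, ys):
    """Return (LM statistic, p-value) for constant residual variance."""
    intercept, slope = fit_line(xs, ys)
    squared_residuals = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    # Auxiliary regression: do the squared residuals depend on x?
    lm = max(len(xs) * r_squared(xs, squared_residuals), 0.0)
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(lm / 2))
    return lm, math.erfc(math.sqrt(lm / 2))

xs = list(range(1, 11))
signs = [1 if x % 2 else -1 for x in xs]
constant_spread = [2 * x + s * 1.0 for x, s in zip(xs, signs)]
growing_spread = [2 * x + s * 0.3 * x for x, s in zip(xs, signs)]

lm_const, p_const = koenker_test(xs, constant_spread)
lm_grow, p_grow = koenker_test(xs, growing_spread)
print(f"constant spread: p = {p_const:.3f}")
print(f"growing spread:  p = {p_grow:.3f}")
```

A p-value above 0.05 for the first data set supports the constant variance assumption, while the second data set, whose scatter widens with x, is flagged.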
When would you transform your data?
If the data do not satisfy the assumption of normality, they can be transformed to make them resemble normal data
Does transformation of the data work better for large or small samples?
Transformation works better for large samples
What do you do if the data cannot be transformed?
If the outcome data are not suitable for any transformation to make them normal and/or sample is small, alternative non-parametric options to explore relationships include:
- Spearman rank-correlation coefficients
- Quantile regression
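As an illustration of the first option, Spearman's coefficient can be sketched as Pearson's correlation applied to ranks, with tied values sharing their average rank. The data and helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # always increasing, but not a straight line

print(f"Pearson:  {pearson_r(xs, ys):.3f}")
print(f"Spearman: {spearman_rho(xs, ys):.3f}")
```

Because it works on ranks, the Spearman coefficient reaches 1 for any consistently increasing relationship, and it does not require the outcome to be normally distributed.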
What is the B value in Linear regression?
The change in the dependent variable for each one-unit increase in the independent variable score
How do you check the assumption of linearity for multiple linear regression?
A lack of patterns in the scatterplots of standardised residuals supports the linearity assumption
How do you check the assumption of constant variance for multiple linear regression?
- Plot of the residuals against the predicted values does not show any particular pattern = supports constant variance assumption
- Insignificant p-values of Breusch-Pagan test (p=0.25) and Koenker test (p=0.29) support the constant variance or homoscedasticity assumption
What is the R squared value?
R² = 0.36 suggests that 36% of variability in the dependent variable is explained by the fitted model
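A short sketch of how the B value and R squared fall out of a one-predictor least-squares fit, using R squared = 1 - SSE/SST. The five data points are made up so that the arithmetic is easy to follow.

```python
def fit_and_r_squared(xs, ys):
    """Return (intercept, B, R-squared) for a simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # change in Y per one-unit change in X
    intercept = mean_y - b * mean_x
    # Unexplained variation (around the line) vs total variation (around the mean)
    sse = sum((y - (intercept + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return intercept, b, 1 - sse / sst

# Hypothetical scores
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, b, r2 = fit_and_r_squared(xs, ys)
print(f"intercept = {intercept:.1f}, B = {b:.1f}, R squared = {r2:.2f}")
```

Here B = 0.6 means each extra unit of X goes with a 0.6 unit rise in Y, and R squared = 0.60 means the fitted line explains 60% of the variability in Y.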
What are two methods you can use to transform data (to make normal)?
A power transformation
A cubic transformation
What level study is a cohort study?
III-2
What are five examples of observational studies?
- Cohort
- Case-control
- Cross sectional
- Prevalence
- Case report
What are three components of observational studies?
- Subjects are observed in their natural state.
- The groups of subjects that are compared are self-selected e.g., manual workers versus non-manual workers or subjects with and without disease
- Subjects may be measured and tested (e.g., disease status ascertained) but there is no intervention or treatment (e.g. patients not allocated to different exercise programs, or to new drug or placebo).
What is a cohort longitudinal study?
• A population of subjects is identified by a common link (e.g., living in the same geographical area, working in the same environment, attending the same clinic, diagnosed with the same condition/disease e.g., brain injury). Cohort studies consider factors not under the control of the researcher
• Cohort studies often follow a cohort (people with a shared characteristic) over time and can provide information about long term outcomes. They can also provide predictive information.
What are three ways that you can conduct cohort longitudinal studies?
- The researcher can follow them across time to see what happens to them (e.g., following people with moderate brain injury over time to see what the range of functional outcomes are). This is useful for establishing the natural history of a condition.
- The cohort can be divided at the outset into subgroups of people whose experience is to be compared (e.g., people who have had a coma with brain injury vs those not having a coma). They are followed over time and the incidence of outcomes of interest (e.g., return to work) is compared between groups. This is helpful for considering possible causative factors or for establishing predictive factors.
- Or the cohort may be followed over a set period of time or until the event of interest occurs. The characteristics of those who have the event of interest are then compared with those of people who do not. This helps identify those most likely to develop the outcome
What are two examples of samples in a cohort study?
- Select group (e.g. occupational or professional group)
- Exposure group (persons having exposure to some physical, chemical or biological agent)
When starting out, groups should be free of disease, groups should be equally susceptible to disease and groups should be otherwise comparable
What are some ways of obtaining data on exposure?
• Personal interviews / mailed questionnaire
• Reviews of records
- Dose of drug, radiation, type of surgery etc.
• Medical examination or special test
- Blood pressure, serum cholesterol
• Environmental survey
What are the different types of comparison groups in an exposure cohort study?
• Internal comparison (only one cohort is involved in the study; it is sub-classified and an internal comparison is done)
• External comparison (more than one cohort is in the study for the purpose of comparison, e.g. a cohort of radiologists compared with ophthalmologists)
• Comparison with general population rates (if no comparison group is available, we can compare the rates of the study cohort with those of the general population, e.g. the cancer rate of uranium miners with cancer in the general population)
How can you obtain follow up data in a cohort study?
- Mailed questionnaire, telephone calls, personal interviews
- Periodic medical examination
- Reviewing records
- Surveillance of death records
- Follow up is the most critical part of the study
- Some loss to follow up is inevitable due to death, change of address, migration, change of occupation etc.
What is a major drawback of cohort studies?
Loss to follow-up is one of the draw-backs of cohort studies.
What is the aim of data analysis in cohort studies?
- Calculation of incidence rates among exposed and non exposed groups
- Estimation of risk
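A minimal sketch of these two analysis steps, assuming a cohort split into exposed and unexposed groups that were both disease free at the start. The counts and the function name are hypothetical.

```python
def cohort_risk_summary(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Incidence in each group, plus relative risk and risk difference."""
    incidence_exposed = cases_exposed / n_exposed
    incidence_unexposed = cases_unexposed / n_unexposed
    return {
        "incidence_exposed": incidence_exposed,
        "incidence_unexposed": incidence_unexposed,
        # How many times more likely the outcome is in the exposed group
        "relative_risk": incidence_exposed / incidence_unexposed,
        # Extra cases per person attributable to the exposure
        "risk_difference": incidence_exposed - incidence_unexposed,
    }

# Hypothetical follow-up results: new cases over the study period
summary = cohort_risk_summary(
    cases_exposed=40, n_exposed=1000,
    cases_unexposed=10, n_unexposed=1000,
)
print(f"Incidence (exposed):   {summary['incidence_exposed'] * 1000:.0f} per 1000")
print(f"Incidence (unexposed): {summary['incidence_unexposed'] * 1000:.0f} per 1000")
print(f"Relative risk:         {summary['relative_risk']:.1f}")
print(f"Risk difference:       {summary['risk_difference'] * 1000:.0f} per 1000")
```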
What are the strengths of cohort studies?
- Can determine incidence rate and risk
- Can study more than one disease related to a single exposure
- Can establish cause and effect
- Good when exposure is rare
- Minimizes selection and information bias
What are the weaknesses of cohort studies?
• Losses to follow-up
• Often requires a large sample
• Ineffective for rare diseases
• Long time to complete
• Expensive
• Ethical issues
What is internal validity?
Internal validity:
- Ability to reduce confounding factors
- so no other variables, except the one you are studying, caused the results
What are the three main focus points of a study question?
- The population studied
- The outcomes considered
- The prognostic factors/predictors of interest
What is an inception cohort?
Inception cohort = designated group of people assembled at a common point in time early in the development of the disorder
How do you assess for bias in the recruitment of subjects to a cohort study?
Was there an inception cohort?
Was the cohort representative of a defined population? Was everybody included who should have been included?
What are some ways to assess for bias in terms of measurement in a cohort study?
• Did they use subjective or objective measurements?
• Do the measurements truly reflect what you want them to (have they been validated)?
• Were all the subjects classified into exposure/condition groups using the same procedure?
• Were the measurement methods similar in the different groups?
• Were the subjects and/or the outcome assessor blinded to exposure/condition (does this matter)?
How would you assess power of a cohort study?
- Did the authors present power analysis/sample size calculation with adequate information?
- Was the sample size calculation based on pilot data or assumptions?
NOTE: cohort studies generally need large sample sizes
How would you assess for confounding factors?
Look for restriction in design, matching, and techniques e.g. modelling, stratified analysis, or sensitivity analysis to correct, control or adjust for confounding factors.
What is continuous vs. dichotomous data vs polychotomous data?
Continuous = data that can take any value in a range
Dichotomous = two options, e.g. yes/no
Polychotomous = more than two options
What to consider for continuous data in cohort study review?
• What is the expected value (e.g. mean) at the time point of interest?
• Have data for prognostic factors/possible predictors been provided?
• Are the results statistically significant? How strong is the association (e.g., effect size)?
What to consider for dichotomous data in cohort study review?
• What is the expected rate/proportion of the outcome between those exposed/unexposed (or the ratio/absolute rate difference) at the time point of interest?
• Have data for prognostic factors/possible predictors been provided?
• Are the results statistically significant? How strong is the association (e.g., effect size)?