Biostats Flashcards
What is correlation?
A statistical technique used to measure the relationship between 2 variables
When performing a correlation analysis, do you manipulate the variables that you are studying?
No - the variables are usually not manipulated. They are simply observed as they exist naturally in the environment.
What is the value of using a scatter plot to look at correlations?
It allows you to see patterns and trends that exist in the data
What 3 characteristics are measured by correlation between 2 variables?
Direction
Form
Strength/consistency
What is meant by the direction of a correlation?
Whether the correlation is positive (as X increases, so does Y) or negative (as X increases, Y decreases)
What is meant by the form of a correlation?
Whether the relationship follows a straight line (linear) or a curved line
What is meant by the strength/consistency of a correlation?
How closely the data points fit the line
Correlation of 1 (or -1) = perfect fit
Correlation of 0 = no fit

What 4 things can correlations be used for?
Prediction
Validity
Reliability
Theory validation
How can correlation be used to predict?
If 2 variables have a reliable relationship, then one can be used to predict the other
How can correlation be used to validate?
If X is supposed to be related to Y, then X and Y should be correlated
How can correlation be used to show reliability?
If X and Y have a strong and reliable relationship, then X and Y should be strongly correlated.
How can correlation be used for theory validation?
The prediction made by the theory can be tested by determining the correlation between the 2 variables
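What is the Pearson correlation?
Measures the degree and direction of a linear relationship between two variables
What is the equation for the Pearson correlation?
r = SP / sqrt(SS_X * SS_Y), where SP is the sum of products of deviations for X and Y, and SS_X and SS_Y are the sums of squared deviations for X and for Y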
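A minimal Python sketch of how a Pearson correlation can be computed from deviation scores. The function name `pearson_r` and the scores are made up for illustration; they are not from the flashcards.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from the sum of products and sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SP: sum of products of deviations (how X and Y vary together)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # SS: sum of squared deviations (how X and Y vary separately)
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Invented scores for illustration only
x = [1, 2, 3, 4, 5]
y_pos = [2, 4, 6, 8, 10]   # perfectly linear, increasing
y_neg = [10, 8, 6, 4, 2]   # perfectly linear, decreasing

r_pos = pearson_r(x, y_pos)
r_neg = pearson_r(x, y_neg)
print(r_pos, r_neg)
```

The sign of the result gives the direction of the relationship, and its distance from 0 gives the strength of the linear fit.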

True/False: Correlation implies causation.
False
True/False: Correlation values can be greatly affected by the range of scores in the data.
True
True/False: Outliers have very little effect on a correlation value.
False. Outliers can have a dramatic effect.
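To see how much a single outlier can move a correlation, here is a small Python sketch. The scores and the outlier at (20, 20) are invented for illustration, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed from deviation scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Invented scores with a weak negative relationship
x = [1, 2, 3, 4, 5]
y = [3, 5, 1, 4, 2]
r_without = pearson_r(x, y)

# The same scores plus one extreme point at (20, 20)
r_with = pearson_r(x + [20], y + [20])

print(f"r without outlier: {r_without:.2f}")
print(f"r with outlier:    {r_with:.2f}")
```

With these made-up numbers, one added point turns a weak negative correlation (about -0.30) into a strong positive one (about 0.95).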
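True/False: A correlation can be interpreted as a proportion. Ex: r = 0.5 means that predictions can be made with 50% accuracy.
False. It is the squared correlation (r²) that describes the proportion of variability in one variable that is predicted by the other, so r = 0.5 corresponds to 25%, not 50%.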
What is regression?
A procedure that identifies and defines the straight line that provides the best fit for any specific set of data. The resulting line is called the regression line.
What does the regression line represent?
The central tendency of the relationship, or simplified description of the relationship. It can be used for prediction.
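A minimal sketch of fitting a least-squares regression line and using it for prediction. The name `fit_line` and the data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line Y = slope * X + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    # The best-fit line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented scores for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(x, y)
predicted_at_6 = slope * 6 + intercept   # prediction for a new X value
print(slope, intercept, predicted_at_6)
```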
What is multiple regression?
The process of using several predictor variables to help obtain a more accurate prediction
What is the equation for multiple regression lines (for 2 predictor case)?
Y = m1X1 + m2X2 + b
What is the goal of multiple regression?
To produce the most accurate estimated values of Y
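For the 2 predictor case, the coefficients in Y = m1X1 + m2X2 + b can be found by solving the least-squares normal equations. The sketch below does this directly in plain Python; the function name and the data are hypothetical, and the Y values were built from m1 = 2, m2 = 3, b = 1 so the result is easy to check.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares m1, m2, b for Y = m1*X1 + m2*X2 + b."""
    n = len(y)
    mean_1, mean_2, mean_y = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - mean_1 for v in x1]
    d2 = [v - mean_2 for v in x2]
    dy = [v - mean_y for v in y]
    # Sums of squares and sums of products of the deviation scores
    ss_1 = sum(a * a for a in d1)
    ss_2 = sum(a * a for a in d2)
    sp_12 = sum(a * b for a, b in zip(d1, d2))
    sp_1y = sum(a * b for a, b in zip(d1, dy))
    sp_2y = sum(a * b for a, b in zip(d2, dy))
    # Solve the 2x2 system of normal equations (fails if X1 and X2 are perfectly collinear)
    det = ss_1 * ss_2 - sp_12 ** 2
    m1 = (sp_1y * ss_2 - sp_12 * sp_2y) / det
    m2 = (sp_2y * ss_1 - sp_12 * sp_1y) / det
    b = mean_y - m1 * mean_1 - m2 * mean_2
    return m1, m2, b

# Invented data: Y was generated exactly as 2*X1 + 3*X2 + 1
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]

m1, m2, b = fit_two_predictors(x1, x2, y)
print(m1, m2, b)
```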
What is the standard error of estimate for regression lines?
Gives a measure of the standard distance between the regression line and the actual data points.
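A short sketch of one common way to compute the standard error of estimate for a simple regression line: the square root of the residual sum of squares divided by n - 2. The data and the function name are invented for illustration.

```python
import math

def standard_error_of_estimate(xs, ys):
    """sqrt(SS_residual / (n - 2)) for a simple least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    # Residual: vertical distance between each actual Y and its predicted Y
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_residual / (n - 2))

# Invented scores for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

se = standard_error_of_estimate(x, y)
print(round(se, 3))
```

A smaller value means the data points sit closer to the regression line.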

What are the 4 basic steps of hypothesis testing?



What do parametric and non-parametric mean?
Parametric = the test assumes the data follow a specific distribution, usually the normal distribution
Non-parametric = the test does not assume normally distributed data


If you are comparing two proportions, what statistical tests are available to you?
Small sample size -> Fisher’s exact test
Large sample size -> Chi-square test
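A rough sketch of both tests on a 2x2 table, using only the standard library. The cutoff used here (any expected cell count below 5 counts as a small sample) is a common rule of thumb and an assumption, not something stated in these cards. The table values and function names are also invented.

```python
from math import comb

def expected_counts(table):
    """Expected cell counts for a 2x2 table if the two proportions were equal."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over the four cells."""
    exp = expected_counts(table)
    return sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))

def fisher_exact_p(table):
    """Two-sided Fisher's exact p-value from hypergeometric probabilities."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Probability of a table with k in the top-left cell, margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is as likely or less likely than the observed one
    return sum(prob(k) for k in range(low, high + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

# Invented counts: rows are groups, columns are outcome yes / no
table = [[3, 1], [1, 3]]

small_sample = min(min(row) for row in expected_counts(table)) < 5
chi_square = chi_square_statistic(table)
fisher_p = fisher_exact_p(table)
print(small_sample, chi_square, round(fisher_p, 4))
```

Here every expected count is 2, so under the assumed rule of thumb this table would be treated as a small sample and Fisher’s exact test would be the one to report.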

If you are comparing means from different samples, what is the next question you need to ask about your data?
Are the samples dependent (e.g. compare the waist circumference of the same patients before a diet intervention to their waist circumference after the intervention) or independent (e.g. compare the average waist circumference of a control group who did not receive the diet intervention to the average waist circumference of a group who did receive the intervention)?

For a parametric, independent sample, what test would be performed to compare the sample means in the following scenarios:
- 1 sample mean vs. the population mean
- 2 sample means
- > 2 sample means

With regard to t-tests, independent is synonymous with […] and dependent is synonymous with […]
Unpaired
Paired
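The difference between a paired (dependent) and an unpaired (independent) t-test shows up in how the t statistic is built. The sketch below computes both on the same invented numbers; the function names are hypothetical, and the unpaired version assumes equal variances (pooled variance).

```python
import math
from statistics import mean, stdev, variance

def paired_t(before, after):
    """Paired t: works on the difference score for each subject."""
    diffs = [b - a for b, a in zip(before, after)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error

def unpaired_t(group1, group2):
    """Unpaired t with pooled variance: treats the two groups as unrelated."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error

# Invented waist measurements for illustration only
before = [100, 102, 98, 105, 110]
after = [98, 99, 97, 101, 107]

t_paired = paired_t(before, after)      # same patients measured twice
t_unpaired = unpaired_t(before, after)  # as if the groups were different people
print(round(t_paired, 3), round(t_unpaired, 3))
```

On these made-up numbers the paired statistic is much larger, because pairing removes the differences between individuals from the error term.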
For a non-parametric, independent sample, what test would be performed to compare the sample means in the following scenarios:
- 2 sample means
- > 2 sample means

For a non-parametric, dependent sample, what test would be performed to compare the sample means in the following scenarios:
- 2 sample groups
- > 2 sample groups

For a parametric, dependent sample, what test would be performed to compare the sample means in the following scenarios:
- 2 sample groups
- > 2 sample groups

Identify the type of statistical test that would be most appropriate for these scenarios.


When do you need to use an ANOVA test?
When you want to compare more than 2 means or more than 2 groups
Identify the type of statistical test that would be most appropriate in these scenarios.


The word “Factor” when used to refer to ANOVAs is synonymous with [independent or dependent] variable.
Independent
What statistical test would be most appropriate in this scenario?


What is a level of a factor?
The degree of manipulation of the factor.
College students were asked to estimate another student’s willingness to help load a sofa into a van in return for a cash payment of either $0.50 or $5, or candy of equivalent value (low amount or high amount).
In this example, what are the factors and levels?
Factor: Type of payment
- Levels: cash, candy
Factor: Amount of payment
- Levels: low ($0.50 or low candy value), high ($5 or high candy value)
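One way to picture factors and levels is to list every combination of levels, which gives the treatment conditions of the design. A small sketch for the example above; the dictionary keys are made-up labels.

```python
from itertools import product

# Each factor (independent variable) maps to its levels
factors = {
    "payment_type": ["cash", "candy"],
    "payment_amount": ["low", "high"],
}

# Every combination of one level per factor is one treatment condition
conditions = list(product(*factors.values()))

for payment_type, payment_amount in conditions:
    print(payment_type, payment_amount)
```

Two factors with two levels each give 2 x 2 = 4 conditions.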




For an ANOVA test, what does between treatments variance measure? What about within treatments variance?

An ANOVA test allows you to isolate the […] while taking into account individual difference and chance.
Treatment effect
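A minimal sketch of how the F ratio for a one-way ANOVA can be computed by hand, assuming independent groups. Between-treatments variance is built from how far each group mean sits from the grand mean; within-treatments variance is built from how far scores sit from their own group mean. The scores and the function name are invented for illustration.

```python
def one_way_anova_f(groups):
    """F ratio = between-treatments variance / within-treatments variance."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    k = len(groups)            # number of treatment groups
    n_total = len(all_scores)  # total number of scores

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Differences between group means and the grand mean
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Differences between scores and their own group mean
        ss_within += sum((score - group_mean) ** 2 for score in group)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Invented scores for three treatment groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_ratio = one_way_anova_f(groups)
print(f_ratio)
```

For these made-up groups the between-treatments variance is 13 and the within-treatments variance is 1, so the F ratio is 13.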
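What does an F ratio of 1 mean?
The between-treatments variance is about the same size as the within-treatments variance, so there is no evidence of a treatment effect.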

What is “post-hoc” testing and when is it done?
Post hoc tests are additional hypothesis tests that are done after an ANOVA to determine which mean differences are significant and which are not. It is necessary when an ANOVA returns a significant F-ratio because you cannot determine from that alone which sample groups are significantly different and which are not.
Data that are considered nominal or ordinal are [categorical or quantitative]
Categorical
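For proportional data, how do you know when to use Fisher’s exact test vs. the chi-square test?
Use Fisher’s exact test when the sample size is small and the chi-square test when the sample size is large.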
