Lecture 2 - relationships in research Flashcards
Relationships in the research inform what we do.
EX: If I know that increased hip ROM = reduced falls w/ stair navigation for some diagnosis, I’m going to work directly on hip ROM to decrease fall risk
Measures the strength of association between 2 or more variables
* How related two things are
Correlation (does not = causation)
If one goes up the other goes up
For example, grip strength and fall risk are correlated. More grip strength = less fall risk. However, grip strength itself in no way helps you not fall, so their being correlated isn’t the cause of decreased falls
* However, overall deconditioning affects both of these factors - if I’m a deconditioned individual I’m most likely not going to have good grip strength, and my fall risk is going to increase because I’m not improving strength/challenging balance
* So grip strength and fall risk are related; however, they don’t directly affect one another (correlation does not = causation)
r = 1 means a perfect positive relationship: as one variable increases, the other increases
r = -1 means a perfect negative relationship: as one increases, the other decreases (still a relationship)
* negative relationship
r = .14 is barely any relationship (norms shown later)
How strong or weak the relationship between two independent variables is
Correlation
Are they in a close relationship where, as one increases, the other increases/decreases, or do they have no effect on each other?
* NOTE: it’s still a relationship if one increases at the same time the other decreases and vice versa. It’s not a relationship when there’s no pattern found
Pearson’s product-moment correlation (r): Defines the magnitude and direction of a LINEAR relationship
r = 0 means no relationship
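NOTE (not from the lecture): a minimal Python sketch of how r is calculated, to make the r = 1, r = -1 and r = 0 cases concrete. The function name `pearson_r` and the toy numbers are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How X and Y vary together (numerator) vs. how much each varies alone (denominator)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up toy data (hypothetical, not from any study)
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # Y rises in step with X
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # Y falls in step with X
r_none = pearson_r(x, [3, 1, 4, 1, 3])   # no linear pattern

print(r_up, r_down, r_none)
```

The sign of r gives the direction and the distance from 0 gives the strength.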
Correlation means the two are related, not that one causes the other
* So, when you’re looking at a study you want to know if they controlled for confounding factors (something that would influence this relationship)
Grip strength does not directly reduce the risk of falling, but deconditioning level does affect it.
* So it may look like decreased grip strength is causing increased fall risk; however, it’s the deconditioning, and they would need to control for this.
* Since grip strength and falling have an indirect relationship, we can still use that relationship to quantify the risk of falling by getting a numerical value for grip strength (essentially measuring deconditioning via grip strength, and deconditioning has a direct relationship with fall risk)
* We don’t really have a good deconditioning measurement, so we can use the quantitative value of grip strength for this
Confounding factors: variables that affect both the independent variable (what is being studied) and the dependent variable (the outcome), making it difficult to determine the true relationship between them
* These factors can give misleading results because they introduce bias, suggesting a false association or masking a real one
EX: Imagine a study is trying to determine if drinking coffee leads to better job performance. The researchers find that people who drink coffee tend to perform better at work. However, a confounding factor could be sleep habits. People who drink coffee might also sleep less or have different energy levels, which could influence their job performance.
* In this case, it’s unclear if the improved job performance is due to the coffee itself or the fact that these people have different sleep patterns. To draw accurate conclusions, the researchers would need to control for sleep habits to separate the effect of coffee on job performance.
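NOTE (not from the lecture): a rough simulation sketch of what “controlling for a confounder” can look like, using the grip strength / fall risk / deconditioning idea. All data here are simulated and made up, and partial correlation is just one common way to control for a third variable; it is an assumption on my part, not the method named in class.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the run is repeatable
n = 2000

# Simulated confounder: deconditioning drives BOTH measures.
decond = [random.gauss(0, 1) for _ in range(n)]
grip = [-d + random.gauss(0, 1) for d in decond]      # more deconditioned -> weaker grip
fall_risk = [d + random.gauss(0, 1) for d in decond]  # more deconditioned -> higher fall risk
# Note: grip never appears in the fall_risk formula, so there is no direct effect.

r_gf = pearson_r(grip, fall_risk)
r_gd = pearson_r(grip, decond)
r_fd = pearson_r(fall_risk, decond)

# Partial correlation: grip vs fall risk with deconditioning held constant
r_partial = (r_gf - r_gd * r_fd) / math.sqrt((1 - r_gd ** 2) * (1 - r_fd ** 2))

print("raw r (grip vs fall risk):", round(r_gf, 2))
print("r after controlling for deconditioning:", round(r_partial, 2))
```

The raw correlation comes out clearly negative, but it shrinks toward 0 once deconditioning is held constant, which is the pattern you would expect when a confounder is doing the work.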
KNOW: correlation r can be used to measure effect size and estimate power or sample size
Effect size = amount of effect the independent variable has on the dependent variable
r = 0.1 indicates a weak effect size (the independent variable barely affects the outcome or dependent variable)
r = 0.8 represents a strong effect size (the independent variable strongly impacts the outcome or dependent variable)
The closer r is to 1 or -1, the stronger the effect size, meaning the independent variable has a larger influence on the dependent variable
Power is the probability of detecting an effect if there is one, while sample size refers to the number of participants needed in a study.
Higher correlation (r) values typically require fewer participants to detect an effect because the relationship between the variables is stronger
Lower correlation (r) values require larger sample sizes to detect an effect because the relationship is weaker and harder to observe
To estimate power or sample size, researchers use correlation (r) in power analysis. The stronger the correlation (effect size), the fewer participants are needed to achieve a high level of power
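NOTE (not from the lecture): a minimal sketch of how an expected r can be turned into a sample size estimate. It uses the Fisher z approximation, which is one standard shortcut; the function name `n_for_correlation` and the defaults (two-tailed α = 0.05, power = 0.80) are my assumptions, and real power software may give slightly different numbers.

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a true correlation r (two-tailed test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # cutoff for the chosen alpha
    z_power = NormalDist().inv_cdf(power)          # cutoff for the chosen power
    fisher_z = math.atanh(abs(r))                  # Fisher z transform of r
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)

# Weak to strong expected correlations
for r in (0.1, 0.3, 0.5, 0.8):
    print("r =", r, "-> about", n_for_correlation(r), "participants")
```

Running it shows the trend from the cards: a weak expected r needs hundreds of participants, while a strong r needs only a handful.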
r values:
* Strong =
* Moderate to good =
* Low to Fair =
* Little to no relationship =
Remember these values can be positive or negative depending on the direction of the relationship (anywhere from r = -1 to r = 1)
Body weight and exercise time per week is a positive or negative relationship?
Negative
Increased exercise = Decreased body weight
As one variable increases the other decreases
Exercise intensity and heart rate is a positive or negative relationship?
Positive
Increased intensity = Increased HR
As one variable increases the other increases
Assumptions of correlation:
With correlation, do we assume a normal distribution or a non-normal distribution? (w/ graph)
Normal
Think a bell curve
* Natural phenomena (like height, weight, and test scores) tend to follow this normal distribution
Assumptions of correlation:
* Each subject contributes a score for the X and Y axis
What does this mean in the study below?
It means we know both their age and their strength
It also means that if anyone dropped out of the study, their incomplete data should not be included
* Say you got the age but never got a strength measurement; that subject’s data shouldn’t be included
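NOTE (not from the lecture): a small sketch of the “each subject contributes both an X and a Y score” rule in practice. The records and field names are made up; `None` marks a measurement that was never collected.

```python
# Made-up records (hypothetical subjects, not from the study)
subjects = [
    {"id": 1, "age": 67, "strength": 31.5},
    {"id": 2, "age": 72, "strength": None},    # dropped out before strength testing
    {"id": 3, "age": 58, "strength": 40.2},
    {"id": 4, "age": None, "strength": 35.0},  # age never recorded
]

# Keep only subjects who contribute BOTH an X (age) and a Y (strength) score
complete = [s for s in subjects
            if s["age"] is not None and s["strength"] is not None]

ages = [s["age"] for s in complete]
strengths = [s["strength"] for s in complete]

print(len(complete), "of", len(subjects), "subjects usable")
```

Only the paired lists `ages` and `strengths` would go into the correlation, so every X lines up with a Y from the same person.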
Assumptions for correlation
* X and Y are independent measures
Meaning they can be related (which is why we’re doing the study, to see how strongly related they are) but one can’t be a part of the other
EX: If I’m doing a study on BMI, I shouldn’t study BMI vs height, because height is literally a part of BMI (weight/height²)
* Of course those things are related; one influences the other directly
* This serves no value or purpose
EX: We couldn’t do gait speed and distance traveled
* Because gait speed = distance/time and distance is literally a part of gait speed (very interrelated)
* Distance is going to directly affect it
EX: A good one would be gait speed vs fall risk
* They’re related, but one does not directly influence the other
* One is completely independent of the other
Dichotomous
Type of question that offers only two possible answers (think yes or no questions)
* either/or, there’s two options
Assumptions of correlation
* X values are observed
X values are observed: This means you collect data on X, a variable of interest, without manipulating it. X could be something like age, weight or income.
Y can be the intervention: In some studies Y refers to an intervention or treatment that you apply to see if it affects X. For example, Y could be a medication, and you’re interested in seeing how it affects BP (X)
X is the outcome: In other contexts, X might be the outcome you’re measuring after applying Y. For instance, you apply an intervention (Y) and then observe the outcome (X), such as changes in behavior or health status
Both X and Y can also be observed. This means that in many studies, both variables are simply measured without any intervention. For example, you might observe the relationship between height (X) and weight (Y) in a population. Here both X and Y are observed, and you’re looking at how they correlate naturally without any experimental manipulation
Sometimes Y is an intervention or treatment, and X is the outcome you’re interested in
In other cases, both X and Y are just observed variables, and you study how they relate to each other without manipulating them.
X = the dependent variable (for when Y is the intervention)
Both would be observed in gait speed and fall risk
* There’s no intervention being implemented here, just observation.
X is always some observed measure
Assumptions in correlation:
* The relationship must be linear - specifically for Pearson’s product-moment correlation (r)
5 assumptions in correlation
1) Normal distribution
2) Each subject contributes a score for the X and Y axis
3) X and Y are independent measures
4) X values are always observed
5) The relationship must be linear (r)
NOTE: This is an example of a non-linear relationship. You can see the r value is low because it’s non-linear
When they use a non-linear line of best fit (like below), they will state what it was.
However, for plain correlation (Pearson’s r), we must assume a linear relationship
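NOTE (not from the lecture): a quick sketch of why r comes out low for a non-linear pattern. The numbers are made up: Y depends perfectly on X (Y = X squared), but the pattern is a U shape instead of a straight line.

```python
import math

def pearson_r(x, y):
    """Pearson correlation for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up U-shaped data: Y is completely determined by X, just not linearly
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_curved = pearson_r(x, y)
print("r for the U-shaped data:", r_curved)
```

Here r is 0 even though the two variables are clearly related, which is why a low r only tells you there is no LINEAR relationship.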
KNOW: in the study below they wanted to see if there was a relationship between cognitive function and ambulation ability
Null hypothesis = correlation is 0 (no relationship between variables; each one is independent and they do not affect each other at all)
* H0: ρ = 0
Alternative hypothesis = correlation is not 0, there is some sort of relationship there
* H1: ρ ≠ 0
NOTE: we use ρ instead of r because the hypotheses are statements about the population; r is the value calculated from our sample and used to estimate ρ
* Because the data are normally distributed, the sample can be taken to represent the general population (think bell curve)
r = 0.348 (low to fair)
* slight positive correlation (not very strong, kind of all over the place [look at graph below])
What is a null hypothesis?
Assumes there is no effect or relationship between variables
It serves as a default starting position
EX: If you’re studying whether a new drug lowers BP, the null hypothesis would state: “The new drug has no effect on blood pressure.” This means you’re starting with the assumption that the drug does not work (meaning the variables had no effect on each other)
The goal of research is to collect data and analyze it to either
1) Reject the null hypothesis (meaning there is evidence that supports an effect or relationship exists)
2) Fail to reject the null hypothesis (meaning the data does not provide strong enough evidence to conclude an effect or relationship exists)
TEST: She’s going to ask us what a graph generally looks like, whether it’s a strong correlation, and whether it’s a positive or negative correlation
* And whether it matches the r value given
It should make sense in the table below that cognition level vs cognition level has r = 1
* It’s the same variable
Cognition levels vs Ambulation r = 0.348 (can see below that its not a strong relationship)
Significance (2-tailed): The p-value, telling us how likely it is that a correlation this far from 0 would show up by chance if there were truly no relationship
* 0.001 < 0.05 = significant relationship
* Meaning there’s only about a 0.001 (0.1%) chance of seeing an r like this if the true correlation were 0 - so we can be pretty sure the relationship is real
* This is important: significance tells us the relationship is real, not that it’s strong. Going by r = 0.348, it’s a real but low-to-fair relationship
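NOTE (not from the lecture): a rough sketch of where a significance value comes from, showing that the same r = 0.348 can be significant or not depending on how many subjects there are. This uses the Fisher z approximation with the normal distribution (stats software typically uses a t-test instead, so values will differ a little). The sample sizes below are hypothetical; the study’s n is not given in these notes.

```python
import math
from statistics import NormalDist

def approx_p_value(r, n):
    """Approximate two-tailed p-value for H0: rho = 0, via Fisher's z transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

r = 0.348  # the r value from the notes
for n in (10, 30, 90):  # hypothetical sample sizes, for illustration only
    print("n =", n, "-> p is about", round(approx_p_value(r, n), 4))
```

With few subjects the same r is not significant; with more subjects it is. The strength of the relationship (r) and our confidence that it is not 0 (p) are two separate things.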
TEST
* Should be able to look at the graph below and determine whether its a positive or negative relationship
* Be able to look at the graph below and determine if it’s a strong or weak relationship (w/o an r value)
* On the table, know where the r value is (just know it’s in the row labeled Pearson Correlation)
* Understand how to interpret the significance level
NOTE: in real research they would probably only provide the top half of the box below because the bottom half is essentially just repeating it
α = 0.05
P = 0.14
Is this significant or not significant?
What kind of error does this represent?
Not significant because the probability (P) is greater than alpha (α)
Too great a probability of type 1 error
Essentially, the probability of type 1 error is 14% when the max we set that it could possibly be was 5% (α)
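NOTE (not from the lecture): the P vs α decision rule written out as a tiny sketch. The function name `is_significant` is made up for illustration.

```python
def is_significant(p, alpha=0.05):
    """A result is significant only when p is below the alpha set beforehand."""
    return p < alpha

print(is_significant(0.14))               # False: 0.14 is greater than 0.05
print(is_significant(0.001))              # True: 0.001 is less than 0.05
print(is_significant(0.03, alpha=0.01))   # False: a stricter alpha is harder to pass
```

The last line shows that the same P can pass at α = 0.05 but fail at α = 0.01, so alpha has to be set before looking at the results.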
Explain what type 1 error is
So P = probability of type 1 error (the chance of getting results like ours when there is truly no relationship)
α = 0.05
If P is greater than that (e.g., 0.06+) then we’re saying that our probability of type 1 error is too high
Type 1 error meaning that the correlation (r) isn’t as special as we think it is
* There’s a higher chance that the r value isn’t actually what we think it is
In short, it’s incorrectly identifying a significant difference when there isn’t one
What is Alpha (α)?
The acceptable amount of type 1 error
* typically set at 0.05 or 0.01
If alpha is set at 0.05, we’re saying the acceptable amount of type 1 error is 5%
If P = .14, that means we have a 14% chance of type 1 error (higher than 0.05 and 0.01), meaning our faith in this correlation (r) being real goes down
* Meaning there’s a chance this isn’t the true correlation for the population
Alpha (α) is typically set at 0.05; however, sometimes we set it at 0.01. What can happen if it’s set super low?
The chance for type 2 error increases
What is Type 2 error?
When there’s an actual correlation, but alpha (α) is set so low (e.g., 0.01 instead of 0.05) that the P value comes out higher than it, so we say there isn’t a correlation when there is
We could incorrectly miss a significant difference
We miss an actual relationship
What is beta (β)?
* In studies you don’t typically see beta by itself. What do you see instead?
The probability of type II error
Instead of beta you see power (1 - β)
The P value always represents the probability of type 1 error - we just compare it to alpha to know if the result is significant
What is power?
* What do we want power to equal?
1 - Beta
Our probability of correctly identifying a trend
Want it to be ≥ 80%
* We want to be correct at least 80% of the time
* We want our probability of correctly identifying something to be at least 80%
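NOTE (not from the lecture): a rough sketch of how power (1 - β) changes with sample size and with alpha, using the Fisher z approximation for a correlation. The function name `approx_power`, the true correlation of 0.3, and the sample sizes are all hypothetical choices for illustration, and the tiny chance of a significant result in the wrong direction is ignored.

```python
import math
from statistics import NormalDist

def approx_power(r, n, alpha=0.05):
    """Approximate power to detect a true correlation r with n subjects (two-tailed)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # cutoff needed for significance
    z_effect = math.atanh(abs(r)) * math.sqrt(n - 3)   # expected size of the test statistic
    return NormalDist().cdf(z_effect - z_crit)

# Hypothetical scenario: the true correlation is 0.3
for n in (30, 60, 100):
    print("n =", n, "-> power about", round(approx_power(0.3, n), 2))

# Same n, stricter alpha: power drops, so the chance of type 2 error (beta) goes up
print("alpha 0.01, n = 100 -> power about", round(approx_power(0.3, 100, alpha=0.01), 2))
```

More subjects pushes power up toward the 80% target, while a stricter alpha pulls it back down.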
probs not on exam
Compare and contrast ranked/ordinal data w/ continuous data
Continuous data:
* Can take on any value within a range. These values can be infinitely precise, meaning they can include decimals or fractions
* EX: Height, weight, temperature, time, distance
Ordinal data: Represents categories with a meaningful order or ranking, but the intervals between the values are not equal or defined
* EX: survey responses like “satisfied”, “neutral”, “dissatisfied”
* The values indicate a position or order (1st, 2nd, 3rd), but the difference between them isn’t necessarily uniform or measurable
* Ordinal data tell you the relative ranking (better, worse) but not the magnitude of difference between rankings
* Cannot measure precise differences: For example, the difference between a satisfied and a neutral survey response is not as clear or measurable as a difference in continuous data like temperature or weight
In summary, continuous data measure values that can be infinitely precise, while ordinal data rank or order categories without specifying exact differences between them
Is ranked/ordinal data parametric or non-parametric?
* What does this mean?
Non-parametric
Meaning that the data is not assumed to follow a specific probability distribution (like the normal distribution), so parametric statistical methods, which rely on such assumptions, are not appropriate for analyzing it
* Basically it doesn’t follow a normal distribution/bell curve
(NOTE: continuous data is often parametric, but not always)
KNOW:
r = 0 means no relationship (null hypothesis)
r ≠ 0 means there’s a relationship (alternative hypothesis)
* meaning on either the positive or the negative side
Instead of using Pearson’s product-moment correlation (r), ranked/ordinal data uses what for r?
Uses the Spearman rank correlation coefficient (Spearman’s rho)
Pearson’s Product-Moment Correlation
* Type of Data: used for continuous data where the relationship between the variables is linear and the data is measured on an interval or ratio scale (height, weight, temperature)
* The data should have a normal distribution
* The relationship between the two variables should be linear
* Use Pearson’s correlation when you want to assess the strength and direction of a linear relationship between two continuous variables
* EX: If you want to know the correlation between students’ test scores and their number of study hours, and you believe the relationship is linear and normally distributed, Pearson correlation is appropriate
Spearman Rank Correlation Coefficient (Spearman’s rho)
* Type of data: It is used for ordinal (ranked) data or continuous data that does not meet the assumptions of normality or linearity (ignore this last part). It measures the strength and direction of a monotonic relationship (where one variable consistently increases or decreases as the other does, but not necessarily at a constant rate [can be non-linear])
* Assumptions:
* The data does not need to be normally distributed
* The relationship can be monotonic rather than strictly linear
* Use case: Use Spearmans correlation when your data is ordinal or when you suspect a non-linear but still monotonic relationship between variables
* EX: If you want to explore the relationship between rankings of student satisfaction (i.e., satisfied, neutral, dissatisfied) and their class attendance, Spearman’s correlation would be more appropriate
Better said: Spearman’s correlation is used to show the amount of correlation in an ordinal/ranked data set (where the relationship can be non-linear), while Pearson’s is used to show the correlation in a continuous, linear relationship
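NOTE (not from the lecture): a minimal sketch of how Spearman’s rho can be computed: convert each variable to ranks (ties share the average rank), then run the ordinary Pearson formula on the ranks. The function names and the toy data are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson's r calculated on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up monotonic but non-linear data: Y always rises with X, at a changing rate
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print("Pearson r:", round(pearson_r(x, y), 3))
print("Spearman rho:", round(spearman_rho(x, y), 3))
```

Because Y always goes up when X goes up, Spearman’s rho is a perfect 1, while Pearson’s r falls short of 1 because the points do not sit on a straight line.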
Reading and Verbal Comprehension scores in children w/ a learning disability
What we want to know as researchers is whether there is a relationship between their reading comprehension and their verbal comprehension (if they’re good at one, are they good at the other?)
* If we speak something to them, how much do they absorb vs. if they read something, how much do they absorb?
You can see below that they ran both Spearman’s rho and Kendall’s tau (both show the correlation between ordinal/ranked data sets [variables that can relate to each other in a non-linear way])
You can see that Spearman’s rho yielded a higher correlation (r)
* So they would probably only show Spearman’s
* This is a positive relationship
NOTE: when you look at the level of significance and see .00000, there’s a 1 at the end somewhere; it just means we have a very low probability of error
n = # of participants
* n = 16 and found a significant value (certain of r value); however, not very many participants = decreased strength of study.
* The small # of participants makes the estimate of r less precise and harder to generalize; however, the level of significance is so good that this evens out.