BPK 304W Flashcards
What is the scientific method
1. Define the problem
2. Develop a research question and hypothesis
3. Design the study and protocol - test the hypothesis
4. Compile the results
5. Communicate the results
Making a good research question - key points
- specify the patients
- specify the intervention
- specify the control
- specify the outcomes
- WHO, WHAT, HOW, WHY
- instead of phrasing it as a question, make it a statement that you will be trying to support
eg. "does it do this?" (wrong)
- "this is what it does" (right)
Validity vs reliability
Validity: minimizing systematic measurement error, which improves the accuracy of the result
Reliability: minimizing random measurement error, which allows you to reproduce the results again
Testing the hypothesis for an effect or a difference (what test)
T-test, ANOVA, ANCOVA
testing the hypothesis for relationships or associations (what test)
correlation testing, regression, and p-value for significance
What is a Normal Frequency Distribution and what are the values at the peak
the peak sits at the mean (0 SDs away); ±1 SD covers ~68% of the data (34% on each side), ±2 SD covers ~95%
at the peak you find the:
Mean: the average value (use if the distribution is normal)
Median: the absolute middle of the distribution (values ordered least to greatest) (use if the distribution is not normal)
Mode: the most often occurring value
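A quick sketch of the three measures using Python's built-in statistics module (the heart-rate numbers are made up for illustration):

```python
import statistics

# hypothetical sample, e.g. resting heart rates
data = [62, 64, 64, 66, 70, 72, 90]

print(statistics.mean(data))    # average -> ~69.7 (pulled up by the 90)
print(statistics.median(data))  # middle value -> 66 (robust to the 90)
print(statistics.mode(data))    # most frequent value -> 64
```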
Skewness
- what is it
- what does it represent
- what values are significant
Skewness is a measure of symmetry (or the lack of it)
- perfectly symmetrical data forms a bell curve (skewness = 0)
- the data can be negatively skewed (LEFT TAIL, skewness less than -1)
- or positively skewed (RIGHT TAIL, skewness more than +1)
- if -1 < skewness < +1, the distribution is approximately normal
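A minimal sketch using scipy.stats.skew (assuming SciPy is available; the data are simulated):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
symmetric = rng.normal(size=10_000)        # bell curve -> skewness near 0
right_tail = rng.exponential(size=10_000)  # long right tail -> positive skew

print(skew(symmetric))   # ~0: approximately symmetric
print(skew(right_tail))  # ~2: positively (right) skewed
```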
Kurtosis
- what is it
- what does it represent
- what values are significant
Kurtosis is a measure of peakedness
- high kurtosis (above 3, i.e. excess kurtosis above 0): more outliers and more tail data than a normal distribution
(distinct peak - leptokurtic)
- low kurtosis (below 3, i.e. excess kurtosis below 0): fewer outliers and less tail data, a flatter, more uniform distribution
(flat peak - platykurtic)
normal (kurtosis = 3, excess = 0) = mesokurtic
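A similar sketch with scipy.stats.kurtosis - note that SciPy reports EXCESS kurtosis by default, so the normal reference point is 0 rather than 3 (simulated data):

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
normal = rng.normal(size=10_000)
heavy_tails = rng.standard_t(df=5, size=10_000)  # t-dist: more outliers
uniform = rng.uniform(size=10_000)               # flat: fewer outliers

print(kurtosis(normal))       # ~0  -> mesokurtic
print(kurtosis(heavy_tails))  # > 0 -> leptokurtic (distinct peak)
print(kurtosis(uniform))      # < 0 -> platykurtic (flat peak)
```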
Standard Deviation
quantifies the amount of variation - how spread out the values are around the mean
Central Limit Theorem
the distribution of sample means approaches a normal distribution as the sample size grows, no matter what shape the population distribution has
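An illustrative simulation (made-up skewed population): even though the population is right-skewed, the means of repeated samples pile up symmetrically around the population mean:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(size=100_000)  # clearly non-normal (right-skewed)

# draw many samples and keep each sample's mean
sample_means = [rng.choice(population, size=50).mean() for _ in range(2_000)]

# the means cluster in a bell shape around the population mean (~1.0)
print(np.mean(sample_means), np.std(sample_means))
```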
Standard error of the mean (SEM)
how far the sample mean is likely to be from the true population mean (the precision of the result)
- this is used to build your 95% confidence interval
SEM = SD/sqrt(N) -> as N inc, SEM shrinks and the sampling distribution becomes normal
- This is the error WITHIN the mean
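A small sketch of SD, SEM, and the 95% CI (made-up sample; 1.96 is the usual z approximation for 95%):

```python
import numpy as np

sample = np.array([4.1, 4.8, 5.0, 5.3, 5.9, 6.2, 6.8])  # made-up data

sd = sample.std(ddof=1)          # sample standard deviation
sem = sd / np.sqrt(len(sample))  # SEM = SD / sqrt(N)

# approximate 95% confidence interval for the mean
lo, hi = sample.mean() - 1.96 * sem, sample.mean() + 1.96 * sem
print(f"mean = {sample.mean():.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```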
Z scores (Standardization)
how far a score is from the reference mean, expressed as the number of SDs (units) above or below that mean
- this standardizes the data and ELIMINATES UNITS IN THE GRAPH - everything is measured in SDs
- this does NOT change the distribution of the data
- done by setting the reference Mean = 0 and SD = 1, then scoring against that
Internal norm of a Z-score
Standardizing by setting the reference mean = 0 and SD = 1
- based on the mean and SD of the sample itself; *compares INTERNALLY - individuals against other individuals in that group
External norm (Z-score)
The reference mean and SD are taken from NORMATIVE data's mean and SD
- this compares the experimental group to the EXTERNAL normative data
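A sketch covering both norms (the raw scores and the "normative" reference values below are all invented for illustration):

```python
import numpy as np

scores = np.array([55.0, 60.0, 65.0, 70.0, 85.0])  # hypothetical raw scores

# internal norm: reference mean/SD come from the sample itself
z_internal = (scores - scores.mean()) / scores.std(ddof=1)
print(z_internal)  # unitless; compares individuals within this group

# external norm: reference mean/SD come from normative data
# (the normative values below are made up)
norm_mean, norm_sd = 62.0, 8.0
z_external = (scores - norm_mean) / norm_sd
print(z_external)  # compares this group against the external norm
```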
T-score
the same idea as a Z-score, but the reference mean and SD come from an external population (not normative data)
percentile
the percentage of the population that lies at or below that score
- eg. 95th, 75th, 50th percentile, etc.
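A one-line sketch with scipy.stats.percentileofscore; kind='weak' matches the "at or below" definition (made-up scores):

```python
from scipy.stats import percentileofscore

scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]  # made-up data

# kind='weak' counts the values at or below the given score
print(percentileofscore(scores, 80, kind='weak'))  # 70.0 -> 70th percentile
```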
inferential stats
you make inferences about relationships between variables and are able to test hypotheses
Testing for statistical significance
Compare the test statistic to the critical value at the chosen significance level (alpha, usually 0.05, i.e. a 95% confidence level)
Type 1 error (Alpha Error)
You conclude that the results are significant when they are actually not
- you reject a null hypothesis that is actually true
- the larger your alpha is, the more likely you are to make a Type 1 error
p-value
the probability of getting a result at least this extreme purely by chance (i.e., if the null hypothesis is true)
- if your p-value is small, the result obtained is unlikely to be due to chance
- if your p-value is large, the result may well be due to chance (not real)
the alpha (significance level) is chosen per experiment, but 0.05 (95% confidence) is common
type 2 error (Beta)
the probability of NOT rejecting the null hypothesis when your results are actually significant (a real effect exists)
one-tailed test
you are confident the result can only move in ONE direction, so all of alpha sits in one tail of the distribution (either above or below)
Two-tailed test
alpha is split between both tails of the distribution (the study tests for a difference in either direction, above or below)
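A sketch of the one- vs two-tailed critical t-values using scipy.stats.t.ppf (alpha and df are illustrative):

```python
from scipy.stats import t

alpha, df = 0.05, 20  # illustrative values

# one-tailed: all of alpha in one tail
print(t.ppf(1 - alpha, df))      # ~1.72

# two-tailed: alpha split between both tails
print(t.ppf(1 - alpha / 2, df))  # ~2.09 (a stricter cutoff)
```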
Independent T-test Definition
& what's the t-value rule for this
Comparing 2 independent means from 2 different groups
- a t-statistic is calculated from the difference between the means
*THE DECISION RULE IS OPPOSITE FOR THE T-VALUE*
- the CALCULATED t-value has to be GREATER than the CRITICAL t-value (at alpha = 0.05) to reject the null (whereas the p-value must be smaller than alpha)
*used with fewer than 120 subjects
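A minimal independent t-test with scipy.stats.ttest_ind (simulated groups):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=2.0, size=30)  # e.g. control
group_b = rng.normal(loc=12.0, scale=2.0, size=30)  # e.g. intervention

t_stat, p_value = ttest_ind(group_a, group_b)
# reject the null if |t_stat| exceeds the critical t
# (equivalently, if p_value < alpha = 0.05)
print(t_stat, p_value)
```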
Assumptions of the Independent T-Test
- the dependent variable is continuous and ~normally distributed
- the independent variable forms two separate groups
- the observations are independent between the groups
- the groups have the same variance (Levene's test)
Levene’s Test
Checks whether two or more groups in a population have the same variance
- tests the null hypothesis that the variances are equal; the samples can be compared with a standard t-test if it holds
- p-value greater than 0.05: the difference in variances is not significant (equal-variance assumption met)
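A Levene's-test sketch with scipy.stats.levene (simulated data):

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(0)
a = rng.normal(scale=1.0, size=40)
b = rng.normal(scale=1.1, size=40)

stat, p = levene(a, b)
# p > 0.05 -> variances are not significantly different;
# the equal-variance assumption of the t-test holds
print(stat, p)
```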
standard error of the difference BETWEEN means
- the standard error of the difference between two means
- it combines the variance contributed by each mean
Paired T-Test
comparing the means of two dependent (related) measurements
- before and after an experiment
- repeated measurements
- the groups are related to each other
*used with fewer than 120 subjects
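A paired t-test sketch with scipy.stats.ttest_rel (simulated before/after scores from the same subjects):

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
before = rng.normal(loc=100.0, scale=10.0, size=25)
after = before + rng.normal(loc=5.0, scale=3.0, size=25)  # same subjects

t_stat, p_value = ttest_rel(before, after)
print(t_stat, p_value)  # p < 0.05 -> significant before/after change
```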
ANOVA
Used to test differences among 2 or more means of factors
*used with more than 120 subjects
Assumptions for Randomized (between-groups) ANOVA
- the dependent variable is continuous and ~normally distributed
- the independent variable has 2 or more independent groups
What is a factor in an ANOVA test
a factor is an independent variable with a number of levels
- a between-subjects factor: the same independent variable measured in different groups
eg. walking in both seniors and young people
- a within-subjects factor (or repeated measures): different conditions (levels of the ind. variable) applied to the same single group
eg. 1 group doing the same exercise at 3 different weights
One-way ANOVA
1 factor is being tested - but it can have 2 or more levels
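A one-way ANOVA sketch with scipy.stats.f_oneway - one factor with three levels (simulated data):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# one factor (e.g. training program) with three levels
level_1 = rng.normal(10.0, 2.0, size=30)
level_2 = rng.normal(11.0, 2.0, size=30)
level_3 = rng.normal(13.0, 2.0, size=30)

f_stat, p_value = f_oneway(level_1, level_2, level_3)
print(f_stat, p_value)  # significant F -> at least one mean differs
```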
Two-way ANOVA
2 or more grouping factors are present
- used to test the significance of each individual factor and the interaction between them
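A two-way ANOVA sketch using statsmodels' formula API (assuming statsmodels is available; the factor names and data are made up):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age_group": np.repeat(["young", "senior"], 40),
    "exercise": np.tile(np.repeat(["walk", "run"], 20), 2),
    "score": rng.normal(10.0, 2.0, size=80),
})

# main effects of each factor plus their interaction
model = smf.ols("score ~ C(age_group) * C(exercise)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```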
Types of ANOVA
- Between subjects factors => use Randomized Groups ANOVA
- Within subject factors => use Repeated measures ANOVA
both factors = mixed ANOVA
What is the F-Statistic and how does it inc?
The F-stat is a ratio comparing between-group variability to within-group variability
F-stat inc if:
between group variability inc and
within group variability dec
Post-Hoc Test
A follow-up test that's done if the original ANOVA was significant
- compares 3+ means pairwise
- done because the ANOVA alone doesn't tell you which of the 3+ means differ from each other
- Scheffe's/Tukey's are good choices
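A Tukey's HSD sketch with scipy.stats.tukey_hsd (needs SciPy >= 1.8; simulated groups):

```python
import numpy as np
from scipy.stats import tukey_hsd

rng = np.random.default_rng(0)
g1 = rng.normal(10.0, 2.0, size=30)
g2 = rng.normal(11.0, 2.0, size=30)
g3 = rng.normal(13.0, 2.0, size=30)

# pairwise comparisons to find WHICH means differ
result = tukey_hsd(g1, g2, g3)
print(result.pvalue)  # matrix of pairwise p-values
```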
Correlation
a relationship exists between 2 variables - they vary together in some way
linear correlation coefficient (r)
the strength of the correlation between 2 variables
- closer to +1 or -1 = a perfect (pos or neg) linear relationship (0 = no correlation at all)
- the correlation depends on the RANGE => when the full range of points is considered, the correlation inc
- the critical value is based on sample size and significance level
- df = n - 2
- stat sig doesn't mean practical sig
Pearson Correlation
strength of LINEAR association using a best fit line between the data points of the 2 variables
Requirements to run a Pearson correlation & outliers (which types affect the coefficient)
- the paired data are a random sample of independent quantitative data
- visual examination shows the points follow a roughly straight-line pattern
- outliers that are errors have to be removed
(an outlier = more than 3 SDs away from the mean)
- on-line outliers inc the coeff. and off-line outliers dec the coeff.
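A Pearson correlation sketch with scipy.stats.pearsonr (made-up height/jump data); squaring r also gives the coefficient of determination from the next card:

```python
import numpy as np
from scipy.stats import pearsonr

# made-up paired data, e.g. height (x) vs vertical jump (y)
x = np.array([160, 165, 170, 175, 180, 185, 190], dtype=float)
y = np.array([40, 43, 45, 50, 52, 55, 60], dtype=float)

r, p = pearsonr(x, y)
print(r, p)    # r near +1 -> strong positive linear relationship
print(r ** 2)  # coefficient of determination: % of shared variance
```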
coefficient of determination
correlation^2 (r^2) = coeff. of determination
- the % of variance in one variable explained by the other (out of 100%)
Linear Regression
Y = mx+b
- predicting one variable from another
- the predictor = the independent variable (x); the one being explained = the dependent variable (y)
- linear regression tells you how much the dependent variable changes when you change the independent one (plus the strength, direction, and statistical significance of the relationship)
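A linear-regression sketch with scipy.stats.linregress (same made-up data as above):

```python
from scipy.stats import linregress

# predict jump height (dependent, y) from height (independent, x)
x = [160, 165, 170, 175, 180, 185, 190]
y = [40, 43, 45, 50, 52, 55, 60]

fit = linregress(x, y)
print(fit.slope, fit.intercept)  # m and b in y = mx + b
print(fit.rvalue, fit.pvalue)    # strength and significance

predicted = fit.slope * 172 + fit.intercept  # prediction for x = 172
print(predicted)
```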
what's the difference between correlation and linear regression?
correlation tells you the strength of the relationship between 2 variables, whereas
LR puts the relationship into an equation where one variable predicts the other (x predicts y)
standard error of the estimate
how precisely the regression line predicts the observed values (the distance between the predicted and the observed)
- expressed in the units of Y
- about 2 out of 3 times, the observed value falls within ±1 SEE of the prediction
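A sketch computing SEE from the regression residuals (df = n - 2; same made-up data):

```python
import numpy as np
from scipy.stats import linregress

x = np.array([160, 165, 170, 175, 180, 185, 190], dtype=float)
y = np.array([40, 43, 45, 50, 52, 55, 60], dtype=float)

fit = linregress(x, y)
residuals = y - (fit.slope * x + fit.intercept)

# SEE = sqrt(SS_residual / (n - 2)), in the units of y
see = np.sqrt(np.sum(residuals ** 2) / (len(y) - 2))
print(see)  # ~2/3 of observed values fall within +/- 1 SEE of the line
```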
slope of linear regression
y=mx+b
m= slope
- tells you how much 'y' changes for every one-unit change in x
how do you test linear regression
cross-validation study (build the equation in one sample, then test it on another sample) - if the SEE is similar, it's valid
split-sample study (take 100% of the sample data and split it in half: build in one half, test in the other) - if you get the same results, it's valid
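A split-sample sketch (simulated data): build the equation in one half and check the SEE in the held-out half:

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)
x = rng.uniform(150, 200, size=100)
y = 0.6 * x - 55 + rng.normal(0, 3, size=100)  # made-up linear relation

# split the full sample in half
idx = rng.permutation(len(x))
train, test = idx[:50], idx[50:]

fit = linregress(x[train], y[train])  # build the equation in one half
pred = fit.slope * x[test] + fit.intercept

# SEE in the held-out half; similar to the training SEE -> valid
see_test = np.sqrt(np.sum((y[test] - pred) ** 2) / (len(test) - 2))
print(see_test)
```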
standardized linear regression
converting everything into z-scores so the regression weights show the relative importance of each variable and how much it impacts Y
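A standardized-regression sketch using scipy.stats.zscore; with a single predictor the standardized slope equals r (same made-up data):

```python
import numpy as np
from scipy.stats import linregress, zscore

x = np.array([160, 165, 170, 175, 180, 185, 190], dtype=float)
y = np.array([40, 43, 45, 50, 52, 55, 60], dtype=float)

# convert both variables to z-scores first, then fit
fit = linregress(zscore(x), zscore(y))
print(fit.slope)  # standardized slope; for one predictor it equals r
```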
non-parametric stats
don't require the data to follow a particular distribution
- used when data are not normally or continuously distributed
- eg. a survey or a Likert scale
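Two common non-parametric sketches with SciPy (made-up Likert responses):

```python
from scipy.stats import mannwhitneyu, spearmanr

# made-up Likert-scale responses (1-5) from two groups
group_a = [3, 4, 2, 5, 4, 3, 4, 2]
group_b = [2, 3, 1, 3, 2, 2, 3, 1]

# Mann-Whitney U: non-parametric alternative to the independent t-test
print(mannwhitneyu(group_a, group_b))

# Spearman: rank-based alternative to the Pearson correlation
print(spearmanr(group_a, group_b))
```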