exam 2 Flashcards
what is research
the systematic investigation into and study of materials and sources to establish facts and reach conclusions
what is key to research
carefully formulating a research question/falsifiable hypothesis
what are the types of research
- observational
- experimental
observational research
- measuring of relationships between events or conditions
- no manipulation or intervening
- supports future experimental work
what is the con of observational research
hard to make confident cause-and-effect inferences
experimental research
investigator directly manipulates conditions to measure the response
what are the different types of variables
- independent variable
- dependent variable
- intervening/confounding variable
independent variable
controlled by the investigator, or the uncontrolled cause
other ways to refer to IV
treatment, predictor
dependent variable
- not controlled
- the effect
- the variable that is measured in response to the IV
other ways to refer to DV
response, criterion
intervening/confounding variables
influence the DV as well but are not controlled by the investigator
what can intervening/confounding variables lead to
erroneous conclusions
what is the goal of sampling
to extract a sample (n) that is representative of the population (N)
what does having a representative sample do
the effect of an IV on a DV can be generalized to all other members of that population
what are the types of sampling methods
- random sample
- stratified sample
- convenience sample
- systematic sample
- cluster sample
random sample
each member of the population has an equal chance of being selected
stratified sample
ensure representation of subgroups within the population of interest
convenience sample
members are selected based on “ease and proximity”
systematic sample
members are selected at regular intervals from a randomly ordered list
cluster sample
populations are divided into subgroups or “clusters” then members are randomly selected from a cluster
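The sampling methods above can be sketched in Python. This is only an illustration, assuming a hypothetical population of 100 numbered members split into two made-up subgroups; the function names are invented for this sketch and are not from the course material.

```python
import random

rng = random.Random(42)          # fixed seed so the sketch is repeatable
population = list(range(100))    # hypothetical population, N = 100

def simple_random_sample(pop, n):
    # every member has an equal chance of being selected
    return rng.sample(pop, n)

def systematic_sample(pop, n):
    # randomly order the list, then take members at regular intervals
    ordered = pop[:]
    rng.shuffle(ordered)
    step = len(ordered) // n
    return ordered[::step][:n]

def stratified_sample(strata, n_per_stratum):
    # draw from every subgroup so that each one is represented
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n):
    # pick one cluster at random, then randomly select members from it
    chosen = rng.choice(list(clusters))
    return chosen, rng.sample(clusters[chosen], n)

# hypothetical subgroups, reused as both strata and clusters
groups = {"under_50": population[:50], "50_and_over": population[50:]}

random_n10 = simple_random_sample(population, 10)
systematic_n10 = systematic_sample(population, 10)
stratified_n5 = stratified_sample(groups, 5)
cluster_name, cluster_n5 = cluster_sample(groups, 5)
```

A convenience sample has no selection rule to code: it is simply whichever members are easiest to reach, which is why it is the most open to bias.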
what is the ideal sampling method and why isn't it always used
random sample, but it's difficult to achieve in practice
what is the sampling method that is most susceptible to bias
convenience sampling
bias
aspects of the sample make it unrepresentative of the population
how can bias be decreased
having a larger sample = greater proportion of the population = less bias
what are the factors that go into determining which sampling method is appropriate
the research objectives, resources/cost/times, and population characteristics
what are the types of experimental designs
- pre-experimental
- quasi-experimental
- true experimental
pre-experimental design
- exploratory
- used when rigorous approaches are not feasible
- weak evidence of causality
- no control group or random assignment
quasi experimental
- moderate evidence of causality
- no random assignment
true experimental
- random assignment of participants to treatment or control groups
- strong evidence of causality
how many treatment groups can a participant be in
multiple
how many control groups can a participant be in
one
examples of pre-experimental
- case study
- pretest-posttest
- static group comparison
- nonequivalent groups
examples of quasi-experimental
- interrupted time series
- natural experiment
examples of true experimental
- independent groups
- matched groups
- randomized controlled trial
- repeated measures
- factorial
- pretest-posttest
- solomon four-group
how would a pre-experimental case study group be set up
single group is exposed to an intervention/treatment and the outcome is measured
what are the cons of a case study pre-experimental study
there's no way of knowing whether other factors contributed to the outcome
how is a one group pretest-posttest pre-experimental study set up
a single group is measured before and after an intervention/treatment
how is a static group comparison pre-experimental study set up
- one of two groups receives the intervention/treatment
- includes a control group
- no randomization
what influences a static group comparison study
pre-existing differences between groups may influence the outcomes
how is an interrupted time series quasi-experiment set up
multiple measurements taken before and after intervention/treatment
how is a natural quasi-experiment run?
observation of effects of natural occurrences/changes
how does a natural experiment influence external validity
it takes place in a non-laboratory, natural environment, so it can be used to study how the real world operates and to generalize findings
how is an independent group, between-subjects, true experiment run
- random assignment to study groups
- can have more than two groups
what does random assignment to study groups for independent group, between-subjects, true experimental studies do
minimizes the effect of individual differences
can individual differences still affect the results of an independent group, between-subjects, true experimental study
yes
how are matched groups, true experimental study run
- participants matched on key attributes then randomly assigned to groups
- helps to minimize individual differences further
what is a matched group true experimental group useful for
when specific attributes are expected to interact with the IV
how are randomized controlled trial true experimental studies run
- participants are randomly assigned to treatment or control group
- uses blinding to reduce bias
what is the gold standard for clinical research
Randomized controlled trials
single blind
participants unaware of grouping
double blind
participants and researchers unaware of grouping
how are repeated measures, within subjects, true experimental studies run
participants complete all conditions
- the participants are their own controls
- random/counterbalanced order to minimize carryover effects
- smaller sample sizes needed compared to equivalent independent/matched groups design
what are repeated measures, within subjects, true experimental studies sensitive to
the effects of the IV
how are factorial, true experimental studies run
- examines the effects of multiple IVs on a single DV
- can be incorporated into between, within subjects design
how are solomon four-group experiments designed
- separate treatment and control groups may or may not be “pretested”
- controls for carryover effects from the pretest, improved internal validity
- requires a larger sample size and randomization into each group
criterion of parameters
source: population
calculated: no
constants: yes
examples: mean, standard deviation, population size (N)
criterion of statistics
source: sample
calculated: yes
constants: no
examples: mean, SD, n
statistical inference
estimating population parameters from sample statistics
sampling error
amount of error in the estimate of a population parameter that is derived from a sample statistic
why are probability statements accompanied with statistics
because of the uncertainty in our parameter estimate
law of large numbers
as a sample size increases, the sample mean approaches the population mean if 1. samples are independent 2. samples are identically distributed
what are easily swayed by extreme values according to the law of large numbers
means of small random samples = larger sampling error
what are resistant to extreme values according to the law of large numbers
means of large random samples = smaller sampling error
sampling distribution of the mean
theoretical frequency distribution of all possible sample means that can be calculated from a population
according to the sampling distribution of the mean what is the relationship between variability of the sampling distribution and each sample mean
the variability of the sampling distribution decreases as sample size of each sample mean increases
according to the sampling distribution of the mean what is the relationship between the variability of sampling distribution and the variability of the population
the variability of sampling distribution is smaller than the variability of the population
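The two relationships above can be checked by brute force on a tiny example. The sketch below assumes a made-up population of six values and enumerates every possible ordered sample of size 2 (drawn with replacement) to build the full sampling distribution of the mean; the variable names are illustrative only.

```python
import itertools
import math
import statistics

population = [1, 2, 3, 4, 5, 6]   # hypothetical population
n = 2                             # size of each sample

# every possible ordered sample of size n, drawn with replacement
all_samples = itertools.product(population, repeat=n)
sample_means = [sum(s) / n for s in all_samples]

pop_sd = statistics.pstdev(population)         # variability of the population
sd_of_means = statistics.pstdev(sample_means)  # variability of the sampling distribution
mean_of_means = statistics.fmean(sample_means)
```

Here the sampling distribution is centered on the population mean, and its spread is smaller than the population's spread by a factor of sqrt(n).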
standard error of the mean
how much the sample mean (statistic) is likely to differ from the true population mean (parameter)
what is the standard error of the mean also known as
the standard deviation of the sampling distribution of the mean
what is the equation for SEm
SEm = SD/sqrt(n)
when will samples have smaller SEm
- they are homogeneous
- they have a larger sample size
what is the square root law
the accuracy of a parameter estimate is inversely proportional to the square root of the sample size
how will quadrupling the sample size affect the SEm if everything else is unchanged
it will halve the SEm (half the variability) according to the square root law
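A quick numeric check of the square root law, as a minimal sketch with a made-up SD of 30 and sample sizes of 25 and 100 (the function name is illustrative):

```python
import math

def standard_error_of_mean(sd, n):
    # SEm = SD / sqrt(n)
    return sd / math.sqrt(n)

sem_n25 = standard_error_of_mean(30.0, 25)     # 30 / 5
sem_n100 = standard_error_of_mean(30.0, 100)   # 30 / 10, sample size quadrupled
```

With the SD held constant, going from n = 25 to n = 100 cuts the SEm from 6.0 to 3.0.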
how do you interpret SEm
just like SD on a normal curve
- e.g. +/- 1 SEm corresponds to a Z score of +/- 1.0
how do you state standard error of mean
there is a 68% chance the population mean is within 163.5 <= mu <= 182.5 lbs
what does the 68% chance the population mean is within a given interval mean
that this is also the confidence interval of 68%
what does stating a confidence interval of 68% mean
that there is also a 32% probability of error, or a chance that the mean is not within that range
- p = 0.32
what is an acceptable level of uncertainty
what is Alpha (a)
the area under the curve that represents the probability of error, the likelihood of chance occurrence
what does an alpha value of 0.05 mean
that there is 5% chance of rejecting the null hypothesis incorrectly
what is the equation for finding the confidence interval
C.I. = mean +/- (Z-score * SEm)
how to report a C.I.
A 95% CI will give the mean +/- 1.96(SEm)
- +/- 1.96 is the Z-score range that contains 95% of the area under the normal curve
how to interpret a confidence interval of 173 +/- 18.62 lbs or 154.38 lbs <= mu <= 191.62
with 95% confidence we conclude that the mean weight of all college-aged men is between 154.38 and 191.62 lbs. However, there is a 5% chance (p = 0.05) that the true mean falls outside of this range
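The weight intervals quoted in these cards can be reproduced with a short sketch. It assumes a sample mean of 173 lbs and an SEm of 9.5 lbs, which is the value implied by the 68% interval of 163.5 to 182.5 lbs; the function name is illustrative.

```python
def confidence_interval(mean, sem, z):
    # C.I. = mean +/- (Z-score * SEm)
    margin = z * sem
    return mean - margin, mean + margin

ci_68 = confidence_interval(173.0, 9.5, 1.0)    # +/- 1 SEm
ci_95 = confidence_interval(173.0, 9.5, 1.96)   # +/- 1.96 SEm
```

The 95% interval is wider than the 68% interval: more confidence is bought at the cost of precision.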
what does a larger confidence interval result in
- less likely to be wrong
- less precise
how does statistical hypothesis testing begin
two mutually exclusive, exhaustive mathematical statements about the relationship between variables/groups are formed
what are the two hypotheses formed for statistical hypothesis testing
- null hypothesis (H0) (this is assumed to be true unless evidence is found to the contrary)
- alternative hypothesis (H1 or HA)
what does mutually exclusive mean in terms of hypothesis testing
only one can be true
what does exhaustive mean in terms of hypothesis testing
that no other option exists
nondirectional hypothesis
H0: mean 1 = mean 2
H1: mean 1 does not equal mean 2
directional hypothesis
H0: mean 1 <= mean 2
H1: mean 1 > mean 2
what does a p value indicate in statistical hypothesis testing
indicates the probability of obtaining data at least as extreme as the data collected IF the null hypothesis H0 is true
what does a p value < 0.05 indicate
that the result is statistically significant and the H0 can be rejected and you accept the alternative hypothesis
what does rejecting a H0 indicate
depending on what the H0 is, it indicates that there is a difference between the two variables or that the treatment effect is significant
two tailed test hypotheses
H0: mean 1 = mean 2
HA: mean 1 does not equal mean 2
region of rejection for two tailed tests
- set by alpha value
- split between tails of the distribution (each 2.5% AUC)
when do you use a two tailed test
when prior research/logical reasoning does not suggest the direction of a difference, but a difference is expected
one tailed test hypotheses
H0: category 1 <= category 2
HA: category 1 > category 2
region of rejection
- set by alpha value
- concentrated at one tail of the distribution
when to use a one tailed test
when there is strong evidence to expect a difference in a specific direction
Type I error
H0 is rejected when it is actually true (a false positive)
what is concluded from a type I error
conclude that an effect/relationship exists when, in reality, it does not
how can type I error risk be reduced
by decreasing alpha
Type II error
H0 is not rejected when it is actually false (a false negative)
what does a Type II error conclude
that no effect/relationship exists when it really does
how can Type II error risk be reduced
through decreasing beta
what is beta
the probability of committing a Type II error (typically strive for beta=0.2)
statistical power
probability of rejecting H0 when H0 is false
what is the equation for statistical power
power = 1 - beta
- typically strive for 0.8
factors that may tie into Type I error
- measurement error
- lack of random sample
- alpha value too liberal (a=0.10)
- investigator bias
- improper use of one tailed test
factors that tie into Type II
- measurement error
- lack of sufficient power (N too small)
- alpha value too conservative ( a = 0.01)
- treatment effect not properly applied
how to decrease a (Type I error risk)
- decrease a priori significance level a (e.g. a Bonferroni correction)
- control confounding variables
- increase sample size
what does decreasing the significance level of alpha do
you will increase the chance of a Type II error
what is a Bonferroni correction
a correction that divides the alpha value by the number of tests (e.g. 0.05/# of tests)
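As a small worked example, the correction for a hypothetical study running 5 tests at a family-wise alpha of 0.05 could be computed like this (the function name is made up for this sketch):

```python
def bonferroni_alpha(alpha, n_tests):
    # divide the overall alpha by the number of tests performed
    return alpha / n_tests

per_test_alpha = bonferroni_alpha(0.05, 5)
```

Each individual test would then need p < 0.01 to count as significant, which lowers Type I error risk but raises Type II error risk.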
how do you decrease beta
increase a priori significance level alpha
what does increasing significance level alpha result in
may increase the chance of a Type I error
what is the best way to balance Type I and Type II error risk with available resources
conducting a power analysis
Correlation
the degree of association between two interval-level variables
what is correlation represented by
a coefficient between +1.00 and -1.00
what does a +1.00 correlation coefficient mean
- perfect positive correlation
- the size of deviations from the mean in both variables are equal in the same direction
what does a -1.00 correlation coefficient mean
- perfect negative correlation
- the size of deviations from the mean in both variables are equal in opposite directions
what does a 0.00 correlation coefficient mean
- no correlation
- there is no pattern to the size and direction of deviations from the mean between variables
what does the sign of the coefficient indicate
direction
what does the magnitude of the correlation coefficient indicate
strength
what are scatter plots best used for
to visualize the correlation between variables
what is the line of best fit
best linear estimate of the relationship between variables given the data used to calculate it
what does the line of best fit minimize
residuals
what are residuals
the error between measured values and the values predicted by the line's equation
what is pearson correlation coefficient also called
Pearson’s product-moment correlation coefficient
what is the equation for pearson correlation coefficient
r = sum of (ZxZy)/N
- ZxZy being the product of z-scores for each variable
- N being the number of score pairs
what is the alternative “machine formula” that does not require z-scores
r = sum((x - mean x)(y - mean y)) / sqrt(sum((x - mean x)^2) * sum((y - mean y)^2))
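Both versions of the Pearson formula can be compared on a small made-up dataset. This is a sketch only: the data and function names are invented, and the z-score version assumes z-scores computed with the population SD (dividing by N), which is what makes dividing the sum by N work.

```python
import math
import statistics

def pearson_z(x, y):
    # r = sum(Zx * Zy) / N, with z-scores based on the population SD
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / n

def pearson_machine(x, y):
    # machine formula: works directly on deviations from the means
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

x = [1, 2, 3, 4, 5]   # hypothetical scores
y = [2, 4, 5, 4, 5]
r_z = pearson_z(x, y)
r_machine = pearson_machine(x, y)
```

The two functions agree (r is about 0.77 for these data), so the machine formula is just a shortcut that skips the z-score step.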
what are the assumptions of pearson correlation
- both variables must be on a continuous (interval or ratio) scale
- each pair of variables must be independent
- both variables should be approximately normally distributed
- the relationship between variables (if one exists) must be linear
- the dataset should not contain outliers
what do you do if the relationship is non linear for pearson correlation
use spearman’s rank
how do outliers affect Pearson Correlation
it is very sensitive to outliers, so they may create an overly strong or overly weak correlation
what is the equation for spearman’s rank correlation coefficient
rho = 1 - (6 * sum of di^2)/(n(n^2 - 1))
- di: the difference between the two variables' ranks for observation i
- n = number of observations
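The rank formula can be sketched as follows. The data are made up, the helper names are illustrative, and the simple ranking helper assumes there are no tied values (ties need averaged ranks, which this sketch does not handle).

```python
def ranks(values):
    # rank 1 = smallest value; assumes no tied values
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    # rho = 1 - (6 * sum of di^2) / (n(n^2 - 1))
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 25, 16]            # the last two values are out of order
rho = spearman_rho(x, y)

# a non-linear but perfectly monotonic relationship still gives rho = 1
rho_monotonic = spearman_rho([1, 2, 3], [1, 8, 27])
```

For the first dataset only two ranks are swapped, so rho is 0.9.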
what is spearman’s rank
- a nonparametric test
- w/ fewer assumptions including about the data distribution
what are the parameters of spearman’s rank
- variables do not need to be normally distributed
- variables can be discrete
- relationship between variables can be non-linear but must be monotonic
- less sensitive to outliers
coefficient of determination
r^2
- quantifies the shared variance between variables
- how well the independent variables explain the variation in the dependent variables
how to verbally express the coefficient of determination
_____% of the variance in the dataset can be explained by the variance in what is being looked at.
degrees of freedom (df)
the number of scores that are free to vary when the sum of the scores is set
what is the equation for degrees of freedom
df = N - # of variables in the correlation
what does “correlation does not equal causation” mean
correlation does not necessarily mean that a change in one variable will result in a change in the other
bivariate regression
strong enough correlations allow for predictions of one variable based on the values of another variable
what is the equation for a bivariate regression model
y = beta0 + beta1x + e
bivariate regression assumptions
- the relationship between variables must be linear
- each pair of variables must be independent
- for any value of a predictor (independent variable) the dependent variable must be approximately normally distributed
- the variance of the residuals must be consistent across the range of predictor values
what is homoscedasticity
when the spread of residuals is relatively consistent within the regression model
how to calculate coefficients for the bivariate regression model
beta1 = r(SDy/SDx)
beta0 = mean y - (r(SDy/SDx))(mean x)
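The coefficient formulas can be tried on a small made-up dataset. This is a minimal sketch, with hypothetical data and variable names, that computes r from the machine formula and then applies the slope and intercept equations.

```python
import math
import statistics

x = [1, 2, 3, 4, 5]   # hypothetical predictor values
y = [2, 4, 5, 4, 5]   # hypothetical outcome values

mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

# Pearson r from deviations about the means
num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
den = math.sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
r = num / den

beta1 = r * sd_y / sd_x            # slope
beta0 = mean_y - beta1 * mean_x    # intercept

predicted_at_6 = beta0 + beta1 * 6  # prediction for a new x value
```

For these data the fitted line is y = 2.2 + 0.6x.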
will there always be residuals
yes, unless there is a perfect correlation between variables
how can the residuals in a regression model be represented
- using the standard error of the estimate
- or the SD of the residuals
standard error of estimate equation
SEe = sqrt(sum((yactual - ypred)^2)/(n - 2))
what is the alternate equation for the standard error of estimate
SEe = SDy * sqrt(1 - r^2)
what does using the alternate equation for the standard error of estimate result in
it underestimates SEe when the sample size is small
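The underestimate can be seen by computing both versions on a small made-up dataset. This sketch assumes the sample SD of y (n - 1 in the denominator) for the alternate equation; the data and names are hypothetical.

```python
import math
import statistics

x = [1, 2, 3, 4, 5]   # hypothetical predictor values
y = [2, 4, 5, 4, 5]   # hypothetical outcome values
n = len(x)
mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)

sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)

# least-squares line (sxy / sxx is the same slope as r * SDy / SDx)
beta1 = sxy / sxx
beta0 = mean_y - beta1 * mean_x
y_pred = [beta0 + beta1 * a for a in x]

# SEe from the residuals, with n - 2 in the denominator
see_exact = math.sqrt(sum((b - p) ** 2 for b, p in zip(y, y_pred)) / (n - 2))

# alternate equation: SDy * sqrt(1 - r^2)
see_alternate = statistics.stdev(y) * math.sqrt(1 - r ** 2)
```

With only five observations the alternate value (about 0.77) falls noticeably below the residual-based value (about 0.89); the gap shrinks as n grows.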
in a regression coefficient what is H0
beta1 = 0
in a regression coefficient what is HA
beta1 does not equal 0
what does the t-statistic tell you
determine significance of beta1
what is multiple correlation
quantifies the degree of relationship/association between a function of independent variables and one dependent variable
what is multiple correlation represented by
a coefficient R between 0 and 1
what does a R = 0.00 value indicate
no correlation, or there is no relationship/association between independent variables and the dependent variable
what does a R = 1.00 value indicate
perfect correlation, or the independent variables completely explain the dependent variable
what is the multivariable coefficient of determination
- R^2
- same interpretation as bivariate r^2
partial correlation
quantifies the relationship between an independent variable and dependent variable after removing the effect of another variable
covariate
an independent variable that can influence the outcome of a given statistical trial, but which is not of direct interest
partial coefficient of determination
the variance in Y explained by X1 after removing the effects of X2 on both
what is an example of partial correlation
- interested in association between children’s age (X1) and muscle strength (Y)
- children grow and get heavier with age (X2) and may be a covariate
- using partial correlation = partial out the effect of weight and can leave the variance in strength due solely to age
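The age, weight and strength example can be put into numbers with the standard first-order partial correlation formula, r_y1.2 = (r_y1 - r_y2 * r_12) / sqrt((1 - r_y2^2)(1 - r_12^2)). That formula is not stated in these cards, and the three correlations below are invented purely for illustration.

```python
import math

def partial_correlation(r_y1, r_y2, r_12):
    # correlation between Y and X1 after removing the effect of X2 from both
    return (r_y1 - r_y2 * r_12) / math.sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))

# hypothetical values: strength-age, strength-weight, age-weight
r_strength_age = 0.6
r_strength_weight = 0.5
r_age_weight = 0.8

r_partial = partial_correlation(r_strength_age, r_strength_weight, r_age_weight)
partial_r_squared = r_partial ** 2   # partial coefficient of determination
```

With these made-up inputs the age and strength correlation drops from 0.6 to about 0.38 once weight is partialed out.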
what is unexplained variance and what is it represented by
- (1-R^2)
- the amount of variation in a dependent variable that a model cannot explain using the independent variables
Multiple and partial correlation assumptions
- both variables must be on a continuous (interval or ratio) scale
- each pair of variables must be independent
- both variables should be approx. normally distributed
- the relationship between variables (if one exists) must be linear
- the dataset should not contain outliers
multiple linear regression
prediction of one dependent variable from multiple predictor variables (independent variables)
what is the equation of a multiple linear regression
Y = a + b1X1 + b2X2 + …. bkXk
- b values are the slope coefficients
- x values are the independent variables
- a is the Y-intercept
hierarchical multiple regression
researcher has full control over the model equation and which predictors are included
when is hierarchical multiple regression used
when hypothesis testing is the goal rather than accurate, efficient dependent variable prediction
algorithmic multiple regression
computer software/algorithms construct the model equation
what are the types of algorithmic multiple regression
- forward selection
- backward elimination
- stepwise
forward selection algorithmic multiple regression
starts with the intercept only, predictors are added to the model one-by-one and assessed, if R^2 increases that shows unique variability
backward elimination algorithmic multiple regression
- starts with all predictors
- eliminates predictors one-by-one and assesses the resulting model
- if removing a variable decreases the explained variance the least (no significant decrease), that variable is eliminated
stepwise algorithmic multiple regression
same as forward selection but previously entered variables can be eliminated in later steps
- if R^2 is not affected by the inclusion or exclusion
what is the drawback of a stepwise multiple regression
requires a larger sample size compared to other methods to return reliable results
what is the ideal ratio of subjects:variables for a stepwise multiple regression
20:1 to 40:1 ratio
in a table of correlation values for a given dataset how do you interpret the values presented
the values are the r values that indicate the strength of correlation between variables
- values close to 1.00 indicate strong correlation
- this is then squared to report how much variance of the dataset is explained through this variable
how can you tell if a variable has unique variance
- based on if the variable is highly correlated with other variables
- if R^2 increases significantly when the variable is added, this indicates unique variance
how can you visually tell if variables offer unique variance
- if the circle overlaps heavily with the dependent variable
- if the overlap is present but more overlap is seen with another variable, it doesn't explain much of the variance and therefore isn't unique
what are the multiple regression assumptions
- the relationship between variables must be linear
- each pair of variables must be independent
- for any value of a predictor (independent variable), the dependent variables must be approx normally distributed
- variance of the residuals must be consistent across the range of predictor values
- independent variables (predictors) should not be correlated with each other
what does multicollinearity lead to
leads to inflated confidence intervals for slope coefficient estimates and unstable slope coefficient estimates when additional predictors are added
is there a threshold of acceptable multicollinearity
no
what should the variance inflation factor be
a VIF greater than 10 should be treated as suspicious
what is the equation of variance inflation factor
VIF = 1/(1 - R^2)
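A few values show how quickly the VIF grows. The sketch assumes, as is conventional, that R^2 here comes from regressing one predictor on the other predictors; the function name and the example R^2 values are made up.

```python
def variance_inflation_factor(r_squared):
    # VIF = 1 / (1 - R^2)
    return 1 / (1 - r_squared)

vif_uncorrelated = variance_inflation_factor(0.0)   # predictor shares no variance
vif_moderate = variance_inflation_factor(0.5)
vif_high = variance_inflation_factor(0.95)          # well past the "suspicious" mark
```

An R^2 of 0.95 gives a VIF of about 20, and as R^2 approaches 1.00 (singularity) the VIF grows without bound.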
what is singularity in multicollinearity
two IVs are perfectly related (r=1.00) usually because one was mathematically derived from the other
cross validation
the process of testing regression equations on a separate and equivalent sample from which they were built to ensure accuracy in their predictions
what is expected when applying models to different samples
higher prediction errors
what is the cross validation model good for and what will be a result
training data
- the correlation coefficient will undergo shrinkage and would be smaller on different samples
who developed the T test
William Sealy Gosset (publishing under the pseudonym “Student”)
when are t tests useful
- we do not know the distribution of the population
- we have a relatively small sample relative to the population
how does sample size relate to t distribution
as sample size increases, the t distribution approaches a normal distribution
what is a t statistic
the ratio between mean differences and variability
what is a critical statistic
the value that must be met to reach statistical significance at a given alpha level
what is the generic t test formula
t = mean difference/SEof mean difference
what can a t statistic be thought of as
a signal to noise ratio
SEM for t statistic
SD/sqrt n
what does the standard error of the difference look at
the variability of the difference between two groups
what are the assumptions of a t Test
- the data must be normally distributed
- the data must be on the interval or ratio scales
- the sample is randomly selected from the greater population
- when two samples are taken, they should have homogeneity of variance
what is a single sample t Test
used to compare a single sample mean with a known population mean
what is the equation for single sample t Test
t = (sample mean - population mean)/SEM
what is used to determine the critical statistic for significance
the degrees of freedom
when can H0 be rejected and HA accepted
if the |t statistic| > critical statistic
how to calculate a confidence interval for a single sample t Test
C.I. = sample mean +/- tcv(SEM)
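The single sample t Test steps can be strung together in a short sketch. The scores and the known population mean are made up, df is taken as n - 1, and the critical value 2.262 is the two-tailed t table value for alpha = 0.05 and df = 9.

```python
import math
import statistics

sample = [5, 7, 8, 6, 9, 7, 8, 6, 7, 7]   # hypothetical scores
population_mean = 6.0                      # hypothetical known population mean
t_critical = 2.262                         # t table: two-tailed, alpha = 0.05, df = 9

n = len(sample)
df = n - 1
sample_mean = statistics.fmean(sample)
sem = statistics.stdev(sample) / math.sqrt(n)   # SEM = SD / sqrt(n)

# t = (sample mean - population mean) / SEM
t_stat = (sample_mean - population_mean) / sem

# reject H0 when |t| exceeds the critical statistic
reject_h0 = abs(t_stat) > t_critical

# C.I. = sample mean +/- tcv(SEM)
ci = (sample_mean - t_critical * sem, sample_mean + t_critical * sem)
```

Here t is about 2.74, which exceeds 2.262, so H0 is rejected; consistent with that, the 95% confidence interval (about 6.17 to 7.83) does not contain the population mean of 6.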
when is the adjusted standard error of difference equation used
when unequal sample sizes are present
what is the formula for the adjusted standard error of the difference
SED = sqrt([((n1 - 1)(SD1^2) + (n2 - 1)(SD2^2))/(n1 + n2 - 2)] * [(1/n1) + (1/n2)])
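The adjusted formula can be sketched as a small function. The group sizes, SDs and means below are hypothetical, and the function name is illustrative.

```python
import math

def adjusted_sed(n1, sd1, n2, sd2):
    # pooled variance, weighted by each group's degrees of freedom
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    # standard error of the difference for unequal sample sizes
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# hypothetical groups: n = 10, SD = 4 and n = 15, SD = 5
sed = adjusted_sed(10, 4.0, 15, 5.0)

# generic t formula: mean difference / SE of the mean difference
t_stat = (52.0 - 48.0) / sed
```

Weighting by n - 1 means the larger group contributes more to the pooled variance estimate.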
paired sample t test
used to compare two means from the same or correlated samples
what is the equation for a paired sample t test
t = (sample mean pre - sample mean post)/SED
what is the corrected SED for the paired sample t test
SED = sqrt((SD1^2/n1) + (SD2^2/n2) - 2r(SD1/sqrt(n1))(SD2/sqrt(n2)))
what is the alternate approach to calculating the t statistic and SED for a paired sample t Test
t = d/SED
- d = mean difference between individual’s scores
SED = SDd/sqrt n
- SDd = standard deviation of the difference scores
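The difference-score approach is short enough to work through by hand or in code. The pretest and posttest scores below are made up, and differences are taken as post minus pre.

```python
import math
import statistics

pre = [10, 12, 9, 14, 11]     # hypothetical pretest scores
post = [12, 15, 10, 15, 14]   # hypothetical posttest scores

# one difference score per individual
diffs = [b - a for a, b in zip(pre, post)]

d_bar = statistics.fmean(diffs)                         # mean difference, d
sed = statistics.stdev(diffs) / math.sqrt(len(diffs))   # SED = SDd / sqrt(n)
t_stat = d_bar / sed                                    # t = d / SED
```

For these scores the mean difference is 2.0 with SDd = 1.0, giving t of about 4.47 on n - 1 = 4 degrees of freedom.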
what is the confidence interval for the alternate t statistic and SED for paired samples t test
CI = mean difference between individuals' scores +/- tcv(SED)
what is used if data violates the assumptions of a t Test
single samples: Wilcoxon signed rank test
independent samples: Mann-Whitney U test
paired samples: Wilcoxon signed rank test
effect size
the strength of the relationship between variables
omega squared
estimate of the variance explained by the influence of the independent variable
what is used as a measure of effect size for pretest-posttest
percent change