Finals Flashcards
observational research
- no direct manipulation of variables
- the investigator looks at relationships
- provides weak evidence of causation
experimental research
- direct manipulation of variables
- depending on the type of experimental design, can provide a stronger level of causation
discrete variable
limited to certain values
- whole numbers or categories
continuous variables
can theoretically assume any value
- measured values that can include fractions and decimals
what are the types of scales that are discrete
nominal and ordinal
nominal scale
mutually exclusive categories with no logical order
- no direction, no magnitude, no proportion
ordinal scale
ordered rankings but no indication of size or difference
- has direction, but no magnitude or proportion
what are the types of scales that are continuous
interval and ratio scales
interval scale
equal intervals but no absolute zero
- has direction and magnitude but no proportion
ratio scale
equal intervals and has an absolute zero
- has direction, magnitude, and proportion
validity
how well a study accurately represents the relationship between the two variables being studied
external validity
the ability of a study's results to be applied (generalized) to the broader population
population vs sample
population: all individuals/objects with a common set of characteristics
sample: a portion of the larger population that is assumed to represent that population
standard score
expresses how many SD points away from the mean a data point is
what is a standard score also known as
a z score
how to assign a percentile to the appropriate quartile, quintile, or decile
percentiles run in increments of 1, so given a percentile, place it in the corresponding range (quartiles span 25 percentile points each, quintiles 20, and deciles 10)
how to calculate the percentile of a specific raw score for a rank order or simple frequency distribution
P = (n/N)*100
n = number of scores at or below the desired raw score
N = total number of values
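Worked example (hypothetical numbers): if 15 of 20 scores fall at or below the raw score of interest, P = (15/20)*100 = 75, i.e., the score sits at the 75th percentile.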
calculate percentile from a raw score for a grouped frequency distribution
P = [(((X - L)/i)*f + c)/N]*100
X = raw score
L = lower real limit of the interval containing X
i = interval size
f = frequency of scores within the interval containing X
c = cumulative frequency of scores below that interval
N = total scores
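Worked example (hypothetical numbers): for X = 72 in the 70-74 interval (L = 69.5, i = 5) with f = 6 scores in that interval, c = 30 scores below it, and N = 50: P = [(((72 - 69.5)/5)*6 + 30)/50]*100 = (33/50)*100 = 66, i.e., the 66th percentile.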
measures of central tendency
values that describe the middle or central characteristics of a dataset
calculate arithmetic mean
sum of all scores / number of scores
calculate median
the middle score in a rank ordered list of scores
calculate mode
the score that occurs most frequently in a dataset
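A minimal sketch of these three calculations in Python (hypothetical scores; the statistics module is in the standard library):

```python
import statistics

scores = [4, 7, 7, 8, 10]          # hypothetical dataset
print(statistics.mean(scores))     # (4 + 7 + 7 + 8 + 10) / 5 = 7.2
print(statistics.median(scores))   # middle value of the sorted list = 7
print(statistics.mode(scores))     # most frequently occurring value = 7
```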
measures of variability
quantify the dispersion or spread within a dataset
calculate range
difference between max and min scores
calculate variance
sum(score-mean)^2/(N-1)
calculate standard deviation
sqrt ((sum (score-mean)^2)/(N-1))
coefficient of variation
percentage that allows comparison of variability between different variables
equation for CoV
(SD/mean)*100
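A minimal sketch of the measures of variability and CoV in Python (hypothetical scores; statistics.variance/stdev use the N-1 denominator shown above):

```python
import statistics

scores = [2, 4, 4, 6, 9]                     # hypothetical dataset, mean = 5
rng = max(scores) - min(scores)              # range = 9 - 2 = 7
var = statistics.variance(scores)            # sum((x - mean)^2) / (N - 1) = 7.0
sd = statistics.stdev(scores)                # sqrt(variance) ≈ 2.65
cov = (sd / statistics.mean(scores)) * 100   # coefficient of variation ≈ 52.9%
print(rng, var, sd, cov)
```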
what does a larger CoV mean
a larger CoV indicates greater relative variability, i.e., a wider spread (dispersion) of the data relative to its mean
central limit theorem
fundamental concept in statistics stating that, when repeated samples are taken from a population, the distribution of the sample means approaches a normal distribution as sample size increases
z score
expresses the raw score in standard deviation units
equation for z score
(score - mean)/SD
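Worked example (hypothetical numbers): a raw score of 85 from a distribution with mean 75 and SD 5 gives z = (85 - 75)/5 = 2.0, i.e., two standard deviations above the mean.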
positively skewed distribution
the tail extends toward the positive end, with the bulk (peak) of the scores toward the negative side
negatively skewed distribution
the tail extends toward the negative end, with the bulk (peak) of the scores toward the positive side
platykurtic
means that K < 0
there is a wide range of scores, low concentration around mean
leptokurtic
K > 0
narrow range, high concentration around the mean
mesokurtic
K = 0
moderate range, moderate concentration around the mean
independent variables
variables that are manipulated to observe their effect on the dependent variable
dependent variable
variables that are not manipulated but are observed to see how the IV affects the results
confounding variables
independent variables that are not manipulated but that affect the dependent variables
random sampling
participants chosen at random from a population group
stratified sampling
participants are separated into subgroups based on similarities and then participants are chosen from subgroups randomly
systematic sampling
a specific sequence (e.g., every kth person) is used to choose individuals from a list, starting at a random point
cluster sampling
populations divided into clusters based on geographical or natural groupings then randomly selected
rank the experimental types from best to worst evidence of causality
True, Quasi, then pre-experimental
case study experiment
- pre-experimental design
- single group is exposed to an intervention/treatment and the outcome is observed
randomized controlled trial
- true experimental
- participants are randomly assigned to treatment or control groups
independent groups study
- true experimental
- random assignment to study groups
repeated measures study
- true experimental
- participants undergo all treatments and they are their own control
factorial study
- true experimental
- examines the effects of multiple independent variables on a single DV
autonomy
individual decisions about starting/staying in a study are respected
how is autonomy preserved in research
- providing informed consent
- ensuring and respecting voluntary participation
- protecting vulnerable populations
- adherence to the Nuremberg Code
- IRB oversight
what are the 3 R's of animal research
replace, reduce, refine
fabrication
adding false data to help support a study
manipulation
changing the reported data of a study or hiding something to support the findings of the study
conflict of interest
factors that affect a researcher's ability to be objective or impartial
types of COIs
personal, financial, professional
personal COIs
having a personal relationship with an author on the publication
financial COIs
having stock or some other financial tie to the company or individual being reviewed
professional COIs
reviewing a grant proposal from a competing lab
IRB
- institutional review board
- critically reviews study and informed-consent protocols to ensure alignment with ethical standards
IACUC
institutional animal care and use committee
- equivalent to IRBs but look at animals
law of large numbers
as sample size increases, the sample mean approaches the population mean
how does the law of large numbers apply to sampling error
- small random samples are easily swayed by extreme values = larger sampling error
- large random samples are resistant to extreme values = smaller sampling error
how to calculate confidence intervals
C.I. = mean +/- Z *SEM
how to interpret confidence intervals
for a given confidence level (e.g., 95%), it can be concluded with 95% confidence that the true population mean lies between the lower and upper bounds of the calculated CI
what does the remaining percentage of the CI mean
that there is that much of a chance (e.g., 5% for a 95% CI) that the true mean falls outside the range stated above
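A minimal sketch of the CI calculation, assuming a hypothetical sample with mean 50, SD 10, and n = 25 at 95% confidence (Z = 1.96):

```python
import math

mean, sd, n, z = 50, 10, 25, 1.96
sem = sd / math.sqrt(n)      # standard error of the mean = 10 / 5 = 2
lower = mean - z * sem       # 50 - 3.92 = 46.08
upper = mean + z * sem       # 50 + 3.92 = 53.92
print(lower, upper)          # 95% CI ≈ (46.08, 53.92)
```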
what must a null and alternative hypothesis be
mutually exclusive and exhaustive
Type I error
when the null is rejected when it actually should be accepted (false positive)
Type II error
when the null is accepted when it should be rejected (false negative)
what does the line of best fit do
minimizes the residuals or the error between measured and predicted values by the line’s equation
what is the Pearson's correlation coefficient
an r value between -1 and +1 that quantifies the strength and direction (positive or negative) of a linear correlation
interpretation of the pearson correlation coefficient
-1: perfect negative correlation
+1: perfect positive correlation
0: indicates no linear relationship
what are the assumptions of the pearson’s correlation
- variables must be continuous
- variables must be independent
- variables should be approximately normally distributed
- relationship between variables should be linear
- dataset should not contain outliers
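A minimal sketch of computing r (and r^2) in Python, assuming hypothetical paired data and that SciPy is available:

```python
from scipy import stats

x = [1, 2, 3, 4, 5]            # hypothetical paired, continuous data
y = [2, 4, 5, 4, 6]
r, p = stats.pearsonr(x, y)    # returns Pearson's r and its p-value
print(r, r**2)                 # r ≈ 0.85 (strong positive); r^2 ≈ 0.73 shared variance
```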
coefficient of determination
quantifies the shared variance between variables
how is the coefficient of determination depicted and how is it interpreted
r^2 (or ρ^2 when referring to a population)
which is interpreted as the percentage of the variance in one variable that can be explained by the variance in the other variable
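Worked example: a correlation of r = 0.80 gives r^2 = 0.64, meaning 64% of the variance in one variable is explained by (shared with) the variance in the other.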
homoscedasticity
when the residuals of a plot are consistent (equal in variance) across all values of the variables
what does partial correlation do
quantifies the relationship between an independent variable and dependent variable after removing the effect of another variable
what is a covariate
a variable used in a study that may explain part of the correlation between two variables
forward selection
variables are added one at a time based on the amount of unique variance each contributes to the comparison
backward elimination
all of the variables are entered first and then taken out one by one, in the order that decreases the explained variance the least, to observe how the R^2 value changes
stepwise
independent variables are added to the model as in forward selection, but variables can be eliminated in subsequent steps if the addition of another variable explains equivalent variance
multicollinearity
two or more independent variables in a regression model are highly correlated making it hard to determine the unique contribution of each
Student's t-distribution vs normal distribution
- Student's t-distribution: a family of normal-curve approximations with heavier tails that help account for bias due to sampling error in small samples
- normal distribution: fixed shape with thinner tails and does not change with sample size
what are the assumptions for t-tests
- data must be normally distributed
- data must be on the interval or ratio scales
- sample is randomly selected from the greater population
- when two samples are taken they should have homogeneity of variance
when are independent samples t test performed
used to compare the means of two separate (unrelated) groups; with unequal sample sizes the formula for the standard error of the difference must be adjusted
when are paired samples t test performed
compare two means from the same or correlated samples
when are single sample t test performed
used to compare a single sample mean with a known population mean
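A minimal sketch of all three t-tests in Python, assuming hypothetical data and that SciPy is available:

```python
from scipy import stats

group_a = [10, 12, 9, 11, 13]              # hypothetical scores
group_b = [14, 15, 13, 16, 12]

print(stats.ttest_ind(group_a, group_b))   # independent samples: two unrelated groups
print(stats.ttest_rel(group_a, group_b))   # paired samples: same participants, two conditions
print(stats.ttest_1samp(group_a, 10))      # single sample vs. a known population mean of 10
```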
between subjects ANOVA assumptions
- the populations from which the samples are drawn are normally distributed
- the variability within the samples is approximately equal
- scores in all groups are independent from scores in other groups
- data are on a continuous scale
how to calculate the F ratio given an ANOVA results table
F = MSb/MSw
MSb: SSb/dfb
MSw: SSw/dfw
SS: sum of (score-mean)^2
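Worked example (hypothetical table values): with SSb = 40, dfb = 2, SSw = 90, dfw = 27, MSb = 40/2 = 20 and MSw = 90/27 ≈ 3.33, so F = 20/3.33 ≈ 6.0.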
how is omega squared interpreted
the same way coefficient of determination is
when is post hoc testing appropriate for a study scenario
when a significant difference is found (e.g., a significant ANOVA result) and it is necessary to identify which groups within the dataset are significantly different from each other
what is the advantage of a repeated measures ANOVA over a between subjects
helps to reduce unexplained variability and increases statistical power
- this analysis takes into account baseline abilities
what are the two corrections for a violation of the sphericity assumption
Greenhouse Geisser correction: adjusts the degrees of freedom using epsilon
Huynh-Feldt correction: similar to Greenhouse-Geisser but less conservative
what is the decision flow for deciding which ANOVA to run
- how many independent variables
- are the groups independent or dependent
- are you measuring multiple dependent variables
- is there an interaction between factors
what are the options based on how many independent variables you have
- one variable: one-way ANOVA or repeated measures ANOVA
- more than one: factorial ANOVA
are the groups independent or dependent
independent: between subjects ANOVA
dependent: repeated measures
what main effects and interactions are in a factorial ANOVA
main effects: measure the independent influence of each factor on the dependent variable
interactions: measure how the factors work together considering the combined effects of multiple independent variables on the dependent variable
what makes a good covariate for ANCOVA
one that increases statistical power by removing unexplained variance in the DV due to confounders, which may be the source of within- or between-group differences
what is the homogeneity of regression slopes assumption for ANCOVA
the slopes of the regression lines between the covariate and the DV are equal across groups
when do you use a MANOVA
to assess the effect of one or more IVs on multiple DVs with a single test
why is using a MANOVA better than multiple ANOVAs in some cases
because multiple ANOVAs may increase the risk of Type I error
how to calculate relative risk
RR = [A/(A+B)]/[C/(C+D)]
what does an RR of 1.0 mean
the rate of the response is the same between intervention conditions
what does RR>1.0 mean
the rate of the response is greater in the intervention group
what does RR<1.0 mean
the rate of the response is less in the intervention group
absolute risk reduction
difference in rates of response between intervention conditions
how to calculate absolute risk reduction
ARR = [A/(A+B)] – [C/(C+D)]
number needed to treat
number of individuals that must go through intervention to prevent one additional negative outcome
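A minimal sketch tying RR, ARR, and NNT together, assuming a hypothetical 2x2 outcome table:

```python
# A = intervention with outcome, B = intervention without outcome
# C = control with outcome,      D = control without outcome (hypothetical counts)
A, B, C, D = 10, 90, 20, 80

risk_intervention = A / (A + B)               # 0.10
risk_control = C / (C + D)                    # 0.20
rr = risk_intervention / risk_control         # relative risk = 0.5
arr = abs(risk_intervention - risk_control)   # absolute risk reduction = 0.10
nnt = 1 / arr                                 # number needed to treat = 10
print(rr, arr, nnt)
```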
sensitivity
true positive rate
how well does the test identify those with a condition
specificity
true negative rate
how well does a test identify those without a condition
positive predictive value
proportion of individuals with a positive test that really do have the condition
equation for positive predictive value
A/(A+B)
negative predictive value
proportion of individuals with a negative test that really do not have the condition
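A minimal sketch of all four diagnostic accuracy measures, assuming a hypothetical 2x2 test-vs-condition table:

```python
# A = true positives, B = false positives, C = false negatives, D = true negatives
A, B, C, D = 80, 20, 10, 90   # hypothetical counts

sensitivity = A / (A + C)     # 80/90  ≈ 0.89 (true positive rate)
specificity = D / (B + D)     # 90/110 ≈ 0.82 (true negative rate)
ppv = A / (A + B)             # 80/100 = 0.80
npv = D / (C + D)             # 90/100 = 0.90
print(sensitivity, specificity, ppv, npv)
```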