Midterm Flashcards
Belmont Report Guidelines
- Beneficence - maximize benefits, minimize harm
- Respect for persons - autonomy, informed consent
- Justice - fair distribution of benefits and burdens
Primary literature
- Original work that enhances or introduces knowledge
- Includes research results, case studies, descriptive and evaluative studies
- e.g. randomized controlled trial
Secondary literature
- Summarize, analyze, and draw conclusions from previous work
- e.g. reviews, meta-analyses
Evaluating Resources
CRAAP acronym
Currency: published, updated, revised
Relevance: info, details, audience
Authority: credentials, peer-reviews
Accuracy: references, match others
Purpose: stated, objective or bias
Situational variable
- Describe characteristics of a situation/environment
- Categorical
- e.g. temperature in gym
Response variable
- Responses/behaviours
- Dependent variable
- e.g. RT
Participant/subject variable
- Individual differences, characteristics
- Numerical
- Independent variable
- e.g. sex
3 Fundamental features of science
- Systematic empiricism
- Empirical questions
- Public knowledge
Beliefs/activities that imply science but lack 1+ of the 3 features of science
Pseudoscience
3 goals of science
- Describe - observational
- Predict - systematic relationship between variable
- Explain - mechanisms + causal rltnsp
Basic vs applied research
Basic: global understanding
Applied: address practical problems
PICOT
Patient pop. of interest
Intervention of interest
Comparison intervention/group
Outcome
Time
Sampling methods
simple random, systematic, convenience, cluster
Simple random: every member of pop has equal chance of being selected
Systematic: every nth participant
Convenience: nearby and willing
Cluster: divide pop into blocks, then randomly select blocks of participants
Stratified sampling
Divide pop based on characteristics, then a sample is taken from each stratum using random, systematic, or convenience sampling
Variables other than the DV
Extraneous variables
Variables that systematically vary with the IV
Confound variable
Provide alternative explanation
Difference between experimental and non-experimental rsrch
Manipulation of IV only in experimental
Can’t draw causal conclusions with non-exper
Measures of dispersion
range, standard dev, variance
Range: difference between highest and lowest score (outliers can mislead)
Standard deviation: avg distance between scores and mean; square root of variance
* √((Σ(x-m)²)/n)
Variance: mean of squared differences (SD²)
* calculate the variance by taking the difference between each point and the mean. Then square and average the results.
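The dispersion measures above can be sketched in Python (a minimal illustration using the population formulas from the cards, dividing by n; the function name is made up):

```python
def dispersion(scores):
    """Range, variance, and SD as defined on the cards (population formulas)."""
    n = len(scores)
    mean = sum(scores) / n
    # variance: mean of squared differences from the mean
    variance = sum((x - mean) ** 2 for x in scores) / n
    sd = variance ** 0.5              # SD is the square root of variance
    rng = max(scores) - min(scores)   # range; outliers can mislead
    return rng, variance, sd

# dispersion([2, 4, 4, 4, 5, 5, 7, 9]) returns (7, 4.0, 2.0)
```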
Descriptive stats
Examples and purpose
Describe/summarize data; no causal conclusions
e.g. %, central tendency, dispersion, correlation coefficients
Inferential stats
draw conclusions, determine statistical sig.
Type I and Type II errors
Type I - false positive
Type II - false negative
What do the results of a study tell us
- can’t conclude/prove based off a single study
- Either support, refute, or modify theory
- scientific evidence, not proof
Continuous vs categorical levels of measurement
Cont - Interval and ratio
Cat - nominal and ordinal
Level of measurement with a meaningful zero
Ratio
Math with levels of measurement
Interval: add and subtract
Ratio: +, -, /, x
Summarizing levels of measurement
Nominal: mode
Ordinal: median and mode
Interval + Ratio: all three
Sex is an example of what level of measurement
Nominal
Place in a race is an example of what level of measurement
Ordinal
Temp in celsius is an example of what level of measurement
Interval
Temp in Kelvin is an example of what level of measurement
Ratio
When do ordinal and interval data overlap
- aggregating multiple items
- underlying construct is continuous
- Measurement instrument is reliable
Why collect as continuous data and then put into categories?
Otherwise can't get an average
Collecting categorically presents fewer analytic choices
sum of all scores divided by n
Mean
Median
50th percentile/middle score
First step: order scores
Next: locate middle ((n+1)/2)
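The median steps above, as a small Python sketch (the function name is illustrative):

```python
def median(scores):
    ordered = sorted(scores)   # first step: order the scores
    n = len(ordered)
    mid = (n + 1) / 2          # next: locate the middle (1-based position)
    if mid == int(mid):        # odd n: a single middle score
        return ordered[int(mid) - 1]
    # even n: average the two scores straddling the middle position
    return (ordered[int(mid) - 1] + ordered[int(mid)]) / 2

# median([3, 1, 2]) returns 2; median([4, 1, 3, 2]) returns 2.5
```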
Bimodal
Tie between 2 for most repeated score (mode)
2 distinct peaks in distribution shape
Multimodal
Tie between >2 for most repeated score (mode)
Where are central tendencies located on normal distribution
All in middle if perfectly normal
Frequency tables
Display distribution of a single variable
* Variable listed from highest to lowest on one side, frequency on other
Histograms
Graphical display distribution
* quantitative variables don’t have gaps between bars unless the score has frequency of 0
Skewed with peak on the right
negative skew
Percent of scores lower than an individual score
Percentile rank
Score converted into number of standard deviations from the mean
Z Score
Difference between individual score and the mean of distribution (x-m), divided by the standard deviation √((Σ(x-m)²)/n)
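As a sketch in Python, using the population SD from the card (names are illustrative):

```python
def z_score(x, scores):
    n = len(scores)
    m = sum(scores) / n
    sd = (sum((s - m) ** 2 for s in scores) / n) ** 0.5  # population SD
    return (x - m) / sd  # distance from the mean in SD units

# z_score(9, [2, 4, 4, 4, 5, 5, 7, 9]) returns 2.0
```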
Effect size differences between means
Cohen’s d
d = (M₁ - M₂) / SD
Formally, use the pooled SD
Pooled SD
The average spread of all data points about their group mean (not the overall mean)
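A minimal Python sketch of Cohen's d with a pooled SD, assuming the standard pooled formula (a weighted average of the two groups' sample variances; names are illustrative):

```python
def cohens_d(g1, g2):
    def mean(xs):
        return sum(xs) / len(xs)
    def sample_var(xs):  # spread about the group's own mean
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    n1, n2 = len(g1), len(g2)
    pooled_sd = (((n1 - 1) * sample_var(g1) + (n2 - 1) * sample_var(g2))
                 / (n1 + n2 - 2)) ** 0.5
    return (mean(g1) - mean(g2)) / pooled_sd  # d = mean difference / pooled SD

# cohens_d([4, 5, 6], [1, 2, 3]) returns 3.0 (a very large effect)
```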
Graphing correlations between quantitative variables
Line graph if x-axis variable (IV) has small # of values
Scatterplot if x-axis variable (IV) has large # of values
Linear vs nonlinear relationship
graphing
Linear: pts fit into single, relatively straight line; Pearson’s r
Nonlinear: pts fit into curved line
Pearson’s r
Purpose, limitation, steps
- For linear relationships
- From -1.00 to +1.00
- Limitation: restriction of range - limited range in sample relative to pop
- Turn scores into z-scores (x and y variables separately)
- Multiply x and y z-scores together for each individual
- Take mean of cross products
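The three steps above can be sketched directly in Python (illustrative names; z-scores use the population SD so that the mean of the cross products equals r):

```python
def pearson_r(xs, ys):
    def z(scores):  # step 1: standardize each variable separately
        n = len(scores)
        m = sum(scores) / n
        sd = (sum((s - m) ** 2 for s in scores) / n) ** 0.5
        return [(s - m) / sd for s in scores]
    zx, zy = z(xs), z(ys)
    cross = [a * b for a, b in zip(zx, zy)]  # step 2: cross products
    return sum(cross) / len(cross)           # step 3: mean of cross products

# pearson_r([1, 2, 3], [2, 4, 6]) is 1.0 (up to floating-point rounding)
```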
Bar graphs
Purpose, error bars, stat sig.
- Present and compare mean score of groups when IV is categorical
- Error bars for variability (extend one standard error in each direction)
- if difference between means is greater than 2 standard errors, there is statistical significance
SD of group divided by √n
Standard error
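A quick Python sketch of the standard error (group SD divided by √n, using the sample SD; the function name is made up):

```python
def standard_error(scores):
    n = len(scores)
    m = sum(scores) / n
    sd = (sum((x - m) ** 2 for x in scores) / (n - 1)) ** 0.5  # sample SD
    return sd / n ** 0.5  # SE = SD / sqrt(n)
```

Per the bar-graph card above, a difference between two means greater than roughly 2 standard errors suggests statistical significance.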
Line graphs
Application, error bars
- When IV is continuous (e.g. time) or small # of IV lvls
- Use when IV is quantitative
- Error bars for standard error
Scatterplots
- Correlations between quantitative variables when x-axis (IV) has large # of lvls
- Add regression line
Multiple-response measures
- Enter separately, then combine using software
- assess internal consistency of the measure using Cronbach’s alpha or Cohen’s k
How to analyze/show difference between means
Bar graph and Cohen's d
How to analyze/show correlation between quantitative variables
Line graph or scatterplot (check for nonlinearity and restriction of range)
Pearson’s r
Null Hypothesis
- No relationship in the pop., relationship in the sample reflects only sampling error
- Occurred by chance
- No relationship = Cohen’s d or Pearson’s r is 0
Probability of the sample result or a more extreme result if the H₀ were true
P Value
Factors affecting the p value
Strength of relationship
Size of sample
Low p value
Sample or more extreme result would be unlikely if the H₀ were true
Reject the null hypothesis
Statistically significant
Alpha .05
5% chance or less of a result at least as extreme as the sample result if the null hypothesis were true
Even when the H₀ is true and alpha is .05, the H₀ will be mistakenly rejected 5% of the time
Greater than 5% chance ==> retain the H₀ (fail to reject)
Type I error
- Rejecting the H₀ when it is true (false positive)
- p value tells probability of making a type I error
- Cause: sampling error
- Reduce the chance of making Type I error by setting alpha to something less than .05 BUT then this raises probability of making Type II error!
File drawer problem
usually only moderate to strong positive relationships are published, leading to published effects showing a stronger relationship than is really in the population
p-hacking
various decisions in the research process to increase the chance of a stat. sig. result
Set alpha before!!!
Type II Error
- retaining the H₀ when it is false (false negative)
- Cause: study lacks adequate statistical power to detect the relationship (e.g. sample is too small)
- Reduce the chance of making Type II error by setting alpha to something more than .05 BUT then this raises probability of making Type I error!
Statistical Power
- probability of rejecting the null hypothesis given the sample size and expected relationship strength (pearson’s r)
- Complement of the probability of committing a type II error
- Power of .80 is adequate - means there’s an 80% chance of rejecting the null hypothesis for the expected relationship strength
- To increase statistical power, increase the strength of the relationship or increase the sample size
Calculate probability of committing a Type II error given statistical power of .59
1 - .59 = .41
4 Moral Principles in scientific rsrch
- Balance risks vs benefits
- Act responsibly and with integrity (maintain trust + transparency)
- Seek justice (fair treatment)
- Respect rights and dignity (autonomy + consent)
TCPS rsrch agencies
- Canadian Institutes of Health Research (CIHR)
- Natural Sciences and Engineering Research Council of Canada (NSERC)
- Social Sciences and Humanities Research Council of Canada (SSHRC)
3 Levels of Risk
Federal Policy for the Protection of Human Subjects
- Exempt rsrch - nonsensitive, standard, public info
* Maintain confidentiality
* Once approved, exempt from regular, continuous review
- Expedited rsrch - no greater than minimal risk
* Reviewed by 1 member of IRB, or appointed subcommittee
- Greater than minimal risk - needs full IRB review
Z score
define, distribution
Standard measure of the distance between a single point in the data and the overall mean for that variable
z-distribution:
* ranges from negative infinity to positive infinity
* has a mean of 0
* has a standard deviation of 1
T score
- Standardized distribution with a mean of 50 and an SD of 10
- without negative values
- can use this to scale any variable (e.g. IQ)
Percentiles
Use to express z-scores more intuitively
Refers to the proportion scoring less than a particular value
Obtained from z-table
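Instead of a z-table, the percentile can be computed from the standard normal CDF via math.erf (a sketch; the function name is illustrative):

```python
import math

def percentile_from_z(z):
    # proportion of the standard normal distribution scoring below z, as a percent
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# percentile_from_z(0) returns 50.0; percentile_from_z(1.0) is about 84.13
```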
Parameters
Values describing population distribution
Population distribution
centre, dispersion
Centre at mu
Dispersion indicated by sigma
Sample distribution
x-bar = centre of the distribution
s = standard deviation/dispersion of the distribution
Decision Matrix
state of reality vs decision reached in inferential testing
* Type I error
* Type II error (beta)
* Power
Power
- Study’s ability to find a difference if there is one
- Correct rejection of the H₀
- 1-beta (type II error) = power
- ↑ Power by …
1. ↑ n
2. ↑ critical alpha
3. ↑ effect size (∆)
Null Hypothesis Sig. Testing
- Directional - upper/lower tailed (X… more/lower than Y) or two-tailed test (X different than Y)
- Establish rejection regions
* two-tailed tests: split the alpha value in half (0.025) so non-rejection region is 0.5 - 0.025 = 0.475
* single-tailed tests: full rejection alpha value (0.05) in one tail of the distribution graph
- If sample falls within rejection region, reject the null hypothesis and conclude that the alt hypoth. is likely correct
Limitations of p value
- Only relevant to specific sample stats
- Conditioned on the null hypothesis being true
- the false positive rate associated with a p value of .05 is usually around 30%, but can be much higher
- silent on the magnitude and range of an effect
- Even the most minuscule effect can be statistically significant if the sample size is large enough
Limitation of significance testing
- Null hypothesis is rarely true
- ST provides a binary decision (yes or no) and a direction of the effect
- Mostly interested in the size of the effect
- Statistical vs practical significance
Statistical Significance ⍺
define, stat significance
- probability of results due to chance
- Represents chance of making Type I error
- smaller value = more “unusual” (e.g. sample is different than pop. that it's being compared to)
- Fail to reject H₀ if p > ⍺
- Reject H₀ if p < ⍺
Power Analysis
Calculate expected power before conducting a study based on estimated n, critical ⍺, expected or minimum effect size (from related rsrch)
Avoid post-hoc power analysis!!!!!!
Effect size
Measure of the strength of a relationship
Unrelated to n and statistical significance
* Can be statistically significant but trivial effect
* Could be statistically insignificant, but notable effects (increase n to gain significance)
Cohen’s d = mean difference / SD
* d = 0.2, small effect
* d = 0.5, medium effect
* d = 0.8, large effect
P value vs alpha
Alpha is arbitrary number/threshold (e.g. finish line)
P-value is the actual measurement (e.g. race time)
Bayes’ Theorem
allows you to calculate exact percentages (conditional probability based on the occurrence of previous outcomes of similar circumstances)
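A minimal sketch of Bayes' theorem in Python, P(A|B) = P(B|A)·P(A) / P(B), with P(B) expanded by the law of total probability (the example numbers are illustrative, not from the cards):

```python
def bayes(p_b_given_a, p_a, p_b_given_not_a):
    # total probability of B across both states of A
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# e.g. a test with 90% sensitivity, 5% false-positive rate, 10% base rate:
# bayes(0.9, 0.1, 0.05) is about 0.667 - P(condition | positive test)
```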
Converging operations
Multiple operational definitions for same construct
allows for more general conclusion if multiple measures have consistent scores
Levels of measurement with category labels
All of them!
Levels of measurement that can rank order
Ordinal, interval, ratio
Levels of measurement with = intervals
interval and ratio
Test-retest reliability
Consistency over time
For constructs that are assumed to be consistent over time
Internal Consistency
what? why? how?
Responses across the items on a multiple-item measure
* if all items represent the underlying construct, then people's scores should be correlated with each other
* assessed by collecting and analyzing data
* Split-half correlation: split items into two sets, compute a score for each set of items, examine the correlation between the two sets
Cronbach’s alpha
what? why?
mean of all possible split-half correlations for a set of items
Used to measure internal consistency
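A sketch of Cronbach's alpha using the computational formula commonly used in software, alpha = k/(k-1) × (1 - Σ item variances / variance of total scores), rather than literally averaging every split-half correlation (names are illustrative):

```python
def cronbach_alpha(items):
    # items: one list of scores per item (columns), respondents in the same order
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    k = len(items)
    totals = [sum(row) for row in zip(*items)]  # each person's total score
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# perfectly consistent items give alpha = 1.0:
# cronbach_alpha([[1, 2, 3], [1, 2, 3]]) returns 1.0
```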
What does Cronbach’s alpha assess
quantitative judgements
What does Cohen’s k assess
Categorical judgements
Content validity
extent that measure covers the construct
check measurement method against conceptual definition
Criterion Validity
what? concurrent? predictive? convergent?
Criterion Validity: extent that people's scores are correlated with other variables (criteria) that are expected to be correlated
- Criterion: any variable that is expected to be correlated with the construct in question
- Concurrent validity: when the criterion is measured at the same time as the construct
- Predictive validity: when the criterion is measured at some point in the future (after the construct has been measured)
- Convergent validity: other measures of same construct
Discriminant Validity
Discriminant Validity: extent to which scores on a measure are NOT correlated with measures of variables that are conceptually distinct (e.g. self-esteem and mood)
low correlations = evidence of conceptually distinct constructs
Utility
efficiency and generality
Is the data precise and reliable, at the lowest possible cost (Efficiency)?
Can the method be applied successfully to a wide range of phenomena (Generality)?
Measurement errors
parallax, calibration, zero, damage
Parallax error (incorrectly sighting the measurement).
* e.g. don’t read the measurement right, copy it down wrong
Calibration error (if the scale is not accurately drawn)
* e.g. scale on a map is wrong
Zero error (if the device doesn’t have a zero or isn’t correctly set to zero)
Damage (if the device is damaged or faulty).
* e.g. bent level
Types of error
gross, systematic (3), random
Gross errors: human mistakes in reading instruments and recording and calculating measurement results
Systematic Errors
* Instrumental: shortcoming, misuse, measurement accuracy
* Environmental: external and environmental factors
* Observational: inaccurate readings, conversion error
Random Errors: disturbance about which we are unaware
5 causes of type I error
Measurement error
Lack of random sample
Alpha value too liberal
Investigator bias
Improper use of one-tailed test
Causes of type II error
Measurement error
Lack of sufficient power (n too small)
Alpha value too conservative
Treatment effect not properly applied
Social Desirable Responding
saying/doing socially appropriate thing
Demand characteristics
subtle cues in the measure that reveal how the researchers expect participants to behave
Measurement Bias
Confirmation, recording, halo, social desirability
Confirmation bias
* find what you look for
Recording bias
* recall
* Rely on imperfect record (e.g. memory)
* Availability heuristic
* Primacy/recency effect
Halo effect
* extraneous variables affect measures
* common in subjective appraisal of individual differences
Social desirability bias
* impression management
* participants more likely to report positive info to the experimenter
Expectancy effects
participant, pygmalion, hawthorne, halo, placebo, biosoc, psysoc
- Participant types (good, bad, faithful, apprehensive)
- Pygmalion - self-fulfilling prophecy; use double-blind
- Hawthorne - alter behaviour (pierce!)
- Halo - short-term improvement because of novelty of treatment
- Placebo - natural improvement, generalized effect of ‘being in treatment’, reinterpretation of outcome measures
- Biosocial experimenter cues - age, sex, attraction
- Psychosocial experimenter cues - warmth, status, etc
Reducing expectancy effects
- Standardize experimenter-participant interaction
- Use blinding techniques
- Use deception (active vs passive)
- Convince participant that you can detect lying