Week 3 Flashcards
External validity
Whether results generalise to the wider population.
Quasi experiment
Non-random assignment of participants, e.g. pre-existing groups such as different professions, ages etc.
Ecological validity
Whether the experiment mirrors the real-life conditions of the phenomenon being measured.
Measurement error
Assumed discrepancy between the data collected and the true value of the measurement.
Validity
Measuring what we set out to measure. Must be reliable too
Reliability
Consistency of results. Does not have to be valid. Must measure in same way each time.
Face validity
Weak type of validity: what the test taker thinks of the test, e.g. a personality test with weird questions.
Content validity
Are the items a representative sample of all possible items? E.g. a test covering only weeks 1-2 of a 10-week course lacks content validity.
Criterion-related validity
Extent to which a score indicates a level of performance on a criterion against which it is compared. Predictive/concurrent.
E.g. GPA and honours entry
Construct validity
Whether a measure captures the underlying construct, e.g. how do we know an intelligence test measures intelligence as a construct?
Convergent validity (construct validity)
Should correlate with questionnaires that measure
Same construct
Related constructs
Discriminant validity
Should not correlate with questionnaires that measure
Different constructs
Unrelated constructs
Experimenter effect (reactivity of measures)
What happens when the experimenter's expectations are known to participants and impact their performance, e.g. the Hawthorne experiment with lights.
Reliability 3 types
Test-retest
Internal consistency
Interrater reliability
Test-retest
Measure same individuals at two points in time
Internal consistency
Uses responses at only one time and focuses on consistency of items (all measuring same things?)
Interrater reliability
Evidence of reliability when multiple raters agree in observations of the same thing
Operationally defined variables
Must define how the variable will be captured/measured each time, to turn it into numbers.
Levels of measurement
Relationship between what is being measured and the numbers that represent what is being measured.
Categorical variable
Names distinct entities, e.g. binary (2 groups)
Continuous variables
Can take on any value on the measurement scale
4 levels of measurement:
Nominal
Ordinal
Interval
Ratio
Nominal variable
Two or more things that are equivalent in some way are given the same number. The numbers have no meaning; they are just labels (categorical)
Linear model
Based on a straight line
Variables
Measured constructs that vary across entities in the sample
Parameters
Estimated from the data; usually constants representing some fundamental truth about the relations between variables, e.g. the mean or median
Coefficients (b)
Estimate the relationship between 2 variables
Sum of squares
Total deviance of scores from the model, where deviance = outcome - model (assesses fit). When the model is the mean, this is the total deviance of scores from the mean.
Sum of squares as a good measurement
Relies on amount of data
More data points
Higher SS
Average error
Divide SS (total error) by number of values (N)
Mean error
Divide SS not by the number of scores but by the degrees of freedom (df)
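A minimal sketch of the three error measures above (the scores are made up):

```python
scores = [4, 6, 7, 9, 4]
n = len(scores)
mean = sum(scores) / n          # the "model" here is just the mean

# Sum of squares: total squared deviance of scores from the model
ss = sum((x - mean) ** 2 for x in scores)

# Average error: SS divided by the number of values (N)
avg_error = ss / n

# Mean error: SS divided by the degrees of freedom (n - 1 here,
# because the mean was estimated from the data)
mean_error = ss / (n - 1)
```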
Degrees of freedom (df)
Number of scores used to compute the total, adjusted for the number of parameters estimated (e.g. N - 1 when the mean is estimated from the data)
Lack of fit
Large error values relative to the model indicate poor fit
Method of least squares
Principle of minimising the sum of squares
Standard error of the mean (SE)
SD of sample means. Acknowledges we can't take 1000s of samples: estimated as the sample SD divided by the square root of the sample size. Indicates how well the sample represents the overall population.
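A sketch of the SE calculation from a single (made-up) sample:

```python
import math

scores = [98, 104, 100, 102, 96, 100]
n = len(scores)
mean = sum(scores) / n

# Sample SD (dividing by the degrees of freedom, n - 1)
sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

# Standard error of the mean: SD / sqrt(N)
se = sd / math.sqrt(n)
```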
Central limit theorem
Applies if the sample is large (above 30); use this to calculate SE. The sampling distribution of the mean is then approximately normal, and the mean of all sample means from one population equals the population mean.
Confidence interval
Limits constructed such that for a certain % of samples (95% or 99%) the true value of the population mean will fall within them. Boundaries in which we believe the true value of the mean will fall.
Large SE
Means a lot of variability between the means of different samples; the sample may not give an accurate representation of the population
Small SE
Most sample means are close to pop mean - sample likely to be accurate reflection
95% confidence interval
If you collected 100 samples, calculated the mean and then calculated confidence intervals - 95% of CI would contain true value of mean in the pop.
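The claim above can be simulated: draw many samples from a made-up population, build a 95% CI for each, and count how often the CI contains the true population mean (the population parameters here are invented for illustration).

```python
import random

random.seed(1)
pop_mean, pop_sd, n = 100, 15, 36
z = 1.96
se = pop_sd / n ** 0.5                  # SE using the known population SD

hits = 0
for _ in range(1000):
    sample = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
    xbar = sum(sample) / n
    # Does this sample's 95% CI contain the true population mean?
    if xbar - z * se <= pop_mean <= xbar + z * se:
        hits += 1
# hits is typically close to 950 out of 1000 (i.e. ~95%)
```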
Calculating confidence intervals
Need to know the mean and the standard error.
Mean ± (z score × SE) = boundary
Lower boundary confidence interval
Mean - zscore x SE
Upper boundary of confidence interval
Mean + zscore x SE
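A sketch of the two boundary formulas above (sample values are made up):

```python
mean, sd, n = 100, 15, 36
se = sd / n ** 0.5           # standard error
z = 1.96                     # z score for a 95% CI

lower = mean - z * se        # lower boundary
upper = mean + z * se        # upper boundary
```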
Sampling variation
When the sample means vary from other sample means
Sampling distribution
Frequency distribution of sample means (graph)
Ordinal variable
Tells us the order in which things occur, but nothing about the distances between ranks, e.g. horse-race finishing positions (categorical)
Interval variable
Has equal intervals on the scale, e.g. IQ (continuous)
Ratio variable
Builds on interval: has a true zero, so ratios are meaningful, e.g. how many kids in a family? Starts at 0 (continuous)
Continuous variables are:
Continuous - can take any value on the scale
Discrete - only certain defined values
Descriptive statistics
Describe essential characteristics of data. Snapshot
Frequency distribution
Plots how many times each score occurs
Histogram
Values of observations plotted on the horizontal axis; bars showing the frequency of each score on the y axis
95% confidence interval
Z score is 1.96
99% confidence interval
Z score 2.58
90% confidence interval
Z score 1.64
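The three z scores above can be checked against the standard normal distribution: for a two-tailed interval at level c, the z score is the inverse CDF evaluated at (1 + c) / 2.

```python
from statistics import NormalDist

z = NormalDist()          # standard normal: mean 0, SD 1
z95 = z.inv_cdf(0.975)    # 95% CI -> ~1.96
z99 = z.inv_cdf(0.995)    # 99% CI -> ~2.58
z90 = z.inv_cdf(0.95)     # 90% CI -> ~1.64
```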
t-distribution
Used to calculate confidence intervals for small sample sizes (instead of the z score). Look up the degrees of freedom in a t-distribution table.
Mean - (t score × SE) = lower boundary
Mean + (t score × SE) = upper boundary
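A small-sample CI sketch: compute the SE from the sample and use a t value looked up for df = n - 1 (the scores here are made up; 2.262 is the two-tailed 95% t value for df = 9).

```python
import math

scores = [10, 12, 9, 11, 13, 10, 12, 11, 9, 13]
n = len(scores)
mean = sum(scores) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
se = sd / math.sqrt(n)

t = 2.262                 # from a t table: df = 9, 95% two-tailed
lower = mean - t * se     # lower boundary
upper = mean + t * se     # upper boundary
```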
Calculating confidence interval
Mean - (z/t score × SE) = lower boundary
Mean + (z/t score × SE) = upper boundary
Null hypothesis
Effect is absent
Fisher’s p-value
(Probability) a small p, e.g. p = .01, is strong evidence against the null hypothesis
Alternative/experimental hypothesis
Effect will be present
Null hypothesis significance testing NHST
Designed to tell whether the alternative hypothesis is likely to be true
- Assume the null is correct
- Fit a statistical model to the data that represents the alternative hypothesis
- Use the p-value: the probability of obtaining data at least this extreme if the null were true
- If p is less than the criterion (.05), the model fits the data and confidence is gained in the alternative hypothesis
One-tailed test
Statistical model that tests a directional hypothesis: the more someone reads this book the more they want to kill its author (predicting direction of data)
Two-tailed test - most common
Statistical model testing a non-directional hypothesis: reading more of this book could increase or decrease the reader's desire to kill its author
Type I error
We believe there is a genuine effect in the population when there isn't. Max probability of this error (α) is .05
Type II error
We believe there is no effect in the population when there is! Max probability of this error (β) is .2
Experimentwise error rate
When a large number of tests are conducted on one research question. The probability of making no Type I error (1 - .05 = .95) is multiplied across all tests, and the result is taken from 1 to give the probability that at least one Type I error will occur.
Bonferroni correction
Calculation done to keep the familywise error rate controlled and below .05:
Divide the Type I error rate (α = .05) by the number of tests (comparisons)
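Both calculations above, sketched for an assumed 5 tests at α = .05:

```python
alpha, k = 0.05, 5

# Experimentwise (familywise) error rate: the chance of at least one
# Type I error across all k tests = 1 - (1 - alpha)^k
familywise = 1 - (1 - alpha) ** k      # ~0.226, well above .05

# Bonferroni correction: test each comparison at alpha / k instead,
# which keeps the familywise rate at or below alpha
per_test_alpha = alpha / k
```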
Standard deviation
How well the mean fits the sample: the average spread of scores around the mean
Statistical power linked with…
Sample size
Effect size
Standardised measure of the magnitude of the observed effect, e.g. Cohen's d, Pearson's correlation coefficient r
Cohens d
Effect size for the comparison of two means: the difference between the 2 means divided by the pooled SD (if the 2 SDs differ, the control group SD can be used instead)
Cohen d sizes
.2 = small, .5 = medium, .8 = large
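A sketch of the Cohen's d calculation above, using two made-up groups and the pooled SD:

```python
import math

group_a = [5, 7, 6, 8, 9]
group_b = [3, 4, 5, 4, 4]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Pooled SD: each group's variance weighted by its degrees of freedom
na, nb = len(group_a), len(group_b)
pooled_sd = math.sqrt(((na - 1) * sample_var(group_a) +
                       (nb - 1) * sample_var(group_b)) / (na + nb - 2))

# Cohen's d: difference between the two means over the pooled SD
d = (mean(group_a) - mean(group_b)) / pooled_sd
```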
Test statistic
A statistic for which we know how frequently different values occur. Used to test the hypothesis.
Statistical power
Ability of a test to detect an effect of a particular size.