Quantitative research methods Flashcards
What’s P hacking?
Manipulating data or analyses until a statistically significant result appears
What’s a proxy measure?
An indirect measure used when the variable of interest can't be measured directly
What’s Harking?
Hypothesising after the results are known
What’s publication bias?
Studies that fail to find a significant difference are harder to get published
What’s a type I error?
If you find a significant difference where none should exist
Incorrect rejection of a true null hypothesis
What’s the Nuremberg code?
Informed consent is essential
Research should be based on prior animal work
Risks should be justified by benefits
Research should be conducted by qualified scientists
Physical and mental suffering avoided
Research that could result in death or disabling injury shouldn’t be conducted
What’s a type II error?
When there is a significant difference but you fail to find it
What’s a cross sectional design?
Comparing the performance of different groups at a single time point
What are longitudinal studies?
Comparing the same group’s performance at different time points
What’s observational research?
Correlation
Linear regression
Multiple regression
Useful for establishing relationships between variables, but it is difficult to infer an actual cause and effect
How to establish cause and effect?
The dependent variable should vary only in response to changes in the independent variable
What does nominal mean?
Numbers used to distinguish amongst objects without quantitative value
What does ordinal mean?
Numbers used only to place objects in order
What does interval mean?
Scale on which equal intervals between objects represent equal differences (no true zero)
e.g. Celsius
What does ratio mean?
Scale with a true zero point, so ratios are meaningful. Ratio scales include common physical measures such as length, volume, and time.
What’s Quasi experimental design?
Treatment group compared to control group
Static group comparison
No random assignment.
• Difficult to ensure baseline equivalence
True experimental gold standard approach?
Random assignment for treatment and control group
Effective research design?
Maximise systematic variance (driven by independent variables)
• Minimise error (random) variance
• Identify and control for confounding variables
How to Maximise systematic variance?
Proper manipulation of experimental conditions will ensure high variability of independent variables.
How to Minimise error (random) variance?
Reducing the part of the variability that is caused by measurement error.
What are nuisance variables?
Variables that produce undesired variation in the dependent variable
Fixed by:
Conduct experiment in a controlled environment
Larger samples – randomly assign your subjects to different conditions
With small samples, match your subjects on all demographic variables across conditions
What are placebo and demand effects?
Some portion of effect due to the participant’s belief in the intervention
Participants want to please the experimenter.
Controlled by:
Good control conditions
• Keep the subjects ‘blind’
• Keep the purpose of the study hidden from the participant.
• If possible, disguise the independent variable.
• Sometimes this is difficult to balance with the ethics of participant recruitment and informed consent.
What’s central tendency?
Describes measures of the centre of a distribution:
Mean, median, mode
mean = average value (often the best measure, as it uses all values)
median = middle value
mode = most frequent recurring value
Advantages and disadvantages of the mean average?
- Can be influenced by extreme values.
- Is affected by skewed distributions.
- Can only be used with interval or ratio data.
- Uses every value in the data set.
- Tends to be very stable in different samples (enabling us to compare samples).
What does bimodal mean?
When there are 2 values for the mode
multimodal is when there are more than two modes
Advantages and disadvantages of the mode?
- Easy to determine
- Not affected by extreme values
- Ignores most of the values in a data set
- Can be influenced by a small number of values in the data set
Advantages and disadvantages of median?
- Not affected by extreme values
- Not affected by skewed distributions (the majority of the data is at one end of the scale)
- Can be used with ordinal, interval and ratio data
- Ignores most of the values in a data set
What’s range?
Largest value - smallest value
Greatly affected by extreme values
How to work out lower quartile, second quartile and upper quartile?
Find the median of the data set (second quartile), then find the median above and below
Interquartile range?
Upper quartile - lower quartile
Good because not affected by extreme values, but you aren’t considering half your data set
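The quartile method above can be sketched in Python. The data set is invented for illustration, and the median-of-each-half convention shown here (excluding the overall median when n is odd) is one common choice among several:

```python
from statistics import median

def quartiles(data):
    """Q2 is the median of the whole set; Q1 and Q3 are the medians
    of the lower and upper halves of the sorted data."""
    s = sorted(data)
    half = len(s) // 2
    q2 = median(s)
    q1 = median(s[:half])    # median of the values below Q2
    q3 = median(s[-half:])   # median of the values above Q2
    return q1, q2, q3

q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13, 15])
iqr = q3 - q1   # interquartile range: upper quartile - lower quartile
```

Note that different packages use slightly different quartile conventions, so results can differ a little near the cut points.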
How to calculate sum of squared errors?
Find the deviance of each value from the mean (how far each value is from the mean) and then square it and then sum it all up
What’s population variance?
The sum of squared errors divided by n (i.e. the mean of the squared errors), calculated for a whole population
What’s sample variance?
The sum of squared errors divided by the degrees of freedom (n - 1) rather than n
Square root of variance = ?
Standard deviation
SD values?
~68% of values fall within 1 SD of the mean
~95% within 2 SD
~99.7% within 3 SD
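These coverage figures can be checked against the standard normal distribution with Python's stdlib `statistics.NormalDist`:

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal distribution: mean 0, SD 1

# probability of a value falling within k standard deviations of the mean
coverage = {k: nd.cdf(k) - nd.cdf(-k) for k in (1, 2, 3)}
```

`coverage` comes out at roughly 0.683, 0.954 and 0.997, which is where the 68/95/99.7 rule of thumb comes from.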
Standard error =?
standard deviation / square root of the sample size
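The chain from sum of squared errors to variance, SD and standard error can be sketched with a small data set (the numbers are made up for the example):

```python
from math import sqrt

data = [2, 4, 4, 4, 5, 5, 7, 9]   # invented example values
n = len(data)
mean = sum(data) / n

# sum of squared errors: deviation of each value from the mean, squared, summed
sse = sum((x - mean) ** 2 for x in data)

pop_var = sse / n          # population variance: divide by n
samp_var = sse / (n - 1)   # sample variance: divide by degrees of freedom
sd = sqrt(pop_var)         # standard deviation: square root of the variance
se = sd / sqrt(n)          # standard error: SD / square root of sample size
```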
Normal distribution?
Bell curve shape
What’s positive skew?
the tallest bars at the lower end of the scale
What’s negative skew?
Tallest bars at higher end of the scale
What’s kurtosis?
How pointy the distribution is
A distribution that has a lot of values in the tails (a heavy-tailed distribution) is usually pointy. This is called a leptokurtic distribution and is said to have positive kurtosis
A distribution that is thin in the tails (has light tails) is usually flatter than normal. This is called a platykurtic distribution and is said to have negative kurtosis
If a data set has a normal distribution we call it a mesokurtic distribution
How to know if a distribution is skewed or has excess kurtosis?
If the skewness is more than twice its standard error then the distribution is skewed
The same rule applies to kurtosis
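The twice-the-standard-error rule of thumb can be sketched in Python. The simple moment-based skewness estimate and the rough sqrt(6/n) standard error used here are common textbook approximations (statistics packages use slightly adjusted formulas), and the data set is invented for illustration:

```python
from math import sqrt

def skewness(data):
    """Simple moment-based skewness: the average cubed z-score."""
    n = len(data)
    mean = sum(data) / n
    sd = sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return sum(((x - mean) / sd) ** 3 for x in data) / n

data = [1, 1, 1, 2, 2, 2, 3, 3, 10, 20]   # invented, long right tail
g1 = skewness(data)
se_skew = sqrt(6 / len(data))        # rough standard error of skewness
is_skewed = abs(g1) > 2 * se_skew    # the twice-the-SE rule of thumb
```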
Test to see if data are normally distributed?
Kolmogorov-Smirnov test and Shapiro-Wilk test
If the result is non-significant (p greater than 0.05), the data can be treated as normal
If p is smaller than 0.05, the data are not normal
Shapiro-Wilk is better for small sample sizes
What can you do if the data is not normal?
Remove outliers - use a stem and leaf plot
This is done with a standard deviation or a percentage based rule
Perform a non parametric version of the statistical tests you want to do
Collect more data - larger samples are more likely to be normally distributed
Independent t test?
2 groups you want to compare
A large t value (well above 1) suggests the groups are different; a t value below 1 suggests they are similar
t = (mean1 - mean2) / sqrt(SE1^2 + SE2^2), where SE1 and SE2 are the standard errors of the mean for samples 1 and 2
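This formula can be sketched directly in Python with stdlib helpers; the two groups are invented example data:

```python
from math import sqrt
from statistics import mean, stdev

def independent_t(a, b):
    """t = (mean1 - mean2) / sqrt(SE1^2 + SE2^2), with each SE being
    the sample SD divided by the square root of that group's size."""
    se_a = stdev(a) / sqrt(len(a))
    se_b = stdev(b) / sqrt(len(b))
    return (mean(a) - mean(b)) / sqrt(se_a ** 2 + se_b ** 2)

group1 = [42, 45, 38, 50, 47, 41]   # invented scores
group2 = [35, 37, 40, 33, 39, 36]
t = independent_t(group1, group2)   # well above 1: the groups differ
```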
Paired-sample t test?
One group tested across 2 conditions
A large t value (well above 1) suggests the conditions differ; a t value below 1 suggests they are similar
Assumptions for a t test?
- Data is continuous. This can be either interval or ratio data.
- Both groups are randomly drawn from the population (they are independent from each other).
- The data for each group is normally distributed.
- That there is homogeneity of variance between the samples. In other words, each of the samples comes from a population with the same variance.
How to measure homogeneity?
Levene’s test
If p is more than 0.05, equal variances are assumed: use the top line of the output
If p is below 0.05, equal variances are not assumed: use the bottom line
For an independent t test you need to report?
The t value
Degrees of freedom
The exact significance value (p)
mean and standard deviation/standard error
mean difference and confidence intervals range
Findings of the Levene’s test
eg. Levene’s Test for Equality of Variances revealed that there was homogeneity of variance (p = 0.18). Therefore, an independent t-test was run on the data with a 95% confidence interval (CI) for the mean difference. It was found that creativity in the first-time (42.15 ± 8.38) contestants was significantly higher than in returning (37.94 ± 7.41) contestants (t(66) = 2.20, p = 0.03), with a difference of 4.21 (95% CI, 0.38 to 8.03).
Equation for t value of a paired t test?
t = d(bar) / (SD/ square root of n)
d(bar) = mean of the differences between each individuals score in each test condition
(SD/ square root of n) = Estimate of variability of mean differences between scores in population
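The paired t formula can be sketched the same way; the two sets of scores below are invented for illustration:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(cond1, cond2):
    """t = d_bar / (SD_d / sqrt(n)): the mean of the pairwise differences
    divided by the standard error of those differences."""
    diffs = [b - a for a, b in zip(cond1, cond2)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

no_cloak = [3, 4, 5, 2, 4, 3]   # invented mischief scores per participant
cloak = [5, 5, 6, 4, 5, 5]      # same participants, second condition
t = paired_t(no_cloak, cloak)
```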
Assumptions for a paired t test?
- The data is continuous (interval or ratio data)
* The differences between the samples are normally distributed
Reporting the output of a paired t test?
- The t value (t)
- The degrees of freedom (df)
- The exact significance value (p)
The format of the test result is: t(df) = t-statistic, p = significance value.
- The mean and standard deviation/standard error for each group
- The mean difference and the confidence intervals range
eg. Participants were less mischievous when not wearing the invisibility cloak (3.75 ± 1.91) than when wearing the cloak (5.00 ± 1.65). There was a statistically significant reduction in mischief of -1.25 (95% CI -1.97, -0.53) when participants weren’t wearing the invisibility cloak, t(11) = -3.80, p = 0.01.
How is one tailed different to 2 tailed?
One tailed means you only test in one direction, e.g. whether one mean is bigger than the other
2 tailed is both directions
In practice you should almost always use the two-tailed test
Method on how to calculate the variance for ANOVA? (think its the same for everything)
Work out the mean of the data set
Subtract the mean from each value in the data set to give the ‘distance’ of each value from the mean (deviance)
Square each deviance so they are positive
Add them together to get the sum of squares
Divide by the degrees of freedom to get the sample variance
Or divide the sum of squares by the number of items in the dataset to get population variance
(easiest way to do this is to create a table)
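The steps above can be sketched as a short Python script (the example numbers are made up):

```python
data = [4, 6, 8, 10, 12]   # invented example values
n = len(data)

mean = sum(data) / n                    # step 1: mean of the data set
deviances = [x - mean for x in data]    # step 2: distance from the mean
squared = [d ** 2 for d in deviances]   # step 3: square each deviance
ss = sum(squared)                       # step 4: sum of squares
sample_var = ss / (n - 1)               # step 5: divide by degrees of freedom
population_var = ss / n                 # or divide by n for a population
```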
Good way to examine variance?
Looking at error bars on a graph - show the standard deviation which is the square root of the variance
What other bars can be shown on graph?
Standard error bars - standard deviation divided by the square root of the sample size
Standard error is smaller than the standard deviation and accounts for the size of the data set (because n is in the equation)
When to use standard deviation as your error bars?
If your assumptions of normality are met and you are interested in exploring the spread and variability of the data then the standard deviation is a more useful value to use.
Essentially the standard deviation tells about the range within which we expect to observe values
When to use standard error as your error bars?
If you are interested in the precision of the mean you have calculated, or in comparing and testing differences between your mean and the mean of another data set, then the standard error is more useful
the standard error gives us some information about a range in which a statistic is expected to vary.