Research Methods & Communication Flashcards
Experimental design: what is a factor?
The variable you are manipulating in the experiment (e.g. a drug).
What does the Anscombe Quartet show people?
That you should always plot your data before drawing conclusions: four datasets can share near-identical summary statistics (means, variances, correlation, regression line) yet look completely different when plotted.
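A quick sketch of the point, using the first two of the quartet's four datasets hard-coded from Anscombe's published values: the means and correlations agree to two decimal places, yet only a plot reveals that set II is a smooth curve while set I is a noisy line.

```python
from statistics import mean

# Anscombe's quartet, sets I and II (both share the same x values)
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from the definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

print(round(mean(y1), 2), round(mean(y2), 2))                  # both 7.5
print(round(pearson_r(x, y1), 2), round(pearson_r(x, y2), 2))  # both ~0.82
```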
Conditional probability
P(A|B) is the conditional probability that A is true, given that B is true.
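The definition can be checked by counting, here with a hypothetical deck-of-cards example (A = the card is a king, B = the card is a face card):

```python
# Build a 52-card deck by rank only (suits don't matter for this example)
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
deck = [r for r in ranks for _ in range(4)]

b = [c for c in deck if c in ("J", "Q", "K")]  # event B: face cards (12 of them)
a_and_b = [c for c in b if c == "K"]           # A and B: kings among the face cards

# P(A|B) = P(A and B) / P(B), which by counting is 4/12
p_a_given_b = len(a_and_b) / len(b)
print(p_a_given_b)  # 0.333...
```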
What are the assumptions of a linear regression?
* Normal errors - check with a histogram of the residuals or a QQ plot.
* Constant variance - the variance of the errors is the same for all values of the independent variable; check with a plot of residuals vs fitted values.
* Straight-line relationship between the variables - check with scatterplots and plots of residuals vs fitted values.
What is multiple regression?
Use more than one independent variable to predict the dependent variable. (eg plant growth is dependent on light AND rainfall)
What is a suggested alternative to the H index?
The M index: calculated the same way as the H index, then divided by the number of years since the researcher's first publication.
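As a one-line sketch (the function name and example values are hypothetical):

```python
def m_index(h_index, years_since_first_publication):
    """M index: the H index scaled by career length in years."""
    return h_index / years_since_first_publication

# An H index of 15 after 15 years gives M = 1.0,
# roughly the "good" value mentioned elsewhere in these cards
print(m_index(15, 15))  # 1.0
```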
Experimental design: what is a unit?
What you’re testing your factor on (people or plants or horses…)
Joint probability
P(AnB) is the joint probability that both A & B are true.
What is the t value equation?
t = (x̄ − μ) / (s / √n), where x̄ is the sample mean, μ the hypothesised mean, s the sample standard deviation and n the sample size. (There is also another t calculation.)
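The one-sample formula can be computed directly with the standard library (the sample data below are hypothetical):

```python
from statistics import mean, stdev

def one_sample_t(sample, mu):
    """t = (x-bar − mu) / (s / sqrt(n)) for a one-sample t test."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / n ** 0.5)

data = [5.1, 4.9, 5.3, 5.0, 5.2]      # hypothetical measurements
print(round(one_sample_t(data, 5.0), 3))  # 1.414 for this toy sample
```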
What symbol is used to represent significance level?
alpha
Why use R?
+ Free
+ Open Source
+ Widely used
- Command line
- Intimidating
When do you use the MULTIPLICATION RULE of probability?
To calculate the joint probability of two or more independent events, e.g. flipping a head AND then flipping another head.
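For the coin example, with a hypothetical fair coin:

```python
# Multiplication rule: P(A and B) = P(A) * P(B) when A and B are independent
p_head = 0.5
p_two_heads = p_head * p_head
print(p_two_heads)  # 0.25
```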
When does confounding occur?
When it is impossible to separate the effects of experimental treatment from other factors that might affect the outcome.
What are the methods of randomisation?
simple, stratified, paired, pairwise, minimisation
What is pseudoreplication?
Treating data points as independent replicates when they are not statistically independent - a special case of inadequate specification of random factors, where both random and fixed factors are present.
What is a good M value?
Around 1 is a good M value.
With what data would you use a barplot?
With FREQUENCY data
When to use a Chi-Squared test?
* With nominal data
* “Goodness of fit” tests used to compare observed against theoretical frequencies
* Contingency test used to show whether data are associated or independent
How to calculate the value of cells in a contingency table?
(column total × row total) / grand total
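The rule applied to a whole table, as a short sketch (the observed counts are hypothetical):

```python
def expected_counts(table):
    """Expected cell counts: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Hypothetical 2x2 observed contingency table
observed = [[30, 10],
            [20, 40]]
print(expected_counts(observed))  # [[20.0, 20.0], [30.0, 30.0]]
```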
Name measures of central tendency?
* Mean
* Median
* Mode
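All three are available in Python's standard library (the sample is hypothetical):

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7, 10]                 # hypothetical sample
print(mean(data), median(data), mode(data))  # 5 4.0 3
```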
What is covariance?
Covariance is a measure of how much two random variables change together.
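A minimal sketch of the sample covariance from its definition (the data are hypothetical; Python 3.10+ also ships a ready-made `statistics.covariance`):

```python
from statistics import mean

def covariance(xs, ys):
    """Sample covariance: mean co-deviation from the means (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical paired measurements that tend to rise together -> positive covariance
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print(covariance(x, y))  # 2.0
```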
Experimental design: what is a level?
The levels are the values of the factor you’re varying. So if your factor was a certain drug, you could have several levels within this: 10mg; 20mg; 30mg
Why are controls necessary?
Controls help avoid the treatment in question being confounded with experimental procedures associated with treatment. (eg without a placebo, drug effects are confounded with the act of taking the treatment)
What helps to reduce the risk of confounding?
Replication and randomisation
In the standard equation y = ax + b what variables are y and x?
y is the dependent variable
x is the independent variable
Standard deviation equation
s = √s², i.e. the standard deviation is the square root of the variance.
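The standard library makes the relationship explicit (population forms shown; the data are hypothetical):

```python
from statistics import pvariance, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(pvariance(data))  # population variance s^2 = 4
print(pstdev(data))     # s = sqrt(s^2) = 2.0
```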
Graphics for exploratory analysis : univariate data
* Stem-and-leaf plots
* Histograms
* Boxplots
What are the main points of the scientific method?
1) Logical guess based on other people’s results
2) Predictions tested
3) Results. Agree with hypothesis = win. If not, formulate new hypothesis.
What should you do if you cannot control for some confounding variables at the experimental design stage?
Attempt to control for the variation statistically:
* take measurements of variables that might influence the result, and hope we can quantify their influence
* this generally requires replication
* we lose some degrees of freedom in estimating the effect of these variables
What is the correlation coefficient and what does it show?
The correlation coefficient OR Pearson’s Product-Moment Correlation Coefficient OR r.
- falls between -1 and 1.
- 1 = complete positive correlation
- -1 = complete negative correlation
- 0 = no correlation
- Defined as the covariance divided by the product of their standard deviations.
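That last definition can be sketched directly (the data are hypothetical; Python 3.10+ also ships `statistics.correlation`):

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """r = covariance(x, y) / (sd(x) * sd(y)), sample (n - 1) forms throughout."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# A perfect straight-line relationship gives r = 1
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```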
Why are H, M and IF a bit shit?
All of them are strongly affected by discipline.
Common problems with experimental design and interpretation
* Non-independence of data points and pseudoreplication
* Sample size too small
* Confirmation bias & observer expectation
* Researcher degrees of freedom & ‘p-hacking’
* Interpreting a non-significant result as meaning something is true
* Interpreting a significant result as meaning that something is true
What are the pros and cons of Bayesian statistics?
+ Allows direct statements about probability (eg the probability that one drug is better than another)
+ Can be used to calculate the probability of future observations.
- It is subjective: because the posterior probability is affected by the prior probability, different people (with different priors) can reach different conclusions from the same data.
- However, as more evidence is accumulated the posterior probabilities will converge on the same result, whatever the priors. Advocates of Bayesian statistics argue that since science is based on differences of opinion, methods of analysis should reflect this.
What do stripplots and boxplots show us?
* allow us to identify outliers, errors and patterns in variance
* gives an impression of how the continuous variable is dependent on the categorical variable
* less useful when n is high
What do scatterplots show us?
* see relationships between two variables
* check for non-linearity
* check for outliers and errors
* check for change in variance
* check for structure in the data
How can you achieve a more stringent significance level?
Use a lower significance level (e.g. 0.01 or 0.001).
What does a two-factor ANOVA allow us to test for?
Main effects and interactions. A main effect is the effect of one factor in isolation. An interaction is present when the effect of one factor depends on the level of the other factor.
Why do we randomise?
* to avoid selection bias
* control for temporal effects
* control for regression to the mean
* basis for statistical inference