Research Methods & Communication Flashcards

1
Q

Experimental design: what is a factor?

A

The thing you are testing, i.e. the variable you manipulate (e.g. a drug)

1
Q

What does the Anscombe Quartet show people?

A

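That datasets with almost identical summary statistics (mean, variance, correlation, regression line) can look completely different when plotted, so you should always plot your results before making assumptions.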

2
Q

Conditional probability

A

P(A|B) is the conditional probability that A is true given that B is true.

2
Q

What are the assumptions of a linear regression?

A

* Normal errors - check by looking at a histogram of residuals or a QQ plot.

* Constant variance for all values of the independent variable - check by looking at a plot of residuals vs fitted values.

* Straight-line relationship between the variables - check by looking at scatterplots and plots of residuals vs fitted values.

2
Q

What is multiple regression?

A

Use more than one independent variable to predict the dependent variable. (eg plant growth is dependent on light AND rainfall)

2
Q

What is a suggested alternative to the H index?

A

The M index: the H index divided by the number of years since the person's first publication.

2
Q

Experimental design: what is a unit?

A

The individual thing you apply your factor to (e.g. a person, a plant, a horse…)

3
Q

Joint probability

A

P(A ∩ B) is the joint probability that both A and B are true.

4
Q

What is the t value equation?

A

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

There is also another form of the t calculation (for example, when comparing two sample means).
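
As a rough illustration of the one-sample formula on this card, the sketch below computes t by hand in Python. The function name `one_sample_t` and the measurements are made up for the example; it is not meant as a replacement for R's `t.test( )`.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """Return t = (mean - mu) / (s / sqrt(n)) for a one-sample t test."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean - mu) / (s / math.sqrt(n))


# Hypothetical measurements, tested against a hypothesised mean of 5.0
sample = [5.1, 4.9, 5.6, 5.8, 6.0]
t = one_sample_t(sample, mu=5.0)
print(round(t, 2))
```

The resulting t would then be compared with Student's t distribution on n - 1 degrees of freedom.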
5
Q

What symbol is used to represent significance level?

A

alpha

6
Q

Why use R?

A

+ Free

+ Open Source

+ Widely used

- Command line

- Intimidating

6
Q

When do you use the MULTIPLICATION RULE of probability?

A

To calculate the joint probability of two or more independent events, e.g. flipping a head AND then flipping another head: 1/2 × 1/2 = 1/4.

7
Q

When does confounding occur?

A

When it is impossible to separate the effects of experimental treatment from other factors that might affect the outcome.

7
Q

What are the methods of randomisation?

A

simple, stratified, paired, pairwise, minimisation

7
Q

What is pseudoreplication?

A

A special case of inadequate specification of random factors where both random and fixed factors are present.

8
Q

What is a good M value?

A

Around 1 is a good M value.

9
Q

With what data would you use a barplot?

A

With FREQUENCY data

10
Q

When to use a Chi-Squared test?

A

* With nominal data

* “Goodness of fit” tests used to compare observed against theoretical frequencies

* Contingency test used to show whether data are associated or independent

11
Q

How to calculate the value of cells in a contingency table?

A

Expected value of a cell = $\frac{\text{row total} \times \text{column total}}{\text{grand total}}$

12
Q

Name measures of central tendency?

A

* Mean

* Median

* Mode

13
Q

What is covariance?

A

Covariance is a measure of how much two random variables change together.

14
Q

Experimental design: what is a level?

A

A level is one of the values that a factor takes. So if your factor was a certain drug, you could have several levels within it: 10 mg, 20 mg, 30 mg.

15
Q

Why are controls necessary?

A

Controls help avoid the treatment in question being confounded with experimental procedures associated with treatment. (eg without a placebo, drug effects are confounded with the act of taking the treatment)

15
Q

What helps to reduce the risk of confounding?

A

Replication and randomisation

16
Q

In the standard equation y = ax + b what variables are y and x?

A

y is the dependent variable

x is the independent variable

18
Q

Standard deviation equation

A

$s = \sqrt{s^2}$, i.e. the standard deviation is the square root of the variance.

18
Q

Graphics for exploratory analysis : univariate data

A

* Stem-and-leaf plots

* Histograms

* Boxplots

20
Q

What are the main points of the scientific method?

A

1) Hypothesis: a logical guess based on other people’s results
2) Predictions from the hypothesis are tested
3) Results: if they agree with the hypothesis, it is supported. If not, formulate a new hypothesis.

20
Q

What should you do if you cannot control for some confounding variables at the experimental design stage?

A

Attempt to control for variation statistically.

* Take measurements of variables that might influence the result, and hope we can quantify their influence.

* This generally requires replication.

* We lose some degrees of freedom in estimating the effect of these variables.

21
Q

What is the correlation coefficient and what does it show?

A

The correlation coefficient, also called Pearson’s Product-Moment Correlation Coefficient or r.

  • falls between 1 and -1.
  • 1 = complete positive correlation
  • -1 = complete negative correlation
  • 0 = no linear correlation
  • Defined as the covariance of the two variables divided by the product of their standard deviations.
21
Q

Why are H, M and IF a bit shit?

A

All of them are strongly affected by discipline.

21
Q

Common problems with experimental design and interpretation

A

* Non-independence of data points and pseudoreplication

* Sample size too small

* Confirmation bias & observer expectation

* Researcher degrees of freedom & ‘p-hacking’

* Interpreting a non-significant result as meaning that something is not true

* Interpreting a significant result as meaning that something is true

22
Q

What are the pros and cons of Bayesian statistics?

A

+ Allows direct statements about probability (eg the probability that one drug is better than another)

+ Can be used to calculate the probability of future observations.

- It is subjective: because the posterior probability is affected by the prior probability, different people (with different priors) can reach different conclusions from the same data.

However, as more evidence is accumulated the posterior probabilities will converge on the same result, whatever the priors. Advocates of Bayesian statistics argue that since science is based on differences of opinion, methods of analysis should reflect this.

24
Q

What do stripplots and boxplots show us?

A

* allow us to identify outliers, errors and patterns in variance

* give an impression of how the continuous variable depends on the categorical variable

* stripplots are less useful when n is high

25
Q

What do scatterplots show us?

A

* see relationships between two variables

* check for non-linearity

* check for outliers and errors

* check for change in variance

* check for structure in the data

26
Q

How can you achieve a more stringent significance level?

A

Use a lower significance level (e.g. 0.01 or 0.001)

27
Q

What does a two-factor ANOVA allow us to test for?

A

Main effects and interactions. A main effect is the effect of one factor in isolation. An interaction is when the effect of one factor depends on the level of the other factor.

27
Q

Why do we randomise?

A

* to avoid selection bias

* control for temporal effects

* control for regression to the mean

* basis for statistical inference

28
Q

Classical statistics

A

In classical statistics, we ask what the probability of seeing our data is, given a particular hypothesis (the null hypothesis)

28
Q

What are the assumptions of the t test?

A

* Normal errors

* Independence of data points

* Equal variances - although R's t.test( ) defaults to Welch's version, which is fine with unequal variances

29
Q

What does ANOVA rely on?

A

The partitioning of variance in the data into that unexplained by the factor(s) & that which is explained.

29
Q

What is the equation for the correlation coefficient?

A

$r = \frac{\sum (x - \bar{x})(y - \bar{y}) / (n - 1)}{s_x s_y}$

30
Q

What is an H index?

A

Used to assess the quality of an individual’s scientific output

31
Q

What are the graphical rules according to that Tufte bloke?

A
  1. Data-ink ratio & graphical redesign
  2. Chartjunk
  3. Data-ink maximisation
  4. Multi-functioning graphical elements
  5. High resolution data graphics
  6. Aesthetics & technique in graphical design
32
Q

In a study…

1 in 1000 people have a rare disease.

The test for this rare disease is 99.9% accurate.

You have tested positive. What are the chances that you have the disease?

A

Out of every 1000 people tested, about two will test positive:

* one will have the disease

* one is a false positive

There is a 50:50 chance that you have the disease.
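
The same answer can be checked with Bayes' rule. The sketch below assumes that "99.9% accurate" means both the sensitivity and the specificity of the test are 0.999, which the card does not state explicitly. The function name `posterior` is just for illustration.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * prior                # P(positive and diseased)
    false_pos = (1 - specificity) * (1 - prior)   # P(positive and healthy)
    return true_pos / (true_pos + false_pos)


# 1 in 1000 prevalence; 99.9% accuracy assumed for both error types
p = posterior(prior=0.001, sensitivity=0.999, specificity=0.999)
print(round(p, 3))
```

Under these assumptions the true positives and false positives are equally common, which is where the 50:50 comes from.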

34
Q

Chi-squared equation

A

$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}}$
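
A minimal Python sketch of this calculation for a contingency table, with expected counts taken as row total × column total / grand total. The function names and the 2x2 counts are hypothetical, and no Yates' correction is applied here.

```python
def expected_counts(table):
    """Expected cell counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Sum of (obs - exp)^2 / exp over every cell of the table."""
    exp = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )


observed = [[30, 10], [20, 40]]  # hypothetical 2x2 counts
print(expected_counts(observed))
print(round(chi_squared(observed), 2))
```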

36
Q

Why is using an estimate of the standard deviation bad?

A

It causes problems because it leads to systematic underestimation of σ.

38
Q

Should you swap in the Monty Hall Problem?

A

Always: swapping wins 2/3 of the time, sticking wins only 1/3.
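
One way to convince yourself is to enumerate every equally likely game. The sketch below assumes the standard rules (three doors, the host always opens a door that hides no prize and is not your pick); the function name `win_probability` is made up for the example.

```python
from fractions import Fraction
from itertools import product


def win_probability(switch):
    """Exact chance of winning over every car position and first pick."""
    doors = range(3)
    wins = 0
    games = 0
    for car, pick in product(doors, doors):
        # Host opens a door that is neither the contestant's pick nor the car.
        opened = next(d for d in doors if d != pick and d != car)
        if switch:
            # Move to the one remaining closed door.
            pick = next(d for d in doors if d != pick and d != opened)
        wins += pick == car
        games += 1
    return Fraction(wins, games)


print(win_probability(switch=True), win_probability(switch=False))
```

When the first pick is already the car the host has two doors to choose from; the sketch just takes the first, which does not change the result because switching loses either way.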

38
Q

When do you use the ADDITION RULE of probability?

A

When the outcomes of an event are mutually exclusive (cannot happen at the same time), e.g. the probability of rolling a 2 OR a 5 is 1/6 + 1/6 = 1/3.

39
Q

What must you have for both correlation and regression?

A

* Normal errors

* Variances must be similar across the relationship

40
Q

Yates’ correction equation

A

$\chi^2 = \sum \frac{(|\text{obs} - \text{exp}| - 0.5)^2}{\text{exp}}$

41
Q

What is calculated in ANOVA?

A

ANOVA calculates the between-group variance (the factor variance). This is compared with the within-group variance (the error variance) using an F test.

42
Q

What is a type 2 error?

A

Failing to reject a null hypothesis which is actually incorrect. FALSE NEGATIVE

43
Q

Bayesian Statistics

A

In Bayesian statistics, we ask what the probabilities of different hypotheses are, given our data: we then pick the most likely hypothesis.

44
Q

What are the assumptions of an ANOVA?

A

* Normally distributed errors

* Homoscedasticity

* Observations are independent

45
Q

Variance equation

A

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
46
Q

Does 95% PI exceed CI or does 95% CI exceed PI?

A

The 95% PI is ALWAYS wider than the 95% CI

47
Q

What are the different types of experimental design?

A

Single-factor; two-factor; Higher level factorial design; incomplete design; Nested design

49
Q

With what data would you use a scatterplot?

A

With two CONTINUOUS variables

50
Q

What is Bayes’ rule?

A

Bayes’ rule: $P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}$

51
Q

When do you use Yates’ correction?

A

When a contingency table is 2x2

52
Q

Replication is no use (per se) as you need to replicate the right things. What do you need to replicate?

A

Replicate the units that the treatment is applied to

54
Q

Limitations of Chi-Squared test

A

* Each set of measurements must be independent

* No sample must be too small (use Fisher's exact test instead)

55
Q

How do we overcome the systematic underestimation of σ?

A

By comparing our value of t with Student’s t Distribution which takes account of this.

55
Q

How do you report a t test statistic?

A

The difference between means was (or was not) statistically significant (t = X.XX, df = Y, P = Z.ZZ)

56
Q

What are the cons of the H index?

A

It is strongly affected by the length of a person’s career

58
Q

What is the coefficient of determination?

A

$r^2$. The coefficient of determination is an estimate of the proportion of the variability in one variable that is explained by the other variable (often quoted as a %).

60
Q

What are measures of dispersion?

A

* Variance

* Standard deviation

* IQR

62
Q

How do you calculate regression in R?

A

lm(dependent~independent) then use summary( ) to get more information.

63
Q

When comparing more than two means you can use pairwise comparisons. What is the formula for the number of pairwise comparisons?

A

(N-1)N/2 pairwise comparisons

64
Q

What is the equation for covariance?

A

Covariance $= \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$
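
To make the formula concrete, here is a small Python sketch that computes the covariance by hand and then divides it by the product of the two standard deviations to get r. The function names and the data are invented for the example; in R you would normally just call `cor.test( )`.

```python
import statistics


def covariance(xs, ys):
    """Sample covariance: sum of (x - mean_x)(y - mean_y), divided by n - 1."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


xs = [1, 2, 3, 4, 5]  # hypothetical paired measurements
ys = [2, 4, 5, 4, 5]
print(covariance(xs, ys), round(correlation(xs, ys), 3))
```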

64
Q

What is the R function for calculating the correlation coefficient?

A

cor( ) returns r; cor.test( ) returns r along with a test of its significance

66
Q

How do you compare regression lines?

A

ANCOVA (analysis of covariance), which uses the independent variable as the covariate.

68
Q

What does correlation show?

A

The strength and significance of the relationship between two variables.

69
Q

When do you calculate a t value?

A

* Compare two means

* Compare a before and after

71
Q

How do you partition variability?

A

Use the sum of squares (SS), not the variance ($s^2$): $SS = \sum (x - \bar{x})^2$, or equivalently $SS = s^2 \times df$

73
Q

When performing pairwise comparisons, what is the formula for the probability of making at least one type 1 error?

A

$1 - 0.95^{k}$, where k is the number of tests (each at a significance level of 0.05)
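
A short Python sketch of how quickly this grows, combining the error formula on this card with the (N-1)N/2 count of pairwise comparisons from the earlier card. The function names are made up, and the calculation assumes the tests are independent.

```python
def n_pairwise(n_groups):
    """Number of pairwise comparisons among n_groups means: N(N - 1) / 2."""
    return n_groups * (n_groups - 1) // 2


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one type 1 error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


tests = n_pairwise(5)  # comparing 5 means pairwise
print(tests, round(familywise_error(tests), 3))
```

With only five groups the chance of at least one false positive is already around 40%, which is the usual argument for running an ANOVA first rather than many t tests.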

74
Q

What do we use regression analysis for?

A

To fit a line to allow estimates of the dependent variable to be made from the independent.

76
Q

What is extrapolation?

A

Estimating dependent variables from a regression equation outside the range of your data

77
Q

What are the uses of graphical methods (histograms, stem-and-leaf plots) to demonstrate univariate data?

A

* tell us about the shape of the frequency distribution

* help to identify outliers

* help to identify possible errors

79
Q

What is a type 1 error?

A

Reject a null hypothesis that is actually correct. FALSE POSITIVE

81
Q

How to calculate Impact Factor

A

Impact Factor for 2012 = $\frac{\text{number of times articles published in 2010 and 2011 were cited in 2012}}{\text{number of citable articles published in 2010 and 2011}}$

83
Q

What does P < 0.05 mean?

A

Your result is SIGNIFICANT at the 5% level: reject the NULL hypothesis

84
Q

How do you calculate the H index?

A

A person’s H index is the highest number, h, for which they have h papers each with at least h citations.
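
A minimal Python sketch of that rule (counting papers with at least h citations), plus the M index from the earlier card (H divided by years since first publication). The function names and the citation counts are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def m_index(citations, years_since_first_paper):
    """H index divided by the number of years since the first publication."""
    return h_index(citations) / years_since_first_paper


papers = [10, 8, 5, 4, 3, 0]  # hypothetical citation counts per paper
print(h_index(papers), m_index(papers, years_since_first_paper=8))
```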

85
Q

What is the equation for the mean?

A

$\bar{x} = \frac{1}{n} \sum x$

86
Q

What do you need to check when looking at regression?

A

* Structure in the data

* Error distribution

* Variance structure

* Linearity

87
Q

How do you get an ANOVA table in R?

A

Fit the model with lm( ) or aov( ), then pass it to anova( ) (for lm) or summary( ) (for aov) to print the table.

88
Q

How do you get a histogram of residuals in R?

A

hist(model$residuals)

89
Q

How is a line fitted?

A

By the method of least squares
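
For a single independent variable, least squares has a closed-form solution. The Python sketch below uses it to find the slope a and intercept b of y = ax + b; the function name and the data are invented, and R's `lm( )` would give the same line.

```python
import statistics


def least_squares(xs, ys):
    """Return (a, b) for the line y = a*x + b that minimises squared residuals."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx      # slope
    b = my - a * mx    # intercept: the line passes through (mean x, mean y)
    return a, b


xs = [1, 2, 3, 4, 5]  # hypothetical independent variable
ys = [2, 4, 5, 4, 5]  # hypothetical dependent variable
a, b = least_squares(xs, ys)
print(round(a, 2), round(b, 2))
```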

90
Q

How do we test for statistically significant correlation?

A

Calculate a p value associated with r

91
Q

How do we know if H1 is one-tailed or two-tailed?

A

If it is one-tailed then it predicts a difference in one specified direction only (e.g. A is greater than B); if it is two-tailed then it allows a difference in either direction (A differs from B).

92
Q

With what data would you use a stripplot / boxplot?

A

With one CONTINUOUS variable and one CATEGORICAL variable

93
Q

What is a prediction interval (PI)?

A

A 95% prediction interval indicates the region within which we are 95% certain that a new observation of the dependent variable will lie.

94
Q

How do we know if t is significant? (or any other letters that aren’t p)

A

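Compare the statistic with the critical value for the relevant degrees of freedom: it is significant if its magnitude is greater than the critical value at the 0.05 level (equivalently, if P is less than 0.05).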

95
Q

When is Pearson’s correlation used?

A

On 2 continuous variables