2 - Me, Myself and I Flashcards

1
Q

Why is biology considered a quantitative subject?

A

Research relies heavily on accurate and precise measurements. Variables are often manipulated, and the use of controls allows us to observe cause-and-effect relationships. We can quantify diversity through experiments and use the resulting data to observe trends and create graphs. We can also form hypotheses, make predictions and establish causes.

2
Q

Why are statistical analyses important?

A

Mathematical models and statistical analyses are important as they help us to understand genetic data and complex mechanisms.

3
Q

What are the mean, median and range?

A

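Mean - the arithmetic average of a dataset (the sum of the values divided by the number of values).

Median - the middle value of a dataset once the values are sorted.

Range - the difference between the maximum and minimum values of a dataset (the spread of the variable's values).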
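
As a quick illustration of these three summaries, here is a minimal Python sketch using only the standard library. The leaf-length values are made up for the example.

```python
from statistics import mean, median

# Hypothetical leaf lengths in cm (invented for illustration).
leaf_lengths = [2, 4, 4, 5, 7, 9]

average = mean(leaf_lengths)                         # sum divided by count
middle = median(leaf_lengths)                        # middle of the sorted values
value_range = max(leaf_lengths) - min(leaf_lengths)  # maximum minus minimum

print(average, middle, value_range)
```

With an even number of values, as here, the median is taken as the average of the two middle values.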

4
Q

What is the difference between a sample and a population?

A

A sample is only a small subset of the total population, so when interpreting results from a sample we must remember that the data collected may differ from the wider population.

The population is all members of a defined group.

5
Q

What is a sampling error and what causes it?

A

A sampling error is the random variation introduced into a dataset because only a subset of the total population is sampled, so the value(s) from the sample differ from the true population value(s).
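
The idea can be sketched in Python, assuming a made-up population that we can measure in full (which is rarely possible in practice). The population, the sample size of 10 and the seed are all arbitrary choices for illustration.

```python
import random
from statistics import mean

# Hypothetical population: every whole number from 1 to 100.
population = list(range(1, 101))
population_mean = mean(population)

random.seed(1)  # fixed seed so the sketch is repeatable
sample = random.sample(population, 10)  # a small random subset
sample_mean = mean(sample)

# Sampling error: how far the sample estimate is from the true value.
sampling_error = sample_mean - population_mean
print(sample_mean, population_mean, sampling_error)
```

Changing the seed draws a different sample and so gives a different sampling error, even though the population itself has not changed.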

6
Q

Ways to represent categorical data (1).

A

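The chi-square test assumes that the variables are categorical (can be divided into groups), that the observations are independent, and that the expected count in each class is at least about 5. It can be used where the observations are assigned to mutually exclusive classes; the observed counts are compared with those expected under the null hypothesis.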
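
A minimal Python sketch of the calculation, assuming a null hypothesis of equal class frequencies and invented counts. The critical value 3.841 is the standard 5% cut-off for 1 degree of freedom; a statistics package would report an exact p-value instead.

```python
# Hypothetical counts: 50 offspring sorted into two mutually exclusive classes.
observed = [30, 20]

# Null hypothesis: both classes are equally likely, so expect 25 in each.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1

# 3.841 is the 5% critical value for 1 degree of freedom.
CRITICAL_VALUE_DF1 = 3.841
reject_null = chi_square > CRITICAL_VALUE_DF1
print(chi_square, degrees_of_freedom, reject_null)
```

Here the statistic is below the critical value, so these invented counts would not justify rejecting the null at the 5% level.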

7
Q

Ways to represent continuous data (1).

A

Boxplots are effective for presenting continuous data across groups. A boxplot shows the median, the IQR (the box), whiskers indicating the range of the non-outlying values, and dots for outliers.

8
Q

What is the null hypothesis?

A

The default expectation that there is no relationship or association, e.g. that categorical outcomes are all equally likely.

9
Q

What is the alternative hypothesis?

A

The expectation that there is a relationship or association between two measured phenomena, e.g. that categorical outcomes are not all equally likely.

10
Q

What are the degrees of freedom?

A

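This refers to the number of values in a calculation that are free to vary - the minimum is 1. For example, a chi-square test on a single set of categories has degrees of freedom equal to the number of categories minus 1.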

11
Q

What does p<0.05 mean?

A

The result is statistically significant at the 5% level, so we can reject the null hypothesis in favour of the alternative.

12
Q

How does statistical significance relate to the p-value?

A

p < 0.05 is conventionally taken as strong evidence against the null hypothesis. The smaller the p-value, the more inclined we are to reject the null and favour the alternative, because if the null were true there would be less than a 5% chance of seeing a trend/deviation at least this large.

13
Q

How can p-values be used as evidence?

A

If the p-value is well below 0.05, we reject the null because data like ours would be unlikely if it were true. If the p-value is close to the threshold, you may be inclined to repeat the experiment with a larger sample size, which then shows more clearly whether the analysis supports the null or the alternative. The p-value also indicates whether the deviation from the null is plausibly due to sampling error alone.

14
Q

What is a type I and II error?

A

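Type I: a false positive - rejecting a null hypothesis that is actually true. A borderline result such as p = 0.049 could be one.

Type II: a false negative - failing to reject a null hypothesis that is actually false. A borderline result such as p = 0.051 could be one.

With borderline p-values like these we should not firmly reject or accept any hypothesis, and should instead collect more data.

These errors may arise from small sample size (data collection) and poor experimental design.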

15
Q

What is effect size?

A

The effect size is the magnitude of the effect: the degree to which the phenomenon affects the whole population, not just the sample. A small effect size indicates minimal/negligible effects; a large one indicates substantial effects.

16
Q

Why is biological context important?

A

Context surrounding data is key because we need to know the level at which the effect is observed (e.g. the population level), and the data must be analysed with sample size and effect size taken into consideration. Statistics alone do not tell us the cause-and-effect relationship.

17
Q

What is a t-test and when would you use it?

A

It determines whether the mean of one group is statistically different from the mean of another (only suitable for comparing two groups). It can be used to test a difference seen in a boxplot.
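
A minimal Python sketch of the test statistic, assuming the classic two-sample form with a pooled variance (i.e. both groups are assumed to have the same variance). The measurements are invented, and the sketch stops at the t statistic rather than computing a p-value.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical measurements for two groups (invented for illustration).
group_a = [4, 5, 6]
group_b = [7, 8, 9]

n_a, n_b = len(group_a), len(group_b)

# Pooled variance: assumes both groups share the same variance.
pooled_variance = (
    (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
) / (n_a + n_b - 2)

# Difference between the group means, scaled by its standard error.
standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
t_statistic = (mean(group_a) - mean(group_b)) / standard_error
degrees_of_freedom = n_a + n_b - 2

print(t_statistic, degrees_of_freedom)
```

The magnitude of the statistic here is larger than 2.776, the two-sided 5% critical value for 4 degrees of freedom, so these invented groups would count as significantly different.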

18
Q

What is a regression model and its uses?

A

A regression model describes the relationship between a response variable and an explanatory variable; with biological context, this can suggest cause and effect. It can be used to fit a line to a scatter plot.

19
Q

What is the line of best fit in relation to regression models?

A

The line of best fit represents the relationship between the two variables (dependent and independent) in the regression model. Points rarely fall exactly on the line; the residuals are the differences between the observed and predicted values. The line is calculated as the one that minimises the sum of the squared residuals (hence best fit).
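
A minimal Python sketch of fitting the line by least squares and reading off the residuals. The data points are invented, and the variable names are only for illustration.

```python
# Hypothetical paired observations (invented for illustration).
x = [1, 2, 3, 4, 5]  # explanatory (independent) variable
y = [2, 4, 5, 4, 5]  # response (dependent) variable

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept for the line y = intercept + slope * x.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residual = observed value minus the value predicted by the line.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(slope, intercept, residuals)
```

For a least-squares line with an intercept, the residuals always sum to zero (up to rounding), which is a handy sanity check on the fit.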

20
Q

What is the workflow for data analysis?

A

(1)Plot data, (2) initial visual analysis, (3) statistical test, (4) interpret test output, (5) interpret output in biological context.

21
Q

What is the R^2 value?

A

The proportion of the variance in the response variable that is explained by the explanatory variable - it will be between 0 and 1. The closer to 1, the stronger the relationship/association; 0 means none.
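
A minimal Python sketch, assuming a straight-line model with a single explanatory variable, in which case R^2 is the squared correlation between the two variables. The data are invented.

```python
# Hypothetical paired observations (invented for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of products and squares of the deviations from the means.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

# For a straight-line fit with one explanatory variable, R^2 equals the
# squared correlation between x and y.
r_squared = s_xy ** 2 / (s_xx * s_yy)
print(r_squared)
```

An R^2 of 0.6 would mean that 60% of the variance in the response is explained by the explanatory variable.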

22
Q

What is a biphasic relationship?

A

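A relationship in which two distinct phases can be identified, e.g. the response changes in one way over part of the range of the explanatory variable and in a different way over the rest.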

23
Q

What is the 95% confidence interval?

A

A range within which the true population average is likely to lie - we can never know the true value for sure.

24
Q

How does sample size relate to error margin?

A

The smaller the sample, the greater the margin of error, and so the wider the 95% confidence interval.

The larger and more unbiased the sample, the better it represents the population distribution, so the interval narrows because the sample average will be closer to the true population average.
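
A minimal Python sketch of how the margin of error shrinks with sample size. It assumes the normal approximation (the interval is the sample mean plus or minus about 1.96 standard errors); for small samples a t-distribution gives slightly wider intervals. The function name and the numbers are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

# z-value that leaves 2.5% in each tail of a normal distribution (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)

def margin_of_error(standard_deviation, sample_size):
    """Half-width of an approximate 95% confidence interval for a mean."""
    return Z_95 * standard_deviation / sqrt(sample_size)

# Same hypothetical spread (standard deviation 10), two sample sizes.
small_sample_margin = margin_of_error(10, 25)
large_sample_margin = margin_of_error(10, 100)
print(small_sample_margin, large_sample_margin)
```

Because the margin depends on the square root of the sample size, quadrupling the sample halves the width of the interval.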

25
Q

What does the overlap of notches in a boxplot indicate?

A

This indicates that there may not be a statistically significant difference between the groups. This can be checked with a statistical test; until then, we do not reject the null (no difference).

26
Q

What is a multivariate linear model and its uses?

A

There are multiple dependent variables (Y) and a single independent variable (X), so multiple outcomes are measured.

27
Q

What can we use if the expected counts are less than five?

A

Because the chi-square test is most reliable when the expected counts are >5, we can use Fisher's exact test instead if required.
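
A minimal Python sketch of the test for a 2x2 table, assuming the common two-sided definition (add up the probabilities of all tables that are no more likely than the observed one, with the row and column totals fixed). The function name and the counts are invented for illustration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(k):
        # Hypergeometric probability of k in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    observed_probability = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)

    # Add up every table that is no more likely than the observed one.
    return sum(
        p
        for p in (table_probability(k) for k in range(lowest, highest + 1))
        if p <= observed_probability * (1 + 1e-9)
    )

# Hypothetical 2x2 table of counts, all too small for a chi-square test.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(p_value)
```

The small tolerance in the comparison only guards against floating-point rounding when two tables have the same probability.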

28
Q

What is longitudinal data and its effects?

A

This follows an individual over a period of time. Some relationships/outcomes cannot be determined without this data, because only a long timescale provides sufficient information.

29
Q

What is cross-sectional data and its effects?

A

This is data on different individuals at a single point in time. We measure both the cause and the effect at the same point in time.

30
Q

What is multiple testing and its effects?

A

This is when we test many variables at once, some of which may be unnecessary or irrelevant, in the hope of discovering a chance relationship/association between two variables that appears significant. The more tests we run, the more likely a false positive becomes, and it wastes time, money, resources etc.
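
A minimal Python sketch of why running many tests makes a chance "significant" result likely. It assumes the tests are independent and that the null is true in every one of them; the function name is invented for illustration.

```python
def chance_of_false_positive(number_of_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    # Each test avoids a false positive with probability (1 - alpha).
    return 1 - (1 - alpha) ** number_of_tests

single_test = chance_of_false_positive(1)
twenty_tests = chance_of_false_positive(20)
print(single_test, twenty_tests)
```

With twenty tests at the 0.05 threshold, the chance of at least one false positive is roughly 64%, compared with 5% for a single test.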

31
Q

What is cherry picking and its effects?

A

This is when we present only the positive results and ignore other relevant findings because we want the research to appear significant. This misrepresents the data.

32
Q

What would indicate that a paired t-test is appropriate?

A

A comparison of two related sets of measurements, with each individual tested twice (e.g. before and after a treatment).
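
A minimal Python sketch of the paired test statistic, which is calculated from the differences within each individual rather than from the two sets of values separately. The before/after measurements are invented, and the sketch stops at the t statistic rather than computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements on the same four individuals, taken twice.
before = [10, 12, 9, 11]
after = [12, 14, 10, 14]

# A paired test works on the within-individual differences.
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
standard_error = stdev(differences) / sqrt(n)
t_statistic = mean(differences) / standard_error
degrees_of_freedom = n - 1

print(differences, t_statistic, degrees_of_freedom)
```

The statistic here exceeds 3.182, the two-sided 5% critical value for 3 degrees of freedom, so these invented data would show a significant change.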