Statistics and Research Design Flashcards

1
Q

Outline the scientific method.

A
  1. Make an observation.
  2. Ask a question.
  3. Form a hypothesis, or testable explanation.
  4. Make a prediction based on the hypothesis.
  5. Test the prediction.
  6. Iterate: use the results to make new hypotheses or predictions.

https://www.khanacademy.org/science/biology/intro-to-biology/science-of-biology/a/the-science-of-biology

2
Q

Hypothesis

A

An explanation of something that was observed.

A clear statement of a plausible explanation, phrased so that evidence can either support or refute it.

Needs to be testable.

(BIOL 1105 Notes)

3
Q

Prediction

A

More specific than a hypothesis: it is the outcome that you expect to observe if your hypothesis is true.

(BIOL 1105 Notes)

4
Q

Why is it common practice to come up with multiple alternative hypotheses?

A

It reduces the chance that the researcher becomes attached to a single hypothesis, which can lead to confirmation bias.

It makes researchers think of possible causes for patterns in nature beforehand rather than after the fact, which makes findings more reliable.

(Betts et al, 2021)

5
Q

Why is a hypothesis important?

A
  1. It reduces bias.
  2. It makes findings more reliable.
  3. It increases reproducibility.

(Betts et al, 2021)

6
Q

Why do we do statistics?

A

Statistics allow us to:
- make educated decisions
- infer information from a sample rather than having to study a whole population
- make predictions

7
Q

When are hypotheses not useful?

A
  1. When the goal is prediction rather than understanding.
  2. When the goal is description rather than understanding.
  3. When the objective is a practical planning outcome such as reserve design.

(Betts et al, 2021)

8
Q

What is inductive research?

A

Observing first then coming up with explanations later.

(Betts et al, 2021)

9
Q

Shannon-Wiener Biodiversity Index

How is it interpreted?

A

The Shannon Diversity Index (sometimes called the Shannon-Wiener Index) is a way to measure the diversity of species in a community quantitatively.

Denoted as H, this index is calculated as:

H = -Σ (p_i * ln(p_i))

where:

Σ: A Greek symbol that means “sum”
ln: Natural log
p_i: The proportion of the entire community made up of species i

The higher the value of H, the higher the diversity of species in a particular community.

https://www.statology.org/shannon-diversity-index/
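
As a minimal sketch of the calculation above, assuming species counts are available as a simple list (the function name `shannon_index` and both example communities are hypothetical):

```python
import math

def shannon_index(counts):
    """Return H = -sum(p * ln(p)) over the species proportions p."""
    total = sum(counts)
    # Species with zero individuals contribute nothing to the sum.
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

# Hypothetical communities with the same total abundance.
even_community = [25, 25, 25, 25]   # four equally common species
uneven_community = [85, 5, 5, 5]    # one dominant species

print(shannon_index(even_community))    # ln(4), about 1.386
print(shannon_index(uneven_community))  # lower, because one species dominates
```

Both communities have four species, so the difference in H comes entirely from how evenly the individuals are spread among them.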

10
Q

What are the limitations of the Shannon-Wiener Biodiversity Index?

A

It does not show ecological differences between habitats.

For example, my two wetlands may have the same index value even though they are made up of different species.
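
A small sketch of this limitation, using made-up counts for two wetlands that share no species (all names and numbers are hypothetical):

```python
import math

# Hypothetical counts for two wetlands with completely different species.
wetland_a = {"cattail": 40, "bulrush": 40, "sedge": 20}
wetland_b = {"purple loosestrife": 40, "reed canary grass": 40, "duckweed": 20}

def shannon_index(counts):
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

h_a = shannon_index(wetland_a.values())
h_b = shannon_index(wetland_b.values())

print(h_a == h_b)                          # True: identical index values
print(set(wetland_a).isdisjoint(wetland_b))  # True: no species in common
```

The index only sees the proportions, so species identity never enters the calculation.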

11
Q

Pseudoreplication

A

Occurs when subjects are not independent of each other but you treat them as if they were (e.g., sampling the same individual more than once).

(BIOL 1105 Notes)
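
One simple way to avoid treating repeated measurements as independent is to collapse them to one value per individual before analysis. The sketch below assumes made-up fish IDs and lengths; mixed models with a random effect for individual are another option.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical repeated measurements: (individual ID, body length in cm).
measurements = [
    ("fish_1", 30.1), ("fish_1", 30.3), ("fish_1", 29.9),
    ("fish_2", 25.0),
    ("fish_3", 27.4), ("fish_3", 27.6),
]

# Group the repeated measurements by individual.
by_individual = defaultdict(list)
for fish_id, length in measurements:
    by_individual[fish_id].append(length)

# One value per individual: the independent unit is the fish, not the measurement.
individual_means = {fish_id: mean(values) for fish_id, values in by_individual.items()}

print(len(measurements))      # 6 measurements
print(len(individual_means))  # 3 independent individuals
```

Analysing the six rows as if n = 6 would overstate the sample size; the effective n here is 3.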

12
Q

How does pseudoreplication apply to telemetry?

A

The lack of independence between successive observations in telemetry data or in the derived behavior or fates of tagged fish can give rise to pseudo-replication if treated as independent observations in analyses. Failing to account for pseudo-replication can lead to incorrect conclusions in hypothesis testing frameworks as well as misinformed interpretations of the data.

(Brownscombe et al, 2019)

13
Q

Population (in statistics)

A

The group of ALL things we are interested in (e.g., all house cats).

(BIOL 1105 Notes)

14
Q

Sample

A

Subset of the population that we measure.

(BIOL 1105 Notes)

15
Q

What are the properties of a good random sample?

A

Every unit in the population has to have an equal chance of being included in the sample.

The units in the sample should be independent of each other: an observation of one individual should not provide any useful information about another individual in the sample.

(BIOL 1105 Notes)
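
A minimal sketch of drawing a simple random sample, assuming a complete list of population units exists (the `cat_N` IDs, the sample size, and the seed are hypothetical):

```python
import random

# Hypothetical sampling frame: an ID for every unit in the population.
population = [f"cat_{i}" for i in range(1, 501)]

rng = random.Random(42)  # fixed seed so the draw is repeatable
# Sampling without replacement: every unit has an equal chance of selection.
sample = rng.sample(population, k=30)

print(len(sample), len(set(sample)))  # 30 units, no duplicates
```

Equal selection probability is handled by the draw itself; independence still depends on how the units relate to each other in the field (for example, littermates are not independent).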

16
Q

What are the two major types of data?

A

Quantitative
and
Qualitative

(BIOL 1105 Notes)

17
Q

What is quantitative data? How is it broken down?

A

Numerical data.

It can be discrete or continuous.

(BIOL 1105 Notes)

18
Q

Discrete Data

A

Numerical data that includes integer values only (e.g., # of matings, # of species).

(BIOL 1105 Notes)

19
Q

Continuous Data

A

Numerical data that can take any real value, including decimals (e.g., length, mass).

(BIOL 1105 Notes)

20
Q

What is qualitative data? How is it broken down?

A

Categorical data, i.e., data that is subdivided into categories.

It can be nominal or ordinal.

(BIOL 1105 Notes)

21
Q

Nominal Data

A

Categorical data that has no inherent order (e.g., sex, hair color).

(BIOL 1105 Notes)

22
Q

Ordinal Data

A

Categorical data that has a natural order (e.g., rank, life history stage).

(BIOL 1105 Notes)

23
Q

Replication

A

Repeating a measurement.

The number of “subjects”, “objects”, or “individuals” sampled; how the procedure was repeated.

Each of the repetitions is called a replicate.

(BIOL 1105 Notes)

24
Q

Biological Definition of Replicates

A

An exact copy of a sample that is being analyzed, such as a cell, organism or molecule, on which exactly the same procedure is done. This is often done in order to check for experimental or procedural error. In the absence of error replicates should yield the same result. However, replicates are not independent tests of the hypothesis because they are still the same sample, and so do not test for variation between samples.

(Wikipedia)

25
Q

Statistics

A

A method for describing and measuring aspects of nature from samples.

We need it whenever the features we are trying to study are noisy/variable/unpredictable.

Statistical methods allow us to quantify uncertainty (error) in our estimates.

(BIOL 1105 Notes)

26
Q

Descriptive Statistics

A

Numbers that capture important features of a sample (prior to testing).

Summarizes details of the sample (e.g., sample size, average tail length).

(BIOL 1105 Notes)

27
Q

Inferential Statistics

A

Numbers that capture important features of the population after conducting hypothesis testing.

Used to determine how well our observed data fits with a particular hypothesis/null hypothesis.

(BIOL 1105 Notes)

28
Q

Response Variable

A

Aka dependent or outcome variable.

The outcome we are interested in - effect.

(BIOL 1105 Notes)

29
Q

Predictor Variable

A

Aka independent or explanatory variable.

The thing(s) that we hypothesize to be affecting the outcome - cause.

(BIOL 1105 Notes)

30
Q

Confounding Variable

A

An ‘extra’ variable that you did not account for and that influences the variable you are investigating.

(BIOL 1105 Notes)

31
Q

Hypothesis Testing

A

Compares a dataset to the expectation derived from a specific null hypothesis. If the data are too unusual under the assumption that the null hypothesis is true, then we reject the null hypothesis.

(BIOL 1105 Notes)

32
Q

Null Hypothesis

A

A statement about a population parameter that negates our research hypothesis.

i.e., there’s no effect or relationship

(BIOL 1105 Notes)

33
Q

P-Value

A

The probability of getting a result at least as extreme as the one we actually observed, assuming the null hypothesis is true.

If p < 0.05, there is less than a 5% chance of obtaining results at least as extreme as ours if the null hypothesis were true, so we reject the null hypothesis.

(BIOL 1105 Notes)
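
As a hedged illustration, the sketch below computes an exact two-sided p-value for a coin-flip experiment. The data (16 heads in 20 flips) and the function name `binomial_p_value` are hypothetical, and "as extreme" is taken here to mean "no more likely than the observed outcome under the null", which is one common convention.

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """Two-sided exact p-value: total probability of all outcomes that are
    no more likely than the observed one under the null hypothesis."""
    def prob(k):
        return comb(trials, k) * p_null**k * (1 - p_null)**(trials - k)
    observed = prob(successes)
    # Small tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(trials + 1)
               if prob(k) <= observed * (1 + 1e-9))

# Hypothetical data: 16 heads in 20 flips; null hypothesis: the coin is fair.
p = binomial_p_value(16, 20)
print(p)  # about 0.012, below 0.05, so we would reject the null
```

With 10 heads in 20 flips the same function returns 1, because every possible outcome is at least as extreme as the observed one.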

34
Q

Confidence Intervals

A

Provide a measure of uncertainty on an estimate by indicating the plausible range in which we can expect the true value of the parameter to lie.

e.g., if we repeated the sampling procedure many times and computed a 95% CI each time, about 95% of those intervals would capture the true value.

(BIOL 1105 Notes)
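
The repeated-sampling interpretation can be checked with a small simulation. This is a sketch under stated assumptions: normally distributed data, a known standard deviation (so a z-interval is used), and made-up values for the true mean, sample size, and number of repeats.

```python
import random
from statistics import NormalDist, mean

rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_MEAN, SIGMA, N, REPEATS = 50.0, 10.0, 25, 2000  # hypothetical values
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

captured = 0
for _ in range(REPEATS):
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    half_width = z * SIGMA / N**0.5  # known-sigma interval, for simplicity
    centre = mean(sample)
    if centre - half_width <= TRUE_MEAN <= centre + half_width:
        captured += 1

coverage = captured / REPEATS
print(coverage)  # close to 0.95
```

Any single interval either contains the true mean or does not; the 95% describes the long-run behaviour of the procedure.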

35
Q

Type I Errors

A

Aka false positives.

Occur when we reject the null when it’s actually true.

For example, in telemetry this would be detecting the presence of an animal when it was actually absent.

(Adams et al, 2012; BIOL 1105 Notes)

36
Q

Type II Errors

A

Aka false negatives.

Occur when we do not reject the null when it’s actually false.

For example, in telemetry this would be not detecting the presence of an animal when it was actually there.

(Adams et al, 2012; BIOL 1105 Notes)

37
Q

Power

A

The probability that a study will correctly reject a false null hypothesis.

(BIOL 1105 Notes)

38
Q

Are Type I or Type II errors worse for conservation? What about for my research?

A

Statistically, we normally want fewer Type II errors.

A study with a low Type II error rate is said to have high power, which is what we want.

If the power is too low, there is little chance of finding a significant difference even when a real difference exists.

But it really needs to be looked at in the practical sense.

For example, in conservation, a false positive (Type I error) could lead you to say that an action is needed to protect a species even though it would not actually work, so you could spend a lot of money for nothing. With a false negative (Type II error), you could fail to take an action that would have saved the species. So I think spending money for nothing is the better outcome, and we still want to limit Type II errors.

And I think the same applies to my research.

(BIOL 1105 Notes; Brown et al, 2012)

39
Q

What does statistical power depend on?

A

Alpha level.

Sample size.

The magnitude of the effect/difference we are studying.

The variability (spread) in the data.

The test we are using.

(BIOL 1105 Notes)
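
To see how these factors move power, here is a minimal sketch for one specific case: a two-sided, one-sample z-test with known standard deviation. The test choice, the function name `z_test_power`, and all numbers are assumptions for illustration.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * n**0.5 / sigma  # true effect in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

print(z_test_power(effect=2, sigma=5, n=20))              # baseline
print(z_test_power(effect=2, sigma=5, n=80))              # larger sample: higher power
print(z_test_power(effect=4, sigma=5, n=20))              # larger effect: higher power
print(z_test_power(effect=2, sigma=10, n=20))             # more variability: lower power
print(z_test_power(effect=2, sigma=5, n=20, alpha=0.01))  # stricter alpha: lower power
```

When the true effect is zero, the function returns alpha, which is the Type I error rate.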

40
Q

What’s the best way to increase statistical power?

A

Use a larger sample size.

In an observational study, where we can’t control other factors, this is also the only way, which applies to my research.

(BIOL 1105 Notes)

41
Q

What does a high statistical power indicate?

A

A really high power means that we’d virtually always correctly reject a false null hypothesis, i.e., it means we can more easily detect what we’re looking for.

(BIOL 1105 Notes)

42
Q

Power Analysis

A

In a power analysis, the objective is to estimate the sample size needed to detect an effect (i.e. departure from the null hypothesis) with a reasonable level of power while allowing for a margin of error.

(Brown et al, 2012)
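
A rough sketch of such a calculation, assuming a two-sided, one-sample z-test with known standard deviation and the usual normal-approximation formula n = ((z_alpha + z_power) * sigma / effect)^2. The function name and input values are hypothetical, and other tests need other formulas or simulation.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(effect, sigma, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample z-test with known sigma."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    # Round up, because a fractional subject cannot be sampled.
    return ceil(((z_alpha + z_power) * sigma / effect) ** 2)

print(required_sample_size(effect=0.5, sigma=1.0))   # 32
print(required_sample_size(effect=0.25, sigma=1.0))  # 126
```

Halving the effect size roughly quadruples the required sample, which is why small effects are expensive to detect.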

43
Q

Accuracy

A

How close a measurement is to the true value.

(Zar, 2010)

44
Q

Precision

A

How close repeated measurements are to each other.

(Zar, 2010)
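
The contrast between accuracy and precision can be sketched with two made-up sets of readings of a known mass (the scales, readings, and true value are all hypothetical):

```python
from statistics import mean, stdev

TRUE_VALUE = 100.0  # hypothetical known mass in grams

# Hypothetical repeated readings from two scales.
scale_a = [104.9, 105.1, 105.0, 104.8, 105.2]  # tightly clustered but offset
scale_b = [97.0, 103.5, 99.0, 101.5, 99.0]     # centred on the truth but scattered

for name, readings in (("A", scale_a), ("B", scale_b)):
    bias = mean(readings) - TRUE_VALUE  # accuracy: distance from the true value
    spread = stdev(readings)            # precision: agreement among readings
    print(name, round(bias, 2), round(spread, 2))
```

Scale A is precise but not accurate; scale B is accurate on average but not precise.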

45
Q

Effect Size

A

A value measuring the strength of the relationship between two variables in a population, or a sample-based estimate of that quantity.

Examples of effect sizes include the correlation between two variables, the regression coefficient in a regression, the mean difference, or the risk of a particular event (such as a heart attack) happening.

(Wikipedia)

46
Q

Margin of Error

A

Expressed as +/- percentage points, margin of error tells you to what degree your research results may differ from the real-world results, revealing how different – more and less – the stated percentage may be from reality.

A smaller margin of error is better as it suggests the results are more precise.

https://www.qualtrics.com/experience-management/research/margin-of-error/
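
For a survey percentage, the margin of error is commonly approximated as z * sqrt(p * (1 - p) / n). The sketch below assumes a simple random sample and the normal approximation; the function name and the survey numbers are hypothetical.

```python
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Margin of error for a sample proportion (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * (p_hat * (1 - p_hat) / n) ** 0.5

# Hypothetical survey: 50% of 1,000 respondents agree with a statement.
print(round(100 * margin_of_error(0.5, 1000), 1))  # 3.1 percentage points
```

Quadrupling the sample size halves the margin of error, because n sits under a square root.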

47
Q

Collinearity and Multicollinearity

How do you address?

A

Collinearity is when two predictor variables are correlated.

When this happens, these variables cannot independently predict the response variable.

Multicollinearity is when more than two are correlated.

To address it, check for correlated predictors during analysis; you may have to keep only one of them when performing hypothesis tests.

https://www.britannica.com/topic/collinearity-statistics
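
A minimal sketch of one way to check for collinearity: compute the pairwise Pearson correlation between predictors. The predictor names and values are made up, and the variance inflation factor line uses the two-predictor shortcut 1 / (1 - r^2), which is an added illustration rather than something from the source above.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical predictors measured at six sites.
water_temp = [12.0, 14.5, 16.0, 18.5, 20.0, 22.5]
air_temp = [10.5, 13.0, 15.5, 17.0, 19.5, 21.0]
depth = [3.0, 1.5, 4.0, 2.0, 3.5, 2.5]

r = pearson_r(water_temp, air_temp)
vif = 1 / (1 - r**2)  # variance inflation factor when there are only two predictors

print(round(r, 2))                             # strongly correlated pair
print(round(pearson_r(water_temp, depth), 2))  # weakly correlated pair
```

Here water and air temperature carry nearly the same information, so keeping only one of them would be a reasonable choice.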

48
Q

Interactions (in Statistics)

A

The effect of one causal variable on an outcome depends on the state of a second causal variable (that is, when effects of the two causes are not additive).

(Wikipedia)

49
Q

Random Effects

A

Factors that vary randomly across individuals or groups and affect the response variable.

e.g., receiver location, individual differences, sampling year

The most familiar types of random effect are the blocks in experiments or observational studies that are replicated across sites or times. Random effects also encompass variation among individuals (when multiple responses are measured per individual, such as survival of multiple offspring or sex ratios of multiple broods), genotypes, species and regions or time periods.

(Bolker et al, 2009; Whoriskey et al, 2019)

50
Q

Why did I decide to use generalized linear mixed models (GLMMs)?

A

They take random effects into account, which prevents pseudoreplication.

For example, telemetry data are usually collected on a random subset of individuals from a population.

To conduct population‐level inference, individual ID, space, time, and receiver location can be included in the model.

(Whoriskey et al, 2019)

51
Q

Generalized Linear Mixed Models (GLMMs)

A

Models that combine the properties of two statistical frameworks that are widely used in ecology and evolution (EE): linear mixed models (which incorporate random effects) and generalized linear models (which handle nonnormal data by using link functions and exponential family [e.g. normal, Poisson or binomial] distributions).

GLMMs are the best tool for analyzing nonnormal data that involve random effects.

(Bolker et al, 2009)

52
Q

Cronbach’s alpha

A

Cronbach’s alpha is a number between 0 and 1 that measures internal consistency reliability of Likert scales.

Zero indicates low internal consistency reliability and 1 indicates high internal consistency reliability.

Internal consistency reliability is how well a group of questions measure the same construct.

In general, a good Cronbach’s alpha is between 0.75 and 0.90.

Cronbach’s alpha is affected by the number of questions: more questions produce a higher value. Therefore, a low Cronbach’s alpha may simply mean that more questions are needed rather than that reliability is poor. If Cronbach’s alpha is above 0.90, the survey likely has redundant questions that can be removed.

(Tavakol, 2011)
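
As a sketch of how the value is computed, the standard formula is alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of questions. The function name and the responses below are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per question, all in the same respondent order."""
    k = len(items)
    item_variances = sum(variance(scores) for scores in items)
    totals = [sum(answers) for answers in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - item_variances / variance(totals))

# Hypothetical responses from five people to three Likert questions.
q1 = [4, 5, 3, 2, 4]
q2 = [4, 4, 3, 2, 5]
q3 = [5, 5, 2, 1, 4]

alpha = cronbach_alpha([q1, q2, q3])
print(round(alpha, 2))  # 0.92
```

If every question received identical answers from each respondent, the formula would return exactly 1.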

53
Q

Ordinal Logistic Regression

A

Ordinal logistic regression is a statistical analysis method that can be used to model the relationship between an ordinal response variable and one or more explanatory variables (which can be discrete, continuous, or ordinal).

This will be used for the social study because Likert responses are considered ordinal (they are categorical and have a natural order) and because I’m looking at the impacts of various responses on pro-environmental behaviour.

https://cscu.cornell.edu/wp-content/uploads/91_ordlogistic.pdf

54
Q

What are the assumptions of ordinal logistic regression?

A

The dependent variable is measured on an ordinal level.

One or more of the independent variables are either continuous, categorical or ordinal.

No multicollinearity - i.e. no two or more independent variables are highly correlated with each other.

Proportional Odds - i.e. that each independent variable has an identical effect at each cumulative split of the ordinal dependent variable.

https://www.st-andrews.ac.uk/media/ceed/students/mathssupport/ordinal%20logistic%20regression.pdf

55
Q

Thematic Analysis

A

One of the most common forms of analysis within qualitative research.

It emphasizes identifying, analysing and interpreting patterns of meaning (or “themes”) within qualitative data.

(Wikipedia)

56
Q

Home Range Analysis and Kernel Density

A

Home range analysis looks at the area an animal uses for the majority of its activities.

Kernel density is one method to evaluate home range. It determines the probability of finding an animal at any one spot.

(Calenge, 2023)
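
A bare-bones sketch of the idea behind kernel density estimation, assuming a bivariate Gaussian kernel with a single fixed bandwidth. The relocation coordinates and the bandwidth of 5 m are made up; dedicated home range software chooses the bandwidth more carefully and also derives range contours.

```python
import math

def kernel_density(x, y, locations, bandwidth):
    """Bivariate Gaussian kernel density estimate at point (x, y)."""
    norm = 1 / (2 * math.pi * bandwidth**2 * len(locations))
    return norm * sum(
        math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / (2 * bandwidth**2))
        for lx, ly in locations
    )

# Hypothetical relocations (x, y in metres), mostly clustered around (100, 100).
locations = [(98, 101), (102, 99), (100, 103), (97, 98), (103, 102), (140, 60)]

core = kernel_density(100, 100, locations, bandwidth=5)
edge = kernel_density(140, 100, locations, bandwidth=5)

print(core > edge)  # True: the animal is far more likely to be found near the cluster
```

Each relocation contributes a small "hill" of probability, and the estimate at any spot is the average height of those hills there.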

57
Q

What are the benefits and disadvantages of using Likert scales?

A

Benefits:
- Easy to implement
- More standardized
- Easier to quantify
- Makes questions easier to answer for respondent

Disadvantages:
- If a respondent’s real choice isn’t listed, they’re forced to choose another
- Subject to bias

58
Q

How do you code survey responses?

A

Each response on a Likert scale is assigned a number in a defined way (e.g., 5 = strongly agree = more positive, 0 = strongly disagree = more negative); these numbers are then used to calculate overall scores.

Open-ended questions are assigned theme codes then summarized.
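
A minimal sketch of coding Likert responses, assuming a five-point scale coded 1 to 5 (the labels, the numbers, and the function name `score_respondent` are hypothetical; any consistent scheme works):

```python
# Hypothetical coding scheme; the labels and numbers are assumptions.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def score_respondent(responses):
    """Convert one respondent's answers to numbers and sum them."""
    return sum(LIKERT_CODES[answer.strip().lower()] for answer in responses)

answers = ["Agree", "Strongly agree", "Neutral"]
print(score_respondent(answers))  # 4 + 5 + 3 = 12
```

An answer that is not in the coding scheme raises a `KeyError`, which helps catch typos in the raw data before scores are calculated.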