Statistics Flashcards

1
Q

What is a t-test and why would you use it?

A

A t-test is a statistical test used to compare the means of two groups.

It is used to determine whether the difference between the group means is statistically significant or could plausibly be explained by random sampling variation.
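
As a rough illustration, the calculation behind an independent two-sample t-test can be sketched in Python with the standard library. The function name `pooled_t_statistic` and the scores are invented for this example, and the sketch assumes the two groups have similar variances (the classic Student's t-test).

```python
import math
import statistics

def pooled_t_statistic(group_a, group_b):
    """Return (t, df) for an independent two-sample t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

    # Degrees of freedom for two independent samples: n_a + n_b - 2
    df = n_a + n_b - 2

    # Pooled variance weights each sample variance by its own degrees of freedom
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df

    # Standard error of the difference between the two means
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Hypothetical scores for two groups
method_a = [1, 2, 3, 4, 5]
method_b = [2, 3, 4, 5, 6]
t_value, df = pooled_t_statistic(method_a, method_b)
print(t_value, df)
```

The t-value is then compared with the critical value for the matching degrees of freedom to decide whether the difference is significant.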

2
Q

What is a P-Value?

A

The probability of obtaining results at least as extreme as those observed in your sample, assuming the null hypothesis is true.
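
One way to make this concrete is an exact permutation test, sketched below. It is not the method the card describes, only an illustration: if the null hypothesis holds, group labels are interchangeable, so the p-value is the share of relabellings that give a difference at least as large as the one observed. The function name and data are hypothetical.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    extreme = 0
    count = 0
    # Under the null hypothesis every way of picking n_a values
    # for group A is equally likely.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count

# Hypothetical data: only 2 of the 20 relabellings are as extreme as observed
p_value = permutation_p_value([1, 2, 3], [4, 5, 6])
print(p_value)  # 0.1
```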

3
Q

What P-value should you use and why?

A

The p-value threshold (significance level) of 0.05 is widely used by convention because it represents a practical balance between the risks of Type I and Type II errors.

It is essential because it helps you make informed decisions about your hypotheses based on the significance of your results.

A p-value above 0.05 means the data do not provide enough evidence to reject the null hypothesis. It is not the probability that the null hypothesis is true.

4
Q

What is the T-value?

A

The t-value is a statistic used in hypothesis testing to measure the difference between the observed sample mean and the expected population mean, accounting for the variation in the sample data.

A higher magnitude of the t-value indicates stronger evidence against the null hypothesis (H0), suggesting that the observed difference is more significant and less likely to be due to random chance.

Therefore, the greater the t-value, the more compelling the argument that the null hypothesis may not be true.

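The critical t-value depends on the degrees of freedom and on whether the test is one- or two-tailed. For example, for a one-tailed test at a significance level of 0.05 with 7 degrees of freedom, the critical value is 1.895, so a t-value above this leads us to reject the null hypothesis.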

5
Q

You complete a t-test comparing two variables and your t-value is 0.75. What does this mean, and why?

A

This t-value is below the critical t-value of 1.895, so there is no significant difference between the groups. Because 1.895 is the critical value at a significance level of 0.05, any t-value below it corresponds to a p-value greater than 0.05, so we fail to reject the null hypothesis.
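
The decision rule can be written as a short Python sketch. The helper `is_significant` is hypothetical, and the threshold 1.895 is simply the critical value quoted in these cards. In practice the critical value has to be looked up for the actual degrees of freedom and test type.

```python
def is_significant(t_value, t_critical):
    """Return True when the magnitude of t exceeds the critical value."""
    return abs(t_value) > t_critical

# Critical value quoted in these cards (assumed to apply to both examples)
T_CRITICAL = 1.895

for t in (0.75, 2.8):
    verdict = "reject H0" if is_significant(t, T_CRITICAL) else "fail to reject H0"
    print(f"t = {t}: {verdict}")
```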

6
Q

If a T-test comparing test scores between two study methods gives a T value of 2.8, is there a significant difference? Why?

A

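Yes, a t-value of 2.8 suggests a significant difference. It exceeds the critical value of 1.895 (the exact critical value depends on the degrees of freedom), indicating that the difference between the group means is large relative to the within-group variation, so the null hypothesis is rejected at the 0.05 level.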

7
Q

How do you calculate degrees of freedom?

A

df = N1 + N2 - 2

N1 and N2 = the sample sizes of the two groups

8
Q

What is degrees of freedom?

A

Degrees of freedom (df) are the number of values in a calculation that are free to vary.

It is the flexibility you have when working with a dataset.

For a single sample, this is often defined as sample size - 1 (n - 1).

More df (a larger sample) means a more reliable estimate of population parameters, as there’s more data to reduce uncertainty.

Fewer df (a smaller sample) means higher uncertainty in estimates, requiring adjustments to the t-distribution to account for this.

For two independent samples, the formula is:

df = (n1 - 1) + (n2 - 1) = n1 + n2 - 2

where n1 and n2 are the sample sizes of the two groups.

9
Q

What is a paired t test?

A

A paired T-test compares the means of two related groups, such as measurements from the same subjects at two different times or conditions, to determine if there is a significant difference.

It calculates the mean of the differences and tests if this mean is significantly different from zero.
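
A minimal sketch of that calculation, using only Python's standard library. The function name `paired_t_statistic` and the before/after measurements are made up for illustration.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, df) for a paired t-test on matched measurements."""
    # Work with the per-subject differences rather than the raw values
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)

    # Test whether the mean difference is far from zero relative to its standard error
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical measurements on the same four subjects
before = [10, 12, 9, 11]
after = [12, 13, 12, 13]
t_value, df = paired_t_statistic(before, after)
print(round(t_value, 3), df)
```

Because the test works on the differences, the degrees of freedom are the number of pairs minus one.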

10
Q

What is a chi-squared test?

A

The chi-squared test is a statistical test used to see if there’s a meaningful difference between what we expect to find in data and what we actually observe. It’s particularly useful when working with data divided into categories (like types of fruits, survey answers, etc.).

It is used when you have categorical data and want to see if the differences between groups are just by chance or if they’re statistically significant.
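
As an illustration, the chi-squared statistic for a contingency table can be computed by hand as below. This is a sketch with invented counts, and `chi_squared_statistic` is a hypothetical helper. The expected counts assume the row and column variables are independent.

```python
def chi_squared_statistic(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2x2 table of counts (rows: group, columns: outcome)
observed_counts = [[30, 20],
                   [20, 30]]
chi2, df = chi_squared_statistic(observed_counts)
print(chi2, df)
```

With 1 degree of freedom the critical value at the 0.05 level is 3.841, so a statistic above that would be judged significant.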

11
Q

When should you do a two-tailed test?

A

A two-tailed test is used to detect differences in both directions (positive or negative) between groups or conditions.

It tests for any significant difference, not just an increase or decrease, making it more flexible and applicable when there’s no specific prediction about the direction of the effect.
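
The cost of that flexibility is a stricter threshold in each tail. The sketch below shows this with the standard normal distribution, used here in place of the t-distribution purely to keep the example in the standard library. `z_critical` is a hypothetical helper.

```python
from statistics import NormalDist

def z_critical(alpha=0.05, two_tailed=True):
    """Critical z-value for a given significance level."""
    # A two-tailed test splits alpha equally between both tails
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

one_tailed = z_critical(two_tailed=False)  # about 1.645
two_tailed = z_critical(two_tailed=True)   # about 1.960
print(round(one_tailed, 3), round(two_tailed, 3))
```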

12
Q

What is beta in power calculations?

A

Beta (β) is the probability of making a Type II error (a false negative), which is failing to reject the null hypothesis when there is a true effect.

It represents the chance of missing a real difference. Power is defined as 1 - β, so a lower beta means higher test sensitivity.

13
Q

What is alpha in power calculation?

A

Alpha (α) is the probability of making a Type I error, which occurs when the null hypothesis is incorrectly rejected (false positive).

It represents the significance level of a test, commonly set at 0.05, meaning there’s a 5% chance of finding a significant result by chance when the null hypothesis is actually true.

14
Q

When would you use ANOVA?

A

ANOVA (Analysis of Variance) is used when comparing the means of three or more groups to determine if at least one group mean is significantly different from the others.

It’s helpful in experiments with multiple groups or conditions, testing for overall differences rather than just pairwise comparisons.

15
Q

What is quantitative data?

A

Quantitative data is numerical information that represents measurable quantities.

It includes data that can be counted or measured, such as height, weight, age, or test scores, and is typically analysed with statistical methods to identify patterns or relationships.

16
Q

What is qualitative data?

A

Qualitative data is non-numerical information that describes characteristics, qualities, or categories.

It includes data like interview responses, observations, and descriptions, often used to understand concepts, opinions, or experiences.

17
Q

What is an F value?

A

An F value is a statistic calculated in ANOVA tests, comparing the variance between group means to the variance within groups.

A higher F value indicates stronger evidence that at least one group mean differs from the others.

It’s used to assess whether the observed variation is due to group differences or random chance.
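
A small sketch of a one-way ANOVA F calculation, assuming three hypothetical groups. The function name `one_way_f` is illustrative and not taken from any library.

```python
import statistics

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical measurements for three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_between, df_within = one_way_f(groups)
print(round(f_value, 3), df_between, df_within)
```

The F value is the ratio of the between-group mean square to the within-group mean square, so it grows as the group means spread apart relative to the noise inside each group.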

18
Q

What is effect size?

A

Effect size is a quantitative measure that reflects the magnitude of a treatment or intervention’s effect in a statistical analysis.

It provides a standardised way to compare results across different studies or experiments, allowing researchers to understand the practical significance of their findings beyond just statistical significance.

19
Q

What is a Type I error?

A

A Type I error occurs when the null hypothesis is incorrectly rejected, meaning a false positive result.

It suggests finding a significant effect when there is none.

The probability of a Type I error is represented by alpha (α), often set at 0.05, indicating a 5% chance of mistakenly declaring a result significant when the null hypothesis is actually true.
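
A quick simulation can illustrate this. The sketch below repeatedly draws samples from a population where the null hypothesis is true (mean 0, known standard deviation 1) and counts how often a two-tailed z-test at α = 0.05 rejects anyway. The function name, seed, and simulation settings are arbitrary choices for the example.

```python
import math
import random
from statistics import NormalDist, mean

def simulate_type_i_rate(n_experiments=2000, n=20, alpha=0.05, seed=1):
    """Share of experiments that wrongly reject a true null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = 0
    for _ in range(n_experiments):
        # The null hypothesis is true here: the population mean really is 0
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = mean(sample) * math.sqrt(n)  # z-statistic with known sigma = 1
        if abs(z) > z_crit:
            false_positives += 1
    return false_positives / n_experiments

rate = simulate_type_i_rate()
print(rate)  # should land close to alpha
```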

20
Q

What is a power calculation, and what does it tell you?

A

A power calculation determines the likelihood that a test will detect a true effect if one exists, quantified as statistical power.

It calculates the probability of avoiding a Type II error (false negative).

Power is affected by sample size, effect size, significance level (alpha), and variability.

Higher power (typically 80% or above) indicates a stronger ability to detect meaningful differences, ensuring that the test is sensitive enough to find real effects.

21
Q

Why do we do statistical analysis?

A

Statistical analysis is conducted to interpret data objectively and identify patterns or relationships.

It helps test hypotheses, assess the significance of results, and calculate power to ensure the analysis is sensitive enough to detect true effects.

This process controls for chance, provides reliable conclusions, and supports generalizable findings.

22
Q

What does an a priori power calculation measure, and why would you use this?

A

An a priori power calculation estimates the sample size needed to achieve sufficient power (typically 80% or higher) before conducting a study. It ensures the study is designed with enough participants to detect a true effect if one exists, reducing the risk of a Type II error (false negative). This calculation is essential for planning reliable, efficient studies that are statistically valid.
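
As a rough sketch, the sample size per group for comparing two means can be approximated with the normal distribution. Dedicated power software uses the t-distribution and may give slightly larger numbers. The function name and default values below are assumptions for illustration.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-tailed, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # threshold for the significance level
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

n_medium = sample_size_per_group(0.5)  # medium effect (Cohen's d = 0.5)
n_large = sample_size_per_group(0.8)   # large effect (Cohen's d = 0.8)
print(n_medium, n_large)
```

Smaller expected effects need many more participants, because the required sample size grows with the inverse square of the effect size.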

23
Q

What does a post-hoc power calculation measure, and why would you use this?

A

A post-hoc power calculation measures the power of a study after data collection and analysis to determine the likelihood that the study could detect an effect if one exists. It’s used to assess whether a non-significant result might be due to insufficient power or a true absence of effect, helping to evaluate the robustness and reliability of study findings retrospectively.

24
Q

What is a manipulative experiment?

A

A manipulative experiment is a study in which researchers actively alter or manipulate one or more independent variables to observe the effects on a dependent variable. This type of experiment allows for testing causality, as it controls other variables to isolate the impact of the manipulated factor(s). It’s commonly used in laboratory and field research to establish cause-and-effect relationships.

25
Q

What is an observational experiment?

A

An observational experiment is a study in which researchers observe and record data without manipulating any variables. Instead of altering conditions, they analyze natural variations in the variables to identify associations or correlations. Observational studies are often used when manipulation isn’t ethical or feasible, such as in epidemiology, to study patterns, relationships, and trends in real-world settings.

26
Q

What does methodology mean?

A

Methodology refers to the systematic approach and set of methods used to conduct research. It includes the design, data collection techniques, and analysis strategies employed to answer research questions. Methodology ensures that the research is structured, replicable, and suited to achieving valid, reliable results.

27
Q

What is a hypothesis?

A

A hypothesis is a testable statement or prediction about the relationship between two or more variables. It provides a basis for experimentation, guiding the research process by proposing an expected outcome. Hypotheses are often formulated as either a null hypothesis (no effect or relationship) or an alternative hypothesis (indicating an effect or relationship) to be tested and evaluated through data analysis.

28
Q

What are the factors affecting statistical power?

A

Statistical power is influenced by:

  1. Sample Size (N): Larger samples increase power by reducing the margin of error.
  2. Effect Size (d): Larger effects are easier to detect, increasing power.
  3. Significance Level (α): A higher α (e.g., 0.05 vs. 0.01) increases power but also raises the risk of a Type I error.
  4. Variance: Lower variability in data increases power, as differences are easier to detect.
  5. Test Type (One-tailed vs. Two-tailed): One-tailed tests have higher power to detect effects in a specific direction compared to two-tailed tests.
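
The factors above can be explored with a small sketch that approximates power for a two-sample comparison using the normal distribution. `approx_power` is a hypothetical helper. Variance enters through the standardised effect size d, and the negligible chance of rejecting in the wrong tail is ignored.

```python
import math
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05, two_tailed=True):
    """Approximate power to detect standardised effect d with n per group."""
    z = NormalDist()
    tail_area = alpha / 2 if two_tailed else alpha
    z_crit = z.inv_cdf(1 - tail_area)
    # Expected size of the test statistic when the true effect is d
    expected_z = d * math.sqrt(n_per_group / 2)
    return z.cdf(expected_z - z_crit)

baseline = approx_power(0.5, 64)
print(round(baseline, 2))
print(approx_power(0.5, 100) > baseline)                 # larger sample
print(approx_power(0.8, 64) > baseline)                  # larger effect
print(approx_power(0.5, 64, alpha=0.01) < baseline)      # stricter alpha
print(approx_power(0.5, 64, two_tailed=False) > baseline)  # one-tailed test
```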
29
Q

What are the four possible outcomes of hypotheses?

A

The four outcomes of hypothesis testing are:

True Positive: Correctly rejecting the null hypothesis when there is a true effect (no error).
False Positive (Type I Error): Incorrectly rejecting the null hypothesis when there is no true effect.
True Negative: Correctly failing to reject the null hypothesis when there is no true effect (no error).
False Negative (Type II Error): Failing to reject the null hypothesis when there is a true effect.

30
Q

When is sensitivity power analysis done?

A

Sensitivity power analysis is performed after data collection to determine the minimum effect size that a study could detect with the given sample size, power level, and significance level. It’s used when researchers want to assess the study’s sensitivity to detect an effect, especially if the results were non-significant. This helps in interpreting whether the study was adequately powered to detect meaningful effects.
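
A minimal sketch of a sensitivity calculation for a two-group comparison, again using a normal approximation. The function name and the sample size of 50 per group are illustrative assumptions.

```python
import math
from statistics import NormalDist

def minimum_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardised effect (Cohen's d) detectable with the given design."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) * math.sqrt(2 / n_per_group)

# Hypothetical study with 50 participants per group
d_min = minimum_detectable_effect(50)
print(round(d_min, 2))
```

If the smallest effect of practical interest is below this value, a non-significant result says little about whether that effect exists.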

31
Q

How do you calculate the effect size?

A

Effect size measures the strength of the result. It is magnitude-based and does not depend on sample size.

Cohen’s d (for comparing two means):

d = (Mean1 - Mean2) / PooledStandardDeviation

Pearson’s r (for correlation): Measures the strength of association between two continuous variables.

Effect size helps interpret the impact of results, with larger values indicating stronger effects.
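
Both measures can be sketched in a few lines of Python. The helpers `cohens_d` and `pearson_r` and the data are hypothetical. Cohen's d here divides the difference in means by the pooled standard deviation.

```python
import math
import statistics

def cohens_d(group_1, group_2):
    """Standardised difference between two group means."""
    n_1, n_2 = len(group_1), len(group_2)
    var_1, var_2 = statistics.variance(group_1), statistics.variance(group_2)
    pooled_sd = math.sqrt(((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / (n_1 + n_2 - 2))
    return (statistics.mean(group_1) - statistics.mean(group_2)) / pooled_sd

def pearson_r(x, y):
    """Strength of linear association between two continuous variables."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(ss_x * ss_y)

# Hypothetical data
d = cohens_d([2, 3, 4, 5, 6], [1, 2, 3, 4, 5])
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(round(d, 3), round(r, 3))
```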

32
Q

A researcher has identified molecularly-defined subgroups of a brain tumour. They want to investigate whether there are differences in clinical features between subgroups. They collect a cohort of 100 patients, and record their molecular subgroup and whether or not their tumour was metastatic (i.e. had spread beyond the primary tumour site).

What test would be appropriate to investigate an association between molecular subgroup and metastatic status? (4 marks) What are the null and alternative hypotheses for this experiment? (2 marks)

Working at a 5% significance level, the researcher performs the appropriate test and gets a p-value of 0.049. Which hypothesis do you accept? Describe in 1-2 sentences a follow-up experiment to further investigate the molecular subgroups. (4 marks)

A

