Final (stats) Flashcards

1
Q

Table

A

used to present many numerical values

2
Q

Figure

A

used to show patterns, trends, or relationships

3
Q

Qualities of a good table

A

Should be understandable on its own
Includes appropriate title in proper location
Logical format
Justified numbers → decimal points line up
Good / consistent spacing
Legend

4
Q

Qualities of a good figure

A

Understandable on its own
Axes labels (with units)
Appropriate scaling of axis
Symbols
Customized (not the excel default)
No need for box borders around graph
Trendline should be thicker and clear

5
Q

Figure legends (caption)

A

The key to understanding a figure
A good figure legend includes:
Title
Materials and methods (description of techniques used)
Results (further explanation of the data)
Definitions (of symbols, patterns, lines, abbreviations, etc).

6
Q

Monty Hall Problem

A

You pick one of three doors, so you have a 1/3 chance of initially choosing the door with the prize behind it. The host, who knows where the prize is, then opens one of the other doors to reveal no prize. Your original door still has a 1/3 chance, so the one remaining unopened door carries the other 2/3. By switching doors, you capitalize on the new information and increase your chances of winning to 2/3.
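
Simulation sketch (not part of the original card): the 1/3 vs. 2/3 split can be checked by playing the game many times. The function names, trial count, and seed below are illustrative assumptions.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round and return True if the player wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one door that is still closed and was not picked.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch: bool, trials: int = 20_000, seed: int = 0) -> float:
    """Estimate the probability of winning over many simulated rounds."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}  switch: {switch_rate:.3f}")
```

The staying rate should land near 1/3 and the switching rate near 2/3; exact values vary with the seed.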

7
Q

Probability

A

The degree of certainty or chance that something will happen.

8
Q

Statistics

A

Help us…
Reduce and describe data
Quantify relationships among data
Determine if sets of data are similar / different

9
Q

Goals of a data analysis

A

Data reduction (and description)
Reduce measures to make them more meaningful
Averages, spread, bar chart / plots / histograms (descriptive)
Easier and more meaningful to read than all the individual data.

10
Q

Establish relationships

A

Descriptive – describe relationship between two observations
Relationship between height and weight
Causal – did one thing cause the other

Intervention → caused some response

Inference
Infer outcome from sample to population
Is what we see in sample true in population

11
Q

Purpose of sampling

A

to approximate a larger population on characteristics relevant to the research question.

12
Q

Histograms

A

Graphical representations
Mainly represent frequency (# of subjects that fall into a range).
Measures of central tendency

13
Q

Mean

A

average
x̄ = ΣX / N

14
Q

Median

A

middle of distribution

15
Q

Mode

A

most frequently occurring value

16
Q

Range

A

difference between high and low values in a data set
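
Worked example (a sketch with made-up numbers, not from the original cards): the mean, median, mode, and range of one small data set, using Python's standard `statistics` module.

```python
import statistics

# Hypothetical data set, already sorted for readability.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)        # sum of values / number of values
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(data)        # most frequently occurring value
data_range = max(data) - min(data)  # high value minus low value

print(mean, median, mode, data_range)  # 6 5 4 9
```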

17
Q

Confidence interval

A

interval estimate of the population mean (using SEM)

18
Q

Standard Error of the Mean (equation)

A

standard deviation / √sample size
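
Worked example (a sketch with made-up numbers): computing the SEM from a sample, assuming the sample standard deviation (n − 1 in the denominator) is the one intended.

```python
import math
import statistics

# Hypothetical sample of 8 measurements.
sample = [4, 8, 6, 5, 3, 7, 9, 6]

sd = statistics.stdev(sample)        # sample standard deviation (n - 1)
sem = sd / math.sqrt(len(sample))    # standard deviation / sqrt(sample size)

print(f"SD = {sd:.3f}, SEM = {sem:.3f}")  # SD = 2.000, SEM = 0.707
```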

19
Q

Normal distribution

A

probability distribution that is symmetric about the mean (bell-shaped curve)

20
Q

Kurtosis

A

measure of outliers in a distribution

High kurtosis → heavy tails or outliers (leptokurtic)
Low kurtosis → light tails or no outliers (platykurtic)

21
Q

Standard deviation

A

Measure of the spread of data around the mean
Typical amount by which values vary from the mean
How tightly values in dataset are bunched around the mean
Variability of individual observations around a single sample mean

22
Q

Central limit theorem

A

when many samples of sufficient size are drawn from a population, the means of these samples tend to be normally distributed, even if the population itself is not.
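
Simulation sketch (not part of the original card): sample means drawn from a clearly skewed population still cluster symmetrically around the population mean. The exponential population, sample size of 30, and seed are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(1)


def sample_mean(n: int) -> float:
    """Mean of n draws from a skewed (exponential) population with mean 1."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])


# Draw many samples and keep only the mean of each one.
means = [sample_mean(30) for _ in range(5000)]

center = statistics.fmean(means)   # should sit near the population mean (1.0)
spread = statistics.stdev(means)   # should sit near 1 / sqrt(30), about 0.18

print(f"mean of sample means = {center:.3f}, SD of sample means = {spread:.3f}")
```

The spread of the sample means is the standard error of the mean: the population standard deviation divided by the square root of the sample size.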

23
Q

Empirical rule

A

for a normal distribution, nearly all data fall within three standard deviations of the mean (about 68% within 1 SD, 95% within 2 SD, and 99.7% within 3 SD).
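
Quick check (a sketch, not part of the original card): the share of a normal distribution within 1, 2, and 3 standard deviations of the mean, computed with `statistics.NormalDist` from the standard library. The helper name `share_within` is an illustrative assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```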

24
Q

Standard error of the mean (SEM)

A

how close the sample mean is likely to be to the true population mean
also shows how precisely the sample mean estimates the population mean
essentially compares the experimental (sample) mean to the true population mean

SEM will always be lower than SDEV (for any sample larger than 1, since SEM = SDEV / √n)

the larger the sample, the lower the SEM, which is good

25
Q

Confidence intervals

A

give an estimate of how well the sample mean represents the population mean.
Range of likely values for population parameter
Uses reliability coefficient (z-value) and SEM

26
Q

Statistical hypothesis testing

A

Applies the scientific method to data with random fluctuation

27
Q

The null hypothesis (H0)

A

the effect seen in the data does not represent a real effect but is merely a result of random fluctuation.
Hypothesis that there is no difference or relationship between variables.

28
Q

Alternative hypothesis (Ha)

A

hypothesis formulated based on existing knowledge, theories, or observations.
Difference between variables is specified (one group is greater / less than the other)

29
Q

One Tailed vs. Two Tailed Test

A

One-tailed → clear directional prediction based on prior knowledge
Two-tailed → no specific direction expected / both directions equally plausible
The decision to use one tailed vs. two tailed test must be made prior to conducting analysis.

30
Q

P-Value

A

The p-value is the probability of obtaining a result at least as extreme as yours when the null hypothesis is true.
Computed under the assumption that the null is true

Region of high probability = high p-value (shaded blue)
Region of low probability = low p-value (shaded gray)

If the p-value is 0.05, data at least this extreme would occur 5% of the time if the null hypothesis were true. It is not the probability that the null hypothesis is true, and it does not mean there is a 95% chance that the alternative hypothesis is true.

31
Q

Parameters of likelihood for observations

A

called alpha levels
predetermined (usually 0.05)

Alpha value is a probability value
It is the threshold chosen in advance: a p-value below it is low enough for us to decide the null hypothesis is unlikely to be true.
If the p-value is less than alpha, we reject the null.

32
Q

Significant difference

A

If the p-value is low enough (p < α), we reject the null hypothesis and conclude there is a significant difference.

33
Q

Parametric tests

A

Assumes the sample represents the population
Assumes the data follow a normal population distribution (regular bell shape)

34
Q

Non-parametric tests

A

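No assumption that the data follow a normal distribution (fewer assumptions than parametric tests)
The area of study is better represented by the median than the mean (the distribution is not normal)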
Very small sample size
Ordinal or ranked data, or outliers cannot be removed.

35
Q

T-test (overview)

A

compares the means between two groups
Based on t distribution
T-value measures the size of the difference relative to variation in sample data.

36
Q

Independent (unpaired) t-test

A

Grouping categories are independent and unrelated
Ex. different people, animals, or things where values of one group do not affect the other.

37
Q

Dependent (paired) t-test

A

Grouping categories are related
Ex. the same person at two points in time.
If the absolute value of the t test statistic is greater than the critical value, the null can be rejected.
A smaller sample means the t distribution has fatter tails, so extreme values are more likely by chance (a bad thing) and a larger t is needed for significance. With a larger sample, the sample mean is likely to be closer to the population mean.

38
Q

ANOVA

A

compares the means among three or more groups

39
Q

Degrees of freedom

A

Is the freedom to vary
n (sample size) - 1 = degrees of freedom (for a single sample)
Indicates the number of independent pieces of information

40
Q

Critical value

A

AKA the z-value (for a z-test; a t-test uses a critical t-value instead)

a specific value or threshold used to determine the acceptance or rejection of a statistical test or hypothesis

41
Q

Types of error

A

Type I error (false positive) → rejection of a null hypothesis that is actually true in the population (there's a significant result when there actually isn't an effect)
Type II error (false negative) → failure to reject a null hypothesis that is actually false in the population (there's no significant result when there actually is an effect)
Type I is especially a risk when doing multiple t-tests instead of doing ANOVA, because the chance of at least one false positive grows with each extra test

42
Q

Bonferroni technique

A

You reduce the chances of incorrectly rejecting the null hypothesis (type I error) in any of the individual tests, but it also increases the possibility of making a Type II error (false negatives), meaning that you might fail to detect a true effect.
Divides the alpha level by the number of comparisons being made; each individual test must then meet this stricter threshold.
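
Worked example (a sketch, assuming three independent comparisons at alpha = 0.05): the chance of at least one false positive across all the tests, before and after dividing alpha by the number of comparisons.

```python
alpha = 0.05
comparisons = 3  # e.g. three pairwise t-tests (assumed independent here)

# Chance of at least one type I error across all tests, no correction.
familywise_uncorrected = 1 - (1 - alpha) ** comparisons

# Bonferroni: each individual test uses alpha / number of comparisons.
bonferroni_alpha = alpha / comparisons
familywise_corrected = 1 - (1 - bonferroni_alpha) ** comparisons

print(f"uncorrected familywise error: {familywise_uncorrected:.4f}")  # 0.1426
print(f"per-test alpha after Bonferroni: {bonferroni_alpha:.4f}")     # 0.0167
print(f"corrected familywise error: {familywise_corrected:.4f}")      # 0.0492
```

The corrected overall error rate stays just under the original 0.05, at the cost of a stricter threshold for every individual test.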

43
Q

Downsides of just an ANOVA test

A

Doesn’t tell us which means are different, it only determines that there is a difference.
Hence, we can follow the ANOVA with a post-hoc test.
Post-hoc tests help identify where the significant differences lie, providing more specific and detailed information about the relationships between the groups or conditions being compared.

44
Q

Correlation

A

Measures the association of two variables
Use correlation when we want to quantify the strength and direction of a relationship.

45
Q

Depictions of correlations

A

|r| < 0.30 → weak to no correlation
|r| ≈ 0.60 → moderate to strong relationship
|r| > 0.70 → substantial to very strong relationship

46
Q

r

A

correlation coefficient

Refers to a measure of the strength and direction of the linear relationship between two variables. It ranges between -1 and 1, where -1 indicates a perfect negative linear relationship, 1 indicates a perfect positive linear relationship, and 0 indicates no linear relationship. Essentially it tells you strength (magnitude) and direction (positive or negative sign) for the relationship.
The Pearson correlation coefficient is the specific calculated value.

47
Q

r^2

A

coefficient of determination
The amount of variance in one variable that is explained or predicted by variance in another variable.
It ranges between 0 and 1, where 0 indicates that the independent variables explain none of the variability in the dependent variable, and 1 indicates that the independent variables explain all of the variability.
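
Worked example (a sketch with made-up height and weight values): computing the Pearson r by hand and squaring it to get r^2. The function name `pearson_r` and the data are illustrative assumptions.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical heights (cm) and weights (kg).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 84]

r = pearson_r(heights, weights)
r_squared = r ** 2

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")  # r = 0.996, r^2 = 0.993
```

Here r is positive and close to 1, so the relationship is strong and positive, and about 99% of the variance in weight is shared with height in this made-up sample.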

48
Q

b

A

slope

49
Q

a

A

intercept (the value of y when x = 0)

50
Q

Best straight line fit

A

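Minimizes the sum of the squared differences between the data and the fitted line (least squares).
y = a + bx, where a is the intercept and b is the slope (the same line as y = mx + b in algebra notation, where m is the slope and b is the intercept)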
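
Worked example (a sketch with made-up data): the least-squares slope and intercept for a straight-line fit, using slope = sum of cross-products / sum of squared x deviations. The function name `least_squares_line` and the data are illustrative assumptions.

```python
def least_squares_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the line minimizing squared vertical error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical heights (cm) and weights (kg).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 84]

intercept, slope = least_squares_line(heights, weights)
print(f"weight = {intercept:.1f} + {slope:.2f} * height")
# weight = -73.1 + 0.83 * height
```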

51
Q

Single vs. multiple correlations

A

Single correlation refers to the relationship between two variables, typically measured using the Pearson correlation coefficient (r-value)

Multiple correlation refers to the relationship between a dependent variable and multiple independent variables.

52
Q

Type I Error

A

The probability of committing a type I error (rejecting the null when it is actually true) is whatever our alpha value is set to.
Ex. for an alpha value of 0.05, we have a 5% chance of committing type I error.

53
Q

Type II Error

A

The probability of committing a type II error (failure to reject the null when it is false) is denoted by beta, which is tied to power (power = 1 - beta; the typical target for power is 0.8 → 80% chance of avoiding a type II error, so beta = 0.2).
Beta is the probability that the experiment will yield a non-significant result even though a real effect is present
High power = high chance that your experiment will find a statistically significant result when one is present.

54
Q

Power

A

the probability of rejecting the null hypothesis when it is false (good thing).
Is the probability (can’t be negative)…
Of making a correct decision
That a significance test will pick up an effect that is present
Of avoiding a type II error

55
Q

Effect on power of:
- effect size
- sample size
- sample variance
- alpha level

A

Effect size
A large effect size (groups that are far apart and distinct) is easier to detect, so power is higher.
Hence, a more powerful test (for example, a larger sample) is needed to detect smaller differences.

Sample size
Smaller sample sizes give less power, so larger samples are needed to detect differences

Sample variance
Higher variance yields lower power.

Alpha value
Power rises and falls with the alpha level
lowering the alpha value increases the strictness of the test, making it harder to reject the null hypothesis (which lowers power)

56
Q

Diagnostic tests

A

used to determine the presence or absence of a particular condition in an individual.
Validity is evaluated by a test’s ability to assess the presence (sensitivity) or absence (specificity) of a medical condition.
Tries to answer a yes or no, often from a non-binary variable. Thus, there must be a cut off point to help create a yes or no answer.

57
Q

True positive

A

test predicts condition and they have condition

58
Q

False positive

A

test predicts condition but they do not have condition

59
Q

False negative

A

test predicts no condition but they have condition

60
Q

True negative

A

test predicts no condition and they do not have condition

61
Q

Prevalence

A

proportion of population that has a condition

(true positive + false negative) / everyone (total population)

62
Q

Sensitivity

A

proportion of people with the condition that test positive (relative to all individuals with the condition).

True positives / total people with condition (TP + FN)

63
Q

Specificity

A

proportion of people without condition that test negative (relative to all individuals without the condition)

True negatives / total people without condition (TN + FP)

64
Q

Positive predictive value

A

proportion of people who test positive that actually have the condition (relative to all positive tests).

True positives / total positives (TP + FP)

65
Q

Negative predictive value

A

proportion of people who test negative that actually do not have the condition (relative to all negative tests)

True negatives / total negative tests (TN + FN)

66
Q

Accuracy

A

ability to identify true results

(True positives + true negatives) / total number of tests (TP + FP + TN + FN)
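
Worked example (a sketch with made-up counts): prevalence, sensitivity, specificity, predictive values, and accuracy computed from one hypothetical 2 x 2 table of 1,000 people.

```python
# Hypothetical counts from a diagnostic test on 1,000 people.
tp = 90    # test positive, have the condition
fp = 30    # test positive, do not have the condition
fn = 10    # test negative, have the condition
tn = 870   # test negative, do not have the condition
total = tp + fp + fn + tn

prevalence = (tp + fn) / total    # share of everyone who has the condition
sensitivity = tp / (tp + fn)      # positives among those with the condition
specificity = tn / (tn + fp)      # negatives among those without it
ppv = tp / (tp + fp)              # true positives among all positive tests
npv = tn / (tn + fn)              # true negatives among all negative tests
accuracy = (tp + tn) / total      # correct results among all tests

print(f"prevalence={prevalence:.2f} sensitivity={sensitivity:.2f} "
      f"specificity={specificity:.3f} PPV={ppv:.2f} NPV={npv:.3f} "
      f"accuracy={accuracy:.2f}")
```

With these counts the test has 90% sensitivity but only a 75% positive predictive value, because the condition is present in just 10% of the people tested.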

67
Q

Prevalence

A

proportion of the population that has a condition

(True positive + false negative) / total population

68
Q

z-value

A

related to confidence intervals

tells you how many standard deviations you are away from the mean. If a z-score is equal to 0, it is on the mean

69
Q

how to calculate confidence interval

A

CI = x̄ ± z * (s / √n)

x̄ = sample mean (the center of the interval)
z = z-value, typically taken from a graph or z-table; corresponds to the desired confidence level (%)
s = standard deviation
n = sample size

output is a range
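
Worked example (a sketch with made-up numbers): a 95% confidence interval for a sample mean of 70 with a standard deviation of 10 and 25 subjects. The function name `confidence_interval` is an illustrative assumption; the z-value is looked up with `statistics.NormalDist` instead of a table.

```python
import math
from statistics import NormalDist


def confidence_interval(mean: float, sd: float, n: int,
                        level: float = 0.95) -> tuple[float, float]:
    """Return (low, high) for mean +/- z * (sd / sqrt(n))."""
    # Two-sided z-value: e.g. level 0.95 leaves 2.5% in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sd / math.sqrt(n)   # z times the SEM
    return mean - margin, mean + margin


low, high = confidence_interval(mean=70.0, sd=10.0, n=25, level=0.95)
print(f"95% CI: {low:.2f} to {high:.2f}")  # 95% CI: 66.08 to 73.92
```

Passing `level=0.99` or `level=0.90` uses z-values of about 2.576 and 1.645 respectively.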

70
Q

z-value at each confidence interval

A

99% → 2.576
95% → 1.96
90% → 1.645

71
Q

percentage of common association equation

A

r^2 * 100 (the r value squared, times 100)

72
Q

how do the t-stat and critical value affect the acceptance or rejection of the null hypothesis

A

If the absolute value of the t test statistic is greater than the critical value, the null can be rejected; otherwise, we fail to reject it.
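
Worked example (a sketch with made-up before/after scores): a paired t statistic computed by hand and compared with the two-tailed critical value for alpha = 0.05 and 4 degrees of freedom (2.776, from a standard t table).

```python
import math
import statistics

# Hypothetical scores for the same 5 people at two points in time.
before = [72, 75, 78, 70, 74]
after = [70, 72, 75, 69, 70]

# A paired t-test works on the within-person differences.
diffs = [b - a for b, a in zip(before, after)]
mean_diff = statistics.mean(diffs)
sem_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))

t_stat = mean_diff / sem_diff
df = len(diffs) - 1            # degrees of freedom = n - 1
critical_value = 2.776         # t table: two-tailed, alpha = 0.05, df = 4

reject_null = abs(t_stat) > critical_value
print(f"t = {t_stat:.3f}, df = {df}, reject null: {reject_null}")
# t = 5.099, df = 4, reject null: True
```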