Final (stats) Flashcards

1
Q

Table

A

used to present many numerical values

2
Q

Figure

A

used to show patterns, trends, or relationships

3
Q

Qualities of a good table

A

Should be understandable on its own
Includes appropriate title in proper location
Logical format
Justified numbers → decimal points line up
Good / consistent spacing
Legend

4
Q

Qualities of a good figure

A

Understandable on its own
Axes labels (with units)
Appropriate scaling of axis
Symbols
Customized (not the excel default)
No need for box borders around graph
Trendline should be thicker and clear

5
Q

Figure legends (caption)

A

The key to understanding a figure
A good figure legend includes:
Title
Materials and methods (description of techniques used)
Results (further explanation of the data)
Definitions (of symbols, patterns, lines, abbreviations, etc).

6
Q

Monty Hall Problem

A

You have a 1/3 chance of initially choosing the door with the prize behind it. The host, who knows where the prize is, then opens one of the other doors to reveal no prize. Your original door still has a 1/3 chance, so the one remaining unopened door carries the other 2/3. By switching doors, you capitalize on the new information and increase your chances of winning to 2/3.
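The 1/3 versus 2/3 split can be checked with a quick simulation. This is only a sketch: the function names `play` and `win_rate`, the trial count, and the seed are illustrative choices, not part of the original card.

```python
import random

def play(switch: bool, rng: random.Random) -> bool:
    """Play one round and return True if the player ends on the prize door."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```

With enough trials the staying rate should settle near 1/3 and the switching rate near 2/3.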

7
Q

Probability

A

The degree of certainty or chance that something will happen.

8
Q

Statistics

A

Help us…
Reduce and describe data
Quantify relationships among data
Determine if sets of data are similar / different

9
Q

Goals of a data analysis

A

Data reduction (and description)
Reduce measures to make more meaningful
Averages, spread, bar chart / plots / histograms (descriptive)
Easier and more meaningful to read than all the individual data.

10
Q

Establish relationships

A

Descriptive – describe relationship between two observations
Relationship between height and weight
Causal – did something cause the other

Intervention → caused some response

Inference
Infer outcome from sample to population
Is what we see in sample true in population

11
Q

Purpose of sampling

A

to approximate a larger population on characteristics relevant to the research question.

12
Q

Histograms

A

Graphical representations
Mainly represent frequency (# of subjects that fall into a range).
Measures of central tendency

13
Q

Mean

A

average
x̄ = ΣX / N

14
Q

Median

A

middle of distribution

15
Q

Mode

A

most frequently occurring value

16
Q

Range

A

difference between high and low values in a data set
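As a rough illustration of the mean, median, mode, and range cards, the sketch below computes all four with Python's standard `statistics` module. The `scores` list is made-up data used only for the example.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
scores = [4, 8, 6, 5, 3, 8, 9]

mean = statistics.mean(scores)           # sum of values / number of values
median = statistics.median(scores)       # middle value of the sorted data
mode = statistics.mode(scores)           # most frequently occurring value
data_range = max(scores) - min(scores)   # high value minus low value

print(mean, median, mode, data_range)
```

For this data the sorted values are 3, 4, 5, 6, 8, 8, 9, so the median is 6, the mode is 8, and the range is 6.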

17
Q

Confidence interval

A

interval estimate of the population mean (using SEM)

18
Q

Standard Error of the Mean (equation)

A

standard deviation / √sample size
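The equation can be sketched in a few lines. This assumes the sample standard deviation (n - 1 denominator), which the card does not specify; the helper name `sem` and the `sample` values are hypothetical.

```python
import math
import statistics

def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    s = statistics.stdev(values)  # assumption: sample SD with n - 1 denominator
    return s / math.sqrt(len(values))

# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sem(sample), 3))
```

Because the standard deviation is divided by √n, collecting more observations shrinks the SEM even when the spread of the data stays the same.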

19
Q

Normal distribution

A

probability distribution that is symmetric about the mean (bell-shaped curve)

20
Q

Kurtosis

A

measure of how heavy the tails of a distribution are (how prone it is to outliers)

High kurtosis → heavy tails or outliers (leptokurtic)
Low kurtosis → light tails or no outliers (platykurtic)

21
Q

Standard deviation

A

Measure of the spread of data around the mean
Typical amount by which values vary from the mean
How tightly values in dataset are bunched around the mean
Variability of individual observations around a single sample mean

22
Q

Central limit theorem

A

when many samples are drawn from a population, the means of these samples tend to be normally distributed, even if the population itself is not, provided each sample is reasonably large.
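A small simulation can make this concrete. The sketch below assumes a deliberately skewed (exponential) population with mean 1; the sample size of 50, the 2000 repetitions, and the seed are arbitrary illustrative choices.

```python
import random
import statistics

rng = random.Random(42)

def sample_mean(n):
    """Mean of n draws from a skewed (exponential) population with mean 1."""
    return statistics.mean(rng.expovariate(1.0) for _ in range(n))

# Draw many samples and keep only their means.
means = [sample_mean(50) for _ in range(2000)]

center = statistics.mean(means)   # should sit close to the population mean, 1
spread = statistics.stdev(means)  # should sit close to 1 / sqrt(50), about 0.14
print(round(center, 3), round(spread, 3))
```

Even though individual exponential values are strongly skewed, a histogram of `means` would look roughly bell-shaped, and its spread approximates the standard error of the mean.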

23
Q

Empirical rule

A

for a normal distribution, nearly all data (about 99.7%) fall within three standard deviations of the mean; about 68% fall within one standard deviation and about 95% within two.
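The percentages behind the rule can be recovered from the normal distribution itself. A minimal sketch, using the standard-library error function; the helper name `within` is just an illustrative label.

```python
import math

def within(k):
    """Probability that a normally distributed value lies within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {within(k) * 100:.2f}%")
```

The three values come out at roughly 68%, 95%, and 99.7%, which is why the rule is sometimes called the 68-95-99.7 rule.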

24
Q

Standard error of the mean (SEM)

A

how close the sample mean is likely to be to the true population mean
also shows how accurately the sample mean reflects the population it was drawn from
essentially compares the experimental (sample) mean to the true population mean

SEM will always be lower than SDEV (for any sample larger than one)

the larger the sample, the lower the SEM, which is good

25
Confidence intervals
Give an estimate of how well the sample mean represents the population mean.
Range of likely values for a population parameter.
Uses a reliability coefficient (the z- or t-value for the chosen confidence level) and the SEM.
26
Statistical hypothesis testing
Applies the scientific method to data with random fluctuation
27
The null hypothesis (H0)
The observed effect in the data does not represent a real effect but is merely a result of random fluctuation. It is the hypothesis that there is no difference or relationship between variables.
28
Alternative hypothesis (Ha)
Hypothesis formulated based on existing knowledge, theories, or observations. A difference or relationship between variables is specified (for a directional hypothesis, one group is greater or less than the other).
29
One Tailed vs. Two Tailed Test
One-tailed → clear directional prediction based on prior knowledge
Two-tailed → no specific direction expected / both directions equally plausible
The decision to use a one-tailed vs. two-tailed test must be made prior to conducting the analysis.
30
P-Value
The p-value is the probability of obtaining a result at least as extreme as yours when the null hypothesis is true.
Computed under the assumption that the null is true.
Region of high probability = high p-value (shaded blue in the original figure)
Region of low probability = low p-value (shaded gray in the original figure)
If the p-value is 0.05, there is a 5% chance of seeing data at least this extreme when the null hypothesis is true. It is not the probability that the null hypothesis itself is true.
A 95% confidence level is the complement of a 0.05 alpha level (1 - 0.05); it does not mean there is a 95% chance that the alternative hypothesis is true.
31
Parameters of likelihood for observations
Called alpha levels (α); predetermined (usually 0.05).
The alpha value is a probability value: the threshold below which we decide the data are unlikely enough under the null hypothesis to reject it.
If the p-value is less than alpha, we reject the null hypothesis.
32
Significant difference
If the p-value is low enough, we reject the null hypothesis and conclude there is a significant difference (when p < α).
33
Parametric tests
Assume the sample represents the population.
Assume the population follows a normal distribution (regular bell shape).
34
Non-parametric tests
No assumption of a normal distribution. Used when:
The area of study is better represented by the median (data are not normally distributed)
The sample size is very small
The data are ordinal or ranked, or outliers cannot be removed
35
T-test (overview)
Compares the means between two groups.
Based on the t distribution.
The t-value measures the size of the difference relative to the variation in the sample data.
36
Independent (unpaired) t-test
Grouping categories are independent and unrelated Ex. different people, animals, or things where values of one group do not affect the other.
37
Dependent (paired) t-test
Grouping categories are related. Ex. the same person at two points in time.
If the absolute value of the t test statistic is greater than the critical value, the null can be rejected.
A smaller sample means fewer degrees of freedom and a t distribution with fatter tails, so a larger critical value is needed to reject the null (bad thing). A larger sample makes the t distribution narrower, approaching the normal distribution, and puts the sample mean closer to the population mean.
38
ANOVA
compares the means among three or more groups
39
Degrees of freedom
The freedom to vary.
n (sample size) - 1 = degrees of freedom
Indicates the number of independent pieces of information.
40
Critical value
A specific value or threshold used to decide whether to reject the null hypothesis in a statistical test. For a z-test it is the z-value; for a t-test it comes from the t distribution at the given degrees of freedom.
41
Types of error
Type I error (false positive) → rejection of a null hypothesis that is actually true in the population (there's a significance when there actually isn't)
Type II error (false negative) → failure to reject a null hypothesis that is actually false in the population (there's no significance when there actually is)
Type I is especially a risk when doing multiple t-tests instead of doing ANOVA, because every extra test adds another chance of a false positive.
42
Bonferroni technique
Looks at the alpha level and divides it by the number of comparisons being made. This reduces the chance of incorrectly rejecting the null hypothesis (type I error) in any of the individual tests, but it also increases the possibility of making a type II error (false negative), meaning that you might fail to detect a true effect.
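The arithmetic behind the correction is short enough to sketch. The function names below are illustrative, and the familywise formula assumes the individual tests are independent, which is a simplification.

```python
def bonferroni_alpha(alpha, comparisons):
    """Per-test alpha after dividing by the number of comparisons."""
    return alpha / comparisons

def familywise_error(alpha, comparisons):
    """Chance of at least one type I error across independent tests (assumed independent)."""
    return 1 - (1 - alpha) ** comparisons

# Example: three pairwise comparisons at an overall alpha of 0.05.
uncorrected = familywise_error(0.05, 3)
per_test = bonferroni_alpha(0.05, 3)
corrected = familywise_error(per_test, 3)
print(round(uncorrected, 4), round(per_test, 4), round(corrected, 4))
```

Without correction, three tests at 0.05 carry roughly a 14% chance of at least one false positive; testing each at 0.05 / 3 brings the overall rate back to just under 5%.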
43
Downsides of just an ANOVA test
Doesn’t tell us which means are different; it only determines that there is a difference. Hence, we can follow it with a post-hoc test. Post-hoc tests help identify where the significant differences lie, providing more specific and detailed information about the relationships between the groups or conditions being compared.
44
Correlation
Measures the association of two variables. Use correlation when we want to quantify the strength and direction of a relationship.
45
Depictions of correlations
r < 0.30 → weak to no correlation
r = 0.60 → moderate to strong relationship
r > 0.70 → substantial to very strong relationship
46
r
correlation coefficient Refers to a measure of the strength and direction of the linear relationship between two variables. It ranges between -1 and 1, where -1 indicates a perfect negative linear relationship, 1 indicates a perfect positive linear relationship, and 0 indicates no linear relationship. Essentially it tells you strength (magnitude) and direction (positive or negative sign) for the relationship. The Pearson correlation coefficient is the specific calculated value.
47
r^2
coefficient of determination The amount of variance in one variable that is explained or predicted by variance in another variable. It ranges between 0 and 1, where 0 indicates that the independent variables explain none of the variability in the dependent variable, and 1 indicates that the independent variables explain all of the variability.
48
b
slope
49
a
intercept (the y-intercept of the regression line y = a + bx)
50
Best straight line fit
Minimize the sum of the squared differences between the data and the fitted line (least squares). y = mx + b, where m is the slope and b is the y-intercept in this notation.
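A least-squares fit, together with r and r^2, can be sketched by hand from the sums of squares. The helper `fit_line` and the `xs`/`ys` values are hypothetical, used only to show the calculation.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept, plus the Pearson correlation r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # minimizes the sum of squared residuals
    intercept = mean_y - slope * mean_x    # line passes through (mean_x, mean_y)
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r = fit_line(xs, ys)
r_squared = r ** 2
print(round(slope, 2), round(intercept, 2), round(r, 3), round(r_squared, 2))
```

For this data the slope is 0.6 and the intercept 2.2, with r near 0.775 and r^2 of 0.6, meaning about 60% of the variance in y is shared with x.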
51
Single vs. multiple correlations
Single correlation refers to the relationship between two variables, typically measured using the Pearson correlation coefficient (r-value) Multiple correlation refers to the relationship between a dependent variable and multiple independent variables.
52
Type I Error
The probability of committing a type I error (rejecting the null when it is actually true) is whatever our alpha value is set to. Ex. for an alpha value of 0.05, we have a 5% chance of committing a type I error.
53
Type II Error
The probability of committing a type II error (failure to reject the null when it is false) is denoted by beta (β), which is related to power: power = 1 - β. A typical target for power is 0.8 (an 80% chance of avoiding a type II error), which corresponds to β = 0.2.
Beta is the probability that the experiment will yield a non-significant result when a real effect is present.
High power = high chance that your experiment will find a statistically significant result when one is present.
54
Power
The probability of rejecting the null hypothesis when it is false (good thing). It is a probability (can’t be negative):
Of making a correct decision
That a significance test will pick up an effect that is present
Of avoiding a type II error
55
Effects on power of: - effect size - sample size - sample variance - alpha level
Effect size: a large separation between groups (shown in the bottom right of the original figure) is easy to detect because the groups are more spread out and distinct, so power is high. More power (for example, from a larger sample) is needed to detect smaller differences.
Sample size: smaller samples give less power to detect differences; larger samples give more.
Sample variance: higher variance yields less power.
Alpha level: power rises and falls with alpha. Lowering the alpha value increases the strictness of the test, making it harder to reject the null hypothesis, which lowers power.
56
Diagnostic tests
used to determine the presence or absence of a particular condition in an individual. Validity is evaluated by a test’s ability to assess the presence (sensitivity) or absence (specificity) of a medical condition. Tries to answer a yes or no, often from a non-binary variable. Thus, there must be a cut off point to help create a yes or no answer.
57
True positive
test predicts condition and they have condition
58
False positive
test predicts condition but they do not have condition
59
False negative
test predicts no condition but they have condition
60
True negative
test predicts no condition and they do not have condition
61
Prevalence
proportion of population that has a condition (true positive + false negative) / everyone (total population)
62
Sensitivity
proportion of people with the condition who test positive (relative to all individuals with the condition). True positives / total people with condition (TP + FN)
63
Specificity
proportion of people without condition that test negative (relative to all individuals without the condition) True negatives / total people without condition (TN + FP)
64
Positive predictive value
proportion of people who test positive that actually have the condition (relative to all positive tests). True positives / total positives (TP + FP)
65
Negative predictive value
proportion of people who test negative that truly do not have the condition (relative to all negative tests). True negatives / total negative tests (TN + FN)
66
Accuracy
ability to identify true results (True positives + true negatives) / total number of tests (TP + FP + TN + FN)
67
Prevalence
proportion of the population that has a condition (True positive + false negative) / total population
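The diagnostic-test formulas on the preceding cards can be gathered into one small function. This is a sketch only: the function name `diagnostic_metrics` and the counts in the example are made up for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute the card formulas from the four cells of a 2x2 table."""
    total = tp + fp + fn + tn
    return {
        "prevalence": (tp + fn) / total,   # everyone with the condition / everyone
        "sensitivity": tp / (tp + fn),     # positives among those with the condition
        "specificity": tn / (tn + fp),     # negatives among those without it
        "ppv": tp / (tp + fp),             # true positives among all positive tests
        "npv": tn / (tn + fn),             # true negatives among all negative tests
        "accuracy": (tp + tn) / total,     # all correct results / all tests
    }

# Hypothetical screening results for 1000 people.
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example sensitivity is 0.90 but the positive predictive value is only 0.75, because at 10% prevalence the false positives make up a quarter of all positive tests.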
68
z-value
Related to confidence intervals. Tells you how many standard deviations a value is away from the mean. If a z-score is equal to 0, the value is on the mean.
69
how to calculate confidence interval
CI % = x̄ ± z × (s / √n)
x̄ = sample mean (overall and stays consistent)
z = z-value, typically taken from a table or graph; corresponds to the desired confidence level (%)
s = standard deviation
n = sample size
The output is a range.
70
z-value at each confidence interval
99% → 2.576
95% → 1.96
90% → 1.645
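Putting the formula and the z-values together, a confidence interval can be sketched as follows. The helper name `confidence_interval` and the `sample` data are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed.

```python
import math
import statistics

# z-values from the card above, keyed by confidence level in percent.
Z_VALUES = {90: 1.645, 95: 1.96, 99: 2.576}

def confidence_interval(values, level=95):
    """Return (low, high) for mean ± z * (s / sqrt(n))."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))  # assumption: sample SD
    margin = Z_VALUES[level] * sem
    return mean - margin, mean + margin

# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = confidence_interval(sample, level=95)
print(round(low, 2), round(high, 2))
```

Asking for a higher confidence level uses a larger z-value and therefore gives a wider range around the same sample mean.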
71
percentage of common association equation
r^2 * 100 (the coefficient of determination expressed as a percentage)
72
how do t-stat and critical value affect the acceptance or rejection of the null hypothesis
If the absolute value of the t test statistic is greater than the critical value, the null can be rejected.
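The decision rule can be sketched for an independent t-test. Everything here is illustrative: the group values are made up, the pooled-variance form assumes roughly equal variances, and the critical value 2.306 is the standard t-table entry for a two-tailed test at alpha = 0.05 with 8 degrees of freedom.

```python
import math
import statistics

def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical measurements for two unrelated groups.
group_a = [12, 15, 14, 16, 13]
group_b = [9, 11, 10, 12, 8]

t_stat, df = independent_t(group_a, group_b)
T_CRITICAL = 2.306  # t-table value: two-tailed, alpha = 0.05, df = 8
reject_null = abs(t_stat) > T_CRITICAL
print(round(t_stat, 3), df, reject_null)
```

Here the t statistic is about 4.0 with 8 degrees of freedom, which exceeds 2.306, so the null hypothesis of equal means would be rejected for this made-up data.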