Module 1 & 2 Flashcards

Introduction to Statistics and Experimental Design

1
Q

What is a sample?

A

A subset of individuals from a population of interest

2
Q

What is a population?

A

Set of all subjects relevant to the scientific hypothesis under examination

3
Q

What is a statistic?

A

A value calculated from a sample, used to estimate population parameters

4
Q

What is a parameter?

A

A true measurement that describes a population

5
Q

What is a statistical hypothesis?

A

A claim regarding a population parameter

6
Q

What is sampling error?

A

The deviation between an estimate and the true population parameter that arises purely by chance

7
Q

What are the characteristics of a good sample?

A

1) It is a random sample
2) It is precise
3) It is unbiased

8
Q

What does precision refer to?

A

The spread of values for an estimate due to sampling error

9
Q

What is the relationship between sample size and precision?

A

Higher sample size = higher precision and lower sampling error

10
Q

What does bias refer to?

A

Systematic discrepancy between estimates from multiple samples and the true population parameter

11
Q

What are 2 types of non-random samples?

A

1) Sample of convenience
2) Volunteer sample

12
Q

What are 2 types of studies?

A

1) Experimental, where treatments are assigned
2) Observational, where treatments are not assigned by the researcher

13
Q

What can be a problem for observational studies?

A

Confounding variables: variables that influence the outcome but are not accounted for

14
Q

What are the 2 types of variables?

A

1) Qualitative/Categorical (membership)
2) Quantitative (magnitude)

15
Q

What are the measurement scales for qualitative data?

A

Nominal and ordinal

16
Q

What are the 2 types of quantitative data?

A

Continuous and discrete

17
Q

What are the measurement scales for quantitative data?

A

Ratio and interval

18
Q

What are descriptive statistics?

A

Quantities that summarize and describe a data set (usually a sample)

19
Q

What are the components of descriptive statistics?

A

1) Shape
2) Spread
3) Location
4) Frequency distribution

20
Q

What is a frequency distribution?

A

Describes the number of times a particular value of a variable occurs in a sample (can be absolute or relative)
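
Illustrative sketch (not part of the original card): absolute and relative frequencies can be tabulated with Python's standard library. The flower-colour sample below is made up purely for illustration.

```python
from collections import Counter

# Hypothetical sample: flower colour recorded for 10 plants.
sample = ["red", "white", "red", "pink", "red",
          "white", "pink", "red", "white", "red"]

n = len(sample)
absolute = Counter(sample)  # absolute frequency: count per value
relative = {value: count / n for value, count in absolute.items()}  # proportions

# Print one row per value, most frequent first.
for value, count in absolute.most_common():
    print(f"{value:<6} {count:>2}  {relative[value]:.2f}")
```

The relative frequencies always sum to 1, which makes samples of different sizes comparable.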

21
Q

What are 2 ways we can depict frequency distributions?

A

Bar graphs or histograms

22
Q

When is it best to use a histogram?

A

When we are looking at the frequency distribution of a numerical data set

23
Q

When is it best to use a bar graph?

A

When we are looking at categorical data sets

24
Q

What are the 3 types of distribution?

A

1) Frequency distribution
2) Probability distribution
3) Sampling distribution

25
What is a probability distribution?
A distribution that depicts the probabilities associated with different values of a specific variable
26
What is a sampling distribution?
A probability distribution of values of an estimate we could obtain from sampling a population
27
What are the 3 measurements of location?
Mean, median, mode
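Illustrative sketch (not part of the original card): the three measures of location computed with Python's `statistics` module on a small made-up data set.

```python
import statistics

# Hypothetical data set of six observations.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of values / number of values
median = statistics.median(values)  # middle value (average of the two middle ones here)
mode = statistics.mode(values)      # most frequent value

print(mean, median, mode)
```

Here the mean (5) sits above the median (4.0) because the single large value 10 pulls it upward, while the median is unaffected by how extreme that value is.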
28
How can we describe the spread of data?
1) Skew 2) Range (max-min) 3) Variance 4) Standard deviation
29
What is standard deviation (SD)?
A measure of spread: roughly the typical distance of observed values from the mean (the square root of the variance)
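Illustrative sketch (not part of the original card): the sample variance and standard deviation worked out step by step, then checked against `statistics.stdev`. The data are made up, and the n - 1 denominator assumes the values are a sample rather than a whole population.

```python
import math
import statistics

# Hypothetical sample of eight observations.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)                          # 40 / 8 = 5.0
squared_devs = [(x - mean) ** 2 for x in values]          # squared distance from the mean
sample_variance = sum(squared_devs) / (len(values) - 1)   # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)                    # SD is the square root of the variance

# The library function uses the same n - 1 definition.
assert math.isclose(sample_sd, statistics.stdev(values))
print(f"variance={sample_variance:.3f}  sd={sample_sd:.3f}")
```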
30
What is estimation?
Process of inferring a population parameter using sample data
31
What is uncertainty and how do we quantify uncertainty?
Uncertainty is the error of an estimate; it is quantified with 1) Standard error (of the mean) 2) Confidence interval
32
What is standard error (SE)?
The standard deviation of the sampling distribution of an estimate (e.g., the mean)
33
How do we obtain a smaller standard error?
Bigger sample size > narrower sampling distribution > more precise estimate
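Illustrative sketch (not part of the original card): a small simulation of the sampling distribution of the mean for two sample sizes. The population (normal, mean 50, SD 10), the seed, and the number of repeats are all assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)          # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 50.0, 10.0   # hypothetical population
REPEATS = 2000                  # samples drawn per sample size

def sd_of_sample_means(n):
    """Draw REPEATS samples of size n and return the SD of their means."""
    means = []
    for _ in range(REPEATS):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

# Simulated SE versus the textbook value sigma / sqrt(n).
empirical_se = {n: sd_of_sample_means(n) for n in (10, 100)}
theoretical_se = {n: POP_SD / math.sqrt(n) for n in (10, 100)}

for n in (10, 100):
    print(f"n={n:>3}  simulated SE={empirical_se[n]:.2f}  "
          f"sigma/sqrt(n)={theoretical_se[n]:.2f}")
```

The spread of the simulated sample means should shrink by roughly a factor of sqrt(10) when the sample size goes from 10 to 100.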
34
What is a confidence interval (CI)?
Range of values, calculated from the data, that is likely to contain the population parameter
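Illustrative sketch (not part of the original card): an approximate 95% confidence interval for a mean, computed as estimate ± critical value × standard error. The data are made up, and the normal critical value (about 1.96) is an assumption for simplicity; with a sample this small a t critical value would give a somewhat wider interval.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample of 10 measurements.
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 5.2, 4.9, 5.1]

n = len(sample)
estimate = mean(sample)
se = stdev(sample) / math.sqrt(n)   # standard error of the mean
z = NormalDist().inv_cdf(0.975)     # two-sided 95% critical value (normal approximation)

ci = (estimate - z * se, estimate + z * se)
print(f"mean={estimate:.2f}  95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```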
35
What are the steps of hypothesis testing?
1) State the hypotheses 2) Calculate test statistic 3) Determine p-value 4) Appropriate conclusion
36
What is a test statistic?
A value calculated from the data and used to determine how compatible the observed data are with the results expected under the null hypothesis
37
What is the p-value?
Probability of obtaining a result as extreme as or more extreme than the one observed, assuming the null hypothesis is true (obtained from the null distribution)
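Illustrative sketch (not part of the original card): a two-sided p-value from an exact binomial null distribution. The scenario (14 "successes" out of 18 trials, null proportion 0.5) is hypothetical, and "as extreme or more extreme" is taken here to mean at least as far from the expected count as the observed count, which is one reasonable convention when the null distribution is symmetric.

```python
from math import comb

n, observed = 18, 14   # hypothetical data: 14 successes in 18 trials
p0 = 0.5               # proportion claimed by the null hypothesis

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials under proportion p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

expected = n * p0  # 9 successes expected under the null

# Sum the null probabilities of every outcome at least as far from the
# expected count as the observed one (both tails).
p_value = sum(
    binom_pmf(k, n, p0)
    for k in range(n + 1)
    if abs(k - expected) >= abs(observed - expected)
)
print(f"p = {p_value:.3f}")  # prints p = 0.031
```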
38
What is the null distribution?
Sampling distribution for a test statistic under the assumption that the null hypothesis is true
39
What should a conclusion include?
Test used, test statistic value, df, p-value (and sample size)
40
What is a type I error?
Rejecting a true null hypothesis (false positive)
41
What determines the probability of committing a type I error?
The significance level, which sets the criterion for rejecting the null hypothesis
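Illustrative sketch (not part of the original card): simulating many experiments in which the null hypothesis is true and counting how often a two-sided z-test rejects it. The test choice (z-test with known SD), the sample size, the seed, and the number of trials are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(1)           # fixed seed so the run is repeatable
ALPHA = 0.05                     # significance level
N, SIGMA, MU0 = 20, 1.0, 0.0     # hypothetical design; the null (mean = MU0) is true
TRIALS = 4000

z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # rejection threshold for |z|

def rejects_null():
    """Simulate one experiment under the null and report whether it rejects."""
    sample = [rng.gauss(MU0, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU0) / (SIGMA / math.sqrt(N))
    return abs(z) > z_crit

false_positive_rate = sum(rejects_null() for _ in range(TRIALS)) / TRIALS
print(f"share of true nulls rejected: {false_positive_rate:.3f}")
```

The share of rejections should land close to the chosen significance level, since every rejection here is a type I error.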
42
What is a type II error?
Failing to reject a false null hypothesis (false negative)
43
What determines the probability of committing a type II error?
Power: the probability of a type II error equals 1 minus the power of the study, so higher power means lower type II risk
44
What is power?
Extent to which a test can correctly detect a real effect when there is one
45
What determines power?
1) Size of the effect (bigger = more easily detectable) 2) Significance level (higher = more powerful) 3) Measurement error (smaller = more powerful) 4) Sample size (bigger = more powerful)
46
What can power analysis be used for?
To determine how big a sample should be to attain a desired power level
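Illustrative sketch (not part of the original card): a power analysis for a two-sided one-sample z-test using the normal approximation. The function names, the effect size of 0.5 SD, and the 80% target are assumptions for illustration; real analyses usually use the test that will actually be run (often a t-test).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_z_test(effect, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test to detect a mean shift `effect`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sd   # how far the true mean moves the z statistic
    # Probability the z statistic falls in either rejection region.
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)

def sample_size_for_power(effect, sd, target=0.80, alpha=0.05):
    """Smallest n whose power reaches `target` (simple upward search)."""
    n = 2
    while power_z_test(effect, sd, n, alpha) < target:
        n += 1
    return n

n_needed = sample_size_for_power(effect=0.5, sd=1.0)
print(n_needed, round(power_z_test(0.5, 1.0, n_needed), 3))
```

The same function shows the other determinants of power: raising `n`, `effect`, or `alpha` increases the returned value, while a larger `sd` (for example from measurement error) lowers it.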
47
What is a disadvantage of experimental studies?
Experimental artifacts which introduce bias through unintended consequences of experimental procedures
48
What are 3 ways to reduce bias?
1) Control groups 2) Randomization 3) Blinding
49
What are 3 ways to reduce the effect of sampling errors?
1) Replication 2) Balance 3) Blocking
50
What is a control group?
Group of experimental units that do not receive the treatment of interest but are kept under the same exact conditions as the treated experimental units
51
What is randomization?
Random assignment of treatments to units in an experimental study, which breaks associations between possible confounding variables and the treatment
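Illustrative sketch (not part of the original card): randomly assigning eight hypothetical experimental units to treatment and control by shuffling. The unit names and the seed are made up; splitting the shuffled list in half also keeps the design balanced.

```python
import random

# Hypothetical experimental units.
units = [f"plot_{i}" for i in range(1, 9)]

rng = random.Random(42)   # fixed seed only so the example is repeatable
shuffled = units[:]       # copy, so the original order is left untouched
rng.shuffle(shuffled)     # chance alone decides the order

half = len(shuffled) // 2
assignment = {unit: "treatment" for unit in shuffled[:half]}
assignment.update({unit: "control" for unit in shuffled[half:]})

for unit in units:
    print(unit, assignment[unit])
```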
52
What is blinding?
Process of concealing information about the control/treatment group assignment (single or double blind)
53
How would having all identical experimental units affect sampling error/bias?
Sampling error is reduced, because there is less variation among units
54
What is replication?
Applying the same treatment to multiple, independent experimental subjects
55
What is pseudoreplication?
The assumption of independence is violated: measurements that are not independent of one another are treated as if they were independent replicates
56
What is balance?
All treatments have equal sample size
57
What is blocking?
Grouping of experimental units that have similar properties (repeating the same experiment to account for spatial/temporal differences)
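Illustrative sketch (not part of the original card): a randomized block design in which every treatment is assigned at random within each block. The block names, unit names, treatments, and seed are all hypothetical.

```python
import random

# Hypothetical blocks: units within a block share similar conditions.
blocks = {
    "field_A": ["A1", "A2", "A3", "A4"],
    "field_B": ["B1", "B2", "B3", "B4"],
}
treatments = ["control", "fertilizer"]

rng = random.Random(7)  # fixed seed only so the example is repeatable
design = {}
for block, block_units in blocks.items():
    # Equal copies of each treatment per block (balanced), shuffled within the block.
    labels = treatments * (len(block_units) // len(treatments))
    rng.shuffle(labels)
    design[block] = dict(zip(block_units, labels))

for block, assignment in design.items():
    print(block, assignment)
```

Because each block contains every treatment, differences between blocks (for example between fields) can be separated from the treatment effect.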
58
What are the benefits/disadvantages of using extreme treatments?
1) Treatment effects are easier to detect when they are large (increased power) 2) The effects of a treatment do not always scale linearly with the magnitude
59
When do we use a scatterplot?
When both variables are numerical
60
When do we use a boxplot?
When we want to depict a continuous variable in terms of its distribution OR when Y is a continuous variable and X is categorical
61
What do we use a QQ plot for?
To check for normality
62
What are the patterns we can see on a QQ plot?
1) Points fall on a straight line = normal 2) Upward-bending curve = right skew 3) Downward-bending curve = left skew
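Illustrative sketch (not part of the original card): computing the points of a normal QQ plot by hand, with theoretical quantiles on the x-axis and ordered observations on the y-axis. The data are made up, the helper name `qq_points` is hypothetical, and the plotting positions (i - 0.5) / n are one common convention among several.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a normal QQ plot."""
    ordered = sorted(sample)
    n = len(ordered)
    # Reference normal distribution with the sample's own mean and SD.
    reference = NormalDist(mean(ordered), stdev(ordered))
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical right-skewed sample: one value far above the rest.
skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]
for expected, seen in qq_points(skewed):
    print(f"{expected:6.2f}  {seen:6.2f}")
```

For normal data the pairs lie close to the line y = x; in this skewed sample the largest observation sits well above its theoretical quantile, which is what bends the upper end of the plot upward.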