Module 1 & 2 Flashcards

Introduction to Statistics and Experimental Design

1
Q

What is a sample?

A

A subset of individuals from a population of interest

2
Q

What is a population?

A

Set of all subjects relevant to the scientific hypothesis under examination

3
Q

What is a statistic?

A

A value calculated from a sample, used to estimate population parameters

4
Q

What is a parameter?

A

A true measurement that describes a population

5
Q

What is a statistical hypothesis?

A

A claim regarding a population parameter

6
Q

What is sampling error?

A

The difference between a sample estimate and the true population parameter that arises purely by chance

7
Q

What are the characteristics of a good sample?

A

1) It is a random sample
2) It is precise
3) It is unbiased

8
Q

What does precision refer to?

A

The spread of values for an estimate due to sampling error

9
Q

What is the relationship between sample size and precision?

A

Higher sample size = higher precision and lower sampling error

10
Q

What does bias refer to?

A

Systematic discrepancy between estimates from multiple samples and the true population parameter

11
Q

What are 2 types of non-random samples?

A

1) Sample of convenience
2) Volunteer sample

12
Q

What are 2 types of studies?

A

1) Experimental, where treatments are assigned
2) Observational, where treatments are not assigned by the researcher

13
Q

What can be a problem for observational studies?

A

Confounding variables - variables that influence the outcome but are not accounted for

14
Q

What are the 2 types of variables?

A

1) Qualitative/Categorical (membership)
2) Quantitative (magnitude)

15
Q

What are the measurement scales for qualitative data?

A

Nominal and ordinal

16
Q

What are the 2 types of quantitative data?

A

Continuous and discrete

17
Q

What are the measurement scales for quantitative data?

A

Ratio and interval

18
Q

What is descriptive statistics?

A

Quantities that summarize and describe the data in a sample

19
Q

What are the components of descriptive statistics?

A

1) Shape
2) Spread
3) Location
4) Frequency distribution

20
Q

What is a frequency distribution?

A

Describes the number of times a particular value of a variable occurs in a sample (can be absolute or relative)
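
A minimal sketch of absolute versus relative frequencies. The sample values below are made up for illustration and are not part of the course material.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (made-up blood types)
sample = ["A", "B", "A", "O", "A", "B", "AB", "O"]

# Absolute frequency: how many times each value occurs in the sample
absolute = Counter(sample)

# Relative frequency: each count divided by the sample size
n = len(sample)
relative = {value: count / n for value, count in absolute.items()}

for value in sorted(absolute):
    print(value, absolute[value], round(relative[value], 3))
```

Relative frequencies always sum to 1, which makes samples of different sizes comparable.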

21
Q

What are 2 ways we can depict frequency distributions?

A

Bar graphs or histograms

22
Q

When is it best to use a histogram?

A

When we are looking at the frequency distribution of a numerical data set

23
Q

When is it best to use a bar graph?

A

When we are looking at categorical data sets

24
Q

What are the 3 types of distribution?

A

1) Frequency distribution
2) Probability distribution
3) Sampling distribution

25
Q

What is a probability distribution?

A

A distribution that depicts the probabilities associated with different values of a specific variable

26
Q

What is a sampling distribution?

A

A probability distribution of values of an estimate we could obtain from sampling a population

27
Q

What are the 3 measurements of location?

A

Mean, median, mode
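
As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The data set is hypothetical.

```python
import statistics

# Hypothetical numerical sample (made-up values)
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value (average of the two middle values here)
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```

In this made-up sample the mean is larger than the median because the large value 10 pulls the mean upward, which is typical of right-skewed data.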

28
Q

How can we describe the spread of data?

A

1) Skew
2) Range (max-min)
3) Variance
4) Standard deviation

29
Q

What is standard deviation (SD)?

A

It is a measure of spread - describes the typical distance of observed values from the mean (the square root of the variance)
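
A short sketch of how a sample standard deviation can be computed by hand, assuming the usual n - 1 denominator for a sample. The data values are made up.

```python
import math
import statistics

# Hypothetical sample (made-up values)
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Squared deviation of each observation from the mean
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance divides by n - 1, not n
variance = sum(squared_deviations) / (n - 1)

# Standard deviation is the square root of the variance
sd = math.sqrt(variance)

print(round(sd, 3), round(statistics.stdev(data), 3))
```

The hand calculation and `statistics.stdev` should agree, since both use the n - 1 denominator.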

30
Q

What is estimation?

A

Process of inferring a population parameter using sample data

31
Q

What is uncertainty and how do we quantify uncertainty?

A

Error of an estimate
1) Standard error (of the mean)
2) Confidence interval

32
Q

What is standard error (SE)?

A

The standard deviation of the sampling distribution of an estimate (e.g., the mean)

33
Q

How do we obtain a smaller standard error?

A

Bigger sample size > narrower sampling distribution > more precise estimate
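
The relationship can be checked with a small simulation. This is a sketch under stated assumptions: the population is a standard normal (mean 0, SD 1), the seed and replicate count are arbitrary, and the helper name `sd_of_sample_means` is hypothetical.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

def sd_of_sample_means(n, replicates=2000):
    """Draw many samples of size n and return the SD of their means."""
    means = []
    for _ in range(replicates):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        means.append(sum(sample) / n)
    return statistics.stdev(means)

se_small = sd_of_sample_means(10)   # theory: 1 / sqrt(10), about 0.32
se_large = sd_of_sample_means(100)  # theory: 1 / sqrt(100) = 0.10

print(round(se_small, 3), round(se_large, 3))
```

The simulated values should land close to sigma / sqrt(n), so a tenfold increase in sample size shrinks the standard error by a factor of about 3.2.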

34
Q

What is a confidence interval (CI)?

A

Range of values, calculated from the sample data, that is likely to contain the population parameter
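
One common way to build an approximate 95% confidence interval for a mean is estimate ± critical value × standard error. The sketch below assumes a normal approximation (critical value about 1.96); for small samples a t critical value would usually be more appropriate. The data are made up.

```python
import math
import statistics

# Hypothetical sample (made-up measurements)
sample = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2, 5.0]

n = len(sample)
mean = statistics.mean(sample)

# Standard error of the mean: sample SD divided by sqrt(n)
se = statistics.stdev(sample) / math.sqrt(n)

# Two-sided 95% critical value from the standard normal (about 1.96)
z = statistics.NormalDist().inv_cdf(0.975)

ci = (mean - z * se, mean + z * se)
print(ci)
```

A larger sample gives a smaller standard error and therefore a narrower interval.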

35
Q

What are the steps of hypothesis testing?

A

1) State the hypotheses
2) Calculate test statistic
3) Determine p-value
4) Appropriate conclusion

36
Q

What is a test statistic?

A

A value calculated from the data and used to determine how compatible the observed data are with the results expected under the null hypothesis

37
Q

What is the p-value?

A

Probability of obtaining a result as extreme as or more extreme than the one observed, assuming the null hypothesis is true (obtained from the null distribution)

38
Q

What is the null distribution?

A

Sampling distribution for a test statistic under the assumption that the null hypothesis is true
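
To make the link between test statistic, null distribution, and p-value concrete, here is a sketch using a permutation test. This is one possible way to build a null distribution, chosen for illustration; the deck does not name a specific test. The two groups are made-up data, and the test statistic is the difference in group means.

```python
import random

# Hypothetical measurements for two groups (made-up values)
control = [4.1, 5.0, 4.8, 5.2, 4.5, 4.9]
treated = [6.0, 6.4, 5.9, 6.8, 6.1, 6.5]

def mean(values):
    return sum(values) / len(values)

# Test statistic: observed difference in means
observed = mean(treated) - mean(control)

# Null distribution: shuffle group labels, which assumes no treatment effect
rng = random.Random(0)
pooled = control + treated
n_perm = 5000
null_distribution = []
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = mean(pooled[len(control):]) - mean(pooled[:len(control)])
    null_distribution.append(diff)

# Two-sided p-value: share of null values as extreme or more extreme than observed
# (small tolerance guards against floating-point rounding; +1 avoids p = 0)
extreme = sum(1 for d in null_distribution if abs(d) >= abs(observed) - 1e-12)
p_value = (extreme + 1) / (n_perm + 1)

print(round(observed, 3), p_value)
```

Because every treated value in this made-up data exceeds every control value, very few shuffles are as extreme as the observed difference, so the p-value is small.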

39
Q

What should a conclusion include?

A

Test used, test statistic value, df, p-value (and sample size)

40
Q

What is a type I error?

A

Rejecting a true null hypothesis (false positive)

41
Q

What determines the probability of committing a type I error?

A

The significance level, which sets the criterion for rejecting the null hypothesis
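
A small simulation can illustrate this. The sketch assumes a two-sided one-sample z-test with known population SD, a null hypothesis that is actually true (mean 0), and arbitrary choices for the seed, sample size, and number of simulated experiments.

```python
import math
import random
from statistics import NormalDist

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided rejection threshold

rng = random.Random(42)
n, sigma, experiments = 20, 1.0, 4000

false_positives = 0
for _ in range(experiments):
    # The null hypothesis is true here: the population mean really is 0
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:
        false_positives += 1  # rejected a true null: type I error

type_i_rate = false_positives / experiments
print(type_i_rate)
```

The observed rejection rate should sit near alpha, since alpha is by construction the long-run rate of type I errors when the null is true.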

42
Q

What is a type II error?

A

Failing to reject a false null hypothesis (false negative)

43
Q

What determines the probability of committing a type II error?

A

Power - the probability of a type II error equals 1 minus the statistical power of the study, so higher power means lower risk

44
Q

What is power?

A

Probability that a test correctly rejects a false null hypothesis, i.e., detects a real effect when there is one

45
Q

What determines power?

A

1) Size of the effect (bigger = more easily detectable)
2) Significance level (increase = more powerful)
3) Measurement error (smaller = more powerful)
4) Sample size (bigger = more powerful)

46
Q

What can power analysis be used for?

A

To determine how big a sample should be to attain a desired power level
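
A minimal sketch of a power analysis, assuming the simplest possible setting: a two-sided one-sample z-test with known population SD. The function names `z_test_power` and `required_sample_size`, and the effect size of 0.5 SD, are hypothetical choices for illustration; real studies typically use a t-test and dedicated software.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a true mean shift of `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sigma  # expected value of the z statistic
    # Probability that the z statistic lands in either rejection region
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

def required_sample_size(effect, sigma, target_power=0.8, alpha=0.05):
    """Smallest n whose power reaches the target."""
    n = 2
    while z_test_power(effect, sigma, n, alpha) < target_power:
        n += 1
    return n

n_needed = required_sample_size(effect=0.5, sigma=1.0)
print(n_needed, round(z_test_power(0.5, 1.0, n_needed), 3))
```

The same function shows the other determinants of power: raising `effect`, `n`, or `alpha` increases the returned power, while a larger `sigma` (more noise) lowers it.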

47
Q

What is a disadvantage of experimental studies?

A

Experimental artifacts which introduce bias through unintended consequences of experimental procedures

48
Q

What are 3 ways to reduce bias?

A

1) Control groups
2) Randomization
3) Blinding

49
Q

What are 3 ways to reduce the effect of sampling errors?

A

1) Replication
2) Balance
3) Blocking

50
Q

What is a control group?

A

Group of experimental units that do not receive the treatment of interest but are kept under the same exact conditions as the treated experimental units

51
Q

What is randomization?

A

Random assignment of treatments to units in an experimental study, breaking associations between possible confounding variables
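
Random assignment can be sketched in a few lines. The unit labels, group sizes, and seed below are hypothetical; splitting the shuffled list in half also gives equal group sizes.

```python
import random

# Hypothetical experimental units (made-up labels)
units = [f"unit_{i}" for i in range(1, 13)]

rng = random.Random(7)  # fixed seed only so the example is repeatable
shuffled = units[:]     # copy so the original order is preserved
rng.shuffle(shuffled)

# First half gets the treatment, second half serves as the control
half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]

print(treatment_group)
print(control_group)
```

Because chance alone decides which unit lands in which group, unit characteristics cannot be systematically associated with the treatment.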

52
Q

What is blinding?

A

Process of concealing information about the control/treatment group assignment (single or double blind)

53
Q

How would having all identical experimental units affect sampling error/bias?

A

Sampling error is reduced

54
Q

What is replication?

A

Applying the same treatment to multiple, independent experimental subjects

55
Q

What is pseudoreplication?

A

Treating non-independent observations as independent replicates, which violates the assumption of independence among units that receive the same treatment

56
Q

What is balance?

A

All treatments have equal sample size

57
Q

What is blocking?

A

Grouping of experimental units that have similar properties (repeating the same experiment to account for spatial/temporal differences)

58
Q

What are the benefits/disadvantages of using extreme treatments?

A

1) Treatment effects are easier to detect when they are large (increased power)
2) The effects of a treatment do not always scale linearly with the magnitude of the treatment

59
Q

When do we use a scatterplot?

A

When both variables are numerical

60
Q

When do we use a boxplot?

A

When we want to depict a continuous variable in terms of its distribution OR when Y is a continuous variable and X is categorical

61
Q

What do we use a QQ plot for?

A

To assess normality (whether the data follow a normal distribution)

62
Q

What are the patterns we can see on a QQ plot?

A

1) Linear = normal
2) Exponential curve upward = right skew
3) Exponential downward curve = left skew
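
For reference, the points on a normal QQ plot can be computed without any plotting library. This sketch assumes one common convention for plotting positions, (i - 0.5) / n, and uses made-up data; statistical software may use slightly different positions.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample (made-up values)
sample = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2, 5.0]

n = len(sample)
ordered = sorted(sample)  # observed quantiles

# Theoretical quantiles from a normal with the sample's mean and SD
fitted = NormalDist(mean(sample), stdev(sample))
theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each point pairs a theoretical quantile (x) with an observed quantile (y)
qq_points = list(zip(theoretical, ordered))
for x, y in qq_points:
    print(round(x, 2), y)
```

If the sample is roughly normal, the points fall near the line y = x; systematic curvature suggests skew.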