Exam 1 - Sept 20 Flashcards

1
Q

What are the four steps of the experimental process?

A

Formulate Theory → Collect Data → Summarize Results → Interpret Results and Make Decisions

2
Q

Variable

A

An observed category (label) or quantity (number) in an experiment that may “vary” for different individuals

3
Q

Categorical variable

A

Individuals are classified into groups or categories

4
Q

Quantitative variable

A

A numerical quantity

5
Q

Explanatory variable

A

Variable that is thought to affect (“explain”) another variable

6
Q

Response Variable

A

Variable that is thought to be affected by (“respond to”) the explanatory variable(s)

7
Q

Inference

A

A conclusion that patterns from data can be extended to some broader context

8
Q

Statistical Inference

A

An inference justified by a probability model linking the data to the broader context; incorporates a measure of uncertainty

9
Q

Causal Inference

A

Enables us to establish a cause and effect relationship

10
Q

Population Inference

A

An inference about population characteristics; extends results from the study to a larger population

11
Q

Describe the probability model of randomization. What kind of inferences can be made when it is used?

A

Assigning experimental units (subjects) to treatment groups using a chance mechanism
Causal inference
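
As a minimal sketch of randomization, the snippet below assigns subjects to groups with a chance mechanism (a shuffle). The function name `randomize`, the group labels, and the subject names are hypothetical, chosen only for illustration.

```python
import random

def randomize(subjects, groups=("treatment", "control"), seed=None):
    """Assign each subject to a group by chance, keeping group sizes as equal as possible."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # the chance mechanism
    assignment = {g: [] for g in groups}
    # Deal the shuffled subjects out to the groups in turn
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

# Hypothetical subjects, for illustration only
subjects = [f"subject_{i}" for i in range(1, 11)]
assignment = randomize(subjects, seed=1)
for group, members in assignment.items():
    print(group, members)
```

Because chance alone decides group membership, no subject characteristic can systematically line up with the treatment.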

12
Q

Describe the probability model of random sampling. What kind of inferences can be made when it is used?

A

Selecting experimental units (subjects) to be in a sample using a chance mechanism
Population inference

13
Q

Anecdotal Evidence

A

A short story or example of an interesting event that could lead to scientific investigation, but does not establish a scientific theory

14
Q

Observational Study

A

A study in which the group status (e.g., gender) is beyond the control of the researcher; results may be due to confounding variables

15
Q

Randomized Experiments

A

An experiment in which subjects are assigned to groups by randomization; this balances confounding variables across the groups on average

16
Q

Main Lesson for Causal Inferences

A

causal inferences can be made from randomized experiments, but not observational studies

17
Q

Confounding Variables

A

variables that are related to both the group membership and the outcome

18
Q

Main Lesson for Population Inferences

A

population inferences can only be made from samples which utilize random sampling

19
Q

Population

A

A well-defined collection of objects that we are interested in drawing conclusions about

20
Q

Sample

A

A subset of objects from the population

21
Q

Describe the two types of random sampling

A

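Simple Random Sample (SRS) → Every possible sample of the chosen size is equally likely, so all individuals have an equal chance of being selected

Stratified Random Sample → The population is divided into groups (strata) and individuals are randomly selected within each group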
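
The two sampling schemes can be sketched as follows. This is an illustration only: the population of ID numbers, the stratum labels, and the function names are assumptions, not part of the course material.

```python
import random

def simple_random_sample(population, n, rng):
    # rng.sample draws n distinct individuals; every subset of size n is equally likely
    return rng.sample(population, n)

def stratified_random_sample(strata, n_per_stratum, rng):
    # strata maps a stratum label to the list of individuals in that stratum;
    # a separate random sample is drawn within each stratum
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}

rng = random.Random(0)
population = list(range(100))  # hypothetical ID numbers
srs = simple_random_sample(population, 10, rng)

# Hypothetical strata splitting the same population into two groups
strata = {"first_year": list(range(0, 50)), "second_year": list(range(50, 100))}
stratified = stratified_random_sample(strata, 5, rng)

print("SRS:", srs)
print("Stratified:", stratified)
```

The stratified version guarantees that each group is represented, which an SRS does not.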

22
Q

Self-selection

A

sampling using volunteers

23
Q

Convenience sampling

A

Sampling the individuals who are easiest to reach; more common than random sampling but more likely to produce bias

24
Q

Control Groups

A

Gives a baseline for comparison with test groups

25
Q

Placebo Effect

A

Individuals may respond favorably even when given a treatment that is known to be ineffective; the opposite is the nocebo effect

26
Q

Blinding

A

The treatment assignment is kept secret from the experimental subject

27
Q

Double Blinding

A

The treatment assignment is kept secret from both the experimental subject and the individuals measuring the response

28
Q

Sampling Error

A

Discrepancy between a sample statistic and the population characteristic it estimates

29
Q

Nonresponse bias

A

Not everyone who is asked to participate agrees to do so, and nonresponders differ from responders

30
Q

What are some ways to display categorical variables in graphic form?

A

Bar plots and pie charts

31
Q

Give a general description of a histogram

A

The range of observations is divided into subintervals (usually of equal size)
The frequency of observations in each subinterval is plotted as the height of a bar (frequency on the y-axis)
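
A rough sketch of the counting step behind a histogram, assuming equal-width subintervals. The data values and the helper name `histogram_counts` are made up for illustration.

```python
def histogram_counts(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width subintervals of [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # A value equal to `high` is placed in the last subinterval
        idx = min(int((v - low) / width), n_bins - 1)
        counts[idx] += 1
    return counts

values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # made-up observations
counts = histogram_counts(values, low=0, high=10, n_bins=5)

# Text version of the plot: one bar per subinterval, bar length = frequency
for i, count in enumerate(counts):
    print(f"[{i * 2:2d}, {i * 2 + 2:2d}) " + "#" * count)
```

Here the lone value 9 sits apart from the rest, which is how a possible outlier shows up in a histogram.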

32
Q

What three aspects of the data are shown by histograms?

A

Center, Outliers, and General Shape

33
Q

What would data look like that is symmetric or left/right skewed?

A

Symmetric or skewed describes the shape of the distribution
Symmetric → Both halves are a reflection of each other
Skewed → Can be left or right skewed; one side has a tail (the skew is named for that side) and the other side has the bulk of the data

34
Q

Unimodal/Multimodal

A

number of peaks in the distribution

35
Q

What is a quartile?

A

The 25th and 75th percentiles are the first (Q1) and third quartiles (Q3)

36
Q

How do you make a box plot?

A

The median of the observations is denoted by a thick line
A box is drawn from the Q1 to the Q3
Whiskers extend to the largest and smallest observations that are not outliers
Outliers are shown as stars

37
Q

What is a five-number summary?

A

The set of five numbers → minimum, Q1, median, Q3, maximum

38
Q

Observations

A

The categorical or quantitative measurements made (data)

39
Q

Frequency

A

A count of observations that fall into a certain category

40
Q

Statistic (general)

A

A numerical measure calculated from the observations; sample characteristic

41
Q

(2) measures of center

A

mean or median

42
Q

(3) measures of spread

A

variance, standard deviation, IQR

43
Q

What is the symbol for mean? What is its strength/weakness?

A

ȳ (y with a horizontal line over it)

Strength: efficient in using all of the data
Weakness: sensitive to outliers

44
Q

What is the symbol for median? What is its strength/weakness?

A

M - population median
m (italics) - sample median

resistant to outliers

45
Q

Percentile

A

The pth percentile of the observations is the observation value such that p% of the observations are smaller than it

46
Q

IQR or Interquartile Range

A

Q3 - Q1

Measures dispersion

47
Q

What is the symbol for variance?

A

σ^2 - population variance

s^2 (italics) - sample variance

48
Q

Standard Deviation (formula, will not need to calculate) Why is SD better than Variance?

A

s = √[ (1/(n-1)) × Σ (yᵢ - ȳ)² ], that is, the square root of 1/(n-1) times the sum of the squared differences between each value and the mean
(Roughly the typical distance of each value from the mean)
SD is in the same units as the data; variance is in squared units
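
For reference, the formula can be followed step by step in a few lines. This is a sketch with made-up data; the result is compared against `statistics.stdev` from the Python standard library, which also uses the n-1 divisor.

```python
import math
import statistics

def sample_sd(values):
    n = len(values)
    mean = sum(values) / n
    # Sum of squared differences between each value and the mean
    squared_diffs = sum((y - mean) ** 2 for y in values)
    variance = squared_diffs / (n - 1)  # sample variance, s^2
    return math.sqrt(variance)          # sample SD, s

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
s = sample_sd(data)
print(round(s, 3), round(statistics.stdev(data), 3))
```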

49
Q

What is the symbol for standard deviation?

A

σ - population standard deviation

s (italics) - sample standard deviation

50
Q

How is an ‘outlier’ defined?

A

An observation is considered an outlier if it is smaller than Q1 - 1.5(IQR) or larger than Q3 + 1.5(IQR)
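
A small sketch of the 1.5(IQR) rule on made-up data. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of `statistics.quantiles`, so other tools may give slightly different quartiles on the same data.

```python
import statistics

data = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # made-up observations

# Quartiles under the "inclusive" convention (an assumption; conventions vary)
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [y for y in data if y < lower_fence or y > upper_fence]

five_number_summary = (min(data), q1, median, q3, max(data))
print("Five-number summary:", five_number_summary)
print("IQR:", iqr, "fences:", (lower_fence, upper_fence))
print("Outliers:", outliers)
```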

51
Q

Parameter

A

population characteristic

52
Q

(Box-plots) What is the meaning of long-tailed or short-tailed?

A

Long-Tailed → Spike in data

Short-Tailed → Data evenly spread

53
Q

What are the proper graphs (2) to show the relationship between two categorical variables?

A

Frequency or Relative Frequency Table
Row percentages displayed, each cell is the count for that cell divided by the row total

Stacked Relative Frequency Bar Chart
Percent within levels of ____
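
The row-percentage calculation can be sketched as below. The counts and the category labels are hypothetical and exist only to show the arithmetic (cell count divided by row total).

```python
def row_percentages(table):
    """Convert a two-way table of counts into row percentages."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())
        result[row_label] = {col: 100 * count / row_total
                             for col, count in cells.items()}
    return result

# Hypothetical counts for two categorical variables
counts = {
    "treatment": {"improved": 30, "not improved": 20},
    "control": {"improved": 15, "not improved": 35},
}
percentages = row_percentages(counts)
for row_label, cells in percentages.items():
    print(row_label, cells)
```

Each row sums to 100%, which is what a stacked relative frequency bar chart displays.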

54
Q

What are the proper graphs (2) to show the relationship between a quantitative and a categorical variable?

A

Side by Side Box Plots

Side by Side Dotplots

55
Q

What is the proper graph to show the relationship between two quantitative variables?

A

Scatterplot, with the explanatory variable on the x-axis and the response on the y-axis

56
Q

What is the standard notation for a normal distribution?

A

Y ~ N(μ, σ)
μ is mean
σ is SD

57
Q

How can the mean and the SD affect the appearance of a graph of normal distribution?

A

Mean (μ) → Determines the center

SD (σ) → Determines the spread or height/width

58
Q

What does it mean to standardize a data point with respect to the normal curve?

A

Rescaling each normally distributed variable to make them equivalent with respect to the area under the curve

59
Q

What is the equation to standardize a data point with respect to the normal curve?

A

Z = (Y - μ) / σ
Subtract the mean and divide by the standard deviation to yield the number of SDs from the mean (Z)
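
As a quick worked example, with a made-up mean of 100 and SD of 15:

```python
def standardize(y, mu, sigma):
    """Return Z, the number of SDs that y lies from the mean."""
    return (y - mu) / sigma

z = standardize(130, mu=100, sigma=15)
print(z)  # 130 is 30 units above the mean, i.e. 2 SDs
```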

60
Q

Using a normal distribution table, how can you convert from a data point to the proportion of data above or below that point?

A

Convert to Z value
The exact Z is the value on the leftmost column plus the value on the topmost row
→ Area/Proportion below Z = table value
→ Area/Proportion above Z = 1 - (table value)

61
Q

Using a normal distribution table, how can you convert two data points to the proportion of data between those points?

A

Convert to Z value

→ Area/Proportion between ZA and ZB = table value B - table value A
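
The table lookups for areas below, above, and between Z-values can be checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a printed table; the Z-values are arbitrary examples.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

z_a, z_b = -1.0, 1.0  # example Z-values

below_b = standard_normal.cdf(z_b)   # area below Z_B (the "table value")
above_b = 1 - below_b                # area above Z_B
between = standard_normal.cdf(z_b) - standard_normal.cdf(z_a)  # area between Z_A and Z_B

print(round(below_b, 4), round(above_b, 4), round(between, 4))
```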

62
Q

Using a normal distribution table, how can you convert a percentile to the corresponding cutoff point?

A

Convert to Z by finding proportion in table then the corresponding Z-value
Convert Z-value back to Y using the standardization equation
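
Going from a percentile to a cutoff reverses the lookup. A sketch, assuming a made-up normal distribution with mean 100 and SD 15 and using `NormalDist.inv_cdf` instead of reading the table backwards:

```python
from statistics import NormalDist

mu, sigma = 100, 15   # made-up population values
percentile = 0.90     # looking for the 90th percentile

# Step 1: find the Z-value with 90% of the area below it
z = NormalDist().inv_cdf(percentile)

# Step 2: convert Z back to Y by rearranging Z = (Y - mu) / sigma
cutoff = mu + z * sigma

print(round(z, 4), round(cutoff, 2))
```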

63
Q

What are the four ways to assess the normality of data?

A

Histogram, Normal Curve, Probability Tables, Normality Tests

64
Q

How do you assess normality using a histogram?

A

Plot the data into a histogram and superimpose a normal curve

65
Q

How do you assess normality using a normal curve?

A

Compare the data with the 68-95-99.7 rule: about 68%, 95%, and 99.7% of observations should fall within 1, 2, and 3 SDs of the mean
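
One way to carry out this comparison is to compute the share of observations within 1, 2, and 3 SDs of the mean. The sketch below uses simulated normal data (an assumption made so the example is self-contained); real data would replace the `data` list.

```python
import random
import statistics

def proportion_within(values, k):
    """Proportion of values within k sample SDs of the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= k * sd)
    return inside / len(values)

rng = random.Random(42)
data = [rng.gauss(100, 15) for _ in range(10_000)]  # simulated, roughly normal data

for k, expected in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    print(f"within {k} SD: observed {proportion_within(data, k):.3f}, expected about {expected}")
```

Large gaps between the observed and expected proportions would suggest the data are not approximately normal.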

66
Q

How do you assess normality using probability tables?

A

Comparison of observed versus expected left tail percentages

67
Q

How do you assess normality using the Shapiro-Wilk test?

A

Yields a p-value; a p-value above .1 gives no evidence of non-normality

68
Q

Sampling Variability

A

Variability among random samples from the same population

69
Q

Sampling Distribution

A

A probability distribution that characterizes some aspect of sampling variability

70
Q

Cutoff for CLT

A

As a rule of thumb, a sample size over 30 allows for the use of the CLT (Central Limit Theorem)

71
Q

Standard Error (defn and formula)

A

The uncertainty in the sample mean due to sampling variability; equal to the SD of the sampling distribution of X-bar

SE = σ/√n (or s/√n when σ is estimated from the sample)
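
A simulation can make the standard error concrete. The sketch below assumes a skewed population (exponential with mean 1 and SD 1, chosen only for illustration), draws many random samples of size 36, and compares the SD of the sample means with σ/√n.

```python
import math
import random
import statistics

rng = random.Random(7)
n = 36             # size of each sample
n_samples = 5000   # number of repeated samples

def sample_mean(n, rng):
    # One random sample of size n from an exponential population (mean 1, SD 1)
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

means = [sample_mean(n, rng) for _ in range(n_samples)]

sd_of_means = statistics.stdev(means)   # spread of the simulated sampling distribution
standard_error = 1.0 / math.sqrt(n)     # sigma / sqrt(n) with sigma = 1

print(round(sd_of_means, 3), round(standard_error, 3))
```

The means vary from sample to sample (sampling variability), and their spread should come out close to σ/√n even though the population itself is not normal.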

72
Q

Bias

A

Estimates are systematically off from the true value; reduced by random sampling

73
Q

Variability

A

Spread of estimates, reduced by increasing sample size

74
Q

Confidence Level

A

The percentage of random samples, over repeated sampling, that will produce confidence intervals containing μ

75
Q

Margin of Error (MOE)

A

Half the width of the confidence interval, equal to t(alpha/2, n-1) * s/√n

76
Q

Critical Value

A

The z-value Z(alpha/2) that has a normal tail probability of α/2 beyond it

It marks the cutoffs for the confidence interval and can be converted to Y to find the endpoints of the confidence interval

77
Q

What is the notation for a normal curve created for a sample mean (SD known)?

A

X-bar ~ Normal(μ, σ/√n)

78
Q

How do you find the confidence interval for a population mean calculated from sample means when SD is known?

A

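100(1-α)% confidence level → Z(alpha/2) → critical value, which sets the upper bound of the confidence interval (with +) and the lower bound (with -)

X-bar +/- Z(alpha/2) * σ/√n, that is, Mean +/- Critical Value * Standard Error (standard deviation / √(sample size)) = Confidence Interval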
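
A sketch of the known-SD interval, where the standard error is σ/√n. The sample mean, σ, and n below are made-up numbers, and `NormalDist.inv_cdf` stands in for the table lookup of the critical value.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value Z(alpha/2)
    standard_error = sigma / math.sqrt(n)
    margin = z_crit * standard_error
    return x_bar - margin, x_bar + margin

# Made-up summary values
lower, upper = z_confidence_interval(x_bar=50, sigma=10, n=25)
print(round(lower, 2), round(upper, 2))
```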

79
Q

How do you find the confidence interval for a population mean calculated from sample means using only estimated components?

A

X-bar +/- t(alpha/2, n-1) * s/√n

X-bar is sample mean, s is the sample standard deviation, n is the sample size

t(alpha/2, n-1) is the critical value of Student’s t-distribution with n-1 degrees of freedom for tail probability 𝞪/2
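
A sketch of the same calculation with estimated components. The Python standard library has no t-distribution, so the critical value is passed in as if read from a t-table; 2.262 is the table value for 9 degrees of freedom and tail probability .025. The summary numbers are made up.

```python
import math

def t_confidence_interval(x_bar, s, n, t_crit):
    """Confidence interval X-bar +/- t_crit * s / sqrt(n); t_crit comes from a t-table."""
    standard_error = s / math.sqrt(n)
    margin = t_crit * standard_error  # margin of error
    return x_bar - margin, x_bar + margin, margin

# Made-up sample summary: n = 10, so n - 1 = 9 degrees of freedom
lower, upper, margin = t_confidence_interval(x_bar=50, s=10, n=10, t_crit=2.262)
print(round(lower, 2), round(upper, 2), round(margin, 3))
```

With the same mean and SD, the t interval is wider than a z interval would be, reflecting the extra uncertainty from estimating σ with s.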

80
Q

How do you calculate required sample size for a 95% confidence interval using sample standard deviation and desired margin of error?

A

The margin of error depends on α and n. If α is .05, then t(.025, n-1) is approximately 2, and the sample size (n) is approximately (2s/MOE) squared

Plug in desired MOE and sample s to get recommended n, then round up

Or solve t(alpha/2, n-1) * s/√n = MOE for n using an estimated t-value (the same calculation, written as a different equation)
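
A short worked example of the sample-size rule, using the approximation t ≈ 2 for 95% confidence. The values of s and the desired MOE are made up.

```python
import math

def required_sample_size(s, moe, t_approx=2.0):
    """Approximate n for a 95% confidence interval: (t * s / MOE)^2, rounded up."""
    return math.ceil((t_approx * s / moe) ** 2)

n_needed = required_sample_size(s=10, moe=3)
print(n_needed)  # (2 * 10 / 3)^2 is about 44.4, which rounds up to 45
```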

81
Q

What are the assumptions when creating a one-sample confidence interval ?

A

Data must be regarded as a random sample from a large population
Observations must be independent of each other
If n is small, the population distribution must be approximately normal

82
Q

What measure of spread is resistant to outliers?

A

IQR