Exam 1 - Sept 20 Flashcards

Question 1

Q

What are the four steps of the experimental process?

Answer

A

Formulate Theory → Collect Data → Summarize Results → Interpret Results and Make Decisions

Question 2

Q

Variable

Answer

A

An observed category (label) or quantity (number) in an experiment that may “vary” for different individuals

Question 3

Q

Categorical variable

Answer

A

Individuals are classified into groups or categories

Question 4

Q

Quantitative variable

Answer

A

A numerical quantity

Question 5

Q

Explanatory variable

Answer

A

Variable that is thought to affect (“explain”) another variable

Question 6

Q

Response Variable

Answer

A

Variable that is thought to be affected by (“respond to”) the explanatory variable(s)

Question 7

Q

Inference

Answer

A

A conclusion that patterns from data can be extended to some broader context

Question 8

Q

Statistical Inference

Answer

A

An inference justified by a probability model linking the data to the broader context; it incorporates a measure of uncertainty

Question 9

Q

Causal Inference

Answer

A

Enables us to establish a cause and effect relationship

Question 10

Q

Population Inference

Answer

A

An inference about population characteristics; extends results from the study to a larger population

Question 11

Q

Describe the probability model of randomization. What kind of inferences can be made when it is used?

Answer

A

Assigning experimental units (subjects) to treatment groups using a chance mechanism
Causal inference
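
A minimal sketch of randomization, assuming 20 hypothetical subjects numbered 1 to 20 and two groups. The function name `randomize` and the seed are illustrative only; a shuffle serves as the chance mechanism.

```python
import random

def randomize(subjects, seed=None):
    """Split subjects into two groups using a chance mechanism (a shuffle)."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject IDs 1..20; the seed only makes the example repeatable.
treatment, control = randomize(range(1, 21), seed=1)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```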

Question 12

Q

Describe the probability model of random sampling. What kind of inferences can be made when it is used?

Answer

A

Selecting experimental units (subjects) to be in a sample using a chance mechanism
Population inference

Question 13

Q

Anecdotal Evidence

Answer

A

A short story or example of an interesting event that could lead to scientific investigation, but does not establish a scientific theory

Question 14

Q

Observational Study

Answer

A

A study in which the group status (e.g., gender) is beyond the control of the researcher; results may be due to confounding variables

Question 15

Q

Randomized Experiments

Answer

A

An experiment in which randomization is done to assign subjects to groups; accounts for confounding variables

Question 16

Q

Main Lesson for Causal Inferences

Answer

A

causal inferences can be made from randomized experiments, but not observational studies

Question 17

Q

Confounding Variables

Answer

A

variables that are related to both the group membership and the outcome

Question 18

Q

Main Lesson for Population Inferences

Answer

A

population inferences can only be made from samples which utilize random sampling

Question 19

Q

Population

Answer

A

A well-defined collection of objects that we are interested in drawing conclusions about

Question 20

Q

Sample

Answer

A

A subset of objects from the population

Question 21

Q

Describe the two types of random sampling

Answer

A

Simple Random Sample (SRS) → All individuals have an equal chance of being selected

Stratified Random Sample → Individuals selected within groups
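
The two sampling schemes can be sketched as follows, assuming a hypothetical population of 100 numbered individuals and two made-up strata. The function names and group labels are illustrative, not from the course.

```python
import random

def simple_random_sample(population, n, seed=None):
    """SRS: every individual has the same chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

def stratified_sample(strata, n_per_stratum, seed=None):
    """Stratified: draw a separate random sample within each group."""
    rng = random.Random(seed)
    return {name: rng.sample(list(members), n_per_stratum)
            for name, members in strata.items()}

population = list(range(100))
srs = simple_random_sample(population, 10, seed=0)

# Hypothetical strata: individuals 0-39 and 40-99.
strata = {"group_a": range(0, 40), "group_b": range(40, 100)}
strat = stratified_sample(strata, 5, seed=0)
print(srs)
print(strat)
```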

Question 22

Q

Self-selection

Answer

A

sampling using volunteers

Question 23

Q

Convenience sampling

Answer

A

sampling whoever is easiest to reach; more common, but carries a higher risk of bias

Question 24

Q

Control Groups

Answer

A

Gives a baseline for comparison with test groups

Question 25

Q

Placebo Effect

Answer

A

Individuals may respond favorably even when given a treatment that is known to be ineffective; the opposite is the nocebo effect

Question 26

Q

Blinding

Answer

A

The treatment assignment is kept secret from the experimental subject

Question 27

Q

Double Blinding

Answer

A

The treatment assignment is kept secret from both the experimental subject and the individuals measuring the response

Question 28

Q

Sampling Error

Answer

A

Discrepancy between the sample and population

Question 29

Q

Nonresponse bias

Answer

A

Not everyone who is asked to participate agrees to do so, and nonresponders differ from responders

Question 30

Q

What are some ways to display categorical variables in graphic form?

Answer

A

Bar plots and pie charts

Question 31

Q

Give a general description of a histogram

Answer

A

The range of observations is divided into subintervals (usually of equal size)
The frequency of observations in each subinterval is plotted as the height of a bar (y-axis)

Question 32

Q

What three aspects of the data are shown by histograms?

Answer

A

Center, Outliers, and General Shape

Question 33

Q

What would data look like that is symmetric or left/right skewed?

Answer

A

Symmetric or skewed describes the shape of the distribution
Symmetric → Both halves are a reflection of each other
Skewed (left or right) → One side has a tail (the skew is named for that side), the other side has the bulk of the data

Question 34

Q

Unimodal/Multimodal

Answer

A

number of peaks in the distribution

Question 35

Q

What is a quartile?

Answer

A

The 25th and 75th percentiles are the first (Q1) and third quartiles (Q3)

Question 36

Q

How do you make a box plot?

Answer

A

The median of the observations is denoted by a thick line
A box is drawn from the Q1 to the Q3
Whiskers extend to the largest and smallest observations that are not outliers
Outliers are shown as stars

Question 37

Q

What is a five-number summary?

Answer

A

The set of five numbers: minimum, Q1, median, Q3, maximum

Question 38

Q

Observations

Answer

A

The categorical or quantitative measurements made (data)

Question 39

Q

Frequency

Answer

A

A count of observations that fall into a certain category

Question 40

Q

Statistic (general)

Answer

A

A numerical measure calculated from the observations; sample characteristic

Question 41

Q

(2) measures of center

Answer

A

mean or median

Question 42

Q

(3) measures of spread

Answer

A

variance, standard deviation, IQR

Question 43

Q

What is the symbol for mean? What is its strength/weakness?

Answer

A

ȳ (y with a horizontal line over it)

Strength: efficient in using all data
Weakness: not resistant to outliers

Question 44

Q

What is the symbol for median? What is its strength/weakness?

Answer

A

M - population median
m (italics) - sample median

Strength: resistant to outliers
Weakness: does not use the magnitude of every observation

Question 45

Q

Percentile

Answer

A

The pth percentile of the observations is the observation value such that p% of the observations are smaller than it

Question 46

Q

IQR or Interquartile Range

Answer

A

Q3 - Q1

Measures dispersion

Question 47

Q

What is the symbol for variance?

Answer

A

σ^2 - population variance

s^2 (italics) - sample variance

Question 48

Q

Standard Deviation (formula, will not need to calculate) Why is SD better than Variance?

Answer

A

s = √[ (1/(n-1)) × Σ (yᵢ − ȳ)² ], i.e. the square root of 1/(n-1) times the sum of the squared differences between each value and the mean
(Roughly the typical distance of each value from the mean)
SD is in the same units as the data; variance is in squared units
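
As a sketch of the formula, the following computes the sample variance and SD by hand and compares the result with Python's `statistics.stdev`. The data values are made up for illustration.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up observations
n = len(data)
y_bar = sum(data) / n

# Sum of squared differences from the mean, divided by n - 1.
variance = sum((y - y_bar) ** 2 for y in data) / (n - 1)
s = math.sqrt(variance)           # SD: back in the units of the data

print(variance, s)
print(statistics.stdev(data))     # library value for comparison
```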

Question 49

Q

What is the symbol for standard deviation?

Answer

A

σ - population standard deviation

s (italics) - sample standard deviation

Question 50

Q

How is an ‘outlier’ defined?

Answer

A

An observation is considered an outlier if it is smaller than Q1 - 1.5(IQR) or larger than Q3 + 1.5(IQR)
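
A short sketch of the 1.5(IQR) rule together with the five-number summary, using made-up data. Quartile conventions differ between textbooks and software; this assumes the "inclusive" method of `statistics.quantiles`, which may not match the course's hand method exactly.

```python
import statistics

data = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])   # made-up observations

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [y for y in data if y < lower_fence or y > upper_fence]

summary = (min(data), q1, median, q3, max(data))
print("five-number summary:", summary)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```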

Question 51

Q

Parameter

Answer

A

population characteristic

Question 52

Q

(Box-plots) What is the meaning of long-tailed or short-tailed?

Answer

A

Long-Tailed → Spike in data

Short-Tailed → Data evenly spread

Question 53

Q

What are the proper graphs (2) to show the relationship between two categorical variables?

Answer

A

Frequency or Relative Frequency Table
Row percentages displayed, each cell is the count for that cell divided by the row total

Stacked Relative Frequency Bar Chart
Percent within levels of ____
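
Row percentages can be sketched like this, assuming a hypothetical two-way table of counts. The row and column labels are invented for illustration.

```python
# Hypothetical counts: rows are groups, columns are outcomes.
counts = {
    "treatment": {"improved": 30, "not improved": 20},
    "control":   {"improved": 15, "not improved": 35},
}

def row_percentages(table):
    """Divide each cell count by its row total and express it as a percent."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * count / total for col, count in cells.items()}
    return result

row_pct = row_percentages(counts)
for row, cells in row_pct.items():
    print(row, cells)
```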

Question 54

Q

What are the proper graphs (2) to show the relationship between a quantitative and a categorical variable?

Answer

A

Side by Side Box Plots

Side by Side Dotplots

Question 55

Q

What is the proper graph to show the relationship between two quantitative variables?

Answer

A

Scatterplot, with the explanatory variable on the x-axis and the response on the y-axis

Question 56

Q

What is the standard notation for a normal distribution?

Answer

A

Y ~ N(μ, σ)
μ is mean
σ is SD

Question 57

Q

How can the mean and the SD affect the appearance of a graph of normal distribution?

Answer

A

Mean (μ) → Determines the center

SD (σ) → Determines the spread or height/width

Question 58

Q

What does it mean to standardize a data point with respect to the normal curve?

Answer

A

Rescaling each normally distributed variable to make them equivalent with respect to the area under the curve

Question 59

Q

What is the equation to standardize a data point with respect to the normal curve?

Answer

A

Z = (Y − μ)/σ: subtract the mean and divide by the standard deviation to yield the number of SDs from the mean (Z)

Question 60

Q

Using a normal distribution table, how can you convert from a data point to the proportion of data above or below that point?

Answer

A

Convert to Z value
The exact Z is the value on the leftmost column plus the value on the topmost row
→ Area/Proportion below Z = table value
→ Area/Proportion above Z = 1 - (table value)
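
The table lookup can be mimicked in code. This sketch assumes a hypothetical Y ~ N(100, 15) and uses `statistics.NormalDist().cdf` in place of the printed table.

```python
from statistics import NormalDist

mu, sigma = 100, 15      # assumed population mean and SD
y = 130                  # data point of interest

z = (y - mu) / sigma                 # convert to Z
below = NormalDist().cdf(z)          # plays the role of the table value
above = 1 - below

print(f"z = {z}, below = {below:.4f}, above = {above:.4f}")
```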

Question 61

Q

Using a normal distribution table, how can you convert two data points to the proportion of data between those points?

Answer

A

Convert to Z value

→ Area/Proportion between ZA and ZB = table value B - table value A
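
A sketch of the "between" calculation, again assuming a hypothetical Y ~ N(100, 15) and two made-up cutoffs, with `NormalDist().cdf` standing in for the table.

```python
from statistics import NormalDist

mu, sigma = 100, 15      # assumed population mean and SD
y_a, y_b = 85, 115       # made-up lower and upper data points

z_a = (y_a - mu) / sigma
z_b = (y_b - mu) / sigma

std_normal = NormalDist()
between = std_normal.cdf(z_b) - std_normal.cdf(z_a)   # table value B - table value A
print(f"proportion between = {between:.4f}")
```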

Question 62

Q

Using a normal distribution table, how can you convert a percentile to the corresponding cutoff point?

Answer

A

Convert to Z by finding proportion in table then the corresponding Z-value
Convert the Z-value back to Y by inverting the standardization equation: Y = μ + Zσ
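
Going from a percentile to a cutoff reverses the lookup. This sketch assumes a hypothetical Y ~ N(100, 15) and the 90th percentile, with `NormalDist().inv_cdf` standing in for reading the table backwards.

```python
from statistics import NormalDist

mu, sigma = 100, 15      # assumed population mean and SD
percentile = 0.90        # proportion of data below the cutoff

z = NormalDist().inv_cdf(percentile)   # Z-value with 90% of the area below it
y_cutoff = mu + z * sigma              # convert Z back to the Y scale

print(f"z = {z:.4f}, cutoff = {y_cutoff:.2f}")
```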

Question 63

Q

What are the four ways to assess the normality of data?

Answer

A

Histogram, Normal Curve, Probability Tables, Normality Tests

Question 64

Q

How do you assess normality using a histogram?

Answer

A

Plot the data into a histogram and superimpose a normal curve

Answer 65

A

Compare data with 68-95-99.7 rules
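
One way to sketch the 68-95-99.7 comparison is to count the share of observations within 1, 2 and 3 SDs of the mean. The data here are simulated from a normal distribution with an arbitrary seed, purely for illustration.

```python
import random
import statistics

rng = random.Random(42)                       # arbitrary seed for repeatability
data = [rng.gauss(50, 10) for _ in range(10_000)]

y_bar = statistics.mean(data)
s = statistics.stdev(data)

def proportion_within(k):
    """Share of observations within k sample SDs of the sample mean."""
    return sum(abs(y - y_bar) <= k * s for y in data) / len(data)

within = {k: proportion_within(k) for k in (1, 2, 3)}
for k, expected in zip((1, 2, 3), (0.68, 0.95, 0.997)):
    print(f"within {k} SD: observed {within[k]:.3f}, expected about {expected}")
```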

Answer 66

A

Comparison of observed versus expected left tail percentages

Answer 67

A

Yields a p-value; a value above 0.1 gives no evidence of non-normality

Answer 68

A

Variability among random samples from the same population

Answer 69

A

A probability distribution that characterizes some aspect of sampling variability

Answer 70

A

A sample size over 30 allows for the use of the CLT (Central Limit Theorem)

Answer 71

A

The uncertainty in the mean of the sample data due to sampling characteristics, equal to the SD of X-bar

σ (or s) over √n
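
A small sketch of the standard error calculation using the sample SD, with made-up data.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up sample
n = len(data)
s = statistics.stdev(data)         # sample SD (n - 1 in the denominator)

standard_error = s / math.sqrt(n)  # s over the square root of n
print(f"s = {s:.3f}, SE = {standard_error:.3f}")
```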

Answer 72

A

Estimates are systematically away from center, reduced by random sampling

Answer 73

A

Spread of estimates, reduced by increasing sample size

Answer 74

A

The percentage of samples that will produce confidence intervals containing μ

Answer 75

A

Half the width of the confidence interval, equal to t(alpha/2, n-1) * s/√n

Answer 76

A

The normal tail probability corresponding to Z𝞪/2

The z-value corresponding to the cutoffs for the confidence interval, can be converted to Y to find the values for the confidence interval

Answer 77

A

X-bar ~ Normal(μ, σ/√n)

Answer 78

A

100(1-𝞪)% → Zalpha/2 → Critical Value = upper bound on confidence interval (if +)

Mean +/- Critical Value * Standard Error (standard deviation/√(sample size)) = Confidence Interval

Answer 79

A

X-bar +/- t(alpha/2, n-1) * s/√n

X-bar is sample mean, s is the sample standard deviation, n is the sample size

t(alpha/2, n-1) is the critical value of Student’s t-distribution with n-1 degrees of freedom for tail probability 𝞪/2
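
A sketch of the t-based interval with made-up data (n = 10). The Python standard library has no t-distribution, so the critical value t(0.025, 9) = 2.262 is taken from a t-table; it is an assumption that holds only for 95% confidence with 9 degrees of freedom.

```python
import math
import statistics

data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]  # made-up sample
n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)

# Assumed table value: t(0.025, 9) for a 95% interval with n - 1 = 9 df.
t_crit = 2.262

margin_of_error = t_crit * s / math.sqrt(n)
lower, upper = x_bar - margin_of_error, x_bar + margin_of_error
print(f"{x_bar:.3f} +/- {margin_of_error:.3f} -> ({lower:.3f}, {upper:.3f})")
```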

Answer 80

A

Margin of Error depends on α and n; if α is .05 then t(.025, n-1) ≈ 2 and the required sample size (n) is approximately (2s/MOE) squared

Plug in desired MOE and sample s to get recommended n, then round up

Or solve for t(alpha/2, n-1) * s/√n = MOE with an estimated t-value*

*same thing, different equation
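
The sample-size rule can be sketched as below, assuming a made-up pilot SD of 10, a desired margin of error of 3, and the rough critical value of 2 for 95% confidence.

```python
import math

s = 10          # assumed SD from a pilot sample
moe = 3         # desired margin of error
t_approx = 2    # rough critical value for 95% confidence

# n = (t * s / MOE)^2, rounded up to the next whole observation.
n_required = math.ceil((t_approx * s / moe) ** 2)
print(n_required)
```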

Answer 81

A

Data must be regarded as a random sample from a large population
Observations must be independent of each other
If n is small, the population distribution must be approximately normal